path: root/llama.cpp-vulkan/README
author    Danilo Macrì <danix@danix.xyz>  2026-04-03 12:20:05 -0400
committer GitHub <noreply@github.com>    2026-04-03 12:20:05 -0400
commit    a7976bfe662097273e91471e2609df2d30120656 (patch)
tree      c54b2a6d28a89333b771bdee05e6baa45fe0c94f /llama.cpp-vulkan/README
parent    1045963959ddfb697898fa90476f837aae4e2881 (diff)
parent    ebb26eac2948e02def3c7ac1ac23c4ecd345a5a7 (diff)
Merge pull request #5 from danixland/restructure-flat-layout
repo: flatten layout — move packages to root, extras to .extras/
Diffstat (limited to 'llama.cpp-vulkan/README')
-rw-r--r--  llama.cpp-vulkan/README  |  22
1 file changed, 22 insertions(+), 0 deletions(-)
diff --git a/llama.cpp-vulkan/README b/llama.cpp-vulkan/README
new file mode 100644
index 0000000..5509d44
--- /dev/null
+++ b/llama.cpp-vulkan/README
@@ -0,0 +1,22 @@
+llama.cpp
+
+LLM inference in C/C++
+
+The main goal of llama.cpp is to enable LLM inference with minimal
+setup and state-of-the-art performance on a wide range of hardware
+locally and in the cloud.
+
+ - Plain C/C++ implementation without any dependencies
+ - Apple silicon is a first-class citizen - optimized via ARM NEON,
+ Accelerate and Metal frameworks
+ - AVX, AVX2, AVX512 and AMX support for x86 architectures
+ - RVV, ZVFH, ZFH, ZICBOP and ZIHINTPAUSE support for RISC-V
+ architectures
+ - 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer
+ quantization for faster inference and reduced memory use
+ - Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for
+ AMD GPUs via HIP and Moore Threads GPUs via MUSA)
+ - Vulkan and SYCL backend support
+ - CPU+GPU hybrid inference to partially accelerate models larger than
+ the total VRAM capacity
+
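The quantization bullet in the README above comes down to simple arithmetic: storing each weight at b bits instead of a 16-bit float shrinks the weight tensor by roughly b/16. A minimal sketch of that back-of-the-envelope calculation (illustrative numbers only, not measured llama.cpp figures; it ignores per-block scale metadata, the KV cache, and activation buffers):

```python
def approx_model_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Rough footprint of the model weights alone, in GiB.

    Ignores per-block quantization metadata, KV cache, and
    activation buffers, so real files are somewhat larger.
    """
    return n_params * bits_per_weight / 8 / 1024**3

# A hypothetical 7e9-parameter model at common bit widths:
for bits in (16, 8, 4, 2):
    print(f"{bits:2d}-bit: {approx_model_size_gib(7e9, bits):5.2f} GiB")
```

By this estimate a 7B-parameter model drops from roughly 13 GiB at 16-bit to about a quarter of that at 4-bit, which is why 4-bit quantization is the usual sweet spot for fitting larger models into limited VRAM, and why the CPU+GPU hybrid mode exists for models that still don't fit.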