Added some of the packages I maintain on my personal system

author: danix <danix@danix.xyz> 2026-03-31 09:34:59 +0200
committer: danix <danix@danix.xyz> 2026-03-31 09:34:59 +0200
commit: e51dcfd426d0bb475a3d12eed8d54a46b0f17444 (patch)
tree: 153619df5af5632abd399a8baa51b7b9eb47a828 /llama.cpp-vulkan/README
parent: e935183ec65007f4ae24ac41aab8dfbd15e28af4 (diff)
download: my-slackbuilds-e51dcfd426d0bb475a3d12eed8d54a46b0f17444.tar.gz my-slackbuilds-e51dcfd426d0bb475a3d12eed8d54a46b0f17444.zip
1 files changed, 22 insertions, 0 deletions
diff --git a/llama.cpp-vulkan/README b/llama.cpp-vulkan/README
new file mode 100644
index 0000000..5509d44
--- /dev/null
+++ b/llama.cpp-vulkan/README
@@ -0,0 +1,22 @@
+llama.cpp
+
+LLM inference in C/C++
+
+The main goal of llama.cpp is to enable LLM inference with minimal
+setup and state-of-the-art performance on a wide range of hardware
+locally and in the cloud.
+
+ - Plain C/C++ implementation without any dependencies
+ - Apple silicon is a first-class citizen - optimized via ARM NEON,
+   Accelerate and Metal frameworks
+ - AVX, AVX2, AVX512 and AMX support for x86 architectures
+ - RVV, ZVFH, ZFH, ZICBOP and ZIHINTPAUSE support for RISC-V
+   architectures
+ - 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer
+   quantization for faster inference and reduced memory use
+ - Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for
+   AMD GPUs via HIP and Moore Threads GPUs via MUSA)
+ - Vulkan and SYCL backend support
+ - CPU+GPU hybrid inference to partially accelerate models larger than
+   the total VRAM capacity
+