forked from ggml-org/llama.cpp
Pull requests: TheTom/llama-cpp-turboquant
sync: upstream master b9190 + MTP/spec stack (DO NOT MERGE — tester review)
Labels: AMD ZenDNN, Apple Metal, Ascend NPU, build, devops, documentation, examples, ggml, Hexagon, IBM zDNN, jinja parser, model, nix, Nvidia GPU, OpenCL, OpenVINO, python, script, server/ui, server, SYCL, testing, Vulkan, WebGPU
#146 opened May 17, 2026 by TheTom (Owner) · 5 of 9 tasks
vulkan: add TurboQuant KV cache support and optimized turbo mat-vec paths
Labels: ggml, Vulkan
#140 opened May 10, 2026 by Fenix46
fix(qwen35): support Qwen3.5:9B loading from Ollama GGUF
Labels: model
#135 opened May 8, 2026 by Jordan-HS
vendor: bump cpp-httplib to 0.43.2 (openssl 4.0.0 fix)
Labels: python, script
#121 opened May 4, 2026 by TheTom (Owner) · 1 of 3 tasks
HIP mixed TurboQuant vec FA on gfx900/gfx906
Labels: build, ggml, Nvidia GPU
#99 opened Apr 21, 2026 by 2bigO
perf: turbo VEC flash attention — +9% decode on CUDA via autoresearch
Labels: ggml, Nvidia GPU, script
#53 opened Apr 4, 2026 by signalnine · 7 tasks done
fix: HIP/ROCm compatibility — check cudaMemcpyToSymbol errors, guard …
Labels: ggml, Nvidia GPU
#41 opened Apr 1, 2026 by terrysimons · Draft