Tags · dfriehs/llama.cpp

b2023

ggml : fix IQ3_XXS on Metal (ggml-org#5219)

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

llama : apply classifier-free guidance to logits directly (ggml-org#4951)

CUDA: faster dequantize kernels for Q4_0 and Q4_1 (ggml-org#4938)

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

sync : ggml