Tags: CodeLinaro/llama.cpp

b6775

gguf-py : add support for endian conversion of BF16 data (ggml-org#16594)

BF16 requires special handling in this script: it is 2-byte data, but the view is 1-byte by default. Switch to the correct view before attempting byteswapping.

With this change, correctly byteswapping models such as Meta-Llama-3-8B-Instruct-bf16-GGUF should be possible.
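
For illustration, a minimal NumPy sketch of the view fix (a hypothetical standalone snippet, not the actual gguf-py code):

```python
import numpy as np

# BF16 payload as the script sees it: a raw 1-byte (uint8) buffer.
raw = np.frombuffer(b"\x3f\x80\xbf\x00", dtype=np.uint8)

# Wrong: byteswapping 1-byte elements is a no-op.
noop = raw.byteswap()
assert noop.tobytes() == raw.tobytes()

# Right: reinterpret as 2-byte elements first, then swap within each element.
swapped = raw.view(np.uint16).byteswap()
assert swapped.tobytes() == b"\x80\x3f\x00\xbf"
```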

b6745

metal : add opt_step_adamw and op_sum (ggml-org#16529)

* scaffold to support opt step adamw on metal (not written so far)

* add opt-step-adamw kernel for metal

* pass op->src[4] as a separate buffer to the pipeline

* add bounds check to opt-step-adamw kernel

* complete scaffold for GGML_OP_SUM

* naive GGML_OP_SUM kernel

* remove unwanted comment

* change OP_SUM capability gate

* Add has_simdgroup_reduction to both ops to pass CI
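
For reference, a minimal NumPy sketch of the per-element AdamW update that an opt_step_adamw kernel computes (an illustrative function with the usual hyperparameter names, not the Metal or ggml source):

```python
import numpy as np

def adamw_step(x, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, wd=0.0):
    """One AdamW update applied in place to parameters x."""
    m[:] = beta1 * m + (1.0 - beta1) * grad          # first-moment EMA
    v[:] = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment EMA
    m_hat = m / (1.0 - beta1 ** t)                   # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    # decoupled weight decay, then the Adam step
    x[:] = x * (1.0 - alpha * wd) - alpha * m_hat / (np.sqrt(v_hat) + eps)
```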

b6725

webui: updated the chat service to only include max_tokens in the request payload when the setting is explicitly provided, while still mapping explicit zero or null values to the infinite-token sentinel (ggml-org#16489)

* webui: updated the chat service to only include max_tokens in the request payload when the setting is explicitly provided, while still mapping explicit zero or null values to the infinite-token sentinel

* chore: update webui build output
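
A minimal sketch of the payload rule described above (a hypothetical helper, not the actual webui code; the -1 sentinel value is an assumption):

```python
INFINITE_TOKENS = -1  # assumed sentinel for "no token limit"

def build_payload(prompt, max_tokens=...):
    """Omit max_tokens unless explicitly set; map explicit 0/None to the sentinel."""
    payload = {"prompt": prompt}
    if max_tokens is ...:            # setting not provided: leave the field out
        return payload
    if max_tokens in (0, None):      # explicit zero/null: unlimited tokens
        payload["max_tokens"] = INFINITE_TOKENS
    else:
        payload["max_tokens"] = max_tokens
    return payload

assert "max_tokens" not in build_payload("hi")
assert build_payload("hi", 0)["max_tokens"] == INFINITE_TOKENS
```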

b6713

server : fix cancel pending task (ggml-org#16467)

Co-authored-by: DevAI <DevAI@gmail.com>

b6700

llama : add --no-host to disable host buffers (ggml-org#16310)

* implement --no-host to disable host buffer

* fix equal_mparams

* move no-host enumeration order together with other model params

---------

Co-authored-by: slaren <slarengh@gmail.com>

b6664

CI: reenable cdna in rocm docker builds (ggml-org#16376)

b6661

ci: Properly install rocwmma for hip builds (ggml-org#16305)

* CI: Properly install rocwmma for hip builds

on windows we now install rocwmma from ubuntu packages

* CI: update linux rocm docker build to use rocm 7.0

b6550

ggml : implement set_rows with i32 index (ggml-org#16159)

* implement set_rows with i32 index

* template fix

* test quantized path

warnings--

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* forgotten name change

* deduplicate cuda/sycl and test-fix

* indent++

* vulkan: support set_rows with i32 index type (ggml-org#16162)

* disable i32 index for webgpu for now

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
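
For intuition, a minimal NumPy sketch (not the ggml source) of the set_rows semantics with an i32 index tensor: row i of src is scattered into row idx[i] of dst.

```python
import numpy as np

def set_rows(dst, src, idx):
    """Scatter src rows into dst at positions given by the i32 index tensor."""
    assert idx.dtype == np.int32 and len(idx) == len(src)
    dst[idx] = src

dst = np.zeros((4, 3), dtype=np.float32)
src = np.arange(6, dtype=np.float32).reshape(2, 3)
set_rows(dst, src, np.array([2, 0], dtype=np.int32))
# dst row 2 now holds src row 0; dst row 0 holds src row 1
```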

b6451

ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type (ggml-org#15797)

* ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type

ggml-backend : add device id to device props

llama : only use iGPU devices if there are no GPU devices

llama : do not use multiple devices from different backends with the same device id
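
A minimal sketch of that selection policy (hypothetical types and field names, not the llama.cpp source): prefer discrete GPUs, fall back to iGPUs only when no GPU is present, and never take two devices that report the same device id from different backends.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Device:
    backend: str       # e.g. "CUDA", "Vulkan"
    dev_type: str      # "GPU" or "IGPU", mirroring GGML_BACKEND_DEVICE_TYPE_*
    device_id: str     # device id exposed in the device props

def select_devices(devices):
    gpus = [d for d in devices if d.dev_type == "GPU"]
    # iGPUs are considered only when no discrete GPU device exists
    pool = gpus if gpus else [d for d in devices if d.dev_type == "IGPU"]
    chosen, seen_ids = [], set()
    for d in pool:
        if d.device_id in seen_ids:
            continue   # same physical device reported by another backend
        seen_ids.add(d.device_id)
        chosen.append(d)
    return chosen
```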

b6423

json : support `enum` values within `allOf` (ggml-org#15830)
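
An illustrative schema of the newly supported shape (the field and enum values here are assumptions), where the `enum` constraint sits inside an `allOf` combinator:

```python
schema = {
    "type": "object",
    "properties": {
        "status": {
            "allOf": [
                {"type": "string"},
                {"enum": ["ok", "error"]},
            ]
        }
    },
}
```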