[Feat] Add csrc/ascend NPU custom ops for GSA #729

Merged
ygwpz merged 28 commits into ModelEngine-Group:develop from leideng:gsa_ops_v2
Feb 6, 2026

Conversation

leideng (Contributor) commented Feb 3, 2026

Purpose

Merge all new Ascend NPU custom ops in csrc/ascend into the develop branch. These ops enable GSA on NPU devices by providing:

  • npu_hamming_dist_top_k — Hamming-distance-based top-K for important KV selection (GQA and MLA variants).
  • npu_reshape_and_cache_bnsd — Reshape-and-cache for BNSD (batch × num_heads × seq × dim) layout on NPU.

The implementation follows the vLLM-Ascend build system and is integrated into an independent Python package, ucm_custom_ops. Usage is as follows:

import torch
import ucm_custom_ops  # importing this registers the ops under torch.ops._C_ucm

torch.ops._C_ucm.npu_reshape_and_cache_bnsd(...)
torch.ops._C_ucm.npu_hamming_dist_top_k(...)

Modifications

New files were added under ucm/sparse/gsa_on_device/csrc/ascend and test/sparse/gsa:

| Area | Description |
| --- | --- |
| Torch bindings | torch_binding.cpp, torch_binding_meta.cpp — register both ops for PrivateUse1 (NPU) with meta implementations for shape inference and graph capture (see the sketch below). |
| Hamming dist top-K | hamming_dist_top_k/ — full op_host (tiling, split, proto) and op_kernel implementation. |
| Reshape and cache BNSD | reshape_and_cache_bnsd/ — op_host and op_kernel for BNSD reshape-and-cache. |
| Tests | test/sparse/gsa/test_reshape_graph.py — test script for op reshape_and_cache_bnsd. |
| Tests | test/sparse/gsa/test_hamming_gqa.py — test script for op hamming_dist_top_k in GQA mode. |
| Tests | test/sparse/gsa/test_hamming_mla.py — test script for op hamming_dist_top_k in MLA mode. |
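
On the meta registration: a meta implementation only computes output shapes and dtypes, which is what lets graph capture and compilation trace the ops without touching an NPU. The actual registration here is done in C++ (torch_binding_meta.cpp); the snippet below is a hypothetical Python-level analogue for the top-K op, and the (batch, topN) int32 output shape is an illustrative assumption, not the op's documented contract.

import torch

# Hypothetical Python-level analogue of the C++ meta registration in
# torch_binding_meta.cpp; assumes the op schema is already defined in the
# _C_ucm namespace by the compiled extension.
lib = torch.library.Library("_C_ucm", "IMPL")

def npu_hamming_dist_top_k_meta(hashq, hashk_cache, hashk_cache_rope,
                                top_n, seq_len, *optional_args):
    # Shape inference only: the assumed output is an int32 index tensor of
    # shape (batch, topN); no Hamming distances are computed here.
    batch = hashq.shape[0]
    return hashq.new_empty((batch, top_n), dtype=torch.int32)

lib.impl("npu_hamming_dist_top_k", npu_hamming_dist_top_k_meta, "Meta")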

NPU op APIs (summary)

  • npu_hamming_dist_top_k
    (hashq, hashkCache, hashkCacheRope, topN, seqLen, chunk_size?, max_seq_len?, sink?, recent?, support_offload?, key_block_table?, mask?, indices?) -> Tensor

  • npu_reshape_and_cache_bnsd
    (hashq, hashkCache, slot_mapping, seq_len, hashk_cache_out) -> Tensor
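
A minimal call sketch for the top-K op follows; every shape, dtype, and value here is an illustrative assumption (the authoritative contracts live in the op_host proto definitions):

import torch
import ucm_custom_ops  # registers torch.ops._C_ucm.*

# Illustrative placeholders only: real shapes and dtypes depend on the GSA
# configuration and cache layout.
hashq = torch.randint(0, 256, (4, 8, 16), dtype=torch.uint8, device="npu")
hashk_cache = torch.randint(0, 256, (1024, 8, 16), dtype=torch.uint8, device="npu")
hashk_cache_rope = torch.randint(0, 256, (1024, 8, 16), dtype=torch.uint8, device="npu")
seq_len = torch.tensor([512, 256, 640, 128], dtype=torch.int32, device="npu")

# Positional arguments follow the summary signature above; the optional
# arguments (chunk_size, max_seq_len, ...) are omitted.
indices = torch.ops._C_ucm.npu_hamming_dist_top_k(
    hashq, hashk_cache, hashk_cache_rope, 32, seq_len)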

Test

  • Unit / integration tests (eager and graph):
    • test/gsa/test_reshape_graph.py: test_reshape_and_cache_bnsd, test_reshape_and_cache_bnsd_graph
    • test/gsa/test_hamming_gqa.py: test_hamming_dist_top_k_graph and the eager path
    • test/gsa/test_hamming_mla.py: test_hamming_dist_top_k_mla_eager, test_hamming_dist_top_k_mla_graph
  • Build: from the repo root, bash csrc/ascend/build_aclnn.sh builds the custom op library and install_python_package.sh installs the wheel. You must also run source csrc/ascend/_ucm_ops_custom/vendors/ucm/bin/set_env.bash so that import ucm_custom_ops and torch.ops._C_ucm.* work on NPU; a quick registration check is sketched below.
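
As the registration check referenced above, here is a minimal sketch (not part of the test suite) that fails fast if the build or environment setup is wrong:

import torch
import ucm_custom_ops  # importing this loads the library and registers the ops

# If build_aclnn.sh, the wheel install, or set_env.bash was skipped, these
# attribute lookups fail immediately instead of erroring later at call time.
assert hasattr(torch.ops._C_ucm, "npu_reshape_and_cache_bnsd")
assert hasattr(torch.ops._C_ucm, "npu_hamming_dist_top_k")
print("ucm_custom_ops registered OK")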

Screenshots and logs from testing both ops are attached:

[screenshots: successful runs of both ops]

test_reshape_graph_successful.log
test_hamming_mla_successful.log
test_hamming_gqa_successful.log

In addition, I have run offline inference with the new NPU ops, which was successful.

[screenshots: offline inference results]

gsaondevice_02051759.log

@ygwpz ygwpz merged commit 183a263 into ModelEngine-Group:develop Feb 6, 2026
11 of 12 checks passed