
Fix FinchPress for Qwen models family #82

Merged
SimJeg merged 7 commits into main from fix-finch-sep on Jun 20, 2025
Conversation

@alessiodevoto
Contributor

This addresses #80 by:

  1. adding a special case for Qwen (the eos_token strategy is not suitable, so we use another special token)
  2. testing that all tokenizers work with FinchPress (i.e. whether the tokenizer has either a bos_token or a suitable special token)

I'm not sure about the tokenizer test: if a user wants to run make test, they need access to (and must download) all the possible tokenizers. I would either (a) make this test optional (if possible) or (b) remove it and just raise an error in FinchPress, as sketched below.
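
For illustration, a minimal sketch of option (b); the helper name and fallback logic here are assumptions, not the actual FinchPress code:

# Hypothetical validation helper for FinchPress (illustrative only).
def _resolve_delimiter_token(tokenizer) -> str:
    # Prefer the bos_token strategy when the tokenizer provides one.
    if tokenizer.bos_token is not None:
        return tokenizer.bos_token
    # Fall back to another registered special token (e.g. for Qwen,
    # whose tokenizer has no bos_token and whose eos_token is unsuitable).
    candidates = [t for t in tokenizer.all_special_tokens if t != tokenizer.eos_token]
    if candidates:
        return candidates[0]
    raise ValueError(
        "FinchPress requires a tokenizer with a bos_token or another "
        f"suitable special token; {tokenizer.__class__.__name__} has neither."
    )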

@maxjeblick maxjeblick self-assigned this Jun 18, 2025
@maxjeblick maxjeblick self-requested a review June 18, 2025 11:03
@maxjeblick maxjeblick removed their assignment Jun 18, 2025
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
@alessiodevoto alessiodevoto force-pushed the fix-finch-sep branch 4 times, most recently from 283c2ad to 79cc835 on June 18, 2025 at 15:33
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
@alessiodevoto
Contributor Author

Now in FinchPress we add a special token and use it to (a) delimit the context and (b) check whether we are in the prefilling stage. This way we use the same approach for all models.
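
For illustration, a rough sketch of the idea (the model id, token string, and surrounding code are assumptions, not the actual FinchPress implementation):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
tokenizer.add_special_tokens({"additional_special_tokens": ["<finch_sep>"]})
sep_id = tokenizer.convert_tokens_to_ids("<finch_sep>")

context, question = "Long document ...", "What is ...?"
prompt = context + "<finch_sep>" + question  # (a) the token delimits the context
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# (b) the delimiter only appears in the prompt, so its presence in the
# current input tells us whether we are in the prefilling stage
is_prefilling = bool((input_ids == sep_id).any())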

Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
@SimJeg SimJeg linked an issue Jun 19, 2025 that may be closed by this pull request
@SimJeg
Collaborator

SimJeg commented Jun 19, 2025

@alessiodevoto thanks for your work 🙏 Before we merge:

  1. @alessiodevoto could you confirm you can run evaluation with FinchPress + Qwen3?
  2. @giulio98 could you review the code and confirm it's OK for you?

@alessiodevoto
Contributor Author

@SimJeg yes, tested on Qwen2.5 and Qwen3 👍

Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
@giulio98
Contributor

Hello @alessiodevoto @SimJeg, I did a first pass on the code. Please, here:

model.resize_token_embeddings(len(tokenizer))

do not resize the model embeddings, as it can produce a silent bug if the delimiter token is not correctly removed from the output embeddings.

@SimJeg
Collaborator

SimJeg commented Jun 20, 2025

> I did first pass on the code please here

please make all your comments

> do not resize model embeddings as it can produce a silent bug if the delimiter token is not correctly removed from the output embeddings.

can you elaborate: which bug? Why would the delimiter not be correctly removed?

@alessiodevoto
Contributor Author

Hi @giulio98 thanks for the review! I'm not entirely sure I get your concern, but I believe the "silent bug" you're referring to shouldn't occur here since we: 1) explicitly check for exactly one delimiter token, and 2) remove it directly from the embeddings. Please let me know if I'm missing something!
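
For illustration, a hypothetical sketch of those two safeguards (names and shapes are illustrative, not the actual FinchPress code):

import torch

def drop_delimiter(input_ids, hidden_states, delimiter_token_id):
    # (1) Require exactly one delimiter per sequence, so a mismatch between
    #     tokenization and compression cannot go unnoticed.
    mask = input_ids == delimiter_token_id
    assert (mask.sum(dim=-1) == 1).all(), "expected exactly one delimiter token"
    # (2) Drop the delimiter position from the hidden states before it can
    #     influence attention or the LM head.
    batch_size, seq_len, _ = hidden_states.shape
    return hidden_states[~mask].view(batch_size, seq_len - 1, -1)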

Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
@giulio98
Contributor

Hi @alessiodevoto @SimJeg,

I tested the current version of this PR on LongBench NarrativeQA for Qwen2.5-7B-Instruct using the following YARN scaling config:

model_kwargs.update({
    "max_position_embeddings": 131072,
    "rope_scaling": {
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
        "type": "yarn"
    }
})
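
(For context, a minimal sketch of how such kwargs might be applied when loading the model; the model id and surrounding code are assumptions:)

from transformers import AutoModelForCausalLM

model_kwargs = {"torch_dtype": "auto"}
model_kwargs.update({
    "max_position_embeddings": 131072,
    "rope_scaling": {
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
        "type": "yarn",
    },
})
# Config attributes passed as kwargs override the checkpoint config.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct", **model_kwargs)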

I got a very low score (6.84) and first thought it was due to this warning when using resize_token_embeddings:

The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance...

But the implementation is actually correct, because the delimiter token embedding is correctly removed from the output. After further inspection, I realized the problem was caused by the rerotation logic, which didn't account for rope_scaling. I tried to fix it with the following code, which should now be agnostic to both LLaMA 3 and YARN scaling factors:

@staticmethod
def _rerotate_cos_sin(x, inv_freq, important_pos_batch):
    # important_pos_batch: (B, H, L) original positions of the kept keys.
    B, H, L = important_pos_batch.shape
    device = important_pos_batch.device
    device_type = x.device.type
    dtype = x.dtype
    idx = torch.arange(0, L, device=device).unsqueeze(0)
    # Use the module's own inv_freq so rope_scaling (YARN, LLaMA 3) is respected.
    inv_freq = inv_freq[None, None, :, None].float().expand(B, H, -1, 1)
    idx = idx[:, None, :].float().expand(B, H, L)
    # Rotation angle needed to move each key from its old position to its new one.
    delta_pos = (idx - important_pos_batch).unsqueeze(2)

    device_type = device_type if isinstance(device_type, str) and device_type != "mps" else "cpu"

    with torch.autocast(device_type=device_type, enabled=False):
        freqs = delta_pos * inv_freq
        freqs = freqs.transpose(2, 3)
        emb = torch.cat((freqs, freqs), dim=-1)
        cos = emb.cos().contiguous()
        sin = emb.sin().contiguous()
    return cos.to(dtype=dtype), sin.to(dtype=dtype)

I then replaced this block (starting at indices = indices.unsqueeze(-1).expand(-1, -1, -1, module.head_dim)):

indices = indices.unsqueeze(-1).expand(-1, -1, -1, module.head_dim)
# Rerotate keys
if self.rerotate_keys:
    cos, sin = kwargs["position_embeddings"]
    keys = (keys * cos.unsqueeze(1)) + (rotate_half(keys) * (-sin.unsqueeze(1)))
    keys = keys.gather(2, indices).contiguous()
    cos, sin = cos[:, : indices.shape[2]], sin[:, : indices.shape[2]]
    keys = (keys * cos.unsqueeze(1)) + (rotate_half(keys) * sin.unsqueeze(1))
else:
    keys = keys.gather(2, indices).contiguous()

with this version:

# Rerotate keys
if self.rerotate_keys:
    new_cos, new_sin = self._rerotate_cos_sin(keys, module.rotary_emb.inv_freq, indices)
    indices = indices.unsqueeze(-1).expand(-1, -1, -1, module.head_dim)
    keys = keys.gather(2, indices).contiguous()
    keys = (keys * new_cos) + (rotate_half(keys) * new_sin)
else:
    indices = indices.unsqueeze(-1).expand(-1, -1, -1, module.head_dim)
    keys = keys.gather(2, indices).contiguous()
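
As a sanity check on the underlying math, here is a standalone sketch (a toy reimplementation for illustration, not the kvpress code) verifying that rerotating by the position delta is equivalent to rotating at the target positions directly, since RoPE rotations compose additively:

import torch

def rotate_half(t):
    # Standard RoPE helper: swap and negate the two halves of the hidden dim.
    x1, x2 = t.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def rope(t, pos, inv_freq):
    # Apply RoPE to t of shape (B, H, L, D) at positions pos of shape (B, H, L).
    freqs = pos[..., None].float() * inv_freq  # (B, H, L, D/2)
    emb = torch.cat((freqs, freqs), dim=-1)    # (B, H, L, D)
    return t * emb.cos() + rotate_half(t) * emb.sin()

B, H, L, D = 1, 2, 8, 16
inv_freq = 1.0 / (10000.0 ** (torch.arange(0, D, 2).float() / D))  # (D/2,)
k = torch.randn(B, H, L, D)
old_pos = torch.randint(0, 100, (B, H, L)).float()  # positions the keys were rotated at
new_pos = torch.arange(L).float().expand(B, H, L)   # contiguous target positions

k_old = rope(k, old_pos, inv_freq)
# Rotating by (new - old) moves a key rotated at old_pos to the rotation it
# would have had at new_pos, because R(a) followed by R(b) equals R(a + b).
k_rerot = rope(k_old, new_pos - old_pos, inv_freq)
torch.testing.assert_close(k_rerot, rope(k, new_pos, inv_freq), rtol=1e-4, atol=1e-4)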

Results

| Model | Before Fix | After Fix | Δ |
| --- | --- | --- | --- |
| Llama3.1-8B-Instruct | 30.47 | 30.59 | +0.12 |
| Qwen2.5-7B-Instruct (no YARN) | 28.96 | 28.84 | −0.12 |
| Qwen2.5-7B-Instruct (YARN) | 6.84 | 27.78 | +20.94 |

Let me know if we should open a separate PR for this fix, or if you'd prefer integrating it here.
P.S.: I also observed degradation in expected_attention when using Qwen with YARN.

@SimJeg
Collaborator

SimJeg commented Jun 20, 2025

@giulio98, interesting finding, thanks! Please open a new issue and/or a PR to fix FinchPress, ExpectedAttentionPress and KeyRerotationPress. I will merge this one.

@SimJeg SimJeg merged commit 97408ee into main Jun 20, 2025
3 checks passed
@SimJeg SimJeg deleted the fix-finch-sep branch June 20, 2025 15:50
maxjeblick pushed a commit that referenced this pull request Aug 12, 2025
Signed-off-by: Max Jeblick <maximilianjeblick@gmail.com>


Development

Successfully merging this pull request may close these issues: FinchPress not working on Qwen Model Family.
