Bump neuron SDK version by dacorvo · Pull Request #3260 · huggingface/text-generation-inference

dacorvo · 2025-06-10T08:39:00Z

What does this PR do?

This pull request updates the neuron backend to use the latest optimum-neuron release, which is based on AWS Neuron SDK 2.22.

Note that the modeling code has been heavily modified in optimum-neuron:

- mistral and gpt2 are no longer supported;
- llama now uses a different modeling implementation.


Narsil left a comment:

LGTM!

dacorvo added 13 commits June 10, 2025 06:24

chore(neuron): bump version to 0.2.0

b094f02

refactor(neuron): use named parameters in inputs helpers

2eb2236

This hides the differences in input parameters between the two backends.
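As an illustration of why named parameters help here, the following minimal sketch assumes two backends that take the same inputs in a different positional order, one of them also accepting an extra optional argument. The functions `backend_a_forward`, `backend_b_forward` and `prepare_inputs` are hypothetical stand-ins, not optimum-neuron or TGI APIs. Because the helper returns a dict keyed by parameter name, the caller can pass it with `**` to either backend without knowing its signature order.

```python
from typing import Dict, List

# Hypothetical backends (NOT real optimum-neuron APIs): same inputs,
# different positional order, and backend B accepts an extra optional argument.
def backend_a_forward(input_ids, position_ids, seq_ids):
    return ("a", input_ids, position_ids, seq_ids)


def backend_b_forward(input_ids, seq_ids, position_ids, sampling_params=None):
    return ("b", input_ids, position_ids, seq_ids, sampling_params)


def prepare_inputs(input_ids: List[int], seq_id: int) -> Dict[str, object]:
    """Hypothetical inputs helper that returns named parameters."""
    position_ids = list(range(len(input_ids)))
    return {"input_ids": input_ids, "position_ids": position_ids, "seq_ids": [seq_id]}


inputs = prepare_inputs([101, 7592, 2088], seq_id=0)
# Passing the inputs by name means the caller does not depend on
# either backend's positional parameter order.
out_a = backend_a_forward(**inputs)
out_b = backend_b_forward(**inputs)
```

Both calls receive identical `input_ids`, `position_ids` and `seq_ids`; a backend that needs an additional argument can still get it by adding one more key to the dict.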

refactor(neuron): remove obsolete code paths

0b640f7

fix(neuron): use neuron_config whenever possible

83eadbb

fix(neuron): use new cache import path

c4dd2a8

fix(neuron): neuron config is not stored in config anymore

3989501

fix(nxd): adapt model retrieval to new APIs

b916076

fix(generator): emulate greedy in sampling parameters

4e8ffec

When on-device sampling is enabled, we need to emulate the greedy behaviour using top-k=1, top-p=1, temperature=1.
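The reason these settings reproduce greedy decoding: with top-k=1 only the highest-logit token survives the top-k filter, so it receives probability 1, while top-p=1 and temperature=1 leave the distribution unchanged. The sketch below is a minimal pure-Python top-k / top-p sampler written for illustration only; `sample_token` is a hypothetical helper, not the on-device sampler shipped with the Neuron SDK or optimum-neuron.

```python
import math
import random
from typing import List


def sample_token(logits: List[float], top_k: int, top_p: float,
                 temperature: float, rng: random.Random) -> int:
    """Illustrative top-k / top-p (nucleus) sampler over a list of logits."""
    # Temperature scaling: temperature=1 leaves the logits unchanged.
    scaled = [l / temperature for l in logits]
    # Keep the indices of the top_k largest logits (highest first).
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    kept = order[:top_k]
    # Softmax over the kept candidates (subtract the max for stability).
    m = max(scaled[i] for i in kept)
    weights = [math.exp(scaled[i] - m) for i in kept]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Nucleus filter: keep the smallest prefix whose cumulative probability
    # reaches top_p; with top_p=1 every kept candidate survives.
    cumulative, cutoff = 0.0, len(kept)
    for idx, p in enumerate(probs):
        cumulative += p
        if cumulative >= top_p:
            cutoff = idx + 1
            break
    kept, probs = kept[:cutoff], probs[:cutoff]
    return rng.choices(kept, weights=probs, k=1)[0]


rng = random.Random(0)
logits = [0.5, 2.0, -1.0, 1.9]
greedy = max(range(len(logits)), key=lambda i: logits[i])
# With top-k=1, top-p=1, temperature=1 every draw is the argmax token.
picks = {sample_token(logits, top_k=1, top_p=1.0, temperature=1.0, rng=rng)
         for _ in range(100)}
```

With a wider top-k the same sampler returns a mix of tokens, which is why the greedy case has to be expressed explicitly through these parameters when sampling happens on device.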

test(neuron): update models and expectations

bf529ef

feat(neuron): support on-device sampling

3e977bd

fix(neuron): adapt entrypoint

5d2b159

tests(neuron): remove obsolete models

2c8b0e3

fix(neuron): adjust test expectations for llama on nxd

d5bad17

dacorvo requested review from Narsil, danieldk, drbh and tengomucho June 10, 2025 09:37

tengomucho approved these changes Jun 10, 2025


Narsil approved these changes Jun 10, 2025


dacorvo merged commit 79183d1 into main Jun 10, 2025
31 of 33 checks passed

dacorvo deleted the neuron_use_nxd_backend branch June 10, 2025 15:56
