
Gaudi: Fix llava-next and mllama crash issue #3127

Merged
regisss merged 3 commits into huggingface:main from yuanwu2017:fix on Mar 25, 2025
Conversation

@yuanwu2017 (Contributor)

What does this PR do?

In the text-generation-inference main branch, the warmup request only includes one image together with texts for the max prefill batch size. The backend needs to fill in the matching number of images, otherwise the model's AutoProcessor raises an error. In tgi-gaudi, I made a patch to fix this: huggingface@4021019
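
For illustration, here is a minimal sketch of the idea behind the fix, with hypothetical helper and variable names rather than the actual TGI/tgi-gaudi code: the single warmup image is repeated until the image count matches the number of image placeholders in the warmup prompts, so the processor sees consistent text/image counts.

# Minimal sketch of the idea behind the fix; function and variable names are
# illustrative, not the actual TGI/tgi-gaudi code.
from PIL import Image

def pad_warmup_images(images: list[Image.Image], num_image_placeholders: int) -> list[Image.Image]:
    """Repeat the single warmup image until the image count matches the number
    of image placeholders in the warmup prompts, so AutoProcessor does not
    raise a text/image count mismatch."""
    if not images:
        return images
    while len(images) < num_image_placeholders:
        images.append(images[0])
    return images

# Hypothetical usage during warmup:
# inputs = processor(text=warmup_prompts,
#                    images=pad_warmup_images(warmup_images, num_placeholders),
#                    return_tensors="pt", padding=True)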

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Signed-off-by: yuanwu <yuan.wu@intel.com>
@yuanwu2017 (Contributor, Author)

@regisss @baptistecolle Please help to review.

@yuanwu2017 changed the title from "Fix llava-next and mllama crash issue" to "Gaudi: Fix llava-next and mllama crash issue" on Mar 20, 2025
@yuanwu2017 (Contributor, Author) commented Mar 20, 2025

llava-next:
Please refer to the following doc.
https://github.com/huggingface/tgi-gaudi?tab=readme-ov-file#llava-v16-mistral-7b-on-1-card

# set these before running (example values; adjust to your environment)
port=8080                    # host port to expose TGI on
hf_token=<your_hf_token>     # Hugging Face access token
image=tgi-gaudi:latest       # tgi-gaudi Docker image, same as used for mllama below
volume=$PWD/data             # share a volume with the Docker container to avoid downloading weights every run
model=llava-hf/llava-v1.6-mistral-7b-hf
docker run -it -p $port:80 \
   --runtime=habana \
   -v $volume:/data \
   -v $PWD:/workspace \
   -v $PWD/quant_files/quantization_config:/usr/src/quantization_config \
   -v $PWD/quant_files/llava-v1.6-mistral-7b-hf-1x/hqt_output:/usr/src/hqt_output \
   -e http_proxy=${http_proxy}     -e https_proxy=${https_proxy} -e no_proxy=${no_proxy} \
   -e HUGGINGFACE_HUB_CACHE=/data/hub \
   -e HUGGING_FACE_HUB_TOKEN=$hf_token \
   -e HABANA_VISIBLE_DEVICES=6,7 \
   -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
   -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true \
   -e HF_HUB_ENABLE_HF_TRANSFER=1 \
   -e ENABLE_HPU_GRAPH=true \
   -e LIMIT_HPU_GRAPH=true \
   -e USE_FLASH_ATTENTION=true \
   -e FLASH_ATTENTION_RECOMPUTE=true \
   --cap-add=sys_nice \
   --ipc=host \
   $image --model-id $model \
     --max-input-tokens 4096 --max-batch-prefill-tokens 65536  \
     --max-total-tokens 8192 --max-batch-size 16
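
Once the container is up, a quick sanity check for the multimodal path is to send a request that embeds an image in the prompt via TGI's markdown image syntax. A sketch using the Python `requests` package follows; the port (8080) stands in for whatever $port was mapped to above, and the image URL is only an example.

# Quick sanity check against a running llava-next/mllama TGI server.
# Assumes the container was started with `-p $port:80`; 8080 below is just an
# example value for $port, and the image URL can be any reachable image.
import requests

response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": (
            "![](https://huggingface.co/datasets/huggingface/"
            "documentation-images/resolve/main/transformers/rabbit.png)"
            "What is shown in this picture?"
        ),
        "parameters": {"max_new_tokens": 32},
    },
    timeout=120,
)
print(response.json()["generated_text"])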

mllama:

# reuses $port, $volume, and $hf_token from the llava-next example above
model=meta-llama/Llama-3.2-11B-Vision-Instruct
docker run -it -p $port:80 \
   --runtime=habana \
   -v $volume:/data \
   -e HUGGINGFACE_HUB_CACHE=/data/hub \
   -e HABANA_VISIBLE_DEVICES=6,7 \
   -e HUGGING_FACE_HUB_TOKEN=$hf_token \
   -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
   -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true \
   -e HF_HUB_ENABLE_HF_TRANSFER=1 \
   -e ENABLE_HPU_GRAPH=true \
   -e LIMIT_HPU_GRAPH=true \
   -e USE_FLASH_ATTENTION=true \
   -e FLASH_ATTENTION_RECOMPUTE=true \
   --cap-add=sys_nice \
   --ipc=host \
   tgi-gaudi:latest --model-id $model \
      --max-input-tokens 512 --max-batch-prefill-tokens 32768 \
      --max-total-tokens 1024 --max-batch-size 64

@baptistecolle (Contributor) left a comment


Thanks for fixing the crash issues! 🙌 Aside from a few minor nits, everything looks good to me.

Also, TGI has some styling requirements that were disabled in the fork. I’m adding styling tests for all backend folders: PR #3128.

You can run them locally by executing:

pre-commit install  
pre-commit run --all-files  

Let me know if you have any questions! 😊

@yuanwu2017 (Contributor, Author)

> Thanks for fixing the crash issues! 🙌 Aside from a few minor nits, everything looks good to me.
>
> Also, TGI has some styling requirements that were disabled in the fork. I’m adding styling tests for all backend folders: PR #3128.
>
> You can run them locally by executing:
>
> pre-commit install
> pre-commit run --all-files
>
> Let me know if you have any questions! 😊

Done.

Signed-off-by: yuanwu <yuan.wu@intel.com>
Signed-off-by: yuanwu <yuan.wu@intel.com>
@baptistecolle (Contributor) left a comment


LGTM

@yuanwu2017 (Contributor, Author)

image @baptistecolle This may be a permission issue, and it has nothing to do with my modification.

@baptistecolle (Contributor)

> [screenshot of failing CI check] @baptistecolle This may be a permission issue; it is not related to my changes.

Yes, this is indeed a permission issue that is not linked to the PR. The authentication token cannot be fetched on an external fork; this is a security feature to prevent leakage.

@regisss (Collaborator) left a comment


LGTM

@regisss (Collaborator) commented Mar 25, 2025

cc @Narsil

@regisss regisss merged commit f5f14dc into huggingface:main Mar 25, 2025
0 of 32 checks passed