
Gaudi: Fix llava-next and mllama crash issue #3127

Merged
regisss merged 3 commits into huggingface:main from yuanwu2017:fix on Mar 25, 2025
Conversation

@yuanwu2017 (Contributor)

What does this PR do?

In the text-generation-inference main branch, the warmup request only includes one image together with texts for the max prefill batch size. The backend needs to fill in the matching number of images, otherwise the model's AutoProcessor raises an error. In tgi-gaudi, I made a patch to fix this: huggingface@4021019
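
For illustration, here is a minimal sketch of the idea behind the fix, with hypothetical helper and variable names rather than the actual TGI/tgi-gaudi code: the single warmup image is repeated until the image count matches the number of image placeholders in the warmup prompts, so the processor sees consistent text/image counts.

# Minimal sketch of the idea behind the fix; function and variable names are
# illustrative, not the actual TGI/tgi-gaudi code.
from PIL import Image

def pad_warmup_images(images: list[Image.Image], num_image_placeholders: int) -> list[Image.Image]:
    """Repeat the single warmup image until the image count matches the number
    of image placeholders in the warmup prompts, so AutoProcessor does not
    raise a text/image count mismatch."""
    if not images:
        return images
    while len(images) < num_image_placeholders:
        images.append(images[0])
    return images

# Hypothetical usage during warmup:
# inputs = processor(text=warmup_prompts,
#                    images=pad_warmup_images(warmup_images, num_placeholders),
#                    return_tensors="pt", padding=True)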

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Signed-off-by: yuanwu <yuan.wu@intel.com>
@yuanwu2017 (Contributor, Author)

@regisss @baptistecolle Please help to review.

@yuanwu2017 changed the title from "Fix llava-next and mllama crash issue" to "Gaudi: Fix llava-next and mllama crash issue" on Mar 20, 2025
@yuanwu2017 (Contributor, Author) commented Mar 20, 2025

llava-next:
Please refer to the following doc.
https://github.com/huggingface/tgi-gaudi?tab=readme-ov-file#llava-v16-mistral-7b-on-1-card

# set these before running (example values; adjust to your environment)
port=8080                    # host port to expose TGI on
hf_token=<your_hf_token>     # Hugging Face access token
image=tgi-gaudi:latest       # tgi-gaudi Docker image, same as used for mllama below
volume=$PWD/data             # share a volume with the Docker container to avoid downloading weights every run
model=llava-hf/llava-v1.6-mistral-7b-hf
docker run -it -p $port:80 \
   --runtime=habana \
   -v $volume:/data \
   -v $PWD:/workspace \
   -v $PWD/quant_files/quantization_config:/usr/src/quantization_config \
   -v $PWD/quant_files/llava-v1.6-mistral-7b-hf-1x/hqt_output:/usr/src/hqt_output \
   -e http_proxy=${http_proxy}     -e https_proxy=${https_proxy} -e no_proxy=${no_proxy} \
   -e HUGGINGFACE_HUB_CACHE=/data/hub \
   -e HUGGING_FACE_HUB_TOKEN=$hf_token \
   -e HABANA_VISIBLE_DEVICES=6,7 \
   -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
   -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true \
   -e HF_HUB_ENABLE_HF_TRANSFER=1 \
   -e ENABLE_HPU_GRAPH=true \
   -e LIMIT_HPU_GRAPH=true \
   -e USE_FLASH_ATTENTION=true \
   -e FLASH_ATTENTION_RECOMPUTE=true \
   --cap-add=sys_nice \
   --ipc=host \
   $image --model-id $model \
     --max-input-tokens 4096 --max-batch-prefill-tokens 65536  \
     --max-total-tokens 8192 --max-batch-size 16
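
Once the container is up, a quick sanity check for the multimodal path is to send a request that embeds an image in the prompt via TGI's markdown image syntax. A sketch using the Python `requests` package follows; the port (8080) stands in for whatever $port was mapped to above, and the image URL is only an example.

# Quick sanity check against a running llava-next/mllama TGI server.
# Assumes the container was started with `-p $port:80`; 8080 below is just an
# example value for $port, and the image URL can be any reachable image.
import requests

response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": (
            "![](https://huggingface.co/datasets/huggingface/"
            "documentation-images/resolve/main/transformers/rabbit.png)"
            "What is shown in this picture?"
        ),
        "parameters": {"max_new_tokens": 32},
    },
    timeout=120,
)
print(response.json()["generated_text"])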

mllama:

# reuses $port, $volume, and $hf_token from the llava-next example above
model=meta-llama/Llama-3.2-11B-Vision-Instruct
docker run -it -p $port:80 \
   --runtime=habana \
   -v $volume:/data \
   -e HUGGINGFACE_HUB_CACHE=/data/hub \
   -e HABANA_VISIBLE_DEVICES=6,7 \
   -e HUGGING_FACE_HUB_TOKEN=$hf_token \
   -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
   -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true \
   -e HF_HUB_ENABLE_HF_TRANSFER=1 \
   -e ENABLE_HPU_GRAPH=true \
   -e LIMIT_HPU_GRAPH=true \
   -e USE_FLASH_ATTENTION=true \
   -e FLASH_ATTENTION_RECOMPUTE=true \
   --cap-add=sys_nice \
   --ipc=host \
   tgi-gaudi:latest --model-id $model \
      --max-input-tokens 512 --max-batch-prefill-tokens 32768 \
      --max-total-tokens 1024 --max-batch-size 64

@baptistecolle (Contributor) left a comment


Thanks for fixing the crash issues! 🙌 Aside from a few minor nits, everything looks good to me.

Also, TGI has some styling requirements that were disabled in the fork. I’m adding styling tests for all backend folders: PR #3128.

You can run them locally by executing:

pre-commit install  
pre-commit run --all-files  

Let me know if you have any questions! 😊

@yuanwu2017 (Contributor, Author)

> Thanks for fixing the crash issues! 🙌 Aside from a few minor nits, everything looks good to me.
>
> Also, TGI has some styling requirements that were disabled in the fork. I’m adding styling tests for all backend folders: PR #3128.
>
> You can run them locally by executing:
>
> pre-commit install
> pre-commit run --all-files
>
> Let me know if you have any questions! 😊

Done.

Signed-off-by: yuanwu <yuan.wu@intel.com>
Signed-off-by: yuanwu <yuan.wu@intel.com>
@baptistecolle (Contributor) left a comment


LGTM

@yuanwu2017 (Contributor, Author)

image @baptistecolle This may be a permission issue, and it has nothing to do with my modification.

@baptistecolle (Contributor)

> [screenshot of failing CI check] @baptistecolle This may be a permission issue; it is not related to my changes.

Yes, this is indeed a permission issue that is not linked to the PR. The authentication token cannot be fetched on an external fork; this is a security feature to prevent leakage.

@regisss (Collaborator) left a comment


LGTM

@regisss (Collaborator) commented Mar 25, 2025

cc @Narsil

@regisss regisss merged commit f5f14dc into huggingface:main Mar 25, 2025
0 of 32 checks passed