Refine warmup and upgrade to synapse AI 1.21.0 #3234
regisss merged 3 commits into huggingface:main
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
@regisss please help review
docker run -p 8080:80 -v /scratch-2/huggingface/:/data --runtime=habana --rm -e ATTENTION=paged -e no_proxy=localhost --cap-add=sys_nice --ipc=host --name=tgi-bf16 tgi-gaudi:latest --model-id deepseek-ai/DeepSeek-R1 --sharded true --num-shard 8 --max-input-length 512 --max-total-tokens 1024 --kv-cache-dtype fp8_e4m3fn --max-batch-prefill-tokens 4096 --max-waiting-tokens 7 --waiting-served-ratio 1.2 --max-concurrent-requests 512 --max-batch-size 64

curl 127.0.0.1:8080/generate -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":100}}' -H 'Content-Type: application/json'
{"generated_text":" Deep Learning is a subset of machine learning that involves neural networks with three or more layers. These neural networks attempt to simulate the behavior of the human brain—though they are far from matching its ability—and allow it to “learn” from large amounts of data. While a neural network with a single layer can still make approximate predictions, additional hidden layers can help optimize the accuracy.\n\nDeep Learning drives many artificial intelligence (AI) applications and services that improve automation, performing tasks without human intervention. Examples include"}
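For anyone who wants to repeat the smoke test from a script, here is a minimal Python sketch of the same request as the curl command above. It assumes the container is reachable at 127.0.0.1:8080 as in the docker command; the helper names `build_generate_request` and `generate` are hypothetical and not part of TGI. It relies only on the `inputs` and `parameters` request fields and the `generated_text` response field shown above.

```python
import json
import urllib.request


def build_generate_request(prompt, max_new_tokens=100,
                           base_url="http://127.0.0.1:8080"):
    """Build the POST /generate request that mirrors the curl command.

    base_url is an assumption taken from the port mapping (-p 8080:80).
    """
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_new_tokens=100,
             base_url="http://127.0.0.1:8080", timeout=60):
    """Send the request and return the "generated_text" field.

    Requires a running server; raises urllib.error.URLError otherwise.
    """
    req = build_generate_request(prompt, max_new_tokens, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["generated_text"]


# Example (only works while the container is up):
# print(generate("What is Deep Learning?"))
```

Building the request separately from sending it keeps the payload easy to inspect without a running server.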

Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.