Conversation
fp8 quantization currently limited to tensors with shapes where both dimensions are divisible by 16.
Hello @Narsil,
when I was using fp8 quantization on an H100, I got an error saying the size is not divisible by 16. See `filter_out_small_unaligned_layers` (https://github.com/pytorch-labs/float8_experimental/blob/ac065d09a6259574a85027edc84eb647dc6c90c2/float8_experimental/float8_linear_utils.py#L82-L93): they check the layer sizes. The issue is that when a user request is not batched to a multiple of 16, this PR pads the batch with dummy requests.
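For readers following along, here is a minimal sketch of the padding idea being described; it is not the PR's actual code. `pad_to_multiple` is a hypothetical helper, and the plain matmul stands in for the real fp8 kernel:

```python
import torch

def pad_to_multiple(x: torch.Tensor, multiple: int = 16) -> tuple[torch.Tensor, int]:
    """Append zero rows so the batch dimension is divisible by `multiple`.

    Returns the padded tensor and the original number of rows, so the
    dummy rows can be sliced off after the matmul.
    """
    rows = x.shape[0]
    remainder = rows % multiple
    if remainder == 0:
        return x, rows
    padding = torch.zeros(multiple - remainder, *x.shape[1:],
                          dtype=x.dtype, device=x.device)
    return torch.cat([x, padding], dim=0), rows

# A batch of 13 requests does not satisfy the divisible-by-16 constraint.
x = torch.randn(13, 4096)
weight = torch.randn(1024, 4096)

padded, n = pad_to_multiple(x)   # padded.shape == (16, 4096)
out = padded @ weight.t()        # the real path would quantize to fp8 here
out = out[:n]                    # drop the dummy rows again
```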
Hi @dongs0104, this depends on your torch version: torch nightly (I think 2.2.2 as well) does not require the padding. Adding extra padding KILLS performance by a huge factor (the current implementation is still slower than fp16 for some reason, but at least comparable).
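To make the version dependence concrete, a hedged sketch of how a caller could gate the workaround on the installed torch version; the 2.2.2 threshold comes from the comment above and is an assumption, not a verified cutoff:

```python
import torch
from packaging.version import Version  # pip install packaging if needed

# Assumption: torch nightly (and, per the comment above, 2.2.2) no longer
# requires divisible-by-16 shapes, so only older versions need the padding.
NEEDS_PADDING = Version(torch.__version__.split("+")[0]) < Version("2.2.2")

def maybe_pad_rows(x: torch.Tensor, multiple: int = 16) -> torch.Tensor:
    """Pad the batch dimension only when the installed torch requires it."""
    remainder = x.shape[0] % multiple
    if not NEEDS_PADDING or remainder == 0:
        return x
    # (0, 0) pads the last dim by nothing; (0, pad) appends rows to dim -2.
    return torch.nn.functional.pad(x, (0, 0, 0, multiple - remainder))
```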
@Narsil I agree with you that padding hurts performance. I was also using TGI v2.0.1, which uses torch 2.1.1, so this will be solved once #1730 is merged, and this PR can be closed. :)
What does this PR do?
Fixes # (issue)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@Narsil