
fix: fp8 dimensions size #1739

Closed
dongs0104 wants to merge 1 commit into huggingface:main from dongs0104:patch-2

Conversation

@dongs0104
Contributor

@dongs0104 dongs0104 commented Apr 13, 2024

fp8 quantization is currently limited to tensors whose dimensions are both divisible by 16.

Hello @Narsil,
when I was using fp8 quantization on an H100, I got an error saying the size is not divisible by 16.

See filter_out_small_unaligned_layers: https://github.com/pytorch-labs/float8_experimental/blob/ac065d09a6259574a85027edc84eb647dc6c90c2/float8_experimental/float8_linear_utils.py#L82-L93

They check the layer sizes there; the idea here is that when user requests are not batched to a multiple of 16, this change pads them with dummy requests.
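For context, here is a minimal sketch of the kind of shape check linked above. This is not the float8_experimental code itself; the function name `is_fp8_eligible` and the `min_dim` threshold are illustrative, the only assumption being the divisibility-by-16 requirement described in this PR.

```python
from torch import nn

# Hedged sketch: skip fp8 for linear layers whose weight dimensions
# are not multiples of 16, in the spirit of
# filter_out_small_unaligned_layers. Names and threshold are illustrative.
def is_fp8_eligible(linear: nn.Linear, min_dim: int = 16) -> bool:
    out_features, in_features = linear.weight.shape  # nn.Linear weight is (out, in)
    return (
        in_features >= min_dim
        and out_features >= min_dim
        and in_features % 16 == 0
        and out_features % 16 == 0
    )

# Layers that fail the check would stay in their original dtype
# instead of erroring inside the fp8 GEMM.
print(is_fp8_eligible(nn.Linear(4096, 11008)))  # True: both dims divisible by 16
print(is_fp8_eligible(nn.Linear(4096, 250)))    # False: 250 % 16 != 0
```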

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


fp8 quantization currently limited to tensors with shapes where both dimensions are divisible by 16.
@OlivierDehaene OlivierDehaene requested a review from Narsil April 15, 2024 09:08
@dongs0104
Contributor Author

Hello @Narsil,
when I was using fp8 quantization on an H100, I got an error saying the size is not divisible by 16.

See filter_out_small_unaligned_layers: https://github.com/pytorch-labs/float8_experimental/blob/ac065d09a6259574a85027edc84eb647dc6c90c2/float8_experimental/float8_linear_utils.py#L82-L93

They check the layer sizes there; the idea here is that when user requests are not batched to a multiple of 16, this change pads them with dummy requests.

@Narsil
Contributor

Narsil commented Apr 30, 2024

Hi @dongs0104, this depends on your torch version; torch nightly (I think 2.2.2 as well) does not require the padding.

Adding extra padding KILLS performance by a huge factor (current implementation is still slower than fp16 for some reason but at least comparable).
Since we're updating torch, this issue should go away that way: #1730
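To make the performance point concrete, here is a minimal sketch of what the padding workaround implies. This is not the actual TGI kernel path; `pad_rows_to_multiple_of_16` and the shapes are illustrative, and a plain matmul stands in for the fp8 GEMM.

```python
import torch

# Hedged sketch of the padding workaround being discussed: pad the batch
# dimension up to the next multiple of 16 before the matmul, then drop the
# dummy rows. The extra allocation, copy, and wasted rows are the overhead
# referred to above.
def pad_rows_to_multiple_of_16(x: torch.Tensor) -> torch.Tensor:
    rows = x.shape[0]
    padded_rows = -(-rows // 16) * 16  # round up to a multiple of 16
    if padded_rows == rows:
        return x
    pad = x.new_zeros(padded_rows - rows, x.shape[1])
    return torch.cat([x, pad], dim=0)

x = torch.randn(3, 4096)                               # batch of 3 requests
w = torch.randn(4096, 4096)                            # stand-in for an fp8 weight
y = (pad_rows_to_multiple_of_16(x) @ w)[: x.shape[0]]  # 13 of the 16 rows are wasted work
```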

@dongs0104
Contributor Author

Adding extra padding KILLS performance by a huge factor (current implementation is still slower than fp16 for some reason but at least comparable). Since we're updating torch, this issue should go away that way: #1730

@Narsil I agree with you, the padding hurts performance.

Also, I was using TGI v2.0.1, which uses torch 2.1.1, so this will be solved once #1730 is merged. You can close this PR. :)

@dongs0104 dongs0104 closed this Apr 30, 2024