[Feature]: AutoDeploy: investigate fp8 gemms that do not have pad to 16 requirement #8814

@lucaslie

Description

🚀 The feature, motivation and pitch

Follow-up to #8811

Instead of padding inputs to the existing fp8 GEMM kernel we use, let's investigate alternative kernels (e.g., the trtllm fp8 kernel) that natively handle unpadded shapes whose dimensions are not a multiple of 16.
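For context, here is a minimal sketch of the padding workaround from #8811 that this issue would make unnecessary. `pad_to_multiple` is a hypothetical helper (not AutoDeploy's actual code), and a plain fp32 matmul stands in for the fp8 GEMM; the assumption is that the kernel's constraint is dimensions divisible by 16:

```python
import torch

def pad_to_multiple(x: torch.Tensor, dim: int, multiple: int = 16) -> torch.Tensor:
    """Zero-pad x along `dim` up to the next multiple of `multiple`.

    Hypothetical helper illustrating the workaround, not the real implementation.
    """
    pad = (-x.shape[dim]) % multiple
    if pad == 0:
        return x
    pad_shape = list(x.shape)
    pad_shape[dim] = pad
    return torch.cat([x, x.new_zeros(pad_shape)], dim=dim)

# Example: a weight with out_features=250 is padded to 256 before the GEMM,
# then the output is sliced back down to the original size.
w = torch.randn(250, 512)
w_padded = pad_to_multiple(w, dim=0)        # shape (256, 512)
x = torch.randn(4, 512)
y = (x @ w_padded.t())[:, : w.shape[0]]     # slice back to (4, 250)
```

A kernel that accepts non-mod-16 shapes directly would eliminate the extra allocation, copy, and output slicing on every affected GEMM.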

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Labels

  • AutoDeploy: <NV> AutoDeploy Backend
  • Customized kernels: <NV> Specialized/modified CUDA kernels in TRTLLM for LLM ops, beyond standard TRT. Dev & perf.

Projects

Status: In review

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests