🚀 The feature, motivation and pitch
Follow-up to #8811
Instead of padding the inputs to the fp8 GEMM kernel we currently use, let's investigate alternative kernels (e.g. the trtllm fp8 kernel) that natively handle unpadded shapes whose dimensions are not multiples of 16.
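For context, here is a rough sketch of what the padding approach costs. It assumes every dimension of an operand has to be rounded up to a multiple of 16; which dimensions the current kernel actually constrains may differ. `round_up`, `padded_shape`, and `padding_overhead` are hypothetical helpers for illustration only, not vLLM functions.

```python
def round_up(n: int, multiple: int = 16) -> int:
    """Smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple


def padded_shape(shape: tuple[int, ...], multiple: int = 16) -> tuple[int, ...]:
    """Shape after rounding every dimension up (assumption: all dims are constrained)."""
    return tuple(round_up(d, multiple) for d in shape)


def padding_overhead(shape: tuple[int, ...], multiple: int = 16) -> float:
    """Fraction of extra elements introduced by padding, relative to the original size."""
    original = 1
    padded = 1
    for d in shape:
        original *= d
        padded *= round_up(d, multiple)
    return padded / original - 1.0
```

With these helpers, `padded_shape((17, 4096))` gives `(32, 4096)`, so a 17-row operand nearly doubles in size. A kernel that accepts the unpadded shape directly would avoid both that extra work and the copy needed to build the padded tensor.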
Alternatives
No response
Additional context
No response