
Adding FasterTransformer#103

Merged
moconnor725 merged 1 commit into NVIDIA:master from lxp121:master on Jul 13, 2019

Conversation

@lxp121 (Collaborator) commented Jul 13, 2019

FasterTransformer is a fast transformer-layer inference implementation for BERT and other transformer-based models.

FasterTransformer implements an equivalent but highly optimized BERT transformer layer for inference. On Volta and Turing GPUs, FP16 precision is used automatically to exploit the computing power of Tensor Cores.

FasterTransformer is built on top of CUDA and cuBLAS. It supports three sequence lengths: 32, 64, and 128. Two key parameters of the transformer layer, the number of heads and the size of each head, are passed at runtime. Thus, not only BERT Base (12 heads * 64 per head) but also customized models such as 4 heads * 32 per head and 8 heads * 96 per head are well supported. Our implementation shows good speedups for both small and large batch sizes; a small sketch of how these parameters shape the layer follows below.
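For illustration, here is a minimal sketch (plain Python, all names ours) of how the two runtime parameters determine the layer's hidden dimension and I/O shape for the configurations mentioned above:

```python
# Sketch only: shows how num_heads and size_per_head (the two runtime
# parameters) determine the transformer layer's hidden dimension.
configs = [
    ("BERT Base", 12, 64),  # 12 heads * 64 per head
    ("custom",     4, 32),  # 4 heads * 32 per head
    ("custom",     8, 96),  # 8 heads * 96 per head
]

batch_size, seq_len = 8, 128  # seq_len must be one of the supported 32, 64, 128

for name, num_heads, size_per_head in configs:
    hidden_dim = num_heads * size_per_head
    # The layer's input and output tensors have shape [batch_size, seq_len, hidden_dim].
    print(f"{name}: {num_heads} heads * {size_per_head} per head -> "
          f"hidden_dim {hidden_dim}, I/O shape [{batch_size}, {seq_len}, {hidden_dim}]")
```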

A C++ API, a TensorRT plugin, and a TensorFlow op wrapper are available. You can easily integrate this optimized transformer layer into TensorFlow models or other inference services built with native C++ or TensorRT. In addition to code that illustrates the API invocations, we also provide a simple end-to-end BERT TensorFlow inference sample.
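As a rough sketch of how the TensorFlow op wrapper would typically be integrated (the library filename and the op name/arguments below are assumptions, not the actual API; see the FasterTransformer sources for the real signature):

```python
import tensorflow as tf

# Assumed path to the compiled custom-op library; the real filename comes
# from the FasterTransformer build.
ft_module = tf.load_op_library('./lib/libtf_fastertransformer.so')

# The optimized layer would then replace the stock transformer layer inside
# the model, along the lines of (op and argument names are illustrative only):
#
#   layer_out = ft_module.bert_transformer(
#       layer_in, attention_mask, ...weights...,
#       head_num=12, size_per_head=64)
```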

@moconnor725 (Contributor) left a comment


Approved

moconnor725 merged commit 2cfd880 into NVIDIA:master on Jul 13, 2019
PeganovAnton pushed a commit to PeganovAnton/DeepLearningExamples that referenced this pull request Sep 8, 2020
