Skip to content

Conversation

@BramVanroy
Copy link
Collaborator

@BramVanroy BramVanroy commented Sep 29, 2019

  • This commit replaces references to PyTorch activation functions/modules by a dict of functions that lives in modeling_utils. This ensures that all activation functions are available to all modules, praticularly custom functions such as swish and new_gelu.
  • In addition, when available (PT1.2) the native PyTorch gelu function will be used - it supports a CPP/CUDA implementation.

NOTE that this replaces all nn.Module's by bare functions except for one which was required for testing to be of the type nn.Module. If requested, this can be reverted so that only function calls are replaced by ACT2FN functions, and that existing nn.Modules are untouched.

NOTE that one would thus also expect that all usages of activation functions are taken from ACT2FN for consistency's sake.

NOTE since the Module counter-part of PyTorch's GeLU isn't available (yet), it might be worth waiting to implement this pull, and then use Modules and functions in the right places where one would expect, i.e. Module when part of architecture, function when processing other kinds of data.

Bram Vanroy added 2 commits September 28, 2019 12:52
* This commit replaces references to activation functions/modules by a dict of functions that lives in `modeling_utils`. This ensures that all activation functions are available to all modules.
* In addition, when available the native PyTorch gelu function will be used.
@codecov-io
Copy link

codecov-io commented Sep 29, 2019

Codecov Report

Merging #1371 into master will increase coverage by 0.97%.
The diff coverage is 85.71%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1371      +/-   ##
==========================================
+ Coverage   83.76%   84.74%   +0.97%     
==========================================
  Files          84       84              
  Lines       12596    12559      -37     
==========================================
+ Hits        10551    10643      +92     
+ Misses       2045     1916     -129
Impacted Files Coverage Δ
transformers/modeling_openai.py 80.41% <100%> (ø) ⬆️
transformers/modeling_xlnet.py 72.02% <100%> (+0.77%) ⬆️
transformers/modeling_gpt2.py 83.88% <100%> (-0.11%) ⬇️
transformers/modeling_roberta.py 71.01% <100%> (+5.54%) ⬆️
transformers/modeling_distilbert.py 95.8% <100%> (-0.03%) ⬇️
transformers/modeling_transfo_xl.py 75.16% <100%> (ø) ⬆️
transformers/modeling_bert.py 88.41% <100%> (+0.23%) ⬆️
transformers/modeling_xlm.py 88.36% <100%> (-0.07%) ⬇️
transformers/modeling_transfo_xl_utilities.py 54.16% <37.5%> (+0.27%) ⬆️
transformers/modeling_utils.py 92.57% <90%> (-0.12%) ⬇️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ae50ad9...716d783. Read the comment docs.

@stale
Copy link

stale bot commented Nov 29, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Nov 29, 2019
@BramVanroy
Copy link
Collaborator Author

Unstale.

@stale stale bot removed the wontfix label Nov 29, 2019
@julien-c
Copy link
Member

julien-c commented Dec 5, 2019

Would feel syntactically cleaner if we could do ACT2FN.gelu() instead of a dict (also gives some IDE goodness like autocomplete) (I guess through a class or namespace or something), what do you guys think?

@BramVanroy
Copy link
Collaborator Author

Would feel syntactically cleaner if we could do ACT2FN.gelu() instead of a dict (also gives some IDE goodness like autocomplete) (I guess through a class or namespace or something), what do you guys think?

Sounds good but note that this is not something I introduced. The ACT2FN dict already existed, but wasn't used consistently it seemed.

@julien-c
Copy link
Member

julien-c commented Dec 6, 2019

Ah yeah, I see. Would you want to do this change, if you have the time/bandwidth? (+ rebasing on current master so we can merge easily?)

@stale
Copy link

stale bot commented Mar 4, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Mar 4, 2020
@julien-c
Copy link
Member

julien-c commented Mar 4, 2020

AFAICT, this has been done by @sshleifer on master. Re-open if necessary!

@julien-c julien-c closed this Mar 4, 2020
kashif pushed a commit to kashif/transformers that referenced this pull request Nov 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants