
Conversation

@heitorschueroff (Contributor) commented Jul 6, 2020

Implemented the SiLU activation function as discussed in #3169.
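
For context, SiLU computes x * sigmoid(x). A minimal Python reference of the math (just a sketch; the PR's actual implementation is a new ATen operator, not this helper):

    import torch

    def silu_reference(x: torch.Tensor) -> torch.Tensor:
        """Reference SiLU (a.k.a. swish): x * sigmoid(x)."""
        return x * torch.sigmoid(x)

    print(silu_reference(torch.randn(3)))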

@glaringlee (Contributor) left a comment

@albanD
This is an initial draft; should this be put in nn?

@vadimkantorov (Contributor)

Will there be a torch.nn.Swish module as well? (hopefully with inplace=True support)
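
A minimal sketch of what such a module could look like (hypothetical: the PR ultimately names the module nn.SiLU, and inplace support here is an assumption, not something this PR promises):

    import torch
    from torch import nn

    class Swish(nn.Module):
        """Hypothetical module wrapper computing x * sigmoid(x)."""

        def __init__(self, inplace: bool = False):
            super().__init__()
            self.inplace = inplace

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            if self.inplace:
                return x.mul_(torch.sigmoid(x))  # in-place variant, assumed here
            return x * torch.sigmoid(x)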

@albanD (Collaborator) commented Jul 6, 2020

I think that if you want the full nn experience, we should add it to nn.functional and add a Module that uses the functional version, yes.
But if you want to leave that as a follow-up to keep this one small, since it is a first task, that is fine as well!

@TFUsers commented Jul 6, 2020

Please modify this pull request to rename the swish to the SiLU.

The swish was originally coined the "SiLU" in https://arxiv.org/pdf/1606.08415.pdf and https://arxiv.org/abs/1702.03118, long before the swish paper. Renaming other people's exact same ideas is unacceptable, as we all know.
A recent discussion and recent JAX and TensorFlow issues show that we should call this the SiLU. In light of recent efforts to make the ML community more equitable and fair, this is a no-brainer.
PyTorch hasn't added the swish precisely because its renaming was unfair (just search previous pull requests). Changing the swish to the SiLU, as JAX just did and TensorFlow will do, will put this problem to rest.

Here's a possible docstring:

    r"""silu(input) -> Tensor
    Applies element-wise the function
    :math:`\text{SiLU}(x) = x * \sigma(x)`
    where :math:`\sigma(x)` is the logistic sigmoid.
    See `Gaussian Error Linear Units (GELUs) <https://arxiv.org/abs/1606.08415>`_ where the SiLU (Sigmoid Linear Unit) was originally coined,
    and see `Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning <https://arxiv.org/abs/1702.03118>`_ and `Swish: a Self-Gated Activation Function <https://arxiv.org/abs/1710.05941v1>`_ where the SiLU was experimented with later.
    """

@dr-ci bot commented Jul 6, 2020

💊 CI failures summary and remediations

As of commit 20ea162 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@albanD (Collaborator) commented Jul 6, 2020

That sounds like a good point @TFUsers

@heitorschueroff could you make the following changes then:

  • Change all the naming in the current PR from swish to silu.
  • Add in torch/functional.py a new function like:
def swish(inp):
    """
    Some doc explaining that it is called silu here and in torch.nn.
    """
    return torch.silu(inp)

This is just to make sure that people looking for swish on our documentation page hit something that points them to what they're looking for.

Also, the docs are missing from your PR:

  • Add entries for silu and silu_ in torch/_tensor_docs.py similar to the sigmoid ones.
  • Add an entry for silu in torch/_torch_docs.py similar to the sigmoid one (see the sketch after this list).
  • Make sure to add entries in docs/source/torch.rst next to sigmoid for both swish and silu.
  • Make sure to add entries in docs/source/tensors.rst for silu and silu_.
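
For the _torch_docs.py item, a minimal sketch following the existing sigmoid entry (this assumes silu is bound as torch.silu as requested above; the docstring wording is only a placeholder):

    # torch/_torch_docs.py (sketch, modeled on the sigmoid entry)
    import torch
    from torch._C import _add_docstr as add_docstr

    add_docstr(torch.silu,
               r"""
    silu(input) -> Tensor

    Applies the Sigmoid Linear Unit (SiLU) function element-wise:
    :math:`\text{SiLU}(x) = x * \sigma(x)`, where :math:`\sigma(x)` is the logistic sigmoid.
    """)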

@gchanan (Contributor) commented Jul 7, 2020

CC @mruberry: we have an alias mechanism, right? That we can use to alias swish to silu.

Contributor

Should swish have python_module: nn?

@mruberry (Collaborator) commented Jul 7, 2020

CC @mruberry: we have an alias mechanism, right? That we can use to alias swish to silu.

Not yet. File a follow-up issue and assign it to me and I'll add it when we finalize aliasing in a few weeks.

@gchanan (Contributor) commented Jul 7, 2020

This is related to python_module: nn, but I don't think we need to make this a method; relu isn't a method, for example.

@glaringlee (Contributor) commented Jul 7, 2020

@gchanan
relu has torch.relu; check here:

result = torch.relu(input)

Should we add python_module: nn and add silu into nn/functional.py? Otherwise, silu can only be accessed through torch._C._nn.silu.
Or should we remove python_module: nn and just provide torch.silu?

@hendrycks commented Jul 7, 2020

Should we remove python_module:nn and just provide torch.silu?

Originally torch.gelu (the activation function used in BERT) was provided, but then it became F.gelu.

@glaringlee (Contributor) commented Jul 7, 2020

To be consistent, let's do nn.functional.silu (F.silu).
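
With that decision, typical usage once this lands would look like the sketch below (nn.SiLU being the module counterpart this PR adds alongside the functional form):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    x = torch.randn(4)

    y = F.silu(x)      # functional form, as agreed above
    m = nn.SiLU()      # module form, usable inside nn.Sequential etc.
    assert torch.allclose(y, m(x))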

@heitorschueroff requested a review from apaszke as a code owner July 7, 2020 22:01
@heitorschueroff linked an issue Jul 8, 2020 that may be closed by this pull request
@heitorschueroff changed the title from "Added swish activation function" to "Added SiLU activation function" Jul 8, 2020
@heitorschueroff added the "module: nn (Related to torch.nn)" label Jul 8, 2020
@glaringlee (Contributor) left a comment

So far this looks good to me.
There are several things that need to be added before shipping this:

  1. For documentation, please update the following two rst files so the new entries get indexed.
    a. docs/source/nn.functional.rst
    In the 'Non-linear activation functions' section, probably after hardsigmoid.
    b. docs/source/nn.rst
    In the 'Non-linear Activations (weighted sum, nonlinearity)' section, add SiLU.
    Here is what it should look like (you can click nn.SiLU and nn.functional.silu):
    https://glaringlee.github.io/search.html?q=silu&check_keywords=yes&area=default

    After the rst files are updated, you can try building the docs locally. Here is how:
    https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md#writing-documentation

  2. Update torch/_overrides.py
    Add silu to the dict returned by get_testing_overrides(); this is the root cause of the CI test failure. A sketch of the entry follows this list.
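
For reference, get_testing_overrides() maps each overridable function to a lambda with a matching signature; a minimal sketch of the new entry, modeled on the neighboring activation functions (keep the lambda signature in sync with whatever F.silu's final signature is):

    import torch  # inside torch/_overrides.py this import already exists

    # Sketch of the entry to add to the dict returned by get_testing_overrides()
    testing_overrides = {
        torch.nn.functional.silu: lambda input: -1,
    }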

@facebook-github-bot (Contributor) left a comment

@heitorschueroff has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@glaringlee (Contributor) left a comment

LGTM now.

@facebook-github-bot (Contributor)

@heitorschueroff merged this pull request in 75a4862.

@iiSeymour (Contributor)

Did this not make it into 1.6 in the end?

@albanD (Collaborator) commented Jul 28, 2020

The original branch cut happened on the 23rd of July, IIRC, and after that only important bugfixes are cherry-picked into the release.
So I don't think so :/


Labels

Merged · module: nn (Related to torch.nn)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] Swish Activation Function

10 participants