@tunz tunz commented Feb 18, 2018

Issue #4737

This PR adds a function to enable or disable flush-denormal mode on x86 CPUs that support SSE3. Intel provides two macros for this, _MM_SET_FLUSH_ZERO_MODE and _MM_SET_DENORMALS_ZERO_MODE, so I used them.

Some old versions of gcc seem to have a bug that prevents the use of these intrinsics (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57202), so the build always adds the -msse3 option.

I'm not sure whether this mode should be enabled by default, but this approach seems reasonable as a first step.
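
For readers unfamiliar with the term, here is a minimal pure-Python sketch of what flushing denormals means numerically. It only emulates the effect in software for float64 values; it is not the hardware mechanism this PR uses, and the helper names `is_denormal` and `flush_denormal` are hypothetical, chosen for illustration.

```python
import math
import sys

# Smallest positive *normal* float64; anything nonzero below it is a denormal.
SMALLEST_NORMAL = sys.float_info.min


def is_denormal(x):
    """Return True if x is a nonzero float64 below the smallest normal magnitude."""
    return x != 0.0 and abs(x) < SMALLEST_NORMAL


def flush_denormal(x):
    """Hypothetical helper: replace a denormal with a zero of the same sign."""
    return math.copysign(0.0, x) if is_denormal(x) else x


# Dividing the smallest normal value by 4 lands in the denormal range.
tiny = SMALLEST_NORMAL / 4

print(is_denormal(tiny))      # the value is denormal
print(flush_denormal(tiny))   # flushed to zero
print(flush_denormal(1.5))    # normal values pass through unchanged
```

The hardware flags do this inside the floating-point unit, which lets the CPU skip its slow path for denormal operands and results.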

@apaszke apaszke left a comment

LGTM. Just curious: why is this useful? Does it change CPU performance a lot?

r"""
set_flush_denormal(on) -> bool
Returns ``True`` if it successfully configures flush denormal mode.
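
A minimal usage sketch of the function documented above, assuming it is exposed as `torch.set_flush_denormal` and that PyTorch may or may not be installed. The wrapper name `try_flush_denormal` is hypothetical, and the value read back depends on platform support, so no particular output is claimed.

```python
try:
    import torch  # optional: the sketch degrades gracefully without PyTorch
except ImportError:
    torch = None


def try_flush_denormal(on):
    """Hypothetical wrapper: set flush-denormal mode and read back a denormal.

    Returns None if PyTorch is unavailable, otherwise a tuple
    (configured, value) where `configured` is the bool returned by
    torch.set_flush_denormal and `value` is a denormal float64 literal
    after it has passed through a tensor.
    """
    if torch is None:
        return None
    configured = torch.set_flush_denormal(on)
    value = torch.tensor([1e-323], dtype=torch.float64)[0].item()
    # Restore the default mode so later code is unaffected.
    torch.set_flush_denormal(False)
    return configured, value


print(try_flush_denormal(True))
```

Checking the returned bool matters because the call reports failure by returning ``False`` on unsupported hardware rather than raising.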

tunz commented Feb 19, 2018

Yes. The benchmark from #4737 runs about 13x faster on my machine (total time drops from about 8.24 to about 0.60).

Without flushing denormal numbers:

Starting...
p=100, q=100, r=10
  iter   V_maxdiff       W_maxdiff             obj            time
--------------------------------------------------------------------------------
   100  1.9202e-03      8.9862e-03      6.2078e+02         0.00460
   200  6.6091e-04      3.3026e-03      6.1045e+02         0.00391
   300  6.3047e-04      3.7313e-03      6.0724e+02         0.00365
...
 17000  2.3246e-06      1.0252e-05      5.9653e+02         0.05804
 17100  2.2724e-06      1.0252e-05      5.9653e+02         0.05816
 17200  2.1532e-06      9.4175e-06      5.9653e+02         0.05767
--------------------------------------------------------------------------------
Completed. total time: 8.237918376922607

With flushing denormal numbers:

Starting...
p=100, q=100, r=10
  iter   V_maxdiff       W_maxdiff             obj            time
--------------------------------------------------------------------------------
   100  1.9202e-03      8.9862e-03      6.2078e+02         0.00458
   200  6.6091e-04      3.3026e-03      6.1045e+02         0.00393
   300  6.3047e-04      3.7313e-03      6.0724e+02         0.00367
...
 16200  4.4415e-06      1.1805e-05      5.9654e+02         0.00366
 16300  3.0734e-06      1.0103e-05      5.9654e+02         0.00365
 16400  2.2054e-06      9.6560e-06      5.9654e+02         0.00365
--------------------------------------------------------------------------------
Completed. total time: 0.5998060703277588

apaszke commented Feb 19, 2018

@pytorchbot test this please

@soumith soumith merged commit fae6c67 into pytorch:master Feb 20, 2018

soumith commented Feb 20, 2018

thanks @tunz !
