
Conversation

@t-vi (Collaborator) commented Apr 27, 2018

This adds NaN tests for the min and max kernels and makes them return NaNs in the right places.

  inline __device__ T operator()(T a, T b) const {
-   return THCNumerics<T>::lt(a, b) ? a : b;
+   // a != a means a == NaN
+   return (THCNumerics<T>::lt(a, b) ||
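For context, the idea is that the comparator keeps a whenever it already compares lower or is NaN, so a NaN, once in the accumulator, is never overwritten. A minimal sketch of such a NaN-propagating min functor (a reconstruction for illustration only; the exact condition from the PR is truncated above):

    // Sketch of a NaN-propagating "min" functor in the THCNumerics style.
    // Reconstruction, not the literal PR diff.
    template <typename T>
    struct MinNaNOp {
      inline __device__ T operator()(T a, T b) const {
        // "a != a" is true only for NaN, so a NaN accumulator is kept as-is.
        return (THCNumerics<T>::lt(a, b) || THCNumerics<T>::ne(a, a)) ? a : b;
      }
    };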


actual = f(a.cuda()).cpu()
expected = f(a).cpu()
self.assertEqual(torch.isnan(actual), torch.isnan(expected), 'nans for {}'.format(name))
self.assertEqual(actual[~torch.isnan(actual)],


@t-vi (Collaborator, Author) commented Apr 28, 2018 via email

Thank you ngimel and zou3519!
@ezyang (Contributor) commented Apr 30, 2018

Looks correct. @t-vi would you mind running some quick perf numbers just to characterize what the effect of the extra neq test is?

@t-vi (Collaborator, Author) commented Apr 30, 2018

My conclusion: measurable in isolation (on the order of a 10%-20% slowdown), negligible in any other context. Just as max probably wasn't the bottleneck before, doing twice as many comparisons in max doesn't really matter if you do anything nontrivial elsewhere.

So I ran this several times (to avoid initialization effects; the first run takes longer than the others). I lifted this from somewhere in the pytorch/benchmark repository - if you have a better methodology, I'd be happy to apply it.

import gc, torch, time
with torch.no_grad():
    a = torch.empty(100, 1000, 1000, device='cuda')
    a.normal_()
    # time a full reduction (max over all elements), both with CUDA events and wall clock
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    gc.collect()
    torch.cuda.synchronize()
    start.record()
    start_cpu_secs = time.time()
    b = a.max()
    end_cpu_secs = time.time()
    end.record()
    torch.cuda.synchronize()
    gpu_msecs = start.elapsed_time(end)
    print(torch.__version__, "msecs maxall gpu", gpu_msecs, "cpu", (end_cpu_secs - start_cpu_secs)*1000)
    if 1:
        # same measurement for the reduction along dimension 1
        a = torch.empty(100, 1000, 1000, device='cuda')
        a.normal_()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        gc.collect()
        torch.cuda.synchronize()
        start.record()
        start_cpu_secs = time.time()
        b = a.max(1)
        end_cpu_secs = time.time()
        end.record()
        torch.cuda.synchronize()
        gpu_msecs = start.elapsed_time(end)
        print(torch.__version__, "msecs max[1] gpu", gpu_msecs, "cpu", (end_cpu_secs - start_cpu_secs)*1000)

And I get:

0.4.0 msecs maxall gpu 1.6861120462417603 cpu 1.6646385192871094
0.4.0 msecs max[1] gpu 2.239487886428833 cpu 0.07939338684082031

vs.

0.5.0a0+497ff06 msecs maxall gpu 1.8479679822921753 cpu 1.8224716186523438
0.5.0a0+497ff06 msecs max[1] gpu 1.922752022743225 cpu 0.054836273193359375

@apaszke (Contributor) left a comment


Just wanted to clarify our strategy here. Do we want to match the CPU operators? If so, this isn't the right way to go: the way they are implemented is slightly different. Actually, I think the CPU ops mix both the condition I linked and the one you put here. Can you please make them all consistent?

The benchmarks look like noise to me, so that's ok.

@t-vi (Collaborator, Author) commented Apr 30, 2018 via email

@apaszke (Contributor) commented Apr 30, 2018

But what's the actual fix in this PR then? If we want kernels to return NaNs when there are NaNs, then why don't we treat the CPU kernels that way?

@apaszke (Contributor) commented Apr 30, 2018

Ok, never mind my comment. I did the math again and it seems to be ok.

@t-vi (Collaborator, Author) commented Apr 30, 2018

(Sorry, the second mail doesn't seem to have reached the issue log... :( )
Oh, it is roughly the same: the CPU does the comparison first and then checks whether the value was NaN (in that case theMax becomes NaN because of how the ">" is formulated), and if it is, it aborts and theMax stays NaN.
Previously, the CUDA kernel would put the NaN into the accumulator (a in how the CUDA functor is called, theMax on the CPU) and then overwrite it on the next comparison.
With the proposed fix, the CUDA kernel checks whether the left-hand side (a, equivalent to theMax in how it is used) is NaN, and if it is, it keeps a. I think this is result-equivalent to how the CPU works.
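To spell out the claimed equivalence, here is a minimal plain-C++ sketch (not the actual TH/THC code, just an illustration of the two strategies described above); both end up returning NaN once a NaN shows up in the input:

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // CPU-style: "!(value <= theMax)" is also true when value is NaN, so the NaN is
    // written into the accumulator and the scan aborts, leaving theMax == NaN.
    float max_cpu_style(const std::vector<float>& v) {
        float theMax = v[0];
        if (std::isnan(theMax)) return theMax;
        for (size_t i = 1; i < v.size(); ++i) {
            if (!(v[i] <= theMax)) {
                theMax = v[i];
                if (std::isnan(theMax)) break;
            }
        }
        return theMax;
    }

    // CUDA-style: the binary reduction op keeps a whenever a is larger or a is NaN
    // (a != a), so a NaN accumulator is never overwritten; a NaN b gets picked up
    // because neither condition holds and the op falls through to b.
    float max_gpu_style(const std::vector<float>& v) {
        float a = v[0];
        for (size_t i = 1; i < v.size(); ++i) {
            float b = v[i];
            a = (a > b || a != a) ? a : b;
        }
        return a;
    }

    int main() {
        std::vector<float> v = {1.0f, std::nanf(""), 3.0f};
        std::printf("cpu-style %f, gpu-style %f\n", max_cpu_style(v), max_gpu_style(v));
        return 0;
    }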

@apaszke apaszke merged commit 20c965f into pytorch:master Apr 30, 2018
Jorghi12 pushed a commit to wsttiger/pytorch that referenced this pull request May 10, 2018
weiyangfb pushed a commit to weiyangfb/pytorch that referenced this pull request Jun 11, 2018