
Conversation

@muthuArivoli (Contributor)

Related to #38349.

@muthuArivoli muthuArivoli marked this pull request as draft August 5, 2020 03:33
dr-ci bot commented Aug 5, 2020

💊 CI failures summary and remediations

As of commit bdfabb0 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 1/2 non-CircleCI failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_build (1/1)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

FAILED: caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CuSparseHandlePool.cpp.obj
FAILED: caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/detail/CUDAHooks.cpp.obj
error: failed to execute compile
caused by: error reading compile response from server
caused by: Failed to read response header
caused by: failed to fill whole buffer
ninja: build stopped: subcommand failed.
-- Building version 1.7.0a0+bdfabb0

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

@muthuArivoli muthuArivoli changed the title from "[WIP] Implement torch.nextafter" to "Implement torch.nextafter" Aug 6, 2020
@muthuArivoli muthuArivoli marked this pull request as ready for review August 6, 2020 22:53
@muthuArivoli (Contributor Author)

@mruberry PTAL

@ailzhang ailzhang requested a review from mruberry August 7, 2020 01:40
@ailzhang ailzhang added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Aug 7, 2020
r"""
nextafter(input, other, *, out=None) -> Tensor
Return the next floating-point value after input towards other, elementwise.
Collaborator:
Use :attr:`input` and :attr:`other` here and in the following paragraph.
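
For context, a minimal sketch of the documented behavior; the tensors `a` and `b` below are illustrative, not from the PR:

```python
import torch

a = torch.tensor([1.0, 2.0], dtype=torch.float64)
b = torch.tensor([2.0, 1.0], dtype=torch.float64)
out = torch.nextafter(a, b)

# Each result is one ulp away from a[i] in the direction of b[i],
# so it is strictly greater than 1.0 and strictly less than 2.0.
print(out > a)  # tensor([ True, False])
print(out < a)  # tensor([False,  True])
```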

def test_nextafter(self, device, dtype):
    # Test special cases
    t1 = torch.tensor([0, 0, 10], device=device, dtype=dtype)
    t2 = torch.tensor([inf, -inf, 10], device=device, dtype=dtype)
Collaborator:
Better to add nan, too.

t1 = torch.tensor([0, 0, 10], device=device, dtype=dtype)
t2 = torch.tensor([inf, -inf, 10], device=device, dtype=dtype)
actual = torch.nextafter(t1, t2)
expected = np.nextafter(t1.cpu().numpy(), t2.cpu().numpy())
Collaborator:
Better to test the flipped order, torch.nextafter(t2, t1), as well.
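
Putting both suggestions together, a sketch of what the extended test could look like. `check_nextafter` is a hypothetical standalone adaptation (the real test is a method on the device-generic test class and uses self.assertEqual); the special values and the NumPy comparison follow the snippets above:

```python
import math

import numpy as np
import torch

def check_nextafter(device="cpu", dtype=torch.float32):
    # Special values: +/-inf and nan alongside an ordinary finite case.
    t1 = torch.tensor([0.0, 0.0, 10.0, math.nan], device=device, dtype=dtype)
    t2 = torch.tensor([math.inf, -math.inf, 10.0, 10.0], device=device, dtype=dtype)

    # Exercise both argument orders against NumPy's reference implementation.
    for a, b in ((t1, t2), (t2, t1)):
        actual = torch.nextafter(a, b).cpu()
        expected = torch.from_numpy(np.nextafter(a.cpu().numpy(), b.cpu().numpy()))
        # nan != nan, so compare nan masks and non-nan values separately.
        nan_mask = torch.isnan(expected)
        assert torch.equal(torch.isnan(actual), nan_mask)
        assert torch.equal(actual[~nan_mask], expected[~nan_mask])

check_nextafter()
```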

@mruberry mruberry (Collaborator) left a comment

This looks really good, @muthuArivoli, nice work! I just made a few suggestions for the test and doc.

@muthuArivoli (Contributor Author)

@mruberry, I removed the half implementation since it didn't work. I've also made all the other changes requested.

@mruberry (Collaborator)

mruberry commented Aug 7, 2020

Everything looks very good. Would you just add 'nan' as another special value in the test (along with inf/-inf)?

@muthuArivoli (Contributor Author)

> Everything looks very good. Would you just add 'nan' as another special value in the test (along with inf/-inf)?

I added nan to the test mentioned above. Other tests seemed to use assertTrue with isnan to check for nan, since nan != nan, which is why I used that approach.
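
For reference, a quick illustration of the nan != nan point and why isnan is the reliable check:

```python
import math

import torch

x = torch.tensor([math.nan, 1.0])

# nan compares unequal to everything, including itself, so == cannot find it.
print(x == x)                # tensor([False,  True])
print(math.nan == math.nan)  # False

# torch.isnan is the reliable way to locate nan entries, hence
# assertTrue(torch.isnan(...)) in the tests.
print(torch.isnan(x))        # tensor([ True, False])
```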

@mruberry (Collaborator)

mruberry commented Aug 7, 2020

> Everything looks very good. Would you just add 'nan' as another special value in the test (along with inf/-inf)?

> I added nan to the test mentioned above. Other tests seemed to use assertTrue with isnan to check for nan, since nan != nan, which is why I used that approach.

Aha! Thank you for pointing that out. My mistake.

@mruberry mruberry self-requested a review August 7, 2020 22:27
@mruberry mruberry added the module: numpy label (Related to numpy support, and also numpy compatibility of our operators) Aug 7, 2020
@mruberry mruberry (Collaborator) left a comment

Awesome work, @muthuArivoli! Thanks for this PR.

I just updated #38349, too, if you're interested in other functions.

@facebook-github-bot facebook-github-bot (Contributor) left a comment

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry (Collaborator)

We have some internal mobile builds that are failing after this PR:

aten/src/ATen/cpu/vec256/vec256_base.h:389:16: error: no member named 'nextafter' in namespace 'std'; did you mean simply 'nextafter'?
      ret[i] = std::nextafter(values[i], b[i]);

ten/src/ATen/native/cpu/BinaryOpsKernel.cpp:749:22: error: no member named 'nextafter' in 'at::vec256::(anonymous namespace)::Vec256<float>'
            return a.nextafter(b);

cc @dreiss

I'm not very familiar with mobile, but I can follow up with them to get a better understanding of what's going on here and how to address this issue. One option, if you like, would be to remove the CPU vectorized code paths for now, add this function, and then add the vectorized code paths back once we have a plan to deal with this mobile issue. Either that, or we can wait for the mobile team to weigh in.

@mruberry (Collaborator)

Now that #42291 is in, this PR will need analogous updates.

@muthuArivoli (Contributor Author)

Just updated with the mobile fixes. I realized, however, that I haven't done anything for autograd. Do I need to do anything for autograd, and if so, how would I, since I don't think the gradient is well defined for nextafter?

@mruberry (Collaborator)

> Just updated with the mobile fixes. I realized, however, that I haven't done anything for autograd. Do I need to do anything for autograd, and if so, how would I, since I don't think the gradient is well defined for nextafter?

You could add a stub like this:

https://github.com/muthuArivoli/pytorch/blob/ab5c39810aa4d914b1f4e97f2873976cd3415b56/tools/autograd/derivatives.yaml#L220

This would throw a nicer error message if someone tries to backward through nextafter.
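
A sketch of what such a stub buys the user, assuming the stub marks the derivative as not implemented; the exact error text is illustrative:

```python
import torch

a = torch.tensor([1.0], requires_grad=True)
b = torch.tensor([2.0])

try:
    torch.nextafter(a, b).sum().backward()
except RuntimeError as err:
    # With the stub in place, autograd raises a readable "derivative ...
    # is not implemented" style error instead of failing more cryptically.
    print(err)
```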

@muthuArivoli (Contributor Author)

> Just updated with the mobile fixes. I realized, however, that I haven't done anything for autograd. Do I need to do anything for autograd, and if so, how would I, since I don't think the gradient is well defined for nextafter?

> You could add a stub like this:
>
> https://github.com/muthuArivoli/pytorch/blob/ab5c39810aa4d914b1f4e97f2873976cd3415b56/tools/autograd/derivatives.yaml#L220
>
> This would throw a nicer error message if someone tries to backward through nextafter.

Thanks! Just added that.

@muthuArivoli (Contributor Author)

@mruberry Could you run this against the internal mobile builds?

@facebook-github-bot facebook-github-bot (Contributor) left a comment

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry (Collaborator)

Android builds look good. Starting the land process.

@facebook-github-bot (Contributor)

@mruberry merged this pull request in b8102b1.


Labels: Merged · module: numpy · open source · triaged
