@hfxunlp hfxunlp commented Aug 15, 2020

Take care of the state of autocast in parallel_apply, so there is no need to decorate model implementations.
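
To illustrate the idea in this description, here is a minimal standard-library sketch, not the actual patch. It assumes the mode in question is thread-local, so worker threads start with the default state unless the caller's state is captured and re-entered inside each worker. The names `mode`, `is_mode_enabled`, and `parallel_apply_sketch` are hypothetical stand-ins and are not PyTorch APIs.

```python
import threading
from contextlib import contextmanager

# Hypothetical stand-in for a thread-local mode flag (such as autocast state).
_state = threading.local()


def is_mode_enabled():
    """Return the flag as seen by the current thread (False if never set)."""
    return getattr(_state, "enabled", False)


@contextmanager
def mode(enabled=True):
    """Set the flag for the current thread only and restore it on exit."""
    previous = is_mode_enabled()
    _state.enabled = enabled
    try:
        yield
    finally:
        _state.enabled = previous


def parallel_apply_sketch(funcs, propagate=True):
    """Run each callable in its own thread and return the results in order."""
    # Capture the caller's flag before any worker thread is started.
    caller_enabled = is_mode_enabled()
    results = [None] * len(funcs)

    def worker(index, fn):
        if propagate:
            # Re-enter the caller's mode inside the worker thread.
            with mode(caller_enabled):
                results[index] = fn()
        else:
            # Without propagation the worker sees the default (False).
            results[index] = fn()

    threads = [
        threading.Thread(target=worker, args=(i, fn))
        for i, fn in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


with mode(True):
    without_fix = parallel_apply_sketch([is_mode_enabled] * 2, propagate=False)
    with_fix = parallel_apply_sketch([is_mode_enabled] * 2, propagate=True)
```

With `propagate=False` the workers do not see the caller's setting, which is the situation that would otherwise require decorating each model's forward pass. With `propagate=True` the setting follows the call into every worker.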

dr-ci bot commented Aug 15, 2020

💊 CI failures summary and remediations

As of commit 2072a64 (more details on the Dr. CI page):


  • 4/4 failures introduced in this PR

🕵️ 4 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_windows_vs2019_py36_cpu_build (1/4)

Step: "Build"

FAILED: third_party/ideep/mkl-dnn/src/cpu/CMakeFiles/dnnl_cpu.dir/gemm_convolution_utils.cpp.obj
enmp:experimental -DNDEBUG -openmp:experimental  /MP    /wd4800 /wd4068 /wd4305 /wd4551 /wd4244  /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\x64\CMakeFiles\dnnl_cpu_x64.dir\gemm\bf16\jit_avx512_core_gemm_bf16bf16f32_kern.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\x64\CMakeFiles\dnnl_cpu_x64.dir\ /FS -c ..\third_party\ideep\mkl-dnn\src\cpu\x64\gemm\bf16\jit_avx512_core_gemm_bf16bf16f32_kern.cpp 
Microsoft (R) C/C++ Optimizing Compiler Version 19.27.29111 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

imental -DNDEBUG -openmp:experimental  /MP    /wd4800 /wd4068 /wd4305 /wd4551 /wd4244  /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /Od /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\x64\CMakeFiles\dnnl_cpu_x64.dir\gemm\bf16\jit_avx512_core_s16_copy_an_kern_autogen.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\x64\CMakeFiles\dnnl_cpu_x64.dir\ /FS -c ..\third_party\ideep\mkl-dnn\src\cpu\x64\gemm\bf16\jit_avx512_core_s16_copy_an_kern_autogen.cpp 
Microsoft (R) C/C++ Optimizing Compiler Version 19.27.29111 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

cl : Command line warning D9025 : overriding '/O2' with '/Od'
ird_party\pybind11\include /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -openmp:experimental -DNDEBUG -openmp:experimental  /MP    /wd4800 /wd4068 /wd4305 /wd4551 /wd4244  /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\gemm_convolution_utils.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\ /FS -c ..\third_party\ideep\mkl-dnn\src\cpu\gemm_convolution_utils.cpp 
FAILED: third_party/ideep/mkl-dnn/src/cpu/CMakeFiles/dnnl_cpu.dir/gemm_convolution_utils.cpp.obj  
ird_party\pybind11\include /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -openmp:experimental -DNDEBUG -openmp:experimental  /MP    /wd4800 /wd4068 /wd4305 /wd4551 /wd4244  /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\gemm_convolution_utils.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\ /FS -c ..\third_party\ideep\mkl-dnn\src\cpu\gemm_convolution_utils.cpp 
C:\Users\circleci\project\third_party\ideep\mkl-dnn\src\cpu\gemm_convolution_utils.cpp(401) : fatal error C1001: Internal compiler error.
(compiler file 'd:\agent\_work\7\s\src\vctools\Compiler\Utc\src\p2\main.c', line 195)
 To work around this problem, try simplifying or changing the program near the locations listed above.
If possible please provide a repro here: https://developercommunity.visualstudio.com 
Please choose the Technical Support command on the Visual C++ 
 Help menu, or open the Technical Support help file for more information
  cl!RaiseException()+0x69
  cl!RaiseException()+0x69
  cl!CloseTypeServerPDB()+0x22e6b

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_build (2/4)

Step: "Build"

FAILED: third_party/ideep/mkl-dnn/src/cpu/CMakeFiles/dnnl_cpu.dir/gemm_convolution_utils.cpp.obj
 
 Computing Toolkit\CUDA\v10.1\include" /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -openmp:experimental -DNDEBUG -openmp:experimental  /MP    /wd4800 /wd4068 /wd4305 /wd4551 /wd4244  /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DCUDA_HAS_FP16=1 -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\ref_lrn.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\ /FS -c ..\third_party\ideep\mkl-dnn\src\cpu\ref_lrn.cpp 
Microsoft (R) C/C++ Optimizing Compiler Version 19.27.29111 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

lkit\CUDA\v10.1\include" /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -openmp:experimental -DNDEBUG -openmp:experimental  /MP    /wd4800 /wd4068 /wd4305 /wd4551 /wd4244  /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DCUDA_HAS_FP16=1 -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\ref_resampling.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\ /FS -c ..\third_party\ideep\mkl-dnn\src\cpu\ref_resampling.cpp 
Microsoft (R) C/C++ Optimizing Compiler Version 19.27.29111 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

include" /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -openmp:experimental -DNDEBUG -openmp:experimental  /MP    /wd4800 /wd4068 /wd4305 /wd4551 /wd4244  /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DCUDA_HAS_FP16=1 -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\gemm_convolution_utils.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\ /FS -c ..\third_party\ideep\mkl-dnn\src\cpu\gemm_convolution_utils.cpp 
FAILED: third_party/ideep/mkl-dnn/src/cpu/CMakeFiles/dnnl_cpu.dir/gemm_convolution_utils.cpp.obj  
include" /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -openmp:experimental -DNDEBUG -openmp:experimental  /MP    /wd4800 /wd4068 /wd4305 /wd4551 /wd4244  /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DCUDA_HAS_FP16=1 -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\gemm_convolution_utils.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\ /FS -c ..\third_party\ideep\mkl-dnn\src\cpu\gemm_convolution_utils.cpp 
C:\Users\circleci\project\third_party\ideep\mkl-dnn\src\cpu\gemm_convolution_utils.cpp(401) : fatal error C1001: Internal compiler error.
(compiler file 'd:\agent\_work\7\s\src\vctools\Compiler\Utc\src\p2\main.c', line 195)
 To work around this problem, try simplifying or changing the program near the locations listed above.
If possible please provide a repro here: https://developercommunity.visualstudio.com 
Please choose the Technical Support command on the Visual C++ 
 Help menu, or open the Technical Support help file for more information
  cl!RaiseException()+0x69
  cl!RaiseException()+0x69
  cl!CloseTypeServerPDB()+0x22e6b

See CircleCI build pytorch_windows_vs2019_py36_cuda11.0_build (3/4)

Step: "Build"

FAILED: third_party/ideep/mkl-dnn/src/cpu/CMakeFiles/dnnl_cpu.dir/gemm_convolution_utils.cpp.obj
 -openmp:experimental  /MP    /wd4800 /wd4068 /wd4305 /wd4551 /wd4244  /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DCUDA_HAS_FP16=1 -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /Od /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\x64\CMakeFiles\dnnl_cpu_x64.dir\gemm\f32\jit_avx512_core_f32_copy_an_kern_autogen.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\x64\CMakeFiles\dnnl_cpu_x64.dir\ /FS -c ..\third_party\ideep\mkl-dnn\src\cpu\x64\gemm\f32\jit_avx512_core_f32_copy_an_kern_autogen.cpp 
Microsoft (R) C/C++ Optimizing Compiler Version 19.27.29111 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

cl : Command line warning D9025 : overriding '/O2' with '/Od'
bj -openmp:experimental -DNDEBUG -openmp:experimental  /MP    /wd4800 /wd4068 /wd4305 /wd4551 /wd4244  /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DCUDA_HAS_FP16=1 -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\x64\CMakeFiles\dnnl_cpu_x64.dir\gemm\f32\jit_avx512_common_gemm_f32.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\x64\CMakeFiles\dnnl_cpu_x64.dir\ /FS -c ..\third_party\ideep\mkl-dnn\src\cpu\x64\gemm\f32\jit_avx512_common_gemm_f32.cpp 
Microsoft (R) C/C++ Optimizing Compiler Version 19.27.29111 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

\include /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -openmp:experimental -DNDEBUG -openmp:experimental  /MP    /wd4800 /wd4068 /wd4305 /wd4551 /wd4244  /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DCUDA_HAS_FP16=1 -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\gemm_convolution_utils.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\ /FS -c ..\third_party\ideep\mkl-dnn\src\cpu\gemm_convolution_utils.cpp 
FAILED: third_party/ideep/mkl-dnn/src/cpu/CMakeFiles/dnnl_cpu.dir/gemm_convolution_utils.cpp.obj  
\include /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -openmp:experimental -DNDEBUG -openmp:experimental  /MP    /wd4800 /wd4068 /wd4305 /wd4551 /wd4244  /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DCUDA_HAS_FP16=1 -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\gemm_convolution_utils.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\ /FS -c ..\third_party\ideep\mkl-dnn\src\cpu\gemm_convolution_utils.cpp 
C:\Users\circleci\project\third_party\ideep\mkl-dnn\src\cpu\gemm_convolution_utils.cpp(401) : fatal error C1001: Internal compiler error.
(compiler file 'd:\agent\_work\7\s\src\vctools\Compiler\Utc\src\p2\main.c', line 195)
 To work around this problem, try simplifying or changing the program near the locations listed above.
If possible please provide a repro here: https://developercommunity.visualstudio.com 
Please choose the Technical Support command on the Visual C++ 
 Help menu, or open the Technical Support help file for more information
  cl!RaiseException()+0x69
  cl!RaiseException()+0x69
  cl!CloseTypeServerPDB()+0x22e6b

See CircleCI build binary_windows_libtorch_3_7_cpu_release_build (4/4)

Step: "Build"

FAILED: third_party/ideep/mkl-dnn/src/cpu/CMakeFiles/dnnl_cpu.dir/gemm_convolution_utils.cpp.obj
[264/2275] Building CXX object third_party\protobuf\cmake\CMakeFiles\libprotobuf-lite.dir\__\src\google\protobuf\io\zero_copy_stream_impl.cc.obj 
[265/2275] Building CXX object third_party\protobuf\cmake\CMakeFiles\libprotobuf-lite.dir\__\src\google\protobuf\generated_enum_util.cc.obj 
[266/2275] Building CXX object third_party\protobuf\cmake\CMakeFiles\libprotobuf-lite.dir\__\src\google\protobuf\io\zero_copy_stream_impl_lite.cc.obj 
[267/2275] Building RC object third_party\protobuf\cmake\CMakeFiles\libprotobuf-lite.dir\version.rc.res 
[268/2275] Building CXX object third_party\protobuf\cmake\CMakeFiles\libprotobuf-lite.dir\__\src\google\protobuf\generated_message_table_driven_lite.cc.obj 
[269/2275] Building CXX object third_party\protobuf\cmake\CMakeFiles\libprotobuf-lite.dir\__\src\google\protobuf\generated_message_util.cc.obj 
[270/2275] Building CXX object third_party\protobuf\cmake\CMakeFiles\libprotobuf-lite.dir\__\src\google\protobuf\message_lite.cc.obj 
[271/2275] Building CXX object third_party\protobuf\cmake\CMakeFiles\libprotobuf-lite.dir\__\src\google\protobuf\parse_context.cc.obj 
[272/2275] Building CXX object third_party\protobuf\cmake\CMakeFiles\libprotobuf-lite.dir\__\src\google\protobuf\repeated_field.cc.obj 
[273/2275] Building CXX object third_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\gemm_convolution_utils.cpp.obj 
FAILED: third_party/ideep/mkl-dnn/src/cpu/CMakeFiles/dnnl_cpu.dir/gemm_convolution_utils.cpp.obj  
arty\pybind11\include /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -openmp:experimental -DNDEBUG -openmp:experimental  /MP    /wd4800 /wd4068 /wd4305 /wd4551 /wd4244  /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG   -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\gemm_convolution_utils.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\ /FS -c ..\..\third_party\ideep\mkl-dnn\src\cpu\gemm_convolution_utils.cpp 
C:\w\b\windows\pytorch\third_party\ideep\mkl-dnn\src\cpu\gemm_convolution_utils.cpp(401) : fatal error C1001: Internal compiler error.
(compiler file 'd:\agent\_work\7\s\src\vctools\Compiler\Utc\src\p2\main.c', line 195)
 To work around this problem, try simplifying or changing the program near the locations listed above.
If possible please provide a repro here: https://developercommunity.visualstudio.com 
Please choose the Technical Support command on the Visual C++ 
 Help menu, or open the Technical Support help file for more information
  cl!RaiseException()+0x69
  cl!RaiseException()+0x69
  cl!CloseTypeServerPDB()+0x22e6b


This comment has been revised 8 times.

Contributor

@mrshenli mrshenli left a comment

Hey @anoidgit, thanks for adding this. Shall we add a test to cover the new code?

@mrshenli mrshenli requested a review from mcarilli August 18, 2020 17:23
@mrshenli mrshenli added module: data parallel triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module labels Aug 18, 2020
Author

hfxunlp commented Aug 19, 2020

@mrshenli Hi, with pleasure. A corresponding test would be better, but I do not know how to write one, and I could not find the existing test code for nn.parallel to use as a reference. If a test is necessary, I hope you or someone else who is familiar with this can add it. In any case, I think the code should be fine; I have used the modified version in my experiments.

@mrshenli
Contributor

Hey @anoidgit, you can add this to test_data_parallel.py. See the example below:

@unittest.skipIf(not TEST_MULTIGPU, "multi-GPU not supported")
def test_parallel_apply(self):
    l1 = nn.Linear(10, 5).to("cuda:0", torch.float)
    l2 = nn.Linear(10, 5).to("cuda:1", torch.float)
    i1 = torch.randn(2, 10, device="cuda:0", dtype=torch.float)
    i2 = torch.randn(2, 10, device="cuda:1", dtype=torch.float)
    expected1 = l1(i1)
    expected2 = l2(i2)
    modules = (l1, l2)
    expected_outputs = (expected1, expected2)
    # each input can be either a collection of positional arguments
    # or an object representing the single argument
    for inputs in [((i1,), (i2,)), (i1, i2)]:
        outputs = dp.parallel_apply(modules, inputs, None)
        for out, expected in zip(outputs, expected_outputs):
            self.assertEqual(out, expected)

Author

hfxunlp commented Aug 20, 2020

Hi @mrshenli, thanks a lot for your help. I wrote the test following your example; please help check that it is correct.

Contributor

@mrshenli mrshenli left a comment

LGTM! Just a minor comment. Thanks for contributing!

expected_outputs = (expected1, expected2)

# each input can be either a collection of positional arguments
# or an object representing the single argument
Contributor

Any reason for the long indent before "or"?

Author

Thanks for pointing this out. It also looks strange to me, but I do not know the reason; I was following the example. If it is not right, please help fix it.

Contributor

That might be some debug vestige. No worries, I will land this as is and submit a PR to fix both.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Collaborator

mcarilli commented Aug 24, 2020

Code changes look good, thanks! The DP + autocast documentation should also be updated, but I can do that myself in a separate PR. Updating the documentation is not urgent: the guidance in the existing documentation will continue to work after this PR; it will just become overkill.

Author

hfxunlp commented Aug 25, 2020

@mcarilli many thanks for your efforts :)

@facebook-github-bot
Contributor

@mrshenli merged this pull request in f02753f.

facebook-github-bot pushed a commit that referenced this pull request Dec 3, 2021
Summary:
Following #60540 and pull request #43102

Pull Request resolved: #69218

Reviewed By: gchanan

Differential Revision: D32803814

Pulled By: ngimel

fbshipit-source-id: 06fdbbee2c7734153271be70ec4bc24263c8c367