Skip to content

Conversation

@peterjc123
Copy link
Collaborator

  1. Use LoadLibraryEx if available
  2. Print more info on error

@peterjc123 peterjc123 requested review from ezyang and malfet June 22, 2020 05:16
@dr-ci
Copy link

dr-ci bot commented Jun 22, 2020

💊 CI failures summary and remediations

As of commit a166018 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 23 times.

@peterjc123 peterjc123 force-pushed the dynamic_library_win_fix branch from 60f371e to da24813 Compare June 22, 2020 15:26
Copy link
Contributor

@ezyang ezyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if we actually have test coverage on Windows for this?

@ezyang
Copy link
Contributor

ezyang commented Jun 22, 2020

Test failure is legit:

[ RUN      ] RNNTest.Sizes_CUDA
unknown file: error: C++ exception with description "error in LoadLibrary for caffe2_nvrtc.dll. WinError 87: The parameter is incorrect.


Exception raised from DynamicLibrary at ..\aten\src\ATen\DynamicLibrary.cpp:66 (most recent call first):
00007FF9696670F200007FF969667090 c10.dll!c10::Error::Error [<unknown file> @ <unknown line number>]
00007FF938AE9AA000007FF938AE9900 torch_cpu.dll!at::DynamicLibrary::DynamicLibrary [<unknown file> @ <unknown line number>]
00007FF940E5494800007FF940E53D30 torch_cuda.dll!at::cuda::getPinnedMemoryAllocator [<unknown file> @ <unknown line number>]
00007FF940E543A100007FF940E53D30 torch_cuda.dll!at::cuda::getPinnedMemoryAllocator [<unknown file> @ <unknown line number>]
00007FF940EF5C3B00007FF940EF5AC0 torch_cuda.dll!getTHCCachingHostAllocator [<unknown file> @ <unknown line number>]
00007FF940EF3B8E00007FF940E565F0 torch_cuda.dll!at::native::set_storage_cuda_ [<unknown file> @ <unknown line number>]
00007FF938D6567F00007FF938D65500 torch_cpu.dll!at::native::empty_cpu [<unknown file> @ <unknown line number>]
00007FF938F05C8F00007FF938EAC950 torch_cpu.dll!at::native::mkldnn_sigmoid_ [<unknown file> @ <unknown line number>]
00007FF938EBEDD500007FF938EAC950 torch_cpu.dll!at::native::mkldnn_sigmoid_ [<unknown file> @ <unknown line number>]
00007FF938EC082E00007FF938EAC950 torch_cpu.dll!at::native::mkldnn_sigmoid_ [<unknown file> @ <unknown line number>]
00007FF938EBEDD500007FF938EAC950 torch_cpu.dll!at::native::mkldnn_sigmoid_ [<unknown file> @ <unknown line number>]
00007FF938FA179100007FF938FA1670 torch_cpu.dll!at::empty [<unknown file> @ <unknown line number>]
00007FF9405F814000007FF9405E5BF0 torch_cuda.dll!at::native::cummin_helper_cuda [<unknown file> @ <unknown line number>]
00007FF9405FBDC800007FF9405E5BF0 torch_cuda.dll!at::native::cummin_helper_cuda [<unknown file> @ <unknown line number>]
00007FF9405FD35900007FF9405FCE00 torch_cuda.dll!at::native::cat_out_cuda [<unknown file> @ <unknown line number>]
00007FF9405FCD4400007FF9405FCC60 torch_cuda.dll!at::native::cat_cuda [<unknown file> @ <unknown line number>]
00007FF940E9A5C200007FF940E565F0 torch_cuda.dll!at::native::set_storage_cuda_ [<unknown file> @ <unknown line number>]
00007FF940EACAE900007FF940E565F0 torch_cuda.dll!at::native::set_storage_cuda_ [<unknown file> @ <unknown line number>]
00007FF938F906CD00007FF938F87F00 torch_cpu.dll!at::bucketize_out [<unknown file> @ <unknown line number>]
00007FF938F6F55100007FF938F6F4F0 torch_cpu.dll!at::_cat [<unknown file> @ <unknown line number>]
00007FF938D7AFF500007FF938D7AEE0 torch_cpu.dll!at::native::cat [<unknown file> @ <unknown line number>]
00007FF93905F20D00007FF938FEBB90 torch_cpu.dll!at::zeros_out [<unknown file> @ <unknown line number>]
00007FF938B2192E00007FF938B15D10 torch_cpu.dll!torch::nn::functional::BatchNormFuncOptions::~BatchNormFuncOptions [<unknown file> @ <unknown line number>]
00007FF938F906CD00007FF938F87F00 torch_cpu.dll!at::bucketize_out [<unknown file> @ <unknown line number>]
00007FF938F95DE100007FF938F95D80 torch_cpu.dll!at::cat [<unknown file> @ <unknown line number>]
00007FF93A2B0B5600007FF93A21B6A0 torch_cpu.dll!torch::autograd::GraphRoot::apply [<unknown file> @ <unknown line number>]
00007FF938B2192E00007FF938B15D10 torch_cpu.dll!torch::nn::functional::BatchNormFuncOptions::~BatchNormFuncOptions [<unknown file> @ <unknown line number>]
00007FF938F906CD00007FF938F87F00 torch_cpu.dll!at::bucketize_out [<unknown file> @ <unknown line number>]
00007FF938F95DE100007FF938F95D80 torch_cpu.dll!at::cat [<unknown file> @ <unknown line number>]
00007FF75ED6E7D000007FF75EA0E030 test_api.exe!c10::ivalue::Future::wait [<unknown file> @ <unknown line number>]
00007FF75EE6BB3C00007FF75EA0E030 test_api.exe!c10::ivalue::Future::wait [<unknown file> @ <unknown line number>]
00007FF75EE6BA2600007FF75EA0E030 test_api.exe!c10::ivalue::Future::wait [<unknown file> @ <unknown line number>]
00007FF75EE89AF300007FF75EA0E030 test_api.exe!c10::ivalue::Future::wait [<unknown file> @ <unknown line number>]
00007FF75EE8989D00007FF75EA0E030 test_api.exe!c10::ivalue::Future::wait [<unknown file> @ <unknown line number>]
00007FF75EE8A37800007FF75EA0E030 test_api.exe!c10::ivalue::Future::wait [<unknown file> @ <unknown line number>]
00007FF75EE6BBDC00007FF75EA0E030 test_api.exe!c10::ivalue::Future::wait [<unknown file> @ <unknown line number>]
00007FF75EE6BB0600007FF75EA0E030 test_api.exe!c10::ivalue::Future::wait [<unknown file> @ <unknown line number>]
00007FF75EE89D2C00007FF75EA0E030 test_api.exe!c10::ivalue::Future::wait [<unknown file> @ <unknown line number>]
00007FF75E9D9195 <unknown symbol address> test_api.exe!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FF75EE8E79000007FF75EA0E030 test_api.exe!c10::ivalue::Future::wait [<unknown file> @ <unknown line number>]
00007FF97168797400007FF971687960 KERNEL32.DLL!BaseThreadInitThunk [<unknown file> @ <unknown line number>]
00007FF9741DA27100007FF9741DA250 ntdll.dll!RtlUserThreadStart [<unknown file> @ <unknown line number>]
" thrown in the test body.
[  FAILED  ] RNNTest.Sizes_CUDA (1972 ms)
[ RUN      ] RNNTest.EndToEndLSTM_CUDA

(answer: yes we do!)

Copy link
Contributor

@ezyang ezyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ci failing

@peterjc123
Copy link
Collaborator Author

peterjc123 commented Jun 23, 2020

ci failing

Fixed.

Update: It doesn't. Because we are relying on PATH. Change to reverting back to the default behavior (LoadLibrary if error code 126).

@ezyang
Copy link
Contributor

ezyang commented Jun 23, 2020

no dice?

@peterjc123
Copy link
Collaborator Author

peterjc123 commented Jun 23, 2020

no dice?

LOL. I've made a stupid mistake.

Comment on lines 49 to 64
HMODULE theModule;
bool reload = true;
if (GetProcAddress(GetModuleHandle("KERNEL32.DLL"), "AddDllDirectory") != NULL) {
theModule = LoadLibraryExA(
name,
NULL,
LOAD_LIBRARY_SEARCH_DEFAULT_DIRS);
if (theModule != NULL || (GetLastError() != ERROR_MOD_NOT_FOUND)) {
reload = false;
}
}

if (reload) {
theModule = LoadLibraryA(name);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why theModule != NULL is not a sufficient check? Moreover, why simply not do something like:
theModule = has_LOAD_LIBRARY_SEARCH_DEFAULT_DIRS() ? LoadLibraryEx(...) : LoadLibrary(...);

Suggested change
HMODULE theModule;
bool reload = true;
if (GetProcAddress(GetModuleHandle("KERNEL32.DLL"), "AddDllDirectory") != NULL) {
theModule = LoadLibraryExA(
name,
NULL,
LOAD_LIBRARY_SEARCH_DEFAULT_DIRS);
if (theModule != NULL || (GetLastError() != ERROR_MOD_NOT_FOUND)) {
reload = false;
}
}
if (reload) {
theModule = LoadLibraryA(name);
}
HMODULE theModule = NULL;
if (GetProcAddress(GetModuleHandle("KERNEL32.DLL"), "AddDllDirectory") != NULL) {
theModule = LoadLibraryExA(
name,
NULL,
LOAD_LIBRARY_SEARCH_DEFAULT_DIRS);
}
if (theModule == NULL) {
theModule = LoadLibraryA(name);
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the main differences between LoadLibrary and LoadLibraryEx is that the search order and the directories being searched. So only when we get ERROR_MOD_NOT_FOUND, it is meaningful to try LoadLibrary afterwards.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still can't imagine a valid PyTorch installation, where LoadLibraryEx would fail but LoadLibrary will succeed.
Alternatively, if we will fall-back to LoadLibrary if LoadLibraryEx would fail, shouldn't we always call LoadLibrary (because, it would not ignore paths specified via AddDllDirectory, would it?)

Or, why not do something like that:

 DWORD loadFlags = 0;
 if (hasAddDllDirectory) loadFlags = LOAD_LIBRARY_SEARCH_DEFAULT_DIRS;
 theModule = loadLibraryExA(name, NULL, loadFlags);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still can't imagine a valid PyTorch installation, where LoadLibraryEx would fail but LoadLibrary will succeed.

LoadLibraryExA with LOAD_LIBRARY_SEARCH_DEFAULT_DIRS doesn't considers directories in PATH.

Alternatively, if we will fall-back to LoadLibrary if LoadLibraryEx would fail, shouldn't we always call LoadLibrary (because, it would not ignore paths specified via AddDllDirectory, would it?)

Yes, it is partially the reason. Also, the search order is more reasonable because SYSTEM dir is searched in the last step.

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
@zou3519 zou3519 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jun 23, 2020
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@malfet is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@malfet is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

peterjc123 added a commit to peterjc123/pytorch that referenced this pull request Jun 24, 2020
Summary:
1. Use LoadLibraryEx if available
2. Print more info on error
Pull Request resolved: pytorch#40365

Differential Revision: D22194974

Pulled By: malfet

fbshipit-source-id: e8309f39d78fd4681de5aa032288882910dff928
(cherry picked from commit a2d4d9e)
@facebook-github-bot
Copy link
Contributor

@malfet merged this pull request in a2d4d9e.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants