@malfet malfet commented Sep 29, 2020

According to the LAPACK documentation (http://www.netlib.org/lapack/explore-html/d3/da8/group__complex16_g_esing_gaccb06ed106ce18814ad7069dcb43aa27.html),
rwork should be an array of doubles, but it was allocated as an array of floats (actually ints).

Fixes crash from #45269
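
To illustrate why the mismatch described above can crash, here is a minimal sketch, not taken from the PyTorch sources. The helper names are hypothetical, and it assumes a platform where `int` is narrower than `double` (4 versus 8 bytes on mainstream targets). A workspace sized for `n` ints is then too small for a routine that treats it as `n` doubles.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only.

// Bytes actually allocated when the workspace of n elements uses an
// integer element type.
constexpr std::size_t bytes_allocated_as_int(std::size_t n) {
  return n * sizeof(int);
}

// Bytes the routine may touch when it treats the same workspace as
// n doubles.
constexpr std::size_t bytes_needed_as_double(std::size_t n) {
  return n * sizeof(double);
}

// True when the routine would access memory past the end of the buffer.
constexpr bool overruns(std::size_t n) {
  return bytes_needed_as_double(n) > bytes_allocated_as_int(n);
}
```

Under that assumption, any non-empty workspace is undersized, so the routine can access memory outside the allocation.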

@malfet malfet added the module: crash, triaged, and module: complex labels Sep 29, 2020
@malfet malfet requested review from anjali411 and gchanan September 29, 2020 05:28

@facebook-github-bot facebook-github-bot left a comment

@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

dr-ci bot commented Sep 29, 2020

💊 CI failures summary and remediations

As of commit d9e8f1a (more details on the Dr. CI page):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_libtorch_linux_xenial_cuda11_0_cudnn8_py3_gcc7_build (1/1)

Step: "Build"

Sep 29 15:55:19 caused by: Connection refused (os error 111)
Sep 29 15:55:19 ++++ extract_trap_cmd 
Sep 29 15:55:19 ++++ printf '%s\n' '' 
Sep 29 15:55:19 +++ printf '%s\n' cleanup 
Sep 29 15:55:19 ++ trap -- ' 
Sep 29 15:55:19 cleanup' EXIT 
Sep 29 15:55:19 ++ [[ pytorch-libtorch-linux-xenial-cuda11.0-cudnn8-py3-gcc7-build != *pytorch-win-* ]] 
Sep 29 15:55:19 ++ which sccache 
Sep 29 15:55:19 ++ sccache --stop-server 
Sep 29 15:55:19 Stopping sccache server... 
Sep 29 15:55:19 error: couldn't connect to server 
Sep 29 15:55:19 caused by: Connection refused (os error 111) 
Sep 29 15:55:19 ++ true 
Sep 29 15:55:19 ++ rm /var/lib/jenkins/sccache_error.log 
Sep 29 15:55:19 rm: cannot remove '/var/lib/jenkins/sccache_error.log': No such file or directory 
Sep 29 15:55:19 ++ true 
Sep 29 15:55:19 ++ [[ pytorch-libtorch-linux-xenial-cuda11.0-cudnn8-py3-gcc7-build == *rocm* ]] 
Sep 29 15:55:19 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 
Sep 29 15:55:19 ++ SCCACHE_IDLE_TIMEOUT=1200 
Sep 29 15:55:19 ++ RUST_LOG=sccache::server=error 
Sep 29 15:55:19 ++ sccache --start-server 
Sep 29 15:55:19 Starting sccache server... 

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_macos_10_13_py3_build (1/1)

Step: "Update Homebrew" ❄️

fatal: Could not read from remote repository.
Receiving objects: 100% (178/178), 63.88 KiB | 10.65 MiB/s, done.
Resolving deltas: 100% (92/92), completed with 85 local objects.
From ssh://github.com/Homebrew/homebrew-cask-versions 
 + 15f6b44...92646be master     -> origin/master  (forced update) 
+ git reset --hard origin/master 
HEAD is now at 92646be Update dropbox-beta from 107.3.412 to 107.3.416 (#9683) 
+ for path in '$(find /usr/local/Homebrew -type d -name .git)' 
+ cd /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/.git/.. 
+ git fetch --depth=1 origin 
Connection to github.com closed by remote host.  
fatal: Could not read from remote repository. 
 
Please make sure you have the correct access rights 
and the repository exists. 

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

This comment has been revised 9 times.

@malfet malfet force-pushed the malfet/fix-torch.svd-for-complex-doubles branch from c12a6d1 to d9e8f1a Compare September 29, 2020 15:43
@malfet malfet requested a review from anjali411 September 29, 2020 15:43

malfet commented Sep 29, 2020

Updated the PR to use C++ type checking as an extra guard against such errors in the future.
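
As a rough sketch of how C++ type checking can guard against this class of error (an illustration under assumptions, not the code from this PR): if the wrapper takes the real workspace as a typed pointer rather than `void*`, a buffer with the wrong element type no longer compiles. The names `real_type`, `svd_rwork_bytes`, and `make_rwork` are hypothetical.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <type_traits>
#include <vector>

// Maps a scalar type to the real type of its real workspace:
// float -> float, std::complex<double> -> double, and so on.
template <typename T>
struct real_type { using type = T; };
template <typename T>
struct real_type<std::complex<T>> { using type = T; };
template <typename T>
using real_type_t = typename real_type<T>::type;

// Hypothetical stand-in for an SVD binding (not the PyTorch signature).
// rwork is a typed pointer instead of void*, so a buffer with the wrong
// element type is rejected at compile time.
template <typename scalar_t>
std::size_t svd_rwork_bytes(const scalar_t* a,
                            real_type_t<scalar_t>* rwork,
                            std::size_t lrwork) {
  assert(a != nullptr && rwork != nullptr);
  return lrwork * sizeof(*rwork);  // bytes the routine may touch
}

// Allocates the real workspace with the element type derived from scalar_t.
template <typename scalar_t>
std::vector<real_type_t<scalar_t>> make_rwork(std::size_t n) {
  return std::vector<real_type_t<scalar_t>>(n);
}
```

With this shape, passing the `data()` pointer of a `std::vector<int>` as `rwork` for a `std::complex<double>` input fails to compile, because `int*` does not convert to `double*`.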

@facebook-github-bot facebook-github-bot left a comment

@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@malfet merged this pull request in 772ce9a.

@malfet malfet deleted the malfet/fix-torch.svd-for-complex-doubles branch September 30, 2020 15:57
Labels

Merged
module: complex (Related to complex number support in PyTorch)
module: crash (Problem manifests as a hard crash, as opposed to a RuntimeError)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
