@CaoE
Collaborator

@CaoE CaoE commented Sep 1, 2022

The TTS model crashes because of the following issue: when the input to BN is not contiguous and its data type differs from that of the parameters, BN raises the error RuntimeError: !needs_dynamic_casting<func_t>::check(iter) INTERNAL ASSERT FAILED at "xxx/pytorch/aten/src/ATen/native/cpu/Loops.h":311, please report a bug to PyTorch.

Make the data types of the output and input consistent for batchnorm to fix the issue.
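
The condition described above can be sketched as follows. This is a minimal illustration, not the PR's own test: the tensor shape, the choice of bfloat16, and the helper name run_mixed_dtype_batchnorm are assumptions. It builds a non-contiguous bfloat16 input, passes it through a float32 BatchNorm2d in eval mode on CPU, and returns either the output or the RuntimeError, since the outcome depends on whether the installed PyTorch build includes this fix.

```python
import torch


def run_mixed_dtype_batchnorm():
    """Run CPU batchnorm on a non-contiguous bfloat16 input with float32 parameters.

    Returns (output, None) on success or (None, error) if a RuntimeError is raised.
    """
    # permute() swaps dims 1 and 2 without copying, so the result is non-contiguous.
    x = torch.arange(24, dtype=torch.bfloat16).reshape(1, 3, 2, 4).permute(0, 2, 1, 3)
    assert not x.is_contiguous()

    # Parameters and running statistics are float32, unlike the bfloat16 input.
    bn = torch.nn.BatchNorm2d(2).float().eval()

    try:
        with torch.no_grad():
            out = bn(x)
    except RuntimeError as err:
        # Builds without the fix are expected to land here.
        return None, err
    return out, None


out, err = run_mixed_dtype_batchnorm()
if err is None:
    print("output dtype:", out.dtype, "shape:", tuple(out.shape))
else:
    print("batchnorm raised:", err)
```

On a build that contains the fix, the PR description implies the output dtype should match the input dtype; the sketch prints it rather than asserting it.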

@pytorch-bot pytorch-bot bot added the release notes: nn (release notes category) label Sep 1, 2022
@facebook-github-bot
Contributor

facebook-github-bot commented Sep 1, 2022

🔗 Helpful links

❌ 1 New Failures

As of commit 7c849fbc1c (more details on the Dr. CI page):

  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / linux-bionic-cuda11_6-py3_10-gcc7-deploy / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu) (1/1)

Step: "Install nvidia driver, nvidia-docker runtime, set GPU_FLAG" (full log | diagnosis details)

2022-09-01T11:52:48.1643639Z ##[error]Final attempt failed. Child_process exited with error code 1
2022-09-01T11:52:37.3478507Z 
2022-09-01T11:52:37.3478710Z Please refer to the following page for additional information and to install
2022-09-01T11:52:37.3479082Z optional driver components:
2022-09-01T11:52:37.3479268Z 
2022-09-01T11:52:37.3479505Z  http://negativo17.org/nvidia-driver/
2022-09-01T11:52:37.3479720Z 
2022-09-01T11:52:37.3479745Z 
2022-09-01T11:52:37.3479877Z (Answer: Abort installation)
2022-09-01T11:52:37.3480534Z ERROR: The installation was canceled due to the availability or presence of an alternate driver installation. Please see /var/log/nvidia-installer.log for more details.
2022-09-01T11:52:37.3484909Z + false
2022-09-01T11:52:48.1643639Z ##[error]Final attempt failed. Child_process exited with error code 1
2022-09-01T11:52:48.1644379Z 
2022-09-01T11:52:48.1644916Z 
2022-09-01T11:52:48.1724610Z Prepare all required actions
2022-09-01T11:52:48.1725009Z Getting action download info
2022-09-01T11:52:48.3427133Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-09-01T11:52:48.3427441Z with:
2022-09-01T11:52:48.3427870Z   github-token: ***
2022-09-01T11:52:48.3428097Z env:
2022-09-01T11:52:48.3428341Z   GIT_DEFAULT_BRANCH: master
2022-09-01T11:52:48.3428607Z ##[endgroup]

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@mingfeima mingfeima requested a review from frank-wei September 2, 2022 01:34
@CaoE
Collaborator Author

CaoE commented Sep 2, 2022

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: Approval needed from one of the following (Rule 'superuser'):
swolchok, gtarjun, lazysjb, kurman, cccclai, ...

Details for Dev Infra team Raised by workflow job

@yanbing-j yanbing-j added the intel priority (matters to intel architecture from performance wise) and intel (This tag is for PR from Intel) labels Sep 7, 2022
@pytorch-bot

pytorch-bot bot commented Sep 8, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/84410

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures, 4 Pending

As of commit 68f02d2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@CaoE CaoE force-pushed the ecao/BN_fix branch 3 times, most recently from ec7a9ff to cd7ff8f Compare September 29, 2022 01:21
@CaoE
Collaborator Author

CaoE commented Sep 29, 2022

@frank-wei Could you please review this PR? Thank you.

@facebook-github-bot
Contributor

/easycla

As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details.

This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.

@linux-foundation-easycla

linux-foundation-easycla bot commented Oct 4, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: CaoE / name: Cao E (cd7ff8f5abc9f357c9014abfa09f4205e38776ce)

@pytorch-bot pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Oct 12, 2022
@CaoE
Collaborator Author

CaoE commented Oct 12, 2022

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: Approval needed from one of the following (Rule 'superuser'):
DanilBaibak, hlin09, ajitmaths, tenpercent, mikeiovine, ...

Details for Dev Infra team Raised by workflow job

@CaoE CaoE requested a review from malfet October 12, 2022 07:02
@CaoE
Collaborator Author

CaoE commented Oct 12, 2022

@malfet Could you please review this PR? It fixes a crash issue. Thank you.

Contributor

@malfet malfet left a comment

LGTM, would be nice to answer the question about explicit .cpu() calls

@malfet
Contributor

malfet commented Oct 12, 2022

Also, please note that a RuntimeError is not a crash but a structured exception that can be safely handled by the caller.
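
To illustrate the distinction, here is a minimal sketch of a caller handling such an exception. The names failing_forward and call_safely and the error message are hypothetical stand-ins, not PyTorch APIs; the point is only that the process keeps running after the exception is caught.

```python
def failing_forward(values):
    # Hypothetical stand-in for an operator call that fails an internal check.
    raise RuntimeError("INTERNAL ASSERT FAILED (illustrative message)")


def call_safely(fn, *args):
    """Call fn and return (result, None), or (None, message) if it raises RuntimeError."""
    try:
        return fn(*args), None
    except RuntimeError as err:
        # The exception is an ordinary Python object; the interpreter keeps running.
        return None, str(err)


result, message = call_safely(failing_forward, [1.0, 2.0])
print("result:", result)
print("handled error:", message)
```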

@CaoE
Collaborator Author

CaoE commented Oct 13, 2022

@malfet Thank you for your correction and kind help. May I know whether this PR will land in PyTorch 1.13?

@CaoE
Collaborator Author

CaoE commented Oct 13, 2022

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@github-actions
Contributor

Hey @CaoE.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@malfet malfet linked an issue Oct 13, 2022 that may be closed by this pull request
Labels

  • ciflow/trunk (Trigger trunk jobs on your pull request)
  • cla signed
  • intel priority (matters to intel architecture from performance wise)
  • intel (This tag is for PR from Intel)
  • Merged
  • open source
  • release notes: nn (release notes category)

Development

Successfully merging this pull request may close these issues.

RuntimeError caused by batchnorm
