@fegin fegin commented Mar 28, 2022

Stack from ghstack (oldest at bottom):

These pre/post hooks must be registered even if the FlatParamsWrapper does not flatten any parameters; the FlatParamsWrapper may contain some FSDP-wrapped submodules.
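
To illustrate why the guard matters, here is a minimal pure-Python sketch. It is a toy model, not the FSDP implementation: `ToyModule`, `ToyWrapper`, `strip_wrapper_prefix`, the `always_register` flag, and the `_wrapped.` prefix are all hypothetical names, and it assumes the post hook's job is to strip the wrapper's own key prefix from the `state_dict()` output.

```python
from collections import OrderedDict

# Hypothetical prefix: stands in for the attribute name under which a wrapper stores its module.
WRAPPED_PREFIX = "_wrapped."


class ToyModule:
    """Tiny stand-in for a module: named params, named children, post state-dict hooks."""

    def __init__(self, params=None, children=None):
        self.params = dict(params or {})
        self.children = dict(children or {})
        self.post_hooks = []

    def state_dict(self, prefix=""):
        out = OrderedDict()
        for name, value in self.params.items():
            out[prefix + name] = value
        for name, child in self.children.items():
            out.update(child.state_dict(prefix + name + "."))
        # Hooks run after this module and all of its children have been collected.
        for hook in self.post_hooks:
            out = hook(out, prefix)
        return out


def strip_wrapper_prefix(state_dict, prefix):
    """Post hook: drop the wrapper's own prefix so keys match the unwrapped module."""
    target = prefix + WRAPPED_PREFIX
    return OrderedDict(
        (prefix + key[len(target):] if key.startswith(target) else key, value)
        for key, value in state_dict.items()
    )


class ToyWrapper(ToyModule):
    def __init__(self, module, params_to_flatten, always_register):
        super().__init__(children={"_wrapped": module})
        # With always_register=False the hook is skipped when nothing is flattened,
        # which models the behavior this PR describes as wrong.
        if params_to_flatten or always_register:
            self.post_hooks.append(strip_wrapper_prefix)


# The wrapper flattens nothing itself, but the wrapped module still has a submodule with params.
inner = ToyModule(params={"weight": 1.0})
outer = ToyModule(children={"inner": inner})

buggy = ToyWrapper(outer, params_to_flatten=[], always_register=False)
fixed = ToyWrapper(outer, params_to_flatten=[], always_register=True)

print(list(buggy.state_dict()))  # wrapper prefix leaks into the keys
print(list(fixed.state_dict()))  # keys match the unwrapped module
```

In this toy, skipping the hook leaves the wrapper's prefix on the submodule's keys, while always registering it yields the same keys an unwrapped module would produce.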

Differential Revision: D35194483

facebook-github-bot commented Mar 28, 2022

💊 CI failures summary and remediations

As of commit 8864624 (more details on the Dr. CI page):


  • 4/4 failures introduced in this PR

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build pull / linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test (1/3)

Step: "Build"

2022-03-30T17:28:15.8736969Z echo "ERR...t available for the merge-base of your branch"
2022-03-30T17:28:15.8733684Z fi
2022-03-30T17:28:15.8733951Z # Covers the case where a previous tag doesn't exist for the tree
2022-03-30T17:28:15.8734335Z # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly
2022-03-30T17:28:15.8734693Z if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
2022-03-30T17:28:15.8735085Z   echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
2022-03-30T17:28:15.8735407Z   exit 1
2022-03-30T17:28:15.8735607Z fi
2022-03-30T17:28:15.8735868Z PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
2022-03-30T17:28:15.8736242Z # If no image exists but the hash is the same as the previous hash then we should error out here
2022-03-30T17:28:15.8736593Z if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
2022-03-30T17:28:15.8736969Z   echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
2022-03-30T17:28:15.8737435Z   echo "       contact the PyTorch team to restore the original images"
2022-03-30T17:28:15.8737710Z   exit 1
2022-03-30T17:28:15.8737911Z fi
2022-03-30T17:28:15.8738134Z echo ::set-output name=rebuild::yes
2022-03-30T17:28:15.8749079Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2022-03-30T17:28:15.8749339Z env:
2022-03-30T17:28:15.8749520Z   IN_CI: 1
2022-03-30T17:28:15.8749718Z   IS_GHA: 1
2022-03-30T17:28:15.8749938Z   GIT_DEFAULT_BRANCH: master
2022-03-30T17:28:15.8750195Z   BASE_REVISION: a2c33f2632523d1468b476d137bcd2589417e0cc

See GitHub Actions build pull / linux-xenial-cuda11.3-py3.7-gcc7 / build (2/3)

Step: "Build"

2022-03-30T17:28:12.0465593Z echo "ERR...t available for the merge-base of your branch"
2022-03-30T17:28:12.0462421Z fi
2022-03-30T17:28:12.0462659Z # Covers the case where a previous tag doesn't exist for the tree
2022-03-30T17:28:12.0463015Z # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly
2022-03-30T17:28:12.0463343Z if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
2022-03-30T17:28:12.0463700Z   echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
2022-03-30T17:28:12.0463994Z   exit 1
2022-03-30T17:28:12.0464162Z fi
2022-03-30T17:28:12.0464579Z PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
2022-03-30T17:28:12.0464928Z # If no image exists but the hash is the same as the previous hash then we should error out here
2022-03-30T17:28:12.0465249Z if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
2022-03-30T17:28:12.0465593Z   echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
2022-03-30T17:28:12.0465940Z   echo "       contact the PyTorch team to restore the original images"
2022-03-30T17:28:12.0466268Z   exit 1
2022-03-30T17:28:12.0466438Z fi
2022-03-30T17:28:12.0466629Z echo ::set-output name=rebuild::yes
2022-03-30T17:28:12.0476887Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2022-03-30T17:28:12.0477115Z env:
2022-03-30T17:28:12.0477266Z   IN_CI: 1
2022-03-30T17:28:12.0477430Z   IS_GHA: 1
2022-03-30T17:28:12.0477651Z   BASE_REVISION: a2c33f2632523d1468b476d137bcd2589417e0cc
2022-03-30T17:28:12.0478071Z   DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7:a2c09c6009bb8a10cbb45a8c5f7c573593289be0

See GitHub Actions build pull / deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build (3/3)

Step: "Build"

2022-03-30T17:28:11.8762575Z echo "ERR...t available for the merge-base of your branch"
2022-03-30T17:28:11.8759729Z fi
2022-03-30T17:28:11.8759960Z # Covers the case where a previous tag doesn't exist for the tree
2022-03-30T17:28:11.8760299Z # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly
2022-03-30T17:28:11.8760615Z if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
2022-03-30T17:28:11.8760958Z   echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
2022-03-30T17:28:11.8761236Z   exit 1
2022-03-30T17:28:11.8761397Z fi
2022-03-30T17:28:11.8761616Z PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
2022-03-30T17:28:11.8761942Z # If no image exists but the hash is the same as the previous hash then we should error out here
2022-03-30T17:28:11.8762248Z if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
2022-03-30T17:28:11.8762575Z   echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
2022-03-30T17:28:11.8762910Z   echo "       contact the PyTorch team to restore the original images"
2022-03-30T17:28:11.8763204Z   exit 1
2022-03-30T17:28:11.8763368Z fi
2022-03-30T17:28:11.8763616Z echo ::set-output name=rebuild::yes
2022-03-30T17:28:11.8773837Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2022-03-30T17:28:11.8774054Z env:
2022-03-30T17:28:11.8774195Z   IN_CI: 1
2022-03-30T17:28:11.8774358Z   IS_GHA: 1
2022-03-30T17:28:11.8774567Z   BASE_REVISION: a2c33f2632523d1468b476d137bcd2589417e0cc
2022-03-30T17:28:11.8774962Z   DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7:a2c09c6009bb8a10cbb45a8c5f7c573593289be0

1 failure not recognized by patterns:

Job: GitHub Actions pull / pytorch-xla-linux-bionic-py3.7-clang8 / test (xla, 1, 1, linux.2xlarge)
Step: Test

This comment was automatically generated by Dr. CI (expand for details).

@facebook-github-bot facebook-github-bot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Mar 28, 2022
fegin added a commit that referenced this pull request Mar 28, 2022
…_list is empty

Pull Request resolved: #74860

These pre/post hooks must be registered even if the FlatParamsWrapper does not flatten any parameters; any submodule inside FlatParamsWrapper should be pre/post processed by the hooks.
ghstack-source-id: 152390556

Differential Revision: [D35194483](https://our.internmc.facebook.com/intern/diff/D35194483/)
@rohan-varma rohan-varma left a comment

LGTM but can we add the test described in #74810?

@fegin fegin requested a review from awgu as a code owner March 29, 2022 22:42
fegin added a commit that referenced this pull request Mar 29, 2022
…_list is empty

Pull Request resolved: #74860

These pre/post hooks must be registered even if the FlatParamsWrapper does not flatten any parameters; any submodule inside FlatParamsWrapper should be pre/post processed by the hooks.
ghstack-source-id: 152526345

Differential Revision: [D35194483](https://our.internmc.facebook.com/intern/diff/D35194483/)
setattr(module, LINEAR_SKIP, linear_skip)
return fsdp, linear_skip_tensor_names

fsdp, linear_skip_tensor_names = _create_module()
linear_skip_tensor_names is unused. I think it is meant for checking unused parameters in the checkpoint; can you add a TODO here?

loss = fsdp(inp)
loss.sum().backward()

state_dict = fsdp.state_dict()
I think the overall goal in resolving the issue is to ensure that this state_dict can be loaded into a local module. I can try this and add a follow-up change, or feel free to do so if you have the time.
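
One way to picture the "loadable into a local module" goal is as a key comparison, in the spirit of a strict load check. The sketch below is pure Python with made-up data: `key_mismatch`, the key names, and the `_wrapped.` prefix are hypothetical and only model the idea; they are not taken from the test in this PR.

```python
def key_mismatch(local_keys, saved_keys):
    """Return (missing, unexpected) key lists for loading saved_keys into a local module."""
    local, saved = set(local_keys), set(saved_keys)
    missing = sorted(local - saved)      # keys the local module expects but did not get
    unexpected = sorted(saved - local)   # keys in the checkpoint the local module lacks
    return missing, unexpected


# Hypothetical key sets for a local (unwrapped) module and two saved state dicts.
local_keys = ["linear.weight", "linear.bias"]
clean_keys = ["linear.weight", "linear.bias"]
leaky_keys = ["_wrapped.linear.weight", "_wrapped.linear.bias"]

print(key_mismatch(local_keys, clean_keys))  # no mismatch: loadable as-is
print(key_mismatch(local_keys, leaky_keys))  # every key is both missing and unexpected
```

A follow-up test along these lines would presumably assert that both lists are empty for the state dict produced by the wrapped model.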

fegin added a commit that referenced this pull request Mar 30, 2022
…_list is empty

Pull Request resolved: #74860

These pre/post hooks must be registered even if the FlatParamsWrapper does not flatten any parameters; any submodule inside FlatParamsWrapper should be pre/post processed by the hooks.
ghstack-source-id: 152557128

Differential Revision: [D35194483](https://our.internmc.facebook.com/intern/diff/D35194483/)
fegin added a commit that referenced this pull request Mar 30, 2022
…_list is empty

Pull Request resolved: #74860

These pre/post hooks must be registered even if the FlatParamsWrapper does not flatten any parameters; any submodule inside FlatParamsWrapper should be pre/post processed by the hooks.
ghstack-source-id: 152594052

Differential Revision: [D35194483](https://our.internmc.facebook.com/intern/diff/D35194483/)
facebook-github-bot pushed a commit that referenced this pull request Mar 31, 2022
…_list is empty (#74860)

Summary:
Pull Request resolved: #74860

These pre/post hooks must be registered even if the FlatParamsWrapper does not flatten any parameters; any submodule inside FlatParamsWrapper should be pre/post processed by the hooks.
ghstack-source-id: 152594052

Test Plan: CI

Reviewed By: rohan-varma

Differential Revision: D35194483

fbshipit-source-id: c25d7846f317c7ce78d77d335d041fed8db8f3a1
@github-actions
Hey @fegin.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.
