Add lightweight reparametrization for _stateless calls #68969
Conversation
💊 CI failures summary: as of commit 4be2c41, Dr. CI recognized 14 new failures by patterns; they do not appear to be due to upstream breakages (more details on the Dr. CI page). ci.pytorch.org: 1 failed.
albanD left a comment:
Very interesting!
How does this compose with other parametrizations now?
(force-pushed b6103e8 to 1a9c2b9)
@albanD thanks!!!
I just fixed a small bug related to this. Currently the module maintains the original behavior: the parameter passed as an argument to the functional call is used instead of the original parametrization for that attribute. We could also add a mode in which the parametrizations are kept as they are and never replaced, or we could try to introspect the parametrizations and replace the parameter inside, but that feels very hacky.
@emcastillo I'll leave the actual review to Alban, but just wanted to say that this is awesome, and we'd be glad to change functorch to use this after our performance concerns have been resolved :)
albanD left a comment:
Thanks for looking into this.
I agree this is a solution that will work, even though we lose a little bit of flexibility: in particular, we now enforce that stateless ignores all parametrizations, which was not the case before. It is an open question whether we want that, though.
I am still not convinced that parametrization cannot be sped up to a similar level, and I think that would be a generally useful thing to do.
@Chillee is this perf improvement enough that you can use this for functorch? If so, we can add this as a temporary fix and then move back to parametrization when the perf gap there has been solved.
torch/nn/utils/_stateless.py (Outdated)
Is this a TODO?

yes! I should fix this comment
torch/nn/utils/_stateless.py (Outdated)
super().__getattribute__ here, right?

that led to infinite recursion :D, so I went with the base object method (probably I just did something wrong, as usual).

tried again to fix it and it worked now; seems I was messing with something else originally :D
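The recursion pitfall discussed above is easy to reproduce outside torch. Here is a minimal sketch (all class and attribute names are hypothetical, not the PR's actual code) of why delegating to the base `object.__getattribute__` terminates the lookup instead of recursing:

```python
class ParamProxy:
    """Toy stand-in for an object whose attribute lookups are intercepted."""

    def __init__(self, params):
        # `self._params = params` would re-enter our hooks, so bypass them
        object.__setattr__(self, "_params", params)

    def __getattribute__(self, name):
        params = object.__getattribute__(self, "_params")
        if name in params:
            return params[name]
        # Using `getattr(self, name)` here would re-enter __getattribute__
        # and recurse forever; delegating to the base object implementation
        # breaks the cycle.
        return object.__getattribute__(self, name)


proxy = ParamProxy({"weight": 3.0})
```

Looking up `proxy.weight` is served from the dict, while `proxy._params` falls through to the normal instance attribute.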
Actually, this is the current behavior on the master branch, if I am not mistaken: we just register a parametrization that acts like an identity function for the value of the parameter passed as a tensor. So in essence, we are replacing the previously registered parametrizations and returning the parameter that we pass in the state dict. I think it is possible to call the actual parametrizations with this PR's approach; do we want that?
I tried to change the parametrization code to support sparse tensors along with modules, but all the typing annotations made the code a kludge; I ended up with this design because it does the same thing and is cleaner. BTW, thanks for the comments! While writing this, I realized several alternatives to improve this that I hadn't considered before.
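To make the "identity parametrization" point concrete, here is a hedged pure-Python model of the behavior described above (this is not the actual `torch.nn.utils.parametrize` code; both classes are hypothetical): the override registered by the stateless call ignores the stored raw value and always returns the externally supplied one.

```python
class DoubleParametrization:
    """Toy parametrization already registered on a module: real = 2 * raw."""
    def forward(self, raw):
        return 2 * raw


class IdentityOverride:
    """What the stateless call effectively registers: ignore the raw value
    and always return the externally passed parameter."""
    def __init__(self, value):
        self.value = value

    def forward(self, raw):
        return self.value


raw_weight = 5.0
existing = DoubleParametrization()
override = IdentityOverride(7.0)

before = existing.forward(raw_weight)   # attribute computed by the parametrization
during = override.forward(raw_weight)   # raw value ignored, supplied value returned
```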
(force-pushed 1a9c2b9 to 6be2f19)
@albanD, I just pushed support to apply the existing parametrizations to a parameter via a kwarg. I think this should solve your main concern! If we have a parametrization over … Also, I cleaned up old comments and added type declarations.
(force-pushed e6ff61e to dab2ca5)
Just measured the same example with functorch, and these are the results.
It seems that this PR is slightly faster. This PR re-creates the functional module on every call to avoid dealing with shared state, an initial requirement from #61447.

```python
import cupyx
import torch
import torchvision.models as models
from functorch import make_functional_with_buffers

def main():
    model = models.resnet50(pretrained=True).cuda()
    func, params, buffers = make_functional_with_buffers(model)
    func.__name__ = 'resnet_func'
    x = torch.rand((128, 3, 224, 224)).cuda()
    print('Non functional call')
    print(cupyx.time.repeat(lambda: model(x), n_repeat=20))
    print('functional call')
    print(cupyx.time.repeat(func, (params, buffers, x), n_repeat=20))

if __name__ == "__main__":
    main()
```
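For readers without CUDA/CuPy, a rough CPU-only stand-in for `cupyx.time.repeat` can be built from the standard library (the helper name is hypothetical; note it performs no GPU synchronization, so it is only meaningful for CPU workloads):

```python
import timeit

def repeat_us(fn, n_repeat=20):
    """Return (best, mean) wall time of fn in microseconds over n_repeat runs."""
    times = timeit.repeat(fn, number=1, repeat=n_repeat)
    return min(times) * 1e6, (sum(times) / len(times)) * 1e6

best, mean = repeat_us(lambda: sum(range(10_000)))
```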
Sorry, for the above comment I took the master …
albanD left a comment:
The change looks good to me.
Just one question on whether we want the apply_parametrizations, but that's it.
torch/nn/utils/_stateless.py (Outdated)
nit: false -> False
btw, this is a repro for the issue I was running into before (needs this PR to handle duplicate params: #71542). Currently it throws an error. Would be great if we could verify this works now, but not blocking (since it's broken anyway right now on master...)
For posterity: this is broken on master because of the weight tying: calling … Since this PR changes the logic to avoid using the parametrization mechanism in torch.nn.utils underneath, I expect it to fix this issue. Definitely agree we should have a test for the weight-tied case once #71542 lands.
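Weight tying means two parameter names alias the same underlying tensor object. A small sketch (the helper is hypothetical) of detecting tied parameters by object identity, which is what a duplicate-aware traversal like the one in #71542 has to account for:

```python
def find_tied(named_params):
    """Group parameter names that refer to the same underlying object."""
    by_id = {}
    for name, p in named_params:
        by_id.setdefault(id(p), []).append(name)
    return [names for names in by_id.values() if len(names) > 1]


shared = [1.0, 2.0]  # stand-in for a shared weight tensor
params = [("embed.weight", shared), ("decoder.weight", shared), ("bias", [0.0])]
tied = find_tied(params)
```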
@Chillee @jbschlosser I just tested this PR together with #71542 in the code snippet above, and I confirm the error is gone!
(force-pushed dab2ca5 to 58a45c1)
@albanD, review comment addressed! This should be ready to ship :)
albanD left a comment:
Thanks for the update!
@albanD has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorchbot ciflow rerun

This command didn't do anything.
Summary: #61447 introduced a mechanism for performing functional calls on a model using the reparametrization API. However, the overhead introduced in a single call was too large. I tried to address this by modifying the reparametrization code to support sparse tensors, but the changes needed were too large due to type checking and several parts of the code expecting actual `nn.Module` objects, so this option was not feasible.

Benchmark: resnet50, calling the functional API with a parameters dict covering 0, 25, 50, 75, and 100% of the model's total parameters. Used script: https://gist.github.com/emcastillo/f344a58638bd71d130c71c45f86f0c3a

| % of parameters passed | CPU Time (us) | GPU Time (us) |
|------------------------|---------------|---------------|
| regular call | 5539 | 184909 |
| 0 | 5561 | 184843 |
| 25 | 11363 | 189236 |
| 50 | 18716 | 195378 |
| 75 | 22851 | 198641 |
| 100 | 27441 | 202281 |

This PR just swaps the `__getattr__` of the submodules to look into a dict holding only the parameters when called, greatly reducing the burden of having to instantiate custom modules and call forward just to retrieve a tensor. The execution times now are as follows:

| % of parameters passed | CPU Time (us) | GPU Time (us) |
|------------------------|---------------|---------------|
| regular call | 5939 | 187533 |
| 0 | 5899 | 187570 |
| 25 | 8541 | 188953 |
| 50 | 10045 | 189826 |
| 75 | 11049 | 190344 |
| 100 | 11911 | 190800 |
| functorch with 100% params | 14014 | 191727 |

Now we see that the CPU time overhead is greatly reduced, and the GPU time barely increases due to the effective overlap.

cc albanD zou3519

Pull Request resolved: #68969
Reviewed By: george-qi
Differential Revision: D33836360
Pulled By: albanD
fbshipit-source-id: 532561f64b18ca14c6ae2d77dcacb339397a589d
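The core idea of the summary — temporarily redirecting attribute lookup so parameters come from a caller-supplied dict — can be sketched without torch. All names below are hypothetical; the real implementation patches submodules' attribute access with more care (per-module, handling buffers, etc.), so treat this as a minimal model, not the PR's code:

```python
import contextlib

class TinyModule:
    """Minimal nn.Module stand-in: parameters live in a dict (like _parameters)."""
    def __init__(self):
        self._parameters = {"weight": 1.0}

    def __getattr__(self, name):
        # invoked only when normal lookup fails, mirroring nn.Module
        params = object.__getattribute__(self, "_parameters")
        if name in params:
            return params[name]
        raise AttributeError(name)

    def forward(self, x):
        return x * self.weight

@contextlib.contextmanager
def reparametrize(module, overrides):
    """Serve attribute lookups from `overrides` for the duration of one call.

    Note this patches the class, so it affects all instances of that class
    while active; a sketch-level simplification."""
    cls = type(module)
    original = cls.__getattr__
    def patched(self, name):
        if name in overrides:
            return overrides[name]
        return original(self, name)
    cls.__getattr__ = patched
    try:
        yield module
    finally:
        cls.__getattr__ = original  # restore, so no shared state leaks out

m = TinyModule()
before = m.forward(2)                      # uses the stored parameter
with reparametrize(m, {"weight": 10.0}):
    during = m.forward(2)                  # uses the supplied parameter
after = m.forward(2)                       # original behavior restored
```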