
Conversation

@emcastillo
Collaborator

@emcastillo emcastillo commented Nov 29, 2021

#61447 introduced a mechanism for performing functional calls on a model using the reparametrization API. However, the overhead it added to a single call was too large.
I first tried to address this by modifying the reparametrization code to support plain tensors, but the required changes were too extensive: type checking and several parts of the code expect actual nn.Module objects, so this option was not feasible.

To measure the overhead, I benchmarked resnet50, calling the functional API with a parameters dict covering 0, 25, 50, 75, and 100% of the model's total parameters.

Used script:
https://gist.github.com/emcastillo/f344a58638bd71d130c71c45f86f0c3a

| % of parameters passed | CPU time (µs) | GPU time (µs) |
| --- | --- | --- |
| regular call | 5539 | 184909 |
| 0 | 5561 | 184843 |
| 25 | 11363 | 189236 |
| 50 | 18716 | 195378 |
| 75 | 22851 | 198641 |
| 100 | 27441 | 202281 |

This PR simply swaps the `__getattr__` of the submodules so that, during the call, parameter lookups go through a dict holding only the replacement parameters. This greatly reduces the cost compared to instantiating custom modules and calling `forward` just to retrieve a tensor.
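The idea can be illustrated with a small torch-free sketch (the class and helper below are hypothetical stand-ins, not the actual PR code): a context manager temporarily patches `__getattr__` so attribute lookups are served from a plain dict of override parameters, then restores the original behavior on exit. Note that patching the class affects all its instances; the real implementation works per submodule.

```python
import contextlib

class ToyModule:
    """Stand-in for nn.Module: parameters live in a private dict."""
    def __init__(self, **params):
        self._parameters = dict(params)

    def __getattr__(self, name):
        # Fallback lookup path: serve registered parameters by name.
        try:
            return self._parameters[name]
        except KeyError:
            raise AttributeError(name)

@contextlib.contextmanager
def swap_parameters(module, overrides):
    """Hypothetical helper: temporarily serve lookups from `overrides`."""
    cls = type(module)
    original = cls.__getattr__

    def patched(self, name):
        if name in overrides:            # the passed-in parameter wins
            return overrides[name]
        return original(self, name)      # otherwise the module's own value

    cls.__getattr__ = patched
    try:
        yield module
    finally:
        cls.__getattr__ = original       # restore the class untouched

m = ToyModule(weight=1.0)
with swap_parameters(m, {"weight": 42.0}):
    assert m.weight == 42.0              # lookup served from the dict
assert m.weight == 1.0                   # original behavior restored
```

Because only the attribute lookup is redirected, no module objects are created and no extra `forward` calls happen, which is where the CPU savings come from.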

The execution times now are as follows:

| % of parameters passed | CPU time (µs) | GPU time (µs) |
| --- | --- | --- |
| regular call | 5939 | 187533 |
| 0 | 5899 | 187570 |
| 25 | 8541 | 188953 |
| 50 | 10045 | 189826 |
| 75 | 11049 | 190344 |
| 100 | 11911 | 190800 |
| functorch with 100% params | 14014 | 191727 |

The CPU-time overhead is now greatly reduced, and GPU time barely increases thanks to effective overlap with host-side work.
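For a rough sense of the improvement, the per-call CPU overhead at 100% of parameters (relative to each table's regular-call baseline) can be computed directly from the figures above:

```python
# Per-call CPU overhead versus the "regular call" row of each table,
# for the 100%-of-parameters case (all figures in microseconds).
overhead_before = 27441 - 5539   # old mechanism: 21902 µs
overhead_after = 11911 - 5939    # __getattr__ swap: 5972 µs
speedup = overhead_before / overhead_after
assert round(speedup, 1) == 3.7  # roughly 3.7x less added CPU cost
```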

cc @albanD @zou3519

@pytorch-probot

pytorch-probot bot commented Nov 29, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/emcastillo/pytorch/blob/dab2ca50474ecaa2271c59f8b9e0e5d55a221cb5/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:

```
# ciflow rerun; "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", equivalent to
# adding those labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow
```

For more information, please take a look at the CI Flow Wiki.

@facebook-github-bot
Contributor

facebook-github-bot commented Nov 29, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 4be2c41 (more details on the Dr. CI page):


  • 17/17 failures possibly* introduced in this PR
    • 1/17 non-scanned failure(s)

🕵️ 14 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-xenial-cuda11.3-py3.7-gcc7 / build (1/14)

Step: "Checkout PyTorch" (full log | diagnosis details | 🔁 rerun)

2022-01-28T02:06:03.6057822Z   GITHUB_TOKEN: ***
2022-01-28T02:06:03.6058009Z   AWS_DEFAULT_REGION: us-east-1
2022-01-28T02:06:03.6058182Z   PR_NUMBER: 68969
2022-01-28T02:06:03.6058388Z   SHA1: 4be2c41e0478f21ab98df793bb55b8496e65cace
2022-01-28T02:06:03.6058586Z   PYTORCH_RETRY_TEST_CASES: 1
2022-01-28T02:06:03.6058832Z   JOB_BASE_NAME: linux-xenial-cuda11.3-py3.7-gcc7-build
2022-01-28T02:06:03.6059060Z ##[endgroup]
2022-01-28T02:06:03.6090747Z /home/ec2-user/actions-runner/_work/_temp/6062ae10-9b59-49b9-9054-24f40d4b3cab.sh: line 1: .github/scripts/wait_for_ssh_to_drain.sh: No such file or directory
2022-01-28T02:06:03.6093399Z ##[error]Process completed with exit code 1.
2022-01-28T02:06:03.6108143Z ##[group]Run # Prune all of the docker images
2022-01-28T02:06:03.6108406Z # Prune all of the docker images
2022-01-28T02:06:03.6108646Z docker system prune -af
2022-01-28T02:06:03.6118679Z shell: /usr/bin/bash -e {0}
2022-01-28T02:06:03.6118881Z env:
2022-01-28T02:06:03.6119128Z   BUILD_ENVIRONMENT: linux-xenial-cuda11.3-py3.7-gcc7
2022-01-28T02:06:03.6119517Z   DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7
2022-01-28T02:06:03.6119944Z   SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
2022-01-28T02:06:03.6120267Z   XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla

The remaining 13 failures (2/14 through 14/14) hit the same error in the "Checkout PyTorch" step (`.github/scripts/wait_for_ssh_to_drain.sh: No such file or directory`) in the following builds:

- pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test
- linux-vulkan-bionic-py3.7-clang9 / build
- linux-xenial-py3.7-clang7-asan / build
- linux-xenial-py3.7-clang7-onnx / build
- linux-xenial-py3-clang5-mobile-custom-build-static / build
- linux-xenial-py3.7-gcc7 / build
- linux-bionic-py3.7-clang9 / build
- linux-docs / build
- linux-xenial-py3.7-gcc7-no-ops / build
- linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test
- linux-xenial-py3.7-gcc5.4 / build
- linux-xenial-py3-clang5-mobile-build / build
- pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test

2 failures not recognized by patterns:

- GitHub Actions win-vs2019-cuda11.3-py3 / build, step "Checkout PyTorch" (🔁 rerun)
- GitHub Actions win-vs2019-cpu-py3 / build, step "Checkout PyTorch" (🔁 rerun)

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@emcastillo emcastillo changed the title Add a lightweight reparametrization mechanism for functional calls Add lightweight reparametrization for _stateless calls Nov 29, 2021
@soulitzer soulitzer added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Nov 29, 2021
@zou3519 zou3519 self-requested a review November 30, 2021 23:33
Collaborator

@albanD albanD left a comment


Very interesting!
How does this compose with other parametrization now?

@emcastillo
Collaborator Author

emcastillo commented Dec 20, 2021

@albanD thanks!!!

How does this compose with other parametrization now?

I just fixed a small bug related to this. Currently the module maintains the original behavior: the parameter passed as an argument to the functional call is used instead of the original parametrization for that attribute. This is verified with `test_reparametrized_module`.

We can also add a mode in which the parametrizations are kept as they are and never replaced, or we can try to introspect the parametrizations and replace the parameter inside them, but that feels very hacky.
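The override behavior described above can be sketched in plain Python (toy stand-ins, no torch; `Submodule`, `make_stateless`, and `restore` are illustrative names, not the actual `_stateless` implementation): attribute lookups consult the passed-in dict first and fall back to the module's stored state.

```python
class Submodule:
    """Stand-in for an nn.Module holding a 'weight' attribute."""
    def __init__(self, weight):
        self.weight = weight


def make_stateless(module, overrides):
    """Swap the instance's class so attribute lookups in `overrides`
    resolve to the passed-in values instead of the stored ones."""
    original_cls = type(module)

    class _Stateless(original_cls):
        def __getattribute__(self, name):
            if name in overrides:
                return overrides[name]
            return object.__getattribute__(self, name)

    module.__class__ = _Stateless  # swap the class in place
    return original_cls            # keep it so we can restore later


def restore(module, original_cls):
    module.__class__ = original_cls


m = Submodule(weight=1.0)
orig = make_stateless(m, {"weight": 42.0})
assert m.weight == 42.0  # lookup hits the override dict
restore(m, orig)
assert m.weight == 1.0   # original state was never touched
```

The stored parameter is never modified, which is why the original module comes back intact after the functional call.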

@albanD albanD requested a review from Chillee December 21, 2021 12:53
@emcastillo
Collaborator Author

@Chillee @albanD guess you guys are still on holidays :). Let me know what you think of this when you have some spare time. Thanks!

@Chillee
Collaborator

Chillee commented Jan 4, 2022

@emcastillo I'll leave the actual review to Alban, but just wanted to say that this is awesome, and we'd be glad to change functorch to using this after our performance concerns have been resolved :)

Collaborator

@albanD albanD left a comment


Thanks for looking into this.
I do agree this is a solution that will work, even though we lose a little bit of flexibility: in particular, we enforce that stateless ignores all parametrizations, which was not the case before. It is an open question whether we want that, though.

I am still not convinced, though, that parametrization cannot be sped up to be similar to this, and I think doing so would be generally useful.

@Chillee is this perf improvement enough that you can use this for functorch? If so, we can add this as a temporary fix and then move back to parametrization when the perf gap there has been solved?

Collaborator


Is this a TODO?

Collaborator Author


yes! I should fix this comment

Collaborator


super().__getattribute__ here right?

Collaborator Author

@emcastillo emcastillo Jan 5, 2022


That led to infinite recursion :D, so I went with the base `object` method (probably I just did something wrong, as usual).

Collaborator Author


Tried again to fix it and it works now; it seems I was messing with something else originally :D

@emcastillo
Collaborator Author

emcastillo commented Jan 5, 2022

in particular we enforce that stateless ignores all parametrization which was not the case before. It is an open question if we want that though?

Actually, this is the current behavior on the master branch, if I am not mistaken: we just register a parametrization that acts as an identity function for the value of the parameter passed as a tensor. So in essence, we replace the previously registered parametrizations and return the parameter that we pass in the state dict.

I think it is possible to call the actual parametrization with this PR's approach; do we want that?
Maybe adding a kwarg `override_parametrization` to control this behavior is a good idea.

I am still not convinced though that parametrization cannot be sped up to be similar to this and I think it will be a generally useful thing to do.

I tried to change the parametrization code to support plain tensors along with modules, but all the typing annotations turned the code into a kludge; I ended up with this design because it does the same thing and is cleaner.
Wrapping every parameter in a `torch.nn.Module` and accessing it via `__call__` every time is overkill and the source of the performance degradation. Maybe allowing simple callables could ease the burden? I should try this and measure.

BTW, thanks for the comments! While writing this, I realized several alternatives for improving this that I hadn't considered before.

@emcastillo
Collaborator Author

emcastillo commented Jan 7, 2022

@albanD, I just pushed support for applying the existing parametrizations to a parameter via a kwarg. I think this should solve your main concern!

If we have a parametrization over `module.weight` and we pass a `parameters_and_buffers['weight']` value, with this option we will compute `parametrization(parameters_and_buffers['weight'])` instead of just blindly overwriting the former.
This allows us to use the parametrizations directly, instead of passing `parametrizations.weight.original` paths that may be hard to provide in complex modules.

Also, I cleaned up old comments and added type declarations.
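The resolution order just described can be sketched as a small lookup function (plain Python, no torch; the `apply_parametrization` flag and all names here are illustrative stand-ins for the kwarg added in this PR, not its real signature):

```python
def lookup(overrides, parametrizations, name, stored, apply_parametrization=False):
    """Resolve attribute `name`: prefer the override; optionally run the
    registered parametrization over the overridden value instead of
    bypassing it; fall back to the stored parameter otherwise."""
    if name in overrides:
        value = overrides[name]
        if apply_parametrization and name in parametrizations:
            return parametrizations[name](value)  # compose instead of overwrite
        return value
    return stored[name]


double = lambda w: 2 * w          # toy parametrization
stored = {"weight": 1.0}          # the module's own parameter
overrides = {"weight": 3.0}       # parameters_and_buffers passed by the caller
params = {"weight": double}       # registered parametrizations

assert lookup(overrides, params, "weight", stored) == 3.0   # blind overwrite
assert lookup(overrides, params, "weight", stored,
              apply_parametrization=True) == 6.0            # parametrization applied
assert lookup({}, params, "weight", stored) == 1.0          # no override: stored value
```

With the flag set, the caller supplies the unconstrained value and the parametrization still shapes it, matching the "use the parametrizations directly" behavior described above.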

@emcastillo emcastillo force-pushed the reparam-tensors branch 2 times, most recently from e6ff61e to dab2ca5 Compare January 7, 2022 06:35
@emcastillo
Collaborator Author

emcastillo commented Jan 27, 2022

@albanD @Chillee

Just measured the same example with functorch and these are the results

| % of parameters passed | CPU Time (us) | GPU Time (us) |
|------------------------|---------------|---------------|
| This PR with 100% of params changed | 11911 | 190800 |
| functorch              | 14014         | 191727        |

It seems that this PR is slightly faster.
The main difference is that functorch has the `make_functional` call, which creates a functional version of the module up front; after that, it just swaps the params on every functional call.

This PR re-creates the functional module on every call to avoid dealing with shared state, which was an initial requirement for #61447.
The execution time is pretty similar for a ResNet-50.

import cupyx
import torch
import torchvision.models as models
from functorch import make_functional_with_buffers


def main():
    model = models.resnet50(pretrained=True).cuda()
    func, params, buffers = make_functional_with_buffers(model)
    func.__name__ = 'resnet_func'
    x = torch.rand((128, 3, 224, 224)).cuda()
    print('Non functional call')
    print(cupyx.time.repeat(lambda: model(x), n_repeat=20))
    print('functional call')
    print(cupyx.time.repeat(func, (params, buffers, x), n_repeat=20))


if __name__ == "__main__":
    main()

@emcastillo
Collaborator Author

emcastillo commented Jan 27, 2022

Sorry, in the above comment I originally took the master `_stateless` times instead of this PR's. This has been corrected now.

Collaborator

@albanD albanD left a comment


The change looks good to me.
Just one question on whether we want the `apply_parametrizations` option, but that's it.

Collaborator


nit: false -> False

@Chillee
Collaborator

Chillee commented Jan 27, 2022

btw, this is a repro for the issue I was running into before.

(needs this PR to handle duplicate params: #71542)

import torch
from torch.nn.utils import _stateless
from transformers import AutoConfig, AutoModelForCausalLM, AutoModelForMaskedLM, AutoModelForSeq2SeqLM, ReformerConfig, BigBirdConfig, BertConfig

config = AutoConfig.from_pretrained("t5-small")
model =  AutoModelForSeq2SeqLM.from_config(config)
input_ids = torch.randint(0, config.vocab_size, (1, 128))
decoder_ids = torch.randint(0, config.vocab_size, (1, 128))

train_inputs = {'input_ids': input_ids, 'labels': decoder_ids}

params_and_buffers = {**dict(model.named_parameters(remove_duplicate=False)), **dict(model.named_buffers(remove_duplicate=False))}

_stateless.functional_call(model, params_and_buffers, (), train_inputs)

Currently throws

ValueError: Module Embedding(32128, 512) does not have a parametrization on weight

It would be great if we could verify this works now, but it's not blocking (since it's broken anyway right now on master...)

@albanD

@jbschlosser
Contributor

Would be great if we could verify this works now, but not blocking (since it's broken anyways right now on master...)

For posterity: this is broken on master because of the weight tying; calling `remove_parametrization()` removes both parametrizations, making a subsequent call to `remove_parametrization()` fail.

Since this PR changes the logic to avoid using the parametrization mechanism in `torch.nn.utils` underneath, I expect it to fix this issue. I definitely agree we should have a test for the weight-tied case once #71542 lands.
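The tied-weight situation can be illustrated without torch (toy stand-ins; `Embedding` here is a plain class, not `nn.Embedding`, and the dotted names are hypothetical): two parameter names resolve to the same underlying object, so a stateless override has to be applied consistently under both names, which is what `remove_duplicate=False` makes visible.

```python
class Embedding:
    """Toy module holding a 'weight' attribute."""
    def __init__(self, weight):
        self.weight = weight


shared = [0.5]  # one underlying "tensor", tied between two modules
enc, dec = Embedding(shared), Embedding(shared)

# remove_duplicate=False analog: both dotted paths appear in the dict,
# and both point at the same underlying object
named = {"encoder.weight": enc.weight, "decoder.weight": dec.weight}
assert named["encoder.weight"] is named["decoder.weight"]

# an override must replace the shared object under every name that
# references it, or the tying silently breaks for one of the paths
override = [9.9]
for name, tensor in list(named.items()):
    if tensor is shared:
        named[name] = override

assert named["encoder.weight"] is named["decoder.weight"]
```

A name-level override like this never mutates the modules themselves, mirroring how the functional call leaves the original parameters untouched.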

Emilio Castillo added 2 commits January 28, 2022 00:42
@emcastillo
Collaborator Author

emcastillo commented Jan 28, 2022

@Chillee @jbschlosser I just tested this PR together with #71542 on the code snippet above, and I confirm the error is gone!
I also double-checked that the error still appears with #71542 alone :).

@emcastillo
Collaborator Author

@albanD, review comment addressed! this should be ready to ship :)
Thanks!

Collaborator

@albanD albanD left a comment


Thanks for the update!

@facebook-github-bot
Contributor

@albanD has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@albanD
Collaborator

albanD commented Jan 28, 2022

@pytorchbot ciflow rerun

@pytorch-bot

pytorch-bot bot commented Jan 28, 2022

This command didn't do anything.
You don't need to manually issue ciflow rerun commands anymore. Just adding a ciflow/ label will trigger the workflow.

facebook-github-bot pushed a commit that referenced this pull request Jan 28, 2022
Summary:
#61447 introduced a mechanism for performing functional calls in a model using the reparametrization API. However, the overhead introduced in a single call was too large.
I tried to address this by modifying the reparametrization code to support plain tensors, but the changes needed were too large due to type checking and several parts of the code expecting actual `nn.Module` objects, so this option was not feasible.

The benchmark uses resnet50 and calls the functional API with a parameters dict covering 0, 25, 50, 75, and 100% of the model's total parameters.

Used script:
https://gist.github.com/emcastillo/f344a58638bd71d130c71c45f86f0c3a

| % of parameters passed | CPU Time (us) | GPU Time (us) |
|------------------------|---------------|---------------|
| regular call           | 5539          | 184909        |
| 0                      | 5561          | 184843        |
| 25                     | 11363         | 189236        |
| 50                     | 18716         | 195378        |
| 75                     | 22851         | 198641        |
| 100                    | 27441         | 202281        |

This PR just swaps the `__getattr__` of the submodules to look into a dict holding only the parameters when called, greatly reducing the burden of having to instantiate custom modules and call forward just to retrieve a tensor.

The execution times now are as follows:

| % of parameters passed | CPU Time (us) | GPU Time (us) |
|----------------------------|---------------|---------------|
| regular call               | 5939          | 187533        |
| 0                          | 5899          | 187570        |
| 25                         | 8541          | 188953        |
| 50                         | 10045         | 189826        |
| 75                         | 11049         | 190344        |
| 100                        | 11911         | 190800        |
| functorch with 100% params | 14014         | 191727        |

Now we see that the CPU time overhead is greatly reduced and the GPU time barely increases due to the effective overlap.

cc albanD zou3519

Pull Request resolved: #68969

Reviewed By: george-qi

Differential Revision: D33836360

Pulled By: albanD

fbshipit-source-id: 532561f64b18ca14c6ae2d77dcacb339397a589d
pytorchmergebot pushed a commit that referenced this pull request Jan 28, 2022
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Feb 3, 2022
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Feb 3, 2022
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Feb 9, 2022
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Feb 9, 2022