Added requested_bytes to CUDA Caching Allocator Stats #88575
Conversation
This pull request was exported from Phabricator. Differential Revision: D40810674
zdevito left a comment:
It makes sense to record this stat. I have a few inline comments. I also think that there is code missing to handle resetting the statistic when all the other statistics are reset. I don't see tests for reading requested_bytes out of the block info.
Thank you for the review! I've addressed the inline comments, added tests for reading requested_bytes out of segment info/block info, and added code to reset the statistic when all the other statistics are reset.
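For context, a minimal sketch of what such a test might look like, assuming the stat is exposed through `torch.cuda.memory_snapshot()` under a `requested_size` key and is covered by the standard reset helpers (both assumptions inferred from the discussion above, not confirmed by this thread):

```python
import torch

# Hypothetical check that requested size is readable from segment/block info.
# The key names below are assumptions; adjust them to the actual snapshot schema.
x = torch.empty(1_000_000, device="cuda")
for segment in torch.cuda.memory_snapshot():
    for block in segment["blocks"]:
        if block["state"] == "active_allocated":
            # The requested size should never exceed the (possibly rounded) block size.
            assert block["requested_size"] <= block["size"]

# Resetting should cover the new stat alongside the existing ones.
del x
torch.cuda.reset_peak_memory_stats()         # peak counters -> current values
torch.cuda.reset_accumulated_memory_stats()  # allocated/freed counters -> 0
stats = torch.cuda.memory_stats()
assert stats["requested_bytes.all.peak"] == stats["requested_bytes.all.current"]
assert stats["requested_bytes.all.freed"] == 0
```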
zdevito left a comment:
Looks good. I have a couple of nits that should be addressed (missing block info export and test), but otherwise this looks good to me.
Summary: Pull Request resolved: pytorch#88575

The caching allocator can be configured to round memory allocations in order to reduce fragmentation. Sometimes, however, the overhead from rounding can be higher than the fragmentation it helps reduce.

We have added a new stat to CUDA caching allocator stats to help track whether rounding is adding too much overhead and to help tune the roundup_power2_divisions flag:
- "requested_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}": memory requested by client code; compare this with allocated_bytes to check whether allocation rounding adds too much overhead.

Test Plan: Added test case in caffe2/test/test_cuda.py

Differential Revision: D40810674

fbshipit-source-id: d71624c0173c1ca272a1a76acc2bc1ff022edfab
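A minimal sketch of how the new stat can be consumed, assuming a CUDA build that includes this change; the tensor size and the overhead calculation are illustrative, not part of the PR:

```python
import torch

# The rounding granularity can be tuned before the process starts, e.g.:
#   PYTORCH_CUDA_ALLOC_CONF=roundup_power2_divisions:4 python train.py

x = torch.empty(5_000_000, device="cuda")  # client requests ~20 MB of fp32

stats = torch.cuda.memory_stats()
requested = stats["requested_bytes.all.current"]
allocated = stats["allocated_bytes.all.current"]

# Bytes handed out beyond what client code actually asked for, i.e. the
# overhead introduced by allocation rounding.
print(f"requested={requested}, allocated={allocated}, "
      f"rounding overhead={allocated - requested}")
```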
@c-odrin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Follow-up: Memory usage increase after pytorch#88575
Memory usage increases after #88575; Docker crashes with exit code 137, which clearly means out of memory. Pull Request resolved: #94548. Approved by: https://github.com/seemethere