Make the CUTLASS swizzle options configurable and default to 2. #146088
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/146088
Note: links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV; check whether your PR is affected.
❌ 1 New Failure: as of commit f8d8bc3 with merge base 354fe48, one job has failed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
From the diff under review:

```python
# This is mainly used to reduce test time in CI.
cutlass_max_profiling_configs: Optional[int] = None

# The L2 swizzle values to consider when profiling CUTLASS configs in max_autotune.
```
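The option added directly below that comment is truncated in this extract; as a sketch, assuming the name and default implied by the PR title rather than shown verbatim in the diff, the line plausibly reads:

```python
# Assumed reconstruction (name and [2] default inferred from the PR title,
# "Make the CUTLASS swizzle options configurable and default to 2"):
cutlass_max_profiling_swizzle_options: list[int] = [2]
```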
probably also mention what are good values to put in here
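(For illustration only, a minimal sketch of overriding the knob at runtime, assuming it lands next to cutlass_max_profiling_configs under the cuda namespace of torch._inductor.config:)

```python
import torch._inductor.config as inductor_config

# Trade longer autotuning for potentially better kernels by profiling more
# L2 swizzle values; the candidates discussed in this thread are 1, 2, 4, 8.
inductor_config.cuda.cutlass_max_profiling_swizzle_options = [1, 2, 4, 8]
```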
Chillee left a comment:
I'm not sure I quite understand this PR? Is it just to reduce compilation time?
@Chillee, yeah. For one data point, see: https://fburl.com/workplace/gx0zim0l
Profiling all four swizzle values also 4x's the number of configs, and that multiplier cannot be controlled by cutlass_max_profiling_configs. For example, even if you set cutlass_max_profiling_configs = 10, you will still be autotuning 40 configs.
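(A minimal sketch of that arithmetic, using the numbers from the comment above:)

```python
# Every candidate config is profiled once per swizzle value, so the swizzle
# list multiplies the autotuning work, and cutlass_max_profiling_configs
# does not cap the product.
max_profiling_configs = 10          # cutlass_max_profiling_configs
swizzle_options = [1, 2, 4, 8]      # profiling all four values
total_autotuned = max_profiling_configs * len(swizzle_options)
assert total_autotuned == 40        # 40 configs end up being autotuned
```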
@Chillee @masnesral can you land this just to recover the flaky test signal? Even if we want the default to be [1, 2, 4, 8], as long as it is configurable, we can fix that in the test.
I'll land as-is. I only chose '2' because @henrylhtsang suggested that in offline discussion. If someone can give insights on the "best" default, I'll gladly change it.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed: trunk / linux-focal-rocm6.3-py3.10 / test (distributed, 1, 1, linux.rocm.gpu.4). Raised by workflow job.
@pytorchbot merge -i
Merge started. Your change will be merged while ignoring the following check: trunk / linux-focal-rocm6.3-py3.10 / test (distributed, 1, 1, linux.rocm.gpu.4). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @amjames @desertfire @chauhang @aakhundov