
[c10d][Sym mem] Add set_signal_pad_size API for SymmetricMemory#169156

Closed
yang-yu-hang wants to merge 1 commit into pytorch:main from yang-yu-hang:add-signal-pad-size-api

Conversation

@yang-yu-hang
Contributor

Summary:
The signal pad size for symmetric memory was previously hardcoded as a constexpr, which may be too small for workloads that launch a large number of blocks. This change exposes `set_signal_pad_size` and `get_signal_pad_size` APIs to allow users to configure the signal pad size before making symmetric memory allocations.

### Changes:

#### 1. Core API (C++)

- Renamed the `signal_pad_size` constexpr to `default_signal_pad_size` in CUDASymmetricMemoryTypes.hpp
- Added `get_signal_pad_size()` and `set_signal_pad_size(size_t)` function declarations in CUDASymmetricMemoryTypes.hpp
- Implemented the functions in SymmetricMemory.cpp using `std::optional<size_t>` to distinguish between the default and a user-configured value
- Added `TORCH_API` exports in SymmetricMemory.hpp for public API access
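The default-versus-override logic described above can be sketched as follows. This is an illustrative Python rendering of the pattern, not the actual C++ implementation; `DEFAULT_SIGNAL_PAD_SIZE` is a placeholder value, not the real constant from CUDASymmetricMemoryTypes.hpp:

```python
from typing import Optional

# Placeholder for the `default_signal_pad_size` constexpr; the real value
# lives in CUDASymmetricMemoryTypes.hpp.
DEFAULT_SIGNAL_PAD_SIZE = 2048

# Mirrors the std::optional<size_t> in SymmetricMemory.cpp: None means
# "not configured by the user", so the default is returned.
_configured_signal_pad_size: Optional[int] = None


def set_signal_pad_size(size: int) -> None:
    """Record a user-configured signal pad size (must be positive)."""
    global _configured_signal_pad_size
    if size <= 0:
        raise ValueError("signal pad size must be positive")
    _configured_signal_pad_size = size


def get_signal_pad_size() -> int:
    """Return the user-configured size, falling back to the default."""
    if _configured_signal_pad_size is None:
        return DEFAULT_SIGNAL_PAD_SIZE
    return _configured_signal_pad_size
```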

#### 2. Backend Updates

- Updated CUDASymmetricMemory.cu to call `get_signal_pad_size()` instead of using the hardcoded constant
- Updated NCCLSymmetricMemory.cu to read the configurable signal pad size into a local variable `signal_pad_size`
- Updated NVSHMEMSymmetricMemory.cu to read the configurable signal pad size into a local variable `signal_pad_size`
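The allocation-time behavior this implies can be sketched as follows. This is a self-contained Python illustration; the real backends are C++/CUDA, and `allocate_symmetric` plus the 2048-byte default are hypothetical stand-ins:

```python
# Hypothetical stand-ins, for illustration only.
_DEFAULT = 2048
_configured = None


def set_signal_pad_size(size: int) -> None:
    global _configured
    _configured = size


def get_signal_pad_size() -> int:
    return _DEFAULT if _configured is None else _configured


def allocate_symmetric(nbytes: int) -> dict:
    # Each backend reads the *current* pad size into a local variable at
    # allocation time, so buffers allocated earlier keep the size that was
    # in effect when they were created.
    signal_pad_size = get_signal_pad_size()
    return {"data_bytes": nbytes, "signal_pad_bytes": signal_pad_size}
```

Under this reading, calling `set_signal_pad_size` after an allocation only affects subsequent allocations, which is why the summary says to configure the size before making symmetric memory allocations.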

#### 3. Python Bindings

- Added Python bindings in init.cpp with comprehensive docstrings explaining usage
- Added Python wrapper functions in torch/distributed/_symmetric_memory/__init__.py
- Updated `__all__` to export the new API functions

#### 4. Tests

- Added `test_get_signal_pad_size()` to verify that the API returns a positive integer and that the Python and C++ values agree
- Added `test_set_signal_pad_size()` to verify setting, getting, and restoring signal pad size values
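The set/get/restore sequence exercised by `test_set_signal_pad_size()` typically follows a save-and-restore pattern. A minimal sketch with self-contained stand-ins (the real tests call the torch wrappers, and the 2048-byte default here is a placeholder):

```python
# Hypothetical stand-ins so the snippet is self-contained.
_DEFAULT = 2048
_configured = None


def set_signal_pad_size(size: int) -> None:
    global _configured
    _configured = size


def get_signal_pad_size() -> int:
    return _DEFAULT if _configured is None else _configured


# Save the current value, override it for the test body, and restore it in
# `finally` so a failing assertion cannot leak the override into later tests.
original = get_signal_pad_size()
try:
    set_signal_pad_size(256 * 1024)
    assert get_signal_pad_size() == 256 * 1024
finally:
    set_signal_pad_size(original)
```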

Test Plan:
`PYTHONPATH=. python3 test/distributed/test_symmetric_memory.py`

@pytorch-bot

pytorch-bot bot commented Nov 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/169156

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit baf585a with merge base d900f5e:

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@yang-yu-hang yang-yu-hang force-pushed the add-signal-pad-size-api branch 2 times, most recently from 51db3dc to 5788dec Compare November 27, 2025 22:20
@yang-yu-hang yang-yu-hang marked this pull request as ready for review November 27, 2025 23:59
@yang-yu-hang yang-yu-hang force-pushed the add-signal-pad-size-api branch from 5788dec to 2596b3a Compare November 30, 2025 04:19
Collaborator

@ngimel ngimel left a comment


Looks good, a few minor comments. Thanks!

@yang-yu-hang yang-yu-hang force-pushed the add-signal-pad-size-api branch 2 times, most recently from ce459a8 to 577f9a6 Compare December 1, 2025 21:16
@yang-yu-hang yang-yu-hang force-pushed the add-signal-pad-size-api branch from 577f9a6 to 1d835c1 Compare December 1, 2025 21:47
@yang-yu-hang
Contributor Author

@pytorchmergebot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 2, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 jobs have failed; the first few of them are: trunk / win-vs2022-cpu-py3 / build, trunk / win-vs2022-cuda12.8-py3 / build

Details for Dev Infra team Raised by workflow job

@yang-yu-hang
Contributor Author

@pytorchmergebot merge -r

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased add-signal-pad-size-api onto refs/remotes/origin/viable/strict; please pull locally before adding more changes (for example, via `git checkout add-signal-pad-size-api && git pull --rebase`)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 jobs have failed; the first few of them are: trunk / win-vs2022-cpu-py3 / build, trunk / win-vs2022-cuda12.8-py3 / build

Details for Dev Infra team Raised by workflow job

@yang-yu-hang yang-yu-hang added the keep-going Don't stop on first failure, keep running tests until the end label Dec 2, 2025
@yang-yu-hang
Contributor Author

@pytorchmergebot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: trunk / linux-jammy-rocm-py3.10 / test (default, 4, 6, linux.rocm.gpu.gfx942.1, unstable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@yang-yu-hang
Contributor Author

@pytorchmergebot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: trunk / linux-jammy-rocm-py3.10 / test (default, 4, 6, linux.rocm.gpu.gfx942.1, unstable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 job has failed: Limited CI for symmetric memory tests on H100 / linux-jammy-cuda12.8-py3.10-gcc11-sm90-symm / test (h100-symm-mem, 1, 1, linux.aws.h100.4)

Details for Dev Infra team Raised by workflow job

@yang-yu-hang yang-yu-hang force-pushed the add-signal-pad-size-api branch from a8d202a to baf585a Compare December 4, 2025 03:30
@meta-codesync

meta-codesync bot commented Dec 4, 2025

@yang-yu-hang has imported this pull request. If you are a Meta employee, you can view this in D88278682.

@yang-yu-hang
Contributor Author

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@yang-yu-hang
Contributor Author

@pytorchmergebot merge -r

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Tried to rebase and push PR #169156, but it was already up to date. Try rebasing against main by issuing:
@pytorchbot rebase -b main

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@ngimel
Collaborator

ngimel commented Dec 4, 2025

@pytorchbot merge -f "test failures unrelated"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f only as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures; this allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@yang-yu-hang yang-yu-hang deleted the add-signal-pad-size-api branch December 4, 2025 21:34
umechand-amd pushed a commit to ROCm/pytorch that referenced this pull request Dec 8, 2025
JacobSzwejbka pushed a commit that referenced this pull request Dec 8, 2025

Labels

- ciflow/h100-symm-mem
- ciflow/trunk (Trigger trunk jobs on your pull request)
- keep-going (Don't stop on first failure, keep running tests until the end)
- Merged
- release notes: distributed (c10d)
