
[c10d][Sym mem] Add set_signal_pad_size API for SymmetricMemory#169156

Closed
yang-yu-hang wants to merge 1 commit into pytorch:main from yang-yu-hang:add-signal-pad-size-api

Conversation

@yang-yu-hang
Contributor

Summary:
The signal pad size for symmetric memory was previously hardcoded as a constexpr, which may be too small for workloads that launch a large number of blocks. This change exposes `set_signal_pad_size` and `get_signal_pad_size` APIs to allow users to configure the signal pad size before making symmetric memory allocations.

### Changes:

#### 1. Core API (C++)

- Renamed the `signal_pad_size` constexpr to `default_signal_pad_size` in CUDASymmetricMemoryTypes.hpp
- Added `get_signal_pad_size()` and `set_signal_pad_size(size_t)` function declarations in CUDASymmetricMemoryTypes.hpp
- Implemented the functions in SymmetricMemory.cpp using `std::optional<size_t>` to distinguish between the default and a user-configured value
- Added `TORCH_API` exports in SymmetricMemory.hpp for public API access
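The default-versus-override logic described above can be sketched as follows. This is an illustrative Python rendering of the pattern, not the actual C++ implementation; `DEFAULT_SIGNAL_PAD_SIZE` is a placeholder value, not the real constant from CUDASymmetricMemoryTypes.hpp:

```python
from typing import Optional

# Placeholder for the `default_signal_pad_size` constexpr; the real value
# lives in CUDASymmetricMemoryTypes.hpp.
DEFAULT_SIGNAL_PAD_SIZE = 2048

# Mirrors the std::optional<size_t> in SymmetricMemory.cpp: None means
# "not configured by the user", so the default is returned.
_configured_signal_pad_size: Optional[int] = None


def set_signal_pad_size(size: int) -> None:
    """Record a user-configured signal pad size (must be positive)."""
    global _configured_signal_pad_size
    if size <= 0:
        raise ValueError("signal pad size must be positive")
    _configured_signal_pad_size = size


def get_signal_pad_size() -> int:
    """Return the user-configured size, falling back to the default."""
    if _configured_signal_pad_size is None:
        return DEFAULT_SIGNAL_PAD_SIZE
    return _configured_signal_pad_size
```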

#### 2. Backend Updates

- Updated CUDASymmetricMemory.cu to call `get_signal_pad_size()` instead of using the hardcoded constant
- Updated NCCLSymmetricMemory.cu to read the configurable signal pad size into a local variable `signal_pad_size`
- Updated NVSHMEMSymmetricMemory.cu to read the configurable signal pad size into a local variable `signal_pad_size`
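The allocation-time behavior this implies can be sketched as follows. This is a self-contained Python illustration; the real backends are C++/CUDA, and `allocate_symmetric` plus the 2048-byte default are hypothetical stand-ins:

```python
# Hypothetical stand-ins, for illustration only.
_DEFAULT = 2048
_configured = None


def set_signal_pad_size(size: int) -> None:
    global _configured
    _configured = size


def get_signal_pad_size() -> int:
    return _DEFAULT if _configured is None else _configured


def allocate_symmetric(nbytes: int) -> dict:
    # Each backend reads the *current* pad size into a local variable at
    # allocation time, so buffers allocated earlier keep the size that was
    # in effect when they were created.
    signal_pad_size = get_signal_pad_size()
    return {"data_bytes": nbytes, "signal_pad_bytes": signal_pad_size}
```

Under this reading, calling `set_signal_pad_size` after an allocation only affects subsequent allocations, which is why the summary says to configure the size before making symmetric memory allocations.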

#### 3. Python Bindings

- Added Python bindings in init.cpp with comprehensive docstrings explaining usage
- Added Python wrapper functions in torch/distributed/_symmetric_memory/__init__.py
- Updated `__all__` to export the new API functions

#### 4. Tests

- Added `test_get_signal_pad_size()` to verify that the API returns a positive integer and that the Python and C++ values agree
- Added `test_set_signal_pad_size()` to verify setting, getting, and restoring signal pad size values
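The set/get/restore sequence exercised by `test_set_signal_pad_size()` typically follows a save-and-restore pattern. A minimal sketch with self-contained stand-ins (the real tests call the torch wrappers, and the 2048-byte default here is a placeholder):

```python
# Hypothetical stand-ins so the snippet is self-contained.
_DEFAULT = 2048
_configured = None


def set_signal_pad_size(size: int) -> None:
    global _configured
    _configured = size


def get_signal_pad_size() -> int:
    return _DEFAULT if _configured is None else _configured


# Save the current value, override it for the test body, and restore it in
# `finally` so a failing assertion cannot leak the override into later tests.
original = get_signal_pad_size()
try:
    set_signal_pad_size(256 * 1024)
    assert get_signal_pad_size() == 256 * 1024
finally:
    set_signal_pad_size(original)
```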

Test Plan:
`PYTHONPATH=. python3 test/distributed/test_symmetric_memory.py`

@pytorch-bot

pytorch-bot bot commented Nov 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/169156

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit baf585a with merge base d900f5e:

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@yang-yu-hang yang-yu-hang force-pushed the add-signal-pad-size-api branch 2 times, most recently from 51db3dc to 5788dec Compare November 27, 2025 22:20
@yang-yu-hang yang-yu-hang marked this pull request as ready for review November 27, 2025 23:59
@yang-yu-hang yang-yu-hang force-pushed the add-signal-pad-size-api branch from 5788dec to 2596b3a Compare November 30, 2025 04:19
Collaborator

@ngimel ngimel left a comment


Looks good, a few minor comments. Thanks!

@yang-yu-hang yang-yu-hang force-pushed the add-signal-pad-size-api branch 2 times, most recently from ce459a8 to 577f9a6 Compare December 1, 2025 21:16
@yang-yu-hang yang-yu-hang force-pushed the add-signal-pad-size-api branch from 577f9a6 to 1d835c1 Compare December 1, 2025 21:47
@yang-yu-hang
Contributor Author

@pytorchmergebot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 2, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 jobs have failed; the first few of them are: trunk / win-vs2022-cpu-py3 / build, trunk / win-vs2022-cuda12.8-py3 / build

Details for Dev Infra team Raised by workflow job

@yang-yu-hang
Contributor Author

@pytorchmergebot merge -r

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased add-signal-pad-size-api onto refs/remotes/origin/viable/strict; please pull locally before adding more changes (for example, via `git checkout add-signal-pad-size-api && git pull --rebase`)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 jobs have failed; the first few of them are: trunk / win-vs2022-cpu-py3 / build, trunk / win-vs2022-cuda12.8-py3 / build

Details for Dev Infra team Raised by workflow job

@yang-yu-hang yang-yu-hang added the keep-going Don't stop on first failure, keep running tests until the end label Dec 2, 2025
@yang-yu-hang
Contributor Author

@pytorchmergebot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: trunk / linux-jammy-rocm-py3.10 / test (default, 4, 6, linux.rocm.gpu.gfx942.1, unstable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@yang-yu-hang
Contributor Author

@pytorchmergebot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: trunk / linux-jammy-rocm-py3.10 / test (default, 4, 6, linux.rocm.gpu.gfx942.1, unstable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 job has failed: Limited CI for symmetric memory tests on H100 / linux-jammy-cuda12.8-py3.10-gcc11-sm90-symm / test (h100-symm-mem, 1, 1, linux.aws.h100.4)

Details for Dev Infra team Raised by workflow job

@yang-yu-hang yang-yu-hang force-pushed the add-signal-pad-size-api branch from a8d202a to baf585a Compare December 4, 2025 03:30
@meta-codesync

meta-codesync bot commented Dec 4, 2025

@yang-yu-hang has imported this pull request. If you are a Meta employee, you can view this in D88278682.

@yang-yu-hang
Contributor Author

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@yang-yu-hang
Contributor Author

@pytorchmergebot merge -r

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Tried to rebase and push PR #169156, but it was already up to date. Try rebasing against main by issuing:
@pytorchbot rebase -b main

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@ngimel
Collaborator

ngimel commented Dec 4, 2025

@pytorchbot merge -f "test failures unrelated"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f only as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures; this allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@yang-yu-hang yang-yu-hang deleted the add-signal-pad-size-api branch December 4, 2025 21:34
umechand-amd pushed a commit to ROCm/pytorch that referenced this pull request Dec 8, 2025
JacobSzwejbka pushed a commit that referenced this pull request Dec 8, 2025

Labels

- ciflow/h100-symm-mem
- ciflow/trunk (Trigger trunk jobs on your pull request)
- keep-going (Don't stop on first failure, keep running tests until the end)
- Merged
- release notes: distributed (c10d)
