Tags: pytorch/pytorch

trunk/89165c0a2b5d3c147c19a492437291c8ff18aa7f

Update triton to 3.5.1 release (#166968)

This includes the sm103 fix (triton-lang/triton#8485).

Pull Request resolved: #166968
Approved by: https://github.com/Lucaskabela, https://github.com/njriasan

trunk/59563dfe56a086a4a95025f0ccfe373bc1fd3759

Refactor out headeronly ArrayRef (#164991)

Differential Revision: [D85091961](https://our.internmc.facebook.com/intern/diff/D85091961)
Pull Request resolved: #164991
Approved by: https://github.com/swolchok

trunk/39160dba0c5120c65705a44e556c8c4af243e573

shrink_group implementation to expose ncclCommShrink API (#164518)

Closes #164529

This exposes the new [ncclCommShrink](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/comms.html#ncclcommshrink) API to PyTorch.

This is useful when you need to exclude certain GPUs or nodes from a collective operation, for example in fault tolerance scenarios or when dynamically adjusting resource utilization.

For more info: [Shrinking a communicator](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/communicators.html#shrinking-a-communicator)
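
A hedged sketch of what calling this from Python might look like. The entry point name `shrink_group` comes from this commit's title; the module path and signature shown here are assumptions, not a confirmed API:

```python
# Hypothetical usage sketch: carve failed ranks out of a NCCL process group
# without tearing down and re-initializing the whole job.
import torch
import torch.distributed as dist

dist.init_process_group("nccl")

failed_ranks = [3]  # e.g. a rank whose GPU hit an unrecoverable fault
if dist.get_rank() not in failed_ranks:
    # Assumed wrapper around ncclCommShrink; see the PR for the real signature.
    shrunk_pg = dist.distributed_c10d.shrink_group(failed_ranks)

    # Subsequent collectives run only over the surviving ranks.
    t = torch.ones(1, device="cuda")
    dist.all_reduce(t, group=shrunk_pg)
```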

Pull Request resolved: #164518
Approved by: https://github.com/kwen2501

trunk/14956eaef4a14901a95a6d0779d99db11fd7406b

[ROCm][CI] revert ROCm magma commit hash to last known good (#167044)

PR #166693 updated the magma commit hash, but the new hash has been linked to ROCm 7.1 CI failures. Revert to the last known working magma version.

Pull Request resolved: #167044
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>

trunk/6052a01b71277eb767d87daf47d109f8e0edd5c0

[BE][Typing][Dynamo] Type torch/_dynamo/variables/dicts.py (#167022)

Provides type coverage to torch/_dynamo/variables/dicts.py

Coverage report:
`mypy torch/_dynamo/variables/dicts.py --linecount-report /tmp/coverage_log`

Comparing before and after: coverage goes from 0 lines and 0 functions to 1547 lines and 89 functions.

Pull Request resolved: #167022
Approved by: https://github.com/Skylion007

trunk/5863ba1b2e4de9ea0ae16a663465ec5d3d6f9f52

[12/N] Apply ruff UP035 rule (#166929)

This PR continues applying the ruff UP035 rule to test code and some remaining torch files.
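
For context, UP035 flags imports from deprecated locations, most commonly `typing` aliases that now have builtin or `collections.abc` replacements (companion rules such as UP006 rewrite the annotations themselves). An illustrative before/after:

```python
# Before: UP035 flags these deprecated typing imports.
from typing import Callable, Dict, List

def apply(fns: List[Callable[[int], int]], xs: Dict[str, int]) -> None: ...

# After: Callable moves to collections.abc; builtin generics replace the aliases.
from collections.abc import Callable

def apply(fns: list[Callable[[int], int]], xs: dict[str, int]) -> None: ...
```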

Pull Request resolved: #166929
Approved by: https://github.com/Lucaskabela

trunk/4271ffe91849335ffbcc2014c948694f8ec107fd

don't produce invalid grid configs (#166974)

This is the proper fix for #164048; it also fixes gather and reverts #164049.
Pull Request resolved: #166974
Approved by: https://github.com/eqy

trunk/658c5f879c37142b1df51c7eb6c5a5bb06318597

[Inductor][Grouped Gemm] Add Blackwell CuTeDSL Kernel (#167003)

Summary: This is a reland of #165036, which previously contained a minor bug in the logic that determined whether the kernel should be enabled. As a result, it was incorrectly activated on non-Blackwell GPUs.

Test Plan:
Inductor test (fbcode):
`INDUCTOR_TEST_DISABLE_FRESH_CACHE=1 TORCHINDUCTOR_CACHE_DIR=~/cutetest buck2 run mode/opt //caffe2/test/inductor:cutedsl_grouped_mm -c fbcode.nvcc_arch=b200a -c fbcode.enable_gpu_sections=true -c fbcode.platform010_cuda_version=12.8 -m "ovr_config//third-party/pypi/nvidia-cutlass-dsl/constraints:4.2.1"`

Tritonbench (fbcode):
`clear; CUDA_VISIBLE_DEVICES=7 TRITON_PRINT_AUTOTUNING=1 TRITON_ALWAYS_COMPILE=1 TORCH_LOGS=+inductor TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 TORCHINDUCTOR_MAX_AUTOTUNE_GEMM=1 buck2 run mode/opt //pytorch/tritonbench:run -c fbcode.nvcc_arch=b200a -c fbcode.enable_gpu_sections=true -c fbcode.platform010_cuda_version=12.8 -m "ovr_config//third-party/pypi/nvidia-cutlass-dsl/constraints:4.2.1" -- --op grouped_gemm --only aten_grouped_mm,preprocessed_pt2_cute_grouped_mm --precision bf16  --num-inputs 1 --metrics tflops,accuracy`

Tritonbench(oss):
`clear; CUDA_VISIBLE_DEVICES=2 TRITON_PRINT_AUTOTUNING=1 TRITON_ALWAYS_COMPILE=1 TORCH_LOGS=+inductor TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 TORCHINDUCTOR_MAX_AUTOTUNE_GEMM=1 python run.py --op grouped_gemm --only aten_grouped_mm,preprocessed_pt2_triton_grouped_mm --precision bf16  --num-inputs 1 --metrics tflops,accuracy`

Unit Tests(oss):
`clear; python test/inductor/test_cutedsl_grouped_mm.py`
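
For orientation, a hedged sketch of the workload these tests exercise: a grouped GEMM compiled with max-autotune, the path on which Inductor can select the CuTeDSL candidate on Blackwell. The `torch._grouped_mm` call and its layout are assumptions inferred from the `aten_grouped_mm` baseline above, not an API confirmed by this commit:

```python
# Hedged sketch (assumed API): grouped GEMM under torch.compile max-autotune.
import torch

def grouped(a, b, offs):
    # Assumption: torch._grouped_mm multiplies row blocks of `a` (split at
    # `offs`) against the matching group's matrix in `b`.
    return torch._grouped_mm(a, b, offs=offs)

compiled = torch.compile(grouped, mode="max-autotune")

g, k, n = 4, 256, 128
offs = torch.tensor([64, 128, 192, 256], dtype=torch.int32, device="cuda")
a = torch.randn(256, k, device="cuda", dtype=torch.bfloat16)   # stacked LHS rows
b = torch.randn(g, k, n, device="cuda", dtype=torch.bfloat16)  # one RHS per group
out = compiled(a, b, offs)                                     # shape (256, n)
```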

Differential Revision: D86231180

Pull Request resolved: #167003
Approved by: https://github.com/jananisriram

trunk/641de23c96e2c0d2848a7aa2aacb2f77843177a5

ci: Add aarch64 docker builds for modern clang (#166416)

This should enable builds that use some Arm optimizations only available in the
newest versions of clang.

Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
Pull Request resolved: #166416
Approved by: https://github.com/malfet

trunk/431dfe8692f3f927c19c739884054d7f1d42a33d

[dynamo] extend `collections.defaultdict` support with `*args`, `**kwargs` and custom `default_factory` (#166793)

Fixes #166238

Extend `collections.defaultdict` to accept `*args` and `**kwargs` in the constructor, and also support a custom `default_factory`, such as `dd.default_factory` (a `GetAttrVariable`).
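
An illustrative sketch of the newly traceable pattern (function and names here are hypothetical, not from the PR):

```python
# A defaultdict built with a custom default_factory plus constructor
# args/kwargs inside a compiled function: the combination this PR
# teaches dynamo to trace.
import collections
import torch

def zeros_factory():
    return torch.zeros(())

@torch.compile(fullgraph=True)
def bucket_sums(values):
    dd = collections.defaultdict(zeros_factory, {"bias": torch.ones(())})
    for i, v in enumerate(values):
        key = "even" if i % 2 == 0 else "odd"
        dd[key] = dd[key] + v   # missing keys come from zeros_factory
    return dd["even"] + dd["bias"], dd["odd"]

vals = [torch.tensor(float(i)) for i in range(4)]
print(bucket_sums(vals))  # (tensor(3.), tensor(4.))
```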

Pull Request resolved: #166793
Approved by: https://github.com/guilhermeleobas