Tags: pytorch/pytorch
Update triton to 3.5.1 release (#166968) This includes sm103 triton-lang/triton#8485 fix Pull Request resolved: #166968 Approved by: https://github.com/Lucaskabela, https://github.com/njriasan
Refactor out headeronly ArrayRef (#164991) Differential Revision: [D85091961](https://our.internmc.facebook.com/intern/diff/D85091961) Pull Request resolved: #164991 Approved by: https://github.com/swolchok
shrink_group implementation to expose ncclCommShrink API (#164518) Closes #164529 To expose the new [ncclCommShrink](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/comms.html#ncclcommshrink) API to PyTorch. This is useful when you need to exclude certain GPUs or nodes from a collective operation, for example in fault tolerance scenarios or when dynamically adjusting resource utilization. For more info: [Shrinking a communicator](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/communicators.html#shrinking-a-communicator) Pull Request resolved: #164518 Approved by: https://github.com/kwen2501
[ROCm][CI] revert ROCm magma commit hash to last known good (#167044) PR #166693 updated the magma commit hash but this has been linked to ROCm 7.1 CI failures. Go back to last known working magma version. Pull Request resolved: #167044 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>
[BE][Typing][Dynamo] Type torch/_dynamo/variables/dicts.py (#167022) Provides type coverage to torch/_dynamo/variables/dicts.py Coverage report: `mypy torch/_dynamo/variables/dicts.py --linecount-report /tmp/coverage_log` Compare before to after - we go from 0 lines and 0 funcs covered to 1547 lines and 89 funcs covered Pull Request resolved: #167022 Approved by: https://github.com/Skylion007
[12/N] Apply ruff UP035 rule (#166929) This PR continues to apply ruff UP035 rule to test code and some remaining torch files. Pull Request resolved: #166929 Approved by: https://github.com/Lucaskabela
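For context, ruff's UP035 rule flags imports from deprecated locations, such as abstract collection types imported from `typing`. The snippet below is a minimal sketch of that kind of rewrite; the specific imports and the `apply_all` helper are hypothetical and not taken from the PR, and the annotations assume Python 3.9 or newer.

```python
# Before (the form UP035 would flag):
#   from typing import Callable, Sequence
# After: import the abstract types from collections.abc instead.
from collections.abc import Callable, Sequence


def apply_all(funcs: Sequence[Callable[[int], int]], value: int) -> list[int]:
    """Hypothetical helper: call each function on value and collect the results."""
    return [f(value) for f in funcs]
```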
don't produce invalid grid configs (#166974) Proper fix for #164048, fixes gather too, reverts #164049 Pull Request resolved: #166974 Approved by: https://github.com/eqy
[Inductor][Grouped Gemm] Add Blackwell CuTeDSL Kernel (#167003) Summary: This is a reland of #165036, which previously contained a minor bug in the logic that determined whether the kernel should be enabled. As a result, it was incorrectly activated on non-Blackwell GPUs. Test Plan: Inductor test (fbcode): `INDUCTOR_TEST_DISABLE_FRESH_CACHE=1 TORCHINDUCTOR_CACHE_DIR=~/cutetest buck2 run mode/opt //caffe2/test/inductor:cutedsl_grouped_mm -c fbcode.nvcc_arch=b200a -c fbcode.enable_gpu_sections=true -c fbcode.platform010_cuda_version=12.8 -m "ovr_config//third-party/pypi/nvidia-cutlass-dsl/constraints:4.2.1"` Tritonbench (fbcode): `clear; CUDA_VISIBLE_DEVICES=7 TRITON_PRINT_AUTOTUNING=1 TRITON_ALWAYS_COMPILE=1 TORCH_LOGS=+inductor TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 TORCHINDUCTOR_MAX_AUTOTUNE_GEMM=1 buck2 run mode/opt //pytorch/tritonbench:run -c fbcode.nvcc_arch=b200a -c fbcode.enable_gpu_sections=true -c fbcode.platform010_cuda_version=12.8 -m "ovr_config//third-party/pypi/nvidia-cutlass-dsl/constraints:4.2.1" -- --op grouped_gemm --only aten_grouped_mm,preprocessed_pt2_cute_grouped_mm --precision bf16 --num-inputs 1 --metrics tflops,accuracy` Tritonbench(oss): `clear; CUDA_VISIBLE_DEVICES=2 TRITON_PRINT_AUTOTUNING=1 TRITON_ALWAYS_COMPILE=1 TORCH_LOGS=+inductor TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 TORCHINDUCTOR_MAX_AUTOTUNE_GEMM=1 python run.py --op grouped_gemm --only aten_grouped_mm,preprocessed_pt2_triton_grouped_mm --precision bf16 --num-inputs 1 --metrics tflops,accuracy` Unit Tests(oss): `clear; python test/inductor/test_cutedsl_grouped_mm.py` Differential Revision: D86231180 Pull Request resolved: #167003 Approved by: https://github.com/jananisriram
ci: Add aarch64 docker builds for modern clang (#166416) Should enable us to build using some arm optimizations that are only available on the newest versions of clang. Signed-off-by: Eli Uriegas <eliuriegas@meta.com> Pull Request resolved: #166416 Approved by: https://github.com/malfet
[dynamo] extend `collections.defaultdict` support with `*args`, `**kwargs` and custom `default_factory` (#166793) Fixes #166238 Extend `collections.defaultdict` to accept `*args` and `**kwargs` in the constructor, and also support a custom `default_factory`, such as `dd.default_factory` (a `GetAttrVariable`). Pull Request resolved: #166793 Approved by: https://github.com/guilhermeleobas
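The constructor forms named in this entry can be sketched in plain Python as follows. This only illustrates the standard `collections.defaultdict` behavior that Dynamo is said to trace; it does not call `torch.compile`, and the `make_counter` helper and variable names are hypothetical.

```python
from collections import defaultdict

# Factory followed by a positional mapping (*args) and keyword items (**kwargs).
dd = defaultdict(list, {"a": [1]}, b=[2])
dd["c"].append(3)  # missing key is created via the list factory

# Reuse another defaultdict's factory as a custom default_factory.
dd2 = defaultdict(dd.default_factory)
dd2["x"].append(0)


def make_counter(pairs, **extra):
    """Hypothetical helper: forward *args-style pairs and **kwargs to defaultdict."""
    counts = defaultdict(int, pairs, **extra)
    counts["missing"] += 1  # int factory supplies 0 for the new key
    return dict(counts)
```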