Pull requests: awslabs/awsome-distributed-ai
fix(parallelcluster-prereqs): two private subnets + Multi-AZ FSx OpenZFS for /home (follow-up to #1100)
#1102, opened May 17, 2026 by KeitaW (Collaborator)
8 of 9 tasks
Add two-post series: Multi-Tenant Slurm on AWS ParallelCluster (Parts 1 & 2)
#1093, opened May 16, 2026 by KeitaW (Collaborator)
5 of 7 tasks
fix(test_cases): replace hardcoded user paths with overridable env vars
#1092, opened May 15, 2026 by Zhenye-Na (Contributor)
6 of 7 tasks
Add Qwen3.6-35B-A3B MoE LoRA function-calling test case (Megatron-Bridge + Kubeflow PyTorchJob)
#1091, opened May 14, 2026 by yhou-uk (Collaborator)
7 tasks done
feat(hyperpod-eks): add Cilium CNI support with overlay and chaining modes
#1090, opened May 14, 2026 by bluecrayon52 (Contributor)
Adding Nvidia Isaac Lab sample
#1083, opened May 6, 2026 by allela-roy (Contributor)
7 tasks done
fix(verl/rlvr): Add EFA host mounts, fix data format bug, and add optimized GRPO recipe
#1062, opened Apr 9, 2026 by paragao (Contributor)
Utils script to create users on all nodes (login, controller and compute), run from any node
#1061, opened Apr 8, 2026 by mayankgupta14 (Collaborator)
7 tasks
Bump transformers from 4.48.0 to 5.0.0rc3 in /3.test_cases/pytorch/nvrx
Labels: dependencies, python
#1057, opened Apr 8, 2026 by dependabot (Bot)
Add veRL GRPO training recipe for gpt-oss-20b on g5.12xlarge
#1054, opened Apr 4, 2026 by nkumaraws (Contributor)
Bump requests from 2.32.3 to 2.33.0 in /3.test_cases/pytorch/nvrx
Labels: dependencies, python
#1036, opened Mar 25, 2026 by dependabot (Bot)
Add V-JEPA 2 (Meta FAIR) distributed training test case
#1035, opened Mar 23, 2026 by paragao (Contributor)
feat: Add observability IAM permissions for RIG cluster execution role
#1030, opened Mar 20, 2026 by Madhubalasri-B (Collaborator)
Add DeepSpeed CI regression tests for QLoRA and GPT-103B
#1029, opened Mar 20, 2026 by paragao (Contributor)
fix: overhaul CI workflows for FSDP regression tests
#1024, opened Mar 17, 2026 by paragao (Contributor)
Add OSMO AMR Navigation test case
#1018, opened Mar 12, 2026 by KeitaW (Collaborator)
1 of 3 tasks
docs: add Instance Compatibility Guide with per-test-case configuration tables
#1017, opened Mar 11, 2026 by nkumaraws (Contributor)
Updating CF stack for GB200 local zone deployments
#968, opened Feb 17, 2026 by KeitaW (Collaborator)