Closed
Labels
module: fsdp, oncall: distributed, triaged
Description
`test_fsdp_pure_fp16.py` has been flaky or broken most of the time. For now, we are disabling the parameter check after the training step. However, this means the test no longer verifies the result of the backward pass, so an incorrect backward pass for pure FP16 models using FSDP could go undetected.
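For readers unfamiliar with this kind of check, here is a minimal sketch of what a post-step parameter comparison does in general. It is not the code in `test_fsdp_pure_fp16.py` and does not use FSDP or PyTorch: it emulates half precision with the standard library's `struct` format `"e"`, and the names `to_fp16`, `sgd_step`, and `params_close`, the toy model, and the tolerances are all assumptions made for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def sgd_step(weights, inputs, target, lr, cast=lambda v: v):
    """One SGD step on the loss 0.5 * (w . x - target)^2.

    `cast` is applied to intermediate results; passing `to_fp16`
    roughly emulates a pure FP16 forward and backward pass.
    """
    pred = cast(sum(cast(w * x) for w, x in zip(weights, inputs)))
    err = cast(pred - target)  # dLoss/dPred
    # dLoss/dw_i = err * x_i, followed by the SGD update w_i - lr * grad_i
    return [cast(w - cast(lr * cast(err * x))) for w, x in zip(weights, inputs)]


def params_close(actual, expected, rtol=1e-2, atol=1e-3):
    """Hypothetical parameter check: elementwise closeness with loose FP16 tolerances."""
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


weights = [0.5, -0.25, 0.125]
inputs = [1.0, 2.0, 3.0]
target = 1.0

# Full-precision reference step versus the emulated FP16 step.
ref_params = sgd_step(weights, inputs, target, lr=0.1)
fp16_params = sgd_step([to_fp16(w) for w in weights], inputs, target, lr=0.1, cast=to_fp16)

# This comparison is the kind of check that is being disabled.
check_passed = params_close(fp16_params, ref_params)
```

With the comparison removed, a test of this shape would only confirm that the training step runs to completion, not that the updated parameters match a reference.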
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501