Z1/2 init: flatten params on device by ksugama · Pull Request #7828 · deepspeedai/DeepSpeed

ksugama · 2026-02-03T14:09:48Z

This PR addresses FIXME by flattening parameter tensors on the accelerator instead of the CPU during ZeRO stage 1 and 2 initialization. This should alleviate CPU contention, with the caveat that the optimization is only used when there is enough VRAM to allocate a full copy of the parameter buffers.

If necessary, this optimization can be extended to allow a tiered system that trades off VRAM space against performance, which might look like the following:

if enough VRAM for 2x model_size:
    naive flatten
else if enough VRAM for model_size / N:
    distributed flatten across N devices
else:
    flatten on CPU
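
As a rough illustration, the tiers above could be expressed as a small selection function. This is only a sketch, not code from this PR: `choose_flatten_strategy`, its parameter names, and the returned strings are hypothetical, and it assumes the thresholds are compared against a single VRAM budget in bytes.

```python
def choose_flatten_strategy(vram_budget_bytes, model_bytes, num_devices):
    """Pick a flatten tier, mirroring the pseudocode above.

    All names here are hypothetical; the thresholds follow the PR
    description literally (2x model size, then model size / N).
    """
    if vram_budget_bytes >= 2 * model_bytes:
        # Room for the parameters plus a full flat copy on one device.
        return "naive"
    if num_devices > 0 and vram_budget_bytes >= model_bytes / num_devices:
        # Each device only needs room for its 1/N portion.
        return "distributed"
    # Not enough VRAM for either tier: keep the existing CPU path.
    return "cpu"


if __name__ == "__main__":
    gib = 1024 ** 3
    for budget in (20 * gib, 5 * gib, 2 * gib):
        print(budget // gib, "GiB ->", choose_flatten_strategy(budget, 10 * gib, 4))
```

How the budget would actually be measured (free memory versus total, and how much headroom to leave) is left open here.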

The distributed flatten would involve each device flattening a portion of the parameters and performing an all-gather to assemble the full flattened model. See FIXME for original discussion.
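
To make the idea concrete, here is a single-process simulation of the distributed flatten using plain Python lists in place of tensors. It is a sketch under stated assumptions: `partition_params`, `flatten`, and `simulated_all_gather` are hypothetical helpers, parameters are assigned to ranks in contiguous groups, and the "all-gather" is simply a concatenation in rank order.

```python
from itertools import chain


def partition_params(params, world_size):
    """Split the parameter list into world_size contiguous groups (hypothetical helper)."""
    per_rank = -(-len(params) // world_size)  # ceiling division
    return [params[r * per_rank:(r + 1) * per_rank] for r in range(world_size)]


def flatten(params):
    """Concatenate a list of parameters into one flat list."""
    return list(chain.from_iterable(params))


def simulated_all_gather(shards):
    """Stand-in for a collective: concatenate every rank's shard in rank order."""
    return flatten(shards)


# Four toy "parameters" of different sizes, flattened across two simulated ranks.
params = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0], [7.0]]
world_size = 2

local_flats = [flatten(group) for group in partition_params(params, world_size)]
full_flat = simulated_all_gather(local_flats)

# The assembled buffer matches a flatten done in one place.
assert full_flat == flatten(params)
```

In this toy run the two ranks produce shards of different lengths. Collective operations commonly expect equal-sized shards, so a real implementation would likely need to pad or balance the per-device portions.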

Signed-off-by: Kento Sugama <kentosugama@protonmail.ch>

ksugama added 7 commits February 3, 2026 17:19

flatten gpu side

30a45a7

Signed-off-by: Kento Sugama <kentosugama@protonmail.ch>

repro script

65a409a

Signed-off-by: Kento Sugama <kentosugama@protonmail.ch>

detect gpu count in repro

59527da

Signed-off-by: Kento Sugama <kentosugama@protonmail.ch>

add .venv to path

8131a86

Signed-off-by: Kento Sugama <kentosugama@protonmail.ch>

clean up

7dee2e0

Signed-off-by: Kento Sugama <kentosugama@protonmail.ch>

format and delete repro script

0bacc23

Signed-off-by: Kento Sugama <kentosugama@protonmail.ch>

add dedicated test

293fbab

Signed-off-by: Kento Sugama <kentosugama@protonmail.ch>

ksugama force-pushed the flatten-tensor-gpu branch from a07a21b to 293fbab Compare February 3, 2026 17:19

parametrize tests

dd8191b

Signed-off-by: Kento Sugama <kentosugama@protonmail.ch>

ksugama changed the title ~~Z1/2 Flatten Parameters on device~~ Z1/2 init: flatten params on device Feb 3, 2026

ksugama wants to merge 8 commits into deepspeedai:master from ksugama:flatten-tensor-gpu
