[WIP]LTX modular + 1:1 match + improve agent debugging skills #13360
Open
Conversation
Seven fixes to achieve bit-identical output between the diffusers LTX-2.3 pipeline and the reference Lightricks/LTX-2 implementation in bf16 on GPU:

1. `encode_video`: use truncation (`.astype`) instead of `.round()` for float→uint8, matching the reference's `.to(torch.uint8)` behavior.
2. Scheduler sigma computation: compute `time_shift` and `stretch_shift_to_terminal` in torch float32 instead of numpy float64 to match reference precision.
3. Initial sigmas: use `torch.linspace` (float32) instead of `np.linspace` (float64) to produce bit-identical sigma schedules.
4. CFG formula: use the reference formula `cond + (scale - 1) * (cond - uncond)` instead of `uncond + scale * (cond - uncond)` to match bf16 arithmetic order.
5. Euler step: upcast `model_output` to the sample dtype before multiplying by `dt`, avoiding bf16 precision loss from 0-dim tensor type-promotion rules.
6. x0→velocity division: use `sigma.item()` (a Python float) instead of a 0-dim tensor, matching the reference's `to_velocity`, which uses `sigma.item()` internally.
7. RoPE: remove the float32 upcast in `apply_interleaved_rotary_emb` and `apply_split_rotary_emb` and cast cos/sin to the input dtype instead; the reference computes RoPE in model dtype (bf16) without upcasting.

Also updates RMSNorm to use `torch.nn.functional.rms_norm` for consistency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
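Fix 1 hinges on the difference between truncation and rounding when converting float pixel values to uint8. A minimal numpy illustration (values chosen for illustration, not taken from the PR):

```python
import numpy as np

# Float pixel values just below integer boundaries.
frames = np.array([0.4, 127.6, 254.9], dtype=np.float32)

# Truncation toward zero: what torch's .to(torch.uint8) does in the reference.
truncated = frames.astype(np.uint8)

# Round-then-cast: the previous diffusers behavior, which differs on the same inputs.
rounded = frames.round().astype(np.uint8)

print(truncated.tolist())  # [0, 127, 254]
print(rounded.tolist())    # [0, 128, 255]
```

Two of the three values land in different uint8 bins, which is exactly the kind of off-by-one pixel difference that breaks bit-identical parity.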
…g skill

Model fixes:
- Cross-attention timestep: always use the cross-modality sigma instead of conditioning on `use_cross_timestep` (matching the reference preprocessor, which always uses `cross_modality.sigma`).
- This was the root cause of the remaining 3.56 pixel diff: the diffusers model used `timestep.flatten()` (2304 per-token values) instead of `audio_sigma.flatten()` (1 scalar) for cross-attention modulation.

Pipeline fixes:
- Per-token timestep shape `(B, S)` instead of `(B,)` for the main `time_embed`.
- f32 sigma for `prompt_adaln` (not bf16).
- Audio decoder: `.squeeze(0).float()` to match the reference output format.

Parity-testing skill updates:
- Add Phase 2 (optional GPU/bf16) with the same capture-inject methodology.
- Add 9 new pitfalls (#19-#27) from bf16 debugging.
- Decode test now includes the final output format (`encode_video`, audio).
- Add a model interface mapping as a required artifact from component tests.
- Add test directory and lab_book setup questions.
- Add example test script templates.

Result: the diffusers pipeline produces pixel-identical video (0.0 diff) and a bit-identical audio waveform vs the reference pipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
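The "3.56 pixel diff" and "0.0 diff" figures above come from comparing decoded frames between the two pipelines. A hedged sketch of such a parity check (the function name and sample values are ours, not from the PR):

```python
import numpy as np

def max_pixel_diff(reference, ported):
    """Max absolute per-pixel difference between two decoded frames,
    computed in float64 so uint8 wraparound cannot hide a mismatch."""
    a = np.asarray(reference, dtype=np.float64)
    b = np.asarray(ported, dtype=np.float64)
    return float(np.abs(a - b).max())

# Toy 2x2 frames standing in for decoded video output.
ref_frame = np.array([[10, 200], [30, 40]], dtype=np.uint8)
port_frame = ref_frame.copy()
port_frame[0, 1] = 196  # simulate a small modulation bug in the port

print(max_pixel_diff(ref_frame, port_frame))  # 4.0 while the bug is present
print(max_pixel_diff(ref_frame, ref_frame))   # 0.0: the parity target
```

Driving this metric to exactly 0.0 (rather than "close enough") is what forces the dtype and arithmetic-order fixes listed above.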
Adds the LTX-2.3 modular pipeline structure:
- `modular_pipelines/ltx2/`: encoders, modular_blocks, modular_pipeline
- Registration in `__init__.py`, `auto_pipeline.py`, and the modular_pipeline mapping
- Checkpoint utilities for parity testing
- Supports T2V with CFG guidance (pixel-identical to reference)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
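For readers unfamiliar with the modular-blocks layout referenced above, the core idea is that each block reads and writes a shared pipeline state, and a pipeline is an ordered composition of blocks. A generic sketch of that pattern (class and method names here are illustrative, not the diffusers API):

```python
# Generic modular-pipeline-block pattern: each block transforms a shared
# state dict; the pipeline runs blocks in order.
class PipelineBlock:
    def __call__(self, state: dict) -> dict:
        raise NotImplementedError

class EncodePromptBlock(PipelineBlock):
    def __call__(self, state):
        # A real block would run the text encoder; here we just tag the state.
        state["prompt_embeds"] = f"embeds:{state['prompt']}"
        return state

class DenoiseBlock(PipelineBlock):
    def __call__(self, state):
        # A real block would run the scheduler loop over the transformer.
        state["latents"] = "denoised-latents"
        return state

def run_blocks(blocks, state):
    """A modular pipeline is an ordered composition of blocks."""
    for block in blocks:
        state = block(state)
    return state

result = run_blocks([EncodePromptBlock(), DenoiseBlock()], {"prompt": "a cat"})
print(sorted(result))  # ['latents', 'prompt', 'prompt_embeds']
```

This decomposition is what makes per-component parity testing practical: each block can be run in isolation with captured inputs from the reference implementation.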
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.