Migrate Interns1 inputs to TensorSchema #23510
Code Review
This pull request migrates InternS1 inputs to use TensorSchema for better input validation, which is a great improvement for code quality and debuggability. However, the refactoring has introduced a few critical issues where the num_patches field was either removed from an input class definition or omitted when instantiating one, which will lead to runtime errors. I've provided comments and suggestions to fix these issues.
The num_patches field is required by InternS1VideoPixelInputs but it's missing from the constructor call. This will cause a ValueError at runtime because a required field is missing.
Diff hunk (removed and added lines shown together):

    return InternS1VideoPixelInputs(
        type="pixel_values_videos",
        pixel_values=self._validate_pixel_values(
            pixel_values_flat_video),
        num_patches=video_num_patches,
        pixel_values=pixel_values_flat_video,
        resolve_bindings={
            "h": h,
            "w": w,
        },
    )

Suggested change:

    return InternS1VideoPixelInputs(
        type="pixel_values_videos",
        pixel_values=pixel_values_flat_video,
        num_patches=video_num_patches,
        resolve_bindings={
            "h": h,
            "w": w,
        },
    )
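To illustrate why dropping num_patches surfaces as a ValueError at construction time, here is a minimal sketch. It assumes a simplified, hypothetical base class (RequiredFieldsSchema) that treats every annotated field as required; it is not vLLM's TensorSchema, and the names VideoPixelInputs and build are illustrative only. Lists stand in for tensors so the example runs without PyTorch.

```python
from typing import Any, Dict, List, Literal, Optional, get_type_hints


class RequiredFieldsSchema:
    """Hypothetical, simplified schema base class (not vLLM's TensorSchema).

    Every annotated field on a subclass is treated as required, so a missing
    field fails at construction time rather than later in the model.
    """

    def __init__(self, *, resolve_bindings: Optional[Dict[str, int]] = None,
                 **fields: Any) -> None:
        hints = get_type_hints(type(self))
        # Reject the call up front if any declared field was not passed.
        missing = [name for name in hints if name not in fields]
        if missing:
            raise ValueError(f"Required field(s) missing: {', '.join(missing)}")
        unexpected = sorted(set(fields) - set(hints))
        if unexpected:
            raise ValueError(f"Unexpected field(s): {', '.join(unexpected)}")
        for name, value in fields.items():
            setattr(self, name, value)
        # Stored only so the call mirrors the snippet above; no shape checks here.
        self.resolve_bindings = dict(resolve_bindings or {})


class VideoPixelInputs(RequiredFieldsSchema):
    # Field names follow the review snippet; lists stand in for tensors.
    type: Literal["pixel_values_videos"]
    pixel_values: List[Any]
    num_patches: List[int]


def build(include_num_patches: bool) -> VideoPixelInputs:
    """Hypothetical helper that mimics the constructor call under review."""
    kwargs: Dict[str, Any] = {
        "type": "pixel_values_videos",
        "pixel_values": [[0.0], [0.0], [0.0]],
    }
    if include_num_patches:
        kwargs["num_patches"] = [2, 1]
    return VideoPixelInputs(resolve_bindings={"h": 448, "w": 448}, **kwargs)


try:
    build(include_num_patches=False)
except ValueError as err:
    print(err)  # Required field(s) missing: num_patches
```

With include_num_patches=False the constructor fails immediately with a message naming num_patches, which is the failure mode the review describes; passing num_patches explicitly, as in the suggested change, avoids it.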
Observing a failing MM test due to a missing field. Will check the schema:
Signed-off-by: Benji Beck <benjibeck@meta.com>
Purpose
This PR migrates InternS1 inputs from a TypedDict-based definition to a structured TensorSchema model with runtime shape validation. This brings it in line with recent changes to Phi3VImagePixelInputs and is part of a broader effort to improve input contract enforcement and debuggability across multi-modal models.
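The difference between a TypedDict and runtime shape validation can be sketched as follows. This is a minimal illustration, not the TensorSchema implementation: shapes are plain tuples standing in for tensors, check_shape and LegacyVideoInputs are hypothetical names, and the dimension names ("bnp", "h", "w") are illustrative rather than taken from the actual InternS1 schema. String entries act as symbolic dimensions that must agree wherever they appear, and pre-seeding the bindings dict plays a role similar to the resolve_bindings argument seen in the review snippet.

```python
from typing import Dict, Sequence, Tuple, TypedDict, Union

# A dimension spec: an int is a fixed size, a str is a symbolic size.
Dim = Union[int, str]


class LegacyVideoInputs(TypedDict):
    # Hypothetical "before" type: a TypedDict is a static hint only,
    # so nothing here is checked when the dict is built.
    pixel_values: Tuple[int, ...]  # a shape tuple standing in for a tensor
    num_patches: Tuple[int, ...]


def check_shape(name: str, shape: Sequence[int], expected: Sequence[Dim],
                bindings: Dict[str, int]) -> None:
    """Raise ValueError if `shape` does not match `expected`.

    Symbolic (str) dims are bound on first use, or pre-bound by the caller
    through `bindings`, and must then match every later occurrence.
    """
    if len(shape) != len(expected):
        raise ValueError(
            f"{name}: expected {len(expected)} dims, got {len(shape)}")
    for i, (actual, want) in enumerate(zip(shape, expected)):
        if isinstance(want, str):
            bound = bindings.setdefault(want, actual)
            if bound != actual:
                raise ValueError(
                    f"{name}: dim {i} ({want!r}) is {actual}, expected {bound}")
        elif actual != want:
            raise ValueError(f"{name}: dim {i} is {actual}, expected {want}")


# The TypedDict accepts a 4-channel shape without complaint.
legacy: LegacyVideoInputs = {"pixel_values": (8, 4, 448, 448),
                             "num_patches": (8,)}

# Runtime validation: "h" and "w" are pre-bound, "bnp" is bound on first use.
bindings = {"h": 448, "w": 448}
check_shape("pixel_values", (8, 3, 448, 448), ("bnp", 3, "h", "w"), bindings)

try:
    check_shape("pixel_values", legacy["pixel_values"], ("bnp", 3, "h", "w"),
                {"h": 448, "w": 448})
except ValueError as err:
    print(err)  # pixel_values: dim 1 is 4, expected 3
```

The TypedDict version only helps a static type checker, while the runtime check rejects the malformed shape at the point where the inputs are assembled, which keeps the error close to its cause.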
More details: #14764 (comment)
Image Input Classes:
- InternS1ImagePixelInputs
- InternS1ImageEmbeddingInputs

Video Input Classes:
- InternS1VideoPixelInputs
- InternS1VideoEmbeddingInputs

Test Plan
Confirm validation works via standalone tests in tests/standalone_test/test_tensor_schema.py and rely on CI to check integration.
Test Result