[fx] Add strict argument validation to Interpreter.boxed_run #166784

Closed
meghendra6 wants to merge 4 commits into pytorch:main from meghendra6:fx-interpreter-boxed-run-strict-args

Conversation

@meghendra6 (Contributor)

Summary

This PR fixes an issue where torch.fx.Interpreter.boxed_run would silently ignore extra input arguments instead of validating the argument count.

Previously, boxed_run would only consume as many inputs as there were placeholder nodes and then clear the entire args_list, hiding potential bugs. This change introduces a strict check to ensure len(args_list) matches the number of placeholder nodes, raising a RuntimeError on a mismatch.
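
As an illustrative example of the behavior change (a minimal sketch assuming a simple two-input traced function; not code taken from the PR itself):

import torch
from torch import fx

def f(x, y):
    return x + y

interp = fx.Interpreter(fx.symbolic_trace(f))

args = [torch.randn(2), torch.randn(2), torch.randn(2)]  # one extra input
# Before this change: the extra tensor was silently dropped and args was cleared.
# After this change: RuntimeError, because 3 inputs were supplied for 2
# placeholders, and args is left intact for inspection.
interp.boxed_run(args)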

Fixes #166583.

Changes

  • Validate len(args_list) against the number of placeholder nodes at the beginning of boxed_run.
  • Raise a RuntimeError with a clear message ("extra arguments" or "missing arguments") if the counts do not match.
  • Move args_list.clear() so it only executes after successful validation and environment setup. If an error is raised, args_list is preserved for debugging (see the sketch after this list).
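
A rough sketch of the intended control flow inside boxed_run (illustrative only: self and args_list belong to the surrounding method, and the exact error wording differs in the real diff discussed below):

placeholder_nodes = [n for n in self.graph.nodes if n.op == "placeholder"]
if len(args_list) != len(placeholder_nodes):
    # Fail before touching args_list so the caller can still inspect its inputs.
    raise RuntimeError(
        f"Interpreter.boxed_run expected {len(placeholder_nodes)} arguments "
        f"but received {len(args_list)}"
    )
env = dict(zip(placeholder_nodes, args_list))
# Clear only after validation and environment setup have succeeded.
args_list.clear()
return self.run(initial_env=env)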

Testing

  • Added test_interpreter_boxed_run_argument_validation to test/test_fx.py.
  • This test covers three scenarios (a rough test sketch follows the list):
    1. Correct number of arguments (succeeds, args_list is cleared).
    2. Extra arguments (raises RuntimeError, args_list is preserved).
    3. Missing arguments (raises RuntimeError, args_list is preserved).
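
A rough sketch of how these three scenarios might be exercised (the class name and structure here are illustrative, not the exact test added to test/test_fx.py):

import torch
from torch import fx
from torch.testing._internal.common_utils import TestCase, run_tests

class BoxedRunValidationSketch(TestCase):
    def test_interpreter_boxed_run_argument_validation(self):
        def f(x, y):
            return x + y

        gm = fx.symbolic_trace(f)
        a, b, c = torch.randn(3), torch.randn(3), torch.randn(3)

        # 1. Correct number of arguments: runs and clears args_list.
        args = [a, b]
        out = fx.Interpreter(gm).boxed_run(args)
        self.assertEqual(out, a + b)
        self.assertEqual(len(args), 0)

        # 2. Extra arguments: raises RuntimeError, args_list preserved.
        args = [a, b, c]
        with self.assertRaises(RuntimeError):
            fx.Interpreter(gm).boxed_run(args)
        self.assertEqual(len(args), 3)

        # 3. Missing arguments: raises RuntimeError, args_list preserved.
        args = [a]
        with self.assertRaises(RuntimeError):
            fx.Interpreter(gm).boxed_run(args)
        self.assertEqual(len(args), 1)

if __name__ == "__main__":
    run_tests()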

User-facing impact / BC notes

This is a bug fix. Code that was incorrectly passing the wrong number of arguments to boxed_run will now fail fast with a RuntimeError instead of executing silently with unintended inputs. Correctly written code is unaffected.

cc @ezyang @EikanWang @jgong5 @wenzhe-nrv @chauhang @penguinwu @xmfan

@pytorch-bot

pytorch-bot bot commented Nov 1, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166784

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f157e8b with merge base 7b64ad9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot bot added the release notes: fx (release notes category) label Nov 1, 2025
@linux-foundation-easycla

linux-foundation-easycla bot commented Nov 1, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

Comment on lines 223 to 237
placeholder_nodes = [n for n in self.graph.nodes if n.op == "placeholder"]
expected_args = len(placeholder_nodes)
actual_args = len(args_list)
if actual_args != expected_args:
    detail = (
        "extra arguments"
        if actual_args > expected_args
        else "missing arguments"
    )
    raise RuntimeError(
        f"Interpreter.boxed_run expected {expected_args} arguments "
        f"for placeholders but received {actual_args} ({detail})."
    )

env = {n: arg for n, arg in zip(placeholder_nodes, args_list)}

Member

This seems like a lot of changes, could we just add an assertion on line 228?

  args_iter = iter(args_list)
  env = {}
  for n in self.graph.nodes:
      if n.op == "placeholder":
          env[n] = next(args_iter)

+ assert len(args_list) == len(env), "<error message>"

@meghendra6 (Contributor, Author)

Thanks for the suggestion! I went ahead and simplified the code following your idea. The commit (4353144) keeps the original loop and adds a simple check for the arg count. Appreciate the quick feedback!

@ezyang (Contributor) left a comment

Yeah this seems overly complicated

@ezyang (Contributor) left a comment

I think I would prefer simplifying this more by collecting the list of nodes to assign first, and then doing the length check, and then assigning. This eliminates the StopIteration logic.
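
To make the StopIteration point concrete: with the iterator-driven loop, a too-short args_list would need handling along these lines to produce a readable error (a hypothetical sketch, not code from this PR), whereas collecting the placeholder nodes first replaces all of it with a single up-front length check:

args_iter = iter(args_list)
env = {}
for n in self.graph.nodes:
    if n.op == "placeholder":
        try:
            env[n] = next(args_iter)
        except StopIteration:
            raise RuntimeError(
                f"Interpreter.boxed_run ran out of arguments at placeholder {n.name}"
            ) from None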

@meghendra6 (Contributor, Author)

> I think I would prefer simplifying this more by collecting the list of nodes to assign first, and then doing the length check, and then assigning. This eliminates the StopIteration logic.

Good point, that makes sense. I’ve updated the code in 81c6beb to collect the placeholder nodes first and then do the length check before assigning. Thanks for the helpful suggestion!

@ezyang (Contributor)

ezyang commented Nov 3, 2025

@pytorchbot merge

@pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Nov 3, 2025
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team (raised by workflow job)

Failing merge rule: Core Maintainers

@pytorch-bot bot removed the ciflow/trunk (Trigger trunk jobs on your pull request) label Nov 3, 2025
@meghendra6 (Contributor, Author)

@ezyang I've checked the CI failures, and they appear to be unrelated to the changes in this PR.

The main issue seems to be in test_external_module_register_with_existing_backend, which is failing with:
torch._dynamo.exc.InternalTorchDynamoError: RuntimeError: Device 'maia' does not have a corresponding module registered as 'torch.maia'.

This looks like an infrastructure or environment issue, not caused by my code.

Could someone advise on how to proceed? Happy to rebase or try a re-run.

@xmfan (Member)

xmfan commented Nov 4, 2025

dynamo_wrapped contains the most important tests for this PR; could we rebase on viable/strict to let CI try again?

@ezyang (Contributor)

ezyang commented Nov 4, 2025

@pytorchbot merge -r

@pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Nov 4, 2025
@pytorchmergebot (Collaborator)

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

Fixes issue where extra arguments were silently ignored, leading to
graphs running with incorrect inputs. Now raises clear RuntimeError
for both extra and missing arguments.

Also improves error messages for better debugging experience.
@pytorchmergebot (Collaborator)

Successfully rebased fx-interpreter-boxed-run-strict-args onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout fx-interpreter-boxed-run-strict-args && git pull --rebase)

@pytorchmergebot force-pushed the fx-interpreter-boxed-run-strict-args branch from 2ff84f6 to f157e8b on November 4, 2025 04:21
@pytorch-bot bot removed the ciflow/trunk (Trigger trunk jobs on your pull request) label Nov 4, 2025
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator)

Merge failed

Reason: 3 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team (raised by workflow job)

Failing merge rule: Core Maintainers

@meghendra6 (Contributor, Author)

It looks like all checks are green, but the PR still shows that the merge failed. I'm a bit confused about what's causing the issue.

Any chance you could see what's going on?

@xmfan (Member)

xmfan commented Nov 5, 2025

@pytorchbot merge

@pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Nov 5, 2025
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Labels

ciflow/trunk, fx, Merged, open source, release notes: fx

Projects

None yet

Development

Successfully merging this pull request may close this issue: fx.Interpreter.boxed_run silently ignores extra args (#166583)

6 participants