Handle concatenation nodes in bash command parser for exec policy#8395
Handle concatenation nodes in bash command parser for exec policy#8395bolinfest merged 3 commits intoopenai:mainfrom
Conversation
The bash command parser in exec_policy was failing to parse commands with concatenated flag-value patterns like `-g"*.py"` (no space between flag and quoted value). This caused policy rules like `prefix_rule(pattern=["rg"])` to not match commands such as `rg -n "foo" -g"*.py"`. When tree-sitter-bash parses `-g"*.py"`, it creates a "concatenation" node containing a word (`-g`) and a string (`"*.py"`). The parser previously rejected any node type not in the ALLOWED_KINDS list, causing the entire command parsing to fail and fall back to matching against the wrapped `bash -lc` command instead of the inner command. This change: - Adds "concatenation" to ALLOWED_KINDS in try_parse_word_only_commands_sequence - Adds handling for concatenation nodes in parse_plain_command_from_node that recursively extracts and joins word/string/raw_string children - Adds test cases for concatenated flag patterns with double and single quotes Fixes openai#8394
|
@codex review |
|
Codex Review: Didn't find any major issues. Swish! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
bolinfest
left a comment
There was a problem hiding this comment.
@ivanmurashko once again, really nice work here! Two small things and then I think we're good to go!
codex-rs/core/src/bash.rs
Outdated
| if !concatenated.is_empty() { | ||
| words.push(concatenated); | ||
| } |
There was a problem hiding this comment.
If it is empty, is that a logical error in this code such that we should panic?
There was a problem hiding this comment.
Yes, an empty concatenation indicates a logical error (either in our code or unexpected tree-sitter behavior). However, rather than panicking, I've changed the code to return None to treat it as a parse failure, consistent with the fail-fast pattern used throughout this function. This change makes the error handling explicit rather than silently skipping the empty result.
If you think panic! is more appropriate here, I'm happy to change it—just let me know your preference.
Addressed in: 4d2f7c2
| assert_eq!(parsed, vec![vec!["ls".to_string()]]); | ||
| } | ||
|
|
||
| #[test] |
There was a problem hiding this comment.
Could you please add a test that embeds an environment variable (or possibly other constructs) within a concatenation to ensure we don't match it?
There was a problem hiding this comment.
Good catch! While the current implementation correctly rejects these patterns (because tree-sitter creates variable or command_substitution nodes instead of string_content, failing the validation at line 175), there's no explicit test documenting this behavior. I'll add tests for:
- Variable substitution:
rg -g"$VAR"andrg -g"${VAR}" - Command substitution:
rg -g"$(pwd)"
Addressed in 208109b
Change empty concatenation handling from silently skipping to explicitly returning None. This makes the error handling consistent with other validation failures in the function and prevents silently accepting malformed input. An empty concatenation indicates either a bug in the iteration logic or unexpected AST structure from tree-sitter-bash. Returning None ensures these edge cases are caught rather than being silently ignored.
Add test cases to verify that the bash parser correctly rejects
environment variable substitution and command substitution within
concatenated arguments.
These tests ensure that patterns like:
- rg -g"$VAR" (variable substitution)
- rg -g"${VAR}" (brace expansion)
- rg -g"$(pwd)" (command substitution)
are properly rejected by the parser. While the implementation already
handles these cases correctly (tree-sitter creates variable/command_substitution
nodes instead of string_content, causing validation to fail), these tests
explicitly document and verify the rejection behavior.
bolinfest
left a comment
There was a problem hiding this comment.
Thanks for the fast reply and very nice work!
The bash command parser in exec_policy was failing to parse commands with concatenated flag-value patterns like
-g"*.py"(no space between flag and quoted value). This caused policy rules likeprefix_rule(pattern=["rg"])to not match commands such asrg -n "foo" -g"*.py".When tree-sitter-bash parses
-g"*.py", it creates a "concatenation" node containing a word (-g) and a string ("*.py"). The parser previously rejected any node type not in the ALLOWED_KINDS list, causing the entire command parsing to fail and fall back to matching against the wrappedbash -lccommand instead of the inner command.This change:
Fixes #8394