Skip to content

Handle concatenation nodes in bash command parser for exec policy#8395

Merged
bolinfest merged 3 commits intoopenai:mainfrom
ivanmurashko:fix-bash-concatenation-parser
Dec 22, 2025
Merged

Handle concatenation nodes in bash command parser for exec policy#8395
bolinfest merged 3 commits intoopenai:mainfrom
ivanmurashko:fix-bash-concatenation-parser

Conversation

@ivanmurashko
Copy link
Contributor

The bash command parser in exec_policy was failing to parse commands with concatenated flag-value patterns like -g"*.py" (no space between flag and quoted value). This caused policy rules like prefix_rule(pattern=["rg"]) to not match commands such as rg -n "foo" -g"*.py".

When tree-sitter-bash parses -g"*.py", it creates a "concatenation" node containing a word (-g) and a string ("*.py"). The parser previously rejected any node type not in the ALLOWED_KINDS list, causing the entire command parsing to fail and fall back to matching against the wrapped bash -lc command instead of the inner command.

This change:

  • Adds "concatenation" to ALLOWED_KINDS in try_parse_word_only_commands_sequence
  • Adds handling for concatenation nodes in parse_plain_command_from_node that recursively extracts and joins word/string/raw_string children
  • Adds test cases for concatenated flag patterns with double and single quotes

Fixes #8394

The bash command parser in exec_policy was failing to parse commands with
concatenated flag-value patterns like `-g"*.py"` (no space between flag and
quoted value). This caused policy rules like `prefix_rule(pattern=["rg"])`
to not match commands such as `rg -n "foo" -g"*.py"`.

When tree-sitter-bash parses `-g"*.py"`, it creates a "concatenation" node
containing a word (`-g`) and a string (`"*.py"`). The parser previously
rejected any node type not in the ALLOWED_KINDS list, causing the entire
command parsing to fail and fall back to matching against the wrapped
`bash -lc` command instead of the inner command.

This change:
- Adds "concatenation" to ALLOWED_KINDS in try_parse_word_only_commands_sequence
- Adds handling for concatenation nodes in parse_plain_command_from_node that
  recursively extracts and joins word/string/raw_string children
- Adds test cases for concatenated flag patterns with double and single quotes

Fixes openai#8394
@etraut-openai
Copy link
Collaborator

@codex review

@chatgpt-codex-connector
Copy link
Contributor

Codex Review: Didn't find any major issues. Swish!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Collaborator

@bolinfest bolinfest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivanmurashko once again, really nice work here! Two small things and then I think we're good to go!

Comment on lines 199 to 201
if !concatenated.is_empty() {
words.push(concatenated);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is empty, is that a logical error in this code such that we should panic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, an empty concatenation indicates a logical error (either in our code or unexpected tree-sitter behavior). However, rather than panicking, I've changed the code to return None to treat it as a parse failure, consistent with the fail-fast pattern used throughout this function. This change makes the error handling explicit rather than silently skipping the empty result.

If you think panic! is more appropriate here, I'm happy to change it—just let me know your preference.

Addressed in: 4d2f7c2

assert_eq!(parsed, vec![vec!["ls".to_string()]]);
}

#[test]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a test that embeds an environment variable (or possibly other constructs) within a concatenation to ensure we don't match it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! While the current implementation correctly rejects these patterns (because tree-sitter creates variable or command_substitution nodes instead of string_content, failing the validation at line 175), there's no explicit test documenting this behavior. I'll add tests for:

  • Variable substitution: rg -g"$VAR" and rg -g"${VAR}"
  • Command substitution: rg -g"$(pwd)"

Addressed in 208109b

Change empty concatenation handling from silently skipping to explicitly
returning None. This makes the error handling consistent with other
validation failures in the function and prevents silently accepting
malformed input.

An empty concatenation indicates either a bug in the iteration logic or
unexpected AST structure from tree-sitter-bash. Returning None ensures
these edge cases are caught rather than being silently ignored.
Add test cases to verify that the bash parser correctly rejects
environment variable substitution and command substitution within
concatenated arguments.

These tests ensure that patterns like:
- rg -g"$VAR" (variable substitution)
- rg -g"${VAR}" (brace expansion)
- rg -g"$(pwd)" (command substitution)

are properly rejected by the parser. While the implementation already
handles these cases correctly (tree-sitter creates variable/command_substitution
nodes instead of string_content, causing validation to fail), these tests
explicitly document and verify the rejection behavior.
Copy link
Collaborator

@bolinfest bolinfest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fast reply and very nice work!

@bolinfest bolinfest enabled auto-merge (squash) December 22, 2025 19:45
@bolinfest bolinfest merged commit 0237459 into openai:main Dec 22, 2025
26 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Dec 22, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: Bash parser doesn't recognize commands with concatenated flag-value patterns

3 participants