Pull request overview
This PR introduces conditional receive functionality for handling non-deterministic outputs from debugged processes. The implementation uses an Aho-Corasick algorithm to efficiently match multiple patterns simultaneously.
Key changes:
- Added `match_recvuntil` and `match_recverruntil` functions to PipeManager for pattern matching in stdout/stderr
- Implemented an Aho-Corasick matcher class for efficient multi-pattern searching
- Added comprehensive unit tests for the Aho-Corasick algorithm and integration tests for the conditional receive functionality
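For readers unfamiliar with the technique, a minimal sketch of a *stateful* Aho-Corasick matcher follows. It is not the PR's `AhoCorasickMatcher`: the class name `StreamingAhoCorasick`, its `feed` method, and the tie-breaking rule (earliest end offset, then lowest pattern index) are assumptions made for illustration. The property that matters for a conditional recv is that the automaton state survives between calls, so a pattern split across two reads is still detected.

```python
from collections import deque


class StreamingAhoCorasick:
    """Illustrative stateful Aho-Corasick matcher over a byte stream (hypothetical)."""

    def __init__(self, patterns: list[bytes]) -> None:
        if not patterns or any(len(p) == 0 for p in patterns):
            raise ValueError("patterns must be a non-empty list of non-empty bytes")
        self.goto: list[dict[int, int]] = [{}]  # trie edges, one dict per state
        self.fail: list[int] = [0]               # failure link of each state
        self.out: list[int | None] = [None]      # lowest pattern index ending here
        # Build the trie of all patterns.
        for idx, pat in enumerate(patterns):
            state = 0
            for byte in pat:
                nxt = self.goto[state].get(byte)
                if nxt is None:
                    nxt = len(self.goto)
                    self.goto[state][byte] = nxt
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append(None)
                state = nxt
            if self.out[state] is None:
                self.out[state] = idx
        # BFS: compute failure links and inherit outputs from suffix states.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for byte, nxt in self.goto[state].items():
                f = self.fail[state]
                while f and byte not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(byte, 0)
                inherited = self.out[self.fail[nxt]]
                if inherited is not None and (self.out[nxt] is None or inherited < self.out[nxt]):
                    self.out[nxt] = inherited
                queue.append(nxt)
        self.state = 0               # persists across feed() calls
        self.consumed = bytearray()  # every byte consumed so far

    def feed(self, chunk: bytes) -> tuple[int, int]:
        """Consume bytes until a pattern completes.

        Returns (pattern index, bytes of `chunk` consumed), or (-1, len(chunk))
        if no pattern completed inside this chunk.
        """
        for i, byte in enumerate(chunk):
            state = self.state
            while state and byte not in self.goto[state]:
                state = self.fail[state]
            self.state = self.goto[state].get(byte, 0)
            if self.out[self.state] is not None:
                self.consumed += chunk[: i + 1]
                return self.out[self.state], i + 1
        self.consumed += chunk
        return -1, len(chunk)
```

For example, with patterns `[b"WIN", b"LOSE"]`, feeding `b"you LO"` returns `(-1, 6)` and a following `b"SE!\n"` returns `(1, 2)`: the match completes two bytes into the second chunk, and the remaining `b"!\n"` is left for the caller to keep buffered.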
Reviewed changes
Copilot reviewed 8 out of 11 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| `libdebug/utils/search_utils.py` | Implements the `AhoCorasickMatcher` class for efficient multi-pattern searching with stateful matching capability |
| `libdebug/commlink/pipe_manager.py` | Adds `match_recvuntil` and `match_recverruntil` methods for conditional receive operations on stdout/stderr |
| `test/scripts/conditional_recv_test.py` | Provides comprehensive unit tests for the Aho-Corasick matcher and integration tests for the conditional receive functionality |
| `test/srcs/conditional_recv_test.c` | Test binary that generates random conditional output for testing pattern matching |
| `test/binaries/{i386,amd64,aarch64}/conditional_recv_test` | Compiled test binaries for different architectures |
| `test/scripts/__init__.py` | Registers `ConditionalRecvTest` in the test suite |
| `test/run_suite.py` | Adds `ConditionalRecvTest` to the fast test suite |
| `docs/basics/running_an_executable.md` | Documents the new `match_recvuntil` and `match_recverruntil` functions in the pipe manager API |
| `newsfragments/296.improvement.md` | Changelog entry describing the new feature |
Pull request overview
Copilot reviewed 8 out of 11 changed files in this pull request and generated 5 comments.
Did you try to benchmark the efficiency of Aho–Corasick against other approaches? I am thinking about the really dumb approach of repeatedly calling bytes.find(pattern) for each pattern, or something similar. I know what you’re thinking: the theoretical complexity of Aho–Corasick is very low. However, that advantage only really pays off with (i) large buffers and (ii) a large number of patterns.
On the other hand, I expect only a few patterns (2 or 3?) and buffers smaller than a single read (4096 bytes) in most cases (and at worst only a few times larger than a single read). Since the implementation allocates a non-negligible number of Python objects, while other “dumb” solutions are C-backed, there might be some surprises, as we have seen in the past with other pattern-searching code in libdebug, where the dumb solution was more efficient than the state-of-the-art algorithm for similar reasons.
I don’t really know the answer. I’m just suggesting that it might be worth running a few tests to ensure that the most common cases are prioritized.
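To make the suggestion concrete, a minimal timing harness could look like this, assuming each candidate is wrapped as a plain `(buffer, patterns) -> (index, end)` function. `naive_find`, `per_byte_floor`, and `bench` are hypothetical names, not libdebug code; the PR's `AhoCorasickMatcher` would need a small wrapper to be timed the same way, since its exact API is not shown here. `per_byte_floor` is not a matcher at all: it only measures the cost of visiting every byte in a Python loop, which is a rough lower bound for any pure-Python automaton that steps byte by byte.

```python
import timeit
from collections.abc import Callable

# A search function takes (buffer, patterns) and returns (pattern index, end offset).
SearchFn = Callable[[bytes, list[bytes]], tuple[int, int]]


def naive_find(buffer: bytes, patterns: list[bytes]) -> tuple[int, int]:
    """The "dumb" approach: one C-backed bytes.find per pattern.

    Returns the pattern whose first occurrence ends earliest, as
    (pattern index, end offset), or (-1, -1). Ties go to the lowest index.
    """
    best_idx, best_end = -1, -1
    for idx, pat in enumerate(patterns):
        pos = buffer.find(pat)
        if pos >= 0 and (best_idx < 0 or pos + len(pat) < best_end):
            best_idx, best_end = idx, pos + len(pat)
    return best_idx, best_end


def per_byte_floor(buffer: bytes, patterns: list[bytes]) -> tuple[int, int]:
    """Not a matcher: only the cost of touching every byte in Python."""
    for _ in buffer:
        pass
    return -1, -1


def bench(fn: SearchFn, buffer: bytes, patterns: list[bytes], number: int = 1000) -> float:
    """Average wall-clock seconds per call of fn(buffer, patterns)."""
    return timeit.timeit(lambda: fn(buffer, patterns), number=number) / number


if __name__ == "__main__":
    patterns = [b"WIN", b"LOSE", b"DRAW"]  # the "2 or 3 patterns" case
    for size in (64, 4096, 4 * 4096):  # up to a few reads' worth of data
        buffer = b"A" * (size - 4) + b"WIN\n"  # match at the very end: worst case
        for fn in (naive_find, per_byte_floor):
            usec = bench(fn, buffer, patterns) * 1e6
            print(f"{fn.__name__:>15} {size:>6} B: {usec:8.2f} us")
```

If the byte-by-byte floor alone is already slower than `naive_find` at these sizes, a pure-Python automaton cannot win in the common case, regardless of its asymptotic complexity.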
Also, remember we are in 2026… the copyright 😉
P.S. Really cool stuff, btw.
```python
def _internal_match_recvuntil(
    self: PipeManager,
    patterns: list[bytes],
```

```python
    while pattern_found < 0:
        open_flag = self._stderr_is_open if stderr else self._stdout_is_open

        if (remaining_time := max(0, end_time - time.time())) == 0:
```
This check should come after the first search: it is pointless to check whether the time has already elapsed during the first iteration.
```python
if (remaining_time := max(0, end_time - time.time())) == 0:
    raise TimeoutError("Timeout reached")

if not open_flag:
```
This one, instead, MUST come after the first search. If data has already been buffered, you miss it. See `_recvonceuntil` as an example, where `(until := data_buffer.find(delims))` is at the beginning of the `while` loop. Or am I missing something?
```python
):
    # We will not receive more data, the child process is not running
    if optional:
        return (-1, bytes(matcher.consumed_bytes))
```
Same here: shouldn't `matcher.stateful_search` be called first?
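The ordering these comments ask for can be sketched as follows. This is a hypothetical, simplified loop, not the PR's `_internal_match_recvuntil`: `recv_until_any`, `read_chunk`, and `is_open` are illustrative names, the error raised on a closed pipe is an assumption, and for brevity it rescans the whole buffer with `bytes.find` on every iteration instead of resuming a stateful matcher. The point is the order: search the buffered data first, then check whether the pipe is closed, then check the deadline, and only then read more.

```python
import time
from collections.abc import Callable


def recv_until_any(
    buffer: bytearray,
    patterns: list[bytes],
    read_chunk: Callable[[float], bytes],
    is_open: Callable[[], bool],
    timeout: float,
    optional: bool = False,
) -> tuple[int, bytes]:
    """Hypothetical conditional recv returning (pattern index, consumed bytes).

    `buffer` holds data already received; unconsumed bytes stay in it.
    """
    end_time = time.monotonic() + timeout
    while True:
        # 1. Search data that is already buffered before anything else.
        best_idx, best_end = -1, -1
        for idx, pat in enumerate(patterns):
            pos = buffer.find(pat)
            if pos >= 0 and (best_idx < 0 or pos + len(pat) < best_end):
                best_idx, best_end = idx, pos + len(pat)
        if best_idx >= 0:
            consumed = bytes(buffer[:best_end])
            del buffer[:best_end]
            return best_idx, consumed

        # 2. No match in the buffer: if no more data can arrive, give up.
        if not is_open():
            if optional:
                consumed = bytes(buffer)
                buffer.clear()
                return -1, consumed
            # Assumed error type; the PR's actual behavior is not shown here.
            raise RuntimeError("pipe closed before any pattern was received")

        # 3. Check the deadline only after at least one search has run.
        remaining = end_time - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("Timeout reached")

        # 4. Read more data (may return b"" if nothing arrives in time).
        buffer += read_chunk(remaining)
```

With a closed pipe and `timeout=0.0`, a buffer that already holds `b"xxWINyy"` still yields `(0, b"xxWIN")`, which is exactly the case the comments above are worried about.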
I think we should also support regex-based conditional recv functions. I expect that some users (cough cough, maybe Chino? cough cough) might expect … I'm also not sure about the names, if I need to call … Maybe …
I did consider a regex-matching recv as an alternative, but matching a regex on a stream, as is done for the multiple patterns, would require an additional dependency such as Google's RE2 (reimplementing a regex engine is probably outside the scope of libdebug). Alternatively, I can implement a naive approach that simply reruns the matcher from Python's built-in `re` module on the whole buffer as input accumulates, which would probably be slow. Still, if there's demand for it, we can consider both options. As for renaming the functions, we can run a poll with different options.
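The naive alternative, rerunning Python's built-in `re` on the accumulated buffer after each read, could look roughly like this. `recv_regex_naive` and its iterable-of-chunks input are hypothetical stand-ins for a real pipe. Besides the quadratic rescanning cost, there is a semantic pitfall: a pattern can match a prefix of data that would have matched differently had more bytes arrived, so patterns should end with an explicit terminator.

```python
import re
from collections.abc import Iterable
from typing import Optional


def recv_regex_naive(
    chunks: Iterable[bytes], pattern: bytes
) -> tuple[Optional[re.Match], bytes, bytes]:
    """Accumulate chunks and rerun re.search on the whole buffer after each one.

    Returns (match or None, bytes consumed up to the match end, leftover bytes).
    """
    regex = re.compile(pattern)
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Rescans everything received so far: O(len(buffer)) work per read.
        match = regex.search(buffer)
        if match is not None:
            return match, buffer[: match.end()], buffer[match.end():]
    # Stream ended without a match: everything received counts as consumed.
    return None, buffer, b""
```

With `rb"score: (\d+)\n"` and the chunks `b"score: 4"` and `b"2\nnext"`, the first search fails and the second captures `b"42"`. Without a terminator, `rb"id=\d+"` fed `b"id=12"` and then `b"34"` already matches `b"id=12"` after the first chunk.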
I think doing it like pwntools does would be fine for now.
This would be really handy for CTF exploitation, where the binary's output changes depending on state or randomization. `match_recvuntil` with a list of patterns is much cleaner than wrapping everything in try/except blocks or layering timeouts. +1 on this, looking forward to seeing it land.

This PR introduces two new pipe interaction functions designed to handle non-deterministic outputs.
The new functions, `match_recvuntil` and `match_recverruntil`, allow users to specify multiple patterns that the binary may produce. Each function returns the index of the matched pattern along with the consumed bytes.