Conditional recv functions#296

Open
Frank01001 wants to merge 5 commits into main from conditional-recv

Conversation

@Frank01001
Member

This PR introduces two new pipe interaction functions designed to handle non-deterministic outputs.

The new functions, match_recvuntil and match_recverruntil, allow users to specify multiple patterns that the binary may produce. Each function returns the index of the matched pattern along with the consumed bytes.
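To illustrate the return contract described above (matched pattern index plus consumed bytes), here is a minimal sketch on an in-memory buffer. `match_until` is a hypothetical stand-in, not the PR's `match_recvuntil`; in particular, picking the pattern whose match ends first and returning `-1` when nothing matches are assumptions made for the example.

```python
def match_until(data: bytes, patterns: list[bytes]) -> tuple[int, bytes]:
    """Hypothetical stand-in: return (index of the pattern whose match ends first,
    bytes consumed up to and including that match), or (-1, data) if none occurs."""
    best_end = None
    best_index = -1
    for index, pattern in enumerate(patterns):
        pos = data.find(pattern)
        if pos < 0:
            continue
        end = pos + len(pattern)
        if best_end is None or end < best_end:
            best_end, best_index = end, index
    if best_index < 0:
        return -1, data
    return best_index, data[:best_end]


# The caller branches on the index instead of guessing which output will appear.
output = b"Loading...\nAccess denied\n> "
index, consumed = match_until(output, [b"Welcome", b"Access denied"])
if index == 0:
    print("logged in")
elif index == 1:
    print("rejected")
```

The point of the design is that one call replaces a chain of `recvuntil` attempts guarded by timeouts.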

@Frank01001 Frank01001 self-assigned this Jan 2, 2026
@Frank01001 Frank01001 requested a review from Copilot January 2, 2026 15:33
@Frank01001 Frank01001 added the enhancement New feature or request label Jan 2, 2026

Copilot AI left a comment

Pull request overview

This PR introduces conditional receive functionality for handling non-deterministic outputs from debugged processes. The implementation uses an Aho-Corasick algorithm to efficiently match multiple patterns simultaneously.

Key changes:

  • Added match_recvuntil and match_recverruntil functions to PipeManager for pattern matching in stdout/stderr
  • Implemented an Aho-Corasick matcher class for efficient multi-pattern searching
  • Added comprehensive unit tests for the Aho-Corasick algorithm and integration tests for the conditional receive functionality
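For readers unfamiliar with the algorithm, the following is a minimal sketch of a stateful Aho-Corasick matcher over bytes. It is not the `AhoCorasickMatcher` from this PR; the class name `TinyAhoCorasick` and the `feed` method are hypothetical, and the sketch stops at the first match it sees.

```python
from collections import deque


class TinyAhoCorasick:
    """Illustrative streaming Aho-Corasick matcher (hypothetical, not libdebug's class)."""

    def __init__(self, patterns: list[bytes]) -> None:
        self.goto: list[dict[int, int]] = [{}]  # trie edges per node
        self.fail: list[int] = [0]              # failure link per node
        self.out: list[list[int]] = [[]]        # pattern indices ending at each node
        for index, pattern in enumerate(patterns):
            node = 0
            for byte in pattern:
                if byte not in self.goto[node]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[node][byte] = len(self.goto) - 1
                node = self.goto[node][byte]
            self.out[node].append(index)
        # Breadth-first pass: compute failure links and merge outputs.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for byte, child in self.goto[node].items():
                link = self.fail[node]
                while link and byte not in self.goto[link]:
                    link = self.fail[link]
                self.fail[child] = self.goto[link].get(byte, 0)
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)
        self.state = 0  # automaton state, kept across feed() calls

    def feed(self, chunk: bytes) -> tuple[int, int]:
        """Consume a chunk; return (pattern index, bytes of this chunk used),
        or (-1, len(chunk)) if no pattern has completed yet."""
        for offset, byte in enumerate(chunk):
            while self.state and byte not in self.goto[self.state]:
                self.state = self.fail[self.state]
            self.state = self.goto[self.state].get(byte, 0)
            if self.out[self.state]:
                return self.out[self.state][0], offset + 1
        return -1, len(chunk)


matcher = TinyAhoCorasick([b"he", b"she", b"hers"])
first = matcher.feed(b"us")     # no pattern completed, state remembers the trailing "s"
second = matcher.feed(b"hers")  # "she" completes across the chunk boundary
print(first, second)
```

Because the automaton state survives between calls, a pattern split across two reads is still found without rescanning earlier bytes.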

Reviewed changes

Copilot reviewed 8 out of 11 changed files in this pull request and generated 13 comments.

| File | Description |
|---|---|
| libdebug/utils/search_utils.py | Implements AhoCorasickMatcher class for efficient multi-pattern searching with stateful matching capability |
| libdebug/commlink/pipe_manager.py | Adds match_recvuntil and match_recverruntil methods for conditional receive operations on stdout/stderr |
| test/scripts/conditional_recv_test.py | Provides comprehensive unit tests for Aho-Corasick matcher and integration tests for conditional receive functionality |
| test/srcs/conditional_recv_test.c | Test binary that generates random conditional output for testing pattern matching |
| test/binaries/{i386,amd64,aarch64}/conditional_recv_test | Compiled test binaries for different architectures |
| test/scripts/__init__.py | Registers ConditionalRecvTest in the test suite |
| test/run_suite.py | Adds ConditionalRecvTest to the fast test suite |
| docs/basics/running_an_executable.md | Documents the new match_recvuntil and match_recverruntil functions in the pipe manager API |
| newsfragments/296.improvement.md | Changelog entry describing the new feature |

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 11 changed files in this pull request and generated 5 comments.


@Frank01001 Frank01001 marked this pull request as ready for review January 4, 2026 15:20
@Frank01001 Frank01001 requested review from MrIndeciso and io-no January 4, 2026 15:20
Member

@io-no io-no left a comment

Did you try to benchmark the efficiency of Aho–Corasick with respect to other approaches? I am thinking about the really dumb approach of repeatedly calling bytes.find(pattern) for each pattern, or similar ones. I know what you’re thinking: the theoretical complexity of Aho–Corasick is very low. However, that complexity really manifests in cases of (i) large buffers and (ii) a high number of patterns.

On the other hand, I expect only a few patterns (2 or 3?) and buffers smaller than a single read (4096 bytes) in most cases (and at worst only a few times larger than a single read). Since the implementation allocates a non-negligible number of Python objects, while other “dumb” solutions are C-backed, there might be some surprises, as we have seen in the past with other pattern-searching code in libdebug, where the dumb solution was more efficient than the state-of-the-art algorithm for similar reasons.

I don’t really know the answer. I’m just suggesting that it might be worth running a few tests to ensure that the most common cases are prioritized.
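A rough sketch of the "dumb" baseline mentioned above, with a small timing harness around it. The function name `naive_search`, the buffer contents, and the pattern set are made up for illustration; the harness only reports whatever the local machine measures and says nothing about how the PR's matcher compares.

```python
import timeit


def naive_search(buffer: bytes, patterns: list[bytes]) -> tuple[int, int]:
    """Return (pattern index, end offset) of the earliest-ending match, or (-1, -1)."""
    best = (-1, -1)
    for index, pattern in enumerate(patterns):
        pos = buffer.find(pattern)  # C-backed substring search, one pass per pattern
        if pos >= 0 and (best[1] < 0 or pos + len(pattern) < best[1]):
            best = (index, pos + len(pattern))
    return best


# Hypothetical common case: one 4096-byte read, three patterns, match at the very end.
buffer = b"A" * 4090 + b"$ done"
patterns = [b"error", b"$ done", b"retry"]

runs = 10_000
elapsed = timeit.timeit(lambda: naive_search(buffer, patterns), number=runs)
print(f"naive search: {elapsed / runs * 1e6:.2f} us per call")
```

Timing the same buffer and patterns against any alternative matcher gives a like-for-like number for the small-input case.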

Also, remember we are in 2026… the copyright 😉

PS. really cool stuff, btw


def _internal_match_recvuntil(
    self: PipeManager,
    patterns: list[bytes],
Member

bytes | str

while pattern_found < 0:
    open_flag = self._stderr_is_open if stderr else self._stdout_is_open

    if (remaining_time := max(0, end_time - time.time())) == 0:
Member

This should be after the first search; it is pointless to check during the first iteration if the time has already elapsed

if (remaining_time := max(0, end_time - time.time())) == 0:
    raise TimeoutError("Timeout reached")

if not open_flag:
Member

This, instead, MUST be after the first search. If data has been buffered, you are missing it. See _recvonceuntil as an example, where you have (until := data_buffer.find(delims)) at the beginning of the while. Or am I missing something?
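A sketch of the ordering suggested in the two comments above: search the buffered data first, and only then look at the open flag and the timeout. `FakePipe` and `recv_until_any` are hypothetical stand-ins written for this example and do not reproduce PipeManager's internals.

```python
import time
from collections import deque


class FakePipe:
    """Hypothetical stand-in for a pipe: already-buffered data plus queued future reads."""

    def __init__(self, buffered: bytes, pending: list[bytes]) -> None:
        self.buffer = bytearray(buffered)
        self.pending = deque(pending)

    @property
    def is_open(self) -> bool:
        return bool(self.pending)

    def read_more(self) -> None:
        self.buffer += self.pending.popleft()


def recv_until_any(pipe: FakePipe, patterns: list[bytes], timeout: float) -> tuple[int, bytes]:
    end_time = time.time() + timeout
    while True:
        # 1. Search what is already buffered before checking any exit condition.
        hits = [
            (pos + len(pattern), index)
            for index, pattern in enumerate(patterns)
            if (pos := pipe.buffer.find(pattern)) >= 0
        ]
        if hits:
            end, index = min(hits)  # earliest-ending match wins
            consumed = bytes(pipe.buffer[:end])
            del pipe.buffer[:end]
            return index, consumed
        # 2. Only now give up: no more data can arrive, or time has run out.
        if not pipe.is_open:
            raise EOFError("pipe closed before any pattern matched")
        if time.time() >= end_time:
            raise TimeoutError("Timeout reached")
        pipe.read_more()


# Closed pipe and zero timeout, yet the buffered data still yields a match.
closed = FakePipe(b"hello OK\n", [])
print(recv_until_any(closed, [b"FAIL", b"OK"], timeout=0.0))
```

With the checks in the opposite order, the call above would raise even though the answer is already sitting in the buffer.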

):
    # We will not receive more data, the child process is not running
    if optional:
        return (-1, bytes(matcher.consumed_bytes))
Member

Same here: shouldn't matcher.stateful_search be called first?

@MrIndeciso
Member

I think that we should also support regex-based conditional recv functions too. I expect that some users (cough cough maybe Chino? cough cough) might expect match_[...] to work with regex-like patterns.

I'm also not sure about the names, if I need to call recvuntil with multiple patterns, I think that recvuntil_something is easier to find than something_recvuntil.

Maybe recvuntil_regex and recvuntil_pattern?

@Frank01001
Member Author

> I think that we should also support regex-based conditional recv functions too. I expect that some users (cough cough maybe Chino? cough cough) might expect match_[...] to work with regex-like patterns.
>
> I'm also not sure about the names, if I need to call recvuntil with multiple patterns, I think that recvuntil_something is easier to find than something_recvuntil.
>
> Maybe recvuntil_regex and recvuntil_pattern?

I did consider a regex-matching recv as an alternative, but matching on a stream, as is done for the multiple patterns, would require adding dependencies such as Google's RE2 (reimplementing a regex engine is probably outside the scope of libdebug). As an alternative, I can implement a naive approach that just reruns the matcher from Python's built-in re module on the whole buffer as input is accumulated, which would probably be slow. Still, if there's demand for it, we can consider both options.

As for renaming the function, we can do a poll with different options.
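For reference, the naive regex approach described above could look roughly like the sketch below. `recv_regex` and its chunk-iterator input are hypothetical; this is not pwntools' or libdebug's implementation.

```python
from __future__ import annotations

import re
from collections.abc import Iterable


def recv_regex(chunks: Iterable[bytes], pattern: bytes) -> tuple[re.Match[bytes] | None, bytes]:
    """Accumulate chunks and rerun re.search on the whole buffer after each one.

    Returns (match, bytes consumed up to the end of the match), or (None, all data).
    """
    regex = re.compile(pattern)
    buffer = bytearray()
    for chunk in chunks:
        buffer += chunk
        # Rescans everything received so far: simple, but quadratic in the worst case.
        match = regex.search(bytes(buffer))
        if match:
            return match, bytes(buffer[: match.end()])
    return None, bytes(buffer)


match, consumed = recv_regex([b"flag{ab", b"cd}\nrest"], rb"flag\{(\w+)\}")
print(match.group(1) if match else None, consumed)
```

One caveat of rescanning partial data: an open-ended pattern such as `\d+` can match as soon as the first digits arrive, before the rest of the number has been read.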

@MrIndeciso
Member

> I did consider the regex matching recv as an alternative, but matching it on a stream like in the case of the multiple patterns would require adding additional dependencies such as Google's RE2 (reimplementing a regex engine is probably outside the scope of libdebug). As an alternative, I can implement a naive approach that just reruns the regex matcher from the builtin python package on the overall buffer as input is accumulated, which would probably be slow. Still, if there's demand for it, we can consider both options.

I think doing it like pwntools does would be fine for now.

@Frank01001
Member Author

> Did you try to benchmark the efficiency of Aho–Corasick with respect to other approaches? I am thinking about the really dumb approach of repeatedly calling bytes.find(pattern) for each pattern, or similar ones. I know what you’re thinking: the theoretical complexity of Aho–Corasick is very low. However, that complexity really manifests in cases of (i) large buffers and (ii) a high number of patterns.
>
> On the other hand, I expect only a few patterns (2 or 3?) and buffers smaller than a single read (4096 bytes) in most cases (and at worst only a few times larger than a single read). Since the implementation allocates a non-negligible number of Python objects, while other “dumb” solutions are C-backed, there might be some surprises, as we have seen in the past with other pattern-searching code in libdebug, where the dumb solution was more efficient than the state-of-the-art algorithm for similar reasons.
>
> I don’t really know the answer. I’m just suggesting that it might be worth running a few tests to ensure that the most common cases are prioritized.
>
> Also, remember we are in 2026… the copyright 😉
>
> PS. really cool stuff, btw

[Image: Screenshot From 2026-01-05 13-15-39]

Yes, the naive approach appears to be faster on the test binary. I guess we could either stick to the base approach or set a threshold on the number of patterns / flag to enable/disable it. I don't have large use cases in mind right now that I can readily benchmark.

@kbrenner-dev

This would be really handy for CTF exploitation where the binary output changes depending on state or randomization. match_recvuntil with a list of patterns is much cleaner than wrapping everything in try/except blocks or layering timeouts. +1 on this, looking forward to seeing it land.


Labels

enhancement New feature or request


5 participants