
Conversation


@maurer maurer commented Aug 14, 2025

Currently, while `quiche::Connection` will clean up streams from its `StreamMap`, `quiche::h3::Connection` does not. As a result, long-lived connections will slowly leak memory through the stream map in the connection.

This patch forwards the `collect()` event from the `quiche::Connection` to the `quiche::h3::Connection` through the `h3::Connection::poll()` call, allowing the `h3::Connection` to clean up its unused streams.

@maurer maurer requested a review from a team as a code owner August 14, 2025 18:30

ghedo commented Sep 19, 2025

Thank you for the PR, and sorry for the delay. Looking at this, my main worry was that an application not using the quiche h3 implementation (e.g. implementing a different application protocol on top of QUIC) would have to drain the collected streams queue itself or risk a memory leak of its own. That said, given that we keep a map of collected streams anyway, this is probably not that big of an issue.

We do probably want to expose that API to applications directly, though an issue there would be both application and quiche::h3 draining the queue which could lead to leaking memory anyway. We'll need to think about how the API should work a bit more maybe.

I also spent some time looking at a different fix that wouldn't need to keep a queue of streams (https://github.com/cloudflare/quiche/compare/clean-streams-2?expand=1) though that doesn't quite handle some edge cases very well yet, so it wouldn't be a full replacement of this PR I think.


maurer commented Sep 23, 2025

Sorry, a mixup with the git branch accidentally closed this.

I tried some things along the lines of what you were tracking with `local_finished`, where I tried to avoid adding variable-sized data to the structure, but could never get it to actually stop the memory growth in practice. It sounds like you already understand what some of these cases are. If not, I could try to extend my test cases to produce them.

The trouble I ran into without actually having a list of pending streams was that it didn't seem like I had enough information at the h3 module to figure out when a stream was actually done. I'd either fail to collect idle-but-completed streams (our DoH app would continue to grow), or I'd be too aggressive and remove things that your test suite forbids (this did work in practice for our app though).

If you get to the point with your alternate implementation that you'd like me to test it for leakage on our test app, I'll give it a shot. If you're curious about how we're using it, something very close to its current state is here. The leak case occurs when an extremely reliable network connection keeps making queries: every query ends up forcing the `h3::Connection` to keep the stream for that query alive.

@maurer maurer reopened this Sep 23, 2025
s.handshake().unwrap();

let init_streams_client = s.client.stream_count();
let init_streams_server = s.client.stream_count();
Contributor


Think this should be server?

Suggested change
let init_streams_server = s.client.stream_count();
let init_streams_server = s.server.stream_count();

self.local_max_streams_uni - self.peer_opened_streams_uni
}

/// Return streams we have collected but not yet notified the transport
Contributor


I don't understand what "not yet notified the transport" means here. The quiche transport lib is the thing that would be calling `collect()`, which triggers adding to `pending_collected`.

// Polling again should clean up the client
assert_eq!(s.poll_client(), Ok((stream, Event::Finished)));
assert_eq!(s.poll_client(), Err(Error::Done));
assert_eq!(s.client.stream_count(), init_streams_client);
Contributor


I'd like to see tests that cover the reset and stopped stream cases too.
