h11 keep-alive is slowing down HTTP clients #6557

Description

@thrau

in #6445, it was reported that SQS had a performance regression with the new HTTP gateway, but it turns out the problem runs much deeper. my hunch that it has to do with HTTP connection management seems to have been correct.

TL;DR:

  • what's happening? hypercorn/h11 connection keep-alive is slowing down successive HTTP client calls that re-use connections
  • why was it working before? http2_server.py was forcing keep-alive to be disabled by responding with Connection: close headers

connection re-use and keep-alive in the spec vs localstack

RFC 7230 §6.3 Persistence states that, unless the HTTP/1.1 client explicitly sends a Connection: close header, the server should use a persistent connection by default.

by forcing Connection: close here, we are not adhering to the spec.

if "Connection" not in response.headers:
response.headers["Connection"] = "close"

however, this had a side effect in h11 that made clients close connections faster.

h11 keep-alive

h11 (the protocol library underlying hypercorn) uses a state machine for connections that looks like this (copied from the h11 documentation):

[h11 connection lifecycle state machine diagram from the h11 docs]

the purple path start_next_cycle() was never used with http2_server.py. h11 considers keep-alive to be active according to the spec: if the client uses HTTP/1.1 and does not explicitly disable it via a Connection: close header, then keep-alive is enabled: https://github.com/python-hyper/h11/blob/fb6c715184c6a1f29e299d2f63d722b2ea4309d5/h11/_connection.py#L72-L78

more details on re-using connections in h11 can be found in the docs: Re-using a connection: keep-alive and pipelining
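
to make the purple path concrete, here is a minimal standalone sketch of a single server-side request/response cycle in h11 (not hypercorn's actual usage): with keep-alive active, both peers end up in DONE and start_next_cycle() resets the machine to IDLE; if the response had carried Connection: close, the states would go to MUST_CLOSE instead and the connection would have to be torn down.

import h11

conn = h11.Connection(our_role=h11.SERVER)

# feed a complete HTTP/1.1 request (no Connection: close -> keep-alive stays on)
conn.receive_data(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
assert type(conn.next_event()) is h11.Request
assert type(conn.next_event()) is h11.EndOfMessage

# send a response without a Connection: close header
body = b"ok"
conn.send(h11.Response(status_code=200, headers=[("Content-Length", str(len(body)))]))
conn.send(h11.Data(data=body))
conn.send(h11.EndOfMessage())

# both sides are DONE, so the purple path is available
assert conn.our_state is h11.DONE and conn.their_state is h11.DONE
conn.start_next_cycle()  # back to IDLE/IDLE, ready for the next request on the same socket
assert conn.our_state is h11.IDLE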

It's probably not our asgi/wsgi bridge

Boto clients running against a mock quart application suffer from the same (albeit a bit less pronounced) problem:

import asyncio

from quart import Quart, request

app = Quart(__name__)

@app.route("/", methods=["POST", "GET"])
async def echo():
    data = await request.data
    await asyncio.sleep(0.005)  # simulate the latency of the handler chain
    return '<?xml version=\'1.0\' encoding=\'utf-8\'?>\n<ListQueuesResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/"><ListQueuesResult /><ResponseMetadata><RequestId>Y05BFPBE0ZMIURXXPY35R57NELYBSPCNY6XBU0FR0A6ITR27LZUJ</RequestId></ResponseMetadata></ListQueuesResponse>'

app.run(port=4566)

running the benchmark against this shows the same or at least similar behavior.
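
for reference, a benchmark along these lines (a sketch; endpoint, region, credentials, and iteration count are placeholders, not the exact benchmark from #6445):

import time

import boto3

# points at the mock quart app (or the localstack gateway) started above
sqs = boto3.client(
    "sqs",
    endpoint_url="http://localhost:4566",
    region_name="us-east-1",
    aws_access_key_id="test",
    aws_secret_access_key="test",
)

timings = []
for _ in range(100):
    start = time.perf_counter()
    sqs.list_queues()  # successive calls re-use the pooled keep-alive connection
    timings.append(time.perf_counter() - start)

print(f"avg call time: {sum(timings) / len(timings) * 1000:.2f} ms")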

what to do?

well, for now the easiest fix would be to add a ResponseHandler to the gateway that sets the Connection: close header like http2_server.py does. that's not great (breaking specs causes headaches), and it's also not clear how that will affect other clients. but we don't seem to be having any problems doing it now, so 🤷
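
such a handler could look roughly like this (a sketch; the module paths and handler signature are my best guess at the gateway's handler chain API, not verified code):

from localstack.aws.api import RequestContext
from localstack.aws.chain import HandlerChain
from localstack.http import Response


def force_connection_close(chain: HandlerChain, context: RequestContext, response: Response):
    # mirror http2_server.py: disable keep-alive unless a service handler
    # already set an explicit Connection header
    if "Connection" not in response.headers:
        response.headers["Connection"] = "close"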

uncovering what happens in boto and h11 might help us figure out a better solution though. some initial debugging into urllib only led me to the connection pool class, where I found that the connection objects are re-used in both cases, but the connection is reset when Connection: close is set.
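
that reset behavior is easy to observe from the outside with urllib3's debug logging (a sketch; the port assumes the server above):

import logging

import urllib3

# urllib3 logs "Starting new HTTP connection" whenever a socket is
# (re-)established, and "Resetting dropped connection" when a pooled
# connection object has to reconnect
logging.basicConfig(level=logging.DEBUG)

http = urllib3.PoolManager()
for _ in range(3):
    http.request("GET", "http://localhost:4566/")

# with keep-alive: a single "Starting new HTTP connection" line
# with Connection: close responses: the pooled connection is reset and
# reconnects on every request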

Originally posted by @thrau in #6445 (comment)
