in #6445, it was reported that SQS had a performance regression with the new HTTP gateway, but it turns out the problem sits much deeper. my hunch that it has to do with HTTP connection management seems to have been correct.
TL;DR:
- what's happening? hypercorn/h11 connection keep-alive is slowing down successive HTTP client calls that re-use connections
- why was it working before? `http2_server.py` was forcing keep-alive to be disabled by responding with `Connection: close` headers
### connection re-use and keep-alive in the spec vs localstack
RFC 7230 §6.3 (Persistence) states that, unless the HTTP/1.1 client explicitly sends a `Connection: close` header, the server should use a persistent connection by default.
by forcing `Connection: close` here, we are not adhering to the spec:
`localstack/localstack/utils/server/http2_server.py`, lines 225 to 226 in 736e7ad:

```python
if "Connection" not in response.headers:
    response.headers["Connection"] = "close"
```
however, this had a side effect in h11 that made clients close connections faster.
### h11 keep-alive
h11 (the protocol library underlying hypercorn) uses a state machine for connections (see the state machine diagram in the h11 documentation).
the purple path in that diagram, `start_next_cycle()`, was never used with `http2_server.py`. h11 considers keep-alive to be active according to the spec: if the client uses HTTP/1.1 and is not explicitly disabling it via a `Connection: close` header, then h11 will enable keep-alive: https://github.com/python-hyper/h11/blob/fb6c715184c6a1f29e299d2f63d722b2ea4309d5/h11/_connection.py#L72-L78
more details on re-using connections in h11 can be found in the docs: *Re-using a connection: keep-alive and pipelining*
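to make the mechanics concrete, here is a minimal sketch using h11 directly (not LocalStack code; the request and headers are made up): without an explicit `Connection: close` on either side, both peers end a request/response cycle in `DONE` and the connection can be recycled via `start_next_cycle()`.

```python
import h11

conn = h11.Connection(our_role=h11.SERVER)

# a client request without an explicit "Connection: close" header
conn.receive_data(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
assert isinstance(conn.next_event(), h11.Request)
assert isinstance(conn.next_event(), h11.EndOfMessage)

# respond without "Connection: close" -> keep-alive stays active
# (content-length is needed so h11 doesn't fall back to close-based framing)
conn.send(h11.Response(status_code=200, headers=[("content-length", "0")]))
conn.send(h11.EndOfMessage())

# both sides are DONE, so the connection can serve the next request
assert conn.our_state is h11.DONE and conn.their_state is h11.DONE
conn.start_next_cycle()  # the "purple path" back to IDLE
assert conn.our_state is h11.IDLE and conn.their_state is h11.IDLE
```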
### It's probably not our asgi/wsgi bridge
Boto clients running against a mock quart application suffer from the same (albeit a bit less pronounced) problem:
```python
import asyncio

from quart import Quart, request

app = Quart(__name__)

@app.route("/", methods=["POST", "GET"])
async def echo():
    data = await request.data
    await asyncio.sleep(0.005)  # simulate handler chain
    return '<?xml version=\'1.0\' encoding=\'utf-8\'?>\n<ListQueuesResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/"><ListQueuesResult /><ResponseMetadata><RequestId>Y05BFPBE0ZMIURXXPY35R57NELYBSPCNY6XBU0FR0A6ITR27LZUJ</RequestId></ResponseMetadata></ListQueuesResponse>'

app.run(port=4566)
```

running the benchmark against this shows the same, or at least similar, behavior.
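for reference, a sketch of the kind of benchmark meant here (the loop count and timing method are assumptions, not the original benchmark script):

```python
import time

import boto3

# hypothetical benchmark: time successive ListQueues calls against the mock
sqs = boto3.client(
    "sqs",
    endpoint_url="http://localhost:4566",
    region_name="us-east-1",
    aws_access_key_id="test",
    aws_secret_access_key="test",
)

start = time.perf_counter()
n = 100
for _ in range(n):
    sqs.list_queues()
print(f"{(time.perf_counter() - start) / n * 1000:.2f} ms per call")
```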
### what to do?
well for now, the easiest fix would be to add a `ResponseHandler` to the gateway that sets the `Connection: close` header like `http2_server.py` does (see the sketch below). that's not great (breaking specs causes headaches), and it's also not clear how that will affect other clients. but we don't seem to be having any problems doing it now so 🤷
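a minimal sketch of what that handler could look like, assuming the gateway's `(chain, context, response)` response-handler signature; the import paths are assumptions:

```python
from localstack.aws.api import RequestContext
from localstack.aws.chain import HandlerChain
from localstack.http import Response

def force_connection_close(chain: HandlerChain, context: RequestContext, response: Response):
    # mirror the old http2_server.py behavior: disable keep-alive unless an
    # explicit Connection header was already set further up the chain
    if "Connection" not in response.headers:
        response.headers["Connection"] = "close"
```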
uncovering what happens in boto and h11 would maybe help us figure out a better solution though. some initial debugging into urllib only led me to the connection pool class, where i found that the connection objects are re-used in both cases, but the connection is reset when `Connection: close` is set (see the sketch below).
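a hedged sketch of how that re-use can be observed with urllib3 (which backs boto's HTTP layer); `num_connections` on the pool counts how many sockets were opened:

```python
import urllib3

http = urllib3.PoolManager(maxsize=1)
for _ in range(5):
    http.request("GET", "http://localhost:4566/")

pool = http.connection_from_url("http://localhost:4566/")
# with keep-alive active, the pool opens one socket and re-uses it for all
# five requests; when the server replies "Connection: close", the socket is
# discarded and a fresh one is opened per request
print(pool.num_connections, pool.num_requests)
```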
Originally posted by @thrau in #6445 (comment)