h11 keep-alive is slowing down HTTP clients #6557

Description

@thrau

in #6445, it was reported that SQS had a performance regression with the new HTTP gateway, but it turns out the problem runs much deeper. my hunch that it has to do with HTTP connection management seems to have been correct.

TL;DR:

  • what's happening? hypercorn/h11 connection keep-alive is slowing down successive HTTP client calls that re-use connections
  • why was it working before? http2_server.py was forcing keep-alive to be disabled by responding with Connection: close headers

connection re-use and keep-alive in the spec vs localstack

RFC 7230 §6.3 Persistence states that, unless the HTTP/1.1 client explicitly sends a Connection: close header, the server should use a persistent connection by default.

by forcing Connection: close here, we are not adhering to the spec.

if "Connection" not in response.headers:
response.headers["Connection"] = "close"

however, this had a side effect in h11 that made clients close connections faster.

h11 keep-alive

h11 (the protocol library underlying hypercorn) uses a state machine for connections that looks like this (copied from the h11 documentation):

[h11 connection lifecycle state machine diagram from the h11 docs]

the purple path start_next_cycle() was never used with http2_server.py. h11 considers keep-alive to be active according to the spec: if the client uses HTTP/1.1 and does not explicitly disable it via a Connection: close header, then keep-alive is enabled: https://github.com/python-hyper/h11/blob/fb6c715184c6a1f29e299d2f63d722b2ea4309d5/h11/_connection.py#L72-L78

more details on re-using connections in h11 can be found in the docs: Re-using a connection: keep-alive and pipelining
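
to make the purple path concrete, here is a minimal standalone sketch of a single server-side request/response cycle in h11 (not hypercorn's actual usage): with keep-alive active, both peers end up in DONE and start_next_cycle() resets the machine to IDLE; if the response had carried Connection: close, the states would go to MUST_CLOSE instead and the connection would have to be torn down.

import h11

conn = h11.Connection(our_role=h11.SERVER)

# feed a complete HTTP/1.1 request (no Connection: close -> keep-alive stays on)
conn.receive_data(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
assert type(conn.next_event()) is h11.Request
assert type(conn.next_event()) is h11.EndOfMessage

# send a response without a Connection: close header
body = b"ok"
conn.send(h11.Response(status_code=200, headers=[("Content-Length", str(len(body)))]))
conn.send(h11.Data(data=body))
conn.send(h11.EndOfMessage())

# both sides are DONE, so the purple path is available
assert conn.our_state is h11.DONE and conn.their_state is h11.DONE
conn.start_next_cycle()  # back to IDLE/IDLE, ready for the next request on the same socket
assert conn.our_state is h11.IDLE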

It's probably not our asgi/wsgi bridge

Boto clients running against a mock quart application suffer from the same (albeit a bit less pronounced) problem:

import asyncio

from quart import Quart, request

app = Quart(__name__)

@app.route("/", methods=["POST", "GET"])
async def echo():
    data = await request.data
    await asyncio.sleep(0.005)  # simulate the latency of the handler chain
    return '<?xml version=\'1.0\' encoding=\'utf-8\'?>\n<ListQueuesResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/"><ListQueuesResult /><ResponseMetadata><RequestId>Y05BFPBE0ZMIURXXPY35R57NELYBSPCNY6XBU0FR0A6ITR27LZUJ</RequestId></ResponseMetadata></ListQueuesResponse>'

app.run(port=4566)

running the benchmark against this shows the same or at least similar behavior.
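
for reference, a benchmark along these lines (a sketch; endpoint, region, credentials, and iteration count are placeholders, not the exact benchmark from #6445):

import time

import boto3

# points at the mock quart app (or the localstack gateway) started above
sqs = boto3.client(
    "sqs",
    endpoint_url="http://localhost:4566",
    region_name="us-east-1",
    aws_access_key_id="test",
    aws_secret_access_key="test",
)

timings = []
for _ in range(100):
    start = time.perf_counter()
    sqs.list_queues()  # successive calls re-use the pooled keep-alive connection
    timings.append(time.perf_counter() - start)

print(f"avg call time: {sum(timings) / len(timings) * 1000:.2f} ms")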

what to do?

well, for now the easiest fix would be to add a ResponseHandler to the gateway that sets the Connection: close header like http2_server.py does. that's not great (breaking specs causes headaches), and it's also not clear how that will affect other clients. but we don't seem to be having any problems doing it now, so 🤷
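
such a handler could look roughly like this (a sketch; the module paths and handler signature are my best guess at the gateway's handler chain API, not verified code):

from localstack.aws.api import RequestContext
from localstack.aws.chain import HandlerChain
from localstack.http import Response


def force_connection_close(chain: HandlerChain, context: RequestContext, response: Response):
    # mirror http2_server.py: disable keep-alive unless a service handler
    # already set an explicit Connection header
    if "Connection" not in response.headers:
        response.headers["Connection"] = "close"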

uncovering what happens in boto and h11 might help us figure out a better solution though. some initial debugging into urllib only led me to the connection pool class, where I found that the connection objects are re-used in both cases, but the connection is reset when Connection: close is set.
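
that reset behavior is easy to observe from the outside with urllib3's debug logging (a sketch; the port assumes the server above):

import logging

import urllib3

# urllib3 logs "Starting new HTTP connection" whenever a socket is
# (re-)established, and "Resetting dropped connection" when a pooled
# connection object has to reconnect
logging.basicConfig(level=logging.DEBUG)

http = urllib3.PoolManager()
for _ in range(3):
    http.request("GET", "http://localhost:4566/")

# with keep-alive: a single "Starting new HTTP connection" line
# with Connection: close responses: the pooled connection is reset and
# reconnects on every request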

Originally posted by @thrau in #6445 (comment)
