Description
Hi java-buildpack team,
As far as I can tell, the Tomcat default timeout for keep-alives is 60 seconds:
- keepAliveTimeout: The number of milliseconds this Connector will wait for another HTTP request before closing the connection. The default value is to use the value that has been set for the connectionTimeout attribute. Use a value of -1 to indicate no (i.e. infinite) timeout.
- connectionTimeout: The number of milliseconds this Connector will wait, after accepting a connection, for the request URI line to be presented. Use a value of -1 to indicate no (i.e. infinite) timeout. The default value is 60000 (i.e. 60 seconds) but note that the standard server.xml that ships with Tomcat sets this to 20000 (i.e. 20 seconds). Unless disableUploadTimeout is set to false, this timeout will also be used when reading the request body (if any).
However, when keep-alive backend connections are enabled in the gorouter, the gorouter will keep idle backend connections open for 90 seconds.
Any time connections are kept open using HTTP keep-alive, it's important that the client always closes the connection before the server does. If the server closes the connection first, there is a risk of a race condition in which the client sends a request on a pre-established connection that arrives just after the server has closed it.
This means that anyone using Tomcat in its default configuration on a Cloud Foundry installation with keep-alive connections enabled will see occasional 502s caused by this race condition (the clue for diagnosing it is that it happens exactly 60 seconds after the last successful request on the connection).
(I'm not sure whether this is better raised as a gorouter issue or a java-buildpack issue, but in any case the default values should work well together, whatever they happen to be.)
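
If the fix ends up on the Tomcat side, the relevant knob is the connector's keepAliveTimeout. As a rough illustration, here is a minimal embedded-Tomcat sketch that raises it above the gorouter's 90-second backend keep-alive; the 120000 ms value and the port are assumptions, not recommended defaults, and in a buildpack deployment the equivalent change would go on the Connector element in the server.xml the buildpack provides.

```java
import org.apache.catalina.LifecycleException;
import org.apache.catalina.connector.Connector;
import org.apache.catalina.startup.Tomcat;

public class KeepAliveTimeoutExample {
    public static void main(String[] args) throws LifecycleException {
        Tomcat tomcat = new Tomcat();
        tomcat.setPort(8080); // illustrative port

        Connector connector = tomcat.getConnector();
        // keepAliveTimeout defaults to connectionTimeout; setting it
        // explicitly keeps idle connections open longer than the router's
        // 90-second backend keep-alive, so the router closes first.
        connector.setProperty("keepAliveTimeout", "120000");

        tomcat.start();
        tomcat.getServer().await();
    }
}
```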
Supporting information on why it's important for the client to close connections before the server does, for any L7 client-server connection using keep-alives:
https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-idle-timeout.html
To ensure that the load balancer is responsible for closing the connections to your instance, make sure that the value you set for the HTTP keep-alive time is greater than the idle timeout setting configured for your load balancer.
https://docs.pivotal.io/application-service/2-8/operating/frontend-idle-timeout.html
In general, set the value higher than your load balancer’s back end idle timeout to avoid the race condition where the load balancer sends a request before it discovers that the Gorouter or HAProxy has closed the connection.
https://blog.percy.io/tuning-nginx-behind-google-cloud-platform-http-s-load-balancer-305982ddb340
This causes the load balancer to be the side that closes idle connections, rather than nginx, which fixes the race condition!
Let’s look from the “outside” (the client) to the “inside” (the server in the deployment). As you travel further in, the keep-alive timeouts should be configured to be longer. That is, the outermost layer (in this case, the client) should have the shortest backend keep-alive timeout, and as you go in, the keep-alive timeouts should get progressively longer in relation to the corresponding backend or frontend idle timeout.
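
To make that ordering concrete, here is a trivial sketch with assumed values; only the gorouter's 90-second backend keep-alive comes from this issue, and the other numbers are purely illustrative:

```java
public class TimeoutOrdering {
    // Illustrative values in milliseconds, not product defaults.
    static final long CLIENT_OR_LB_IDLE_MS   = 60_000;  // outermost layer
    static final long GOROUTER_KEEP_ALIVE_MS = 90_000;  // middle layer
    static final long TOMCAT_KEEP_ALIVE_MS   = 120_000; // innermost layer

    public static void main(String[] args) {
        // The outer layer must always give up first, so the chain reads:
        // outermost < middle < innermost.
        boolean ordered = CLIENT_OR_LB_IDLE_MS < GOROUTER_KEEP_ALIVE_MS
                && GOROUTER_KEEP_ALIVE_MS < TOMCAT_KEEP_ALIVE_MS;
        System.out.println("Timeouts ordered outside-in: " + ordered);
    }
}
```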
Consider Golang’s HTTP client, which we believe to be somewhat robust to this type of client-side problem. The low-level HTTP transport package used by the default client implements a background loop that checks whether a connection in its idle pool has been closed by the server before reusing it for a request. This leaves a small window for a race condition in which the server closes the connection between the check and the new request; if the client loses this race, it retries the request on a new connection.
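
For a rough Java-side analogue (not the Go implementation described above), Apache HttpClient 4.x lets the pooling connection manager re-validate connections that have been idle before handing them out; the 1-second window and the example URL below are assumptions, not recommendations:

```java
import java.io.IOException;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class StaleConnectionCheckExample {
    public static void main(String[] args) throws IOException {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        // Re-check pooled connections that have been idle for more than
        // 1 second before reusing them, shrinking (but not eliminating)
        // the window in which the server can close the socket first.
        cm.setValidateAfterInactivity(1_000);

        try (CloseableHttpClient client = HttpClients.custom()
                .setConnectionManager(cm)
                .build()) {
            try (CloseableHttpResponse response =
                    client.execute(new HttpGet("https://example.org/"))) {
                System.out.println(response.getStatusLine());
            }
        }
    }
}
```

As with Go, this check only narrows the race window; the robust fix is still for the server-side keep-alive timeout to be longer than the idle timeout of whatever sits in front of it.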