Skip to content

Conversation

@adelsz
Copy link

@adelsz adelsz commented May 13, 2025

This PR introduces a new shutdownDuration configuration variable which allows the proxy to shutdown gracefully.

After receiving a termination signal, proxy will keep serving traffic for shutdownDuration but will start returning a 503 error for its readiness /ready endpoint signaling to the load balancer to not send new traffic to it.

After shutdownDuration, proxy shuts down as usual.

Motivation and Context

Fixes #3066

How Has This Been Tested?

  1. Launch proxy
  2. Send SIGTERM
  3. Observe that proxy keeps serving traffic for shutdownDuration seconds
  4. While timeout hasn't elapsed /ready endpoint returns an error
  5. After timeout the proxy exits normally

Checklist:

  • My change requires a change to the documentation or CHANGELOG.
  • I have updated the documentation/CHANGELOG accordingly.
  • I have created a feature (non-master) branch for my PR.
  • I have written tests for my code changes.

@adelsz adelsz requested a review from a team as a code owner May 13, 2025 17:15
@tuunit tuunit force-pushed the fix-readiness-check branch from c014b45 to 6ec9c0d Compare May 24, 2025 14:16
Copy link
Member

@tuunit tuunit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation wise looks good. Please add an entry to the changelog. Run make generate and commit the update to the alpha config documentation. Add the newly introduced flag / option to the configuration.

https://github.com/oauth2-proxy/oauth2-proxy/blob/master/docs/docs/configuration/overview.md

@tuunit tuunit changed the title Add graceful shutdown to prevent 502 errors when oauth2-proxy is load balanced feat: graceful shutdown to prevent 502 errors when oauth2-proxy is load balanced May 24, 2025
@github-actions github-actions bot added the docs label May 27, 2025
@adelsz
Copy link
Author

adelsz commented May 27, 2025

Thanks for the review @tuunit and @dolmen.
I have addressed your comments in the last commit.

@adelsz adelsz requested a review from tuunit May 28, 2025 15:30
@tuunit tuunit force-pushed the fix-readiness-check branch from f611ac0 to c1edeab Compare May 29, 2025 16:14
// Duration of time to continue serving traffic after receiving an exit signal.
// During this time the readiness endpoint will be returning HTTP 503 errors.
// Leave blank to disable.
ShutdownDuration time.Duration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check what impact this will have on the AlphaConfig 🤔

@tuunit tuunit force-pushed the fix-readiness-check branch from c1edeab to 4695366 Compare May 29, 2025 16:19

- [#3072](https://github.com/oauth2-proxy/oauth2-proxy/pull/3072) feat: support for multiple github orgs #3072 (@daniel-mersch)
- [#3068](https://github.com/oauth2-proxy/oauth2-proxy/pull/3068) feat: graceful shutdown to prevent errors when oauth2-proxy is load balanced (@adelsz)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The breaking change in the API surface of the oauthproxy package should probably be also mentioned.

Also, signal handling is now managed in the main package (which is better in my opinion), not anymore in oauthproxy.

@tuunit What do you think?

handler: opts.Handler,
handler: opts.Handler,
shutdownDuration: opts.ShutdownDuration,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a check about negative Shutdown duration to fail early with a meaningful error.

if err := srv.Shutdown(context.Background()); err != nil {
return fmt.Errorf("error shutting down server: %v", err)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should handle differently if the server couldn't start...

Comment on lines +224 to +226
logger.Printf("Context canceled. Waiting %s before shutting down the listeners.", s.shutdownDuration)
time.Sleep(s.shutdownDuration)
logger.Printf("Shutting down listener.")
Copy link
Contributor

@dolmen dolmen May 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an if s.shutdownDuration > 0 check around those lines.

Copy link

@haitch haitch Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need this time.Sleep, it's going to delay things without any help.

srv.Shutdown is a blocking call, it will ensure inflight request drained before it finish.

All we need to do is make sure process won't exit before srv.shutdown finish, quotes from https://pkg.go.dev/net/http#Server.Shutdown

Shutdown gracefully shuts down the server without interrupting any active connections. Shutdown works by first closing all open listeners, then closing all idle connections, and then waiting indefinitely for connections to return to idle and then shut down. If the provided context expires before the shutdown is complete, Shutdown returns the context's error, otherwise it returns any error returned from closing the Server's underlying Listener(s).

When Shutdown is called, Serve, ServeTLS, ListenAndServe, and ListenAndServeTLS immediately return ErrServerClosed. Make sure the program doesn't exit and waits instead for Shutdown to return.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@haitch This PR isn't about draining inflight requests, that works correctly already.

Please see #3066 for the motivation.

Long story short:
Load balancer health checks are periodical, so you must keep the service running and actively serving traffic while reporting a bad health check, otherwise requests will be lost by the load balancer.


// NewOAuthProxy creates a new instance of OAuthProxy from the options provided
func NewOAuthProxy(opts *options.Options, validator func(string) bool) (*OAuthProxy, error) {
func NewOAuthProxy(ctx context.Context, opts *options.Options, validator func(string) bool) (*OAuthProxy, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this API change shouldn't be necessary.

Giving the server context should stay in the Start method.

What about moving the call to buildPreAuthChain in Start?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, passing ctx to a constructor is weird to me.

readynesscheck should better be a background loop than a middleware, middleware should be processed on every request.

I think a proper way is to construct readynesscheck in main with root ctx, pass into this newOAuthProxy and to buildPreAuthChain.

@tuunit tuunit force-pushed the fix-readiness-check branch from 4695366 to 5630b4d Compare July 16, 2025 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Incorrect readiness probe behaviour

4 participants