Handle MOVED error pointing to same endpoint. by barshaul · Pull Request #3003 · StackExchange/StackExchange.Redis

barshaul · 2026-02-03T12:22:30Z

Handle MOVED error pointing to same endpoint

Problem

When Redis/Valkey servers are deployed behind DNS records, load balancers, or proxies (as is common in managed environments), a MOVED error may be returned with a target node endpoint that is the same endpoint from which the error originated. Before the proposed change, when StackExchange.Redis received a MOVED error pointing to the same endpoint, it would fail the request and propagate the MOVED exception back to the application.

Issue ref: #2990

Solution

When a MOVED error points to the same endpoint, the client now triggers a reconnection before retrying the command. This allows the DNS record, proxy, or load balancer to route the new connection to a different underlying server host, enabling the retry to succeed.
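As a rough illustration of the detection step, the sketch below parses a redirect of the form `MOVED <slot> <host>:<port>` (the Redis Cluster error format) and checks whether the target equals the endpoint that produced it. `MovedError`, `TryParse`, and `PointsToSameEndpoint` are hypothetical names for this example only, and the plain host/port string comparison is an assumption; it is not the code in ResultProcessor.cs.

```csharp
using System;

// Hypothetical helper for illustration; not the library's internal code.
public static class MovedError
{
    // Parses "MOVED <slot> <host>:<port>".
    public static bool TryParse(string error, out int slot, out string host, out int port)
    {
        slot = 0; host = ""; port = 0;
        var parts = error.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 3 || parts[0] != "MOVED") return false;
        if (!int.TryParse(parts[1], out slot)) return false;

        // Split on the last ':' so hosts that themselves contain ':' still parse.
        int colon = parts[2].LastIndexOf(':');
        if (colon <= 0) return false;
        host = parts[2].Substring(0, colon);
        return int.TryParse(parts[2].Substring(colon + 1), out port);
    }

    // True when the redirect target is the endpoint that returned the error.
    public static bool PointsToSameEndpoint(string error, string currentHost, int currentPort)
        => TryParse(error, out _, out var host, out var port)
           && port == currentPort
           && string.Equals(host, currentHost, StringComparison.OrdinalIgnoreCase);
}
```

With this sketch, `MovedError.PointsToSameEndpoint("MOVED 3999 cache.example.com:6379", "cache.example.com", 6379)` is true, which is the case where the client would reconnect before retrying instead of following the redirect.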

Why proactive reconnection is necessary

The server closes the connection on its end immediately after sending the MOVED-to-same-endpoint error. However, due to the multiplexed nature of StackExchange.Redis connections, the library wouldn't notice the disconnection when it retries the request—all Write and Flush operations return successfully even though the underlying connection is broken. The disconnection is only detected when attempting to read the response.

At that point, it isn't safe to retry on the connection failure: we don't know if the request was actually sent to the server, so the connection error is raised back to the application, which also cannot determine if it's safe to retry. By initiating a reconnect immediately following the MOVED-to-same-endpoint error, we ensure the retry occurs on a fresh connection to a new server host.

Changes

Core fix:

ResultProcessor.cs: Detect when MOVED endpoint matches the current server endpoint and trigger reconnection by disposing the existing connection.

Tests:

MovedToSameEndpointTests.cs: Integration test verifying the reconnect-and-retry behavior
MovedTestServer.cs: Test server helper that simulates MOVED responses pointing to the same endpoint

mgravell · 2026-02-03T15:48:44Z

src/StackExchange.Redis/ResultProcessor.cs

                        {
-                            if (bridge is null)
-                            {
-                                // already toast
-                            }
-                            else if (bridge.Multiplexer.TryResend(hashSlot, message, endpoint, isMoved))
+                            // MOVED to same endpoint detected.
+                            // This occurs when Redis/Valkey servers are behind DNS records, load balancers, or proxies.
+                            // The MOVED error signals that the client should reconnect to allow the DNS/proxy/load balancer
+                            // to route the connection to a different underlying server host, then retry the command.
+                            bridge?.TryConnect(null)?.Dispose();


This seems like it relies on timing between the connection being detected as closed, and the parser - but the parser is IIRC "inline" here, i.e. we haven't finished reading yet. I need to think very carefully here about whether this is reliable ... I'm not sure either way, and I invite your input!

additional concern: from memory (@NickCraver may remember more), DNS resolution in .NET has a habit of being cached for the process duration, unless explicit steps are taken - and I'm not seeing any explicit steps. I'm concerned that this may result in instant reconnection on the old cached (in-proc) DNS entry; definitely something to check

@philon-msft tells me that the DNS part might be safer than I recall - will discuss, but: might not be a problem

barshaul · Feb 5, 2026

@mgravell regarding your first concern: I added a _needsReconnect flag to the bridge, so the parser now only marks the connection as requiring a reconnect. A retried command is then queued rather than buffered to the old connection, and the actual reconnection is deferred to the reader loop in the MOVED-to-same-endpoint scenario.
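The flag-and-defer idea described above can be sketched roughly as follows. This is a simplified, single-threaded model: `SketchBridge`, `MarkNeedsReconnect`, `Send`, and `OnReadCompleted` are hypothetical names invented for the example, and the real bridge and reader loop in the library are considerably more involved.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for a connection bridge (illustration only, not thread-safe).
public sealed class SketchBridge
{
    private readonly Queue<string> _backlog = new Queue<string>();
    private readonly List<string> _written = new List<string>();
    private bool _needsReconnect;

    public int ConnectionGeneration { get; private set; } = 1;
    public IReadOnlyList<string> Written => _written;
    public int BacklogCount => _backlog.Count;

    // Called from the response parser when MOVED-to-same-endpoint is seen.
    // It only sets a flag; nothing is torn down while the parser is still running.
    public void MarkNeedsReconnect() => _needsReconnect = true;

    // While a reconnect is pending, commands are queued instead of going to the old connection.
    public void Send(string command)
    {
        if (_needsReconnect) _backlog.Enqueue(command);
        else _written.Add($"gen{ConnectionGeneration}:{command}");
    }

    // Called by the reader loop once the current read has finished.
    public void OnReadCompleted()
    {
        if (!_needsReconnect) return;
        ConnectionGeneration++;   // stands in for closing and re-opening the socket
        _needsReconnect = false;
        while (_backlog.Count > 0)
            _written.Add($"gen{ConnectionGeneration}:{_backlog.Dequeue()}");
    }
}
```

In this model a command retried after `MarkNeedsReconnect()` waits in the backlog and is only written after `OnReadCompleted()` has moved to the next connection generation, so it never lands on the connection the server has already closed.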

re: the DNS caching concern:

I created a simple test using a Route53 DNS record:

1. Connect to host A (via the Route53 DNS hostname), SET a key, verify it's stored
2. Change the DNS record to point to a different host B
3. Create a new connection (via same hostname), verify the key doesn't exist

The test passed - the second connection correctly resolved to the new host. I don't see DNS caching issues. Additionally, as I mentioned in #2990, if DNS caching were an issue with SE.Redis clients, we at ElastiCache would have already encountered it with our customers - our managed cluster logic relies heavily on DNS changes for failover and scaling operations.
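For anyone repeating a check like this, one way to see what the process itself resolves at each step is to snapshot the hostname's addresses before and after the record change. The sketch below uses `System.Net.Dns.GetHostAddresses`; `DnsCheck`, `ResolveNow`, and `HasChanged` are hypothetical helper names, and this is not the test that was run above.

```csharp
using System;
using System.Linq;
using System.Net;

// Hypothetical helper for illustration only.
public static class DnsCheck
{
    // Returns the addresses the hostname resolves to right now, as sorted strings.
    public static string[] ResolveNow(string hostname) =>
        Dns.GetHostAddresses(hostname)
           .Select(a => a.ToString())
           .OrderBy(s => s, StringComparer.Ordinal)
           .ToArray();

    // True when two resolution snapshots differ, i.e. the record changed between them.
    public static bool HasChanged(string[] before, string[] after) =>
        !before.SequenceEqual(after);
}
```

Calling `ResolveNow` with the same hostname before step 1 and again before step 3, then passing both snapshots to `HasChanged`, would show whether the in-process resolution followed the record change.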

barshaul added 3 commits February 3, 2026 13:32

Handle MOVED error pointing to same endpoint by triggering reconnection before retrying the request.

fae5c09

Better stimulate proxy/LB

4200efe

Increase timeout

9c32b44

barshaul force-pushed the moved_same_endpoint branch from 948a601 to 9c32b44 Compare February 3, 2026 13:35

Fixed key name to prevent collisions

93609b0

barshaul marked this pull request as ready for review February 3, 2026 13:50

mgravell reviewed Feb 3, 2026

Add NeedsReconnect flag to defer reconnection to reader loop for MOVED-to-same-endpoint

3ef0120

barshaul wants to merge 5 commits into StackExchange:main from barshaul:moved_same_endpoint
