Multi-host failover causes user-facing connection delays after HostRecheckSeconds

We are currently using Npgsql with a multi-host configuration for PostgreSQL failover (no load balancing). While testing failure scenarios, we observed behavior that can introduce significant latency for user requests.

It appears that the current host recheck mechanism may cause user requests to block on connection attempts to a previously failed host, even though a healthy host is available.

---

### Configuration

Example connection string:


Host=server1,server2;
load_balance_hosts=false;
Host Recheck Seconds=5;
Timeout=2;
target_session_attrs=primary;
Maximum Pool Size=5;


Scenario assumptions:

- `server1` = primary host
- `server2` = failover host
- no load balancing
- application uses the default connection pooling

---

### Observed behavior

1. Application initially connects to `server1`.
2. `server1` becomes unavailable.
3. A connection attempt fails and `server1` is marked as down.
4. The driver starts using `server2` successfully.
5. After `HostRecheckSeconds` (e.g. 5 seconds), the driver tries `server1` again.

If `server1` is still unavailable, multiple concurrent requests attempt to connect to it again.

Because the connection attempt waits for the configured `Timeout`, these requests block for the full timeout duration before falling back to `server2`.

Example timeline:


Hosts: server1, server2
HostRecheckSeconds = 5
Timeout = 2


If `server1` is still down:

- every 5 seconds the driver retries `server1`
- concurrent connection attempts wait up to 2 seconds
- only afterwards do they fall back to `server2`

This results in user requests experiencing ~2 seconds of latency, even though `server2` is fully healthy and could respond immediately.

With higher concurrency (for example, up to `Maximum Pool Size` simultaneous connection attempts), many requests may be delayed at the same time.
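To make the timeline concrete, here is a minimal Python model of the behavior described above. It is only a sketch of what we observed, not Npgsql's actual implementation. The function name `simulate` and its parameters are hypothetical, and it assumes that `server1` stays down for the whole run, that it was marked down at `t = 0`, that `server2` answers instantly, and that the first failed retry is what marks `server1` as down again.

```python
def simulate(arrivals, recheck_seconds=5.0, timeout=2.0):
    """Return the connection latency (seconds) for each request arrival time.

    Simplified model, not Npgsql code: server1 is down throughout and was
    marked down at t=0; server2 responds with zero latency.
    """
    skip_until = recheck_seconds  # until this time, server1 is skipped
    retry_started = None          # start time of the first retry in flight
    latencies = []
    for t in sorted(arrivals):
        # Once the first retry has timed out, server1 is marked down again.
        if retry_started is not None and t >= retry_started + timeout:
            skip_until = retry_started + timeout + recheck_seconds
            retry_started = None
        if t < skip_until:
            latencies.append(0.0)      # goes straight to server2
        else:
            latencies.append(timeout)  # blocks on server1, then falls back
            if retry_started is None:
                retry_started = t
    return latencies


# One request every 0.5 s for 10 s.
arrivals = [i * 0.5 for i in range(21)]
latencies = simulate(arrivals)
blocked = [t for t, lat in zip(arrivals, latencies) if lat > 0]
print(blocked)
```

Under these assumptions, every request that arrives between the end of the recheck interval and the moment the first retry times out pays the full `Timeout`, which matches the latency spikes we see.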

---

### Expected behavior

From an application perspective, a preferable strategy would be:

- once a host is marked as down, continue using the healthy host
- periodically probe the failed host in the background
- avoid blocking user requests on retry attempts to the failed host

This would prevent unnecessary latency spikes when one host remains unavailable.
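The strategy above could look roughly like the following Python sketch. It is not Npgsql API and not a proposed patch. The class `FailoverSelector` and the `is_healthy` callback are hypothetical names used only to illustrate the idea that the request path never waits on a host that is marked down, while a separate probe decides when the host becomes eligible again.

```python
import threading


class FailoverSelector:
    """Hypothetical host selector: requests never block on a down host."""

    def __init__(self, hosts):
        self._hosts = list(hosts)  # in priority order, e.g. primary first
        self._down = set()
        self._lock = threading.Lock()

    def pick_host(self):
        # Request path: only consults in-memory state, never probes.
        with self._lock:
            for host in self._hosts:
                if host not in self._down:
                    return host
        raise RuntimeError("no healthy host available")

    def mark_down(self, host):
        with self._lock:
            self._down.add(host)

    def probe_once(self, is_healthy):
        # Background path: may block up to the connect timeout per host,
        # but user requests are not waiting on it.
        with self._lock:
            candidates = [h for h in self._hosts if h in self._down]
        for host in candidates:
            if is_healthy(host):
                with self._lock:
                    self._down.discard(host)


selector = FailoverSelector(["server1", "server2"])
selector.mark_down("server1")
selector.probe_once(lambda host: False)  # server1 still unreachable
print(selector.pick_host())              # requests keep using server2
```

In a real application, `probe_once` would be called from a timer or background thread every `HostRecheckSeconds`, with `is_healthy` performing an actual connection attempt.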

---

### Question

Is the current behavior intentional?

If so, is there a recommended way to avoid user-facing delays when using multi-host failover without load balancing?

Alternatively, would it make sense to introduce an option to probe failed hosts in the background rather than during user connection attempts?

---

### Environment

- Npgsql version: 8.0.6
- .NET SDK version: 8.0.418
