2

I have an authoritative DNS server for example.com at ns1.example.com. The zone defines the following records for a subdomain (RFC 1035 zone-file syntax):

sub IN A 198.51.100.10
sub IN A 203.0.113.20
sub IN A 203.0.113.30

When I query my own server repeatedly via dig sub.example.com @ns1.example.com, I get the records in the defined order every time:

;; ANSWER SECTION:
sub.example.com.    900 IN  A   198.51.100.10
sub.example.com.    900 IN  A   203.0.113.20
sub.example.com.    900 IN  A   203.0.113.30

When I query without specifying the server (so the query goes to whatever resolver the OS is configured with, for example 8.8.8.8), I get them in a random order every time:

;; ANSWER SECTION:
sub.example.com.    900 IN  A   203.0.113.30
sub.example.com.    900 IN  A   198.51.100.10
sub.example.com.    900 IN  A   203.0.113.20
;; ANSWER SECTION:
sub.example.com.    900 IN  A   203.0.113.20
sub.example.com.    900 IN  A   203.0.113.30
sub.example.com.    900 IN  A   198.51.100.10

How do I stop this from happening and make the caching resolver reply with the records in the order defined in the zone?

As I understand it, they are doing DNS load balancing, but in this case I simply don't need it. I need the first IP to always be used unless it's down, in which case some clients will try the second.

This behavior also shows up on 1.1.1.1, which is Cloudflare rather than Google, so it seems to be quite common.

The second and third IPs have a degraded quality of service (everything is slower), so I don't want them offered to clients on equal terms.

3
  • 2
    The randomisation is working as designed. If an auth server isn't randomising, it's arguably malfunctioning. Maybe you want to look at SRV records instead? Or just publish one A record with a very short TTL, and do something to swap the record when the host becomes unavailable. (For example, put the DNS server on a VIP that migrates to the available host, and have each host publish only its own address in an A record.) Commented 2 days ago
  • 1
    "I need the first IP to always be used unless it's down, in which case some clients will try the second." That should not be relied on either. There is nothing to stop a consumer from trying all of the addresses at the same time and taking the first result. A consumer may also pick any address from the collection, hit a failed server, and not retry anything, which is also "quite common". Commented yesterday
  • If a client has a preferred ordering, it should implement it itself; you could also use a local cache that does that. For example, I once set up a cache that returned first the address on the same network as the requester (the interface the request arrived on). Commented yesterday

2 Answers

19

You cannot stop DNS round-robin with standard A/AAAA records

Resolvers behave as specified in RFC 1034, section 3.6; the behavior follows from the DNS data model itself:

3.6. Resource Records

A domain name identifies a node. Each node has a set of resource information, which may be empty. The set of resource information associated with a particular name is composed of separate resource records (RRs). The order of RRs in a set is not significant, and need not be preserved by name servers, resolvers, or other parts of the DNS.

As a consequence, the ordering of records within an RRset carries no semantic meaning and is not under the control of authoritative servers.
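One practical consequence is that any tooling which compares answers from different servers should treat an RRset as an unordered collection. A minimal sketch, assuming the answers have already been parsed into lists of address strings (the helper name same_rrset is made up for this illustration):

```python
from typing import Iterable


def same_rrset(a: Iterable[str], b: Iterable[str]) -> bool:
    """Return True if two answers hold the same records, ignoring order."""
    # Sorting both sides makes the comparison independent of response order
    # while still noticing a missing or duplicated record.
    return sorted(a) == sorted(b)


# Addresses taken from the question: same RRset, different response order.
authoritative = ["198.51.100.10", "203.0.113.20", "203.0.113.30"]
via_resolver = ["203.0.113.30", "198.51.100.10", "203.0.113.20"]

print(same_rrset(authoritative, via_resolver))  # True
print(authoritative == via_resolver)            # False
```

The second comparison fails only because a Python list is ordered; as far as DNS is concerned, both responses are the same answer.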

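Some RR types, including ones from the original RFC 1034/1035 era, define explicit selection or preference mechanisms that are independent of DNS response ordering. For example, MX records (RFC 1035) carry a preference value, and SRV records (RFC 2782) carry priority and weight fields.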

Round-robin DNS emerged as a practical, implementation-level technique for load distribution that exploits the fact that the protocol places no constraint on RRset ordering. Its historical use and early implementations are described in RFC 1794, section 2, and in Schemers, R. (1995), lbnamed: A Load Balancing Name Server in Perl.

While SRV records are intended to address the ordering or preference problem, they do so only at the application-protocol level. They are not a drop-in replacement for A/AAAA records and cannot fix applications that only perform address lookups.

Active failover using DNS

What you are asking for is not DNS load balancing, but active failover, and it cannot be achieved with standard A/AAAA records alone. To get the behavior "always return the primary address unless it is down," the authoritative name server itself would have to continuously monitor the health of the primary service and dynamically change its responses. In normal operation it would return only the primary IP address; if the service is detected as unavailable, it would instead return the secondary address.

This approach requires a custom or specialized authoritative DNS server, very short TTLs to minimize caching, and acceptance that failover will still be limited by resolver cache behavior. Standard DNS cannot guarantee strict primary-first semantics without such active, stateful logic. Furthermore, some resolvers may ignore the TTLs provided by the authoritative server, which can delay failover.
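The decision logic behind such a setup can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names tcp_is_up and address_to_publish are invented here, the health check is a bare TCP connect on port 443, and the step that actually pushes the chosen address into the zone (dynamic update, provider API, regenerated zone file) is deliberately left out.

```python
import socket
from typing import Callable, Sequence

# Addresses from the question, in order of preference (primary first).
CANDIDATES = ["198.51.100.10", "203.0.113.20", "203.0.113.30"]


def tcp_is_up(address: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to address:port succeeds in time."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def address_to_publish(
    candidates: Sequence[str],
    is_up: Callable[[str], bool] = tcp_is_up,
) -> str:
    """Pick the single address the zone should serve right now."""
    for address in candidates:
        if is_up(address):
            return address
    # Nothing passed the check: keep serving the primary rather than nothing.
    return candidates[0]


# Simulate an outage of the primary with a fake health check (no network used).
down = {"198.51.100.10"}
print(address_to_publish(CANDIDATES, is_up=lambda a: a not in down))  # 203.0.113.20
```

A real deployment would run this on a schedule and keep the TTL short, and resolvers that have already cached the old answer would still hold it until that TTL expires.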

Most open-source authoritative DNS servers do not perform active health-checked failover in their default configuration, although some can (for example, PowerDNS Lua records with ifportup/ifurlup, or gdnsd's monitoring plugins); otherwise it can be built by combining a DNS server with external automation. Some hosted DNS providers and cloud services offer integrated monitoring and automatic updates to DNS records when a backend becomes unhealthy, providing a managed solution for this use case.

Possible HTTPS service failover on modern browsers

If the service happens to be HTTPS, multiple HTTPS records with different SvcPriority values might provide a form of client-side active failover. For example, suppose sub.example.com has three servers:

sub.example.com. IN HTTPS 1 primary.example.com. alpn="h2"
sub.example.com. IN HTTPS 2 failover1.example.com. alpn="h2"
sub.example.com. IN HTTPS 3 failover2.example.com. alpn="h2"

primary.example.com.   IN A 198.51.100.10
failover1.example.com. IN A 203.0.113.20
failover2.example.com. IN A 203.0.113.30

As per RFC 9460, section 2.4.1, compliant browsers (the RFC was published in November 2023) should attempt the highest-priority target first (the one with the lowest SvcPriority value, primary.example.com) and only try the lower-priority targets (failover1.example.com, failover2.example.com) if the first connection fails at the transport or TLS level. This creates a primary-first failover pattern at the connection level. Note that application-layer errors (e.g., HTTP 500) do not trigger retries.
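The client-side selection described here can be sketched as follows. This is a simplified illustration, not how any particular browser implements it: the HttpsRecord type and both function names are invented, only ServiceMode records (SvcPriority greater than 0) are considered, and the connect callback stands in for the real transport and TLS handshake.

```python
import random
from typing import Callable, List, NamedTuple, Optional, Sequence


class HttpsRecord(NamedTuple):
    priority: int  # SvcPriority; lower values are tried first
    target: str    # TargetName


def attempt_order(
    records: Sequence[HttpsRecord],
    rng: Optional[random.Random] = None,
) -> List[HttpsRecord]:
    """Order records by SvcPriority, shuffling records of equal priority."""
    rng = rng or random.Random()
    shuffled = list(records)
    rng.shuffle(shuffled)
    # sorted() is stable, so the shuffle above only decides ties.
    return sorted(shuffled, key=lambda record: record.priority)


def first_reachable(
    records: Sequence[HttpsRecord],
    connect: Callable[[str], bool],
) -> Optional[str]:
    """Return the first target, in priority order, that accepts a connection."""
    for record in attempt_order(records):
        if connect(record.target):
            return record.target
    return None


# The three records from the example, deliberately listed out of order.
records = [
    HttpsRecord(2, "failover1.example.com."),
    HttpsRecord(1, "primary.example.com."),
    HttpsRecord(3, "failover2.example.com."),
]

# Simulate the primary being unreachable at the transport level.
print(first_reachable(records, connect=lambda t: t != "primary.example.com."))
# failover1.example.com.
```

Because the priority travels inside each record, the order in which the resolver returns the RRset no longer matters.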

However, this method only works for HTTPS services. As of December 2025, major browsers such as Chrome, Firefox, and Safari support HTTPS record lookups, so HTTPS records are already widely usable in practice. Generic SVCB records for other protocols, by contrast, are still rarely acted on by clients. For non‑HTTPS services, or to achieve strict DNS-level primary-first behavior, a health-aware authoritative server with very short TTLs is still required.

8
  • 1
    SVCB and HTTPS RRs defined in RFC 9460 also have priority. Commented Dec 16 at 17:04
  • @AlexD: I’m aware of RFC 9460, but I left it out since it’s not widely adopted or considered consensus. I’ll admit a bit of skepticism: it’s aimed at a single HTTPS/TLS protocol rather than general DNS, and mostly replicates SRV—so it feels like a “not invented here” redo rather than a broadly useful improvement. 😉 Commented Dec 16 at 17:25
  • SVCB is protocol-agnostic, and HTTPS RR is a specialized mapping of SVCB tailored for HTTP. All major browsers have at least partial support for HTTPS RR, with Safari using it by default. Cloudflare 1.1.1.1 reports that HTTPS RRs are 8% of incoming queries. 25% of the top 1M domains had HTTPS RR in 2023 (mostly Cloudflare setting them by default). Commented 2 days ago
  • @AlexD: Cloudflare’s adoption has made HTTPS records widely usable, which is great. Architecturally, though, it could have been more elegant: SVCB could have used a port SvcParam instead of relying on _port subdomains, and HTTPS could have been just a proto=HTTPS SvcParam within the same SVCB record rather than a separate RR type. That design would have been protocol-agnostic, name-preserving, and extensible to other services, while still solving a distinct problem from SRV—conveying how to connect rather than just where. Commented 2 days ago
  • @EsaJokinen: I assume SVCB, like SRV, uses _proto subdomains because if you were to just put all of the SVCB records under the same name, you'd end up with a great many of them in a single response (as you cannot limit a query by SvcParam), and that by itself might limit adoption. (That is, it would only be efficient as long as it was aimed at a single HTTPS/TLS protocol (or two)...) Commented 2 days ago
5

A records have no priority field, so there's no way to stop a downstream DNS server or a caching proxy from reordering the records.

Likewise, there is no mechanism to explicitly tell a client to always use the first IP address from a query, even though most do.

See RFC 1035 for details.
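If you control the client, the preference can live there instead of in DNS. A minimal sketch, assuming the client knows which addresses it prefers (the PREFERENCE table and the by_preference helper are hypothetical and would come from the client's own configuration):

```python
from typing import Dict, Iterable, List

# Hypothetical client-side ranking: a lower rank is preferred.
PREFERENCE: Dict[str, int] = {
    "198.51.100.10": 0,
    "203.0.113.20": 1,
    "203.0.113.30": 2,
}


def by_preference(
    addresses: Iterable[str],
    preference: Dict[str, int] = PREFERENCE,
) -> List[str]:
    """Reorder resolved addresses by the client's own ranking."""
    unknown = len(preference)  # addresses without a rank sort last
    return sorted(addresses, key=lambda address: preference.get(address, unknown))


# Whatever order the resolver returned, the client tries the primary first.
resolved = ["203.0.113.30", "198.51.100.10", "203.0.113.20"]
print(by_preference(resolved))
# ['198.51.100.10', '203.0.113.20', '203.0.113.30']
```

The client would then attempt connections in that order and move on to the next address only when one fails.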

7
  • I believe RFC 1035 is not the best reference for this as it doesn't explicitly explain round-robin. Commented Dec 16 at 16:00
  • So it has no priority field, meaning the order matters, and yet everyone decided to just shuffle them, breaking the order that actually matters. Nice. Are there any other setups that could achieve the use case outlined in the last paragraph? Commented Dec 16 at 16:25
  • 1
    Actually, RFC 1035 explicitly states that resolvers should not rely on implicit ordering (because of UDP). So even if you can convince your DNS server to always return them in the same order and only use TCP, they could still be consumed in a different order. Commented Dec 16 at 16:48
  • 1
    @Gear54rus: in general, you should not rely on the order of any objects in a collection unless you have an assurance by contract. Commented 2 days ago
  • 6
    @Gear54rus No, RFC 1034 3.6 is very clear in stating that the order carries no meaning whatsoever. Trying to rely on any specific order is an obvious bug. Recursive resolvers are free to use whatever mechanism they want to store the RRsets, and there's not even the slightest hint anywhere in the standard that it would be desirable to preserve some specific order. (DNSSEC contains a mechanism to internally sort any RRset into a defined order for signing purposes, and that order is not the order you use in your zone file.) Commented 2 days ago
