Problem
QueryHandler.async_response builds known_answers_set = DNSRRSet(answers).lookup_set() from the union of every TC-deferred fragment's answers (up to _MAX_DEFERRED_PER_ADDR = 16 packets × ~750 records per packet ≈ 12 000 DNSRecord instances). It then stores the entire set object, by reference, in QuestionHistory._history[question] via add_question_at_time. The dict is capped at _MAX_QUESTION_HISTORY_ENTRIES = 10000 (commit 0e5e637), but the set held by each entry has no size bound.
In practice the entry count is naturally limited to roughly registered_services × question_types_we_answer. Even so, under a steady-state attack each replacement of an existing entry still pins ~12 000 unique, freshly allocated DNSRecord objects for up to _DUPLICATE_QUESTION_INTERVAL = 999 ms. With _CACHE_CLEANUP_INTERVAL = 10 s, stale entries can linger for up to ~11 s (10 s + 999 ms) before async_expire reclaims them.
A LAN attacker sustaining TC-fragmented queries (one valid question matching a real service plus many unique known answers per query) can push peak memory into the hundreds of megabytes on small devices (Home Assistant on a Pi). This retention is invisible to _MAX_CACHE_RECORDS because the records never enter the cache: known answers are not cached.
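A minimal sketch makes the retention concrete. ToyQuestionHistory and FakeRecord are hypothetical, simplified stand-ins (not the real zeroconf classes; questions are plain strings and times are in milliseconds), written only to mirror the behaviour described above: the caller's whole set is stored by reference, and an entry disappears only when an expiry pass finds it older than the duplicate-question interval.

```python
# Constants quoted in the report; RECORDS_PER_PACKET is the report's estimate.
MAX_DEFERRED_PER_ADDR = 16
RECORDS_PER_PACKET = 750
DUPLICATE_QUESTION_INTERVAL_MS = 999
CACHE_CLEANUP_INTERVAL_MS = 10_000


class FakeRecord:
    """Hypothetical stand-in for DNSRecord; every instance is a distinct object."""

    __slots__ = ("name",)

    def __init__(self, name: str) -> None:
        self.name = name


class ToyQuestionHistory:
    """Simplified model: stores (time, known_answers) by reference, unbounded."""

    def __init__(self) -> None:
        self._history: dict[str, tuple[float, set[FakeRecord]]] = {}

    def add_question_at_time(self, question: str, now: float, known_answers: set[FakeRecord]) -> None:
        # No copy and no size check: the caller's entire set is retained.
        self._history[question] = (now, known_answers)

    def async_expire(self, now: float) -> None:
        # Drop only entries older than the duplicate-question interval.
        stale = [q for q, (then, _) in self._history.items() if now - then > DUPLICATE_QUESTION_INTERVAL_MS]
        for q in stale:
            del self._history[q]


records_per_entry = MAX_DEFERRED_PER_ADDR * RECORDS_PER_PACKET
known_answers = {FakeRecord(f"attacker-{i}.local.") for i in range(records_per_entry)}

history = ToyQuestionHistory()
history.add_question_at_time("_http._tcp.local.", 0.0, known_answers)

# The entry holds the very same set object, so all records stay reachable.
pinned = history._history["_http._tcp.local."][1]
print(len(pinned), pinned is known_answers)  # 12000 True

# Worst case: an entry goes stale just after one expiry pass and is only removed
# at the next pass, one cleanup interval later.
worst_case_linger_ms = CACHE_CLEANUP_INTERVAL_MS + DUPLICATE_QUESTION_INTERVAL_MS
print(worst_case_linger_ms)  # 10999
```

Until an expiry pass runs, nothing else releases the set; calling async_expire at 500 ms keeps the entry, while calling it at 1000 ms drops it.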
Why This Matters
This is a memory-pressure DoS vector that neither the existing cache cap nor the question-history entry cap closes. Recent OOM hardening passes addressed the cache, the question-history entry count, the recent-packets dedup, the deferred-queue size, and the seen-logs dict, but the known-answers payload retained per question-history entry slipped through. The impact is degraded performance or OOM on memory-constrained mDNS responders, which is exactly the kind of deployment (Home Assistant, smart-home hubs) where python-zeroconf is widely used.
Suggested Fix
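Cap the size of the known_answers_set stored per entry at a small fixed number (e.g. 256, well above any RFC-realistic known-answer list for a single question). That cuts per-entry retention from ~12 000 records to at most 256, roughly 47× less. The absolute worst case across a full history dict is still 10 000 × 256 = 2 560 000 records, about 256 MB even at 100 bytes per record, so the cap keeps memory small only while the entry count stays near registered_services × question_types_we_answer rather than at the 10 000-entry ceiling. Do the trim at the add_question_at_time boundary rather than at lookup_set() time, so the in-flight suppression logic still sees the full set during this query while the persisted snapshot is bounded: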
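from itertools import islice

_MAX_KNOWN_ANSWERS_PER_HISTORY_ENTRY = 256  # proposed new constant; 256 is the example value above

def add_question_at_time(self, question, now, known_answers):
    if len(known_answers) > _MAX_KNOWN_ANSWERS_PER_HISTORY_ENTRY:
        # Snapshot a bounded subset for duplicate-suppression purposes only
        snapshot = set(islice(known_answers, _MAX_KNOWN_ANSWERS_PER_HISTORY_ENTRY))
    else:
        # Small sets are stored by reference, as today
        snapshot = known_answers
    ...
    self._history[question] = (now, snapshot)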
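Note: truncation errs on the permissive side, not the conservative one. suppresses() suppresses our own outgoing query only when not (previous_known_answers - known_answers), i.e. when every answer the earlier querier listed is also among our own known answers. A truncated snapshot can only shrink that difference, so within the 999 ms window it may suppress a query that the full set would have let through (when the earlier querier listed answers we do not hold). The effect is bounded by _DUPLICATE_QUESTION_INTERVAL; if it matters, an over-limit set could instead be dropped from the history rather than truncated, so an oversized known-answer list never causes suppression.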
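The bounded variant and its interaction with suppresses() can be checked with a self-contained sketch. ToyQuestionHistory and FakeRecord are again hypothetical simplifications, not the real classes; MAX_KNOWN_ANSWERS_PER_HISTORY_ENTRY = 256 is the example cap proposed above, and suppresses() is modelled on the check quoted in the note (suppress only when the stored set minus our own known answers is empty, within the 999 ms window).

```python
from itertools import islice

DUPLICATE_QUESTION_INTERVAL_MS = 999
MAX_KNOWN_ANSWERS_PER_HISTORY_ENTRY = 256  # example cap from the suggested fix


class FakeRecord:
    """Hypothetical stand-in for DNSRecord."""

    __slots__ = ("name",)

    def __init__(self, name: str) -> None:
        self.name = name


class ToyQuestionHistory:
    """Simplified model of a question history with a bounded known-answer snapshot."""

    def __init__(self) -> None:
        self._history: dict[str, tuple[float, set[FakeRecord]]] = {}

    def add_question_at_time(self, question: str, now: float, known_answers: set[FakeRecord]) -> None:
        if len(known_answers) > MAX_KNOWN_ANSWERS_PER_HISTORY_ENTRY:
            # Keep an arbitrary bounded subset; set iteration order decides which records survive.
            snapshot = set(islice(known_answers, MAX_KNOWN_ANSWERS_PER_HISTORY_ENTRY))
        else:
            snapshot = known_answers
        self._history[question] = (now, snapshot)

    def suppresses(self, question: str, now: float, known_answers: set[FakeRecord]) -> bool:
        entry = self._history.get(question)
        if entry is None:
            return False
        then, previous_known_answers = entry
        if now - then > DUPLICATE_QUESTION_INTERVAL_MS:
            return False
        # Suppress only if the earlier querier listed nothing we lack.
        return not (previous_known_answers - known_answers)


question = "_http._tcp.local."
incoming = {FakeRecord(f"r{i}.local.") for i in range(300)}

history = ToyQuestionHistory()
history.add_question_at_time(question, 0.0, incoming)
stored = history._history[question][1]
print(len(stored))  # 256

# Suppose our own cache holds exactly the records that survived truncation.
ours = set(stored)
print(bool(incoming - ours))  # True: against the full set we would still need to ask
print(history.suppresses(question, 500.0, ours))  # True: the truncated snapshot suppresses anyway
```

The last two prints show that a truncated snapshot can only shrink previous_known_answers - known_answers, so inside the 999 ms window it may suppress a query that the full set would not have suppressed; once the window passes, suppresses() returns False again.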
Details
| Field | Value |
| --- | --- |
| Severity | 🟡 Medium |
| Category | dos |
| Location | src/zeroconf/_history.py:41-45 and src/zeroconf/_handlers/query_handler.py:347-356 |
| Effort | ⚡ Quick fix |
🤖 Created by Kōan from audit session