
dedup dns children by (rdtype, child) not parent host #3126

Open
liquidsec wants to merge 4 commits into dev from dns-children-dedup-key

@liquidsec
Collaborator

Summary

DNSResolve.emit_dns_children deduped its outgoing DNS_NAME events with the key (parent_host, rdtype, child_host). Including the parent host in the key meant the same out-of-scope NS/SOA/MX/CNAME value was re-emitted once per in-scope parent that referenced it.

In a recent scan of ~100 corporate domains, this produced:

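| rdtype | emitted | unique | duplicates |
|--------|--------:|-------:|-----------:|
| NS     |  12,072 |    592 |     11,480 |
| SOA    |   3,524 |    203 |      3,321 |
| MX     |     871 |    123 |        748 |
| CNAME  |     854 |    451 |        403 |
| TXT    |     331 |     36 |        295 |
| SRV    |      96 |     34 |         62 |
| PTR    |      96 |     90 |          6 |
| TOTAL  |  17,844 |  1,529 |     16,315 |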
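
The duplicates column is simply emitted minus unique. As a quick sanity check of the figures above, this sketch recomputes the column and the totals from the emitted and unique counts. The counts are copied from the scan results; the per-rdtype ratio is an extra derived value, not part of the original report.

```python
# Counts copied from the scan results above: rdtype -> (emitted, unique).
counts = {
    "NS": (12072, 592),
    "SOA": (3524, 203),
    "MX": (871, 123),
    "CNAME": (854, 451),
    "TXT": (331, 36),
    "SRV": (96, 34),
    "PTR": (96, 90),
}

# duplicates = emitted - unique, per record type.
duplicates = {rdtype: emitted - unique for rdtype, (emitted, unique) in counts.items()}

total_emitted = sum(emitted for emitted, _ in counts.values())
total_unique = sum(unique for _, unique in counts.values())
total_duplicates = sum(duplicates.values())

for rdtype, (emitted, unique) in counts.items():
    # The ratio is the average number of emissions per unique child for that rdtype.
    print(f"{rdtype:<6}{emitted:>8,}{unique:>8,}{duplicates[rdtype]:>8,}  {emitted / unique:.1f}x")
print(f"{'TOTAL':<6}{total_emitted:>8,}{total_unique:>8,}{total_duplicates:>8,}")
```

The totals come out to 17,844 emitted, 1,529 unique and 16,315 duplicates, so roughly 91% of the emitted child events were redundant, with NS averaging about 20 emissions per unique nameserver.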

beth.ns.cloudflare.com alone was emitted 1,518 times. With Cloudflare and MarkMonitor concentrating many zones onto the same NS pair, the multiplier on consolidated providers is severe.

The roughly 16K duplicate distance-1 events all flowed into every scan module that watches DNS_NAME (dnsbrute, dnscommonsrv, wayback, hunterio, sslcert, excavate, …). Each module had to pull them off its incoming queue, hash them for dedup, run scope checks and filter_event, and then drop them, which consumed queue throughput and slowed scans.
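
To illustrate why dropped duplicates are not free, here is a minimal sketch of a consumer loop that dequeues, hashes and then discards repeats. `drain` is a hypothetical stand-in for a module's intake path, not BBOT's actual code, and it leaves out the scope and filter_event steps entirely.

```python
from collections import deque


def drain(queue):
    """Hypothetical module intake loop: dequeue, hash, dedup, then accept or drop."""
    seen = set()
    dequeued = 0
    accepted = 0
    while queue:
        host = queue.popleft()
        dequeued += 1
        key = hash(host)  # the hashing cost is paid for duplicates too
        if key in seen:
            continue  # duplicate: dropped only after the work above was done
        seen.add(key)
        accepted += 1
    return dequeued, accepted


# beth.ns.cloudflare.com was emitted 1,518 times in the scan described above.
queue = deque(["beth.ns.cloudflare.com"] * 1518)
dequeued, accepted = drain(queue)
print(dequeued, accepted)  # 1518 1
```

Every watching module repeats this work independently, so deduplicating once at the emitter should be cheaper than filtering in each consumer.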

Fix

Drop event.host from the dedup hash. The same (rdtype, child) pair across different parents is the same child event semantically.

- child_hash = hash(f"{event.host}:{module}:{child_host}")
+ child_hash = hash(f"{module}:{child_host}")
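
The effect of the key change can be sketched with a simplified model of the dedup guard. This is not the real emit_dns_children body: `emit_children`, the `include_parent` flag and the hostnames are hypothetical, and the sketch uses `rdtype` where the diff uses `module`, on the assumption (suggested by the PR title) that the module string stands in for the record type.

```python
def emit_children(dns_children_by_parent, include_parent):
    """Simplified model of the dedup guard (illustrative only)."""
    seen = set()
    emitted = []
    for parent_host, children in dns_children_by_parent.items():
        for rdtype, child_hosts in children.items():
            for child_host in child_hosts:
                if include_parent:
                    # Old key: unique per parent, so shared children repeat.
                    key = hash(f"{parent_host}:{rdtype}:{child_host}")
                else:
                    # New key: the same (rdtype, child) pair is emitted once.
                    key = hash(f"{rdtype}:{child_host}")
                if key in seen:
                    continue
                seen.add(key)
                emitted.append((rdtype, child_host))
    return emitted


# Hypothetical input: three in-scope parents delegated to the same NS pair.
parents = {
    f"example{i}.com": {"NS": ["ns1.provider.example", "ns2.provider.example"]}
    for i in range(3)
}

old = emit_children(parents, include_parent=True)
new = emit_children(parents, include_parent=False)
print(len(old), len(new))  # 6 2
```

With the parent in the key, emissions grow with the number of parents that share a nameserver; without it, they are bounded by the number of distinct (rdtype, child) pairs.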

Adds TestDNSResolveSharedNameserverDedup to catch regressions: three in-scope domains share an NS/SOA pair; the test asserts each shared nameserver hostname is emitted exactly once across all parents.
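
The shape of that assertion can be sketched in plain Python. This is not the actual TestDNSResolveSharedNameserverDedup code, which runs inside the project's test harness; `over_emitted` and the hostnames are hypothetical, and only the "exactly once per shared hostname" check is modeled.

```python
from collections import Counter


def over_emitted(emitted_hosts, shared_hosts):
    """Return shared hosts that were not emitted exactly once, with their counts."""
    counts = Counter(emitted_hosts)
    return {host: counts[host] for host in shared_hosts if counts[host] != 1}


shared = ["ns1.provider.example", "ns2.provider.example"]

# Pre-fix shape: each of three parents re-emits both nameservers.
before = shared * 3
# Post-fix shape: each nameserver appears once regardless of parent count.
after = list(shared)

print(over_emitted(before, shared))  # {'ns1.provider.example': 3, 'ns2.provider.example': 3}
print(over_emitted(after, shared))   # {}
```

A check of this form fails if a shared nameserver is emitted more than once, and also if it is never emitted at all, so it guards against over-deduplication as well as the original bug.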

emit_dns_children was keyed on (parent_host, rdtype, child_host), so the same
out-of-scope nameserver was re-emitted once per in-scope parent that referenced
it. With concentrated providers like Cloudflare or MarkMonitor, a single
nameserver could be emitted 1,000+ times, flooding downstream module queues.
liquidsec self-assigned this on May 22, 2026
liquidsec added the bug and high-priority labels on May 22, 2026
@github-actions
Contributor

github-actions Bot commented May 22, 2026

🚀 Performance Benchmark Report

⚠️ No current benchmark data available

This might be because:

  • Benchmarks failed to run
  • No benchmark tests found
  • Dependencies missing

Both tests relied on the (parent, rdtype, child) emit_dns_children dedup to
produce duplicate DNS_NAME/IP_ADDRESS edges across parents. With the new
(rdtype, child) dedup, these collapse to a single emission.
@codecov

codecov Bot commented May 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90%. Comparing base (feacaba) to head (f8d72a7).

Additional details and impacted files
@@          Coverage Diff          @@
##             dev   #3126   +/-   ##
=====================================
+ Coverage     90%     90%   +1%     
=====================================
  Files        441     441           
  Lines      38743   38757   +14     
=====================================
+ Hits       34663   34677   +14     
  Misses      4080    4080           
