dnsdist: implement simple packet shuffle in cache #16108

karelbilek · 2025-09-10T10:53:07Z

Short description

I have implemented a simple packet shuffle for dnsdist packet cache.

I want to use dnsdist as a local stub resolver, similar to dnsmasq. Unlike dnsmasq, which does do shuffling of A responses, dnsdist doesn't.

This is a bit complicated, as dnsdist just saves the packets directly and not the A/AAAA records. I figured out I could just take the packets and swap the memory of the A records directly in a narrow number of cases, but this actually covers most of cases where shuffling is needed.

The issue with directly swapping memory of records is that DNS compression can break. I have narrowed the cases where I can safely do shuffle:

there are only As or AAAAs in the answer section, with the exception of non-A/AAAAs at the start of the section
authority section is empty
- opt pseudosection doesn't need to be empty, as that cannot have a domain name/pointer - is this a correct assumption?
none of the As/AAAAs have compression pointer in the domain section that points before the As/AAAAs start

The shuffle itself is using Fisher-Yates, same algorithm that std::shuffle uses; it is using std::swap_ranges in most cases (same length), or std::swap_ranges and one std::rotate (which can have linear complexity).

The direct memory swapping in Mangler still feels a little "icky"; sorry 😞

I have added documentation; I didn't want to be too concrete in what packets are actually shuffled, as that could be changed eventually.

I have not yet added any tests, which I definitely will, if you think this approach is an appropriate approach. (And if I figure how to deterministically seed the random shuffle in a test.)

Checklist

I have:

read the CONTRIBUTING.md document
read and accepted the Developer Certificate of Origin document, including the AI policy, and added a "Signed-off-by" to my commits
compiled this code
tested this code
- tested random queries, they work
included documentation (including possible behaviour changes)
documented the code
added or modified regression test(s)
added or modified unit test(s)

coveralls · 2025-09-10T11:42:25Z

Pull Request Test Coverage Report for Build 17765900968

Details

9 of 135 (6.67%) changed or added relevant lines in 6 files are covered.
34 unchanged lines in 8 files lost coverage.
Overall coverage decreased (-0.04%) to 65.987%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
pdns/dnsdistdist/dnsdist-cache.cc	1	4	25.0%
pdns/dnsparser.hh	3	46	6.52%
pdns/dnsparser.cc	0	80	0.0%

Files with Coverage Reduction	New Missed Lines	%
pdns/dnsdistdist/dnsdist.cc	2	70.45%
pdns/opensslsigners.cc	2	61.0%
modules/gpgsqlbackend/spgsql.cc	3	67.94%
pdns/iputils.cc	3	56.95%
pdns/misc.cc	4	61.76%
pdns/recursordist/syncres.cc	5	81.2%
pdns/recursordist/test-syncres_cc1.cc	6	90.23%
pdns/dnsdistdist/dnsdist-carbon.cc	9	61.9%

Totals
Change from base Build 17761500969:	-0.04%
Covered Lines:	128910
Relevant Lines:	166619

💛 - Coveralls

coveralls · 2025-09-10T11:42:25Z

Pull Request Test Coverage Report for Build 17611467588

Details

9 of 120 (7.5%) changed or added relevant lines in 6 files are covered.
12 unchanged lines in 5 files lost coverage.
Overall coverage increased (+0.03%) to 66.01%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
pdns/dnsdistdist/dnsdist-cache.cc	1	4	25.0%
pdns/dnsparser.hh	3	46	6.52%
pdns/dnsparser.cc	0	65	0.0%

Files with Coverage Reduction	New Missed Lines	%
pdns/rcpgenerator.cc	2	90.2%
pdns/recursordist/rec-xfr.cc	2	73.94%
pdns/recursordist/syncres.cc	2	81.22%
modules/gpgsqlbackend/spgsql.cc	3	68.18%
pdns/recursordist/recpacketcache.hh	3	89.55%

Totals
Change from base Build 17610130088:	0.03%
Covered Lines:	128490
Relevant Lines:	166031

💛 - Coveralls

rgacogne · 2025-09-12T15:12:09Z

I haven't looked at the code yet, but reading your description I think this is only safe if the only compression pointers present point to the questions section, otherwise all bets are off. Technically I believe a compression pointer could point at the rdata of a A or AAAA (that doesn't make sense to me, to be clear, but I believe this is allowed).

omoerbeek · 2025-09-12T15:14:40Z

Also there's a check for the authority section, but not for the additional section. Why?

karelbilek · 2025-09-12T15:38:44Z

@rgacogne what happens often is that for example CNAME is followed by As and the As point to the CNAMEs; but even in that case, swapping As memory should be fine?

@omoerbeek I was under the impression that the additional section cannot have domain names and therefore cannot have pointers, so no need to disallow that. if this assumption is wrong then I will need to add that, too (and there can be no edns, which will make this a bit useless)

Habbie · 2025-09-12T15:41:55Z

but even in that case, swapping As memory should be fine?

it is theoretically possible that label compression points at A content 99.111.109.0 for a com. label. I don't know if any existing implementations would do it today, but it is allowed.

(that doesn't make sense to me, to be clear, but I believe this is allowed).

I believe it is. A fun project would be to see if any software chokes on it :-)

karelbilek · 2025-09-12T15:43:51Z

but even in that case I check that all the pointers point before the As start, so this should not be a problem.

…

On Fri 12. 9. 2025 at 17:42, Peter van Dijk ***@***.***> wrote: *Habbie* left a comment (PowerDNS/pdns#16108) <#16108 (comment)> but even in that case, swapping As memory should be fine? it is theoretically possible that label compression points at A content 99.111.109.0 for a com. label. I don't know if any existing implementations would do it today, but it is allowed. (that doesn't make sense to me, to be clear, but I believe this is allowed). I believe it is. A fun project would be to see if any software chokes on it :-) — Reply to this email directly, view it on GitHub <#16108 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAZT4OD37SYTDBMKPBCJOT3SLSVRAVCNFSM6AAAAACGD2RLXCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTEOBVG44DIMRZGA> . You are receiving this because you authored the thread.Message ID: ***@***.***>

Habbie · 2025-09-12T15:44:10Z

that sounds good

omoerbeek · 2025-09-12T15:45:01Z

@omoerbeek I was under the impression that the additional section cannot have domain names and therefore cannot have pointers, so no need to disallow that. if this assumption is wrong then I will need to add that, too (and there can be no edns, which will make this a bit useless)

Ehh, there can also be a compression pointer in the name field.

karelbilek · 2025-09-12T15:55:13Z

oh, I didn't know that, but I can check for that here at least.

…

On Fri 12. 9. 2025 at 17:45, Otto Moerbeek ***@***.***> wrote: *omoerbeek* left a comment (PowerDNS/pdns#16108) <#16108 (comment)> @omoerbeek <https://github.com/omoerbeek> I was under the impression that the additional section cannot have domain names and therefore cannot have pointers, so no need to disallow that. if this assumption is wrong then I will need to add that, too (and there can be no edns, which will make this a bit useless) Ehh, there can also be a compression pointer in the name field. — Reply to this email directly, view it on GitHub <#16108 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAZT4IFMBVOEPI6OSSIKUL3SLTBFAVCNFSM6AAAAACGD2RLXCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTEOBVG44TIMZSGU> . You are receiving this because you authored the thread.Message ID: ***@***.***>

omoerbeek · 2025-09-12T15:56:52Z

Its quite common to have A or AAAA's following a CNAME having names pointing to the rdata of the CNAME. Or the A and AAAA's of the additional section having names pointing to rdata in the answer or authority section.

karelbilek · 2025-09-16T11:53:40Z

@omoerbeek

Its quite common to have A or AAAA's following a CNAME having names pointing to the rdata of the CNAME. Or the A and AAAA's of the additional section having names pointing to rdata in the answer or authority section.

that should still work with this approach, nothing should break.

I have been reading what can be in additional section, and, for caution, have added check that there is only OPT or RRSIG.

pdns/dnsparser.cc

edmonds · 2025-09-16T15:43:04Z

Hello,

If I understand correctly, the goal is basically to shuffle the order of the RR's in an A or AAAA RRset appearing in the answer section? It seems like that could be done without paying any attention to name compression pointers or performing reordering on data that could potentially break compression pointers. Something like:

Locate the RR's in the answer section constituting the RRset to be reordered. This requires looking at the RRTYPE until you see A or AAAA and then successively comparing the owner names in subsequent RR's to previously seen values. Once you see a different name or type, the RRset has ended.
For each RR in the RRset to be reordered, record the offset of the first byte of the RDATA value, and copy out the RDATA value itself. (Require that the length of each RDATA value is equal to the length required by A or AAAA.)
Take the collection of (offset, RDATA value) pairs and shuffle the offsets, re-distributing them randomly among the available RDATA values.
Write the RDATA values back into the packet using the shuffled offsets.

I guess it could be extended to support multiple RRsets, or RRsets in different sections, etc.

This would only work for RRTYPEs that have fixed length RDATA values like A and AAAA, but that seems to be the scope of the original patch here.

If I am not mistaken this would avoid any issues related to name pointers moving around?

@rgacogne:

Technically I believe a compression pointer could point at the rdata of a A or AAAA (that doesn't make sense to me, to be clear, but I believe this is allowed).

@Habbie:

it is theoretically possible that label compression points at A content 99.111.109.0 for a com. label. I don't know if any existing implementations would do it today, but it is allowed.

I don't believe this is allowed. RFC 1035 section 4.1.4: "...an entire domain name or a list of labels at the end of a domain name is replaced with a pointer to a prior occurance of the same name." There are no domain names in the RDATA fields of the A and AAAA RRTYPEs. I think reinterpreting arbitrary non-name bytes of the message as if they were a wire format domain name is outside of the original specification, although admittedly STD 13 is a bit light on exactly what constitutes a valid compression offset.

Side note: the com. label requires five bytes to express so it won't fit in an A record's data :-)

Habbie · 2025-09-16T15:48:52Z

Side note: the com. label requires five bytes to express so it won't fit in an A record's data :-)

Sorry. Of course I meant some AAAA :D

admittedly STD 13 is a bit light on exactly what constitutes a valid compression offset.

It was a lighter time for specifications in general. But I agree that it probably is not a real concern.

omoerbeek · 2025-09-16T15:54:43Z

I was also pondering not shuffling each packet on the outgoing path, but the data in the packetcache: a swap of two rdata parts for a two random records in the rrset for each retrieval. This avoids a lot of copying.

omoerbeek · 2025-09-16T15:57:50Z

That way you can also store in the PC if a packet may be shuffled.

karelbilek · 2025-09-17T08:30:36Z

@edmonds that is a very good idea. I will need to see if it can be done without parsing the packet and recursively following the compressions - if we can easily tell that the names are the same - in the place I put the shuffling to, the simple parser doesn't follow the pointers. But yeah if it works, it's simpler than my current code

the big picture is do shuffling as recommended here - https://www.ietf.org/archive/id/draft-kerr-everybodys-shuffling-00.html . I am using dnsdist as a stub resolver; as written in the draft - "DNS Stub Resolvers SHOULD return RR within an RRset in random order." and "If these cache RRset, then they should also randomize." ; I just restricted this to A/AAAA as this was simpler.

karelbilek · 2025-09-19T07:15:36Z

@edmonds yeah as written now, I cannot do that easily, because I don't see the name at all (the packet mangler is very simple and just skips the domain name). I will see if I can put it to some other place

The shuffle is implementing by directly swapping pieces of RData memory in a single RRSet. Signed-off-by: Karel Bilek <kb@karelbilek.com>

karelbilek · 2025-09-19T13:23:51Z

I have changed the logic to what @edmonds suggested, except I still swap the RRData directly rather then copy to local container, then shuffle, then copy back; I found it simpler. (std::shuffle is using Fisher Yates anyway, same thing that I use here.) The whole code is simpler now, great.

I still change the packets in the outgoing path, not in the cache itself.

I am using the DNSName() parser, with uncompression. This could make the shuffle take long potentially, if there is too many redirects and too many records? I was not sure if it's better to uncompress or not here.

rgacogne

Thank you for opening this PR. I have a few nits but the code looks fine. We will definitely need test (unit ones, at least, ideally regression tests as well) before merging this. Let me know if you need help with that, of course!

rgacogne · 2025-09-26T09:23:12Z

pdns/dnsdistdist/docs/reference/config.rst

  * ``skipOptions={}``: Extra list of EDNS option codes to skip when hashing the packet (if ``cookieHashing`` above is false, EDNS cookie option number will be added to this list internally).
  * ``maximumEntrySize=4096``: int - The maximum size, in bytes, of a DNS packet that can be inserted into the packet cache. Default is 4096 bytes, which was the fixed size before 1.9.0, and is also a hard limit for UDP responses.
  * ``payloadRanks={}``: List of payload size used when hashing the packet. The list will be sorted in ascending order and searched to find a lower bound value for the payload size in the packet. If found then it will be used for packet hashing. Values less than 512 or greater than ``maximumEntrySize`` above will be discarded. This option is to enable cache entry sharing between clients using different payload sizes when needed.
+  * ``shuffle=false``: bool - Whether A and AAAA records should be shuffled when serving from cache, for load-balancing. The cache might not be always shuffled when the results are too complex.


Suggested change

* ``shuffle=false``: bool - Whether A and AAAA records should be shuffled when serving from cache, for load-balancing. The cache might not be always shuffled when the results are too complex.

* ``shuffle=false``: bool - Whether A and AAAA records should be shuffled when serving from cache, for load-balancing. The cache might not be shuffled if the cached packet is too complex for the simple parser used for this feature.

rgacogne · 2025-09-26T09:23:51Z

pdns/dnsdistdist/docs/reference/config.rst

  :param bool deferrableInsertLock: Whether the cache should give up insertion if the lock is held by another thread, or simply wait to get the lock
  :param int maxNegativeTTL: Cache a NXDomain or NoData answer from the backend for at most this amount of seconds, even if the TTL of the SOA record is higher
  :param bool parseECS: Whether any EDNS Client Subnet option present in the query should be extracted and stored to be able to detect hash collisions involving queries with the same qname, qtype and qclass but a different incoming ECS value. Enabling this option adds a parsing cost and only makes sense if at least one backend might send different responses based on the ECS value, so it's disabled by default
+  :param bool shuffle: Whether A and AAAA records should be shuffled when serving from cache, for load-balancing. The cache might not be always shuffled when the results are too complex.


This syntax has been deprecated since 1.4.0, no need to update it.

rgacogne · 2025-09-26T09:24:59Z

pdns/dnsdistdist/dnsdist-settings-definitions.yml

+    - name: "shuffle"
+      type: "bool"
+      default: "false"
+      description: "Whether to shuffle records in the cache."


Suggested change

description: "Whether to shuffle records in the cache."

description: "Whether A and AAAA records should be shuffled when serving from cache, for load-balancing. The cache might not be shuffled if the cached packet is too complex for the simple parser used for this feature."

rgacogne · 2025-09-26T09:51:29Z

pdns/dnsparser.cc

+      for (auto swapped = rrdata_indexes.size() - 1; swapped > 0; --swapped) {
+        auto swapped_with = dist(randomEngine, uid::param_type(0, swapped));
+        if (swapped != swapped_with) {
+          dpm.swapInPlace(rrdata_indexes[swapped], rrdata_indexes[swapped_with]);


Suggested change

dpm.swapInPlace(rrdata_indexes[swapped], rrdata_indexes[swapped_with]);

dpm.swapInPlace(rrdata_indexes.at(swapped), rrdata_indexes.at(swapped_with));

rgacogne · 2025-09-26T09:52:36Z

pdns/dnsparser.hh

    moveOffset(toskip);
  }

+  std::pair<uint32_t, uint32_t> skipAndReturnIndexesRData()


Suggested change

std::pair<uint32_t, uint32_t> skipAndReturnIndexesRData()

std::pair<uint32_t, uint32_t> skipRDataAndReturnOffsets()

rgacogne · 2025-09-26T09:54:18Z

pdns/dnsparser.cc

+
+    // for now shuffle only first rrset, only As and AAAAs
+    uint16_t rrset_type = 0;
+    DNSName rrset_dnsname = DNSName();


Suggested change

DNSName rrset_dnsname = DNSName();

DNSName rrset_dnsname{};

rgacogne · 2025-09-26T09:55:05Z

pdns/dnsparser.cc

+    return;
+  }
+  try {
+    DNSPacketMangler dpm(packet, length);


The creation of the DNSPacketMangler object can be moved after the ancount == 1 check.

karelbilek force-pushed the kb/shuffle branch from b6eb1a7 to 7346e49 Compare September 10, 2025 10:57

miodvallat added the dnsdist label Sep 10, 2025

karelbilek force-pushed the kb/shuffle branch from 7346e49 to d2b1a7b Compare September 10, 2025 11:57

karelbilek force-pushed the kb/shuffle branch from b3350d0 to 3ec864b Compare September 16, 2025 11:52

karelbilek force-pushed the kb/shuffle branch 2 times, most recently from 87f011d to 4059c5f Compare September 16, 2025 12:33

omoerbeek reviewed Sep 16, 2025

View reviewed changes

pdns/dnsparser.cc Outdated Show resolved Hide resolved

karelbilek force-pushed the kb/shuffle branch from 4059c5f to 73f93e2 Compare September 19, 2025 11:15

dnsdist: implement simple packet shuffle in cache

db6b5d5

The shuffle is implementing by directly swapping pieces of RData memory in a single RRSet. Signed-off-by: Karel Bilek <kb@karelbilek.com>

karelbilek force-pushed the kb/shuffle branch from 73f93e2 to db6b5d5 Compare September 19, 2025 12:56

rgacogne self-requested a review September 26, 2025 09:21

rgacogne added this to the dnsdist-2.1.0 milestone Sep 26, 2025

rgacogne added the enhancement label Sep 26, 2025

rgacogne requested changes Sep 26, 2025

View reviewed changes

	* ``shuffle=false``: bool - Whether A and AAAA records should be shuffled when serving from cache, for load-balancing. The cache might not be always shuffled when the results are too complex.
	* ``shuffle=false``: bool - Whether A and AAAA records should be shuffled when serving from cache, for load-balancing. The cache might not be shuffled if the cached packet is too complex for the simple parser used for this feature.

	description: "Whether to shuffle records in the cache."
	description: "Whether A and AAAA records should be shuffled when serving from cache, for load-balancing. The cache might not be shuffled if the cached packet is too complex for the simple parser used for this feature."

	dpm.swapInPlace(rrdata_indexes[swapped], rrdata_indexes[swapped_with]);
	dpm.swapInPlace(rrdata_indexes.at(swapped), rrdata_indexes.at(swapped_with));

	std::pair<uint32_t, uint32_t> skipAndReturnIndexesRData()
	std::pair<uint32_t, uint32_t> skipRDataAndReturnOffsets()

dnsdist: implement simple packet shuffle in cache #16108

Are you sure you want to change the base?

dnsdist: implement simple packet shuffle in cache #16108

Uh oh!

Conversation

karelbilek commented Sep 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Short description

Checklist

Uh oh!

coveralls commented Sep 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Test Coverage Report for Build 17765900968

Details

💛 - Coveralls

Uh oh!

coveralls commented Sep 10, 2025

Pull Request Test Coverage Report for Build 17611467588

Details

💛 - Coveralls

Uh oh!

rgacogne commented Sep 12, 2025

Uh oh!

omoerbeek commented Sep 12, 2025

Uh oh!

karelbilek commented Sep 12, 2025

Uh oh!

Habbie commented Sep 12, 2025

Uh oh!

karelbilek commented Sep 12, 2025 via email

Uh oh!

Habbie commented Sep 12, 2025

Uh oh!

omoerbeek commented Sep 12, 2025

Uh oh!

karelbilek commented Sep 12, 2025 via email

Uh oh!

omoerbeek commented Sep 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

karelbilek commented Sep 16, 2025

Uh oh!

Uh oh!

edmonds commented Sep 16, 2025

Uh oh!

Habbie commented Sep 16, 2025

Uh oh!

omoerbeek commented Sep 16, 2025

Uh oh!

omoerbeek commented Sep 16, 2025

Uh oh!

karelbilek commented Sep 17, 2025

Uh oh!

karelbilek commented Sep 19, 2025

Uh oh!

karelbilek commented Sep 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rgacogne left a comment

Choose a reason for hiding this comment

Uh oh!

rgacogne Sep 26, 2025

Choose a reason for hiding this comment

Uh oh!

rgacogne Sep 26, 2025

Choose a reason for hiding this comment

Uh oh!

rgacogne Sep 26, 2025

Choose a reason for hiding this comment

Uh oh!

rgacogne Sep 26, 2025

Choose a reason for hiding this comment

Uh oh!

rgacogne Sep 26, 2025

Choose a reason for hiding this comment

Uh oh!

rgacogne Sep 26, 2025

Choose a reason for hiding this comment

Uh oh!

rgacogne Sep 26, 2025

karelbilek commented Sep 10, 2025 •

edited

Loading

coveralls commented Sep 10, 2025 •

edited

Loading

omoerbeek commented Sep 12, 2025 •

edited

Loading

karelbilek commented Sep 19, 2025 •

edited

Loading