Skip to content

Conversation

karelbilek
Copy link
Contributor

@karelbilek karelbilek commented Sep 10, 2025

Short description

I have implemented a simple packet shuffle for dnsdist packet cache.

I want to use dnsdist as a local stub resolver, similar to dnsmasq. Unlike dnsmasq, which does do shuffling of A responses, dnsdist doesn't.

This is a bit complicated, as dnsdist just saves the packets directly and not the A/AAAA records. I figured out I could just take the packets and swap the memory of the A records directly in a narrow number of cases, but this actually covers most of cases where shuffling is needed.

The issue with directly swapping memory of records is that DNS compression can break. I have narrowed the cases where I can safely do shuffle:

  • there are only As or AAAAs in the answer section, with the exception of non-A/AAAAs at the start of the section
  • authority section is empty
    • opt pseudosection doesn't need to be empty, as that cannot have a domain name/pointer - is this a correct assumption?
  • none of the As/AAAAs have compression pointer in the domain section that points before the As/AAAAs start

The shuffle itself is using Fisher-Yates, same algorithm that std::shuffle uses; it is using std::swap_ranges in most cases (same length), or std::swap_ranges and one std::rotate (which can have linear complexity).

The direct memory swapping in Mangler still feels a little "icky"; sorry 😞

I have added documentation; I didn't want to be too concrete in what packets are actually shuffled, as that could be changed eventually.

I have not yet added any tests, which I definitely will, if you think this approach is an appropriate approach. (And if I figure how to deterministically seed the random shuffle in a test.)

Checklist

I have:

  • read the CONTRIBUTING.md document
  • read and accepted the Developer Certificate of Origin document, including the AI policy, and added a "Signed-off-by" to my commits
  • compiled this code
  • tested this code
    • tested random queries, they work
  • included documentation (including possible behaviour changes)
  • documented the code
  • added or modified regression test(s)
  • added or modified unit test(s)

@coveralls
Copy link

coveralls commented Sep 10, 2025

Pull Request Test Coverage Report for Build 17765900968

Details

  • 9 of 135 (6.67%) changed or added relevant lines in 6 files are covered.
  • 34 unchanged lines in 8 files lost coverage.
  • Overall coverage decreased (-0.04%) to 65.987%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pdns/dnsdistdist/dnsdist-cache.cc 1 4 25.0%
pdns/dnsparser.hh 3 46 6.52%
pdns/dnsparser.cc 0 80 0.0%
Files with Coverage Reduction New Missed Lines %
pdns/dnsdistdist/dnsdist.cc 2 70.45%
pdns/opensslsigners.cc 2 61.0%
modules/gpgsqlbackend/spgsql.cc 3 67.94%
pdns/iputils.cc 3 56.95%
pdns/misc.cc 4 61.76%
pdns/recursordist/syncres.cc 5 81.2%
pdns/recursordist/test-syncres_cc1.cc 6 90.23%
pdns/dnsdistdist/dnsdist-carbon.cc 9 61.9%
Totals Coverage Status
Change from base Build 17761500969: -0.04%
Covered Lines: 128910
Relevant Lines: 166619

💛 - Coveralls

@coveralls
Copy link

Pull Request Test Coverage Report for Build 17611467588

Details

  • 9 of 120 (7.5%) changed or added relevant lines in 6 files are covered.
  • 12 unchanged lines in 5 files lost coverage.
  • Overall coverage increased (+0.03%) to 66.01%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pdns/dnsdistdist/dnsdist-cache.cc 1 4 25.0%
pdns/dnsparser.hh 3 46 6.52%
pdns/dnsparser.cc 0 65 0.0%
Files with Coverage Reduction New Missed Lines %
pdns/rcpgenerator.cc 2 90.2%
pdns/recursordist/rec-xfr.cc 2 73.94%
pdns/recursordist/syncres.cc 2 81.22%
modules/gpgsqlbackend/spgsql.cc 3 68.18%
pdns/recursordist/recpacketcache.hh 3 89.55%
Totals Coverage Status
Change from base Build 17610130088: 0.03%
Covered Lines: 128490
Relevant Lines: 166031

💛 - Coveralls

@rgacogne
Copy link
Member

I haven't looked at the code yet, but reading your description I think this is only safe if the only compression pointers present point to the questions section, otherwise all bets are off. Technically I believe a compression pointer could point at the rdata of a A or AAAA (that doesn't make sense to me, to be clear, but I believe this is allowed).

@omoerbeek
Copy link
Member

Also there's a check for the authority section, but not for the additional section. Why?

@karelbilek
Copy link
Contributor Author

@rgacogne what happens often is that for example CNAME is followed by As and the As point to the CNAMEs; but even in that case, swapping As memory should be fine?

@omoerbeek I was under the impression that the additional section cannot have domain names and therefore cannot have pointers, so no need to disallow that. if this assumption is wrong then I will need to add that, too (and there can be no edns, which will make this a bit useless)

@Habbie
Copy link
Member

Habbie commented Sep 12, 2025

but even in that case, swapping As memory should be fine?

it is theoretically possible that label compression points at A content 99.111.109.0 for a com. label. I don't know if any existing implementations would do it today, but it is allowed.

(that doesn't make sense to me, to be clear, but I believe this is allowed).

I believe it is. A fun project would be to see if any software chokes on it :-)

@karelbilek
Copy link
Contributor Author

karelbilek commented Sep 12, 2025 via email

@Habbie
Copy link
Member

Habbie commented Sep 12, 2025

that sounds good

@omoerbeek
Copy link
Member

@omoerbeek I was under the impression that the additional section cannot have domain names and therefore cannot have pointers, so no need to disallow that. if this assumption is wrong then I will need to add that, too (and there can be no edns, which will make this a bit useless)

Ehh, there can also be a compression pointer in the name field.

@karelbilek
Copy link
Contributor Author

karelbilek commented Sep 12, 2025 via email

@omoerbeek
Copy link
Member

omoerbeek commented Sep 12, 2025

Its quite common to have A or AAAA's following a CNAME having names pointing to the rdata of the CNAME. Or the A and AAAA's of the additional section having names pointing to rdata in the answer or authority section.

@karelbilek
Copy link
Contributor Author

@omoerbeek

Its quite common to have A or AAAA's following a CNAME having names pointing to the rdata of the CNAME. Or the A and AAAA's of the additional section having names pointing to rdata in the answer or authority section.

that should still work with this approach, nothing should break.

I have been reading what can be in additional section, and, for caution, have added check that there is only OPT or RRSIG.

@karelbilek karelbilek force-pushed the kb/shuffle branch 2 times, most recently from 87f011d to 4059c5f Compare September 16, 2025 12:33
@edmonds
Copy link
Contributor

edmonds commented Sep 16, 2025

Hello,

If I understand correctly, the goal is basically to shuffle the order of the RR's in an A or AAAA RRset appearing in the answer section? It seems like that could be done without paying any attention to name compression pointers or performing reordering on data that could potentially break compression pointers. Something like:

  1. Locate the RR's in the answer section constituting the RRset to be reordered. This requires looking at the RRTYPE until you see A or AAAA and then successively comparing the owner names in subsequent RR's to previously seen values. Once you see a different name or type, the RRset has ended.

  2. For each RR in the RRset to be reordered, record the offset of the first byte of the RDATA value, and copy out the RDATA value itself. (Require that the length of each RDATA value is equal to the length required by A or AAAA.)

  3. Take the collection of (offset, RDATA value) pairs and shuffle the offsets, re-distributing them randomly among the available RDATA values.

  4. Write the RDATA values back into the packet using the shuffled offsets.

I guess it could be extended to support multiple RRsets, or RRsets in different sections, etc.

This would only work for RRTYPEs that have fixed length RDATA values like A and AAAA, but that seems to be the scope of the original patch here.

If I am not mistaken this would avoid any issues related to name pointers moving around?

@rgacogne:

Technically I believe a compression pointer could point at the rdata of a A or AAAA (that doesn't make sense to me, to be clear, but I believe this is allowed).

@Habbie:

it is theoretically possible that label compression points at A content 99.111.109.0 for a com. label. I don't know if any existing implementations would do it today, but it is allowed.

I don't believe this is allowed. RFC 1035 section 4.1.4: "...an entire domain name or a list of labels at the end of a domain name is replaced with a pointer to a prior occurance of the same name." There are no domain names in the RDATA fields of the A and AAAA RRTYPEs. I think reinterpreting arbitrary non-name bytes of the message as if they were a wire format domain name is outside of the original specification, although admittedly STD 13 is a bit light on exactly what constitutes a valid compression offset.

Side note: the com. label requires five bytes to express so it won't fit in an A record's data :-)

@Habbie
Copy link
Member

Habbie commented Sep 16, 2025

Side note: the com. label requires five bytes to express so it won't fit in an A record's data :-)

Sorry. Of course I meant some AAAA :D

admittedly STD 13 is a bit light on exactly what constitutes a valid compression offset.

It was a lighter time for specifications in general. But I agree that it probably is not a real concern.

@omoerbeek
Copy link
Member

I was also pondering not shuffling each packet on the outgoing path, but the data in the packetcache: a swap of two rdata parts for a two random records in the rrset for each retrieval. This avoids a lot of copying.

@omoerbeek
Copy link
Member

That way you can also store in the PC if a packet may be shuffled.

@karelbilek
Copy link
Contributor Author

@edmonds that is a very good idea. I will need to see if it can be done without parsing the packet and recursively following the compressions - if we can easily tell that the names are the same - in the place I put the shuffling to, the simple parser doesn't follow the pointers. But yeah if it works, it's simpler than my current code

the big picture is do shuffling as recommended here - https://www.ietf.org/archive/id/draft-kerr-everybodys-shuffling-00.html . I am using dnsdist as a stub resolver; as written in the draft - "DNS Stub Resolvers SHOULD return RR within an RRset in random order." and "If these cache RRset, then they should also randomize." ; I just restricted this to A/AAAA as this was simpler.

@karelbilek
Copy link
Contributor Author

@edmonds yeah as written now, I cannot do that easily, because I don't see the name at all (the packet mangler is very simple and just skips the domain name). I will see if I can put it to some other place

The shuffle is implementing by directly swapping
pieces of RData memory in a single RRSet.

Signed-off-by: Karel Bilek <kb@karelbilek.com>
@karelbilek
Copy link
Contributor Author

karelbilek commented Sep 19, 2025

I have changed the logic to what @edmonds suggested, except I still swap the RRData directly rather then copy to local container, then shuffle, then copy back; I found it simpler. (std::shuffle is using Fisher Yates anyway, same thing that I use here.) The whole code is simpler now, great.

I still change the packets in the outgoing path, not in the cache itself.

I am using the DNSName() parser, with uncompression. This could make the shuffle take long potentially, if there is too many redirects and too many records? I was not sure if it's better to uncompress or not here.

@rgacogne rgacogne self-requested a review September 26, 2025 09:21
@rgacogne rgacogne added this to the dnsdist-2.1.0 milestone Sep 26, 2025
Copy link
Member

@rgacogne rgacogne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for opening this PR. I have a few nits but the code looks fine. We will definitely need test (unit ones, at least, ideally regression tests as well) before merging this. Let me know if you need help with that, of course!

* ``skipOptions={}``: Extra list of EDNS option codes to skip when hashing the packet (if ``cookieHashing`` above is false, EDNS cookie option number will be added to this list internally).
* ``maximumEntrySize=4096``: int - The maximum size, in bytes, of a DNS packet that can be inserted into the packet cache. Default is 4096 bytes, which was the fixed size before 1.9.0, and is also a hard limit for UDP responses.
* ``payloadRanks={}``: List of payload size used when hashing the packet. The list will be sorted in ascending order and searched to find a lower bound value for the payload size in the packet. If found then it will be used for packet hashing. Values less than 512 or greater than ``maximumEntrySize`` above will be discarded. This option is to enable cache entry sharing between clients using different payload sizes when needed.
* ``shuffle=false``: bool - Whether A and AAAA records should be shuffled when serving from cache, for load-balancing. The cache might not be always shuffled when the results are too complex.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* ``shuffle=false``: bool - Whether A and AAAA records should be shuffled when serving from cache, for load-balancing. The cache might not be always shuffled when the results are too complex.
* ``shuffle=false``: bool - Whether A and AAAA records should be shuffled when serving from cache, for load-balancing. The cache might not be shuffled if the cached packet is too complex for the simple parser used for this feature.

:param bool deferrableInsertLock: Whether the cache should give up insertion if the lock is held by another thread, or simply wait to get the lock
:param int maxNegativeTTL: Cache a NXDomain or NoData answer from the backend for at most this amount of seconds, even if the TTL of the SOA record is higher
:param bool parseECS: Whether any EDNS Client Subnet option present in the query should be extracted and stored to be able to detect hash collisions involving queries with the same qname, qtype and qclass but a different incoming ECS value. Enabling this option adds a parsing cost and only makes sense if at least one backend might send different responses based on the ECS value, so it's disabled by default
:param bool shuffle: Whether A and AAAA records should be shuffled when serving from cache, for load-balancing. The cache might not be always shuffled when the results are too complex.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This syntax has been deprecated since 1.4.0, no need to update it.

- name: "shuffle"
type: "bool"
default: "false"
description: "Whether to shuffle records in the cache."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
description: "Whether to shuffle records in the cache."
description: "Whether A and AAAA records should be shuffled when serving from cache, for load-balancing. The cache might not be shuffled if the cached packet is too complex for the simple parser used for this feature."

for (auto swapped = rrdata_indexes.size() - 1; swapped > 0; --swapped) {
auto swapped_with = dist(randomEngine, uid::param_type(0, swapped));
if (swapped != swapped_with) {
dpm.swapInPlace(rrdata_indexes[swapped], rrdata_indexes[swapped_with]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dpm.swapInPlace(rrdata_indexes[swapped], rrdata_indexes[swapped_with]);
dpm.swapInPlace(rrdata_indexes.at(swapped), rrdata_indexes.at(swapped_with));

moveOffset(toskip);
}

std::pair<uint32_t, uint32_t> skipAndReturnIndexesRData()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::pair<uint32_t, uint32_t> skipAndReturnIndexesRData()
std::pair<uint32_t, uint32_t> skipRDataAndReturnOffsets()


// for now shuffle only first rrset, only As and AAAAs
uint16_t rrset_type = 0;
DNSName rrset_dnsname = DNSName();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DNSName rrset_dnsname = DNSName();
DNSName rrset_dnsname{};

return;
}
try {
DNSPacketMangler dpm(packet, length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The creation of the DNSPacketMangler object can be moved after the ancount == 1 check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants