-
Notifications
You must be signed in to change notification settings - Fork 968
dnsdist: implement simple packet shuffle in cache #16108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
b6eb1a7
to
7346e49
Compare
Pull Request Test Coverage Report for Build 17765900968Details
💛 - Coveralls |
Pull Request Test Coverage Report for Build 17611467588Details
💛 - Coveralls |
7346e49
to
d2b1a7b
Compare
I haven't looked at the code yet, but reading your description I think this is only safe if the only compression pointers present point to the questions section, otherwise all bets are off. Technically I believe a compression pointer could point at the rdata of a |
Also there's a check for the authority section, but not for the additional section. Why? |
@rgacogne what happens often is that for example CNAME is followed by As and the As point to the CNAMEs; but even in that case, swapping As memory should be fine? @omoerbeek I was under the impression that the additional section cannot have domain names and therefore cannot have pointers, so no need to disallow that. if this assumption is wrong then I will need to add that, too (and there can be no edns, which will make this a bit useless) |
it is theoretically possible that label compression points at A content
I believe it is. A fun project would be to see if any software chokes on it :-) |
but even in that case I check that all the pointers point before the As
start, so this should not be a problem.
…On Fri 12. 9. 2025 at 17:42, Peter van Dijk ***@***.***> wrote:
*Habbie* left a comment (PowerDNS/pdns#16108)
<#16108 (comment)>
but even in that case, swapping As memory should be fine?
it is theoretically possible that label compression points at A content
99.111.109.0 for a com. label. I don't know if any existing
implementations would do it today, but it is allowed.
(that doesn't make sense to me, to be clear, but I believe this is
allowed).
I believe it is. A fun project would be to see if any software chokes on
it :-)
—
Reply to this email directly, view it on GitHub
<#16108 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAZT4OD37SYTDBMKPBCJOT3SLSVRAVCNFSM6AAAAACGD2RLXCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTEOBVG44DIMRZGA>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
that sounds good |
Ehh, there can also be a compression pointer in the name field. |
oh, I didn't know that, but I can check for that here at least.
…On Fri 12. 9. 2025 at 17:45, Otto Moerbeek ***@***.***> wrote:
*omoerbeek* left a comment (PowerDNS/pdns#16108)
<#16108 (comment)>
@omoerbeek <https://github.com/omoerbeek> I was under the impression that
the additional section cannot have domain names and therefore cannot have
pointers, so no need to disallow that. if this assumption is wrong then I
will need to add that, too (and there can be no edns, which will make this
a bit useless)
Ehh, there can also be a compression pointer in the name field.
—
Reply to this email directly, view it on GitHub
<#16108 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAZT4IFMBVOEPI6OSSIKUL3SLTBFAVCNFSM6AAAAACGD2RLXCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTEOBVG44TIMZSGU>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Its quite common to have A or AAAA's following a CNAME having names pointing to the rdata of the CNAME. Or the A and AAAA's of the additional section having names pointing to rdata in the answer or authority section. |
b3350d0
to
3ec864b
Compare
that should still work with this approach, nothing should break. I have been reading what can be in additional section, and, for caution, have added check that there is only OPT or RRSIG. |
87f011d
to
4059c5f
Compare
Hello, If I understand correctly, the goal is basically to shuffle the order of the RR's in an A or AAAA RRset appearing in the answer section? It seems like that could be done without paying any attention to name compression pointers or performing reordering on data that could potentially break compression pointers. Something like:
I guess it could be extended to support multiple RRsets, or RRsets in different sections, etc. This would only work for RRTYPEs that have fixed length RDATA values like A and AAAA, but that seems to be the scope of the original patch here. If I am not mistaken this would avoid any issues related to name pointers moving around?
I don't believe this is allowed. RFC 1035 section 4.1.4: "...an entire domain name or a list of labels at the end of a domain name is replaced with a pointer to a prior occurance of the same name." There are no domain names in the RDATA fields of the A and AAAA RRTYPEs. I think reinterpreting arbitrary non-name bytes of the message as if they were a wire format domain name is outside of the original specification, although admittedly STD 13 is a bit light on exactly what constitutes a valid compression offset. Side note: the |
Sorry. Of course I meant some AAAA :D
It was a lighter time for specifications in general. But I agree that it probably is not a real concern. |
I was also pondering not shuffling each packet on the outgoing path, but the data in the packetcache: a swap of two rdata parts for a two random records in the rrset for each retrieval. This avoids a lot of copying. |
That way you can also store in the PC if a packet may be shuffled. |
@edmonds that is a very good idea. I will need to see if it can be done without parsing the packet and recursively following the compressions - if we can easily tell that the names are the same - in the place I put the shuffling to, the simple parser doesn't follow the pointers. But yeah if it works, it's simpler than my current code the big picture is do shuffling as recommended here - https://www.ietf.org/archive/id/draft-kerr-everybodys-shuffling-00.html . I am using dnsdist as a stub resolver; as written in the draft - "DNS Stub Resolvers SHOULD return RR within an RRset in random order." and "If these cache RRset, then they should also randomize." ; I just restricted this to A/AAAA as this was simpler. |
@edmonds yeah as written now, I cannot do that easily, because I don't see the name at all (the packet mangler is very simple and just skips the domain name). I will see if I can put it to some other place |
4059c5f
to
73f93e2
Compare
The shuffle is implementing by directly swapping pieces of RData memory in a single RRSet. Signed-off-by: Karel Bilek <kb@karelbilek.com>
73f93e2
to
db6b5d5
Compare
I have changed the logic to what @edmonds suggested, except I still swap the RRData directly rather then copy to local container, then shuffle, then copy back; I found it simpler. (std::shuffle is using Fisher Yates anyway, same thing that I use here.) The whole code is simpler now, great. I still change the packets in the outgoing path, not in the cache itself. I am using the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for opening this PR. I have a few nits but the code looks fine. We will definitely need test (unit ones, at least, ideally regression tests as well) before merging this. Let me know if you need help with that, of course!
* ``skipOptions={}``: Extra list of EDNS option codes to skip when hashing the packet (if ``cookieHashing`` above is false, EDNS cookie option number will be added to this list internally). | ||
* ``maximumEntrySize=4096``: int - The maximum size, in bytes, of a DNS packet that can be inserted into the packet cache. Default is 4096 bytes, which was the fixed size before 1.9.0, and is also a hard limit for UDP responses. | ||
* ``payloadRanks={}``: List of payload size used when hashing the packet. The list will be sorted in ascending order and searched to find a lower bound value for the payload size in the packet. If found then it will be used for packet hashing. Values less than 512 or greater than ``maximumEntrySize`` above will be discarded. This option is to enable cache entry sharing between clients using different payload sizes when needed. | ||
* ``shuffle=false``: bool - Whether A and AAAA records should be shuffled when serving from cache, for load-balancing. The cache might not be always shuffled when the results are too complex. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* ``shuffle=false``: bool - Whether A and AAAA records should be shuffled when serving from cache, for load-balancing. The cache might not be always shuffled when the results are too complex. | |
* ``shuffle=false``: bool - Whether A and AAAA records should be shuffled when serving from cache, for load-balancing. The cache might not be shuffled if the cached packet is too complex for the simple parser used for this feature. |
:param bool deferrableInsertLock: Whether the cache should give up insertion if the lock is held by another thread, or simply wait to get the lock | ||
:param int maxNegativeTTL: Cache a NXDomain or NoData answer from the backend for at most this amount of seconds, even if the TTL of the SOA record is higher | ||
:param bool parseECS: Whether any EDNS Client Subnet option present in the query should be extracted and stored to be able to detect hash collisions involving queries with the same qname, qtype and qclass but a different incoming ECS value. Enabling this option adds a parsing cost and only makes sense if at least one backend might send different responses based on the ECS value, so it's disabled by default | ||
:param bool shuffle: Whether A and AAAA records should be shuffled when serving from cache, for load-balancing. The cache might not be always shuffled when the results are too complex. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This syntax has been deprecated since 1.4.0, no need to update it.
- name: "shuffle" | ||
type: "bool" | ||
default: "false" | ||
description: "Whether to shuffle records in the cache." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
description: "Whether to shuffle records in the cache." | |
description: "Whether A and AAAA records should be shuffled when serving from cache, for load-balancing. The cache might not be shuffled if the cached packet is too complex for the simple parser used for this feature." |
for (auto swapped = rrdata_indexes.size() - 1; swapped > 0; --swapped) { | ||
auto swapped_with = dist(randomEngine, uid::param_type(0, swapped)); | ||
if (swapped != swapped_with) { | ||
dpm.swapInPlace(rrdata_indexes[swapped], rrdata_indexes[swapped_with]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dpm.swapInPlace(rrdata_indexes[swapped], rrdata_indexes[swapped_with]); | |
dpm.swapInPlace(rrdata_indexes.at(swapped), rrdata_indexes.at(swapped_with)); |
moveOffset(toskip); | ||
} | ||
|
||
std::pair<uint32_t, uint32_t> skipAndReturnIndexesRData() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::pair<uint32_t, uint32_t> skipAndReturnIndexesRData() | |
std::pair<uint32_t, uint32_t> skipRDataAndReturnOffsets() |
|
||
// for now shuffle only first rrset, only As and AAAAs | ||
uint16_t rrset_type = 0; | ||
DNSName rrset_dnsname = DNSName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DNSName rrset_dnsname = DNSName(); | |
DNSName rrset_dnsname{}; |
return; | ||
} | ||
try { | ||
DNSPacketMangler dpm(packet, length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The creation of the DNSPacketMangler
object can be moved after the ancount == 1
check.
Short description
I have implemented a simple packet shuffle for dnsdist packet cache.
I want to use dnsdist as a local stub resolver, similar to dnsmasq. Unlike dnsmasq, which does do shuffling of A responses, dnsdist doesn't.
This is a bit complicated, as dnsdist just saves the packets directly and not the A/AAAA records. I figured out I could just take the packets and swap the memory of the A records directly in a narrow number of cases, but this actually covers most of cases where shuffling is needed.
The issue with directly swapping memory of records is that DNS compression can break. I have narrowed the cases where I can safely do shuffle:
The shuffle itself is using Fisher-Yates, same algorithm that std::shuffle uses; it is using std::swap_ranges in most cases (same length), or std::swap_ranges and one std::rotate (which can have linear complexity).
The direct memory swapping in Mangler still feels a little "icky"; sorry 😞
I have added documentation; I didn't want to be too concrete in what packets are actually shuffled, as that could be changed eventually.
I have not yet added any tests, which I definitely will, if you think this approach is an appropriate approach. (And if I figure how to deterministically seed the random shuffle in a test.)
Checklist
I have: