border router: traffic control (scheduling) #4054
Conversation
matzf
left a comment
Nice work!
First off, a bit of bike shedding: isn't this "traffic control" (as e.g. the Linux "tc" subsystem) rather than "traffic engineering"? Traffic engineering appears to be used more broadly for the process of optimising network infrastructure for the expected traffic. See https://en.wikipedia.org/wiki/Network_traffic_control and https://en.wikipedia.org/wiki/Teletraffic_engineering.
I have a general concern about this prioritisation and scheduling (as previously discussed offline).
For this to be useful, it seems critical to be able to read and classify all packets. This is a slightly awkward position, as the implementation needs to meet performance requirements for correctness.
In this generic golang implementation of the router, it is simply not possible to process all incoming packets in all deployments. So perhaps it could make sense to include a way to determine a threshold throughput for which we can make some guarantees. With this, an operator could enforce the throughput limits early, so that the overall guarantees can be maintained. Maybe we can also set a target for this on some reference system, to evaluate whether the implementation meets our expectations. Ideally, this could also be checked in the CI system, but creating performance tests that are both useful and robust is admittedly very difficult.
Some specific points about this implementation:
- Why are there so many queues? Why not just, say, Colibri, Epic and then all the rest?
- Is there ever a reason not to use the strict priority scheduler?
- The BFD packets should be higher priority than Colibri and Epic -- if we stop sending BFD packets, the link will just shut down (which doesn't help the Colibri and Epic packets). So if it's top priority anyway, we can bypass the scheduling and just keep sending to the socket directly. Or does the occasional WriteTo adversely affect the throughput achievable by the WriteBatch calls?
Note: I'm (still) not entirely convinced about copying the packet to put it in the prioritized queues, but I also see the advantage of decoupling the reading/processing from the writing stages. Have you already attempted to measure the overhead of this copy?
Reviewed 2 of 10 files at r1.
Reviewable status: 2 of 11 files reviewed, 2 unresolved discussions (waiting on @mawyss)
go/pkg/router/dataplane.go, line 80 at r1 (raw file):
```go
type BatchConn interface {
	ReadBatch(underlayconn.Messages) (int, error)
	WriteBatch(underlayconn.Messages) (int, error)
}
```
Run `make mocks` to update the mocks.
go/pkg/router/te/roundrobin.go, line 20 at r1 (raw file):
```go
// scheduling into the border router. Only basic scheduling algorithms are
// implemented, more elaborate ones might be necessary in the future.
package te
```
Nit: only add the package docs once
Thank you for your feedback, and sorry for the many commits; I had to debug several data race issues, which I could only reproduce on Buildkite, not locally.
For future reference, the important points are:
- The destination IP address needs to be copied, because it points to the address inside the packet buffer and would therefore get overwritten by the read/process goroutine (see the sketch after this list).
- There needs to be a write goroutine per connection (BatchConn), not per egress interface.
- WriteBatch may not be the best option even if we are sending multiple packets at once: one invalid packet may prevent many others from being sent out.
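To make the first two points concrete, here is a minimal sketch (with made-up names and types, not the actual dataplane code) of copying a packet and its destination address before handing them to the per-connection write goroutine:

```go
// Sketch only: illustrative types, not the router's dataplane API.
package sketch

import "net"

type queuedPacket struct {
	buf []byte       // copy of the packet, owned by this queue entry
	dst *net.UDPAddr // deep copy; must not alias the read buffer
}

// enqueue copies the packet and its destination address, so the read/process
// goroutine can safely reuse its buffer for the next packet while the write
// goroutine is still holding this one.
func enqueue(raw []byte, dst *net.UDPAddr, q chan<- queuedPacket) {
	p := queuedPacket{buf: append([]byte(nil), raw...)}
	ip := make(net.IP, len(dst.IP))
	copy(ip, dst.IP) // dst.IP may point into the packet buffer
	p.dst = &net.UDPAddr{IP: ip, Port: dst.Port, Zone: dst.Zone}
	q <- p
}
```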
Regarding "traffic engineering": I used the term because of https://datatracker.ietf.org/doc/html/rfc3272#page-18 (paragraph "Short").
I agree that "traffic control" describes the changes in this PR better, I changed the code accordingly.
"It is simply not possible to process all incoming packets in all deployments."
If the router cannot handle the packets at the rate they are arriving, this is indeed a problem for QoS systems. However, with the decoupling of reading and writing used in this PR, reading will become faster than in the current BR version. Also note that the packet reading can easily be scaled up by simply starting the corresponding goroutine multiple times. Furthermore, even in scenarios where the BR is indeed not able to read all packets fast enough, the prioritization is maybe not perfect but still improves QoS. I tested the code locally with file transfers: even on my performance-constrained system, prioritized traffic arrives twice as fast as traffic sent over five other, non-prioritized streams.
"With this, an operator could enforce the throughput limits early, so that the overall guarantees can be maintained."
Can you elaborate more on this? How would an operator enforce the throughput limits?
Regarding your questions:
- The queues are only a suggestion, which should allow for future scheduling algorithms that need to differentiate between all traffic classes. When having only Colibri, Epic and "all the rest", we could for example not distinguish between BFD and SCION packets.
- The idea behind the strict priority scheduler is to have a way of demonstrating in the near future that the Colibri QoS guarantees can indeed be satisfied (see the sketch after this list). In the long term, more elaborate scheduling algorithms will be needed, that for example do not schedule on a per-packet basis, but on a per-packet-size basis. Also, if we would only use the strict priority scheduler, higher priority packets could cause lower priority packets to be dropped completely. Therefore, an algorithm will be necessary that provides minimal forwarding guarantees to each traffic class.
- I implemented your suggestion to prioritize BFD more than EPIC. For Colibri however, the idea is that an AS only provides bandwidth reservations that it can indeed handle. Therefore, Colibri traffic will never be so high that other traffic cannot be forwarded anymore.
- Copying the packet to the queues is an overhead, but it allows us to completely separate reading/processing from writing and makes the performance more predictable (compared to a global pool of packets shared by all goroutines). I have not yet measured the overhead of this copy operation.
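For illustration, strict priority dequeueing over per-class queues could look roughly like this (the queue and packet types are assumptions, not the actual te package API):

```go
// Sketch only: assumed queue/packet types.
package sketch

type packet []byte

// dequeueStrictPriority returns a packet from the highest-priority non-empty
// queue (queues[0] has the highest priority); ok is false if all are empty.
func dequeueStrictPriority(queues []chan packet) (p packet, ok bool) {
	for _, q := range queues {
		select {
		case p = <-q:
			return p, true
		default:
			// queue empty, try the next (lower-priority) queue
		}
	}
	return nil, false
}
```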
Reviewable status: 0 of 11 files reviewed, 1 unresolved discussion (waiting on @matzf)
go/pkg/router/dataplane.go, line 80 at r1 (raw file):
Previously, matzf (Matthias Frei) wrote…
Run `make mocks` to update the mocks.
Done.
go/pkg/router/te/roundrobin.go, line 20 at r1 (raw file):
Previously, matzf (Matthias Frei) wrote…
Nit: only add the package docs once
Done.
matzf
left a comment
Also note that the packet reading can be easily scaled up by simply starting the corresponding goroutine multiple times.
This will not help very much (the overhead for the read calls remains the same), but indeed the processing can be parallelised to some extent. However, doing this naively would lead to packet reordering. Packets with the same flow id must be processed (or at least written out) in the correct order. This can be done by using the flow ID as key when distributing packets to the parallel processing units. This is, however, not sufficient when attempting to improve the situation for prioritized traffic, as an attacker could attempt to have its low priority attack traffic processed on the same processor as the high priority victim traffic, by using the same flow IDs.
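As a rough sketch of such flow-ID-keyed distribution (all names are illustrative, not the router's actual API):

```go
// Sketch only: illustrative types and names.
package sketch

import (
	"encoding/binary"
	"hash/fnv"
)

type packet struct {
	flowID uint32 // e.g. the flow ID from the SCION common header
	raw    []byte
}

// dispatch assigns the packet to a worker based on its flow ID, so all packets
// of one flow are handled by the same goroutine and keep their relative order.
// As noted above, a keyed hash (or mixing in a per-router secret) would be
// needed to make it harder for an attacker to collide with a victim's worker.
func dispatch(p packet, workers []chan packet) {
	var b [4]byte
	binary.BigEndian.PutUint32(b[:], p.flowID)
	h := fnv.New32a()
	h.Write(b[:])
	workers[int(h.Sum32()%uint32(len(workers)))] <- p
}
```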
The prioritization is maybe not perfect but still improves QoS
But that's a major caveat. The selling point of Colibri and the GMA thingy is that it's supposed to provide guarantees. Possibly I see this too much black and white, but honestly I don't think that merely improving the throughput for high priority traffic is any good at all.
"With this, an operator could enforce the throughput limits early, so that the overall guarantees can be maintained."
Can you elaborate more on this? How would an operator enforce the throughput limits?
The operator of an AS can ensure that only a certain amount of traffic can possibly reach a border router by "simply" sufficiently throttling the network at any ingress point to the network.
- The queues are only a suggestion, which should allow for future scheduling algorithms that need to differentiate between all traffic classes. When having only Colibri, Epic and "all the rest", we could for example not distinguish between BFD and SCION packets.
- The idea behind the strict priority scheduler is to have a way of demonstrating in the near future that the Colibri QoS guarantees can indeed be satisfied. In the long term, more elaborate scheduling algorithms will be needed, that for example do not schedule on a per-packet basis, but on a per-packet-size basis. Also, if we would only use the strict priority scheduler, higher priority packets could cause lower priority packets to be dropped completely. Therefore, an algorithm will be necessary that provides minimal forwarding guarantees to each traffic class.
"YAGNI".
I think we should not include more fine grained queues or more abstract or elaborate schedulers than strictly necessary for now just in anticipation of possible future extensions. Instead, the system should be obvious enough (and I think that's already the case) to make it clear where and how to add the extended logic if it ever becomes necessary.
I don't understand the point about "higher priority packets could cause lower priority packets to be dropped completely"; yes, I mean, of course, but that's kind of the point, no!? Unless we don't have the right set of traffic classes/queues, that is.
Btw. scheduling based on packet size could be problematic due to possible packet reordering within a flow -- not sure why you'd want to do this anyway, so maybe I'm just misunderstanding.
Regarding the BFD packets; I think these should best bypass the queuing and write directly to the socket.
Reviewed 11 of 11 files at r5.
Reviewable status: complete! All files reviewed, all discussions resolved (waiting on @mawyss)
mawyss
left a comment
I implemented your suggestions:
- Only keep schedulers that will be used
- Only COLIBRI and EPIC queues (all other traffic goes to the "Others" queue)
- BFD packets are not scheduled, but sent directly to the socket
"Scheduling based on packet size could be problematic due to possible packet reordering within a flow"
I am not suggesting prioritizing packets (of the same class) according to their size.
In case of congestion in all traffic classes, I would like to be able to specify that for example 80% of bandwidth (bytes per second, not packets per second) goes to Colibri traffic, 15% to EPIC, and 5% to the rest (SCION, SCMP, ...).
When the Colibri bandwidth is not fully used, the free bandwidth should be filled with EPIC traffic, but there should still be at least 5% available to other traffic.
And if at some point for example COLIBRI uses 20%, and EPIC 30% of the bandwidth, then the remaining 50% should be used for other traffic.
(This is basically what is described in the COLIBRI paper.)
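A minimal sketch of such byte-based weighted sharing, in the spirit of deficit round robin (the class/queue representation and the quantum value are assumptions for illustration, not the actual scheduler implementation):

```go
// Sketch only: assumed types and constants.
package sketch

type class struct {
	queue   [][]byte // pending packets, head at index 0
	weight  int      // relative share, e.g. 80 / 15 / 5
	deficit int      // accumulated byte credit
}

const quantum = 1500 // bytes of credit per weight unit and round (roughly one MTU)

// scheduleRound performs one pass over all classes: each backlogged class
// earns credit proportional to its weight and may send packets up to that
// credit; empty classes give up their credit, so unused bandwidth is
// effectively filled by the other classes (work conserving).
func scheduleRound(classes []*class) [][]byte {
	var out [][]byte
	for _, c := range classes {
		if len(c.queue) == 0 {
			c.deficit = 0 // nothing queued: do not hoard credit
			continue
		}
		c.deficit += c.weight * quantum
		for len(c.queue) > 0 && len(c.queue[0]) <= c.deficit {
			pkt := c.queue[0]
			c.queue = c.queue[1:]
			c.deficit -= len(pkt)
			out = append(out, pkt)
		}
	}
	return out
}
```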
This is not an issue at the moment however, because the strict scheduling algorithm does exactly what we want:
- COLIBRI reservations are prioritized, but will not take more than X percent of the bandwidth (enforced by the COLIBRI algorithm)
- The last two hops of an EPIC (more precisely EPIC-HP) packet will prioritize the EPIC packet before any SCION traffic.
In the future, with additional traffic classes such as EPIC-SAPV (source authentication + path validation based on DRKey), a more elaborate bandwidth algorithm such as described above might be necessary:
If we would strictly prioritize EPIC-SAPV before SCION traffic, SCION traffic might starve completely; this is why I suggested still allocating it a guaranteed share of the total bandwidth. Not having such a guaranteed share for SCION traffic might be problematic, because in order to use EPIC-SAPV, the DRKeys need to be fetched (using SCION path type traffic).
But yes, this is indeed not a problem for now.
Reviewable status: 5 of 11 files reviewed, all discussions resolved (waiting on @matzf)
As long as COLIBRI and EPIC are not fully implemented in this repository, this packet prioritization is not really needed.
This PR adds scheduling capabilities to the border router. For this,
the current stages read/process/write are split into two separate
processes, one to read/process and one to write, similar to what
is suggested in #4031. All packet state is pre-allocated, which
is in accordance with recent efforts in this direction (#4030).
Major changes:
- Packets are placed into per-traffic-class queues, where each queue corresponds to one traffic class (SCION, EPIC, OHP, etc.). If scheduling is disabled, only one queue will be allocated for all traffic classes together.
- A dedicated write goroutine dequeues packets from the queues and schedules them according to the selected scheduling algorithm.
- Supported algorithms are round-robin or strict priority scheduling. The implementation design should allow adding more elaborate algorithms easily in the future.
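A rough sketch of this structure, with assumed names and types rather than the actual router code:

```go
// Sketch only: all identifiers are assumptions.
package sketch

import "net"

// One queue per traffic class; with scheduling disabled, a single queue would
// be used for everything.
const (
	classColibri = iota
	classEpic
	classOthers
	numClasses
)

type scheduler interface {
	// Dequeue picks the next packet according to the configured algorithm
	// (round-robin, strict priority, ...); ok is false if all queues are empty.
	Dequeue(queues []chan []byte) (pkt []byte, ok bool)
}

// writeLoop is the single writer per connection: it drains the per-class
// queues via the scheduler and writes packets out. Blocking or backing off
// when all queues are empty is omitted from this sketch.
func writeLoop(conn net.PacketConn, dst net.Addr, queues []chan []byte, s scheduler) {
	for {
		if pkt, ok := s.Dequeue(queues); ok {
			if _, err := conn.WriteTo(pkt, dst); err != nil {
				return
			}
		}
	}
}
```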