border router: traffic control (scheduling) #4054
Conversation
matzf
left a comment
Nice work!
First off, a bit of bike shedding: isn't this "traffic control" (as e.g. the Linux "tc" subsystem) rather than "traffic engineering"? Traffic engineering appears to be used more broadly for the process of optimising network infrastructure for the expected traffic. See https://en.wikipedia.org/wiki/Network_traffic_control and https://en.wikipedia.org/wiki/Teletraffic_engineering.
I have a general concern about this prioritisation and scheduling (as previously discussed offline).
For this to be useful, it seems critical to be able to read and classify all packets. This is a slightly awkward position, as the implementation needs to meet performance requirements for correctness.
In this generic golang implementation of the router, it is simply not possible to process all incoming packets in all deployments. So perhaps it could make sense to include a way to determine a threshold throughput for which we can make some guarantees. With this, an operator could enforce the throughput limits early, so that the overall guarantees can be maintained. Maybe we can also set a target for this on some reference system, to evaluate whether the implementation meets our expectations. Ideally, this could also be checked in the CI system, but creating performance tests that are both useful and robust is admittedly very difficult.
Some specific points about this implementation:
- Why are there so many queues? Why not just, say, Colibri, Epic and then all the rest?
- Is there ever a reason not to use the strict priority scheduler?
- The BFD packets should be higher priority than Colibri and Epic -- if we stop sending BFD packets, the link will just shut down (which doesn't help the Colibri and Epic packets). So if it's top priority anyway, we can bypass the scheduling and just keep sending to the socket directly. Or does the occasional WriteTo adversely affect the throughput achievable by the WriteBatch calls?
Note: I'm (still) not entirely convinced about copying the packet to put it in the prioritized queues, but I also see the advantage of decoupling the reading/processing from the writing stages. Have you already attempted to measure the overhead of this copy?
Reviewed 2 of 10 files at r1.
Reviewable status: 2 of 11 files reviewed, 2 unresolved discussions (waiting on @mawyss)
go/pkg/router/dataplane.go, line 80 at r1 (raw file):
```go
type BatchConn interface {
	ReadBatch(underlayconn.Messages) (int, error)
	WriteBatch(underlayconn.Messages) (int, error)
}
```
Run `make mocks` to update the mocks.
go/pkg/router/te/roundrobin.go, line 20 at r1 (raw file):
```go
// scheduling into the border router. Only basic scheduling algorithms are
// implemented, more elaborate ones might be necessary in the future.
package te
```
Nit: only add the package docs once
Thank you for your feedback, and sorry for the many commits; I had to debug several data race issues, which I could only reproduce on Buildkite, not locally.
For future reference, the important points are:
- The destination IP address needs to be copied, because it points to the address inside the packet buffer and would therefore get overwritten by the read/process goroutine (see the sketch after this list).
- There needs to be a write goroutine per connection (BatchConn), not per egress interface.
- WriteBatch may not be the best option even if we are sending multiple packets at once: one invalid packet may prevent many others from being sent out.
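To make the first two points concrete, here is a minimal sketch (with made-up names and types, not the actual dataplane code) of copying a packet and its destination address before handing them to the per-connection write goroutine:

```go
// Sketch only: illustrative types, not the router's dataplane API.
package sketch

import "net"

type queuedPacket struct {
	buf []byte       // copy of the packet, owned by this queue entry
	dst *net.UDPAddr // deep copy; must not alias the read buffer
}

// enqueue copies the packet and its destination address, so the read/process
// goroutine can safely reuse its buffer for the next packet while the write
// goroutine is still holding this one.
func enqueue(raw []byte, dst *net.UDPAddr, q chan<- queuedPacket) {
	p := queuedPacket{buf: append([]byte(nil), raw...)}
	ip := make(net.IP, len(dst.IP))
	copy(ip, dst.IP) // dst.IP may point into the packet buffer
	p.dst = &net.UDPAddr{IP: ip, Port: dst.Port, Zone: dst.Zone}
	q <- p
}
```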
Regarding "traffic engineering": I used the term because of https://datatracker.ietf.org/doc/html/rfc3272#page-18 (paragraph "Short").
I agree that "traffic control" describes the changes in this PR better, I changed the code accordingly.
"It is simply not possible to process all incoming packets in all deployments."
If the router cannot handle the packets at the rate they are arriving, this is indeed a problem for QoS systems. However, with the decoupling of reading and writing used in this PR, reading will become faster than in the current BR version. Also note that the packet reading can easily be scaled up by simply starting the corresponding goroutine multiple times. Furthermore, even in scenarios where the BR is indeed not able to read all packets fast enough, the prioritization is maybe not perfect but still improves QoS. I tested the code locally with file transfers: even on my performance-constrained system, prioritized traffic arrives twice as fast as traffic sent over five other, non-prioritized streams.
"With this, an operator could enforce the throughput limits early, so that the overall guarantees can be maintained."
Can you elaborate more on this? How would an operator enforce the throughput limits?
Regarding your questions:
- The queues are only a suggestion, which should allow for future scheduling algorithms that need to differentiate between all traffic classes. When having only Colibri, Epic and "all the rest", we could for example not distinguish between BFD and SCION packets.
- The idea behind the strict priority scheduler is to have a way of demonstrating in the near future that the Colibri QoS guarantees can indeed be satisfied (see the sketch after this list). In the long term, more elaborate scheduling algorithms will be needed, that for example do not schedule on a per-packet basis, but on a per-packet-size basis. Also, if we would only use the strict priority scheduler, higher priority packets could cause lower priority packets to be dropped completely. Therefore, an algorithm will be necessary that provides minimal forwarding guarantees to each traffic class.
- I implemented your suggestion to prioritize BFD more than EPIC. For Colibri however, the idea is that an AS only provides bandwidth reservations that it can indeed handle. Therefore, Colibri traffic will never be so high that other traffic cannot be forwarded anymore.
- Copying the packet to the queues is an overhead, but it allows us to completely separate reading/processing from writing and makes the performance more predictable (compared to a global pool of packets shared by all goroutines). I have not yet measured the overhead of this copy operation.
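For illustration, strict priority dequeueing over per-class queues could look roughly like this (the queue and packet types are assumptions, not the actual te package API):

```go
// Sketch only: assumed queue/packet types.
package sketch

type packet []byte

// dequeueStrictPriority returns a packet from the highest-priority non-empty
// queue (queues[0] has the highest priority); ok is false if all are empty.
func dequeueStrictPriority(queues []chan packet) (p packet, ok bool) {
	for _, q := range queues {
		select {
		case p = <-q:
			return p, true
		default:
			// queue empty, try the next (lower-priority) queue
		}
	}
	return nil, false
}
```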
Reviewable status: 0 of 11 files reviewed, 1 unresolved discussion (waiting on @matzf)
go/pkg/router/dataplane.go, line 80 at r1 (raw file):
Previously, matzf (Matthias Frei) wrote…
Run `make mocks` to update the mocks.
Done.
go/pkg/router/te/roundrobin.go, line 20 at r1 (raw file):
Previously, matzf (Matthias Frei) wrote…
Nit: only add the package docs once
Done.
matzf
left a comment
Also note that the packet reading can be easily scaled up by simply starting the corresponding goroutine multiple times.
This will not help very much (the overhead for the read calls remains the same), but indeed the processing can be parallelised to some extent. However, doing this naively would lead to packet reordering. Packets with the same flow id must be processed (or at least written out) in the correct order. This can be done by using the flow ID as key when distributing packets to the parallel processing units. This is, however, not sufficient when attempting to improve the situation for prioritized traffic, as an attacker could attempt to have its low priority attack traffic processed on the same processor as the high priority victim traffic, by using the same flow IDs.
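As a rough sketch of such flow-ID-keyed distribution (all names are illustrative, not the router's actual API):

```go
// Sketch only: illustrative types and names.
package sketch

import (
	"encoding/binary"
	"hash/fnv"
)

type packet struct {
	flowID uint32 // e.g. the flow ID from the SCION common header
	raw    []byte
}

// dispatch assigns the packet to a worker based on its flow ID, so all packets
// of one flow are handled by the same goroutine and keep their relative order.
// As noted above, a keyed hash (or mixing in a per-router secret) would be
// needed to make it harder for an attacker to collide with a victim's worker.
func dispatch(p packet, workers []chan packet) {
	var b [4]byte
	binary.BigEndian.PutUint32(b[:], p.flowID)
	h := fnv.New32a()
	h.Write(b[:])
	workers[int(h.Sum32()%uint32(len(workers)))] <- p
}
```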
The prioritization is maybe not perfect but still improves QoS
But that's a major caveat. The selling point of Colibri and the GMA thingy is that it's supposed to provide guarantees. Possibly I see this too much black and white, but honestly I don't think that merely improving the throughput for high priority traffic is any good at all.
"With this, an operator could enforce the throughput limits early, so that the overall guarantees can be maintained."
Can you elaborate more on this? How would an operator enforce the throughput limits?
The operator of an AS can ensure that only a certain amount of traffic can possibly reach a border router by "simply" sufficiently throttling the network at any ingress point to the network.
- The queues are only a suggestion, which should allow for future scheduling algorithms that need to differentiate between all traffic classes. When having only Colibri, Epic and "all the rest", we could for example not distinguish between BFD and SCION packets.
- The idea behind the strict priority scheduler is to have a way of demonstrating in the near future that the Colibri QoS guarantees can indeed be satisfied. In the long term, more elaborate scheduling algorithms will be needed, that for example do not schedule on a per-packet basis, but on a per-packet-size basis. Also, if we would only use the strict priority scheduler, higher priority packets could cause lower priority packets to be dropped completely. Therefore, an algorithm will be necessary that provides minimal forwarding guarantees to each traffic class.
"YAGNI".
I think we should not include more fine grained queues or more abstract or elaborate schedulers than strictly necessary for now just in anticipation of possible future extensions. Instead, the system should be obvious enough (and I think that's already the case) to make it clear where and how to add the extended logic if it ever becomes necessary.
I don't understand the point about "higher priority packets could cause lower priority packets to be dropped completely"; yes, I mean, of course, but that's kind of the point, no!? Unless we don't have the right set of traffic classes/queues, that is.
Btw. scheduling based on packet size could be problematic due to possible packet reordering within a flow -- not sure why you'd want to do this anyway, so maybe I'm just misunderstanding.
Regarding the BFD packets; I think these should best bypass the queuing and write directly to the socket.
Reviewed 11 of 11 files at r5.
Reviewable status: complete! All files reviewed, all discussions resolved (waiting on @mawyss)
mawyss
left a comment
I implemented your suggestions:
- Only keep schedulers that will be used
- Only COLIBRI and EPIC queues (all other traffic goes to the "Others" queue)
- BFD packets are not scheduled, but sent directly to the socket
"Scheduling based on packet size could be problematic due to possible packet reordering within a flow"
I am not suggesting prioritizing packets (of the same class) according to their size.
In case of congestion in all traffic classes, I would like to be able to specify that for example 80% of bandwidth (bytes per second, not packets per second) goes to Colibri traffic, 15% to EPIC, and 5% to the rest (SCION, SCMP, ...).
When the Colibri bandwidth is not fully used, the free bandwidth should be filled with EPIC traffic, but there should still be at least 5% available to other traffic.
And if at some point for example COLIBRI uses 20%, and EPIC 30% of the bandwidth, then the remaining 50% should be used for other traffic.
(This is basically what is described in the COLIBRI paper.)
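A minimal sketch of such byte-based weighted sharing, in the spirit of deficit round robin (the class/queue representation and the quantum value are assumptions for illustration, not the actual scheduler implementation):

```go
// Sketch only: assumed types and constants.
package sketch

type class struct {
	queue   [][]byte // pending packets, head at index 0
	weight  int      // relative share, e.g. 80 / 15 / 5
	deficit int      // accumulated byte credit
}

const quantum = 1500 // bytes of credit per weight unit and round (roughly one MTU)

// scheduleRound performs one pass over all classes: each backlogged class
// earns credit proportional to its weight and may send packets up to that
// credit; empty classes give up their credit, so unused bandwidth is
// effectively filled by the other classes (work conserving).
func scheduleRound(classes []*class) [][]byte {
	var out [][]byte
	for _, c := range classes {
		if len(c.queue) == 0 {
			c.deficit = 0 // nothing queued: do not hoard credit
			continue
		}
		c.deficit += c.weight * quantum
		for len(c.queue) > 0 && len(c.queue[0]) <= c.deficit {
			pkt := c.queue[0]
			c.queue = c.queue[1:]
			c.deficit -= len(pkt)
			out = append(out, pkt)
		}
	}
	return out
}
```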
This is not an issue at the moment however, because the strict scheduling algorithm does exactly what we want:
- COLIBRI reservations are prioritized, but will not take more than X percent of the bandwidth (enforced by the COLIBRI algorithm)
- The last two hops of an EPIC (more precisely EPIC-HP) packet will prioritize the EPIC packet before any SCION traffic.
In the future, with additional traffic classes such as EPIC-SAPV (source authentication + path validation based on DRKey), a more elaborate bandwidth algorithm such as described above might be necessary:
If we would strictly prioritize EPIC-SAPV before SCION traffic, SCION traffic might starve completely; this is why I suggested still allocating it a guaranteed share of the total bandwidth. Not having such a guaranteed share for SCION traffic might be problematic, because in order to use EPIC-SAPV, the DRKeys need to be fetched (using SCION path type traffic).
But yes, this is indeed not a problem for now.
Reviewable status: 5 of 11 files reviewed, all discussions resolved (waiting on @matzf)
As long as COLIBRI and EPIC are not fully implemented in this repository, this packet prioritization is not really needed.
This PR adds scheduling capabilities to the border router. For this,
the current stages read/process/write are split into two separate
processes, one to read/process and one to write, similar to what
is suggested in #4031. All packet state is pre-allocated, which
is in accordance with recent efforts in this direction (#4030).
Major changes:
- Packets are placed into per-traffic-class queues, where each queue corresponds to one traffic class (SCION, EPIC, OHP, etc.). If scheduling is disabled, only one queue will be allocated for all traffic classes together.
- A dedicated write goroutine dequeues packets from the queues and schedules them according to the selected scheduling algorithm.
- Supported algorithms are round-robin or strict priority scheduling. The implementation design should allow adding more elaborate algorithms easily in the future.
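A rough sketch of this structure, with assumed names and types rather than the actual router code:

```go
// Sketch only: all identifiers are assumptions.
package sketch

import "net"

// One queue per traffic class; with scheduling disabled, a single queue would
// be used for everything.
const (
	classColibri = iota
	classEpic
	classOthers
	numClasses
)

type scheduler interface {
	// Dequeue picks the next packet according to the configured algorithm
	// (round-robin, strict priority, ...); ok is false if all queues are empty.
	Dequeue(queues []chan []byte) (pkt []byte, ok bool)
}

// writeLoop is the single writer per connection: it drains the per-class
// queues via the scheduler and writes packets out. Blocking or backing off
// when all queues are empty is omitted from this sketch.
func writeLoop(conn net.PacketConn, dst net.Addr, queues []chan []byte, s scheduler) {
	for {
		if pkt, ok := s.Dequeue(queues); ok {
			if _, err := conn.WriteTo(pkt, dst); err != nil {
				return
			}
		}
	}
}
```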