Add RACK to SCTP #390
Conversation
Codecov Report
❌ Patch coverage is
Additional details and impacted files:

@@            Coverage Diff             @@
##           master     #390      +/-   ##
==========================================
+ Coverage   81.85%   82.32%   +0.46%
==========================================
  Files          51       51
  Lines        3439     3717     +278
==========================================
+ Hits         2815     3060     +245
- Misses        482      498      +16
- Partials      142      159      +17
Force-pushed from 3709326 to 877b27d
@enobufs Hi, I believe you've previously tried to implement RACK for SCTP. Do you have any pointers on whether my current work looks like it's on the right track, or any notes/comments on things you had in mind? If you have a branch from earlier work that I could use as a reference, that would also be very appreciated!
Force-pushed from eb3bd88 to 686e5ec
To test this, I used this branch's SCTP version in the following examples:
Each of the examples (with the listed versions) behaved as expected, so I think it's looking good so far :)
@philipch07 this is beyond amazing. You are doing something phenomenal and making it look easy :) Anything I can do to help? In the back of your head, think about writing a blog post.
Force-pushed from 6c3e8ba to 3914804
Thank you @Sean-Der! I think the only thing I need right now for this PR is more review and testing (ideally in real applications with some latency). I'm excited but also cautious, since this seems to be a pretty desired feature! The next thing I have in mind for RACK is #394 (adaptive burst mitigation), which only further improves RACK's performance. I hope the two features will make for a compelling upgrade for users :) I'm also brainstorming ideas for the blog post you mentioned, but it may end up tied more closely to open-source contributing as a whole rather than just RACK (especially since this type of feature is a bit less flashy). With some time and luck, I think I can create an interesting narrative about my motives, interests, and journey through the contributions I've made so far!
Also, I performed a short test with Linux's traffic control in my WSL setup to simulate 500ms of latency on outbound traffic (so anything going from server -> client was delayed), using the data-channels example from webrtc (latest version, v4.1.6). It didn't drop any messages in either direction, even when I sent a burst of messages to the server (all of which arrived instantly) and when I sent intermittent messages to the server.
Force-pushed from 3914804 to cc4e9d0
@philipch07 Could you test this with https://github.com/pion/example-webrtc-applications/tree/master/ebiten-game |
Also, this is off topic, but @philipch07, are you in the Discord server? I'd like to talk to you more casually about SCTP/datachannel stuff.
This is the flashiest thing that could happen. If you can put up a graph that shows 'Under these network conditions, file transfers happen X times faster' and call out projects that use DataChannels + Go (WebTorrent and Tor Snowflake), that is a huge thing. I would love to introduce you to companies with a flashy intro of 'This is the person that made WebRTC DataChannels n% faster'. This, and the ICE renomination that @kevmo314 is working on, have the biggest chance of instantly making people's WebRTC sessions better with no effort on their side.
Just finished testing this and confirmed that this works! I didn't test it with added latency though.
That's a fantastic idea; it makes me think of creating some sort of visual simulation to show the impact of a RACK vs. non-RACK implementation in practice!
That makes sense to me, and I agree that "free" performance with zero changes or settings toggled is always a win for users :)
Whoops, forgot to tag @Sean-Der in the above comment^
Chiming in to say that this is really cool! The paper is really interesting; so cool to see this in pion.
This is really good. I reviewed it over several days, and there are no obvious issues in the code logic.
Really excited to see the benchmark.
Thank you a lot <3
Force-pushed from 8fc1911 to d7f2b58
Looks great to have better retransmission control!
I'm still working through the original TCP-RACK algorithm and the SCTP-RACK paper, and I'm not done yet.
I added a comment based on TCP-RACK's RFC.
```go
srtt := a.rtoMgr.setNewRTT(rtt)
a.srtt.Store(srtt)

// RACK (NOT RFC 4960): track minRTT and latest delivered *original* send time
```
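For context, the state tracked above feeds RACK's loss-detection rule. A minimal sketch of that rule, paraphrased from the RFC 8985 §6.2 pseudocode; all names here are illustrative, not pion/sctp's actual fields:

```go
package rack

import "time"

// rackState mirrors the per-association bookkeeping RFC 8985 calls for:
// the original send time and RTT of the most recently delivered segment,
// plus the current reordering window.
type rackState struct {
	xmitTime time.Time     // send time of the most recently delivered chunk
	rtt      time.Duration // RTT measured for that delivery
	reoWnd   time.Duration // reordering window
}

// lost reports whether an outstanding chunk, originally sent at sentAt,
// should be marked lost at time now.
func (r rackState) lost(sentAt, now time.Time) bool {
	// Only chunks sent no later than the most recently delivered chunk
	// are loss candidates.
	if sentAt.After(r.xmitTime) {
		return false
	}
	// Wait rtt + reoWnd past the original send time before declaring
	// loss, so that plain reordering isn't mistaken for loss.
	return !now.Before(sentAt.Add(r.rtt + r.reoWnd))
}
```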
RFC 8985 (TCP RACK) says:
"The sender SHOULD track a windowed min-filtered estimate of recent RTT measurements that can adapt when migrating to significantly longer paths rather than tracking a simple global minimum of all RTT measurements."
However, the current implementation seems to track the global minimum RTT. Could that cause a problem when the network delay changes dynamically (e.g., when the delay increases after the minimum RTT has been measured)?
I was thinking about this for a while (when I wasn't able to test it), and if my current understanding is correct, then I believe that in the scenario you described:
- Start with a low network delay
- Send something (file, msg, etc.)
- Increase network delay
- Monitor output
Then it would lead to an over-aggressive rate of retransmission, which could cause a high amount of traffic.
I just finished some really basic testing for this and ended up with some logs. You can find the logging/printing approach in this sctp branch: https://github.com/pion/sctp/commits/pch07/test-rack-logs/ and my logs from the data-channel example in this webrtc branch: https://github.com/pion/webrtc/commits/pch07/test-sctp-rack/.
I think the current behavior is likely problematic. Thanks for catching it, and let me know if you find anything else! I'm going to try to come up with a way to properly calculate the minRtt with a windowed approach, as RFC 8985 suggests.
(Also thank you for reviewing!!)
@JoeTurki helped write a test going from low RTT -> high RTT (012d2b4), which worked perfectly with the global min, but going from high RTT -> low RTT was very broken with the global min.
My commit 244b97f has fixed this, as confirmed by @JoeTurki: #390 (comment).
Thank you for catching this! My approach was to use a monotonic deque to handle the window, with a default timeout of 30s (configurable). I tried to find what timeout other implementations use but couldn't readily find one. If you or anyone else has ideas for what the default should be, please let me know!!
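For the curious, here is a minimal sketch of the windowed min-RTT idea described above: a monotonic deque of (rtt, timestamp) pairs where samples expire after the window (30s here, matching the default mentioned). Type and method names are illustrative, not pion/sctp's actual implementation:

```go
package rack

import "time"

type rttSample struct {
	rtt time.Duration
	at  time.Time
}

// minRTTWindow keeps a time-windowed minimum of RTT samples using a
// monotonic deque: RTTs are strictly increasing from front to back, so
// the front is always the current window minimum.
type minRTTWindow struct {
	window  time.Duration // e.g. 30 * time.Second
	samples []rttSample
}

// Add records a new RTT measurement taken at time now.
func (w *minRTTWindow) Add(rtt time.Duration, now time.Time) {
	w.expire(now)
	// Pop back entries >= the new sample: they can never be the minimum
	// again, because the new sample is both smaller and fresher.
	for n := len(w.samples); n > 0 && w.samples[n-1].rtt >= rtt; n = len(w.samples) {
		w.samples = w.samples[:n-1]
	}
	w.samples = append(w.samples, rttSample{rtt: rtt, at: now})
}

// Min returns the smallest RTT observed within the window, if any.
func (w *minRTTWindow) Min(now time.Time) (time.Duration, bool) {
	w.expire(now)
	if len(w.samples) == 0 {
		return 0, false
	}
	return w.samples[0].rtt, true
}

// expire drops samples older than the window; appends happen in time
// order, so the oldest remaining sample is always at the front.
func (w *minRTTWindow) expire(now time.Time) {
	for len(w.samples) > 0 && now.Sub(w.samples[0].at) > w.window {
		w.samples = w.samples[1:]
	}
}
```

Both Add and Min are O(1) amortized, which keeps the filter cheap enough to run on every RTT measurement.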
association.go (outdated)
```go
	a.rackTimer.Stop()
}

a.rackTimer = time.AfterFunc(dur, func() {
```
I'm a little curious how frequently the timers are created and how much that affects performance.
I couldn't get pprof working on my setup, but from looking at the source with @JoeTurki, AfterFunc seems pretty bad, and the timer approach can be implemented better. I will replace this with goroutines and channels. Thanks for pointing this out!
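A minimal sketch of the goroutine-and-channel pattern described here: one long-lived goroutine owns a single time.Timer and rearms it with Reset, instead of allocating a fresh timer via time.AfterFunc on every update. The names (rackTimerLoop, resetCh, onTimeout) are hypothetical, not the actual pion/sctp API:

```go
package rack

import "time"

// rackTimerLoop owns one reusable timer. Send a duration on resetCh to
// (re)arm it; close stopCh to shut the loop down. onTimeout runs on this
// goroutine each time the timer fires.
func rackTimerLoop(resetCh <-chan time.Duration, stopCh <-chan struct{}, onTimeout func()) {
	timer := time.NewTimer(time.Hour)
	if !timer.Stop() {
		<-timer.C // drain in the unlikely case it already fired
	}
	for {
		select {
		case d := <-resetCh:
			// Reuse the same timer: stop it, drain a pending fire if the
			// loop hasn't consumed it yet, then rearm.
			if !timer.Stop() {
				select {
				case <-timer.C:
				default:
				}
			}
			timer.Reset(d)
		case <-timer.C:
			onTimeout()
		case <-stopCh:
			timer.Stop()
			return
		}
	}
}
```

As of Go 1.23 the drain dance is no longer required, but it keeps the pattern safe on older runtimes.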
I forgot to comment that I fixed this in my latest commit: bb8671d. Thanks for finding this, and let me know what you think!
I just found a slightly expanded version of the paper that we've all been looking at 🤦. Here it is: https://duepublico2.uni-due.de/servlets/MCRFileNodeServlet/duepublico_derivate_00073893/Diss_Weinrank.pdf#page=97 From what I can tell, the core information is mostly the same, but it has a much-expanded section 5. I've updated the PR description to include it as well.
@philipch07 nice work! I was busy earlier this week and did not have time to look at it yet. I am reading the paper now.
@philipch07 Hello, I added a test for the condition you asked me to test in Discord DMs (high RTT -> low RTT). After you rebase, you can un-comment this part: Lines 944 to 957 in 012d2b4.
Your implementation handles it very well.
I'll also test the other way around.
Force-pushed from d7f2b58 to 63ec7e1
Going from low RTT to high RTT caused packets to drop, but that's fixed after commit 244b97f. It needed to go to extreme values though, and it was more of a manual test. I'll try to get it running as a CI test, but it's very hard to write one that isn't flaky and finishes in under ~20s. Thank you.
Also, there are many network conditions that make the current SCTP implementation drop packets, but this implementation handles them very well. Really amazing work.
Force-pushed from 83a3b73 to bb8671d
Description
Adds RACK to SCTP (see the paper that this is based on here: https://icnp20.cs.ucr.edu/proceedings/nipaa/RACK%20for%20SCTP.pdf)
I would also like to add the variable burst mitigation described in the paper above (section 5a), but that will be done in a separate PR, since it doesn't require RACK to be implemented.
Reference issue
Closes #389
Update: expanded version of the paper above: https://duepublico2.uni-due.de/servlets/MCRFileNodeServlet/duepublico_derivate_00073893/Diss_Weinrank.pdf#page=97