-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix: AsyncEventSource writes multiple events per tcp send, including partial events that straddle buffers; Improvement: don't hold onto event items until ack, immediately remove them from queue #41
Conversation
Interesting! I will cherry-pick your changes to the fork we maintain at https://github.com/mathieucarbou/ESPAsyncWebServer Is the PR final ? Thanks! |
…ment: don't hold onto event items until ack, immediately remove them from queue Ref: esphome#41
edit: let me address the two self-comments and the code |
07a5024
to
5f22ece
Compare
…utilize the buffer, stop needlessly holding onto events in queue until ack
5f22ece
to
352554c
Compare
@mathieucarbou Ready for cherry-pick. |
@nkinnan : before applying your fix, I ran some tests on the fork we actively maintain (based on https://github.com/yubox-node-org/ESPAsyncWebServer). It has received a lot of bug fixes that the ESPHome fork does not have, including some in the AsyncTCP lib (we also maintain it). Running our SSE example with a delta of 1 ms (one event sent per millis), Chrome receives about 1 message out of 2-4: I get some console log errors such as:
because of course the loop is exhausting the queue. Setting delta to 5 ms, I do not have any messages discarded. Which means running in AP mode, Chrome connected to a bare ESP32, I can send at a rate of 1 (short) message each 5 ms without loosing some. Is it possible that the delay you observed is caused by the communication time in STA mode ? |
…ng partial events that straddle buffers; Improvement: don't hold onto event items until ack, immediately remove them from queue Copy of esphome#41
…ng partial events that straddle buffers; Improvement: don't hold onto event items until ack, immediately remove them from queue Copy of esphome#41
I've re-created your PR in our fork (mathieucarbou#92). |
These factors in combination probably explain why you saw better performance than I did without this PR, but then the PR still did improve things further. I will definitely take a look at your forks and see if there's anything we want to pull over here, thanks for the heads up! :) Edit: 4) The TCP library used on ESP32 class chips is more advanced and capable, see the update to the PR description for details. |
Point 3 is why my source branch is named async_event_source_yield by the way - I originally thought that was the problem and added yield's all over the place to see if it fixed the throughput issue :) Eventually I found the true root cause but never switched to a new branch name. |
Bugfix: AsyncEventSource writes multiple events per tcp send, including partial events that straddle buffers
On line 148 in the diff, client->send() was being called before additional events could write into the tcp client send buffer. Once send is initiated, client->canSend() will return false preventing more events from being written. (edit: on the 8266, see update section for esp32 behavior) This resulted in only a single event per packet to be sent.
Because an ack must be received before canSend will become true again (the buffer must be held onto in case a tcp retry is needed) this resulted in (observed with WireShark) only one event every 70-80ms -- ESP-01: 20ms to send another packet after ack received, windows: 40-50ms to send an ack back after receiving a packet due to the equivalent of 'nagle algorithm' for acks on the receive side
The call to _client->send(); is now on line 267 after as many events have been written to the client tcp send buffer as will fit (including partial events to fully maximize the throughput). This improves event-throughput-per-ack by as much as 10-15x in some cases but probably around 5x in general (it varies with event size) based on the log output I was observing during testing. (edit: on the 8266, see update section for esp32 behavior)
I call this a bugfix since clearly the code was trying to add multiple events per send, that appears to be the original author's intent, it was just failing to do so because client->send() was called after the first event was written and that locked the tcp send buffer against further writes. (edit: on the 8266, see update section for esp32 behavior)
Improvement: don't hold onto event items until ack, immediately remove them from queue once (fully) written into the tcp send buffer
This code was holding onto the queued events until an ack came back, even after they were written to the tcp send buffer. There is no need to do so, the tcp send buffer is retained by the client for retries (why things were stalling out in the first place) so this just results in the data being duplicated in memory needlessly. Removing them from the queue immediately after write to the client tcp send buffer also makes those queue slots available for new incoming events, potentially reducing the number of dropped events during event bursts, or just lowering memory usage otherwise.
Practical effects for esphome:
And just in general:
Update for ESP32 chips:
I'm now looking at an ESP-C3 on wireshark and can see that the different TCP library used on ESP32s vs ESP8266s does allow the AsyncEventSource to write multiple events to the tcp send buffer even after a send has been initiated. This more advanced TCP stack will simply send additional packets out, processing the send-buffer piecewise as it is written. The act of sending packets incrementally as each event is written will actually reach the receiving side's threshold for sending an ACK (or does on Windows anyway), even with delayed ack in effect. This additional capability and coincidental effect on ack frequency is probably why no-one noticed this undesirable behavior earlier. (Plus performance issues on an 8266 are likely to be blamed on the slower core and dismissed.)
The change in this PR is still an improvement on the ESP32 stack, there is still an increase in event throughput, more efficient use of the full length of the send buffer, less CPU usage due to the batching, and less memory usage due to discarding queue items early, but the improvement will be less noticeable since the improved behavior and capabilities of the new TCP stack mitigated the implementation flaws in AsyncEventSource to a large degree - making it only display truly degenerate behavior on the 8266 class chips.
Additionally, on the 32, since the send buffer is larger than the MTU of ~1400, in case of high event load two packets will still be sent, thus still bypassing delayed ack. Even in case of lower event load, additional events will still generate additional packets (on addition loop()s) with the upgraded tcp library's ability to process the send buffer incrementally, thus also bypassing delayed ack. So this PR won't cause any regression in behavior in that regard due to the "batching effect".
I haven't dug into the TCP libraries themselves to see if this capability to incrementally process the send buffer and send additional packets as new writes happen could be added to the older stack used on the 8266. The improvement in this PR is probably "good enough" in working around the limitation, the 8266 isn't recommended for new designs at this point, and going that low into the network stack carries higher risk of regression while this change is relatively straightforward.