-
Notifications
You must be signed in to change notification settings - Fork 6.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TCP memory leak fixes #23334
TCP memory leak fixes #23334
Conversation
6c9d5c8
to
7f27337
Compare
7f27337
to
52eab57
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TCP over 6lowpan works fine.
5ce60fa
to
f25c198
Compare
Found still some corner cases when the packet was already about to be sent in tx thread and it was removed from tcp code. |
@pfalcon You had some nice stress tests for TCP, are you able to run them against this PR? |
In all fairness, I don't have further stress tests beyond those mentioned in #23246, #23302. I'm aware of tools like wrk, I also used boom because it's Python-scriptable, but for Zephyr testing I so far stick with Regarding testing this: ever since we last talked in November, I'm working on setting up automated integration tests for Zephyr networking (and not only networking) - which is of course similar to what many Intel folks do. Except the tool we need to use here, LAVA, is designed for Linux testing, and requires adding more functionality for small baremetal devices like Zephyr. Then deployment to testing lab, then figuring out issues, etc., overall, goes slow. And right now I'm working on preparing "last" round of patches for LAVA monthly release. Bottom line: I would hope to test this by the end of week. If that's too long and other folks will approve it, I'll test if after merge then. Thanks. |
f25c198
to
0f0e907
Compare
@@ -178,6 +178,13 @@ struct net_pkt { | |||
* AF_UNSPEC. | |||
*/ | |||
u8_t ppp_msg : 1; /* This is a PPP message */ | |||
|
|||
u8_t tcp_first_msg : 1; /* Is this the first time this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd find both this code comment and commit message which introduces it ("net: pkt: Add flag for telling if this is TCP 1st packet") to be unclear and confusing. To start with, we already maintain linked list of packets which were already sent (and may need to be resent) as tcp->sent_list
. So, if a packet is not in that list, it's "1st packet" (ahem to the terminology), and if it's in that list, it's "not 1st packet". So, do we need an extra flag for that? If we do, how about naming it more clear? E.g., "resent" (or "resent_pkt", or "pkt_resent"). But note that we already have flags sent_or_eof
and pkt_queued
. So, as I said, this is getting more and more confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flag tells is this the 1st time this packet has been sent. This is done in order to avoid doing one extra ref when sending it first time. It is indeed confusing, I try to clarify the documentation a bit. The sent_or_eof
flag is for another purpose, there it means whether the pkt was sent in the first place. I tried to use the sent flag for detecting whether we need to ref the pkt in relevant places but it became very convoluted so it was much easier to just add a new flag for this. I agree that there are several flags now in the system and their interaction is hairy, but it is not easy to send the packet as there are two threads involved and the interaction is difficult when we have timers that expire when the packet is not even sent yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that there are several flags now in the system and their interaction is hairy, but it is not easy to send the packet as there are two threads ...
Ok, thanks for clarification.
Would make sense to squash "net: pkt: Print caller and line when allocation fails" and "net: pkt: Print size of the failed net_buf allocation" together? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for patchset. I see that the "meat" of the fixes is in the "net: tcp: Fix memory leak when lot of incoming packets" commit, and it's hard to understand them from just speculative review. However, for other commits, I made some comments calling to increase clarity. Because otherwise we may have situation when some things are fixed, but the code gets only more convoluted (potentially obscuring further issues).
I still didn't have a chance to test it thoroughly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so I did my usual testing matrix, which involves dumb_http_server/big_http_download, qemu_x86/frdm_k64f, and don't see neither improvements nor regressions with this patch. Which I think is a good sign. My results, as usual, are in https://docs.google.com/spreadsheets/d/1_8CsACPEXqrMIbxBKxPAds091tNAwnwdWkMKr3994QY/edit#gid=0 , which provides rather sparse, but still some trajectory over the years.
So, I think this patch is a positive matter, but I don't give +1 right away, as I think there's room to improve clarity/efficiency of the code, per my earlier comments. Thanks.
Sure, I was waiting your testing results before addressing the comments. I will send a new version next. |
TCP code needs to know whether the pkt is sent the first time or is it a resent one. This information is used when deciding if the pkt ref count needs to be increased or not. The packet does not need to increase ref count when sent first time, as the ref count is already 1 when the pkt is created. But for the 2nd time the packet is sent, we will need to increase the ref count in order to avoid buffer leak. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
The code was leaking memory in TX side when there was lot of incoming packets. The reason was that the net_pkt_sent() flag was manipulated in two threads which caused races. The solution is to move the sent flag check only to tcp.c. Fixes zephyrproject-rtos#23246 Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
It is useful to know that we received ACK when debugging the TCP code. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
The RX fragment debug print does not work (compile error) if memory allocation debugging is enabled, so disable it for time being. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
It is useful to know who called the net_pkt allocator when we run out of buffers. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
If we are receiving UDP packet and if there is some error happening inside zsock_recv_dgram(), then make sure that the net_pkt received from recv_q is freed. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
This information is useful when debugging things. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
It is possible that net_pkt will disappear while we are sending it, so save link address if we need that information. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
0f0e907
to
5b8a3c9
Compare
Rebased against upstream and updated according to comments. |
Fixes #23246