Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle out-of-buf cases gracefully in net #9118

Merged
merged 4 commits into from
Aug 14, 2018

Conversation

jukkar
Copy link
Member

@jukkar jukkar commented Jul 25, 2018

This removes K_FOREVER calls from network buffer allocations so it is not possible to "hang" IP stack because of no network buffers available. Tested with echo-server running in frdm-k64f and flooding it from Linux using net-tools echo-client program. The system did not manage to receive much packets but it did not hang.

@jukkar jukkar added area: Networking DNM This PR should not be merged (Do Not Merge) labels Jul 25, 2018
@codecov-io
Copy link

codecov-io commented Jul 25, 2018

Codecov Report

Merging #9118 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9118      +/-   ##
==========================================
- Coverage   52.53%   52.52%   -0.01%     
==========================================
  Files         212      201      -11     
  Lines       25821    25359     -462     
  Branches     5428     5299     -129     
==========================================
- Hits        13565    13320     -245     
+ Misses      10071     9944     -127     
+ Partials     2185     2095      -90
Impacted Files Coverage Δ
drivers/ethernet/eth_native_posix_adapt.c 20.4% <0%> (-19.22%) ⬇️
lib/posix/pthread_key.c 75.92% <0%> (-5.56%) ⬇️
subsys/net/ip/icmpv6.c 30.31% <0%> (-4.24%) ⬇️
kernel/mempool.c 84.61% <0%> (-3.85%) ⬇️
kernel/alert.c 96.42% <0%> (-3.58%) ⬇️
include/misc/slist.h 86.66% <0%> (-3.34%) ⬇️
subsys/net/ip/icmpv4.c 35.03% <0%> (-3.31%) ⬇️
drivers/ethernet/eth_native_posix.c 20.86% <0%> (-3.14%) ⬇️
kernel/stack.c 92.3% <0%> (-1.93%) ⬇️
kernel/mem_slab.c 98.07% <0%> (-1.93%) ⬇️
... and 35 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4b9d980...f236d18. Read the comment docs.

@jukkar
Copy link
Member Author

jukkar commented Jul 26, 2018

New version with L2 and network lib directories handled

@jukkar jukkar removed the DNM This PR should not be merged (Do Not Merge) label Jul 26, 2018
@jukkar
Copy link
Member Author

jukkar commented Jul 27, 2018

recheck

if (!net_pkt_append_##type##_timeout(pkt, value, \
BUF_ALLOC_TIMEOUT)) { \
ret = -ENOMEM; \
goto drop; \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That goto is a bit of a trap as it is encoded in the #define. As the code is specific to this file, instead of a goto drop cleanup could be performed. Or the caller could free all allocations on error. In this code it is simple, all drops just return the error, so a return -ENOMEM can be safely returned here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to make the define to do "the right thing" meaning that all the functions would have error return at the end of the function. The defines are there so that the code looks a bit nicer and we do not have so many exposed error checks in the middle of the functions. I am not really sure I follow how would you like to see the code to look like in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on @jukkar's side here, in the sense that I find such code acceptable and "attempting-to-do-better". It's not just a single case (like dns/llmnr_responder.c to which @pfl originally commented), but a pattern reused in different files. It's effectively a poorman's exception handling handling, implemented in C.

Well, I can see @pfl's point too, that for someone not familiar with the code (or to someone who didn't look at the code for a couple of months ;-)), it may be confusing - after all, C is the language without exception handling, and it may be not immediately obvious that some functions/macros have a "side effect" of "throwing" an exception (goto'ing drop in this case). A compromise may be naming this macros e.g. as "append_or_drop()".

@jukkar
Copy link
Member Author

jukkar commented Aug 6, 2018

New version that fixes merge conflict.

@@ -134,7 +134,8 @@ int net_ipv6_send_na(struct net_if *iface, const struct in6_addr *src,
* @param iface Network interface
* @param next_header_proto Protocol type of the next header after IPv6 header.
*
* @return Return network packet that contains the IPv6 packet.
* @return Return network packet that contains the IPv6 packet or NULL if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you edit this description, maybe makes sense to make it less cryptic at the same time. If this function returns its input "pkt" param, makes sense to say so, and then in parens add "(which now contains IPv6 header)".

@jukkar
Copy link
Member Author

jukkar commented Aug 7, 2018

Fixed compile error and clarified the function documentation in ipv6.h

@pfalcon
Copy link
Contributor

pfalcon commented Aug 7, 2018

This patch definitely moves in the right direction, though I wonder if it may have not immediately obvious outcomes (i.e. further refactors/changes may be required as follow-ups). Before +1'ing this, I'd like to test this with my usual smoke tests (which may take some time, as I'm just back from vacation and going thru backlog of things happened during this time).

@jukkar
Copy link
Member Author

jukkar commented Aug 8, 2018

rebased to latest master

@pfalcon
Copy link
Contributor

pfalcon commented Aug 9, 2018

I tested this on master 3712fd3 against issues described in #7831 and #3132, it doesn't help with it - there's a still imminent request processing delay after processing some (~100) number of requests.

@pfalcon
Copy link
Contributor

pfalcon commented Aug 9, 2018

I then proceeded to test with my 2nd standard testcase - samples/net/sockets/big_http_download.

With both dumb_http_server and big_http_download samples, I didn't see any objective regressions. And that's the whole point. I didn't see any improvements either on the issues I tested for, and that only proves that we have many different issues. I think we should merge this (because it's clearly a paradigmatic improvement), and move on to fight next issues.

@pfalcon
Copy link
Contributor

pfalcon commented Aug 9, 2018

@pfl: Please review discussion of the issues you raised in #9118 (comment) , and see if you like any suggestions there.

@rveerama1
Copy link
Collaborator

Just a side note, we still need to modify net_pkt_write_xx() helper functions. Those are still using K_FOREVER.

@jukkar
Copy link
Member Author

jukkar commented Aug 10, 2018

Changed net_pkt_write_xx() and net_pkt_insert_xx() to use the timeout.

@jukkar
Copy link
Member Author

jukkar commented Aug 10, 2018

Fixed compile error in rpl.c

@jukkar
Copy link
Member Author

jukkar commented Aug 10, 2018

@pfl ping, can you re-visit this one?

Instead of waiting forever for a network buffer, have a timeout
when allocating net_buf. This way we cannot left hanging for a
long time waiting for a buffer and possibly deadlock the system.
This commit only adds checks to core IP stack in subsys/net/ip

Fixes zephyrproject-rtos#7571

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Instead of waiting forever for a network buffer, have a timeout
when allocating net_buf. This way we cannot left hanging for a
long time waiting for a buffer and possibly deadlock the system.
This commit adds checks to L2 and network support libraries.

Fixes zephyrproject-rtos#7571

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Do not wait forever for network buffers when receiving data.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Do not wait forever for network buffers when receiving data.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
@jukkar
Copy link
Member Author

jukkar commented Aug 14, 2018

Resolved merge conflict.

@jukkar jukkar merged commit 3832378 into zephyrproject-rtos:master Aug 14, 2018
@jukkar jukkar deleted the bug-7571-buf-overload branch August 14, 2018 09:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants