Skip to content

Conversation

@pfalcon
Copy link
Contributor

@pfalcon pfalcon commented Aug 21, 2018

Previously (as introduced in 48fadfe), if k_poll() waited on a
queue (or subclass like fifo), and wait was cancelled on queue's
side using k_queue_cancel_wait(), k_poll returned -EINTR. But it
did not set event->state field (to anything else but
K_POLL_STATE_NOT_READY), so in case of waiting on multiple queues,
it was not possible to differentiate which of them was cancelled.

This in particular broke detection of network socket EOF conditions
in POSIX poll() implementation.

This situation is now resolved with introduction of explicit
K_POLL_STATE_CANCELLED state, which is now set for cancelled queue
(-EINTR return remains the same).

This change also elaborates docstring for the functions mentioned, to
document this behavior.

Fixes: #9032

Signed-off-by: Paul Sokolovsky paul.sokolovsky@linaro.org

@pfalcon pfalcon requested a review from Vudentz August 21, 2018 21:12
@pfalcon pfalcon requested review from jukkar and rlubos August 21, 2018 21:12
@pfalcon
Copy link
Contributor Author

pfalcon commented Aug 21, 2018

@Vudentz, some thing just can't be put to rest ;-).

Turned out, your change 48fadfe actually broke POSIX poll() implementation done on top of k_poll(). POSIX semantics is that when EOF happens (and that condition is being signaled with k_fifo_cancel_wait() translates to "read data available" (POLLIN). User tries to read, and discovers that there's no data on this socket no more (EOF). So, it's imperative that cancel_wait() leads to "active" poll state, while you in 48fadfe#diff-ec46254d7d7d1ff37f681251c352e3f8L98 set it in this case to K_POLL_STATE_NOT_READY, which is null, non-signaled state. The only hint of cancellation is -EINTR return code, but as may imagine, that doesn't differentiate among multiple fifos.

So, instead of submitting a revert of that commit and go on another spin of that vicious cycle, I did more homework. So, what you need is cancellation to have separate, explicit return code (IIRC, to differentiate of race condition between multiple k_fifo_get callers). What I need is "active" poll state. The solution is thus to introduce a new poll state, which would map to -EINTR return.

And that's what this patch does, and even documents the behavior of funcs more explicitly. Hopefully, this patch finally will put this issue to rest.

@pfalcon pfalcon requested a review from tbursztyka as a code owner August 21, 2018 21:44
@pfalcon
Copy link
Contributor Author

pfalcon commented Aug 21, 2018

@rlubos : 2nd commit here then fully fixes #9032 . Were your sockets/{echo_client,server} affected? (They use poll(), right?). Feel free to give it a try.

@codecov-io
Copy link

codecov-io commented Aug 21, 2018

Codecov Report

Merging #9556 into master will increase coverage by <.01%.
The diff coverage is 50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9556      +/-   ##
==========================================
+ Coverage   52.17%   52.17%   +<.01%     
==========================================
  Files         212      212              
  Lines       25914    25914              
  Branches     5583     5583              
==========================================
+ Hits        13520    13521       +1     
+ Misses      10145    10144       -1     
  Partials     2249     2249
Impacted Files Coverage Δ
include/kernel.h 98.52% <ø> (ø) ⬆️
kernel/poll.c 82.35% <ø> (ø) ⬆️
subsys/net/lib/sockets/sockets.c 41.56% <0%> (ø) ⬆️
kernel/queue.c 94.69% <100%> (+0.88%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f72c4c5...8c22527. Read the comment docs.

@pfalcon pfalcon added priority: high High impact/importance bug area: Kernel area: Networking area: Sockets Networking sockets labels Aug 22, 2018
@pfalcon pfalcon added this to the v1.13.0 milestone Aug 22, 2018
@rlubos
Copy link
Contributor

rlubos commented Aug 22, 2018

@pfalcon Sorry. I'm on vacation now and don't have a chance to test it. echo_client does use poll, but it's pretty straigtforward, so I wouldn't expect any suprises if other samples work fine. Thanks for looking into that btw.

@pfalcon
Copy link
Contributor Author

pfalcon commented Aug 22, 2018

@rlubos: Ack, thanks for the response, no hurries/worries. Just trying to understand how that issue could go unnoticed for so long (will definitely add echo_async to my usual set of smoke tests).

@nashif nashif requested a review from dcpleung August 22, 2018 15:43
@Vudentz
Copy link
Contributor

Vudentz commented Aug 23, 2018

@pfalcon LGTM but perhaps we should add a test for the new state then so we ensure the behaviour is not changed again. Btw, I don't see you using the new state, is the code you attempt to fix out of the tree?

@pfalcon
Copy link
Contributor Author

pfalcon commented Aug 23, 2018

perhaps we should add a test for the new state then so we ensure the behaviour is not changed again.

OK, will look into that.

Btw, I don't see you using the new state, is the code you attempt to fix out of the tree?

Nope, it just smartly used (pev->state != K_POLL_STATE_NOT_READY) condition, so don't need an update ;-) :
https://github.com/zephyrproject-rtos/zephyr/pull/9556/files#diff-9737de3d515c56bbc1c15bfec7a88651L738

@pfalcon
Copy link
Contributor Author

pfalcon commented Aug 23, 2018

@Vudentz: Indeed, there was no testcase for k_poll returning -EINTR at all. Now added, tests for both -EINTR and K_POLL_STATE_CANCELLED.

Previously (as introduced in 48fadfe), if k_poll() waited on a
queue (or subclass like fifo), and wait was cancelled on queue's
side using k_queue_cancel_wait(), k_poll returned -EINTR. But it
did not set event->state field (to anything else but
K_POLL_STATE_NOT_READY), so in case of waiting on multiple queues,
it was not possible to differentiate which of them was cancelled.

This in particular broke detection of network socket EOF conditions
in POSIX poll() implementation.

This situation is now resolved with introduction of explicit
K_POLL_STATE_CANCELLED state, which is now set for cancelled queue
(-EINTR return remains the same).

This change also elaborates docstring for the functions mentioned, to
document this behavior.

Fixes: zephyrproject-rtos#9032

Signed-off-by: Paul Sokolovsky <paul.sokolovsky@linaro.org>
In this case k_poll() returns -EINTR, while still fills in event
states.

Signed-off-by: Paul Sokolovsky <paul.sokolovsky@linaro.org>
This is similar to change which was done in 21f31e9, unfortunately
this case was missed.

Fixes: zephyrproject-rtos#9032

Signed-off-by: Paul Sokolovsky <paul.sokolovsky@linaro.org>
@pfalcon
Copy link
Contributor Author

pfalcon commented Aug 27, 2018

CI issues now fixed.

@pfalcon
Copy link
Contributor Author

pfalcon commented Aug 29, 2018

@Vudentz, ping. A test was added on your suggestion, this is ready for review. Thanks.

Copy link
Member

@dcpleung dcpleung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Copy link
Contributor

@Vudentz Vudentz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nashif nashif merged commit a2d1252 into zephyrproject-rtos:master Aug 30, 2018
@pfalcon pfalcon deleted the k_poll-cancelled branch September 6, 2018 09:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: Kernel area: Networking area: Sockets Networking sockets priority: high High impact/importance bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

net/sockets/echo_async crashes after several connections (qemu_x86)

6 participants