Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async only exhausts handle when callback isn't removed #19764

Open
PMunch opened this issue May 5, 2022 · 6 comments
Open

Async only exhausts handle when callback isn't removed #19764

PMunch opened this issue May 5, 2022 · 6 comments
Labels
Async Everything related to Nim's async Documentation Content Related to documentation content (not generation).

Comments

@PMunch
Copy link
Contributor

PMunch commented May 5, 2022

I was trying to wrap a C library in Nims async capabilities. It all seemed to work fine, but when I tried to run more than one action at the same time it locked up. After some investigation I've concluded that it is caused by some subtle undocumented behaviour, potentially just a straight up bug. Consider the following example (only works on Linux, or other systems with async stdin):

import asyncdispatch

addTimer(5000, false, proc (fd: AsyncFD): bool =
  echo "ping"
  false
)

stdin.getFileHandle.AsyncFD.register
addRead(stdin.getFileHandle.AsyncFD, proc (fd: AsyncFD): bool =
  echo stdin.readLine
  true
)
addRead(stdin.getFileHandle.AsyncFD, proc (fd: AsyncFD): bool =
  echo stdin.readLine
  true
)

runForever()

When running the program this will happily print "ping" every five seconds. If you write something to the terminal in echos it back out, but then it stops sending the pings. What is happening here is that the second callback is blocking on trying to read stdin. The execution looks something like this:

timerCallback()
"ping"
timerCallback()
"ping"
# user writes "hello" and hits enter
readCallback1()
"hello"
readCallback2()
# This is now blocking on reading stdin

As we can see the second callback is also run when the standard input is ready for reading, even though the first handler exhausted the handle.

The related code is found in asyncdispatch:1265:1276 and reads (at the time of writing):

    # curList is the list of callbacks, newList is the new list of callbacks after this run
    var eventsExtinguished = false
    for cb in curList:
      if eventsExtinguished:
        newList.add(cb)
      elif not cb(fd):
        # Callback wants to be called again.
        newList.add(cb)
        # This callback has returned with EAGAIN, so we don't need to
        # call any other callbacks as they are all waiting for the same event
        # on the same fd.
        # We do need to ensure they are called again though.
        eventsExtinguished = true

If the callback handler returns false, meaning that it should not be run again, the rest of the callbacks are simply added to the new queue and will not be run until further reads are possible. However if the handler returns true as in the example above, meaning that the handler should not be removed it will continue processing the other callbacks. The relevant documentation for addRead is:

Be sure your callback cb returns true, if you want to remove watch of read notifications, and false, if you want to continue receiving notifications.

No mention of this exhaustion-handling at all. This should either be documented, or it should be fixed so that only one of the callbacks are ever called for a single event. At least on POSIX the selector should just immediately fire again if there is still data to be read.

@dom96
Copy link
Contributor

dom96 commented May 5, 2022

Hm, interesting how you arrived at this.

The current semantics make sense: you are telling the event loop to run two callbacks whenever the FD becomes readable. You are "extinguishing" it in the first callback by reading all the data in it, and then you're blocking your thread by calling read again when the FD is no longer readable.

What you should do instead is make your FD non-blocking, that way your second read will return EAGAIN and you can then return false from the second callback. As explained by the comment starting with "This callback has returned with EAGAIN" this is the expected logic and you shouldn't be using addRead et al with blocking FDs.

We can and should definitely document this better

@dom96 dom96 added Documentation Content Related to documentation content (not generation). Async Everything related to Nim's async labels May 5, 2022
@PMunch
Copy link
Contributor Author

PMunch commented May 6, 2022

Yeah the current semantics makes sense, if they had been documented. It wasn't immediately obvious that this is how the system worked, I just assumed that it would run either just the first one, a random one, or round-robin them, all of which should've worked fine with the way I was doing it.

The problem here is that I'm wrapping a C library and I don't read directly from the handle. When the handle can be read from I call io_process from the C library and it will run callbacks I have registered for certain events. The way I've built the wrapper is that setting up a new context registers a callback which completes registered futures. Then when I send a request I added a new reader with addRead which would call io_process and check if the future it was originally registered for had been completed or not (both before and after running io_process. If only one of the addRead callbacks had been called for each available read it would've worked fine, but with the current system that io_process will now block waiting for more data.

@dom96
Copy link
Contributor

dom96 commented May 6, 2022

Can you elaborate more on what this io_process thing does? The thing is: if your FD isn't set to be non-blocking then you always run the risk of blocking your program. This is because between the time your event loop notified you that the FD is readable and the time you do a read it might have become unreadable, you need to handle this case. I can advise more if I understand the library you are wrapping better.

@PMunch
Copy link
Contributor Author

PMunch commented May 6, 2022

The library I'm wrapping is libcoap. As you can see from the second way of handling IO listed in the description section you can select for read availability and then call coap_io_process (which I called io_process above because I strip away the coap prefix in my Futhark binding).

@dom96
Copy link
Contributor

dom96 commented May 6, 2022

What I think you want is to follow the "Method Two - epoll_wait() based on monitorable file descriptor" example. They even call coap_io_process with COAP_IO_NO_WAIT in that case, which I assume means non-blocking.

@PMunch
Copy link
Contributor Author

PMunch commented May 7, 2022

Yes, that is indeed what you'd want to do, and what I was doing. In my code I didn't have the blocking behaviour I used to demonstrate the issue above. The problem in my case was elsewhere, but it stemmed from a misunderstanding of how the callback from addRead was being treated. For the actual library I've now changed to registering the handle and assigning one common addRead callback to the file handle. When it triggers and after I call coap_io_process it then checks the number of waiting futures, if there are no more waiting callbacks it removes the callback and deregisters the file handle. This seems to work as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Async Everything related to Nim's async Documentation Content Related to documentation content (not generation).
Projects
None yet
Development

No branches or pull requests

2 participants