Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot properly close/cleanup character device ReadStream (process hangs) #15439

Closed
lfk opened this issue Sep 16, 2017 · 9 comments
Closed

Cannot properly close/cleanup character device ReadStream (process hangs) #15439

lfk opened this issue Sep 16, 2017 · 9 comments
Labels
doc Issues and PRs related to the documentations. fs Issues and PRs related to the fs subsystem / file system. good first issue Issues that are suitable for first-time contributors.

Comments

@lfk
Copy link

lfk commented Sep 16, 2017

Issue observed on two distinct systems:

  • System 1. Linux laptop

    • Version: 8.4.0
    • Platform: Arch Linux 4.11.9 (x86_64)
  • System 2. Beaglebone Black

    • Version: 6.10.3
    • Platform: Linux 4.9.41 (armv71)

I'm using fs.ReadStream to continuously read from a character device, e.g. /dev/input/event0. Receiving events works great, but on attempting to shut down the process does not exit as expected. Googling has turned up similar-sounding issues all the way back to node-0.x, but I have been unable to find a solution.

Code to reproduce (behaves the same on both aforementioned systems):

const fs = require('fs')
const stream = fs.createReadStream('/dev/input/event0')

stream
  .on('end', () => { console.log('End') })
  .on('open', fd => { console.log(`Opened (fd=${fd})`) })
  .on('close', () => { console.log('Closed') })
  .on('error', err => { console.log(`Error: ${err}`) })
  .on('data', data => { console.log(data) })

setTimeout(() => {
  console.log('Attempting to close stream')

  stream.on('close', () => {
    console.log(`Stream fd is now ${stream.fd}`)
    process.exit(0) // Hangs
  })

  stream.close()
}, 1000)

Output:

Opened (fd=9)
Attempting to close stream
Closed
Stream fd is now null
(Hung here, ^C or kill process to exit)

A few interesting (?) observations:

Explicitly obtaining the file descriptor we get identical results, unless if the .on('data', ...) listener is not attached, in which case we can exit as expected. Removing the listener in the shutdown process does not work, however, nor do we get this behavior if the fd is opened by fs.createReadStream (i.e. simply commenting out the .on('data', ...) from the first sample. Adjusted sample:

const fs = require('fs')

const fd = fs.openSync('/dev/input/event0', 'r')
const stream = fs.createReadStream('ignored', { fd: fd })

stream
  .on('end', () => { console.log('End') })
  //.on('open', fd => { console.log(`Opened (fd=${fd})`) }) // No event, as expected
  .on('close', () => { console.log('Closed') })
  .on('error', err => { console.log(`Error: ${err}`) })
  //.on('data', data => { console.log(data) }) // Exit works if commented out

setTimeout(() => {
  console.log('Attempting to close stream')

  //stream.removeAllListeners('data') // No result

  stream.on('close', () => {
    console.log(`Stream fd is now ${stream.fd}`)
    process.exit(0)
  })

  stream.close()
  //fs.closeSync(fd) // Does not trigger the `close` event
}, 1000)

This is probably not related to the problem, but I get the same result if I "trigger" an end-event by doing the following in my shutdown routine as such:

// `.push(null)` and `.read(0)` together trigger an `end` event
stream.push(null)
stream.read(0)

Is this a bug, or am I simply missing something? Cheers!

@mscdex mscdex added fs Issues and PRs related to the fs subsystem / file system. question Issues that look for answers. labels Sep 16, 2017
@lfk
Copy link
Author

lfk commented Mar 27, 2018

Some additional information in case somebody else is having similar problems.

I'm fairly confident that this happens because the data event handler remains in the node event loop (even after calling .destroy() or even explicitly unsetting all listeners). I'm by no means an expert on internals though, so please take that statement with a grain of salt.

Output from a simple test using Why Is Node Running?:

Closed

There are 3 handle(s) keeping the process running

# FSREQWRAP
(unknown stack trace)

# FSREQWRAP
/path/to/test.js:27 - stream.destroy()

# TickObject
(unknown stack trace)

Potential workaround

If an event is sent to the stream after close/destroy has been called, the process will end successfully without any errors. Thus, if you are able to simultaneously read and write from/to your stream source, you could ensure shutdown by artificially injecting some event in the .on('close')-handler. Might be a workaround for some.

A simple way to test the above is to use your keyboard or mouse as an input source, e.g. /dev/input/by-id/usb-my-mouse:

Opened (fd=9)
Attempting to close stream
Closed
Stream fd is now null
(Process hangs here)
(Move mouse)
(Process exits with code=0)

@gireeshpunathil
Copy link
Member

unable to reproduce this in macos, and with some other character devices in alpine linux too so looks like platform specific. Unable to access the said device in my Linux box, as it looks like that needs root permission.

Can you run with strace -f and see the last few lines? that will give some clues as to why the process hangs.

@lfk
Copy link
Author

lfk commented Apr 30, 2018

Certainly. Here's a log as requested, starting just where we do the stream.on('close') logging:

[pid   423] write(9, "Stream fd is now null\n", 22Stream fd is now null) = 22
[pid   423] futex(0x21d172c, FUTEX_WAKE_PRIVATE, 1) = 1
[pid   432] <... futex resumed> )       = 0
[pid   423] futex(0x21d16c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid   432] futex(0x21d16c0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid   432] futex(0x21d172c, FUTEX_WAKE_PRIVATE, 1) = 1
[pid   429] <... futex resumed> )       = 0
[pid   423] <... futex resumed> )       = 0
[pid   432] futex(0x21d16c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid   423] futex(0x7f3e456ce9d0, FUTEX_WAIT, 429, NULL <unfinished ...>
[pid   432] <... futex resumed> )       = 0
[pid   429] futex(0x21d172c, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid   432] madvise(0x7f3e36ffe000, 8368128, MADV_DONTNEED <unfinished ...>
[pid   431] <... futex resumed> )       = 0
[pid   429] <... futex resumed> )       = 1
[pid   432] <... madvise resumed> )     = 0
[pid   431] futex(0x21d16c0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid   432] exit(0 <unfinished ...>
[pid   431] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
[pid   432] <... exit resumed>)         = ?
[pid   431] futex(0x21d16c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid   429] futex(0x21d16c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid   432] +++ exited with 0 +++
[pid   431] <... futex resumed> )       = 0
[pid   429] <... futex resumed> )       = 0
[pid   431] madvise(0x7f3e377ff000, 8368128, MADV_DONTNEED <unfinished ...>
[pid   429] madvise(0x7f3e44ece000, 8368128, MADV_DONTNEED <unfinished ...>
[pid   431] <... madvise resumed> )     = 0
[pid   429] <... madvise resumed> )     = 0
[pid   431] exit(0 <unfinished ...>
[pid   429] exit(0 <unfinished ...>
[pid   431] <... exit resumed>)         = ?
[pid   429] <... exit resumed>)         = ?
[pid   431] +++ exited with 0 +++
[pid   429] +++ exited with 0 +++
[pid   423] <... futex resumed> )       = 0
[pid   423] futex(0x7f3e44ecd9d0, FUTEX_WAIT, 430, NULL

// Point where it hangs until ^C
^C) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
strace: Process n detached (7 processes detatched)

Thanks for looking into this!

@gireeshpunathil
Copy link
Member

thanks @lfk for the data, and sorry! actually I missed your updates among the github notification flood.

minimal reproduce:

$ cat 15439.js

const fs = require('fs')
const stream = fs.createReadStream('/dev/input/event0')
setTimeout(() => {
  stream.close()
}, process.argv[2] / 1)

I have tried with /dev/random , as I don't have access to the said device in my ststem.
with /dev/random, the code comes out after random intervals (apparently when the device reeives some random data at random intervals)

My theory: As the file system work is carried out by internal worker thread, and it has initiated a blocking read from the device, and the data won't come, the main thread is unable to move ahead - it is held up in the polling for an async event that has to come from the said worker.

If I adjust (reduce) the timeout I observe race condition - process exits sometimes, else hangs.

I believe the race is between the read thread and the close thread. If the close happens first, read is abandoned. If the read is initiated first, nothing else matters.

/cc @nodejs/fs @nodejs/libuv

@gireeshpunathil
Copy link
Member

A C reproduce, I wasn't successful with a libuv recreate.

$ cat 15439.c

#include <signal.h>
#include <unistd.h>
#include <fcntl.h>
#include <pthread.h>

int fd;

void *worker(void *data) {
  char buf[65536];
  read(fd, buf, 65536);
  read(fd, buf, 65536);
  return NULL;
}

int main() {
  pthread_t tid;
  fd = open("/dev/random", O_RDONLY);
  pthread_create(&tid, NULL, worker, NULL);
  sleep(1);
  close(fd);
  pthread_join(tid, NULL);
  return 0;
}
  • the multiple reads is to accommodate the flaky-ness of /dev/random. If you replace it with /dev/input/event0 then only one read is required.
  • /dev/input/event0 is operable only by root.
  • sleep(1) holds the key. If you adjust that, race will be seen.

The behavior is obvious, as the reader thread has entered the kernel and is oblivious to the closing of fd in main thread.

  • tcp streams do not have this issue as they are always non blocking.
  • regular fs streams do not have this issue as the disk read would always yield something, though with a latency that is addressed by internal worker threads.
  • this is a hybrid stream that uses a blocking device underneath but want to expose itself as an asynchronous stream abstraction.

Fixing this at its core will be bit involving, as it would need detecting such cases and interrupting worker threads.

Given that it is a corner case, I recommend to document this as a limitation.

@bnoordhuis bnoordhuis added doc Issues and PRs related to the documentations. and removed question Issues that look for answers. labels May 22, 2018
@bnoordhuis
Copy link
Member

This is not something libuv would (or could) fix but documenting it is a good way forward. Labels updated.

Aside: fs.createReadStream() is the wrong tool for this. Unless it's a /dev device that doesn't support polling you should use new net.Socket({ fd: ... }).

@gireeshpunathil gireeshpunathil added good first issue Issues that are suitable for first-time contributors. mentor-available labels May 23, 2018
@lfk
Copy link
Author

lfk commented May 28, 2018

Thank you again @gireeshpunathil, really appreciate that you took the time to look into this and provide such a detailed breakdown.

Just for the sake of completeness, in my case the net.Socket({ fd: ... }) approach doesn't work, it yields a TypeError: Unsupported fd type: FILE. But as mentioned earlier, I've got a viable workaround.

@gireeshpunathil
Copy link
Member

@lfk - sure. Do you want to attempt a documentation change to this effect and make it a PR? I guess you can best explain the limitation being the user who experienced it. Let me know!

gireeshpunathil added a commit to gireeshpunathil/node that referenced this issue Aug 14, 2018
charcter device streaming works just like any other streams, but hangs
on the close callsite due to the worker thread being blocked on the read
and main thread waiting for any async event that may not occur.

Document this behavior and suggest a potential workaround.

Fixes: nodejs#15439
addaleax pushed a commit that referenced this issue Aug 28, 2018
charcter device streaming works just like any other streams, but hangs
on the close callsite due to the worker thread being blocked on the read
and main thread waiting for any async event that may not occur.

Document this behavior and suggest a potential workaround.

Fixes: #15439
PR-URL: #21212

Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
targos pushed a commit that referenced this issue Sep 3, 2018
charcter device streaming works just like any other streams, but hangs
on the close callsite due to the worker thread being blocked on the read
and main thread waiting for any async event that may not occur.

Document this behavior and suggest a potential workaround.

Fixes: #15439
PR-URL: #21212

Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
targos pushed a commit that referenced this issue Sep 6, 2018
charcter device streaming works just like any other streams, but hangs
on the close callsite due to the worker thread being blocked on the read
and main thread waiting for any async event that may not occur.

Document this behavior and suggest a potential workaround.

Fixes: #15439
PR-URL: #21212

Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
@kuba-orlik
Copy link

The workaround in the docs did not help me, but I managed to get it to work with:

this.stream = spawn("cat", [character_device_path]).stdout;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Issues and PRs related to the documentations. fs Issues and PRs related to the fs subsystem / file system. good first issue Issues that are suitable for first-time contributors.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants