Windows: unable to interrupt blocking hid_read call #133
I don't think it is safe to assume that. Any client code has to stop any read/write attempts before calling `hid_close`. What am I missing here?
Let's say you have two threads (both using some kind of wrapper to make sure that access to hidapi is synchronized via a mutex or something else). The first one is doing a blocking read operation which puts the whole thread to sleep, and the second one is trying to abort this read. The only way to abort it is by closing the device.

P.S. Pardon the grammar, just woke up.
Here is pseudo-code that works without issues and is flexible for various use-cases:

```c
// thread1:
while (!interrupted) {
    result = hid_read_timeout(dev, buffer, buffer_size, 1000);
    if (result == 0)   // no data available
        continue;
    if (result > 0) {
        // handle data here
    } else {
        // handle/report error, retry, etc.
    }
}

// thread2:
interrupted = true;
thread1.join();
// now we're sure no other threads are trying to access the device
// (assuming there are no threads except thread1)
hid_close(dev);
```

Such usage conforms to my previous statement that client code should first stop any read/write attempts before trying to close the device.
@todbot I see that there is a good bit of discussion under node-hid. I'm not sure what the node-hid implementation/use-case is and don't want to misunderstand the discussion; I'd very much like to hear your input on this. My point about tuning the read timeout still stands.
Hi @Youw, I believe I'm in agreement with you now, after researching things a bit more and doing some tests. Thank you @FrogTheFrog for helping me understand. Node.js is usually single-threaded, so the "read on a closed handle" issue doesn't come up much. On my main use cases, macOS and Linux hidraw, this issue seemingly never came up; I guess both of those implementations are more forgiving.
I believe it cannot deal with this gracefully, by design. I've checked both the hidraw and macOS implementations: there is always a race condition when the device is closed while a read is in progress. The fact that you don't hit any issues is just you being lucky; it doesn't mean it is well-defined behavior. Instead, I'd rather add an explicit check into each function and fail fast on a dead handle. What we do need here is a proper solution.
Agreed, I think I was getting lucky (or at least the underlying system calls on macOS and hidraw are more tolerant of bad usage). And I might put that hard-crash check into a local copy of hidapi.
Similar discussion on libusb. From Chris in libusb/libusb#297
I found the same while looking at this: #650

Suggestion: why not state that hid_close() is thread-safe to call? There are plenty of synchronization APIs on all supported platforms to ensure this; what supported system does not have one? E.g. on Windows we can just add a simple critical section which is entered during hid_read_timeout() (and write), and released when calling GetOverlappedResult(). hid_close() then enters the same critical section, and there is no potential race.

@Youw's comment that the current solution is not safe is true (and relies on luck :-)), but forcing people to use short timeouts to work around an unsafe API is not cool either. On a tight system, I expect to be able to call read() with a timeout of forever without any problems, as this is the most efficient implementation. When using hidapi, as a user I expect hid_close() to be thread-safe, just like when calling close() on a file. Anything else, even if documented, would be illogical to a typical application programmer.
I made a quick patch for win32 to show this concept. This also has the side benefit of making the entire library thread-safe (for win32, for now). It probably needs a bit more work, but I am fairly sure this will work.
That suggestion has the same problem as the initial one. How would you know you had something pending if you did not use any OS synchronization? You cannot, for the same reason you pointed out to begin with. Trying to "guess" whether something was pending would again rely on being "lucky", and would be just as broken as the mechanism you are trying to fix. There is only one way to make it air-tight, and that is by implementing some kind of synchronization mechanism. Whether you then use that mechanism to accept or reject the operation is up to you. I would argue that if synchronization is implemented anyway, why not just upgrade the library to be thread-safe?
So is the case with your suggestion. It is technically not possible to have a fully thread-safe close here.

As for 169c524: right, so you're forcing a non-free overhead on all users who don't need it at all.
This is the solution: #146
There is no "not safe solution". There is either a solution, or no solution at all / undefined behavior. The current behavior, where things just hang, is a good thing: it gives you the ability to catch the problem early and implement a stable workaround, instead of having a nightmare catching rarely-occurring undefined behavior and crashes later, when your product hits thousands of users in the world.
I am not sure I understand the logic here. So spending a few cycles on efficient locking is a problem? It will not be visible in any real-life application, as locking is one of the most efficient OS operations. But constantly wasting millions of instructions by using a short timeout to get around a design issue is considered a good thing? That, on the other hand, is very visible in real-life applications.
Nope. Same problem again. How will you define when it is safe to call hid_interrupt_read()? |
Please-please-please read my comment and understand that there is still a race-condition!
I don't know how you measured the millions of instructions because of the timeout.
Would you rather have a race condition/random crashes in your application?
I have multiple applications in production doing this. The only visible downside is that it takes up to 1 second longer when closing the device/application.
And external synchronization will still be needed.
This is a "weird case" for any application that chooses to call close() in the middle of a file operation. But it is the application's responsibility to handle this, not the underlying library's or the OS's. With this logic you could strip all thread-safe operations from all systems, as you can always argue that the user can mess it up.
This is basically the definition of "not thread-safe"... And I sense a bit of inconsistency between the statement above and the statement below:
I have read your comments. Your argument is based on the fact that you cannot call other functions (read, write, whatever) after calling close. Yes, this is true for hidapi, as it is true for basically any other file-like system. I fully accept this and it is valid, just like in any other system. But using this as an argument to avoid thread safety does not seem reasonable. Having worked with operating systems for too many years, I agree that synchronization is not as easy as people make it out to be. However, your statement that it is inherently impossible to make a simple library like hidapi thread-safe is not something I can agree with. Following that logic, every OS on the planet should be scrapped.
You cannot in any way guarantee that this will work across multiple OSes without using synchronization. It just might only work on Windows. And then you are back to basically allowing async close() anyway. But why symptom-fix this one thing instead of fixing the overall problem?
No. But that is what you have now.... with synchronization the result would be the same every time, except for the case of calling read/write after close(), just like you would expect. Again, calling file operations on a device you just closed is not something I consider a problem, but just a broken app.
I have applications that need to shut down within milliseconds. Hard-coding hidapi to suit your very narrow use-case does not seem reasonable for a wider audience. The only way to solve that would be to use smaller timeouts and waste millions of instructions. The reason blocking operations are good is efficiency.
That is the whole point of this thread. That is not the current behavior: the current behavior is completely random, due to the obvious races in the code.
I would still like to know if the current patch (which I just updated) has other races than the one where you erroneously call operations after close. If we accept (just like with any other file-like system) that you cannot call operations after close, I think the proposed patch makes the entire library thread-safe. Any input would be great.
Those are implementation details. HIDAPI already uses synchronization in some cases where it is unavoidable.
Interesting assumption. I wouldn't make one without a proper investigation first.
Of course not. The intended scenario is:
Of course not. Right now the only doable scenario is as above but without step 2, and with a race condition in that regard.
Let's make it right: #146
Yeah, I agree here. If you use a timeout of several (or one?) ms, that is a significant overhead.
I stand corrected. What I mean is: this is much easier to reproduce than a possible crash.
Only on Windows. And I'm not ready to commit to making the entire library thread-safe.
We need a stricter definition here: you cannot call operations once the close has been initiated.
Let's assume the behavior is as if your patch were accepted. In this case, yes, it is safe to call close concurrently. Which means you also need to ensure that once you decide to close the device, no other thread starts any new operation on it.
I'd just like to add my 2c and point out a problem I had with the Rust wrapper, which I explained in ruabmbua/hidapi-rs#151. The Rust implementation makes it impossible to call any methods on the device from different threads at the same time, but you can call them sequentially using a mutex, and you don't have direct access to the underlying handle. It turns out that this still causes a segfault on Windows, presumably because the pending overlapped/async operation that was started on the other thread doesn't get canceled. Apparently a side effect of that is that you might be able to stop a blocking read this way.
@YgorSouza I totally agree. We probably need to work a bit more on the synchronization mechanism to make sure all of @Youw's concerns are met. I am not sure we can handle the close() scenario properly, but I have a few ideas that could improve it a bit. |
That sounds like a totally different issue. It sounds like it's still going to be there even if one switches to CancelIoEx, e.g. if someone uses hid_read_timeout, it returns on timeout, and the buffer gets immediately deallocated. I think we could consider fixing it separately.
The only code I have is the one I posted in the Rust issue linked above. My application is similar to the one mentioned in #576 (comment). I have a single dedicated thread that locks the list of open devices and tries to read them with a timeout of 0, then releases the lock and sleeps until the next cycle. And the UI thread can lock the list and add or remove devices from it. I did notice that in my Rust reproducer, if I change the timeout to 1 ms it doesn't crash, but it also doesn't crash if I leave it at 0 and sleep for 1 ms before or after while holding the lock, so I'm guessing it's just a timing coincidence; it is undefined behavior after all. I know this isn't enough for this issue here, since you'd still need more synchronization to prevent the device from being closed while a read is pending.
I tend to think this discussion is quite similar to various discussions in libusb. The users like to use the simple synchronous API, but there are limitations (see the libusb documentation and the libusb wiki entry "What are the differences between synchronous and asynchronous device I/O?"). Please take note that ALL transfers MUST be freed before libusb_exit() is called, that ALL libusb objects MUST be released before libusb_exit() is called, and also that there is no way to cancel a synchronous transfer after the request has been submitted.
Right now we have a synchronous API in hidapi. Maybe we can add an asynchronous API (submit/cancel style), and then the synchronous API can be re-written on top of those two asynchronous calls, just like libusb.
Currently hidapi is not able to close an open device handle in a thread-safe manner. This is due to the `CancelIo` function being used instead of `CancelIoEx` for closing a win32 device handle.

Upgrading to `CancelIoEx` would cause WinXP support to be dropped (signal11/hidapi#48), however a fallback could be used to try and load `CancelIoEx` dynamically to keep a somewhat buggy WinXP experience (signal11/hidapi#208).

Something needs to be done about this issue as hidapi is being used in multi-threaded environments more and more. I would personally propose to drop WinXP support altogether as there is no point in beating a dead horse, however I would like to hear what you guys think.