-
Notifications
You must be signed in to change notification settings - Fork 898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OS X: Unable to find previously closed device / random crashes when closing device #114
Comments
Sounds similar to #103 . Apple changed something in Mountain Lion. They break stuff related to HID in every major release. On top of that, there are races that I'm not sure there are any good ways to work around. I don't have 10.8, and I don't know what it is that changed, unfortunately. dantulurisudheer says he got it working. I asked him for a patch, and haven't heard from him (I just pinged him again). If you can try to reproduce the problem with a simple command line program (or some testgui hacks), it might be easier to diagnose the problem. |
When you say that it happens in a loop with a 30 second timeout, do you mean the timeout on the read is 30 seconds? (so it waits for 30 seconds in case the device sends data, if not then it moves to the next protocol/transport?) At what point in the sequence are you disconnecting/reconnecting the device? During the timeout? |
Hey mrpippy. It's been a while. Are you using 10.8 without problems? |
mrpippy: I'm not sure at what point I unplugged the device since the connection attempt it is pretty speedy. signal11: The Apple dev world is proving to be quite an experience for us so far, coming from MS land. I'll work on making a simpler program to reproduce the issue. I'll have to brush up on my C++ :) |
signal11: Also I just compared the HID Manager source code (it's hid.subproj, part of IOKitUser) between 10.7 and 10.8, and there's almost no changes to the HID Manager itself. The IOKit HID layer underneath definitely had changes though. davidaramantSEP: |
I think I made things more confusing by trying to simplify things :) The 30 second timeout can also be ignored. The application has two modes where you either 'manually' scan for devices or it continually polls in the background (forever and ever and ever). Each manual attempt is capped at 30 seconds before we give up. We haven't gotten around to the auto-detect portion yet, so the 30 seconds happened to be on my mind. The pseudo code I posted is for one connection attempt, which takes around 5-7 seconds. In the "manual" case we get around 5-6 attempts done; in the auto-detect case it is unbounded. We close all open USB devices at the end of each unsuccessful connection attempt. The user could plug-in or unplug a transport at any time, so we have to rescan to pick up any new ones. There are probably over a hundred write/read calls during each connection attempt. The IR dongle is programmable; we send it scripts to run that in turn operate the IR emitter to search for devices. Each script we send has a response (even if it's "no data received") so theoretically we would never hit a read timeout in normal operation. The crash is very frustrating. |
Is your code doing something like:
And it crashes less often if you put a 50ms delay between hid_close() and hid_open()? Or does it crash after hid_close() without any other hid function being called? |
Sometimes it crashes after The sleeps are after close, read, and write. I'm not sure all three of those sleeps are needed; I haven't tested all the permutations of it. |
So I hacked up hidtest last night to do roughly the same loop as your code:
And it seemed to run ok. I'll do more testing tonight (including more disconnect testing), but I never had a segfault. I'm using 10.8.3 on a 2011 MacBook Air. What type/year Mac are you using? Also, do you have any other USB devices connected? |
Mid 2011 Mac Mini, i7 2.7GHz, 10.8.3. There's a mouse, keyboard, and a cheap hub that the transport is plugged into. Another wrinkle I just realized is that our app has lots of threads all over the place. I've made sure that an opened handle is only accessed by one and only one thread, but I'm wondering if there is a problem with shutting down that thread right after I think all managed .NET threads turn into pthreads under the hood on Mono (on Windows there is not a 1:1 mapping between managed & native threads). A closer match to our application would be to do everything inside that BTW, in hid.c, what does line 980 accomplish? |
Ah ok, I'll try it with more threads. As for hid.c line 980, I don't know why the device needs to be put onto the main runloop. The callbacks have just been unregistered, and there shouldn't be any more I/O going to the device. Also,
|
I tried your new close method without any of my hacks. It actually ran for a few minutes before crashing (this is much better than before). The crash is also much more interesting:
|
Interesting, you can see that it crashed in |
Unfortunately I didn't check the dev log on that crash before it got erased. It was executing our connection loop, so it was most likely reading/writing. I haven't been able to reproduce that exact crash again. So far it bombs with much like it did before: 1 - 3 minutes of running the connection loop, usually inside of a The crash info starts with this:
I can post the entire thing, but it's enormous since there are 18 threads active. None of them look particularly interesting except for a thread that has a lot of stuff about HID callbacks in it's stacktrace (you can see an example of a trace like that in my first post). |
The main thing that I see in every one of your crash logs is I've looked closer at the IOHIDManager code, and I think there are race conditions that could cause this crash if input reports are received while a device is being closed, or possibly after an input report callback is unregistered. I'm gonna try to test this more tonight. The HID Manager code gives me the impression that it's not designed to be thread-safe, i.e. calls to |
Your investigation sounds promising! I tried using a single thread for all API calls but I thought it had the same problem. I can try again, I suppose. Since we are supposed to support multiple transports, using a single thread for all communication would be a bit of a bottleneck. Currently there is one thread that "owns" |
I don't think that running I just hacked up (I was trying to figure out exactly how mono/xamarin implements threads and what they do with run loops for user-created threads, but didn't see any info. I'll probably have to look at the source) |
I've re-confirmed that running all the API calls on the same thread still bombs with the same error. I'll also investigate what Mono does with threads. |
I found the code for the managed thread and some of the unmanaged code. |
It looks like Mono's threads are built on top of GLib, which uses straight pthreads on the Mac. I'm not sure, but I doubt Mono is messing with the run loop at all for user-created threads. Also it turns out I had some problems with my Does your device send input reports only in response to a report from the host, or does it send them all the time? |
The transport uses a strictly request-response protocol; the host is in charge of everything. There should be no unexpected reports from the transport. So does your newly created pthread do anything with regards to run loops? Are you reading/writing on a separate thread as well? If I do have to do something with run loops, is it enough to call |
Well, calling |
Hi Guys, Sorry I've been unresponsive. Run Loops with respect to threading is not something I was ever able to find real clear authoritative documentation on. My understanding of how it works was mostly derived empirically, and also with the help of mrpippy back a couple years ago. In reading some of the git log, I'm reminded that I came to the realization at one point that Apple intends for an application to have an IOHIDManager handle for each device. The way we currently do it is to have one IOHIDManager and then an IOHIDDevice for each device. I'm not sure that matters to this particular case, but my experience has been that when you don't do stuff exactly the way they want you to, you have problems (and it's not always clear what they want you to do). I also wonder some about having a lot of threads. HIDAPI on mac should be threadsafe, but it doesn't mean there aren't bugs. hid_init() should be called one time at the start of execution, per the documentation, preferably from the main thread. I'm also suspicious about the mono stuff. Is there a minimal test program which shows the symptoms? Maybe start with hidtest, and then build onto it (make it more complicated) until you see the bad behavior. I also wonder, since you're calling hid_enumerate() from a lot of different threads, whether it needs to be mutexed, and then have the run loop moved to the current thread before process_pending_events() is called. There's also an issue with dev->disconnected. When the device has been disconnected, it will full-up crash if certain functions are called, even if the handle is still open. The only way I know to get around this is to set dev->disconnected in the device removal callback, and then check for it all over the place. This seems racy, but I've yet to find a better way to do it. IIRC, mrpippy tested this pretty hard with open/read/write/close/unplug (in random order). Maybe that was someone else. I hope maybe something I've said here will trigger a light bulb. Alan. |
davidaramantSEP: signal11: Also, HID Manager definitely has bugs! If you register an input report callback, then unregister it, it will segfault when the next input report comes in. I just added this as issue #116, but I think the only effect for hidapi is a possible race condition in |
Isn't that exactly what this issue is about? (well one of the things) |
He said his device uses a request-response protocol, so there shouldn't be input reports coming in during |
I tried the IOKit path hid.c with the new I'm going to dust off a console program that I was testing with earlier (still using Mono) and see if the behavior is any different. I think it worked better, but I don't know if I bothered leaving it running for very long. In parallel to getting an actual fix, I was progressing with a series of hacks to work around the two problems we see but the workarounds stopped... working. I'll explain our terminology again: The transports are normally not very interesting to us; it's talking to the IR devices on the other end that represent 95% of our code base. Not finding transport after unplugging/replugging: Lazily call We can live with the second hack, but not be able to discover transports is pretty fatal. |
Correction: With the IOKit path patch, it does find the transport again, but the path returned is empty. Could that be because the USB device hasn't been fully closed yet? |
In your latest tests, what thread are you calling I think the path is empty because the HID Manager's device info is stale (because it's run loop isn't running), so the device being queried was already unplugged. |
You're right; the change I made to make sure Enumerate and Init happen on the same thread got lost in all the mess of applying/undoing hacks. When I redo that change, rediscovering a USB device works again without having to continually call exit/init (hooray). |
So what's the answer then? What needs to change in HIDAPI? It sounds like you needed to call hid_enumerate() from the same thread as hid_init(). hid_enumerate() already calls process_pending_events(). Does the run loop need to get moved to the hid_enumerate() thread before the call to process_pending_events()? Does it then need to get moved back to the read_thread afterwards? In so doing, does it need to be mutexed with the read thread? |
I'm pretty sure that crash is still in there, so only one of the problems has been fixed so far. I'm currently running with the IOKit branch + new Moving the run loop in I think the I'm a bit confused by what you mean by the read thread. There should only be a read thread if a device is open, right? I don't know if scanning for devices is supposed to be supported when there is currently a handle open; we never do that either. |
See also #117 |
You guys have mentioned code changes you've made. Are there any patches? |
I haven't made any. I just used hid.c from the IOKit branch mentioned in this thread with the new close method posted above. We got HIDAPI into a usable state using those two changes so I've had to focus on other parts of the application. I did find out that the sleep after I'm not very happy about having that hack in there and not getting to the bottom of the problem, but unfortunately we have to be a bit pragmatic about where we spend time in order to meet our deadline. |
For the hid_close() modifications, I see you took out the unregistering of the report callbacks. Everything that is in there was put in there for a reason, because it was necessary on some version or other of OSX (maybe those reasons are gone on newer versions though)). I seem to remember that when porting to Lion (and I use the term "port" because getting this to run on Lion was a disaster) the input report callback needed to be explicitly unregistered, or else it would keep getting called even after IOHIDDeviceClose(). Maybe that's not necessary anymore. |
Three years later... any news on this? In my HIDAPI clone (PureJavaHidApi) I'm experiencing something similar on Yosemite. My lib is Java but the code logic was lifted from HIDAPI and is almost identical. If I open a device and unplug/replug it device, then I cannot open it again. I traced this to the fact that after unplugging IOHIDManagerCopyDevices still returns a list where the device is present even if it has been unplugged. After re-plugging the list then contains to instances of that device. When the list is then searched the wrong ('old') device is found and naturally opening that fails. So somehow the device is not properly closed (in my code) but left dangling. I've tried various ways to close the device in the removal call back but without success. As a work around I changed my open code to use the last matching entry in the list returned by IOHIDManagerCopyDevices, this improves things but unfortunately sometimes the latest plugged in incarnation of the device is not the last in the list. Anyway, posting this here just in case this issue is still unsolved or someone else can find some use for this titbit of debugging information. And also fishing for solution to my problem of course ;) |
Posting this here also for completeness, i.e. while my previous post was about PureJavaHidApi and not signal11/hidapi I'm posting the solution here for posterity in case someone ends up looking for solution to this kind of behaviour. After spending all spare time for six days finally nailed it to this: I did not call
before (in device enumeration) calling:
What an insidious feature in IOKit, without that call everything seems to work, br Kusti |
Fix github issue signal11#114
OS X 10.8.3, latest version from git, USB device is known to be good (we have been using it for 7 years on Windows).
Our application has a talks to a proprietary USB IR dongle (our 'transport') which in turn talks to a variety of IR devices. Our connect loop looks something like this in pseudo code:
The above sequence happens in a loop with a timeout (around 30 seconds). There are usually multiple writes/reads for each device protocol; the end result being that we send tons of these requests to the HID layer in a short period of time.
We are seeing problems with the hidapi library that I think are related:
It feels to me like the device is not being properly closed or there is a race condition. The stack traces following a crash always include something about HID callbacks, which seems related:
With the hack to activate/deactivate hidapi I saw a different crash which also seems to be related to threading:
There are a number of complications with us attempting to debug this issue:
We're kind of stuck. I think with a combination of hacks we might be able to limp along, but it would be better if we got the bottom of what is happening. Any ideas or suggestions you have would be appreciated.
The text was updated successfully, but these errors were encountered: