-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Improve usb.core.Device error handling #10611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
shared-module/usb/core/Device.c was using its own 0xff value for the xfer_result_t enum defined by tinyusb/src/common/tusb_types.h. The 0xff value served the same purpose as the already exisiting XFER_RESULT_INVALID enum value (a placeholder to mark in-progress transactions). This commit standardizes on XFER_RESULT_INVALID in usb.core.Device consistent with the usage in tinyusb. Making this change allows implementing `switch(result){...}` style result code checks without compiler errors about 0xff not being a valid value for the enum.
This directly translates the Device.ctrl_transfer() result check logic from its old if-statements to an equivalent switch-statement. The point is to make it clear how each possible result code is handled. Note that XFER_RESULT_FAILED and XFER_RESULT_TIMEOUT both return 0 without generating any exception. (but also, tinyusb may not actually use XFER_RESULT_TIMOUT if its comments are still accurate)
Previously this returned the requested transfer length argument, ignoring the actual length of transferred bytes. This changes to returning the actual length.
Previously, usb.core.Device.ctrl_transfer() did not raise an exception when TinyUSB returned an XFER_RESULT_FAILED result code. This change raises an exception which prevents a failed transfer from returning an all zero result buffer.
The code for endpoint transfers and control transfers previously had a bunch of duplicated logic for setup, timeout checking, and result code error handling. This change factors that stuff out into functions, using my new transfer result error handling code from the last commit. Now the endpoint and control transfers can share the setup, timeout, and error check logic. For me, this refactor reduced the firmware image size by 176 bytes.
Submitting this as a draft to get comments. Before leaving draft status, I'd like to extend this to set unique |
This sets unique errno values for each place in the usb.core.Device implementation that raises USBError or USBTimeoutError. The idea is to provide more visibility so CircuitPython code can understand what's gone wrong when TinyUSB complains about something. CAUTION: This removes two error strings (stalled pipe and config not set). The new way relies on unique error codes as defined in shared-module/usb/core/Device.h. The "errno" values here are arbitrary with no relation to POSIX errno codes. I don't see this as a problem because the so-called errno codes returned by desktop PyUSB are undocumented and appear to be rather aribitrary.
Just pushed another commit that sets unique errno values for each place in the usb.core.Device implementation that raises USBError or USBTimeoutError. The idea is to provide more visibility so CircuitPython code can understand what's gone wrong when TinyUSB complains about something. Reviewers: The last commit deletes two USBError error strings (stalled pipe and no configuration set). Since everything now gets a unique errno code, it's still possible to distinguish those two USBError instances from all the others. I'm operating on the theory that the 9.x USB host implementation was pretty beta, and the error handling didn't work very well. So, seems to me like dropping those strings for 10.0.0 doesn't count as an API change worth being concerned about, plus it saves some bytes... Does this sound okay? |
Just converted this from draft to ready to review. |
Might make sense to also include a fix for [edit: I implemented this, see commit f1ad79b and comment below] |
For an example of when the "No configuration set" message might be useful as a diagnostic, see the description of this issue: But, in that case, the code is probably only making it to the point where that error message happens because usb.core.Device fails to check some earlier error return codes from TinyUSB (bad vid/pid). |
This fixes a bug where common_hal_usb_core_device_get_idVendor() and common_hal_usb_core_device_get_idProduct() were ignoring the error check boolean returned by tuh_vid_pid_get(). Also, this includes a workaround for what appears to be a TinyUSB glitch with enumeration and assigning addresses. By adding a 50 ms background task loop before raising the exception after a TinyUSB failure result, somehow the vid/pid error condition magically goes away on it's own. Something subtle and weird is happening here which could use further investigation.
Latest commit (probably, hopefully) fixes all of these:
The trick was adding a 50 ms delay after the first time that TinyUSB returns a failure result code for CAUTION: The delay thing is a kludge to work around weirdness with TinyUSB's behavior when one of its This is an example of my serial console log after the latest commit, running the reproducer code from issue #10563:
It's good that the last line ends with a debug print of the device descriptor bytes. That means that the loop which is calling Previously, before this last fix, it would have looked like:
where the "..." at the end represents hundreds of repetitions of the last USBError with errno=3. |
Previously, I used a 50ms delay after hitting a TinyUSB tuh_* API call failure result to have a big safety margin. But, that seems long enough that it might mess up animation loops. So, I tried many builds with delays between 0ms and 20ms, testing unplug detection with my big pile of USB devices. A 15ms delay worked fine with everything I tried. My Full Speed devices were marginal down to 13ms, but they stopped working at 12ms. Some of my Low Speed devices were okay down to 3ms, but the Adafruit generic SNES gamepad was marginal below 12ms.
Last commit reduces the delay after a TinyUSB failure result. Previously, I used 50ms to have a big safety margin. But, that seems excessive. I tested unplug detection for many builds with delays between 0ms and 20ms using my pile of USB devices. A 15ms delay worked fine with everything. Full Speed devices were marginal at 13ms and failed at 12ms. Some Low Speed devices were okay down to 3ms, but the Adafruit generic SNES gamepad was marginal below 12ms. 15ms seems like enough. |
I ended up loading these changes on a board I'm testing. Things mostly seemed to work for the narrow case I was testing but I did notice that the Adafruit_CircuitPython_USB_Host_Mouse library now fails when one of these fields fails to be retrieved. It's an simple fix to catch now that there's a specific error code but this will be a breaking change for at least some of the Fruit Jam apps as these fields seem to error on retrieval relatively frequently. |
Interesting. It's definitely been my experience that USBTimeoutError is very common for keyboards, mice, and wireless gamepads connected by USB-C or a USB wireless adapter. I will often see some USBError during initial connections, depending on the device, but I haven't seen much of it during normal usage unless I unplug. Was it a wireless mouse? I haven't tried much of that. Did you happen to experiment with hotplugging with and without the changes? One of my big goals was to get it so that you could unplug and reconnect devices whenever you want. It takes a couple layers of loops and exception handling to get that going smootly ( |
I didn't but I can do a little more playing tonight. My suspicion has been that there's some soft reset occurring to the USB host device and then a timeout has to occur before the IDs become available, but that's entirely based on observation and the fact that putting the "find_mouse" call in a short retry loop has been used as a work around. I gave these changes a go because I'm very deep into the weeds of a custom USB build that's a bit over my head. I'm using a build that has CIRCUITPY_USB_DEVICE set to zero and CIRCUITPY_USB_HOST enabled so TinyUSB isn't servicing the device calls, only the host calls. In this configuration it looked to me like there was a race condition (i.e. panic) caused by attempting to start a xfer (or perhaps a second abort) before a previous abort had completed. I don't want to muddy this PR up but if you have the time I may ping you sometime on Discord and maybe I could bounce some thoughts your way? |
sounds good. I started a thread over in #circuitpython-dev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this. It is not simple stuff.
I'm mixed about these changes:
- The error checking in getting PID and VID is good.
- The magically fixes things delay is a fix I'd like to avoid. Instead, we can dig a bit deeper into why this is happening. Seems like there will be something tinyusb does during that delay that is important to do before we call more of its API. @hathach is back and can help with this.
- I don't mind exposing TinyUSB error codes back to CP code but don't want it done in a way that will break PyUSB compatibility.
errno
should be standard error codes. It looks like PyUSB also hasbackend_error_code
that we could implement: https://github.com/pyusb/pyusb/blob/8fa37c5efc84eaaf197e04182dae15bf56a81b59/usb/core.py#L288-L305 Human readable error messages are much more useful too.
Want to split this into smaller pieces to merge them in?
Okay, I can just remove the delay workaround. If somebody who actually understands how to troubleshoot weird low-level USB glitches wants to tackle a fix for the root cause, that would be fabulous.
Yeah, this is complicated...
Whatever works for you. Do you have any specific ideas about how you'd like it to be done? Like, perhaps as a start I could leave out the magic delay and avoid changing any of the existing errno or error string stuff? |
@hathach, the author of TinyUSB, is paid by Adafruit, so if you can characterize the problem in an issue either here or in TInyUSB, he can take a look. |
@tannewt I took a guess at what you might like and made this new PR: |
Tighten up gaps in usb.core.Device error handling:
0xff
(not a valid enum value) was assigned to thestatic xfer_result_t _xfer_result;
variable (an enum type) are converted to usingXFER_RESULT_INVALID
(a valid enum value). Both values were used to indicate the condition of waiting for a callback to finish. This change makes usage in usb.core.Device consistent with usage in the TinyUSB implementation.if
statements to aswitch
statement that fully covers all the possible enum values. In particular, this new code will raise aUSBError
exception in the case ofresult == XFER_RESULT_FAILED
(old behavior was to return an all zero buffer with no indication of an error condition)._xfer()
andcommon_hal_usb_core_device_ctrl_transfer()
.usb.core.Device.idVendor
andusb.core.Device.idProduct
TU_VERIFY()
macro) while getting vid/pid or doing a device descriptor control transfer. This delay magically fixes a bunch of stuff, see comments below.Checks:
Testing:
USB Host: Unplugging device can break usb.core.find() #10563