Skip to content

Conversation

samblenny
Copy link

@samblenny samblenny commented Sep 4, 2025

Tighten up gaps in usb.core.Device error handling:

  1. Instances where 0xff (not a valid enum value) was assigned to the static xfer_result_t _xfer_result; variable (an enum type) are converted to using XFER_RESULT_INVALID (a valid enum value). Both values were used to indicate the condition of waiting for a callback to finish. This change makes usage in usb.core.Device consistent with usage in the TinyUSB implementation.
  2. Callback result code checks are converted from partial coverage using if statements to a switch statement that fully covers all the possible enum values. In particular, this new code will raise a USBError exception in the case of result == XFER_RESULT_FAILED (old behavior was to return an all zero buffer with no indication of an error condition).
  3. Factor out timeout and result checking code that was previously duplicated in _xfer() and common_hal_usb_core_device_ctrl_transfer().
  4. Fix control transfer return value to give actual length (was returning requested length)
  5. [edited] Set USBError.errno and USBTimeoutError.errno with unique values for each type of error condition reported by TinyUSB (also drop the stalled pipe and unset configuration error strings)
  6. [edited] Fix missing error handling for usb.core.Device.idVendor and usb.core.Device.idProduct
  7. [edited] Add 50 ms delay kludge to let TinyUSB recover from enumeration or addressing problem (failed TU_VERIFY() macro) while getting vid/pid or doing a device descriptor control transfer. This delay magically fixes a bunch of stuff, see comments below.

Checks:

  • pre-commit
  • build & run for Fruit Jam

Testing:

  • Tested with code from issue:
    USB Host: Unplugging device can break usb.core.find() #10563
  • Before changes: test code produces all-zero device descriptor buffers
  • After changes: test code produces USBError with errno=3 or errno=9 instead of all-zero buffer (there is still a problem somewhere in usb.core.Device or TinyUSB, but getting an exception is better than getting bad data)
  • Test Hardware: Fruit Jam rev D + 8BitDo SN30 pro USB gamepad. NOTE: This Full Speed gamepad has some weird startup behavior. It powers up as XInput, disconnects, reappears as a Switch Pro controller, disconnects, then reappears and stabilizes as an XInput controller (it acts like it's been unplugged and re-connected). You can get a similar result with the Low Speed Adafruit generic SNES controller by plugging it in, waiting for the test code to show the device descriptor, then unplugging it.

shared-module/usb/core/Device.c was using its own 0xff value for
the xfer_result_t enum defined by tinyusb/src/common/tusb_types.h.
The 0xff value served the same purpose as the already exisiting
XFER_RESULT_INVALID enum value (a placeholder to mark in-progress
transactions). This commit standardizes on XFER_RESULT_INVALID in
usb.core.Device consistent with the usage in tinyusb.

Making this change allows implementing `switch(result){...}` style
result code checks without compiler errors about 0xff not being a
valid value for the enum.
This directly translates the Device.ctrl_transfer() result check
logic from its old if-statements to an equivalent switch-statement.
The point is to make it clear how each possible result code is
handled. Note that XFER_RESULT_FAILED and XFER_RESULT_TIMEOUT both
return 0 without generating any exception. (but also, tinyusb may
not actually use XFER_RESULT_TIMOUT if its comments are still
accurate)
Previously this returned the requested transfer length argument,
ignoring the actual length of transferred bytes. This changes to
returning the actual length.
Previously, usb.core.Device.ctrl_transfer() did not raise an
exception when TinyUSB returned an XFER_RESULT_FAILED result code.
This change raises an exception which prevents a failed transfer
from returning an all zero result buffer.
The code for endpoint transfers and control transfers previously
had a bunch of duplicated logic for setup, timeout checking, and
result code error handling.

This change factors that stuff out into functions, using my new
transfer result error handling code from the last commit. Now the
endpoint and control transfers can share the setup, timeout, and
error check logic.

For me, this refactor reduced the firmware image size by 176 bytes.
@samblenny
Copy link
Author

Submitting this as a draft to get comments. Before leaving draft status, I'd like to extend this to set unique USBError.errno values for the different places in usb.core that raise USBError and USBTimeoutError. General idea here is to make sure that TinyUSB error conditions do not get hidden or ignored by the CircuitPython usb.core implementation.

This sets unique errno values for each place in the usb.core.Device
implementation that raises USBError or USBTimeoutError. The idea is
to provide more visibility so CircuitPython code can understand
what's gone wrong when TinyUSB complains about something.

CAUTION: This removes two error strings (stalled pipe and config
not set).

The new way relies on unique error codes as defined in
shared-module/usb/core/Device.h. The "errno" values here are
arbitrary with no relation to POSIX errno codes. I don't see this
as a problem because the so-called errno codes returned by desktop
PyUSB are undocumented and appear to be rather aribitrary.
@samblenny
Copy link
Author

Just pushed another commit that sets unique errno values for each place in the usb.core.Device implementation that raises USBError or USBTimeoutError. The idea is to provide more visibility so CircuitPython code can understand what's gone wrong when TinyUSB complains about something.

Reviewers: The last commit deletes two USBError error strings (stalled pipe and no configuration set). Since everything now gets a unique errno code, it's still possible to distinguish those two USBError instances from all the others. I'm operating on the theory that the 9.x USB host implementation was pretty beta, and the error handling didn't work very well. So, seems to me like dropping those strings for 10.0.0 doesn't count as an API change worth being concerned about, plus it saves some bytes... Does this sound okay?

@samblenny samblenny marked this pull request as ready for review September 4, 2025 18:52
@samblenny
Copy link
Author

Just converted this from draft to ready to review.

@samblenny
Copy link
Author

samblenny commented Sep 4, 2025

Might make sense to also include a fix for common_hal_usb_core_device_get_idVendor() and common_hal_usb_core_device_get_idProduct() ignoring the TinyUSB return value of tuh_vid_pid_get() as described at:

[edit: I implemented this, see commit f1ad79b and comment below]

@samblenny
Copy link
Author

For an example of when the "No configuration set" message might be useful as a diagnostic, see the description of this issue:

But, in that case, the code is probably only making it to the point where that error message happens because usb.core.Device fails to check some earlier error return codes from TinyUSB (bad vid/pid).

This fixes a bug where common_hal_usb_core_device_get_idVendor()
and common_hal_usb_core_device_get_idProduct() were ignoring the
error check boolean returned by tuh_vid_pid_get().

Also, this includes a workaround for what appears to be a TinyUSB
glitch with enumeration and assigning addresses. By adding a 50 ms
background task loop before raising the exception after a TinyUSB
failure result, somehow the vid/pid error condition magically goes
away on it's own. Something subtle and weird is happening here
which could use further investigation.
@samblenny
Copy link
Author

samblenny commented Sep 4, 2025

Latest commit (probably, hopefully) fixes all of these:

The trick was adding a 50 ms delay after the first time that TinyUSB returns a failure result code for tuh_vid_pid_get() or a transfer callback.

CAUTION: The delay thing is a kludge to work around weirdness with TinyUSB's behavior when one of its TU_VERIFY(...) checks fails after an unplug, or for a device that disconnects on its own. There seems to be some deeper problem with enumeration or addressing, but it clears up on its own if you spin in a RUN_BACKGROUND_TASKS loop for something on the order of 20-50 ms (10 ms isn't enough).

This is an example of my serial console log after the latest commit, running the reproducer code from issue #10563:

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
Finding USB devices...

idVendor      045e
idProduct     028e
product       Controller
manufacturer  Controller
{(18, 1, 0, 2, 255, 255, 255, 64, 94, 4, 142, 2, 20, 1, 1, 2, 3, 1): True}
find USBError: .errno=3 '3'
find USBError: .errno=9 '9'
find USBError: .errno=9 '9'
find USBError: .errno=9 '9'
find USBError: .errno=9 '9'
find USBError: .errno=9 '9'
find USBError: .errno=9 '9'
find USBError: .errno=9 '9'
find USBError: .errno=9 '9'
find USBError: .errno=9 '9'
find USBError: .errno=9 '9'
find USBError: .errno=9 '9'
find USBError: .errno=9 '9'
find USBError: .errno=9 '9'
find USBError: .errno=9 '9'
find USBError: .errno=9 '9'
find USBError: .errno=9 '9'
find USBError: .errno=9 '9'
find USBError: .errno=9 '9'
Clearing Cache

idVendor      057e
idProduct     2009
product       Pro Controller
manufacturer  Nintendo Co., Ltd.
{(18, 1, 0, 2, 0, 0, 0, 64, 126, 5, 9, 32, 0, 2, 1, 2, 3, 1): True}
find USBError: .errno=3 '3'
Clearing Cache

idVendor      045e
idProduct     028e
product       Controller
manufacturer  Controller
{(18, 1, 0, 2, 255, 255, 255, 64, 94, 4, 142, 2, 20, 1, 1, 2, 3, 1): True}

It's good that the last line ends with a debug print of the device descriptor bytes. That means that the loop which is calling usb.core.find() after printing the descriptor is not getting interrupted by a seemingly unending stream of USBError exceptions. The two instances of Clearing Cache mean that unplug detection is working (in this case triggered by the SN30 pro disconnecting in order to present itself with a different device descriptor).

Previously, before this last fix, it would have looked like:

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
Finding USB devices...

idVendor      045e
idProduct     028e
product       Controller
manufacturer  Controller
{(18, 1, 0, 2, 255, 255, 255, 64, 94, 4, 142, 2, 20, 1, 1, 2, 3, 1): True}
find USBError: .errno=3 '3'
find USBError: .errno=3 '3'
find USBError: .errno=3 '3'
find USBError: .errno=3 '3'
find USBError: .errno=3 '3'
find USBError: .errno=3 '3'
find USBError: .errno=3 '3'
find USBError: .errno=3 '3'
find USBError: .errno=3 '3'
find USBError: .errno=3 '3'
find USBError: .errno=3 '3'
find USBError: .errno=3 '3'
...

where the "..." at the end represents hundreds of repetitions of the last USBError with errno=3.

Previously, I used a 50ms delay after hitting a TinyUSB tuh_* API
call failure result to have a big safety margin. But, that seems
long enough that it might mess up animation loops. So, I tried
many builds with delays between 0ms and 20ms, testing unplug
detection with my big pile of USB devices.

A 15ms delay worked fine with everything I tried. My Full Speed
devices were marginal down to 13ms, but they stopped working at
12ms. Some of my Low Speed devices were okay down to 3ms, but the
Adafruit generic SNES gamepad was marginal below 12ms.
@samblenny
Copy link
Author

samblenny commented Sep 5, 2025

Last commit reduces the delay after a TinyUSB failure result. Previously, I used 50ms to have a big safety margin. But, that seems excessive. I tested unplug detection for many builds with delays between 0ms and 20ms using my pile of USB devices.

A 15ms delay worked fine with everything. Full Speed devices were marginal at 13ms and failed at 12ms. Some Low Speed devices were okay down to 3ms, but the Adafruit generic SNES gamepad was marginal below 12ms. 15ms seems like enough.

@RetiredWizard
Copy link

6. Fix missing error handling for usb.core.Device.idVendor and usb.core.Device.idProduct

I ended up loading these changes on a board I'm testing. Things mostly seemed to work for the narrow case I was testing but I did notice that the Adafruit_CircuitPython_USB_Host_Mouse library now fails when one of these fields fails to be retrieved. It's an simple fix to catch now that there's a specific error code but this will be a breaking change for at least some of the Fruit Jam apps as these fields seem to error on retrieval relatively frequently.

@samblenny
Copy link
Author

as these fields seem to error on retrieval relatively frequently.

Interesting. It's definitely been my experience that USBTimeoutError is very common for keyboards, mice, and wireless gamepads connected by USB-C or a USB wireless adapter. I will often see some USBError during initial connections, depending on the device, but I haven't seen much of it during normal usage unless I unplug. Was it a wireless mouse? I haven't tried much of that.

Did you happen to experiment with hotplugging with and without the changes? One of my big goals was to get it so that you could unplug and reconnect devices whenever you want. It takes a couple layers of loops and exception handling to get that going smootly (find() in the outer loop, read() in the inner loop, etc).

@RetiredWizard
Copy link

Did you happen to experiment with hotplugging with and without the changes?

I didn't but I can do a little more playing tonight. My suspicion has been that there's some soft reset occurring to the USB host device and then a timeout has to occur before the IDs become available, but that's entirely based on observation and the fact that putting the "find_mouse" call in a short retry loop has been used as a work around.

I gave these changes a go because I'm very deep into the weeds of a custom USB build that's a bit over my head. I'm using a build that has CIRCUITPY_USB_DEVICE set to zero and CIRCUITPY_USB_HOST enabled so TinyUSB isn't servicing the device calls, only the host calls. In this configuration it looked to me like there was a race condition (i.e. panic) caused by attempting to start a xfer (or perhaps a second abort) before a previous abort had completed. I don't want to muddy this PR up but if you have the time I may ping you sometime on Discord and maybe I could bounce some thoughts your way?

@samblenny
Copy link
Author

sounds good. I started a thread over in #circuitpython-dev

Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this. It is not simple stuff.

I'm mixed about these changes:

  • The error checking in getting PID and VID is good.
  • The magically fixes things delay is a fix I'd like to avoid. Instead, we can dig a bit deeper into why this is happening. Seems like there will be something tinyusb does during that delay that is important to do before we call more of its API. @hathach is back and can help with this.
  • I don't mind exposing TinyUSB error codes back to CP code but don't want it done in a way that will break PyUSB compatibility. errno should be standard error codes. It looks like PyUSB also has backend_error_code that we could implement: https://github.com/pyusb/pyusb/blob/8fa37c5efc84eaaf197e04182dae15bf56a81b59/usb/core.py#L288-L305 Human readable error messages are much more useful too.

Want to split this into smaller pieces to merge them in?

@samblenny
Copy link
Author

  • The magically fixes things delay is a fix I'd like to avoid. Instead, we can dig a bit deeper into why this is happening. Seems like there will be something tinyusb does during that delay that is important to do before we call more of its API. @ hathach is back and can help with this.

Okay, I can just remove the delay workaround. If somebody who actually understands how to troubleshoot weird low-level USB glitches wants to tackle a fix for the root cause, that would be fabulous.

Yeah, this is complicated...

  1. I tried to figure out a way of just adding errno codes while preserving the existing strings, but the mp_raise_* functions for creating exceptions mostly want an mp_whatever string type (not compressed rom string). It wasn't obvious to me how to coax one of the CircuitPython exception making functions into also taking a numeric argument to set a property along with a rom string message. Basically, I don't understand how to work with MicroPython objects from C yet.
  2. The current CircuitPython code is incompatible with the PyUSB usage of errno in that CircuitPython doesn't set errno properties at all. It's also very unclear to me how desktop PyUSB sets the errno values. The values on Debian seem pretty arbitrary, or at least non-intuitive when you look up the POSIX meanings. When I looked at that last year, I didn't find any documentation for it.
  3. I agree that readable error strings are great. I'd be happy to use all strings and ignore the errno stuff. But, is adding all the firmware space for that okay? I've been operating under the understanding that adding new error strings is to be avoided if at all possible.
  4. Based on seeing how this PR code works, it might actually be fine to omit all the errno changes and leave the USBError instances that set a string message as-is. So long as unplug events result in a USBError instead of a USBTimeout error, that provides enough info to build a good usb.core.find() loop to manage hot-plugging.

Want to split this into smaller pieces to merge them in?

Whatever works for you. Do you have any specific ideas about how you'd like it to be done? Like, perhaps as a start I could leave out the magic delay and avoid changing any of the existing errno or error string stuff?

@dhalbert
Copy link
Collaborator

dhalbert commented Sep 6, 2025

If somebody who actually understands how to troubleshoot weird low-level USB glitches wants to tackle a fix for the root cause, that would be fabulous.

@hathach, the author of TinyUSB, is paid by Adafruit, so if you can characterize the problem in an issue either here or in TInyUSB, he can take a look.

@samblenny
Copy link
Author

@tannewt I took a guess at what you might like and made this new PR:

@samblenny samblenny marked this pull request as draft September 6, 2025 01:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants