Skip to content

Conversation

@xxxajk
Copy link
Contributor

@xxxajk xxxajk commented Jan 24, 2023

Fixes locating commonly mounted paths on Linux. There were too many assumptions made.
I left the original code in, but commented it out as a reference.

Note:
Since this currently "works for me" the TO-DO is only meaningful if someone else has a different issue and wants to scan something else than what is in the current common path coverage, caused by customization.

Copy link
Owner

@earlephilhower earlephilhower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

First of all, could you please remove the commented/quoted out code. git history will keep any old stuff for reference.

What exactly was the issue you were trying to solve here? Eyeballing it, it seems like this will make a list of all rootpaths which exist, not the 1st one encountered (i.e. iif you had a dummy /run/media dir the original would not even look a drive mounted in /media. Is that the base issue, or did you have some other reason?

@xxxajk
Copy link
Contributor Author

xxxajk commented Jan 25, 2023

Yes, the problem is a few different cases.
Some Linux installs don't use udisk, instead use other things such as kernel automount options.
Sometimes media can be forced mounted in /media or /run/media or /var/run/media if the media is tagged as shared.
Note that I also check write permissions in case it's there, but the user hasn't permission, so it doesn't cause an exception, if the user can see the magic file is present. Checking ahead eliminates the entry.

Not sure how far behind the version is for the install from the IDE, but it seems to not always automatically reset when switching to 1200bauds. Not related, but I thought you should be aware of that. Unfortunately the fix that works for me is to re-plug the device and/or hold boot and powercycle the pico. My guess is that the best way to do this would be to to the udiskctl first, without sniffing stdout, and then pause and use the known suspect places to locate it, or simply to try twice, which is the "usually works" part, because sometimes it does the reset and role change, however the mount comes in much later. Another point is that udisks is going to not work on a headless or text-only setup, since udisks is related to GUI desktops, instead automount in fstab or udev rules are the usual alternatives.

I'll remove the dead code, I left it in as an easier reference for myself.

@xxxajk
Copy link
Contributor Author

xxxajk commented Jan 25, 2023

While now more reliable, it still seems to require a few tries. I will experiment with the delay at the beginning, which seems to help. My guess is that udisksctl is called while it is already trying to run, and hits mid-point.

[1531102.714282] cdc_acm 1-2.4:1.0: ttyACM0: USB ACM device
[1531308.543871] usb 1-2.4: USB disconnect, device number 123
[1531308.799205] usb 1-2.4: new full-speed USB device number 124 using xhci_hcd
[1531308.934496] usb 1-2.4: New USB device found, idVendor=2e8a, idProduct=0003, bcdDevice= 1.00
[1531308.934500] usb 1-2.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[1531308.934501] usb 1-2.4: Product: RP2 Boot
[1531308.934502] usb 1-2.4: Manufacturer: Raspberry Pi
[1531308.934503] usb 1-2.4: SerialNumber: E0XXXXXXXXXX
[1531308.936517] usb-storage 1-2.4:1.0: USB Mass Storage device detected
[1531308.936595] scsi host7: usb-storage 1-2.4:1.0
[1531309.952716] scsi 7:0:0:0: Direct-Access     RPI      RP2              3    PQ: 0 ANSI: 2
[1531309.952852] sd 7:0:0:0: Attached scsi generic sg8 type 0
[1531309.953194] sd 7:0:0:0: [sdi] 262144 512-byte logical blocks: (134 MB/128 MiB)
[1531309.954545] sd 7:0:0:0: [sdi] Write Protect is off
[1531309.954547] sd 7:0:0:0: [sdi] Mode Sense: 03 00 00 00
[1531309.956547] sd 7:0:0:0: [sdi] No Caching mode page found
[1531309.956551] sd 7:0:0:0: [sdi] Assuming drive cache: write through
[1531309.964515]  sdi: sdi1
[1531309.967544] sd 7:0:0:0: [sdi] Attached SCSI removable disk

then it freaks out, since udisks interfered with it's self, and goes back to ACM device

[1531398.391343] usb 1-2.4: USB disconnect, device number 124
[1531398.397412] FAT-fs (sdi1): unable to read boot sector to mark fs as dirty
[1531398.747062] usb 1-2.4: new full-speed USB device number 125 using xhci_hcd
[1531398.887612] usb 1-2.4: New USB device found, idVendor=2e8a, idProduct=000a, bcdDevice= 1.00
[1531398.887616] usb 1-2.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[1531398.887617] usb 1-2.4: Product: PicoArduino
[1531398.887619] usb 1-2.4: Manufacturer: Raspberry Pi
[1531398.887620] usb 1-2.4: SerialNumber: E6XXXXXXXXXXXXXX
[1531398.937632] cdc_acm 1-2.4:1.0: ttyACM0: USB ACM device

@earlephilhower
Copy link
Owner

What distro are you running? I'm not really getting many reports of issues with things as they are and it's been very stable in my experience. And as long as the serial port is up, the 1200bps tickle is super stable since it's running code the RPi team wrote themselves. :)

If you're really having trouble, why not use the Picotool upload method which doesn't use the FS at all...

@xxxajk
Copy link
Contributor Author

xxxajk commented Jan 25, 2023

Basically, Linux mint, but with kubuntu shims for KDE.

@xxxajk
Copy link
Contributor Author

xxxajk commented Jan 25, 2023

Seems like updating that delay that I put in did the trick for me.
I'll push the new value.
Funny thing is that this isn't exactly a slow or low end box, but I have seen stranger things happen.

@xxxajk
Copy link
Contributor Author

xxxajk commented Jan 25, 2023

Even though it delays, I think it should attempt a retry at least twice, if it works the first time, then there's no loss.
Yes, it still can happen, and have had it only happen once since including the 10 second delay.

@xxxajk
Copy link
Contributor Author

xxxajk commented Jan 25, 2023

Actually, I believe I found another root cause... python isn't closing the ACM port after flipping to 1200 baud. This causes the USB device to hang and not enumerate into disk until python exits.

@xxxajk
Copy link
Contributor Author

xxxajk commented Jan 26, 2023

Here's more testing data. I'll put up the final changes in a bit too.

Jan 25 19:03:59 devil kernel: [1536915.845527] cdc_acm 1-2.4:1.0: ttyACM0: USB ACM device

Try to upload, pico goes off line (as indicated by the USB hub LED) and fails, but doesn't indicate the disconnect in the Kernel log until much later.

Jan 25 19:05:01 devil kernel: [1536978.378698] usb 1-2.4: USB disconnect, device number 9

Then the the HUB's LED turns on..

Jan 25 19:05:01 devil kernel: [1536978.633764] usb 1-2.4: new full-speed USB device number 10 using xhci_hcd

and udisks mounts it.

@xxxajk
Copy link
Contributor Author

xxxajk commented Jan 26, 2023

This is as good as I can get it.
I had a similar issue on one of my own devices that I made, where the USB needed to indicate a reset state and a delay.
There is usbreset which can force the reset but it requires root priv's and you have to know exactly what device.

@xxxajk
Copy link
Contributor Author

xxxajk commented Jan 26, 2023

Part of the problem (as I see it, and have experienced in the past) is having the USB D +/- lines to HiZ, or into an illegal state for a few ms before the pico hits the bootloader. Otherwise a hub never sees a "re-connection" event. From this side, we can only do so much, and I will have to take a good look at the code that calls the baked-in ROM. I am betting that it doesn't disable the USB PHY or change the state to something before that point, and if it does, needs a delay so that it looks like a re-connection event. Basically, we need to trick the system into issuing a USB reset sequence.

I'll have some time later tonight to put it on my USB analyzer and get the actual facts of what is or isn't happening on the wire. As far as this side of the code, this is as best as I could get it to do successful uploads without pressing bootsel and re-plugging.

@earlephilhower
Copy link
Owner

earlephilhower commented Jan 26, 2023

We do have full control over the USB peripheral in the chip. It would be possible to set it to reset (not sure what the pin states are at that point) and sleep for 500ms before rebooting into ROM mode.

-edit-
We also can control the USB PHY via the USBPHY_DIRECT(_OVERRIDE) registers so drive/float whatever lines might be needed.
https://datasheets.raspberrypi.com/rp2040/rp2040-datasheet.pdf

@xxxajk
Copy link
Contributor Author

xxxajk commented Jan 27, 2023

As I suspected, the phy doesn't go to sleep. ~50-100ms should do the trick.
See index 694 in the USB sniffer trace.

Screenshot_20230127_010010

@xxxajk
Copy link
Contributor Author

xxxajk commented Jan 27, 2023

Made a pretty sloppy fix, that works consistently.
Basically...

  1. Finishes possible pending USB tasks (could be improved?)
  2. Disables the IRQ on the NVIC for the USB
  3. resets the USB "block"
  4. waits for USB block to come out of reset
  5. delay (could be improved by waiting on millis/cpu cycle counts)

I'll put in better comments on what it's doing in the code, and push it fairly soon.

@xxxajk
Copy link
Contributor Author

xxxajk commented Feb 4, 2023

All pretty, and working excellent.

@xxxajk
Copy link
Contributor Author

xxxajk commented Feb 4, 2023

Last annoyance to fix, is to sleep in the python script after successful update to allow the USB to re-enumerate. While not always fatal, I have seen a few cases where everything collapses.

Resetting /dev/ttyACM0
Converting to uf2, output size: 134656, start address: 0x2000
Flashing /media/RPI-RP2 (RPI-RP2)
Wrote 134656 bytes to /media/RPI-RP2/NEW.UF2
processing.app.SerialException: Error opening serial port '/dev/ttyACM0'.
	at processing.app.Serial.<init>(Serial.java:154)
	at processing.app.Serial.<init>(Serial.java:82)
	at processing.app.SerialMonitor$2.<init>(SerialMonitor.java:132)
	at processing.app.SerialMonitor.open(SerialMonitor.java:132)
	at processing.app.AbstractMonitor.resume(AbstractMonitor.java:132)
	at processing.app.Editor.resumeOrCloseSerialMonitor(Editor.java:2152)
	at processing.app.Editor.access$1300(Editor.java:117)
	at processing.app.Editor$UploadHandler.run(Editor.java:2109)
	at java.lang.Thread.run(Thread.java:748)
Caused by: jssc.SerialPortException: Port name - /dev/ttyACM0; Method name - openPort(); Exception type - Port not found.
	at jssc.SerialPort.openPort(SerialPort.java:167)
	at processing.app.Serial.<init>(Serial.java:143)
	... 8 more
Error opening serial port '/dev/ttyACM0'.

Basically the IDE is expecting it to be ready upon return. So the "fix" is to fool it into thinking it is still being used for a little bit of time, perhaps 1000ms?

@xxxajk xxxajk requested a review from earlephilhower February 5, 2023 11:21
@earlephilhower
Copy link
Owner

@xxxajk did you want to stuff in a sleep(1) in uf2conv.py related to your last comment?

Also, needs a minor tweak on the formatting of the reset, check the astyle run. For sanity's sake we require {}s even with 1 statement blocks.

Copy link
Owner

@earlephilhower earlephilhower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Still some small concerns and the astyle (you can run ./tests/restyle.sh to fix)...

@xxxajk
Copy link
Contributor Author

xxxajk commented Feb 7, 2023

All works great as-is.
No errors unless there is something else bad happening,
e.g. demos that don't use while(!Serial); in the setup.

@xxxajk
Copy link
Contributor Author

xxxajk commented Feb 7, 2023

P.S. I'll be fixing SPI next, so that it is compliant with the prior fixes I assisted/did for the following cores:
Arduino
PJRC
ChipKIT

@earlephilhower earlephilhower self-requested a review February 9, 2023 16:39
@earlephilhower earlephilhower self-assigned this Feb 9, 2023
@earlephilhower
Copy link
Owner

Don't worry, I didn't forget about this! I'm in the middle of some SDK 1.5 stuff, but hope to pull and try this out by the end of the week for inclusion in the next dot-release.

Copy link
Owner

@earlephilhower earlephilhower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done some testing on my own box and this PR significantly increases upload times.

Using blink.ino, uploading 2-3 times before actually measuring (so all binary is in RAM and the Pico is running the updated/original USB code) I get the following average times using my stopwatch from the time upload starts to when the IDE returns on Linux:

  • master : 5.6s average of 3 runs
  • #1136 : 9.2s average of 3 runs
    (TBD: I'll see how this affects Windows uploads later tonight)

Given the actual binary upload portion is the same, we've added about 4s of time to every upload which really makes it feel less usable overall.

That said, I did not see any warnings about the missing /dev/ttyACM port with the change. FWIW I was never able to see the errors your own box seemed to have on the USB toggle->reset->MSD mode switch so I can't comment other than that still seems to work fine.

I appreciate your on the belt and braces (and elastic waistband) additions, but making the IDE seem slower for everyone, all time, seems like a no go for me.

Let me mess around with a few ideas here and see if that 4s of wasted time per upload can be dropped down any simple way.

* Split  the difference on sleeping between get_drive calls
* Minimize the time spent at 9600bps->1200bps transition
* Abort sleep after upload if serial port comes back sooner
@earlephilhower
Copy link
Owner

I'm at about 7.3s per upload with the minor tweaks to uf2conv.py which may be good enough to get things in. Still need to test on Windows.

@earlephilhower
Copy link
Owner

Windows testing looks good, actually seems a little faster than Linux at ~4.7s average. And no warnings about the serial port not appearing in the IDE 2.0.

FWIW, the drive detection and serial port reconnect are much faster, at least on my Windows 10 Intel i5 12th gen vs. my Linux AMD 5700X.

So, after the tud_task() loop removal I think we are good to merge! This is a good quality of life improvement!

If it really is necessary to process USB events whilst in the reset sequence,
we'll need to find another way of handling this.  For now, pretend the user
completely unplugged the device which *should* (I hope) be handled
normally by the host USB stack.
@earlephilhower earlephilhower merged commit 5dafbf5 into earlephilhower:master Feb 11, 2023
@earlephilhower
Copy link
Owner

Thanks again for the work on this!

@xxxajk
Copy link
Contributor Author

xxxajk commented Feb 14, 2023

AMD Ryzen 7 3700X 8-Core Processor and ASUS TUF Gaming Wifi here. This particular hardware has very particular bugs, and USB can be quite painful. It was bought for the purpose of finding faults like this in firmware and was used to do a similar fix with the Ultimate Hacking Keyboard https://github.com/UltimateHackingKeyboard/firmware .

I'll check on the results of you removing the loop, but hangs without it because the USB protocol gets violated (See the trace I posted). Instead of eliminating the loop, it might be better to shorten it. The reason why it works, is because there are pending events waiting, and those need to be handled. I do not mind a few seconds of delay if it is reliable.

The possible difference here is that I am on a powered USB3.x hub on a 3.x port, also since the different BIOS doing different tweaks to the chipset, that could also be a reason.

@earlephilhower
Copy link
Owner

earlephilhower commented Feb 14, 2023

The issue with the loop is that it's called from tud_task and you cannot recursively call tud_task from inside it. You have to return from the fcn to tud_task, let that return to whatever main core code was running, and then somehow restart the reset process after some time. Possibly set an alarm w/CB after 300ms or whatever and then reset from the alarm CB.

i.e. This is invalid

tud_task
    CheckSerialReset
        tud_task
        tud_task
        ...

Whereas you could do

tud_task
    CheckSerialReset (set alarm CB)
..main app..
(tud_task cal;led normally in the 1ms handler)
<xxx ms later>
..irq..
    alarm_cb()

Also, while it's a "violation" of the USB spec, how is it any different from a surprise removal, which is legal and can happen any time, no?

@xxxajk
Copy link
Contributor Author

xxxajk commented Feb 14, 2023

What I have found is the recursive call (while not intended to be used) works because the messages are already there in the queue, waiting to be processed (not requiring the RX IRQ). Messages arrive as multiple small packets on high-speed USB 2 and greater inside what amounts to a larger wrapper packet. 99.9% of the host hardware out there, only does a software or hardware check when:
Packet buffers have the high-water mark reached
or
Single packet host, with no buffers
or
less than packet buffer high water mark and SOF seen (1ms)

Now as far as the difference, and why a surprise removal works, depends on how the OS handles a disconnection, and there are 3 types.

  1. Sleep, where you simply go idle on the data lines, which causes a disconnection, but keeps the addresses.
  2. Host sends a reset burst
  3. Power loss.

What is happening with not processing the final packets is the first one... because it's not a power loss, and the PC simply thinks that it dropped into a sleep mode. Without simulating the D+/- line state of a physical disconnect, the only option is to process any left-over packets, which is what the loop does, even though you "are not supposed to do that".
in lieu of actually fiddling with the hardware, which may/may not end up larger as code size, playing nice works every time. Also this system does have some very nasty usb quirks where even if the disconnection is not a surprise, the kernel disables the host because it gets wedged. Sometimes it can be recovered via tickling some of the files in /sys to cause the kernel to preform a warm software reset of the chips, but there are some rare occasions that it takes a full powercycle/reset button to recover because even the BIOS gets stuck and won't see the usb keyboard.

So until I get to fiddling with the D+/- lines, a bandage is to do the for() loop. Yeah, it probably could be shorter and the iteration count I simply made up in order to just get it working.

@earlephilhower
Copy link
Owner

earlephilhower commented Feb 14, 2023

Sometimes it can be recovered via tickling some of the files in /sys to cause the kernel to preform a warm software reset of the chips, but there are some rare occasions that it takes a full powercycle/reset button to recover because even the BIOS gets stuck and won't see the usb keyboard.

I would suggest that anyone with a motherboard with such a messed up chipset/BIOS/etc. take it to the nearest electronics recycler and get a shiny Ryzen 5 series and mobo. :)

I'm a little worried about performing extreme unction, breaking some TinyUSB usage rules, and adding additional delays to every upload, ever, for that specific case because I'm not getting reports like that here. If it was something everyone was complaining about, sure. But this seems like just a buggy USB implementation on the host we're bending over backwards to fix at the cost of everyone else's experience.

Another option, not breaking TinyUSB contracts: make the USB message pump do the work for you:

static void usb_irq() {
// if the mutex is already owned, then we are in user code
// in this file which will do a tud_task itself, so we'll just do nothing
// until the next tick; we won't starve
if (mutex_try_enter(&__usb_mutex, nullptr)) {
tud_task();
mutex_exit(&__usb_mutex);
}
}

Make the "imreseting" flag global and in the CheckSerialReset() just set the flag and return. In the above function you can check for that global flag and, if set, do your add'l calls to tud_task, safely and correctly, and finally do the actual reset (or call another fcn to finish up). No recursive tud_task calls, everything gets processed, etc. You should also freeze the other core (see the RP2040 class) to ensure it's not trying to do, say, Serial writes from core1 while you're in the reset cycle.

The same setup in the FreeRTOS USB task is left as an exercise for the reader. :)

@xxxajk
Copy link
Contributor Author

xxxajk commented Feb 15, 2023

Sometimes it can be recovered via tickling some of the files in /sys to cause the kernel to preform a warm software reset of the chips, but there are some rare occasions that it takes a full powercycle/reset button to recover because even the BIOS gets stuck and won't see the usb keyboard.

I would suggest that anyone with a motherboard with such a messed up chipset/BIOS/etc. take it to the nearest electronics recycler and get a shiny Ryzen 5 series and mobo. :)

Actually, most developers don't consider this board as something to develop on because of the implied "gaming" in the name. Hardly anyone uses it on Linux, which is Ryzen 7, and better than 5. This system was purchased and gifted to me specifically because it has this problem on purpose in order to make stuff work with it because I have the hardware tools to see what is happening on the actual physical wires (See posted trace!). I even have problems with monitoring the UPS unless I have a USB 2.0 hub on any of the ports, internal, external, all of them wedge, or simply mysteriously reset without it, going into sleep on the host. Basically the fix is to use a very dumb USB2.0 hub that totally lacks the ability to go into sleep. Thankfully, this is actually optional in the USB spec and the hub keeps everything alive.

I'm a little worried about performing extreme unction, breaking some TinyUSB usage rules, and adding additional delays to every upload, ever, for that specific case because I'm not getting reports like that here. If it was something everyone was complaining about, sure. But this seems like just a buggy USB implementation on the host we're bending over backwards to fix at the cost of everyone else's experience.

Very few people usually report a problem, or are willing to or know how to or have the tools to fix bugs. If you go and look at the USB trace I provided, it CLEARLY shows what is happening on the WIRE, and that is the DTR/baud change is never replied back to the host, and the chipset I have, gets confused, since it is expecting one, and it sees a sleep state, and does not know how to properly see this as a disconnect error condition, or how to report it to the kernel. Kernel takes a long time (up to a a few minutes) to time out. 4sec v.s. several minutes, is an improvement. I agree it may be a little long, especially since the loop count was pulled out of the air.

Another option, not breaking TinyUSB contracts: make the USB message pump do the work for you:

static void usb_irq() {
// if the mutex is already owned, then we are in user code
// in this file which will do a tud_task itself, so we'll just do nothing
// until the next tick; we won't starve
if (mutex_try_enter(&__usb_mutex, nullptr)) {
tud_task();
mutex_exit(&__usb_mutex);
}
}

Make the "imreseting" flag global and in the CheckSerialReset() just set the flag and return. In the above function you can check for that global flag and, if set, do your add'l calls to tud_task, safely and correctly, and finally do the actual reset (or call another fcn to finish up). No recursive tud_task calls, everything gets processed, etc. You should also freeze the other core (see the RP2040 class) to ensure it's not trying to do, say, Serial writes from core1 while you're in the reset cycle.

The same setup in the FreeRTOS USB task is left as an exercise for the reader. :)

Exactly, as I said, was a bandage until I know better. :-)

@earlephilhower
Copy link
Owner

...Very few people usually report a problem...

🤣 If only! Believe me, enough folks let you know when stuff doesn't work! It's when things do work they're quiet like 🐭.

Anyway, feel free to throw in a few hundred tud_task loops in USB pumps for a new PR and I'll be happy to give it a spin.

schkovich pushed a commit to schkovich/arduino-pico that referenced this pull request May 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants