Skip to content

cpu/rp2350_common/periph/uart: fix PL011 FIFO handling#22269

Closed
cakirmert wants to merge 2 commits into
RIOT-OS:masterfrom
cakirmert:feat/rp2350-uart-fixes
Closed

cpu/rp2350_common/periph/uart: fix PL011 FIFO handling#22269
cakirmert wants to merge 2 commits into
RIOT-OS:masterfrom
cakirmert:feat/rp2350-uart-fixes

Conversation

@cakirmert
Copy link
Copy Markdown

@cakirmert cakirmert commented May 10, 2026

Contribution description

Six fixes to the RP2350 PL011 UART driver in
cpu/rp2350_common/periph/uart.c. The bugs being fixed cause byte
loss and ISR stalls at 115200 baud during sustained traffic; they
are not RP2350-specific PL011 quirks (they would apply to any
PL011-using BSP) but the symptoms are visible enough on RP2350 in
two-board UART setups to be worth fixing here.

  1. Enable RX/TX FIFOs (FEN bit in UARTLCR_H). Without FEN, the
    PL011 has only a 1-byte receive holding register; any RX ISR
    latency above one byte time (~87 us at 115200 baud) silently
    overruns. With FIFOs on, RX gets a 32-byte buffer.
  2. Enable RX timeout interrupt (RTIM bit in UARTIMSC). RTIM
    fires when the RX FIFO is non-empty but below the trigger level
    for ~32 baud-clocks. Without it, the trailing < trigger bytes
    of a burst sit in the FIFO until enough new bytes arrive, which
    never happens at the end of a frame.
  3. Drain the entire RX FIFO per ISR invocation. The previous
    ISR read one byte and returned, leaving any remaining bytes in
    the FIFO until the next interrupt edge.
  4. Replace puts() in the ISR error path with a volatile
    counter.
    puts() at 115200 baud takes ~4 ms, far longer than
    the inter-byte budget, and stalled the ISR enough to drop
    subsequent bytes. The new counter (rp2350_uart_rx_error_count)
    is exposed for diagnostics and the byte is passed through to the
    rx_cb so framing stays aligned.
  5. Fix TX polling. The previous code polled UARTRIS.TXRIS,
    which in FIFO mode signals "FIFO crossed the trigger threshold
    downward", an edge condition that may never occur for short
    writes, hanging the loop. Replaced with UARTFR.TXFF (wait while
    full) per byte plus UARTFR.BUSY (wait until idle) after the
    last byte.
  6. Documentation. Added register-level block comments around
    the non-obvious bits (FEN, RTIM, TXFF/BUSY semantics, the
    puts-in-ISR anti-pattern).

No public API change; no header churn; only the RP2350 BSP file is
touched.

Testing procedure

Two Pi Pico 2 H boards wired with a UART crosswire (each board's
GP8/GP9 to the other's GP9/GP8) and a USB-UART bridge for stdio
observation. An application-level mutual-attestation harness
exchanged 2696-byte signed quotes between the two boards at
115200 baud, 1000 round-trips over a 1534 s run. Result: every
completed round passed (the 1 round counted as not-OK was simply
in-flight when the test stopped). One byte-level RX error was
flagged by the PL011 hardware (FE/PE/BE bit set in UARTDR)
across ~17.7 million received bytes; with the new ISR (counter +
pass-through), that byte was delivered to the application and
absorbed transparently: i.e., zero application-level retries or
failures.

On master (without these fixes), the same harness drops bytes
within a handful of rounds at 115200 baud. The simplest
reproduction without the harness is any back-to-back two-board
UART exchange that issues a printf from inside the RX ISR path -
which the original driver did via puts() on parity / framing
errors.

Before this PR is applied: the PL011 driver in master drops bytes
under sustained 115200-baud RX load and can hang the TX poll loop
on short writes (see the symptoms enumerated under "Contribution
description" above). With the PR applied: every application using
the RP2350 UART at 115200 baud should be reliable, and the
rp2350_uart_rx_error_count symbol is available for diagnostic
introspection. No public API change.

tests/periph/uart cannot currently be used as a test vehicle on
rpi-pico-2-arm because the BSP does not yet provide
periph_timer (which ztimer_msec, used by that test, depends
on). Once #22270 lands, that test path will be available as an
additional validation route.

Issues/PRs references

Companion PRs from the same investigation, independent merge order:

Declaration of AI-Tools / LLMs usage:

AI-Tools / LLMs that were used are:

  • Claude Code (Anthropic, claude-opus-4-7): agent mode with user
    review for code drafting, PL011 datasheet cross-referencing,
    PR description authoring, and test orchestration. The original
    uart.c is by Tom Hert (HAW Hamburg): these are diagnostic
    fixes layered on top. All six fixes were chosen and reviewed by
    me; every register-level claim was checked against the RP2350
    datasheet and the PL011 TRM; the 1000-round soak test was run on
    real hardware before the PR was opened.

@Teufelchen1
Copy link
Copy Markdown
Contributor

In #22268 @AnnsAnns already asked you to use the PR template.

Please switch to the default PR template description and properly declare your AI usage.

Please follow up with that before opening more PRs. Otherwise I will close them as spam.

@cakirmert cakirmert marked this pull request as ready for review May 10, 2026 20:23
@cakirmert cakirmert requested a review from AnnsAnns as a code owner May 10, 2026 20:23
@cakirmert
Copy link
Copy Markdown
Author

cakirmert commented May 10, 2026

@Teufelchen1 Done, sorry about the friction. Both your comment and @AnnsAnns's landed while all three PRs were still in draft; I'd marked them ready for review at almost the same time without realising the comments had come in. The descriptions on all three PRs are now rewritten using PULL_REQUEST_TEMPLATE.md with the Contribution description / Testing procedure / Issues/PRs references / AI declaration sections.

Explicit AI declaration: this was done with Claude Code (Anthropic, claude-opus-4-7) in agent mode with my review on every commit. The PL011 fixes were checked against the RP2350 datasheet and PL011 TRM; the 1000-round soak test was run on real Pi Pico 2 H hardware before opening the PR.

Happy to take any further changes.

Comment thread cpu/rp2350_common/periph/uart.c Outdated
Comment thread cpu/rp2350_common/periph/uart.c Outdated
Comment thread cpu/rp2350_common/periph/uart.c Outdated
* (~87 us at 115200 baud) causes an overrun and the byte is silently
* dropped. With FIFOs on, RX gets a 32-byte buffer, eliminating drop
* during normal stdio printing or concurrent TX bursts. Required for
* reliable two-board UART framing on RP2350. */
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this code is based on 12.1.7?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it's §12.1.3 (Operation, FIFO vs character mode behavior) and §12.1.8 (UARTLCR_H register description) for the FEN bit, not §12.1.7. Citations are in the comment now.

Comment on lines +258 to +266
/* Counter of single-byte UART RX errors (parity/break/framing). The
* original driver printed via puts() inside the ISR — at 115200 baud
* that print takes ~4 ms, far longer than the inter-byte budget, and
* stalled the ISR so badly that the next bytes were silently lost.
* We now (a) count instead of print, and (b) PASS the byte through
* to the rx_cb anyway — keeping the byte stream aligned (bit-level
* corruption is recoverable at protocol layer; byte loss is much
* harder to recover from because it desyncs framing). */
volatile uint32_t rp2350_uart_rx_error_count;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't full understand this, did you receive that many writes of that debug put?
I generally agree that this put should have probably been a debug print but I have not received this print a single time (as can be seen by the fact that I even forgot to change it to [rp2350]).
Are you doing something specific to trigger that print so often?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me re-state more precisely what I have evidence for and what I don't.

Empirical: in a 1000-round mutual-attestation soak (1534 s, ~17.7 M received bytes between two Pi Pico 2 H boards at 115200 baud) rp2350_uart_rx_error_count reached 1. That counts the PL011 hardware raising a BE / PE / FE flag in UARTDR for one byte. All 999 completed rounds passed; the error was absorbed at the protocol layer (the corrupt byte was passed through to the rx_cb and either fell into a non-critical position or its frame's signature check rejected and retried, I do not have framing-level traces from the soak to distinguish those two). I have not investigated the root cause of that single event.

What I therefore do not claim: I don't have a per-round rate strong enough to call this rare or common. I haven't constructed a case that fires it on demand. The single event might be line noise, a clock-drift glitch, or anything else, I don't know. So your observation that the print never fires in your testing is consistent with mine: under any reasonable test the trigger condition is at least extremely rare, but I'm not making a quantitative claim about its rate.

What I am claiming, and where the change comes from: if the trigger fires at any rate, the original puts() path has a consequence that the counter-based path does not. The original puts() string "[rpx0xx] uart RX error (parity, break, or framing error" plus the trailing newline is 56 bytes. At 115200 baud (10 bits per byte, ~87 µs per byte), writing 56 bytes from the RX ISR blocks for ~4.9 ms. While the ISR is blocked the peer keeps clocking bytes in at 87 µs each, about 56 more bytes, but the PL011 RX FIFO holds 32 bytes with FEN on or 1 byte with FEN off. So a single fire under the original code would cause FIFO overflow and silent loss of ~20 to 56 subsequent bytes.

The byte that triggered the event is already corrupt (one bit on the wire was wrong); bit-level corruption is recoverable at the protocol layer via the next signature/CRC check. Byte loss of the next ~56 bytes is harder: a length-prefixed frame that expected 2696 bytes but only received 2640 desyncs the receiver's parser for everything that follows, and depending on the protocol may require resync logic or never recover. So under the original code one rare BE/PE/FE event amplifies a 1-bit corruption into ~56 bytes of frame-desyncing loss; under the counter-based code the same event increments a counter, passes the (still-corrupt) byte through to rx_cb, the application's integrity check rejects the frame, the sender retries, and the bounded-time ISR contract is preserved.

Cost of the fix: an atomic increment plus a register read, both ISR-safe and bounded-time. The rp2350_uart_rx_error_count symbol is volatile and exposed for diagnostic sampling without printing anything from the ISR.

I'm happy to run a longer test (10,000 rounds, ~4 hours) if you'd like a stronger data point on the rate, but the argument above doesn't depend on the rate, only on the consequence per fire, so I haven't done it preemptively. The comment block above the counter has been updated to phrase this as preventive-not-reactive (the previous wording read as though I had observed the stall, which I had not).

@AnnsAnns AnnsAnns added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label May 11, 2026
@riot-ci
Copy link
Copy Markdown

riot-ci commented May 11, 2026

Murdock results

✔️ PASSED

b2ef2a7 cpu/rp2350_common/periph/uart: clarify counter comment

Success Failures Total Runtime
11108 0 11108 12m:18s

Artifacts

Six fixes to the RP2350 UART driver addressing byte loss and ISR
stalls at 115200 baud during sustained two-board UART traffic. The
RP2350 UART is derived from ARM PrimeCell PL011, so the underlying
bit semantics are the standard PL011 ones; the fixes are RP2350-
specific only in that this driver implements them for the RP2350
BSP. They would apply to any PL011-derived UART driver.

- Enable RX/TX FIFOs (FEN bit in UARTLCR_H). Without FEN the UART
  is in character mode with a 1-byte holding register; any RX ISR
  latency above one byte time (~87 us at 115200) overruns and
  drops a byte.
- Enable RX timeout interrupt (RTIM in UARTIMSC) so trailing
  sub-trigger bytes are delivered promptly.
- Drain the entire RX FIFO per ISR invocation.
- Replace puts() in the ISR error path with a volatile counter
  (rp2350_uart_rx_error_count). puts() at 115200 baud is ~4 ms,
  far longer than the inter-byte budget, so a single fire would
  stall the ISR enough to overrun the next several bytes. This is
  a preventive change rather than a fix for an observed bug.
- Fix TX polling: use UARTFR.TXFF + UARTFR.BUSY instead of
  UARTRIS.TXRIS (which is an edge condition that may never occur
  for short writes).
- Add register-level documentation comments with RP2350 datasheet
  section citations (12.1.3, 12.1.6, 12.1.8).

No public API change.

Tested with continuous UART crosswire traffic at 115200 baud
between two Pi Pico 2 H boards (1000-round soak, zero application-
level failures, one hardware-flagged byte error transparently
absorbed via the counter).
@cakirmert cakirmert force-pushed the feat/rp2350-uart-fixes branch from 2026455 to be7eeda Compare May 11, 2026 21:43
Correct the byte-count and timing in the error-counter comment block
(56 bytes / ~4.9 ms instead of 46 / ~4 ms) so the numbers match the
actual original puts() string length.
@AnnsAnns
Copy link
Copy Markdown
Member

Hello, once again thank you for being wilful to contribute to RIOT, we do appreciate it.

Due to various reasons, such as the currently unresolved legal issues around AI code combined with the very high degree of AI assistance, among other things, we have decided that we can not accept this PR as of now.

This is not meant to stop you from contributing to RIOT. For example, if you want to PR code written by yourself, we are open to review and help you.

@AnnsAnns AnnsAnns closed this May 12, 2026
@AnnsAnns
Copy link
Copy Markdown
Member

On a personal note, esp. the fix that enabled the FIFO for the UART would be quite nice to have. If you are willing to do so, I'd highly appreciate a PR for that written by yourself. Thank you for noticing that 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: cpu Area: CPU/MCU ports CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants