Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drivers: serial: Fix race condition in nRF5 UART TX #7431

Merged

Conversation

durub
Copy link

@durub durub commented May 9, 2018

Fix a somewhat rare race condition when the thread gets
preempted in the middle of sending a byte through UART.

If the other thread also sends another byte through UART
and "consumes" the EVENTS_TXDRDY value, the first thread
will get stuck in the while loop forever.

By moving the reset to the function start, we guarantee that
the baseline state of EVENTS_TXRDY is 1. Therefore, the first
thread will continue normally when it executes again.

Signed-off-by: Thiago Silveira thiago@exati.com.br

@durub
Copy link
Author

durub commented May 9, 2018

A little tricky to reproduce, and I don't have an easy test case on hand. I was reproducing this on our own project. At first I tried to lock the scheduler in a printk-heavy area of the code, and that fixed the issue. I investigated further and pinpointed it to the uart_nrf5_poll_out while loop.

I also found that RIOT-OS had the same issue in the past, with the exact same fix: RIOT-OS/RIOT#5954

@durub
Copy link
Author

durub commented May 9, 2018

I also removed the documentation part about checking if the transmitter is empty. This doesn't seem to be done at the moment or before. Maybe writing to uart->TXD does it implicitly, but reading the datasheet, it doesn't appear to have this information readily available.

Thoughts?

cc @carlescufi @cvinayak

@codecov-io
Copy link

codecov-io commented May 9, 2018

Codecov Report

Merging #7431 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #7431   +/-   ##
=======================================
  Coverage   55.25%   55.25%           
=======================================
  Files         468      468           
  Lines       51673    51673           
  Branches     9893     9893           
=======================================
  Hits        28554    28554           
  Misses      19206    19206           
  Partials     3913     3913

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b56b143...af6de18. Read the comment docs.

@carlescufi carlescufi requested a review from anangl May 9, 2018 14:25
Copy link
Member

@carlescufi carlescufi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objections from me, but will let @anangl take a look at it before I approve.

@anangl
Copy link
Member

anangl commented May 10, 2018

The patch is okay.
From what I see, it fixes the handling of the TXDRDY event in general. Currently, the event is cleared at the exit of _poll_out, and this is not entirely good since the same event is also used by "interrupt driven" API to provide TX status (_tx_complete and _tx_ready). In case someone decided to combine usage of both APIs (what is not forbidden, I guess), this would mean troubles.

Regarding the removed documentation part - indeed, the current implementation does not do exactly what UART API states it should do. Actually, it should wait until the transmitter becomes empty before writing a new character there, i.e. do it like this:

 	/* Wait for transmitter to be ready */
 	while (!uart->EVENTS_TXDRDY) {
 	}

	/* reset transmitter ready state */
	uart->EVENTS_TXDRDY = 0;

 	/* send a character */
 	uart->TXD = (u8_t)c;

But then, we would fall into a similar issue. When the thread is preempted between clearing the TXDRDY event and writing to TXD, then the preempting thread would hang on the while loop if it tried to call _poll_out as well.

Therefore, I think the patch is the best what we can do to provide "thread safety" here without introducing some locking.
Maybe it would be worth mentioning in the code that the while loop has to be placed after the write to TXD (and consequently, _poll_out can't wait for empty transmitter right before writing to TXD) to avoid the rare yet possible race condition.

@carlescufi
Copy link
Member

@anangl thanks for the thorough explanation.
@durub mind adding a comment in the code reflecting what @anangl mentions?

@durub durub force-pushed the uart-nrf5-poll-out-rc-fix branch 3 times, most recently from 132d0ac to af6de18 Compare May 15, 2018 16:53
Fix a somewhat rare race condition when the thread gets
preempted in the middle of sending a byte through UART.

If the other thread also sends another byte through UART
and "consumes" the EVENTS_TXDRDY value, the first thread
will get stuck in the while loop forever.

By moving the reset to the function start, we guarantee that
the baseline state of EVENTS_TXRDY is 1. Therefore, the first
thread will continue normally when it executes again.

Signed-off-by: Thiago Silveira <thiago@exati.com.br>
@durub
Copy link
Author

durub commented May 15, 2018

I chatted with @anangl over IRC and we agreed on the comment in the commit.

@carlescufi, I think this is ready now! @anangl, thanks for the explanation and help.

@galak galak added area: UART Universal Asynchronous Receiver-Transmitter platform: nRF Nordic nRFx labels May 16, 2018
@carlescufi carlescufi merged commit 05c45e3 into zephyrproject-rtos:master May 16, 2018
@durub durub deleted the uart-nrf5-poll-out-rc-fix branch May 16, 2018 16:15
@leodesigner
Copy link

Hi,
After this commit the shell is broken. (nrf52 custom board)
Just no reaction to any command at all.

***** Booting Zephyr OS 1.11.99 *****
Zephyr Shell, Zephyr version: 1.11.99
Type 'help' for a list of available commands
shell> 
help
help

Before this commit shell is responding to the commands.
Should I open the issue ?

@carlescufi
Copy link
Member

@leodesigner I can reproduce, let me open the issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: UART Universal Asynchronous Receiver-Transmitter platform: nRF Nordic nRFx
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants