-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race condition bugs when reading and writing to message buffers #264
Conversation
Thanks - looking at this now. First thing I need to do is re-produce the original issue. |
Adding the following to the soak test finds the issue very quickly:
|
…le in a message buffer from two separate tasks. Designed to highlight the issue reported in FreeRTOS/FreeRTOS-Kernel#264 Introduce configRUN_ADDITIONAL_TESTS which must be set to 1 to run the new tests. That is because the new tests got added to an existing standard demo file and smaller platforms may not have the resources to run them. Set configRUN_ADDITIONAL_TESTS to 1 in the MSVC and IAR/QEMU project so both project run the new test. Also add missing 'volatile' qualifier in the IAR/QEMU project on some register accesses.
…le in a message buffer from two separate tasks. Designed to highlight the issue reported in FreeRTOS/FreeRTOS-Kernel#264 Introduce configRUN_ADDITIONAL_TESTS which must be set to 1 to run the new tests. That is because the new tests got added to an existing standard demo file and smaller platforms may not have the resources to run them. Set configRUN_ADDITIONAL_TESTS to 1 in the MSVC and IAR/QEMU project so both project run the new test. Also add missing 'volatile' qualifier on some register accesses in the same file.
I have tests for the first reported condition now - unfortunately this PR doesn't get as far as the new test as it causes an earlier test to fail. Looking into it now. |
…le in a message buffer from two separate tasks. Designed to highlight the issue reported in FreeRTOS/FreeRTOS-Kernel#264 Introduce configRUN_ADDITIONAL_TESTS which must be set to 1 to run the new tests. That is because the new tests got added to an existing standard demo file and smaller platforms may not have the resources to run them. Set configRUN_ADDITIONAL_TESTS to 1 in the MSVC and IAR/QEMU project so both project run the new test. Also add missing 'volatile' qualifier in the IAR/QEMU project on some register accesses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left a few comments (potential bugs) and a question. The question is about the addition of a configMIN() on line 1010. If you can let me know the purpose of that I can make the other changes, or you can as you prefer.
It is also necessary to update xStreamBufferSpacesAvailable(), see the comment in the code below.
With all these changes the old and new tests pass.
size_t xStreamBufferSpacesAvailable( StreamBufferHandle_t xStreamBuffer )
{
const StreamBuffer_t * const pxStreamBuffer = xStreamBuffer;
size_t xSpace;
volatile size_t xOriginalTail;
configASSERT( pxStreamBuffer );
/* The code below reads xTail and then xHead. This is safe if the stream
buffer is updated once between the two reads - but not if the stream buffer
is updated more than once between the two reads - hence the loop. */
do
{
xOriginalTail = pxStreamBuffer->xTail;
xSpace = pxStreamBuffer->xLength + pxStreamBuffer->xTail;
xSpace -= pxStreamBuffer->xHead;
} while( xOriginalTail != pxStreamBuffer->xTail );
xSpace -= ( size_t ) 1;
if( xSpace >= pxStreamBuffer->xLength )
{
xSpace -= pxStreamBuffer->xLength;
}
else
{
mtCOVERAGE_TEST_MARKER();
}
return xSpace;
}
Original author: RichardBarry
- prvWriteMessageToBuffer - prvReadMessageFromBuffer - prvWriteBytesToBuffer - prvReadBytesFromBuffer
stream_buffer.c
Outdated
if( xCount > ( size_t ) 0 ) | ||
{ | ||
configASSERT( xCount != ( size_t ) 0 ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The non-zero xCount
check is performed by all callers, so configASSERT
seems more appropriate here, and lets us drop a level of indentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this one. It is conceivable a user would pass xBufferLengthBytes as zero into prvReadMessageFromBuffer() at run time - in which case it should recover gracefully. Also in that case it would err if configASSERT() was not defined. While it would seem strange to pass xBufferLengthBytes as zero it might happen if, for example, a user passed xBufferLengthBytes as the free space in a buffer and that buffer were to become full.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is conceivable a user would pass xBufferLengthBytes as zero into prvReadMessageFromBuffer() at run time
Are we talking about prvReadMessageFromBuffer() or prvReadBytesFromBuffer()? In either case, the user can't call these private helper functions directly, and any attempt to pass a zero length into these functions is guarded against by existing checks. No guarantees that future code edits preserve all these checks though.
stream_buffer.c
Outdated
size_t xNextHead = pxStreamBuffer->xHead; | ||
|
||
if( xSpace == ( size_t ) 0 ) | ||
if( xDataLengthBytes != xRequiredSpace ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could alternatively rewrite this line as:
if( ( pxStreamBuffer->ucFlags & sbFLAGS_IS_MESSAGE_BUFFER ) != ( uint8_t ) 0 )
They're currently equivalent. The noisier version might be clearer or safer long-term.
I applied all review feedback from your comments, and hopefully answered all questions.
Also added a few clarifying comments and some questions of my own. |
stream_buffer.c
Outdated
{ | ||
mtCOVERAGE_TEST_MARKER(); | ||
} | ||
configASSERT( xCount != ( size_t ) 0 ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why my other review comment on this line is being marked as "outdated". Also, github's web viewer gets confused with this section unless you enable the "hide whitespace changes" diff setting.
@RichardBarry Is there anything I can do to help move this along? I was holding-off on rebasing to avoid modifying commits that have already been reviewed, but happy to do that if it would be helpful. |
Sorry - dropped the ball - will look tomorrow (it's 20.45 here now). |
Just an update. Been looking at this today. Closed off some of the conversations. Will run tests again as I have it now, then look at some of the additional changes. |
* Introduce tasks that test the coherency of the reported space available in a message buffer from two separate tasks. Designed to highlight the issue reported in FreeRTOS/FreeRTOS-Kernel#264 Introduce configRUN_ADDITIONAL_TESTS which must be set to 1 to run the new tests. That is because the new tests got added to an existing standard demo file and smaller platforms may not have the resources to run them. Set configRUN_ADDITIONAL_TESTS to 1 in the MSVC and IAR/QEMU project so both project run the new test. Also add missing 'volatile' qualifier in the IAR/QEMU project on some register accesses. * Update xAreMessageBufferTasksStillRunning() to report errors from the new message buffer size coherency tests. Co-authored-by: RichardBarry <ribarry@amazon.com> Co-authored-by: RichardBarry <3073890+RichardBarry@users.noreply.github.com>
Please see #286 which fixes an err in this PR. |
FYI for any future readers wondering why this PR rewrites the entire file, the last commit changed line endings from
Would it be worth adding a CI step to check for line endings and other formatting issues? Looks like there's a proposal for that in #274. |
The change in FreeRTOS/FreeRTOS-Kernel#264 updated prvWriteMessageToBuffer in a way that it no longer calls prvWriteBytesToBuffer for messages of zero size (prvWriteBytesToBuffer triggers an assert for messages of zero size). As a result, a zero byte send to a stream buffer no longer results in assert failure but instead returns 0. This change updates the zero byte send test to check for 0 return value. Signed-off-by: Gaurav Aggarwal <aggarg@amazon.com>
Description of original issue
There are a few sections of message buffer code that are vulnerable to race conditions. The underlying cause is that length and data bytes are written and read as two distinct operations, which both modify the size of the buffer. If a context switch occurs after adding or removing the length bytes, but before adding or removing the data bytes, then another task may observe the message buffer in this invalid state, and could further corrupt the contents with another buffer-modifying operation.
Here are some specific examples of how things can go wrong:
Corruption due to Send in the middle of a Receive:
Corruption due to Send in the middle of a NextMessageLength:
Crash due to NextMessageLength in the middle of a Send:
Description of changes
The fix is to ensure that the buffer's
xHead
andxTail
pointers are only modified after both length and payload are either written or read. This makes the read and write operations appear to be atomic, even if they are interrupted with a context switch.This also simplifies some code, such as in
xStreamBufferReceive
andxStreamBufferNextMessageLengthBytes
, where we don't have to roll-back to an originalxTail
if we're not ready to actually consume the data.I also included some minor clean-up tasks, which I hope are clearly described in the individual commits of this PR, but I could alternatively break those out into separate PRs if you'd prefer.
This change does not modify any public APIs.
Test Steps
I'm able to consistently reproduce the third (crashing) issue after a few seconds of execution on an STM32 uC. I'm running at high USB data rates and using this snippet to create large data blocks to transmit. Here's a snippet of that code, which makes many calls to
xMessageBufferNextLengthBytes()
:With the changes in this PR, I am not able to reproduce the issue, even after running for an hour+.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.