Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition bugs when reading and writing to message buffers #264

Merged
merged 11 commits into from
Mar 20, 2021

Conversation

milesfrain
Copy link
Contributor

Description of original issue

There are a few sections of message buffer code that are vulnerable to race conditions. The underlying cause is that length and data bytes are written and read as two distinct operations, which both modify the size of the buffer. If a context switch occurs after adding or removing the length bytes, but before adding or removing the data bytes, then another task may observe the message buffer in this invalid state, and could further corrupt the contents with another buffer-modifying operation.

Here are some specific examples of how things can go wrong:

Corruption due to Send in the middle of a Receive:

// Initial condition:
// 50-byte buffer contains a 20-byte message, plus 4-byte length overhead

// Task 1:
xStreamBufferReceive // to 10-byte destination
  prvReadMessageFromBuffer
    prvReadBytesFromBuffer // get 4-byte length, shifts tail, buffer now appears to have 30 bytes free space
    // Check if 10-byte destination is large enough for 20-byte message. It is not.
    // Context switch

// Task 2:
xStreamBufferSend // 25-byte payload, plus 4-byte length overhead
  // Sees 30 bytes free, so writes 29-byte message.
  // There should only be 26 bytes of free space.
// Context switch

// Task 1:
    // Continue from context switch.
    // Restore original tail position because destination is too small.
    // 3 of 4 length field bytes at the original tail position now contain overwritten data from Task 2.
    // Buffer is corrupted.

Corruption due to Send in the middle of a NextMessageLength:

// Initial condition:
// Buffer has 30 bytes free space

// Task 1:
xStreamBufferNextMessageLengthBytes
  prvReadBytesFromBuffer // consumes 4-byte length field and makes buffer appear to have 34 bytes free space
  // Not yet called // reset tail position so buffer has original 30 bytes free space

-- Context Switch --

// Task 2:
// Attempt to send a 32-byte (with overhead) packet
xStreamBufferSend
  // Sees 34 spaces available
  // Writes packet and clobbers 2-bytes of off-limits space

-- Context Switch --

// Task 1:
  // Continue from context switch.
  // Reset tail position
  // Now the next 'length' field contains 2-bytes of garbage data

Crash due to NextMessageLength in the middle of a Send:

// Initial condition:
// Buffer is empty

// Task 1:
xStreamBufferSend
  prvWriteMessageToBuffer
    prvWriteBytesToBuffer // write 4-byte length field
    // Not yet called // prvWriteBytesToBuffer // write n-byte payload

-- Context Switch --

// Task 2:
xStreamBufferNextMessageLengthBytes
  prvBytesInBuffer // just returns 4
  // Crash because expecting 5 or more bytes (at least 1 payload byte)

Description of changes

The fix is to ensure that the buffer's xHead and xTail pointers are only modified after both length and payload are either written or read. This makes the read and write operations appear to be atomic, even if they are interrupted with a context switch.

This also simplifies some code, such as in xStreamBufferReceive and xStreamBufferNextMessageLengthBytes, where we don't have to roll-back to an original xTail if we're not ready to actually consume the data.

I also included some minor clean-up tasks, which I hope are clearly described in the individual commits of this PR, but I could alternatively break those out into separate PRs if you'd prefer.

This change does not modify any public APIs.

Test Steps

I'm able to consistently reproduce the third (crashing) issue after a few seconds of execution on an STM32 uC. I'm running at high USB data rates and using this snippet to create large data blocks to transmit. Here's a snippet of that code, which makes many calls to xMessageBufferNextLengthBytes():

    // Keep attempting to pack more messages into single USB transfer
    while (1) {
      // Check for additional messages to read
      next_msg_len = xMessageBufferNextLengthBytes(txMsgBuf.handle);
      // If there's another message to read and we have space for it
      if (next_msg_len && txLen + next_msg_len <= sizeof(txBuf)) {
        // Grab next message with no timeout
        size_t next_rec_len = xMessageBufferReceive(txMsgBuf.handle, txBuf + txLen, sizeof(txBuf) - txLen, 0);
        // Sanity check that we read the expected number of bytes
        if (next_rec_len != next_msg_len) {
          critical();
        }
        // Note consumed bytes
        txLen += next_rec_len;
      } else {
        // Either no more messages, or no more space
        break;
      }
    }

    // Setup transmit
    USBD_CDC_SetTxBuffer(&usbDeviceHandle, txBuf, txLen);

With the changes in this PR, I am not able to reproduce the issue, even after running for an hour+.


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@milesfrain milesfrain requested a review from a team as a code owner February 19, 2021 01:32
stream_buffer.c Outdated Show resolved Hide resolved
stream_buffer.c Outdated Show resolved Hide resolved
@RichardBarry
Copy link
Contributor

Thanks - looking at this now. First thing I need to do is re-produce the original issue.

@RichardBarry
Copy link
Contributor

Adding the following to the soak test finds the issue very quickly:

static void prvSpaceAvailableCoherenceActor( void *pvParameters )
{
static char *cTxString = "12345";
char cRxString[ mbCOHERENCE_TEST_BYTES_WRITTEN + 1 ]; /* +1 for NULL terminator. */

    ( void ) pvParameters;

    for( ;; )
    {
        /* Add bytes to the buffer so the other task should see
        mbEXPECTED_FREE_BYTES_AFTER_WRITING_STRING bytes free. */
        xMessageBufferSend( xCoherenceTestMessageBuffer, ( void * ) cTxString, strlen( cTxString ), 0 );
        configASSERT( xMessageBufferSpacesAvailable( xCoherenceTestMessageBuffer ) == mbEXPECTED_FREE_BYTES_AFTER_WRITING_STRING );

        /* Read out message again so the other task should read the full
        mbCOHERENCE_TEST_BUFFER_SIZE bytes free again. */
        memset( ( void * ) cRxString, 0x00, sizeof( cRxString ) );
        xMessageBufferReceive( xCoherenceTestMessageBuffer, ( void * ) cRxString, mbCOHERENCE_TEST_BYTES_WRITTEN, 0 );
        configASSERT( strcmp( cTxString, cRxString ) == 0 );
    }
}
/*-----------------------------------------------------------*/

static void prvSpaceAvailableCoherenceTester( void *pvParameters )
{
size_t xSpaceAvailable;
uint32_t ulSuccessCount = 0UL;
BaseType_t xErrorFound = pdFALSE;

    ( void ) pvParameters;

    for( ;; )
    {
        /* This message buffer is only ever empty or contains 5 bytes.  So all
        queries of its free space should result in one of the two values tested
        below. */
        xSpaceAvailable = xMessageBufferSpacesAvailable( xCoherenceTestMessageBuffer );

        if( ( xSpaceAvailable == mbCOHERENCE_TEST_BUFFER_SIZE ) ||
            ( xSpaceAvailable == mbEXPECTED_FREE_BYTES_AFTER_WRITING_STRING ) )
        {
            ulSuccessCount++;
        }
        else
        {
            xErrorFound = pdTRUE;
        }

        configASSERT( xErrorFound == pdFALSE );
    }
}

RichardBarry added a commit to RichardBarry/FreeRTOS that referenced this pull request Feb 20, 2021
…le in a message buffer from two separate tasks. Designed to highlight the issue reported in FreeRTOS/FreeRTOS-Kernel#264

Introduce configRUN_ADDITIONAL_TESTS which must be set to 1 to run the new tests.  That is because the new tests got added to an existing standard demo file and smaller platforms may not have the resources to run them.
Set configRUN_ADDITIONAL_TESTS to 1 in the MSVC and IAR/QEMU project so both project run the new test.
Also add missing 'volatile' qualifier in the IAR/QEMU project on some register accesses.
RichardBarry added a commit to RichardBarry/FreeRTOS that referenced this pull request Feb 20, 2021
…le in a message buffer from two separate tasks. Designed to highlight the issue reported in FreeRTOS/FreeRTOS-Kernel#264

Introduce configRUN_ADDITIONAL_TESTS which must be set to 1 to run the new tests. That is because the new tests got added to an existing standard demo file and smaller platforms may not have the resources to run them.

Set configRUN_ADDITIONAL_TESTS to 1 in the MSVC and IAR/QEMU project so both project run the new test. Also add missing 'volatile' qualifier on some register accesses in the same file.
@RichardBarry
Copy link
Contributor

I have tests for the first reported condition now - unfortunately this PR doesn't get as far as the new test as it causes an earlier test to fail. Looking into it now.

cobusve pushed a commit to cobusve/FreeRTOS that referenced this pull request Feb 21, 2021
…le in a message buffer from two separate tasks. Designed to highlight the issue reported in FreeRTOS/FreeRTOS-Kernel#264

Introduce configRUN_ADDITIONAL_TESTS which must be set to 1 to run the new tests.  That is because the new tests got added to an existing standard demo file and smaller platforms may not have the resources to run them.
Set configRUN_ADDITIONAL_TESTS to 1 in the MSVC and IAR/QEMU project so both project run the new test.
Also add missing 'volatile' qualifier in the IAR/QEMU project on some register accesses.
Copy link
Contributor

@RichardBarry RichardBarry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a few comments (potential bugs) and a question. The question is about the addition of a configMIN() on line 1010. If you can let me know the purpose of that I can make the other changes, or you can as you prefer.

It is also necessary to update xStreamBufferSpacesAvailable(), see the comment in the code below.

With all these changes the old and new tests pass.

size_t xStreamBufferSpacesAvailable( StreamBufferHandle_t xStreamBuffer )
{
    const StreamBuffer_t * const pxStreamBuffer = xStreamBuffer;
    size_t xSpace;
    volatile size_t xOriginalTail;

    configASSERT( pxStreamBuffer );

    /* The code below reads xTail and then xHead.  This is safe if the stream
    buffer is updated once between the two reads - but not if the stream buffer
    is updated more than once between the two reads - hence the loop. */
    do
    {
        xOriginalTail = pxStreamBuffer->xTail;
        xSpace = pxStreamBuffer->xLength + pxStreamBuffer->xTail;
        xSpace -= pxStreamBuffer->xHead;
    } while( xOriginalTail != pxStreamBuffer->xTail );
    xSpace -= ( size_t ) 1;

    if( xSpace >= pxStreamBuffer->xLength )
    {
        xSpace -= pxStreamBuffer->xLength;
    }
    else
    {
        mtCOVERAGE_TEST_MARKER();
    }

    return xSpace;
}

stream_buffer.c Outdated Show resolved Hide resolved
stream_buffer.c Outdated Show resolved Hide resolved
stream_buffer.c Outdated Show resolved Hide resolved
Original author: RichardBarry
- prvWriteMessageToBuffer
- prvReadMessageFromBuffer
- prvWriteBytesToBuffer
- prvReadBytesFromBuffer
stream_buffer.c Outdated Show resolved Hide resolved
stream_buffer.c Outdated
Comment on lines 1210 to 1202
if( xCount > ( size_t ) 0 )
{
configASSERT( xCount != ( size_t ) 0 );
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The non-zero xCount check is performed by all callers, so configASSERT seems more appropriate here, and lets us drop a level of indentation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this one. It is conceivable a user would pass xBufferLengthBytes as zero into prvReadMessageFromBuffer() at run time - in which case it should recover gracefully. Also in that case it would err if configASSERT() was not defined. While it would seem strange to pass xBufferLengthBytes as zero it might happen if, for example, a user passed xBufferLengthBytes as the free space in a buffer and that buffer were to become full.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is conceivable a user would pass xBufferLengthBytes as zero into prvReadMessageFromBuffer() at run time

Are we talking about prvReadMessageFromBuffer() or prvReadBytesFromBuffer()? In either case, the user can't call these private helper functions directly, and any attempt to pass a zero length into these functions is guarded against by existing checks. No guarantees that future code edits preserve all these checks though.

stream_buffer.c Outdated Show resolved Hide resolved
stream_buffer.c Outdated
size_t xNextHead = pxStreamBuffer->xHead;

if( xSpace == ( size_t ) 0 )
if( xDataLengthBytes != xRequiredSpace )
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could alternatively rewrite this line as:

if( ( pxStreamBuffer->ucFlags & sbFLAGS_IS_MESSAGE_BUFFER ) != ( uint8_t ) 0 )

They're currently equivalent. The noisier version might be clearer or safer long-term.

@milesfrain
Copy link
Contributor Author

I applied all review feedback from your comments, and hopefully answered all questions.
For clarity, I made separate commits for:

  • Your xStreamBufferSpacesAvailable fix.
  • Minimal set of review feedback bug fixes.
  • Additional refactoring. All of these changes should have no impact on functionality.

Also added a few clarifying comments and some questions of my own.

stream_buffer.c Outdated
{
mtCOVERAGE_TEST_MARKER();
}
configASSERT( xCount != ( size_t ) 0 );
Copy link
Contributor Author

@milesfrain milesfrain Feb 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why my other review comment on this line is being marked as "outdated". Also, github's web viewer gets confused with this section unless you enable the "hide whitespace changes" diff setting.

@milesfrain milesfrain requested a review from RichardBarry March 2, 2021 21:14
@milesfrain
Copy link
Contributor Author

milesfrain commented Mar 18, 2021

@RichardBarry Is there anything I can do to help move this along?

I was holding-off on rebasing to avoid modifying commits that have already been reviewed, but happy to do that if it would be helpful.

@RichardBarry
Copy link
Contributor

Sorry - dropped the ball - will look tomorrow (it's 20.45 here now).

@RichardBarry
Copy link
Contributor

Just an update. Been looking at this today. Closed off some of the conversations. Will run tests again as I have it now, then look at some of the additional changes.

RichardBarry added a commit to FreeRTOS/FreeRTOS that referenced this pull request Mar 20, 2021
* Introduce tasks that test the coherency of the reported space available in a message buffer from two separate tasks.  Designed to highlight the issue reported in FreeRTOS/FreeRTOS-Kernel#264
Introduce configRUN_ADDITIONAL_TESTS which must be set to 1 to run the new tests.  That is because the new tests got added to an existing standard demo file and smaller platforms may not have the resources to run them.
Set configRUN_ADDITIONAL_TESTS to 1 in the MSVC and IAR/QEMU project so both project run the new test.
Also add missing 'volatile' qualifier in the IAR/QEMU project on some register accesses.

* Update xAreMessageBufferTasksStillRunning() to report errors from the new message buffer size coherency tests.

Co-authored-by: RichardBarry <ribarry@amazon.com>
Co-authored-by: RichardBarry <3073890+RichardBarry@users.noreply.github.com>
RichardBarry
RichardBarry previously approved these changes Mar 20, 2021
@RichardBarry RichardBarry merged commit 6685c04 into FreeRTOS:main Mar 20, 2021
RichardBarry added a commit to RichardBarry/FreeRTOS-Kernel that referenced this pull request Mar 20, 2021
@RichardBarry
Copy link
Contributor

Please see #286 which fixes an err in this PR.

@milesfrain milesfrain deleted the atomic-msgbuf branch March 21, 2021 22:59
@milesfrain
Copy link
Contributor Author

FYI for any future readers wondering why this PR rewrites the entire file, the last commit changed line endings from dos to unix. Currently, all files in the repo use dos line endings, except for the following files which use unix endings:

stream_buffer.c
include/event_groups.h
include/queue.h
include/task.h
include/timers.h

Would it be worth adding a CI step to check for line endings and other formatting issues? Looks like there's a proposal for that in #274.

aggarg added a commit to aggarg/FreeRTOS that referenced this pull request Apr 2, 2021
The change in FreeRTOS/FreeRTOS-Kernel#264
updated prvWriteMessageToBuffer in a way that it no longer calls
prvWriteBytesToBuffer for messages of zero size (prvWriteBytesToBuffer
triggers an assert for messages of zero size). As a result, a zero byte
send to a stream buffer no longer results in assert failure but instead
returns 0. This change updates the zero byte send test to check for 0
return value.

Signed-off-by: Gaurav Aggarwal <aggarg@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants