Skip to content

Missing activity due to invalid nibble encoding in ActivityTracker.AddIdToGuid for some 12 bits values in EventPipe ActivityID generation #109078

Closed
@chrisnas

Description

@chrisnas

Description

The ActivityTracker class is used in .NET Core/5+ to encode activities into a GUID, 4 bits per 4 bits (i.e. nibble). The AddIdToGuid method is doing the encoding work for a given value into a certain position in a GUID. The bug in the encoding is generating an unexpected 00 byte that stops the decoding (done by Perfview for example)

Reproduction Steps

Use this test application to encode and decode an activity path/ID.
The source code used by this tool is the same (+ fix) as the one used by ActivityTracker and Perfview

Expected behavior

1/4/2000/1 would be stored as 14 c7 d0 00 10 00 instead of 14 c7 d0 10 00.

Actual behavior

There is a special case when a value can be stored in 12 bit (i.e. 3 nibbles) but the number of bytes to store in the GUID is still 2 (as computed for a 0xFF < value <= 0xFFF). This is a problem when the high bits of the value can be stored as the second nibble of the byte containing the extension code. The end result is the storage of 00 as the second byte of the value.
This does not happen when the value is stored in a new byte such as 1/2000/1 is stored as 1d d0 07 10 00

Regression?

I did not try with events generated for ETW

Known Workarounds

There is no workaround today but the fix is simple in ActivityTracker.AddIdToGuid:

WriteNibble(ref ptr, endPtr, (uint)NumberListCodes.MultiByte1 + (len - 1));

// Do we have an odd nibble?   If so flush it or use it for the 12 byte case.
if (ptr < endPtr && *ptr != 0)
{
    // If the value < 4096 we can use the nibble we are otherwise just outputting as padding.
    if (id < 4096)
    {
        // Indicate this is a 1 byte multicode with 4 high order bits in the lower nibble.
        *ptr = (byte)(((uint)NumberListCodes.MultiByte1 << 4) + (id >> 8));

        // FIX: it means that we now just need 1 byte to store the id instead of 2 as computed before
        //      --> the previous line is overwriting the "NumberListCodes.MultiByte1 + (len - 1)" value
        //          with NumberListCodes.MultiByte1 followed by the 4 high order bits of the id
        //      the 00 byte was due to the fact that the "id >>= 8;" line was leading to id = 0
        //      that was stored in the additional unneeded byte
        len = 1;

        id &= 0xFF;     // Now we only want the low order bits.
    }
    ptr++;
}

The len value is then used to store the rest of the value (i.e. the high 4 bits were already stored in the low nibble)

// Write out the bytes.
while (0 < len)
{
    if (endPtr <= ptr)
    {
        ptr++;        // Indicate that we have overflowed
        break;
    }
    *ptr++ = (byte)id;
    id >>= 8;
    --len;
}

but since there is 1 unneeded byte to store, id >>= 8 set id to 0 and store it into the last byte

Configuration

.NET 8
Windows 11
x64
But not specific to any configuration

Other information

The bug has been found while analyzing network events (HTTP, socket, security, DNS) generated by a web application that send 2000 requests when receiving 1 request. The final activity of some requests was missing in some cases:

 23360 | 00d0c714-0000-0000-0000-00003d2d6f5a =       /1/4/2000/ > event   1 __ [ 1| Start] RequestStart
       |> http://localhost:5500/Products/Info/1990
 23360 | 00d0c714-0010-0000-0000-00002d2d6f5a =       /1/4/2000/ > event   7 __ [ 1| Start] RequestHeadersStart
       |QH[ 20]
 23360 | 00d0c714-0010-0000-0000-00002d2d6f5a =       /1/4/2000/ > event   8 __ [ 2|  Stop] RequestHeadersStop
      <|QH
 44048 | 00d0c714-0020-0000-0000-00001d2d6f5a =       /1/4/2000/ > event  11 __ [ 1| Start] ResponseHeadersStart
       |RH>
 44048 | 00d0c714-0020-0000-0000-00001d2d6f5a =       /1/4/2000/ > event  12 __ [ 2|  Stop] ResponseHeadersStop
  200 <|RH
 44048 | 00d0c714-0030-0000-0000-00000d2d6f5a =       /1/4/2000/ > event  13 __ [ 1| Start] ResponseContentStart
       |RC>
 44048 | 00d0c714-0030-0000-0000-00000d2d6f5a =       /1/4/2000/ > event  14 __ [ 2|  Stop] ResponseContentStop
      <|RC
 44048 | 00d0c714-0000-0000-0000-00003d2d6f5a =       /1/4/2000/ > event   2 __ [ 2|  Stop] RequestStop
  200 <|

Since each Start/Stop action is adding an activity, I was expecting to see them (i.e. /1/4/2000/xx/ instead of /1/4/2000/
However, the GUID seemed to contain the missing xx value like in the second 32 value -0010- or -0020-.

Note that, in Perfview, the decoding method is stopping on the additional 00 byte so the final activity is not displayed
Image

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions