@mdh1418 (Member) commented Dec 11, 2025

This is #122134 but with the NuGet source moved from the test directory to the repo root's NuGet.config.

I briefly explored using dotnet-trace from the dotnet-tools NuGet source, but that approach kept running into extra hurdles. Installing the tool also brought in .store/dotnet-trace/9.0.660901/dotnet-trace/9.0.660901/tools/net8.0/any/, and invoking it would require resolving the SDK location. Even if all of that were resolved, the added benefit would be minimal, since dotnet-trace is just a wrapper around record-trace.


Repeating #122134's Description

With user_events support added in #115265, this PR adds tests for a few end-to-end user_events scenarios.

Alternative testing approaches considered

Existing EventPipe runtime tests

Existing EventPipe tests under src/tests/tracing/eventpipe cannot be used to test the user_events scenario because they rely on:

  1. Starting EventPipeSessions through DiagnosticClient ❌
    DiagnosticClient does not support sending the IPC command that starts a user_events-based EventPipe session, because that command requires the user_events_data file descriptor to be sent using SCM_RIGHTS (see https://github.com/dotnet/diagnostics/blob/main/documentation/design-docs/ipc-protocol.md#passing_file_descriptor).

  2. Using an EventPipeEventSource to validate events streamed through EventPipe ❌
    EventPipe sessions based on user_events do not stream events. Instead, events are written to the configured TraceFS tracepoints, and currently only RecordTrace from https://github.com/microsoft/one-collect/ can generate .nettrace traces from tracepoint user_events.
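Some readers may not be familiar with SCM_RIGHTS. The minimal Python sketch below (Python 3.9+, Unix only) shows the mechanism. It sends an arbitrary payload plus an open file descriptor over a Unix domain socket pair, and the receiver then writes through the descriptor it gets back. The helper name `pass_fd_over_unix_socket`, the payload bytes, and the temporary file standing in for user_events_data are all illustrative assumptions. This is not the Diagnostics IPC message format.

```python
import os
import socket
import tempfile


def pass_fd_over_unix_socket(payload: bytes) -> tuple[bytes, bytes]:
    """Hypothetical demo: send a message plus an open fd via SCM_RIGHTS."""
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    # The temporary file stands in for the user_events_data file.
    with client, server, tempfile.TemporaryFile() as f:
        # Sender side: the descriptor rides along as SCM_RIGHTS ancillary data.
        socket.send_fds(client, [payload], [f.fileno()])
        # Receiver side: gets the bytes plus a *new* fd number that refers to
        # the same open file description as the sender's fd.
        msg, fds, _flags, _addr = socket.recv_fds(server, 1024, 1)
        received_fd = fds[0]
        try:
            os.write(received_fd, b"written via received fd")
        finally:
            os.close(received_fd)
        # Reading through the original handle shows the receiver's write.
        f.seek(0)
        return msg, f.read()
```

Plain byte-stream IPC cannot do this. The kernel has to duplicate the descriptor into the receiving process, and that is why a client without ancillary-data support cannot start this kind of session.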

Native EventPipe Unit Tests

There are Mono native EventPipe tests under src/mono/mono/eventpipe/test, but they are not hooked up to CI. These unit tests are built by linking the shared EventPipe interface library against Mono's EventPipe runtime shims, and they use Mono's test runner. Moving them into the standard runtime test structure would require a larger investment: either migrating EventPipe from runtime shims to an OS PAL source shared by coreclr/nativeaot/mono (see #118874 (comment)), or building an EventPipe shared library specifically for the runtime tests using a runtime-agnostic shim.
Moreover, the existing Mono unit tests don't test IPC commands, and there is no existing runtime infrastructure to read events from tracepoints. Testing the user_events scenario would therefore require even more work on top of migrating the Mono native EventPipe unit tests.

End-to-End Testing Added

A low-cost approach to testing the .NET runtime's user_events functionality leverages RecordTrace from https://github.com/microsoft/one-collect/, which can already start user_events-based EventPipe sessions and generate .nettrace traces. (Note: dotnet-trace wraps RecordTrace.)
This adds an external dependency, so a RecordTrace failure can fail the end-to-end test. However, user_events was added with the intent of depending on RecordTrace for the end-to-end scenario, and there is no other way to functionally test a user_events-based EventPipe session.

Approach

Each scenario uses the same pattern:

  1. Scenario invokes the shared test runner

    User events scenarios can differ in their tracee logic, the events expected in the .nettrace, the record-trace script used to collect those events, and how long it takes for the tracee to emit them and for record-trace to resolve symbols and write the .nettrace. To handle this variance, UserEventsTestRunner lets each scenario pass in its scenario-specific record-trace script path, the path to its test assembly (used to spawn the tracee process), a validator that checks for the expected events from the tracee, and optional timeouts for both the tracee and record-trace to exit gracefully.

  2. UserEventsTestRunner orchestrates tracing and validation

    Using this configuration, UserEventsTestRunner first checks whether user events are supported. It then starts record-trace with the scenario’s script and launches the tracee process so it can emit events. After the run completes, the runner stops both the tracee and record-trace, opens the resulting .nettrace with EventPipeEventSource, and applies the scenario’s validator to confirm that the expected events were recorded. Finally, it returns an exit code indicating whether the scenario passed or failed.
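The Python sketch below gives a rough outline of that orchestration flow. Python is used here for brevity; the actual tests are C#. The `run_scenario` name, its parameters, and stopping the recorder with SIGTERM are assumptions for illustration, not UserEventsTestRunner's API. The real runner also launches record-trace under sudo and validates with EventPipeEventSource, which the `validate` callback stands in for. The exit codes assume the runtime-test convention that 100 means pass.

```python
import subprocess
from typing import Callable, Sequence

# Assumption: runtime-test convention where exit code 100 means "passed".
PASS, FAIL = 100, 1


def run_scenario(
    recorder_cmd: Sequence[str],
    tracee_cmd: Sequence[str],
    trace_path: str,
    validate: Callable[[str], bool],
    tracee_timeout: float = 30.0,
    recorder_timeout: float = 30.0,
) -> int:
    """Hypothetical sketch of the record, run tracee, stop, validate flow."""
    # Start the recorder first so it is ready before the tracee emits events.
    recorder = subprocess.Popen(recorder_cmd)
    try:
        # Launch the tracee and give it a bounded amount of time to finish.
        tracee = subprocess.Popen(tracee_cmd)
        try:
            tracee.wait(timeout=tracee_timeout)
        except subprocess.TimeoutExpired:
            tracee.kill()
            tracee.wait()
            return FAIL
        # Ask the recorder to stop (SIGTERM here) and wait for it to flush the trace.
        recorder.terminate()
        recorder.wait(timeout=recorder_timeout)
    except subprocess.TimeoutExpired:
        # The recorder did not exit in time; the trace may be incomplete.
        return FAIL
    finally:
        # Never leave the recorder running, whatever happened above.
        if recorder.poll() is None:
            recorder.kill()
            recorder.wait()
    # Only a trace from a cleanly stopped recorder reaches the validator.
    return PASS if validate(trace_path) else FAIL
```

Passing the per-scenario pieces (commands, trace path, validator, timeouts) as arguments keeps the shared runner free of scenario-specific logic, which matches the split between the scenarios and the shared runner described above.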

Dependencies:

  • Environment with Linux kernel 6.4+, .NET 10, and glibc 2.35+
  • Microsoft.OneCollect.RecordTrace (transitively resolved through a dotnet diagnostics public feed)
  • Microsoft.Diagnostics.Tracing.TraceEvent 3.1.24+ (to read NetTrace V6)
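The Python sketch below illustrates the kind of environment gating these requirements imply; the PR's UserEventsRequirements.cs checks kernel version, glibc, tracefs, and user_events support. The function names, the default tracefs mount point /sys/kernel/tracing, and the exact form of each check are assumptions, not the PR's implementation. tracefs is typically readable only by root, so running the last check unprivileged can report a false negative.

```python
import os
import platform
import re

MIN_KERNEL = (6, 4)
MIN_GLIBC = (2, 35)
# Assumption: tracefs is mounted at its default location.
USER_EVENTS_DATA = "/sys/kernel/tracing/user_events_data"


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '6.5.0-1025-azure' or '2.35'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def user_events_supported() -> tuple[bool, str]:
    """Hypothetical pre-flight check for kernel, glibc, and user_events support."""
    if platform.system() != "Linux":
        return False, "user_events is a Linux kernel feature"
    kernel = platform.release()
    if parse_major_minor(kernel) < MIN_KERNEL:
        return False, f"kernel {kernel} is older than 6.4"
    libc, libc_version = platform.libc_ver()
    if libc != "glibc" or not libc_version or parse_major_minor(libc_version) < MIN_GLIBC:
        return False, f"glibc 2.35+ required, found {libc or 'unknown'} {libc_version}"
    # tracefs is usually root-only, so an unprivileged check can be a false negative.
    if not os.path.exists(USER_EVENTS_DATA):
        return False, f"{USER_EVENTS_DATA} not found or not accessible"
    return True, "ok"
```

Comparing integer tuples rather than raw strings avoids the classic bug where "6.10" sorts before "6.4".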

Helix Nuances

UserEvents functional runtime tests differ from other runtime tests because they depend on OneCollect's Record-Trace tool to enable a user_events-based EventPipe session and to collect events. By design, Record-Trace requires elevated privileges, so these tests invoke the record-trace executable with sudo.

When tests run on Helix, test artifacts are stripped of their permissions, so the test infrastructure was modified to give record-trace execute permissions (helix-extra-executables.list). Moreover, to avoid having one copy of record-trace per scenario, which in turn requires re-adding execute permissions for each, more modifications were added to copy over a single record-trace executable that would be used by all scenarios (OutOfProcess marker).
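The Python sketch below shows what restoring the execute bit amounts to: a minimal equivalent of `chmod a+x` on the shared record-trace copy. `ensure_executable` is a hypothetical helper. In the PR, the permission is restored through helix-extra-executables.list in the Helix publishing project, not through code like this.

```python
import os
import stat


def ensure_executable(path: str) -> None:
    """Hypothetical helper: add execute permission for user, group, and others."""
    mode = os.stat(path).st_mode
    # Equivalent to `chmod a+x path`: keep existing bits, add the three x bits.
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

Sharing one record-trace copy across all scenarios means this fix-up is needed only once rather than once per scenario copy.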

Additionally, in Helix environments TMPDIR is set to a Helix-specific temporary directory such as /datadisks/disk1/work/<id>/t, but record-trace currently scans only /tmp/ for the runtime's diagnostic ports. As a workaround, the tracee apps are spawned with TMPDIR set to /tmp.
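The Python sketch below illustrates the TMPDIR workaround. It spawns the tracee with an environment in which only TMPDIR is overridden, and it lists diagnostic port sockets for a given process. The helper names are hypothetical. The socket naming pattern dotnet-diagnostic-{pid}-{disambiguation key}-socket is taken from the diagnostics IPC protocol documentation and should be treated as an assumption here.

```python
import glob
import os
import subprocess
from typing import Sequence


def spawn_tracee_with_tmp(cmd: Sequence[str], **popen_kwargs) -> subprocess.Popen:
    """Hypothetical helper: start the tracee with TMPDIR forced to /tmp."""
    # Copy the current environment and override only TMPDIR, so the runtime
    # creates its diagnostic port where record-trace currently looks.
    env = dict(os.environ, TMPDIR="/tmp")
    return subprocess.Popen(cmd, env=env, **popen_kwargs)


def diagnostic_ports(pid: int, tmpdir: str = "/tmp") -> list[str]:
    """List diagnostic port sockets for `pid` in `tmpdir`.

    Assumes the runtime's naming scheme
    dotnet-diagnostic-{pid}-{disambiguation key}-socket.
    """
    pattern = os.path.join(tmpdir, f"dotnet-diagnostic-{pid}-*-socket")
    return sorted(glob.glob(pattern))
```

Overriding the variable only for the child process leaves the Helix work item's own TMPDIR untouched for everything else.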

Lastly, the job steps that run tests on AzDO prevent restoring individual runtime test projects. Because record-trace is currently resolvable only through the dotnet-diagnostics-tests source, userevents_common.csproj was added to the group of projects restored at the beginning of copying native test components, so that Microsoft.OneCollect.RecordTrace gets restored.

@mdh1418 (Member Author) commented Dec 11, 2025

I tested the NuGet move in the official branch. Although the manual run didn't run all of the jobs, the same step that was breaking in #122134 (comment) passed in the one job step that did run. https://dev.azure.com/dnceng/internal/_build/results?buildId=2858321&view=logs&j=320ff1fa-c69d-516e-625f-b09cabcdcc83&t=3da399e3-b4b0-5cc1-d79b-34355ed77635

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds comprehensive functional tests for the .NET Runtime's user_events feature, which enables event tracing through Linux kernel tracepoints. The tests validate end-to-end scenarios by using Microsoft's RecordTrace tool to collect events and verify their correctness.

Key changes:

  • Introduces a shared test infrastructure with UserEventsTestRunner that orchestrates tracing, spawns tracee processes, and validates collected events
  • Adds five test scenarios covering basic runtime events, activity tracking, custom metadata, managed events, and multi-threading
  • Implements Helix CI integration with special handling for sudo requirements and executable permissions

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 8 comments.

  • src/tests/tracing/userevents/common/UserEventsTestRunner.cs: Shared test orchestrator that manages record-trace and tracee processes, handles cleanup, and validates trace files
  • src/tests/tracing/userevents/common/UserEventsRequirements.cs: Environment validation logic checking kernel version, glibc, tracefs, and user_events support
  • src/tests/tracing/userevents/common/userevents_common.csproj: Build configuration that copies RecordTrace executable and creates Helix integration markers
  • src/tests/tracing/userevents/basic/basic.cs: Test scenario validating runtime AllocationSampled events
  • src/tests/tracing/userevents/basic/basic.csproj: Build configuration for basic test scenario
  • src/tests/tracing/userevents/basic/basic.script: RecordTrace script configuration for basic scenario
  • src/tests/tracing/userevents/activity/activity.cs: Test scenario validating activity ID correlation across async tasks
  • src/tests/tracing/userevents/activity/activity.csproj: Build configuration for activity test scenario
  • src/tests/tracing/userevents/activity/activity.script: RecordTrace script configuration for activity scenario
  • src/tests/tracing/userevents/custommetadata/custommetadata.cs: Test scenario validating custom event metadata with EtwSelfDescribingEventFormat
  • src/tests/tracing/userevents/custommetadata/custommetadata.csproj: Build configuration for custommetadata test scenario
  • src/tests/tracing/userevents/custommetadata/custommetadata.script: RecordTrace script configuration for custommetadata scenario
  • src/tests/tracing/userevents/managedevent/managedevent.cs: Test scenario validating managed user event source emission
  • src/tests/tracing/userevents/managedevent/managedevent.csproj: Build configuration for managedevent test scenario
  • src/tests/tracing/userevents/managedevent/managedevent.script: RecordTrace script configuration for managedevent scenario
  • src/tests/tracing/userevents/multithread/multithread.cs: Test scenario validating concurrent event emission from multiple threads
  • src/tests/tracing/userevents/multithread/multithread.csproj: Build configuration for multithread test scenario
  • src/tests/tracing/userevents/multithread/multithread.script: RecordTrace script configuration for multithread scenario
  • src/tests/tracing/userevents/README.md: Documentation explaining test architecture, flow, and directory layout
  • src/tests/build.proj: Adds userevents_common to restore projects for early RecordTrace resolution
  • src/tests/Common/helixpublishwitharcade.proj: Helix integration to restore execute permissions on copied executables
  • src/tests/issues.targets: Excludes user_events tests from NativeAOT due to memfd compatibility issues
  • eng/Versions.props: Updates TraceEvent version to 3.1.28 and adds RecordTrace version 0.1.32221
  • NuGet.config: Adds dotnet-diagnostics-tests package source for RecordTrace resolution

@lateralusX (Member) left a comment

I'm fine with the user events test implementation in this PR. Regarding the added NuGet feed used to obtain the record-trace tool needed to run the user events tests (locally and on Helix), I believe that would need sign-off from people more involved in the process of adding new runtime NuGet dependencies and NuGet feeds.

@mdh1418 mdh1418 requested a review from akoeplinger December 11, 2025 16:35
@mdh1418 (Member Author) commented Dec 11, 2025

@jkoritzinsky @hoyosjs @akoeplinger Does the nuget addition look good to y'all?

@jkoritzinsky (Member) left a comment

NuGet changes look good to me.

@mdh1418 (Member Author) commented Dec 12, 2025

These tests will be flaky until #122472 is resolved, so I'm going to hold off until then.
