[Tests][UserEvents] Add userevents functional runtime tests #122430
I tested the NuGet move in the official branch. Although the manual run didn't include all of the jobs, the step that was breaking in #122134 (comment) passed in the one job that did run: https://dev.azure.com/dnceng/internal/_build/results?buildId=2858321&view=logs&j=320ff1fa-c69d-516e-625f-b09cabcdcc83&t=3da399e3-b4b0-5cc1-d79b-34355ed77635
Pull request overview
This PR adds comprehensive functional tests for the .NET Runtime's user_events feature, which enables event tracing through Linux kernel tracepoints. The tests validate end-to-end scenarios by using Microsoft's RecordTrace tool to collect events and verify their correctness.
Key changes:
- Introduces a shared test infrastructure with `UserEventsTestRunner` that orchestrates tracing, spawns tracee processes, and validates collected events
- Adds five test scenarios covering basic runtime events, activity tracking, custom metadata, managed events, and multi-threading
- Implements Helix CI integration with special handling for sudo requirements and executable permissions
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `src/tests/tracing/userevents/common/UserEventsTestRunner.cs` | Shared test orchestrator that manages record-trace and tracee processes, handles cleanup, and validates trace files |
| `src/tests/tracing/userevents/common/UserEventsRequirements.cs` | Environment validation logic checking kernel version, glibc, tracefs, and user_events support |
| `src/tests/tracing/userevents/common/userevents_common.csproj` | Build configuration that copies RecordTrace executable and creates Helix integration markers |
| `src/tests/tracing/userevents/basic/basic.cs` | Test scenario validating runtime AllocationSampled events |
| `src/tests/tracing/userevents/basic/basic.csproj` | Build configuration for basic test scenario |
| `src/tests/tracing/userevents/basic/basic.script` | RecordTrace script configuration for basic scenario |
| `src/tests/tracing/userevents/activity/activity.cs` | Test scenario validating activity ID correlation across async tasks |
| `src/tests/tracing/userevents/activity/activity.csproj` | Build configuration for activity test scenario |
| `src/tests/tracing/userevents/activity/activity.script` | RecordTrace script configuration for activity scenario |
| `src/tests/tracing/userevents/custommetadata/custommetadata.cs` | Test scenario validating custom event metadata with EtwSelfDescribingEventFormat |
| `src/tests/tracing/userevents/custommetadata/custommetadata.csproj` | Build configuration for custommetadata test scenario |
| `src/tests/tracing/userevents/custommetadata/custommetadata.script` | RecordTrace script configuration for custommetadata scenario |
| `src/tests/tracing/userevents/managedevent/managedevent.cs` | Test scenario validating managed user event source emission |
| `src/tests/tracing/userevents/managedevent/managedevent.csproj` | Build configuration for managedevent test scenario |
| `src/tests/tracing/userevents/managedevent/managedevent.script` | RecordTrace script configuration for managedevent scenario |
| `src/tests/tracing/userevents/multithread/multithread.cs` | Test scenario validating concurrent event emission from multiple threads |
| `src/tests/tracing/userevents/multithread/multithread.csproj` | Build configuration for multithread test scenario |
| `src/tests/tracing/userevents/multithread/multithread.script` | RecordTrace script configuration for multithread scenario |
| `src/tests/tracing/userevents/README.md` | Documentation explaining test architecture, flow, and directory layout |
| `src/tests/build.proj` | Adds userevents_common to restore projects for early RecordTrace resolution |
| `src/tests/Common/helixpublishwitharcade.proj` | Helix integration to restore execute permissions on copied executables |
| `src/tests/issues.targets` | Excludes user_events tests from NativeAOT due to memfd compatibility issues |
| `eng/Versions.props` | Updates TraceEvent version to 3.1.28 and adds RecordTrace version 0.1.32221 |
| `NuGet.config` | Adds dotnet-diagnostics-tests package source for RecordTrace resolution |
I'm fine with the user event test implementation in this PR. As for the added NuGet feed, which is needed to obtain the record-trace tool for running the user events tests (locally and on Helix), I believe it needs sign-off from people more involved in the process of adding new runtime NuGet dependencies and additional NuGet feeds.
@jkoritzinsky @hoyosjs @akoeplinger Does the NuGet addition look good to y'all?
jkoritzinsky
left a comment
NuGet changes look good to me.
These tests will be flaky until #122472 is resolved, so I'm going to hold off until then.
This is #122134 but with the NuGet source moved from the test directory to the repo root's NuGet.config.
I briefly explored using `dotnet-trace` from the dotnet-tools NuGet source, but that approach raised extra hurdles: installing the tool also brought in `.store/dotnet-trace/9.0.660901/dotnet-trace/9.0.660901/tools/net8.0/any/`, and invoking it would require resolving the SDK location. Even if all of those were resolved, the added benefit is fairly minimal, since dotnet-trace is just a wrapper around record-trace.
Repeating #122134's Description
With user_events support added in #115265, this PR tests a few end-to-end user_events scenarios.
Alternative testing approaches considered
Existing EventPipe runtime tests
Existing EventPipe tests under `src/tests/tracing/eventpipe` are incompatible with testing the user_events scenario due to:
Starting EventPipeSessions through DiagnosticClient ❌
DiagnosticClient cannot send the IPC command that starts a user_events based EventPipe session, because the command requires the user_events_data file descriptor to be sent using SCM_RIGHTS (see https://github.com/dotnet/diagnostics/blob/main/documentation/design-docs/ipc-protocol.md#passing_file_descriptor).
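To illustrate what "sent using SCM_RIGHTS" means, here is a minimal Python sketch of passing an open file descriptor across a Unix domain socket. It is not the diagnostics IPC protocol itself: the socket pair stands in for the diagnostic port, a temporary file stands in for user_events_data, and the `start-session` payload and helper names are hypothetical.

```python
import os
import socket
import tempfile


def send_fd(sock: socket.socket, payload: bytes, fd: int) -> None:
    """Send a payload together with one descriptor as SCM_RIGHTS ancillary data."""
    socket.send_fds(sock, [payload], [fd])


def recv_fd(sock: socket.socket, bufsize: int = 1024) -> tuple[bytes, int]:
    """Receive a payload and the single descriptor that accompanies it."""
    data, fds, _flags, _addr = socket.recv_fds(sock, bufsize, 1)
    return data, fds[0]


def demo() -> bytes:
    # Assumption: a socketpair stands in for the diagnostic IPC channel,
    # and a temporary file stands in for the user_events_data file.
    tool_end, runtime_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tool_end, runtime_end, tempfile.TemporaryFile() as stand_in:
        stand_in.write(b"tracepoint data")
        stand_in.flush()
        send_fd(tool_end, b"start-session", stand_in.fileno())
        _command, received_fd = recv_fd(runtime_end)
        try:
            # The receiver holds its own descriptor for the same open file.
            return os.pread(received_fd, 64, 0)
        finally:
            os.close(received_fd)
```

The receiving side ends up with a new descriptor number that refers to the same open file, which is why a plain byte-stream client that cannot attach ancillary data is not enough here.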
Using an EventPipeEventSource to validate events streamed through EventPipe ❌
User_events based EventPipe sessions do not stream events. Instead, events are written to configured TraceFS tracepoints, and currently only RecordTrace from https://github.com/microsoft/one-collect/ is capable of generating `.nettrace` traces from tracepoint user_events.
Native EventPipe Unit Tests
There are Mono native EventPipe tests under `src/mono/mono/eventpipe/test` that are not hooked up to CI. These unit tests are built by linking the shared EventPipe interface library against Mono's EventPipe runtime shims and are run with Mono's test runner. Moving them into the standard runtime test structure would take a larger investment: either migrating EventPipe from runtime shims to an OS PAL source shared by coreclr/nativeaot/mono (see #118874 (comment)), or building an EventPipe shared library specifically for the runtime tests using a runtime-agnostic shim.
Because the existing Mono unit tests don't currently test IPC commands, and there is no existing runtime infrastructure to read events from tracepoints, testing the user_events scenario would take even more work on top of updating the Mono native EventPipe unit tests.
End-to-End Testing Added
A low-cost approach to testing .NET Runtime's user_events functionality leverages RecordTrace from https://github.com/microsoft/one-collect/, which is already capable of starting user_events based EventPipe sessions and generating `.nettrace` files. (Note: dotnet-trace wraps RecordTrace.)
This adds an external dependency, so RecordTrace failures can fail the end-to-end tests. However, user_events was initially added with the intent to depend on RecordTrace for the end-to-end scenario, and there is no other way to functionally test a user_events based EventPipe session.
Approach
Each scenario uses the same pattern:
Scenario invokes the shared test runner
User events scenarios can differ in their tracee logic, the events expected in the .nettrace, the record-trace script used to collect those events, and how long it takes for the tracee to emit them and for record-trace to resolve symbols and write the .nettrace. To handle this variance, UserEventsTestRunner lets each scenario pass in its scenario-specific record-trace script path, the path to its test assembly (used to spawn the tracee process), a validator that checks for the expected events from the tracee, and optional timeouts for both the tracee and record-trace to exit gracefully.
`UserEventsTestRunner` orchestrates tracing and validation
Using this configuration, UserEventsTestRunner first checks whether user events are supported. It then starts record-trace with the scenario’s script and launches the tracee process so it can emit events. After the run completes, the runner stops both the tracee and record-trace, opens the resulting .nettrace with EventPipeEventSource, and applies the scenario’s validator to confirm that the expected events were recorded. Finally, it returns an exit code indicating whether the scenario passed or failed.
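The flow described above can be sketched in a language-neutral way. The Python below is only an illustration of the start, run, stop, validate sequence. It is not the C# `UserEventsTestRunner`; the function name, parameters, use of SIGINT to stop the recorder, and exit code values are all assumptions, and the initial support check is omitted.

```python
import signal
import subprocess
from typing import Callable, Sequence

# Placeholder exit codes; the real runner's values are not stated in this PR text.
PASS = 0
FAIL = 1


def stop_gracefully(proc: subprocess.Popen, timeout: float) -> None:
    """Ask a process to exit with SIGINT, then kill it if it does not comply."""
    if proc.poll() is None:
        proc.send_signal(signal.SIGINT)
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()


def run_scenario(
    recorder_cmd: Sequence[str],
    tracee_cmd: Sequence[str],
    validate: Callable[[], bool],
    tracee_timeout: float = 30.0,
    recorder_timeout: float = 30.0,
) -> int:
    """Start the recorder, run the tracee, stop both, then validate the trace."""
    recorder = subprocess.Popen(recorder_cmd)
    try:
        tracee = subprocess.Popen(tracee_cmd)
        try:
            # Give the tracee time to emit its events and exit on its own.
            tracee.wait(timeout=tracee_timeout)
        except subprocess.TimeoutExpired:
            tracee.kill()
            tracee.wait()
    finally:
        # The recorder is stopped even if launching the tracee failed.
        stop_gracefully(recorder, recorder_timeout)
    # The validator stands in for opening the trace and checking events.
    return PASS if validate() else FAIL
```

Passing the validator and both timeouts as parameters mirrors the per-scenario variance described earlier: each scenario supplies its own script, tracee, expectations, and time budget while the sequencing stays shared.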
Dependencies:
Helix Nuances
UserEvents functional runtime tests differ from other runtime tests because they depend on OneCollect's Record-Trace tool to enable a userevents-based EventPipe session and to collect events. By design, Record-Trace requires elevated privileges, so these tests invoke a record-trace executable with sudo.
When tests run on Helix, test artifacts are stripped of their permissions, so the test infrastructure was modified to give record-trace execute permissions (helix-extra-executables.list). Moreover, to avoid having one copy of record-trace per scenario, which in turn requires re-adding execute permissions for each, more modifications were added to copy over a single record-trace executable that would be used by all scenarios (OutOfProcess marker).
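As a rough picture of the permission fix, the sketch below re-adds execute bits to every file named in a list file. The format of helix-extra-executables.list is not shown in this PR text, so one relative path per line is an assumption, as are the function name and the choice to set the bit for user, group, and other.

```python
import stat
from pathlib import Path


def restore_execute_bits(list_file: Path, root: Path) -> list[Path]:
    """Add execute permission to each file listed in list_file, relative to root."""
    restored = []
    for line in list_file.read_text().splitlines():
        relative = line.strip()
        if not relative:
            continue  # tolerate blank lines in the list
        target = root / relative
        mode = target.stat().st_mode
        # Keep existing permission bits and add execute for user, group, and other.
        target.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        restored.append(target)
    return restored
```

Keeping a single shared executable means the list has one entry to restore instead of one per scenario.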
Additionally, in Helix environments, TMPDIR is set to a Helix-specific temporary directory like /datadisks/disk1/work/<id>/t, and at this time record-trace only scans /tmp/ for the runtime's diagnostic ports. As a workaround, the tracee apps are spawned with TMPDIR set to /tmp.
Lastly, the job steps that run tests on AzDO prevent restoring individual runtime test projects. Because record-trace is currently only resolvable through the dotnet-diagnostics-tests source, userevents_common.csproj was added to the group of projects restored at the beginning of copying native test components, so that Microsoft.OneCollect.RecordTrace is restored.
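The TMPDIR workaround amounts to overriding one environment variable for the child process only. A minimal Python sketch, with hypothetical helper names, might look like this:

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence


def tracee_environment(base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy the given (or current) environment and force TMPDIR to /tmp."""
    env = dict(os.environ if base is None else base)
    # record-trace only scans /tmp for diagnostic ports, so the tracee must use it.
    env["TMPDIR"] = "/tmp"
    return env


def spawn_tracee(cmd: Sequence[str]) -> subprocess.Popen:
    """Launch the tracee with the overridden environment; the parent's is untouched."""
    return subprocess.Popen(cmd, env=tracee_environment())
```

Because the override is applied to a copy, the test harness itself keeps using the Helix-provided temporary directory.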