Optimize nettrace-to-TraceLog Conversion #2403

Merged: brianrob merged 7 commits into microsoft:main from brianrob:brianrob/universal-to-tracelog-perf on Apr 3, 2026

@brianrob brianrob commented Apr 1, 2026

Summary

Optimizes the nettrace-to-TraceLog/ETLX conversion pipeline, reducing conversion time for a 1.68 GB nettrace file from ~22 minutes to ~40 seconds (34x speedup).

Problem

Converting large nettrace files (e.g., from OneCollect Linux traces with many threads) to TraceLog/ETLX format was extremely slow. Profiling revealed three bottlenecks:

  1. EventCache.SortAndDispatch consumed ~75% of CPU: an O(N×T) linear scan over all thread queues per event block (N events, T threads), plus per-call LINQ List allocations.
  2. ParsedSymbolMetadata consumed ~10% of CPU: JSON deserialization was repeated on every property access.
  3. FileName string allocations consumed ~12% of CPU: GetShortUTF8StringAt() was called 3 times per mapping event.

Changes

EventCache min-heap (EventCache.cs)

  • Replaced the O(N×T) linear scan in SortAndDispatch with an O(N×log T) min-heap merge.
  • Introduced a generic MinHeap<TValue> class with full unit test coverage (13 tests).
  • Added _activeThreadQueues HashSet to track only threads with pending events, avoiding iteration over the full thread dictionary.
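
The merge strategy in the bullets above can be sketched roughly as follows. This is an illustration in Python rather than the project's C#, not the PR's code: `merge_thread_queues` and the queue layout are hypothetical, and the standard-library `heapq` module stands in for the `MinHeap<TValue>` class. It assumes each per-thread queue is already ordered by timestamp.

```python
import heapq
from collections import deque


def merge_thread_queues(queues):
    """Merge per-thread, timestamp-ordered queues into one sorted list.

    `queues` maps a thread id to a deque of (timestamp, payload) tuples.
    The heap holds one (head timestamp, thread id) entry per non-empty
    queue, so its size is bounded by the number of threads T.
    """
    heap = [(q[0][0], tid) for tid, q in queues.items() if q]
    heapq.heapify(heap)  # bottom-up build, O(T)

    merged = []
    while heap:
        _, tid = heap[0]  # thread whose head event has the smallest timestamp
        q = queues[tid]
        merged.append(q.popleft())
        if q:
            # Swap in the thread's next event and restore the heap
            # with a single O(log T) sift.
            heapq.heapreplace(heap, (q[0][0], tid))
        else:
            # The queue is drained, so drop its entry from the heap.
            heapq.heappop(heap)
    return merged
```

Each event costs one heap operation instead of a scan over every thread queue, which is where the O(N×log T) bound comes from. In this sketch, ties on timestamp are broken by thread id; the PR's heap may order ties differently.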

Cache ParsedSymbolMetadata (UniversalSystemTraceEventParser.cs)

  • Cached the result of ProcessMappingSymbolMetadataParser.TryParse() to avoid repeated JSON deserialization.
  • Reset cache in Dispatch() to prevent stale data when TraceEvent objects are reused.
  • Added Clone() override to preserve cached values in cloned events.
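
A minimal sketch of the caching pattern described above, written in Python rather than C# and using hypothetical names throughout (`MappingMetadataEvent`, `parse_count`, and so on). It assumes the event object is reused for every dispatch and that a failed parse is cached as well; neither detail is taken from the PR's code.

```python
import copy
import json


class MappingMetadataEvent:
    """Hypothetical stand-in for an event object that is reused per dispatch."""

    def __init__(self):
        self.raw_json = ""
        self._parsed = None
        self._parsed_cached = False
        self.parse_count = 0  # instrumentation for this example only

    def dispatch(self, raw_json, callback):
        # The same object carries every event, so clear the cache before
        # the callback can observe data left over from the previous event.
        self.raw_json = raw_json
        self._parsed = None
        self._parsed_cached = False
        callback(self)

    @property
    def parsed_symbol_metadata(self):
        if not self._parsed_cached:
            self.parse_count += 1
            try:
                self._parsed = json.loads(self.raw_json)
            except json.JSONDecodeError:
                self._parsed = None
            # Set the flag even on failure so bad input is not re-parsed.
            self._parsed_cached = True
        return self._parsed

    def clone(self):
        # A shallow copy carries the cached fields over to the clone.
        return copy.copy(self)
```

A separate "cached" flag is used instead of testing the value for `None`, so that a parse that legitimately yields nothing is still remembered.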

Reduce FileName allocations (NettraceUniversalConverter.cs)

  • Read data.FileName once into a local variable instead of accessing the property 3 times (each access allocates a new string via GetShortUTF8StringAt).
  • Call UniversalMapping(string, ...) overload directly, passing the cached string.
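
The effect of reading the property once can be illustrated with a small Python sketch. All names here are hypothetical (`MappingEvent`, `handle_mapping`, `ignored_prefix`), and the prefix check is a placeholder for whatever the converter's StartsWith test actually looks for.

```python
class MappingEvent:
    """Hypothetical event whose file_name property decodes on every access."""

    def __init__(self, payload: bytes):
        self._payload = payload
        self.decode_count = 0  # instrumentation for this example only

    @property
    def file_name(self) -> str:
        # Every access decodes the bytes again and allocates a new string.
        self.decode_count += 1
        return self._payload.decode("utf-8")


def handle_mapping(event, mappings, ignored_prefix="["):
    # Read the property once and reuse the local for all later checks.
    file_name = event.file_name
    if not file_name or file_name.startswith(ignored_prefix):
        return
    # Pass the string that was already decoded instead of the event.
    mappings.append(file_name)
```

With the local variable, one mapping event costs a single decode no matter how many checks follow.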

Performance Results

Measured with a 1.68 GB nettrace file (8.6M events, 11.5K processes) using Release builds:

| Metric | Before | After |
| --- | --- | --- |
| Conversion time | 22:26 (1346s) | 00:40 (40s) |
| Speedup | | 34× |
| Events | 8,583,800 | 8,583,800 ✓ |
| Processes | 11,509 | 11,509 ✓ |
| Unique code addresses | 174,476 | 174,476 ✓ |
| Unique stacks | 206,467 | 206,467 ✓ |
| ETLX file size | 3,051,574,460 | 3,051,574,460 ✓ |

All semantic statistics match. ETLX files are the same size but not byte-identical due to different tie-breaking order for same-timestamp events (min-heap vs linear scan).

Testing

  • All existing tests pass (2,274 on net8.0 + 2,289 on net462).
  • 13 new MinHeap unit tests added.

brianrob and others added 7 commits March 31, 2026 20:33
Replace the O(N*T) linear-scan merge in SortAndDispatch with an O(N*log(T))
min-heap merge, where N is the number of events and T is the number of
threads. The previous implementation rebuilt a List from LINQ on every call
and linearly scanned all thread queues for the minimum timestamp per event.

The new implementation uses an array-backed binary min-heap keyed by
timestamp. After extracting the minimum, only a single O(log T) sift-down
is needed to restore the heap property. The heap list is reused across
calls to avoid per-call allocations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Maintain a HashSet of thread queues that have pending events instead of
iterating all threads in the dictionary on every SortAndDispatch call.
Queues are added to the active set when their first event is enqueued
and removed when drained. This eliminates the Dictionary.Values
enumeration which was ~28% of CPU during nettrace-to-TraceLog conversion.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
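
The active-set bookkeeping from the commit above might look roughly like this. It is a Python sketch with hypothetical names (`EventCacheSketch`, `enqueue`, `drain`), not the PR's C#; `drain` only shows the set maintenance and does not sort events by timestamp.

```python
from collections import deque


class EventCacheSketch:
    """Tracks which thread queues currently hold pending events."""

    def __init__(self):
        self.queues = {}     # thread id -> deque of events
        self.active = set()  # thread ids whose queue is non-empty

    def enqueue(self, tid, event):
        q = self.queues.setdefault(tid, deque())
        if not q:
            # First pending event for this thread: mark the queue active.
            self.active.add(tid)
        q.append(event)

    def drain(self):
        # Visit only the active queues, not every thread ever seen.
        drained = []
        for tid in list(self.active):
            q = self.queues[tid]
            drained.extend(q)
            q.clear()
            self.active.discard(tid)
        return drained
```
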
Cache the result of ProcessMappingSymbolMetadataParser.TryParse() in
ProcessMappingMetadataTraceData so that repeated accesses to the
ParsedSymbolMetadata property do not re-invoke JSON deserialization.
The property is accessed twice per mapping event (for PE and ELF
metadata checks), and the metadata objects are shared across multiple
mappings via MetadataId. This was ~10% of CPU during conversion.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Read ProcessMappingTraceData.FileName once into a local variable and pass
it directly to UniversalMapping(string, ...) instead of going through the
UniversalMapping(ProcessMappingTraceData, ...) overload. Previously,
FileName was accessed 3 times per mapping event (IsNullOrEmpty check,
StartsWith check, and inside UniversalMapping), each time allocating a
new string via GetShortUTF8StringAt(). String allocation was ~12% of CPU
in Release-mode profiling.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TraceEvent objects are reused across callbacks. The cached
ParsedSymbolMetadata fields were never cleared between dispatches,
which could return metadata from a previous event if the property
was accessed on the template object rather than a clone.

Reset _parsedSymbolMetadataCached and _parsedSymbolMetadata at the
start of Dispatch() before invoking the callback.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Refactor the min-heap helpers into a self-contained private MinHeap
class with XML doc comments on all public methods. Add comments
explaining the binary heap child index formulas (2i+1, 2i+2). Use
C# tuple swap syntax instead of a temp variable.

Add Clone() override to ProcessMappingMetadataTraceData to explicitly
copy the cached ParsedSymbolMetadata fields into the clone. Strings
are immutable so a shallow copy of the reference is sufficient.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make MinHeap generic (MinHeap<TValue>) and internal so it can be
tested from the test project. Add 13 unit tests covering: empty heap,
single element, ascending/descending/random input, duplicate keys,
ReplaceRoot, RemoveRoot, Clear, and mixed operations. Add a comment
to Build() explaining why iteration starts at Count/2-1.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
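
For readers unfamiliar with array-backed heaps, here is a minimal Python sketch of the structure the commits above describe. The operation names Build, ReplaceRoot, and RemoveRoot come from the commit message; everything else (`add_unordered`, `peek`, the tuple layout) is an assumption made for illustration and not the PR's `MinHeap<TValue>` API.

```python
class MinHeap:
    """Array-backed binary min-heap of (key, value) pairs."""

    def __init__(self):
        self._items = []

    def __len__(self):
        return len(self._items)

    def add_unordered(self, key, value):
        # Append without restoring heap order; call build() afterwards.
        self._items.append((key, value))

    def build(self):
        # Indices n//2 .. n-1 are leaves, so the last node with a child
        # is n//2 - 1. Sift down from there back to the root.
        for i in range(len(self._items) // 2 - 1, -1, -1):
            self._sift_down(i)

    def peek(self):
        return self._items[0]

    def replace_root(self, key, value):
        # Overwrite the minimum, then restore order with one sift-down.
        self._items[0] = (key, value)
        self._sift_down(0)

    def remove_root(self):
        root = self._items[0]
        last = self._items.pop()
        if self._items:
            self._items[0] = last
            self._sift_down(0)
        return root

    def _sift_down(self, i):
        items, n = self._items, len(self._items)
        while True:
            left, right = 2 * i + 1, 2 * i + 2  # children of node i
            smallest = i
            if left < n and items[left][0] < items[smallest][0]:
                smallest = left
            if right < n and items[right][0] < items[smallest][0]:
                smallest = right
            if smallest == i:
                return
            items[i], items[smallest] = items[smallest], items[i]  # tuple swap
            i = smallest
```

Only keys are compared, so values never need to be orderable, and the relative order of equal keys is not guaranteed.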
@brianrob brianrob marked this pull request as ready for review April 1, 2026 21:06
@brianrob brianrob requested a review from a team as a code owner April 1, 2026 21:06
@brianrob brianrob changed the title Optimize nettrace-to-TraceLog conversion (34x speedup) Optimize nettrace-to-TraceLog Conversion Apr 1, 2026

brianrob commented Apr 1, 2026

cc: @zachcmadsen

@brianrob brianrob merged commit 85f1ca4 into microsoft:main Apr 3, 2026
5 checks passed
@brianrob brianrob deleted the brianrob/universal-to-tracelog-perf branch April 3, 2026 03:32
mitchellvette pushed a commit to mitchellvette/zenvizor that referenced this pull request Jul 1, 2026
Updated
[Microsoft.Diagnostics.Tracing.TraceEvent](https://github.com/Microsoft/perfview)
from 3.1.16 to 3.2.4.

<details>
<summary>Release notes</summary>

_Sourced from [Microsoft.Diagnostics.Tracing.TraceEvent's
releases](https://github.com/Microsoft/perfview/releases)._

## 3.2.4

## Security
This release contains security hardening fixes for a number of
malformed-input parsing and path-traversal vulnerabilities:
- Bounds-checking for malformed event payloads in the BPerf ULZ777
decompressor and event-record parser
- Bounds-checking for malformed metadata in the GCDynamic,
RegisteredTraceEventParser (TDH), Dynamic, and EventPipe V3 parsers
- Bounds-checking for malformed PE CodeView and Resource directory
entries
- Path containment hardening for PDB extraction (zipped ETL + container
PDBs), DiagSession resource extraction, R2R perf map writes, PdbScope
module paths, and dynamic manifest writes
- Path-traversal and command-execution hardening for Source Server
lookups

## What's Changed
* Update CsWin32 Package Version by @​brianrob in
microsoft/perfview#2425
* Fix incorrect field offsets when parsing ETW events with fixed-count
array fields by @​Copilot in
microsoft/perfview#2427
* Retarget Native Profiler Builds To VS 2026 V145 Toolset by @​brianrob
in microsoft/perfview#2428
* Stabilize XamlMessageBox UI-thread dispatch test by @​brianrob in
microsoft/perfview#2430

**Full Changelog**:
microsoft/perfview@v3.2.3...v3.2.4


## 3.2.3

## What's Changed
* Upgrade Microsoft.Windows.CsWin32 to 0.3.209 (GHSA-ghhp-997w-qr28) by
@​Copilot in microsoft/perfview#2409
* Enable Spectre mitigations and linker optimizations for EtwClrProfiler
by @​danmoseley in microsoft/perfview#2410
* Fix 'unhanded' / 'occured' typos in UnhandledExceptionDialog body text
by @​SAY-5 in microsoft/perfview#2413
* Fix GCStats failures on dotnet trace gc-verbose collections (#​2414)
by @​cincuranet in microsoft/perfview#2415
* C entrypoint fixes by @​zachcmadsen in
microsoft/perfview#2421

## New Contributors
* @​SAY-5 made their first contribution in
microsoft/perfview#2413

**Full Changelog**:
microsoft/perfview@v3.2.2...v3.2.3

## 3.2.2

## What's Changed
* Fix PDB Symbol Resolution for Unmerged Windows Traces by @​brianrob in
microsoft/perfview#2407


**Full Changelog**:
microsoft/perfview@v3.2.1...v3.2.2

## 3.2.1

## Native and R2R Symbol Download and Parsing Now Available
As of this release, if you capture a trace using [`dotnet-trace
collect-linux`](https://learn.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-trace#dotnet-trace-collect-linux)
or
[`record-trace`](https://github.com/microsoft/one-collect/tree/main/record-trace),
**native and R2R symbols can now be downloaded and resolved at analysis
time**. All .NET symbols (both native and R2R) are available on the
Microsoft Symbol Server. Additionally, many Azure Linux symbol files are
available on the Microsoft Symbol Server. For those targeting other
distros, PerfView and TraceEvent are capable of pulling those symbol
files from local directories by adding a local symbol path pointing to
the files.

Most of the work for this was completed in PerfView and TraceEvent 3.2.1
with the final required fixes present in this release.

## What's Changed
* Optimize nettrace-to-TraceLog Conversion by @​brianrob in
microsoft/perfview#2403
* Embed missing System.Text.Json transitive dependencies in PerfView by
@​brianrob in microsoft/perfview#2404


**Full Changelog**:
microsoft/perfview@v3.2.0...v3.2.1

## 3.2.0

## What's Changed
* Fix Debug.Assert failures in SpeedScope tests and
DynamicTraceEventParser by @​brianrob in
microsoft/perfview#2368
* Add TraceParserGen.Tests project and fix code generation bugs by
@​Copilot in microsoft/perfview#2308
* Update UsersGuide.htm by @​AftabAnsari10662 in
microsoft/perfview#2370
* Strip .il and .ni suffixes from TraceModuleFile.Name by @​leculver in
microsoft/perfview#2364
* Handle provider names that start with a numeric digit. by @​brianrob
in microsoft/perfview#2369
* Dispose WebView2 controls before Environment.Exit to prevent finalizer
crash by @​brianrob in microsoft/perfview#2371
* Refactor GetManifestForRegisteredProvider to use XmlWriter by
@​Copilot in microsoft/perfview#2353
* docs: Add investigation guidance for JIT-inlined missing stack frames
by @​Copilot in microsoft/perfview#2377
* Fix spurious BROKEN frame at top of Linux thread stacks in CPU Stacks
viewer by @​Copilot in microsoft/perfview#2375
* Fix NRE in AddUniversalDynamicSymbol for invalid symbol address ranges
by @​brianrob in microsoft/perfview#2376
* Add missing authority parameter to log by @​hoyosjs in
microsoft/perfview#2379
* Replace individual code owners with microsoft/perfview-reviewers group
by @​brianrob in microsoft/perfview#2381
* Fix Dynamic Symbol Resolution for Mappings Shared Across Multiple
Processes in Universal Traces by @​brianrob in
microsoft/perfview#2380
* Implement Symbol Demanglers for Linux Binaries by @​brianrob in
microsoft/perfview#2383
* Fix NullReferenceException race condition in
TraceLog.AllocLookup/FreeLookup by @​Copilot in
microsoft/perfview#2387
* Add typed schema for AllocationSampled (EventID 303, .NET 10+) in
ClrTraceEventParser by @​Copilot in
microsoft/perfview#2388
* Add ElfSymbolModule for Parsing ELF Symbol Tables by @​brianrob in
microsoft/perfview#2384
* Update BDN to latest version. by @​cincuranet in
microsoft/perfview#2389
* Fixed overflow when working with large dumps by @​remilema in
microsoft/perfview#2399
* Fix XamlMessageBox STA Threading Crash from Background Threads by
@​brianrob in microsoft/perfview#2400
* Add ELF Symbol Resolution for Linux .nettrace Traces by @​brianrob in
microsoft/perfview#2397
* Add Missing WCF Event Templates by @​brianrob in
microsoft/perfview#2390

## New Contributors
* @​AftabAnsari10662 made their first contribution in
microsoft/perfview#2370
* @​remilema made their first contribution in
microsoft/perfview#2399

**Full Changelog**:
microsoft/perfview@v3.1.30...v3.2.0

## 3.1.30

## What's Changed
* doc: fix typos by @​chinwobble in
microsoft/perfview#2359
* Fix SourceLink parsing to support both wildcard and exact path
mappings by @​ivberg in microsoft/perfview#2355
* add horizontal scrolling to eventviewer by @​logangeorge01 in
microsoft/perfview#2361
* Add SHA-384 and SHA-512 hash algorithm support for PDB checksums by
@​Copilot in microsoft/perfview#2366

## New Contributors
* @​chinwobble made their first contribution in
microsoft/perfview#2359
* @​logangeorge01 made their first contribution in
microsoft/perfview#2361

**Full Changelog**:
microsoft/perfview@v3.1.29...v3.1.30

## 3.1.29

## What's Changed
* Warn users when circular buffer overflow causes missing type info in
allocation views for selected processes by @​Copilot in
microsoft/perfview#2326
* Special-Case BitMask Parsing by @​brianrob in
microsoft/perfview#2327
* Refactor PEFile and PEHeader to use ReadOnlySpan exclusively with
zero-copy buffer sharing by @​Copilot in
microsoft/perfview#2317
* Fix cdbstack parser dropping last sample and missing metrics by
@​Copilot in microsoft/perfview#2329
* Fix unhandled ArgumentOutOfRangeException when exporting FlameGraph
with unrendered canvas by @​Copilot in
microsoft/perfview#2339
* Add guidance for capturing ETW traces in Kubernetes pods by @​Copilot
in microsoft/perfview#2344
* Fix merge command line order in kubernetes documentation by @​Copilot
in microsoft/perfview#2346
* Fix GetRegisteredOrEnabledProviders() documentation claiming list is
small by @​Copilot in microsoft/perfview#2348
* Fix duplicate stringTable elements in instrumentation manifest by
@​Copilot in microsoft/perfview#2347
* Fix Histogram.AddMetric losing values after single-bucket to array
transition by @​Copilot in
microsoft/perfview#2337
* Fix clipboard copy formatting based on selection dimensions in Stack
Viewer by @​Copilot in microsoft/perfview#2332
* Fix XML escaping in GetManifestForRegisteredProvider by @​Copilot in
microsoft/perfview#2351
* Fix race condition in ProviderNameToGuid causing
ERROR_INSUFFICIENT_BUFFER crashes by @​Copilot in
microsoft/perfview#2357


**Full Changelog**:
microsoft/perfview@v3.1.28...v3.1.29

## 3.1.28

## What's Changed
* Add support for Boolean8 to NetTrace V6. by @​noahfalk in
microsoft/perfview#2318
* Implement A Thread Time View for Universal Traces by @​brianrob in
microsoft/perfview#2320
* Remove Incorrect Argument Description by @​brianrob in
microsoft/perfview#2323

**Full Changelog**:
microsoft/perfview@v3.1.26...v3.1.28

## 3.1.26

Roll-up through 2025/10/10.

* Only dispose non-null handles in `ETWTraceEventSource`
[#​2291](microsoft/perfview#2291)
* Small cleanup in `NettraceUniversalConverter`
[#​2292](microsoft/perfview#2292)
* Fix hyperlink focus visibility in dark mode and improve keyboard
navigation [#​2295](microsoft/perfview#2295)
* Gracefully handle invalid characters in `PATH`
[#​2296](microsoft/perfview#2296)
* Fix copying First/Last columns with pipe symbols to work in time range
input [#​2304](microsoft/perfview#2304)


## 3.1.24

Roll-up through 2025/08/26.

* Implement NuGet Central Package Version Management [#​2262]
* Fix broken stacks warning for universal traces [#​2268]
* Fix jitted code symbols in universal traces to show assembly names
instead of memfd:doublemapper [#​2269]
* Use themed background brush for menu and filter [#​2272]
* Improve rendering and dark mode [#​2274]
* Implement configurable symbol server authentication with /SymbolsAuth
command line argument for PerfView and HeapDump [#​2278]
* Add a themed dialog [#​2276]
* Fix regression: "Goto Item in Callers/Callees" now accumulates across
all threads [#​2284]
* Fix parsing issues and add support for additional events to the Linux
perf text file parser [#​2286]
* Fix TraceLog live session RelatedActivityID/ContainerID corruption by
preserving ExtendedData [#​2285]
* NetTrace LabelList metadata overrides and metadata flushing [#​2281]
* Fix NullReferenceException in ProviderBrowser.LevelSelected when
deselecting level [#​2289]

## 3.1.23

Roll-up through 2025/07/11.

- Fixed TraceEvent CaptureState API to support previously unsupported
keyword configurations. [#​2222]
- Added Exception Stacks view for .nettrace files to enhance exception
diagnostics. [#​2223]
- Corrected outdated documentation references to "GC Heap Alloc Stacks".
[#​2224]
- Fixed off-by-one error in P/Invoke buffer handling for Windows volume
events. [#​2227]
- Fixed broken links in the PerfView user guide. [#​2225]
- Improved error handling by throwing when TdhEnumerateProviders fails,
enabling better diagnostics. [#​2177]
- Added AutomationProperties.Name to the Process Selection DataGrid for
improved accessibility. [#​2239]
- Fixed focus indicator visibility for hyperlinks in dark mode and high
contrast themes. [#​2235]
- Addressed NullReferenceException in Anti-Malware view. [#​2233]
- Fixed WebView2 crash on close by implementing proper disposal pattern.
[#​2230]
- Added support for native AOT gcdumps, expanding compatibility with
modern .NET workloads. [#​2242]
- Fixed NVDA screen reader issue where Theme menu items did not announce
selection state. [#​2237]
- Extended PredefinedDynamicTraceEventParser to support dynamic events
from additional sources. [#​2232]
- Implemented MSFZ symbol format support in SymbolReader. [#​2244]
- Removed usage of DefaultAzureCredential, simplifying authentication
dependencies. [#​2255]
- Added option to hide TimeStamp columns in the EventWindow View menu.
[#​2247]
- Fixed NVDA screen reader reporting incorrect list count for File menu
separators. [#​2257]
- Fixed unhandled exception when double-clicking in scroll bar area with
no content. [#​2254]
- Fixed universal symbol conversion for overlapping mappings. [#​2252]
- Fixed TraceEvent.props to respect ProcessorArchitecture when
RuntimeIdentifier is set. [#​2249]

## 3.1.22

Roll-up through 2025/06/04.

- Added GC Heap Analyzer support for .nettrace files to enhance memory
analysis workflows. [#​2216]
- Introduced PredefinedDynamicTraceEventParser for known
TraceLogging events, improving trace event parsing. [#​2220]
- Enabled selection of process trees in the process selection dialog for
multi-process analysis, allowing deeper inspection across related
processes. [#​2195]
- Implemented sorting for the Duration column in the process selection
dialog using TotalDurationSeconds, improving usability. [#​2194]
- Improved NetTrace parameter parsing for better command-line
flexibility. [#​2200]
- Fixed GetActiveSessionNames to handle ERROR_MORE_DATA, resolving
session enumeration issues. [#​2196]
- Fixed ObjectDisposedException when opening Net OS Heap Alloc Stacks,
improving stability. [#​2212]
- Fixed null reference exception in GenFragmentationPercent method,
enhancing reliability. [#​2211]
- Fixed TreeView auto-expansion when opening trace files. [#​2218]
- Fixed StackViewer issue where "Set Time Range" reset "Goto Items by
callees". [#​2208]
- Fixed markdown table formatting when copying from the stack viewer.
[#​2203]
- Fixed TraceEvent NuGet package to exclude Windows-specific native
DLLs. [#​2215]
- Removed PDB generation for .NET Core assemblies using CrossGen,
reducing build overhead. [#​2202]
- Made symbol server timeout configurable and removed dead code in
SymbolReader. [#​2209]
- Changed help ribbons to use textblocks, enabling tab navigation.
[#​2201]

## 3.1.21

Roll-up through 2025/05/02.

- Change NetTrace format version support
- Add /OSHeapMaxMB to set a max size for OS heap sessions
- Implement Nettrace Support for Traces with Universal Providers
- Implement R2R Symbol Lookup for Linux Traces
- Fix NetTrace parsing for Start/Stop event names
- Fix IndexOutOfRangeException in ProcessGlobalHistory

## 3.1.20

Roll-up through 2025/04/01.

- Flamegraph and drill-in menu improvements
- Performance improvements around unhandled event dispatch
- Add configurable real-time delay in TraceLogEventSource
- Don't queue another flush during a real-time ETW session if one is
already in-process
- Allow configuration of rundown providers for real-time EventPipe
sessions
- Fix stack handling for NetTrace V4
- Add multi-line view for events viewer
- Misc accessibility fixes

## 3.1.19

Roll-up through 2025/01/30.

- Added missing time information in the Raw XML View for GCStats.
- Updated activity computation logic to support OpenTelemetry events.
- Changed timestamp values to use QPC time based on UTC for Relogger.
- Fixed issues with report command handling.
- Addressed various POH-related issues.
- Implemented file-size limitation for rundown using an ETW-based
approach.

## 3.1.18

Roll-up through 2024/12/11.

- Fixed `perfcollect` install script on Azure Linux 3.
- Updated `System.Text.Json` to address
dotnet/announcements#329.

## 3.1.17

Roll-up through 2024/11/08.

- Numerous accessibility fixes to PerfView. This includes switching out
the previous web browser plugin to use WebView2.
- Breaking changes to FastSerialization to ensure that only expected
types are deserialized. This addresses potential vulnerabilities during
deserialization of untrusted input. Details at
microsoft/perfview#2121.

Commits viewable in [compare
view](microsoft/perfview@v3.1.16...v3.2.4).
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>