[WIP][Perf] Optimize allocations in the layout engine #34155
Draft
simonrozsival wants to merge 11 commits into net11.0 from
Conversation
… reuse
- Convert Cell, Definition, and GridStructure from class to struct
- Use ArrayPool for IView[], Cell[], and Definition[] arrays
- Track actual counts (_childCount, _rowCount, _columnCount) for rented arrays
- Add int defsCount parameter to all static methods operating on Definition[]
- Reuse Dictionary<SpanKey, double> across measure passes (Clear instead of new)
- Convert SpanKey to IEquatable<SpanKey> with HashCode.Combine
- Convert foreach to indexed for loops in ArrangeChildren
- Add lazy Dictionary initialization for no-span grids
- Result: Grid layout achieves 0 B managed allocations (was 87-457 KB)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
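The first three bullets interact: once `Definition` is a struct stored in a rented array, loops must stop at the tracked count and mutations must go through the indexer. The following is a minimal sketch of that pattern, not the PR's code; `DefinitionSketch` and `PooledDefinitionsSketch` are hypothetical names, and only a `Size` field is modeled.

```csharp
using System;
using System.Buffers;

// Hypothetical stand-in for a struct-based row/column definition.
public struct DefinitionSketch
{
    public double Size;
}

public static class PooledDefinitionsSketch
{
    // Rents an array for `count` definitions, sets each Size in place,
    // and sums only the first `count` slots (the rented array may be longer).
    public static double FillAndSum(int count, double size)
    {
        DefinitionSketch[] defs = ArrayPool<DefinitionSketch>.Shared.Rent(count);
        try
        {
            for (int n = 0; n < count; n++)
            {
                // Mutate through the indexer so the array element itself changes.
                defs[n].Size = size;
            }

            double total = 0;
            for (int n = 0; n < count; n++) // use count, never defs.Length
            {
                total += defs[n].Size;
            }
            return total;
        }
        finally
        {
            ArrayPool<DefinitionSketch>.Shared.Return(defs);
        }
    }

    // Shows the copy-mutation pitfall: the local is a copy of the struct,
    // so the array element keeps its original Size.
    public static double MutateCopy(double newSize)
    {
        var defs = new DefinitionSketch[1];
        var def = defs[0];   // copies the struct out of the array
        def.Size = newSize;  // changes only the copy
        return defs[0].Size; // unchanged element
    }
}
```

`MutateCopy` returns the element's original `Size` regardless of `newSize`, which is why the struct conversion requires auditing every `var x = array[i]; x.Field = ...` site.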
- Replace float[] size = {w, h} in SelfSizing with two local variables
- Add [InlineArray(4)] FrameBuffer for Flex.Item.Frame (NET8_0_OR_GREATER)
- Use ArrayPool for ordered_indices and lines arrays in flex_layout
- Convert lines array growth from Array.Resize(+1) to doubling strategy
- Convert foreach to indexed for loops in FlexLayoutManager
- Change frame index fields from uint to int (InlineArray requirement)
- Result: Core Flex engine achieves 0 B managed allocations
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
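The switch from `Array.Resize(+1)` to doubling can be sketched with a pooled buffer as below. This is an illustration under assumptions, not the Flex engine's code: `PooledGrowthSketch`, `Append`, and `CountGrowths` are hypothetical names, and `int` stands in for the line type.

```csharp
using System;
using System.Buffers;

public static class PooledGrowthSketch
{
    // Appends `value` to a pooled buffer, at least doubling capacity when full.
    // Returns the buffer the caller should keep using; `count` is the used length.
    public static int[] Append(int[] buffer, ref int count, int value)
    {
        if (count == buffer.Length)
        {
            int newCapacity = Math.Max(4, buffer.Length * 2);
            int[] bigger = ArrayPool<int>.Shared.Rent(newCapacity);
            Array.Copy(buffer, bigger, count);      // manual copy of the used part
            ArrayPool<int>.Shared.Return(buffer);   // hand the old array back
            buffer = bigger;
        }
        buffer[count++] = value;
        return buffer;
    }

    // Counts how many times the buffer is replaced while appending n items.
    public static int CountGrowths(int n)
    {
        int[] buffer = ArrayPool<int>.Shared.Rent(1);
        int count = 0;
        int growths = 0;
        try
        {
            for (int i = 0; i < n; i++)
            {
                int[] before = buffer;
                buffer = Append(buffer, ref count, i);
                if (!ReferenceEquals(before, buffer)) growths++;
            }
            return growths;
        }
        finally
        {
            ArrayPool<int>.Shared.Return(buffer);
        }
    }
}
```

With doubling, the number of buffer replacements grows with the logarithm of the item count, whereas growing by one element replaces the buffer on every append.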
- Cache ILayout.Count and Spacing in local variables
- Convert foreach to indexed for loops in Measure/ArrangeChildren
- Cache childCount in StackLayoutManager.UsesExpansion
- Result: Stack layout maintains 0 B allocations with real objects
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add InvalidationEventArgs.GetCached(trigger) for cached instances
- Replace new InvalidationEventArgs(trigger) in VisualElement, Page, Layout
- Result: 0 B per invalidation dispatch
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
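A cached-instance lookup of this kind might look like the sketch below. The type and member names (`TriggerSketch`, `InvalidationArgsSketch`) and the enum values are hypothetical stand-ins, and the fallback branch for unlisted values is an assumption rather than something the commit states.

```csharp
using System;

// Hypothetical stand-in for the trigger enum.
public enum TriggerSketch
{
    Undefined = 0,
    MeasureChanged = 1,
    SizeRequestChanged = 2,
}

// Hypothetical stand-in for the event args; immutable, so sharing is safe.
public sealed class InvalidationArgsSketch : EventArgs
{
    static readonly InvalidationArgsSketch s_undefined =
        new InvalidationArgsSketch(TriggerSketch.Undefined);
    static readonly InvalidationArgsSketch s_measureChanged =
        new InvalidationArgsSketch(TriggerSketch.MeasureChanged);
    static readonly InvalidationArgsSketch s_sizeRequestChanged =
        new InvalidationArgsSketch(TriggerSketch.SizeRequestChanged);

    public InvalidationArgsSketch(TriggerSketch trigger)
    {
        Trigger = trigger;
    }

    public TriggerSketch Trigger { get; }

    // Returns a shared instance for known triggers; allocates only for
    // values that have no cached instance (assumed fallback).
    public static InvalidationArgsSketch GetCached(TriggerSketch trigger)
    {
        switch (trigger)
        {
            case TriggerSketch.Undefined: return s_undefined;
            case TriggerSketch.MeasureChanged: return s_measureChanged;
            case TriggerSketch.SizeRequestChanged: return s_sizeRequestChanged;
            default: return new InvalidationArgsSketch(trigger);
        }
    }
}
```

Sharing instances is only safe because the args object exposes no mutable state; a handler that could modify the instance would leak changes to every later dispatch.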
- LayoutAllocBenchmarker: lightweight fakes for TRUE allocation measurement
- LayoutHotPathBenchmarker: hot-path benchmarks with NSubstitute/Controls objects
- InvalidationBenchmarker: invalidation event dispatch benchmark
- initial-analysis.md: detailed performance analysis with results
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The remaining Flex Controls-layer allocations (64 B/child/pass) are caused by BindableProperty.SetValue boxing doubles for X/Y/Width/Height in VisualElement.UpdateBoundsComponents. Generic BindableProperty<T> will eliminate this overhead.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
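The boxing cost comes from passing a `double` through an `object`-typed parameter. The sketch below isolates that effect; `BoxingSketch` and its methods are hypothetical and do not model `BindableProperty` itself.

```csharp
using System;

public static class BoxingSketch
{
    // Mimics a non-generic setter: any value type argument is boxed
    // when converted to object at the call site.
    public static object StoreAsObject(object value)
    {
        return value;
    }

    // Mimics a generic setter: the value stays a T, no conversion to object.
    public static T StoreGeneric<T>(T value)
    {
        return value;
    }

    // Each conversion of `v` to object produces a separate box,
    // so the two results are never the same reference.
    public static bool BoxesEachCall(double v)
    {
        object first = StoreAsObject(v);
        object second = StoreAsObject(v);
        return !ReferenceEquals(first, second);
    }
}
```

Four such conversions per child per arrange pass (X, Y, Width, Height) would be consistent with the per-child figure quoted above, although the exact byte count depends on the runtime.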
The analysis content has been incorporated into the PR description and issue #34154.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Let's not wait for .net11 for these awesome changes! 👯
Revert foreach→for and Count/Spacing caching in Stack and FlexLayoutManager: benchmarks confirm these had zero allocation impact (the compiler already optimizes foreach on concrete types, and Stack already used indexed for loops).
Remove manual LoopCount loops from benchmarks and let BenchmarkDotNet handle iteration for proper statistical analysis. Per-operation numbers are now directly readable.
Grid ArrangeChildren retains the foreach→for change because foreach on IGridLayout (interface) boxes the List<IView>.Enumerator struct (verified: 1.56 KB/op with foreach vs 0 B with for).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
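The difference between the two loop forms can be seen without a benchmark: enumerating a `List<T>` through an interface yields its struct enumerator behind an interface reference, which requires a box. The sketch below is a hedged illustration with hypothetical names (`EnumeratorBoxingSketch`), using `IList<int>` in place of `IGridLayout`.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorBoxingSketch
{
    // Obtains the enumerator through the interface and reports its runtime type.
    // For a non-empty List<int> this is the struct List<int>.Enumerator,
    // held via an IEnumerator<int> reference, i.e. boxed.
    public static Type EnumeratorTypeViaInterface(IEnumerable<int> items)
    {
        using (IEnumerator<int> e = items.GetEnumerator())
        {
            return e.GetType();
        }
    }

    // foreach over the interface: goes through IEnumerable<int>.GetEnumerator().
    public static int SumForeach(IList<int> items)
    {
        int sum = 0;
        foreach (int x in items) sum += x;
        return sum;
    }

    // Indexed loop: uses Count and the indexer, no enumerator involved.
    public static int SumFor(IList<int> items)
    {
        int sum = 0;
        for (int i = 0; i < items.Count; i++) sum += items[i];
        return sum;
    }
}
```

Both loops compute the same result; only the indexed form avoids creating an enumerator. Whether a particular runtime version manages to elide the box is not something this sketch asserts, which is why the commit relies on a measured benchmark.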
- Grid: construct GridStructure into local first, then return old arrays and swap; prevents dangling state if constructor throws
- Flex: wrap layout_item body in try/finally to ensure ArrayPool arrays are always returned via cleanup() even on exceptions
- Grid: add comment documenting ArrayPool lifecycle and IDisposable consideration for GridLayoutManager
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
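The "construct into a local, then swap" ordering from the first bullet can be sketched as follows. This is an assumed shape, not the `GridLayoutManager` implementation; `PooledBufferOwnerSketch` and its members are hypothetical, and an `int[]` stands in for the grid's arrays.

```csharp
using System;
using System.Buffers;

public sealed class PooledBufferOwnerSketch
{
    int[] _items = Array.Empty<int>();
    int _count;
    bool _rented;

    public int Count
    {
        get { return _count; }
    }

    public int this[int index]
    {
        get
        {
            if ((uint)index >= (uint)_count)
                throw new ArgumentOutOfRangeException(nameof(index));
            return _items[index];
        }
    }

    // Builds the new state in a local first. If `fill` throws, the new array
    // goes back to the pool and the previous state stays intact.
    public void Rebuild(int count, Func<int, int> fill)
    {
        int[] fresh = ArrayPool<int>.Shared.Rent(count);
        try
        {
            for (int i = 0; i < count; i++) fresh[i] = fill(i);
        }
        catch
        {
            ArrayPool<int>.Shared.Return(fresh);
            throw;
        }

        // Only after construction succeeded: return the old array, then swap.
        if (_rented) ArrayPool<int>.Shared.Return(_items);
        _items = fresh;
        _count = count;
        _rented = true;
    }
}
```

Returning the old array before building the new one would leave the owner pointing at an array it no longer owns whenever construction fails.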
pictos
reviewed
Feb 20, 2026
// Since no rows are specified, we'll create an implied row 0
return Implied();
_rows = ArrayPool<Definition>.Shared.Rent(1);
_rows[0] = new Definition(GridLength.Star);
Contributor
For this scenario, wouldn’t it be better to have a cached Definition[] oneRow = [new Definition(GridLength.Star)]? The smallest array returned by ArrayPool is of length 16, so this is maybe an overhead (?)
Same for InitializeColumns.
Member
Author
Thanks for the suggestion, that makes a lot of sense.
For grids without explicit row/column definitions, cache the single-element Definition[] arrays on the GridLayoutManager instead of renting them from ArrayPool. The smallest pool bucket is 16 elements, so caching a reusable Definition[1] avoids unnecessary pool overhead for the common case.
Suggested-by: Pictos
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member
Author
Pushed 3bc89bd, which caches the implied arrays on the manager instead of renting them from the pool. When no explicit row/column definitions are specified, the cached array is reused across measure/arrange calls. Still 0 B allocated in benchmarks.
Extend the caching pattern from implied-only to all row/column arrays. ArrayPool<Definition> is now completely eliminated: the manager owns exact-sized Definition[] arrays that are reused across layout passes. This avoids ArrayPool's minimum bucket of 16 elements, which was wasteful for typical grids with 1-6 rows/columns.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
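An owner-managed, exact-sized array that is reused across passes could look like the sketch below. The names (`ExactSizedCacheSketch`, `GetSizes`) are hypothetical, `double` stands in for `Definition`, and the reallocate-on-count-change policy is an assumption about how such a cache would behave.

```csharp
using System;

public sealed class ExactSizedCacheSketch
{
    double[] _sizes = Array.Empty<double>();

    // Returns an array of exactly `count` elements. The same instance is
    // reused (and reset) while `count` is unchanged; a new one is allocated
    // only when the number of rows/columns changes.
    public double[] GetSizes(int count)
    {
        if (_sizes.Length != count)
        {
            _sizes = new double[count];
        }
        else
        {
            Array.Clear(_sizes, 0, count);
        }
        return _sizes;
    }
}
```

In steady state the row and column counts rarely change, so after the first pass this allocates nothing, and because the array length equals the count, no separate count field is needed.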
This was referenced Feb 20, 2026
Fixes #34154
Description
This PR eliminates all managed heap allocations in the Core layout engine (Grid, Flex) during steady-state measure+arrange passes. The changes target the hot path that runs on every layout cycle — in a typical app with scrolling lists or animated layouts, this path executes thousands of times per second.
Why this matters
Every allocation in the layout hot path contributes to GC pressure. On mobile devices (Android/iOS), Gen0 collections during layout can cause frame drops. By eliminating allocations entirely from the Core layout engine, we remove GC as a variable in layout performance.
What changed
GridLayoutManager.cs (largest change)
1. GridStructure class → struct
The `GridStructure` class was allocated on every `Measure()` call. Converting it to a struct and storing it as a field on `GridLayoutManager` eliminates this allocation. Because structs are value types, method calls on struct fields operate directly on the field (no defensive copies); the field is intentionally non-readonly.
Uses a `_hasGridStructure` bool instead of a nullable (can't use `Nullable<GridStructure>` because `.Value` would copy the entire struct).
2. Cell class → struct
Each `Cell` tracked a child's grid position and measurement constraints. Converting to a struct eliminates per-child allocations. Methods that mutate cells now use `ref Cell` parameters to avoid copy-mutation bugs.
3. Definition class → struct
Each `Definition` tracked a row/column's size and grid length. It is now a struct, with `readonly` modifiers added to pure getters. Fixed a pre-existing copy-mutation bug in `EnsureSizeLimit` where `var def = defs[n]; def.Size = newSize;` silently mutated a copy; it now correctly uses `defs[n].Size = newSize;`.
4. SpanKey record → readonly struct with IEquatable
The `SpanKey` was a `record` (reference type with heap allocation). Converted to a `readonly struct` implementing `IEquatable<SpanKey>` to eliminate allocations when used as Dictionary keys. Implements a proper `GetHashCode()` using `HashCode.Combine` (with a netstandard fallback).
5. Span class eliminated
The `Span` class bundled a `SpanKey` with a `Requested` double. Eliminated the class entirely: `TrackSpan` now takes individual parameters and the dictionary stores `Dictionary<SpanKey, double>` directly.
6. ArrayPool for all arrays
All four arrays in `GridStructure` now use `ArrayPool<T>.Shared`:
- `_childrenToLayOut` (`IView[]`): cleared on return to avoid holding references
- `_cells` (`Cell[]`): struct array, no clearing needed
- `_rows` (`Definition[]`): struct array, no clearing needed
- `_columns` (`Definition[]`): struct array, no clearing needed
Rented arrays may be larger than requested; actual counts are tracked via `_childCount`, `_rowCount`, `_columnCount`. All array loops use these counts instead of `.Length`.
`ReturnArrays()` is called at the start of `Measure()` before creating a new `GridStructure`. The new structure is constructed into a local first, then the old arrays are returned and the field is swapped; this ensures exception safety if the constructor throws.
7. Dictionary reuse for span tracking
A `Dictionary<SpanKey, double>? _spansDictionary` field on `GridLayoutManager` is passed into `GridStructure` via the constructor. On subsequent calls, `.Clear()` reuses the dictionary instead of allocating a new one. Lazy initialization (`_spans ??= new()`) still works for grids with no spanning children.
8. foreach → for in ArrangeChildren
`foreach` on `IGridLayout` (interface dispatch) boxes the `List<IView>.Enumerator` struct, verified by benchmark: 1.56 KB/op with `foreach` vs 0 B with indexed `for`. Internal array loops (`_cells`, `_rows`, `_columns`) also use `for` with count because ArrayPool-rented arrays are oversized.
Flex.cs
9. SelfSizing float[] elimination
`SelfSizingDelegate` was called with `float[] size = new float[2] { w, h }`, allocating a 2-element array per child per layout pass. Replaced with `ref float width, ref float height` parameters, eliminating the allocation entirely.
10. InlineArray(4) for Frame buffer
`Item.Frame` was `float[] Frame { get; } = new float[4]`, so each Item allocated a 4-element float array. Replaced with `[InlineArray(4)] struct FrameBuffer` that stores the 4 floats inline in the Item. Conditional on `NET8_0_OR_GREATER` with a `float[]` fallback for netstandard.
11. ArrayPool for ordered_indices and lines
- `ordered_indices`: `ArrayPool<int>.Shared.Rent(item.Count)` in `flex_layout.init`, returned in `cleanup()`
- `lines` (flex wrap lines): `ArrayPool<flex_layout_line>.Shared.Rent(newCapacity)` with manual copy+return for growth. Changed growth strategy from `Array.Resize(+1)` (linear, N allocations for N lines) to doubling (logarithmic).
- `cleanup()` wrapped in `try/finally` to ensure arrays are always returned, even if layout throws.
InvalidationEventArgs.cs
12. Static cached instances
Added `InvalidationEventArgs.GetCached(InvalidationTrigger)` that returns static singletons per trigger value. Replaced `new InvalidationEventArgs(trigger)` in `VisualElement`, `Page`, and legacy `Layout`. These fire on every measure invalidation, which happens frequently during layout.
What we tried and didn't work (across all engines)
- NSubstitute-based benchmarking for allocation measurement: NSubstitute mocks add 40–200% allocation noise that completely obscures real optimization gains. Mock indexer calls (`_grid[n]`) allocate tracking objects per invocation. Solution: created `LayoutAllocBenchmarker` with lightweight hand-written fakes implementing `IGridLayout`/`IStackLayout` directly.
- Optimizing remaining Flex Controls-layer allocations: The remaining ~848 B (12 children) ≈ 71 B per child per pass. Traced through the call chain: `FlexLayoutManager.ArrangeChildren` → `child.Arrange(frame)` → `VisualElement.ArrangeOverride` → `UpdateBoundsComponents` → sets X, Y, Width, Height via `BindableObject.SetValue(property, doubleValue)`. Each `SetValue` boxes the `double` argument. This is fundamental to how `BindableProperty` works; fixing it requires generic `BindableProperty<T>` (tracked in [Perf] Eliminate value-type boxing in BindableObject.SetValue #34080).
- Stack/FlexLayoutManager foreach→for and Count/Spacing caching: Benchmarks confirmed these had zero allocation impact. The compiler already optimizes `foreach` on concrete types, and Stack was already allocation-free. These changes were reverted to minimize maintenance overhead.
- Grid `_cells[]` foreach→for: The C# compiler already optimizes `foreach` on arrays to indexed access, so there is no enumerator boxing.
New Benchmarks
LayoutAllocBenchmarker
Lightweight fake objects (no NSubstitute) for true allocation measurement. Includes `FakeView`, `FakeGridLayout`, `FakeStackLayout`, `FakeRowDefinition`, `FakeColumnDefinition`. Benchmarks Grid, VStack, HStack, and Flex Core engine with `[Params]` for ChildCount (12, 60) and UseSpans (true, false).
LayoutHotPathBenchmarker
Uses NSubstitute for Grid/Stack and real Controls objects (`FlexLayout` + `Border` children) for Flex. Measures the full Controls-layer stack including `VisualElement.Measure/Arrange`.
InvalidationBenchmarker
Measures `InvalidationEventArgs` dispatch allocation (before/after static caching).
Benchmark Results
LayoutAllocBenchmarker — Core layer, lightweight fake objects
This benchmark uses hand-written fake `IView`/`IGridLayout`/`IStackLayout` implementations (no NSubstitute) to measure true layout engine allocations without mock infrastructure noise.
Baseline = `origin/net11.0` with identical benchmark code copied over.
Grid (1× Measure + 1× Arrange per invocation)
Raw BenchmarkDotNet output — baseline (net11.0)
Raw BenchmarkDotNet output — optimized (this PR)
(Gen0/Gen1/Gen2 all zero — omitted)
Flex Core engine (1× Layout per invocation, no Controls layer)
Raw BenchmarkDotNet output — baseline (net11.0)
Raw BenchmarkDotNet output — optimized (this PR)
Stack (1× Measure + 1× Arrange per invocation)
Stack layout was already allocation-free. No changes to Stack code in this PR.
LayoutHotPathBenchmarker — Flex end-to-end with real Controls objects
This benchmark uses real `FlexLayout` + `Border` children (Controls layer) to measure allocations through the full stack including `VisualElement.Measure/Arrange`.
The remaining ~848 B (12 children) ≈ 71 B per child, traced to `VisualElement.UpdateBoundsComponents` boxing doubles into `BindableObject.SetValue` for X/Y/Width/Height. This will be fixed by generic `BindableProperty<T>` (#34080).
Raw BenchmarkDotNet output — baseline (net11.0)
Raw BenchmarkDotNet output — optimized (this PR)
Note on GridLayoutManagerBenchMarker (existing, NSubstitute-based)
The existing `GridLayoutManagerBenchMarker` uses NSubstitute mocks. After struct conversions, each `_grid[n]` indexer call on a mocked `IGridLayout` allocates NSubstitute tracking objects; this is a benchmark artifact, not a real regression. The `LayoutAllocBenchmarker` with lightweight fakes confirms the optimizations work correctly with real objects.
Test Status
All 441 existing tests pass (394 Core layout + 47 Controls layout).