
Span<T> wrapped in a struct isn't performing as fast as it could be #68797

Closed
@essoperagma

Description


Brief intro:

I'm developing a binary serialization library with two major requirements:

  1. offering high performance
  2. minimizing user errors by utilizing the type system as much as possible.

For requirement #1, I'm heavily using Span. I'm quite happy with the performance and how clean the internal serializer implementation is.

For requirement #2, I'm trying to prevent users from accidentally passing the wrong Span to a method.
As a solution, I wrap the Span in my own struct, WrappedSpan. All my methods take a WrappedSpan and do not accept a plain Span.
If a user passes a Span instead of a WrappedSpan to any of the serialization methods, the mistake shows up as a compile-time error.
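A minimal sketch of how the wrapper turns a wrong argument into a compile error, assuming a WrappedSpan with a single public Span<byte> field like the one defined further down. Serializer is a hypothetical stand-in for the library's public API, not the author's actual code.

```csharp
using System;
using System.Runtime.InteropServices;

// Sketch only: Serializer is a hypothetical stand-in for the library's public API.
Span<byte> buffer = stackalloc byte[8];
var wrapper = new WrappedSpan { Span = buffer };

Serializer.WriteInt32(ref wrapper, 42);
Console.WriteLine(wrapper.Span.Length); // prints 4: the wrapper now points past the written Int32

// The next line is rejected by the compiler, because ref Span<byte> does not convert to ref WrappedSpan:
// Serializer.WriteInt32(ref buffer, 42);

public ref struct WrappedSpan
{
    public Span<byte> Span;
}

public static class Serializer
{
    // Accepts only the wrapper; a bare Span<byte> cannot be passed by mistake.
    public static void WriteInt32(ref WrappedSpan wrapper, int value)
    {
        MemoryMarshal.Cast<byte, int>(wrapper.Span)[0] = value; // native-endian write
        wrapper.Span = wrapper.Span.Slice(sizeof(int));          // advance past the 4 bytes
    }
}
```

The type check costs nothing at the source level; whether it costs anything in the generated code is exactly what the rest of this issue is about.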

The problem:

I noticed that in some scenarios (many small, chained method calls) a regular Span<byte> is about twice as fast as a Span<byte> wrapped in an otherwise-empty struct: the wrapped version takes roughly twice as long.

The method using Span<byte>:

public MethodHost_Span WriteInt32_Span(ref Span<byte> span, int value)
{
    MemoryMarshal.Cast<byte, int>(span)[0] = value;
    span = span.Slice(sizeof(int));
    // Return of zero-size struct is needed for method chaining.
    return default(MethodHost_Span);
}

The method using WrappedSpan:

public MethodHost_WrappedSpan WriteInt32_WrappedSpan(ref WrappedSpan wrapper, int value)
{
    MemoryMarshal.Cast<byte, int>(wrapper.Span)[0] = value;
    wrapper.Span = wrapper.Span.Slice(sizeof(int));
    // Return of zero-size struct is needed for method chaining.
    return default(MethodHost_WrappedSpan);
}

public ref struct WrappedSpan
{
    public Span<byte> Span;
}
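To run the two methods side by side, here is a self-contained sketch. MethodHost_Span and MethodHost_WrappedSpan are not shown in the issue; they are assumed here to be empty structs hosting the methods as instance members, which is one way the `return default(...)` chaining could work. The sketch only checks that both versions write the same bytes; it says nothing about their relative speed.

```csharp
using System;
using System.Runtime.InteropServices;

// Plain Span<byte>: the caller keeps a "rest" span that each call advances.
Span<byte> a = stackalloc byte[8];
Span<byte> rest = a;
default(MethodHost_Span)
    .WriteInt32_Span(ref rest, 1)
    .WriteInt32_Span(ref rest, 2);

// WrappedSpan: the same writes, but the span lives inside the wrapper struct.
Span<byte> b = stackalloc byte[8];
var wrapper = new WrappedSpan { Span = b };
default(MethodHost_WrappedSpan)
    .WriteInt32_WrappedSpan(ref wrapper, 1)
    .WriteInt32_WrappedSpan(ref wrapper, 2);

// Both styles leave identical bytes behind and consume the whole 8-byte buffer.
Console.WriteLine(a.SequenceEqual(b));                      // True
Console.WriteLine(rest.Length + " " + wrapper.Span.Length); // 0 0

public ref struct WrappedSpan
{
    public Span<byte> Span;
}

// Hypothetical empty host struct; returning default(...) enables method chaining.
public struct MethodHost_Span
{
    public MethodHost_Span WriteInt32_Span(ref Span<byte> span, int value)
    {
        MemoryMarshal.Cast<byte, int>(span)[0] = value;
        span = span.Slice(sizeof(int));
        return default(MethodHost_Span);
    }
}

// Hypothetical empty host struct for the wrapped variant.
public struct MethodHost_WrappedSpan
{
    public MethodHost_WrappedSpan WriteInt32_WrappedSpan(ref WrappedSpan wrapper, int value)
    {
        MemoryMarshal.Cast<byte, int>(wrapper.Span)[0] = value;
        wrapper.Span = wrapper.Span.Slice(sizeof(int));
        return default(MethodHost_WrappedSpan);
    }
}
```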

The actual benchmark method is here.

More details:

  • All this is happening on .NET 7.0.100-preview.3.22179.4, but I got similar results on .NET 6 and .NET 5.
  • Setting DOTNET_TieredPGO doesn't impact the performance.
  • The full benchmark project can be found here.
  • Outputs from Disasmo can be found here in the same repo.
  • Local runtime build used by Disasmo is at commit: 3535e0769f202ae4cd820bea24afd20cee313966
  • I'm on Windows 10, Version 10.0.19044 Build 19044
  • Building for amd64.

BenchmarkDotNet results with different data types:

| Method                       | Mean     | Error     | StdDev    |
|------------------------------|---------:|----------:|----------:|
| WriteMany_Int32_Span         | 1.163 us | 0.0122 us | 0.0114 us |
| WriteMany_Int32_WrappedSpan  | 2.169 us | 0.0194 us | 0.0172 us |
| WriteMany_Single_Span        | 1.085 us | 0.0119 us | 0.0106 us |
| WriteMany_Single_WrappedSpan | 2.183 us | 0.0206 us | 0.0161 us |
| WriteMany_Double_Span        | 1.096 us | 0.0215 us | 0.0221 us |
| WriteMany_Double_WrappedSpan | 2.192 us | 0.0181 us | 0.0161 us |
| WriteMany_Mixed_Span         | 1.134 us | 0.0127 us | 0.0119 us |
| WriteMany_Mixed_WrappedSpan  | 2.183 us | 0.0242 us | 0.0226 us |
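Dividing each WrappedSpan mean by the matching Span mean from the table gives the slowdown factor per data type:

```latex
\frac{2.169}{1.163} \approx 1.87 \;(\text{Int32}),\quad
\frac{2.183}{1.085} \approx 2.01 \;(\text{Single}),\quad
\frac{2.192}{1.096} = 2.00 \;(\text{Double}),\quad
\frac{2.183}{1.134} \approx 1.93 \;(\text{Mixed})
```

So the wrapped version takes roughly 1.9x to 2.0x as long across all four data types.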

Expected

I would expect the wrapping struct to have no impact on the JIT-generated machine code. In other words, I'd expect WrappedSpan to perform just as fast as a regular Span.

My questions

  1. Considering JIT internals, is this an expected result? If so, could you share the JIT's decision-making process that results in such a perf difference?
  2. Are there any options or tricks I can use to get better results with WrappedSpan?
  3. Would you consider improving the JIT to generate better-performing code for such scenarios?

category:cq
theme:structs
skill-level:expert
cost:large
impact:medium

Labels: Priority:2, area-CodeGen-coreclr, tenet-performance