`Span<T>` wrapped in a struct isn't performing as fast as it could be #68797

## **Brief intro:**

I'm developing a binary serialization library with two major requirements:
1. offering high performance
2. minimizing user errors by utilizing the type system as much as possible.

For requirement `#1`, I'm heavily using `Span<T>`. I'm quite happy with the performance and how clean the internal serializer implementation is.

For requirement `#2`, I'm trying to prevent users from accidentally passing the wrong `Span<T>` to a method.
As a solution, I'm wrapping `Span<T>` in my own struct: `WrappedSpan`. All my methods take a `WrappedSpan` and won't accept a plain `Span<T>`.
This turns the mistake into a compile-time error whenever a user passes a `Span<T>` instead of a `WrappedSpan` to any of the serialization methods.
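
To illustrate the idea, here is a minimal sketch. `Serializer.WriteByte` and `WrapperDemo` are hypothetical names used only for this example and are not part of the library; `WrappedSpan` has the same shape as the struct shown later in this issue.

```csharp
using System;

public ref struct WrappedSpan
{
    public Span<byte> Span;
}

public static class Serializer
{
    // Hypothetical method: it accepts only the wrapper, never a raw Span<byte>.
    public static void WriteByte(ref WrappedSpan wrapper, byte value)
    {
        wrapper.Span[0] = value;
        wrapper.Span = wrapper.Span.Slice(1);
    }
}

public static class WrapperDemo
{
    public static int Run()
    {
        Span<byte> raw = stackalloc byte[4];
        var wrapped = new WrappedSpan { Span = raw };

        Serializer.WriteByte(ref wrapped, 42);  // compiles: the argument is a WrappedSpan
        // Serializer.WriteByte(ref raw, 42);   // does not compile: a Span<byte> is not a WrappedSpan

        return wrapped.Span.Length;             // bytes left after writing one byte
    }
}
```

Uncommenting the second call should make the build fail, which is the kind of early feedback the wrapper is meant to give.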


## **The problem:**

I noticed that, in some scenarios (many small, chained method calls), a regular `Span<byte>` runs about twice as fast as a `Span<byte>` wrapped in an otherwise-empty struct (roughly 1.1 us vs. 2.2 us in the benchmarks below).

The method using `Span<byte>`:

```csharp
public MethodHost_Span WriteInt32_Span(ref Span<byte> span, int value)
{
    MemoryMarshal.Cast<byte, int>(span)[0] = value;
    span = span.Slice(sizeof(int));
    // Return of zero-size struct is needed for method chaining.
    return default(MethodHost_Span);
}
```

The method using `WrappedSpan`:
```csharp
public MethodHost_WrappedSpan WriteInt32_WrappedSpan(ref WrappedSpan wrapper, int value)
{
    MemoryMarshal.Cast<byte, int>(wrapper.Span)[0] = value;
    wrapper.Span = wrapper.Span.Slice(sizeof(int));
    // Return of zero-size struct is needed for method chaining.
    return default(MethodHost_WrappedSpan);
}

public ref struct WrappedSpan
{
    public Span<byte> Span;
}
```

The actual benchmark method is [here](https://github.com/essoperagma/dotnet-wrappedstruct/blob/e09cad74f1901ce1a74b7cd93c43d1d1b16321e4/StructWrapper/Benchmarks.cs#L112-L133).
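
For readers who don't want to open the repo, the sketch below shows roughly what a chain of such calls looks like. It is a simplified illustration, not the linked benchmark: it assumes `MethodHost_WrappedSpan` is an empty struct that hosts the write method, and `ChainingDemo` is a hypothetical name.

```csharp
using System;
using System.Runtime.InteropServices;

public ref struct WrappedSpan
{
    public Span<byte> Span;
}

// Assumption: an empty struct whose only job is to host the chained methods.
public struct MethodHost_WrappedSpan
{
    public MethodHost_WrappedSpan WriteInt32_WrappedSpan(ref WrappedSpan wrapper, int value)
    {
        MemoryMarshal.Cast<byte, int>(wrapper.Span)[0] = value;
        wrapper.Span = wrapper.Span.Slice(sizeof(int));
        return default(MethodHost_WrappedSpan);
    }
}

public static class ChainingDemo
{
    public static int WriteThree()
    {
        Span<byte> buffer = stackalloc byte[3 * sizeof(int)];
        var wrapper = new WrappedSpan { Span = buffer };

        // Each call writes one int and advances the wrapped span by four bytes.
        default(MethodHost_WrappedSpan)
            .WriteInt32_WrappedSpan(ref wrapper, 1)
            .WriteInt32_WrappedSpan(ref wrapper, 2)
            .WriteInt32_WrappedSpan(ref wrapper, 3);

        // Read back the last value written to confirm the chain advanced correctly.
        return MemoryMarshal.Cast<byte, int>(buffer)[2];
    }
}
```

The `Span<byte>` variant has the same shape, with `ref Span<byte>` in place of `ref WrappedSpan`.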

More details:
- All of this was measured on .NET 7.0.100-preview.3.22179.4, but I got similar results on .NET 6 and 5.
- Setting `DOTNET_TieredPGO` doesn't impact the performance.
- The full benchmark project can be found [here](https://github.com/essoperagma/dotnet-wrappedstruct).
- Outputs from Disasmo can be found [here](https://github.com/essoperagma/dotnet-wrappedstruct/tree/main/Disasmo%20Outputs) in the same repo.
- Local runtime build used by Disasmo is at commit: [3535e0769f202ae4cd820bea24afd20cee313966](https://github.com/dotnet/runtime/commit/3535e0769f202ae4cd820bea24afd20cee313966)
- I'm on Windows 10, Version 10.0.19044 Build 19044
- Building for amd64.

### **BenchmarkDotNet results with different data types:**
|                       Method |     Mean |     Error |    StdDev |
|----------------------------- |---------:|----------:|----------:|
|         WriteMany_Int32_Span | 1.163 us | 0.0122 us | 0.0114 us |
|  WriteMany_Int32_WrappedSpan | 2.169 us | 0.0194 us | 0.0172 us |
|        WriteMany_Single_Span | 1.085 us | 0.0119 us | 0.0106 us |
| WriteMany_Single_WrappedSpan | 2.183 us | 0.0206 us | 0.0161 us |
|        WriteMany_Double_Span | 1.096 us | 0.0215 us | 0.0221 us |
| WriteMany_Double_WrappedSpan | 2.192 us | 0.0181 us | 0.0161 us |
|         WriteMany_Mixed_Span | 1.134 us | 0.0127 us | 0.0119 us |
|  WriteMany_Mixed_WrappedSpan | 2.183 us | 0.0242 us | 0.0226 us |

## **Expected**
I would expect the wrapping struct to have no impact on the generated machine code. In other words, `WrappedSpan` should perform just as fast as a regular `Span<byte>`.

## **My questions**

1. Considering the JIT internals, is this an expected result? If yes, could you share the JIT's decision-making process that leads to such a performance difference?
1. Are there any options or tricks that I can use to get better results with `WrappedSpan`?
1. Would you consider improving the JIT to generate better-performing code for such scenarios?

category:cq
theme:structs
skill-level:expert
cost:large
impact:medium
