Describe the enhancement requested
`ArrowStreamWriter` allocates a non-trivial number of `byte[][]` and `byte[]` objects. In our scenario, where we create a new `ArrowStreamWriter` for each `RecordBatch`, this adds significant memory overhead.
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|---|---|
| ArrowStreamWriterCctor | 843.42 ns | 37.066 ns | 108.124 ns | 1.17 | 0.24 | 0.7687 | 0.0248 | 9656 B |
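Most of the allocated memory (8072 B of the 9656 B) comes from instantiating the `ArrayPool`.
`ArrayPool<byte>.Create()` is a very expensive operation because it preallocates the bucket structure of the whole pool up front (a `byte[][]` slot array for every size class), before any array has been rented: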
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|---|---|
| ArrayPoolCreate | 739.13 ns | 40.097 ns | 117.597 ns | 1.03 | 0.23 | 0.6428 | 0.0172 | 8072 B |
There are several options to avoid this unnecessary allocation:
Limit the pool size
This is the simplest change we can make, but not a perfect one. Because the pool is only used for arrays of 4 or 8 bytes, we don't need to create a full `ArrayPool`; we can limit it through its parameters: `ArrayPool<byte>.Create(8, 50)` (`maxArrayLength` = 8, `maxArraysPerBucket` = 50).
Here is a comparison of the full and the limited pool:
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| ArrayPoolCreate | 739.13 ns | 40.097 ns | 117.597 ns | 1.03 | 0.23 | 0.6428 | 0.0172 | 8072 B | 1.00 |
| ArrayPoolCreateSmall | 68.02 ns | 3.041 ns | 8.823 ns | 0.09 | 0.02 | 0.0414 | - | 520 B | 0.06 |
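A minimal sketch of this option, assuming the writer rents its 4- or 8-byte length prefix from a pool it owns. `smallPool` and `WriteLength` are hypothetical names that only mirror the rent/write/return pattern; this is not the actual `ArrowStreamWriter` code. The 8-byte case follows the Arrow IPC framing (0xFFFFFFFF continuation marker, then the little-endian int32 length).

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.IO;

// Pool limited to tiny arrays (maxArrayLength: 8, maxArraysPerBucket: 50) instead of
// the default ArrayPool<byte>.Create(), which prepares buckets for much larger arrays.
ArrayPool<byte> smallPool = ArrayPool<byte>.Create(8, 50);

// Hypothetical helper (not the real ArrowStreamWriter code): writes an IPC length
// prefix through a rented buffer. Non-legacy format = 0xFFFFFFFF marker + int32 length.
static void WriteLength(Stream stream, ArrayPool<byte> pool, int length, bool legacy)
{
    int size = legacy ? 4 : 8;
    byte[] rented = pool.Rent(size); // the pool may hand out a longer array
    try
    {
        Span<byte> span = rented.AsSpan(0, size);
        if (!legacy)
        {
            BinaryPrimitives.WriteInt32LittleEndian(span, -1); // continuation marker
            span = span.Slice(4);
        }
        BinaryPrimitives.WriteInt32LittleEndian(span, length);
        stream.Write(rented, 0, size); // write only the bytes we filled
    }
    finally
    {
        pool.Return(rented);
    }
}

using var output = new MemoryStream();
WriteLength(output, smallPool, 256, legacy: false);
Console.WriteLine(BitConverter.ToString(output.ToArray())); // FF-FF-FF-FF-00-01-00-00
```

Only the pool construction changes compared to the current behaviour; the rent/return calls stay the same, which is why this is the least invasive option.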
Use ArrayPool.Shared
There is actually no reason to create our own array pool: in its default implementation, `Stream.Write(ReadOnlySpan<byte> buffer)` rents a buffer from `ArrayPool<byte>.Shared` anyway: https://github.com/dotnet/runtime/blob/b6127f9c7f6bab00186ec43d4a332053a1d02325/src/libraries/System.Private.CoreLib/src/System/IO/Stream.cs#L912-L924
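A rough sketch for checking the difference locally, assuming a .NET runtime where `GC.GetAllocatedBytesForCurrentThread()` is available: it compares the bytes allocated by `ArrayPool<byte>.Create()` with those allocated by obtaining `ArrayPool<byte>.Shared`, which is a process-wide singleton. The exact numbers depend on the runtime version and are not the benchmark results quoted above.

```csharp
using System;
using System.Buffers;

// Read Shared once so its one-time static initialization is not part of the measurement.
ArrayPool<byte> warmUp = ArrayPool<byte>.Shared;

long before = GC.GetAllocatedBytesForCurrentThread();
ArrayPool<byte> ownPool = ArrayPool<byte>.Create(); // a new pool, as currently created per writer
long createBytes = GC.GetAllocatedBytesForCurrentThread() - before;

before = GC.GetAllocatedBytesForCurrentThread();
ArrayPool<byte> sharedPool = ArrayPool<byte>.Shared; // reads a static field; no new pool is created
long sharedBytes = GC.GetAllocatedBytesForCurrentThread() - before;

GC.KeepAlive(ownPool);
Console.WriteLine($"ArrayPool<byte>.Create(): {createBytes} B, ArrayPool<byte>.Shared: {sharedBytes} B");
Console.WriteLine(ReferenceEquals(warmUp, sharedPool)); // True
```

With `Shared`, constructing a writer no longer pays for a pool at all, and the rented arrays come from the same pool that the default `Stream.Write(ReadOnlySpan<byte>)` path uses.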
Don't use an array pool at all
As the array pool is only used in `WriteIpcMessageLength` and `WriteIpcMessageLengthAsync`, a single array owned by the writer and reused as the buffer should be enough. This would imply that `ArrowStreamWriter` is not thread-safe.
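A minimal sketch of this option, assuming an 8-byte scratch array allocated once per writer and reused for every length prefix. `lengthBuffer`, `FillLength`, `WriteLength` and `WriteLengthAsync` are hypothetical stand-ins for the writer's field and its length-writing methods, not the real `ArrowStreamWriter` members.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical per-writer scratch buffer: allocated once and reused for every
// length prefix, so no pool is needed. Every call overwrites it, so a writer
// using this approach must not be used concurrently from multiple threads.
byte[] lengthBuffer = new byte[8];

// Fills lengthBuffer and returns how many bytes of it are valid.
int FillLength(int length, bool legacy)
{
    Span<byte> span = lengthBuffer;
    int size = legacy ? 4 : 8;
    if (!legacy)
    {
        BinaryPrimitives.WriteInt32LittleEndian(span, -1); // 0xFFFFFFFF continuation marker
        span = span.Slice(4);
    }
    BinaryPrimitives.WriteInt32LittleEndian(span, length);
    return size;
}

void WriteLength(Stream stream, int length, bool legacy)
{
    int size = FillLength(length, legacy);
    stream.Write(lengthBuffer, 0, size);
}

async ValueTask WriteLengthAsync(Stream stream, int length, bool legacy, CancellationToken ct = default)
{
    int size = FillLength(length, legacy);
    // The shared buffer must not be touched again until this write completes.
    await stream.WriteAsync(lengthBuffer.AsMemory(0, size), ct).ConfigureAwait(false);
}

using var output = new MemoryStream();
WriteLength(output, 8, legacy: false);
await WriteLengthAsync(output, 16, legacy: false);
Console.WriteLine(BitConverter.ToString(output.ToArray()));
// FF-FF-FF-FF-08-00-00-00-FF-FF-FF-FF-10-00-00-00
```

The comment in `WriteLengthAsync` shows where the thread-safety restriction comes from: while an asynchronous write is in flight, a second call would overwrite the same array.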
Component(s)
C#