Improve memory allocation of ArrowStreamWriter cctor #41

@aik-jahoda

Describe the enhancement requested

ArrowStreamWriter allocates a non-trivial amount of byte[][] and byte[] objects. In our scenario, where we create a new ArrowStreamWriter for each RecordBatch, this causes significant memory overhead.

| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|---|---|
| ArrowStreamWriterCctor | 843.42 ns | 37.066 ns | 108.124 ns | 1.17 | 0.24 | 0.7687 | 0.0248 | 9656 B |

The majority of the allocated data comes from instantiating the ArrayPool.

ArrayPool<byte>.Create() is a very expensive operation because it preallocates every bucket of the pool up front:

https://github.com/dotnet/runtime/blob/b6127f9c7f6bab00186ec43d4a332053a1d02325/src/libraries/System.Private.CoreLib/src/System/Buffers/ConfigurableArrayPool.cs#L43-L46

| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|---|---|
| ArrayPoolCreate | 739.13 ns | 40.097 ns | 117.597 ns | 1.03 | 0.23 | 0.6428 | 0.0172 | 8072 B |

There are several options for avoiding this unnecessary allocation:

Limit the pool size

This is the simplest change we can make, but it is not perfect. As the pool is used only for arrays of size 4 or 8, we don't need to create a full ArrayPool; we can limit it with parameters (maxArrayLength, maxArraysPerBucket): ArrayPool<byte>.Create(8, 50)
Here is a comparison of the full and the limited pool:

| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| ArrayPoolCreate | 739.13 ns | 40.097 ns | 117.597 ns | 1.03 | 0.23 | 0.6428 | 0.0172 | 8072 B | 1.00 |
| ArrayPoolCreateSmall | 68.02 ns | 3.041 ns | 8.823 ns | 0.09 | 0.02 | 0.0414 | - | 520 B | 0.06 |
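
A minimal sketch of how a limited pool could be used for a small length prefix. The class name `SmallPoolSketch`, the method `EncodeLength`, and the little-endian encoding are assumptions made for illustration; they are not taken from the ArrowStreamWriter source.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;

// Hypothetical helper, not part of Apache Arrow.
public static class SmallPoolSketch
{
    // Pool limited to small arrays: maxArrayLength = 8, maxArraysPerBucket = 50.
    private static readonly ArrayPool<byte> Pool =
        ArrayPool<byte>.Create(maxArrayLength: 8, maxArraysPerBucket: 50);

    // Encodes a 32-bit length (little-endian is an assumption here) using a rented buffer.
    public static byte[] EncodeLength(int length)
    {
        // Rent may return an array longer than requested, so only the first 4 bytes are used.
        byte[] rented = Pool.Rent(4);
        try
        {
            BinaryPrimitives.WriteInt32LittleEndian(rented, length);
            return rented.AsSpan(0, 4).ToArray();
        }
        finally
        {
            Pool.Return(rented);
        }
    }
}
```

The copy returned by `ToArray()` exists only to make the sketch easy to inspect; a real writer would pass the rented buffer straight to the stream.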

Use ArrayPool.Shared

There is actually no reason to create our own array pool, because the default implementation of Stream.Write(ReadOnlySpan<byte> buffer) rents from ArrayPool.Shared anyway: https://github.com/dotnet/runtime/blob/b6127f9c7f6bab00186ec43d4a332053a1d02325/src/libraries/System.Private.CoreLib/src/System/IO/Stream.cs#L912-L924
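
A sketch of what renting from the shared pool could look like, assuming a 4-byte little-endian length prefix. `SharedPoolSketch` and `WriteLength` are hypothetical names used only for this example.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.IO;

// Hypothetical helper, not part of Apache Arrow.
public static class SharedPoolSketch
{
    // Writes a 32-bit length prefix (little-endian is an assumption) to the stream.
    public static void WriteLength(Stream stream, int length)
    {
        // No pool is constructed here; the process-wide shared pool is reused.
        byte[] rented = ArrayPool<byte>.Shared.Rent(4);
        try
        {
            BinaryPrimitives.WriteInt32LittleEndian(rented, length);
            // The rented array may be longer than 4 bytes, so the count is explicit.
            stream.Write(rented, 0, 4);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```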

Don't use an array pool at all

As the array pool is used only in WriteIpcMessageLength and WriteIpcMessageLengthAsync, it should be enough to use a single shared array as the buffer. This would imply that ArrowStreamWriter is not thread-safe.
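
A rough sketch of the shared-buffer idea, assuming one small array per writer instance that is reused by both the synchronous and asynchronous paths. `LengthWriterSketch` and its members are hypothetical and do not mirror the actual ArrowStreamWriter implementation.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical class, not part of Apache Arrow.
public sealed class LengthWriterSketch
{
    private readonly Stream _stream;

    // Single reusable buffer; 8 bytes covers both sizes (4 and 8) mentioned above.
    // Sharing it between calls means instances are NOT safe for concurrent use.
    private readonly byte[] _lengthBuffer = new byte[8];

    public LengthWriterSketch(Stream stream)
    {
        _stream = stream ?? throw new ArgumentNullException(nameof(stream));
    }

    // Synchronous path: encode into the shared buffer, then write 4 bytes.
    public void WriteLength(int length)
    {
        BinaryPrimitives.WriteInt32LittleEndian(_lengthBuffer, length);
        _stream.Write(_lengthBuffer, 0, 4);
    }

    // Asynchronous path: the buffer must not be touched by another call until this completes.
    public async ValueTask WriteLengthAsync(int length, CancellationToken cancellationToken = default)
    {
        BinaryPrimitives.WriteInt32LittleEndian(_lengthBuffer, length);
        await _stream.WriteAsync(_lengthBuffer.AsMemory(0, 4), cancellationToken).ConfigureAwait(false);
    }
}
```

The trade-off is the one noted above: the per-instance buffer removes pool construction and rent/return calls, at the cost of requiring callers to serialize writes on a given instance.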

Component(s)

C#
