Skip to content

Conversation

@aik-jahoda
Copy link
Contributor

What's Changed

ArrowStreamWriter allocates 8k of memory by creating a new array pool. This change introduces a shared buffer instead of pool to reduce the allocations.

The array pool is used to rent small arrays (8 bytes) but the pool allocates much bigger arrays (8kb in total)
The Array pool has access time overhead for small arrays compared to direct allocation.

Results from benchmarks:

Old implementation:

Method BatchLength ColumnSetCount Mean Error StdDev Allocated
WriteBatch 10000 10 6.118 ms 0.1215 ms 0.3345 ms 248.53 KB
WriteBatch 10000 14 9.788 ms 0.1910 ms 0.3396 ms 324.12 KB
WriteBatch 300000 10 119.351 ms 3.1713 ms 9.3008 ms 248.53 KB
WriteBatch 300000 14 136.697 ms 2.9229 ms 8.4799 ms 324.12 KB

New implementation:

Method BatchLength ColumnSetCount Mean Error StdDev Median Allocated
WriteBatch 10000 10 5.925 ms 0.2057 ms 0.6001 ms 5.843 ms 240.64 KB
WriteBatch 10000 14 8.908 ms 0.2743 ms 0.8002 ms 8.778 ms 316.23 KB
WriteBatch 300000 10 94.835 ms 1.7872 ms 3.7699 ms 93.892 ms 240.64 KB
WriteBatch 300000 14 147.995 ms 3.6873 ms 10.6975 ms 144.591 ms 316.23 KB

Closes #41.

@CurtHagenlocher
Copy link
Contributor

I think we can do better when targeting modern .NET. Consider instead defining a helper like

#if NET5_0_OR_GREATER
        struct IntBuffer
        {
            public void WriteLittleEndian(Stream stream, int value)
            {
                Span<byte> buffer = stackalloc byte[4];
                BinaryPrimitives.WriteInt32LittleEndian(buffer, value);
                stream.Write(buffer);
            }

            public void WriteLittleEndian(Stream stream, long value)
            {
                Span<byte> buffer = stackalloc byte[8];
                BinaryPrimitives.WriteInt64LittleEndian(buffer, value);
                stream.Write(buffer);
            }
        }
#else
        struct IntBuffer
        {
            byte[] buffer;

            public IntBuffer()
            {
                buffer = new byte[8];
            }

            public void WriteLittleEndian(Stream stream, int value)
            {
                BinaryPrimitives.WriteInt32LittleEndian(buffer, value);
                stream.Write(buffer, 0, 4);
            }

            public void WriteLittleEndian(Stream stream, long value)
            {
                BinaryPrimitives.WriteInt64LittleEndian(buffer, value);
                stream.Write(buffer, 0, 8);
            }
        }
#endif

@aik-jahoda
Copy link
Contributor Author

I like the idea of making the buffer explicitly for endianity operation. Unfortunately, the stream.WriteAsync doesn't have an overload with ReadOnlySpan<byte> so it will disallow us to use stack alloc . And I believe the async overload will be preferred by users. As we are writing a small amount of data, I tend to think about implementing the WriteIpcMessageLengthAsync over sync IntBuffer.WriteLittleEndian so we can leverage zero allocation.

It would be good to support this by benchmark, but it would heavily depend on stream implementation.

@aik-jahoda
Copy link
Contributor Author

@CurtHagenlocher
Copy link
Contributor

Unfortunately, the stream.WriteAsync doesn't have an overload with ReadOnlySpan<byte> so it will disallow us to use stack alloc.

Right; of course, because the allocation has to live until the async call finishes.

I will try to explore solution without buffer, by using https://learn.microsoft.com/en-us/dotnet/api/system.buffers.binary.binaryprimitives.reverseendianness?view=net-9.0#system-buffers-binary-binaryprimitives-reverseendianness(system-int32)

I suspect the buffer is unavoidable in the async case, but let us know what you decide.

@aik-jahoda
Copy link
Contributor Author

I added your proposal, with async over sync. If you think we should have it fully async, I can add the async variant into the helper.

@CurtHagenlocher
Copy link
Contributor

I don't think we should do async over sync.

I've taken another look at the code and I don't understand why a new buffer pool is being created for this purpose. Wouldn't it be simplest just to delete the Buffers field entirely and then replace Buffers.RentReturn with ArrayPool<byte>.Shared.RentReturn?

@aik-jahoda
Copy link
Contributor Author

Wouldn't it be simplest just to delete the Buffers field entirely and then replace Buffers.RentReturn with ArrayPool<byte>.Shared.RentReturn?

Yes, it would be possible. From a performance point of view renting small buffers is slower than allocating a new array on the heap. https://adamsitnik.com/Array-Pool/

I'm fine with both approaches; tell me what your preference is.

I don't think we should do async over sync.
Make sense

@CurtHagenlocher
Copy link
Contributor

From a performance point of view renting small buffers is slower than allocating a new array on the heap. https://adamsitnik.com/Array-Pool/

We already use the shared array pool for this purpose, including the reverse of this scenario where we're trying to read a serialized int32 value from a Stream (see line 208 of ArrowStreamReaderImplementation.cs). The only thing that distinguishes this case in the current code from those others is that for some reason we're creating a new pool instead of reusing the shared one. I'm inclined to stick with the patterns already in the code rather than creating a new one -- unless we have specific evidence which suggests sufficient benefit for doing it differently.

@aik-jahoda aik-jahoda force-pushed the remove_allocation_ArrowStreamWriter_cctor branch from 7c6d04b to 0dd1fe8 Compare September 24, 2025 08:45
@aik-jahoda
Copy link
Contributor Author

I have pushed the change with ArrayPool<byte>.Shared

If we decide to do a stackalloc for sync method, I propose to do it in separate PR and discuss the solution in a new issue to keep this PR simple.

@CurtHagenlocher CurtHagenlocher merged commit 2503375 into apache:main Sep 24, 2025
18 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve memory allocation of ArrowStreamWriter cctor

2 participants