-
Notifications
You must be signed in to change notification settings - Fork 9
Remove unnecessary allocation in ArrowStreamWriter #73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove unnecessary allocation in ArrowStreamWriter #73
Conversation
|
I think we can do better when targeting modern .NET. Consider instead defining a helper like |
|
I like the idea of making the buffer explicitly for endianity operation. Unfortunately, the stream.WriteAsync doesn't have an overload with It would be good to support this by benchmark, but it would heavily depend on stream implementation. |
|
I will try to explore solution without buffer, by using https://learn.microsoft.com/en-us/dotnet/api/system.buffers.binary.binaryprimitives.reverseendianness?view=net-9.0#system-buffers-binary-binaryprimitives-reverseendianness(system-int32) |
Right; of course, because the allocation has to live until the async call finishes.
I suspect the buffer is unavoidable in the async case, but let us know what you decide. |
|
I added your proposal, with async over sync. If you think we should have it fully async, I can add the async variant into the helper. |
|
I don't think we should do async over sync. I've taken another look at the code and I don't understand why a new buffer pool is being created for this purpose. Wouldn't it be simplest just to delete the Buffers field entirely and then replace |
Yes, it would be possible. From a performance point of view renting small buffers is slower than allocating a new array on the heap. https://adamsitnik.com/Array-Pool/ I'm fine with both approaches; tell me what your preference is.
|
We already use the shared array pool for this purpose, including the reverse of this scenario where we're trying to read a serialized int32 value from a |
7c6d04b to
0dd1fe8
Compare
|
I have pushed the change with If we decide to do a stackalloc for sync method, I propose to do it in separate PR and discuss the solution in a new issue to keep this PR simple. |
What's Changed
ArrowStreamWriter allocates 8k of memory by creating a new array pool. This change introduces a shared buffer instead of pool to reduce the allocations.
The array pool is used to rent small arrays (8 bytes) but the pool allocates much bigger arrays (8kb in total)
The Array pool has access time overhead for small arrays compared to direct allocation.
Results from benchmarks:
Old implementation:
New implementation:
Closes #41.