added new implementation using bytearray and memoryview #11
Hey there,
I've added a new implementation using a `bytearray` and a `memoryview` to work on a fixed, preallocated memory buffer.

As in the PyPy implementation, the file is distributed across all CPUs via multiprocessing. I create n chunks (where n is the number of CPUs available), adjust each chunk so that it starts and ends on a whole line, and spawn the processes.
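
To illustrate the chunking step, here is a rough sketch and not the code from this PR. The function name `chunk_boundaries` and its parameters are made up for the example; it only assumes that the input is a newline-delimited file.

```python
import os


def chunk_boundaries(path, n_chunks):
    """Return (start, end) byte offsets that split the file into roughly
    n_chunks pieces, each ending right after a newline.

    Hypothetical helper for illustration only.
    """
    size = os.path.getsize(path)
    step = max(1, size // n_chunks)
    bounds = []
    start = 0
    with open(path, "rb") as f:
        while start < size:
            # Jump to the approximate end of the chunk ...
            end = min(size, start + step)
            f.seek(end)
            # ... then move forward to the end of the line we landed in.
            f.readline()
            end = min(size, f.tell())
            bounds.append((start, end))
            start = end
    return bounds
```

Each `(start, end)` pair could then be handed to one worker process, for example with `multiprocessing.Pool.starmap`.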
From there I've changed a lot. I allocate a buffer of configurable size (`1024 * 128` bytes in this version) and read the file directly into it via the `readinto1(buffer)` method. After that I operate as much as possible on this fixed buffer, searching for `\n` and `;` to split the lines. If there is no `\n` left in the buffer, I read the next part of the file, until I reach the end.

On my machine this is even faster than the current PyPy solution, and it has a fixed-size memory footprint. I don't need to disable garbage collection, because no garbage is created. (Even with the GC disabled the memory footprint doesn't rise, whereas the PyPy version uses all available RAM until my system freezes.)
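
The fixed-buffer read loop described above could look roughly like the sketch below. It is a simplified illustration and not the code from this PR: the name `iter_records` is hypothetical, lines are assumed to have the form `name;value`, and the chunk is assumed to end with a newline. Unlike the real implementation, it copies each field into a `bytes` object to keep the example short, so it does create some garbage.

```python
BUFFER_SIZE = 1024 * 128  # buffer size quoted in the description


def iter_records(f, start, end, buffer_size=BUFFER_SIZE):
    """Yield (name, value) pairs for the lines in f[start:end].

    f must be a buffered binary file, e.g. open(path, "rb").
    Hypothetical helper for illustration only.
    """
    buf = bytearray(buffer_size)   # allocated once, reused for every read
    view = memoryview(buf)         # lets us slice without copying
    f.seek(start)
    remaining = end - start        # bytes of the chunk not yet read
    filled = 0                     # number of valid bytes in buf
    while True:
        if remaining > 0:
            want = min(buffer_size - filled, remaining)
            n = f.readinto1(view[filled:filled + want])
            if n == 0:             # unexpected end of file
                remaining = 0
            filled += n
            remaining -= n
        pos = 0
        while True:
            nl = buf.find(b"\n", pos, filled)
            if nl == -1:
                break              # no complete line left in the buffer
            sep = buf.find(b";", pos, nl)
            # Assumes every line contains a ";" separator.
            yield bytes(view[pos:sep]), bytes(view[sep + 1:nl])
            pos = nl + 1
        if remaining == 0:
            break
        if pos == 0 and filled == buffer_size:
            raise ValueError("line is longer than the buffer")
        # Move the incomplete last line to the front and refill behind it.
        tail = filled - pos
        view[:tail] = view[pos:filled]
        filled = tail
```

A worker would call this with its own `(start, end)` range and aggregate the yielded pairs.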
I don't know how fast my code is on your reference machine, but I'm really curious to find out. Maybe you can try it out?