Description
Describe the enhancement requested
Profiling the load of a Parquet file with Java Mission Control, I've noticed that the LongStream used in InternalParquetRecordReader consumes a significant amount of time.
This LongStream can be replaced with a simpler Long iterator that iterates from 0 to pages.getRowCount().
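As a rough illustration of the kind of replacement proposed here, the following is a minimal sketch of a counting iterator over the range [0, rowCount). The class name RowIndexIterator is hypothetical and is not taken from the Parquet codebase or the linked test project. It assumes the reader only needs a PrimitiveIterator.OfLong, which is the type that LongStream.range(0, n).iterator() returns.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

// Hypothetical name; a sketch of a plain counting iterator, not the actual Parquet implementation.
final class RowIndexIterator implements PrimitiveIterator.OfLong {
    private final long rowCount; // exclusive upper bound, e.g. the value of pages.getRowCount()
    private long next = 0;       // next row index to hand out

    RowIndexIterator(long rowCount) {
        this.rowCount = rowCount;
    }

    @Override
    public boolean hasNext() {
        return next < rowCount;
    }

    @Override
    public long nextLong() {
        if (next >= rowCount) {
            throw new NoSuchElementException();
        }
        return next++; // return the current index, then advance
    }
}
```

Because it implements PrimitiveIterator.OfLong, such a class could in principle stand in wherever the iterator obtained from the LongStream is consumed, without going through the stream pipeline and its spliterator adapter.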
To measure the overhead, I've created a test project that overwrites the InternalParquetRecordReader implementation with one that uses a Long iterator: https://github.com/jerolba/parquet-rowindexiterator
Execution time is sensitive to the state of the JVM, but running the benchmark multiple times shows that the LongStream version is between 1% and 4% slower than the Long iterator version, depending on the run.
Component(s)
No response