Skip to content

Always use Unsafe as the default PinotBufferFactory#13639

Merged
gortiz merged 1 commit intoapache:masterfrom
gortiz:disable-larray-by-default
Jul 29, 2024
Merged

Always use Unsafe as the default PinotBufferFactory#13639
gortiz merged 1 commit intoapache:masterfrom
gortiz:disable-larray-by-default

Conversation

@gortiz
Copy link
Contributor

@gortiz gortiz commented Jul 17, 2024

We have been using UnsafePinotBufferFactory for a while with no regression detected. I think it is a good time to start working on the plan to remove LArray dependency (see #12810)

This PR changes the default PinotBufferFactory used in Pinot. Before we were using LArrayPinotBufferFactory on Java < 16 but give LArray doesn't work in Java >= 16, we used UnsafePinotBufferFactory in newer JVMs. With this PR, we always use UnsafePinotBufferFactory.

Users that want to still use LArrayPinotBufferFactory can do that by providing the following config to Pinot:

pinot.offheap.buffer.factory=org.apache.pinot.segment.spi.memory.LArrayPinotBufferFactory

Remember that this is not recommended given LArray implementation has been proved to be incorrect several times (see for example https://github.com/apache/pinot/pull/10774).\

A side effect of this PR is that Pinot should be able to be executed (and probably compiled) in Mac computers using aarch64 (like M1, M2, etc) without having to enable Rosetta. We need to open a parallel PR to change this doc page and probably this one. There are still some issues in ARMs (AFAIR CLP lib does not include a binary for Mac + arm64, @suddendust may know more about that) and maybe other build pages that may not be updated.

This was referenced Jul 17, 2024
@codecov-commenter
Copy link

codecov-commenter commented Jul 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.99%. Comparing base (59551e4) to head (93a2108).
Report is 782 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13639      +/-   ##
============================================
+ Coverage     61.75%   61.99%   +0.24%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2553     +117     
  Lines        133233   140468    +7235     
  Branches      20636    21849    +1213     
============================================
+ Hits          82274    87085    +4811     
- Misses        44911    46749    +1838     
- Partials       6048     6634     +586     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.91% <100.00%> (+0.20%) ⬆️
java-21 61.88% <100.00%> (+0.26%) ⬆️
skip-bytebuffers-false 61.97% <100.00%> (+0.22%) ⬆️
skip-bytebuffers-true 61.83% <100.00%> (+34.10%) ⬆️
temurin 61.99% <100.00%> (+0.24%) ⬆️
unittests 61.99% <100.00%> (+0.24%) ⬆️
unittests1 46.45% <100.00%> (-0.44%) ⬇️
unittests2 27.76% <0.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@siddharthteotia
Copy link
Contributor

@gortiz - how will this work for existing segments in production that currently LBuffer writer and therefore depend on the corresponding reader ?

I see that in #13545 you mentioned about incrementally working towards deleting LArray starting by making the new implementation as default one.

I understand that new segments will pick up the new writer (Unsafe based). But how do we take care of rebuilding existing segments ? We need to have an approach that will work for production users. Otherwise we will always have to keep the Larray reader code around.

May be we should do something like we did for segment v2 -> v3 format migration? IIRC, during pre-processor we detected V2 and moved to V3.

Thoughts @vvivekiyer @Jackie-Jiang ?

@gortiz
Copy link
Contributor Author

gortiz commented Jul 19, 2024

Good question. The answer is... it doesn't matter 😄 .

The library used to implement buffers does not require or modify the buffer on disk. I mean, a segment might have been created using LArray and can be read using Unsafe and the other way around. These libraries do not modify the content on disk in any way. They just change the jvm/system calls used to load the content on memory.

Basically LArray uses some JNI+sun.misc.Unsafe calls to allocate and load memory while Unsafe just use sun.misc.Unsafe. But in the end both libraries end up mmaping the content of a file indicated with an offset and length.

Copy link
Contributor

@vvivekiyer vvivekiyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siddharthteotia Verified that this will not cause any issues for us. We already have many hosts running java 17 and using Unsafe.

@gortiz gortiz merged commit 0d33790 into apache:master Jul 29, 2024
@gortiz gortiz deleted the disable-larray-by-default branch July 29, 2024 05:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants