You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently, Roaring bitmaps are only usable in Druid without run compression. We have RoaringBitmapFactory.DEFAULT_COMPRESS_RUN_ON_SERIALIZATION set to false. I'm all for conservatism, as nobody likes to introduce bugs in complex systems... but in this instance, I can see no practical reason to leave performance gains on the table.
We have demonstrated at length, on various forums and even in a peer reviewed paper that run compression makes Roaring generally better...
Daniel Lemire, Gregory Ssi-Yan-Kai, Owen Kaser, Consistently faster and smaller compressed bitmaps with Roaring, Software: Practice and Experience, 2016. https://arxiv.org/abs/1603.06549
I realize that one might not want to enable new code paths lightly... but thanks to Gregory's work (and others), RoaringBitmap-0.6.27 has a high test coverage (90%). The run-compression approach is used in other important systems like Apache Kylin and Apache Spark with no bug report (so far... fingers' crossed).
The only problem I can imagine is if one creates a Druid database using the latest code and then tries to open it up with a Druid engine running on old code (e.g., using Roaring 0.4). The old code won't be able to read the run-optimized format... but even then, you will not get corruption and other random mayhem... you'll just get the Roaring library that complains about an unrecognized data format... and bails out. I stress that this would only happen to a user that creates the database using recent code, and then tries to operate it with very old code... That's not something that anyone should ever do in any case.
Anyhow, I think that good performance gains are left on the table with the current code.
@lemire, in the upcoming Druid 0.9.2 we actually don't use the default constructor here. We always pass in our own compressRunOnSerialization flag, and Druid's defaults to true as of 0.9.2 (see discussion on apache/druid#3228).
Could still be worth changing the default here, but it won't affect Druid.
Currently, Roaring bitmaps are only usable in Druid without run compression. We have RoaringBitmapFactory.DEFAULT_COMPRESS_RUN_ON_SERIALIZATION set to false. I'm all for conservatism, as nobody likes to introduce bugs in complex systems... but in this instance, I can see no practical reason to leave performance gains on the table.
We have demonstrated at length, on various forums and even in a peer reviewed paper that run compression makes Roaring generally better...
I realize that one might not want to enable new code paths lightly... but thanks to Gregory's work (and others), RoaringBitmap-0.6.27 has a high test coverage (90%). The run-compression approach is used in other important systems like Apache Kylin and Apache Spark with no bug report (so far... fingers' crossed).
The only problem I can imagine is if one creates a Druid database using the latest code and then tries to open it up with a Druid engine running on old code (e.g., using Roaring 0.4). The old code won't be able to read the run-optimized format... but even then, you will not get corruption and other random mayhem... you'll just get the Roaring library that complains about an unrecognized data format... and bails out. I stress that this would only happen to a user that creates the database using recent code, and then tries to operate it with very old code... That's not something that anyone should ever do in any case.
Anyhow, I think that good performance gains are left on the table with the current code.
This is somewhat related to this PR:
#35
The text was updated successfully, but these errors were encountered: