Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set RoaringBitmapFactory.DEFAULT_COMPRESS_RUN_ON_SERIALIZATION = true #36

Open
lemire opened this issue Oct 6, 2016 · 2 comments
Open

Comments

@lemire
Copy link
Contributor

lemire commented Oct 6, 2016

Currently, Roaring bitmaps are only usable in Druid without run compression. We have RoaringBitmapFactory.DEFAULT_COMPRESS_RUN_ON_SERIALIZATION set to false. I'm all for conservatism, as nobody likes to introduce bugs in complex systems... but in this instance, I can see no practical reason to leave performance gains on the table.

We have demonstrated at length, on various forums and even in a peer reviewed paper that run compression makes Roaring generally better...

  • Daniel Lemire, Gregory Ssi-Yan-Kai, Owen Kaser, Consistently faster and smaller compressed bitmaps with Roaring, Software: Practice and Experience, 2016. https://arxiv.org/abs/1603.06549

I realize that one might not want to enable new code paths lightly... but thanks to Gregory's work (and others), RoaringBitmap-0.6.27 has a high test coverage (90%). The run-compression approach is used in other important systems like Apache Kylin and Apache Spark with no bug report (so far... fingers' crossed).

The only problem I can imagine is if one creates a Druid database using the latest code and then tries to open it up with a Druid engine running on old code (e.g., using Roaring 0.4). The old code won't be able to read the run-optimized format... but even then, you will not get corruption and other random mayhem... you'll just get the Roaring library that complains about an unrecognized data format... and bails out. I stress that this would only happen to a user that creates the database using recent code, and then tries to operate it with very old code... That's not something that anyone should ever do in any case.

Anyhow, I think that good performance gains are left on the table with the current code.

This is somewhat related to this PR:
#35

@gianm
Copy link

gianm commented Oct 6, 2016

@lemire, in the upcoming Druid 0.9.2 we actually don't use the default constructor here. We always pass in our own compressRunOnSerialization flag, and Druid's defaults to true as of 0.9.2 (see discussion on apache/druid#3228).

Could still be worth changing the default here, but it won't affect Druid.

@lemire
Copy link
Contributor Author

lemire commented Oct 6, 2016

@gianm

Then feel free to discard my issue as being irrelevant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants