-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve JSON_MATCH performance. #15049
base: master
Are you sure you want to change the base?
Improve JSON_MATCH performance. #15049
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #15049 +/- ##
============================================
+ Coverage 61.75% 63.36% +1.61%
- Complexity 207 1481 +1274
============================================
Files 2436 2733 +297
Lines 133233 153871 +20638
Branches 20636 23755 +3119
============================================
+ Hits 82274 97499 +15225
- Misses 44911 49002 +4091
- Partials 6048 7370 +1322
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Just a note on whether to use the The rest of the changes look good. |
Thanks for the advice, Richard! |
@richardstartin, I have to say I really miss your posts! |
* It stores either a bitmap from posting list that must be cloned before mutating (readOnly=true) | ||
* or an already cloned bitmap. | ||
*/ | ||
static class LazyBitmap { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Being a private class it may sound weird, but wouldn't it be beneficial to define an Interface
instead of using the LazyBitmap
implementation directly. I see some excessive coupling here in case we need to try different alternatives in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LazyBitmap shouldn't leave the internals of parent class so I doubt we'll ever have to check other implementations.
Changes here are similar to those in ImmutableJsonIndexReader but a bit more complex due to:
- working with RoaringBitmap (there's no simple way to differentiate between instance taken from index that shouldn't be changed and a copy)
- null RoaringBitmap having meaning in some contexts (e.g. filter() method )
- some operations mixing LazyBitmap with RoaringBitmap. Wrapping the latter with an instance of the former would be pointless and slow things down.
PR improves performance and reduces temporary object generation in
JSON_MATCH()
function calls by:!=
andNOT IN
expressions - switching from OR-ing all bitmaps except those for unwanted values to reading bitmap with all values for the key and then removing values present in bitmaps of unwanted valuesRunning the following query on
githubComplexTypeEvents
table with 2.5mil records (same base data, but multiplied) and JSON index on payload column shows (in async profiler/JFR) :Synthetic benchmark shows speedup of about 4 times :
branch:
master:
Note: I didn't change how flattened bitmaps are mapped to actual doc ids (simple
forEach
mapping values and adding to MutableRoaringBitmap) because other methods (tested inBenchmarkRoaringBitmapMapping
) weren't any faster, and some were much slower: