Improve JSON_MATCH performance. #15049

bziobrowski · 2025-02-12T21:23:30Z

PR improves performance and reduces temporary object generation in JSON_MATCH() function calls by:

reducing and/or delaying immutable to mutable conversion as much as possible
using immutable objects instead of allocating
for != and NOT IN expressions - switching from OR-ing all bitmaps except those for unwanted values to reading bitmap with all values for the key and then removing values present in bitmaps of unwanted values
comparing dict ids instead of string values
reusing byte[] buffer when reading from string dictionary

Running the following query on githubComplexTypeEvents table with 2.5mil records (same base data, but multiplied) and JSON index on payload column shows (in async profiler/JFR) :

response time reduction from ~ 500ms to about ~250ms
allocation reduction from 400MiB to about 20MiB .

SELECT
  COUNT(*) AS event_count,
  SUM(payload.size) AS event_size,
  SUM( CASE WHEN created_at_timestamp >= ago('PT3H') THEN 1 ELSE 0 END ) AS transaction_count_3h,
  SUM( CASE WHEN created_at_timestamp >= ago('PT3H') THEN payload.distinct_size END ) AS transaction_sum_3h,
  SUM( CASE WHEN created_at_timestamp >= ago('PT15M') THEN 1 END ) AS transaction_count_15min,
  SUM( CASE WHEN created_at_timestamp >= ago('PT15M') THEN payload.distinct_size END ) AS transaction_sum_15min,
  type
FROM githubComplexTypeEvents
WHERE created_at_timestamp between ago('P10000D') AND now()
  AND payload.size > 0
  AND JSON_MATCH(
    payload,	
    '("$.ref" = ''refs/heads/master'' OR ("$.ref" = ''refs/heads/gh-pages'' AND ("$.commits[0].author.name" != ''chao1995'' OR "$.commits[0].author.name" IS NOT NULL)))'
  )
GROUP BY type

Synthetic benchmark shows speedup of about 4 times :

branch:

Benchmark                   (_numRows)  (_numSegments)                                                                                                                                                                                                                                                                                                                              (_query)  (_scenario)  Mode  Cnt   Score   Error  Units
BenchmarkQueriesSSQE.query     2500000              10  SELECT
  COUNT(*) AS count,
  SUM(INT_COL) AS size,
  LOW_CARDINALITY_STRING_COL as type
FROM MyTable
WHERE JSON_MATCH(
    JSON_COL,	
    '("$.type" = ''type0'' OR ("$.type" = ''type1'' AND ("$.changes[0].author.name" != ''author10''     OR "$.changes[1].author.name" IS NOT NULL)))'
  )
GROUP BY LOW_CARDINALITY_STRING_COL   EXP(0.001)  avgt    5  23.332 ± 1.553  ms/op

master:

Benchmark                   (_numRows)  (_numSegments)                                                                                                                                                                                                                                                                                                                              (_query)  (_scenario)  Mode  Cnt   Score   Error  Units
BenchmarkQueriesSSQE.query     2500000              10  SELECT
  COUNT(*) AS count,
  SUM(INT_COL) AS size,
  LOW_CARDINALITY_STRING_COL as type
FROM MyTable
WHERE JSON_MATCH(
    JSON_COL,
    '("$.type" = ''type0'' OR ("$.type" = ''type1'' AND ("$.changes[0].author.name" != ''author10''     OR "$.changes[1].author.name" IS NOT NULL)))'
  )
GROUP BY LOW_CARDINALITY_STRING_COL   EXP(0.001)  avgt    5  93.082 ± 1.948  ms/op

Note: I didn't change how flattened bitmaps are mapped to actual doc ids (simple forEach mapping values and adding to MutableRoaringBitmap) because other methods (tested in BenchmarkRoaringBitmapMapping) weren't any faster, and some were much slower:

Benchmark                                             Mode  Cnt         Score       Error   Units
mapSimple                                             avgt    5       634.450 ±    22.122   ms/op
mapSimple:gc.alloc.rate                               avgt    5        63.587 ±     2.168  MB/sec
mapSimple:gc.alloc.rate.norm                          avgt    5  42325777.600 ± 90594.149    B/op
mapSimple:gc.count                                    avgt    5         6.000              counts
mapSimple:gc.time                                     avgt    5        14.000                  ms
mapWithDefaults                                       avgt    5       617.683 ±    50.692   ms/op
mapWithDefaults:gc.alloc.rate                         avgt    5        65.271 ±     5.294  MB/sec
mapWithDefaults:gc.alloc.rate.norm                    avgt    5  42282179.200 ± 90573.488    B/op
mapWithDefaults:gc.count                              avgt    5         5.000              counts
mapWithDefaults:gc.time                               avgt    5        12.000                  ms
mapWithInitCapacity                                   avgt    5       630.927 ±    45.332   ms/op
mapWithInitCapacity:gc.alloc.rate                     avgt    5        64.439 ±     4.636  MB/sec
mapWithInitCapacity:gc.alloc.rate.norm                avgt    5  42641448.800 ± 90578.670    B/op
mapWithInitCapacity:gc.count                          avgt    5         6.000              counts
mapWithInitCapacity:gc.time                           avgt    5        11.000                  ms
mapWithMaxInitCapacity                                avgt    5       633.701 ±    18.636   ms/op
mapWithMaxInitCapacity:gc.alloc.rate                  avgt    5        61.892 ±     1.812  MB/sec
mapWithMaxInitCapacity:gc.alloc.rate.norm             avgt    5  41146993.600 ± 90594.149    B/op
mapWithMaxInitCapacity:gc.count                       avgt    5         5.000              counts
mapWithMaxInitCapacity:gc.time                        avgt    5        12.000                  ms
mapWithOptimisedForArraysAppender                     avgt    5       635.250 ±    52.749   ms/op
mapWithOptimisedForArraysAppender:gc.alloc.rate       avgt    5        63.465 ±     5.259  MB/sec
mapWithOptimisedForArraysAppender:gc.alloc.rate.norm  avgt    5  42282185.600 ± 90594.149    B/op
mapWithOptimisedForArraysAppender:gc.count            avgt    5         6.000              counts
mapWithOptimisedForArraysAppender:gc.time             avgt    5        12.000                  ms
mapWithOptimisedForRunsAppender                       avgt    5       628.546 ±    39.954   ms/op
mapWithOptimisedForRunsAppender:gc.alloc.rate         avgt    5        64.306 ±     4.051  MB/sec
mapWithOptimisedForRunsAppender:gc.alloc.rate.norm    avgt    5  42397132.000 ± 90503.570    B/op
mapWithOptimisedForRunsAppender:gc.count              avgt    5         6.000              counts
mapWithOptimisedForRunsAppender:gc.time               avgt    5        12.000                  ms
mapWithPartialRadixSort                               avgt    5       706.383 ±    24.308   ms/op
mapWithPartialRadixSort:gc.alloc.rate                 avgt    5       133.071 ±     4.476  MB/sec
mapWithPartialRadixSort:gc.alloc.rate.norm            avgt    5  98613905.600 ± 90594.149    B/op
mapWithPartialRadixSort:gc.count                      avgt    5        12.000              counts
mapWithPartialRadixSort:gc.time                       avgt    5        25.000                  ms
mapWithPartialRadixSortPrealloc                       avgt    5       660.032 ±    49.422   ms/op
mapWithPartialRadixSortPrealloc:gc.alloc.rate         avgt    5       114.218 ±     8.313  MB/sec
mapWithPartialRadixSortPrealloc:gc.alloc.rate.norm    avgt    5  79067249.600 ± 90594.149    B/op
mapWithPartialRadixSortPrealloc:gc.count              avgt    5         9.000              counts
mapWithPartialRadixSortPrealloc:gc.time               avgt    5        18.000                  ms
mapWithRunCompressDisabled                            avgt    5       613.402 ±    22.244   ms/op
mapWithRunCompressDisabled:gc.alloc.rate              avgt    5        65.703 ±     2.368  MB/sec
mapWithRunCompressDisabled:gc.alloc.rate.norm         avgt    5  42282169.600 ± 90594.149    B/op
mapWithRunCompressDisabled:gc.count                   avgt    5         5.000              counts
mapWithRunCompressDisabled:gc.time                    avgt    5        11.000                  ms

codecov-commenter · 2025-02-12T22:12:36Z

Codecov Report

Attention: Patch coverage is 81.57895% with 63 lines in your changes missing coverage. Please review.

Project coverage is 63.36%. Comparing base (59551e4) to head (e2a68fa).
Report is 1721 commits behind head on master.

Files with missing lines	Patch %	Lines
...local/realtime/impl/json/MutableJsonIndexImpl.java	78.72%	23 Missing and 17 partials ⚠️
...t/index/readers/json/ImmutableJsonIndexReader.java	84.86%	11 Missing and 12 partials ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #15049      +/-   ##
============================================
+ Coverage     61.75%   63.36%   +1.61%     
- Complexity      207     1481    +1274     
============================================
  Files          2436     2733     +297     
  Lines        133233   153871   +20638     
  Branches      20636    23755    +3119     
============================================
+ Hits          82274    97499   +15225     
- Misses        44911    49002    +4091     
- Partials       6048     7370    +1322

Flag	Coverage Δ
custom-integration1	`100.00% <ø> (+99.99%)`	⬆️
integration	`100.00% <ø> (+99.99%)`	⬆️
integration1	`100.00% <ø> (+99.99%)`	⬆️
integration2	`0.00% <ø> (ø)`
java-11	`63.34% <81.57%> (+1.63%)`	⬆️
java-21	`63.23% <81.57%> (+1.60%)`	⬆️
skip-bytebuffers-false	`63.35% <81.57%> (+1.60%)`	⬆️
skip-bytebuffers-true	`63.21% <81.57%> (+35.49%)`	⬆️
temurin	`63.36% <81.57%> (+1.61%)`	⬆️
unittests	`63.36% <81.57%> (+1.61%)`	⬆️
unittests1	`56.10% <37.42%> (+9.21%)`	⬆️
unittests2	`33.81% <80.40%> (+6.08%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

…performance

…ary.

richardstartin · 2025-02-13T12:17:39Z

Just a note on whether to use the RoaringBitmapWriter, the benchmark results, and the necessity for the benchmark itself which might have saved a bit of time, and to avoid the result of this benchmark being generalised inappropriately. This isn't really the use case it was designed for, and just adding to the bitmap makes perfect sense. It's intended for consuming ascending or nearly ascending values, which the flattened doc id mapping won't preserve in general. The rationale was outlined here. If the added doc id is outside of the current window of 2^16 values, it falls back to simply adding to the underlying bitmap so can only add overhead. On the slow partial radix sort option, the intended use case is for adding unsorted values in one go as in RoaringBitmap.bitmapOfUnordered - batching as done here was never considered and wouldn't be worthwhile because the cost the approach saves can only be amortised in one go. It was benchmarked for intended use here. I can't think of a use case in Pinot for using the partial radix sort option.

The rest of the changes look good.

bziobrowski · 2025-02-13T12:58:49Z

Thanks for the advice, Richard!
When I was trying to speed up the mapping part, the only useful page google proposed was https://richardstartin.github.io/posts/roaringbitmap-performance-tricks .
I guess I've to limit search to your blog :)

gortiz · 2025-02-21T10:52:56Z

@richardstartin, I have to say I really miss your posts!

albertobastos · 2025-02-21T12:40:36Z

...al/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java

+   * It stores either a bitmap from posting list that must be cloned before mutating (readOnly=true)
+   * or an already  cloned bitmap.
+   */
+  static class LazyBitmap {


Being a private class it may sound weird, but wouldn't it be beneficial to define an Interface instead of using the LazyBitmap implementation directly. I see some excessive coupling here in case we need to try different alternatives in the future.

LazyBitmap shouldn't leave the internals of parent class so I doubt we'll ever have to check other implementations.
Changes here are similar to those in ImmutableJsonIndexReader but a bit more complex due to:

working with RoaringBitmap (there's no simple way to differentiate between instance taken from index that shouldn't be changed and a copy)

null RoaringBitmap having meaning in some contexts (e.g. filter() method )

some operations mixing LazyBitmap with RoaringBitmap. Wrapping the latter with an instance of the former would be pointless and slow things down.

bziobrowski added 2 commits February 12, 2025 22:11

Improve JSON_MATCH performance.

df86503

Fix checkstyle 'issues'.

44401c0

yashmayya added the performance label Feb 13, 2025

bziobrowski added 2 commits February 13, 2025 09:05

Merge remote-tracking branch 'origin/master' into improve_json_match_…

add293c

…performance

Limit byte buffer allocations when reading string values from diction…

fc9f69d

…ary.

Add JSON_MATCH query benchmark.

e88e2a5

bziobrowski added 3 commits February 14, 2025 19:13

Reduce string allocations for REGEXP_LIKE in JSON_MATCH().

7613189

Refactored json index test.

f6af9df

Optimized JSON_MATCH execution with mutable json index.

e2a68fa

albertobastos reviewed Feb 21, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve JSON_MATCH performance. #15049

Improve JSON_MATCH performance. #15049

bziobrowski commented Feb 12, 2025 •

edited

Loading

codecov-commenter commented Feb 12, 2025 •

edited

Loading

richardstartin commented Feb 13, 2025

bziobrowski commented Feb 13, 2025

gortiz commented Feb 21, 2025

albertobastos Feb 21, 2025

bziobrowski Feb 21, 2025

Improve JSON_MATCH performance. #15049

Are you sure you want to change the base?

Improve JSON_MATCH performance. #15049

Conversation

bziobrowski commented Feb 12, 2025 • edited Loading

codecov-commenter commented Feb 12, 2025 • edited Loading

Codecov Report

richardstartin commented Feb 13, 2025

bziobrowski commented Feb 13, 2025

gortiz commented Feb 21, 2025

albertobastos Feb 21, 2025

Choose a reason for hiding this comment

bziobrowski Feb 21, 2025

Choose a reason for hiding this comment

bziobrowski commented Feb 12, 2025 •

edited

Loading

codecov-commenter commented Feb 12, 2025 •

edited

Loading