Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adaptive filter partitioning #15838

Merged
merged 29 commits into from
Feb 29, 2024

Conversation

clintropolis
Copy link
Member

@clintropolis clintropolis commented Feb 6, 2024

Description

This PR reworks how filter partitioning is performed, pushing it into Filter to "bundle" both index and value matcher creation, allowing filter partitioning to be done adaptively based on the selectivity of prior filters. This was alluded to during discussions in #12388, and contains and enhances the functionality added in #13977 (as well as removes the need to merge #15551), and aims to solve the problems originally described in #3878.

The main change centers around a new addition to the Filter interface,

default <T> FilterBundle makeFilterBundle(
      ColumnIndexSelector columnIndexSelector,
      BitmapResultFactory<T> bitmapResultFactory,
      int selectionRowCount,
      int totalRowCount,
      boolean includeUnknown,
      boolean allowPartialIndex
  )

for which a default implementation is provided that should be sufficient in most cases, and produces a new container type FilterBundle,

Production of the FilterBundle is assisted by a new method on BitmapColumnIndex

  @Nullable
  default <T> T computeBitmapResult(
      BitmapResultFactory<T> bitmapResultFactory,
      int selectionRowCount,
      int totalRowCount,
      boolean includeUnknown
  )

which allows index computation to factor selectionRowCount and totalRowCount into whether or not they should compute indexes, gauged based on the amount of work they need to do vs the amount of work it would take to just use a ValueMatcher/VectorValueMatcher and scan the 'remaining' rows.

The primary beneficiary of this is AND and OR filters, which can now partition not just on whether or not the sub-filter supports indexes at all, but also based on whether or not they should use those indexes based on the selectivity of sub-filters whose indexes have already been computed, which itself is delegated to the sub-filters via the new BitmapColumnIndex method.

This can have a pretty dramatic performance impact:

query:

"SELECT string2, SUM(long1) FROM foo WHERE string1 = '1000' AND string5 LIKE '%1%' AND (string3 in ('1', '10', '20', '22', '32') AND long2 IN (1, 19, 21, 23, 25, 26, 46) AND double3 < 1010.0 AND double3 > 1000.0 AND (string4 = '1' OR REGEXP_EXTRACT(string1, '^1') IS NOT NULL OR REGEXP_EXTRACT('Z' || string2, '^Z2') IS NOT NULL)) GROUP BY 1 ORDER BY 2"

before:

Benchmark                        (query)  (rowsPerSegment)  (schema)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlExpressionBenchmark.querySql       41           5000000  explicit        false  avgt    5  455.417 ± 12.678  ms/op
SqlExpressionBenchmark.querySql       41           5000000      auto        false  avgt    5  474.680 ± 16.484  ms/op

after:

Benchmark                        (query)  (rowsPerSegment)  (schema)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlExpressionBenchmark.querySql       41           5000000  explicit        false  avgt    5  127.663 ±  6.255  ms/op
SqlExpressionBenchmark.querySql       41           5000000      auto        false  avgt    5  125.231 ± 16.176  ms/op

query:

"SELECT SUM(long1) FROM foo WHERE string1 = '1000' AND string5 LIKE '%1%'"

before:

Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       46           5000000  explicit              none        false  avgt    3  294.571 ± 51.248  ms/op
SqlNestedDataBenchmark.querySql       46           5000000  explicit              none        force  avgt    3  270.761 ± 64.697  ms/op
SqlNestedDataBenchmark.querySql       46           5000000      auto              none        false  avgt    3  254.495 ± 14.844  ms/op
SqlNestedDataBenchmark.querySql       46           5000000      auto              none        force  avgt    3  282.779 ±  9.570  ms/op

after:

SqlNestedDataBenchmark.querySql       46           5000000  explicit              none        false  avgt    3    9.897 ±   3.995  ms/op
SqlNestedDataBenchmark.querySql       46           5000000  explicit              none        force  avgt    3    9.901 ±   3.668  ms/op
SqlNestedDataBenchmark.querySql       46           5000000      auto              none        false  avgt    3    9.908 ±   4.091  ms/op
SqlNestedDataBenchmark.querySql       46           5000000      auto              none        force  avgt    3    9.876 ±   3.121  ms/op

not even getting into doing anything fancy such as re-ordering sub-filters based on expected cost (for examples null and equality filters could always be evaluated first, then IN and range filters, followed by predicate and additional logical filters, i'll save all of this for follow-up work).

Currently only range and predicate based indexes are implementing this new functionality, but the 'IN' filter could also consider this for cases where it must check the entire set against the dictionary where the set is larger than the number of remaining rows to be scanned. Numeric range thresholds were chosen via benchmarking, which given how cheap numerical comparisons will skip indexes unless the range is quite a bit smaller than the number of rows to be scanned, but are wired to the same undocumented config added in #13977.

Predicate based indexes are using a very conservative "dictionary size greater than number of rows to be scanned" threshold, which should be safe enough to not need any config, since the same or larger number of predicate operations would need to be done to compute the indexes (on top of unioning the indexes themselves) as if we just used a matcher for the remaining rows. This can likely be better, but I'll save that for follow-up PRs.

This functionality completely replaces FilterAnalysis, which was previously externally performing filter partitioning for the top level AND filter of a query, otherwise would all or nothing use the index or not when constructing offsets.

The new makeFilterBundle method should be significantly more powerful because the AND filter delegates this to its children, meaning that nested AND filters will also be partitioned without doing any additional work or normalization.

OR filters also implement makeFilterBundle, and have been reworked to allow partial index computation for the sake of creating a value matcher that checks the index set bits rather than computing the actual matcher. This functionality itself is not new, previously it lived in FilteredOffset, but now it has been moved inside of OrFilter which constructs this partial index matcher when constructing its FilterBundle.

Visibility into the partitioning is available via debug logs, which will print how filters have been partitioned when enabled (it seemed too noisy to be info by default) and look something like this

2024-02-07T01:31:03,314 DEBUG [processing_3b8e0608-ce11-48c2-baee-7a1a1f2492f1] org.apache.druid.segment.QueryableIndexCursorSequenceBuilder - Filter partitioning (30ms):{indexes=[{filter="channel = #en.wikipedia", selectionSize=6650, buildTime=435μs}, {filter="100 < delta as LONG < 500", selectionSize=2725, buildTime=13998μs}], matcher=[{filter="page LIKE '%1%'"}, {filter="~(v0 IS NULL) || user = bob", partialIndex={filter="user = bob", selectionSize=0, buildTime=3382μs}}]}

currently largely relying on the Filter.toString implementation to populate the details.

This feels like a safe change to make since everything is very conservative, so I have not retained the ability to force Druid to always use indexes, though I suppose it should be possible if there is demand for it.

As mentioned earlier, I believe there is further improvement to be made, but I think this is a good start.

Release note

Druid query processing now adaptively determines when children of AND filters should compute indexes and when to simply match rows during the scan based on selectivity of other filters. We refer to this concept as 'filter partitioning', and it can result in quite a dramatic performance increase depending on the order which filters are written in the query.

For example, imagine we have a query like SELECT SUM(longColumn) FROM druid.table WHERE stringColumn1 = '1000' AND stringColumn2 LIKE '%1%'. Clasically, Druid would always use indexes when processing filters if they are available. This is not always ideal though, imagine if stringColumn1 = '1000' matches 100 rows, if we are always using indexes we still have to find every value of stringColumn2 LIKE '%1%' that is true to compute the indexes for that filter, which if stringColumn2 has more than 100 values ends up being worse than if the processing engine had simply checked for a match in those 100 remaining rows.

With the new logic in place, Druid now checks the selectivity of indexes as it processes each clause of the AND filter, and if it determines it would take more work to compute the index than to match the remaining rows, it skips computing the index.

As an end user, this means that the order you write filters in a WHERE clause of a query can now influence the performance of your queries. Don't worry, since Druid previously always used indexes if available, not ordering your filters cannot cause queries to be any slower than they were before. If you do want to explore this though, we recommend putting filter clauses with 'cheaper' to compute indexes such as IS NULL, =, and comparisons such as >, >=, <, <= near the front of AND filters to allow Druid to more efficiently process your queries. Future work will look into automatically processing these filters in the best order to take this burden off of the user, but for now feel free to play around with filter ordering to see if you can improve query performance.


Key changed/added classes in this PR
  • Filter
  • BitmapColumnIndex
  • QueryableIndexCursorSequenceBuilder

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@clintropolis
Copy link
Member Author

clintropolis commented Feb 6, 2024

I ran quite a lot of benchmarks to ensure everything was the "same" or "better" performance, will add them in the comments. Some of them are sort of noisy due to my laptop not being completely silent while taking measurements and a sort of low number of total iterations, but they seemed close enough to be satisfied and get an idea of the numbers.

@clintropolis
Copy link
Member Author

sql benchmark:
before:

Benchmark              (query)  (rowsPerSegment)  (schema)  (storageType)  (stringEncoding)  (vectorize)  Mode  Cnt    Score     Error  Units
SqlBenchmark.querySql        4           5000000  explicit           mmap              none        false  avgt    3  103.790 ±   3.982  ms/op
SqlBenchmark.querySql        4           5000000  explicit           mmap              none        force  avgt    3   42.487 ±   5.176  ms/op
SqlBenchmark.querySql        4           5000000      auto           mmap              none        false  avgt    3  104.875 ±   7.658  ms/op
SqlBenchmark.querySql        4           5000000      auto           mmap              none        force  avgt    3   43.358 ±   5.930  ms/op
SqlBenchmark.querySql        5           5000000  explicit           mmap              none        false  avgt    3   26.222 ±   6.128  ms/op
SqlBenchmark.querySql        5           5000000  explicit           mmap              none        force  avgt    3   25.999 ±   8.292  ms/op
SqlBenchmark.querySql        5           5000000      auto           mmap              none        false  avgt    3   26.239 ±   4.041  ms/op
SqlBenchmark.querySql        5           5000000      auto           mmap              none        force  avgt    3   25.745 ±   6.648  ms/op
SqlBenchmark.querySql        6           5000000  explicit           mmap              none        false  avgt    3  151.024 ±  26.757  ms/op
SqlBenchmark.querySql        6           5000000  explicit           mmap              none        force  avgt    3   56.607 ±   5.462  ms/op
SqlBenchmark.querySql        6           5000000      auto           mmap              none        false  avgt    3  150.426 ±  18.721  ms/op
SqlBenchmark.querySql        6           5000000      auto           mmap              none        force  avgt    3   56.812 ±  15.072  ms/op
SqlBenchmark.querySql        7           5000000  explicit           mmap              none        false  avgt    3  106.111 ±   7.608  ms/op
SqlBenchmark.querySql        7           5000000  explicit           mmap              none        force  avgt    3   45.750 ±   2.709  ms/op
SqlBenchmark.querySql        7           5000000      auto           mmap              none        false  avgt    3  109.590 ±  10.640  ms/op
SqlBenchmark.querySql        7           5000000      auto           mmap              none        force  avgt    3   45.695 ±   5.982  ms/op
SqlBenchmark.querySql        8           5000000  explicit           mmap              none        false  avgt    3  828.884 ±  28.062  ms/op
SqlBenchmark.querySql        8           5000000  explicit           mmap              none        force  avgt    3  359.063 ±  36.453  ms/op
SqlBenchmark.querySql        8           5000000      auto           mmap              none        false  avgt    3  826.850 ±  31.962  ms/op
SqlBenchmark.querySql        8           5000000      auto           mmap              none        force  avgt    3  356.854 ±  15.195  ms/op
SqlBenchmark.querySql       17           5000000  explicit           mmap              none        false  avgt    3  236.877 ±   1.832  ms/op
SqlBenchmark.querySql       17           5000000  explicit           mmap              none        force  avgt    3   60.963 ±   5.899  ms/op
SqlBenchmark.querySql       17           5000000      auto           mmap              none        false  avgt    3  236.196 ±  12.868  ms/op
SqlBenchmark.querySql       17           5000000      auto           mmap              none        force  avgt    3   61.224 ±   2.722  ms/op
SqlBenchmark.querySql       18           5000000  explicit           mmap              none        false  avgt    3  317.946 ±   7.598  ms/op
SqlBenchmark.querySql       18           5000000  explicit           mmap              none        force  avgt    3   88.799 ±   5.171  ms/op
SqlBenchmark.querySql       18           5000000      auto           mmap              none        false  avgt    3  314.940 ±  53.150  ms/op
SqlBenchmark.querySql       18           5000000      auto           mmap              none        force  avgt    3   89.033 ±   6.503  ms/op
SqlBenchmark.querySql       21           5000000  explicit           mmap              none        false  avgt    3  551.875 ±  87.206  ms/op
SqlBenchmark.querySql       21           5000000  explicit           mmap              none        force  avgt    3  305.976 ± 366.573  ms/op
SqlBenchmark.querySql       21           5000000      auto           mmap              none        false  avgt    3  501.930 ± 221.947  ms/op
SqlBenchmark.querySql       21           5000000      auto           mmap              none        force  avgt    3  286.408 ±  54.351  ms/op
SqlBenchmark.querySql       22           5000000  explicit           mmap              none        false  avgt    3   18.261 ±   6.083  ms/op
SqlBenchmark.querySql       22           5000000  explicit           mmap              none        force  avgt    3   18.182 ±   4.175  ms/op
SqlBenchmark.querySql       22           5000000      auto           mmap              none        false  avgt    3   18.169 ±   4.463  ms/op
SqlBenchmark.querySql       22           5000000      auto           mmap              none        force  avgt    3   18.139 ±   5.354  ms/op
SqlBenchmark.querySql       26           5000000  explicit           mmap              none        false  avgt    3   25.782 ±   6.490  ms/op
SqlBenchmark.querySql       26           5000000  explicit           mmap              none        force  avgt    3   22.193 ±   4.287  ms/op
SqlBenchmark.querySql       26           5000000      auto           mmap              none        false  avgt    3   25.816 ±   5.929  ms/op
SqlBenchmark.querySql       26           5000000      auto           mmap              none        force  avgt    3   22.098 ±   4.914  ms/op
SqlBenchmark.querySql       27           5000000  explicit           mmap              none        false  avgt    3  492.718 ±  40.841  ms/op
SqlBenchmark.querySql       27           5000000  explicit           mmap              none        force  avgt    3  238.930 ±  15.381  ms/op
SqlBenchmark.querySql       27           5000000      auto           mmap              none        false  avgt    3  469.631 ± 133.047  ms/op
SqlBenchmark.querySql       27           5000000      auto           mmap              none        force  avgt    3  240.719 ±  44.220  ms/op

after:

Benchmark              (query)  (rowsPerSegment)  (schema)  (storageType)  (stringEncoding)  (vectorize)  Mode  Cnt    Score     Error  Units
SqlBenchmark.querySql        4           5000000  explicit           mmap              none        false  avgt    3  102.034 ±  12.560  ms/op
SqlBenchmark.querySql        4           5000000  explicit           mmap              none        force  avgt    3   41.707 ±   6.050  ms/op
SqlBenchmark.querySql        4           5000000      auto           mmap              none        false  avgt    3  101.689 ±   6.914  ms/op
SqlBenchmark.querySql        4           5000000      auto           mmap              none        force  avgt    3   41.324 ±   8.919  ms/op
SqlBenchmark.querySql        5           5000000  explicit           mmap              none        false  avgt    3   25.053 ±   5.847  ms/op
SqlBenchmark.querySql        5           5000000  explicit           mmap              none        force  avgt    3   24.592 ±   4.665  ms/op
SqlBenchmark.querySql        5           5000000      auto           mmap              none        false  avgt    3   25.285 ±   2.859  ms/op
SqlBenchmark.querySql        5           5000000      auto           mmap              none        force  avgt    3   24.879 ±   4.202  ms/op
SqlBenchmark.querySql        6           5000000  explicit           mmap              none        false  avgt    3  148.534 ±   7.940  ms/op
SqlBenchmark.querySql        6           5000000  explicit           mmap              none        force  avgt    3   55.703 ±   5.023  ms/op
SqlBenchmark.querySql        6           5000000      auto           mmap              none        false  avgt    3  148.809 ±   9.032  ms/op
SqlBenchmark.querySql        6           5000000      auto           mmap              none        force  avgt    3   55.883 ±   7.610  ms/op
SqlBenchmark.querySql        7           5000000  explicit           mmap              none        false  avgt    3  104.756 ±   9.006  ms/op
SqlBenchmark.querySql        7           5000000  explicit           mmap              none        force  avgt    3   44.237 ±   8.804  ms/op
SqlBenchmark.querySql        7           5000000      auto           mmap              none        false  avgt    3  103.306 ±   8.286  ms/op
SqlBenchmark.querySql        7           5000000      auto           mmap              none        force  avgt    3   44.072 ±   8.514  ms/op
SqlBenchmark.querySql        8           5000000  explicit           mmap              none        false  avgt    3  809.821 ±  32.454  ms/op
SqlBenchmark.querySql        8           5000000  explicit           mmap              none        force  avgt    3  357.033 ±  41.743  ms/op
SqlBenchmark.querySql        8           5000000      auto           mmap              none        false  avgt    3  810.001 ±  20.478  ms/op
SqlBenchmark.querySql        8           5000000      auto           mmap              none        force  avgt    3  335.708 ±  39.297  ms/op
SqlBenchmark.querySql       17           5000000  explicit           mmap              none        false  avgt    3  231.403 ±  14.622  ms/op
SqlBenchmark.querySql       17           5000000  explicit           mmap              none        force  avgt    3   59.764 ±   6.011  ms/op
SqlBenchmark.querySql       17           5000000      auto           mmap              none        false  avgt    3  232.429 ±  18.161  ms/op
SqlBenchmark.querySql       17           5000000      auto           mmap              none        force  avgt    3   59.919 ±   4.308  ms/op
SqlBenchmark.querySql       18           5000000  explicit           mmap              none        false  avgt    3  316.084 ±  86.461  ms/op
SqlBenchmark.querySql       18           5000000  explicit           mmap              none        force  avgt    3   86.129 ±  24.248  ms/op
SqlBenchmark.querySql       18           5000000      auto           mmap              none        false  avgt    3  313.645 ±  27.448  ms/op
SqlBenchmark.querySql       18           5000000      auto           mmap              none        force  avgt    3   87.142 ±   1.809  ms/op
SqlBenchmark.querySql       21           5000000  explicit           mmap              none        false  avgt    3  456.894 ± 334.867  ms/op
SqlBenchmark.querySql       21           5000000  explicit           mmap              none        force  avgt    3  338.666 ±  73.569  ms/op
SqlBenchmark.querySql       21           5000000      auto           mmap              none        false  avgt    3  505.823 ± 281.326  ms/op
SqlBenchmark.querySql       21           5000000      auto           mmap              none        force  avgt    3  336.278 ± 245.398  ms/op
SqlBenchmark.querySql       22           5000000  explicit           mmap              none        false  avgt    3   18.209 ±   3.422  ms/op
SqlBenchmark.querySql       22           5000000  explicit           mmap              none        force  avgt    3   18.660 ±   5.772  ms/op
SqlBenchmark.querySql       22           5000000      auto           mmap              none        false  avgt    3   18.582 ±   9.571  ms/op
SqlBenchmark.querySql       22           5000000      auto           mmap              none        force  avgt    3   18.433 ±   2.817  ms/op
SqlBenchmark.querySql       26           5000000  explicit           mmap              none        false  avgt    3   25.835 ±   5.871  ms/op
SqlBenchmark.querySql       26           5000000  explicit           mmap              none        force  avgt    3   22.382 ±   3.196  ms/op
SqlBenchmark.querySql       26           5000000      auto           mmap              none        false  avgt    3   26.441 ±  11.957  ms/op
SqlBenchmark.querySql       26           5000000      auto           mmap              none        force  avgt    3   22.970 ±   8.281  ms/op
SqlBenchmark.querySql       27           5000000  explicit           mmap              none        false  avgt    3  536.666 ± 183.462  ms/op
SqlBenchmark.querySql       27           5000000  explicit           mmap              none        force  avgt    3  255.328 ± 191.410  ms/op
SqlBenchmark.querySql       27           5000000      auto           mmap              none        false  avgt    3  488.789 ± 287.492  ms/op
SqlBenchmark.querySql       27           5000000      auto           mmap              none        force  avgt    3  240.130 ±  28.132  ms/op

@clintropolis
Copy link
Member Author

clintropolis commented Feb 6, 2024

SqlNestedDataBenchmark.querySql       10           5000000  explicit              none        false  avgt    3   11.637 ±  3.534  ms/op
SqlNestedDataBenchmark.querySql       10           5000000  explicit              none        force  avgt    3   11.754 ±  2.188  ms/op
SqlNestedDataBenchmark.querySql       10           5000000      auto              none        false  avgt    3   11.763 ±  3.497  ms/op
SqlNestedDataBenchmark.querySql       10           5000000      auto              none        force  avgt    3   11.817 ±  3.566  ms/op
after:
SqlNestedDataBenchmark.querySql       10           5000000  explicit              none        false  avgt    3   11.866 ±   2.152  ms/op
SqlNestedDataBenchmark.querySql       10           5000000  explicit              none        force  avgt    3   11.683 ±   3.231  ms/op
SqlNestedDataBenchmark.querySql       10           5000000      auto              none        false  avgt    3   11.811 ±   2.869  ms/op
SqlNestedDataBenchmark.querySql       10           5000000      auto              none        force  avgt    3   11.703 ±   4.288  ms/op

Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       12           5000000  explicit              none        false  avgt    3   54.115 ±  6.050  ms/op
SqlNestedDataBenchmark.querySql       12           5000000  explicit              none        force  avgt    3   19.037 ±  4.762  ms/op
SqlNestedDataBenchmark.querySql       12           5000000      auto              none        false  avgt    3    0.832 ±  0.363  ms/op
SqlNestedDataBenchmark.querySql       12           5000000      auto              none        force  avgt    3    0.849 ±  0.395  ms/op
after:
SqlNestedDataBenchmark.querySql       12           5000000  explicit              none        false  avgt    3   52.140 ±   3.170  ms/op
SqlNestedDataBenchmark.querySql       12           5000000  explicit              none        force  avgt    3   18.735 ±   4.111  ms/op
SqlNestedDataBenchmark.querySql       12           5000000      auto              none        false  avgt    3    0.832 ±   0.461  ms/op
SqlNestedDataBenchmark.querySql       12           5000000      auto              none        force  avgt    3    0.847 ±   0.583  ms/op


Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       14           5000000  explicit              none        false  avgt    3  108.765 ± 14.863  ms/op
SqlNestedDataBenchmark.querySql       14           5000000  explicit              none        force  avgt    3   53.219 ±  6.255  ms/op
SqlNestedDataBenchmark.querySql       14           5000000      auto              none        false  avgt    3  208.942 ±  6.933  ms/op
SqlNestedDataBenchmark.querySql       14           5000000      auto              none        force  avgt    3  183.875 ±  6.608  ms/op
after:
SqlNestedDataBenchmark.querySql       14           5000000  explicit              none        false  avgt    3  105.725 ±   6.188  ms/op
SqlNestedDataBenchmark.querySql       14           5000000  explicit              none        force  avgt    3   53.162 ±   6.720  ms/op
SqlNestedDataBenchmark.querySql       14           5000000      auto              none        false  avgt    3  118.045 ±  10.816  ms/op
SqlNestedDataBenchmark.querySql       14           5000000      auto              none        force  avgt    3   54.288 ±   2.659  ms/op


Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       16           5000000  explicit              none        false  avgt    3  127.044 ± 53.657  ms/op
SqlNestedDataBenchmark.querySql       16           5000000  explicit              none        force  avgt    3  127.880 ± 18.051  ms/op
SqlNestedDataBenchmark.querySql       16           5000000      auto              none        false  avgt    3  127.662 ± 15.958  ms/op
SqlNestedDataBenchmark.querySql       16           5000000      auto              none        force  avgt    3  126.797 ± 14.320  ms/op
after:
SqlNestedDataBenchmark.querySql       16           5000000  explicit              none        false  avgt    3  126.754 ±  22.735  ms/op
SqlNestedDataBenchmark.querySql       16           5000000  explicit              none        force  avgt    3  128.253 ±  10.778  ms/op
SqlNestedDataBenchmark.querySql       16           5000000      auto              none        false  avgt    3  127.340 ±  37.610  ms/op
SqlNestedDataBenchmark.querySql       16           5000000      auto              none        force  avgt    3  128.456 ±   6.090  ms/op


Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       18           5000000  explicit              none        false  avgt    3  147.149 ± 22.464  ms/op
SqlNestedDataBenchmark.querySql       18           5000000  explicit              none        force  avgt    3  130.171 ±  2.246  ms/op
SqlNestedDataBenchmark.querySql       18           5000000      auto              none        false  avgt    3  114.237 ± 10.156  ms/op
SqlNestedDataBenchmark.querySql       18           5000000      auto              none        force  avgt    3  116.078 ± 20.910  ms/op
after:
SqlNestedDataBenchmark.querySql       18           5000000  explicit              none        false  avgt    3  147.123 ±  25.873  ms/op
SqlNestedDataBenchmark.querySql       18           5000000  explicit              none        force  avgt    3  129.819 ±  20.697  ms/op
SqlNestedDataBenchmark.querySql       18           5000000      auto              none        false  avgt    3  114.968 ±  28.498  ms/op
SqlNestedDataBenchmark.querySql       18           5000000      auto              none        force  avgt    3  115.304 ±   2.985  ms/op


Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       20           5000000  explicit              none        false  avgt    3  244.332 ± 14.908  ms/op
SqlNestedDataBenchmark.querySql       20           5000000  explicit              none        force  avgt    3  170.685 ± 13.431  ms/op
SqlNestedDataBenchmark.querySql       20           5000000      auto              none        false  avgt    3  342.328 ± 22.828  ms/op
SqlNestedDataBenchmark.querySql       20           5000000      auto              none        force  avgt    3  441.128 ± 55.055  ms/op
after:
SqlNestedDataBenchmark.querySql       20           5000000  explicit              none        false  avgt    3  244.219 ±  28.999  ms/op
SqlNestedDataBenchmark.querySql       20           5000000  explicit              none        force  avgt    3  173.492 ±  10.700  ms/op
SqlNestedDataBenchmark.querySql       20           5000000      auto              none        false  avgt    3  258.032 ±  34.933  ms/op
SqlNestedDataBenchmark.querySql       20           5000000      auto              none        force  avgt    3  170.822 ±  16.745  ms/op

Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       22           5000000  explicit              none        false  avgt    3  336.073 ± 78.811  ms/op
SqlNestedDataBenchmark.querySql       22           5000000  explicit              none        force  avgt    3  339.084 ± 46.841  ms/op
SqlNestedDataBenchmark.querySql       22           5000000      auto              none        false  avgt    3  175.400 ±  5.532  ms/op
SqlNestedDataBenchmark.querySql       22           5000000      auto              none        force  avgt    3  176.013 ± 12.655  ms/op
after:
SqlNestedDataBenchmark.querySql       22           5000000  explicit              none        false  avgt    3  338.662 ±  48.865  ms/op
SqlNestedDataBenchmark.querySql       22           5000000  explicit              none        force  avgt    3  335.812 ±  81.631  ms/op
SqlNestedDataBenchmark.querySql       22           5000000      auto              none        false  avgt    3  175.062 ±  13.503  ms/op
SqlNestedDataBenchmark.querySql       22           5000000      auto              none        force  avgt    3  173.447 ±   5.088  ms/op


Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       24           5000000  explicit              none        false  avgt    3  355.318 ± 31.117  ms/op
SqlNestedDataBenchmark.querySql       24           5000000  explicit              none        force  avgt    3  239.307 ± 15.838  ms/op
SqlNestedDataBenchmark.querySql       24           5000000      auto              none        false  avgt    3  204.634 ±  8.950  ms/op
SqlNestedDataBenchmark.querySql       24           5000000      auto              none        force  avgt    3  150.468 ± 25.591  ms/op
after:
SqlNestedDataBenchmark.querySql       24           5000000  explicit              none        false  avgt    3  354.594 ±  55.691  ms/op
SqlNestedDataBenchmark.querySql       24           5000000  explicit              none        force  avgt    3  217.886 ±  10.582  ms/op
SqlNestedDataBenchmark.querySql       24           5000000      auto              none        false  avgt    3  204.610 ±  17.658  ms/op
SqlNestedDataBenchmark.querySql       24           5000000      auto              none        force  avgt    3  149.440 ±  14.151  ms/op


Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       26           5000000  explicit              none        false  avgt    3   85.793 ± 10.031  ms/op
SqlNestedDataBenchmark.querySql       26           5000000  explicit              none        force  avgt    3   47.154 ±  4.797  ms/op
SqlNestedDataBenchmark.querySql       26           5000000      auto              none        false  avgt    3   13.057 ±  4.096  ms/op
SqlNestedDataBenchmark.querySql       26           5000000      auto              none        force  avgt    3   13.135 ±  5.635  ms/op
after:
SqlNestedDataBenchmark.querySql       26           5000000  explicit              none        false  avgt    3   85.970 ± 13.020  ms/op
SqlNestedDataBenchmark.querySql       26           5000000  explicit              none        force  avgt    3   47.193 ±  5.670  ms/op
SqlNestedDataBenchmark.querySql       26           5000000      auto              none        false  avgt    3   12.943 ±  3.607  ms/op
SqlNestedDataBenchmark.querySql       26           5000000      auto              none        force  avgt    3   13.055 ±  4.818  ms/op

Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       28           5000000  explicit              none        false  avgt    3   92.461 ± 10.528  ms/op
SqlNestedDataBenchmark.querySql       28           5000000  explicit              none        force  avgt    3   52.672 ±  5.959  ms/op
SqlNestedDataBenchmark.querySql       28           5000000      auto              none        false  avgt    3   40.219 ±  8.570  ms/op
SqlNestedDataBenchmark.querySql       28           5000000      auto              none        force  avgt    3   37.183 ±  7.056  ms/op
after:
SqlNestedDataBenchmark.querySql       28           5000000  explicit              none        false  avgt    3   92.558 ± 10.621  ms/op
SqlNestedDataBenchmark.querySql       28           5000000  explicit              none        force  avgt    3   52.874 ±  6.353  ms/op
SqlNestedDataBenchmark.querySql       28           5000000      auto              none        false  avgt    3   57.338 ±  2.425  ms/op
SqlNestedDataBenchmark.querySql       28           5000000      auto              none        force  avgt    3   36.934 ±  2.407  ms/op

Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       30           5000000  explicit              none        false  avgt    3   98.790 ±  7.881  ms/op
SqlNestedDataBenchmark.querySql       30           5000000  explicit              none        force  avgt    3   55.169 ±  5.377  ms/op
SqlNestedDataBenchmark.querySql       30           5000000      auto              none        false  avgt    3   64.262 ± 10.481  ms/op
SqlNestedDataBenchmark.querySql       30           5000000      auto              none        force  avgt    3   92.768 ±  8.949  ms/op
after:
SqlNestedDataBenchmark.querySql       30           5000000  explicit              none        false  avgt    3   99.081 ±  4.052  ms/op
SqlNestedDataBenchmark.querySql       30           5000000  explicit              none        force  avgt    3   54.448 ±  4.423  ms/op
SqlNestedDataBenchmark.querySql       30           5000000      auto              none        false  avgt    3  108.130 ±  4.709  ms/op
SqlNestedDataBenchmark.querySql       30           5000000      auto              none        force  avgt    3   54.642 ± 11.665  ms/op

Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       32           5000000  explicit              none        false  avgt    3  107.714 ± 15.555  ms/op
SqlNestedDataBenchmark.querySql       32           5000000  explicit              none        force  avgt    3   61.846 ± 41.127  ms/op
SqlNestedDataBenchmark.querySql       32           5000000      auto              none        false  avgt    3  113.979 ± 10.049  ms/op
SqlNestedDataBenchmark.querySql       32           5000000      auto              none        force  avgt    3  101.386 ± 14.529  ms/op
after:
SqlNestedDataBenchmark.querySql       32           5000000  explicit              none        false  avgt    3  107.730 ±   8.056  ms/op
SqlNestedDataBenchmark.querySql       32           5000000  explicit              none        force  avgt    3   60.531 ±   4.274  ms/op
SqlNestedDataBenchmark.querySql       32           5000000      auto              none        false  avgt    3  117.745 ±   8.900  ms/op
SqlNestedDataBenchmark.querySql       32           5000000      auto              none        force  avgt    3   59.409 ±   4.373  ms/op


Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       34           5000000  explicit              none        false  avgt    3   22.918 ±  5.078  ms/op
SqlNestedDataBenchmark.querySql       34           5000000  explicit              none        force  avgt    3   17.019 ±  4.782  ms/op
SqlNestedDataBenchmark.querySql       34           5000000      auto              none        false  avgt    3   23.445 ±  2.687  ms/op
SqlNestedDataBenchmark.querySql       34           5000000      auto              none        force  avgt    3   17.359 ± 16.973  ms/op
after:
SqlNestedDataBenchmark.querySql       34           5000000  explicit              none        false  avgt    3   22.788 ±   3.001  ms/op
SqlNestedDataBenchmark.querySql       34           5000000  explicit              none        force  avgt    3   16.920 ±   3.866  ms/op
SqlNestedDataBenchmark.querySql       34           5000000      auto              none        false  avgt    3   23.247 ±   4.945  ms/op
SqlNestedDataBenchmark.querySql       34           5000000      auto              none        force  avgt    3   16.931 ±   5.282  ms/op


Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       36           5000000  explicit              none        false  avgt    3   42.250 ±  6.649  ms/op
SqlNestedDataBenchmark.querySql       36           5000000  explicit              none        force  avgt    3   25.124 ±  5.186  ms/op
SqlNestedDataBenchmark.querySql       36           5000000      auto              none        false  avgt    3   45.242 ±  4.117  ms/op
SqlNestedDataBenchmark.querySql       36           5000000      auto              none        force  avgt    3   25.116 ±  6.706  ms/op
after:
SqlNestedDataBenchmark.querySql       36           5000000  explicit              none        false  avgt    3   42.112 ±   3.982  ms/op
SqlNestedDataBenchmark.querySql       36           5000000  explicit              none        force  avgt    3   25.153 ±   3.858  ms/op
SqlNestedDataBenchmark.querySql       36           5000000      auto              none        false  avgt    3   45.255 ±   3.932  ms/op
SqlNestedDataBenchmark.querySql       36           5000000      auto              none        force  avgt    3   25.070 ±   4.051  ms/op

Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       38           5000000  explicit              none        false  avgt    3   52.052 ± 15.365  ms/op
SqlNestedDataBenchmark.querySql       38           5000000  explicit              none        force  avgt    3   47.800 ±  6.100  ms/op
SqlNestedDataBenchmark.querySql       38           5000000      auto              none        false  avgt    3   52.033 ±  3.063  ms/op
SqlNestedDataBenchmark.querySql       38           5000000      auto              none        force  avgt    3   47.645 ±  4.431  ms/op
after:
SqlNestedDataBenchmark.querySql       38           5000000  explicit              none        false  avgt    3   51.805 ±   3.865  ms/op
SqlNestedDataBenchmark.querySql       38           5000000  explicit              none        force  avgt    3   32.438 ±   6.235  ms/op
SqlNestedDataBenchmark.querySql       38           5000000      auto              none        false  avgt    3   52.337 ±   7.259  ms/op
SqlNestedDataBenchmark.querySql       38           5000000      auto              none        force  avgt    3   48.042 ±  10.304  ms/op


Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       40           5000000  explicit              none        false  avgt    3  206.725 ± 34.181  ms/op
SqlNestedDataBenchmark.querySql       40           5000000  explicit              none        force  avgt    3  171.111 ±  5.876  ms/op
SqlNestedDataBenchmark.querySql       40           5000000      auto              none        false  avgt    3  211.692 ±  1.864  ms/op
SqlNestedDataBenchmark.querySql       40           5000000      auto              none        force  avgt    3  172.078 ± 21.725  ms/op
after:
SqlNestedDataBenchmark.querySql       40           5000000  explicit              none        false  avgt    3  204.639 ±   3.456  ms/op
SqlNestedDataBenchmark.querySql       40           5000000  explicit              none        force  avgt    3  172.043 ±   6.625  ms/op
SqlNestedDataBenchmark.querySql       40           5000000      auto              none        false  avgt    3  211.350 ±   5.531  ms/op
SqlNestedDataBenchmark.querySql       40           5000000      auto              none        force  avgt    3  171.776 ±   6.388  ms/op

Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       42           5000000  explicit              none        false  avgt    3  289.705 ± 34.154  ms/op
SqlNestedDataBenchmark.querySql       42           5000000  explicit              none        force  avgt    3  274.149 ± 18.942  ms/op
SqlNestedDataBenchmark.querySql       42           5000000      auto              none        false  avgt    3  329.124 ±  0.570  ms/op
SqlNestedDataBenchmark.querySql       42           5000000      auto              none        force  avgt    3  283.623 ± 30.770  ms/op
after:
SqlNestedDataBenchmark.querySql       42           5000000  explicit              none        false  avgt    3  301.896 ± 401.128  ms/op
SqlNestedDataBenchmark.querySql       42           5000000  explicit              none        force  avgt    3  275.323 ±  59.847  ms/op
SqlNestedDataBenchmark.querySql       42           5000000      auto              none        false  avgt    3  296.636 ±  39.259  ms/op
SqlNestedDataBenchmark.querySql       42           5000000      auto              none        force  avgt    3  299.627 ±   3.286  ms/op

Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       44           5000000  explicit              none        false  avgt    3  259.379 ± 22.446  ms/op
SqlNestedDataBenchmark.querySql       44           5000000  explicit              none        force  avgt    3  249.366 ±  1.213  ms/op
SqlNestedDataBenchmark.querySql       44           5000000      auto              none        false  avgt    3  297.829 ± 11.551  ms/op
SqlNestedDataBenchmark.querySql       44           5000000      auto              none        force  avgt    3  261.611 ± 15.745  ms/op
after:
SqlNestedDataBenchmark.querySql       44           5000000  explicit              none        false  avgt    3  264.475 ±   5.931  ms/op
SqlNestedDataBenchmark.querySql       44           5000000  explicit              none        force  avgt    3  266.744 ±  12.388  ms/op
SqlNestedDataBenchmark.querySql       44           5000000      auto              none        false  avgt    3  289.343 ±  15.449  ms/op
SqlNestedDataBenchmark.querySql       44           5000000      auto              none        force  avgt    3  265.745 ±  10.049  ms/op


Benchmark                        (query)  (rowsPerSegment)  (schema)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       46           5000000  explicit              none        false  avgt    3  294.571 ± 51.248  ms/op
SqlNestedDataBenchmark.querySql       46           5000000  explicit              none        force  avgt    3  270.761 ± 64.697  ms/op
SqlNestedDataBenchmark.querySql       46           5000000      auto              none        false  avgt    3  254.495 ± 14.844  ms/op
SqlNestedDataBenchmark.querySql       46           5000000      auto              none        force  avgt    3  282.779 ±  9.570  ms/op
after:
SqlNestedDataBenchmark.querySql       46           5000000  explicit              none        false  avgt    3    9.897 ±   3.995  ms/op
SqlNestedDataBenchmark.querySql       46           5000000  explicit              none        force  avgt    3    9.901 ±   3.668  ms/op
SqlNestedDataBenchmark.querySql       46           5000000      auto              none        false  avgt    3    9.908 ±   4.091  ms/op
SqlNestedDataBenchmark.querySql       46           5000000      auto              none        force  avgt    3    9.876 ±   3.121  ms/op

@clintropolis
Copy link
Member Author

sql expression benchmark:

before:
Benchmark                        (query)  (rowsPerSegment)  (schema)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlExpressionBenchmark.querySql       38           5000000  explicit        false  avgt    3  147.376 ± 14.760  ms/op
SqlExpressionBenchmark.querySql       38           5000000  explicit        force  avgt    3  151.660 ± 76.733  ms/op
SqlExpressionBenchmark.querySql       38           5000000      auto        false  avgt    3  149.385 ± 14.866  ms/op
SqlExpressionBenchmark.querySql       38           5000000      auto        force  avgt    3  147.082 ± 19.714  ms/op
SqlExpressionBenchmark.querySql       39           5000000  explicit        false  avgt    3  149.219 ± 28.936  ms/op
SqlExpressionBenchmark.querySql       39           5000000  explicit        force  avgt    3  149.769 ± 46.633  ms/op
SqlExpressionBenchmark.querySql       39           5000000      auto        false  avgt    3  148.768 ± 19.838  ms/op
SqlExpressionBenchmark.querySql       39           5000000      auto        force  avgt    3  148.006 ± 17.356  ms/op
SqlExpressionBenchmark.querySql       40           5000000  explicit        false  avgt    3  207.326 ± 59.087  ms/op
SqlExpressionBenchmark.querySql       40           5000000      auto        false  avgt    3  194.764 ± 18.171  ms/op
SqlExpressionBenchmark.querySql       41           5000000  explicit        false  avgt    3  481.297 ± 63.210  ms/op
SqlExpressionBenchmark.querySql       41           5000000      auto        false  avgt    3  492.275 ± 41.688  ms/op
after:
Benchmark                        (query)  (rowsPerSegment)  (schema)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlExpressionBenchmark.querySql       38           5000000  explicit        false  avgt    5  131.982 ±  7.234  ms/op
SqlExpressionBenchmark.querySql       38           5000000  explicit        force  avgt    5  130.894 ±  7.175  ms/op
SqlExpressionBenchmark.querySql       38           5000000      auto        false  avgt    5  130.813 ±  3.670  ms/op
SqlExpressionBenchmark.querySql       38           5000000      auto        force  avgt    5  130.380 ±  3.506  ms/op
SqlExpressionBenchmark.querySql       39           5000000  explicit        false  avgt    5  131.314 ±  4.224  ms/op
SqlExpressionBenchmark.querySql       39           5000000  explicit        force  avgt    5  130.928 ±  7.588  ms/op
SqlExpressionBenchmark.querySql       39           5000000      auto        false  avgt    5  132.586 ±  8.593  ms/op
SqlExpressionBenchmark.querySql       39           5000000      auto        force  avgt    5  132.013 ±  3.563  ms/op
SqlExpressionBenchmark.querySql       40           5000000  explicit        false  avgt    5  184.984 ±  6.520  ms/op
SqlExpressionBenchmark.querySql       40           5000000      auto        false  avgt    4  185.333 ± 11.025  ms/op
SqlExpressionBenchmark.querySql       41           5000000  explicit        false  avgt    5  132.235 ±  5.456  ms/op
SqlExpressionBenchmark.querySql       41           5000000      auto        false  avgt    5  127.913 ±  5.675  ms/op

*/
@Deprecated
@SuppressWarnings({"unreachable", "unused"})
default void preFilters(List<Filter> preFilters)

Check notice

Code scanning / CodeQL

Useless parameter Note

The parameter 'preFilters' is never used.
* {@link org.apache.druid.segment.vector.FilteredVectorOffset}
*/
@SuppressWarnings({"unreachable", "unused"})
default void postFilters(List<Filter> postFilters)

Check notice

Code scanning / CodeQL

Useless parameter Note

The parameter 'postFilters' is never used.
final long bitmapConstructionStartNs = System.nanoTime();
for (Filter subfilter : filters) {
boolean needMatcher = true;
final BitmapColumnIndex index = subfilter.getBitmapColumnIndex(columnIndexSelector);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if we can delegate to child's filter bundles, and have a way of combining the filter bundles?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do that for AndFilter, but in the case of OrFilter its a bit more complicated because we can only use indexes for a bitmap offset if all of the clauses fully support indexes. For the clauses of the OR that use the default implementation of makeFilterBundle this isn't an issue since it is effectively the same logic we are doing here, however for the case where an AndFilter is one of the clauses of an OrFilter, using makeFilterBundle could partition the clauses of the AND into indexes and matchers, which is not what we want for OR because we need all of the AND clauses to be using indexes for the OR to use indexes. Does that make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fwiw, it should be possible to convert a bitmap index into a matcher by testing the id of the row against the bitmap. It should still be better than loading the value and grabbing the matcher and you can combine it together with a matcher to get the correct semantics inside of the OR here for the times when you need to do it all via a matcher

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is actually doing that, if not all of the sub-filter can be converted into indexes, we re-use them to make a matcher, i'll add some comments and such to try to make it clearer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i did end up doing this to make OrFilter delegate to child bundles. Its a bit more complicated than the initial implementation, but the code comments should explain the possible outcomes for different shapes of child bundles

@JsonCreator
public DruidProcessingIndexesConfig(
@JsonProperty("skipValueRangeIndexScale") @Nullable Double skipValueRangeIndexScale,
@JsonProperty("skipValuePredicateIndexScale") @Nullable Double skipValuePredicateIndexScale
@JsonProperty("skipValueRangeIndexScale") @Nullable Double skipValueRangeIndexScale
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we still even need this one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it in place to make it easy for us devs to play around with tunings, I don't ever intend for cluster operators to have to set it

@SuppressWarnings({"unreachable", "unused"})
default void preFilters(List<Filter> preFilters)
{
// do nothing, nothing calls this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still exists just so that implementations don't suddenly break? If so, perhaps add some comments either here or in the javadoc that it will be removed in the future?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, since you currently have to write an extension to customize metrics, i figured it would be less disruptive to deprecate than remove it immediately. Will update the javadocs to indicate that it will go away in the future.

Comment on lines 259 to 262
* @deprecated use {@link #filterBundle(FilterBundle.BundleInfo)} instead to collect details about filters which were
* used as value matchers for {@link org.apache.druid.segment.FilteredOffset} or
* {@link org.apache.druid.segment.vector.FilteredVectorOffset}
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here?

final long bitmapConstructionStartNs = System.nanoTime();
T result = columnIndex.computeBitmapResult(bitmapResultFactory, selectionRowCount, totalRowCount, includeUnknown);
final long totalConstructionTimeNs = System.nanoTime() - bitmapConstructionStartNs;
if (result != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you un-invert this.

if (result == null) {
  indexBundle = null;
} else {
  ... do stuff ...
}

Will be a lot smoother for the reader.

if (result != null) {
ImmutableBitmap bitmap = bitmapResultFactory.toImmutableBitmap(result);
indexBundle = new FilterBundle.SimpleIndexBundle(
Collections.singletonList(new FilterBundle.IndexBundleInfo(toString(), bitmap.size(), totalConstructionTimeNs)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need to force the toString() here? It's quite a bit of garbage, especially for an IN filter with like 100,000 entries and stuff, I suspect it's gonna have unintended side-effects. Can we force it to something that we know is non-verbose (perhaps an id, could assign ids based on position and nesting (so like, AND(x=y, z=2, OR(a IN (...), b IN (...))) would end up with x=y as id 1, z=2 as id 2, a IN (...) as id 3.1 (first nested under the 3rd predicate) and b IN (...) as id 3.2?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a good point about huge filter strings. It is super useful to have the breakdown of filter partitioning, but maybe I can think of a way to have safeguards against this case, either adding a new method to filters instead of directly relying on toString to has these safeguards, or make it lazy so that it only happens if debug logging is enabled, or something else.

includeUnknown,
allowPartialIndex
);
if (subBundle.getIndex() != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the code for generating the Cursor you were checking if the FilterBundle was null, here you are not. Are you missing a check here or are you needlessly checking there?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, Filter.makeFilterBundle cannot return null, it must return a bundle that has at least an index or a matcher. The cursor case is for when there is no filter on the cursor, but as mentioned in other comment will try to clean that up a bit

final long bitmapConstructionStartNs = System.nanoTime();
for (Filter subfilter : filters) {
boolean needMatcher = true;
final BitmapColumnIndex index = subfilter.getBitmapColumnIndex(columnIndexSelector);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fwiw, it should be possible to convert a bitmap index into a matcher by testing the id of the row against the bitmap. It should still be better than loading the value and grabbing the matcher and you can combine it together with a matcher to get the correct semantics inside of the OR here for the times when you need to do it all via a matcher

Comment on lines 80 to 87
public <T> FilterBundle makeFilterBundle(
ColumnIndexSelector columnIndexSelector,
BitmapResultFactory<T> bitmapResultFactory,
int selectionRowCount,
int totalRowCount,
boolean includeUnknown,
boolean allowPartialIndex
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the logic in this method a little bit hard to follow. I think it would help to have a bit of a pseudo-code explanation of how the method is trying to do its job.

The pseudo-code that I can imagine for this method (but which I'm pretty sure is currently not implemented by this method) looks like

  1. Compute all sub-filter bundles, store them all independently in a list
  2. If any of the sub-filter bundles requires a matcher, then convert all bitmap indexes into bitmap index matchers that compare rowIds against the bitmap index and return a matcher-only bundle.
  3. If all sub-filters are indexes, then OR them together and return an index-only bundle

This means that an OR filter will never produce a "partial bundle". It will either produce an index or a matcher bundle, but not both.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is more or less doing the pseudo-code you listed here, will try to update the comments to clear things up

Comment on lines +289 to +292
if (r == null) {
// all or nothing
return null;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the java doc for BitmapColumnIndex.computeBitmapResult() doesn't include a definition of what it means to return null. If a bitmap index cannot be computed, I would've expected index itself to be null, so either that is never null OR null means that it's an empty bitmap. If null means the empty bitmap, then this if statement is wrong (you just skip the clause). If it is never null, then the if statement is unnecessary and potentially causes confusion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2 argument version is not expected to be null, however the 4 argument version may return null and the javadocs reflect that https://github.com/apache/druid/pull/15838/files#diff-0524c5910c725c7074f7383e79ced3937827cdaae4170916f7f2ff017b21669dR56, which is how this new stuff can work.

If you remember, building the BitmapColumnIndex is not doing the actual work, rather it returns a thing that you call the 'compute' methods on to do the actual work to build the bitmaps. If BitmapColumnIndex is null then we cannot possibly use an index, while if not null, the 2 argument version will always build the index. The 4 argument version is what allows this new functionality to let the BitmapColumnIndex decide if it would have to do too much work to compute the index based on the selected row count so far.

This is a lot cleaner with the way things currently work, otherwise the all of the 'semantic' indexes classes returned by the .as of the index supplier (ArrayElementIndexes, DruidPredicateIndexes, LexicographicalRangeIndexes, NullValueIndex, NumericRangeIndexes, StringValueSetIndexes, Utf8ValueSetIndexes, ValueIndexes) would all need to have their methods that build BitmapColumnIndex updated to also take the selected and total row count parameters when building the BitmapColumnIndex. This was a much bigger surface area than just updating BitmapColumnIndex itself, since it is the common interface that all of these implementations spit out. I'm very much in favor of doing it this way because it is both significantly cleaner and a lot simpler to implement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need the 2-argument method? :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, the search query is still calling the 2 argument version and computing the indexes directly from the filter instead of going through the bundle (also a bunch of benchmarks are calling it), so for now I was thinking about leaving it for this PR at least. I'll need to switch search query over to using the bundles anyway though, since its the real replacement for the 'auto' strategy that we removed, so I can look into removing the 2 argument version at that point.

@@ -29,7 +29,7 @@
public abstract class SimpleImmutableBitmapDelegatingIterableIndex extends SimpleBitmapColumnIndex
{
@Override
public <T> T computeBitmapResult(BitmapResultFactory<T> bitmapResultFactory, boolean includeUnknown)
public final <T> T computeBitmapResult(BitmapResultFactory<T> bitmapResultFactory, boolean includeUnknown)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add the final? What are you protecting from?

Copy link
Contributor

@abhishekagarwal87 abhishekagarwal87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done a partial review.


@Nullable
@Override
public final <T> T computeBitmapResult(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be good to describe what "selectionRowCount and totalRowCount mean here in this method

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are described in the javadoc of base interface, BitmapColumnIndex

Comment on lines +38 to +39
private final int dictionarySize;
private final double scaleThreshold;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: but you could keep just one double number here - dictionarySize * scaleThreshold (dictionaryThreshold)

@@ -78,7 +78,7 @@ public BitmapColumnIndex forPredicate(DruidPredicateFactory matcherFactory)
final java.util.function.Function<Object, ExprEval<?>> evalFunction =
inputValue -> expr.eval(InputBindings.forInputSupplier(inputColumn, inputType, () -> inputValue));

return new SimpleImmutableBitmapIterableIndex()
return new DictionaryScanningBitmapIndex(inputColumnIndexes.getCardinality())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you revert to old behaviour?

Copy link
Member Author

@clintropolis clintropolis Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the old behavior is no longer available. the approach for predicates is pretty conservative in this PR, we skip indexes when the selection count is smaller than the dictionary size, so it shouldn't really be possible to be worse than the previous behavior - predicate indexes must test the predicate against all of the dictionary values and then union all of the matching bitmaps together, so if the selection row count is smaller than the dictionary size it is fewer number of predicate evaluations and no bitmap operations.

I suspect that the selection size could be a bit smaller than dictionary size and come out ahead, but have not measured in this PR, we probably want to consider adding a config at that point to allow reverting to the behavior of this PR during tuning, since it is probably dependent on how expensive the predicate is what the break even point is.

@JsonCreator
public DruidProcessingIndexesConfig(
@JsonProperty("skipValueRangeIndexScale") @Nullable Double skipValueRangeIndexScale,
@JsonProperty("skipValuePredicateIndexScale") @Nullable Double skipValuePredicateIndexScale
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what will be the impact on clusters setting this value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this parameter is for an easy way myself and other druid devs to experiment with stuff in case I didn't get the default scale value in the best spot; i don't really intend on cluster operators setting this value or for it to be documented. it controls the ratio of number of rows to be scanned vs computing indexes so effectively allowing adjusting the tuning performance of filter operations without code changes, but ideally it will eventually be removed

Comment on lines 38 to 48
* Itsa me, your filter bundle. This is a container for all the goodies used for producing filtered cursors,
* a {@link ImmutableBitmap} if the filter can use an index, and a {@link MatcherBundle} which contains functions to
* build {@link ValueMatcher} and {@link VectorValueMatcher} for any filters which must be evaluated row by row during
* the cursor scan. Cursors will use everything that is non-null, and at least one of index or matcher bundle MUST be
* set.
* <p>
* There are a few cases where the filter should set both indexes and matchers. For example, if the filter is a
* composite filter which can be partitioned, such as {@link org.apache.druid.segment.filter.AndFilter}, then the filter
* can be partitioned due to the intersection nature of AND, so the index can be set to reduce the number of rows and
* the matcher bundle will build a matcher which will filter the remainder. This can also be set in the case the index
* is an inexact match, and a matcher must be used to ensure that the remaining values actually match the filter.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Itsa me, your filter bundle. This is a container for all the goodies used for producing filtered cursors,
* a {@link ImmutableBitmap} if the filter can use an index, and a {@link MatcherBundle} which contains functions to
* build {@link ValueMatcher} and {@link VectorValueMatcher} for any filters which must be evaluated row by row during
* the cursor scan. Cursors will use everything that is non-null, and at least one of index or matcher bundle MUST be
* set.
* <p>
* There are a few cases where the filter should set both indexes and matchers. For example, if the filter is a
* composite filter which can be partitioned, such as {@link org.apache.druid.segment.filter.AndFilter}, then the filter
* can be partitioned due to the intersection nature of AND, so the index can be set to reduce the number of rows and
* the matcher bundle will build a matcher which will filter the remainder. This can also be set in the case the index
* is an inexact match, and a matcher must be used to ensure that the remaining values actually match the filter.
* It's me, your filter bundle. This is a container for all the goodies used for producing filtered cursors,
* a {@link ImmutableBitmap} if the filter can use an index, and a {@link MatcherBundle} which contains functions to
* build {@link ValueMatcher} and {@link VectorValueMatcher} for any filters which must be evaluated row by row during
* the cursor scan. Cursors will use everything that is non-null, and at least one of the index or the matcher bundle MUST be
* set.
* <p>
* There are a few cases where the filter should set both indexes and matchers. For example, if the filter is a
* composite filter which can be partitioned, such as {@link org.apache.druid.segment.filter.AndFilter}, then the filter
* can be partitioned due to the intersection nature of AND, so the index can be set to reduce the number of rows and
* the matcher bundle will build a matcher which will filter the remainder. This can also be set in the case the index
* is an inexact match, and a matcher must be used to ensure that the remaining values actually match the filter.

);
}

public interface IndexBundle
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see why FilterBundle is FilterBundle but IndexBundle doesn't seem like a container.

|| filter.canVectorizeMatcher(this);
// ideally we would allow stuff to vectorize if we can build indexes even if the value matcher cannot be
// vectorized, this used to be true in fact, but changes to filter partitioning (FilterBundle) have caused
// the only way to know this to be building the bitmaps since BitmapColumnIndex can return null.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// the only way to know this to be building the bitmaps since BitmapColumnIndex can return null.
// the only way to know this to be building the bitmaps since BitmapColumnIndex can return null with adaptive filter partitioning.

nit

Comment on lines 49 to 55
default <T> FilterBundle makeFilterBundle(
ColumnIndexSelector columnIndexSelector,
BitmapResultFactory<T> bitmapResultFactory,
int selectionRowCount,
int totalRowCount,
boolean includeUnknown,
boolean allowPartialIndex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some javadocs about these params would be helpful.

allMatcherBundles.addAll(matcherOnlyBundles);

return new FilterBundle(
null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case matcher bundles are present, can we still use the bitmaps? Specifically, instead of the indexBundle being null here, we can use the unioned bitmap index. It will weed out the definite negatives (0's) when filtering, and the caller can use the matcher bundle to test the 1's

Copy link
Contributor

@LakshSingla LakshSingla Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that it will run into the same problem that the PR is trying to eliminate. But if that's the case, then why not convert the index-only bundle into a value matcher bundle as well, instead of unioning those?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't work for an OR filter because a FilterBundle with both an index and matcher is basically an AND operation between the index and matcher, like if the bitmap has the row set the value matcher must also match the actual value at the row. The things that can produce bundles with both indexes and matchers are actual AND filters, and also filters that have ColumnIndexCapabilities.isExact() with a false value (none exist today, but planning for the future). I tried to explain this in the javadocs for FilterBundle, lmk if it isn't clear enough.

Anyway, because a bundle with both is an AND, an OR filter can only ever produce index only bundles or matcher only bundles, which is why we convert the child 'index only' bundles and 'index and matcher' bundles into matcher bundles with convertIndexToMatcherBundle and convertBundleToMatcherOnlyBundle respectively that use the indexes for cheaper matchers based on just checking the bitmap for a set bit instead of running the matcher on the row value.

The reason to union the index only bundles together is my gut says that having a single matcher bitmap to iterate would be a bit cheaper when doing the row scan than having a bunch of separate matchers with their own bitmaps to iterate. We can't combine the 'index and matcher' bundles into this because they need to check their own index and matcher before being OR'd with the rest of the matchers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation!

Copy link
Contributor

@imply-cheddar imply-cheddar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few nits and commentary, but approving anyway as I don't know that any of them are worthy of blocking the PR.

@@ -143,7 +143,7 @@ DimensionMergerV9 makeMerger(
*/
Comparator<ColumnValueSelector> getEncodedValueSelectorComparator();

/**
/**q
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: q?

:trollface: qq

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol oops, i suspect this is an artifact of stupid keyboard lag when switching 'spaces' to the one with my terminal on mac

Comment on lines +115 to +116
// filterBundle will only be null if the filter itself is null, otherwise check to see if the filter
// can use an index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the point is to know if the filter was set or not, can we not change it to explicitly check if the filter was set instead of asking for a bundle to be made and believing that null from that means that the filter didn't exist?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, i guess i did it this way the nullability check doesn't associate filter bundle not being null with filter not being null which makes it kind of ugly Screenshot 2024-02-28 at 12 52 56 AM

// an empty index means the matcher can be skipped
emptyCount++;
} else {
if (!bundle.hasMatcher()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really have to invert this? :P

Comment on lines 159 to 163
new FilterBundle.SimpleIndexBundle(
indexOnlyBundles.get(0).getIndexInfo(),
index,
indexOnlyBundles.get(0).getIndexCapabilities()
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this just indexOnlyBundles.get(0)? In this case, the index had better be the same thing as the index in the index-only bundle right?

I'm not sure it really matters, just found myself wondering if there's a reason that this code needs to build a new object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, this should be just re-using the same indexBundle, i think maybe it is stale from before i reworked some stuff and i forgot to clean it up

() -> "OR",
selectionRowCount,
totalBitmapConstructTimeNs,
indexOnlyBundles.stream().map(FilterBundle.IndexBundle::getIndexInfo).collect(Collectors.toList())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how often this is going to be called, but I've found that the performance of the java streaming API is pretty abysmal. If this was forced into a lazy evaluation that only matter when we are doing debug logging, I wouldn't care, but it looks like it's forced CPU and usage here...

Is there a way to shove this into the lambda for the toString instead (so that this isn't done unless it's actually necessary?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should only be called once, but I can make it an iterable on the bundles so that can make this lazy the same way the string is

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants