
Query rewrite with secondary index and materialized view #1435

Closed
Tracked by #3
dai-chen opened this issue Mar 13, 2023 · 3 comments
Comments

@dai-chen
Collaborator

dai-chen commented Mar 13, 2023

Is your feature request related to a problem?

In #1379 and #1407, we're able to build a secondary index and a materialized view on a Maximus table. However, some work remains for both forms of query acceleration:

  1. For a secondary index, streaming queries won't be rewritten and accelerated
  2. For a materialized view, the user has to query the materialized view explicitly instead of the original base table

What solution would you like?

Rewrite the user's query against an available secondary index or materialized view on the Maximus table to accelerate its execution.
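To make the idea concrete, here is a minimal, purely illustrative sketch of such a rewrite rule: if a materialized view covers every column a query references on the base table, redirect the scan to the MV. All names (`QueryPlan`, `MaterializedView`, `rewrite_with_mv`) are hypothetical and not part of the actual Spark extension.

```python
from dataclasses import dataclass

@dataclass
class MaterializedView:
    name: str              # hypothetical: OpenSearch index backing the MV
    base_table: str        # source table the MV was built from
    columns: frozenset     # columns the MV materializes

@dataclass
class QueryPlan:
    table: str             # table the query scans
    columns: frozenset     # columns the query references

def rewrite_with_mv(plan, views):
    """Redirect the scan to an MV when it covers all referenced columns."""
    for mv in views:
        if mv.base_table == plan.table and plan.columns <= mv.columns:
            return QueryPlan(table=mv.name, columns=plan.columns)
    return plan  # no covering MV: fall back to the base table

# Example: a query on the base table is transparently redirected to the MV.
mv = MaterializedView("mv_http_logs", "http_logs", frozenset({"status", "ts"}))
plan = rewrite_with_mv(QueryPlan("http_logs", frozenset({"status"})), [mv])
```

In a real implementation this would be a Catalyst optimizer rule matching logical plans rather than a table-name substitution, but the covering check is the same idea.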

What alternatives have you considered?

Treat query rewrite as a known limitation and keep it out of scope for now.

Do you have any additional context?

N/A

@dai-chen dai-chen self-assigned this Mar 13, 2023
@dai-chen dai-chen changed the title Query rewrite with materialized view Query rewrite with secondary index and materialized view Mar 14, 2023
@muralikpbhat

Nice. Will this be capable of acceleration (across fields) in the case of a covering index with selective fields?

Say we index only field 'a' into OpenSearch and the query is something like 'avg(b) where a startswith "blah"'. OpenSearch will answer the prefix query on 'blah*' since it is indexed, but are we planning to use those results in the Spark scan for b while aggregating?
Also, given that the results for blah* can come from multiple shards, how are we planning to parallelize the Spark scan? I am assuming Spark treats OS as a whole and does not integrate at the shard level. It seems like the results from blah* would have to be sent to each Spark partition, which would then have to filter for those doc IDs...? Will this really be faster than the scenario where both a and b are processed by Spark?

@dai-chen
Collaborator Author

@muralikpbhat Thanks for the comment!

I think your example is more about a fine-grained filtering index. Actually we're focused on the following index data structures:

  1. Fine-grained covering index or MV: the index or MV can answer the query by itself without needing to look at the source data again. In other words, once it's loaded into OpenSearch, the OS index can work alone for search or visualization.
  2. Coarse-grained skipping index: after filtering by a startswith "blah", rather than telling us which source rows match, the skipping index tells us which source file(s) may contain the answer. Given that file list, Spark does the rest (planning the job, fetching data, aggregating, joining, and then loading into OS if the query comes from an MV)
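The skipping-index behavior in point 2 can be sketched as follows. This is a toy illustration, not the real Flint implementation: each source file keeps a small per-file summary (here, min/max of column 'a'), and the index conservatively returns the files that *may* contain matches, so Spark scans only those.

```python
def candidate_files(file_stats, prefix):
    """Return files whose [min, max] range of column 'a' could contain
    values starting with `prefix`. Conservative: false positives are
    allowed (the file is scanned anyway), false negatives are not."""
    hi = prefix + "\uffff"  # upper bound of the prefix range
    return [f for f, (lo, up) in file_stats.items()
            if not (up < prefix or lo > hi)]

# Hypothetical per-file min/max summaries for column 'a'.
stats = {
    "part-0001": ("aaa", "bzz"),   # range overlaps 'blah*' -> scanned
    "part-0002": ("cat", "dog"),   # cannot contain 'blah*' -> skipped
    "part-0003": ("bl",  "bm"),    # range overlaps 'blah*' -> scanned
}
files = candidate_files(stats, "blah")
```

Note the asymmetry with a covering index: the skipping index never answers the query itself, it only prunes the file list that Spark then scans, filters, and aggregates.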

@dai-chen
Collaborator Author

dai-chen commented Mar 28, 2023

I'm closing this because query rewrite for MV has been deprioritized.

The main reason is the performance overhead introduced by the current logical integration between OpenSearch and Spark. Thinking of an MV as a managed Maximus table and rewriting queries in Spark makes sense. However, the cost of transferring data between Spark and OpenSearch may be high. Meanwhile, there are limitations in Spark SQL support for OpenSearch Dashboards that would require an extension.

So for the current initial phase, we'd like to make the MV data a regular OpenSearch index so that OpenSearch can access it directly.
