This repository was archived by the owner on Jun 14, 2024. It is now read-only.
Enable Hybrid Scan by default #333
Open
Labels
enhancement (New feature or request), untriaged (This is the default tag for a newly created issue)
Description
Currently, Hybrid Scan is disabled by default.
Hybrid Scan pays off when the candidate index improves query performance enough to outweigh the Hybrid Scan overhead (on-the-fly shuffle of appended data, merging, excluding deleted data, and so on).
To enable Hybrid Scan by default, we need to add some safeguards and optimizations to avoid regressions caused by Hybrid Scan.
- Rank algorithm
  - compare common source data size in rank functions (Fix rank algorithm for Hybrid Scan #164, merged)
  - we could improve this by using appended / deleted data (todo)
- Dataset similarity threshold
  - add threshold configs for appended / deleted data (Add similarity thresholds for Hybrid Scan #300, merged)
    - allow 30% of appended data and 20% of deleted data by default
- Optimization of rule application (since Hybrid Scan can take longer to transform the plan than the non-Hybrid Scan case)
  - introduce IndexLogEntryTags to avoid duplicate calculation
    - InMemoryFileIndex cache (Add new IndexLogEntryTags to cache InMemoryFileIndex #324, pending)
    - getCandidateIndexes (Add new IndexLogEntryTag to avoid duplicate calculation in getCandidateIndexes #293, pending)
- Regression check
  - shuffle count check for Hybrid Scan (Check and remove unnecessary shuffle added by Hybrid Scan #331, pending)
- Performance optimization
  - bucketed scan for filter index (Support adaptive bucketed scan for FilterIndexRule #332, pending)
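To make the "Dataset similarity threshold" and "Rank algorithm" items above more concrete, here is a minimal Python sketch of how such checks could work. It is not the project's implementation: `SourceFile`, `common_bytes`, `within_thresholds`, and `rank_candidates` are hypothetical names, and the ratio definitions (appended ratio measured against the current dataset size, deleted ratio measured against the indexed dataset size) are an assumption. Only the 30% / 20% defaults come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceFile:
    """A source data file identified by path and size in bytes (hypothetical type)."""
    path: str
    size: int


def common_bytes(indexed, current):
    """Total size of the files present both at index time and in the current dataset."""
    return sum(f.size for f in set(indexed) & set(current))


def within_thresholds(indexed, current,
                      max_appended_ratio=0.3, max_deleted_ratio=0.2):
    """Return True if the index is still similar enough to the current dataset.

    Assumed definitions (not taken from the issue):
      appended ratio = 1 - common / current total size
      deleted ratio  = 1 - common / indexed total size
    """
    shared = common_bytes(indexed, current)
    if shared == 0:
        # Nothing in common, so the index cannot help this relation.
        return False
    indexed_total = sum(f.size for f in set(indexed))
    current_total = sum(f.size for f in set(current))
    appended_ratio = 1 - shared / current_total
    deleted_ratio = 1 - shared / indexed_total
    return (appended_ratio <= max_appended_ratio
            and deleted_ratio <= max_deleted_ratio)


def rank_candidates(candidates, current):
    """Keep the indexes that pass the thresholds, largest common data size first.

    `candidates` maps an index name to the list of files it was built from.
    """
    eligible = [name for name, files in candidates.items()
                if within_thresholds(files, current)]
    return sorted(eligible,
                  key=lambda name: common_bytes(candidates[name], current),
                  reverse=True)
```

With ten indexed files of 10 bytes each, deleting one file and appending a 30-byte file gives an appended ratio of 0.25 and a deleted ratio of about 0.1, so the index stays eligible. Appending 60 bytes instead pushes the appended ratio to 0.4 and the index is rejected.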