Description
Is your feature request related to a problem?
One of the key technical challenges in #719 is maintaining consistency between the base table (S3 data) and the derived table (OpenSearch index/materialized view).
What solution would you like?
One solution is to refresh new data from S3 into OpenSearch incrementally. We propose enhancing our query engine by unifying batch processing and stream processing in a single architecture, as Apache Flink and Apache Spark already do. In particular, the enhancement includes changes to query planning, the query execution engine, and the query plan itself.
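To illustrate what unifying batch and stream processing means, here is a minimal sketch, assuming a stateless plan (a filter plus a projection) over hypothetical `(status, bytes)` log rows. `Row`, `PLAN`, `runBatch`, and `runMicroBatches` are illustrative names, not the engine's API. The same plan runs once over the whole input in batch mode, or repeatedly over each micro-batch in stream mode, and the appended stream output matches the batch output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class UnifiedPlanSketch {
    // Hypothetical input row: an HTTP status code and a response size.
    record Row(int status, long bytes) {}

    // The query plan, defined once: keep server errors, project the byte count.
    static final Function<List<Row>, List<Long>> PLAN = rows -> rows.stream()
            .filter(r -> r.status() >= 500)
            .map(Row::bytes)
            .collect(Collectors.toList());

    // Batch mode: run the plan once over the whole (bounded) input.
    static List<Long> runBatch(List<Row> all) {
        return PLAN.apply(all);
    }

    // Stream mode: run the same plan on each micro-batch and append to the sink.
    static List<Long> runMicroBatches(List<List<Row>> batches) {
        List<Long> sink = new ArrayList<>();
        for (List<Row> batch : batches) {
            sink.addAll(PLAN.apply(batch));
        }
        return sink;
    }

    public static void main(String[] args) {
        List<Row> b1 = List.of(new Row(200, 10), new Row(503, 20));
        List<Row> b2 = List.of(new Row(500, 30), new Row(404, 40));
        List<Row> all = new ArrayList<>(b1);
        all.addAll(b2);
        System.out.println(runBatch(all));                   // [20, 30]
        System.out.println(runMicroBatches(List.of(b1, b2))); // [20, 30]
    }
}
```

This equivalence holds directly only for stateless operators. Aggregations and windows must carry state across micro-batches, which is why the Phase 1 tasks include refactoring `AggregateOperator` and adding windowing and watermark support.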
PoC branch: https://github.com/opensearch-project/sql/tree/poc/maximus-m1. A detailed user manual and design doc will be published later, as planned below.
What alternatives have you considered?
The alternative is to rebuild the derived table (full refresh) on demand or on a regular schedule. This can be done with the current batch processing architecture; however, it introduces significant overhead for large S3 datasets.
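To make that overhead concrete, here is a back-of-the-envelope sketch, assuming a hypothetical workload in which a fixed number of new S3 files arrive between refreshes and refresh cost is proportional to the number of files read. With n new files per cycle, after K refreshes a full refresh has read n * K * (K + 1) / 2 files in total, while an incremental refresh has read only n * K. `RefreshCostSketch` and its methods are illustrative names only.

```java
public class RefreshCostSketch {
    // Full refresh: every refresh rescans all files accumulated so far.
    static long fullRefreshReads(int[] newFilesPerCycle) {
        long total = 0;
        long seen = 0;
        for (int n : newFilesPerCycle) {
            seen += n;
            total += seen;
        }
        return total;
    }

    // Incremental refresh: each refresh reads only the files that arrived
    // since the previous refresh.
    static long incrementalReads(int[] newFilesPerCycle) {
        long total = 0;
        for (int n : newFilesPerCycle) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] cycles = {100, 100, 100, 100}; // hypothetical: 100 new files per cycle
        System.out.println(fullRefreshReads(cycles)); // 1000
        System.out.println(incrementalReads(cycles)); // 400
    }
}
```

The gap widens with every cycle: full-refresh cost grows quadratically in the number of refreshes, incremental cost linearly.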
Do you have any additional context?
Phase 1
Goal:
- Ready for performance evaluation
- Ready for feature evaluation
- Missing
  - Failure recovery
  - Security
Tasks
- Infra Enhancement
- Add StreamPlan and MicroBatchExecution #968
- Add Stream Source #969
- Add Table Write Operator #1093
- Add Windowing Support #951
- Add Watermark Support #953
- Query Plan Enhancement for Stream Processing #954
- Deprecate span collector #990
- Refactor AggregateOperator to support stream processing
- Support Streaming Query in Query Language #955
- Add INSERT STREAM statement
- Add CREATE TABLE statement. https://github.com/penghuo/os-sql/tree/hp/test/maximus-m1
- Filesystem connector #972
- Performance Test for S3 Streaming Ingestion Queries #1151
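Windowing and watermarks (#951, #953) are what allow aggregations to run over unbounded input. The following is a minimal sketch of one common scheme, not the design in those PRs: event-time tumbling windows that count events, with the watermark defined as the maximum event time seen minus a fixed allowed delay. A window is emitted once the watermark passes its end, and events for an already-emitted window are dropped as late. `TumblingWindowSketch`, `onEvent`, and `allowedDelay` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TumblingWindowSketch {
    private final long windowSize;   // tumbling window length (seconds)
    private final long allowedDelay; // watermark = max event time seen - allowedDelay
    private long watermark = Long.MIN_VALUE;
    // Open windows keyed by start time (ordered), value = event count.
    private final TreeMap<Long, Long> open = new TreeMap<>();
    final List<String> emitted = new ArrayList<>();
    int dropped = 0;

    TumblingWindowSketch(long windowSize, long allowedDelay) {
        this.windowSize = windowSize;
        this.allowedDelay = allowedDelay;
    }

    void onEvent(long eventTime) {
        long start = eventTime - Math.floorMod(eventTime, windowSize);
        if (start + windowSize <= watermark) {
            dropped++; // late: this window was already finalized and emitted
            return;
        }
        open.merge(start, 1L, Long::sum);
        // The watermark only moves forward.
        watermark = Math.max(watermark, eventTime - allowedDelay);
        // Finalize every window that ends at or before the watermark.
        while (!open.isEmpty() && open.firstKey() + windowSize <= watermark) {
            Map.Entry<Long, Long> w = open.pollFirstEntry();
            emitted.add("[" + w.getKey() + "," + (w.getKey() + windowSize) + ")=" + w.getValue());
        }
    }

    public static void main(String[] args) {
        TumblingWindowSketch agg = new TumblingWindowSketch(10, 5);
        for (long t : new long[] {1, 4, 12, 7, 16, 3, 27}) {
            agg.onEvent(t);
        }
        System.out.println(agg.emitted); // [[0,10)=3, [10,20)=2]
        System.out.println(agg.dropped); // 1
    }
}
```

With a 10-second window and a 5-second delay, the out-of-order event at t=7 is still counted because the watermark is only at 7, but the event at t=3 arrives after the watermark has reached 11, so window [0,10) is already final and the event is dropped. Window [20,30) stays open until the watermark reaches 30.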
Phase 2
Goal:
- Ready for experimental release
- Missing
  - Pipeline Execution
  - Distributed Execution
Tasks
- Enhancement
  - Fault tolerance
  - Security
- Use-case-related features
  - Object/array support
  - Full-text search capability in streaming (match)
- Test
- Documentation
- User Interface
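Failure recovery is listed as missing from Phase 1 and fault tolerance as a Phase 2 task, but the issue does not describe a design. A common approach in micro-batch engines is to commit the source offset only after a batch has been written to an idempotent sink, so a batch interrupted by a crash is simply replayed on restart. The sketch below assumes that approach; `Checkpoint`, `Sink`, and `run` are hypothetical, and in-memory objects stand in for both the durable checkpoint and the OpenSearch index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CheckpointSketch {
    // Stands in for durable checkpoint storage that survives a restart.
    static final class Checkpoint {
        int committedOffset = 0; // number of source files fully processed
    }

    // Idempotent sink: rewriting a batch with the same id replaces it.
    static final class Sink {
        final Map<Integer, List<String>> batches = new HashMap<>();

        void write(int batchId, List<String> rows) {
            batches.put(batchId, rows);
        }

        int rowCount() {
            return batches.values().stream().mapToInt(List::size).sum();
        }
    }

    // Processes files in micro-batches starting from the committed offset.
    // crashAtOffset simulates a failure after the sink write but before the
    // commit (pass -1 for no crash).
    static void run(List<String> files, int batchSize, Checkpoint cp, Sink sink, int crashAtOffset) {
        while (cp.committedOffset < files.size()) {
            int start = cp.committedOffset;
            int end = Math.min(start + batchSize, files.size());
            // The batch id is the start offset, so a replay reuses the same id.
            sink.write(start, new ArrayList<>(files.subList(start, end)));
            if (start == crashAtOffset) {
                throw new IllegalStateException("simulated crash before commit");
            }
            cp.committedOffset = end; // commit only after the write succeeded
        }
    }

    public static void main(String[] args) {
        List<String> files = List.of("f1", "f2", "f3", "f4", "f5");
        Checkpoint cp = new Checkpoint();
        Sink sink = new Sink();
        try {
            run(files, 2, cp, sink, 2);
        } catch (IllegalStateException e) {
            System.out.println("crashed; committed offset = " + cp.committedOffset); // 2
        }
        run(files, 2, cp, sink, -1); // restart resumes from offset 2
        System.out.println(sink.rowCount()); // 5, not 7: the replayed batch replaced itself
    }
}
```

Without the idempotent sink, replaying the interrupted batch would duplicate its two rows; without committing after the write, a crash could skip the batch entirely.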
Phase 3
Goal:
- Ready for production deployment
Tasks
- Pipeline Execution
- Distributed Execution