Support remote scan in physical plan level #1112

Closed
3 of 4 tasks
Rachelint opened this issue Jul 28, 2023 · 0 comments · Fixed by #1204
Labels
A-query-engine Area: Query engine feature New feature or request

Rachelint commented Jul 28, 2023

Describe This Problem

As the first step of #1108, I would like to refactor our implementation of remote scanning. The refactor may cover the following areas:

  • Make RemoteEngine able to execute physical plans on remote nodes.
  • Identify the physical plan nodes that should run remotely and rewrite them into remote-executing ones (as a first step, only the table scan is rewritten).

Proposal

  • Implement UnresolvedPartitionedScan, ResolvedPartitionedScan, UnresolvedSubTableScan, ResolvedSubTableScan.
  • Generate UnresolvedPartitionedScan when scanning a partitioned table.
    • Define a dedicated TableProviderAdapter for PartitionedTable.
    • Decide which TableProviderAdapter to generate, depending on whether PartitionInfo exists.
  • Define a Resolver to convert UnresolvedPartitionedScan to ResolvedPartitionedScan.
  • Send the UnresolvedSubTableScan to the remote node and use the SubTableScanExecutor to run it.
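
The resolving step in this proposal can be sketched roughly as follows. This is a simplified illustration only: the `Plan` enum, its fields, and the `route` closure are hypothetical stand-ins, whereas the real nodes would presumably be DataFusion physical plan nodes and routing would go through the cluster metadata.

```rust
/// Hypothetical, simplified plan nodes (not the real CeresDB types).
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    /// Scan of a partitioned table; sub tables are known only by name.
    UnresolvedPartitionedScan { sub_tables: Vec<String>, predicate: String },
    /// Scan of one sub table; this is the piece that is sent to a remote node.
    UnresolvedSubTableScan { table: String, predicate: String },
    /// Partitioned scan whose children are bound to remote endpoints.
    ResolvedPartitionedScan { remote_scans: Vec<(String, Plan)> },
}

/// Binds each sub table of a partitioned scan to an endpoint.
pub struct Resolver<F: Fn(&str) -> String> {
    /// Assumed routing function: sub table name -> endpoint.
    route: F,
}

impl<F: Fn(&str) -> String> Resolver<F> {
    pub fn new(route: F) -> Self {
        Resolver { route }
    }

    /// Rewrites an unresolved partitioned scan; other nodes pass through unchanged.
    pub fn resolve(&self, plan: Plan) -> Plan {
        match plan {
            Plan::UnresolvedPartitionedScan { sub_tables, predicate } => {
                let remote_scans = sub_tables
                    .into_iter()
                    .map(|table| {
                        let endpoint = (self.route)(&table);
                        let scan = Plan::UnresolvedSubTableScan {
                            table,
                            predicate: predicate.clone(),
                        };
                        (endpoint, scan)
                    })
                    .collect();
                Plan::ResolvedPartitionedScan { remote_scans }
            }
            other => other,
        }
    }
}
```

In this sketch the client keeps the `ResolvedPartitionedScan` and ships each `UnresolvedSubTableScan` to its endpoint, where something like the proposed SubTableScanExecutor would run it.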

Additional Context

  • Some possible optimizations:

    • Can we generate ResolvedPartitionedScan directly rather than generating UnresolvedPartitionedScan?
    • Is there a more elegant way (a physical optimization rule?) to resolve the UnresolvedPartitionedScan?
  • Communication between the client side and the server side of remote plan execution:
    [Figure: remote plan]


@Rachelint Rachelint added the feature New feature or request label Jul 28, 2023
@jiacai2050 jiacai2050 added A-analytic-engine Area: Analytic Engine A-query-engine Area: Query engine and removed A-analytic-engine Area: Analytic Engine labels Aug 2, 2023
ShiKaiWi pushed a commit that referenced this issue Aug 7, 2023
## Rationale
Close #1136 
Part of #1112 

The query engine implementation is currently too messy; see #1136. Worse, it can't
support remote execution of physical plans, which is necessary for #1112.

This PR refactors it to solve the problems mentioned above.

## Detailed Changes
+ Split the trait definitions and implementations of `physical plan` and
`physical planner`.
+ Split the creation and execution of `physical plan`.
+ Modify the call path to make it runnable again.
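
A minimal sketch of what separating plan creation from plan execution could look like. The trait names mirror the wording above, but the signatures, `RangePlan`, and `RangePlanner` are hypothetical; the real traits are presumably async and return record batch streams rather than `Vec<i64>`.

```rust
/// A plan that can be run, possibly several times, after it has been built.
pub trait PhysicalPlan {
    fn execute(&self) -> Vec<i64>;
}

/// Builds plans but never runs them.
pub trait PhysicalPlanner {
    fn plan(&self, query: &str) -> Result<Box<dyn PhysicalPlan>, String>;
}

/// Toy plan: emits the integers `0..end`.
pub struct RangePlan {
    end: i64,
}

impl PhysicalPlan for RangePlan {
    fn execute(&self) -> Vec<i64> {
        (0..self.end).collect()
    }
}

/// Toy planner: understands only queries of the form `range <n>`.
pub struct RangePlanner;

impl PhysicalPlanner for RangePlanner {
    fn plan(&self, query: &str) -> Result<Box<dyn PhysicalPlan>, String> {
        let n = query
            .strip_prefix("range ")
            .ok_or_else(|| format!("unsupported query: {}", query))?;
        let end = n.trim().parse::<i64>().map_err(|e| e.to_string())?;
        Ok(Box::new(RangePlan { end }))
    }
}
```

Because the planner only returns a boxed plan, the caller decides where and when `execute` runs, which is the property remote execution needs.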

## Test Plan
Tested by existing tests.

---------

Co-authored-by: tanruixiang <tanruixiang0104@gmail.com>
Rachelint added a commit that referenced this issue Aug 17, 2023
…tioned table (#1148)

## Rationale
Part of #1112 

## Detailed Changes
+ Define a dedicated table provider for partitioned tables.
+ Generate `UnresolvedPartitionedScan` for partitioned tables rather than
`ScanTable`.
+ Disable generation of `UnresolvedPartitionedScan` during normal operation,
and enable it only in unit tests for now.
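
As an illustration of the choice between the two scan nodes, here is a small sketch keyed on whether partition info is present. `Table`, `PartitionInfo`, `ScanNode`, and `plan_scan` are hypothetical names for this example and do not reflect the actual table provider code.

```rust
/// Hypothetical partition metadata: just the names of the sub tables.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionInfo {
    pub sub_table_names: Vec<String>,
}

/// Hypothetical table handle; `partition_info` is `None` for normal tables.
pub struct Table {
    pub name: String,
    pub partition_info: Option<PartitionInfo>,
}

/// The two scan nodes a table provider could emit in this sketch.
#[derive(Debug, PartialEq)]
pub enum ScanNode {
    ScanTable { table: String },
    UnresolvedPartitionedScan { table: String, sub_tables: Vec<String> },
}

/// Picks the scan node based on whether the table is partitioned.
pub fn plan_scan(table: &Table) -> ScanNode {
    match &table.partition_info {
        Some(info) => ScanNode::UnresolvedPartitionedScan {
            table: table.name.clone(),
            sub_tables: info.sub_table_names.clone(),
        },
        None => ScanNode::ScanTable { table: table.name.clone() },
    }
}
```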

## Test Plan
Tested by new unit tests.
Rachelint added a commit that referenced this issue Aug 21, 2023
…an holding it (#1163)

## Rationale
Part of #1112 
I found that we hold the execution context in
`DataFusionPhysicalPlanImpl` rather than passing it as a parameter at
execution time, as DataFusion suggests.

This leads to problems when I try to deserialize bytes into
`DataFusionPhysicalPlanImpl`...

## Detailed Changes
+ Remove `SessionContext` from `DataFusionPhysicalPlanImpl`.
+ Pass it as a parameter when executing the plan.
+ Wrap the common parts in `DfContextBuilder` and use it to build
`SessionContext` in `DatafusionExecutorImpl` and
`DatafusionPhysicalPlannerImpl`.
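
The idea of keeping the plan free of its execution context can be sketched as below. `ExecContext` and `ScanPlan` are hypothetical stand-ins for `SessionContext` and `DataFusionPhysicalPlanImpl`; the point is only that a plan holding plain data is easy to rebuild from bytes, while the context is supplied by whoever executes it.

```rust
/// Hypothetical stand-in for the session/execution context.
pub struct ExecContext {
    pub batch_size: usize,
}

/// Hypothetical plan that holds only data, no context.
pub struct ScanPlan {
    pub rows: Vec<i64>,
}

impl ScanPlan {
    /// The context is borrowed per call instead of being stored in the plan.
    pub fn execute(&self, ctx: &ExecContext) -> Vec<Vec<i64>> {
        // Guard against a zero batch size, which `chunks` would reject.
        let size = ctx.batch_size.max(1);
        self.rows.chunks(size).map(|c| c.to_vec()).collect()
    }
}
```

The same `ScanPlan` value can then be executed under different contexts, for example once locally and once on a remote node that builds its own context.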

## Test Plan
Tested by existing tests.
Rachelint added a commit that referenced this issue Aug 25, 2023
…xecutable(resolving process) (#1161)

## Rationale
Part of #1112 
CeresDB has been able to generate the dedicated physical plan for partitioned
tables since #1148, but that plan is not executable.
This PR introduces the `resolver` to make it executable.

## Detailed Changes
+ Add `Resolver` to convert the `UnresolvedPartitionedScan` to the final
executable scan plans of the sub tables.
+ Add tests for the resolving process.

## Test Plan
Tested by new unit tests.
Rachelint added a commit that referenced this issue Aug 29, 2023
## Rationale
Part of #1112 
If we want to send a plan to a remote node, we must encode it first.

## Detailed Changes
+ Implement the extension codec for the physical plans related to dist_sql_query.
+ Remove the unused `PhysicalPlanCodec` and its implementation.
+ Refactor the pb conversion for `ReadRequest`.
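
To illustrate the encode/decode round trip a plan has to survive before it can be sent, here is a hand-rolled sketch. It deliberately does not use protobuf or any DataFusion codec trait; `SubTableScan`, its fields, and the length-prefixed byte layout are assumptions made for this example only.

```rust
/// Hypothetical plan node to ship: a table name plus projected column indexes.
#[derive(Debug, Clone, PartialEq)]
pub struct SubTableScan {
    pub table: String,
    pub projection: Vec<u32>,
}

/// Layout: u32 name length, name bytes, u32 column count, then each column (all little endian).
pub fn encode(scan: &SubTableScan) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(&(scan.table.len() as u32).to_le_bytes());
    buf.extend_from_slice(scan.table.as_bytes());
    buf.extend_from_slice(&(scan.projection.len() as u32).to_le_bytes());
    for col in &scan.projection {
        buf.extend_from_slice(&col.to_le_bytes());
    }
    buf
}

/// Reads one little-endian u32 at `pos` and advances `pos` past it.
fn read_u32(buf: &[u8], pos: &mut usize) -> Result<u32, String> {
    let end = pos.checked_add(4).ok_or("offset overflow")?;
    let bytes = buf.get(*pos..end).ok_or("truncated input")?;
    let mut arr = [0u8; 4];
    arr.copy_from_slice(bytes);
    *pos = end;
    Ok(u32::from_le_bytes(arr))
}

/// Inverse of `encode`; returns an error instead of panicking on bad input.
pub fn decode(buf: &[u8]) -> Result<SubTableScan, String> {
    let mut pos = 0;
    let name_len = read_u32(buf, &mut pos)? as usize;
    let name_end = pos.checked_add(name_len).ok_or("offset overflow")?;
    let name = buf.get(pos..name_end).ok_or("truncated table name")?;
    let table = String::from_utf8(name.to_vec()).map_err(|e| e.to_string())?;
    pos = name_end;
    let count = read_u32(buf, &mut pos)? as usize;
    let mut projection = Vec::new();
    for _ in 0..count {
        projection.push(read_u32(buf, &mut pos)?);
    }
    Ok(SubTableScan { table, projection })
}
```

A unit test for such a codec would typically assert that `decode(&encode(&scan))` returns the original `scan` and that truncated input yields an error.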

## Test Plan
Tested by new unit tests.
ShiKaiWi pushed a commit that referenced this issue Sep 8, 2023
## Rationale
Part of #1112 
The old resolver has some problems and needs to be refactored before it is
added to the main query process.

## Detailed Changes
+ Refactor `Resolver` in dist sql query.

## Test Plan
Tested by richer unit tests.
Rachelint added a commit that referenced this issue Sep 27, 2023
## Rationale
Closes #1112 
This is the final part of #1112; after this PR, distributed queries can be
run in the new way.

## Detailed Changes
+ Implement the RPC service to support remote execution of physical plans.
+ Refactor the query engine to support the new query process.

## Test Plan
Tested by existing tests.