Tracking issue: push down computation in distributed query #1108
Labels
A-query-engine: Area: Query engine
feature: New feature or request
tracking issue: Issue tracks progress for something
Describe This Problem
Currently, we support only a rough form of distributed SQL query by hooking in at the table scan level, which means that actual computation such as aggregation can't be pushed down...
So, I plan to refactor it and support distributed query at the plan level, so that more things can be pushed down.
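To make the motivation concrete, here is a minimal Rust sketch of the difference between the two approaches. It is not code from this repository: the `PartialSum` type and both functions are hypothetical, and partitions are modeled as plain in-memory vectors instead of remote nodes. It only shows why pushing an aggregation below the scan reduces the amount of data sent back to the coordinator.

```rust
/// Partial aggregate state that a partition can compute locally (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct PartialSum {
    sum: i64,
    count: u64,
}

/// Scan-level distribution: every partition ships all of its rows to the
/// coordinator, which aggregates them itself.
/// Returns the aggregate and the number of items transferred.
fn sum_with_scan_only(partitions: &[Vec<i64>]) -> (PartialSum, usize) {
    let rows: Vec<i64> = partitions.iter().flatten().copied().collect();
    let transferred = rows.len();
    let state = PartialSum {
        sum: rows.iter().sum(),
        count: rows.len() as u64,
    };
    (state, transferred)
}

/// Plan-level distribution: each partition aggregates locally and ships a
/// single partial state; the coordinator only merges the partial states.
fn sum_with_pushdown(partitions: &[Vec<i64>]) -> (PartialSum, usize) {
    let partials: Vec<PartialSum> = partitions
        .iter()
        .map(|rows| PartialSum {
            sum: rows.iter().sum(),
            count: rows.len() as u64,
        })
        .collect();
    let transferred = partials.len();
    let merged = partials
        .iter()
        .fold(PartialSum { sum: 0, count: 0 }, |acc, p| PartialSum {
            sum: acc.sum + p.sum,
            count: acc.count + p.count,
        });
    (merged, transferred)
}

fn main() {
    let partitions = vec![vec![1, 2, 3], vec![10, 20], vec![100]];
    let (scan_only, moved_rows) = sum_with_scan_only(&partitions);
    let (pushed_down, moved_states) = sum_with_pushdown(&partitions);
    // Both strategies must produce the same aggregate.
    assert_eq!(scan_only, pushed_down);
    println!("result = {:?}", pushed_down);
    println!("scan only moved {} rows, pushdown moved {} partial states", moved_rows, moved_states);
}
```

In this toy model the pushdown variant transfers one partial state per partition, regardless of how many rows each partition holds, while the scan-only variant transfers every row.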
Proposal
1. Background
The existing implementations can be divided into two approaches:
As I see it, they are almost the same. The clearer way is to have an explicit distributed logical plan, but that is mostly a question of code organization.
The real question is whether we should depend on DataFusion to do this. If we do it ourselves, it may be more controllable, but we may need to design the complete physical plan generation process.
I think we should try to reuse the logic in DataFusion first.
2. General
The work can be broken down as follows:
RemoteEngine.
3. Two roles of nodes in the proposal
My proposal is designed as follows:
4. Process
The TableScan node is just a placeholder (it cannot actually be executed) carrying the information needed to generate the executable plan later, so I name it UnresolvePartitionedScan.
... UnresolvePartitionedScan). The sub-plans, like UnresolvePartitionedScan, cannot be executed before being sent to the executor nodes and rewritten there, so I name them UnresolveSubScans.
... UnresolveSubScan is converted to ResolvePartitionedScan now.
... UnresolveSubScan to ResolveSubScan using the carried information and the local catalog.
Additional Context
No response