[multistage] Add Support for Values in Physical Optimizer #16221
ankitsultana merged 8 commits into apache:master
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

```diff
@@             Coverage Diff              @@
##             master   #16221      +/-   ##
============================================
+ Coverage     62.90%   63.25%   +0.35%
+ Complexity     1386     1362      -24
============================================
  Files          2867     2962      +95
  Lines        163354   171293    +7939
  Branches      24952    26235    +1283
============================================
+ Hits         102755   108355    +5600
- Misses        52847    54730    +1883
- Partials       7752     8208     +456
```
Flags with carried forward coverage won't be shown.
@ankitsultana, if the Values node is being joined with a TableScan node, can we consider colocating the workers for the Values node to eliminate a shuffle?
We can only do this if the scan is on a single node. If the scan spans multiple nodes, we'll have to leverage "replicated execution", which I am working on to support lookup joins, and to support broadcast joins when a non-dim table has a complete copy available on all servers on the left side of the join. When the scan is on a single node, the existing approach already picks that node for the Values node, which leads to colocation.
```java
} else {
  accumulateWorkers(currentNode, workerSet);
  workers = List.of(sampleWorker(new ArrayList<>(workerSet)));
  workers = List.of(String.format("0@%s", _context.getRandomInstanceId()));
```
nit: avoid String.format
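A minimal sketch of what the nit could look like, assuming the `"<workerId>@<instanceId>"` string shape used in the `String.format("0@%s", ...)` line above. The class and method names (`WorkerIdSketch`, `toWorkerId`) and the instance id are hypothetical, for illustration only.

```java
/**
 * Hypothetical helper (not a Pinot API) that builds a worker identifier of the form
 * "<workerId>@<instanceId>" with plain concatenation instead of String.format,
 * which avoids creating a Formatter and parsing a format string on every call.
 */
public final class WorkerIdSketch {
  private WorkerIdSketch() {
  }

  // Produces the same string as String.format("%d@%s", workerId, instanceId).
  public static String toWorkerId(int workerId, String instanceId) {
    return workerId + "@" + instanceId;
  }

  public static void main(String[] args) {
    // Illustrative instance id; prints "0@Server_localhost_8098".
    System.out.println(toWorkerId(0, "Server_localhost_8098"));
  }
}
```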
```java
 * Per-query unique context dedicated for the physical planner.
 */
public class PhysicalPlannerContext {
  private static final Random RANDOM = new Random();
```
This could introduce some contention. Maybe use `ThreadLocalRandom`?
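The concern matches the `java.util.Random` Javadoc: instances are thread-safe, but concurrent use of one instance across threads may encounter contention and poor performance. A sketch of the suggested alternative is below; `RandomPickSketch` and `pickRandom` are hypothetical names, not Pinot code.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical sketch: pick a random element with ThreadLocalRandom instead of a
 * shared static java.util.Random. ThreadLocalRandom.current() returns a generator
 * local to the calling thread, so concurrent planner threads do not contend on a
 * single shared seed.
 */
public final class RandomPickSketch {
  private RandomPickSketch() {
  }

  public static <T> T pickRandom(List<T> candidates) {
    if (candidates.isEmpty()) {
      throw new IllegalArgumentException("candidates must not be empty");
    }
    // nextInt(bound) returns a uniformly distributed value in [0, bound).
    int index = ThreadLocalRandom.current().nextInt(candidates.size());
    return candidates.get(index);
  }

  public static void main(String[] args) {
    // Prints one of the three illustrative instance ids.
    System.out.println(pickRandom(List.of("Server_1", "Server_2", "Server_3")));
  }
}
```

Note that a `ThreadLocalRandom` should always be obtained via `current()` at the call site rather than cached in a static field, since the cached instance would belong to whichever thread initialized it.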
Summary
We hadn't supported the Values node yet because I wanted to think through some design decisions. This PR adds that support; the key decisions are called out below.
Values is NOT Part of Leaf Stage
We don't consider Values to be part of the leaf stage; the Values node gets its worker assigned in either WorkerExchangeAssignmentRule or LiteModeWorkerAssignmentRule.
This is also consistent with ServerPlanRequestVisitor.
Worker Assignment for Values
For now, we assign a random worker to the Values node. This logic has moved to PhysicalPlannerContext, since that is where we currently store the instances encountered during planning. I also reuse the same method in Lite Mode worker assignment.
This also means that if a plan does not contain any table scans, the broker is automatically assigned to all plan nodes, so the entire query executes in the broker (no special handling required).
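A simplified model of the assignment described above, under two assumptions: (1) the context records the server instances encountered while planning table scans, and (2) when none were recorded, the random pick falls back to the broker. Assumption (2) is one way to reconcile the broker-only execution described here with the single-node colocation mentioned in the review thread. `PlannerContextSketch`, `recordInstance`, and the constructor are hypothetical; only the name `getRandomInstanceId` mirrors the diff, and the real signature and behavior may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical, simplified model of a per-query planner context. It is NOT the real
 * PhysicalPlannerContext; it only illustrates picking a random encountered instance
 * for a Values node, falling back to the broker when no scan was planned.
 */
public final class PlannerContextSketch {
  private final String _brokerInstanceId;
  // Insertion-ordered set of server instance ids seen while planning table scans.
  private final Set<String> _instanceIds = new LinkedHashSet<>();

  public PlannerContextSketch(String brokerInstanceId) {
    _brokerInstanceId = brokerInstanceId;
  }

  // Assumed hook: called whenever a table scan is assigned to a server.
  public void recordInstance(String instanceId) {
    _instanceIds.add(instanceId);
  }

  // Random encountered instance, or the broker if the plan had no table scans.
  public String getRandomInstanceId() {
    if (_instanceIds.isEmpty()) {
      return _brokerInstanceId;
    }
    List<String> ids = new ArrayList<>(_instanceIds);
    return ids.get(ThreadLocalRandom.current().nextInt(ids.size()));
  }

  public static void main(String[] args) {
    // Values-only query: nothing recorded, so this prints "0@Broker_1".
    PlannerContextSketch valuesOnly = new PlannerContextSketch("Broker_1");
    System.out.println("0@" + valuesOnly.getRandomInstanceId());

    // Scan on a single server: the only candidate is that server, so this prints "0@Server_1".
    PlannerContextSketch singleScan = new PlannerContextSketch("Broker_1");
    singleScan.recordInstance("Server_1");
    System.out.println("0@" + singleScan.getRandomInstanceId());
  }
}
```

With a single scanned server the pick is deterministic, which is why, in this model, the Values node ends up colocated with the scan without any extra rule.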
Solving Values-Only Queries in the Broker
With the v2 optimizer, if a query plan does not contain any table scan, we solve it within the broker itself.
Test Plan
Important
I realized that one of the tests in ResourceBasedQueriesTest is not yet enabled for the v2 optimizer. On enabling it, I found that around 8 out of ~520 queries fail, most of them related to table partition hints.
Quickstart Tests
Ran the following queries: