Add test case for lookup join#15244
Conversation
yashmayya
left a comment
There was a problem hiding this comment.
Thanks for adding this test @krishan1390, I've left a suggestion on how we can improve the robustness of the assertions being made and verify that the expected join strategy was used.
| String query = "select /*+ joinOptions(join_strategy='lookup') */ yearID, teamName from baseballStats " | ||
| + "join dimBaseballTeams ON baseballStats.teamID = dimBaseballTeams.teamID where playerId = 'aardsda01'"; | ||
| JsonNode jsonNode = postQuery(query); | ||
| long result = jsonNode.get("resultTable").get("rows").size(); | ||
| assertEquals(result, 3); |
There was a problem hiding this comment.
Can we also make assertions to verify that the lookup join strategy is actually being used? Few ideas off the top of my head:
EXPLAIN PLAN FOR...output for the query should have an exchange attached only to the non-dimensional table side of the join and not the dimensional side. And I believe the distribution type should be singleton as opposed to a hash distribution.- Similarly,
EXPLAIN IMPLEMENTATION PLAN FOR...output for the query should haveMAIL_SEND/MAIL_RECEIVEoperators only on the non dimensional table side of the join. - The actual query response should have the
LOOKUP_JOINoperator in the query stats.
There was a problem hiding this comment.
thanks.
The LOOKUP_JOIN operator is a robust verification. Verified that this check fails if I remove /* joinOptions(join_strategy='lookup') */ so its helpful. Added this.
The explain plan / implementation plan verifications are string pattern matching verifications. I think better to not verify explain plan / implementation plan in this integration test because its part of details of how lookup join is implemented. And any changes there would affect this test case unncessarily.
Let me know if thats fine or if you have other opinions.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #15244 +/- ##
============================================
+ Coverage 61.75% 63.60% +1.85%
- Complexity 207 1459 +1252
============================================
Files 2436 2780 +344
Lines 133233 156665 +23432
Branches 20636 24035 +3399
============================================
+ Hits 82274 99647 +17373
- Misses 44911 49519 +4608
- Partials 6048 7499 +1451
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| "segmentIngestionType": "APPEND", | ||
| "segmentIngestionFrequency": "DAILY", |
| String query = "select /*+ joinOptions(join_strategy='lookup') */ yearID, teamName from baseballStats " | ||
| + "join dimBaseballTeams ON baseballStats.teamID = dimBaseballTeams.teamID where playerId = 'aardsda01'"; | ||
| JsonNode jsonNode = postQuery(query); | ||
| long result = jsonNode.get("resultTable").get("rows").size(); | ||
| assertEquals(result, 3); |
There was a problem hiding this comment.
This is an 8 MB file, I'm wondering if it's really necessary to introduce this? Can't we add some dummy data for a dimension table that can be joined with the existing On_Time_On_Time_Performance_2014_100k_subset_nonulls data? Maybe something like DayOfWeek integer primary key to DayOfWeekName strings (just one idea, can be anything really since we don't care about the actual results, just the lookup join functionality).
There was a problem hiding this comment.
yes was previously comparing the actual result of the join and seemed easier to verify on a smaller file like baseball stats than On_Time_On_Time_Performance_2014_100k_subset_nonulls.
but now that we're verifying the join through query operator, removed the assertion of actual count and used On_Time_On_Time_Performance_2014_100k_subset_nonulls table.
thanks for the suggestion.
| assertTrue(stages.contains("LOOKUP_JOIN"), "Could not find LOOKUP_JOIN stage in the query plan"); | ||
| assertFalse(stages.contains("HASH_JOIN"), "HASH_JOIN stage should not be present in the query plan"); |
There was a problem hiding this comment.
nit: LOOKUP_JOIN and HASH_JOIN are operators, not stages.
Addresses #15224
Verified that the test case breaks without #15223 with exception trace