-
Notifications
You must be signed in to change notification settings - Fork 7
SNAP-1840 TPCH Q22 failure #68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@ymahajan The changes in #16 have always been very dicey to my mind without enough rationale just to get bootstrap to work or improve its performance. The blanket changes to WholeStage and GenerateSafeProjection were especially quite troubling to me though I have insufficient background to tell off-hand what the proper solution should be like @ahshahid asked for. Unfortunately I didn't have enough time to delve into greater details to tell exactly what was going on, and from the explanation could not tell why those changes were required in the first place. Some of it looked like perf improvements while others looked like fix to some issue. Latter should have been shown in a simplest possible plan first and then proposed upstream. It rather looks like those changes were just a band-aid to hide some other issue with the code generation of bootstrap and related code. This particular fix is just another band-aid. While I do not understand clearly why this change "fixes" the original issue introduced by #16 but this change looks harmless for now. However, I would like to track a highest priority issue of what changes #16 introduced, why they were necessary or are they any more, what are the other implications of that change etc. We now know SNAP-1840 is one of the implications but I fear that it is just the tip of iceberg. @hbhanawat do you have any comments on this? Do you understand what #16 actually tried to do, why it was required and whether my fear that it was just a band-aid to hide some other issues with bootstrap code generation has any merit? To my mind #16 lacks sufficient reasoning and should be taken out and replaced with the proper fix (unfortunately figuring that out might be more time consuming that I can afford at this point). |
Let me do this - I will back out all the changes in #16 AQP - https://github.com/SnappyDataInc/snappy-aqp/pull/102. This has 48 commits. On SnappyData - TIBCOSoftware/snappydata#397 I am assuming no changes were done on store and spark-jobserver side during #16 |
@ymahajan it might be slightly involved so can do this after 1.0. For now the workaround in this PR is ok but lets track the cleanup for 1.1 |
@sumwale I have reverted #16 changes. And moved the AQP specific handling to a separate rule. |
@ymahajan These changes look good. However, I am not sure how the new rule added to AQP side will only come into play for AQP queries because SnappyAQPSessionState is the one that is always created (at least in embedded mode but I think in connector mode too). |
@sumwale, yeah didn't realize that. Even if the precheckin is clean the original bug test still failing. So the above changes won't be sufficient. I tried adding an additional rule which insertsInputAdapter only for SampleTablePlan but that doesn't work either. I can't think of any other solution but supporting codegen for SampleTablePlan in all the cases, so that supportCodegen is always true for SampleTablePlan. |
The changes in 9d7e2ba cleaned up this area quite a bit with snappy-spark test suite being fully clean, so closing this PR. |
What changes were proposed in this pull request?
The TPCH Q22 with less data generates a plan involving
BroadcastHashJoin
. But if you increase more data in those tables, it usesSortMergeJoinExec
. And whenever theSortMergeJoinExec
(and LeftAnti) is selected and query is executed incorrectly withInputAdapter
. The changes which came from this checkin seem to be problematic. #16 It causes different execution paths to be taken. Also this will be different for embedded vs split mode. Tried to revert theWholeStageCodeGen
changes from this checkin but aqp dunits start failing. Ideally we should matchWholeStageCodeGen#insertWholeStageCodegen
that of stock spark. For now i have avoided cases involvingSortMergeJoinExec
. This bug reproduced using same Q22 in local mode after ingesting more data (to choose forSortMergeJoin
)BroadcastHashJoin
How was this patch tested?
clean precheckin
Other PRs ?
None