Mismatch between schema and batches on a CREATE TABLE with a windowing query #5695
Comments
Actually, the problem is a column name mismatch between the schema and the batch.
I believe the reason is that, for window functions, the column name is shortened, and the shortening is not applied consistently.
Once you wrap the window expression in a function, the name is no longer shortened.
As a workaround, you may want to use:
Great, thank you for looking into this! I also discovered a similar workaround (and added it to the Additional context). We might have observed the same issue outside of windowing functions; I'll see if I can create more repros.
I'm looking into this today!
@milevin I have looked into the code; another, more natural workaround is to give the expression an alias.
The code currently uses the alias if one is given; otherwise it shortens the name to avoid huge, unreadable names.
I don't know the context of why this was done, or whether we should do something fancier. Perhaps @ozankabak and @mustafasrepo have some input on this.
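To make the naming problem concrete, here is a minimal Rust sketch. `WindowExpr`, `schema_name` and `batch_name` are hypothetical names, not DataFusion's actual types or methods. It assumes, as described above, that one code path records the full expression text in the schema while another labels the batch with the alias or a shortened name; with no alias the two disagree, and with an alias they match.

```rust
/// Hypothetical model of a projected window expression (not DataFusion's types).
struct WindowExpr {
    full_name: String,     // complete window expression text
    short_name: String,    // shortened form, e.g. just the function call
    alias: Option<String>, // set when the query says `AS <name>`
}

impl WindowExpr {
    /// Assumption: the schema uses the alias if present, else the full text.
    fn schema_name(&self) -> String {
        self.alias.clone().unwrap_or_else(|| self.full_name.clone())
    }

    /// Assumption: batches use the alias if present, else the shortened name.
    fn batch_name(&self) -> String {
        self.alias.clone().unwrap_or_else(|| self.short_name.clone())
    }
}

fn main() {
    let mut e = WindowExpr {
        full_name: "SUM(t.x) PARTITION BY [t.y] ORDER BY [t.z ASC NULLS LAST]".to_string(),
        short_name: "SUM(t.x)".to_string(),
        alias: None,
    };
    // No alias: the two paths produce different names, i.e. a mismatch.
    assert_ne!(e.schema_name(), e.batch_name());

    // With an alias, both paths agree, which is why aliasing works around it.
    e.alias = Some("total".to_string());
    assert_eq!(e.schema_name(), e.batch_name());
    println!("names agree once aliased: {}", e.batch_name());
}
```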
@comphead Here is another instance:
Note: no windowing here, and all the columns are aliased.
You beat me to it, I looked around and was about to write that this seems to be a more general issue than windowing 🙂
Yes @ozankabak, the problem is broader. For the case above, the diff is:
I'm checking whether it's related to WITH.
Is there a bug in the logic of inferring output column names? Note how, in this example, DF is assigning (It is my understanding that in a SELECT with a UNION clause, the output column names are derived from the first SELECT sub-clause; perhaps somebody can dig through the standard to confirm that.) Either way, these look like two different issues, both manifesting as the mismatch error.
You are right, the column names from the first UNION ALL branch are the driving ones. This case is not correct; the column names have to be
If I remove ORDER BY, I get even more surprising output:
The bug is partially related to incorrect column name derivation in UNION ALL.
I will prepare a fix for UNION ALL first and then test other scenarios, such as non-deterministic column naming with and without ORDER BY.
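The rule agreed on above (UNION ALL output names come from the first branch) can be sketched in a few lines of Rust. `union_output_names` is a hypothetical helper, not DataFusion's API; the sketch only checks that every branch has the same column count and ignores types, which a real planner would also have to reconcile.

```rust
/// Hypothetical sketch: output column names of `b0 UNION ALL b1 UNION ALL ...`.
/// Names are taken from the first branch; later branches only need the same
/// number of columns (their own names are ignored).
fn union_output_names(branches: &[Vec<&str>]) -> Result<Vec<String>, String> {
    let first = branches.first().ok_or("UNION needs at least one branch")?;
    for (i, b) in branches.iter().enumerate().skip(1) {
        if b.len() != first.len() {
            return Err(format!(
                "branch {} has {} columns, expected {}",
                i,
                b.len(),
                first.len()
            ));
        }
    }
    Ok(first.iter().map(|s| s.to_string()).collect())
}

fn main() {
    // SELECT 1 AS a, 2 AS b UNION ALL SELECT 3 AS c, 4 AS d  -> columns a, b
    let names = union_output_names(&[vec!["a", "b"], vec!["c", "d"]]).unwrap();
    assert_eq!(names, vec!["a".to_string(), "b".to_string()]);
    println!("{:?}", names);
}
```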
@comphead Another day, another "Mismatch" instance:
Looks unrelated to the other two cases I provided.
In this last case, the issue is that nullability is inferred incorrectly:
col4 is inferred as non-nullable, but it should be nullable. Any idea which code is responsible for this?
Thanks @milevin for reporting all of this. To be honest, I think there are multiple places involved. I'll start with UNION ALL. For the nullability check, we probably need to weaken the check itself so that it compares only names and data types.
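A minimal Rust sketch of the weakened check proposed above, contrasted with a strict one. `Field`, `DataType`, `strict_match` and `names_and_types_match` are hypothetical stand-ins, not Arrow's or DataFusion's types; col4's data type (`Int64`) is also an assumption, since the original query is not shown.

```rust
/// Hypothetical stand-ins for schema types (not Arrow's actual structs).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

#[derive(Debug, Clone)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Strict check: names, data types and nullability must all match.
fn strict_match(schema: &[Field], batch: &[Field]) -> bool {
    schema.len() == batch.len()
        && schema.iter().zip(batch).all(|(s, b)| {
            s.name == b.name && s.data_type == b.data_type && s.nullable == b.nullable
        })
}

/// Weakened check: compare only names and data types, ignore nullability.
fn names_and_types_match(schema: &[Field], batch: &[Field]) -> bool {
    schema.len() == batch.len()
        && schema
            .iter()
            .zip(batch)
            .all(|(s, b)| s.name == b.name && s.data_type == b.data_type)
}

fn main() {
    // The planned schema wrongly marks col4 non-nullable; the batch says nullable.
    let schema = vec![Field { name: "col4".into(), data_type: DataType::Int64, nullable: false }];
    let batch = vec![Field { name: "col4".into(), data_type: DataType::Int64, nullable: true }];
    println!(
        "strict: {}, weakened: {}",
        strict_match(&schema, &batch),
        names_and_types_match(&schema, &batch)
    );
    // A type mismatch is still caught by the weakened check.
    let wrong_type = vec![Field { name: "col4".into(), data_type: DataType::Utf8, nullable: true }];
    assert!(!names_and_types_match(&schema, &wrong_type));
}
```

The trade-off is that the weakened check stops reporting the symptom without fixing the underlying nullability inference, so a genuine nullability mismatch would also pass silently.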
Describe the bug
This doesn't work but should:
To Reproduce
See above
Expected behavior
This should not throw the mismatch error
Additional context
http://sqlfiddle.com/#!17/1d310/1
Note: if I slap round(...) around the window expression, it begins to work: