@mihailotim-db (Contributor) commented Sep 26, 2025

What changes were proposed in this pull request?

Delay resolveColsLastResort until all UnresolvedAliases that come before the column being resolved are themselves resolved.
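The proposed change can be illustrated with a toy model of the analyzer's fixed-point loop, using the query DECLARE a = 'aa'; SELECT 'a', a; as input. This is a minimal Python sketch, not Spark's implementation. The classes Lit and Col, the function resolve, and the delay_last_resort flag are hypothetical. The model assumes that, within each iteration, reference resolution runs before alias resolution, and that the implicit alias of a string literal is its value. With delay_last_resort=False, the session variable wins in the first iteration because the literal's alias is still unresolved. With delay_last_resort=True, the last-resort lookup is postponed and the column resolves as an LCA in the next iteration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Union


@dataclass
class Lit:
    """A string literal in the SELECT list, optionally with an explicit alias."""
    value: str
    alias: Optional[str] = None
    # An explicit alias ('a' AS a) is known immediately; an implicit one is
    # modeled as unresolved (like an UnresolvedAlias) until the alias pass runs.
    alias_resolved: bool = field(default=False, init=False)

    def __post_init__(self) -> None:
        self.alias_resolved = self.alias is not None


@dataclass
class Col:
    """A column reference in the SELECT list."""
    name: str
    resolved_to: Optional[str] = None


def resolve(items: List[Union[Lit, Col]], table_cols: Set[str],
            variables: Set[str], delay_last_resort: bool) -> List[Optional[str]]:
    """Hypothetical resolver: iterate both passes until nothing changes."""
    changed = True
    while changed:
        changed = False
        # Pass 1: reference resolution, in precedence order.
        for i, item in enumerate(items):
            if not isinstance(item, Col) or item.resolved_to is not None:
                continue
            earlier = [e for e in items[:i] if isinstance(e, Lit)]
            known_aliases = {e.alias for e in earlier if e.alias_resolved}
            pending = any(not e.alias_resolved for e in earlier)
            if item.name in table_cols:
                item.resolved_to = f"column:{item.name}"
            elif item.name in known_aliases:
                item.resolved_to = f"lca:{item.name}"
            elif item.name in variables and not (delay_last_resort and pending):
                # Last-resort lookup; with delay_last_resort=True it is skipped
                # while any earlier alias is still pending.
                item.resolved_to = f"variable:{item.name}"
            else:
                continue
            changed = True
        # Pass 2: alias resolution assigns implicit aliases to literals.
        for item in items:
            if isinstance(item, Lit) and not item.alias_resolved:
                item.alias = item.value
                item.alias_resolved = True
                changed = True
    return [item.resolved_to for item in items if isinstance(item, Col)]


# DECLARE a = 'aa'; SELECT 'a', a;
print(resolve([Lit("a"), Col("a")], set(), {"a"}, delay_last_resort=False))  # ['variable:a']
print(resolve([Lit("a"), Col("a")], set(), {"a"}, delay_last_resort=True))   # ['lca:a']
```

In this model the explicit-alias case (SELECT 'a' AS a, a) resolves as an LCA with either setting, because an explicit alias is known from the start; only the implicit alias depends on when the last-resort lookup is allowed to run.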

Why are the changes needed?

For the following query:

DECLARE a = 'aa';
SELECT 'a', a;

Spark incorrectly resolves the second column, a, to the session variable a instead of resolving it as a lateral column alias (LCA) reference to the implicit alias of the literal 'a'. This is inconsistent with Spark's intended behavior and name resolution precedence, in which an LCA takes precedence over a session variable:

DECLARE a = 'aa';
SELECT 'a' AS a, a; -- second column resolved as LCA
SELECT 'b', b; -- second column resolved as LCA to the implicit alias of literal 'b'

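Similarly, the fix restores the precedence of LCAs over outer references, as in the following query, where col1 inside the subquery should resolve to the implicit alias of 'col1' rather than to the outer column col1: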

SELECT col1 
FROM VALUES(1)
WHERE EXISTS (SELECT 'col1', col1);
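The precedence these queries rely on can be sketched as an ordered scope lookup. This Python snippet is a simplification, not Spark's resolver: the scope names and the PRECEDENCE tuple are hypothetical, and the order (relation columns, then LCAs, then outer references, then session variables) is an assumption drawn from the behavior described in this PR. It also shows why the bug appears: while the implicit alias of 'col1' is still unresolved, the LCA scope is effectively empty and the lookup falls through to the outer reference.

```python
from typing import Dict, Optional, Set

# Hypothetical scope order, highest precedence first (assumed from this PR).
PRECEDENCE = (
    "relation column",
    "lateral column alias",
    "outer reference",
    "session variable",
)


def resolve_name(name: str, scopes: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first scope, in precedence order, that defines `name`."""
    for scope in PRECEDENCE:
        if name in scopes.get(scope, set()):
            return scope
    return None


# Subquery of: SELECT col1 FROM VALUES(1) WHERE EXISTS (SELECT 'col1', col1);
# Once the implicit alias of 'col1' is resolved, the LCA shadows the outer column.
print(resolve_name("col1", {"lateral column alias": {"col1"},
                            "outer reference": {"col1"}}))  # lateral column alias
# Before the alias is resolved, only the outer column is visible.
print(resolve_name("col1", {"outer reference": {"col1"}}))  # outer reference
```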

Does this PR introduce any user-facing change?

Yes. Users now see the correct result: in queries like the ones above, the column resolves as an LCA reference.

How was this patch tested?

Added golden file tests for affected queries.

Was this patch authored or co-authored using generative AI tooling?

No

github-actions bot added the SQL label Sep 26, 2025
@mihailotim-db force-pushed the mihailo-timotic_data/fix_lca_correctness branch 2 times, most recently from 174d1f1 to a696d0d on September 30, 2025 08:05
@mihailoale-db (Contributor) left a comment

LGTM after expanding tests a bit.

@mihailotim-db force-pushed the mihailo-timotic_data/fix_lca_correctness branch from a696d0d to ff85dcd on September 30, 2025 13:05
SELECT 'a' AS a, a;

SELECT 'a', a FROM VALUES(1) AS t(a);
SELECT 'a' AS a, a FROM VALUES(1) AS t(a);
Contributor

Can you please use a view instead of VALUES? The lines would get shorter.

SELECT 'a' AS a, a;

SELECT 'a', a FROM VALUES(1) AS t(a);
SELECT 'a' AS a, a FROM VALUES(1) AS t(a);
Contributor

I think arguments for CREATE FUNCTION might also be affected: they are resolved after LCAs and before session variables.

SELECT 'a', a FROM VALUES(1) AS t(a);
SELECT 'a' AS a, a FROM VALUES(1) AS t(a);

SELECT col1 FROM VALUES(1) WHERE EXISTS (SELECT 'col1', col1);
@vladimirg-db (Contributor) commented Sep 30, 2025

We have the logic for Project, but the logic for Aggregate is either missing or non-obvious.

We need the same tests with GROUP BY.

4 participants