Skip to content

Turn on expand identifier in SqlValidator config#11457

Merged
Jackie-Jiang merged 1 commit intoapache:masterfrom
xiangfu0:sqlvalidator-expand-identifier
Aug 30, 2023
Merged

Turn on expand identifier in SqlValidator config#11457
Jackie-Jiang merged 1 commit intoapache:masterfrom
xiangfu0:sqlvalidator-expand-identifier

Conversation

@xiangfu0
Copy link
Contributor

@xiangfu0 xiangfu0 commented Aug 30, 2023

Fix for #11447
For sql:

SELECT toBase64(toUtf8(AirlineID)) as encoded
FROM mytable
GROUP BY toBase64(toUtf8(AirlineID))
LIMIT 10

The issue is that calcite expand group by toBase64(toUtf8(AirlineID)) to toBase64(toUtf8(CAST(mytable.AirlineID AS VARCHAR CHARACTER SET ISO-8859-1))) but not the expression in SELECT.

Enabling identifier expansion is useful in situations where you want to be explicit about the source of each column or table. It can help in reducing ambiguity and making the query more self-describing.

@xiangfu0 xiangfu0 force-pushed the sqlvalidator-expand-identifier branch from 5a77c45 to eb0c81d Compare August 30, 2023 00:56
@codecov-commenter
Copy link

codecov-commenter commented Aug 30, 2023

Codecov Report

Merging #11457 (84c3221) into master (399f033) will increase coverage by 0.01%.
Report is 4 commits behind head on master.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master   #11457      +/-   ##
============================================
+ Coverage     62.99%   63.00%   +0.01%     
- Complexity     1094     1098       +4     
============================================
  Files          2302     2302              
  Lines        124025   124041      +16     
  Branches      18901    18903       +2     
============================================
+ Hits          78126    78156      +30     
+ Misses        40351    40333      -18     
- Partials       5548     5552       +4     
Flag Coverage Δ
integration <0.01% <0.00%> (ø)
integration1 <0.01% <0.00%> (ø)
integration2 0.00% <0.00%> (ø)
java-11 62.95% <100.00%> (-0.01%) ⬇️
java-17 62.87% <100.00%> (+0.02%) ⬆️
java-20 62.84% <100.00%> (-0.01%) ⬇️
temurin 63.00% <100.00%> (+0.01%) ⬆️
unittests 63.00% <100.00%> (+0.01%) ⬆️
unittests1 67.55% <100.00%> (+0.02%) ⬆️
unittests2 14.47% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
...ava/org/apache/pinot/query/validate/Validator.java 80.00% <100.00%> (+0.68%) ⬆️

... and 18 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@xiangfu0 xiangfu0 force-pushed the sqlvalidator-expand-identifier branch from eb0c81d to 84c3221 Compare August 30, 2023 02:38
@xiangfu0 xiangfu0 mentioned this pull request Aug 30, 2023
@Jackie-Jiang Jackie-Jiang added bugfix multi-stage Related to the multi-stage query engine labels Aug 30, 2023
@Jackie-Jiang Jackie-Jiang merged commit 0817c5e into apache:master Aug 30, 2023
@xiangfu0 xiangfu0 deleted the sqlvalidator-expand-identifier branch August 31, 2023 05:27
@gortiz
Copy link
Contributor

gortiz commented Sep 1, 2023

This PR seems to produce a regression where select * from table returns the $rowId, $hostname, etc that should not be returned

@walterddr
Copy link
Contributor

walterddr commented Sep 18, 2023

i was wondering if there's a better way to handle it. expanding both into a CAST seems a bit overkill b/c we might not really want to perform the cast operation, is there any rule in Calcite during parsing time to convert the group-by into just an index reference (e.g. SELECT toBase64(toUtf8(AirlineID)) as encoded FROM mytable GROUP BY encoded or ... GROUP BY 1 )

Comment on lines +533 to +538
// Test select with group by order by limit
sqlQuery = "SELECT toBase64(toUtf8(AirlineID)) "
+ "FROM mytable "
+ "GROUP BY toBase64(toUtf8(AirlineID)) "
+ "ORDER BY toBase64(toUtf8(AirlineID)) DESC "
+ "LIMIT 10";
Copy link
Contributor

@walterddr walterddr Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a problem with this.

SELECT toBase64(toUtf8(AirlineID)) AS AirlineID
FROM mytable
GROUP BY toBase64(toUtf8(AirlineID))
ORDER BY toBase64(toUtf8(AirlineID)) DESC 
LIMIT 10

will failed b/c of the reference to AirlineID is both in the function toBase64 and AS aliasing
did this query ever succeed before the expand identifier?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix multi-stage Related to the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants