Skip to content

query option to use parser.jj extension instead of regex#8880

Closed
walterddr wants to merge 4 commits intoapache:masterfrom
walterddr:fix_query_option
Closed

query option to use parser.jj extension instead of regex#8880
walterddr wants to merge 4 commits intoapache:masterfrom
walterddr:fix_query_option

Conversation

@walterddr
Copy link
Contributor

@walterddr walterddr commented Jun 10, 2022

Background

Currently Query Options are parsed via regex match and replaced with empty string. This causes SQLi issue.

Solution

  • using parser.jj extension.
  • backward-incompatible change, OPTIONS now can only be supported outside of a SQL statement (e.g. beginning or end of SQL string)

Release Note:

This is a backward incompatible change for utilizing OPTION keyword.

  • no longer allowed OPTION(key=value) ANYWHERE in the SQL query string; with this change only allow at the beginning or the end of the SQL string payload.
  • new syntax also requires identifier to literal syntax, for example:
    OPTION(foo=bar, alice=123, bob='potato', complex.key=valueLiteral)
    is now:
    OPTION(foo='bar', alice='123', bob='''potato''', "complex.key"='valueLiteral');
  • Note that standard SQL syntax statement separateor ; must be added between statements. both among multiple OPTION definitions; and between OPTION definitions and SQL DQL/DML statement.

Additional Remark

This new syntax can theoretically support other literal types for example:

  • OPTION(foo='bar', alice=123) --> Map<String, Object>.of("foo", (String) bar", "alice", (Integer) 123);
  • OPTION(foo=true) --> Map<String, Object>.of("foo", (boolean) true);

However since currently all k-v options support by Pinot is a string/string mapping, we will defer this to a new feature if necessary

Appendix

Full SQL syntax algebra

statementList:
      [ OPTION '(' option [, option ]* ')'; ]
      statement;
      [ OPTION '(' option [, option ]* ')'; ]

option:
      identifier
      '='
      literal

identifier:
      key | "complex.key"

literal
      QuotedLiteralString

Copy link
Member

@richardstartin richardstartin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥

sql = removeOptionsFromSql(sql);
}

try (StringReader inStream = new StringReader(sql)) {
Copy link
Contributor

@siddharthteotia siddharthteotia Jun 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR --

Looks like this function was added when INSERT grammar extension was done and is being called from broker query endpoint in PinotClientRequest.java.

It seems like for DQL, the caller will parse the SQL statement twice. Once at line 153 when it calls this method in the parser to retrieve SqlNodeAndOptions and then if the type is not DML, it sends the query to requestHandler.handleRequest(sql) which will call the parser again (compileToPinotQuery(sql)) and the statement will be parsed again.

Not sure if this is expected. I just noticed it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think so but yeah I will take a look to address this as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually the logic here is a bit more complex than simple reduce the compilation. I will create another PR to follow up.


pinotQuery = CalciteSqlParser.compileToPinotQuery(
"select * from vegetables where name <> 'Brussels sprouts' OPTION (delicious=yes)");
"OPTION (delicious='yes'); select * from vegetables where name <> 'Brussels sprouts'");
Copy link
Contributor

@siddharthteotia siddharthteotia Jun 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also add tests using examples called out in https://blog.doyensec.com/2022/06/09/apache-pinot-sqli-rce.html#query-options. Since they now don't fit in the grammar, we can check if they fail ?

Copy link
Contributor Author

@walterddr walterddr Jun 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yes absolutely. I will add to the test set

@siddharthteotia
Copy link
Contributor

siddharthteotia commented Jun 10, 2022

I will spend sometime examining the production logs to see how OPTION is being used in production (if at all) and how it will break with this change. Will get back with the info by early next week.

@codecov-commenter
Copy link

codecov-commenter commented Jun 11, 2022

Codecov Report

Merging #8880 (4183b88) into master (ea564f0) will decrease coverage by 0.11%.
The diff coverage is 76.92%.

@@             Coverage Diff              @@
##             master    #8880      +/-   ##
============================================
- Coverage     69.78%   69.67%   -0.12%     
- Complexity     4672     4680       +8     
============================================
  Files          1803     1807       +4     
  Lines         93752    94212     +460     
  Branches      13935    14053     +118     
============================================
+ Hits          65426    65643     +217     
- Misses        23790    24011     +221     
- Partials       4536     4558      +22     
Flag Coverage Δ
integration1 26.31% <69.23%> (-0.06%) ⬇️
integration2 24.89% <65.38%> (-0.23%) ⬇️
unittests1 66.37% <77.55%> (-0.18%) ⬇️
unittests2 15.39% <0.00%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...rg/apache/pinot/sql/parsers/parser/SqlOptions.java 45.45% <45.45%> (ø)
...t/controller/api/resources/PinotQueryResource.java 59.05% <50.00%> (ø)
...org/apache/pinot/sql/parsers/CalciteSqlParser.java 87.66% <85.29%> (-0.30%) ⬇️
...pinot/broker/api/resources/PinotClientRequest.java 58.49% <100.00%> (ø)
...rg/apache/pinot/sql/parsers/SqlNodeAndOptions.java 100.00% <100.00%> (ø)
...g/apache/pinot/sql/parsers/dml/InsertIntoFile.java 85.18% <100.00%> (ø)
...raw/RawDoubleSingleColumnDistinctOnlyExecutor.java 61.11% <0.00%> (-38.89%) ⬇️
...y/distinct/raw/RawMultiColumnDistinctExecutor.java 52.04% <0.00%> (-36.85%) ⬇️
...ct/raw/RawIntSingleColumnDistinctOnlyExecutor.java 44.44% <0.00%> (-35.56%) ⬇️
...t/raw/RawLongSingleColumnDistinctOnlyExecutor.java 44.44% <0.00%> (-35.56%) ⬇️
... and 59 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ea564f0...4183b88. Read the comment docs.

@walterddr walterddr marked this pull request as ready for review June 11, 2022 02:37
@Jackie-Jiang Jackie-Jiang added enhancement release-notes Referenced by PRs that need attention when compiling the next release notes backward-incompat Referenced by PRs that introduce or fix backward compat issues labels Jun 15, 2022
@Jackie-Jiang
Copy link
Contributor

This is awesome! Should we consider also supporting identifier as value? That way the current way of putting options can work

@kishoreg
Copy link
Member

+1 thanks @walterddr for adding this

@Jackie-Jiang
Copy link
Contributor

Jackie-Jiang commented Jun 15, 2022

Actually after a second thought, I think it is okay to keep the current way without affecting current query options. The only query option value that is not number/boolean is FORCE_HLC which I believe no one is using. @siddharthteotia

There is a major issue not handled though. In PinotClientRequest.constructSqlQueryOptions(), the groupByMode and responseFormat value SQL cannot be parsed, and that can cause user not able to upgrade from 0.10.0. We need to quote the value if there is no easy way to support the previous format

@Jackie-Jiang
Copy link
Contributor

Just noticed that seems it requires an extra ; to split the statements, then all the existing query option won't work... We should document this change in the PR description

@walterddr
Copy link
Contributor Author

Actually after a second thought, I think it is okay to keep the current way without affecting current query options. The only query option value that is not number/boolean is FORCE_HLC which I believe no one is using. @siddharthteotia

adding OPTION(identifier = identifier) is possible but the identifier cannot start with a number so we are still not backward-compatible.

There is a major issue not handled though. In PinotClientRequest.constructSqlQueryOptions(), the groupByMode and responseFormat value SQL cannot be parsed, and that can cause user not able to upgrade from 0.10.0. We need to quote the value if there is no easy way to support the previous format

This is a good catch, will fix; but also do we know if there's any other places that adds the OPTION regex to the query?

Just noticed that seems it requires an extra ; to split the statements, then all the existing query option won't work... We should document this change in the PR description

Yes this is the standard SQL statement separator. however the ; before the <EOF> token can be skipped thus we don't have any issue so far. I will document this.

@walterddr walterddr mentioned this pull request Jun 16, 2022
@walterddr
Copy link
Contributor Author

closing this in favor of #8906 for a more systematic fix.

@walterddr walterddr closed this Jun 16, 2022
@walterddr walterddr deleted the fix_query_option branch December 6, 2023 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backward-incompat Referenced by PRs that introduce or fix backward compat issues enhancement release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants