Add customizable parser module#8484
Conversation
| /** | ||
| * INSERT INTO [db_name.]table_name | ||
| * FROM [ FILE | ARCHIVE ] 'file_uri' [, [ FILE | ARCHIVE ] 'file_uri' ] | ||
| */ |
|
Don't we need to write new handlers (java code) for the new grammar ? |
Yes, I'm on the java part. We plan to add the query syntax to insert raw data files into pinot using the query console. See the examples here: #8465 (comment) @walterddr is extending the Calcite dialect to support new syntax then we can hook this with parser logic to this new ad-hoc minion task endpoint to execute tasks. |
|
this is still in POC phase but we were able to incorporate a new syntax and a new SqlCall node during parsing phase. caveat. no longer using BABEL parser factory. let's see how many tests failed b/c of this change |
walterddr
left a comment
There was a problem hiding this comment.
updated PR:
- Included a full list of BABEL overwrite so the parserImpls.ftl is quite large.
- If we want to separately release the fmpp plugin I created: https://github.com/walterddr/fmpp-maven-plugin
Codecov Report
@@ Coverage Diff @@
## master #8484 +/- ##
============================================
- Coverage 70.77% 67.00% -3.78%
+ Complexity 4281 4202 -79
============================================
Files 1664 1274 -390
Lines 87356 64076 -23280
Branches 13229 10053 -3176
============================================
- Hits 61826 42931 -18895
+ Misses 21234 18062 -3172
+ Partials 4296 3083 -1213
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
| </java> | ||
| </configuration> | ||
| </plugin> | ||
| <plugin> |
There was a problem hiding this comment.
need to sync the pom with pinot-common-jdk8 module
we can move fmpp to org: https://github.com/pinot-contrib ? |
531186a to
d61e541
Compare
d61e541 to
4ee7965
Compare
There was a problem hiding this comment.
High level question: Do we introduce this custom parser to accelerate the parsing but keep the same preserved words as BABEL? Can we get the same performance if switching to DEFAULT? I feel it might not worth the maintenance overhead if it is just for some extra preserved words.
If it is a must have in order to support more custom query syntax support, then go for it
| + " ELSE 'The quantity is under 30'\n" | ||
| + "END AS QuantityText\n" | ||
| + "FROM OrderDetails"); | ||
| "SELECT OrderID, Quantity,\n" + "CASE\n" + " WHEN Quantity > 30 THEN 'The quantity is greater than 30'\n" |
There was a problem hiding this comment.
(format) Please configure the formatter to honor formatter comments (Code Style -> Formatter -> check Turn formatter on/off with markers in code comments)
The main goal is to introduce new SQL syntax for |
the goal is not accelerate parsing but to get precise control on our SQL syntax
yes, we copied the whole list for backward compatibility but ideally speaking we should only kept the list of non-reserved keywords that we support in
Switching to DEFAULT or MYSQL_5 standard is desired if possible (see my TODO comment), but we still need to customized the list of non-reserved keyword + maintain backward compatibility until we can warn users that certain reserved keywords (in BABEL) will be removed in the future (if we decided to do so)
there shouldn't be any additional maintenance overhead comparing with using BABEL. As the syntax codegen rarely changed. and it will remain functional unless we change calcite version (which will causes maintenance overhead anyway if use BABEL) |
| } | ||
| return report; | ||
| } | ||
| } No newline at end of file |
| + " ELSE 'The quantity is under 30'\n" | ||
| + "END AS QuantityText\n" | ||
| + "FROM OrderDetails"); | ||
| "SELECT OrderID, Quantity,\n" + "CASE\n" + " WHEN Quantity > 30 THEN 'The quantity is greater than 30'\n" |
| @Test | ||
| public void testParserExtensionImpl() { | ||
| SqlNode sqlNode = testSqlWithCustomSqlParser("INSERT INTO db.tbl " | ||
| + "FROM FILE 'file:///tmp/file1', FILE 'file:///tmp/file2'"); |
There was a problem hiding this comment.
maybe also add a test for INSERT with OPTIONS?
There was a problem hiding this comment.
could you give an example? (i am not sure that's supported in my parser syntax)
There was a problem hiding this comment.
if you are referring to the option regex. that is not part of the change and will be removed from the SQL string before entering the parser anyway
There was a problem hiding this comment.
I'm referring to something like:
INSERT INTO "baseballStats"
FROM FILE s3://<my-bucket>/baseballStats/rawdata/
OPTION(taskName=myTask-s3)
OPTION(input.fs.className=org.apache.pinot.plugin.filesystem.S3PinotFS)
OPTION(input.fs.prop.accessKey=<my-key>)
OPTION(input.fs.prop.secretKey=<my-secret>)
OPTION(input.fs.prop.region=us-west-2)
If it's already handled before then no problem from here.
There was a problem hiding this comment.
So I think we are still using the BABEL conformance but removing the MYSQL lex analyzer ? There won't be any impact of this on existing queries that are running fine or are otherwise unsupported ?
There was a problem hiding this comment.
Lex is different from conformance. Lex basically is equivalent to the line
sqlParser.switchTo(SqlAbstractParserImpl.LexicalState.DQID);
There was a problem hiding this comment.
Understood. My concern was mainly around ifLex.MYSQL_ANSI will give same behavior as SqlAbstractParserImpl.LexicalState.DQID ?
There was a problem hiding this comment.
yes they are. If you check the config impl behind the scene it uses the LEX enum to see a whole bunch of settings I did here as well
There was a problem hiding this comment.
IIUC, the current code SqlParser sqlParser = SqlParser.create(expression, PARSER_CONFIG); will return SqlBabelParserImpl (created from SqlBabelParserImplFactory).
In this PR, I think we are going to use SqlParserImpl (SqlParserImplFactory) instead of SqlBabelParserImpl and that is going to give the same behavior as today because SqlConformance is still set to BABEL ?
There was a problem hiding this comment.
partially, also because we copied the BABEL config.fmpp content over.
There was a problem hiding this comment.
(nit) brief javadoc please
There was a problem hiding this comment.
(nit) javadoc please ?
There was a problem hiding this comment.
Are all cases covered by the new test in CalciteSqlCompilerTest.java ?
There was a problem hiding this comment.
yes. (actually there's one only, no alternative/branch)
2bd4a15 to
4df0f48
Compare
This PR added a customizable parser syntax extension.
Details:
config.fmppover Calcite's default.parserImpls.ftlnonReservedKeywordsbut we kept the full list for backward compatibilitySqlInsertFromTablenode.