-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding DML definition and parse SQL InsertFile #8557
Conversation
6163d00
to
15bf90b
Compare
Codecov Report
@@ Coverage Diff @@
## master #8557 +/- ##
============================================
+ Coverage 63.60% 66.45% +2.85%
+ Complexity 4322 4235 -87
============================================
Files 1648 1287 -361
Lines 86917 64697 -22220
Branches 13265 10139 -3126
============================================
- Hits 55281 42996 -12285
+ Misses 27595 18575 -9020
+ Partials 4041 3126 -915
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
f955d3b
to
0cee37f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High level question: should we reuse the current SQL endpoint (HTTP GET), or introduce a new one. How do other systems handle it?
...t-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotQueryResource.java
Outdated
Show resolved
Hide resolved
httpHeaders.getRequestHeaders().entrySet().stream().filter(entry -> !entry.getValue().isEmpty()) | ||
.map(entry -> Pair.of(entry.getKey(), entry.getValue().get(0))) | ||
.collect(Collectors.toMap(Pair::getKey, Pair::getValue)); | ||
return _sqlQueryExecutor.executeDMLStatement(sqlNodeAndOptions, headers).toJsonString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is executed on controller, we can skip sending the request again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DML, DDL statements are not latency-sensitive statements, so it might be ok to have an extra route but keep the code path clean.
0cee37f
to
23e0a72
Compare
23e0a72
to
a042d63
Compare
For most of the DB, there would be just only one endpoint for executing all statements. So I feel we should just extend the existing SQL endpoint. |
6bf3bd1
to
e457243
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good.
although I am not sure it is good to encode aws key/token in plain text. any chance we can do it differently?
case DQL: | ||
return _requestHandler.handleRequest(sqlRequestJson, httpRequesterIdentity, new RequestStatistics()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this means all parsing done so far is not used, instead sending the stuff to broker and parse it again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. If we want to reuse the SqlNode, it requires refactoring to add it into requestHandler's interface, which I don't want to change for now.
sql = removeTerminatingSemicolon(sql); | ||
|
||
// Extract OPTION statements from sql as Calcite Parser doesn't parse it. | ||
List<String> options = extractOptionsFromSql(sql); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be cleaned up by extending parser.jj to allow OPTION
keyword but that can be done later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, since we already have the OPTIONS handler and be able to parse it out, then I don't complain.
Eventually, this should all go to Parser.
Users will definitely type it in plain text and send it over using the query console. We should internal mask those, similar to current recurring minion tasks. |
Looks good, but would be useful to have a few unit and / or integration tests. Also,
I am wondering if something similar can be done for all DML/DDL statements as well to avoid either accidental or malicious usecases where a user who normally runs SELECT statements ends up running an INSERT statement with valid S3 credential to insert bad data into a Pinot table (?). |
I wonder if Here is the reference from MySQL: |
Agreed, using I was trying to be on par with other technologies (https://clickhouse.com/docs/en/sql-reference/statements/insert-into/#inserting-data-from-a-file). We can also support similar semantics, but cc: @walterddr |
Will update PR to change to |
I think
[edit] --> just to clarify I wasn't opposing adding |
to add to it.
where the |
c11373c
to
4db99f1
Compare
Talked with @walterddr offline and feel we can keep the current syntax for now. We can support MySQL syntax later. |
7321e78
to
7db9c13
Compare
Agree that we should use one single endpoint to handle all statements. My concern is that the single SQL endpoint should probably be a |
GET endpoint is not used much, so far I think only pinot-admin Query command is still using GET api, all the other endpoint are using POST. For endpoint perspective, do you mean we should create a new POST endpoint, say |
Oh, just realize we already have the POST endpoint. What I meant is to make POST (reusing the existing one) to accept all SQL semantics, and keep GET only accepting DQL. We may add a check in the GET endpoint to prevent user using GET to change the table. |
6c6e9ee
to
34987e5
Compare
done, throws exception when it's not DQL for GET api. |
0b57e82
to
4ec4d4e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
pinot-core/src/main/java/org/apache/pinot/core/query/executor/sql/SqlQueryExecutor.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/executor/sql/SqlQueryExecutor.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/executor/sql/SqlQueryExecutor.java
Outdated
Show resolved
Hide resolved
1b81234
to
682f25a
Compare
682f25a
to
224f148
Compare
* Adding DML definition and parse sql * Use MinionClient for execute async task * Add integration tests
Adding DML definition and Support INSERT INTO table FROM inputDirURI statement
Syntax:
E.g.
Sample:
data:image/s3,"s3://crabby-images/e6ac6/e6ac6bbbb71f48c9b6bd67a06d3657e20eb946f1" alt="image"