Skip to content

Conversation

@pan3793
Copy link
Member

@pan3793 pan3793 commented Feb 21, 2023

Why are the changes needed?

This PR proposes to allow the user to provide a batch id on submitting a batch job. If the batch id already existed in metastore, Kyuubi ignores this submission and just returns the existing one, w/ a marker in response, this could avoid duplicated batch job submission.

This feature relies on #4418

Talking about the implementation, the key things are

How does the user set the custom batch id?

  • User can optionally set the kyuubi.batch.id in conf: Map[String, String], and the value must be a UUID, for Java users, it can be generated by UUID.randomUUID().toString()

How does the Kyuubi Server detect the duplication?

  • It's simple in single Kyuubi Server instance case, Kyuubi just needs to look up the metastore before creating a batch job
  • In HA mode, suppose the user requests to create the batch jobs w/ the same batch id concurrently, multiple Kyuubi Servers may process the request and try to insert to metastore DB at the same time, but only the first insertion success, others will fail w/ "duplicated key", Kyuubi Server needs to catch this error and return the existing batch job information instead of creating a new one.

How does the user know if the returned batch job is new created or duplicated?

  • a new field batchInfo: Map[String, String] is added to the response, and for duplicated batch job, "kyuubi.batch.duplicated": "true" will be contained.

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before make a pull request

Copy link
Member

@turboFei turboFei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some comments

@turboFei
Copy link
Member

seems mistake, let me think it twice.

@ulysses-you
Copy link
Contributor

How about add a traceToken in rest client ? Trino support that and Kyuubi server should return traceToken back to client. Then developer can deduplicate by itself.

@pan3793
Copy link
Member Author

pan3793 commented Feb 21, 2023

How about add a traceToken in rest client ? Trino support that and Kyuubi server should return traceToken back to client. Then developer can deduplicate by itself.

I think it makes things more complex, IMO let the user provide a custom batch id does not has many risks, we already restrict the batch id MUST be a UUID.

@pan3793 pan3793 self-assigned this Mar 1, 2023
@pan3793 pan3793 marked this pull request as ready for review March 1, 2023 04:19
@pan3793
Copy link
Member Author

pan3793 commented Mar 1, 2023

cc @turboFei @ulysses-you it's ready for the next review.

@codecov-commenter
Copy link

codecov-commenter commented Mar 2, 2023

Codecov Report

Merging #4390 (b6917ba) into master (abc0a1d) will decrease coverage by 0.01%.
The diff coverage is 81.53%.

@@             Coverage Diff              @@
##             master    #4390      +/-   ##
============================================
- Coverage     53.24%   53.24%   -0.01%     
  Complexity       13       13              
============================================
  Files           569      569              
  Lines         31111    31146      +35     
  Branches       4203     4208       +5     
============================================
+ Hits          16565    16583      +18     
- Misses        12979    12992      +13     
- Partials       1567     1571       +4     
Impacted Files Coverage Δ
...ava/org/apache/kyuubi/client/api/v1/dto/Batch.java 88.23% <75.00%> (-1.93%) ⬇️
.../apache/kyuubi/server/api/v1/BatchesResource.scala 70.58% <75.60%> (-1.04%) ⬇️
.../main/scala/org/apache/kyuubi/util/JdbcUtils.scala 89.74% <100.00%> (+1.17%) ⬆️
.../apache/kyuubi/client/api/v1/dto/BatchRequest.java 56.86% <100.00%> (+3.92%) ⬆️
...java/org/apache/kyuubi/client/util/BatchUtils.java 80.00% <100.00%> (+5.00%) ⬆️
...pache/kyuubi/server/metadata/MetadataManager.scala 80.72% <100.00%> (-0.23%) ⬇️
...apache/kyuubi/session/KyuubiBatchSessionImpl.scala 86.81% <100.00%> (ø)
...e/kyuubi/jdbc/hive/ClosedOrCancelledException.java 0.00% <0.00%> (-100.00%) ⬇️
...rg/apache/kyuubi/ctl/cmd/log/LogBatchCommand.scala 59.09% <0.00%> (-7.58%) ⬇️
...in/spark/authz/ranger/SparkRangerAdminPlugin.scala 64.47% <0.00%> (-2.64%) ⬇️
... and 3 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@pan3793 pan3793 added this to the v1.7.0 milestone Mar 2, 2023
@turboFei
Copy link
Member

turboFei commented Mar 2, 2023

TODO: render the batchInfo in kyuubi-ctl

@pan3793
Copy link
Member Author

pan3793 commented Mar 2, 2023

render the batchInfo in kyuubi-ctl

Tracking in #4438

@pan3793 pan3793 closed this in efbaaff Mar 2, 2023
@pan3793
Copy link
Member Author

pan3793 commented Mar 2, 2023

Thanks, merged to master/1.7

pan3793 added a commit that referenced this pull request Mar 2, 2023
### _Why are the changes needed?_

This PR proposes to allow the user to provide a batch id on submitting a batch job. If the batch id already existed in metastore, Kyuubi ignores this submission and just returns the existing one, w/ a marker in response, this could avoid duplicated batch job submission.

Talking about the implementation, the key things are

How does the user set the custom batch id?

- User can optionally set the `kyuubi.batch.id` in `conf: Map[String, String]`, and the value must be a UUID, for Java users, it can be generated by `UUID.randomUUID().toString()`

How does the Kyuubi Server detect the duplication?

- It's simple in single Kyuubi Server instance case, Kyuubi just needs to look up the metastore before creating a batch job
- In HA mode, suppose the user requests to create the batch jobs w/ the same batch id concurrently, multiple Kyuubi Servers may process the request and try to insert to metastore DB at the same time, but only the first insertion success, others will fail w/ "duplicated key", Kyuubi Server needs to catch this error and return the existing batch job information instead of creating a new one.

How does the user know if the returned batch job is new created or duplicated?

- a new field `batchInfo: Map[String, String]` is added to the response, and for duplicated batch job, `"kyuubi.batch.duplicated": "true"` will be contained.

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #4390 from pan3793/batch-id.

Closes #4390

b6917ba [Cheng Pan] move constant to rest client
79ef1b5 [Cheng Pan] flaky test
f822285 [Cheng Pan] it
88bdfa5 [Cheng Pan] ut
fd8bc22 [Cheng Pan] ut
c820f5e [Cheng Pan] Support user provided batch id on batch job submission

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
(cherry picked from commit efbaaff)
Signed-off-by: Cheng Pan <chengpan@apache.org>
@pan3793 pan3793 deleted the batch-id branch March 3, 2023 03:17
turboFei added a commit that referenced this pull request Mar 7, 2023
### _Why are the changes needed?_

Close #4438
Follow up of #4390
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #4463 from turboFei/render_batch.

Closes #4438

b5ae211 [fwang12] render batch info

Authored-by: fwang12 <fwang12@ebay.com>
Signed-off-by: fwang12 <fwang12@ebay.com>
turboFei added a commit that referenced this pull request Mar 7, 2023
### _Why are the changes needed?_

Close #4438
Follow up of #4390
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #4463 from turboFei/render_batch.

Closes #4438

b5ae211 [fwang12] render batch info

Authored-by: fwang12 <fwang12@ebay.com>
Signed-off-by: fwang12 <fwang12@ebay.com>
(cherry picked from commit 39eac5a)
Signed-off-by: fwang12 <fwang12@ebay.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants