-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spark: support rewrite on specified target branch #8797
Conversation
...ensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java
Outdated
Show resolved
Hide resolved
...k/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java
Outdated
Show resolved
Hide resolved
can you also please update the respective docs? |
...k/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java
Outdated
Show resolved
Hide resolved
.../v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteDataFilesProcedure.java
Show resolved
Hide resolved
...k/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java
Show resolved
Hide resolved
...ensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java
Show resolved
Hide resolved
do you mean the branch & tagging doc? anywhere else? |
...ensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java
Outdated
Show resolved
Hide resolved
...ensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java
Outdated
Show resolved
Hide resolved
...k/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java
Outdated
Show resolved
Hide resolved
...k/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java
Outdated
Show resolved
Hide resolved
I mean in the Spark procedures section: https://github.com/apache/iceberg/blob/e5ad5ce4c09d1efcd26e3cf0bd27bcf8c06c2400/docs/spark-procedures.md#rewrite_data_files |
#8762 associating the original issue here. |
...k/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java
Show resolved
Hide resolved
.../v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteDataFilesProcedure.java
Outdated
Show resolved
Hide resolved
Just checking on the state of this PR, @zinking is there anything pending to move this forward? @ajantha-bhat @nastra any remaining concerns? |
let me update the PR, and @nastra can have another look. |
…s/RewriteDataFilesSparkAction.java Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>
5428083
to
3a36f4b
Compare
any updates on this ?? is this something thats going to be merged soon |
Ill give it a try :) ,thanks |
Hey @szehon-ho @nastra @nk1506 @ajantha-bhat - i fixed you suggestions , can you please review again. |
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
@@ -171,6 +171,16 @@ default RewriteDataFiles zOrder(String... columns) { | |||
*/ | |||
RewriteDataFiles filter(Expression expression); | |||
|
|||
/** | |||
* Specify the branch on which the rewrite will be performed. via WAP(write-audit-publish) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
via WAP(write-audit-publish)
I don't think this sentence is needed? And it' also incomplete sentence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, looks like WAP is supported through the procedure, but I don't think we need to mention it here in the RewriteDataFiles class, since that is a caller's logic to map WAP branch to this
private RewriteDataFiles checkAndApplyBranch( | ||
Table table, Identifier ident, RewriteDataFiles action) { | ||
String branchIdent = Spark3Util.extractBranch(ident); | ||
SparkWriteConf writeConf = new SparkWriteConf(spark(), table, branchIdent, Maps.newHashMap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SparkWriteConf should not be initialized independently within this method, it feels like it should be a part of the upper RewriteDataFilesProcedure.call()
looks like this did not get much attention, @amit-cloudinary I did another pass, let me know if you can make the updates |
also @amitgilad3 @zinking |
Tnx for looking into this @jackye1995 , Will fix all the comment |
2. initialize SparkWriteConf at call function
Hey @jackye1995 - i fixed all your comments but for some reason one test fails and i am not able to reproduce it locally , any chance to assist with this ? |
I think you might want to rebase the commit against latest main branch |
@amitgilad3 I have some issue doing the rebase with my current computer. so you probably just copy the branch and doing it on your own repo. |
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
Is this work permanently abandoned? Is there another good way to solve this, or is this pattern just not really supported? |
Ill try to finish the pr this weekend, last time i tried we had merge conflicts |
Thank you so much! |
Is the outstanding items here just a matter of doing a rebase and fixing collisions? If so, I can take that up. |
@amitgilad3 thank you so much, that was so so fast ! |
why the change
currently, rewrite only happens on main branch.
with the branch functionality it is also useful to conduct rewrite related tests on branches
what is changed
added target branch to specify the rewrite target
how is the change tested
added new unit tests to cover the new cases