Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spark: support rewrite on specified target branch #8797

Closed
wants to merge 14 commits into from

Conversation

zinking
Copy link
Contributor

@zinking zinking commented Oct 11, 2023

why the change

currently, rewrite only happens on main branch.
with the branch functionality it is also useful to conduct rewrite related tests on branches

what is changed

added target branch to specify the rewrite target

how is the change tested

added new unit tests to cover the new cases

@nastra
Copy link
Contributor

nastra commented Oct 11, 2023

can you also please update the respective docs?

@zinking
Copy link
Contributor Author

zinking commented Oct 11, 2023

can you also please update the respective docs?

do you mean the branch & tagging doc? anywhere else?

@nastra
Copy link
Contributor

nastra commented Oct 11, 2023

can you also please update the respective docs?

do you mean the branch & tagging doc? anywhere else?

I mean in the Spark procedures section: https://github.com/apache/iceberg/blob/e5ad5ce4c09d1efcd26e3cf0bd27bcf8c06c2400/docs/spark-procedures.md#rewrite_data_files

@github-actions github-actions bot added the docs label Oct 11, 2023
@zinking
Copy link
Contributor Author

zinking commented Oct 11, 2023

#8762 associating the original issue here.

@edgarRd
Copy link
Contributor

edgarRd commented Feb 27, 2024

Just checking on the state of this PR, @zinking is there anything pending to move this forward? @ajantha-bhat @nastra any remaining concerns?
With branches and merge-on-read deletes now fixed in branches I think the functionality to rewrite data files is necessary.
Thanks everyone!

@zinking
Copy link
Contributor Author

zinking commented Feb 28, 2024

let me update the PR, and @nastra can have another look.

@amit-cloudinary
Copy link

any updates on this ?? is this something thats going to be merged soon

@amit-cloudinary
Copy link

@zinking - if i can assist with this please let me know (would love to contribute if possible)

feel free to pick it up here.

Ill give it a try :) ,thanks

@amitgilad3
Copy link

amitgilad3 commented Sep 3, 2024

Hey @szehon-ho @nastra @nk1506 @ajantha-bhat - i fixed you suggestions , can you please review again.
thanks

Copy link

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Oct 10, 2024
@@ -171,6 +171,16 @@ default RewriteDataFiles zOrder(String... columns) {
*/
RewriteDataFiles filter(Expression expression);

/**
* Specify the branch on which the rewrite will be performed. via WAP(write-audit-publish)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

via WAP(write-audit-publish)

I don't think this sentence is needed? And it' also incomplete sentence

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, looks like WAP is supported through the procedure, but I don't think we need to mention it here in the RewriteDataFiles class, since that is a caller's logic to map WAP branch to this

private RewriteDataFiles checkAndApplyBranch(
Table table, Identifier ident, RewriteDataFiles action) {
String branchIdent = Spark3Util.extractBranch(ident);
SparkWriteConf writeConf = new SparkWriteConf(spark(), table, branchIdent, Maps.newHashMap());
Copy link
Contributor

@jackye1995 jackye1995 Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SparkWriteConf should not be initialized independently within this method, it feels like it should be a part of the upper RewriteDataFilesProcedure.call()

@jackye1995
Copy link
Contributor

looks like this did not get much attention, @amit-cloudinary I did another pass, let me know if you can make the updates

@jackye1995
Copy link
Contributor

also @amitgilad3 @zinking

@amitgilad3
Copy link

Tnx for looking into this @jackye1995 , Will fix all the comment

@github-actions github-actions bot removed the stale label Oct 19, 2024
2. initialize SparkWriteConf at call function
@amitgilad3
Copy link

Hey @jackye1995 - i fixed all your comments but for some reason one test fails and i am not able to reproduce it locally , any chance to assist with this ?

@jackye1995
Copy link
Contributor

I think you might want to rebase the commit against latest main branch

@zinking
Copy link
Contributor Author

zinking commented Nov 5, 2024

@amitgilad3 I have some issue doing the rebase with my current computer. so you probably just copy the branch and doing it on your own repo.

Copy link

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jan 16, 2025
Copy link

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Jan 30, 2025
@nabaskes
Copy link

Is this work permanently abandoned? Is there another good way to solve this, or is this pattern just not really supported?

@amitgilad3
Copy link

Ill try to finish the pr this weekend, last time i tried we had merge conflicts

@nabaskes
Copy link

Thank you so much!

@nabaskes
Copy link

nabaskes commented Feb 12, 2025

Is the outstanding items here just a matter of doing a rebase and fixing collisions? If so, I can take that up.

@amitgilad3
Copy link

@nabaskes - zinking#5
this is an attempt to finish the pr

@nabaskes
Copy link

@amitgilad3 thank you so much, that was so so fast !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants