Spark: Support Spark 3.3 #5056

Closed

Contributor

@singhpk234 singhpk234 commented Jun 15, 2022

Closes #4713


About Change

  • Introduced a new node, ReplaceIcebergData, to be used in place of ReplaceData, because the latter extends Command and commands are eagerly executed. This was causing problems with Update (when allowScanDuplication is allowed) and with RowLevelCommandDynamicPruning (filters were not being injected).
  • Renamed the rules RewriteDeleteFromTable -> RewriteDeleteFromIcebergTable and OptimizeMetadataOnlyDeleteFromTable -> OptimizeMetadataOnlyDeleteFromIcebergTable to avoid conflicting with the upstream rules, which don't support DeltaWrites and use ReplaceData.
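
To illustrate why eager execution matters here, below is a toy sketch in plain Python. It is not Spark or Iceberg code: `Scan`, `inject_pruning_filter`, `run_eager` and `run_deferred` are hypothetical names made up for this example. It only shows the ordering problem described above: if a node is executed before a later rule has had a chance to inject a filter, the filter has no effect on the result.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Scan:
    """Hypothetical stand-in for a scan node that applies pushed-down filters."""
    rows: List[Row]
    filters: List[Callable[[Row], bool]] = field(default_factory=list)

    def execute(self) -> List[Row]:
        # Return only the rows that pass every filter attached so far.
        return [r for r in self.rows if all(f(r) for f in self.filters)]


def inject_pruning_filter(scan: Scan) -> None:
    """Hypothetical stand-in for an optimizer rule that injects a pruning filter."""
    scan.filters.append(lambda r: r["part"] == "a")


def run_eager(scan: Scan) -> List[Row]:
    # The node is executed as soon as it is seen, so the rule runs too late.
    result = scan.execute()
    inject_pruning_filter(scan)
    return result


def run_deferred(scan: Scan) -> List[Row]:
    # The rule runs first, so the injected filter is applied at execution time.
    inject_pruning_filter(scan)
    return scan.execute()


rows = [{"part": "a", "id": 1}, {"part": "b", "id": 2}]
eager_result = run_eager(Scan(list(rows)))
deferred_result = run_deferred(Scan(list(rows)))
```

In this sketch the eager path reads every row while the deferred path reads only the rows that match the injected filter. How Spark actually schedules commands and optimizer rules is more involved than this model.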

Other notes:

I have marked some unit tests as ignored so that the complete suite can run:
[1] TestAddFilesProcedure failed due to a bug in upstream Spark. I have fixed it upstream, but since the latest snapshot dates from 7th May it is still failing, hence I have ignored it.
JIRA: https://issues.apache.org/jira/browse/SPARK-39417
[2] Investigating the root cause of the failures below:
TestMerge#testMergeWithInvalidAssignments
TestDeleteFrom#testDeleteFromUnpartitionedTable, TestDeleteFrom#testDeleteFromPartitionedTable


cc @rdblue @aokolnychyi @jackye1995 @RussellSpitzer @kbendick @amogh-jahagirdar @rajarshisarkar @flyrain

Member

pan3793 commented Jun 16, 2022

"copy 3.2 files from 3.3" should be in another PR; otherwise, after the squash, git will not keep the commit history.

@singhpk234
Contributor Author

Makes sense, thanks @pan3793, will do it in a separate PR.

Contributor Author

singhpk234 commented Jun 17, 2022

UPDATE:
Spark 3.3 has been published to the Maven repo: https://mvnrepository.com/artifact/org.apache.spark/spark-core
I have removed the PR's dependency on the snapshots repo.
[1] Have re-enabled the ignored AddFilesProcedure unit tests as well, since the upstream release containing the fix is now published.
[2] Have re-enabled the TestMerge unit test as well. Found that Spark hid the behaviour we were relying on behind a conf, which I have enabled in our session.

I am investigating the root cause of the two remaining unit test failures in TestDeleteFrom and will post the fixes shortly.

@singhpk234 singhpk234 marked this pull request as ready for review June 17, 2022 08:40
Contributor Author

singhpk234 commented Jun 20, 2022

> "copy 3.2 files from 3.3" should be in another PR, otherwise after squash, the git will not keep commit history.

ACK, opened a new PR for "copy 3.2 files from 3.3"

@singhpk234 singhpk234 force-pushed the feature/oss_spark_3.3_iceberg branch from 38eb20e to dfffccc Compare June 28, 2022 13:43
@singhpk234
Contributor Author

Updated this PR as well with the 3.3-only changes, just in case we decide to merge 3.3 only first.

@singhpk234
Contributor Author

Superseded by #5094

Thank you everyone for your awesome reviews :) !!

@singhpk234 singhpk234 closed this Jun 28, 2022
@saswata-dutta saswata-dutta left a comment

LGTM

Development

Successfully merging this pull request may close these issues.

[Spark 3.3] : Support Apache Spark 3.3 in iceberg
3 participants