Spark 3.4: Support rate limit in Spark Streaming #7422

Merged Apr 28, 2023 (2 commits)

singhpk234
Contributor

About the change

Forward-port #4479 to Spark 3.4.

cc @jackye1995

@github-actions github-actions bot added the spark label Apr 24, 2023
@singhpk234
Contributor Author

Output of:

git diff --no-index spark/v3.3/spark/src/ spark/v3.4/spark/src --name-only

spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaOperation.java
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWrite.java
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWriteBuilder.java
spark/v3.4/spark/src/main/java/org/apache/spark/sql/catalyst/analysis/NoSuchProcedureException.java
/dev/null
/dev/null
/dev/null
/dev/null
/dev/null
/dev/null
/dev/null
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/TestFunctionCatalog.java
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestRequiredDistributionAndOrdering.java
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkDataWrite.java
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/sql/TestDropTable.java

Contributor

@jackye1995 jackye1995 left a comment

Thanks for the forward port, looks good to me, just waiting for CI to complete.

Comment on lines +371 to +378
// TODO : use readLimit provided in function param, the readLimits are derived from
// these 2 properties.
if ((curFilesAdded + 1) > maxFilesPerMicroBatch
    || (curRecordCount + task.file().recordCount()) > maxRecordsPerMicroBatch) {
  shouldContinueReading = false;
  break;
}

Contributor

Since this is a forward port, this should be fine for now, but it is probably worth creating an issue to track the TODO to use the provided ReadLimit.

Contributor Author

Ack. Let me take this up as a follow-up to this PR right away, and I'll add a tracking issue in the meantime.
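
For readers without the full diff, the check quoted in this review thread can be illustrated with a minimal standalone sketch. It is not the Iceberg implementation: the class `MicroBatchLimitSketch`, the `FileInfo` type, and the `planMicroBatch` method are hypothetical names made up for illustration. Only the variable names `curFilesAdded`, `curRecordCount`, `maxFilesPerMicroBatch`, and `maxRecordsPerMicroBatch` and the shape of the condition come from the quoted snippet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MicroBatchLimitSketch {

  /** Hypothetical stand-in for a data file; only the record count matters here. */
  static final class FileInfo {
    final String path;
    final long recordCount;

    FileInfo(String path, long recordCount) {
      this.path = path;
      this.recordCount = recordCount;
    }
  }

  /**
   * Takes files in order until adding the next one would exceed either limit.
   * The condition mirrors the one quoted in the review comment; everything
   * around it is an assumption made for this sketch.
   */
  static List<FileInfo> planMicroBatch(
      List<FileInfo> pending, int maxFilesPerMicroBatch, long maxRecordsPerMicroBatch) {
    List<FileInfo> selected = new ArrayList<>();
    int curFilesAdded = 0;
    long curRecordCount = 0;

    for (FileInfo file : pending) {
      if ((curFilesAdded + 1) > maxFilesPerMicroBatch
          || (curRecordCount + file.recordCount) > maxRecordsPerMicroBatch) {
        // Stop before the file that would push the batch over a limit.
        break;
      }
      selected.add(file);
      curFilesAdded += 1;
      curRecordCount += file.recordCount;
    }
    return selected;
  }

  public static void main(String[] args) {
    List<FileInfo> pending = Arrays.asList(
        new FileInfo("a.parquet", 40),
        new FileInfo("b.parquet", 40),
        new FileInfo("c.parquet", 40));

    // Record limit binds: 40 + 40 fits under 100, the third file does not.
    System.out.println(planMicroBatch(pending, 10, 100).size()); // prints 2
    // File limit binds: only one file is admitted.
    System.out.println(planMicroBatch(pending, 1, 1000).size()); // prints 1
  }
}
```

Note that in this sketch a first file whose record count alone exceeds the record limit produces an empty batch. The quoted snippet does not show how the actual code handles that case.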

@jackye1995
Contributor

@singhpk234 could you rebase the PR based on the refactoring?

Contributor

@jackye1995 jackye1995 left a comment

Looks good to me!

Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

LGTM as well, thanks @singhpk234!

@jackye1995
Contributor

I think the comments are all addressed, will go ahead and merge it. Thanks everyone for the review!

@jackye1995 jackye1995 merged commit 3d651a1 into apache:master Apr 28, 2023
@wypoon
Contributor

wypoon commented Apr 29, 2023

@singhpk234 @jackye1995
After I pulled from master, org.apache.iceberg.spark.source.TestStructuredStreamingRead3.testReadStreamOnIcebergTableWithMultipleSnapshots_WithNumberOfRows_1 is failing for me.

@szehon-ho
Collaborator

Yeah, this test fails for my PR run as well.

@singhpk234
Copy link
Contributor Author

singhpk234 commented Apr 29, 2023

Added a PR, #7470, for the fix.

cc @szehon-ho @wypoon

@wypoon
Contributor

wypoon commented Apr 29, 2023

Thanks @singhpk234.

manisin pushed a commit to Snowflake-Labs/iceberg that referenced this pull request May 9, 2023
6 participants