
Spark: Fix a separate table cache being created for each rewriteFiles #5392

Merged (7 commits) on Nov 27, 2022

Conversation

manuzhang (Collaborator):

Currently, during Spark's rewrite data files procedure with the bin-pack strategy, the SparkSession is cloned to disable AQE in each rewriteFiles call. Since a cloned SparkSession has its own state, the V2SessionCatalog is reloaded every time and a separate table cache is created. As a result, each file group gets its own table cache, which effectively disables table caching.

This PR fixes the issue by cloning the SparkSession once, when creating SparkBinPackStrategy.
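The effect described above can be modeled without Spark at all. Below is a minimal, hypothetical Python sketch (the Session, clone, and load_table names are illustrative, not Spark's API) showing why cloning a session per file group defeats the table cache:

```python
class Session:
    """Toy stand-in for a SparkSession: each session owns its own table cache."""

    def __init__(self):
        self.table_cache = {}    # fresh cache per session state
        self.catalog_loads = 0   # counts expensive catalog round-trips

    def clone(self):
        # Like SparkSession.cloneSession(): config is copied, but session
        # state is new, so the clone starts with an empty table cache.
        return Session()

    def load_table(self, name):
        if name not in self.table_cache:
            self.catalog_loads += 1  # cache miss: hits the external catalog
            self.table_cache[name] = object()
        return self.table_cache[name]

base = Session()

# Before the fix: one clone per file-group rewrite -> a cache miss every time.
loads_before = 0
for _ in range(5):
    per_group = base.clone()
    per_group.load_table("db.t")
    loads_before += per_group.catalog_loads

# After the fix: clone once per action and reuse it -> one miss total.
shared = base.clone()
for _ in range(5):
    shared.load_table("db.t")

print(loads_before, shared.catalog_loads)  # 5 1
```

The numbers differ only because each throwaway clone starts with an empty cache, which is exactly the behavior the PR removes.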

github-actions bot added the spark label on Jul 30, 2022
manuzhang (Collaborator, Author) commented Jul 30, 2022:

@rdblue @aokolnychyi @kbendick Please help review, and suggest where to add a unit test; I haven't found a proper place.

manuzhang force-pushed the fix-table-caching branch from 918bb15 to d756a4f on July 30, 2022 10:38
rdblue (Contributor) commented Jul 30, 2022:

I'm interested to hear what @szehon-ho and @RussellSpitzer think about this.

My initial reaction is that this is not something we should change. We don't want to disable AQE for other Spark work, which is a side-effect of this change. I also don't like that we need to create a new Spark session for each rewrite, but I don't think there is much we can do to avoid that if we want AQE disabled. We could also fail if AQE is on, or just accept the AQE results.

Also, is a separate table cache a bug? Since it is only used once, what is the problem with doing it this way? Sure, this won't cache the rewritten table, but is there a behavior problem or are loads just slightly slower?

manuzhang (Collaborator, Author) commented Jul 31, 2022:

@rdblue I've moved cloning the session and disabling AQE into RewriteDataFilesSparkAction.

When multiple rewrite actions are submitted concurrently, as below, they block each other when loading the table because of locks in the shared HiveExternalCatalog. That hurts the overall performance of the rewrite actions.

CALL spark_catalog.system.rewrite_data_files(table => 'default.table', options => map('max-concurrent-file-group-rewrites', '200'));
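The contention described here can be sketched with a toy model. This is a conceptual Python illustration, not Iceberg or Hive code: SharedCatalog stands in for a HiveExternalCatalog whose metadata calls are serialized by a global lock, and the shared table cache means only the first rewrite pays the locked catalog call:

```python
import threading

class SharedCatalog:
    """Toy HiveExternalCatalog: every metadata call takes a global lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.locked_calls = 0

    def get_table(self, name):
        with self.lock:  # all concurrent rewrites serialize here
            self.locked_calls += 1
            return ("table", name)

catalog = SharedCatalog()
cache = {}
cache_lock = threading.Lock()

def rewrite_group(use_cache):
    # With a shared table cache, only the first file group goes through the
    # locked catalog call; without it, every file group contends on the lock.
    if use_cache:
        with cache_lock:
            if "db.t" not in cache:
                cache["db.t"] = catalog.get_table("db.t")
    else:
        catalog.get_table("db.t")

def run(use_cache, groups=8):
    catalog.locked_calls = 0
    cache.clear()
    threads = [threading.Thread(target=rewrite_group, args=(use_cache,))
               for _ in range(groups)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return catalog.locked_calls

print(run(False), run(True))  # 8 1
```

With per-group caches every rewrite hits the shared lock; with one cache per action only the first load does.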

manuzhang (Collaborator, Author):

There is another lock in SessionCatalog#tableExists for Spark versions before apache/spark#31891 since sessionCatalog is also shared.

kbendick (Contributor):

Ignore my previous comments; I had the caches mistaken.

kbendick (Contributor):

I will say that I don't love the idea of asking users to disable AQE themselves (i.e. not cloning the session and instead requiring AQE to be off), even though cloning the session is somewhat of a pain.

People use these statements at the end of queries and disabling AQE would be a bummer.

// Disable Adaptive Query Execution as this may change the output partitioning of our write
SparkSession spark = spark().cloneSession();
spark.conf().set(SQLConf.ADAPTIVE_EXECUTION_ENABLED().key(), false);
return new SparkBinPackStrategy(table, spark);
A Contributor commented on the diff above:

Bin packing is used as the basis for the other strategies. If you're moving this into the action itself, then you should move it into spark().

Collaborator (Author):

If it's moved into spark(), then the SparkSession is cloned with AQE disabled for all BaseSparkActions. I'm not sure about the side-effects.

Contributor:

Given that spark() is protected in BaseSparkAction, it could be overridden within this action.

Or another spark() method could be added that clones and disables AQE, like spark(boolean cloneAndDisableAQE) or something.

manuzhang (Collaborator, Author) commented Aug 11, 2022:

I don't see the difference here, since we need to invoke spark() from the super class to get the SparkSession anyway.

rdblue (Contributor) commented Jul 31, 2022:

@manuzhang, it seems reasonable to create a session for the entire rewrite, not just each Spark submission. Is that what was happening before?

manuzhang (Collaborator, Author):

> Is that what was happening before?

Do you mean session was created for the entire rewrite before?

rdblue (Contributor) commented Aug 1, 2022:

> Do you mean session was created for the entire rewrite before?

I'm asking you what the behavior was before this change that you want to fix.

manuzhang (Collaborator, Author):

Not sure. The furthest back I can track is #2591, and the behavior is the same as now.

manuzhang (Collaborator, Author):

@rdblue Any more concerns or suggestions for this PR?

manuzhang (Collaborator, Author):

@rdblue and @kbendick any more comments?

manuzhang (Collaborator, Author):

Gentle ping @rdblue @aokolnychyi @kbendick for another review

ajantha-bhat (Member) left a comment:

LGTM.
Thanks for addressing the comments and for the fix.

@Fokko, @rdblue, @aokolnychyi, @szehon-ho, @danielcweeks : Please help in review/merge.

@@ -185,6 +186,14 @@ public RewriteDataFiles.Result execute() {
     }
   }

+  @Override
+  protected SparkSession spark() {
Contributor:

This may only be called once right now, but I think we should not assume that it always will be. Can you update this to keep a copy of the session for this action?

Collaborator (Author):

Not sure what you mean by "copy". It is now cloning the SparkSession for this action in binPackStrategy, sortStrategy, and zOrderStrategy.

Member:

I think what he meant is: keep a local SparkSession variable in RewriteDataFilesSparkAction, and initialize it once by cloning the session and disabling adaptive execution.

Whenever spark() is called, return that variable instead of cloning again and again.

Collaborator (Author):

Good point. It looks like there's no need to override this method; we can simply clone in the constructor of RewriteDataFilesSparkAction.
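The clone-once pattern being discussed can be sketched as follows. This is a hypothetical Python model, not the actual Iceberg classes: the clone is created and configured once in the constructor, and every later accessor call returns that same clone.

```python
class Session:
    """Toy SparkSession: clone() copies config into a new session object."""

    def __init__(self, conf=None):
        self.conf = dict(conf or {})

    def clone(self):
        return Session(self.conf)

class BaseAction:
    def __init__(self, session):
        self._session = session

    def spark(self):
        return self._session

class RewriteAction(BaseAction):
    """Clone once in the constructor; spark() always returns that clone."""

    def __init__(self, session):
        clone = session.clone()
        # spark.sql.adaptive.enabled is Spark's real AQE switch.
        clone.conf["spark.sql.adaptive.enabled"] = "false"
        super().__init__(clone)

original = Session({"spark.sql.adaptive.enabled": "true"})
action = RewriteAction(original)

assert action.spark() is action.spark()  # same clone on every call
assert action.spark() is not original    # original session untouched
assert original.conf["spark.sql.adaptive.enabled"] == "true"
assert action.spark().conf["spark.sql.adaptive.enabled"] == "false"
```

Storing the clone once avoids re-cloning on every spark() call while keeping AQE enabled in the caller's session.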

manuzhang (Collaborator, Author):

@ajantha-bhat I forgot to remove cloneSession in SparkSortStrategy and SparkZOrderStrategy. Please review again.

-SparkSession cloneSession = spark().cloneSession();
-cloneSession.conf().set(SQLConf.ADAPTIVE_EXECUTION_ENABLED().key(), false);
+SparkSession spark = spark();
Member:

Instead of cloning again, aren't we supposed to store the session coming in through the constructor into a variable and use it here?

manuzhang (Collaborator, Author) commented Oct 21, 2022:

This is not cloning again; it's spark() from SparkSortStrategy.

Member:

Got it. The similar method name is really confusing. Just by reading this block of code, I couldn't figure out that spark() was also defined in SparkSortStrategy and is not from RewriteDataFilesSparkAction.

Member:

Maybe we can add a comment that it comes from the parent, or rename the method in the parent.

Collaborator (Author):

Sure. I'm wondering whether spark can be a protected field; the current approach is a bit of an overuse of design patterns, IMO.

RussellSpitzer (Member) commented Nov 3, 2022:

Our current style rules don't allow protected fields, but I guess we could make an exception here. Long term, I think we imagined that ZOrderStrategy would just become a special case of SortStrategy once we have the ability to use a multi-arg transform as a sort order.

ajantha-bhat (Member):

> @ajantha-bhat I forgot to remove cloneSession in SparkSortStrategy and SparkZOrderStrategy. Please review again.

Yeah, last time I commented that these other places also had issues, but I forgot to recheck 😔 whether they were addressed. I think we now cover all the places. Thanks.

manuzhang (Collaborator, Author):

@ajantha-bhat @rdblue any more comments?

 // Reset Shuffle Partitions for our sort
 long numOutputFiles =
     numOutputFiles((long) (inputFileSize(filesToRewrite) * sizeEstimateMultiple));
-cloneSession.conf().set(SQLConf.SHUFFLE_PARTITIONS().key(), Math.max(1, numOutputFiles));
+spark.conf().set(SQLConf.SHUFFLE_PARTITIONS().key(), Math.max(1, numOutputFiles));
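The sizing logic in the code above can be sketched as follows. This is a conceptual Python model; the assumption that numOutputFiles divides the scaled input size by a target file size and rounds up is mine, not taken from the diff:

```python
import math

def num_output_files(total_bytes, target_file_size_bytes):
    # Assumed behavior: round the scaled input size up to whole output files.
    return math.ceil(total_bytes / target_file_size_bytes)

def shuffle_partitions(input_file_sizes, size_estimate_multiple,
                       target_file_size_bytes):
    scaled = int(sum(input_file_sizes) * size_estimate_multiple)
    # Mirrors Math.max(1, numOutputFiles): never request zero partitions.
    return max(1, num_output_files(scaled, target_file_size_bytes))

# 10 input files of 64 MB each, compacted toward 512 MB target files:
parts = shuffle_partitions([64 * 1024**2] * 10, 1.0, 512 * 1024**2)
print(parts)  # 2
```

Setting shuffle partitions to the expected output file count keeps the sort's write from producing many undersized files or one oversized one.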
Member:

I'm a little worried that this may create a race condition between rewrite actions running at the same time in the same JVM.

Member:

Actually, I think this is fine since each action gets its own cloned session.

@@ -94,10 +95,14 @@
   private boolean useStartingSequenceNumber;
   private RewriteJobOrder rewriteJobOrder;
   private RewriteStrategy strategy = null;
+  private final SparkSession cloneSession;

   RewriteDataFilesSparkAction(SparkSession spark, Table table) {
     super(spark);
RussellSpitzer (Member) commented Nov 3, 2022:

I think this needs to be the cloned session; otherwise the subclass calls would be modifying the original session, not the new cloned one, correct?

Like here -
https://github.com/apache/iceberg/pull/5392/files#diff-39b303771b5d730c63672bb27597474e2b84a7f1b4b3f8b22fb58352c34f8968R206

Contributor:

I agree. In general, I think this should probably be overriding spark() so that the same context is used everywhere consistently.

RussellSpitzer (Member):

> @manuzhang, it seems reasonable to create a session for the entire rewrite, not just each Spark submission. Is that what was happening before?

Yes, basically: the old behavior was to clone the SparkSession for each file group rewrite, rather than once for the entire action.

RussellSpitzer (Member) left a comment:

Overall I think this is the right approach. If we are going to modify session state for a given action, it makes sense to clone at the action level rather than within the rewriteFiles call.

That said, we do have to be careful that spark() returns the cloned session and not the original. Once that's fixed, I think this is good to go.

manuzhang (Collaborator, Author):

@RussellSpitzer please check again whether it's fixed now.

manuzhang (Collaborator, Author):

@RussellSpitzer @rdblue @ajantha-bhat please take another look. Thanks.

rdblue (Contributor) commented Nov 27, 2022:

I think that all of the session references are to the cloned Spark session now. +1.

rdblue merged commit 4caf1b4 into apache:master on Nov 27, 2022
rdblue (Contributor) commented Nov 27, 2022:

Thanks, @manuzhang!

manuzhang (Collaborator, Author):

Thanks @rdblue @ajantha-bhat @RussellSpitzer @kbendick @hililiwei for your review.
I opened #6284 and #6285 to back-port the PR to Spark 3.2 and 3.1 respectively. Please help review those PRs as well.

manuzhang deleted the fix-table-caching branch on March 26, 2024