Spark: Analyze but don't optimize view body during creation by jbewing · Pull Request #14681 · apache/iceberg

jbewing · 2025-11-25T04:45:18Z

What

This PR updates view creation in Spark 4, 3.5, and 3.4 to analyze, but not optimize, the view body when creating a view. Previously, the view body was also optimized, which could result in long view creation times over large tables. When creating views over a large table (hundreds of TBs), creating a small number of views (say, a couple thousand) takes about 12 hours and requires a moderately sized Spark cluster (~100 CPUs). Even without optimization, the view body is still analyzed, so invalid syntax or references are still caught.

How

Looking upstream at Spark, similar pieces of the view creation logic have explicit code that lets the view body be analyzed, but not optimized.

In Iceberg, we hijack the regular upstream Spark code and run our own variants of view creation, which don't pick up this behavior. Given that table scan planning is both redundant and slow in this case, we should update the internal Iceberg view creation code so that the view body takes part in Spark's analysis phase, but not in the optimization phase.
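To illustrate the idea, here is a minimal toy sketch in plain Scala with no Spark dependency. Every type in it (`Plan`, `Scan`, `CreateView`, `visited`) is a hypothetical stand-in invented for this example and is not Spark's or Iceberg's actual class. It only models the general pattern: the view body is exposed as a child until analysis has run, and is hidden afterwards so that a later tree-walking phase has nothing to descend into.

```scala
// Toy model only: hypothetical stand-ins, not Spark's or Iceberg's classes.
object AnalysisOnlySketch {
  sealed trait Plan {
    def children: Seq[Plan]
  }

  // Stand-in for a table scan that would be expensive to optimize or plan.
  final case class Scan(table: String) extends Plan {
    val children: Seq[Plan] = Nil
  }

  // Stand-in for a create-view command that follows the analysis-only idea.
  final case class CreateView(name: String, query: Plan, isAnalyzed: Boolean = false)
      extends Plan {
    // Before analysis the view body is a child, so analysis rules can resolve it.
    // After analysis it is hidden, so later phases do not traverse it.
    def children: Seq[Plan] = if (isAnalyzed) Nil else Seq(query)

    def markAsAnalyzed: CreateView = copy(isAnalyzed = true)
  }

  // Collects every node that a tree-walking phase (for example, an optimizer) would visit.
  def visited(plan: Plan): Seq[Plan] = plan +: plan.children.flatMap(visited)

  def main(args: Array[String]): Unit = {
    val unanalyzed = CreateView("v", Scan("big_table"))
    val analyzed = unanalyzed.markAsAnalyzed
    println(visited(unanalyzed)) // the command and the scan under it
    println(visited(analyzed))   // the command only; the scan is no longer reachable
  }
}
```

In this toy model the view body is still stored on the command after analysis (the `query` field), so it remains available for validation and for persisting the view; it just stops being part of the tree that later phases walk.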

Testing

I've run the existing test suite locally for Spark 3.4, 3.5, and 4 to verify that it still passes. Additionally, I've run this patch on a fork of Iceberg 1.10.0 with a fork of Spark 3.5 and observed in a staging environment that a task which creates some views over a smaller (~10TB) table went from taking 2 hours to consistently taking 14 minutes. No errors or bugs were observed with the created views when testing in this staging environment.

Issue: #14680

huaxingao · 2025-11-25T21:41:48Z

...ons/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/views/CreateIcebergView.scala

-  rewritten: Boolean = false) extends BinaryCommand {
-  override def left: LogicalPlan = child
+  rewritten: Boolean = false,
+  isAnalyzed: Boolean = false) extends AnalysisOnlyCommand {


nit: Can we add a comment to explain why we want to use AnalysisOnlyCommand?

Good call! See db21f91 for added comments

huaxingao · 2025-11-25T21:49:45Z

LGTM. This aligns Iceberg CreateIcebergView with Spark’s CreateViewCommand by extending AnalysisOnlyCommand. The command’s children are analyzed then hidden, so the optimizer/planner won’t traverse the view body.

huaxingao

LGTM

jbewing · 2025-11-26T15:49:03Z

🙇 Thank you for the review @huaxingao ! Any timeline on when this can be merged?

nastra · 2025-11-26T16:30:01Z

@jbewing I'll also take a look at this PR this week

jbewing · 2025-11-26T16:52:29Z

Sounds good! Thank you @nastra !

nastra · 2025-11-27T07:28:52Z

...ons/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/views/CreateIcebergView.scala

-  override protected def withNewChildrenInternal(
-    newLeft: LogicalPlan, newRight: LogicalPlan): LogicalPlan =
-    copy(child = newLeft, query = newRight)
+  def markAsAnalyzed(analysisContext: AnalysisContext): LogicalPlan = {


override def markAsAnalyzed(analysisContext: AnalysisContext): LogicalPlan = { copy(isAnalyzed = true) }

can you please also fix the other spark versions?

Done in 1a09db6!

@jbewing I think this is missing the override for markAsAnalyzed?

You're absolutely right @nastra! I've addressed that in f77a893!
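For context on why the version without `override` could still compile: in Scala, implementing an abstract member does not require the `override` modifier, so leaving it out is not a compile error. The toy sketch below shows both forms side by side. It assumes the parent declares the method as abstract; the trait and classes here are hypothetical stand-ins, not Spark's `AnalysisOnlyCommand` or Iceberg's `CreateIcebergView`, and the method signature is simplified.

```scala
// Toy model only: hypothetical stand-ins with a simplified signature.
object OverrideSketch {
  trait AnalysisOnly {
    def isAnalyzed: Boolean
    def markAsAnalyzed(): AnalysisOnly // abstract: no body in the parent
  }

  // Compiles: implementing an abstract member does not need `override`.
  final case class WithoutOverride(isAnalyzed: Boolean = false) extends AnalysisOnly {
    def markAsAnalyzed(): AnalysisOnly = copy(isAnalyzed = true)
  }

  // Same behavior, but the intent is explicit, and the compiler rejects the
  // modifier if no matching member exists in a parent type.
  final case class WithOverride(isAnalyzed: Boolean = false) extends AnalysisOnly {
    override def markAsAnalyzed(): AnalysisOnly = copy(isAnalyzed = true)
  }

  def main(args: Array[String]): Unit = {
    println(WithoutOverride().markAsAnalyzed().isAnalyzed)
    println(WithOverride().markAsAnalyzed().isAnalyzed)
  }
}
```

Both variants behave identically at runtime in this sketch, so adding the modifier is about readability and compile-time checking rather than a behavior change.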

nastra · 2025-11-27T07:30:16Z

...ons/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/views/CreateIcebergView.scala

-  rewritten: Boolean = false) extends BinaryCommand {
-  override def left: LogicalPlan = child
+  rewritten: Boolean = false,
+  // Align Iceberg CreateIcebergView with Spark’s CreateViewCommand by extending AnalysisOnlyCommand.


Suggested change

-  // Align Iceberg CreateIcebergView with Spark’s CreateViewCommand by extending AnalysisOnlyCommand.
+  // Align Iceberg's CreateIcebergView with Spark’s CreateViewCommand by extending AnalysisOnlyCommand.

also can we move the comment right above case class CreateIcebergView?

Done in 1a09db6

nastra

LGTM with one final comment

jbewing · 2025-12-02T15:21:28Z

Thank you for the review @nastra! I've addressed your final comment in f77a893

jbewing · 2025-12-02T17:16:09Z

Actually, please hold off on the re-review: a recently merged PR, #8023, creates a ton of conflicts here that I need to resolve first.

jbewing · 2025-12-02T19:33:16Z

Alright, I've rebased on master!

Analyze but don't optimize view body during creation

ee4bf30

github-actions bot added the spark label Nov 25, 2025

nastra requested review from huaxingao and nastra November 25, 2025 08:30

huaxingao reviewed Nov 25, 2025

Add comment explaining why CreateIcebergView extends `AnalysisOnlyCommand`

db21f91

jbewing requested a review from huaxingao November 25, 2025 22:23

huaxingao approved these changes Nov 26, 2025

nastra reviewed Nov 27, 2025

PR feedback

1a09db6

- Improve + move CreateIcebergView comment for clarity
- Remove excessive indentation in `markAsAnalyzed` scala function

jbewing requested a review from nastra December 1, 2025 18:34

nastra approved these changes Dec 2, 2025

Add missing override qualifier to markAsAnalyzed function in `CreateIcebergView`

f77a893

Merge remote-tracking branch 'upstream/main' into analyze-dont-optimize-view-creation

b3bc291

jbewing requested a review from nastra December 2, 2025 19:33

nastra approved these changes Dec 3, 2025

nastra merged commit fac485c into apache:main Dec 3, 2025
29 checks passed

jbewing mentioned this pull request Dec 3, 2025

Spark view creation is slow as it performs a redundant table scan planning #14680

Closed

3 tasks

thomaschow pushed a commit to thomaschow/iceberg that referenced this pull request Jan 19, 2026

Spark: Analyze but don't optimize view body during creation (apache#14681)

3a32c08
