
Conversation

@adam-christian-software
Contributor

Context

This is the same change as #3026 but I accidentally destroyed my repo.

This change fixes #3021 by creating the Spark 4 client. Most of the changes simply copy and paste what already exists, with adjustments to match the proper interfaces.

I will carve out the overlapping business logic into a separate module later.

Test

Moved over the same testing framework that exists in the 3.5 plugin. In addition, I tested E2E with Parquet.

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • [BLOCKED] 📚 Updated documentation in site/content/in-dev/unreleased (if needed). I need the updates from "test: Add Some Spark Client Tests and Update Documentation on Generic Tables" (#3000) first.

@adam-christian-software
Contributor Author

@flyrain & @gh-yzou - This is the same thing as I did here (#3026) before I accidentally blew away my repo. I'd love to have a conversation about whether you would be fine merging this in or if you want to do the module separation first to clean this up. I know that there is a user who wants to have this and having a first version will help us be responsive to our users.

Copilot AI left a comment

Pull request overview

This PR introduces support for Spark 4.0 by creating a new client that mirrors the existing Spark 3.5 implementation. The changes primarily involve copying and adapting existing code to work with Spark 4.0's interfaces and dependencies, with appropriate configuration changes to support Scala 2.13 exclusively for this version.

Key Changes:

  • Created a complete Spark 4.0 client implementation in plugins/spark/v4.0/
  • Added version-specific Scala configuration to support different Scala versions per Spark version
  • Updated build configurations to handle ANTLR version conflicts between Spark/Delta dependencies
  • Included comprehensive test suites (unit, integration, and regression tests)
  • Added documentation and Docker setup for testing

Reviewed changes

Copilot reviewed 47 out of 47 changed files in this pull request and generated 1 comment.

Summary per file:

| File | Description |
| --- | --- |
| settings.gradle.kts | Added version-specific Scala version configuration (see sketch after this table) |
| runtime/spark-tests/build.gradle.kts | Forced ANTLR 4.9.3 for Spark 3.5/Delta 3.3 compatibility |
| plugins/spark/v4.0/spark/* | Complete Spark 4.0 client implementation (catalog, REST, utilities) |
| plugins/spark/v4.0/integration/* | Integration tests for the Spark 4.0 client |
| plugins/spark/v4.0/regtests/* | Regression test framework and scripts |
| plugins/spark/v3.5/integration/build.gradle.kts | Forced ANTLR 4.9.3 for consistency |
| plugins/spark/v3.5/README.md | New comprehensive documentation for Spark 3.5 |
| plugins/spark/v4.0/README.md | New comprehensive documentation for Spark 4.0 |
| plugins/spark/README.md | Updated to reference version-specific READMEs |
| plugins/spark/spark-scala.properties | Added Spark 4.0 with Scala 2.13 only |
| pluginlibs.versions.toml | Added Spark 4.0.1 version |
| CHANGELOG.md | Added entry for the Spark 4.0 client |
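
For a sense of how the version-specific Scala configuration in settings.gradle.kts and spark-scala.properties can be wired together, here is a minimal sketch of a settings script that reads the properties file and includes one Gradle project per Spark/Scala pair. The property keys and project names below are assumptions for illustration, not the PR's actual code.

// settings.gradle.kts: illustrative sketch only; keys and project names are hypothetical
import java.util.Properties

// Assumed layout of plugins/spark/spark-scala.properties:
//   sparkVersions=3.5,4.0
//   scalaVersions.3.5=2.12,2.13
//   scalaVersions.4.0=2.13
val sparkScala = Properties().apply {
  rootDir.resolve("plugins/spark/spark-scala.properties").reader().use { load(it) }
}

sparkScala.getProperty("sparkVersions").split(",").forEach { spark ->
  sparkScala.getProperty("scalaVersions.$spark").split(",").forEach { scala ->
    // One Gradle project per Spark/Scala combination, e.g. :polaris-spark-4.0_2.13
    include(":polaris-spark-${spark}_$scala")
  }
}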


@flyrain
Contributor

flyrain commented Dec 1, 2025

Thanks a lot for working on it, Adam! This is a solid direction. Iceberg followed a very similar approach when introducing support for a new Spark version: copy the existing Spark client and adapt it accordingly. It keeps the implementation straightforward and reduces the risk of regressions. And we can delete the old versions once users have migrated to the new ones.

One thing we might consider is preserving the original git commit history when copying the modules. That can make future debugging and backporting work easier, especially when comparing behaviors across Spark versions.

configurations.all {
  if (name != "checkstyle") {
    resolutionStrategy {
      force("org.antlr:antlr4-runtime:4.9.3") // Spark 3.5 and Delta 3.3 require ANTLR 4.9.3
    }
  }
}
Contributor

Why do we have to force this version? What pulls in a different one, and why?

Contributor Author

So, we have an issue where our integration tests rely on Spark 3.5. Spark 3.5 requires ANTLR 4.9.3, while Spark 4.0 relies on ANTLR 4.13.1.

This was one way to handle this, but there actually might be a cleaner way by putting this in pluginlibs.versions.toml. I'll try that.
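
For illustration, a catalog-based force might look roughly like the following, assuming a [versions] entry named antlr4 is added to pluginlibs.versions.toml (the entry name is hypothetical):

// build.gradle.kts: sketch only; the `antlr4` catalog entry is an assumption
configurations.all {
  if (name != "checkstyle") {
    resolutionStrategy {
      // Pull the forced version from the version catalog instead of hard-coding it
      force("org.antlr:antlr4-runtime:${pluginlibs.versions.antlr4.get()}")
    }
  }
}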

Contributor

If Spark dependencies clash with our main server (I assume this is the case), then leveraging the Apprunner is probably the cleanest solution. I'm sure it was created for exactly this purpose (right, @snazy?). With Apprunner the server build is independent of the test environment, and the tests run the server in isolation, so they can use anything they need on the class path without clashing with server deps.
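
To illustrate the isolation idea in the abstract (this is not Apprunner's actual API, which isn't shown in this thread): a test can launch the server in a separate JVM and talk to it over HTTP, so the Spark side never shares a class path with the server.

// Kotlin script sketch of out-of-process isolation; jar path, port, and helper are hypothetical
fun runSparkTestsAgainst(serverUri: String) {
  // Hypothetical helper: point the Spark session's catalog config at serverUri and run assertions
  println("running tests against $serverUri")
}

val server = ProcessBuilder("java", "-jar", "build/polaris-server.jar") // hypothetical path
    .inheritIO()
    .start()
try {
  // The test JVM keeps Spark (and its ANTLR 4.13.1) on its own class path;
  // the server JVM resolves its dependencies independently.
  runSparkTestsAgainst("http://localhost:8181")
} finally {
  server.destroy()
}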

Contributor Author

I'm going to close because of #3188 (comment), but I'll handle this after the first merge.

Contributor

@gh-yzou left a comment

Hi @adam-christian-software, thanks a lot for working on this! I took a quick look. I assume there is no actual source code change, and most of the changes are related to dependencies and READMEs, right?

@@ -0,0 +1,83 @@
<!--

Licensed to the Apache Software Foundation (ASF) under one
Contributor

I don't see the getting-started for Spark 4.0. Do we plan to reuse the existing getting-started? Shall we move it out somewhere common?

Contributor Author

Personally, I was thinking it made sense to move the getting-started out of the plugin and into the getting-started area where we keep the other docker-compose files.

What do you think?

Contributor

For the Polaris client side, we mostly intend to treat it as an independent project; the only dependency we'd like to share with the main project is the interface part. I think it might be better to let it stay with the Polaris Spark plugin, but at the top level. What do you think?

Resolved conflict in plugins/spark/README.md by combining both the directory
listing for Spark versions and the detailed documentation from upstream.
@adam-christian-software
Contributor Author

In order to preserve history with the previous client, I'm going to close this PR and open another one using a similar approach as:

# Copy the file using git mv, then restore the original
git mv original.txt copy.txt
git commit -m "Copy original.txt to copy.txt"
git checkout HEAD~ -- original.txt
git add original.txt
git commit -m "Restore original.txt"
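
(A quick way to verify the history carried over: `git log --follow copy.txt` should then list original.txt's commits as well.)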

@github-actions

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jan 31, 2026
@github-actions github-actions bot closed this Feb 7, 2026
@github-project-automation github-project-automation bot moved this from PRs In Progress to Done in Basic Kanban Board Feb 7, 2026
