Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Spark][Kernel] Pass table identifier in Coordinated Commits #3608

Merged

Conversation

prakharjain09
Copy link
Collaborator

@prakharjain09 prakharjain09 commented Aug 27, 2024

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

This PR adds TableIdentifier in the Coordinated Commit Interface.

There will be a separate change to pass the tableName reliably to the Commit Coordinator in delta-spark.

How was this patch tested?

Existing UTs

Does this PR introduce any user-facing changes?

No

@prakharjain09 prakharjain09 force-pushed the prakharjain09/cc-api-tableId-temp branch from 52447f6 to a33c191 Compare August 27, 2024 05:32
@scovich
Copy link
Collaborator

scovich commented Aug 27, 2024

Missing some newly added files by chance?

[error] /home/runner/work/delta/delta/storage/src/main/java/io/delta/storage/commit/CommitCoordinatorClient.java:73:1: cannot find symbol
[error]   symbol:   class TableIdentifier
[error]   location: interface io.delta.storage.commit.CommitCoordinatorClient
[error] TableIdentifier
[error] /home/runner/work/delta/delta/storage/src/main/java/io/delta/storage/commit/CommitCoordinatorClient.java:97:1: cannot find symbol
[error]   symbol:   class TableDescriptor
[error]   location: interface io.delta.storage.commit.CommitCoordinatorClient
[error] TableDescriptor

@prakharjain09 prakharjain09 force-pushed the prakharjain09/cc-api-tableId-temp branch 2 times, most recently from 48b79ff to 668920b Compare August 27, 2024 21:49
@prakharjain09 prakharjain09 requested a review from scovich August 27, 2024 21:49
@prakharjain09
Copy link
Collaborator Author

Missing some newly added files by chance?

[error] /home/runner/work/delta/delta/storage/src/main/java/io/delta/storage/commit/CommitCoordinatorClient.java:73:1: cannot find symbol
[error] symbol: class TableIdentifier
[error] location: interface io.delta.storage.commit.CommitCoordinatorClient
[error] TableIdentifier
[error] /home/runner/work/delta/delta/storage/src/main/java/io/delta/storage/commit/CommitCoordinatorClient.java:97:1: cannot find symbol
[error] symbol: class TableDescriptor
[error] location: interface io.delta.storage.commit.CommitCoordinatorClient
[error] TableDescriptor

Yes - fixed it.

Copy link
Collaborator

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR seems to mix two things:

  1. Introducing the TableDescriptor and plumbing it through all the various method signatures and call sites. Important, but very noisy.
  2. Introducing the TableIdentifier and barely starting to plumb it in a couple of spots (which just pass a hard-wired empty option).

Can we move the second part to the PR that actually plumbs through table identifiers in a meaningful way? Then this PR would be a lot easier to review, and probably that later PR as well (plus meaningful testing becomes possible there).

@prakharjain09 prakharjain09 force-pushed the prakharjain09/cc-api-tableId-temp branch from 668920b to d94a825 Compare August 29, 2024 17:15
@prakharjain09 prakharjain09 changed the title [WIP][Spark][Kernel] Pass table identifier in Coordinated Commits [Spark][Kernel] Pass table identifier in Coordinated Commits Aug 29, 2024
@prakharjain09 prakharjain09 force-pushed the prakharjain09/cc-api-tableId-temp branch from d94a825 to 9775eac Compare August 29, 2024 17:50
@prakharjain09
Copy link
Collaborator Author

Can we move the second part to the PR that actually plumbs through table identifiers in a meaningful way?

The changes to plumb TableIdentifier is a superset of the changes to plumb TableDescriptor. This PR focuses on the TableDescriptor part and together with it also takes care of plumbing TableIdentifier in those areas. Have left specific TODOs to fix the call-sites and pass a non-empty TableIdentifier which will be taken in next set of PRs.

Copy link
Collaborator

@vkorukanti vkorukanti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kernel changes LGTM

Copy link
Collaborator

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just double check some files that may import java Optional unnecessarily?

catalystTableIdentifier.catalog.toSeq ++
catalystTableIdentifier.database.toSeq
new TableIdentifier(namespace.toArray, catalystTableIdentifier.table)
}.map(Optional.of[TableIdentifier]).getOrElse(Optional.empty[TableIdentifier])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't a(ny easy) way to directly convert a scala Option into a java Optional?
https://www.javadoc.io/doc/org.scala-lang.modules/scala-java8-compat_2.11/0.6.0/scala/compat/java8/OptionConverters.html
https://www.scala-lang.org/api/2.13.6/scala/jdk/OptionConverters$.html

(maybe it's a scala-2.13 thing we don't have a compat shim for yet?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed we currently lack that particular shim... https://github.com/delta-io/delta/blob/master/spark/src/main/scala/org/apache/spark/sql/util/ScalaExtensions.scala#L22

Maybe we should consider adding it as a follow-up item, since coordinated commit code has a lot of scala/java interop?

@@ -18,6 +18,7 @@ package org.apache.spark.sql.delta

// scalastyle:off import.ordering.noEmptyLine
import java.nio.file.FileAlreadyExistsException
import java.util.Optional
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused import?

@@ -91,6 +93,40 @@ object DeltaTestImplicits {
}
}

/** Helper class for working with [[TableCommitCoordinatorClient]] */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These helpers avoid having to retrofit zillions of call sites in path-based unit tests?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - these are very useful.

@vkorukanti vkorukanti merged commit a745000 into delta-io:master Aug 30, 2024
14 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants