-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bump Nessie to 0.28.0, improve commit conflict detection #4594
Conversation
@@ -103,13 +123,14 @@ protected void doRefresh() { | |||
"is no longer valid.", client.getRef().getName()), e); | |||
} | |||
String metadataLocation = null; | |||
Reference reference = client.getRef().getReference(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming these changes were done to not do additional API calls. These changes won't be necessary after #4593
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling Supplier.get()
a couple of times isn't necessary either.
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java
Outdated
Show resolved
Hide resolved
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java
Outdated
Show resolved
Hide resolved
@@ -819,7 +827,7 @@ private Builder(TableMetadata base) { | |||
this.startingChangeCount = changes.size(); | |||
|
|||
this.snapshotLog = Lists.newArrayList(base.snapshotLog); | |||
this.previousFileLocation = base.metadataFileLocation; | |||
this.previousFileLocation = noPreviousFile ? null : base.metadataFileLocation; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a little bit annoying that this needs to be set, so I wonder whether it would be better to have a builder method that allows overriding this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to override that too for something I'm working on. But I have just used the builder's withMetadataLocation
. In my case, the value of base.metadataFIleLocation
is already null so just calling the function works fine.
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java
Show resolved
Hide resolved
@@ -173,7 +207,7 @@ protected void doCommit(TableMetadata base, TableMetadata metadata) { | |||
throw new CommitStateUnknownException(ex); | |||
} catch (NessieNotFoundException ex) { | |||
throw new RuntimeException( | |||
String.format("Cannot commit: Reference '%s' no longer exists", client.getRef().getName()), ex); | |||
String.format("Cannot commit: Reference '%s' no longer exists", updateableReference.getName()), ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would probably revert all of those changes as I don't think they are necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling Supplier.get()
a couple of times isn't necessary either.
b4c5b88
to
0fd33b4
Compare
TableMetadata.Builder builder = TableMetadata.buildFrom(deserialized, true) | ||
.setCurrentSchema(table.getSchemaId()) | ||
.setDefaultSortOrder(table.getSortOrderId()) | ||
.setDefaultPartitionSpec(table.getSpecId()) | ||
.withMetadataLocation(metadataLocation); | ||
.withMetadataLocation(metadataLocation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you don't need to pass true
in here. If base
doesn't have that field set, it will just be null. Otherwise it will get overridden after this builder is called with .build()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here is that the buildFrom
/Builder.<init>()
sets the final field previousFileLocation
, which is then used in Builder.build()
to add a MetadataLogEntry
. Builder.previousFileLocation
and Builder.metadataLocation
are different.
d36118d
to
bfc6655
Compare
@@ -155,7 +155,7 @@ public static String toJson(TableMetadata metadata) { | |||
} | |||
} | |||
|
|||
private static void toJson(TableMetadata metadata, JsonGenerator generator) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to change accessibility on these methods, why is the custom JsonGenerator needed in the Nessie code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sigh
fair enough
TableMetadata.Builder builder = TableMetadata.buildFrom(TableMetadataParser.read(io(), metadataLocation)) | ||
TableMetadata deserialized; | ||
if (table.getMetadata() != null) { | ||
deserialized = TableMetadataParser.fromJson(io(), metadataLocation, table.getMetadata().getMetadata()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we have this path in to request metadata out of a Nessie specific object which stores the metadata as a Json String that needs to be parsed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think i see, the IcebergTable class has a JsonNode? And our current public method only accepts a string, it doesn't allow directly injecting the node. I kind of think we should probably stick to that "string" api rather that opening up another method to the public but I may be paranoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Not yet, but in the future, yes.)
It's already a JSON node tree (as a generic JsonNode
) - so the text->nodes parsing already happened.
As I didn't want to serialize the JsonNode
to a String
to get a TableMetadata
, I've updated the TableMetadataParser
.
if (table.getMetadata() != null) { | ||
deserialized = TableMetadataParser.fromJson(io(), metadataLocation, table.getMetadata().getMetadata()); | ||
} else { | ||
deserialized = TableMetadataParser.read(io(), metadataLocation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise we just read from the location on disk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right
@@ -755,6 +755,10 @@ public static Builder buildFrom(TableMetadata base) { | |||
return new Builder(base); | |||
} | |||
|
|||
public static Builder buildFrom(TableMetadata base, boolean noPreviousFile) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A new public api here needs a java doc, I'm not sure I follow what noPreviousFile is expected to do here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a "smart thing" in the builder. If it's present, then it add it as a history item. That behavior is okay, when buildFrom()
is only used when it is used within a "modify TableMetadata
operation". But the Nessie catalog only uses it to modify the TableMetadata
as it is - so it's not a change, which requires a new history-item.
Without the noPreviousFile==true
, a couple of catalog tests fail, because those assert on the number of history items.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@snazy I've been thinking on this and I'm wondering why can't implement a builder like .withPreviousFileLocation
to set that to previousFileLocation to null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WFM, updated the PR
JsonNode newMetadata; | ||
try { | ||
TokenBuffer tokenBuffer = new TokenBuffer(JsonUtil.factory().getCodec(), false); | ||
TableMetadataParser.toJson(metadata, tokenBuffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we could avoid this usage if we use the "toJson(metadata) method and then pass that into the token buffer. I know it's a bit more wasteful but we are only doing this once per operation and it let's us keep the toJson(metadata, generator) method package private. Again I may be being paranoid here but ideally we don't open up any more methods than we absolutely have to.
Enhances Iceberg commit conflict detection by maintaining the commit-id from which a table metadata has been loaded, to use it as the "expected hash" in a Nessie commit. Makes Nessie specific properties available in `TableMetadata` properties: * `nessie.content.id` - the Nessie `Content.getId()` * `nessie.commit.id` - the commit ID used to retrieve the table metadata * `nessie.reference.name` - the reference name from which the table metadata has been loaded Also fixes an issue via `org.apache.iceberg.nessie.NessieTableOperations#loadTableMetadata` that caused too many "previous files", because the `TableMetadata.buildFrom()` assumed that it's only called for ongoing modifications.
@@ -173,7 +203,8 @@ protected void doCommit(TableMetadata base, TableMetadata metadata) { | |||
throw new CommitStateUnknownException(ex); | |||
} catch (NessieNotFoundException ex) { | |||
throw new RuntimeException( | |||
String.format("Cannot commit: Reference '%s' no longer exists", client.getRef().getName()), ex); | |||
String.format("Cannot commit: Reference '%s' no longer exists", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a formatting change
@@ -189,7 +220,8 @@ private boolean isSnapshotOperation(TableMetadata base, TableMetadata metadata) | |||
|
|||
private String buildCommitMsg(TableMetadata base, TableMetadata metadata) { | |||
if (isSnapshotOperation(base, metadata)) { | |||
return String.format("Iceberg %s against %s", metadata.currentSnapshot().operation(), tableName()); | |||
return String.format("Iceberg %s against %s", metadata.currentSnapshot().operation(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a formatting change
Enhances Iceberg commit conflict detection by maintaining the commit-id from which a
table metadata has been loaded, to use it as the "expected hash" in a Nessie commit.
Makes Nessie specific properties available in
TableMetadata
properties:nessie.content.id
- the NessieContent.getId()
nessie.commit.id
- the commit ID used to retrieve the table metadatanessie.reference.name
- the reference name from which the table metadata has been loadedAlso fixes an issue via
org.apache.iceberg.nessie.NessieTableOperations#loadTableMetadata
that caused too many "previous files", because the
TableMetadata.buildFrom()
assumed thatit's only called for ongoing modifications.