Core: add schema id and schemas to table metadata #2096
Conversation
@@ -242,6 +255,7 @@ public static TableMetadata read(FileIO io, InputFile file) {
    }
  }

  @SuppressWarnings("checkstyle:CyclomaticComplexity")
I remember this also shows up when adding primary key (#2010); I think we should probably find a way to refactor this code a bit.
Yeah, I imagine there would be some conflict when either of the PRs gets merged. I think the main problem with refactoring this is that a lot of the things it parses should be parsed together and validate each other; e.g. parsing the schema array has to go together with parsing the current schema ID, since both will either be present at the same time or be derived from a different field, and the same goes for sort order/partition spec. Since Java only allows returning a single object per method, we either have to refactor the code a bit and ensure the logic flow doesn't change and stays readable, or let the new methods return things like `Tuple<Integer, List<Schema>>`; I'm not sure which way is cleaner.
I think we could have a method that parses schemas and a separate one that returns the current schema ID. That said, I'm fine with suppressing the complexity warning for now.
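As a rough illustration of that split, here is a minimal, hypothetical sketch in which a plain `Map` stands in for the Jackson `JsonNode` the real parser walks; the field names follow the table spec, but the class and method names are made up for this example and none of this is the actual `TableMetadataParser` code:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: a Map stands in for the parser's JsonNode.
final class SchemaParseSketch {
  static final int DEFAULT_SCHEMA_ID = 0;

  // One method parses the schema list, falling back to the legacy single "schema" field.
  @SuppressWarnings("unchecked")
  static List<String> parseSchemas(Map<String, ?> node) {
    if (node.containsKey("schemas")) {
      // "schemas" and "current-schema-id" must be present together
      if (!node.containsKey("current-schema-id")) {
        throw new IllegalArgumentException("schemas requires current-schema-id");
      }
      return (List<String>) node.get("schemas");
    }
    return List.of((String) node.get("schema"));
  }

  // A separate method returns the current schema ID, defaulting for legacy metadata.
  static int parseCurrentSchemaId(Map<String, ?> node) {
    if (node.containsKey("current-schema-id")) {
      if (!node.containsKey("schemas")) {
        throw new IllegalArgumentException("current-schema-id requires schemas");
      }
      return (Integer) node.get("current-schema-id");
    }
    return DEFAULT_SCHEMA_ID;
  }
}
```

Splitting this way keeps each return type simple at the cost of re-checking the present/absent pairing in both methods; a small holder class would avoid that duplication.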
/**
 * A wrapper around Schema to include its schema ID
 */
public class VersionedSchema implements Serializable {
The biggest question I have for this PR is whether we should have this wrapper class for adding the schema ID, or whether we should just add the schema ID to the `Schema` class itself, maybe as an optional `Integer`. Because Iceberg has its own Jackson parser for serde, it is not hard to ignore the `schemaId` field or set a default for it for backwards compatibility. What is the benefit of the `VersionedSchema` approach?
I mentioned this briefly in the PR description:

> The reason to not add ID directly into Schema is that currently schema creation is widely used as a convenient method for a lot of actions that don't involve a "real" table schema.

The main concern I had with adding the schema ID is that it might be confusing to tell from the code base when a schema should have an ID and when it should not. A schema only needs an ID when it is really a table schema strictly associated with table metadata; outside of that context, the ID is not useful in any "real" operation of the schema, especially considering that schemas are often used to conveniently mutate fields for things like projection. Maybe we should rename `VersionedSchema` to something like `TableSchema` to better reflect this?
I think I agree with @jackye1995 here. We could add the ID to `Schema` and just have it use either `null` or `0` if there is no schema ID. I think we can use `0`.

If there is no schema ID, then there will not be a `schemas` list, and none of the snapshots will have a schema ID. So if an older writer updates a table, then we can expect the current schema ID to be dropped, older schemas to be dropped, snapshot schema IDs to be dropped, etc. So it is consistent with having just one schema. That should be safe for v1 because we won't leave any dangling IDs that could reference the wrong schema later.
My comment above is thinking through your question 1 in the description. I think it is okay to do this as you have it here.
@Override
public int hashCode() {
  return Objects.hash(struct, aliasToId);
}
Why add `equals` and `hashCode` to `Schema`? The reason we didn't already implement them is that you can compare the type using `schema.asStruct().equals(...)`. And we don't want to define whether two schemas are "equal" based on the aliases, because the same schema can have different original aliases.
In the new change I introduced, we can reuse the old schema ID during a schema update if the exact same schema already exists in the schema list. I think in this case we may need to confirm that both schemas' aliases are the same? Or do you think there shouldn't be a case where the schemas are the same while the aliases are not?
I don't think that should use aliases. Aliases are populated when converting from a file schema to Iceberg, so that we can look up the file's column names from IDs and vice versa. That information is used by integration and not by Iceberg itself. The aliases are not serialized when writing a schema into Iceberg metadata.
For the purposes of maintaining the schema list, we can ignore aliases and just use `schema.asStruct()`.
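To make that intent concrete, here is a tiny stand-alone model (illustrative stand-in types, not Iceberg's real `Schema` API: a `List<String>` plays the role of `schema.asStruct()`):

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Illustrative stand-in for Iceberg's Schema: struct mimics asStruct(),
// aliasToId is the lookup populated when converting from a file schema.
final class SchemaLike {
  final List<String> struct;
  final Map<String, Integer> aliasToId;

  SchemaLike(List<String> struct, Map<String, Integer> aliasToId) {
    this.struct = struct;
    this.aliasToId = aliasToId;
  }

  // Identity for the schemas list: compare only the struct, never the aliases,
  // since the same schema can arrive with different original aliases.
  boolean sameSchema(SchemaLike other) {
    return Objects.equals(this.struct, other.struct);
  }
}
```

Two instances with the same struct but different aliases are the "same schema" for ID-reuse purposes under this rule.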
@@ -161,8 +163,19 @@ private static void toJson(TableMetadata metadata, JsonGenerator generator) thro
  generator.writeNumberField(LAST_UPDATED_MILLIS, metadata.lastUpdatedMillis());
  generator.writeNumberField(LAST_COLUMN_ID, metadata.lastColumnId());

  generator.writeFieldName(SCHEMA);
  SchemaParser.toJson(metadata.schema(), generator);
  // for older readers, continue writing the current schema as "schema"
Might want to note that this is done until v2, because support for `schemas` and `current-schema-id` is required in v2 and later.
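A sketch of that write-side rule, with a `Map` standing in for the JSON output (the field names follow the table spec, but the helper itself is hypothetical, not `TableMetadataParser`):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper showing which schema fields a writer emits per format version.
final class SchemaFieldsSketch {
  static Map<String, Object> schemaFields(int formatVersion, String currentSchema,
                                          List<String> schemas, int currentSchemaId) {
    Map<String, Object> out = new LinkedHashMap<>();
    if (formatVersion == 1) {
      // for older v1 readers, keep writing the current schema under the legacy "schema" key
      out.put("schema", currentSchema);
    }
    // required in v2 and later; harmless extra fields for v1 readers, which ignore them
    out.put("current-schema-id", currentSchemaId);
    out.put("schemas", schemas);
    return out;
  }
}
```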
My initial thought was no because the schema IDs are completely local to a metadata file. We can reassign all IDs from one file to the next and as long as they are internally consistent we are okay. The IDs are not embedded in separate files that might get improperly used. But, the Nessie comment you pointed to is a good one to think about. I think it is still okay because Nessie merges would either be fast-forward, or would re-apply changes and create entirely new metadata files. So the internal consistency argument still holds.
I don't think so.
I would add it to Snapshot, not just History. Snapshot is what will be used for time travel queries. I would avoid adding too much metadata to history.
I think we should follow up and keep the history. I think the reason for this is that we previously didn't have any compatibility across schemas (IDs were completely reassigned, so time travel would be incorrect). But that's fixed now.
Thank you for the review and the input! I'll create a separate issue to mention the history problem and link it here. Do you have any suggestions on whether we want to add the schema ID to the history table?
I don't think that we want to add it to history. The snapshot log should refer to snapshots by ID, and we can get the schema easily enough from there. For the actual metadata table, we don't want to expose the schema ID because that's something that users should ideally never need to know much about. We've avoided adding IDs in other similar situations. For example, you can rewrite manifests for a specific partition spec, by ID. But we didn't include that in the stored procedure because we haven't exposed the partition spec ID anywhere to SQL users.
The code looks good to me. I am testing locally with this branch to see if there are any other potential issues, and will reply back after that.
@@ -157,6 +157,21 @@ public static Schema assignFreshIds(Schema schema, NextID nextId) {
        .fields());
  }

  /**
   * Assigns fresh ids from the {@link NextID nextId function} for all fields in a schema.
nit: function should be outside the bracket
Hmm, actually I think either way may make sense? We either link `nextId function` or just `nextId` to the `NextID` interface, and I think both are valid interpretations. I'm also a bit hesitant to change this since I copied it from another method, and there are a few others that use this style.
@@ -134,6 +143,10 @@ static void toJson(Type type, JsonGenerator generator) throws IOException {
    }
  }

  public static void toJsonWithVersion(Schema schema, JsonGenerator generator) throws IOException {
nit: should this be called `toJsonWithId`, just to match the `fromJsonWithId` below?
Good point, will update
Works fine for me so far; I tested a few operations among a few clusters of different versions. One behavior I do see: if an old v1 writer is somehow still used to create a new table version after a v2 writer was used, all schema history will be dropped, so a new schema written by the v2 writer will have a schema ID starting from 0 again, which can break time travel queries. In theory, once a table is written by a v2 writer it should never be touched by a v1 writer, but we cannot prevent people from doing that in real life, especially when the organization is large and people use different versions of the same software all the time. So I feel we still need something like a
Thanks for putting effort into testing this! I think the problem you described seems similar to my question 1, and I think we might be fine with that, since the IDs within the metadata file should always be consistent and we don't expose them in metadata tables, as Ryan mentioned above. I think it wouldn't break time travel queries: after the old writer writes the data, the schema ID associated with the snapshot gets dropped, in which case we will likely need to rely on #1508 for time travel?
I see, that makes sense.
@@ -54,6 +58,7 @@
  private transient Map<Integer, String> idToName = null;

  public Schema(List<NestedField> columns, Map<String, Integer> aliases) {
    this.schemaId = 0;
I think we should add a constant for the default schema ID. That way we can reference it in other places when we fill in the default.
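For example (a sketch of the constant-plus-constructor-chaining pattern, not the real `Schema` class), the no-ID constructor could delegate through a named default:

```java
import java.util.List;

// Sketch only: illustrates referencing a named default instead of a bare 0.
class SchemaSketch {
  static final int DEFAULT_SCHEMA_ID = 0;

  private final int schemaId;
  private final List<String> columns;

  SchemaSketch(List<String> columns) {
    // chain through the constant so other call sites can reference it too
    this(DEFAULT_SCHEMA_ID, columns);
  }

  SchemaSketch(int schemaId, List<String> columns) {
    this.schemaId = schemaId;
    this.columns = columns;
  }

  int schemaId() {
    return schemaId;
  }
}
```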
@@ -134,6 +143,10 @@ static void toJson(Type type, JsonGenerator generator) throws IOException {
    }
  }

  public static void toJsonWithId(Schema schema, JsonGenerator generator) throws IOException {
This is only called in `TableMetadataParser`, and I don't see why there should be two `toJson` methods. It isn't a problem for the v1 metadata to be written with the schema ID because extra fields are ignored. So I think this method isn't needed, and `toJson` should always pass the ID to the struct writer.
    toJson(struct, null, generator);
  }

  static void toJson(Types.StructType struct, Integer schemaId, JsonGenerator generator) throws IOException {
Looks like the struct `toJson` methods aren't called outside of this file. What do you think about making them private?
@@ -253,4 +266,15 @@ public static Schema fromJson(String json) {
      }
    });
  }

  public static Schema fromJsonWithId(int schemaId, JsonNode json) {
I'd prefer if all of the parsing methods returned the correct ID instead of having some that use a version that ignores the ID.
I think you chose to use two methods because there are some situations where the ID wasn't written and you don't want the schema to have the wrong ID. But, because the `fromJson` method uses the `Schema` constructor that does not pass the ID, the schemas will have the default ID, 0, anyway.
As long as the schemas have an ID, I think there is no need to have multiple parser methods, both with and without ID. Also, we need to make sure that the IDs are consistent when reading the schema from files that were written before IDs. That should just be manifest files, where the schema is embedded in the header.
For manifests, we need to add the schema ID when writing (https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/ManifestWriter.java#L190). And when reading, we will need to update how the spec is constructed. If the spec is loaded by ID then it's fine. Otherwise we should load the schema by ID to parse the spec, and if that doesn't work we should parse both the schema and the spec (https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/ManifestReader.java#L115-L120). I think we should probably also try to take the parsed schema and find the correct one from table metadata so that we don't have an incorrect ID anywhere.
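The read-side fallback described here could look roughly like this (a hypothetical helper with made-up names; strings stand in for parsed schemas, and the real logic lives in `ManifestReader`):

```java
import java.util.Map;

// Hypothetical sketch of the manifest-read fallback: prefer looking the schema
// up by the ID stored in the manifest header; otherwise fall back to the schema
// embedded in the header itself.
final class ManifestSchemaSketch {
  static String resolveSchema(Map<Integer, String> tableSchemasById,
                              Integer headerSchemaId, String embeddedSchema) {
    if (headerSchemaId != null && tableSchemasById.containsKey(headerSchemaId)) {
      return tableSchemasById.get(headerSchemaId);
    }
    // older manifests carry no schema ID; the embedded schema could then be
    // matched against table metadata to recover the correct ID
    return embeddedSchema;
  }
}
```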
I have removed `fromJsonWithId` and updated `fromJson` to use the constructor with a schema ID if the ID is present in the JSON blob, or to default to the constructor that doesn't require an ID. Please let me know if this is what you expected!
Regarding persisting the schema ID in manifests: since schema and partition-spec are required in manifest metadata in both v1 and v2, I think it is unlikely that we would be missing the schema itself but have a schema ID for lookup. And I think the manifest reader only uses the schema for constructing the partition spec, so having a correct schema ID might not be strictly necessary.
Today there are a lot of places that rely on toJson/fromJson for schema, e.g. Avro metadata, Hive table properties, scan tasks. In this PR, except for the table metadata parser, those all use the `toJson` version that does not write the schema ID at all, which means that when the strings are read back into a `Schema` object they will always have an ID of 0. This goes back to the question above about persisting the schema ID: since we do not persist the ID in the manifest header, we are not persisting an incorrect ID, but instead constructing a schema without an ID that defaults to 0, like everywhere else.
My original thinking was that the schema ID should always be ignored/considered inaccurate unless we get it directly from table metadata, since after all kinds of projections it might be hard to track whether the ID is correct. And the ID is only used for lookup, so if we have the schema struct it's fine to have an incorrect ID, since we are unlikely to use it. But I might be overcomplicating the problem. Do you think we are able to guarantee the ID is right everywhere, and do you have comments on the current state of the `toJson` variations (only the one used by the metadata parser writes out the ID)?
      lastColumnId.get(), freshSchema, INITIAL_SPEC_ID, ImmutableList.of(freshSpec),
      freshSpec.lastAssignedFieldId(), freshSortOrderId, ImmutableList.of(freshSortOrder),
      lastColumnId.get(), freshSchema.schemaId(), ImmutableList.of(freshSchema),
      INITIAL_SPEC_ID, ImmutableList.of(freshSpec), freshSpec.lastAssignedFieldId(),
I like that you use `freshSchema.schemaId()` instead of using `INITIAL_SCHEMA_ID` again. Should we do the same for `INITIAL_SPEC_ID`?
  private int reuseOrCreateNewSchemaId(Schema newSchema) {
    // if the schema already exists, use its id; otherwise use the highest id + 1
    int newSchemaId = currentSchemaId;
Shouldn't this be `currentSchemaId + 1`?
In the for loop below, it checks whether each existing schema's ID is greater than or equal to `newSchemaId` and bumps `newSchemaId` to that ID + 1 if so, so I think at the end of the method it will be `currentSchemaId + 1` if no reuse happens?
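A stand-alone sketch of that reuse-or-assign behavior (strings stand in for `schema.asStruct()`; this models the logic being discussed, not the actual `TableMetadata` method):

```java
import java.util.List;
import java.util.Objects;

// Illustrative model: Versioned pairs a schema ID with a struct stand-in.
final class SchemaIdSketch {
  record Versioned(int id, String struct) {}

  // If an identical schema already exists, reuse its ID;
  // otherwise end up at the highest existing ID + 1.
  static int reuseOrCreateNewSchemaId(List<Versioned> schemas, int currentSchemaId, String newStruct) {
    int newSchemaId = currentSchemaId;
    for (Versioned s : schemas) {
      if (Objects.equals(s.struct(), newStruct)) {
        return s.id();
      } else if (s.id() >= newSchemaId) {
        newSchemaId = s.id() + 1;
      }
    }
    return newSchemaId;
  }
}
```

Since the current schema is itself in the list, the loop bumps `newSchemaId` past it, so a genuinely new schema lands at the highest ID + 1 even though the variable starts at `currentSchemaId`.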
    }
  }

  private int reuseOrCreateNewSchemaId(Schema newSchema) {
Can you move this down to the bottom with other private helper methods?
  }

  ImmutableList.Builder<Schema> builder = ImmutableList.<Schema>builder()
      .addAll(schemas);
Can this be on the previous line?
      defaultSortOrderId, sortOrders, properties,
      currentSnapshotId, snapshots, snapshotLog, addPreviousFile(file, lastUpdatedMillis));
      defaultSortOrderId, sortOrders, properties, currentSnapshotId, snapshots, snapshotLog,
      addPreviousFile(file, lastUpdatedMillis));
It looks like this change wasn't needed and just moves 3 variables?
Sorry about these; I got confused about which changes were mine when resolving merge conflicts. I'll be more careful next time.
      snapshots, newSnapshotLog, addPreviousFile(file, lastUpdatedMillis));
      lastSequenceNumber, System.currentTimeMillis(), lastColumnId, currentSchemaId, schemas, defaultSpecId, specs,
      lastAssignedPartitionId, defaultSortOrderId, sortOrders, properties, currentSnapshotId, snapshots,
      newSnapshotLog, addPreviousFile(file, lastUpdatedMillis));
Did `snapshots` need to move?
      snapshots, snapshotLog, addPreviousFile(file, lastUpdatedMillis));
      lastSequenceNumber, System.currentTimeMillis(), lastColumnId, currentSchemaId, schemas, defaultSpecId, specs,
      lastAssignedPartitionId, defaultSortOrderId, sortOrders, properties, currentSnapshotId, snapshots, snapshotLog,
      addPreviousFile(file, lastUpdatedMillis));
Looks like it isn't necessary to move `snapshots` and `snapshotLog`.
      snapshots, snapshotLog, addPreviousFile(file, lastUpdatedMillis));
      lastSequenceNumber, System.currentTimeMillis(), lastColumnId, currentSchemaId, schemas, defaultSpecId, specs,
      lastAssignedPartitionId, defaultSortOrderId, sortOrders, properties, currentSnapshotId, snapshots, snapshotLog,
      addPreviousFile(file, lastUpdatedMillis));
Same here. Could you remove unnecessary reformatting?
  Preconditions.checkArgument(formatVersion == 1,
      "%s must exist in format v%s", SCHEMAS, formatVersion);

  currentSchemaId = TableMetadata.INITIAL_SCHEMA_ID;
I think this should be `schema.schemaId()` instead of referencing the constant.
  List<Schema> schemas;
  int currentSchemaId;
  Schema schema = null;
Why set this to null but not the two others?
This is because in the first if statement, after parsing the schema array and assigning the schema, I do a check to ensure that the schema holding the current schema ID exists in the schema array. Without this null initialization, the `Preconditions.checkArgument(schema != null, ...)` line is flagged with the error "schema variable might not be initialized".
      snapshots, entries.build(), metadataEntries.build());
      lastSequenceNumber, lastUpdatedMillis, lastAssignedColumnId, currentSchemaId, schemas, defaultSpecId, specs,
      lastAssignedPartitionId, defaultSortOrderId, sortOrders, properties, currentVersionId, snapshots,
      entries.build(), metadataEntries.build());
`snapshots` doesn't need to move.
@@ -100,8 +101,15 @@ public void testJsonConversion() throws Exception {
      .add(new SnapshotLogEntry(currentSnapshot.timestampMillis(), currentSnapshot.snapshotId()))
      .build();

  Schema schema = new Schema(6,
      Types.NestedField.required(10, "x", Types.StringType.get()));

Nit: extra newline.
@@ -78,4 +82,21 @@ public void testSerialization() throws Exception {
        Lists.transform(result.snapshots(), Snapshot::snapshotId));
    Assert.assertEquals("History should match", meta.snapshotLog(), result.snapshotLog());
  }

  private void assertSameSchemaList(List<Schema> list1, List<Schema> list2) {
Is it possible to factor this into a common util rather than copying it?
I didn't find a test util class in core, but it looks like there's a `TestHelpers` class in the api module, so I'll put it there.
Thanks @yyanyy! I think this is getting close. The main thing is not exposing extra id-aware methods in the parser, and also reading/writing manifests with IDs. Adding schema ID to snapshots will be a follow-up, right?
Thanks for the review, and sorry for the delayed response! And yes, adding the schema ID to snapshots will be a follow-up PR.
  TableMetadata freshTable = TableMetadata.newTableMetadata(
      schema, PartitionSpec.unpartitioned(), null, ImmutableMap.of());
  Assert.assertEquals("Should use TableMetadata.INITIAL_SCHEMA_ID for current schema id",
      TableMetadata.INITIAL_SCHEMA_ID, freshTable.currentSchemaId());
Minor: the schema passed in uses the same ID as the one that is validated (`INITIAL_SCHEMA_ID = 0`). I think to test this, it would be better to pass in a schema with a non-zero ID.
Do you mean changing the schema ID supplied to `TableMetadata.newTableMetadata` to be non-zero? Actually, in `TableMetadata` the input schema ID is ignored and always assigned from 0, so changing this would break the test... Although I think the current behavior should be fine, since this method is only called when creating a new table, so there shouldn't be any existing schema in the table that could collide on schema ID.
  );
  TableMetadata sameSchemaTable = twoSchemasTable.updateSchema(sameSchema2, 2);
  Assert.assertEquals("Should return same table metadata",
      twoSchemasTable, sameSchemaTable);
Should this use `assertSame`, since the object must be the same `TableMetadata` instance and not just equal to it?
      Types.NestedField.required(4, "x", Types.StringType.get()),
      Types.NestedField.required(6, "z", Types.IntegerType.get())
  );
  TableMetadata threeSchemaTable = revertSchemaTable.updateSchema(schema3, 3);
Nit: the last assigned ID should be 6 since `z` has ID 6.
Looks great, thanks for updating this, @yyanyy! There were a few minor comments in tests, but we can pick up those fixes later. I think overall this looks great!
A couple of minor nits. I know this has been merged, but I understand that you're planning a follow-up to address nits in the tests that Ryan Blue pointed out.
      SEQ_NO, System.currentTimeMillis(), 3,
      7, ImmutableList.of(TEST_SCHEMA, schema),
      5, ImmutableList.of(SPEC_5), SPEC_5.lastAssignedFieldId(),

Nit: extraneous blank line.
@@ -223,6 +237,7 @@ public static String toJsonWithoutSpecList(TableMetadata metadata) {
  generator.writeNumberField(LAST_UPDATED_MILLIS, metadata.lastUpdatedMillis());
  generator.writeNumberField(LAST_COLUMN_ID, metadata.lastColumnId());

  // mimic an old writer by writing only schema and not the current ID or schema list
  generator.writeFieldName(SCHEMA);
  SchemaParser.toJson(metadata.schema(), generator);
I think that to properly test backward compatibility, the schema in this `schema` field should not have a `schema-id`, so this should be `SchemaParser.toJson(metadata.schema().asStruct(), generator);`
This PR adds `current-schema-id` and `schemas` to table metadata. It also introduces a wrapper around schema to associate a table schema with its ID. The reason not to add the ID directly into `Schema` is that currently schema creation is widely used as a convenient method for a lot of actions that don't involve a "real" table schema.

Next steps:
- add `schema-id` to the snapshot interface and populate `schemas` to look up the right schema in time travel queries; this may mean adding `schemas()` to the `Table` API
- `historyTable` (will be mentioned later)

Open questions:
1. In the `historyTable` metadata table, at different times ID 0 could mean different things. We could potentially work around this by only exposing the `schemaID` field in `historyTable` for v2 tables, or mention this caveat in the spec.
2. Do we want to add `last-assigned-schema-id` to table metadata? My answer would be yes, for a similar reason mentioned in this comment.
3. Do we want to add `snapshot-id` to only history entries, or to the `Snapshot` interface? I think the former is enough for fixing the time travel query problem, but we might want the latter in case we need this information later.
4. `snapshotLog` will be reset to empty (second to last argument in here). Is this expected? Do we want to fix this as a separate issue?