Core: add row key to format v2 #2354
Conversation
 * Iceberg itself does not enforce row uniqueness based on this identifier.
 * It is leveraged by operations such as streaming upsert.
 */
public class RowIdentifier implements Serializable {
I have considered some alternatives for the name, and discarded the following:
- primary key: used in db systems and implies enforcing uniqueness, which does not fit the Iceberg use case
- upsert id: too specific to the upsert use case
- row id: conflicts with java.sql.RowId, and gives a feeling of referring to a specific row
- default row id: defaultXxx is used a lot in metadata; having a class with a "default" prefix makes code in table metadata confusing.
Yeah, Row identifier is the correct semantic to express. One minor thing for me: the name seems a bit long as part of the table spec; how about using RowKey as the name?
RowKey sounds good to me, I will update based on that!
 * 1. a required column in the table schema
 * 2. a primitive type column
 */
public class RowIdentifyField implements Serializable {
The field only contains the source column id. Technically we could directly make fields() an Integer[], but because this will be a public API it would be hard to change if we want to amend it in the future, so I decided to still have this separate class for the individual field information. This also aligns better with the structure of partition spec and sort order.

In SortOrder, I see transform is also added as part of the field, but it seems that only the identity transform is permitted. It is the same situation here, so maybe we can also add transform here if anyone thinks there might be a use case for it.
I agree that it's extensible to introduce a separate RowIdentifyField class here. For the transform function, currently I don't see any requirement for it on these row identifier fields, but I'm OK with making it extensible for future usage.
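For illustration, a minimal sketch (not the PR's exact code) of such a single-field wrapper; keeping it as a class, like PartitionField and SortField, leaves room to add attributes such as a transform later without breaking the public API:

import java.io.Serializable;

public class RowIdentifyField implements Serializable {
  // id of the source column in the table schema; more attributes (e.g. a transform)
  // could be added here later without changing the public fields() signature
  private final int sourceId;

  RowIdentifyField(int sourceId) {
    this.sourceId = sourceId;
  }

  public int sourceId() {
    return sourceId;
  }

  @Override
  public String toString() {
    return "row_identify(" + sourceId + ")";
  }
}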
@@ -152,6 +154,7 @@ public static String toJson(TableMetadata metadata) {
    }
  }

  @SuppressWarnings("checkstyle:CyclomaticComplexity")
I will do another PR to fix all the code complexity complaints in this class after this is merged.
Thanks for working on this, @jackye1995! I'll take a look.
  RowIdentifyField(Types.NestedField column) {
    ValidationException.check(column.isRequired(),
        "Cannot add column %s to row identifier because it is not a required column", column);
Nit: in this RowIdentifyField constructor, we throw an exception whose message says Cannot add column ..., which seems unreasonable because we are mixing column adding and instance construction together. How about moving those validation checks out of this constructor?
makes sense, will update
@@ -111,6 +117,11 @@ static TableMetadata newTableMetadata(Schema schema,
    int freshSortOrderId = sortOrder.isUnsorted() ? sortOrder.orderId() : INITIAL_SORT_ORDER_ID;
    SortOrder freshSortOrder = freshSortOrder(freshSortOrderId, freshSchema, sortOrder);

    // rebuild the row identifier using the new column ids
    int freshRowIdentifierVersion = rowIdentifier.isNotIdentified() ?
        rowIdentifier.rowIdVersion() : INITIAL_ROW_IDENTIFIER_VERSION;
If the RowIdentifier is not identified, then the row-id-version should be 0 rather than 1?
Yes, because not identified is the default, which has ID 0. Otherwise the row key should have ID 1 in this method, and then get merged with the current metadata in the create/replace table transaction.
(force-pushed 359bb24 to a79ccf5)
      // reassign all row keys with fresh column IDs.
      Types.NestedField column = schema.findField(columnName);
      Preconditions.checkNotNull(column,
          "Cannot find column in the fresh schema. name: %s, schema: %s", columnName, schema);
      builder.addField(column.fieldId());
Why not just use builder.addField(columnName) here? Builder#addField will validate the existence of the column internally, so we don't have to check it again here.
good point, updated
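For context, a hedged sketch of what the simplified call site could look like once the builder does the lookup itself (the builderFor factory and the variable names here are assumptions mirroring SortOrder, not the merged code):

// rebuild the row key against the fresh schema; addField(name) resolves the column itself
// and throws ValidationException if it is missing, so no separate null check is needed
RowKey.Builder builder = RowKey.builderFor(freshSchema);
for (String columnName : rowKeyColumnNames) {
  builder.addField(columnName);
}
RowKey freshKey = builder.build();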
 * Notice that the order of each field matters.
 * 2 keys with the same set of fields but different order are viewed as different.
 * The fields of the key should ideally be ordered based on the importance of each field
 * to be leveraged by features like secondary index.
This comment looks great, thanks for the doc.
    public Builder addField(String name) {
      Types.NestedField column = schema.findField(name);
      ValidationException.check(column != null, "Cannot find column with name %s in schema", name);
How about appending the schema string at the end of the error message so that we can easily find out what's wrong when we encounter the validation exception?
sounds good to me
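As a rough sketch (assumed method shape and field names, not the merged code), the builder-side check could carry the schema in the message and keep all validation out of the field constructor, as suggested above:

    public Builder addField(String name) {
      Types.NestedField column = schema.findField(name);
      // include the schema in the message so a failed validation is easy to diagnose
      ValidationException.check(column != null,
          "Cannot find column with name %s in schema: %s", name, schema);
      ValidationException.check(column.isRequired(),
          "Cannot add column %s to row key because it is not a required column, schema: %s", name, schema);
      ValidationException.check(column.type().isPrimitiveType(),
          "Cannot add column %s to row key because it is not a primitive type column, schema: %s", name, schema);
      fields.add(new RowIdentifyField(column.fieldId()));
      return this;
    }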
I like this PR, just left several comments. We may need to change this title to
        properties, currentSnapshotId, snapshots, snapshotLog, addPreviousFile(file, lastUpdatedMillis));
  }

  public TableMetadata updateRowKey(RowKey newKey) {
We will need to introduce a public Iceberg table API to update the RowKey specification, right? Maybe we could file a separate issue to address this; we can publish the interface in that issue.
Yes, I have a separate PR for that after this is out.
      // reassign all row keys with fresh column IDs.
      Types.NestedField column = schema.findField(columnName);
      Preconditions.checkNotNull(column,
          "Cannot find column in the fresh schema. name: %s, schema: %s", columnName, schema);
We don't have to do the null check here, because builder.addField(columnName) will do the check inside its implementation. (I'm sorry I did not describe this clearly in the last comment.)
      return writer.toString();

    } catch (IOException e) {
      throw new RuntimeIOException(e);
In Iceberg we prefer to use UncheckedIOException rather than the deprecated RuntimeIOException, right?
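For reference, a small sketch of the suggested swap (the message text is illustrative; it needs the java.io.IOException and java.io.UncheckedIOException imports):

// same method body, but wrapping the IOException in java.io.UncheckedIOException
// instead of the deprecated RuntimeIOException
try {
  // ... write the row key JSON into writer ...
  return writer.toString();
} catch (IOException e) {
  throw new UncheckedIOException("Failed to write json for row key", e);
}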
I checked this RowKey specification twice again; in my opinion it's good enough to get merged, but since it's a basic specification for the Iceberg table format, I'd like to invite @rdblue and @aokolnychyi to double-check. I would appreciate it if you two could take a look when you have time, thanks.
@jackye1995 and @openinx, I have a few questions about this before I'm comfortable merging it. Thanks for working on this so far!

Why do we need to track multiple versions of the row identifier like we do for schema, partition spec, and sort order? I think of this as the "fields that identify a row". Is it helpful to have more than one view of how rows are identified? To answer that, we need to consider whether two versions are ever valid at the same time, and how row IDs are going to evolve over time:

I think both of those operations only require setting the current way of identifying rows, not keeping track of the previous ways. I'm interested to hear what everyone thinks about that and whether there is agreement. If I'm correct, then I would probably not keep track of multiple versions here. If I'm not, then I think we should ask whether the row ID columns should be tracked in the schema itself rather than separately versioned, since they will probably change at the same time the schema does -- when adding a new column that is now part of the identifier.

It would be great to hear from @aokolnychyi on this as well.
I think we should keep track of multiple versions in the Apache Iceberg schema. Let's discuss the case you described: adding profile_id to a table previously identified by only account_id.

t1: User defines the

In my opinion, the Iceberg table format's row identifier specification is introduced because we expect to support the standard SQL PRIMARY KEY clause ( CREATE TABLE sample(id INT, data STRING, PRIMARY KEY (id) NOT ENFORCED); ).

Back to the above case, at the timestamp
@openinx for your example, my understanding is that at t3, the equality delete file would have a row like

@rdblue I actually mostly agree with what you mention, as I don't see why the example mentioned by openinx would not work, but maybe I missed something there. But I decided to go with the versioned approach because I think it can potentially be used to provide some uniqueness guarantee at read time in the future by merging rows, given the fact that now we basically have a primary key concept through RowKey and a sort key concept through SortOrder. And at that time, we will need this information to be present in the specific snapshot that we time travel to.
@openinx, I think that @jackye1995 is right about how the case you described would be encoded. The delete files themselves always encode what columns are used for the equality delete. There is no requirement that a delete file's delete columns match the table's row identifier fields. That's one reason why we can encode deletes right now, before we've added the row identifier tracking. That also enables deleting rows by different fields than the row identifier fields, which is what makes the evolution case possible. The row identifier fields are related to deletes only in that in situations where we don't have explicit delete columns in the operation, we can default the delete columns to the row identifier fields. That's to support the

From @jackye1995's second comment, I think there is at least some agreement that the row identifier columns don't need to be tracked over time. That's because there is no way to go back to an older snapshot and then manipulate that data. Time travel is read-only and data manipulation is always applied to the current snapshot, so it is reasonable that there is only ever one version of the row identifier that matters: the one that is configured at the start of the operation. Before moving ahead with this, I think we should simplify it and remove the versioning.

I'm also wondering about the field ordering mentioned in the code. Is that relevant? I think of the row identifier fields as unordered and simply used to produce a projection of the table schema that is a row identifier, in whatever field order the schema had. So I would model this as an unordered set of IDs rather than as an ordered collection.
@jackye1995, I don't think that the row identifier fields are strongly related to a clustering index. Records in an Iceberg table are physically sorted according to some order used at write time, and also distributed by a hash or ordered distribution. That logic is independent. And a future index type that has some order for its contents would probably be separately configured and would also not use the row identifier fields.
@rdblue sounds good to me, I will make the update, thank you!
@jackye1995 @rdblue Yes, I can understand that we've maintained the equality field ids inside each equality delete file, and iterating records from an old snapshot should be correct because we've considered the apply process for different equality field ids. But in this comment, I'm not discussing data correctness (the data correctness has no problem). I mean people should be able to read the old

Another case is rolling back to an older snapshot by replacing the latest snapshot. Though we currently do not support replacing schema/partition-spec/sort-order with the old ones, in my mind I think we'd better do so, because it provides a uniform view of data files, schema and other table metadata at the old timestamp

That's why I recommended tracking the multiple versions of
@rdblue Yes, if we only consider the records time-travel among different snapshots, the old versions of

People modified the

My question is: after reverting the table to

Currently, our implementation is the first one; that means people will need to manually change the
I think I see the miscommunication. I don't think there is a way to roll back to t3. There is a snapshot created at t2, t3, and t5. Those snapshots are accessible via time travel and rollback. The rest of the table metadata is independent, so rolling back doesn't change it. To revert both the bad write and the configuration change, the user should roll back and then set the row identifier fields to just

Keeping table metadata and data separate (and only versioning data) is the right behavior, I think. Data is constantly evolving and we don't want to accidentally revert metadata changes -- like updating table properties -- when the data snapshot is rolled back. Consider a slightly different scenario where the rollback to t3 was needed because the source was producing bad data. Why should the

I think the right approach is to keep data a separate dimension. Since we want Iceberg to be a coordination layer between multiple services that don't know about one another, I think it would be bad for actions that fix data to also make possibly unknown changes to metadata.
Let me catch up with the discussion today.
I support the idea of a row identifier as long as Iceberg does not enforce it. I see its primary usage in

I also think it is important not to limit equality deletes to the row identifier alone, which is currently handled by the spec as each delete file is associated with arbitrary column ids. We plan to leverage it in some MERGE INTO use cases, where we can derive the delete column from the ON clause and merge columns can vary from operation to operation.

W.r.t. versioning, I'd go simple. I think the current rollback semantics applies only to snapshots. We don't revert table properties or sort order. I believe we should treat row identifiers in the same way. That said, @openinx's use case is also valid. I have seen scenarios when users want to roll back the table state completely rather than just the current snapshot. I think that should be done by replacing the current pointer in the catalog to an old JSON file rather than by calling the table rollback API. Do we want to expose ways for rolling back table state to the users? I think that may be handy and should cover the use case that @openinx brought up.
 */
public class RowKey implements Serializable {

  private static final RowKey NOT_IDENTIFIED = new RowKey(null, 0, ImmutableList.of());
Guava collections are not Kryo friendly. We have to be careful.
That's a great point, because we've suffered from Kryo serialization issues here: https://github.com/apache/iceberg/pull/2343/files. Maybe we could provide a unit test to cover the RowKey Kryo serialization case (similar to TestDataFileSerialization).
cool, will do that
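A minimal sketch of what such a Kryo round-trip test might look like (the test name, schema, and the RowKey builder call are assumed here, loosely mirroring TestDataFileSerialization); without serializers registered for Guava's immutable collections the read side can fail, which is exactly the regression this kind of test is meant to catch:

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import org.apache.iceberg.types.Types;
import org.junit.Assert;
import org.junit.Test;

// sketch only: assumed to live next to RowKey in org.apache.iceberg, so Schema and RowKey need no import
public class TestRowKeySerialization {

  private static final Schema SCHEMA = new Schema(
      Types.NestedField.required(1, "id", Types.LongType.get()),
      Types.NestedField.optional(2, "data", Types.StringType.get()));

  @Test
  public void testKryoRoundTrip() {
    // assumed builder API, mirroring SortOrder.builderFor(schema)
    RowKey key = RowKey.builderFor(SCHEMA).addField("id").build();

    Kryo kryo = new Kryo();
    kryo.setRegistrationRequired(false);

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (Output out = new Output(bytes)) {
      kryo.writeClassAndObject(out, key);
    }

    // reading back may fail on Guava immutable collections unless extra serializers are registered,
    // which is the failure mode this test is meant to surface
    try (Input in = new Input(new ByteArrayInputStream(bytes.toByteArray()))) {
      RowKey copy = (RowKey) kryo.readClassAndObject(in);
      Assert.assertEquals("Row key should survive a Kryo round trip", key, copy);
    }
  }
}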
 * Iceberg itself does not enforce row uniqueness based on this key.
 * It is leveraged by operations such as streaming upsert.
 */
public class RowKey implements Serializable {
I don't have a strong opinion but I'd go for RowIdentifier or RowId. I think Key means uniqueness but I'll be fine this way too as long as we agree Iceberg does not ensure uniqueness.
Key does not always mean uniqueness in my mind. From the MySQL documentation, an index can be created on top of key columns, and the index can be either unique or non-unique.
I think we have gone back and forth on this naming, and for now I would prefer Key because Id is heavily used in table metadata for concepts such as spec-id, schema-id, order-id, etc., which are the increasing IDs of different specs. Using a different keyword, Key, would provide more clarity in the table metadata.
I agree with Jack's logic that "id" is typically used in Iceberg to refer to a numeric identifier. It would be odd to use RowId, especially given the overlap with the JDBC one. But we have had a significant number of people that find "key" confusing when it is a non-unique "key".

What about shifting the focus from the "key" or "identifier" to the fields? We could use identifier-field-ids to hold the collection and add identifierFieldIds to APIs.
I renamed class RowKeyField to RowKeyIdentifierField, and fields to identifier-fields in metadata. Please let me know if that feels better.
Personally, I don't have a strong opinion about RowKeyField vs. RowKeyIdentifierField. I'm okay with whichever one you think is good.
To be more clear, I don't think we should ignore the near consensus from our sync discussion that "key" is misleading. I think we should instead call this class IdentityFields (or something similar) and store identifier-field-ids in table metadata.
Okay, I think everyone has reached a consensus on this issue:
- As a common Iceberg table specification, the row identifier doesn't have to be enforced. (I've left a comment here.)
- I don't know much about this point; I guess you may want to use the row identifier to achieve some optimizations at the Spark engine level. Can you provide more information?

@jackye1995, I think we could update this PR now, thanks for the great work
+1 on this
sounds good, will do it now.
(force-pushed e88c211 to 8acb134)
@@ -22,6 +22,7 @@
import java.util.List;
import java.util.Map;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.RowKey;
Do we need to introduce a createTable method that accepts a RowKey in this interface? I raise this question because while supporting the Flink SQL primary key, I found such a method necessary. Of course, we could publish a separate PR for this if needed.
Opening a separate issue for this is good enough for me now.
That's a good point. I thought about that, and there were 2 reasons that led me to decide not to add it:
- Currently neither the Spark nor the Hive SQL specification allows a primary key clause. I would expect the user to run something like ALTER TABLE ... ADD ROW KEY (actual syntax TBD) to add this row key in Spark.
- It's now becoming hard to iterate through all the combinations of parameters in createTable; we do not have the methods with SortOrder as a parameter yet, although it can be mapped to the SORTED BY clause. I would prefer us to switch to using the table builder if possible instead of adding all those overloading methods.
Okay, I agree it's good to use TableBuilder for createTable in future usage, as we are introducing more and more arguments when creating a table.
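For illustration, roughly what the builder-based path looks like; Catalog#buildTable and the chained with* methods are the existing Catalog API, while the commented withRowKey call is purely hypothetical and would come from a follow-up PR:

// assumes an existing catalog, schema, partition spec and sort order are in scope
Table table = catalog.buildTable(TableIdentifier.of("db", "sample"), schema)
    .withPartitionSpec(spec)
    .withSortOrder(sortOrder)
    // .withRowKey(rowKey)  // hypothetical follow-up, not part of this PR
    .withProperty("format-version", "2")
    .create();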
@openinx, what I meant is that we can figure out the upsert columns from the ON condition of MERGE INTO.
That's different from upsert use cases where we don't have the upsert columns in the command itself.
I think this is now unrelated to this PR, but since discussion is happening here I want to mention it:
I don't agree with the direction of adding an API to roll back the JSON file itself. That approach discards relevant history, like the fact that after
If we want to support this use case, then I think we need to make an API that will roll back a table to some point in time. That would roll back the snapshot (preserving the snapshot log) and revert metadata changes. We could do this by having a rollback API that actually uses a transaction to make multiple different changes. I've been thinking about updating the

There's a bit more discussion that should happen here, but I'm open to the transaction approach. I just don't think rolling back the metadata file instead of moving forward and keeping history is a good idea.
RowKey actualKey = identifiedByX.rowKey();
Assert.assertEquals("Row key must have 1 field", 1, actualKey.identifierFields().size());
Assert.assertEquals("Row key must have the expected field",
    org.apache.commons.compress.utils.Sets.newHashSet(1),
nit: wrong import? Also, it may be worth confirming that "x" in the schema is indeed assigned id 1.
oh yeah, IntelliJ keeps giving me this; I fixed a few and missed this one, thanks
@@ -482,6 +506,7 @@ public TableMetadata updateSchema(Schema newSchema, int newLastColumnId) {
    // rebuild all of the partition specs and sort orders for the new current schema
    List<PartitionSpec> updatedSpecs = Lists.transform(specs, spec -> updateSpecSchema(newSchema, spec));
    List<SortOrder> updatedSortOrders = Lists.transform(sortOrders, order -> updateSortOrderSchema(newSchema, order));
    RowKey updatedRowKey = updateRowKeySchema(newSchema, rowKey);
Should we fail if a schema update drops a column defined in the row key? We probably want to add a test for that too.
That's a good point. I was thinking about having another PR after this one for blocking column drops for columns in the row key; do you think it's better to have it here?
I think it would be a relatively small code change in this class and some testing which might be easy to include in this same PR, unless you are thinking about something different?
I was thinking about the following: one PR for the update-related changes, including the update row key API, its implementation, and this fix in schema update; one PR for the Spark SQL extension work; and one PR in Flink for the primary key clause (if no one has done it yet).
@jackye1995 your plan looks great!
@rdblue I think all other people are fine with the PR, please let me know if there are any other comments.
LGTM
@rdblue, please take a final look at this specification if you have a chance. It would be great if we could reach consensus on this PR and merge it into the master branch. (Some other PRs are blocked by this one.)
 * 1. a required column in the table schema
 * 2. a primitive type column
 */
public class RowKeyIdentifierField implements Serializable {
This seems to be a class that wraps a single ID. Could we get rid of it? Instead, IdentifierFields could expose idSet or ids that returns Collection<Integer>.
@jackye1995, looking at this, I think that we can simplify it quite a bit since we are no longer tracking multiple versions of the identifier fields. I started by looking at

For example, we can reuse the

Combining the identifier fields with schema also causes them to be versioned with schema -- we'd need to update
Thanks for the comment, Ryan. I agree that this can be simplified into a part of the schema itself, something like:
Originally we were trying to add this as a component of table metadata because of (1) the benefit of versioning, and (2) the possibility to extend each row-identifying field for more potential use cases. Since we all agree that (2) is not likely, and we can also get versioning by adding it into the schema, I think this is a good approach. I will update this PR tomorrow; @openinx, meanwhile please let me know if you have any concerns with the new approach.
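A rough, hypothetical sketch of that schema-based direction (the constructor and accessor shown are assumptions about how it could look rather than code from this PR; Lists and Sets here are Iceberg's relocated Guava helpers):

// the schema carries its identifier field ids directly, so they are versioned with the schema
Schema schema = new Schema(
    Lists.newArrayList(
        Types.NestedField.required(1, "account_id", Types.LongType.get()),
        Types.NestedField.required(2, "profile_id", Types.LongType.get()),
        Types.NestedField.optional(3, "data", Types.StringType.get())),
    Sets.newHashSet(1, 2));  // identifier-field-ids: account_id, profile_id

Set<Integer> identifierFieldIds = schema.identifierFieldIds();  // {1, 2}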
Thanks for @rdblue's and @jackye1995's comments; I'm okay with the simplest way to implement the row identifier specification. Let's push this work forward. Thanks.
This is the continuation of #2010, adding a concept that describes how a row in a table should be uniquely identified. I have the implementation ready up to the Spark SQL extension to update the row key, and will separate it into multiple PRs for review. This PR should have the same amount of content as what openinx had in the old PR.
This PR adds RowKey to the Table and TableMetadata API, and writes the metadata information as something like:

I will add reasons behind the namings inline.
@openinx @rdblue @aokolnychyi