feat: support metadata table "snapshots"#822
Conversation
186f075 to
e137fbb
Compare
e137fbb to
b3cf119
Compare
crates/iceberg/src/metadata_scan.rs
Outdated
| /// Returns the schema of the metadata table. | ||
| fn schema() -> Schema; |
There was a problem hiding this comment.
I need to make this separate and public, so engines can get the schema first without fetching the data.
There was a problem hiding this comment.
Some of the metadata tables (e.g. entires) have schemas that depend on the table schema. E.g. readable_metrics is a struct with field per column. Might be worth having this take a scan as well or do so in another PR.
Signed-off-by: xxchan <xxchan22f@gmail.com>
8ca0ab6 to
973129e
Compare
crates/iceberg/src/metadata_scan.rs
Outdated
| pub struct MetadataScan { | ||
| metadata_ref: TableMetadataRef, | ||
| } |
There was a problem hiding this comment.
Wondering what you think about
| pub struct MetadataScan { | |
| metadata_ref: TableMetadataRef, | |
| } | |
| pub struct MetadataScan<'a> { | |
| table: &'a Table, | |
| } |
so we can get at FileIO and other table properties that aren't in the metadata.
Signed-off-by: xxchan <xxchan22f@gmail.com>
Signed-off-by: xxchan <xxchan22f@gmail.com>
Signed-off-by: xxchan <xxchan22f@gmail.com>
973129e to
f40aa87
Compare
crates/iceberg/src/metadata_scan.rs
Outdated
| fn schema() -> Schema; | ||
|
|
||
| /// Scans the metadata table. | ||
| fn scan(scan: &MetadataScan) -> Result<RecordBatch>; |
There was a problem hiding this comment.
Thoughts on making async because some of the metadata scans will do async reads of metadata files.
There was a problem hiding this comment.
+1 for this, the manifests table requires an async call load_manifest_list() with a dependency on FileIO.
|
@rshkv Thanks for the review and the follow-up work. I just want to wait for the maintainers to make sure the framework looks good before applying any suggestions :p BTW I think we can also change the details like signatures and fields on demand later. |
|
Yeah, totally!
Yeah that's what I was worried about. Sg. 👍🏻 |
Signed-off-by: xxchan <xxchan22f@gmail.com>
Xuanwo
left a comment
There was a problem hiding this comment.
Hi, it looks much better now. Only some minor questions.
Signed-off-by: xxchan <xxchan22f@gmail.com>
|
Thanks for the review. Updated. Please take a another look. |
Xuanwo
left a comment
There was a problem hiding this comment.
This PR is already in good shape. Let's merge it to unblock other pending PRs. Suggested changes can be made in a separate PR.
| pub mod io; | ||
| pub mod spec; | ||
|
|
||
| pub mod metadata_scan; |
There was a problem hiding this comment.
Should be changed in to metadata_table.
|
|
||
| /// Creates a metadata table which provides table-like APIs for inspecting metadata. | ||
| /// See [`MetadataTable`] for more details. | ||
| pub fn metadata_table(self) -> MetadataTable { |
There was a problem hiding this comment.
Better to use &self here to align with other APIs.
#823
In Iceberg Spark/Flink there are metadata tables that provide information around the table: https://iceberg.apache.org/docs/latest/spark-queries/#inspecting-tables
Supporting this in
iceberg-rustallows other engines (like RisingWave) to support these "standard" metadata tables.In this PR, I added a trait for metadata tables and the first implementation for the
snapshotsmetadata table. If it looks good, we can add more later.reference implementation:
- https://py.iceberg.apache.org/api/#inspecting-tables
- https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/inspect.py#L58-L90