Core: Add MetadataLog metadata table #5063

Merged 5 commits on Jul 22, 2022

@singhpk234 (Contributor) commented Jun 16, 2022

This PR exposes MetadataLog as a metadata table, which makes it easier to find out which metadata.json path the table points to.

This supersedes the PR where we were exposing the current metadata location via DESC TABLE EXTENDED.

> SELECT * from {tableName}.metadata_logs;
+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+------------------+----------------+----------------------+
|timestamp_millis|file                                                                                                                                                 |latest_snapshot_id|latest_schema_id|latest_sequence_number|
+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+------------------+----------------+----------------------+
|1658211678838   |file:/var/folders/9h/w6r5pljd597gy_plht_4llzs63hv0p/T/hive2324856887883087837/table/metadata/00000-9441e604-b3c2-498a-a45a-6320e8ab9006.metadata.json|null              |null            |null                  |
|1658211679192   |file:/var/folders/9h/w6r5pljd597gy_plht_4llzs63hv0p/T/hive2324856887883087837/table/metadata/00001-f30823df-b745-4a0a-b293-7532e0c99986.metadata.json|170260833677645300|0               |1                     |
|1658211679682   |file:/var/folders/9h/w6r5pljd597gy_plht_4llzs63hv0p/T/hive2324856887883087837/table/metadata/00002-2cc2837a-02dc-4687-acc1-b4d86ea486f4.metadata.json|958906493976709774|0               |2                     |
+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+------------------+----------------+----------------------+


> SELECT * from {tableName}.snapshots
+-----------------------+------------------+------------------+---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|committed_at           |snapshot_id       |parent_id         |operation|manifest_list                                                                                                                                                   |summary                                                                                                                                                                                                                                                                                         |
+-----------------------+------------------+------------------+---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|2022-07-19 11:51:19.192|170260833677645300|null              |append   |file:/var/folders/9h/w6r5pljd597gy_plht_4llzs63hv0p/T/hive2324856887883087837/table/metadata/snap-170260833677645300-1-303df19f-e01e-4242-9f32-3fe1447a4e76.avro|{spark.app.id -> local-1658211666936, added-data-files -> 2, added-records -> 2, added-files-size -> 1315, changed-partition-count -> 1, total-records -> 2, total-files-size -> 1315, total-data-files -> 2, total-delete-files -> 0, total-position-deletes -> 0, total-equality-deletes -> 0}|
|2022-07-19 11:51:19.682|958906493976709774|170260833677645300|append   |file:/var/folders/9h/w6r5pljd597gy_plht_4llzs63hv0p/T/hive2324856887883087837/table/metadata/snap-958906493976709774-1-2dbbbb8d-227f-4eb7-911c-f55faa2a7aa1.avro|{spark.app.id -> local-1658211666936, added-data-files -> 2, added-records -> 2, added-files-size -> 1315, changed-partition-count -> 1, total-records -> 4, total-files-size -> 2630, total-data-files -> 4, total-delete-files -> 0, total-position-deletes -> 0, total-equality-deletes -> 0}|
+-----------------------+------------------+------------------+---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
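Read together, the two outputs suggest that each metadata_logs row carries the ID of the newest snapshot committed at or before that metadata file's timestamp, and null when no snapshot existed yet. The sketch below reproduces the latest_snapshot_id column from the sample rows under that assumption. It is illustrative Python, not the Iceberg implementation: the function name latest_snapshot_as_of is hypothetical, and the commit times are written as epoch milliseconds on the assumption that they equal the matching timestamp_millis values (the millisecond parts .192 and .682 agree).

```python
from bisect import bisect_right

# (committed_at_millis, snapshot_id) from the snapshots output above.
# Assumption: commit times equal the matching timestamp_millis values.
SNAPSHOTS = [
    (1658211679192, 170260833677645300),
    (1658211679682, 958906493976709774),
]

# timestamp_millis values from the metadata_logs output above.
METADATA_LOG_TIMESTAMPS = [1658211678838, 1658211679192, 1658211679682]


def latest_snapshot_as_of(timestamp_millis, snapshots):
    """Return the ID of the newest snapshot committed at or before the timestamp, or None."""
    ordered = sorted(snapshots)
    commit_times = [committed_at for committed_at, _ in ordered]
    # Number of snapshots whose commit time is <= timestamp_millis.
    index = bisect_right(commit_times, timestamp_millis)
    return ordered[index - 1][1] if index > 0 else None


for ts in METADATA_LOG_TIMESTAMPS:
    print(ts, latest_snapshot_as_of(ts, SNAPSHOTS))
```

The first metadata file predates both snapshots, which is consistent with the null values in its row.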

cc @rdblue @jackye1995 @rajarshisarkar @amogh-jahagirdar

@szehon-ho (Collaborator) left a comment

I think this table will be quite useful, especially together with registerTable() for rolling back to old metadata.

Unfortunately, I don't think every catalog will support it, for example a catalog without metadata JSON files. Maybe the REST catalog?

@RussellSpitzer @rdblue @danielcweeks not sure whether we can have a metadata table that is not available for all catalog types?
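To make the rollback idea concrete, here is a minimal Python sketch of picking an older metadata file from the metadata_logs rows shown in the PR description. The helper metadata_file_as_of is hypothetical, and the rows are shortened to (timestamp_millis, file name) pairs with the directory prefix omitted. The chosen location is what one would then hand to registerTable(), which is not called here.

```python
# (timestamp_millis, metadata file name) taken from the metadata_logs output
# in the PR description; directory prefixes are omitted for brevity.
METADATA_LOG = [
    (1658211678838, "00000-9441e604-b3c2-498a-a45a-6320e8ab9006.metadata.json"),
    (1658211679192, "00001-f30823df-b745-4a0a-b293-7532e0c99986.metadata.json"),
    (1658211679682, "00002-2cc2837a-02dc-4687-acc1-b4d86ea486f4.metadata.json"),
]


def metadata_file_as_of(log, timestamp_millis):
    """Return the newest metadata file written at or before the timestamp.

    Hypothetical helper: raises ValueError if no entry is old enough.
    """
    candidates = [entry for entry in log if entry[0] <= timestamp_millis]
    if not candidates:
        raise ValueError("no metadata file at or before %d" % timestamp_millis)
    # Tuples compare by timestamp first, so max() picks the latest entry.
    return max(candidates)[1]


# Pick the metadata file that was current just before the second append.
target = metadata_file_as_of(METADATA_LOG, 1658211679681)
print(target)
```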

@singhpk234 force-pushed the feature/metadata_log_entry branch 2 times, most recently from 4ac3c59 to f86febd (June 23, 2022 14:15)
@rdblue (Contributor) commented Jun 28, 2022

@szehon-ho, the REST catalog can support registerTable. It still uses metadata locations; it just allows the server to cache metadata and send it, so the client doesn't need to read it.

@szehon-ho (Collaborator) left a comment

The table looks good to me; I left a few comments. I think we could even extend this table with more fields later, like latest schemaId, latest specId, etc.

FYI @rdblue @RussellSpitzer if there are any concerns

@szehon-ho (Collaborator) left a comment

Thanks for the changes. Looks good to me, just a few more nits

@szehon-ho (Collaborator) left a comment

Thanks for all the changes. Just a few more really small nits, and then it is ready to go for me.

@szehon-ho szehon-ho merged commit ae96bdf into apache:master Jul 22, 2022
@szehon-ho (Collaborator)

Merged, thanks @singhpk234 for the contribution

@singhpk234 (Contributor, Author)

Thanks @szehon-ho for the awesome review :)
