Spark: Show metadata.json path in DESC TABLE EXTENDED #5006

Closed
wants to merge 1 commit

Conversation

singhpk234
Contributor

About the Change

This change shows the metadata location in DESC TABLE EXTENDED by adding metadata-location to the reported table properties.

Why this change is Required

We can use this to find the metadata pointer the table currently points to and then register the table via the recently introduced register table stored procedure. This is convenient for users who only want to use Spark SQL; a sketch of the flow follows below.
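
For illustration, a rough sketch of the intended pure-SQL flow (the catalog, database, and path names below are placeholders, and the register_table call is only an assumption about how the exposed location would be consumed, not part of this change):

-- with this change, the metadata.json location shows up among the table properties
DESCRIBE TABLE EXTENDED source_catalog.db.store_sales;

-- the reported metadata.json path can then be passed to register_table in another catalog
CALL target_catalog.system.register_table(
  table => 'db.store_sales',
  metadata_file => '<metadata.json path taken from the describe output>');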

Testing

Added a UT.
Added one more UT for path validation.

@github-actions github-actions bot added the spark label Jun 9, 2022
@rdblue
Contributor

rdblue commented Jun 12, 2022

@singhpk234, is this something we want to expose?

@singhpk234
Contributor Author

singhpk234 commented Jun 13, 2022

@rdblue, considering that the register table stored procedure is now in, I was wondering whether there is a pure SQL way to find this property. I had a migration use case (i.e. migrating Hive tables to Glue): I had to go to the Glue UI to find this path and then register it via SQL, and vice versa, go to the Hive table params to find it and register it in Glue.

MariaDB [hive]> SELECT * FROM TBLS WHERE TBL_NAME = 'store_sales';
MariaDB [hive]> SELECT * FROM TABLE_PARAMS WHERE TBL_ID = xx;
-- the metadata.json path is stored under the metadata_location PARAM_KEY

Apologies if there is some other SQL way to find it; I thought that having this exposed in DESCRIBE TABLE could have helped me. I agree this is not a property of the table but rather state of the table (we also expose current_snapshot_id). That was my rationale for putting out a PR for this change. Would love to know your thoughts.

There is an alternative way to find this as well, so it is not as if it cannot be worked around:

// load the Iceberg table, then read the current metadata file location from its table operations
Table table = Spark3Util.loadIcebergTable(spark, tableName);
String metadataJson = ((HiveTableOperations) (((HasTableOperations) table).operations())).currentMetadataLocation();

@rdblue
Contributor

rdblue commented Jun 29, 2022

Right now, you can use input_file_name() on some metadata tables to get this, but that's mostly a hack. I don't think that we want to expose this detail to users, but I could be convinced otherwise. I'm skeptical about the register table use case. Wouldn't you want to export from the current metastore so you don't have a duplicate table? That use case seems like a dangerous way to migrate.
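
For reference, the hack mentioned above would look something like the following (a sketch only; db.tbl is a placeholder, and which metadata table actually reports the metadata.json path through input_file_name() is not spelled out here):

-- input_file_name() reports the file Spark is reading for each row of the metadata table
SELECT input_file_name() FROM db.tbl.snapshots LIMIT 1;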

@singhpk234
Contributor Author

singhpk234 commented Jul 1, 2022

I don't think that we want to expose this detail to users, but I could be convinced otherwise

Agreed, hence I opened #5063 based on @jackye1995's suggestion. Apologies, I forgot to close this PR.

Wouldn't you want to export from the current metastore so you don't have a duplicate table? That use case seems like a dangerous way to migrate

In the case above I wanted duplicate tables (one per catalog) because I was benchmarking TPC-DS performance across catalogs (Hive / Glue). This is a very niche use case (and, apologies, not really a migration use case), and a dangerous one for production use.

@singhpk234
Contributor Author

Superseded by #5063.

@singhpk234 singhpk234 closed this Jul 1, 2022