Docs: Document all metadata tables. #8709

Merged · 4 commits · Oct 10, 2023

@nk1506 (Contributor) commented Oct 3, 2023

Added the missing metadata tables: `all_delete_files`, `all_entries`, `entries`, and `position_deletes`.

Fixes #757

@ajantha-bhat (Member) commented:

@nk1506: I think this is hard to review. Can you please keep the scope of this PR to adding the missing metadata tables? Rearranging can be done in a follow-up PR.


| status | snapshot_id | sequence_number | file_sequence_number | data_file | readable_metrics |
|--------| -- |-----------------|----------------------| -- | -- |
Member commented:

This style is not the same as the existing tables.

Member commented:

Same comment for all the new tables added.

@@ -268,6 +268,18 @@ order by made_current_at
| 2019-02-09 16:32:47.336 | append | 57897183625154 | true | application_1520379288616_155055 |
| 2019-02-08 03:47:55.948 | overwrite | 51792995261850 | true | application_1520379288616_152431 |

### Entries

To show all the table's current manifest entries for both data and delete files.
Member commented:

I think this statement is confusing: "all the table's" + "current"?

@@ -335,6 +347,18 @@ Note:

2. The partitions metadata table shows partitions with data files or delete files in the current snapshot. However, delete files are not applied, and so in some cases partitions may be shown even though all their data rows are marked deleted by delete files.

### Positional Delete Files

To show all positional delete files from a table:
Member commented:

Only from the current snapshot? If a file is not referenced by the current snapshot, it won't be shown, right? I think we need to clarify this.

@@ -357,6 +381,31 @@ SELECT * FROM prod.db.table.all_data_files;
| 0|s3://.../dt=20210103/00000-0-26222098-032f-472b-8ea5-651a55b21210-00001.parquet| PARQUET|{20210103}| 14| 2444|{1 -> 94, 2 -> 17}|{1 -> 14, 2 -> 14}| {1 -> 0, 2 -> 0}| {}|{1 -> 1, 2 -> 20210103}|{1 -> 3, 2 -> 20210103}| null| [4]| null| 0|
| 0|s3://.../dt=20210104/00000-0-a3bb1927-88eb-4f1c-bc6e-19076b0d952e-00001.parquet| PARQUET|{20210104}| 14| 2444|{1 -> 94, 2 -> 17}|{1 -> 14, 2 -> 14}| {1 -> 0, 2 -> 0}| {}|{1 -> 1, 2 -> 20210104}|{1 -> 3, 2 -> 20210104}| null| [4]| null| 0|

#### All Delete Files

To show all the table's delete files and each file's metadata:
Member commented:

Same comment for "all the table's".

Do we need "each file's metadata"?

Contributor Author commented:

I followed the same convention as the previous tables, such as `files`.

Member commented:

OK. The existing convention is still confusing for me to read, but we can improve it in a follow-up. I'm fine with keeping it similar for now.


#### All Entries

To show all the table's manifest entries from any reachable snapshot for both data and delete files:
Member commented:

"from any reachable snapshot" -> "from all the snapshots"

Member commented:

Also, "all the table's" is confusing here too.

@ajantha-bhat requested a review from @szehon-ho on October 4, 2023 15:02
@nk1506 (Contributor Author) commented Oct 5, 2023

@szehon-ho, please review and share your feedback.

@ajantha-bhat (Member) left a comment:

LGTM

@nastra merged commit d8a07ff into apache:master on Oct 10, 2023
Successfully merging this pull request may close this issue: Document all metadata tables

3 participants