-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spec: Add snapshot tagging and branching #3425
Conversation
site/docs/snapshot-tag-branch.md
Outdated
##### Day 1: F and G are expired | ||
|
||
On day 1, the global and branch b2 max age has passed, affecting A, D, F, G. | ||
A andD cannot be expired due to branch b1 policy, so only F and G are expired. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A andD cannot be expired due to branch b1 policy, so only F and G are expired. | |
A and D cannot be expired due to branch b1 policy, so only F and G are expired. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also bit confused about what happens after the expiry? we delete F & G snapshots and also drop branch b2 (we remove b2 from table metadata?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a snapshot with reference is dropped, the reference is also dropped, so branch b2 will be removed.
site/docs/snapshot-tag-branch.md
Outdated
|
||
##### Day 1: F and G are expired | ||
|
||
On day 1, the global and branch b2 max age has passed, affecting A, D, F, G. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better to have some snapshots in the example that are not referenced by branches and tags just to apply global retention policy ? now all snapshots are under branch or tag. so global policy cannot be applied. (This is based on my assumption, please see last comment for this page)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, I will try to add one
site/docs/snapshot-tag-branch.md
Outdated
##### Day 3: A is expired | ||
|
||
On day 3, main branch max age has also passed, affecting A, B, C. | ||
Based on the global policy, we must keep 4 snapshots, so can only expire 1 snapshot, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to honour the global policy when that branch is having local policy ? Is it better to delete A and B in this case as local policy says min snapshots to keep is 1 ?
I think applying global policies is good enough only for snapshots that are not involved in any branch, tag. WDYT ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, based on the mechanism described to determine snapshots to expire, we are determining which snapshots are not removed after evaluating all the related policies, so it would eventually hit the global policy and stop there.
I think it also fits the meaning of "minimum number of snapshots to keep", because it's the minimum, not that we need to always reach that value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another reason to avoid saying "global policy" is that it isn't clear whether branches override "global" or whether there is some way to merge them (like max(table-min-snapshots, branch-min-snapshots)).
I think calling the "global" settings "table defaults" is a good way to help people understand.
site/docs/spec.md
Outdated
@@ -581,10 +596,11 @@ Table metadata consists of the following fields: | |||
| _optional_ | _optional_ | **`metadata-log`**| A list (optional) of timestamp and metadata file location pairs that encodes changes to the previous metadata files for the table. Each time a new metadata file is created, a new entry of the previous metadata file location should be added to the list. Tables can be configured to remove oldest metadata log entries and keep a fixed-size log of the most recent entries after a commit. | | |||
| _optional_ | _required_ | **`sort-orders`**| A list of sort orders, stored as full sort order objects. | | |||
| _optional_ | _required_ | **`default-sort-order-id`**| Default sort order id of the table. Note that this could be used by writers, but is not used when reading because reads use the specs stored in manifest files. | | |||
| | _optional_ | **`refs`** | A list of snapshot references, stored as full snapshot reference objects. | | |||
| | _optional_ | **`current-branch`** | The name of the current branch. If not specified, it defaults to the `main` branch that starts with the table creation commit. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to keep each branch's latest snapshot id here ? I wonder how do we support query on each branch or tag ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the refs
always record the latest snapshot ID of a branch, query of branch and tag will use that information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I noted above, I think that we should remove this. The "current branch" must always be main
.
That simplifies a couple things in the spec:
- If
refs
is present, it must contain amain
branch - The
main
branch reference never expires - No need for
current-branch
current-snapshot-id
must always match the snapshot ID of themain
branch
site/docs/snapshot-tag-branch.md
Outdated
|
||
1. Data scientists and ML researchers can easily create an Iceberg branch to experiment with table data without worrying about polluting the main table snapshot. | ||
2. Data engineers can perform production AB testing against the experimental branch to ensure the correctness of certain table updates. | ||
3. Data producers can perform test load in a table in an experimental branch, and then append all the loaded files back to the main branch (similar to Git cherry-pick). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Iceberg does not plan to offer Git-like merge, can we assume cherry-pick can happen only in a scenario similar to fast-forward merges (where there's no additional commit on the base)?
If yes, do you think it is worth mentioning this pre-requisite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would imagine it to be similar to the Spark procedure cherrypick_snapshot
. It does not have any requirement, you need to know exactly what you are doing to do the cherry-pick.
site/docs/snapshot-tag-branch.md
Outdated
|
||
| Property | Type | Description | | ||
|------------------------------|-----------|-------------| | ||
| **`min-snapshots-to-keep`** | `int` | For `branch` type only, the minimum number of snapshots to keep in a branch | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we mention the branch name in the property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An update API will handle the actual update operation, which will allow specifying a branch name. It is not a part of this doc.
site/docs/spec.md
Outdated
| ---------- |------------------------------|-----------|-------------| | ||
| _required_ | **`snapshot-id`** | `long` | The ID of the snapshot referenced | | ||
| _required_ | **`name`** | `string` | The name of the reference | | ||
| _required_ | **`type`** | `string` | Type of the reference, `tag` or `branch` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to have an enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be serialized in JSON, so there isn't an enum type. In addition, enums are a minefield for schema evolution, so it's usually better to use a string so that people understand there may be other (unsupported) values.
site/docs/spec.md
Outdated
| _required_ | **`snapshot-id`** | `long` | The ID of the snapshot referenced | | ||
| _required_ | **`name`** | `string` | The name of the reference | | ||
| _required_ | **`type`** | `string` | Type of the reference, `tag` or `branch` | | ||
| _optional_ | **`min-snapshots-to-keep`** | `int` | For `branch` type only, the minimum number of snapshots to keep in a branch | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it make sense to mention the default, since it is an optional field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, added
site/docs/spec.md
Outdated
| v2 | Field name | Type | Description | | ||
| ---------- |------------------------------|-----------|-------------| | ||
| _required_ | **`snapshot-id`** | `long` | The ID of the snapshot referenced | | ||
| _required_ | **`name`** | `string` | The name of the reference | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the name required to be unique? If yes, will it be worth mentioning the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should be unique and I agree that it should be called out. We could also make this a map to guarantee it in JSON.
site/docs/spec.md
Outdated
@@ -581,10 +596,11 @@ Table metadata consists of the following fields: | |||
| _optional_ | _optional_ | **`metadata-log`**| A list (optional) of timestamp and metadata file location pairs that encodes changes to the previous metadata files for the table. Each time a new metadata file is created, a new entry of the previous metadata file location should be added to the list. Tables can be configured to remove oldest metadata log entries and keep a fixed-size log of the most recent entries after a commit. | | |||
| _optional_ | _required_ | **`sort-orders`**| A list of sort orders, stored as full sort order objects. | | |||
| _optional_ | _required_ | **`default-sort-order-id`**| Default sort order id of the table. Note that this could be used by writers, but is not used when reading because reads use the specs stored in manifest files. | | |||
| | _optional_ | **`refs`** | A list of snapshot references, stored as full snapshot reference objects. | | |||
| | _optional_ | **`current-branch`** | The name of the current branch. If not specified, it defaults to the `main` branch that starts with the table creation commit. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a current_branch
in metadata? Will it make more sense to position this as a client-session specific conf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the metadata need to know this information, because it will use the current branch to know what is the default branch to add the next commit.
site/docs/snapshot-tag-branch.md
Outdated
Iceberg offers a [snapshot expiration procedure](../spark-procedures/#expire_snapshots) to clean up snapshots that are not needed to free up storage space. | ||
Retention policy can be configured both globally and on snapshot reference to provide highly flexible customization to the expiration behavior. | ||
|
||
#### Global snapshot retention policy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am bit lost here, I thought we will there will be as well a property to disable the retention either on snapshot or globally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we already have GC_ENABLED
table property for that.
more details here: #3246
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that "global" might be confusing. How about "Table level retention policy" and "Branch level retention policy"? Reference retention (max-ref-age-ms
) can also be a separate section.
site/docs/snapshot-tag-branch.md
Outdated
@@ -0,0 +1,148 @@ | |||
# Snapshot Tagging and Branching | |||
|
|||
Iceberg snapshot tagging and branching feature offers user a Git-like experience in manging table snapshots. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering here about offers user a Git-like experience
, since we don't intend to offer a full fledged git like experience, does it make sense to add limited Git-like experience
? In order not to confuse users I guess, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree it's probably not good to say Git-like given it's not completely Git-like. But limited
sounds a bit too strong to me, I am changing the wording to the following:
Iceberg snapshot tagging and branching feature provides users more functionalities in managing Iceberg snapshot lifecycle, and allows users to treat table snapshots just like git branch and tag.
Users can assign tags to snapshots, create new branches, set the current branch for read and write, and configure customized retention policy for them.
Please let me know what you think.
site/docs/snapshot-tag-branch.md
Outdated
In Iceberg, we use a similar concept of **Snapshot Reference** to implement branching and tagging. | ||
|
||
Each Iceberg table metadata contains a list of `refs` (references), and a `current-branch` indicating the current branch to use. | ||
When user creates an Iceberg table, the first commit belongs to the default `main` branch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here a branch is created when the user configure current-branch
property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, user set a snapshot as a BRANCH
type reference to create a branch. Current branch is used as the default branch to read the latest table snapshot and write the next commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think mentioning a git-like experience
might rather be confusing for users. To me this feature is rather Named Snapshots
(or something similar)
site/docs/spec.md
Outdated
| _required_ | **`name`** | `string` | The name of the reference | | ||
| _required_ | **`type`** | `string` | Type of the reference, `tag` or `branch` | | ||
| _optional_ | **`min-snapshots-to-keep`** | `int` | For `branch` type only, the minimum number of snapshots to keep in a branch | | ||
| _required_ | **`max-snapshot-age-ms`** | `long` | The duration before a snapshot tagged or in a branch could be expired by any automatic snapshot expiration process | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe change to The duration before a snapshot reference can be expired by any automatic snapshot expiration process
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should be required. Why not default to the table setting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nastra The issue with The duration before a snapshot reference can be expired by any automatic snapshot expiration process
is that it does not express the fact that all snapshots in a branch would be affected by the policy. But I agree the current wording is a bit convoluted, any suggestion would be welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rdblue good point, I was defaulting to the table default by setting it to 5 days explicitly so in the spec it is required.
I agree it sounds better to inherit the table's latest value of history.expire.max-snapshot-age-ms
, although the retention would change if user updates the table property, that is likely the expected behavior, similar to how metrics mode is configured for the whole table and inherited for each column. I will make the change.
site/docs/snapshot-tag-branch.md
Outdated
@@ -0,0 +1,148 @@ | |||
# Snapshot Tagging and Branching | |||
|
|||
Iceberg snapshot tagging and branching feature offers user a Git-like experience in manging table snapshots. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the git-like experience
part could confuse users as to what that really means.
I believe in some version of the design doc you named it Named Snapshots
(or something similar) as to me that's what this feature really is. Those are named snapshots where the type
can be either a branch
(moves forward with changes) or a tag
(does not move forward) and I would probably rather explain this feature from this perspective
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I updated it, please let me know what you think about the new wording.
Hey @jackye1995 I left a lot of comments on the google doc. is that still the referecne or should I copy the comments here? |
site/docs/spec.md
Outdated
@@ -548,6 +548,21 @@ Notes: | |||
1. An alternative, *strict projection*, creates a partition predicate that will match a file if all of the rows in the file must match the scan predicate. These projections are used to calculate the residual predicates for each file in a scan. | |||
2. For example, if `file_a` has rows with `id` between 1 and 10 and a delete file contains rows with `id` between 1 and 4, a scan for `id = 9` may ignore the delete file because none of the deletes can match a row that will be selected. | |||
|
|||
#### Snapshot Reference | |||
|
|||
Snapshot reference allows users to perform tagging and branching within an Iceberg table. The detailed user experience is described in [Iceberg Tagging and Branching](../snapshot-tag-branch). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would avoid linking to user experience from the spec. The spec sets requirements and should include enough to understand those requirements. But we don't want to be overly prescriptive or make it seem like there are additional requirements outside of the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can make this introduction a bit more clear. Something like this:
Iceberg tables keep track of branches and tags using snapshot references. Tags are labels for individual snapshots. Branches are mutable named references that can be updated by committing a new snapshot as the branch's referenced snapshot using the [Commit Conflict Resolution and Retry] procedures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, updated
site/docs/spec.md
Outdated
@@ -965,7 +981,7 @@ The following table describes the possible values for the some of the field with | |||
|
|||
### Table Metadata and Snapshots | |||
|
|||
Table metadata is serialized as a JSON object according to the following table. Snapshots are not serialized separately. Instead, they are stored in the table metadata JSON. | |||
Table metadata is serialized as a JSON object according to the following table. Snapshots and snapshot references are not serialized separately. Instead, they are stored in the table metadata JSON. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this change is necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
site/docs/spec.md
Outdated
@@ -581,10 +596,11 @@ Table metadata consists of the following fields: | |||
| _optional_ | _optional_ | **`metadata-log`**| A list (optional) of timestamp and metadata file location pairs that encodes changes to the previous metadata files for the table. Each time a new metadata file is created, a new entry of the previous metadata file location should be added to the list. Tables can be configured to remove oldest metadata log entries and keep a fixed-size log of the most recent entries after a commit. | | |||
| _optional_ | _required_ | **`sort-orders`**| A list of sort orders, stored as full sort order objects. | | |||
| _optional_ | _required_ | **`default-sort-order-id`**| Default sort order id of the table. Note that this could be used by writers, but is not used when reading because reads use the specs stored in manifest files. | | |||
| | _optional_ | **`refs`** | A list of snapshot references, stored as full snapshot reference objects. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this PR needs to address compatibility with current-snapshot-id
. For the duration of v2, writers must still produce current-snapshot-id
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the current-snapshot-id
field is described as optional, but when checking the TableMetadataParser
it looks like we always assume the field exists for both v1 and v2.
Is that an issue with the spec itself?
If this field is required for v1 and v2, then I think there is no concern? Or are you suggesting adding a new column for format v3 which deprecates current-snapshot-id
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current snapshot ID is optional because the table may not have a snapshot ID. We can clarify this in the spec and update the parser to return null
.
145ac9d
to
00efedd
Compare
Sorry for the delayed reply @rymurr @nastra @omarsmak @harshm-dev @ajantha-bhat I have updated based on the comments and also replied questions in the design doc, please let me know if there is any further question. |
00efedd
to
f648068
Compare
site/docs/spec.md
Outdated
@@ -593,16 +609,17 @@ Table metadata consists of the following fields: | |||
| _optional_ | _required_ | **`default-spec-id`**| ID of the "current" spec that writers should use by default. | | |||
| _optional_ | _required_ | **`last-partition-id`**| An integer; the highest assigned partition field ID across all partition specs for the table. This is used to ensure partition fields are always assigned an unused ID when evolving specs. | | |||
| _optional_ | _optional_ | **`properties`**| A string to string map of table properties. This is used to control settings that affect reading and writing and is not intended to be used for arbitrary metadata. For example, `commit.retry.num-retries` is used to control the number of commit retries. | | |||
| _optional_ | _optional_ | **`current-snapshot-id`**| `long` ID of the current table snapshot. | | |||
| _optional_ | _optional_ | **`current-snapshot-id`**| `long` ID of the latest snapshot of the current branch. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the confusion over current-branch
, I think that the current branch should always be main
so that people don't try to "checkout" branches in a table. That's always an incorrect operation.
In that case, this should be the ID of the current table snapshot; must be the same as the current ID of the main branch
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it has proven too confusing to have that feature. Plus, we don't really need it if we support renaming branch refs. Making another branch main
is as easy as this:
table.updateReferences()
.rename("main", "main_v1")
.rename("dev", "main")
.commit();
site/docs/spec.md
Outdated
| _required_ | **`name`** | `string` | The name of the reference, should be unique within a table. | | ||
| _required_ | **`type`** | `string` | Type of the reference, `tag` or `branch` | | ||
| _optional_ | **`min-snapshots-to-keep`** | `int` | For `branch` type only, the minimum number of snapshots to keep in a branch, default to the current value of table property `history.expire.min-snapshots-to-keep` when this value is evaluated | | ||
| _optional_ | **`max-snapshot-age-ms`** | `long` | The duration before a snapshot tagged or in a branch could be expired by any automatic snapshot expiration process, default to the current value of table property `history.expire.max-snapshot-age-ms` when this value is evaluated | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that both of these only apply to branches because these control the snapshots that are kept behind the branch's current snapshot.
We also need a new configuration option at the table level and at the reference level that controls when the reference itself is removed. That is, if the referenced snapshot is older than some interval, we remove the reference itself. That helps prune old branches and tags that are no longer used. We'll probably default it to keep them forever, but it is good to have the setting for compliance needs. For that, how about max-ref-age-ms
and history.expire.max-ref-age-ms
?
site/docs/snapshot-tag-branch.md
Outdated
|
||
## Example use cases | ||
|
||
### Time-based Snapshot tagging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't clear from the spec how this would be done, especially without the max-ref-age-ms
property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My original thinking is that if the snapshot is expired and it has reference, the reference is removed at the same time. But I agree max-ref-age-ms
does give more flexibility, I will update based on that.
site/docs/snapshot-tag-branch.md
Outdated
## Snapshot Reference | ||
|
||
In version control systems like git, branch and tag are both references of commits. | ||
In Iceberg, we use a similar concept of **Snapshot Reference** to implement branching and tagging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: it's unclear who "we" is in documentation. It's more direct and clear to say "Iceberg uses"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove all the usages of we
site/docs/snapshot-tag-branch.md
Outdated
|
||
### Retention Policy | ||
|
||
Iceberg offers a [snapshot expiration procedure](../spark-procedures/#expire_snapshots) to clean up snapshots that are not needed to free up storage space. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to say something like "Table snapshots expire and are removed from metadata to allow removed or replaced data files to be physically deleted. The snapshot expiration procedure removes snapshots from table metadata and applies the table's retention policy. Retention policy ..."
site/docs/snapshot-tag-branch.md
Outdated
|
||
#### Policy evaluation mechanism | ||
|
||
When a snapshot expiration process starts, it follows the steps described below: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If retention properties are in the spec, then this section should be in the spec as well.
site/docs/snapshot-tag-branch.md
Outdated
|
||
When a snapshot expiration process starts, it follows the steps described below: | ||
|
||
1. form an expiration candidate pool containing all snapshots |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Could you use sentence case? Also, there are two "4." points. While markdown will fix it, I like to keep them updated.
site/docs/snapshot-tag-branch.md
Outdated
2. for each snapshot reference, evaluate the associated policy and move snapshots out of the candidate pool | ||
3. apply global retention policy to and move snapshots out of the candidate pool | ||
4. when multiple snapshots can be chosen to be moved out, newer snapshots win | ||
4. after evaluation, expire all snapshots that are still in the candidate pool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this description of the process confusing. Why build a pool and remove, rather than building a set of snapshots to retain? I also don't think it makes sense to refer to applying global policy as a separate step or "choosing" snapshots from the pool. Instead, I think it is more clear to use a procedure that focuses on the snapshots that must be kept:
- Start with an empty set of snapshots to retain
- Remove any refs (except
main
) where the referenced snapshot is older thanmax-ref-age-ms
- For each branch and tag, add the referenced snapshot to the retained set
- For each branch, add its ancestors to the retained set until:
- The snapshot is older than
max-snapshot-age-ms
, AND- The snapshot is not one of the first
min-snapshots-to-keep
in the branch (including the branch's referenced snapshot)- Expire any snapshot not in the set of snapshots to retain.
Note:
max-ref-age-ms
,max-snapshot-age-ms
, andmin-snapshots-to-keep
can be set in ref metadata. If not set, the ref should use default values from table metadata.
site/docs/snapshot-tag-branch.md
Outdated
H (b3) | ||
``` | ||
|
||
| Policy Type | Max Age | Min to Keep | Snapshots Affected | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Referring to "policy" sounds complicated, and listing the "snapshot affected" makes this less clear because snapshot A is affected by multiple policies.
I think I'd drop the affected snapshots column and call policy something like "scope": table scope, branch/main scope, etc.
I like the sections for how this progresses. I think that makes it more clear.
site/docs/snapshot-tag-branch.md
Outdated
| branch/main | 3 days | 1 | A, B, C | | ||
| branch/b1 | 2 days | 2 | A, D, E | | ||
| branch/b2 | 1 day | 0 | A, D, F, G | | ||
| tag/dev | forever | N/A | C | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you set "forever"? I think this would make sense if max-ref-age-ms
is unset.
site/docs/snapshot-tag-branch.md
Outdated
|
||
##### Day 1: F, G, H are expired | ||
|
||
On day 1, the global and branch b2 max age has passed, affecting A, D, F, G, H. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems strange to me that H would expire since it is a branch ref. I think that the additional setting solves the problem though.
site/docs/snapshot-tag-branch.md
Outdated
|
||
Here is an example for how an Iceberg snapshot expiration procedure evaluates what snapshots to expire. | ||
|
||
Suppose we have the following snapshot graph and retention policies configured: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should mention that we're assuming all of these snapshots were created at or around the same time.
site/docs/snapshot-tag-branch.md
Outdated
| global | 5 hours | 4 | A, B, C, D, E, F, G, H | | ||
| branch/main | 3 days | 1 | A, B, C | | ||
| branch/b1 | 2 days | 2 | A, D, E | | ||
| branch/b2 | 1 day | 0 | A, D, F, G | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Min to keep must be at least 1 because it includes the referenced snapshot.
site/docs/snapshot-tag-branch.md
Outdated
``` | ||
A -> B -> C (main) | ||
\ (dev) | ||
D -> E (b1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is dev pointing to C or E?
site/docs/snapshot-tag-branch.md
Outdated
|
||
On day 1, the global and branch b2 max age has passed, affecting A, D, F, G, H. | ||
There is no policy configured for branch b3, so H is expired. | ||
On branch b2, A and D cannot be expired due to branch b1 policy, so only F and G are expired. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like the process above, I think it will be more clear when this focuses on what to retain, rather than what to expire. The problem is that focusing on what to expire begs the question about which setting takes priority for removal. Focusing on what to retain avoids that problem because it is clear that a snapshot may be retained for multiple reasons.
Here, I'd say that:
main
is set to keep 3 days of snapshots, so all ancestors of C newer than 3 days should be kept: (C, B, A)b1
is set to keep 2 days of snapshots, so all ancestors of E newer than 2 days are kept: (E, D, A)dev
retains snapshot E (I think)b2
aged off, so it doesn't affect retention
site/docs/snapshot-tag-branch.md
Outdated
``` | ||
A -> B -> C (main) | ||
\ (dev) | ||
D -> E (b1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would help if D expired here and the branch had min-snapshots-to-keep
set to 1.
@rdblue thanks for the detailed review, I have made the following changes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @jackye1995! I have put my thoughts from the google doc here as a review
Tags are labels for individual snapshots. Branches are mutable named references that can be updated by committing a new snapshot as the branch's referenced snapshot using the [Commit Conflict Resolution and Retry](#commit-conflict-resolution-and-retry) procedures. | ||
|
||
The snapshot reference object records all the information of a reference including snapshot ID, reference type and [Snapshot Retention Policy](#snapshot-retention-policy). | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed on the google doc I would be in favour of splitting this into two structures. From the google doc:
To me this is coupling the concept of snapshot expiry to the branch/tag definition when we know we probably don't want to long term. Which could make it hard to make changes in the future. Maybe I am thinking about it too hard but to me there are two concepts: the list of branches/tags and the expiration policies. Having these as two separate data structures allows for both to grow independently in my mind: named policies, more complex policies etc as well as more flexibility in how the list of existing references is maintained in the presence of multi-table branching or catalog owned branching etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be fine either way. I think that the advantage here is that it is simple to express. Having a list of expiration policies and applying them selectively to tags or branches, possibly by identifying the policy by ID seems a bit overkill to me. I think you'd probably have some default branch/tag retention, retention for branch versions, and that's it. Even customizing that per branch seems a little speculative to me. I think people will likely just set expiration globally and forget about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I mentioned this a bit in the design doc's reply, we can always add a reference pointer when there is a need to have a separated retention policy construct, so it's fully backwards compatible. For exmaple:
"branchA": {
"max-ref-age-ms": 10000
},
"branchB": {
"retention-policy-id": 2
}
But for now having a separated retention policy object feels a bit overkill.
| _optional_ | _optional_ | **`snapshots`**| A list of valid snapshots. Valid snapshots are snapshots for which all data files exist in the file system. A data file must not be deleted from the file system until the last snapshot in which it was listed is garbage collected. | | ||
| _optional_ | _optional_ | **`snapshot-log`**| A list (optional) of timestamp and snapshot ID pairs that encodes changes to the current snapshot for the table. Each time the current-snapshot-id is changed, a new entry should be added with the last-updated-ms and the new current-snapshot-id. When snapshots are expired from the list of valid snapshots, all entries before a snapshot that has expired should be removed. | | ||
| _optional_ | _optional_ | **`metadata-log`**| A list (optional) of timestamp and metadata file location pairs that encodes changes to the previous metadata files for the table. Each time a new metadata file is created, a new entry of the previous metadata file location should be added to the list. Tables can be configured to remove oldest metadata log entries and keep a fixed-size log of the most recent entries after a commit. | | ||
| _optional_ | _required_ | **`sort-orders`**| A list of sort orders, stored as full sort order objects. | | ||
| _optional_ | _required_ | **`default-sort-order-id`**| Default sort order id of the table. Note that this could be used by writers, but is not used when reading because reads use the specs stored in manifest files. | | ||
| | _optional_ | **`refs`** | A map of snapshot references. The map keys are the unique snapshot reference names in the table, and the map values are snapshot reference objects. There is always a `main` branch reference pointing to the `current-snapshot-id` even if the `refs` map is null. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern raised on the google doc was that the act of re-writing metadata.json
and doing a full transaction just to add or change a branch/tag feels strange. Now that current-branch
is gone I am a bit less concerned. The only time where there is a transaction just to modify the refs
field is when a branch or tag is created, is that right?
The optimistic retry means that at transaction commit time things will unlikely conflict or cause issues. I think it just feels counterintuitive to me to do a re-write of the table metadata just to eg change the branch or add a tag. I also have concerns about extensibility in the future. For example extending to multiple tables, or implementing a git log
like feature.
One potential solution is to give the catalogs control over the implementation. For example the HMS catalog may store the proposed branch related data structures in the table definition in Hive rather in the metadata.json. This allows individual catalogs more latitude in how they deal with branching, and the ability to evolve that strategy over time. It does have the downside that migrating between catalogs could be more troublesome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only time where there is a transaction just to modify the refs field is when a branch or tag is created, is that right?
Yes.
feels counterintuitive to me to do a re-write of the table metadata just to eg change the branch or add a tag . . . One potential solution is to give the catalogs control over the implementation.
I agree, this is a bit awkward. But it fits with the current model of keeping metadata.json as the source of truth for a table. I think we can evolve catalogs to do fancier things here -- for one, not sending all of the snapshots back to the client. But as far as the spec is concerned, how would we express that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe one potential middle ground we can settle here is to first add the tagging feature without branching. That is a feature that many users would like to have for use cases like time-based snapshot expiration, and it makes sense to have that as a part of the Iceberg spec. For the branching feature, it was added to the design more for completeness, but so far (at least to me) there is no feature request for that yet. We can add it when the feature request comes and then discuss the best way to go. What do you think? @rdblue @rymurr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should modify the proposal here. We know that lots of snapshots can be a problem, but that's how Iceberg manages snapshots. I don't think we need to hold off on building a feature like this for fairly rare cases. We can add branches and handle the metadata issues elsewhere.
One solution I've been thinking about is allowing table implementations to request a specific time range or updates since some time in the REST catalog. That could handle this problem nicely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense. In that case do you have any additional concern related to the spec change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still feel uncomfortable about a transaction to add a tag but I don't see any easy way out of it. I think it would be good to have a discussion about metadata.json as the source of truth (for everything) in the longer term as I think that is becoming less feasible. My comment on the call today about notification settings living in metadata is related.
I guess my only question is if we agree that its awkward and we agree that catalogs have more of a role to play in teh future then how do we move on from this proposal? Its hard to back this out of the spec or evolve it once its in. I know this is a hard question to reason about and I don't want to hold this useful feature up. But it would be good to at least think about it given the above discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think the work in TableMetadata builder and REST catalog is to try have an alternative way of dealing with table metadata updates from as a diff rather than rewriting the entire table metadata file. And that could open the door for potentially some other implementations of the TableMetadata spec backed by services and do not require full metadata rewrite. That is a development I would be happy to see, and with that we will be able to have fast commits to the metadata for small changes like tagging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think we agree that it is awkward. But it's a separate point. We already allow branching to some degree, or at least a collection of snapshots that isn't necessarily linear. Being able to label some of those snapshots and change retention for them doesn't really alter the problem.
@jackye1995 is right that we're getting to a point where we can possibly change this in the future. Moving to a change-based API is one step toward it, in addition to the use case of being able to more easily migrate library versions.
site/docs/configuration.md
Outdated
@@ -70,6 +70,7 @@ Iceberg tables support table properties to configure table behavior, like the de | |||
| commit.manifest-merge.enabled | true | Controls whether to automatically merge manifests on writes | | |||
| history.expire.max-snapshot-age-ms | 432000000 (5 days) | Default max age of snapshots to keep while expiring snapshots | | |||
| history.expire.min-snapshots-to-keep | 1 | Default min number of snapshots to keep while expiring snapshots | | |||
| history.expire.max-ref-age-ms | forever | For snapshot references except the `main` branch, default max age of snapshot references to keep while expiring snapshots. The `main` branch never expires. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be something like unset (forever)
or Long.MAX_VALUE (forever)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this will be used as the default if the value is not set in ref, I am making it Long.MAX_VALUE
to avoid an additional null check.
@rdblue @jackye1995 Can this PR be merged? And then we can continue the following work. |
site/docs/spec.md
Outdated
|
||
| v2 | Field name | Type | Description | | ||
| ---------- |------------------------------|-----------|-------------| | ||
| _required_ | **`snapshot-id`** | `long` | The ID of the snapshot referenced | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would flip the description to be about the reference, not about the snapshot ID. I'd also add context for tags and branches to make it more understandable: "A reference's snapshot ID. The tagged snapshot or latest snapshot of a branch."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
site/docs/spec.md
Outdated
| ---------- |------------------------------|-----------|-------------| | ||
| _required_ | **`snapshot-id`** | `long` | The ID of the snapshot referenced | | ||
| _required_ | **`type`** | `string` | Type of the reference, `tag` or `branch` | | ||
| _optional_ | **`min-snapshots-to-keep`** | `int` | For `branch` type only, a positive number for the minimum number of snapshots to keep in a branch while expiring snapshots, default to the value of table property `history.expire.min-snapshots-to-keep` when evaluated | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: "defaults to" rather than "default to"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
site/docs/spec.md
Outdated
| _required_ | **`snapshot-id`** | `long` | The ID of the snapshot referenced | | ||
| _required_ | **`type`** | `string` | Type of the reference, `tag` or `branch` | | ||
| _optional_ | **`min-snapshots-to-keep`** | `int` | For `branch` type only, a positive number for the minimum number of snapshots to keep in a branch while expiring snapshots, default to the value of table property `history.expire.min-snapshots-to-keep` when evaluated | | ||
| _optional_ | **`max-snapshot-age-ms`** | `long` | For `branch` type only, a positive number for the max age of snapshots to keep in a branch while expiring snapshots, default to the value of table property `history.expire.max-snapshot-age-ms` when evaluated | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could make this a bit smaller by getting rid of extra words in the second sentence. Also, I think this should state that it includes the latest snapshot.
For
branch
type only, a positive number for the max age of snapshots to keep when expiring, including the latest snapshot. Defaults to table propertyhistory.expire.max-snapshot-age-ms
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
site/docs/spec.md
Outdated
|
||
The snapshot reference object records all the information of a reference including snapshot ID, reference type and [Snapshot Retention Policy](#snapshot-retention-policy). | ||
|
||
| v2 | Field name | Type | Description | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed that only v2 is here. What about v1? I think this should be v1 / v2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense, updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some minor comments, but +1 overall. I think this is ready to go in.
296f189
to
cfb305f
Compare
Thanks Ryan for the review! I have updated everything based on your comments. Please let me know if you would like a final pass, otherwise I will merge this by the end of the day. |
No description provided.