-
Notifications
You must be signed in to change notification settings - Fork 502
fix(cache): scope RowIdSequence cache by version #5508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(cache): scope RowIdSequence cache by version #5508
Conversation
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
Why your version 3 succeeded since it should be conflicted with version 2. Simply read your test. I think you are assuming we could reproduce a fragement with the same id in a version line, which in my view should never happen |
hi @majin1102 check out and restore to version 1 and trigger another compaction(update this description) |
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
That's a good catch. I wonder if we should keep the max_fragment_id for restoring, or use the approach you pointed, or both. I think fragment id traveling is not a good thing which introduces unnecessary risks. |
For now, in the case of restore, there will indeed be duplicate fragment IDs, row ID sequences, and possible |
I'm referring to this field in manifest. Right now, restore and overwrite can cause this max_fragment_id ID fallback and lead to fragment id traveling. That means fragement id is not globally identical through versions and time. I don't know if this logic was fully thought through initially.
Yeah I agree this is a light fix for the issue we are encountering Would appreciate thoughts from @jackye1995 @wjones127 @westonpace! |
That's a good point. I think we need to redesign that better so we can cache properly. We definitely want to re-use cached objects between dataset versions. But if that's the case we need to be able to rely on |
|
Starting to think about this: #5540 But need some time to think and discuss. Feel free to add ideas there on how we could solve this. |
Reproduction
(Simplify the new versions generated by restore and compaction reservation)
Root Cause
Fix
RowIdSequencecache entries by both dataset version and fragment:row_id_sequence/{version}/{fragment_id}.dataset.manifest.versionwhen creating theRowIdSequenceKeyduring sequence loads.