
Conversation

@zhangyue19921010 (Contributor) commented Dec 17, 2025

Reproduction
(A simplified view of the new versions generated by restore and compaction reservation.)

  • Create a dataset with stable Row IDs enabled; current version is 1.
  • Run compaction to produce version 2.
  • Checkout and restore back to version 1, adjust compaction params, run compaction again to produce version 3.
  • Query version 3, then checkout to version 2 and query again.
  • Observed: the query on version 2 fails; disabling caches makes the query succeed.
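
The collision can be sketched without the real lance API. Below is a toy, self-contained simulation of the timeline above; `Manifest`, `compact`, and the id-allocation rule are illustrative assumptions, not lance's actual types:

```rust
// Hypothetical simulation of the repro timeline. Fragment ids are allocated
// from a per-manifest max_fragment_id, and restoring an old version brings
// its (smaller) max_fragment_id back.
#[derive(Clone, Debug)]
struct Manifest {
    version: u64,
    max_fragment_id: u32,
    fragment_ids: Vec<u32>,
}

// Compaction rewrites the data into one new fragment with the next id.
fn compact(prev: &Manifest, next_version: u64) -> Manifest {
    let new_id = prev.max_fragment_id + 1;
    Manifest {
        version: next_version,
        max_fragment_id: new_id,
        fragment_ids: vec![new_id],
    }
}

fn main() {
    let v1 = Manifest { version: 1, max_fragment_id: 1, fragment_ids: vec![0, 1] };
    let v2 = compact(&v1, 2);          // fragment 2 appears in version 2
    let restored = v1.clone();         // restore to version 1: max_fragment_id is 1 again
    let v3 = compact(&restored, 3);    // fragment 2 appears again, now in version 3
    assert_eq!(v2.fragment_ids, v3.fragment_ids); // same fragment id, different data
    assert_ne!(v2.version, v3.version);
}
```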

Root Cause

  • If you temporarily check out and restore a previous version and then perform data operations (not just compaction), fragment IDs can be duplicated across versions. Additionally, because the cache does not key row ID sequences by dataset version, time-travel reads of different versions interfere with each other through the cache, as sketched below.
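
A toy illustration of that interference, assuming a simplified in-memory cache keyed only by the fragment id (the pre-fix keying described above, not the actual lance cache API):

```rust
use std::collections::HashMap;

fn main() {
    // Pre-fix behavior: the cache key carries only the fragment id.
    let mut cache: HashMap<u32, &str> = HashMap::new();

    // Query version 3 first: its row id sequence for fragment 2 gets cached.
    cache.insert(2, "row id sequence written by the second compaction (version 3)");

    // Time-travel to version 2 and look up the same fragment id: the cache
    // hit returns version 3's sequence, which does not match version 2's
    // data, so the version 2 query fails unless the cache is disabled.
    let hit = cache.get(&2).copied().unwrap();
    assert_eq!(hit, "row id sequence written by the second compaction (version 3)");
}
```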

Fix

  • Scope RowIdSequence cache entries by both dataset version and fragment:
    • Change the cache key to row_id_sequence/{version}/{fragment_id}.
    • Pass dataset.manifest.version when creating the RowIdSequenceKey during sequence loads.
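
A minimal sketch of the version-scoped key; the struct and method names are illustrative and may differ from the actual code in rust/lance:

```rust
use std::collections::HashMap;

// Illustrative stand-in for the cache key; the real RowIdSequenceKey in
// rust/lance may carry different fields or types.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct RowIdSequenceKey {
    version: u64,
    fragment_id: u64,
}

impl RowIdSequenceKey {
    // Mirrors the `row_id_sequence/{version}/{fragment_id}` layout from the fix.
    fn cache_path(&self) -> String {
        format!("row_id_sequence/{}/{}", self.version, self.fragment_id)
    }
}

fn main() {
    let mut cache: HashMap<RowIdSequenceKey, &str> = HashMap::new();
    // Fragment 2 exists in both version 2 and version 3 after the restore,
    // but the version-scoped keys no longer collide.
    let k2 = RowIdSequenceKey { version: 2, fragment_id: 2 };
    let k3 = RowIdSequenceKey { version: 3, fragment_id: 2 };
    cache.insert(k2.clone(), "sequence from the first compaction");
    cache.insert(k3.clone(), "sequence from the second compaction");
    assert_eq!(cache.len(), 2);
    assert_ne!(k2.cache_path(), k3.cache_path());
}
```

With the `row_id_sequence/{version}/{fragment_id}` form, entries for the same fragment id under different versions coexist, at the cost of not sharing a cached sequence between versions that genuinely share a fragment.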

github-actions bot added the bug (Something isn't working) label Dec 17, 2025
@zhangyue19921010 changed the title from "fix(cache): scope RowIdSequence by version" to "fix(cache): scope RowIdSequence cache by version" Dec 17, 2025
codecov bot commented Dec 17, 2025

Codecov Report

❌ Patch coverage is 97.82609% with 1 line in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
rust/lance/src/dataset/rowids.rs | 50.00% | 1 Missing ⚠️


@zhangyue19921010 marked this pull request as draft December 17, 2025 13:28
@majin1102 (Contributor)

Why did your version 3 succeed, since it should conflict with version 2?

Simply reading your test, I think you are assuming we could reproduce a fragment with the same ID within a version line, which in my view should never happen.

@zhangyue19921010 (Contributor, Author)

> Why did your version 3 succeed, since it should conflict with version 2?
>
> Simply reading your test, I think you are assuming we could reproduce a fragment with the same ID within a version line, which in my view should never happen.

Hi @majin1102, check out and restore to version 1 and trigger another compaction (I've updated the description).

@zhangyue19921010 marked this pull request as ready for review December 17, 2025 15:21

@majin1102 (Contributor) commented Dec 17, 2025

> > Why did your version 3 succeed, since it should conflict with version 2?
> >
> > Simply reading your test, I think you are assuming we could reproduce a fragment with the same ID within a version line, which in my view should never happen.
>
> Hi @majin1102, check out and restore to version 1 and trigger another compaction (I've updated the description).

That's a good catch.

I wonder if we should keep the max_fragment_id when restoring, or use the approach you proposed, or both. I think fragment ID traveling is not a good thing; it introduces unnecessary risks.

@zhangyue19921010 (Contributor, Author)

> > > Why did your version 3 succeed, since it should conflict with version 2?
> > >
> > > Simply reading your test, I think you are assuming we could reproduce a fragment with the same ID within a version line, which in my view should never happen.
> >
> > Hi @majin1102, check out and restore to version 1 and trigger another compaction (I've updated the description).
>
> That's a good catch.
>
> I wonder if we should keep the max_fragment_id when restoring, or use the approach you proposed, or both. I think fragment ID traveling is not a good thing; it introduces unnecessary risks.

For now, in the restore case there will indeed be duplicate fragment IDs, row ID sequences, and possibly max_xxx_xxx in the future. The current PR is a lightweight solution, but for further improvements we may need to discuss the specific behavior of restore.

@majin1102 (Contributor) commented Dec 18, 2025

> and possibly max_xxx_xxx in the future.

pub max_fragment_id: Option<u32>,

I'm referring to this field in the manifest.

Right now, restore and overwrite can cause this max_fragment_id to fall back, leading to fragment ID traveling. That means a fragment ID is not globally unique across versions and time. I don't know if this logic was fully thought through initially.

> The current PR is a lightweight solution

Yeah, I agree this is a lightweight fix for the issue we are encountering.

Would appreciate thoughts from @jackye1995 @wjones127 @westonpace!

@wjones127 (Contributor)

> Right now, restore and overwrite can cause this max_fragment_id to fall back, leading to fragment ID traveling. That means a fragment ID is not globally unique across versions and time. I don't know if this logic was fully thought through initially.

That's a good point. I think we need to redesign that so we can cache properly. We definitely want to reuse cached objects between dataset versions, but if we do, we need to be able to rely on fragment_id being unique. This seems easy to accomplish with Overwrite (just don't reset the max_fragment_id), but it seems harder with restore. The other hard part is branches: they might also introduce duplicates in terms of fragment_id. I think we need a design discussion around this.
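
For illustration only, here is one shape the "don't reset the max_fragment_id" idea could take, using a hypothetical, stripped-down manifest rather than the actual lance manifest or commit path:

```rust
// Hypothetical manifest, reduced to the field under discussion.
#[derive(Clone, Debug)]
struct Manifest {
    version: u64,
    max_fragment_id: Option<u32>,
}

// One possible rule: when committing an overwrite or restore, keep the larger
// of the current head's max_fragment_id and the restored manifest's, so
// fragment ids never travel backwards and stay unique across versions.
fn next_manifest(current_head: &Manifest, restored: &Manifest, next_version: u64) -> Manifest {
    Manifest {
        version: next_version,
        max_fragment_id: current_head.max_fragment_id.max(restored.max_fragment_id),
    }
}

fn main() {
    let head = Manifest { version: 2, max_fragment_id: Some(2) };
    let v1 = Manifest { version: 1, max_fragment_id: Some(1) };
    let v3 = next_manifest(&head, &v1, 3);
    assert_eq!(v3.version, 3);
    assert_eq!(v3.max_fragment_id, Some(2)); // the next new fragment would get id 3
}
```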

@wjones127 self-assigned this Dec 18, 2025
@wjones127 (Contributor)

Starting to think about this: #5540

But need some time to think and discuss. Feel free to add ideas there on how we could solve this.

