Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: online restore download inconsistency #134168

Open
nameisbhaskar opened this issue Nov 3, 2024 · 8 comments
Open

storage: online restore download inconsistency #134168

nameisbhaskar opened this issue Nov 3, 2024 · 8 comments
Assignees
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. branch-master Failures and bugs on the master branch. branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-testcluster Issues found or occurred on a test cluster, i.e. a long-running internal cluster P-3 Issues/test failures with no fix SLA T-storage Storage Team

Comments

@nameisbhaskar
Copy link
Contributor

nameisbhaskar commented Nov 3, 2024

drt-chaos is in a really bad state now. Several of the nodes are down, with at least two of them complaining about consistency issues:
F241031 14:03:24.454366 1160675458 kv/kvserver/replica_consistency.go:832 ⋮ [T1,Vsystem,n1,s3,r2614737/1:‹/Table/643372/5/{65/7…-127/…}›] 1538467 +This node is terminating because a replica inconsistency was detected between [n1,s3,r2614737/1:‹/Table/643372/5/{65/7…-127/…}›]

Slack threads -
https://cockroachlabs.slack.com/archives/C05FHJJ0MD0/p1730353029336369
https://cockroachlabs.slack.com/archives/C0KB9Q03D/p1730409574466829

Jira issue: CRDB-43952

@nameisbhaskar nameisbhaskar added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team O-testcluster Issues found or occurred on a test cluster, i.e. a long-running internal cluster P-0 Issues/test failures with a fix SLA of 2 weeks labels Nov 3, 2024
Copy link

blathers-crl bot commented Nov 3, 2024

Hi @nameisbhaskar, please add branch-* labels to identify which branch(es) this C-bug affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Copy link

blathers-crl bot commented Nov 3, 2024

Hi @ajstorm, please add branch-* labels to identify which branch(es) this GA-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@ajstorm ajstorm added the branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 label Nov 3, 2024
@ajstorm
Copy link
Collaborator

ajstorm commented Nov 3, 2024

Adding the GA blocker tag until we have a better idea of what's going on here.

@arulajmani arulajmani added T-storage Storage Team and removed T-kv KV Team labels Nov 4, 2024
@blathers-crl blathers-crl bot added the A-storage Relating to our storage engine (Pebble) on-disk storage. label Nov 4, 2024
@github-project-automation github-project-automation bot moved this to Incoming in Storage Nov 4, 2024
@itsbilal
Copy link
Member

itsbilal commented Nov 4, 2024

Not a GA-blocker; this is solely an online restore bug. The replica inconsistency is between n1 having compacted a virtualized external (online restore) sstable, while n3 and n6 did downloads. It looks like the downloads ate away a lot of the sstable that they shouldn't have.

There have been a lot of fixes in this space since this bug happened, so I'm not sure if this bug would even happen with a build from today. But either way it's pretty clearly isolated to online restore, a non-GA feature, so I'll deprioritize this issue accordingly.

@itsbilal itsbilal added P-2 Issues/test failures with a fix SLA of 3 months and removed GA-blocker P-0 Issues/test failures with a fix SLA of 2 weeks labels Nov 4, 2024
@itsbilal
Copy link
Member

itsbilal commented Nov 5, 2024

The bug is in the way sstable.CopySpan (used for downloads) works with comparers that don't just do a simple bytewise comparison for keys all the time, such as the Cockroach comparer. We call CopySpan with the prefix trimmed away:

https://github.com/cockroachdb/pebble/blob/1af22dc322eb38a77c35f41910139149999c418a/compaction.go#L2440

And here we call the comparer with the trimmed key, which is invalid:

https://github.com/cockroachdb/pebble/blob/1af22dc322eb38a77c35f41910139149999c418a/sstable/copier.go#L253

The reason why the metamorphic tests didn't catch this is because they have a relatively simple comparer, where the Compare("foo"+a, "foo"+b) == Compare(a, b) property is being held. This is not true of the Cockroach comparer, so when the bytes immediately following the trimmed prefix are a special key prefix in Cockroach (eg. they denote local keys or something), CopySpan does something unexpected like ignoring those blocks completely.

In the case of drt-chaos, n3 and n6 did buggy downloads of the online restore sstables, while n1 did a regular compaction with new writes from higher levels. This led to n3 and n6 repeating the same bug and achieving a majority with the wrong result, while n1 went down the non-buggy code path and got yeeted out by the consistency checker for being in the minority.

@itsbilal itsbilal changed the title kv: All nodes in drt-chaos went down due to error "replica inconsistency was detected" pebble: online restore download copier makes unsafe assumptions about key comparisons in the presence of synthetic prefixes and suffixes Nov 5, 2024
@itsbilal itsbilal changed the title pebble: online restore download copier makes unsafe assumptions about key comparisons in the presence of synthetic prefixes and suffixes pebble: online restore download copier makes unsafe assumptions about key comparisons with synthetic prefix/suffix Nov 5, 2024
@itsbilal
Copy link
Member

itsbilal commented Nov 5, 2024

This bug isn't just limited to downloads; there are instances of invalid key comparisons even inside the sstable iterators in the presence of synthetic prefixes:

https://github.com/cockroachdb/pebble/blob/1af22dc322eb38a77c35f41910139149999c418a/sstable/rowblk/rowblk_iter.go#L542

Given this is going to be a significant effort to fix (it'll involve changes to CopySpan and likely all of the sstable iterators), I will lower the priority for this issue even further to give us more time for a holistic fix.

@itsbilal itsbilal added P-3 Issues/test failures with no fix SLA and removed P-2 Issues/test failures with a fix SLA of 3 months labels Nov 5, 2024
@itsbilal itsbilal changed the title pebble: online restore download copier makes unsafe assumptions about key comparisons with synthetic prefix/suffix pebble: invalid use of comparators in the presence of synthetic prefix/suffix with online restore Nov 5, 2024
@itsbilal
Copy link
Member

itsbilal commented Nov 5, 2024

Filed the Pebble companion issue cockroachdb/pebble#4136.

@itsbilal itsbilal added the branch-master Failures and bugs on the master branch. label Nov 5, 2024
@jbowens jbowens changed the title pebble: invalid use of comparators in the presence of synthetic prefix/suffix with online restore storage: online restore download inconsistency Nov 11, 2024
@itsbilal
Copy link
Member

Update to this issue: The root cause spelled out above and in cockroachdb/pebble#4136 is not a valid issue. I'll try reproducing this again, as we lost the backup sstables from drt-chaos very shortly after the online restore and couldn't investigate further. Until then, the actual mechanism of this bug is unknown.

Still, from the compaction history of the sstables observed, it's safe to say whatever caused the drt-chaos replica inconsistency was isolated to online restore (as the nodes that did the download had the missing rows, but not the nodes that did regular compactions), so this is not a GA or release blocker. It's just harder to say anything more specific within online restore than that just yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. branch-master Failures and bugs on the master branch. branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-testcluster Issues found or occurred on a test cluster, i.e. a long-running internal cluster P-3 Issues/test failures with no fix SLA T-storage Storage Team
Projects
Status: Investigations
Development

No branches or pull requests

4 participants