-
Notifications
You must be signed in to change notification settings - Fork 4k
storage: improve safety of merge lock #27061
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dff0338 to
c3db5fe
Compare
|
Reviewed 2 of 2 files at r1. pkg/storage/replica.go, line 5162 at r1 (raw file):
I'm almost certain this is not what we want. By setting a descriptor here, you make this an initialized replica, so it's not just a placeholder -- it's a full-blown replica that thinks it's initialized (see Comments from Reviewable |
|
Review status: pkg/storage/replica.go, line 5162 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
It can't send Raft messages because we hold its raftMu, right? And unless I'm mistaken the merge cannot fail after this point. If it does the node crashes. I agree that the old code was not safe. Comments from Reviewable |
|
Review status: pkg/storage/replica.go, line 5162 at r1 (raw file): Previously, benesch (Nikhil Benesch) wrote…
It's true that the code has no way to gracefully fail after this lock is acquired, at least not today. I think you're also right about not being able to handle/send Raft messages, and it's nice that this unifies two code paths (by making the RHS replica always "exist"). But at the same time raftMu isn't what protects the descriptor, that's Comments from Reviewable |
|
Hmm, ok. I'll be more comfortable using the combination of a placeholder and an uninitialized replica if we can clean up the cleanup code (heh). A failed split sometimes does this: cockroach/pkg/storage/replica.go Lines 5164 to 5174 in c094d1d
And sometimes this: cockroach/pkg/storage/store.go Lines 2239 to 2244 in c094d1d
But then all that logic is also duplicated in cockroach/pkg/storage/store.go Lines 2536 to 2541 in c094d1d
And for some reason store.removeReplicaImpl can remove a placeholder replica?? cockroach/pkg/storage/store.go Line 2549 in c094d1d
AllowPlaceholders option, so why did that line exist before?
|
|
tl;dr is there's another PR coming your way to unify this logic first. |
c3db5fe to
a15e5b3
Compare
|
Review status: pkg/storage/replica.go, line 5162 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Ok, reworked this. How's it look now? Comments from Reviewable |
27112: roachpb: replace `gogoproto.onlyone` with `oneof` in BatchRequest/BatchResponse r=nvanbenschoten a=nvanbenschoten All Requests and Responses pass through RequestUnion/ResponseUnion structs when they are added to BatchRequests/BatchResponses. In order to ensure that only one Request type can be assigned to one of these RequestUnion or ResponseUnion structs, we currently use gogoproto's approach to tagged unions: the `gogoproto.onlyone` option. This option was introduced before proto3. Proto3 then added the `oneof` option, which for all intents and purposes addresses the same issue: https://developers.google.com/protocol-buffers/docs/proto#oneof. However, there is one major difference between the two options, which is in their generated code. `gogoproto.onlyone` will generate a single flat struct with pointers to each possible variant type. `oneof` will generate a union interface and an interface "wrapper" struct for each variant type. The effect of this is that `onlyone` will generate code that looks like this: ``` type Union struct { Variant1 *Variant1Type Variant2 *Variant2Type ... } ``` While `oneof` will generate code the looks like this: ``` type Union struct { Value isUnion_Value } type isUnion_Value interface { ... } type Union_Variant1 struct { Variant1 *Variant1Type } type Union_Variant2 struct { Variant2 *Variant2Type } ``` There are pretty obvious tradeoffs to each. For one, `oneof` introduces an extra layer of indirection, which forces an extra allocation. It also doesn't generate particularly useful setters and getters. On the other hand, `onlyone` creates a large struct that grows linearly with the number of variants. Neither approach is great, and there has been **A LOT** of discussion on this: - golang/protobuf#78 - golang/protobuf#283 - gogo/protobuf#103 - gogo/protobuf#168 Clearly neither approach is ideal, ergonomically or with regard to performance. However, over time, the tradeoff has been getting worse for us and it's time we consider switching over to `oneof` in `RequestUnion` and `ResponseUnion`. These structs have gotten huge as more and more request variants have been added: `RequestUnion` has grown to **328 bytes** and `ResponseUnion` has grown to **320 bytes**. It has gotten to the point where the wasted space is non-negligible. This change switches over to `oneof` to shrink these union structs down to more manageable sizes (16 bytes each). The downside of this is that in reducing the struct size we end up introducing an extra allocation. This isn't great, but we can avoid the extra allocation in some places (like `BatchRequest.CreateReply`) by grouping the allocation with that of the Request/Response itself. We've seen previous cases like #4216 where adding in an extra allocation/indirection is a net-win if it reduces a commonly used struct's size significantly. The other downside to this change is that the ergonomics of `oneof` aren't quite as nice as `gogo.onlyone`. Specifically, `gogo.onlyone` generates getters and setters called `GetValue` and `SetValue` that provide access to the wrapped `interface{}`, which we can assert to a `Request`. `oneof` doesn't provide such facilities. This was the cause of a lot of the discussions linked above. While it we be nice for this to be resolved upstream, I think we've waited long enough (~3 years) for a resolution to those discussions. For now, we'll just generate the getters and setters ourselves. This change demonstrated about a **5%** improvement when running kv95 on my local laptop. When run on a three-node GCE cluster (4 vCPUs), the improvements were less pronounced but still present. kv95 showed a throughput improvement of **2.4%**. Running kv100 showed a much more dramatic improvement of **18%** on the three-node GCE cluster. I think this is because kv100 is essentially a hot loop where all reads miss because the cluster remains empty, so it's the best-case scenario for this change. Still, the impact was shocking. Release note (performance improvement): Reduce the memory size of commonly used Request and Response objects. 27114: opt/sql: fix explain analyze missing option r=asubiotto a=asubiotto ConstructExplain previously ignored the ANALYZE option so any EXPLAIN ANALYZE statement would result in execution as an EXPLAIN (DISTSQL) statement. The ANALYZE option is now observed in ConstructExplain. Additionally, the stmtType field from the explainDistSQLNode has been removed because it was not necessary and it was unclear how to pass this from the `execFactory`. Release note: None 27116: Makefile: learn that roachtest depends on optimizer-generated code r=benesch a=benesch As mentioned in cd4415c, the Makefile will one day be smart enough to deduce this on its own, but for now it's simpler to explicitly list the commands that require generated code. Note that the simple but coarse solution of assuming that all commands depend on generated code is inviable as some of these commands are used to generate the code in the first place. Release note: None 27119: storage: extract replica unlinking into store method r=tschottdorf a=benesch Extract some code that was duplicated in three places into a dedicated helper method. Prerequisite for #27061. Release note: None Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: Alfonso Subiotto Marqués <alfonso@cockroachlabs.com> Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
|
Reviewed 2 of 2 files at r2. pkg/storage/replica.go, line 5215 at r2 (raw file):
In the interest of future proofing, can you remove the placeholder if there is one (in reality, today, the merge looks like it will commit and should remove the placeholder in the process by destroying the RHS, right?). pkg/storage/store.go, line 2373 at r1 (raw file):
Is there some obscure requirement here about some uninitialized replica for the given RangeID? Comments from Reviewable |
a15e5b3 to
46f04e6
Compare
|
Ugh, I had to rework this slightly. What I was doing before was wrong. It's much easier to handle the uninitialized case separately from the initialized case. PTAL. Review status: pkg/storage/replica.go, line 5215 at r2 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/storage/store.go, line 2373 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Oops, yes. #11657 I updated the comment to be far more clear. Comments from Reviewable |
8ab71a0 to
779cd00
Compare
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious, what was the problem with the old code?
Reviewed 2 of 2 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/storage/replica.go, line 5215 at r3 (raw file):
r.store.mu.Lock() if _, ok := r.store.mu.replicasByKey.Get(placeholder).(*ReplicaPlaceholder); ok { log.Fatalf(ctx, "merge did not remove placeholder %+v from replicasByKey", placeholder)
Just for OCD, would you Unlock here and below before Fatal'ing? This won't matter in most cases, but if the stringer for something passed into the Fatal would need the store lock, it would deadlock in a nondescript way.
779cd00 to
d7c9c46
Compare
benesch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I had plumbed opts.AllowPlaceholders was preventing the placeholder from getting removed immediately from replicasByKey. Whoops. (It was getting removed on the next Raft tick or something, so it wasn't causing problems, but still wonky.) Anyway, it's much easier to reason about the code if uninitialized replicas never make it to removeReplicaImpl.
Thanks for the review!
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/storage/replica.go, line 5215 at r3 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
Just for OCD, would you Unlock here and below before Fatal'ing? This won't matter in most cases, but if the stringer for something passed into the Fatal would need the store lock, it would deadlock in a nondescript way.
Done.
|
bors r=tschottdorf |
Merge conflict |
bdarnell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r3, 1 of 1 files at r4.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 2 stale)
Fix a known correctness bug in the merge locking scheme. See the comment updates within for details. As an added benefit, this fix frees store.RemoveReplica of conditionals to deal with uninitialized replicas. There are no tests for this because testing this particular edge case is quite difficult and I'm primarily interested in avoiding the obviously wrong call to batch.ClearRange(nil, nil) that occurs when calling store.RemoveReplica on an unintialized replica. This nonsensical ClearRange is, somehow, not currently an error, but it is preventing landing. Release note: None
d7c9c46 to
0712a55
Compare
|
bors r=tschottdorf,bdarnell |
27061: storage: improve safety of merge lock r=tschottdorf,bdarnell a=benesch Fix a known correctness bug in the merge locking scheme. See the comment updates within for details. As an added benefit, this fix frees store.RemoveReplica of conditionals to deal with uninitialized replicas. Acquiring the merge lock now always results in an initialized right-hand replica, so store.MergeRange is always removing an initialized right-hand replica. There are no tests for this because testing this particular edge case is quite difficult and I'm primarily interested in avoiding the obviously wrong call to batch.ClearRange(nil, nil) that occurs when calling store.RemoveReplica on an unintialized replica. This nonsensical ClearRange is, somehow, not currently an error, but it is preventing #26872 ("engine: require iterators to specify an upper bound") from landing. Release note: None --- P.S. The `storage.RemoveOptions` struct now bundles precisely one parameter. Let me know if you'd prefer I change back the signature of `store.removeReplicaInternal` to take the bool directly. Bool parameters are frowned upon, though, so I wasn't sure what was preferred. Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
Build succeeded |
Fix a known correctness bug in the merge locking scheme. See the comment
updates within for details.
As an added benefit, this fix frees store.RemoveReplica of conditionals
to deal with uninitialized replicas. Acquiring the merge lock now always
results in an initialized right-hand replica, so store.MergeRange is
always removing an initialized right-hand replica.
There are no tests for this because testing this particular edge case is
quite difficult and I'm primarily interested in avoiding the obviously
wrong call to batch.ClearRange(nil, nil) that occurs when calling
store.RemoveReplica on an unintialized replica. This nonsensical
ClearRange is, somehow, not currently an error, but it is preventing
#26872 ("engine: require iterators to specify an upper bound") from
landing.
Release note: None
P.S. The
storage.RemoveOptionsstruct now bundles precisely one parameter. Let me know if you'd prefer I change back the signature ofstore.removeReplicaInternalto take the bool directly. Bool parameters are frowned upon, though, so I wasn't sure what was preferred.