-
Notifications
You must be signed in to change notification settings - Fork 457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
db: add LazyValue for a value that may not be stored in-place with th… #2012
Conversation
The Pebble benchmark numbers are slightly slower, but they are a bit noisy in that another run on my gceworker had master slower than this PR. The only consistently slower one is
|
That +26% was noise. There's nothing odd in the cpu profile. Rerunning on gceworker
And on my macbook
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still wrapping my head around the LazyValue interface. left some comments that are just verbalized attempts at under understanding
Reviewed all commit messages.
Reviewable status: 0 of 41 files reviewed, 5 unresolved discussions (waiting on @sumeerbhola)
-- commits
line 11 at r2:
This is necessary because of the new extended MVCC Value encoding, where a non-zero length value no longer guarantees it's not a tombstone?
-- commits
line 44 at r2:
Nice.
internal/base/lazy_value.go
line 100 at r2 (raw file):
// sstable iterator may not even be open. LazyValue.Value() will still // work (at a higher cost), but since the sstable iterator is not open, it // cannot satisfy P2. For this reason, the LazyValue.Value() method
what does "satisfy P2" mean here? does it mean 'it has no mechanism to know when the retrieved value is no longer in use?' since the cloned LazyValue has an infinite lifetime? I see the general problem described under P2, but I don't think I understand the definition of P2.
internal/base/lazy_value.go
line 237 at r2 (raw file):
} bufLen := len(buf) buf = append(buf, lv.ValueOrHandle...)
Is buf ever-growing then? And if the caller doesn't want that, they're free to reslice the returned buf before supplying it to Clone again?
Could this copy be prevented by with an alternative interface? In a reverse iteration case where the current value is inlined, the copy is unnecessary if the underlying block doesn't end up being released. Could this copy be done lazily, if the iterator is going to step to the previous block but the caller has currently 'pinned' the value that they currently care about (P1) to a value in the previous block?
Is the passing around of LazyValue as values with indefinite lifetimes in conflict with P1?
The following isn't an earnest interface suggestion, but talking about it might help me understand LazyValue and its relationship to P1 & P2:
type InternalIterator interface {
ValuePin
// ...
}
type ValuePin interface {
// Pin pins the current value, ensuring that subsequent calls to Value return
// the value of the key-value pair under the current iterator position until
// either Unpin or Pin is called again.
//
// If the iterator must copy the value to preserve access, the provided byte
// slice will be used.
Pin(buf []byte)
// Unpin unpins the previous pin. Subsequent calls to Value will return the value at
// the current iterator position.
Unpin()
// Value returns the pinned value, if currently pinned, or the value at the current iterator
// position if currently unpinned. The return LazyValue is only valid until unpinned, if
// if pinned, or the next iterator operation if currently unpinned.
Value() LazyValue
}
type LazyValue struct { /* ... */}
func (lv LazyValue) Len() int {}
func (lv LazyValue) Attributes() ShortAttributes {}
func (lv LazyValue) Get() []byte {}
Does the above codify [P1] & [P2] at the expense of not being able to lazily retrieve values from multiple previous iterator positions?
sstable/block_property.go
line 82 at r2 (raw file):
// stored together with the key, and may not even be retrieved during // compactions. If Pebble is configured with such value separation, block // properties must only apply to the key.
Is this saying that when a value is separated, the block property collector won't be supplied with the value at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: 0 of 41 files reviewed, 5 unresolved discussions (waiting on @jbowens)
Previously, jbowens (Jackson Owens) wrote…
This is necessary because of the new extended MVCC Value encoding, where a non-zero length value no longer guarantees it's not a tombstone?
Correct
Previously, jbowens (Jackson Owens) wrote…
Nice.
This mostly fell out of the design.
It was getting hard to reason about when a copy should be made, pebble.Iterator
might have stepped the InternalIterator
but pebbleMVCCScanner
with a LazyValue
should not have to know. Earlier pebbleMVCCScanner
did not care whether the unstable value was backed by a slice in pebble.Iterator
or in InternalIterator
, and we want something analogous now.
internal/base/lazy_value.go
line 100 at r2 (raw file):
what does "satisfy P2" mean here? does it mean 'it has no mechanism to know when the retrieved value is no longer in use?'
Yes, it is 'it has no mechanism to know when the retrieved value is no longer in use'. I've changed the phrasing in the comment to say this.
since the cloned LazyValue has an infinite lifetime? I see the general problem described under P2, but I don't think I understand the definition of P2.
Yes the cloned LazyValue has an infinite lifetime, but that is not quite the issue -- cloning a LazyValue only makes the LazyValue lifetime infinite and says nothing about the underlying value. And we've said that the the caller, by calling LazyValue.Value, is relieving the iterator tree of the promise regarding stability of the value returned from a previous LazyValue.Value. The problem is that we can't transmit this "relief of the promise" to a iterator that is not open. In the implementation, what actually retrieves the value in this case is a valueBlockReader that has temporarily "reopened" itself, and it will "close" itself after doing the retrieval. We could make that valueBlockReader
do an allocation, but I am wary of allocations. Since the caller is the one who is long-lived, it seemed better to have it provide a pre-allocated buffer.
internal/base/lazy_value.go
line 237 at r2 (raw file):
Is buf ever-growing then? And if the caller doesn't want that, they're free to reslice the returned buf before supplying it to Clone again?
Yes, the caller should reslice to buf[:0]
before supplying again. The only reason to do it this way was to support callers that are packing multiple things into the same buffer. Specifically, pebbleMVCCScanner
does:
p.savedBuf = append(p.savedBuf[:0], p.curRawKey...)
p.savedBuf = append(p.savedBuf, p.curRawValue...)
Regarding the ValuePin
interface, it seems that if one moves the iterator after doing a Pin
one can't look at the LazyValue
at the current iterator position without calling Unpin
. So even the attributes are not available for both. That is narrower than what is in this PR. Also, with the PR one can keep a sequence of k LazyValues (k > 2, but small, though we don't currently have a use case for this) and sequentially call LazyValue.Value on each.
In a reverse iteration case where the current value is inlined, the copy is unnecessary if the underlying block doesn't end up being released. Could this copy be done lazily, if the iterator is going to step to the previous block but the caller has currently 'pinned' the value that they currently care about (P1) to a value in the previous block?
I assume the optimization was not the primary goal behind ValuePin
, so I will ignore that part for now. I think what this is trying to do is make pinning/unpinning more explicit via the interface, vs the scheme in this PR.
The PR keeps LazyValue mostly self-contained in order to avoid traversing the iterator tree again when manipulating the value. In addition to the cost of that traversal, doing such a traversal makes my head hurt because the iterator tree may have changed. So the only reference to the iterator tree is via the opaque ValueFetcher
which in practice will point directly to an open valueBlockReader
(the closed valueBlockReader
will retain enough state that it can open the sstable again). It seems to me that ValuePin
will need to traverse the iterator tree, so I am not sure how that will work under the covers.
btw, there is a deficiency in the Clone()/Value() combo in that it does not make it clear whether the value is already owned: say Alice has called Clone() previously and now calls Value() and gets callerOwned=false, and the value is in-place, then in fact it is already caller owned because this in-place value was previously cloned. I don't think we have a use case that would benefit from this, which is why I didn't add support to introspect whether a LazyValue is in-place or not.
sstable/block_property.go
line 82 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Is this saying that when a value is separated, the block property collector won't be supplied with the value at all?
Yes, I think to make this clean if a Pebble instance has enabled value separation (initially via Options) we could use that knowledge and always pass nil as the value (regardless of whether it is separated or not).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
Reviewed 27 of 41 files at r1, 11 of 12 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @sumeerbhola)
…e key The changes here are plumbing for the read path. The actual write and read path for values stored in value blocks will happen in future PRs. A ShortAttribute (3 bits) is also introduced, for user-defined attributes about a value. These will be included in the 1 byte prefix of the value in the key-value pairs in the sstable (the other 5 bits are for internal use). In CockroachDB, the ShortAttribute will initially be used for marking MVCC tombstones (for which 1 bit is sufficient). LazyValue can represent either an in-place value (stored with the key) or a value stored in the value section. The interface is general enough for the future where values may be stored in separate files. LazyValue is used in the InternalIterator interface, such that all positioning calls return (*InternalKey, LazyValue). It is also exposed via the public Iterator for callers that need to remember a recent but not necessarily latest LazyValue, in case they need the actual value in the future. An example is a caller that is iterating in reverse and looking for the latest MVCC version for a key -- it cannot identify the latest MVCC version without stepping to the previous key-value pair e.g. storage.pebbleMVCCScanner in CockroachDB. There is subtlety in memory management: - A LazyValue returned by an InternalIterator or Iterator is unstable in that repositioning the iterator will invalidate the memory inside it. A caller wishing to maintain that LazyValue needs to call LazyValue.Clone. Note that this does not fetch the value if it is not in-place. - A user of an iterator that calls LazyValue.Value wants as much as possible for the returned value byte-slice to point to iterator owned memory. - [P1] The underlying iterator that owns that memory also needs a promise from that user that at any time there is at most one value byte-slice that the caller is expecting it to maintain. Otherwise, the underlying iterator has to maintain multiple such byte-slices which results in more complicated and inefficient code. Note that pre-LazyValue this promise was trivially provided by only asking for the value at the current iterator position (so it could certainly point inside the underlying ssblock). For other cases, the user would make a copy of the byte-slice. We are generalizing this promise which allows for more memory allocation optimization. - [P2] The underlying iterator, in order to make the promise that it is maintaining the one value byte-slice, also needs a way to know when it is relieved of that promise. It is relieved of that promise by being told that it is being repositioned (this behavior is unchanged by this PR). Specifically, the owner of the value byte-slice is a sstable iterator. Consider the case where the caller has used LazyValue.Clone and repositioned the iterator. In this case the sstable iterator may not even be open. LazyValue.Value will still work (at a higher cost), but since the sstable iterator is not open, it cannot satisfy P2. For this reason, the LazyValue.Value method accepts a caller owned buffer, that the callee will use if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebased. Will merge when CI is green.
Reviewable status: 28 of 41 files reviewed, all discussions resolved (waiting on @jbowens)
Two test failures, both ignorable:
|
…e key
The changes here are plumbing for the read path. The actual write and read path for values stored in value blocks will happen in future PRs.
A ShortAttribute (3 bits) is also introduced, for user-defined attributes about a value. These will be included in the 1 byte prefix of the value in the key-value pairs in the sstable (the other 5 bits are for internal use). In CockroachDB, the ShortAttribute will initially be used for marking MVCC tombstones (for which 1 bit is sufficient).
LazyValue can represent either an in-place value (stored with the key) or a value stored in the value section. The interface is general enough for the future where values may be stored in separate files.
LazyValue is used in the InternalIterator interface, such that all positioning calls return (*InternalKey, LazyValue). It is also exposed via the public Iterator for callers that need to remember a recent but not necessarily latest LazyValue, in case they need the actual value in the future. An example is a caller that is iterating in reverse and looking for the latest MVCC version for a key -- it cannot identify the latest MVCC version without stepping to the previous key-value pair e.g. storage.pebbleMVCCScanner in CockroachDB.
There is subtlety in memory management:
A LazyValue returned by an InternalIterator or Iterator is unstable in that repositioning the iterator will invalidate the memory inside it. A caller wishing to maintain that LazyValue needs to call LazyValue.Clone. Note that this does not fetch the value if it is not in-place.
A user of an iterator that calls LazyValue.Value wants as much as possible for the returned value byte-slice to point to iterator owned memory.
Informs #1170