sstable: lazy load the index block in single- and two- level iterator #4916
Conversation
Force-pushed from b0f66ca to f203b44
I'd expect the (constructor + first access) time to be similar, any ideas why it's so much larger after the change?
sstable/simple_benchmark_test.go
line 93 at r1 (raw file):
// BenchmarkConstructionPlusFirstAccess measures construction + first index access
// How to: go test -bench=BenchmarkConstructionPlusFirstAccess -run=^$ ./sstable
func BenchmarkConstructionPlusFirstAccess(b *testing.B) {
[nit] all these benchmarks should start with BenchmarkIterator...
sstable/comparison_benchmark_test.go
line 44 at r1 (raw file):
})
b.Run("SimulatedEagerLoading_ConstructOnly", func(b *testing.B) {
I don't understand all these benchmark variants (and the three files). We should have just:
- construction
- construction+first
- construction+seekge
- construction+seek-prefix-ge with no hit in filter
- maybe construction+seek-prefix-ge with existing key
They should be in a different commit so we can easily obtain and show before and after values.
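For illustration, here is a rough sketch of how those variants could be grouped in a single benchmark so before/after numbers line up per variant. The `benchIter` interface and `stubIter` placeholder are hypothetical, not code from this package; a real benchmark would construct an actual sstable iterator in the loop body.

```go
package bench

import "testing"

// benchIter abstracts the positioning calls exercised below; in a real
// benchmark this would be the sstable iterator under test.
type benchIter interface {
	First()
	SeekGE(key []byte)
	SeekPrefixGE(prefix []byte)
	Close() error
}

// stubIter is a placeholder; a real benchmark would construct an iterator
// over a pre-built test sstable instead.
type stubIter struct{}

func (stubIter) First()                     {}
func (stubIter) SeekGE(key []byte)          {}
func (stubIter) SeekPrefixGE(prefix []byte) {}
func (stubIter) Close() error               { return nil }

// BenchmarkIterator groups the requested variants under one benchmark.
func BenchmarkIterator(b *testing.B) {
	variants := []struct {
		name string
		op   func(it benchIter)
	}{
		{"construct", func(benchIter) {}},
		{"construct+first", func(it benchIter) { it.First() }},
		{"construct+seekge", func(it benchIter) { it.SeekGE([]byte("key")) }},
		{"construct+seek-prefix-ge-filter-miss", func(it benchIter) { it.SeekPrefixGE([]byte("absent")) }},
		{"construct+seek-prefix-ge-existing", func(it benchIter) { it.SeekPrefixGE([]byte("key")) }},
	}
	for _, v := range variants {
		b.Run(v.name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				var it benchIter = stubIter{} // construction would happen here
				v.op(it)
				_ = it.Close()
			}
		})
	}
}
```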
sstable/reader_iter_single_lvl.go
line 173 at r1 (raw file):
clearForResetBoundary struct{}
// Lazy loading flag - must be below clearForResetBoundary to survive pool reuse
Why does this need to survive pool reuse?
In any case, we should put it next to another bool so it doesn't unnecessarily increase the struct size by a word
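For reference, a minimal standalone example of the struct-size point (generic structs, not the actual iterator type): a lone bool sandwiched between pointer-sized fields grows the struct by a full word of padding, while two bools placed next to each other share it.

```go
package main

import (
	"fmt"
	"unsafe"
)

// padded scatters its bools between pointer-sized fields, so each bool drags
// in 7 bytes of padding on a 64-bit platform.
type padded struct {
	p1 *int
	a  bool
	p2 *int
	b  bool
}

// packed places the bools next to each other, so they share one word.
type packed struct {
	p1 *int
	p2 *int
	a  bool
	b  bool
}

func main() {
	fmt.Println(unsafe.Sizeof(padded{})) // 32 on 64-bit platforms
	fmt.Println(unsafe.Sizeof(packed{})) // 24 on 64-bit platforms
}
```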
sstable/reader_iter_single_lvl.go
line 451 at r1 (raw file):
}
// Load the next block.
bhp, err := i.indexIter().BlockHandleWithProperties()
We don't need to keep calling the function each time.
sstable/reader_iter_single_lvl.go
line 1585 at r1 (raw file):
func (i *singleLevelIterator[I, PI, D, PD]) indexIter() PI {
    if err := i.ensureIndexLoaded(); err != nil {
[nit] since it's a hot path, I'd move the `if i.indexLoaded` check here, in case `ensureIndexLoaded()` doesn't get inlined.
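A generic sketch of the fast-path/slow-path split being suggested (names are illustrative, not the actual iterator code): the cheap flag check stays at the call site, so the common already-loaded case costs one branch even if the loader function is too large for the compiler to inline.

```go
package sketch

// lazyIter stands in for an iterator that defers its index-block read.
type lazyIter struct {
	indexLoaded bool
	index       indexBlockIter
}

// indexBlockIter is a placeholder for the decoded index block iterator.
type indexBlockIter struct{}

// loadIndexSlow holds the bulky read logic; a function of this size may be
// beyond the compiler's inlining budget.
func (i *lazyIter) loadIndexSlow() error {
	// ... issue the index-block read and decode it ...
	i.indexLoaded = true
	return nil
}

// indexIter keeps the cheap flag check inline, so the common already-loaded
// case never pays a call into loadIndexSlow.
func (i *lazyIter) indexIter() (*indexBlockIter, error) {
	if !i.indexLoaded {
		if err := i.loadIndexSlow(); err != nil {
			return nil, err
		}
	}
	return &i.index, nil
}
```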
Let me address that in the separate PR after split.
sstable/reader_iter_single_lvl.go
line 173 at r1 (raw file):
Previously, RaduBerinde wrote…
Why does this need to survive pool reuse?
In any case, we should put it next to another bool so it doesn't unnecessarily increase the struct size by a word
You are right. This is unnecessary. Moved it above clearForResetBoundary.
sstable/reader_iter_single_lvl.go
line 451 at r1 (raw file):
Previously, RaduBerinde wrote…
We don't need to keep calling the function each time.
This is existing code replaced with the `indexIter()` wrapper. Tests would fail if we use `PI(&i.index)`.
sstable/reader_iter_single_lvl.go
line 1585 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] since it's a hot path, I'd move the `if i.indexLoaded` check here, in case `ensureIndexLoaded()` doesn't get inlined.
Updated. I'm always under the impression that the compiler is smart enough these days, anyway :p
sstable/comparison_benchmark_test.go
line 44 at r1 (raw file):
Previously, RaduBerinde wrote…
I don't understand all these benchmark variants (and the three files). We should have just:
- construction
- construction+first
- construction+seekge
- construction+seek-prefix-ge with no hit in filter
- maybe construction+seek-prefix-ge with existing key
They should be in a different commit so we can easily obtain and show before and after values.
hmmm, ok. I will split the PR.
sstable/simple_benchmark_test.go
line 93 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] all these benchmarks should start with `BenchmarkIterator...`
ack
Force-pushed from f203b44 to 86145b6
sstable/reader_iter_single_lvl.go
line 451 at r1 (raw file):
Previously, xxmplus (Andy Xu) wrote…
This is existing code replaced with the `indexIter()` wrapper. Tests would fail if we use `PI(&i.index)`.
We already called `indexIter()` right above and returned if it wasn't valid; `PI(&i.index)` should be fine here.
sstable/reader_iter_single_lvl.go
line 1582 at r2 (raw file):
if !i.indexLoaded {
    if err := i.ensureIndexLoaded(); err != nil {
        i.err = err
Is it always safe to use the `PI(&i.index)` returned value even if the load failed? E.g. I believe the code `i.indexIter().BlockHandleWithProperties()` will crash if we hit an error here.
Also, this introduces a subtle place where `i.err` gets set; existing code might be modifying `i.err` with the assumption that it was nil before.
I think it would be cleaner to just call `ensureIndexLoaded()` and return any error at the beginning of every function that needs it, and then proceed to use `PI(&i.index)` as usual.
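A minimal sketch of that suggestion, with illustrative names only (not the real iterator types): the load-and-check happens once at the top of a method that needs the index, the error is returned to the caller, and the rest of the body uses the index field directly.

```go
package sketch

// lazyIndexIter stands in for an iterator whose index block is read lazily.
type lazyIndexIter struct {
	indexLoaded bool
	index       []byte // placeholder for the decoded index block
}

// ensureIndexLoaded performs the deferred read on first call and is a no-op
// afterwards.
func (i *lazyIndexIter) ensureIndexLoaded() error {
	if i.indexLoaded {
		return nil
	}
	// ... issue the index-block read here ...
	i.indexLoaded = true
	return nil
}

// First follows the suggested shape: load (or fail) up front, then use the
// index field as the pre-lazy-loading code did.
func (i *lazyIndexIter) First() ([]byte, error) {
	if err := i.ensureIndexLoaded(); err != nil {
		return nil, err
	}
	// ... position using i.index as before ...
	return i.index, nil
}
```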
sstable/reader_iter_single_lvl.go
line 1582 at r2 (raw file):
> Is it always safe to use the `PI(&i.index)` returned value even if the load failed?

It is not. This is indeed a good catch! We should always check for nil to be safe. Let me fix that in the next revision. Btw, when nil is detected, say, before calling `i.indexIter().BlockHandleWithProperties()`, what is the correct behavior? Is returning nil fine? I assume panicking is not ok, right?
> existing code might be modifying `i.err` with the assumption that it was nil before.

I don't see existing code checking `i.err` before setting it; what's the correct handling behavior?
> I think it would be cleaner to just call `ensureIndexLoaded()` and return any error at the beginning of every function that needs it, and then proceed to use `PI(&i.index)` as usual.

I'm afraid it would defeat the purpose of lazy loading. We do want to delay index loading until it's actually required. Also, all references to `PI(&i.index)` are replaced by `indexIter()`. If we mix the usage of the two, it will be really confusing in the future.
Previously, xxmplus (Andy Xu) wrote…
Of course we would call it later (after the bloom filter check) in the one method where it matters (SeekPrefixGE). Methods need to return a nil KV on errors. We can change `indexIter()` to assert (when `invariants.Enabled`) that the filter is loaded, and use a separate error-returning `ensureIndexLoaded()` to do the load.
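A minimal sketch of the shape described above (illustrative types and names only, not pebble's actual iterator or filter APIs): the bloom filter check comes before the deferred index load, so a filter miss never pays the index-block read, and a load error surfaces to the caller as a nil KV.

```go
package sketch

// kv and prefixFilter are stand-ins for the real key/value and filter types.
type kv struct{ Key, Value []byte }

type prefixFilter interface {
	MayContain(prefix []byte) bool
}

type lazySeekIter struct {
	filter      prefixFilter
	indexLoaded bool
	err         error
}

// ensureIndexLoaded performs the deferred index-block read on first use.
func (i *lazySeekIter) ensureIndexLoaded() error {
	if i.indexLoaded {
		return nil
	}
	// ... read and decode the index block here ...
	i.indexLoaded = true
	return nil
}

// SeekPrefixGE consults the bloom filter before ever touching the index block,
// so a filter miss skips the index-block I/O entirely; a load error is
// recorded and reported as a nil KV, matching the convention described above.
func (i *lazySeekIter) SeekPrefixGE(prefix, key []byte) *kv {
	if i.filter != nil && !i.filter.MayContain(prefix) {
		return nil // filter rejected the prefix: no index-block read at all
	}
	if err := i.ensureIndexLoaded(); err != nil {
		i.err = err
		return nil
	}
	// ... position within the now-loaded index, then the data block ...
	return &kv{Key: key}
}
```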
This commit implements index block lazy loading in the single-level iterator as the default behavior. The index block load is now deferred until first access. At the same time, it leads to semantic changes because the order of file reads changes. Here is a quick overview of the change.

Before (eager loading)

1. Iterator Construction:
   1.1 the constructor calls readTopLevelIndexBlock()
   1.2 readTopLevelIndexBlock() calls readIndexBlock()
   1.3 readIndexBlock() calls blockReader.Read()
   1.4 blockReader.Read() calls objstorage ReadHandle ReadAt()
   1.5 ReadAt() loads the index block
   1.6 construction completes
2. First Positioning (e.g., First(), SeekGE(), SeekPrefixGE()):
   2.1 the positioning function uses the already-loaded index via PI(&i.index)
   2.2 file opened for data read
   2.3 ReadAt() loads the data block

After (lazy loading)

1. Iterator Construction:
   1.1 the constructor sets indexLoaded = false to defer loading
   1.2 construction completes (no I/O)
2. First Positioning (e.g., First(), SeekGE(), SeekPrefixGE()):
   2.1 the positioning function calls the new indexIter() wrapper
   2.2 indexIter() calls ensureIndexLoaded()
   2.3 ensureIndexLoaded() calls ReadAt()
   2.4 ReadAt() loads the index block
   2.5 the newly-loaded index is used for positioning
   2.6 ReadAt() loads the data block

As seen above, the I/O operation is now deferred. While the overall behavior is transparent to users, the internal semantics are different now, and the data-driven tests have to change accordingly. See the order-of-events changes in the checkpoint, cleaner, and event_listener test files. What's interesting (optimized) is the change observed in flushable_ingest, which demonstrates the I/O saving from the optimization.

```
-# When the key doesn't pass the bloom filter, we should see only two block
-# reads.
+# When the key doesn't pass the bloom filter, we should see only one block
+# read due to lazy loading optimization - bloom filter is checked first,
+# and if it rejects the key, we can avoid loading the index block entirely.
get with-fs-logging small-00001-does-not-exist
----
-read-at(158, 41): 000004.sst
read-at(199, 74): 000004.sst
```

Implements cockroachdb#3248
Force-pushed from 86145b6 to 6425752
sstable/reader_iter_single_lvl.go
line 1582 at r2 (raw file):
Previously, RaduBerinde wrote…
Of course we would call it later (after the bloom filter check) in the one method where it matters (SeekPrefixGE). Methods need to return a nil KV on errors. We can change indexIter() to assert (when invariants.Enabled) that the filter is loaded, and use a separate error-returning ensureIndexLoaded() to do the load.
Alright, all `indexIter()` usages are removed. We now continue to use `PI(&i.index)` as before and call `ensureIndexLoaded()` when needed.
This commit implements index block lazy loading in the two-level iterator as the default behavior. The index block load is now deferred until first access. Implements cockroachdb#3248
This commit implements index block lazy loading in the single-level iterator as the default behavior. The index block load is now deferred until first access. At the same time, it leads to semantic changes because the order of file reads changes.
Here is a quick overview of the change.
Before (eager loading)

1. Iterator Construction:
   1.1 the constructor calls readTopLevelIndexBlock()
   1.2 readTopLevelIndexBlock() calls readIndexBlock()
   1.3 readIndexBlock() calls blockReader.Read()
   1.4 blockReader.Read() calls objstorage ReadHandle ReadAt()
   1.5 ReadAt() loads the index block
   1.6 construction completes
2. First Positioning (e.g., First(), SeekGE(), SeekPrefixGE()):
   2.1 the positioning function uses the already-loaded index via PI(&i.index)
   2.2 file opened for data read
   2.3 ReadAt() loads the data block

After (lazy loading)

1. Iterator Construction:
   1.1 the constructor sets indexLoaded = false to defer loading
   1.2 construction completes (no I/O)
2. First Positioning (e.g., First(), SeekGE(), SeekPrefixGE()):
   2.1 the positioning function calls the new indexIter() wrapper
   2.2 indexIter() calls ensureIndexLoaded()
   2.3 ensureIndexLoaded() calls ReadAt()
   2.4 ReadAt() loads the index block
   2.5 the newly-loaded index is used for positioning
   2.6 ReadAt() loads the data block
As seen above, the I/O operation is now deferred. While the overall behavior is transparent to users, the internal semantics are different now, and the data-driven tests have to change accordingly. See the order-of-events changes in the checkpoint, cleaner, and event_listener test files.
What's interesting (optimized) is the change observed in flushable_ingest, which demonstrates the I/O saving from the optimization.
Next is a comparison between eager loading and lazy loading. We should collect metrics on how often SeekPrefixGE() saves I/O, how often an iterator is created but never used, and how much the deferred loading improves memory usage, if such metrics are not available today (TODO).
Benchmarks will be moved to a separate PR.
While one may argue the amortized cost is similar in the long run, lazy loading could also help with perceived performance. In other words, the average might be the same in theory, but the common case is indeed optimized.

Implements #3248