
Conversation

@max-hoffman (Contributor) commented Apr 11, 2024

Add read planning for journal getMany so that we do sequential IO instead of random IO. Sequential IO has a large effect on disk-backed storage systems. Also parallelize getMany calls. The biggest change here is the semantics around how reads lock the journal file: reads now hold the journal lock for longer, which could lower write throughput in some cases. If we see evidence of this, we can do more work to limit how long batch reads hold the journal lock without interruption.
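
As a rough, standalone illustration of the read-planning idea (the `journalRead` type and `planReads` helper below are hypothetical names, not the nbs implementation): sort the requested records by their offset in the journal file and coalesce adjacent ranges, so the batch is served by a few large sequential reads instead of many small random ones. Each planned range can then be handed to a worker, which is where parallelizing getMany comes in.

```go
package main

import (
	"fmt"
	"sort"
)

// journalRead is a hypothetical (offset, length) range in the journal file.
type journalRead struct {
	offset uint64
	length uint64
}

// planReads sorts requests by file offset and merges ranges that touch or
// overlap, turning a random-access batch into a sequential read plan.
func planReads(reqs []journalRead) []journalRead {
	if len(reqs) == 0 {
		return nil
	}
	sort.Slice(reqs, func(i, j int) bool { return reqs[i].offset < reqs[j].offset })

	plan := []journalRead{reqs[0]}
	for _, r := range reqs[1:] {
		last := &plan[len(plan)-1]
		if r.offset <= last.offset+last.length {
			// Extend the previous range rather than issuing a new read.
			if end := r.offset + r.length; end > last.offset+last.length {
				last.length = end - last.offset
			}
		} else {
			plan = append(plan, r)
		}
	}
	return plan
}

func main() {
	reqs := []journalRead{{900, 50}, {0, 100}, {100, 200}, {950, 25}}
	fmt.Println(planReads(reqs)) // [{0 300} {900 75}]
}
```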

@max-hoffman (Contributor, Author)

#benchmark


@coffeegoddd (Contributor)

@max-hoffman DOLT

| test_name | from_latency_median | to_latency_median | is_faster |
| --- | --- | --- | --- |
| tpcc-scale-factor-1 | 223.34 | 130.13 | 1 |

| test_name | server_name | server_version | tps | test_name | server_name | server_version | tps | is_faster |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| tpcc-scale-factor-1 | dolt | a6ce239 | 14.19 | tpcc-scale-factor-1 | dolt | c3af984 | 16.71 | 0 |

@coffeegoddd (Contributor)

@max-hoffman DOLT

comparing_percentages: 100.000000 to 100.000000

| version | result | total |
| --- | --- | --- |
| c3af984 | ok | 5937457 |

| version | total_tests |
| --- | --- |
| c3af984 | 5937457 |

correctness_percentage: 100.0

@coffeegoddd (Contributor)

@max-hoffman DOLT

| read_tests | from_latency_median | to_latency_median | is_faster |
| --- | --- | --- | --- |
| covering_index_scan | 3.02 | 2.97 | 0 |
| groupby_scan | 17.63 | 17.63 | 0 |
| index_join | 5.28 | 5.18 | 0 |
| index_join_scan | 2.3 | 2.3 | 0 |
| index_scan | 55.82 | 55.82 | 0 |
| oltp_point_select | 0.54 | 0.54 | 0 |
| oltp_read_only | 8.58 | 8.58 | 0 |
| select_random_points | 0.84 | 0.84 | 0 |
| select_random_ranges | 0.99 | 0.99 | 0 |
| table_scan | 55.82 | 55.82 | 0 |
| types_table_scan | 164.45 | 161.51 | 0 |

| write_tests | from_latency_median | to_latency_median | is_faster |
| --- | --- | --- | --- |
| oltp_delete_insert | 6.79 | 6.79 | 0 |
| oltp_insert | 3.36 | 3.43 | 0 |
| oltp_read_write | 16.41 | 16.41 | 0 |
| oltp_update_index | 3.49 | 3.49 | 0 |
| oltp_update_non_index | 3.43 | 3.43 | 0 |
| oltp_write_only | 7.84 | 7.98 | 0 |
| types_delete_insert | 7.56 | 7.56 | 0 |

max-hoffman changed the title from "[nbs] parallelize journal getMany" to "[nbs] getMany read planning and parallelize" on Apr 11, 2024
@max-hoffman (Contributor, Author)

#benchmark

max-hoffman requested a review from reltuk on April 11, 2024 at 21:50

@coffeegoddd (Contributor)

@max-hoffman DOLT

comparing_percentages: 100.000000 to 100.000000

| version | result | total |
| --- | --- | --- |
| f47c827 | ok | 5937457 |

| version | total_tests |
| --- | --- |
| f47c827 | 5937457 |

correctness_percentage: 100.0

@coffeegoddd (Contributor)

@max-hoffman DOLT

| test_name | from_latency_median | to_latency_median | is_faster |
| --- | --- | --- | --- |
| tpcc-scale-factor-1 | 84.47 | 90.78 | 0 |

| test_name | server_name | server_version | tps | test_name | server_name | server_version | tps | is_faster |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| tpcc-scale-factor-1 | dolt | 5d5847f | 0.47 | tpcc-scale-factor-1 | dolt | f47c827 | 17.03 | -1 |

@coffeegoddd (Contributor)

@max-hoffman DOLT

| read_tests | from_latency_median | to_latency_median | is_faster |
| --- | --- | --- | --- |
| covering_index_scan | 3.02 | 3.02 | 0 |
| groupby_scan | 17.63 | 17.95 | 0 |
| index_join | 5.28 | 5.28 | 0 |
| index_join_scan | 2.26 | 2.3 | 0 |
| index_scan | 54.83 | 54.83 | 0 |
| oltp_point_select | 0.53 | 0.53 | 0 |
| oltp_read_only | 8.58 | 8.58 | 0 |
| select_random_points | 0.84 | 0.84 | 0 |
| select_random_ranges | 0.99 | 0.99 | 0 |
| table_scan | 55.82 | 55.82 | 0 |
| types_table_scan | 161.51 | 158.63 | 0 |

| write_tests | from_latency_median | to_latency_median | is_faster |
| --- | --- | --- | --- |
| oltp_delete_insert | 6.79 | 6.79 | 0 |
| oltp_insert | 3.36 | 3.43 | 0 |
| oltp_read_write | 16.41 | 16.41 | 0 |
| oltp_update_index | 3.49 | 3.49 | 0 |
| oltp_update_non_index | 3.43 | 3.43 | 0 |
| oltp_write_only | 7.84 | 7.84 | 0 |
| types_delete_insert | 7.56 | 7.56 | 0 |

@coffeegoddd (Contributor)

@max-hoffman DOLT

comparing_percentages: 100.000000 to 100.000000

| version | result | total |
| --- | --- | --- |
| ba06fd4 | ok | 5937457 |

| version | total_tests |
| --- | --- |
| ba06fd4 | 5937457 |

correctness_percentage: 100.0

@coffeegoddd (Contributor)

@max-hoffman DOLT

comparing_percentages: 100.000000 to 100.000000

| version | result | total |
| --- | --- | --- |
| 8d0834c | ok | 5937457 |

| version | total_tests |
| --- | --- |
| 8d0834c | 5937457 |

correctness_percentage: 100.0

@coffeegoddd (Contributor)

@max-hoffman DOLT

comparing_percentages: 100.000000 to 100.000000

| version | result | total |
| --- | --- | --- |
| 6699fca | ok | 5937457 |

| version | total_tests |
| --- | --- |
| 6699fca | 5937457 |

correctness_percentage: 100.0

@reltuk (Contributor) left a comment

This looks good to me, other than the race in getMany. I don't know how to measure the impact on the lock duration here vs. what we used to have before...it's something we can always come back and fix though. I think this is good to go if we fix the race.

The simplest way to fix the race might be to move the getManyCompressed impl into a helper function with a signature something like:

```go
func (s journalChunkSource) getManyCompressed_ext(ctx context.Context, eg *errgroup.Group, reqs []getRecord, found func(context.Context, CompressedChunk) error, stats *Stats) (bool, error) {
```

and then to wrap it in both getMany and getManyCompressed...
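
For concreteness, here is a rough sketch of that wrapping under the signature above; the receiver, parameter types, and callback shapes are assumptions for illustration, not the exact Dolt interfaces:

```go
// Sketch only: journalChunkSource, getRecord, CompressedChunk, Stats, and the
// found-callback shapes are taken loosely from the signature above, not from
// the real code.
func (s journalChunkSource) getManyCompressed(ctx context.Context, eg *errgroup.Group, reqs []getRecord, found func(context.Context, CompressedChunk), stats *Stats) (bool, error) {
	// Compressed path: forward each chunk unchanged.
	return s.getManyCompressed_ext(ctx, eg, reqs, func(ctx context.Context, cc CompressedChunk) error {
		found(ctx, cc)
		return nil
	}, stats)
}

func (s journalChunkSource) getMany(ctx context.Context, eg *errgroup.Group, reqs []getRecord, found func(context.Context, *chunks.Chunk), stats *Stats) (bool, error) {
	// Uncompressed path: decompress inside the callback; a ToChunk error is
	// returned through the helper instead of via an extra eg.Go.
	return s.getManyCompressed_ext(ctx, eg, reqs, func(ctx context.Context, cc CompressedChunk) error {
		ch, err := cc.ToChunk()
		if err != nil {
			return err
		}
		found(ctx, &ch)
		return nil
	}, stats)
}
```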

```go
return s.getManyCompressed(ctx, eg, reqs, func(ctx context.Context, cc CompressedChunk) {
	ch, err := cc.ToChunk()
	if err != nil {
		eg.Go(func() error {
			// ...
```
Contributor

Unfortunately, I don't think this is race-free. In particular, I don't think it's safe to eg.Go() on an errgroup where eg.Wait() may have already been called.

Contributor

Actually, I think I might be wrong here...in particular, errgroup is implemented in terms of a sync.WaitGroup, and I thought all positive delta wg.Add() calls were dangerous after a wg.Wait(), but that's not true...

```go
// Note that calls with a positive delta that occur when the counter is zero
// must happen before a Wait. Calls with a negative delta, or calls with a
// positive delta that start when the counter is greater than zero, may happen
// at any time.
```
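
A minimal, standalone sketch of why that usage is in spec (illustrative code, not the Dolt implementation): the nested eg.Go calls only happen inside a goroutine that the same errgroup is already tracking, so the underlying WaitGroup counter is greater than zero when the nested Add starts, which is exactly the case the doc comment above allows.

```go
package main

import (
	"errors"
	"fmt"

	"golang.org/x/sync/errgroup"
)

func main() {
	var eg errgroup.Group

	eg.Go(func() error {
		// While this goroutine runs, the errgroup's WaitGroup counter is >= 1,
		// so the nested eg.Go calls below start with a nonzero counter and
		// cannot race with the eg.Wait in main.
		for i := 0; i < 3; i++ {
			i := i
			eg.Go(func() error {
				if i == 2 {
					return errors.New("simulated ToChunk failure")
				}
				return nil
			})
		}
		return nil
	})

	// Wait blocks until the outer goroutine and everything it spawned finish.
	fmt.Println(eg.Wait()) // simulated ToChunk failure
}
```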

Contributor

And by, "I think I might be wrong here"...I'm quite sure I'm wrong and your usage is safe / to spec. Sorry for the noise.

@max-hoffman (Contributor, Author)

We've talked about how this may negatively impact mixed workloads, which will now block on reads for longer. The tradeoff is that batch reads complete faster but are less interruptible, because a read holds the journal lock for longer. The win for table scans is pretty obvious; if we see issues with write throughput after adding this, we can follow up with changes that make batch reads hold the journal lock only for short periods of time.
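
If that follow-up ever becomes necessary, one possible shape for it (purely a hypothetical sketch; readRange, readPlanInterruptibly, and the RWMutex stand in for the real journal lock and read plan) is to execute the plan one bounded segment at a time and drop the lock between segments so a waiting writer can get in:

```go
package main

import (
	"fmt"
	"sync"
)

// readRange is a hypothetical planned (offset, length) segment of the journal.
type readRange struct {
	offset, length uint64
}

// readPlanInterruptibly executes a read plan one segment at a time, releasing
// the journal lock between segments so a waiting writer can interleave.
func readPlanInterruptibly(journalLock *sync.RWMutex, plan []readRange, readSegment func(readRange) error) error {
	for _, seg := range plan {
		journalLock.RLock()
		err := readSegment(seg)
		journalLock.RUnlock() // drop the lock so writers aren't blocked for the whole batch
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	var lock sync.RWMutex
	plan := []readRange{{0, 4096}, {4096, 4096}}
	err := readPlanInterruptibly(&lock, plan, func(r readRange) error {
		fmt.Printf("read %d bytes at offset %d\n", r.length, r.offset)
		return nil
	})
	fmt.Println("err:", err)
}
```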

max-hoffman merged commit 463434e into main on Apr 24, 2024
max-hoffman deleted the max/parallelize-get-many branch on April 24, 2024 at 17:21