Conversation

@the-mikedavis (Collaborator)
Looking up `{timestamp, Ts}` offset specs for offset readers previously performed a linear scan through the index file with many individual reads. We can reuse the `idx_skip_search/3` routine used for offset-based offset specs for a faster lookup, as the skip search eliminates large sections of the index file and performs fewer reads.

This resolves the "TODO: optimise using skip search approach" in the old code.
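For readers unfamiliar with the technique, here is a minimal sketch of the skip-search idea. The module name, the `locate/2`/`skip/3`/`scan/2` helpers, the field names, and the hard-coded 2048-entry skip are illustrative assumptions, not the actual `idx_skip_search/3` internals:

```erlang
%% Minimal sketch of the skip-search idea; NOT the real
%% osiris_log:idx_skip_search/3. Layout constants follow the index
%% files built in the shell session below.
-module(skip_search_sketch).
-export([locate/2]).

-define(HEADER_SIZE, 8).   %% "OSIL" magic plus a 32-bit version
-define(RECORD_SIZE, 29).  %% 64 + 64 + 64 + 32 + 8 bits per entry
-define(SKIP, 2048).       %% index entries skipped per probe

locate(IdxFile, Ts) ->
    {ok, Fd} = file:open(IdxFile, [read, raw, binary]),
    try skip(Fd, ?HEADER_SIZE, Ts)
    after file:close(Fd)
    end.

%% Probe the entry ?SKIP records ahead of Pos. If its timestamp is
%% still =< the target, the whole block before it can be discarded.
%% Otherwise the answer lies inside this block: read it with a single
%% pread and scan it in memory.
skip(Fd, Pos, Ts) ->
    ProbePos = Pos + ?SKIP * ?RECORD_SIZE,
    case file:pread(Fd, ProbePos, ?RECORD_SIZE) of
        {ok, <<_ChunkId:64, ProbeTs:64/signed, _/binary>>}
          when ProbeTs =< Ts ->
            skip(Fd, ProbePos, Ts);
        _ ->
            case file:pread(Fd, Pos, ?SKIP * ?RECORD_SIZE) of
                {ok, Block} -> scan(Block, Ts);
                eof -> not_found
            end
    end.

%% Linear scan of one in-memory block: return the first chunk whose
%% timestamp is at or past the target.
scan(<<ChunkId:64/unsigned, ChunkTs:64/signed, _Epoch:64/unsigned,
       FilePos:32/unsigned, _ChunkType:8/unsigned, Rest/binary>>, Ts) ->
    case ChunkTs >= Ts of
        true -> {ChunkId, FilePos};
        false -> scan(Rest, Ts)
    end;
scan(_, _) ->
    not_found.
```

The win over the linear scan is that each discarded block costs one small probe read instead of one read per entry, and the final block is fetched with a single `pread` rather than many.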

@the-mikedavis (Collaborator, author)

Looks like there might be a flake in `osiris_SUITE:retention_overtakes_offset_reader`

@kjnilsson (Contributor)

> Looks like there might be a flake in `osiris_SUITE:retention_overtakes_offset_reader`

Yeah, it looks like a flaky test.

This PR looks good. Did you run the stream tests in rabbit against this?

@the-mikedavis (Collaborator, author)

Not yet! I'll do that now.

@the-mikedavis (Collaborator, author)

All of the rabbit tests look green: rabbitmq/rabbitmq-server#14571

To get some numbers I'm building a segment very slowly on my machine with `perf-test --queue sq --stream-queue --rate 1`. Once the segment rolls over I bet we'll see a big difference in this scenario.

@the-mikedavis (Collaborator, author)

On second thought, it would take a really really long time for that segment file to roll over, so, easier to test this with 'artisanal' index files.

Testing shell session:

I exported `chunk_location_for_timestamp/2` and the old version for this test. Then we build index files with increasing numbers of entries and find the timestamp that should be 95% of the way through the file.

```erlang
> Now = erlang:system_time(millisecond).
> file:write_file("1000.index", <<"OSIL", 1:32/unsigned, << <<N:64/unsigned, (Now + 1000 * N):64/signed, 1:64/unsigned, 0:32/unsigned, 0:8/unsigned>> || N <- lists:seq(1, 1000) >>/binary>>).
> file:write_file("10000.index", <<"OSIL", 1:32/unsigned, << <<N:64/unsigned, (Now + 1000 * N):64/signed, 1:64/unsigned, 0:32/unsigned, 0:8/unsigned>> || N <- lists:seq(1, 10000) >>/binary>>).
> file:write_file("100000.index", <<"OSIL", 1:32/unsigned, << <<N:64/unsigned, (Now + 1000 * N):64/signed, 1:64/unsigned, 0:32/unsigned, 0:8/unsigned>> || N <- lists:seq(1, 100_000) >>/binary>>).
> file:write_file("1000000.index", <<"OSIL", 1:32/unsigned, << <<N:64/unsigned, (Now + 1000 * N):64/signed, 1:64/unsigned, 0:32/unsigned, 0:8/unsigned>> || N <- lists:seq(1, 1_000_000) >>/binary>>).
> file:write_file("10000000.index", <<"OSIL", 1:32/unsigned, << <<N:64/unsigned, (Now + 1000 * N):64/signed, 1:64/unsigned, 0:32/unsigned, 0:8/unsigned>> || N <- lists:seq(1, 10_000_000) >>/binary>>).
```

```erlang
> timer:tc(fun() -> osiris_log:chunk_location_for_timestamp("1000.index", Now + 950 * 1000) end, microsecond).
{137,{950,0}}
> timer:tc(fun() -> osiris_log:old_chunk_location_for_timestamp("1000.index", Now + 950 * 1000) end, microsecond).
{266,{950,0}}

> timer:tc(fun() -> osiris_log:chunk_location_for_timestamp("10000.index", Now + 9500 * 1000) end, microsecond).
{179,{9500,0}}
> timer:tc(fun() -> osiris_log:old_chunk_location_for_timestamp("10000.index", Now + 9500 * 1000) end, microsecond).
{1614,{9500,0}}

> timer:tc(fun() -> osiris_log:chunk_location_for_timestamp("100000.index", Now + 95000 * 1000) end, microsecond).
{311,{95000,0}}
> timer:tc(fun() -> osiris_log:old_chunk_location_for_timestamp("100000.index", Now + 95000 * 1000) end, microsecond).
{14939,{95000,0}}

> timer:tc(fun() -> osiris_log:chunk_location_for_timestamp("1000000.index", Now + 950000 * 1000) end, microsecond).
{2079,{950000,0}}
> timer:tc(fun() -> osiris_log:old_chunk_location_for_timestamp("1000000.index", Now + 950000 * 1000) end, microsecond).
{147279,{950000,0}}

> timer:tc(fun() -> osiris_log:chunk_location_for_timestamp("10000000.index", Now + 9500000 * 1000) end, microsecond).
{18315,{9500000,0}}
> timer:tc(fun() -> osiris_log:old_chunk_location_for_timestamp("10000000.index", Now + 9500000 * 1000) end, microsecond).
{1492878,{9500000,0}}
```

| Num records in index | Old lookup time (μs) | New lookup time (μs) |
| --- | --- | --- |
| 1_000 | 266 | 137 |
| 10_000 | 1_614 | 179 |
| 100_000 | 14_939 | 311 |
| 1_000_000 | 147_279 | 2_079 |
| 10_000_000 | 1_492_878 | 18_315 |

Nice! Even when the index file is smaller than the skip size (2048) we see an improvement, presumably because we're reading the entire index into memory rather than doing many file reads. The skip search doesn't really degrade until we get up to really ridiculously large index sizes.

@kjnilsson (Contributor)

> Nice! Even when the index file is smaller than the skip size (2048) we see an improvement, presumably because we're reading the entire index into memory rather than doing many file reads.

Cool, yeah, I think streams have a chunk limit of 256k anyway to avoid super-sized indexes.

@kjnilsson kjnilsson merged commit 927efa4 into main Sep 19, 2025
6 of 7 checks passed
@kjnilsson kjnilsson deleted the md/timestamp-skip-search branch September 19, 2025 09:29
@the-mikedavis (Collaborator, author)

Ah right, so the index file can be at most 8 + 29 * 256000 = 7424008 bytes (at least by default, without advanced config). I was thinking about trying a spin on binary search, but the skip search already performs very well under the default limit, so it doesn't seem worth the complexity/effort.
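To put that bound in perspective (assuming the 2048-entry skip size mentioned earlier in the thread): a maxed-out default index of 256,000 entries needs at most 256000 / 2048 = 125 probe reads plus one block read of 2048 * 29 = 59,392 bytes, whereas a linear scan could issue up to 256,000 individual reads.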
