Conversation

@the-mikedavis (Collaborator)
Looking up `{timestamp, Ts}` offset specs for offset readers previously performed a linear scan through the index file with many individual reads. We can reuse the `idx_skip_search/3` routine used for offset-based offset specs for a faster lookup, as the skip search eliminates large sections of the index file and performs fewer reads.

This resolves the "TODO: optimise using skip search approach" in the old code.
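For readers unfamiliar with the technique, here is a minimal sketch of the skip-search idea. The module name, the `locate/2`/`skip/3`/`scan/2` helpers, the field names, and the hard-coded 2048-entry skip are illustrative assumptions, not the actual `idx_skip_search/3` internals:

```erlang
%% Minimal sketch of the skip-search idea; NOT the real
%% osiris_log:idx_skip_search/3. Layout constants follow the index
%% files built in the shell session below.
-module(skip_search_sketch).
-export([locate/2]).

-define(HEADER_SIZE, 8).   %% "OSIL" magic plus a 32-bit version
-define(RECORD_SIZE, 29).  %% 64 + 64 + 64 + 32 + 8 bits per entry
-define(SKIP, 2048).       %% index entries skipped per probe

locate(IdxFile, Ts) ->
    {ok, Fd} = file:open(IdxFile, [read, raw, binary]),
    try skip(Fd, ?HEADER_SIZE, Ts)
    after file:close(Fd)
    end.

%% Probe the entry ?SKIP records ahead of Pos. If its timestamp is
%% still =< the target, the whole block before it can be discarded.
%% Otherwise the answer lies inside this block: read it with a single
%% pread and scan it in memory.
skip(Fd, Pos, Ts) ->
    ProbePos = Pos + ?SKIP * ?RECORD_SIZE,
    case file:pread(Fd, ProbePos, ?RECORD_SIZE) of
        {ok, <<_ChunkId:64, ProbeTs:64/signed, _/binary>>}
          when ProbeTs =< Ts ->
            skip(Fd, ProbePos, Ts);
        _ ->
            case file:pread(Fd, Pos, ?SKIP * ?RECORD_SIZE) of
                {ok, Block} -> scan(Block, Ts);
                eof -> not_found
            end
    end.

%% Linear scan of one in-memory block: return the first chunk whose
%% timestamp is at or past the target.
scan(<<ChunkId:64/unsigned, ChunkTs:64/signed, _Epoch:64/unsigned,
       FilePos:32/unsigned, _ChunkType:8/unsigned, Rest/binary>>, Ts) ->
    case ChunkTs >= Ts of
        true -> {ChunkId, FilePos};
        false -> scan(Rest, Ts)
    end;
scan(_, _) ->
    not_found.
```

The win over the linear scan is that each discarded block costs one small probe read instead of one read per entry, and the final block is fetched with a single `pread` rather than many.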

@the-mikedavis (Collaborator, author)

Looks like there might be a flake in `osiris_SUITE:retention_overtakes_offset_reader`

@kjnilsson (Contributor)

> Looks like there might be a flake in `osiris_SUITE:retention_overtakes_offset_reader`

Yeah, it looks like a flaky test.

This PR looks good. Did you run the stream tests in rabbit against this?

@the-mikedavis (Collaborator, author)

Not yet! I'll do that now.

@the-mikedavis (Collaborator, author)

All of the rabbit tests look green: rabbitmq/rabbitmq-server#14571

To get some numbers I'm building a segment very slowly on my machine with `perf-test --queue sq --stream-queue --rate 1`. Once the segment rolls over I bet we'll see a big difference in this scenario.

@the-mikedavis (Collaborator, author)

On second thought, it would take a really really long time for that segment file to roll over, so, easier to test this with 'artisanal' index files.

Testing shell session:

I exported `chunk_location_for_timestamp/2` and the old version for this test. Then we build index files with increasing numbers of entries and find the timestamp that should be 95% of the way through the file.

```erlang
> Now = erlang:system_time(millisecond).
> file:write_file("1000.index", <<"OSIL", 1:32/unsigned, << <<N:64/unsigned, (Now + 1000 * N):64/signed, 1:64/unsigned, 0:32/unsigned, 0:8/unsigned>> || N <- lists:seq(1, 1000) >>/binary>>).
> file:write_file("10000.index", <<"OSIL", 1:32/unsigned, << <<N:64/unsigned, (Now + 1000 * N):64/signed, 1:64/unsigned, 0:32/unsigned, 0:8/unsigned>> || N <- lists:seq(1, 10000) >>/binary>>).
> file:write_file("100000.index", <<"OSIL", 1:32/unsigned, << <<N:64/unsigned, (Now + 1000 * N):64/signed, 1:64/unsigned, 0:32/unsigned, 0:8/unsigned>> || N <- lists:seq(1, 100_000) >>/binary>>).
> file:write_file("1000000.index", <<"OSIL", 1:32/unsigned, << <<N:64/unsigned, (Now + 1000 * N):64/signed, 1:64/unsigned, 0:32/unsigned, 0:8/unsigned>> || N <- lists:seq(1, 1_000_000) >>/binary>>).
> file:write_file("10000000.index", <<"OSIL", 1:32/unsigned, << <<N:64/unsigned, (Now + 1000 * N):64/signed, 1:64/unsigned, 0:32/unsigned, 0:8/unsigned>> || N <- lists:seq(1, 10_000_000) >>/binary>>).
```

```erlang
> timer:tc(fun() -> osiris_log:chunk_location_for_timestamp("1000.index", Now + 950 * 1000) end, microsecond).
{137,{950,0}}
> timer:tc(fun() -> osiris_log:old_chunk_location_for_timestamp("1000.index", Now + 950 * 1000) end, microsecond).
{266,{950,0}}

> timer:tc(fun() -> osiris_log:chunk_location_for_timestamp("10000.index", Now + 9500 * 1000) end, microsecond).
{179,{9500,0}}
> timer:tc(fun() -> osiris_log:old_chunk_location_for_timestamp("10000.index", Now + 9500 * 1000) end, microsecond).
{1614,{9500,0}}

> timer:tc(fun() -> osiris_log:chunk_location_for_timestamp("100000.index", Now + 95000 * 1000) end, microsecond).
{311,{95000,0}}
> timer:tc(fun() -> osiris_log:old_chunk_location_for_timestamp("100000.index", Now + 95000 * 1000) end, microsecond).
{14939,{95000,0}}

> timer:tc(fun() -> osiris_log:chunk_location_for_timestamp("1000000.index", Now + 950000 * 1000) end, microsecond).
{2079,{950000,0}}
> timer:tc(fun() -> osiris_log:old_chunk_location_for_timestamp("1000000.index", Now + 950000 * 1000) end, microsecond).
{147279,{950000,0}}

> timer:tc(fun() -> osiris_log:chunk_location_for_timestamp("10000000.index", Now + 9500000 * 1000) end, microsecond).
{18315,{9500000,0}}
> timer:tc(fun() -> osiris_log:old_chunk_location_for_timestamp("10000000.index", Now + 9500000 * 1000) end, microsecond).
{1492878,{9500000,0}}
```

| Num records in index | Old lookup time (μs) | New lookup time (μs) |
| --- | --- | --- |
| 1_000 | 266 | 137 |
| 10_000 | 1_614 | 179 |
| 100_000 | 14_939 | 311 |
| 1_000_000 | 147_279 | 2_079 |
| 10_000_000 | 1_492_878 | 18_315 |

Nice! Even when the index file is smaller than the skip size (2048) we see an improvement, presumably because we're reading the entire index into memory rather than doing many file reads. The skip search doesn't really degrade until we get up to really ridiculously large index sizes.

@kjnilsson (Contributor)

> Nice! Even when the index file is smaller than the skip size (2048) we see an improvement, presumably because we're reading the entire index into memory rather than doing many file reads.

Cool, yeah, I think streams have a chunk limit of 256k anyway to avoid super-sized indexes.

@kjnilsson kjnilsson merged commit 927efa4 into main Sep 19, 2025
6 of 7 checks passed
@kjnilsson kjnilsson deleted the md/timestamp-skip-search branch September 19, 2025 09:29
@the-mikedavis (Collaborator, author)

Ah right, so the index file can be at most 8 + 29 * 256000 = 7424008 bytes (at least by default, without advanced config). I was thinking about trying a spin on binary search, but the skip search already performs very well under the default limit, so it doesn't seem worth the complexity/effort.
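To put that bound in perspective (assuming the 2048-entry skip size mentioned earlier in the thread): a maxed-out default index of 256,000 entries needs at most 256000 / 2048 = 125 probe reads plus one block read of 2048 * 29 = 59,392 bytes, whereas a linear scan could issue up to 256,000 individual reads.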
