Improve random sorting algorithm #4246

DingDongSoLong4 · 2023-10-26T16:11:37Z

This implements a new random sorting algorithm, which is both faster and "more random" than the previous implementation.

These are performance numbers I get for the initial SQL query of a random sort on the images page, using a database of ~2.5M images:

| | Time | SQL |
|---|---|---|
| Before | 650-680ms | `SELECT DISTINCT images.id FROM images ORDER BY (substr(images.id * 0.1583058238029480, length(images.id) + 2)) ASC, COALESCE(images.title, images.id) COLLATE NATURAL_CI ASC LIMIT 40 OFFSET 0` |
| After | 250-280ms | `SELECT DISTINCT images.id FROM images ORDER BY mod((images.id + 15830583) * (images.id + 15830583) * 52959209 + (images.id + 15830583) * 1047483763, 2147483647) ASC, COALESCE(images.title, images.id) COLLATE NATURAL_CI ASC LIMIT 40 OFFSET 0` |
| Created At | 150-170ms | `SELECT DISTINCT images.id FROM images ORDER BY images.created_at ASC, COALESCE(images.title, images.id) COLLATE NATURAL_CI ASC LIMIT 40 OFFSET 0` |

I've included a sort on Created At as a comparison, since that column is not indexed and SQLite therefore needs to do a full table scan, just as it does for a random sort.

As for the randomness, the previous algorithm was effectively computing `rand = images.id * seed % 100000000`. If you plot `id` against `rand`, the resulting pattern is extremely obvious, and many seed values produce terrible results. Some particularly bad seeds I found: 10000005, 11111111, 55555555, 99999995 (try them with e.g. `http://localhost:9999/scenes?sortby=random_{seed}`). Those are admittedly a bit contrived, but I've definitely come across bad seeds by chance several times during normal use of stash.
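To make the problem with the old key concrete, here is a minimal Python sketch of the integer approximation described above. It is not the actual `substr`-based SQL expression, and the helper names `old_rand` and `old_order` are made up for illustration.

```python
MODULUS = 100_000_000


def old_rand(image_id: int, seed: int) -> int:
    """Integer approximation of the previous sort key: id * seed % 100000000."""
    return image_id * seed % MODULUS


def old_order(ids, seed):
    """Return the ids sorted by the old key, breaking ties by id."""
    return sorted(ids, key=lambda i: (old_rand(i, seed), i))


if __name__ == "__main__":
    # 10000005 is one of the bad seeds listed above.
    print(old_order(range(1, 101), 10000005)[:10])
    # → [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
```

With this seed, every multiple of 10 gets a tiny key (`5 * id`), so those ids come out first and in ascending order, which is the kind of visible pattern a plot of `id` against `rand` reveals.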

I've "borrowed" the new algorithm from a comment on StackOverflow. When plotted, its values look far more random, and it gives far better output, with no "broken" seed values.
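For reference, the expression in the "After" query can be sketched in Python as a quadratic in `id + seed`, reduced modulo 2147483647. The constants are taken from that query; the names `new_rand` and `new_order` are hypothetical. Python integers never overflow, so this sketch assumes exact integer arithmetic, and its keys may differ from what SQLite computes once the intermediate products exceed the 64-bit integer range.

```python
PRIME = 2_147_483_647  # 2**31 - 1, the modulus used in the query
A = 52_959_209         # coefficient of the squared term in the query
B = 1_047_483_763      # coefficient of the linear term in the query


def new_rand(image_id: int, seed: int) -> int:
    """Sort key from the 'After' query, using exact integer arithmetic."""
    x = image_id + seed
    return (x * x * A + x * B) % PRIME


def new_order(ids, seed):
    """Return the ids sorted by the new key, breaking ties by id."""
    return sorted(ids, key=lambda i: (new_rand(i, seed), i))


if __name__ == "__main__":
    # 15830583 is the seed offset that appears in the query above.
    print(new_order(range(1, 101), 15830583)[:10])
```

Plotting `new_rand(i, seed)` against `i` for a range of ids is a quick way to compare its scatter with that of the old key.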

As I mention in a comment, ideally we'd be using a custom function, since that would allow the use of uints and overflow rather than floats, but that was much slower than even the previous algorithm in my testing.
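To illustrate what a custom function with unsigned wraparound could look like, here is a minimal sketch using Python's `sqlite3` module rather than stash's own code. The function name `rand_key`, the 32-bit mask, and the reuse of the constants from the query are all assumptions for illustration; this is not the variant that was benchmarked, and it says nothing about its performance.

```python
import sqlite3

MASK32 = 0xFFFFFFFF  # keep only the low 32 bits, emulating uint32 overflow


def rand_key(image_id: int, seed: int) -> int:
    """Hypothetical sort key using 32-bit unsigned wraparound arithmetic."""
    x = (image_id + seed) & MASK32
    return (x * x * 52959209 + x * 1047483763) & MASK32


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call rand_key(id, seed).
conn.create_function("rand_key", 2, rand_key)

conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO images (id) VALUES (?)", [(i,) for i in range(1, 101)]
)

# Order by the custom key, as a random sort query might.
rows = conn.execute(
    "SELECT id FROM images ORDER BY rand_key(id, ?) ASC LIMIT 5",
    (15830583,),
).fetchall()
print([row[0] for row in rows])
```

A function registered this way is called once per row through the host language, which is one plausible reason such an approach can end up slower than a pure SQL expression.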

Improve random sorting algorithm

67f1d4c

WithoutPants added this to the Version 0.24.0 milestone Nov 2, 2023

WithoutPants added the improvement label Nov 2, 2023

WithoutPants merged commit d965587 into stashapp:develop Nov 2, 2023
2 checks passed

DingDongSoLong4 deleted the random-sort-perf branch November 2, 2023 10:30

halkeye pushed a commit to halkeye/stash that referenced this pull request Sep 1, 2024

Improve random sorting algorithm (stashapp#4246)

45e255c
