This implements a new random sorting algorithm, which is both faster and "more random" than the previous implementation.
These are performance numbers I get for the initial SQL query of a random sort on the images page, using a database of ~2.5M images:
Previous algorithm:
SELECT DISTINCT images.id FROM images ORDER BY (substr(images.id * 0.1583058238029480, length(images.id) + 2)) ASC, COALESCE(images.title, images.id) COLLATE NATURAL_CI ASC LIMIT 40 OFFSET 0
New algorithm:
SELECT DISTINCT images.id FROM images ORDER BY mod((images.id + 15830583) * (images.id + 15830583) * 52959209 + (images.id + 15830583) * 1047483763, 2147483647) ASC, COALESCE(images.title, images.id) COLLATE NATURAL_CI ASC LIMIT 40 OFFSET 0
Sort on Created At:
SELECT DISTINCT images.id FROM images ORDER BY images.created_at ASC, COALESCE(images.title, images.id) COLLATE NATURAL_CI ASC LIMIT 40 OFFSET 0
I've included a sort on Created At as a comparison, since that column is not indexed and thus SQLite needs to do a full table scan, like it does for a random sort.

As for the randomness, the previous algorithm was effectively doing `rand = images.id * seed % 100000000`. If you plot a graph of `id` vs `rand`, the resulting pattern is extremely obvious. As a result, many seed values produce terrible results. Some particularly bad seeds I found: `10000005`, `11111111`, `55555555`, `99999995` (use in e.g. http://localhost:9999/scenes?sortby=random_{seed}). Those are admittedly a bit contrived, but I've definitely come across bad seeds by chance several times while using stash normally.

I've "borrowed" the new algorithm from a comment on StackOverflow. Its values are far more random when plotted, and give far better output, with no "broken" seed values.
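
To make the difference concrete, here is a rough Python sketch of the two sort keys, using the simplified form of the old formula given above and the constants from the new query. The function names and the `first_page` helper are made up for illustration. Python integers are exact, whereas SQLite falls back to floating point once 64-bit integer arithmetic overflows, so the new keys here will not necessarily match what SQLite computes. Treat it as a way to see the pattern rather than a reproduction of the real ordering.

```python
OLD_MODULUS = 100_000_000
NEW_MODULUS = 2_147_483_647  # 2**31 - 1, the modulus used in the new query


def old_key(image_id, seed):
    # Simplified form of the previous algorithm: id * seed % 100000000.
    return image_id * seed % OLD_MODULUS


def new_key(image_id, seed):
    # Quadratic mix from the new query, evaluated with exact integers
    # (assumption: SQLite itself may switch to floats on overflow).
    x = image_id + seed
    return (x * x * 52959209 + x * 1047483763) % NEW_MODULUS


def first_page(key, seed, ids=range(1, 41)):
    # Hypothetical helper: order a small range of ids by the given key,
    # breaking ties by id.
    return sorted(ids, key=lambda i: (key(i, seed), i))


bad_seed = 10000005  # one of the bad seeds listed above
old_page = first_page(old_key, bad_seed)
new_page = first_page(new_key, bad_seed)

print("old:", old_page[:8])
print("new:", new_page[:8])
```

With this seed, the old key puts ids 10, 20, 30, 40 first, followed by 1, 11, 21, 31: the ids end up grouped by their last digit, which is the kind of visible pattern described above.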
As I mention in a comment, ideally we'd be using a custom function, since that would allow the use of `uint`s and overflow rather than floats, but that was much slower than even the previous algorithm in my testing.
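
For reference, a custom function with wrap-around unsigned arithmetic could look roughly like the sketch below. It uses Python's `sqlite3` module as a stand-in for the real driver. The name `rand_key_u32`, the 32-bit width, and the toy `images` table are all assumptions for illustration, not what this PR implements. The constants are reused from the new query.

```python
import sqlite3

MASK32 = 0xFFFFFFFF  # keep only the low 32 bits, emulating uint32 overflow


def rand_key_u32(image_id, seed):
    # Same quadratic mix as the new query, but wrapping at 32 bits
    # instead of relying on SQLite's float fallback.
    x = (image_id + seed) & MASK32
    return (x * x * 52959209 + x * 1047483763) & MASK32


conn = sqlite3.connect(":memory:")
# Register the Python function so it can be called from SQL.
conn.create_function("rand_key_u32", 2, rand_key_u32)

# Toy table standing in for the real images table.
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO images (id) VALUES (?)", [(i,) for i in range(1, 101)])

rows = conn.execute(
    "SELECT id FROM images ORDER BY rand_key_u32(id, ?) ASC, id ASC LIMIT 10",
    (15830583,),
).fetchall()
first_page = [row[0] for row in rows]
print(first_page)
```

Every row has to call back out of SQLite into the registered function, which is one plausible reason a custom function ends up slower than a plain SQL expression.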