[Bug Report] Slow scanning with huge amounts of videos #2860

Closed
don20aba opened this issue Aug 29, 2022 · 2 comments
Labels
bug report (Bug reports that are not yet verified), duplicate

Comments

@don20aba

With huge amounts of videos, scanning local video content gets slower and slower.
(Re-created this bug report due to account problems.)

General info:
Windows 10
Stash v0.16.1
stash-go.sqlite size: ~110MB
DB contents: ~25k scenes, ~200k images

Short description and possible solution:
The problem lies in the screenshots and vtt folders having too many files in a single folder. This gets more noticeable the more files you have, and especially so when the screenshots and vtt folders (generated\screenshots, generated\vtt) are on an SMB network share. The Stash app itself is located on a local SSD (c:).
A solution would be to restructure those folders the same way thumbnails are stored (two levels deep), so there are never many files in a single folder.

A few tests to illustrate the problem
Test 1:
I moved the whole s:\stash\generated\screenshots folder somewhere else:
s:\stash\generated\screenshots - empty, smb share
s:\stash\generated\thumbnails - 200k+ files, stored in two level deep folders, smb share
s:\stash\generated\vtt - ~50k files, smb share

time="2022-08-24 16:36:05" level=info msg="Calculating oshash for t:\dl\1\something120.mp4 ..."
time="2022-08-24 16:36:05" level=info msg="t:\dl\1\something120.mp4 doesn't exist. Creating new item..."
time="2022-08-24 16:36:05" level=debug msg="Creating thumbnail for t:\dl\1\something120.mp4"
time="2022-08-24 16:36:05" level=debug msg="created thumbnail: s:\stash\generated\screenshots\34f85dfe4387a676.thumb.jpg"
time="2022-08-24 16:36:05" level=debug msg="Creating screenshot for t:\dl\1\something120.mp4"
time="2022-08-24 16:36:05" level=debug msg="created screenshot: s:\stash\generated\screenshots\34f85dfe4387a676.jpg"
time="2022-08-24 16:36:14" level=info msg="[generator] generating phash sprite for t:\dl\1\something120.mp4"
time="2022-08-24 16:36:18" level=info msg="[generator] generating sprite image for t:\dl\1\something120.mp4"
time="2022-08-24 16:36:36" level=info msg="[generator] generating sprite vtt for t:\dl\1\something120.mp4"
time="2022-08-24 16:36:36" level=info msg="Calculating oshash for t:\dl\1\something120.mp4 ..."

The first part (thumbnail and screenshot creation) is fast since the screenshots folder is empty.

Test 2:
I moved the whole vtt folder somewhere else and moved the screenshots folder back:
s:\stash\generated\screenshots - ~65k files, smb share
s:\stash\generated\thumbnails - 200k+ files, stored in two level deep folders, smb share
s:\stash\generated\vtt - empty, smb share

time="2022-08-24 16:57:36" level=info msg="Calculating oshash for t:\dl\1\something220.mp4 ..."
time="2022-08-24 16:57:36" level=info msg="t:\dl\1\something220.mp4 doesn't exist. Creating new item..."
time="2022-08-24 16:57:47" level=debug msg="Creating thumbnail for t:\dl\1\something220.mp4"
time="2022-08-24 16:57:53" level=debug msg="created thumbnail: s:\stash\generated\screenshots\84a7340be6934e32.thumb.jpg"
time="2022-08-24 16:57:53" level=debug msg="Creating screenshot for t:\dl\1\something220.mp4"
time="2022-08-24 16:57:59" level=debug msg="created screenshot: s:\stash\generated\screenshots\84a7340be6934e32.jpg"
time="2022-08-24 16:57:59" level=info msg="[generator] generating phash sprite for t:\dl\1\something220.mp4"
time="2022-08-24 16:57:59" level=info msg="[generator] generating sprite image for t:\dl\1\something220.mp4"
time="2022-08-24 16:58:07" level=info msg="[generator] generating sprite vtt for t:\dl\1\something220.mp4"
time="2022-08-24 16:58:07" level=info msg="Calculating checksum for t:\dl\1\something221.mp4..."

The second part (the sprite steps) is fast since the vtt folder is empty.

Test 3:
I moved both the screenshots and vtt folders somewhere else:
s:\stash\generated\screenshots - empty, smb share
s:\stash\generated\thumbnails - 200k+ files, stored in two level deep folders, smb share
s:\stash\generated\vtt - empty, smb share

time="2022-08-24 17:02:44" level=info msg="Calculating oshash for t:\dl\1\something250.mp4 ..."
time="2022-08-24 17:02:44" level=info msg="t:\dl\1\something250.mp4 doesn't exist. Creating new item..."
time="2022-08-24 17:02:44" level=debug msg="Creating thumbnail for t:\dl\1\something250.mp4"
time="2022-08-24 17:02:44" level=debug msg="created thumbnail: s:\stash\generated\screenshots\380b43d21343734c.thumb.jpg"
time="2022-08-24 17:02:44" level=debug msg="Creating screenshot for t:\dl\1\something250.mp4"
time="2022-08-24 17:02:44" level=debug msg="created screenshot: s:\stash\generated\screenshots\380b43d21343734c.jpg"
time="2022-08-24 17:02:44" level=info msg="[generator] generating phash sprite for t:\dl\1\something250.mp4"
time="2022-08-24 17:02:44" level=info msg="[generator] generating sprite image for t:\dl\1\something250.mp4"
time="2022-08-24 17:03:14" level=info msg="[generator] generating sprite vtt for t:\dl\1\something250.mp4"
time="2022-08-24 17:03:14" level=info msg="Calculating oshash for t:\dl\1\something251.mp4 ..."

Everything is fast since both folders are empty.

Test 4:
I moved both the screenshots and vtt folders back:
s:\stash\generated\screenshots - ~65k files, smb share
s:\stash\generated\thumbnails - 200k+ files, stored in two level deep folders, smb share
s:\stash\generated\vtt - ~50k files, smb share

time="2022-08-24 17:23:43" level=info msg="Calculating oshash for t:\dl\1\something291.mp4 ..."
time="2022-08-24 17:23:43" level=info msg="t:\dl\1\something291.mp4 doesn't exist. Creating new item..."
time="2022-08-24 17:23:55" level=debug msg="Creating thumbnail for t:\dl\1\something291.mp4"
time="2022-08-24 17:24:00" level=debug msg="created thumbnail: s:\stash\generated\screenshots\aeb4b94d5147b534.thumb.jpg"
time="2022-08-24 17:24:00" level=debug msg="Creating screenshot for t:\dl\1\something291.mp4"
time="2022-08-24 17:24:06" level=debug msg="created screenshot: s:\stash\generated\screenshots\aeb4b94d5147b534.jpg"
time="2022-08-24 17:24:16" level=info msg="[generator] generating phash sprite for t:\dl\1\something291.mp4"
time="2022-08-24 17:24:21" level=info msg="[generator] generating sprite image for t:\dl\1\something291.mp4"
time="2022-08-24 17:24:39" level=info msg="[generator] generating sprite vtt for t:\dl\1\something291.mp4"
time="2022-08-24 17:24:39" level=info msg="Calculating oshash for t:\dl\1\something292.mp4 ..."

Slowdown since both folders are populated.

Test 5:
Moved the Stash working dir (db, config, exe, ffmpeg) to the SMB share (s:\stash), just to illustrate the point:
s:\stash\generated\screenshots - ~65k files, smb share
s:\stash\generated\thumbnails - 200k+ files, stored in two level deep folders, smb share
s:\stash\generated\vtt - ~50k files, smb share

time="2022-08-24 18:13:35" level=info msg="Calculating oshash for t:\dl\1\something305.mp4 ..."
time="2022-08-24 18:13:35" level=info msg="t:\dl\1\something305.mp4 doesn't exist. Creating new item..."
time="2022-08-24 18:13:47" level=debug msg="Creating thumbnail for t:\dl\1\something305.mp4"
time="2022-08-24 18:13:52" level=debug msg="created thumbnail: s:\stash\generated\screenshots\a5258d52dc20b525.thumb.jpg"
time="2022-08-24 18:13:52" level=debug msg="Creating screenshot for t:\dl\1\something305.mp4"
time="2022-08-24 18:13:58" level=debug msg="created screenshot: s:\stash\generated\screenshots\a5258d52dc20b525.jpg"
time="2022-08-24 18:14:08" level=info msg="[generator] generating phash sprite for t:\dl\1\something305.mp4"
time="2022-08-24 18:14:13" level=info msg="[generator] generating sprite image for t:\dl\1\something305.mp4"
time="2022-08-24 18:14:34" level=info msg="[generator] generating sprite vtt for t:\dl\1\something305.mp4"
time="2022-08-24 18:14:34" level=info msg="Calculating oshash for t:\dl\1\something292.mp4 ..."

Roughly the same timings as test 4, where the DB was directly on the local SSD.

@gitgiggety, who helped out in the original thread, wanted to see how an ffmpeg one-liner performs (the s:\stash\generated\tmp\ folder is empty):

ffmpeg -v error -y -ss 10 -i "t:\dl\1\something305.mp4" -frames:v 1 -q:v 2 -f image2 s:\stash\generated\tmp\test.jpg

Generating took perhaps half a second
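To repeat this measurement from a program instead of a shell, the same invocation can be built with Go's `os/exec`. `screenshotCmd` is a hypothetical helper; the flags are exactly those of the one-liner above:

```go
package main

import (
	"os/exec"
	"strconv"
)

// screenshotCmd mirrors the one-liner:
// ffmpeg -v error -y -ss 10 -i <input> -frames:v 1 -q:v 2 -f image2 <output>
func screenshotCmd(ffmpeg, input, output string, seekSeconds int) *exec.Cmd {
	return exec.Command(ffmpeg,
		"-v", "error", // only print errors
		"-y", // overwrite the output file without asking
		"-ss", strconv.Itoa(seekSeconds), // seek before -i (input seeking)
		"-i", input,
		"-frames:v", "1", // grab a single video frame
		"-q:v", "2", // high JPEG quality (lower value = better)
		"-f", "image2", // image output muxer
		output,
	)
}
```

Wrapping `cmd.Run()` with `time.Now()` and `time.Since` would give the elapsed time for each output folder.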
Additional tests with folders full of files (~65k and ~50k):

ffmpeg -v error -y -ss 10 -i "t:\dl\1\something305.mp4" -frames:v 1 -q:v 2 -f image2 s:\stash\generated\screenshots\test.jpg

~3s

ffmpeg -v error -y -ss 10 -i "t:\dl\1\something305.mp4" -frames:v 1 -q:v 2 -f image2 s:\stash\generated\vtt\test.jpg

~3s

@don20aba added the bug report label on Aug 29, 2022
@gitgiggety
Contributor

Thanks for reporting back. So you do have the generated folder on a network drive, which makes it slow; that's roughly what I expected, given that I couldn't reproduce it.

Moving the files into subdirectories would then indeed be the solution, but personally I wouldn't know how to handle the transition. One option is a migration that moves the existing files (which might be terribly slow when they are on a network drive). Another is to remove everything and let the user manually start a generate task, although I'm unsure whether that's even possible here, as these files are normally created during import and generating them isn't optional. I do believe the latter approach, asking the user to run a generate task after upgrading, has been used before. Either way, it's up to @WithoutPants to decide how to handle the migration.

> Generating took perhaps half a second
> Additional tests with folders full of files (~65k and ~50k):

For what it's worth: Stash does the former, i.e. it invokes ffmpeg with a file in the \tmp\ dir as the output destination, and afterwards Stash itself moves the file. So the tests aren't 100% comparable, but they at least confirm that file operations are slow on those massive folders when writing to them over SMB.

@WithoutPants
Collaborator

Duplicate of #2824. Please post to that issue instead of creating a new one.

@WithoutPants closed this as not planned on Oct 6, 2022

3 participants