[Bug Report] Slow scanning with huge amounts of videos #2824
Comments
Curious if the problem continues with the file refactor... The generated directories getting huge could be solved with a subdirectory scheme. That is usually done by using 1-2 of the first or last digits as a reference, like putting Axxxxx into an A subdirectory, Bxxxxx into a B subdirectory, etc.
Yes, it seems that is already in use for generated\thumbnails (nesting folders two levels deep, 00 through ff), so it could be reused here as well. Another issue with such big folders is any kind of backup operation. It takes ages to sync those two folders, and an added downside is that any disk operation on that network share is very slow while a backup is in progress. It could be just my way of using it, but I am sure similar problems will happen all over for folders filled with so many files.
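The two-level 00 through ff layout described above can be sketched as follows. This is only an illustration, not Stash's actual code: the function name `sharded_path` is hypothetical, and hashing the file name with SHA-256 to pick the shard is an assumption (if the file names are already hex hashes, their leading characters could be used directly).

```python
import hashlib
from pathlib import Path


def sharded_path(root: Path, filename: str, levels: int = 2) -> Path:
    """Return root/<aa>/<bb>/filename for the default of two levels.

    Hypothetical helper: each shard directory name is one byte (two hex
    characters, 00 through ff) taken from a hash of the file name.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    shards = [digest[2 * i : 2 * i + 2] for i in range(levels)]
    return root.joinpath(*shards, filename)
```

With two levels of 256 directories each there are 65,536 leaf directories, so a flat folder of roughly 50k files would end up with about one file per leaf on average.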
While browsing a big directory is not ideal, it shouldn't be an issue for the generation, as we only stat the files while looking for existing ones.
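The difference between checking one known path and enumerating a directory can be measured with a small sketch like the one below. All names here (`exists_by_stat`, `exists_by_listing`, `time_call`, `compare`) are hypothetical and nothing is taken from Stash's code; it only contrasts the two access patterns discussed in this thread.

```python
import os
import tempfile
import time


def exists_by_stat(directory: str, name: str) -> bool:
    # Asks the file system about one exact path; no directory listing is read here.
    return os.path.exists(os.path.join(directory, name))


def exists_by_listing(directory: str, name: str) -> bool:
    # Enumerates entries until a match is found; in the worst case every entry is read.
    with os.scandir(directory) as entries:
        return any(entry.name == name for entry in entries)


def time_call(fn, *args) -> float:
    # Wall-clock duration of a single call, in seconds.
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def compare(n_files: int = 1000):
    # Fills a temporary directory with empty files, then times both lookups.
    with tempfile.TemporaryDirectory() as directory:
        for i in range(n_files):
            open(os.path.join(directory, f"{i:06d}.jpg"), "w").close()
        target = f"{n_files - 1:06d}.jpg"
        return (
            time_call(exists_by_stat, directory, target),
            time_call(exists_by_listing, directory, target),
        )
```

Calling `compare()` against a local temporary directory will likely show a small gap; pointing the two lookup functions at a folder on a network share with tens of thousands of files is the more relevant experiment for this report.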
True about not impacting speed inside Stash with big folders, but for any other operation on said folder, things get slower and slower the more files there are inside. Yes, I know there are a few additional operations done on movies. That is deliberate and expected to take some time (looking forward to the feature request mentioned here, if it ever gets added, in combination with sprites. Explained here: #2206). I wanted to show the full log.
Same problem here... General info: |
Figured I’d chime in. I’m currently sitting at 115,727 scenes, 161,253 galleries, and 19,546,134 images. I have noticed a similar problem, but it is not due to the quantity of the content. I’ve noticed that the issue is prominent with specific content. In my case, the issue is with Bangbros galleries. Scanning my Bangbros directory, which only contains around 2k scenes with galleries, can take 6-10 hours to complete. Keep in mind this is the case even when no new content was added to that directory. Again, I want to mention that the issue is with the galleries in my case. I haven’t dug much deeper yet to get a better idea as to why the Bangbros galleries are so problematic. For now, I just avoid doing full scans on that directory. When not scanning that directory, Stash breezes through everything else.
I've looked into this by adding many more log statements and enabling microsecond timestamps in the log. I'm mostly seeing such a slowdown between "Creating new item" and "generating phash sprite", due to screenshot generation. This is always done and isn't conditional (contrary to what @bnkai mentioned above), and for me it takes just over half a second. For example:
So it is my belief that this slowdown is due to that, also because if I revert to an older database and rerun the scan, this slowdown doesn't happen. This is because it quickly determines that the screenshot files already exist, and it thus doesn't have to run ffmpeg to generate them. (Which might explain why it's fast when you start with a fresh database, if you didn't clear the generated screenshots directory as well.) One thing I'm wondering about is whether it would be better to not generate both the screenshot and the thumbnail from the video, but only generate the screenshot from the video and just resize that image into the thumbnail. Then the video doesn't have to be read twice. Edit:
Currently the thumbnail is generated based on the video, but this is slow, especially when the video is on network storage. So instead only generate the screenshot based on the video, and resize that image for the thumbnail. Refs: stashapp#2824
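The idea in this commit message can be sketched as two ffmpeg invocations, where only the first one touches the video. This is not the code from the referenced change and not the command line Stash actually runs; the function names are hypothetical and the options are just common ffmpeg flags chosen for illustration (`-ss`, `-frames:v`, `-vf scale=`).

```python
import subprocess
from typing import List


def screenshot_cmd(ffmpeg: str, video: str, out_image: str, at_seconds: float) -> List[str]:
    # Seek into the video and write exactly one frame as an image.
    return [ffmpeg, "-y", "-ss", str(at_seconds), "-i", video, "-frames:v", "1", out_image]


def thumbnail_cmd(ffmpeg: str, screenshot: str, out_image: str, width: int) -> List[str]:
    # The input is the already generated screenshot, so the video is not read again.
    # scale=<width>:-2 keeps the aspect ratio and rounds the height to an even number.
    return [ffmpeg, "-y", "-i", screenshot, "-vf", f"scale={width}:-2", out_image]


def generate_both(ffmpeg: str, video: str, screenshot: str, thumbnail: str,
                  at_seconds: float = 10, width: int = 320) -> None:
    # One pass over the (possibly remote) video, one pass over a small local image.
    subprocess.run(screenshot_cmd(ffmpeg, video, screenshot, at_seconds), check=True)
    subprocess.run(thumbnail_cmd(ffmpeg, screenshot, thumbnail, width), check=True)
```

The second command only reads a single image file, which is why it should be much cheaper than decoding the video a second time over network storage.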
When trying with a new DB, I meant I just copied config.yml from the old location to a new folder (along with stash-win.exe and the ffmpeg files, so it uses the same ones). The "generated" folder is no more, same for the db. I reran a quick test on such a new setup (with debug logging): ~1MB file:
~2GB file:
Again "Creating new item" is just a fraction of a second, as is everything else apart from "generating sprite image". Big (old) DB with accompanying folders again:
~18MB file
There seems to be something off here. The new install is fast; the old one is not on most of the tasks.
Thank you for the clarification and the log including debug messages. So it seems like there already is a 10 second delay between "Creating new item" and "Creating thumbnail", and creating both images takes about 5 seconds each. The PR I created, #2839, should avoid a second slow pass over the image, so it should reduce one of those 5 seconds to (well) under a second. The remaining time is hard for me to explain, but I will try to look into it. For example, in the 10 seconds from "Creating new item" to "Creating thumbnail" there is a lot of stuff going on. The information of the video is read (duration, video and audio codecs, resolution, etc.), which might or might not be slow, but it could also be that inserting the item into the database is slow. I guess your empty test installation stays more or less empty? So you're not importing the entire collection? That obviously makes for a much smaller database, in which it shouldn't be that hard / slow to insert new items, for example. And if there is a massive slowdown in the screenshot & thumbnail generation because of the directory containing lots of files, you wouldn't notice that either when the directory is empty.
Forgot to ask, could you try to run the following command and report how long it took? In other words: whether it's instant, or takes a second or longer:
This will generate a screenshot at 10 seconds into the given video at the given path. This is also what Stash executes to generate the screenshot. Note you might have to look up the location of the ffmpeg executable (and it might be ffmpeg.exe as well). This is to determine whether it's ffmpeg which is slow or whether it's Stash. But, which I didn't notice before either, the generated image isn't stored in the destination folder; it's put in a temp folder and moved from there. So even if ffmpeg for some reason would scan the output folder, it should be a more or less empty folder anyway.
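For anyone wanting a number rather than an impression of "instant or not", a command can be timed with a small wrapper such as the sketch below. The helper name `time_command` is hypothetical, and the command to run is whatever ffmpeg invocation is being tested; no ffmpeg arguments are assumed here.

```python
import subprocess
import sys
import time
from typing import List


def time_command(cmd: List[str]) -> float:
    """Run cmd to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    # Output is discarded so that console printing does not distort the timing.
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


# Example with a trivial command (starting the Python interpreter itself):
# elapsed = time_command([sys.executable, "-c", "pass"])
# print(f"{elapsed:.3f} s")
```

Running the same command once against an empty output folder and once against a folder with tens of thousands of files would make the comparison in this thread repeatable.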
So I've just done a test and generated a database with 30,000 scenes (1 performer, 1 tag, etc.). I've also copied the contents of the generated/screenshot folder 10 times (the folder contained 1,600 genuine screenshots, thumbs and previews; copying them 10 times thus creates over 16,000 files). I've then removed the genuine files and started an import of my video collection, without enabling any of the extra "generators", so it just imports the item and generates the screenshot and thumbnail. This ran on the branch with the thumbnail fix, and it took 9 minutes and 15 seconds to import 692 scenes, generating 1,384 files. This means it only took 0.80 seconds on average per scene to be imported (and it takes a bit less per video, as there are some duplicate files being ignored). Taking into account that the thumbnail fix is included in this build, and that it at most cuts the screenshot/thumbnail generation time in half, it would still take at most 1.6 seconds per scene on average on the normal build. And all this on a database already having 30K scenes in it (although "fake"), and with the screenshot folder already containing over 16K files. So IMO it's safe to say there aren't any real issues with this. Or at least no issues which result in these massive slowdowns of 10 seconds between "Creating new item" and "Generating thumbnail", nor in the 5 seconds needed to generate each of the two files. All of this leaves me wondering how you're using Stash. You mention using an SSD over SMB / network share. But are Stash's files on the network share as well (with Stash running on the local computer)? So the database file, generated folder, etc.? Because for me that would be the only explanation why it would be so slow, as it then constantly has to read and write those files over the share. And I can imagine that being a lot slower than having Stash read and write its own files locally. So checking for the existence of the screenshot files might be slow.
But, for example, ffmpeg having to read the video over the SMB share while at the same time also having to write the screenshot to the share might incur some performance penalty. And if the SQLite database is on the SMB share as well, it might be even worse, as that definitely has to go back and forth to read and write to the database. That might, partially, explain the 10 second gap between "Creating new item" and starting to generate the thumbnail. And for what it's worth: in this test Stash was running on my computer, with the video files being on my NAS, stored on HDDs, and accessed via SSHFS. So the video files were also being made available over a network share. (But Stash's files, like the SQLite database and generated folder, were stored locally on an SSD.)
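The per-scene figures quoted in this test follow directly from the reported totals, as the short calculation below shows. The function name is hypothetical; the inputs (9 minutes 15 seconds, 692 scenes) are the ones given in the comment, and the factor of two reflects the stated assumption that the thumbnail fix at most halves the generation time.

```python
def seconds_per_scene(minutes: int, seconds: int, scenes: int) -> float:
    # Total import time in seconds divided by the number of imported scenes.
    return (minutes * 60 + seconds) / scenes


avg_with_fix = seconds_per_scene(9, 15, 692)  # 555 s / 692 scenes, about 0.80 s
upper_bound_normal_build = 2 * avg_with_fix   # about 1.6 s if the fix halves the time
```

The doubled value is an upper bound, since it treats the whole per-scene time as screenshot and thumbnail generation.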
A bit more info on how I used it: I had everything on SMB initially (movies, whole stash folder). I have now done a few more tests to try to narrow things down a bit. It looks like huge amounts of files in one folder, in combination with an SMB share, are actually very much connected to the problem after all. For me the SMB share is fast for single files (over 100MB/s) but slow for enumerating huge amounts of files. I moved the stash folder to the SSD (c: - kept the old db of ~130MB, stash-win.exe, ffmpeg files and the "cache" folder in config (just in case)) and redirected the "generated" folder to the initial stash location on SMB (s:\stash\generated). Movies are on another SMB share - t:. Test 1:
Fast first part since screenshots is empty Test 2:
Fast second part since vtt is empty Test 3:
Fast everything since both are empty Test 4:
Slowdown as in the beginning. Test 5:
Sort of same times as test 4 where DB was on SSD directly. Your experimental test with ffmpeg (s:\stash\generated\tmp\ folder is empty).
Generating took perhaps half a second. I also did additional ones with folders full of files (~65k and ~50k):
~3s
~3s |
I see this bug report is visible again. To recap findings from the other thread: Right now I have stopped using this awesome app, as I cannot use it due to the inability to scan my collection. I am very hopeful it will be fixed in some future release.
Stash is usable again! :) Thank you for reworking cover images in the "blobs" folder in the same way as thumbnails (two levels deep). If you do the same for "generated\screenshots" and "generated\vtt", features that use those folders (previews, ...) would work well in this scenario.
With huge amounts of videos, scanning local video content gets slower and slower.
General info:
Windows 10
Stash v0.16.1
stash-go.sqlite size: ~110MB
DB contents: ~25k scenes, ~200k images
stash folder location: ssd attached via smb
According to the log, the slowdown occurs in the "Creating new item..." stage for movies (36s in the example below):
Interestingly, for images it works instantly.
Just to be sure it is not my system, I created another empty Stash instance. No such problems there.
Unrelated to the above bug, just a thought:
Perhaps it would be better to have nested folders in generated\screenshots and generated\vtt, so that neither folder has such a huge number of files in a single folder (~50k+ files right now). Listing such folders for whatever reason gets very slow.