[DeviantArt] Duplicates in database #1874
Comments
Each deviantart sub-extractor uses its own archive id scheme for reasons: |
I take part of the blame here, because I've argued in favor of those reasons here in the past 😄 The rationale is basically the principle of least surprise, but this is somewhat subjective and ultimately a question of personal priorities (in effect, not treating duplicate avoidance as the highest priority), and I'll admit it's debatable whether this is really the most reasonable choice. DeviantArt is kind of extreme here, with 10 different But this also depends on the cooperation of the site, to some extent. The identifier used here is apparently the |
I wasn't aware of Anyway, can we at least set the same for galleries and direct link to posts? Aka set |
This leads to the problem where an upload that has been replaced with a modified, possibly better, version is skipped. Conversely, including the upload's name leads to potential dupes, but I'm pretty sure you can't upload anything so massive that downloading it multiple times becomes a problem, and it's always better to have a potential dupe than to replace a file with a potentially damaged, let alone inferior, one. I'd recommend:
^ Example:
^ That's a 24h time cuz I didn't find a direct formatting code in Python docs. |
Well, there are tools specifically made to deal with duplicate files. These here should even support similar image search, which might help with manually converted images etc.
Fine with me... |
I think we misunderstood each other. What ends up happening is that a file is downloaded by gallery-dl and then converted to another format. Then gallery-dl downloads that file again, and suddenly I have one version in png and one in webp. The safety measure against overwriting files kicks in, and I am stuck with both files. All in all, an external problem, but it gets triggered by the database not doing its job :P |
(#1874) use the same as gallery downloads
Done (ada36c2) |
When downloading a deviation directly, the entry in gallery-dl's database looks like deviantart415470071.png, but when downloading via a gallery, for example, you get deviantartg_fluffytheneko_415470071.png. This easily leads to duplicates being downloaded and completely defeats the purpose of the database. My solution would be to keep only the first format, as it would also skip re-downloading images when accounts change their name.

I also have countless entries missing the filename, so really I think the format should just be deviantart{index}.
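To make the mismatch concrete, here is a minimal sketch of how two different archive ID schemes can miss each other for the same deviation. It is not gallery-dl's actual code: the format strings are inferred from the two example entries quoted in this issue, and `archive_id` and `simulate` are hypothetical helpers that stand in for the archive lookup with a plain Python set.

```python
# Assumed format strings, reconstructed from the two example entries in this issue.
DIRECT_FMT = "{index}.{extension}"                  # direct link to a deviation
GALLERY_FMT = "g_{username}_{index}.{extension}"    # download via a gallery
PROPOSED_FMT = "{index}"                            # the format suggested in this issue


def archive_id(category, fmt, metadata):
    """Build an archive entry as category prefix + formatted metadata (hypothetical helper)."""
    return category + fmt.format_map(metadata)


def simulate(formats, metadata):
    """Count how often one deviation is downloaded when reached via the given formats."""
    archive = set()  # stand-in for the archive database
    downloads = 0
    for fmt in formats:
        entry = archive_id("deviantart", fmt, metadata)
        if entry not in archive:  # unknown entry: download and record it
            archive.add(entry)
            downloads += 1
    return downloads


meta = {"index": 415470071, "username": "fluffytheneko", "extension": "png"}

direct_id = archive_id("deviantart", DIRECT_FMT, meta)
gallery_id = archive_id("deviantart", GALLERY_FMT, meta)

# Different schemes produce different keys, so the second lookup misses.
mixed = simulate([DIRECT_FMT, GALLERY_FMT], meta)
# One shared scheme produces the same key both times, so the second run is skipped.
unified = simulate([PROPOSED_FMT, PROPOSED_FMT], meta)

print(direct_id, gallery_id, mixed, unified)
```

With a single shared scheme, the second lookup hits the existing entry regardless of how the deviation was reached. Keying on the index alone also survives account renames, at the cost of not noticing when an upload is replaced.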