DeviantArt Pulling Extra Random Galleries/Artists During Scrape #1356

Closed
sourmilk01 opened this issue Mar 6, 2021 · 7 comments

Comments

@sourmilk01

During an extraction I've noticed that the script will sometimes pull the galleries of other artists. I have no idea where they come from; they don't appear to be related to the target gallery in any way, and they don't even appear among the target artist's favorites. Any idea why this is happening?

@Galewin

Galewin commented Mar 6, 2021

Do you have Extra set to true in your config file? If so, I think one of the recent updates added embedded content, so it's scraping way more stuff for me too.

The only things I wanted to scrape with extra were the sta.sh links for variants in some posts, but now I also get new galleries that look unrelated.

@sourmilk01
Author

Extra is set to true in my config. I hadn't run a DeviantArt scrape in a couple of months, and when I tried running a batch last night it also pulled several hundred other galleries that I had never heard of and couldn't relate to the galleries in my list. I too had Extra enabled in case any Sta.sh content was available.

I'll try running it with Extra set to false and see what happens.

sourmilk01 changed the title from "DeviantArt Pulling Extra Random Galleries/Artists During Extraction" to "DeviantArt Pulling Extra Random Galleries/Artists During Scrape" Mar 6, 2021
@sourmilk01
Author

I just ran it with Extra set to false, and that appears to have fixed the issue. The scrape is only pulling the galleries I have queued.

I'll keep this issue open a little while longer before closing in case anyone else wants to contribute.

@rautamiekka
Contributor

1.17.0, released 22h ago, expanded the 'extra' option to do exactly that.

IMO 'extra' should be changed so that you can configure which extras you want, because recursively getting embedded galleries/favs will become disastrous in terms of data downloaded and thus space/time wasted.

mikf added a commit that referenced this issue Mar 6, 2021
Setting 'extra' to "stash" or "deviations" will only download embedded
sta.sh content or deviations. 'true' still downloads both.
@mikf
Owner

mikf commented Mar 6, 2021

Someone on Gitter requested to also download regular dA posts linked in descriptions as well as sta.sh posts, and it was a lot easier to fetch both at once without implementing any selection logic. Well, I should've known better ...

Anyway, commit 5c32a7b adds a way to only download extra sta.sh content or only dA posts.
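
For readers who want to try the values named in that commit message, here is a minimal sketch that prints a config fragment selecting only embedded sta.sh content. The "extractor" -> "deviantart" nesting is assumed from gallery-dl's usual config layout, and the accepted values are taken only from the commit message quoted in this thread; later releases may behave differently.

```python
import json

# Hypothetical config fragment; only the 'extra' values below are
# confirmed by the commit message quoted in this thread.
config = {
    "extractor": {
        "deviantart": {
            # "stash":      only embedded sta.sh content
            # "deviations": only embedded deviations
            # True:         both (the behavior reported in this issue)
            # False:        neither (the workaround reported above)
            "extra": "stash",
        }
    }
}

# Serialize to the JSON text that would go into the config file.
config_text = json.dumps(config, indent=4)
print(config_text)
```

The printed JSON can be merged into an existing config file; switching the value to "deviations" would select the other source instead.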

@sourmilk01
Author

Thanks for the commit, @mikf.

@rautamiekka
Contributor

rautamiekka commented Mar 7, 2021

> Someone on Gitter requested to also download regular dA posts linked in descriptions as well as sta.sh posts, and it was a lot easier to fetch both at once without implementing any selection logic. Well, I should've known better ...
>
> Anyway, commit 5c32a7b adds a way to only download extra sta.sh content or only dA posts.

I was thinking about 'stash', 'posts', and 'all', in the vein of extractor.deviantart.include, where you can specify a comma-separated string or an array. 5c32a7b is indeed a step in the right direction, thanks for that.
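
The selection logic proposed here could be sketched roughly as follows. This is an illustration only, not gallery-dl's actual implementation: `parse_extra` and `KNOWN_SOURCES` are hypothetical names, and the accepted values ('stash', 'posts', 'all', given as a comma-separated string or a list) are taken from the proposal above.

```python
# Hypothetical set of selectable sources, per the proposal in this thread.
KNOWN_SOURCES = frozenset({"stash", "posts"})


def parse_extra(value):
    """Normalize an 'extra' setting into a set of source names.

    Accepts a bool, a comma-separated string, or a list of strings.
    """
    if value is True:
        # Treat a plain 'true' as "everything".
        return set(KNOWN_SOURCES)
    if not value:
        # False, None, "" and [] all disable extra downloads.
        return set()
    if isinstance(value, str):
        value = value.split(",")
    # Trim whitespace, ignore case, and drop empty entries.
    names = {str(item).strip().lower() for item in value} - {""}
    if "all" in names:
        return set(KNOWN_SOURCES)
    unknown = names - KNOWN_SOURCES
    if unknown:
        raise ValueError(f"unknown extra source(s): {sorted(unknown)}")
    return names


print(sorted(parse_extra("stash, posts")))
print(sorted(parse_extra(["posts"])))
```

Rejecting unknown names early is a design choice in this sketch: a typo in the config then fails loudly instead of silently downloading nothing.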

mikf added a commit that referenced this issue Mar 7, 2021
- change its expected type to string
- let users specify a list of sources (stash, posts) or 'all'
mikf added a commit that referenced this issue Mar 19, 2021