
A crawler for the Pixiv top-N rankings and for all artworks by any given illustrator (no longer maintained)


Neod0Matrix/pixiv-crawler


Python 2.7

pixiv-crawler - Pixiv images and messages crawler

██████╗ ██╗██╗  ██╗██╗██╗   ██╗       ██████╗██████╗  █████╗ ██╗    ██╗██╗     ███████╗██████╗ 
██╔══██╗██║╚██╗██╔╝██║██║   ██║      ██╔════╝██╔══██╗██╔══██╗██║    ██║██║     ██╔════╝██╔══██╗
██████╔╝██║ ╚███╔╝ ██║██║   ██║█████╗██║     ██████╔╝███████║██║ █╗ ██║██║     █████╗  ██████╔╝
██╔═══╝ ██║ ██╔██╗ ██║╚██╗ ██╔╝╚════╝██║     ██╔══██╗██╔══██║██║███╗██║██║     ██╔══╝  ██╔══██╗
██║     ██║██╔╝ ██╗██║ ╚████╔╝       ╚██████╗██║  ██║██║  ██║╚███╔███╔╝███████╗███████╗██║  ██║
╚═╝     ╚═╝╚═╝  ╚═╝╚═╝  ╚═══╝         ╚═════╝╚═╝  ╚═╝╚═╝  ╚═╝ ╚══╝╚══╝ ╚══════╝╚══════╝╚═╝  ╚═╝

ASCII art text from http://patorjk.com/software/taag/

License

Copyright (c) 2017 @T.WKVER </MATRIX>
Code by </MATRIX>@Neod Anderjon(LeaderN)
MIT license, see LICENSE
Thanks for forking and watching my project

Update

Version: v5p4_LTE
Last Update Time: 20171201pm1735

This Python crawler is built to crawl Pixiv images
It has two modes: RankTopN and illustRepoAll
It uses the threading module to download images in multiple threads
Original images can be requested in two modes
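
The threaded download mentioned above can be pictured roughly as follows. This is only a minimal sketch, not the crawler's actual code: download_all, fetch and max_threads are hypothetical names, and fetch stands in for whatever function really requests an image.

```python
import threading


def download_all(urls, fetch, max_threads=4):
    """Fetch every URL with up to max_threads worker threads.

    fetch is a hypothetical callable (url -> data) supplied by the caller.
    Returns a dict mapping each URL to whatever fetch returned for it.
    """
    results = {}
    lock = threading.Lock()
    pending = list(urls)

    def worker():
        while True:
            with lock:
                if not pending:
                    return
                url = pending.pop()
            data = fetch(url)  # the slow network call runs outside the lock
            with lock:
                results[url] = data

    threads = [threading.Thread(target=worker) for _ in range(max_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the threads spend most of their time waiting on the network, a handful of workers is usually enough. If fetch raises, that worker stops and its URL is simply missing from the result.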

Platform

Linux x86_64 kernel and Windows NT
Python: 2.7 (2.6 may be too old, and 3.x is not supported)

Requirements

  • urllib
  • urllib2
  • beautifulsoup4
  • json
  • getpass
  • cookielib
  • threading
  • PIL
  • retrying

Run

Problems that may arise

May a good network connection be with you

If you use the crawler to request data from the server too often,
the server may return a 10060 error (connection timed out).
Just wait for a while and then try again, or use a proxy server
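
The "wait and try again" advice can be automated with a small retry loop. The sketch below is an illustration under assumptions, not part of the crawler: call_with_retry, attempts and wait_seconds are made-up names, and the retrying package from the requirements list could serve the same purpose.

```python
import time


def call_with_retry(request, attempts=3, wait_seconds=5.0, sleep=time.sleep):
    """Call request(); on a network error, wait and then try again.

    request is a hypothetical zero-argument callable that performs one
    network request. The wait grows with each failed attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return request()
        except IOError as exc:
            # In Python 2.7, socket errors and urllib2.URLError are both
            # IOError subclasses, so a 10060 timeout is caught here.
            last_error = exc
            if attempt < attempts - 1:
                sleep(wait_seconds * (attempt + 1))
    raise last_error
```

Wrapping a single download, for example call_with_retry(lambda: opener.open(url)), keeps the waiting logic out of the download code itself.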

If the DNS in your test network environment has been polluted, I suggest you
point your PC's DNS server at a clean one
In China, for example, 115.159.146.99 from https://aixyz.com/

In ira (illustRepoAll) mode you need to enter the illustrator id, not an image id
Images saved by the crawler are renamed to array number + image id;
you can use this id to find the original image at the URL:
https://www.pixiv.net/member_illust.php?mode=medium&illust_id=<your known id>
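
As a small convenience, the URL above can be built from an image id in code. This is just a sketch: illust_page_url is a hypothetical helper name, and the URL pattern is the one quoted above, which Pixiv may change.

```python
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3

ILLUST_PAGE = "https://www.pixiv.net/member_illust.php"


def illust_page_url(illust_id):
    """Return the artwork page URL for a known image id."""
    # int() rejects anything that is not a plain numeric id
    query = urlencode([("mode", "medium"), ("illust_id", int(illust_id))])
    return ILLUST_PAGE + "?" + query
```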

The Pixiv website often changes the structure of its image URLs,
please use the latest results from the JavaScript console

Remember to delete the login.cr info before you push, commit, or file an issue

On a successful login: Pixiv checks the post-data rather than the headers,
so as long as the opener is used correctly,
login can succeed even with no headers
You can then use this crawler to crawl any target on Pixiv
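
The opener and post-data arrangement described above might look something like this. It is a hedged sketch only: build_login is a hypothetical helper, and the login URL and form field names must be supplied by the caller because the real ones are not given here. No custom headers are set, matching the note above.

```python
try:  # Python 2
    import cookielib
    import urllib2
    from urllib import urlencode
except ImportError:  # Python 3 equivalents of the same modules
    import http.cookiejar as cookielib
    import urllib.request as urllib2
    from urllib.parse import urlencode


def build_login(login_url, fields):
    """Prepare a cookie-aware opener and a POST request carrying fields.

    login_url and fields (a list of (name, value) pairs) are placeholders
    chosen by the caller, not Pixiv's real endpoint or form fields.
    """
    jar = cookielib.CookieJar()
    # The cookie processor keeps session cookies for later requests
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
    post_data = urlencode(fields).encode("utf-8")  # assumes ASCII-safe values
    request = urllib2.Request(login_url, data=post_data)  # data makes it a POST
    return opener, request, jar
```

Calling opener.open(request) would send the login, and the same opener would then be reused for every later request so the session cookies are carried along.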
