JarvisXing/scrapy_tu

scrapy_tu



This project crawls user information and saves it as .json files,
and downloads each user's avatar icon as a .jpg file.
As a photographer, I chose tuchong as the target.
There are nearly 2 million users on this site. If you want to crawl all of them,
I suggest running this spider behind a proxy and setting a spider time.
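
The proxy and timing suggestion above could be sketched roughly as follows. This is a minimal sketch, not the project's actual configuration: the values are illustrative, `PROXY_URL` and `ProxyMiddleware` are hypothetical names, and "spider time" is assumed to mean a delay between requests plus an overall time limit. `DOWNLOAD_DELAY`, `RANDOMIZE_DOWNLOAD_DELAY`, and `CLOSESPIDER_TIMEOUT` are standard Scrapy settings, and Scrapy's built-in HttpProxyMiddleware reads the proxy from `request.meta["proxy"]`.

```python
# Possible additions to settings.py (illustrative values, not from the project).
DOWNLOAD_DELAY = 1.0             # seconds to wait between requests to the same site
RANDOMIZE_DOWNLOAD_DELAY = True  # vary the delay so requests are less regular
CLOSESPIDER_TIMEOUT = 3600       # close the spider after this many seconds

# Hypothetical custom value: the address of your own proxy.
PROXY_URL = "http://127.0.0.1:8888"


class ProxyMiddleware:
    """Hypothetical downloader middleware that sends every request through one proxy."""

    def __init__(self, proxy_url):
        self.proxy_url = proxy_url

    def process_request(self, request, spider):
        # Scrapy's HttpProxyMiddleware picks the proxy up from request.meta["proxy"].
        request.meta["proxy"] = self.proxy_url
        return None  # returning None lets Scrapy continue handling the request
```

A class like this would normally live in middlewares.py, take its proxy address from the settings, and be registered in `DOWNLOADER_MIDDLEWARES`; the exact wiring depends on how the project is set up.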

As a result, user data is saved in the data folder.

Their avatars are saved in the icon folder. Each avatar file is named with the user's id.
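
The storage layout described above can be illustrated with plain Python. This is only a sketch under assumptions: the helper names `save_user` and `save_avatar` are hypothetical, the user record is assumed to be a dict with an `id` key, and naming the .json file after the user's id is a guess (only the avatar naming is stated above). The project itself does this work inside Scrapy pipelines, which are not shown here.

```python
import json
from pathlib import Path


def save_user(user, data_dir="data"):
    """Write one user record to <data_dir>/<id>.json (file naming is an assumption)."""
    out_dir = Path(data_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{user['id']}.json"
    # ensure_ascii=False keeps non-ASCII user names readable in the file.
    path.write_text(json.dumps(user, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def save_avatar(user_id, image_bytes, icon_dir="icon"):
    """Write avatar bytes to <icon_dir>/<user_id>.jpg, named with the user's id."""
    out_dir = Path(icon_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{user_id}.jpg"
    path.write_bytes(image_bytes)
    return path
```

For example, `save_avatar(123, image_bytes)` would write `icon/123.jpg`.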

Software

Before running this demo, you should install these libraries and tools:

  1. Anaconda3 for Windows
    Download Anaconda3-5.1.0-Windows-x86_64.exe and install it.
  2. opencv for Python
    In cmd, run conda install -c conda-forge opencv. Scrapy needs an image library such as PIL to crawl images;
    opencv is a good choice instead of PIL, but pillow did not work.
  3. scrapy
    In cmd, run conda install scrapy.
  4. Chrome is recommended as a debugging tool. You may frequently use Ctrl+Shift+I, Ctrl+Shift+C, Ctrl+F, and right click -> Copy XPath.
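
An XPath copied from Chrome is typically pasted into the spider to pick out a field. The sketch below shows the idea on a small, made-up snippet; the markup, ids, and URL are hypothetical and do not reflect the real tuchong pages. In Scrapy you would normally call `response.xpath(...)`; the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset, is swapped in here so the example runs without Scrapy.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page structure is not shown here.
PAGE = """
<html><body>
  <div id="profile">
    <h1 class="name">example user</h1>
    <img class="avatar" src="https://example.com/avatar/123.jpg" />
  </div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)

# Paths similar to what "Copy XPath" produces, adapted to ElementTree's XPath subset.
name = root.find(".//div[@id='profile']/h1").text
avatar_url = root.find(".//div[@id='profile']/img").get("src")

print(name)
print(avatar_url)
```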

Run

In cmd, run scrapy crawl tuchong
If you encounter an error, you are welcome to leave a message in the issues.

File structure

WORKSPACE_DIR(scrapy_tu)
  scrapy.cfg
  data
    xx.json
  icon
    xx.jpg
  tuchong
    spiders
      __init__.py
      tuchong_spider.py
    __init__.py
    items.py
    middlewares.py
    pipelines.py
    settings.py
