learning_python_scrapy

Scrapy Shell

scrapy shell 'http://quotes.toscrape.com/'

Write to CSV, JSON or XML

scrapy crawl quotes -o items.csv

scrapy crawl quotes -o items.json

scrapy crawl quotes -o items.xml
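Scrapy picks the feed format from the file extension given to `-o`. To check an export, the file can be read back with the standard library. A small sketch, assuming the CSV feed starts with a header row of field names and the JSON feed is a single array of objects; `load_items` is a hypothetical helper, not part of Scrapy:

```python
import csv
import json
from pathlib import Path


def load_items(path):
    """Read a Scrapy feed export (.csv or .json) into a list of dicts."""
    path = Path(path)
    if path.suffix == ".csv":
        # The first CSV row is treated as the field names.
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if path.suffix == ".json":
        # The JSON feed is expected to be one array of item objects.
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    raise ValueError(f"unsupported feed format: {path.suffix}")
```

For example, `load_items("items.csv")` after the first command above should give one dict per scraped item.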

Runspider command

Use it to run a spider defined in a single file, without creating a project:

scrapy runspider quotes.py

Start Crawler

scrapy crawl books

Pass argument to scrapy

scrapy crawl books -a category="http://books.toscrape.com/catalogue/category/books/philosophy_7/index.html"

Export to Excel

Install openpyxl

sudo pip install openpyxl
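The export step itself is not shown here. One simple option, sketched as an assumption rather than the method used in this repository, is to crawl to CSV first and then convert that file with openpyxl; `csv_to_xlsx` is a hypothetical helper name:

```python
import csv

from openpyxl import Workbook


def csv_to_xlsx(csv_path, xlsx_path):
    """Copy every row of a CSV file into the first sheet of a new workbook."""
    workbook = Workbook()
    sheet = workbook.active
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            # Each CSV row becomes one spreadsheet row; values stay strings.
            sheet.append(row)
    workbook.save(xlsx_path)
```

After `scrapy crawl books -o items.csv`, calling `csv_to_xlsx("items.csv", "items.xlsx")` would produce the Excel file.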

Create project

scrapy startproject testproject

scrapy genspider testspider "www.example.com"

Change User Agent

scrapy crawl quotes -s USER_AGENT="Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36"

How to find out your user agent?
- Search Google for "what's my user agent"
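The `-s` flag overrides the setting for a single run only. To keep the change, the same `USER_AGENT` setting can go into the project's `settings.py`. A sketch, reusing the string from the command above:

```python
# settings.py (fragment)
# Default User-Agent header for requests made by this project.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36"
)
```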

--> 22/063

marcpre/learning_python_scrapy
