A scraper for the Dutch real estate website Funda.nl, written in Python using Scrapy. Based on funda-scraper.
This is a study project for learning data science. The main learning goals are Python, regular expressions, and web scraping.
There are two spiders. funda_spider scrapes data on houses for sale in a given city, such as those listed on http://www.funda.nl/koop/amsterdam/. funda_spider_sold scrapes data on houses that have recently been sold, such as those listed on http://www.funda.nl/koop/verkocht/amsterdam/.
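As a minimal sketch of how the two listing URLs relate, the helper below builds either one from a city name. The name listing_url and its sold flag are hypothetical and are not taken from the spiders' code; only the URL patterns come from the examples above.

```python
BASE_URL = "http://www.funda.nl/koop/"


def listing_url(place, sold=False):
    """Build the listing URL for a city (hypothetical helper, not part of the spiders)."""
    # Recently sold houses are listed under an extra "verkocht/" path segment.
    segment = "verkocht/" if sold else ""
    return "{}{}{}/".format(BASE_URL, segment, place)


print(listing_url("amsterdam"))
print(listing_url("amsterdam", sold=True))
```

The place value is inserted as given, so it should be written the way it appears in the Funda URL.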
The spiders can be run with the following commands:
scrapy crawl funda_spider -a place=amsterdam -o amsterdam_for_sale.csv -s LOG_LEVEL=ERROR
scrapy crawl funda_spider_sold -a place=amsterdam -o amsterdam_sold.csv -s LOG_LEVEL=ERROR
The 'place' argument specifies the city for which data is scraped. To get JSON output instead of CSV, give the output file a .json extension, for example 'amsterdam_sold.json' instead of 'amsterdam_sold.csv'.
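Once a crawl has finished, the output file can be read back into Python for analysis. The sketch below is one possible way to do this with the standard library only. The function name load_items is hypothetical, and it assumes the file was written by a single crawl, with CSV output carrying a header row and JSON output being a single array of objects.

```python
import csv
import json
import os


def load_items(path):
    """Load scraped items from a .csv or .json output file as a list of dicts."""
    extension = os.path.splitext(path)[1].lower()
    with open(path, newline="", encoding="utf-8") as handle:
        if extension == ".json":
            # Assumes the file holds one JSON array of item objects.
            return json.load(handle)
        if extension == ".csv":
            # The header row supplies the field names; all values are strings.
            return list(csv.DictReader(handle))
    raise ValueError("unsupported output format: " + extension)


# Example usage, only run if the crawl output from the commands above exists.
if os.path.exists("amsterdam_sold.csv"):
    items = load_items("amsterdam_sold.csv")
    print(len(items), "items loaded")
```

Note that values loaded from CSV are always strings, while JSON keeps whatever types the spider emitted, so numeric fields may need converting before analysis.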
Install pip and Scrapy (on Debian or Ubuntu)
- sudo apt-get install python-pip python-scrapy