maartje/funda-scraper

Overview

A scraper for the Dutch real estate website Funda.nl, written in Python using Scrapy. Based on funda-scraper.

Motivation

This project is part of a study project to learn data science. The main learning goals are Python, regular expressions, and web scraping.

Basic Usage

There are two spiders. funda_spider scrapes data on houses currently for sale in a given city, such as those listed on http://www.funda.nl/koop/amsterdam/. funda_spider_sold scrapes data on houses that were recently sold, such as those listed on http://www.funda.nl/koop/verkocht/amsterdam/. Run the spiders with the following commands:

scrapy crawl funda_spider -a place=amsterdam -o amsterdam_for_sale.csv -s LOG_LEVEL=ERROR

scrapy crawl funda_spider_sold -a place=amsterdam -o amsterdam_sold.csv -s LOG_LEVEL=ERROR

The 'place' argument specifies the city to scrape. To write JSON instead of CSV, give the output file a .json extension, for example 'amsterdam_sold.json' instead of 'amsterdam_sold.csv'.
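To scrape several cities in one go, the commands above can be generated from Python. The following is a minimal sketch, not part of this repository: the helper names build_command and crawl_places and the SPIDERS mapping are hypothetical. Only the spider names, the 'place' argument, and the command-line flags come from the commands shown above. It assumes the script is run from the project directory with scrapy on the PATH.

```python
import subprocess

# Spider names come from the README; the keys are illustrative labels
# that are also used to name the output file.
SPIDERS = {"for_sale": "funda_spider", "sold": "funda_spider_sold"}


def build_command(place, kind="for_sale", fmt="csv"):
    """Return the scrapy command for one city as an argument list."""
    if fmt not in ("csv", "json"):
        raise ValueError("fmt must be 'csv' or 'json'")
    spider = SPIDERS[kind]  # raises KeyError for an unknown kind
    output = f"{place}_{kind}.{fmt}"  # e.g. amsterdam_sold.csv
    return [
        "scrapy", "crawl", spider,
        "-a", f"place={place}",
        "-o", output,
        "-s", "LOG_LEVEL=ERROR",
    ]


def crawl_places(places, kind="for_sale", fmt="csv"):
    """Run one crawl per city, sequentially; stop on the first failure."""
    for place in places:
        subprocess.run(build_command(place, kind, fmt), check=True)
```

For example, crawl_places(["amsterdam"], kind="sold", fmt="json") would run the second command above with 'amsterdam_sold.json' as the output file. Passing the command as a list rather than a single string avoids shell quoting issues for place names.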

Installation

Install pip and Scrapy with apt (on Debian or Ubuntu):

  • sudo apt-get install python-pip python-scrapy
