Spider built with Scrapy and scrapy-splash to crawl Airbnb listings
This checklist is for personal use and isn't relevant to using the scraper.
- Spider can successfully parse one page of listings
- Spider can successfully parse multiple/all pages of a designated location
- Spider can take price ranges as arguments (`price_lb` and `price_ub`)
- Spider can take location as an argument
 
Since Airbnb uses JavaScript to render content, Scrapy on its own is sometimes not enough. We also need Splash, a JavaScript rendering service maintained by the Scrapy developers, which integrates with Scrapy through the scrapy-splash plugin.
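As a rough illustration of how the integration works, a spider routes requests through Splash using scrapy-splash's `SplashRequest`. This is a minimal sketch, not this repo's actual spider; the URL and wait time are placeholders:

```python
# Minimal sketch (not this repo's spider): rendering a page via Splash.
# SplashRequest is provided by scrapy-splash; 'wait' gives the page time
# to finish executing JavaScript before Splash returns the rendered HTML.
import scrapy
from scrapy_splash import SplashRequest


class ExampleSpider(scrapy.Spider):
    name = "example"

    def start_requests(self):
        # Placeholder URL; the real spider builds Airbnb search URLs instead.
        yield SplashRequest(
            "https://www.airbnb.com/s/Cancun/homes",
            callback=self.parse,
            args={"wait": 2},  # seconds to wait for JS rendering
        )

    def parse(self, response):
        # response.text now contains the JavaScript-rendered page.
        self.logger.info("Rendered %d bytes", len(response.text))
```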
To install Splash, we need to do several things:
- Install Docker, create a Docker account (if you don't already have one), and run the Splash image in the background before crawling:

  ```
  docker run -p 8050:8050 scrapinghub/splash
  ```
Pulling the image may take a few minutes the first time. Once the container is running, open localhost:8050 in your browser to check that it's working; if the Splash interface loads, you are good to go.
- Install scrapy-splash using pip:

  ```
  pip install scrapy-splash
  ```
See the scrapy-splash documentation if you run into any issues.
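After installing, scrapy-splash also needs to be enabled in the project's Scrapy settings. The snippet below is the standard configuration from the scrapy-splash README; this project most likely already ships the equivalent in its settings.py:

```python
# settings.py -- standard scrapy-splash configuration (from the
# scrapy-splash README); this repo likely already includes the equivalent.

# Address of the Splash instance started with `docker run` above.
SPLASH_URL = "http://localhost:8050"

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Deduplicate requests based on their Splash arguments as well as the URL.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```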
Run the spider with:

```
scrapy crawl airbnb -o {filename}.json -a city='{cityname}' -a price_lb='{pricelowerbound}' -a price_ub='{priceupperbound}'
```

- `cityname` is a valid city name
- `pricelowerbound` is the lower bound for price, from 0 to 999
- `priceupperbound` is the upper bound for price, from 0 to 999; the spider will close if `priceupperbound` is less than `pricelowerbound`
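For reference, Scrapy passes `-a` options to the spider's constructor as string keyword arguments, and `CloseSpider` is the standard way to stop a crawl from spider code. The following is a hypothetical sketch of how such argument handling might look, not the repo's actual spider:

```python
# Hypothetical sketch of spider-argument handling; not the repo's code.
import scrapy
from scrapy.exceptions import CloseSpider


class AirbnbSpider(scrapy.Spider):
    name = "airbnb"

    def __init__(self, city=None, price_lb=0, price_ub=999, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.city = city
        # -a values arrive as strings, so cast to int before comparing.
        self.price_lb = int(price_lb)
        self.price_ub = int(price_ub)

    def start_requests(self):
        # Close before issuing any requests if the range is inverted.
        if self.price_ub < self.price_lb:
            raise CloseSpider("price_ub is less than price_lb")
        # The real spider would build Airbnb search URLs here.
        return []
```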
Note: Airbnb returns a maximum of roughly 300 listings per search filter (price range). To get more listings, I recommend scraping the same location multiple times with small price increments and concatenating the resulting datasets.
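For example, assuming each run wrote its own JSON file (the filenames and the `id` field below are assumptions, not guaranteed by this repo), the outputs can be combined with pandas:

```python
# Combine several per-price-band scrapes into one dataset.
# Filenames are hypothetical; use whatever -o names you passed to scrapy.
import glob

import pandas as pd

frames = [pd.read_json(path) for path in sorted(glob.glob("cancun_*.json"))]
combined = pd.concat(frames, ignore_index=True)

# Listings near band boundaries can appear in two scrapes; drop duplicates.
# 'id' is assumed to be a unique listing identifier in the scraped items.
combined = combined.drop_duplicates(subset="id")
combined.to_json("cancun_all.json", orient="records")
```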
If you would like to do multiple scrapes over a wide price range (e.g. intervals of 10 from 20 to 990), see cancun.sh, which I used to crawl a large number of listings for Cancún.
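A sketch of the same idea in Python (the band width, city, and filenames are illustrative; cancun.sh in the repo is the authoritative version):

```python
# Sketch of a multi-band scrape like cancun.sh, written in Python.
# Runs one crawl per 10-unit price band from 20 to 990 (values illustrative).
import subprocess

for lb in range(20, 990, 10):
    ub = lb + 10
    subprocess.run(
        [
            "scrapy", "crawl", "airbnb",
            "-o", f"cancun_{lb}_{ub}.json",
            "-a", "city=Cancun",
            "-a", f"price_lb={lb}",
            "-a", f"price_ub={ub}",
        ],
        check=True,  # stop if any individual crawl fails
    )
```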
I would like to thank Ahmed Rafik for his guidance and teachings.