This is a web scraper for the Dublin City Council planning website.
It's capable of scraping:
- All of the results from a Search of the DCC planning website
- In-depth details of individual planning applications
- The documents associated with each planning application
It is set up to scrape to a local sqlite3 database and create associations between planning applications and their documents.
It doesn't actually download any documents. It just records their details.
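The schema itself isn't documented here, but the description above implies two tables joined by a foreign key: one for applications and one for their documents. A minimal sketch of that shape (all table and column names are assumptions for illustration, not the scraper's actual schema):

```javascript
// Hypothetical sketch of the kind of SQLite schema the description implies.
// Table and column names are assumptions, not the scraper's actual schema.
const schema = `
CREATE TABLE IF NOT EXISTS applications (
  id          INTEGER PRIMARY KEY,
  reference   TEXT UNIQUE,  -- e.g. the DCC planning reference number
  description TEXT
);
CREATE TABLE IF NOT EXISTS documents (
  id             INTEGER PRIMARY KEY,
  application_id INTEGER REFERENCES applications(id),
  title          TEXT,
  url            TEXT       -- details only; the file itself is never downloaded
);
`;
```

The key point is the `application_id` foreign key, which is what associates each document row with its planning application.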
- To get a search URL that can be scraped, you must perform a search manually, then navigate to the second page and back to the first page. Insert this URL into `src/searchResults/scrape.js`, with the `startIndex` URL parameter passed in from the function.
- The format of the Date Registered From field is `dd-mmm-YYYY`. For example: "01-jan-2019".
- The scraper doesn't attempt to figure out how many pages of results there are in a particular planning application search. The number of pages to paginate is hardcoded.
- It doesn't yet record the planning decision.
- It's not capable of executing searches automatically. You have to manually run a search then feed the search URL into the code to scrape the results.
- There is no CLI yet (creating one with options and flags is a TODO). You have to manually edit the `src/index.js` file to tell the scraper what to do. Read the code and you'll get it pretty quickly.
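Taken together, the `startIndex` pagination, the hardcoded page count, and the date format can be sketched like this. The page size, page count, and helper names here are assumptions for illustration; the real logic lives in `src/searchResults/scrape.js`:

```javascript
// Illustration only: PAGES and PAGE_SIZE are assumed values standing in for
// the hardcoded page count; the real scraper's constants may differ.
const PAGES = 5;       // number of result pages is hardcoded, not detected
const PAGE_SIZE = 10;  // results per page (assumed)

// Format a Date as dd-mmm-YYYY, e.g. "01-jan-2019",
// for the Date Registered From field.
function formatRegisteredDate(date) {
  const months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
                  'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];
  const dd = String(date.getDate()).padStart(2, '0');
  return `${dd}-${months[date.getMonth()]}-${date.getFullYear()}`;
}

// Set the startIndex parameter on a manually obtained search URL,
// one value per page of results.
function buildSearchUrl(baseUrl, page) {
  const url = new URL(baseUrl);
  url.searchParams.set('startIndex', String(page * PAGE_SIZE + 1));
  return url.toString();
}

// Walk the hardcoded number of pages.
for (let page = 0; page < PAGES; page++) {
  console.log(buildSearchUrl('https://example.com/search?q=dcc', page));
}
```

So page 0 gets `startIndex=1`, page 1 gets `startIndex=11`, and so on, until the hardcoded page count runs out, regardless of how many results the search actually returned.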
Run it like this:

```shell
node src/index.js
```