Scrape various open data directories to create an index of what's available out there

jhenderson00/scrape-open-data

scrape-open-data

Scrapes every available dataset from Socrata and stores them as newline-delimited JSON in this repository, to track changes over time through Simon Willison's Git scraping method.

  • socrata/data.delaware.gov.jsonl contains the latest datasets for a specific domain. This is updated twice a day.
  • socrata/data.delaware.gov.stats.jsonl contains information on page views and download numbers. This is updated only once a week, so that routine fetches do not each record changed counts for many different datasets.
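
Because each file is newline-delimited JSON, it can be read one record per line with the standard library alone. The following is a minimal sketch, not part of this repository: read_jsonl is a hypothetical helper name, and nothing is assumed about which fields each record contains.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a .jsonl file.

    Hypothetical helper for illustration; not part of this repository.
    """
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, such as a trailing newline
                yield json.loads(line)


if __name__ == "__main__":
    # Path taken from the example in the list above; adjust the domain as needed.
    path = Path("socrata/data.delaware.gov.jsonl")
    if path.exists():
        records = list(read_jsonl(path))
        print(f"{len(records)} records in {path}")
    else:
        print(f"{path} not found; run this from the repository root")
```

Reading lazily with a generator keeps memory use flat for large domains, and the one-record-per-line layout is also what keeps the Git diffs between scrapes readable.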

The resulting database is deployed to https://open-data.datasette.io/

scrape_socrata.py

Run python scrape_socrata.py socrata/ to scrape the data from Socrata and save it in the socrata/ directory.

Add --stats to include page view and download statistics in separate files.

Add --verbose for verbose output.

build_socrata_db.py

Run this command to build a SQLite database from the .jsonl files in socrata/:

python build_socrata_db.py socrata.db socrata
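
Once socrata.db exists, a quick way to check the build is to list its tables and their row counts. This sketch uses only Python's sqlite3 module and makes no assumption about the table names that build_socrata_db.py creates; table_row_counts is a hypothetical helper, not part of this repository.

```python
import os
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a SQLite database.

    Hypothetical helper for illustration; not part of this repository.
    """
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names are handled safely.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    db_path = "socrata.db"  # file name from the command above
    # sqlite3.connect would create an empty file, so check that it exists first.
    if os.path.exists(db_path):
        for name, count in table_row_counts(db_path).items():
            print(f"{name}: {count} rows")
    else:
        print(f"{db_path} not found; build it first")
```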
