
RSS feeds from Australian government media releases.


gov-scrape


A collection of Scrapy spiders that transform government media releases into RSS feeds. The aim is to make these media releases more accessible to the public, so that it is easier to keep up to date with state and territory governments.

The feeds are available through the website and gov-rss/rss.

Setup

Pip

$ pip install -r requirements.txt

Conda

$ conda env create --file=environment.yaml
$ conda activate gov-scrape

Docker

$ docker pull callumskeet/gov-scrape
# or
$ docker build -f splash.Dockerfile -t gov-scrape .

Run

Shell

$ scrapy crawl <spider-name>  # one spider
$ ./crawl.sh                  # all spiders
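
As a rough illustration of what running every spider involves, the loop below reads spider names from standard input and crawls them one at a time. It is a minimal sketch and not the contents of the repository's crawl.sh; the function name `crawl_all` and the `SCRAPY` override variable are hypothetical, introduced only so the loop can be dry-run without Scrapy installed.

```shell
# crawl_all: crawl every spider name read from stdin, one name per line.
# SCRAPY is a hypothetical override (e.g. SCRAPY=echo) for previewing
# the commands; it defaults to the real scrapy executable.
crawl_all() {
  while IFS= read -r spider; do
    # Skip blank lines in the input.
    [ -n "$spider" ] || continue
    # Keep going if one spider fails, but report it on stderr.
    ${SCRAPY:-scrapy} crawl "$spider" || echo "spider failed: $spider" >&2
  done
}
```

Inside the project directory, `scrapy list | crawl_all` would feed it the names of all spiders in the project.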

Docker-Compose

$ docker-compose up -d        # runs crawl.sh then exits

Docker

# $FEED_DIR stores rss files, $LOG_DIR holds log files from scrapy,
# and $CACHE_DIR caches content from crawled pages
$ docker run \
    --name gov-scrape \
    --rm \
    -v "$FEED_DIR":/gov-scrape/feeds \
    -v "$LOG_DIR":/gov-scrape/logs \
    -v "$CACHE_DIR":/gov-scrape/.scrapy/httpcache \
    -it gov-scrape    # crawls with all spiders
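
The three host directories need to be set before the command above is run. The snippet below is one possible way to prepare them; the default paths under the current directory are assumptions chosen for illustration, not locations the project requires.

```shell
# Hypothetical defaults: any writable host paths will do.
FEED_DIR="${FEED_DIR:-$PWD/feeds}"
LOG_DIR="${LOG_DIR:-$PWD/logs}"
CACHE_DIR="${CACHE_DIR:-$PWD/httpcache}"

# Create the directories up front so they are owned by the current user
# rather than being created by the Docker daemon when the mount is set up.
mkdir -p "$FEED_DIR" "$LOG_DIR" "$CACHE_DIR"

# Export them so the docker run command can reference them.
export FEED_DIR LOG_DIR CACHE_DIR
```

Values already set in the environment take precedence over the defaults.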

The regular shell commands also work with Docker, e.g. scrapy crawl vic_prem can be passed to the container.
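
For example, a single spider could be run in the container with a small wrapper like the one below. This is a sketch under assumptions: the image is tagged `gov-scrape` as in the local build above (a pulled image would be `callumskeet/gov-scrape`), only the feeds directory is mounted, and the names `crawl_one` and `DOCKER` are hypothetical.

```shell
# crawl_one: run one spider inside the container, writing feeds to $FEED_DIR.
# DOCKER is a hypothetical override (e.g. DOCKER=echo) that prints the
# command instead of running it; it defaults to the real docker executable.
crawl_one() {
  spider="$1"
  ${DOCKER:-docker} run --rm \
    -v "$FEED_DIR:/gov-scrape/feeds" \
    gov-scrape scrapy crawl "$spider"
}
```

Calling `crawl_one vic_prem` then crawls only the Victorian Premier's releases.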

Available spiders

| Spider Name | Source |
|---|---|
| act_shadow | canberraliberals.org.au |
| nsw_gov | nsw.gov.au |
| nsw_prem | nsw.gov.au |
| nt_shadow | countryliberal.org |
| sa_prem | premier.sa.gov.au |
| tas_prem | premier.tas.gov.au |
| qld_gov | statements.qld.gov.au |
| qld_shadow | lnp.org.au |
| sa_shadow* | facebook.com/SouthAustralianLabor/ |
| tas_shadow | taslabor.com |
| vic_prem | premier.vic.gov.au |
| vic_shadow | vic.liberal.org.au |
| wa_gov | mediastatements.wa.gov.au |
| wa_shadow | waliberal.org.au |

* In the process of getting permission from Facebook to scrape the SA Labor page

A few sources already publish their own RSS feeds. These are listed below:

| RSS Feed | Source |
|---|---|
| Jodi McKay Media Releases (NSW Shadow Premier) | https://www.jodimckay.com.au/media_releases |
| NT Government Newsroom | https://newsroom.nt.gov.au/ |
| NT Government Departmental and Agency Media Releases | https://mediareleases.nt.gov.au/ |
| ACT Government RSS Feed Collection | https://www.cmtedd.act.gov.au/open_government/inform/act_government_media_releases |

Projects used


Copyright (c) 2021 Callum Skeet under the MIT License
