Canadian COVID-19 Data Archive

The purpose of this repository is to support automated, daily backups of COVID-19 data from Canadian governmental and non-governmental sources. It is composed of a list of datasets (datasets.json), as well as the Python code making up the archival tool itself. The Canadian COVID-19 Data Archive is one component of the What Happened? COVID-19 in Canada project.

For a list of available datasets, see the Data catalogue below. For information on how to access the datasets in the archive, see Accessing the data.

The easiest way to contribute to this project is to help add new data (by providing a link to the data or by uploading files you have previously downloaded) using our data submission form or by opening an issue on GitHub. We're also looking for help making this archive more useful and accessible by building tools to simplify discovering, downloading and working with the data contained within.

File name timestamps are given in ET (America/Toronto) in the following format: %Y-%m-%d_%H-%M. Files are archived nightly beginning around 22:00 ET.
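As a minimal sketch of how the timestamp convention above might be used, the following Python function recovers a timezone-aware datetime from an archived file name. It assumes the timestamp is always the last 16 characters of the name before a single file extension, which holds for the example file name used later in this README but is not guaranteed for every file. The function name `parse_archive_timestamp` is hypothetical and not part of the archival tool.

```python
from datetime import datetime
from pathlib import PurePosixPath
from zoneinfo import ZoneInfo  # Python 3.9+

# Format and time zone as stated in this README.
TIMESTAMP_FORMAT = "%Y-%m-%d_%H-%M"
ARCHIVE_TZ = ZoneInfo("America/Toronto")

# "YYYY-MM-DD_HH-MM" is always 16 characters long.
TIMESTAMP_LENGTH = 16


def parse_archive_timestamp(file_name: str) -> datetime:
    """Return the ET timestamp embedded in an archived file name.

    Assumption: the timestamp is the last 16 characters of the name
    once a single extension (e.g. ".csv") has been removed.
    """
    stem = PurePosixPath(file_name).stem
    stamp = stem[-TIMESTAMP_LENGTH:]
    naive = datetime.strptime(stamp, TIMESTAMP_FORMAT)
    # Attach the archive's time zone so the value can be compared
    # with or converted to other zones.
    return naive.replace(tzinfo=ARCHIVE_TZ)


ts = parse_archive_timestamp("covid19-download_2020-11-04_23-38.csv")
print(ts.isoformat())
```

File names with compound extensions (for example `.tar.gz`) would need extra handling, since only the final extension is stripped here.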

All code in this repository is covered by the MIT License. Archived datasets may be used under the licenses/terms of use assigned to them by the data creators.

This repository is maintained by Jean-Paul R. Soucy on behalf of the COVID-19 Open Data Working Group.

Data catalogue

A searchable catalogue of datasets, sorted by province/territory (and city/organization, if applicable), is available in the Data Explorer. Full details for each dataset, including any notes pertaining to them, are available in the Search list of datasets section of the Data Explorer. Feature requests and bug reports for the Data Explorer should be made in its dedicated GitHub repository.

A note about data from Quebec: when both French and English data files are available, the French dataset should usually be considered definitive (and in most cases, these files have been captured in the archive for a longer duration).

Accessing the data

The easiest way to explore the data in the archive and download individual files is the aforementioned Data Explorer.

The files in the archive are hosted under the following domain: https://data.opencovid.ca/archive. For example, the PHAC Epidemiology Update from November 4, 2020 may be downloaded at the following URL:

https://data.opencovid.ca/archive/can/epidemiology-update-2/covid19-download_2020-11-04_23-38.csv

A complete index of files in the archive, including flags for duplicated files and corrected file dates (file_date_true), is available at the following URL:

https://data.opencovid.ca/archive/file_index.csv

This index is refreshed nightly around 23:00 ET. The file index is a searchable spreadsheet containing the download links to all files in the archive. Because it is a plain CSV file, a short script in any programming language can read it and download a list of files.
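One possible way to work with the file index from Python is sketched below, using only the standard library. The index URL comes from this README; the helper names `fetch_file_index` and `filter_index` are hypothetical, and the column names in the usage comment (`uuid`, `file_url`) are assumptions that should be checked against the header row of the actual index.

```python
import csv
import io
import urllib.request

# URL of the file index, as given in this README.
FILE_INDEX_URL = "https://data.opencovid.ca/archive/file_index.csv"


def fetch_file_index(url: str = FILE_INDEX_URL) -> str:
    """Download the file index and return it as text."""
    with urllib.request.urlopen(url, timeout=60) as response:
        # "utf-8-sig" also accepts plain UTF-8; the encoding is an assumption.
        return response.read().decode("utf-8-sig")


def filter_index(index_csv: str, column: str, value: str) -> list[dict[str, str]]:
    """Return the index rows whose `column` equals `value`.

    Rows are dictionaries keyed by the CSV header. If the column
    does not exist, no rows match and an empty list is returned.
    """
    reader = csv.DictReader(io.StringIO(index_csv))
    return [row for row in reader if row.get(column) == value]


# Example usage (requires network access; column names are assumed):
#   index_csv = fetch_file_index()
#   rows = filter_index(index_csv, "uuid", "f7db31d0-6504-4a55-86f7-608664517bdb")
#   urls = [row["file_url"] for row in rows]
```

Keeping the download and the filtering in separate functions means the index can be fetched once and queried several times.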

An experimental JSON API is also available to search the file index, although it currently only supports filtering by UUID. For example, the following URL returns the index for the PHAC Epidemiology Update:

https://api.opencovid.ca/archive?uuid=f7db31d0-6504-4a55-86f7-608664517bdb

The API is not yet documented but will soon be added to https://opencovid.ca/api/.
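Since the API is undocumented, the following is only a rough sketch of how the UUID query shown above might be issued from Python. The endpoint and the `uuid` parameter are taken from the example URL; the function names are hypothetical, and the structure of the returned JSON is not assumed, so the parsed response is returned as-is.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the example URL in this README.
API_URL = "https://api.opencovid.ca/archive"


def build_archive_query(uuid: str) -> str:
    """Build the query URL for one dataset UUID."""
    return f"{API_URL}?{urllib.parse.urlencode({'uuid': uuid})}"


def query_archive(uuid: str):
    """Fetch and parse the JSON response for one dataset UUID.

    The response layout is undocumented, so the parsed JSON is
    returned without any further interpretation.
    """
    with urllib.request.urlopen(build_archive_query(uuid), timeout=60) as response:
        return json.load(response)


# Example usage (requires network access):
#   result = query_archive("f7db31d0-6504-4a55-86f7-608664517bdb")
print(build_archive_query("f7db31d0-6504-4a55-86f7-608664517bdb"))
```

Because the API is experimental, callers may want to inspect the response before relying on any particular field.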

Finally, the entire contents of the archive are accessible via the R package Covid19CanadaData using the function dl_archive, which interfaces with the API described above. Be aware that this package is undergoing rapid development and may change at any time.

Contributing

You may contribute to the project in several ways. In the future, more ways of contributing will be added (e.g., adding metadata).

Add a new dataset

New datasets may be added in the following ways:

  • New! Use our data submission form.
  • Create a pull request on GitHub adding the dataset to the appropriate location in the "active" section of data/datasets.json. See other entries for examples.
  • Create an issue on GitHub requesting the new dataset be added.
  • Email the maintainer requesting the new dataset be added.

If you have archived versions of the dataset you are adding (e.g., you previously downloaded the dataset daily), see "Contribute historical data" below.

Contribute historical data

Historical data (e.g., archived versions of a dataset newly added to the archival tool) may be contributed in the following ways:

Retire an inactive dataset

Some datasets continue to exist at a URL but are no longer updated. These datasets should be removed from the nightly update. This may be achieved in the following ways:

  • Create a pull request on GitHub moving the dataset's entry from the "active" section of data/datasets.json to the appropriate location in the "inactive" section. Also, change the dataset's "active" flag from "True" to "False". See other entries for examples.
  • Email the maintainer with the historical data (for a dataset that you downloaded previously but that is no longer updated).
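The pull request edit described above could be sketched in Python as follows. This is an illustration only: the layout of data/datasets.json is assumed here to be `{"active": {group: [entries]}, "inactive": {group: [entries]}}`, with each entry identified by a `uuid` key. Neither the nesting nor the key name is confirmed by this README, so compare with the real file before adapting the idea. The function name `retire_dataset` is hypothetical.

```python
def retire_dataset(datasets: dict, group: str, uuid: str) -> dict:
    """Move one entry from the "active" to the "inactive" section.

    Assumed (hypothetical) layout:
        {"active": {group: [entry, ...]}, "inactive": {group: [entry, ...]}}
    where each entry is a dict with "uuid" and "active" keys.
    """
    active_entries = datasets["active"][group]
    for position, entry in enumerate(active_entries):
        if entry.get("uuid") == uuid:
            retired = active_entries.pop(position)
            # The README specifies the flag values as the strings "True"/"False".
            retired["active"] = "False"
            inactive = datasets.setdefault("inactive", {})
            inactive.setdefault(group, []).append(retired)
            return retired
    raise KeyError(f"No active dataset with uuid {uuid!r} in group {group!r}")


# Example usage with made-up entries:
example = {
    "active": {"can": [{"uuid": "abc", "active": "True"},
                       {"uuid": "def", "active": "True"}]},
    "inactive": {},
}
retire_dataset(example, "can", "abc")
print(example["inactive"])
```

In practice the file would be loaded with `json.load`, modified, and written back with `json.dump` before opening the pull request.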

Recommended citation

COVID-19 Canada Open Data Working Group. Canadian COVID-19 Data Archive. https://github.com/ccodwg/Covid19CanadaArchive. (Access date).

Notes about the data archive

On several occasions, the nightly archival script has failed to run. Depending on when the failure was identified, this may have resulted in a partial or total loss of archival data for that day. A list of these days is provided below:

  • 2020-10-21
  • 2020-11-19

In addition, the method of archiving websites (HTML files) was modified on 2021-12-30. This may have caused a handful of HTML files not to be marked duplicates of the previous day's file when they otherwise would have been. On 2022-03-26, the old method of archiving websites was erroneously used, once again resulting in some HTML files not being marked duplicates when they otherwise would have been.

Notes about the archival tool

Updates to the Canadian COVID-19 Data Archive are managed by the archivist package. Development of archivist originally took place in this repository but has since been migrated to its own repository.

Acknowledgements

Shannon Fiedler created the banner image for the Canadian COVID-19 Data Archive.

Many people are to thank for contributing archived data and code to this repository:

Jens von Bergmann / Simon Coulombe / James E. Wright / Farbod Abolhassani / Shelby L. Sturrock / Safa Ahmad / Jacques Marcoux / Shraddha Pai / Matti Aleve / Scott van Millingen / Robson Fletcher / Les Perreaux / Allen Kwan (Twitter/LinkedIn) / Christine Hagyard (Twitter/LinkedIn) / Amy Bihari (Twitter/LinkedIn) / Razieh Faraji (Twitter/LinkedIn) / David Lussier / Matthias Schoettle / Jeremy Moreau

Last but not least, thank you to the Internet Archive for being a resource and an inspiration to amateur archivists everywhere. They even gave the Canadian COVID-19 Data Archive a shoutout on Twitter!
