Mirrors the public dataset of summaries of completed Government of Canada Access to Information requests. You can also browse the dataset through a nice interface maintained by the federal Open Government team.
Why mirror the dataset? Unfortunately, completed summaries are generally kept in the dataset for only two years; after that, it’s assumed that the original institution destroyed the response records (following standard procedures for information like that). But you may still be interested in seeing what’s been asked and answered in the past! Hence, a mirror.
This repository:
- started with an existing dataset of summaries from May 2017 onward (if you have anything from earlier, please share!)
- combines these summaries with new ones published online
- cleans the summaries to try to avoid duplicates (to be integrated into the updating flow, see #4)
- automatically checks once a week for those new summaries
- automatically deploys the resulting CSV to a public datasette instance
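The combining and cleaning steps listed above can be sketched roughly as follows. This is a minimal Python illustration, not the repository's actual pipeline (its cleaning scripts appear to be written in R). The column names `owner_org`, `request_number`, and `summary_en`, and the helper names `normalize`, `merge_summaries`, and `read_rows`, are assumptions made for the example.

```python
import csv
import io

# Assumed column names; the real CSV's headers may differ.
KEY_COLUMNS = ("owner_org", "request_number", "summary_en")


def normalize(value):
    """Collapse whitespace runs and trim, so spacing differences don't block a match."""
    return " ".join((value or "").split())


def merge_summaries(existing, new):
    """Return existing rows followed by new rows whose key has not been seen yet."""
    seen = set()
    merged = []
    for row in list(existing) + list(new):
        key = tuple(normalize(row.get(col)) for col in KEY_COLUMNS)
        if key in seen:
            continue  # same institution, request number, and summary: skip it
        seen.add(key)
        merged.append(row)
    return merged


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Made-up sample rows, purely for illustration.
existing = read_rows(
    "owner_org,request_number,summary_en\n"
    "tbs-sct,A-2017-001,Briefing notes on open data\n"
)
new = read_rows(
    "owner_org,request_number,summary_en\n"
    "tbs-sct,A-2017-001,Briefing notes  on open data \n"
    "ssc-spc,A-2022-014,Email retention policies\n"
)
merged = merge_summaries(existing, new)
print(len(merged))
```

Keeping the first occurrence of each key means older rows survive and only genuinely new summaries are appended, which suits a mirror whose goal is to retain records that later disappear upstream.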
You can explore the public datasette instance to see, explore, and download the data. Or you can download the summaries directly from this repository.
You can also check out the issues list to see what’s happening and what’s coming next for this project. Feel free to dig in if you’d like!
Thanks:
- Jamie Duncan, AL, SB for the original dataset and encouragement
- Simon Willison for git scraping, datasette, and a clear how-to on deploying a git scraping datasette instance in a serverless function (what is the internet!?)
- Government of Canada Open Data team, and Access to Information shops everywhere, for doing the hard work to make things open
- 2022-10-18, @jdunca contributed the full historical dataset (#9).
- 2022-10-19, @lchski merged the historical dataset, integrating with existing data and removing duplicates (#10).
- 2022-10-25, @jdunca pointed out that most of the remaining duplicates (a few hundred at that point) were due to errors from the original source data, and they seemed mostly fixed in the full historical dataset (#11).
- 2022-11-21, @lchski (with many apologies for his tardiness) made a few edits and merged #11.
There’s a small possibility some summaries fell through the cracks in this process, but checks by @jdunca in `cleaning/2022-10-23-exploring-errors-merging-historical-data.R` make us pretty confident all’s good. There are around 250 remaining duplicates, largely due to slight changes in request summary, mostly from SSC; these are left as potentially interesting data points. On the off chance this deleted something, the previous version of `ati-summaries.csv` remains available in the repository’s history.
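If you’d like to look at the remaining near-duplicates yourself, one possible approach is to group rows by institution and request number and keep the groups whose summary text differs. The sketch below is only an illustration: the column names (`owner_org`, `request_number`, `summary_en`) and the function name `find_near_duplicates` are assumptions, and this is not the check that was run in the R script mentioned above.

```python
from collections import defaultdict


def find_near_duplicates(rows):
    """Group rows by (owner_org, request_number); keep groups with 2+ distinct summaries.

    Column names are assumed for illustration and may not match the real CSV.
    """
    groups = defaultdict(set)
    for row in rows:
        key = (row["owner_org"], row["request_number"])
        groups[key].add(row["summary_en"])
    return {
        key: sorted(summaries)
        for key, summaries in groups.items()
        if len(summaries) > 1
    }


# Made-up sample rows, purely for illustration.
rows = [
    {"owner_org": "ssc-spc", "request_number": "A-2020-100", "summary_en": "Contracts for cloud services"},
    {"owner_org": "ssc-spc", "request_number": "A-2020-100", "summary_en": "Contracts for cloud services, 2019 to 2020"},
    {"owner_org": "tbs-sct", "request_number": "A-2020-200", "summary_en": "Briefing notes on open data"},
]
for key, summaries in find_near_duplicates(rows).items():
    print(key, summaries)
```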