Skip to content

COVID-19 statistics for Germany. For states and counties. With time series data. Daily updates. Official RKI numbers.

License

Notifications You must be signed in to change notification settings

mhrabovcin/covid-19-germany-gae

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

COVID-19 case numbers for Germany 😷

🇩🇪 Übersicht

(see below for an English version)

  • COVID-19 Fallzahlen für Bundesländer und Landkreise.
  • Mehrfach täglich automatisiert aktualisiert.
  • Mit Zeitreihen (inkl. 7-Tage-Inzidenz-Zeitreihen).
  • Aktuelle Einwohnerzahlen und GeoJSON-Daten, mit transparenten Quellen.
  • Präzise maschinenlesbare CSV-Dateien. Zeitstempel in ISO 8601-Notation, Spaltennamen nutzen u.a. ISO 3166 country codes.
  • Zwei verschiedene Perspektiven:

🇺🇸 Overview

  • Historical (time series) data for individual Bundesländer and Landkreise (states and counties).
  • Automatic updates, multiple times per day.
  • 7-day incidence time series (so that you don't need to compute those).
  • Population data and GeoJSON data, with transparent references and code for reproduction.
  • Provided through machine-readable (CSV) files: timestamps are encoded using ISO 8601 time string notation. Column names use the ISO 3166 notation for individual states.
  • Two perspectives on the historical evolution:
    • Official RKI time series data, based on an ArcGIS HTTP API (docs) provided by the Esri COVID-19 GeoHub Deutschland. These time series are being re-written as data gets better over time (accounting for delay in reporting etc), and provide a credible, curated view into the past weeks and months.
    • Time series data provided by the Risklayer GmbH-coordinated crowdsourcing effort (the foundation for what various German newspapers and TV channels show on a daily basis, such as the ZDF but also the foundation for what the JHU) publishes about Germany.

Contact, questions, contributions

You probably have a number of questions. Just as I had (and still have). Your feedback, your contributions, and your questions are highly appreciated! Please use the GitHub issue tracker (preferred) or contact me via mail. For updates, you can also follow me on Twitter: @gehrcke.

Plots

Note that these plots are updated multiple times per day. Feel free to hotlink them.

Note: there is a systematic difference between the RKI data-based death rate curve and the Risklayer-based death rate curve. Both curves are wrong, and yet both curves are legit. The incidents of death that we learn about today may have happened days or weeks in the past. Neither curve attempts to show the exact time of death (sadly! :-)) The RKI curve, in fact, is based on the point in time when each corresponding COVID-19 case that led to death was registered in the first place ("Meldedatum" of the corresponding case). The Risklayer data set to my knowledge pretends as if the incidents of death we learn about today happened yesterday. While this is not true, the resulting curve is a little more intuitive. Despite its limitations, the Risklayer data set is the best view on the "current" evolution of deaths that we have.

The individual data files

  • RKI data (most credible view into the past): time series data provided by the Robert Koch-Institut (updated daily):
    • cases-rki-by-ags.csv and deaths-rki-by-ags.csv: per-Landkreis time series
    • cases-rki-by-state.csv and deaths-rki-by-state.csv: per-Bundesland time series
    • 7-day incidence time series resolved by county based on RKI data can be found in more-data/.
    • This is the only data source that rigorously accounts for Meldeverzug (reporting delay). The historical evolution of data points in these files is updated daily based on a (less accessible) RKI ArcGIS system. These time series see amendments weeks and months into the past as data gets better over time. This data source has its strength in the past, but it often does not yet reflect the latest from today and yesterday.
  • Crowdsourcing data (fresh view into the last 1-2 days): Risklayer GmbH crowdsource effort (see "Attribution" below):
  • ags.json:
    • for translating "amtlicher Gemeindeschlüssel" (AGS) to Landreis/Bundesland details, including latitude and longitude.
    • containing per-county population data (see pull/383 for details).
  • JSON endpoint /now: Germany's total case count (updated in real time, always fresh, for the sensationalists) -- Update Feb 2021: the HTTP API was disabled.
  • data.csv: history, mixed data source based on RKI/ZEIT ONLINE. This did power the per-Bundesland time series exposed by the HTTP JSON API up until Jan 2021.

How is this data set different from others?

  • It includes historical data for individual Bundesländer and Landkreise (states and counties).
  • Its time series data is being re-written as data gets better over time. This is based on official RKI-provided time series data which receives daily updates even for days weeks in the past (accounting for delay in reporting).

CSV file details

Focus: predictable/robust machine readability. Backwards-compatibility (columns get added; but have never been removed so far).

  • The column names use the ISO 3166 code for individual states.
  • The points in time are encoded using localized ISO 8601 time string notation.

Note that the numbers for "today" as presented in media often actually refer to the last known state of data on the evening before. To address this ambiguity, the sample timestamps in the CSV files presented in this repository contain the time of the day (and not just the day). With that, consumers can have a vague impression about whether the sample represents the state in the morning or evening -- a common confusion / ambiguity with other data sets.

The recovered metric is not presented because it is rather blurry. Feel free to consume it from other sources!

Quality data sources published by Bundesländer

I tried to discover these step-by-step, they are possibly underrated (April 2020, minor updates towards the end of 2020):

Further resources

HTTP API details

Update Feb 2021: I disabled the HTTP API. It's best to directly use the data files from this respository.

What you should know before reading these numbers

Please question the conclusiveness of these numbers. Some directions along which you may want to think:

  • Germany seems to perform a large number of tests. But think about how much insight you actually have into how the testing rate (and its spatial distribution) evolves over time. In my opinion, one absolutely should know a whole lot about the testing effort itself before drawing conclusions from the time evolution of case count numbers.
  • Each confirmed case is implicitly associated with a reporting date. We do not know for sure how that reporting date relates to the date of taking the sample.
  • We believe that each "confirmed case" actually corresponds to a polymerase chain reaction (PCR) test for the SARS-CoV2 virus with a positive outcome. Well, I think that's true, we can have that much trust into the system.
  • We seem to believe that the change of the number of confirmed COVID-19 cases over time is somewhat expressive: but what does it shed light on, exactly? The amount of testing performed, and its spatial coverage? The efficiency with which the virus spreads through the population ("basic reproduction number")? The actual, absolute number of people infected? The virus' potential to exhibit COVID-19 in an infected human body?

If you keep these (and more) ambiguities and questions in mind then I think you are ready to look at these numbers and their time evolution :-) 😷.

Thoughts about reporting delays

In Germany, every step along the chain of reporting (Meldekette) introduces a noticeable delay. This is not necessary, but sadly the current state of affairs. The Robert Koch-Institut (RKI) seems to be working on a more modern reporting system that might mitigate some of these delays along the Meldekette in the future. Until then, it is fair to assume that case numbers published by RKI have 1-2 days delay over the case numbers published by Landkreise, which themselves have an unknown lag relative to the physical tests. In some cases, the Meldekette might even be entirely disrupted, as discussed in this SPIEGEL article (German). Also see this discussion.

Wishlist: every case should be tracked with its own time line, and transparently change state over time. The individual cases (and their time lines) should be aggregated on a country-wide level, anonymously, and get published in almost real time, through an official, structured data source, free to consume for everyone.

Attributions

Beginning of March 2020: shout-out to ZEIT ONLINE for continuously collecting and publishing the state-level data with little delay.

Edit March 21, 2020: Notably, by now the Berliner Morgenpost seems to do an equally well job of quickly aggregating the state-level data. We are using that in here, too. Thanks!

Edit March 26, 2020: Risklayer is coordinating a crowd-sourcing effort to process verified Landkreis data as quickly as possible. Tagesspiegel is verifying this effort and using it in their overview page. As far as I can tell this is so far the most transparent data flow, and also the fastest, getting us the freshest case count numbers. Great work!

Edit December 13, 2020: for the *-rl-crowdsource*.csv files proper legal attribution goes to

Risklayer GmbH (www.risklayer.com) and Center for Disaster Management and Risk Reduction Technology (CEDIM) at Karlsruhe Institute of Technology (KIT) and the Risklayer-CEDIM-Tagesspiegel SARS-CoV-2 Crowdsourcing Contributors

About

COVID-19 statistics for Germany. For states and counties. With time series data. Daily updates. Official RKI numbers.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 86.7%
  • HTML 7.9%
  • Shell 4.8%
  • Makefile 0.6%