Skip to content

JShabtai/CovidDataJSON

Repository files navigation

Overview

This repo is a tool to process the Johns Hopkins daily reports into a single JSON file. The file is moderately large (~8MB at the moment, but obviously growing every day).

Precomputed data

The output of this script can be found on the data branch of this repository. https://github.com/JShabtai/CovidDataJSON/tree/data/data

I will try to keep that up to date, but make no guarantees.

Motivation

The Johns Hopkins data, while really useful, is in a CSV format which can be a bit tricky to parse into a useful structure. In addition, the timeseries files don't contain recovered or active case data for subregions (i.e. states/provinces). So this tool parses the daily reports instead to be able to provide that useful information.

JSON Structure

If you're using this data, you probably want to know the structure of the data. Here is the Typescript interface that's used:

export interface Datapoint {
    confirmed: number;
    recovered: number;
    deaths: number;
    active?: number;
}

type TimeSeriesData = Record<string, Datapoint>;

export interface DatasetInterface {
    name: string;
    data: TimeSeriesData;
    subsets: Record<string, DatasetInterface>;
}

As you can see, the structure is recursive. A dataset can contain any number of subsets which are themselves datasets. The top level is always the 'Global' dataset. This is not directly in the Johns Hopkins data, but is computed by adding up the number of each case types for each country.

The nesting levels are as follows (further detail, such as cities, may be added in the future):

Global
- Country
  - Province/state/territory
  - ...
- Country
  - Province/state/territory
  - Province/state/territory
- ...

Example:

{
  "name": "Global",
  "data": {
    "2020-03-22": {
      "confirmed": 137,
      "recovered": 0,
      "deaths": 3,
      "active": 134
    }
  },
  "subsets": {
    "FooCountry": {
      "name": "FooCountry",
      "data": {
        "2020-03-22": {
          "confirmed": 137,
          "recovered": 0,
          "deaths": 3,
          "active": 134
        }
      },
      "subsets": {
        "Province 1": {
          "name": "Province 1",
          "data": {
            "2020-03-22": {
              "confirmed": 8,
              "recovered": 0,
              "deaths": 0,
              "active": 8
            }
          },
          "subsets": {}
        },
        "Province 2": {
          "name": "Province 2",
          "data": {
            "2020-03-22": {
              "confirmed": 6,
              "recovered": 0,
              "deaths": 2,
              "active": 4
            }
          },
          "subsets": {}
        }
      }
    }
  }
}

Notes:

  • If the CSVs contain rows for the country as a whole, that data is used directly. If data is only available for regions within the country, they are added up to compute the country values.
  • The active case numbers are not read from the CSVs. They are computed as active = confirmed - (recovered + deaths).

Related works

The data here is used by my covid graphing tool which you can see at covid.shabtai.ca. The source is available on GitHub

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •