Skip to content
This repository has been archived by the owner on Mar 10, 2023. It is now read-only.

Derived (and augmented) dataset available in JSON, TSV and SQL formats #1281

Open
cipriancraciun opened this issue Mar 23, 2020 · 14 comments
Open

Comments

@cipriancraciun
Copy link

cipriancraciun commented Mar 23, 2020

I have wrote some scripts that take both the series and daily reports files output the following two files:

If you want to automate the download (given how GitHub handles URL's to raw files), you can use the links listed on this page.

Also some plots for these available at:


What I've done:

  • the original JHU dataset has the daily data points in columns, which basically doesn't work with 90% of the usual tools; thus I have "normalized" this in a more SQL-friendly format, with one data-point per row, keyed by location + date;
  • for those countries with regions / provinces I have added a "total" row; (it's easier for plotting;)
  • I have also provided total for US country and US states;
  • I have added an infected column which is computed as infected := confirmed - deaths - recovered; (this data is available up to 2020-03-22;)
  • for each of the metrics (confirmed, recovered, deaths and infected) I have added four additional metrics (i.e. in total 16 metrics):
    • absolute_* -- the original value from the JHU dataset, i.e. cumulative values;
    • relative_* -- the metric divided by confirmed in percentage; (I.e. how many recovered people from the total confirmed up to that date;)
    • delta_* -- the difference from the previous day; (in case of infected the number can be negative;)
    • deltapct_* -- the delta divided by the previous day value; (i.e. the speed in percentage;)
  • I have also added the day_index_* columns which represents the day index since that country / region has reached either 1, 10, 100, or 1000 confrimed cases; (it helps align countries and compare them to that;)
  • I have normalized the country names (i.e. some countries are named differently in different rows, etc.);
  • I have augmented the country data with ISO codes, continents, subcontinents and other useful information;
  • I have added population columns based on CIA Factbook dataset;
  • I have added total rows for continent and sub-continent levels;
  • I have provided the date in ISO format;

I will update these files twice per day, say at 06 UTC and 12 UTC.

Moreover I have also added the in the same format also the NY Times US dataset and the ECDC one.


The scripts are available in the following repository and consist mainly of jq snippets.


If anyone has other ideas about what I can add to these augmented datasets please let me know.

@cipriancraciun
Copy link
Author

cipriancraciun commented Mar 24, 2020

I'll point here a few other issues that are solved by my derived dataset:

@cipriancraciun
Copy link
Author

cipriancraciun commented Mar 26, 2020

Additional issues that popped-up in the last day that would be helped:

@dnalkram
Copy link

Thanks cipriancraciun but app I'm using throws a validation error trying to use the JSON file and online validators too indicate problems with format.

@cipriancraciun
Copy link
Author

cipriancraciun commented Mar 26, 2020

@dnalkram What is the actual error you are getting?

You could open an issue on my repository (https://github.com/cipriancraciun/covid19-datasets) so that we don't clutter the JHU repository with this.

@dnalkram
Copy link

In Power BI I get
"We found extra characters at the end of JSON input.".
I then used site: https://www.freeformatter.com/json-validator.html
and got:
"The JSON input is NOT valid in JavaScript, illegally formed XML syntax (At line #1164), (At position #1)"
Hope this helps.

@cipriancraciun
Copy link
Author

@dnalkram Could you paste the URL you've used to download the file?

The links in the description above are directly from GitHub, and you must use the raw URL from that page.

To keep things simple, at the following link you can find the files on my own site, which shouldn't give you any issues:

(You can use these second links as I usually update them as soon as I push to GitHub.)

If you still get errors, download that link and double check that you are actually getting a JSON and not an error HTML page.

@dnalkram
Copy link

That file worked fine, thanks for all the help!

@sbw78
Copy link

sbw78 commented Mar 30, 2020

THANK YOU! I have spent the past 3 days trying to wrangle the CCSSE daily reports into a parseable time series, but with the constant changes in data format and inconsistent location naming I was going insane. Looking forward to checking out your data.

@cipriancraciun
Copy link
Author

@sbw78 I'm glad I could help you. If you encounter any issues, please open a ticket on my repository and describe the issue. (Given the "quarantine" I usually reply fairly quickly.) :)

BTW, in the interim I have integrated also the NY Times dataset and the ECDC one; thus if you are looking for alternative data you can choose one of these.

@ChrisParkerWA
Copy link

Hi Ciprian,
Thought I would submit a request here if I may.
Is there any way to create a simple JSON dataset along the lines of, for example
"country": "Australia"
"cases": 4860
"deaths": 20
"recovered": 244

@cipriancraciun
Copy link
Author

cipriancraciun commented Apr 2, 2020

@ChrisParkerWA I just opened an issue on my repository (cipriancraciun/covid19-datasets#12), where I've proposed to add such a simple format, but perhaps with only a few more values. Please verify my proposal there and let me know if it works for you.

@cipriancraciun cipriancraciun changed the title Derived (and augmented) JSON and TSV in a more SQL friendly format Derived (and augmented) dataset available in JSON, TSV and SQL formats Apr 3, 2020
@cipriancraciun
Copy link
Author

In the interim I've also added SQL and SQLite DB files for all the datasets.

@AmauryVanEspen
Copy link

Hello @cipriancraciun
there is several API available on SwaggerHub
It seems https://gravitee.io/ could also do the job or a simple flask app http://michal.karzynski.pl/blog/2016/06/19/building-beautiful-restful-apis-using-flask-swagger-ui-flask-restplus/
Is there a willingness to setup this kind of service ?

@cipriancraciun
Copy link
Author

@AmauryVanEspen I've just opened an issue about this feature request on my repository:

I would propose moving the discussion there, as this isn't strictly JHU related.

(I would be open to such an API, however we must first understand exactly what its use-cases would be.)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants