Commit 0d5a364

Update EXAMPLES.md
Adapt to changes in different statistical offices APIs.
1 parent b4fd122 commit 0d5a364

1 file changed

+41
-41
lines changed

docs/EXAMPLES.md

Lines changed: 41 additions & 41 deletions
Original file line number | Diff line number | Diff line change
@@ -118,7 +118,7 @@ As you probably noticed, the _eurostat.json_ does not have the structure we need
118118
{
119119
"unit": "Percentage of active population",
120120
"sex": "Total",
121-
"age": "Total",
121+
"age": "From 15 to 74 years",
122122
"time": "2005",
123123
"geo": "Austria",
124124
"value": 5.6
@@ -134,7 +134,7 @@ Instead, we need to have a property for each category of the _geo_ dimension (wh
134134
{
135135
"unit": "Percentage of active population",
136136
"sex": "Total",
137-
"age": "Total",
137+
"age": "From 15 to 74 years",
138138
"time": "2005",
139139
"AT": 5.6,
140140
"BE": 8.5,
@@ -152,10 +152,10 @@ That is, we need to transpose the values by _geo_:
152152
jsonstat2arrobj eurostat.jsonstat eurostat-transp.json --by geo
153153
```
154154

155-
Dataset **tesem120** contains several single-category dimensions: <strike>_sex_ and _age_ are always "Total" and _unit_ is always "Percentage of active population"</strike>. **Correction**: That was true in the past, but at some point **tesem120** began including data by sex. This could also happen in the future with **age** or even **unit**. Because we are only interested in unemployment as a percentage of active population (PC_ACT) and we don't care about _sex_ or _age_, we need to create a subset of eurostat.jsonstat:
155+
Because we are only interested in unemployment as a percentage of active population (PC_ACT) and we don't care about _sex_ or _age_, we need to create a subset of eurostat.jsonstat:
156156

157157
```
158-
jsonstatslice eurostat.jsonstat eurostat-subset.jsonstat --filter sex=T,age=TOTAL,unit=PC_ACT
158+
jsonstatslice eurostat.jsonstat eurostat-subset.jsonstat --filter sex=T,age=Y15-74,unit=PC_ACT
159159
```
160160

161161
Now that we are sure that _sex_, _age_ and _unit_ are single-category dimensions, we can remove them from the transposed JSON:
@@ -197,7 +197,7 @@ All the process has required three lines and three files (_eurostat.jsonstat_, _
197197
```
198198
curl "https://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1" -o eurostat.jsonstat
199199
200-
jsonstatslice eurostat.jsonstat eurostat-subset.jsonstat --filter sex=T,age=TOTAL,unit=PC_ACT
200+
jsonstatslice eurostat.jsonstat eurostat-subset.jsonstat --filter sex=T,age=Y15-74,unit=PC_ACT
201201
202202
jsonstat2arrobj eurostat-subset.jsonstat eurostat-drop.json --by geo --drop sex,age,unit
203203
@@ -221,19 +221,19 @@ jsonstat2arrobj < eurostat.jsonstat > eurostat.json --stream
221221
So to get a comma-delimited CSV with dot as the decimal mark in a single line:
222222

223223
```
224-
curl "https://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1" | jsonstatslice --filter sex=T,age=TOTAL,unit=PC_ACT --stream | jsonstat2arrobj --by geo --drop sex,age,unit --stream | json2csv > eurostat.csv
224+
curl "https://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1" | jsonstatslice --filter sex=T,age=Y15-74,unit=PC_ACT --stream | jsonstat2arrobj --by geo --drop sex,age,unit --stream | json2csv > eurostat.csv
225225
```
226226

227227
Or a little shorter:
228228

229229
```
230-
curl "https://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1" | jsonstatslice -f sex=T,age=TOTAL,unit=PC_ACT -t | jsonstat2arrobj -b geo -d sex,age,unit -t | json2csv > eurostat.csv
230+
curl "https://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1" | jsonstatslice -f sex=T,age=Y15-74,unit=PC_ACT -t | jsonstat2arrobj -b geo -d sex,age,unit -t | json2csv > eurostat.csv
231231
```
232232

233233
And to get a semicolon-delimited CSV with comma as the decimal mark:
234234

235235
```
236-
curl "https://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1" | jsonstatslice -f sex=T,age=TOTAL,unit=PC_ACT -t | jsonstat2arrobj -b geo -d sex,age,unit -k -t | json2csv > eurostat-semi.csv -w ";"
236+
curl "https://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1" | jsonstatslice -f sex=T,age=Y15-74,unit=PC_ACT -t | jsonstat2arrobj -b geo -d sex,age,unit -k -t | json2csv > eurostat-semi.csv -w ";"
237237
```
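
The transposition by _geo_ used throughout this example can be illustrated with a small JavaScript sketch. It is not how **jsonstat2arrobj** is implemented; `transposeBy` is a hypothetical helper, and the 2006 figures are invented for illustration (only the 2005 values for AT and BE come from the example above).

```js
// Rows in the "one object per observation" shape.
// The 2005 values come from the example; the 2006 values are made up.
const rows = [
  { time: "2005", geo: "AT", value: 5.6 },
  { time: "2005", geo: "BE", value: 8.5 },
  { time: "2006", geo: "AT", value: 5.3 }, // hypothetical
  { time: "2006", geo: "BE", value: 8.2 }, // hypothetical
];

// Hypothetical helper: turn each category of `dim` into a property
// holding the value, grouping rows that share all other properties.
function transposeBy(rows, dim, valueKey = "value") {
  const groups = new Map();
  for (const row of rows) {
    const { [dim]: category, [valueKey]: value, ...rest } = row;
    const key = JSON.stringify(rest); // assumes a consistent property order
    if (!groups.has(key)) groups.set(key, { ...rest });
    groups.get(key)[category] = value;
  }
  return [...groups.values()];
}

const transposed = transposeBy(rows, "geo");
console.log(JSON.stringify(transposed, null, 2));
```

Each output object keeps the remaining dimensions (here only _time_) and gains one property per _geo_ category, which is the shape the later CSV conversion relies on.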
238238

239239
## A UNECE Example
@@ -480,9 +480,9 @@ json2csv < no-ratio.json > no.csv
480480
```
481481
time,ratio
482482
2006M02,0.9026666666666667
483-
2006M03,0.9038718291054739
483+
2006M03,0.90520694259012
484484
2006M04,0.90520694259012
485-
2006M05,0.9054593874833556
485+
2006M05,0.9067909454061251
486486
2006M06,0.905710491367862
487487
2006M07,0.9060846560846562
488488
...
@@ -511,7 +511,7 @@ To draw a time series in a line chart, Visual expects that we provide two separa
511511
```
512512
[
513513
0.9026666666666667,
514-
0.9038718291054739,
514+
0.90520694259012,
515515
0.90520694259012,
516516
...
517517
]
@@ -568,44 +568,44 @@ Let's assume that we must build the population pyramid of Ireland.
568568

569569
#### 1. Retrieve the population by sex from the Central Statistics Office of Ireland
570570

571-
You'll need to find the JSON-stat dataset URL on CSO's Statbank API. Go to
571+
You'll need to find the JSON-stat dataset URL on CSO's PxStat. Go to
572572

573-
https://www.cso.ie/webserviceclient/DatasetListing.aspx
573+
https://data.cso.ie/
574574

575575
and then
576576

577577
```
578-
People and Society
578+
Population Estimates
579579
> Annual Population Estimates
580-
> Population Estimates (Persons in April) by Age Group, Sex and Year
580+
> PEA01 - Population Estimates (Persons in April)
581581
```
582582

583-
[Dataset PEA01](https://www.cso.ie/webserviceclient/DatasetDetails.aspx?id=PEA01) from CSO provides a yearly time series of population by sex. It is available in the JSON-stat format at:
583+
[Dataset PEA01](https://data.cso.ie/table/PEA01) from CSO provides a yearly time series of population by sex and age group. It is available in the JSON-stat format at:
584584

585-
https://statbank.cso.ie/StatbankServices/StatbankServices.svc/jsonservice/responseinstance/PEA01
585+
https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.ReadDataset/PEA01/JSON-stat/2.0/en
586586

587587
You can view the contents of the dataset at
588588

589-
https://jsonstat.com/explorer/#/https%3A%2F%2Fstatbank.cso.ie%2FStatbankServices%2FStatbankServices.svc%2Fjsonservice%2Fresponseinstance%2FPEA01
589+
https://jsonstat.com/explorer/#/https%3A%2F%2Fws.cso.ie%2Fpublic%2Fapi.restful%2FPxStat.Data.Cube_API.ReadDataset%2FPEA01%2FJSON-stat%2F2.0%2Fen
590590

591591
To download the dataset from the command line, run [cURL](https://curl.haxx.se/dlwiz/?type=bin):
592592

593593
```
594-
curl https://statbank.cso.ie/StatbankServices/StatbankServices.svc/jsonservice/responseinstance/PEA01 -o ie.jsonstat
594+
curl https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.ReadDataset/PEA01/JSON-stat/2.0/en -o ie.jsonstat
595595
```
596596

597597
#### 2. Convert JSON-stat to a more popular JSON data structure
598598

599-
In this step, we will convert the JSON-stat file into an array of objects transposing dimension *Sex*. The dataset contains a dimension (*Statistic*) with a single category (*Population Estimates (Persons in April) (Thousand)*): we won't need it. Using **jsonstat2arrobj** like in previous examples:
599+
In this step, we will convert the JSON-stat file into an array of objects transposing dimension Sex (*C02199V02655*). The dataset contains a dimension (*STATISTIC*) with a single category (*Population Estimates (Persons in April)*): we won't need it. Using **jsonstat2arrobj** like in previous examples:
600600

601601
```
602-
jsonstat2arrobj ie.jsonstat ie.json --drop Statistic --by Sex --bylabel
602+
jsonstat2arrobj ie.jsonstat ie.json --drop STATISTIC --by C02199V02655 --bylabel
603603
```
604604

605605
Or using the stream interface:
606606

607607
```
608-
jsonstat2arrobj < ie.jsonstat > ie.json --drop Statistic --by Sex --bylabel --stream
608+
jsonstat2arrobj < ie.jsonstat > ie.json --drop STATISTIC --by C02199V02655 --bylabel --stream
609609
```
610610

611611
The only difference between the previous two lines is that the stream interface overwrites *ie.json* if it already exists, whereas the non-stream interface writes to a new filename to avoid losing the content of an existing file.
@@ -620,13 +620,13 @@ First we need to convert JSON to [NDJSON](http://ndjson.org/):
620620
ndjson-split < ie.json > ie.ndjson
621621
```
622622

623-
Because we are only interested in data for the latest year (2019 at the time of writing), we need to apply this filtering condition:
623+
Because we are only interested in data for the latest year (2020 at the time of writing), we need to apply this filtering condition:
624624

625625
```js
626-
d.Year==='2019'
626+
d['TLIST(A1)']==='2020'
627627
```
628628

629-
We also want to remove the age total (*All ages*) and all the subtotals included in the dataset:
629+
We also want to remove the age (*C02076V02508*) total (*All ages*) and all the subtotals included in the dataset:
630630

631631
* _15 years and over_
632632
* _65 years and over_
@@ -648,21 +648,21 @@ They are not needed to build a population pyramid. One way to achieve this in Ja
648648
'15 - 24 years',
649649
'25 - 44 years',
650650
'45 - 64 years'
651-
].indexOf( d['Age Group'] ) < 0
651+
].indexOf( d['C02076V02508'] ) < 0
652652
```
653653

654654
The resulting filtering command is then:
655655

656656
```
657-
ndjson-filter "d.Year==='2019' && ['All ages', '15 years and over', '65 years and over', '0 - 4 years', '0 - 14 years', '15 - 24 years', '25 - 44 years', '45 - 64 years'].indexOf(d['Age Group'])<0" < ie.ndjson > ie-filtered.ndjson
657+
ndjson-filter "d['TLIST(A1)']==='2020' && ['All ages', '15 years and over', '65 years and over', '0 - 4 years', '0 - 14 years', '15 - 24 years', '25 - 44 years', '45 - 64 years'].indexOf(d['C02076V02508'])<0" < ie.ndjson > ie-filtered.ndjson
658658
```
659659

660660
#### 4. Transform data
661661

662662
Many visualization tools do not have pyramids as a type of chart, because they are actually just a special case of a bar chart where the male values have negative values. This is the case of Google Sheets, the tool we are going to use. So the next step is to keep only the information we want and multiply male values by -1.
663663

664664
```
665-
ndjson-map "{ Age: d['Age Group'], Sex: d.Sex, Male: -1*d.Male, Female: d.Female }" < ie-filtered.ndjson > ie-pyram.ndjson
665+
ndjson-map "{ Age: d['C02076V02508'], Sex: d['C02199V02655'], Male: -1*d.Male, Female: d.Female }" < ie-filtered.ndjson > ie-pyram.ndjson
666666
```
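
The filter and map expressions above can be tried on a few records in plain JavaScript before running them through **ndjson-filter** and **ndjson-map**. This is only a sketch: the first two rows reuse figures from the resulting CSV, while the "All ages" row and the 2019 row carry invented values, and the property names assume the transposed output described in step 2.

```js
// Subtotals that must not appear in a population pyramid.
const subtotals = [
  "All ages", "15 years and over", "65 years and over", "0 - 4 years",
  "0 - 14 years", "15 - 24 years", "25 - 44 years", "45 - 64 years",
];

// Sample records shaped like the transposed NDJSON lines.
const rows = [
  { "TLIST(A1)": "2020", C02076V02508: "Under 1 year", Male: 29.9, Female: 28.4 },
  { "TLIST(A1)": "2020", C02076V02508: "1 - 4 years", Male: 128.3, Female: 122.9 },
  { "TLIST(A1)": "2020", C02076V02508: "All ages", Male: 1000, Female: 1000 }, // made-up values
  { "TLIST(A1)": "2019", C02076V02508: "Under 1 year", Male: 30, Female: 29 }, // made-up values
];

const pyramid = rows
  // Same condition as the ndjson-filter expression: latest year, no subtotals.
  .filter(d => d["TLIST(A1)"] === "2020" && subtotals.indexOf(d["C02076V02508"]) < 0)
  // Same idea as the ndjson-map expression: negate the male values.
  .map(d => ({ Age: d["C02076V02508"], Male: -1 * d.Male, Female: d.Female }));

console.log(pyramid);
```

Only the two 2020 age-group rows survive, with the male values negated so that a bar chart draws them to the left of the axis.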
667667

668668
In the Norwegian example, we used **ndjson-reduce** to go back from NDJSON to JSON.
@@ -691,11 +691,11 @@ We've ended up with a CSV that looks like this:
691691

692692
```
693693
Age,Male,Female
694-
Under 1 year,-33.9,32.3
695-
1 - 4 years,-148,141.3
696-
5 - 9 years,-183.8,179.4
697-
10 - 14 years,-163.4,157.1
698-
15 - 19 years,-148,140.1
694+
Under 1 year,-29.9,28.4
695+
1 - 4 years,-128.3,122.9
696+
5 - 9 years,-176.3,167.8
697+
10 - 14 years,-179.4,170.6
698+
15 - 19 years,-164.7,159.3
699699
...
700700
```
701701

@@ -704,7 +704,7 @@ Under 1 year,-33.9,32.3
704704
In a single line:
705705

706706
```
707-
curl https://statbank.cso.ie/StatbankServices/StatbankServices.svc/jsonservice/responseinstance/PEA01 | jsonstat2arrobj -d Statistic -b Sex -l -t | ndjson-split | ndjson-filter "d.Year==='2019' && ['All ages', '15 years and over', '65 years and over', '0 - 4 years', '0 - 14 years', '15 - 24 years', '25 - 44 years', '45 - 64 years'].indexOf(d['Age Group'])<0" | ndjson-map "{Age: d['Age Group'], Sex: d.Sex, Male: -1*d.Male, Female: d.Female}" | json2csv -n > ie.csv
707+
curl https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.ReadDataset/PEA01/JSON-stat/2.0/en | jsonstat2arrobj -d STATISTIC -b C02199V02655 -l -t | ndjson-split | ndjson-filter "d['TLIST(A1)']==='2020' && ['All ages', '15 years and over', '65 years and over', '0 - 4 years', '0 - 14 years', '15 - 24 years', '25 - 44 years', '45 - 64 years'].indexOf(d['C02076V02508'])<0" | ndjson-map "{Age: d['C02076V02508'], Sex: d['C02199V02655'], Male: -1*d.Male, Female: d.Female}" | json2csv -n > ie.csv
708708
```
709709

710710
#### 7. Data visualization
@@ -827,7 +827,7 @@ In this example, we will be doing several translations.
827827
curl "https://stats.oecd.org/SDMX-JSON/data/KEI/PS+PR+PRINTO01+SL+SLRTTO01+SLRTCR03+OD+ODCNPI03+CI+LO+LOLITOAA+LORSGPRT+LI+LF+LFEMTTTT+LR+LRHUTTTT+LC+LCEAMN01+UL+ULQEUL01+PP+PI+CP+CPALTT01+FI+MA+MABMM301+IR+IRSTCI01+IR3TIB01+IRLTLT01+SP+SPASTT01+CCUSMA02+XT+XTEXVA01+XTIMVA01+BP+B6BLTT02+NA+NAEXKP01+NAEXKP02+NAEXKP03+NAEXKP04+NAEXKP06+NAEXKP07.AUS+AUT+BEL+CAN+CHL+CZE+DNK+EST+FIN+FRA+DEU+GRC+HUN+ISL+IRL+ISR+ITA+JPN+KOR+LVA+LTU+LUX+MEX+NLD+NZL+NOR+POL+PRT+SVK+SVN+ESP+SWE+CHE+TUR+GBR+USA+EU28+G-7+OECDE+G-20+OECD+NMEC+ARG+BRA+CHN+COL+IND+IDN+RUS+SAU+ZAF.GP.M/all?startTime=2018-01&endTime=2020-01&dimensionAtObservation=allDimensions" -o kei.sdmx.json
828828
```
829829

830-
This line of code produces an SDMX-JSON file with the growth over the previous period of some key economic indicators for several locations. The size of _kei.sdmx.json_ is 393 Kb.
830+
This line of code produces an SDMX-JSON file with the growth over the previous period of some key economic indicators for several locations. The size of _kei.sdmx.json_ is (at the time of writing) 430 Kb.
831831

832832
#### 2. Convert SDMX-JSON to JSON-stat
833833

@@ -837,15 +837,15 @@ This line of code produces an SDMX-JSON file with the growth over the previous p
837837
sdmx2jsonstat kei.sdmx.json default.stat.json
838838
```
839839

840-
The JSON-stat file is smaller (232 Kb) than the original SDMX-JSON one. An even smaller file can be produced: by default, **sdmx2jsonstat** uses arrays to express values and statuses. JSON-stat supports both arrays and objects for this purpose. Because usually only a few data have status information, it is generally better to use an object for statuses.
840+
The JSON-stat file is smaller (250 Kb) than the original SDMX-JSON one. An even smaller file can be produced: by default, **sdmx2jsonstat** uses arrays to express values and statuses. JSON-stat supports both arrays and objects for this purpose. Because usually only a few data have status information, it is generally better to use an object for statuses.
841841

842842
**sdmx2jsonstat** supports objects for status information using the _--ostatus_ option (_-s_).
843843

844844
```
845845
sdmx2jsonstat kei.sdmx.json kei.stat.json -s
846846
```
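
To see why an object is usually more compact than an array for statuses, consider this JavaScript sketch. It does not reproduce what **sdmx2jsonstat** does internally; the dataset size, the positions and the status codes (`"e"`, `"p"`) are all hypothetical.

```js
// A hypothetical dataset with 1000 values where only three carry a status.
const n = 1000;
const flagged = [[3, "e"], [250, "p"], [999, "e"]];

// Array form: one slot per value, null when there is no status.
const statusArray = new Array(n).fill(null);
// Object form: only the flagged positions are stored, keyed by value index.
const statusObject = {};

for (const [index, flag] of flagged) {
  statusArray[index] = flag;
  statusObject[index] = flag;
}

const arrayBytes = JSON.stringify(statusArray).length;
const objectBytes = JSON.stringify(statusObject).length;
console.log({ arrayBytes, objectBytes });
```

The array has to spell out `null` for every value without a status, so its serialized size grows with the dataset, while the object grows only with the number of flagged values.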
847847

848-
The new JSON-stat file is now only 168 Kb: less than half the original SDMX-JSON one.
848+
The new JSON-stat file is now only 181 Kb: less than half the original SDMX-JSON one.
849849

850850
#### 3. Convert JSON-stat to CSV
851851

@@ -855,31 +855,31 @@ Because now we have a regular JSON-stat file, it is trivial to convert it to CSV
855855
jsonstat2csv kei.stat.json kei.csv
856856
```
857857

858-
The new file is very big (1,174 Kb) because by default labels, instead of identifiers, are used. **jsonstat2csv** has several options to avoid this. But you don't actually have to choose between labels and identifiers (each serves a different purpose): you can use [CSV-stat](https://github.com/jsonstat/csv) as the output format. CSV-stat supports the core semantics of JSON-stat using an enriched CSV structure.
858+
The new file is very big (1.3 Mb) because by default labels, instead of identifiers, are used. **jsonstat2csv** has several options to avoid this. But you don't actually have to choose between labels and identifiers (each serves a different purpose): you can use [CSV-stat](https://github.com/jsonstat/csv) as the output format. CSV-stat supports the core semantics of JSON-stat using an enriched CSV structure.
859859

860860
You can produce CSV-stat with the _--rich_ option (_-r_):
861861

862862
```
863863
jsonstat2csv kei.stat.json kei.rich.csv -r
864864
```
865865

866-
This command produces a 510 Kb file.
866+
This command produces a 547 Kb file.
867867

868868
#### 4. Back to JSON-stat
869869

870870
```
871871
csv2jsonstat kei.rich.csv default.json
872872
```
873873

874-
The size of the new JSON-stat is 231 Kb: it is a little smaller than the original because the original JSON-stat had some extension information that was lost in CSV-stat.
874+
The size of the new JSON-stat is 241 Kb: it is a little smaller than the original because the original JSON-stat had some extension information that was lost in CSV-stat.
875875

876876
This file can be minimized using objects for statuses, thanks to **jsonstat2jsonstat**:
877877

878878
```
879879
jsonstat2jsonstat default.json kei.json -m -s
880880
```
881881

882-
The size of the resulting file is 167 Kb.
882+
The size of the resulting file is 181 Kb.
883883

884884
#### 5. Producing a key economic indicators CSV for a particular country
885885
