
Example 1

Peter edited this page Nov 20, 2017 · 2 revisions

Example of conf.json and resulting summary.

{
  "db": "postgresql://postgres:postgres@localhost:5432/trydatasets",
  "github.com":{
    "lexml/lexml-vocabulary":null,
    "datasets/language-codes":null,
    "datasets/country-codes":null,
    "datasets/world-cities":{
      "_corrections_":{"resources":[{"primaryKey": "geonameid"}]}
    },
    "datasets-br/state-codes":"br-state-codes",
    "datasets-br/city-codes":null
  },
  "useBig":true,  "useIDX":false,        "useRename":true,
  "useYUml":true, "useAllNsAsDft":false
}

After running everything as shown in How to install, the terminal displays two summary tables:

| id | (ns) urn | pkey | jtd | n_cols | n_rows |
|----|----------|------|-----|--------|--------|
| 1 | (2)lexml:autoridade | id | tab-aoa | 9 | 601 |
| 4 | (2)lexml:evento | id | tab-aoa | 9 | 14 |
| 5 | (2)lexml:lingua | id | tab-aoa | 9 | 6 |
| 2 | (2)lexml:localidade | id | tab-aoa | 9 | 5664 |
| 6 | (2)lexml:tipoconteudo | id | tab-aoa | 9 | 6 |
| 3 | (2)lexml:tipodocumento | id | tab-aoa | 9 | 2372 |
| 11 | (3)datasets:country_codes | | tab-aoa | 56 | 250 |
| 10 | (3)datasets:ietf_language_tags | | tab-aoa | 7 | 721 |
| 7 | (3)datasets:language_codes | | tab-aoa | 2 | 184 |
| 8 | (3)datasets:language_codes_3b2 | | tab-aoa | 3 | 184 |
| 9 | (3)datasets:language_codes_full | | tab-aoa | 5 | 486 |
| 12 | (3)datasets:world_cities | geonameid | tab-aoa | 4 | 23018 |
| 15 | (4)datasets-br:br_city_codes | state/lexLabel | tab-aoa | 9 | 5570 |
| 14 | (4)datasets-br:br_city_synonyms | state/lexLabel/synonym | tab-aoa | 5 | 26 |
| 13 | (4)datasets-br:br_state_codes | id | tab-aoa | 15 | 33 |

This 15-row summary was obtained with select * from dataset.vmeta_summary. The first column is the source ID, that is, the id column of the dataset.meta table. The (ns) value is the namespace ID, which is used to label the SQL views and avoid long names, so the view for the first row is named dataset.vw2_autoridade. When a dataset is in the "empty namespace" (the default, or forced by the "all namespaces as default" flag, useAllNsAsDft), the view name is dataset.vw_autoridade.
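The naming rule described above can be sketched in a few lines of Python. This is only an illustration inferred from the two view names given in the text, not the project's actual code: the helper view_name and its parameters are hypothetical, and passing None for the namespace ID is an assumed way to represent the "empty namespace" case.

```python
from typing import Optional


def view_name(dataset_name: str, ns_id: Optional[int] = None) -> str:
    """Build the SQL view name for a dataset (hypothetical helper).

    ns_id is the namespace ID shown in the (ns) column. None stands for the
    "empty namespace" case, where no ID appears in the view name.
    """
    prefix = "vw" if ns_id is None else f"vw{ns_id}"
    return f"dataset.{prefix}_{dataset_name}"


print(view_name("autoridade", 2))  # dataset.vw2_autoridade
print(view_name("autoridade"))     # dataset.vw_autoridade
```

Keeping the namespace ID optional mirrors the useAllNsAsDft flag: when every namespace is treated as the default one, the same call without an ID yields the short name.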

The urn column is a short form of "nameSpace:datasetName", a string that follows some URN conventions. The next columns in the summary are pkey, the primary key when one exists, and jtd, the JSON Type Definition used in the internal dataset.big.j structure; n_cols is the number of columns (fields) and n_rows is the number of data rows.
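To make the "(ns) urn" format concrete, here is a minimal Python sketch that splits one summary label into its namespace ID, namespace, and dataset name. The names SummaryLabel and parse_label are hypothetical, and the regular expression assumes the layout seen in the summary above: a numeric ID in parentheses, then the namespace, a colon, and the dataset name.

```python
import re
from typing import NamedTuple


class SummaryLabel(NamedTuple):
    ns_id: int       # namespace ID from the (ns) prefix
    namespace: str   # part of the URN before the colon
    dataset: str     # part of the URN after the colon


# Assumed layout: "(<digits>)<namespace>:<datasetName>"
LABEL_RE = re.compile(r"^\((\d+)\)([^:]+):(.+)$")


def parse_label(label: str) -> SummaryLabel:
    """Split a label such as "(2)lexml:autoridade" into its three parts."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unexpected label format: {label!r}")
    ns_id, namespace, dataset = match.groups()
    return SummaryLabel(int(ns_id), namespace, dataset)


print(parse_label("(2)lexml:autoridade"))
print(parse_label("(4)datasets-br:br_city_codes"))
```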

| nspname | n_tables | total_bytes | table_bytes | table_size |
|---------|----------|-------------|-------------|------------|
| dataset | 4 | 1474560 | 73728 | 72 kB |

This second summary is only for checking disk usage; it is generated by select * from pgvw_nsclass_usage where nspname='dataset'. A similar query can be used to check only the dataset.big table.
