Example 1
Example of `conf.json` and resulting summary.
{
  "db": "postgresql://postgres:postgres@localhost:5432/trydatasets",
  "github.com": {
    "lexml/lexml-vocabulary": null,
    "datasets/language-codes": null,
    "datasets/country-codes": null,
    "datasets/world-cities": {
      "_corrections_": {"resources": [{"primaryKey": "geonameid"}]}
    },
    "datasets-br/state-codes": "br-state-codes",
    "datasets-br/city-codes": null
  },
  "useBig": true, "useIDX": false, "useRename": true,
  "useYUml": true, "useAllNsAsDft": false
}
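As a quick sanity check before running the installer, the configuration can be inspected with a few lines of Python. This is only a sketch: the helper name `describe_sources` is hypothetical, the configuration is a trimmed copy of the example above, and the script only reports the shape of each entry under `"github.com"` (null, string, or object) without assuming what the project does with it.

```python
import json

# Trimmed copy of the example conf.json above (only part of "github.com").
CONF_TEXT = """
{
  "github.com": {
    "lexml/lexml-vocabulary": null,
    "datasets/world-cities": {
      "_corrections_": {"resources": [{"primaryKey": "geonameid"}]}
    },
    "datasets-br/state-codes": "br-state-codes"
  }
}
"""


def describe_sources(conf):
    """Return one (repo, kind, detail) tuple per entry under "github.com".

    Hypothetical helper: it classifies the JSON value of each entry and
    does not interpret it.
    """
    rows = []
    for repo, option in conf.get("github.com", {}).items():
        if option is None:
            rows.append((repo, "null", None))
        elif isinstance(option, str):
            rows.append((repo, "string", option))
        elif isinstance(option, dict):
            # Report only the top-level keys of the object, sorted.
            rows.append((repo, "object", sorted(option)))
        else:
            raise ValueError(f"unexpected option for {repo!r}: {option!r}")
    return rows


conf = json.loads(CONF_TEXT)
for row in describe_sources(conf):
    print(row)
```

In the example, the only object-valued entry is `datasets/world-cities`, whose `_corrections_` block sets `primaryKey` to `geonameid`.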
After running everything as described in How to install, the terminal shows two summary tables:
id | (ns) urn | pkey | jtd | n_cols | n_rows |
---|---|---|---|---|---|
1 | (2)lexml:autoridade | id | tab-aoa | 9 | 601 |
4 | (2)lexml:evento | id | tab-aoa | 9 | 14 |
5 | (2)lexml:lingua | id | tab-aoa | 9 | 6 |
2 | (2)lexml:localidade | id | tab-aoa | 9 | 5664 |
6 | (2)lexml:tipoconteudo | id | tab-aoa | 9 | 6 |
3 | (2)lexml:tipodocumento | id | tab-aoa | 9 | 2372 |
11 | (3)datasets:country_codes | | tab-aoa | 56 | 250 |
10 | (3)datasets:ietf_language_tags | | tab-aoa | 7 | 721 |
7 | (3)datasets:language_codes | | tab-aoa | 2 | 184 |
8 | (3)datasets:language_codes_3b2 | | tab-aoa | 3 | 184 |
9 | (3)datasets:language_codes_full | | tab-aoa | 5 | 486 |
12 | (3)datasets:world_cities | geonameid | tab-aoa | 4 | 23018 |
15 | (4)datasets-br:br_city_codes | state/lexLabel | tab-aoa | 9 | 5570 |
14 | (4)datasets-br:br_city_synonyms | state/lexLabel/synonym | tab-aoa | 5 | 26 |
13 | (4)datasets-br:br_state_codes | id | tab-aoa | 15 | 33 |
This 15-row summary was obtained with `select * from dataset.vmeta_summary`. The first column is the source ID, the `id` column of the `dataset.meta` table. The (ns) value is the namespace ID; it is used to label the SQL views and avoid long names, so the view for the first row is `dataset.vw2_autoridade`. When a dataset is in the "empty namespace" (the default, or forced by the "all namespaces as default" flag, `useAllNsAsDft`), the view name is `dataset.vw_autoridade`.
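The naming rule can be sketched in a few lines of Python. The function `view_name` and its parameters are hypothetical, written only to reproduce the two names given above; the project's own SQL may build the names differently.

```python
def view_name(dataset_name, ns_id=None, use_all_ns_as_dft=False, schema="dataset"):
    """Build a view name following the convention described above.

    Hypothetical helper: ns_id=None stands for the "empty namespace", and
    use_all_ns_as_dft mimics the useAllNsAsDft flag by dropping the ID.
    """
    if ns_id is None or use_all_ns_as_dft:
        prefix = "vw"
    else:
        prefix = f"vw{ns_id}"
    return f"{schema}.{prefix}_{dataset_name}"


print(view_name("autoridade", ns_id=2))                          # dataset.vw2_autoridade
print(view_name("autoridade"))                                   # dataset.vw_autoridade
print(view_name("autoridade", ns_id=2, use_all_ns_as_dft=True))  # dataset.vw_autoridade
```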
The URN is just shorthand for "nameSpace:datasetName", a string that follows some URN conventions. The next columns in the summary are pkey, the primary key when one exists, and jtd, the JSON Type Definition used in the `dataset.big.j` internal structure; n_cols is the number of columns (fields) and n_rows is the number of data rows.
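When post-processing the summary, the "(ns) urn" cell can be split into its three parts. The parser below is a minimal sketch that assumes the cell always looks like `(2)lexml:autoridade`, as in the rows shown above; `SummaryUrn` and `parse_summary_urn` are hypothetical names, not part of the project.

```python
import re
from typing import NamedTuple


class SummaryUrn(NamedTuple):
    ns_id: int        # namespace ID, the number in parentheses
    namespace: str    # text before the colon
    name: str         # dataset name, text after the colon


# Assumed cell layout: "(<digits>)<namespace>:<datasetName>"
_CELL = re.compile(r"^\((\d+)\)([^:]+):(.+)$")


def parse_summary_urn(cell):
    """Split a "(ns) urn" cell from the summary into its components."""
    match = _CELL.match(cell.strip())
    if match is None:
        raise ValueError(f"not a '(ns)namespace:name' cell: {cell!r}")
    return SummaryUrn(int(match.group(1)), match.group(2), match.group(3))


print(parse_summary_urn("(2)lexml:autoridade"))
print(parse_summary_urn("(4)datasets-br:br_city_codes"))
```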
nspname | n_tables | total_bytes | table_bytes | table_size |
---|---|---|---|---|
dataset | 4 | 1474560 | 73728 | 72 kB |
This second summary only checks disk usage; it is generated by `select * from pgvw_nsclass_usage where nspname='dataset'`. You can use a similar query to check only the `dataset.big` table,