
2. How to import datasets


You can import into the dataset.big table both datasets described by a datapackage and non-prepared datasets:

  • prepared with a standard datapackage.json v1.0:
    • from the Web at GitHub.com: use the key github.com.
    • from the Web at other hosts like GitLab: not implemented yet (it would be easy, but comes later).
    • from localhost: on your machine or on the server of a remote PostgreSQL. Use the key local.
  • non-prepared: any CSV file, but only the local option at this time. Use the key local-csv.

With these basic clues you can understand and edit your conf.json to select what you will import. It is a "configuration + import list" for the generator software, which generates a makefile that you can run as a shell script anywhere (local or server) to import the datasets.

The conf.json file

All configurations for the make-generator, and the list of resources of your datasets, are expressed as simple key-value pairs in this file. Let's start with Example 1, which is the "default" configuration of the distribution:

  "db": "postgresql://postgres:postgres@localhost:5432/trydatasets",
  "github.com":{
    "lexml/lexml-vocabulary":null,
    "datasets/language-codes":null,
    "datasets/country-codes":null,
    "datasets/world-cities":{
      "_corrections_":{"resources":[{"primaryKey": "geonameid"}]}
    },
    "datasets-br/state-codes":"br-state-codes",
    "datasets-br/city-codes":null
  },
  "useBig":true,  "useIDX":false,        "useRename":true,
  "useYUml":true, "useAllNsAsDft":false
}
  • db is the PostgreSQL connection string (a sanity-check query is sketched after this list).

  • github.com is a well-known place for datasets, so with only the GitHub project name the software can get all the files. Contents and some explanations:

    • "datasets/language-codes":null is a Github project at http://github.com/datasets/language-codes
      There are a /datapackage.json file and a /data folder with the CSV files pointed by datapackage.json. There are 4 CVS files, the null say that you need all of them.

    • "datasets-br/state-codes":"br-state-codes", here the string "br-state-codes" reduced "all" to only one CSV. It is at datasets-br/state-codes/data.

    • "datasets/world-cities":{...} is also not null, but now have some information. The typical one is to do correctins, and it is only a replacement for other informations at the project's datapackage.json. The first item at "resources" array.

  • useBig, useIDX, etc. are boolean flags for the generator.
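
A quick way to verify that the db connection string works before running anything is to open it with psql and run a trivial query. A minimal sanity check, assuming the Example 1 string:

-- Sanity-check the "db" connection string, e.g. via:
--   psql postgresql://postgres:postgres@localhost:5432/trydatasets
SELECT current_database() AS db, current_user AS usr, version();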
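
The replacement semantics of _corrections_ can be pictured with PostgreSQL's own jsonb functions. This is a minimal sketch, not the generator's actual code, and the inline datapackage fragment below is invented for illustration:

-- Replace resources[0].primaryKey in a simplified (invented) datapackage
-- fragment, mimicking the conf-correction for datasets/world-cities:
SELECT jsonb_set(
  '{"resources": [{"name": "world-cities", "primaryKey": "name"}]}'::jsonb,
  '{resources,0,primaryKey}',
  '"geonameid"'
) AS corrected_datapackage;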

Using local and local-csv

Another conf.json example:

{
   "db":"postgresql://postgres:postgres@localhost:5432/trydatasets",
   "github.com":{
        "datasets/country-codes":null,
        "datasets-br/state-codes":"br-state-codes",
        "datasets-br/city-codes":null
   },
   "local": {
        "/tmp/test1":null
   },
   "local-csv":{
     "test2017":{
       "separator":";",
       "folder":"/home/user/mytests"
     },
     "otherTests":"/home/user/myOthertests"
   },
   "useBig":true, "useIDX":false, "useRename":true
}
  • "local" lists the local folders containing usual datapackage.json at root, so all other behaviours are the same tham Github's.

  • "local-csv" poits directly a CSV files, with no datapackage descriptor. So, some more information is necessary. Most commom is the CSV-separator. The name is used to define dataset's namespace.

Messages of the make-generator

...

BEGIN of cache-scripts generation

 CONFIGS (github.com): NsAsDft= useIDX=, count=6 items.

 Creating cache-scripts for lexml/lexml-vocabulary of github.com:
	 Building table1 with data/autoridade.csv.
	 Building table2 with data/localidade.csv.
	 Building table3 with data/tipoDocumento.csv.
	 Building table4 with data/evento.csv.
	 Building table5 with data/lingua.csv.
	 Building table6 with data/tipoConteudo.csv.
 Creating cache-scripts for datasets/language-codes of github.com:
	 Building table7 with data/language-codes.csv.
	 Building table8 with data/language-codes-3b2.csv.
	 Building table9 with data/language-codes-full.csv.
	 Building table10 with data/ietf-language-tags.csv.
 Creating cache-scripts for datasets/country-codes of github.com:
	 Building table11 with data/country-codes.csv.
 Creating cache-scripts for datasets/world-cities of github.com:
	 -- Notice: using conf-corrections for datapackage
		... Replacing resources[0][primaryKey] by 'geonameid'
	 Building table12 with data/world-cities.csv.
 Creating cache-scripts for datasets-br/state-codes of github.com:
	 Building table13 with data/br-state-codes.csv.
 Creating cache-scripts for datasets-br/city-codes of github.com:
	 Building table14 with data/br-city-synonyms.csv.
	 Building table15 with data/br-city-codes.csv.
END of cache-scripts generation

Associated importation

The configuration and the first output results are those of Example 1. To check what was imported, compare the conf.json directives with the vmeta_summary view:

select * from  dataset.vmeta_summary;
 id |               urn               |          pkey          |   jtd   | n_cols | n_rows 
----+---------------------------------+------------------------+---------+--------+--------
  1 | (2)lexml:autoridade             | id                     | tab-aoa |      9 |    601
  4 | (2)lexml:evento                 | id                     | tab-aoa |      9 |     14
 ...
 15 | (4)datasets-br:br_city_codes    | state/lexLabel         | tab-aoa |      9 |   5570
 14 | (4)datasets-br:br_city_synonyms | state/lexLabel/synonym | tab-aoa |      5 |     26
 13 | (4)datasets-br:br_state_codes   | id                     | tab-aoa |     15 |     33
(15 rows)
  • Conf's "datasets/country-codes" entry generated the datasets namespace, and the datasets-br namespace comes from "datasets-br/state-codes":"br-state-codes" and "datasets-br/city-codes":null.
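
To inspect a single namespace, the urn column shown above can be filtered directly. A small query sketch, using only the view already introduced:

-- List only the datasets-br rows of the summary
-- (urn format is "(n)namespace:name"):
SELECT id, urn, pkey, n_rows
FROM dataset.vmeta_summary
WHERE urn LIKE '%datasets-br:%'
ORDER BY id;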

...
