Sytze Van Herck edited this page Dec 6, 2022 · 1 revision

This section illustrates answers to frequently asked questions with brief examples.

Filling with Leading Zeros

When every integer must be formatted with the same number of digits, use a Jinja expression inside the string in the JSON schema.

The following example pads the string variable hisco to 5 digits by adding leading zeros:

"aboutUrl": "hisco-code:{{'%05d'|format(hisco|int)}}"

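The padding can be checked outside CoW with plain Python, because Jinja's format filter applies Python's printf-style `%` formatting. The following is a minimal sketch; the function name `pad_hisco` is hypothetical and is not part of CoW.

```python
def pad_hisco(hisco: str) -> str:
    """Mirror '%05d'|format(hisco|int): zero-pad to at least 5 digits."""
    return "%05d" % int(hisco)


print("hisco-code:" + pad_hisco("123"))    # hisco-code:00123
print("hisco-code:" + pad_hisco("61220"))  # hisco-code:61220
```

Values that already have 5 or more digits are left unchanged, since `%05d` only sets a minimum width.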
If you only want to format positive values of the hisco string variable, use a conditional expression as follows:

"aboutUrl": "hisco-code:{% if hisco|int > 0 %}{{'%05d'|format(hisco|int)}}{% else %}{{hisco}}{% endif %}"

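The reason for the condition becomes visible when a negative code is padded unconditionally: the sign counts towards the width. Below is a rough Python sketch of the conditional logic, assuming the else branch is meant to emit the original value unchanged. The helpers `to_int` and `about_url` are hypothetical names; `to_int` imitates the fallback to 0 of Jinja's int filter for values that cannot be converted.

```python
def to_int(value: str) -> int:
    """Convert like Jinja's int filter: fall back to 0 on failure."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def about_url(hisco: str) -> str:
    """Pad positive codes to 5 digits; leave all other values untouched."""
    number = to_int(hisco)
    if number > 0:
        return "hisco-code:%05d" % number
    return "hisco-code:" + hisco


print(about_url("123"))  # hisco-code:00123
print(about_url("-1"))   # hisco-code:-1
print("%05d" % -1)       # -0001 (what unconditional padding would produce)
```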
Skipping NULL Values

When data is missing or unknown, cells are often left empty or recorded as NULL. Often these values need to be skipped when converting to Linked Data. Depending on the specification in the JSON schema, NULL values in different triple positions are left out.

NULL in Rows

Values that should be skipped during conversion can be listed in the JSON schema under a column's "null" property:

{
    "name": "ht_feet",
    "null": ["999", "0"],
    "datatype": "xsd:integer",
    "dc:description": "height in feet",
    "titles": ["ht_feet"],
    "propertyUrl": "microheights:dimension/ht_feet",
    "@id": "https://iisg.amsterdam/__af_ke_1890_milit_2009_moradi.dta.csv/column/ht_feet"
   },

In this example, individuals with a ht_feet of "999" or "0" are skipped. Because CoW reads NULL values as strings, remember to wrap them in quotation marks.
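The effect of the string comparison can be illustrated with a small Python sketch. This is not CoW's implementation, only an approximation of the behaviour described above; `NULL_VALUES`, `is_null` and the sample rows are made up for illustration.

```python
# The "null" list from the column definition, kept as strings.
NULL_VALUES = ["999", "0"]


def is_null(cell: str) -> bool:
    """A cell counts as NULL when it equals one of the listed strings."""
    return cell in NULL_VALUES


# Hypothetical CSV cells; a CSV reader yields strings, not numbers.
cells = ["5", "999", "0", "6"]
kept = [cell for cell in cells if not is_null(cell)]

print(kept)                # ['5', '6']
print(999 in NULL_VALUES)  # False: the integer 999 does not match "999"
```

The last line shows why the quotation marks matter: an unquoted number in the list would never equal the string read from the CSV.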

NULL in Multiple Columns

Virtual columns can create dates based on multiple columns in the CSV. When one of those columns contains a NULL value, the generation of a triple can be prevented using a @list of conditions.

{
        "virtual": true,
        "null": {"@list": [{"name": "birth_year", "null": ""},
        {"name": "birth_month", "null": ""},
        {"name": "birth_day", "null": ""}]},
        "datatype": "xsd:date",
        "dc:description": "Birth Date",
        "propertyUrl": "schema:birthDate",
        "csvw:value": "{{['%04d'| format(birth_year|int),'-','%02d' | format(birth_month|int),'-','%02d'|format(birth_day|int)]|join}}"
   },

If none of the date variables (birth_day/birth_month/birth_year) are empty, the JSON schema will result in a triple. If at least one of the variables (columns) in the list is empty ("null": ""), no birthDate triple will be generated for the corresponding row in the CSV. In this example NULL is specified as an empty cell (""), but this can be any string. For an explanation of the formatting of the result, see dates.
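The rule described above (a triple only when none of the listed columns holds its NULL value) can be sketched in Python as follows. This is an illustration under assumptions rather than CoW's own code; `NULL_CONDITIONS` and `birth_date` are hypothetical names, and the rows are invented.

```python
from typing import Optional

# One (column name, NULL value) pair per entry in the @list.
NULL_CONDITIONS = [("birth_year", ""), ("birth_month", ""), ("birth_day", "")]


def birth_date(row: dict) -> Optional[str]:
    """Return an xsd:date string, or None if any listed column is NULL."""
    for name, null_value in NULL_CONDITIONS:
        if row[name] == null_value:
            return None  # no birthDate triple for this row
    # Same padding and joining as the csvw:value template.
    return "".join([
        "%04d" % int(row["birth_year"]), "-",
        "%02d" % int(row["birth_month"]), "-",
        "%02d" % int(row["birth_day"]),
    ])


complete = {"birth_year": "1890", "birth_month": "3", "birth_day": "7"}
partial = {"birth_year": "1890", "birth_month": "", "birth_day": "7"}

print(birth_date(complete))  # 1890-03-07
print(birth_date(partial))   # None
```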

NULL in Virtual Column

Virtual columns can also refer to only part of a date in a single column. Because the NULL value is checked for a virtual column, the JSON schema still requires a @list, even when only a single variable (column) is checked.

{
     "virtual": true,
     "null": {"@list": [{"name": "byear", "null": ""}]},
     "datatype": "xsd:gYear",
     "propertyUrl": "microheights:dimension/bdec",
     "csvw:value": "{{[byear[0:3],'0']|join }}",
     "dc:description": "decade in which birth was recorded"
   },

If the birth year byear has an empty cell, no triple is generated for bdec, the decade in which the birth was recorded.
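The slice-and-join in csvw:value behaves like Python string slicing, so the decade derivation can be sketched as below. The function name `birth_decade` is hypothetical, and the sketch assumes byear is a four-digit year stored as a string.

```python
from typing import Optional


def birth_decade(byear: str) -> Optional[str]:
    """Mirror {{[byear[0:3],'0']|join}}, skipping empty (NULL) cells."""
    if byear == "":
        return None  # NULL: no bdec triple is produced
    # Keep the first three digits and replace the last one with 0.
    return byear[0:3] + "0"


print(birth_decade("1897"))  # 1890
print(birth_decade("1900"))  # 1900
print(birth_decade(""))      # None
```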
