Codelists
!! Needs updating following recent discussions on codelists with Swirrl.
!! This section may change depending on discussion at https://github.com/GSS-Cogs/gss-utils/issues/210.
- All dimensions should have an associated codelist.
- Wherever possible, official codelists and codes defined by official bodies should be adopted to facilitate the linkage of data. URIs should be re-used when referring to the same concept.
A default codelist is generated from the data cube for any dimensions where the user has not explicitly provided their own separate codelist. Additional metadata is added to the CSVW which defines a skos:ConceptScheme and its entries.
A local codelist is provided by the user in a separate .csv file alongside the data cube. The user is able to provide additional metadata about these codes.
An external codelist is defined independently of the data cube. Statisticians who closely collaborate on subject matters may define a set of codelists they will adhere to as part of their data governance strategy. Since these are published independently of any data cubes we could consider these to also be external.
An external codelist can only be adopted if strictly adhered to by the user. The user must only use codes from the codelist and must not create any bespoke codes.
If a user primarily adopts an external codelist but supplements it with their own bespoke codes, they should create a new codelist containing both the codes they have adopted from the external codelist and their own bespoke codes. This is referred to as a mixed codelist. A mixed codelist combines concepts from across different local and external codelists into a single scheme.
The following attributes can be specified by the user. The approach to assigning a codelist is dependent on the combination and quantity of attributes specified.
attribute | type | description |
---|---|---|
[CODELIST_URI] | URI | must match the string contained in the column header in the .csv being described. |
[CODELIST_CSV] | string | one of dimension, measure, measure_dimension, measure_value or attribute. |
[CODELIST_TITLE] | string | a human-readable title for the codelist. |
[CODELIST_DESCRIPTION] | string | a human-readable description for the codelist. |
Where the user has not explicitly provided a [CODELIST_URI], one is generated during processing of the data cube .csv file.
[CODELIST_URI] = [DATACUBE_URI]#codelist/[KEBAB_NAME]
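As a rough illustration of the pattern above, the following Python sketch derives a default codelist URI from a data cube URI and a column title. The `kebab_name` helper and its normalisation rules (lower-casing, joining alphanumeric runs with hyphens) are assumptions; this page does not define how [KEBAB_NAME] is computed, and the example URI is hypothetical.

```python
import re


def kebab_name(column_title: str) -> str:
    """Hypothetical helper: lower-case the title and join its words with hyphens."""
    words = re.findall(r"[A-Za-z0-9]+", column_title)
    return "-".join(word.lower() for word in words)


def default_codelist_uri(datacube_uri: str, column_title: str) -> str:
    """Fill in the pattern [CODELIST_URI] = [DATACUBE_URI]#codelist/[KEBAB_NAME]."""
    return f"{datacube_uri}#codelist/{kebab_name(column_title)}"


# Example with a made-up data cube URI.
print(default_codelist_uri("http://example.org/cube", "Sex of Person"))
```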
We include virtual columns which designate the entries of the dimension's cells as skos:Concept and also assign the literal cell value as an rdfs:label.
// "url": "[DATACUBE_CSV]"
// [1]
{
"name": "[COLUMN_NAME]_codelist",
"virtual": true,
"aboutUrl": "[VALUE_URI]",
"propertyUrl": "skos:inScheme",
"valueUrl": "[CODELIST_URI]"
},
{
"name": "[COLUMN_NAME]_codelist_labels",
"virtual": true,
"aboutUrl": "[VALUE_URI]",
"propertyUrl": "rdfs:label",
"valueUrl": "{[COLUMN_NAME]}"
},
We also assign the concept scheme as the dimension's codelist.
// [3]
{
"@id": "[CODELIST_URI]",
"@type": "skos:ConceptScheme"
},
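The templates above could be filled in programmatically. The sketch below is one possible way to do so in Python; the function names and the example URIs are hypothetical and are not part of any existing tool.

```python
import json


def default_codelist_columns(column_name: str, value_uri: str, codelist_uri: str) -> list:
    """Build the two virtual column definitions for a default codelist."""
    in_scheme = {
        "name": f"{column_name}_codelist",
        "virtual": True,
        "aboutUrl": value_uri,
        "propertyUrl": "skos:inScheme",
        "valueUrl": codelist_uri,
    }
    labels = {
        "name": f"{column_name}_codelist_labels",
        "virtual": True,
        "aboutUrl": value_uri,
        "propertyUrl": "rdfs:label",
        # The template substitutes the cell value of the named column.
        "valueUrl": "{" + column_name + "}",
    }
    return [in_scheme, labels]


def concept_scheme_entry(codelist_uri: str) -> dict:
    """Build the metadata entry declaring the codelist as a skos:ConceptScheme."""
    return {"@id": codelist_uri, "@type": "skos:ConceptScheme"}


# Example values are invented for illustration.
columns = default_codelist_columns(
    "sex",
    "http://example.org/cube#concept/sex/{sex}",
    "http://example.org/cube#codelist/sex",
)
print(json.dumps(columns, indent=2))
```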
The user specifies the URI for an external concept scheme which they are adopting.
The user should provide a [VALUE_URI] containing a URI template which, when evaluated against the column's entries, would match skos:Concepts from the concept scheme they have adopted.
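To make the idea of evaluating a URI template against a column's entries concrete, here is a minimal Python sketch. It assumes the template only uses simple `{variable}` expressions (the most basic form of URI template) and that each variable names a column in the row; the example template and values are invented.

```python
import re
from urllib.parse import quote


def expand_value_uri(value_uri: str, row: dict) -> str:
    """Replace each {variable} in the template with the percent-encoded cell value."""

    def substitute(match):
        cell_value = str(row[match.group(1)])
        # safe="" percent-encodes everything except unreserved characters.
        return quote(cell_value, safe="")

    return re.sub(r"\{([^{}]+)\}", substitute, value_uri)


# Hypothetical template and row.
template = "http://example.org/def/concept/region/{region}"
print(expand_value_uri(template, {"region": "north east"}))
```

Each expanded URI should then correspond to a concept that already exists in the adopted scheme; a value that expands to an unknown URI indicates a bespoke code.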
attribute | necessity |
---|---|
[CODELIST_URI] | required |
// [3]
{
"@id": "[CODELIST_URI]",
"@type": "skos:ConceptScheme"
},
attribute | necessity |
---|---|
[CODELIST_CSV] | required |
[CODELIST_TITLE] | recommended |
[CODELIST_DESCRIPTION] | recommended |
[CODELIST_URI] | optional |
The user may create a .csv file containing the codes used by a specific dimension, along with descriptions, labels, and other metadata, and provide the name of the file as [CODELIST_CSV].
If a [CODELIST_CSV] is provided, the user should specify additional metadata about the codelist such as a [CODELIST_TITLE] and [CODELIST_DESCRIPTION].
If the user provides [CODELIST_CSV] and a [PROPERTY_URI], they should be presented with a warning that they may be trying to define a codelist for a resource they do not own.
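The warning rule could be implemented along the following lines. This is only a sketch: the function name, its parameters, and the wording of the message are assumptions.

```python
import warnings


def check_codelist_ownership(codelist_csv, property_uri) -> bool:
    """Warn when both a codelist CSV and a property URI are supplied.

    Returns True if a warning was raised, False otherwise.
    """
    if codelist_csv and property_uri:
        warnings.warn(
            "Both [CODELIST_CSV] and [PROPERTY_URI] were provided: you may be "
            "defining a codelist for a resource you do not own.",
            UserWarning,
        )
        return True
    return False


# Hypothetical inputs.
check_codelist_ownership("sex.csv", "http://example.org/def/dimension/sex")
```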
If [CODELIST_CSV] is provided, an additional table should be added to the CSVW file containing metadata relating to the codelist .csv file.
Current convention suggests a codelist .csv must contain the columns Label and Notation and should contain the columns Parent Notation and Description.
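A check for this column convention might look like the sketch below. It reads only the header row and reports which required and recommended columns are absent; the function name and return shape are illustrative choices.

```python
import csv
import io

REQUIRED_COLUMNS = ("Label", "Notation")
RECOMMENDED_COLUMNS = ("Parent Notation", "Description")


def check_codelist_columns(csv_text: str):
    """Return (missing_required, missing_recommended) for a codelist CSV."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    missing_required = [c for c in REQUIRED_COLUMNS if c not in header]
    missing_recommended = [c for c in RECOMMENDED_COLUMNS if c not in header]
    return missing_required, missing_recommended


# A made-up codelist with only the mandatory columns.
sample = "Label,Notation\nFemale,F\nMale,M\n"
print(check_codelist_columns(sample))
```

A non-empty first list would be treated as an error, while a non-empty second list would only merit a warning.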
- Set [CODELIST_CONCEPT_URI] as [VALUE_URI], but replacing the word in curly braces {} with the word "notation".
- Set [CODELIST_PARENT_CONCEPT_URI] as [VALUE_URI], but replacing the word in curly braces {} with the word "parent_notation".
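The two substitutions above amount to renaming the template variable. A minimal Python sketch, assuming [VALUE_URI] contains a single simple `{variable}` expression and using an invented example URI:

```python
import re


def rename_template_variable(value_uri: str, new_name: str) -> str:
    """Replace the word inside curly braces with new_name."""
    return re.sub(r"\{[^{}]*\}", "{" + new_name + "}", value_uri)


value_uri = "http://example.org/cube#concept/sex/{sex}"  # hypothetical
codelist_concept_uri = rename_template_variable(value_uri, "notation")
codelist_parent_concept_uri = rename_template_variable(value_uri, "parent_notation")
print(codelist_concept_uri)
print(codelist_parent_concept_uri)
```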
{
"@context": "http://www.w3.org/ns/csvw",
"@id": "[CODELIST_URI]",
"url": "[CODELIST_CSV]",
"rdf:type": "skos:ConceptScheme",
"dcterms:title": "[CODELIST_TITLE]",
"dcterms:description": "[CODELIST_DESCRIPTION]",
"tableSchema": {
"columns": [
{
"titles": "Label",
"name": "label",
"datatype": "string",
"required": true,
"propertyUrl": "rdfs:label"
},
{
"titles": "Notation",
"name": "notation",
"datatype": {
"base": "string",
"format": "^-?[\\w\\.\\/\\+]+(-[\\w\\.\\/\\+]+)*$"
},
"required": true,
"propertyUrl": "skos:notation"
},
{
"titles": "Parent Notation",
"name": "parent_notation",
"datatype": {
"base": "string",
"format": "^(-?[\\w\\.\\/\\+]+(-[\\w\\.\\/\\+]+)*|)$"
},
"required": false,
"propertyUrl": "skos:broader",
"valueUrl": "[CODELIST_PARENT_CONCEPT_URI]"
},
{
"titles": "Description",
"name": "description",
"datatype": "string",
"required": false,
"propertyUrl": "rdfs:comment"
},
{
"virtual": true,
"propertyUrl": "rdf:type",
"valueUrl": "skos:Concept"
},
{
"virtual": true,
"propertyUrl": "skos:inScheme",
"valueUrl": "[CODELIST_URI]"
}
],
"primaryKey": [
"notation",
"parent_notation"
],
"aboutUrl": "[CODELIST_CONCEPT_URI]"
}
}
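The format patterns in the template constrain which notations are acceptable. The sketch below applies the same regular expressions in Python (with the JSON escaping removed) so that codes can be checked before the CSVW is produced; the helper names are hypothetical.

```python
import re

# Same patterns as the "format" entries above, without JSON string escaping.
NOTATION_PATTERN = re.compile(r"^-?[\w\.\/\+]+(-[\w\.\/\+]+)*$")
PARENT_NOTATION_PATTERN = re.compile(r"^(-?[\w\.\/\+]+(-[\w\.\/\+]+)*|)$")


def is_valid_notation(value: str) -> bool:
    """Notation is required, so the empty string is rejected."""
    return NOTATION_PATTERN.match(value) is not None


def is_valid_parent_notation(value: str) -> bool:
    """Parent Notation is optional, so the empty string is accepted."""
    return PARENT_NOTATION_PATTERN.match(value) is not None


for code in ["north-east", "north east", "a--b", ""]:
    print(repr(code), is_valid_notation(code), is_valid_parent_notation(code))
```

In short, a notation is made of word characters plus `.`, `/` and `+`, with single hyphens allowed as separators and an optional leading hyphen; spaces and doubled hyphens are rejected.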
Additionally, a foreign key relationship should be added to the table which contains the subject dimension.
{
// "url": "[DATACUBE_URI]"
// ...
"tableSchema": {
//...
"columns": [
//...
],
"foreignKeys": [{
"columnReference": "[COLUMN_NAME]",
"reference": {
"resource": "[CODELIST_CSV]",
"columnReference": "notation"
}
}]
}
}
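The foreign key means every value in the subject dimension's column must appear in the codelist's notation column. A producer could check this ahead of validation with something like the following sketch, which works on CSV text and column titles; the function name and sample data are illustrative.

```python
import csv
import io


def unmatched_codes(cube_csv: str, column_title: str, codelist_csv: str) -> list:
    """Return the codes used in the data cube column that the codelist lacks."""
    notations = {row["Notation"] for row in csv.DictReader(io.StringIO(codelist_csv))}
    used = {row[column_title] for row in csv.DictReader(io.StringIO(cube_csv))}
    return sorted(used - notations)


# Made-up data: "U" is used in the cube but not defined in the codelist.
cube = "Sex,Value\nF,10\nM,12\nU,1\n"
codelist = "Label,Notation\nFemale,F\nMale,M\n"
print(unmatched_codes(cube, "Sex", codelist))
```

An empty result indicates the foreign key constraint would be satisfied.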
If a [CODELIST_CSV] is provided, a new table should be added to the CSVW according to the template above.
Statistics producers regularly adopt external codelists, often defined by centralised statistical agencies, but then extend these with additional codes. Common scenarios include the inclusion of other, unknown or total/all categories but also "unofficial" aggregations to meet a specific user need.
Our goal is to encourage the use of externally defined URIs wherever possible, but we also recognise that statistics producers need the freedom to be able to define their own concepts.
The idea of specifying multiple codelists as being in use for a particular dimension was explored but was found to violate the qb spec's IC-19. The adopted approach has been to gather all codes in use from the multiple sources and to merge these into a single "super codelist".
A programmatic means for generating super codelists could be:
- The user may specify one or many URIs of any external codelists they are using.
- The user may specify one or many .ttl files containing any external codelists they wish to adopt.
- The URI resource or .ttl file must define one or many skos:ConceptScheme(s).
- All resources are brought into a single graph. A new concept scheme URI is coined, bound to the variable ?concept_scheme and the below CONSTRUCT query executed to generate a new scheme.
- Any skos:narrower, skos:broader and skos:hasTopConcept predicates in place across any schemes specified by the user, which explain the parent-child relationship between concepts, will be replicated in the new scheme.
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT {
?concept_scheme a skos:ConceptScheme ;
skos:hasTopConcept ?top_concept ;
.
?concept skos:inScheme ?concept_scheme .
}
WHERE {
?external_scheme a skos:ConceptScheme ;
skos:hasTopConcept ?top_concept ;
.
?concept skos:inScheme ?external_scheme .
}
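To show what the query produces without depending on a triple store, the sketch below mimics its effect in plain Python over triples held as tuples of prefixed names. This is an illustration of the merge logic only, not a replacement for running the CONSTRUCT query; the function name and sample data are invented.

```python
RDF_TYPE = "rdf:type"


def merge_schemes(triples, concept_scheme):
    """Mimic the CONSTRUCT query: re-home concepts and top concepts under one scheme."""
    triples = list(triples)
    schemes = {s for s, p, o in triples if p == RDF_TYPE and o == "skos:ConceptScheme"}
    merged = set()
    for scheme in schemes:
        tops = {o for s, p, o in triples if s == scheme and p == "skos:hasTopConcept"}
        concepts = {s for s, p, o in triples if p == "skos:inScheme" and o == scheme}
        if not tops or not concepts:
            continue  # the WHERE clause needs both patterns to match
        merged.add((concept_scheme, RDF_TYPE, "skos:ConceptScheme"))
        merged.update((concept_scheme, "skos:hasTopConcept", top) for top in tops)
        merged.update((concept, "skos:inScheme", concept_scheme) for concept in concepts)
    return merged


# Two small made-up schemes: one external, one local.
data = [
    ("ext:scheme", RDF_TYPE, "skos:ConceptScheme"),
    ("ext:scheme", "skos:hasTopConcept", "ext:F"),
    ("ext:F", "skos:inScheme", "ext:scheme"),
    ("loc:scheme", RDF_TYPE, "skos:ConceptScheme"),
    ("loc:scheme", "skos:hasTopConcept", "loc:unknown"),
    ("loc:unknown", "skos:inScheme", "loc:scheme"),
]
for triple in sorted(merge_schemes(data, "new:scheme")):
    print(triple)
```

As with the query itself, a source scheme that declares no top concepts contributes nothing to the merged scheme, and skos:broader or skos:narrower links are not copied by this step.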