Requirements: Docker (with Docker Compose).
Docker Compose environment for ingesting metadata from different spatial/semantic/general metadata sources into CKAN.
- OGC harvester (WCS/WFS, WMS & WMTS services)
- CSW harvester (ISO 19115/19139 Metadata Catalogue Services)
- Spreadsheets (XLS/XLSX)
- Metadata files (XML ISO19139)
- CKAN API - WIP
- Semantic metadata files (RDF/TTL) - WIP
- Tabular data (CSV, TSV) - WIP
Note
It can be tested with a CKAN-based open data portal such as mjanez/ckan-docker [1].
First copy the `.env.example` template and configure it by editing the `.env` file. Change `PYCSW_URL` and `CKAN_URL`, as well as the Harvester info `OGC2CKAN INFO`, if needed.

```shell
cp .env.example .env
```
Custom environment variables:
- `CKAN_URL`: CKAN site URL where the harvested datasets are loaded.
- `PYCSW_URL`: PyCSW site URL where the harvested datasets are loaded.
- `APP_DIR`: Path to the application folder in the Docker container.
- `TZ`: Timezone.
- `CKAN_API_KEY`: CKAN authorisation key; it can be created at `{CKAN_URL}/user/admin`.
- `DEFAULT_LICENSE`: Default license for the harvested datasets. Open Data default: `http://creativecommons.org/licenses/by/4.0/`
- `DEFAULT_LICENSE_ID`: Default license ID for the harvested datasets; see the ID list at `{CKAN_URL}/api/3/action/license_list`. Open Data default: `cc-by-4.0`
- `PARALLELIZATION`: [WIP] Parallelization of the harvesters. Default: `False`
- `CKAN_DATASET_SCHEMA`: Dataset schema of the CKAN endpoint. Default: `geodcatap_eu`
- `SSL_UNVERIFIED_MODE`: If `SSL_UNVERIFIED_MODE=True`, the SSL certificate is downloaded from the host. Use it to avoid SSL errors when the certificate is self-signed.
- `METADATA_DISTRIBUTIONS`: Set `METADATA_DISTRIBUTIONS=True` to create metadata distributions (GeoDCAT-AP/ISO 19139) as CKAN resources. Default: `False`
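As an illustration of how these variables fit together, here is a minimal Python sketch that reads them from the environment and turns the `True`/`False` strings into booleans. It is not the project's own loader: `Settings`, `load_settings()`, `env_flag()` and `license_list_url()` are hypothetical names, and falling back to the documented Open Data values (and to `False` for `SSL_UNVERIFIED_MODE`) is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def env_flag(value: Optional[str], default: bool = False) -> bool:
    """Interpret 'True'/'False'-style strings from .env as booleans."""
    if value is None or not value.strip():
        return default
    return value.strip().lower() in {"true", "1", "yes"}


@dataclass(frozen=True)
class Settings:
    ckan_url: str
    pycsw_url: str
    ckan_api_key: str
    default_license: str
    default_license_id: str
    ckan_dataset_schema: str
    parallelization: bool
    ssl_unverified_mode: bool
    metadata_distributions: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables (hypothetical helper)."""
    return Settings(
        # Trailing slashes are dropped so URLs can be joined safely
        ckan_url=env.get("CKAN_URL", "").rstrip("/"),
        pycsw_url=env.get("PYCSW_URL", "").rstrip("/"),
        ckan_api_key=env.get("CKAN_API_KEY", ""),
        # Fallbacks are the Open Data defaults documented in this README
        default_license=env.get(
            "DEFAULT_LICENSE", "http://creativecommons.org/licenses/by/4.0/"
        ),
        default_license_id=env.get("DEFAULT_LICENSE_ID", "cc-by-4.0"),
        ckan_dataset_schema=env.get("CKAN_DATASET_SCHEMA", "geodcatap_eu"),
        parallelization=env_flag(env.get("PARALLELIZATION")),
        # Assumption: SSL verification stays on unless explicitly disabled
        ssl_unverified_mode=env_flag(env.get("SSL_UNVERIFIED_MODE")),
        metadata_distributions=env_flag(env.get("METADATA_DISTRIBUTIONS")),
    )


def license_list_url(settings: Settings) -> str:
    """CKAN Action API endpoint that lists the valid license IDs."""
    return f"{settings.ckan_url}/api/3/action/license_list"


if __name__ == "__main__":
    settings = load_settings(
        {"CKAN_URL": "http://localhost:5000/", "SSL_UNVERIFIED_MODE": "True"}
    )
    print(settings.ssl_unverified_mode, license_list_url(settings))
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test without touching the real environment.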
Warning
`SSL_UNVERIFIED_MODE=True` is not recommended for production environments. Update your certificate or use a valid one. If the harvest fails with an SSL error, check the container log and set `SSL_UNVERIFIED_MODE=True` in the `.env` file.
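This README describes `SSL_UNVERIFIED_MODE=True` as downloading the host's certificate. The sketch below shows one way to express that idea with Python's standard `ssl` module: fetch the server's PEM certificate with `ssl.get_server_certificate()` and trust it in an otherwise default context. `build_ssl_context()` is a hypothetical helper, not the harvester's actual code, and the hostname must still match the certificate.

```python
import ssl
from typing import Callable, Tuple


def build_ssl_context(
    host: str,
    port: int = 443,
    unverified_mode: bool = False,
    fetch_cert: Callable[[Tuple[str, int]], str] = ssl.get_server_certificate,
) -> ssl.SSLContext:
    """Return an SSL context for host:port (hypothetical helper)."""
    # Default context: verifies the certificate chain and the hostname
    context = ssl.create_default_context()
    if unverified_mode:
        # Download the PEM certificate the server presents (not verified)
        # and add it to the trusted certificates, so a self-signed
        # certificate is accepted. The hostname check remains enabled.
        pem = fetch_cert((host, port))
        context.load_verify_locations(cadata=pem)
    return context
```

The returned context can be passed to `urllib.request.urlopen(url, context=...)`. Because the downloaded certificate is trusted without checking who issued it, this carries the same security trade-off described in the warning.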
Then configure your custom `ckan-ogc/conf/config.yaml.template`: define the harvest servers and the CKAN DCAT default info.
- Put your XLS/CSV/XML files in the `./data/*` folder as needed.
Note
If you need a custom organization YAML file, create it in `ogc2ckan/mappings/organizations`, using the template `ogc2ckan/mappings/organizations/organizations.yaml.template`.
To deploy the environment, `docker compose` builds the latest source in the repo.
To deploy in about 5 minutes, use the latest prebuilt image (`ghcr.io/mjanez/ckan-ogc:latest`) with `docker-compose.ghcr.yml`.
```shell
git clone https://github.com/mjanez/ckan-ogc
cd ckan-ogc
docker compose up --build

# GitHub Container Registry latest image
docker compose -f docker-compose.ghcr.yml up

# Or detached mode
docker compose up -d --build
```
Note:
To deploy the dev environment (local build), use `docker-compose.dev.yml`: `docker compose -f docker-compose.dev.yml up --build`
Note:
If needed, build a specific container with: `docker build -t target_name xxxx/`
To run it locally without Docker, install the dependencies:

```shell
python3 -m pip install --user --upgrade pip
python3 -m pip install --user pdm
pdm install --no-self
```

Configure your custom `config.yaml`: define the harvest servers and the CKAN DCAT default info.

```shell
cp ckan-ogc/conf/config.yaml.template ./config.yaml
```

Remember to configure your `.env` file:

```shell
cp .env.example .env
```

Run:

```shell
pdm run python ogc2ckan/ogc2ckan.py
```
To debug inside the Docker container with Visual Studio Code:
- Build and run the container.
- Attach Visual Studio Code to the container.
- Start debugging the `ogc2ckan.py` Python file (Debug the currently active Python file) in the container.
To debug locally:
- Update the previously created `.env` file in the root of the `ckan-ogc` repo and move it to `/ogc2ckan`.
- Open `ogc2ckan.py`.
- Start debugging the `ogc2ckan.py` Python file (Debug the currently active Python file).
Note
By default, the Python extension looks for and loads a file named `.env` in the current workspace folder. For more information, see the VS Code documentation on Python debugging and environment variables.
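To reproduce outside VS Code what the debugger does with `.env`, a simplified loader is enough for plain `KEY=VALUE` lines. This is a hypothetical sketch, not the Python extension's parser: `parse_env_file()` and `load_env_file()` are illustrative names, and it ignores features such as variable expansion and inline comments. The `python-dotenv` package is a more complete alternative.

```python
import os
from typing import Dict


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, as found in a .env file."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and full-line comments
        if not line or line.startswith("#"):
            continue
        # Allow an optional shell-style "export " prefix
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        value = value.strip()
        # Strip one pair of matching surrounding quotes
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Read a .env file and export its values into os.environ."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env_file(handle.read())
    for key, value in values.items():
        # Existing environment variables win unless override=True
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```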
The CKAN output schemas are located in the `ogc2ckan/ckan_datasets` folder. The schemas map the metadata fields from the different sources to the CKAN dataset fields. The following schemas are currently available:
- `geodcatap`: Schema based on the GeoDCAT-AP schema for CKAN.
- `base`: A DCAT schema with the basic fields.

You can create your own schema.
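To make the idea of a schema mapping concrete, the following Python sketch maps a harvested record onto a few standard CKAN dataset fields (`name`, `title`, `notes`, `tags`, `license_id`, `owner_org`). The source keys in `BASE_FIELD_MAP` and the helper names are hypothetical; the real `geodcatap` and `base` schemas in `ogc2ckan/ckan_datasets` define many more fields.

```python
import re
from typing import Any, Dict, Mapping

# Hypothetical mapping: harvested record key -> CKAN dataset field
BASE_FIELD_MAP: Dict[str, str] = {
    "identifier": "name",
    "title": "title",
    "abstract": "notes",
    "keywords": "tags",
    "license_id": "license_id",
}


def ckan_name(text: str) -> str:
    """Slugify text into a CKAN-safe dataset name (lowercase, '-' and '_')."""
    return re.sub(r"[^a-z0-9_-]+", "-", text.lower()).strip("-")[:100]


def to_ckan_dataset(
    record: Mapping[str, Any], field_map: Mapping[str, str], owner_org: str
) -> Dict[str, Any]:
    """Map a harvested record onto CKAN dataset fields."""
    dataset: Dict[str, Any] = {"owner_org": owner_org}
    for source_key, ckan_field in field_map.items():
        if source_key not in record:
            continue  # fields absent from the source are simply skipped
        value = record[source_key]
        if ckan_field == "name":
            value = ckan_name(str(value))
        elif ckan_field == "tags":
            # CKAN expects tags as a list of {"name": ...} dictionaries
            value = [{"name": keyword} for keyword in value]
        dataset[ckan_field] = value
    return dataset
```

CKAN dataset `name` values may only contain lowercase alphanumeric characters, `-` and `_`, which is why the sketch slugifies the source identifier.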
The harvesters are located in the `ogc2ckan/harvesters` folder. Each harvester is a Python script that harvests the metadata from a source and creates the datasets in CKAN.
There are different harvesters:
- `csw`: Harvests the metadata from a CSW server using OWSLib.
- `table`: Harvests the metadata from an XLS/XLSX file that contains the metadata records in table format, using the CKAN `field_name` of the custom schemas as the column names.
- `ogc`: Harvests the metadata from an OGC server (WCS/WFS, WMS & WMTS services) using OWSLib.
- `xml`: Harvests the metadata from an XML file that contains the metadata records in ISO 19139 format.

You can create your own harvester.
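The `table` harvester relies on column headers being CKAN `field_name` values. A minimal sketch of that idea follows; to stay within the standard library it reads CSV through `csv.DictReader` instead of an XLS/XLSX spreadsheet, and `harvest_table()` and its default `required` columns are assumptions rather than the project's API.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, TextIO


def harvest_table(
    stream: TextIO, required: Iterable[str] = ("name", "title")
) -> Iterator[Dict[str, str]]:
    """Yield one CKAN dataset dict per row; headers are CKAN field names."""
    reader = csv.DictReader(stream)
    headers = reader.fieldnames or []
    missing = [field for field in required if field not in headers]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    for row in reader:
        # Keep only non-empty cells; extra unnamed cells (key None) are ignored
        dataset = {
            field: value.strip()
            for field, value in row.items()
            if field and value and value.strip()
        }
        if dataset:
            yield dataset


if __name__ == "__main__":
    sample = io.StringIO("name,title,notes\nrivers-2024,Rivers,Hydrography\nlakes,Lakes,\n")
    for dataset in harvest_table(sample):
        print(dataset)
```

Dropping empty cells lets CKAN (or the schema) apply its own defaults instead of storing empty strings.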
The configuration template is located in `ckan-ogc/conf/config.yaml.template`. It is a YAML file that configures the harvesters and the CKAN DCAT default info. The required elements of each harvester are specified by the Harvester Schema in `ogc2ckan/model/harvest_schema.py`.
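Before harvesting, configuration entries can be checked against their required elements. The sketch below validates plain dictionaries, such as the output of PyYAML's `yaml.safe_load()`. The key names (`harvest_servers`, `name`, `type`, `url`, `active`) are hypothetical placeholders; the authoritative list of required elements is the Harvester Schema in `ogc2ckan/model/harvest_schema.py`.

```python
from typing import Any, Dict, List, Mapping

# Harvester types listed in this README; the key names are hypothetical
HARVESTER_TYPES = {"csw", "ogc", "table", "xml"}
REQUIRED_KEYS = ("name", "type", "url")


def validate_harvester(entry: Mapping[str, Any]) -> List[str]:
    """Return the problems found in one harvester entry (empty if valid)."""
    problems = [f"missing '{key}'" for key in REQUIRED_KEYS if not entry.get(key)]
    harvester_type = entry.get("type")
    if harvester_type and harvester_type not in HARVESTER_TYPES:
        problems.append(f"unknown type '{harvester_type}'")
    return problems


def active_harvesters(config: Mapping[str, Any]) -> List[Dict[str, Any]]:
    """Keep entries that validate and are not disabled with active: false."""
    selected = []
    for entry in config.get("harvest_servers", []):
        if entry.get("active", True) and not validate_harvester(entry):
            selected.append(dict(entry))
    return selected
```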
A custom organization is a YAML file that contains the custom metadata fields used to create the datasets in CKAN. Custom organizations are located in the `ogc2ckan/mappings/organizations` folder.
If you need a custom organization YAML file, use the `template-org.yaml` template to create it. When specified by the `dataset_id`, the custom organization is applied to the harvested datasets (if the organization exists in the CKAN instance), which are then created with the custom metadata fields.
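One way to apply a custom organization to a harvested dataset is sketched below: the custom fields are merged only when the organization exists in the CKAN instance. The organization structure assumed here (a `name` plus a `fields` mapping), the helper name, and the rule that harvested values take precedence over custom ones are all assumptions, not the format of `template-org.yaml`.

```python
from typing import Any, Dict, Mapping, Set


def apply_custom_organization(
    dataset: Mapping[str, Any],
    organization: Mapping[str, Any],
    existing_orgs: Set[str],
) -> Dict[str, Any]:
    """Return a copy of dataset enriched with the organization's custom fields."""
    result = dict(dataset)
    org_name = organization.get("name")
    if org_name not in existing_orgs:
        # Organization not found in CKAN: leave the dataset unchanged
        return result
    result["owner_org"] = org_name
    for field, value in organization.get("fields", {}).items():
        # Assumption: custom values only fill gaps; harvested values win
        result.setdefault(field, value)
    return result
```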
List of containers:

| Repository | Type | Docker tag | Size | Notes |
|---|---|---|---|---|
| python 3.13 | base image | `python/python:3.13-slim-bullseye` | 45.57 MB | - |

| Repository | Type | Docker tag | Size | Notes |
|---|---|---|---|---|
| mjanez/ckan-ogc | custom image | `mjanez/ckan-ogc:latest-dev` | 582 MB | Latest stable version from Registry. |
| mjanez/ckan-ogc | custom image | `mjanez/ckan-ogc:main-dev` | 582 MB | Dev version from Registry. |
| mjanez/ckan-ogc | custom image | `mjanez/ckan-ogc:latest` | 457 MB | Latest stable version. |
| mjanez/ckan-ogc | custom image | `mjanez/ckan-ogc:main` | 457 MB | Development branch version. |
Footnotes
1. A custom installation of Docker Compose with specific extensions for spatial data and GeoDCAT-AP/INSPIRE metadata profiles.