Data reports on open metadata status in public knowledge bases by the KOMET project (https://projects.tib.eu/komet) on citation metadata and geometadata for scholarly works. This repository implements tracking of contributions to the open metadata commons from OJS-based journals using the structured citations feature (as of OJS 3.6, formerly the citation manager plugin) and the geometadata plugin.
Target platforms: OpenCitations and Wikidata
View the report: https://geoinformationsystems.github.io/komet-report/
This evaluation tracks the contributions of the KOMET project to the open metadata commons. The primary goal is to measure the impact of OJS features and plugins that enable journals to contribute citation and geospatial metadata to open knowledge bases. Key questions are:
- How many citation relationships have KOMET contributions added to OpenCitations and Wikidata?
- What is the baseline citation coverage for partner journals in Wikidata?
- How does metadata coverage change over time as more journals adopt the plugins?
As the main uptake developments are likely to occur after the project funding phase, this repository contains a computational notebook that collects and saves relevant indicators. The notebook is executed automatically on a regular schedule. The analysis uses a set of journals from KOMET's collaboration partners, but can readily be extended. Future projects may then expand the analysis and use the collected historical data to improve and extend the software infrastructure for capturing citation and geospatial metadata.
| Source | API | Purpose | Rate Limits |
|---|---|---|---|
| OpenCitations | GitHub Issues API | Track crowdsourced citation deposits | 60 req/h (unauthenticated) |
| Wikidata | SPARQL endpoint | Query scholarly article metadata, citations (P2860), journal relationships (P1433) | Soft limits, timeout-based |
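As an illustration of the Wikidata row above, the following minimal sketch builds a SPARQL query that counts articles published in a journal (P1433) and their outgoing citations (P2860), and sends it to the public query service. The function names `build_journal_citation_query` and `run_query`, the User-Agent string, and the query shape are assumptions made for this example; they are not taken from `komet_helpers.py`.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WIKIDATA_SPARQL_URL = "https://query.wikidata.org/sparql"


def build_journal_citation_query(journal_qid: str) -> str:
    """Build a SPARQL query counting articles and outgoing citations of one journal.

    Hypothetical helper for illustration; the notebook's real queries may differ.
    """
    if not (journal_qid.startswith("Q") and journal_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata QID: {journal_qid!r}")
    # P1433 = "published in", P2860 = "cites work".
    # OPTIONAL keeps articles without citations in the article count.
    return f"""
SELECT (COUNT(DISTINCT ?article) AS ?articles) (COUNT(?cited) AS ?citations)
WHERE {{
  ?article wdt:P1433 wd:{journal_qid} .
  OPTIONAL {{ ?article wdt:P2860 ?cited . }}
}}
""".strip()


def run_query(query: str, timeout: float = 60.0) -> dict:
    """Send a query to the endpoint and return the parsed JSON result."""
    url = WIKIDATA_SPARQL_URL + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    # The User-Agent value is a placeholder chosen for this sketch.
    request = urllib.request.Request(
        url, headers={"User-Agent": "komet-report-example/0.1 (illustrative)"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `run_query(build_journal_citation_query(qid))` with a partner journal's QID would return one result row with the two counts. Because the endpoint enforces timeouts, large aggregate queries may still fail and should be wrapped in error handling.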
The following indicators are saved in local JSON files for the two main data sources.
- OpenCitations
  - `opencitations.total_issues`: Total crowdsourcing issues in the repository
  - `opencitations.komet_issues`: Issues created via KOMET-developed software
  - `opencitations.komet_done`: Successfully processed KOMET contributions
  - `opencitations.komet_pending`: KOMET contributions awaiting processing
  - `opencitations.komet_invalid`: KOMET contributions marked as invalid
- Wikidata
  - `wikidata.p1343_scholarly_count`: Scholarly articles with "described by source" property (baseline reference)
  - `wikidata.komet_provenance_count`: Items with KOMET provenance
  - `wikidata.journals.{QID}.articles`: Article count per partner journal
  - `wikidata.journals.{QID}.citations_p2860`: Outgoing citations per partner journal
Historical statistics are stored in komet_timeline.json using a hierarchical format, grouping by source and storing a compact series of {t, v} (timestamp, value) pairs.
Metrics include descriptive names, descriptions, and units.
New entries are only added when values change.
Journal metadata is stored once per journal.
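The append-on-change rule can be sketched as follows. This is a minimal illustration, assuming a nested `source -> metric -> {"series": [...]}` layout; the real structure of `komet_timeline.json` may differ, and `record_observation` is a hypothetical name, not a function from this repository.

```python
from datetime import datetime, timezone


def record_observation(timeline, source, metric, value, timestamp=None):
    """Append a {t, v} pair only if the value differs from the last stored one.

    Returns True if a new entry was appended, False otherwise.
    The nesting (source -> metric -> "series") is an assumed layout.
    """
    if timestamp is None:
        timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    metric_entry = timeline.setdefault(source, {}).setdefault(metric, {"series": []})
    series = metric_entry["series"]
    if series and series[-1]["v"] == value:
        return False  # unchanged value: keep the series compact
    series.append({"t": timestamp, "v": value})
    return True


timeline = {}
record_observation(timeline, "opencitations", "komet_issues", 3, "2025-01-01T06:00:00+00:00")
record_observation(timeline, "opencitations", "komet_issues", 3, "2025-02-01T06:00:00+00:00")
record_observation(timeline, "opencitations", "komet_issues", 5, "2025-03-01T06:00:00+00:00")
# The series now holds two entries: the repeated value 3 was skipped.
```

Storing only changes keeps the file small, at the cost that the absence of an entry does not distinguish "value unchanged" from "run did not happen".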
The evaluation focuses on 15 journals from 8 collaboration partners that have committed to testing KOMET plugins:
| Partner | Journals | Platforms |
|---|---|---|
| KIM Universität Konstanz | 1 | OJS 3.3 |
| WWU Münster | 2 | OJS 3.3 |
| Julius Kühn-Institut | 3 | OJS 3.3 (OpenAgrar) |
| heiJOURNALS Heidelberg | 6 | OJS 3.2 |
| Deutsches Archäologisches Institut | 1 | OJS 3.4 |
| JOSIS / TU Dresden | 1 | OJS 3.3 |
| ZHB Luzern | 2 | OJS 3.3 |
Each journal is identified by its Wikidata QID (where available) to enable precise SPARQL queries. A 2022 baseline was established for comparison with current values.
- P1343 (described by source): Initial analysis showed only ~14 scholarly articles use this property in Wikidata, making it unsuitable for tracking KOMET contributions. The project shifted focus to OpenCitations as the primary platform.
- Wikidata query timeouts: Some aggregate queries (e.g., total P2860 citation count) may time out due to dataset size (~3.4M citation relationships).
- OpenCitations processing lag: Crowdsourced deposits may take weeks to be processed and appear in the OpenCitations corpus.
- Partner journal coverage: Not all partner journals have Wikidata entries; 7 additional journals are tracked but cannot yet be queried via SPARQL.
The notebook is designed to run weekly via GitHub Actions CI:
- Execute `komet_evaluation.ipynb` using `nbconvert`
- Update `komet_timeline.json` only when metric values change
- Generate HTML report to `docs/` for GitHub Pages hosting
- Commit and push updated data files
In addition, an annual CI action on the last day of December creates a new release, which is automatically deposited on Zenodo (https://doi.org/10.5281/zenodo.18479191).
| File | Description |
|---|---|
| `komet_evaluation.ipynb` | Main analysis notebook |
| `komet_helpers.py` | Python helper functions for API queries |
| `komet_timeline.json` | Timestamped observations log |
| `komet_report_data.json` | Latest report snapshot |
| `docs/index.html` | HTML report for GitHub Pages, rendered from notebook |
| `docs/komet_report.pdf` | PDF report for Zenodo archival, rendered from notebook |
| `Makefile` | Build automation for local development and CI |
| `templates/collapsible/` | Custom nbconvert template for collapsible code cells |
- OPTIMETA Citations Plugin: https://github.com/TIBHannover/optimetaCitations
- OPTIMETA Geo Plugin: https://github.com/TIBHannover/optimetaGeo
- OPTIMETA Plugin Shared: https://github.com/TIBHannover/optimeta-plugin-shared
- OPTIMAP: https://optimap.science/
- OpenCitations Crowdsourcing: https://github.com/opencitations/crowdsourcing
- Python 3.11+ (with `venv` module)
- Make
# Create virtual environment and install dependencies
make install
# Run the evaluation notebook interactively
.venv/bin/jupyter notebook komet_evaluation.ipynb
# Or run helper module directly for quick test
.venv/bin/python komet_helpers.py

The project uses a Makefile that automatically manages a local `.venv` virtual environment. Run `make help` to see all available targets:
make help # Show available targets
make install # Create venv and install Python dependencies
make run # Execute notebook (updates timeline and report data)
make html # Generate HTML report in docs/
make update # Run notebook and generate HTML (full update)
make check # Verify notebook executes without errors
make serve # Start local HTTP server to preview docs/
make clean # Remove generated HTML and report data
make clean-all # Remove all generated files including timeline
make clean-venv # Remove virtual environment
make distclean # Remove everything (clean-all + clean-venv)

The Makefile automatically creates and uses a `.venv` directory for Python dependencies. This ensures:
- Reproducible builds locally and in CI
- No interference with system Python packages
- Easy cleanup with `make clean-venv`
To start fresh with a clean environment:
make distclean # Remove all generated files and venv
make update # Recreate venv, install deps, run notebook, generate HTML

To regenerate the HTML report for preview:
# Full update: execute notebook and generate HTML
make update
# Preview the generated HTML locally
make serve
# Then open http://localhost:8000 in your browser

To generate a PDF report using typst:
make pdf # Generates docs/komet_report.pdf

The PDF is generated by converting the notebook to Markdown via nbconvert, then compiling with typst using the cmarker package. Code cells are excluded; only narrative and visualizations are included.
The notebook uses cell tags to control HTML export behavior:
| Tag | Effect |
|---|---|
| `remove-cell` | Cell completely removed from HTML export |
| `code_shown` | Code cell expanded by default (otherwise collapsed) |
Code cells without tags are collapsed but can be expanded by clicking "Show Code".
The GitHub Actions workflow (.github/workflows/update-report.yml) automates report generation:
- Install pandoc 3.x and typst for PDF generation
- `make install`: Create venv and install Python dependencies
- `make update`: Execute notebook and generate HTML
- Check if data content changed (ignoring timestamp-only updates)
- `make pdf`: Generate PDF report (only if data changed)
- Commit and push all outputs (only if data changed)
Change detection: The workflow compares the report data excluding timestamps. This prevents unnecessary commits when the data sources return identical values but the report timestamp updates.
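One way to implement such a comparison is to strip timestamp fields recursively before comparing the old and new report data. The sketch below is an assumption about how this could work, not the workflow's actual implementation; the key names `timestamp`, `generated_at`, and `last_updated` are hypothetical and would need to match the fields used in `komet_report_data.json`.

```python
# Hypothetical timestamp field names; adjust to the real report schema.
DEFAULT_TIMESTAMP_KEYS = ("timestamp", "generated_at", "last_updated")


def strip_timestamps(data, timestamp_keys=DEFAULT_TIMESTAMP_KEYS):
    """Return a copy of nested JSON-like data without the given keys."""
    if isinstance(data, dict):
        return {
            key: strip_timestamps(value, timestamp_keys)
            for key, value in data.items()
            if key not in timestamp_keys
        }
    if isinstance(data, list):
        return [strip_timestamps(item, timestamp_keys) for item in data]
    return data


def data_changed(old, new, timestamp_keys=DEFAULT_TIMESTAMP_KEYS):
    """Return True if the two reports differ in anything other than timestamps."""
    return strip_timestamps(old, timestamp_keys) != strip_timestamps(new, timestamp_keys)


old_report = {"timestamp": "2025-01-01", "opencitations": {"komet_issues": 3}}
same_values = {"timestamp": "2025-02-01", "opencitations": {"komet_issues": 3}}
new_values = {"timestamp": "2025-02-01", "opencitations": {"komet_issues": 4}}
```

With these inputs, `data_changed(old_report, same_values)` is `False` and `data_changed(old_report, new_values)` is `True`, so only the second run would lead to a commit.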
The workflow runs automatically on the 1st of each month at 06:00 UTC and can be triggered manually via the GitHub Actions UI.
The CI workflow supports creating GitHub releases both automatically and manually.
Automatic year-end release:
- Runs on December 31st at 23:00 UTC
- Creates release with tag `vYYYY` (e.g., `v2025`)
- Includes `komet_report.pdf` and `index.html` as assets
Manual release (for first release or out-of-schedule releases):
- Go to Actions → Update KOMET Report
- Click Run workflow
- Check Create a GitHub release
- Optionally set a custom Release tag (e.g., `v2024`, `v2025.1`, `v2025-interim`)
  - Leave empty for auto-generated `vYYYY` tag
  - If the tag already exists, `.1`, `.2`, etc. is appended automatically
- Click Run workflow
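The automatic tag handling described in the steps above can be sketched as follows. This is an illustrative reading of the rule (use `vYYYY`, and append `.1`, `.2`, etc. when the tag is taken); `next_release_tag` is a hypothetical helper, and the workflow itself may implement this differently, for example in shell.

```python
def next_release_tag(year, existing_tags):
    """Return vYYYY, or vYYYY.N with the smallest free N if vYYYY already exists."""
    base = f"v{year}"
    existing = set(existing_tags)
    if base not in existing:
        return base
    suffix = 1
    # Count upwards until an unused suffix is found.
    while f"{base}.{suffix}" in existing:
        suffix += 1
    return f"{base}.{suffix}"
```

For example, with existing tags `v2025` and `v2025.1`, the sketch yields `v2025.2`; with no existing tags it yields `v2025`.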
Release assets:
- `komet_report.pdf`: PDF version of the evaluation report
- `index.html`: Interactive HTML version (also available via GitHub Pages)
If you use this dataset or software, please cite it as:
Nüst, D., Niers, T., Hauschke, C., & Yücel, G. (2026). KOMET Project - Open Metadata Evaluation Report [Data set]. Zenodo. https://doi.org/10.5281/zenodo.18479191
@dataset{komet_report_2026,
author = {Nüst, Daniel and Niers, Tom and Hauschke, Christian and Yücel, Gazi},
title = {{KOMET Project - Open Metadata Evaluation Report}},
year = 2026,
publisher = {Zenodo},
doi = {10.5281/zenodo.18479191},
url = {https://doi.org/10.5281/zenodo.18479191}
}

This work is funded by the German Federal Ministry of Education and Research (BMBF) under grant number 16TOA039.
The KOMET project (Förderkennzeichen 16TOA039) is part of the funding initiative "Förderung von Projekten zur Etablierung einer gelebten Open-Access-Kultur in der deutschen Forschungs- und Wissenschaftspraxis."
Code: CC0-1.0
Report outputs: CC-BY-4.0
This project uses data from the following sources:
| Source | License | Link |
|---|---|---|
| Wikidata | CC0 1.0 | https://www.wikidata.org/wiki/Wikidata:Licensing |
| OpenCitations | CC0 1.0 | https://opencitations.net/about#licensing |