Shared test and benchmark corpus for CityJSON data handling software.
This repository keeps the corpus contract in one place.
- Painstakingly curated and handwritten CityJSON spec conformance test cases in the `cases/conformance/v2_0` folder.
- Synthetic data generated by cityjson-fake for benchmarking particular workloads (e.g. attribute-heavy, geometry-heavy).
- Real-world data downloaded from the 3DBAG and Basisvoorziening 3D projects.
If you are reading the repository through the docs site, start with:
- `docs/index.md`
- `docs/shared-corpus.md`
- `cases/README.md`
- `docs/contributing.md`
- `docs/independent-use.md`
- `docs/licensing.md`
- `cases/`: source of truth. Each case folder contains the case metadata, the expected result, and the source or instructions for the artifact.
- `catalog/`: derived machine-readable index built from `cases/`.
- `schemas/`: JSON Schemas and the short glossary for controlled values.
- `scripts/`: validation, catalog rendering, docs generation, and acquisition helpers.
- `pipelines/`: notes about how derived benchmark outputs are built.
- `artifacts/`: generated files, acquired files, and derived indexes.
- `docs/`: hand-written docs and architecture notes.
- `cases/` is the source of truth.
- `catalog/` and `artifacts/` are derived outputs.
- Do not edit derived files by hand when a source file or build command owns them.
- If you remove a case, run `just clean` before rebuilding docs so stale generated case pages do not remain in the site output.
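The source-versus-derived rule can be illustrated with a small staleness check. This is a minimal sketch and not part of the repository's tooling: the function names `newest_mtime` and `derived_is_stale` are hypothetical, and comparing modification times is an assumption made here for illustration, not how `just sync-catalog` decides what to rebuild.

```python
from pathlib import Path


def newest_mtime(root: Path) -> float:
    """Return the latest modification time of any file under root (0.0 if none)."""
    return max(
        (p.stat().st_mtime for p in root.rglob("*") if p.is_file()),
        default=0.0,
    )


def derived_is_stale(repo: Path, derived: str = "catalog/cases.json") -> bool:
    """True if the derived file is missing or older than the newest file in cases/."""
    target = repo / derived
    if not target.is_file():
        return True
    # Hypothetical heuristic: a source file newer than the derived file means "rebuild".
    return target.stat().st_mtime < newest_mtime(repo / "cases")


if __name__ == "__main__":
    if derived_is_stale(Path(".")):
        print("catalog/cases.json looks out of date; consider running `just sync-catalog`")
```

A timestamp comparison like this cannot see deleted cases, because removing a folder leaves the remaining files untouched. That gap is why a removed case calls for `just clean` before rebuilding.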
- `just fmt`: format Python files with ruff.
- `just lint`: validate the repo.
- `just sync-catalog`: rebuild `catalog/cases.json` and `artifacts/correctness-index.json`.
- `just generate-data`: materialize generated workload data and refresh `artifacts/benchmark-index.json` (requires cityjson-fake).
- `just acquire-3dbag`: materialize the pinned 3DBAG workload artifacts.
- `just acquire-basisvoorziening-3d`: materialize the pinned Basisvoorziening 3D workload artifacts via the PDOK OGC API.
- `just clean`: remove generated outputs and generated docs pages.
- `just docs-build`: build the ProperDocs site.
- `just docs-serve`: serve the ProperDocs site locally.
`just lint` and `just docs-build` use the checked-in `schemas/cityjson-fake-manifest.schema.json`. Only `just generate-data` requires access to cityjson-fake.
This repository now uses a dual-license model for repository-authored content:
- `LICENSE`: Apache-2.0 for repository-authored code, scripts, schemas, and build logic.
- `LICENSE-DATA`: CC BY 4.0 for repository-authored docs, metadata, and synthetic corpus content.
- Acquired third-party data keeps the upstream license named in its `acquisition.json`.
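Because each acquired dataset carries its own upstream license, a consumer may want to list those licenses before redistributing anything. The sketch below is illustrative only. It assumes that `acquisition.json` files live somewhere under `cases/`, that each one is a JSON object, and that the license sits under a top-level `license` key. None of these details are confirmed here, so check the schemas in `schemas/` for the actual layout. The function name `upstream_licenses` is hypothetical.

```python
import json
from pathlib import Path


def upstream_licenses(cases_root: Path, key: str = "license") -> dict[str, str]:
    """Map each acquisition.json path (relative to cases_root) to its license value."""
    found: dict[str, str] = {}
    for path in sorted(cases_root.rglob("acquisition.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # `key` is an assumed field name; files without it are reported as UNKNOWN.
        found[path.relative_to(cases_root).as_posix()] = str(data.get(key, "UNKNOWN"))
    return found


if __name__ == "__main__":
    for rel_path, license_name in upstream_licenses(Path("cases")).items():
        print(f"{rel_path}: {license_name}")
```

Reporting `UNKNOWN` instead of raising keeps the listing complete, so a missing license field shows up as something to investigate.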
Contributions are welcome in all forms. You are welcome to create, refine, and delete cases by submitting a pull request that explains the changes. For a detailed guide on how to contribute, see the online documentation.
ChatGPT 5.4 was used to scaffold the repository, develop the schemas and structure of the corpus, and write the documentation. LLMs do not generate the actual data files that consuming software uses in tests and benchmarks.