Config Validation with Pydantic #1912

lkstrp · 2025-11-19T13:09:24Z

Supersedes #1547

Closes #1547
Closes #1734
Closes #1721

Changes proposed in this Pull Request

This PR revives and supersedes #1547. This time using pydantic which can export to JSON Schema instead of writing the JSON Schema directly.

With the new approach we can do a couple of new things and keep anything config related at one place:

Validation
- Run any validation: Just simple type checks (bool, list[str]), Literals (e.g. for logging levels Literal["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"], as well as any custom validation you can come up with. This PR tries to not break anything, which means only types, and Literals from the previous doctable description/values column have been taken over. In future we wanna get as strict as possible.
Defaults
- Defaults are now defined in the model and exported to config.default.yaml and schema.json. The config can be used locally as is, but must be kept in sync upstream with the validation config. Since the file is auto generated now, we also lose all comments.
Typed Config Object
- We can now use the pydantic schema object instead of the previous dict config. This gives us a typed object. Which means that we can have auto completion and documentation in the IDE (not just in the config.yaml), type checks via mypy etc and attribute style config selection. Both dict-like and attribute object access is then possible: snakemake.config["run"] and snakemake.config.run.
  - This is currently not implemented. We need to get rid of any use of dash-case in the config and move to snake_case. In the export, currently both is supported via aliases, but python attributes do not allow for dash-case. We could also duck type this, but it's cleaner to just remove them.
Deprecation of config keys
- Deprecate keys and warn user via Deprecated Fields
Docs
- The schema is also used in the documentation, which contains additional information such as types, defaults, examples and descriptions. All doctables have been removed. Check new docs build here.
- When the schema is uploaded to schemastore.org (what we can do after this PR is merged), most IDEs will automatically give you inline tooltips on config files. It is also possible to set it up manually.

Other changes

Splits up workflow in seperate jobs for unit/ integration tests
Removes old requirements.txt for docs, pins docs dependencies in pixi
Fixes nested string pre-commit issues

Checklist

Add plotting
Docs

Post Merge

Apply for schemastore with **/pypsa-eur/config/*.yaml and **/pypsa-eur*/config/*.yaml
Move from dash-case to camel_case for all config keys and add typed object
Stricter validation

lkstrp · 2025-12-16T09:45:13Z

This is a very minimal example right now, and not integrated into the Snakefile. I want to discuss two things first:

On 4: We could define the defaults right away in the model, which means it would contain everything related: description, validation, type, examples as well as defaults. That would be the cleanest approach. config.default.yaml can still be automatically generated to not break any existing workflows, but if you wanna change defaults you would need to touch the model directly. This is what I vote for and what is done in this minimal version. We could also keep the config.default.yaml and only use the model for validation, so keep defaults seperated from docs, validation, types etc. This needs to be decided now. @fneum @coroa @FabianHofmann

On 2: In a first version we wanna just do basic validation. But if we wanna define more complex stuff, it is a bit a question where do we define it. The cleanest would be to keep that all in lib/validation/config and not in any scripts. Similar to the example I have right now. Nothing we need to decide right now and I just wanna hear some opinions

I went with option 1 (defining defaults in Pydantic), which means that config.default.yaml can be used locally as is, but must be kept in sync upstream with the validation config. Since this file is auto generated now, we also lose all comments.

for more information, see https://pre-commit.ci

fneum

Great, this is very rigorous! It will be super helpful to have some checking of configuration files for consistency! The only disadvantage I see is that developing new configuration settings now becomes a bit more complicated and less accessible. But I tend to agree that it is best to keep everything in one place.

I am glad all the comments disappear! We want all documentation in one place.
config.default.yaml and schema.json are autogenerated and should not be touched. This should be noted when writing the release note.
Did you consider another location for the config validation code? Maybe just ./validation/config, then scripts is for the rules? Not sure, but the current structure looks quite deeply nested in subfolders.
Thinking ahead, will the display also work in mkdocs with a similar plugin?
Sorry, I may have created more configuration settings by merging #1935.
Could you perhaps write up a few general principles of working with pydantic with links to resources in the documentation?
Could you maybe summarise in one sentence (just for reference), why this is better than the standard validation in snakemake? https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html#validation

.github/workflows/test.yaml

config/config.default.yaml

doc/configuration.rst

fneum · 2025-12-17T11:02:04Z

doc/configuration.rst

+.. dropdown:: Details
+
+   .. jsonschema:: ../config/schema.json#/properties/version
+      :lift_description:
+


This is the core information of the configuration page. I would not hide it in a dropdown, rather the other way around: the default config yaml could be in a dropdown.

I've cleaned it up now, but still kept the detailed schema in a drop down menu. Otherwise, some configurations look just too bloated

I agree with @fneum that we should put the generated jsonschema directive first and without the dropdown. I think i would even re-include the defaults there. I think i would only keep it in the dropdown for the data section where it does not add any content.

doc/configuration.rst

scripts/lib/validation/config/__init__.py

scripts/lib/validation/config/biomass.py

lkstrp · 2025-12-17T13:19:49Z

The only disadvantage I see is that developing new configuration settings now becomes a bit more complicated and less accessible.

For development, not really. You can still do whatever you want locally in your config.default.yaml, and Snakemake will use only this. The worst case is that you need to comment out the validation call. Just before merging, you need to adjust the schema to keep both versions in sync. I will explain this all in the docs.

Did you consider another location for the config validation code? Maybe just ./validation/config, then scripts is for the rules? Not sure, but the current structure looks quite deeply nested in subfolders.

I thought about it, but ended up with the current path:

config subdir is included since we wanna aim to add also validation on input data for any rule, which would sit next to the config validation. This is also the reason why everything sits in scripts.
lib I added in the desperate try to plan for more structure in pypsa-eur. _helpers.py could go there, tons of boilerplate of rules, splitting up rules etc.
Putting config validation in an extra ./validation project level dir, while still having rule validation in scripts feels odd. Also adding it to ./configs isn't a good idea.
Which leads to ./scripts/lib/validation/config, but with basically to placeholder dirs which should be further filled in future.j

Thinking ahead, will the display also work in mkdocs with a similar plugin?

Haven't tried it yet, but a way will be found. This will not be a blocker to deprecate doctables.

Could you maybe summarise in one sentence (just for reference), why this is better than the standard validation in snakemake? https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html#validation

Using Snakemake for validation doesn't change much. Those are basically two questions: How you write the schema and how you validate it:

In Add config validation via json-schema #1547, I wrote the schema straight away in JSON Schema format and you could use snakemake to validate it. But Snakemake does not check for deprecated arguments, which JSON schemas in general support, and therefore I used already there a Python library to validate the schema.
In here, the schema is written as a Pydantic model. This makes it much easier to write and maintain than having everything in JSON. It can still be exported to a schema representation, which is used to build docs, or get IDE documentation etc. And we could still use snakemake to validate on that JSON schema representation. But we can also use Pydantic for validation, which is done here, and allows us to have:
- More complex validations in Pydantic, which can't be exported to the JSON schema representation.
- Not only validate the Snakemake config (which is a simple Python dictionary), but also cast it to a typed Python object, which allows linters to know the type and gives you autocompletion on possible config keys, etc. See (3) above. This is not implemented here, but would be nice to have in the future.

fneum · 2025-12-17T14:40:34Z

Thanks for the responses! That all sounds reasonable.

coroa · 2025-12-19T11:59:03Z

Since [config.default.yaml] is auto generated now, we also lose all comments.

proposal for using ruamel.yaml to bring back comments (not all top-level sections do have a doc heading, i think), in lkstrp#34

coroa · 2025-12-19T12:08:42Z

Apply for schemastore with **/pypsa-eur/config/.yaml and */pypsa-eur/config/.yaml

I think i would prefer to add language-server comments to the yaml files linking relatively to the schema. ie. add to config.default.yaml:

# yaml-language-server: $schema=./schema.json

version: v2025.07.0
tutorial: false

This would be more self-contained and apply the right version automatically. This comment is interpreted as is by zed, in vscode one needs the redhat's vscode yaml extension.

For using schemastore we would probably have to come up with a versioning scheme?

coroa · 2025-12-19T17:44:28Z

Currently links in the description are markdown formatted and therefore don't parse correctly in the docs. In principle this is temporary, since we are switching to mkdocs eventually. I added a few commits to lkstrp#34 that convert those links to sphinx and then add a markdownDescription field to the generated json schema. This renders fine correctly in sphinx and in the zed and vscode tooltips.

coroa

i like it. i think mainly some guidance for soft-fork authors on how this should be extended is missing.

coroa · 2025-12-19T17:50:14Z

doc/configuration.rst

+.. dropdown:: Details
+
+   .. jsonschema:: ../config/schema.json#/properties/version
+      :lift_description:
+


I agree with @fneum that we should put the generated jsonschema directive first and without the dropdown. I think i would even re-include the defaults there. I think i would only keep it in the dropdown for the data section where it does not add any content.

coroa · 2025-12-19T17:54:33Z

scripts/lib/validation/config/data.py

+    osm: _DataSourceConfig = Field(
+        default_factory=lambda: _DataSourceConfig(source="archive", version="latest"),
+        description="OSM data source configuration.",
+    )


Why not use the _DataSourceConfig defaults implicitly?

Suggested change

osm: _DataSourceConfig = Field(

default_factory=lambda: _DataSourceConfig(source="archive", version="latest"),

description="OSM data source configuration.",

)

osm: _DataSourceConfig = Field(

default_factory=_DataSourceConfig,

description="OSM data source configuration.",

)

minimal version

edbe9bf

lkstrp changed the title ~~minimal version~~ Config Validation with Pydantic Nov 19, 2025

lkstrp marked this pull request as draft November 19, 2025 13:09

amos-schledorn mentioned this pull request Nov 20, 2025

Versioning newly incorporated datasets open-energy-transition/pypsa-eur#67

Open

8 tasks

lkstrp added 7 commits December 15, 2025 17:24

new config.default.yaml

da8902a

exclude schema from pre-commit

abda3d4

add model for default config

16ca5b5

add schema.json

d6c0844

ci: add schema in sync check

a5fd84e

duck type model schema

e833f5f

show schema in docs

9dfe9a0

lkstrp and others added 7 commits December 16, 2025 11:58

Merge remote-tracking branch 'upstream/master' into pydantic

f7c4dac

move ci check to unit test

255ea49

updade lock file and add data.py schema

baeb4e4

update schema

2c7ee04

fix pre-commit, split workflow

dc6dccd

[pre-commit.ci] auto fixes from pre-commit.com hooks

aac8d11

for more information, see https://pre-commit.ci

shorten workflow name

2423ca7

lkstrp marked this pull request as ready for review December 16, 2025 11:30

lkstrp requested review from coroa and fneum December 16, 2025 11:31

lkstrp force-pushed the pydantic branch from 901950d to 93d496f Compare December 16, 2025 12:39

fix undocumented literals

2b797d9

lkstrp force-pushed the pydantic branch from 93d496f to 2b797d9 Compare December 16, 2025 13:38

lkstrp added 2 commits December 17, 2025 10:47

Merge remote-tracking branch 'upstream/master' into pydantic

63533a0

merge changes from PyPSA#1940

61d3043

fneum requested changes Dec 17, 2025

View reviewed changes

fix docs

45d5f63

coroa reviewed Dec 20, 2025

View reviewed changes

This was referenced Dec 29, 2025

feat: add config validation and enable config key deletion #1734

Closed

Add config validation via json-schema #1547

Closed

Config Validation with Pydantic #1912

Are you sure you want to change the base?

Config Validation with Pydantic #1912

Uh oh!

Conversation

lkstrp commented Nov 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes proposed in this Pull Request

Checklist

Uh oh!

lkstrp commented Dec 16, 2025

Uh oh!

fneum left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

fneum Dec 17, 2025

Choose a reason for hiding this comment

Uh oh!

lkstrp Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

coroa Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lkstrp commented Dec 17, 2025

Uh oh!

fneum commented Dec 17, 2025

Uh oh!

coroa commented Dec 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coroa commented Dec 19, 2025

Uh oh!

coroa commented Dec 19, 2025

Uh oh!

coroa left a comment

Choose a reason for hiding this comment

Uh oh!

coroa Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

coroa Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

lkstrp commented Nov 19, 2025 •

edited

Loading

fneum left a comment •

edited

Loading

coroa commented Dec 19, 2025 •

edited

Loading