Skip to content

Conversation

@lkstrp
Copy link
Member

@lkstrp lkstrp commented Nov 19, 2025

Supersedes #1547

Closes #1547
Closes #1734
Closes #1721

Changes proposed in this Pull Request

This PR revives and supersedes #1547. This time using pydantic which can export to JSON Schema instead of writing the JSON Schema directly.

With the new approach we can do a couple of new things and keep anything config related at one place:

  1. Validation
    • Run any validation: Just simple type checks (bool, list[str]), Literals (e.g. for logging levels Literal["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"], as well as any custom validation you can come up with. This PR tries to not break anything, which means only types, and Literals from the previous doctable description/values column have been taken over. In future we wanna get as strict as possible.
  2. Defaults
    • Defaults are now defined in the model and exported to config.default.yaml and schema.json. The config can be used locally as is, but must be kept in sync upstream with the validation config. Since the file is auto generated now, we also lose all comments.
  3. Typed Config Object
    • We can now use the pydantic schema object instead of the previous dict config. This gives us a typed object. Which means that we can have auto completion and documentation in the IDE (not just in the config.yaml), type checks via mypy etc and attribute style config selection. Both dict-like and attribute object access is then possible: snakemake.config["run"] and snakemake.config.run.
      • This is currently not implemented. We need to get rid of any use of dash-case in the config and move to snake_case. In the export, currently both is supported via aliases, but python attributes do not allow for dash-case. We could also duck type this, but it's cleaner to just remove them.
  4. Deprecation of config keys
  5. Docs
    • The schema is also used in the documentation, which contains additional information such as types, defaults, examples and descriptions. All doctables have been removed. Check new docs build here.
    • When the schema is uploaded to schemastore.org (what we can do after this PR is merged), most IDEs will automatically give you inline tooltips on config files. It is also possible to set it up manually.
image

Other changes

  • Splits up workflow in seperate jobs for unit/ integration tests
  • Removes old requirements.txt for docs, pins docs dependencies in pixi
  • Fixes nested string pre-commit issues

Checklist

  • Add plotting
  • Docs

Post Merge

  • Apply for schemastore with **/pypsa-eur/config/*.yaml and **/pypsa-eur*/config/*.yaml
  • Move from dash-case to camel_case for all config keys and add typed object
  • Stricter validation

@lkstrp lkstrp changed the title minimal version Config Validation with Pydantic Nov 19, 2025
@lkstrp lkstrp marked this pull request as draft November 19, 2025 13:09
@lkstrp
Copy link
Member Author

lkstrp commented Dec 16, 2025

This is a very minimal example right now, and not integrated into the Snakefile. I want to discuss two things first:

  1. On 4: We could define the defaults right away in the model, which means it would contain everything related: description, validation, type, examples as well as defaults. That would be the cleanest approach. config.default.yaml can still be automatically generated to not break any existing workflows, but if you wanna change defaults you would need to touch the model directly. This is what I vote for and what is done in this minimal version. We could also keep the config.default.yaml and only use the model for validation, so keep defaults seperated from docs, validation, types etc. This needs to be decided now. @fneum @coroa @FabianHofmann
  2. On 2: In a first version we wanna just do basic validation. But if we wanna define more complex stuff, it is a bit a question where do we define it. The cleanest would be to keep that all in lib/validation/config and not in any scripts. Similar to the example I have right now. Nothing we need to decide right now and I just wanna hear some opinions

I went with option 1 (defining defaults in Pydantic), which means that config.default.yaml can be used locally as is, but must be kept in sync upstream with the validation config. Since this file is auto generated now, we also lose all comments.

@lkstrp lkstrp marked this pull request as ready for review December 16, 2025 11:30
@lkstrp lkstrp requested review from coroa and fneum December 16, 2025 11:31
Copy link
Member

@fneum fneum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, this is very rigorous! It will be super helpful to have some checking of configuration files for consistency! The only disadvantage I see is that developing new configuration settings now becomes a bit more complicated and less accessible. But I tend to agree that it is best to keep everything in one place.

  • I am glad all the comments disappear! We want all documentation in one place.

  • config.default.yaml and schema.json are autogenerated and should not be touched. This should be noted when writing the release note.

  • Did you consider another location for the config validation code? Maybe just ./validation/config, then scripts is for the rules? Not sure, but the current structure looks quite deeply nested in subfolders.

  • Thinking ahead, will the display also work in mkdocs with a similar plugin?

  • Sorry, I may have created more configuration settings by merging #1935.

  • Could you perhaps write up a few general principles of working with pydantic with links to resources in the documentation?

  • Could you maybe summarise in one sentence (just for reference), why this is better than the standard validation in snakemake? https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html#validation

Comment on lines 63 to 67
.. dropdown:: Details

.. jsonschema:: ../config/schema.json#/properties/version
:lift_description:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the core information of the configuration page. I would not hide it in a dropdown, rather the other way around: the default config yaml could be in a dropdown.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've cleaned it up now, but still kept the detailed schema in a drop down menu. Otherwise, some configurations look just too bloated

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @fneum that we should put the generated jsonschema directive first and without the dropdown. I think i would even re-include the defaults there. I think i would only keep it in the dropdown for the data section where it does not add any content.

@lkstrp
Copy link
Member Author

lkstrp commented Dec 17, 2025

The only disadvantage I see is that developing new configuration settings now becomes a bit more complicated and less accessible.

For development, not really. You can still do whatever you want locally in your config.default.yaml, and Snakemake will use only this. The worst case is that you need to comment out the validation call. Just before merging, you need to adjust the schema to keep both versions in sync. I will explain this all in the docs.

Did you consider another location for the config validation code? Maybe just ./validation/config, then scripts is for the rules? Not sure, but the current structure looks quite deeply nested in subfolders.

I thought about it, but ended up with the current path:

  • config subdir is included since we wanna aim to add also validation on input data for any rule, which would sit next to the config validation. This is also the reason why everything sits in scripts.
  • lib I added in the desperate try to plan for more structure in pypsa-eur. _helpers.py could go there, tons of boilerplate of rules, splitting up rules etc.
  • Putting config validation in an extra ./validation project level dir, while still having rule validation in scripts feels odd. Also adding it to ./configs isn't a good idea.
    Which leads to ./scripts/lib/validation/config, but with basically to placeholder dirs which should be further filled in future.j

Thinking ahead, will the display also work in mkdocs with a similar plugin?

Haven't tried it yet, but a way will be found. This will not be a blocker to deprecate doctables.

Could you maybe summarise in one sentence (just for reference), why this is better than the standard validation in snakemake? https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html#validation

Using Snakemake for validation doesn't change much. Those are basically two questions: How you write the schema and how you validate it:

  • In Add config validation via json-schema #1547, I wrote the schema straight away in JSON Schema format and you could use snakemake to validate it. But Snakemake does not check for deprecated arguments, which JSON schemas in general support, and therefore I used already there a Python library to validate the schema.
  • In here, the schema is written as a Pydantic model. This makes it much easier to write and maintain than having everything in JSON. It can still be exported to a schema representation, which is used to build docs, or get IDE documentation etc. And we could still use snakemake to validate on that JSON schema representation. But we can also use Pydantic for validation, which is done here, and allows us to have:
    • More complex validations in Pydantic, which can't be exported to the JSON schema representation.
    • Not only validate the Snakemake config (which is a simple Python dictionary), but also cast it to a typed Python object, which allows linters to know the type and gives you autocompletion on possible config keys, etc. See (3) above. This is not implemented here, but would be nice to have in the future.

@fneum
Copy link
Member

fneum commented Dec 17, 2025

Thanks for the responses! That all sounds reasonable.

@coroa
Copy link
Member

coroa commented Dec 19, 2025

Since [config.default.yaml] is auto generated now, we also lose all comments.

proposal for using ruamel.yaml to bring back comments (not all top-level sections do have a doc heading, i think), in lkstrp#34

@coroa
Copy link
Member

coroa commented Dec 19, 2025

Apply for schemastore with **/pypsa-eur/config/.yaml and */pypsa-eur/config/.yaml

I think i would prefer to add language-server comments to the yaml files linking relatively to the schema. ie. add to config.default.yaml:

# yaml-language-server: $schema=./schema.json

version: v2025.07.0
tutorial: false

This would be more self-contained and apply the right version automatically. This comment is interpreted as is by zed, in vscode one needs the redhat's vscode yaml extension.

For using schemastore we would probably have to come up with a versioning scheme?

@coroa
Copy link
Member

coroa commented Dec 19, 2025

Currently links in the description are markdown formatted and therefore don't parse correctly in the docs. In principle this is temporary, since we are switching to mkdocs eventually. I added a few commits to lkstrp#34 that convert those links to sphinx and then add a markdownDescription field to the generated json schema. This renders fine correctly in sphinx and in the zed and vscode tooltips.

Copy link
Member

@coroa coroa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like it. i think mainly some guidance for soft-fork authors on how this should be extended is missing.

Comment on lines 63 to 67
.. dropdown:: Details

.. jsonschema:: ../config/schema.json#/properties/version
:lift_description:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @fneum that we should put the generated jsonschema directive first and without the dropdown. I think i would even re-include the defaults there. I think i would only keep it in the dropdown for the data section where it does not add any content.

Comment on lines +42 to +45
osm: _DataSourceConfig = Field(
default_factory=lambda: _DataSourceConfig(source="archive", version="latest"),
description="OSM data source configuration.",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the _DataSourceConfig defaults implicitly?

Suggested change
osm: _DataSourceConfig = Field(
default_factory=lambda: _DataSourceConfig(source="archive", version="latest"),
description="OSM data source configuration.",
)
osm: _DataSourceConfig = Field(
default_factory=_DataSourceConfig,
description="OSM data source configuration.",
)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pin documentation package requirements

3 participants