Config driven detectors - part 2 #462
Conversation
Looks good overall, just wanted some clarification on the conventions for the `pydantic` models, specifically wrt `Optional` values and `None` defaults.
I also saw a few `TODO`s wrt conditional checking of required parameters given some other parameters. Is this fairly easy to add (I'm assuming `pydantic` validators)?
I believe this is possible, but tbh I haven't looked into it too deeply as I wanted to get the basic functionality in first.
@jklaise I've resolved a few of your comments, feel free to unresolve if you disagree. I've responded to the remaining larger comments, would you be able to take a look and mark them as resolved (or not) please?
LGTM
Addition of pydantic validation of detector configs
This is the second in a series of PRs for the config-driven detector functionality. The original PR (#389) has been split into a number of smaller PRs to aid the review process.
Summary of PR
This PR introduces validation of the detector config files using pydantic. Pydantic models are defined for each detector in `alibi_detect.utils.schemas`. These are then used in `alibi_detect.utils.loading.validate_config`, a public function to validate a detector's config dictionary.
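Roughly speaking, the idea can be sketched as follows (illustrative only, not the actual implementation; the model and field names below are hypothetical, whereas the real models live in `alibi_detect.utils.schemas`):

```python
# Hypothetical sketch of the validate_config idea (pydantic v1-style syntax).
# The real models and registry live in alibi_detect.utils.schemas /
# alibi_detect.utils.loading; names and fields here are illustrative only.
from typing import Optional
from pydantic import BaseModel


class KSDriftConfig(BaseModel):
    """Simplified, hypothetical model for an unresolved KSDrift config."""
    name: str
    p_val: float = .05
    preprocess_fn: Optional[dict] = None


# mapping from detector name to its (unresolved) config model
DETECTOR_CONFIGS = {'KSDrift': KSDriftConfig}


def validate_config(cfg: dict, resolved: bool = False) -> dict:
    """Validate a detector config dict and return it with defaults filled in."""
    model = DETECTOR_CONFIGS[cfg['name']]  # a separate resolved registry would be used if resolved=True
    return model(**cfg).dict()


print(validate_config({'name': 'KSDrift'}))
# {'name': 'KSDrift', 'p_val': 0.05, 'preprocess_fn': None}
```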
Use cases
The `validate_config` function will be used in a number of ways. Consider the `load_detector` workflow (functionality to be introduced in part III), where `validate_config` will be used:

- On the config dict before artefacts are resolved, e.g. `validate_config(cfg)`.
- After `resolve_config` has been called (so artefacts are now runtime objects such as ndarray's), e.g. `validate_config(cfg, resolved=True)`.
- Directly by the user, who has loaded a config dict with `read_config` and perhaps manipulated it (see the sketch below). It is for this use case that `validate_config` has been made public.
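For the last (user-facing) use case, the flow might look something like this (a sketch only; `read_config` lands in part III, its import location is assumed here, and exact signatures may differ):

```python
# Sketch of the user-facing use case. Import locations and read_config's
# signature are assumptions; only validate_config(cfg) is confirmed above.
from alibi_detect.utils.loading import read_config, validate_config

cfg = read_config('config.toml')   # load the config dict from file
cfg['p_val'] = 0.01                # user manipulates the dict (hypothetical field)
cfg = validate_config(cfg)         # validate and fill in defaults before further use
```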
Validate config return
A small (but important!) detail: `validate_config` returns a `dict`, which is the output of the pydantic model's `.dict()` method. This is the same config dict that was passed in, but with missing fields populated by their default values. This is a somewhat philosophical design choice (I think), in that it means internally we always deal with fully populated config dicts. We can set additional metadata etc. in `schemas.py` at a later date, and can set defaults for artefact dictionaries etc. (see below). A downside is that we will have to ensure the kwarg defaults set in `schemas.py` are kept in sync with the detector kwargs themselves (though this could also be a positive if we wanted divergence in some cases).
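To illustrate the return-value behaviour with plain pydantic, independent of the actual alibi-detect models (field names here are made up):

```python
# Plain pydantic (v1-style) demo: unset fields come back populated with
# defaults via .dict(), and invalid values raise a clear validation error.
from typing import Optional
from pydantic import BaseModel, ValidationError


class ExampleConfig(BaseModel):
    name: str
    p_val: float = .05                     # must be kept in sync with the detector kwarg default
    preprocess_fn: Optional[dict] = None


print(ExampleConfig(name='MMDDrift').dict())
# {'name': 'MMDDrift', 'p_val': 0.05, 'preprocess_fn': None}

try:
    ExampleConfig(name='MMDDrift', p_val='not-a-float')
except ValidationError as e:
    print(e)  # p_val: value is not a valid float
```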
Pydantic models for artefacts
Artefacts are specified in a config with dictionaries, which may then have further artefact dictionaries within them (see `preprocess_fn`, `model` etc. in the `config.toml` example in the schematic above). These artefact dictionaries are also validated, e.g. see `PreprocessConfig` and `PreprocessConfigResolved` in `schemas.py`.
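A rough sketch of this nesting, with hypothetical fields (see `PreprocessConfig` and `PreprocessConfigResolved` in `schemas.py` for the real definitions):

```python
# Hypothetical sketch of nested artefact validation (pydantic v1-style).
# Field names are illustrative; the real models are PreprocessConfig and
# PreprocessConfigResolved in alibi_detect.utils.schemas.
from typing import Optional
from pydantic import BaseModel


class PreprocessConfig(BaseModel):
    src: str                  # e.g. a filepath or registry reference for the preprocessing function
    kwargs: dict = {}         # kwargs passed to the preprocessing function


class DriftDetectorConfig(BaseModel):
    name: str
    preprocess_fn: Optional[PreprocessConfig] = None   # nested artefact dictionary


# the nested preprocess_fn dict is validated against PreprocessConfig automatically
cfg = DriftDetectorConfig(name='KSDrift', preprocess_fn={'src': 'preprocess_fn.dill'})
print(cfg.dict())
```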
Example
Example notebook:
https://gist.github.com/ascillitoe/8ea123776151368a7a6a91688a1809cb