Off the back of a couple of recent conversations on PRs and the open issue #129, here's a suggestion to get the ball rolling on a strategy for constants. This is straight-up biased, as I'm suggesting we adopt something very like what I have in pyrealm, but if there are strong arguments for a different approach I'm equally happy to update pyrealm!

Links to those conversations: CarbonPool model: Minimal carbon pool model #134 (comment)

Constants proposal
This is a sketch for how I think we could structure the "constants". These values are not all 'constant' constants, but they are things that would be constant for a simulation or multiple sets of simulations.
So my starting points are these statements:

- Constants should not be hard coded into functions and should be exposed in some way that allows them to be configured for a simulation. That seems mad for things like the gas constant, but for many constants - such as coefficients of empirical processes - we will want to be able to explore the effects of changing defaults.
- Each model will need its own constants. We previously discussed a single core.constants.py module. We might still want that for shared or global constants, but it won't fly for the modular models approach: a custom BaseModel needs to be able to provide its own constants without needing to hack the core.constants module. It should be plug and play.
- Functions or methods that use calculations should accept an argument that provides the constants used within the function, so that there is a clear way to pass a specific set of those constants into the calculations. Ultimately that means that higher level functions (such as model __init__ methods) should also include the various constant arguments so they can pass them on to called functions.
So, as an example of how I think we could do this, let's say we have a new Hunting model that has a core function to model hunted biomass of a location as a function of distance from habitation and topographic complexity. That might then have a hunting/constants.py file containing:
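(A minimal sketch of that dataclass; the field names match hunting_pressure below and the default values are purely illustrative.)

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HuntingConstants:
    """Constants for the hunting model."""

    intercept: float = 1.0
    dist_slope: float = -0.2
    topo_slope: float = -0.1
    interaction: float = 0.05
```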
And then a hunting/models.py containing (fragmentary code ahead...):
```python
from __future__ import annotations

from numpy.typing import NDArray

from hunting.constants import HuntingConstants

# BaseModel and Data are imported from the core package (imports elided here)


def hunting_pressure(
    topo: NDArray,
    dist: NDArray,
    hunting_constants: HuntingConstants = HuntingConstants(),
) -> NDArray:
    return (
        hunting_constants.intercept
        + hunting_constants.dist_slope * dist
        + hunting_constants.topo_slope * topo
        + hunting_constants.interaction * dist * topo
    )


class HuntingModel(BaseModel):
    model_name = 'hunting'

    def __init__(
        self,
        data: Data,
        ...,
        hunting_constants: HuntingConstants = HuntingConstants(),
    ) -> None:
        ...
        self.data = data
        # The set of constants are a key attribute of the model instance
        self.hunting_constants = hunting_constants
        ...

    def update(self, ...) -> None:
        ...
        # The constants get passed on to other functions used within the model
        pressure = hunting_pressure(
            self.data['topo'], self.data['dist'], self.hunting_constants
        )
        ...

    @classmethod
    def from_config(cls, data: Data, ..., config: dict) -> HuntingModel:
        ...
        # If the model config changes constant defaults, extract them and use
        # them in creating the returned model instance
        hunting_const: dict = {}
        if 'constants' in config and 'HuntingConstants' in config['constants']:
            hunting_const = config['constants']['HuntingConstants']
        ...
        return cls(data, ..., hunting_constants=HuntingConstants(**hunting_const))
```
Programmatically, you can then just use the defaults or adjust them:
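(A sketch reusing the names above; the adjusted value is arbitrary.)

```python
# Use the defaults throughout
model = HuntingModel(data, ...)

# Or override specific constants for a simulation
model = HuntingModel(
    data, ..., hunting_constants=HuntingConstants(dist_slope=-0.5)
)
```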
Then we have to allow users to configure this for simulations run from a command line configuration. We can add a constants section to module JSONSchema definitions so that users can alter constants in the model config:
```toml
[hunting.constants.HuntingConstants]
dist_slope = 7
```
Then, the from_config factory method can intercept that section of the configuration and use it to initialise the HuntingConstants instance for the model instance.
There are at least two areas that seem possibly iffy.
JSONSchema and dataclass synchronisation
There is overlap in the role of the dataclass, which is the programmatic API, and the JSONSchema for the module configuration, which sets up the programmatic API from a configuration. I don't have a good handle on how to avoid breaking DRY.
We could just have the JSON schema allow hunting.constants.HuntingConstants to be any arbitrary dictionary and then use a try block in from_config to handle badly configured constants. Then everything is defined by the dataclass, which is clean and DRY, but it does mean that the use of JSONSchema to clean the configuration is inconsistent.
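A sketch of how that try could look inside the from_config fragment above (the error handling here is just illustrative):

```python
try:
    hunting_constants = HuntingConstants(**hunting_const)
except TypeError as excep:
    # Unknown or misspelled constant names raise TypeError from the
    # dataclass __init__, so badly configured constants surface here
    raise ValueError(f'Bad constants in configuration: {excep}') from excep
```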
At the other end, we could duplicate the HuntingConstants dataclass definition in the JSONSchema, right down to the default values, types, array sizes etc. Then the configuration gets cleaned before it goes near the from_config factory, but it is highly repetitive of the data structure definition in the dataclass and not remotely DRY.
'Bundling' constants
With any remotely complex model, there are likely to be several sets of parameters for several different core functions. If all of these are defined as individual dataclasses then it seems like you'd end up with potentially very long __init__ methods with multiple different constant dataclasses as arguments.
Here I'd lean towards having bundled constant dataclasses. So the HuntingConstants dataclass might contain all of the different 'constants' used in HuntingModel and you only have one or two arguments for setting constants in __init__ (HuntingConstants and maybe CoreConstants). That does mean that functions get passed a dataclass containing a whole load of parameters they don't need, along with the ones they do, but it seems cleaner. If it turns out that they get huge and splitting them into a couple of logical groups makes life easier, then that's fine too.
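As a sketch of the bundled approach, extending the illustrative HuntingConstants above with a hypothetical second group of constants:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HuntingConstants:
    # Constants used by hunting_pressure
    intercept: float = 1.0
    dist_slope: float = -0.2
    topo_slope: float = -0.1
    interaction: float = 0.05

    # Constants used by a hypothetical second core function,
    # e.g. prey population recovery
    recovery_rate: float = 0.1
```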
But you could have lots of dataclasses in hunting.constants, which are all unique to a specific function. I don't see that being as clean an interface, but there may be an obvious solution.
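For comparison, the per-function alternative would look something like this (both class names are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HuntingPressureConsts:
    """Constants used only by hunting_pressure."""

    intercept: float = 1.0
    dist_slope: float = -0.2
    topo_slope: float = -0.1
    interaction: float = 0.05


@dataclass(frozen=True)
class PreyRecoveryConsts:
    """Constants used only by a hypothetical prey recovery function."""

    recovery_rate: float = 0.1
```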