🚀 Feature Request
Hydra is a great tool for merging configs. E.g., it can easily start with a default config
x: 3
y: 'me'
data:
  path: ???
and add user specifications
x: 2
z: 4
data:
  path: ~/data
to obtain
x: 2
y: 'me'
z: 4
data:
  path: ~/data
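The merge above can be mimicked with a small recursive function over plain dictionaries. This is only a sketch of the semantics: `deep_merge` is a hypothetical helper, not Hydra's or OmegaConf's actual implementation, and it ignores details such as `???` validation and type checking.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which `override` is recursively merged into `base`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: descend and merge key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Leaf value, or a key the base lacks: the override wins.
            merged[key] = value
    return merged


default_cfg = {"x": 3, "y": "me", "data": {"path": "???"}}
user_cfg = {"x": 2, "z": 4, "data": {"path": "~/data"}}

merged_cfg = deep_merge(default_cfg, user_cfg)
print(merged_cfg)
```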
However, in many cases, especially when configuring a system made of components, we want to replace a part of a config. E.g., with a default config,
name: 'root_finder'
algorithm:
  _target_: alg.GradientBased
  optimizer:
    _target_: optim.Adam
    lr: 0.3
    eps: 1e-5
we might want to change algorithm.optimizer to a different choice that has a completely different set of options, such as
optimizer:
  _target_: optim.Newton
  num_iterations: 1000
A merging behavior will result in
optimizer:
  _target_: optim.Newton
  lr: 0.3
  eps: 1e-5
  num_iterations: 1000
which is not what we want. In other words, the two optimizer choices here are different configurable objects, rather than namespaces/subconfigs to be stitched together to form the big config file.
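To make the difference concrete, here is a sketch contrasting the two behaviors on plain dictionaries. `deep_merge` and `replace_at` are hypothetical helpers written for illustration, not Hydra or OmegaConf APIs; the replace-style semantics is what this request is asking for, not an existing feature.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def replace_at(base: dict, path: str, value) -> dict:
    """Return a copy of `base` with the node at dotted `path` swapped out wholesale."""
    result = copy.deepcopy(base)
    *parents, last = path.split(".")
    node = result
    for key in parents:
        node = node[key]
    node[last] = copy.deepcopy(value)
    return result


default_cfg = {
    "name": "root_finder",
    "algorithm": {
        "_target_": "alg.GradientBased",
        "optimizer": {"_target_": "optim.Adam", "lr": 0.3, "eps": 1e-5},
    },
}
newton = {"_target_": "optim.Newton", "num_iterations": 1000}

merged = deep_merge(default_cfg, {"algorithm": {"optimizer": newton}})
replaced = replace_at(default_cfg, "algorithm.optimizer", newton)

print(merged["algorithm"]["optimizer"])    # still carries Adam's lr and eps
print(replaced["algorithm"]["optimizer"])  # only Newton's own options
```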
AFAICT, Hydra's main mechanism for doing this is the defaults list plus overrides. The syntax is a bit verbose, but it works. Here's the example:
# /config.yaml
defaults:
  - optim@algorithm.optimizer: adam
name: 'root_finder'
algorithm:
  _target_: alg.GradientBased
  optimizer: ???
# /optim/adam.yaml
_target_: optim.Adam
lr: 0.3
eps: 1e-5
# /optim/newton.yaml
_target_: optim.Newton
num_iterations: 1000
and the user can override with ~algorithm.optimizer optim@algorithm.optimizer=newton (note the necessity of using two flags).
Okay, this might be fine if your config is this simple. But what if it is not? Say we need to select two algorithms, the default optimizer is different for each of them, and the algorithms also come in different types. Then we need to do
# /config.yaml
defaults:
- algorithm@algorithm_one: gradient_based_default_for_alg_one
- algorithm@algorithm_two: gradient_based_default_for_alg_two
name: 'root_finder'
algorithm_one: ???
algorithm_two: ???
# /algorithm/gradient_based_default_for_alg_one.yaml
defaults:
- optim@optimizer: adam_default_for_alg_one
_target_: alg.GradientBased
name: 'grad_based'
optimizer: ???
# /algorithm/gradient_based_default_for_alg_two.yaml
defaults:
- optim@optimizer: adam_default_for_alg_two
_target_: alg.GradientBased
name: 'grad_based'
optimizer: ???
# /algorithm/cma_es.yaml
name: 'cma_es'
# /optim/newton.yaml
_target_: optim.Newton
num_iterations: 1000
# /optim/adam_default_for_alg_one.yaml
_target_: optim.Adam
lr: 0.3
eps: 1e-5
# /optim/adam_default_for_alg_two.yaml
_target_: optim.Adam
lr: 0.1
eps: 1e-1
Hmmm, seriously? Throwing in a defaults list whenever a subconfig is meant to be replaced by the user specification (rather than merged) quickly becomes annoying, difficult to read, and hard to manage. Surely there should be an easier way to do this.
Implications for structured configs
For the same reason (I believe), it can be annoying to compose structured configs that use a subclass value as default for a superclass-annotated type. To make it concrete, consider
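from dataclasses import dataclass, field

from hydra.core.config_store import ConfigStore
from omegaconf import MISSING

@dataclass
class Base:
    x: int = MISSING

@dataclass
class ImplA(Base):
    x: int = 4
    y: int = 10

@dataclass
class ImplB(Base):
    x: int = 100
    y: str = 'y'

@dataclass
class Config:
    # Annotated as Base, defaulting to ImplA. default_factory is used because
    # newer Python versions reject a dataclass instance as a plain default.
    key: Base = field(default_factory=ImplA)

cs = ConfigStore.instance()
cs.store(name='config', node=Config)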
then, if we try to merge config with a user override/config that uses ImplB for key, OmegaConf merging fails because, in the view of OmegaConf, Config's key field has type ImplA (because of its default ImplA value), despite the Base type annotation.
This can also be worked around with the same defaults list approach. But, as mentioned above, it quickly becomes unmanageable in slightly larger projects.
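The gap between the declared type and the default's concrete type can be seen with the standard library alone. The sketch below does not call OmegaConf; `MISSING` here is a local stand-in for `omegaconf.MISSING`, and the snippet only shows that `ImplB` fits the `Base` annotation while not being an `ImplA`, which is the mismatch described above.

```python
from dataclasses import dataclass, field, fields

MISSING = "???"  # local stand-in for omegaconf.MISSING (an assumption for this sketch)


@dataclass
class Base:
    x: int = MISSING


@dataclass
class ImplA(Base):
    x: int = 4
    y: int = 10


@dataclass
class ImplB(Base):
    x: int = 100
    y: str = "y"


@dataclass
class Config:
    key: Base = field(default_factory=ImplA)


# The type written in the annotation vs. the type of the default value.
declared_type = {f.name: f.type for f in fields(Config)}["key"]
default_type = type(Config().key)
print(declared_type.__name__, default_type.__name__)

# ImplB satisfies the annotation but is not a subclass of the default's type.
fits_annotation = issubclass(ImplB, declared_type)
fits_default = issubclass(ImplB, default_type)
print(fits_annotation, fits_default)
```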