Description
I was looking around in the code for a way to instantiate an sklearn model based on a `Configuration` object. My use case is that I am trying to implement a generalized way of getting standard metadata about a completed auto-sklearn run. E.g., I call `autosklearn_model.get_models_with_weights()` and the result contains some `Configuration` objects. These may, for example, just describe an sklearn model, although I understand it is possible to extend and register other model types as well. In either case I would like access to an instance of the model with the matching configuration, so that I can try calling `get_params()` on it to see whether it supports that interface. Maybe this is simply accessible somewhere else, but my idea was to re-instantiate a dummy model based on the `Configuration` and then call `get_params()` on it. Ideally I could dynamically instantiate whatever model is described by the `__choice__` value (even if it's not sklearn), in the same way auto-sklearn does it internally.
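The `get_params()` probing I have in mind could look roughly like this. This is a stand-alone sketch; `DummyModel` is just an illustrative stand-in, not an auto-sklearn or sklearn class:

```python
# Illustrative sketch of the duck-typing check described above:
# call get_params() only if the (re-instantiated) model exposes it.
def extract_params(model):
    get_params = getattr(model, "get_params", None)
    if callable(get_params):
        return get_params()
    return None  # model does not support the sklearn-style interface


class DummyModel:  # stand-in for a re-instantiated sklearn estimator
    def get_params(self):
        return {"criterion": "gini"}


print(extract_params(DummyModel()))  # {'criterion': 'gini'}
print(extract_params(object()))      # None
```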
I was poking around in the code and found, e.g.:

I was expecting, though, to find some place where this import is dynamically selected based on `choice`, but maybe this is just a wrapper class that is itself chosen dynamically?
Then bits like this:
auto-sklearn/test/test_pipeline/components/classification/test_base.py
Lines 279 to 283 in bb8396b
Could you point me in the right direction? Or advise if I am missing some fundamental point about how this should work.
To make sure I am clear, I'll also include an example. I have a `config` object like this:
```
Configuration:
  balancing:strategy, Value: 'none'
  classifier:__choice__, Value: 'random_forest'
  classifier:random_forest:bootstrap, Value: 'True'
  classifier:random_forest:criterion, Value: 'gini'
  classifier:random_forest:max_depth, Constant: 'None'
  classifier:random_forest:max_features, Value: 0.5
  classifier:random_forest:max_leaf_nodes, Constant: 'None'
  classifier:random_forest:min_impurity_decrease, Constant: 0.0
  classifier:random_forest:min_samples_leaf, Value: 1
  classifier:random_forest:min_samples_split, Value: 2
  classifier:random_forest:min_weight_fraction_leaf, Constant: 0.0
  data_preprocessing:categorical_transformer:categorical_encoding:__choice__, Value: 'one_hot_encoding'
  data_preprocessing:categorical_transformer:category_coalescence:__choice__, Value: 'minority_coalescer'
  data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction, Value: 0.01
  data_preprocessing:numerical_transformer:imputation:strategy, Value: 'mean'
  data_preprocessing:numerical_transformer:rescaling:__choice__, Value: 'standardize'
  feature_preprocessor:__choice__, Value: 'no_preprocessing'
```
And I want something that produces the equivalent of this (but without hard-coding the model choice, and with the parameters sklearn doesn't accept removed, etc.):

```python
from sklearn.ensemble import RandomForestClassifier

# Strip the "classifier:random_forest:" prefixes and drop unset values
hps = {hp_name.rsplit(':')[-1]: config[hp_name]
       for hp_name in config if config[hp_name] is not None}
# Drop hyperparameters that don't belong to the sklearn estimator itself
hps = {k: v for k, v in hps.items()
       if k not in ('strategy', '__choice__', 'minimum_fraction')}
return to_mls_sklearn(RandomForestClassifier(**hps))
```
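What I am hoping exists internally is essentially a registry keyed by the `__choice__` value. A minimal stand-alone sketch of that idea, where `DummyForest` and `REGISTRY` are hypothetical stand-ins for whatever auto-sklearn uses (not actual auto-sklearn code):

```python
# Hypothetical sketch: a registry mapping each __choice__ value to a
# constructor. DummyForest is a stand-in for sklearn's RandomForestClassifier.
class DummyForest:
    def __init__(self, **params):
        self._params = dict(params)

    def get_params(self):
        return dict(self._params)


REGISTRY = {'random_forest': DummyForest}


def instantiate_from_config(config):
    """Re-instantiate the model named by classifier:__choice__."""
    choice = config['classifier:__choice__']
    prefix = 'classifier:%s:' % choice
    # Keep only the hyperparameters nested under the chosen model,
    # dropping entries whose value is None.
    hps = {k[len(prefix):]: v for k, v in config.items()
           if k.startswith(prefix) and v is not None}
    return REGISTRY[choice](**hps)


model = instantiate_from_config({
    'classifier:__choice__': 'random_forest',
    'classifier:random_forest:criterion': 'gini',
    'classifier:random_forest:max_depth': None,
})
print(model.get_params())  # {'criterion': 'gini'}
```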