Open
Description
It appears the code still accepts data_preprocessor as a valid entry to include: Dict[str, Any] of the estimators. It states the only valid entry is 'feature_type' if passing data_preprocessor: []
from autosklearn.classification import AutoSklearnClassifier

AutoSklearnClassifier(
    time_left_for_this_task=30,
    include={
        'data_preprocessor': ['feature_type']
    }
)
While this works fine, we also document that it can't be turned off here, and provide a broken example of how to turn it off here. The example is broken, as seen from the sprint statistics, which show only a single DummyModel returned.
This is also confusing because InputValidator also seems to do OrdinalEncoding, which is a possible choice of the DataPreprocessor step in the pipeline.
In the short term I can think of two possibilities:
- Allow a boolean switch, include: {'data_preprocessing' : True/False}, removing the pipeline step entirely
- Include a string like include: {'data_preprocessing' : 'no_preprocessing'}, having the pipeline step perform a no-op on the data.
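To make the difference between the two options concrete, here is a minimal sketch of how they might resolve into pipeline steps. Everything in it is hypothetical and not auto-sklearn's actual code: the `NoPreprocessing` class, the `build_steps` helper, the placeholder step values, and the key spelling `data_preprocessing` are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple


class NoPreprocessing:
    """Hypothetical no-op step: fit does nothing, transform returns X unchanged."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


def build_steps(include: Optional[Dict[str, Any]] = None) -> List[Tuple[str, Any]]:
    """Hypothetical helper that turns the `include` option into pipeline steps.

    The string values stand in for real components; this is not how
    auto-sklearn builds its pipeline.
    """
    option = (include or {}).get("data_preprocessing", True)
    steps: List[Tuple[str, Any]] = []

    if option is False:
        # Option 1: boolean switch, the step is removed entirely.
        pass
    elif option == "no_preprocessing":
        # Option 2: the step stays in the pipeline but performs a no-op.
        steps.append(("data_preprocessing", NoPreprocessing()))
    elif option is True:
        # Current behaviour: placeholder for the default 'feature_type' step.
        steps.append(("data_preprocessing", "feature_type"))
    else:
        raise ValueError(f"Unsupported data_preprocessing option: {option!r}")

    # Placeholder for whichever estimator the search selects.
    steps.append(("classifier", "<estimator>"))
    return steps
```

With the boolean switch the pipeline has one fewer step, whereas the string option keeps the pipeline shape identical and only changes what the step does, which may make it the less invasive change for code that looks steps up by name.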