Description
- EDIT: After some investigation, this appears to have less to do with the configurations and more to do with imputation. After trying to recreate the failures, it seems that X arrays that reach a threshold of NaNs end up causing the configurations to fail. These NaNs are added randomly, which explains why the failures are infrequent.
- Edit2: "fast_ica" with "fun": "exp" fails if there is a majority of NaNs in the data.
- Edit3: "fast_ica" with "fast_ica:whiten": "False" fails with NaNs in the input.
- Edit4: "fast_ica" with "whiten": "False" fails even with no NaN values present.
- Edit5: "fast_ica" with the "iris" dataset works, even with a high occurrence of NaNs. It seems to depend more on the frequency of 0s in the dataset than on NaNs.
- Edit6: Trying to force a certain "feature_preprocessor:__choice__" is currently not possible. Manually editing the Config is not straightforward and should be approached once ConfigSpace.Configuration gets updated to allow for easier modification of a Config. See ConfigSpace issue #205 for why it's not straightforward to delete a key and add a new one.
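To make the "delete a key and add a new one" step concrete, here is a minimal sketch on a plain dictionary. It does not use ConfigSpace at all: it assumes the configuration has already been exported to a dict keyed like the listings in this issue, and `force_feature_preprocessor` is a hypothetical helper name, not part of auto-sklearn or ConfigSpace.

```python
from typing import Any, Dict, Optional

# Key prefix as it appears in the configurations listed in this issue.
PREFIX = "feature_preprocessor:"


def force_feature_preprocessor(
    config: Dict[str, Any],
    choice: str,
    hyperparameters: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a copy of `config` with the feature preprocessor swapped out.

    Hypothetical helper: works on a plain dict only and performs no
    validation against a configuration space.
    """
    # Drop every key that belongs to the previously chosen preprocessor.
    forced = {k: v for k, v in config.items() if not k.startswith(PREFIX)}
    # Add the new choice and its hyperparameters under the same prefix.
    forced[PREFIX + "__choice__"] = choice
    for name, value in (hyperparameters or {}).items():
        forced[f"{PREFIX}{choice}:{name}"] = value
    return forced


original = {
    "classifier:__choice__": "lda",
    "feature_preprocessor:__choice__": "kernel_pca",
    "feature_preprocessor:kernel_pca:kernel": "rbf",
    "feature_preprocessor:kernel_pca:n_components": 10,
}
forced = force_feature_preprocessor(
    original,
    "fast_ica",
    {"algorithm": "deflation", "fun": "exp", "whiten": "False"},
)
```

The resulting dictionary would still have to be turned back into a valid configuration and checked against the configuration space, which is the part that is currently not straightforward.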
We leave some randomness in the configurations that get tested when testing different classifier and regressor components. The configurations that failed are collected here:
- Python version 3.8
- Test: test/test_pipeline/test_classification.py::SimpleClassificationPipelineTest::test_configurations_sparse
Configuration:
balancing:strategy, Value: 'weighting'
classifier:__choice__, Value: 'sgd'
classifier:sgd:alpha, Value: 7.27693595714389e-05
classifier:sgd:average, Value: 'False'
classifier:sgd:eta0, Value: 0.013654826040547558
classifier:sgd:fit_intercept, Constant: 'True'
classifier:sgd:learning_rate, Value: 'invscaling'
classifier:sgd:loss, Value: 'log'
classifier:sgd:penalty, Value: 'l1'
classifier:sgd:power_t, Value: 0.5468767593727824
classifier:sgd:tol, Value: 8.162675288740052e-05
data_preprocessor:__choice__, Value: 'feature_type'
data_preprocessor:feature_type:categorical_transformer:categorical_encoding:__choice__, Value: 'encoding'
data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__, Value: 'no_coalescense'
data_preprocessor:feature_type:numerical_transformer:imputation:strategy, Value: 'mean'
data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__, Value: 'quantile_transformer'
data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:n_quantiles, Value: 467
data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:output_distribution, Value: 'normal'
feature_preprocessor:__choice__, Value: 'kernel_pca'
feature_preprocessor:kernel_pca:gamma, Value: 6.985386846337043
feature_preprocessor:kernel_pca:kernel, Value: 'rbf'
feature_preprocessor:kernel_pca:n_components, Value: 10
- Python version 3.8
- Test: test/test_pipeline/test_classification.py::SimpleClassificationPipelineTest::test_configurations_signed_data
Configuration:
balancing:strategy, Value: 'weighting'
classifier:__choice__, Value: 'lda'
classifier:lda:shrinkage, Value: 'auto'
classifier:lda:tol, Value: 0.038890093430048595
data_preprocessor:__choice__, Value: 'feature_type'
data_preprocessor:feature_type:categorical_transformer:categorical_encoding:__choice__, Value: 'encoding'
data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__, Value: 'minority_coalescer'
data_preprocessor:feature_type:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction, Value: 0.001521146558163954
data_preprocessor:feature_type:numerical_transformer:imputation:strategy, Value: 'mean'
data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__, Value: 'none'
feature_preprocessor:__choice__, Value: 'fast_ica'
feature_preprocessor:fast_ica:algorithm, Value: 'deflation'
feature_preprocessor:fast_ica:fun, Value: 'exp'
feature_preprocessor:fast_ica:whiten, Value: 'False'
- Python version 3.10
- Test: SimpleClassificationPipelineTest.test_configurations_sparse
Configuration:
balancing:strategy, Value: 'none'
classifier:__choice__, Value: 'qda'
classifier:qda:reg_param, Value: 0.7722372097734942
data_preprocessor:__choice__, Value: 'feature_type'
data_preprocessor:feature_type:categorical_transformer:categorical_encoding:__choice__, Value: 'encoding'
data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__, Value: 'no_coalescense'
data_preprocessor:feature_type:numerical_transformer:imputation:strategy, Value: 'median'
data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__, Value: 'quantile_transformer'
data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:n_quantiles, Value: 1761
data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:output_distribution, Value: 'normal'
feature_preprocessor:__choice__, Value: 'kernel_pca'
feature_preprocessor:kernel_pca:gamma, Value: 2.351280410584469
feature_preprocessor:kernel_pca:kernel, Value: 'rbf'
feature_preprocessor:kernel_pca:n_components, Value: 10
- Python version 3.8
- Test: SimpleClassificationPipelineTest.test_configurations_signed_data
Configuration:
balancing:strategy, Value: 'none'
classifier:__choice__, Value: 'gaussian_nb'
data_preprocessor:__choice__, Value: 'feature_type'
data_preprocessor:feature_type:categorical_transformer:categorical_encoding:__choice__, Value: 'encoding'
data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__, Value: 'no_coalescense'
data_preprocessor:feature_type:numerical_transformer:imputation:strategy, Value: 'most_frequent'
data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__, Value: 'quantile_transformer'
data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:n_quantiles, Value: 1004
data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:output_distribution, Value: 'normal'
feature_preprocessor:__choice__, Value: 'fast_ica'
feature_preprocessor:fast_ica:algorithm, Value: 'deflation'
feature_preprocessor:fast_ica:fun, Value: 'exp'
feature_preprocessor:fast_ica:whiten, Value: 'False'
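
When trying to recreate one of these failures, it can help to get a listing like the ones above into a dictionary first. The following is a rough sketch that assumes the `key, Value: ...` / `key, Constant: ...` line format shown in this issue; `parse_configuration` is a hypothetical helper, not an auto-sklearn or ConfigSpace function.

```python
import ast
import re
from typing import Any, Dict

# Matches lines such as "classifier:sgd:alpha, Value: 7.27e-05"
# or "classifier:sgd:fit_intercept, Constant: 'True'".
LINE = re.compile(r"^(?P<key>\S+), (?:Value|Constant): (?P<value>.+)$")


def parse_configuration(text: str) -> Dict[str, Any]:
    """Parse a pasted configuration listing into a plain dict.

    Lines that do not match the expected format (for example the
    "Configuration:" header) are skipped.
    """
    config: Dict[str, Any] = {}
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if match is None:
            continue
        # literal_eval turns "'exp'" into a str and "1004" into an int.
        config[match["key"]] = ast.literal_eval(match["value"])
    return config


sample = """
Configuration:
balancing:strategy, Value: 'none'
classifier:__choice__, Value: 'gaussian_nb'
data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:n_quantiles, Value: 1004
feature_preprocessor:__choice__, Value: 'fast_ica'
feature_preprocessor:fast_ica:fun, Value: 'exp'
feature_preprocessor:fast_ica:whiten, Value: 'False'
"""
parsed = parse_configuration(sample)
```

Note that values such as `'False'` stay strings, exactly as they are printed in the listings.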