Bug fixes for performance over time calculations #1602

Closed
wants to merge 11 commits into from

Conversation

nklemashev

I have found and corrected several bugs in the calculation of performance over time:

  1. Incorrect calculation of single_best_optimization_score for metrics that should be minimized (like MSE). When updating single_best_optimization_score, the code assumes that a greater score always means a better model. This is not true for loss metrics like MSE. As a result, the table returned by the .performance_over_time_ property can show the MSE of the best model increasing over time.
  2. No sorting by Timestamp when calculating single_best_optimization_score. Since the table is later sorted by Timestamp when merging with the ensemble scores, this can make single_best_optimization_score non-monotonic.
  3. Ensemble scores have the wrong sign for loss metrics like MSE. The ensemble scores for MSE are negative, while the values of single_best_optimization_score are positive. Recomputing MSE from the model predictions on the test set confirms that the reported ensemble scores are the negatives of the true MSE.
  4. The returned performance-over-time table contains exact duplicate rows.
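The intended behavior for bugs 1, 2, and 4 can be sketched in isolation. This is a minimal illustration with made-up scores and column names borrowed from the table below, not the actual auto-sklearn internals: sort by Timestamp first, pick a cumulative minimum or maximum according to the metric's direction, and drop duplicate rows.

```python
import pandas as pd

# Hypothetical run log for a minimized metric such as MSE (values made up).
runs = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(
            [
                "2022-10-26 08:32:06",  # deliberately out of order
                "2022-10-26 08:32:03",
                "2022-10-26 08:32:04",
            ]
        ),
        "optimization_score": [3100.0, 5547.0, 3567.0],
    }
)

# Bug 2: sort by Timestamp first, otherwise the running best is
# computed over an arbitrary row order.
runs = runs.sort_values("Timestamp").reset_index(drop=True)

# Bug 1: for a loss metric the running best is a cummin, not a cummax.
greater_is_better = False  # would be taken from the metric object
runs["single_best_optimization_score"] = (
    runs["optimization_score"].cummax()
    if greater_is_better
    else runs["optimization_score"].cummin()
)

# Bug 4: drop exact duplicate rows from the final table.
runs = runs.drop_duplicates()

print(runs["single_best_optimization_score"].tolist())
# → [5547.0, 3567.0, 3100.0], i.e. monotonically non-increasing, as a
#   running-best MSE should be
```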

All four bugs can be seen in the following MWE:

import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection

import autosklearn.ensembles
import autosklearn.regression
import matplotlib.pyplot as plt
from autosklearn.metrics import mean_squared_error

import pandas as pd
pd.options.display.max_rows = 100

X, y = sklearn.datasets.load_diabetes(return_X_y=True)

X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=24
)

params = {
    'allow_string_features': False,
    'dask_client': None,
    'dataset_compression': False,
    'delete_tmp_folder_after_terminate': True,
    'disable_evaluator_output': False,
    'ensemble_class': autosklearn.ensembles.ensemble_selection.EnsembleSelection,
    'ensemble_kwargs': {'ensemble_size': 1},
    'ensemble_nbest': 50,
    'ensemble_size': None,
    'exclude': None,
    'get_smac_object_callback': None,
    'get_trials_callback': None,
    'include': {
        'regressor': [
            'adaboost',
            'ard_regression',
            'decision_tree',
            'extra_trees',
            'gaussian_process',
            'gradient_boosting',
            'k_nearest_neighbors',
            'liblinear_svr',
            'libsvm_svr',
            'mlp',
            'random_forest',
            'sgd'
        ],
        'feature_preprocessor': [
            'densifier',
            'extra_trees_preproc_for_regression',
            'fast_ica',
            'feature_agglomeration',
            'kernel_pca',
            'kitchen_sinks',
            'no_preprocessing',
            'nystroem_sampler',
            'pca',
            'polynomial',
            'random_trees_embedding',
            'select_percentile_regression',
            'select_rates_regression',
            'truncatedSVD'
        ]
    },
    'initial_configurations_via_metalearning': 25,
    'load_models': True,
    'logging_config': None,
    'max_models_on_disc': 50,
    'memory_limit': 3072,
    'metadata_directory': None,
    'metric': mean_squared_error,
    'n_jobs': -1,
    'per_run_time_limit': 20,
    'resampling_strategy': 'holdout',
    'resampling_strategy_arguments': {
        'train_size': 0.67,
        'shuffle': True,
        'folds': 5
    },
    'scoring_functions': None,
    'seed': 24,
    'smac_scenario_args': None,
    'time_left_for_this_task': 60,
    'tmp_folder': None
}

automl = autosklearn.regression.AutoSklearnRegressor(
    **params
)
automl.fit(X_train, y_train, X_test, y_test)

train_predictions = automl.predict(X_train)
print("Train MSE:", sklearn.metrics.mean_squared_error(y_train, train_predictions))
test_predictions = automl.predict(X_test)
print("Test MSE:", sklearn.metrics.mean_squared_error(y_test, test_predictions))

pot = automl.performance_over_time_

print(pot)

The output of this script is

Train MSE: 2856.9978041757504
Test MSE: 2861.3524999950955
                    Timestamp  ensemble_optimization_score  \
0  2022-10-26 08:32:03.830196                 -3390.133471   
28 2022-10-26 08:32:04.000000                 -3390.133471   
29 2022-10-26 08:32:04.000000                 -3390.133471   
30 2022-10-26 08:32:04.000000                 -3390.133471   
32 2022-10-26 08:32:04.000000                 -3390.133471   
33 2022-10-26 08:32:04.000000                 -3390.133471   
34 2022-10-26 08:32:04.000000                 -3390.133471   
35 2022-10-26 08:32:04.000000                 -3390.133471   
36 2022-10-26 08:32:04.000000                 -3390.133471   
37 2022-10-26 08:32:04.000000                 -3390.133471   
38 2022-10-26 08:32:04.000000                 -3390.133471   
27 2022-10-26 08:32:04.000000                 -3390.133471   
39 2022-10-26 08:32:04.000000                 -3390.133471   
41 2022-10-26 08:32:04.000000                 -3390.133471   
42 2022-10-26 08:32:04.000000                 -3390.133471   
43 2022-10-26 08:32:04.000000                 -3390.133471   
44 2022-10-26 08:32:04.000000                 -3390.133471   
45 2022-10-26 08:32:04.000000                 -3390.133471   
46 2022-10-26 08:32:04.000000                 -3390.133471   
47 2022-10-26 08:32:04.000000                 -3390.133471   
48 2022-10-26 08:32:04.000000                 -3390.133471   
49 2022-10-26 08:32:04.000000                 -3390.133471   
50 2022-10-26 08:32:04.000000                 -3390.133471   
40 2022-10-26 08:32:04.000000                 -3390.133471   
26 2022-10-26 08:32:04.000000                 -3390.133471   
31 2022-10-26 08:32:04.000000                 -3390.133471   
24 2022-10-26 08:32:04.000000                 -3390.133471   
25 2022-10-26 08:32:04.000000                 -3390.133471   
1  2022-10-26 08:32:04.505665                 -3249.377552   
2  2022-10-26 08:32:04.505665                 -3249.377552   
3  2022-10-26 08:32:04.505665                 -3249.377552   
4  2022-10-26 08:32:04.505665                 -3249.377552   
23 2022-10-26 08:32:05.000000                 -3249.377552   
5  2022-10-26 08:32:06.192514                 -3114.696954   
6  2022-10-26 08:32:06.192514                 -3114.696954   
8  2022-10-26 08:32:06.192514                 -3114.696954   
9  2022-10-26 08:32:06.192514                 -3114.696954   
10 2022-10-26 08:32:06.192514                 -3114.696954   
11 2022-10-26 08:32:06.192514                 -3114.696954   
7  2022-10-26 08:32:06.192514                 -3114.696954   
13 2022-10-26 08:32:06.192514                 -3114.696954   
14 2022-10-26 08:32:06.192514                 -3114.696954   
15 2022-10-26 08:32:06.192514                 -3114.696954   
16 2022-10-26 08:32:06.192514                 -3114.696954   
17 2022-10-26 08:32:06.192514                 -3114.696954   
18 2022-10-26 08:32:06.192514                 -3114.696954   
19 2022-10-26 08:32:06.192514                 -3114.696954   
20 2022-10-26 08:32:06.192514                 -3114.696954   
21 2022-10-26 08:32:06.192514                 -3114.696954   
22 2022-10-26 08:32:06.192514                 -3114.696954   
12 2022-10-26 08:32:06.192514                 -3114.696954   
54 2022-10-26 08:32:24.000000                 -3114.696954   
53 2022-10-26 08:32:24.000000                 -3114.696954   
51 2022-10-26 08:32:24.000000                 -3114.696954   
52 2022-10-26 08:32:24.000000                 -3114.696954   
61 2022-10-26 08:32:25.000000                 -3114.696954   
55 2022-10-26 08:32:25.000000                 -3114.696954   
56 2022-10-26 08:32:25.000000                 -3114.696954   
57 2022-10-26 08:32:25.000000                 -3114.696954   
58 2022-10-26 08:32:25.000000                 -3114.696954   
59 2022-10-26 08:32:25.000000                 -3114.696954   
60 2022-10-26 08:32:25.000000                 -3114.696954   
62 2022-10-26 08:32:25.000000                 -3114.696954   

    ensemble_test_score  single_best_optimization_score  \
0          -2908.654161                             NaN   
28         -2908.654161                     5547.237465   
29         -2908.654161                     5547.237465   
30         -2908.654161                     5547.237465   
32         -2908.654161                     5547.237465   
33         -2908.654161                     5547.237465   
34         -2908.654161                     5547.237465   
35         -2908.654161                     5547.237465   
36         -2908.654161                     5547.237465   
37         -2908.654161                     5547.237465   
38         -2908.654161                     5547.237465   
27         -2908.654161                     5547.237465   
39         -2908.654161                     5547.237465   
41         -2908.654161                     5547.237465   
42         -2908.654161                     5547.237465   
43         -2908.654161                     5547.237465   
44         -2908.654161                     5547.237465   
45         -2908.654161                     5547.237465   
46         -2908.654161                     5547.237465   
47         -2908.654161                     5547.237465   
48         -2908.654161                     5547.237465   
49         -2908.654161                     5547.237465   
50         -2908.654161                     5547.237465   
40         -2908.654161                     5547.237465   
26         -2908.654161                     5547.237465   
31         -2908.654161                     5547.237465   
24         -2908.654161                     5547.237465   
25         -2908.654161                     5547.237465   
1          -3056.364840                     5547.237465   
2          -3056.364840                     5547.237465   
3          -3056.364840                     5547.237465   
4          -3056.364840                     5547.237465   
23         -3056.364840                     3566.974222   
5          -2861.352500                     3566.974222   
6          -2861.352500                     3566.974222   
8          -2861.352500                     3566.974222   
9          -2861.352500                     3566.974222   
10         -2861.352500                     3566.974222   
11         -2861.352500                     3566.974222   
7          -2861.352500                     3566.974222   
13         -2861.352500                     3566.974222   
14         -2861.352500                     3566.974222   
15         -2861.352500                     3566.974222   
16         -2861.352500                     3566.974222   
17         -2861.352500                     3566.974222   
18         -2861.352500                     3566.974222   
19         -2861.352500                     3566.974222   
20         -2861.352500                     3566.974222   
21         -2861.352500                     3566.974222   
22         -2861.352500                     3566.974222   
12         -2861.352500                     3566.974222   
54         -2861.352500                     5552.368526   
53         -2861.352500                     5552.368526   
51         -2861.352500                     5552.368526   
52         -2861.352500                     5552.368526   
61         -2861.352500                     6140.768623   
55         -2861.352500                     6140.768623   
56         -2861.352500                     6140.768623   
57         -2861.352500                     6140.768623   
58         -2861.352500                     6140.768623   
59         -2861.352500                     6140.768623   
60         -2861.352500                     6140.768623   
62         -2861.352500                     6140.768623   

    single_best_train_score  single_best_test_score  
0                       NaN                     NaN  
28                 8.204380             5956.516151  
29                 8.204380             5956.516151  
30                 8.204380             5956.516151  
32                 8.204380             5956.516151  
33                 8.204380             5956.516151  
34                 8.204380             5956.516151  
35                 8.204380             5956.516151  
36                 8.204380             5956.516151  
37                 8.204380             5956.516151  
38                 8.204380             5956.516151  
27                 8.204380             5956.516151  
39                 8.204380             5956.516151  
41                 8.204380             5956.516151  
42                 8.204380             5956.516151  
43                 8.204380             5956.516151  
44                 8.204380             5956.516151  
45                 8.204380             5956.516151  
46                 8.204380             5956.516151  
47                 8.204380             5956.516151  
48                 8.204380             5956.516151  
49                 8.204380             5956.516151  
50                 8.204380             5956.516151  
40                 8.204380             5956.516151  
26                 8.204380             5956.516151  
31                 8.204380             5956.516151  
24                 8.204380             5956.516151  
25                 8.204380             5956.516151  
1                  8.204380             5956.516151  
2                  8.204380             5956.516151  
3                  8.204380             5956.516151  
4                  8.204380             5956.516151  
23               458.863654             3095.679501  
5                458.863654             3095.679501  
6                458.863654             3095.679501  
8                458.863654             3095.679501  
9                458.863654             3095.679501  
10               458.863654             3095.679501  
11               458.863654             3095.679501  
7                458.863654             3095.679501  
13               458.863654             3095.679501  
14               458.863654             3095.679501  
15               458.863654             3095.679501  
16               458.863654             3095.679501  
17               458.863654             3095.679501  
18               458.863654             3095.679501  
19               458.863654             3095.679501  
20               458.863654             3095.679501  
21               458.863654             3095.679501  
22               458.863654             3095.679501  
12               458.863654             3095.679501  
54                 4.928092             5961.683783  
53                 4.928092             5961.683783  
51                 4.928092             5961.683783  
52                 4.928092             5961.683783  
61                 0.000000             5977.949471  
55                 0.000000             5977.949471  
56                 0.000000             5977.949471  
57                 0.000000             5977.949471  
58                 0.000000             5977.949471  
59                 0.000000             5977.949471  
60                 0.000000             5977.949471  
62                 0.000000             5977.949471

A nicer check of these issues, with plots, is presented in the Jupyter notebook: Bugs MWE.

The same notebook run against the corrected autosklearn code: Bugs MWE Corrected.

I have slightly modified the code of the performance_over_time_ property of the AutoML class to fix these issues.

P.S.: I also modified the Makefile to prevent the black command from reformatting everything in the directory that contains autosklearn. Following your contribution guidelines, I created a Python virtual environment directory my-virtual-env in the same directory as autosklearn, and running black autosklearn/.* reformats all Python packages in my-virtual-env.

@eddiebergman
Contributor

Heyo, thanks for the contribution :) I've run the tests and hopefully they will all pass. I will do a pass and leave a review once I can!

@nklemashev
Author

nklemashev commented Oct 28, 2022

All the errors seem unrelated to the files I have modified. I have rerun pytest in my Docker image, where I modified the py-files. It shows some warnings, but only for files I have not touched, and no test failed. To double-check, I ran pytest --last-failed and got the message run-last-failure: no previously failed tests, not deselecting items.

@carlosnatalino

Hi,

I'm a new user and was wondering: is there an explanation of what each of the curves in performance_over_time_ means? I checked the documentation but could not find it anywhere. The example here has a list of items but no explanation of what they mean. Any help is appreciated.

@nklemashev
Author

@carlosnatalino, as far as I understand,

  • single_best_optimization_score -- the best value a single model attains for the metric you optimize, computed by the validation method specified in the resampling_strategy parameter;
  • single_best_train_score -- the value of the metric you optimize on the training set for the best single model (as defined by the optimization score); the training set here is determined by the X, y parameters of the fit method and the resampling_strategy parameter of the AutoML class, so it is either a fixed part of the initial training set or defined by cross-validation;
  • single_best_test_score -- the value of the metric you optimize on the test set for the best single model (as defined by the optimization score); the test set here is given by the X_test, y_test parameters of the fit method;
  • ensemble_optimization_score -- the best value attained by the ensemble for the metric you optimize, computed by the validation method specified in the resampling_strategy parameter;
  • ensemble_test_score -- the value of the metric you optimize on the test set for the best ensemble (as defined by the optimization score); the test set here is given by the X_test, y_test parameters of the fit method.
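As a quick way to inspect these columns, the table can be indexed by Timestamp so that each score column plots as one curve. A minimal sketch with a synthetic stand-in for performance_over_time_ (only the column names follow the real table; the values are made up):

```python
import pandas as pd

# Synthetic stand-in for automl.performance_over_time_; values are made up.
pot = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(
            ["2022-10-26 08:32:03", "2022-10-26 08:32:04", "2022-10-26 08:32:06"]
        ),
        "ensemble_optimization_score": [-3390.1, -3249.4, -3114.7],
        "single_best_optimization_score": [float("nan"), 5547.2, 3567.0],
    }
)

# Early rows can hold NaN before the first model finishes; forward-fill
# them, then index by time so each score column becomes one curve.
pot = pot.set_index("Timestamp").ffill()
# pot.plot()  # requires matplotlib: x-axis is time, one line per score

# The last row gives the final value of each curve.
print(pot.iloc[-1].to_dict())
# → {'ensemble_optimization_score': -3114.7,
#    'single_best_optimization_score': 3567.0}
```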

@carlosnatalino

Hi @nklemashev thank you very much for the explanation. Very appreciated!

@eddiebergman
Contributor

Hi @nklemashev,

Apologies for the delay; I was on vacation. I reran the tests and will look at this either today or tomorrow!

Best,
Eddie

@aron-bram
Collaborator

All the errors seem to have no relation to the files I have modified. I have rerun pytest on my docker image, where I modified the py-files. It shows some warnings, but they are for files I have not touched. And no test has failed. To double check I ran pytest --last-failed and noticed the message run-last-failure: no previously failed tests, not deselecting items.

I ran into the same issue in PR #1606. The error likely occurs because pytest-xdist moved some functionality to another package called pytest-forked, so the --forked flag is no longer recognized. This should be fixed soon, however, and thereafter the failed tests should pass.

@eddiebergman
Contributor

As an update, I am now half back and pushing to update scikit-learn and some of the internal dependencies before moving on to these PRs. Apologies for the delays; your PR is definitely appreciated.

@github-actions github-actions bot added the Stale label Jan 15, 2023
@github-actions github-actions bot closed this Jan 22, 2023