Issue with array dimension error in regression models #1297

Closed
@PanyiDong

Describe the bug

I'm calling some of the regression methods provided in auto-sklearn for my project, and an error occurs when using mlp/libsvm_svr/sgd. The exact error message is (the printed 1D array is omitted):

~/anaconda3/lib/python3.8/site-packages/autosklearn/pipeline/components/regression/libsvm_svr.py in predict(self, X)
    100             raise NotImplementedError
    101         Y_pred = self.estimator.predict(X)
--> 102         return self.scaler.inverse_transform(Y_pred)
    103 
    104     @staticmethod

~/anaconda3/lib/python3.8/site-packages/sklearn/preprocessing/_data.py in inverse_transform(self, X, copy)
   1014 
   1015         copy = copy if copy is not None else self.copy
-> 1016         X = check_array(
   1017             X,
   1018             accept_sparse="csr",

~/anaconda3/lib/python3.8/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    759             # If input is 1D raise error
    760             if array.ndim == 1:
--> 761                 raise ValueError(
    762                     "Expected 2D array, got 1D array instead:\narray={}.\n"
    763                     "Reshape your data either using array.reshape(-1, 1) if "

ValueError: Expected 2D array, got 1D array instead:

The same error occurs for autosklearn/pipeline/components/regression/mlp.py, autosklearn/pipeline/components/regression/libsvm_svr.py and autosklearn/pipeline/components/regression/sgd.py.

To Reproduce

Test data: https://www.kaggle.com/tejashvi14/medical-insurance-premium-prediction/download
Use "PremiumPrice" as the response (y) and the other variables as features (X).

  1. Call the above three models with a fit/predict workflow. The error above appears at the predict stage.
  2. Or use AutoSklearnRegressor.
    Fit stage (the time limit is only there to save time; I don't expect it to return anything meaningful):
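from autosklearn.regression import AutoSklearnRegressor

# Assumed setup: data is a pandas DataFrame loaded from the test data above,
# response = "PremiumPrice", and features is the list of all other columns.
reg = AutoSklearnRegressor(
    time_left_for_this_task=360,
    include={'regressor': ['mlp']},
)
reg.fit(data[features], data[[response]])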

Predict Stage

reg.predict(data[features])

The training stage emits a very large number of warnings of the form [WARNING] [2021-11-09 15:14:31,628:Client-AutoMLSMBO(1)::079213e7-41a2-11ec-97c8-00155d1712a6] Configuration 119 not found (with different numbers in place of 119).
For AutoSklearnRegressor, predict just returns a (n_samples,) numpy array whose elements are all identical (close to the mean of the response, but not exactly equal to it), which I don't think is the intended behavior.

Output of the predict stage (only the first few lines are shown; the rest are identical):

array([24110.60546875, 24110.60546875, 24110.60546875, 24110.60546875,
       24110.60546875, 24110.60546875, 24110.60546875, 24110.60546875,
       24110.60546875, 24110.60546875, 24110.60546875, 24110.60546875,
       24110.60546875, 24110.60546875, 24110.60546875, 24110.60546875,

Reason for the Problem

I think the problem is caused by the standardization of the target (sklearn.preprocessing.StandardScaler) used in autosklearn/pipeline/components/regression/mlp.py, autosklearn/pipeline/components/regression/libsvm_svr.py and autosklearn/pipeline/components/regression/sgd.py.

The code below is extracted from autosklearn/pipeline/components/regression/sgd.py, iterative_fit, lines 92-95:

self.scaler = sklearn.preprocessing.StandardScaler(copy=True)
self.scaler.fit(y.reshape((-1, 1)))
Y_scaled = self.scaler.transform(y.reshape((-1, 1))).ravel()
self.estimator.fit(X, Y_scaled)

And in the predict method, lines 131-132:

Y_pred = self.estimator.predict(X)
return self.scaler.inverse_transform(Y_pred)

Y_pred, as returned by the estimator's predict method, is a (n_samples,) numpy array, while inverse_transform of StandardScaler requires a (n_samples, 1) array. The correction should be something like:

Y_pred = self.estimator.predict(X)
return self.scaler.inverse_transform(Y_pred.reshape(-1, 1)).ravel()

I think mlp and libsvm_svr have the same problem.
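
The shape mismatch can be checked without auto-sklearn. The snippet below is only a minimal sketch: the target values are made up for illustration, and y_pred is a stand-in for what estimator.predict(X) would return, not output from a real model. It fits a StandardScaler the same way the components do, then calls inverse_transform with both a 1D and a reshaped 2D array.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative target values (made up, not taken from the Kaggle data).
y = np.array([15000.0, 23000.0, 25000.0, 28000.0, 39000.0])

# Same pattern as in the components: fit the scaler on a column vector,
# then flatten the scaled target before handing it to the estimator.
scaler = StandardScaler(copy=True).fit(y.reshape(-1, 1))
y_scaled = scaler.transform(y.reshape(-1, 1)).ravel()

# Stand-in for estimator.predict(X): regressors return shape (n_samples,).
y_pred = y_scaled.copy()

# Passing the 1D array directly. With the scikit-learn version from the
# traceback (1.0.1) this raises ValueError; older releases may not check.
try:
    scaler.inverse_transform(y_pred)
    rejected = False
except ValueError:
    rejected = True
print("1D input rejected:", rejected)

# Proposed correction: reshape to (n_samples, 1), invert, flatten again.
restored = scaler.inverse_transform(y_pred.reshape(-1, 1)).ravel()
print(restored.shape)
```

With the reshape in place the result is again a flat (n_samples,) array on the original scale, which is what callers of predict expect.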

Environment and installation:

  • OS: Windows 11 Education, OS build 22000.282, WSL version 2 with Ubuntu 20.04.3 LTS (run on WSL)
  • Conda version: 4.10.3
  • Python version: 3.8.8
  • Sklearn version: 1.0.1
  • Auto-sklearn version: 0.14.0
