Issue with array dimension error in regression models

## Describe the bug ##
I'm calling some of the regression components provided by auto-sklearn in my project, and an error is raised when using mlp, libsvm_svr, or sgd. The exact error message is below (the printed 1D array is omitted):
```python
~/anaconda3/lib/python3.8/site-packages/autosklearn/pipeline/components/regression/libsvm_svr.py in predict(self, X)
    100             raise NotImplementedError
    101         Y_pred = self.estimator.predict(X)
--> 102         return self.scaler.inverse_transform(Y_pred)
    103 
    104     @staticmethod

~/anaconda3/lib/python3.8/site-packages/sklearn/preprocessing/_data.py in inverse_transform(self, X, copy)
   1014 
   1015         copy = copy if copy is not None else self.copy
-> 1016         X = check_array(
   1017             X,
   1018             accept_sparse="csr",

~/anaconda3/lib/python3.8/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    759             # If input is 1D raise error
    760             if array.ndim == 1:
--> 761                 raise ValueError(
    762                     "Expected 2D array, got 1D array instead:\narray={}.\n"
    763                     "Reshape your data either using array.reshape(-1, 1) if "

ValueError: Expected 2D array, got 1D array instead:
```
The same error occurs for ```autosklearn/pipeline/components/regression/mlp.py```, ```autosklearn/pipeline/components/regression/libsvm_svr.py``` and ```autosklearn/pipeline/components/regression/sgd.py```.

## To Reproduce ##
Test data: https://www.kaggle.com/tejashvi14/medical-insurance-premium-prediction/download
"PremiumPrice" is used as the response/y and the other variables as features/X.
1. Call any of the three models above with a fit, then predict workflow. The error above appears at the predict stage.
2. Alternatively, use AutoSklearnRegressor, as I tried below.

Fit stage (the time limit is only there to save time; I don't expect it to return anything meaningful):
```python
from autosklearn.regression import AutoSklearnRegressor
reg = AutoSklearnRegressor(
    time_left_for_this_task = 360,
    include = {'regressor' : ['mlp']}
)
reg.fit(data[features], data[[response]])
```
Predict stage:
```python
reg.predict(data[features], data[[response]])
```
The training stage emits an enormous number of warnings such as ```[WARNING] [2021-11-09 15:14:31,628:Client-AutoMLSMBO(1)::079213e7-41a2-11ec-97c8-00155d1712a6] Configuration 119 not found``` (with a different number in place of 119 each time).
For AutoSklearnRegressor, predict just returns a (n_samples,) numpy array whose elements are all identical (close to the mean of the response, but not exactly equal to it), which I don't think is the intended behaviour.

Output of the predict stage (only the first few lines are shown; the rest are identical):
```python
array([24110.60546875, 24110.60546875, 24110.60546875, 24110.60546875,
       24110.60546875, 24110.60546875, 24110.60546875, 24110.60546875,
       24110.60546875, 24110.60546875, 24110.60546875, 24110.60546875,
       24110.60546875, 24110.60546875, 24110.60546875, 24110.60546875,
```

## Reason for the Problem ##
I think the problem is caused by the standardization of the target (```sklearn.preprocessing.StandardScaler```) used in ```autosklearn/pipeline/components/regression/mlp.py```, ```autosklearn/pipeline/components/regression/libsvm_svr.py``` and ```autosklearn/pipeline/components/regression/sgd.py```.

The code below is extracted from ```autosklearn/pipeline/components/regression/sgd.py```, iterative_fit, lines 92-95:
```python
self.scaler = sklearn.preprocessing.StandardScaler(copy=True)
self.scaler.fit(y.reshape((-1, 1)))
Y_scaled = self.scaler.transform(y.reshape((-1, 1))).ravel()
self.estimator.fit(X, Y_scaled)
```
And in the predict method, lines 131-132:
```python
Y_pred = self.estimator.predict(X)
return self.scaler.inverse_transform(Y_pred)
```
Y_pred, as returned by the estimator's predict method, is a 1D numpy array of shape (n_samples,), while inverse_transform of StandardScaler requires a 2D array of shape (n_samples, 1), as the traceback above shows. The correction should be something like:
```python
Y_pred = self.estimator.predict(X)
return self.scaler.inverse_transform(Y_pred.reshape(-1, 1)).ravel()
```
I think mlp and libsvm_svr have the same problem, since the traceback above shows the same inverse_transform call in libsvm_svr.py.
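
The shape mismatch can be checked without auto-sklearn at all. Below is a minimal sketch using only scikit-learn and numpy; the target values are made up for illustration, and `y_pred` is just a stand-in for what `self.estimator.predict(X)` would return (a 1D array), not actual model output. On scikit-learn 1.0.1 I would expect the unreshaped call to raise the ValueError from the traceback; other versions may behave differently.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up target values, only for illustration (not from the Kaggle data).
y = np.array([15000.0, 23000.0, 25000.0, 38000.0])

# Same scaling steps as in iterative_fit: fit on a column vector, then flatten.
scaler = StandardScaler(copy=True)
scaler.fit(y.reshape((-1, 1)))
y_scaled = scaler.transform(y.reshape((-1, 1))).ravel()

# Stand-in for self.estimator.predict(X): a 1D array of shape (n_samples,).
y_pred = y_scaled.copy()

# Passing the 1D array directly, as the current predict method does.
try:
    scaler.inverse_transform(y_pred)
    raised = False
except ValueError:
    raised = True
print("1D input raised ValueError:", raised)

# Proposed correction: reshape to (n_samples, 1), invert, flatten again.
restored = scaler.inverse_transform(y_pred.reshape(-1, 1)).ravel()
print(restored.shape, np.allclose(restored, y))
```

With the reshape in place, the output keeps the (n_samples,) shape that callers of predict already expect, so nothing downstream should need to change.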

## Environment and installation: ##

* OS: Windows 11 Education, OS build 22000.282, WSL version 2 with Ubuntu 20.04.3 LTS (run on WSL)
* Conda version: 4.10.3
* Python version: 3.8.8
* Sklearn version: 1.0.1
* Auto-sklearn version: 0.14.0

