ValueError: Incompatible dataset entered for current task,expected dataset to have task type :tabular_regression got :tabular_classification #352

Closed
@nilsplettenberg

Description

  • I'm submitting a ...
    • bug report

Issue Description

I've tested Auto-PyTorch with the example code on multiple OpenML datasets. For some of them, I get a ValueError about an incompatible dataset even though y is a DataFrame with a numeric column, e.g. OpenML dataset 574.
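
One possible explanation, stated as an assumption rather than a confirmed diagnosis: if the task type is inferred from the target values with a heuristic like scikit-learn's `type_of_target`, then a numeric target whose values are all whole numbers is labelled "multiclass" instead of "continuous", and a regression task would reject it. The sketch below is a simplified stand-in for that heuristic, not Auto-PyTorch's or scikit-learn's code. `guess_target_type` is a hypothetical helper, and the sample values are made up for illustration.

```python
from typing import Iterable


def guess_target_type(values: Iterable[float]) -> str:
    """Roughly classify a numeric target the way an integer-vs-float heuristic would.

    Hypothetical helper for illustration only.
    """
    vals = [float(v) for v in values]
    # Any value with a fractional part marks the target as continuous.
    if any(not v.is_integer() for v in vals):
        return "continuous"
    # Only whole numbers: treated as class labels.
    return "binary" if len(set(vals)) <= 2 else "multiclass"


# Whole-number prices look like class labels to this heuristic.
print(guess_target_type([130400.0, 95000.0, 210500.0]))
# Values with fractional parts are recognised as a regression target.
print(guess_target_type([0.12, 0.5, 0.93]))
```

If this is the cause, checking y_train with `sklearn.utils.multiclass.type_of_target` before calling `api.search` should report "multiclass" for the affected datasets.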

Your Code

from autoPyTorch.api.tabular_regression import TabularRegressionTask
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

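# columns used as features and as target
input_cols = [
    "P1",
    "P5p1",
    "P6p2",
    "P11p4",
    "P14p9",
    "P15p1",
    "P15p3",
    "P16p2",
    "P18p2",
    "P27p4",
    "H2p2",
    "H8p2",
    "H10p1",
    "H13p1",
    "H18pA",
    "H40p4",
]
output_cols = ["price"]

# read dataframe
df = pd.read_csv("data.csv")

# normalize data; the scaler returns a NumPy array, so wrap it in a
# DataFrame again to keep the column labels used for selection below
scaler = MinMaxScaler()
df = pd.DataFrame(scaler.fit_transform(df), columns=df.columns, index=df.index)

# split data
X = df[input_cols]
y = df[output_cols]
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=1)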

# train model
api = TabularRegressionTask(n_jobs=7)
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test.copy(),
    y_test=y_test.copy(),
    optimize_metric='r2',
    total_walltime_limit=3600,
)

Error message

api.search(
  File "/home/ubuntu/AutoPytorchValidation/venv/lib/python3.8/site-packages/autoPyTorch/api/tabular_regression.py", line 300, in search
    return self._search(
  File "/home/ubuntu/AutoPytorchValidation/venv/lib/python3.8/site-packages/autoPyTorch/api/base_task.py", line 876, in _search
    raise ValueError("Incompatible dataset entered for current task,"
ValueError: Incompatible dataset entered for current task,expected dataset to have task type :tabular_regression got :tabular_classification

Your Local environment

absl-py==1.0.0
autoPyTorch==0.1.1
backcall==0.2.0
cachetools==4.2.4
catboost==1.0.3
certifi==2021.10.8
charset-normalizer==2.0.8
click==8.0.3
cloudpickle==2.0.0
ConfigSpace==0.4.20
cycler==0.11.0
Cython==0.29.24
dask==2021.11.2
debugpy==1.5.1
decorator==5.1.0
distributed==2021.11.2
entrypoints==0.3
flaky==3.7.0
fonttools==4.28.2
fsspec==2021.11.1
google-auth==2.3.3
google-auth-oauthlib==0.4.6
graphviz==0.19
grpcio==1.42.0
HeapDict==1.0.1
idna==3.3
imageio==2.13.0
imgaug==0.4.0
importlib-metadata==4.8.2
ipykernel==6.5.1
ipython==7.30.0
jedi==0.18.1
Jinja2==3.0.3
joblib==1.1.0
jupyter-client==7.1.0
jupyter-core==4.9.1
kiwisolver==1.3.2
lightgbm==3.3.1
locket==0.2.1
lockfile==0.12.2
Markdown==3.3.6
MarkupSafe==2.0.1
matplotlib==3.5.0
matplotlib-inline==0.1.3
msgpack==1.0.3
nest-asyncio==1.5.1
networkx==2.6.3
numpy==1.21.4
oauthlib==3.1.1
opencv-python==4.5.4.60
packaging==21.3
pandas==1.3.4
parso==0.8.2
partd==1.2.0
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.4.0
pkg_resources==0.0.0
plotly==5.4.0
prompt-toolkit==3.0.23
protobuf==3.19.1
psutil==5.8.0
ptyprocess==0.7.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
Pygments==2.10.0
pynisher==0.6.4
pyparsing==3.0.6
pyrfr==0.8.2
python-dateutil==2.8.2
pytz==2021.3
PyWavelets==1.2.0
PyYAML==6.0
pyzmq==22.3.0
requests==2.26.0
requests-oauthlib==1.3.0
rsa==4.8
scikit-image==0.18.3
scikit-learn==0.24.2
scipy==1.7.3
setuptools-scm==6.3.2
Shapely==1.8.0
six==1.16.0
smac==0.14.0
sortedcontainers==2.4.0
tabulate==0.8.9
tblib==1.7.0
tenacity==8.0.1
tensorboard==2.7.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.0
threadpoolctl==3.0.0
tifffile==2021.11.2
tomli==1.2.2
toolz==0.11.2
torch==1.10.0
torchvision==0.11.1
tornado==6.1
traitlets==5.1.1
typing_extensions==4.0.0
urllib3==1.26.7
wcwidth==0.2.5
Werkzeug==2.0.2
zict==2.0.0
zipp==3.6.0

Data set

data.csv

Labels: bug
