Commit 69d245d

ArlindKadra, ravinkohli, and nabenabe0928 committed
Bug fixes (#249)
* Update implementation
* Coding style fixes
* Implementation update
* Style fix
* Turn weighted loss into a constant again, implementation update
* Cocktail branch inconsistencies (#275)
* To nemo
* Revert change in T_curr as results conclusively prove it should be 0
* Revert cutmix change after data from run
* Final conclusion after results
* FIX bug in shake alpha beta
* Updated if is_training condition for shake drop
* Remove temp fix in row cutmic
* Cocktail fixes time debug (#286)
* preprocess inside data validator
* add time debug statements
* Add fixes for categorical data
* add fit_ensemble
* add arlind fix for swa and se
* fix bug in trainer choice fit
* fix ensemble bug
* Correct bug in cleanup
* Cleanup for removing time debug statements
* ablation for adversarial
* shuffle false in dataloader
* drop last false in dataloader
* fix bug for validation set, and cutout and cutmix
* shuffle = False
* Shake Shake updates (#287)
* To test locally
* fix bug in trainer choice fit
* fix ensemble bug
* Correct bug in cleanup
* To test locally
* Cleanup for removing time debug statements
* ablation for adversarial
* shuffle false in dataloader
* drop last false in dataloader
* fix bug for validation set, and cutout and cutmix
* To test locally
* shuffle = False
* To test locally
* updates to search space
* updates to search space
* update branch with search space
* undo search space update
* fix bug in shake shake flag
* limit to shake-even
* restrict to even even
* Add even even and others for shake-drop also
* fix bug in passing alpha beta method
* restrict to only even even
* fix silly bug:
* remove imputer and ordinal encoder for categorical transformer in feature validator
* Address comments from shuhei
* fix issues with ensemble fitting post hoc
* Address comments on the PR
* Fix flake and mypy errors
* Address comments from PR #286
* fix bug in embedding
* Update autoPyTorch/api/tabular_classification.py Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
* Update autoPyTorch/datasets/base_dataset.py Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
* Update autoPyTorch/datasets/base_dataset.py Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
* Update autoPyTorch/pipeline/components/training/trainer/base_trainer.py Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
* Address comments from shuhei
* adress comments from shuhei
* fix flake and mypy
* Update autoPyTorch/pipeline/components/training/trainer/RowCutMixTrainer.py Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
* Update autoPyTorch/pipeline/tabular_classification.py Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
* Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
* Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
* Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
* Apply suggestions from code review Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
* increase threads_per_worker
* fix bug in rowcutmix
* Enhancement for the tabular validator. (#291)
* Initial try at an enhancement for the tabular validator
* Adding a few type annotations
* Fixing bugs in implementation
* Adding wrongly deleted code part during rebase
* Fix bug in _get_args
* Fix bug in _get_args
* Addressing Shuhei's comments
* Address Shuhei's comments
* Refactoring code
* Refactoring code
* Typos fix and additional comments
* Replace nan in categoricals with simple imputer
* Remove unused function
* add comment
* Update autoPyTorch/data/tabular_feature_validator.py Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
* Update autoPyTorch/data/tabular_feature_validator.py Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
* Adding unit test for only nall columns in the tabular feature categorical evaluator
* fix bug in remove all nan columns
* Bug fix for making tests run by arlind
* fix flake errors in feature validator
* made typing code uniform
* Apply suggestions from code review Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
* address comments from shuhei
* address comments from shuhei (2) Co-authored-by: Ravin Kohli <kohliravin7@gmail.com> Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com> Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
* Apply suggestions from code review Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
* resolve code issues with new versions
* Address comments from shuhei
* make run_traditional_ml function
* implement suggestion from shuhei and fix bug in rowcutmixtrainer
* fix return type docstring
* add better documentation and fix bug in shake_drop_get_bl
* Apply suggestions from code review Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
* add test for comparator and other improvements based on PR comments
* fix bug in test
* [fix] Fix the condition in the raising error of all_nan_columns
* [refactor] Unite name conventions of numpy array and pandas dataframe
* [doc] Add the description about the tabular feature transformation
* [doc] Add the description of the tabular feature transformation
* address comments from arlind
* address comments from arlind
* change to as_tensor and address comments from arlind
* correct description for functions in data module Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com> Co-authored-by: Arlind Kadra <arlindkadra@gmail.com> Co-authored-by: nabenabe0928 <shuhei.watanabe.utokyo@gmail.com>
* Addressing Shuhei's comments
* flake8 problems fix
* Update autoPyTorch/api/base_task.py Add indent. Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>
* Update autoPyTorch/api/base_task.py Add indent. Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>
* Update autoPyTorch/data/tabular_feature_validator.py Add indentation. Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>
* Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Add line indentation. Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>
* Update autoPyTorch/data/tabular_feature_validator.py Validate if there is a column transformer since for sparse matrices we will not have one. Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>
* Update autoPyTorch/utils/implementations.py Delete uncommented line. Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>
* Allow the number of threads to be given by the user
* Removing unnecessary argument and refactoring the attribute.
* Addressing Ravin's comments
* Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Updating the function documentation according to the agreed style. Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>
* Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py Providing information on the wrong method provided for shake-shake regularization. Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
* add todo for backend and accept changes from shuhei
* Addressing Shuhei's and Ravin's comments
* Addressing Shuhei's and Ravin's comments, bug fix
* Update autoPyTorch/pipeline/components/setup/network_backbone/ResNetBackbone.py Improving code readibility. Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
* Update autoPyTorch/pipeline/components/setup/network_backbone/ResNetBackbone.py Improving consistency. Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
* bug fix Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com> Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com> Co-authored-by: nabenabe0928 <shuhei.watanabe.utokyo@gmail.com> Co-authored-by: Ravin Kohli <kohliravin7@gmail.com>
1 parent 50983f1 commit 69d245d

37 files changed: +1828 -427 lines changed

autoPyTorch/api/base_task.py

Lines changed: 283 additions & 54 deletions
Large diffs are not rendered by default.

autoPyTorch/api/tabular_classification.py

Lines changed: 2 additions & 0 deletions
@@ -390,6 +390,8 @@ def search(
         )


+        if self.dataset is None:
+            raise ValueError("`dataset` in {} must be initialized, but got None".format(self.__class__.__name__))
         return self._search(
             dataset=self.dataset,
             optimize_metric=optimize_metric,
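
The guard added to search() above fails fast when no dataset has been set, instead of letting a None value reach _search(). A minimal sketch of the same pattern follows; TaskSketch and its simplified _search() are hypothetical stand-ins for illustration, not autoPyTorch's actual classes.

```python
from typing import List, Optional


class TaskSketch:
    """Hypothetical stand-in for a task class (not autoPyTorch's BaseTask)."""

    def __init__(self) -> None:
        # The dataset is only populated later, so it starts out as None.
        self.dataset: Optional[List[int]] = None

    def _search(self, dataset: List[int]) -> str:
        # Placeholder for the real search; it only reports the dataset size.
        return "searched {} rows".format(len(dataset))

    def search(self) -> str:
        # Same guard as in the diff: raise a clear error before delegating.
        if self.dataset is None:
            raise ValueError(
                "`dataset` in {} must be initialized, but got None".format(self.__class__.__name__)
            )
        return self._search(dataset=self.dataset)
```

Besides the clearer error message, the check lets a type checker such as mypy narrow Optional[...] to the concrete type before the call to _search().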

autoPyTorch/api/tabular_regression.py

Lines changed: 2 additions & 1 deletion
@@ -69,7 +69,6 @@ class TabularRegressionTask(BaseTask):
             search space updates that can be used to modify the search
             space of particular components or choice modules of the pipeline
     """
-
     def __init__(
         self,
         seed: int = 1,
@@ -388,6 +387,8 @@ def search(
         )


+        if self.dataset is None:
+            raise ValueError("`dataset` in {} must be initialized, but got None".format(self.__class__.__name__))
         return self._search(
             dataset=self.dataset,
             optimize_metric=optimize_metric,

autoPyTorch/data/base_feature_validator.py

Lines changed: 36 additions & 9 deletions
@@ -1,5 +1,5 @@
 import logging
-from typing import List, Optional, Union
+from typing import List, Optional, Set, Tuple, Union

 import numpy as np

@@ -35,24 +35,21 @@ class BaseFeatureValidator(BaseEstimator):
             List of the column types found by this estimator during fit.
         data_type (str):
             Class name of the data type provided during fit.
-        column_transformer (Optional[BaseEstimator])
+        encoder (Optional[BaseEstimator])
             Host a encoder object if the data requires transformation (for example,
-            if provided a categorical column in a pandas DataFrame)
-        transformed_columns (List[str])
-            List of columns that were encoded.
+            if provided a categorical column in a pandas DataFrame).
     """
     def __init__(
         self,
         logger: Optional[Union[PicklableClientLogger, logging.Logger]] = None,
-    ):
+    ) -> None:
         # Register types to detect unsupported data format changes
         self.feat_type: Optional[List[str]] = None
         self.data_type: Optional[type] = None
         self.dtypes: List[str] = []
         self.column_order: List[str] = []

         self.column_transformer: Optional[BaseEstimator] = None
-        self.transformed_columns: List[str] = []

         self.logger: Union[
             PicklableClientLogger, logging.Logger
@@ -64,6 +61,8 @@ def __init__(
         self.categorical_columns: List[int] = []
         self.numerical_columns: List[int] = []

+        self.all_nan_columns: Optional[Set[Union[int, str]]] = None
+
         self._is_fitted = False

     def fit(
@@ -86,7 +85,7 @@ def fit(

         # If a list was provided, it will be converted to pandas
         if isinstance(X_train, list):
-            X_train, X_test = self.list_to_dataframe(X_train, X_test)
+            X_train, X_test = self.list_to_pandas(X_train, X_test)

         self._check_data(X_train)

@@ -120,6 +119,7 @@ def _fit(
             self:
                 The fitted base estimator
         """
+
         raise NotImplementedError()

     def _check_data(
@@ -129,11 +129,12 @@ def _check_data(
         """
         Feature dimensionality and data type checks

-        Arguments:
+        Args:
             X (SUPPORTED_FEAT_TYPES):
                 A set of features that are going to be validated (type and dimensionality
                 checks) and a encoder fitted in the case the data needs encoding
         """
+
         raise NotImplementedError()

     def transform(
@@ -150,4 +151,30 @@ def transform(
             np.ndarray:
                 The transformed array
         """
+
+        raise NotImplementedError()
+
+    def list_to_pandas(
+        self,
+        X_train: SUPPORTED_FEAT_TYPES,
+        X_test: Optional[SUPPORTED_FEAT_TYPES] = None,
+    ) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
+        """
+        Converts a list to a pandas DataFrame. In this process, column types are inferred.
+
+        If test data is provided, we proactively match it to train data
+
+        Args:
+            X_train (SUPPORTED_FEAT_TYPES):
+                A set of features that are going to be validated (type and dimensionality
+                checks) and a encoder fitted in the case the data needs encoding
+            X_test (Optional[SUPPORTED_FEAT_TYPES]):
+                A hold out set of data used for checking
+        Returns:
+            pd.DataFrame:
+                transformed train data from list to pandas DataFrame
+            pd.DataFrame:
+                transformed test data from list to pandas DataFrame
+        """
+
         raise NotImplementedError()
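
The base class only declares list_to_pandas() and the all_nan_columns attribute; concrete validators are expected to supply the behavior. The sketch below shows one way the docstring's contract could be met with plain pandas. The function names list_to_pandas_sketch and find_all_nan_columns are hypothetical, and the dtype-matching strategy is an assumption made for illustration, not a copy of autoPyTorch's implementation.

```python
from typing import List, Optional, Set, Tuple, Union

import pandas as pd


def list_to_pandas_sketch(
    X_train: List[list],
    X_test: Optional[List[list]] = None,
) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
    # Build the frame, then let pandas infer better dtypes for object columns.
    train_df = pd.DataFrame(data=X_train).infer_objects()
    test_df = None
    if X_test is not None:
        # Assumption: "matching" the test data means casting it to the dtypes
        # inferred on the training data.
        test_df = pd.DataFrame(data=X_test).astype(train_df.dtypes.to_dict())
    return train_df, test_df


def find_all_nan_columns(X: pd.DataFrame) -> Set[Union[int, str]]:
    # A column qualifies only if every one of its entries is missing.
    mask = X.isna().all(axis=0)
    return set(mask[mask].index)
```

Casting the test frame to the training dtypes keeps both frames consistent for any encoder fitted later, though the cast would raise if the test data contained values that do not fit the inferred training dtype.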

autoPyTorch/data/base_target_validator.py

Lines changed: 3 additions & 2 deletions
@@ -48,7 +48,7 @@ def __init__(self,
                      logging.Logger
                  ]
                  ] = None,
-                 ):
+                 ) -> None:
         self.is_classification = is_classification

         self.data_type: Optional[type] = None
@@ -98,6 +98,7 @@ def fit(
                     np.shape(y_test)
                 ))
             if isinstance(y_train, pd.DataFrame):
+                y_train = cast(pd.DataFrame, y_train)
                 y_test = cast(pd.DataFrame, y_test)
                 if y_train.columns.tolist() != y_test.columns.tolist():
                     raise ValueError(
@@ -143,7 +144,7 @@ def _fit(

     def transform(
         self,
-        y: Union[SUPPORTED_TARGET_TYPES],
+        y: SUPPORTED_TARGET_TYPES,
     ) -> np.ndarray:
         """
         Args: