Description
Currently we accept the following kinds of data and either use the type internally throughout or selectively use just one of them when more than one is appropriate. There are also quite a few mypy errors, caused by type signatures that are too general and do not cover all cases.
SUPPORTED_FEAT_TYPES = Union[List, DataFrame, ndarray, spmatrix]
SUPPORTED_TARGET_TYPES = Union[List, Series, DataFrame, ndarray, spmatrix]
Firstly, due to the general lack of support in sklearn for y's being spmatrix, we should probably remove that, keeping the check and issuing the warning about densification if a sparse y is passed in.
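As a rough illustration of keeping the check while dropping spmatrix from the target type, the sketch below warns and densifies when a sparse y arrives. The helper name densify_target and the warning text are hypothetical and not taken from the code base; only scipy.sparse.issparse, toarray and warnings.warn are real library calls.

```python
import warnings

import numpy as np
from scipy.sparse import csr_matrix, issparse


def densify_target(y):
    """Hypothetical helper: return y as a dense ndarray, warning if it was sparse."""
    if issparse(y):
        # Keep the existing behaviour: accept the input, but tell the user it is densified.
        warnings.warn(
            "Sparse y was passed in; converting it to a dense array.",
            UserWarning,
        )
        return y.toarray()
    # Lists, Series and DataFrames are all handled by np.asarray.
    return np.asarray(y)
```

With this shape, the public signature can drop spmatrix for y while callers who still pass a sparse target get a warning instead of a hard failure.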
Second, I propose changing to the following types:
X_INPUT_T = Union[List, DataFrame, ndarray, spmatrix]
Y_INPUT_T = Union[List, DataFrame, ndarray, Series]
XT = Union[ndarray, spmatrix]
YT = ndarray
X_INPUT_T and Y_INPUT_T are the types exposed in the user interface, but the input validator converts all data to XT and YT:
class AutoML:
    def fit(self, X: X_INPUT_T, y: Y_INPUT_T, ...):
        ...
        X_transformed: XT
        y_transformed: YT
        X_transformed, y_transformed = input_validator.transform(X, y)
        ...
        # Use X_transformed and y_transformed with type signatures of XT and YT from here
This would mean updating the estimators' interfaces as well.
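To make the narrowing concrete, here is a minimal sketch of a conversion with the proposed signatures. The function to_internal is a hypothetical stand-in for input_validator.transform; the real validator does more (for example encoding and checks), and this only shows how the wide input types collapse to XT and YT.

```python
from typing import List, Tuple, Union

import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, spmatrix

# Type aliases as proposed in this issue.
X_INPUT_T = Union[List, pd.DataFrame, np.ndarray, spmatrix]
Y_INPUT_T = Union[List, pd.DataFrame, np.ndarray, pd.Series]
XT = Union[np.ndarray, spmatrix]
YT = np.ndarray


def to_internal(X: X_INPUT_T, y: Y_INPUT_T) -> Tuple[XT, YT]:
    """Hypothetical stand-in for input_validator.transform."""
    # Sparse features stay sparse; everything else becomes an ndarray.
    X_out: XT = X if isinstance(X, spmatrix) else np.asarray(X)
    # Targets are always dense after validation.
    y_out: YT = np.asarray(y)
    return X_out, y_out


# Usage: user-facing types go in, internal types come out.
X_dense, y_dense = to_internal(pd.DataFrame({"a": [1, 2]}), pd.Series([0, 1]))
X_sparse, _ = to_internal(csr_matrix(np.eye(2)), [0, 1])
```

Everything downstream of this call would then only need to handle ndarray or spmatrix for X and ndarray for y.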
Bonus points for a function that specifies that it returns the same type it gets in:
T = TypeVar("T", ndarray, spmatrix)
def process(X: T) -> T: ...
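A small runnable sketch of the constrained TypeVar idea follows. The function scale is a hypothetical example, not an existing function in the code base; it relies on scalar multiplication being defined for both ndarray and scipy sparse matrices. How precisely a type checker narrows the sparse case depends on the type information available for scipy.

```python
from typing import TypeVar

import numpy as np
from scipy.sparse import csr_matrix, spmatrix

# Constrained TypeVar: T is either ndarray or spmatrix, never a mix of the two.
T = TypeVar("T", np.ndarray, spmatrix)


def scale(X: T, factor: float) -> T:
    """Hypothetical example: multiply every entry by factor, keeping the container type."""
    return X * factor


# Dense in, dense out; sparse in, sparse out.
dense_result = scale(np.ones((2, 2)), 2.0)
sparse_result = scale(csr_matrix(np.eye(2)), 3.0)
```

Compared with annotating both the parameter and the return value as Union[ndarray, spmatrix], this lets callers keep the specific type they passed in without an extra isinstance check or cast.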