
Cleanup typing of internal X, y data passed around #1264

Open
@eddiebergman

Description


Currently we accept the following kinds of data. Internally, we either use the full union type throughout, or selectively use just one of its members where more than one would be appropriate. There are also quite a few mypy errors, because these type signatures are too general and the code does not cover all of their cases.

```python
SUPPORTED_FEAT_TYPES = Union[List, DataFrame, ndarray, spmatrix]
SUPPORTED_TARGET_TYPES = Union[List, Series, DataFrame, ndarray, spmatrix]
```
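To illustrate where those mypy errors come from, here is a minimal sketch. The helper `num_samples` is hypothetical, not an existing function in the codebase. Any function that accepts the full `SUPPORTED_FEAT_TYPES` union has to narrow away `List` before it can use `.shape`, and every consumer of the union ends up repeating checks like this.

```python
from typing import List, Union

import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, spmatrix

# The current, wide feature type from the issue
SUPPORTED_FEAT_TYPES = Union[List, pd.DataFrame, np.ndarray, spmatrix]


def num_samples(X: SUPPORTED_FEAT_TYPES) -> int:
    # Hypothetical helper. Without this isinstance check, mypy reports an
    # error for the List member of the union, which has no `.shape`.
    if isinstance(X, list):
        return len(X)
    # DataFrame, ndarray and spmatrix all expose a 2-tuple `.shape`
    return X.shape[0]


print(num_samples([[1, 2], [3, 4]]), num_samples(csr_matrix(np.eye(3))))  # prints: 2 3
```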

First, because sklearn generally lacks support for a sparse (spmatrix) y, we should probably remove spmatrix from the supported target types, while keeping the runtime check that warns about densification when a sparse y is passed in.
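A minimal sketch of keeping that check once spmatrix is dropped from the target type. It assumes a hypothetical helper `densify_target` that the validator would call; the name and the warning text are illustrative only. The parameter is typed `Any` because a sparse y is no longer part of the declared input type but may still arrive at runtime.

```python
import warnings
from typing import Any

import numpy as np
from scipy.sparse import csr_matrix, issparse


def densify_target(y: Any) -> np.ndarray:
    # Hypothetical helper: a sparse y is still tolerated at runtime, but it is
    # densified with a warning so downstream code only ever sees an ndarray.
    if issparse(y):
        warnings.warn(
            "A sparse y was passed in; it will be converted to a dense array.",
            UserWarning,
        )
        y = y.toarray()
    return np.asarray(y)


y_dense = densify_target(csr_matrix(np.array([[1], [0], [2]])))  # emits the warning
print(y_dense.shape)
```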

Second, I propose changing to the following types:

```python
X_INPUT_T = Union[List, DataFrame, ndarray, spmatrix]
Y_INPUT_T = Union[List, DataFrame, ndarray, Series]

XT = Union[ndarray, spmatrix]
YT = ndarray
```

X_INPUT_T and Y_INPUT_T are the types exposed at the user interface, but the input validator converts all data to XT and YT:

```python
class AutoML:

    def fit(self, X: X_INPUT_T, y: Y_INPUT_T, ...):
        ...
        X_transformed: XT
        y_transformed: YT
        X_transformed, y_transformed = input_validator.transform(X, y)
        ...
        # From here on, X_transformed and y_transformed are typed as XT and YT
```
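A minimal sketch of that conversion step, using a hypothetical free function `to_internal` in place of `input_validator.transform`, whose real signature and behavior may differ. It narrows the user-facing types once, at the boundary. A real validator would also have to deal with things like categorical DataFrame columns, which this sketch ignores.

```python
from typing import List, Tuple, Union

import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, spmatrix

# Proposed types from the issue
X_INPUT_T = Union[List, pd.DataFrame, np.ndarray, spmatrix]
Y_INPUT_T = Union[List, pd.DataFrame, np.ndarray, pd.Series]
XT = Union[np.ndarray, spmatrix]
YT = np.ndarray


def to_internal(X: X_INPUT_T, y: Y_INPUT_T) -> Tuple[XT, YT]:
    # Hypothetical stand-in for input_validator.transform
    X_out: XT
    if isinstance(X, spmatrix):
        X_out = X  # sparse stays sparse
    elif isinstance(X, pd.DataFrame):
        X_out = X.to_numpy()
    else:
        X_out = np.asarray(X)  # list or ndarray

    # Series and DataFrame both provide .to_numpy(); everything else goes
    # through np.asarray, so y always leaves as an ndarray
    y_out: YT
    if isinstance(y, (pd.Series, pd.DataFrame)):
        y_out = y.to_numpy()
    else:
        y_out = np.asarray(y)
    return X_out, y_out


X_t, y_t = to_internal(csr_matrix(np.eye(3)), pd.Series([0, 1, 0]))
print(type(X_t).__name__, type(y_t).__name__)
```

Everything after `to_internal` can then be annotated with XT and YT only, so mypy no longer has to reason about lists or pandas objects deeper in the pipeline.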

This would mean updating the estimators' interfaces as well.
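One possible way to express the updated estimator interface is a `typing.Protocol` that mentions only XT and YT. This is a sketch under assumptions: `InternalEstimator` and the toy `MajorityClassifier` are hypothetical names, not existing classes, and the real estimators may need a different method set.

```python
from typing import Protocol, Union

import numpy as np
from scipy.sparse import spmatrix

XT = Union[np.ndarray, spmatrix]
YT = np.ndarray


class InternalEstimator(Protocol):
    # Hypothetical protocol: after validation, estimators only see XT/YT
    def fit(self, X: XT, y: YT) -> "InternalEstimator": ...

    def predict(self, X: XT) -> np.ndarray: ...


class MajorityClassifier:
    # Toy estimator that satisfies the protocol structurally (no inheritance)
    def fit(self, X: XT, y: YT) -> "MajorityClassifier":
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]
        return self

    def predict(self, X: XT) -> np.ndarray:
        # Works for dense and sparse X alike, since both expose .shape
        return np.full(X.shape[0], self.majority_)


est: InternalEstimator = MajorityClassifier().fit(np.zeros((4, 2)), np.array([1, 0, 1, 1]))
print(est.predict(np.zeros((2, 2))))
```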

Bonus points for a function signature that specifies it returns the same type it receives:

```python
T = TypeVar("T", ndarray, spmatrix)
def process(X: T) -> T: ...
```
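A runnable version of that idea, with `scale` as a hypothetical stand-in for `process`. Multiplying by a scalar is used only because it keeps dense inputs dense and sparse inputs sparse.

```python
from typing import TypeVar

import numpy as np
from scipy.sparse import csr_matrix, spmatrix

# Constrained TypeVar: T is exactly ndarray or exactly spmatrix at each call
T = TypeVar("T", np.ndarray, spmatrix)


def scale(X: T, factor: float) -> T:
    # Hypothetical example: the declared return type matches the input type,
    # so a type checker knows a sparse input comes back sparse
    return X * factor


dense = scale(np.ones((2, 2)), 3.0)
sparse = scale(csr_matrix(np.eye(2)), 2.0)
print(type(dense).__name__, type(sparse).__name__)
```

Because T is constrained rather than bound, mypy should resolve it to exactly one of the two types per call, and reject an argument that is still typed as `Union[ndarray, spmatrix]`, which pushes callers to narrow first.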
