[python-package] fix mypy errors about custom eval and metric functions #5790

Merged · 4 commits · Mar 30, 2023
14 changes: 10 additions & 4 deletions python-package/lightgbm/engine.py
@@ -11,7 +11,7 @@

 from . import callback
 from .basic import (Booster, Dataset, LightGBMError, _choose_param_value, _ConfigAliases, _InnerPredictor,
-                    _LGBM_CategoricalFeatureConfiguration, _LGBM_CustomObjectiveFunction,
+                    _LGBM_CategoricalFeatureConfiguration, _LGBM_CustomObjectiveFunction, _LGBM_EvalFunctionResultType,
                     _LGBM_FeatureNameConfiguration, _log_warning)
 from .compat import SKLEARN_INSTALLED, _LGBMBaseCrossValidator, _LGBMGroupKFold, _LGBMStratifiedKFold
@@ -22,9 +22,15 @@
 ]


-_LGBM_CustomMetricFunction = Callable[
-    [np.ndarray, Dataset],
-    Union[Tuple[str, float, bool], List[Tuple[str, float, bool]]]
+_LGBM_CustomMetricFunction = Union[
+    Callable[
+        [np.ndarray, Dataset],
+        _LGBM_EvalFunctionResultType,
+    ],
+    Callable[
+        [np.ndarray, Dataset],
+        List[_LGBM_EvalFunctionResultType]
+    ],
 ]

 _LGBM_PreprocFunction = Callable[
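A callable matching the first branch of the new `_LGBM_CustomMetricFunction` type can be sketched as follows (a hypothetical example, not from the diff; `rmse_feval` and the stub `Dataset` usage are illustrative):

```python
import numpy as np


def rmse_feval(preds, eval_data):
    """Hypothetical feval: f(preds, eval_data) -> (name, value, is_higher_better).

    `eval_data` is expected to be a lightgbm.Dataset; only get_label() is used here.
    """
    labels = eval_data.get_label()
    residual = labels - preds
    # is_higher_better=False: lower RMSE is better
    return "rmse", float(np.sqrt(np.mean(residual ** 2))), False
```

Passed as `feval=rmse_feval` to `lgb.train`, this satisfies the single-tuple branch; the second branch of the Union covers functions that instead return a list of such tuples.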
50 changes: 34 additions & 16 deletions python-package/lightgbm/sklearn.py
@@ -33,32 +33,50 @@
     scipy.sparse.spmatrix
 ]
 _LGBM_ScikitCustomObjectiveFunction = Union[
+    # f(labels, preds)
     Callable[
-        [np.ndarray, np.ndarray],
+        [Optional[np.ndarray], np.ndarray],
         Tuple[np.ndarray, np.ndarray]
     ],
+    # f(labels, preds, weights)
     Callable[
-        [np.ndarray, np.ndarray, np.ndarray],
+        [Optional[np.ndarray], np.ndarray, Optional[np.ndarray]],
         Tuple[np.ndarray, np.ndarray]
     ],
+    # f(labels, preds, weights, group)
     Callable[
-        [np.ndarray, np.ndarray, np.ndarray, np.ndarray],
+        [Optional[np.ndarray], np.ndarray, Optional[np.ndarray], Optional[np.ndarray]],
         Tuple[np.ndarray, np.ndarray]
     ],
 ]

Collaborator (Author):

why add all these Optionals?

mypy is struggling with the fact that Dataset.get_field() can return None.

This PR proposes updating the type hints for custom metric and objective functions to match that behavior.

I intentionally chose not to update the user-facing docs about custom metric and objective functions to reflect that the label, group, and weights passed to these functions can technically be None... in almost all situations they will be non-None, and I don't think complicating the docs is worth it.
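To make the Optional hints concrete, here is a sketch (hypothetical, not part of the PR) of a 2-argument custom objective that guards against a None label:

```python
import numpy as np
from typing import Optional, Tuple


def l2_objective(labels: Optional[np.ndarray], preds: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical squared-error objective matching the 2-argument branch:
    f(labels, preds) -> (grad, hess)."""
    if labels is None:
        # Dataset.get_field() can return None; fail loudly rather than crash later
        raise ValueError("labels are required for this objective")
    grad = preds - labels
    hess = np.ones_like(preds)
    return grad, hess
```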
 _LGBM_ScikitCustomEvalFunction = Union[
+    # f(labels, preds)
     Callable[
-        [np.ndarray, np.ndarray],
-        Union[_LGBM_EvalFunctionResultType, List[_LGBM_EvalFunctionResultType]]
+        [Optional[np.ndarray], np.ndarray],
+        _LGBM_EvalFunctionResultType
     ],
+    Callable[
+        [Optional[np.ndarray], np.ndarray],
+        List[_LGBM_EvalFunctionResultType]
+    ],
+    # f(labels, preds, weights)
     Callable[
-        [np.ndarray, np.ndarray, np.ndarray],
-        Union[_LGBM_EvalFunctionResultType, List[_LGBM_EvalFunctionResultType]]
+        [Optional[np.ndarray], np.ndarray, Optional[np.ndarray]],
+        _LGBM_EvalFunctionResultType
     ],
+    Callable[
+        [Optional[np.ndarray], np.ndarray, Optional[np.ndarray]],
+        List[_LGBM_EvalFunctionResultType]
+    ],
+    # f(labels, preds, weights, group)
     Callable[
-        [np.ndarray, np.ndarray, np.ndarray, np.ndarray],
-        Union[_LGBM_EvalFunctionResultType, List[_LGBM_EvalFunctionResultType]]
+        [Optional[np.ndarray], np.ndarray, Optional[np.ndarray], Optional[np.ndarray]],
+        _LGBM_EvalFunctionResultType
+    ],
+    Callable[
+        [Optional[np.ndarray], np.ndarray, Optional[np.ndarray], Optional[np.ndarray]],
+        List[_LGBM_EvalFunctionResultType]
+    ]
Comment on lines +72 to +79 — Collaborator (Author):
Splitting cases like

Callable[
    [np.ndarray, np.ndarray],
    Union[_LGBM_EvalFunctionResultType, List[_LGBM_EvalFunctionResultType]]
]

into

Union[
    Callable[
        [np.ndarray, np.ndarray],
        _LGBM_EvalFunctionResultType
    ],
    Callable[
        [np.ndarray, np.ndarray],
        List[_LGBM_EvalFunctionResultType]
    ]
]

helps mypy with errors like this:

Argument 1 to "eval_valid" of "Booster" has incompatible type "Union[Callable[[ndarray[Any, Any], Dataset], Union[Tuple[str, float, bool], List[Tuple[str, float, bool]]]], List[Callable[[ndarray[Any, Any], Dataset], Union[Tuple[str, float, bool], List[Tuple[str, float, bool]]]]], None]"; expected "Union[Union[Callable[[ndarray[Any, Any], Dataset], Tuple[str, float, bool]], Callable[[ndarray[Any, Any], Dataset], List[Tuple[str, float, bool]]]], List[Union[Callable[[ndarray[Any, Any], Dataset], Tuple[str, float, bool]], Callable[[ndarray[Any, Any], Dataset], List[Tuple[str, float, bool]]]]], None]"  [arg-type]

And I think it's slightly more correct... I'd expect people to provide a custom metric function that returns a single tuple on every iteration or one that returns a list of tuples on each iteration, but not one that could return either of those.

]
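As a concrete illustration of the "returns a list of tuples on each iteration" case described above (a hypothetical example, not part of the diff), one callable can compute several metrics per iteration:

```python
import numpy as np
from typing import List, Optional, Tuple


def mae_and_bias(labels: Optional[np.ndarray], preds: np.ndarray) -> List[Tuple[str, float, bool]]:
    """Hypothetical eval function matching the List-returning 2-argument branch."""
    assert labels is not None, "labels are required for these metrics"
    return [
        ("mae", float(np.mean(np.abs(labels - preds))), False),
        ("bias", float(np.mean(preds - labels)), False),
    ]
```

Under the split Union, mypy can match this function against exactly one branch instead of failing to unify it with a `Union[...]` return type.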
_LGBM_ScikitEvalMetricType = Union[
str,
@@ -135,11 +153,11 @@ def __call__(self, preds: np.ndarray, dataset: Dataset) -> Tuple[np.ndarray, np.ndarray]:
         labels = dataset.get_label()
         argc = len(signature(self.func).parameters)
         if argc == 2:
-            grad, hess = self.func(labels, preds)
+            grad, hess = self.func(labels, preds)  # type: ignore[call-arg]
         elif argc == 3:
-            grad, hess = self.func(labels, preds, dataset.get_weight())
+            grad, hess = self.func(labels, preds, dataset.get_weight())  # type: ignore[call-arg]
         elif argc == 4:
-            grad, hess = self.func(labels, preds, dataset.get_weight(), dataset.get_group())
+            grad, hess = self.func(labels, preds, dataset.get_weight(), dataset.get_group())  # type: ignore [call-arg]
Collaborator (Author):
mypy isn't able to break apart the individual Union items based on how many arguments inspect.signature() returns. This PR proposes just ignoring errors like this:

sklearn.py:132: error: Too many arguments  [call-arg]
sklearn.py:132: error: Too few arguments  [call-arg]

         else:
             raise TypeError(f"Self-defined objective function should have 2, 3 or 4 arguments, got {argc}")
         return grad, hess

@@ -213,11 +231,11 @@ def __call__(
         labels = dataset.get_label()
         argc = len(signature(self.func).parameters)
         if argc == 2:
-            return self.func(labels, preds)
+            return self.func(labels, preds)  # type: ignore[call-arg]
         elif argc == 3:
-            return self.func(labels, preds, dataset.get_weight())
+            return self.func(labels, preds, dataset.get_weight())  # type: ignore[call-arg]
         elif argc == 4:
-            return self.func(labels, preds, dataset.get_weight(), dataset.get_group())
+            return self.func(labels, preds, dataset.get_weight(), dataset.get_group())  # type: ignore[call-arg]
else:
raise TypeError(f"Self-defined eval function should have 2, 3 or 4 arguments, got {argc}")
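For the argc == 3 branch, a weighted metric might look like this (hypothetical example; `weighted_mae` is not part of the PR):

```python
import numpy as np
from typing import Optional, Tuple


def weighted_mae(
    labels: Optional[np.ndarray],
    preds: np.ndarray,
    weights: Optional[np.ndarray],
) -> Tuple[str, float, bool]:
    """Hypothetical f(labels, preds, weights) eval function.

    weights may be None because Dataset.get_weight() can return None.
    """
    assert labels is not None, "labels are required for this metric"
    if weights is None:
        weights = np.ones_like(preds)
    value = float(np.sum(weights * np.abs(labels - preds)) / np.sum(weights))
    return "weighted_mae", value, False
```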

@@ -819,7 +837,7 @@ def _get_meta_data(collection, name, i):
             num_boost_round=self.n_estimators,
             valid_sets=valid_sets,
             valid_names=eval_names,
-            feval=eval_metrics_callable,
+            feval=eval_metrics_callable,  # type: ignore[arg-type]
Collaborator (Author):
This # type: ignore fixes this error:

Argument "feval" to "train" has incompatible type "List[_EvalFunctionWrapper]"; expected "Union[Union[Callable[[ndarray[Any, Any], Dataset], Tuple[str, float, bool]], Callable[[ndarray[Any, Any], Dataset], List[Tuple[str, float, bool]]]], List[Union[Callable[[ndarray[Any, Any], Dataset], Tuple[str, float, bool]], Callable[[ndarray[Any, Any], Dataset], List[Tuple[str, float, bool]]]]], None]"  [arg-type]
sklearn.py:814: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
sklearn.py:814: note: Consider using "Sequence" instead, which is covariant

I'm not sure why, but mypy isn't able to figure out that _EvalFunctionWrapper is actually a Callable[[ndarray, Dataset], Tuple[str, float, bool]].

I think that's maybe related to some issues it has with comparing lists to union types including lists? e.g. python/mypy#6463

This PR proposes just ignoring that error for now... the use of custom eval metric functions is well-covered by the unit tests in test_sklearn.py and test_dask.py.
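The invariance note mypy prints can be seen in miniature in this sketch (illustrative; all names are made up): a `List[Union[...]]` parameter rejects a `List` of just one Union branch, while `Sequence`, being covariant, accepts it. At runtime both work; the distinction only matters to the type checker.

```python
from typing import Callable, List, Sequence, Tuple, Union

SingleResult = Tuple[str, float, bool]
SingleFn = Callable[[int], SingleResult]
ListFn = Callable[[int], List[SingleResult]]


# Sequence is covariant, so mypy accepts a List[SingleFn] argument here;
# declared as `fns: List[Union[SingleFn, ListFn]]` it would report [arg-type].
def run_all(fns: Sequence[Union[SingleFn, ListFn]], x: int) -> int:
    return len(fns)


def one_metric(x: int) -> SingleResult:
    return "m", float(x), False
```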

init_model=init_model,
feature_name=feature_name,
callbacks=callbacks