
ENH: Add default losses to KerasClassifier and KerasRegressor #208

Open. Wants to merge 52 commits into base: master.

Conversation

@stsievert (Collaborator) commented Feb 27, 2021

What does this PR implement?
It adds loss="categorical_crossentropy" to KerasClassifier by default. It adds some protection in case the user passes multiple classes in y but the model is only configured for one class. This implementation covers the following:

  • multi-class classification with multiple outputs in the final layer of the model (e.g., Dense(10) for MNIST).
  • binary classification with a single output (e.g., Dense(1)).

I test both these cases.
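The two covered configurations amount to a simple default-loss rule, sketched below in plain Python. This is illustrative only; the helper name and mapping are mine, not SciKeras API.

```python
def default_classifier_loss(n_output_units: int) -> str:
    """Illustrative mapping from a model's final-layer width to a default loss.

    A single output unit implies binary classification (sigmoid +
    binary_crossentropy); multiple units imply one-hot multi-class
    classification (softmax + categorical_crossentropy).
    """
    if n_output_units == 1:
        return "binary_crossentropy"
    return "categorical_crossentropy"


# e.g. Dense(1) for binary classification, Dense(10) for MNIST
print(default_classifier_loss(1))   # binary_crossentropy
print(default_classifier_loss(10))  # categorical_crossentropy
```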

Reference issues/PR
This PR closes #206

@adriangb (Owner) left a comment

Thank you for the PR. I left some more general questions.

Comment on lines 35 to 37
* binary classification
* one hot classification
* single class classification
@adriangb (Owner):

Does this mean we do not support 1 output with multiple classes? Or am I getting confused by the usage of outputs vs. output units?

The most common way to set up single target multi-class problems in Keras is with output=Dense(n_classes, activation="softmax") and one of categorical_crossentropy or sparse_categorical_crossentropy:

def clf():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Input(shape=(FEATURES,)))
    model.add(tf.keras.layers.Dense(N_CLASSES, activation="softmax"))
    model.compile(loss="categorical_crossentropy")  # or "sparse_categorical_crossentropy"
    return model

Would this use no longer be supported?

@adriangb (Owner):

I think I see now. This should work!

@stsievert (Collaborator Author):

I've made the use cases where this works clearer in 9358d6e. It works for all major use cases.

This PR simply changes the default loss; it doesn't change compatibility in any way.

Comment on lines 1319 to 1335
def _fit_keras_model(self, *args, **kwargs):
    try:
        super()._fit_keras_model(*args, **kwargs)
    except ValueError as e:
        if (
            self.loss == "categorical_crossentropy"
            and hasattr(self, "model_")
            and 1 in {o.shape[1] for o in getattr(self.model_, "outputs", [])}
        ):
            raise ValueError(
                "The model is configured to have one output, but the "
                f"loss='{self.loss}' is expecting multiple outputs "
                "(which is often used with one-hot encoded targets). "
                "More detail on Keras losses: https://keras.io/api/losses/"
            ) from e
        else:
            raise
@adriangb (Owner):

This seems like it should live in _check_model_compatibility, or they should be merged in some way.

@stsievert (Collaborator Author):

This error message only provides marginal utility: it protects against cases where the model has one output but there are multiple classes.

It cannot go in _check_model_compatibility; I wait for an error to be raised before issuing this warning (otherwise a model with a single output raises an error).

@adriangb (Owner):

Got it. Is there a specific error message we can check for, like if "some Keras error" in str(e)?

getattr(self.model_, "outputs", [])

Is this necessary? model_ should always have an outputs attribute, except in the case described in #207, but that should be a separate check/error.

f"loss='{self.loss}' is expecting multiple outputs "

Can you clarify what you mean by a loss expecting a number of outputs? My understanding is that Keras "broadcasts" losses to outputs, so if you give it a scalar loss (e.g., loss="bce") with 2 outputs (i.e., len(model_.outputs) == 2), it will implicitly compile the model with loss=[original_loss] * len(outputs). But you can actually map losses to outputs manually by passing loss=["bce", "mse"] or loss={"out1": "bce", "out2": "mse"}. From the tests, it seems like by "loss is expecting multiple outputs" you mean that there is a single output unit but multiple classes, which I feel could be confused with the above concept of configuring a loss for multiple outputs.

I'm also curious about the iteration through outputs (1 in {o.shape[1] for o in self.model_.outputs}). SciKeras does not support >1 output out of the box (users need to override target_encoder), so it seems a bit strange to try to account for that when using the default loss. I feel that using the default loss should only be supported for the simple single-output cases that target_encoder supports.
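The scalar/list/dict broadcasting behavior described above can be sketched in plain Python. The helper below is hypothetical, not Keras internals; it only mirrors how a user-supplied loss maps onto named outputs.

```python
def resolve_losses(loss, output_names):
    """Sketch of how a user-supplied loss maps to model outputs.

    A dict maps output names to losses, a list/tuple is taken positionally,
    and a single scalar loss is broadcast to every output.
    """
    if isinstance(loss, dict):
        return [loss[name] for name in output_names]
    if isinstance(loss, (list, tuple)):
        if len(loss) != len(output_names):
            raise ValueError("need exactly one loss per output")
        return list(loss)
    return [loss] * len(output_names)


outputs = ["out1", "out2"]
print(resolve_losses("bce", outputs))                           # ['bce', 'bce']
print(resolve_losses(["bce", "mse"], outputs))                  # ['bce', 'mse']
print(resolve_losses({"out1": "bce", "out2": "mse"}, outputs))  # ['bce', 'mse']
```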

@adriangb (Owner):

As a side note: I think giving users better errors and validating their inputs like you are doing here can be a very valuable part of SciKeras, but currently it is done in an ad-hoc manner via _check_model_compatibility, etc. I think if we add more of these types of things, it would be nice to have an organized interface for it. I opened #209 to try to brainstorm ideas for this.

@stsievert stsievert changed the title ENH: Add default loss to KerasClassifier ENH: Add default losses to KerasClassifier and KerasRegressor Feb 28, 2021
@adriangb (Owner):

Should we do anything for models with a single output and a single linear output unit and 2 classes? This is a subset of your error catching / test, but Keras won't raise a ValueError for it. Instead, it will train and not learn anything. The same model with loss="bce" will easily predict with 100% accuracy:

import numpy as np
import tensorflow as tf

from scikeras.wrappers import KerasClassifier


N_CLASSES = 2
FEATURES = 1
n_eg = 10000
y = np.random.choice(N_CLASSES, size=n_eg).astype(int)
X = y.reshape(-1, 1).astype("float32")


def clf():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Input(shape=(FEATURES,)))
    model.add(tf.keras.layers.Dense(4))
    model.add(tf.keras.layers.Dense(1))
    return model


model = KerasClassifier(clf, loss="categorical_crossentropy", epochs=50)

model.fit(X, y).score(X, y)  # ~ 0.5

@adriangb (Owner) left a comment

Looks like you're making good progress; thank you for keeping at it. If you're butting up against annoying or hard-to-debug tests, let me know; I'm happy to jump in and try to iron them out.

Resolved review thread: scikeras/_types.py
@stsievert (Collaborator Author) commented Mar 2, 2021

I've got most of the issues resolved:

  • A warning is issued for a binary classification target with loss="categorical_crossentropy", since the user might have compiled their own model.
  • I removed the GitHub Actions dependency; why should the tests depend on linting? It's really annoying.
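The warning described in the first bullet could look roughly like this. This is a sketch; the function name and exact message are placeholders, not the PR's actual code.

```python
import warnings


def check_loss_vs_target(loss: str, n_classes: int) -> None:
    """Warn when a binary target is paired with categorical_crossentropy.

    The user may have compiled their own model intentionally, so this is
    a warning rather than a hard error.
    """
    if loss == "categorical_crossentropy" and n_classes == 2:
        warnings.warn(
            "loss='categorical_crossentropy' with a binary target; "
            "consider loss='binary_crossentropy' unless the model was "
            "compiled this way on purpose."
        )


check_loss_vs_target("categorical_crossentropy", n_classes=2)  # warns
check_loss_vs_target("binary_crossentropy", n_classes=2)       # silent
```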

The tests on fully compliant Scikit-learn estimators are failing now. I'd appreciate some help.

@adriangb (Owner) left a comment

A warning is issued for a binary classification target with loss="categorical_crossentropy", since the user might have compiled their own model.

👍

I removed the GitHub Actions dependency; why should the tests depend on linting? It's really annoying.

This is a somewhat arbitrary choice. I'm open to removing it, but I'd like that done in a separate PR. If you use the git pre-commit hook, you'll never be annoyed by linting because you won't even be able to make a commit that fails linting (unless you manually override it).
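For reference, setting up the hook is a one-time step, assuming the repo ships a `.pre-commit-config.yaml` (the commands below are the standard pre-commit workflow, not anything SciKeras-specific):

```shell
pip install pre-commit
pre-commit install          # installs the git hook into .git/hooks
pre-commit run --all-files  # optional: lint the whole tree once up front
```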

The tests on fully compliant Scikit-learn estimators are failing now

I'll take a look.

Resolved review threads: scikeras/wrappers.py, tests/test_simple_usage.py
est = KerasRegressor(model=shallow_net, model__single_output=True)
assert est.loss == "mse"
est.partial_fit(X, y)
assert est.model_.loss.__name__ == "mean_squared_error"
@adriangb (Owner):

Is the assertion that the "long" name for the loss was used in the model necessary here? I don't see the same assertion for classifiers.

@stsievert (Collaborator Author):

This assert statement is present to make sure BaseWrapper.loss is mirrored in BaseWrapper.model_.loss. I'll add a test for KerasClassifier too.

@adriangb (Owner):

Maybe this is a good use case for scikeras.utils.loss_name?

@stsievert (Collaborator Author) commented Mar 2, 2021

Tests are passing now; I manually specified loss=None. I think loss="auto" should decide between binary_crossentropy and categorical_crossentropy, and I'm not a huge fan of changing the package implementation based on the tests.

On the whole, this PR makes SciKeras more usable, though it is technically backwards incompatible for a very narrow use case. That relies on a couple of points:

  • Simple usage works now.
    • Before: the master branch raises "ValueError: No valid loss function found. You must provide a loss function to train" for most of the tests in test_simple_usage.py.
    • After: if I specify a commonly used model (e.g., a regressor with one output node or a classifier with N_CLASSES output nodes) and supply training data, my Keras model will learn.
    • Technically, this is backward incompatible if the user is compiling their own model and depending on loss being None in their function.
  • One less error raised:
    • Before: an exception is raised if BaseWrapper.loss and BaseWrapper.model_.loss are different.
    • After: a warning is issued for the same case (if the user is compiling their own model, I presume they have a special use case or know what they're doing).

@adriangb (Owner) commented Mar 3, 2021

I believe this PR is now breaking basic wrapper backwards compatibility:

import numpy as np
from scikeras.wrappers import KerasClassifier
from tensorflow import keras

num_classes = 3
n_features = 4
n_samples = 100

X = np.random.random(size=(n_samples, n_features))
y = np.random.randint(low=0, high=num_classes, size=(n_samples,))

def get_model():
    model = keras.Sequential(
        [
            keras.Input((n_features,)),
            keras.layers.Dense(n_features),
            keras.layers.Dense(num_classes, activation="softmax"),
        ]
    )
    model.compile(loss="sparse_categorical_crossentropy")
    return model

# Works via Keras
get_model().fit(X, y)

# Works via Keras wrappers
keras.wrappers.scikit_learn.KerasClassifier(get_model).fit(X, y)

# Does not work with SciKeras stsievert:clf-default-loss but does work on SciKeras master
KerasClassifier(get_model).fit(X, y)  # InvalidArgumentError: logits and labels must have the same first dimension

This can be "fixed" by using KerasClassifier(get_model, loss=None) like you do in the tests, but this is a breaking change so I'd like to call it out explicitly before moving forward.

As to why this happens: if loss="categorical_crossentropy" but the user passes non-one-hot-encoded labels, SciKeras (and the TF wrappers before it) will automatically one-hot encode the labels. So in this case, the labels get one-hot encoded, but the loss function is expecting non-one-hot-encoded labels.
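The mismatch is visible from shapes alone. Below is a plain-Python sketch (no TF needed; the helper is mine, standing in for SciKeras's target encoding) of why sparse_categorical_crossentropy then sees labels of the wrong shape:

```python
def one_hot(labels, n_classes):
    """One-hot encode integer class labels, as SciKeras does when the
    loss is 'categorical_crossentropy'."""
    return [[1 if i == c else 0 for i in range(n_classes)] for c in labels]


y = [0, 2, 1]          # sparse integer labels, shape (3,)
y_ohe = one_hot(y, 3)  # one-hot labels, shape (3, 3)

# sparse_categorical_crossentropy expects the (3,) integer form, so handing
# it the (3, 3) one-hot form produces the shape mismatch Keras complains
# about in the InvalidArgumentError above.
print(y_ohe)  # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```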

@stsievert (Collaborator Author):

This error only happens when both of the following occur:

  • When the user compiles the model themselves w/ loss="sparse_categorical_crossentropy".
  • When the default loss is not changed from categorical_crossentropy.

I'm going to raise an exception if the Keras model specifies loss="sparse_categorical_crossentropy" and, in the error message, tell the user how to clear the error (changing the loss parameter in KerasClassifier.fit is not allowed by the Scikit-learn API).

@adriangb (Owner) commented Mar 4, 2021

Can't this also be solved by just defaulting to "auto"?

This also makes me wonder whether the design decision to put the description of the data and the encoding/decoding logic in the same place was flawed, or whether inspecting the compiled model would have been better than relying on the user's input.

@stsievert (Collaborator Author):

Yes, loss="auto" is a good idea. It simplifies the tests, and it also allows choosing loss="bce" if there is one output node, there are only two unique values in y, and the user has not compiled the model.

@adriangb (Owner) commented Mar 4, 2021

Yes, loss="auto" is a good idea.

I'll try to cook up an implementation of that in the next couple of days. Regardless, thank you for all of the great work you've done on this PR so far, I think it's done a lot to illuminate our options and challenges with this feature.

@stsievert (Collaborator Author):

I'd like to see a new function inserted after model creation and before model compilation to choose some compile arguments based on the model architecture. It'd be useful for choosing the loss function from the number of output neurons. That'd help a lot with the loss="auto" implementation and with choosing between loss="bce" and loss="cce".

@adriangb (Owner) commented Mar 4, 2021

We have _compile_model, although it currently also decides if it should compile the model or not. Perhaps we can split that into _should_compile_model and _compile_model where _compile_model is a designated dependency injection point for insertion of arbitrary compile logic based upon model architecture and any other pre-determined parameters (like the output from target_encoder)? Then SciKeras can by default use this to handle loss="auto", but users can also override it to support other uses.
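The proposed split might look something like this. The method names follow the comment above, but the class and bodies are hypothetical, not SciKeras's actual implementation:

```python
class BaseWrapperSketch:
    """Illustrative split of compilation logic into a predicate
    (_should_compile_model) and an injection point (_compile_model)."""

    def __init__(self, loss="auto"):
        self.loss = loss
        self.model_compiled = False

    def _should_compile_model(self) -> bool:
        # Only compile if the user's build function did not already do so.
        return not self.model_compiled

    def _compile_model(self, n_output_units: int) -> str:
        # Designated override point: pick compile args from architecture
        # and any other pre-determined parameters.
        loss = self.loss
        if loss == "auto":
            loss = ("binary_crossentropy" if n_output_units == 1
                    else "categorical_crossentropy")
        self.model_compiled = True
        return loss


est = BaseWrapperSketch()
if est._should_compile_model():
    print(est._compile_model(n_output_units=1))  # binary_crossentropy
```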

@stsievert (Collaborator Author) commented Mar 4, 2021

I'm thinking of this pseudo-code:

assert len(self.model_.outputs) == 1
out = self.model_.outputs[0]

if self.loss == "auto":
    if out.shape[1] == 1 and y_n_unique <= 2:
        loss = "binary_crossentropy"
    elif out.shape[1] > 1 and y_encoded.ndim == 1:
        loss = "sparse_categorical_crossentropy"
    elif out.shape[1] > 1 and y_encoded.ndim > 1:
        loss = "categorical_crossentropy"

self._compile_model(loss=loss)

This code isn't exact – I'm not sure if all the conditions for the loss functions are right.

This requires that _build_fn be called (for self.model_) and that the data be encoded (for metadata about y_encoded). It looks like this code could go in an override of _ensure_compiled_model (which needs a better name). KerasClassifier would also have to override _get_compile_kwargs to return a default loss value (the promise of compile_kwargs is that they're all valid arguments to self.model_.compile, right?).
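For concreteness, the pseudo-code above can be made runnable on plain shape metadata (no TF; the function and argument names are placeholders for the real model/encoder attributes):

```python
def auto_loss(output_width: int, y_n_unique: int, y_encoded_ndim: int) -> str:
    """Runnable version of the loss='auto' heuristic sketched above.

    output_width is the size of the model's single output (out.shape[1]),
    y_n_unique the number of distinct classes, and y_encoded_ndim the
    dimensionality of the encoded target.
    """
    if output_width == 1 and y_n_unique <= 2:
        return "binary_crossentropy"
    if output_width > 1 and y_encoded_ndim == 1:
        return "sparse_categorical_crossentropy"
    if output_width > 1 and y_encoded_ndim > 1:
        return "categorical_crossentropy"
    raise ValueError("no default loss for this model/target combination")


print(auto_loss(1, 2, 1))    # binary_crossentropy
print(auto_loss(10, 10, 1))  # sparse_categorical_crossentropy
print(auto_loss(10, 10, 2))  # categorical_crossentropy
```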

@adriangb (Owner) commented Mar 4, 2021

This seems pretty similar to what I am proposing in #210 , please check that PR if you can.

adriangb added a commit that referenced this pull request Mar 6, 2021
See thread in #208 (review)

Since we use GHA as a FOSS project, we don't pay for usage. This will increase usage because tests will run even if linting fails, but it will help PR feedback turnaround time (TAT) because failed linting won't preclude getting test data back.
@adriangb (Owner) commented Mar 6, 2021

  • I removed the GitHub Actions dependency; why should the tests depend on linting? It's really annoying.

Resolved via #217

Successfully merging this pull request may close these issues.

No default loss for KerasClassifier
3 participants