MAINT: better error message about one-hot encoded targets w/ loss="auto" #218

Merged Mar 31, 2021 (19 commits, into adriangb:default-loss-auto)

Conversation

@stsievert (Collaborator)

What does this PR implement?
It adds support for one-hot encoded targets when loss="auto". If the classifier has the attribute n_classes_ (which KerasClassifier.target_encoder_ provides by default), then loss="categorical_crossentropy" is provided.

Reference issues/PRs
#208 I guess.
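
For concreteness, here is a minimal sketch of the behavior as originally proposed (the data and model are illustrative, not from the PR; note that the thread below ultimately revises this toward raising a helpful error instead):

import numpy as np
from tensorflow import keras
from scikeras.wrappers import KerasClassifier

n_classes, n_features, n_samples = 3, 4, 100
X = np.random.uniform(size=(n_samples, n_features)).astype("float32")
# One-hot encoded targets: exactly one 1 per row
y = np.eye(n_classes)[np.random.randint(0, n_classes, size=n_samples)]

def get_model():
    return keras.Sequential([
        keras.Input((n_features,)),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])

# Under the proposal, loss="auto" resolves to "categorical_crossentropy"
# because n_classes_ is available and the output size matches n_classes_.
clf = KerasClassifier(get_model, loss="auto", epochs=1).fit(X, y)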

@adriangb (Owner) commented Mar 8, 2021

This is great, good idea! A lot of my concerns are mitigated by the fact that:

  • It only works with targets that are already one-hot encoded.
  • If n_classes_ is absent, users will still get a NotImplementedError instead of possibly confusing results.

@stsievert (Collaborator, Author)

This PR still needs a test to catch when the user mis-specifies the number of output neurons.
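
A sketch of what such a test might look like (pytest style; the test name and the exact exception type are assumptions, since the final behavior is settled later in this thread):

import numpy as np
import pytest
from tensorflow import keras
from scikeras.wrappers import KerasClassifier

def test_mis_specified_output_units():
    X = np.random.uniform(size=(20, 4)).astype("float32")
    y = np.eye(3)[np.random.randint(0, 3, size=20)]  # 3 one-hot classes

    def get_model():
        return keras.Sequential([
            keras.Input((4,)),
            # Wrong on purpose: 2 output units for 3 classes
            keras.layers.Dense(2, activation="softmax"),
        ])

    with pytest.raises((ValueError, NotImplementedError)):
        KerasClassifier(get_model).fit(X, y)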

@adriangb (Owner) commented Mar 8, 2021

> This PR still needs a test to catch when the user mis-specifies the number of output neurons.

That is, to check for this ValueError?

@adriangb (Owner) commented Mar 8, 2021

Would you be open to, instead of raising the specific ValueError, allowing the more general error below to be raised (`loss="auto"` is not supported for tasks of type 'multilabel-indicator'), since we basically aren't sure that this is a one-hot problem?

@stsievert (Collaborator, Author)

Are you talking about adding the message "loss='auto' is not supported for tasks of type 'multilabel-indicator'" to this ValueError?

@adriangb (Owner) commented Mar 8, 2021

I'm talking about a flow like this:

if loss == "auto":
  activation = self.model_.layers[-1].activation.__name__
  ...
  elif (
      self.target_type_ == "multilabel-indicator"
      and
      hasattr(self, "n_classes_")
      and
      self.model_.outputs[0].shape[1] == self.n_classes_
      and
      activation in ("linear", "softmax")
  ):
      use_logits = activation == "linear"
      compile_kwargs["loss"] = CategoricalCrossentropy(from_logits=use_logits)
  else:
      raise NotImplementedError(
          f'`loss="auto"` is not supported for tasks of type {self.target_type_}.'
          " Instead, you must explicitly pass a loss function, for example:"
          '\n   clf = KerasClassifier(..., loss="categorical_crossentropy")'
      )

Basically, we check whether the combination of the target type and the model outputs indicates with high certainty that this is a run-of-the-mill multiclass problem with one-hot encoded labels; if not, we tell the user that we don't know how to handle this type of target (which would then be classified as 'multilabel-indicator').

Full example
# Assumes imports like: from typing import Any, Dict
# and: from tensorflow.keras import losses as losses_module
def _compile_model(self, compile_kwargs: Dict[str, Any]) -> None:
    if compile_kwargs["loss"] == "auto":
        if len(self.model_.outputs) > 1:
            raise ValueError(
                'Only single-output models are supported with `loss="auto"`'
            )
        activation = self.model_.get_layer(self.model_.output_names[0]).activation
        if hasattr(activation, "__name__"):
            activation_name = activation.__name__
        else:
            activation_name = None
        if self.target_type_ == "binary":
            if self.model_.outputs[0].shape[1] != 1:
                raise ValueError(
                    "Binary classification expects a model with exactly 1 output unit."
                )
            compile_kwargs["loss"] = "binary_crossentropy"
        elif self.target_type_ == "multiclass":
            if self.model_.outputs[0].shape[1] == 1:
                raise ValueError(
                    "Multi-class targets require the model to have >1 output units."
                )
            compile_kwargs["loss"] = "sparse_categorical_crossentropy"
        elif (
            self.target_type_ == "multilabel-indicator"
            and
            hasattr(self, "n_classes_")
            and
            self.model_.output_shape[1:] == (self.n_classes_, )
            and
            activation_name in ("linear", "softmax")
        ):
            use_logits = activation_name == "linear"
            compile_kwargs["loss"] = losses_module.CategoricalCrossentropy(from_logits=use_logits)
        else:
            raise NotImplementedError(
                f'`loss="auto"` is not supported for tasks of type {self.target_type_}.'
                " Instead, you must explicitly pass a loss function, for example:"
                '\n   clf = KerasClassifier(..., loss="categorical_crossentropy")'
            )
    self.model_.compile(**compile_kwargs)

@stsievert (Collaborator, Author) commented Mar 8, 2021

I think checking for the n_classes_ attribute is enough, especially if the number of output nodes equals the number of classes. I also think that's simpler to document and for users to understand. As a user, I'd rather see a documentation page explaining which activation functions to use.

> we check whether the combination of the target type and the model outputs indicates with high certainty that this is a run-of-the-mill multiclass problem with one-hot encoded labels

I think SciKeras already makes that check. loss="categorical_crossentropy" is chosen when all of the following are true:

  • The problem is a classification problem (the n_classes_ attribute is present).
  • The number of output nodes equals the number of classes.
  • The label type is multilabel-indicator.
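
Condensed into code, that check reads roughly like this (a sketch; the attribute names follow the snippets above):

is_onehot_multiclass = (
    hasattr(self, "n_classes_")                             # a classification problem
    and self.model_.outputs[0].shape[1] == self.n_classes_  # output nodes == classes
    and self.target_type_ == "multilabel-indicator"         # one-hot-shaped labels
)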

That sounds like a pretty clear indicator that this is a run-of-the-mill multiclass problem with one-hot encoded labels. What use case are you imagining where it fails?

@adriangb (Owner) commented Mar 8, 2021

This use case:

def get_model(compile):
    model = keras.Sequential(
        [
            keras.Input((num_classes,)),
            keras.layers.Dense(num_classes, activation="relu"),
            keras.layers.Dense(num_classes, activation="sigmoid"),
        ]
    )
    if compile:
        model.compile(loss="binary_crossentropy")
    return model

This is used when each class is independent and an example can belong to more than one class, for example to characterize sentiment in text or movie genres.

Full example code on fc9e7ee
import numpy as np
from scikeras.wrappers import KerasClassifier
from tensorflow import keras

num_classes = 3
n_samples = 10000

y = np.random.randint(low=0, high=2, size=(n_samples, num_classes))
X = y

def get_model(compile):
    model = keras.Sequential(
        [
            keras.Input((num_classes,)),
            keras.layers.Dense(num_classes, activation="relu"),
            keras.layers.Dense(num_classes, activation="sigmoid"),
        ]
    )
    if compile:
        model.compile(loss="binary_crossentropy")
    return model

m1 = get_model(True)
m1.fit(X, y, epochs=10)  # learns fine with binary_crossentropy

m2 = KerasClassifier(get_model, model__compile=False, epochs=10).fit(X, y)
# SciKeras silently picked categorical_crossentropy for this multilabel problem:
assert m2.model_.loss == "categorical_crossentropy"

Even worse than an error: there is no error. SciKeras silently chooses "categorical_crossentropy", which is wrong for this problem, resulting in no or bad learning, with no errors and no real way to tell what happened from the output (it looks fine in terms of shape, dtype, and being binary values; it's just a bunch of garbage).

I'm not sure we should support this use case via loss="auto" (although MLPClassifier does and our LabelEncoder does as well, so perhaps we should support it here), but I am sure we should not assume we can use "categorical_crossentropy" for this problem.

If this sounds reasonable to you, I can push a commit that distinguishes and supports both of these use cases and passes all of the tests, with the caveat that they are distinguished by introspecting the activation of the last layer ("sigmoid" is interpreted as the use case presented here and "softmax" as one-hot encoded multiclass). I generally do not like this type of introspection because Keras is incredibly flexible, but for this narrow use case of one output with n_output_units == n_classes, using the activation function to decide should be pretty reliable.

Proposed _compile_model
def _compile_model(self, compile_kwargs: Dict[str, Any]) -> None:
    if compile_kwargs["loss"] == "auto":
        if len(self.model_.outputs) > 1:
            raise ValueError(
                'Only single-output models are supported with `loss="auto"`'
            )
        activation = self.model_.get_layer(self.model_.output_names[0]).activation
        activation_name = None
        if hasattr(activation, "__name__"):
            activation_name = activation.__name__
        if self.target_type_ == "binary":
            if self.model_.outputs[0].shape[1] != 1:
                raise ValueError(
                    "Binary classification expects a model with exactly 1 output unit."
                )
            compile_kwargs["loss"] = "binary_crossentropy"
        elif self.target_type_ == "multiclass":
            if self.model_.outputs[0].shape[1] == 1:
                raise ValueError(
                    "Multi-class targets require the model to have >1 output units."
                )
            compile_kwargs["loss"] = "sparse_categorical_crossentropy"
        elif (
            self.target_type_ == "multilabel-indicator"
            and
            hasattr(self, "n_classes_")
            and
            self.model_.output_shape[1:] == (self.n_classes_, )
            and
            activation_name == "softmax"
        ):
            # one-hot encoded multiclass problem
            compile_kwargs["loss"] = "categorical_crossentropy"
        elif (
            self.target_type_ == "multilabel-indicator"
            and
            hasattr(self, "n_classes_")
            and
            self.model_.output_shape[1:] == (self.n_classes_, )
            and
            activation_name == "sigmoid"
        ):
            # multilabel-indicator multiclass problem
            compile_kwargs["loss"] = "binary_crossentropy"
        else:
            raise NotImplementedError(
                f'`loss="auto"` is not supported for tasks of type {self.target_type_}.'
                " Instead, you must explicitly pass a loss function, for example:"
                '\n   clf = KerasClassifier(..., loss="categorical_crossentropy")'
            )
    # Finish by compiling, as in the earlier version:
    self.model_.compile(**compile_kwargs)

What will not be supported is a linear output activation w/ a loss that takes logits. That is, SciKeras would not support the following model with loss="auto":

def get_model():
    model = keras.Sequential(
        [
            keras.Input((num_classes,)),
            keras.layers.Dense(num_classes, activation="relu"),
            keras.layers.Dense(num_classes),
        ]
    )
    model.compile(loss=keras.losses.BinaryCrossentropy(from_logits=True))
    return model
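
Under that proposal, such a model would instead need an explicit loss, for example (a sketch):

clf = KerasClassifier(
    get_model,
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
)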

@stsievert (Collaborator, Author) commented Mar 8, 2021

Why not ensure that one-hot encoded labels are passed before setting loss="categorical_crossentropy"? The target encoder should probably do that verification.

@adriangb (Owner) commented Mar 8, 2021

What do you mean by ensure that one-hot labels are passed? What if LabelEncoder gets non-one-hot labels? If loss="auto", what is it supposed to do in this situation? Currently, it does nothing, which I think is the right behavior.

Edit: I think you mean have LabelEncoder set some attribute indicating that this is a one-hot problem. We'd then have 2 type_of_target results: the one set by BaseWrapper and the one set by LabelEncoder (currently, LabelEncoder's includes "multiclass-onehot", but it is private). This would also introduce even more coupling between LabelEncoder and KerasClassifier, which is not great. To do this right, I think we'd need to create a wrapper around sklearn.utils.multiclass.type_of_target that introduces "multiclass-onehot" and use that directly in BaseWrapper (LabelEncoder could then request that as a parameter). But previously, you had been against adding "multiclass-onehot" as a public part of the SciKeras API.
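
For illustration, such a wrapper might look like this (a sketch; the "multiclass-onehot" label exists only privately in LabelEncoder, and the function name here is hypothetical):

import numpy as np
from sklearn.utils.multiclass import type_of_target

def scikeras_type_of_target(y) -> str:
    """type_of_target with an extra "multiclass-onehot" category."""
    target_type = type_of_target(y)
    if target_type == "multilabel-indicator" and np.all(np.asarray(y).sum(axis=1) == 1):
        # Exactly one "hot" entry per row: one-hot encoded multiclass
        return "multiclass-onehot"
    return target_type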

@stsievert (Collaborator, Author)

Maybe it'd be better to raise an error and provide some information on why a user might not want to set loss="categorical_crossentropy". The target encoder should probably do that verification.

@adriangb (Owner) commented Mar 8, 2021

> Maybe it'd be better to raise an error and provide some information on why a user might not want to set loss="categorical_crossentropy".

Where would this error be raised?

Am I understanding correctly that you feel that this extra bit of introspection into the model is too much, and that instead we should not support any of the use cases in #218 (comment)?

@stsievert (Collaborator, Author)

I think a warning should be issued if SciKeras thinks loss="categorical_crossentropy" is appropriate. I think an error should be raised if a loss is not chosen. I think this would be an appropriate error message:

SciKeras could not automatically decide on the appropriate loss. To clear this error, set the loss to be any valid argument Keras accepts for losses (details at https://keras.io/api/losses).

The target type is "multilabel-indicator," there are 4 classes and the model has 4 output neurons. If there is only one class per example, loss="categorical_crossentropy" might be appropriate. If there are multiple classes per example, loss="binary_crossentropy" might be appropriate. For more detail, see XXX.

I think the last paragraph should be added dynamically.

> Am I understanding correctly that you feel that this extra bit of introspection into the model is too much, and that instead we should not support any of the use cases in #218 (comment)?

Yes. I think the user knows their model/dataset better than SciKeras, and they're also a lot smarter than SciKeras. I think SciKeras should support them (provide links/good information) in guiding their choice.

@adriangb (Owner) commented Mar 8, 2021

I agree on the error message if no loss is chosen. We can discuss the specifics of what that error message should contain separately. I'd first like to settle on what is supported, that is, what you mean by SciKeras thinks loss="categorical_crossentropy" is appropriate. How does SciKeras know that "categorical_crossentropy" is appropriate? Some pseudocode might help me understand; I'm sorry if I'm being slow to pick up.

@stsievert (Collaborator, Author)

def _compile(...):
    warning = ""
    loss = None
    if condition:
        loss = "binary_crossentropy"
    elif target_type == "multilabel-indicator" and hasattr(self, "n_classes_"):
        n_out = ...
        if n_out == self.n_classes_:
            warning = (
                "The target type is 'multilabel-indicator,' there are 4 classes and the model has 4 output neurons. "
                "If there is only one class per example, loss='categorical_crossentropy' might be appropriate. "
                "If there are multiple classes per example, loss='binary_crossentropy' might be appropriate. "
                "For more detail, see XXX."
            )
    if loss is None:
        err_msg = "loss='auto' failed to find a suitable loss. To clear this message, set loss to be a Keras loss ..."
        if warning:
            warning = f"\n\n{warning}"
        raise ValueError(f"{err_msg}{warning}")
    else:
        compile_kwargs["loss"] = loss

@adriangb (Owner) commented Mar 8, 2021

Thank you for the snippet, that is very helpful.

So the elif target_type == "multilabel-indicator" and hasattr(self, "n_classes_") does not set loss? This means that one-hot encoded targets would not be supported (although they will get a helpful error). I am okay with that; I think we can always add support later based on user feedback.

@stsievert (Collaborator, Author)

> So the elif target_type == "multilabel-indicator" and hasattr(self, "n_classes_") does not set loss? This means that one-hot encoded targets would not be supported (although they will get a helpful error). I am okay with that; I think we can always add support later based on user feedback.

Yeah, I think that's too much. It requires too much introspection and might produce unexpected results for the user. Users are smarter than software, so let's let them decide. I see now why you were hesitant to support it.

This PR should be refactored. I could use some help there.

@adriangb (Owner) commented Mar 8, 2021

> Users are smarter than software, so let's let them decide

👍
And like I said, we can always add this introspection later. It's easier to add support for a feature than to have to remove it or deal with unforeseen edge cases.

> This PR should be refactored

If you don't mind me pushing to your fork, I can make the edits to have this reflect the intention in #218 (comment). How does that sound?

@stsievert (Collaborator, Author)

> If you don't mind me pushing to your fork, I can make the edits to have this reflect the intention in #218 (comment). How does that sound?

Please do. Thanks for asking; I have the box "allow edits by maintainers" checked for a reason.

@adriangb (Owner) commented Mar 8, 2021

@stsievert the basics are here, do you want to look at this here, or as part of #210 ?

hint = (
    "For this type of problem, the following may help:"
    '\n - If there is only one class per example, loss="categorical_crossentropy" might be appropriate.'
    '\n - If there are multiple classes per example, loss="binary_crossentropy" might be appropriate.'
)
@stsievert (Collaborator, Author):

I think SciKeras should have a documentation page that explains this more, and points to other resources.

@adriangb (Owner):

I agree; however, I'd like to table this until we merge this PR into #210.

@adriangb (Owner):

f'`loss="auto"` is not supported for tasks of type {self.target_type_}.'
" Instead, you must explicitly pass a loss function, for example:"
"\nInstead, you must explicitly pass a loss function, for example:"
@stsievert (Collaborator, Author):

Style nit:

Suggested change:
-"\nInstead, you must explicitly pass a loss function, for example:"
+" Instead, you must explicitly pass a loss function, for example:"

@stsievert (Collaborator, Author):

I think there are two ways to clear this error (both sketched in code after the reply below):

  1. Compile the model yourself (even if loss="auto")
  2. Provide an explicit loss yourself (not loss="auto")

Is that accurate?

@adriangb (Owner):

Yes. We can add a mention of the first, but not more than that; users should know what to do.
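
For illustration, the two options sketched in code (build_model is a hypothetical helper):

# Option 1: compile the model yourself in the model-building function;
# SciKeras then leaves the already-compiled model alone.
def get_compiled_model():
    model = build_model()  # hypothetical
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    return model

clf = KerasClassifier(get_compiled_model)

# Option 2: pass an explicit loss instead of relying on loss="auto".
clf = KerasClassifier(build_model, loss="categorical_crossentropy")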

@stsievert (Collaborator, Author):

Yes, we should definitely mention it in the error message. Maybe there should be a note about how it's only suitable for advanced usage.

@adriangb (Owner):

I added a basic mention. I think that should be enough.

@@ -214,7 +219,7 @@ def test_multiclass_single_output_unit():
def test_binary_multiple_output_units():
@stsievert (Collaborator, Author):

Can't binary classification work with loss="categorical_crossentropy"? If so, shouldn't that information be conveyed in the error message?

@adriangb (Owner):

It technically can, since bce is simply a specific case of cce, but there is no advantage to doing so, and depending on the implementation I can imagine there may be a performance hit. So I think this is one area where it may just be easier to enforce that loss="auto" will not accommodate the use of multiple output units with softmax activation by "intelligently" choosing cce, if that is what you were referring to.
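
A quick numeric check of that claim (illustrative, not from the PR):

import numpy as np
import tensorflow as tf

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])  # two-class, one-hot
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])

cce = tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy()
# The same data viewed as a single-unit binary task (probability of class 1):
bce = tf.keras.losses.binary_crossentropy(y_true[:, 1:], y_pred[:, 1:]).numpy()

assert np.allclose(cce, bce)  # bce reduces to cce in the binary case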

@stsievert (Collaborator, Author):

This should be clearly mentioned in the documentation. I think that's appropriate for this PR.

@stsievert changed the title from 'ENH: allow one-hot encoded targets for loss="auto"' to 'MAINT: better error message about one-hot encoded targets w/ loss="auto"' on Mar 9, 2021
adriangb and others added 3 commits on March 8, 2021
@codecov-io commented Mar 31, 2021

Codecov Report

Merging #218 (a6c2509) into default-loss-auto (c7b567f) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@                Coverage Diff                 @@
##           default-loss-auto     #218   +/-   ##
==================================================
  Coverage              99.72%   99.72%           
==================================================
  Files                      6        6           
  Lines                    730      740   +10     
==================================================
+ Hits                     728      738   +10     
  Misses                     2        2           
Impacted Files              Coverage Δ
scikeras/wrappers.py        99.75% <100.00%> (+0.25%) ⬆️
scikeras/_saving_utils.py   98.85% <0.00%> (-1.15%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c7b567f...a6c2509.

hidden_layer_dim=100,
)

clf.fit(X, y)
y_proba = clf.predict_proba(X)


Note that SciKeras even chooses a loss function and compiles your model
(see the :ref:`Advanced Usage` section for more details)!
@stsievert (Collaborator, Author):

Suggested change: "if you want to use Keras's [loss name], specify loss="loss_name"".

@adriangb (Owner):

Thanks. I copied over the relevant section from the "Advanced" section instead:

To override the default loss, simply specify a loss function:

.. code-block:: diff

    -KerasClassifier(model=model_build_fn)
    +KerasClassifier(model=model_build_fn, loss="categorical_crossentropy")

Do you think that works?

@stsievert (Collaborator, Author):

Isn't categorical_crossentropy the default, or one of the defaults? Maybe use loss="kl_divergence" instead?

@adriangb (Owner):

It's one of the unsupported cases for loss="auto". But at the same time, it is relatively common in Keras. So I chose it because it's unsupported but also common.

@stsievert (Collaborator, Author) commented Mar 31, 2021:

Then 👍 to loss="categorical_crossentropy". I would add a short note about when it's relevant. I think that'd be especially useful in the quick start section.

@adriangb (Owner):

I added some detail.

@adriangb (Owner):
@stsievert I think we're ready to merge this back into #208 and continue there if you are okay with that

| 1         | >=2       | one-hot  | unsupported                     |
+-----------+-----------+----------+---------------------------------+
| > 1       | --        | --       | unsupported                     |
+-----------+-----------+----------+---------------------------------+
@stsievert (Collaborator, Author):

I'd also add a note here saying "if you have more than one output, loss="categorical_crossentropy" is likely desired" with maybe some links to an MNIST example or something.

@adriangb (Owner):

Maybe: "If your targets are one-hot encoded, loss="categorical_crossentropy" is likely desired."?

@stsievert (Collaborator, Author):

👍

-Advanced Usage of SciKeras Wrappers
-===================================
+==============
+Advanced Usage
@stsievert (Collaborator, Author):

Why is the section on loss selection in "advanced usage"? I suspect the average Keras user knows about loss/optimization.

@adriangb (Owner) commented Mar 31, 2021:

That is what it was historically named... I only shortened the name here in an attempt to get links working (and because the "of SciKeras Wrappers" part is arguably redundant). I'm happy to change it in another PR; that should not block this one.

@adriangb merged commit 8d50bf9 into adriangb:default-loss-auto on Mar 31, 2021