
Add warning messages when full data-set is used #441

@ArturoAmorQ

The full dataset (with no train-test split or cross-validation) is used for modeling in the following notebooks:

This has been a source of confusion (see for instance this forum question).

We should add a Warning message similar to the one in logistic_regression_non_linear.py, adapted to each case:

Warning: Be aware that we fit and will check the boundary decision of the classifier on the same dataset without splitting the dataset into a training set and a testing set. While this is a bad practice, we use it for the sake of simplicity to depict the model behavior. Always use cross-validation when you want to assess the generalization performance of a machine-learning model.

Additionally, a Warning message should be added in the following notebooks

where we remind the user that scoring the model on the full dataset is not necessarily wrong, but it provides no information about underfitting or overfitting.
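To make the point concrete, here is a minimal sketch of why a score computed on the full dataset says nothing about overfitting. It is not taken from any of the notebooks: it uses only the Python standard library, pure-noise synthetic data, and a hypothetical 1-nearest-neighbor "memorizer" (the helper names `one_nn_predict` and `accuracy` are made up for illustration).

```python
import random


def one_nn_predict(train_X, train_y, x):
    """Return the label of the training point closest to x (1-nearest neighbor)."""
    nearest = min(range(len(train_X)), key=lambda i: abs(train_X[i] - x))
    return train_y[nearest]


def accuracy(train_X, train_y, X, y):
    """Fraction of points in (X, y) whose label is predicted correctly."""
    hits = sum(one_nn_predict(train_X, train_y, x) == t for x, t in zip(X, y))
    return hits / len(X)


rng = random.Random(0)

# Pure-noise data: the labels are independent of the feature,
# so there is nothing a model can genuinely learn here.
X = [rng.random() for _ in range(200)]
y = [rng.randint(0, 1) for _ in range(200)]

# Fit and score on the same (full) dataset: the model just looks up
# each point's own label, so the score looks perfect.
full_score = accuracy(X, y, X, y)

# Hold out the last 50 points: the score on unseen data reveals
# that the model only memorized noise.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]
test_score = accuracy(X_train, y_train, X_test, y_test)

print(f"score on the full dataset: {full_score:.2f}")
print(f"score on held-out data:    {test_score:.2f}")
```

In the notebooks themselves the same comparison would more naturally be done with scikit-learn, for example `cross_validate(..., return_train_score=True)`, which reports train and test scores side by side.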

What do you think?
