Add warning messages when full data-set is used

The full data-set (no train-test split or cross-validation) is used for both fitting and evaluating the model in the following notebooks:

- [linear_regression_without_sklearn.py](https://inria.github.io/scikit-learn-mooc/python_scripts/linear_regression_without_sklearn.html)
- [linear_models_ex_01.py](https://inria.github.io/scikit-learn-mooc/python_scripts/linear_models_ex_01.html) and its [solution](https://inria.github.io/scikit-learn-mooc/python_scripts/linear_models_sol_01.html)
- [linear_regression_in_sklearn.py](https://inria.github.io/scikit-learn-mooc/python_scripts/linear_regression_in_sklearn.html)
- [linear_models_ex_02.py](https://inria.github.io/scikit-learn-mooc/python_scripts/linear_models_ex_02.html) and its [solution](https://inria.github.io/scikit-learn-mooc/python_scripts/linear_models_sol_02.html)
- [linear_regression_non_linear_link.py](https://inria.github.io/scikit-learn-mooc/python_scripts/linear_regression_non_linear_link.html)
- [linear_models_ex_04.py](https://inria.github.io/scikit-learn-mooc/python_scripts/linear_models_ex_04.html) and its [solution](https://inria.github.io/scikit-learn-mooc/python_scripts/linear_models_sol_04.html)
- [logistic_regression_non_linear.py](https://inria.github.io/scikit-learn-mooc/python_scripts/logistic_regression_non_linear.html)
- [trees_regression.py](https://inria.github.io/scikit-learn-mooc/python_scripts/trees_regression.html)
- [trees_ex_02.py](https://inria.github.io/scikit-learn-mooc/python_scripts/trees_ex_02.html) and its [solution](https://inria.github.io/scikit-learn-mooc/python_scripts/trees_sol_02.html)
- [ensemble_bagging.py](https://inria.github.io/scikit-learn-mooc/python_scripts/ensemble_bagging.html)
- [ensemble_adaboost.py](https://inria.github.io/scikit-learn-mooc/python_scripts/ensemble_adaboost.html)

This has been a source of confusion (see for instance [this forum question](https://mooc-forums.inria.fr/moocsl/t/split-train-and-test/10658)).

We should add a _Warning_ message similar (but adapted to each case) to the one in [logistic_regression_non_linear.py](https://inria.github.io/scikit-learn-mooc/python_scripts/logistic_regression_non_linear.html):

> Warning: Be aware that we fit and will check the boundary decision of the classifier on the same dataset without splitting the dataset into a training set and a testing set. While this is a bad practice, we use it for the sake of simplicity to depict the model behavior. Always use cross-validation when you want to assess the generalization performance of a machine-learning model.

Additionally, a _Warning_ message should be added in the following notebooks:

- [linear_models_ex_01.py](https://inria.github.io/scikit-learn-mooc/python_scripts/linear_models_ex_01.html) and its [solution](https://inria.github.io/scikit-learn-mooc/python_scripts/linear_models_sol_01.html)
- [linear_regression_in_sklearn.py](https://inria.github.io/scikit-learn-mooc/python_scripts/linear_regression_in_sklearn.html)
- [linear_models_ex_02.py](https://inria.github.io/scikit-learn-mooc/python_scripts/linear_models_ex_02.html) and its [solution](https://inria.github.io/scikit-learn-mooc/python_scripts/linear_models_sol_02.html)
- [linear_regression_non_linear_link.py](https://inria.github.io/scikit-learn-mooc/python_scripts/linear_regression_non_linear_link.html)

where we remind the user that scoring the model on the full data-set is not necessarily wrong, but provides no information about under- or overfitting.
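To make the concern concrete, here is a minimal sketch of the difference between scoring on the full data-set and scoring with cross-validation. It is not taken from any of the notebooks above: the synthetic data, the choice of `DecisionTreeRegressor`, and all variable names are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic regression problem: a noisy sine wave.
rng = np.random.RandomState(0)
data = rng.uniform(-3, 3, size=(200, 1))
target = np.sin(data).ravel() + rng.normal(scale=0.5, size=200)

# A fully grown tree can memorize the training samples.
tree = DecisionTreeRegressor(random_state=0)

# Score computed on the same data that was used for fitting.
full_data_score = tree.fit(data, target).score(data, target)

# Scores computed with 5-fold cross-validation, keeping the train scores
# so that they can be compared with the scores on held-out data.
cv_results = cross_validate(tree, data, target, cv=5, return_train_score=True)
mean_train_score = cv_results["train_score"].mean()
mean_test_score = cv_results["test_score"].mean()

print(f"R2 on the full data-set:   {full_data_score:.2f}")
print(f"Mean R2 on the train sets: {mean_train_score:.2f}")
print(f"Mean R2 on the test sets:  {mean_test_score:.2f}")
```

In this sketch the score on the full data-set looks excellent, yet only the gap between the train and test scores reveals that the tree overfits. A warning along these lines would tell the reader that the single number shown in the notebook cannot show that gap.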

What do you think?

Add warning messages when full data-set is used #441

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add warning messages when full data-set is used #441

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions