DOC improve introduction to undersampling methods #1018

solegalli
Contributor

related to #853

As per request, splitting into smaller PRs

@solegalli
Contributor Author

@glemaitre ready for review

@glemaitre changed the title from "reword introduction to undersampling methods" to "DOC improve introduction to undersampling methods" on Jul 10, 2023
Member

@glemaitre left a comment

Just 2 nitpicks, otherwise LGTM

You can refer to
:ref:`sphx_glr_auto_examples_under-sampling_plot_comparison_under_sampling.py`.
One way of handling imbalanced datasets is to reduce the number of observations from
the majority class or classes. The most well known algorithm in this group is random
Member

the majority class or classes

I think it is a bit weird to say majority classes since usually we would refer to a single class. What about saying something like:

"reduce the number of observations from all classes but the minority class (i.e. the one with the least number of observations)."

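For illustration, here is a minimal sketch of what "reducing all classes but the minority class" can look like when done by random undersampling. It is plain Python written for this thread, not imbalanced-learn's implementation; the helper name `random_undersample` and the toy data are assumptions made up for the example.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop rows so every class ends up as small as the minority class.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    n_min = min(counts.values())  # size of the minority class
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Classes already at the minority size are kept whole;
        # every larger class is sampled down without replacement.
        keep.extend(idx if len(idx) == n_min else rng.sample(idx, n_min))
    keep.sort()  # preserve the original ordering of the retained rows
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: three classes with 8, 3 and 2 observations.
X = [[i] for i in range(13)]
y = [0] * 8 + [1] * 3 + [2] * 2
X_res, y_res = random_undersample(X, y)
print(Counter(y_res))
```

Note that with more than two classes, every class except the smallest one is reduced, which is why the "all classes but the minority class" wording fits the multiclass case.
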
We will discuss the different algorithms throughout this document.

Refer to :ref:`sphx_glr_auto_examples_under-sampling_plot_comparison_under_sampling.py`
for a comparison of the different undersampling methodologies.
Member

I don't think that we need this sentence because Sphinx will use the example's title, which already states "Compare under-sampling samplers".

Contributor Author

Not sure I understand this suggestion, but let me try a fix.

Contributor Author

OK, I got it.

@solegalli
Contributor Author

done

@glemaitre merged commit ef2e75b into scikit-learn-contrib:master on Jul 11, 2023