Skip to content

New datasets #96

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Dec 19, 2022
Merged

New datasets #96

merged 25 commits into from
Dec 19, 2022

Conversation

denkle
Copy link
Collaborator

@denkle denkle commented Nov 19, 2022

The first attempt to start adding datasets from a collection used within “Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?”
The file for the first dataset is one of the most important ones because other files from the collection will pretty much follow what is specified in this file.

@denkle denkle requested a review from mikeheddes November 19, 2022 22:28
@mikeheddes
Copy link
Member

Thanks for submitting this PR! It looks great, I think having all these datasets as part of the library is a great addition and from here it should not be too hard to add more of them. Great work!

@denkle denkle requested a review from mikeheddes December 10, 2022 17:58
Copy link
Member

@mikeheddes mikeheddes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good, exactly what I had in mind. I just added some minor code organization and refactoring comments. Which is mostly about trying to isolate the common behavior.

Also, can you remove the .DS_Store file from the PR? and make sure to add Adult also to the documentation.

Copy link
Collaborator Author

@denkle denkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should not have been deleted - I did that accidentally while trying to revert its inclusion into the PR.
Not sure at the moment how to revert this deletion.

@denkle denkle requested a review from mikeheddes December 12, 2022 23:48
@mikeheddes
Copy link
Member

mikeheddes commented Dec 15, 2022

I am resolving some minor outstanding issues and will push my changes soon. Small question, is the number of folds always 4 or is it dataset dependent?

@denkle
Copy link
Collaborator Author

denkle commented Dec 15, 2022

I am resolving some minor outstanding issues and will push my changes soon. Small question, is the number of folds always 4 or is it dataset dependent?

Yes, for datasets in the collection the number of folds is always 4.

@mikeheddes
Copy link
Member

@denkle could you review my refactoring of the _load_data methods? I want to make sure I didn't break it. Otherwise I think it's good to go

@denkle
Copy link
Collaborator Author

denkle commented Dec 17, 2022

@denkle could you review my refactoring of the _load_data methods? I want to make sure I didn't break it. Otherwise I think it's good to go

@mikeheddes, great revision of the code! The logic is more streamlined in multiple places! I do not see any problems with _load_data methods so assume it is good to go

Copy link
Collaborator Author

@denkle denkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to go, I believe

@mikeheddes mikeheddes merged commit 92d3b4a into main Dec 19, 2022
@mikeheddes mikeheddes deleted the New_datasets branch December 19, 2022 08:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants