New datasets #96

denkle · 2022-11-19T22:28:03Z

This is a first attempt at adding datasets from the collection used in “Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?”
The file for the first dataset is one of the most important, because the files for the other datasets in the collection will largely follow what is specified in it.

torchhd/datasets/abalone.py

mikeheddes · 2022-11-20T01:58:04Z

Thanks for submitting this PR! It looks great. I think having all these datasets as part of the library is a great addition, and from here it should not be too hard to add more of them. Great work!

…torchhd into New_datasets

torchhd/datasets/abalone.py

…ctions

mikeheddes

This is looking really good, exactly what I had in mind. I just added some minor code organization and refactoring comments, which are mostly about trying to isolate the common behavior.

Also, can you remove the .DS_Store file from the PR? And make sure to also add Adult to the documentation.
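
To illustrate what "isolating the common behavior" could look like, here is a minimal sketch in which a base class owns the shared loading and indexing logic and each dataset only declares what differs. Every name here (`CollectionDatasetSketch`, `AbaloneSketch`, `AdultSketch`, the in-memory `rows` argument) is hypothetical and chosen for illustration; this is not the torchhd implementation, and it assumes the label is stored in the last column of each row.

```python
from typing import List, Sequence, Tuple


class CollectionDatasetSketch:
    """Hypothetical base class holding behavior shared by all datasets."""

    # Subclasses override this; it is the only dataset-specific piece here.
    name: str = ""

    def __init__(self, rows: Sequence[Sequence[float]]):
        # In a real dataset class the rows would come from a downloaded file;
        # here they are passed in directly to keep the sketch self-contained.
        self.data, self.targets = self._load_data(rows)

    def _load_data(
        self, rows: Sequence[Sequence[float]]
    ) -> Tuple[List[List[float]], List[int]]:
        # Assumption: the last column is the class label, the rest are features.
        data = [list(row[:-1]) for row in rows]
        targets = [int(row[-1]) for row in rows]
        return data, targets

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, index: int) -> Tuple[List[float], int]:
        return self.data[index], self.targets[index]


class AbaloneSketch(CollectionDatasetSketch):
    name = "abalone"


class AdultSketch(CollectionDatasetSketch):
    name = "adult"


example = AbaloneSketch([[0.1, 0.2, 1], [0.3, 0.4, 0]])
```

With this layout, adding another dataset from the collection would amount to one more small subclass, which is the kind of streamlining the review comments ask for.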

torchhd/datasets/abalone.py

torchhd/datasets/adult.py

torchhd/datasets/collection_datasets.py

Resovling the merge conflict

denkle

This file should not have been deleted. I did that accidentally while trying to revert its inclusion in the PR.
I am not sure at the moment how to revert this deletion.

mikeheddes · 2022-12-15T09:27:35Z

I am resolving some minor outstanding issues and will push my changes soon. Small question: is the number of folds always 4, or is it dataset dependent?

denkle · 2022-12-15T10:32:55Z

> I am resolving some minor outstanding issues and will push my changes soon. Small question: is the number of folds always 4, or is it dataset dependent?

Yes, for datasets in the collection the number of folds is always 4.
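
Because the number of folds is fixed at 4, an evaluation loop over any dataset in the collection can hard-code that constant. The following is a minimal sketch of such a loop in plain Python. The helper `fold_indices` and its round-robin assignment of samples to folds are assumptions made purely for illustration; they do not describe how the collection itself defines its folds.

```python
from typing import Iterator, List, Tuple

# Per the answer above, this is the same for every dataset in the collection.
NUM_FOLDS = 4


def fold_indices(
    num_samples: int, num_folds: int = NUM_FOLDS
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) once per fold.

    Hypothetical round-robin split: sample i is held out in fold i % num_folds.
    """
    for fold in range(num_folds):
        test = [i for i in range(num_samples) if i % num_folds == fold]
        train = [i for i in range(num_samples) if i % num_folds != fold]
        yield train, test


# Materialize the four splits for a toy dataset of 10 samples.
splits = list(fold_indices(10))
```

Each sample appears in exactly one test fold, so averaging a metric over the four folds uses every sample for testing once.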

…torchhd into New_datasets

mikeheddes · 2022-12-17T11:19:41Z

@denkle could you review my refactoring of the _load_data methods? I want to make sure I didn't break it. Otherwise, I think it's good to go.

denkle · 2022-12-17T16:18:59Z

> @denkle could you review my refactoring of the _load_data methods? I want to make sure I didn't break it. Otherwise, I think it's good to go.

@mikeheddes, great revision of the code! The logic is more streamlined in multiple places. I do not see any problems with the _load_data methods, so I assume it is good to go.

denkle

Good to go, I believe

denkle and others added 2 commits November 19, 2022 23:21

Create the first attempt to integrate datasets from Do we need 100s..

be9df9c

[github-action] formatting fixes

3e39078

denkle requested a review from mikeheddes November 19, 2022 22:28

mikeheddes and others added 2 commits November 19, 2022 17:33

Faster download and extraction

168e9e2

[github-action] formatting fixes

1f8b7b1

mikeheddes reviewed Nov 20, 2022

mikeheddes and others added 7 commits November 22, 2022 21:38

Move dataset to Google Drive and add download progress bar

b103967

[github-action] formatting fixes

efe2b63

Add tqdm dependency

4746c35

Merge branch 'New_datasets' of github.com:hyperdimensional-computing/…

f3ba74a

…torchhd into New_datasets

Revisting logic of assigning data w.r.t. variables

884cbac

[github-action] formatting fixes

c6d88b0

Fix google drive download link extraction

e142121

denkle commented Nov 23, 2022

torchhd/datasets/abalone.py (outdated, resolved)

denkle and others added 2 commits December 10, 2022 18:56

Rework classes to streamline inclusion of new datasets from the colle…

abbd27d

…ctions

[github-action] formatting fixes

4aeacfb

denkle requested a review from mikeheddes December 10, 2022 17:58

mikeheddes reviewed Dec 12, 2022

denkle and others added 5 commits December 13, 2022 00:20

Revised some logic of classes

0ee717d

Resovling the merge conflict

[github-action] formatting fixes

97c28fc

Delete collection_datasets.py

0c924db

Removed DS store

9914e98

Delete __init__.py

eced027

denkle commented Dec 12, 2022

denkle requested a review from mikeheddes December 12, 2022 23:48

Merge branch 'main' into New_datasets

08382c0

mikeheddes and others added 2 commits December 15, 2022 10:55

Refactor datasets

256959b

[github-action] formatting fixes

a28e6e6

mikeheddes added 3 commits December 17, 2022 11:45

Update workflow python version

fdfb1cf

Refactor data loading

0f71570

Merge branch 'New_datasets' of github.com:hyperdimensional-computing/…

e21ebe8

…torchhd into New_datasets

denkle commented Dec 17, 2022

Update docs

8065953

mikeheddes approved these changes Dec 19, 2022

mikeheddes merged commit 92d3b4a into main Dec 19, 2022

mikeheddes deleted the New_datasets branch December 19, 2022 08:58
