-
-
Notifications
You must be signed in to change notification settings - Fork 249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dataset improvements #74
Conversation
Codecov Report
@@ Coverage Diff @@
## master #74 +/- ##
==========================================
+ Coverage 49.44% 51.17% +1.73%
==========================================
Files 57 57
Lines 3762 3875 +113
==========================================
+ Hits 1860 1983 +123
+ Misses 1902 1892 -10
Continue to review full report at Codecov.
|
awesome! I'm currently traveling and will be back end of december, but the changes are looking good 👍 for `map_targets`: just drop the Vec implementation, it is only confusing for the user
20.12.2020 10:59:02 Ivano Donadi <notifications@github.com>:
… @Sauro98 commented on this pull request.
----------------------------------------
In src/dataset/impl_dataset.rs[#74 (comment)]:
> + pub fn map_targets_array<T, G: FnMut(&E) -> T>(self, fnc: G) -> Dataset<F, T> {
+ let DatasetBase {
+ records,
+ targets,
+ weights,
+ ..
+ } = self;
+
+ let new_targets = targets.iter().map(fnc).collect::<Vec<T>>();
+ let new_targets = Array1::from_shape_vec(new_targets.len(), new_targets).unwrap();
+
+ DatasetBase {
+ records,
+ targets: new_targets,
+ weights,
+ }
+ }
I had to add it in order to change the return type of functions in the datasets sub-crate, since map_targets puts the new targets in a Vec
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub[#74 (review)], or unsubscribe[https://github.com/notifications/unsubscribe-auth/AAHRRKI5C2RDGRWM7N4TBULSVXDGDANCNFSM4VC6ZXLQ].
[data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAEIAAABCCAYAAADjVADoAAAAAXNSR0IArs4c6QAAAARzQklUCAgICHwIZIgAAAAnSURBVHic7cEBDQAAAMKg909tDwcUAAAAAAAAAAAAAAAAAAAAAPBjRFIAASHKmKkAAAAASUVORK5CYII=###24x24:true###][Tracking-Bild][https://github.com/notifications/beacon/AAHRRKKOZXGBQ5YQ7F46BJTSVXDGDA5CNFSM4VC6ZXL2YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOEER6Z7Q.gif]
|
yes!
20.12.2020 10:56:43 Ivano Donadi <notifications@github.com>:
… @Sauro98 commented on this pull request.
----------------------------------------
In src/dataset/mod.rs[#74 (comment)]:
> + assert_eq!(train.targets().len(), 25);
+ assert_eq!(val.targets().len(), 25);
split_with_ratio_view for type Dataset uses the already existing implementation of the method, but this returns a slice as the targets. Would it be better to re-implement it for Dataset so that it returns an ArrayView1 as the targets?
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub[#74 (review)], or unsubscribe[https://github.com/notifications/unsubscribe-auth/AAHRRKJD6PU6YUILRGMOPCTSVXC5NANCNFSM4VC6ZXLQ].
[data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAEIAAABCCAYAAADjVADoAAAAAXNSR0IArs4c6QAAAARzQklUCAgICHwIZIgAAAAnSURBVHic7cEBDQAAAMKg909tDwcUAAAAAAAAAAAAAAAAAAAAAPBjRFIAASHKmKkAAAAASUVORK5CYII=###24x24:true###][Tracking-Bild][https://github.com/notifications/beacon/AAHRRKLF5HIKMG3RO47QYELSVXC5NA5CNFSM4VC6ZXL2YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOEER6YRA.gif]
|
63a3b54
to
1b7ab08
Compare
Hi, I've added k-folding for
Of course, I'm here for any comments about the changes already implemented. Happy holidays ☃️ |
They are looking good, I think that documentation should really be improved. You can take this blogpost as a guide how to document functions: https://deterministic.space/machine-readable-inline-markdown-code-cocumentation.html
yes we should, but only once we have proper error handling with
thanks :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for
- switching to
DatasetBase
- extending testing and documentation
- introducing k-folding
I merged the naive bayes PR, can you rebase? |
one idea which would improve readability: |
…set and DatasetView
d674bfd
to
6e91574
Compare
9f0839f
to
2d29d8b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did not mean to start a review
66ebc93
to
f6572a1
Compare
Hi, I'm opening the PR a little early because I'd like to ask some questions. This PR wants to make the changes proposed in #70.
Dataset
has been renamedDatasetBase
Dataset
intoDatasetBase
Dataset
: ownedArray2
as records, ownedArray1
as targets. The current implementation allows to writeDataset<f24,f24>
orDataset<f24,usize>
and so on. Is this in line with the objective of the proposed changes?DatasetView
: view of anArray2
as records, view of anArray1
as targets. The current implementation allows to writeDatasetView<f24,f24>
orDatasetView<f24,usize>
and so on. Callingview()
on aDataset
produces aDatasetView
.datasets
have been changed to return an instance ofDataset
.There's still k-folding missing but I figured that it would be better to have an intermediate check. I also added tests to ensure the existence of the implementations of the required methods for the two new types and they also provide a look at how the two types can be used