-
Notifications
You must be signed in to change notification settings - Fork 86
First draft of the new n-dimensional arrays + NB use case #108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@morenol there are multiple problems with this PR, like |
Codecov Report
@@ Coverage Diff @@
## v0.5-wip #108 +/- ##
=========================================
Coverage 38.72% 38.72%
=========================================
Files 86 86
Lines 7031 7031
=========================================
Hits 2723 2723
Misses 4308 4308 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@VolodymyrOrlov this looks nice.
Since we are rethinking the way that the arrays are represented I think that we should consider the usage of const generics.
src/naive_bayes/categorical.rs
Outdated
//! ]); | ||
//! let y = vec![0., 0., 1., 1., 1., 0., 1., 0., 1., 1., 1., 1., 1., 0.]; | ||
//! let y: Vec<u32> = vec![0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is nice to be able to use normal integers for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that you make a call to .to_usize()
in fn log_likelihood
and fn fit
, and this function will fail if used on negative integer values. Why would you like to support integers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry, I was trying to say that it is good to be able to use other types different than floats (the constraint that we currently have in the main branch)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see! I thought you want to make sure we can run CategoricalNB on integer target variables. Yes, that's the plan! Floats are great, but we do need to support other types as well. That's one reason why I decided to try new layer on Naive Bayes first, since NB methods do not apply to floating-point data most of the time.
Any concrete ideas how we can use const generics to improve the proposed abstract layer? For example, const generics can be used to define number of dimensions in S |
I think that I edited your comment instead of a response: here it is
I am not sure how we can use them in the abstract layer. I was thinking in the structs defined in linalg::dense::. We can create an special type for static sized arrays. But maybe it is better to keep it simple and use nalgebra and ndarray when we want that kind of optimization |
Great work Volodymyr! This looks really promising.
Will help with documentation as soon as it is relatively stable. @Steboss check it out! |
That's right! The goal is to build a rich interface for 1- and 2-dimensional arrays that can be extended by anyone later. On one hand we can make ML algorithms more efficient by relying on a concrete library, like nalgebra or ndarray. Investing time into an abstract layer that facilitates integration with 3rd partly libraries, on the other hand, gives us a lot of flexibility and saves time later, if we find it necessary to bring in some other library in the future. |
@morenol please take a look, there are still missing implementations but the code is at least cleaned up. |
|
please fetch this branch and run |
I can take a look later today |
sure, when you have time. If we can fix the static analyzer (in |
@montanalow please take a look to |
great job @morenol |
not completely sure, we probably need to split in several PRs once we have all the conflicts solved in order to make it easy to review |
I've got most of the merge conflicts sorted in my personal branch, but we need to apply many of the updates in this branch to the new search params, and apply the rng changes to this branch. I think I understand it enough to get it done in an hour or so. |
@montanalow great! I was splitting the changes into multiple branches but if you can do in one pass that would be awesome. |
#171 is up that should resolve all merge conflicts in this branch, although it's a fairly large change set... |
7c88c2f
to
088f7e3
Compare
The library now compiles but there are still a lot of runtime errors and tests to fix. I have not yet a clear understanding about what it is going to be the substitute for I am not still sure we can merge this all at once, I will probably start opening a PR to merge |
I am seeing discrepancies between the |
please help. it is quite easy, just follow the implementations in the other ported modules. cc: @morenol @montanalow @titoeb Missing modules,
|
All work moved to https://github.com/smartcorelib/smartcore/tree/v0.5-wip |
This PR will eventually solve #85. The goal is to have an abstract layer that can be implemented by concrete library that enables all traditional functionality provided by n-dimensional arrays: array manipulation, transformation, basic linear algebra routines, views e.t.c.
The abstract layer I am proposing here enables us to:
As a bonus, this PR introduces a row-major dense matrix that might get handy in some cases.