-
Notifications
You must be signed in to change notification settings - Fork 86
grid search #154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
grid search #154
Conversation
Codecov Report
@@ Coverage Diff @@
## development #154 +/- ##
===============================================
+ Coverage 83.92% 84.08% +0.16%
===============================================
Files 81 82 +1
Lines 8733 9481 +748
===============================================
+ Hits 7329 7972 +643
- Misses 1404 1509 +105
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
This looks great!
If you have any idea about how to extend this (multiple metrics evaluation, randomized search), please open issues |
I think it will be a similar sized effort (slightly larger) to implement randomized search with similar functionality to scikit, e.g. sampling from distributions. There might be an intermediate implementation if we only added a sampling Iterator that operates on the same parameter vectors, but instead of exhaustively enumerating every possible combination, it just samples from the grid. I don't think that interface would be as nice though, since it would put more work on the caller to pre-generate their arrays, although that is one of the options for scikits random search. |
wow that's a lot of stuff! great work 👍🏼 Let's try to merge this then we move on to what is next. |
src/svm/svc.rs
Outdated
/// Tolerance for stopping epoch. | ||
pub tol: Vec<T>, | ||
/// The kernel function. | ||
pub kernel: Vec<K>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work, since we may want to compare multi different types of kernels, and Vec must contain only a single type. SVM Kernel is the only example of this issue among the existing parameters.
The options to fix don't appear straightforward to me. I haven't had success boxing the kernels, since they are templated themselves, and can't be dyn
. I started looking at a potential enum
based solution where we could store equal sized enum values that represent the kernels in a vec, but haven't gotten it working yet.
I'm adding a test case to illustrate the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enums can store values:
enum Kernels<T> {
Linear,
Polynomial(T, T, T),
}
fn main() {
let v = vec![
Kernels::<f32>::Linear,
Kernels::<f32>::Polynomial(32.0, 1.0, 5.0),
];
for i in &v {
match i {
Kernel::<f32>::Linear => {}
Kernel::<f32>::Polynomial(a, b, c) => {}
};
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've started outlining a solution with the commit below that moves from the current Kernel
implementations to an Enum version, but I believe this would be a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test failure appears to be unrelated to the refactor at a surface level, but due to the new test case exercising SVC in more depth. I could use some help understanding the issue further since I'm less familiar with how SVC is actually implemented.
/// * `parameter_search` - an iterator for parameters that will be tested. | ||
/// * `cv` - the cross-validation splitting strategy, should be an instance of [`BaseKFold`](./trait.BaseKFold.html) | ||
/// * `score` - a metric to use for evaluation, see [metrics](../metrics/index.html) | ||
pub fn grid_search<T, M, I, E, K, F, S>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a test case for this function, does this lint ignore tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change svr.rs
line 717 to let deserialized_svr: SVR<f64, DenseMatrix<f64>> = ...
and coverage will pass. No need to declare the last type.
One test is now failing for me:
---- model_selection::hyper_tuning::tests::svm_check stdout ----
thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 0', src/svm/svc.rs:688:28
failures:
model_selection::hyper_tuning::tests::svm_check
I think we grew this PR too much. If there are some blockers, we should rollback this one to ebeb987, merge those changes and open new PRs for the rest; otherwise everything can become hard to manage. |
18b041f
to
ebeb987
Compare
As long as you are OK with merging while there are issues, breaking this up into smaller PRs works for me. Even though it's not complete, I think this is a good step forward. |
I think it would be viable to add the tests that are failing and flag them with |
* grid search draft * hyperparam search for linear estimators
Implements a basic grid search for #153. This approach will requires that we implement Iterator SearchParameter structs for all estimators that have Parameters that can be searched.