Skip to content

grid search #154

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 19, 2022
Merged

Conversation

montanalow
Copy link
Collaborator

@montanalow montanalow commented Sep 15, 2022

Implements a basic grid search for #153. This approach will requires that we implement Iterator SearchParameter structs for all estimators that have Parameters that can be searched.

@codecov-commenter
Copy link

codecov-commenter commented Sep 15, 2022

Codecov Report

Merging #154 (e4c02e3) into development (e445f0d) will increase coverage by 0.16%.
The diff coverage is 84.21%.

@@               Coverage Diff               @@
##           development     #154      +/-   ##
===============================================
+ Coverage        83.92%   84.08%   +0.16%     
===============================================
  Files               81       82       +1     
  Lines             8733     9481     +748     
===============================================
+ Hits              7329     7972     +643     
- Misses            1404     1509     +105     
Impacted Files Coverage Δ
src/model_selection/hyper_tuning.rs 0.00% <0.00%> (ø)
src/svm/mod.rs 86.48% <ø> (ø)
src/ensemble/random_forest_classifier.rs 73.82% <77.67%> (+2.79%) ⬆️
src/ensemble/random_forest_regressor.rs 74.76% <80.00%> (+4.17%) ⬆️
src/linear/linear_regression.rs 78.87% <81.25%> (+0.69%) ⬆️
src/svm/svc.rs 88.68% <81.25%> (-1.08%) ⬇️
src/svm/svr.rs 86.19% <81.25%> (-1.08%) ⬇️
src/linear/ridge_regression.rs 75.80% <85.29%> (+3.58%) ⬆️
src/naive_bayes/bernoulli.rs 74.05% <85.71%> (+2.72%) ⬆️
src/linear/elastic_net.rs 78.53% <85.93%> (+4.19%) ⬆️
... and 13 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@Mec-iS
Copy link
Collaborator

Mec-iS commented Sep 15, 2022

This looks great!

  • maybe it would be better to not use mod.rs but a dedicated hyper_tuning.rs module
  • In general would be better to have more accurate assertions in the tests as far as possible, like asserting on values or arrays assert_eq(result, expected). Also reusing existing tests to get expected values.

If you have any idea about how to extend this (multiple metrics evaluation, randomized search), please open issues

@Mec-iS Mec-iS requested review from morenol and Mec-iS September 15, 2022 10:29
@montanalow montanalow marked this pull request as ready for review September 15, 2022 18:46
@montanalow montanalow changed the title grid search draft grid search Sep 15, 2022
@montanalow
Copy link
Collaborator Author

I think it will be a similar sized effort (slightly larger) to implement randomized search with similar functionality to scikit, e.g. sampling from distributions. There might be an intermediate implementation if we only added a sampling Iterator that operates on the same parameter vectors, but instead of exhaustively enumerating every possible combination, it just samples from the grid. I don't think that interface would be as nice though, since it would put more work on the caller to pre-generate their arrays, although that is one of the options for scikits random search.

@Mec-iS
Copy link
Collaborator

Mec-iS commented Sep 15, 2022

wow that's a lot of stuff! great work 👍🏼

Let's try to merge this then we move on to what is next.

src/svm/svc.rs Outdated
/// Tolerance for stopping epoch.
pub tol: Vec<T>,
/// The kernel function.
pub kernel: Vec<K>,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work, since we may want to compare multi different types of kernels, and Vec must contain only a single type. SVM Kernel is the only example of this issue among the existing parameters.

The options to fix don't appear straightforward to me. I haven't had success boxing the kernels, since they are templated themselves, and can't be dyn. I started looking at a potential enum based solution where we could store equal sized enum values that represent the kernels in a vec, but haven't gotten it working yet.

I'm adding a test case to illustrate the issue.

Copy link

@levkk levkk Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enums can store values:

enum Kernels<T> {
    Linear,
    Polynomial(T, T, T),
}

fn main() {
    let v = vec![
        Kernels::<f32>::Linear,
        Kernels::<f32>::Polynomial(32.0, 1.0, 5.0),
    ];

    for i in &v {
        match i {
            Kernel::<f32>::Linear => {}

            Kernel::<f32>::Polynomial(a, b, c) => {}
        };
    }
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've started outlining a solution with the commit below that moves from the current Kernel implementations to an Enum version, but I believe this would be a breaking change.

Copy link
Collaborator Author

@montanalow montanalow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test failure appears to be unrelated to the refactor at a surface level, but due to the new test case exercising SVC in more depth. I could use some help understanding the issue further since I'm less familiar with how SVC is actually implemented.

/// * `parameter_search` - an iterator for parameters that will be tested.
/// * `cv` - the cross-validation splitting strategy, should be an instance of [`BaseKFold`](./trait.BaseKFold.html)
/// * `score` - a metric to use for evaluation, see [metrics](../metrics/index.html)
pub fn grid_search<T, M, I, E, K, F, S>(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a test case for this function, does this lint ignore tests?

Copy link
Collaborator

@Mec-iS Mec-iS Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change svr.rs line 717 to let deserialized_svr: SVR<f64, DenseMatrix<f64>> = ... and coverage will pass. No need to declare the last type.

One test is now failing for me:

---- model_selection::hyper_tuning::tests::svm_check stdout ----
thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 0', src/svm/svc.rs:688:28


failures:
    model_selection::hyper_tuning::tests::svm_check

@Mec-iS
Copy link
Collaborator

Mec-iS commented Sep 16, 2022

I think we grew this PR too much. If there are some blockers, we should rollback this one to ebeb987, merge those changes and open new PRs for the rest; otherwise everything can become hard to manage.

@montanalow
Copy link
Collaborator Author

As long as you are OK with merging while there are issues, breaking this up into smaller PRs works for me. Even though it's not complete, I think this is a good step forward.

@Mec-iS
Copy link
Collaborator

Mec-iS commented Sep 16, 2022

I think it would be viable to add the tests that are failing and flag them with #[ignore] (so the checks pass but we keep track about what is failing).

@Mec-iS Mec-iS merged commit 4685fc7 into smartcorelib:development Sep 19, 2022
morenol pushed a commit that referenced this pull request Nov 8, 2022
* grid search draft
* hyperparam search for linear estimators
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants