Skip to content

Add seed params to search params #168

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 21, 2022

Conversation

montanalow
Copy link
Collaborator

#164 & #166 didn't have a direct conflict, but they did race while adding both seeds and search params to certain classes. This completes them together.

@Mec-iS
Copy link
Collaborator

Mec-iS commented Sep 21, 2022

why tests did not spot these missing parts? do we have something uncovered?

@Mec-iS Mec-iS merged commit 403d3f2 into smartcorelib:development Sep 21, 2022
@montanalow
Copy link
Collaborator Author

As far as I can tell, this code doesn't even compile. I think tests pass independently on branchs, and the breakage only shows up when both branches are merged into master.

@@ -145,13 +145,17 @@ pub struct KMeansSearchParameters {
pub k: Vec<usize>,
/// Maximum number of iterations of the k-means algorithm for a single run.
pub max_iter: Vec<usize>,
/// Determines random number generation for centroid initialization.
/// Use an int to make the randomness deterministic
pub seed: Vec<Option<u64>>,
Copy link
Collaborator

@morenol morenol Sep 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need Vec of Option? It looks to me that this should be either Option or Option<Vec<u64>>

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I think the cleanest type would be be Vec, although that'll require a little extra logic to convert to/from the Option seed type during iteration. I can take this up.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not completely sure if this is equivalent one-to-one, but in scikit it seems that random_state is only a number instead of a Vec, so not sure if we should try to match that here with a Option<u64> instead of the Vec

Copy link
Collaborator Author

@montanalow montanalow Sep 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're looking at normal random_state params for estimators. Scikit grid search accepts a Dict with values that are lists.

In theory, I agree that it would seem odd to me that people would want to grid search with multiple random seeds, but in practice I think that making this the only *SearchParameter that is scalar would be more surprising.

morenol added a commit that referenced this pull request Sep 22, 2022
* Complete grid search params (#166)

* grid search draft

* hyperparam search for linear estimators

* grid search for ensembles

* support grid search for more algos

* grid search for unsupervised algos

* minor cleanup

* Lmm/add seeds in more algorithms (#164)

* Provide better output in flaky tests

* feat: add seed parameter to multiple algorithms

* Update changelog

Co-authored-by: Luis Moreno <morenol@users.noreply.github.com>

* add seed param to search params (#168)

* make default params available to serde (#167)

* add seed param to search params

* make default params available to serde

* lints

* create defaults for enums

* lint

Co-authored-by: morenol <22335041+morenol@users.noreply.github.com>
Co-authored-by: Luis Moreno <morenol@users.noreply.github.com>
morenol pushed a commit that referenced this pull request Nov 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants