-
Notifications
You must be signed in to change notification settings - Fork 86
Add seed params to search params #168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
why tests did not spot these missing parts? do we have something uncovered? |
As far as I can tell, this code doesn't even compile. I think tests pass independently on branchs, and the breakage only shows up when both branches are merged into master. |
@@ -145,13 +145,17 @@ pub struct KMeansSearchParameters { | |||
pub k: Vec<usize>, | |||
/// Maximum number of iterations of the k-means algorithm for a single run. | |||
pub max_iter: Vec<usize>, | |||
/// Determines random number generation for centroid initialization. | |||
/// Use an int to make the randomness deterministic | |||
pub seed: Vec<Option<u64>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need Vec of Option? It looks to me that this should be either Option or Option<Vec<u64>>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I think the cleanest type would be be Vec, although that'll require a little extra logic to convert to/from the Option seed type during iteration. I can take this up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not completely sure if this is equivalent one-to-one, but in scikit it seems that random_state is only a number instead of a Vec, so not sure if we should try to match that here with a Option<u64>
instead of the Vec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're looking at normal random_state params for estimators. Scikit grid search accepts a Dict
with values that are lists.
In theory, I agree that it would seem odd to me that people would want to grid search with multiple random seeds, but in practice I think that making this the only *SearchParameter that is scalar would be more surprising.
* Complete grid search params (#166) * grid search draft * hyperparam search for linear estimators * grid search for ensembles * support grid search for more algos * grid search for unsupervised algos * minor cleanup * Lmm/add seeds in more algorithms (#164) * Provide better output in flaky tests * feat: add seed parameter to multiple algorithms * Update changelog Co-authored-by: Luis Moreno <morenol@users.noreply.github.com> * add seed param to search params (#168) * make default params available to serde (#167) * add seed param to search params * make default params available to serde * lints * create defaults for enums * lint Co-authored-by: morenol <22335041+morenol@users.noreply.github.com> Co-authored-by: Luis Moreno <morenol@users.noreply.github.com>
#164 & #166 didn't have a direct conflict, but they did race while adding both seeds and search params to certain classes. This completes them together.