AutoML Add Recommendation Task #4246
I think we'd want to use `HashToKey` (the name may be off) instead of the `ValueToKey` you mentioned, because `ValueToKey` will map future unseen values in your test dataset to NA. As a lesser issue, `ValueToKey` is also slow, since it takes a full pass over the dataset to build its dictionary.
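To make the difference concrete, here is a minimal Python sketch, not ML.NET code, contrasting the two strategies. `DictionaryKeyMapper` and `HashKeyMapper` are hypothetical names. The dictionary version needs a full pass over the training data and returns `None` (standing in for NA) for values it never saw; the hashing version needs no training pass and always maps a value into a fixed number of buckets, at the cost of possible collisions.

```python
import hashlib
from typing import Dict, Iterable, Optional


class DictionaryKeyMapper:
    """Hypothetical value-to-key mapper: learns a dictionary from training data."""

    def __init__(self, training_values: Iterable[str]) -> None:
        # Building the dictionary requires a full pass over the training data.
        self.keys: Dict[str, int] = {}
        for value in training_values:
            if value not in self.keys:
                # Keys start at 1; None stands in for the missing (NA) key.
                self.keys[value] = len(self.keys) + 1

    def map(self, value: str) -> Optional[int]:
        # A value never seen during training has no key, so it comes back as None (NA).
        return self.keys.get(value)


class HashKeyMapper:
    """Hypothetical hash-to-key mapper: no training pass, fixed number of buckets."""

    def __init__(self, num_bits: int = 16) -> None:
        self.num_buckets = 1 << num_bits

    def map(self, value: str) -> int:
        # Use a stable digest (Python's built-in hash() is salted per process).
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        # Every value, seen or unseen, lands on a key in [1, num_buckets];
        # distinct values may collide in the same bucket.
        return int.from_bytes(digest[:4], "little") % self.num_buckets + 1


train_users = ["alice", "bob", "alice"]
dict_mapper = DictionaryKeyMapper(train_users)
hash_mapper = HashKeyMapper()
print(dict_mapper.map("carol"))  # None: the unseen user becomes NA
print(hash_mapper.map("carol"))  # some key in [1, 65536]
```

The trade-off is the one raised above: the hashed mapping never produces NA for new users or items, but it gives up a one-to-one mapping in exchange.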
Please copy the sweep ranges and hyperparameters from `machinelearning/src/Microsoft.ML.Recommender/MatrixFactorizationTrainer.cs`, lines 173 to 250 (at commit edfd10f).
Ideally, in the future, we should access them directly from the learner instead of having our copy in AutoML.
It seems that AutoML will try 5 * 4 * 3 * 3 * ... sets of parameters on MatrixFactorization. Won't that take too much time?
tl;dr: SMAC focuses our search effort, and we are bounded by our total runtime.
This is expected (and the space should be even larger). Bayesian hyperparameter optimization (SMAC) concentrates on the useful areas of the search space rather than trying every combination.
For selected trainers, we first run 20 iterations of random sweeping to warm up the search. SMAC then uses the results of those 20 iterations to predict which areas of the hyperparameter space are best to explore next. That choice is based on which areas are performing best and which areas are unexplored or uncertain.
Beyond SMAC, there are additional hyperparameter optimization algorithms, like KDO, that we should be using.
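As a rough illustration of the warm-up-then-guided idea, here is a toy Python sketch. It is not SMAC: SMAC as originally described fits a random-forest surrogate and picks points by expected improvement, whereas this sketch uses a k-nearest-neighbour surrogate and a simple "predicted loss minus uncertainty bonus" score. The search space, `fake_loss`, and every function name are hypothetical; only the 20 warm-up iterations come from the comment above.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
History = List[Tuple[Params, float]]


def sample_params(rng: random.Random) -> Params:
    # Hypothetical search space: log10 of the learning rate and a latent-factor rank.
    return {"log10_lr": rng.uniform(-4.0, -1.0), "rank": float(rng.randint(8, 128))}


def distance(a: Params, b: Params) -> float:
    # Scale each dimension by its range so both contribute comparably.
    return math.hypot((a["log10_lr"] - b["log10_lr"]) / 3.0, (a["rank"] - b["rank"]) / 120.0)


def surrogate(candidate: Params, history: History, k: int = 3) -> Tuple[float, float]:
    # Toy surrogate: predicted loss is the mean loss of the k nearest evaluated points;
    # uncertainty is their mean distance (far from any evaluated point = more uncertain).
    nearest = sorted(history, key=lambda h: distance(candidate, h[0]))[:k]
    predicted = sum(loss for _, loss in nearest) / len(nearest)
    uncertainty = sum(distance(candidate, p) for p, _ in nearest) / len(nearest)
    return predicted, uncertainty


def toy_model_based_search(
    objective: Callable[[Params], float],
    n_warmup: int = 20,
    n_guided: int = 30,
    n_candidates: int = 200,
    exploration: float = 1.0,
    seed: int = 0,
) -> Tuple[Params, float, History]:
    rng = random.Random(seed)
    history: History = []
    # Phase 1: random warm-up to give the surrogate something to learn from.
    for _ in range(n_warmup):
        params = sample_params(rng)
        history.append((params, objective(params)))

    def acquisition(candidate: Params) -> float:
        # Lower is better: low predicted loss (exploit) minus a bonus for uncertainty (explore).
        predicted, uncertainty = surrogate(candidate, history)
        return predicted - exploration * uncertainty

    # Phase 2: score many random candidates cheaply with the surrogate and
    # spend a real (expensive) evaluation only on the most promising one.
    for _ in range(n_guided):
        candidates = [sample_params(rng) for _ in range(n_candidates)]
        chosen = min(candidates, key=acquisition)
        history.append((chosen, objective(chosen)))

    best_params, best_loss = min(history, key=lambda h: h[1])
    return best_params, best_loss, history


def fake_loss(params: Params) -> float:
    # Hypothetical objective with its minimum at log10_lr = -2.5, rank = 64.
    return (params["log10_lr"] + 2.5) ** 2 + ((params["rank"] - 64.0) / 64.0) ** 2


best_params, best_loss, history = toy_model_based_search(fake_loss)
print(best_params, best_loss)
```

The point of the design is that the number of real evaluations (here 50) is fixed by the budget, not by the size of the grid, which is why a large 5 * 4 * 3 * 3 * ... space is not a problem in itself.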
Regarding your comment in the PR description:
Is AutoML not stopping at the specified timeout? If you set it to 60 s of runtime, it should stop soon after that limit.
AutoML is designed to round-robin among three trainers, culled from 8 to 11 candidates depending on the task. Since this task has only one trainer to choose from, perhaps the AutoML code needs updating?
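A hypothetical Python sketch (not ML.NET's actual experiment loop) shows why a time budget ends an experiment "soon after" rather than exactly at the limit, and what round-robin degenerates to with a single trainer. `run_experiment` and `run_trial` are illustrative names only.

```python
import itertools
import time
from typing import Callable, List, Tuple


def run_experiment(
    trainers: List[str],
    run_trial: Callable[[str], float],
    max_seconds: float,
) -> List[Tuple[str, float]]:
    """Hypothetical AutoML-style loop: round-robin over trainers until the time budget is spent."""
    deadline = time.monotonic() + max_seconds
    results: List[Tuple[str, float]] = []
    for trainer in itertools.cycle(trainers):
        # The budget is checked only between trials, so a trial that starts just
        # before the deadline still runs to completion: the experiment stops
        # soon after the limit rather than exactly at it.
        if time.monotonic() >= deadline:
            break
        results.append((trainer, run_trial(trainer)))
    return results
```

With a single candidate such as `["MatrixFactorization"]`, `itertools.cycle` simply repeats that one trainer, so every trial goes to it until the budget runs out.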
It turns out that I forgot to set the experiment time, so it defaulted to 16400... My bad.