Samples second pass for Clustering Trainer #3317
Conversation
{
    Label = (uint)label,
    // Create random features with two clusters.
    // The first half has feature values cetered around 0.6 the second half has values centered around 0.4.
Typo: "cetered" should be "centered". #Resolved
// The coordinates of centroid 0 are: (26, 6, 1)
// Expected output similar to:
// The first 3 coordinates of the first centroid are: (0.6035213, 0.6017533, 0.5964218)
// The first 3 coordinates of the second centroid are: (0.4031044, 0.4175443, 0.4082336)
//
// Note: use the advanced options constructor to set the number of threads to 1 for a deterministic behavior.
// Note: use the advanced options constructor to set the number of threads to 1 for a deterministic behavior.
This comment seems out of context. Specifically, what does "advanced options constructor" mean here? #Resolved
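For readers with the same question, here is a minimal sketch of what the note most likely refers to: the `KMeans` overload that takes a `KMeansTrainer.Options` object rather than inline arguments. It assumes the Microsoft.ML NuGet package is referenced; the `Point2D` type, the `DeterministicKMeansSketch` class, and the toy data are illustrative and not part of this PR.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

// Hypothetical input type for this sketch (not the DataPoint class from the PR).
public class Point2D
{
    [VectorType(2)]
    public float[] Features { get; set; }
}

public static class DeterministicKMeansSketch
{
    // Options object passed to the "advanced" overload of the trainer.
    public static KMeansTrainer.Options BuildOptions() => new KMeansTrainer.Options
    {
        NumberOfClusters = 2,
        // A single thread removes thread-scheduling effects from training.
        NumberOfThreads = 1
    };

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Toy data (an assumption for illustration): ten points near 0.6, ten near 0.4.
        var points = Enumerable.Range(0, 20)
            .Select(i => new Point2D
            {
                Features = i < 10
                    ? new[] { 0.6f + 0.01f * i, 0.6f - 0.01f * i }
                    : new[] { 0.4f + 0.01f * (i - 10), 0.4f - 0.01f * (i - 10) }
            })
            .ToList();

        var data = mlContext.Data.LoadFromEnumerable(points);

        // Options overload instead of the inline-argument overload.
        var model = mlContext.Clustering.Trainers.KMeans(BuildOptions()).Fit(data);

        VBuffer<float>[] centroids = default;
        model.Model.GetClusterCentroids(ref centroids, out int k);
        Console.WriteLine($"Trained {k} centroids.");
    }
}
```

If the note stays in the sample, naming `KMeansTrainer.Options` explicitly would probably answer the question above.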
foreach (var p in predictions.Take(2))
    Console.WriteLine($"Label: {p.Label}, Prediction: {p.PredictedLabel}");
foreach (var p in predictions.TakeLast(3))
    Console.WriteLine($"Label: {p.Label}, Prediction: {p.PredictedLabel}");
Most of the time, clustering doesn't have a label. Maybe we should make one sample for labeled clustering, one for label-free clustering. #Resolved
I added some comments to explain that the Label column is not used during training. This is simply for comparison with the predicted label.
In reply to: 275080292
<#=ExpectedCentroidsOutput#>
}

private static IEnumerable<DataPoint> GenerateRandomDataPoints(int count, int seed = 0)
int count
Set this to the default used above. #Resolved
<#=ExpectedCentroidsOutput#>
}

private static IEnumerable<DataPoint> GenerateRandomDataPoints(int count, int seed = 0)
int seed = 0
I'm superstitious about passing a 0 to someone else's random number generator. Maybe 1? #Pending
We use 0 in all our templates for the method GenerateRandomDataPoints. I don't see why it should be an issue.
In reply to: 275080461
private static IEnumerable<DataPoint> GenerateRandomDataPoints(int count, int seed = 0)
{
    var random = new Random(seed);
    float randomFloat() => (float)random.NextDouble();
(float)random.NextDouble()
Suggest subtracting 0.5f: (float)random.NextDouble() - 0.5f
#Pending
I don't think that centering at 0 will give any significant advantage.
In reply to: 275080652
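To make the seed and centering discussion concrete, here is a rough, dependency-free sketch of how the generator could look when the hunks above are pieced together. The feature count of 50, the `+ 0.1f` / `- 0.1f` offsets, and the `DataGenerationSketch` class name are assumptions chosen to match the "centered around 0.6 / 0.4" comment; they are not confirmed by the diff shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class DataPoint
{
    // Kept only for comparison with the predicted cluster; not used in training.
    public uint Label { get; set; }
    public float[] Features { get; set; }
}

public static class DataGenerationSketch
{
    // featureCount and the 0.1f offsets are assumptions for illustration.
    public static IEnumerable<DataPoint> GenerateRandomDataPoints(
        int count, int seed = 0, int featureCount = 50)
    {
        var random = new Random(seed); // the same seed yields the same sequence
        float randomFloat() => (float)random.NextDouble(); // uniform in [0, 1)

        for (int i = 0; i < count; i++)
        {
            int label = i < count / 2 ? 0 : 1;
            yield return new DataPoint
            {
                Label = (uint)label,
                // First half: mean 0.5 + 0.1 = 0.6; second half: mean 0.5 - 0.1 = 0.4.
                Features = Enumerable.Range(0, featureCount)
                    .Select(_ => label == 0 ? randomFloat() + 0.1f : randomFloat() - 0.1f)
                    .ToArray()
            };
        }
    }

    public static void Main()
    {
        var first = GenerateRandomDataPoints(100).ToList();
        var second = GenerateRandomDataPoints(100).ToList();

        // Two runs with the default seed should produce identical features.
        bool reproducible = first
            .Zip(second, (a, b) => a.Features.SequenceEqual(b.Features))
            .All(same => same);
        Console.WriteLine($"Same seed reproduces the data: {reproducible}");

        float meanFirstHalf = first.Take(50).SelectMany(p => p.Features).Average();
        float meanSecondHalf = first.Skip(50).SelectMany(p => p.Features).Average();
        Console.WriteLine($"Mean of first half: {meanFirstHalf:F2}, second half: {meanSecondHalf:F2}");
    }
}
```

Under these assumptions, subtracting 0.5f would move both cluster means by the same amount and leave the gap between them unchanged, which is consistent with the reply above.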
string TrainerOptions = @"KMeansTrainer.Options
{
    NumberOfClusters = 2,
    MaximumNumberOfIterations = 100,
MaximumNumberOfIterations = 100
I don't think we should suggest tuning this. Mostly we want this option to not be hit. (I consider things we use in our samples to be suggestions.) #Resolved
string ClassName = "KMeans";
string Trainer = "KMeans";
string TrainerOptions = null;
string InlineTrainerOptions = "numberOfClusters: 2";
numberOfClusters: 2
10
Codecov Report
@@ Coverage Diff @@
## master #3317 +/- ##
=========================================
Coverage ? 72.69%
=========================================
Files ? 807
Lines ? 145172
Branches ? 16225
=========================================
Hits ? 105539
Misses ? 35220
Partials ? 4413
fc2c2b2 to 421e91e
421e91e to 0e31b8b
Tracked in #2522.
This PR removes the dependency on SampleUtils for the clustering trainer samples (KMeans samples).