
Samples second pass for Clustering Trainer #3317


Merged: 2 commits merged into dotnet:master on Apr 17, 2019


artidoro
Contributor

Tracked in #2522.

This PR removes the dependency on SampleUtils for clustering trainers samples (KMeans samples).

@artidoro artidoro added the documentation Related to documentation of ML.NET label Apr 12, 2019
@artidoro artidoro self-assigned this Apr 12, 2019
{
Label = (uint)label,
// Create random features with two clusters.
// The first half has feature values centered around 0.6; the second half has values centered around 0.4.
Contributor

@zeahmed zeahmed Apr 12, 2019


cetered

centered #Resolved

// The coordinates of centroid 0 are: (26, 6, 1)
// Expected output similar to:
// The first 3 coordinates of the first centroid are: (0.6035213, 0.6017533, 0.5964218)
// The first 3 coordinates of the second centroid are: (0.4031044, 0.4175443, 0.4082336)
//
// Note: use the advanced options constructor to set the number of threads to 1 for a deterministic behavior.
Contributor

@zeahmed zeahmed Apr 12, 2019



This comment seems out of context. Specifically, what does "advanced options constructor" mean here? #Resolved

Contributor

@zeahmed zeahmed left a comment


:shipit:

foreach (var p in predictions.Take(2))
Console.WriteLine($"Label: {p.Label}, Prediction: {p.PredictedLabel}");
foreach (var p in predictions.TakeLast(3))
Console.WriteLine($"Label: {p.Label}, Prediction: {p.PredictedLabel}");
Contributor

@rogancarr rogancarr Apr 12, 2019


Most of the time, clustering doesn't have a label. Maybe we should make one sample for labeled clustering, one for label-free clustering. #Resolved

Contributor Author


I added some comments to explain that the Label column is not used during training. This is simply for comparison with the predicted label.


In reply to: 275080292
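To illustrate the point that the Label column is only used for comparison, here is a minimal k-means sketch in plain C# that never receives the labels. It is a stand-in under stated assumptions, not ML.NET's KMeansTrainer: it runs Lloyd's iterations with farthest-first initialization, and the SimpleKMeans class and its Fit method are hypothetical names. The labels are compared with the cluster assignments only after fitting.

```csharp
using System;
using System.Linq;

// Hypothetical illustration: plain Lloyd's k-means with farthest-first
// initialization. This is NOT ML.NET's KMeansTrainer; it only shows that
// clustering reads the features alone, since no labels are passed in.
public static class SimpleKMeans
{
    static double SquaredDistance(float[] a, float[] b)
    {
        double sum = 0;
        for (int j = 0; j < a.Length; j++)
        {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the centroid closest to the point (ties go to the lower index).
    static int Nearest(float[] point, float[][] centroids)
    {
        int best = 0;
        for (int c = 1; c < centroids.Length; c++)
            if (SquaredDistance(point, centroids[c]) < SquaredDistance(point, centroids[best]))
                best = c;
        return best;
    }

    public static (float[][] Centroids, int[] Assignments) Fit(
        float[][] points, int k, int maxIterations = 100, int seed = 1)
    {
        var random = new Random(seed);
        // Farthest-first initialization: start from a random point, then
        // repeatedly add the point farthest from all centroids chosen so far.
        var centroids = new float[k][];
        centroids[0] = (float[])points[random.Next(points.Length)].Clone();
        for (int c = 1; c < k; c++)
        {
            var chosen = centroids.Take(c).ToArray();
            float[] farthest = points
                .OrderByDescending(p => chosen.Min(ch => SquaredDistance(p, ch)))
                .First();
            centroids[c] = (float[])farthest.Clone();
        }

        // -1 means "not assigned yet", so the first pass always counts as a change.
        var assignments = Enumerable.Repeat(-1, points.Length).ToArray();
        for (int iter = 0; iter < maxIterations; iter++)
        {
            // Assignment step: attach every point to its nearest centroid.
            bool changed = false;
            for (int i = 0; i < points.Length; i++)
            {
                int nearest = Nearest(points[i], centroids);
                if (nearest != assignments[i]) { assignments[i] = nearest; changed = true; }
            }
            if (!changed) break; // converged: no point switched clusters

            // Update step: move each centroid to the mean of its members.
            for (int c = 0; c < k; c++)
            {
                var members = points.Where((_, idx) => assignments[idx] == c).ToArray();
                if (members.Length == 0) continue; // keep an empty cluster's centroid in place
                for (int j = 0; j < centroids[c].Length; j++)
                    centroids[c][j] = (float)members.Average(m => m[j]);
            }
        }
        return (centroids, assignments);
    }
}
```

Cluster indices are arbitrary, so a comparison against the labels should check that each label maps consistently to a single cluster, not that label 0 equals cluster 0.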

<#=ExpectedCentroidsOutput#>
}

private static IEnumerable<DataPoint> GenerateRandomDataPoints(int count, int seed = 0)
Contributor

@rogancarr rogancarr Apr 12, 2019


int count

Set this to the default used above. #Resolved

Contributor

@rogancarr rogancarr Apr 12, 2019


int seed = 0

I'm superstitious about passing a 0 to someone else's random number generator. Maybe 1? #Pending

Contributor Author


We use 0 in all our templates for the GenerateRandomDataPoints method. I don't see why it should be an issue.


In reply to: 275080461

private static IEnumerable<DataPoint> GenerateRandomDataPoints(int count, int seed = 0)
{
var random = new Random(seed);
float randomFloat() => (float)random.NextDouble();
Contributor

@rogancarr rogancarr Apr 12, 2019


(float)random.NextDouble()

- 0.5f (that is, (float)random.NextDouble() - 0.5f, so the values are centered at 0) #Pending

Contributor Author


I don't think that centering at 0 will give any significant advantage.


In reply to: 275080652
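The two-cluster data generation discussed in these threads can be sketched in plain C# without ML.NET. This is a minimal sketch under stated assumptions, not the code merged in this PR: the simplified DataPoint class, the FeatureCount of 50, and the +0.1/-0.1 offsets are hypothetical choices that reproduce the "centered around 0.6 / centered around 0.4" description from the sample comment.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for the sample's DataPoint class.
// In the real sample, Features would be an ML.NET vector column.
public class DataPoint
{
    public uint Label { get; set; }
    public float[] Features { get; set; }
}

public static class TwoClusterData
{
    // Assumed feature dimension; the PR excerpt does not show the real value.
    public const int FeatureCount = 50;

    public static IEnumerable<DataPoint> GenerateRandomDataPoints(int count, int seed = 0)
    {
        var random = new Random(seed);
        // Uniform on [0, 1), so each raw value has mean 0.5.
        float randomFloat() => (float)random.NextDouble();
        for (int i = 0; i < count; i++)
        {
            // First half of the points gets label 0, second half label 1.
            uint label = i < count / 2 ? 0u : 1u;
            // Assumed offsets: +0.1 centers label 0 around 0.6, -0.1 centers label 1 around 0.4.
            float offset = label == 0 ? 0.1f : -0.1f;
            yield return new DataPoint
            {
                Label = label,
                Features = Enumerable.Range(0, FeatureCount)
                    .Select(_ => randomFloat() + offset)
                    .ToArray()
            };
        }
    }
}
```

Because k-means depends only on distances between points, subtracting 0.5f from every value would translate both clusters by the same amount and leave the cluster assignments unchanged, which is consistent with the reply above.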

string TrainerOptions = @"KMeansTrainer.Options
{
NumberOfClusters = 2,
MaximumNumberOfIterations = 100,
Contributor

@rogancarr rogancarr Apr 12, 2019


MaximumNumberOfIterations = 100

I don't think we should suggest tuning this. Mostly we want this option to not be hit. (I consider things we use in our samples to be suggestions.) #Resolved

string ClassName = "KMeans";
string Trainer = "KMeans";
string TrainerOptions = null;
string InlineTrainerOptions = "numberOfClusters: 2";
Contributor


numberOfClusters: 2

10

Contributor Author


Are you suggesting to use more clusters?


In reply to: 275080967


@shmoradims shmoradims left a comment


:shipit:

@codecov

codecov bot commented Apr 15, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@8644b3b).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master    #3317   +/-   ##
=========================================
  Coverage          ?   72.69%           
=========================================
  Files             ?      807           
  Lines             ?   145172           
  Branches          ?    16225           
=========================================
  Hits              ?   105539           
  Misses            ?    35220           
  Partials          ?     4413
Flag          Coverage Δ
#Debug        72.69% <ø> (?)
#production   68.23% <ø> (?)
#test         88.97% <ø> (?)

@artidoro artidoro force-pushed the clusteringsamples branch from fc2c2b2 to 421e91e April 16, 2019 18:05
@artidoro artidoro force-pushed the clusteringsamples branch from 421e91e to 0e31b8b April 16, 2019 23:19
@artidoro artidoro merged commit 32bd0e2 into dotnet:master Apr 17, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 22, 2022
Labels
documentation Related to documentation of ML.NET

4 participants