
Fix reversed hyperparameters in Scenarios Tests. #94


Merged · 1 commit merged on May 10, 2018
28 changes: 14 additions & 14 deletions test/Microsoft.ML.Tests/Scenarios/SentimentPredictionTests.cs
@@ -32,8 +32,8 @@ public void TrainAndPredictSentimentModelTest()
OutputTokens = true,
StopWordsRemover = new PredefinedStopWordsRemover(),
VectorNormalizer = TextTransformTextNormKind.L2,
- CharFeatureExtractor = new NGramNgramExtractor() { NgramLength = 2, AllLengths = true },
- WordFeatureExtractor = new NGramNgramExtractor() { NgramLength = 3, AllLengths = false }
+ CharFeatureExtractor = new NGramNgramExtractor() { NgramLength = 3, AllLengths = false },
@glebuk (Contributor) commented on May 9, 2018, quoting "false":
Igor, you are correct that it is better to have 3-chargrams and 2-word-grams.
To mitigate the explosion of noisy n-grams, it would also be good to add a FeatureSelectorByCount transform with count = 2 to drop rarely used features.
Once the change is made, please ensure that quality metrics (AUC, accuracy, etc.) do not drop. #ByDesign
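A minimal sketch of that suggestion against this test's LearningPipeline, placed after the text featurizer; the FeatureSelectorByCount transform and its Column/Count property names are assumptions in the style of the entry-point API of this era, not code from the PR:

    // Hypothetical usage (assumed property names): drop featurized n-gram
    // slots seen fewer than 2 times in the training data, pruning the rare,
    // noisy n-grams that longer chargrams tend to generate.
    pipeline.Add(new FeatureSelectorByCount() { Column = new[] { "Features" }, Count = 2 });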

@TomFinley (Contributor) commented on May 9, 2018:
The purpose of these tests is not to train a great model; if that were the purpose, we'd be training on a dataset with more than 250 examples. :) The purpose of these tests is to (1) give the code a workout and (2) alert us when we've changed the behavior of something, which may have been a change we did not mean to introduce. For that reason, it is important that we test both AllLengths = true and AllLengths = false. If we make them both the same, we're reducing the code coverage of the test.

Of course, for that same reason, I'd say that issue #25 was not an issue in the first place, since the purpose of these tests is to work out the code, not to create a good model. We often set parameters that will result in a "worse" model in tests because model performance is not the point. Code coverage is the point.

Our samples are a different matter, since those are meant to express what we might consider to be best practices. (Even then I would say that AllLengths for chargrams is probably not a good idea.) But again, that's not the point of our tests at all. If this was intended to be a sample or example, then it should probably run somewhere else.


In reply to: 187114129

A Contributor replied:

Issue #25 is important, as it provides users with good defaults. These defaults are benchmarked to work well on many text datasets.

As said, this is a toy example, so the actual output metrics aren't the goal. Hence it's OK that they went down.

The updated code does exercise both AllLengths = true and AllLengths = false:

    CharFeatureExtractor = new NGramNgramExtractor() { NgramLength = 3, AllLengths = false },
    WordFeatureExtractor = new NGramNgramExtractor() { NgramLength = 2, AllLengths = true }

A Contributor replied:

OK then, I'll update my vote.


In reply to: 187211456

+ WordFeatureExtractor = new NGramNgramExtractor() { NgramLength = 2, AllLengths = true }
});

pipeline.Add(new FastTreeBinaryClassifier() { NumLeaves = 5, NumTrees = 5, MinDocumentsInLeafs = 2 });
@@ -65,16 +65,16 @@ public void TrainAndPredictSentimentModelTest()
var evaluator = new BinaryClassificationEvaluator();
BinaryClassificationMetrics metrics = evaluator.Evaluate(model, testData);

- Assert.Equal(.7222, metrics.Accuracy, 4);
- Assert.Equal(.9643, metrics.Auc, 1);
- Assert.Equal(.96, metrics.Auprc, 2);
+ Assert.Equal(.5556, metrics.Accuracy, 4);
A Member commented:
@justinormont - the quality of this model went really far down with this change. Note the confusion matrix below: only 1 of the 9 negative sentences was predicted correctly.
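A brief note on reading the matrix assertions further down (an explanatory aside, not part of the PR; the indexing convention is inferred from the string indexers used in the test):

    // Assumed convention: matrix[actual, predicted], with class 0 = "positive"
    // and class 1 = "negative", matching matrix["negative", "positive"] below.
    // matrix[1, 1] == 1: only 1 of the 9 actual negatives is predicted negative;
    // matrix[1, 0] == 8: the other 8 negatives are predicted positive.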

+ Assert.Equal(.8, metrics.Auc, 1);
+ Assert.Equal(.87, metrics.Auprc, 2);
Assert.Equal(1, metrics.Entropy, 3);
- Assert.Equal(.7826, metrics.F1Score, 4);
- Assert.Equal(.812, metrics.LogLoss, 3);
- Assert.Equal(18.831, metrics.LogLossReduction, 3);
+ Assert.Equal(.6923, metrics.F1Score, 4);
+ Assert.Equal(.969, metrics.LogLoss, 3);
+ Assert.Equal(3.083, metrics.LogLossReduction, 3);
Assert.Equal(1, metrics.NegativePrecision, 3);
- Assert.Equal(.444, metrics.NegativeRecall, 3);
- Assert.Equal(.643, metrics.PositivePrecision, 3);
+ Assert.Equal(.111, metrics.NegativeRecall, 3);
+ Assert.Equal(.529, metrics.PositivePrecision, 3);
Assert.Equal(1, metrics.PositiveRecall);

ConfusionMatrix matrix = metrics.ConfusionMatrix;
@@ -88,10 +88,10 @@ public void TrainAndPredictSentimentModelTest()
Assert.Equal(0, matrix[0, 1]);
Assert.Equal(0, matrix["positive", "negative"]);

- Assert.Equal(5, matrix[1, 0]);
- Assert.Equal(5, matrix["negative", "positive"]);
- Assert.Equal(4, matrix[1, 1]);
- Assert.Equal(4, matrix["negative", "negative"]);
+ Assert.Equal(8, matrix[1, 0]);
+ Assert.Equal(8, matrix["negative", "positive"]);
+ Assert.Equal(1, matrix[1, 1]);
+ Assert.Equal(1, matrix["negative", "negative"]);
}

public class SentimentData