Commit 15ae51d

Use includes for ml samples (#5245)
* update sentiment analysis for included snippets

  Include all code snippets from a running sample.

* reference snippets for taxi ML tutorial
* respond to feedback
1 parent 19dce45 commit 15ae51d

2 files changed, 56 additions and 239 deletions

docs/machine-learning/tutorials/sentiment-analysis.md

Lines changed: 28 additions & 129 deletions
@@ -5,9 +5,9 @@ ms.date: 05/07/2018
 ms.custom: mvc
 #Customer intent: As a developer, I want to use ML.NET to apply a binary classification task so that I can understand how to use sentiment prediction to take appropriaste action.
 ---
-# Walkthrough: Use the ML.NET APIs in a sentiment analysis classification scenario
+# Tutorial: Use the ML.NET APIs in a sentiment analysis classification scenario

-This sample walkthrough illustrates using the ML.NET API to create a sentiment classifier via a .NET Core console application using C# in Visual Studio 2017.
+This sample tutorial illustrates using the ML.NET API to create a sentiment classifier via a .NET Core console application using C# in Visual Studio 2017.

 In this tutorial, you learn how to:
 > [!div class="checklist"]
@@ -28,7 +28,7 @@ Sentiment analysis is either positive or negative. So, you can use classificatio

 ## Machine learning workflow

-This walkthrough follows a machine learning workflow that enables the process to move in an orderly fashion.
+This tutorial follows a machine learning workflow that enables the process to move in an orderly fashion.

 The workflow phases are as follows:

@@ -43,7 +43,7 @@ The workflow phases are as follows:

 You first need to understand the problem, so you can break it down to parts that can support building and training the model. Breaking the problem down you to predict and evaluate the results.

-The problem for this walkthrough is to understand incoming website comment sentiment to take the appropriate action.
+The problem for this tutorial is to understand incoming website comment sentiment to take the appropriate action.

 You can break down the problem to the sentiment text and sentiment value for the data you want to train the model with, and a predicted sentiment value that you can evaluate and then use operationally.

@@ -81,17 +81,7 @@ Predict the **sentiment** of a new website comment, either positive or negative.

 Add the following `using` statements to the top of the *Program.cs* file:

-```csharp
-using System;
-using Microsoft.ML.Models;
-using Microsoft.ML.Runtime;
-using Microsoft.ML.Runtime.Api;
-using Microsoft.ML.Trainers;
-using Microsoft.ML.Transforms;
-using System.Collections.Generic;
-using System.Linq;
-using Microsoft.ML;
-```
+[!code-csharp[AddUsings](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#1 "Add necessary usings")]

 You need to create two global variables to hold the path to the recently downloaded files:

@@ -100,10 +90,7 @@ You need to create two global variables to hold the path to the recently downloa

 Add the following code to the line right above the `Main` method:

-```csharp
-const string _dataPath = @"..\..\..\data\imdb_labelled.txt";
-const string _testDataPath = @"..\..\..\data\yelp_labelled.txt";
-```
+[!code-csharp[Declare file variables](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#2 "Declare variables to store data files")]

 You need to create some classes for your input data and predictions. Add a new class to your project:

@@ -113,35 +100,17 @@ You need to create some classes for your input data and predictions. Add a new c

 The *SentimentData.cs* file opens in the code editor. Add the following `using` statements to the top of *SentimentData.cs*:

-```csharp
-using Microsoft.ML.Runtime.Api;
-```
+[!code-csharp[AddUsings](../../../samples/machine-learning/tutorials/SentimentAnalysis/SentimentData.cs#1 "Add necessary usings")]

 Add the following code, which has two classes `SentimentData` and `SentimentPrediction`, to the *SentimentData.cs* file:

-```csharp
-public class SentimentData
-{
-    [Column(ordinal: "0")]
-    public string SentimentText;
-    [Column(ordinal: "1", name: "Label")]
-    public float Sentiment;
-}
-
-public class SentimentPrediction
-{
-    [ColumnName("PredictedLabel")]
-    public bool Sentiment;
-}
-```
+[!code-csharp[DeclareTypes](../../../samples/machine-learning/tutorials/SentimentAnalysis/SentimentData.cs#2 "Declare data record types")]

 `SentimentData` is the input dataset class and has a string for the comment (`SentimentText`), a `float` (`Sentiment`) that has a value for sentiment of either positive or negative. Both fields have `Column` attributes attached to them. This attribute describes the order of each field in the data file, and which is the `Label` field. `SentimentPrediction` is the class used for prediction after the model has been trained. It has a single boolean (`Sentiment`) and a `PredictedLabel` `ColumnName` attribute. The `Label` is used to create and train the model, and it's also used with a second dataset to evaluate the model. The `PredictedLabel` is used during prediction and evaluation. For evaluation, an input with training data, the predicted values, and the model are used.

 In the *Program.cs* file, replace the `Console.WriteLine("Hello World!")` line with the following code in the `Main` method:

-```csharp
-var model = TrainAndPredict();
-```
+[!code-csharp[TrainAndPredict](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#3 "Train and predict your model")]

 The `TrainAndPredict` method executes the following tasks:

@@ -152,36 +121,25 @@ The `TrainAndPredict` method executes the following tasks:

 Create the `TrainAndPredict` method, just after the `Main` method, using the following code:

-```csharp
-public static PredictionModel<SentimentData, SentimentPrediction> TrainAndPredict()
-{
-
-}
-```
+[!code-csharp[DeclareTrainAndPredict](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#4 "Declare the TrainAndPredict model")]

 ## Ingest the data

 Initialize a new instance of <xref:Microsoft.ML.LearningPipeline> that will include the data loading, data processing/featurization, and model. Add the following code as the first line of the `TrainAndPredict` method:

-```csharp
-var pipeline = new LearningPipeline();
-```
+[!code-csharp[LearningPipeline](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#5 "Create a learning pipeline")]

 The <xref:Microsoft.ML.TextLoader%601> object is the first part of the pipeline, and loads the training file data.

-```csharp
-pipeline.Add(new TextLoader<SentimentData>(_dataPath, useHeader: false, separator: "tab"));
-```
+[!code-csharp[TextLoader](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#6 "Add a text loader to the pipeline")]

 ## Data preprocess and feature engineering

 Pre-processing and cleaning data are important tasks that occur before a dataset is used effectively for machine learning. Raw data is often noisy and unreliable, and may be missing values. Using data without these modeling tasks can produce misleading results. ML.NET's transform pipelines allow you to compose a custom set of transforms that are applied to your data before training or testing. The transforms' primary purpose is for data featurization. A transform pipeline's advantage is that after transform pipeline definition, save the pipeline to apply it to test data.

 Apply a <xref:Microsoft.ML.Transforms.TextFeaturizer> to convert the `SentimentText` column into a numeric vector called `Features` used by the machine learning algorithm. This is the preprocessing/featurization step. Using additional components available in ML.NET can enable better results with your model. Add `TextFeaturizer` to the pipeline as the next line of code:

-```csharp
-pipeline.Add(new TextFeaturizer("Features", "SentimentText"));
-```
+[!code-csharp[TextFeaturizer](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#7 "Add a TextFeaturizer to the pipeline")]

 ### About the classification model

@@ -201,132 +159,73 @@ The <xref:Microsoft.ML.Trainers.FastTreeBinaryClassifier> object is a decision t

 Add the following code to the `TrainAndPredict` method:

-```csharp
-pipeline.Add(new FastTreeBinaryClassifier() { NumLeaves = 5, NumTrees = 5, MinDocumentsInLeafs = 2 });
-```
+[!code-csharp[BinaryClassifier](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#8 "Add a fast binary tree classifier")]

 ## Train the model

 You train the model, <xref:Microsoft.ML.PredictionModel%602>, based on the dataset that has been loaded and transformed. `pipeline.Train<SentimentData, SentimentPrediction>()` trains the pipeline (loads the data, trains the featurizer and learner). The experiment is not executed until this happens.

 Add the following code to the `TrainAndPredict` method:

-```csharp
-PredictionModel<SentimentData, SentimentPrediction> model = pipeline.Train<SentimentData, SentimentPrediction>();
-```
+[!code-csharp[TrainModel](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#9 "Train the model")]

 ## Predict the model

 Add some comments to test the trained model's predictions in the `TrainAndPredict` method:

-```csharp
-IEnumerable<SentimentData> sentiments = new[]
-{
-    new SentimentData
-    {
-        SentimentText = "Contoso's 11 is a wonderful experience",
-        Sentiment = 0
-    },
-    new SentimentData
-    {
-        SentimentText = "Really bad",
-        Sentiment = 0
-    },
-    new SentimentData
-    {
-        SentimentText = "Joe versus the Volcano Coffee Company is a great film.",
-        Sentiment = 0
-    }
-};
-```
+[!code-csharp[PredictionData](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#10 "CVreate test data for predictions")]

 Now that you have a model, you can use that to predict the positive or negative sentiment of the comment data using the <xref:Microsoft.ML.PredictionModel.Predict%2A?displayProperty=nameWithType> method. To get a prediction, use `Predict` on new data. Note that the input data is a string and the model includes the featurization. Your pipeline is in sync during training and prediction. You didn’t have to write preprocessing/featurization code specifically for predictions, and the same API takes care of both batch and one-time predictions.

-```csharp
-IEnumerable<SentimentPrediction> predictions = model.Predict(sentiments);
-```
+[!code-csharp[Predict](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#11 "Create predictions of sentiments")]

 ### Model operationalization: prediction

 Display `SentimentText` and corresponding sentiment prediction in order to share the results and act on them accordingly. This is called operationalization, using the returned data as part of the operational policies. Create a header for the results using the following <xref:System.Console.WriteLine?displayProperty=nameWithType> code:

-```csharp
-Console.WriteLine();
-Console.WriteLine("Sentiment Predictions");
-Console.WriteLine("---------------------");
-```
+[!code-csharp[OutputHeaders](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#12 "Display prediction outputs")]

 Before displaying the predicted results, combine the sentiment and prediction together to see the original comment with its predicted sentiment. The following code uses the <xref:System.Linq.Enumerable.Zip%2A> method to make that happen, so add that code next:

-```csharp
-var sentimentsAndPredictions = sentiments.Zip(predictions, (sentiment, prediction) => new { sentiment, prediction });
-```
+[!code-csharp[BuildTuples](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#13 "Build the pairs of sentiment data and predictions")]

 Now that you've combined the `SentimentText` and `Sentiment` into a class, you can display the results using the <xref:System.Console.WriteLine?displayProperty=nameWithType> method:

-```csharp
-foreach (var item in sentimentsAndPredictions)
-{
-    Console.WriteLine($"Sentiment: {item.sentiment.SentimentText} | Prediction: {(item.prediction.Sentiment ? "Positive" : "Negative")}");
-}
-Console.WriteLine();
-```
+[!code-csharp[DisplayPredictions](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#14 "Display the predictions")]

 #### Return the model trained to use for evaluation

 Return the model at the end of the `TrainAndPredict` method. At this point, you could then save it to a zip file or continue to work with it. For this tutorial, you're going to work with it, so add the following code to the next line in `TrainAndPredict`:

-```csharp
-return model;
-```
+[!code-csharp[ReturnModel](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#15 "Return the model")]

 ## Evaluate the model

 Now that you've created and trained the model, you need to evaluate it with a different dataset for quality assurance and validation. In the `Evaluate` method, the model created in `TrainAndPredict` is passed in to be evaluated. Create the `Evaluate` method, just after `TrainAndPredict`, as in the following code:

-```csharp
-public static void Evaluate(PredictionModel<SentimentData, SentimentPrediction> model)
-{
-
-}
-```
+[!code-csharp[Evaluate](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#16 "Evaluate your model")]

 Add a call to the new method from the `Main` method, right under the `TrainAndPredict` method call, using the following code:

-```csharp
-Evaluate(model);
-```
+[!code-csharp[CallEvaluate](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#17 "Call the Evaluate method")]

 The <xref:Microsoft.ML.TextLoader%601> class loads the new test dataset with the same schema. You can evaluate the model using this dataset as a quality check. Add that next to the `Evaluate` method call, using the following code:

-```csharp
-var testData = new TextLoader<SentimentData>(_testDataPath, useHeader: false, separator: "tab");
-```
+[!code-csharp[LoadText](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#18 "Load the test dataset")]

 The <xref:Microsoft.ML.Models.BinaryClassificationEvaluator> object computes the quality metrics for the `PredictionModel` using the specified dataset. To see those metrics, add the evaluator as the next line in the `Evaluate` method, with the following code:

-```csharp
-var evaluator = new BinaryClassificationEvaluator();
-```
+[!code-csharp[BinaryEvaluator](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#19 "Create the binary evaluator")]

 The <xref:Microsoft.ML.Models.BinaryClassificationMetrics> contains the overall metrics computed by binary classification evaluators. To display these to determine the quality of the model, we need to get the metrics first. Add the following code:

-```csharp
-BinaryClassificationMetrics metrics = evaluator.Evaluate(model, testData);
-```
+[!code-csharp[CreateMetrics](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#20 "Evaluate the model and create metrics")]

 ### Displaying the metrics for model validation

 Use the following code to display the metrics, share the results, and act on them accordingly:

-```csharp
-Console.WriteLine();
-Console.WriteLine("PredictionModel quality metrics evaluation");
-Console.WriteLine("------------------------------------------");
-Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
-Console.WriteLine($"Auc: {metrics.Auc:P2}");
-Console.WriteLine($"F1Score: {metrics.F1Score:P2}");
-```
+[!code-csharp[DisplayMetrics](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#21 "Display selected metrics")]

 ## Results
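
Note on the metrics displayed in this hunk (not part of the commit): the removed snippet prints `Accuracy`, `Auc`, and `F1Score` with the `P2` percent format. As a rough illustration of what two of those numbers mean, the sketch below computes accuracy and F1 by hand from confusion-matrix counts. The class name `MetricsSketch` and the counts are hypothetical and chosen only for illustration. This is not how ML.NET's evaluator is implemented, and AUC is left out because it needs the ranked scores rather than the counts.

```csharp
using System;

public static class MetricsSketch
{
    // Accuracy = correct predictions / all predictions.
    public static double Accuracy(int tp, int tn, int fp, int fn) =>
        (double)(tp + tn) / (tp + tn + fp + fn);

    // F1 = harmonic mean of precision and recall.
    // With no positive predictions or no positive labels the result is NaN.
    public static double F1Score(int tp, int fp, int fn)
    {
        double precision = (double)tp / (tp + fp);
        double recall = (double)tp / (tp + fn);
        return 2 * precision * recall / (precision + recall);
    }

    public static void Main()
    {
        // Hypothetical counts: true/false positives and negatives.
        int tp = 40, tn = 35, fp = 15, fn = 10;

        // P2 formats a fraction as a percentage with two decimals;
        // the exact spacing of the percent sign depends on the current culture.
        Console.WriteLine($"Accuracy: {Accuracy(tp, tn, fp, fn):P2}");
        Console.WriteLine($"F1Score: {F1Score(tp, fp, fn):P2}");
    }
}
```

With these counts, accuracy is 75/100 = 0.75, precision is 40/55, and recall is 40/50, so F1 comes out slightly above accuracy.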

@@ -336,7 +235,7 @@ Your results should be similar to the following. As the pipeline processes, it d
 Sentiment Predictions
 ---------------------
 Sentiment: Contoso's 11 is a wonderful experience | Prediction: Positive
-Sentiment: Really bad | Prediction: Negative
+Sentiment:The acting in this movie is really bad | Prediction: Negative
 Sentiment: Joe versus the Volcano Coffee Company is a great film. | Prediction: Positive
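
Note on how these output lines are produced (not part of the commit): the tutorial pairs each input comment with its prediction using `Enumerable.Zip` before printing. The sketch below shows that pairing in isolation. It uses plain strings and booleans as stand-ins for `SentimentData` and `SentimentPrediction`, and the prediction values are hard-coded assumptions rather than output from a trained model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipSketch
{
    public static void Main()
    {
        // Stand-ins for the SentimentText values passed to the model.
        IEnumerable<string> comments = new[]
        {
            "Contoso's 11 is a wonderful experience",
            "The acting in this movie is really bad",
        };

        // Hypothetical predictions; in the tutorial these come from model.Predict.
        IEnumerable<bool> predictions = new[] { true, false };

        // Zip pairs elements by position and stops at the end of the shorter sequence.
        var pairs = comments.Zip(predictions, (comment, isPositive) => new { comment, isPositive });

        foreach (var pair in pairs)
        {
            Console.WriteLine($"Sentiment: {pair.comment} | Prediction: {(pair.isPositive ? "Positive" : "Negative")}");
        }
    }
}
```

Because `Zip` matches by position, the pairing is only meaningful if the predictions come back in the same order as the inputs, which is what the tutorial assumes.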
