* update sentiment analysis for included snippets
Include all code snippets from a running sample.
* reference snippets for taxi ML tutorial
* respond to feedback
#Customer intent: As a developer, I want to use ML.NET to apply a binary classification task so that I can understand how to use sentiment prediction to take appropriate action.
---
-
# Walkthrough: Use the ML.NET APIs in a sentiment analysis classification scenario
+
# Tutorial: Use the ML.NET APIs in a sentiment analysis classification scenario
-
This sample walkthrough illustrates using the ML.NET API to create a sentiment classifier via a .NET Core console application using C# in Visual Studio 2017.
+
This sample tutorial illustrates using the ML.NET API to create a sentiment classifier via a .NET Core console application using C# in Visual Studio 2017.
In this tutorial, you learn how to:
> [!div class="checklist"]
@@ -28,7 +28,7 @@ Sentiment analysis is either positive or negative. So, you can use classificatio
## Machine learning workflow
-
This walkthrough follows a machine learning workflow that enables the process to move in an orderly fashion.
+
This tutorial follows a machine learning workflow that enables the process to move in an orderly fashion.
The workflow phases are as follows:
@@ -43,7 +43,7 @@ The workflow phases are as follows:
You first need to understand the problem, so you can break it down into parts that can support building and training the model. Breaking the problem down allows you to predict and evaluate the results.
-
The problem for this walkthrough is to understand incoming website comment sentiment to take the appropriate action.
+
The problem for this tutorial is to understand incoming website comment sentiment to take the appropriate action.
You can break down the problem to the sentiment text and sentiment value for the data you want to train the model with, and a predicted sentiment value that you can evaluate and then use operationally.
@@ -81,17 +81,7 @@ Predict the **sentiment** of a new website comment, either positive or negative.
Add the following `using` statements to the top of the *Program.cs* file:
[!code-csharp[Declare file variables](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#2 "Declare variables to store data files")]
You need to create some classes for your input data and predictions. Add a new class to your project:
@@ -113,35 +100,17 @@ You need to create some classes for your input data and predictions. Add a new c
The *SentimentData.cs* file opens in the code editor. Add the following `using` statements to the top of *SentimentData.cs*:
Add the following code, which has two classes `SentimentData` and `SentimentPrediction`, to the *SentimentData.cs* file:
-
```csharp
public class SentimentData
{
    [Column(ordinal: "0")]
    public string SentimentText;
    [Column(ordinal: "1", name: "Label")]
    public float Sentiment;
}

public class SentimentPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Sentiment;
}
```
+
[!code-csharp[DeclareTypes](../../../samples/machine-learning/tutorials/SentimentAnalysis/SentimentData.cs#2 "Declare data record types")]
`SentimentData` is the input dataset class and has a string for the comment (`SentimentText`) and a `float` (`Sentiment`) that holds the sentiment value, either positive or negative. Both fields have `Column` attributes attached to them. This attribute describes the order of each field in the data file, and which is the `Label` field. `SentimentPrediction` is the class used for prediction after the model has been trained. It has a single boolean (`Sentiment`) with a `ColumnName` attribute set to `PredictedLabel`. The `Label` is used to create and train the model, and it's also used with a second dataset to evaluate the model. The `PredictedLabel` is used during prediction and evaluation. For evaluation, an input with training data, the predicted values, and the model are used.
In the *Program.cs* file, replace the `Console.WriteLine("Hello World!")` line with the following code in the `Main` method:
-
```csharp
var model = TrainAndPredict();
```
+
[!code-csharp[TrainAndPredict](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#3 "Train and predict your model")]
The `TrainAndPredict` method executes the following tasks:
@@ -152,36 +121,25 @@ The `TrainAndPredict` method executes the following tasks:
Create the `TrainAndPredict` method, just after the `Main` method, using the following code:
[!code-csharp[DeclareTrainAndPredict](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#4 "Declare the TrainAndPredict model")]
## Ingest the data
Initialize a new instance of <xref:Microsoft.ML.LearningPipeline> that will include the data loading, data processing/featurization, and model. Add the following code as the first line of the `TrainAndPredict` method:
-
```csharp
var pipeline = new LearningPipeline();
```
+
[!code-csharp[LearningPipeline](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#5 "Create a learning pipeline")]
The <xref:Microsoft.ML.TextLoader%601> object is the first part of the pipeline, and loads the training file data.
[!code-csharp[TextLoader](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#6 "Add a text loader to the pipeline")]
## Data preprocess and feature engineering
Pre-processing and cleaning data are important tasks that occur before a dataset is used effectively for machine learning. Raw data is often noisy and unreliable, and may be missing values. Using data without these tasks can produce misleading results. ML.NET's transform pipelines allow you to compose a custom set of transforms that are applied to your data before training or testing. The transforms' primary purpose is data featurization. A transform pipeline's advantage is that after you define the pipeline, you can save it and apply it to test data.
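To make "featurization" concrete, the following is a minimal bag-of-words sketch in plain C#. It is only an illustration of turning comment text into a numeric vector and makes no claim about how ML.NET's transforms work internally. The `BagOfWordsSketch` class and its `BuildVocabulary` and `Featurize` methods are hypothetical names introduced for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of ML.NET: a toy bag-of-words featurizer.
public static class BagOfWordsSketch
{
    // Collects every distinct lowercase token across the comments, in a stable order.
    public static List<string> BuildVocabulary(IEnumerable<string> comments)
    {
        return comments
            .SelectMany(comment => Tokenize(comment))
            .Distinct()
            .OrderBy(word => word, StringComparer.Ordinal)
            .ToList();
    }

    // Counts how often each vocabulary word occurs in the comment.
    // Words outside the vocabulary are ignored.
    public static float[] Featurize(string comment, IList<string> vocabulary)
    {
        var features = new float[vocabulary.Count];
        foreach (var token in Tokenize(comment))
        {
            int index = vocabulary.IndexOf(token);
            if (index >= 0)
            {
                features[index] += 1f;
            }
        }
        return features;
    }

    // Lowercases the text and splits it on spaces and basic punctuation.
    private static IEnumerable<string> Tokenize(string text)
    {
        return text
            .ToLowerInvariant()
            .Split(new[] { ' ', ',', '.', '!', '?' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        var vocabulary = BuildVocabulary(new[] { "really bad", "really good film" });
        var features = Featurize("Good good film!", vocabulary);

        Console.WriteLine(string.Join(" ", vocabulary));
        Console.WriteLine(string.Join(" ", features));
    }
}
```

With the two training comments above, the vocabulary is `bad`, `film`, `good`, `really`, and the comment "Good good film!" maps to the counts 0, 1, 2, 0. A real featurizer typically does considerably more than counting words, so treat this only as a mental model.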
Apply a <xref:Microsoft.ML.Transforms.TextFeaturizer> to convert the `SentimentText` column into a numeric vector called `Features` used by the machine learning algorithm. This is the preprocessing/featurization step. Using additional components available in ML.NET can enable better results with your model. Add `TextFeaturizer` to the pipeline as the next line of code:
[!code-csharp[BinaryClassifier](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#8 "Add a fast binary tree classifier")]
## Train the model
You train the model, <xref:Microsoft.ML.PredictionModel%602>, based on the dataset that has been loaded and transformed. `pipeline.Train<SentimentData, SentimentPrediction>()` trains the pipeline (loads the data, trains the featurizer and learner). The experiment is not executed until this happens.
Add the following code to the `TrainAndPredict` method:
[!code-csharp[TrainModel](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#9 "Train the model")]
## Predict the model
Add some comments to test the trained model's predictions in the `TrainAndPredict` method:
-
```csharp
IEnumerable<SentimentData> sentiments = new[]
{
    new SentimentData
    {
        SentimentText = "Contoso's 11 is a wonderful experience",
        Sentiment = 0
    },
    new SentimentData
    {
        SentimentText = "Really bad",
        Sentiment = 0
    },
    new SentimentData
    {
        SentimentText = "Joe versus the Volcano Coffee Company is a great film.",
        Sentiment = 0
    }
};
```
+
[!code-csharp[PredictionData](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#10 "Create test data for predictions")]
Now that you have a model, you can use that to predict the positive or negative sentiment of the comment data using the <xref:Microsoft.ML.PredictionModel.Predict%2A?displayProperty=nameWithType> method. To get a prediction, use `Predict` on new data. Note that the input data is a string and the model includes the featurization. Your pipeline is in sync during training and prediction. You didn’t have to write preprocessing/featurization code specifically for predictions, and the same API takes care of both batch and one-time predictions.
[!code-csharp[Predict](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#11 "Create predictions of sentiments")]
### Model operationalization: prediction
Display `SentimentText` and corresponding sentiment prediction in order to share the results and act on them accordingly. This is called operationalization, using the returned data as part of the operational policies. Create a header for the results using the following <xref:System.Console.WriteLine?displayProperty=nameWithType> code:
Before displaying the predicted results, combine the sentiment and prediction together to see the original comment with its predicted sentiment. The following code uses the <xref:System.Linq.Enumerable.Zip%2A> method to make that happen, so add that code next:
[!code-csharp[BuildTuples](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#13 "Build the pairs of sentiment data and predictions")]
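As a standalone illustration of how `Zip` pairs two sequences by position, here is a small sketch that runs without ML.NET. `CommentInput` and `CommentPrediction` are hypothetical stand-ins for the tutorial's data classes, the prediction values are hard-coded rather than produced by a model, and the output format is an assumption made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the tutorial's input and prediction classes.
public class CommentInput
{
    public string SentimentText;
}

public class CommentPrediction
{
    public bool Sentiment;
}

public static class ZipSketch
{
    // Pairs each input with the prediction at the same position
    // and formats the pair as one display line.
    public static List<string> Describe(
        IEnumerable<CommentInput> inputs,
        IEnumerable<CommentPrediction> predictions)
    {
        return inputs
            .Zip(predictions, (input, prediction) =>
            {
                string label = prediction.Sentiment ? "Positive" : "Negative";
                return "Sentiment: " + input.SentimentText + " | Prediction: " + label;
            })
            .ToList();
    }

    public static void Main()
    {
        var inputs = new[]
        {
            new CommentInput { SentimentText = "Really bad" },
            new CommentInput { SentimentText = "A wonderful experience" }
        };

        // Hard-coded values standing in for a trained model's output.
        var predictions = new[]
        {
            new CommentPrediction { Sentiment = false },
            new CommentPrediction { Sentiment = true }
        };

        foreach (var line in Describe(inputs, predictions))
        {
            Console.WriteLine(line);
        }
    }
}
```

`Zip` stops at the end of the shorter sequence, so the pairing is only meaningful when the predictions come back in the same order and count as the inputs.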
Now that you've combined the `SentimentText` and `Sentiment` into a class, you can display the results using the <xref:System.Console.WriteLine?displayProperty=nameWithType> method:
[!code-csharp[DisplayPredictions](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#14 "Display the predictions")]
#### Return the model trained to use for evaluation
Return the model at the end of the `TrainAndPredict` method. At this point, you could then save it to a zip file or continue to work with it. For this tutorial, you're going to work with it, so add the following code to the next line in `TrainAndPredict`:
-
```csharp
return model;
```
+
[!code-csharp[ReturnModel](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#15 "Return the model")]
## Evaluate the model
Now that you've created and trained the model, you need to evaluate it with a different dataset for quality assurance and validation. In the `Evaluate` method, the model created in `TrainAndPredict` is passed in to be evaluated. Create the `Evaluate` method, just after `TrainAndPredict`, as in the following code:
[!code-csharp[Evaluate](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#16 "Evaluate your model")]
Add a call to the new method from the `Main` method, right under the `TrainAndPredict` method call, using the following code:
-
```csharp
Evaluate(model);
```
+
[!code-csharp[CallEvaluate](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#17 "Call the Evaluate method")]
The <xref:Microsoft.ML.TextLoader%601> class loads the new test dataset with the same schema. You can evaluate the model using this dataset as a quality check. Add that next to the `Evaluate` method call, using the following code:
[!code-csharp[LoadText](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#18 "Load the test dataset")]
The <xref:Microsoft.ML.Models.BinaryClassificationEvaluator> object computes the quality metrics for the `PredictionModel` using the specified dataset. To see those metrics, add the evaluator as the next line in the `Evaluate` method, with the following code:
-
```csharp
var evaluator = new BinaryClassificationEvaluator();
```
+
[!code-csharp[BinaryEvaluator](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#19 "Create the binary evaluator")]
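For intuition about what binary classification quality metrics measure, the sketch below computes accuracy and F1 score by hand from actual and predicted labels. It is plain C# and does not call the ML.NET evaluator. `MetricsSketch` and its methods are hypothetical names, and the set of metrics the evaluator itself reports may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of ML.NET: hand-computed binary metrics.
public static class MetricsSketch
{
    // Fraction of predictions that match the actual label.
    public static double Accuracy(IList<bool> actual, IList<bool> predicted)
    {
        CheckSameLength(actual, predicted);
        int correct = actual.Zip(predicted, (a, p) => a == p).Count(match => match);
        return (double)correct / actual.Count;
    }

    // F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
    public static double F1Score(IList<bool> actual, IList<bool> predicted)
    {
        CheckSameLength(actual, predicted);
        int truePositives = actual.Zip(predicted, (a, p) => a && p).Count(x => x);
        int falsePositives = actual.Zip(predicted, (a, p) => !a && p).Count(x => x);
        int falseNegatives = actual.Zip(predicted, (a, p) => a && !p).Count(x => x);

        double denominator = 2.0 * truePositives + falsePositives + falseNegatives;
        // With no positives in either list, F1 is undefined; this sketch returns 0.
        return denominator == 0 ? 0.0 : 2.0 * truePositives / denominator;
    }

    private static void CheckSameLength(IList<bool> actual, IList<bool> predicted)
    {
        if (actual.Count == 0 || actual.Count != predicted.Count)
        {
            throw new ArgumentException("Both lists must be non-empty and the same length.");
        }
    }

    public static void Main()
    {
        var actual = new List<bool> { true, false, true, true };
        var predicted = new List<bool> { true, false, false, true };

        Console.WriteLine("Accuracy: " + Accuracy(actual, predicted));
        Console.WriteLine("F1 score: " + F1Score(actual, predicted));
    }
}
```

In this example three of four predictions are correct, so accuracy is 0.75. With two true positives, no false positives, and one false negative, the F1 score is 4 / 5 = 0.8.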
The <xref:Microsoft.ML.Models.BinaryClassificationMetrics> contains the overall metrics computed by binary classification evaluators. To display these to determine the quality of the model, we need to get the metrics first. Add the following code: