Clean up metadata #994

Merged
merged 1 commit into from
Jul 5, 2023
@@ -7,33 +7,30 @@ languages:
- csharp
products:
- dotnet
- dotnet-core
- vs
- azure
- azure-functions
- mlnet
---

# Azure Functions Sentiment Analysis Sample

This sample highlights dependency injection in conjunction with the **.NET Core Integration Package** to build a scalable, serverless Azure Functions application.

| ML.NET version | Status | App Type | Data type | Scenario | ML Task | Algorithms |
|----------------|-------------------------------|-------------|-----------|---------------------|---------------------------|-----------------------------|
| v1.3.1 | Up-to-date | Azure Functions | Single data sample | Sentiment Analysis | Binary Classification | Linear Classification |

For a detailed explanation of how to build this application, see the accompanying [how-to guide](https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/serve-model-serverless-azure-functions-ml-net) on the Microsoft Docs site.

## Goal

The goal is to be able to predict sentiment using an HTTP triggered Azure Functions serverless application.

## Problem

Running/scoring an ML.NET model in a multi-threaded application becomes a problem when you want to make single predictions with a `PredictionEngine` object and cache that object (for example, as a singleton) so that it is reused across multiple HTTP requests and is therefore accessed by multiple threads. This is a problem because **the Prediction Engine is not thread-safe** ([ML.NET issue, Nov 2018](https://github.com/dotnet/machinelearning/issues/1718)).

## Solution

This is an Azure Functions application optimized for scalability and performance when running/scoring an ML.NET model. It uses dependency injection and the .NET Core Integration Package.
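
To make the idea of pooling concrete, here is a minimal sketch of an object pool for a non-thread-safe scorer. It is not the `PredictionEnginePool` implementation: `FakeEngine`, `SimplePool<T>`, and `PoolDemo` are hypothetical names invented for this illustration, and the pool is a simplified one built on `ConcurrentBag<T>`. The point is only that each thread rents its own instance instead of sharing one.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical stand-in for a scorer that keeps mutable per-call state,
// which is what makes sharing one instance across threads unsafe.
public sealed class FakeEngine
{
    private int _lastInput; // scratch state overwritten on every call

    public int Score(int input)
    {
        _lastInput = input;
        return _lastInput * 2;
    }
}

// Minimal pool: hands each caller its own instance, creating one on demand.
public sealed class SimplePool<T> where T : class
{
    private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();
    private readonly Func<T> _factory;

    public SimplePool(Func<T> factory) => _factory = factory;

    // Reuse an idle instance if one exists; otherwise build a new one.
    public T Rent() => _items.TryTake(out var item) ? item : _factory();

    public void Return(T item) => _items.Add(item);
}

public static class PoolDemo
{
    public static int[] ScoreAll(int[] inputs)
    {
        var pool = new SimplePool<FakeEngine>(() => new FakeEngine());
        var results = new int[inputs.Length];

        Parallel.For(0, inputs.Length, i =>
        {
            var engine = pool.Rent(); // exclusive use while rented
            try { results[i] = engine.Score(inputs[i]); }
            finally { pool.Return(engine); }
        });

        return results;
    }
}
```

Because an instance is only ever held by one thread between `Rent` and `Return`, the engine's internal state is never touched concurrently, while instances are still reused across requests.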

@@ -74,7 +71,7 @@ Basically, with this component, you register the `PredictionEnginePool` in a sin
.FromFile(modelName: "SentimentAnalysisModel", filePath:"MLModels/sentiment_model.zip", watchForChanges: true);
```

In the example above, by setting the `watchForChanges` parameter to `true`, the `PredictionEnginePool` starts a `FileSystemWatcher` that listens to the file system change notifications and raises events when there is a change to the file. This prompts the `PredictionEnginePool` to automatically reload the model without having to redeploy the application. The model is also given a name using the `modelName` parameter. In the event you have multiple models hosted in your application, this is a way of referencing them.

Then you just need to inject the `PredictionEnginePool` into the constructor of the respective Azure Function:

@@ -97,7 +94,7 @@ For a much more detailed explanation of a PredictionEngine object pool comparabl

[How to optimize and run ML.NET models on scalable ASP.NET Core WebAPIs or web apps](https://devblogs.microsoft.com/cesardelatorre/how-to-optimize-and-run-ml-net-models-on-scalable-asp-net-core-webapis-or-web-apps/)

**NOTE:** You don't need to write the implementation explained in the blog post yourself. That functionality is already implemented for you in the .NET Integration Package.

## Test the application locally

@@ -7,8 +7,6 @@ languages:
- csharp
products:
- dotnet
- dotnet-core
- vs
- mlnet
---

@@ -21,15 +19,18 @@ products:
In this introductory sample, you'll see how to use [ML.NET](https://www.microsoft.com/net/learn/apps/machine-learning-and-ai/ml-dotnet) to detect **anomalies** in a time series of phone call counts. In the world of machine learning, this type of task is called time-series anomaly detection.

## Problem

We have data on the number of calls over 10 weeks at daily granularity. The data has a periodic pattern: the volume of calls is high on weekdays and low on weekends. We want to find the points that fall outside the regular pattern of the series. In the world of machine learning, this type of task is called time-series anomaly detection.

To solve this problem, we will build an ML model that takes as inputs:

* Date
* Number of calls.

and outputs the anomalies in the number of calls.

## Dataset

We have created a sample dataset for the number of calls. The dataset `phone_calls.csv` can be found [here](./SrCnnEntireDetection/Data/phone_calls.csv).

The format of the **Phone Calls dataset** is shown below.
@@ -47,9 +48,11 @@ Format of **Phone Calls DataSet** looks like below.
The data in the Phone Calls dataset was collected from real-world transactions and then normalized and rescaled.

## ML task - Time Series Anomaly Detection

Anomaly detection is the process of detecting outliers in the data. Anomaly detection in time-series refers to detecting time stamps, or points on a given input time-series, at which the time-series behaves differently from what was expected. These deviations are typically indicative of some events of interest in the problem domain: a cyber-attack on user accounts, power outage, bursting RPS on a server, memory leak, etc.

## Solution

To solve this problem, we first determine the period of the series. We then extract the periodic component of the series and apply anomaly detection to the residual part. In ML.NET, the seasonality detection function finds the period of a given series. Given the period, the STL algorithm decomposes the time series into three components as `Y = T + S + R`, where `Y` is the original series, `T` is the trend component, `S` is the seasonal component, and `R` is the residual component of the series (refer to [this](http://www.nniiem.ru/file/news/2016/stl-statistical-model.pdf) paper for more details on this algorithm). The SR-CNN detector is then applied to `R` to capture the anomalies (refer to [this](https://arxiv.org/pdf/1906.03821.pdf) paper for more details on this algorithm).

![Detect-Anomaly-Pipeline](docs/images/detect-anomaly-pipeline.png)
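
As a rough illustration of the additive identity `Y = T + S + R`, the sketch below performs a deliberately naive decomposition. It is not STL (which uses LOESS smoothing) and not what ML.NET does internally: the trend is assumed to be the constant overall mean, and the seasonal component is the per-phase mean minus that trend. `NaiveDecomposition` and `Decompose` are hypothetical names used only for this example.

```csharp
using System;
using System.Linq;

public static class NaiveDecomposition
{
    // Splits y into trend, seasonal, and residual parts such that
    // y[i] = Trend[i] + Seasonal[i] + Residual[i] for every i.
    public static (double[] Trend, double[] Seasonal, double[] Residual) Decompose(double[] y, int period)
    {
        if (y == null || y.Length == 0) throw new ArgumentException("Series must be non-empty.", nameof(y));
        if (period <= 0) throw new ArgumentOutOfRangeException(nameof(period));

        // Simplifying assumption: a flat trend equal to the overall mean.
        double mean = y.Average();

        // Accumulate the sum and count of values at each position in the cycle.
        var phaseSum = new double[period];
        var phaseCount = new int[period];
        for (int i = 0; i < y.Length; i++)
        {
            phaseSum[i % period] += y[i];
            phaseCount[i % period]++;
        }

        var trend = new double[y.Length];
        var seasonal = new double[y.Length];
        var residual = new double[y.Length];
        for (int i = 0; i < y.Length; i++)
        {
            double phaseMean = phaseSum[i % period] / phaseCount[i % period];
            trend[i] = mean;
            seasonal[i] = phaseMean - mean;               // repeating pattern around the trend
            residual[i] = y[i] - trend[i] - seasonal[i];  // what the pattern does not explain
        }

        return (trend, seasonal, residual);
    }
}
```

For a series that repeats a weekly pattern exactly, the residual is zero everywhere; a single unusual day shows up as a large residual at that index, which is the part an anomaly detector would then examine.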
@@ -67,6 +70,7 @@ int period = mlContext.AnomalyDetection.DetectSeasonality(dataView, inputColumnN
### 2. Detect Anomaly

First, we need to specify the parameters used for the SrCnnEntire detector (refer to the [API reference](https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.timeseriescatalog.detectentireanomalybysrcnn?view=ml-dotnet#Microsoft_ML_TimeSeriesCatalog_DetectEntireAnomalyBySrCnn_Microsoft_ML_AnomalyDetectionCatalog_Microsoft_ML_IDataView_System_String_System_String_System_Double_System_Int32_System_Double_Microsoft_ML_TimeSeries_SrCnnDetectMode_) for details on the parameters). Then, we invoke the detector and obtain a view of the output data.

```CSharp
var options = new SrCnnEntireAnomalyDetectorOptions()
{
@@ -79,7 +83,8 @@ var outputDataView = mlContext.AnomalyDetection.DetectEntireAnomalyBySrCnn(dataV
```

### 3. Consume results

The results can be retrieved by simply enumerating them. `Anomaly`, `ExpectedValue`, `UpperBoundary` and `LowerBoundary` are some of the useful output columns.

```CSharp
//STEP 5: Get the detection results as an IEnumerable
@@ -135,5 +140,5 @@ foreach (var p in predictions)
//25,0,0,0.018746201354033914,29.381125690882463,32.92296779138513,33.681408258162854,25.080843123602072
//26,0,0,0.0141022037992637,5.261543539820418,32.92296779138513,9.561826107100808,0.9612609725400283
//27,0,0,0.013396001938040617,5.4873712582971805,32.92296779138513,9.787653825577571,1.1870886910167897
//28,1,0.4971326063712256,0.3521692757832201,36.504694001629254,32.92296779138513,40.804976568909645,32.20441143434886 < --alert is on, detected anomaly
```
@@ -7,8 +7,6 @@ languages:
- csharp
products:
- dotnet
- dotnet-core
- vs
- mlnet
---

@@ -68,7 +66,7 @@ class ImageData
class ModelInput
{
public byte[] Image { get; set; }

public UInt32 LabelAsKey { get; set; }

public string ImagePath { get; set; }
@@ -92,7 +90,7 @@ class ModelOutput

## Load the data

1. Before loading the data, it needs to be formatted into a list of `ImageData` objects. To do so, create a data loading utility method `LoadImagesFromDirectory`.

```csharp
public static IEnumerable<ImageData> LoadImagesFromDirectory(string folder, bool useFolderNameAsLabel = true)
@@ -198,13 +196,14 @@ var trainingPipeline = mlContext.MulticlassClassification.Trainers.ImageClassifi

## Train the model

Apply the data to the training pipeline.

```csharp
ITransformer trainedModel = trainingPipeline.Fit(trainSet);
```

## Use the model

1. Create a utility method to display predictions.

```csharp
@@ -217,7 +216,7 @@ private static void OutputPrediction(ModelOutput prediction)

### Classify a single image

1. Make predictions on the test set using the trained model. Create a utility method called `ClassifySingleImage`.

```csharp
public static void ClassifySingleImage(MLContext mlContext, IDataView data, ITransformer trainedModel)
@@ -241,7 +240,7 @@ ClassifySingleImage(mlContext, testSet, trainedModel);

### Classify multiple images

1. Make predictions on the test set using the trained model. Create a utility method called `ClassifyImages`.

```csharp
public static void ClassifyImages(MLContext mlContext, IDataView data, ITransformer trainedModel)
@@ -302,7 +301,7 @@ Image: 7001-77.jpg | Actual Value: UD | Predicted Value: UD

## Improve the model

- More Data: The more examples a model learns from, the better it performs. Download the full [SDNET2018 dataset](https://digitalcommons.usu.edu/cgi/viewcontent.cgi?filename=2&article=1047&context=all_datasets&type=additional) and use it to train.
- Augment the data: A common technique to add variety to the data is to augment the data by taking an image and applying different transforms (rotate, flip, shift, crop). This adds more varied examples for the model to learn from.
- Train for a longer time: The longer you train, the more tuned the model will be. Increasing the number of epochs may improve the performance of your model.
- Experiment with the hyper-parameters: In addition to the parameters used in this tutorial, other parameters can be tuned to potentially improve performance. Changing the learning rate, which determines the magnitude of updates made to the model after each epoch, may improve performance.
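
The "Augment the data" suggestion above can be illustrated with one of the simplest transforms, a horizontal flip. The sketch below is an assumption-laden illustration rather than part of the sample: it operates on an in-memory pixel grid (`T[,]`) instead of image files, and `Augment.FlipHorizontal` is a hypothetical helper, not an ML.NET API.

```csharp
using System;

public static class Augment
{
    // Returns a new grid in which every row is mirrored left to right.
    // The input grid is left unchanged.
    public static T[,] FlipHorizontal<T>(T[,] pixels)
    {
        if (pixels == null) throw new ArgumentNullException(nameof(pixels));

        int rows = pixels.GetLength(0);
        int cols = pixels.GetLength(1);
        var flipped = new T[rows, cols];

        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                // Column c moves to the mirrored column position.
                flipped[r, cols - 1 - c] = pixels[r, c];
            }
        }

        return flipped;
    }
}
```

Adding the flipped copy of each training image alongside the original, with the same label, is one way to give the model more varied examples to learn from.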