Adding Early stopping feature in ImageClassification (WIP) #4237

ashbhandare · 2019-09-20T23:06:26Z

Modeled after https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping

commit 1e990b16209f9d293dfb3111d152b0b8ff9c0fb6
Author: Aishwarya Bhandare <aibhanda@microsoft.com>
Date: Fri Sep 20 15:27:11 2019 -0700

    cleanup .gitignore

commit 54ccaa6d79e420f4624bf1779053ed8709cb3dc9
Author: Aishwarya Bhandare <aibhanda@microsoft.com>
Date: Fri Sep 20 15:25:51 2019 -0700

    cleanup

commit 93b966453895acc40468c7f4d339540e0c7729fb
Author: Aishwarya Bhandare <aibhanda@microsoft.com>
Date: Fri Sep 20 14:39:07 2019 -0700

    initial support for early stopping feature in ImageClassification.

src/Microsoft.ML.Dnn/ImageClassificationTransform.cs

src/Microsoft.ML.Dnn/DnnCatalog.cs

eerhardt · 2019-09-23T19:28:09Z

src/Microsoft.ML.Dnn/DnnCatalog.cs

@@ -113,6 +116,9 @@ public static class DnnCatalog
            int epoch = 100,
            int batchSize = 10,
            float learningRate = 0.01f,
+            bool enableEarlyStopping = true,


Do we really need an enableEarlyStopping parameter? What if we instead used nullable earlyStoppingminDelta and earlyStoppingPatience parameters whose default value is null? If the user doesn't supply those values, then early stopping isn't enabled. #Resolved

According to @Zeeshan Siddiqui, we do not want the user to have to set any of these parameters; the default values should work well in most cases. If the defaults were null, users would have to set these values appropriately to make use of this feature.

In reply to: 327289504

I'm not convinced this is the best API to enable this. What happens if we want to enable a different stopping criterion in the future?

It feels like we should consider a different API to enable this. Check out the EarlyStoppingCriteria in FastTree. That seems like more of an extensible/future-proof API.

machinelearning/src/Microsoft.ML.FastTree/Training/EarlyStoppingCriteria.cs

Lines 34 to 38 in 2942ca4

/// <summary>
/// Early stopping rule used to terminate training process once meeting a specified criterion.
/// Used for setting <see cref="EarlyStoppingRule"/> <see cref="BoostedTreeOptions.EarlyStoppingRule"/>.
/// </summary>
public abstract class EarlyStoppingRuleBase

#Resolved

@eerhardt @ashbhandare We spoke about this on Friday, and I suggested a class that defines the early stopping criteria and implements an interface declaring bool ShouldStop(...). The API parameter should be a reference to this interface: if it is set to null, we don't apply early stopping, but by default it can be set to a concrete implementation such as XYZEarlyStopping ... #Resolved
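The design suggested in this comment can be sketched roughly as follows. This is only an illustration, written in Python (the language of the Keras callback the PR is modeled on) rather than C#. The names `EarlyStoppingCriteria`, `PatienceCriteria`, and `train` are hypothetical and are not the API that was merged.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Optional


class EarlyStoppingCriteria(ABC):
    """Hypothetical interface: a single method decides whether to stop."""

    @abstractmethod
    def should_stop(self, metric: float) -> bool:
        ...


class PatienceCriteria(EarlyStoppingCriteria):
    """Stops after `patience` consecutive epochs that fail to beat the
    best metric seen so far by at least `min_delta`."""

    def __init__(self, min_delta: float, patience: int) -> None:
        self.min_delta = min_delta
        self.patience = patience
        self._best = float("-inf")
        self._wait = 0

    def should_stop(self, metric: float) -> bool:
        if metric - self._best < self.min_delta:
            # No sufficient improvement this epoch.
            self._wait += 1
            return self._wait >= self.patience
        # Improvement: remember the new best and reset the counter.
        self._best = metric
        self._wait = 0
        return False


def train(accuracies: Iterable[float],
          criteria: Optional[EarlyStoppingCriteria]) -> int:
    """Consumes one accuracy value per epoch and returns the number of
    epochs run. Passing None disables early stopping."""
    epochs_run = 0
    for accuracy in accuracies:
        epochs_run += 1
        if criteria is not None and criteria.should_stop(accuracy):
            break
    return epochs_run
```

With this shape, a different stopping rule becomes a new subclass rather than a new set of catalog parameters, which is the extensibility concern raised earlier in the thread.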

eerhardt · 2019-09-23T19:28:48Z

src/Microsoft.ML.Dnn/ImageClassificationTransform.cs

+                        wait += 1;
+                        if (wait >= options.Patience)
+                        {
+                            Console.WriteLine("*** Early Stopping at epoch " + epoch.ToString());


Please don't Console.Write inside of library code. #Resolved

src/Microsoft.ML.Dnn/ImageClassificationTransform.cs

bpstark · 2019-09-23T20:09:26Z

src/Microsoft.ML.Dnn/ImageClassificationTransform.cs

+            int wait = 0;
+            var history = new TrainMetrics();
+            history.Accuracy = 0;
+


Why are you using a TrainMetrics object when you only care about the history of the accuracy? You can simply use a float here instead. #Resolved

This was done with a view to possibly using other training metrics as early stopping criteria as well. I will refactor the code, and this will change. #Resolved

codemzs · 2019-09-25T16:39:09Z

...icrosoft.ML.Samples/Dynamic/ImageClassification/ResnetV2101TransferLearningTrainTestSplit.cs

@@ -67,6 +67,7 @@ public static void Example()
                    epoch: 50,
                    batchSize: 10,
                    learningRate: 0.01f,
+                    enableEarlyStopping: true,


I would prefer that you create a new sample for early stopping. #Resolved

codemzs · 2019-09-25T16:39:52Z

src/Microsoft.ML.Dnn/DnnCatalog.cs

@@ -89,6 +89,9 @@ public static class DnnCatalog
        /// <param name="epoch">Number of training iterations. Each iteration/epoch refers to one pass over the dataset.</param>
        /// <param name="batchSize">The batch size for training.</param>
        /// <param name="learningRate">The learning rate for training.</param>
+        /// <param name="enableEarlyStopping">Whether early stopping technique should be used when accuracy stops improving.</param>
+        /// <param name="earlyStoppingminDelta">Minimum change in accuracy to qualify as improvement.</param>
+        /// <param name="earlyStoppingPatience">Number of epochs to wait after no improvement is observed before early stopping.</param>


Make this option a class. #Resolved

src/Microsoft.ML.Dnn/ImageClassificationTransform.cs

codemzs · 2019-09-25T16:42:12Z

src/Microsoft.ML.Dnn/ImageClassificationTransform.cs

+                            earlyStop = true;
+                        }
+                    }
+                }


Please add comments here that document this technique and also add relevant links #Resolved

codemzs · 2019-09-25T16:43:41Z

src/Microsoft.ML.Dnn/ImageClassificationTransform.cs

+                        wait += 1;
+                        if (wait >= options.Patience)
+                        {
+                            Console.WriteLine("*** Early Stopping at epoch " + epoch.ToString());


Console.WriteLine("*** Early Stopping at epoch " + epoch.ToString());

Use message channels for logging. #Resolved

codemzs · 2019-09-25T16:47:37Z

src/Microsoft.ML.Dnn/ImageClassificationTransform.cs

+            bool earlyStop = false;
+            int wait = 0;
+            var history = new TrainMetrics();
+            history.Accuracy = 0;


Why create a new class when you just want a variable to store accuracy? Just create a float variable "lastSeenAccuracy". #Resolved

Addressed here: #4237 (comment)

In reply to: 328229850

codemzs · 2019-09-25T16:54:08Z

src/Microsoft.ML.Dnn/ImageClassificationTransform.cs

+                            Console.WriteLine("*** Early Stopping at epoch " + epoch.ToString());
+                            earlyStop = true;
+                        }
+                    }


Please do not put up a PR without a unit test; if you do, please mark it as a Draft PR or WIP. #Resolved

codemzs · 2019-09-25T16:56:47Z

src/Microsoft.ML.Dnn/DnnCatalog.cs

@@ -136,6 +142,9 @@ public static class DnnCatalog
                Epoch = epoch,
                LearningRate = learningRate,
                BatchSize = batchSize,
+                EnableEarlyStopping = enableEarlyStopping,
+                MinDelta = earlyStoppingminDelta,
+                Patience = earlyStoppingPatience,


Since you are taking this technique from https://github.com/tensorflow/tensorflow/blob/r1.14/tensorflow/python/keras/callbacks.py#L1143-L1260, were you also planning to add the "modes", i.e. min, max, and auto?

mode: One of {"auto", "min", "max"}. In min mode,
training will stop when the quantity
monitored has stopped decreasing; in max
mode it will stop when the quantity
monitored has stopped increasing; in auto
mode, the direction is automatically inferred
from the name of the monitored quantity.

I think we should add this. #Resolved

codemzs · 2019-09-25T17:31:31Z

src/Microsoft.ML.Dnn/ImageClassificationTransform.cs

+                    {
+                        history.Accuracy = metrics.Train.Accuracy;
+                        wait = 0;
+                    }


This is incorrect. It needs to be absolute change as documented here: https://github.com/tensorflow/tensorflow/blob/r1.14/tensorflow/python/keras/callbacks.py#L1143-L1260

min_delta: Minimum change in the monitored quantity
to qualify as an improvement, i.e. an absolute
change of less than min_delta, will count as no
improvement.

Looking at the code:

    if mode == 'min':
        self.monitor_op = np.less
    elif mode == 'max':
        self.monitor_op = np.greater
    else:
        if 'acc' in self.monitor:
            self.monitor_op = np.greater
        else:
            self.monitor_op = np.less

    if self.monitor_op == np.greater:
        self.min_delta *= 1
    else:
        self.min_delta *= -1

The last 4 lines above change the sign of the delta, and that takes care of the absolute difference in the function below:

    def on_epoch_end(self, epoch, logs=None):
        current = self.get_monitor_value(logs)
        if current is None:
            return
        if self.monitor_op(current - self.min_delta, self.best):
            self.best = current
            self.wait = 0
            if self.restore_best_weights:
                self.best_weights = self.model.get_weights()
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.stopped_epoch = epoch
                self.model.stop_training = True
                if self.restore_best_weights:
                    if self.verbose > 0:
                        print('Restoring model weights from the end of the best epoch.')
                    self.model.set_weights(self.best_weights)

#Resolved

As we discussed offline, changing the sign of min_delta does not amount to taking the absolute value. Although the comment says that the absolute value of the change is used, the code does not implement it that way. However, we do want to consider the absolute value, and I will make that change.

In reply to: 328248944
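To make the distinction in this exchange concrete, here is a small Python sketch comparing the sign-flip comparison from the quoted Keras code with one possible literal reading of "absolute change". The function names are hypothetical, and the second function is an interpretation of the docstring, not code taken from Keras or from this PR.

```python
def keras_style_improved(current: float, best: float,
                         min_delta: float, maximize: bool) -> bool:
    """Mirrors the quoted logic: min_delta keeps its sign in 'max' mode
    and is negated in 'min' mode, so the test stays directional."""
    if maximize:
        # np.greater(current - min_delta, best)
        return current - min_delta > best
    # np.less(current - (-min_delta), best)
    return current + min_delta < best


def absolute_change_improved(current: float, best: float,
                             min_delta: float) -> bool:
    """One literal reading of the docstring: any change whose magnitude
    is at least min_delta counts, regardless of direction."""
    return abs(current - best) >= min_delta


# Accuracy drops from 0.90 to 0.80 with min_delta = 0.05.
drop_directional = keras_style_improved(0.80, 0.90, 0.05, maximize=True)
drop_absolute = absolute_change_improved(0.80, 0.90, 0.05)
```

Under the sign-flip logic the drop in accuracy is not an improvement, whereas a purely absolute comparison would treat it as one. Whichever rule is chosen, the two are not equivalent, which is the point made in the reply above.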

codecov · 2019-09-30T19:40:30Z

Codecov Report

❗ No coverage uploaded for pull request base (master@d290881).
The diff coverage is 90.44%.

@@            Coverage Diff            @@
##             master    #4237   +/-   ##
=========================================
  Coverage          ?   74.56%           
=========================================
  Files             ?      878           
  Lines             ?   154012           
  Branches          ?    16852           
=========================================
  Hits              ?   114833           
  Misses            ?    34446           
  Partials          ?     4733

| Flag | Coverage Δ |
|---|---|
| #Debug | `74.56% <90.44%> (?)` |
| #production | `70.15% <90.19%> (?)` |
| #test | `89.51% <90.56%> (?)` |

| Impacted Files | Coverage Δ |
|---|---|
| src/Microsoft.ML.Dnn/DnnCatalog.cs | `78.66% <100%> (ø)` |
| ...c/Microsoft.ML.Dnn/ImageClassificationTransform.cs | `86.25% <90%> (ø)` |
| ...cenariosWithDirectInstantiation/TensorflowTests.cs | `89.96% <90.56%> (ø)` |

codemzs · 2019-10-01T23:18:42Z

src/Microsoft.ML.Dnn/ImageClassificationTransform.cs

+                if (options.EarlyStopper != null)
+                {
+                    earlyStop = options.EarlyStopper.ShouldStop(metrics.Train);
+                }


Why not just break out? It will save you a variable. #Resolved

Done.

In reply to: 330316698

codemzs · 2019-10-01T23:19:33Z

src/Microsoft.ML.Dnn/ImageClassificationTransform.cs

+
+            /// <summary>
+            /// Current number of epochs where there has been no improvement.
+            /// Stop training when wait >=patience.


wait

Please use a param ref to refer to variables. #Resolved

Updated the description to not use variable names. I couldn't use a param ref because these variables are not parameters of this particular variable (I guess).

In reply to: 330316879

codemzs · 2019-10-01T23:20:33Z

src/Microsoft.ML.Dnn/ImageClassificationTransform.cs

+                    currentMetricValue = currentMetrics.Accuracy;
+                else
+                    currentMetricValue = currentMetrics.CrossEntropy;
+                if(CheckIncreasing)


if(CheckIncreasing)

Add a new line here. #Resolved

codemzs · 2019-10-01T23:21:28Z

src/Microsoft.ML.Dnn/ImageClassificationTransform.cs

+                if (_metric == EarlyStoppingMetric.Accuracy)
+                    currentMetricValue = currentMetrics.Accuracy;
+                else
+                    currentMetricValue = currentMetrics.CrossEntropy;


currentMetricValue = _metric == EarlyStoppingMetric.Accuracy ? currentMetrics.Accuracy : currentMetrics.CrossEntropy #Resolved

codemzs · 2019-10-01T23:21:54Z

src/Microsoft.ML.Dnn/ImageClassificationTransform.cs

+                    {
+                        _wait += 1;
+                        if(_wait >= Patience)
+                            return (true);


(true);

Why the parentheses? Just "return true;". #Resolved

codemzs · 2019-10-01T23:22:24Z

src/Microsoft.ML.Dnn/ImageClassificationTransform.cs

+                    if((currentMetricValue- _bestMetricValue) < MinDelta)
+                    {
+                        _wait += 1;
+                        if(_wait >= Patience)


if(_wait >= Patience)

Can _wait ever be greater than Patience? #Resolved

Since Patience is an int, the user might supply a negative value. In that case, it is better to check >= instead of ==.

In reply to: 330317534
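The argument for `>=` over `==` can be checked with a short sketch. This is a Python illustration under the assumption that every epoch counts as "no improvement"; the helper `epochs_until_stop` and its parameters are hypothetical and do not appear in the PR.

```python
from typing import Optional


def epochs_until_stop(patience: int, use_equality: bool,
                      max_epochs: int = 10) -> Optional[int]:
    """Returns the epoch at which the stop check fires, or None if it
    never fires within max_epochs."""
    wait = 0
    for epoch in range(1, max_epochs + 1):
        wait += 1  # every epoch is treated as "no improvement"
        if use_equality:
            fired = wait == patience
        else:
            fired = wait >= patience
        if fired:
            return epoch
    return None
```

For a positive patience both checks fire at the same epoch. For a negative patience the counter starts at 1 and only grows, so `==` never fires, while `>=` stops at the first non-improving epoch.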

codemzs · 2019-10-01T23:23:00Z

src/Microsoft.ML.Dnn/ImageClassificationTransform.cs

+                    {
+                        _wait += 1;
+                        if (_wait >= Patience)
+                            return (true);


return (true);

Use "return true;". #Resolved

codemzs · 2019-10-01T23:23:37Z

src/Microsoft.ML.Dnn/ImageClassificationTransform.cs

+                        _bestMetricValue = currentMetricValue;
+                    }
+                }
+                return (false);


return (false);

Use "return false;". #Resolved

codemzs · 2019-10-01T23:24:38Z

src/Microsoft.ML.Dnn/ImageClassificationTransform.cs

+            /// Early Stopping technique to stop training when accuracy stops improving.
+            /// </summary>
+            [Argument(ArgumentType.AtMostOnce, HelpText = "Early Stopping technique to stop training when accuracy stops improving.", SortOrder = 15)]
+            public EarlyStopping EarlyStopper;


EarlyStopper

We generally refer to this as EarlyStoppingCriteria. #Resolved

Renamed.

In reply to: 330318020

codemzs · 2019-10-01T23:27:04Z

test/Microsoft.ML.Tests/ScenariosWithDirectInstantiation/TensorflowTests.cs

+                epoch: 50,
+                batchSize: 5,
+                learningRate: 0.01f,
+                earlyStopping: new ImageClassificationEstimator.EarlyStopping(),


earlyStopping: new ImageClassificationEstimator.EarlyStopping(),

Isn't this the case by default? How is this test different from the test above? Maybe disable early stopping in the test above so that case is covered, and enable it here while also verifying, via the metrics callback, the epoch at which training stops. #Resolved

Done.

In reply to: 330318498

ashbhandare requested a review from a team as a code owner September 20, 2019 23:06

ashbhandare force-pushed the early_stopping branch from fcd9a99 to 7432ba7 Compare September 23, 2019 18:03

Revert .gitignore

2b7e214

eerhardt reviewed Sep 23, 2019

bpstark reviewed Sep 23, 2019

codemzs reviewed Sep 25, 2019

codemzs reviewed Sep 25, 2019

codemzs requested changes Sep 25, 2019

codemzs reviewed Sep 25, 2019

Merge branch 'master' into early_stopping

baab5de

ashbhandare changed the title ~~Adding Early stopping feature in ImageClassification~~ Adding Early stopping feature in ImageClassification (WIP) Sep 26, 2019

ashbhandare added 3 commits September 26, 2019 14:39

Renaming, changing default, cleanup

4ebcb40

(WIP) Refactored EarlyStopping as class, improved API usage, added unit test and sample.

899d264

fix unit test

4d87809

ashbhandare added 2 commits September 30, 2019 16:25

Merge branch 'master' into early_stopping

48e1729

added explanation of early stopping, enabled earlyStopping by default.

e8d4de3

codemzs reviewed Oct 1, 2019

codemzs reviewed Oct 1, 2019

updated test for a narrower range, minor refactor of code

4a30441

codemzs approved these changes Oct 2, 2019

Merge branch 'master' into early_stopping

bf4f22d

ashbhandare merged commit f8a672a into dotnet:master Oct 2, 2019

ashbhandare deleted the early_stopping branch October 2, 2019 17:27

codemzs mentioned this pull request Oct 3, 2019

Fix build breaks. #4278

Merged

ghost locked as resolved and limited conversation to collaborators Mar 20, 2022
