
# Cyclic Learning Rate (CLR)

TensorFlow implementation of the cyclic learning rate from the paper: Smith, Leslie N. "Cyclical Learning Rates for Training Neural Networks." WACV 2017.

## TOC

1. What is CLR?
2. Usage
3. Running Functional Tests
4. License

## What is CLR?

CLR is a learning rate scheduling technique that aims to improve convergence and help regularize deep learning models. It removes the need to experimentally search for the single best global learning rate by letting the learning rate vary cyclically between lower and upper boundary values. Training is divided into cycles determined by a stepsize parameter, which defines the number of iterations in half a cycle. The author claims that it is often good to set the stepsize to:

```
stepsize = (2 to 10) * (number of iterations in an epoch)
```
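
For instance, assuming a hypothetical training set of 50,000 examples and a batch size of 100, one epoch is 500 iterations, so the rule suggests a stepsize between 1,000 and 5,000:

```python
# Hypothetical numbers for illustration only.
num_examples = 50000                               # training set size
batch_size = 100
iterations_per_epoch = num_examples // batch_size  # 500
step_size = 4 * iterations_per_epoch               # 2000, within the 2-10x range
```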

The learning rate is computed as:

```
cycle = floor( 1 + global_step / ( 2 * step_size ) )
x     = abs( global_step / step_size - 2 * cycle + 1 )
clr   = learning_rate + ( max_lr - learning_rate ) * max( 0, 1 - x )
```
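
As a quick sanity check, take illustrative values `learning_rate = 0.01`, `max_lr = 0.1`, `step_size = 100`, and `global_step = 150` (halfway down the first cycle):

```
cycle = floor( 1 + 150 / ( 2 * 100 ) )      = 1
x     = abs( 150 / 100 - 2 * 1 + 1 )        = 0.5
clr   = 0.01 + ( 0.1 - 0.01 ) * ( 1 - 0.5 ) = 0.055
```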

The author proposes three variations of this policy:

- `'triangular'`: The default. The learning rate increases linearly, then decreases linearly, within each cycle.
- `'triangular2'`: The same as the triangular policy, except that the learning rate difference (`max_lr - learning_rate`) is cut in half at the end of each cycle.
- `'exp_range'`: The learning rate varies between the minimum and maximum boundaries, and each boundary value declines by an exponential factor of:

  `f = gamma^global_step`

where `global_step` is the current iteration number and `gamma` is a constant passed as an argument to the CLR function. A plain-Python sketch of all three policies follows below.
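
To make the three policies concrete, here is a minimal plain-Python sketch of the schedule. The function name `clr_value` and its default arguments are illustrative, not this repo's API (`clr.cyclic_learning_rate` is the real entry point):

```python
import math

def clr_value(global_step, learning_rate=0.01, max_lr=0.1,
              step_size=100., mode='triangular', gamma=0.997):
    """Illustrative re-implementation of the three CLR policies."""
    cycle = math.floor(1 + global_step / (2 * step_size))
    x = abs(global_step / step_size - 2 * cycle + 1)
    scale = max(0., 1 - x)
    if mode == 'triangular2':
        scale /= 2 ** (cycle - 1)       # halve the amplitude every cycle
    elif mode == 'exp_range':
        scale *= gamma ** global_step   # decay the amplitude exponentially
    return learning_rate + (max_lr - learning_rate) * scale

print(clr_value(100))                      # 0.1   (peak of the first cycle)
print(clr_value(300))                      # 0.1   (peak of the second cycle)
print(clr_value(300, mode='triangular2'))  # 0.055 (amplitude halved)
```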

## Usage

Upgrade to the latest version of TensorFlow:

```python
!pip install --upgrade tensorflow
import tensorflow as tf
tf.__version__
```

```
[out]: '1.9.0'
```

### Eager mode

Eager mode evaluates operations immediately, without building graphs.

Enable eager execution:

```python
import tensorflow as tf
tf.enable_eager_execution()
```

Generate cyclic learning rates:

```python
import clr
import matplotlib.pyplot as plt
%matplotlib inline

print(tf.executing_eagerly())  # => True

rates = []

for i in range(0, 250):
    # In eager mode, cyclic_learning_rate returns a callable that
    # evaluates the schedule at the given step.
    x = clr.cyclic_learning_rate(i, mode='exp_range', gamma=.997)
    rates.append(x())

plt.xlabel('iterations (epochs)')
plt.ylabel('learning rate')
plt.plot(range(250), rates)

# plt.savefig('exp_range.png', dpi=600)
```
```
[out]: True
```

*(plot: 'exp_range' cyclic learning rate over 250 iterations)*

### Graph mode

```python
import tensorflow as tf
import clr
import matplotlib.pyplot as plt
%matplotlib inline

print(tf.executing_eagerly())  # => False

rates = []

# In graph mode, cyclic_learning_rate returns a tensor that must be
# evaluated inside a session.
with tf.Session() as sess:
    for i in range(0, 250):
        rates.append(sess.run(clr.cyclic_learning_rate(i, mode='exp_range', gamma=.997)))

plt.xlabel('iterations (epochs)')
plt.ylabel('learning rate')
plt.plot(range(250), rates)

# plt.savefig('exp_range.png', dpi=600)
```
```
[out]: False
```

*(plot: 'exp_range' cyclic learning rate over 250 iterations)*

### Training Example

- Using the `'triangular2'` mode cyclic learning rate:

```python
...
global_step = tf.Variable(0, trainable=False)
optimizer = tf.train.AdamOptimizer(
    learning_rate=clr.cyclic_learning_rate(global_step=global_step,
                                           mode='triangular2'))
train_op = optimizer.minimize(loss_op, global_step=global_step)
...
with tf.Session() as sess:
    sess.run(init)
    for step in range(1, num_steps + 1):
        # Keep global_step in sync with the loop so the schedule advances.
        assign_op = global_step.assign(step)
        sess.run(assign_op)
...
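```

For a self-contained variant, here is a runnable graph-mode sketch that minimizes a toy quadratic loss. The variable `w`, the loss, and `num_steps` are stand-ins for a real model, and it relies on `minimize()` incrementing `global_step` itself instead of assigning it manually:

```python
import tensorflow as tf
import clr

# Toy model: minimize (w - 3)^2, just enough to exercise the schedule.
w = tf.Variable(5.0)
loss_op = tf.square(w - 3.0)

global_step = tf.Variable(0, trainable=False)
optimizer = tf.train.AdamOptimizer(
    learning_rate=clr.cyclic_learning_rate(global_step=global_step,
                                           mode='triangular2'))
# minimize() increments global_step after every update, which advances
# the cyclic schedule automatically.
train_op = optimizer.minimize(loss_op, global_step=global_step)

num_steps = 500  # illustrative
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1, num_steps + 1):
        _, loss = sess.run([train_op, loss_op])
```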

## Running Functional Tests

```python
from clr_test import CyclicLearningRateTest

CyclicLearningRateTest().test_triangular()
CyclicLearningRateTest().test_triangular2()
CyclicLearningRateTest().test_exp_range()
```

## License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Inspired by Brad Kenstler's Keras CLR implementation.