Learning Rate Scheduling

LambdaLR

Sets the learning rate of each parameter group to the initial $lr$ times a given function $f_{\lambda}$. When last_epoch=$-1$, sets initial $lr$ as $lr$.

$$ lr_{\text {epoch}} = lr_{\text {initial}} * f_{\lambda}(epoch) $$
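Below is a minimal sketch of how this scheduler might be used to trace the schedule (the dummy model, the initial lr of 0.1, and the 0.95 ** epoch lambda are illustrative assumptions, not values from the original notebook):

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(10, 1)                        # dummy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# lr_epoch = 0.1 * 0.95 ** epoch
scheduler = LambdaLR(optimizer, lr_lambda=lambda epoch: 0.95 ** epoch)

lrs = []
for epoch in range(100):
    optimizer.step()                                  # forward/backward omitted
    lrs.append(optimizer.param_groups[0]["lr"])       # record lr for plotting
    scheduler.step()
```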

MultiplicativeLR

Multiply the learning rate of each parameter group by the factor given by the specified function $f_{\lambda}$. When last_epoch=-1, sets initial lr as lr.

$$ lr_{\text {epoch}} = lr_{\text {epoch - 1}} * f_{\lambda}(epoch) $$
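A comparable sketch, assuming a constant factor of 0.95 per epoch (illustrative, not the notebook's value):

```python
import torch
from torch.optim.lr_scheduler import MultiplicativeLR

model = torch.nn.Linear(10, 1)                        # dummy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# each epoch: lr_epoch = 0.95 * lr_(epoch - 1)
scheduler = MultiplicativeLR(optimizer, lr_lambda=lambda epoch: 0.95)

lrs = []
for epoch in range(100):
    optimizer.step()                                  # forward/backward omitted
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()
```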

StepLR

Decays the learning rate of each parameter group by gamma every step_size epochs. Notice that such decay can happen simultaneously with other changes to the learning rate from outside this scheduler. When last_epoch=-1, sets initial lr as lr.

$$ lr_{\text {epoch}}=\left\{\begin{array}{ll} \gamma * lr_{\text {epoch - 1}}, & \text { if } {\text {epoch % step_size}}=0 \\ lr_{\text {epoch - 1}}, & \text { otherwise } \end{array}\right. $$
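A sketch with assumed hyperparameters (step_size=30, gamma=0.5, chosen only for illustration):

```python
import torch
from torch.optim.lr_scheduler import StepLR

model = torch.nn.Linear(10, 1)                        # dummy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# halve the lr every 30 epochs
scheduler = StepLR(optimizer, step_size=30, gamma=0.5)

lrs = []
for epoch in range(100):
    optimizer.step()                                  # forward/backward omitted
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()
```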

MultiStepLR

Decays the learning rate of each parameter group by gamma once the number of epochs reaches one of the milestones. Notice that such decay can happen simultaneously with other changes to the learning rate from outside this scheduler. When last_epoch=-1, sets initial lr as lr.

$$ lr_{\text {epoch}}=\left\{\begin{array}{ll} \gamma * lr_{\text {epoch - 1}}, & \text { if } {\text{ epoch in [milestones]}} \\ lr_{\text {epoch - 1}}, & \text { otherwise } \end{array}\right. $$
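A sketch with assumed milestones at epochs 30 and 80 and gamma=0.1 (illustrative values):

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(10, 1)                        # dummy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# multiply the lr by 0.1 at epochs 30 and 80
scheduler = MultiStepLR(optimizer, milestones=[30, 80], gamma=0.1)

lrs = []
for epoch in range(100):
    optimizer.step()                                  # forward/backward omitted
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()
```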

ExponentialLR

Decays the learning rate of each parameter group by gamma every epoch. When last_epoch=-1, sets initial lr as lr.

$$ lr_{\text {epoch}}= \gamma * lr_{\text {epoch - 1}} $$
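A sketch assuming gamma=0.95 (illustrative):

```python
import torch
from torch.optim.lr_scheduler import ExponentialLR

model = torch.nn.Linear(10, 1)                        # dummy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# multiply the lr by 0.95 every epoch
scheduler = ExponentialLR(optimizer, gamma=0.95)

lrs = []
for epoch in range(100):
    optimizer.step()                                  # forward/backward omitted
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()
```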

ReduceLROnPlateau

Reduce the learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This scheduler reads a metric quantity, and if no improvement is seen for a ‘patience’ number of epochs, the learning rate is reduced.
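A sketch assuming the monitored metric is a validation loss and that the lr is halved after 5 epochs without improvement; the placeholder loss exists only to make the snippet self-contained:

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = torch.nn.Linear(10, 1)                        # dummy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# halve the lr when the monitored loss stops improving for 5 epochs
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=5)

for epoch in range(100):
    optimizer.step()                                  # training loop omitted
    val_loss = 1.0                                    # placeholder metric (a flat loss triggers reductions)
    scheduler.step(val_loss)                          # pass the monitored metric to step()
```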


CosineAnnealingLR

Set the learning rate of each parameter group using a cosine annealing schedule. When last_epoch=-1, sets initial lr as lr. Notice that because the schedule is defined recursively, the learning rate can be simultaneously modified outside this scheduler by other operators. If the learning rate is set solely by this scheduler, the learning rate at each step becomes:

$$ \eta_{t}=\eta_{\min }+\frac{1}{2}\left(\eta_{\max }-\eta_{\min }\right)\left(1+\cos \left(\frac{T_{cur}}{T_{\max }} \pi\right)\right) $$

It has been proposed in SGDR: Stochastic Gradient Descent with Warm Restarts (https://arxiv.org/abs/1608.03983). Note that this only implements the cosine annealing part of SGDR, not the restarts.
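A sketch assuming T_max=50 and eta_min=0.001 (illustrative values):

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(10, 1)                        # dummy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# anneal from 0.1 down to 0.001 over 50 epochs, then back up (the cosine is periodic)
scheduler = CosineAnnealingLR(optimizer, T_max=50, eta_min=0.001)

lrs = []
for epoch in range(100):
    optimizer.step()                                  # forward/backward omitted
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()
```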


CosineAnnealingWarmRestarts

Set the learning rate of each parameter group using a cosine annealing schedule, and restart after $T_i$ epochs.

$$ \eta_{t}=\eta_{\min }+\frac{1}{2}\left(\eta_{\max }-\eta_{\min }\right)\left(1+\cos \left(\frac{T_{\operatorname{cur}}}{T_{i}} \pi\right)\right) $$
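A sketch assuming a first cycle of T_0=20 epochs; with T_mult=2 each subsequent cycle is twice as long (both values are illustrative):

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

model = torch.nn.Linear(10, 1)                        # dummy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# cosine-anneal toward eta_min, restart at 0.1 each cycle; cycle length doubles every restart
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=20, T_mult=2, eta_min=0.001)

lrs = []
for epoch in range(100):
    optimizer.step()                                  # forward/backward omitted
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()
```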

OneCycleLR

Sets the learning rate of each parameter group according to the 1 cycle learning rate policy. The 1cycle policy anneals the learning rate from an initial learning rate to some maximum learning rate and then from that maximum learning rate to some minimum learning rate much lower than the initial learning rate. This policy was initially described in the paper Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates.

The 1 cycle learning rate policy changes the learning rate after every batch; step() should be called after a batch has been used for training.

This scheduler is not chainable.
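A sketch assuming 10 epochs of 100 batches each and max_lr=0.1 (illustrative values); note that step() is called once per batch rather than once per epoch:

```python
import torch
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(10, 1)                        # dummy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
epochs, steps_per_epoch = 10, 100
scheduler = OneCycleLR(optimizer, max_lr=0.1,
                       epochs=epochs, steps_per_epoch=steps_per_epoch)

lrs = []
for epoch in range(epochs):
    for batch in range(steps_per_epoch):
        optimizer.step()                              # forward/backward omitted
        lrs.append(optimizer.param_groups[0]["lr"])
        scheduler.step()                              # once per batch for OneCycleLR
```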

This blog is converted from machine-learning-learning-rate-scheduling.ipynb
Written on June 12, 2021