Schedulers

Schedulers modify the learning rate over the course of training. Every scheduler is designed to be called at every step, not at every epoch.
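
A minimal sketch of the per-step usage, using PyTorch's built-in LambdaLR as a stand-in (the constructors of the schedulers below may differ):

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# Stand-in scheduler: the lambda returns the factor that multiplies the base learning_rate.
scheduler = LambdaLR(optimizer, lr_lambda=lambda step: 1.0)

for step in range(100):
    loss = model(torch.randn(8, 10)).sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()  # called once per step, not once per epoch
```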

LinearScheduler

This scheduler will linearly decrease the learning rate from its initial value learning_rate to 0 over num_training_steps steps.
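
As a rough equivalent (not this repository's implementation), the schedule can be sketched as a LambdaLR factor, reusing the optimizer from the sketch above:

```python
from torch.optim.lr_scheduler import LambdaLR

def linear_decay_lambda(num_training_steps):
    # Factor goes from 1.0 at step 0 down to 0.0 at num_training_steps.
    def lr_lambda(current_step):
        return max(0.0, 1.0 - current_step / num_training_steps)
    return lr_lambda

scheduler = LambdaLR(optimizer, lr_lambda=linear_decay_lambda(num_training_steps=10_000))
```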

LinearSchedulerWithWarmup

This scheduler will linearly increase the learning rate from 0 to learning_rate during the first num_warmup_steps steps. After that, it will linearly decrease it to 0 until num_training_steps, the total number of training steps, is reached.
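
A sketch of the same shape (warmup followed by linear decay), again as a LambdaLR factor rather than the actual class:

```python
def linear_with_warmup_lambda(num_warmup_steps, num_training_steps):
    def lr_lambda(current_step):
        if current_step < num_warmup_steps:
            # Linear warmup from 0 to the base learning_rate.
            return current_step / max(1, num_warmup_steps)
        # Linear decay from the base learning_rate down to 0.
        return max(
            0.0,
            (num_training_steps - current_step) / max(1, num_training_steps - num_warmup_steps),
        )
    return lr_lambda
```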

ConstantScheduler

This scheduler will always return the original learning_rate.

ConstantSchedulerWithWarmup

This scheduler will linearly increase the learning rate from 0 to learning_rate during the first num_warmup_steps steps. After that, it will always return the original learning_rate.
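
A sketch of the factor for both constant variants: warmup followed by a constant factor of 1.0 (without warmup, the factor is simply always 1.0):

```python
def constant_with_warmup_lambda(num_warmup_steps):
    def lr_lambda(current_step):
        if current_step < num_warmup_steps:
            return current_step / max(1, num_warmup_steps)
        # After warmup, keep the original learning_rate unchanged.
        return 1.0
    return lr_lambda
```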

CosineSchedulerWithWarmup

This scheduler will linearly increase the learning rate from 0 to learning_rate during the first num_warmup_steps steps. After that, it will decrease it to 0 following a cosine curve.
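
A sketch of the factor, assuming a single half-cosine from 1 down to 0 after the warmup (the actual implementation may expose more options):

```python
import math

def cosine_with_warmup_lambda(num_warmup_steps, num_training_steps):
    def lr_lambda(current_step):
        if current_step < num_warmup_steps:
            return current_step / max(1, num_warmup_steps)
        progress = (current_step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
        # Half cosine: 1.0 at the end of warmup, 0.0 at num_training_steps.
        return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
    return lr_lambda
```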

CosineSchedulerWithWarmupAndHardRestart

This scheduler will linearly increase the learning rate from 0 to learning_rate during the first num_warmup_steps steps. After that, it will decrease it to 0 following a cosine curve with several hard restarts.
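
A sketch of the factor, assuming a num_cycles parameter that controls how many times the cosine restarts from 1.0 (the parameter name is an assumption):

```python
import math

def cosine_hard_restarts_lambda(num_warmup_steps, num_training_steps, num_cycles=2):
    def lr_lambda(current_step):
        if current_step < num_warmup_steps:
            return current_step / max(1, num_warmup_steps)
        progress = (current_step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
        if progress >= 1.0:
            return 0.0
        # Each cycle runs a half cosine from 1.0 down to 0.0, then jumps back to 1.0.
        return max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0))))
    return lr_lambda
```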

PolynomialLayerwiseDecaySchedulerWithWarmup

This scheduler decreases the learning rate following a polynomial decay; it is a conversion of https://www.tensorflow.org/api_docs/python/tf/compat/v1/train/polynomial_decay. More information about the default parameters can be found in the documentation of _LRScheduler in the torch project.
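
A sketch of the decay factor following the TensorFlow polynomial_decay formula, with warmup added; the lr_init, lr_end, and power names and defaults are assumptions, not necessarily this scheduler's signature:

```python
def polynomial_decay_with_warmup_lambda(num_warmup_steps, num_training_steps,
                                        lr_init=1e-3, lr_end=1e-7, power=1.0):
    def lr_lambda(current_step):
        if current_step < num_warmup_steps:
            return current_step / max(1, num_warmup_steps)
        if current_step > num_training_steps:
            return lr_end / lr_init
        # tf polynomial_decay: lr = (lr_init - lr_end) * (1 - progress) ** power + lr_end,
        # returned here as a factor relative to lr_init.
        decay_steps = num_training_steps - num_warmup_steps
        remaining = 1.0 - (current_step - num_warmup_steps) / decay_steps
        return ((lr_init - lr_end) * remaining ** power + lr_end) / lr_init
    return lr_lambda
```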

The layerwise_lr_decay_power parameter multiplies the learning rate of each parameter group by a factor f = layerwise_lr_decay_power^(max(depth) - depth). depth is a key that must be defined in every parameter group, along with the usual weight_decay. A small depth means that the learning rate is only slightly decreased, while a high depth greatly reduces the effect of training on those parameters.
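
A hypothetical sketch of parameter groups carrying a depth key, and of the resulting per-group factor; the grouping strategy and the value 0.8 are assumptions for illustration only:

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.GELU(), torch.nn.Linear(16, 2))

# Assign a depth to every group of parameters, along with the usual weight_decay.
layers = [m for m in model if list(m.parameters())]
max_depth = len(layers)
param_groups = [
    {"params": list(layer.parameters()), "weight_decay": 0.01, "depth": depth}
    for depth, layer in enumerate(layers, start=1)
]
optimizer = torch.optim.AdamW(param_groups, lr=1e-3)

layerwise_lr_decay_power = 0.8  # assumed value
for group in optimizer.param_groups:
    f = layerwise_lr_decay_power ** (max_depth - group["depth"])
    print(f"depth={group['depth']}  factor={f:.3f}")  # factor applied to this group's learning rate
```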