The learning rate decay factor.
Actually adds the gradient to the weights. ParameterAveraging overrides this.
The weights
The gradient
The learning rate
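A minimal sketch of what this step can look like, assuming weights and gradient are modeled as plain Array[Double] and using a hypothetical doGradStep helper (names and types are illustrative, not the library's API):

// Adds the scaled gradient into the weights in place (the doc above says the gradient is added, i.e. ascent).
object VanillaStepSketch {
  def doGradStep(weights: Array[Double], gradient: Array[Double], rate: Double): Unit = {
    var i = 0
    while (i < weights.length) {
      weights(i) += rate * gradient(i)
      i += 1
    }
  }
}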
Once learning is done, the weights should be copied back into normal tensors.
The weights
Some optimizers swap out weights with special purpose tensors for e.g. efficient scoring while learning.
The weights
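A hypothetical sketch of this swap-in / copy-back pattern, with an illustrative ScoringTensor wrapper standing in for whatever special-purpose tensor an optimizer might use while learning (only the general idea is taken from the text above):

// While learning, weights live in a wrapper that can keep extra state
// (here just an update counter); once learning is done, the plain values
// are copied back into an ordinary array.
final class ScoringTensor(initial: Array[Double]) {
  val values: Array[Double] = initial.clone()
  var updates: Int = 0
  def copyBackInto(target: Array[Double]): Unit =
    Array.copy(values, 0, target, 0, target.length)
}

object WeightSwapSketch {
  def initializeWeights(weights: Array[Double]): ScoringTensor = new ScoringTensor(weights)               // swap in
  def finalizeWeights(weights: Array[Double], tensor: ScoringTensor): Unit = tensor.copyBackInto(weights) // copy back
}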
Online optimizers generally don't converge.
Always false
Override this method to change the learning rate.
The weights
The gradient
The value
The learning rate
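For example, a sketch of an override that applies the learning rate decay factor mentioned at the top of this section (class and member names are illustrative, not the library's):

// Inverse decay in the number of steps taken so far.
class DecayedRate(baseRate: Double, decay: Double) {
  private var t = 0L // gradient steps taken so far
  def lRate(weights: Array[Double], gradient: Array[Double], value: Double): Double = {
    t += 1
    baseRate / (1.0 + decay * t)
  }
}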
Override this method to do some transformation to the gradient before going on with optimization.
The weights
The gradient
The base learning rate
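One possible sketch of such a transformation, here assumed to be simple L2-norm clipping of the gradient in place before the step continues (maxNorm is an illustrative parameter, not something the text above specifies):

object ClipGradientSketch {
  def processGradient(weights: Array[Double], gradient: Array[Double], maxNorm: Double): Unit = {
    val norm = math.sqrt(gradient.map(g => g * g).sum) // current L2 norm of the gradient
    if (norm > maxNorm) {
      val scale = maxNorm / norm
      var i = 0
      while (i < gradient.length) { gradient(i) *= scale; i += 1 }
    }
  }
}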
To override if you want to reset internal state.
Should not be overridden. The main flow of a GradientStep optimizer.
The weights
The gradient
The value
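Putting the pieces together, a simplified stand-in (not the library's actual trait) for how that main flow can compose the hooks described above:

trait GradientStepSketch {
  def baseRate: Double
  def isConverged: Boolean = false // online optimizers generally don't converge
  // Hook: transform the gradient before the step (default: leave it alone).
  def processGradient(weights: Array[Double], gradient: Array[Double]): Unit = ()
  // Hook: choose the learning rate for this step (default: the base rate).
  def lRate(weights: Array[Double], gradient: Array[Double], value: Double): Double = baseRate
  // Hook: actually add the scaled gradient into the weights.
  def doGradStep(weights: Array[Double], gradient: Array[Double], rate: Double): Unit = {
    var i = 0
    while (i < weights.length) { weights(i) += rate * gradient(i); i += 1 }
  }
  // The main flow; not meant to be overridden.
  final def step(weights: Array[Double], gradient: Array[Double], value: Double): Unit = {
    processGradient(weights, gradient)
    doGradStep(weights, gradient, lRate(weights, gradient, value))
  }
}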
This implements the adaptive learning rates from the AdaGrad algorithm (with Composite Mirror Descent update) from "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization" by Duchi et al.
Can be mixed into any GradientStep.
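A sketch of how such a mixin can be written against the simplified GradientStepSketch trait from the previous sketch: it accumulates squared gradients and rescales each gradient coordinate by the inverse root of that sum, so the ordinary step then applies a per-coordinate adaptive rate. This illustrates the plain AdaGrad idea under those assumptions, not the library's implementation:

trait AdaGradRatesSketch extends GradientStepSketch {
  def delta: Double = 1e-7                 // keeps the denominator away from zero
  private var sumSq: Array[Double] = null  // running sum of squared gradients per coordinate
  override def processGradient(weights: Array[Double], gradient: Array[Double]): Unit = {
    if (sumSq == null) sumSq = new Array[Double](gradient.length)
    var i = 0
    while (i < gradient.length) {
      sumSq(i) += gradient(i) * gradient(i)
      gradient(i) /= (delta + math.sqrt(sumSq(i))) // per-coordinate adaptive scaling
      i += 1
    }
  }
}

Mixing it in is then just, for example, new GradientStepSketch with AdaGradRatesSketch { val baseRate = 0.1 }.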