Actually adds the gradient to the weights. ParameterAveraging overrides this.
Parameters: the weights, the gradient, and the learning rate.
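For illustration, a minimal sketch of such a step, using a plain Array[Double] in place of the library's tensor types (the method name and signature here are assumptions, not the real API):

// Hedged sketch: adds rate * gradient to the weights in place.
// Array[Double] stands in for the library's real weight tensors.
def doGradStep(weights: Array[Double], gradient: Array[Double], rate: Double): Unit = {
  var i = 0
  while (i < weights.length) {
    weights(i) += rate * gradient(i)
    i += 1
  }
}

// Usage: starting from zero weights, one step with rate 0.1.
val w = Array(0.0, 0.0)
doGradStep(w, Array(1.0, -2.0), rate = 0.1)  // w is now (0.1, -0.2)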
Once learning is done, the weights should be copied back into normal tensors.
Parameters: the weights.
Some optimizers swap out weights with special-purpose tensors, e.g. for efficient scoring while learning.
Parameters: the weights.
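As a hedged illustration of the swap-out/copy-back lifecycle described in the two entries above: the sketch below keeps the weights as a scale factor times a base array while learning, so a uniform shrink (as in L2 regularization) is O(1), and folds the scale back into a normal array when finalized. All names and types are invented for the example.

// Hedged sketch of the initialize/finalize lifecycle.
class ScaledWeights(val base: Array[Double]) {
  var scale = 1.0
  // While learning: uniform shrinking is a single multiply, O(1).
  def shrink(factor: Double): Unit = scale *= factor
  // Once learning is done: copy back into a normal representation.
  def finalizeWeights(): Unit = {
    var i = 0
    while (i < base.length) { base(i) *= scale; i += 1 }
    scale = 1.0
  }
}

val w = new ScaledWeights(Array(2.0, 4.0))
w.shrink(0.5)        // cheap during training
w.finalizeWeights()  // base is now (1.0, 2.0), scale is 1.0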
Online optimizers generally don't converge.
Returns: always false.
Override this method to change the learning rate.
Parameters: the weights, the gradient, and the current objective value.
Returns: the learning rate.
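For example, an override might implement a decaying schedule. The trait and signatures below are assumed for the sketch, not the library's own:

// Hedged sketch: a hook for the learning rate, with a 1/sqrt(t) decay
// mixed in. t is assumed to be advanced by the optimizer's step().
trait RateHook {
  var t = 0
  def lRate(weights: Array[Double], gradient: Array[Double], value: Double): Double = 1.0
}

trait InverseSqrtRate extends RateHook {
  val baseRate = 0.1
  override def lRate(weights: Array[Double], gradient: Array[Double], value: Double): Double =
    baseRate / math.sqrt(math.max(t, 1))
}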
Override this method to apply some transformation to the gradient before going on with optimization.
Parameters: the weights and the gradient.
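A typical transformation is gradient clipping. The hook below is a hedged sketch with invented names:

// Hedged sketch: rescale the gradient in place so its L2 norm never
// exceeds maxNorm before the step is applied.
trait ClipGradient {
  val maxNorm = 1.0
  def processGradient(weights: Array[Double], gradient: Array[Double]): Unit = {
    val norm = math.sqrt(gradient.map(g => g * g).sum)
    if (norm > maxNorm) {
      var i = 0
      while (i < gradient.length) { gradient(i) *= maxNorm / norm; i += 1 }
    }
  }
}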
To override if you want to reset internal state.
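One kind of internal state worth resetting is an AdaGrad-style accumulator of squared gradients, as an adaptive learning rate would keep. The sketch below is illustrative only:

// Hedged sketch: per-coordinate squared-gradient sums; reset() clears them.
class AdaGradState(n: Int) {
  private val sumSq = new Array[Double](n)
  def observe(gradient: Array[Double]): Unit = {
    var i = 0
    while (i < gradient.length) { sumSq(i) += gradient(i) * gradient(i); i += 1 }
  }
  // AdaGrad-style per-coordinate rate: base / sqrt(accumulated squares).
  def rate(i: Int, base: Double): Double = base / (1e-6 + math.sqrt(sumSq(i)))
  def reset(): Unit = java.util.Arrays.fill(sumSq, 0.0)
}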
Should not be overridden. The main flow of a GradientStep optimizer.
Parameters: the weights, the gradient, and the current objective value.
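The flow, in outline: transform the gradient, choose a learning rate, then take the step. A hedged sketch of that skeleton, with hook names that are assumptions standing in for the real ones:

// Hedged sketch: step() is the fixed skeleton; subclasses customize
// the three hooks, never step() itself.
trait StepFlow {
  def processGradient(weights: Array[Double], gradient: Array[Double]): Unit
  def lRate(weights: Array[Double], gradient: Array[Double], value: Double): Double
  def doGradStep(weights: Array[Double], gradient: Array[Double], rate: Double): Unit

  final def step(weights: Array[Double], gradient: Array[Double], value: Double): Unit = {
    processGradient(weights, gradient)           // 1. transform the gradient
    val rate = lRate(weights, gradient, value)   // 2. choose a learning rate
    doGradStep(weights, gradient, rate)          // 3. apply the scaled gradient
  }
}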
Base trait for optimizers whose operational form can be described as a single gradient step: transform the gradient, choose a learning rate, and add the scaled gradient to the weights.
Traits that extend this one get things like parameter averaging, MIRA learning rates, or adaptive learning rates for free, as the sketch below illustrates.
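To make the composition concrete, here is a hedged end-to-end sketch under the same invented names: a decaying-rate trait overrides the rate hook and an averaging trait overrides the update hook, and they stack without either knowing about the other.

// Hedged sketch of "for free" composition via stackable traits.
trait Step {
  var t = 0
  def processGradient(w: Array[Double], g: Array[Double]): Unit = {}
  def lRate(w: Array[Double], g: Array[Double], value: Double): Double = 1.0
  def doGradStep(w: Array[Double], g: Array[Double], rate: Double): Unit = {
    var i = 0
    while (i < w.length) { w(i) += rate * g(i); i += 1 }
  }
  final def step(w: Array[Double], g: Array[Double], value: Double): Unit = {
    t += 1
    processGradient(w, g)
    doGradStep(w, g, lRate(w, g, value))
  }
}

// Learning rate decays as 1/sqrt(t).
trait DecayingRate extends Step {
  override def lRate(w: Array[Double], g: Array[Double], value: Double): Double =
    1.0 / math.sqrt(t)
}

// Parameter averaging: accumulate the weights after every update so
// the average over all steps can be read off at the end.
trait Averaging extends Step {
  private var sum = Array.empty[Double]
  override def doGradStep(w: Array[Double], g: Array[Double], rate: Double): Unit = {
    super.doGradStep(w, g, rate)
    if (sum.isEmpty) sum = new Array[Double](w.length)
    var i = 0
    while (i < w.length) { sum(i) += w(i); i += 1 }
  }
  def averagedWeights: Array[Double] = sum.map(_ / t)
}

object Demo extends DecayingRate with Averaging {
  def main(args: Array[String]): Unit = {
    val w = Array(0.0)
    step(w, Array(1.0), value = 0.0)  // rate 1.0       -> w ~ 1.0
    step(w, Array(1.0), value = 0.0)  // rate 1/sqrt(2) -> w ~ 1.707
    println(averagedWeights.mkString(", "))  // ~ 1.354
  }
}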