See MIRA.
See AdaGrad, but again here it matters much less.
Actually adds the gradient to the weights. ParameterAveraging overrides this.
The weights
The gradient
The learning rate
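A minimal sketch of such a step, assuming plain Array[Double] in place of the library's weight and gradient tensors (the object name and the small main method are illustrative only):

    object GradStepSketch {
      // Adds the gradient to the weights, scaled by the learning rate:
      // weights(i) += rate * gradient(i) for every coordinate i.
      def doGradStep(weights: Array[Double], gradient: Array[Double], rate: Double): Unit = {
        var i = 0
        while (i < weights.length) {
          weights(i) += rate * gradient(i)
          i += 1
        }
      }

      def main(args: Array[String]): Unit = {
        val w = Array(0.0, 0.0)
        doGradStep(w, Array(1.0, -2.0), rate = 0.5)
        println(w.mkString(", "))  // prints 0.5, -1.0
      }
    }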
Once learning is done, the weights should be copied back into normal tensors.
The weights
Some optimizers swap out weights with special-purpose tensors for, e.g., efficient scoring while learning.
The weights
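As a rough sketch of the swap-in / copy-back pattern the two methods above describe, the class below keeps a running average of the weights during learning (as a parameter-averaging optimizer might) and writes it back into the normal weight arrays at the end. The names and the Array[Double] representation are assumptions for illustration, not the library's actual types:

    class AveragingSketch {
      private var sum: Array[Double] = _  // running sum of weight vectors
      private var steps = 0               // how many vectors were accumulated

      // Called before learning: set up the special-purpose storage.
      def initializeWeights(weights: Array[Double]): Unit = {
        sum = new Array[Double](weights.length)
        steps = 0
      }

      // Called after each update so the average tracks the trajectory.
      def accumulate(weights: Array[Double]): Unit = {
        var i = 0
        while (i < weights.length) { sum(i) += weights(i); i += 1 }
        steps += 1
      }

      // Once learning is done, copy the averaged values back into the
      // normal weight arrays.
      def finalizeWeights(weights: Array[Double]): Unit = {
        if (steps > 0) {
          var i = 0
          while (i < weights.length) { weights(i) = sum(i) / steps; i += 1 }
        }
      }
    }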
Online optimizers generally don't converge
Always false
Override this method to change the learning rate
The weights
The gradient
The value
The learning rate
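For instance, the default hook might return a constant rate, and a subclass could override it with a decaying schedule. A small sketch under those assumptions (the 1 / (1 + t) schedule is just one common choice, and the names are illustrative):

    trait RateSketch {
      protected var t = 0  // number of steps taken so far

      // Default: a constant learning rate.
      def lRate(weights: Array[Double], gradient: Array[Double], value: Double): Double = 1.0
    }

    class DecayingRateSketch extends RateSketch {
      // Override: decay the rate as 1 / (1 + t).
      override def lRate(weights: Array[Double], gradient: Array[Double], value: Double): Double = {
        t += 1
        1.0 / (1 + t)
      }
    }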
Override this method to do some transformation to the gradient before going on with optimization
The weights
The gradient
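One well-known transformation of this kind is AdaGrad's per-coordinate rescaling of the gradient by the history of squared gradients. A self-contained sketch of that idea (the rate and delta defaults, the class name, and the Array[Double] representation are assumptions):

    class AdaGradLikeSketch(rate: Double = 1.0, delta: Double = 0.1) {
      private var sumSq: Array[Double] = _  // per-coordinate sum of squared gradients

      // Scale each gradient coordinate, in place, by
      // rate / (delta + sqrt(accumulated squared gradient)).
      def processGradient(weights: Array[Double], gradient: Array[Double]): Unit = {
        if (sumSq == null) sumSq = new Array[Double](gradient.length)
        var i = 0
        while (i < gradient.length) {
          sumSq(i) += gradient(i) * gradient(i)
          gradient(i) *= rate / (delta + math.sqrt(sumSq(i)))
          i += 1
        }
      }
    }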
See AdaGrad, but here it matters much less.
To override if you want to reset internal state.
Should not be overridden. The main flow of a GradientStep optimizer.
The weights
The gradient
The value
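Putting the hooks together, the main flow described above is: transform the gradient, compute a learning rate, then apply the step. A simplified, self-contained sketch of that flow over Array[Double] (not the library's actual interface):

    trait GradientStepSketch {
      // Hooks with the default behaviours described above.
      def processGradient(weights: Array[Double], gradient: Array[Double]): Unit = {}
      def lRate(weights: Array[Double], gradient: Array[Double], value: Double): Double = 1.0
      def doGradStep(weights: Array[Double], gradient: Array[Double], rate: Double): Unit = {
        var i = 0
        while (i < weights.length) { weights(i) += rate * gradient(i); i += 1 }
      }

      // The main flow; not meant to be overridden.
      final def step(weights: Array[Double], gradient: Array[Double], value: Double): Unit = {
        processGradient(weights, gradient)          // optional gradient transformation
        val rate = lRate(weights, gradient, value)  // possibly adaptive learning rate
        doGradStep(weights, gradient, rate)         // actually update the weights
      }

      // Online optimizers generally don't converge, so this stays false.
      def isConverged: Boolean = false

      // Override if the optimizer keeps internal state (counters, sums) to clear.
      def reset(): Unit = {}
    }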
The combination of AdaGrad with MIRA
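In the framework sketched above, combining the two would mean pairing an AdaGrad-style processGradient with a MIRA-style lRate that steps only as far as needed to correct the current example, capped by a constant C. The following is a rough, assumed illustration of that combination (in particular, it assumes value is the non-negative loss on the current example); it is not the library's actual AdaMira code:

    class AdaMiraSketch(rate: Double = 1.0, delta: Double = 0.1, C: Double = 1.0) {
      private var sumSq: Array[Double] = _  // AdaGrad history of squared gradients

      // AdaGrad part: per-coordinate rescaling of the gradient, in place.
      def processGradient(weights: Array[Double], gradient: Array[Double]): Unit = {
        if (sumSq == null) sumSq = new Array[Double](gradient.length)
        var i = 0
        while (i < gradient.length) {
          sumSq(i) += gradient(i) * gradient(i)
          gradient(i) *= rate / (delta + math.sqrt(sumSq(i)))
          i += 1
        }
      }

      // MIRA part: step just far enough to fix the current loss, capped at C.
      def lRate(weights: Array[Double], gradient: Array[Double], value: Double): Double = {
        val normSq = gradient.map(g => g * g).sum
        if (normSq == 0.0) 0.0 else math.min(C, value / normSq)
      }
    }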