The base learning rate
l1 regularization constant. Should be set similarly to that in AdaGradRDA
l2 regularization constant. Should be set similarly to that in AdaGradRDA
The number of examples for online training, used to scale regularizers
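To make these parameters concrete, the following is a minimal sketch of how such an optimizer might be constructed. All names here are hypothetical, and the division by the number of examples reflects the stated purpose of that parameter (scaling the regularizers down to a per-example strength); it is an assumed convention, not necessarily this library's exact arithmetic.

```scala
// Hypothetical parameter bundle for an RDA-style optimizer (illustrative
// names, not the library's actual API).
class RdaOptimizerParams(
    val rate: Double = 0.1,   // the base learning rate
    val l1: Double = 0.0,     // l1 regularization constant
    val l2: Double = 0.0,     // l2 regularization constant
    val numExamples: Int = 1  // number of examples for online training
) {
  // Assumed convention: the constants are given for the whole dataset and
  // scaled by the number of examples to get a per-example strength.
  val l1PerExample: Double = l1 / numExamples
  val l2PerExample: Double = l2 / numExamples
}
```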
Once learning is done, the weights should be copied back into normal tensors.
The weights
Some optimizers swap out weights with special-purpose tensors, e.g. for efficient scoring while learning.
The weights
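The two methods above imply a calling discipline around training: swap the weights in first, and copy them back out when learning is done. A minimal runnable sketch, with a hypothetical trait and a no-op implementation standing in for a real optimizer:

```scala
object WeightSwapSketch {
  // Hypothetical interface: initializeWeights may replace the weights with
  // special-purpose tensors (e.g. for efficient scoring while learning);
  // finalizeWeights copies the learned values back into normal tensors.
  trait SwappingOptimizer {
    def initializeWeights(weights: Array[Double]): Unit
    def finalizeWeights(weights: Array[Double]): Unit
  }

  // No-op stand-in so the sketch runs as-is.
  class NoOpOptimizer extends SwappingOptimizer {
    def initializeWeights(weights: Array[Double]): Unit = ()
    def finalizeWeights(weights: Array[Double]): Unit = ()
  }

  def main(args: Array[String]): Unit = {
    val weights = Array.fill(5)(0.0)
    val opt = new NoOpOptimizer
    opt.initializeWeights(weights) // before any learning
    // ... training loop runs here, possibly on swapped-out tensors ...
    opt.finalizeWeights(weights)   // once learning is done
  }
}
```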
Whether the optimizer has converged yet.
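This flag is what typically terminates a batch training loop. A hedged sketch with a toy optimizer (hypothetical names; purely online optimizers may never report convergence, so a real loop should also cap the iteration count):

```scala
object ConvergenceLoopSketch {
  // Toy optimizer that declares convergence after a fixed number of steps.
  class ToyOptimizer {
    private var steps = 0
    def isConverged: Boolean = steps >= 3
    def step(weights: Array[Double], gradient: Array[Double], value: Double): Unit = {
      for (i <- weights.indices) weights(i) += 0.1 * gradient(i)
      steps += 1
    }
  }

  def main(args: Array[String]): Unit = {
    val opt = new ToyOptimizer
    val weights = Array.fill(3)(0.0)
    val maxIterations = 100 // guard against optimizers that never converge
    var it = 0
    while (!opt.isConverged && it < maxIterations) {
      // In real code, the value and gradient come from the objective.
      opt.step(weights, Array(1.0, 1.0, 1.0), value = 0.0)
      it += 1
    }
  }
}
```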
l1 regularization constant. Should be set similarly to that in AdaGradRDA
l2 regularization constant. Should be set similarly to that in AdaGradRDA
The base learning rate
Reset the optimizer's internal state (such as Hessian approximation, etc.).
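Resetting matters when one optimizer instance is reused for a second training run: stale state (an L-BFGS Hessian approximation, accumulated gradient averages, step counters) would otherwise leak into the new run. A hypothetical sketch:

```scala
object ResetSketch {
  // Hypothetical optimizer that accumulates internal state across steps.
  class StatefulOptimizer {
    private var hessianApprox: Option[Array[Double]] = None
    def reset(): Unit = { hessianApprox = None } // forget accumulated state
  }

  def main(args: Array[String]): Unit = {
    val opt = new StatefulOptimizer
    // ... first training run populates the internal state ...
    opt.reset() // clear it before the second run
    // ... second training run starts from a clean slate ...
  }
}
```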
Updates the weights according to the gradient.
The weights
The gradient
The value of the objective at the current weights
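A minimal sketch of a step implementation for a constant-step-size method (hypothetical class; the sign convention, ascent versus descent, and whether the objective value is consumed at all vary by optimizer: line-search methods use it, while fixed-rate methods usually ignore it):

```scala
// Hypothetical fixed-rate optimizer: moves the weights along the gradient.
// The objective value is accepted but unused here; a line search or a
// convergence test would consume it.
class ConstantStepOptimizer(rate: Double = 0.1) {
  def step(weights: Array[Double], gradient: Array[Double], value: Double): Unit = {
    require(weights.length == gradient.length, "weights/gradient size mismatch")
    for (i <- weights.indices) weights(i) += rate * gradient(i)
  }
}
```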
Implements the Regularized Dual Averaging algorithm of Xiao (2010), with support for l1 and l2 regularization.
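For reference, RDA's appeal is that the regularized minimization over the averaged gradients has a closed-form coordinate solution. The form below follows Xiao (2010) for an elastic-net penalty, written for minimization as in the paper; the gamma/sqrt(t) scaling of the auxiliary strongly convex term is one standard choice, assumed here rather than read off this implementation:

```latex
% \bar{g}_t: running average of the gradients; \lambda_1, \lambda_2: the l1/l2
% constants; \gamma: auxiliary term scale (an assumed, standard choice).
% Uses amsmath's cases environment.
\[
  \bar{g}_t = \frac{1}{t}\sum_{\tau=1}^{t} g_\tau,
  \qquad
  w_{t+1,i} =
  \begin{cases}
    0, & |\bar{g}_{t,i}| \le \lambda_1,\\
    -\frac{\bar{g}_{t,i} - \lambda_1\,\mathrm{sign}(\bar{g}_{t,i})}
          {\lambda_2 + \gamma/\sqrt{t}}, & \text{otherwise,}
  \end{cases}
\]
```

so coordinates whose average gradient stays below the l1 constant are exactly zero (this is the source of RDA's sparsity), and the l2 constant shrinks the remaining coordinates.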