## Adagrad and projections onto ellipsoids

(Caveat: I am not sure the manipulations in this post are correct, but the gist is certainly there.) One of my favorite optimization techniques is Adagrad, a first-order method that approximates the Hessian using all the gradients seen up to the current step. It calls for updates of the form: \[ x_{t+1} = \Pi_{\mathcal{X}}^{G_t^{1/2}} (x_t […]