Adagrad and projections onto ellipsoids

((Caveat! I am not sure the manipulations done in this post are correct, but the gist is certainly there.)) One of my favorite optimization techniques is Adagrad, a first-order technique that approximates the Hessian by using all the gradients up to that point. It calls for updates of the form: \[ x_{t+1} = \Pi_{\mathcal{X}}^{G_t^{1/2}} (x_t […]

Back of the envelope calculations of how fast your computer can do linear algebra operations

Let’s talk about CPU speed, practically. By practically, I mean, how fast can your CPU do linear algebra operations. And by linear algebra operations, I mean matrix-matrix multiplies. First, you need to calculate how many FLOPS your computer can do. The following formula comes in handy: \[ \text{nFLOPS} = \text{cores} \cdot \frac{\text{clock cycles}}{\text{second}} \cdot \frac{\text{FLOPS}}{\text{cycle}}. […]

A useful trick for computing gradients w.r.t. matrix arguments, with some examples

I’ve spent hours this week and last week computing, recomputing, and checking expressions for matrix gradients of functions. It turns out that except in the simplest of cases, the most painfree method for finding such gradients is to use the Frechet derivative (this is one of the few concrete benefits I derived from the differential […]

A workaround for installing SQBLib

I spent about an hour today getting Carlos Becker’s SQBLib package for gradient boosted tree regression in matlab working. The issue is that it depends on liblbfgs, and following the instruction Carlos gave for compiling liblbfgs resulted in errors when MATLAB tried to link liblbfgs into his code. Specifically, I got a mysterious error like […]

Quick note on the Chen, Chi, Goldsmith covariance sketching paper

NB: I will update this post as I read the paper, in case it turns out that the first issue I raised is not legitimately a concern. Covariance estimation (and the natural extension, precision estimation) has always been an interesting topic for me because it (can) represent a concise, concrete, and very broadly applicable instance […]

Sampling uniformly from the set of partitions into a fixed number of nonempty sets

It’s easy to sample uniformly from the set of partitions of a set: you pick a number of bins using an appropriate exponential distribution, then randomly i.i.d. toss each element of the set into one of those bins. At the end of this procedure, the nonempty bins will constitute your uniformly sampled random partition. [literature […]

I miss Mathematica

Why? Because Mathics is not up to helping me determine if indeed \[ f(\{A_1, \ldots, A_p\}) = \frac{(n-p)!^2}{n! p!} \left(\frac{1}{p} \right)^{n-p} \frac{|A_1|^2 \cdots |A_p|^2}{|A_1|!\cdots |A_p|!} \] is a pmf over the set of partitions of the set \(\{1, \ldots, n\}\) into \(p \leq n\) nonempty sets. In particular, the sets \(A_1, \ldots, A_p\) in the […]

The convergence rate of the OLS estimator for linear regression

A lot of machine learning and modern statistics concerns the rate at which certain statistical estimators converge to the quantities they are estimating. For instance, in classification, you use a collection of observations of covariates \(\mathbf{x}_i \in \mathbb{R}^p \) and classifications \(y_i \in \{\pm 1\}\) to fit an estimated classification function \(h : \mathbb{R}^p \rightarrow […]