Adagrad and projections onto ellipsoids

((Caveat! I am not sure the manipulations done in this post are correct, but the gist is certainly there.)) One of my favorite optimization techniques is Adagrad, a first-order technique that approximates the Hessian by using all the gradients up to that point. It calls for updates of the form: \[ x_{t+1} = \Pi_{\mathcal{X}}^{G_t^{1/2}} (x_t […]
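
For reference, the update that gets cut off above is, in the standard Duchi–Hazan–Singer form (so, presumably, what the post goes on to write), \[ x_{t+1} = \Pi_{\mathcal{X}}^{G_t^{1/2}}\left( x_t - \eta\, G_t^{-1/2} g_t \right), \qquad G_t = \sum_{s=1}^{t} g_s g_s^\top, \] where \(g_s\) is the (sub)gradient at step \(s\), \(\eta\) is a step size, and \(\Pi_{\mathcal{X}}^{A}(y) = \operatorname{argmin}_{x \in \mathcal{X}} (x - y)^\top A (x - y)\) is projection onto \(\mathcal{X}\) in the norm induced by \(A\), which is where the projections onto ellipsoids come in.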

Back of the envelope calculations of how fast your computer can do linear algebra operations

Let’s talk about CPU speed, practically. By practically, I mean: how fast can your CPU do linear algebra operations? And by linear algebra operations, I mean matrix-matrix multiplies. First, you need to calculate how many FLOPS your computer can do. The following formula comes in handy: \[ \text{nFLOPS} = \text{cores} \cdot \frac{\text{clock cycles}}{\text{second}} \cdot \frac{\text{FLOPs}}{\text{cycle}}. […]
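
A minimal sketch of both steps in Python (the core count, clock speed, and FLOPs-per-cycle figures are made-up placeholders; substitute your own machine's, and note the measured number depends on which BLAS NumPy is linked against):

```python
import time
import numpy as np

# Hypothetical machine parameters -- replace with your own CPU's specs.
cores = 8
cycles_per_second = 3.0e9       # 3 GHz clock
flops_per_cycle = 16            # e.g. one AVX2 FMA unit: 2 ops x 8 float32 lanes
peak = cores * cycles_per_second * flops_per_cycle
print(f"theoretical peak: {peak / 1e9:.0f} GFLOPS")

# Measure achieved throughput on a matrix-matrix multiply.
n = 2000
A = np.random.rand(n, n).astype(np.float32)
B = np.random.rand(n, n).astype(np.float32)
A @ B                           # warm-up so one-time BLAS setup isn't timed
start = time.perf_counter()
A @ B
elapsed = time.perf_counter() - start
achieved = 2 * n**3 / elapsed   # an n x n matmul costs about 2n^3 flops
print(f"achieved on GEMM: {achieved / 1e9:.0f} GFLOPS")
```

On a well-tuned BLAS the second number should come within a healthy fraction of the first; a large gap usually means the placeholder specs are wrong or the BLAS is slow.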

A useful trick for computing gradients w.r.t. matrix arguments, with some examples

I’ve spent hours this week and last week computing, recomputing, and checking expressions for matrix gradients of functions. It turns out that, except in the simplest of cases, the most pain-free method for finding such gradients is to use the Fréchet derivative (this is one of the few concrete benefits I derived from the differential […]
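
To give the flavor of the trick (my example, not one from the post): to compute the gradient of \(f(X) = \frac{1}{2}\|AX - B\|_F^2\), expand \[ f(X + H) - f(X) = \langle A^\top (AX - B), H \rangle + \frac{1}{2}\|AH\|_F^2, \] and since the second term is \(o(\|H\|)\), the Fréchet derivative is the linear map \(H \mapsto \langle A^\top(AX - B), H\rangle\); the gradient \(\nabla f(X) = A^\top(AX - B)\) can be read off with no entrywise index bookkeeping.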

A workaround for installing SQBLib

I spent about an hour today getting Carlos Becker’s SQBLib package for gradient-boosted tree regression in MATLAB working. The issue is that it depends on liblbfgs, and following the instructions Carlos gave for compiling liblbfgs resulted in errors when MATLAB tried to link liblbfgs into his code. Specifically, I got a mysterious error like […]

Sampling uniformly from the set of partitions into a fixed number of nonempty sets

It’s easy to sample uniformly from the set of partitions of a set: you pick a number of bins using an appropriate exponential distribution, then randomly i.i.d. toss each element of the set into one of those bins. At the end of this procedure, the nonempty bins will constitute your uniformly sampled random partition. [literature […]
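
A sketch of that procedure in Python. One caveat on my part: as far as I know, the distribution on the number of bins that makes the nonempty bins uniform is \(P(K = k) \propto k^n / k!\) (Stam's method, which falls out of Dobiński's formula) rather than a literal exponential; the cutoff `max_k` below is a truncation I chose for sampling convenience.

```python
import math
import random
from collections import defaultdict

def sample_uniform_partition(n, max_k=None):
    """Uniformly sample a partition of {0, ..., n-1} via Stam's method:
    draw the bin count K with P(K = k) proportional to k^n / k!, toss
    each element i.i.d. into one of K bins, and keep the nonempty bins."""
    if max_k is None:
        max_k = 4 * n + 10  # k^n / k! decays superexponentially past k ~ n
    # Work with log-weights for numerical stability.
    logw = [n * math.log(k) - math.lgamma(k + 1) for k in range(1, max_k + 1)]
    top = max(logw)
    weights = [math.exp(w - top) for w in logw]
    k = random.choices(range(1, max_k + 1), weights=weights)[0]
    bins = defaultdict(list)
    for x in range(n):
        bins[random.randrange(k)].append(x)
    return [sorted(b) for b in bins.values()]

print(sample_uniform_partition(10))
```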

I miss Mathematica

Why? Because Mathics is not up to helping me determine if indeed \[ f(\{A_1, \ldots, A_p\}) = \frac{(n-p)!^2}{n! p!} \left(\frac{1}{p} \right)^{n-p} \frac{|A_1|^2 \cdots |A_p|^2}{|A_1|!\cdots |A_p|!} \] is a pmf over the set of partitions of the set \(\{1, \ldots, n\}\) into \(p \leq n\) nonempty sets. In particular, the sets \(A_1, \ldots, A_p\) in the […]
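
Absent Mathematica, here is the brute-force check I would run in Python: enumerate every partition of \(\{1, \ldots, n\}\) into exactly \(p\) nonempty blocks for small \(n\) and \(p\), and test whether the values of \(f\) sum to one. (The enumeration code and names are mine, not from the post.)

```python
import math

def partitions_into_p(n, p):
    """Yield every partition of {1, ..., n} into exactly p nonempty blocks."""
    def rec(elems, blocks):
        if not elems:
            if len(blocks) == p:
                yield [set(b) for b in blocks]
            return
        x, rest = elems[0], elems[1:]
        # Put x into an existing block, or open a new one (at most p),
        # so each set partition is generated exactly once.
        for block in blocks:
            block.append(x)
            yield from rec(rest, blocks)
            block.pop()
        if len(blocks) < p:
            blocks.append([x])
            yield from rec(rest, blocks)
            blocks.pop()
    yield from rec(list(range(1, n + 1)), [])

def f(blocks, n, p):
    """The candidate pmf from the post, evaluated on one partition."""
    val = math.factorial(n - p) ** 2 / (math.factorial(n) * math.factorial(p))
    val *= (1 / p) ** (n - p)
    for A in blocks:
        val *= len(A) ** 2 / math.factorial(len(A))
    return val

n, p = 6, 3
total = sum(f(blocks, n, p) for blocks in partitions_into_p(n, p))
print(total)  # prints 1.0 (up to float error) iff f is a pmf for this (n, p)
```

A total that differs from 1 for any small \((n, p)\) settles the question in the negative immediately.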

The convergence rate of the OLS estimator for linear regression

A lot of machine learning and modern statistics concerns the rate at which certain statistical estimators converge to the quantities they are estimating. For instance, in classification, you use a collection of observations of covariates \(\mathbf{x}_i \in \mathbb{R}^p \) and classifications \(y_i \in \{\pm 1\}\) to fit an estimated classification function \(h : \mathbb{R}^p \rightarrow […]
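
As a quick empirical companion (my simulation, not the post's): under the standard assumptions the OLS parameter error shrinks like \(O(\sqrt{p/n})\), which a few lines of NumPy make visible:

```python
import numpy as np

rng = np.random.default_rng(0)
p, sigma = 5, 1.0
beta = rng.standard_normal(p)  # the "true" parameter being estimated

for n in [100, 1000, 10000, 100000]:
    X = rng.standard_normal((n, p))
    y = X @ beta + sigma * rng.standard_normal(n)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    err = np.linalg.norm(beta_hat - beta)
    print(f"n = {n:6d}   error = {err:.4f}   sqrt(p/n) = {np.sqrt(p / n):.4f}")
```

Each tenfold increase in \(n\) should cut the error by roughly a factor of \(\sqrt{10}\).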

We play the long game here

I’ve been off Being Human (the original UK version) for a while now. I stopped watching after the first episode of season 3, in part because the American version came out and I got into that before it started circling the drain, and in part because Mitchell gets on my damned last nerve. Now I’m […]

Roll On

I was reading, of all things, a romance novel when I came across this famous excerpt from Childe Harold’s Pilgrimage: Roll on, thou deep and dark blue Ocean – roll! Ten thousand fleets sweep over thee in vain; Man marks the earth with ruin – his control Stops with the shore; – upon the watery […]