We play the long game here

I’ve been off Being Human (the original UK version) for a while now. I stopped watching after the first episode of season 3, in part because the American version came out and I got into that before it started circling the drain, and in part because Mitchell gets on my damned last nerve.

Now I’m back to watching it after I discovered all the original cast members leave by the end of the show. I’m so looking forward to seeing emo-goth pretty boy Mitchell get dusted (or maybe they’ll dial up his already epic levels of self-pity, narcissism, and angst to the point that he combusts) that this was just what I needed to get back in the game.

Too bad that means Lenora Crichlow and Russell Tovey will be leaving eventually too. Their acting is the strongest reason I watch this show: when they cry, I cry. When Lenora gives her monologues, I cry a little. When Russell opens up about his feelings, I bawl. It’s pathetic, I know, but the point is that they’re very good at getting across the underlying message of the show: that even monsters can be human, because humanity is about our relationships.

Anyhooo, now I’m back to watching Robson Green play a homeless(?) drifter werewolf dad forced into underground supernatural cage fights.

Roll On

I was reading, of all things, a romance novel when I came across this famous excerpt from Childe Harold’s Pilgrimage:

Roll on, thou deep and dark blue Ocean – roll!
Ten thousand fleets sweep over thee in vain;
Man marks the earth with ruin – his control
Stops with the shore; – upon the watery plain
The wrecks are all thy deed, nor doth remain
A shadow of man’s ravage, save his own,
When for a moment, like a drop of rain,
He sinks into thy depths with bubbling groan,
Without a grave, unknelled, uncoffined, and unknown.

There are few poets that affect me like Lord Byron does… or, more likely, there are lots of poets whose work would affect me similarly if I knew of them.

Monument

Jeff Salyards is doing an Ask Me Anything (AMA) on Reddit tonight at 5 PCT. I started reading his debut novel, Scourge of the Betrayer, a couple of weeks ago, but couldn’t get into it for one reason or another. Maybe I was just coming off of an urban fantasy binge, and the relatively heavy setting of SOTB was just too much for me to get into at the time (the couple dozen pages I read evoked Glen Cook’s Black Company novels, especially since the main character is a chronicler). Or maybe I was put off by the heavy-handed way in which a SECRET MISSION with EPIC IMPLICATIONS was being hinted at in the little I read (I mean, almost everyone reading this genre knows what to expect, ok?). Either way, I plan on finishing it eventually, if only to see why Jeff chose a flail as the weapon of import. But I do expect that it gets better, eventually.

I mention Jeff’s AMA mostly because it got me thinking about his book, which in turn got me thinking about the Black Company novels (which I love love love), which in turn got me thinking about Ian Graham’s excellent debut novel, Monument. I swear this book flew under the radar, and Ian Graham seems not to have written anything since. But when I come across a fantasy novel revolving around an anti-hero, at some point or another I end up comparing it with Monument. So far none of the anti-heroes I’ve come across have been as despicable and yet somehow comfortably human as Ballas. I’m not even going to try to sell the book to you: just go and read it.

A list of \(\|\cdot\|_{\infty \rightarrow 1}^*\) problems

Recall our old friend the \(\|\cdot\|_{\infty \rightarrow 1}^*\) norm (from the first version of this website, which I still haven’t gotten around to merging into the current iteration). Given a matrix \(\mat{A},\) it is a measure of the energy in the smallest expansion of \(\mat{A}\) in terms of rank one sign matrices:
\[
\|\mat{A}\|_{\infty \rightarrow 1}^* = \inf\left\{\sum_i d_i \,:\, \mat{A} = \sum_i d_i \vec{u}_i \vec{v}_i^T,\; d_i \geq 0\right\}
\]
where \(\{\vec{u}_i, \vec{v}_i\}\) is a collection of sign vectors. Note that the number of sign vectors necessary to achieve this minimum is not known a priori.
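
For reference, since I lean on it below, here are the standard definitions (in this notation) of the primal norm and of its trace dual, for an \(m \times n\) matrix \(\mat{A}\):
\[
\|\mat{A}\|_{\infty \rightarrow 1} = \max\left\{ \vec{u}^T \mat{A} \vec{v} \,:\, \vec{u} \in \{\pm 1\}^m,\ \vec{v} \in \{\pm 1\}^n \right\},
\qquad
\|\mat{A}\|_{\infty \rightarrow 1}^* = \sup\left\{ \operatorname{tr}\left(\mat{B}^T \mat{A}\right) \,:\, \|\mat{B}\|_{\infty \rightarrow 1} \leq 1 \right\}.
\]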

This is an interesting norm both intrinsically (as a potential source of factor models in statistics and machine learning) and because of its relations to other norms. See these slides for details.

An estimate of \(\|\cdot\|_{\infty \rightarrow 1}^*\) to within a multiplicative factor of about \(1.783\) can be obtained using a semidefinite program, but I suspect that computing \(\|\cdot\|_{\infty \rightarrow 1}^*\) exactly is NP-hard: it is the trace dual of the \(\|\cdot\|_{\infty \rightarrow 1}\) norm, which is known to be NP-hard to compute, and if the latter is also APX-hard (which I’m not sure of either way), then you definitely couldn’t compute \(\|\cdot\|_{\infty \rightarrow 1}^*\) in polynomial time. However, more interesting is the question of approximately obtaining the factorization corresponding to \(\|\cdot\|_{\infty \rightarrow 1}^*\): there doesn’t seem to be a good way to extract such an approximate factorization from the SDP that approximates \(\|\cdot\|_{\infty \rightarrow 1}^*\), but maybe there’s a clever way to do so that I haven’t seen.
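
To make the SDP concrete, here is a minimal sketch (assuming numpy and cvxpy are available, and not claiming this is the exact program behind the \(1.783\) figure): the Grothendieck-type relaxation of the primal \(\|\cdot\|_{\infty \rightarrow 1}\) norm, which presumably is the kind of relaxation one dualizes to estimate \(\|\cdot\|_{\infty \rightarrow 1}^*\), alongside a brute-force evaluation over sign vectors that is exact for tiny matrices.

```python
# Sketch only: the Grothendieck-type SDP relaxation of the primal inf->1 norm,
#   SDP(A) = max { sum_ij A_ij X_{i, m+j} : X PSD, diag(X) = 1 },
# which overestimates ||A||_{inf->1} by at most a constant factor (~1.78).
# Assumes numpy and cvxpy (with its default SDP solver) are installed.
import numpy as np
import cvxpy as cp

def brute_force_inf_to_one(A):
    """Exact ||A||_{inf->1} = max_{u,v sign vectors} u^T A v; only for tiny matrices."""
    m, _ = A.shape
    best = -np.inf
    for i in range(2 ** m):
        u = 2 * np.array([(i >> k) & 1 for k in range(m)]) - 1
        # For a fixed sign vector u, the optimal v is sign(A^T u).
        best = max(best, np.abs(A.T @ u).sum())
    return best

def sdp_inf_to_one(A):
    """SDP relaxation of ||A||_{inf->1} (returns the value, not a factorization)."""
    m, n = A.shape
    X = cp.Variable((m + n, m + n), PSD=True)
    objective = cp.Maximize(cp.sum(cp.multiply(A, X[:m, m:])))
    prob = cp.Problem(objective, [cp.diag(X) == 1])
    prob.solve()
    return prob.value

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 5))
    print(brute_force_inf_to_one(A), sdp_inf_to_one(A))
```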

Other interesting questions abound: can we characterize \(\|\cdot\|_{\infty \rightarrow 1}^*\), or the number of rank one matrices needed in the corresponding decomposition, for a randomly chosen matrix (say, one with i.i.d. Gaussian entries)? What if we just want probability bounds on the number of rank one matrices in the decomposition of a randomly chosen matrix?

I think the latter question is probably the easiest to answer of all posed here (except maybe coming up with an SDP-based approximate factorization).

Drawing to a close at IBM

Well damn. It’s been a long while. Time for another post about my not having posted in a while.

I’m wrapping up my internship here at IBM, preparing the exit talk on my research. It’s entitled “New results on the accuracy of randomized spectral methods.” I’ll put up the talk after we’ve wrapped up the paper and submitted it to arXiv, in a month or so, but here’s a synopsis. Motivated by the fact that a lot of applications of randomized projection-based low-rank approximations use them to approximate subspaces rather than matrices (the canonical example being spectral clustering), we investigated the accuracy of the subspaces so obtained. It turns out that you get a pretty nice dependence on the number of power method steps you take and on the oversampling; the error bounds are almost usable. That’s a coup for machine learning applications! However, I think a sharper analysis (ours is as simple as possible) might be able to make the bounds tight enough for practical use. Along the way, we (unintentionally) revisited the analysis of the classical subspace iteration method, and found a geometric interpretation of that “random space interaction with singular spaces” term that shows up in the analysis of random projection-based low-rank approximation error bounds.
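
For context, this is the standard scheme the synopsis is about (not necessarily the exact variant we analyze); a minimal numpy sketch with made-up parameter choices, where k is the target rank, p the oversampling, and q the number of power-method steps:

```python
# Randomized subspace iteration, Halko-Martinsson-Tropp style (sketch only).
import numpy as np

def randomized_subspace(A, k, p=5, q=2, rng=None):
    """Orthonormal basis approximating the dominant (k+p)-dim left singular subspace of A."""
    rng = np.random.default_rng(rng)
    _, n = A.shape
    Omega = rng.standard_normal((n, k + p))   # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)
    for _ in range(q):                        # power-method steps, re-orthonormalizing each time
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    return Q

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # A synthetic matrix with decaying spectrum.
    A = rng.standard_normal((200, 50)) @ np.diag(np.logspace(0, -3, 50)) @ rng.standard_normal((50, 100))
    Q = randomized_subspace(A, k=10, p=5, q=2, rng=2)
    U = np.linalg.svd(A, full_matrices=False)[0][:, :10]
    # Small if the dominant 10-dimensional left singular subspace is captured by range(Q).
    print(np.linalg.norm(U - Q @ (Q.T @ U)))
```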

I think the results are nice enough for a conference paper, but I’m (as usual) a little leery of claiming a strong original contribution, since a lot of what we did turns out to be only a few steps removed from others’ work. But them’s the breaks when you work in numerical linear algebra!

When do \(A\) and \(AA^T\) have the same spectrum?

There are matrices for which the spectra of \(\mat{A}\) and \(\mat{A}\mat{A}^T\) are identical. As an example, take
\[
\mat{A} = \begin{pmatrix}
\frac{1}{2} & 0 & \frac{1}{2} & 0 \\
0 & \frac{1}{2} & 0 & \frac{1}{2} \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}.
\]
Freaky, right? I wouldn’t have believed it. I know one class of matrices for which this is true, but it’s a bit funky: if \(\mat{C}\) and \(\mat{B}\) are rectangular matrices related in a nice way (essentially, \(\mat{B}\) is \(\mat{C}\) plus a perturbation that is orthogonal to the range space of \(\mat{C}\)), then \(\mat{A} = \mat{C} \mat{B}^\dagger\) has this property. Here \(\dagger\) denotes pseudoinversion. The proof of this special case is complicated, and not particularly intuitive. So I wonder if there’s a simpler proof for the general case.
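
A quick numerical sanity check of the example above (just numpy, nothing clever):

```python
import numpy as np

A = np.array([[0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]])

# Both spectra come out as {1/2, 1/2, 0, 0}.
print(np.sort(np.linalg.eigvals(A).real))
print(np.sort(np.linalg.eigvals(A @ A.T).real))
```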

Summer of languages

This summer’s the summer of languages for me. I’m learning R piecemeal, because I’m working on a data analytics project that requires a lot of statistical analysis: learning a new language is a bother, but the unprecedented number of statistical packages available for R justifies the investment. I also decided to dive into Julia, despite its youth, mostly because of this:

We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.

(Did we mention it should be as fast as C?)

Well, that and the fact that Matlab isn’t readily available here at IBM. I could use my Caltech license, but the writing’s on the wall. I might as well train myself in a good alternative for when I no longer have that option. Octave isn’t an option I’m happy with: it looks kludgy, Matlab code doesn’t *actually* run without modification, and the plotting options are embarrassingly sad (Gnuplot in 2012? come on).

Third week of the IBM internship

I’m working on an internal project that motivated me to look into survival analysis. Cool stuff, that. Essentially, you have a bunch of data about the lifetimes of some objects and potentially related covariates, and from that data you’d like to estimate the functional relationship between the lifetimes and the covariates, in the form of the survival function \(P(T > x) = f(x, X_1, \ldots, X_n).\) It’s not clear that this is the best lens for the particular problem I’m looking at (churn)— for instance, you could think of it in machine learning or approximation theory terms rather than statistical terms— but it’s certainly a very sensible approach.
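
As a toy illustration of estimating \(P(T > x)\) (deliberately ignoring the covariates, and with made-up numbers; this is just the textbook Kaplan-Meier estimator, not whatever model the project ends up using):

```python
# Kaplan-Meier estimate of the survival function P(T > x) from right-censored
# lifetimes, ignoring covariates. Pure numpy; the data below are invented.
import numpy as np

def kaplan_meier(times, observed):
    """Return (x, S(x)): the Kaplan-Meier estimate of P(T > x).

    times    : observed time for each subject (event or censoring time)
    observed : 1 if the event was observed at that time, 0 if right-censored
    """
    times = np.asarray(times, dtype=float)
    observed = np.asarray(observed, dtype=int)
    order = np.argsort(times, kind="stable")  # stable sort keeps events before ties' censorings
    times, observed = times[order], observed[order]

    n = len(times)
    survival = 1.0
    xs, Ss = [0.0], [1.0]
    for i, (t, d) in enumerate(zip(times, observed)):
        at_risk = n - i          # subjects still at risk just before time t
        if d:                    # an observed event: the survival curve drops
            survival *= 1.0 - 1.0 / at_risk
            xs.append(float(t))
            Ss.append(survival)
    return np.array(xs), np.array(Ss)

# Made-up lifetimes, with a couple of right-censored observations (d = 0).
t = [2.0, 3.0, 3.0, 5.0, 8.0, 9.0, 12.0]
d = [1,   1,   0,   1,   0,   1,   1]
for x, s in zip(*kaplan_meier(t, d)):
    print(f"P(T > {x:g}) ~ {s:.3f}")
```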

I still have access to the online Springer books through Caltech, so I started to read Dynamic regression methods for survival analysis, which is more than a bit ambitious given my lack of statistical training. The first chapter’s quite inspiring though: I now want to learn about compensators and martingale central limit theorems. Apparently one can view a lot of statistical estimators as compensators (or something close), and the difference between the estimators and the actual quantities as a martingale (or, more generally, a zero-mean stochastic process). Then the asymptotic error of the estimator can be understood through the central limit theorem (*or*, in the more general case, through empirical process theory, which for various reasons, including the connection to sums of independent r.v.s in Banach spaces, I’m also very interested in learning).

Of course, my first question is: is there a nonasymptotic version of the martingale central limit theorem? And my second question is: do these concepts make sense in the matrix case (whatever that is), and what’s known there? Is this something worth working on? Unfortunately, I know nothing about statistics … I think maybe there’s a relation to the self-normalizing processes that Joel mentioned I should look into. So there’s one more thing to add to my list of too-little-time investigations.

Links to interesting literature: