## Roll On

### February 4, 2013

I was reading, of all things, a romance novel when I came across this famous excerpt from Childe Harold’s Pilgrimage:

Roll on, thou deep and dark blue Ocean – roll!

Ten thousand fleets sweep over thee in vain;

Man marks the earth with ruin – his control

Stops with the shore; – upon the watery plain

The wrecks are all thy deed, nor doth remain

A shadow of man’s ravage, save his own,

When for a moment, like a drop of rain,

He sinks into thy depths with bubbling groan,

Without a grave, unknelled, uncoffined, and unknown.

There are few poets that affect me like Lord Byron does… or, more likely, there are lots of poets whose work would affect me similarly if I knew of them.

## Monument

### January 24, 2013

Jeff Salyards is doing an Ask Me Anything (AMA) on Reddit tonight at 5 PT. I started reading his debut novel, Scourge of the Betrayer, a couple of weeks ago, but couldn’t get into it for one reason or another. Maybe I was just coming off of an urban fantasy binge, and the relatively heavier setting of SOTB was just too much for me to get into at the time (the couple dozen pages I read evoked Glen Cook’s Black Company novels, especially since the main character is a chronicler). Or maybe I was put off by the heavy-handed way in which a SECRET MISSION with EPIC IMPLICATIONS was being hinted at in the little I read (I mean, almost everyone reading this genre knows what to expect, ok?). Either way, I plan on finishing it eventually, if only to see why Jeff chose a flail as the weapon of import. But I do expect that it gets better, eventually.

I mention Jeff’s AMA mostly because it got me thinking about his book, which in turn got me thinking about the Black Company novels (which I love love love), which in turn got me thinking about Ian Graham’s excellent debut novel, Monument. I swear this book flew under the radar, and Ian Graham seems not to have written anything since. But when I come across a fantasy novel revolving around an anti-hero, at some point or another I end up comparing it with Monument. So far none of the anti-heroes I’ve come across have been as despicable and yet somehow comfortably human as Ballas. I’m not even going to try to sell the book to you: just go and read it.

## A list of \(\|\cdot\|_{\infty \rightarrow 1}^*\) problems

### November 27, 2012

Recall our old friend the \(\|\cdot\|_{\infty \rightarrow 1}^*\) norm (from the first version of this website, which I still haven’t gotten around to merging into the current iteration). Given a matrix \(\mat{A},\) it is a measure of the energy in the smallest expansion of \(\mat{A}\) in terms of rank one sign matrices:

\[

\|\mat{A}\|_{\infty \rightarrow 1}^* = \inf\left\{\sum_i d_i \,:\, \mat{A} = \sum_i d_i \vec{u}_i \vec{v}_i^T,\; d_i \geq 0 \right\}

\]

where \(\{\vec{u}_i, \vec{v}_i\}\) is a collection of sign vectors. Note that the number of sign vectors necessary to achieve this minimum is not known *a priori.*
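As a toy example (my own, purely for illustration): taking \(\mat{A} = 2\mat{I}_2,\) we can write

\[

2\mat{I}_2 = 1 \cdot \begin{pmatrix} 1 \\ 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \end{pmatrix} + 1 \cdot \begin{pmatrix} 1 \\ -1 \end{pmatrix} \begin{pmatrix} 1 & -1 \end{pmatrix},

\]

so \(\|2\mat{I}_2\|_{\infty \rightarrow 1}^* \leq 2.\) Pairing \(2\mat{I}_2\) with \(\tfrac{1}{2}\mat{I}_2\) (whose \(\infty \rightarrow 1\) norm is one) under the trace inner product gives a matching lower bound, so this particular decomposition is optimal.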

This is an interesting norm both intrinsically (as a potential source of factor models in statistics and machine learning) and because of its relations to other norms. See these slides for details.

An estimate of \(\|\cdot\|_{\infty \rightarrow 1}^*\) to within a multiplicative factor of about \(1.783\) can be obtained using a semidefinite program, but I suspect it’s NP-hard to compute \(\|\cdot\|_{\infty \rightarrow 1}^*\) exactly: it is the trace dual of the \(\|\cdot\|_{\infty \rightarrow 1}\) norm, which is known to be NP-hard to compute (and if the latter is also APX-hard, which I’m not sure of either way, then even approximating \(\|\cdot\|_{\infty \rightarrow 1}^*\) to arbitrary accuracy in polynomial time would be out of reach). More interesting, though, is the question of approximately obtaining the factorization corresponding to \(\|\cdot\|_{\infty \rightarrow 1}^*\): there doesn’t seem to be a good way to extract an approximate factorization from the SDP that approximates the value of \(\|\cdot\|_{\infty \rightarrow 1}^*\), but maybe there’s a clever way to do so that I haven’t seen.
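For concreteness, the primal norm \(\|\mat{A}\|_{\infty \rightarrow 1} = \max \vec{x}^T \mat{A} \vec{y}\) over sign vectors \(\vec{x}, \vec{y}\) can be computed by brute force on tiny instances; the exhaustive enumeration below is also a reminder of why its exact computation is believed hard:

```python
import itertools
import numpy as np

def inf_to_one_norm(A):
    """Brute-force ||A||_{inf->1} = max over sign vectors x, y of x^T A y.

    Exponential in the dimensions of A -- for illustration only.
    """
    m, n = A.shape
    best = -np.inf
    for x in itertools.product([-1.0, 1.0], repeat=m):
        for y in itertools.product([-1.0, 1.0], repeat=n):
            best = max(best, np.array(x) @ A @ np.array(y))
    return best

A = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(inf_to_one_norm(A))  # 4.0, attained at x = y = (1, -1)
```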

Other interesting questions abound: can we characterize \(\|\cdot\|_{\infty \rightarrow 1}^*\) or the number of rank 1 matrices needed to obtain the corresponding decomposition from a randomly chosen matrix (say i.i.d. Gaussian entries)? What if we just want probability bounds on the number of rank 1 matrices in the decomposition of a randomly chosen matrix?

I think the latter question is probably the easiest to answer of all posed here (except maybe coming up with an SDP-based approximate factorization).

## Drawing to a close at IBM

### August 10, 2012

Well damn. It’s been a long while. Time for another post about my not having posted in a while.

I’m wrapping up my internship here at IBM, preparing the exit talk on my research. It’s entitled “New results on the accuracy of randomized spectral methods.” I’ll put up the talk after we’ve wrapped up the paper and submitted it to ArXiv, in a month or so, but here’s a synopsis. Motivated by the fact that a lot of applications of randomized projection-based low-rank approximations use them to approximate subspaces rather than matrices (the canonical example being spectral clustering), we investigated the accuracy of the subspaces so obtained. It turns out that you get a pretty nice dependence on the number of power-method steps you take and on the oversampling; the error bounds are almost usable. That’s a coup for machine learning applications! However, I think a sharper analysis (ours is as simple as possible) might be able to make the bounds tight enough for practical use. Along the way, we (unintentionally) revisited the analysis of the classical subspace iteration method, and found a geometric interpretation of that “random space interaction with singular spaces” term that shows up in the analysis of random projection-based low-rank approximation error bounds.
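For concreteness, here’s a minimal numpy sketch of the flavor of algorithm in question (a randomized range finder with power iterations, à la Halko–Martinsson–Tropp; the function and its parameters are my illustration, not the code or analysis from the paper):

```python
import numpy as np

def randomized_svd(A, k, p=5, q=2, rng=None):
    """Rank-k truncated SVD from a randomized range sketch.

    k: target rank, p: oversampling, q: power-iteration steps.
    """
    rng = np.random.default_rng(rng)
    m, n = A.shape
    Y = A @ rng.standard_normal((n, k + p))   # sketch the range of A
    Q, _ = np.linalg.qr(Y)
    for _ in range(q):                        # power iterations sharpen the captured subspace
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    B = Q.T @ A                               # small (k+p) x n problem
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

# Spectrum with a clear gap after the second singular value.
A = np.diag([10.0, 5.0] + [0.1] * 8)
U, s, Vt = randomized_svd(A, k=2, rng=0)
print(np.allclose(s, [10.0, 5.0]))  # True
```

The `q` parameter here is exactly the power-method step count that the bounds depend on.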

I think the results are nice enough for a conference paper, but I’m (as usual) a little leery of claiming a strong original contribution, since a lot of what we did turns out to be only a few steps removed from others’ work. But them’s the breaks when you work in numerical linear algebra!

## Problem: another inequality

### July 3, 2012

Let \(\mat{F},\mat{G}\) be positive definite matrices (do they have to be definite?) and \(0 \leq p \leq 2.\) Show that

\[

\tr\left(\mat{F}^{p/2} - \mat{G}^{p/2}\right) \leq \frac{p}{2} \tr\left((\mat{F} - \mat{G})\mat{G}^{p/2 - 1}\right).

\]

I’m working on it. See if you can get a proof before I do 🙂
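In the meantime, a numeric sanity check (no substitute for a proof, of course); for what it’s worth, the statement has the shape of Klein’s inequality applied to the concave function \(t \mapsto t^{p/2}\):

```python
import numpy as np

def mat_pow(M, a):
    """Real power of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * w**a) @ V.T

def slack(F, G, p):
    """(p/2) tr((F - G) G^{p/2-1}) - tr(F^{p/2} - G^{p/2}); nonnegative iff the inequality holds."""
    lhs = np.trace(mat_pow(F, p / 2) - mat_pow(G, p / 2))
    rhs = (p / 2) * np.trace((F - G) @ mat_pow(G, p / 2 - 1))
    return rhs - lhs

rng = np.random.default_rng(0)
X, Y = rng.standard_normal((2, 4, 4))
F, G = X @ X.T + np.eye(4), Y @ Y.T + np.eye(4)  # random SPD test matrices
print(all(slack(F, G, p) >= -1e-10 for p in (0.25, 1.0, 1.5, 2.0)))  # True
```

Note that \(p = 2\) gives equality, which is a useful check on the signs.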

## when do \(A\) and \(AA^T\) have the same spectrum?

### June 22, 2012

There are matrices for which the spectra of \(\mat{A}\) and \(\mat{A}\mat{A}^T\) are identical. As an example, take

\[

\mat{A} = \begin{pmatrix}

\frac{1}{2} & 0 & \frac{1}{2} & 0 \\

0 & \frac{1}{2} & 0 & \frac{1}{2} \\

0 & 0 & 0 & 0 \\

0 & 0 & 0 & 0

\end{pmatrix}.

\]

Freaky, right? I wouldn’t have believed it. I know one class of matrices for which this is true, but it’s a bit funky: if \(\mat{C}\) and \(\mat{B}\) are rectangular matrices related in a nice way (essentially, \(\mat{B}\) is \(\mat{C}\) plus a perturbation that is orthogonal to the range space of \(\mat{C}\)), then \(\mat{A} = \mat{C} \mat{B}^\dagger\) has this property. Here \(\dagger\) denotes pseudoinversion. The proof of this special case is complicated, and not particularly intuitive. So I wonder if there’s a simpler proof for the general case.
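A quick numpy check of the example above:

```python
import numpy as np

# The example matrix from the post.
A = np.array([[0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]])

eig_A   = np.sort(np.linalg.eigvals(A).real)
eig_AAT = np.sort(np.linalg.eigvals(A @ A.T).real)
print(eig_A, eig_AAT)  # both are [0, 0, 0.5, 0.5]
```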

## Summer of languages

### June 22, 2012

This summer’s the summer of languages for me. I’m learning R piecemeal, because I’m working on a data analytics project that requires a lot of statistical analysis: learning a new language is a bother, but the unprecedented number of statistical packages available for R justifies the investment. I also decided to dive into Julia, despite its youth, mostly because of this:

We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.

(Did we mention it should be as fast as C?)

Well, that and the fact that Matlab isn’t readily available here at IBM. I could use my Caltech license, but the writing’s on the wall. I might as well train myself in a good alternative for when I no longer have that option. Octave isn’t an option I’m happy with: it looks kludgy, Matlab code doesn’t *actually* run without modification, and the plotting options are embarrassingly sad (Gnuplot in 2012? come on).

## new reading blog

### June 21, 2012

I’ve started a separate blog to track my mathematical reading. Check it out, comment, and suggest material I might find interesting.

## Third week of the IBM internship

### June 13, 2012

I’m working on an internal project that motivated me to look into survival analysis. Cool stuff, that. Essentially, you have a bunch of data about the lifetimes of some objects, along with potentially related covariates, and from that data you’d like to estimate the functional relationship between the lifetimes and the covariates, in the form of the survival function \(P(T > x) = f(x, X_1, \ldots, X_n).\) It’s not clear that this is the best lens for the particular problem I’m looking at (churn); for instance, you could think of it in machine learning or approximation theory terms rather than statistical terms. But it’s certainly a very sensible approach.
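To make the survival-function idea concrete, here’s a minimal sketch of the classical Kaplan–Meier estimator (my illustration; the project itself may well call for something else entirely):

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function S(t) = P(T > t).

    times:  observed times for each subject
    events: 1 if the event was observed at that time, 0 if censored
    Returns the distinct event times and S(t) just after each of them.
    """
    times, events = np.asarray(times), np.asarray(events)
    s, out_t, out_s = 1.0, [], []
    for t in np.unique(times[events == 1]):
        at_risk = np.sum(times >= t)              # subjects still under observation at t
        d = np.sum((times == t) & (events == 1))  # events occurring at t
        s *= 1.0 - d / at_risk
        out_t.append(t)
        out_s.append(s)
    return out_t, out_s

# Five subjects; the ones marked 0 are censored (lost to follow-up).
t, s = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print(s)  # [0.8, 0.6, 0.3]
```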

I still have access to the online Springer books through Caltech, so I started to read *Dynamic regression methods for survival analysis*, which is more than a bit ambitious given my lack of statistical training. The first chapter’s quite inspiring though: I now want to learn about compensators and martingale central limit theorems. Apparently one can view a lot of statistical estimators as compensators (or something close), and the difference between the estimators and the actual quantities as a martingale (or, more generally, a zero-mean stochastic process). Then the asymptotic error of the estimator can be understood through the central limit theorem (*or*, in the more general case, through empirical process theory, which, for various reasons including the connection to sums of independent r.v.s in Banach spaces, I’m also very interested in learning).

Of course, my first question is: is there a nonasymptotic version of the martingale central limit theorem? And my second question is: do these concepts make sense in the matrix case (whatever that is), and what’s known there? Is this something worth working on? Unfortunately, I know nothing about statistics … I think maybe there’s a relation to the self-normalizing processes that Joel mentioned I should look into. So there’s one more thing to add to my list of too-little-time investigations.

Links to interesting literature:

- Pollard’s notes on empirical process theory (the most concrete/approachable I can find)
- Sara van de Geer’s notes on empirical process theory (concise and simple)
- An introduction to survival analysis

## Truncated SVD … how?

### June 7, 2012

This question has been bothering me off and on for several months now: how *exactly* is the truncated SVD computed, and what is the cost? I’ve gathered that most methods are based on Krylov subspaces, and that implicitly restarted Arnoldi processes seem to be the most popular proposal … in what minuscule amount of literature I’ve been able to find on the subject (like only two or three papers, tops, that actually focus on computing truncated SVDs, as opposed to the *multitudes* that propose using them). However, I’ve no idea if that’s the case, or just the impression I have.

But mostly, I want to know the compute time of the most efficient truncated SVD algorithm. Why? A naive estimate of getting the rank-\(k\) truncated SVD of an \(m\times n\) matrix using Krylov subspace techniques is \(\mathrm{O}(mnk)\), but in practice the compute time actually depends on properties of the matrix. On the other hand, basically all randomized low-rank techniques take around that many operations but without depending on the properties of the matrix. Soooo, it’d be nice to know when randomized techniques are competitive. Since there’s so much literature on them, I assume they are competitive (and the Halko–Martinsson–Tropp survey paper has some plots that show speedups).
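For reference, scipy ships a Krylov-based truncated SVD (`scipy.sparse.linalg.svds`, backed by ARPACK’s implicitly restarted Lanczos-type iteration by default), and it’s easy to sanity-check what it computes against a full SVD on a small dense matrix:

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 100))

# Rank-5 truncated SVD via ARPACK (scipy's default svds solver).
U, s, Vt = svds(A, k=5)
s = np.sort(s)[::-1]  # the ordering of the returned singular values varies; sort descending

# Compare against the top singular values from a full SVD.
s_full = np.linalg.svd(A, compute_uv=False)[:5]
print(np.allclose(s, s_full))  # True
```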