Truncated SVD … how?

This question has been bothering me off and on for several months now: how *exactly* is the truncated SVD computed, and what is the cost? I’ve gathered that most methods are based on Krylov subspaces, and that implicitly restarted Arnoldi processes seem to be the most popular proposal … in what miniscule amount of literature I’ve been able to find on the subject (like only two or three papers, tops, that actually focus on computing truncated SVDs, as opposed to the multitudes that propose using them). However, I’ve no idea if that’s the case, or just the impression I have.

But mostly, I want to know the compute time of the most efficient truncated SVD algorithm. Why? A naive estimate of getting the rank-$$k$$ truncated SVD of an $$m\times n$$ matrix using Krylov subspace techniques is $$\mathrm{O}(mnk)$$, but in practice the compute time actually depends on properties of the matrix. On the other hand, basically all randomized low-rank techniques take around that many operations but without depending on the properties of the matrix. Soooo, it’d be nice to know when randomized techniques are competitive. Since there’s so much literature on them, I assume they are competitive (and the Halko–Martinsson–Tropp survey paper has some plots that show speedups).

QOTD: total unimodularity

I started reading Matousek’s discrete geometry book. Specifically, chapter 12 on applications of high-dimensional polytopes. Mostly because he apparently draws a connection between graphs and the Brunn-Minkowski theory, and I’m interested to see exactly what that connection is and if it has any interesting implications.

I just finished the proof of the weak perfect graph conjecture (it’s been a while since I read a nontrivial pure math proof, so I haven’t actually *digested* it entirely). Now I’m on the exercises for that portion. So, here’s an interesting question involving the concept of total unimodularity, which is apparently one of those foundational concepts in an area called polyhedral combinatorics.

A matrix $$\mat{A}$$ is called totally unimodular if every square submatrix of $$\mat{A}$$ has determinant 0 or $$\pm 1.$$ The question is to show that every nonsingular totally unimodular $$n \times n$$ matrix maps the lattice $$\mathbb{Z}^n$$ bijectively onto itself.

Update (solution)
It’s easy peasy. Clearly each entry of $$\mat{A}$$ is in $$\{-1,0,1\}$$, so $$\mat{A}$$ maps $$\mathbb{Z}^n$$ into itself. The fact that the mapping is bijective follows easily from Cramer’s rule, the fact that $$|\mathrm{det}(\mat{A})| = 1,$$ and the fact that all the minors of $$\mat{A}$$ are integral.

Are there perturbations that preserve incoherence and give nicely conditioned “submatrices”?

Let $$\mat{P}_{\mathcal{U}}$$ denote the projection unto a $$k$$-dimensional subspace of $$\C^{n}.$$ We say $$\mathcal{U}$$ is $$\mu$$-coherent if $$(\mat{P}_{\mathcal{U}})_{ii} \leq \mu \frac{k}{n}$$ for all $$i = 1, \ldots, n.$$

Let $$\mat{A} \in \C^{n \times n}$$ be a SPSD matrix whose top $$k$$-dimensional invariant subspace is $$\mu$$-coherent. Given $$\delta > 0$$ and $$\mat{S} \in \C^{n \times \ell}$$, where $$k < \ell \ll n,$$  is there a matrix $$\mat{E}_{\delta,\mat{S}} \in \C^{n \times n}$$ such that

1. $$\hat{\mat{A}} := \mat{A} + \mat{E}_{\delta,\mat{S}}$$ is SPSD,
2. The top $$k$$-dimensional invariant subspace of $$\hat{\mat{A}}$$ is also $$\mu$$-coherent,
3. $$\|\hat{\mat{A}} – \mat{A}\|_2 < \delta$$, and
4. $$\mat{S}^t \hat{\mat{A}} \mat{S} \succeq \delta \cdot \mat{I}$$ ?

It’d be fine if instead of (2) holding, the coherence of the top $$k$$-dimensional invariant subspace of $$\hat{\mat{A}}$$ is slightly larger than that of $$\mat{A}$$.

Sampling models and vector sums

What happens when you sample columns from a matrix whose columns all have roughly the same norm? It’s clear that you should have some kind of concentration: say if you sample $$\ell$$ of $$n$$ columns, then the Frobenius norm of the resulting matrix, appropriately scaled, should be a very good approximation of the Frobenius norm of the original matrix. This is easy to show (well, it’s easy to establish a rather weak probability tail bound on the norm) using a vector Bernstein inequality, where the `vectors’ are simply the original matrix with all but one column set to zero. The key step is to change the sampling model from one where you randomly select a subset of cardinality $$\ell$$ to a simpler one based on i.i.d. Bernoulli r.v.s where you randomly select a subset with expected cardinality $$\ell$$. The change is straightforward because in this case, the support of the vectors don’t overlap, so adding more vectors montonically increases the Frobenius norm.

Harder question: what happens when you sample columns from $$\mat{X}$$ and rows from $$\mat{Y}$$, once again with appropriate rescaling, to form $$\hat{\mat{X}}$$ and $$\hat{\mat{Y}}$$? In particular, what can you say about $$\|\hat{\mat{X}}\hat{\mat{Y}} – \mat{X}\mat{Y}\|_F$$? Again, assuming the Frobenius norms of $$\mat{X}$$ and $$\mat{Y}$$ are evenly distributed across their columns and rows, respectively, this difference should be small. It’s easy to get a bound on the expectation, but what happens in probability? Here you could again use a vector Bernstein inequality to get a reasonable result (I think), with vectors $$\text{vec}(\mat{X}_i \mat{Y}^i)$$, but now the support of the vectors overlap, so it’s not clear if the change of sampling models can be done.

So, the question is: let $$\mathbb{P}_{U(\ell)}$$ denote the probability under the model of randomly selecting subsets of cardinality exactly $$\ell$$ and $$\mathbb{P}_{B(\ell)}$$ denote the probability under the model of randomly selecting subsets of cardinality on average $$\ell$$. Is it the case that, for vectors in some vector space,
$\mathbb{P}_{U(\ell)}\left( \left\|\sum_{i \in I} v_i \right\| \geq t \right) \leq 2 \mathbb{P}_{B(\ell)}\left( \left\|\sum_{i \in I} v_i \right\| \geq t \right)?$
The constant may vary, or there may even be a small additive term on the rhs, and I really only need an answer in the case that the norm is the standard Euclidean norm on $$\R^n$$.

Verify these matrix properties (easy and fun)

I’m looking at products like $$\mat{G}^t \mat{A} \mat{G}$$ where the columns of $$\mat{G}$$ are (nonisotropic) normal vectors. Specifically, I’d like to know the distribution of the eigen/singular-values of this product. Surprisingly, I was unable to find any results in the literature on this, so I started reading Gupta and Nagar to see if I could work it out using Jacobian magic.

In the preliminary material, they list some basic matrix algebra facts. Your mission, should you choose to accept it, is to prove the following:

• If $$\mat{A} > \mat{0},$$ $$\mat{B} > \mat{0},$$ and $$\mat{A} – \mat{B} > \mat{0},$$ then $$\mat{B}^{-1} – \mat{A}^{-1} > \mat{0}$$
• If $$\mat{A}>\mat{0}$$ and $$\mat{B} > \mat{0},$$ then $$\det(\mat{A}+\mat{B}) > \det(\mat{A}) + \det(\mat{B})$$

The second’s easy, but for the first I found it necessary to use the fact that every positive matrix has a unique positive square root.

Some more, this time on Kronecker products:

• If $$\mat{A}$$ has eigenvalues $$\{\alpha_i\}$$ and $$\mat{B}$$ has eigenvalues $$\{\beta_j\},$$ then $$\mat{A} \otimes \mat{B}$$ has eigenvalues $$\{\alpha_i \beta_j\}.$$
• If $$\mat{A}$$ is $$m \times m$$ and $$\mat{B}$$ is $$n \times n$$, then $$\det(\mat{A} \otimes \mat{B}) = \det(\mat{A})^n \det(\mat{B})^m$$

It’s useful here to note that
$(\mat{A} \otimes \mat{B}) (\mat{C} \otimes \mat{D}) = \mat{AC} \otimes \mat{BD},$
which has some obvious consequences, like that the Kronecker product of two orthogonal matrices is orthogonal. To be clear, the first Kronecker question can be addressed without any tedious calculations.