NB: I will update this post as I read the paper, in case it turns out that the first issue I raised is not legitimately a concern.
Covariance estimation (and the natural extension, precision estimation) has always been an interesting topic for me because it (can) represent a concise, concrete, and very broadly applicable instance of applied nonasymptotic random matrix theory. Likewise, I’m also quite interested in matrix sketching algorithms. Thus I was very excited to see the latest preprint by Chen, Chi, and Goldsmith on arxiv which presents a convex optimization algorithm for recovering covariance matrices from randomized sketches obtained in a streaming manner. Their convex optimization problem is essentially the PhaseLift formulation used for recovering phase from magnitude, but their proof shows that it works for covariance matrix recovery.
I have only just started reading this paper, and I’m still excited, but I have two concerns already: first, it is not clear to me that the algorithm they propose is actually the algorithm they provide a guarantee for! At the least, the text must be corrected to make it clear that this is indeed the case. To be more precise, their algorithm is to compute a few random sketching vectors ahead of time, then as the rows of the matrix come in, compute the magnitude of the projection of each row onto *one* randomly chosen sketching vector. The measurement model described mathematically seems to compute the magnitude of the projection of each row onto *each* of the sketching vectors. Big difference there.
Second, their algorithm provides a Frobenius norm guarantee, which is par for the course, but they make claims about things like subspace detection, for which afaik, Frobenius norm guarantees are too weak to really be of interest. But here, this may be a matter of preference, and practitioners probably don’t care about sharp guarantees as long as it works in practice and has at least a semblance of theoretical support.