
My Android ebook reading workflow (optimized for frequent turnover of ebooks)

Synopsis: Aldiko isn’t appropriate if you plan on constantly rotating the collection of ebooks on your mobile device. A combination of Calibre, Moon+Reader, and DropBox does the trick instead. Use Moon+Reader as your default reader unless you need Aldiko’s tools for managing a large ebook collection, since Moon+Reader offers a more polished reading experience.

A major contributor to my decision to get an Android tablet was my desire to read ebooks comfortably: with more portability and a form factor closer to the usual book experience. (The other major contributor was my desire to read research papers comfortably, but that’s a discussion for another time.)

Until recently I was using Aldiko as my primary ebook reader, because it seems to be the most popular reader, judging both from the Android Market and from Google searches for “Android ebook reader” and similar terms. Aldiko’s great if you have a large, static, well-tagged collection of EPUBs. The interface makes it a breeze to find books by tag or author, and it keeps track of your recent reads, but it has some really annoying drawbacks. First, it is awful at properly removing books: it leaves behind the tag information for files that have been deleted, and even reinstalling Aldiko doesn’t seem to fix this. Relatedly, Aldiko often keeps multiple index entries for one book. This may have been due to my having two formats of the same file, but one would expect any ereader to recognize this and adjust accordingly.

These are serious usability issues, but the main problem I have with Aldiko is that it’s just not aimed at a user like me, who constantly switches out ebooks. Aldiko is not designed for rapid turnover: it’s aimed at people who store all their ebooks on the device, add books occasionally, and delete nothing. That’s a dealbreaker for me, since I have lots of ebooks, don’t want to waste space on my tablet storing files I won’t look at for years, and want to regularly swap books I’ve finished for ones I plan to read soon.

Tonight I’ve been experimenting with a combination of Calibre, Moon+Reader, and DropBox that suits my style better. Moon+Reader has a book database, just as Aldiko does, but it’s also amenable to reading EPUBs on a file-by-file basis (I forgot to mention that Aldiko can’t just load an EPUB off the disk: you have to import it into Aldiko’s special-purpose database first). I’m quite impressed with the reading experience in Moon+Reader: it doesn’t have Aldiko’s collection organizational tools, but on a per-book basis, the navigation capabilities and formatting options, as well as the overall feel of the application, are more polished.

My setup is simple: in Calibre, I use the “connect to folder” option to load my DropBox folder as a device, then sync the books I’m currently reading to the device. On my tablet, I then open the files from DropBox using Moon+Reader. As long as I don’t reboot the tablet (or force kill DropBox), I’m able to interact with the file as though it were stored permanently on the tablet (e.g. Moon+Reader will store your location and return to it every time you visit the book). If you want, you can also favorite the ebook files in DropBox so that a local permanent copy of them is mirrored to your tablet, and then import them (from your DropBox scratch folder) into Moon+Reader. This method has the advantage that you don’t lose any metadata when you reboot the tablet, and if you delete the files from DropBox, they’re deleted locally and automagically from Moon+Reader. The only disadvantage to this technique is that the DropBox application doesn’t let you favorite folders for now, so if you have multiple ebooks in a folder, you have to favorite them all manually. This isn’t a big deal.

Symmetrized products of positive matrices are not positive

.. duh. This is in relation to the symmetrized matrix AM-GM inequality conjectured by Recht and Re. The most direct way one might attempt to prove this inequality is to show the semidefinite inequality
\[
\left(\frac{1}{n}\sum\nolimits_{i=1}^n\mat{A}_i\right)^n \succeq \frac{1}{n!} \sum_\sigma \mat{A}_{\sigma(1)}\mat{A}_{\sigma(2)}\times\cdots\times \mat{A}_{\sigma(n)}.
\]

Unfortunately, the right hand side isn’t even a positive matrix in general. The proof is pretty ridiculously simple, but I console myself with the fact that it wasn’t immediately obvious to some folks I’ve discussed this problem with that the symmetrized sum on the rhs could be nonpositive in general.

Consider the case of just two summands. Then \(\vec{x}^T (\mat{B}\mat{A} + \mat{A}\mat{B}) \vec{x} = 2\vec{x}^T \mat{B}\mat{A} \vec{x},\) so \(\mat{A}\mat{B} + \mat{B}\mat{A}\) is positive only if \(\mat{A}\mat{B}\) is positive. This in turn implies \(\mat{A}\mat{B} = \mat{B}\mat{A},\) so at least in this case, the matrices must commute for the symmetrized sum to be positive.
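This is easy to check numerically; here’s a minimal NumPy sketch (the particular matrices are my choice for illustration, not from any published counterexample):

```python
import numpy as np

# Two positive semidefinite matrices: a coordinate projection and a rank-one matrix
A = np.array([[1.0, 0.0],
              [0.0, 0.0]])
B = np.array([[1.0, 1.0],
              [1.0, 1.0]])

# Sanity check: both are PSD
assert np.all(np.linalg.eigvalsh(A) >= 0)
assert np.all(np.linalg.eigvalsh(B) >= 0)

# The symmetrized product is symmetric but indefinite
S = A @ B + B @ A          # equals [[2, 1], [1, 0]]
eigs = np.linalg.eigvalsh(S)
print(eigs)                # smallest eigenvalue is 1 - sqrt(2) < 0
```

Here \(\det(\mat{A}\mat{B} + \mat{B}\mat{A}) = -1,\) so the symmetrized sum has a negative eigenvalue, consistent with the commutativity argument above.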

Reading List for Feb and March 2012

My top picks, with links if you want to tag along:

Norms of sums of random vectors sampled without replacement

Let \(\mathbb{P}_{WOR(\ell)}\) denote a probability under the model where a set \(I\) of \(\ell\) indices is sampled without replacement from \(\{1,2,\ldots,n\}\), and \(\mathbb{P}_{WR(\ell)}\) denote a probability under the model where the \(\ell\) indices are chosen with replacement. Then is it the case that, for any collection \(\vec{v}_i\) of \(n\) vectors in a vector space,
\[
\mathbb{P}_{WOR(\ell)}\left(\left\| \sum\nolimits_{i \in I} \vec{v}_i \right\| \geq t \right) \leq \mathbb{P}_{WR(\ell)}\left(\left\| \sum\nolimits_{i \in I} \vec{v}_i \right\| \geq t \right)
\]
up to constants?
In general, the \(WOR(\ell)\) model is harder to work with, because the terms in the resulting sum are dependent, whereas the terms in the \(WR(\ell)\) sum are independent. So if this inequality held, it would make life easier.

I don’t know if this inequality is true in general, but there are classical exponential tail bounds for the rhs, and I believe that those bounds hold up to constants for the lhs. Here’s an outline of the argument. The key fact is that
\[
\E_{WOR(\ell)}f\left(\sum\nolimits_{i \in I} \vec{v}_i\right) \leq \E_{WR(\ell)}f\left(\sum\nolimits_{i \in I} \vec{v}_i\right)
\]
for any convex function \(f.\) This was originally observed by Hoeffding, but Gross provides a more readable account. Take \(f(\vec{x}) = \exp(\lambda \|\vec{x}\|)\) for \(\lambda \geq 0\) (convex, being an increasing convex function of the norm) to see that the moment generating function under the \(WOR(\ell)\) model is dominated by the moment generating function under the \(WR(\ell)\) model.
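For small \(n\) and \(\ell\) the domination can be verified exhaustively. A NumPy sketch (the vectors are random choices of mine; the inequality itself is Hoeffding’s result):

```python
import itertools
import math

import numpy as np

rng = np.random.default_rng(0)
n, ell = 5, 3
vs = rng.standard_normal((n, 2))   # n vectors in R^2

def f(indices):
    # convex test function: exp of the norm of the selected sum
    return math.exp(np.linalg.norm(vs[list(indices)].sum(axis=0)))

# E under WOR(ell): uniform over all subsets of size ell
wor = np.mean([f(S) for S in itertools.combinations(range(n), ell)])

# E under WR(ell): uniform over all ell-tuples of (possibly repeated) indices
wr = np.mean([f(T) for T in itertools.product(range(n), repeat=ell)])

print(wor, wr)
assert wor <= wr   # Hoeffding's domination for convex f
```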

Next step. We know that the deviation of the norm of the sum from its expectation under the \(WR(\ell)\) model satisfies a Bennett inequality (see Ledoux and Talagrand, chapter 6):
\[
\mathbb{P}_{WR(\ell)}\left( \left|\left\|\sum\nolimits_{i \in I} \vec{v}_i\right\| - \E_{WR(\ell)} \left\|\sum\nolimits_{i \in I} \vec{v}_i \right\| \right| \geq t \right) \leq \exp\left[\frac{t}{2a} - \left(\frac{t}{2a} + \frac{b^2}{4a^2} \right) \log\left(1 + \frac{2 a t}{b^2} \right) \right].
\]
Here \(a\) is a bound on the norms of the individual summands and \(b^2 = \ell/n \cdot \sum_{i=1}^n \|\vec{v}_i\|^2.\)
From the standard connection between tail bounds and moment generating functions, this tail bound yields a bound on the mgf of the deviation under the \(WR(\ell)\) model, and that mgf bound is strong enough to recover the Bennett inequality through the usual Chernoff argument (up to changes in the constants).
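The tail-to-mgf step is just integration by parts: for a nonnegative random variable \(Z\) (here the absolute deviation under \(WR(\ell)\)),
\[
\E e^{\lambda Z} = 1 + \lambda \int_0^\infty e^{\lambda t}\, \mathbb{P}(Z \geq t)\, dt,
\]
so substituting the Bennett tail bound into the integral gives an explicit bound on the mgf of the deviation.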

(Alternatively, I realized later, you could just look for a proof of this or a similar inequality that proceeds by bounding the mgf, and use that bound directly, rather than reverse engineering a similar but looser bound. The catch is that you have to find a proof that actually uses the mgf. I was not able to, but I didn’t try too hard, and there are similar results in chapter 1 of Ledoux and Talagrand that are good enough for my purposes.)

Now we do some algebra to bound an mgf involving the norm of the sum formed under the \(WOR(\ell)\) model:
\[
\E_{WOR(\ell)} \exp\left(\lambda \left\|\sum\nolimits_{i \in I} \vec{v}_i \right\|\right) \leq \E_{WR(\ell)} \exp\left(\lambda \left\|\sum\nolimits_{i \in I} \vec{v}_i \right\|\right)
\]
from our first observation, and the right hand side is smaller than
\[
\E_{WR(\ell)} \exp\left(\lambda \bigg|\left\|\sum\nolimits_{i \in I} \vec{v}_i \right\| - \E_{WR(\ell)}\left\|\sum\nolimits_{i \in I} \vec{v}_i \right\|\bigg| \right) \exp\left(\lambda \E_{WR(\ell)}\left\|\sum\nolimits_{i \in I} \vec{v}_i \right\|\right),
\]
so rearranging gives
\[
\E_{WOR(\ell)} \exp\left(\lambda \left[\left\|\sum\nolimits_{i \in I} \vec{v}_i \right\| - \E_{WR(\ell)}\left\|\sum\nolimits_{i \in I} \vec{v}_i \right\|\right] \right) \leq \E_{WR(\ell)} \exp\left(\lambda \bigg|\left\|\sum\nolimits_{i \in I} \vec{v}_i \right\| - \E_{WR(\ell)}\left\|\sum\nolimits_{i \in I} \vec{v}_i \right\|\bigg| \right).
\]

Recall that the Bennett inequality bounds the rhs. Thus we have a bound on the mgf of the deviation of the norm of the sum under the \(WOR(\ell)\) model above the expectation of the norm of the sum under the \(WR(\ell)\) model, and this bound is strong enough to generate a Bennett inequality. So, with the caveat that the constants may change (I have not done the calculations), we have
\[
\mathbb{P}_{WOR(\ell)}\left(\left\| \sum\nolimits_{i \in I} \vec{v}_i \right\| \geq t + \E_{WR(\ell)}\left\|\sum\nolimits_{i \in I} \vec{v}_i \right\| \right) \leq \exp\left[\frac{t}{2a} - \left(\frac{t}{2a} + \frac{b^2}{4a^2} \right) \log\left(1 + \frac{2 a t}{b^2} \right) \right].
\]

Sampling models and vector sums

What happens when you sample columns from a matrix whose columns all have roughly the same norm? It’s clear that you should get some kind of concentration: say, if you sample \(\ell\) of \(n\) columns, then the Frobenius norm of the resulting matrix, appropriately scaled, should be a very good approximation of the Frobenius norm of the original matrix. This is easy to show (well, it’s easy to establish a rather weak probability tail bound on the norm) using a vector Bernstein inequality, where the ‘vectors’ are simply copies of the original matrix with all but one column set to zero. The key step is to change the sampling model from one where you randomly select a subset of cardinality \(\ell\) to a simpler one based on i.i.d. Bernoulli r.v.s, where you randomly select a subset with expected cardinality \(\ell\). The change is straightforward because in this case the supports of the vectors don’t overlap, so adding more vectors monotonically increases the Frobenius norm.
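To see the mechanism behind this concentration, here is a small NumPy check of the underlying unbiasedness (the dimensions and the \(n/\ell\) rescaling are my choices for illustration):

```python
import itertools

import numpy as np

rng = np.random.default_rng(1)
n, d, ell = 6, 4, 3
X = rng.standard_normal((d, n))   # columns have comparable norms on average

def est_sq_fro(I):
    # squared Frobenius norm of the sampled-and-rescaled column submatrix
    return (n / ell) * sum(np.linalg.norm(X[:, i]) ** 2 for i in I)

# Averaging over all size-ell subsets recovers ||X||_F^2 exactly,
# since each column lands in an ell/n fraction of the subsets.
subsets = list(itertools.combinations(range(n), ell))
avg = np.mean([est_sq_fro(I) for I in subsets])
print(avg, np.linalg.norm(X, 'fro') ** 2)
assert np.isclose(avg, np.linalg.norm(X, 'fro') ** 2)
```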

Harder question: what happens when you sample columns from \(\mat{X}\) and rows from \(\mat{Y}\), once again with appropriate rescaling, to form \(\hat{\mat{X}}\) and \(\hat{\mat{Y}}\)? In particular, what can you say about \(\|\hat{\mat{X}}\hat{\mat{Y}} - \mat{X}\mat{Y}\|_F\)? Again, assuming the Frobenius norms of \(\mat{X}\) and \(\mat{Y}\) are evenly distributed across their columns and rows, respectively, this difference should be small. It’s easy to get a bound on the expectation, but what happens in probability? Here you could again use a vector Bernstein inequality to get a reasonable result (I think), with vectors \(\text{vec}(\mat{X}_i \mat{Y}^i)\), but now the supports of the vectors overlap, so it’s not clear whether the change of sampling models can be done.
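The same unbiasedness holds for the product estimator; a sketch (again, the dimensions and the \(n/\ell\) rescaling are my illustrative choices):

```python
import itertools

import numpy as np

rng = np.random.default_rng(2)
d1, n, d2, ell = 3, 5, 4, 2
X = rng.standard_normal((d1, n))
Y = rng.standard_normal((n, d2))

def est_product(I):
    # sum of rescaled outer products (column of X times row of Y)
    return (n / ell) * sum(np.outer(X[:, i], Y[i, :]) for i in I)

# Averaging over all size-ell subsets recovers XY exactly
subsets = list(itertools.combinations(range(n), ell))
avg = sum(est_product(I) for I in subsets) / len(subsets)
assert np.allclose(avg, X @ Y)

# Typical single-subset error, the quantity a tail bound should control
errs = [np.linalg.norm(est_product(I) - X @ Y, 'fro') for I in subsets]
print(np.mean(errs))
```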

So, the question is: let \(\mathbb{P}_{U(\ell)}\) denote probability under the model of randomly selecting a subset of cardinality exactly \(\ell\), and \(\mathbb{P}_{B(\ell)}\) probability under the model of randomly selecting a subset of cardinality \(\ell\) on average. Is it the case that, for any collection of vectors in some vector space,
\[
\mathbb{P}_{U(\ell)}\left( \left\|\sum\nolimits_{i \in I} \vec{v}_i \right\| \geq t \right) \leq 2 \mathbb{P}_{B(\ell)}\left( \left\|\sum\nolimits_{i \in I} \vec{v}_i \right\| \geq t \right)?
\]
The constant may vary, or there may even be a small additive term on the rhs, and I really only need an answer in the case that the norm is the standard Euclidean norm on \(\R^n\).
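Small cases can at least be probed exhaustively. A sketch (this checks particular instances only and proves nothing; the vectors and threshold are arbitrary choices of mine):

```python
import itertools
import math

import numpy as np

rng = np.random.default_rng(3)
n, ell, t = 5, 2, 1.5
vs = rng.standard_normal((n, 2))
p = ell / n   # inclusion probability in the Bernoulli model

def exceeds(S):
    # indicator that the norm of the subset sum is at least t
    return float(np.linalg.norm(vs[list(S)].sum(axis=0)) >= t)

# U(ell): uniform over subsets of cardinality exactly ell
p_U = np.mean([exceeds(S) for S in itertools.combinations(range(n), ell)])

# B(ell): each index included independently with probability ell/n
p_B, total = 0.0, 0.0
for k in range(n + 1):
    for S in itertools.combinations(range(n), k):
        w = p ** k * (1 - p) ** (n - k)
        p_B += w * exceeds(S)
        total += w

assert math.isclose(total, 1.0)   # the subset weights form a distribution
print(p_U, p_B)
```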