Sampling models and vector sums

What happens when you sample columns from a matrix whose columns all have roughly the same norm? It’s clear that you should have some kind of concentration: say if you sample \(\ell\) of \(n\) columns, then the Frobenius norm of the resulting matrix, appropriately scaled, should be a very good approximation of the Frobenius norm of the original matrix. This is easy to show (well, it’s easy to establish a rather weak probability tail bound on the norm) using a vector Bernstein inequality, where the `vectors’ are simply the original matrix with all but one column set to zero. The key step is to change the sampling model from one where you randomly select a subset of cardinality \(\ell\) to a simpler one based on i.i.d. Bernoulli r.v.s where you randomly select a subset with expected cardinality \(\ell\). The change is straightforward because in this case, the support of the vectors don’t overlap, so adding more vectors montonically increases the Frobenius norm.

Harder question: what happens when you sample columns from \(\mat{X}\) and rows from \(\mat{Y}\), once again with appropriate rescaling, to form \(\hat{\mat{X}}\) and \(\hat{\mat{Y}}\)? In particular, what can you say about \(\|\hat{\mat{X}}\hat{\mat{Y}} – \mat{X}\mat{Y}\|_F\)? Again, assuming the Frobenius norms of \(\mat{X}\) and \(\mat{Y}\) are evenly distributed across their columns and rows, respectively, this difference should be small. It’s easy to get a bound on the expectation, but what happens in probability? Here you could again use a vector Bernstein inequality to get a reasonable result (I think), with vectors \(\text{vec}(\mat{X}_i \mat{Y}^i)\), but now the support of the vectors overlap, so it’s not clear if the change of sampling models can be done.

So, the question is: let \(\mathbb{P}_{U(\ell)}\) denote the probability under the model of randomly selecting subsets of cardinality exactly \(\ell\) and \(\mathbb{P}_{B(\ell)}\) denote the probability under the model of randomly selecting subsets of cardinality on average \(\ell\). Is it the case that, for vectors in some vector space,
\[
\mathbb{P}_{U(\ell)}\left( \left\|\sum_{i \in I} v_i \right\| \geq t \right) \leq 2 \mathbb{P}_{B(\ell)}\left( \left\|\sum_{i \in I} v_i \right\| \geq t \right)?
\]
The constant may vary, or there may even be a small additive term on the rhs, and I really only need an answer in the case that the norm is the standard Euclidean norm on \(\R^n\).