Let $\mathbb{P}_{\mathrm{WOR}(\ell)}$ denote probability under the model where a set $I$ of $\ell$ indices is sampled without replacement from $\{1, 2, \ldots, n\}$, and let $\mathbb{P}_{\mathrm{WR}(\ell)}$ denote probability under the model where the $\ell$ indices are chosen with replacement. Is it the case that, for any collection $\vec{v}_1, \ldots, \vec{v}_n$ of vectors in a normed vector space,
$$\mathbb{P}_{\mathrm{WOR}(\ell)}\Big(\Big\|\sum_{i \in I} \vec{v}_i\Big\| \geq t\Big) \leq \mathbb{P}_{\mathrm{WR}(\ell)}\Big(\Big\|\sum_{i \in I} \vec{v}_i\Big\| \geq t\Big)$$
up to constants?
In general, the $\mathrm{WOR}(\ell)$ model is harder to work with, because the terms in the resulting sum are dependent, whereas the terms in the $\mathrm{WR}(\ell)$ sum are independent. So if this inequality held, it would make life easier.
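Not part of the argument, but as a quick empirical sanity check, here is a minimal Monte Carlo sketch comparing the two tail probabilities for one fixed collection of vectors. The data, the sample size $\ell$, and the threshold $t$ are made-up placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# A fixed, arbitrary collection of n vectors in R^d (placeholder data).
n, d, ell = 50, 5, 10
V = rng.normal(size=(n, d))

def tail_prob(with_replacement, t, trials=20_000):
    """Estimate P(||sum_{i in I} v_i|| >= t) for I of size ell, drawn WR or WOR."""
    hits = 0
    for _ in range(trials):
        I = rng.choice(n, size=ell, replace=with_replacement)
        if np.linalg.norm(V[I].sum(axis=0)) >= t:
            hits += 1
    return hits / trials

t = 5.0  # arbitrary threshold
print("WOR tail:", tail_prob(False, t))
print("WR  tail:", tail_prob(True, t))
```

Of course this only probes one configuration; it says nothing about worst cases or constants.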
I don’t know if this inequality is true in general, but there are classical exponential tail bounds for the rhs, and I believe that those bounds hold up to constants for the lhs. Here’s an outline of the argument. The key fact is that
$$\mathbb{E}_{\mathrm{WOR}(\ell)} f\Big(\sum_{i \in I} \vec{v}_i\Big) \leq \mathbb{E}_{\mathrm{WR}(\ell)} f\Big(\sum_{i \in I} \vec{v}_i\Big)$$
for any convex function $f$. This was originally observed by Hoeffding, but Gross provides a more readable account. Take $f(x) = \exp(\lambda \|x\|)$ for $\lambda \geq 0$ (convex, being a nondecreasing convex function of the convex map $x \mapsto \|x\|$) to see that the moment generating function under the $\mathrm{WOR}(\ell)$ model is dominated by the moment generating function under the $\mathrm{WR}(\ell)$ model.
Next step. We know that the deviation of the norm of the sum from its expectation under the $\mathrm{WR}(\ell)$ model satisfies a Bennett inequality (see Ledoux and Talagrand, chapter 6):
$$\mathbb{P}_{\mathrm{WR}(\ell)}\Big(\Big|\Big\|\sum_{i \in I} \vec{v}_i\Big\| - \mathbb{E}_{\mathrm{WR}(\ell)}\Big\|\sum_{i \in I} \vec{v}_i\Big\|\Big| \geq t\Big) \leq \exp\Big[\frac{t}{2a} - \Big(\frac{t}{2a} + \frac{b^2}{4a^2}\Big)\log\Big(1 + \frac{2at}{b^2}\Big)\Big].$$
Here $a$ is a bound on the norms of the individual summands and $b^2 = \frac{\ell}{n}\sum_{i=1}^{n} \|\vec{v}_i\|^2$.
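For concreteness, here is how one could evaluate this bound numerically; bennett_bound and bennett_params are just names I've made up, and the code simply transcribes the right-hand side and the definitions of $a$ and $b^2$ above.

```python
import numpy as np

def bennett_bound(t, a, b2):
    """exp[ t/(2a) - (t/(2a) + b^2/(4a^2)) * log(1 + 2at/b^2) ]."""
    u = t / (2 * a)
    return np.exp(u - (u + b2 / (4 * a ** 2)) * np.log1p(2 * a * t / b2))

def bennett_params(V, ell):
    """a = max_i ||v_i||, b^2 = (ell/n) * sum_i ||v_i||^2, for rows v_i of V."""
    norms = np.linalg.norm(V, axis=1)
    return norms.max(), (ell / len(V)) * (norms ** 2).sum()
```

So bennett_bound(t, *bennett_params(V, ell)) gives the right-hand side for a given deviation $t$.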
From the standard connection between tail bounds, moments, and mgfs, this tail bound lets us recover a bound on the mgf of this deviation under the $\mathrm{WR}(\ell)$ model, and that mgf bound is strong enough to recover the Bennett inequality via the usual Chernoff argument (up to changes in the constants).
(Alternatively, I later realized, you can just look for a proof of this or a similar inequality that proceeds by bounding the mgf, and use that bound directly rather than reverse-engineering a similar but looser one. The catch is that you have to find a proof of the inequality that actually goes through the mgf. I wasn't able to, though I didn't try too hard, and there are similar results in chapter 1 of Ledoux and Talagrand which are good enough for my purposes.)
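For reference, the "usual Chernoff argument" here is just the following generic step, where $X$ stands for the deviation and $\psi(\lambda)$ for whatever bound on $\log \mathbb{E}\, e^{\lambda X}$ has been extracted:
$$\mathbb{P}(X \geq t) \leq e^{-\lambda t}\, \mathbb{E}\, e^{\lambda X} \leq e^{-\lambda t + \psi(\lambda)} \quad \text{for every } \lambda \geq 0,$$
followed by minimizing the exponent over $\lambda \geq 0$. With a Bennett-type $\psi$, this optimization returns a tail bound of the same shape as the one above, possibly with worse constants.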
Now we do some algebra to bound an mgf involving the norm of the sum formed under the $\mathrm{WOR}(\ell)$ model. Fix $\lambda \geq 0$. From our first observation,
$$\mathbb{E}_{\mathrm{WOR}(\ell)} \exp\Big(\lambda \Big\|\sum_{i \in I} \vec{v}_i\Big\|\Big) \leq \mathbb{E}_{\mathrm{WR}(\ell)} \exp\Big(\lambda \Big\|\sum_{i \in I} \vec{v}_i\Big\|\Big),$$
and the right-hand side is at most
$$\mathbb{E}_{\mathrm{WR}(\ell)} \exp\Big(\lambda \Big|\Big\|\sum_{i \in I} \vec{v}_i\Big\| - \mathbb{E}_{\mathrm{WR}(\ell)}\Big\|\sum_{i \in I} \vec{v}_i\Big\|\Big|\Big) \exp\Big(\lambda\, \mathbb{E}_{\mathrm{WR}(\ell)}\Big\|\sum_{i \in I} \vec{v}_i\Big\|\Big),$$
since $s \leq |s - c| + c$ for any real numbers $s$ and $c$. Dividing through by the last exponential factor gives
$$\mathbb{E}_{\mathrm{WOR}(\ell)} \exp\Big(\lambda \Big[\Big\|\sum_{i \in I} \vec{v}_i\Big\| - \mathbb{E}_{\mathrm{WR}(\ell)}\Big\|\sum_{i \in I} \vec{v}_i\Big\|\Big]\Big) \leq \mathbb{E}_{\mathrm{WR}(\ell)} \exp\Big(\lambda \Big|\Big\|\sum_{i \in I} \vec{v}_i\Big\| - \mathbb{E}_{\mathrm{WR}(\ell)}\Big\|\sum_{i \in I} \vec{v}_i\Big\|\Big|\Big).$$
Recall that we have a bound on the rhs from the Bennett inequality. Thus we have a bound on the mgf of the deviation of the norm of the sum under the $\mathrm{WOR}(\ell)$ model above the expectation of the norm of the sum under the $\mathrm{WR}(\ell)$ model, and this bound is strong enough to generate a Bennett inequality via the Chernoff argument. So, caveat! up to changes in the constants (since I have not done the calculations), we have
$$\mathbb{P}_{\mathrm{WOR}(\ell)}\Big(\Big\|\sum_{i \in I} \vec{v}_i\Big\| \geq t + \mathbb{E}_{\mathrm{WR}(\ell)}\Big\|\sum_{i \in I} \vec{v}_i\Big\|\Big) \leq \exp\Big[\frac{t}{2a} - \Big(\frac{t}{2a} + \frac{b^2}{4a^2}\Big)\log\Big(1 + \frac{2at}{b^2}\Big)\Big].$$
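As a final sanity check, here is a sketch that compares an empirical $\mathrm{WOR}(\ell)$ tail against this bound on made-up data; it takes the bound at face value with unchanged constants, which, again, I have not verified.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, ell, trials = 50, 5, 10, 20_000
V = rng.normal(size=(n, d))  # placeholder vectors

norms = np.linalg.norm(V, axis=1)
a, b2 = norms.max(), (ell / n) * (norms ** 2).sum()

# Monte Carlo estimates of E_WR ||sum|| and of the WOR norm distribution.
wr_norms = np.array([np.linalg.norm(V[rng.choice(n, ell, replace=True)].sum(axis=0))
                     for _ in range(trials)])
wor_norms = np.array([np.linalg.norm(V[rng.choice(n, ell, replace=False)].sum(axis=0))
                      for _ in range(trials)])
m_wr = wr_norms.mean()

t = 4.0  # arbitrary deviation
empirical = (wor_norms >= t + m_wr).mean()
u = t / (2 * a)
bound = np.exp(u - (u + b2 / (4 * a ** 2)) * np.log1p(2 * a * t / b2))  # same expression as above
print(f"empirical WOR tail: {empirical:.4f}   Bennett-style bound: {bound:.4f}")
```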