# Sampling uniformly from the set of partitions into a fixed number of nonempty sets

It’s easy to sample uniformly from the set of partitions of a set: you pick a number of bins using an appropriate exponential distribution, then randomly i.i.d. toss each element of the set into one of those bins. At the end of this procedure, the nonempty bins will constitute your uniformly sampled random partition. [literature ref: “Generation of a random partition of a finite set by an urn model” by Stam, 1983; pretty pictures ref: see this blog post].

Is there a similarly efficient and simple algorithm for sampling uniformly from the set of partitions into a fixed number of nonempty sets? The only algorithm I’m aware of takes advantage of the fact that the number of such partitions is given by $$\left\{n \atop p \right\}$$, a Stirling number of the second kind, if the set has $$n$$ elements and we desire $$p$$ nonempty subsets in the partition. In particular, we have the identity
$\left\{ {n \atop p} \right\} = \left\{ {n-1 \atop p-1} \right\} + p \left\{ {n-1 \atop p} \right\}$
that comes from observing that the only two ways to construct a partition of $$n$$ elements into $$p$$ nonempty sets are to: either partition the first $$n-1$$ elements into $$p-1$$ nonempty sets and take the remaining singleton as our final set in the partition, or we partition the first $$n-1$$ elements into $$p$$ nonempty sets and place the remaining element into any of these $$p$$ sets.

This observation leads to a straightforward recursive sampling procedure: with probability $$\left\{ {n-1 \atop p-1} \right\}/\left\{ {n \atop p} \right\}$$, use the first procedure with a randomly sampled partition of the first $$n-1$$ elements into $$p-1$$ nonempty sets, otherwise use the second procedure with a randomly sampled partition of the first $$n-1$$ elements into $$p$$ nonempty sets.

Unfortunately, this is not an efficient procedure for several reasons. In particular, it requires computing Stirling numbers, and taking their ratio. When $$n,k$$ are large, this will require both computational time and, more of a practical impediment, arbitrary-precision arithmetic. A straightforward implementation also relies on recursion, which is infeasible when $$n$$ is large. Clearly one can implement this algorithm without using recursion; one can also use an asymptotic expansion of the Stirling numbers to approximate the ratio $$\left\{ {n-1 \atop p-1} \right\}/\left\{ {n \atop p} \right\}$$ when $$n,k$$ are large … but this comes at the cost of some unspecified inaccuracy and just doesn’t feel right.

So the question remains: is there an efficient and simple way to sample uniformly from the set of partitions into a fixed number of nonempty sets?

• disqus_ZSXLOzHnA3

Does this work? Start by choosing p elements uniformly at random without replacement to “seed” the urns. Then place the remaining n-p elements into the urns one at a time. If urn i currently has k_i elements, then it is chosen with probability proportional to k_i.

• http://thousandfold.net/cz Alex Gittens

If you drop your last sentence, then I know the procedure doesn’t work. Why do you suggest sampling with probability proportional to k_i rather than inversely proportional? The latter seems more intuitive, but I doubt either would work…

• disqus_ZSXLOzHnA3

Sampling urns uniformly gives a distribution on partitions proportional to prod_i k_i. (maximized when k_i are roughly equal). I’d hoped a simple formula would undo this bias, but it doesn’t work. Rejection sampling should work, but I think it’s extremely inefficient unless p or n-p is small.

• http://thousandfold.net/cz Alex Gittens

Yep. This might be one of those distributions that MCMC methods is best for, and that’s just too inefficient for my purposes.

• http://xingshicai.ca Xing Shi Cai

Hello! I’m interested in this problem as it is related to my recent research project. Have you found any further information about this problem? Can you explain a bit about why you need this?

• http://thousandfold.net/cz Alex Gittens

Hi Xing, I have not looked further. The application I was thinking of was to get a randomized approximation a la Recht and Rahimi to a boolean kernel (http://machinelearning.wustl.edu/mlpapers/paper_files/nips02-LT13.pdf ). But it turns out that boolean kernels aren’t particularly useful: http://www.jmlr.org/papers/volume6/khardon05a/khardon05a.pdf ; after seeing this result, I have not spent any more time on the problem

• http://xingshicai.ca Xing Shi Cai

Hi Alex. Thank you for your reply! Maybe you’re not interested anymore, but here’s my solution for the special case of sampling a partition of kn numbers into n parts when k is fixed integer. Please see section 7 of this paper http://arxiv.org/abs/1504.06238

The expected running time is O(n^{3/2}). But we think it is possible to do it in O(n) time.