Sharing numpy arrays between processes using multiprocessing and ctypes

Because of its global interpreter lock, Python doesn’t support multithreading. To me, this is a ridiculous limitation that should be gotten rid of post-haste: a programming language is not modern unless it support multithreading.

Python supports multiprocessing, but the straightforward manner of using multiprocessing requires you to pass data between processes using pickling/unpickling rather than sharing memory. Needless to say, this slows down execution when large amounts of data need to be shared by processes.

In my case, I’ve been using multiprocessing to speed up the training of a recursive autoencoder: instead of training one auto-encoder, I train an ensemble of around 30 simultaneously, and periodically average their parameters together and use these as the initialization points for further training of all the models. Pickling the parameters here doesn’t seem to be that costly, because the optimization itself takes so much time. However, it’s nice to note that there’s a relatively simple workaround that allows you to avoid the pickling/unpickling process: create the shared array as a ctypes object, and pass it into the initializer of each process, making it a global variable in each thread. Here’s some example code:

from multiprocessing import Pool, sharedctypes
import numpy as np
import warnings

def populate(args):
    """ Populates the portion of the shared array arr indicated by the offset and blocksize options """
    offset, blocksize = args
    with warnings.catch_warnings():
        warnings.simplefilter('ignore', RuntimeWarning)
        v = np.ctypeslib.as_array(arr)

    for idx in range(offset, blocksize+offset):
        v[idx, :] = offset

    return offset

def _init(arr_to_populate):
    """ Each pool process calls this initializer. Load the array to be populated into that process's global namespace """
    global arr
    arr = arr_to_populate

size = 2000
blocksize = 100
tmp = np.ctypeslib.as_ctypes(np.zeros((size, 10)))
shared_arr = sharedctypes.Array(tmp._type_, tmp, lock=False) 

pool = Pool(processes=4, initializer=_init, initargs=(shared_arr, ))
result =, zip(range(0, size, blocksize), [blocksize]*(size/blocksize)))

with warnings.catch_warnings():
    warnings.simplefilter('ignore', RuntimeWarning)
    final_arr = np.ctypeslib.as_array(shared_arr)

print final_arr

The warnings filter is there because a PEP compliance bug gives rise to some superfluous runtime warnings.

  • sheldonross

    With use such word as ‘pickling’, your blog entry is very much insider jargon.

    Offhand question, do you know whether or not Python is Turning Complete? Last weekend, I learned that CSS is.

    • Alex Gittens

      This post is aimed at a very specific audience, namely people who google for information on the same problem I had. Or more likely, me in the future looking for a solution to the same problem.

      Python is Turing complete, as are all other major general purpose programming languages: according to Wikipedia, all imperative languages with if statements and an arbitrary number of variables is Turing complete. I’m surprised CSS is; I haven’t read the spec since the first version; they must have made some big changes.

  • Guest

    You don’t need to pickle/unpickle data for multi process communication. If your data is small enough to fit in memory, you can use the Queue and Pipe objects to pass it between processes.