
I've attempted to run parallel processing on a locally defined function as follows:

import multiprocessing as mp
import numpy as np


def testFunction():
  x = np.asarray( range(1,10) )
  y = np.asarray( range(1,10) )

  def myFunc( i ):
    return np.sum(x[0:i]) * y[i]

  p = mp.Pool( mp.cpu_count() )
  out = p.map( myFunc, range(0,x.size) )
  print( out )


if __name__ == '__main__':
  print( 'I got here' )
  testFunction()

When I do so, I get the following error:

cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

How can I use multiprocessing to process several arrays in parallel, as I'm trying to do here? x and y are necessarily defined inside the function; I'd rather not make them global variables.

All help is appreciated.

  • I think you have a misunderstanding of how multiprocessing works: the functions invoked by map are executed in separate processes with no concept of function-local data. You will have to pass the data to be processed to the function that will process it, either explicitly in the map parameter or by passing data through e.g. a Queue (a sketch of the Queue approach follows these comments). Commented May 21, 2019 at 20:55
  • @barny How can I pass data through? Note that I don't need to change x or y; I just need to use them. Commented May 21, 2019 at 21:40
  • Have you read the documentation, e.g. docs.python.org/3/library/multiprocessing.html? You could try shared memory, maybe; otherwise you have to explicitly send the data it's going to operate on to each process (a shared-memory sketch also follows these comments). Commented May 22, 2019 at 12:36
  • @barny I have read the documentation and I am struggling with it. How does one use shared memory? How can I explicitly send the data to each process? Commented May 22, 2019 at 15:45
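A minimal sketch of the Queue approach suggested above (the names worker, in_queue, and out_queue are illustrative; each task explicitly carries the slice of x it needs, so nothing function-local has to be pickled):

import multiprocessing as mp

import numpy as np


def worker(in_queue, out_queue):
    # Consume (index, x_slice, y_value) tasks until the None sentinel.
    for i, xs, y_i in iter(in_queue.get, None):
        out_queue.put((i, np.sum(xs) * y_i))


def main():
    x = np.asarray(range(1, 10))
    y = np.asarray(range(1, 10))

    in_queue = mp.Queue()
    out_queue = mp.Queue()

    workers = [mp.Process(target=worker, args=(in_queue, out_queue))
               for _ in range(mp.cpu_count())]
    for w in workers:
        w.start()

    # Each task explicitly ships the data it operates on.
    for i in range(x.size):
        in_queue.put((i, x[:i], y[i]))

    results = [out_queue.get() for _ in range(x.size)]

    for _ in workers:
        in_queue.put(None)  # one sentinel per worker
    for w in workers:
        w.join()

    # Results arrive in completion order, so restore the index order.
    print([value for _, value in sorted(results)])


if __name__ == '__main__':
    main()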
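And a sketch of the shared-memory suggestion, using multiprocessing.Array plus a Pool initializer so each worker receives the shared buffers once at startup (init_worker and the module-level globals are illustrative names; type code 'i' is a C int, hence the int32 view):

import multiprocessing as mp

import numpy as np

# Populated in each worker process by the Pool initializer.
shared_x = None
shared_y = None


def init_worker(x_arr, y_arr):
    global shared_x, shared_y
    shared_x = x_arr
    shared_y = y_arr


def process(i):
    # View the shared C buffers as numpy arrays without copying;
    # read-only access, so the Array lock is not needed here.
    x = np.frombuffer(shared_x.get_obj(), dtype=np.int32)
    y = np.frombuffer(shared_y.get_obj(), dtype=np.int32)
    return int(np.sum(x[:i]) * y[i])


def main():
    x = mp.Array('i', range(10))  # 'i' = C int, lives in shared memory
    y = mp.Array('i', range(10))

    with mp.Pool(mp.cpu_count(),
                 initializer=init_worker, initargs=(x, y)) as pool:
        out = pool.map(process, range(10))

    print(out)


if __name__ == '__main__':
    main()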

1 Answer


Just make the processing function global and pass pairs of array values instead of referencing them by index in the function:

import multiprocessing as mp

import numpy as np


def process(inputs):
    x, y = inputs

    return x * y


def main():
    x = np.asarray(range(10))
    y = np.asarray(range(10))

    with mp.Pool(mp.cpu_count()) as pool:
        out = pool.map(process, zip(x, y))

    print(out)


if __name__ == '__main__':
    main()

Output:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
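The underlying reason the original version fails is independent of multiprocessing: pickle serializes functions by qualified name, and a function defined inside another function cannot be resolved by name in another process. A minimal demonstration:

import pickle


def outer():
    def inner():
        pass

    # Locally defined functions are pickled by qualified name, which a
    # separate process cannot resolve, so this raises (PicklingError on
    # Python 2, "Can't pickle local object" AttributeError on Python 3).
    pickle.dumps(inner)


outer()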

UPDATE: Given the new details, you have to share arrays between different processes. This is exactly what multiprocessing.Manager is for.

A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.

So the resulting code will look something like this:

from functools import partial
import multiprocessing as mp

import numpy as np


def process(i, x, y):
    return np.sum(x[:i]) * y[i]


def main():
    manager = mp.Manager()

    x = manager.Array('i', range(10))
    y = manager.Array('i', range(10))

    func = partial(process, x=x, y=y)

    with mp.Pool(mp.cpu_count()) as pool:
        out = pool.map(func, range(len(x)))

    print(out)


if __name__ == '__main__':
    main()

Output:

[0, 0, 2, 9, 24, 50, 90, 147, 224, 324]

Comments

  • I'd prefer pool.starmap though, if you have multiple parameters (see the starmap sketch after these comments).
  • Thank you for your help! I've altered the problem slightly because I need to do something more than iterate over x. How can the solution be adjusted to address it?
  • @user24205 I've updated the answer so that now different processes share the same data arrays.
  • @Rightleg Again, thank you so much! One last question: what if the shared variable is not an array but something more complicated? In my case, I am trying to share the output of scipy's Voronoi function (which is complicated; I'm not sure exactly what it is).
  • @user24205, if you don't know what the output is, then how are you going to use it? :) Anyway, I'd suggest converting your data into arrays before sharing them between worker processes; do some preprocessing before feeding your data to the workers (a sketch of this follows these comments).
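A sketch of the starmap suggestion from the first comment: the pairs are unpacked into separate parameters instead of arriving as one tuple.

import multiprocessing as mp

import numpy as np


def process(x_i, y_i):
    # starmap unpacks each (x_i, y_i) pair into two arguments.
    return x_i * y_i


def main():
    x = np.asarray(range(10))
    y = np.asarray(range(10))

    with mp.Pool(mp.cpu_count()) as pool:
        out = pool.starmap(process, zip(x, y))

    print(out)


if __name__ == '__main__':
    main()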
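And a sketch of the "convert to arrays first" advice from the last comment: pull a plain numpy array out of the scipy Voronoi result and pass only that to the workers (the per-vertex norm is a hypothetical stand-in for the real computation):

from functools import partial
import multiprocessing as mp

import numpy as np
from scipy.spatial import Voronoi


def process(i, vertices):
    # Hypothetical per-vertex work on the extracted array.
    return float(np.linalg.norm(vertices[i]))


def main():
    points = np.random.rand(20, 2)
    vor = Voronoi(points)

    # Extract the picklable numpy array from the complex scipy object
    # before handing it to the workers.
    vertices = np.asarray(vor.vertices)

    func = partial(process, vertices=vertices)

    with mp.Pool(mp.cpu_count()) as pool:
        out = pool.map(func, range(len(vertices)))

    print(out)


if __name__ == '__main__':
    main()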
