
I have a large array and an object I'd like to call multiple times with multiprocessing. Neither the data nor the object internals get modified.

This works:

import numpy as np
from multiprocessing import Pool

data = np.arange(400).reshape(20, 20)

class MyClass():
    def __call__(self, indx):
        return np.sum(data[:, indx])

my_class = MyClass()

def call_single_indx(indx):
    result = my_class(indx)
    return result

def launch_jobs(nmap=10, num_jobs=3):
    with Pool(processes=num_jobs) as pool:
        result = pool.map(call_single_indx, range(nmap))
    result = np.array(result)
    return result

if __name__ == "__main__":

    result = launch_jobs()
    print(result)

But this fails:

import numpy as np
from multiprocessing import Pool

data = None
my_class = None

def set_globals(n1, n2):
    global data
    data = np.arange(n1).reshape(n2, n2)
    global my_class
    my_class = MyClass()

class MyClass():
    def __call__(self, indx):
        return np.sum(data[:, indx])

def call_single_indx(indx):
    result = my_class(indx)
    return result

def launch_jobs(nmap=10, num_jobs=3):
    with Pool(processes=num_jobs) as pool:
        result = pool.map(call_single_indx, range(nmap))
    result = np.array(result)
    return result

if __name__ == "__main__":
    set_globals(n1=400, n2=20)
    launch_jobs()

It fails with TypeError: 'NoneType' object is not callable, so set_globals apparently failed to change the value of the global variable, and my_class is still None.

How can I get my set_globals function to actually set the global variables so that all the processes share them?


Comments:
  • Can you include the error message? Commented Aug 27, 2025 at 23:26
  • Yes, there are tons of questions about this, and the multiprocessing documentation addresses this at length, but you are creating multiple processes that don't share state. Commented Aug 27, 2025 at 23:31

2 Answers


Technically speaking, it is not possible to set global variables the way you are thinking with multiprocessing, since each process is completely independent. Each process basically makes its own copy of __main__ and has its own copy of the global variables. Thus, as counterintuitive as it is, each process runs with its own copy of the globals, and when any process updates a global variable it only updates its personal copy without impacting any other process's globals. I have run into the same problem often and basically have four solutions that have worked for me; I will label these Great, Good, Bad, Ugly:
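The point above is easy to demonstrate in a few lines. This minimal sketch (the names counter and bump are illustrative, not from the question) shows that a global updated inside a pool worker never changes in the parent:

```python
from multiprocessing import Pool

counter = 0  # module-level global


def bump(_):
    global counter
    counter += 1          # modifies this worker's private copy only
    return counter


if __name__ == "__main__":
    with Pool(processes=2) as pool:
        # each worker increments its own private counter
        print(pool.map(bump, range(4)))
    print(counter)  # still 0 in the parent process
```

The parent's counter remains 0 no matter how many times the workers increment their own copies.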

1. The Great: use multithreading, not multiprocessing:
Processes are all independent from one another and cannot share anything with each other in a nice "direct" way as you are attempting here. Threads, on the other hand, do not make their own copies of __main__ and therefore share all globals. While there are many use cases where the difference between processes and threads really matters for technical reasons, I cannot see anything in your example code that necessitates one over the other.

edit: OP asked a good question in the comments that made me realize I forgot to elaborate on how to do this, the rest of this ("1. The Great") section elaborates on processes vs threads:

In order to do multithreading, you will want to use 'threading' from the standard library. To see examples of how to use it, see the docs here: docs.python.org/3/library/threading.html. You often do not need any type of Pool object, since you just establish and kick off as many threads as you want. You can learn what a process and a thread are here: superfastpython.com/thread-vs-process. As an aside, if you are doing a lot of large computational work, the superfastpython site also has a lot of other interesting resources :)

As an example of how to do multithreading, here is a direct copy of code from the standard library threading page I just linked:

# The code below shows how to run multiple threads and is 
# copied from docs.python.org/3/library/threading.html
# in the python 3.13 version

import threading
import time

def crawl(link, delay=3):
    print(f"crawl started for {link}")
    time.sleep(delay)  # Blocking I/O (simulating a network request)
    print(f"crawl ended for {link}")

links = [
    "https://python.org",
    "https://docs.python.org",
    "https://peps.python.org",
]

# Start threads for each link
threads = []
for link in links:
    # Using `args` to pass positional arguments and `kwargs` for keyword arguments
    t = threading.Thread(target=crawl, args=(link,), kwargs={"delay": 2})
    threads.append(t)

# Start each thread
for t in threads:
    t.start()

# Wait for all threads to finish
for t in threads:
    t.join()

2. The Good: Multiprocessing Manager:
A multiprocessing Manager is basically an overseer of all processes that stores objects in itself; you can read more about manager objects in the docs:
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Manager

3. The Bad: Process - Queues and Pipes:
To handle this very common issue, the multiprocessing API offers some solutions, the two most common of which are Queues and Pipes. You can read more about these and see how to use their variations (there are a handful of flavors) in the docs. These structures function a bit differently than Managers:
https://docs.python.org/3/library/multiprocessing.html#pipes-and-queues

4. The Ugly: Off-site dump:
Let me be clear: Managers, Queues, and Pipes are the recommended/pythonic way to pass info between processes. However, they can be pretty annoying and cumbersome to deal with for small amounts of data passage, at least in my experience. While I called this solution 'The Ugly', I did save the best for last! In this case you just use some space in memory or storage that is not part of the __main__ process, normally just a file or a small volatile sqlite table. This one is ugly because it is susceptible to race conditions, whereas Managers, Queues, and Pipes generally are not. However, reading/writing to a file or table is a very agnostic and repeatable way to handle this issue that is extremely easy to reuse. The race-condition issue is pretty minor on modern processors and for a small number of processes and a small amount of shared data, though if it is a concern you can also use sqlite tables with WAL journaling.

Normally if I need to share data like this I just use multithreading instead. But since I have a large amount of code from doing this before, spinning up a sqlite db is also reasonably fast for me (you basically need to build this out once).

Hope this helps! I am happy to edit this answer with any follow up questions you may have :)


2 Comments

If I swap from multiprocessing import Pool to from multiprocessing.pool import ThreadPool as Pool, it runs, but I don't get any speedup. Looks like it is just using one core. So that doesn't seem like the solution I'm looking for. The multiprocessing Manager looks promising. Looks like I would subclass it and register MyClass and data?
Oh, simply switching out an import name does not always guarantee an identical API. I just realized I completely forgot to explain this difference between threads and processes in my answer; I'm moving that info into the "1. The Great" section in case anyone else needs to reference it :) For the Manager question, check out this SO question; the example code in the top answer uses the Manager object for object storage without any subclassing or other boilerplate, which should be perfect for the use case you're describing :) stackoverflow.com/questions/9436757/how-to-use-a-multipro

Here's the working solution I eventually came up with. I created a shared numpy array, then used a Manager to store objects and make them available to all processes.

import numpy as np
from multiprocessing import Pool, shared_memory, Manager, set_start_method


class SharedNumpyArray:
    # from https://e-dorigatti.github.io/python/2020/06/19/multiprocessing-large-objects.html
    def __init__(self, array):
        # create the shared memory location of the same size of the array
        self._shared = shared_memory.SharedMemory(create=True, size=array.nbytes)
        # save data type and shape, necessary to read the data correctly
        self._dtype, self._shape = array.dtype, array.shape
        # create a new numpy array that uses the shared memory we created.
        # at first, it is filled with zeros
        res = np.ndarray(
            self._shape, dtype=self._dtype, buffer=self._shared.buf
        )
        # copy data from the array to the shared memory. numpy will
        # take care of copying everything in the correct format
        np.copyto(res, array)

    def read(self):
        """ Read array without copy.
        """
        return np.ndarray(self._shape, self._dtype, buffer=self._shared.buf)

    def copy(self):
        """Copy array.
        """
        return np.copy(self.read())

    def unlink(self):
        """Unlink when done with data
        """
        self._shared.close()
        self._shared.unlink()


class MyClass1():
    def __init__(self, val):
        self.val = val

    def _compute_val(self):
        return self.val

    def __call__(self, data, indx):
        return np.sum(data[:, indx]) + self._compute_val()


class MyClass2():
    def __call__(self, data, myclass1, indx):
        return myclass1(data, indx) + myclass1.val


def call_single_indx(shared_data, shared_class1, shared_class2, indx):
    result = shared_class2(shared_data.read(), shared_class1, indx)
    return result


def launch_jobs(shared_data, nmap=10, num_jobs=3):
    with Manager() as manager:
        manager.shared_class1 = MyClass1(5)
        manager.shared_class2 = MyClass2()
        args = [[shared_data, manager.shared_class1, manager.shared_class2, i] for i in range(nmap)]
        with Pool(processes=num_jobs) as pool:
            result = pool.starmap(call_single_indx, args)
    result = np.array(result)
    return result


if __name__ == "__main__":

    # Forcing the "fork" start method seems to be
    # required for this to work on macOS, where
    # the default start method is "spawn".
    set_start_method("fork", force=True)

    data = np.arange(400).reshape(20, 20)
    shared_data = SharedNumpyArray(data)
    result = launch_jobs(shared_data)
    print(result)
    shared_data.unlink()

1 Comment

This looks good, well done! It is up to you, but it might be helpful to edit this code into your answer in case people stumble into the same problem but do not scroll down to see this :)
