RuntimeError: python multiprocessing error

Question

I am trying to use parallel processing in Python with the following code:

import os
import datetime
import numpy as np
import FarimaModule
from  statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt
import multiprocessing as mp

# Here I define some variables: p_max,q_max,m_list,wlen,mstep,fs, listFile

def implement(fname,p_max,q_max,m_list,wlen,mstep,fs):
    # It is a really long code
    pass

# run the function 'implement' in parallel for different values of the input variable 'fname'
pool = mp.Pool(10)
results = [pool.apply(implement, args=(fname,p_max,q_max,m_list,wlen,mstep,fs)) for fname in listFile]
pool.close()

But it throws the following error:

    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Others have posted questions with the same error, but I cannot work out how to adapt the solutions posted there to my code.

Did you add if __name__ == '__main__': as the error message indicates? The multiprocessing documentation explains the need for that line.
– sj95126, Commented Oct 21, 2021 at 4:10
More specifically, the last three lines need to be executed only in the main process. The way you have it, each newly started process will re-import your file and start yet another pool of ten processes. The pool creation code needs to be executed only once.
– Frank Yellin, Commented Oct 21, 2021 at 4:18
Thanks for the suggestion. I could not find the mp.Pool.apply() method illustrated there, but pool.map() seems to be working.
– Abhinav Gupta, Commented Oct 21, 2021 at 4:21
@FrankYellin I was adding if __name__ == '__main__': after pool = mp.Pool. That is why it was not working. It works if I add if __name__ == '__main__': before this line. But now it seems to be running in sequence like a usual for loop; it is not parallelizing.
– Abhinav Gupta, Commented Oct 21, 2021 at 4:35

tdelaney · Accepted Answer · 2021-10-21 04:59:56Z

On some systems, multiprocessing has to spawn a new copy of Python and import your module to get to the worker code. Anything at module level is executed again, including the parent code that creates the pool. This would be an infinite recursion, except that Python detects the problem and gives you a handy tip. You would follow it like this:

import os
import datetime
import numpy as np
import FarimaModule
from  statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt
import multiprocessing as mp

# Here I define some variables: p_max,q_max,m_list,wlen,mstep,fs, listFile

def implement(fname,p_max,q_max,m_list,wlen,mstep,fs):
    # It is a really long code
    pass

if __name__ == "__main__":
    # run the function 'implement' in parallel for different values of the input variable 'fname'
    pool = mp.Pool(10)
    results = [pool.apply(implement, args=(fname,p_max,q_max,m_list,wlen,mstep,fs))
               for fname in listFile]
    pool.close()

A top-level Python script always has the name "__main__". When imported by the subprocess, it's now a module and has a different name.
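
As a minimal sketch of this behaviour (the function name where_am_i and the pool size are illustrative, not from the question), each worker can report the module name it sees. With a start method that re-imports the script (spawn, the default on Windows and macOS), workers typically report a name other than "__main__", which is why the guarded block does not run again in them. With fork, workers inherit the parent's state and report "__main__".

```python
import multiprocessing as mp


def where_am_i(tag):
    # Runs in a worker process. Returns the tag, the module name the
    # worker sees for this file, and the worker's process name.
    return tag, __name__, mp.current_process().name


def main():
    # Pool creation lives in a function that is only called from the
    # guarded block below, so re-importing this file never starts a pool.
    with mp.Pool(2) as pool:
        for row in pool.map(where_am_i, ["a", "b"]):
            print(row)


if __name__ == "__main__":
    main()
```

The exact names printed depend on the platform and the start method in use, so no particular output is assumed here.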

pool.apply is likely not the method you want: it blocks until the pool worker completes the call, so the tasks run one at a time. map may be the better choice. It chunks (groups) the input. In your case, with an expensive calculation, you likely want a small chunksize. starmap is just map for functions that take multiple parameters: each input tuple is unpacked into the function's arguments.

if __name__ == "__main__":
    # run the function 'implement' in parallel for different values of the input variable 'fname'
    with mp.Pool(10) as pool:
        results = pool.starmap(implement, 
            [(fname,p_max,q_max,m_list,wlen,mstep,fs)
                for fname in listFile],
            chunksize=1)
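
If you would rather keep the one-call-per-file structure of the original loop, apply_async is the non-blocking counterpart of apply. The following is only a sketch under stated assumptions: implement here is a short hypothetical stand-in for the long function in the question, it takes fewer parameters, and the file names are made up.

```python
import multiprocessing as mp


def implement(fname, p_max, q_max):
    # Hypothetical stand-in for the long-running function in the question.
    return f"{fname}: p_max={p_max}, q_max={q_max}"


def run_all(file_names, p_max, q_max, processes=2):
    with mp.Pool(processes) as pool:
        # apply_async returns an AsyncResult immediately, so every task
        # is queued before any result is awaited.
        pending = [pool.apply_async(implement, args=(fname, p_max, q_max))
                   for fname in file_names]
        # get() blocks until that particular task has finished; results
        # are collected in the same order as the input list.
        return [job.get() for job in pending]


if __name__ == "__main__":
    for line in run_all(["a.csv", "b.csv", "c.csv"], p_max=2, q_max=3):
        print(line)
```

Calling get() inside the with block matters: leaving the block terminates the pool, so results should be collected before that happens.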

edited Oct 21, 2021 at 4:59

answered Oct 21, 2021 at 4:21


4 Comments

Abhinav Gupta Over a year ago

This code is working, but it is not doing parallel processing. It is iterating over 'fname' sequentially.

tdelaney Over a year ago

That's apply. You could use apply_async, or use map like the example I added.

Frank Yellin Over a year ago

pool.apply blocks until one worker process finishes the call, so only one process is busy at a time. If you want to use multiple processes, you must either use pool.apply_async or one of the varieties of pool.map().

Abhinav Gupta Over a year ago

Thanks all. pool.starmap is working for me.
