I am trying to use parallel processing in Python with the following code:

import os
import datetime
import numpy as np
import FarimaModule
from  statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt
import multiprocessing as mp

# Here I define some variables: p_max,q_max,m_list,wlen,mstep,fs, listFile

def implement(fname,p_max,q_max,m_list,wlen,mstep,fs):
    # It is a really long code

# run the function 'implement' in parallel for different values of the input variable 'fname'
pool = mp.Pool(10)
results = [pool.apply(implement, args=(fname,p_max,q_max,m_list,wlen,mstep,fs)) for fname in listFile]
pool.close()

But it throws the following error:

    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Others have posted questions about the same error, but I have not been able to apply the solutions given there because it is unclear how to adapt them to my code.

  • Did you add if __name__ == '__main__': as the error message indicates? The multiprocessing documentation explains the need for that line. Commented Oct 21, 2021 at 4:10
  • More specifically, the last three lines need to be executed only in the main process. The way you have it, each newly started worker process will re-import your file and start yet another pool of ten processes. The pool creation code needs to be executed only once. Commented Oct 21, 2021 at 4:18
  • Thanks for the suggestion. I could not find mp.Pool.apply() method illustrated there. But the pool.map() seems to be working. Commented Oct 21, 2021 at 4:21
  • @FrankYellin I was adding if __name__ == '__main__': after pool = mp.Pool. That is why it was not working. It works if I add if __name__ == '__main__': before this line. But now it seems to be running in sequence like a usual for loop; it is not parallelizing. Commented Oct 21, 2021 at 4:35

1 Answer

On some systems, multiprocessing has to spawn a new copy of Python and import your module to get to the worker code. Anything at module level is executed again, including the parent code that creates the pool. This would be an infinite recursion, except that Python detects the problem and gives you a handy tip. To follow it, move the pool code under the guard:

import os
import datetime
import numpy as np
import FarimaModule
from  statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt
import multiprocessing as mp

# Here I define some variables: p_max,q_max,m_list,wlen,mstep,fs, listFile

def implement(fname,p_max,q_max,m_list,wlen,mstep,fs):
    # It is a really long code
    ...

if __name__ == "__main__":
    # run the function 'implement' in parallel for different values of the input variable 'fname'
    pool = mp.Pool(10)
    results = [pool.apply(implement, args= 
       (fname,p_max,q_max,m_list,wlen,mstep,fs)) for fname in listFile]
    pool.close()

A top-level Python script always has the name "__main__". When it is imported by the subprocess, it's now a module and has a different name, so the guarded block is skipped there.
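The effect of the guard can be seen without starting any processes. The sketch below is only an illustration: it writes a tiny throwaway script to a temporary file and runs it twice with runpy, once under the name "__main__" and once under "__mp_main__" (to my understanding, the name CPython gives the re-imported main module in spawned workers; treat that as an assumption). SCRIPT, run_as and demo_script.py are made-up names for this example.

```python
import os
import runpy
import tempfile

# A stand-in for the user's script: module-level code plus a guarded block.
SCRIPT = '''
events = ["module level ran as " + __name__]
if __name__ == "__main__":
    events.append("pool would be created here")
'''


def run_as(run_name):
    """Execute SCRIPT from a temp file under the given module name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_script.py")
        with open(path, "w") as fh:
            fh.write(SCRIPT)
        # run_path returns the globals of the executed script
        return runpy.run_path(path, run_name=run_name)["events"]


as_script = run_as("__main__")            # like `python demo_script.py`
as_worker_import = run_as("__mp_main__")  # like a worker re-importing it

print(as_script)
print(as_worker_import)
```

In both runs the module-level line executes, but the guarded block only runs under "__main__". That is why the pool creation has to live inside the guard.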

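pool.apply is likely not the method you want: it blocks until the pool worker has finished, so the calls in your list comprehension run one after another. map may be the better choice. It chunks (groups) the input before handing it to the workers. In your case, with an expensive calculation, you likely want a small chunksize. starmap is just map for functions that take multiple parameters: each item of the iterable is a tuple of arguments.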

if __name__ == "__main__":
    # run the function 'implement' in parallel for different values of the input variable 'fname'
    with mp.Pool(10) as pool:
        results = pool.starmap(implement, 
            [(fname,p_max,q_max,m_list,wlen,mstep,fs)
                for fname in listFile],
            chunksize=1)
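
Since the snippet above depends on variables that are not shown, here is a small self-contained sketch of the same starmap pattern. The implement function is a hypothetical stand-in with fewer parameters than the real one, and build_tasks and run_parallel are helper names invented for this example.

```python
import multiprocessing as mp


def implement(fname, p_max, q_max):
    # Hypothetical stand-in for the real, long-running function.
    return fname, len(fname) * p_max + q_max


def build_tasks(list_file, p_max, q_max):
    # One argument tuple per file; starmap unpacks each tuple
    # into a call implement(fname, p_max, q_max).
    return [(fname, p_max, q_max) for fname in list_file]


def run_parallel(tasks, workers=2):
    with mp.Pool(workers) as pool:
        # chunksize=1 hands tasks to the workers one at a time,
        # which suits a few expensive calls of uneven duration.
        return pool.starmap(implement, tasks, chunksize=1)


if __name__ == "__main__":
    tasks = build_tasks(["a.dat", "bb.dat", "ccc.dat"], p_max=2, q_max=1)
    for fname, value in run_parallel(tasks):
        print(fname, value)
```

starmap returns the results in the same order as the input tuples, even if the workers finish out of order.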

4 Comments

This code is working, but it is not doing parallel processing. It is iterating over 'fname' sequentially.
That's because of apply. You could use apply_async, or use map like the example I added.
pool.apply uses one worker process at a time, because it blocks until the call returns. If you want to use multiple processes, you must either submit the calls without blocking (pool.apply_async) or use one of the varieties of pool.map().
Thanks all. pool.starmap is working for me.
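
For completeness, the apply_async route mentioned in the comments could look roughly like the sketch below. It is a minimal illustration, not the asker's code: implement is again a hypothetical stand-in, and submit_all is a helper name made up here.

```python
import multiprocessing as mp


def implement(fname, scale):
    # Hypothetical stand-in for the real, long-running function.
    return fname.upper() * scale


def submit_all(pool, list_file, scale):
    # apply_async returns an AsyncResult immediately, so every task
    # is queued before any of them has to finish.
    return [pool.apply_async(implement, args=(fname, scale))
            for fname in list_file]


if __name__ == "__main__":
    with mp.Pool(2) as pool:
        pending = submit_all(pool, ["a", "b", "c"], scale=2)
        # get() blocks until that particular result is ready.
        results = [p.get() for p in pending]
    print(results)
```

The results must be collected with get() while the pool is still open, which is why that line sits inside the with block.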
