
I would like to parallelize a for loop in Python.

The loop is fed by a generator, and I expect about 1 billion items.

It turned out that joblib has a giant memory leak:

from joblib import Parallel, delayed
Parallel(n_jobs=num_cores)(delayed(testtm)(tm) for tm in powerset(all_turns))

I do not want to store any data in this loop, just occasionally print something out, but the main process grows to 1 GB within seconds.

Are there any other frameworks for a large number of iterations?
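
(For context: powerset is not shown in the question. Assuming it follows the standard itertools recipe below, it yields subsets lazily, one at a time, so the input side is not what holds the memory. With roughly 30 elements in all_turns, 2**30 ≈ 10**9 subsets, which matches the expected item count.)

from itertools import chain, combinations

def powerset(iterable):
    """Lazily yield every subset of iterable (standard itertools recipe)."""
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))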

3 Comments
  • Running things in parallel doesn't magically fix memory leaks. Commented Feb 12, 2015 at 15:02
  • The memory leak is not in my code; at least, I am pretty sure about that. Commented Feb 12, 2015 at 15:03
  • The memory leak seems to be in the joblib library; at least, there is no leak without joblib. Commented Feb 12, 2015 at 15:17

1 Answer

from multiprocessing import Pool

if __name__ == "__main__":
    pool = Pool()  # use all available CPUs
    # imap_unordered consumes the generator lazily and yields results as
    # they finish, so only a bounded window of items is in memory at once.
    # Note: pass testtm directly; joblib's delayed() wrapper is not needed
    # (and not picklable) with multiprocessing.Pool.
    for result in pool.imap_unordered(testtm, powerset(all_turns),
                                      chunksize=1000):
        print(result)
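
A self-contained sketch of the same pattern, runnable as-is (testtm and the sizes below are hypothetical stand-ins, not from the thread). chunksize=1000 batches items to cut inter-process communication overhead, while imap_unordered keeps memory bounded because it never materializes the whole input:

from itertools import chain, combinations
from multiprocessing import Pool

def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def testtm(tm):
    # hypothetical stand-in for the real worker function
    return len(tm)

if __name__ == "__main__":
    all_turns = range(20)  # 2**20 ≈ 1e6 subsets; scale up as needed
    with Pool() as pool:   # context-manager form closes the pool cleanly
        for i, result in enumerate(
                pool.imap_unordered(testtm, powerset(all_turns), chunksize=1000)):
            if i % 100000 == 0:
                print(i, result)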

2 Comments

By the way: what is this __name__ line for? Results can be seen here: github.com/thigg/fapr/blob/master/graph/builder.py. Instead of gigabytes of RAM I now need only ~10 MB per task. Awesome!
The __main__ guard allows the module to be imported without creating a pool of processes; multiprocessing may import the module in the child processes.
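
A minimal illustration of that point (hypothetical module, not from the thread): under the spawn start method, which is the default on Windows, each worker re-imports the main module, so any module-level code runs again in every child. The guard keeps the pool creation out of that re-import:

from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":   # runs only in the parent; children re-import the module
    with Pool(2) as pool:    # without the guard, each child would try to
        print(pool.map(square, range(10)))  # create its own pool on import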

