I'm having trouble getting this example working. In serial this function works just fine, but when I attempt to run it in a multiprocessing.Pool it locks up and will not return a simple random integer. I'm specifically using the spawn context because I'm developing in a Windows environment.
import multiprocessing

from tqdm import tqdm


def test_parallel(min_rng: int):
    import random
    return random.randint(min_rng, 100)


def bootstrap_test_parallel(n_tasks: int = 10_000, min_rng: int = 3,
                            pool_size: int = max(1, multiprocessing.cpu_count() - 1)):
    results = [None for _ in range(n_tasks)]

    def log_result(value, ix):
        results[ix] = value

    if pool_size == 1:
        for ix in tqdm(range(n_tasks)):
            log_result(test_parallel(min_rng), ix)
    else:
        with multiprocessing.get_context("spawn").Pool(pool_size) as pool:
            for ix in tqdm(range(n_tasks)):
                log_result(pool.apply_async(test_parallel, args=(min_rng,)), ix=ix)
            for ix in tqdm(range(n_tasks)):
                results[ix] = results[ix].get()
    return results


if __name__ == "__main__":
    output_one = bootstrap_test_parallel(n_tasks=10, pool_size=1)  # runs fine
    print(output_one)
    output_two = bootstrap_test_parallel(n_tasks=10, pool_size=2)  # hangs indefinitely
    print(output_two)
When you run code in a Jupyter notebook, the cells you execute are made to look as though they live in the __main__ module of the program. This trick does not get propagated to the child process, and so the spawned child process ends up looking for test_parallel in the real main module of the Jupyter Notebook Server, where it does not exist, and the pool hangs waiting on tasks that can never run. The solution is to put test_parallel in a separate Python module and import it, or to use cloudpickle.