Description
I have been building PyTorch from source for the past couple of months. For roughly the last one or two months I have been unable to run forward passes on my models when using PyTorch's multiprocessing package. If I'm not mistaken, the last version of PyTorch that I built that still allows me to use forking is this one; more precisely, the version I am using now that still allows forking is 1.1.0a0+3a01a45.
Not being able to fork is quite disruptive because my entire workflow depends on multiprocessing.Process with the fork start method. In one part of the workflow I start tens of processes and run forward passes on a model that resides in CPU memory. This is very efficient since the model's weights are not copied each time a new process starts. If I spawn processes instead, the weights have to be copied into every child, which is no longer efficient.
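As a concrete sketch of the shared-weights setup described above (assuming a torchvision model as the example; model.share_memory() is the nn.Module method that moves parameters into shared memory, making the sharing explicit rather than relying on fork's copy-on-write alone):

import torch
import torchvision

# Load a model in CPU memory; children started via fork inherit these pages.
model = torchvision.models.resnet18(pretrained=True)
model.eval()

# Optionally move parameters/buffers into shared memory so sharing across
# processes is explicit instead of depending on copy-on-write semantics.
model.share_memory()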
In another part of my framework, I start a process while the model is still in CPU memory and then call model.cuda() inside the process, which copies the model to GPU memory. This is a bit inefficient, but it still lets me start 4-5 processes (depending on GPU memory) and forward-pass my inputs.
Here is a pseudocode sketch of what I do:
from torch.multiprocessing import Process

def doForwardPass(input):
    # model.cuda()  # this, uncommented, used to work as well
    output = model(input)

model = loadModel()  # somehow load a model (e.g. from torchvision)
inputList = loadListOfInputs()  # somehow get the list of input tensors

processes = []
for i in range(100):
    processes.append(Process(target=doForwardPass, kwargs={'input': inputList[i]}))
    processes[-1].start()
for i in range(100):
    processes[i].join()
With the current versions of PyTorch (from the last one to two months) I can no longer do this. I suspect this is an unintended side effect of a commit that did not mean to disable forking. Could you either revert the change that caused this or re-enable the feature?
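In case it helps triage, a minimal sketch of selecting the start method explicitly; set_start_method comes from the standard multiprocessing API that torch.multiprocessing re-exports, 'fork' is the Linux default, and 'spawn' is the copy-heavy alternative I would like to avoid:

import torch.multiprocessing as mp

def worker():
    pass  # the forward pass would go here

if __name__ == '__main__':
    # 'fork' lets children inherit the already-loaded model; 'spawn' pickles
    # and copies it into each child process instead.
    mp.set_start_method('fork')
    p = mp.Process(target=worker)
    p.start()
    p.join()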
Also, with some PyTorch builds from the master branch (from the last one to two months), I may get the following if I call model.cuda() inside a started process:
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=51 error=3 : initialization error
This might be related to the lazy initialization of CUDA.
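If my understanding is right, the failure mode can be reproduced with a sketch like the following, on the assumption that the CUDA runtime, once initialized in the parent, cannot be initialized again in a fork-started child:

import torch
from torch.multiprocessing import Process

def child():
    # Fails with an "initialization error" if the parent already touched CUDA.
    x = torch.zeros(1).cuda()

if __name__ == '__main__':
    torch.zeros(1).cuda()      # initializes CUDA in the parent process
    p = Process(target=child)  # default start method on Linux is fork
    p.start()
    p.join()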