
Forking is not possible anymore when using any PyTorch version from about 2 months ago #20048

@Amir-Arsalan

Description


I have been building PyTorch from source for the past couple of months. Since about one (or maybe two) month(s) ago, I have been unable to run forward passes on my models when using PyTorch's multiprocessing package. If I'm not mistaken, the last version of PyTorch I built that still allows forking is this one; more precisely, the version I am using now that still allows forking is 1.1.0a0+3a01a45.

Not being able to fork is quite disruptive, because my entire workflow depends on using multiprocessing.Process with forking. In one part of my workflow I start tens of processes and run forward passes on a model that resides in CPU memory. This is very efficient, since the model's weights are not copied each time a new process starts. If I spawn processes instead, the weights have to be copied into every worker, which loses that efficiency.
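The fork-time sharing described above can be sketched with plain multiprocessing (the dict is a hypothetical stand-in for a loaded model; the 'fork' start method is Unix-only):

```python
import multiprocessing as mp

# Hypothetical stand-in for a loaded model: under the 'fork' start method,
# workers inherit the parent's address space, so this object is visible in
# every child without being pickled or eagerly copied.
model = {"weights": list(range(1000))}

def do_forward_pass(idx, queue):
    # The forked child reads the parent's `model` global directly.
    queue.put(sum(model["weights"]) + idx)

if __name__ == "__main__":
    ctx = mp.get_context("fork")  # request fork explicitly; Unix-only
    queue = ctx.Queue()
    procs = [ctx.Process(target=do_forward_pass, args=(i, queue))
             for i in range(4)]
    for p in procs:
        p.start()
    results = sorted(queue.get() for _ in procs)
    for p in procs:
        p.join()
    print(results)  # [499500, 499501, 499502, 499503]
```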

In another part of my framework, I start a process while the model is still in CPU memory, call model.cuda() inside the process, and the model is then copied to GPU memory. This is somewhat less efficient, but it still lets me start 4-5 processes (depending on GPU memory) and forward-pass my inputs.
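For contrast, the spawn path mentioned earlier has to re-create state in every worker: whatever a child needs is pickled and sent over, so each worker holds its own copy of the model. A minimal sketch with plain multiprocessing (the dict is again a hypothetical stand-in for the model's weights):

```python
import multiprocessing as mp

def do_forward_pass(model, x, queue):
    # Under 'spawn' the child starts a fresh interpreter, so `model`
    # arrives via pickling: every worker holds a private copy, which is
    # the per-process memory cost of giving up fork.
    queue.put(sum(model["weights"]) * x)

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    model = {"weights": [1, 2, 3]}
    queue = ctx.Queue()
    procs = [ctx.Process(target=do_forward_pass, args=(model, i, queue))
             for i in range(3)]
    for p in procs:
        p.start()
    results = sorted(queue.get() for _ in procs)
    for p in procs:
        p.join()
    print(results)  # [0, 6, 12]
```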

Here's pseudocode of what I do:

from torch.multiprocessing import Process

model = loadModel()  # somehow load a model (e.g. from torchvision)
inputList = loadListOfInputs()  # somehow get the list of input tensors

def doForwardPass(input):
    # model.cuda()  # uncommenting this used to work as well
    output = model(input)

processes = []
for i in range(100):
    processes.append(Process(target=doForwardPass, kwargs={'input': inputList[i]}))
    processes[-1].start()

for p in processes:
    p.join()

With the current versions of PyTorch (since 1-2 months ago) this no longer works. I'm afraid this is an indirect effect of commits that did not intend to disable forking. So I wonder if you can either revert the changes that caused this or re-enable this capability.

Also, when using some recent PyTorch versions from the master branch (since 1-2 months ago), I may get the following error if I call model.cuda() inside a started process:

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=51 error=3 : initialization error

This might be somewhat relevant to lazy initialization of CUDA.
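I can't point to the exact commit, but the failure mode is consistent with a general fork() constraint: state that depends on helper threads (as an initialized CUDA context does) does not survive into the forked child. A rough analogy with plain Python threads, no CUDA involved (Unix-only):

```python
import os
import threading
import time

# Start a helper thread in the parent, loosely analogous to the worker
# threads a CUDA context creates once it has been initialized.
started = threading.Event()
helper = threading.Thread(target=lambda: (started.set(), time.sleep(30)),
                          daemon=True)
helper.start()
started.wait()

pid = os.fork()
if pid == 0:
    # Only the forking thread survives fork(): the helper is gone, and any
    # state it guarded is stranded. Initializing CUDA before forking puts
    # the child in a similarly broken position.
    print(threading.active_count(), flush=True)  # 1
    os._exit(0)
else:
    os.waitpid(pid, 0)
    print(threading.active_count(), flush=True)  # 2 (main + helper)
```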

Metadata

Assignees

No one assigned

    Labels

    high priority
    module: cuda (Related to torch.cuda, and CUDA support in general)
    module: regression (It used to work, and now it doesn't)
    triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
