RPC should destroy/deinitialize dist autograd container when rpc is shutdown

## 🚀 Feature/Enhnacement

During RPC initialization, each participating process initializes a singleton `DistAutogradContainer`. During shutdown, we cleanup items such as the RRef context and RPC agent, but do not reset this singleton dist autograd container. This prevents RPC from being reinitialized (i.e. a group of processes cannot call `rpc.shutdown() ; dist.barrier() ; rpc.init_rpc(...)`. It will fail with the following error:

```
/test/distributed/rpc/faulty_agent/rpc_spawn_faulty#binary,link-tree
/torch/distributed/rpc/__init__.py", line 85, in init_rpc
>     dist_autograd._init(rank)
> RuntimeError: Container is already initialized! Cannot initialize it twice!
```

## Motivation
I was looking into seeing if we could reuse spawned subprocesses across our RPC tests, which would decrease the amount of time it takes to run the RPC test suite; however, this would require processes to be able to re-init RPC, which is blocked by this error. Also, I was writing a set of similar tests in https://github.com/pytorch/pytorch/pull/38590, and wanted to use this pattern to reduce code duplication, but couldn't (the tests need to use different initializations of RPC since they test the RRef deletion behavior triggered by the shutdown sequence). 

## Pitch
I don't know if we necessarily need to destroy the container, maybe we could just de-initialize it, and then re-init if RPC is being started back up again.

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @rohan-varma @xush6528 @jjlilley @osalpekar

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

RPC should destroy/deinitialize dist autograd container when rpc is shutdown #39340

🚀 Feature/Enhnacement

Motivation

Pitch

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

RPC should destroy/deinitialize dist autograd container when rpc is shutdown #39340

Description

🚀 Feature/Enhnacement

Motivation

Pitch

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions