Commit 4549d42

Merge branch 'optimizeAliasAnalysis' of github.com:Chillee/pytorch into optimizeAliasAnalysis

2 parents: d0297c5 + 49602ce

15 files changed (+65, -46 lines)

.gitmodules

Lines changed: 2 additions & 2 deletions

@@ -21,7 +21,7 @@
 [submodule "third_party/protobuf"]
     ignore = dirty
     path = third_party/protobuf
-    url = https://github.com/google/protobuf.git
+    url = https://github.com/protocolbuffers/protobuf.git
 [submodule "third_party/ios-cmake"]
     ignore = dirty
     path = third_party/ios-cmake
@@ -57,7 +57,7 @@
 [submodule "third-party/cpuinfo"]
     ignore = dirty
     path = third_party/cpuinfo
-    url = https://github.com/Maratyszcza/cpuinfo.git
+    url = https://github.com/pytorch/cpuinfo.git
 [submodule "third_party/python-enum"]
     ignore = dirty
     path = third_party/python-enum

aten/src/TH/generic/THTensor.cpp

Lines changed: 1 addition & 1 deletion

@@ -200,7 +200,7 @@ THTensor *THTensor_(newView)(THTensor *tensor, at::IntArrayRef size)
                              inferred_size);
   THArgCheck(stride.has_value(), 2, "view size is "
     "not compatible with input tensor's size and stride (at least one dimension spans "
-    "across two contiguous subspaces). Call .contiguous() before .view().");
+    "across two contiguous subspaces). Use .reshape(...) instead.");
   auto stride_value = *stride;
   THTensor_setStorage(self, THTensor_getStoragePtr(tensor), tensor->storage_offset(), inferred_size, stride_value);
   return self;

aten/src/THC/generic/THCTensor.cpp

Lines changed: 1 addition & 1 deletion

@@ -206,7 +206,7 @@ THCTensor *THCTensor_(newView)(THCState *state, THCTensor *tensor, at::IntArrayRef size)
                               inferred_size);
   THArgCheck(stride.has_value(), 2, "view size is "
     "not compatible with input tensor's size and stride (at least one dimension spans "
-    "across two contiguous subspaces). Call .contiguous() before .view().");
+    "across two contiguous subspaces). Use .reshape(...) instead.");
   auto stride_value = *stride;

   // NOTE: This path of constructing the Tensor directly with the viewed Storage is necessary
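Context for the message change in both files: ``view`` fails when the requested shape cannot be expressed with the tensor's existing strides, while ``reshape`` falls back to a copy. A minimal sketch, not part of the commit:

import torch

t = torch.randn(2, 3, 4).transpose(0, 1)  # non-contiguous, shape (3, 2, 4)
# t.view(6, 4) would raise the "view size is not compatible ..." error above;
# .contiguous().view(6, 4) still works, but reshape handles both cases:
u = t.reshape(6, 4)  # copies only when the strides make a view impossible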

docs/source/multiprocessing.rst

Lines changed: 10 additions & 2 deletions

@@ -19,6 +19,9 @@ Strategy management
 .. autofunction:: get_sharing_strategy
 .. autofunction:: set_sharing_strategy
 
+
+.. _multiprocessing-cuda-sharing-details:
+
 Sharing CUDA tensors
 --------------------
 
@@ -28,8 +31,13 @@ Python 2 can only create subprocesses using ``fork``, and it's not supported
 by the CUDA runtime.
 
 Unlike CPU tensors, the sending process is required to keep the original tensor
-as long as the receiving process retains a copy of the tensor. It is implemented
-under the hood but requires users to follow the next best practices.
+as long as the receiving process retains a copy of the tensor. The refcounting is
+implemented under the hood but requires users to follow the best practices below.
+
+.. warning::
+    If the consumer process dies abnormally due to a fatal signal, the shared
+    tensor could be kept in memory forever as long as the sending process is
+    running.
 
 1. Release memory ASAP in the consumer.
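A minimal sketch (not part of the commit) of practice 1 on the consumer side; ``queue`` is a ``torch.multiprocessing`` queue and ``work`` is a hypothetical user function:

def consumer(queue):
    tensor = queue.get()   # refers to memory owned by the sending process
    result = work(tensor)  # hypothetical processing function
    del tensor             # drop the reference as soon as it is unneeded
    return result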

docs/source/notes/cuda.rst

Lines changed: 3 additions & 2 deletions

@@ -277,8 +277,9 @@ memory. CPU tensors and storages expose a :meth:`~torch.Tensor.pin_memory`
 method, that returns a copy of the object, with data put in a pinned region.
 
 Also, once you pin a tensor or storage, you can use asynchronous GPU copies.
-Just pass an additional ``non_blocking=True`` argument to a :meth:`~torch.Tensor.cuda`
-call. This can be used to overlap data transfers with computation.
+Just pass an additional ``non_blocking=True`` argument to a
+:meth:`~torch.Tensor.to` or a :meth:`~torch.Tensor.cuda` call. This can be used
+to overlap data transfers with computation.
 
 You can make the :class:`~torch.utils.data.DataLoader` return batches placed in
 pinned memory by passing ``pin_memory=True`` to its constructor.
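A minimal sketch of the pattern described above (not part of this commit); ``dataset`` and ``model`` are hypothetical stand-ins:

import torch
from torch.utils.data import DataLoader

loader = DataLoader(dataset, batch_size=64, pin_memory=True)  # pinned host batches
for x, y in loader:
    # asynchronous host-to-device copies; can overlap with GPU computation
    x = x.to('cuda', non_blocking=True)
    y = y.to('cuda', non_blocking=True)
    out = model(x)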

docs/source/notes/multiprocessing.rst

Lines changed: 18 additions & 14 deletions

@@ -20,22 +20,26 @@ memory and will only send a handle to another process.
 This allows to implement various training methods, like Hogwild, A3C, or any
 others that require asynchronous operation.
 
-Sharing CUDA tensors
---------------------
-
-Sharing CUDA tensors between processes is supported only in Python 3, using
-a ``spawn`` or ``forkserver`` start methods. :mod:`python:multiprocessing` in
-Python 2 can only create subprocesses using ``fork``, and it's not supported
-by the CUDA runtime.
+CUDA in multiprocessing
+-----------------------
 
-.. warning::
+The CUDA runtime does not support the ``fork`` start method. However,
+:mod:`python:multiprocessing` in Python 2 can only create subprocesses using
+``fork``, so Python 3 and either the ``spawn`` or ``forkserver`` start method
+are required to use CUDA in subprocesses.
 
-    CUDA API requires that the allocation exported to other processes remains
-    valid as long as it's used by them. You should be careful and ensure that
-    CUDA tensors you shared don't go out of scope as long as it's necessary.
-    This shouldn't be a problem for sharing model parameters, but passing other
-    kinds of data should be done with care. Note that this restriction doesn't
-    apply to shared CPU memory.
+.. note::
+    The start method can be set via either creating a context with
+    ``multiprocessing.get_context(...)`` or directly using
+    ``multiprocessing.set_start_method(...)``.
+
+Unlike CPU tensors, the sending process is required to keep the original tensor
+as long as the receiving process retains a copy of the tensor. The refcounting
+is implemented under the hood but requires users to follow best practices for
+the program to run correctly: for example, the sending process must stay alive
+as long as the consumer process has references to the tensor, and refcounting
+cannot save you if the consumer process exits abnormally via a fatal signal. See
+:ref:`this section <multiprocessing-cuda-sharing-details>`.
 
 See also: :ref:`cuda-nn-dataparallel-instead`
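A minimal sketch (not part of this commit) of sharing a CUDA tensor under the ``spawn`` start method; it assumes a CUDA-capable machine:

import torch
import torch.multiprocessing as mp

def consumer(queue):
    t = queue.get()  # a handle into the producer's CUDA allocation
    print(t.sum().item())
    del t            # release the reference as soon as possible

if __name__ == '__main__':
    ctx = mp.get_context('spawn')  # CUDA requires spawn or forkserver
    queue = ctx.Queue()
    producer_tensor = torch.ones(4, device='cuda')
    p = ctx.Process(target=consumer, args=(queue,))
    p.start()
    queue.put(producer_tensor)  # producer must keep this alive while shared
    p.join()                    # joining ensures the tensor outlives the consumer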

setup.py

Lines changed: 4 additions & 0 deletions

@@ -288,6 +288,10 @@ def check_file(f):
     check_file(os.path.join(third_party_path, 'foxi', 'CMakeLists.txt'))
     check_file(os.path.join(third_party_path, 'QNNPACK', 'CMakeLists.txt'))
     check_file(os.path.join(third_party_path, 'fbgemm', 'CMakeLists.txt'))
+    check_file(os.path.join(third_party_path, 'fbgemm', 'third_party',
+                            'asmjit', 'CMakeLists.txt'))
+    check_file(os.path.join(third_party_path, 'onnx', 'third_party',
+                            'benchmark', 'CMakeLists.txt'))
 
     check_pydep('yaml', 'pyyaml')
     check_pydep('typing', 'typing')
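For context, ``check_file`` is defined earlier in setup.py and only named in this hunk's header; a hedged sketch of the kind of guard these calls rely on (the real body may differ):

import os
import sys

def check_file(f):
    # Fail the build early when a required submodule file is missing.
    if not os.path.exists(f):
        print('Could not find {}'.format(f))
        print("Did you run 'git submodule update --init --recursive'?")
        sys.exit(1)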

third_party/nccl/nccl

Submodule nccl updated 81 files

torch/_torch_docs.py

Lines changed: 4 additions & 4 deletions

@@ -3583,13 +3583,13 @@ def merge_dicts(*dicts):
 
 add_docstr(torch.ones,
            r"""
-ones(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) -> Tensor
+ones(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) -> Tensor
 
 Returns a tensor filled with the scalar value `1`, with the shape defined
 by the variable argument :attr:`sizes`.
 
 Args:
-    sizes (int...): a sequence of integers defining the shape of the output tensor.
+    size (int...): a sequence of integers defining the shape of the output tensor.
         Can be a variable number of arguments or a collection like a list or tuple.
     {out}
     {dtype}
@@ -5632,13 +5632,13 @@ def merge_dicts(*dicts):
 
 add_docstr(torch.zeros,
            r"""
-zeros(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) -> Tensor
+zeros(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) -> Tensor
 
 Returns a tensor filled with the scalar value `0`, with the shape defined
 by the variable argument :attr:`sizes`.
 
 Args:
-    sizes (int...): a sequence of integers defining the shape of the output tensor.
+    size (int...): a sequence of integers defining the shape of the output tensor.
         Can be a variable number of arguments or a collection like a list or tuple.
     {out}
     {dtype}
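A quick illustration (not part of the commit) of the two calling conventions the renamed ``size`` parameter supports:

>>> torch.ones(2, 3)        # size as a variable number of arguments
tensor([[1., 1., 1.],
        [1., 1., 1.]])
>>> torch.zeros((2, 3))     # size as a single collection
tensor([[0., 0., 0.],
        [0., 0., 0.]])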
