Conversation

@goldsborough (Contributor) commented May 26, 2018

This PR is a prototype / WIP for a broad change to the way tensors are created in ATen. The fundamental motivation is to make tensor creation from C++ more like in Python. That means changing the API of factory functions like at::ones, and teaching ATen about construction axes beyond the backend and scalar type, especially the device. This change essentially lays the groundwork for how tensor creation will happen in c10, and thus defines a new API, currently bolted on top of existing ATen.

This PR proposes the following:

Creation of a type encapsulating all construction axes of a tensor

At the moment, this type will contain:

  • A scalar type,
  • A backend,

And in the near future shall contain (and already does somewhat in this prototype):

  • A device ordinal,
  • A layout (strided, coo_sparse, etc.),
  • A requires_grad boolean.

in code:

struct TensorOptions {
  size_t device = 0;
  Layout layout = kStrided;
  ScalarType dtype = kFloat;
  Backend backend = kCPU;
  bool requires_grad = false;
};

Each of these fields has a default value, such that a default-constructed TensorOptions object fully specifies a new tensor.

Furthermore, this class shall have chaining methods:

TensorOptions().device(1).dtype(kInt)

and there shall be free functions, named after the fields, that return a new TensorOptions instance:

// dtype is a free function that returns a new TensorOptions
auto options = dtype(kFloat).requires_grad(true); 

Use of TensorOptions in all factory methods

The new API for tensor creation shall then be (e.g. for ones):

Tensor ones(IntList size, TensorOptions options = {}) { }

with usage examples such as

at::ones({2, 2}); // float cpu tensor
at::ones({2, 2}, kInt); // int cpu tensor
at::ones({2, 2}, type(kInt).requires_grad(true)); // int cpu tensor that sets requires_grad for variables

New code path

This PR also implements a new code path for factory methods. As a reminder, at the moment we create tensors via, e.g.,

at::ones(at::CPU(at::kFloat), {3, 3})

which dispatches to

at::CPU(at::kFloat).ones({3, 3});

which itself calls

at::native::ones(at::CPU(at::kFloat), {3, 3})

which then calls

auto tensor = at::CPU(at::kFloat).tensor({3, 3});
tensor.fill_(1);

This PR proposes to remove factory methods from Type and to skip dispatch via Type altogether for all factory methods except tensor, which remains the fundamental source of tensors invoked by all other tensor factories:

at::ones({3, 3}, at::type(at::kFloat).device(at::CPU))

now directly calls

at::native::ones({3, 3}, at::type(at::kFloat).device(at::CPU))

which now does

// `TensorOptions::type()` returns the type corresponding to its `backend` and `scalar_type`
auto tensor = tensor_options.type().tensor({3, 3});
tensor.fill_(1);

At the moment, I have implemented this for ones, and I think nothing broke. There should be no implications for the Python API.

@zdevito @apaszke @gchanan @colesbury @ezyang

Fixes #6285
Fixes #6286
Fixes #7735

@ezyang (Contributor) commented May 30, 2018

In PyTorch, there is a context manager which can be used to change the default tensor options, e.g., changing everything from float to double. Do you plan to eventually support this? If so, with the design as written here you are committing to these values being read at TensorOptions construction time (since these are direct values, not optionals).

@goldsborough (Contributor, Author)

That's a very good point. I think it would be easy to implement with thread local globals and a RAII mechanism to set defaults in a particular scope. I would personally think it's fine that values are fixed at the call site (where you write at::ones(...)). Do you foresee a need to defer the point in time when values are fixed?

@gchanan (Contributor) commented May 30, 2018

Why does TensorOptions have backend and integral device? Why not have device objects (like python) and get rid of backend (which can be inferred from device and layout)?

@goldsborough (Contributor, Author)

That's coming.

@goldsborough goldsborough force-pushed the tensor-options branch 7 times, most recently from 1aec84c to 1b6813b Compare June 4, 2018 00:55
@goldsborough goldsborough force-pushed the tensor-options branch 4 times, most recently from 755fa3e to 4e3cb37 Compare June 5, 2018 07:23


@ezyang (Contributor) left a comment:

ship it already yo

@zdevito (Contributor) left a comment:

There are some details about TensorOptions, like the underlying Type* field, that we might want to figure out better solutions for. However, this PR is getting too big, so assuming there are no known bugs, I think we should merge and revisit those nits afterwards.

@goldsborough goldsborough force-pushed the tensor-options branch 7 times, most recently from bb97ef5 to 87159de Compare June 16, 2018 00:31
Storing the type in TensorOptions to solve the Variable problem

Created convenience creation functions for TensorOptions and added tests

Converted zeros to TensorOptions

Converted rand to TensorOptions

Fix codegen for TensorOptions and multiple arguments

Put TensorOptions convenience functions into torch namespace too

All factory functions except *_like support TensorOptions

Integrated with recent JIT changes

Support *_like functions

Fix in place modification

Some cleanups and fixes

Support sparse_coo_tensor

Fix bug in Type.cpp

Fix .empty calls in C++ API

Fix bug in Type.cpp

Trying to fix device placement

Make AutoGPU CPU compatible

Remove some auto_gpu.h uses

Fixing some headers

Fix some remaining CUDA/AutoGPU issues

Fix some AutoGPU uses

Fixes to dispatch_tensor_conversion

Reset version of new variables to zero

Implemented parsing device strings

Random fixes to tests

Self review cleanups

flake8

Undo changes to variable.{h,cpp} because they fail on gcc7.2

Add [cuda] tag to tensor_options_cuda.cpp

Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks

Fix linker error in AutoGPU.cpp

Fix bad merge conflict in native_functions.yaml

Fixed caffe2/contrib/aten

Fix new window functions added to TensorFactories.cpp

Added code to generate wrapper functions for factory methods

Add implicit constructor from Backend to TensorOptions

Remove Var() from C++ API and use torch:: functions

Use torch:: functions more subtly in C++ API

Make AutoGPU::set_device more exception safe

Check status directly in DynamicCUDAHooksInterface

Rename AutoGPU to DeviceGuard

Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad

remove python_default_init: self.type()

Add back original factory functions, but with deprecation warnings

Disable DeviceGuard for a couple functions in ATen

Remove print statement

Fix DeviceGuard construction from undefined tensor

Fixing CUDA device compiler issues

Moved as many methods as possible into header files

Dont generate python functions for deprecated factories

Remove merge conflict artefact

Fix tensor_options_cuda.cpp

Fix set_requires_grad not being checked

Fix tensor_new.h

TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac

Fix bug in DeviceGuard.h

Missing includes

TEMPORARILY moving a few more methods into .cpp to see if it fixes windows

Fixing linker errors

Undo device agnostic behavior of DeviceGuard

Use -1 instead of optional for default device index

Also move DeviceGuard methods into header

Fixes around device index after optional -> int32_t switch

Fix use of DeviceGuard in new_with_tensor_copy

Fix tensor_options.cpp

@goldsborough goldsborough merged commit 372d1d6 into pytorch:master Jun 16, 2018
@goldsborough goldsborough deleted the tensor-options branch June 16, 2018 07:40



8 participants