Conversation

@realdoug
Contributor

@realdoug realdoug commented Sep 6, 2018

Couple questions:

  1. I used the log1p implementation in Add log1p for sparse tensor #8969 as a guide, especially for testing. I'm not sure what the @skipIfROCM annotation is for, so I'm unsure whether I need it for my test.

  2. I implemented the branching logic in the narrow function itself; is this the right place to do so? I noticed that there are a number of places where sparse-specific logic is handled with just an if statement in this file. Or should I implement a separate dispatch in native_functions.yaml, as was done for log1p?

And of course, I'm happy to make any other updates/changes that I may have missed. This is my first PR to the project.

@ezyang
Contributor

ezyang commented Sep 9, 2018

I didn't review the code closely, but IIUC you are not sharing storage with the old tensor. Since that is the case, you should not call it narrow: part of the API contract of narrow is that it shares storage.

Tensor _narrow_sparse(const Tensor& self, int64_t dim, int64_t start, int64_t length){
LongTensor indices = self._indices();
Tensor values = self._values();
int64_t numCoords = indices.size(1);

Contributor

@ezyang ezyang left a comment

Unfortunately, this needs some work.

The big problem is that, on CUDA, this kernel is written in a hugely inefficient way, because the iterated calls to toCLong each do a CUDA synchronization. That's a lot of synchronizations. There is also a code smell: the loops are written by hand, which means there's no opportunity for vectorization or parallelism.

Assuming that you want to write a non-storage-sharing narrow, I think a better strategy is to compute a boolean mask of the indices to keep, and then use masked_select to grab them from indices and values. That should eliminate all of the loops.
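For illustration, here is a rough pure-Python model of that mask-based strategy, with plain lists of coordinate tuples standing in for the indices/values tensors (the function and variable names are invented for this sketch; they are not PyTorch API):

```python
def narrow_coo(indices, values, dim, start, length):
    # indices: one coordinate tuple per non-zero entry; values: matching values.
    # Keep the entries whose coordinate along `dim` falls in
    # [start, start + length), then shift that coordinate down by `start`.
    end = start + length
    # Boolean mask over entries, analogous to (idx >= start) & (idx < end).
    mask = [start <= coord[dim] < end for coord in indices]
    new_indices = [
        tuple(c - start if d == dim else c for d, c in enumerate(coord))
        for coord, keep in zip(indices, mask) if keep
    ]
    new_values = [v for v, keep in zip(values, mask) if keep]
    return new_indices, new_values
```

In ATen the mask would be a bool tensor and the filtering a single masked_select call, so no per-element host loop (and hence no per-element CUDA synchronization) is needed.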

@ezyang
Contributor

ezyang commented Sep 9, 2018

To answer your questions:

I used the log1p implementation in #8969 as a guide, especially for testing. I'm not sure what the @skipIfROCM annotation is for, so I'm unsure whether I need it for my test.

skipIfROCM turns off the test for AMD GPUs. I would advise you to omit that for now, and then add it if the AMD tests fail.

I implemented the branching logic in the narrow function itself; is this the right place to do so? I noticed that there are a number of places where sparse-specific logic is handled with just an if statement in this file. Or should I implement a separate dispatch in native_functions.yaml, as was done for log1p?

Moot if you don't call this narrow.

@realdoug
Contributor Author

realdoug commented Sep 10, 2018

@ezyang Thanks for the comments. My intention is to mirror the functionality of the dense narrow() as closely as I can, so I'd prefer to share storage wherever possible.

A) For sizes & indices, the values must change, so I don't think sharing storage is possible. Am I correct in saying that? I can still eliminate the loop with masked_select, though.

B) For the values tensor, it sounds like I can just compute a boolean mask to share storage. Will do.

If I'm correct in A & B, do you still suggest a name change for the function?

@realdoug realdoug force-pushed the narrow-sparse branch 3 times, most recently from 8187c8b to 0cc6561 on September 10, 2018 at 22:16
@realdoug
Contributor Author

realdoug commented Sep 10, 2018

I've pushed a new version with the rename and the for loops replaced. This implementation does not share storage, but it is an improvement over to_dense().narrow().

If there is an easy way to do Tensor newValues = self._values().masked_select(mask);
while preserving shared storage, let me know. Otherwise I think I'd have to rely on/enforce that the values tensor is contiguous, meaning indices must be sorted by the dimension being narrowed on.

@ezyang
Contributor

ezyang commented Sep 12, 2018

Yes, I think I agree that writing a storage sharing narrow seems difficult. It might be possible for coalesced tensors, if you only narrow a trailing dimension, but that seems like a limited enough case that we shouldn't bother.

If there is an easy way to do Tensor newValues = self._values().masked_select(mask);
while preserving shared storage, let me know. Otherwise I think I'd have to rely on/enforce that the values tensor is contiguous, meaning indices must be sorted by the dimension being narrowed on.

I don't see how your second sentence follows from the first. There's no way to do a masked_select that preserves shared storage. But if you select the correct values for the indices in the narrow range, I don't see why they have to be sorted. You've already committed to doing an O(n) operation...

Tensor newIndices = indices.masked_select(mask).view({dims, -1});
return self.type().sparse_coo_tensor(newIndices, newValues, newSizes);
} else {
return self.clone().narrow(dim, start, length);

@ezyang
Contributor

ezyang commented Sep 12, 2018

CC @yf225 @weiyangfb @gchanan if you have any opinions about the naming. Maybe copy_narrow is marginally better than narrow_copy, because it has the correct order of operations in English.

newSizes[dim] = length;

Tensor narrowDim = at::zeros_like(indices[dim]);
narrowDim.copy_(indices[dim]);

narrowDim.copy_(indices[dim]);
Tensor mask = (narrowDim >= start).__and__((narrowDim < end));

indices[dim] = indices[dim].add(-start);

@ezyang
Contributor

ezyang commented Sep 12, 2018

New functions would need docs.

Contributor

@ezyang ezyang left a comment

Still buggy.

@realdoug
Contributor Author

realdoug commented Sep 12, 2018

I don't see how your second sentence follows from the first. There's no way to do a masked_select that preserves shared storage. But if you select the correct values for the indices in the narrow range, I don't see why they have to be sorted. You've already committed to doing an O(n) operation...

Sorry, I skipped a few logical steps there. I was just referring to the fact that you can theoretically do values[start:end], which does preserve storage.
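As a stdlib analogy (a memoryview over a bytearray, rather than a tensor), a contiguous slice like values[start:end] can alias the original buffer with no copy, which is exactly the property a mask-based masked_select cannot provide:

```python
buf = bytearray(b"abcdef")
view = memoryview(buf)[2:5]  # the slice aliases buf; no data is copied
buf[3] = ord("X")            # mutate the underlying storage
print(bytes(view))           # → b'cXe': the slice observes the change
```

This only works because the kept elements are contiguous; elements selected by an arbitrary boolean mask are scattered, so masked_select has to copy them out.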

@realdoug
Contributor Author

Alrighty, added docs & a simplification re: the convo above.

r"""
narrow(dimension, start, length) -> Tensor
Same functionality as :meth:`Tensor.narrow` except returning a full copy,

Contributor

@weiyangfb weiyangfb left a comment

This PR still requires some changes

narrow_copy(dimension, start, length) -> Tensor
Same as :meth:`Tensor.narrow` except returning a copy rather
than shared storage. This is primarily for sparse tensors, which

Tensor narrow_copy_sparse(const Tensor& self, int64_t dim, int64_t start, int64_t length){
AT_CHECK(self.dim() > 0, "narrow() cannot be applied to a 0-dim tensor.");
auto cur_size = self.size(dim);
AT_CHECK(length >= 0 && start <= cur_size - length,
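The bounds test in that AT_CHECK can be modeled in Python (a hypothetical helper, assuming a negative start has already been normalized before this point):

```python
def check_narrow_args(cur_size, start, length):
    # Mirrors `length >= 0 && start <= cur_size - length`: the narrowed
    # range [start, start + length) must fit inside the dimension.
    if not (length >= 0 and start <= cur_size - length):
        raise ValueError(
            "start (%d) + length (%d) exceeds dimension size (%d)"
            % (start, length, cur_size)
        )
```

Note that the expression never computes start + length directly, which sidesteps signed overflow for extreme inputs.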


with_dense, _, _ = self._gen_sparse(2, 7, shape)
for narrow_args in self._all_narrow_combs(shape):
self._test_narrow(with_dense, narrow_args)

self._test_narrow(with_dense, narrow_args)
self._test_narrow(with_dense, narrow_args)

self.assertRaises(RuntimeError, lambda: input.narrow_copy(10, 0, 3)) # dim > sparseDim + denseDim

input, _, _ = self._gen_sparse(4, 19, shape)
for narrow_args in self._all_narrow_combs(shape):
self._test_narrow(input, narrow_args)
self._test_narrow(input.coalesce(), narrow_args)


Tensor narrow_copy_sparse(const Tensor& self, int64_t dim, int64_t start, int64_t length){
int64_t allDim = self.dim();
int64_t end = start + length;

Contributor

@facebook-github-bot facebook-github-bot left a comment

weiyangfb has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@weiyangfb
Contributor

Looks good! Thanks @realdoug!

zdevito pushed a commit to zdevito/ATen that referenced this pull request Sep 26, 2018
Summary:
Couple questions:

1) I used the log1p implementation in #8969 as a guide, especially for testing.  I'm not sure what the ```skipIfROCM``` annotation is for, so I'm unsure whether I need it for my test.

2) I implemented the branching logic in the narrow function itself; is this the right place to do so?  I noticed that there are a number of places where sparse-specific logic is handled with just an if statement in this file.  Or should I implement a separate dispatch in native_functions.yaml, as was done for log1p?

And of course, I'm happy to make any other updates/changes that I may have missed.  This is my first PR to the project.
Pull Request resolved: pytorch/pytorch#11342

Differential Revision: D9978430

Pulled By: weiyangfb

fbshipit-source-id: e73dc20302ab58925afb19e609e31f4a38c634ad
petrex pushed a commit to petrex/pytorch that referenced this pull request Sep 26, 2018
* upstream/master: (117 commits)
  Add full namespace resolution in CAFFE_DURATION (pytorch#12065)
  T33898723: Simple put operators for caffe2 stats (pytorch#12057)
  add narrow() support for sparse tensors re: pytorch#8853 (pytorch#11342)
  Fix ONNX bug, add symbolic for full
  Enable tracing of tensor factories with an out argument
  Fix warnings emitted when testing distributions (pytorch#12038)
  Unify versions across setup.py, libtorch, and libcaffe2 (pytorch#12053)
  add autodiff expressions for common operations (pytorch#11832)
  Blob doesn't allow access to destroyCall anymore (pytorch#11548)
  IValue can store Blob (pytorch#11414)
  Move Blob to ATen/core (pytorch#11924)
  Use tempfile during serialized test comparison (pytorch#12021)
  fix segfault when grad to a hook fn is None (pytorch#12028)
  Fallback CreateMutex/AtomicIter operators for mkl-dnn
  Unify all *_EXPORT and *_IMPORT macros across c++ backend (pytorch#12019)
  Add safety asserts for methods on TensorImpl which don't work on Variable. (pytorch#12058)
  Make USE_IDEEP work again (pytorch#12026)
  Fix "identifier following the 'template' keyword does not refer to a template" (pytorch#12037)
  Delete some unused variables. (pytorch#12059)
  Support TypeIdentifier::name() (pytorch#12036)
  ...