speed up torch.sparse_mask() cpu kernel #13290
Conversation
facebook-github-bot
left a comment
@weiyangfb has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
torch/_tensor_docs.py (Outdated)
I'm not sure what you mean by this.
ezyang
left a comment
Good work, thank you.
@ezyang I meant the code for the condition `D.dim > S.sparse_dim`; I just copied and pasted from pytorch/aten/src/ATen/native/sparse/cuda/SparseCUDATensor.cpp, lines 33 to 51 at 482b136.
It's OK, I wouldn't block the patch on it.
facebook-github-bot
left a comment
@weiyangfb has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
5920e0a to 7e8d4e1 (force-push)
facebook-github-bot
left a comment
@weiyangfb has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
…, copy over CUDA kernel implementation
7e8d4e1 to e12531f (force-push)
facebook-github-bot
left a comment
@weiyangfb has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary:
- `sparse_mask(D, S)` is useful for implementing the backward pass of `sparse_addmm()`
- the previous `sparse_mask(D, S)` CPU kernel was not parallelized
- this PR speeds up the CPU kernel in two separate cases:
  - `D.dim == S.sparse_dim`: simply parallelize the kernel
  - `D.dim > S.sparse_dim`: simply reuse the CUDA kernel implementation
- performance:
`D.dim == S.sparse_dim`
```
>>> nnz = 100000
>>> dims = [1000, 1000]
>>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)),
...                torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz)
>>> size = torch.Size(dims)
>>> S = torch.sparse_coo_tensor(I, V, size).coalesce()
>>> D = torch.randn(dims)
>>> %timeit D.sparse_mask(S)
======= before change =======
6.4 ms ± 684 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
======= after change =======
333 µs ± 89.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
`D.dim > S.sparse_dim`
```
>>> nnz = 100000
>>> dims = [1000, 1000, 2, 2]
>>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)),
...                torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz, dims[2], dims[3])
>>> size = torch.Size(dims)
>>> S = torch.sparse_coo_tensor(I, V, size).coalesce()
>>> D = torch.randn(dims)
>>> %timeit D.sparse_mask(S)
======= before change =======
495 ms ± 41.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
======= after change =======
594 µs ± 68.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
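As a rough illustration of why the `D.dim == S.sparse_dim` case parallelizes so easily, here is a pure-Python sketch of the gather that `sparse_mask` performs (this is a hypothetical sketch, not the actual ATen kernel; `sparse_mask_values` is an invented name): each nonzero of the coalesced sparse tensor `S` independently reads one entry of the dense tensor `D`, so the loop over nnz has no cross-iteration dependencies.

```python
def sparse_mask_values(D, indices):
    """Gather the values of dense tensor D at the sparse index pattern.

    D: nested lists standing in for a dense 2-D tensor.
    indices: list of (row, col) coordinates from the coalesced sparse tensor S.
    S's own values are ignored; only its sparsity pattern is used.
    """
    # Each iteration is an independent read, so this loop is
    # embarrassingly parallel (the PR parallelizes the analogous C++ loop).
    # In the D.dim > S.sparse_dim case, each lookup would copy a whole
    # trailing dense slice instead of a single scalar.
    return [D[r][c] for (r, c) in indices]


D = [[0.0, 1.0],
     [2.0, 3.0]]
idx = [(0, 1), (1, 0)]
print(sparse_mask_values(D, idx))  # [1.0, 2.0]
```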
Pull Request resolved: pytorch/pytorch#13290
Differential Revision: D12878336
Pulled By: weiyangfb
fbshipit-source-id: 10b5981af382f7c6095a42c0fee7297d6438ce37