Conversation

@colesbury
Member

No description provided.

@colesbury colesbury merged commit 849794c into pytorch:master Dec 30, 2016
@colesbury colesbury deleted the delete branch December 30, 2016 23:39
jjsjann123 pushed a commit to jjsjann123/pytorch that referenced this pull request Sep 23, 2020
* Basic Write-After-Read (WAR) check to add __syncthreads to end of for-loop

* Enable Tiled GEMM example

* Check that IterDomain iterates from zero to some positive integer

Co-authored-by: Ryan Spring <rspring@nvidia.com>
jjsjann123 pushed a commit to jjsjann123/pytorch that referenced this pull request Sep 23, 2020
* Get a crazy test example working.

* Change problem size and tile size, still an issue with N > 32.

* Add sync threads in loops that read from smem, to make sure we finish reading before writing.

* Predicate off threads bound to a broadcast dim of an output when it's in shared memory.

* Predicate smem tiling writing based on broadcasted dims in consumer.

* Cleanup example a bit.

* Revert "Add sync threads in loops that read from smem, to make sure we finish reading before writing."

This reverts commit dffaa76.

Revert this in favor of pytorch#383

* Add __syncthreads for Write-After-Read Race (pytorch#383)

* Basic Write-After-Read (WAR) check to add __syncthreads to end of for-loop

* Enable Tiled GEMM example

* Check that IterDomain iterates from zero to some positive integer

Co-authored-by: Ryan Spring <rspring@nvidia.com>

* Refactor thread predication for writes to smem

Co-authored-by: Naoya Maruyama <nmaruyama@nvidia.com>
Co-authored-by: Ryan Spring <rdspring1@gmail.com>
Co-authored-by: Ryan Spring <rspring@nvidia.com>
jjsjann123 pushed a commit to jjsjann123/pytorch that referenced this pull request Sep 24, 2020
* Basic Write-After-Read (WAR) check to add __syncthreads to end of for-loop

* Enable Tiled GEMM example

* Check that IterDomain iterates from zero to some positive integer

Co-authored-by: Ryan Spring <rspring@nvidia.com>
jjsjann123 pushed a commit to jjsjann123/pytorch that referenced this pull request Sep 24, 2020
* Get a crazy test example working.

* Change problem size and tile size, still an issue with N > 32.

* Add sync threads in loops that read from smem, to make sure we finish reading before writing.

* Predicate off threads bound to a broadcast dim of an output when it's in shared memory.

* Predicate smem tiling writing based on broadcasted dims in consumer.

* Cleanup example a bit.

* Revert "Add sync threads in loops that read from smem, to make sure we finish reading before writing."

This reverts commit dffaa76.

Revert this in favor of pytorch#383

* Add __syncthreads for Write-After-Read Race (pytorch#383)

* Basic Write-After-Read (WAR) check to add __syncthreads to end of for-loop

* Enable Tiled GEMM example

* Check that IterDomain iterates from zero to some positive integer

Co-authored-by: Ryan Spring <rspring@nvidia.com>

* Refactor thread predication for writes to smem

Co-authored-by: Naoya Maruyama <nmaruyama@nvidia.com>
Co-authored-by: Ryan Spring <rdspring1@gmail.com>
Co-authored-by: Ryan Spring <rspring@nvidia.com>
KyleCZH pushed a commit to KyleCZH/pytorch that referenced this pull request Sep 20, 2021
eellison pushed a commit to eellison/pytorch that referenced this pull request Jun 29, 2022
* Adding nan

* Jason's comments
hubertlu-tw pushed a commit to hubertlu-tw/pytorch that referenced this pull request Nov 1, 2022
…te_term

Add support for fp16 update term (new UPD_T typename in template)