Open
Labels
module: cuda graphs (Ability to capture and then replay streams of CUDA kernels), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Description
The stalled #85519 adds the ability to dump CUDA graphs for inspection. We would like to extend it to:
- provide a way to get all nodes from the graph
- modify nodes to patch in new addresses for the inputs, so that inputs do not have to be copied. Note that this is more complicated than comparing old and new pointers and swapping one for the other: kernels can operate on input slices, so one needs to check whether each kernel argument falls within the address range of the original input tensor and, if so, replace it with the corresponding new address plus the same offset.
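
The range check described in the second item can be sketched in plain Python. This is only an illustration of the address arithmetic, not code from #85519 or any PyTorch API: `InputRegion` and `remap_pointer` are hypothetical names, addresses are modeled as plain integers, and no CUDA calls are made.

```python
from typing import NamedTuple, Sequence


class InputRegion(NamedTuple):
    """Hypothetical record of where an input lived at capture time and where it lives now."""
    old_base: int  # device address of the input during capture
    nbytes: int    # size of the input allocation in bytes
    new_base: int  # device address of the replacement input


def remap_pointer(arg: int, regions: Sequence[InputRegion]) -> int:
    """Return the patched value for a pointer-sized kernel argument.

    If arg falls inside the address range of a captured input, keep its
    offset and rebase it onto the new input; otherwise return it unchanged.
    """
    for r in regions:
        if r.old_base <= arg < r.old_base + r.nbytes:
            return r.new_base + (arg - r.old_base)
    return arg


regions = [InputRegion(old_base=0x1000, nbytes=0x400, new_base=0x9000)]
print(hex(remap_pointer(0x1000, regions)))  # start of the input -> 0x9000
print(hex(remap_pointer(0x1100, regions)))  # slice 0x100 bytes in -> 0x9100
print(hex(remap_pointer(0x2000, regions)))  # unrelated address -> 0x2000
```

One caveat with this sketch: it treats every argument value as a potential pointer, so a scalar argument that happens to fall inside an input's address range would also be rewritten. A real implementation would probably need some way to tell pointer arguments from other values.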