EXPERIMENTAL: Insert special debug ops (e.g., DebugIdentity) into the graph for debugging. Currently, each debug op must take exactly one input and have a string attribute "tensor_name" that identifies the tensor it watches.
For example, before the node insertion, the graph may look like:
A:0 -----------1----------> B
     |
     ---------2-----------> C
wherein output slot 0 of node A is fed as input to node B through
edge 1 and to node C through edge 2.
After the node insertion, assuming both B and C have non-Ref input, the graph becomes:
A:0 ---3---> Copy -----------4----------> B
                |
                ---------5--------> C
                |
                ---------6--------> X
If a node (e.g., B) has Ref input, the graph becomes:
     ----------------4---------------> B
     |
A:0 ---3-----> Copy -----------5----------> C
                  |
                  -----------6--------> X
In other words, the deep copy is never fed to a node through a Ref input; such nodes keep receiving the original tensor directly from A.
The Copy node is the inserted deep-copy node. It copies the input tensor on the same device (e.g., a CPU-to-CPU or GPU-to-GPU deep copy), which reduces the likelihood of racy updates during debug tensor-watching. X is the newly created debug node that transforms its input (the copy of the watched tensor) into a debug signal.
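The rewrite shown in the diagrams can be sketched on a toy graph. This is a minimal C++ illustration, not the actual implementation: ToyGraph, ToyNode, ToyEdge and InsertDebugNodes are hypothetical names rather than TensorFlow APIs, a single device is assumed, and the node naming scheme ("<src>/copy", "<tensor>/<op>") is an assumption made for readability.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified graph model; these are NOT TensorFlow's
// Graph/Node/Edge classes.
struct ToyNode {
  std::string name;
  std::string op;           // e.g. "Copy" or "DebugIdentity"
  std::string tensor_name;  // set only on debug nodes: the watched tensor
};

struct ToyEdge {
  std::string src;  // source node name
  int src_slot;     // output slot of the source (-1 for control edges)
  std::string dst;  // destination node name
  bool is_ref;      // destination consumes this input as a Ref
  bool is_control;  // control edge: ordering only, no data
};

struct ToyGraph {
  std::vector<ToyNode> nodes;
  std::vector<ToyEdge> edges;
};

// Watches tensor `src:slot` with each op in `debug_ops`:
//  * inserts one deep-copy node "<src>/copy" fed by src:slot;
//  * rewires non-Ref consumers to read the copy; Ref consumers keep src:slot;
//  * adds one debug node per op, fed only by the copy (exactly one input);
//  * adds a control edge from each debug node to every original consumer.
void InsertDebugNodes(ToyGraph& g, const std::string& src, int slot,
                      const std::vector<std::string>& debug_ops) {
  assert(slot >= 0);
  const std::string tensor_name = src + ":" + std::to_string(slot);
  const std::string copy = src + "/copy";

  std::vector<std::string> consumers;
  for (ToyEdge& e : g.edges) {
    if (e.is_control || e.src != src || e.src_slot != slot) continue;
    consumers.push_back(e.dst);
    if (!e.is_ref) {
      e.src = copy;  // non-Ref consumer now reads the deep copy
      e.src_slot = 0;
    }
  }

  g.nodes.push_back({copy, "Copy", ""});
  g.edges.push_back({src, slot, copy, false, false});

  for (const std::string& op : debug_ops) {
    const std::string debug_node = tensor_name + "/" + op;
    g.nodes.push_back({debug_node, op, tensor_name});
    g.edges.push_back({copy, 0, debug_node, false, false});
    for (const std::string& consumer : consumers) {
      g.edges.push_back({debug_node, -1, consumer, false, true});
    }
  }
}
```

Calling InsertDebugNodes on a graph where B reads A:0 as a Ref and C reads it as a normal input reproduces the Ref-case diagram: B stays wired to A:0, C is rewired to the copy, and the debug node gets control edges to both. Passing several op names in `debug_ops` attaches several debug nodes to the same copy.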
DebugIdentity is the simplest debugging paradigm: the debug signal (i.e., X:0) equals the tensor itself. More sophisticated debug ops can transform the tensor into other useful debug signals; one example is the DebugNanCounter op added in this change.
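To make the difference between the two debug signals concrete, here is a host-side sketch of what each op computes, assuming the watched tensor is a flat buffer of floats. The function names are hypothetical; the real ops are TensorFlow kernels operating on Tensor objects, and returning the NaN count as an int64_t is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for DebugIdentity: the signal is the tensor itself.
std::vector<float> DebugIdentitySignal(const std::vector<float>& tensor) {
  return tensor;
}

// Hypothetical stand-in for DebugNanCounter: the signal is the number of
// NaN elements in the watched tensor.
int64_t DebugNanCountSignal(const std::vector<float>& tensor) {
  int64_t count = 0;
  for (float v : tensor) {
    if (std::isnan(v)) ++count;
  }
  return count;
}
```

The identity signal preserves everything but is as large as the tensor; a reduction such as a NaN count is a single scalar, which is cheaper to ship to a file or RPC endpoint.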
If the nodes (A, B and C) are located on a GPU and the edges from A to B or C are HOST_MEMORY, the CopyHost op is used instead of the Copy op.
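The device rule above reduces to a small decision function. The enums and ChooseCopyOp below are hypothetical stand-ins for TensorFlow's device and memory types, shown only to make the condition explicit.

```cpp
#include <cassert>
#include <string>

// Hypothetical enums; not TensorFlow's DeviceType or MemoryType.
enum class DeviceKind { kCPU, kGPU };
enum class MemoryKind { kDeviceMemory, kHostMemory };

// Picks the deep-copy op for a watched edge: a GPU edge whose tensor lives
// in host memory uses CopyHost; everything else uses Copy.
std::string ChooseCopyOp(DeviceKind device, MemoryKind edge_memory) {
  if (device == DeviceKind::kGPU && edge_memory == MemoryKind::kHostMemory) {
    return "CopyHost";
  }
  return "Copy";
}
```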
The debug ops also carry a reserved string attribute, "debug_url", so that debug signals can be sent to files or RPC calls in the future.
Other points worth noting:
* The debug ops have control-edge connections to the original destination node, in order to ensure that the debug signals are deterministically generated before the destination node executes.
* More than one debug op can be added to watch a tensor.
* A new field called "DebugTensorWatch" is added to RunOptions to support debug node insertion.
* A new method GPUUtil::CopyGPUTensorToSameGPU has been added to make GPU-to-GPU deep-copy of tensors possible.
* The two test files (debug_gateway_test.cc and debug_gateway_gpu_test.cc) have been consolidated into the former by using the GOOGLE_CUDA macro.
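Why the control edges give the ordering guarantee can be checked with a topological sort: when data and control edges are both treated as dependencies, no valid order can run X after B or C. The sketch below applies Kahn's algorithm to node names taken from the diagrams. It is a sequential illustration only (TensorFlow's executor can run independent ready nodes concurrently), and ExecutionOrder is a hypothetical helper.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Kahn's algorithm over (src, dst) dependencies. Data edges and control
// edges are treated alike: src must finish before dst may start.
std::vector<std::string> ExecutionOrder(
    const std::vector<std::string>& nodes,
    const std::vector<std::pair<std::string, std::string>>& deps) {
  std::map<std::string, int> in_degree;
  std::map<std::string, std::vector<std::string>> out;
  for (const auto& n : nodes) in_degree[n] = 0;
  for (const auto& dep : deps) {
    out[dep.first].push_back(dep.second);
    ++in_degree[dep.second];
  }

  std::queue<std::string> ready;  // nodes whose dependencies are all done
  for (const auto& n : nodes) {
    if (in_degree[n] == 0) ready.push(n);
  }

  std::vector<std::string> order;
  while (!ready.empty()) {
    std::string n = ready.front();
    ready.pop();
    order.push_back(n);
    for (const auto& d : out[n]) {
      if (--in_degree[d] == 0) ready.push(d);
    }
  }
  assert(order.size() == nodes.size() && "graph has a cycle");
  return order;
}
```

For the non-Ref diagram with control edges X -> B and X -> C added, the dependencies are A -> Copy, Copy -> B, Copy -> C, Copy -> X, X -> B and X -> C; every order the sort can produce places X before both consumers, whereas without the two control edges B could be scheduled first.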
Change: 127562075