
Conversation

@zou3519 (Contributor) commented Feb 13, 2018

This is a tool intended for initial exploratory debugging of bottlenecks in user scripts. Run it with

python -m torch.utils.bottleneck /path/to/source/script.py

Internally it runs the script once under the Python profiler and once under the autograd profiler, and prints the top 15 entries sorted by CPU time (for both profilers).
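The Python-profiler half of this can be sketched with the standard library alone. This is an illustrative sketch of the general technique (profile, then report the top entries by time), not the tool's actual internals; `hot_loop` is a hypothetical stand-in for a user script's bottleneck:

```python
import cProfile
import io
import pstats

def hot_loop():
    # stand-in for the expensive part of a user script
    return sum(i * i for i in range(200_000))

prof = cProfile.Profile()
prof.enable()
result = hot_loop()
prof.disable()

stream = io.StringIO()
stats = pstats.Stats(prof, stream=stream)
stats.sort_stats("cumulative").print_stats(15)  # top 15 entries, as the tool reports
report = stream.getvalue()
```

The real tool additionally wraps the script run in `torch.autograd.profiler.profile` to get per-operator timings, which cProfile alone cannot see.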

Sample Output

cc @soumith

Test Plan

Really basic tests to check that the output of torch.utils.bottleneck isn't completely empty. Taking suggestions on how to write better tests.

Built the new docs page:
[screenshot of the new docs page]

@ezyang (Contributor) commented Feb 13, 2018

SO COOL! :D


@vadimkantorov (Contributor) commented:

Very cool! Wish list for monitoring stats: GPU util / CPU util / analyzing CPU-bound vs RAM-bound vs GPU-compute-bound vs GPU-memory-bound workloads / analysis of committed GPU memory.

@apaszke (Contributor) commented Feb 13, 2018

Nice tool! However, presenting only the output obtained under CUDA_LAUNCH_BLOCKING can be very misleading: it will show some ops as very costly even though they are the ones that allow latency to be hidden later. I think we should at least mention this in the output, or produce two lists (an additional one without CUDA_LAUNCH_BLOCKING).


@Evpok (Contributor) commented Feb 24, 2018

Would it be possible to allow running scripts with command-line arguments?

@zou3519 (Contributor, Author) commented Feb 26, 2018

@Evpok that sounds like a good idea. I'll look into it and put it into the next iteration of this.

zou3519 and others added 5 commits March 7, 2018 13:22
This is a tool that is intended to be used as initial exploratory
debugging of bottlenecks in user scripts. Run it with

    python -m torch.utils.bottleneck /path/to/source/script.py
@apaszke (Contributor) left a review comment:


Looks great. It would be good to expand on CUDA profiling a bit (because it's really very complicated), but should be good to go after that.

Due to the asynchronous nature of CUDA kernels, when running against
CUDA code, the cProfile output and CPU-mode autograd profilers may
not show correct timings. In this case, the CUDA-mode autograd
profiler is better at assigning blame to the relevant operator(s).
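The asynchrony caveat above can be illustrated without a GPU: when work is queued asynchronously, the launching call appears nearly free, and a later synchronization point absorbs the kernel's cost, so a CPU-side profiler blames the wrong operation. A minimal stdlib sketch, with a thread pool standing in for the CUDA stream (no real CUDA involved; all names hypothetical):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_kernel():
    # stands in for a long-running CUDA kernel
    time.sleep(0.2)

executor = ThreadPoolExecutor(max_workers=1)  # stands in for a CUDA stream

# "Launching" the kernel is asynchronous: submit() returns almost immediately,
# so a CPU timer around it records nearly zero cost.
t0 = time.perf_counter()
future = executor.submit(fake_kernel)
launch_time = time.perf_counter() - t0

# A later synchronization point (like cudaDeviceSynchronize) blocks until the
# work finishes and absorbs all the blame for the kernel's runtime.
t1 = time.perf_counter()
future.result()
sync_time = time.perf_counter() - t1

executor.shutdown()
```

This is why the CUDA-mode autograd profiler, which uses device-side events, attributes time to the operator that actually did the work.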


@ezyang ezyang merged commit feb2785 into pytorch:master Mar 23, 2018
sighingnow added a commit to sighingnow/pytorch that referenced this pull request Mar 25, 2018
* upstream/master: (663 commits)
  Fix "command not found" error in perf test (pytorch#5982)
  add pip mkl-devel to the error message when mkl is found but mkl headers are not (pytorch#5984)
  Support batch LowerCholeskyTransform (pytorch#5980)
  Linearly interpolating upsampling fix (pytorch#5927)
  Store perf numbers in S3 (pytorch#5951)
  Modidy setup docs for Windows (pytorch#5981)
  Group Normalization (pytorch#5968)
  [distributions] Implement Power transform (pytorch#5976)
  Disable TestBottleneck test_cuda on Windows (pytorch#5977)
  Fix crash when cat-ing empty cuda tensors (pytorch#5971)
  Update no_unions flag for nanopb gen and update ONNX proto files (pytorch#5972)
  Expose gradients w.r.t. input & weight for conv1d, conv2d, conv3d in Python (pytorch#5408)
  Fixed non-determinate preprocessing on DataLoader (pytorch#4640)
  add AVX2 implementation for sigmoid function (pytorch#5010)
  Implement torch.util.bottleneck (pytorch#5216)
  Remove pragma once from cpp file (pytorch#5965)
  fix mvn docs (pytorch#5967)
  Fix incorrect rendering of Tensor.index_*_ doc examples. (pytorch#5969)
  Implement range for loop in script (pytorch#5827)
  Add windows doc (pytorch#5859)
  ...

# Conflicts:
#	aten/src/TH/generic/THTensorMath.c
#	torch/_tensor_docs.py
#	torch/csrc/generic/methods/TensorCompare.cwrap

6 participants