1© 2017 Rogue Wave Software, Inc. All Rights Reserved. 1
Debugging CUDA applications
GPU Technology Conference
Roundtable discussions
Complexity in debugging CUDA
• Heavily parallel
– Explosion in threads
– MPI, OpenMP, and OpenACC, often used together
• Complex platforms
– CPUs and GPUs often mixed in the same system
• Multiple languages being used
– Python for ease of use
– C/C++ for speed and legacy algorithms
– Data passing and glue code between languages add complexity
Debugging directive-based languages on GPUs
• OpenMP 4 debugging support (CPUs and GPUs) for Sierra
• Collaborate on OpenMP Debug API (OMPD) design
• Three phases
– Phase 1: OMPD for OpenMP 3.1 on CPUs, x86_64
– Phase 2: OMPD for OpenMP 4 on CPUs and GPUs, x86_64
– Phase 3: OMPD for OpenMP 4 on CPUs and GPUs, POWER little-endian
– Compiler support is needed to make it all work
– OpenACC work: always interested in user input
Current debugging advancements
Debugging multiple processes requires exclusive access to each GPU
Support for one debug process per GPU
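The slide does not say how processes are matched to GPUs. One common arrangement, shown here only as a sketch, gives each process on a node its own device by setting the CUDA runtime's `CUDA_VISIBLE_DEVICES` variable before the process starts; the process then sees that one GPU as device 0. The helper name `single_gpu_env` and the four-GPU node are hypothetical, and this is not a description of how TotalView itself assigns debug processes.

```python
import os


def single_gpu_env(local_rank, gpus_per_node, base_env=None):
    """Hypothetical helper: build an environment in which one process sees exactly one GPU."""
    if not 0 <= local_rank < gpus_per_node:
        # More processes than GPUs would force two processes to share a device,
        # which would break the one-process-per-GPU rule.
        raise ValueError(
            f"local rank {local_rank} has no dedicated GPU (node has {gpus_per_node})"
        )
    env = dict(os.environ if base_env is None else base_env)
    # CUDA_VISIBLE_DEVICES masks devices: only the listed GPU is visible,
    # renumbered as device 0 inside the process.
    env["CUDA_VISIBLE_DEVICES"] = str(local_rank)
    return env


# Example (assumed layout): four processes on a node with four GPUs, one GPU each.
envs = [single_gpu_env(rank, 4, base_env={}) for rank in range(4)]
print([e["CUDA_VISIBLE_DEVICES"] for e in envs])  # ['0', '1', '2', '3']
```

Each environment could then be passed to, for example, `subprocess.Popen(cmd, env=env)` when launching the processes. Rejecting ranks beyond the GPU count, rather than wrapping them round-robin onto already-used devices, keeps the one-process-per-GPU constraint explicit.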
Python debugging with filtering
Python code is available by selecting its stack frame
The program counter shows where in the Python code the call was made
Come to session S7506, "Rolling in the Deep", on Thursday for details
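To make the mixed Python and C stack concrete, here is a minimal sketch, assuming Linux or macOS where `ctypes.util.find_library("c")` locates the C library. It sorts a list with libc's `qsort` and a Python comparison callback, following the pattern of the `qsort` example in the Python `ctypes` documentation. Inside the callback, Python's own view of the stack shows the Python frame that made the call into C. A native debugger would additionally see the interpreter, libffi, and `qsort` frames in between, which is the kind of interleaved stack that frame filtering has to present. The names `compare`, `sort_with_libc`, and `seen_stacks` are illustrative only.

```python
import ctypes
import ctypes.util
import traceback

# Load the C standard library (assumes find_library("c") resolves on this platform).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C type of the comparison callback: int (*)(const void *, const void *)
CMPFUNC = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int)
)

# void qsort(void *base, size_t nmemb, size_t size, int (*compar)(const void *, const void *))
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None

seen_stacks = []  # Python-level stack snapshots taken inside the callback


def compare(a, b):
    # Runs in Python but is invoked from C (libc qsort); record the function
    # names Python can see on its own stack at this point.
    seen_stacks.append([frame.name for frame in traceback.extract_stack()])
    return a[0] - b[0]


def sort_with_libc(values):
    # Glue code: copy the Python list into a C int array (data passing across languages).
    array = (ctypes.c_int * len(values))(*values)
    callback = CMPFUNC(compare)  # keep the callback object alive during the call
    libc.qsort(array, len(array), ctypes.sizeof(ctypes.c_int), callback)
    return list(array)


print(sort_with_libc([5, 1, 7, 33, 99]))  # [1, 5, 7, 33, 99]
# The innermost Python frames: the caller of qsort, then the callback itself.
print(seen_stacks[-1][-2:])
```

The Python stack skips straight from `sort_with_libc` to `compare`, even though C code sits between them; that gap is what a mixed-language debugger has to fill in or filter.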
TotalView for the NVIDIA® GPU Accelerator
• TotalView
– Leading debugger for multi-threaded issues at scale
• CUDA 8.0
• Cray, OpenPOWER, Linux, OpenACC
• Compilers: GCC, PGI
• Features and capabilities include
– MPI-based clusters
– Flexible Display and Navigation
– Memory address spaces
– Leverages CUDA memcheck
– CUDA dynamic parallelism
© 2015 ROGUE WAVE SOFTWARE, INC. ALL RIGHTS RESERVED
Interested in learning more about TotalView for HPC?
www.roguewave.com
TotalView for HPC
• Comprehensive multi-threaded analysis and debug environment
– Thread-specific breakpoints
– Control individual thread execution
– View thread-specific stack and data
– View complex data types easily
• Integrated reverse debugging
• Track memory leaks in running applications
• Supports C/C++ on Linux
• Helping businesses achieve
– Predictable development schedules
– Less time spent debugging
