Performance Profiling and Numeric Python
BEN WELLER
Brief Problem Statement
• Reading DIRECTLY off the slides
• Never posts the slides
• Video is available
• Automatically grab all the slides from the video
• Quickly underwhelmed with the speed
• How to speed this up?
• Speed has never been an issue previously
Python isn’t the Problem*
• The fault, dear programmer, isn’t in Python
• But sometimes it is
• Ways to get around it:
  • Async
  • C extension
  • PyPy (JIT everything)
  • Numba (JIT only certain methods)
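As a minimal sketch of the "JIT only certain methods" idea: only the hot numeric loop is decorated, and everything else stays ordinary Python. The function names here are hypothetical and not from the talk, and the `try`/`except` fallback is an assumption added so the sketch still runs (uncompiled) where Numba is not installed.

```python
try:
    from numba import njit  # Numba's nopython-mode JIT decorator
except ImportError:
    def njit(func=None, **kwargs):
        """Fallback (assumption): run as plain Python if Numba is missing."""
        if func is None:
            return lambda f: f
        return func


@njit
def sum_of_squares(n):
    # Hypothetical hot loop: explicit numeric loops like this are the kind
    # of code a JIT compiler is meant to speed up.
    total = 0.0
    for i in range(n):
        total += i * i
    return total


def report(n):
    # Ordinary Python: string formatting stays outside the compiled function.
    return "mean square = %.2f" % (sum_of_squares(n) / n)


if __name__ == "__main__":
    print(report(4))
```

The first call to a JIT-compiled function includes compilation time, so it is worth timing a second call when comparing against a baseline.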
Before Anything Else:
• Software Hierarchy of needs (top to base):
  • Fast enough?
  • Tests
  • Does the code work?
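One way to follow the hierarchy in practice, sketched with the standard library only: confirm the code works with a test, and only then record a timing baseline. `mean_brightness` is a hypothetical stand-in for the code under test, not something shown in the talk.

```python
import timeit


def mean_brightness(pixels):
    # Hypothetical function under test: average of a flat list of pixel values.
    return sum(pixels) / len(pixels)


def test_mean_brightness():
    # Base of the hierarchy: pin down correct behaviour before tuning anything.
    assert mean_brightness([0, 255]) == 127.5
    assert mean_brightness([10, 10, 10]) == 10


def benchmark(repeats=5):
    # Top of the hierarchy: once the tests pass, measure a baseline.
    data = list(range(256)) * 100
    timings = timeit.repeat(lambda: mean_brightness(data), number=10, repeat=repeats)
    return min(timings)  # the minimum is the least noisy of the repeats


if __name__ == "__main__":
    test_mean_brightness()
    print("baseline seconds:", benchmark())
```

Rerunning the same test after every optimization is what catches a faster but wrong version.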
Initial Code/Data
• Reading the frames:
• Data:
  • 1920 x 1080 x 3 values per frame
  • Every pixel in the frame is recorded, with 3 values (RGB) for each pixel
• Test for validation:
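The original frame-reading and validation code is not reproduced in this text. As a rough illustration of the data layout only, the sketch below builds a synthetic frame and checks its shape. It assumes NumPy arrays in (height, width, channel) order with 8-bit values, which is a common convention but is not confirmed by the talk; `make_frame` and `validate_frame` are hypothetical names.

```python
import numpy as np

HEIGHT, WIDTH, CHANNELS = 1080, 1920, 3  # 1920 x 1080 video, RGB


def make_frame(value=0):
    # Stand-in for one decoded video frame (assumption: 8-bit values).
    return np.full((HEIGHT, WIDTH, CHANNELS), value, dtype=np.uint8)


def validate_frame(frame):
    # Cheap sanity checks to run before any timing work.
    assert frame.shape == (HEIGHT, WIDTH, CHANNELS)
    assert frame.dtype == np.uint8
    return frame.size  # total number of stored values in the frame


if __name__ == "__main__":
    print(validate_frame(make_frame()))
```

Each frame holds 1920 * 1080 * 3 = 6,220,800 values, which is why per-pixel Python loops become slow quickly.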
Version 0.0
• Establish a baseline
• Time: 788.85
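A baseline is most useful when it also shows where the time goes. The sketch below uses the standard-library `cProfile` and `pstats` modules on a hypothetical two-step pipeline; the function names are illustrative and do not come from the talk.

```python
import cProfile
import io
import pstats


def slow_step(n):
    # Hypothetical hotspot: a pure-Python loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_step(n):
    return n * 2


def pipeline(n=200_000):
    return slow_step(n) + fast_step(n)


def profile_pipeline():
    profiler = cProfile.Profile()
    profiler.enable()
    pipeline()
    profiler.disable()
    out = io.StringIO()
    # Sort by cumulative time so the most expensive calls are listed first.
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_pipeline())
```

The same kind of report can be produced for a whole script with `python -m cProfile -s cumulative script.py`.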
Version 0.1
• Remove all the redundant lines:
• Time: 728.88
Version 0.15
• Apply Numba
• Time: 200.35
Version 0.175
• Realize your calculation is wrong!
• Redo with new formula
• Time: 582.29
Version 0.2
• Apply Numba
• Very slow, with odd results
• Time: 1020.90
• Results: [4458465513457839.0, 2137656293717757.8, 3461590301822020.5, 2435933916096979.0, 1020423444981550.0 …]
• Not bounded between 0 and 100
Version 0.75
• Realize something may be wrong with the broadcast operations?
• Time: 549.63
• Output was all in the valid range (0 to 100)
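The talk does not show the formula or the root cause, so the following is only a sketch of the kind of range check that catches such a problem early. `percent_difference` is a hypothetical metric chosen because it is bounded between 0 and 100. The cast to float guards against a general NumPy pitfall (8-bit subtraction wraps around) and is not presented as the author's diagnosis.

```python
import numpy as np


def percent_difference(frame_a, frame_b):
    # Hypothetical metric: mean absolute difference as a percentage of the
    # maximum possible difference (255 per value for 8-bit frames).
    # Cast to float first: subtracting uint8 arrays directly wraps around.
    a = frame_a.astype(np.float64)
    b = frame_b.astype(np.float64)
    return float(np.abs(a - b).mean() / 255.0 * 100.0)


def check_bounded(values, low=0.0, high=100.0):
    # Validation: every result must land inside the expected range.
    return all(low <= v <= high for v in values)


if __name__ == "__main__":
    black = np.zeros((4, 4, 3), dtype=np.uint8)
    white = np.full((4, 4, 3), 255, dtype=np.uint8)
    results = [percent_difference(black, black), percent_difference(black, white)]
    print(results, check_bounded(results))
```

Running a check like this after each optimization step would flag values such as 4458465513457839.0 immediately.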
Version 1.0
• Apply Numba in full force
• Time: 186.37
What I learned
• Follow the hierarchy of needs!
  • If you don’t, you will get burned
• Tests and initial benchmarks are key
• Optimize the slowest code first
• Incrementally solve the next bit
• Only very rarely break out specialized tools
• Know your tools
• Has anyone run into the problem I had with broadcasting?
