PyconMini JP 2011




How to Create a High-Speed
Template Engine in Python




makoto kuwata
http://www.kuwata-lab.com/
Profile
 @makotokuwata

 http://www.kuwata-lab.com/

 Ruby/PHP/Python programmer

 Creator of Erubis (*)

 Python4PHPer


                 (*) default template engine on Rails 3
Python Products
 Tenjin       : very fast template engine
 Kook         : task utility like Ant/Rake
 Benchmarker : a good friend for performance
 Oktest       : new-style testing library
Tenjin
 Very fast

 One file, 2000 lines

 Full-featured

 Python 3 support

 Google App Engine

 Release 1.0 coming soon!

 http://www.kuwata-lab.com/tenjin/
Benchmark (pages/sec)
 Tenjin      2660.1
 Mako        1426.4
 Jinja2      1257.6
 Templetor    903.0
 Cheetah      562.3
 Django       114.2
 Genshi        55.7
 Kid           34.6

Python 2.5.5, MacOS X 10.6 (x86_64), 2GB
Tenjin 1.0.0, Mako 0.2.5, Jinja2 2.2.1, Templetor 0.32,
Cheetah 2.2.2, Django 1.1.0, Genshi 0.5.1, Kid 0.9.6
Benchmarks for
String Concatenation
append()
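
A minimal sketch of the append() style being measured, assuming a page is built by appending every fragment to a list and joining once at the end. The function name `render_append` and the table markup are illustrative and not taken from the talk; the code is written for Python 3, while the benchmarks were run on Python 2.5.

```python
def render_append(items):
    """Build an HTML table with one list.append() call per fragment."""
    buf = []
    buf.append("<table>\n")
    for item in items:
        buf.append("  <tr><td>")
        buf.append(str(item))
        buf.append("</td></tr>\n")
    buf.append("</table>\n")
    # Concatenate all fragments in a single pass.
    return "".join(buf)
```

Each row costs three method calls here, which is the overhead the later variants try to reduce.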
[Bar chart: pages/sec (axis 0 to 1000) for append()]
extend()
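
The extend() variant can be sketched as follows, assuming the fragments of one row are grouped into a tuple and added with a single call. `render_extend` is a hypothetical name used only for illustration.

```python
def render_extend(items):
    """Build an HTML table with one list.extend() call per row."""
    buf = []
    buf.extend(("<table>\n",))
    for item in items:
        # One method call adds three fragments at once.
        buf.extend(("  <tr><td>", str(item), "</td></tr>\n"))
    buf.extend(("</table>\n",))
    return "".join(buf)
```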
[Bar chart: pages/sec (axis 0 to 1000) for append(), extend()]
StringIO
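
A sketch of the StringIO approach, assuming fragments are written to an in-memory text stream. The talk targeted Python 2.5, so the original presumably used the `StringIO` or `cStringIO` module; `io.StringIO` is used here as the Python 3 counterpart, which is an assumption.

```python
import io


def render_stringio(items):
    """Build an HTML table by writing fragments to an in-memory stream."""
    out = io.StringIO()
    out.write("<table>\n")
    for item in items:
        out.write("  <tr><td>")
        out.write(str(item))
        out.write("</td></tr>\n")
    out.write("</table>\n")
    return out.getvalue()
```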
[Bar chart: pages/sec (axis 0 to 1000) for append(), extend(), StringIO]
mmap
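
How mmap was used in the benchmark is not shown, so the following is only one plausible reading: fragments are written as bytes into an anonymous memory map of fixed size and read back at the end. The name `render_mmap` and the 1 MiB default size are assumptions.

```python
import mmap


def render_mmap(items, size=1024 * 1024):
    """Build an HTML table inside an anonymous, fixed-size memory map."""
    buf = mmap.mmap(-1, size)  # -1 requests anonymous memory (no file)
    try:
        buf.write(b"<table>\n")
        for item in items:
            buf.write(b"  <tr><td>")
            buf.write(str(item).encode("utf-8"))
            buf.write(b"</td></tr>\n")
        buf.write(b"</table>\n")
        length = buf.tell()  # number of bytes written so far
        buf.seek(0)
        return buf.read(length).decode("utf-8")
    finally:
        buf.close()
```

Unlike a list, the map cannot grow: writing more than `size` bytes raises `ValueError`.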
[Bar chart: pages/sec (axis 0 to 1000) for append(), extend(), StringIO, mmap]
Generator
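
The generator style can be sketched like this, assuming each fragment is yielded and the consumer joins the stream. Both function names are hypothetical.

```python
def iter_fragments(items):
    """Yield the fragments of an HTML table one at a time."""
    yield "<table>\n"
    for item in items:
        yield "  <tr><td>"
        yield str(item)
        yield "</td></tr>\n"
    yield "</table>\n"


def render_generator(items):
    """Join everything the generator produces into one string."""
    return "".join(iter_fragments(items))
```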
[Bar chart: pages/sec (axis 0 to 1000) for append(), extend(), StringIO, mmap, generator]
Slice
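
The chart labels `slice[99999:]` and `slice[-1:]` suggest slice assignment used as a way to append without a method call. The sketch below is one possible reading, not the benchmark's actual code: the first function relies on an out-of-range start index being clamped to the end of the list, and the second keeps an empty string as the last element so that replacing `buf[-1:]` never loses data.

```python
def render_slice_end(items):
    """Append via buf[99999:] = ...; assumes the list stays shorter than 99999."""
    buf = []
    buf[99999:] = ("<table>\n",)
    for item in items:
        # A start index past the end is clamped, so this appends.
        buf[99999:] = ("  <tr><td>", str(item), "</td></tr>\n")
    buf[99999:] = ("</table>\n",)
    return "".join(buf)


def render_slice_last(items):
    """Append via buf[-1:] = ...; the trailing "" is a sentinel to overwrite."""
    buf = [""]
    buf[-1:] = ("<table>\n", "")
    for item in items:
        buf[-1:] = ("  <tr><td>", str(item), "</td></tr>\n", "")
    buf[-1:] = ("</table>\n",)
    return "".join(buf)
```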
[Bar chart: pages/sec (axis 0 to 1000) for append(), extend(), StringIO, mmap, generator, slice[-1:], slice[99999:]]
Bound method
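
A sketch of the bound-method variant, assuming `buf.extend` is looked up once and stored in a local variable so the loop skips the attribute lookup on every iteration. `render_bound_extend` is an illustrative name.

```python
def render_bound_extend(items):
    """Build an HTML table through a pre-fetched bound extend() method."""
    buf = []
    extend = buf.extend  # attribute lookup happens only once
    extend(("<table>\n",))
    for item in items:
        extend(("  <tr><td>", str(item), "</td></tr>\n"))
    extend(("</table>\n",))
    return "".join(buf)
```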
[Bar chart: pages/sec (axis 0 to 1000) for append(), extend(), StringIO, mmap, generator, slice[-1:], slice[99999:], extend() (bound)]
Summary
Fast
 bound method >= slice[] > extend()

Slow
 Generator > append() > mmap > StringIO
Try Benchmark Script
Step by Step to
Tune-up Template Code
HTML Template
Python Code
[Bar chart: pages/sec (axis 0 to 12000) for append (singleline)]
Multiple Line String
 Eliminates method call
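
To illustrate the difference, here is a sketch of what generated code might look like before and after merging consecutive static lines into one multi-line string literal. The template content and both function names are made up for this example.

```python
def render_singleline(title):
    """One append() per template line: seven calls."""
    buf = []
    buf.append("<html>\n")
    buf.append("  <body>\n")
    buf.append("    <h1>")
    buf.append(title)
    buf.append("</h1>\n")
    buf.append("  </body>\n")
    buf.append("</html>\n")
    return "".join(buf)


def render_multiline(title):
    """Adjacent static lines merged into one literal: three calls."""
    buf = []
    buf.append("""<html>
  <body>
    <h1>""")
    buf.append(title)
    buf.append("""</h1>
  </body>
</html>
""")
    return "".join(buf)
```

Both functions return the same string; only the number of method calls differs.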
[Bar chart: pages/sec (axis 0 to 12000) for append (singleline), append (multiline)]
From append() to extend()
 Eliminates method call
[Bar chart: pages/sec (axis 0 to 12000) for append (singleline), append (multiline), extend (unbound)]
Bound Method
 Eliminates the method fetch (attribute lookup)
[Bar chart: pages/sec (axis 0 to 12000) for append (singleline), append (multiline), extend (unbound), extend (bound)]
str() function
 Necessary in Python!
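
A small sketch of why the conversion is needed: `str.join()` accepts only strings, so any embedded value that might be a number or another object has to pass through `str()` first. The function names and markup are illustrative.

```python
def render_with_str(name, age):
    """Convert every embedded value with str() before joining."""
    buf = []
    buf.extend(("<p>", str(name), " is ", str(age), "</p>\n"))
    return "".join(buf)


def render_without_str(name, age):
    """Same code without str(); join() raises TypeError for non-strings."""
    buf = []
    buf.extend(("<p>", name, " is ", age, "</p>\n"))
    return "".join(buf)
```

Calling `render_without_str("Ann", 30)` fails because `30` is an int, while `render_with_str("Ann", 30)` works at the price of two extra function calls.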
[Bar chart: pages/sec (axis 0 to 12000) for append (singleline), append (multiline), extend (unbound), extend (bound), extend + str]
Local Variable
 Local var is faster than global/built-in var
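
The chart label `_str=str` suggests binding the built-in to a local name. One common way to do that in CPython, sketched here under that assumption, is a default argument: locals are read by index, while globals and built-ins need dictionary lookups.

```python
def render_local_str(items, _str=str):
    """Use a local alias of str() inside the hot loop."""
    buf = []
    extend = buf.extend
    for item in items:
        # _str is a local variable, so no global/built-in lookup is needed.
        extend(("<td>", _str(item), "</td>\n"))
    return "".join(buf)
```

The `_str` parameter is not meant to be passed by callers; it exists only to make the name local.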
[Bar chart: pages/sec (axis 0 to 12000) for append (singleline), append (multiline), extend (unbound), extend (bound), extend + str, extend + _str=str]
Format ('%' operator)
 Removes all str() calls by using the '%' operator
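
A sketch of the format-based variant ("append + format" in the charts), assuming the static text becomes a format string and the values are passed as a tuple. `%s` converts each value to a string implicitly, so no explicit `str()` call appears in the generated code.

```python
def render_format(name, age):
    """Let the '%' operator do the string conversion."""
    buf = []
    buf.append("<p>%s is %s years old</p>\n" % (name, age))
    return "".join(buf)
```

One consequence of this design is that a literal `%` in the template text has to be written as `%%` in the format string.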
[Bar chart: pages/sec (axis 0 to 12000) for append (singleline), append (multiline), extend (unbound), extend (bound), extend + str, extend + _str=str, append + format]
None => Empty String
 Converts None to empty string
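
The chart labels mention a `to_str` helper. A minimal sketch of such a function, assuming it only needs to map None to an empty string and fall back to `str()` otherwise (the real Tenjin helper may do more, for example encoding handling on Python 2):

```python
def to_str(value):
    """Return "" for None, the value itself for strings, str(value) otherwise."""
    if value is None:
        return ""
    if isinstance(value, str):
        return value
    return str(value)


def render_to_str(items, _to_str=to_str):
    """Embed values through a local alias of to_str()."""
    buf = []
    extend = buf.extend
    for item in items:
        extend(("<td>", _to_str(item), "</td>\n"))
    return "".join(buf)
```

With plain `str()`, a missing value would be rendered as the text "None"; with this helper the cell is simply empty.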
[Bar chart: pages/sec (axis 0 to 12000) for append (singleline), append (multiline), extend (unbound), extend (bound), extend + str, extend + _str=str, append + format, extend + to_str, extend + _to_str=to_str]
Escape HTML
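
A pure-Python sketch of an `escape_html` helper, assuming it replaces the four characters `&`, `<`, `>` and `"` with chained `str.replace()` calls. The exact set of characters handled by Tenjin's own helper is not shown in the slides, so treat this as an approximation.

```python
def escape_html(s):
    """Escape &, <, > and double quotes for safe embedding in HTML."""
    return (s.replace("&", "&amp;")   # must come first
             .replace("<", "&lt;")
             .replace(">", "&gt;")
             .replace('"', "&quot;"))


def render_escaped(items, _str=str, _escape=escape_html):
    """Embed values with conversion and escaping ("escape_html + str")."""
    buf = []
    extend = buf.extend
    for item in items:
        extend(("<td>", _escape(_str(item)), "</td>\n"))
    return "".join(buf)
```

Every embedded value now costs two function calls, which is why escaping shows up so clearly in the measurements.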
[Bar chart: pages/sec (axis 0 to 12000) for append (singleline), append (multiline), extend (unbound), extend (bound), extend + str, extend + _str=str, append + format, extend + to_str, extend + _to_str=to_str, escape_html + str, escape_html + to_str]
C Extension
 Implemented in C
 webext: http://pypi.python.org/pypi/Webext/
[Bar chart: pages/sec (axis 0 to 12000) for append (singleline), append (multiline), extend (unbound), extend (bound), extend + str, extend + _str=str, append + format, extend + to_str, extend + _to_str=to_str, escape_html + str, escape_html + to_str, webext.escape_html, to_str, webext.escape_html]
Extreme join()
 Not escaped if index % 2 == 0
 Escaped if index % 2 == 1
 (no need to call escape_html()!)
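
The idea can be sketched in pure Python, assuming static template text sits at even indexes and embedded values at odd indexes. `escaped_join` is a hypothetical name, and `html.escape` from the standard library stands in for the escaping step; a real implementation would presumably live in C to gain any speed.

```python
from html import escape


def escaped_join(fragments):
    """Join fragments, converting and escaping only the odd-indexed ones."""
    return "".join(
        escape(str(frag)) if index % 2 == 1 else frag
        for index, frag in enumerate(fragments)
    )
```

Generated code would then pass alternating static and dynamic fragments, for example `escaped_join(("<td>", value, "</td>"))`, without calling `str()` or `escape_html()` itself. Two adjacent values would need an empty string between them to keep the alternation.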
Benchmark
  Not implemented yet...
Summary
String concatenation is not a bottleneck
  extend() & join() are fast enough

Bottleneck is str() and escape_html()
 join() should call str() internally

 C Extension (webext) is great
Other Topics
Google says...

  ... The major web applications we
  have surveyed have indicated that
  they bottleneck primarily on
  template systems, ...
                                    Django?
  http://code.google.com/p/unladen-swallow/wiki/ProjectPlan
Case Study #1
  http://www.myweightracker.com/
  Switch from Django template to Tenjin

  Before: M, C, Network, etc... | Django
  After:  M, C, Network, etc...
  App Speed Up 30%!

  https://groups.google.com/group/kuwata-lab-products/browse_thread/thread/b50877a9c56d64c9/60f77b5c9b9f5238
Case Study #2
   Ruby on Rails 1.2
   Remove helper methods by preprocessing

   Before: M, C, Network, etc... | Helper Methods
           (template engine)
   After:  M, C, Network, etc...
   App Speed Up 100%!

   http://jp.rubyist.net/magazine/?0021-Erubis
Components of View Layer
 Template Engine  : just one of them
 Helper Functions : important for performance!
 Cache Mechanism  : more important for performance!
Preprocessing in Tenjin
 Without preprocessing
   Convert
   Execute : function called every time
 With preprocessing
   Convert : call function in this stage
   Execute : func call removed
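
The two stages can be sketched as follows. This is not Tenjin's actual syntax or API: the `{{...}}` marker, the `convert`/`execute` functions and the `link_to` helper are all invented for illustration, and `string.Template` stands in for the template engine itself.

```python
import re
import string


def link_to(label, url):
    """Hypothetical helper whose result does not depend on request data."""
    return '<a href="%s">%s</a>' % (url, label)


_PREPROCESS = re.compile(r"\{\{(.*?)\}\}")


def convert(template, helpers):
    """Convert stage: evaluate {{...}} once, when the template is loaded."""
    def evaluate(match):
        # eval() is acceptable only because templates are trusted input.
        return str(eval(match.group(1), {"__builtins__": {}}, helpers))
    return _PREPROCESS.sub(evaluate, template)


def execute(converted, context):
    """Execute stage: runs per request; only ${name} placeholders remain."""
    return string.Template(converted).substitute(context)


source = 'Hello ${user}! {{link_to("Home", "/")}}'
converted = convert(source, {"link_to": link_to})  # helper runs here, once
page = execute(converted, {"user": "Alice"})       # no helper call here
```

After conversion the helper call has been replaced by its output, so rendering the page any number of times never calls `link_to()` again.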
Python vs. Others (pages/sec)
 plTenjin (Perl)     12108.0
 pyTenjin+Webext      4179.7
 phpTenjin (PHP)      2788.0
 pyTenjin (Python)    2682.9
 rbTenjin (Ruby)      2634.8

 Perl is the Champion!
Why is Perl so Fast?
 No need to call str(val) nor val.toString()
 Bytecode op for string concatenation
C Ext vs. Pure Script
 plTenjin          : Pure Perl
 MobaSiF           : C Ext
 Template::Toolkit : C Ext
 pyTenjin+Webext   : Python + C Ext
 pyTenjin          : Pure Python
 Cheetah           : C Ext
 rbTenjin          : Pure Ruby
 eruby             : C Ext

 No need to implement the engine in C (except helpers)
 [Bar chart: pages/sec (axis 0 to 12500)]
Summary
View layer components
 Template engine, Helper functions, and
 Cache mechanism
No need to implement engine in C
(except helper functions)
Perl is great
Django template engine sucks
Appendix
 Tenjin: fast & full-featured template engine
  http://www.kuwata-lab.com/tenjin/

 Webext: C extension for escape_html()
  http://pypi.python.org/pypi/Webext/

 Benchmarker: a utility for benchmarking
  http://pypi.python.org/pypi/Benchmarker/
Appendix
 C              Ruby
     http://www.kuwata-lab.com/presen/rubykaigi2007.pdf
     http://jp.rubyist.net/magazine/?0022-FasterThanC

 Java               LL
     http://www.kuwata-lab.com/presen/LL2007LT.pdf


     http://jp.rubyist.net/magazine/?0024-TemplateSystem
     http://jp.rubyist.net/magazine/?0024-TemplateSystem2
thank you
