Showing posts with label python. Show all posts

Saturday, February 16, 2008

XPath: Getting All the Descendant Nodes

For some reason I can never remember the proper XPath for getting all the descendant nodes (both element and text nodes). I figure if I post it on my blog, I can just look it up whenever I forget (or maybe writing it down will force it permanently into my brain). Here's the XPath expression:

//*|//text()

Pretty simple, huh? At first, I thought it was //*|text(), but that doesn't actually work. Neither does //text()|*. Those two XPath expressions aren't even equivalent -- they actually give you different results.

Now for an example! Let's say that you have the following XML:

<html>
<head>
    <title>Converting from Local Time to UTC</title>
    <link rel="stylesheet" href="../preview.css" type="text/css" />
</head>
    <body>
        <div id="meta">
            <table>
                <tr>
                    <td><b>Title:</b></td>
                    <td>Converting from Local Time to UTC</td>
                </tr>
                <tr>
                    <td><b>Entry Id:</b></td>
                    <td>None</td>
                </tr>
                <tr>
                    <td><b>Labels:</b></td>
                    <td>python, utc, datetime</td>
                </tr>
            </table>
        </div>
    </body>
</html>

Using Python's lxml module, we can write a short script that prints out all the element tags and non-whitespace strings:

from lxml import etree

tree = etree.parse(open('temp.xml'))

for node in tree.xpath('//*|//text()'):
    if isinstance(node, basestring):
        if node.strip():
            print repr(node.strip())
    else:
        print '<%s>' % node.tag

Running the above code, we get the following output:

<html>
<head>
<title>
'Converting from Local Time to UTC'
<link>
<body>
<div>
<table>
<tr>
<td>
<b>
'Title:'
<td>
'Converting from Local Time to UTC'
<tr>
<td>
<b>
'Entry Id:'
<td>
'None'
<tr>
<td>
<b>
'Labels:'
<td>
'python, utc, datetime'

As you can see, the XPath expression gives you the element and text nodes in the exact order that they appear in the document.
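For comparison, here is a minimal sketch of the same document-order walk without lxml. It uses the standard library's xml.etree.ElementTree, whose limited XPath subset (as far as I know) does not cover text() or the | union, so the traversal is written by hand. The helper names `descendants` and `show` and the trimmed-down sample string are assumptions made up for illustration, and the syntax is Python 3 rather than the Python 2 used above.

```python
import xml.etree.ElementTree as ET

def descendants(elem):
    """Yield elem, then every element and text node below it, in document order."""
    yield elem
    # ElementTree keeps the text before the first child in .text ...
    if elem.text:
        yield elem.text
    for child in elem:
        yield from descendants(child)
        # ... and the text that follows each child in that child's .tail
        if child.tail:
            yield child.tail

# Hypothetical, shortened version of the table from the example document
SAMPLE = "<table><tr><td><b>Title:</b></td><td>UTC</td></tr></table>"

def show(xml_text):
    """Return element tags and non-whitespace strings as printable lines."""
    lines = []
    for node in descendants(ET.fromstring(xml_text)):
        if isinstance(node, str):
            if node.strip():
                lines.append(repr(node.strip()))
        else:
            lines.append('<%s>' % node.tag)
    return lines

for line in show(SAMPLE):
    print(line)
```

The recursion visits an element, then its leading text, then each child followed by the text trailing that child, which is the order the nodes appear in the file.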

Friday, February 15, 2008

Python.NET 2.0 for .NET SP1

In a recent post I showed how to compile and install Python.NET 2.0 alpha 2. Unfortunately, if you are using .NET 2.0 SP 1, the instructions won't produce a fully-functional build [1]. That's because some changes in SP1 broke the latest Python.NET code. However, Nicolas Lelong figured out the solution and posted a simple patch to the Python.NET mailing list. The patch is only necessary if you are using .NET 2.0 SP 1, but non-SP 1 systems can use it as well.

Note

I've recompiled the binaries myself and posted them here.

To verify that the patch works correctly, try running the following code.

import clr
clr.AddReference('System.Windows.Forms')
from System.Drawing import Color
from System.Windows.Forms import Form, Button, Label, BorderStyle, DockStyle

count = 0

def onclick(sender, args):
    global count
    count += 1
    l.Text = 'You clicked the {%s} button %d times' % (sender.Text, count)

if __name__ == '__main__':
    f = Form()
    f.Text = 'Hello WinForms!'
    f.Width = 500

    l = Label()
    l.Text = 'This is a label'

    b = Button()
    b.Text = 'Click Me!'
    b.Click += onclick

    # Layout:
    l.Dock = DockStyle.Top
    b.Dock = DockStyle.Top
    f.Controls.Add(l)
    f.Controls.Add(b)

    f.ShowDialog()

If you don't see any weird errors [2], then everything should be fine.

[1]Specifically, delegates won't work. That means, for example, that you won't be able to use Python functions to handle button clicks in a GUI.
[2]If you are using an unpatched Python.NET with .NET SP1, you would see an error like this:

System.TypeInitializationException: The type initializer for 'Python.Runtime.CodeGenerator' threw an exception.

Wednesday, February 13, 2008

Installing Python.NET 2.0 Alpha 2 on Windows XP

Python.NET is a project that lets you use .NET libraries from within Python [1]. The latest version (2.0) is still alpha as of this writing, so the project owners do not provide binary downloads. Even though it's alpha, I still vastly prefer version 2.0 over 1.0 because it has fewer warts [2], and its API is compatible with IronPython. In practice, it does seem pretty stable too.

I'm going to explain how I compiled and installed Python.NET 2.0 using Visual C# 2008 Express Edition, a free IDE for C#. If you don't have it installed yet, see my previous post.

Note

If you don't want to go through all the hassle of following the steps below, you can download the binary files I created here.

Here are the steps:

  1. Download the source files from Sourceforge. You can choose either a tarball or a zip file.

  2. Extract the contents to your hard drive and open the pythonnet.sln solution file using Visual C#.

  3. Convert the solution file to the Visual C# 2008 format [3].

  4. There are several projects in the solution file. The main one is called Python.Runtime. Right-click on that project and select Properties.

  5. In the Application tab, change Target Framework to ".NET Framework 2.0".

  6. The default is to build binaries for Python 2.5 and UCS2. If you need to change this, you need to make some additional tweaks [4].

  7. From the menu, select Build -> Configuration Manager. In the dialog that opens, change the Active solution configuration to Release. Press the Close button.

  8. From the menu, select Build -> Build Solution.

  9. The binary files you want can be found under src\runtime\bin\Release:

    • Python.Runtime.dll
    • clr.pyd
  10. Copy the two binary files into your Python directory (e.g. C:\Python25).

  11. Test the installation by starting the shell and typing import clr. If there are no errors, it probably worked.
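The check in step 11 can also be scripted. This is only a sketch in Python 3 syntax: `can_import` is a hypothetical helper name, not something Python.NET provides, and it only tells you whether the import succeeds, not whether the runtime works correctly afterwards.

```python
import importlib

def can_import(module_name):
    """Return True if the named module imports without an ImportError."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True

# 'clr' is the module provided by clr.pyd from the steps above
print('clr importable:', can_import('clr'))
```

A result of False would suggest that the two binaries are not in a directory that Python searches for modules.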

If you're going to PyCon this March and are interested in learning more about Python.NET, then come to my talk! [5]

[1]Python.NET works with CPython, the default implementation of Python. It's not to be confused with IronPython, which is implemented in .NET.
[2]For example, importing System.Data is a pain if you have both .NET 1.1 and .NET 2.0 installed.
[3]Because Python.NET was originally compiled using Visual C# 2005, you will be prompted with the Visual Studio Conversion Wizard dialog. You should go ahead and convert the project.
[4]Go to the Properties -> Build tab. Change the Configuration combo box to Release. Then change the value inside the Conditional compilation symbols field. For example, PYTHON24,UCS4 will build a binary for Python 2.4 with UCS4. For more details about the difference between UCS2 and UCS4, see the readme.html file inside the doc folder.
[5]According to the official schedule, my talk will be held on Friday, March 14 at 2:45 pm.

Monday, February 11, 2008

Converting from Local Time to UTC

OK, this problem took me way, WAY too long to figure out, so I'll post it here for future reference. I was struggling with the following question: How do I convert back and forth between local time and Coordinated Universal Time (UTC)?

The solution is actually fairly simple:

import calendar
import time

def local_to_utc(t):
    """Make sure that the dst flag is -1 -- this tells mktime to take daylight
    savings into account"""
    secs = time.mktime(t)
    return time.gmtime(secs)

def utc_to_local(t):
    secs = calendar.timegm(t)
    return time.localtime(secs)

OK, simple enough, but I think it's kind of weird -- why the heck do I have to use a function from the calendar module to convert from UTC to local time? Why is the timegm function in calendar instead of time?
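To make the usage concrete, here is a small round-trip sketch in Python 3 syntax. The two functions are repeated so the snippet runs on its own, and the sample date is an arbitrary assumption chosen to be well away from any DST change. The printed UTC value depends on the timezone of the machine running it.

```python
import calendar
import time

def local_to_utc(t):
    # mktime interprets t as local time and returns seconds since the epoch
    return time.gmtime(time.mktime(t))

def utc_to_local(t):
    # timegm is the inverse of gmtime: it interprets t as UTC
    return time.localtime(calendar.timegm(t))

# strptime leaves tm_isdst at -1, so mktime works out DST by itself
local = time.strptime('2008-01-15 12:00', '%Y-%m-%d %H:%M')
utc = local_to_utc(local)
back = utc_to_local(utc)

print(time.strftime('%Y-%m-%d %H:%M', utc))
# Compare year, month, day, hour, minute, second after the round trip
print(back[:6] == local[:6])
```

For a local time that exists exactly once, converting to UTC and back should give the original fields again.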

Even though my solution looked correct (as per my understanding from reading the Python docs), I decided I needed to do a sanity check. I compared the results of my local_to_utc function with .NET's DateTime.ToUniversalTime method. Meanwhile, I also compared my utc_to_local with .NET's DateTime.ToLocalTime method.

Frankly, I was surprised by the results of my first comparison:

   Local time           Python UTC            .NET UTC
----------------     ----------------     ----------------
2000-03-12 02:00     2000-03-12 07:00     2000-03-12 08:00
2001-03-11 02:00     2001-03-11 07:00     2001-03-11 08:00
2002-03-10 02:00     2002-03-10 07:00     2002-03-10 08:00
2003-03-09 02:00     2003-03-09 07:00     2003-03-09 08:00
2004-03-14 02:00     2004-03-14 07:00     2004-03-14 08:00
2005-03-13 02:00     2005-03-13 07:00     2005-03-13 08:00
2006-03-12 02:00     2006-03-12 07:00     2006-03-12 08:00
2007-03-11 02:00     2007-03-11 07:00     2007-03-11 08:00
2008-03-09 02:00     2008-03-09 07:00     2008-03-09 08:00
2009-03-08 02:00     2009-03-08 07:00     2009-03-08 08:00
2010-03-14 02:00     2010-03-14 07:00     2010-03-14 08:00

The table above shows that Python and .NET disagree on a single hour in March of every year. That particular hour is always 2 AM on the start of Daylight Savings Time [1].

I was surprised yet again when I ran the UTC to local comparison. Having already witnessed the one-hour-per-year discrepancy, I figured I would see something similar in this comparison. But there were no discrepancies at all! So I won't bother showing the table.

When I thought about it, though, this all makes sense. That 2 AM on the start of DST is a magical date/time. From our perspective, it's 1:59 AM and then the minute hand crosses over the 12, and all of a sudden it's 3 AM. So it's like that 2 AM never existed at all. Since this date/time doesn't exist in local time, there shouldn't be a way to convert to UTC at all. Technically, the Python function should have returned None, and .NET's DateTime.ToUniversalTime method should have returned null. But instead of doing that, the designers decided to just fudge it and return a UTC value that was somewhere in the ballpark. And since it was two different designers, they chose two different values to fudge on.

There are no discrepancies in the conversion from UTC to local time because here, there are no values to fudge. UTC doesn't have magical date/time values. It just keeps ticking along, and never misses a beat.
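If you want to detect those magical local times yourself, one approach is to push the value through mktime and localtime and see whether it comes back unchanged. The sketch below (Python 3 syntax) rests on several assumptions: `exists_locally` is a hypothetical helper name, the timezone is pinned to US Central time with a POSIX TZ rule so the result does not depend on the machine, and time.tzset() is only available on Unix-like systems. On Windows you would drop those two lines and rely on the system timezone.

```python
import os
import time

# Assumption: US Central time, DST from the second Sunday of March
# to the first Sunday of November, expressed as a POSIX TZ rule
os.environ['TZ'] = 'CST6CDT,M3.2.0,M11.1.0'
time.tzset()

def exists_locally(t):
    """Return True if struct_time t survives a mktime/localtime round trip."""
    return time.localtime(time.mktime(t))[:5] == t[:5]

# 02:30 on the day DST starts falls inside the skipped hour
skipped = time.strptime('2008-03-09 02:30', '%Y-%m-%d %H:%M')
normal = time.strptime('2008-03-09 04:30', '%Y-%m-%d %H:%M')

print(exists_locally(skipped))
print(exists_locally(normal))
```

Whichever way the C library decides to fudge the skipped hour, the value that comes back from localtime lands outside it, so the comparison fails for the nonexistent time and succeeds for the ordinary one.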

I hope this helps other people out. I googled for this information and didn't find any results that clearly explained how to convert in both directions. As for the whole magic local time issue, I think it's not a big deal as long as you're aware of it. The designers of both libraries probably figured that most people would not bother to check if a date conversion function was returning None.

[1]Daylight Savings Time is set individually by each country. Here in the US, our daylight savings period begins at 2:00 am of the second Sunday of March, and ends at 2:00 am of the first Sunday of November. The length of DST was actually changed in 2007 in an attempt to reduce energy consumption.

Friday, October 26, 2007

Why .NET Programmers Should Care About Python

OK, so you're a hotshot C# programmer (or VB, JScript, etc). Your skills are in demand, and you have a metric ton of tools at your disposal. XML, database, GUI, web, networking, threading, graphics, it's all there, and the documentation is pretty decent too, so you don't have to spend an eternity figuring out how to use it. Even more esoteric stuff isn't so hard for you. Well, maybe you have to do some extra reading but you're pretty sure you can tackle game programming, message-passing concurrency, handwriting recognition, OCR, speech recognition, and even robotics. So life is good.

But you have this one coworker who just won't shut up about this weird little language he's always using. He says that when he programs, he doesn't use an IDE, he doesn't declare types, he doesn't use curly braces to delimit blocks of code, and he never compiles anything. "WTF?" you say. "Who would want to use a language like that?" Your coworker gets a sneaky little smile on his face as if that's just what he was waiting to hear. He pulls out a crinkly piece of paper with a list of names written on it and hands it to you. You recognize every name on that paper, and it's not because they're people you know. They're the names of companies that everyone knows.

Google. Microsoft. VMWare. Nokia. HP. Cisco. Sony Imageworks. Canonical. Philips. Honeywell. And the list just goes on and on and on. "What language did you say you're using?" you ask. Your coworker stands up like a bolt, throws off his sweater vest, rips off his Simpsons T-shirt in a bad imitation of Hulk Hogan, and proceeds to point triumphantly with both index fingers at a single word tattooed on his pasty, hairless chest. It says: Python. Under that there's a teeny little cartoon snake and under that some English guy with a mustache. And on his navel you see...

OK, enough with the story. It's getting really weird anyway. The point is, companies around the world are using Python every day to make their products and deliver value to their customers. That means that smart people, people like you, are using Python and doing amazing things with it. And they are doing these amazing things much more efficiently than you might suspect...

OK, stop with the marketing talk and just show some code already. The following code is supposed to get customer information from an XML file and print the names and emails of customers who joined in 2006 or later. A snippet of that XML file might look like this:

<customer>
    <givenName>Greg</givenName>
    <familyName>Rucka</familyName>
    <contact email="agent001@queenandcountry.com"
             phone="986.445.1200" />
    <memberSince>1994-01-03</memberSince>
</customer>

We want some output that looks like this:

Ang Lee <director@lustcaution.net>
Hayao Miyazaki <porco@ghibli.co.jp>
Joe Armstrong <joe@erlang.org>

The C# code to handle this task would look something like:

using System;
using System.Xml.XPath;

public class Task
{
    public static void Main()
    {
        XPathDocument xpd = new XPathDocument("customers.xml");
        XPathNavigator nav = xpd.CreateNavigator();

        foreach (XPathNavigator customer in nav.Select("//customer")) {
            DateTime memberSince = DateTime.Parse(
              customer.SelectSingleNode("memberSince").Value);

            string givenName = customer.SelectSingleNode("givenName").Value;
            string familyName = customer.SelectSingleNode("familyName").Value;
            string email = customer.SelectSingleNode("contact/@email").Value;

            if (memberSince.Year >= 2006) {
                Console.WriteLine("{0} {1} <{2}>", givenName, familyName,
                    email);
            }
        }
    }
}

The equivalent Python code would be:

import clr
from System import DateTime
import amara

doc = amara.parse('customers.xml')

for customer in doc.xml_xpath('//customer'):
    memberSince = DateTime.Parse(str(customer.memberSince))

    if memberSince.Year >= 2006:
        print "%s %s <%s>" % (customer.givenName, customer.familyName,
                              customer.contact['email'])

Clearly, the Python code is shorter and more understandable ;-) If you don't believe me, take a look at the two code samples side by side. We'll go over the code line by line so you understand what's going on, but first I want to mention that the Python code doesn't need to be compiled. If you are using a Python IDE (like IDLE), you can execute the script and get your result right away!

The first three lines import the libraries and classes that we need. You can tell that Python's import statement is roughly equivalent to .NET's using statement. Except in Python we import modules, not namespaces (the difference will be explained below).

import clr

Import the clr module, which gives us access to the .NET classes.

from System import DateTime

Import the DateTime class from the System module. This is the same .NET DateTime class you know and love.

import amara

Import the amara module, which contains classes and functions for handling XML. Note that this statement does NOT import everything under the amara module, it just imports the amara module itself. Also, the amara module is not included with Python, it is actually a third party module that you can download here. Amara is similar to other DOM-based XML libraries, except much easier to use than most.

doc = amara.parse('customers.xml')

This line creates an object named doc by calling the parse() function in the amara module. The big difference between modules in Python and namespaces in C# is that modules are objects, and can contain attributes and functions just like any other object.
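A quick way to see that modules really are objects is to poke at one from the standard library. This sketch uses the math module purely as a stand-in (it has nothing to do with amara) and Python 3 syntax.

```python
import math
import types

# A module is an instance of an ordinary type
print(isinstance(math, types.ModuleType))

# Its functions are attributes that can be looked up by name at runtime
sqrt = getattr(math, 'sqrt')
print(sqrt(16))

# Like any object, a module can be bound to another name or passed around
alias = math
print(alias.pi == math.pi)
```

None of this has a direct counterpart for a C# namespace, which exists only as a naming scope at compile time.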

for customer in doc.xml_xpath('//customer'):

This is a loop using the for keyword, which is similar to C#'s foreach keyword. The doc.xml_xpath('//customer') expression returns all the customer nodes inside doc. Each customer node will be bound to the variable customer.

memberSince = DateTime.Parse(str(customer.memberSince))

The str() function converts any object to a string. The expression customer.memberSince refers to the customers/customer/memberSince node in the DOM. Finally, we call the DateTime.Parse() method we know and love from .NET.

if memberSince.Year >= 2006:

This is just a simple conditional that checks if the Year property of memberSince is greater than or equal to 2006.

print "%s %s <%s>" % (customer.givenName,
                      customer.familyName,
                      customer.contact['email'])

The print statement is very similar to C's printf() function. The interpolation syntax is identical, and the usage only differs in that the values to be interpolated are listed after the % symbol. The customer.contact['email'] expression refers to the customers/customer/contact/@email node in the DOM.

From that one example, you know almost all you possibly need to know to make effective use of amara. Python has great libraries of its own, which are worth learning because of their simplicity and flexibility. But even if you're too busy to learn them, it doesn't matter because Python allows you to leverage all of your .NET API knowledge. Here's the same example in Python, but using only .NET libraries:

import clr
from System import DateTime
from System.Xml.XPath import XPathDocument

nav = XPathDocument('customers.xml').CreateNavigator()

for customer in nav.Select('//customer'):
    memberSince = DateTime.Parse(
        customer.SelectSingleNode('memberSince').Value)

    if memberSince.Year >= 2006:
        print "%s %s <%s>" % (
            customer.SelectSingleNode('givenName').Value,
            customer.SelectSingleNode('familyName').Value,
            customer.SelectSingleNode('contact/@email').Value)
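For completeness, here is a sketch of the same task using nothing but the Python standard library (xml.etree.ElementTree and datetime), written in Python 3 syntax. Several things in it are assumptions: the `customers` root element is inferred from the paths mentioned earlier, the XML is embedded as a string instead of read from customers.xml, the second customer's memberSince date is made up for illustration, and `recent_customers` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical sample data; only the first customer comes from the snippet above
SAMPLE = """<customers>
  <customer>
    <givenName>Greg</givenName>
    <familyName>Rucka</familyName>
    <contact email="agent001@queenandcountry.com" phone="986.445.1200" />
    <memberSince>1994-01-03</memberSince>
  </customer>
  <customer>
    <givenName>Ang</givenName>
    <familyName>Lee</familyName>
    <contact email="director@lustcaution.net" />
    <memberSince>2007-10-26</memberSince>
  </customer>
</customers>"""

def recent_customers(xml_text, since_year=2006):
    """Return 'Given Family <email>' lines for customers who joined in since_year or later."""
    root = ET.fromstring(xml_text)
    lines = []
    for customer in root.iter('customer'):
        joined = datetime.strptime(customer.findtext('memberSince'), '%Y-%m-%d')
        if joined.year >= since_year:
            lines.append('%s %s <%s>' % (
                customer.findtext('givenName'),
                customer.findtext('familyName'),
                customer.find('contact').get('email')))
    return lines

for line in recent_customers(SAMPLE):
    print(line)
```

With this sample data only the second customer passes the 2006 cutoff.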

So if you've stayed with me all the way then you might have these questions in mind:

  • Python looks interesting, but will it really make my programming easier?
  • How do I get started learning Python syntax?
  • How do I get started learning how to use Python with .NET?

All these questions will be answered in time, my friend. But this blog post has gotten rather long, and I need to go to bed.

.. entry_id:: tag:blogger.com,1999:blog-5424252364534723300.post-3233371896959427476

Why .NET Programmers Should Care About Python
=============================================
.. labels:: python, dotnet

OK, so you're a hotshot C# programmer (or VB, JScript, etc). Your skills are in demand, and you have a metric ton of tools at your disposal. XML, database, GUI, web, networking, threading, graphics, it's all there, and the documentation is pretty decent too, so you don't have to spend an eternity figuring out how to use it. Even more esoteric stuff isn't so hard for you. Well, maybe you have to do some extra reading but you're pretty sure you can tackle `game programming`_, `message-passing concurrency`_, `handwriting recognition`_, OCR_, `speech recognition`_, and even robotics_. So life is good.

.. _game programming: http://en.wikipedia.org/wiki/Microsoft_XNA
.. _message-passing concurrency: http://en.wikipedia.org/wiki/Concurrency_and_Coordination_Runtime
.. _handwriting recognition: http://www.codeproject.com/mobilepc/StrokeViewer.asp
.. _OCR: http://www.codeproject.com/office/modi.asp
.. _speech recognition: http://en.wikipedia.org/wiki/Speech_Application_Programming_Interface
.. _robotics: http://en.wikipedia.org/wiki/Robotics_Studio

But you have this one coworker who just won't shut up about this weird little language he's always using. He says that when he programs, he doesn't use an IDE, he doesn't declare types, he doesn't use curly braces to delimit blocks of code, and he never compiles anything. &quot;WTF?&quot; you say. &quot;Who would want to use a language like that?&quot; Your coworker gets a sneaky little smile on his face as if that's just what he was waiting to hear. He pulls out a crinkly piece of paper with a list of names written on it and hands it to you. You recognize every name on that paper, and it's not because they're people you know. They're the names of companies that everyone knows. 

Google. Microsoft. VMWare. Nokia. HP. Cisco. Sony Imageworks. Canonical. Philips. Honeywell. And the list just goes on and on and on. "What language did you say you're using?" you ask. Your coworker stands up like a bolt, throws off his sweater vest, rips off his Simpsons T-shirt in a bad imitation of Hulk Hogan, and proceeds to point triumphantly with both index fingers at a single word tattooed on his pasty, hairless chest. It says: Python. Under that there's a teeny little cartoon snake and under that some English guy with a mustache. And on his navel you see...

OK, enough with the story. It's getting really weird anyway. The point is, companies around the world are using Python everyday to make their products and deliver value to their customers. That means that smart people, people like you, are using Python and doing amazing things with it. And they are doing these amazing things much more efficiently than you might suspect...

OK, stop with the marketing talk and just show some code already. The following code is supposed to get customer information from an XML file and print the names and emails of customers who joined in 2006 or later. A snippet of that XML file might look like this:

.. code:: xml

    <customer>
        <givenName>Greg</givenName>
        <familyName>Rucka</familyName>
        <contact email="agent001@queenandcountry.com"
                 phone="986.445.1200" />
        <memberSince>1994-01-03</memberSince>
    </customer>

We want some output that looks like this::

    Ang Lee <director@lustcaution.net>
    Hayao Miyazaki <porco@ghibli.co.jp>
    Joe Armstrong <joe@erlang.org>

The C# code to handle this task would look something like:

.. code:: c#

    using System;
    using System.Xml.XPath;

    public class Task
    {
        public static void Main()
        {
            XPathDocument xpd = new XPathDocument("customers.xml");
            XPathNavigator nav = xpd.CreateNavigator();

            foreach (XPathNavigator customer in nav.Select("//customer")) {
                DateTime memberSince = DateTime.Parse(
                  customer.SelectSingleNode("memberSince").Value);

                string givenName = customer.SelectSingleNode("givenName").Value;
                string familyName = customer.SelectSingleNode("familyName").Value;
                string email = customer.SelectSingleNode("contact/@email").Value;

                if (memberSince.Year >= 2006) {
                    Console.WriteLine("{0} {1} <{2}>", givenName, familyName,
                        email);
                }
            }
        }
    }

The equivalent Python code would be:

.. code:: python

    import clr
    from System import DateTime
    import amara

    doc = amara.parse('customers.xml')

    for customer in doc.xml_xpath('//customer'):
        memberSince = DateTime.Parse(str(customer.memberSince))

        if memberSince.Year >= 2006:
            print "%s %s <%s>" % (customer.givenName, customer.familyName,
                                  customer.contact['email'])


Clearly, the Python code is shorter and more understandable ;-) If you don't believe me, take a look at the two code samples `side by side`_. We'll go over the code line by line so you understand what's going on, but first I want to mention that the Python code doesn't need to be compiled. If you are using a Python IDE (like IDLE_), you can execute the script and get your result right away!

.. _side by side: http://feihong.hsu.googlepages.com/CSharpVsPython.html
.. _IDLE: http://en.wikipedia.org/wiki/IDLE_%28Python%29

The first three lines import the libraries and classes that we need. Python's ``import`` statement is roughly equivalent to C#'s ``using`` directive, except that in Python we import modules, not namespaces (the difference is explained below).

.. code:: python

    import clr
    
Import the clr module, which gives us access to the .NET classes.

.. code:: python 
    
    from System import DateTime
    
Import the DateTime class from the System module. This is the same .NET DateTime class you know and love.

.. code:: python
    
    import amara

Import the amara module, which contains classes and functions for handling XML. Note that this statement does NOT import everything under the amara module; it just imports the amara module itself. Also, the amara module is not included with Python; it is a third-party module that you can download here_. Amara is similar to other DOM-based XML libraries, except much easier to use than most.

.. _here: http://uche.ogbuji.net/tech/4suite/amara/

.. code:: python

    doc = amara.parse('customers.xml')

This line creates an object named ``doc`` by calling the ``parse()`` function in the amara module. The big difference between modules in Python and namespaces in C# is that modules are objects, and can contain attributes and functions just like any other object.
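
To make the "modules are objects" point concrete, here is a minimal sketch. It uses the standard ``math`` module as a stand-in (an assumption on my part, chosen so that it runs without amara or .NET installed), and it should behave the same on plain CPython:

```python
import math
import types

# A module is an ordinary object with a type, just like any other value.
print(type(math) is types.ModuleType)

# Functions live on the module as attributes, so getattr() can fetch them.
sqrt = getattr(math, "sqrt")
print(sqrt(16.0))

# A module can be bound to another name and passed around like any object.
m = math
print(m.sqrt is math.sqrt)
```

A C# namespace can't be stored in a variable or handed to a function like this, which is the practical difference the paragraph above is getting at.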

.. code:: python

    for customer in doc.xml_xpath('//customer'):

This is a loop using the ``for`` keyword, which is similar to C#'s ``foreach`` keyword. The ``doc.xml_xpath('//customer')`` expression returns all the customer nodes inside ``doc``. Each customer node will be bound to the variable ``customer``.

.. code:: python

    memberSince = DateTime.Parse(str(customer.memberSince))
    
The ``str()`` function converts any object to a string. The expression ``customer.memberSince`` refers to the ``customers/customer/memberSince`` node in the DOM. Finally, we call the ``DateTime.Parse()`` method we know and love from .NET.
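
If you aren't running on .NET, Python's own ``datetime`` module can do the same job. This is only a sketch: ``parse_member_since`` is a hypothetical helper name, and it assumes the dates always look like the ``YYYY-MM-DD`` value in the XML snippet, whereas ``DateTime.Parse()`` accepts many more formats.

```python
from datetime import datetime

def parse_member_since(text):
    # Hypothetical helper: parses only the YYYY-MM-DD form shown in the
    # sample XML. Surrounding whitespace is stripped before parsing.
    return datetime.strptime(text.strip(), "%Y-%m-%d")

member_since = parse_member_since("1994-01-03")

# Python's datetime uses a lowercase .year attribute where .NET has .Year.
print(member_since.year)
print(member_since.year >= 2006)
```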

.. code:: python

    if memberSince.Year >= 2006:

This is just a simple conditional that checks if the ``Year`` property of ``memberSince`` is greater than or equal to 2006.

.. code:: python

    print "%s %s <%s>" % (customer.givenName, 
                          customer.familyName,
                          customer.contact['email'])
                          
The ``print`` statement, combined with the ``%`` operator, works much like C's ``printf()`` function. The format specifiers follow the same conventions, and the usage differs only in that the values to be interpolated are listed after the ``%`` symbol. The ``customer.contact['email']`` expression refers to the ``customers/customer/contact/@email`` node in the DOM.
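
The interpolation is actually done by the ``%`` operator, so you can build the string first and print it later. Here's a small sketch using the values from the XML snippet above; ``format_customer`` is a hypothetical helper name, not part of amara or .NET:

```python
def format_customer(given_name, family_name, email):
    # The % operator fills each %s with the matching value from the tuple.
    return "%s %s <%s>" % (given_name, family_name, email)

line = format_customer("Greg", "Rucka", "agent001@queenandcountry.com")
print(line)
```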


From that one example, you know almost everything you need to make effective use of amara. Python has great libraries of its own, which are worth learning because of their simplicity and flexibility. But even if you're too busy to learn them, it doesn't matter because Python allows you to leverage all of your .NET API knowledge. Here's the same example in Python, but using only .NET libraries:

.. code:: python

    import clr
    from System import DateTime
    from System.Xml.XPath import XPathDocument

    nav = XPathDocument('customers.xml').CreateNavigator()

    for customer in nav.Select('//customer'):
        memberSince = DateTime.Parse(
            customer.SelectSingleNode('memberSince').Value)

        if memberSince.Year >= 2006:
            print "%s %s <%s>" % (
                customer.SelectSingleNode('givenName').Value,
                customer.SelectSingleNode('familyName').Value,
                customer.SelectSingleNode('contact/@email').Value)
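
And in case you have neither amara nor .NET handy, the same task can be sketched with nothing but Python's standard library. Treat this as an illustration rather than a drop-in replacement: the ``recent_customers`` function is a hypothetical name, the XML is embedded as a string instead of being read from ``customers.xml``, the ``<customers>`` root element is inferred from the paths mentioned earlier, and the second customer's join date is made up for the demo.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Sample data for illustration only. The first record is the snippet from
# this post; the second record's memberSince date is invented for the demo.
SAMPLE = """<customers>
    <customer>
        <givenName>Greg</givenName>
        <familyName>Rucka</familyName>
        <contact email="agent001@queenandcountry.com"
                 phone="986.445.1200" />
        <memberSince>1994-01-03</memberSince>
    </customer>
    <customer>
        <givenName>Joe</givenName>
        <familyName>Armstrong</familyName>
        <contact email="joe@erlang.org" />
        <memberSince>2006-07-15</memberSince>
    </customer>
</customers>"""

def recent_customers(xml_text, since_year=2006):
    """Return 'Given Family <email>' lines for customers who joined in
    since_year or later. Assumes memberSince is always YYYY-MM-DD."""
    root = ET.fromstring(xml_text)
    lines = []
    # iter() walks every <customer> element, much like '//customer'.
    for customer in root.iter("customer"):
        joined = datetime.strptime(
            customer.findtext("memberSince").strip(), "%Y-%m-%d")
        if joined.year >= since_year:
            lines.append("%s %s <%s>" % (
                customer.findtext("givenName"),
                customer.findtext("familyName"),
                customer.find("contact").get("email")))
    return lines

for line in recent_customers(SAMPLE):
    print(line)
```

Only the customer who joined in 2006 or later gets printed. The shape is the same as the .NET version: select the customer nodes, parse the date, filter, and format.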

So if you've stayed with me all the way, you might have these questions in mind:

- Python looks interesting, but will it really make my programming easier?
- How do I get started learning Python syntax?
- How do I get started learning how to use Python with .NET?

All these questions will be answered in time, my friend. But this blog post has gotten rather long, and I need to go to bed.

Monday, March 12, 2007

Unicode Talk

On March 8, 2007 I gave a presentation on Unicode to the Chicago Python Users Group. Unlike most talks on Unicode, mine was geared for small children.

Anyway, here are the downloads for the talk in various formats:

  • OpenOffice Impress (this is the best version to look at, if you have OpenOffice installed)
  • PDF (my notes are embedded into the PDF, but you have to scroll to the end to see them)
  • HTML (warning: the "horse vs unicode" and "ISO8859 vs unicode" tables don't show up)

Also, here are the demos associated with the talk. I didn't have time to show any of them, but hopefully the comments inside the source files are pretty understandable.