
Python Gotchas

Updated: 2005-08-15
Stephen Ferg



This work is licensed under the Creative Commons Attribution 2.0 License. You are free to copy, distribute, and display the work, and to make derivative works (including translations). If you do, you must give the original author credit. The author specifically permits (and encourages) teachers to post, reproduce, and distribute some or all of this material for use in their classes or by their students.


About this Page

What is a "gotcha"?

The word "gotcha" started out as the expression "Got you!" This is something that someone who speaks idiomatic American English might say when he succeeds in playing a trick or prank on someone else. "I really got you with that trick!"

The expression "Got you!" is pronounced "Got ya!" or "Got cha!".

Among computer programmers, a "gotcha" has become a term for a feature of a programming language that is likely to play tricks on you by displaying behavior that is different from what you expect.

Just as a fly or a mosquito can "bite" you, we say that a gotcha can "bite" you.

This is a page devoted to Python "gotchas". Python is a very clean and intuitive language, so it hasn't got many gotchas, but it still has a few that often bite beginning Python programmers. My hope is that if you are warned in advance about these gotchas, you won't be bit quite so hard!

Note that a gotcha isn't necessarily a problem in the language itself. Rather, it is a situation in which there is a mismatch between the programmer's expectations of how the language will work, and the way the language actually does work. Often, the source of a gotcha lies not in the language, but in the programmer. Part of what creates a programmer's expectations is his own personal background. A programmer with a Windows or mainframe background, or a background in COBOL or the Algol-based family of languages (PL/1, Pascal, etc.), is especially prone to experiencing gotchas in Python, a language that evolved in a Unix environment and incorporates a number of conventions of the C family of programming languages (C, C++, Java).

If you're such a programmer, don't worry. There aren't many Python gotchas. Keep learning Python. It is a great language, and you'll soon come to love it.


1 Backslashes are escape characters

This is a language feature that is so common on Unix that Unix programmers never think twice about it. Certainly, a Unix programmer would never consider it to be a gotcha. But for someone coming from a Windows background, it may very well be unfamiliar.

The gotcha may occur when you try to code a Windows filename like this:

myFilename = "c:\newproject\typenames.txt"
myFile = open(myFilename, "r")

and — even though the input file exists — when you run your program, you get the error message

IOError: [Errno 2] No such file or directory: 'c:\newproject\typenames.txt'

To find out what's going on, you put in some debugging code:

myFilename = "c:\newproject\typenames.txt"
print "(" + myFilename + ")"

And what you see printed on the console is:

(c:
ewproject       ypenames.txt)

What has happened is that you forgot that in Python (as in most languages that evolved in a Unix environment) in quoted string literals the backslash has the magical power of an escape character. This means that a backslash isn't interpreted as a backslash, but as a signal that the next character is to be given a special interpretation. So when you coded

myFilename = "c:\newproject\typenames.txt"

the "\n" that begins "\newproject" was interpreted as the newline character, and the "\t" that begins "\typenames.txt" was interpreted as the tab character. That's why, when you printed the filename, you got the result that you did. And it is why Python couldn't find your file — because no file with the name c:(newline)ewproject(tab)ypenames.txt could be found.

To put a backslash into a string, you need to code two backslashes — that is, the escape character followed by a backslash. So to get the filename that you wanted, you needed to code

myFilename = "c:\\newproject\\typenames.txt"

And under some circumstances, if Python prints information to the console, you will see the two backslashes rather than one. For example, this is part of the difference between the repr() function and the str() function.

myFilename = "c:\\newproject\\typenames.txt"
print repr(myFilename), str(myFilename)

produces

'c:\\newproject\\typenames.txt' c:\newproject\typenames.txt

Escaped characters are documented in the Python Language Reference Manual. If they are new to you, you will find them disconcerting for a while, but you will gradually grow to appreciate their power.
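As a quick check of the rules above, here is a minimal sketch that compares the unescaped and the escaped literal. It is written in Python 3 syntax (an assumption, since the examples on this page use Python 2), and the variable names are illustrative only.

```python
# The single-backslash literal silently turns "\n" and "\t" into
# a newline and a tab, so it is two characters shorter.
broken = "c:\newproject\typenames.txt"
correct = "c:\\newproject\\typenames.txt"

has_newline = "\n" in broken            # "\n" became a newline character
has_tab = "\t" in broken                # "\t" became a tab character
backslash_count = correct.count("\\")   # each "\\" is one real backslash

print(len(broken), len(correct))
```

Counting characters like this is often a faster way to spot a swallowed backslash than reading the printed output.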

2 "raw" strings and backslashes (when dealing with Windows filenames)

Once upon a time there was a beautiful Windows programmer named Red Ridinghood.

One day, Red's supervisor told her that they were going to start building a new application called GrandmasHouse. The feature list for the application was so long that they would never have attempted to get to GrandmasHouse if they hadn't learned about a shortcut through Python Woods that would make the journey much shorter.

So Red started working her way through Python, and indeed found the going quick and easy. She loved the woods, and was happy to be travelling in them.

There was only one problem. Her programs did a lot of file manipulation, and so she had to do a lot of coding of filenames. Windows filenames used a backslash as a separator, but within Python the backslash had the magic power of an escape character, so every time she wanted a backslash in a filename she had to code two backslashes, like this:

myServer = "\\\\aServer" # ==> \\aServer
myFilename = myServer + "\\aSharename\\aDirName\\aFilename"

This feature of Python got very old very quickly. Red started calling it The Wolf, and it was the one part of Python that she hated.

One day as she was walking through the forest, she came to a clearing. In the clearing was a charming little pub, and inside the pub she met a tall, dark, and handsome stranger named Rawstrings.

Rawstrings said he could save her from The Wolf. All she had to do, he said, was to put an "r" in front of her quoted string literals. This would change them from escaped strings into raw strings. The backslash would lose its powers as an escape character, and become just an ordinary character. For example, with raw strings, you could code r"\t" and you wouldn't get a string containing a single tab character; you would get a string containing the backslash character followed by "t".

So Red, instead of coding

myServer = "\\\\aServer"

could just code

myServer = r"\\aServer"

Red was seduced by the things that Rawstrings was telling her, and she began to spend a lot of time in his company.

Then one day, she coded

myDirname = r"c:\aDirname\"

and her program blew up with the following message:

myDirname = r"c:\aDirname\"
                         ^
SyntaxError: invalid token

After some experimenting, she discovered that — contrary to what Rawstrings had told her — the backslash seemingly hadn't lost all of its magic powers after all. For example, she could code:

aString = r"abc\"xyz"; print aString

When she did this, it seemed perfectly legal. The double-quote just before "xyz" did not close the raw string at all. Somehow the backslash seemed to protect it — it wasn't recognized as the closing delimiter of the raw string, but was included in the string. When she coded print aString, she got

abc\"xyz

It was this protective character that the backslash had acquired that made

myDirname = r"c:\aDirname\"

blow up. The final backslash was protecting the closing double-quote, so it was not being recognized as a closing quote. And since there was nothing after the double-quote, the raw string was not closed, and she got an error. She tried coding the raw string with two backslashes at the end — as if the backslash was an escape character —

myDirname = r"c:\aDirname\\"

but that didn't do it either. Instead of getting the single closing backslash that she wanted, she got two backslashes at the end:

c:\aDirname\\

She was in despair. She couldn't figure out any way to use raw strings to put a single backslash at the end of a string, and she didn't want to have to go back to fighting The Wolf.

Fortunately, at this point she confided her troubles to Paul Woodman, a co-worker who had started exploring Python a few months earlier. Here is what he told her.

In raw strings, backslashes do not have the magical power of escape characters that they do in regular strings. But they don't lose all of their magical powers.

In raw strings — as you discovered — backslashes have the magical power of protection characters. Basically, this means that a backslash protects any character that follows it from being recognized as the closing delimiter of the raw string.

Coming from a Windows programming background, you assumed that support for raw strings was a feature whose purpose was to make the work of coding Windows filenames easier by removing the magical escape character powers from the backslash. And you were surprised to discover that raw strings aren't truly raw in the way that you expected — raw in the sense that the backslash had no magical powers.

The reason for the special powers of backslashes in raw strings is that — contrary to what you assumed — raw strings were not developed to make it easier for Windows programmers to code filenames containing backslash characters. In fact, raw strings were originally developed to make the work of coding regular expressions easier. In raw strings, the backslash has the magical power of a protection character because that is just the kind of behavior it needs to have in order to make it easier to code regular expressions. The feature that you can't end a raw string with a single backslash is not a bug. It is a feature, because it is not legal to end a regular expression with a single backslash (or an odd number of backslashes).

Unfortunately for you, this power makes it impossible to create a raw string that ends in a single backslash, or in an odd number of backslashes. So raw strings won't do what you want them to, namely save you from The Wolf.
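The behavior Paul describes can be checked with a short sketch. It uses Python 3 syntax (an assumption; the page's own examples are Python 2), and the helper name compiles is hypothetical, introduced only for this illustration.

```python
def compiles(source):
    """Return True if the given source text is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

tab_literal = r"\t"          # backslash followed by "t": two characters
protected = r"abc\"xyz"      # the backslash stays in the string, and so does the quote
doubled = r"c:\aDirname\\"   # ends with two backslashes, not one

# Source text for:  x = r"c:\aDirname\"   (a raw string ending in one backslash)
trailing_single = 'x = r"c:\\aDirname\\"'
trailing_ok = compiles(trailing_single)   # the closing quote is protected, so this fails
```

The last line asks Python to compile the offending statement as text, which lets the sketch show the syntax error without blowing up itself.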

But don't despair! There is a way...

In Python, there are a number of functions in the os.path module that change forward slashes in a string to the appropriate filename separator for the platform that you are on. One of these functions is os.path.normcase(). The trick is to enter all of your filename strings using forward slashes, and then let os.path.normcase() change them to backslashes for you, like this. (Be aware that on Windows normcase() also converts the filename to lowercase. If you need to preserve case, os.path.normpath() converts the slashes without changing the case.)

import os
myDirname = os.path.normcase("c:/aDirname/")

It takes a bit of practice to get into the habit of specifying filenames this way, but you'll find that you adapt to it surprisingly easily, and you'll find it a lot easier than struggling with The Wolf.

Red was super happy to hear this. She transferred to Woodman's project team, and they all coded happily ever after!
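Here is a minimal sketch of Paul's suggestion, in Python 3 syntax. It assumes you want to see the Windows behavior regardless of the platform you run it on, so it calls the standard library's ntpath module directly (on Windows, os.path is ntpath). The filenames are the illustrative ones from the story.

```python
import ntpath  # the Windows implementation of os.path, importable on any platform

# Forward slashes become backslashes; normcase() also lowercases the name.
lowered = ntpath.normcase("c:/aDirname/")

# normpath() converts the slashes and preserves the case.
normalized = ntpath.normpath("c:/aDirname/aFilename")

print(lowered)
print(normalized)
```

In real code you would normally call os.path.normcase() or os.path.normpath() and let Python pick the right implementation for the platform.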

3 Integer division

This is the feature of Python that (along with case-sensitivity) most frequently bites students who are learning Python as their first programming language.

The gotcha is this: if you divide an integer by an integer, you get an integer. If the division produces a remainder, the result is rounded downward. (Note that this is true flooring, toward negative infinity, not truncation toward zero.)

print 5/3     # produces 1
print (-5)/3  # produces -2
print 5/3.0   # produces 1.66666666667, because 3.0 is a float, not an integer

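This behavior is called "classic" division. The idea that dividing an integer by an integer yields an integer is something that Python inherited from the C programming language (although C, since C99, rounds negative results toward zero rather than downward). What one would like, of course, is for Python division to behave more intuitively, like "true" division: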

print 5/3      # produces  1.66666666667
print (-5)/3   # produces -1.66666666667

Starting in Python 2.2, the process of slowly changing the behavior of the division operator began. First, a new floor operator // was introduced to provide division with downward rounding of the results. Second, a technique was introduced for a programmer to make the division operator behave in the more intuitive "true division" fashion. In order to get true division, you need to put this statement at the top of your module.

from __future__ import division

Python version 2.3 continues to use old-style classic division by default. If you run the interpreter with the -Q warn command-line option, it will also issue a warning whenever division is applied to two integers. You can use this to find code that's affected by the change and fix it. The fix, depending on what you want your program to be doing, will mean either changing the / division operator to the // floor operator, or adding the from __future__ import division statement to your module.

Eventually (in Python version 3) the new "true division" behavior for the division operator will become standard, and you won't need the from __future__ import division statement.

The bottom line is that you should not be running a version of Python older than 2.2, and should upgrade to 2.3 as soon as possible, and you should put from __future__ import division at the beginning of all of your modules.

For more information on this subject, see section 6 of What's New in Python 2.2.
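For comparison, here is a small sketch of how the operators behave once true division is the default. It is written for Python 3 (an assumption; the examples above are Python 2), where / always performs true division and // always rounds downward.

```python
import math

true_quotient = 5 / 3            # true division: a float close to 1.667
floor_positive = 5 // 3          # 1
floor_negative = (-5) // 3       # -2: rounded downward, toward negative infinity
truncated = math.trunc(-5 / 3)   # -1: truncation rounds toward zero instead

# divmod() returns the floored quotient and the matching remainder together.
quotient, remainder = divmod(-5, 3)   # (-2, 1), since -2 * 3 + 1 == -5
```

The difference between floor_negative and truncated is exactly the difference between flooring and truncation described at the start of this section.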

4 A comma at the end of a print statement prevents a newline from being written... but appears to write a trailing space.

It doesn't really write a trailing space -- it just causes an immediately subsequent print statement to write a leading space!

The Python Language Reference Manual says, about the print statement,

A "\n" character is written at the end, unless the print statement ends with a comma.

But it also says that if two print statements in succession write to stdout, and the first one ends with a comma (and so doesn't write a trailing newline), then the second one prepends a leading space to its output. (See section "6.6 The print statement" in the Python Reference Manual. Thanks to Marcus Rubenstein and Hans Meine for pointing this out to me.)

So

for i in range(10): print "*",
print

produces

* * * * * * * * * *

If you want to print a string without any trailing characters at all, your best bet is to use sys.stdout.write()

import sys
for i in range(10): sys.stdout.write("*")
sys.stdout.write("\n")

produces

**********
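The same idea can be wrapped in a small function. This is only a sketch in Python 3 syntax; write_stars is a hypothetical helper, and the stream parameter is an assumption added so that the output can be captured and inspected.

```python
import io
import sys

def write_stars(count, stream=sys.stdout):
    """Write `count` asterisks with nothing between them, then a newline."""
    stream.write("*" * count)
    stream.write("\n")

# Capture the output in memory to confirm there are no stray spaces.
buffer = io.StringIO()
write_stars(10, buffer)
captured = buffer.getvalue()
```

Because write() emits exactly the string it is given, captured contains ten asterisks and one newline, and nothing else.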

5 Omitting parentheses when invoking a method

In Python, omitting the trailing parentheses from the end of a method call (one that takes no arguments) is not a syntax error.

The place where this most frequently bites me is with the "close" method on file objects. Suppose you have an output file called "foo" and you want to close it. The correct way to do this is:

foo.close()

However, if you accidentally omit the trailing parentheses, and code this:

foo.close

Python will not report a syntax error, because this is not an error in Python. In Python, this is a perfectly legitimate statement that returns the method object "close". (Remember that methods are first-class objects in Python.) If you do this in the Python interpreter, you will get a message like this:

<built-in method close of file object at 0x007E6AE0>

The nastiness about this gotcha is that if you fail to code the trailing parentheses on a "close" method for an output file, the output file will not be closed properly. The file's output buffer will not be flushed out to disk, and the part of the output stream that was still left in the output buffer will be lost. After your program finishes, part of your output file will be missing, and you won't know why.

The best way of dealing with this gotcha is just to be aware that it can be a problem, and to be alert. Be careful to code the parentheses on your method calls, and especially careful to code them on calls to the "close" method of file objects.

And if you find yourself with an output file that seems to be inexplicably truncated, your first thought should be to check for missing parentheses in the file.close() statement that closes the file.

Programs like PyChecker and PyLint may be able to detect this kind of error, which is one good reason to use them.
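You can see the difference between naming a method and calling it with a short experiment. This sketch uses Python 3 syntax and an in-memory io.StringIO stream instead of a real file (both are assumptions made to keep the example self-contained); the closed attribute reports whether the stream has actually been closed.

```python
import io

stream = io.StringIO()
stream.write("some output")

stream.close                      # no parentheses: the method is named, not called
still_open = not stream.closed    # the stream is still open

stream.close()                    # with parentheses: the method is actually called
now_closed = stream.closed
```

The first statement is accepted without complaint and does nothing, which is exactly what makes this gotcha so hard to spot.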

6 Mutable defaults for function/method arguments

There's a Python gotcha that bites everybody as they learn Python. In fact, I think it was Tim Peters who suggested that every programmer gets caught by it exactly two times. It is called the mutable defaults trap. Programmers are usually bit by the mutable defaults trap when coding class methods, but I'd like to begin with explaining it in functions, and then move on to talk about class methods.

The gotcha occurs when you are coding default values for the arguments to a function or a method. Here is an example for a function named functionF:

def functionF(argString = "abc", argList = []):

Here's what most beginning Python programmers believe will happen when functionF is called without any arguments:

A new string object containing "abc" will be created and bound to the "argString" variable name. A new, empty list object will be created and bound to the "argList" variable name. In short, if the arguments are omitted by the caller, the functionF will always get "abc" and [] in its arguments.

This, however, is not what will happen. Here's why.

The objects that provide the default values are not created at the time that functionF is called. They are created at the time that the statement that defines the function is executed. If functionF, for example, is contained in a module named moduleM, then the statement that defines functionF will probably be executed at the time when moduleM is imported.

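When the def statement that creates functionF is executed, a new function object is created and bound to the name functionF. At the same time, the objects for the default values are created and stored with the function object: a string object containing "abc" as the default for argString, and an empty list object as the default for argList.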

After that, whenever functionF is called without arguments, argString will be bound to the default string object, and argList will be bound to the default list object. In such a case, argString will always be "abc", but argList may or may not be an empty list. Here's why.

There is a crucial difference between a string object and a list object. A string object is immutable, whereas a list object is mutable. That means that the default for argString can never be changed, but the default for argList can be changed.

Let's see how the default for argList can be changed. Here is a program. It invokes functionF four times. Each time that functionF is invoked it displays the values of the arguments that it receives, then adds something to each of the arguments.

def functionF(argString="abc", argList = []):
        print argString, argList
        argString = argString + "xyz"
        argList.append("F")

for i in range(4): functionF()

The output of this program is:

abc []
abc ['F']
abc ['F', 'F']
abc ['F', 'F', 'F']

As you can see, the first time through, the arguments have exactly the defaults that we expect. On the second and all subsequent passes, the argString value remains unchanged, just what we would expect from an immutable object. The line

argString = argString + "xyz"

creates a new object — the string "abcxyz" — and binds the name "argString" to that new object, but it doesn't change the default object for the argString argument.

But the case is quite different with argList, whose value is a list — a mutable object. On each pass, we append a member to the list, and the list grows. On the fourth invocation of functionF — that is, after three earlier invocations — argList contains three members.
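One way to watch the default object change is to inspect the function's __defaults__ attribute, which holds the default values as a tuple. The following is a minimal sketch in Python 3 syntax (an assumption; in Python 2 the same tuple is also available as func_defaults). The print statement from the program above is left out so that only the defaults are examined.

```python
def functionF(argString="abc", argList=[]):
    argString = argString + "xyz"   # rebinds the local name only
    argList.append("F")             # mutates the shared default list

# Copy the list so that later calls cannot change our snapshot.
list_before = list(functionF.__defaults__[1])

functionF()
functionF()

list_after = list(functionF.__defaults__[1])
string_after = functionF.__defaults__[0]
```

After two calls the stored default list has two members, while the stored default string is still "abc".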

6.1 The Solution

This behavior is not a wart in the Python language. It really is a feature, not a bug. There are times when you really do want to use mutable default arguments. One thing they can do (for example) is retain a list of results from previous invocations, something that might be very handy.

But for most programmers — especially beginning Pythonistas — this behavior is a gotcha. So for most cases we adopt the following rules.

  1. Never use a mutable object (that is, a list, a dictionary, or a class instance) as the default value of an argument.

  2. Ignore rule 1 only if you really, really, REALLY know what you're doing.

So... we plan always to follow rule #1. Now, the question is how to do it... how to code functionF in order to get the behavior that we want.

Fortunately, the solution is straightforward. The mutable objects used as defaults are replaced by None, and then the arguments are tested for None.

def functionF(argString="abc", argList = None):
        if argList is None: argList = []
        ...

Another solution that you will sometimes see is this:

def functionF(argString="abc", argList=None):
        argList = argList or []
        ...

This solution, however, is not equivalent to the first, and should be avoided. See Learning Python p. 123 for a discussion of the differences. Thanks to Lloyd Kvam for pointing this out to me.
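One difference between the two solutions shows up when the caller passes in an empty list of its own. The sketch below (Python 3 syntax; the function and variable names are hypothetical, chosen for this illustration) appends an item to the list that it is given.

```python
def append_with_is_none(item, target=None):
    if target is None:          # only a missing argument triggers the default
        target = []
    target.append(item)
    return target

def append_with_or(item, target=None):
    target = target or []       # ANY false value, including [], triggers the default
    target.append(item)
    return target

first_list = []
append_with_is_none("a", first_list)    # appends to the caller's list

second_list = []
append_with_or("a", second_list)        # appends to a brand-new list instead
```

With the "or" version the caller's empty list is silently replaced, so second_list is still empty after the call.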

And of course, in some situations the best solution is simply not to supply a default for the argument.

Now let's look at how the mutable arguments gotcha presents itself when a class method is given a mutable default for one of its arguments. Here is a complete program.

# define a class for company employees
class Employee:

        def __init__ (self, argName, argDependents=[]):
                # an employee has two attributes: a name, and a list of his dependents
                self.name = argName
                self.Dependents = argDependents

        def addDependent(self, argName):
                # an employee can add a dependent by getting married or having a baby
                self.Dependents.append(argName)

        def show(self):
                print
                print "My name is.......: ", self.name
                print "My dependents are: ", str(self.Dependents)


#---------------------------------------------------
#   main routine -- hire employees for the company
#---------------------------------------------------

# hire a married employee, with dependents
joe = Employee("Joe Smith", ["Sarah Smith", "Suzy Smith"])

# hire a couple of unmarried employees, without dependents
mike = Employee("Michael Nesmith")
barb = Employee("Barbara Bush")

# mike gets married and acquires a dependent
mike.addDependent("Nancy Nesmith")

# now have our employees tell us about themselves
joe.show()
mike.show()
barb.show()

Let's look at what happens when this program is run. First, the code that defines the Employee class is run. Then we hire Joe. Joe has two dependents, so that fact is recorded at the time that the joe object is created. Next we hire Mike and Barb. Then Mike acquires a dependent. Finally, the last three statements of the program ask each employee to tell us about himself. Here is the result.

My name is.......:  Joe Smith
My dependents are:  ['Sarah Smith', 'Suzy Smith']

My name is.......:  Michael Nesmith
My dependents are:  ['Nancy Nesmith']

My name is.......:  Barbara Bush
My dependents are:  ['Nancy Nesmith']

Joe is just fine. But somehow, when Mike acquired Nancy as his dependent, Barb also acquired Nancy as a dependent. This of course is wrong. And we're now in a position to understand what is causing the program to behave this way.

When the code that defines the Employee class is run, objects for the class definition, the method definitions, and the default values for each argument are created. The constructor has an argument argDependents whose default value is an empty list, so an empty list object is created and attached to the __init__ method as the default value for argDependents.

When we hire Joe, he already has a list of dependents, which is passed in to the Employee constructor, so the argDependents argument does not use the default empty list object.

Next we hire Mike and Barb. Since they have no dependents, the default value for argDependents is used. Remember — this is the empty list object that was created when the code that defined the Employee class was run. So in both cases, the empty list is bound to the argDependents argument, and then — again in both cases — it is bound to the self.Dependents attribute. The result is that after Mike and Barb are hired, the self.Dependents attribute of both Mike and Barb point to the same object — the default empty list object.

When Michael gets married, and Nancy Nesmith is added to his self.Dependents list, Barb also acquires Nancy as a dependent, because Barb's self.Dependents attribute is bound to the same list object as Mike's self.Dependents attribute.

So this is what happens when mutable objects are used as defaults for arguments in class methods. If the defaults are used when the method is called, different class instances end up sharing references to the same object.

And that is why you should never, never, NEVER use a list or a dictionary as a default value for an argument to a class method. Unless, of course, you really, really, REALLY know what you're doing.
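Applying the None technique from section 6.1 to the Employee class might look like the following sketch. It is a trimmed-down version of the program above in Python 3 syntax (an assumption), with the show method left out.

```python
class Employee:

    def __init__(self, argName, argDependents=None):
        self.name = argName
        # Create a fresh list for each employee when none is supplied.
        if argDependents is None:
            argDependents = []
        self.Dependents = argDependents

    def addDependent(self, argName):
        self.Dependents.append(argName)

mike = Employee("Michael Nesmith")
barb = Employee("Barbara Bush")

# mike gets married; barb's list is a different object and stays empty
mike.addDependent("Nancy Nesmith")
```

Each employee hired without dependents now gets a list of his own, so Nancy shows up only in Mike's list.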