Showing posts with label object-oriented-programming. Show all posts

Wednesday, August 13, 2014

Getting more familiar with Python Classes and Objects.


If you've been following this blog, you're aware that I feel unprepared to make good use of the Object Oriented Programming facilities of the Python programming language. Python is kind enough to allow programming in non-OOP styles, but Python's OOP features keep glaring at me, reminding me of what I don't comfortably know.


I posted a question on Quora.com asking for real world examples of multiple inheritance being used in Python programs. I was disappointed about how few answers came back. That dearth of response left me with the suspicion that I'm not the only person around who isn't completely comfortable with multiple inheritance. http://www.quora.com/What-is-a-real-world-example-of-the-use-of-multiple-inheritance-in-Python. Looking around at other multiple inheritance questions on Quora (http://www.quora.com/Why-is-there-multiple-inheritance-in-C++-but-not-in-Java), I see some reason to suspect that super serious application of OOP is just going to need more time to sink into the heads of the majority of developers. So, I'll continue to watch and learn, but will try to remember to adhere to the KISS principle to the greatest extent possible.


Additional Lessons


  1. This 40-minute video from Raymond Hettinger (The Art of Subclassing) explains how Python classes differ from classes in other languages. Ray tries to reshape people's thinking here, so if you aren't already deeply steeped in OOP lore, you may feel he's dwelling on the obvious. He may give you some ideas of appropriate uses of inheritance in Python objects.


  2. Ray mentions this 2nd talk in his video. It was the talk right after his at PyCon 2012: "Stop Writing Classes", Jack Diederich, 28 minutes. Basically, that video asserts that my own example so far of writing a class for a Python program is not very good. The clue: my example class had only __init__ and one other method. I could have written it as a simple function and used the partial function from the standard library's functools module to handle the initialization.
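A minimal sketch of that last point, written for Python 3. The Greeter class and greet function are hypothetical stand-ins, not the class from Part 2; the only library call assumed is functools.partial.

```python
from functools import partial


class Greeter:
    """A class with only __init__ and one other method (the warning sign)."""

    def __init__(self, greeting):
        self.greeting = greeting

    def greet(self, name):
        return "%s, %s!" % (self.greeting, name)


def greet(greeting, name):
    """The same behavior written as a plain function."""
    return "%s, %s!" % (greeting, name)


# partial() freezes the first argument, which is the job __init__ was doing.
hello = partial(greet, "Hello")

class_result = Greeter("Hello").greet("world")
partial_result = hello("world")
print(class_result)    # Hello, world!
print(partial_result)  # Hello, world!
```

Both spellings give the same answer; the function version simply has less machinery to read.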


Further Reading


I have 3 previous blog articles on OOP in Python.


In Creeping up on OOP in Python - Part 1 I described a use of an object-oriented library module, pyparsing, to solve a Project Euler problem.


In Creeping up on OOP in Python - Part 2 I reworked my solution to add a simple class of my own. I was happy that introducing that class made the code cleaner to read. But if you watched the "Stop Writing Classes" video given up above in this blog article, you'll probably notice that my class is an example of exactly what they say you shouldn't do. What can I say? I'm still learning this stuff.


The 3rd in my "Creeping up on OOP in Python" series was a bit different from the first 2. It explored an academic question about multiple inheritance. It is exactly the kind of A, B, C example that Ray mentions avoiding in his talk. Creeping up on OOP in Python - Part 3. I haven't forgotten my promise of a Part 4 as soon as I have a practical example of multiple inheritance to substitute for A, B, C and D in that academic question. But so far, there is no Part 4.


Ray mentions "the gang of 4". If you aren't familiar with them, here's a reference for you to pursue: http://en.wikipedia.org/wiki/Design_Patterns. And he mentions "Uncle Bob". I also mentioned Uncle Bob, with some links here: SOLID software design.


Know more about this OOP stuff than I do? Well, don't keep the info to yourself. Please post a comment with a link or example to help me learn more. Have you found a particularly relevant MOOC that you'd suggest?

Thursday, May 15, 2014

Comparative Programming Languages

Here are a couple of videos from luminaries in the software world that I believe are worth the time to watch.

First is Rob Pike explaining why he created the Go programming language.

OSCON 2010: Rob Pike, "Public Static Void"

That's a short talk of 12.5 minutes.

Far longer at an hour is this talk by "Uncle Bob" explaining what killed Smalltalk and how the same factors could kill Ruby too.

RailsConf 09: Robert Martin, "What Killed Smalltalk Could Kill Ruby too"

Note that when Bob finishes with the last of his note cards, the talk is NOT over. I urge you to stay in your seat until the end of the video. Patience!

Now to be fair, I'll have to admit that I haven't programmed in Ruby or Go, nor used Rails, and I'll even confess to limited experience with Test Driven Development (TDD), but I think I've got to get one of those green wristbands for myself. So if your reaction to the prospect of watching these videos is "Ain't nobody got time for that!", well, as a consolation prize, I'll give you a link to a related but short-to-read XKCD comic:

How Standards Proliferate

Wednesday, September 4, 2013

SOLID software design...

My hopeless backlog

One of the bad habits I have is accumulating lists of things to read. Google Reader used to be a handy place to categorize and keep my "subscriptions" to blogs and so forth to read. Unfortunately, Google pulled the plug on that service in July. Fortunately, another web service, theoldreader.com, provided a very similar free service. In fact, their service was designed to look like an earlier version of Google Reader. Some time ago, Google removed some "social" aspects of Google Reader and the change irked some of their users. theoldreader.com sprang up to undo the perceived damage of Google's changes. When Google announced the planned demise of the Google Reader service, Google at least was kind enough to provide a mechanism for retrieving my list of subscriptions, so I opened a free account with theoldreader.com and submitted my subscription list. But so did a zillion other people, so it took literally weeks for the site to get around to processing my subscription list. In the fullness of time, they did get the subscriptions into their system and it seemed usable enough, but then their servers crumbled under the newly stepped up load. (Well, I think that the story is more like they made changes to beef up their servers and in making the changes someone tripped over a power cord or something.) More days and days of noticeable downtime followed before things got back on the air and seemingly stable again.

My point in mentioning Blog subscriptions is that the universe conspires to generate "interesting" blog articles faster than I manage to read them. "Another day older and deeper in debt...". My backlog of unread blog articles is quite hopeless, but whenever I have nothing much to do, instead of turning on the boob-tube in the living room, I fire up theoldreader.com in my browser and try to read up on what I've been missing.

If you've got your own solution to tracking new blog articles, I hope my blog here is on your subscription list. If not, please take a moment to add my blog to your list. Go ahead and do it now. I'll wait for you to get back. (Full disclosure: No one has ever told me of positive or negative experiences from subscribing to my blog with their favorite RSS tool. I'm assuming that blogspot.com does the right thing, but if you run into a snag, I'd sure like to hear about it.)

Emily's "Coding is Like Cooking" blog

One of the blogs that I do try to keep up with is from an expert on test-driven-development (TDD) named Emily Bache, an Englishwoman who now lives and works in Sweden. Her blog is called "Coding is Like Cooking". I like it because it tends to be quite well written and covers relatively recent software development topics that I might otherwise miss out on, i.e. stuff that I didn't learn in school back in the days when Structured Programming was still somewhat controversial, and that I didn't pick up by osmosis in the later years of my employment at Bell Labs. I freely confess that Test-Driven-Development wasn't part of the quite informal methodology that "we" in Math Research were following. The mathematicians tended to find the underlying math of the problems far more interesting than the structure of the code. I also found myself unexposed to a great mentor for Object-Oriented-Programming. Java got some use in our projects, but I knew enough to recognize bad use of the language when I saw it. So, now I'm retired and still have much to learn. The web has no shortage of material for me to learn from and by reading Emily's articles and saying "Huh?" when something comes up that isn't at all familiar to me, I find lots of great stuff to learn.

One of my "Huh?" moments came from her mention of the "London School". I followed her link and picked up a book for my Amazon book Wish list: Growing Object-Oriented Software, Guided by Tests

SOLID Principles and TDD

So today I was reading an article of hers from September 2012, SOLID principles and TDD. I didn't get very far into it when my brain complained "Huh? What does she mean by SOLID?". So I opened up another window thanks to a link she provided and read the Wikipedia article on SOLID. It isn't a particularly excellent Wikipedia article. It is dense with off-putting terminology, which might be the fault of the SOLID acronym and not really the fault of the article. Still, I've got to forgive the Wikipedia article because it has some excellent links to reference material.

Uncle Bob's principles of Object Oriented Design

One of the links from Wikipedia that I followed was http://butunclebob.com/ArticleS.UncleBob.PrinciplesOfOod. Still more links to clarify the mysterious-to-me parts of that SOLID acronym. This summary article looks especially good: http://www.objectmentor.com/resources/articles/Principles_and_Patterns.pdf.

And just to make sure I don't run out of things to read, there's a book mentioned and even recommended by another reader - Agile Software Development, Principles, Patterns, and Practices that I've added to my Amazon "Agile" book wish list to remind me I really need to take a look at it.

In closing...

So, my situation is quite hopeless if the goal is to finish reading the stuff on my list. If I'm reading anything interesting, it tends to add more things to my list. And that doesn't even count the time to actually try the tools and techniques that I'm reading about. I hope you found some of this material interesting enough to add to your own read-and-try list. I guess I have a bit of a sadistic streak to want to inflict my backlog on other folks.

Learned anything interesting lately?

Sunday, April 14, 2013

Creeping up on OOP in Python, Part 3.

In Part 1 of this series, I looked at a solution to Project Euler problem 22 using the object-based pyparsing library. In Part 2 of this series I refined that code by introducing a simple object of my own creation. The change had the salutary effect of moving the name-scoring initialization into a logical spot instead of leaving it up in the main routine.

Today, in Part 3, I look at a classroom assignment about multiple inheritance. On the one hand, working through this assignment did leave me with a new and better understanding of multiple inheritance. But I confess to more than a little discomfort with this sort of assignment, dealing in obviously artificial A, B, C, D classes. What the assignment doesn't do is provide me with any insight as to why I might want as complex a class hierarchy as the A, B, C and D of this assignment. Perhaps I need to spend some time with some OOP textbooks to find practical examples of this sort of complexity. If someday the Aha! light clicks "on" for this matter for me, I promise to return with a Part 4 that has some less synthetic examples of multiple inheritance. Meanwhile, I confess that I've been looking at the Go programming language and it seems to have a much more obvious mechanism for "inheritance". But, not having written line 1 of Golang code, I'll not expound on that opinion here at this time.

The original problem posed is here: http://pastebin.com/ZgQAYPxw. It's just the kind of homework problem that I hate. It deals in a complex hierarchy of classes A, B, C and D, but does nothing to help me understand why a program might want such a hierarchy. I'm not exactly sure where the problem was posed, but I'll guess it was from the MIT 6.00x MOOC. At the point where my friend asked for help, he already had the correct answers and was concerned that he couldn't work out how they got those answers. On my first try I too came up with incorrect answers, so I had to dig a little deeper.

If you know something of Python multiple inheritance, now would be a good time to try to work out your answers to the questions in that assignment.

If you've read Part 1 and Part 2 of this series, you know I've had very little experience with classes and objects in Python, and have never dabbled at all in multiple inheritance. What I know is that inheritance is a way to make a specialized subclass of a class. For example, suppose you have an employee object. Employees have an ID number, a title, a location/room, office phone and so forth. Now suppose you want to define a manager object. A manager is an employee, but perhaps has an associated list of directly reporting employees (the folks he manages). So, to save having to re-implement all the basic stuff for the employee aspect of the manager, you instead derive the class for manager from the class for employee.
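The employee and manager idea can be sketched in a few lines. Everything here (the attribute names, the describe method, the sample people) is made up for illustration.

```python
class Employee(object):
    """The basic stuff every employee has."""

    def __init__(self, id_number, title, room, phone):
        self.id_number = id_number
        self.title = title
        self.room = room
        self.phone = phone

    def describe(self):
        return "%s (ID %d), room %s" % (self.title, self.id_number, self.room)


class Manager(Employee):
    """A manager is an employee plus a list of direct reports."""

    def __init__(self, id_number, title, room, phone):
        # Reuse the employee initialization instead of re-implementing it.
        Employee.__init__(self, id_number, title, room, phone)
        self.reports = []

    def add_report(self, employee):
        self.reports.append(employee)


worker = Employee(101, "Programmer", "2C-501", "x1234")
boss = Manager(7, "Department Head", "2C-500", "x1000")
boss.add_report(worker)

print(boss.describe())                # describe() is inherited from Employee
print([e.id_number for e in boss.reports])
```

Manager only spells out what is different about a manager; describe() and the four basic attributes come along from Employee for free.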

Fine. Where I get confused is when looking for good examples of multiple inheritance. Sitting here trying to make one up: Suppose we had a Stash class that knew how to pickle an object and write it to nonvolatile storage, returning a key value that you could later present to a method of Stash to retrieve the stored object. A "permanent" employee might usefully be derived from both the employee class and the stash class. But why multiple inheritance instead of deriving employee from stash and then manager from employee? The crystal grows dark. Perhaps if I get around to Part 4, I'll have good answers. If you can give me a clue that will further my education on this topic, do feel free to add a comment to this blog or if you'd like more room, post a link to a blog response of your own.

Back to A, B, C and D of the homework problem. The language is Python 2.7. That's potentially important because the method resolution order (MRO) changed in the move from Python 2.2 to Python 2.3. On http://www.python.org/download/releases/2.3/mro/, all the hairy details are explained and Python code is given to inspect a class hierarchy. So I cross-bred that print_mro code with the original assignment and massaged the lines so the file was executable Python code. That gave me this: http://pastebin.com/CYmEeGds. My thinking was that by actually executing the code from the questions, I could see for certain if the official correct answers do come out, and then I can explore the test code to understand why those answers came out.

My first big surprise is that as given, print_mro only works on classes, not on objects. As I understand it, an object in Python knows what class it is an instance of, so I believe that print_mro could be readily souped up to start from an object and report on the class MRO anyhow. I didn't bother - just lived with the limitation of only handing classes, not objects, to the print_mro routine.

As the code now stands, this works:

obj = D()
print "D.__dict__=", D.__dict__
print_mro(D)

but print_mro(obj) fails.
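A sketch of the souped-up version suggested above. The helper mro_names is hypothetical (it is not the print_mro routine from python.org), and the four empty classes are only an assumed shape for the hierarchy, chosen to give the D, C, B, A order reported later in this article.

```python
import inspect


def mro_names(thing):
    """Return the MRO class names for a class, or for an instance of one."""
    # An instance knows its class, so fall back to type(thing) for objects.
    cls = thing if inspect.isclass(thing) else type(thing)
    return [c.__name__ for c in cls.__mro__]


# Assumed hierarchy; the real assignment code is in the pastebin link.
class A(object):
    pass


class B(A):
    pass


class C(A):
    pass


class D(C, B):
    pass


print(mro_names(D))    # ['D', 'C', 'B', 'A', 'object']
print(mro_names(D()))  # same answer when starting from an instance
```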

A class has a set of built-in class attributes. CLASS.__dict__ is a dictionary of the attributes defined on that class. See: http://www.tutorialspoint.com/python/python_classes_objects.htm. For all the foofaraw about MROs, the print_mro routine showed no surprises. The general rule is that to find a method, Python starts at the object's own class and then works through the base classes in MRO order, honoring the left-to-right order of the bases in each class statement. The first match to the sought method name wins.

So the class attributes include __bases__, the parent classes of the class.

An object also has a set of built-in attributes: OBJECT.__class__ refers to the object's class (OBJECT.__class__.__name__ gives the class name), and OBJECT.__dict__ is a dictionary of the object's attributes (just data attributes, like a, b, c, d in our example). Notice that __bases__ isn't lugged along with each instance of the class, just the reference to the class, and you'd have to look at the class's built-in attributes for further info.

The output of running inherit.py is here: http://pastebin.com/FKhzJGRu Lines 76-77 show the linear list of the classes that make up class D. I see no surprise there. It's D, C, B, A. So when looking for a method (i.e. function) to invoke from an instance of D, that's the order of the classes that it'll look through in questing for that method.

But by peeking at obj.__dict__ I see only simple values in that instance, e.g. {'a': 2, 'c': 5, 'b': 3, 'd': 6}, not the complicated class hierarchy I was expecting to see reflected in the data too. I was expecting to see an "a" attribute associated with class B and another from class A. But it seems the only "a" belongs to obj itself, an instance of class D. The __init__ methods get invoked when obj is instantiated, and every one of them writes into that same dictionary, so the last assignment to "a" is the one that prevails. A.__init__ sets "a" to 1, but it was invoked by B.__init__, and later in B.__init__ the "a" gets set to 2.

So that's quite different from what I expected to see, but perhaps is simpler than I was looking for. In any case, it does explain why obj.a ends up being 2.

Looking a bit further up in the hierarchy, D.__init__ invokes C.__init__ which sets "a" to 4, but next thing D.__init__ does is invoke B.__init__ so the "a" changes to 1 and then to 2 as I just described.

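All those complex resolution order rules apply to finding a method for an object, but they don't come into play for instance data attributes assigned through self. If you are looking for an instance attribute named "a", the object's dictionary either has an entry for "a" or it doesn't; the instance holds no separate per-class copies of "a". (Only when the instance dictionary has no such entry does Python go on to look for an attribute of that name defined on the classes themselves.)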
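Here is a guess at the shape of the assignment's classes, reconstructed only from the values described above. The real code is in the pastebin links and may well differ; the point of the sketch is just that every __init__ writes into the one instance dictionary.

```python
class A(object):
    def __init__(self):
        self.a = 1


class B(A):
    def __init__(self):
        A.__init__(self)   # sets self.a = 1 ...
        self.a = 2         # ... which is then overwritten here
        self.b = 3


class C(A):
    def __init__(self):
        self.a = 4
        self.c = 5


class D(C, B):
    def __init__(self):
        C.__init__(self)   # a = 4, c = 5
        B.__init__(self)   # a = 1, then a = 2; b = 3
        self.d = 6


obj = D()
# One flat dictionary, no per-class copies of "a".
print(sorted(obj.__dict__.items()))  # [('a', 2), ('b', 3), ('c', 5), ('d', 6)]
```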

I still believe it'd be much more valuable if a real use for this stuff was contrived for the examples here instead of A, B, C and D. I haven't done any significant OOP, so I have no real experience with inheritance. Maybe if I had such experience, the merits of the Python implementation of multiple inheritance would be more obvious to me. Seems to me that having all the data attributes in one dictionary implies that in modifying any of the code of any of the classes in the complex hierarchy that a certain amount of care is needed to not introduce unintended name conflicts across the classes. My best guess is that someday when working a far more complex problem in Python that I'll suddenly recognize a burning need for multiple inheritance, but until then, I intend to follow the KISS principle.

If you know more about multiple inheritance in Python than I do (setting a very low bar...), please point out any errors in this article. I'm just sharing what I saw in my experiments, so there's plenty of room for me to get some or all of it outright wrong.

Tuesday, April 2, 2013

Creeping up on OOP in Python - Part 2.

Introduction to part 2 and Spoiler alert

In my previous blog post, I showed my first solution to Project Euler problem 22. For those of you who want to solve the problem on your own, I cautioned that my posting was a "spoiler". In today's article, part 2, I refine the code to use a Python object of my own. I believe the result is better code, but again, a "spoiler" if you'd rather develop your own independent implementation. Sorry, but if you want to remain untarnished by my prior art, please don't read any further in this article and do not follow any of the links to the source code under discussion.

A classier solution to Problem 22

Well this has been fun! My improved version of problem22 is here: http://pastebin.com/agBBSQ9Z

In the previous version of the code, you may remember I had main() initialize valuedict so the value dictionary could be passed in to the namescorer routine. As I mentioned in part 1, I wasn't entirely happy with that arrangement as the valuedict was clearly an internal implementation detail of the namescorer routine. So, why should main() even know about that stuff?

The first big change was to add a NameScorer class. When you instantiate it, you tell it the alphabet of symbols that the NameScorer object will be asked to process. The first symbol is counted as value 1, the 2nd symbol is given a value of 2, etc. The __init__ routine of NameScorer builds the valuedict, which when you come down to it, is just an implementation detail of how we compute the score for a name. Assuming ASCII and using the ord(c)-ord('A') approach is another way to do it. The new approach removes the valuedict from main(). That looked out of place in main() and I either had to make it global (which is considered bad form) or do what my previous version did: pass the valuedict in to the nameScore routine each time it was called! Happy to be rid of that.
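A rough sketch of what such a class can look like. The names NameScorer, valuedict and score come from this article, but the method bodies are guesses and not the code behind the pastebin link.

```python
import string


class NameScorer(object):
    def __init__(self, alphabet):
        # First symbol scores 1, second scores 2, and so on.
        # Note the self. qualifier: without it, valuedict would be a
        # local variable that vanishes when __init__ returns.
        self.valuedict = dict((c, i) for (i, c) in enumerate(alphabet, 1))

    def score(self, name):
        # Sum the per-letter values looked up in the instance's dictionary.
        return sum(self.valuedict[c] for c in name)


ns = NameScorer(string.ascii_uppercase)
print(ns.score("COLIN"))  # 3 + 15 + 12 + 9 + 14 = 53
```

With this arrangement main() only has to create the scorer once and call ns.score(name); the dictionary stays hidden inside the object.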

What I learned as I wrote my first class is that a self. qualifier is very important when trying to refer to a symbol that is part of the instance of the class. Multiple times the tests aborted with a message that there's no such global symbol as valdict. I'd have to correct the code to say self.valdict and hit my forehead while exclaiming Doh! I kind of wish for a short cut like import has so you don't have to repeatedly qualify a symbol reference with the module name.

The next big change didn't actually survive to the final version. I had "for" loops that iterated over the alphabet or the name list, but I also had a counter that had to tick along with each iteration. That meant an initialization statement before the "for" and an incrementing statement in the body of the loop. There isn't wonderful documentation of the syntax in the Python books I have here, but you can say

     for (i, name) in

and thus have both the names and the counter automatically handled by the "for". That left almost nothing inside the "for" and it was then that I realized, the initialization of the value dictionary, could be handled by a "list comprehension" generating the tuples that then just become an argument to dict(). And the totaling of the weighted name scores in main could be done by invoking

    sum(i*ns.score(name) for (i, name) in
        zip(range(1,len(namelist)+1), namelist))

Best explanation of this that I've since found is Ned Batchelder's blog post "Loop like a native".

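The books I have here say that current Python implementations handle constructs like that by actually generating the list of tuples and iterating over that. Apparently the intent is that some day the tuples will be generated as needed. (In Python 2.7, zip() and range() do build complete lists; itertools.izip() and xrange() produce their items as needed, and Python 3 makes that the behavior of zip() and range() themselves.) But heck. Memory is plentiful.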

zip by the way is a function that given 2 lists, generates tuples by pairing together the i-th element in one list with the i-th element in the other list. It only generates as many tuples as the shorter of the 2 lists, but both my lists were the same length.
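A small illustration of both points, using made-up sample names and the ord(c)-ord('A') scoring approach mentioned earlier. The built-in enumerate() with a start value of 1 is one alternative way to get the same counter without building the range by hand.

```python
names = sorted(["MARY", "ANNA", "COLIN"])   # made-up sample data


def score(name):
    """Letter values A=1 ... Z=26, assuming uppercase ASCII names."""
    return sum(ord(c) - ord("A") + 1 for c in name)


# The zip/range spelling used in this article.
total_zip = sum(i * score(name) for (i, name) in
                zip(range(1, len(names) + 1), names))

# The same total with enumerate() doing the counting.
total_enum = sum(i * score(name) for (i, name) in enumerate(names, 1))

print(total_zip, total_enum)   # 307 307

# zip stops at the end of the shorter input.
pairs = list(zip([1, 2, 3], "ab"))
print(pairs)                   # [(1, 'a'), (2, 'b')]
```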

Somewhat invisible in the code is that while making these changes, I learned a few more basic tricks with the git version control system. I checked in yesterday's problem22.py to git. That version was implicitly the master branch.

Then I created an experimental branch (yeah, this is overkill for a one person operation, but remember I'm doing this to learn). Each time I had a working change to problem22.py, I'd check it into the experimental branch. My last incremental change was in recognition of the ease of getting old versions thanks to git and frequent well commented logs of what changed. So I decided that my old habit of leaving test prints, etc. embedded into the code, albeit commented out, was just making it harder to read the program. So I weeded that commented-out scaffolding out of the final version.

Premature optimization is a bad idea

One disappointment was that I was hoping the list comprehensions would get interpreted faster than the for loops, albeit at the price of using more memory. But it runs no faster, maybe even 10ms slower, but that's in the noise range, so more tests would be needed to pin down what is actually happening. I did accidentally give myself one clue. I added print statements to show the type of the grammar variable and of the grammar.parseFile(...) result that I had to convert to a vanilla list before I could sort the list. I was lazy, so rather than tear apart the return(list(grammar.parseFile(...))) so I'd have an intermediate result that I could pass to type() to see what it was, I left the return statement alone and preceded it with a print statement that changed list(...) to type(...). That meant I parsed the file twice. The total CPU time nearly doubled. So without resorting to grand profiling tools (which I confess I don't yet know anyhow - and profiling a 320ms run is not such a great thing to do!), we have some evidence that the vast majority of the run time is going into pyparsing as it reads and parses the 5000 or so names in the input file. So hoping that tweaking the other loops and argument passing would improve CPU time was naive on my part. Oops.

How many times do I have to hear measure first before you try to optimize, before it really sinks into my brain?
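One low-effort way to measure first is the standard library's timeit module. This is only a sketch with hypothetical function names and synthetic data, written for Python 3; it times the two loop styles and says nothing about pyparsing, which is where the real time went.

```python
import timeit


def total_with_loop(values):
    """Weighted total using an explicit counter."""
    total = 0
    position = 1
    for v in values:
        total += position * v
        position += 1
    return total


def total_with_genexp(values):
    """The same total using sum() over a generator expression."""
    return sum(i * v for (i, v) in enumerate(values, 1))


values = list(range(5000))   # synthetic stand-in for ~5000 name scores

loop_time = timeit.timeit(lambda: total_with_loop(values), number=50)
genexp_time = timeit.timeit(lambda: total_with_genexp(values), number=50)

print("loop:   %.4f s for 50 runs" % loop_time)
print("genexp: %.4f s for 50 runs" % genexp_time)
```

The numbers vary from run to run and machine to machine, which is rather the point: small differences like these are easily lost in the noise.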

Oh, and the type of the grammar is "pyparsing.And", which agrees with the top level of delimited list generating an expression using the overloaded + operator. I'm quite certain that the type of "grammar" is a subclass of parser element, so a grander grammar could be built up with this as just one part of it. Mercifully, I didn't need anything grander.

The type of grammar.parseFile(...) is "pyparsing.ParseResults". I didn't dig deeper to see why the thing looks like a list, but won't sort. It occurs to me that sort depends heavily on the comparison operators and presumably copy operators to swap values around. Perhaps pyparsing overloads those operators for its ParseResults type. If and when future problems cause me to revisit pyparsing, I'll take a closer look.

The entire listing now handily fits on a single page.

Metrics

I ran the file through sloccount and it says 15 non-comment, non-blank lines of source code. It also estimated that creating 15 lines of Python code like this would take about 0.65 months, which I figure is 2-3 weeks, which is scarily close to right. But if I wasn't taking the time to learn a new language and its modules and git, all at the same time, I'd like to believe I could knock out 15 lines of code in much less time.

(Of course, working on my own, my time is not getting sapped into Affirmative Action meetings, design reviews, questions from the peanut gallery and all those other things that motivated me to work at home when I was doing real design or real coding. So maybe in a real corporate environment, that 2-3 week estimate still would apply, even for a seasoned Pythonista, which I don't claim to be yet!)

My one regret is that I'm still not doing test-driven-development.

Python has the annoying property of accepting just about anything at "compile-time" and not telling you of problems until you drive execution into the problem code at runtime. That makes it super important to have excellent test cases to drive the execution into as many nooks and crannies of the code as you reasonably can manage to do.

My programs don't have built-in self-testing capability to watch out for any regressions as changes are made. I'll have to force myself to try that stuff in some future problems. At this point, I'm not even sure where the test code would go. A separate file in place of "__main__"? And, just to keep life interesting, Python has multiple unit-test modules available, so picking one intelligently will take some work. One of the unit-test packages is called "nose". If I pick that one will you go "Ewww!"?
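As one possible answer to the "where would the test code go" question, here is a sketch using unittest from the standard library. The score function is a hypothetical stand-in for the name-scoring code; in real use it would be imported from the problem file, and the tests would live in a separate file.

```python
import unittest


def score(name):
    """Stand-in for the code under test: A=1 ... Z=26, uppercase ASCII."""
    return sum(ord(c) - ord("A") + 1 for c in name)


class ScoreTests(unittest.TestCase):
    def test_colin(self):
        # The worked example from the Project Euler problem 22 statement.
        self.assertEqual(score("COLIN"), 53)

    def test_empty_name(self):
        self.assertEqual(score(""), 0)


# Build and run the suite explicitly so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ScoreTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all passed:", result.wasSuccessful())
```

In a separate test file the last three lines are usually replaced by a call to unittest.main() under an `if __name__ == "__main__":` guard, so the tests run whenever that file is executed.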

Sunday, March 31, 2013

Creeping up on OOP in Python - Part 1

Background and Spoiler Alert

Project Euler is a challenge set of problems, hundreds of them, each of which you will probably need to write a program to solve. You are allowed to write the program in any programming language of your choice. Some of the problems require reading an input which is given in a file linked to the problem description page. All of the programs produce an answer - a single numeric output (sometimes the answer is a very big number with many digits in it). The site poses the problems and you get to tell the site the number that is the answer. The site provides no guidance beyond feedback as to whether your answer is right or wrong until you give it the right answer. A right answer unlocks access to a bulletin board where you can share how you solved the problem. Lots of bragging there from folks about how few lines of code they used or how real mathematical insight made the program so much easier than it appeared to be at first blush.

My background is software development. I've worked with mathematicians, but make no claim to being a mathematician myself. So solutions that are founded on real mathematical insight are not the sorts of things to expect from me. Sometimes, armed with the mathematical insight explained on the bulletin board, I've found it worthwhile to revisit my solution to make my code better. Sometimes I revisit my code with a critical eye just to make it better to satisfy myself. I do miss having talented co-workers who can review code and make suggestions at the lunch table. But as a retiree, it falls on me to be my own reviewer. Fortunately, I have always had strong opinions of the quality of code and a thick skin, so I can provide my own harsh comments to myself and get some cleaner code out of the introspection.

This series of articles is going to look at Problem 22 of Project Euler and is, quite frankly, a spoiler. If you'd rather do it yourself without seeing somebody else's solution, do not open any of the links to the code samples here, and do not read the rest of this blog post if you consider description of the code of a solution to be off-limits too. Before posting this article, I Googled for:

project euler problem 22 solution

and can see I'm by no means the only person breaking the wall and revealing a solution on the Web.

My motivation for tackling problems in Project Euler was to learn the Python programming language. Shortly before my retirement, David James, then of the Bell Labs statistics department of the Murray Hill Math research center, had suggested to me that Python looked like a great programming language to learn. I had no reason to doubt that, but was knee deep in projects in other languages, so I didn't have time just then to stop and learn something entirely new to me. Now that I'm retired, I have time for such adventures. Sadly, except for whatever value there is in telling the tales of such adventures here around the campfire, it isn't exactly clear what good will come from my learning yet-another-programming-language. If nothing else, it has certainly shown to me that David James has an eye for value in programming languages. His recommendation of Python appears to have been well founded indeed. I'm still a long way from becoming an expert in Python programming, but I find it is certainly a pleasant programming language to use to solve problems. Python code is readable (unlike, say, Perl or APL) and tends to be quite compact (unlike Java). The language supports a broad range of programming styles and brings more possibilities for control structures than any language I've worked in except maybe for assembler language, where the language brings no real structure to the program and anything goes that you can dream up - but document your assembler code well so the folks who come later can figure out what you did!

Old School: Structured Programming

My roots are in "structured programming" from the days when "Goto considered harmful" was still considered to be a controversial position. Happily, I've long since accepted the truth of that position and so don't miss at all that Python has no goto statement.

Quite some time ago, I laid out a list of things I believed I needed to master to consider myself a Python expert. Perhaps surprisingly, the language itself is only a small part of the things to learn. I blogged my list in an article that started out to be about "literate programming", but that mutated as I created that article into my Python study guide.

So, my solution of Project Euler problem 22 started out as a small, reasonably clean structured program.

The problem statement for the problem is here: http://projecteuler.net/problem=22, complete with a link to the names.txt file that the program is to process.

By the way, I've been working with Python 2.7. Availability of libraries for Python 3 has been my reason for hanging back with Python 2. As time goes by, that presumably will become less and less of a good reason.

Wrapper Main Program

Since all of the Project Euler problems produce a single numeric answer, I base each of my Project Euler programs on a wrapper main routine that invokes the code for the problem, expecting the called code to return a value. The wrapper main program has a print statement to print the returned value and infrastructure to time the runtime for the called routine and print a report of the runtime. My code for the wrapper main routine can be found here: http://pastebin.com/trk7E1n9. The wrapper main routine expects a single argument from the command line to tell it where the actual program code is located. The wrapper assumes that the program to be run is in the file named by the command line argument and that the name of the program to be invoked from that file is main().

I've not been entirely delighted with my wrapper main program. It reports elapsed time for the test, not CPU time. And the time routines from the library seem rather coarse in their granularity, but Project Euler's guidance on runtimes is only that the data for these problems shouldn't take more than a minute to process on a typical PC or you should rethink your algorithm. My timer code may be a little crude, but it can certainly show whether I've met that guideline.
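To make the shape of such a wrapper concrete, here is a minimal sketch. It is not the pastebin code: the function name run_problem is made up for the illustration, and it assumes Python 3 (importlib.util and time.perf_counter), whereas the code discussed in this article was written for Python 2.7.

```python
import importlib.util
import time


def run_problem(path):
    """Load the file at `path`, call its main(), return (answer, seconds)."""
    spec = importlib.util.spec_from_file_location("problem_under_test", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)      # runs the file's top-level code

    start = time.perf_counter()          # elapsed (wall-clock) time, not CPU time
    answer = module.main()
    elapsed = time.perf_counter() - start
    return answer, elapsed


# Hypothetical use, with the file name taken from the command line:
#     answer, elapsed = run_problem(sys.argv[1])
#     print("answer: %s   elapsed: %.3f seconds" % (answer, elapsed))
```

Like the wrapper described above, this reports elapsed time only, which is enough to check a run against the one-minute guideline.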

I've written my program but should it take days to get to the answer?


Absolutely not! Each problem has been designed according to a "one-minute rule", which means that although it may take several hours to design a successful algorithm with more difficult problems, an efficient implementation will allow a solution to be obtained on a modestly powered computer in less than one minute.

If you really want to think about measuring time in your program, you ought to study this page: http://stackoverflow.com/questions/85451/python-time-clock-vs-time-time-accuracy

The Rest of the Problem Solution

My first-version code for problem 22 is here: http://pastebin.com/SgZ7y54b

There are commented-out print statements in the file that I used when I was doing my testing of the code.

One other tool that I learned to work with as part of my study of Python is the source-code-control tool git as provided by github. As I got used to the power of git, I found that the code really reads better if you remove the debugging print statements, not merely comment them out. But the edition of problem22.py that I'm showing is from earlier in the process than my reaching that realization.

The description of the input file for Problem 22 says the input is a sequence of names, each enclosed in double quotes and separated from each other by commas. Now Python is great at that kind of parsing of input, but I dislike writing and debugging the fiddly code needed to carefully parse an input file, particularly if the parser is to make reasonable recovery from errors in the input's punctuation. So, I invoked the computer science principle that "there's no problem so big or so complicated that it can't be run away from". My escape hatch was to apply the pyparsing library module. pyparsing is, in my opinion, very Pythonic in its approach. The library allows you to define a grammar in terms of Python objects and then apply that grammar against your input. There are library alternatives that may give better performance, but I found pyparsing to be reasonably well documented and a joy to apply to the straightforward problem at hand. I ended up having to write delightfully little code to solve the problem. (I concede that later, as I worked through CS101 on Udacity.com, I came to understand that Python may be good enough at parsing input strings that perhaps I should have just gritted my teeth and written the straightforward code to parse the input file. But I do suspect that familiarity with pyparsing will be useful to me in the future if I ever am faced with parsing a more complex input structure.)
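For comparison, the "straightforward code" route need not be long either. The sketch below is an alternative to pyparsing, not what the program in this article does: it leans on the standard library's csv module, which already understands comma-separated fields wrapped in double quotes. The function name parse_names and the sample names are made up for the example.

```python
import csv


def parse_names(text):
    """Split one line of "NAME","NAME",... into a list of bare names."""
    # csv.reader takes an iterable of lines and strips the quotes for us.
    rows = list(csv.reader([text]))
    return rows[0] if rows else []


print(parse_names('"MARY","PATRICIA","LINDA"'))
```

What this does not give you is the grammar-style error checking that parseAll=True provides in pyparsing, so it is a trade of strictness for brevity.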

This pyparsing library was my first serious encounter with the object-oriented programming (OOP) aspects of the Python language. Python is gentle in providing for such modern stuff, yet it doesn't force you to really grok the feature before letting you write reasonable software in Python.

The Devil is in the Details

Python is a relatively small programming language. As I mentioned, I strongly believe that the Python language itself is only a small part of what I believe I have to learn to be a Python expert. Nevertheless, there were still details of the language that I sort of stumbled into understanding as I worked this problem. I kept notes that I'll share with you here...

The ParseInput routine effectively has only 4 lines of code:

    def ParseInput(filename):
         from pyparsing import (delimitedList, QuotedString)
         grammar=delimitedList(QuotedString('"'))
         return(list(grammar.parseFile(filename, parseAll=True)))

The def statement is the boilerplate to define a function. The from ... import (...) statement is where I import the names I need from the pyparsing module in the library. My first try was to say:

     from pyparsing import *

which I expected would import all the "public" names from the module into ParseInput's name space. That shortcut is frowned upon, as a change in the library's module could bring a name conflict into my function, but I figured it would save me the trouble of having to name every little thing I wanted to try. To my surprise, the import * was rejected as only usable at a module level. I think it's saying that if I'm writing my own module, I can import * all the names in another module, but if I'm just inside a function, I have to be more specific. Rats, shortcut denied. Next try I said:
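The rejection is easy to reproduce without pyparsing. This small sketch assumes Python 3, where a star import inside a function is a hard syntax error; it uses the math module as a stand-in and compile() so that the failure can be caught and displayed instead of stopping the script.

```python
# Source text for a function that tries a star import in its body.
source = (
    "def parse_input():\n"
    "    from math import *\n"
)

try:
    compile(source, "<example>", "exec")
    outcome = "accepted"
except SyntaxError as err:
    outcome = "rejected: %s" % err.msg

print(outcome)
```

The same import placed at the top of a module compiles without complaint, which matches the "only usable at a module level" message.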

     from pyparsing import (delimitedList, QuotedString, parseFile)

but it objected that I couldn't import parseFile like that. delimitedList and QuotedString are names defined at the top level of the pyparsing module: QuotedString is a class (a subclass of ParserElement), and delimitedList, as far as I can tell, is a helper function that builds and returns a ParserElement. The advantage of importing the names into my function is that I can refer to them without having to qualify each name with the pyparsing module name. pyparsing.delimitedList is kind of long-winded in composing a grammar. parseFile, though, is different. It's a method (an invocable function attribute) of the grammar. I'd have to dig into the source code to be sure I'm saying this right, but I think that the grammar built from ParserElements is itself a ParserElement, so I believe parseFile and its brother, parseString, are methods that can be invoked on any ParserElement.
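The difference between a name that lives in a module and a method that lives on a class can be seen with any library. As an analogy only (this is the standard collections module, not pyparsing): Counter is a module-level name and can be imported, while most_common is a method of Counter and cannot.

```python
import collections

# Counter is a module-level name, so "from collections import Counter" works.
print(hasattr(collections, "Counter"))

# most_common is a method of the Counter class, not a name in the module.
print(hasattr(collections, "most_common"))
print(hasattr(collections.Counter, "most_common"))

try:
    from collections import most_common
except ImportError as err:
    print("import failed:", err)
```

A method is reached through an object (or its class), which is why parseFile has to be called on the grammar rather than imported.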

In the 3rd line of code, I creatively name my grammar "grammar". delimitedList defaults to a list of comma-separated items. The grammar for each item is the argument you pass to delimitedList, so I passed in QuotedString('"'). The argument to QuotedString is the quote character it is to look for. By default it strips out the quotes and just returns the stuff enclosed in the quotes. On my first try I didn't know that QuotedString required an argument, so I coded it as QuotedString() and got a message about a class being invoked with one argument where it expected two.

The object-oriented stuff in Python always implies an argument of "self", the object on which the method is being invoked. So, my missing argument was the missing "2nd" argument.
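Here is a small sketch of that situation. The class Quoted is a made-up stand-in with a one-argument constructor, not pyparsing's QuotedString, and the exact wording of the error message differs between Python 2 and Python 3.

```python
class Quoted(object):
    """Hypothetical class whose constructor needs a quote character."""

    def __init__(self, quote_char):
        self.quote_char = quote_char


message = None
try:
    Quoted()                 # forgot the quote character
except TypeError as err:
    message = str(err)
    print(message)

q = Quoted('"')              # Python supplies self; we pass only quote_char
print(q.quote_char)
```

Python 2 counts self in its complaint (one argument given, two expected), while Python 3 names the missing parameter instead.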

I mistakenly reasoned that maybe I should just pass in QuotedString without the (). That got me an obscure (to me) message that I can't "combine type 'type' with a ParserElement". I eventually figured out what it meant. QuotedString is a subclass of Token and Token is a subclass of ParserElement, but QuotedString with no parentheses is the class itself, a type. That is, type(QuotedString) is 'type'. I tried

     qs=QuotedString('"')

and got type(qs) is "QuotedString" and isinstance(qs, ParserElement) was true. Whew! Good thing this is all open source, as I don't think I'd have made much progress at all if I couldn't follow Obi-Wan's advice and "Use the source, Luke".
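The same experiment can be run on a toy class hierarchy. The three classes below are made up to mirror the ParserElement, Token, QuotedString chain; they are not pyparsing's definitions.

```python
class ParserBase(object):        # plays the role of ParserElement
    pass


class TokenLike(ParserBase):     # plays the role of Token
    pass


class Quoted(TokenLike):         # plays the role of QuotedString
    def __init__(self, quote_char):
        self.quote_char = quote_char


qs = Quoted('"')

print(type(Quoted))                      # the class itself is of type 'type'
print(type(qs).__name__)                 # the instance is a Quoted
print(isinstance(qs, ParserBase))        # an instance counts as a ParserBase
print(isinstance(Quoted, ParserBase))    # the bare class does not
```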

delimitedList internally uses + to construct the grammar for what it is looking for, and the idea is pretty straightforward. It wants to find a match to the argument grammar, followed by a comma, which it suppresses (gobbles up), followed by another match to the argument grammar. It will also settle for a match to the argument grammar that is not followed by the delimiter, which covers the last item in the list, so there is no need for a trailing comma after the last item. I'm oversimplifying, as the list can have an arbitrary number of comma-separated items in it, and optionally you can name an alternative delimiter instead of sticking with the comma.

ParserElement redefines the + operator to make it into an "and" concatenation of ParserElements, but it didn't want to concatenate a type to a ParserElement. Giving an argument to QuotedString causes an instantiation of the class, so I have an object that is an instance of ParserElement and so is acceptable to the overloaded + operator, which concatenates that object to another ParserElement object.
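Operator overloading of this kind is done with the __add__ method. The miniature below is an illustration of the mechanism only; the Element class and its error message are invented for the example and are not pyparsing's code.

```python
class Element(object):
    """Hypothetical miniature of a grammar element."""

    def __init__(self, label):
        self.label = label

    def __add__(self, other):
        # Only instances can be joined; a bare class is rejected.
        if not isinstance(other, Element):
            raise TypeError(
                "cannot combine %r with an Element" % type(other).__name__)
        return Element(self.label + " then " + other.label)


name = Element("quoted name")
comma = Element("comma")

print((name + comma + name).label)

try:
    name + Element            # the class itself, not an instance
except TypeError as err:
    print(err)
```

Because + returns another Element, the result can be combined again, which is how a whole grammar ends up being a single object.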

The 4th line of code started out as:

     return(grammar.parseFile(filename, parseAll=True))

(The parseAll=True tells parseFile to throw an exception if the grammar doesn't successfully parse the whole of the input file. The default is that it parses as far as the grammar will match in the file and leaves things such that another grammar can be applied next.)

That return statement actually seemed to work, but the returned value wouldn't re-order when I invoked the sort method that is associated with lists. I used the type function to see what the type of the returned value was. It was a pyparsing.<something> type, which I'm guessing is a subclass of the built-in list type. It had no objection to my invoking a sort method on it, but it didn't sort the list. Solution was to explicitly convert the returned value to a plain old list using the list() function which you'll now find in that 4th line of code.
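The conversion step looks like this in isolation. A tuple stands in for whatever list-like object the parser hands back, since the point is only that list() gives you a plain list whose sort method behaves as expected; the built-in sorted() is shown as a second option because it accepts any iterable.

```python
parsed = ("MARY", "LINDA", "ANNA")   # stand-in for the parser's return value

names = list(parsed)    # copy the items into a plain old list
names.sort()            # sorts in place and returns None
print(names)

also_sorted = sorted(parsed)   # builds a new sorted list, leaves parsed alone
print(also_sorted)
```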

So 4 lines of code written, and I'm sure glad no one is measuring my lines of code per hour productivity.

nameScore Routine

The next function is nameScore. 5 lines of code that given a name computes a score for it. A=1, B=2, C=3, .... Score is the sum of the values for the characters in the name. Straight forward iteration over list(name). List knows how to bust a string up into the sequence of characters that I want to process one by one.
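A sketch of such a scoring function follows. It is not the pastebin code, and the name name_score is made up for the example; it also iterates over the string directly, which works because a Python string is already a sequence of characters. COLIN is the sample name from the Project Euler problem statement, which gives its score as 53.

```python
import string


def name_score(name, valuedict):
    """Sum the letter values (A=1, B=2, ...) of the characters in name."""
    total = 0
    for ch in name:
        total += valuedict[ch]
    return total


# Map each upper case letter to its position in the alphabet.
valuedict = {ch: i for i, ch in enumerate(string.ascii_uppercase, start=1)}

print(name_score("COLIN", valuedict))   # 3 + 15 + 12 + 9 + 14 = 53
```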

valuedict is a nifty Pythonic feature. Basically it's a hash table. I wasn't quite sure where to define it, so I just put the initialization into main() and passed the valuedict as an argument to nameScore. Maybe if I was more object-oriented in my thinking, I'd have made nameScore into some kind of a class with an __init__ function that built a valuedict that stays with the nameScore object. In Part 2 of this article I'll go back and try it that way just to see what it looks like.
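One possible shape for that class-based idea is sketched below. It is only a guess at what such a design might look like, not the Part 2 code; the class name NameScorer and the use of __call__ are choices made for the illustration, and only upper case letters are given values.

```python
import string


class NameScorer(object):
    """Hypothetical scorer that builds its letter table once, in __init__."""

    def __init__(self):
        self.valuedict = {
            ch: i for i, ch in enumerate(string.ascii_uppercase, start=1)
        }

    def __call__(self, name):
        # The table travels with the object, so callers pass only the name.
        return sum(self.valuedict[ch] for ch in name)


score = NameScorer()
print(score("COLIN"))
```

The trade-off is the one raised in "Stop Writing Classes": a class with __init__ and one other method may be more machinery than the job needs.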

The names.txt file for this problem is fairly massive: 46KB containing some 5000 names, all on one very long line of text. For debugging, I constructed a smalllist.txt file that had only a few names in it, written in the same syntax as the file given with the problem. Looking at the discussion board for the problem, a disappointing number of people pre-processed the names.txt file (e.g. using Excel) to pull the words apart and alphabetize them. Some even left the words with the double quotes still there, just giving " a zero value in scoring the names.

main() Routine

That brings us to the main() routine. Line 30 runs ParseInput on the input file to get the list of names into nameList. Line 36 applies the sort method of lists to put nameList into alphabetical order. Lines 46 to 50 initialize the valuedict. The given problem data only has upper case letters. I threw in values for lower case letters, numbers, space, and period so I could put "R. Drew Davis" into my smalllist.txt test data file. Lines 53-58 are the loop to tally up the weighted sum of all the name scores. And line 59 is where I returned the result. 14 lines of code in main(). Clear and straightforward.
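The tally itself can be sketched in a few lines. This is an illustration, not the pastebin code: the function name total_score is made up, only upper case letters are scored, and the weight is each name's 1-based position in the sorted list, as the problem statement defines it.

```python
import string


def total_score(names):
    """Sum of (alphabetical position * letter score) over all names."""
    valuedict = {ch: i for i, ch in enumerate(string.ascii_uppercase, start=1)}
    total = 0
    for position, name in enumerate(sorted(names), start=1):
        total += position * sum(valuedict[ch] for ch in name)
    return total


# AL sorts first:  1 * (1 + 12)     = 13
# BOB sorts second: 2 * (2 + 15 + 2) = 38
print(total_score(["BOB", "AL"]))   # 51
```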

Object Lesson

The objective of this article was to get closer to understanding Object Oriented Programming and why it is useful in writing clean code. I haven't tried to provide a tutorial of the mechanics of using OOP in Python. Wesley Chun's book "Core Python Programming", 2nd edition, Chapter 13 is a decent reference for learning this stuff. Or if you are too cheap to invest in a reference book, dig into the Web with Google or your favorite search engine. There are plenty of web pages to be found that talk about Objects in Python.

In this article we looked at use of the pyparsing library's objects to define and apply a grammar. I haven't implemented an object and class on my own yet. We'll get to that in Part 2.

I'm still learning this stuff. If you spot anything that I've gotten wrong, please add a comment to let me know so I can correct both the article and my understanding of the topic. Thank you.