python: August 2008 Archives

Assign Objects to Element Attributes in elementtree


You can store pretty much anything as an attribute of an elementtree Element instance, not just strings.
A recent comment got me experimenting with this. Tom mentioned "Attributes can only be text", which is abso-smurfly correct for XML. But remember: ElementTree is not XML. If you don't need to serialize to XML, you can hang whatever you like on Elements.

Extending my power plant elementtree example a bit, I create a simple class to represent an employee, then hang instances of it on elements in a nested structure:

from elementtree.ElementTree import Element
from elementtree.ElementTree import SubElement

class Emp(object):
    """A simple Employee

    """
    def __init__(self, name, title):
        self.name = name
        self.title = title

# make some employee instances
burns = Emp("Monty", "CEO")
smithers = Emp("Waylon", "Flunkie")
karl = Emp("Carl", "Engineer")
lenny = Emp("Lenny", "Engineer")
homer = Emp("Homer", "Safety Inspector")

# hang them on a tree
ceo = Element('e', emp=burns)
smithers_elem = SubElement(ceo, 'e', emp=smithers)
SubElement(smithers_elem, 'e', emp=karl)
SubElement(smithers_elem, 'e', emp=lenny)
SubElement(smithers_elem, 'e', emp=homer)

It's gross because I create element labels that will never be used ("e"), and I can't ever serialize the structure to XML (bummer), but if you need nested data in a hurry, it's hard to beat.
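Getting the objects back out is just attribute access. Here is a minimal sketch, assuming the standard-library xml.etree.ElementTree module (the bundled successor to the standalone elementtree package imported above) and a cut-down version of the tree. The helper name direct_reports is made up for illustration.

```python
from xml.etree.ElementTree import Element, SubElement


class Emp(object):
    """A simple employee, as in the post."""
    def __init__(self, name, title):
        self.name = name
        self.title = title


# A smaller version of the tree built above
ceo = Element('e', emp=Emp("Monty", "CEO"))
smithers_elem = SubElement(ceo, 'e', emp=Emp("Waylon", "Flunkie"))
SubElement(smithers_elem, 'e', emp=Emp("Carl", "Engineer"))
SubElement(smithers_elem, 'e', emp=Emp("Homer", "Safety Inspector"))


def direct_reports(elem):
    """Hypothetical helper: return the Emp objects hung on elem's children."""
    return [child.get('emp') for child in elem]


for emp in direct_reports(smithers_elem):
    print("%s (%s)" % (emp.name, emp.title))
```

Iterating over an Element yields its direct children, so no extra bookkeeping is needed to find out who reports to whom.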

Plus, I'm pretty sure that someone smarter than I could make this scenario cleaner by doing things like extending tostring to correctly serialize the Emp instances.
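One possible route, sketched here rather than a real extension of tostring: copy the tree into plain string attributes first, then serialize the copy. This assumes Python 3 and the standard-library xml.etree.ElementTree module. The helper name to_plain_tree is hypothetical, and it assumes every element carries an emp attribute.

```python
from xml.etree.ElementTree import Element, SubElement, tostring


class Emp(object):
    """A simple employee, as in the post."""
    def __init__(self, name, title):
        self.name = name
        self.title = title


def to_plain_tree(elem):
    """Hypothetical helper: rebuild the tree with string attributes only."""
    emp = elem.get('emp')
    plain = Element(elem.tag, name=emp.name, title=emp.title)
    for child in elem:
        plain.append(to_plain_tree(child))
    return plain


ceo = Element('e', emp=Emp("Monty", "CEO"))
SubElement(ceo, 'e', emp=Emp("Waylon", "Flunkie"))

# The copy holds only strings, so it serializes without complaint
xml_text = tostring(to_plain_tree(ceo), encoding='unicode')
print(xml_text)
```

The original tree keeps its Emp objects; only the throwaway copy gets flattened.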

My Number 1 Java to Python Gotcha

Fredrik Lundh is almost certainly a benevolent alien in disguise, sent to Earth to help the pitiful human race drag itself up out of the muck.

In a recent post, he touched on something that burned me BAD when I first started slinging Python: using mutables as default parameters.
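For anyone who hasn't been bitten yet, the default-parameter version of the gotcha looks roughly like this. It's a minimal sketch, not code from Fredrik's post; the function names add_error and add_error_fixed are made up for illustration.

```python
def add_error(message, errors=[]):
    """Buggy: the default list is created once, when the function is defined."""
    errors.append(message)
    return errors


def add_error_fixed(message, errors=None):
    """Usual fix: default to None and build a fresh list on each call."""
    if errors is None:
        errors = []
    errors.append(message)
    return errors


first = add_error("oops")
second = add_error("oops again")
print(first is second)   # True: both calls got the very same list
print(second)            # ['oops', 'oops again']

print(add_error_fixed("oops"))        # ['oops']
print(add_error_fixed("oops again"))  # ['oops again']
```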

My particular run-in was a cousin of what Fredrik describes: using a mutable as a default class attribute. I had a class like this:

class MyPage(object):
    errors = []

    def __init__(self):
        # set me up
        pass

    def run(self):
        try:
            self.build_page()
        except Exception:
            self.errors.append("Oops. Something went wrong")

My problem was that I was setting a mutable as a default attribute value. This meant that every instance of MyPage shared the same errors list, analogous to a static class variable in Java. It wasn't long before everything had errors, since the list just kept growing. This happened in a production environment. Sub-awesome indeed. What I SHOULD have done is this:

class MyPage(object):
    errors = None

    def __init__(self):
        # set me up
        self.errors = []

    def run(self):
        try:
            self.build_page()
        except Exception:
            self.errors.append("Oops. Something went wrong")

The __init__ gives each new object its own empty list. No sharing between instances. No pain.
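To see the difference side by side, here is a stripped-down sketch. The class names SharedPage and FixedPage and the fail method are invented for the demonstration; they stand in for the two versions of MyPage.

```python
class SharedPage(object):
    errors = []  # one list, owned by the class and seen by every instance

    def fail(self):
        self.errors.append("Oops. Something went wrong")


class FixedPage(object):
    def __init__(self):
        self.errors = []  # a new list for each instance

    def fail(self):
        self.errors.append("Oops. Something went wrong")


a, b = SharedPage(), SharedPage()
a.fail()
print(len(b.errors))  # 1: b sees a's error

c, d = FixedPage(), FixedPage()
c.fail()
print(len(d.errors))  # 0: d has its own list
```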

Nice Comparison of Python and Java

Brian M. Clapper does a nice job comparing Java and Python with specific cases and examples. He shows the places where he likes Python, and explains why.

I made a similar transition to Python, and it was nice to see a lot of my own discoveries in his list.

Brian manages to give a good summary without calling Java nasty names or badmouthing its devotees:

I've been a Java programmer for a long time, and I bear no ill will toward a language I happily used for many years. I'm just trying to capture why I find Python to be more fun.

[link] Why is Python More Fun Than Java?

An Extensive Look At Unicode In Python

About once a year, I run into a problem where I need to sling Unicode (turns out scientists and math folks really dig those Greek characters). Inevitably, this means a few hours fumbling around with converting to and from Unicode, along with endless trolling through Unicode tables to verify I'm doing things right. If you have a minute, check out counting rod numerals in the "Other Ancient Scripts" section. Fascinating, and practical!

When you find yourself cursing Unicode, and you will, remember that you bookmarked this post that goes into great detail on how to use Python to handle Unicode.

Trust me. In six months you're going to hit Unicode, and you'll think "Hey! I remember seeing something about this somewhere...if only I could remember..."
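As a tiny taste of the "converting to and from" part, here is a sketch that round-trips one Greek letter. It is not taken from the linked post, and it assumes Python 3 string semantics, where encode turns text into bytes and decode turns bytes back into text.

```python
import unicodedata

alpha = u"\u03b1"  # Greek small letter alpha

# Look up the official character name instead of squinting at tables
print(unicodedata.name(alpha))  # GREEK SMALL LETTER ALPHA

# Text -> bytes
encoded = alpha.encode("utf-8")
print(encoded)  # b'\xce\xb1'

# Bytes -> text
decoded = encoded.decode("utf-8")
print(decoded == alpha)  # True
```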

Drop That Nested Dictionary; Use elementtree

Tree-like dictionaries are gross and hard to work with. Use elementtree for nested data.

In Python, most people's instinct is to model tree-like structures with nested dictionaries. An organization chart from a nuclear power plant is a simple example:


powerplant_org_chart.png



The old me would have rushed to model this like so:
org = {'name': 'Monty',
       'title': 'CEO',
       'boss_of': [
           {'name': 'Waylon',
            'title': 'Flunkie',
            'boss_of': [
                {'name': 'Carl',
                 'title': 'Engineer',
                 'boss_of': None},
                {'name': 'Lenny',
                 'title': 'Engineer',
                 'boss_of': None},
                {'name': 'Homer',
                 'title': 'Safety Inspector',
                 'boss_of': None}
            ]
           }]
      }

This is annoying to type and even harder to read. Keeping track of quotes, brackets and commas is frustrating. Imagine the pain with a larger or more complex tree.

Fear not. elementtree makes things super-easy when dealing with nested data.

"But wait", you say. "You're not doing XML!"

Say it with me: elementtree is not XML. Yes, it's good at serializing to and de-serializing from XML, but it's very useful even if you never read or generate a lick of markup.

Here's the same org chart using Element. Much shorter, less error-prone, and easier on the eyeballs:

from elementtree.ElementTree import Element
from elementtree.ElementTree import SubElement


burns = Element('emp', name="Monty", title="CEO")
smithers = SubElement(burns, 'emp', name="Waylon", title="Flunkie")
karl = SubElement(smithers, 'emp', name="Carl", title="Engineer")
lenny = SubElement(smithers, 'emp', name="Lenny", title="Engineer")
homer = SubElement(smithers, 'emp', name="Homer", title="Safety Inspector")

I didn't create a whole class to help me model the nested data; that would have been overkill. I just used Elements to string together related data. Now it's easy to navigate the tree in idiot-proof fashion with methods like getchildren and getiterator.
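Here is a short sketch of that navigation. It assumes the standard-library xml.etree.ElementTree module rather than the standalone elementtree package. There, getchildren and getiterator were removed in Python 3.9, so the sketch iterates over the element directly and calls iter() instead. The variable names everyone, reports and engineers are just for illustration.

```python
from xml.etree.ElementTree import Element, SubElement

burns = Element('emp', name="Monty", title="CEO")
smithers = SubElement(burns, 'emp', name="Waylon", title="Flunkie")
SubElement(smithers, 'emp', name="Carl", title="Engineer")
SubElement(smithers, 'emp', name="Lenny", title="Engineer")
SubElement(smithers, 'emp', name="Homer", title="Safety Inspector")

# Whole tree in document order, starting with the root itself
everyone = [e.get('name') for e in burns.iter('emp')]
print(everyone)   # ['Monty', 'Waylon', 'Carl', 'Lenny', 'Homer']

# Direct children only: iterate over the element
reports = [e.get('name') for e in smithers]
print(reports)    # ['Carl', 'Lenny', 'Homer']

# Simple path query: every descendant with a matching attribute
engineers = [e.get('name')
             for e in burns.findall(".//emp[@title='Engineer']")]
print(engineers)  # ['Carl', 'Lenny']
```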

Nested data pops up quite often, and it's nice to have a good tool for dealing with it gracefully.