minidom

From The Python Library Reference

"xml.dom.minidom is a light-weight implementation of the Document Object Model interface. It is intended to be simpler than the full DOM and also significantly smaller."

Good:

comes with Python
Covers most of DOM 2

Bad:

slow
memory hog
DOM-based, so API isn't very Pythonic
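
To give a feel for the DOM-style API behind the last point, here is a minimal sketch, assuming Python 3 and a small document invented for illustration. Reading one attribute and one text value means going through a node list and an explicit text node.

```python
from xml.dom import minidom

# Sample document invented for this sketch.
doc = minidom.parseString('<tag><newtag name="value">text value</newtag></tag>')

# DOM access goes through node lists and explicit text nodes.
newtag = doc.getElementsByTagName("newtag")[0]
attr_value = newtag.getAttribute("name")
text_value = newtag.firstChild.data

print(attr_value, text_value)

# Break the internal reference cycles once the document is no longer needed.
doc.unlink()
```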

ElementTree

From http://effbot.org/zone/element-index.htm

"The Element type is a simple but flexible container object, designed to store hierarchical data structures, such as simplified XML infosets, in memory. The element type can be described as a cross between a Python list and a Python dictionary. The ElementTree wrapper adds code to load XML files as trees of Element objects, and save them back again."

ElementTree

Good

fast
API is very Pythonic
XPath queries
the core ships with Python 2.5 as xml.etree

Bad

separate download for Py<2.5 (and maybe even 2.5)
Not DOM-like, so you have to learn a new API
How it handles namespaces (maybe just me)
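
The "cross between a Python list and a Python dictionary" description can be illustrated with a short sketch. It assumes Python 3, where the API ships as xml.etree.ElementTree; the tag names are made up for the example.

```python
import xml.etree.ElementTree as ET

root = ET.Element("tag")
ET.SubElement(root, "first")
ET.SubElement(root, "second")

# List-like: length, indexing, and iteration work on child elements.
child_count = len(root)
first_tag = root[0].tag
child_tags = [child.tag for child in root]

# Dictionary-like: attributes are read and written by key.
root.set("name", "value")
name = root.get("name")
missing = root.get("absent", "default")

print(child_count, first_tag, child_tags, name, missing)
```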

cElementTree

From http://effbot.org/zone/celementtree.htm

"The cElementTree module is a C implementation of the ElementTree API, optimized for fast parsing and low memory use. On typical documents, cElementTree is 15-20 times faster than the Python version of ElementTree, and uses 2-5 times less memory. On modern hardware, that means that documents in the 50-100 megabyte range can be manipulated in memory, and that documents in the 0-1 megabyte range load in zero time (0.0 seconds). This allows you to drastically simplify many kinds of XML applications."

cElementTree

Good

Does everything that ElementTree does, only faster

Bad

requires ElementTree module
everything else listed as Bad for ElementTree
yet another download for Py<2.5
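
Because the C and Python versions share one API, a common way to cope with the "which download do I have" problem is a fallback import. The sketch below is one possible arrangement, not the only one: it tries the standalone module first, then the bundled C version, then the plain xml.etree.ElementTree module. The sample XML is invented.

```python
try:
    import cElementTree as ET                  # standalone download
except ImportError:
    try:
        import xml.etree.cElementTree as ET    # bundled C version (removed in Python 3.9)
    except ImportError:
        import xml.etree.ElementTree as ET     # always present in Python 3

# Whichever module was imported, the calling code stays the same.
root = ET.fromstring("<tag><newtag/></tag>")
child_tags = [child.tag for child in root]
print(child_tags)
```

Since Python 3.3 the xml.etree.ElementTree module picks up the C accelerator on its own when it is available, so on current interpreters the last import is usually all that is needed.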

Other Libraries

Other libraries you may see mentioned on the Web:

PyXML

http://pyxml.sourceforge.net/
next step up from Python's standard XML libraries.
stable versions make it into Python's standard library.
not actively maintained at this point.

4Suite

http://4suite.org/index.xhtml
includes DOM, XPath, XSLT, etc.

Other Libraries

libxml2

http://xmlsoft.org/python.html
Python wrapper around Gnome's libxml2 library
quite fast and quite complete
C-like interface

Examples

In the following examples, the minidom code is on the top and the cElementTree code is on the bottom.

Example

Importing the Packages

(at least how I do it)

from xml.dom import minidom

import cElementTree as ET

Creating XML From Scratch

    roottag="<tag/>"
    newdoc=minidom.parseString(roottag)

    etElement=ET.Element("tag")

Example

Creating A New Element

    newtag = newdoc.createElement("newtag")
    newdoc.documentElement.appendChild(newtag)

    newElement=ET.SubElement(etElement,"newtag")

Adding Attributes

    newtag.setAttribute("name","value")

    newElement.set('name','value')

Example

Adding Text

    newtag.appendChild(newdoc.createTextNode("text value"))

    newElement.text="text value"

Importing From Another XML Tree

    newdoc2=minidom.parseString(roottag)
    newtag=newdoc2.childNodes[0]
    newtag2=newdoc.importNode(newtag,deep=1)
    newtag2 = newdoc.documentElement.appendChild(newtag2)

    newElement=ET.Element("tag")
    etElement.append(newElement)

Example

Removing Elements

    newtag2.parentNode.removeChild(newtag2)

    etElement.remove(newElement)

Iteration

    for tag in newdoc.getElementsByTagName("newtag"):
        print tag.getAttribute("name")

    for tag in etElement.findall("newtag"):
        print tag.get("name")

Example

Printing Out

    print newdoc.toxml()

    print ET.tostring(etElement)
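
The create, attribute, text, iterate, and print steps from the slides above can be pulled together into one runnable sketch. This is a Python 3 rendering (print as a function, xml.etree.ElementTree in place of the standalone cElementTree module), so it is an adaptation rather than the original slide code. The import and remove steps are left out so that the tree matches the final result.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

# --- minidom ---
newdoc = minidom.parseString("<tag/>")
newtag = newdoc.createElement("newtag")
newdoc.documentElement.appendChild(newtag)
newtag.setAttribute("name", "value")
newtag.appendChild(newdoc.createTextNode("text value"))
dom_names = [t.getAttribute("name") for t in newdoc.getElementsByTagName("newtag")]
# Serialize only the root element, so no XML declaration is emitted.
dom_xml = newdoc.documentElement.toxml()

# --- ElementTree ---
etElement = ET.Element("tag")
newElement = ET.SubElement(etElement, "newtag")
newElement.set("name", "value")
newElement.text = "text value"
et_names = [t.get("name") for t in etElement.findall("newtag")]
# encoding="unicode" asks for a str instead of bytes.
et_xml = ET.tostring(etElement, encoding="unicode")

print(dom_xml)
print(et_xml)
```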

The Final Result

<?xml version="1.0" ?>
<tag><newtag name="value">text value</newtag></tag>

<tag><newtag name="value">text value</newtag></tag>

ElementTree-only stuff

XPath and ElementTree
Namespaces

XPath and ElementTree

ElementTree can use basic XPath queries to find Elements in the tree.

The find method returns the first Element that matches the XPath query.

The findall method returns all the Elements that match the query.

The findtext method finds the first Element matching the XPath query and returns its text.

    print et.find("/wpt/desc").text

    for t in et.findall("/wpt/desc"):
        print t.text

    print et.findtext("/wpt/desc")
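
For a self-contained version of the three calls, here is a sketch in Python 3 against a tiny document invented for the purpose. It queries from the root Element, so the paths are relative and have no leading slash.

```python
import xml.etree.ElementTree as ET

# Invented sample document; the element names mirror the queries above.
root = ET.fromstring(
    "<gpx>"
    "<wpt><desc>First cache</desc></wpt>"
    "<wpt><desc>Second cache</desc></wpt>"
    "</gpx>"
)

first = root.find("wpt/desc")                 # first match, or None
all_descs = root.findall("wpt/desc")          # list of every match
first_text = root.findtext("wpt/desc")        # text of the first match, or None
fallback = root.findtext("wpt/name", "n/a")   # default when nothing matches

print(first.text, [d.text for d in all_descs], first_text, fallback)
```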

Namespaces

Namespaces are special in ElementTree. If the document declares a namespace, you must always include it in the tag name, even if it's the default namespace, e.g. a document whose root element is declared as:

<gpx
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.topografix.com/GPX/1/0" 
xsi:schemaLocation="http://www.topografix.com/GPX/1/0">

Namespaces

Here is a sample of how you have to access it:


>>> et=ET.parse("everything.gpx")
>>> et.find("wpt")
>>> print et.find("wpt")
None
>>> print et.find("{http://www.topografix.com/GPX/1/0}wpt")
<Element '{http://www.topografix.com/GPX/1/0}wpt' at 0xe3f8>

I like to use string templates for this. See next example.
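
As a small runnable illustration of the template idea, the sketch below (Python 3, with a one-waypoint document invented for the example) builds the "{uri}tag" form once and reuses it. It also shows an alternative that is not in the original slides: current ElementTree versions accept a prefix map through the namespaces argument, and the prefix "g" is an arbitrary choice made here.

```python
import string
import xml.etree.ElementTree as ET

GPX_NS = "http://www.topografix.com/GPX/1/0"

# Invented one-waypoint document using the default namespace shown above.
root = ET.fromstring(
    '<gpx xmlns="%s"><wpt lat="1.0" lon="2.0"><name>DEMO</name></wpt></gpx>' % GPX_NS
)

# Template approach: build the "{uri}tag" form once per tag.
qualified = string.Template("{%s}$tag" % GPX_NS)
wpt_tag = qualified.substitute(tag="wpt")

unqualified = root.find("wpt")      # None: the bare name does not match
via_template = root.find(wpt_tag)

# Alternative: pass a prefix map; "g" is an arbitrary prefix chosen here.
via_prefix = root.find("g:wpt", namespaces={"g": GPX_NS})

print(unqualified, via_template.get("lat"), via_prefix.get("lon"))
```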

Example -- GPX to CSV

GPX is the format that comes out of a GPS. Here is a script that takes a GPX file and outputs lat,lon,wayptname:


import sys,os
import cElementTree as ET
import string

if __name__ == '__main__':

    mainNS=string.Template("{http://www.topografix.com/GPX/1/0}$tag")

    wptTag=mainNS.substitute(tag="wpt")
    nameTag=mainNS.substitute(tag="name")

    et=ET.parse(open("everything.gpx"))

    for wpt in et.findall("//"+wptTag):
        wptinfo=[]
        wptinfo.append(wpt.get("lat"))
        wptinfo.append(wpt.get("lon"))
        wptinfo.append(wpt.findtext(nameTag))

        print ",".join(wptinfo)

Example -- GPX to CSV (cont)

Result:

39.655717000,-104.902083000,GCRQRZ
39.568783000,-104.913300000,GCRRCK
39.556767000,-104.874400000,GCRRHG
39.660650000,-104.762467000,GCRRQ5
39.664640000,-104.764720000,GCRWHG
39.572367000,-104.912567000,GCRWV5
39.705883000,-104.778600000,GCRZ5V
39.709617000,-104.786800000,GCRZ5X
39.566450000,-104.889233000,GCT2BP
....
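
For readers on Python 3, the same conversion might look like the sketch below. It is an adaptation under assumptions, not the original script: it uses xml.etree.ElementTree, writes through the csv module, and reads from an in-memory sample with made-up coordinates standing in for everything.gpx. The function name gpx_to_csv is hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

GPX_NS = "{http://www.topografix.com/GPX/1/0}"

def gpx_to_csv(source, out):
    """Write one lat,lon,name row per waypoint found in a GPX 1.0 source."""
    writer = csv.writer(out, lineterminator="\n")
    tree = ET.parse(source)
    # iter() walks the whole tree, like the "//" search in the script above.
    for wpt in tree.iter(GPX_NS + "wpt"):
        writer.writerow([wpt.get("lat"), wpt.get("lon"), wpt.findtext(GPX_NS + "name")])

# Invented sample data standing in for everything.gpx.
sample = io.BytesIO(
    b'<gpx xmlns="http://www.topografix.com/GPX/1/0">'
    b'<wpt lat="39.5" lon="-104.9"><name>DEMO1</name></wpt>'
    b'<wpt lat="39.6" lon="-104.8"><name>DEMO2</name></wpt>'
    b'</gpx>'
)
out = io.StringIO()
gpx_to_csv(sample, out)
csv_text = out.getvalue()
print(csv_text, end="")
```

Passing an open file (or a filename) instead of the BytesIO object, and sys.stdout instead of the StringIO, would give a command-line version.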

Example -- restconnect

restconnect is a module I wrote to abstract a REST webservice API. Instead of creating the XML yourself, the RestConnect class creates the XML from properties, and assigns the result to another property. Here is an example:

### create the Geocode class
class Geocode(RestConnect):

    def __init__(self):

        RestConnect.__init__(self,
                     "http://api.local.yahoo.com/MapsService/V1/geocode?",
                     "urn:yahoo:maps")

        self.appid='xxxxx'

Example -- restconnect (cont)


## Use Geocode
if __name__=='__main__':

    g= Geocode()
    if len(sys.argv)<2:
        g.city="Omaha"
        g.State="NE"
        g.Street="14620 Frances Cir"
        g.zip="68144"
    else:
        g.location=''
        for x in sys.argv[1:]:
            g.location+="%s " %x

    g.fetch()
    print g.Latitude,g.Longitude

Example -- restconnect (cont)

The minidom version:

    def _parse(self,xmlstr):

        dom = minidom.parseString(xmlstr)
        result = dom.getElementsByTagName("Result")[0]

        for child in result.childNodes:
            if child.firstChild:
                self.__dict__[child.tagName] = child.firstChild.data

        del dom

Example -- restconnect (cont)

The cElementTree version:

    def _parse(self,xmlstr):

        # fromstring parses XML held in a string; ET.parse expects a file
        et = ET.fromstring(xmlstr)
        if self._namesp:
            namesp=string.Template("{%s}$tag" %self._namesp)

        else:
            namesp=string.Template("$tag")

        resultTag=namesp.substitute(tag="Result")
        result=et.find(resultTag)

        for child in list(result):
            # strip the "{namespace}" prefix from the tag, if any
            if child.tag.find("}")>-1:
                tagname=child.tag[child.tag.find("}")+1:]
            else:
                tagname=child.tag

            self.__dict__[tagname] = child.text


        del et
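
The parsing step can also be tried outside the class. The sketch below is a standalone Python 3 variant under stated assumptions: parse_result is a hypothetical helper name, it returns a dictionary instead of setting attributes on self, and the sample response is invented to have the shape the _parse methods expect (a Result element with simple child elements). It is not a captured service reply.

```python
import xml.etree.ElementTree as ET

def parse_result(xmlstr, namesp=None):
    """Return {tag name: text} for the children of the first Result element."""
    root = ET.fromstring(xmlstr)
    prefix = "{%s}" % namesp if namesp else ""
    result = root.find(prefix + "Result")
    if result is None:
        return {}
    # rpartition keeps everything after the last "}", or the whole tag if none.
    return {child.tag.rpartition("}")[2]: child.text for child in result}

# Invented response, shaped only like what the _parse methods above expect.
sample = (
    '<ResultSet xmlns="urn:yahoo:maps">'
    "<Result><Latitude>1.5</Latitude><Longitude>-2.5</Longitude></Result>"
    "</ResultSet>"
)
fields = parse_result(sample, "urn:yahoo:maps")
print(fields)
```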

Python and XML

By Mike Hostetler

What we will cover

minidom
ElementTree
cElementTree
Other Libraries
Examples
ElementTree-only stuff
Example -- GPX to CSV
Example -- restconnect

Questions?