XML News from Wednesday, May 3, 2006

The Apache WebServices Commons Project has released AXIOM 1.0. As near as I can tell, this is yet another tree model like DOM, JDOM, or XOM. However, it's built on StAX rather than SAX. Most importantly, Axiom can build the object tree on demand so you don't spend memory on nodes you don't want. That sounds good, but it's been tried before (notably in Xerces's deferred DOM) and the results have not been impressive. Maybe these folks have figured out a more practical way to do this, though. The difference between the underlying parsers (StAX is a pull API, where the client asks for each event; SAX is a push API, where the parser drives the client) may be important for this.
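To make the pull side of that distinction concrete, here's a minimal sketch using the standard javax.xml.stream (StAX) API. The class and method names (PullSketch, firstText) are made up for illustration, and this is not Axiom's code or a claim about how Axiom builds its tree. It only shows that a StAX client decides when to ask for the next event, and can stop asking as soon as it has what it wants:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PullSketch {

    // Hypothetical helper: returns the text of the first element with the
    // given local name, or null if the document contains no such element.
    // The target element is assumed to contain only text.
    public static String firstText(String xml, String localName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                // The client drives the loop: each call to next() pulls one event.
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(localName)) {
                        // Found it. Return now and never request the
                        // events for the rest of the document.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><foo>blah</foo><bar>ignored</bar></root>";
        System.out.println(firstText(xml, "foo")); // prints "blah"
    }
}
```

With SAX the parser would call back into a handler for every event in the document. Here the bar element is simply never requested, which is the property an on-demand tree builder would presumably want to exploit.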

Also of note is the support for XML-binary Optimized Packaging (XOP) and MTOM. The Axiom announcement gets this exactly backwards, though. XOP and MTOM do not allow "XML to carry binary data efficiently and in a transparent manner." Instead they allow both XML and binary data to be bundled together in the same non-XML file. Understanding the distinction is critical for proper use of these technologies.

The Axiom API itself is too complex. For example, here's a chunk of code from the tutorial:

OMFactory factory = OMAbstractFactory.getOMFactory();
OMNamespace ns1 = factory.createOMNamespace("bar","x");
OMElement root = factory.createOMElement("root",ns1);
OMNamespace ns2 = root.declareNamespace("bar1","y");
OMElement elt1 = factory.createOMElement("foo",ns1);
OMElement elt2 = factory.createOMElement("yuck",ns2);
OMText txt1 = factory.createOMText(elt2,"blah");
elt2.addChild(txt1);
elt1.addChild(elt2);
root.addChild(elt1);

And here's the equivalent in XOM for comparison:

Element root = new Element("x:root", "bar");
Element elt1 = new Element("x:foo", "bar");
Element elt2 = new Element("y:yuck", "bar1");
Text txt1 = new Text("blah");
elt2.appendChild(txt1);
elt1.appendChild(elt2);
root.appendChild(elt1);

Of course, XOM would notice that the requested elements use relative namespace URIs, and thus that the document containing them does not have a valid Infoset. For all the talk about Infosets on the Axiom pages, you'd hope somebody would have noticed this. Their examples also demonstrate a lack of correct white space handling, and some serious mistakes with encoding detection. I haven't tried to write code with this API yet, so I can't tell if the problems are in the library itself or just the tutorial. Either way, it's disturbing.
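For anyone wondering what "relative namespace URI" means in practice, here's a rough sketch of the kind of check involved, using nothing but java.net.URI. This is not XOM's actual verification code; the class and method names (NamespaceCheck, isAbsoluteNamespaceUri) are invented for illustration, and the http://www.example.com/bar URI is just a stand-in for what the tutorial could have used instead:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class NamespaceCheck {

    // Hypothetical helper: true only if the string parses as a URI and
    // has a scheme, which is what java.net.URI means by "absolute".
    public static boolean isAbsoluteNamespaceUri(String uri) {
        try {
            return new URI(uri).isAbsolute();
        } catch (URISyntaxException e) {
            // Not a syntactically legal URI at all.
            return false;
        }
    }

    public static void main(String[] args) {
        // The two namespace names used in the tutorial code:
        System.out.println(isAbsoluteNamespaceUri("bar"));  // prints "false"
        System.out.println(isAbsoluteNamespaceUri("bar1")); // prints "false"
        // An assumed absolute replacement:
        System.out.println(isAbsoluteNamespaceUri("http://www.example.com/bar")); // prints "true"
    }
}
```

Both "bar" and "bar1" parse as URI references with no scheme, so they are relative, and that is exactly the problem with the tutorial's examples.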

Folks: if you're going to write yet another XML API, please, please ask for early review from people who have been through this before. The reason the mistakes in Axiom jump out at me is that I've seen them all dozens of times before. XML is not as simple a spec as it seems at first glance. There are a lot of tricky areas that trip up the unwary. There are some interesting new ideas here that should be explored further. However, as a library it's clearly unsuitable for production use.