33. Choose DOM for Standards Support

While SAX programs are almost always faster and more memory efficient than the DOM equivalents, performance is not a god to be worshiped above all others. There are many times when using DOM makes sense. In particular, for many classes of applications, programmers will find DOM much easier to work with. If shaving off 10% of execution time or 90% of space matters less to you than saving 10% of development time, then you need to consider whether DOM might better fit the problem at hand than SAX.

In particular, the following characteristics indicate that a problem might be profitably addressed by DOM:

All of these are fuzzy. If speed matters more to you than product development time or memory usage, you may choose to use SAX even for a system that uses data structures as complex as the XML document itself and requires random access to the tree. The only criterion that's really carved in stone is memory. If the program needs to process documents that are large compared to the available memory, then you really have to use a streaming API such as SAX. Otherwise, a lot depends on your comfort level and the need for each characteristic.
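The trade-off described above can be sketched in a few lines. The example below is only an illustration, assuming Python's standard-library `xml.sax` and `xml.dom.minidom` modules; the sample document and the names `ItemTotaler`, `total_with_sax`, `total_with_dom`, and `last_sku_with_dom` are hypothetical. The SAX version keeps only a running total in memory, while the DOM version builds the whole tree first and can then jump to any node.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
XML = b"<order><item sku='a1'>2</item><item sku='b2'>5</item></order>"


class ItemTotaler(xml.sax.ContentHandler):
    """Streaming: handles one event at a time and keeps only a running total."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self._in_item = False
        self._text = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is buffered until the end tag.
        if self._in_item:
            self._text.append(content)

    def endElement(self, name):
        if name == "item":
            self.total += int("".join(self._text))
            self._in_item = False


def total_with_sax(data):
    handler = ItemTotaler()
    xml.sax.parseString(data, handler)
    return handler.total


def total_with_dom(data):
    # Tree-based: the whole document is in memory before any work starts.
    doc = minidom.parseString(data)
    items = doc.getElementsByTagName("item")
    return sum(int(item.firstChild.data) for item in items)


def last_sku_with_dom(data):
    # Random access: go straight to the last item without tracking any state.
    doc = minidom.parseString(data)
    return doc.getElementsByTagName("item")[-1].getAttribute("sku")
```

The DOM functions are shorter and need no state machine, which is the development-time advantage; the SAX handler never holds more than one item's text at a time, which is the memory advantage.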

If my recommendation for DOM sounds a lot more hesitant than that for SAX, there's a good reason. DOM can be just plain weird. It is very much like the proverbial horse designed by committee, and, to be perfectly honest, camels don't smell as bad as DOM. DOM is packed with gotchas. Here's a representative sampling of just a few:

I could go on--I haven't even begun to consider issues like naming conventions and the use of short constants that may make sense to programmers in some languages but not others--but I'll restrain myself. DOM is such an incredibly baroque API that most experienced XML developers turn to it only as a last resort.
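As a rough illustration of the short constants mentioned above, the sketch below assumes Python's `xml.dom.minidom`; the sample document is hypothetical. Node kinds are reported as small integer codes in `nodeType`, and code that walks `childNodes` usually has to test them, because whitespace between elements is kept as text nodes. The whitespace behavior is my own example of a common surprise, not necessarily one of the items listed earlier.

```python
from xml.dom import Node, minidom

# Hypothetical sample document with ordinary pretty-printing whitespace.
doc = minidom.parseString(
    "<list>\n  <entry>one</entry>\n  <entry>two</entry>\n</list>"
)
root = doc.documentElement

# childNodes contains whitespace-only text nodes as well as the elements.
all_children = list(root.childNodes)

# Filtering means comparing nodeType against an integer constant.
element_children = [
    node for node in root.childNodes if node.nodeType == Node.ELEMENT_NODE
]

# The constants are plain small integers, as defined by the DOM specification.
element_code = Node.ELEMENT_NODE
text_code = Node.TEXT_NODE
```

Here `root.firstChild` is a whitespace text node rather than the first `entry` element, which is why the filter is needed.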

Most of the reasons to use DOM are really reasons to use a tree-based API that holds the document in memory. There's no particular reason this has to be DOM instead of JDOM, XOM, dom4j, or any of the numerous other tree-based APIs. Microsoft implements DOM in MSXML, but has added so many additional non-standard methods that the resulting API really isn't DOM at all. Indeed, the proliferation of alternate tree-based object models for XML is a symptom of the widespread dissatisfaction with DOM in the developer community. The much cleaner SAX API, on the other hand, has the field of push parsing XML almost completely to itself. There are a few rough spots in SAX, but none of them have itched developers enough to make them replace it. By contrast, DOM itches developers worse than the fleas of a thousand camels.
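The libraries named above are Java APIs. As a loose analogy only, Python's standard library offers both a DOM implementation (`xml.dom.minidom`) and a non-DOM tree API (`xml.etree.ElementTree`), so the same small task can be written both ways. The sample document and the function names `title_with_dom` and `title_with_etree` are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
XML = "<book><title>Example</title><year>2003</year></book>"


def title_with_dom(data):
    doc = minidom.parseString(data)
    title = doc.getElementsByTagName("title")[0]
    # In DOM, element text lives in child Text nodes that are joined by hand.
    return "".join(
        node.data for node in title.childNodes if node.nodeType == node.TEXT_NODE
    )


def title_with_etree(data):
    # The alternate tree API exposes the text of a child element directly.
    return ET.fromstring(data).findtext("title")
```

Both functions hold the whole document in memory and return the same string; the difference lies in how much ceremony the API demands, which is the kind of friction that drives developers toward the alternatives.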

Considered relative to other tree-based APIs, where does DOM stand out? I've talked about its unique weaknesses. What, if any, are its unique strengths? Believe it or not, there are a few that occasionally suggest or even mandate its use:

In brief, DOM is more standard and more broadly supported than other APIs, and thus it may be the right choice in situations where you need to exchange code with diverse programmers. However, it is not the cleanest, most efficient, fastest, or most productive API you can use to process XML.