This ugliness
doesn’t exist solely to avoid mixed content. The same design also avoids
namespaces (Item 20) and modularization (Item 8). But fear of mixed
content is certainly a major contributing factor. What’s really telling in
this example is that the community promptly hacked their own uglier version
of mixed content back into RSS, even though the original developers had
tried to avoid it. Mixed content is not a mistake. It is not something to be
feared. It is at the core of much of the information XML is designed to mark
up.
|
|
Tools that fail
to handle mixed content properly range from simple programs such as XML
pretty printers to complete data binding APIs. One particularly perverse API
I encountered read mixed content but reordered it so all the plain text
nodes came after all the child elements. Many other tools came into
existence without support for mixed content and had to undergo complicated
and expensive retrofitting when the need to support it became obvious.
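
To make the reordering bug concrete, here is a minimal sketch using Python's standard xml.dom.minidom module. The sample paragraph and both function names are hypothetical, invented for illustration; the second function only imitates the behavior described above and is not taken from any real API.

```python
from xml.dom.minidom import parseString

# Hypothetical sample: a paragraph whose text and inline elements interleave.
SAMPLE = "<p>Mixed content is <em>not</em> a mistake; it is <strong>core</strong> XML.</p>"


def children_in_document_order(xml_text):
    """Return (kind, value) pairs for the root's children, in document order."""
    root = parseString(xml_text).documentElement
    result = []
    for node in root.childNodes:
        if node.nodeType == node.TEXT_NODE:
            result.append(("text", node.data))
        elif node.nodeType == node.ELEMENT_NODE:
            result.append(("element", node.tagName))
    return result


def reordered_like_the_broken_api(xml_text):
    """Imitate the bug: every child element first, then every text node."""
    items = children_in_document_order(xml_text)
    elements = [item for item in items if item[0] == "element"]
    texts = [item for item in items if item[0] == "text"]
    return elements + texts


ordered = children_in_document_order(SAMPLE)
broken = reordered_like_the_broken_api(SAMPLE)
```

In ordered, text and elements alternate exactly as they do in the source. In broken, the sentence can no longer be reconstructed, because nothing records where each element sat relative to the surrounding text.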
|
|
Another common
problem is software that claims to be able to handle mixed content but was
never extensively tested with narrative documents. I’ve brought more than
one XML editor to its knees by loading in a book written in DocBook. Too
often programmers introduce bugs into their code based on mistaken notions
of what XML documents can look like. For example, a programmer who forgets
about mixed content may try to store the children of an element as a list of
Element objects, rather than a more generic list of Object or Node objects.
True XML software needs to be prepared to handle all the many forms XML can
take, including both narrative and record-oriented documents.
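
The child-list mistake is easy to sketch. The following is an illustrative model, not the design of any particular library: the Text and Element classes, the build function, and the text_of helper are all hypothetical names. The point is only that the children list is typed to hold both text and elements, so a narrative paragraph survives the trip into the object model.

```python
from dataclasses import dataclass, field
from typing import List, Union
from xml.dom.minidom import parseString


@dataclass
class Text:
    data: str


@dataclass
class Element:
    name: str
    # A list of generic nodes, not a list of Element only.
    children: List[Union["Element", Text]] = field(default_factory=list)


def build(dom_element):
    """Copy a minidom element into the model, keeping text and elements in order."""
    element = Element(dom_element.tagName)
    for child in dom_element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            element.children.append(build(child))
        elif child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            element.children.append(Text(child.data))
        # Comments and processing instructions are skipped to keep the sketch short.
    return element


def text_of(node):
    """Concatenate all text beneath a node, in document order."""
    if isinstance(node, Text):
        return node.data
    return "".join(text_of(child) for child in node.children)


para = build(parseString(
    "<para>Press <keycap>Enter</keycap> to continue.</para>").documentElement)
```

Had children been declared as a list of Element alone, there would be nowhere to put "Press " or " to continue." and the model would silently drop them.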
|
|
The underlying
cause of these problems is that the designers started with the question “How
do I convert an object into an XML document?” rather than the much tougher
question “How do I convert an XML document into an object?” A variant starts
with the question “How do I convert a relational table to an XML document?”
but the underlying problem is the same. This is a toothpaste problem: It’s a
lot easier to squirt XML out of an object than to push it back in. Most of
these tools claim to be able to read XML documents into Java or C++, but
they fail very quickly as soon as you start throwing real-world documents at
them. Generally speaking, the developers designing these tools are laboring
under numerous faulty assumptions, including the following.
|
|
Documents have W3C XML Schema
Language schemas. (The vast majority don’t.)
|
|
Documents have some kind of
schema. (Many, perhaps most, don’t.)
|
|
Documents that actually have
schemas of some kind do in fact adhere to those schemas. (Often untrue.)
|
|
You know the sorts of structures
you’re going to encounter before you see the documents. In other words, the
documents are predictable. (Not an unreasonable assumption, but nonetheless
it is often untrue in practice.)
|
|
Mixed content doesn’t exist.
(Patently false.)
|
|
XML documents are fairly flat.
In particular they have nearly tabular structures. (The database mapping
folks tend to make this assumption. The object folks are a little less
likely to fall into this particular trap.)
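
Several of the assumptions above can be checked rather than taken on faith. The sketch below, again using Python's xml.dom.minidom, reports how deep a document nests and which elements contain mixed content. The profile function and both sample documents are hypothetical, and the sketch ignores comments and processing instructions.

```python
from xml.dom.minidom import parseString


def profile(xml_text):
    """Report the maximum element depth and the names of mixed-content elements."""
    root = parseString(xml_text).documentElement
    mixed = set()
    max_depth = 0
    stack = [(root, 1)]  # (element, depth) pairs still to visit
    while stack:
        element, depth = stack.pop()
        max_depth = max(max_depth, depth)
        has_text = False
        has_child = False
        for child in element.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                has_child = True
                stack.append((child, depth + 1))
            elif child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
                # Whitespace-only text between elements does not count as mixed content.
                if child.data.strip():
                    has_text = True
        if has_text and has_child:
            mixed.add(element.tagName)
    return {"max_depth": max_depth, "mixed": sorted(mixed)}


record = profile("<row><id>7</id><name>Ann</name></row>")
narrative = profile("<sect><p>See <a>the <b>spec</b></a>.</p></sect>")
```

The record-like document is shallow and has no mixed content. The narrative one nests four levels deep and mixes text with elements in both p and a, which is exactly what a tool built on the flat, tabular assumption will mishandle.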
|
|
The same issues
arise when developers try to store XML data in relational tables. XML
documents are not tables. You can force them in, in a variety of very ugly
ways, but this is simply not the task a relational database is designed for.
You’ll be happier with a database and API that were designed for XML from the
start and that don’t pretend XML is simpler than it really is.
|
|
The fact is,
XML documents considered in their full generality are extremely complicated.
They are not tables. They are not objects. Any reasonable model for them has
to take this complexity into account. Their structures very rarely match the
much more restrictive domains of tables and objects. You can certainly
design mappings from XML to classes, but unless you’re working in a very
limited domain, it’s questionable whether you can invent anything much
simpler than JDOM. And if you are working in a restricted domain, all you
really need is a standard way of serializing and deserializing instances of
particular classes to and from a particular XML format. Such a mapping can be
almost entirely hidden from the client programmer. Be wary of tools that implicitly subset
XML and handle only some kinds of XML documents. Robust, reliable XML
processing needs to use tools that are ready to handle all of XML, including
mixed content.
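
For the restricted-domain case, a minimal sketch of serializing one class to and from one XML format might look like the following. The Contact class, its fields, and the contact element vocabulary are all hypothetical. The code uses Python's standard xml.etree.ElementTree and handles only this one record format, which it says openly by rejecting anything else instead of guessing.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Contact:
    """Hypothetical record class with a fixed, record-oriented XML form."""
    name: str
    email: str

    def to_xml(self) -> str:
        root = ET.Element("contact")
        ET.SubElement(root, "name").text = self.name
        ET.SubElement(root, "email").text = self.email
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, xml_text: str) -> "Contact":
        root = ET.fromstring(xml_text)
        # Fail loudly on documents outside the one format this class understands.
        if root.tag != "contact":
            raise ValueError("expected a contact element, got " + root.tag)
        name = root.findtext("name")
        email = root.findtext("email")
        if name is None or email is None:
            raise ValueError("contact must have name and email children")
        return cls(name, email)


original = Contact("Ann & Bob", "ann@example.org")
restored = Contact.from_xml(original.to_xml())
```

The client programmer sees only to_xml and from_xml. This is adequate because the domain is deliberately narrow; it is not a general model for XML, and a narrative document handed to from_xml raises an error rather than being read incorrectly.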
|