Effective XML

What people do is this:

<description>&lt;a href="http://www.xerlin.org"&gt;Xerlin 1.3&lt;/a&gt;, an open source XML Editor written in Java, has been released. Users can extend the application via custom editor interfaces for specific DTDs. New features in version 1.3 include:

&lt;ul&gt;

&lt;li&gt;XML Schema support&lt;/li&gt;

&lt;li&gt;WebDAV capabilities&lt;/li&gt;

&lt;li&gt;Various user interface enhancements&lt;/li&gt;

&lt;/ul&gt;

Java 1.2 or later is required.

</description>


	This ugliness isn’t created just so mixed content can be avoided. It also avoids the use of namespaces (Item 20) and modularization (Item 8). But fear of mixed content is certainly a major contributing factor. What’s really telling in this example is that the community promptly hacked their own uglier version of mixed content back into RSS, even though the original developers had tried to avoid it. Mixed content is not a mistake. It is not something to be feared. It is at the core of much of the information XML is designed to mark up.
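	To make the cost of this hack concrete, here is a minimal sketch in Java using the standard DOM API. It assumes the common RSS practice of entity-escaping HTML inside a description element; the class name EscapedMarkupDemo and the shortened sample strings are invented for illustration. It suggests that escaped markup reaches the application as one opaque string, whereas genuine mixed content arrives as structure the parser has already worked out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; the sample strings are shortened from the RSS example.
final class EscapedMarkupDemo {

    // Markup smuggled into the description as escaped text.
    static final String ESCAPED =
        "<description>&lt;a href=\"http://www.xerlin.org\"&gt;Xerlin 1.3&lt;/a&gt;"
        + " has been released.</description>";

    // The same information expressed as real mixed content.
    static final String MIXED =
        "<description><a href=\"http://www.xerlin.org\">Xerlin 1.3</a>"
        + " has been released.</description>";

    // Parse a string and return its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Number of elements the parser sees below the root.
    static int countDescendantElements(String xml) {
        return parseRoot(xml).getElementsByTagName("*").getLength();
    }

    // Character data of the root element as the parser reports it.
    static String textOf(String xml) {
        return parseRoot(xml).getTextContent();
    }
}
```

	With these inputs, countDescendantElements(ESCAPED) is 0 and textOf(ESCAPED) still contains the literal angle brackets, so a consumer must run a second, HTML-aware parse to recover the link. countDescendantElements(MIXED) is 1, and the a element is available directly.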
	Tools that fail to handle mixed content properly range from simple programs such as XML pretty printers to complete data binding APIs. One particularly perverse API I encountered read mixed content but reordered it so all the plain text nodes came after all the child elements. Many other tools came into existence without support for mixed content and had to undergo complicated and expensive retrofitting when the need to support it became obvious.
	Another common problem is software that claims to be able to handle mixed content but was never extensively tested with narrative documents. I’ve brought more than one XML editor to its knees by loading in a book written in DocBook. Too often programmers introduce bugs into their code based on mistaken notions of what XML documents can look like. For example, a programmer who forgets about mixed content may try to store the children of an element as a list of Element objects, rather than a more generic list of Object or Node objects. True XML software needs to be prepared to handle all the many forms XML can take, including both narrative and record-oriented documents.
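	The Element-list mistake can be sketched in a few lines of Java with the standard DOM API. This is only an illustration under stated assumptions: the class MixedContentChildren, its method names, and the sample p element are hypothetical. One method walks every child node in document order; the other keeps only Element children, which is the faulty model described above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
final class MixedContentChildren {

    // Parse a string and return its root element, merging adjacent text nodes.
    static Element parseRoot(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            root.normalize();
            return root;
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Generic model: describe every text and element child in document order.
    static List<String> describeChildren(Element parent) {
        List<String> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add("element:" + child.getNodeName());
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                result.add("text:" + child.getNodeValue());
            }
        }
        return result;
    }

    // Faulty model: a list of Element objects has no place for the text.
    static List<Element> elementChildrenOnly(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element) {
                result.add((Element) children.item(i));
            }
        }
        return result;
    }
}
```

	For the input <p>Read <em>this</em> now.</p>, describeChildren reports three children (text, element, text) in their original order, while elementChildrenOnly returns just the em element. The words "Read " and " now." cannot be represented in the second model, so they are lost.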
	The underlying cause of these problems is that the designers started with the question “How do I convert an object into an XML document?” rather than the much tougher question “How do I convert an XML document into an object?” A variant starts with the question “How do I convert a relational table to an XML document?” but the underlying problem is the same. This is a toothpaste problem: It’s a lot easier to squirt XML out of an object than to push it back in. Most of these tools claim to be able to read XML documents into Java or C++, but they fail very quickly as soon as you start throwing real-world documents at them. Generally speaking, the developers designing these tools are laboring under numerous faulty assumptions, including the following.
			Documents have W3C XML Schema Language schemas. (The vast majority don’t.)
			Documents have some kind of schema. (Many, perhaps most, don’t.)
			Documents that actually have schemas of some kind do in fact adhere to those schemas. (Often untrue.)
			You know the sorts of structures you’re going to encounter before you see the documents. In other words, the documents are predictable. (Not an unreasonable assumption, but nonetheless it is often untrue in practice.)
			Mixed content doesn’t exist. (Patently false.)
			XML documents are fairly flat. In particular they have nearly tabular structures. (The database mapping folks tend to make this assumption. The object folks are a little less likely to fall into this particular trap.)
	The same issues arise when developers try to store XML data in relational tables. XML documents are not tables. You can force them in, in a variety of very ugly ways, but this is simply not the task a relational database is designed for. You’ll be happier with a database and API designed for XML from the start that doesn’t try to pretend XML is simpler than it really is.
	The fact is, XML documents considered in their full generality are extremely complicated. They are not tables. They are not objects. Any reasonable model for them has to take this complexity into account. Their structures very rarely match the much more restrictive domains of tables and objects. You can certainly design mappings from XML to classes, but unless you’re working in a very limited domain, it’s questionable whether you can invent anything much simpler than JDOM. And if you are working in a restricted domain, all you really need is a standard way of serializing and deserializing instances of particular classes to and from a particular XML format. This can be almost hidden from the client programmer. Be wary of tools that implicitly subset XML and handle only some kinds of XML documents. Robust, reliable XML processing needs to use tools that are ready to handle all of XML, including mixed content.
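	For the restricted-domain case, a sketch along the following lines may be all that is needed. Everything here is an assumption made for illustration: the Release class and its release/name/version format are invented, and the escaping is deliberately minimal. Serialization is hidden behind two methods, and the reader rejects input outside its one known format instead of silently pretending the rest of XML does not exist.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical value class; the <release> format is invented for this sketch.
final class Release {
    final String name;
    final String version;

    Release(String name, String version) {
        this.name = name;
        this.version = version;
    }

    // Serialize to the single format this class understands.
    String toXml() {
        return "<release><name>" + escape(name) + "</name><version>"
                + escape(version) + "</version></release>";
    }

    // Deserialize, refusing anything outside that format.
    static Release fromXml(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        if (!"release".equals(root.getNodeName())) {
            throw new IllegalArgumentException("expected <release>");
        }
        String name = null;
        String version = null;
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                String text = leafText((Element) child);
                if ("name".equals(child.getNodeName())) {
                    name = text;
                } else if ("version".equals(child.getNodeName())) {
                    version = text;
                } else {
                    throw new IllegalArgumentException(
                            "unexpected element: " + child.getNodeName());
                }
            } else if (child instanceof Text
                    && !child.getNodeValue().trim().isEmpty()) {
                // Non-whitespace text here means mixed content: not our format.
                throw new IllegalArgumentException("unexpected text in <release>");
            }
        }
        if (name == null || version == null) {
            throw new IllegalArgumentException("missing <name> or <version>");
        }
        return new Release(name, version);
    }

    // Text of a leaf element; nested markup is rejected rather than flattened.
    private static String leafText(Element e) {
        if (e.getElementsByTagName("*").getLength() > 0) {
            throw new IllegalArgumentException(
                    "unexpected markup inside <" + e.getNodeName() + ">");
        }
        return e.getTextContent();
    }

    // Minimal escaping for element content.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

	A client calls new Release("Xerlin", "1.3").toXml() and Release.fromXml(...) without ever touching the XML directly. The design choice worth noting is the explicit failure: a document such as <release>Now <name>Xerlin</name>...</release> raises an exception, which makes the tool's limits visible instead of quietly dropping the stray text.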