XML News from Tuesday, November 18, 2003

I've posted the notes from last night's XQuery presentation to the New York XML Special Interest Group. We had a small but enthusiastic crowd of about 20 people. One thing that became apparent by the end of the evening was that the XPath 2.0 data model is deeply confusing, at least on first presentation, especially compared to XSLT 1.0. The problem surfaced very early with my first sample query:

   for $t in doc("bib.xml")/bib/book/title
   return
      $t 

This replaced the query I had used first in some other presentations:

<bib>
  {
   for $t in doc("bib.xml")/bib/book/title
   return
    <book>
     { $t }
    </book>
  }
</bib>

I thought the first query was simpler because it didn't use direct element constructors and made it more apparent that an XQuery is not an XML document. (That was another design feature several attendees objected to, by the way. Trying to explain exactly when, where, and how different content had to be escaped was another high, holy mess.) However, the second query produces a single element, which can obviously be serialized as an XML document. People liked this. The first query produces a sequence of element nodes, which Saxon serializes as several document fragments, like so:

<?xml version="1.0" encoding="UTF-8"?>
<title>TCP/IP Illustrated</title>
<?xml version="1.0" encoding="UTF-8"?>
<title>Advanced Programming in the Unix Environment</title>
<?xml version="1.0" encoding="UTF-8"?>
<title>Data on the Web</title>
<?xml version="1.0" encoding="UTF-8"?>
<title>The Economics of Technology and Content for Digital TV</title>

Aside from the text declarations, this isn't any different from what you might see in XSLT 1.0. And I pointed out that this was hardly the only possible serialization of the result sequence. For example, if you turn on wrapping, Saxon gives you this output instead:

<?xml version="1.0" encoding="UTF-8"?>
<result:sequence xmlns:result="http://saxon.sf.net/xquery-results">
   <result:element>
      <title>TCP/IP Illustrated</title>
   </result:element>
   <result:element>
      <title>Advanced Programming in the Unix Environment</title>
   </result:element>
   <result:element>
      <title>Data on the Web</title>
   </result:element>
   <result:element>
      <title>The Economics of Technology and Content for Digital TV</title>
   </result:element>
</result:sequence>

The attendees liked this result even less. And they really hated the idea that a different tool might produce a third or fourth format. They really, really wanted one unique XML output from a query, possibly modulo insignificant details like the use of empty-element tags or boundary white space. Nobody objected when I turned on Saxon's option to pretty print the output because they didn't view that as creating a different result from the same query.

In XSLT 1.0 all output is XML. A transformation creates a result tree, which can always be serialized as either an XML document or a well-formed document fragment. In XSLT 2.0 and XQuery the output is not a result tree. Rather, it is a sequence. This sequence may contain XML; but it can also contain atomic values such as ints, doubles, gYears, dates, hexBinaries, and more; and there's no obvious or unique serialization for these things. For instance, what exactly are you supposed to do with an XQuery that generates a sequence containing a date, a document node, an int, and a parentless attribute? How do you serialize this construct? That a sequence has no particular connection to an XML document was very troubling to many attendees.
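A concrete version of that troublesome query might look something like the following minimal sketch. It uses XQuery's computed constructors. The particular values and the names bib and year are arbitrary placeholders chosen for illustration, and since constructor syntax has shifted between working drafts, treat the details as approximate:

   (: Hypothetical example: a four-item sequence made of an atomic date,
      a document node, an atomic int, and an attribute node with no
      parent element. All values and names are placeholders. :)
   (
     xs:date("2003-11-18"),
     document { <bib/> },
     xs:int(20),
     attribute year { "2003" }
   )

The query itself is perfectly legal, yet nothing in it says how the result should be written out. The date and the int must be turned into text by some rule the query doesn't specify. The attribute is even harder, because in serialized XML an attribute can only appear inside a start-tag, and this one has no element to live in.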

Looking at it now, I see that perhaps the flaw is in thinking of XQuery as being like XSLT; that is, as a tool to produce an XML document. It's not. It's a tool for producing collections of XML documents, XML nodes, and non-XML values such as ints. (I probably should have said it that way last night.) However, the specification does not define any concrete serialization or API for accessing and representing these non-XML collections. That's a pretty big hole to leave for implementers to fill.