Native XML Databases

eXQuisite or eXcruciating?

Elliotte Rusty Harold

Wednesday, March 5, 2008

elharo@metalab.unc.edu

http://www.cafeconleche.org/

What is a Database?

That is, what services does it provide?

Persistence/Storage
Search/Retrieval
Combinations/Restructuring
Transactions
Access Control
Storage Independence
Replication

These services are independent of database models

Network
Hierarchical
Relational
Object
Entity-attribute-value
Key-value
Hybrid
Ad Hoc (FileMaker, dBase, etc.)

Relational (i.e. SQL) databases handle about 20% of the world's information

But it's a very important 20%:

Payroll
Accounting
Billing
Census
Shipping
Stock trades
Scientific measurements
And much more

Relational models work very well for problems that can be decomposed into predictable sets of small atomic pieces of information that can be managed independently.

Characteristics of normalizable data

Order doesn't matter
Limited substructure (no hierarchies)
Fully bidirectional connections
Repeating structures
Few nulls
Predictable, reliable schema

But what about the other 80%?

Poetry
Legal briefs
Technical Manuals
Magazines
Books
Encyclopedias
Web Sites
The Library of Congress
Tax Forms
E-mail
Course Notes
Medical charts and doctor's notes
Case histories
etc.

Of course you can always fit anything into anything

I'll make it fit
Unary encoding
Infinity is the key
Relational databases can encode anything

But should you?

We need databases that provide services like transactions but without pounding everything into rectangular holes.
Different databases for different data.
Many relational databases implicitly recognize this by providing non-relational extensions such as full-text search.
Many database designs implicitly recognize this by deliberately storing non-normalized data.
And of course SQL isn't truly relational anyway.

Index Engines: The Other Extreme

From fully normalized structure to no structure at all
"Database" only stores the index, not the data. Data is usually stored in a possibly distributed file system.
Markup is what gets in the way of the content.
Great for full text search
The only technology proven to work at the size of the Web (not just a web site but the entire Web)
Examples:
- Lucene
- Fast
- Google
Is there anything in between all structure (relational) and no structure (index engines)?

XML Databases

Do not replace SQL databases for data that does fit into rectangular holes.
Handle some non-rectangular data.
In particular, work very well with hierarchical data.
We're reinventing hierarchical databases with XML. Are network databases next?

Why Did Hierarchical Databases Fail?

Did they really fail?
Relational databases did replace them for many tasks
Primarily because they required you to define your queries at database design time.
And because, like SQL, they required a fixed schema. Fixed table schemas are more flexible than fixed hierarchy schemas.

But what if

You didn't have to have a fixed schema?
And you didn't have to specify all your queries up front?

30 Years Makes a Big Difference

We know a lot more about database design and optimization now than we did in the 1970s.
And we have much more powerful hardware.
Maybe it's time to revisit some of the assumptions that led us to relational databases.

What are XML Databases Good For?

Order and position matter
Deeply nested and even recursive data
Mixed content
Duplicate content and redundant data
Grouping at multiple levels (hierarchies)
Unpredictable schema or no schema at all

What XML Databases cannot do:

Multiple structures per document (e.g. Xanadu)
Overlap
Pointers that jump around in the document, not just parent to child

The Players

Open Source Native

eXist
Berkeley DB XML
Sedna
MXQuery
Xindice (formerly dbXML)

Closed source Native

Mark Logic
iPedo
X-Hive/DB
TigerLogic

Hybrid Relational-XML

DB2 9
Oracle 10g
Microsoft SQL Server

Questions to Ask

i.e. characteristics that distinguish products

Can the DB handle very large single documents?
Can the DB handle very many small-to-medium documents?
Maximum database size
Granularity of locks: per-document or smaller?
XQuery 1.0 support
XQuery Full-text search support
Update support
Schemas: required, optional, or unsupported?
Import formats
And of course cost, freedom, and support

An XML Database should include

XQuery
XQuery Update Facility
XQuery Fulltext Search
Proprietary extensions to fill in the holes
And of course bindings for your favorite language

Berkeley DB XML

Open Source
Storage engine is well-proven Berkeley DB, a non-networked, embeddable key-value database
Now owned by Oracle
XQuery Fulltext Search

Installing

Compile from source

Creating a Database

$ dbxml

dbxml> createContainer website.dbxml
Creating node storage container with nodes indexed

Put Documents Into the Database

dbxml> putDocument pages '/home/elharo/initialdata.xml' f
Document added, name = pages

Run A Query

dbxml> query 'collection("website.dbxml")//*'
220 objects returned for eager expression ' collection("website.dbxml")//*'

dbxml> print
<feed xmlns="http://www.w3.org/2005/Atom" 
      xmlns:thr="http://purl.org/syndication/thread/1.0" 
      xml:lang="en">
  <id>http://www.elharo.com/blog/feed/atom/</id>
  <updated>2007-10-07T15:43:26Z</updated>
  <title type="text">Mokka mit Schlag</title>
...

Queries with Namespaces

dbxml> setNamespace atom http://www.w3.org/2005/Atom
Binding atom -> http://www.w3.org/2005/Atom

dbxml> query 'collection("website.dbxml")//atom:id'
11 objects returned for eager expression 'collection("website.dbxml")//atom:id'

dbxml> print
<id xmlns="http://www.w3.org/2005/Atom">http://www.elharo.com/blog/feed/atom/</id>
<id xmlns="http://www.w3.org/2005/Atom">http://www.elharo.com/blog/birding/2007/10/07/brown-lipped-snail/</id>
<id xmlns="http://www.w3.org/2005/Atom">http://www.elharo.com/blog/birding/2007/10/06/common-house-spider/</id>
<id xmlns="http://www.w3.org/2005/Atom">http://www.elharo.com/blog/networks/2007/10/05/were-back/</id>
<id xmlns="http://www.w3.org/2005/Atom">http://www.elharo.com/blog/networks/2007/10/01/shopping-for-isps/</id>
<id xmlns="http://www.w3.org/2005/Atom">http://www.elharo.com/blog/birding/2007/09/29/autumn-meadowhawks/</id>
<id xmlns="http://www.w3.org/2005/Atom">http://www.elharo.com/blog/new-york/2007/09/28/new-york-skyline/</id>
<id xmlns="http://www.w3.org/2005/Atom">http://www.elharo.com/blog/birding/2007/09/28/connecticut-warbler/</id>
<id xmlns="http://www.w3.org/2005/Atom">http://www.elharo.com/blog/science/astronomy/2007/09/28/adaptive-optics-on-the-cheap/</id>
<id xmlns="http://www.w3.org/2005/Atom">http://www.elharo.com/blog/birding/2007/09/28/390-connecticut-warbler-at-metrotech/</id>
<id xmlns="http://www.w3.org/2005/Atom">http://www.elharo.com/blog/software-development/web-development/2007/09/26/amazon-breaks-their-site/</id>

Run A Query Against a Named Document in the database

dbxml> query 'doc("website.dbxml/pages")//*'
220 objects returned for eager expression 'doc("website.dbxml/pages")//*'

dbxml> print
<feed xmlns="http://www.w3.org/2005/Atom" 
    xmlns:thr="http://purl.org/syndication/thread/1.0" 
    xml:lang="en">
  <id>http://www.elharo.com/blog/feed/atom/</id>
  <updated>2007-10-07T15:43:26Z</updated>
  <title type="text">Mokka mit Schlag</title>
...

XQuery

Three parts:

A data model for XML documents based on the XML Infoset and the W3C XML Schema Language Post Schema Validation Infoset (PSVI)
A mathematically precise query algebra; that is, a set of query operators on that data model
A query language based on these query operators and this algebra

XQuery Language

A declarative query language in the spirit of SQL, not a procedural language like Java. Like XSLT it is expression-based and functional: a query is an expression that returns a value, not a series of statements.
Queries operate on single documents or fixed collections of documents.
Queries select whole documents or subtrees of documents that match conditions defined on document content and structure
Can construct new documents based on what is selected
No updates, inserts, or deletes in XQuery 1.0 itself! Those are left to the separate XQuery Update Facility.

Documents to Query

Narrative documents and collections of such documents; e.g. generate a table of contents for a book
Record-like documents; e.g. SQL-like queries of an XML dump of a database
Streams: process logs of email messages, network packets, stock market data, newswire feeds, EDI, or weather data in order to filter and route messages represented in XML, extract data from XML streams, or transform data in XML streams.
XML views of non-XML data

Physical Representations to Query

Files on a disk
Native-XML databases like Mark Logic Content Server
Fields in hybrid databases like IBM DB2 9
DOM trees in memory
Streaming data
Other representations of the infoset

Where is XQuery used?

Command line query tools
GUI query tools
JSP, ASP, PHP, and other such server side technologies
Programs written in Java, C++, and other languages that need to extract data from XML documents
Others are possible
Anywhere SQL is used to extract data from a database, XQuery is used to extract data from an XML document.
SQL is a non-compiled language that must be processed by some other tool to extract data from a database. So is XQuery.

The XML Model vs. the Relational Model

Relational model	XML model
A relational database contains tables	An XML database contains collections
A relational table contains records with the same schema	A collection contains XML documents with the same DTD
A relational record is an unordered list of named values	An XML document is a tree of nodes
A SQL query returns an unordered set of records	An XQuery returns an ordered sequence of nodes

Query Data Types

XML 1.0 #PCDATA
Schema simple types: positiveInteger, string, float, double, unsignedLong, gYear, date, time, boolean, etc.
Schema complex types
Collections of these types
References to these types

Picture taken from XQuery 1.0 and XPath 2.0 Functions and Operators W3C Working Draft 3 November 2005

An example document to query

Most of the examples in this talk query this bibliography document at the (relative) URL bib.xml:

<?xml version="1.0"?>
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>

  <book year="1992">
    <title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>

  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>

  <book year="1999">
    <title>The Economics of Technology and Content for Digital TV</title>
    <editor>
      <last>Gerbarg</last><first>Darcy</first>
      <affiliation>CITI</affiliation>
    </editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>

</bib>

Adapted from Mary Fernandez, Jerome Simeon, and Phil Wadler: XML Query Languages: Experiences and Exemplars, 1999, as adapted in XML Query Use Cases

The XQuery FLWOR

for: each item in an XPath 2.0 sequence
let: a new variable have a specified value
where: a condition expressed in XPath is true
order by: the value of an XPath expression
return: a sequence of items
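
For readers more at home in Python than XQuery, the five clauses can be read as an ordinary loop pipeline. The following is only a rough analogy, not a description of how an XQuery engine evaluates a FLWOR. It uses the standard library's `xml.etree.ElementTree` on an abbreviated inline copy of bib.xml; the function name `cheap_titles` and the price limit are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated inline copy of the bib.xml sample (assumption: only the
# fields this sketch needs are kept).
BIB = """
<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book year="1992"><title>Advanced Programming in the Unix Environment</title><price>65.95</price></book>
  <book year="2000"><title>Data on the Web</title><price>39.95</price></book>
</bib>
"""

def cheap_titles(xml_text, limit):
    """Hypothetical helper: titles of books cheaper than limit, sorted."""
    root = ET.fromstring(xml_text)
    rows = []
    for book in root.findall("book"):            # for:   each item in a sequence
        price = float(book.findtext("price"))    # let:   bind a variable to a value
        if price < limit:                        # where: keep items meeting a condition
            rows.append(book.findtext("title"))  # return: what each kept item contributes
    rows.sort()                                  # order by: here the title itself
    return rows

titles = cheap_titles(BIB, 100.0)
print(titles)
```

The one real difference in ordering is that XQuery sorts the bound tuples before evaluating return, whereas this sketch sorts the returned strings; for a sort on the title itself the effect is the same.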

Query: List titles of all books in one document

   for $t in doc("bib.xml")/bib/book/title
   return
      $t

Adapted from XML Query Use Cases

Query Result: Book Titles

% java -classpath saxon8.jar net.sf.saxon.Query query1
<?xml version="1.0" encoding="UTF-8"?>
<title>TCP/IP Illustrated</title>
<title>Advanced Programming in the Unix Environment</title>
<title>Data on the Web</title>
<title>The Economics of Technology and Content for Digital TV</title>

Serialization Format

<?xml version="1.0"?>
<title>TCP/IP Illustrated</title><title>Advanced Programming in the Unix Environment</title><title>Data on the Web</title><title>The Economics of Technology and Content for Digital TV</title>

Algorithm for converting sequence to a document fragment:
1. Convert each atomic value to a string.
2. Concatenate adjacent strings after separating them with a single space.
3. Change the strings to text nodes
4. Replace any document node in the sequence with its children.
5. It is a serialization error if the sequence contains an attribute node or a namespace node at this point.
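
The five steps above can be sketched in Python. This is a toy model under stated assumptions, not a serializer: the `Node` class, the `normalize` function, and `SerializationError` are hypothetical names, and Python's `str()` merely stands in for XQuery's cast to string.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Toy stand-in for a data model node (hypothetical, for illustration)."""
    kind: str                 # "document", "element", "text", "attribute", ...
    value: str = ""
    children: list = field(default_factory=list)

class SerializationError(Exception):
    pass

def normalize(sequence):
    # Step 1: atomic values (anything that is not a Node) become strings.
    items = [i if isinstance(i, Node) else str(i) for i in sequence]

    # Step 2: join runs of adjacent strings with a single space.
    merged = []
    for item in items:
        if isinstance(item, str) and merged and isinstance(merged[-1], str):
            merged[-1] = merged[-1] + " " + item
        else:
            merged.append(item)

    # Step 3: the remaining strings become text nodes.
    nodes = [Node("text", i) if isinstance(i, str) else i for i in merged]

    # Step 4: a document node is replaced by its children.
    flat = []
    for node in nodes:
        flat.extend(node.children if node.kind == "document" else [node])

    # Step 5: attribute and namespace nodes cannot be serialized at this point.
    for node in flat:
        if node.kind in ("attribute", "namespace"):
            raise SerializationError(f"cannot serialize a top-level {node.kind} node")
    return flat

result = normalize([1, 2, Node("element", "title"), 3])
print([(n.kind, n.value) for n in result])
```

Note how the two adjacent numbers end up in one text node separated by a space, while the number after the element gets a text node of its own.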
Output methods:
- xml
- html
- xhtml
- text
Serialization options:
- version
- encoding
- indent
- cdata-section-elements
- omit-xml-declaration
- standalone
- doctype-system
- doctype-public
- undeclare-prefixes
- media-type
- byte-order-mark
- normalization-form
- include-content-type
- escape-uri-attributes
- use-character-maps

XPath 1.0 Data Model

(Adapted from Jeni Tennison)

The first class objects are strings, numbers, booleans, and node-sets (plus result tree fragments for XSLT)
Node-sets contain nodes (which are not first-class objects)
Nodes have various properties, including children - a node-set (the order of the children can be worked out from the nodes' document order)
Seven node types: document, element, attribute, text, namespace, processing instruction, and comment
There are conceptually two kinds of node-sets:
- Node-sets containing new nodes (result tree fragments) can only be generated using XSLT
- Node-sets containing existing nodes can only be generated using XPath
No list data types: there are node-sets, but no sets of numbers, strings, or booleans
Not Infoset compatible

XPath 2.0 Data Model

(Adapted from Jeni Tennison)

The first class object type is a sequence; that is, an ordered list
Sequences contain items of two types: simple typed values or nodes. (They may not contain other sequences.)
Simple typed values have W3C XML Schema Language simple types: xs:gYear, xs:int, xs:decimal, xs:date, etc.
Seven node types: document, element, attribute, text, namespace, processing instruction, and comment
Nodes have these properties:
- node-kind: either "document", "element", "attribute", "text", "namespace", "processing-instruction", or "comment".
- node-name: a sequence containing one expanded QName if the node has a name (elements, attributes, etc.) or an empty sequence if the node doesn't have a name (comments, text nodes, etc.)
- parent: a sequence containing the unique parent node; the empty sequence is returned for parentless nodes, particularly document and namespace nodes
- base-uri: URI from which this particular node came (possibly adjusted by an xml:base attribute)
- document-uri: URI from which this node's document came
- string-value: same as XPath 1.0
- typed-value: a sequence of simple typed values corresponding to the node (always the empty sequence for anything other than elements and attributes)
- children: A sequence of nodes (empty except for element and document nodes)
- attributes: a sequence of attribute nodes; empty except for element nodes
- namespaces: a sequence of namespace nodes in-scope on the node
- nilled: true if this is a nil element (xsi:nil="true"), false otherwise
- type: a sequence containing 0 or 1 schema component
Infoset compatible

Constructing sequences

Parentheses enclose sequences.
In a literal sequence, the items are separated by commas:
```
(1, 3, 2, 34, 76, -87)
```
The to operator generates a range sequence without explicit listing:
```
(1 to 12)
```
Using constructors:
(xs:date("2003-03-11"), xs:date("2003-03-12"), xs:date("2003-03-13"), xs:date("2003-03-14"), xs:date("2003-03-15"))
Sequences can have mixed types: (xs:date("2002-03-11"), "Hello", 15)
Sequences do not nest; that is, a sequence cannot be a member of a sequence
Sequences are not sets: they are ordered and can contain duplicates

Sequence Math

A sequence containing one item is the same as the item, and vice versa
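
These rules are easy to model. The sketch below uses Python tuples as stand-ins for XQuery sequences; `seq` and `same` are hypothetical helper names, and the model ignores typing entirely.

```python
def seq(*items):
    """Build a flat tuple the way XQuery's comma operator builds a sequence."""
    flat = []
    for item in items:
        if isinstance(item, tuple):   # a nested sequence is spliced in, not nested
            flat.extend(seq(*item))
        else:
            flat.append(item)
    return tuple(flat)

def same(a, b):
    """Compare two values, treating a bare item as a one-item sequence."""
    return seq(a) == seq(b)

print(seq(1, (2, 3), (), 4))      # nesting disappears and the empty sequence vanishes
print(same(5, (5,)))              # an item and the sequence containing it are the same
print(seq(1, 1, 2) == seq(1, 2))  # duplicates are kept, so these differ
print(seq(2, 1) == seq(1, 2))     # order matters, so these differ too
```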

Sequence example

for $a in (1 to 10)
return $a

Output (ten atomic values, serialized with a single space between them):

1 2 3 4 5 6 7 8 9 10

Data types and the PSVI

All data is typed according to XML Schema Part 2: Datatypes.
A schema specifies the types
If no schema is available, elements are typed xs:untyped and attributes xs:untypedAtomic (xdt:untyped and xdt:untypedAtomic in drafts before the 2007 Recommendation)
Operators and functions are type-aware; e.g. can't add a string to a double or compare an integer to a year.
Constructors and casts are available to convert data to appropriate types
Automatic casting is sometimes performed on untyped data, but can fail

Element Constructors

Tags are given as literals
An XQuery expression enclosed in curly braces is evaluated to become the contents of the element
The contents can also contain literal text outside the braces

List titles of all books in a bib element. Put each title in a book element.

<bib>
  {
   for $title in doc("bib.xml")/bib/book/title
   return
    <book>
     { $title }
    </book>
  }
</bib>

Adapted from XML Query Use Cases

Query Result: Book Titles

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <book>
      <title>TCP/IP Illustrated</title>
   </book>
   <book>
      <title>Advanced Programming in the Unix Environment</title>
   </book>
   <book>
      <title>Data on the Web</title>
   </book>
   <book>
      <title>The Economics of Technology and Content for Digital TV</title>
   </book>
</bib>

Attribute Constructors

List books including their year and title:

<bib>
 {
   for $book in doc("bib.xml")/bib/book
   return
    <book year ="{ $book/@year }">
     { $book/title }
    </book>
 }
</bib>

Adapted from XML Query Use Cases

Query Result

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <book year="1994">
      <title>TCP/IP Illustrated</title>
   </book>
   <book year="1992">
      <title>Advanced Programming in the Unix Environment</title>
   </book>
   <book year="2000">
      <title>Data on the Web</title>
   </book>
   <book year="1999">
      <title>The Economics of Technology and Content for Digital TV</title>
   </book>
</bib>

Text Constructors

Literal text is allowed outside the XQuery expressions, just like literal tags:

<bib>
  <h1>Bibliography</h1>
 {
   for $book in doc("bib.xml")/bib/book
   return
      $book/title 
 }
</bib>

Query Result

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <h1>Bibliography</h1>
   <title>TCP/IP Illustrated</title>
   <title>Advanced Programming in the Unix Environment</title>
   <title>Data on the Web</title>
   <title>The Economics of Technology and Content for Digital TV</title>
</bib>

Other Constructors

Literal comments and processing instructions are also allowed:

<?xml-stylesheet type="text/css" href="bibliography.css"?>,
<bib>
  <h1>Bibliography</h1>
 {
   for $b in doc("bib.xml")/bib/book
   return
      $b/title 
 }
</bib>,
<!-- An example from Elliotte Rusty Harold's 
  Native XML Databases presentation -->

Notice the commas
Remember, an XQuery is not an XML document.

Expected Query Result

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="bibliography.css"?>
<bib>
   <h1>Bibliography</h1>
   <title>TCP/IP Illustrated</title>
   <title>Advanced Programming in the Unix Environment</title>
   <title>Data on the Web</title>
   <title>The Economics of Technology and Content for Digital TV</title>
</bib>
<!-- An example from Elliotte Rusty Harold's 
  Native XML Databases presentation -->

Query with where

List titles of books published by Addison-Wesley

<bib>
 {
   for $book in doc("bib.xml")/bib/book
   where $book/publisher = "Addison-Wesley"
   return
      $book/title 
  }
</bib>

This where clause could be replaced by an XPath predicate:

<bib>
 {
   for $book in doc("bib.xml")/bib/book[publisher="Addison-Wesley"]
   return
      $book/title 
 }
</bib>

But where clauses can combine multiple variables from multiple documents

Adapted from XML Query Use Cases

Query Result: Titles of books published by Addison-Wesley

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <title>TCP/IP Illustrated</title>
   <title>Advanced Programming in the Unix Environment</title>
</bib>

Adapted from XML Query Use Cases

Query with Booleans

XQuery boolean operators and functions include:
- and
- or
- not()

List books published by Addison-Wesley before 1993:

<bib>
 {
   for $book in doc("bib.xml")/bib/book
   where $book/publisher = "Addison-Wesley" and $book/@year < 1993
   return
      $book/title 
 }
</bib>

Do you notice anything funny about this query?

Adapted from XML Query Use Cases

Query Result: books published by Addison-Wesley before 1993

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <title>Advanced Programming in the Unix Environment</title>
</bib>

Adapted from XML Query Use Cases

Query with multiple variables

Create a list of all the title-author pairs, with each pair enclosed in a result element.

<results>
 {
   for $book in doc("bib.xml")/bib/book,
     $title in $book/title,
     $author in $book/author
   return
    <result>
    { $title }
    { $author }
    </result>
  }
</results>

Adapted from XML Query Use Cases

Query Result: A list of all the title-author pairs

<?xml version="1.0" encoding="UTF-8"?>
<results>
   <result>
      <title>TCP/IP Illustrated</title>
      <author>
         <last>Stevens</last>
         <first>W.</first>
      </author>
   </result>
   <result>
      <title>Advanced Programming in the Unix Environment</title>
      <author>
         <last>Stevens</last>
         <first>W.</first>
      </author>
   </result>
   <result>
      <title>Data on the Web</title>
      <author>
         <last>Abiteboul</last>
         <first>Serge</first>
      </author>
   </result>
   <result>
      <title>Data on the Web</title>
      <author>
         <last>Buneman</last>
         <first>Peter</first>
      </author>
   </result>
   <result>
      <title>Data on the Web</title>
      <author>
         <last>Suciu</last>
         <first>Dan</first>
      </author>
   </result>
</results>

Adapted from XML Query Use Cases

Nested FLWOR Queries

For each book in the bibliography, list the title and authors, grouped inside a result element.

<results>
 {
   for $b in doc("bib.xml")/bib/book
     return
      <result>
       { $b/title }
       {  
         for $a in $b/author
         return $a
       }
      </result>
 }
</results>

Adapted from XML Query Use Cases

Query Result: A list of the title and authors of each book in the bibliography

<?xml version="1.0" encoding="UTF-8"?>
<results>
   <result>
      <title>TCP/IP Illustrated</title>
      <author>
         <last>Stevens</last>
         <first>W.</first>
      </author>
   </result>
   <result>
      <title>Advanced Programming in the Unix Environment</title>
      <author>
         <last>Stevens</last>
         <first>W.</first>
      </author>
   </result>
   <result>
      <title>Data on the Web</title>
      <author>
         <last>Abiteboul</last>
         <first>Serge</first>
      </author>
      <author>
         <last>Buneman</last>
         <first>Peter</first>
      </author>
      <author>
         <last>Suciu</last>
         <first>Dan</first>
      </author>
   </result>
   <result>
      <title>The Economics of Technology and Content for Digital TV</title>
   </result>
</results>

Adapted from XML Query Use Cases
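
The difference between a single for clause with several variables and a nested FLWOR is the difference between pairing and grouping. A sketch in Python, using the standard library's `xml.etree.ElementTree` on an abbreviated inline copy of bib.xml (the variable names are illustrative only):

```python
import xml.etree.ElementTree as ET

# Abbreviated inline copy of bib.xml: one single-author book, one book
# with three authors, and one book with an editor but no author.
BIB = """<bib>
  <book><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last></author></book>
  <book><title>Data on the Web</title>
    <author><last>Abiteboul</last></author>
    <author><last>Buneman</last></author>
    <author><last>Suciu</last></author></book>
  <book><title>The Economics of Technology and Content for Digital TV</title>
    <editor><last>Gerbarg</last></editor></book>
</bib>"""

root = ET.fromstring(BIB)

# Several variables in one for clause: one result per (title, author) pair,
# so a book with no author contributes nothing.
pairs = [
    (title.text, author.findtext("last"))
    for book in root.findall("book")
    for title in book.findall("title")
    for author in book.findall("author")
]

# Nested FLWOR: one result per book with its authors grouped inside,
# so the authorless book still appears, with an empty author list.
grouped = [
    (book.findtext("title"), [a.findtext("last") for a in book.findall("author")])
    for book in root.findall("book")
]

print(len(pairs), len(grouped))
```

This is why the edited volume is missing from the title-author pairs but present, with no authors, in the nested result.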

Query with let

let assigns a variable for reuse.

For each book in the bibliography, list the difference between the book's price and the average price:

<results> 
  {
   let $doc := doc("bib.xml")
   let $average := avg($doc//price)
   for $b in $doc/bib/book
     let $difference := $b/price - $average
     return
       <data>{ $b/title } is {$difference} more expensive than the average. </data>
  }    
</results>

:= like Pascal, not = like C and Java

Query Result: price differences

<?xml version="1.0" encoding="UTF-8"?>
<results>
   <data>
      <title>TCP/IP Illustrated</title> is -9.5 more expensive than the average. </data>
   <data>
      <title>Advanced Programming in the Unix Environment</title> is -9.5 more expensive than the average. </data>
   <data>
      <title>Data on the Web</title> is -35.5 more expensive than the average. </data>
   <data>
      <title>The Economics of Technology and Content for Digital TV</title> is 54.499999999999986 more expensive than the average. </data>
</results>
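
The 54.499999999999986 in this result is ordinary binary floating-point rounding: with no schema the prices are untyped, and untyped values are treated as xs:double in arithmetic. The arithmetic can be checked outside XQuery. The Python sketch below repeats it once with floats and once with exact decimals; the second run corresponds to casting each price to xs:decimal before subtracting, which is one possible remedy rather than anything the query above does.

```python
from decimal import Decimal

prices = ["65.95", "65.95", "39.95", "129.95"]   # price values from bib.xml

# Untyped values behave like xs:double, i.e. binary floating point.
as_double = [float(p) for p in prices]
avg_double = sum(as_double) / len(as_double)
diff_double = [p - avg_double for p in as_double]

# Converting to a decimal type first keeps the arithmetic exact.
as_decimal = [Decimal(p) for p in prices]
avg_decimal = sum(as_decimal) / len(as_decimal)
diff_decimal = [p - avg_decimal for p in as_decimal]

print(diff_double)    # close to -9.5, -9.5, -35.5, 54.5, possibly with rounding noise
print(diff_decimal)   # exactly -9.5, -9.5, -35.5, 54.5
```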

if then else

For each book in the bibliography, list the difference between the book's price and the average price, but this time indicate whether the book is more or less expensive than the average

<results> 
  {
   let $doc := doc("bib.xml")
   let $average := avg($doc//price)
   for $b in $doc/bib/book
     return
       if ($b/price > $average) then
         <data>
           { $b/title } is ${$b/price - $average} 
           more expensive than the average.
         </data>
       else  
         <data>
           { $b/title } is ${$average - $b/price} 
           less expensive than the average.
         </data>
  }    
</results>

Query Result: Price differences

<?xml version="1.0" encoding="UTF-8"?>
<results>
   <data>
      <title>TCP/IP Illustrated</title> is $9.5 less expensive than the average.</data>
   <data>
      <title>Advanced Programming in the Unix Environment</title> is $9.5 less expensive than the average.</data>
   <data>
      <title>Data on the Web</title> is $35.5 less expensive than the average.</data>
   <data>
      <title>The Economics of Technology and Content for Digital TV</title> is $54.499999999999986 more expensive than the average.</data>
</results>

Query with sorting

List the titles and years of all books published by Addison-Wesley after 1991, in alphabetic order.

<bib>
 {
   for $b in doc("bib.xml")//book[publisher = "Addison-Wesley" and @year > 1991]
   order by ($b/title)
   return
    <book>
     { $b/@year } { $b/title }
    </book> 
 }
</bib>

Adapted from XML Query Use Cases

Query Result

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <book year="1992">
      <title>Advanced Programming in the Unix Environment</title>
   </book>
   <book year="1994">
      <title>TCP/IP Illustrated</title>
   </book>
</bib>

Adapted from XML Query Use Cases

order by modifiers

ascending
descending
empty greatest
empty least
collation "name"

<bib>
 {
   for $b in doc("bib.xml")//book[publisher = "Addison-Wesley"]
   order by ($b/title) descending
   return
    <book>
     { $b/@year } { $b/title }
    </book> 
 }
</bib>

Adapted from XML Query Use Cases

Query Result

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <book year="1994">
      <title>TCP/IP Illustrated</title>
   </book>
   <book year="1992">
      <title>Advanced Programming in the Unix Environment</title>
   </book>
</bib>

Adapted from XML Query Use Cases
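
How the modifiers interact is easiest to see in a small model. In the Python sketch below, `order_by` is a hypothetical helper, `None` plays the role of an empty sort key (for example a book with no title), and collations are not modeled at all: strings are simply compared by code point.

```python
def order_by(items, key, descending=False, empty="least"):
    """Sort items by key(item); a key of None stands for an empty sequence."""
    def sort_key(item):
        value = key(item)
        is_empty = value is None
        # "empty least" ranks an empty key below every real value,
        # "empty greatest" ranks it above every real value.
        rank = (0 if empty == "least" else 2) if is_empty else 1
        return (rank, "" if is_empty else value)
    return sorted(items, key=sort_key, reverse=descending)

books = [
    {"title": "TCP/IP Illustrated"},
    {"title": None},
    {"title": "Advanced Programming in the Unix Environment"},
]

ascending = [b["title"] for b in order_by(books, lambda b: b["title"])]
descending = [b["title"] for b in order_by(books, lambda b: b["title"], descending=True)]
empty_last = [b["title"] for b in order_by(books, lambda b: b["title"], empty="greatest")]
print(ascending, descending, empty_last, sep="\n")
```

Because an empty key is treated as a value (the least or the greatest), descending order moves it to the opposite end, just as it does any other value.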

A different document about books

Sample data at "reviews.xml":

<?xml version="1.0"?>
<reviews>
  <entry>
    <title>Data on the Web</title>
    <price>34.95</price>
    <review>
       A very good discussion of semi-structured database
       systems and XML.
    </review>
  </entry>
  <entry>
    <title>Advanced Programming in the Unix Environment</title>
    <price>65.95</price>
    <review>
      A clear and detailed discussion of UNIX programming.
    </review>
  </entry>
  <entry>
    <title>TCP/IP Illustrated</title>
    <price>65.95</price>
    <review>
      One of the best books on TCP/IP.
    </review>
  </entry>
</reviews>

Adapted from XML Query Use Cases

This document uses a different DTD

<!ELEMENT reviews (entry*)>
<!ELEMENT entry   (title, price, review)>
<!ELEMENT title   (#PCDATA)>
<!ELEMENT price   (#PCDATA)>
<!ELEMENT review  (#PCDATA)>

Query that joins two documents

For each book found in both bib.xml and reviews.xml, list the title of the book and its price from each source.

<books-with-prices>
 {
   for $b in doc("bib.xml")//book,
     $a in doc("reviews.xml")//entry
   where $b/title = $a/title
   return
    <book-with-prices>
     { $b/title },
       <price-amazon> { $a/price/text() } </price-amazon>
       <price-bn> { $b/price/text() } </price-bn>
    </book-with-prices>
 }
</books-with-prices>

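Note that the comma after { $b/title } sits outside the braces, so it is literal text and is copied into the result.

Adapted from XML Query Use Cases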

Result

<?xml version="1.0" encoding="UTF-8"?>
<books-with-prices>
   <book-with-prices>
      <title>TCP/IP Illustrated</title>,
       <price-amazon>65.95</price-amazon>
      <price-bn>65.95</price-bn>
   </book-with-prices>
   <book-with-prices>
      <title>Advanced Programming in the Unix Environment</title>,
       <price-amazon>65.95</price-amazon>
      <price-bn>65.95</price-bn>
   </book-with-prices>
   <book-with-prices>
      <title>Data on the Web</title>,
       <price-amazon>34.95</price-amazon>
      <price-bn>39.95</price-bn>
   </book-with-prices>
</books-with-prices>

Adapted from XML Query Use Cases
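
The query is written as a nested loop over both documents with a where clause picking out the matching pairs. For comparison, here is the same join sketched in Python as a lookup table keyed by title (a hash join, which is a substitute technique, not necessarily what any XQuery engine does). It uses abbreviated inline copies of the two documents, assumes titles are unique within reviews.xml, and `join_prices` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>Data on the Web</title><price>39.95</price></book>
  <book><title>The Economics of Technology and Content for Digital TV</title><price>129.95</price></book>
</bib>"""

REVIEWS = """<reviews>
  <entry><title>Data on the Web</title><price>34.95</price></entry>
  <entry><title>TCP/IP Illustrated</title><price>65.95</price></entry>
</reviews>"""

def join_prices(bib_xml, reviews_xml):
    """Return (title, review price, bib price) for titles present in both."""
    bib = ET.fromstring(bib_xml)
    reviews = ET.fromstring(reviews_xml)
    # Index the review entries by title so each book needs one lookup
    # instead of a scan over every entry.
    review_price = {e.findtext("title"): e.findtext("price")
                    for e in reviews.iter("entry")}
    joined = []
    for book in bib.iter("book"):            # for $b in doc("bib.xml")//book
        title = book.findtext("title")
        if title in review_price:            # where $b/title = $a/title
            joined.append((title, review_price[title], book.findtext("price")))
    return joined

rows = join_prices(BIB, REVIEWS)
print(rows)
```

As in the XQuery result, the output follows the order of the outer loop (bib.xml), and the book with no review is dropped.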

prices.xml Query Sample Data

The next query also uses an input document named "prices.xml":

<?xml version="1.0"?>
<prices>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
    <source>www.amazon.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
    <source>www.bn.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>www.amazon.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>www.bn.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>www.amazon.com</source>
    <price>34.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>www.bn.com</source>
    <price>39.95</price>
  </book>
</prices>

Adapted from XML Query Use Cases

Query with reused variables

In the document "prices.xml", find the minimum price for each book, in the form of a minprice element with the book title as its title attribute.

<results>
 {
   let $doc := doc("prices.xml")
   for $t in distinct-values($doc/prices/book/title)
     let $p := $doc/prices/book[title = $t]/price
     return
       <minprice title="{$t}">
         { min($p) }
       </minprice>
 }
</results>

Adapted from XML Query Use Cases

Query Result

<?xml version="1.0" encoding="UTF-8"?>
<results>
   <minprice title="Advanced Programming in the Unix Environment">65.95</minprice>
   <minprice title="TCP/IP Illustrated">65.95</minprice>
   <minprice title="Data on the Web">34.95</minprice>
</results>

Adapted from XML Query Use Cases
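
The same group-and-minimize logic, sketched in Python on an abbreviated inline copy of prices.xml. `min_prices` is a hypothetical name. One difference worth noting: the dictionary keeps titles in first-seen order, whereas the order of distinct-values() results is left to the XQuery implementation.

```python
import xml.etree.ElementTree as ET

PRICES = """<prices>
  <book><title>TCP/IP Illustrated</title><source>www.amazon.com</source><price>65.95</price></book>
  <book><title>TCP/IP Illustrated</title><source>www.bn.com</source><price>65.95</price></book>
  <book><title>Data on the Web</title><source>www.amazon.com</source><price>34.95</price></book>
  <book><title>Data on the Web</title><source>www.bn.com</source><price>39.95</price></book>
</prices>"""

def min_prices(xml_text):
    """Map each distinct title to the lowest price listed for it."""
    root = ET.fromstring(xml_text)
    lowest = {}   # dicts keep insertion order, so titles stay in first-seen order
    for book in root.findall("book"):
        title = book.findtext("title")
        price = float(book.findtext("price"))
        lowest[title] = min(price, lowest.get(title, price))
    return lowest

result = min_prices(PRICES)
print(result)
```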

Multiple FLWOR Queries

For each book with an author, return a book with its title and authors. For each book with an editor, return a reference with the book title and the editor's affiliation.

<bib>
 {
   for $b in doc("bib.xml")//book[author]
   return
    <book>
     { $b/title }
     { $b/author }
    </book>,
   for $b in doc("bib.xml")//book[editor]
   return
    <reference>
     { $b/title }
     <org> { $b/editor/affiliation/text() } </org>
    </reference>
 }
</bib>

Adapted from XML Query Use Cases

Query Result

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <book>
      <title>TCP/IP Illustrated</title>
      <author>
         <last>Stevens</last>
         <first>W.</first>
      </author>
   </book>
   <book>
      <title>Advanced Programming in the Unix Environment</title>
      <author>
         <last>Stevens</last>
         <first>W.</first>
      </author>
   </book>
   <book>
      <title>Data on the Web</title>
      <author>
         <last>Abiteboul</last>
         <first>Serge</first>
      </author>
      <author>
         <last>Buneman</last>
         <first>Peter</first>
      </author>
      <author>
         <last>Suciu</last>
         <first>Dan</first>
      </author>
   </book>
   <reference>
      <title>The Economics of Technology and Content for Digital TV</title>
      <org>CITI</org>
   </reference>
</bib>

Adapted from XML Query Use Cases

Querying documents that use namespaces

Several namespace prefixes are predeclared:
- xml = http://www.w3.org/XML/1998/namespace
- xs = http://www.w3.org/2001/XMLSchema
- xsi = http://www.w3.org/2001/XMLSchema-instance
- fn = http://www.w3.org/2005/xpath-functions
- local = http://www.w3.org/2005/xquery-local-functions

Customary namespace declarations can be used in element constructors, much as in XSLT. For example,

<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:html="http://www.w3.org/1999/xhtml">
  <head>
    {
    let $title := doc('http://www.cafeconleche.org/')//html:title
    return $title
    }
  </head>

</html>

Alternatively, you can declare the namespace in the query's prolog, like so:

declare namespace html= "http://www.w3.org/1999/xhtml";
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    {
    let $title := doc('http://www.cafeconleche.org/')//html:title
    return $title
    }
  </head>

</html>

The usual rule applies: the nearest conflicting namespace declaration takes precedence

Output:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <title>Cafe con Leche XML News and Resources</title>
   </head>
</html>
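
The prefix is only a local alias for the namespace URI. A minimal sketch of the same idea in Python's standard ElementTree module, assuming a made-up inline page in place of the document fetched by doc():

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the XHTML page the query loads with doc().
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Cafe con Leche XML News and Resources</title></head>
  <body/>
</html>"""

XHTML = "http://www.w3.org/1999/xhtml"
root = ET.fromstring(PAGE)

# The prefix-to-URI map plays the role of "declare namespace html = ...".
# Any prefix works as long as it is bound to the same URI.
title_a = root.find(".//html:title", {"html": XHTML})
title_b = root.find(".//x:title", {"x": XHTML})

# An unqualified name only matches elements in no namespace, so this finds nothing.
title_c = root.find(".//title")

print(title_a.text)
```

Both prefixed lookups reach the same element; the unprefixed one returns nothing because the title element lives in the XHTML namespace.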

Query Software

Michael Kay's Saxon 8: http://saxon.sourceforge.net/
XQuisitor: http://www.cafeconleche.org/xquisitor/
eXist
Berkeley dbXML
Bell Labs' Galax: http://www.galaxquery.org/
Xavier Franc's Qizx/open: http://www.axyana.com/qizxopen, an open source implementation of XQuery, written in Java. It conforms to XQuery Basic with Static Type Checking.
Mark Logic Server: http://www.cerisent.com/products/ml_server.html
Software AG's Tamino: http://www.softwareag.com/tamino/
Ipedo: http://www.ipedo.com/
Cognetic Systems's XQuantum: http://www.cogneticsystems.com/xquery/xquery.html
Fatdog's XQEngine: http://xqengine.sourceforge.net/
GAEL's Derby: http://www.gael.fr/derby/
Qexo (Kawa-Query): http://www.qexo.org/ Compiles XQuery on-the-fly to Java bytecodes. Based on and part of the Kawa framework. Open-source.
IPSI's IPSI-XQ: http://ipsi.fhg.de/oasys/projects/ipsi-xq/index_e.html
Microsoft's XML Query Language Demo: http://xqueryservices.com
Oracle Database 10g Release 2: http://www.oracle.com/technology/tech/xml/xquery/index.html
QuiLogic's SQL/XML-IMDB: http://www.quilogic.cc/xml.htm
Chris Wilper's XQuench: http://xquench.sourceforge.net/ Open-source.
X-Hive's XQuery demo: http://www.x-hive.com/xquery

What's the difference between XQuery and XSLT?

XSLT is document-driven; XQuery is program-driven
XSLT is functional; XQuery is declarative
XSLT is written in XML; XQuery is not
An assertion (unproven): XSLT 2.0 can do everything XQuery can do

XPath 2.0

Used by XSLT 2.0 and XQuery
Schema Aware

XPath 2.0 Goals

Simplify manipulation of XML Schema-typed content
Simplify manipulation of string content
Support related XML standards
Improve ease of use
Improve interoperability
Improve internationalization (i18n) support
Maintain backward compatibility
Enable improved processor efficiency

Held over from XPath 1.0

Basic syntax
Location paths and location steps
Axes: parent, child, ancestor, ancestor-or-self, self, descendant, descendant-or-self, following, following-sibling, preceding, preceding-sibling, attribute, namespace
However, support for the ancestor, ancestor-or-self, following, following-sibling, preceding, and preceding-sibling axes is optional in XQuery. XQuery implementations cannot use the namespace axis.
Node tests
Predicates
Abbreviated syntax: *, @*, //, etc.
Operators: +, -, div, mod, *, etc.
All functions, but they are now more strongly typed and don't always behave exactly the same
Basic expression syntax, though this has been greatly expanded
Most XPath 1.0 expressions are still legal XPath 2.0 expressions that mean pretty much the same thing.

Accessor Functions

fn:node-name(Node): returns zero or one QName
fn:string(Object): returns the string value of anything
fn:data(Node): returns a sequence of zero or more typed simple values
fn:base-uri(node): returns the base URI of an Element or Document node
fn:document-uri(node): returns the document URI of an Element or Document node

Constructor Functions

Create a simple type from a string
Constructors are in the http://www.w3.org/2001/XMLSchema namespace, which is "understood" to be mapped to the xs prefix
Numeric constructors:
- xs:decimal(string $srcval) => decimal
- xs:integer(string $srcval) => integer
- xs:float(string $srcval) => float
- xs:double(string $srcval) => double
- xs:nonPositiveInteger($arg as xdt:anyAtomicType) => xs:nonPositiveInteger
- xs:negativeInteger($arg as xdt:anyAtomicType) => xs:negativeInteger
- xs:long($arg as xdt:anyAtomicType) => xs:long
- xs:int($arg as xdt:anyAtomicType) => xs:int
- xs:short($arg as xdt:anyAtomicType) => xs:short
- xs:byte($arg as xdt:anyAtomicType) => xs:byte
- xs:nonNegativeInteger($arg as xdt:anyAtomicType) => xs:nonNegativeInteger
- xs:unsignedLong($arg as xdt:anyAtomicType) => xs:unsignedLong
- xs:unsignedInt($arg as xdt:anyAtomicType) => xs:unsignedInt
- xs:unsignedShort($arg as xdt:anyAtomicType) => xs:unsignedShort
- xs:unsignedByte($arg as xdt:anyAtomicType) => xs:unsignedByte
- xs:positiveInteger($arg as xdt:anyAtomicType) => xs:positiveInteger
String constructors
- xs:string(string $srcval) => string
- xs:normalizedString(string $srcval) => normalizedString
- xs:token(string $srcval) => token
- xs:language(string $srcval) => language
- xs:Name(string $srcval) => Name
- xs:NMTOKEN(string $srcval) => NMTOKEN
- xs:NCName(string $srcval) => NCName
- xs:ID(string $srcval) => ID
- xs:IDREF(string $srcval) => IDREF
- xs:ENTITY(string $srcval) => ENTITY
- xs:QName(string $srcval) => QName
Boolean constructors:
- xs:boolean(string $srcval) => boolean
Duration and Datetime constructors:
- xs:duration(string $srcval) => duration
- xs:dateTime(string $srcval) => dateTime
- xs:date(string $srcval) => date
- xs:time(string $srcval) => time
- xs:gYearMonth(string $srcval) => gYearMonth
- xs:gYear(string $srcval) => gYear
- xs:gMonthDay(string $srcval) => gMonthDay
- xs:gMonth(string $srcval) => gMonth
- xs:gDay(string $srcval) => gDay
- xdt:yearMonthDuration($arg as xdt:anyAtomicType) => xdt:yearMonthDuration
- xdt:dayTimeDuration($arg as xdt:anyAtomicType) => xdt:dayTimeDuration
Constructor for anyURI:
- xs:anyURI(string $srcval) => anyURI
Constructors for NOTATION:
- xs:NOTATION(string $srcval) => NOTATION
Binary types:
- xs:hexBinary($arg as xdt:anyAtomicType) => xs:hexBinary
- xs:base64Binary($arg as xdt:anyAtomicType) => xs:base64Binary
Untyped type:
- xdt:untypedAtomic($arg as xdt:anyAtomicType) => xdt:untypedAtomic

Casting

instance of tests the type of an item against the QName of a type
Casting changes the type of an item
Not all casts are legal, but mostly it works like you'd expect.
castable returns true if the cast is possible, false otherwise
item cast as type returns a new value of the target type; the original item is unchanged
item treat as type does not convert anything; it asserts the item's type for this expression only and raises an error if the assertion fails

if ($x castable as xs:gYear) then
  $x cast as xs:gYear
else if ($x castable as xs:integer) then
  $x cast as xs:integer
else if ($x castable as xs:decimal) then
  $x cast as xs:decimal
else
  $x cast as xs:string
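
The same try-each-type-in-order logic can be sketched in Python. This assumes $x is a plain string and uses simplified lexical patterns (hypothetical names G_YEAR, INTEGER, DECIMAL) rather than the full XML Schema rules:

```python
import re
from decimal import Decimal

# Simplified lexical checks; the real XML Schema rules also allow
# time zones on gYear and surrounding whitespace.
G_YEAR = re.compile(r"-?[0-9]{4,}")
INTEGER = re.compile(r"[+-]?[0-9]+")
DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def cast_first_match(text):
    """Return (type name, value) for the first type the string is castable to."""
    if G_YEAR.fullmatch(text):          # $x castable as xs:gYear
        return ("xs:gYear", text)
    if INTEGER.fullmatch(text):         # $x castable as xs:integer
        return ("xs:integer", int(text))
    if DECIMAL.fullmatch(text):         # $x castable as xs:decimal
        return ("xs:decimal", Decimal(text))
    return ("xs:string", text)          # fallback: keep it as a string

print(cast_first_match("2008"))
```

Order matters: "2008" is also a legal integer, but the gYear test comes first and wins.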

Four kinds of comparison operators

Value comparisons: compare a single value to a single value of a comparable type for equality
General comparisons: compare a sequence to a sequence for equality of at least one pair of members
Node comparisons: test for node identity
Order comparisons: compare document order

Value comparison operators

Compare single values and sequences of single or no values:
- eq
- ne
- lt
- le
- gt
- ge
These operators return either true, false, the empty sequence, an error, or a type exception.
Types must be comparable (No automatic conversion from strings as in XPath 1.0!):
1. Subtype substitution: A derived type may substitute for its base type. In particular, integer may be used where decimal is expected.
2. Type promotion: decimal may be promoted to float, and float may be promoted to double.
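
A rough Python model of eq, assuming Python lists stand in for XPath sequences and None stands in for the empty sequence. It only distinguishes strings from numbers, so it is a sketch of the rules above, not the full type system:

```python
def value_eq(left, right):
    """Model of 'eq' on Python lists standing in for XPath sequences."""
    if len(left) > 1 or len(right) > 1:
        raise TypeError("value comparison needs at most one item per operand")
    if not left or not right:
        return None                      # None stands in for the empty sequence
    a, b = left[0], right[0]
    # No implicit string-to-number conversion: mismatched kinds are an error.
    if isinstance(a, str) != isinstance(b, str):
        raise TypeError("operands are not comparable")
    return a == b                        # an int and a float compare after promotion

print(value_eq([1], [1.0]))
```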

General comparisons

Compare one sequence to another sequence
True if the condition is true for at least one pair of items from the two sequences
- =
- !=
- <
- <=
- >
- >=
These operators always return either true or false.
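
The existential rule has a surprising consequence: = and != can both be true for the same operands. A minimal Python sketch, with lists standing in for sequences:

```python
from itertools import product

def general_eq(left, right):
    # "=" is true if any pair of items, one from each side, is equal.
    return any(a == b for a, b in product(left, right))

def general_ne(left, right):
    # "!=" is true if any pair differs, so it is not the negation of "=".
    return any(a != b for a, b in product(left, right))

print(general_eq([1, 2], [2, 3]), general_ne([1, 2], [1, 2]))
```

An empty sequence on either side makes every general comparison false, because there are no pairs to test.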

Node comparisons

is
Only used on single nodes and empty sequences; otherwise a type error is raised.
Test for node identity like Java's == operator, not the equals() method
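
The identity-versus-equality distinction can be seen in Python's ElementTree as well. In this sketch the two item elements are an invented example, and comparing serialized bytes is only a rough stand-in for deep equality:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<list><item>x</item><item>x</item></list>")
first, second = root.findall("item")

same_node = first is root.find("item")       # identity: the very same node
different_nodes = first is second            # equal content, distinct nodes
same_content = ET.tostring(first) == ET.tostring(second)  # rough deep equality

print(same_node, different_nodes, same_content)
```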

Order comparisons

>> and << compare single nodes for document order
The << operator returns true if the first operand node precedes the second operand node in document order; otherwise it returns false.
The >> operator returns true if the first operand node follows the second operand node in document order; otherwise it returns false.
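
A sketch of document-order comparison in Python, assuming a tiny invented tree. ElementTree's iter() visits elements in document order, so a node's position in that walk is enough to model << and >>:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b><c/></b><d/></a>")

# iter() walks the tree in document order, so list position gives the order.
order = {id(node): i for i, node in enumerate(root.iter())}

def precedes(left, right):
    """Model of 'left << right'."""
    return order[id(left)] < order[id(right)]

def follows(left, right):
    """Model of 'left >> right'."""
    return order[id(left)] > order[id(right)]

b, c, d = root.find("b"), root.find("b/c"), root.find("d")
print(precedes(b, d), follows(d, c))
```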

Functions and operators

Functions are identified by the fn: prefix
The fn: prefix is understood in XQuery without being explicitly declared.
Operators are indicated by the op: prefix
Host languages such as XQuery and XSLT map the operators to symbols like * and +

Arithmetic operators

+: op:numeric-add(numeric $operand1, numeric $operand2) => numeric
-: op:numeric-subtract(numeric $operand1, numeric $operand2) => numeric
*: op:numeric-multiply(numeric $operand1, numeric $operand2) => numeric
div: op:numeric-divide(numeric $operand1, numeric $operand2) => numeric
idiv: op:numeric-integer-divide(integer $operand1, integer $operand2) => integer
mod: op:numeric-mod(numeric $operand1, numeric $operand2) => numeric
+: op:numeric-unary-plus(numeric $operand) => numeric
-: op:numeric-unary-minus(numeric $operand) => numeric
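
idiv and mod do not behave like Python's // and % for negative operands: idiv truncates toward zero, and the result of mod takes the sign of the dividend. A minimal sketch for integers only (the function names are illustrative):

```python
def xpath_idiv(a, b):
    """Integer division that truncates toward zero, as idiv does."""
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient

def xpath_mod(a, b):
    """Remainder whose sign follows the dividend, as mod does."""
    return a - b * xpath_idiv(a, b)

print(xpath_idiv(-3, 2), -3 // 2)   # truncation versus Python's flooring
print(xpath_mod(-10, 3), -10 % 3)
```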

Numeric Functions

fn:abs(double? $srcval) => double?
fn:floor(double? $srcval) => integer?
fn:ceiling(double? $srcval) => integer?
fn:round(double? $srcval) => integer?
fn:round-half-to-even(double? $srcval) => integer?
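
The two rounding functions differ only on ties. A sketch using Python's decimal module, assuming values are passed as decimal strings to avoid binary floating-point surprises:

```python
from decimal import Decimal, ROUND_FLOOR, ROUND_HALF_EVEN

def xpath_round(value):
    """fn:round: nearest integer, ties go toward positive infinity."""
    shifted = Decimal(value) + Decimal("0.5")
    return int(shifted.to_integral_value(rounding=ROUND_FLOOR))

def xpath_round_half_to_even(value):
    """fn:round-half-to-even: nearest integer, ties go to the even neighbor."""
    return int(Decimal(value).to_integral_value(rounding=ROUND_HALF_EVEN))

print(xpath_round("2.5"), xpath_round("-2.5"))
print(xpath_round_half_to_even("2.5"), xpath_round_half_to_even("1.5"))
```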

String functions

fn:codepoints-to-string( $arg as xs:integer*) => xs:string
fn:string-to-codepoints( $arg as xs:string?) => xs:integer*
fn:concat() => string
fn:concat(string? $op1) => string
fn:concat(string? $op1, string? $op2, ...) => string
fn:string-join(string* $operand1, string $separator) => string
fn:starts-with(string? $operand1, string? $operand2) => boolean?
fn:starts-with(string? $operand1, string? $operand2, anyURI $collationLiteral) => boolean?
fn:ends-with(string? $operand1, string? $operand2) => boolean?
fn:ends-with(string? $operand1, string? $operand2, anyURI $collationLiteral) => boolean?
fn:contains(string? $operand1, string? $operand2) => boolean?
fn:contains(string? $operand1, string? $operand2, anyURI $collationLiteral) => boolean?
fn:substring(string? $sourceString, decimal? $startingLoc) => string?
fn:substring(string? $sourceString, decimal? $startingLoc, decimal? $length) => string?
fn:string-length(string? $srcval) => integer?
fn:substring-before(string? $operand1, string? $operand2) => string?
fn:substring-before(string? $operand1, string? $operand2, anyURI $collationLiteral) => string?
fn:substring-after(string? $operand1, string? $operand2) => string?
fn:substring-after(string? $operand1, string? $operand2, anyURI $collationLiteral) => string?
fn:normalize-space(string? $srcval) => string?
fn:normalize-unicode(string? $srcval, string $normalizationForm) => string?
fn:upper-case(string? $srcval) => string?
fn:lower-case(string? $srcval) => string?
fn:translate(string? $srcval, string? $mapString, string? $transString) => string?
fn:matches(string? $srcval, string? $regexp) => boolean
fn:replace(string? $srcval, string? $regexp, string? $repval) => string?
fn:tokenize(string? $input, string? $pattern) => string*
fn:tokenize(string? $input, string? $pattern, string? $flags) => string*
fn:escape-uri(string $uri-part, boolean $escape-reserved) => string
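
Unlike most programming languages, fn:substring counts characters from 1 and rounds its numeric arguments. A Python sketch of that behavior (xpath_substring is an illustrative name, and the rounding helper assumes ties round up):

```python
import math

def xpath_round(x):
    # fn:round semantics: ties go toward positive infinity.
    return math.floor(x + 0.5)

def xpath_substring(source, start, length=None):
    """Model of fn:substring with 1-based, rounded positions."""
    first = xpath_round(start)
    last = math.inf if length is None else first + xpath_round(length)
    return "".join(ch for pos, ch in enumerate(source, start=1)
                   if first <= pos < last)

print(xpath_substring("metadata", 4, 3))
```

A start position below 1 is legal; it simply uses up part of the length before the first character is reached.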

Regular expressions

Syntax for fn:matches() is based on W3C XML Schema Language regular expressions.
Syntax for fn:replace() is based on W3C XML Schema Language regular expressions, plus $N in the replacement string to indicate the Nth captured subexpression.
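
A sketch of how $N references work in a replacement string, written with Python's re module. Python's regular expression dialect is not the XML Schema one, so this only models the substitution step, and xpath_replace is an illustrative name:

```python
import re

def xpath_replace(source, pattern, replacement):
    """Model of fn:replace where $N in the replacement means captured group N."""
    def expand(match):
        # Substitute each $N with the text captured by group N (empty if unset).
        return re.sub(r"\$([0-9])",
                      lambda ref: match.group(int(ref.group(1))) or "",
                      replacement)
    return re.sub(pattern, expand, source)

print(xpath_replace("2008-03-05", r"([0-9]+)-([0-9]+)-([0-9]+)", "$2/$3/$1"))
```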

Boolean Functions and Operators

and: op:boolean-and(boolean $value1, boolean $value2) => boolean
or: op:boolean-or(boolean $value1, boolean $value2) => boolean
eq: op:boolean-equal(boolean? $value1, boolean? $value2) => boolean?
fn:not(boolean? $srcval) => boolean

Date and time functions

xs:duration is underspecified so new xdt:yearMonthDuration and xdt:dayTimeDuration types are defined.
- op:add-yearMonthDurations( $arg1 as xdt:yearMonthDuration, $arg2 as xdt:yearMonthDuration) => xdt:yearMonthDuration
- op:subtract-yearMonthDurations( $arg1 as xdt:yearMonthDuration, $arg2 as xdt:yearMonthDuration) => xdt:yearMonthDuration
- op:multiply-yearMonthDuration( $arg1 as xdt:yearMonthDuration, $arg2 as xs:double) => xdt:yearMonthDuration
- op:divide-yearMonthDuration( $arg1 as xdt:yearMonthDuration, $arg2 as xs:double) => xdt:yearMonthDuration
- op:add-dayTimeDurations( $arg1 as xdt:dayTimeDuration, $arg2 as xdt:dayTimeDuration) => xdt:dayTimeDuration
- op:subtract-dayTimeDurations( $arg1 as xdt:dayTimeDuration, $arg2 as xdt:dayTimeDuration) => xdt:dayTimeDuration
- op:multiply-dayTimeDuration( $arg1 as xdt:dayTimeDuration, $arg2 as xs:double) => xdt:dayTimeDuration
- op:divide-dayTimeDuration( $arg1 as xdt:dayTimeDuration, $arg2 as xs:double) => xdt:dayTimeDuration
Comparisons of Duration and Datetime Values:
- op:duration-equal(duration $operand1, duration $operand2) => boolean
- op:gYearMonth-equal(gYearMonth $operand1, gYearMonth $operand2) => boolean
- op:gYear-equal(gYear $operand1, gYear $operand2) => boolean
- op:gMonthDay-equal(gMonthDay $operand1, gMonthDay $operand2) => boolean
- op:gMonth-equal(gMonth $operand1, gMonth $operand2) => boolean
- op:gDay-equal(gDay $operand1, gDay $operand2) => boolean
- op:yearMonthDuration-equal(yearMonthDuration $operand1, yearMonthDuration $operand2) => boolean
- op:yearMonthDuration-less-than(yearMonthDuration $operand1, yearMonthDuration $operand2) => boolean
- op:yearMonthDuration-greater-than(yearMonthDuration $operand1, yearMonthDuration $operand2) => boolean
- op:dayTimeDuration-equal(dayTimeDuration $operand1, dayTimeDuration $operand2) => boolean
- op:dayTimeDuration-less-than(dayTimeDuration $operand1, dayTimeDuration $operand2) => boolean
- op:dayTimeDuration-greater-than(dayTimeDuration $operand1, dayTimeDuration $operand2) => boolean
- op:dateTime-equal(dateTime $operand1, dateTime $operand2) => boolean
- op:dateTime-less-than(dateTime $operand1, dateTime $operand2) => boolean
- op:dateTime-greater-than(dateTime $operand1, dateTime $operand2) => boolean
- op:time-equal(time $operand1, time $operand2) => boolean
- op:time-less-than(time $operand1, time $operand2) => boolean
- op:time-greater-than(time $operand1, time $operand2) => boolean
- op:date-equal(date $operand1, date $operand2) => boolean
- op:date-less-than(date $operand1, date $operand2) => boolean
- op:date-greater-than(date $operand1, date $operand2) => boolean
Component Extraction Functions on Duration, Date and Time Values:
- fn:get-years-from-yearMonthDuration(yearMonthDuration $srcval) => integer
- fn:get-months-from-yearMonthDuration(yearMonthDuration $srcval) => integer
- fn:get-days-from-dayTimeDuration(dayTimeDuration $srcval) => integer
- fn:get-hours-from-dayTimeDuration(dayTimeDuration $srcval) => integer
- fn:get-minutes-from-dayTimeDuration(dayTimeDuration $srcval) => integer
- fn:get-seconds-from-dayTimeDuration(dayTimeDuration $srcval) => integer
- fn:get-year-from-dateTime(dateTime $srcval) => integer
- fn:get-month-from-dateTime(dateTime $srcval) => integer
- fn:get-day-from-dateTime(dateTime $srcval) => integer
- fn:get-hours-from-dateTime(dateTime $srcval) => integer
- fn:get-minutes-from-dateTime(dateTime $srcval) => integer
- fn:get-seconds-from-dateTime(dateTime $srcval) => integer
- fn:get-timezone-from-dateTime(dateTime $srcval) => integer
- fn:get-year-from-date(date $srcval) => integer
- fn:get-month-from-date(date $srcval) => integer
- fn:get-day-from-date(date $srcval) => integer
- fn:get-timezone-from-date(date $srcval) => integer
- fn:get-hours-from-time(time $srcval) => integer
- fn:get-minutes-from-time(time $srcval) => integer
- fn:get-seconds-from-time(time $srcval) => integer
- fn:get-timezone-from-time(time $srcval) => integer
Time zone adjustment
- fn:adjust-dateTime-to-timezone( $arg as xs:dateTime?) => xs:dateTime?
- fn:adjust-dateTime-to-timezone( $arg as xs:dateTime?, $timezone as xdt:dayTimeDuration?) => xs:dateTime?
- fn:adjust-date-to-timezone( $arg as xs:date?) => xs:date?
- fn:adjust-date-to-timezone( $arg as xs:date?, $timezone as xdt:dayTimeDuration?) => xs:date?
- fn:adjust-time-to-timezone( $arg as xs:time?) => xs:time?
- fn:adjust-time-to-timezone( $arg as xs:time?, $timezone as xdt:dayTimeDuration?) => xs:time?
Adding and Subtracting Durations From dateTime, date and time:
- fn:subtract-dateTimes-yielding-yearMonthDuration( $arg1 as xs:dateTime?, $arg2 as xs:dateTime?) => xdt:yearMonthDuration?
- fn:subtract-dateTimes-yielding-dayTimeDuration( $arg1 as xs:dateTime?, $arg2 as xs:dateTime?) => xdt:dayTimeDuration?
- op:subtract-dates($arg1 as xs:date?, $arg2 as xs:date?) => xdt:dayTimeDuration?
- op:subtract-times($arg1 as xs:time?, $arg2 as xs:time?) => xdt:dayTimeDuration?
- op:add-yearMonthDuration-to-dateTime( $arg1 as xs:dateTime, $arg2 as xdt:yearMonthDuration) => xs:dateTime
- op:add-dayTimeDuration-to-dateTime( $arg1 as xs:dateTime, $arg2 as xdt:dayTimeDuration) => xs:dateTime
- op:subtract-yearMonthDuration-from-dateTime( $arg1 as xs:dateTime, $arg2 as xdt:yearMonthDuration) => xs:dateTime
- op:subtract-dayTimeDuration-from-dateTime( $arg1 as xs:dateTime, $arg2 as xdt:dayTimeDuration) => xs:dateTime
- op:add-yearMonthDuration-to-date( $arg1 as xs:date, $arg2 as xdt:yearMonthDuration) => xs:date
- op:add-dayTimeDuration-to-date( $arg1 as xs:date, $arg2 as xdt:dayTimeDuration) => xs:date
- op:subtract-yearMonthDuration-from-date( $arg1 as xs:date, $arg2 as xdt:yearMonthDuration) => xs:date
- op:subtract-dayTimeDuration-from-date( $arg1 as xs:date, $arg2 as xdt:dayTimeDuration) => xs:date
- op:add-dayTimeDuration-to-time( $arg1 as xs:time, $arg2 as xdt:dayTimeDuration) => xs:time
- op:subtract-dayTimeDuration-from-time( $arg1 as xs:time, $arg2 as xdt:dayTimeDuration) => xs:time
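
Why split durations in two? A month has no fixed length, while days and seconds do. The following Python sketch illustrates the difference; the function names are illustrative and the day-clamping rule is the one assumed here for month arithmetic:

```python
import calendar
from datetime import date, timedelta

def add_year_month_duration(start, years=0, months=0):
    """Add a yearMonthDuration; the day is clamped to the target month's length."""
    total = start.year * 12 + (start.month - 1) + years * 12 + months
    year, month = divmod(total, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))

def add_day_time_duration(start, days=0):
    """Add a dayTimeDuration; its length never depends on the calendar."""
    return start + timedelta(days=days)

print(add_year_month_duration(date(2000, 1, 31), months=1))
print(add_day_time_duration(date(2000, 1, 31), days=30))
```

One month after January 31 and thirty days after January 31 land on different dates, which is why a single duration type mixing both units is hard to compare or add.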

Qualified Name Functions

fn:QName-in-context(string $qname, boolean $use-default) => QName
fn:QName-in-context(string $qname, boolean $use-default, node $node) => QName
fn:get-local-name-from-QName(QName? $srcval) => string?
fn:get-namespace-uri-from-QName(QName? $srcval) => anyURI?
fn:get-namespace-uri-for-prefix(element $element, string $prefix) => string?
fn:get-in-scope-prefixes(element $element) => string*
fn:resolve-QName($qname as xs:string?, $element as element()?) => xs:QName?
fn:expanded-QName($paramURI as xs:string?, $paramLocal as xs:string) => xs:QName

Node Functions

fn:name() => string
fn:name(node $srcval) => string
fn:local-name() => string
fn:local-name(node $srcval) => string
fn:namespace-uri() => string
fn:namespace-uri(node $srcval) => string
fn:root() => node
fn:root(node $srcval) => node
fn:number() => double
fn:number(node $srcval) => double
fn:lang(string $testlang) => boolean

Sequence Functions

fn:boolean(item* $srcval) => boolean
,: op:concatenate(item* $seq1, item* $seq2) => item*
op:item-at(item* $seqParam, decimal $posParam) => item?
fn:index-of(item* $seqParam, item $srchParam) => unsignedInt?
fn:index-of(item* $seqParam, item $srchParam, anyURI $collationLiteral) => unsignedInt?
fn:empty(item* $srcval) => boolean
fn:exists(item* $srcval) => boolean
fn:distinct-values(item* $srcval) => item*
fn:distinct-values(item* $srcval, anyURI $collationLiteral) => item*
fn:insert-before(item* $target, decimal $position, item* $inserts) => item*
fn:remove(item* $target, decimal $position) => item*
fn:reverse($arg as item()*) => item()*
fn:unordered($sourceSeq as item()*) => item()*
fn:subsequence(item* $sourceSeq, decimal $startingLoc) => item*
fn:subsequence(item* $sourceSeq, decimal $startingLoc, decimal $length) => item*
fn:deep-equal(item* $parameter1, item* $parameter2) => boolean?
fn:deep-equal(item* $parameter1, item* $parameter2, anyURI $collationLiteral) => boolean?
fn:count(item* $srcval) => unsignedInt
fn:avg(item* $srcval) => double?
fn:max(item* $srcval) => anySimpleType?
fn:max(item* $srcval, anyURI $collationLiteral) => anySimpleType?
fn:min(item* $srcval) => anySimpleType?
fn:min(item* $srcval, anyURI $collationLiteral) => anySimpleType?
fn:sum(item* $srcval) => double?
fn:id(IDREF* $srcval) => elementNode*
fn:idref(string* $srcval) => elementNode*
fn:collection(string $srcval) => node*
fn:input() => node*
fn:doc(string? $srcval) => node?
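
Sequence positions are 1-based, which is easy to get wrong when coming from a 0-based language. A Python sketch of a few of these functions, with lists standing in for sequences and integer positions assumed:

```python
def insert_before(seq, position, inserts):
    cut = max(position - 1, 0)          # positions below 1 insert at the front
    return seq[:cut] + inserts + seq[cut:]

def remove(seq, position):
    # Out-of-range positions leave the sequence unchanged.
    return [item for pos, item in enumerate(seq, start=1) if pos != position]

def subsequence(seq, start, length=None):
    first = max(start, 1)
    end = len(seq) + 1 if length is None else start + length
    return seq[first - 1:max(end - 1, first - 1)]

def distinct_values(seq):
    seen = []
    for item in seq:
        if item not in seen:            # keeps the first occurrence of each value
            seen.append(item)
    return seen

print(insert_before(["a", "b", "c"], 2, ["z"]))
print(subsequence(list("abcde"), 3, 2))
```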

Sequence size Functions

fn:zero-or-one($arg as item()*) => item()?: Returns $arg if it contains zero or one items. Otherwise, raises an error.
fn:one-or-more($arg as item()*) => item()+: Returns $arg if it contains one or more items. Otherwise, raises an error.
fn:exactly-one($arg as item()*) => item(): Returns $arg if it contains exactly one item. Otherwise, raises an error.
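
These functions are cardinality assertions: they pass the sequence through or fail. A Python sketch, with lists standing in for sequences and ValueError standing in for the XPath error:

```python
def zero_or_one(seq):
    if len(seq) > 1:
        raise ValueError("expected zero or one items")
    return seq

def one_or_more(seq):
    if not seq:
        raise ValueError("expected one or more items")
    return seq

def exactly_one(seq):
    if len(seq) != 1:
        raise ValueError("expected exactly one item")
    return seq

print(exactly_one(["only"]))
```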

Context Functions

fn:position() => unsignedInt
fn:last() => unsignedInt
op:context-document() => DocumentNode
fn:current-dateTime() => dateTime
fn:current-time() => time
fn:current-date() => date
fn:default-collation() => anyURI?
fn:implicit-timezone() => dayTimeDuration?

Other New features in XPath 2.0

Comments
Namespace wildcards
Functions as location steps
Parenthesized expressions as location steps
Dereference steps
For Expressions
Conditional Expressions
Quantified Expressions

XPath Comments

(: This is an XPath comment :)

<xsl:apply-templates 
 select="(: The difference between the context node and the 
             current node is crucial here :)
 ../composition[@composer=current()/@id]"/>

Namespace wildcards

<xsl:template match="*:set">
  This matches MathML set elements, SVG set elements, set
  elements in no namespace at all, etc. 
</xsl:template>

Can use functions as location steps

The doc() function returns the root of a document at a given URL
doc("http://www.cafeconleche.org/")//today

Can use parenthesized expressions as location steps

/child::contacts/(child::personal | child::business)/child::name
Abbreviated: /contacts/(personal | business)/name
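
For comparison, Python's ElementTree supports only a subset of XPath with no union step, so the parenthesized step has to be spelled out by hand. A sketch against an invented contacts document (only the element names come from the path above):

```python
import xml.etree.ElementTree as ET

# Hypothetical contacts document; only the element names come from the slide.
CONTACTS = """<contacts>
  <personal><name>Ann</name></personal>
  <business><name>Bob</name></business>
  <other><name>Cy</name></other>
</contacts>"""

root = ET.fromstring(CONTACTS)

# The middle step (personal | business) becomes an explicit filter on the tag.
names = [name.text
         for child in root
         if child.tag in ("personal", "business")
         for name in child.findall("name")]

print(names)
```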

Dereference steps

Map an IDREF attribute node to the element it refers to

Composers and their compositions are linked through the ID-type id attribute of the composer element and the IDREF-type composers attribute of the composition element:

  <composer id="c3">
    <name>
      <first_name>Beth</first_name> 
      <middle_name></middle_name> 
      <last_name>Anderson</last_name>
    </name>
  </composer>
    
  <composition composers="c3">
    <title>Trio: Dream in D</title>
    <date><year>(1980)</year></date> 
    <length>10'</length>
    <instruments>fl, pn, vc, or vn, pn, vc</instruments>
    <description>
      Rhapsodic. Passionate. Available on CD 
      <cite><a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid%3D913265342/sr%3D1-2/">Two by Three</a></cite> 
      from North/South Consonance (1998).
    </description> 
    <publisher></publisher>
  </composition>

With XPath 1.0:

<xsl:template match="composition">
  <h2>
    <xsl:value-of select="title"/> by
    <xsl:value-of select="../composer[@id=current()/@composers]"/>
  </h2>
</xsl:template>

With XPath 2.0:

<xsl:template match="composition">
  <h2>
    <xsl:value-of select="title"/> by
    <xsl:value-of select="@composers=>composer/name"/>
  </h2>
</xsl:template>
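
What dereferencing does under the hood can be sketched in Python: build an index from ID values to elements, then look up each reference. This assumes a trimmed copy of the composer data wrapped in a hypothetical music root element, and treats any attribute named id as an ID since no DTD is read:

```python
import xml.etree.ElementTree as ET

# Trimmed version of the composer and composition elements shown above,
# wrapped in a hypothetical <music> root element.
DOC = """<music>
  <composer id="c3">
    <name><first_name>Beth</first_name><last_name>Anderson</last_name></name>
  </composer>
  <composition composers="c3">
    <title>Trio: Dream in D</title>
  </composition>
</music>"""

root = ET.fromstring(DOC)

# Index every element that carries an id attribute, once, up front.
by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}

def dereference(element, attribute):
    """Return the elements named by a whitespace-separated IDREFS attribute."""
    return [by_id[ref] for ref in element.get(attribute, "").split() if ref in by_id]

composition = root.find("composition")
composer = dereference(composition, "composers")[0]
last_name = composer.findtext("name/last_name")

print(last_name)
```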

For Expressions

Useful for joining documents
Useful for restructuring data

Syntax:

for $var1 in expression, $var2 in expression...
return expression

for Example

Consider the list of weblogs at http://static.userland.com/weblogMonitor/logs.xml

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd">
<weblogs>
    <log>
        <name>MozillaZine</name>
        <url>http://www.mozillazine.org</url>
        <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>
        <ownerName>Jason Kersey</ownerName>
        <ownerEmail>kerz@en.com</ownerEmail>
        <description>THE source for news on the Mozilla Organization.  DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
        <imageUrl></imageUrl>
        <adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif</adImageUrl>
    </log>
    <log>
        <name>SalonHerringWiredFool</name>
        <url>http://www.salonherringwiredfool.com/</url>
        <ownerName>Some Random Herring</ownerName>
        <ownerEmail>salonfool@wiredherring.com</ownerEmail>
        <description></description>
    </log>
    <log>
        <name>SlashDot.Org</name>
        <url>http://www.slashdot.org/</url>
        <ownerName>Simply a friend</ownerName>
        <ownerEmail>afriendofweblogs@weblogs.com</ownerEmail>
        <description>News for Nerds, Stuff that Matters.</description>
    </log>
</weblogs>

The changesUrl element points to a document like this:

<?xml version="1.0"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" 
                     "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <title>MozillaZine</title>
    <link>http://www.mozillazine.org/</link>
    <language>en-us</language>
    <description>Your source for Mozilla news, advocacy, interviews, builds, and more!</description>
    <copyright>Copyright 1998-2002, The MozillaZine Organization</copyright>
    <managingEditor>jason@mozillazine.org</managingEditor>
    <webMaster>jason@mozillazine.org</webMaster>
    <image>
      <title>MozillaZine</title>
      <url>http://www.mozillazine.org/image/mynetscape88.gif</url>
      <description>Your source for Mozilla news, advocacy, interviews, builds, and more!</description>
      <link>http://www.mozillazine.org/</link>
    </image>

    <item>
      <title>BugDays Are Back!</title>
      <link>http://www.mozillazine.org/talkback.html?article=2151</link>
    </item>

    <item>
      <title>Independent Status Reports</title>
      <link>http://www.mozillazine.org/talkback.html?article=2150</link>
    </item>

  </channel>

</rss>

We want to process all the item elements from each weblog.

for Example


<xsl:template match="weblogs">
  <xsl:apply-templates select="
    for $url in log/changesUrl
    return doc($url)//item
  "/>
</xsl:template>
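
The for expression is a join: one loop over the URLs, a nested loop over the items in each fetched document. A Python sketch of the same shape, assuming a hypothetical in-memory doc() replacement so that nothing is fetched over the network:

```python
import xml.etree.ElementTree as ET

WEBLOGS = """<weblogs>
  <log><name>MozillaZine</name>
       <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl></log>
  <log><name>SlashDot.Org</name></log>
</weblogs>"""

# Hypothetical in-memory replacement for doc(): URL -> document text.
DOCUMENTS = {
    "http://www.mozillazine.org/contents.rdf": """<rss version="0.91"><channel>
      <item><title>BugDays Are Back!</title></item>
      <item><title>Independent Status Reports</title></item>
    </channel></rss>""",
}

def doc(url):
    return ET.fromstring(DOCUMENTS[url])

weblogs = ET.fromstring(WEBLOGS)

# for $url in log/changesUrl return doc($url)//item
items = [item
         for url in weblogs.findall("log/changesUrl")
         for item in doc(url.text).iter("item")]
titles = [item.findtext("title") for item in items]

print(titles)
```

Logs without a changesUrl contribute nothing, because the outer loop never binds a URL for them.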

Conditional Expressions

if ( expression) then expression else expression

Not all weblogs have a changesUrl

<xsl:template match="log">
  <xsl:apply-templates select="
    if (changesUrl)
     then document(changesUrl)
     else document(url)"/>
</xsl:template>

Quantified Expressions

some $QualifiedName in expression satisfies expression
every $QualifiedName in expression satisfies expression
Both return boolean values, true or false

<xsl:template match="weblogs">
  <xsl:if test="some $log in log satisfies $log/changesUrl">
     At least one log has a changesUrl
  </xsl:if>
</xsl:template>

<xsl:template match="weblogs">
  <xsl:if test="every $log in log satisfies $log/url">
    Every log has a url
  </xsl:if>
</xsl:template>
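
Quantified expressions correspond closely to Python's any() and all(). A sketch against a trimmed copy of the weblogs document, testing each bound log element for a child:

```python
import xml.etree.ElementTree as ET

# Trimmed version of the weblogs document shown earlier.
WEBLOGS = """<weblogs>
  <log><url>http://www.mozillazine.org</url>
       <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl></log>
  <log><url>http://www.slashdot.org/</url></log>
</weblogs>"""

logs = ET.fromstring(WEBLOGS).findall("log")

# some $log in log satisfies $log/changesUrl
some_have_changes = any(log.find("changesUrl") is not None for log in logs)

# every $log in log satisfies $log/url
all_have_url = all(log.find("url") is not None for log in logs)

# every ... is false as soon as one item fails the test.
all_have_changes = all(log.find("changesUrl") is not None for log in logs)

print(some_have_changes, all_have_url, all_have_changes)
```

As in XPath, an empty input makes some false and every true.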

A Closing Thought

"Documents vs. Data"?
"Documents and Data"?
Or "Documents are Data"

Traditionally we have distinguished between documents, which are static, linear files that live in a file system, and data, which are small chunks of information that are stored in a database. Much, perhaps most, PHP is little more than a glue language for inserting data into templates to form documents. Ditto for many report generation tools.

What if we break down the walls? Suppose everything goes into the database, not just the "data"? A single controller turns every request to the web server into a database query. Everything is stored in one place, and gets the benefits of the database: ACID, backups, distribution, professional management, etc. Everything can be queried, and the architecture is simpler: one piece (a database) instead of two (database+file system). Furthermore the system is more flexible. Modern web servers are still crippled by the assumption that what they're doing is serving files, and that the URL structure maps to a file system. That's too limiting, especially for fully RESTful applications. To enable the next generation of web applications, we need to pull the file system out from under the web servers and teach developers to think in terms of URL design and database queries. SQL is too limited to handle data this broad. XQuery can.

To Learn More

This presentation: http://www.cafeconleche.org/slides/sd2008west/xmldb
XQuery: A Query Language for XML: http://www.w3.org/TR/xquery/
XML Query Requirements: http://www.w3.org/TR/xmlquery-req
XML Query Use Cases: http://www.w3.org/TR/xmlquery-use-cases
XML Query Data Model: http://www.w3.org/TR/query-datamodel/
The XML Query Algebra: http://www.w3.org/TR/query-algebra/
XML Syntax for XQuery 1.0 (XQueryX): http://www.w3.org/TR/xqueryx
XSLT 2.0 and XQuery 1.0 Serialization: http://www.w3.org/TR/xslt-xquery-serialization/
XQuery 1.0 and XPath 2.0 Functions and Operators Version 1.0: http://www.w3.org/TR/xquery-operators/
XPath 2.0: http://www.w3.org/TR/xpath20
XPath 2.0 Requirements: http://www.w3.org/TR/xpath20req
XSLT 2.0: http://www.w3.org/TR/xslt20
XSLT 2.0 Requirements: http://www.w3.org/TR/xslt20req
