XQuery


XQuery: XQuandary or eXQuisite?

Elliotte Rusty Harold

New York XML Special Interest Group

Monday, November 17, 2003

elharo@metalab.unc.edu

http://www.cafeconleche.org/


Versions Covered


XQuery

Three parts:


XQuery Language


Documents to Query


Physical Representations to Query


Where is XQuery used?


The XML Model vs. the Relational Model

A relational database contains tables; an XML database contains collections.
A relational table contains records with the same schema; a collection contains XML documents with the same DTD.
A relational record is an unordered list of named values; an XML document is a tree of nodes.
A SQL query returns an unordered set of records; an XQuery query returns an ordered sequence of nodes.

Query Data Types

XPath 2.0 Data Model Type Hierarchy

Picture taken from XQuery 1.0 and XPath 2.0 Functions and Operators W3C Working Draft 12 November 2003


An example document to query

Most of the examples in this talk query this bibliography document at the (relative) URL bib.xml:

<?xml version="1.0"?>
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>

  <book year="1992">
    <title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>

  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>

  <book year="1999">
    <title>The Economics of Technology and Content for Digital TV</title>
    <editor>
      <last>Gerbarg</last><first>Darcy</first>
      <affiliation>CITI</affiliation>
    </editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>

</bib>

Adapted from Mary Fernandez, Jerome Simeon, and Phil Wadler: XML Query Languages: Experiences and Exemplars, 1999, as adapted in XML Query Use Cases


The XQuery FLWOR


Query: List titles of all books

   for $t in doc("bib.xml")/bib/book/title
   return
      $t 

Adapted from XML Query Use Cases


Query Result: Book Titles

% java  -cp saxon7.jar net.sf.saxon.Query query1
<?xml version="1.0" encoding="UTF-8"?>
<title>TCP/IP Illustrated</title>
<?xml version="1.0" encoding="UTF-8"?>
<title>Advanced Programming in the Unix Environment</title>
<?xml version="1.0" encoding="UTF-8"?>
<title>Data on the Web</title>
<?xml version="1.0" encoding="UTF-8"?>
<title>The Economics of Technology and Content for Digital TV</title>
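Readers more at home in a general-purpose language can think of this for/return query as a loop that collects each matching title element. The Python sketch below is only an analogy, not how an XQuery processor works. It uses the standard library's xml.etree.ElementTree on a hypothetical inline copy of bib.xml trimmed to titles (the names BIB and titles are illustrative):

```python
import xml.etree.ElementTree as ET

# Hypothetical inline copy of bib.xml, trimmed to the elements this query touches.
BIB = """<bib>
<book year="1994"><title>TCP/IP Illustrated</title></book>
<book year="1992"><title>Advanced Programming in the Unix Environment</title></book>
<book year="2000"><title>Data on the Web</title></book>
<book year="1999"><title>The Economics of Technology and Content for Digital TV</title></book>
</bib>"""

root = ET.fromstring(BIB)  # root is the bib element itself, not a document node

# for $t in doc("bib.xml")/bib/book/title return $t
# The bib step is already consumed by root, so the remaining path is book/title.
titles = root.findall("book/title")

for t in titles:
    # Each title is followed directly by </book>, so there is no tail text to print.
    print(ET.tostring(t, encoding="unicode"))
```

Unlike the Saxon output above, the sketch prints no XML declaration before each item; it simply serializes each element.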

XQueryX


Specifying a context node


Query Result with wrapping


XPath 1.0 Data Model

(Adapted from Jeni Tennison)


XPath 2.0 Data Model

(Adapted from Jeni Tennison)


Constructing sequences


Sequence example

for $a in (1 to 10)
return $a

Output:

1
2
3
4
5
6
7
8
9
10

Data types and the PSVI


Element Constructors

List titles of all books in a bib element. Put each title in a book element.

<bib>
  {
   for $t in doc("bib.xml")/bib/book/title
   return
    <book>
     { $t }
    </book>
  }
</bib>

Adapted from XML Query Use Cases


Query Result: Book Titles

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <book>
      <title>TCP/IP Illustrated</title>
   </book>
   <book>
      <title>Advanced Programming in the Unix Environment</title>
   </book>
   <book>
      <title>Data on the Web</title>
   </book>
   <book>
      <title>The Economics of Technology and Content for Digital TV</title>
   </book>
</bib>

Attribute Constructors

Adapted from XML Query Use Cases


Query Result

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <book year="1994">
      <title>TCP/IP Illustrated</title>
   </book>
   <book year="1992">
      <title>Advanced Programming in the Unix Environment</title>
   </book>
   <book year="2000">
      <title>Data on the Web</title>
   </book>
   <book year="1999">
      <title>The Economics of Technology and Content for Digital TV</title>
   </book>
</bib>

Text Constructors


Query Result

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <h1>Bibliography</h1>
   <title>TCP/IP Illustrated</title>
   <title>Advanced Programming in the Unix Environment</title>
   <title>Data on the Web</title>
   <title>The Economics of Technology and Content for Digital TV</title>
</bib>

Other Constructors


Expected Query Result

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="bibliography.css"?>
<bib>
   <h1>Bibliography</h1>
   <title>TCP/IP Illustrated</title>
   <title>Advanced Programming in the Unix Environment</title>
   <title>Data on the Web</title>
   <title>The Economics of Technology and Content for Digital TV</title>
</bib>
<!-- An example from Elliotte Rusty Harold's 
  XQuery presentation -->

Query with where

Adapted from XML Query Use Cases


Query Result: Titles of books published by Addison-Wesley

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <title>TCP/IP Illustrated</title>
   <title>Advanced Programming in the Unix Environment</title>
</bib>

Adapted from XML Query Use Cases


Query with Booleans

Adapted from XML Query Use Cases


Query Result: books published by Addison-Wesley before 1993

<?xml version="1.0" encoding="UTF-8"?>
<bib>
   <title>Advanced Programming in the Unix Environment</title>
</bib>

Adapted from XML Query Use Cases


Query with multiple variables

Create a list of all the title-author pairs, with each pair enclosed in a result element.

<results>
 {
   for $b in doc("bib.xml")/bib/book,
     $t in $b/title,
     $a in $b/author
   return
    <result>
    { $t }
    { $a }
    </result>
  }
</results>

Adapted from XML Query Use Cases


Query Result: A list of all the title-author pairs

<?xml version="1.0" encoding="UTF-8"?>
<results>
   <result>
      <title>TCP/IP Illustrated</title>
      <author>
         <last>Stevens</last>
         <first>W.</first>
      </author>
   </result>
   <result>
      <title>Advanced Programming in the Unix Environment</title>
      <author>
         <last>Stevens</last>
         <first>W.</first>
      </author>
   </result>
   <result>
      <title>Data on the Web</title>
      <author>
         <last>Abiteboul</last>
         <first>Serge</first>
      </author>
   </result>
   <result>
      <title>Data on the Web</title>
      <author>
         <last>Buneman</last>
         <first>Peter</first>
      </author>
   </result>
   <result>
      <title>Data on the Web</title>
      <author>
         <last>Suciu</last>
         <first>Dan</first>
      </author>
   </result>
</results>

Adapted from XML Query Use Cases
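The comma-separated for clause binds $b, $t, and $a as if they were three nested loops, so a book with no author element contributes no pairs at all. That is why The Economics of Technology and Content for Digital TV is missing from the result. A Python sketch of this reading, assuming a hypothetical trimmed inline copy of bib.xml (the names BIB and pairs are illustrative only):

```python
import xml.etree.ElementTree as ET

# Hypothetical inline copy of bib.xml, keeping only titles, authors, and the editor.
BIB = """<bib>
<book><title>TCP/IP Illustrated</title>
  <author><last>Stevens</last><first>W.</first></author></book>
<book><title>Advanced Programming in the Unix Environment</title>
  <author><last>Stevens</last><first>W.</first></author></book>
<book><title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
  <author><last>Buneman</last><first>Peter</first></author>
  <author><last>Suciu</last><first>Dan</first></author></book>
<book><title>The Economics of Technology and Content for Digital TV</title>
  <editor><last>Gerbarg</last><first>Darcy</first></editor></book>
</bib>"""

root = ET.fromstring(BIB)

# for $b in .../book, $t in $b/title, $a in $b/author  ==  three nested loops.
# A book with no author makes the innermost loop run zero times.
pairs = [
    (t.text, a.findtext("last"))
    for b in root.findall("book")
    for t in b.findall("title")
    for a in b.findall("author")
]

for title, last in pairs:
    print(f"{title} / {last}")
```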


Nested FLWOR Queries

For each book in the bibliography, list the title and authors, grouped inside a result element.

<results>
 {
   for $b in doc("bib.xml")/bib/book
     return
      <result>
       { $b/title }
       {  
         for $a in $b/author
         return $a
       }
      </result>
 }
</results>

Adapted from XML Query Use Cases


Query Result: A list of the title and authors of each book in the bibliography

<?xml version="1.0" encoding="UTF-8"?>
<results>
   <result>
      <title>TCP/IP Illustrated</title>
      <author>
         <last>Stevens</last>
         <first>W.</first>
      </author>
   </result>
   <result>
      <title>Advanced Programming in the Unix Environment</title>
      <author>
         <last>Stevens</last>
         <first>W.</first>
      </author>
   </result>
   <result>
      <title>Data on the Web</title>
      <author>
         <last>Abiteboul</last>
         <first>Serge</first>
      </author>
      <author>
         <last>Buneman</last>
         <first>Peter</first>
      </author>
      <author>
         <last>Suciu</last>
         <first>Dan</first>
      </author>
   </result>
   <result>
      <title>The Economics of Technology and Content for Digital TV</title>
   </result>
</results>

Adapted from XML Query Use Cases
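Moving the author loop inside the return clause changes the shape of the result. Every book now yields exactly one result element, and a book without authors just gets an empty list of authors instead of disappearing. The same contrast in a Python sketch, again over a hypothetical trimmed inline copy of bib.xml:

```python
import xml.etree.ElementTree as ET

# Hypothetical inline copy of bib.xml, keeping only titles, authors, and the editor.
BIB = """<bib>
<book><title>TCP/IP Illustrated</title>
  <author><last>Stevens</last><first>W.</first></author></book>
<book><title>Advanced Programming in the Unix Environment</title>
  <author><last>Stevens</last><first>W.</first></author></book>
<book><title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
  <author><last>Buneman</last><first>Peter</first></author>
  <author><last>Suciu</last><first>Dan</first></author></book>
<book><title>The Economics of Technology and Content for Digital TV</title>
  <editor><last>Gerbarg</last><first>Darcy</first></editor></book>
</bib>"""

root = ET.fromstring(BIB)

# Outer FLWOR: one entry per book, whether or not it has authors.
results = [
    (
        b.findtext("title"),
        # Inner FLWOR: for $a in $b/author return $a
        [a.findtext("last") for a in b.findall("author")],
    )
    for b in root.findall("book")
]

for title, authors in results:
    print(title, authors)
```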


Query with let

For each book in the bibliography, list the difference between the book's price and the average price.

<results> 
  {
   let $doc := doc("bib.xml")
   let $average := avg($doc//price)
   for $b in $doc/bib/book
     let $difference := $b/price - $average
     return
       <data>{ $b/title } is {$difference} more expensive than the average. </data>
  }    
</results>
Note that let binds a variable with :=, as in Pascal, not with =, as in C and Java.


Query Result: price differences

    <?xml version="1.0" encoding="UTF-8"?>
    <results>
       <data>
          <title>TCP/IP Illustrated</title> is -9.5 more expensive than the average. </data>
       <data>
          <title>Advanced Programming in the Unix Environment</title> is -9.5 more expensive than the average. </data>
       <data>
          <title>Data on the Web</title> is -35.5 more expensive than the average. </data>
       <data>
          <title>The Economics of Technology and Content for Digital TV</title> is 54.499999999999986 more expensive than the average. </data>
    </results>
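The odd-looking 54.499999999999986 is a floating-point artifact rather than a bug in the query. The price elements are untyped, so avg() and the subtraction convert them to xs:double, and values such as 65.95 have no exact binary representation. Python floats are IEEE 754 doubles too, so the following sketch of the same arithmetic (over a hypothetical inline copy of the prices in bib.xml) shows the effect. The exact trailing digits could in principle differ from Saxon's, so the checks compare within a tolerance:

```python
import xml.etree.ElementTree as ET

# Hypothetical inline copy of bib.xml, trimmed to titles and prices.
BIB = """<bib>
<book><title>TCP/IP Illustrated</title><price>65.95</price></book>
<book><title>Advanced Programming in the Unix Environment</title><price>65.95</price></book>
<book><title>Data on the Web</title><price>39.95</price></book>
<book><title>The Economics of Technology and Content for Digital TV</title><price>129.95</price></book>
</bib>"""

root = ET.fromstring(BIB)

# let $average := avg($doc//price), treating the untyped values as doubles
prices = [float(p.text) for p in root.iter("price")]
average = sum(prices) / len(prices)

# let $difference := $b/price - $average, for each book
differences = {
    b.findtext("title"): float(b.findtext("price")) - average
    for b in root.findall("book")
}

for title, diff in differences.items():
    # !r shows every significant digit, exposing any rounding error
    print(f"{title} is {diff!r} more expensive than the average.")
```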

    if then else

    For each book in the bibliography, list the difference between the book's price and the average price, but this time indicate whether the book is more or less expensive than the average.

    <results> 
      {
       let $doc := doc("bib.xml")
       let $average := avg($doc//price)
       for $b in $doc/bib/book
         return
           if ($b/price > $average) then
             <data>
               { $b/title } is ${$b/price - $average} 
               more expensive than the average.
             </data>
           else  
             <data>
               { $b/title } is ${$average - $b/price} 
               less expensive than the average.
             </data>
      }    
    </results>

    Query Result: Price differences

    <?xml version="1.0" encoding="UTF-8"?>
    <results>
       <data>
          <title>TCP/IP Illustrated</title> is $9.5 less expensive than the average.</data>
       <data>
          <title>Advanced Programming in the Unix Environment</title> is $9.5 less expensive than the average.</data>
       <data>
          <title>Data on the Web</title> is $35.5 less expensive than the average.</data>
       <data>
          <title>The Economics of Technology and Content for Digital TV</title> is $54.499999999999986 more expensive than the average.</data>
    </results>

    Query with sorting

    List the titles and years of all books published by Addison-Wesley after 1991, in alphabetic order.

    <bib>
     {
       for $b in doc("bib.xml")//book[publisher = "Addison-Wesley"]
       where $b/@year > 1991
       order by ($b/title)
       return
        <book>
         { $b/@year } { $b/title }
        </book> 
     }
    </bib>

    Adapted from XML Query Use Cases


    Query Result

    <?xml version="1.0" encoding="UTF-8"?>
    <bib>
       <book year="1992">
          <title>Advanced Programming in the Unix Environment</title>
       </book>
       <book year="1994">
          <title>TCP/IP Illustrated</title>
       </book>
    </bib>
    

    Adapted from XML Query Use Cases
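The predicate, the where clause, and the order by clause correspond to an ordinary filter followed by a sort. Below is a Python sketch over a hypothetical trimmed inline copy of bib.xml. One assumption to keep in mind: Python compares strings by code point, while the collation an XQuery processor uses for order by depends on its default collation, so titles outside plain ASCII could sort differently:

```python
import xml.etree.ElementTree as ET

# Hypothetical inline copy of bib.xml, trimmed to years, titles, and publishers.
BIB = """<bib>
<book year="1994"><title>TCP/IP Illustrated</title><publisher>Addison-Wesley</publisher></book>
<book year="1992"><title>Advanced Programming in the Unix Environment</title><publisher>Addison-Wesley</publisher></book>
<book year="2000"><title>Data on the Web</title><publisher>Morgan Kaufmann Publishers</publisher></book>
<book year="1999"><title>The Economics of Technology and Content for Digital TV</title><publisher>Kluwer Academic Publishers</publisher></book>
</bib>"""

root = ET.fromstring(BIB)

selected = [
    (b.get("year"), b.findtext("title"))
    for b in root.iter("book")
    if b.findtext("publisher") == "Addison-Wesley"  # [publisher = "Addison-Wesley"]
    and int(b.get("year")) > 1991                    # where $b/@year > 1991
]
selected.sort(key=lambda pair: pair[1])              # order by $b/title

for year, title in selected:
    print(year, title)
```

Because each book here has exactly one publisher, plain string equality matches the existential = comparison XQuery performs on node sequences.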


    A different document about books

    Sample data at "reviews.xml":

    <?xml version="1.0"?>
    <reviews>
      <entry>
        <title>Data on the Web</title>
        <price>34.95</price>
        <review>
           A very good discussion of semi-structured database
           systems and XML.
        </review>
      </entry>
      <entry>
        <title>Advanced Programming in the Unix Environment</title>
        <price>65.95</price>
        <review>
          A clear and detailed discussion of UNIX programming.
        </review>
      </entry>
      <entry>
        <title>TCP/IP Illustrated</title>
        <price>65.95</price>
        <review>
          One of the best books on TCP/IP.
        </review>
      </entry>
    </reviews>
    

    Adapted from XML Query Use Cases


    This document uses a different DTD

    <!ELEMENT reviews (entry*)>
    <!ELEMENT entry   (title, price, review)>
    <!ELEMENT title   (#PCDATA)>
    <!ELEMENT price   (#PCDATA)>
    <!ELEMENT review  (#PCDATA)>
    

    Query that joins two documents

    For each book found in both bib.xml and reviews.xml, list the title of the book and its price from each source.

    <books-with-prices>
     {
       for $b in doc("bib.xml")//book,
         $a in doc("reviews.xml")//entry
       where $b/title = $a/title
       return
        <book-with-prices>
         { $b/title }
           <price-amazon> { $a/price/text() } </price-amazon>
           <price-bn> { $b/price/text() } </price-bn>
        </book-with-prices>
     }
    </books-with-prices>

    Adapted from XML Query Use Cases


    Result

    <?xml version="1.0" encoding="UTF-8"?>
    <books-with-prices>
       <book-with-prices>
          <title>TCP/IP Illustrated</title>
          <price-amazon>65.95</price-amazon>
          <price-bn>65.95</price-bn>
       </book-with-prices>
       <book-with-prices>
          <title>Advanced Programming in the Unix Environment</title>
          <price-amazon>65.95</price-amazon>
          <price-bn>65.95</price-bn>
       </book-with-prices>
       <book-with-prices>
          <title>Data on the Web</title>
          <price-amazon>34.95</price-amazon>
          <price-bn>39.95</price-bn>
       </book-with-prices>
    </books-with-prices>
    

    Adapted from XML Query Use Cases
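Read literally, the two for bindings plus the where clause describe a nested-loop join on title. Every bib.xml book is paired with every reviews.xml entry, and only the matching pairs survive; a real processor is free to evaluate the join more cleverly. A Python sketch of the literal reading, with hypothetical trimmed inline copies of both documents:

```python
import xml.etree.ElementTree as ET

# Hypothetical inline copies of bib.xml and reviews.xml, trimmed to titles and prices.
BIB = """<bib>
<book><title>TCP/IP Illustrated</title><price>65.95</price></book>
<book><title>Advanced Programming in the Unix Environment</title><price>65.95</price></book>
<book><title>Data on the Web</title><price>39.95</price></book>
<book><title>The Economics of Technology and Content for Digital TV</title><price>129.95</price></book>
</bib>"""

REVIEWS = """<reviews>
<entry><title>Data on the Web</title><price>34.95</price></entry>
<entry><title>Advanced Programming in the Unix Environment</title><price>65.95</price></entry>
<entry><title>TCP/IP Illustrated</title><price>65.95</price></entry>
</reviews>"""

bib = ET.fromstring(BIB)
reviews = ET.fromstring(REVIEWS)

joined = [
    # (title, price-amazon from reviews.xml, price-bn from bib.xml)
    (b.findtext("title"), a.findtext("price"), b.findtext("price"))
    for b in bib.iter("book")                      # outer loop $b: output follows bib.xml order
    for a in reviews.iter("entry")                 # inner loop $a
    if b.findtext("title") == a.findtext("title")  # where $b/title = $a/title
]

for row in joined:
    print(row)
```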


    prices.xml Query Sample Data

    The next query also uses an input document named "prices.xml":

    <?xml version="1.0"?>
    <prices>
      <book>
        <title>Advanced Programming in the Unix Environment</title>
        <source>www.amazon.com</source>
        <price>65.95</price>
      </book>
      <book>
        <title>Advanced Programming in the Unix Environment</title>
        <source>www.bn.com</source>
        <price>65.95</price>
      </book>
      <book>
        <title>TCP/IP Illustrated</title>
        <source>www.amazon.com</source>
        <price>65.95</price>
      </book>
      <book>
        <title>TCP/IP Illustrated</title>
        <source>www.bn.com</source>
        <price>65.95</price>
      </book>
      <book>
        <title>Data on the Web</title>
        <source>www.amazon.com</source>
        <price>34.95</price>
      </book>
      <book>
        <title>Data on the Web</title>
        <source>www.bn.com</source>
        <price>39.95</price>
      </book>
    </prices>
    
    
    

    Adapted from XML Query Use Cases


    Query with reused variables

    <results>
     {
       let $doc := doc("prices.xml")
       for $t in distinct-values($doc/prices/book/title)
         let $p := $doc/prices/book[title = $t]/price
         return
           <minprice title="{$t}">
             { min($p) }
           </minprice>
     }
    </results>

    Adapted from XML Query Use Cases


    Query Result

    <?xml version="1.0" encoding="UTF-8"?>
    <results>
       <minprice title="Advanced Programming in the Unix Environment">65.95</minprice>
       <minprice title="TCP/IP Illustrated">65.95</minprice>
       <minprice title="Data on the Web">34.95</minprice>
    </results>

    Adapted from XML Query Use Cases
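Iterating over distinct-values() and then using let to re-select the matching books is a way of grouping by title. In Python the same grouping can be sketched with a dictionary, assuming a hypothetical inline copy of prices.xml. One caveat: the order of the values returned by distinct-values() is implementation-dependent, whereas this sketch deliberately keeps first-seen order:

```python
import xml.etree.ElementTree as ET

# Hypothetical inline copy of prices.xml.
PRICES = """<prices>
<book><title>Advanced Programming in the Unix Environment</title><source>www.amazon.com</source><price>65.95</price></book>
<book><title>Advanced Programming in the Unix Environment</title><source>www.bn.com</source><price>65.95</price></book>
<book><title>TCP/IP Illustrated</title><source>www.amazon.com</source><price>65.95</price></book>
<book><title>TCP/IP Illustrated</title><source>www.bn.com</source><price>65.95</price></book>
<book><title>Data on the Web</title><source>www.amazon.com</source><price>34.95</price></book>
<book><title>Data on the Web</title><source>www.bn.com</source><price>39.95</price></book>
</prices>"""

root = ET.fromstring(PRICES)
books = root.findall("book")

# for $t in distinct-values(.../title): dict.fromkeys drops duplicates, keeping first-seen order
titles = list(dict.fromkeys(b.findtext("title") for b in books))

# let $p := .../book[title = $t]/price, then min($p) over the values as doubles
min_prices = {
    t: min(float(b.findtext("price")) for b in books if b.findtext("title") == t)
    for t in titles
}

for title, price in min_prices.items():
    print(f'<minprice title="{title}">{price}</minprice>')
```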


    Multiple FLWOR Queries

    <bib>
     {
       for $b in doc("bib.xml")//book[author]
       return
        <book>
         { $b/title }
         { $b/author }
        </book>,
       for $b in doc("bib.xml")//book[editor]
       return
        <reference>
         { $b/title }
         <org> { $b/editor/affiliation/text() } </org>
        </reference>
     }
    </bib>

    Adapted from XML Query Use Cases


    Query Result

    <?xml version="1.0" encoding="UTF-8"?>
    <bib>
       <book>
          <title>TCP/IP Illustrated</title>
          <author>
             <last>Stevens</last>
             <first>W.</first>
          </author>
       </book>
       <book>
          <title>Advanced Programming in the Unix Environment</title>
          <author>
             <last>Stevens</last>
             <first>W.</first>
          </author>
       </book>
       <book>
          <title>Data on the Web</title>
          <author>
             <last>Abiteboul</last>
             <first>Serge</first>
          </author>
          <author>
             <last>Buneman</last>
             <first>Peter</first>
          </author>
          <author>
             <last>Suciu</last>
             <first>Dan</first>
          </author>
       </book>
       <reference>
          <title>The Economics of Technology and Content for Digital TV</title>
          <org>CITI</org>
       </reference>
    </bib>
    

    Adapted from XML Query Use Cases
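The comma between the two FLWOR expressions is the sequence concatenation operator. The query produces every book element first and then every reference element, rather than interleaving them in document order. A Python sketch of that concatenation over a hypothetical trimmed inline copy of bib.xml, with strings standing in for the constructed elements:

```python
import xml.etree.ElementTree as ET

# Hypothetical inline copy of bib.xml, trimmed to titles, authors, and the editor.
BIB = """<bib>
<book><title>TCP/IP Illustrated</title><author><last>Stevens</last></author></book>
<book><title>Advanced Programming in the Unix Environment</title><author><last>Stevens</last></author></book>
<book><title>Data on the Web</title><author><last>Abiteboul</last></author><author><last>Buneman</last></author><author><last>Suciu</last></author></book>
<book><title>The Economics of Technology and Content for Digital TV</title><editor><last>Gerbarg</last><affiliation>CITI</affiliation></editor></book>
</bib>"""

root = ET.fromstring(BIB)

# First FLWOR: //book[author]. Compare with None, because an Element's truth
# value reflects whether it has children, not whether find() succeeded.
books = [
    f"book: {b.findtext('title')}"
    for b in root.iter("book")
    if b.find("author") is not None
]

# Second FLWOR: //book[editor], reporting the editor's affiliation as the org.
references = [
    f"reference: {b.findtext('title')} ({b.findtext('editor/affiliation')})"
    for b in root.iter("book")
    if b.find("editor") is not None
]

# The comma operator concatenates the two sequences in order.
result = books + references
for line in result:
    print(line)
```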


    Querying documents that use namespaces


    Query Software


    What's the difference between XQuery and XSLT?


    XPath 2.0


    XPath 2.0 Goals


    Held over from XPath 1.0


    Accessor Functions

    fn:node-name(Node)
    returns zero or one QName
    fn:string(Object)
    returns the string value of anything
    fn:data(Node)
    returns a sequence of zero or more typed simple values
    fn:base-uri(node)
    returns the base URI of an Element or Document node
    fn:document-uri(Node)
    returns the document URI of a Document node (the empty sequence for any other node)

    Constructor Functions


    Casting

    if ($x castable as xs:gYear)
    then $x cast as xs:gYear
    else if ($x castable as xs:integer)
    then $x cast as xs:integer
    else if ($x castable as xs:decimal)
    then $x cast as xs:decimal
    else $x cast as xs:string
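The cascade tries the most specific type first, which matters because a string such as 1999 is castable to xs:gYear, xs:integer, and xs:decimal alike. The Python sketch below imitates that order with regular expressions. It is only an approximation under stated assumptions: the patterns are simplified versions of the XML Schema lexical forms (for example, the gYear pattern accepts the year 0000, which XML Schema 1.0 rejects), stripping whitespace roughly stands in for the collapsing done during a cast, and the function name classify is hypothetical:

```python
import re
from decimal import Decimal

# Simplified lexical patterns (assumptions, not the full XML Schema grammar).
GYEAR = re.compile(r"-?([1-9][0-9]{3,}|0[0-9]{3})(Z|[+-](0[0-9]|1[0-3]):[0-5][0-9]|[+-]14:00)?")
INTEGER = re.compile(r"[+-]?[0-9]+")
DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def classify(x: str):
    """Return (type name, value), trying gYear, then integer, then decimal, then string."""
    s = x.strip()  # rough stand-in for whitespace collapsing when casting
    if GYEAR.fullmatch(s):
        return ("gYear", s)  # Python has no gYear type, so keep the lexical form
    if INTEGER.fullmatch(s):
        return ("integer", int(s))
    if DECIMAL.fullmatch(s):
        return ("decimal", Decimal(s))
    return ("string", x)


for sample in ["1999", "42", "65.95", "TCP/IP"]:
    print(sample, "->", classify(sample))
```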

    Four kinds of comparison operators


    Value comparison operators


    General comparisons


    Node comparisons


    Order comparisons


    Functions and operators


    Arithmetic operators


    Numeric Functions


    String functions


    Regular expressions


    Boolean Functions and Operators


    Date and time functions


    Qualified Name Functions


    Node Functions


    Sequence Functions


    Sequence size Functions

    fn:zero-or-one($arg as item()*) => item()?
    Returns $arg if it contains zero or one items. Otherwise, raises an error.
    fn:one-or-more($arg as item()*) => item()+
    Returns $arg if it contains one or more items. Otherwise, raises an error.
    fn:exactly-one($arg as item()*) => item()
    Returns $arg if it contains exactly one item. Otherwise, raises an error.
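These three functions are cardinality assertions: each returns its argument unchanged or fails. A Python sketch with lists standing in for sequences follows. The function names merely mirror the XPath ones and are hypothetical, and ValueError stands in for the dynamic errors the specification defines:

```python
from typing import List, TypeVar

T = TypeVar("T")


def zero_or_one(arg: List[T]) -> List[T]:
    """Like fn:zero-or-one: pass arg through if it holds at most one item."""
    if len(arg) > 1:
        raise ValueError("zero-or-one: sequence contains more than one item")
    return arg


def one_or_more(arg: List[T]) -> List[T]:
    """Like fn:one-or-more: pass arg through if it holds at least one item."""
    if not arg:
        raise ValueError("one-or-more: sequence is empty")
    return arg


def exactly_one(arg: List[T]) -> List[T]:
    """Like fn:exactly-one: pass arg through if it holds exactly one item."""
    if len(arg) != 1:
        raise ValueError(f"exactly-one: sequence contains {len(arg)} items")
    return arg


print(zero_or_one([]), one_or_more([1, 2]), exactly_one(["title"]))
```

XPath does not distinguish a single item from a one-item sequence, so exactly-one's result is simply that item; the sketch returns the one-element list because Python draws that distinction.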

    Context Functions


    Other New features in XPath 2.0


    XPath Comments

    <xsl:apply-templates 
     select="(: The difference between the context node and the 
                 current node is crucial here :)
     ../composition[@composer=current()/@id]"/>

    Namespace wildcards

    <xsl:template match="*:set">
      This matches MathML set elements, SVG set elements, set
      elements in no namespace at all, etc. 
    </xsl:template>
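Some XML APIs outside XSLT offer a comparable wildcard. For instance, Python's xml.etree.ElementTree has accepted {*}name in its limited path syntax since Python 3.8, matching that local name in any namespace or in no namespace. A short sketch with a made-up document that mixes MathML, SVG, and unqualified set elements:

```python
import xml.etree.ElementTree as ET

# Made-up sample: three set elements in different namespaces, plus a decoy.
DOC = """<root xmlns:m="http://www.w3.org/1998/Math/MathML"
      xmlns:svg="http://www.w3.org/2000/svg">
  <m:set/>
  <svg:set/>
  <set/>
  <setting/>
</root>"""

root = ET.fromstring(DOC)

# "{*}set" matches the local name set in any namespace or none (Python 3.8+),
# much like the XPath 2.0 name test *:set.
matches = root.findall("{*}set")
for el in matches:
    print(el.tag)  # ElementTree writes namespaced names as {uri}local
```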

    Can use functions as location steps


    Can use parenthesized expressions as location steps


    Dereference steps


    For Expressions


    for Example

    Consider the list of weblogs at http://static.userland.com/weblogMonitor/logs.xml

    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd">
    <weblogs>
        <log>
            <name>MozillaZine</name>
            <url>http://www.mozillazine.org</url>
            <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>
            <ownerName>Jason Kersey</ownerName>
            <ownerEmail>kerz@en.com</ownerEmail>
            <description>THE source for news on the Mozilla Organization.  DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
            <imageUrl></imageUrl>
            <adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif</adImageUrl>
        </log>
        <log>
            <name>SalonHerringWiredFool</name>
            <url>http://www.salonherringwiredfool.com/</url>
            <ownerName>Some Random Herring</ownerName>
            <ownerEmail>salonfool@wiredherring.com</ownerEmail>
            <description></description>
        </log>
        <log>
            <name>SlashDot.Org</name>
            <url>http://www.slashdot.org/</url>
            <ownerName>Simply a friend</ownerName>
            <ownerEmail>afriendofweblogs@weblogs.com</ownerEmail>
            <description>News for Nerds, Stuff that Matters.</description>
        </log>
    </weblogs>
    

    The changesUrl element points to a document like this:

    <?xml version="1.0"?>
    <!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" 
                         "http://my.netscape.com/publish/formats/rss-0.91.dtd">
    <rss version="0.91">
      <channel>
        <title>MozillaZine</title>
        <link>http://www.mozillazine.org/</link>
        <language>en-us</language>
        <description>Your source for Mozilla news, advocacy, interviews, builds, and more!</description>
        <copyright>Copyright 1998-2002, The MozillaZine Organization</copyright>
        <managingEditor>jason@mozillazine.org</managingEditor>
        <webMaster>jason@mozillazine.org</webMaster>
        <image>
          <title>MozillaZine</title>
          <url>http://www.mozillazine.org/image/mynetscape88.gif</url>
          <description>Your source for Mozilla news, advocacy, interviews, builds, and more!</description>
          <link>http://www.mozillazine.org/</link>
        </image>
    
        <item>
          <title>BugDays Are Back!</title>
          <link>http://www.mozillazine.org/talkback.html?article=2151</link>
        </item>
    
        <item>
          <title>Independent Status Reports</title>
          <link>http://www.mozillazine.org/talkback.html?article=2150</link>
        </item>
    
      </channel>
    
    </rss>
    

    We want to process all the item elements from each weblog.


    for Example

    <xsl:template match="weblogs">
      <xsl:apply-templates select="
        for $url in log/changesUrl
        return doc($url)/rss/channel/item
      "/>
    </xsl:template>
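A path applied to doc($url) starts at the document node, whose only element child in an RSS 0.91 file is rss; the item elements sit two levels further down, inside channel. A Python sketch that checks this against a trimmed inline copy of the MozillaZine sample. The DOCTYPE and most channel metadata are omitted for brevity, and the network fetch of each changesUrl is left out entirely:

```python
import xml.etree.ElementTree as ET

# Trimmed inline copy of the RSS 0.91 sample above.
RSS = """<rss version="0.91">
  <channel>
    <title>MozillaZine</title>
    <item>
      <title>BugDays Are Back!</title>
      <link>http://www.mozillazine.org/talkback.html?article=2151</link>
    </item>
    <item>
      <title>Independent Status Reports</title>
      <link>http://www.mozillazine.org/talkback.html?article=2150</link>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(RSS)  # the document element, rss

# Paths here are relative to rss, so "item" looks only at rss's own children.
misplaced = root.findall("item")       # nothing: rss contains channel, not item
items = root.findall("channel/item")   # the elements doc($url)/rss/channel/item selects

print(len(misplaced), [i.findtext("title") for i in items])
```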

    Conditional Expressions

    Not all weblogs have a changesUrl

    <xsl:template match="log">
      <xsl:apply-templates select="
        if (changesUrl)
         then document(changesUrl)
         else document(url)"/>
    </xsl:template>

    Quantified Expressions

    <xsl:template match="weblogs">
      <xsl:if test="some $log in log satisfies $log/changesUrl">
         ????
      </xsl:if>
    </xsl:template>
    
    <xsl:template match="weblogs">
      <xsl:if test="every $log in log satisfies $log/url">
        ????
      </xsl:if>
    </xsl:template>
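The some ... satisfies and every ... satisfies expressions behave like Python's any() and all() over a generator, including the edge case of an empty sequence, where some is false and every is true. A sketch over a trimmed inline copy of the weblogs sample (the names WEBLOGS, some_have_changes, and all_have_url are illustrative):

```python
import xml.etree.ElementTree as ET

# Trimmed inline copy of the weblogs sample, keeping only the elements tested here.
WEBLOGS = """<weblogs>
<log><name>MozillaZine</name><url>http://www.mozillazine.org</url>
  <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl></log>
<log><name>SalonHerringWiredFool</name><url>http://www.salonherringwiredfool.com/</url></log>
<log><name>SlashDot.Org</name><url>http://www.slashdot.org/</url></log>
</weblogs>"""

root = ET.fromstring(WEBLOGS)
logs = root.findall("log")

# some $log in log satisfies $log/changesUrl
some_have_changes = any(log.find("changesUrl") is not None for log in logs)

# every $log in log satisfies $log/url
all_have_url = all(log.find("url") is not None for log in logs)

print(some_have_changes, all_have_url)
```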

    To Learn More



    Copyright 2002, 2003 Elliotte Rusty Harold
    elharo@metalab.unc.edu
    Last Modified November 15, 2003