XSLT 2.0 and Beyond

Elliotte Rusty Harold

Wednesday, March 19, 2003

elharo@metalab.unc.edu

http://www.cafeconleche.org/

Outline

Part I: XPath 2.0
Part II: XSLT 2.0
Part III: XQuery

Versions Covered

XSLT 2.0 November 15, 2002 Working Draft: http://www.w3.org/TR/2002/WD-xslt20-20021115/
XPath 2.0 November 15, 2002 Working Draft: http://www.w3.org/TR/2002/WD-xpath20-20021115
XQuery: A Query Language for XML November 15, 2002 Working Draft: http://www.w3.org/TR/2002/WD-xquery-20021115/
XML Query Use Cases November 15, 2002 Working Draft: http://www.w3.org/TR/2002/WD-xmlquery-use-cases-20021115/
XML Query Data Model November 15, 2002 Working Draft: http://www.w3.org/TR/2002/WD-query-datamodel-20021115/
The XML Query Algebra November 15, 2002 Working Draft: http://www.w3.org/TR/query-algebra/
XML Syntax for XQuery 1.0 (XQueryX) June 7, 2001 Working Draft: http://www.w3.org/TR/2001/WD-xqueryx-20010607
XQuery 1.0 and XPath 2.0 Functions and Operators Version 1.0 November 15, 2002 Working Draft: http://www.w3.org/TR/2002/WD-xquery-operators-20021115/
XQuery 1.0 and XPath 2.0 Formal Semantics November 15, 2002 Working Draft: http://www.w3.org/TR/2002/WD-query-semantics-20021115/

XPath 2.0

Used by XSLT 2.0 and XQuery
Schema Aware
Partially implemented by Michael Kay's Saxon 7.3, http://saxon.sourceforge.net/

XPath 2.0 Goals

Simplify manipulation of XML Schema-typed content
Simplify manipulation of string content
Support related XML standards
Improve ease of use
Improve interoperability
Improve internationalization (i18n) support
Maintain backward compatibility
Enable improved processor efficiency

XPath 2.0 Requirements

Must express data model in terms of the Infoset
Must provide common core syntax and semantics for XSLT 2.0 and XML Query 1.0
Must support explicit "for any" or "for all" comparison and equality semantics
Must add min() and max() functions
Should ensure that any valid XPath 1.0 expression is also a valid XPath 2.0 expression when operating in the absence of XML Schema type information
Should provide intersection and difference functions
Must loosen restrictions on location steps
Must provide a conditional expression (e.g. the ternary ?: operator in Java and C)
Should support additional string functions, possibly including space padding, string replacement and conversion to upper or lower case
Must support regular expression string matching using the regexp syntax from schemas
Must add support for XML Schema primitive datatypes
Should add support for XML Schema structures

XPath 1.0 Data Model

(Adapted from Jeni Tennison)

The first-class objects are strings, numbers, booleans, and node-sets (plus result tree fragments for XSLT)
Node-sets contain nodes (which are not first-class objects)
Nodes have various properties, including children, which form a node-set (the order of the children can be worked out from the nodes' document order)
Seven node types: document, element, attribute, text, namespace, processing instruction, and comment
There are conceptually two kinds of node-sets:
- Node-sets containing new nodes (result tree fragments) can only be generated using XSLT
- Node-sets containing existing nodes can only be generated using XPath
No list data types: there are node-sets, but no sets or lists of numbers, strings, or booleans
Not Infoset compatible

XPath 2.0 Data Model

(Adapted from Jeni Tennison)

The first-class object type is a sequence, i.e. an ordered list
Sequences contain items of two types: simple typed values or nodes. (They may not contain other sequences.)
A sequence containing one item is the same as the item.
Simple typed values have W3C XML Schema Language simple types: xsd:gYear, xsd:int, xsd:decimal, xsd:date, etc.
Seven node types: document, element, attribute, text, namespace, processing instruction, and comment
Nodes have these properties:
- node-kind: either "document", "element", "attribute", "text", "namespace", "processing-instruction", or "comment".
- name: a sequence containing one expanded QName if the node has a name (elements, attributes, etc.) or an empty sequence if the node doesn't have a name (comments, text nodes, etc.)
- parent: a sequence containing the unique parent node; the empty sequence is returned for parentless nodes, particularly document and namespace nodes
- base-uri: URI from which this particular node came
- string-value: same as XPath 1.0
- typed-value: a sequence of simple typed values corresponding to the node (always the empty sequence for anything other than elements and attributes)
- children: A sequence of nodes (empty except for element and document nodes)
- attributes: a sequence of attribute nodes; empty except for element nodes
- namespaces: a sequence of namespace nodes in-scope on the node
- declaration: a sequence containing 0 or 1 schema component
- type: a sequence containing 0 or 1 schema component
- unique-ID: a sequence containing 0 or 1 xsd:ID type node
Infoset compatible

Working with Sequences

Constructing sequences

Parentheses enclose sequences.
In a literal sequence, the items are separated by commas:
```
(1, 3, 2, 34, 76, -87)
```
The to operator generates a range of consecutive integers without listing each one:
```
(1 to 12)
```
Using constructors:
```
(xs:date("2002-03-11"), xs:date("2002-03-12"), xs:date("2002-03-13"), xs:date("2002-03-14"), xs:date("2002-03-15"))
```
Sequences can have mixed types:
```
(xs:date("2002-03-11"), "Hello", 15)
```
Sequences do not nest; that is, a sequence cannot be a member of a sequence
Sequences are not sets: they are ordered and can contain duplicates
A single item is the same as a one-element sequence containing the item
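The following small Python model makes these rules concrete. It is only an analogy: the helper names `seq`, `to`, and `as_sequence` are hypothetical, sequences are modelled as flat tuples, and the `to` helper handles ascending ranges only.

```python
def seq(*items):
    """Build a flat sequence: nested tuples are spliced in, never kept as members."""
    flat = []
    for item in items:
        if isinstance(item, tuple):
            flat.extend(seq(*item))  # flatten nested sequences recursively
        else:
            flat.append(item)
    return tuple(flat)


def to(start, end):
    """Ascending integer range start..end inclusive, like (start to end)."""
    return tuple(range(start, end + 1))


def as_sequence(value):
    """A single item is treated as the one-item sequence containing it."""
    return value if isinstance(value, tuple) else (value,)


print(seq(1, (2, 3), ((4,),)))  # nesting disappears
print(seq(1, 2, 1))             # order and duplicates are kept
print(to(1, 12))
```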

Sequence example

<?xml version="1.0"?> 
<xsl:stylesheet version="2.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <numbers>
      <xsl:for-each select="(1 to 10)">
        <integer>
          <xsl:value-of select="."/>
        </integer>
      </xsl:for-each>
    </numbers>
  </xsl:template>

</xsl:stylesheet>

Output (modulo white space):

<?xml version="1.0" encoding="utf-8"?>
<numbers>
<integer>1</integer>
<integer>2</integer>
<integer>3</integer>
<integer>4</integer>
<integer>5</integer>
<integer>6</integer>
<integer>7</integer>
<integer>8</integer>
<integer>9</integer>
<integer>10</integer>
</numbers>

Unions of sequences

union or |
Duplicates are eliminated

<?xml version="1.0"?> 
<xsl:stylesheet version="2.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>             
                
  <xsl:template match="/">
    <numbers>
      <xsl:for-each select='(3 to 10) | (5 to 12) | (20 to 23)'>
        <integer>
          <xsl:value-of select="."/>
        </integer>
      </xsl:for-each>
    </numbers>
  </xsl:template>

</xsl:stylesheet>

Output:

<numbers>
<integer>3</integer>
<integer>4</integer>
<integer>5</integer>
<integer>6</integer>
<integer>7</integer>
<integer>8</integer>
<integer>9</integer>
<integer>10</integer>
<integer>11</integer>
<integer>12</integer>
<integer>20</integer>
<integer>21</integer>
<integer>22</integer>
<integer>23</integer>
</numbers>

Intersections of sequences

intersect

<?xml version="1.0"?> 
<xsl:stylesheet version="2.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>             
                
  <xsl:template match="/">
    <numbers>
      <xsl:for-each select='(3 to 10) intersect (5 to 12)'>
        <integer>
          <xsl:value-of select="."/>
        </integer>
      </xsl:for-each>
    </numbers>
  </xsl:template>

</xsl:stylesheet>

Output:

<numbers>
<integer>5</integer>
<integer>6</integer>
<integer>7</integer>
<integer>8</integer>
<integer>9</integer>
<integer>10</integer>
</numbers>

Except sequences

except

<?xml version="1.0"?> 
<xsl:stylesheet version="2.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>             
                
  <xsl:template match="/">
    <numbers>
      <xsl:for-each select='(3 to 10) except (5 to 12)'>
        <integer>
          <xsl:value-of select="."/>
        </integer>
      </xsl:for-each>
    </numbers>
  </xsl:template>

</xsl:stylesheet>

Output:

<numbers>
  <integer>3</integer>
  <integer>4</integer>
</numbers>
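The three set-style operators can be modelled in Python on plain integers. This sketch reproduces the three outputs shown above: duplicates are removed and results come back in ascending order. The ascending order copies that output; it is not a rule quoted from the draft. The function names are hypothetical.

```python
def to(start, end):
    """Ascending integer range, like (start to end)."""
    return range(start, end + 1)


def xp_union(*operands):
    """Every distinct value found in any operand (union or |)."""
    return tuple(sorted(set().union(*operands)))


def xp_intersect(left, right):
    """Distinct values present in both operands (intersect)."""
    return tuple(sorted(set(left) & set(right)))


def xp_except(left, right):
    """Distinct values in left that do not appear in right (except)."""
    return tuple(sorted(set(left) - set(right)))


print(xp_union(to(3, 10), to(5, 12), to(20, 23)))
print(xp_intersect(to(3, 10), to(5, 12)))
print(xp_except(to(3, 10), to(5, 12)))
```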

Data types and the PSVI

All data is typed according to XML Schema Part 2: Datatypes.
A schema is used to specify types
If no schema is available, the default type is xs:anyType or xs:anySimpleType
Operators and functions are type-aware; e.g. you can't add a string to a double or compare an integer to a year.
Constructors and casts are available to convert data to appropriate types
Automatic casting is performed on untyped data, but can fail

Accessor Functions

fn:node-kind(Node): Returns a string identifying the kind of node; i.e. "document", "element", "attribute", "text", "namespace", "processing-instruction", or "comment".
fn:name(Node): returns zero or one QName
fn:string(Object): returns the string value of anything
fn:data(Node): returns a sequence of zero or more typed simple values
fn:base-uri(node): returns the base URI of an element or document node
fn:unique-ID(element): returns the unique ID of an element

Constructor Functions

Create a value of a simple type from a string
Constructors are in the http://www.w3.org/2001/XMLSchema-datatypes namespace, which is customarily mapped to the xs prefix
Numeric constructors:
- xs:decimal(string $srcval) => decimal
- xs:integer(string $srcval) => integer
- xs:long(string $srcval) => integer
- xs:int(string $srcval) => integer
- xs:short(string $srcval) => integer
- xs:byte(string $srcval) => integer
- xs:float(string $srcval) => float
- xs:double(string $srcval) => double
String constructors:
- xs:string(string $srcval) => string
- xs:normalizedString(string $srcval) => normalizedString
- xs:token(string $srcval) => token
- xs:language(string $srcval) => language
- xs:Name(string $srcval) => Name
- xs:NMTOKEN(string $srcval) => NMTOKEN
- xs:NCName(string $srcval) => NCName
- xs:ID(string $srcval) => ID
- xs:IDREF(string $srcval) => IDREF
- xs:ENTITY(string $srcval) => ENTITY
Boolean constructors:
- xs:true() => boolean
- xs:false() => boolean
- xs:boolean-from-string(string $srcval) => boolean
Duration and Datetime constructors:
- xs:duration(string $srcval) => duration
- xs:dateTime(string $srcval) => dateTime
- xs:date(string $srcval) => date
- xs:time(string $srcval) => time
- xs:gYearMonth(string $srcval) => gYearMonth
- xs:gYear(string $srcval) => gYear
- xs:gMonthDay(string $srcval) => gMonthDay
- xs:gMonth(string $srcval) => gMonth
- xs:gDay(string $srcval) => gDay
Constructor for anyURI:
- xs:anyURI(string $srcval) => anyURI
Constructors for NOTATION:
- xs:NOTATION(string $srcval) => NOTATION
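A constructor checks the lexical form of its string argument and fails on anything invalid. The Python sketch below imitates three of them. The names `xs_integer`, `xs_decimal`, and `xs_date` are hypothetical. The regular expressions cover only the plain lexical forms: no time zones on dates, and no years before 0001 or after 9999. Failures surface as Python `ValueError` rather than as XPath errors.

```python
import re
from datetime import date
from decimal import Decimal

# Plain lexical forms only; leading/trailing whitespace is stripped first.
_INTEGER = re.compile(r"[+-]?[0-9]+")
_DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")
_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")  # no time zone suffix


def xs_integer(srcval: str) -> int:
    s = srcval.strip()
    if not _INTEGER.fullmatch(s):  # rejects "1e3" and Python-only forms like "1_000"
        raise ValueError(f"invalid xs:integer: {srcval!r}")
    return int(s)


def xs_decimal(srcval: str) -> Decimal:
    s = srcval.strip()
    if not _DECIMAL.fullmatch(s):  # rejects "1e3", "NaN", "Infinity"
        raise ValueError(f"invalid xs:decimal: {srcval!r}")
    return Decimal(s)


def xs_date(srcval: str) -> date:
    m = _DATE.fullmatch(srcval.strip())
    if not m:
        raise ValueError(f"invalid xs:date: {srcval!r}")
    # date() itself rejects impossible days such as 2002-02-30
    return date(*(int(part) for part in m.groups()))
```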

Casting

Four kinds of comparison operators

Value comparisons: compare a single value to another single value of a comparable type
General comparisons: compare two sequences; true if at least one pair of items satisfies the comparison
Node comparisons: test for node identity
Order comparisons: compare document order

Value comparison operators

Compare single values, or sequences containing at most one value:
- eq
- ne
- lt
- le
- gt
- ge
These operators return either true, false, the empty sequence, an error, or a type exception.
Types must be comparable (no automatic conversion from strings as in XPath 1.0!):
1. Subtype substitution: A derived type may substitute for its base type. In particular, integer may be used where decimal is expected.
2. Type promotion: decimal may be promoted to float, and float may be promoted to double.
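The following Python sketch models the value comparison rules above:

- An empty operand yields the empty sequence.
- An operand holding more than one item is a type error.
- Numbers are promoted from integer to decimal to double.
- Mismatched types raise an error instead of being converted.

The function and table names are hypothetical. Python's `int`, `Decimal`, and `float` stand in for xs:integer, xs:decimal, and xs:double.

```python
import operator
from decimal import Decimal

OPS = {"eq": operator.eq, "ne": operator.ne, "lt": operator.lt,
       "le": operator.le, "gt": operator.gt, "ge": operator.ge}


def _items(operand):
    """Treat a lone value as a one-item sequence; allow at most one item."""
    items = operand if isinstance(operand, tuple) else (operand,)
    if len(items) > 1:
        raise TypeError(f"value comparison needs one item, got {len(items)}")
    return items


def _is_number(x):
    return isinstance(x, (int, Decimal, float)) and not isinstance(x, bool)


def value_compare(op, left, right):
    a, b = _items(left), _items(right)
    if not a or not b:
        return ()  # an empty operand gives the empty sequence
    x, y = a[0], b[0]
    if _is_number(x) and _is_number(y):
        # Subtype substitution and promotion: integer -> decimal -> double
        if isinstance(x, float) or isinstance(y, float):
            x, y = float(x), float(y)
        elif isinstance(x, Decimal) or isinstance(y, Decimal):
            x, y = Decimal(x), Decimal(y)
    elif type(x) is not type(y):
        raise TypeError(f"cannot compare {type(x).__name__} with {type(y).__name__}")
    return OPS[op](x, y)
```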

General comparisons

Compare one sequence to another sequence
True if the condition is true for at least one pair of items, one from each sequence
- =
- !=
- <
- <=
- >
- >=
These operators always return either true or false.
An XPath 1.0 compatibility mode is available for these operators, but it is not on by default.
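General comparisons are existential, and this has a surprising consequence: `=` and `!=` can both be true for the same operands. The following minimal Python model uses a hypothetical `general_compare` function and ignores the type conversion rules.

```python
import operator
from itertools import product

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def general_compare(op, left, right):
    """True if op holds for at least one pair (x, y), x from left, y from right."""
    test = OPS[op]
    return any(test(x, y) for x, y in product(left, right))


print(general_compare("=", (1, 2), (2, 3)))
print(general_compare("!=", (1, 2), (1, 2)))
print(general_compare("=", (), (1,)))
```

For example, `(1, 2) = (2, 3)` and `(1, 2) != (1, 2)` are both true. Any comparison involving an empty sequence is false, because there is no pair to test.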

Node comparisons

is and isnot
Operands must be single nodes or empty sequences; otherwise a type error is raised.
Test for node identity like Java's == operator, not the equals() method
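Python draws the same line between identity and equality. In the following sketch, Python's `is` plays the role of the XPath `is` operator. A hypothetical `deep_equal` helper compares tags, attributes, and text recursively; it is a simplification of fn:deep-equal, not an implementation of it.

```python
import xml.etree.ElementTree as ET


def deep_equal(a, b):
    """Rough stand-in for fn:deep-equal on two elements."""
    if a.tag != b.tag or a.attrib != b.attrib or (a.text or "") != (b.text or ""):
        return False
    if len(a) != len(b):
        return False
    # Compare children pairwise, including the text that follows each child (tail).
    return all(deep_equal(x, y) and (x.tail or "") == (y.tail or "")
               for x, y in zip(a, b))


doc = ET.fromstring("<names><name>Beth</name><name>Beth</name></names>")
first, second = doc[0], doc[1]

print(first is second)            # identity: two distinct nodes
print(deep_equal(first, second))  # same structure and content
print(first is doc[0])            # the same node reached twice
```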

Order comparisons

>> and << compare single nodes for document order
The << operator returns true if the first operand node is reachable from the second operand node using the preceding axis; otherwise it returns false.
The >> operator returns true if the first operand node is reachable from the second operand node using the following axis; otherwise it returns false.

Functions and operators

Functions are in the http://www.w3.org/2002/11/xquery-functions namespace which is customarily mapped to the fn prefix
The function namespace name and prefix are understood in XSLT without being explicitly declared.
Operators are in the http://www.w3.org/2002/11/xquery-operators namespace
Host languages such as XQuery and XSLT map the operators to symbols like * and +
These namespace URIs will change

Arithmetic operators

op:multiply(numeric $operand1, numeric $operand2) => numeric
op:numeric-add(numeric $operand1, numeric $operand2) => numeric
op:numeric-subtract(numeric $operand1, numeric $operand2) => numeric
op:numeric-multiply(numeric $operand1, numeric $operand2) => numeric
op:numeric-divide(numeric $operand1, numeric $operand2) => numeric
op:numeric-integer-divide(integer $operand1, integer $operand2) => integer
op:numeric-mod(numeric $operand1, numeric $operand2) => numeric
op:numeric-unary-plus(numeric $operand) => numeric
op:numeric-unary-minus(numeric $operand) => numeric

Numeric comparison operators

op:numeric-equal(numeric $operand1, numeric $operand2) => boolean
op:numeric-less-than(numeric $operand1, numeric $operand2) => boolean
op:numeric-greater-than(numeric $operand1, numeric $operand2) => boolean

Numeric Functions

fn:floor(double? $srcval) => integer?
fn:ceiling(double? $srcval) => integer?
fn:round(double? $srcval) => integer?

String functions

fn:concat() => string
fn:concat(string? $op1) => string
fn:concat(string? $op1, string? $op2, ...) => string
fn:string-join(string* $operand1, string* $operand2) => string
fn:starts-with(string? $operand1, string? $operand2) => boolean?
fn:starts-with(string? $operand1, string? $operand2, anyURI $collationLiteral) => boolean?
fn:ends-with(string? $operand1, string? $operand2) => boolean?
fn:ends-with(string? $operand1, string? $operand2, anyURI $collationLiteral) => boolean?
fn:contains(string? $operand1, string? $operand2) => boolean?
fn:contains(string? $operand1, string? $operand2, anyURI $collationLiteral) => boolean?
fn:substring(string? $sourceString, decimal? $startingLoc) => string?
fn:substring(string? $sourceString, decimal? $startingLoc, decimal? $length) => string?
fn:string-length(string? $srcval) => integer?
fn:substring-before(string? $operand1, string? $operand2) => string?
fn:substring-before(string? $operand1, string? $operand2, anyURI $collationLiteral) => string?
fn:substring-after(string? $operand1, string? $operand2) => string?
fn:substring-after(string? $operand1, string? $operand2, anyURI $collationLiteral) => string?
fn:normalize-space(string? $srcval) => string?
fn:normalize-unicode(string? $srcval, string $normalizationForm) => string?
fn:upper-case(string? $srcval) => string?
fn:lower-case(string? $srcval) => string?
fn:translate(string? $srcval, string? $mapString, string? $transString) => string?
fn:string-pad(string? $padString, decimal? $padCount) => string?
fn:matches(string? $srcval, string? $regexp) => integer*
fn:replace(string? $srcval, string? $regexp, string? $repval) => string?
fn:tokenize(string? $input, string? $pattern) => string*
fn:tokenize(string? $input, string? $pattern, string? $flags) => string*
fn:escape-uri(string $uri-part, boolean $escape-reserved) => string

Regular expressions

Syntax for fn:matches() is based on W3C XML Schema Language regular expressions.
Syntax for fn:replace() is based on W3C XML Schema Language regular expressions, plus $N in the replacement string to refer to the substring matched by the Nth parenthesized subexpression.
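Python's `re.sub` writes group references as `\g<N>` rather than `$N`. A rough emulation of fn:replace therefore only has to translate the replacement string. This sketch has stated limits:

- The name `xpath_replace` is hypothetical.
- Only single-digit `$N` references are translated.
- The pattern itself is interpreted with Python's regular expression syntax, which is close to, but not the same as, XML Schema's.

```python
import re

_DOLLAR_REF = re.compile(r"\$([0-9])")


def xpath_replace(srcval, regexp, repval):
    """Rough emulation of fn:replace: $N in repval means the Nth captured group."""
    # Double any backslashes so Python treats them literally, then turn $N into \g<N>.
    python_repl = _DOLLAR_REF.sub(r"\\g<\1>", repval.replace("\\", "\\\\"))
    return re.sub(regexp, python_repl, srcval)


print(xpath_replace("2003-03-19", r"([0-9]+)-([0-9]+)-([0-9]+)", "$3/$2/$1"))
```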

Boolean Functions

op:boolean-and(boolean $value1, boolean $value2) => boolean
op:boolean-or(boolean $value1, boolean $value2) => boolean
op:boolean-equal(boolean? $value1, boolean? $value2) => boolean?
fn:not(boolean? $srcval) => boolean

Date and time functions

xs:duration is only partially ordered (is P1M longer or shorter than P30D?), so two new totally ordered subtypes, yearMonthDuration and dayTimeDuration, are defined.
Comparisons of Duration and Datetime Values:
- op:duration-equal(duration $operand1, duration $operand2) => boolean
- op:gYearMonth-equal(gYearMonth $operand1, gYearMonth $operand2) => boolean
- op:gYear-equal(gYear $operand1, gYear $operand2) => boolean
- op:gMonthDay-equal(gMonthDay $operand1, gMonthDay $operand2) => boolean
- op:gMonth-equal(gMonth $operand1, gMonth $operand2) => boolean
- op:gDay-equal(gDay $operand1, gDay $operand2) => boolean
- op:yearMonthDuration-equal(yearMonthDuration $operand1, yearMonthDuration $operand2) => boolean
- op:yearMonthDuration-less-than(yearMonthDuration $operand1, yearMonthDuration $operand2) => boolean
- op:yearMonthDuration-greater-than(yearMonthDuration $operand1, yearMonthDuration $operand2) => boolean
- op:dayTimeDuration-equal(dayTimeDuration $operand1, dayTimeDuration $operand2) => boolean
- op:dayTimeDuration-less-than(dayTimeDuration $operand1, dayTimeDuration $operand2) => boolean
- op:dayTimeDuration-greater-than(dayTimeDuration $operand1, dayTimeDuration $operand2) => boolean
- op:dateTime-equal(dateTime $operand1, dateTime $operand2) => boolean
- op:dateTime-less-than(dateTime $operand1, dateTime $operand2) => boolean
- op:dateTime-greater-than(dateTime $operand1, dateTime $operand2) => boolean
- op:time-equal(time $operand1, time $operand2) => boolean
- op:time-less-than(time $operand1, time $operand2) => boolean
- op:time-greater-than(time $operand1, time $operand2) => boolean
- op:date-equal(date $operand1, date $operand2) => boolean
- op:date-less-than(date $operand1, date $operand2) => boolean
- op:date-greater-than(date $operand1, date $operand2) => boolean
Component Extraction Functions on Duration, Date and Time Values:
- fn:get-years-from-yearMonthDuration(yearMonthDuration $srcval) => integer
- fn:get-months-from-yearMonthDuration(yearMonthDuration $srcval) => integer
- fn:get-days-from-dayTimeDuration(dayTimeDuration $srcval) => integer
- fn:get-hours-from-dayTimeDuration(dayTimeDuration $srcval) => integer
- fn:get-minutes-from-dayTimeDuration(dayTimeDuration $srcval) => integer
- fn:get-seconds-from-dayTimeDuration(dayTimeDuration $srcval) => integer
- fn:get-year-from-dateTime(dateTime $srcval) => integer
- fn:get-month-from-dateTime(dateTime $srcval) => integer
- fn:get-day-from-dateTime(dateTime $srcval) => integer
- fn:get-hours-from-dateTime(dateTime $srcval) => integer
- fn:get-minutes-from-dateTime(dateTime $srcval) => integer
- fn:get-seconds-from-dateTime(dateTime $srcval) => integer
- fn:get-timezone-from-dateTime(dateTime $srcval) => integer
- fn:get-year-from-date(date $srcval) => integer
- fn:get-month-from-date(date $srcval) => integer
- fn:get-day-from-date(date $srcval) => integer
- fn:get-timezone-from-date(date $srcval) => integer
- fn:get-hours-from-time(time $srcval) => integer
- fn:get-minutes-from-time(time $srcval) => integer
- fn:get-seconds-from-time(time $srcval) => integer
- fn:get-timezone-from-time(time $srcval) => integer
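Why split xs:duration? A duration of one month does not correspond to a fixed number of days, so whether P1M is longer than P30D depends on which month you start from. A short Python check, using hypothetical helper names, illustrates this.

```python
from datetime import date


def days_spanned_by_p1m(year, month):
    """Days from the 1st of the given month to the 1st of the next month."""
    next_first = date(year + month // 12, month % 12 + 1, 1)
    return (next_first - date(year, month, 1)).days


def compare_p1m_with_days(days, year, month):
    """-1, 0 or 1: is P1M shorter than, equal to, or longer than `days` days here?"""
    span = days_spanned_by_p1m(year, month)
    return (span > days) - (span < days)


for month in (2, 3, 4):
    print(month, days_spanned_by_p1m(2003, month), compare_p1m_with_days(30, 2003, month))
```

A yearMonthDuration counts only years and months, and a dayTimeDuration counts only days and smaller units. Each can therefore be reduced to a single number and compared without ambiguity.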

Qualified Name Functions

fn:QName-in-context(string $qname, boolean $use-default) => QName
fn:QName-in-context(string $qname, boolean $use-default, node $node) => QName
fn:get-local-name-from-QName(QName? $srcval) => string?
fn:get-namespace-from-QName(QName? $srcval) => anyURI?
fn:get-namespace-uri-for-prefix(element $element, string $prefix) => string?
fn:get-in-scope-namespaces(element $element) => string*

Binary operators

op:hexBinary-equal(hexBinary $value1, hexBinary $value2) => boolean
op:base64Binary-equal(base64Binary $value1, base64Binary $value2) => boolean

Node Functions

fn:name() => string
fn:name(node $srcval) => string
fn:local-name() => string
fn:local-name(node $srcval) => string
fn:namespace-uri() => string
fn:namespace-uri(node $srcval) => string
fn:root() => node
fn:root(node $srcval) => node
fn:number() => double
fn:number(node $srcval) => double
fn:deep-equal(node $parameter1, node $parameter2) => boolean
fn:deep-equal(node $parameter1, node $parameter2, anyURI $collation) => boolean
fn:copy(node? $srcval) => node?
fn:lang(string $testlang) => boolean

Sequence Functions

fn:boolean(item* $srcval) => boolean
op:concatenate(item* $seq1, item* $seq2) => item*
op:item-at(item* $seqParam, decimal $posParam) => item?
fn:index-of(item* $seqParam, item $srchParam) => unsignedInt?
fn:index-of(item* $seqParam, item $srchParam, anyURI $collationLiteral) => unsignedInt?
fn:empty(item* $srcval) => boolean
fn:exists(item* $srcval) => boolean
fn:distinct-nodes(node* $srcval) => node*
fn:distinct-values(item* $srcval) => item*
fn:distinct-values(item* $srcval, anyURI $collationLiteral) => item*
fn:insert(item* $target, decimal $position, item* $inserts) => item*
fn:remove(item* $target, decimal $position) => item*
fn:subsequence(item* $sourceSeq, decimal $startingLoc) => item*
fn:subsequence(item* $sourceSeq, decimal $startingLoc, decimal $length) => item*
fn:sequence-deep-equal(item* $parameter1, item* $parameter2) => boolean?
fn:sequence-deep-equal(item* $parameter1, item* $parameter2, anyURI $collationLiteral) => boolean?
fn:sequence-node-equal(item* $parameter1, item* $parameter2) => boolean?
fn:count(item* $srcval) => unsignedInt
fn:avg(item* $srcval) => double?
fn:max(item* $srcval) => anySimpleType?
fn:max(item* $srcval, anyURI $collationLiteral) => anySimpleType?
fn:min(item* $srcval) => anySimpleType?
fn:min(item* $srcval, anyURI $collationLiteral) => anySimpleType?
fn:sum(item* $srcval) => double?
fn:id(IDREF* $srcval) => elementNode*
fn:idref(string* $srcval) => elementNode*
fn:collection(string $srcval) => node*
fn:input() => node*
fn:document(string? $srcval) => node?

Context Functions

op:context-item() => item
fn:position() => unsignedInt
fn:last() => unsignedInt
op:context-document() => DocumentNode
fn:current-dateTime() => dateTime
fn:current-time() => time
fn:current-date() => date
fn:default-collation() => anyURI?
fn:implicit-timezone() => dayTimeDuration?

Other New Features in XPath 2.0

Comments
Namespace wildcards
Functions as location steps
Parenthesized expressions as location steps
Dereference steps
For Expressions
Conditional Expressions
Quantified Expressions

XPath Comments

{-- This is an XPath comment --}

<xsl:apply-templates 
 select="{-- The difference between the context node and the 
             current node is crucial here --}
 ../composition[@composer=current()/@id]"/>

Namespace wildcards

<xsl:template match="*:set">
  This matches MathML set elements, SVG set elements, set
  elements in no namespace at all, etc. 
</xsl:template>

Can use functions as location steps

The document() function returns the root of a document at a given URL
document("http://www.cafeconleche.org/")//today

Can use parenthesized expressions as location steps

/child::contacts/(child::personal | child::business)/child::name
Abbreviated: /contacts/(personal | business)/name

Dereference steps

Map an IDREF attribute node to the element it refers to

Composers and their compositions are linked through the ID-type id attribute of the composer element and the IDREF-type composers attribute of the composition element:

  <composer id="c3">
    <name>
      <first_name>Beth</first_name> 
      <middle_name></middle_name> 
      <last_name>Anderson</last_name>
    </name>
  </composer>
    
  <composition composers="c3">
    <title>Trio: Dream in D</title>
    <date><year>(1980)</year></date> 
    <length>10'</length>
    <instruments>fl, pn, vc, or vn, pn, vc</instruments>
    <description>
      Rhapsodic. Passionate. Available on CD 
      <cite><a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid%3D913265342/sr%3D1-2/">Two by Three</a></cite> 
      from North/South Consonance (1998).
    </description> 
    <publisher></publisher>
  </composition>

With XPath 1.0:

<xsl:template match="composition">
  <h2>
    <xsl:value-of select="title"/> by
    <xsl:value-of select="../composer[@id=current()/@composers]/name"/>
  </h2>
</xsl:template>

With XPath 2.0:

<xsl:template match="composition">
  <h2>
    <xsl:value-of select="title"/> by
    <xsl:value-of select="@composers=>composer/name"/>
  </h2>
</xsl:template>
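The dereference step can be approximated with ElementTree. The approach is to index elements on their id attribute and then look up each token of the IDREF value. This sketch rests on assumptions. ElementTree does not read DTD or schema type information, so the hypothetical `dereference` helper simply treats any attribute named id as the ID. It also keeps only targets with the requested element name.

```python
import xml.etree.ElementTree as ET


def dereference(root, idrefs, name):
    """Approximate `idrefs=>name`: follow each IDREF token to the element with that id."""
    # Assumption: an attribute literally called "id" is the ID (no DTD/schema is read).
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    return [by_id[token] for token in idrefs.split()
            if token in by_id and by_id[token].tag == name]


doc = ET.fromstring("""<catalog>
  <composer id="c3"><name><last_name>Anderson</last_name></name></composer>
  <composition composers="c3"><title>Trio: Dream in D</title></composition>
</catalog>""")

composition = doc.find("composition")
for composer in dereference(doc, composition.get("composers"), "composer"):
    print(composition.findtext("title"), "by", composer.findtext("name/last_name"))
```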

For Expressions

Useful for joining documents
Useful for restructuring data

Syntax:

for $var1 in expression, $var2 in expression...
return expression
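A for expression behaves much like a nested loop whose results are concatenated into one flat sequence. The following hypothetical Python helper models that for independent input sequences. It does not model a later range expression that depends on an earlier variable, as in `for $x in log, $y in $x/url`.

```python
from itertools import product


def for_return(body, *sequences):
    """Model of `for $a in A, $b in B ... return body`, for independent A, B, ..."""
    result = []
    for binding in product(*sequences):  # first variable varies slowest
        value = body(*binding)
        if isinstance(value, tuple):
            result.extend(value)  # sequences never nest, so splice results in
        else:
            result.append(value)
    return tuple(result)


# for $x in (1, 2, 3) return $x * 2
print(for_return(lambda x: x * 2, (1, 2, 3)))
# for $x in (1, 2), $y in (10, 20) return $x * $y
print(for_return(lambda x, y: x * y, (1, 2), (10, 20)))
```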

for Example

Consider the list of weblogs at http://static.userland.com/weblogMonitor/logs.xml

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd">
<weblogs>
    <log>
        <name>MozillaZine</name>
        <url>http://www.mozillazine.org</url>
        <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>
        <ownerName>Jason Kersey</ownerName>
        <ownerEmail>kerz@en.com</ownerEmail>
        <description>THE source for news on the Mozilla Organization.  DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
        <imageUrl></imageUrl>
        <adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif</adImageUrl>
    </log>
    <log>
        <name>SalonHerringWiredFool</name>
        <url>http://www.salonherringwiredfool.com/</url>
        <ownerName>Some Random Herring</ownerName>
        <ownerEmail>salonfool@wiredherring.com</ownerEmail>
        <description></description>
    </log>
    <log>
        <name>SlashDot.Org</name>
        <url>http://www.slashdot.org/</url>
        <ownerName>Simply a friend</ownerName>
        <ownerEmail>afriendofweblogs@weblogs.com</ownerEmail>
        <description>News for Nerds, Stuff that Matters.</description>
    </log>
</weblogs>

The changesUrl element points to a document like this:

<?xml version="1.0"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" 
                     "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <title>MozillaZine</title>
    <link>http://www.mozillazine.org/</link>
    <language>en-us</language>
    <description>Your source for Mozilla news, advocacy, interviews, builds, and more!</description>
    <copyright>Copyright 1998-2002, The MozillaZine Organization</copyright>
    <managingEditor>jason@mozillazine.org</managingEditor>
    <webMaster>jason@mozillazine.org</webMaster>
    <image>
      <title>MozillaZine</title>
      <url>http://www.mozillazine.org/image/mynetscape88.gif</url>
      <description>Your source for Mozilla news, advocacy, interviews, builds, and more!</description>
      <link>http://www.mozillazine.org/</link>
    </image>

    <item>
      <title>BugDays Are Back!</title>
      <link>http://www.mozillazine.org/talkback.html?article=2151</link>
    </item>

    <item>
      <title>Independent Status Reports</title>
      <link>http://www.mozillazine.org/talkback.html?article=2150</link>
    </item>

  </channel>

</rss>

We want to process all the item elements from each weblog.

for Example

<xsl:template match="weblogs">
  <xsl:apply-templates select="
    for $url in log/changesUrl
    return document($url)/rss/channel/item
  "/>
</xsl:template>

Conditional Expressions

if (expression) then expression else expression

Not all weblogs have a changesUrl

<xsl:template match="log">
  <xsl:apply-templates select="
    if (changesUrl)
     then document(changesUrl)
     else document(url)"/>
</xsl:template>

Quantified Expressions

some $QualifiedName in expression satisfies expression
every $QualifiedName in expression satisfies expression
Both return boolean values, true or false

<xsl:template match="weblogs">
  <xsl:if test="some $log in log satisfies $log/changesUrl">
     ????
  </xsl:if>
</xsl:template>

<xsl:template match="weblogs">
  <xsl:if test="every $log in log satisfies $log/url">
    ????
  </xsl:if>
</xsl:template>
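In Python the two quantifiers correspond to `any()` and `all()`. The following sketch runs both tests over a cut-down copy of the weblogs document. The satisfies clause refers to each `$log`, just as the Python tests inspect each `log`. Like `every`, `all()` returns true for an empty input.

```python
import xml.etree.ElementTree as ET

# A cut-down copy of the weblogs document (only the elements the tests need).
weblogs = ET.fromstring("""<weblogs>
  <log>
    <name>MozillaZine</name>
    <url>http://www.mozillazine.org</url>
    <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>
  </log>
  <log>
    <name>SlashDot.Org</name>
    <url>http://www.slashdot.org/</url>
  </log>
</weblogs>""")
logs = weblogs.findall("log")

# some $log in log satisfies $log/changesUrl
some_have_changes = any(log.find("changesUrl") is not None for log in logs)
# every $log in log satisfies $log/url
every_has_url = all(log.find("url") is not None for log in logs)
# every $log in log satisfies $log/changesUrl
every_has_changes = all(log.find("changesUrl") is not None for log in logs)

print(some_have_changes, every_has_url, every_has_changes)
```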

XSLT 2.0

Uses XPath 2.0
Schema Aware
Partially implemented by Michael Kay's Saxon 7.3, http://saxon.sourceforge.net/

XSLT 2.0 Goals

Simplify manipulation of XML Schema-typed content
Simplify manipulation of string content
Support related XML standards
Improve ease of use
Improve interoperability
Improve i18n support
Maintain backward compatibility
Enable improved processor efficiency

XSLT 2.0 Non-goals

Simplifying the ability to parse unstructured information to produce structured results.
Turning XSLT into a general-purpose programming language

XSLT 2.0 Requirements

Must maintain backwards compatibility with XSLT 1.1
Should be able to match elements and attributes whose value is explicitly null.
Should allow included documents to encapsulate local stylesheets
Could support accessing infoset items for XML declaration
Could provide qualified name aware string functions
Could enable constructing a namespace with computed name
Could simplify resolving prefix conflicts in qname-valued attributes
Could support XHTML output method
Must allow matching on default namespace without explicit prefix
Must add date formatting functions
Must simplify accessing IDs and keys in other documents
Should provide function to absolutize relative URIs
Should include unparsed text from an external resource
Should allow authoring extension functions in XSLT
Should output character entity references instead of numeric character references
Should construct entity reference by name
Should support Unicode string normalization
Should standardize extension element language bindings
Could improve efficiency of transformations on large documents
Could support reverse IDREF attributes
Could support case-insensitive comparisons
Could support lexicographic string comparisons
Could allow comparing nodes based on document order
Could improve support for unparsed entities
Could allow processing a node with the "next best matching" template
Could make coercions symmetric by allowing scalar to nodeset conversion
Must support XML schema
Must simplify constructing and copying typed content
Must support sorting nodes based on XML schema type
Could support scientific notation in number formatting
Could provide ability to detect whether "rich" schema information is available
Must simplify grouping

Identifying 2.0 stylesheets

Namespace is still http://www.w3.org/1999/XSL/Transform
version attribute of xsl:stylesheet has value 2.0

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Top level elements -->

</xsl:stylesheet>

No result tree fragments

The result tree fragment data-type has been eliminated.
Variable-binding elements with content now construct sequences of nodes
These node sequences can now be operated on by templates
Functionality previously available with saxon:nodeSet() and similar extension functions
Allows pipelining of templates

xsl:for-each-group

Like xsl:for-each, but iterates over groups of items rather than individual items
Works well with flat structures
Replaces the Muenchian method (grouping with xsl:key)

Basic syntax:

<xsl:for-each-group
  select = expression
  group-by = "string expression"
  group-adjacent = "string expression"
  group-starting-with = pattern>
  <!-- Content: (xsl:sort*, content-constructor) -->
</xsl:for-each-group>

The select attribute selects the population to be grouped.
The group-by attribute calculates a string value for each node in the population. Nodes with the same value are grouped together.
The group-adjacent attribute calculates a string value for each node in the population. Every time the value changes, a new group is started.
The group-starting-with attribute starts a new group every time its pattern matches.
group-by, group-adjacent, and group-starting-with are mutually exclusive.
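The three grouping modes correspond to familiar list algorithms. These Python functions are hypothetical stand-ins that operate on plain lists rather than node sequences, and the `sections` list is made-up test data. `itertools.groupby` already implements the group-adjacent behaviour. Returning group-by groups in order of first appearance is an assumption of this sketch.

```python
from itertools import groupby


def group_by(population, key):
    """group-by: equal keys share a group; groups in order of first appearance."""
    groups = {}
    for item in population:
        groups.setdefault(key(item), []).append(item)
    return list(groups.items())


def group_adjacent(population, key):
    """group-adjacent: start a new group whenever the key changes."""
    return [(k, list(g)) for k, g in groupby(population, key)]


def group_starting_with(population, starts_group):
    """group-starting-with: start a new group at each item matching the pattern."""
    groups = []
    for item in population:
        if starts_group(item) or not groups:
            groups.append([item])
        else:
            groups[-1].append(item)
    return groups


sections = ["developers", "articles", "articles", "developers"]
print(group_by(sections, str))
print(group_adjacent(sections, str))
print(group_starting_with(["h1", "p", "p", "h1", "p"], lambda x: x == "h1"))
```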

Grouping example: input

Task: Arrange articles in a large, flat document like this by section:

<?xml version="1.0"?>
<backslash>

<story>
<title>ROX Desktop Update</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/180240</url>
<time>2002-02-18 18:50:13</time>
<author>timothy</author>
<department>small-simple-swift</department>
<topic>104</topic>
<comments>32</comments>
<section>developers</section>
<image>topicx.jpg</image>
</story>

<story>
<title>HP Selling Systems With Linux</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/1712241</url>
<time>2002-02-18 17:37:20</time>
<author>timothy</author>
<department>wish-this-wasn't-remarkable</department>
<topic>173</topic>
<comments>188</comments>
<section>articles</section>
<image>topichp.gif</image>
</story>

<story>
<title>Excellent Hacks to the ReplayTV 4000</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/1619213</url>
<time>2002-02-18 16:46:04</time>
<author>CmdrTaco</author>
<department>hardware-I-lust-after</department>
<topic>129</topic>
<comments>117</comments>
<section>articles</section>
<image>topictv.jpg</image>
</story>

<story>
<title>Peek-a-Boo(ty)</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/1546226</url>
<time>2002-02-18 15:58:06</time>
<author>Hemos</author>
<department>pirate-treasure</department>
<topic>158</topic>
<comments>207</comments>
<section>articles</section>
<image>topicprivacy.gif</image>
</story>

<story>
<title>Self-Shredding E-Mail</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/1343202</url>
<time>2002-02-18 14:37:45</time>
<author>timothy</author>
<department>plausible-deniability</department>
<topic>158</topic>
<comments>170</comments>
<section>articles</section>
<image>topicprivacy.gif</image>
</story>

<story>
<title>CIA &amp;amp; KGB Gadgets On Display</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/0252219</url>
<time>2002-02-18 13:52:04</time>
<author>Hemos</author>
<department>looking-a-tthe-gear</department>
<topic>126</topic>
<comments>103</comments>
<section>articles</section>
<image>topictech2.gif</image>
</story>

<story>
<title>Re-Building the Wright Flyer</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/060257</url>
<time>2002-02-18 12:29:12</time>
<author>timothy</author>
<department>we-hope-they-wear-modern-helmets</department>
<topic>126</topic>
<comments>132</comments>
<section>science</section>
<image>topictech2.gif</image>
</story>

<story>
<title>How to Fix the Unix Configuration Nightmare</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/0248248</url>
<time>2002-02-18 10:48:36</time>
<author>Hemos</author>
<department>fixing-the-problem</department>
<topic>130</topic>
<comments>367</comments>
<section>articles</section>
<image>topicunix.jpg</image>
</story>

<story>
<title>Sleep Less, Live Longer</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/0511253</url>
<time>2002-02-18 07:38:15</time>
<author>timothy</author>
<department>if-you're-reading-this</department>
<topic>134</topic>
<comments>309</comments>
<section>science</section>
<image>topicscience.gif</image>
</story>

<story>
<title>Warming and Slowing the World</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/0243253</url>
<time>2002-02-18 04:39:39</time>
<author>Hemos</author>
<department>slowing-things-down</department>
<topic>134</topic>
<comments>312</comments>
<section>science</section>
<image>topicscience.gif</image>
</story>

</backslash>

Grouping example: desired output

<?xml version="1.0"?>
<forwardslash>

<section>
  <title>developers</title>
<story>
<title>ROX Desktop Update</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/180240</url>
<time>2002-02-18 18:50:13</time>
<author>timothy</author>
<department>small-simple-swift</department>
<topic>104</topic>
<comments>32</comments>
<image>topicx.jpg</image>
</story>

</section>

<section>
  <title>articles</title>

<story>
<title>HP Selling Systems With Linux</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/1712241</url>
<time>2002-02-18 17:37:20</time>
<author>timothy</author>
<department>wish-this-wasn't-remarkable</department>
<topic>173</topic>
<comments>188</comments>
<image>topichp.gif</image>
</story>

<story>
<title>Excellent Hacks to the ReplayTV 4000</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/1619213</url>
<time>2002-02-18 16:46:04</time>
<author>CmdrTaco</author>
<department>hardware-I-lust-after</department>
<topic>129</topic>
<comments>117</comments>
<image>topictv.jpg</image>
</story>

<story>
<title>Peek-a-Boo(ty)</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/1546226</url>
<time>2002-02-18 15:58:06</time>
<author>Hemos</author>
<department>pirate-treasure</department>
<topic>158</topic>
<comments>207</comments>
<image>topicprivacy.gif</image>
</story>

<story>
<title>Self-Shredding E-Mail</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/1343202</url>
<time>2002-02-18 14:37:45</time>
<author>timothy</author>
<department>plausible-deniability</department>
<topic>158</topic>
<comments>170</comments>
<image>topicprivacy.gif</image>
</story>

<story>
<title>CIA &amp;amp; KGB Gadgets On Display</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/0252219</url>
<time>2002-02-18 13:52:04</time>
<author>Hemos</author>
<department>looking-a-tthe-gear</department>
<topic>126</topic>
<comments>103</comments>
<image>topictech2.gif</image>
</story>


<story>
<title>How to Fix the Unix Configuration Nightmare</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/0248248</url>
<time>2002-02-18 10:48:36</time>
<author>Hemos</author>
<department>fixing-the-problem</department>
<topic>130</topic>
<comments>367</comments>
<image>topicunix.jpg</image>
</story>


</section>
<section>
  <title>science</title>


<story>
<title>Re-Building the Wright Flyer</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/060257</url>
<time>2002-02-18 12:29:12</time>
<author>timothy</author>
<department>we-hope-they-wear-modern-helmets</department>
<topic>126</topic>
<comments>132</comments>
<image>topictech2.gif</image>
</story>


<story>
<title>Sleep Less, Live Longer</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/0511253</url>
<time>2002-02-18 07:38:15</time>
<author>timothy</author>
<department>if-you're-reading-this</department>
<topic>134</topic>
<comments>309</comments>
<image>topicscience.gif</image>
</story>

<story>
<title>Warming and Slowing the World</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/0243253</url>
<time>2002-02-18 04:39:39</time>
<author>Hemos</author>
<department>slowing-things-down</department>
<topic>134</topic>
<comments>312</comments>
<image>topicscience.gif</image>
</story>

</section>

</forwardslash>

Grouping example: stylesheet

<?xml version="1.0"?> 
<xsl:stylesheet version="2.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <forwardslash>
      <xsl:apply-templates select="*"/>
    </forwardslash>
  </xsl:template>

  <xsl:template match="backslash">
    <xsl:for-each-group select="story" group-by="section">
      <section>
        <title><xsl:value-of select="current-group()[1]/section"/></title>
        <xsl:apply-templates select="current-group()"/>
      </section>
    </xsl:for-each-group>
  </xsl:template>

  <xsl:template match="story">
    <story>
      <xsl:apply-templates/>
    </story>
  </xsl:template>

  <xsl:template match="*">
    <xsl:copy-of select="."/>
  </xsl:template>

  <xsl:template match="section"/>

</xsl:stylesheet>

xsl:result-document

Creates an additional result tree with its own URI; a stylesheet can create several of these
Allows you to generate multiple documents from one source document
Previously available with extension elements like xt:document and saxon:output

Syntax:

<!-- Category: instruction -->
<xsl:result-document
  format = "QualifiedName"
  href   = "uri-reference">
  <!-- Content: content-constructor -->
</xsl:result-document>

The format attribute names an xsl:output element for this result document.

xsl:result-document Example

     <xsl:output name="ccl:html" method="html" encoding="ISO-8859-1" />

     <xsl:result-document href="index.html" format="ccl:html">
       <html>
         <head>
           <title><xsl:value-of select="title"/></title>         
         </head>
         <body> 
           <h1 align="center"><xsl:value-of select="title"/></h1> 
           <ul>
             <xsl:for-each select="slide">
               <li><a href="{format-number(position(),'00')}.html"><xsl:value-of select="title"/></a></li>
             </xsl:for-each>    
           </ul>           
           
           <p><a href="{translate(title,' ', '_')}.html">Entire Presentation as Single File</a></p>
              
           <hr/>
           <div align="center">
             <A HREF="01.html">Start</A> | <A HREF="/xml/">Cafe con Leche</A>
           </div>
           <hr/>
           <font size="-1">
              Copyright 2002 
              <a href="http://www.elharo.com/">Elliotte Rusty Harold</a><br/>       
              <a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a><br/>
              Last Modified <xsl:apply-templates select="last_modified" mode="lm"/>
           </font>
         </body>     
       </html>     
     </xsl:result-document>
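The two href attribute value templates in this example decide which file names the index page links to. As a rough illustration, this Python sketch performs the same string computations; the slide titles are hypothetical, and positions start at 1 as position() does.

```python
def slide_href(position):
    """{format-number(position(),'00')}.html: position zero-padded to two digits."""
    return f"{position:02d}.html"


def single_file_href(title):
    """{translate(title,' ','_')}.html: every space replaced by an underscore."""
    return title.translate(str.maketrans(" ", "_")) + ".html"


# Hypothetical slide titles, numbered from 1 like position().
slides = ["XPath 2.0", "XSLT 2.0", "XQuery"]
print([slide_href(i) for i, _ in enumerate(slides, start=1)])
# ['01.html', '02.html', '03.html']
print(single_file_href("XSLT 2.0 and Beyond"))
# XSLT_2.0_and_Beyond.html
```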

Sorting

<xsl:sort-key
  name = "QualifiedName">
  <!-- Content: (xsl:sort+) -->
</xsl:sort-key>

xsl:namespace

Attaches an additional namespace node to a result tree element
Rarely necessary; normally the usual XSLT 1.0 namespace declarations are sufficient.
Occasionally useful if the output document uses a namespace prefix exclusively in element content or attribute values

<xsl:namespace name="xsd">http://www.w3.org/2001/XMLSchema</xsl:namespace>

Value of a sequence

The separator attribute gives the string placed between the string values of successive items in the sequence

<x><xsl:value-of select="(1,2,3,4)" separator=" | "/></x>

<x>1 | 2 | 3 | 4</x>

default-xpath-namespace

An attribute that specifies the default namespace in effect for unprefixed element names used in XPath expressions within this element and its descendants
Can be used on literal result elements, in which case it is in the XSLT namespace and the attribute is prefixed as xsl:default-xpath-namespace
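Python's ElementTree path language has a comparable switch, which may help show what default-xpath-namespace changes. This is an analogy, not XSLT: an unprefixed name normally matches only elements in no namespace, but an empty-string key in the namespaces mapping (Python 3.8 and later) makes unprefixed names refer to the XHTML namespace instead. The sample document is made up for the sketch.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><h1>Week 1</h1><hr/><p>Notes</p></body></html>"
)

# Without a default, an unprefixed name matches only no-namespace elements.
print(doc.findall("body"))  # []

# The "" key sets a default namespace for unprefixed names (Python 3.8+),
# much as default-xpath-namespace does for XPath expressions in XSLT 2.0.
ns = {"": XHTML}
print(doc.find("body/h1", ns).text)  # Week 1

# The XSLT 1.0 approach: bind a prefix and use it on every name.
print(doc.find("html:body/html:h1", {"html": XHTML}).text)  # Week 1
```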

An XSLT 1.0 stylesheet for working with XHTML

XPath expressions must use a prefix to match XHTML element names.

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml" 
  xmlns:html="http://www.w3.org/1999/xhtml" 
>

  <xsl:output method="html" encoding="ISO-8859-1"/>

  <xsl:template match="week">
    <html xml:lang="en" lang="en">
      <head><title><xsl:value-of select="//html:h1[1]"/></title></head>
      <body bgcolor="#ffffff" text="#000000">

        <xsl:apply-templates select="html:body"/>

        <font size="-1">Last Modified Mon June 5, 2001<br />
          Copyright 2001 Elliotte Rusty Harold<br />
          <a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a>
        </font>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="html:body">
    <xsl:apply-templates 
      select="text()[count(following-sibling::html:hr)>1]|*[count(following-sibling::html:hr)>1]" />

    <hr/>
  </xsl:template>

  <xsl:template match="html:*">
    <xsl:copy>
      <xsl:for-each select="@*">
        <xsl:copy-of select="."/>
      </xsl:for-each>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="html:font[@size='-1']"></xsl:template>

  <xsl:template match="html:a">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="html:applet">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="html:param"/>

</xsl:stylesheet>

An XSLT 2.0 stylesheet for working with XHTML

XPath expressions can use customary, non-prefixed XHTML element names.

<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml" 
  default-xpath-namespace="http://www.w3.org/1999/xhtml"
>

  <xsl:output method="html" encoding="ISO-8859-1"/>

  <xsl:template match="week">
    <html xml:lang="en" lang="en">
      <head><title><xsl:value-of select="//h1[1]"/></title></head>
      <body bgcolor="#ffffff" text="#000000">

        <xsl:apply-templates select="body"/>

        <font size="-1">Last Modified Mon June 5, 2001<br />
          Copyright 2001 Elliotte Rusty Harold<br />
          <a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a>
        </font>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="body">
    <xsl:apply-templates 
     select="text()[count(following-sibling::hr)>1]|*[count(following-sibling::hr)>1]"/>

    <hr/>
  </xsl:template>

  <xsl:template match="*">
    <xsl:copy>
      <xsl:for-each select="@*">
        <xsl:copy-of select="."/>
      </xsl:for-each>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="font[@size='-1']"></xsl:template>

  <xsl:template match="a">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="applet">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="param"/>

</xsl:stylesheet>

User defined data elements

Top-level elements in some namespace other than the XSLT namespace (elements in no namespace are not allowed)
Exact interpretation is processor-specific, but they must not change the meaning of standard XSLT elements
Possible uses:
- Data for extension instructions and extension functions
- Information about what to do with the result tree
- Information about how to obtain the source tree
- Optimization hints
- Metadata about the stylesheet
- Documentation for the stylesheet
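The rule for which top-level elements count as user-defined data elements can be checked mechanically. This is a hedged Python sketch using ElementTree. The doc:purpose element and its namespace URI are hypothetical examples, and rejecting no-namespace elements with an exception is just one way to model "not allowed".

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"

# doc:purpose and http://www.example.com/doc are hypothetical examples.
stylesheet = ET.fromstring(
    f'<xsl:stylesheet version="2.0" xmlns:xsl="{XSL}" '
    'xmlns:doc="http://www.example.com/doc">'
    "<doc:purpose>Builds the slide index</doc:purpose>"
    '<xsl:template match="/"/>'
    "</xsl:stylesheet>"
)


def user_defined_data_elements(root):
    """Return top-level children in a non-XSLT namespace; reject no-namespace ones."""
    found = []
    for child in root:
        if not child.tag.startswith("{"):  # ElementTree writes namespaced tags as {uri}local
            raise ValueError(f"top-level <{child.tag}> must be in a namespace")
        namespace = child.tag[1:].split("}", 1)[0]
        if namespace != XSL:
            found.append(child)
    return found


print([e.text for e in user_defined_data_elements(stylesheet)])
# ['Builds the slide index']
```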

Typed variables and parameters

Variables and parameters can have a type:

Syntax:

<xsl:variable
  name   = "QualifiedName"
  select = expression
  type   = datatype>
  <!-- Content: content-constructor -->
</xsl:variable>

<xsl:param
  name   = "QualifiedName"
  select = expression
  type   = datatype>
  <!-- Content: content-constructor -->
</xsl:param>

The syntax for naming types (the values of the type attribute) remains to be determined

Functions defined in XSLT

<xsl:function name="math:factorial"
 xmlns:math="http://www.example.com/math"
 xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 exclude-result-prefixes="math xsd">
  <xsl:param name="index" type="xsd:nonNegativeInteger"/>
  <xsl:result type="xsd:positiveInteger"
    select="if ($index eq 0) then 1
            else $index * math:factorial($index - 1)"/>
</xsl:function>

Reading unparsed text files

sequence unparsed-text(sequence uris, String encoding?)

For example, to include a text document as an example:

<example href="examples/bib.xml"/>

<xsl:template match="example">
  <pre><code><xsl:value-of select="unparsed-text(@href)"/></code></pre>
</xsl:template>

Can also be used to load non-XML data such as tab-delimited text for parsing; e.g. with regular expressions
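Here is a Python sketch of that last use: the whole file arrives as one string, and a regular expression splits it into records. The two-column data and the header line are hypothetical. The sketch assumes Unix line endings and fields that contain no tabs.

```python
import re

# Hypothetical tab-delimited text, standing in for the single string
# that unparsed-text() would return for such a file.
raw = "title\tyear\nTCP/IP Illustrated\t1994\nData on the Web\t2000\n"

# One line = two tab-separated fields; MULTILINE makes ^ and $ match per line.
ROW = re.compile(r"^([^\t\n]*)\t([^\t\n]*)$", re.MULTILINE)


def parse_rows(text):
    """Turn a header line plus data lines into a list of dicts."""
    header, *records = ROW.findall(text)
    return [dict(zip(header, record)) for record in records]


print(parse_rows(raw))
# [{'title': 'TCP/IP Illustrated', 'year': '1994'}, {'title': 'Data on the Web', 'year': '2000'}]
```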

XQuery

Three parts:

A data model for XML documents based on the XML Infoset
A mathematically precise query algebra; that is, a set of query operators on that data model
A query language based on these query operators and this algebra

XQuery Language

A fourth generation declarative language like SQL; not a procedural language like Java or a functional language like XSLT
Queries operate on single documents or fixed collections of documents.
Queries select whole documents or subtrees of documents that match conditions defined on document content and structure
Can construct new documents based on what is selected
No updates or inserts!

Documents to Query

Narrative documents and collections of such documents; e.g. generate a table of contents for a book
Record-like documents; e.g. SQL-like queries of an XML dump of a database
Filtering streams: process logs of email messages, network packets, stock market data, newswire feeds, EDI, or weather data in order to filter and route messages represented in XML, extract data from XML streams, or transform data in XML streams
XML views of non-XML data

Physical Representations to Query

Files on a disk
Native-XML databases like Software AG's Tamino
DOM trees in memory
Streaming data
Other representations of the infoset

Where is XQuery used?

Direct query tools at command line
GUI query tools
JSP, ASP, PHP, and other such server side technologies
Programs written in Java, C++, and other languages that need to extract data from XML documents
Others are possible
Anywhere SQL is used to extract data from a database, XQuery can be used to extract data from an XML document.
SQL is a non-compiled language that must be processed by some other tool to extract data from a database. So is XQuery.

The XML Model vs. the Relational Model

Relational model	XML model
A relational database contains tables	An XML database contains collections
A relational table contains records with the same schema	A collection contains XML documents with the same DTD
A relational record is an unordered list of named values	An XML document is a tree of nodes
A SQL query returns an unordered set of records	An XQuery returns an ordered sequence of nodes

Query Data Types

XML 1.0 #PCDATA
Schema primitive types: positiveInteger, string, float, double, unsignedLong, gYear, date, time, boolean, etc.
Schema complex types
Collections of these types
References to these types

An example document to query

Most of the examples in this talk query this bibliography document at the (relative) URL bib.xml:

<bib>
<book year="1994">
  <title>TCP/IP Illustrated</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher>
  <price>65.95</price>
</book>

<book year="1992">
  <title>Advanced Programming in the Unix Environment</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher>
  <price>65.95</price>
</book>

<book year="2000">
  <title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
  <author><last>Buneman</last><first>Peter</first></author>
  <author><last>Suciu</last><first>Dan</first></author>
  <publisher>Morgan Kaufmann Publishers</publisher>
  <price>39.95</price>
</book>

<book year="1999">
  <title>The Economics of Technology and Content for Digital TV</title>
  <editor>
    <last>Gerbarg</last><first>Darcy</first>
    <affiliation>CITI</affiliation>
  </editor>
  <publisher>Kluwer Academic Publishers</publisher>
  <price>129.95</price>
</book>

</bib>

Adapted from Mary Fernandez, Jerome Simeon, and Phil Wadler: XML Query Languages: Experiences and Exemplars, 1999, as adapted in XML Query Use Cases

The XQuery FLWOR

for: each node selected by an XPath 2.0 location path
let: a new variable have a specified value
where: a condition expressed in XPath is true
order by: the value of an XPath expression
return: a calculated XML fragment
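The order of the clauses matters. Each for binding produces a tuple, let adds values to it, where filters the tuples, order by sorts the survivors, and only then does return build output. A hedged Python sketch of that pipeline over a trimmed copy of bib.xml follows. The $y variable and the output shape are illustrative choices, not part of any query in this talk.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of bib.xml holding only the fields this sketch reads.
BIB = ET.fromstring(
    "<bib>"
    '<book year="1994"><title>TCP/IP Illustrated</title>'
    "<publisher>Addison-Wesley</publisher></book>"
    '<book year="1992"><title>Advanced Programming in the Unix Environment</title>'
    "<publisher>Addison-Wesley</publisher></book>"
    '<book year="2000"><title>Data on the Web</title>'
    "<publisher>Morgan Kaufmann Publishers</publisher></book>"
    "</bib>"
)


def flwor(bib):
    tuples = []
    for b in bib.findall("book"):                        # for $b in /bib/book
        year = int(b.get("year"))                        # let $y := $b/@year
        if b.findtext("publisher") == "Addison-Wesley":  # where $b/publisher = "Addison-Wesley"
            tuples.append((b.findtext("title"), year))
    tuples.sort(key=lambda t: t[0])                      # order by $b/title
    return [f'<book year="{y}">{title}</book>' for title, y in tuples]  # return ...


for line in flwor(BIB):
    print(line)
# <book year="1992">Advanced Programming in the Unix Environment</book>
# <book year="1994">TCP/IP Illustrated</book>
```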

Query: List titles of all books

   for $t in document("bib.xml")/bib/book/title
   return
      $t

Adapted from XML Query Use Cases

Query Result: Book Titles

  <title>TCP/IP Illustrated</title>
  <title>Advanced Programming in the Unix Environment</title>
  <title>Data on the Web</title>
  <title>The Economics of Technology and Content for Digital TV</title>

Adapted from XML Query Use Cases

XQueryX

An XML Syntax for XQuery
Intended for machine processing and programmer convenience, not for human legibility

In XQuery:

   for $t in document("bib.xml")/bib/book/title
   return
      $t

In XQueryX:

<?xml version="1.0"?>
<xq:query xmlns:xq="http://www.w3.org/2001/06/xqueryx">
  <xq:flwr>
    <xq:forAssignment variable="$t">
      <xq:step axis="CHILD">
        <xq:function name="document">
          <xq:constant datatype="CHARSTRING">bib.xml</xq:constant>
        </xq:function>
        <xq:identifier>bib</xq:identifier>
      </xq:step>
      <xq:step axis="CHILD">
        <xq:identifier>book</xq:identifier>
      </xq:step>
      <xq:step axis="CHILD">
        <xq:identifier>title</xq:identifier>
      </xq:step>
    </xq:forAssignment>
    <xq:return>
      <xq:variable>$t</xq:variable>
    </xq:return>
  </xq:flwr>
</xq:query>

Element Constructors

Tags are given as literals
An XQuery expression whose value becomes the element's content is enclosed in curly braces
The contents can also contain literal text outside the braces

List titles of all books in a bib element. Put each title in a book element.

<bib>
  {
   for $t in document("bib.xml")/bib/book/title
   return
    <book>
     { $t }
    </book>
  }
</bib>

Adapted from XML Query Use Cases

Query Result: Book Titles

<bib>
  <book>
    <title>TCP/IP Illustrated</title>
  </book>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
  </book>
  <book>
    <title>Data on the Web</title>
  </book>
  <book>
    <title>The Economics of Technology and Content for Digital TV</title>
  </book>
</bib>

Adapted from XML Query Use Cases

Query with where

List titles of books published by Addison-Wesley

<bib>
 {
   for $b in document("bib.xml")/bib/book
   where $b/publisher = "Addison-Wesley"
   return
      $b/title 
  }
</bib>

This where clause could be replaced by an XPath predicate:

<bib>
 {
   for $b in document("bib.xml")/bib/book[publisher="Addison-Wesley"]
   return
      $b/title 
 }
</bib>

But where clauses can combine multiple variables from multiple documents

Adapted from XML Query Use Cases

Query Result: Titles of books published by Addison-Wesley

<bib>
    <title>TCP/IP Illustrated</title>
    <title>Advanced Programming in the Unix Environment</title>
</bib>

Adapted from XML Query Use Cases

Query with Booleans

XQuery Boolean operators and functions include:
- and
- or
- not()

List books published by Addison-Wesley after 1993:

<bib>
 {
   for $b in document("bib.xml")/bib/book
   where $b/publisher = "Addison-Wesley" and $b/@year > 1993
   return
      $b/title 
 }
</bib>

Adapted from XML Query Use Cases

Query Result: books published by Addison-Wesley after 1993

<bib>
    <title>TCP/IP Illustrated</title>
</bib>

Adapted from XML Query Use Cases

Attribute Constructors

List books published by Addison-Wesley after 1993, including their year and title:

<bib>
 {
   for $b in document("bib.xml")/bib/book
   where $b/publisher = "Addison-Wesley" and $b/@year > 1993
   return
    <book year ="{ $b/@year }">
     { $b/title }
    </book>
 }
</bib>

Adapted from XML Query Use Cases

Query Result: books published by Addison-Wesley after 1993, including their year and title.

<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
  </book>
</bib>

Adapted from XML Query Use Cases

Query with multiple variables

Create a list of all the title-author pairs, with each pair enclosed in a result element.

<results>
 {
   for $b in document("bib.xml")/bib/book,
     $t in $b/title,
     $a in $b/author
   return
    <result>
    { $t }
    { $a }
    </result>
  }
</results>

Adapted from XML Query Use Cases
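A for clause with several variables behaves like nested loops, with each later variable ranging over something computed from the earlier ones. A book with no author therefore contributes no pairs at all. A small Python sketch over a trimmed bib.xml:

```python
import xml.etree.ElementTree as ET

# Trimmed bib.xml: two books with authors, one with only an editor.
BIB = ET.fromstring(
    "<bib>"
    "<book><title>TCP/IP Illustrated</title>"
    "<author><last>Stevens</last></author></book>"
    "<book><title>Data on the Web</title>"
    "<author><last>Abiteboul</last></author>"
    "<author><last>Buneman</last></author></book>"
    "<book><title>The Economics of Technology and Content for Digital TV</title>"
    "<editor><last>Gerbarg</last></editor></book>"
    "</bib>"
)


def title_author_pairs(bib):
    pairs = []
    for b in bib.findall("book"):          # for $b in .../bib/book,
        for t in b.findall("title"):       #     $t in $b/title,
            for a in b.findall("author"):  #     $a in $b/author
                pairs.append((t.text, a.findtext("last")))
    return pairs


print(title_author_pairs(BIB))
# [('TCP/IP Illustrated', 'Stevens'), ('Data on the Web', 'Abiteboul'), ('Data on the Web', 'Buneman')]
```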

Query Result: A list of all the title-author pairs

<results>
    <result>
         <title>TCP/IP Illustrated</title>
         <author><last>Stevens</last><first>W.</first></author>
    </result>
    <result>
         <title>Advanced Programming in the Unix Environment</title>
         <author><last>Stevens</last><first>W.</first></author>
    </result>
    <result>
         <title>Data on the Web</title>
         <author><last>Abiteboul</last><first>Serge</first></author>
    </result>
    <result>
         <title>Data on the Web</title>
         <author><last>Buneman</last><first>Peter</first></author>
    </result>
    <result>
         <title>Data on the Web</title>
         <author><last>Suciu</last><first>Dan</first></author>
    </result>
</results>

Adapted from XML Query Use Cases

Nested Queries

For each book in the bibliography, list the title and authors, grouped inside a result element.

<results>
 {
   for $b in document("bib.xml")/bib/book
   return
    <result>
     { $b/title }
     {  
       for $a in $b/author
       return $a
     }
    </result>
 }
</results>

Adapted from XML Query Use Cases

Query Result: A list of the title and authors of each book in the bibliography

<?xml version="1.0"?>
<results xmlns:ino="http://namespaces.softwareag.com/tamino/response2" xmlns:xql="http://metalab.unc.edu/xql/">
  <result>
    <title>TCP/IP Illustrated</title>
    <author>
      <last>Stevens</last>
      <first>W.</first>
    </author>
  </result>
  <result>
    <title>Advanced Programming in the Unix Environment</title>
    <author>
      <last>Stevens</last>
      <first>W.</first>
    </author>
  </result>
  <result>
    <title>Data on the Web</title>
    <author>
      <last>Abiteboul</last>
      <first>Serge</first>
    </author>
    <author>
      <last>Buneman</last>
      <first>Peter</first>
    </author>
    <author>
      <last>Suciu</last>
      <first>Dan</first>
    </author>
  </result>
  <result>
    <title>The Economics of Technology and Content for Digital TV</title>
  </result>
</results>

Adapted from XML Query Use Cases

Query with distinct

For each author in the bibliography, list the author's name and the titles of all books by that author, grouped inside a result element.

<results>
 {
   for $a in distinct-values(document("bib.xml")//author)
   return
    <result>
     { (document("bib.xml")//author[. = $a])[1] }
     {  for $b in document("bib.xml")/bib/book[author=$a]
        return $b/title
     }
    </result>
 }
</results>

Adapted from XML Query Use Cases
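distinct-values() works on atomized values. Each author element is first reduced to its string value, the concatenation of its text, so W. Stevens becomes "StevensW.", and duplicates are removed by comparing those strings. The predicate book[author = $a] is true if any author child has that value. Here is a hedged Python model of both steps. It keeps first-appearance order, which is a choice made for the sketch.

```python
import xml.etree.ElementTree as ET

BIB = ET.fromstring(
    "<bib>"
    "<book><title>TCP/IP Illustrated</title>"
    "<author><last>Stevens</last><first>W.</first></author></book>"
    "<book><title>Advanced Programming in the Unix Environment</title>"
    "<author><last>Stevens</last><first>W.</first></author></book>"
    "<book><title>Data on the Web</title>"
    "<author><last>Abiteboul</last><first>Serge</first></author>"
    "<author><last>Suciu</last><first>Dan</first></author></book>"
    "</bib>"
)


def string_value(elem):
    """XPath string value: all descendant text, concatenated."""
    return "".join(elem.itertext())


def titles_by_author(bib):
    # distinct-values(): duplicates by string value collapse to one entry.
    authors = dict.fromkeys(string_value(a) for a in bib.iter("author"))
    result = {}
    for author in authors:
        # book[author = $a]: true if ANY author child has this string value
        result[author] = [
            b.findtext("title")
            for b in bib.findall("book")
            if any(string_value(a) == author for a in b.findall("author"))
        ]
    return result


print(titles_by_author(BIB))
```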

Query Result

<results>
  <result>
    <author><last>Stevens</last><first>W.</first></author>
    <title>TCP/IP Illustrated</title>
    <title>Advanced Programming in the Unix Environment</title>
  </result>

  <result>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <title>Data on the Web</title>
  </result>

  <result>
    <author><last>Buneman</last><first>Peter</first></author>
    <title>Data on the Web</title>
  </result>

  <result>
    <author><last>Suciu</last><first>Dan</first></author>
      <title>Data on the Web</title>
  </result>
</results>

Adapted from XML Query Use Cases

Query with sorting

List the titles and years of all books published by Addison-Wesley after 1991, in alphabetic order.

<bib>
 {
   for $b in document("bib.xml")//book
    [publisher = "Addison-Wesley" and @year > 1991]
   order by $b/title
   return
    <book>
     { $b/@year } { $b/title }
    </book> 
 }
</bib>

June 2002 version of QuiP does not yet handle this one.

Adapted from XML Query Use Cases
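The same selection and ordering, sketched in Python over the book data from bib.xml. The last line shows why it is safer to compare years as numbers. As strings, "992" sorts after "1991" because the comparison looks at the first character.

```python
# (year, title, publisher) for each book in bib.xml
BOOKS = [
    (1994, "TCP/IP Illustrated", "Addison-Wesley"),
    (1992, "Advanced Programming in the Unix Environment", "Addison-Wesley"),
    (2000, "Data on the Web", "Morgan Kaufmann Publishers"),
    (1999, "The Economics of Technology and Content for Digital TV",
     "Kluwer Academic Publishers"),
]


def addison_wesley_after(year, books):
    # [publisher = "Addison-Wesley" and @year > year]
    selected = [b for b in books if b[2] == "Addison-Wesley" and b[0] > year]
    # order by title, then return (year, title)
    return [(y, title) for y, title, _ in sorted(selected, key=lambda b: b[1])]


print(addison_wesley_after(1991, BOOKS))
# [(1992, 'Advanced Programming in the Unix Environment'), (1994, 'TCP/IP Illustrated')]

# String comparison goes character by character; numeric comparison does not.
print("992" > "1991", 992 > 1991)  # True False
```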

Query Result

<bib>
  <book year="1992">
    <title>Advanced Programming in the Unix Environment</title>
  </book>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
  </book>
</bib>

Adapted from XML Query Use Cases

Queries with functions

Find books in which some element has a tag ending in "or" and the same element contains the string "Suciu" (at any level of nesting). For each such book, return the title and the qualifying element.

<result>
 {
   for $b in document("bib.xml")//book,
     $e in $b/*[contains(string(.), "Suciu")]
   where ends-with(name($e), "or")
   return
    <book>
     { $b/title } { $e }
    </book>
 }
</result>

Not supported by QuiP yet

Adapted from XML Query Use Cases
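A Python model of the two tests helps show why only one element qualifies. The editor element also ends in "or", but its string value does not contain "Suciu". This sketch returns just the title and the tag name rather than the whole element.

```python
import xml.etree.ElementTree as ET

BIB = ET.fromstring(
    "<bib>"
    "<book><title>TCP/IP Illustrated</title>"
    "<author><last>Stevens</last><first>W.</first></author></book>"
    "<book><title>Data on the Web</title>"
    "<author><last>Abiteboul</last><first>Serge</first></author>"
    "<author><last>Suciu</last><first>Dan</first></author></book>"
    "<book><title>The Economics of Technology and Content for Digital TV</title>"
    "<editor><last>Gerbarg</last><first>Darcy</first></editor></book>"
    "</bib>"
)


def suciu_matches(bib):
    hits = []
    for b in bib.iter("book"):                    # for $b in //book,
        for e in b:                               #     $e in $b/*
            if "Suciu" in "".join(e.itertext()):  # [contains(string(.), "Suciu")]
                if e.tag.endswith("or"):          # where ends-with(name($e), "or")
                    hits.append((b.findtext("title"), e.tag))
    return hits


print(suciu_matches(BIB))  # [('Data on the Web', 'author')]
```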

Query Result

<result>
 <book>
  <title>Data on the Web</title>
  <author><last>Suciu</last><first>Dan</first></author>
 </book>
</result>

Adapted from XML Query Use Cases

A different document about books

Sample data at "reviews.xml":

<reviews>
  <entry>
    <title>Data on the Web</title>
    <price>34.95</price>
    <review>
       A very good discussion of semi-structured database
       systems and XML.
    </review>
  </entry>
  <entry>
    <title>Advanced Programming in the Unix Environment</title>
    <price>65.95</price>
    <review>
      A clear and detailed discussion of UNIX programming.
    </review>
  </entry>
  <entry>
    <title>TCP/IP Illustrated</title>
    <price>65.95</price>
    <review>
      One of the best books on TCP/IP.
    </review>
  </entry>
</reviews>

Adapted from XML Query Use Cases

This document uses a different DTD

<!ELEMENT reviews (entry*)>
<!ELEMENT entry   (title, price, review)>
<!ELEMENT title   (#PCDATA)>
<!ELEMENT price   (#PCDATA)>
<!ELEMENT review  (#PCDATA)>

Query that joins two documents

For each book found in both bib.xml and reviews.xml, list the title of the book and its price from each source.

<books-with-prices>
 {
   for $b in document("bib.xml")//book,
     $a in document("reviews.xml")//entry
   where $b/title = $a/title
   return
    <book-with-prices>
     { $b/title }
       <price-amazon> { $a/price/text() } </price-amazon>
       <price-bn> { $b/price/text() } </price-bn>
    </book-with-prices>
 }
</books-with-prices>

Adapted from XML Query Use Cases
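Read literally, the two for variables plus the where clause describe a nested-loop join. The output follows the order of bib.xml, and a book missing from reviews.xml simply drops out. Here is a Python sketch of that reading. A real XQuery engine may evaluate the join more cleverly.

```python
# (title, price) pairs in document order from each file.
bib_books = [
    ("TCP/IP Illustrated", "65.95"),
    ("Advanced Programming in the Unix Environment", "65.95"),
    ("Data on the Web", "39.95"),
    ("The Economics of Technology and Content for Digital TV", "129.95"),
]
review_entries = [
    ("Data on the Web", "34.95"),
    ("Advanced Programming in the Unix Environment", "65.95"),
    ("TCP/IP Illustrated", "65.95"),
]


def join_prices(books, entries):
    joined = []
    for b_title, b_price in books:        # for $b in document("bib.xml")//book,
        for a_title, a_price in entries:  #     $a in document("reviews.xml")//entry
            if b_title == a_title:        # where $b/title = $a/title
                # (title, price-amazon, price-bn)
                joined.append((b_title, a_price, b_price))
    return joined


for row in join_prices(bib_books, review_entries):
    print(row)
```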

Result

<books-with-prices>
  <book-with-prices>
    <title>TCP/IP Illustrated</title>
    <price-amazon>65.95</price-amazon>
    <price-bn>65.95</price-bn>
  </book-with-prices>

  <book-with-prices>
    <title>Advanced Programming in the Unix Environment</title>
    <price-amazon>65.95</price-amazon>
    <price-bn>65.95</price-bn>
  </book-with-prices>

  <book-with-prices>
    <title>Data on the Web</title>
    <price-amazon>34.95</price-amazon>
    <price-bn>39.95</price-bn>
  </book-with-prices>
</books-with-prices>

Adapted from XML Query Use Cases

prices.xml Query Sample Data

The next query also uses an input document named "prices.xml":

<prices>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
    <source>www.amazon.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
    <source>www.bn.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>www.amazon.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>www.bn.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>www.amazon.com</source>
    <price>34.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>www.bn.com</source>
    <price>39.95</price>
  </book>
</prices>

Adapted from XML Query Use Cases

Query with reused variables

In the document "prices.xml", find the minimum price for each book, in the form of a minprice element with the book title as its title attribute.

<results>
 {
   let $doc := document("prices.xml")
   for $t in distinct-values($doc/prices/book/title)
   let $p := $doc/prices/book[title = $t]/price
   return
    <minprice title = "{ $t }" >
     { min($p) }
    </minprice>
 }
</results>

Adapted from XML Query Use Cases
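The let clause is re-evaluated for each distinct title, so $p holds every price for that title, and min() picks the smallest. This is a hedged Python model of that loop. It converts the untyped price text to numbers before comparing and keeps titles in first-appearance order.

```python
# (title, source, price) rows from prices.xml, in document order.
PRICES = [
    ("Advanced Programming in the Unix Environment", "www.amazon.com", "65.95"),
    ("Advanced Programming in the Unix Environment", "www.bn.com", "65.95"),
    ("TCP/IP Illustrated", "www.amazon.com", "65.95"),
    ("TCP/IP Illustrated", "www.bn.com", "65.95"),
    ("Data on the Web", "www.amazon.com", "34.95"),
    ("Data on the Web", "www.bn.com", "39.95"),
]


def min_prices(rows):
    result = {}
    # for $t in distinct-values($doc/prices/book/title)
    for t in dict.fromkeys(title for title, _, _ in rows):
        # let $p := $doc/prices/book[title = $t]/price
        p = [float(price) for title, _, price in rows if title == t]
        result[t] = min(p)  # min($p)
    return result


print(min_prices(PRICES))
```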

Query Result

<results>
  <minprice title="Advanced Programming in the Unix Environment"> 65.95 </minprice>
  <minprice title="TCP/IP Illustrated"> 65.95 </minprice>
  <minprice title="Data on the Web"> 34.95 </minprice>
</results>

Adapted from XML Query Use Cases

Multiple FLWOR Queries

For each book with an author, return a book with its title and authors. For each book with an editor, return a reference with the book title and the editor's affiliation.

<bib>
 {
   for $b in document("bib.xml")//book[author]
   return
    <book>
     { $b/title }
     { $b/author }
    </book>,
   for $b in document("bib.xml")//book[editor]
   return
    <reference>
     { $b/title }
     <org> { $b/editor/affiliation/text() } </org>
    </reference>
 }
</bib>

Adapted from XML Query Use Cases

Query Result

<bib>
    <book>
         <title>TCP/IP Illustrated</title>
         <author><last>Stevens</last><first>W.</first></author>
    </book>

    <book>
         <title>Advanced Programming in the Unix Environment</title>
         <author><last>Stevens</last><first>W.</first></author>
    </book>

    <book>
         <title>Data on the Web</title>
         <author><last>Abiteboul</last><first>Serge</first></author>
         <author><last>Buneman</last><first>Peter</first></author>
         <author><last>Suciu</last><first>Dan</first></author>
    </book>

    <reference>
        <title>The Economics of Technology and Content for Digital TV</title>
        <org>CITI</org>
    </reference>
</bib>

Adapted from XML Query Use Cases

Query Software

QuiP: http://www.softwareag.com/developer/quip/
Quilt: http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html
Kweelt: http://kweelt.sourceforge.net/
Software AG's Tamino: http://www.softwareag.com/tamino/
Ipedo: http://www.ipedo.com/
Cognetic Systems' XQuantum: http://www.cogneticsystems.com/xquery/xquery.html
Enosys Software's XQuery Demo: http://xquerydemo.enosyssoftware.com
eXcelon's eXtensible Information Server (XIS): http://www.xmlquickstart.com/
Fatdog's XQEngine: http://www.fatdog.com/
GAEL's Derby: http://www.gael.fr/derby/
Qexo (Kawa-Query): http://www.qexo.org/ Compiles XQuery on the fly to Java bytecodes. Based on and part of the Kawa framework. Open-source.
IPSI's IPSI-XQ: http://ipsi.fhg.de/oasys/projects/ipsi-xq/index_e.html
Lucent's Galax: http://db.bell-labs.com/galax/
Microsoft's XML Query Language Demo: http://xqueryservices.com
Nimble Technology's Nimble Integration Suite: http://www.nimble.com/
OpenLink Software's Virtuoso Universal Server: http://demo.openlinksw.com:8890/xqdemo
Oracle's XML DB: http://otn.oracle.com/tech/xml/xmldb/htdocs/querying_xml
QuiLogic's SQL/XML-IMDB: http://www.quilogic.cc/xml.htm
XQuench (hosted on SourceForge): http://xquench.sourceforge.net/ Open-source.
X-Hive's XQuery demo: http://www.x-hive.com/xquery
XML Global's GoXML DB: http://www.xmlglobal.com/prod/xmlworkbench/

What's the difference between XQuery and XSLT?

XSLT is document-driven; XQuery is program-driven
XSLT is functional; XQuery is declarative
XSLT is written in XML; XQuery is not
An assertion (unproven): XSLT 2.0 can do everything XQuery can do

To Learn More

This presentation: http://www.cafeconleche.org/slides/xmlone/london2003/xslt2
XSLT 2.0 Working Draft: http://www.w3.org/TR/xslt20
XPath 2.0 Working Draft: http://www.w3.org/TR/xpath20
XPath 2.0 Requirements: http://www.w3.org/TR/2001/WD-xpath20req-20010214
XSLT 2.0 Requirements: http://www.w3.org/TR/2001/WD-xslt20req-20010214
XQuery: A Query Language for XML: http://www.w3.org/TR/xquery/
XML Query Requirements: http://www.w3.org/TR/xmlquery-req
XML Query Use Cases: http://www.w3.org/TR/xmlquery-use-cases
XML Query Data Model: http://www.w3.org/TR/query-datamodel/
The XML Query Algebra: http://www.w3.org/TR/query-algebra/
XML Syntax for XQuery 1.0 (XQueryX): http://www.w3.org/TR/xqueryx
XQuery 1.0 and XPath 2.0 Functions and Operators Version 1.0: http://www.w3.org/TR/xquery-operators/
