Advanced XML Programming



The Bleeding Edge of XML

Elliotte Rusty Harold

XML and Web Services 2002 London

Monday, March 11, 2002

elharo@metalab.unc.edu

http://www.cafeconleche.org/


Outline


Part I: Semantics and Syntax

The Infoset is the unfortunate standard to which those in retreat from the radical and most useful implications of well-formedness have rallied. At its core the Infoset insists that there is 'more' to XML than the straightforward syntax of well-formedness. By imposing its canonical semantics the Infoset obviates the infinite other semantic outcomes which might be elaborated in particular unique circumstances from an instance of well-formed XML 1.0 syntax. The question we should be asking is not whether the Infoset has chosen the correct canonical semantics, but whether the syntactic possibilities of XML 1.0 should be curtailed in this way at all.
--Walter Perry on the xml-dev mailing list


A normal XML document

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/css" href="song.css"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG xmlns="http://www.cafeconleche.org/namespace/song"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO 
    xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
    ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <!-- The publisher is actually Polygram but I needed 
       an example of a general entity reference. -->
  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
<!-- You can tell what album I was 
     listening to when I wrote this example -->

A canonical XML document

<?xml-stylesheet type="text/css" href="song.css"?>
<SONG xmlns="http://www.cafeconleche.org/namespace/song" xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" HEIGHT="200" WIDTH="100" xlink:href="hotcop.jpg" xlink:show="onLoad" xlink:type="simple"></PHOTO>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  
  <PUBLISHER xlink:href="http://www.amrecords.com/" xlink:type="simple">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>

An org.w3c.dom.Document object formed by reading hotcop.xml


import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMHotCop {

  public static void main(String[] args) {

    DOMParser parser = new DOMParser();

    try {
      // Parse the file named in the heading above into a DOM tree
      parser.parse("hotcop.xml");
      Document d = parser.getDocument();
    }
    catch (SAXException e) {
      System.err.println(e);
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

An encrypted hotcop.xml


Are these four the same thing or not?


What is the XML Infoset?


The Infoset defines 11 kinds of Information Items


The Document Information Item


Elements

<PHOTO 
  xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
  ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  
<COMPOSER>Jacques Morali</COMPOSER>

<COMPOSER>
  <PERSON>
    <NAME>
      <FIRST>Henri</FIRST>
      <LAST>Belolo</LAST>
    </NAME>
  </PERSON>
</COMPOSER>

<rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#">
  <rdf:Description xmlns:dc="http://purl.org/dc/"
     about="http://www.ibiblio.org/examples/impressionists.xml">
    <dc:title> Impressionist Paintings </dc:title>
    <dc:creator> Elliotte Rusty Harold </dc:creator>
    <dc:description> 
      A list of famous impressionist paintings organized 
      by painter and date 
    </dc:description>
    <dc:date>2000-08-22</dc:date>
  </rdf:Description>
</rdf:RDF>

Element Information Items

An Element Information Item Includes:


Attributes

xlink:type="simple"
xlink:href="http://www.amrecords.com/"
xlink:type =  "simple"
xlink:show = "onLoad"
xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit"
WIDTH=" 100 "
HEIGHT=' 200 '

An Attribute Information Item Includes:


Comments

  <!-- The publisher is actually Polygram but I needed 
       an example of a general entity reference. -->
<!--  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG> -->
<!-- You can tell what album I was 
     listening to when I wrote this example -->

A comment Information Item includes:


A Processing Instruction Information Item Includes:

<?robots index="yes" follow="no"?>
<?php 
  mysql_connect("database.unc.edu", "clerk", "password"); 
  $result = mysql("CYNW", "SELECT LastName, FirstName FROM Employees 
    ORDER BY LastName, FirstName"); 
  $i = 0;
  while ($i < mysql_numrows ($result)) {
     $fields = mysql_fetch_row($result);
     echo "<person>$fields[1] $fields[0] </person>\r\n";
     $i++;
  }
  mysql_close();
?>

Characters


Namespaces


Document Type Declaration

<!DOCTYPE SONG SYSTEM "song.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                       "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

A Document Type Declaration Information Item includes:


Document Type Definition

<!ELEMENT SONG (TITLE, PHOTO?, COMPOSER+, PRODUCER*, 
 PUBLISHER*, LENGTH?, YEAR?, ARTIST+)>
<!ATTLIST SONG xmlns       CDATA #REQUIRED
               xmlns:xlink CDATA #REQUIRED>
<!ELEMENT TITLE (#PCDATA)>

<!ELEMENT PHOTO EMPTY>
<!ATTLIST PHOTO xlink:type CDATA #FIXED "simple"
                xlink:href CDATA #REQUIRED
                xlink:show CDATA #IMPLIED
                ALT        CDATA #REQUIRED
                WIDTH      CDATA #REQUIRED
                HEIGHT     CDATA #REQUIRED
>

<!ELEMENT COMPOSER (#PCDATA)>
<!ELEMENT PRODUCER (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ATTLIST PUBLISHER xlink:type CDATA #IMPLIED
                    xlink:href CDATA #IMPLIED
>

<!ELEMENT LENGTH (#PCDATA)>
<!-- This should be a four digit year like "1999",
     not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>

<!ELEMENT ARTIST (#PCDATA)>

Entities


Entity Information Items


Unparsed Entity Information Items


Unexpanded Entity Information Items


The Infoset Omits:


The PSVI


Canonical XML


How are documents canonicalized?

  1. The document is encoded in UTF-8

  2. Line breaks are normalized to a linefeed (ASCII 10, \n)

  3. Attribute values are normalized, as if by a validating processor

  4. Character and parsed entity references are replaced

  5. CDATA sections are replaced with their character content

  6. The XML and document type declarations are removed

  7. Empty elements are converted to start tag-end tag pairs

  8. White space outside of the document element and within start and end tags is normalized

  9. All white space in character content is retained (except for characters removed during linefeed normalization)

  10. Attribute value delimiters are set to double quotes

  11. Special characters in attribute values and character content are replaced by character references

  12. Superfluous namespace declarations are removed from each element

  13. Default attributes are added to each element

  14. Lexicographic order is imposed on the namespace declarations and attributes of each element
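
Several of these steps can be seen in a small Python sketch that uses the standard library's xml.etree.ElementTree.canonicalize function (Python 3.8 or later). One caveat: that function implements the later C14N 2.0 specification rather than the Canonical XML 1.0 rules listed above, so treat it as an approximation. The input is a cut-down, hypothetical variant of the song document with no DTD and no namespaces.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical, cut-down variant of the song document (no DTD, no namespaces).
SOURCE = """<?xml version="1.0"?>
<SONG><!-- comments are dropped by default -->
  <PHOTO WIDTH='100' HEIGHT='200' ALT="Victor Willis"/>
  <PUBLISHER><![CDATA[A & M Records]]></PUBLISHER>
</SONG>"""

# canonicalize() returns the canonical form as a string when no output
# file is given.
canonical = canonicalize(SOURCE)
print(canonical)
```

In the printed result the XML declaration and the comment are gone, the empty PHOTO element has become a start tag-end tag pair with double-quoted attributes in sorted order, and the CDATA section has been replaced by escaped character content.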


Canonicalization software

XML Canonicalizer from IBM's XML Security Suite:
Apache XML Security Suite
A standard feature for DOM level 3's DOMWriter

Digital Signatures


Not Just for Signing XML


Generic Digital Signature Process

  1. The signature processor calculates a hash code for some data using a strong, one-way hash function.

  2. The processor encrypts the hash code using a private key.

  3. The verifier calculates the hash code for the data it's received.

  4. It then decrypts the encrypted hash code using the public key to see if the hash codes match.
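
The four steps above can be sketched in Python with a toy, textbook RSA key pair. Everything here is an illustrative assumption: the two primes, the function names, and the use of SHA-256 are not from the talk, and unpadded RSA with a key this small is not secure. Real signers rely on vetted cryptographic libraries.

```python
import hashlib

# Toy key pair built from two Mersenne primes. Illustration only: real
# systems use vetted libraries, much larger keys, and padding schemes.
P = 2**89 - 1
Q = 2**107 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (Python 3.8+)


def hash_code(data: bytes) -> int:
    """Steps 1 and 3: one-way hash of the data, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Step 2: transform the hash code with the private key."""
    return pow(hash_code(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Step 4: recover the signed hash with the public key and compare."""
    return pow(signature, E, N) == hash_code(data)


message = b"<TITLE>Hot Cop</TITLE>"
signature = sign(message)
print(verify(message, signature))          # the untouched data
print(verify(message + b" ", signature))   # the same data with one byte added
```

Only the holder of the private exponent can produce a value that the public exponent maps back to the hash, which is why a matching hash tells the verifier both who signed and that the data is unchanged.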


XML Signature Process

  1. The signature processor digests (calculates the hash code for) a data object.

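  2. The processor places the digest value in a Reference element inside the SignedInfo element.

  3. The processor canonicalizes and digests the SignedInfo element.

  4. The processor cryptographically signs that digest and stores the result in the SignatureValue element of the Signature element.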
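
The first step, digesting a data object, can be sketched in a few lines of Python. This assumes SHA-1 and base64 encoding, matching the DigestMethod and DigestValue of the detached signature shown later in this section; the function name and the input bytes are placeholders, not part of any signature library.

```python
import base64
import hashlib


def digest_value(octets: bytes) -> str:
    """Return the base64-encoded SHA-1 digest, the form used in DigestValue."""
    return base64.b64encode(hashlib.sha1(octets).digest()).decode("ascii")


# Stand-in bytes; a real signer would digest the (possibly transformed)
# octets of the referenced data object.
print(digest_value(b"abc"))
```

A SHA-1 digest is 20 bytes, so the base64 text is always 28 characters long, the same length as the DigestValue in the hotcop.xml signature.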


XML Digital Signature software


A Detached Signature for hotcop.xml

<?xml version='1.0' encoding='UTF-8'?>
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
  <SignedInfo>
    <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2000/WD-xml-c14n-20000119"/>
    <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"/>
    <Reference URI="http://www.cafeconleche.org/slides/hoffman/fundamentals/examples/hotcop.xml">
      <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
      <DigestValue>nvfYilfgN/rICyzhGmjidKCFoC8=</DigestValue>
    </Reference>
  </SignedInfo>
  <SignatureValue>
    hfowa4qdbuMkoZfX1/VXd4UBpIpZMM5+6CElmY7jOIKFqvXq5A5VKw==
  </SignatureValue>
  <KeyInfo>
    <KeyValue>
      <DSAKeyValue>
        <P>
          /X9TgR11EilS30qcLuzk5/YRt1I870QAwx4/gLZRJmlFXUAiUftZPY1Y+r/F9bow9s
          ubVWzXgTuAHTRv8mZgt2uZUKWkn5/oBHsQIsJPu6nX/rfGG/g7V+fGqKYVDwT7g/bT
          xR7DAjVUE1oWkTL2dfOuK2HXKu/yIgMZndFIAcc=
        </P>
        <Q>l2BQjxUjC8yykrmCouuEC/BYHPU=</Q>
        <G>
          9+GghdabPd7LvKtcNrhXuXmUr7v6OuqC+VdMCz0HgmdRWVeOutRZT+ZxBxCBgLRJFn
          Ej6EwoFhO3zwkyjMim4TwWeotUfI0o4KOuHiuzpnWRbqN/C/ohNWLx+2J6ASQ7zKTx
          vqhRkImog9/hWuWfBpKLZl6Ae1UlZAFMO/7PSSo=
        </G>
        <Y>
          6jKpNnmkkWeArsn5Oeeg2njcz+nXdk0f9kZI892ddlR8Lg1aMhPeFTYuoq3I6neFlb
          BjWzuktNZKiXYBfKsSTB8U09dTiJo2ir3HJuY7eW/p89osKMfixPQsp9vQMgzph6Qa
          lY7j4MB7y5ROJYsTr1/fFwmj/yhkHwpbpzed1LE=
        </Y>
      </DSAKeyValue>
    </KeyValue>
    <X509Data>
      <X509IssuerSerial>
        <X509IssuerName>CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US</X509IssuerName>
        <X509SerialNumber>983556890</X509SerialNumber></X509IssuerSerial>
      <X509SubjectName>CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US</X509SubjectName>
      <X509Certificate>
MIIDLzCCAu0CBDqf4xowCwYHKoZIzjgEAwUAMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcg
WW9yazERMA8GA1UEBxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlN
ZXRyb3RlY2gxHjAcBgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDAeFw0wMTAzMDIxODE0NTBa
Fw0wMTA1MzExODE0NTBaMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcgWW9yazERMA8GA1UE
BxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlNZXRyb3RlY2gxHjAc
BgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDCCAbgwggEsBgcqhkjOOAQBMIIBHwKBgQD9f1OB
HXUSKVLfSpwu7OTn9hG3UjzvRADDHj+AtlEmaUVdQCJR+1k9jVj6v8X1ujD2y5tVbNeBO4AdNG/y
ZmC3a5lQpaSfn+gEexAiwk+7qdf+t8Yb+DtX58aophUPBPuD9tPFHsMCNVQTWhaRMvZ1864rYdcq
7/IiAxmd0UgBxwIVAJdgUI8VIwvMspK5gqLrhAvwWBz1AoGBAPfhoIXWmz3ey7yrXDa4V7l5lK+7
+jrqgvlXTAs9B4JnUVlXjrrUWU/mcQcQgYC0SRZxI+hMKBYTt88JMozIpuE8FnqLVHyNKOCjrh4r
s6Z1kW6jfwv6ITVi8ftiegEkO8yk8b6oUZCJqIPf4VrlnwaSi2ZegHtVJWQBTDv+z0kqA4GFAAKB
gQDqMqk2eaSRZ4Cuyfk556DaeNzP6dd2TR/2Rkjz3Z12VHwuDVoyE94VNi6ircjqd4WVsGNbO6S0
1kqJdgF8qxJMHxTT11OImjaKvccm5jt5b+nz2iwox+LE9Cyn29AyDOmHpBqVjuPgwHvLlE4lixOv
X98XCaP/KGQfClunN53UsTALBgcqhkjOOAQDBQADLwAwLAIUODqxsFzS96BjrVA4LVo5FzuWBRMC
FC0xfXxbaJaCJuVqtcBv4bqwV0EX
      </X509Certificate>
    </X509Data>
  </KeyInfo>
</Signature>


XML Encryption


XML Encryption Algorithms


Complete standard algorithm list

From the spec:

Block Encryption
  1. REQUIRED TRIPLEDES
    http://www.w3.org/2001/04/xmlenc#tripledes-cbc

  2. REQUIRED AES-128
    http://www.w3.org/2001/04/xmlenc#aes128-cbc

  3. REQUIRED AES-256
    http://www.w3.org/2001/04/xmlenc#aes256-cbc

  4. OPTIONAL AES-192
    http://www.w3.org/2001/04/xmlenc#aes192-cbc

Key Transport
  1. REQUIRED RSA-v1.5
    http://www.w3.org/2001/04/xmlenc#rsa-1_5

  2. REQUIRED RSA-OAEP
    http://www.w3.org/2001/04/xmlenc#rsa-oaep-mgf1p

Key Agreement
  1. OPTIONAL Diffie-Hellman
    http://www.w3.org/2001/04/xmlenc#dh

Symmetric Key Wrap
  1. REQUIRED TRIPLEDES KeyWrap
    http://www.w3.org/2001/04/xmlenc#kw-tripledes

  2. REQUIRED AES-128 KeyWrap
    http://www.w3.org/2001/04/xmlenc#kw-aes128

  3. REQUIRED AES-256 KeyWrap
    http://www.w3.org/2001/04/xmlenc#kw-aes256

  4. OPTIONAL AES-192 KeyWrap
    http://www.w3.org/2001/04/xmlenc#kw-aes192

Message Digest
  1. REQUIRED SHA1
    http://www.w3.org/2000/09/xmldsig#sha1

  2. RECOMMENDED SHA256
    http://www.w3.org/2001/04/xmlenc#sha256

  3. OPTIONAL SHA512
    http://www.w3.org/2001/04/xmlenc#sha512

  4. OPTIONAL RIPEMD-160
    http://www.w3.org/2001/04/xmlenc#ripemd160

Message Authentication
  1. RECOMMENDED XML Digital Signature
    http://www.w3.org/TR/2001/CR-xmldsig-core-20010419/

Canonicalization
  1. OPTIONAL Canonical XML with Comments
    http://www.w3.org/TR/2001/REC-xml-c14n-20010315#WithComments

  2. OPTIONAL Canonical XML (omits comments)
    http://www.w3.org/TR/2001/REC-xml-c14n-20010315

Encoding
  1. REQUIRED base64
    http://www.w3.org/2000/09/xmldsig#base64


XML Encryption Syntax


An XML Document containing sensitive information

<?xml version='1.0'?>
<PaymentInfo xmlns="http://example.org/payment">
  <Name>Elliotte Rusty Harold</Name>
  <CreditCard Limit='1000' Currency='USD'>
    <Number>1234 5678 9012 3456</Number>
    <Issuer>Citibank</Issuer>
    <Expiration>03/02</Expiration>
  </CreditCard>
</PaymentInfo>

An XML Document containing an encrypted element

<?xml version='1.0'?>
<PaymentInfo xmlns="http://example.org/payment">
  <Name>Elliotte Rusty Harold</Name>
  <EncryptedData Type='http://www.w3.org/2001/04/xmlenc#Element'
     xmlns='http://www.w3.org/2001/04/xmlenc#'>
    <EncryptionMethod
      Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
    <CipherData>
      <CipherValue>A23B45C56CABE4BE33327</CipherValue>
    </CipherData>
  </EncryptedData>
</PaymentInfo>

An XML Document containing encrypted element data

<?xml version='1.0'?>
<PaymentInfo xmlns="http://example.org/payment">
  <Name>Elliotte Rusty Harold</Name>
  <CreditCard Limit="1000" Currency="USD">
     <EncryptedData Type="http://www.w3.org/2001/04/xmlenc#Content"
        xmlns="http://www.w3.org/2001/04/xmlenc#">
      <EncryptionMethod
        Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
      <CipherData>
        <CipherValue>A23B45C56CABE4BE3</CipherValue>
      </CipherData>
    </EncryptedData>
  </CreditCard>
</PaymentInfo>

An XML Document containing encrypted text

<?xml version='1.0'?>
<PaymentInfo xmlns="http://example.org/payment">
  <Name>Elliotte Rusty Harold</Name>
  <CreditCard Limit='1000' Currency='USD'>
    <Number>
      <EncryptedData Type="http://www.w3.org/2001/04/xmlenc#Content"
                     xmlns="http://www.w3.org/2001/04/xmlenc#">
        <EncryptionMethod
          Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
        <CipherData>
          <CipherValue>A23B45C56CABE4BE</CipherValue>
        </CipherData>
      </EncryptedData>
    </Number>
    <Issuer>Citibank</Issuer>
    <Expiration>03/02</Expiration>
  </CreditCard>
</PaymentInfo>

A completely encrypted XML Document

<?xml version='1.0'?>
<EncryptedData 
   Type="http://www.isi.edu/in-notes/iana/assignments/media-types/text/xml"
   xmlns="http://www.w3.org/2001/04/xmlenc#">
  <EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
  <CipherData>
    <CipherValue>A23B45C56CABE4BE7687989219C4E5DEADBEEFCAFEBABE</CipherValue>
  </CipherData>
</EncryptedData>

XML Encryption Software

xss4j, IBM's XML Security Suite:
Apache XML Security Suite
JSR-106: XML Digital Encryption APIs

Issues XML Encryption doesn't address


To Learn More


Part II: XML 1.1

Discriminating against languages simply because their scripts were not encoded in Unicode 2.0 is inherently unjust.

--XML Blueberry Requirements


New features


White Space


Native language markup


Name characters


Harm Reduction proposals


Part III: XPath 2.0 and Beyond

In SQL, the query language is not expressed in tables and rows. In XQuery, the query language is not expressed in XML. Why is this a problem?
--Jonathan Robie on the xml-dev mailing list


XPath 2.0


XPath 2.0 Goals


XPath 2.0 Requirements


XPath 1.0 Data Model

(Adapted from Jeni Tennison)


XPath 2.0 Data Model

(Adapted from Jeni Tennison)


XPath Comments

<xsl:apply-templates 
 select="{-- The difference between the context node and the 
             current node is crucial here --}
 ../composition[@composer=current()/@id]"/>

Namespace wildcards

<xsl:template match="*:set">
  This matches MathML set elements, SVG set elements, set
  elements in no namespace at all, etc. 
</xsl:template>

Can use functions as location steps


Can use parenthesized expressions as location steps


Dereference steps


Constructing sequences


Sequence example

<?xml version="1.0"?> 
<xsl:stylesheet version="2.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <numbers>
      <xsl:for-each select="(1 to 10)">
        <integer>
          <xsl:value-of select="."/>
        </integer>
      </xsl:for-each>
    </numbers>
  </xsl:template>

</xsl:stylesheet>

Output (modulo white space):

<?xml version="1.0" encoding="utf-8"?>
<numbers>
<integer>1</integer>
<integer>2</integer>
<integer>3</integer>
<integer>4</integer>
<integer>5</integer>
<integer>6</integer>
<integer>7</integer>
<integer>8</integer>
<integer>9</integer>
<integer>10</integer>
</numbers>

Unions of sequences

<?xml version="1.0"?> 
<xsl:stylesheet version="2.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>             
                
  <xsl:template match="/">
    <numbers>
      <xsl:for-each select='(3 to 10) | (5 to 12) | (20 to 23)'>
        <integer>
          <xsl:value-of select="."/>
        </integer>
      </xsl:for-each>
    </numbers>
  </xsl:template>

</xsl:stylesheet>

Output:

<numbers>
<integer>3</integer>
<integer>4</integer>
<integer>5</integer>
<integer>6</integer>
<integer>7</integer>
<integer>8</integer>
<integer>9</integer>
<integer>10</integer>
<integer>11</integer>
<integer>12</integer>
<integer>20</integer>
<integer>21</integer>
<integer>22</integer>
<integer>23</integer>
</numbers>

Intersections of sequences

<?xml version="1.0"?> 
<xsl:stylesheet version="2.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>             
                
  <xsl:template match="/">
    <numbers>
      <xsl:for-each select='(3 to 10) intersect (5 to 12)'>
        <integer>
          <xsl:value-of select="."/>
        </integer>
      </xsl:for-each>
    </numbers>
  </xsl:template>

</xsl:stylesheet>

Output:

<numbers>
<integer>5</integer>
<integer>6</integer>
<integer>7</integer>
<integer>8</integer>
<integer>9</integer>
<integer>10</integer>
</numbers>

Except sequences

<?xml version="1.0"?> 
<xsl:stylesheet version="2.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>             
                
  <xsl:template match="/">
    <numbers>
      <xsl:for-each select='(3 to 10) except (5 to 12)'>
        <integer>
          <xsl:value-of select="."/>
        </integer>
      </xsl:for-each>
    </numbers>
  </xsl:template>

</xsl:stylesheet>

Output:

<numbers>
  <integer>3</integer>
  <integer>4</integer>
</numbers>
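
The three outputs above can be cross-checked with ordinary Python sets. This is only a stand-in for the integer examples on these slides, sorted to mimic the order shown; it does not model how XPath treats sequences of nodes.

```python
first = set(range(3, 11))     # (3 to 10)
second = set(range(5, 13))    # (5 to 12)
third = set(range(20, 24))    # (20 to 23)

union = sorted(first | second | third)   # (3 to 10) | (5 to 12) | (20 to 23)
intersection = sorted(first & second)    # (3 to 10) intersect (5 to 12)
difference = sorted(first - second)      # (3 to 10) except (5 to 12)

print(union)
print(intersection)
print(difference)
```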

Value comparison operators


General comparisons


Node comparisons


For Expressions


for Example

Consider the list of weblogs at http://static.userland.com/weblogMonitor/logs.xml

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd">
<weblogs>
    <log>
        <name>MozillaZine</name>
        <url>http://www.mozillazine.org</url>
        <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>
        <ownerName>Jason Kersey</ownerName>
        <ownerEmail>kerz@en.com</ownerEmail>
        <description>THE source for news on the Mozilla Organization.  DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
        <imageUrl></imageUrl>
        <adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif</adImageUrl>
    </log>
    <log>
        <name>SalonHerringWiredFool</name>
        <url>http://www.salonherringwiredfool.com/</url>
        <ownerName>Some Random Herring</ownerName>
        <ownerEmail>salonfool@wiredherring.com</ownerEmail>
        <description></description>
    </log>
    <log>
        <name>SlashDot.Org</name>
        <url>http://www.slashdot.org/</url>
        <ownerName>Simply a friend</ownerName>
        <ownerEmail>afriendofweblogs@weblogs.com</ownerEmail>
        <description>News for Nerds, Stuff that Matters.</description>
    </log>
</weblogs>

The changesUrl element points to a document like this:

<?xml version="1.0"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" 
                     "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <title>MozillaZine</title>
    <link>http://www.mozillazine.org/</link>
    <language>en-us</language>
    <description>Your source for Mozilla news, advocacy, interviews, builds, and more!</description>
    <copyright>Copyright 1998-2002, The MozillaZine Organization</copyright>
    <managingEditor>jason@mozillazine.org</managingEditor>
    <webMaster>jason@mozillazine.org</webMaster>
    <image>
      <title>MozillaZine</title>
      <url>http://www.mozillazine.org/image/mynetscape88.gif</url>
      <description>Your source for Mozilla news, advocacy, interviews, builds, and more!</description>
      <link>http://www.mozillazine.org/</link>
    </image>

    <item>
      <title>BugDays Are Back!</title>
      <link>http://www.mozillazine.org/talkback.html?article=2151</link>
    </item>

    <item>
      <title>Independent Status Reports</title>
      <link>http://www.mozillazine.org/talkback.html?article=2150</link>
    </item>

  </channel>

</rss>

We want to process all the item elements from each weblog.


for Example

<xsl:template match="weblogs">
  <xsl:apply-templates select="
    for $url in log/changesUrl
    return document($url)/rss/channel/item
  "/>
</xsl:template>
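
For readers more at home in a procedural language, the same iteration can be sketched in Python. The FETCHED dictionary is a hypothetical stand-in for retrieving each changesUrl over the network, and both documents are cut-down versions of the ones shown earlier.

```python
import xml.etree.ElementTree as ET

WEBLOGS = """<weblogs>
  <log><name>MozillaZine</name>
       <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl></log>
  <log><name>SlashDot.Org</name></log>
</weblogs>"""

# Hypothetical stand-in for fetching each URL over the network.
FETCHED = {
    "http://www.mozillazine.org/contents.rdf": """<rss version="0.91"><channel>
      <item><title>BugDays Are Back!</title></item>
      <item><title>Independent Status Reports</title></item>
    </channel></rss>""",
}


def items_from_all_logs(weblogs_xml: str) -> list:
    """Visit every log/changesUrl and collect the items of the document it names."""
    items = []
    for url_element in ET.fromstring(weblogs_xml).findall("log/changesUrl"):
        feed = ET.fromstring(FETCHED[url_element.text])
        items.extend(feed.iter("item"))
    return items


titles = [item.findtext("title") for item in items_from_all_logs(WEBLOGS)]
print(titles)
```

The for expression does in one XPath string what the loop does here: it binds each changesUrl in turn and concatenates the results into a single sequence.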

Conditional Expressions

Not all weblogs have a changesUrl

<xsl:template match="log">
  <xsl:apply-templates select="
    if (changesUrl)
     then document(changesUrl)
     else document(url)"/>
</xsl:template>

Quantified Expressions

<xsl:template match="weblogs">
  <xsl:if test="some $log in log satisfies $log/changesUrl">
     ????
  </xsl:if>
</xsl:template>

<xsl:template match="weblogs">
  <xsl:if test="every $log in log satisfies $log/url">
    ????
  </xsl:if>
</xsl:template>
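
The some and every quantifiers correspond closely to Python's any and all. The sketch below uses a cut-down, hypothetical weblogs document in which only the first log has a changesUrl.

```python
import xml.etree.ElementTree as ET

weblogs = ET.fromstring("""<weblogs>
  <log><url>http://www.mozillazine.org</url>
       <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl></log>
  <log><url>http://www.slashdot.org/</url></log>
</weblogs>""")

logs = weblogs.findall("log")

# some: true if at least one log has a changesUrl child
some_have_changes = any(log.find("changesUrl") is not None for log in logs)

# every: true only if all logs have a url child
every_has_url = all(log.find("url") is not None for log in logs)

# For contrast: not every log has a changesUrl
every_has_changes = all(log.find("changesUrl") is not None for log in logs)

print(some_have_changes, every_has_url, every_has_changes)
```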

Functions and operators


Accessor Functions

xf:node-kind(Node)
Returns a string identifying the kind of node: "document", "element", "attribute", "text", "namespace", "processing-instruction", or "comment".
xf:name(Node)
Returns zero or one QName.
xf:string(Object)
Returns the string value of anything.
xf:data(Node)
Returns a sequence of zero or more typed simple values.
xf:base-uri(node)
Returns the base URI of an Element or Document node.
xf:unique-ID(element)
Returns the unique ID of an element.

Constructor Functions


Arithmetic operators


Numeric comparison operators


Numeric Functions


String functions


Regular expressions


Boolean Functions


Date and time functions


Qualified Name Functions


Binary operators


Node Functions and Operators


Sequence Functions


Context Functions


Casting Functions


XSLT 2.0


XSLT 2.0 Goals


XSLT 2.0 Non-goals


XSLT 2.0 Requirements


Some specific improvements that are likely


Identifying 2.0 stylesheets

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Top level elements -->

</xsl:stylesheet>

No result tree fragments


xsl:for-each-group


Grouping example: input


Grouping example: desired output

<?xml version="1.0"?>
<forwardslash>

<section>
  <title>developers</title>
<story>
<title>ROX Desktop Update</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/180240</url>
<time>2002-02-18 18:50:13</time>
<author>timothy</author>
<department>small-simple-swift</department>
<topic>104</topic>
<comments>32</comments>
<image>topicx.jpg</image>
</story>

</section>

<section>
  <title>articles</title>

<story>
<title>HP Selling Systems With Linux</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/1712241</url>
<time>2002-02-18 17:37:20</time>
<author>timothy</author>
<department>wish-this-wasn't-remarkable</department>
<topic>173</topic>
<comments>188</comments>
<image>topichp.gif</image>
</story>

<story>
<title>Excellent Hacks to the ReplayTV 4000</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/1619213</url>
<time>2002-02-18 16:46:04</time>
<author>CmdrTaco</author>
<department>hardware-I-lust-after</department>
<topic>129</topic>
<comments>117</comments>
<image>topictv.jpg</image>
</story>

<story>
<title>Peek-a-Boo(ty)</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/1546226</url>
<time>2002-02-18 15:58:06</time>
<author>Hemos</author>
<department>pirate-treasure</department>
<topic>158</topic>
<comments>207</comments>
<image>topicprivacy.gif</image>
</story>

<story>
<title>Self-Shredding E-Mail</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/1343202</url>
<time>2002-02-18 14:37:45</time>
<author>timothy</author>
<department>plausible-deniability</department>
<topic>158</topic>
<comments>170</comments>
<image>topicprivacy.gif</image>
</story>

<story>
<title>CIA &amp;amp; KGB Gadgets On Display</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/0252219</url>
<time>2002-02-18 13:52:04</time>
<author>Hemos</author>
<department>looking-a-tthe-gear</department>
<topic>126</topic>
<comments>103</comments>
<image>topictech2.gif</image>
</story>


<story>
<title>How to Fix the Unix Configuration Nightmare</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/0248248</url>
<time>2002-02-18 10:48:36</time>
<author>Hemos</author>
<department>fixing-the-problem</department>
<topic>130</topic>
<comments>367</comments>
<image>topicunix.jpg</image>
</story>


</section>
<section>
  <title>science</title>


<story>
<title>Re-Building the Wright Flyer</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/060257</url>
<time>2002-02-18 12:29:12</time>
<author>timothy</author>
<department>we-hope-they-wear-modern-helmets</department>
<topic>126</topic>
<comments>132</comments>
<image>topictech2.gif</image>
</story>


<story>
<title>Sleep Less, Live Longer</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/0511253</url>
<time>2002-02-18 07:38:15</time>
<author>timothy</author>
<department>if-you're-reading-this</department>
<topic>134</topic>
<comments>309</comments>
<image>topicscience.gif</image>
</story>

<story>
<title>Warming and Slowing the World</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/0243253</url>
<time>2002-02-18 04:39:39</time>
<author>Hemos</author>
<department>slowing-things-down</department>
<topic>134</topic>
<comments>312</comments>
<image>topicscience.gif</image>
</story>

</section>

</forwardslash>

Grouping example: stylesheet

<?xml version="1.0"?> 
<xsl:stylesheet version="2.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <forwardslash>
      <xsl:apply-templates select="*"/>
    </forwardslash>
  </xsl:template>

  <xsl:template match="backslash">
    <xsl:for-each-group select="story" group-by="section">
      <section>
        <title><xsl:value-of select="current-group()[1]/section"/></title>
        <xsl:apply-templates select="current-group()"/>
      </section>
    </xsl:for-each-group>
  </xsl:template>

  <xsl:template match="story">
    <story>
      <xsl:apply-templates/>
    </story>
  </xsl:template>

  <xsl:template match="*">
    <xsl:copy-of select="."/>
  </xsl:template>

  <xsl:template match="section"/>

</xsl:stylesheet>
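
The grouping this stylesheet aims for can be sketched procedurally in Python. The input shape (story elements with a section child inside a backslash root) is inferred from the stylesheet's match patterns, and the sample stories are a cut-down, hypothetical subset of the desired output above.

```python
import xml.etree.ElementTree as ET

# Cut-down, hypothetical input in the shape the stylesheet expects.
BACKSLASH = """<backslash>
  <story><title>ROX Desktop Update</title><section>developers</section></story>
  <story><title>HP Selling Systems With Linux</title><section>articles</section></story>
  <story><title>Peek-a-Boo(ty)</title><section>articles</section></story>
</backslash>"""


def group_by_section(xml_text: str) -> ET.Element:
    """Group story elements by their section child, in order of first appearance."""
    groups = {}  # dicts keep insertion order, so groups stay in first-seen order
    for story in ET.fromstring(xml_text).findall("story"):
        groups.setdefault(story.findtext("section"), []).append(story)

    root = ET.Element("forwardslash")
    for name, stories in groups.items():
        section = ET.SubElement(root, "section")
        ET.SubElement(section, "title").text = name
        for story in stories:
            story.remove(story.find("section"))  # the key moves up to the group
            section.append(story)
    return root


result = group_by_section(BACKSLASH)
print(ET.tostring(result, encoding="unicode"))
```

The dictionary plays the role of xsl:for-each-group: each distinct section value becomes one group, and every story with that value lands inside it.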

xsl:destination


xsl:result-document


xsl:result-document Example

     <xsl:output name="ccl:html" method="html" encoding="ISO-8859-1" />

     <xsl:result-document href="index.html" format="ccl:html">
       <html>
         <head>
           <title><xsl:value-of select="title"/></title>         
         </head>
         <body> 
           <h1 align="center"><xsl:value-of select="title"/></h1> 
           <ul>
             <xsl:for-each select="slide">
               <li><a href="{format-number(position(),'00')}.html"><xsl:value-of select="title"/></a></li>
             </xsl:for-each>    
           </ul>           
           
           <p><a href="{translate(title,' ', '_')}.html">Entire Presentation as Single File</a></p>
              
           <hr/>
           <div align="center">
             <A HREF="01.html">Start</A> | <A HREF="/xml/">Cafe con Leche</A>
           </div>
           <hr/>
           <font size="-1">
              Copyright 2002 
              <a href="http://www.elharo.com/">Elliotte Rusty Harold</a><br/>       
              <a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a><br/>
              Last Modified <xsl:apply-templates select="last_modified" mode="lm"/>
           </font>
         </body>     
       </html>     
     </xsl:result-document>  

Sorting

<xsl:sort-key
  name = "Qualified Name">
  <!-- Content: (xsl:sort+) -->
</xsl:sort-key>

xsl:namespace

<xsl:namespace name="xsd">http://www.w3.org/2001/XMLSchema</xsl:namespace>


Value of a sequence

<x><xsl:value-of select="(1,2,3,4)" separator=" | "/></x>

<x>1 | 2 | 3 | 4</x>


default-xpath-namespace


An XSLT 1.0 stylesheet for working with XHTML

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml" 
  xmlns:html="http://www.w3.org/1999/xhtml" 
>

  <xsl:output method="html" encoding="ISO-8859-1"/>

  <xsl:template match="week">
    <html xml:lang="en" lang="en">
      <head><title><xsl:value-of select="//html:h1[1]"/></title></head>
      <body bgcolor="#ffffff" text="#000000">

        <xsl:apply-templates select="html:body"/>

        <font size="-1">Last Modified Mon June 5, 2001<br />
          Copyright 2001 Elliotte Rusty Harold<br />
          <a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a>
        </font>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="html:body">
    <xsl:apply-templates 
      select="text()[count(following-sibling::html:hr)>1]|*[count(following-sibling::html:hr)>1]" />

    <hr/>
  </xsl:template>

  <xsl:template match="html:*">
    <xsl:copy>
      <xsl:for-each select="@*">
        <xsl:copy-of select="."/>
      </xsl:for-each>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="html:font[@size='-1']"></xsl:template>

  <xsl:template match="html:a">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="html:applet">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="html:param"/>

</xsl:stylesheet>

An XSLT 2.0 stylesheet for working with XHTML

<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml" 
  default-xpath-namespace="http://www.w3.org/1999/xhtml"
>

  <xsl:output method="html" encoding="ISO-8859-1"/>

  <xsl:template match="week">
    <html xml:lang="en" lang="en">
      <head><title><xsl:value-of select="//h1[1]"/></title></head>
      <body bgcolor="#ffffff" text="#000000">

        <xsl:apply-templates select="body"/>

        <font size="-1">Last Modified Mon June 5, 2001<br />
          Copyright 2001 Elliotte Rusty Harold<br />
          <a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a>
        </font>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="body">
    <xsl:apply-templates 
     select="text()[count(following-sibling::hr)>1]|*[count(following-sibling::hr)>1]"/>

    <hr/>
  </xsl:template>

  <xsl:template match="*">
    <xsl:copy>
      <xsl:for-each select="@*">
        <xsl:copy-of select="."/>
      </xsl:for-each>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="font[@size='-1']"></xsl:template>

  <xsl:template match="a">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="applet">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="param"/>

</xsl:stylesheet>

User defined data elements


Typed variables and parameters


Functions defined in XSLT

<xsl:function name="math:factorial"
 xmlns:math="http://www.example.com/math"
 exclude-result-prefixes="math">
  <xsl:param name="index" type="xsd:nonNegativeInteger"/>
  <xsl:result type="xsd:positiveInteger"
    select="if ($index eq 0) then 1
            else $index * math:factorial($index - 1)"/>
</xsl:function>

Including text files

For example,

<include_as_text source="bib.xml"/>

<xsl:template match="include_as_text">
  <xsl:value-of select="unparsed-text(@source)"/>
</xsl:template>

XQuery

Three parts:


XQuery Language


Documents to Query


Physical Representations to Query


Where is XQuery used?


The XML Model vs. the Relational Model

Relational model | XML model
A relational database contains tables | An XML database contains collections
A relational table contains records with the same schema | A collection contains XML documents with the same DTD
A relational record is an unordered list of named values | An XML document is a tree of nodes
A SQL query returns an unordered set of records | An XQuery returns an ordered sequence of nodes

Query Data Types


An example document to query

Most of the examples in this talk query this bibliography document at the (relative) URL bib.xml:

<bib>
<book year="1994">
<title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
<price> 65.95</price>
</book>

<book year="1992">
<title>Advanced Programming in the Unix Environment</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
<price>65.95</price>
</book>

<book year="2000">
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
<publisher>Morgan Kaufmann Publishers</publisher>
<price> 39.95</price>
</book>

<book year="1999">
<title>The Economics of Technology and Content for Digital TV</title>
<editor>
<last>Gerbarg</last><first>Darcy</first>
<affiliation>CITI</affiliation>
</editor>
<publisher>Kluwer Academic Publishers</publisher>
<price>129.95</price>
</book>

</bib>

Adapted from Mary Fernandez, Jerome Simeon, and Phil Wadler: XML Query Languages: Experiences and Exemplars, 1999, as adapted in XML Query Use Cases


The XQuery FLWR


Query: List titles of all books

   FOR $t IN document("bib.xml")/bib/book/title
   RETURN
      $t 

Adapted from XML Query Use Cases


Query Result: Book Titles

  <title>TCP/IP Illustrated</title>
  <title>Advanced Programming in the Unix Environment</title>
  <title>Data on the Web</title>
  <title>The Economics of Technology and Content for Digital TV</title>
 

Adapted from XML Query Use Cases
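
The path expression inside the FOR clause is ordinary XPath, so the same titles can be pulled out with any XPath engine. The following is only a sketch using the javax.xml.xpath package from the JDK, not an XQuery implementation; the class name and the abbreviated inline copy of bib.xml are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TitleLister {

  // Abbreviated stand-in for bib.xml (titles only).
  static final String BIB =
      "<bib>"
    + "<book year='1994'><title>TCP/IP Illustrated</title></book>"
    + "<book year='2000'><title>Data on the Web</title></book>"
    + "</bib>";

  // Evaluates /bib/book/title and returns the text of each match
  // in document order.
  public static List<String> titles(String xml) {
    try {
      XPath xpath = XPathFactory.newInstance().newXPath();
      NodeList nodes = (NodeList) xpath.evaluate(
          "/bib/book/title",
          new InputSource(new StringReader(xml)),
          XPathConstants.NODESET);
      List<String> result = new ArrayList<>();
      for (int i = 0; i < nodes.getLength(); i++) {
        result.add(nodes.item(i).getTextContent());
      }
      return result;
    } catch (XPathExpressionException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    for (String title : titles(BIB)) {
      System.out.println(title);
    }
  }
}
```

What XQuery adds on top of this is everything around the path: iteration, construction of new elements, joins, and sorting.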


XQueryX


Element Constructors

List titles of all books in a bib element. Put each title in a book element.

<bib>
  {
   FOR $t IN document("bib.xml")/bib/book/title
   RETURN
    <book>
     { $t }
    </book>
  }
</bib>

Adapted from XML Query Use Cases


Query Result: Book Titles

<bib>
  <book>
    <title>TCP/IP Illustrated</title>
  </book>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
  </book>
  <book>
    <title>Data on the Web</title>
  </book>
  <book>
    <title>The Economics of Technology and Content for Digital TV</title>
  </book>
</bib>
 

Adapted from XML Query Use Cases


Query with WHERE

Adapted from XML Query Use Cases


Query Result: Titles of books published by Addison-Wesley

<bib>
    <title>TCP/IP Illustrated</title>
    <title>Advanced Programming in the Unix Environment</title>
</bib>
 

Adapted from XML Query Use Cases


Query with Booleans

Adapted from XML Query Use Cases


Query Result: books published by Addison-Wesley after 1993

<bib>
    <title>TCP/IP Illustrated</title>
</bib>
 

Adapted from XML Query Use Cases


Attribute Constructors

Adapted from XML Query Use Cases


Query Result: books published by Addison-Wesley after 1993, including their year and title.

<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
  </book>
</bib>
 

Adapted from XML Query Use Cases


Query with multiple variables

Create a list of all the title-author pairs, with each pair enclosed in a result element.

<results>
 {
   FOR $b IN document("bib.xml")/bib/book,
     $t IN $b/title,
     $a IN $b/author
   RETURN
    <result>
    { $t }
    { $a }
    </result>
  }
</results>

Adapted from XML Query Use Cases


Query Result: A list of all the title-author pairs

<results>
    <result>
         <title>TCP/IP Illustrated</title>
         <author><last>Stevens</last><first>W.</first></author>
    </result>
    <result>
         <title>Advanced Programming in the Unix Environment</title>
         <author><last>Stevens</last><first>W.</first></author>
    </result>
    <result>
         <title>Data on the Web</title>
         <author><last>Abiteboul</last><first>Serge</first></author>
    </result>
    <result>
         <title>Data on the Web</title>
         <author><last>Buneman</last><first>Peter</first></author>
    </result>
    <result>
         <title>Data on the Web</title>
         <author><last>Suciu</last><first>Dan</first></author>
    </result>
</results>
 

Adapted from XML Query Use Cases


Nested Queries

For each book in the bibliography, list the title and authors, grouped inside a result element.

<results>
 {
   FOR $b IN document("bib.xml")/bib/book
   RETURN
    <result>
     { $b/title }
     {  
       FOR $a IN $b/author
       RETURN $a
     }
    </result>
 }
</results>

Adapted from XML Query Use Cases


Query Result: A list of the title and authors of each book in the bibliography

<?xml version="1.0"?>
<results xmlns:ino="http://namespaces.softwareag.com/tamino/response2" xmlns:xql="http://metalab.unc.edu/xql/">
  <result>
    <title>TCP/IP Illustrated</title>
    <author>
      <last>Stevens</last>
      <first>W.</first>
    </author>
  </result>
  <result>
    <title>Advanced Programming in the Unix Environment</title>
    <author>
      <last>Stevens</last>
      <first>W.</first>
    </author>
  </result>
  <result>
    <title>Data on the Web</title>
    <author>
      <last>Abiteboul</last>
      <first>Serge</first>
    </author>
    <author>
      <last>Buneman</last>
      <first>Peter</first>
    </author>
    <author>
      <last>Suciu</last>
      <first>Dan</first>
    </author>
  </result>
  <result>
    <title>The Economics of Technology and Content for Digital TV</title>
  </result>
</results> 

Adapted from XML Query Use Cases


Query with distinct

For each author in the bibliography, list the author's name and the titles of all books by that author, grouped inside a result element.

<results>
 {
   FOR $a IN distinct-values(document("bib.xml")//author)
   RETURN
    <result>
     { $a }
     {  FOR $b IN document("bib.xml")/bib/book[author=$a]
        RETURN $b/title
     }
    </result>
 }
</results>

Adapted from XML Query Use Cases


Query Result

<results>
  <result>
    <author><last>Stevens</last><first>W.</first></author>
    <title>TCP/IP Illustrated</title>
    <title>Advanced Programming in the Unix Environment</title>
  </result>

  <result>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <title>Data on the Web</title>
  </result>

  <result>
    <author><last>Buneman</last><first>Peter</first></author>
    <title>Data on the Web</title>
  </result>

  <result>
    <author><last>Suciu</last><first>Dan</first></author>
    <title>Data on the Web</title>
  </result>
</results>
 

Adapted from XML Query Use Cases


Query with sorting

List the titles and years of all books published by Addison-Wesley after 1991, in alphabetic order.

<bib>
 {
   FOR $b IN document("bib.xml")//book
    [publisher = "Addison-Wesley" AND @year > "1991"]
   RETURN
    <book>
     { $b/@year } { $b/title }
    </book> SORTBY (title)
 }
</bib>

Adapted from XML Query Use Cases


Query Result

<bib>
  <book year="1992">
    <title>Advanced Programming in the Unix Environment</title>
  </book>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
  </book>
</bib>
  

Adapted from XML Query Use Cases


Queries with functions

Adapted from XML Query Use Cases


Query Result

<result>
 <book>
  <title> Data on the Web </title>
  <author> <last> Suciu </last> <first> Dan </first> </author>
 </book>
</result>

Adapted from XML Query Use Cases


A different document about books

Sample data at "reviews.xml":

<reviews>
  <entry>
    <title>Data on the Web</title>
    <price>34.95</price>
    <review>
       A very good discussion of semi-structured database
       systems and XML.
    </review>
  </entry>
  <entry>
    <title>Advanced Programming in the Unix Environment</title>
    <price>65.95</price>
    <review>
      A clear and detailed discussion of UNIX programming.
    </review>
  </entry>
  <entry>
    <title>TCP/IP Illustrated</title>
    <price>65.95</price>
    <review>
      One of the best books on TCP/IP.
    </review>
  </entry>
</reviews>

Adapted from XML Query Use Cases


This document uses a different DTD

<!ELEMENT reviews (entry*)>
<!ELEMENT entry   (title, price, review)>
<!ELEMENT title   (#PCDATA)>
<!ELEMENT price   (#PCDATA)>
<!ELEMENT review  (#PCDATA)>

Query that joins two documents

For each book found in both bib.xml and reviews.xml, list the title of the book and its price from each source.

<books-with-prices>
 {
   FOR $b IN document("bib.xml")//book,
     $a IN document("reviews.xml")//entry
   WHERE $b/title = $a/title
   RETURN
    <book-with-prices>
     { $b/title }
       <price-amazon> { $a/price/text() } </price-amazon>
       <price-bn> { $b/price/text() } </price-bn>
    </book-with-prices>
 }
</books-with-prices>

Adapted from XML Query Use Cases


Result

<books-with-prices>
  <book-with-prices>
    <title>TCP/IP Illustrated</title>
    <price-amazon>65.95</price-amazon>
    <price-bn>65.95</price-bn>
  </book-with-prices>

  <book-with-prices>
    <title>Advanced Programming in the Unix Environment</title>
    <price-amazon>65.95</price-amazon>
    <price-bn>65.95</price-bn>
  </book-with-prices>

  <book-with-prices>
    <title>Data on the Web</title>
    <price-amazon>34.95</price-amazon>
    <price-bn>39.95</price-bn>
  </book-with-prices>
</books-with-prices>
  

Adapted from XML Query Use Cases


prices.xml Query Sample Data

The next query also uses an input document named "prices.xml":

<prices>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
    <source>www.amazon.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
    <source>www.bn.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>www.amazon.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>www.bn.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>www.amazon.com</source>
    <price>34.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>www.bn.com</source>
    <price>39.95</price>
  </book>
</prices>


Adapted from XML Query Use Cases


Query with reused variables

<results>
 {
   FOR $t IN distinct(document("prices.xml")/prices/book/title)
   LET $p := document("prices.xml")/prices/book[title = $t]/price
   RETURN
    <minprice title = { $t/text() } >
     { min($p) }
    </minprice>
 }
</results>

Adapted from XML Query Use Cases


Query Result

<results>
  <minprice title="Advanced Programming in the Unix Environment"> 65.95 </minprice>
  <minprice title="TCP/IP Illustrated"> 65.95 </minprice>
  <minprice title="Data on the Web"> 34.95 </minprice>
</results>   

Adapted from XML Query Use Cases


Multiple FLWR Queries

<bib>
 {
   FOR $b IN document("bib.xml")//book[author]
   RETURN
    <book>
     { $b/title }
     { $b/author }
    </book>,
   FOR $b IN document("bib.xml")//book[editor]
   RETURN
    <reference>
     { $b/title }
     <org> { $b/editor/affiliation/text() } </org>
    </reference>
 }
</bib>

Adapted from XML Query Use Cases


Query Result

<bib>
    <book>
         <title>TCP/IP Illustrated</title>
         <author><last>Stevens</last><first>W.</first></author>
    </book>

    <book>
         <title>Advanced Programming in the Unix Environment</title>
         <author><last>Stevens</last><first>W.</first></author>
    </book>

    <book>
         <title>Data on the Web</title>
         <author><last>Abiteboul</last><first>Serge</first></author>
         <author><last>Buneman</last><first>Peter</first></author>
         <author><last>Suciu</last><first>Dan</first></author>
    </book>

    <reference>
        <title>The Economics of Technology and Content for Digital TV</title>
        <org>CITI</org>
    </reference>
</bib>

Adapted from XML Query Use Cases


Query Software


What's the difference between XQuery and XSLT?


To Learn More


Part IV: SAX 2.1

Actually, SAX2 has ** MUCH ** better infoset support than DOM does. Yes, I've done the detailed analysis.

--David Brownell on the xml-dev mailing list


Goals


Specified vs. Defaulted Attributes


standalone declaration

<?xml version="1.0" standalone="yes"?>


The version and encoding properties

<?xml version="1.0" encoding="UTF-16"?>


Feature/Property discovery


DefaultHandler infoset extensions


Parser identification


A Verifier Class as in JDOM

package org.jdom;

public final class Verifier {

  public static final String checkElementName(String name) {}
  public static final String checkAttributeName(String name) {}
  public static final String checkCharacterData(String text) {}
  public static final String checkNamespacePrefix(String prefix) {}
  public static final String checkNamespaceURI(String uri) {}
  public static final String checkProcessingInstructionTarget(String target) {}
  public static final String checkCommentData(String data) {}
 
  public static boolean isXMLCharacter(char c) {}
  public static boolean isXMLNameCharacter(char c) {}
  public static boolean isXMLNameStartCharacter(char c) {}
  public static boolean isXMLLetterOrDigit(char c) {}
  public static boolean isXMLLetter(char c) {}
  public static boolean isXMLCombiningChar(char c) {}
  public static boolean isXMLExtender(char c) {}
  public static boolean isXMLDigit(char c) {}

}
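
To make the listing above concrete, here is a small sketch of what two of those checks might look like. It is not JDOM's code and not part of any SAX release; the class and method names are hypothetical. The character ranges come from the Char production of XML 1.0, and returning null for "no problem" and a message otherwise follows the convention JDOM's check methods use.

```java
public class XmlCharCheck {

  // XML 1.0 Char production:
  // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
  public static boolean isXmlCharacter(int codePoint) {
    return codePoint == 0x9
        || codePoint == 0xA
        || codePoint == 0xD
        || (codePoint >= 0x20 && codePoint <= 0xD7FF)
        || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
        || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
  }

  // Returns null if the text is legal character data,
  // or a description of the first problem found.
  public static String checkCharacterData(String text) {
    if (text == null) {
      return "A null is not legal XML character data";
    }
    for (int i = 0; i < text.length(); ) {
      // Work in code points so surrogate pairs count as one character;
      // an unpaired surrogate falls outside every legal range.
      int codePoint = text.codePointAt(i);
      if (!isXmlCharacter(codePoint)) {
        return "0x" + Integer.toHexString(codePoint)
            + " is not a legal XML character";
      }
      i += Character.charCount(codePoint);
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(checkCharacterData("Hot Cop"));
    System.out.println(checkCharacterData("bell\u0007"));
  }
}
```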

To Learn More


Part V: DOM Level 3

of all of the things the W3C has given us, the DOM is probably the one with the least value.

--Michael Brennan on the xml-dev mailing list


DOM Evolution


New Features in DOM Level 3


DOM Level 3 Core Changes


DOMKey


New methods in Node interface


User Data


New methods in Entity


New methods in Document


New methods in Text


Bootstrapping


DOM3 Bootstrapping


DOM Error Handler Interfaces


The DOMErrorHandler Interface


The DOMLocator Interface


Load and Save


The DOM Process

  1. Library specific code creates a parser

  2. The parser parses the document and returns a DOM org.w3c.dom.Document object.

  3. The entire document is stored in memory.

  4. DOM methods and interfaces are used to extract data from this object
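
The four steps can be sketched in a few lines. This version uses JAXP for step 1 rather than a parser-specific class, reads the document from a string so that it is self-contained, and uses a made-up class name and a SONG document modelled on the earlier example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomProcessDemo {

  // Returns the text of the first TITLE element, or null if there is none.
  public static String firstTitle(String xml) {
    try {
      // 1. Create a parser (here through the JAXP factory).
      DocumentBuilder parser
       = DocumentBuilderFactory.newInstance().newDocumentBuilder();

      // 2 and 3. Parse; the whole tree now lives in memory.
      Document doc = parser.parse(new InputSource(new StringReader(xml)));

      // 4. Use DOM methods to extract data from the tree.
      NodeList titles = doc.getElementsByTagName("TITLE");
      if (titles.getLength() == 0) {
        return null;
      }
      return titles.item(0).getTextContent();
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(
     firstTitle("<SONG><TITLE>Hot Cop</TITLE><YEAR>1978</YEAR></SONG>"));
  }
}
```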


Parsing documents with DOM2

This program parses with Xerces. Other parsers are different.

import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMParserMaker {

  public static void main(String[] args) {
     
    DOMParser parser = new DOMParser();
    
    for (int i = 0; i < args.length; i++) {
      try {
        parser.parse(args[i]); 
       
        Document d = parser.getDocument();
      }
      catch (SAXException e) {
        System.err.println(e); 
      }
      catch (IOException e) {
        System.err.println(e); 
      }
      
    }
   
  }

}

Parsing documents with JAXP

import javax.xml.parsers.*; // JAXP
import org.xml.sax.SAXException;
import java.io.IOException;


public class JAXPParserMaker {

  public static void main(String[] args) {
     
    if (args.length <= 0) {
      System.out.println("Usage: java JAXPParserMaker URL");
      return;
    }
    String document = args[0];
    
    try {
      DocumentBuilderFactory factory 
       = DocumentBuilderFactory.newInstance();
      DocumentBuilder parser = factory.newDocumentBuilder();
      parser.parse(document); 
      System.out.println(document + " is well-formed.");
    }
    catch (SAXException e) {
      System.out.println(document + " is not well-formed.");
    }
    catch (IOException e) { 
      System.out.println(
       "Due to an IOException, the parser could not check " 
       + document
      ); 
    }
    catch (FactoryConfigurationError e) { 
      // JAXP suffers from excessive brain-damage caused by 
      // intellectual in-breeding at Sun. (Basically the Sun 
      // engineers spend way too much time talking to each other
      // and not nearly enough time talking to people outside 
      // Sun.) Fortunately, you can happily ignore most of the 
      // JAXP brain damage and not be any the poorer for it.
      
      // This, however, is one of the few problems you can't 
      // avoid if you're going to use JAXP at all. 
      // DocumentBuilderFactory.newInstance() should throw a 
      // ClassNotFoundException if it can't locate the factory
      // class. However, what it does throw is an Error,
      // specifically a FactoryConfigurationError. Very few 
      // programs are prepared to respond to errors as opposed
      // to exceptions. You should catch this error in your 
      // JAXP programs as quickly as possible even though the
      // compiler won't require you to, and you should 
      // never rethrow it or otherwise let it escape from the 
      // method that produced it. 
      System.out.println("Could not locate a factory class"); 
    }
    catch (ParserConfigurationException e) { 
      System.out.println("Could not locate a JAXP parser"); 
    }
   
  }

}

Parsing documents with DOM3

import org.w3c.dom.*;

public class DOM3ParserMaker {

  public static void main(String[] args) {

    DOMImplementation impl 
     = DOMImplementationRegistry.getDOMImplementation("XML 1.0 LS-Load 3.0");
    DOMImplementationLS implls = (DOMImplementationLS) impl;
    DOMBuilder parser = implls.getDOMBuilder();

    for (int i = 0; i < args.length; i++) {
      try {
        Document d = parser.parseURI(args[i]);
      }
      catch (DOMSystemException e) {
        System.err.println(e);
      }
      catch (DOMException e) {
        System.err.println(e);
      }

    }

  }

}

This code will not actually compile or run until some parser supports DOM3 Load and Save.


Load and Save

DOMImplementationLS
A sub-interface of DOMImplementation that provides the factory methods for creating the objects required for loading and saving.
DOMBuilder
A parser interface
DOMInputSource
Encapsulates information about the source of the XML to be loaded, like SAX's InputSource
DOMEntityResolver
During loading, provides a way for applications to redirect references to external entities.
DOMBuilderFilter
Provides the ability to examine and optionally remove Element nodes as they are processed during parsing, like SAX filters.
DOMWriter
An interface for serializing DOM documents onto a stream or string.
DOMWriterFilter
Provides the ability to examine and optionally remove or modify nodes as they are being output.
DocumentLS
A "mechanism by which the content of a document can be replaced with the DOM tree produced when loading a URL, or parsing a string."
ParserErrorEvent
Some sort of error detected in the input document (well-formedness? validity?)
LSLoadEvent
A document has been completely loaded
LSProgressEvent
A document has been partially loaded
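
The interface names above are the working-draft names. In the Load and Save API as it later shipped in the JDK's org.w3c.dom.ls package, the serializer role described for DOMWriter is played by LSSerializer. The following sketch of the save half assumes that later API; the class and method names are made up, and the exact XML declaration the serializer emits depends on the implementation.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

public class LsSaveDemo {

  // Builds a one-element document and serializes it to a string.
  public static String serialize(String rootName, String text) {
    try {
      // Ask the registry for an implementation that supports Load and Save.
      DOMImplementation impl
       = DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
      if (impl == null) {
        throw new IllegalStateException("No DOM implementation supports LS");
      }

      Document doc = impl.createDocument(null, rootName, null);
      doc.getDocumentElement().setTextContent(text);

      // The serializer comes from the same implementation object.
      LSSerializer writer = ((DOMImplementationLS) impl).createLSSerializer();
      return writer.writeToString(doc);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(serialize("TITLE", "Hot Cop"));
  }
}
```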

DOMImplementationLS


Creating DOMImplementationLS Objects

  1. Use the feature "LS-Load" to find a DOMImplementation object that supports Load and Save.

  2. Cast the DOMImplementation object to DOMImplementationLS.

DOMImplementation impl 
 = DOMImplementationRegistry.getDOMImplementation("XML 1.0 LS-Load 3.0");
if (impl != null) {
  DOMImplementationLS implls = (DOMImplementationLS) impl;
  // ...
}
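
For comparison, the same two steps can be written against the Load and Save API that later shipped in the JDK, where the feature string is "LS" and the parser interface is LSParser rather than DOMBuilder. This is a sketch under that assumption; the class name and the inline document are made up.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

public class LsLoadDemo {

  // Parses the XML text and returns the name of its root element.
  public static String rootName(String xml) {
    try {
      // 1. Find a DOMImplementation that supports Load and Save.
      DOMImplementation impl
       = DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
      if (impl == null) {
        throw new IllegalStateException("No DOM implementation supports LS");
      }

      // 2. Cast it to DOMImplementationLS and use it as a factory.
      DOMImplementationLS implls = (DOMImplementationLS) impl;
      LSParser parser
       = implls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

      // LSInput plays the part of SAX's InputSource.
      LSInput input = implls.createLSInput();
      input.setStringData(xml);

      Document doc = parser.parse(input);
      return doc.getDocumentElement().getTagName();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(rootName("<SONG><TITLE>Hot Cop</TITLE></SONG>"));
  }
}
```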

DOMBuilder


DOMInputSource


DOMEntityResolver


DOMWriter


DOMBuilderFilter


DOMWriterFilter


DocumentLS


ParserErrorEvent


Grammar Access/Abstract Schemas


What are Abstract Schemas for?


Abstract Schema Interfaces


Loading an Abstract Schema from a URI into an ASModel

  1. Check to see if the implementation supports the "LS-AS" feature, version "3.0".

  2. Construct a DOMBuilder object

  3. Cast the DOMBuilder to ASDOMBuilder

  4. Call the parseASURI() method to read the schema

try {
  DOMImplementation domImpl = DOMImplementationFactory.getDOMImplementation();
  if (domImpl.hasFeature("LS-AS", "3.0")) {
    DOMImplementationLS impl = (DOMImplementationLS) domImpl;
    DOMBuilder parser = impl.getDOMBuilder();
    ASDOMBuilder schemaParser = (ASDOMBuilder) parser;
    ASModel schema = schemaParser.parseASURI(
     "http://www.openhealth.org/RDDL/rddl-integration.rxg",
     "RELAX");
    // Use the schema...
  }
}
catch (DOMException e) {
  //...
}

Validating a DOM Document against an ASModel

    if (impl.hasFeature("AS-DOC", "3.0")) {
      Document doc = parser.parseURI("????");
      DocumentAS docWithSchema = (DocumentAS) doc;
      docWithSchema.addAS(schema);
      docWithSchema.validate();
      // Process the data...
    }

Abstract Schema and AS-Editing Interfaces


The ASObject Interface


The ASModel Interface


The ASContentModel Interface


The ASObjectList Interface


The ASNamedObjectMap Interface


The ASDataType Interface


The ASElementDecl Interface


The ASAttributeDecl Interface


The ASEntityDecl Interface


The ASNotationDecl Interface


Validation Interfaces:


The DocumentAS Interface


Validating a document in-memory

  1. Call hasFeature("????", "3.0") to verify that this is supported

  2. Load the document in the usual way

  3. Load the ASModel

  4. Cast the Document to a DocumentAS

  5. Attach the ASModel to the DocumentAS using the setAS() method

  6. Invoke the DocumentAS's validate() method

  7. If the Document is not valid, then an ASException is thrown with the code VALIDATION_ERR
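
The same load, attach, validate sequence can be tried today with a different API: the javax.xml.validation package in the JDK. This is explicitly not the DOM Level 3 Abstract Schemas interface, just a sketch of the same idea of validating an already-parsed tree. The class name and the one-element schema are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class InMemoryValidation {

  // Hypothetical schema: the only legal root is a methodName string.
  static final String SCHEMA =
      "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
    + "<xsd:element name='methodName' type='xsd:string'/>"
    + "</xsd:schema>";

  public static boolean isValid(String xml) {
    try {
      // Load the document in the usual way (namespace aware,
      // which schema validation of a DOM tree requires).
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      Document doc = factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));

      // Load the schema.
      Schema schema = SchemaFactory
          .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
          .newSchema(new StreamSource(new StringReader(SCHEMA)));

      // Validate the in-memory tree; an invalid document
      // surfaces as a SAXException.
      try {
        schema.newValidator().validate(new DOMSource(doc));
        return true;
      } catch (SAXException invalid) {
        return false;
      }
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(isValid("<methodName>examples.getStateName</methodName>"));
    System.out.println(isValid("<methodCall/>"));
  }
}
```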


The DOMImplementationAS Interface


Creating a schema in-memory


A DTD for XML-RPC

<!ELEMENT methodCall (methodName, params)>
<!ELEMENT methodName (#PCDATA)>
<!ELEMENT params     (param*)>
<!ELEMENT param      (value)>
<!ELEMENT value      
   (i4|int|string|dateTime.iso8601|double|base64|struct|array)>
<!ELEMENT i4               (#PCDATA)>
<!ELEMENT int              (#PCDATA)>
<!ELEMENT string           (#PCDATA)>
<!ELEMENT dateTime.iso8601 (#PCDATA)>
<!ELEMENT double           (#PCDATA)>
<!ELEMENT base64           (#PCDATA)>

<!ELEMENT array            (data)>
<!ELEMENT data             (value*)>
<!ELEMENT struct           (member+)>
<!ELEMENT member           (name, value)>
<!ELEMENT name             (#PCDATA)>

<!ELEMENT methodResponse   (params | fault)>
<!ELEMENT fault            (value)>
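
Before building this DTD through an API, it may help to watch it reject a document. The sketch below uses a validating JAXP parser with a cut-down copy of the DTD placed in the internal subset, so that it runs without any external file. The class name and the choice of declarations to keep are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlRpcDtdCheck {

  // Only the declarations needed for a call whose parameters
  // are i4, int, or string values.
  static final String DOCTYPE =
      "<!DOCTYPE methodCall ["
    + "<!ELEMENT methodCall (methodName, params)>"
    + "<!ELEMENT methodName (#PCDATA)>"
    + "<!ELEMENT params (param*)>"
    + "<!ELEMENT param (value)>"
    + "<!ELEMENT value (i4|int|string)>"
    + "<!ELEMENT i4 (#PCDATA)>"
    + "<!ELEMENT int (#PCDATA)>"
    + "<!ELEMENT string (#PCDATA)>"
    + "]>";

  // True if the methodCall element is well-formed and valid
  // against the internal subset above.
  public static boolean isValidCall(String methodCall) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setValidating(true);
      DocumentBuilder parser = factory.newDocumentBuilder();

      // Validity errors are ignored unless a handler reports them,
      // so turn each one into an exception.
      parser.setErrorHandler(new DefaultHandler() {
        @Override
        public void error(SAXParseException e) throws SAXException {
          throw e;
        }
      });

      parser.parse(new InputSource(new StringReader(DOCTYPE + methodCall)));
      return true;
    } catch (SAXException e) {
      return false;
    } catch (ParserConfigurationException | IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(isValidCall(
        "<methodCall><methodName>examples.add</methodName>"
      + "<params><param><value><int>4</int></value></param></params>"
      + "</methodCall>"));
    System.out.println(isValidCall("<methodCall><params/></methodCall>"));
  }
}
```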

DOM code to create a DTD schema for XML-RPC

try {
  DOMImplementation domImpl = DOMImplementationFactory.getDOMImplementation();
  if (domImpl.hasFeature("AS-EDIT", "3.0")) {
    DOMImplementationAS impl = (DOMImplementationAS) domImpl;
    ASModel dtd = impl.createAS(false, "DTD");

    // <!ELEMENT methodCall (methodName, params)>
    ASElementDecl  methodCall = dtd.createASElementDecl(null, "methodCall");
    ASContentModel methodCallModel = dtd.createASContentModel(
     "methodCall",  null, 1, 1, ASContentModel.AS_SEQUENCE);
    methodCall.setASContentModel(methodCallModel);
    methodCall.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
    
    // <!ELEMENT methodName (#PCDATA)>
    ASElementDecl methodName = dtd.createASElementDecl(null, "methodName");
    methodName.setIsPCDataOnly(true);
    
    // <!ELEMENT params (param*)>
    ASElementDecl  params = dtd.createASElementDecl(null, "params");
    ASContentModel paramsModel = dtd.createASContentModel(
     "params",  "", 0, ASContentModel.AS_UNBOUNDED, ASContentModel.AS_SEQUENCE);
    params.setASContentModel(paramsModel);
    params.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
    
    // <!ELEMENT param (value)>
    ASElementDecl  param = dtd.createASElementDecl(null, "param");
    ASContentModel paramModel = dtd.createASContentModel(
     "param",  "", 1, 1, ASContentModel.AS_SEQUENCE);
    param.setASContentModel(paramModel);
    param.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
    
    // <!ELEMENT value (i4|int|string|dateTime.iso8601|double|base64|struct|array)>
    ASElementDecl  value = dtd.createASElementDecl(null, "value");
    ASContentModel valueModel = dtd.createASContentModel(
     "value",  "", 1, 1, ASContentModel.AS_CHOICE);
    value.setASContentModel(valueModel);
    value.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
    
    
    // <!ELEMENT i4               (#PCDATA)>
    // <!ELEMENT int              (#PCDATA)>
    // <!ELEMENT string           (#PCDATA)>
    // <!ELEMENT dateTime.iso8601 (#PCDATA)>
    // <!ELEMENT double           (#PCDATA)>
    // <!ELEMENT base64           (#PCDATA)>
    ASElementDecl i4 = dtd.createASElementDecl(null, "i4");
    i4.setIsPCDataOnly(true);
    ASElementDecl intElement = dtd.createASElementDecl(null, "int");
    intElement.setIsPCDataOnly(true);
    ASElementDecl string = dtd.createASElementDecl(null, "string");
    string.setIsPCDataOnly(true);
    ASElementDecl dateTime = dtd.createASElementDecl(null, "dateTime.iso8601");
    dateTime.setIsPCDataOnly(true);
    ASElementDecl base64 = dtd.createASElementDecl(null, "base64");
    base64.setIsPCDataOnly(true);
    ASElementDecl doubleElement = dtd.createASElementDecl(null, "double");
    doubleElement.setIsPCDataOnly(true);
    
    
    // <!ELEMENT array (data)>
    ASElementDecl  array = dtd.createASElementDecl(null, "array");
    ASContentModel arrayModel = dtd.createASContentModel(
     "array",  "", 1, 1, ASContentModel.AS_SEQUENCE);
    array.setASContentModel(arrayModel);
    array.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);

    // <!ELEMENT data (value*)>
    ASElementDecl  data = dtd.createASElementDecl(null, "data");
    ASContentModel dataModel = dtd.createASContentModel(
     "data",  "", 0, ASContentModel.AS_UNBOUNDED, ASContentModel.AS_SEQUENCE);
    data.setASContentModel(dataModel);
    data.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);

    // <!ELEMENT struct (member+)>
    ASElementDecl  struct = dtd.createASElementDecl(null, "struct");
    ASContentModel structModel = dtd.createASContentModel(
     "struct",  "", 1, ASContentModel.AS_UNBOUNDED, ASContentModel.AS_SEQUENCE);
    struct.setASContentModel(structModel);
    struct.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);

    // <!ELEMENT member (name, value)>
    ASElementDecl  member = dtd.createASElementDecl(null, "member");
    ASContentModel memberModel = dtd.createASContentModel(
     "member",  "", 1, 1, ASContentModel.AS_SEQUENCE);
    member.setASContentModel(memberModel);
    member.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
    
    // <!ELEMENT name (#PCDATA)>
    ASElementDecl name = dtd.createASElementDecl(null, "name");
    name.setIsPCDataOnly(true);

    // <!ELEMENT methodResponse (params | fault)>
    ASElementDecl  methodResponse = dtd.createASElementDecl(null, "methodResponse");
    ASContentModel methodResponseModel = dtd.createASContentModel(
     "methodResponse",  "", 1, 1, ASContentModel.AS_CHOICE);
    methodResponse.setASContentModel(methodResponseModel);
    methodResponse.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
    
    // <!ELEMENT fault (value)>    
    ASElementDecl  fault = dtd.createASElementDecl(null, "fault");
    ASContentModel faultModel = dtd.createASContentModel(
     "fault",  "", 1, 1, ASContentModel.AS_SEQUENCE);
    fault.setASContentModel(faultModel);
    fault.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
    
    // Fill each content model with the declarations it contains
    methodCallModel.appendSubModel(methodName);
    methodCallModel.appendSubModel(params);
    paramsModel.appendSubModel(param);
    paramModel.appendSubModel(value);
    valueModel.appendSubModel(i4);
    valueModel.appendSubModel(intElement);
    valueModel.appendSubModel(string);
    valueModel.appendSubModel(dateTime);
    valueModel.appendSubModel(doubleElement);
    valueModel.appendSubModel(base64);
    valueModel.appendSubModel(struct);
    valueModel.appendSubModel(array);
    arrayModel.appendSubModel(data);
    dataModel.appendSubModel(value);
    structModel.appendSubModel(member);
    memberModel.appendSubModel(name);
    memberModel.appendSubModel(value);
    methodResponseModel.appendSubModel(params);
    methodResponseModel.appendSubModel(fault);
    faultModel.appendSubModel(value);

  }
}
catch (ASException e) {
  System.err.println(e);  
}

A W3C XML Schema Language schema for XML-RPC

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- The only two possible root elements are methodResponse and
       methodCall so these are the only two I use a top-level
       declaration for. --> 

  <xsd:element name="methodCall">
    <xsd:complexType>
      <xsd:all>
        <xsd:element name="methodName">
          <xsd:simpleType>
            <xsd:restriction base="ASCIIString">
              <xsd:pattern value="([A-Za-z0-9]|/|\.|:|_)*" />
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
        <xsd:element name="params" minOccurs="0" maxOccurs="1">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="param"  type="ParamType" 
                           minOccurs="0" maxOccurs="unbounded"/>
            </xsd:sequence>
          </xsd:complexType>
         </xsd:element>
      </xsd:all>
    </xsd:complexType>  
  </xsd:element>

  <xsd:element name="methodResponse">
    <xsd:complexType>
      <xsd:choice>
        <xsd:element name="params">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="param" type="ParamType"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="fault">
          <!-- What can appear inside a fault is very restricted -->
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="value">
                <xsd:complexType>
                  <xsd:sequence>
                    <xsd:element name="struct"> 
                      <xsd:complexType> 
                        <xsd:sequence> 
                          <xsd:element name="member" 
                                       type="MemberType">
                          </xsd:element>
                          <xsd:element name="member" 
                                       type="MemberType">
                          </xsd:element>
                        </xsd:sequence>
                      </xsd:complexType>
                    </xsd:element>
                  </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:choice>
    </xsd:complexType>  
  </xsd:element>

  <xsd:complexType name="ParamType">
    <xsd:sequence>
      <xsd:element name="value" type="ValueType"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="ValueType" mixed="true">
    <!-- A value is either plain text (an untyped string) or exactly
         one of these elements. mixed="true" with an optional choice
         accepts both, but it also accepts text mixed in around the
         element; I still need to figure out how to forbid that. -->
    <xsd:choice minOccurs="0">
      <xsd:element name="i4"            type="xsd:int"/>
      <xsd:element name="int"           type="xsd:int"/>
      <xsd:element name="string"        type="ASCIIString"/>
      <xsd:element name="double"        type="xsd:decimal"/>
      <xsd:element name="base64"        type="xsd:base64Binary"/>
      <xsd:element name="boolean"       type="NumericBoolean"/>
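      <!-- xsd:dateTime requires the extended form 1998-07-17T14:08:55;
           the compact form 19980717T14:08:55 shown in the XML-RPC
           specification does not validate against it. -->
      <xsd:element name="dateTime.iso8601" type="xsd:dateTime"/>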
      <xsd:element name="array"         type="ArrayType"/>
      <xsd:element name="struct"        type="StructType"/>
    </xsd:choice>
  </xsd:complexType>

  <xsd:complexType name="StructType">
    <xsd:sequence>
      <xsd:element name="member" type="MemberType" 
                   maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="MemberType">
    <xsd:sequence>
      <xsd:element name="name"  type="xsd:string" />
      <xsd:element name="value" type="ValueType"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="ArrayType">
    <xsd:sequence>
      <xsd:element name="data">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="value"  type="ValueType" 
                         minOccurs="0" maxOccurs="unbounded"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:simpleType name="ASCIIString">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="([ -~]|\n|\r|\t)*" />
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:simpleType name="NumericBoolean">
    <xsd:restriction base="xsd:boolean">
      <xsd:pattern value="0|1" />
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
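
Sample documents this schema should accept

As a quick check of the declarations above, here are two instance documents that should validate, assuming the schema declares no target namespace (XML-RPC itself uses none). Both follow the examples in the XML-RPC specification: a request that exercises methodCall, params, param and value, and a fault response carrying the two members the fault declaration requires. Treat them as an illustrative sketch rather than a test suite.

A request:

<?xml version="1.0"?>
<methodCall>
  <methodName>examples.getStateName</methodName>
  <params>
    <param>
      <value><i4>41</i4></value>
    </param>
  </params>
</methodCall>

A fault response:

<?xml version="1.0"?>
<methodResponse>
  <fault>
    <value>
      <struct>
        <member>
          <name>faultCode</name>
          <value><int>4</int></value>
        </member>
        <member>
          <name>faultString</name>
          <value><string>Too many parameters.</string></value>
        </member>
      </struct>
    </value>
  </fault>
</methodResponse>

A methodResponse that contained both params and fault, or a fault struct with only one member, should be rejected by the choice and sequence declarations respectively.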

DOM code to create an abstract schema for XML-RPC


Serializing an abstract schema to a file


Schema-guided Document-Editing Interfaces:


The NodeEditAS Interface


The ElementEditAS Interface


The CharacterDataEditAS Interface


The DocumentEditAS Interface


To Learn More



Copyright 2000-2002 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified March 17, 2002