Advanced XML Programming

The Bleeding Edge of XML

Elliotte Rusty Harold

XML and Web Services 2002 London

Monday, March 11, 2002

elharo@metalab.unc.edu

http://www.cafeconleche.org/

Outline

Part I: XML Infoset, Canonical XML, Digital Signatures, and Encryption
Part II: XML 1.1
Part III: XPath 2.0 and Beyond
Part IV: SAX 2.1
Part V: DOM Level 3

Part I: Semantics and Syntax

The Infoset is the unfortunate standard to which those in retreat from the radical and most useful implications of well-formedness have rallied. At its core the Infoset insists that there is 'more' to XML than the straightforward syntax of well-formedness. By imposing its canonical semantics the Infoset obviates the infinite other semantic outcomes which might be elaborated in particular unique circumstances from an instance of well-formed XML 1.0 syntax. The question we should be asking is not whether the Infoset has chosen the correct canonical semantics, but whether the syntactic possibilities of XML 1.0 should be curtailed in this way at all.

--Walter Perry on the xml-dev mailing list

A normal XML document

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/css" href="song.css"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG xmlns="http://www.cafeconleche.org/namespace/song"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO 
    xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
    ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <!-- The publisher is actually Polygram but I needed 
       an example of a general entity reference. -->
  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
<!-- You can tell what album I was 
     listening to when I wrote this example -->

A canonical XML document

<?xml-stylesheet type="text/css" href="song.css"?>
<SONG xmlns="http://www.cafeconleche.org/namespace/song" xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" HEIGHT="200" WIDTH="100" xlink:href="hotcop.jpg" xlink:show="onLoad" xlink:type="simple"></PHOTO>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  
  <PUBLISHER xlink:href="http://www.amrecords.com/" xlink:type="simple">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>

An org.w3c.dom.Document object formed by reading hotcop.xml

import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMHotCop {

  public static void main(String[] args) {

    DOMParser parser = new DOMParser();

    try {
      parser.parse("hotcop.xml");
      Document d = parser.getDocument();
    }
    catch (SAXException e) {
      System.err.println(e);
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

An encrypted hotcop.xml

Are these four the same thing or not?

The customary form of an XML document
The canonical form of an XML document
The object form of an XML document
The encrypted form of an XML document

What is the XML Infoset?

A W3C proposed recommendation providing "a consistent set of definitions for use in other specifications that need to refer to the information in a well-formed XML document." This is considerably weaker than originally planned.
What it used to be: A W3C standard for what is and is not significant in an XML document
Not everyone agrees that this is a good thing, or that this is the right list!

The Infoset defines 11 kinds of Information Items

The Document Information Item
Element Information Items
Attribute Information Items
Processing Instruction Information Items
Unparsed Entity Information Items
Unexpanded Entity Information Items
Character Information Items
Comment Information Items
The Document Type Declaration Information Item
Notation Information Items
Namespace Information Items

The Document Information Item

Represents the entire document; not just the root element
Properties:
- Children
  - One Element Information Item for the root element
  - One Comment Information Item for each Comment
  - One Processing Instruction Information Item for each Processing Instruction
- Document Element
- Character Encoding Scheme
- Notation Declarations
- Entity Declarations
- Base URI
- Standalone Declaration
- Version Declaration
- All declarations processed
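
Several of these properties have rough counterparts in the DOM. The following is a minimal sketch, assuming the JAXP parser bundled with the JDK and the DOM Level 3 accessors on org.w3c.dom.Document; the class name DocumentItemDemo and the sample document are made up for illustration, and the DOM is only an approximation of the Infoset.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class DocumentItemDemo {

  // Parses a document held in a string. Checked exceptions are wrapped
  // only to keep the sketch short.
  static Document parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      return factory.newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n"
        + "<?xml-stylesheet type=\"text/css\" href=\"song.css\"?>\n"
        + "<SONG><TITLE>Hot Cop</TITLE></SONG>\n"
        + "<!-- trailing comment -->";
    Document doc = parse(xml);
    // The document has three children: a PI, the root element, and a comment.
    System.out.println("children:         " + doc.getChildNodes().getLength());
    System.out.println("document element: " + doc.getDocumentElement().getTagName());
    System.out.println("version:          " + doc.getXmlVersion());
    System.out.println("encoding:         " + doc.getXmlEncoding());
    System.out.println("standalone:       " + doc.getXmlStandalone());
    System.out.println("base URI:         " + doc.getDocumentURI());
  }
}
```

The base URI prints as null here because the document was read from memory rather than from a URL.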

Elements

<PHOTO 
  xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
  ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  
<COMPOSER>Jacques Morali</COMPOSER>

<COMPOSER>
  <PERSON>
    <NAME>
      <FIRST>Henri</FIRST>
      <LAST>Belolo</LAST>
    </NAME>
  </PERSON>
</COMPOSER>

<rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#">
  <rdf:Description xmlns:dc="http://purl.org/dc/"
     about="http://www.ibiblio.org/examples/impressionists.xml">
    <dc:title> Impressionist Paintings </dc:title>
    <dc:creator> Elliotte Rusty Harold </dc:creator>
    <dc:description> 
      A list of famous impressionist paintings organized 
      by painter and date 
    </dc:description>
    <dc:date>2000-08-22</dc:date>
  </rdf:Description>
</rdf:RDF>

Element Information Items

An Element Information Item Includes:

namespace name; i.e. the absolute URI for the element's namespace
local name
prefix
children: a list of element, processing instruction, unexpanded entity, character, and comment information items, one for each element, processing instruction, unexpanded entity, data character, and comment appearing immediately within the current element
attributes: an unordered set of attribute information items, one for each of the attributes (specified or defaulted from the DTD) of this element. xmlns attributes are not included.
namespace attributes: an unordered set of attribute information items, one for each of the namespaces declared either in the start-tag of this element or defaulted from the DTD.
in-scope namespaces: An unordered set of namespace information items, one for each of the namespaces in effect for this element
base URI: The absolute URI of the external entity in which this element appears, as defined in XML Base. If this is not known, this property is null.
parent

Attributes

xlink:type="simple"
xlink:href="http://www.amrecords.com/"
xlink:type =  "simple"
xlink:show = "onLoad"
xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit"
WIDTH=" 100 "
HEIGHT=' 200 '

An Attribute Information Item Includes:

namespace name
local name
prefix
normalized value
specified: A flag indicating whether this attribute was actually specified in the start-tag of its element, or was defaulted from the DTD
attribute type:
- ID
- IDREF
- IDREFS
- ENTITY
- ENTITIES
- NMTOKEN
- NMTOKENS
- NOTATION
- CDATA
- ENUMERATED
owner element
references: only for IDREF, IDREFS, ENTITY, ENTITIES, and NOTATION type attributes; an ordered list of the things this attribute points to
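
The specified flag and the normalized value can be observed through the DOM. This is a sketch under assumptions: it uses the JDK's default JAXP parser in non-validating mode, the class name AttributeItemDemo is invented, and the small internal DTD subset is a cut-down stand-in for song.dtd. A conforming parser should also trim the NMTOKEN-typed WIDTH value, which the program prints rather than assumes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeItemDemo {

  // xlink:type is defaulted from the DTD; WIDTH is declared NMTOKEN;
  // HEIGHT is undeclared and therefore treated as CDATA.
  static final String XML =
      "<!DOCTYPE PHOTO [\n"
    + "  <!ATTLIST PHOTO xlink:type CDATA   #FIXED \"simple\"\n"
    + "                  WIDTH      NMTOKEN #REQUIRED>\n"
    + "]>\n"
    + "<PHOTO WIDTH=\" 100 \" HEIGHT=' 200 '/>";

  static Element parseRoot(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Element photo = parseRoot(XML);
    for (String name : new String[] {"xlink:type", "WIDTH", "HEIGHT"}) {
      Attr attr = photo.getAttributeNode(name);
      System.out.println(name + " = [" + attr.getValue() + "]"
          + " specified=" + attr.getSpecified()
          + " owner=" + attr.getOwnerElement().getTagName());
    }
  }
}
```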

Comments

  <!-- The publisher is actually Polygram but I needed 
       an example of a general entity reference. -->
<!--  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG> -->
<!-- You can tell what album I was 
     listening to when I wrote this example -->

A Comment Information Item Includes:

content
parent

A Processing Instruction Information Item Includes:

<?robots index="yes" follow="no"?>
<?php 
  mysql_connect("database.unc.edu", "clerk", "password"); 
  $result = mysql("CYNW", "SELECT LastName, FirstName FROM Employees 
    ORDER BY LastName, FirstName"); 
  $i = 0;
  while ($i < mysql_numrows ($result)) {
     $fields = mysql_fetch_row($result);
     echo "<person>$fields[1] $fields[0] </person>\r\n";
     $i++;
  }
  mysql_close();
?>

target
content
notation
base URI
parent

Characters

A character is one Unicode character in the content of an element, attribute value, comment or processing instruction data.
A Character Information Item includes:

character code
The Unicode value of the character in the range 0 to #x10FFFF.

element content whitespace
A flag indicating whether the character is whitespace appearing within element content

parent
Note that Unicode is not a two-byte character set
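
The last point matters in Java, where a char is a 16-bit UTF-16 code unit rather than a character. A small sketch (the class name CodePointDemo is invented for illustration) that counts characters the way the Infoset does, one per Unicode code point:

```java
class CodePointDemo {

  // Number of Unicode characters (code points), not UTF-16 code units.
  static int countCharacters(String s) {
    return s.codePointCount(0, s.length());
  }

  public static void main(String[] args) {
    // U+1D11E MUSICAL SYMBOL G CLEF lies outside the 16-bit range.
    String text = "A" + new String(Character.toChars(0x1D11E));
    System.out.println("UTF-16 code units: " + text.length());
    System.out.println("characters:        " + countCharacters(text));
    // Print each character code in the #x notation used by the XML specs.
    text.codePoints().forEach(
        cp -> System.out.println("#x" + Integer.toHexString(cp).toUpperCase()));
  }
}
```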

Namespaces

An element has one namespace information item for each namespace in scope on the element. This is not the same as the namespaces declared on the element.
A Namespace Information Item includes:
- prefix
- namespace name
There is no obvious representation of namespace information items in the syntax of an XML document.

These are namespace declaration attributes, not namespace information items:

xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#"
xmlns:dc="http://purl.org/dc/"
xmlns="http://www.w3.org/2000/svg"

Consider this document:

<svg:svg width="5cm" height="4cm"
 xmlns:svg="http://www.w3.org/2000/svg">
  <svg:desc>Two rectangles</svg:desc>
  <svg:rect x="1.5cm" y="3.5cm" width="12cm" height="9.9cm"/>
  <svg:rect x="2.5cm" y="2.8cm" width="3cm" height="17cm"/>
</svg:svg>

Each of the four elements has a namespace information item with the prefix svg and the namespace name http://www.w3.org/2000/svg
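
The difference between a declared namespace and an in-scope namespace shows up in the DOM as well. A minimal sketch, assuming a namespace-aware JAXP parser and the DOM Level 3 lookupNamespaceURI method; the class name InScopeNamespaceDemo is invented, and the document is a shortened version of the one above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class InScopeNamespaceDemo {

  static final String SVG_NS = "http://www.w3.org/2000/svg";
  static final String XML =
      "<svg:svg width=\"5cm\" height=\"4cm\" xmlns:svg=\"" + SVG_NS + "\">"
    + "<svg:desc>Two rectangles</svg:desc>"
    + "<svg:rect x=\"1.5cm\" y=\"3.5cm\" width=\"12cm\" height=\"9.9cm\"/>"
    + "</svg:svg>";

  static Document parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Document doc = parse(XML);
    Element rect = (Element) doc.getElementsByTagNameNS(SVG_NS, "rect").item(0);
    // The declaration attribute lives only on the root element...
    System.out.println("declared on rect: " + rect.hasAttribute("xmlns:svg"));
    // ...but the svg prefix is still in scope on the descendant.
    System.out.println("in scope on rect: " + rect.lookupNamespaceURI("svg"));
    System.out.println("local name: " + rect.getLocalName()
        + ", prefix: " + rect.getPrefix());
  }
}
```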

Document Type Declaration

<!DOCTYPE SONG SYSTEM "song.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                       "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

A Document Type Declaration Information Item includes:

SYSTEM ID
PUBLIC ID
children: only the comment and processing instruction information items in the internal DTD subset and external DTD subsets.
parent

Document Type Definition

<!ELEMENT SONG (TITLE, PHOTO?, COMPOSER+, PRODUCER*, 
 PUBLISHER*, LENGTH?, YEAR?, ARTIST+)>
<!ATTLIST SONG xmlns       CDATA #REQUIRED
               xmlns:xlink CDATA #REQUIRED>
<!ELEMENT TITLE (#PCDATA)>

<!ELEMENT PHOTO EMPTY>
<!ATTLIST PHOTO xlink:type CDATA #FIXED "simple"
                xlink:href CDATA #REQUIRED
                xlink:show CDATA #IMPLIED
                ALT        CDATA #REQUIRED
                WIDTH      CDATA #REQUIRED
                HEIGHT     CDATA #REQUIRED
>

<!ELEMENT COMPOSER (#PCDATA)>
<!ELEMENT PRODUCER (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ATTLIST PUBLISHER xlink:type CDATA #IMPLIED
                    xlink:href CDATA #IMPLIED
>

<!ELEMENT LENGTH (#PCDATA)>
<!-- This should be a four digit year like "1999",
     not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>

<!ELEMENT ARTIST (#PCDATA)>

There is no information item for this.
Comments and processing instructions in the DTD are reported as children of the Document Type Declaration information item
Notation and general entity declarations are reported as properties of the Document information item
Attribute types and default values are reported on the actual attributes in the document instance.
Everything else is not reported!

Entities

An XML document is made up of one or more physical storage units called entities
Entity references :
- Parsed internal general entity references like &amp;
- Parsed external general entity references
- Unparsed external general entity references
- External parameter entity references
- Internal parameter entity references
Reading an XML document is not the same thing as reading an XML file

The XML file contains entity references.
The XML document contains the entities' replacement text.
When you use a parser to read a document you'll get the text including characters like <. You will not see the entity references.
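
A short sketch of that last point, assuming the JDK's default JAXP parser; the class name EntityTextDemo and the entity name am are invented for illustration. The file contains references, but the application only ever sees the replacement text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class EntityTextDemo {

  // Returns the text content of the root element as the parser reports it.
  static String textOf(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)))
          .getDocumentElement().getTextContent();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // A predefined entity reference in the file...
    System.out.println(textOf("<PUBLISHER>A &amp; M Records</PUBLISHER>"));
    // ...and an internal general entity declared in the DTD subset.
    System.out.println(textOf(
        "<!DOCTYPE PUBLISHER [<!ENTITY am \"A &amp; M Records\">]>"
      + "<PUBLISHER>&am;</PUBLISHER>"));
  }
}
```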

Entity Information Items

Two kinds of entity information items:
- Unparsed Entity Information Item
- Unexpanded Entity Information Items
Other entities are not reported

Unparsed Entity Information Items

name
system identifier
public identifier
Notation

Unexpanded Entity Information Items

name
entity
parent

The Infoset Omits:

The internal and external DTD subsets; especially ELEMENT and ATTLIST declarations
Schema types
CDATA sections
Character references
Expanded, parsed entity references
Whether an empty element uses two tags or one
What kind of quotes surround attributes
Insignificant white space in attributes
White space that occurs between attributes
Attribute order

The PSVI

A schema assigns a type to each element
Schema validation produces a Post Schema Validation Infoset, PSVI for short
Schema aware applications using schema aware parsers and APIs can make use of the types of elements

Canonical XML

A W3C proposed standard serialization format of an XML document instance
Not everyone agrees that this is a good thing, or that this is the right format! It's totally unsuitable for editors and validation.
Based on the XPath 1.0 data model
Not really Infoset compatible
Something of this nature is nonetheless clearly needed for non-XML aware tools like digital signatures, change management, hash functions, and the like.

How are documents canonicalized?

The document is encoded in UTF-8
Line breaks are normalized to a linefeed (ASCII 10, \n)
Attribute values are normalized, as if by a validating processor
Character and parsed entity references are replaced
CDATA sections are replaced with their character content
The XML and document type declarations are removed
Empty elements are converted to start tag-end tag pairs
White space outside of the document element and within start and end tags is normalized
All white space in character content is retained (except for characters removed during linefeed normalization)
Attribute value delimiters are set to double quotes
Special characters in attribute values and character content are replaced by character references
Superfluous namespace declarations are removed from each element
Default attributes are added to each element
Lexicographic order is imposed on the namespace declarations and attributes of each element
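
A few of these steps can be sketched in plain DOM code. This is deliberately not a conforming canonicalizer: the class name MiniCanonicalizer is invented, and the sketch only expands empty elements, uses double quotes, escapes special characters, and orders namespace declarations and attributes. It skips comments and processing instructions, and it does not remove superfluous namespace declarations.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class MiniCanonicalizer {

  // Parses with namespace awareness so xmlns attributes can be recognized.
  static Element parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      return factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  static String canonicalize(Node node) {
    StringBuilder out = new StringBuilder();
    write(node, out);
    return out.toString();
  }

  private static void write(Node node, StringBuilder out) {
    if (node.getNodeType() == Node.ELEMENT_NODE) {
      out.append('<').append(node.getNodeName());
      for (Attr attr : sortedAttributes((Element) node)) {
        out.append(' ').append(attr.getName()).append("=\"")
           .append(escape(attr.getValue(), true)).append('"');
      }
      out.append('>');
      for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
        write(c, out);
      }
      // Empty elements always become a start-tag/end-tag pair.
      out.append("</").append(node.getNodeName()).append('>');
    } else if (node.getNodeType() == Node.TEXT_NODE
        || node.getNodeType() == Node.CDATA_SECTION_NODE) {
      out.append(escape(node.getNodeValue(), false));
    }
    // Comments and processing instructions are skipped in this sketch.
  }

  private static List<Attr> sortedAttributes(Element element) {
    NamedNodeMap map = element.getAttributes();
    List<Attr> attrs = new ArrayList<>();
    for (int i = 0; i < map.getLength(); i++) {
      attrs.add((Attr) map.item(i));
    }
    attrs.sort(MiniCanonicalizer::compare);
    return attrs;
  }

  private static boolean isDeclaration(Attr a) {
    return XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(a.getNamespaceURI());
  }

  // Namespace declarations first (default one leading, then by prefix),
  // then attributes by namespace URI and local name.
  private static int compare(Attr a, Attr b) {
    boolean aDecl = isDeclaration(a);
    boolean bDecl = isDeclaration(b);
    if (aDecl != bDecl) return aDecl ? -1 : 1;
    int byUri = aDecl ? 0 : uri(a).compareTo(uri(b));
    return byUri != 0 ? byUri : key(a).compareTo(key(b));
  }

  private static String uri(Attr a) {
    return a.getNamespaceURI() == null ? "" : a.getNamespaceURI();
  }

  private static String key(Attr a) {
    return isDeclaration(a) && a.getPrefix() == null ? "" : a.getLocalName();
  }

  private static String escape(String s, boolean inAttribute) {
    String r = s.replace("&", "&amp;").replace("<", "&lt;").replace("\r", "&#xD;");
    return inAttribute
        ? r.replace("\"", "&quot;").replace("\t", "&#x9;").replace("\n", "&#xA;")
        : r.replace(">", "&gt;");
  }
}
```

Calling MiniCanonicalizer.canonicalize(MiniCanonicalizer.parse(xml)) on a PHOTO element like the one in hotcop.xml yields the same attribute order as the canonical document shown earlier. For real signatures, use a tested implementation of the Canonical XML specification instead.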

Canonicalization software

XML Canonicalizer from IBM's XML Security Suite:

http://www.alphaworks.ibm.com/tech/xmlsecuritysuite
c14n.C14nDOM reads an XML document from stdin and writes the canonicalized output to stdout:
% java c14n.C14nDOM -xpath < hotcop.xml > canonicalized_hotcop.xml
-xpath option necessary to support the final draft of Canonical XML 1.0.

API in com.ibm.xml.dsig.Canonicalizer

package com.ibm.xml.dsig;

public interface Canonicalizer {

  public static final String W3C 
   = "http://www.w3.org/TR/2000/WD-xml-c14n-20000119";
  public static final java.lang.String W3C2 
   = "http://www.w3.org/TR/2001/REC-xml-c14n-20010315";
  public static final java.lang.String W3C2WC 
   = "http://www.w3.org/TR/2001/REC-xml-c14n-20010315#WithComments";
  public static final java.lang.String EXCLUSIVE;
  public static final java.lang.String EXCLUSIVEWC;

  public String getURI();
  public void canonicalize(org.w3c.dom.Node node, OutputStream stream)
   throws IOException;

}

Implementations include:
- com.ibm.xml.dsig.transform.ExclusiveC11r
- com.ibm.xml.dsig.transform.ExclusiveC11rWC
- com.ibm.xml.dsig.transform.W3CCanonicalizer
- com.ibm.xml.dsig.transform.W3CCanonicalizer2
- com.ibm.xml.dsig.transform.W3CCanonicalizer2WC
All can be created with no-args constructors

Apache XML Security Suite

http://xml.apache.org/security/
CVS only; won't build

A standard feature for DOM Level 3's DOMWriter

http://www.w3.org/TR/DOM-Level-3-ASLS/load-save.html#LS-Interfaces-DOMWriter

Digital Signatures

W3C/IETF Joint Proposed Recommendation, August 20, 2001
XML Signatures provide:

Integrity
Message authentication
Signer authentication

For data of any type

Not Just for Signing XML

Signed data can be located within the XML that includes the signature or elsewhere.
An enveloped signature is enclosed inside the XML element it signs.
An enveloping signature signs XML data it contains.
A detached signature signs data external to the Signature element, possibly in another document entirely.

Generic Digital Signature Process

The signature processor calculates a hash code for some data using a strong, one-way hash function.
The processor encrypts the hash code using a private key.
The verifier calculates the hash code for the data it's received.
It then decrypts the encrypted hash code using the public key to see if the hash codes match.
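
The generic process can be sketched with the java.security API, which performs the hash and the private-key operation in one Signature object. The choice of SHA-256 with RSA and a 2048-bit key is an assumption made for this example, not something the slides prescribe, and the class name HashAndSignDemo is invented.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

class HashAndSignDemo {

  static KeyPair newKeyPair() {
    try {
      KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
      generator.initialize(2048);
      return generator.generateKeyPair();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  // Signer: hash the data, then transform the hash with the private key.
  static byte[] sign(byte[] data, PrivateKey key) {
    try {
      Signature signer = Signature.getInstance("SHA256withRSA");
      signer.initSign(key);
      signer.update(data);
      return signer.sign();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  // Verifier: hash the received data and compare it, via the public key,
  // with the hash carried in the signature.
  static boolean verify(byte[] data, byte[] signature, PublicKey key) {
    try {
      Signature verifier = Signature.getInstance("SHA256withRSA");
      verifier.initVerify(key);
      verifier.update(data);
      return verifier.verify(signature);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    KeyPair keys = newKeyPair();
    byte[] sent = "<TITLE>Hot Cop</TITLE>".getBytes(StandardCharsets.UTF_8);
    byte[] altered = "<TITLE>Cold Cop</TITLE>".getBytes(StandardCharsets.UTF_8);
    byte[] signature = sign(sent, keys.getPrivate());
    System.out.println("unchanged data verifies: " + verify(sent, signature, keys.getPublic()));
    System.out.println("altered data verifies:   " + verify(altered, signature, keys.getPublic()));
  }
}
```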

XML Signature Process

The signature processor digests (calculates the hash code for) a data object.
The processor places the digest value in a Reference element inside the SignedInfo element.
The processor canonicalizes and digests the SignedInfo element.
The processor cryptographically signs that digest and places the result in the SignatureValue element.
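
These steps can be tried with the javax.xml.crypto.dsig API that later shipped in the JDK, which postdates these slides. The following is a sketch under assumptions: an enveloped signature over a tiny in-memory document, SHA-256 digests, and a throwaway RSA key. The class name EnvelopedSignatureDemo and the helper roundTrip are invented for illustration.

```java
import java.io.StringReader;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.util.Collections;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.DigestMethod;
import javax.xml.crypto.dsig.Reference;
import javax.xml.crypto.dsig.SignedInfo;
import javax.xml.crypto.dsig.Transform;
import javax.xml.crypto.dsig.XMLSignature;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.dom.DOMSignContext;
import javax.xml.crypto.dsig.dom.DOMValidateContext;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;
import javax.xml.crypto.dsig.spec.TransformParameterSpec;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class EnvelopedSignatureDemo {

  private static final String RSA_SHA256 =
      "http://www.w3.org/2001/04/xmldsig-more#rsa-sha256";

  static Document parse(String xml) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
  }

  // Appends a Signature element to the root element of the document.
  static void sign(Document doc, PrivateKey key) throws Exception {
    XMLSignatureFactory factory = XMLSignatureFactory.getInstance("DOM");
    // Reference: digest of the whole document minus the signature itself.
    Reference reference = factory.newReference("",
        factory.newDigestMethod(DigestMethod.SHA256, null),
        Collections.singletonList(
            factory.newTransform(Transform.ENVELOPED, (TransformParameterSpec) null)),
        null, null);
    // SignedInfo: the part that is canonicalized and actually signed.
    SignedInfo signedInfo = factory.newSignedInfo(
        factory.newCanonicalizationMethod(
            CanonicalizationMethod.INCLUSIVE, (C14NMethodParameterSpec) null),
        factory.newSignatureMethod(RSA_SHA256, null),
        Collections.singletonList(reference));
    DOMSignContext context = new DOMSignContext(key, doc.getDocumentElement());
    factory.newXMLSignature(signedInfo, null).sign(context);
  }

  static boolean isValid(Document doc, PublicKey key) throws Exception {
    Node signature = doc.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature").item(0);
    DOMValidateContext context = new DOMValidateContext(key, signature);
    XMLSignatureFactory factory = XMLSignatureFactory.getInstance("DOM");
    return factory.unmarshalXMLSignature(context).validate(context);
  }

  // Signs a small document, optionally alters it, and reports validity.
  static boolean roundTrip(boolean tamper) {
    try {
      KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
      generator.initialize(2048);
      KeyPair keys = generator.generateKeyPair();
      Document doc = parse("<SONG><TITLE>Hot Cop</TITLE></SONG>");
      sign(doc, keys.getPrivate());
      if (tamper) {
        doc.getElementsByTagName("TITLE").item(0).setTextContent("Cold Cop");
      }
      return isValid(doc, keys.getPublic());
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("untouched document valid: " + roundTrip(false));
    System.out.println("altered document valid:   " + roundTrip(true));
  }
}
```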

XML Digital Signature software

SampleSign2 and VerifyGUI from IBM's XML Security Suite: http://www.alphaworks.ibm.com/tech/xmlsecuritysuite
First use the JDK's keytool to generate a key:
% keytool -genkey -dname "CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, S=New York, C=US" -alias elharo -storepass mypassword -keypass mykeypassword
SampleSign2 reads an XML document from stdin and writes the signature to stdout:
C:\> java SampleSign2 elharo mypassword mykeypassword -ext http://www.cafeconleche.org/slides/hoffman/fundamentals/examples/hotcop.xml > hotcop_signature.xml
Key store: C:\Documents and Settings\Administrator\.keystore
Sign: 7030ms
VerifyGUI reads signature from stdin and warns of changes to signed content.
C:\>java VerifyGUI < hotcop_signature.xml
The signature has a KeyValue element.
The signature has one or more X509Data elements.
Checks an X509Data:
It has 1 certificate(s).
Certificate Information:
Version: 1
Validity: OK
SubjectDN: CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US
IssuerDN: CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US
Serial#: 983556890
Time to verify: 951 [msec]

A Detached Signature for hotcop.xml

<?xml version='1.0' encoding='UTF-8'?>
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
  <SignedInfo>
    <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2000/WD-xml-c14n-20000119"/>
    <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"/>
    <Reference URI="http://www.cafeconleche.org/slides/hoffman/fundamentals/examples/hotcop.xml">
      <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
      <DigestValue>nvfYilfgN/rICyzhGmjidKCFoC8=</DigestValue>
    </Reference>
  </SignedInfo>
  <SignatureValue>
    hfowa4qdbuMkoZfX1/VXd4UBpIpZMM5+6CElmY7jOIKFqvXq5A5VKw==
  </SignatureValue>
  <KeyInfo>
    <KeyValue>
      <DSAKeyValue>
        <P>
          /X9TgR11EilS30qcLuzk5/YRt1I870QAwx4/gLZRJmlFXUAiUftZPY1Y+r/F9bow9s
          ubVWzXgTuAHTRv8mZgt2uZUKWkn5/oBHsQIsJPu6nX/rfGG/g7V+fGqKYVDwT7g/bT
          xR7DAjVUE1oWkTL2dfOuK2HXKu/yIgMZndFIAcc=
        </P>
        <Q>l2BQjxUjC8yykrmCouuEC/BYHPU=</Q>
        <G>
          9+GghdabPd7LvKtcNrhXuXmUr7v6OuqC+VdMCz0HgmdRWVeOutRZT+ZxBxCBgLRJFn
          Ej6EwoFhO3zwkyjMim4TwWeotUfI0o4KOuHiuzpnWRbqN/C/ohNWLx+2J6ASQ7zKTx
          vqhRkImog9/hWuWfBpKLZl6Ae1UlZAFMO/7PSSo=
        </G>
        <Y>
          6jKpNnmkkWeArsn5Oeeg2njcz+nXdk0f9kZI892ddlR8Lg1aMhPeFTYuoq3I6neFlb
          BjWzuktNZKiXYBfKsSTB8U09dTiJo2ir3HJuY7eW/p89osKMfixPQsp9vQMgzph6Qa
          lY7j4MB7y5ROJYsTr1/fFwmj/yhkHwpbpzed1LE=
        </Y>
      </DSAKeyValue>
    </KeyValue>
    <X509Data>
      <X509IssuerSerial>
        <X509IssuerName>CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US</X509IssuerName>
        <X509SerialNumber>983556890</X509SerialNumber></X509IssuerSerial>
      <X509SubjectName>CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US</X509SubjectName>
      <X509Certificate>
MIIDLzCCAu0CBDqf4xowCwYHKoZIzjgEAwUAMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcg
WW9yazERMA8GA1UEBxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlN
ZXRyb3RlY2gxHjAcBgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDAeFw0wMTAzMDIxODE0NTBa
Fw0wMTA1MzExODE0NTBaMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcgWW9yazERMA8GA1UE
BxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlNZXRyb3RlY2gxHjAc
BgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDCCAbgwggEsBgcqhkjOOAQBMIIBHwKBgQD9f1OB
HXUSKVLfSpwu7OTn9hG3UjzvRADDHj+AtlEmaUVdQCJR+1k9jVj6v8X1ujD2y5tVbNeBO4AdNG/y
ZmC3a5lQpaSfn+gEexAiwk+7qdf+t8Yb+DtX58aophUPBPuD9tPFHsMCNVQTWhaRMvZ1864rYdcq
7/IiAxmd0UgBxwIVAJdgUI8VIwvMspK5gqLrhAvwWBz1AoGBAPfhoIXWmz3ey7yrXDa4V7l5lK+7
+jrqgvlXTAs9B4JnUVlXjrrUWU/mcQcQgYC0SRZxI+hMKBYTt88JMozIpuE8FnqLVHyNKOCjrh4r
s6Z1kW6jfwv6ITVi8ftiegEkO8yk8b6oUZCJqIPf4VrlnwaSi2ZegHtVJWQBTDv+z0kqA4GFAAKB
gQDqMqk2eaSRZ4Cuyfk556DaeNzP6dd2TR/2Rkjz3Z12VHwuDVoyE94VNi6ircjqd4WVsGNbO6S0
1kqJdgF8qxJMHxTT11OImjaKvccm5jt5b+nz2iwox+LE9Cyn29AyDOmHpBqVjuPgwHvLlE4lixOv
X98XCaP/KGQfClunN53UsTALBgcqhkjOOAQDBQADLwAwLAIUODqxsFzS96BjrVA4LVo5FzuWBRMC
FC0xfXxbaJaCJuVqtcBv4bqwV0EX
      </X509Certificate>
    </X509Data>
  </KeyInfo>
</Signature>

XML Encryption

Can encrypt:
- An XML element
- The content of an XML element
- Arbitrary binary data with a URI
The ciphertext can be stored in an EncryptedData element or referenced (through a URI) by an EncryptedData element.

XML Encryption Algorithms

Arbitrary encryption algorithms are supported.
Required encryption algorithms include:
- AES with CMS keylength
- 3DES
- RSA-OAEP used with AES
- RSA-v1.5 used with 3DES
Required key transport algorithms include:
- RSA-OAEP used with AES
- RSA-v1.5 used with 3DES
Required Symmetric Key Wrap algorithms include:
- AES KeyWrap
- CMS-KeyWrap-3DES

Complete standard algorithm list

From the spec:

Block Encryption

REQUIRED TRIPLEDES
http://www.w3.org/2001/04/xmlenc#tripledes-cbc
REQUIRED AES-128
http://www.w3.org/2001/04/xmlenc#aes128-cbc
REQUIRED AES-256
http://www.w3.org/2001/04/xmlenc#aes256-cbc
OPTIONAL AES-192
http://www.w3.org/2001/04/xmlenc#aes192-cbc

Key Transport

REQUIRED RSA-v1.5
http://www.w3.org/2001/04/xmlenc#rsa-1_5
REQUIRED RSA-OAEP
http://www.w3.org/2001/04/xmlenc#rsa-oaep-mgf1p

Key Agreement

OPTIONAL Diffie-Hellman
http://www.w3.org/2001/04/xmlenc#dh

Symmetric Key Wrap

REQUIRED TRIPLEDES KeyWrap
http://www.w3.org/2001/04/xmlenc#kw-tripledes
REQUIRED AES-128 KeyWrap
http://www.w3.org/2001/04/xmlenc#kw-aes128
REQUIRED AES-256 KeyWrap
http://www.w3.org/2001/04/xmlenc#kw-aes256
OPTIONAL AES-192 KeyWrap
http://www.w3.org/2001/04/xmlenc#kw-aes192

Message Digest

REQUIRED SHA1
http://www.w3.org/2000/09/xmldsig#sha1
RECOMMENDED SHA256
http://www.w3.org/2001/04/xmlenc#sha256
OPTIONAL SHA512
http://www.w3.org/2001/04/xmlenc#sha512
OPTIONAL RIPEMD-160
http://www.w3.org/2001/04/xmlenc#ripemd160

Message Authentication

RECOMMENDED XML Digital Signature
http://www.w3.org/TR/2001/CR-xmldsig-core-20010419/

Canonicalization

OPTIONAL Canonical XML with Comments
http://www.w3.org/TR/2001/REC-xml-c14n-20010315#WithComments
OPTIONAL Canonical XML (omits comments)
http://www.w3.org/TR/2001/REC-xml-c14n-20010315

Encoding

REQUIRED base64
http://www.w3.org/2000/09/xmldsig#base64

XML Encryption Syntax

Namespace URI http://www.w3.org/2001/04/xmlenc# (Normally mapped to the xenc prefix)
Uses some elements from XML digital signatures for keys

Typical form:

<EncryptedData Id="unique_value" 
      Type="http://www.w3.org/2001/04/xmlenc#Element |
      http://www.w3.org/2001/04/xmlenc#Content |
      MIME media type URI">
  <EncryptionMethod Algorithm="URI"/>
  <ds:KeyInfo>
    <ds:KeyName>Plain text name of key</ds:KeyName>
    <ds:RetrievalMethod URI="key location"
     Type="http://www.w3.org/2001/04/xmlenc#EncryptedKey" />
  </ds:KeyInfo>
  <CipherData Nonce="Base-64 encoded salt">
    <CipherValue>Base-64 encoded cipher text</CipherValue>
    <CipherReference URI="URL of cipher text">
      <Transforms>
        <ds:Transform 
          Algorithm="http://www.w3.org/TR/1999/REC-xpath-19991116">
          <ds:XPath xmlns:rep="http://www.example.org/repository">
             self::text()[parent::CipherValue[@id="example1"]]
          </ds:XPath>
        </ds:Transform>
        <ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#base64"/>
      </Transforms>
    </CipherReference>
  </CipherData>
</EncryptedData>

At a minimum, each EncryptedData must contain a CipherData which contains either a CipherValue or a CipherReference. Everything else is optional.
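
The JDK has no standard XML Encryption API, so the following is only a hand-rolled sketch of that minimal shape: an EncryptedData element holding an EncryptionMethod and a CipherData with a base64 CipherValue. The class name EncryptedDataSketch is invented. The sketch assumes AES-128 in CBC mode with the IV prepended to the ciphertext, and it uses Java's PKCS5 padding for brevity, so its output is not guaranteed to interoperate with conforming XML Encryption processors.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class EncryptedDataSketch {

  static final String XENC = "http://www.w3.org/2001/04/xmlenc#";

  static SecretKey newKey() {
    try {
      KeyGenerator generator = KeyGenerator.getInstance("AES");
      generator.init(128);
      return generator.generateKey();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // Builds EncryptedData/CipherData/CipherValue around base64(IV + ciphertext).
  static Element encrypt(String serializedElement, SecretKey key) {
    try {
      byte[] iv = new byte[16];
      new SecureRandom().nextBytes(iv);
      Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
      cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
      byte[] encrypted = cipher.doFinal(serializedElement.getBytes(StandardCharsets.UTF_8));
      byte[] payload = new byte[iv.length + encrypted.length];
      System.arraycopy(iv, 0, payload, 0, iv.length);
      System.arraycopy(encrypted, 0, payload, iv.length, encrypted.length);

      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
      Element data = doc.createElementNS(XENC, "EncryptedData");
      data.setAttribute("Type", XENC + "Element");
      Element method = doc.createElementNS(XENC, "EncryptionMethod");
      method.setAttribute("Algorithm", XENC + "aes128-cbc");
      Element cipherData = doc.createElementNS(XENC, "CipherData");
      Element cipherValue = doc.createElementNS(XENC, "CipherValue");
      cipherValue.setTextContent(Base64.getEncoder().encodeToString(payload));
      cipherData.appendChild(cipherValue);
      data.appendChild(method);
      data.appendChild(cipherData);
      return data;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // Reverses encrypt(): the first 16 bytes of the payload are the IV.
  static String decrypt(Element encryptedData, SecretKey key) {
    try {
      String base64 = encryptedData.getElementsByTagNameNS(XENC, "CipherValue")
          .item(0).getTextContent();
      byte[] payload = Base64.getDecoder().decode(base64);
      Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
      cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(payload, 0, 16));
      byte[] plain = cipher.doFinal(payload, 16, payload.length - 16);
      return new String(plain, StandardCharsets.UTF_8);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    SecretKey key = newKey();
    Element data = encrypt("<Issuer>Citibank</Issuer>", key);
    System.out.println(data.getElementsByTagNameNS(XENC, "CipherValue")
        .item(0).getTextContent());
    System.out.println(decrypt(data, key));
  }
}
```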

An XML Document containing sensitive information

<?xml version='1.0'?>
<PaymentInfo xmlns="http://example.org/payment">
  <Name>Elliotte Rusty Harold</Name>
  <CreditCard Limit='1000' Currency='USD'>
    <Number>1234 5678 9012 3456</Number>
    <Issuer>Citibank</Issuer>
    <Expiration>03/02</Expiration>
  </CreditCard>
</PaymentInfo>

An XML Document containing an encrypted element

<?xml version='1.0'?>
<PaymentInfo xmlns="http://example.org/payment">
  <Name>Elliotte Rusty Harold</Name>
  <EncryptedData Type='http://www.w3.org/2001/04/xmlenc#Element'
     xmlns='http://www.w3.org/2001/04/xmlenc#'>
    <EncryptionMethod
      Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
    <CipherData>
      <CipherValue>A23B45C56CABE4BE33327</CipherValue>
    </CipherData>
  </EncryptedData>
</PaymentInfo>

An XML Document containing encrypted element data

<?xml version='1.0'?>
<PaymentInfo xmlns="http://example.org/payment">
  <Name>Elliotte Rusty Harold</Name>
  <CreditCard Limit="1000" Currency="USD">
     <EncryptedData Type="http://www.w3.org/2001/04/xmlenc#Content"
        xmlns="http://www.w3.org/2001/04/xmlenc#">
      <EncryptionMethod
        Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
      <CipherData>
        <CipherValue>A23B45C56CABE4BE3</CipherValue>
      </CipherData>
    </EncryptedData>
  </CreditCard>
</PaymentInfo>

An XML Document containing encrypted text

<?xml version='1.0'?>
<PaymentInfo xmlns="http://example.org/payment">
  <Name>Elliotte Rusty Harold</Name>
  <CreditCard Limit='1000' Currency='USD'>
    <Number>
      <EncryptedData Type="http://www.w3.org/2001/04/xmlenc#Content"
                     xmlns="http://www.w3.org/2001/04/xmlenc#">
        <EncryptionMethod
          Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
        <CipherData>
          <CipherValue>A23B45C56CABE4BE</CipherValue>
        </CipherData>
      </EncryptedData>
    </Number>
    <Issuer>Citibank</Issuer>
    <Expiration>03/02</Expiration>
  </CreditCard>
</PaymentInfo>

A completely encrypted XML Document

<?xml version='1.0'?>
<EncryptedData 
   Type="http://www.isi.edu/in-notes/iana/assignments/media-types/text/xml"
   xmlns="http://www.w3.org/2001/04/xmlenc#">
  <EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
  <CipherData>
    <CipherValue>A23B45C56CABE4BE7687989219C4E5DEADBEEFCAFEBABE</CipherValue>
  </CipherData>
</EncryptedData>

XML Encryption Software

xss4j, IBM's XML Security Suite:

http://www.alphaworks.ibm.com/tech/xmlsecuritysuite
enc.XMLCipher2 reads an XML document and encrypts the part of it specified by an XPath expression using a template file:
% java enc.XMLCipher2 -e keyinfo.xml hotcop.xml /SONG/PUBLISHER template1.xml
API

Apache XML Security Suite

http://xml.apache.org/security/
org.apache.xml.security.c14n.Canonicalizer
I have not been able to build this. No precompiled binaries yet.

JSR-106: XML Digital Encryption APIs

http://jcp.org/jsr/detail/106.jsp

Issues XML Encryption doesn't address

Authentication
Authorization
Access Control

To Learn More

XML Infoset Specification: http://www.w3.org/TR/xml-infoset
Canonical XML Specification: http://www.w3.org/TR/xml-c14n
XML Signature Specification: http://www.w3.org/TR/xmldsig-core/
XML Encryption Requirements: http://www.w3.org/TR/xml-encryption-req
XML Encryption Syntax and Processing: http://www.w3.org/TR/xmlenc-core/

Part II: XML 1.1

Discriminating against languages simply because their scripts were not encoded in Unicode 2.0 is inherently unjust.

--XML Blueberry Requirements

New features

Changes the definition of white space
Enables native language markup in Ethiopic, Burmese, and Cambodian
Breaks compatibility with XML 1.0

White Space

XML 1.0 defines white space thusly:
[3] S ::= (#x20 | #x9 | #xD | #xA)+
With XML 1.1 this becomes
[3] S ::= (#x9 | #x20 | #xA | #xD | #x85 | #x2028)+
Supports IBM mainframe editors
Breaks everybody else's software
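
The two productions translate directly into code. A minimal sketch (the class name XmlWhiteSpace is invented) showing why software that hard-codes the 1.0 definition disagrees with a 1.1 parser about #x85 and #x2028:

```java
class XmlWhiteSpace {

  // XML 1.0: S ::= (#x20 | #x9 | #xD | #xA)+
  static boolean isXml10Space(int c) {
    return c == 0x20 || c == 0x9 || c == 0xD || c == 0xA;
  }

  // Proposed XML 1.1: adds NEL (#x85) and LINE SEPARATOR (#x2028).
  static boolean isXml11Space(int c) {
    return isXml10Space(c) || c == 0x85 || c == 0x2028;
  }

  public static void main(String[] args) {
    int[] samples = {0x20, 0x9, 0xA, 0xD, 0x85, 0x2028, 'A'};
    for (int c : samples) {
      System.out.println("#x" + Integer.toHexString(c).toUpperCase()
          + " 1.0=" + isXml10Space(c) + " 1.1=" + isXml11Space(c));
    }
  }
}
```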

Native language markup

Currently only scripts defined in Unicode 2.0 are allowed in XML element and attribute names
All scripts defined in Unicode are allowed in element and attribute content
Unicode 3.0 adds:
- Ethiopic (Amharic, Geez, etc.)
- Burmese
- Cambodian
- Mongolian
- Dhivehi
- Yi syllabary
Also:
- Cherokee
- Canadian aboriginal languages
Perhaps:
- Japanese
- Cantonese
Is this enough to justify breaking compatibility?

Name characters

XML 1.0 explicitly lists them; everything not permitted is forbidden

XML 1.1:


[4] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#x02FF] 
                    | [#x0370-#x037D] | [#x037F-#x2027] 
                    | [#x202A-#x218F] | [#x2800-#xD7FF] 
                    | [#xE000-#xFDCF] | [#xFDE0-#xFFEF] 
                    | [#x10000-#x10FFFF]
[4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F]

Many of these characters aren't defined yet in Unicode
Many of these characters are very surprising; e.g. musical and mathematical symbols
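
The productions above are simple range checks, which is part of their appeal. A sketch that transcribes them as written on this slide (the class name Xml11Names is invented, and the ranges are copied from the draft quoted here, not from any later revision):

```java
class Xml11Names {

  private static boolean in(int c, int low, int high) {
    return c >= low && c <= high;
  }

  // Production [4] as quoted above.
  static boolean isNameStartChar(int c) {
    return c == ':' || c == '_' || in(c, 'A', 'Z') || in(c, 'a', 'z')
        || in(c, 0xC0, 0x2FF) || in(c, 0x370, 0x37D) || in(c, 0x37F, 0x2027)
        || in(c, 0x202A, 0x218F) || in(c, 0x2800, 0xD7FF)
        || in(c, 0xE000, 0xFDCF) || in(c, 0xFDE0, 0xFFEF)
        || in(c, 0x10000, 0x10FFFF);
  }

  // Production [4a] as quoted above.
  static boolean isNameChar(int c) {
    return isNameStartChar(c) || c == '-' || c == '.' || in(c, '0', '9')
        || c == 0xB7 || in(c, 0x300, 0x36F);
  }

  // Walks the string by code point so characters above #xFFFF are handled.
  static boolean isName(String s) {
    if (s.isEmpty()) return false;
    int first = s.codePointAt(0);
    if (!isNameStartChar(first)) return false;
    for (int i = Character.charCount(first); i < s.length(); ) {
      int c = s.codePointAt(i);
      if (!isNameChar(c)) return false;
      i += Character.charCount(c);
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println("SONG: " + isName("SONG"));
    System.out.println("-SONG: " + isName("-SONG"));
    // U+1200 is an Ethiopic syllable; U+1D11E is a musical symbol.
    System.out.println("U+1200: " + isNameStartChar(0x1200));
    System.out.println("U+1D11E: " + isNameStartChar(0x1D11E));
  }
}
```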

Harm Reduction proposals

1.0.1
Mandate version 1.0 for documents that don't use 1.1 features
Well-formedness error?
Non-fatal error?
All of these were rejected by the working group.

Part III: XPath 2.0 and Beyond

In SQL, the query language is not expressed in tables and rows. In XQuery, the query language is not expressed in XML. Why is this a problem?

--Jonathan Robie on the xml-dev mailing list

XPath 2.0

Used for XSLT 2.0 and XQuery
Schema Aware
Partially implemented by Michael Kay's Saxon 7.0, http://saxon.sourceforge.net/

XPath 2.0 Goals

Simplify manipulation of XML Schema-typed content
Simplify manipulation of string content
Support related XML standards
Improve ease of use
Improve interoperability
Improve internationalization (i18n) support
Maintain backward compatibility
Enable improved processor efficiency

XPath 2.0 Requirements

Must express data model in terms of the Infoset
Must provide common core syntax and semantics for XSLT 2.0 and XML Query 1.0
Must support explicit "for any" or "for all" comparison and equality semantics
Must add min() and max() functions
Any valid XPath 1.0 expression SHOULD also be a valid XPath 2.0 expression when operating in the absence of XML Schema type information.
Should provide intersection and difference functions
Must loosen restrictions on location steps
Must provide a conditional expression (e.g. ternary ?: operator in Java and C)
Should support additional string functions, possibly including space padding, string replacement and conversion to upper or lower case
Must support regular expression string matching using the regexp syntax from schemas
Must add support for XML Schema primitive datatypes
Should add support for XML Schema structures

XPath 1.0 Data Model

(Adapted from Jeni Tennison)

The first class objects are strings, numbers, booleans, and node-sets (plus result tree fragments for XSLT)
Node-sets contain nodes (which are not first-class objects)
Nodes have various properties, including children - a node set (the order of the children can be worked out from the nodes' document order)
Seven node types: document, element, attribute, text, namespace, processing instruction, and comment
There are conceptually two kinds of node-sets:
- Node-sets containing new nodes (result tree fragments) can only be generated using XSLT
- Node-sets containing existing nodes can only be generated using XPath
No list data types: there are node-sets, but no number sets
Not Infoset compatible
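
The four first-class types map onto the return types of the javax.xml.xpath API, which implements XPath 1.0. A minimal sketch, with an invented class name XPath1TypesDemo and a cut-down version of the song document:

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPath1TypesDemo {

  static final String XML = "<SONG><TITLE>Hot Cop</TITLE>"
      + "<COMPOSER>Jacques Morali</COMPOSER><COMPOSER>Henri Belolo</COMPOSER>"
      + "<COMPOSER>Victor Willis</COMPOSER><YEAR>1978</YEAR></SONG>";

  // Evaluates an XPath 1.0 expression and converts the result to the
  // requested type: STRING, NUMBER, BOOLEAN, or NODESET.
  static Object evaluate(String expression, QName type) {
    try {
      return XPathFactory.newInstance().newXPath()
          .evaluate(expression, new InputSource(new StringReader(XML)), type);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String title = (String) evaluate("string(/SONG/TITLE)", XPathConstants.STRING);
    Double count = (Double) evaluate("count(/SONG/COMPOSER)", XPathConstants.NUMBER);
    Boolean recent = (Boolean) evaluate("/SONG/YEAR > 1970", XPathConstants.BOOLEAN);
    NodeList composers = (NodeList) evaluate("/SONG/COMPOSER", XPathConstants.NODESET);
    System.out.println("string:   " + title);
    System.out.println("number:   " + count);
    System.out.println("boolean:  " + recent);
    System.out.println("node-set: " + composers.getLength() + " nodes");
  }
}
```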

XPath 2.0 Data Model

(Adapted from Jeni Tennison)

The first class object type is a sequence; i.e. an ordered list
Sequences contain items of two types: simple typed values or nodes. (They may not contain other sequences.)
A sequence containing one item is the same as the item.
Simple typed values have W3C XML Schema Language simple types: xsd:gYear, xsd:int, xsd:decimal, xsd:date, etc.
Seven node types: document, element, attribute, text, namespace, processing instruction, and comment
Nodes have these properties:
- node-kind: either "document", "element", "attribute", "text", "namespace", "processing-instruction", or "comment".
- name: a sequence containing one expanded QName if the node has a name (elements, attributes, etc.) or an empty sequence if the node doesn't have a name (comments, text nodes, etc.)
- parent: a sequence containing the unique parent node; the empty sequence is returned for parentless nodes, particularly document and namespace nodes
- base-uri: URI from which this particular node came
- string-value: same as XPath 1.0
- typed-value: a sequence of simple typed values corresponding to the node (always the empty sequence for anything other than elements and attributes)
- children: a sequence of child nodes; empty except for element and document nodes
- attributes: a sequence of attribute nodes; empty except for element nodes
- namespaces: a sequence of namespace nodes in-scope on the node
- declaration: a sequence containing 0 or 1 schema component
- type: a sequence containing 0 or 1 schema component
- unique-ID: a sequence containing 0 or 1 xsd:ID type node
Infoset compatible

XPath Comments

{-- This is an XPath comment --}

<xsl:apply-templates 
 select="{-- The difference between the context node and the 
             current node is crucial here --}
 ../composition[@composer=current()/@id]"/>

Namespace wildcards

<xsl:template match="*:set">
  This matches MathML set elements, SVG set elements, set
  elements in no namespace at all, etc. 
</xsl:template>

Can use functions as location steps

The document() function returns the root of a document at a given URL
document("http://www.cafeconleche.org/")//today

Can use parenthesized expressions as location steps

/child::contacts/(child::personal | child::business)/child::name
Abbreviated: /contacts/(personal | business)/name

Dereference steps

Map an IDREF attribute node to the element it refers to

Composers and their compositions are linked through the ID-type id attribute of the composer element and the IDREF-type composers attribute of the composition element:

  <composer id="c3">
    <name>
      <first_name>Beth</first_name> 
      <middle_name></middle_name> 
      <last_name>Anderson</last_name>
    </name>
  </composer>
    
  <composition composers="c3">
    <title>Trio: Dream in D</title>
    <date><year>(1980)</year></date> 
    <length>10'</length>
    <instruments>fl, pn, vc, or vn, pn, vc</instruments>
    <description>
      Rhapsodic. Passionate. Available on CD 
      <cite><a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid%3D913265342/sr%3D1-2/">Two by Three</a></cite> 
      from North/South Consonance (1998).
    </description> 
    <publisher></publisher>
  </composition>

With XPath 1.0:

<xsl:template match="composition">
  <h2>
    <xsl:value-of select="title"/> by
    <xsl:value-of select="../composer[@id=current()/@composers]"/>
  </h2>
</xsl:template>

With XPath 2.0:

<xsl:template match="composition">
  <h2>
    <xsl:value-of select="title"/> by
    <xsl:value-of select="@composers=>composer/name"/>
  </h2>
</xsl:template>

Constructing sequences

(1, 3, 2, 34, 76, "fnord")
(1 to 12)
Using constructors: (xf:date("2002-03-11"), xf:date("2002-03-12"), xf:date("2002-03-13"), xf:date("2002-03-14"), xf:date("2002-03-15"))
Sequences can have mixed types: (xf:date("2002-03-11"), "Hello", 15)
Sequences do not nest; that is, a sequence cannot be a member of a sequence
Sequences are not sets: they are ordered and can contain duplicates
A single item is the same as a one-element sequence containing that item.
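
The flattening rule above can be modelled outside XPath. The following is a minimal Python sketch, assuming Python lists stand in for sequences; the helper names seq and to are hypothetical and are not part of any XPath API. It does not model the rule that a single item is the same as a one-item sequence, since a Python list and its only member remain distinct.

```python
def seq(*items):
    """Build a flat sequence: nested sequences are spliced in, never nested."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(seq(*item))  # splice, so sequences never nest
        else:
            flat.append(item)
    return flat


def to(first, last):
    """Model of the range expression (first to last), inclusive at both ends."""
    return list(range(first, last + 1))


print(seq(1, 3, 2, 34, 76, "fnord"))    # mixed types are allowed
print(seq(1, seq(2, 3), seq(seq(4))))   # [1, 2, 3, 4]
print(seq(7, 7, 2))                     # order and duplicates are kept
print(to(1, 12))
```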

Sequence example

<?xml version="1.0"?> 
<xsl:stylesheet version="2.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <numbers>
      <xsl:for-each select="(1 to 10)">
        <integer>
          <xsl:value-of select="."/>
        </integer>
      </xsl:for-each>
    </numbers>
  </xsl:template>

</xsl:stylesheet>

Output (modulo white space):

<?xml version="1.0" encoding="utf-8"?>
<numbers>
<integer>1</integer>
<integer>2</integer>
<integer>3</integer>
<integer>4</integer>
<integer>5</integer>
<integer>6</integer>
<integer>7</integer>
<integer>8</integer>
<integer>9</integer>
<integer>10</integer>
</numbers>

Unions of sequences

union or |
Duplicates are eliminated

<?xml version="1.0"?> 
<xsl:stylesheet version="2.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>             
                
  <xsl:template match="/">
    <numbers>
      <xsl:for-each select='(3 to 10) | (5 to 12) | (20 to 23)'>
        <integer>
          <xsl:value-of select="."/>
        </integer>
      </xsl:for-each>
    </numbers>
  </xsl:template>

</xsl:stylesheet>

Output:

<numbers>
<integer>3</integer>
<integer>4</integer>
<integer>5</integer>
<integer>6</integer>
<integer>7</integer>
<integer>8</integer>
<integer>9</integer>
<integer>10</integer>
<integer>11</integer>
<integer>12</integer>
<integer>20</integer>
<integer>21</integer>
<integer>22</integer>
<integer>23</integer>
</numbers>

Intersections of sequences

intersect

<?xml version="1.0"?> 
<xsl:stylesheet version="2.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>             
                
  <xsl:template match="/">
    <numbers>
      <xsl:for-each select='(3 to 10) intersect (5 to 12)'>
        <integer>
          <xsl:value-of select="."/>
        </integer>
      </xsl:for-each>
    </numbers>
  </xsl:template>

</xsl:stylesheet>

Output:

<numbers>
<integer>5</integer>
<integer>6</integer>
<integer>7</integer>
<integer>8</integer>
<integer>9</integer>
<integer>10</integer>
</numbers>

Differences of sequences

except

<?xml version="1.0"?> 
<xsl:stylesheet version="2.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>             
                
  <xsl:template match="/">
    <numbers>
      <xsl:for-each select='(3 to 10) except (5 to 12)'>
        <integer>
          <xsl:value-of select="."/>
        </integer>
      </xsl:for-each>
    </numbers>
  </xsl:template>

</xsl:stylesheet>

Output:

<numbers>
  <integer>3</integer>
  <integer>4</integer>
</numbers>
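
The three operators can be mimicked on plain integer lists to check the outputs shown on the preceding slides. This is a sketch only: the function names are hypothetical, and keeping items in order of first occurrence is an assumption that happens to match the slide outputs for these inputs.

```python
def to(first, last):
    """Model of the range expression (first to last)."""
    return list(range(first, last + 1))


def union(*seqs):
    """All items from every operand; duplicates are eliminated."""
    result = []
    for s in seqs:
        for item in s:
            if item not in result:
                result.append(item)
    return result


def intersect(a, b):
    """Items that occur in both operands."""
    return [item for item in union(a) if item in b]


def except_(a, b):
    """Items of the first operand that do not occur in the second."""
    return [item for item in union(a) if item not in b]


print(union(to(3, 10), to(5, 12), to(20, 23)))
print(intersect(to(3, 10), to(5, 12)))
print(except_(to(3, 10), to(5, 12)))   # [3, 4]
```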

Value comparison operators

Compare single values; each operand is a sequence of zero or one values:
- eq
- ne
- lt
- le
- gt
- ge
These operators return either true, false, the empty sequence, an error, or a type exception.

General comparisons

Differ from value comparisons in that the condition only needs to be true for some pair of items, one from each operand sequence
- =
- !=
- <
- <=
- >
- >=
These operators always return either true or false.
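
The difference between the two families is easiest to see side by side. The Python sketch below models only equality, with lists standing in for sequences; the function names are hypothetical, and returning the empty sequence when an operand is empty is an assumption based on the list of possible results given for value comparisons.

```python
def value_eq(a, b):
    """Model of eq: each operand holds zero or one items."""
    if len(a) > 1 or len(b) > 1:
        raise TypeError("eq needs at most one item per operand")
    if not a or not b:
        return []            # assumption: empty operand gives the empty sequence
    return a[0] == b[0]


def general_eq(a, b):
    """Model of =: true if some pair of items, one from each side, is equal."""
    return any(x == y for x in a for y in b)


def general_ne(a, b):
    """Model of !=: true if some pair of items, one from each side, differs."""
    return any(x != y for x in a for y in b)


print(value_eq([1], [1]))            # True
print(general_eq([1, 2], [2, 3]))    # True, because of the pair (2, 2)
print(general_ne([1, 2], [2, 3]))    # True as well, because of the pair (1, 2)
print(general_eq([], [1]))           # False: there are no pairs at all
```

Note that = and != can both be true for the same operands, which is why != is not simply the negation of =.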

Node comparisons

== and !== have the same semantics as Java's == and != operators (identity), not the equals() method (equality)
>> and << compare single nodes and sequences of single nodes for document order
The precedes operator returns true if the first operand node is reachable from the second operand node using the preceding axis; otherwise it returns false.
The follows operator returns true if the first operand node is reachable from the second operand node using the following axis; otherwise it returns false.
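
Identity, equality, and document order are three separate questions. A rough Python illustration using the standard xml.etree.ElementTree module follows; the sample document and the helper names deep_equal and before are hypothetical, and comparing serialized forms is only a crude approximation of deep equality.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<composers>"
    "<last_name>Anderson</last_name>"
    "<last_name>Anderson</last_name>"
    "</composers>"
)
first, second = doc[0], doc[1]


def deep_equal(a, b):
    """Crude equality test: same serialized markup and content."""
    return ET.tostring(a) == ET.tostring(b)


# iter() walks the tree in document order, so the index gives each node's rank
rank = {id(node): i for i, node in enumerate(doc.iter())}


def before(a, b):
    """True if node a comes earlier than node b in document order."""
    return rank[id(a)] < rank[id(b)]


print(first is second)             # False: two distinct nodes
print(first is doc[0])             # True: the same node reached twice
print(deep_equal(first, second))   # True: same content
print(before(first, second))       # True
```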

For Expressions

Useful for joining documents
Useful for restructuring data.

Syntax:

for $var1 in expression, $var2 in expression...
return expression

for Example

Consider the list of weblogs at http://static.userland.com/weblogMonitor/logs.xml

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd">
<weblogs>
    <log>
        <name>MozillaZine</name>
        <url>http://www.mozillazine.org</url>
        <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>
        <ownerName>Jason Kersey</ownerName>
        <ownerEmail>kerz@en.com</ownerEmail>
        <description>THE source for news on the Mozilla Organization.  DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
        <imageUrl></imageUrl>
        <adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif</adImageUrl>
    </log>
    <log>
        <name>SalonHerringWiredFool</name>
        <url>http://www.salonherringwiredfool.com/</url>
        <ownerName>Some Random Herring</ownerName>
        <ownerEmail>salonfool@wiredherring.com</ownerEmail>
        <description></description>
    </log>
    <log>
        <name>SlashDot.Org</name>
        <url>http://www.slashdot.org/</url>
        <ownerName>Simply a friend</ownerName>
        <ownerEmail>afriendofweblogs@weblogs.com</ownerEmail>
        <description>News for Nerds, Stuff that Matters.</description>
    </log>
</weblogs>

The changesUrl element points to a document like this:

<?xml version="1.0"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" 
                     "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <title>MozillaZine</title>
    <link>http://www.mozillazine.org/</link>
    <language>en-us</language>
    <description>Your source for Mozilla news, advocacy, interviews, builds, and more!</description>
    <copyright>Copyright 1998-2002, The MozillaZine Organization</copyright>
    <managingEditor>jason@mozillazine.org</managingEditor>
    <webMaster>jason@mozillazine.org</webMaster>
    <image>
      <title>MozillaZine</title>
      <url>http://www.mozillazine.org/image/mynetscape88.gif</url>
      <description>Your source for Mozilla news, advocacy, interviews, builds, and more!</description>
      <link>http://www.mozillazine.org/</link>
    </image>

    <item>
      <title>BugDays Are Back!</title>
      <link>http://www.mozillazine.org/talkback.html?article=2151</link>
    </item>

    <item>
      <title>Independent Status Reports</title>
      <link>http://www.mozillazine.org/talkback.html?article=2150</link>
    </item>

  </channel>

</rss>

We want to process all the item elements from each weblog.

for Example

<xsl:template match="weblogs">
  <xsl:apply-templates select="
    for $url in log/changesUrl
    return document($url)/rss/channel/item
  "/>
</xsl:template>
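
A for expression behaves much like a comprehension whose per-iteration results are concatenated into one flat sequence. The Python sketch below illustrates this with the standard xml.etree.ElementTree module. The sample data, the FEEDS table, and the document() helper are hypothetical stand-ins: a real processor would fetch each URL, whereas this sketch looks it up in memory.

```python
import xml.etree.ElementTree as ET

WEBLOGS = """<weblogs>
  <log><name>A</name><changesUrl>a.rdf</changesUrl></log>
  <log><name>B</name></log>
  <log><name>C</name><changesUrl>c.rdf</changesUrl></log>
</weblogs>"""

# Hypothetical replacement for network access: URL -> feed text
FEEDS = {
    "a.rdf": "<rss><channel><item><title>A1</title></item>"
             "<item><title>A2</title></item></channel></rss>",
    "c.rdf": "<rss><channel><item><title>C1</title></item></channel></rss>",
}


def document(url):
    """Stand-in for document(): returns the root element of the feed."""
    return ET.fromstring(FEEDS[url])


weblogs = ET.fromstring(WEBLOGS)

# for $url in log/changesUrl return the item elements of that feed
items = [
    item
    for url in weblogs.findall("log/changesUrl")
    for item in document(url.text).findall("channel/item")
]
titles = [item.findtext("title") for item in items]
print(titles)   # ['A1', 'A2', 'C1']
```

The log without a changesUrl simply contributes nothing to the result.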

Conditional Expressions

if (expression) then expression else expression

Not all weblogs have a changesUrl

<xsl:template match="log">
  <xsl:apply-templates select="
    if (changesUrl)
     then document(changesUrl)
     else document(url)"/>
</xsl:template>

Quantified Expressions

some $QualifiedName in expression satisfies expression
every $QualifiedName in expression satisfies expression
Both return boolean values, true or false

<xsl:template match="weblogs">
  <xsl:if test="some $log in log satisfies $log/changesUrl">
     ????
  </xsl:if>
</xsl:template>

<xsl:template match="weblogs">
  <xsl:if test="every $log in log satisfies $log/url">
    ????
  </xsl:if>
</xsl:template>
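
The two quantifiers correspond closely to Python's any() and all(). The sketch below uses the standard xml.etree.ElementTree module on a cut-down, hypothetical version of the weblogs document. The results for an empty sequence (some is false, every is true) follow the final XPath 2.0 Recommendation; whether this draft behaves the same way is an assumption.

```python
import xml.etree.ElementTree as ET

weblogs = ET.fromstring("""<weblogs>
  <log>
    <url>http://www.mozillazine.org</url>
    <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>
  </log>
  <log>
    <url>http://www.slashdot.org/</url>
  </log>
</weblogs>""")

logs = weblogs.findall("log")

# some $log in log satisfies $log/changesUrl
some_changes = any(log.find("changesUrl") is not None for log in logs)

# every $log in log satisfies $log/changesUrl
every_changes = all(log.find("changesUrl") is not None for log in logs)

# every $log in log satisfies $log/url
every_url = all(log.find("url") is not None for log in logs)

print(some_changes, every_changes, every_url)   # True False True
print(any([]), all([]))                         # False True
```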

Functions and operators

Functions are in the http://www.w3.org/2001/12/xquery-functions namespace which is customarily mapped to the xf prefix
The function namespace name and prefix are understood in XSLT without being explicitly declared.
Operators are in the http://www.w3.org/2001/12/xquery-operators namespace
Host languages such as XQuery and XSLT map the operators to symbols like * and +

Accessor Functions

xf:node-kind(Node): Returns a string identifying the kind of node; i.e. "document", "element", "attribute", "text", "namespace", "processing-instruction", or "comment".
xf:name(Node): returns zero or one QName
xf:string(Object): returns the string value of anything
xf:data(Node): returns a sequence of zero or more typed simple values
xf:base-uri(node): returns the base URI of an Element or Document node
xf:unique-ID(element): returns the unique ID of an element

Constructor Functions

Create a simple type from a string
Numeric constructors:
- xf:decimal(string $srcval) => decimal
- xf:integer(string $srcval) => integer
- xf:long(string $srcval) => integer
- xf:int(string $srcval) => integer
- xf:short(string $srcval) => integer
- xf:byte(string $srcval) => integer
- xf:float(string $srcval) => float
- xf:double(string $srcval) => double
String constructors
- xf:string(string $srcval) => string
- xf:normalizedString(string $srcval) => normalizedString
- xf:token(string $srcval) => token
- xf:language(string $srcval) => language
- xf:Name(string $srcval) => Name
- xf:NMTOKEN(string $srcval) => NMTOKEN
- xf:NCName(string $srcval) => NCName
- xf:ID(string $srcval) => ID
- xf:IDREF(string $srcval) => IDREF
- xf:ENTITY(string $srcval) => ENTITY
Boolean constructors:
- xf:true() => boolean
- xf:false() => boolean
- xf:boolean-from-string(string $srcval) => boolean
Duration and Datetime constructors:
- xf:duration(string $srcval) => duration
- xf:dateTime(string $srcval) => dateTime
- xf:date(string $srcval) => date
- xf:time(string $srcval) => time
- xf:gYearMonth(string $srcval) => gYearMonth
- xf:gYear(string $srcval) => gYear
- xf:gMonthDay(string $srcval) => gMonthDay
- xf:gMonth(string $srcval) => gMonth
- xf:gDay(string $srcval) => gDay
Constructors for QNames
- xf:QName-from-uri(string $paramURI, string $paramLocal) => QName
- xf:QName-from-string(string $param) => QName
- xf:QName(string $paramLocal) => QName
Constructor for anyURI:
- xf:anyURI(string $srcval) => anyURI
Constructors for NOTATION:
- xf:NOTATION(string $srcval) => NOTATION

Arithmetic operators

op:multiply(numeric $operand1, numeric $operand2) => numeric
op:numeric-add(numeric $operand1, numeric $operand2) => numeric
op:numeric-subtract(numeric $operand1, numeric $operand2) => numeric
op:numeric-multiply(numeric $operand1, numeric $operand2) => numeric
op:numeric-divide(numeric $operand1, numeric $operand2) => numeric
op:numeric-mod(numeric $operand1, numeric $operand2) => numeric
op:numeric-unary-plus(numeric $operand) => numeric
op:numeric-unary-minus(numeric $operand) => numeric

Numeric comparison operators

op:numeric-equal(numeric $operand1, numeric $operand2) => boolean
op:numeric-less-than(numeric $operand1, numeric $operand2) => boolean
op:numeric-greater-than(numeric $operand1, numeric $operand2) => boolean
op:numeric-less-than-or-equal(numeric $operand1, numeric $operand2) => boolean
op:numeric-greater-than-or-equal(numeric $operand1, numeric $operand2) => boolean
op:numeric-not-equal(numeric $operand1, numeric $operand2) => boolean

Numeric Functions

xf:floor(double? $srcval) => integer?
xf:ceiling(double? $srcval) => integer?
xf:round(double? $srcval) => integer?

String functions

xf:compare(string? $comparand1, string? $comparand2) => integer?
xf:compare(string? $comparand1, string? $comparand2, anyURI $collationLiteral) => integer?
xf:concat() => string
xf:concat(string? $op1) => string
xf:concat(string? $op1, string? $op2, ...) => string
xf:starts-with(string? $operand1, string? $operand2) => boolean?
xf:starts-with(string? $operand1, string? $operand2, anyURI $collationLiteral) => boolean?
xf:ends-with(string? $operand1, string? $operand2) => boolean?
xf:ends-with(string? $operand1, string? $operand2, anyURI $collationLiteral) => boolean?
xf:contains(string? $operand1, string? $operand2) => boolean?
xf:contains(string? $operand1, string? $operand2, anyURI $collationLiteral) => boolean?
xf:substring(string? $sourceString, decimal? $startingLoc) => string?
xf:substring(string? $sourceString, decimal? $startingLoc, decimal? $length) => string?
xf:string-length(string? $srcval) => integer?
xf:substring-before(string? $operand1, string? $operand2) => string?
xf:substring-before(string? $operand1, string? $operand2, anyURI $collationLiteral) => string?
xf:substring-after(string? $operand1, string? $operand2) => string?
xf:substring-after(string? $operand1, string? $operand2, anyURI $collationLiteral) => string?
xf:normalize-space(string? $srcval) => string?
xf:normalize-unicode(string? $srcval, string $normalizationForm) => string?
xf:upper-case(string? $srcval) => string?
xf:lower-case(string? $srcval) => string?
xf:translate(string? $srcval, string? $mapString, string? $transString) => string?
xf:string-pad(string? $padString, decimal? $padCount) => string?
xf:match(string? $srcval, string? $regexp) => integer*
xf:replace(string? $srcval, string? $regexp, string? $repval) => string?

Regular expressions

Syntax for xf:match() is based on W3C XML Schema Language regular expressions.
Syntax for xf:replace() is based on W3C XML Schema Language regular expressions plus $N in replace patterns to indicate the Nth match.
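
To give a feel for $N references, here is a Python approximation built on the standard re module. It is a sketch under two loudly stated assumptions: $N is taken to mean the Nth parenthesized group of the match, and Python's regular expression dialect is used in place of the W3C XML Schema Language dialect, which differs in several details. The function name xpath_style_replace is hypothetical.

```python
import re


def xpath_style_replace(srcval, regexp, repval):
    """Replace each match of regexp in srcval; $N in repval means group N."""
    # Double any backslashes so Python does not read them as escapes,
    # then rewrite $N as Python's \g<N> group reference.
    escaped = repval.replace("\\", "\\\\")
    python_repl = re.sub(r"\$(\d+)", r"\\g<\1>", escaped)
    return re.sub(regexp, python_repl, srcval)


print(xpath_style_replace("Rusty Harold", r"(\w+) (\w+)", "$2 $1"))
print(xpath_style_replace("2002-03-11", r"(\d+)-(\d+)-(\d+)", "$3/$2/$1"))
```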

Boolean Functions

op:boolean-and(boolean $value1, boolean $value2) => boolean
op:boolean-or(boolean $value1, boolean $value2) => boolean
op:boolean-equal(boolean? $value1, boolean? $value2) => boolean?
xf:not(boolean? $srcval) => boolean
xf:not3(boolean? $srcval) => boolean?

Date and time functions

Comparisons of Duration and Datetime Values:
- op:duration-equal(duration $operand1, duration $operand2) => boolean
- op:duration-less-than(duration $operand1, duration $operand2) => boolean
- op:duration-greater-than(duration $operand1, duration $operand2) => boolean
- op:duration-less-than-or-equal(duration $operand1, duration $operand2) => boolean
- op:duration-greater-than-or-equal(duration $operand1, duration $operand2) => boolean
- op:duration-not-equal(duration $operand1, duration $operand2) => boolean
- op:datetime-equal(dateTime $operand1, dateTime $operand2) => boolean
- op:datetime-less-than(dateTime $operand1, dateTime $operand2) => boolean
- op:datetime-greater-than(dateTime $operand1, dateTime $operand2) => boolean
- op:datetime-less-than-or-equal(dateTime $operand1, dateTime $operand2) => boolean
- op:datetime-greater-than-or-equal(dateTime $operand1, dateTime $operand2) => boolean
- op:datetime-not-equal(dateTime $operand1, dateTime $operand2) => boolean
Component Extraction Functions on Datetime Values:
- xf:get-Century-from-dateTime(dateTime? $srcval) => integer?
- xf:get-Century-from-date(date? $srcval) => integer?
- xf:get-hour-from-dateTime(dateTime? $srcval) => integer?
- xf:get-hour-from-time(time? $srcval) => integer?
- xf:get-minutes-from-dateTime(dateTime? $srcval) => integer?
- xf:get-minutes-from-time(time? $srcval) => integer?
- xf:get-seconds-from-dateTime(dateTime? $srcval) => decimal?
- xf:get-seconds-from-time(time? $srcval) => decimal?
- xf:get-timezone-from-dateTime(dateTime? $srcval) => string?
- xf:get-timezone-from-date(date? $srcval) => string?
- xf:get-timezone-from-time(time? $srcval) => string?
Arithmetic Functions on Dates:
- xf:add-days(date? $dateParam, decimal? $incrDays) => date?
Functions and Operators on TimePeriod Values:
- op:get-duration(dateTime $parameter1, dateTime $parameter2) => duration
- op:get-end-dateTime(dateTime $parameter1, duration $parameter2) => dateTime
- xf:get-start-dateTime(dateTime $parameter1, duration $parameter2) => dateTime?

Qualified Name Functions

xf:get-local-name(QName? $srcval) => string?
xf:get-namespace-uri(QName? $srcval) => anyURI?

Binary operators

op:hex-binary-equal(hexBinary $value1, hexBinary $value2) => boolean
op:base64-binary-equal(base64Binary $value1, base64Binary $value2) => boolean

Node Functions and Operators

xf:local-name() => string
xf:local-name(node $srcval) => string
xf:number() => anySimpleType
xf:number(node $srcval) => anySimpleType
op:node-equal(node $parameter1, node $parameter2) => boolean
xf:deep-equal(node $parameter1, node $parameter2) => boolean
xf:deep-equal(node $parameter1, node $parameter2, anyURI $collation) => boolean
op:node-before(node $parameter1, node $parameter2) => boolean
op:node-after(node $parameter1, node $parameter2) => boolean
xf:copy(node? $srcval) => node?
xf:shallow(node? $srcval) => node?
xf:if-absent((elementNode | attributeNode)? $node, anySimpleType $value) => (elementNode | attributeNode | anySimpleType)?
xf:if-empty((elementNode | attributeNode)? $node, anySimpleType $value) => (elementNode | attributeNode | anySimpleType)

Sequence Functions

op:to(decimal $firstval, decimal $lastval) => sequence
xf:boolean(item* $srcval) => boolean
op:concatenate(item* $seq1, item* $seq2) => item*
op:item-at(item* $seqParam, decimal $posParam) => item?
xf:index-of(item* $seqParam, item $srchParam) => unsignedInt?
xf:index-of(item* $seqParam, item $srchParam, anyURI $collationLiteral) => unsignedInt?
xf:empty(item* $srcval) => boolean
xf:exists(item* $srcval) => boolean
xf:distinct-nodes(node* $srcval) => node*
xf:distinct-values(item* $srcval) => item*
xf:distinct-values(item* $srcval, anyURI $collationLiteral) => item*
xf:insert(item* $target, decimal $position, item* $inserts) => item*
xf:remove(item* $target, decimal $position) => item*
xf:sublist(item* $sourceSeq, decimal $startingLoc) => item*
xf:sublist(item* $sourceSeq, decimal $startingLoc, decimal $length) => item*
xf:sequence-deep-equal(item* $parameter1, item* $parameter2) => boolean?
xf:sequence-deep-equal(item* $parameter1, item* $parameter2, anyURI $collationLiteral) => boolean?
xf:sequence-node-equal(item*? $parameter1, item*? $parameter2) => boolean?
op:union(item* $parameter1, item* $parameter2) => item*
op:intersect(item* $parameter1, item* $parameter2) => item*
op:except(item* $parameter1, item* $parameter2) => item*
xf:count(item* $srcval) => unsignedInt
xf:avg(item* $srcval) => double?
xf:max(item* $srcval) => anySimpleType?
xf:max(item* $srcval, anyURI $collationLiteral) => anySimpleType?
xf:min(item* $srcval) => anySimpleType?
xf:min(item* $srcval, anyURI $collationLiteral) => anySimpleType?
xf:sum(item* $srcval) => double?
xf:id(IDREF* $srcval) => elementNode*
xf:idref(string* $srcval) => elementNode*
xf:filter(expression $srcval) => node*
xf:document(string? $srcval) => node?

Context Functions

op:context-item() => item
xf:position() => unsignedInt
xf:last() => unsignedInt
op:context-document() => DocumentNode
xf:current-dateTime() => dateTime

Casting Functions

XSLT 2.0

Uses XPath 2.0
Schema Aware
Partially implemented by Michael Kay's Saxon 7.0, http://saxon.sourceforge.net/

XSLT 2.0 Goals

Simplify manipulation of XML Schema-typed content
Simplify manipulation of string content
Support related XML standards
Improve ease of use
Improve interoperability
Improve i18n support
Maintain backward compatibility
Enable improved processor efficiency

XSLT 2.0 Non-goals

Simplifying the ability to parse unstructured information to produce structured results.
Turning XSLT into a general-purpose programming language

XSLT 2.0 Requirements

Must maintain backwards compatibility with XSLT 1.1
Should be able to match elements and attributes whose value is explicitly null.
Should allow included documents to encapsulate local stylesheets
Could support accessing infoset items for XML declaration
Could provide qualified name aware string functions
Could enable constructing a namespace with computed name
Could simplify resolving prefix conflicts in qname-valued attributes
Could support XHTML output method
Must allow matching on default namespace without explicit prefix
Must add date formatting functions
Must simplify accessing IDs and keys in other documents
Should provide function to absolutize relative URIs
Should include unparsed text from an external resource
Should allow authoring extension functions in XSLT
Should output character entity references instead of numeric character entities
Should construct entity reference by name
Should support Unicode string normalization
Should standardize extension element language bindings
Could improve efficiency of transformations on large documents
Could support reverse IDREF attributes
Could support case-insensitive comparisons
Could support lexicographic string comparisons
Could allow comparing nodes based on document order
Could improve support for unparsed entities
Could allow processing a node with the "next best matching" template
Could make coercions symmetric by allowing scalar to nodeset conversion
Must support XML schema
Must simplify constructing and copying typed content
Must support sorting nodes based on XML schema type
Could support scientific notation in number formatting
Could provide ability to detect whether "rich" schema information is available
Must simplify grouping

Some specific improvements that are likely

Multiple output documents
Variables can be set to node sets; no more result tree fragments.
Existing elements and functions hardly change at all

Identifying 2.0 stylesheets

Namespace is still http://www.w3.org/1999/XSL/Transform
version attribute of xsl:stylesheet has value 2.0

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Top level elements -->

</xsl:stylesheet>

No result tree fragments

The result tree fragment data-type has been eliminated.
Variable-binding elements with content now construct node-sets
These node sets can now be operated on by templates
Functionality previously available with saxon:nodeSet() and similar extension functions
Allows pipelining of templates

xsl:for-each-group

Like xsl:for-each, but iterates over groups of nodes rather than individual nodes
Works well with flat structures
Replaces the Muenchian method

Basic syntax:

<xsl:for-each-group
  select = expression
  group-by = "string expression"
  group-adjacent = "string expression"
  group-starting-with = pattern>
  <!-- Content: (xsl:sort*, content-constructor) -->
</xsl:for-each-group>

The select attribute selects the population to be grouped.
The group-by attribute calculates a string value for each node in the population. Nodes with the same value are grouped together.
The group-adjacent attribute calculates a string value for each node in the population. Every time the value changes, a new group is started.
The group-starting-with attribute starts a new group every time its pattern is matched.
group-by, group-adjacent, and group-starting-with are mutually exclusive.
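
The three grouping modes can be compared on a small list. The Python sketch below is a model only: the function names and the sample tuples are hypothetical, and ordering group-by groups by their first member is an assumption chosen to match the desired output shown for the grouping example.

```python
from itertools import groupby


def group_by(population, key):
    """Items with equal keys share one group, wherever they occur."""
    groups = {}
    for item in population:
        groups.setdefault(key(item), []).append(item)
    return list(groups.values())   # dicts keep insertion order


def group_adjacent(population, key):
    """A new group starts every time the key changes."""
    return [list(members) for _, members in groupby(population, key)]


def group_starting_with(population, starts):
    """A new group starts at every item for which starts(item) is true."""
    groups = []
    for item in population:
        if starts(item) or not groups:
            groups.append([])
        groups[-1].append(item)
    return groups


# (title, section) pairs in document order
stories = [("ROX", "developers"), ("HP", "articles"), ("Wright", "science"),
           ("Unix", "articles"), ("Sleep", "science")]


def section(story):
    return story[1]


print(group_by(stories, section))        # 3 groups
print(group_adjacent(stories, section))  # 5 groups: no neighbours share a key
print(group_starting_with(stories, lambda s: s[1] == "articles"))
```

Only group-by collects the scattered articles and science stories together, which is what the section-by-section task calls for.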

Grouping example: input

Task: Arrange articles in a large, flat document like this by section:

<?xml version="1.0"?>
<backslash>

<story>
<title>ROX Desktop Update</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/180240</url>
<time>2002-02-18 18:50:13</time>
<author>timothy</author>
<department>small-simple-swift</department>
<topic>104</topic>
<comments>32</comments>
<section>developers</section>
<image>topicx.jpg</image>
</story>

<story>
<title>HP Selling Systems With Linux</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/1712241</url>
<time>2002-02-18 17:37:20</time>
<author>timothy</author>
<department>wish-this-wasn't-remarkable</department>
<topic>173</topic>
<comments>188</comments>
<section>articles</section>
<image>topichp.gif</image>
</story>

<story>
<title>Excellent Hacks to the ReplayTV 4000</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/1619213</url>
<time>2002-02-18 16:46:04</time>
<author>CmdrTaco</author>
<department>hardware-I-lust-after</department>
<topic>129</topic>
<comments>117</comments>
<section>articles</section>
<image>topictv.jpg</image>
</story>

<story>
<title>Peek-a-Boo(ty)</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/1546226</url>
<time>2002-02-18 15:58:06</time>
<author>Hemos</author>
<department>pirate-treasure</department>
<topic>158</topic>
<comments>207</comments>
<section>articles</section>
<image>topicprivacy.gif</image>
</story>

<story>
<title>Self-Shredding E-Mail</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/1343202</url>
<time>2002-02-18 14:37:45</time>
<author>timothy</author>
<department>plausible-deniability</department>
<topic>158</topic>
<comments>170</comments>
<section>articles</section>
<image>topicprivacy.gif</image>
</story>

<story>
<title>CIA &amp;amp; KGB Gadgets On Display</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/0252219</url>
<time>2002-02-18 13:52:04</time>
<author>Hemos</author>
<department>looking-a-tthe-gear</department>
<topic>126</topic>
<comments>103</comments>
<section>articles</section>
<image>topictech2.gif</image>
</story>

<story>
<title>Re-Building the Wright Flyer</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/060257</url>
<time>2002-02-18 12:29:12</time>
<author>timothy</author>
<department>we-hope-they-wear-modern-helmets</department>
<topic>126</topic>
<comments>132</comments>
<section>science</section>
<image>topictech2.gif</image>
</story>

<story>
<title>How to Fix the Unix Configuration Nightmare</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/0248248</url>
<time>2002-02-18 10:48:36</time>
<author>Hemos</author>
<department>fixing-the-problem</department>
<topic>130</topic>
<comments>367</comments>
<section>articles</section>
<image>topicunix.jpg</image>
</story>

<story>
<title>Sleep Less, Live Longer</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/0511253</url>
<time>2002-02-18 07:38:15</time>
<author>timothy</author>
<department>if-you're-reading-this</department>
<topic>134</topic>
<comments>309</comments>
<section>science</section>
<image>topicscience.gif</image>
</story>

<story>
<title>Warming and Slowing the World</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/0243253</url>
<time>2002-02-18 04:39:39</time>
<author>Hemos</author>
<department>slowing-things-down</department>
<topic>134</topic>
<comments>312</comments>
<section>science</section>
<image>topicscience.gif</image>
</story>

</backslash>

Grouping example: desired output

<?xml version="1.0"?>
<forwardslash>

<section>
  <title>developers</title>
<story>
<title>ROX Desktop Update</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/180240</url>
<time>2002-02-18 18:50:13</time>
<author>timothy</author>
<department>small-simple-swift</department>
<topic>104</topic>
<comments>32</comments>
<image>topicx.jpg</image>
</story>

</section>

<section>
  <title>articles</title>

<story>
<title>HP Selling Systems With Linux</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/1712241</url>
<time>2002-02-18 17:37:20</time>
<author>timothy</author>
<department>wish-this-wasn't-remarkable</department>
<topic>173</topic>
<comments>188</comments>
<image>topichp.gif</image>
</story>

<story>
<title>Excellent Hacks to the ReplayTV 4000</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/1619213</url>
<time>2002-02-18 16:46:04</time>
<author>CmdrTaco</author>
<department>hardware-I-lust-after</department>
<topic>129</topic>
<comments>117</comments>
<image>topictv.jpg</image>
</story>

<story>
<title>Peek-a-Boo(ty)</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/1546226</url>
<time>2002-02-18 15:58:06</time>
<author>Hemos</author>
<department>pirate-treasure</department>
<topic>158</topic>
<comments>207</comments>
<image>topicprivacy.gif</image>
</story>

<story>
<title>Self-Shredding E-Mail</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/1343202</url>
<time>2002-02-18 14:37:45</time>
<author>timothy</author>
<department>plausible-deniability</department>
<topic>158</topic>
<comments>170</comments>
<image>topicprivacy.gif</image>
</story>

<story>
<title>CIA &amp;amp; KGB Gadgets On Display</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/0252219</url>
<time>2002-02-18 13:52:04</time>
<author>Hemos</author>
<department>looking-a-tthe-gear</department>
<topic>126</topic>
<comments>103</comments>
<image>topictech2.gif</image>
</story>


<story>
<title>How to Fix the Unix Configuration Nightmare</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/0248248</url>
<time>2002-02-18 10:48:36</time>
<author>Hemos</author>
<department>fixing-the-problem</department>
<topic>130</topic>
<comments>367</comments>
<image>topicunix.jpg</image>
</story>


</section>
<section>
  <title>science</title>


<story>
<title>Re-Building the Wright Flyer</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/060257</url>
<time>2002-02-18 12:29:12</time>
<author>timothy</author>
<department>we-hope-they-wear-modern-helmets</department>
<topic>126</topic>
<comments>132</comments>
<image>topictech2.gif</image>
</story>


<story>
<title>Sleep Less, Live Longer</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/0511253</url>
<time>2002-02-18 07:38:15</time>
<author>timothy</author>
<department>if-you're-reading-this</department>
<topic>134</topic>
<comments>309</comments>
<section>science</section>
<image>topicscience.gif</image>
</story>

<story>
<title>Warming and Slowing the World</title>
<url>http://slashdot.org/article.pl?sid=02/02/18/0243253</url>
<time>2002-02-18 04:39:39</time>
<author>Hemos</author>
<department>slowing-things-down</department>
<topic>134</topic>
<comments>312</comments>
<section>science</section>
<image>topicscience.gif</image>
</story>

</section>

</forwardslash>

Grouping example: stylesheet

<?xml version="1.0"?> 
<xsl:stylesheet version="2.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <forwardslash>
      <xsl:apply-templates select="*"/>
    </forwardslash>
  </xsl:template>

  <xsl:template match="backslash">
    <xsl:for-each-group select="story" group-by="section">
      <section>
        <title><xsl:value-of select="current-group()/section"/></title>
        <xsl:apply-templates select="current-group()"/>
      </section>
    </xsl:for-each-group>
  </xsl:template>

  <xsl:template match="story">
    <story>
      <xsl:apply-templates/>
    </story>
  </xsl:template>

  <xsl:template match="*">
    <xsl:copy-of select="."/>
  </xsl:template>

  <xsl:template match="section"/>

</xsl:stylesheet>
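
The same group-by idea can be sketched outside XSLT. This is only an illustration in Java, not how any XSLT processor works; the Story class and the groupBySection method are hypothetical names invented for the example. Sections come out in order of first appearance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of group-by semantics; not part of XSLT or of any XSLT processor.
final class GroupingSketch {

    // Minimal stand-in for a <story> element: only the fields the grouping needs.
    static final class Story {
        final String title;
        final String section;

        Story(String title, String section) {
            this.title = title;
            this.section = section;
        }
    }

    // Groups story titles by section, keeping sections in order of first appearance.
    static Map<String, List<String>> groupBySection(List<Story> stories) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Story s : stories) {
            groups.computeIfAbsent(s.section, k -> new ArrayList<>()).add(s.title);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Story> stories = Arrays.asList(
            new Story("HP Selling Systems With Linux", "articles"),
            new Story("Sleep Less, Live Longer", "science"),
            new Story("Peek-a-Boo(ty)", "articles"));
        groupBySection(stories).forEach((section, titles) ->
            System.out.println(section + ": " + titles));
    }
}
```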

xsl:destination

Determines the URI of the principal result tree; i.e. the main output document

Syntax:

<!-- Category: declaration -->
<xsl:destination
  format = "QualifiedName"
  href   = "uri-reference" />

The format attribute names an xsl:output element for this result document.

xsl:result-document

Determines the URI of a secondary result tree; there can be several of these.
Allows you to generate multiple documents from one source document
Previously available with extension functions like xt:document and saxon:output

Syntax:

<!-- Category: instruction -->
<xsl:result-document
  format = "QualifiedName"
  href   = "uri-reference">
  <!-- Content: content-constructor -->
</xsl:result-document>

The format attribute names an xsl:output element for this result document.

xsl:result-document Example

     <xsl:output name="ccl:html" method="html" encoding="ISO-8859-1" />

     <xsl:result-document href="index.html" format="ccl:html">
       <html>
         <head>
           <title><xsl:value-of select="title"/></title>         
         </head>
         <body> 
           <h1 align="center"><xsl:value-of select="title"/></h1> 
           <ul>
             <xsl:for-each select="slide">
               <li><a href="{format-number(position(),'00')}.html"><xsl:value-of select="title"/></a></li>
             </xsl:for-each>    
           </ul>           
           
           <p><a href="{translate(title,' ', '_')}.html">Entire Presentation as Single File</a></p>
              
           <hr/>
           <div align="center">
             <A HREF="01.html">Start</A> | <A HREF="/xml/">Cafe con Leche</A>
           </div>
           <hr/>
           <font size="-1">
              Copyright 2002 
              <a href="http://www.elharo.com/">Elliotte Rusty Harold</a><br/>       
              <a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a><br/>
              Last Modified <xsl:apply-templates select="last_modified" mode="lm"/>
           </font>
         </body>     
       </html>     
     </xsl:result-document>

Sorting

<xsl:sort-key
  name = "QualifiedName">
  <!-- Content: (xsl:sort+) -->
</xsl:sort-key>

xsl:namespace

Attaches an additional namespace node to a result tree element
Rarely necessary; normally the usual XSLT 1.0 namespace declarations are sufficient.
Occasionally useful if the output document uses a namespace prefix exclusively in element content or attribute values

<xsl:namespace name="xsd">http://www.w3.org/2001/XMLSchema</xsl:namespace>

Value of a sequence

The separator attribute specifies the string placed between the string values of the members of the sequence

<x><xsl:value-of select="(1,2,3,4)" separator=" | "/></x>

<x>1 | 2 | 3 | 4</x>

default-xpath-namespace

An attribute that specifies the default namespace in effect for unprefixed element names used in XPath expressions within this element and its descendants
Can be used on literal result elements, in which case it is in the XSLT namespace and the attribute is prefixed as xsl:default-xpath-namespace

An XSLT 1.0 stylesheet for working with XHTML

XPath expressions must use a prefix to match XHTML element names.

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml" 
  xmlns:html="http://www.w3.org/1999/xhtml" 
>

  <xsl:output method="html" encoding="ISO-8859-1"/>

  <xsl:template match="week">
    <html xml:lang="en" lang="en">
      <head><title><xsl:value-of select="//html:h1[1]"/></title></head>
      <body bgcolor="#ffffff" text="#000000">

        <xsl:apply-templates select="html:body"/>

        <font size="-1">Last Modified Mon June 5, 2001<br />
          Copyright 2001 Elliotte Rusty Harold<br />
          <a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a>
        </font>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="html:body">
    <xsl:apply-templates 
      select="text()[count(following-sibling::html:hr)>1]|*[count(following-sibling::html:hr)>1]" />

    <hr/>
  </xsl:template>

  <xsl:template match="html:*">
    <xsl:copy>
      <xsl:for-each select="@*">
        <xsl:copy-of select="."/>
      </xsl:for-each>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="html:font[@size='-1']"></xsl:template>

  <xsl:template match="html:a">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="html:applet">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="html:param"/>

</xsl:stylesheet>
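
The XPath 1.0 rule behind those html: prefixes can be seen with the JDK's own XPath 1.0 engine. A minimal sketch, assuming a tiny inline XHTML document; the class name XhtmlXPathSketch and the prefix "html" are choices made for this example only.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustration only: shows that unprefixed names in XPath 1.0 never match namespaced elements.
final class XhtmlXPathSketch {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Evaluates an XPath 1.0 expression against a namespace-aware parse of the XML,
    // with the prefix "html" bound to the XHTML namespace. Returns the string value.
    static String evaluate(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "html".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String uri) {
                    return XHTML_NS.equals(uri) ? "html" : null;
                }

                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return XHTML_NS.equals(uri)
                        ? Collections.singletonList("html").iterator()
                        : Collections.<String>emptyList().iterator();
                }
            });
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With an XHTML document as input, evaluate(xml, "//h1") selects nothing and yields the empty string, while evaluate(xml, "//html:h1") finds the heading.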

An XSLT 2.0 stylesheet for working with XHTML

XPath expressions can use customary, non-prefixed XHTML element names.

<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml" 
  default-xpath-namespace="http://www.w3.org/1999/xhtml"
>

  <xsl:output method="html" encoding="ISO-8859-1"/>

  <xsl:template match="week">
    <html xml:lang="en" lang="en">
      <head><title><xsl:value-of select="//h1[1]"/></title></head>
      <body bgcolor="#ffffff" text="#000000">

        <xsl:apply-templates select="body"/>

        <font size="-1">Last Modified Mon June 5, 2001<br />
          Copyright 2001 Elliotte Rusty Harold<br />
          <a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a>
        </font>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="body">
    <xsl:apply-templates 
     select="text()[count(following-sibling::hr)>1]|*[count(following-sibling::hr)>1]"/>

    <hr/>
  </xsl:template>

  <xsl:template match="*">
    <xsl:copy>
      <xsl:for-each select="@*">
        <xsl:copy-of select="."/>
      </xsl:for-each>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="font[@size='-1']"></xsl:template>

  <xsl:template match="a">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="applet">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="param"/>

</xsl:stylesheet>

User defined data elements

Top-level elements in some namespace other than the XSLT namespace (No namespace is not allowed.)
Exact interpretation is processor-specific, but may not change the meaning of customary elements
Possible uses:
- Data for extension instructions and extension functions
- Information about what to do with the result tree
- Information about how to obtain the source tree
- Optimization hints
- Metadata about the stylesheet
- Documentation for the stylesheet

Typed variables and parameters

Variables can have a type:

Syntax:

<xsl:variable
  name   = "QualifiedName"
  select = expression
  type   = datatype>
  <!-- Content: content-constructor -->
</xsl:variable>

<xsl:param
  name   = "QualifiedName"
  select = expression
  type   = datatype>
  <!-- Content: content-constructor -->
</xsl:param>

Constants for types remain to be determined

Functions defined in XSLT

<xsl:function name="math:factorial"
 xmlns:math="http://www.example.com/math"
 exclude-result-prefixes="math">
  <xsl:param name="index" type="xsd:nonNegativeInteger"/>
  <xsl:result type="xsd:positiveInteger"
    select="if ($index eq 0) then 1
            else $index * math:factorial($index - 1)"/>
</xsl:function>

Including text files

sequence unparsed-text(sequence uris, String encoding?)

For example,

<include_as_text source="bib.xml"/>

<xsl:template match="include_as_text">
  <xsl:value-of select="unparsed-text(@source)"/>
</xsl:template>
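
For comparison, reading a file as raw characters rather than as parsed XML looks roughly like this in Java. This is a sketch of the idea only; the class and method names are made up for the example, and nothing here is part of XSLT.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration only: the markup in the file is returned as text, not parsed.
final class UnparsedTextSketch {

    // Reads the whole file and decodes it with the named encoding (for example "UTF-8").
    static String unparsedText(Path file, String encoding) {
        try {
            return new String(Files.readAllBytes(file), Charset.forName(encoding));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```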

XQuery

Three parts:

A data model for XML documents based on the XML Infoset
A mathematically precise query algebra; that is, a set of query operators on that data model
A query language based on these query operators and this algebra

XQuery Language

A fourth generation declarative language like SQL; not a procedural language like Java or a functional language like XSLT
Queries operate on single documents or fixed collections of documents.
Queries select whole documents or subtrees of documents that match conditions defined on document content and structure
Can construct new documents based on what is selected
No updates or inserts!

Documents to Query

Narrative documents and collections of such documents; e.g. generate a table of contents for a book
Data-oriented documents; e.g. SQL-like queries of an XML dump of a database
Streams of XML data, such as logs of email messages, network packets, stock market data, newswire feeds, EDI, or weather data; e.g. filtering and routing messages, extracting data from the stream, or transforming it
XML views of non-XML data

Physical Representations to Query

Files on a disk
Native-XML databases like Software AG's Tamino
DOM trees in memory
Streaming data
Other representations of the infoset

Where is XQuery used?

Direct query tools at command line
GUI query tools
JSP, ASP, PHP, and other such server side technologies
Programs written in Java, C++, and other languages that need to extract data from XML documents
Others are possible
Anywhere SQL is used to extract data from a database, XQuery can be used to extract data from an XML document.
SQL is a non-compiled language that must be processed by some other tool to extract data from a database. So is XQuery.

The XML Model vs. the Relational Model

Relational model	XML model
A relational database contains tables	An XML database contains collections
A relational table contains records with the same schema	A collection contains XML documents with the same DTD
A relational record is an unordered list of named values	An XML document is a tree of nodes
A SQL query returns an unordered set of records	An XQuery returns an ordered sequence of nodes

Query Data Types

XML 1.0 #PCDATA
Schema simple types: positiveInteger, string, float, double, unsignedLong, gYear, date, time, boolean, etc.
Schema complex types
Collections of these types
References to these types

An example document to query

Most of the examples in this talk query this bibliography document at the (relative) URL bib.xml:

<bib>
<book year="1994">
<title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
<price> 65.95</price>
</book>

<book year="1992">
<title>Advanced Programming in the Unix Environment</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
<price>65.95</price>
</book>

<book year="2000">
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
<publisher>Morgan Kaufmann Publishers</publisher>
<price> 39.95</price>
</book>

<book year="1999">
<title>The Economics of Technology and Content for Digital TV</title>
<editor>
<last>Gerbarg</last><first>Darcy</first>
<affiliation>CITI</affiliation>
</editor>
<publisher>Kluwer Academic Publishers</publisher>
<price>129.95</price>
</book>

</bib>

Adapted from Mary Fernandez, Jerome Simeon, and Phil Wadler: XML Query Languages: Experiences and Exemplars, 1999, as adapted in XML Query Use Cases

The XQuery FLWR

FOR: each node selected by an XPath 2.0 location path
LET: a new variable have a specified value
WHERE: a condition expressed in XPath is true
RETURN: this node set

Query: List titles of all books

   FOR $t IN document("bib.xml")/bib/book/title
   RETURN
      $t

Adapted from XML Query Use Cases

Query Result: Book Titles

  <title>TCP/IP Illustrated</title>
  <title>Advanced Programming in the Unix Environment</title>
  <title>Data on the Web</title>
  <title>The Economics of Technology and Content for Digital TV</title>

Adapted from XML Query Use Cases

XQueryX

An XML Syntax for XQuery
Intended for machine processing and programmer convenience, not for human legibility

In XQuery:

   FOR $t IN document("bib.xml")/bib/book/title
   RETURN
      $t

In XQueryX:

<?xml version="1.0"?>
<xq:query xmlns:xq="http://www.w3.org/2001/06/xqueryx">
  <xq:flwr>
    <xq:forAssignment variable="$t">
      <xq:step axis="CHILD">
        <xq:function name="document">
          <xq:constant datatype="CHARSTRING">bib.xml</xq:constant>
        </xq:function>
        <xq:identifier>bib</xq:identifier>
      </xq:step>
      <xq:step axis="CHILD">
        <xq:identifier>book</xq:identifier>
      </xq:step>
      <xq:step axis="CHILD">
        <xq:identifier>title</xq:identifier>
      </xq:step>
    </xq:forAssignment>
    <xq:return>
      <xq:variable>$t</xq:variable>
    </xq:return>
  </xq:flwr>
</xq:query>

Element Constructors

Tags are given as literals
XQuery expression which is evaluated to become the contents of the element is enclosed in curly braces
The contents can also contain literal text outside the braces

List titles of all books in a bib element. Put each title in a book element.

<bib>
  {
   FOR $t IN document("bib.xml")/bib/book/title
   RETURN
    <book>
     { $t }
    </book>
  }
</bib>

Adapted from XML Query Use Cases

Query Result: Book Titles

<bib>
  <book>
    <title>TCP/IP Illustrated</title>
  </book>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
  </book>
  <book>
    <title>Data on the Web</title>
  </book>
  <book>
    <title>The Economics of Technology and Content for Digital TV</title>
  </book>
</bib>

Adapted from XML Query Use Cases

Query with WHERE

List titles of books published by Addison-Wesley

<bib>
 {
   FOR $b IN document("bib.xml")/bib/book
   WHERE $b/publisher = "Addison-Wesley"
   RETURN
      $b/title 
  }
</bib>

This WHERE clause could be replaced by an XPath predicate:

<bib>
 {
   FOR $b IN document("bib.xml")/bib/book[publisher="Addison-Wesley"]
   RETURN
      $b/title 
 }
</bib>

But WHERE clauses can combine multiple variables from multiple documents

Adapted from XML Query Use Cases

Query Result: Titles of books published by Addison-Wesley

<bib>
    <title>TCP/IP Illustrated</title>
    <title>Advanced Programming in the Unix Environment</title>
</bib>

Adapted from XML Query Use Cases
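
The predicate form of this query can already be tried with plain XPath 1.0. A minimal Java sketch using the JDK's javax.xml.xpath API; the inlined document is a cut-down copy of bib.xml, and the class and method names are invented for the example. It returns string values rather than constructing new elements, so it covers only the selection half of the query.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: XPath 1.0 selection, not an XQuery implementation.
final class BibXPathSketch {

    // A cut-down copy of bib.xml, inlined so the sketch runs without any files.
    static final String BIB =
          "<bib>"
        + "<book year='1994'><title>TCP/IP Illustrated</title>"
        + "<publisher>Addison-Wesley</publisher></book>"
        + "<book year='1992'><title>Advanced Programming in the Unix Environment</title>"
        + "<publisher>Addison-Wesley</publisher></book>"
        + "<book year='2000'><title>Data on the Web</title>"
        + "<publisher>Morgan Kaufmann Publishers</publisher></book>"
        + "</bib>";

    // Returns the string value of every node selected by an XPath 1.0 expression.
    static List<String> select(String xml, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(BIB, "/bib/book[publisher='Addison-Wesley']/title"));
    }
}
```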

Query with Booleans

XQuery booleans include:
- AND
- OR
- NOT()

List books published by Addison-Wesley after 1993:

<bib>
 {
   FOR $b IN document("bib.xml")/bib/book
   WHERE $b/publisher = "Addison-Wesley" AND $b/@year > 1993
   RETURN
      $b/title 
 }
</bib>

Adapted from XML Query Use Cases

Query Result: books published by Addison-Wesley after 1993

<bib>
    <title>TCP/IP Illustrated</title>
</bib>

Adapted from XML Query Use Cases

Attribute Constructors

List books published by Addison-Wesley after 1993, including their year and title:

<bib>
 {
   FOR $b IN document("bib.xml")/bib/book
   WHERE $b/publisher = "Addison-Wesley" AND $b/@year > 1993
   RETURN
    <book year = { $b/@year }>
     { $b/title }
    </book>
 }
</bib>

This is not well-formed XML!

Adapted from XML Query Use Cases

Query Result: books published by Addison-Wesley after 1993, including their year and title.

<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
  </book>
</bib>

Adapted from XML Query Use Cases

Query with multiple variables

Create a list of all the title-author pairs, with each pair enclosed in a result element.

<results>
 {
   FOR $b IN document("bib.xml")/bib/book,
     $t IN $b/title,
     $a IN $b/author
   RETURN
    <result>
    { $t }
    { $a }
    </result>
  }
</results>

Adapted from XML Query Use Cases

Query Result: A list of all the title-author pairs

<results>
    <result>
         <title>TCP/IP Illustrated</title>
         <author><last>Stevens</last><first>W.</first></author>
    </result>
    <result>
         <title>Advanced Programming in the Unix Environment</title>
         <author><last>Stevens</last><first>W.</first></author>
    </result>
    <result>
         <title>Data on the Web</title>
         <author><last>Abiteboul</last><first>Serge</first></author>
    </result>
    <result>
         <title>Data on the Web</title>
         <author><last>Buneman</last><first>Peter</first></author>
    </result>
    <result>
         <title>Data on the Web</title>
         <author><last>Suciu</last><first>Dan</first></author>
    </result>
</results>

Adapted from XML Query Use Cases

Nested Queries

For each book in the bibliography, list the title and authors, grouped inside a result element.

<results>
 {
   FOR $b IN document("bib.xml")/bib/book
   RETURN
    <result>
     { $b/title }
     {  
       FOR $a IN $b/author
       RETURN $a
     }
    </result>
 }
</results>

Adapted from XML Query Use Cases

Query Result: A list of the title and authors of each book in the bibliography

<?xml version="1.0"?>
<results xmlns:ino="http://namespaces.softwareag.com/tamino/response2" xmlns:xql="http://metalab.unc.edu/xql/">
  <result>
    <title>TCP/IP Illustrated</title>
    <author>
      <last>Stevens</last>
      <first>W.</first>
    </author>
  </result>
  <result>
    <title>Advanced Programming in the Unix Environment</title>
    <author>
      <last>Stevens</last>
      <first>W.</first>
    </author>
  </result>
  <result>
    <title>Data on the Web</title>
    <author>
      <last>Abiteboul</last>
      <first>Serge</first>
    </author>
    <author>
      <last>Buneman</last>
      <first>Peter</first>
    </author>
    <author>
      <last>Suciu</last>
      <first>Dan</first>
    </author>
  </result>
  <result>
    <title>The Economics of Technology and Content for Digital TV</title>
  </result>
</results>

Adapted from XML Query Use Cases

Query with distinct

For each author in the bibliography, list the author's name and the titles of all books by that author, grouped inside a result element.

<results>
 {
   FOR $a IN distinct-values(document("bib.xml")//author)
   RETURN
    <result>
     { $a }
     {  FOR $b IN document("bib.xml")/bib/book[author=$a]
        RETURN $b/title
     }
    </result>
 }
</results>

Adapted from XML Query Use Cases

Query Result

<results>
  <result>
    <author><last>Stevens</last><first>W.</first></author>
    <title>TCP/IP Illustrated</title>
    <title>Advanced Programming in the Unix Environment</title>
  </result>

  <result>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <title>Data on the Web</title>
  </result>

  <result>
    <author><last>Buneman</last><first>Peter</first></author>
    <title>Data on the Web</title>
  </result>

  <result>
    <author><last>Suciu</last><first>Dan</first></author>
      <title>Data on the Web</title>
  </result>
</results>

Adapted from XML Query Use Cases

Query with sorting

List the titles and years of all books published by Addison-Wesley after 1991, in alphabetic order.

<bib>
 {
   FOR $b IN document("bib.xml")//book
    [publisher = "Addison-Wesley" AND @year > "1991"]
   RETURN
    <book>
     { $b/@year } { $b/title }
    </book> SORTBY (title)
 }
</bib>

Adapted from XML Query Use Cases

Query Result

<bib>
  <book year="1992">
    <title>Advanced Programming in the Unix Environment</title>
  </book>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
   </book>
</bib>

Adapted from XML Query Use Cases

Queries with functions

Find books in which some element has a tag ending in "or" and the same element contains the string "Suciu" (at any level of nesting). For each such book, return the title and the qualifying element.

<result>
 {
  FOR $b IN document("bib.xml")//book,
    $e IN $b/*[contains(string(.), "Suciu")]
  WHERE ends_with(name($e), "or") 
  RETURN
   <book>
    { $b/title } { $e }
   </book>
 }
</result>

Not supported by Quip yet

Adapted from XML Query Use Cases

Query Result

<result>
 <book>
  <title> Data on the Web </title>
  <author> <last> Suciu </last> <first> Dan </first> </author>
 </book>
</result>

Adapted from XML Query Use Cases

A different document about books

Sample data at "reviews.xml":

<reviews>
  <entry>
    <title>Data on the Web</title>
    <price>34.95</price>
    <review>
       A very good discussion of semi-structured database
       systems and XML.
    </review>
  </entry>
  <entry>
    <title>Advanced Programming in the Unix Environment</title>
    <price>65.95</price>
    <review>
      A clear and detailed discussion of UNIX programming.
    </review>
  </entry>
  <entry>
    <title>TCP/IP Illustrated</title>
    <price>65.95</price>
    <review>
      One of the best books on TCP/IP.
    </review>
  </entry>
</reviews>

Adapted from XML Query Use Cases

This document uses a different DTD

<!ELEMENT reviews (entry*)>
<!ELEMENT entry   (title, price, review)>
<!ELEMENT title   (#PCDATA)>
<!ELEMENT price   (#PCDATA)>
<!ELEMENT review  (#PCDATA)>

Query that joins two documents

For each book found in both bib.xml and reviews.xml, list the title of the book and its price from each source.

<books-with-prices>
 {
   FOR $b IN document("bib.xml")//book,
     $a IN document("reviews.xml")//entry
   WHERE $b/title = $a/title
   RETURN
    <book-with-prices>
     { $b/title }
       <price-amazon> { $a/price/text() } </price-amazon>
       <price-bn> { $b/price/text() } </price-bn>
    </book-with-prices>
 }
</books-with-prices>

Adapted from XML Query Use Cases

Result

<books-with-prices>
  <book-with-prices>
    <title>TCP/IP Illustrated</title>
    <price-amazon>65.95</price-amazon>
    <price-bn>65.95</price-bn>
  </book-with-prices>

  <book-with-prices>
    <title>Advanced Programming in the Unix Environment</title>
    <price-amazon>65.95</price-amazon>
    <price-bn>65.95</price-bn>
  </book-with-prices>

  <book-with-prices>
    <title>Data on the Web</title>
    <price-amazon>34.95</price-amazon>
    <price-bn>39.95</price-bn>
  </book-with-prices>
</books-with-prices>

Adapted from XML Query Use Cases
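
The join logic itself is small. A minimal Java sketch, assuming each document has already been reduced to a map from title to price (that reduction, and all names here, are assumptions made for the example). Unlike the nested FOR loops of the query, a map assumes each title occurs at most once per document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: a hash join on title, standing in for the two-variable FOR with WHERE.
final class JoinSketch {

    // Joins two title -> price maps on title, keeping the order of the first map.
    // Each result row is {title, price from reviews.xml, price from bib.xml}.
    static List<String[]> join(Map<String, String> bibPrices, Map<String, String> reviewPrices) {
        List<String[]> rows = new ArrayList<>();
        for (Map.Entry<String, String> book : bibPrices.entrySet()) {
            String reviewPrice = reviewPrices.get(book.getKey());
            if (reviewPrice != null) { // plays the role of WHERE $b/title = $a/title
                rows.add(new String[] { book.getKey(), reviewPrice, book.getValue() });
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, String> bib = new LinkedHashMap<>();
        bib.put("Data on the Web", "39.95");
        bib.put("The Economics of Technology and Content for Digital TV", "129.95");
        Map<String, String> reviews = new LinkedHashMap<>();
        reviews.put("Data on the Web", "34.95");
        for (String[] row : join(bib, reviews)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```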

prices.xml Query Sample Data

The next query also uses an input document named "prices.xml":

<prices>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
    <source>www.amazon.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
    <source>www.bn.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>www.amazon.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>www.bn.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>www.amazon.com</source>
    <price>34.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>www.bn.com</source>
    <price>39.95</price>
  </book>
</prices>

Adapted from XML Query Use Cases

Query with reused variables

In the document "prices.xml", find the minimum price for each book, in the form of a minprice element with the book title as its title attribute.

<results>
 {
   LET $doc := document("prices.xml")
   FOR $t IN distinct-values($doc//book/title)
   LET $p := $doc//book[title = $t]/price
   RETURN
    <minprice title = { $t/text() } >
     { min($p) }
    </minprice>
 }
</results>

Adapted from XML Query Use Cases

Query Result

<results>
  <minprice title="Advanced Programming in the Unix Environment"> 65.95 </minprice>
  <minprice title="TCP/IP Illustrated"> 65.95 </minprice>
  <minprice title="Data on the Web"> 34.95 </minprice>
</results>

Adapted from XML Query Use Cases
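
The distinct-then-minimum pattern of this query can be sketched in Java as a single pass over the offers. The array-of-pairs input and the names are assumptions for the example; prices are parsed as BigDecimal so that 65.95 stays exact.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: minimum price per distinct title.
final class MinPriceSketch {

    // Each offer is {title, price}. Titles are trimmed to guard against stray whitespace.
    static Map<String, BigDecimal> minPrices(String[][] offers) {
        Map<String, BigDecimal> mins = new LinkedHashMap<>();
        for (String[] offer : offers) {
            // merge keeps the smaller of the stored price and the new one
            mins.merge(offer[0].trim(), new BigDecimal(offer[1]), BigDecimal::min);
        }
        return mins;
    }

    public static void main(String[] args) {
        String[][] offers = {
            { "Data on the Web", "34.95" },
            { "Data on the Web", "39.95" },
            { "TCP/IP Illustrated", "65.95" },
        };
        System.out.println(minPrices(offers));
    }
}
```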

Multiple FLWR Queries

For each book with an author, return a book with its title and authors. For each book with an editor, return a reference with the book title and the editor's affiliation.

<bib>
 {
   FOR $b IN document("bib.xml")//book[author]
   RETURN
    <book>
     { $b/title }
     { $b/author }
    </book>,
   FOR $b IN document("bib.xml")//book[editor]
   RETURN
    <reference>
     { $b/title }
     <org> { $b/editor/affiliation/text() } </org>
    </reference>
 }
</bib>

Adapted from XML Query Use Cases

Query Result

<bib>
    <book>
         <title>TCP/IP Illustrated</title>
         <author><last>Stevens</last><first>W.</first></author>
    </book>

    <book>
         <title>Advanced Programming in the Unix Environment</title>
         <author><last>Stevens</last><first>W.</first></author>
    </book>

    <book>
         <title>Data on the Web</title>
         <author><last>Abiteboul</last><first>Serge</first></author>
         <author><last>Buneman</last><first>Peter</first></author>
         <author><last>Suciu</last><first>Dan</first></author>
    </book>

    <reference>
        <title>The Economics of Technology and Content for Digital TV</title>
        <org>CITI</org>
    </reference>
</bib>

Adapted from XML Query Use Cases

Query Software

Quip: http://www.softwareag.com/developer/quip/
Quilt: http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html
Kweelt: http://db.cis.upenn.edu/Kweelt/
Tamino: http://www.softwareag.com/tamino/
Ipedo: http://www.ipedo.com/

What's the difference between XQuery and XSLT?

XSLT is document-driven; XQuery is program-driven
XSLT is functional; XQuery is declarative
XSLT is written in XML; XQuery is not
An assertion (unproven): XSLT 2.0 can do everything XQuery can do

To Learn More

This presentation: http://www.cafeconleche.org/slides/xmlone/london2002/advancedxml/
XSLT 2.0 Working Draft: http://www.w3.org/TR/xslt20/
XPath 2.0 Requirements: http://www.w3.org/TR/2001/WD-xpath20req-20010214
XSLT 2.0 Requirements: http://www.w3.org/TR/2001/WD-xslt20req-20010214
XQuery: A Query Language for XML: http://www.w3.org/TR/xquery/
XML Query Requirements: http://www.w3.org/TR/xmlquery-req
XML Query Use Cases: http://www.w3.org/TR/xmlquery-use-cases
XML Query Data Model: http://www.w3.org/TR/query-datamodel/
The XML Query Algebra: http://www.w3.org/TR/query-algebra/
XML Syntax for XQuery 1.0 (XQueryX): http://www.w3.org/TR/xqueryx
XQuery 1.0 and XPath 2.0 Functions and Operators Version 1.0: http://www.w3.org/TR/xquery-operators/

Part IV: SAX 2.1

Actually, SAX2 has ** MUCH ** better infoset support than DOM does. Yes, I've done the detailed analysis.

--David Brownell on the xml-dev mailing list

Goals

Full Infoset support
Backwards compatible with SAX2
Much less radical changes than from SAX1 to SAX2

Specified vs. Defaulted Attributes

Infoset includes a flag saying whether a given attribute value was specified in the instance document or defaulted from the DTD.
DOM also wants to know this

Solution:

package org.xml.sax.ext;

public interface Attributes2 extends Attributes {

  public boolean isSpecified (int index);
  public boolean isSpecified (String uri, String localName);
  public boolean isSpecified (String qualifiedName);
  
}

This interface would be implemented by SAX 2.1 Attributes objects provided in startElement() callbacks
The read-only http://xml.org/sax/features/use-attributes2 feature specifies whether Attributes2 is available
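
An interface with this name and these methods is present in org.xml.sax.ext in current JDKs, so the proposal can be tried directly. A minimal sketch, assuming the default JAXP parser; the class and method names are invented for the example, and the instanceof check covers parsers that hand out plain Attributes objects.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.helpers.DefaultHandler;

// Illustration only: reports which attributes were written in the instance document.
final class SpecifiedAttributesSketch {

    // Maps each attribute's qualified name to TRUE (specified), FALSE (defaulted from the DTD),
    // or null when the parser does not provide Attributes2 objects.
    static Map<String, Boolean> specifiedFlags(String xml) {
        final Map<String, Boolean> flags = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    Boolean specified = (atts instanceof Attributes2)
                        ? Boolean.valueOf(((Attributes2) atts).isSpecified(i))
                        : null;
                    flags.put(atts.getQName(i), specified);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return flags;
    }
}
```

A document whose internal DTD subset declares a default, such as an ATTLIST giving SONG a LANG attribute with the value "en", is enough to see both cases side by side.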

standalone declaration

<?xml version="1.0" standalone="yes"?>

The XML Infoset includes a standalone property for documents
Not currently exposed by SAX2
Solution: Define a new read-only feature: http://xml.org/sax/features/is-standalone
Open issue: distinguish between standalone="no" and omitted standalone declaration?

The version and encoding properties

<?xml version="1.0" encoding="UTF-16"?>

Infoset includes the version and encoding from the XML declaration; SAX2 does not.
Unlike standalone, these apply to all parsed entities; not just the document entity
Solution:
```
package org.xml.sax.ext;

public interface Locator2 extends Locator {
  public String getXMLVersion ();
  public String getEncoding ();
}
```
This would be implemented by Locator objects passed to setDocumentLocator() methods

The read-only feature http://xml.org/sax/features/use-locator2 says whether Locator2 objects are used.
To make matters worse, there can be as many as three encodings:
- What's declared in the document using an encoding declaration in the XML declaration
- The MIME type encoding, as specified by the HTTP header
- The name of the encoding used by a java.io.InputStreamReader (UTF8 vs. UTF-8)
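
A Locator2 interface with these two methods is present in org.xml.sax.ext in current JDKs. A minimal sketch of reading it, assuming the default JAXP parser; the names are invented for the example. Which of the three encodings getEncoding() reports is up to the parser, so no particular value is assumed here.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

// Illustration only: reads version and encoding through the locator, if the parser offers one.
final class LocatorSketch {

    // Returns {version, encoding} as seen at the first start-tag,
    // or null if the parser supplied no Locator2.
    static String[] versionAndEncoding(byte[] xml) {
        final String[][] seen = new String[1][];
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (seen[0] == null && locator instanceof Locator2) {
                    Locator2 l2 = (Locator2) locator;
                    seen[0] = new String[] { l2.getXMLVersion(), l2.getEncoding() };
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```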

Feature/Property discovery

There's no way to find out what features and properties a given XMLReader recognizes.
Solution: Define two new read-only properties:

http://xml.org/sax/properties/supported-features

The value of this property is an array of Strings containing the names of all the features supported by this XMLReader.

http://xml.org/sax/properties/supported-properties

The value of this property is an array of Strings containing the names of all the properties supported by this XMLReader.
Or perhaps a method instead of a property?
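
Until something like this exists, the only portable approach is to probe names one at a time and see whether the reader throws SAXNotRecognizedException. A minimal sketch of that workaround; the class and method names are invented for the example.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Illustration only: probing an XMLReader for one feature name at a time.
final class FeatureProbeSketch {

    // Returns true if the reader recognizes the feature name at all.
    static boolean recognizes(XMLReader reader, String featureName) {
        try {
            reader.getFeature(featureName);
            return true;
        } catch (SAXNotRecognizedException e) {
            return false;
        } catch (SAXNotSupportedException e) {
            return true; // recognized, but its value cannot be read at this point
        }
    }

    // Obtains whatever XMLReader the default JAXP factory provides.
    static XMLReader newReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        XMLReader reader = newReader();
        String[] names = {
            "http://xml.org/sax/features/namespaces",
            "http://xml.org/sax/features/validation",
            "http://example.com/no-such-feature", // made-up name, expected to be unknown
        };
        for (String name : names) {
            System.out.println(name + " recognized: " + recognizes(reader, name));
        }
    }
}
```

The obvious limitation is that probing only answers questions about names the application already knows, which is exactly the gap the proposed properties would close.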

DefaultHandler infoset extensions

The DeclHandler and LexicalHandler extension handlers are not supported by the DefaultHandler convenience class.

Solution:

Define a new org.xml.sax.ext class implementing those two interfaces, inheriting from org.xml.sax.helpers.DefaultHandler

public class DefaultHandler2 extends DefaultHandler 
  implements DeclHandler, LexicalHandler { 
  
    // LexicalHandler methods
  public void startDTD(String name, String publicId, String systemId)
      throws SAXException {}
  public void endDTD() throws SAXException {} 
  public void startEntity(String name) throws SAXException {}
  public void endEntity(String name) throws SAXException {}
  public void startCDATA() throws SAXException {}
  public void endCDATA() throws SAXException {}
  public void comment(char[] ch, int start, int length)
      throws SAXException {}
      
    // DeclHandler methods
  public void elementDecl(String name, String model)
      throws SAXException {}
  public void attributeDecl(String elementName,
      String attributeName, String type,
      String valueDefault, String value)
      throws SAXException {}
  public void internalEntityDecl(String name, String value)
      throws SAXException {}
  public void externalEntityDecl(String name, String publicID, 
      String systemID) throws SAXException {}
  
}

Alternately, update DefaultHandler.
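
A DefaultHandler2 class along these lines is present in org.xml.sax.ext in current JDKs. A minimal usage sketch that collects comments; the collector class is invented for the example. Note that the handler still has to be registered separately as the lexical handler through the standard SAX2 property.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Illustration only: overrides one LexicalHandler method and inherits no-ops for the rest.
final class CommentCollectorSketch {

    // Returns the text of every comment the parser reports.
    static List<String> comments(String xml) {
        final List<String> found = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void comment(char[] ch, int start, int length) {
                found.add(new String(ch, start, length));
            }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            // Lexical events are only delivered to a handler registered through this property.
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return found;
    }
}
```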

Parser identification

Problem: There is no conventional way for applications to identify the version of the parser they are using, for purposes of diagnostics or other kinds of troubleshooting.
The best the JVM supports is the JDK 1.2 java.lang.Package facility, which is dependent on the JAR file metadata. It provides a partial solution, at the price of portability (JDK 1.1 APIs are much more portable) and assumptions like "one parser per package".
Solution: Define a new standard read-only property:

http://xml.org/sax/properties/reader-version

Returns a string identifying the reader and its version for use in diagnostics.

Parsers could support that if desired, probably using some sort of resource-based mechanism (not necessarily Package) to keep such release-specific strings out of the source code.
Open issue: Should there be separate strings to ID the reader (likely a constant value) and its version (ideally assigned in release engineering)?
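
A sketch of how an application might query the proposed property. The property URI is the one proposed above and is an assumption: a given parser may well not recognize it, which is why the code falls back to the reader's class name. The ReaderVersion class and describe() method are hypothetical.

```java
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical helper for diagnostics output.
final class ReaderVersion {

  // The proposed property; parsers are not required to support it.
  static final String PROPERTY
   = "http://xml.org/sax/properties/reader-version";

  static String describe(XMLReader reader) {
    try {
      Object value = reader.getProperty(PROPERTY);
      if (value != null) return value.toString();
    }
    catch (SAXNotRecognizedException e) {
      // This reader has never heard of the property.
    }
    catch (SAXNotSupportedException e) {
      // The reader knows the property but can't report it right now.
    }
    // Fall back to the only thing every reader can tell us.
    return "unknown (" + reader.getClass().getName() + ")";
  }

}
```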

A Verifier Class as in JDOM

package org.jdom;

public final class Verifier {

  public static final String checkElementName(String name) {}
  public static final String checkAttributeName(String name) {}
  public static final String checkCharacterData(String text) {}
  public static final String checkNamespacePrefix(String prefix) {}
  public static final String checkNamespaceURI(String uri) {}
  public static final String checkProcessingInstructionTarget(String target) {}
  public static final String checkCommentData(String data) {}
 
  public static boolean isXMLCharacter(char c) {}
  public static boolean isXMLNameCharacter(char c) {}
  public static boolean isXMLNameStartCharacter(char c) {}
  public static boolean isXMLLetterOrDigit(char c) {}
  public static boolean isXMLLetter(char c) {}
  public static boolean isXMLCombiningChar(char c) {}
  public static boolean isXMLExtender(char c) {}
  public static boolean isXMLDigit(char c) {}

}
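
To give a feel for what such checks involve, here is a rough sketch of two of them based on the Char production of XML 1.0. It is an illustration, not JDOM's code: it works on code points rather than on char values so that surrogate pairs are handled, and it assumes the convention of returning null when the text is acceptable and a reason string when it is not. The CharCheck class name is hypothetical.

```java
// Hypothetical sketch of two Verifier-style checks.
final class CharCheck {

  // XML 1.0 Char production:
  // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
  static boolean isXMLCharacter(int codePoint) {
    return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
     || (codePoint >= 0x20 && codePoint <= 0xD7FF)
     || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
     || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
  }

  // Returns null if the text is legal, otherwise the reason it isn't.
  static String checkCharacterData(String text) {
    if (text == null) return "A null is not a legal XML value";
    for (int i = 0; i < text.length(); ) {
      // An unpaired surrogate comes back as itself and is rejected.
      int codePoint = text.codePointAt(i);
      if (!isXMLCharacter(codePoint)) {
        return "0x" + Integer.toHexString(codePoint)
         + " is not a legal XML character";
      }
      i += Character.charCount(codePoint);
    }
    return null;
  }

}
```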

To Learn More

http://sax.sourceforge.net/
Subscribe to the xml-dev mailing list, http://lists.xml.org/archives/xml-dev/

Part V: DOM Level 3

of all of the things the W3C has given us, the DOM is probably the one with the least value.

--Michael Brennan on the xml-dev mailing list

DOM Evolution

DOM Level 0: what was implemented for JavaScript in Netscape 3/IE3
DOM Level 1, a W3C Standard
DOM Level 2, a W3C Standard
DOM Level 3: Several Working Drafts

New Features in DOM Level 3

Grammar access, a.k.a. abstract schemas (DTDs, RELAX NG schemas, W3C XML Schema Language schemas)
Extra (IDL) attributes on Entity, Document, Node, and Text interfaces
Document normalization
Standard means of loading and saving XML documents.
Bootstrapping new documents
Key events

DOM Level 3 Core Changes

DOMKey
Node
Document
Text
Entity
Bootstrapping

DOMKey

Every node gets a unique key, generated automatically by the DOM implementation, that identifies it.
In Java the key is just an Object. (Implementations may use more detailed subclasses.)

New methods in Node interface

Adds:

Base URI

The URI this document came from. May be null.

Tree position

The order of a node relative to another reference node in tree order

Methods to set the text content of a node

Methods to test for equality

Methods to work with namespaces

Methods to associate user data with each node
I will only show the new methods. Currently, the plan is to simply add these to the existing Node interface.

Java binding:

package org.w3c.dom;

public interface Node {

  public String getBaseURI();

  public static final short TREE_POSITION_PRECEDING   = 0x01;
  public static final short TREE_POSITION_FOLLOWING   = 0x02;
  public static final short TREE_POSITION_ANCESTOR    = 0x04;
  public static final short TREE_POSITION_DESCENDANT  = 0x08;
  public static final short TREE_POSITION_EQUIVALENT  = 0x10;
  public static final short TREE_POSITION_SAME_NODE   = 0x20;
  public static final short TREE_POSITION_DISCONNECTED = 0x00;

  public int compareTreePosition(Node other) throws DOMException;

  public String  getTextContent() throws DOMException;
  public void    setTextContent(String textContent) throws DOMException;
  
  public Object  setUserData(String key, Object data, UserDataHandler handler);
  public Object  getUserData(String key);

  public boolean isSameNode(Node other);
  public boolean isEqualNode(Node arg, boolean deep);

  public String  lookupNamespacePrefix(String namespaceURI);
  public String  lookupNamespaceURI(String prefix);
  public void    normalizeNS();
  public Node    getAs(String feature);

  public Object  getKey();

}

In IDL:

interface Node {

  readonly attribute DOMString baseURI;

  const unsigned short TREE_POSITION_PRECEDING    = 0x01;
  const unsigned short TREE_POSITION_FOLLOWING    = 0x02;
  const unsigned short TREE_POSITION_ANCESTOR     = 0x04;
  const unsigned short TREE_POSITION_DESCENDANT   = 0x08;
  const unsigned short TREE_POSITION_EQUIVALENT   = 0x10;
  const unsigned short TREE_POSITION_SAME_NODE    = 0x20;
  const unsigned short TREE_POSITION_DISCONNECTED = 0x00;

  unsigned short compareTreePosition(in Node other);
  
  attribute DOMString textContent; // raises(DOMException) on setting
                                   // raises(DOMException) on retrieval

  boolean   isSameNode(in Node other);
  DOMString lookupNamespacePrefix(in DOMString namespaceURI);
  DOMString lookupNamespaceURI(in DOMString prefix);
  boolean   isEqualNode(in Node arg, in boolean deep);
  Node getInterface(in DOMString feature);
  DOMKeyObject setUserData(in DOMString key, in DOMKeyObject data, 
   in UserDataHandler handler);
  DOMKeyObject getUserData(in DOMString key);

};
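
A short sketch of several of these methods in action. It is written against the org.w3c.dom.Node interface that ships with current JDKs, which differs from the draft binding above in a few names: the tree position comparison is compareDocumentPosition() with DOCUMENT_POSITION_* constants, and isEqualNode() takes a single argument. The Dom3NodeDemo class and its helper methods are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class Dom3NodeDemo {

  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance()
       .newDocumentBuilder().newDocument();
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // Builds <SONG><TITLE>Hot Cop</TITLE><ARTIST>Village People</ARTIST></SONG>
  static Element buildSong(Document doc) {
    Element song = doc.createElement("SONG");
    Element title = doc.createElement("TITLE");
    Element artist = doc.createElement("ARTIST");
    title.setTextContent("Hot Cop");       // replaces any existing children
    artist.setTextContent("Village People");
    song.appendChild(title);
    song.appendChild(artist);
    return song;
  }

  public static void main(String[] args) {
    Document doc = newDocument();
    Element song = buildSong(doc);
    doc.appendChild(song);
    Node title = song.getFirstChild();
    Node artist = song.getLastChild();

    // Text content of an element is the concatenated text of its descendants
    System.out.println(song.getTextContent());

    // User data hangs arbitrary application objects on a node
    title.setUserData("checked", Boolean.TRUE, null);
    System.out.println(title.getUserData("checked"));

    // Equal means same structure and content; same means identical object
    Element copy = buildSong(doc);
    System.out.println(song.isEqualNode(copy));
    System.out.println(song.isSameNode(copy));

    // The result is a bit mask describing where the argument sits
    short position = title.compareDocumentPosition(artist);
    System.out.println(
     (position & Node.DOCUMENT_POSITION_FOLLOWING) != 0);
  }

}
```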

User Data

New methods in Entity

XML documents may be built from multiple parsed entities, each of which is not necessarily a well-formed XML document, but is at least a plausible part of a well-formed XML document.
Each entity may have its own text declaration. This is like an XML declaration without a standalone attribute and with an optional version attribute:
```
<?xml version="1.0"?>
<?xml version="1.0" encoding="ISO-8859-9"?>
<?xml encoding="ISO-8859-9"?>
```
DOM3 adds:

encoding

A string specifying what encoding the text declaration claims this entity uses. This is null if this entity is not an external parsed entity.

actualEncoding

A string specifying the actual encoding of this entity. This is null if this entity is not an external parsed entity.

version

A string specifying the XML version given in the text declaration. This is null if this entity is not an external parsed entity.

Java binding:


package org.w3c.dom;

public interface Entity extends Node {
  
  public String getActualEncoding();
  public void   setActualEncoding(String actualEncoding);
  public String getEncoding();
  public void   setEncoding(String encoding);
  public String getVersion();
  public void   setVersion(String version);
  
}

In IDL:

interface Entity : Node {
  attribute DOMString  actualEncoding;
  attribute DOMString  encoding;
  attribute DOMString  version;
};

New methods in Document

Adds:
XML declaration
encoding, version, and standalone attributes:
```
<?xml version="1.0"?>
<?xml version="1.0" encoding="ISO-8859-9"?>
<?xml version="1.0" encoding="ISO-8859-9" standalone="no"?>
<?xml version="1.0" standalone="yes"?>
```
adoptNode()

Method to move node from one document to another

setBaseURI()

Method to set the base URI of the document

Normalization methods

Methods to determine the normal form of a document and transform the document into that form

renameNode()

A method that changes the name and namespace URI of a node in the document.

Error Handlers

Specify a DOMErrorHandler that will be called in the event "that an error is encountered while performing an operation on a document"

Java binding:

package org.w3c.dom;

public interface Document extends Node {

  public String  getActualEncoding();
  public void    setActualEncoding(String actualEncoding);

  public String  getEncoding();
  public void    setEncoding(String encoding);

  public boolean getStandalone();
  public void    setStandalone(boolean standalone);

  public boolean getStrictErrorChecking();
  public void    setStrictErrorChecking(boolean strictErrorChecking);

  public String  getVersion();
  public void    setVersion(String version);

  public Node    adoptNode(Node source) throws DOMException;

  public DOMErrorHandler getErrorHandler();
  public void setErrorHandler(DOMErrorHandler errorHandler);

  public String getDocumentURI();
  public void   setDocumentURI(String documentURI);

  public void    normalizeDocument();
  public boolean canSetNormalizationFeature(String name, boolean state);
  public void    setNormalizationFeature(String name, boolean state)
   throws DOMException;
  public boolean getNormalizationFeature(String name)
   throws DOMException;

  public Node renameNode(Node n, String namespaceURI, String name)
   throws DOMException;
  
}

In IDL:

interface Document : Node {

  attribute DOMString actualEncoding;
  attribute DOMString encoding;
  attribute boolean   standalone;
  attribute boolean   strictErrorChecking;
  attribute DOMString version;
  
  Node adoptNode(in Node source) raises(DOMException);
  void setBaseURI(in DOMString baseURI) raises(DOMException);
  
  attribute DOMErrorHandler errorHandler;
  attribute DOMString documentURI;

  void    normalizeDocument();
  boolean canSetNormalizationFeature(in DOMString name, in boolean state);
  void    setNormalizationFeature(in DOMString name, in boolean state) raises(DOMException);
  boolean getNormalizationFeature(in DOMString name) raises(DOMException);

  Node renameNode(in Node n, in DOMString namespaceURI, in DOMString name) raises(DOMException);
};
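
A sketch of reading the XML declaration and moving a node between documents. It uses the org.w3c.dom.Document interface as bundled with current JDKs, where the accessors are named getXmlVersion(), getXmlEncoding(), and getXmlStandalone() rather than the draft names shown above. The Dom3DocumentDemo class, its helpers, and the choice of ISO-8859-1 for the sample bytes are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class Dom3DocumentDemo {

  // Parses from bytes so that the encoding declaration is meaningful.
  static Document parse(String xml) {
    try {
      byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
       .parse(new ByteArrayInputStream(bytes));
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // Moves (not copies) a node into the root element of another document.
  static Node move(Node node, Document target) {
    Node adopted = target.adoptNode(node);
    if (adopted == null) {
      throw new IllegalStateException("Node could not be adopted");
    }
    target.getDocumentElement().appendChild(adopted);
    return adopted;
  }

  public static void main(String[] args) {
    Document song = parse(
     "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" standalone=\"yes\"?>"
     + "<SONG><TITLE>Hot Cop</TITLE></SONG>");
    System.out.println("version:    " + song.getXmlVersion());
    System.out.println("encoding:   " + song.getXmlEncoding());
    System.out.println("standalone: " + song.getXmlStandalone());

    Document album = parse("<ALBUM/>");
    Node title = song.getDocumentElement().getFirstChild();
    Node moved = move(title, album);
    System.out.println(moved.getOwnerDocument() == album);
  }

}
```

Unlike importNode() in DOM2, adoptNode() removes the node from its original document instead of cloning it.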

New methods in Text

Adds:

isWhitespaceInElementContent()

Returns true if this node contains "ignorable" whitespace

wholeText()

Returns all text of Text nodes logically-adjacent to this node; i.e. the XPath value of the text node

Java binding:

package org.w3c.dom;

public interface Text extends CharacterData {

  public boolean getIsWhitespaceInElementContent();

  public String  getWholeText();
  public Text    replaceWholeText(String content) throws DOMException;

}

In IDL:

interface Text : CharacterData {

  readonly attribute boolean   isWhitespaceInElementContent;
  readonly attribute DOMString wholeText;

  Text replaceWholeText(in DOMString content) raises(DOMException);

};
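
A sketch of the whole-text methods, using the org.w3c.dom.Text interface found in current JDKs (where the whitespace test is named isElementContentWhitespace()). An element is given two adjacent text nodes, which is what typically happens after programmatic edits. The WholeTextDemo class and its helper are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

final class WholeTextDemo {

  // Builds a TITLE element whose content is split across two Text nodes.
  static Element splitTitle() {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
       .newDocumentBuilder().newDocument();
      Element title = doc.createElement("TITLE");
      title.appendChild(doc.createTextNode("Hot "));
      title.appendChild(doc.createTextNode("Cop"));
      return title;
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Element title = splitTitle();
    Text first = (Text) title.getFirstChild();

    // The data of this one node versus all logically adjacent text
    System.out.println(first.getData());
    System.out.println(first.getWholeText());

    // Replaces the entire run of adjacent text nodes in one call
    first.replaceWholeText("Hot Cop (live)");
    System.out.println(title.getTextContent());
  }

}
```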

Bootstrapping

DOM2 has no implementation-independent means to create a new Document object
Implementation-dependent methods tend to be fairly complex. For example, in Xerces-J:
DOMImplementation impl = DOMImplementationImpl.getDOMImplementation();
Document fibonacci = impl.createDocument(null, "Fibonacci_Numbers", null);

DOM3 Bootstrapping

Still no language-independent means to create a new Document object
Does provide an implementation-independent method for Java only:
DOMImplementation impl = DOMImplementationRegistry.getDOMImplementation("XML");

package org.w3c.dom;

public class DOMImplementationRegistry { 

  // The system property to specify the DOMImplementationSource class names. 
  public static String PROPERTY = "org.w3c.dom.DOMImplementationSourceList";

  public static DOMImplementation getDOMImplementation(String features)
   throws ClassNotFoundException, InstantiationException, IllegalAccessException;
  public static void addSource(DOMImplementationSource s)
   throws ClassNotFoundException, InstantiationException, IllegalAccessException;

}
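
For comparison, a sketch of bootstrapping with the registry class as it is bundled with current JDKs. That class lives in the org.w3c.dom.bootstrap package and is obtained through newInstance() rather than used through static methods as in the draft above. The BootstrapDemo class and create() method are hypothetical names.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

final class BootstrapDemo {

  // Creates an empty document with the given root element name
  // without naming any parser-specific class.
  static Document create(String rootName) {
    try {
      DOMImplementationRegistry registry
       = DOMImplementationRegistry.newInstance();
      DOMImplementation impl = registry.getDOMImplementation("XML 3.0");
      if (impl == null) {
        throw new IllegalStateException("No DOM3 XML implementation found");
      }
      return impl.createDocument(null, rootName, null);
    }
    catch (ReflectiveOperationException e) {
      // The registry loads implementation sources by class name
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Document fibonacci = create("Fibonacci_Numbers");
    System.out.println(fibonacci.getDocumentElement().getTagName());
  }

}
```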

DOM Error Handler Interfaces

DOMErrorHandler
DOMLocator

The DOMErrorHandler Interface

Similar to SAX2's ErrorHandler interface.
A callback interface
An application implements this interface and then registers it with the setErrorHandler() method to receive warnings, errors, and fatal errors.

Java binding:

package org.w3c.dom;

public interface DOMErrorHandler {
  
  public boolean handleError(DOMError error);

}

IDL:

interface DOMErrorHandler {
  boolean handleError(in DOMError error);
};

The DOMLocator Interface

Similar to SAX2's Locator interface. The DOM implementation supplies DOMLocator objects so that an application can find out the line, column, offset, and URI at which a problem was detected, as well as the node involved.

Java binding:

package org.w3c.dom;

public interface DOMLocator {

  public int    getLineNumber();
  public int    getColumnNumber();
  public int    getOffset();
  public Node   getErrorNode();
  public String getUri();

}

IDL:

interface DOMLocator {
  readonly attribute long            lineNumber;
  readonly attribute long            columnNumber;
  readonly attribute long            offset;
  readonly attribute Node errorNode;
  readonly attribute DOMString uri;
};
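
A sketch of the two interfaces working together. It uses the versions of these interfaces found in current JDKs, where the handler is registered as the "error-handler" parameter of a DOMConfiguration and the locator is read from DOMError.getLocation(). The parser here is the JDK's LSParser, which is an assumption relative to the draft names used in these slides. The ErrorCollector class and check() method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.DOMError;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.DOMLocator;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

// Hypothetical handler that remembers everything it is told.
final class ErrorCollector implements DOMErrorHandler {

  final List<String> reports = new ArrayList<>();

  @Override
  public boolean handleError(DOMError error) {
    DOMLocator where = error.getLocation();
    String position = (where == null) ? "unknown position"
     : "line " + where.getLineNumber()
       + ", column " + where.getColumnNumber();
    reports.add(position + ": " + error.getMessage());
    // true asks the implementation to carry on if it can;
    // this sketch only does so for warnings.
    return error.getSeverity() == DOMError.SEVERITY_WARNING;
  }

  static List<String> check(String xml) {
    try {
      DOMImplementationLS ls = (DOMImplementationLS)
       DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
      LSParser parser
       = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
      ErrorCollector collector = new ErrorCollector();
      parser.getDomConfig().setParameter("error-handler", collector);
      LSInput input = ls.createLSInput();
      input.setStringData(xml);
      try {
        parser.parse(input);
      }
      catch (LSException e) {
        // A fatal error ends the parse; the handler has already seen it.
      }
      return collector.reports;
    }
    catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    for (String report : check("<a><b></a>")) {
      System.out.println(report);
    }
  }

}
```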

Load and Save

Loading: parsing an existing XML document to produce a Document object
Saving: serializing a Document object into a file or onto a stream
Completely implementation dependent in DOM2

The DOM Process

Library-specific code creates a parser.
The parser parses the document and returns a DOM org.w3c.dom.Document object.
The entire document is stored in memory.
DOM methods and interfaces are used to extract data from this object

Parsing documents with DOM2

This program parses with Xerces. Other parsers are different.

import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMParserMaker {

  public static void main(String[] args) {
     
    DOMParser parser = new DOMParser();
    
    for (int i = 0; i < args.length; i++) {
      try {
        parser.parse(args[i]); 
       
        Document d = parser.getDocument();
      }
      catch (SAXException e) {
        System.err.println(e); 
      }
      catch (IOException e) {
        System.err.println(e); 
      }
      
    }
   
  }

}

Parsing documents with JAXP

import javax.xml.parsers.*; // JAXP
import org.xml.sax.SAXException;
import java.io.IOException;


public class JAXPParserMaker {

  public static void main(String[] args) {
     
    if (args.length <= 0) {
      System.out.println("Usage: java JAXPParserMaker URL");
      return;
    }
    String document = args[0];
    
    try {
      DocumentBuilderFactory factory 
       = DocumentBuilderFactory.newInstance();
      DocumentBuilder parser = factory.newDocumentBuilder();
      parser.parse(document); 
      System.out.println(document + " is well-formed.");
    }
    catch (SAXException e) {
      System.out.println(document + " is not well-formed.");
    }
    catch (IOException e) { 
      System.out.println(
       "Due to an IOException, the parser could not check " 
       + document
      ); 
    }
    catch (FactoryConfigurationError e) { 
      // JAXP suffers from excessive brain-damage caused by 
      // intellectual in-breeding at Sun. (Basically the Sun 
      // engineers spend way too much time talking to each other
      // and not nearly enough time talking to people outside 
      // Sun.) Fortunately, you can happily ignore most of the 
      // JAXP brain damage and not be any the poorer for it.
      
      // This, however, is one of the few problems you can't 
      // avoid if you're going to use JAXP at all. 
      // DocumentBuilderFactory.newInstance() should throw a 
      // ClassNotFoundException if it can't locate the factory
      // class. However, what it does throw is an Error,
      // specifically a FactoryConfigurationError. Very few 
      // programs are prepared to respond to errors as opposed
      // to exceptions. You should catch this error in your 
      // JAXP programs as quickly as possible even though the
      // compiler won't require you to, and you should 
      // never rethrow it or otherwise let it escape from the 
      // method that produced it. 
      System.out.println("Could not locate a factory class"); 
    }
    catch (ParserConfigurationException e) { 
      System.out.println("Could not locate a JAXP parser"); 
    }
   
  }

}

Parsing documents with DOM3

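import org.w3c.dom.*;
import org.w3c.dom.ls.*;

public class DOM3ParserMaker {

  // The registry lookup declares several checked exceptions
  public static void main(String[] args) throws Exception {

    DOMImplementation impl 
     = DOMImplementationRegistry.getDOMImplementation("XML 1.0 LS-Load 3.0");
    DOMImplementationLS implls = (DOMImplementationLS) impl;
    // createDOMBuilder() is the factory method DOMImplementationLS declares
    DOMBuilder parser 
     = implls.createDOMBuilder(DOMImplementationLS.MODE_SYNCHRONOUS);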

    for (int i = 0; i < args.length; i++) {
      try {
        Document d = parser.parseURI(args[i]);
      }
      catch (DOMSystemException e) {
        System.err.println(e);
      }
      catch (DOMException e) {
        System.err.println(e);
      }

    }

  }

}

This code will not actually compile or run until some parser supports DOM3 Load and Save.
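
For readers who want something that does run, here is a sketch against the Load and Save interfaces bundled with current JDKs. The names differ from the draft used in these slides: the parser interface there is LSParser, created with createLSParser(), and its input is an LSInput. Treat the mapping to the draft names as approximate. The LSParserMaker class and its helper methods are hypothetical.

```java
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

final class LSParserMaker {

  // Finds an implementation that supports the "LS" feature.
  static DOMImplementationLS loadAndSave() {
    try {
      DOMImplementationLS ls = (DOMImplementationLS)
       DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
      if (ls == null) {
        throw new IllegalStateException("No Load and Save support found");
      }
      return ls;
    }
    catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  // Parses a document held in a string.
  static Document parseString(String xml) {
    DOMImplementationLS ls = loadAndSave();
    LSParser parser
     = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
    LSInput input = ls.createLSInput();
    input.setStringData(xml);
    return parser.parse(input);
  }

  public static void main(String[] args) {
    LSParser parser = loadAndSave()
     .createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
    for (int i = 0; i < args.length; i++) {
      try {
        Document d = parser.parseURI(args[i]);
        System.out.println(args[i] + ": root element is "
         + d.getDocumentElement().getTagName());
      }
      catch (LSException e) {
        System.err.println(e);
      }
      catch (DOMException e) {
        System.err.println(e);
      }
    }
  }

}
```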

Load and Save

DOMImplementationLS: A sub-interface of DOMImplementation that provides the factory methods for creating the objects required for loading and saving.
DOMBuilder: A parser interface
DOMInputSource: Encapsulates information about the source of the XML to be loaded, like SAX's InputSource
DOMEntityResolver: During loading, provides a way for applications to redirect references to external entities.
DOMBuilderFilter: Provides the ability to examine and optionally remove Element nodes as they are being processed during the parsing of a document, like SAX filters.
DOMWriter: An interface for serializing DOM documents onto a stream or string.
DOMWriterFilter: Provides the ability to examine and optionally remove or modify nodes as they are being output.
DocumentLS: A "mechanism by which the content of a document can be replaced with the DOM tree produced when loading a URL, or parsing a string."
ParseErrorEvent: Some sort of error detected in the input document (well-formedness? validity?)
LSLoadEvent: A document has been completely loaded
LSProgressEvent: A document has been partially loaded

DOMImplementationLS

Factory interface to create new DOMBuilder and DOMWriter implementations.

Java Binding:

package org.w3c.dom.ls;

public interface DOMImplementationLS {

  public static final short MODE_SYNCHRONOUS  = 1;
  public static final short MODE_ASYNCHRONOUS = 2;
  
  public DOMBuilder     createDOMBuilder(short mode) throws DOMException;
  public DOMWriter      createDOMWriter();
  public DOMInputSource createDOMInputSource();

}

IDL:

  interface DOMImplementationLS {

  const unsigned short MODE_SYNCHRONOUS  = 1;
  const unsigned short MODE_ASYNCHRONOUS = 2;

  DOMBuilder     createDOMBuilder(in unsigned short mode) 
   raises(DOMException);
  DOMWriter      createDOMWriter();
  DOMInputSource createDOMInputSource();

  };

Creating DOMImplementationLS Objects

Use the feature "LS-Load" to find a DOMImplementation object that supports Load and Save.
Cast the DOMImplementation object to DOMImplementationLS.

DOMImplementation impl 
 = DOMImplementationRegistry.getDOMImplementation("XML 1.0 LS-Load 3.0");
if (impl != null) {
  DOMImplementationLS implls = (DOMImplementationLS) impl;
  // ...
}

DOMBuilder

Provides an implementation-independent API for parsing XML documents to produce a DOM Document object.
Instances are built by the createDOMBuilder() method in DOMImplementationLS.

Java Binding:

package org.w3c.dom.ls;

public interface DOMBuilder {

  public DOMEntityResolver getEntityResolver();
  public void setEntityResolver(DOMEntityResolver entityResolver);

  public DOMErrorHandler getErrorHandler();
  public void setErrorHandler(DOMErrorHandler errorHandler);

  public DOMBuilderFilter getFilter();
  public void setFilter(DOMBuilderFilter filter);

  public void setFeature(String name, boolean state)
   throws DOMException;
  public boolean canSetFeature(String name, boolean state);
  public boolean getFeature(String name)
   throws DOMException;
  public Document parseURI(String uri) throws Exception;
  public Document parse(DOMInputSource is) throws Exception;

  // ACTION_TYPES
  public static final short ACTION_REPLACE       = 1;
  public static final short ACTION_APPEND        = 2;
  public static final short ACTION_INSERT_AFTER  = 3;
  public static final short ACTION_INSERT_BEFORE = 4;

  public void parseWithContext(DOMInputSource is,
   Node contextNode, short action) throws DOMException;

}

IDL:

interface DOMBuilder {

  attribute DOMEntityResolver entityResolver;
  attribute DOMErrorHandler errorHandler;
  attribute DOMBuilderFilter filter;
  
  void     setFeature(in DOMString name, in boolean state)
   raises(DOMException);
  boolean  canSetFeature(in DOMString name, in boolean state);
  boolean  getFeature(in DOMString name) raises(DOMException);
  Document parseURI(in DOMString uri) raises(DOMSystemException);
  Document parse(in DOMInputSource is) raises(DOMSystemException);

  // ACTION_TYPES
  const unsigned short ACTION_REPLACE       = 1;
  const unsigned short ACTION_APPEND        = 2;
  const unsigned short ACTION_INSERT_AFTER  = 3;
  const unsigned short ACTION_INSERT_BEFORE = 4;

  void parseWithContext(in DOMInputSource is, in Node cnode, 
   in unsigned short action) raises(DOMException);
   
};

DOMInputSource

Like SAX2's InputSource class, this interface is an abstraction of all the different things (streams, files, byte arrays, sockets, URLs, etc.) from which an XML document can be read.

Java Binding:

package org.w3c.dom.ls;

public interface DOMInputSource {

  public InputStream getByteStream();
  public void        setByteStream(InputStream in);
  public Reader      getCharacterStream();
  public void        setCharacterStream(Reader in);

  public String getStringData();
  public void   setStringData(String data);
  public String getEncoding();
  public void   setEncoding(String encoding);
  public String getPublicId();
  public void   setPublicId(String publicId);
  public String getSystemId();
  public void   setSystemId(String systemId);

}

IDL:

  interface DOMInputSource {
    attribute DOMInputStream  byteStream;
    attribute DOMString       stringData;
    attribute DOMReader       characterStream;
    attribute DOMString       encoding;
    attribute DOMString       publicId;
    attribute DOMString       systemId;
  };
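
A sketch of an input source in use. In the Load and Save interfaces bundled with current JDKs the corresponding interface is LSInput, which is what this example relies on; the draft name above is DOMInputSource. Here the document arrives as raw bytes with no encoding declaration, so the application states the encoding on the input source. The InputSourceDemo class and parseLatin1() method are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;

final class InputSourceDemo {

  // Parses bytes that are known, out of band, to be ISO-8859-1.
  static Document parseLatin1(String xml) {
    try {
      DOMImplementationLS ls = (DOMImplementationLS)
       DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
      LSInput input = ls.createLSInput();
      input.setByteStream(new ByteArrayInputStream(
       xml.getBytes(StandardCharsets.ISO_8859_1)));
      // Without this the parser would have to guess from the bytes
      input.setEncoding("ISO-8859-1");
      return ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null)
       .parse(input);
    }
    catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Document d = parseLatin1("<TITLE>Caf\u00e9</TITLE>");
    System.out.println(d.getDocumentElement().getTextContent());
  }

}
```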

DOMEntityResolver

Like SAX2's EntityResolver interface, this interface lets applications redirect references to external entities.

Java Binding:

package org.w3c.dom.ls;

public interface DOMEntityResolver {

  public DOMInputSource resolveEntity(String publicID, 
   String systemID, String baseURI) throws DOMSystemException;

}

IDL:

  interface DOMEntityResolver {
  DOMInputSource resolveEntity(in DOMString publicID, 
   in DOMString systemID, in DOMString baseURI)
   raises(DOMSystemException);
  };
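
A sketch of redirecting an external DTD to a local copy. The Load and Save interfaces bundled with current JDKs call this interface LSResourceResolver, with a five-argument resolveResource() method, and register it as the "resource-resolver" configuration parameter; that is what the example uses instead of the draft DOMEntityResolver. The LocalDtdResolver class, the one-line DTD, and the example.invalid URL are all made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSResourceResolver;

// Hypothetical resolver that answers every request with a local DTD.
final class LocalDtdResolver implements LSResourceResolver {

  private final DOMImplementationLS ls;
  final List<String> requested = new ArrayList<>();

  LocalDtdResolver(DOMImplementationLS ls) {
    this.ls = ls;
  }

  @Override
  public LSInput resolveResource(String type, String namespaceURI,
   String publicId, String systemId, String baseURI) {
    requested.add(systemId);
    LSInput replacement = ls.createLSInput();
    replacement.setSystemId(systemId);
    // The parser reads this instead of fetching the system ID
    replacement.setCharacterStream(
     new StringReader("<!ELEMENT SONG (#PCDATA)>"));
    return replacement;
  }

  // Parses a document with an unreachable DTD and returns
  // the system IDs the parser asked the resolver for.
  static List<String> demo() {
    try {
      DOMImplementationLS ls = (DOMImplementationLS)
       DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
      LSParser parser
       = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
      LocalDtdResolver resolver = new LocalDtdResolver(ls);
      parser.getDomConfig().setParameter("resource-resolver", resolver);
      LSInput input = ls.createLSInput();
      input.setStringData(
       "<!DOCTYPE SONG SYSTEM 'http://example.invalid/song.dtd'>"
       + "<SONG>Hot Cop</SONG>");
      parser.parse(input);
      return resolver.requested;
    }
    catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }

}
```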

DOMWriter

Provides an API for serializing (writing) a DOM document out as a sequence of bytes onto a stream, file, socket, byte array, etc.

Java Binding:

package org.w3c.dom.ls;

public interface DOMWriter {

  public void setFeature(String name, boolean state)
   throws DOMException;
  public boolean canSetFeature(String name, boolean state);
  public boolean getFeature(String name) throws DOMException;

  public String getEncoding();
  public void   setEncoding(String encoding);
  public String getLastEncoding();
  public String getNewLine();
  public void   setNewLine(String newLine);
  
  public DOMErrorHandler getErrorHandler();
  public void setErrorHandler(DOMErrorHandler errorHandler);
  
  public boolean writeNode(OutputStream destination, Node wnode)
   throws Exception;
  public String writeToString(Node node) throws DOMException;

}

IDL:

interface DOMWriter {

  void setFeature(in DOMString name, in boolean state)
    raises(DOMException);
  boolean canSetFeature(in DOMString name, in boolean state);
  boolean getFeature(in DOMString name) raises(DOMException);
           attribute DOMString encoding;
  readonly attribute DOMString lastEncoding;
           attribute DOMString newLine;
           attribute DOMErrorHandler errorHandler;
           
  boolean writeNode(in DOMOutputStream destination, in Node wnode)
   raises(DOMSystemException);
  DOMString writeToString(in Node wnode) raises(DOMException);
  
};
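
A sketch of serialization using the interface bundled with current JDKs, where the writer is called LSSerializer and is created with createLSSerializer(). Features are set through its DOMConfiguration rather than through setFeature() as in the draft above. The WriterDemo class and its helpers are hypothetical.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

final class WriterDemo {

  static DOMImplementation implementation() {
    try {
      return DOMImplementationRegistry.newInstance()
       .getDOMImplementation("LS");
    }
    catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  // Builds <SONG><TITLE>Hot Cop</TITLE></SONG> in memory.
  static Document newSong() {
    Document doc = implementation().createDocument(null, "SONG", null);
    Element title = doc.createElement("TITLE");
    title.setTextContent("Hot Cop");
    doc.getDocumentElement().appendChild(title);
    return doc;
  }

  // Serializes a node to a string without an XML declaration.
  static String serialize(Node node) {
    DOMImplementationLS ls = (DOMImplementationLS) implementation();
    LSSerializer writer = ls.createLSSerializer();
    writer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
    return writer.writeToString(node);
  }

  public static void main(String[] args) {
    System.out.println(serialize(newSong()));
  }

}
```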

DOMBuilderFilter

Lets applications examine nodes as they are being constructed during a parse.
As each node is examined, it may be modified or removed, or parsing may be aborted.

Java Binding:

package org.w3c.dom.ls;

public interface DOMBuilderFilter {

  public int startNode(Node snode);
  public int endNode(Node enode);
  public int getWhatToShow();

}

IDL:

interface DOMBuilderFilter {

  unsigned long      startNode(in Node snode);
  unsigned long      endNode(in Node enode);
  
  readonly attribute unsigned long whatToShow;
};
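
A sketch of a parse-time filter. The Load and Save interfaces bundled with current JDKs name this interface LSParserFilter, with startElement() and acceptNode() methods returning FILTER_ACCEPT, FILTER_REJECT, FILTER_SKIP, or FILTER_INTERRUPT, so the method names differ from the draft above. This filter drops PRODUCER elements and everything inside them as the tree is built. The ProducerFilter class is hypothetical.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

// Hypothetical filter: PRODUCER elements never make it into the tree.
final class ProducerFilter implements LSParserFilter {

  // Called when a start-tag has been read, before any content
  @Override
  public short startElement(Element element) {
    return "PRODUCER".equals(element.getTagName())
     ? FILTER_REJECT : FILTER_ACCEPT;
  }

  // Called when a node has been completely parsed
  @Override
  public short acceptNode(Node node) {
    return FILTER_ACCEPT;
  }

  // Only elements are of interest to this filter
  @Override
  public int getWhatToShow() {
    return NodeFilter.SHOW_ELEMENT;
  }

  static Document parseWithoutProducers(String xml) {
    try {
      DOMImplementationLS ls = (DOMImplementationLS)
       DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
      LSParser parser
       = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
      parser.setFilter(new ProducerFilter());
      LSInput input = ls.createLSInput();
      input.setStringData(xml);
      return parser.parse(input);
    }
    catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Document d = parseWithoutProducers("<SONG><TITLE>Hot Cop</TITLE>"
     + "<PRODUCER>Jacques Morali</PRODUCER></SONG>");
    System.out.println(d.getElementsByTagName("PRODUCER").getLength());
  }

}
```

Filtering during the parse means the rejected nodes never occupy memory, which is the main advantage over pruning the tree afterwards.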

DOMWriterFilter

Lets applications examine nodes as they are being output.
As each element is examined, it may be modified or removed, or output may be aborted.

Java Binding:

package org.w3c.dom.ls;

public interface DOMWriterFilter extends NodeFilter {

  public int getWhatToShow();

}

IDL:

interface DOMWriterFilter : traversal::NodeFilter {
  readonly attribute unsigned long whatToShow;
};

DocumentLS

An instance of the DocumentLS interface can be obtained by casting an instance of the Document interface to DocumentLS.

Java Binding:

package org.w3c.dom.ls;

import org.w3c.dom.Node;
import org.w3c.dom.DOMException;

public interface DocumentLS {

  public boolean getAsync();
  public void    setAsync(boolean async);

  public void    abort();
  public boolean load(String url);
  public boolean loadXML(String source);
  public String  saveXML(Node node) throws DOMException;

}

IDL:

interface DocumentLS {
  attribute boolean  async;
  
  void      abort();
  boolean   load(in DOMString url);
  boolean   loadXML(in DOMString source);
  DOMString saveXML(in Node node) raises(DOMException);
  
};

ParseErrorEvent

Represents an error (of what kind?) in the document being parsed

Java Binding:

package org.w3c.dom.ls;

public interface ParseErrorEvent extends Event {

  public DOMError getError();

}

IDL:

interface ParseErrorEvent : events::Event {
  readonly attribute DOMError error;
};

Grammar Access/Abstract Schemas

Abstract Schemas (AS) include DTDs, W3C XML Schema Language Schemas, RELAX NG, and more
Should be able to access their information without binding yourself too tightly to any one language

What are Abstract Schemas for?

Associating a schema with a document, or changing the current association
Using the same schema with several documents, without having to reload it
Validating documents against a schema
Validating document parts against a schema
Retrieving information from a schema (e.g. default values and attribute types)
Creating new schemas (like the DOM creates new instance documents)
Saving an in-memory schema to a file
Modifying in-memory schemas
Providing the guidance necessary so that valid instance documents can be modified and remain valid

Abstract Schema Interfaces

Abstract Schema and AS-Editing Interfaces:
- ASObject
- ASModel
- ASException
- ASExceptionCode
- ASContentModel
- ASObjectList
- ASNamedObjectMap
- ASDataType
- ASElementDecl
- ASAttributeDecl
- ASEntityDecl
- ASNotationDecl
Validation and Other Interfaces:
- DocumentAS
- DOMImplementationAS
Load and Save for Abstract Schemas:
- ASDOMBuilder
- DOMASWriter
Document-Editing Interfaces:
- NodeEditAS
- ElementEditAS
- CharacterDataEditAS
- DocumentEditAS
- AttributeEditAS

Loading an Abstract Schema from a URI into an ASModel

Check to see if the implementation supports the "LS-AS" feature, version "3.0".
Construct a DOMBuilder object
Cast the DOMBuilder to ASDOMBuilder
Call the parseASURI() method to read the schema

try {
  DOMImplementation impl 
   = DOMImplementationRegistry.getDOMImplementation("XML 1.0 LS-Load 3.0");
  if (impl.hasFeature("LS-AS", "3.0")) {
    DOMImplementationLS implls = (DOMImplementationLS) impl;
    DOMBuilder parser 
     = implls.createDOMBuilder(DOMImplementationLS.MODE_SYNCHRONOUS);
    ASDOMBuilder schemaParser = (ASDOMBuilder) parser;
    ASModel schema = schemaParser.parseASURI(
     "http://www.openhealth.org/RDDL/rddl-integration.rxg",
     "RELAX");
    // Use the schema...
  }
}
catch (Exception e) {
  //...
}

Validating a DOM Document against an ASModel

Cast a Document to DocumentAS
Add the schema to the DocumentAS with the addAS() method.
Invoke DocumentAS's validate method

    if (impl.hasFeature("AS-DOC", "3.0")) {
      Document doc = parser.parseURI("????");
      DocumentAS docWithSchema = (DocumentAS) doc;
      docWithSchema.addAS(schema);
      docWithSchema.validate();
      // Process the data...
    }
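
The abstract schema interfaces above are not part of the DOM bundled with the JDK, so they cannot be demonstrated in running code here. As a stand-in, this sketch validates an in-memory DOM Document with a different API, JAXP's javax.xml.validation package, against a W3C XML Schema. It illustrates the same load-schema, attach, validate sequence, but none of its classes are DOM3 AS classes. The DomValidation class and the tiny SONG schema are invented for the example.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class DomValidation {

  // Hypothetical schema: a SONG contains exactly one TITLE.
  static final String SONG_XSD
   = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
   + "<xs:element name='SONG'><xs:complexType><xs:sequence>"
   + "<xs:element name='TITLE' type='xs:string'/>"
   + "</xs:sequence></xs:complexType></xs:element>"
   + "</xs:schema>";

  static boolean isValid(String xml) {
    try {
      // Step 1: load the schema once; it can be reused for many documents
      Schema schema = SchemaFactory
       .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
       .newSchema(new StreamSource(new StringReader(SONG_XSD)));

      // Step 2: build the DOM tree (schema validation needs namespaces on)
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      Document doc = factory.newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)));

      // Step 3: validate the tree, not the original text
      try {
        schema.newValidator().validate(new DOMSource(doc));
        return true;
      }
      catch (SAXException e) {
        return false;  // the default error handler throws on invalidity
      }
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(isValid("<SONG><TITLE>Hot Cop</TITLE></SONG>"));
    System.out.println(isValid("<SONG><ARTIST>Village People</ARTIST></SONG>"));
  }

}
```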

Abstract Schema and AS-Editing Interfaces

DOMImplementation.hasFeature("AS-EDIT") returns true if a given DOM supports these interfaces for editing abstract schemas:
- ASObject
- ASModel
- ASException
- ASExceptionCode
- ASContentModel
- ASObjectList
- ASNamedObjectMap
- ASDataType
- ASElementDecl
- ASAttributeDecl
- ASEntityDecl
- ASNotationDecl

The ASObject Interface

The superinterface for the various kinds of declarations out of which ASModels are built

Java binding:

package org.w3c.dom.as;

public interface ASObject {

    // ASObjectType
  public static final short AS_ELEMENT_DECLARATION   = 1;
  public static final short AS_ATTRIBUTE_DECLARATION = 2;
  public static final short AS_NOTATION_DECLARATION  = 3;
  public static final short AS_ENTITY_DECLARATION    = 4;
  public static final short AS_CONTENTMODEL          = 5;
  public static final short AS_MODEL                 = 6;

  public short    getASObjectType();
  public ASModel  getOwnerASModel();
  public String   getObjectName();
  public void     setObjectName(String objectName);
  public String   getPrefix();
  public void     setPrefix(String prefix);
  public String   getLocalName();
  public void     setLocalName(String localName);
  public String   getNamespaceURI();
  public void     setNamespaceURI(String namespaceURI);
  public ASObject cloneASObject(boolean deep);

}

IDL:

interface ASObject {

  // ASObjectType
  const unsigned short AS_ELEMENT_DECLARATION   = 1;
  const unsigned short AS_ATTRIBUTE_DECLARATION = 2;
  const unsigned short AS_NOTATION_DECLARATION  = 3;
  const unsigned short AS_ENTITY_DECLARATION    = 4;
  const unsigned short AS_CONTENTMODEL          = 5;
  const unsigned short AS_MODEL                 = 6;

  readonly attribute unsigned short ASObjectType;
  readonly attribute ASModel        ownerASModel;
           attribute DOMString      objectName;
           attribute DOMString      prefix;
           attribute DOMString      localName;
           attribute DOMString      namespaceURI;
           
  ASObject cloneASObject(in boolean deep);
};

The ASModel Interface

Represents an abstract content model that could be a DTD, an XML Schema, or something else. It has both an internal and external subset.

Java binding:

package org.w3c.dom.as;

public interface ASModel extends ASObject {

  // ASMODEL_TYPES
  public static final short INTERNAL_SUBSET = 1;
  public static final short EXTERNAL_SUBSET = 2;
  public static final short NOT_USED        = 3;

  public boolean getNamespaceAware();
  public short   getUsage();

  public String           getLocation();
  public void             setLocation(String location);
  public String           getHint();
  public void             setHint(String hint);
  public boolean          getContainer();
  public ASNamedObjectMap getElementDecls();
  public ASNamedObjectMap getAttributeDecls();
  public ASNamedObjectMap getNotationDecls();
  public ASNamedObjectMap getEntityDecls();
  public ASNamedObjectMap getContentModelDecls();
  public void             addASModel(ASModel abstractSchema);
  public ASObjectList     getASModels();
  public void             removeAS(ASModel as);
  
  public boolean validate();
  public void importASObject(ASObject asobject);
  public void insertASObject(ASObject asobject);
  public ASElementDecl createASElementDecl(String namespaceURI, 
   String name) throws ASException;
  public ASAttributeDecl createASAttributeDecl(String namespaceURI, 
   String name) throws ASException;
  public ASNotationDecl createASNotationDecl(String namespaceURI, 
   String name, String systemId, String publicId) throws ASException;
  public ASEntityDecl createASEntityDecl(String name) throws ASException;
  public ASContentModel createASContentModel(String name, 
   String namespaceURI, int minOccurs, int maxOccurs, short operator)
   throws ASException;

}

IDL:

interface ASModel : ASObject {

  // ASMODEL_TYPES
  const unsigned short INTERNAL_SUBSET = 1;
  const unsigned short EXTERNAL_SUBSET = 2;
  const unsigned short NOT_USED        = 3;

  readonly attribute boolean         NamespaceAware;
  readonly attribute unsigned short  usage;
           attribute DOMString       location;
           attribute DOMString       hint;
  readonly attribute boolean         container;
  readonly attribute ASNamedObjectMap elementDecls;
  readonly attribute ASNamedObjectMap attributeDecls;
  readonly attribute ASNamedObjectMap notationDecls;
  readonly attribute ASNamedObjectMap entityDecls;
  readonly attribute ASNamedObjectMap contentModelDecls;
  void               addASModel(in ASModel abstractSchema);
  ASObjectList getASModels();
  void               removeAS(in ASModel as);
  boolean            validate();
  void               importASObject(in ASObject asobject);
  void               insertASObject(in ASObject asobject);
  ASElementDecl createASElementDecl(in DOMString namespaceURI, in DOMString name)
   raises(ASException);
  ASAttributeDecl createASAttributeDecl(in DOMString namespaceURI, in DOMString name)
   raises(ASException);
  ASNotationDecl createASNotationDecl(in DOMString namespaceURI, 
   in DOMString name, in DOMString systemId, in DOMString publicId)
   raises(ASException);
  ASEntityDecl createASEntityDecl(in DOMString name) raises(ASException);
  ASContentModel createASContentModel(in DOMString name, 
   in DOMString namespaceURI, in unsigned long minOccurs, 
   in unsigned long maxOccurs, in unsigned short operator)
   raises(ASException);
};

The ASContentModel Interface

Represents the content specification for an element:
- In a DTD:
  <!ELEMENT name (first, last)>
- In a schema this is the contents of an xsd:element element.

Java binding:

package org.w3c.dom.as;

public interface ASContentModel extends ASObject {

  public static final int AS_UNBOUNDED = Integer.MAX_VALUE;
    
    // ASContentModelType
  public static final short AS_SEQUENCE  = 0;
  public static final short AS_CHOICE    = 1;
  public static final short AS_ALL       = 2;
  public static final short AS_NONE      = 3;
  public static final short AS_UNDEFINED = 4;

  public short getListOperator();
  public void  setListOperator(short listOperator);

  public int   getMinOccurs();
  public void  setMinOccurs(int minOccurs);

  public int   getMaxOccurs();
  public void  setMaxOccurs(int maxOccurs);
 
  public ASObjectList getSubModels();
  public void         setSubModels(ASObjectList subModels);

  public void removesubModel(ASObject oldObject);

  public ASObject insertBeforeSubModel(ASObject newObject, ASObject refObject)
   throws ASException;

  public int appendsubModel(ASObject newObject) throws ASException;

}

IDL:

interface ASContentModel : ASObject {

  const unsigned long       AS_UNBOUNDED = MAX_VALUE;

  // ASContentModelType
  const unsigned short AS_SEQUENCE  = 0;
  const unsigned short AS_CHOICE    = 1;
  const unsigned short AS_ALL       = 2;
  const unsigned short AS_NONE      = 3;
  const unsigned short AS_UNDEFINED = 4;

  attribute unsigned short listOperator;
  attribute unsigned long  minOccurs;
  attribute unsigned long  maxOccurs;
  attribute ASObjectList   subModels;
           
  void          removesubModel(in ASObject oldObject);
  ASObject      insertBeforeSubModel(in ASObject newObject, in ASObject refObject)
   raises(ASException);
  unsigned long appendsubModel(in ASObject newObject) raises(ASException);
  
};
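
To make the listOperator, minOccurs and maxOccurs values concrete, here is a minimal sketch of how a DTD content group might map onto them. The ContentModelSketch class, its Particle type and the fromDtd() helper are hypothetical stand-ins written for this illustration; they are not part of org.w3c.dom.as. Only the constant names and values mirror the interface above, and the occurrence indicator is assumed to apply to the whole group.

```java
import java.util.List;

// Hypothetical stand-in for illustration; not part of org.w3c.dom.as.
final class ContentModelSketch {

  // Values mirror the ASContentModel constants shown above.
  static final int   AS_UNBOUNDED = Integer.MAX_VALUE;
  static final short AS_SEQUENCE  = 0;
  static final short AS_CHOICE    = 1;

  // One parenthesized group from a DTD content model.
  static final class Particle {
    final short listOperator;
    final int minOccurs;
    final int maxOccurs;
    final List<String> children;

    Particle(short listOperator, int minOccurs, int maxOccurs,
             List<String> children) {
      this.listOperator = listOperator;
      this.minOccurs = minOccurs;
      this.maxOccurs = maxOccurs;
      this.children = children;
    }

    @Override public String toString() {
      String op  = (listOperator == AS_CHOICE) ? "choice" : "sequence";
      String max = (maxOccurs == AS_UNBOUNDED)
          ? "unbounded" : String.valueOf(maxOccurs);
      return op + children + " min=" + minOccurs + " max=" + max;
    }
  }

  // separator is ',' or '|'; suffix is '*', '+', '?' or ' ' (exactly once).
  static Particle fromDtd(List<String> children, char separator, char suffix) {
    short op = (separator == '|') ? AS_CHOICE : AS_SEQUENCE;
    int min = (suffix == '*' || suffix == '?') ? 0 : 1;
    int max = (suffix == '*' || suffix == '+') ? AS_UNBOUNDED : 1;
    return new Particle(op, min, max, List.copyOf(children));
  }

  public static void main(String[] args) {
    // (param)*  : a sequence that may repeat zero or more times
    System.out.println(fromDtd(List.of("param"), ',', '*'));
    // (params | fault) : a choice that occurs exactly once
    System.out.println(fromDtd(List.of("params", "fault"), '|', ' '));
  }
}
```

Under these assumptions a group such as (name, value) is a single sequence with minOccurs = 1 and maxOccurs = 1; the two children are sub-models of that one group, not two occurrences of it.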

The ASObjectList Interface

An ordered list of the ASObjects in a content model

Java binding:

package org.w3c.dom.as;

public interface ASObjectList {

  public int      getLength();
  public ASObject item(int index);

}

IDL:

interface ASObjectList {

  readonly attribute unsigned long length;
  ASObject item(in unsigned long index);
  
};

The ASNamedObjectMap Interface

An unordered set of AS objects

Java binding:

package org.w3c.dom.as;

public interface ASNamedObjectMap {

  public int      getLength();
  public ASObject getNamedItem(String name);
  public ASObject item(int index);
  public ASObject removeNamedItem(String name) throws DOMException;
  public ASObject setNamedItem(ASObject newASObject)
   throws DOMException, ASException;

}

IDL:

interface ASNamedObjectMap {
  readonly attribute unsigned long   length;
  ASObject getNamedItem(in DOMString name);
  ASObject item(in unsigned long index);
  ASObject removeNamedItem(in DOMString name)
   raises(DOMException);
  ASObject setNamedItem(in ASObject newASObject)
   raises(DOMException, ASException);
};

The ASDataType Interface

Data types used in content models
Based on W3C XML Schema language types

Java binding:

package org.w3c.dom.as;

public interface ASDataType {
  public short getDataType();

    // DATA_TYPES
  public static final short STRING_DATATYPE             = 1;
  public static final short NOTATION_DATATYPE           = 10;
  public static final short ID_DATATYPE                 = 11;
  public static final short IDREF_DATATYPE              = 12;
  public static final short IDREFS_DATATYPE             = 13;
  public static final short ENTITY_DATATYPE             = 14;
  public static final short ENTITIES_DATATYPE           = 15;
  public static final short NMTOKEN_DATATYPE            = 16;
  public static final short NMTOKENS_DATATYPE           = 17;
  public static final short BOOLEAN_DATATYPE            = 100;
  public static final short FLOAT_DATATYPE              = 101;
  public static final short DOUBLE_DATATYPE             = 102;
  public static final short DECIMAL_DATATYPE            = 103;
  public static final short HEXBINARY_DATATYPE          = 104;
  public static final short BASE64BINARY_DATATYPE       = 105;
  public static final short ANYURI_DATATYPE             = 106;
  public static final short QNAME_DATATYPE              = 107;
  public static final short DURATION_DATATYPE           = 108;
  public static final short DATETIME_DATATYPE           = 109;
  public static final short DATE_DATATYPE               = 110;
  public static final short TIME_DATATYPE               = 111;
  public static final short GYEARMONTH_DATATYPE         = 112;
  public static final short GYEAR_DATATYPE              = 113;
  public static final short GMONTHDAY_DATATYPE          = 114;
  public static final short GDAY_DATATYPE               = 115;
  public static final short GMONTH_DATATYPE             = 116;
  public static final short INTEGER                     = 117;
  public static final short NAME_DATATYPE               = 200;
  public static final short NCNAME_DATATYPE             = 201;
  public static final short NORMALIZEDSTRING_DATATYPE   = 202;
  public static final short TOKEN_DATATYPE              = 203;
  public static final short LANGUAGE_DATATYPE           = 204;
  public static final short NONPOSITIVEINTEGER_DATATYPE = 205;
  public static final short NEGATIVEINTEGER_DATATYPE    = 206;
  public static final short LONG_DATATYPE               = 207;
  public static final short INT_DATATYPE                = 208;
  public static final short SHORT_DATATYPE              = 209;
  public static final short BYTE_DATATYPE               = 210;
  public static final short NONNEGATIVEINTEGER_DATATYPE = 211;
  public static final short UNSIGNEDLONG_DATATYPE       = 212;
  public static final short UNSIGNEDINT_DATATYPE        = 213;
  public static final short UNSIGNEDSHORT_DATATYPE      = 214;
  public static final short UNSIGNEDBYTE_DATATYPE       = 215;
  public static final short POSITIVEINTEGER_DATATYPE    = 216;
  public static final short OTHER_SIMPLE_DATATYPE       = 1000;
  public static final short COMPLEX_DATATYPE            = 1001;

}

IDL:

interface ASDataType {
  readonly attribute unsigned short  dataType;

  // DATA_TYPES
  const unsigned short STRING_DATATYPE             = 1;
  const unsigned short NOTATION_DATATYPE           = 10;
  const unsigned short ID_DATATYPE                 = 11;
  const unsigned short IDREF_DATATYPE              = 12;
  const unsigned short IDREFS_DATATYPE             = 13;
  const unsigned short ENTITY_DATATYPE             = 14;
  const unsigned short ENTITIES_DATATYPE           = 15;
  const unsigned short NMTOKEN_DATATYPE            = 16;
  const unsigned short NMTOKENS_DATATYPE           = 17;
  const unsigned short BOOLEAN_DATATYPE            = 100;
  const unsigned short FLOAT_DATATYPE              = 101;
  const unsigned short DOUBLE_DATATYPE             = 102;
  const unsigned short DECIMAL_DATATYPE            = 103;
  const unsigned short HEXBINARY_DATATYPE          = 104;
  const unsigned short BASE64BINARY_DATATYPE       = 105;
  const unsigned short ANYURI_DATATYPE             = 106;
  const unsigned short QNAME_DATATYPE              = 107;
  const unsigned short DURATION_DATATYPE           = 108;
  const unsigned short DATETIME_DATATYPE           = 109;
  const unsigned short DATE_DATATYPE               = 110;
  const unsigned short TIME_DATATYPE               = 111;
  const unsigned short GYEARMONTH_DATATYPE         = 112;
  const unsigned short GYEAR_DATATYPE              = 113;
  const unsigned short GMONTHDAY_DATATYPE          = 114;
  const unsigned short GDAY_DATATYPE               = 115;
  const unsigned short GMONTH_DATATYPE             = 116;
  const unsigned short INTEGER                     = 117;
  const unsigned short NAME_DATATYPE               = 200;
  const unsigned short NCNAME_DATATYPE             = 201;
  const unsigned short NORMALIZEDSTRING_DATATYPE   = 202;
  const unsigned short TOKEN_DATATYPE              = 203;
  const unsigned short LANGUAGE_DATATYPE           = 204;
  const unsigned short NONPOSITIVEINTEGER_DATATYPE = 205;
  const unsigned short NEGATIVEINTEGER_DATATYPE    = 206;
  const unsigned short LONG_DATATYPE               = 207;
  const unsigned short INT_DATATYPE                = 208;
  const unsigned short SHORT_DATATYPE              = 209;
  const unsigned short BYTE_DATATYPE               = 210;
  const unsigned short NONNEGATIVEINTEGER_DATATYPE = 211;
  const unsigned short UNSIGNEDLONG_DATATYPE       = 212;
  const unsigned short UNSIGNEDINT_DATATYPE        = 213;
  const unsigned short UNSIGNEDSHORT_DATATYPE      = 214;
  const unsigned short UNSIGNEDBYTE_DATATYPE       = 215;
  const unsigned short POSITIVEINTEGER_DATATYPE    = 216;
  const unsigned short OTHER_SIMPLE_DATATYPE       = 1000;
  const unsigned short COMPLEX_DATATYPE            = 1001;
  
};

The ASElementDecl Interface

Represents a declaration of an element such as <!ELEMENT TIME (#PCDATA)> or an xsd:element schema element

Java binding:

package org.w3c.dom.as;

public interface ASElementDecl extends ASObject {

    // CONTENT_MODEL_TYPES
  public static final short EMPTY_CONTENTTYPE    = 1;
  public static final short ANY_CONTENTTYPE      = 2;
  public static final short MIXED_CONTENTTYPE    = 3;
  public static final short ELEMENTS_CONTENTTYPE = 4;

  public boolean    getStrictMixedContent();
  public void       setStrictMixedContent(boolean strictMixedContent);

  public ASDataType getElementType();
  public void       setElementType(ASDataType elementType);

  public boolean    getIsPCDataOnly();
  public void       setIsPCDataOnly(boolean isPCDataOnly);

  public short      getContentType();
  public void       setContentType(short contentType);

  public ASContentModel getASContentModel();
  public void setASContentModel(ASContentModel ASContentModel);

  public ASNamedObjectMap getASAttributeDecls();
  public void setASAttributeDecls(ASNamedObjectMap ASAttributeDecls);

  public void addASAttributeDecl(ASAttributeDecl attributeDecl);

  public ASAttributeDecl removeASAttributeDecl(ASAttributeDecl attributeDecl);

}

IDL:

interface ASElementDecl : ASObject {

  // CONTENT_MODEL_TYPES
  const unsigned short EMPTY_CONTENTTYPE    = 1;
  const unsigned short ANY_CONTENTTYPE      = 2;
  const unsigned short MIXED_CONTENTTYPE    = 3;
  const unsigned short ELEMENTS_CONTENTTYPE = 4;

  attribute boolean          strictMixedContent;
  attribute ASDataType       elementType;
  attribute boolean          isPCDataOnly;
  attribute unsigned short   contentType;
  attribute ASContentModel   ASContentModel;
  attribute ASNamedObjectMap ASAttributeDecls;
           
  void            addASAttributeDecl(in ASAttributeDecl attributeDecl);
  ASAttributeDecl removeASAttributeDecl(in ASAttributeDecl attributeDecl);
};

The ASAttributeDecl Interface

Represents a declaration of an attribute; e.g. an xsd:attribute schema element or
<!ATTLIST TIME HOURS CDATA #IMPLIED>

Java binding:

package org.w3c.dom.as;

public interface ASAttributeDecl extends ASObject {

  public static final short NONE     = 0;
  public static final short DEFAULT  = 1;
  public static final short FIXED    = 2;
  public static final short REQUIRED = 3;

  public ASDataType getDataType();
  public void setDataType(ASDataType DataType);

  public String getDataValue();
  public void   setDataValue(String DataValue);

  public String getEnumAttr();
  public void   setEnumAttr(String enumAttr);

  public ASObjectList getOwnerElements();
  public void         setOwnerElements(ASObjectList ownerElements);

  public short getDefaultType();
  public void  setDefaultType(short defaultType);

}

IDL:

interface ASAttributeDecl : ASObject {

  // VALUE_TYPES
  const unsigned short NONE     = 0;
  const unsigned short DEFAULT  = 1;
  const unsigned short FIXED    = 2;
  const unsigned short REQUIRED = 3;

  attribute ASDataType     DataType;
  attribute DOMString      DataValue;
  attribute DOMString      enumAttr;
  attribute ASObjectList   ownerElements;
  attribute unsigned short defaultType;
};

The ASEntityDecl Interface

Represents a declaration of a general entity; e.g.
<!ENTITY COPY01 "Copyright 2001 Elliotte Harold">

Java binding:

package org.w3c.dom.as;

public interface ASEntityDecl extends ASObject {
    // EntityType
  public static final short INTERNAL_ENTITY = 1;
  public static final short EXTERNAL_ENTITY = 2;

  public short  getEntityType();
  public void   setEntityType(short entityType);

  public String getEntityValue();
  public void   setEntityValue(String entityValue);

  public String getSystemId();
  public void   setSystemId(String systemId);

  public String getPublicId();
  public void   setPublicId(String publicId);

}

IDL:

interface ASEntityDecl : ASObject {

  // EntityType
  const unsigned short INTERNAL_ENTITY = 1;
  const unsigned short EXTERNAL_ENTITY = 2;

  attribute unsigned short entityType;
  attribute DOMString      entityValue;
  attribute DOMString      systemId;
  attribute DOMString      publicId;
};

The ASNotationDecl Interface

Represents a declaration of a notation; e.g.
<!NOTATION TXT SYSTEM "text/plain">

Java binding:

package org.w3c.dom.as;

public interface ASNotationDecl extends ASObject {

  public String getSystemId();
  public void   setSystemId(String systemId);

  public String getPublicId();
  public void   setPublicId(String publicId);

}

IDL:

interface ASNotationDecl : ASObject {

  attribute DOMString systemId;
  attribute DOMString publicId;
};

Validation Interfaces:

DocumentAS
DOMImplementationAS

The DocumentAS Interface

Extends the Document interface with additional methods for document editing, abstract schema editing, and validation.

Java binding:

package org.w3c.dom.as;

public interface DocumentAS extends Document {

  public ASModel getActiveASModel();
  public void    setActiveASModel(ASModel activeASModel);

  public ASObjectList getBoundASModels();
  public void         setBoundASModels(ASObjectList boundASModels);

  public ASModel getInternalAS();
  public void    setInternalAS(ASModel as) throws DOMException;
  public void    addAS(ASModel as);
  public void    removeAS(ASModel as);
    
  public ASElementDecl getElementDecl() throws DOMException;
    
  public void validate() throws ASException;

}

IDL:

interface DocumentAS : Document {

  attribute ASModel activeASModel;
  attribute ASObjectList boundASModels;
  
  ASModel       getInternalAS();
  void          setInternalAS(in ASModel as) raises(DOMException);
  void          addAS(in ASModel as);
  void          removeAS(in ASModel as);
  ASElementDecl getElementDecl() raises(DOMException);
  void          validate() raises(ASException);
};

Validating a document in-memory

Call hasFeature("????", "3.0") to verify that this is supported
Load the document in the usual way
Load the ASModel
Cast the Document to a DocumentAS
Attach the ASModel to the DocumentAS using the setActiveASModel() method
Invoke the DocumentAS's validate() method
If the Document is not valid, then an ASException is thrown with the code VALIDATION_ERR
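
The org.w3c.dom.as interfaces shown here are draft interfaces and may not be on your classpath. As a runnable stand-in, the sketch below swaps in a different API, the standard JAXP validation package (javax.xml.validation), to show the same flow: load the document, load the schema, then validate the in-memory tree. The InMemoryValidationSketch class, its one-element schema and the sample documents are assumptions made for this illustration, and a SAXException plays the role that ASException plays above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Swapped-in technique: JAXP validation, not the DocumentAS interface.
final class InMemoryValidationSketch {

  // Illustrative schema only: a single methodName element holding text.
  static final String XSD =
      "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
    + "<xsd:element name='methodName' type='xsd:string'/>"
    + "</xsd:schema>";

  static boolean isValid(String xml) {
    try {
      // Load the document in the usual way.
      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
      dbf.setNamespaceAware(true);
      Document doc = dbf.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));

      // Load the schema (the analogue of loading the ASModel).
      SchemaFactory sf =
          SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
      Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));

      return validates(schema, doc);
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException("could not load document or schema", e);
    }
  }

  // Validates the in-memory tree; a validity error surfaces as SAXException.
  static boolean validates(Schema schema, Document doc) throws IOException {
    try {
      schema.newValidator().validate(new DOMSource(doc));
      return true;
    } catch (SAXException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isValid("<methodName>getQuote</methodName>"));
    System.out.println(isValid("<methodCall/>"));
  }
}
```

The design point carried over from the steps above is that validation runs against the tree already in memory, so the document does not need to be serialized and reparsed.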

The DOMImplementationAS Interface

Extends the DOM2 DOMImplementation interface with factory methods to create schema documents

Java binding:

package org.w3c.dom.as;

import org.w3c.dom.DOMImplementation;

public interface DOMImplementationAS extends DOMImplementation {

  public boolean getContainer();

  public String  getSchemaType();
  public void    setSchemaType(String schemaType);

  public ASModel createAS(boolean NamespaceAware, String schemaType);

}

IDL:

interface DOMImplementationAS : DOMImplementation {
  readonly attribute boolean         container;
           attribute DOMString       schemaType;
           
  ASModel createAS(in boolean NamespaceAware, in DOMString schemaType);
};

Creating a schema in-memory

Call hasFeature("AS-EDIT", "3.0") to verify that this is supported
Load a DOMImplementation in the usual way
Cast DOMImplementation to DOMImplementationAS
Invoke the createAS() method to create a new, implementation-specific ASModel object
Use the factory methods in this ASModel to create the schema

A DTD for XML-RPC

<!ELEMENT methodCall (methodName, params)>
<!ELEMENT methodName (#PCDATA)>
<!ELEMENT params     (param*)>
<!ELEMENT param      (value)>
<!ELEMENT value      
   (i4|int|string|dateTime.iso8601|double|base64|struct|array)>
<!ELEMENT i4               (#PCDATA)>
<!ELEMENT int              (#PCDATA)>
<!ELEMENT string           (#PCDATA)>
<!ELEMENT dateTime.iso8601 (#PCDATA)>
<!ELEMENT double           (#PCDATA)>
<!ELEMENT base64           (#PCDATA)>

<!ELEMENT array            (data)>
<!ELEMENT data             (value*)>
<!ELEMENT struct           (member+)>
<!ELEMENT member           (name, value)>
<!ELEMENT name             (#PCDATA)>

<!ELEMENT methodResponse   (params | fault)>
<!ELEMENT fault            (value)>

DOM code to create a DTD schema for XML-RPC

try {
  // Obtain the implementation first so that hasFeature() can be called on it
  DOMImplementation impl = DOMImplementationFactory.getDOMImplementation();
  if (impl.hasFeature("AS-EDIT", "3.0")) {
    DOMImplementationAS asImpl = (DOMImplementationAS) impl;
    ASModel dtd = asImpl.createAS(false, "DTD");

    // <!ELEMENT methodCall (methodName, params)>
    ASElementDecl  methodCall = dtd.createASElementDecl(null, "methodCall");
    ASContentModel methodCallModel = dtd.createASContentModel(
     "methodCall", null, 1, 1, ASContentModel.AS_SEQUENCE);
    methodCall.setASContentModel(methodCallModel);
    methodCall.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
    
    // <!ELEMENT methodName (#PCDATA)>
    ASElementDecl methodName = dtd.createASElementDecl(null, "methodName");
    methodName.setIsPCDataOnly(true);
    
    // <!ELEMENT params (param*)>
    ASElementDecl  params = dtd.createASElementDecl(null, "params");
    ASContentModel paramsModel = dtd.createASContentModel(
     "params", null, 0, ASContentModel.AS_UNBOUNDED, ASContentModel.AS_SEQUENCE);
    params.setASContentModel(paramsModel);
    params.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
    
    // <!ELEMENT param (value)>
    ASElementDecl  param = dtd.createASElementDecl(null, "param");
    ASContentModel paramModel = dtd.createASContentModel(
     "param", null, 1, 1, ASContentModel.AS_SEQUENCE);
    param.setASContentModel(paramModel);
    param.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
    
    // <!ELEMENT value (i4|int|string|dateTime.iso8601|double|base64|struct|array)>
    ASElementDecl  value = dtd.createASElementDecl(null, "value");
    ASContentModel valueModel = dtd.createASContentModel(
     "value", null, 1, 1, ASContentModel.AS_CHOICE);
    value.setASContentModel(valueModel);
    value.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
    
    // <!ELEMENT i4               (#PCDATA)>
    // <!ELEMENT int              (#PCDATA)>
    // <!ELEMENT string           (#PCDATA)>
    // <!ELEMENT dateTime.iso8601 (#PCDATA)>
    // <!ELEMENT double           (#PCDATA)>
    // <!ELEMENT base64           (#PCDATA)>
    // int and double are Java keywords, and dateTime.iso8601 is not a legal
    // Java identifier, so those three variables are renamed
    ASElementDecl i4 = dtd.createASElementDecl(null, "i4");
    i4.setIsPCDataOnly(true);
    ASElementDecl intElement = dtd.createASElementDecl(null, "int");
    intElement.setIsPCDataOnly(true);
    ASElementDecl string = dtd.createASElementDecl(null, "string");
    string.setIsPCDataOnly(true);
    ASElementDecl dateTime = dtd.createASElementDecl(null, "dateTime.iso8601");
    dateTime.setIsPCDataOnly(true);
    ASElementDecl doubleElement = dtd.createASElementDecl(null, "double");
    doubleElement.setIsPCDataOnly(true);
    ASElementDecl base64 = dtd.createASElementDecl(null, "base64");
    base64.setIsPCDataOnly(true);
    
    // <!ELEMENT array (data)>
    ASElementDecl  array = dtd.createASElementDecl(null, "array");
    ASContentModel arrayModel = dtd.createASContentModel(
     "array", null, 1, 1, ASContentModel.AS_SEQUENCE);
    array.setASContentModel(arrayModel);
    array.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);

    // <!ELEMENT data (value*)>
    ASElementDecl  data = dtd.createASElementDecl(null, "data");
    ASContentModel dataModel = dtd.createASContentModel(
     "data", null, 0, ASContentModel.AS_UNBOUNDED, ASContentModel.AS_SEQUENCE);
    data.setASContentModel(dataModel);
    data.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);

    // <!ELEMENT struct (member+)>
    ASElementDecl  struct = dtd.createASElementDecl(null, "struct");
    ASContentModel structModel = dtd.createASContentModel(
     "struct", null, 1, ASContentModel.AS_UNBOUNDED, ASContentModel.AS_SEQUENCE);
    struct.setASContentModel(structModel);
    struct.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);

    // <!ELEMENT member (name, value)>
    // The sequence itself occurs exactly once; it has two sub-models
    ASElementDecl  member = dtd.createASElementDecl(null, "member");
    ASContentModel memberModel = dtd.createASContentModel(
     "member", null, 1, 1, ASContentModel.AS_SEQUENCE);
    member.setASContentModel(memberModel);
    member.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
    
    // <!ELEMENT name (#PCDATA)>
    ASElementDecl name = dtd.createASElementDecl(null, "name");
    name.setIsPCDataOnly(true);

    // <!ELEMENT methodResponse (params | fault)>
    ASElementDecl  methodResponse = dtd.createASElementDecl(null, "methodResponse");
    ASContentModel methodResponseModel = dtd.createASContentModel(
     "methodResponse", null, 1, 1, ASContentModel.AS_CHOICE);
    methodResponse.setASContentModel(methodResponseModel);
    methodResponse.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
    
    // <!ELEMENT fault (value)>
    ASElementDecl  fault = dtd.createASElementDecl(null, "fault");
    ASContentModel faultModel = dtd.createASContentModel(
     "fault", null, 1, 1, ASContentModel.AS_SEQUENCE);
    fault.setASContentModel(faultModel);
    fault.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
    
    // Fill each content model with the element declarations it contains
    methodCallModel.appendsubModel(methodName);
    methodCallModel.appendsubModel(params);
    paramsModel.appendsubModel(param);
    paramModel.appendsubModel(value);
    valueModel.appendsubModel(i4);
    valueModel.appendsubModel(intElement);
    valueModel.appendsubModel(string);
    valueModel.appendsubModel(dateTime);
    valueModel.appendsubModel(doubleElement);
    valueModel.appendsubModel(base64);
    valueModel.appendsubModel(struct);
    valueModel.appendsubModel(array);
    arrayModel.appendsubModel(data);
    dataModel.appendsubModel(value);
    structModel.appendsubModel(member);
    memberModel.appendsubModel(name);
    memberModel.appendsubModel(value);
    methodResponseModel.appendsubModel(params);
    methodResponseModel.appendsubModel(fault);
    faultModel.appendsubModel(value);

  }
}
catch (ASException e) {
  System.err.println(e);  
}

A W3C XML Schema Language schema for XML-RPC

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- The only two possible root elements are methodResponse and
       methodCall so these are the only two I use a top-level
       declaration for. --> 

  <xsd:element name="methodCall">
    <xsd:complexType>
      <xsd:all>
        <xsd:element name="methodName">
          <xsd:simpleType>
            <xsd:restriction base="ASCIIString">
              <xsd:pattern value="([A-Za-z0-9]|/|\.|:|_)*" />
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
        <xsd:element name="params" minOccurs="0" maxOccurs="1">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="param"  type="ParamType" 
                           minOccurs="0" maxOccurs="unbounded"/>
            </xsd:sequence>
          </xsd:complexType>
         </xsd:element>
      </xsd:all>
    </xsd:complexType>  
  </xsd:element>

  <xsd:element name="methodResponse">
    <xsd:complexType>
      <xsd:choice>
        <xsd:element name="params">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="param" type="ParamType"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="fault">
          <!-- What can appear inside a fault is very restricted -->
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="value">
                <xsd:complexType>
                  <xsd:sequence>
                    <xsd:element name="struct"> 
                      <xsd:complexType> 
                        <xsd:sequence> 
                          <xsd:element name="member" 
                                       type="MemberType">
                          </xsd:element>
                          <xsd:element name="member" 
                                       type="MemberType">
                          </xsd:element>
                        </xsd:sequence>
                      </xsd:complexType>
                    </xsd:element>
                  </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
         </xsd:element>
      </xsd:choice>
    </xsd:complexType>  
  </xsd:element>

  <xsd:complexType name="ParamType">
    <xsd:sequence>
      <xsd:element name="value" type="ValueType"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="ValueType" mixed="true">
    <!-- I need to figure out how to say that this
         is either a simple xsd:string type or that 
         it contains one of these elements; but that otherwise
         it does not have mixed content -->
    <xsd:choice>
      <xsd:element name="i4"            type="xsd:int"/>
      <xsd:element name="int"           type="xsd:int"/>
      <xsd:element name="string"        type="ASCIIString"/>
      <xsd:element name="double"        type="xsd:decimal"/>
      <xsd:element name="base64"        type="xsd:base64Binary"/>
      <xsd:element name="boolean"       type="NumericBoolean"/>
      <xsd:element name="dateTime.iso8601" type="xsd:dateTime"/>
      <xsd:element name="array"         type="ArrayType"/>
      <xsd:element name="struct"        type="StructType"/>
    </xsd:choice>
  </xsd:complexType>

  <xsd:complexType name="StructType">
    <xsd:sequence>
      <xsd:element name="member" type="MemberType" 
                   maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="MemberType">
    <xsd:sequence>
      <xsd:element name="name"  type="xsd:string" />
      <xsd:element name="value" type="ValueType"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="ArrayType">
    <xsd:sequence>
      <xsd:element name="data">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="value"  type="ValueType" 
                         minOccurs="0" maxOccurs="unbounded"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:simpleType name="ASCIIString">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="([ -~]|\n|\r|\t)*" />
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:simpleType name="NumericBoolean">
    <xsd:restriction base="xsd:boolean">
      <xsd:pattern value="0|1" />
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>

DOM code to create an abstract schema for XML-RPC

Serializing an abstract schema to a file

Schema-guided Document-Editing Interfaces:

Allows you to determine whether or not it's valid to add or delete a node at a particular position in a document. This is called guided document editing.
DOMImplementation.hasFeature("AS-DOC") returns true if a given DOM supports these capabilities.
- NodeEditAS
- ElementEditAS
- CharacterDataEditAS
- DocumentEditAS
- AttributeEditAS

The NodeEditAS Interface

Extends the Node interface with methods for guided document editing.

Java binding:

package org.w3c.dom.as;

import org.w3c.dom.Node;

public interface NodeEditAS extends Node {

    // ASCheckType
  public static final short WF_CHECK               = 1;
  public static final short NS_WF_CHECK            = 2;
  public static final short PARTIAL_VALIDITY_CHECK = 3;
  public static final short STRICT_VALIDITY_CHECK  = 4;

  public boolean canInsertBefore(Node newChild, Node refChild);
  public boolean canRemoveChild(Node oldChild);
  public boolean canReplaceChild(Node newChild, Node oldChild);
  public boolean canAppendChild(Node newChild);
  public boolean isNodeValid(boolean deep, short wFValidityCheckLevel)
     throws ASException;

}

IDL:

interface NodeEditAS : Node {

  // ASCheckType
  const unsigned short WF_CHECK               = 1;
  const unsigned short NS_WF_CHECK            = 2;
  const unsigned short PARTIAL_VALIDITY_CHECK = 3;
  const unsigned short STRICT_VALIDITY_CHECK  = 4;

  boolean canInsertBefore(in Node newChild, in Node refChild);
  boolean canRemoveChild(in Node oldChild);
  boolean canReplaceChild(in Node newChild, in Node oldChild);
  boolean canAppendChild(in Node newChild);
  boolean isNodeValid(in boolean deep, in unsigned short wFValidityCheckLevel)
   raises(ASException);
   
};
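
The guided-editing idea (ask whether an edit is allowed before making it) can be sketched with the standard DOM alone. The GuidedEditSketch class below is a hypothetical stand-in written for this illustration, not an implementation of NodeEditAS: its canAppendChild() consults a hard-coded table drawn from the XML-RPC DTD shown earlier and checks element names only, ignoring order and occurrence counts, whereas a real implementation would consult the active abstract schema.

```java
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical stand-in for illustration; not an implementation of NodeEditAS.
final class GuidedEditSketch {

  // Allowed child element names per parent, from the XML-RPC DTD.
  static final Map<String, Set<String>> ALLOWED = Map.of(
      "methodCall", Set.of("methodName", "params"),
      "params",     Set.of("param"),
      "param",      Set.of("value"));

  // Simplified analogue of canAppendChild(): a name check only.
  static boolean canAppendChild(Element parent, Element child) {
    return ALLOWED.getOrDefault(parent.getTagName(), Set.of())
                  .contains(child.getTagName());
  }

  // Appends the child only if the check passes; reports whether it did.
  static boolean appendIfAllowed(Element parent, Element child) {
    if (!canAppendChild(parent, child)) {
      return false;
    }
    parent.appendChild(child);
    return true;
  }

  // Creates an empty document, wrapping the checked configuration exception.
  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Document doc = newDocument();
    Element call = doc.createElement("methodCall");
    doc.appendChild(call);
    System.out.println(appendIfAllowed(call, doc.createElement("params")));
    System.out.println(appendIfAllowed(call, doc.createElement("fault")));
  }
}
```

An editor built this way never puts the tree into an invalid state, which is the point of asking first instead of validating afterwards.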

The ElementEditAS Interface

Extends the DOM NodeEditAS interface with methods for determining the legal attributes and children of an element.

Java binding:

package org.w3c.dom.as;

public interface ElementEditAS extends NodeEditAS {
  public NodeList getDefinedElementTypes();

  public short contentType();

  public boolean canSetAttribute(String name, String value);
  public boolean canSetAttributeNode(Attr attrNode);
  public boolean canSetAttributeNS(String name, String value, String namespaceURI);
  public boolean canRemoveAttribute(String name);
  public boolean canRemoveAttributeNS(String name, String namespaceURI);
  public boolean canRemoveAttributeNode(Node attrNode);

  public NodeList getChildElements();
  public NodeList getParentElements();
  public NodeList getAttributeList();
  public boolean  isElementDefined(String elemTypeName);
  public boolean  isElementDefinedNS(String elemTypeName, 
     String namespaceURI, String name);

}

IDL:

interface ElementEditAS : NodeEditAS {

  readonly attribute NodeList definedElementTypes;
  
  unsigned short contentType();
  
  boolean canSetAttribute(in DOMString attrname, in DOMString attrval);
  boolean canSetAttributeNode(in Attr attrNode);
  boolean canSetAttributeNS(in DOMString name, in DOMString attrval, in DOMString namespaceURI);
  boolean canRemoveAttribute(in DOMString attrname);
  boolean canRemoveAttributeNS(in DOMString attrname, in DOMString namespaceURI);
  boolean canRemoveAttributeNode(in Node attrNode);
  
  NodeList getChildElements();
  NodeList getParentElements();
  NodeList getAttributeList();
  boolean  isElementDefined(in DOMString elemTypeName);
  boolean  isElementDefinedNS(in DOMString elemTypeName, in DOMString namespaceURI, in DOMString name);
};
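
To make the attribute checks concrete, here is a minimal sketch that mimics the shape of canSetAttribute(name, value). The AttributeGuard class, its declare() method, and the regular-expression rules are all hypothetical; a real ElementEditAS would take its answers from the abstract schema, not from a hand-built map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class AttributeGuard {

  // Hypothetical declaration table: attribute name -> pattern that a
  // legal value must match in full. Not part of the DOM draft.
  private final Map<String, Pattern> declared = new LinkedHashMap<>();

  public AttributeGuard declare(String name, String regex) {
    declared.put(name, Pattern.compile(regex));
    return this;
  }

  // Same shape as ElementEditAS.canSetAttribute(name, value):
  // undeclared names and non-matching values are both refused.
  public boolean canSetAttribute(String name, String value) {
    Pattern p = declared.get(name);
    return p != null && p.matcher(value).matches();
  }

  // Names of the proposed attributes that would be refused,
  // in the iteration order of the proposed map.
  public List<String> rejected(Map<String, String> proposed) {
    List<String> bad = new ArrayList<>();
    for (Map.Entry<String, String> e : proposed.entrySet()) {
      if (!canSetAttribute(e.getKey(), e.getValue())) bad.add(e.getKey());
    }
    return bad;
  }

  public static void main(String[] args) {
    // Toy rules (assumption) for the PHOTO element of the song example.
    AttributeGuard photo = new AttributeGuard()
        .declare("WIDTH", "[0-9]+")
        .declare("ALT", ".*");
    System.out.println(photo.canSetAttribute("WIDTH", "100"));  // true
    System.out.println(photo.canSetAttribute("WIDTH", "wide")); // false
    System.out.println(photo.canSetAttribute("COLOR", "red"));  // false
  }
}
```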

The CharacterDataEditAS Interface

Extends the NodeEditAS interface with methods to determine whether or not certain text can be added at a particular place.

Java binding:

package org.w3c.dom.as;

public interface CharacterDataEditAS extends NodeEditAS {

  public boolean getIsWhitespaceOnly();

  public boolean canSetData(int offset, int count);
  public boolean canAppendData(String arg);
  public boolean canReplaceData(int offset, int count, String data);
  public boolean canInsertData(int offset, String data);
  public boolean canDeleteData(int offset, int count);

}

IDL:

interface CharacterDataEditAS : NodeEditAS {

  readonly attribute boolean isWhitespaceOnly;
  
  boolean canSetData(in unsigned long offset, in unsigned long count);
  boolean canAppendData(in DOMString arg);
  boolean canReplaceData(in unsigned long offset, in unsigned long count, in DOMString arg);
  boolean canInsertData(in unsigned long offset, in DOMString arg);
  boolean canDeleteData(in unsigned long offset, in unsigned long count);
  
};
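
One plausible way to answer canInsertData() and canDeleteData() is to compute the text the edit would produce and test that, leaving the node untouched. The sketch below does this on plain strings. TextGuard and its Predicate-based rule are hypothetical, and the four-digit YEAR rule is an assumption made for illustration.

```java
import java.util.function.Predicate;

public class TextGuard {

  // Hypothetical validity rule for the text; a real CharacterDataEditAS
  // would derive this from the abstract schema.
  private final Predicate<String> valid;

  public TextGuard(Predicate<String> valid) {
    this.valid = valid;
  }

  // Would inserting data at offset leave valid text?
  public boolean canInsertData(String current, int offset, String data) {
    if (offset < 0 || offset > current.length()) return false;
    String result = current.substring(0, offset) + data + current.substring(offset);
    return valid.test(result);
  }

  // Would deleting count characters starting at offset leave valid text?
  // As with DOM deleteData(), a count that runs past the end is clipped.
  public boolean canDeleteData(String current, int offset, int count) {
    if (offset < 0 || count < 0 || offset > current.length()) return false;
    int end = (count > current.length() - offset) ? current.length() : offset + count;
    String result = current.substring(0, offset) + current.substring(end);
    return valid.test(result);
  }

  public static void main(String[] args) {
    // Toy rule (assumption): YEAR content is exactly four digits.
    TextGuard year = new TextGuard(s -> s.matches("[0-9]{4}"));
    System.out.println(year.canInsertData("197", 3, "8"));  // true:  "1978"
    System.out.println(year.canInsertData("1978", 4, "!")); // false: "1978!"
    System.out.println(year.canDeleteData("19788", 4, 1));  // true:  "1978"
    System.out.println(year.canDeleteData("1978", 0, 1));   // false: "978"
  }
}
```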

The DocumentEditAS Interface

Extends the NodeEditAS interface with methods to turn continuous validity checking on or off.

Java binding:

package org.w3c.dom.as;

public interface DocumentEditAS extends NodeEditAS {

  public boolean getContinuousValidityChecking();
  public void    setContinuousValidityChecking(boolean continuousValidityChecking);

}

IDL:

interface DocumentEditAS : NodeEditAS {
           attribute boolean         continuousValidityChecking;
};
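
The difference the flag makes can be sketched with a toy model. Everything here is an assumption: ToyDocument is hypothetical, IllegalStateException stands in for ASException, and rolling back the offending edit is just one behavior an implementation might choose when continuous checking is on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class ToyDocument {

  private final List<String> children = new ArrayList<>();
  private final Predicate<List<String>> valid; // hypothetical validity rule
  private boolean continuousValidityChecking = false;

  public ToyDocument(Predicate<List<String>> valid) {
    this.valid = valid;
  }

  public boolean getContinuousValidityChecking() {
    return continuousValidityChecking;
  }

  public void setContinuousValidityChecking(boolean on) {
    continuousValidityChecking = on;
  }

  // With checking on, an edit that breaks validity is undone and reported
  // at once; with checking off, it is accepted and found only later.
  public void append(String name) {
    children.add(name);
    if (continuousValidityChecking && !valid.test(children)) {
      children.remove(children.size() - 1);
      throw new IllegalStateException("Invalid after appending " + name);
    }
  }

  public boolean isValid() {
    return valid.test(children);
  }

  public int size() {
    return children.size();
  }

  public static void main(String[] args) {
    // Toy rule (assumption): at most one child is allowed.
    ToyDocument lax = new ToyDocument(names -> names.size() <= 1);
    lax.append("TITLE");
    lax.append("TITLE");               // accepted silently
    System.out.println(lax.isValid()); // false

    ToyDocument strict = new ToyDocument(names -> names.size() <= 1);
    strict.setContinuousValidityChecking(true);
    strict.append("TITLE");
    try {
      strict.append("TITLE");
    } catch (IllegalStateException ex) {
      System.out.println("rejected");  // rejected
    }
  }
}
```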

To Learn More

Document Object Model (DOM) Level 3 Abstract Schemas and Load and Save Specification: http://www.w3.org/TR/DOM-Level-3-ASLS
Document Object Model (DOM) Level 3 Core Specification Version 1.0: http://www.w3.org/TR/DOM-Level-3-Core
Document Object Model (DOM) Requirements: http://www.w3.org/TR/DOM-Requirements/
Document Object Model (DOM) Level 3 Views and Formatting Specification: http://www.w3.org/TR/DOM-Level-3-Views/
Document Object Model (DOM) Level 3 XPath Specification: http://www.w3.org/TR/DOM-Level-3-XPath/
Document Object Model (DOM) Level 3 Events Specification: http://www.w3.org/TR/DOM-Level-3-Events/

To Learn More

This presentation: http://www.cafeconleche.org/slides/xmlone/london2002/advancedxml/
XML Infoset Specification: http://www.w3.org/TR/xml-infoset
Processing XML with Java: http://www.cafeconleche.org/books/xmljava/
