Advanced XML

Elliotte Rusty Harold

Software Development 2001 West

Monday, April 9, 2001

elharo@metalab.unc.edu

http://www.ibiblio.org/xml/

Outline

Part I: XML Infoset, Canonical XML, and Digital Signatures
Part II: Schemas
Part III: XSLT 1.1 and Beyond
Part IV: DOM Level 3
Part V: JDOM
Part VI: XML Hypertext
Part VII: The Oracle Speaks

Part I: XML Infoset

The Infoset is the unfortunate standard to which those in retreat from the radical and most useful implications of well-formedness have rallied. At its core the Infoset insists that there is 'more' to XML than the straightforward syntax of well-formedness. By imposing its canonical semantics the Infoset obviates the infinite other semantic outcomes which might be elaborated in particular unique circumstances from an instance of well-formed XML 1.0 syntax. The question we should be asking is not whether the Infoset has chosen the correct canonical semantics, but whether the syntactic possibilities of XML 1.0 should be curtailed in this way at all.

--Walter Perry on the xml-dev mailing list

A normal XML document

<?xml version="1.0"?>
<SONG xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="song.xsd">
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>  
</SONG>

A canonical XML document

<?xml-stylesheet type="text/css" href="song.css"?>
<SONG xmlns="http://www.ibiblio.org/xml/namespace/song" xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" HEIGHT="200" WIDTH="100" xlink:href="hotcop.jpg" xlink:show="onLoad" xlink:type="simple"></PHOTO>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  
  <PUBLISHER xlink:href="http://www.amrecords.com/" xlink:type="simple">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>

An org.w3c.dom.Document object formed by reading hotcop.xml

import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMHotCop {

  public static void main(String[] args) {

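    // Xerces-specific DOM parser
    DOMParser parser = new DOMParser();

    try {
      // Read the document from the network and parse it
      parser.parse("http://www.ibiblio.org/xml/examples/hot_cop.xml");
      // The in-memory object form of the document
      Document d = parser.getDocument();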
    }
    catch (SAXException e) {
      System.err.println(e);
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

Are these three the same thing or not?

The customary form of an XML document
The canonical form of an XML document
The object form of an XML document

What is the XML InfoSet?

A W3C proposed standard providing "a consistent set of definitions for use in other specifications that need to refer to the information in a well-formed XML document." This is considerably weaker than originally planned.
What it used to be: A W3C proposed standard for what is and is not significant in an XML document
Not everyone agrees that this is a good thing, or that this is the right list!

The InfoSet defines 11 kinds of Information Items

The Document Information Item
Element Information Items
Attribute Information Items
Processing Instruction Information Items
Unparsed Entity Information Items
Unexpanded Entity Information Items
Character Information Items
Comment Information Items
The Document Type Declaration Information Item
Notation Information Items
Namespace Information Items

The Document Information Item

Represents the entire document; not just the root element
Properties:
- Children
  - One Element Information Item for the root element
  - One Comment Information Item for each Comment
  - One Processing Instruction Information Item for each Processing Instruction
- Document Entity
- Document Element
- Notation Declarations
- Entity Declarations
- Base URI
- Standalone Declaration
- Version Declaration
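
The children property has a rough counterpart in DOM: the child nodes of the Document node. The following is a minimal sketch using the JAXP parser bundled with the JDK (an assumption; the slides themselves use Xerces directly). The class name DocumentChildren and the sample document are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class DocumentChildren {

  // Lists the top-level children of the document node in document order.
  static List<String> describe(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      List<String> result = new ArrayList<>();
      for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
        switch (n.getNodeType()) {
          case Node.ELEMENT_NODE:
            result.add("element " + n.getNodeName());
            break;
          case Node.COMMENT_NODE:
            result.add("comment");
            break;
          case Node.PROCESSING_INSTRUCTION_NODE:
            result.add("pi " + ((ProcessingInstruction) n).getTarget());
            break;
          default:
            result.add("other " + n.getNodeName());
        }
      }
      return result;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String xml = "<?xml version=\"1.0\"?>"
        + "<?xml-stylesheet type=\"text/css\" href=\"song.css\"?>"
        + "<!-- a comment before the root element -->"
        + "<SONG><TITLE>Hot Cop</TITLE></SONG>";
    // The XML declaration is not a processing instruction, so it is not listed.
    System.out.println(describe(xml));
  }
}
```

The version and standalone declarations surface in DOM Level 3 as properties of the Document node rather than as children, much as they do in the Infoset.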

Elements

<PHOTO 
  xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
  ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  
<COMPOSER>Jacques Morali</COMPOSER>

<COMPOSER>
  <PERSON>
    <NAME>
      <FIRST>Henri</FIRST>
      <LAST>Belolo</LAST>
    </NAME>
  </PERSON>
</COMPOSER>

<rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#">
  <rdf:Description xmlns:dc="http://purl.org/dc/"
     about="http://www.ibiblio.org/examples/impressionists.xml">
    <dc:title> Impressionist Paintings </dc:title>
    <dc:creator> Elliotte Rusty Harold </dc:creator>
    <dc:description> 
      A list of famous impressionist paintings organized 
      by painter and date 
    </dc:description>
    <dc:date>2000-08-22</dc:date>
  </rdf:Description>
</rdf:RDF>

Element Information Items

An Element Information Item Includes:

namespace name; e.g. the absolute URI for the element's namespace
local name
prefix
children: a list of element, processing instruction, reference to skipped entity, character, and comment information items, one for each element, processing instruction, reference to an unprocessed external entity, data character, and comment appearing immediately within the current element
attributes: an unordered set of attribute information items, one for each of the attributes (specified or defaulted from the DTD) of this element. xmlns attribute declarations are not included.
namespace attributes: an unordered set of attribute information items, one for each of the namespaces declared either in the start-tag of this element or defaulted from the DTD.
in-scope namespaces: An unordered set of namespace information items, one for each of the namespaces in effect for this element
base URI: The absolute URI of the external entity in which this element appears, as defined in XML Base. If this is not known, this property is null.
parent
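
The first three properties can be read through a namespace-aware DOM parser. This is a sketch only, assuming the JAXP parser bundled with the JDK; ElementProperties is a hypothetical class name and the rdf:RDF start-tag is borrowed from the example above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ElementProperties {

  // Returns { namespace name, local name, prefix } of the root element.
  static String[] describeRoot(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      // Without this, DOM reports no namespace URI or local name at all.
      factory.setNamespaceAware(true);
      Element root = factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)))
          .getDocumentElement();
      return new String[] {
        root.getNamespaceURI(), root.getLocalName(), root.getPrefix()
      };
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String xml = "<rdf:RDF xmlns:rdf=\"http://www.w3.org/TR/REC-rdf-syntax#\"/>";
    for (String property : describeRoot(xml)) {
      System.out.println(property);
    }
  }
}
```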

Attributes

xlink:type="simple"
xlink:href="http://www.amrecords.com/"
xlink:type =  "simple"
xlink:show = "onLoad"
xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit"
WIDTH=" 100 "
HEIGHT=' 200 '

An Attribute Information Item Includes:

namespace name
local name
prefix
normalized value
specified: A flag indicating whether this attribute was actually specified in the start-tag of its element, or was defaulted from the DTD
attribute type:
- ID
- IDREF
- IDREFS
- ENTITY
- ENTITIES
- NMTOKEN
- NMTOKENS
- NOTATION
- CDATA
- ENUMERATED
owner element
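
A short sketch of the normalized value and specified properties as DOM exposes them, assuming the JDK's bundled parser and no DTD, so every attribute is treated as CDATA. AttributeValues is an illustrative name. The point is that the quote style disappears while the padding inside a CDATA value does not.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.xml.sax.InputSource;

class AttributeValues {

  // Returns the named attribute node of the root element.
  static Attr attribute(String xml, String name) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)))
          .getDocumentElement()
          .getAttributeNode(name);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String xml = "<PHOTO WIDTH=\" 100 \" HEIGHT=' 200 '/>";
    Attr width = attribute(xml, "WIDTH");
    Attr height = attribute(xml, "HEIGHT");
    // Brackets make the retained leading and trailing spaces visible.
    System.out.println("[" + width.getValue() + "]");
    System.out.println("[" + height.getValue() + "]");
    // Both attributes appear in the start-tag, so neither was defaulted.
    System.out.println(width.getSpecified() && height.getSpecified());
  }
}
```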

Comments

  <!-- The publisher is actually Polygram but I needed 
       an example of a general entity reference. -->
<!--  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG> -->
<!-- You can tell what album I was 
     listening to when I wrote this example -->

A comment Information Item includes:

content
parent

A Processing Instruction Information Item Includes:

<?robots index="yes" follow="no"?>
<?php 
  mysql_connect("database.unc.edu", "clerk", "password"); 
  $result = mysql("CYNW", "SELECT LastName, FirstName FROM Employees 
    ORDER BY LastName, FirstName"); 
  $i = 0;
  while ($i < mysql_numrows ($result)) {
     $fields = mysql_fetch_row($result);
     echo "<person>$fields[1] $fields[0] </person>\r\n";
     $i++;
  }
  mysql_close();
?>

target
content
base URI
parent

Characters

A character is one Unicode character in the content of an element, attribute value, comment or processing instruction data.
A Character Information Item includes:

character code
The Unicode value in the range 0 to #x10FFFF of the character

element content whitespace
A flag indicating whether the character is whitespace appearing within element content

parent

Namespaces

xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#"
xmlns:dc="http://purl.org/dc/"
xmlns="http://www.w3.org/Graphics/SVG/SVG-19991203.dtd"

There is one namespace information item for each namespace actually used on an element or attribute somewhere in the document.
A Namespace Information Item includes:
- prefix
- namespace name

Document Type Declaration

<!DOCTYPE SONG SYSTEM "song.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                       "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

A Document Type Declaration Information Item includes:

SYSTEM ID
PUBLIC ID
children: only the comment and processing instruction information items in the internal DTD subset and external DTD subsets.

parent

Document Type Definition

<!ELEMENT SONG (TITLE, PHOTO?, COMPOSER+, PRODUCER*, 
 PUBLISHER*, LENGTH?, YEAR?, ARTIST+)>
<!ATTLIST SONG xmlns       CDATA #REQUIRED
               xmlns:xlink CDATA #REQUIRED>
<!ELEMENT TITLE (#PCDATA)>

<!ELEMENT PHOTO EMPTY>
<!ATTLIST PHOTO xlink:type CDATA #FIXED "simple"
                xlink:href CDATA #REQUIRED
                xlink:show CDATA #IMPLIED
                ALT        CDATA #REQUIRED
                WIDTH      CDATA #REQUIRED
                HEIGHT     CDATA #REQUIRED
>

<!ELEMENT COMPOSER (#PCDATA)>
<!ELEMENT PRODUCER (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ATTLIST PUBLISHER xlink:type CDATA #IMPLIED
                    xlink:href CDATA #IMPLIED
>

<!ELEMENT LENGTH (#PCDATA)>
<!-- This should be a four digit year like "1999",
     not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>

<!ELEMENT ARTIST (#PCDATA)>

There is no information item for this.
Comments and processing instructions in the DTD are reported as children of the Document Type Declaration information item
Notation and general entity declarations are reported as properties of the Document information item
Attribute types and default values are reported on the actual attributes in the document instance.
Everything else is not reported!

Entities

An XML document is made up of one or more physical storage units called entities
Entity references :
- Parsed internal general entity references like &amp;
- Parsed external general entity references
- Unparsed external general entity references
- External parameter entity references
- Internal parameter entity references
Reading an XML document is not the same thing as reading an XML file

The file contains entity references.
The document contains the entities' replacement text.
When you use a parser to read a document you'll get the text including characters like <. You will not see the entity references.
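
The difference between the file and the document can be seen with a few lines of DOM code. This sketch assumes the JDK's bundled JAXP parser with its default settings; ReplacementText is an illustrative class name and the pub entity is made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class ReplacementText {

  // Returns the text content of the root element as the parser reports it.
  static String text(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)))
          .getDocumentElement()
          .getTextContent();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // The file says "&amp;"; the document says "&".
    System.out.println(text("<PUBLISHER>A &amp; M Records</PUBLISHER>"));

    // An internal general entity declared in the internal DTD subset
    // (hypothetical entity name "pub").
    String withEntity = "<!DOCTYPE SONG [<!ENTITY pub \"PolyGram Records\">]>"
        + "<SONG><PUBLISHER>&pub;</PUBLISHER></SONG>";
    System.out.println(text(withEntity));
  }
}
```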

Entity Information Items

Two kinds of entity information items:
- Unparsed Entity Information Item
- Unexpanded Entity Information Items
Other entities are not reported

Unparsed Entity Information Items

name
system identifier
public identifier
Notation

Unexpanded Entity Information Items

name
entity
parent

The InfoSet Omits:

The internal and external DTD subsets; especially ELEMENT and ATTLIST declarations
Document encoding
CDATA sections
Character references
Expanded, parsed entity references
Whether an empty element uses two tags or one
What kind of quotes surround attributes
Insignificant white space in attributes
White space that occurs between attributes
Attribute order
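
One way to see several of these omissions at once is to parse two documents that differ only in such details and compare the resulting trees. The sketch below uses DOM Level 3's isEqualNode with the JDK's bundled parser. DOM is not the Infoset, so this is an approximation, and the coalescing setting is what folds CDATA sections into ordinary text. SameInformation is an illustrative name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class SameInformation {

  static Element parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      // Report CDATA sections as plain text instead of CDATASection nodes.
      factory.setCoalescing(true);
      return factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)))
          .getDocumentElement();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // True if the two documents produce equal DOM trees below the root.
  static boolean sameInfo(String a, String b) {
    return parse(a).isEqualNode(parse(b));
  }

  public static void main(String[] args) {
    String one = "<SONG><PHOTO WIDTH=\"100\" HEIGHT=\"200\"/>"
        + "<TITLE>Hot Cop</TITLE></SONG>";
    // Different quotes, attribute order, spacing between attributes,
    // empty-element form, and a CDATA section.
    String two = "<SONG><PHOTO HEIGHT='200'   WIDTH='100'></PHOTO>"
        + "<TITLE><![CDATA[Hot Cop]]></TITLE></SONG>";
    // A real difference in content.
    String three = "<SONG><PHOTO WIDTH=\"100\" HEIGHT=\"200\"/>"
        + "<TITLE>Macho Man</TITLE></SONG>";

    System.out.println(sameInfo(one, two));
    System.out.println(sameInfo(one, three));
  }
}
```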

Canonical XML

A W3C proposed standard serialization format of an XML document instance
Not everyone agrees that this is a good thing, or that this is the right format! It's totally unsuitable for editors and validation.
Based on the XPath data model
Not really InfoSet compatible
Something of this nature is nonetheless clearly needed for non-XML aware tools like digital signatures, change management, hash functions, and the like.

How are documents canonicalized?

The document is encoded in UTF-8
Line breaks are normalized to a linefeed (ASCII 10, \n)
Attribute values are normalized, as if by a validating processor
Character and parsed entity references are replaced
CDATA sections are replaced with their character content
The XML and document type declarations are removed
Empty elements are converted to start tag-end tag pairs
White space outside of the document element and within start and end tags is normalized
All white space in character content is retained (except for characters removed during linefeed normalization)
Attribute value delimiters are set to double quotes
Special characters in attribute values and character content are replaced by character references
Superfluous namespace declarations are removed from each element
Default attributes are added to each element
Lexicographic order is imposed on the namespace declarations and attributes of each element
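
To make a few of these steps concrete, here is a toy serializer, emphatically not a conforming canonicalizer. It is a sketch under stated assumptions: it handles only elements, attributes, and text; it ignores namespaces, comments, processing instructions, and DTD defaults; and it sorts attributes by qualified name only. ToyCanonicalizer and its method names are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ToyCanonicalizer {

  static String canonicalize(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setCoalescing(true); // CDATA sections become character content
      Element root = factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)))
          .getDocumentElement();
      StringBuilder out = new StringBuilder();
      write(root, out);
      return out.toString();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  private static void write(Node node, StringBuilder out) {
    if (node.getNodeType() == Node.TEXT_NODE) {
      out.append(escapeText(node.getNodeValue()));
    } else if (node.getNodeType() == Node.ELEMENT_NODE) {
      out.append('<').append(node.getNodeName());
      // TreeMap imposes lexicographic order on the attribute names.
      Map<String, String> sorted = new TreeMap<>();
      NamedNodeMap attributes = node.getAttributes();
      for (int i = 0; i < attributes.getLength(); i++) {
        Node a = attributes.item(i);
        sorted.put(a.getNodeName(), a.getNodeValue());
      }
      for (Map.Entry<String, String> a : sorted.entrySet()) {
        // Always double quotes, whatever the source used.
        out.append(' ').append(a.getKey()).append("=\"")
            .append(escapeAttribute(a.getValue())).append('"');
      }
      out.append('>');
      for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
        write(c, out);
      }
      // Empty elements are always written as a start-tag/end-tag pair.
      out.append("</").append(node.getNodeName()).append('>');
    }
    // Other node types are skipped in this sketch.
  }

  private static String escapeText(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;")
        .replace(">", "&gt;").replace("\r", "&#xD;");
  }

  private static String escapeAttribute(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;")
        .replace("\"", "&quot;").replace("\t", "&#x9;")
        .replace("\n", "&#xA;").replace("\r", "&#xD;");
  }

  public static void main(String[] args) {
    System.out.println(canonicalize(
        "<PHOTO WIDTH='100' HEIGHT=\"200\"  ALT='Victor Willis in Cop Outfit'/>"));
    // prints <PHOTO ALT="Victor Willis in Cop Outfit" HEIGHT="200" WIDTH="100"></PHOTO>
  }
}
```

For anything that will actually be signed, use a real implementation of the Canonical XML specification rather than a sketch like this one.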

Canonicalization software

XML Canonicalizer from IBM's XML Security Suite: http://www.alphaworks.ibm.com/tech/xmlsecuritysuite
C14nDOM reads an XML document from stdin and writes the canonicalized output to stdout:
% java C14nDOM -xpath < hotcop.xml > canonicalized_hotcop.xml
-xpath option necessary to support October 26, 2000 working draft and later versions.

Digital Signatures

W3C/IETF Joint Candidate Recommendation, October 31, 2000
XML Signatures provide

Integrity
Message authentication
Signer authentication

For data of any type

Not Just for Signing XML

Signed data can be located within the XML that includes the signature or elsewhere.
An enveloped signature is enclosed inside the XML element it signs
An enveloping signature signs XML data it contains.
A detached signature signs data external to the Signature element, possibly in another document entirely.

Signature Process

The signature processor digests a data object.
The processor places the digest value in a Signature element.
The processor digests the Signature element.
The processor cryptographically signs the Signature element.
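
The first two steps can be sketched with nothing but the JDK. The code below covers only the digest: it computes a Base64-encoded SHA-1 value of the kind that goes into a DigestValue element. It does not canonicalize, build a Signature element, or sign anything, and DigestValueSketch is an illustrative name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class DigestValueSketch {

  // Base64-encoded SHA-1 digest of the data object's bytes.
  static String sha1Base64(byte[] data) {
    try {
      MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
      return Base64.getEncoder().encodeToString(sha1.digest(data));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    byte[] original = "<TITLE>Hot Cop</TITLE>".getBytes(StandardCharsets.UTF_8);
    byte[] tampered = "<TITLE>Hot Cup</TITLE>".getBytes(StandardCharsets.UTF_8);

    String digest = sha1Base64(original);
    System.out.println(digest);
    // A one-character change to the data produces a different digest,
    // which is how a verifier notices that signed content was altered.
    System.out.println(digest.equals(sha1Base64(tampered)));
  }
}
```

A SHA-1 digest is 20 bytes, so its Base64 form is always 28 characters, the same length as the DigestValue in the detached signature for hotcop.xml.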

XML Digital Signature software

SampleSign2 and VerifyGUI from IBM's XML Security Suite: http://www.alphaworks.ibm.com/tech/xmlsecuritysuite
First use the JDK's keytool to generate a key:
% keytool -genkey -dname "CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, S=New York, C=US" -alias elharo -storepass mypassword -keypass mykeypassword
SampleSign2 reads an XML document from stdin and writes the signature to stdout:
C:\> java SampleSign2 elharo mypassword mykeypassword -ext http://www.ibiblio.org/xml/slides/hoffman/fundamentals/examples/hotcop.xml > hotcop_signature.xml
Key store: C:\Documents and Settings\Administrator\.keystore
Sign: 7030ms
VerifyGUI reads a signature from stdin and warns of changes to signed content:
C:\>java VerifyGUI < hotcop_signature.xml
The signature has a KeyValue element.
The signature has one or more X509Data elements.
Checks an X509Data:
It has 1 certificate(s).
Certificate Information:
Version: 1
Validity: OK
SubjectDN: CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US
IssuerDN: CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US
Serial#: 983556890
Time to verify: 951 [msec]

A Detached Signature for hotcop.xml

<?xml version='1.0' encoding='UTF-8'?>
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
  <SignedInfo>
    <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2000/WD-xml-c14n-20000119"/>
    <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"/>
    <Reference URI="http://www.ibiblio.org/xml/slides/hoffman/fundamentals/examples/hotcop.xml">
      <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
      <DigestValue>nvfYilfgN/rICyzhGmjidKCFoC8=</DigestValue>
    </Reference>
  </SignedInfo>
  <SignatureValue>
    hfowa4qdbuMkoZfX1/VXd4UBpIpZMM5+6CElmY7jOIKFqvXq5A5VKw==
  </SignatureValue>
  <KeyInfo>
    <KeyValue>
      <DSAKeyValue>
        <P>
          /X9TgR11EilS30qcLuzk5/YRt1I870QAwx4/gLZRJmlFXUAiUftZPY1Y+r/F9bow9s
          ubVWzXgTuAHTRv8mZgt2uZUKWkn5/oBHsQIsJPu6nX/rfGG/g7V+fGqKYVDwT7g/bT
          xR7DAjVUE1oWkTL2dfOuK2HXKu/yIgMZndFIAcc=
        </P>
        <Q>l2BQjxUjC8yykrmCouuEC/BYHPU=</Q>
        <G>
          9+GghdabPd7LvKtcNrhXuXmUr7v6OuqC+VdMCz0HgmdRWVeOutRZT+ZxBxCBgLRJFn
          Ej6EwoFhO3zwkyjMim4TwWeotUfI0o4KOuHiuzpnWRbqN/C/ohNWLx+2J6ASQ7zKTx
          vqhRkImog9/hWuWfBpKLZl6Ae1UlZAFMO/7PSSo=
        </G>
        <Y>
          6jKpNnmkkWeArsn5Oeeg2njcz+nXdk0f9kZI892ddlR8Lg1aMhPeFTYuoq3I6neFlb
          BjWzuktNZKiXYBfKsSTB8U09dTiJo2ir3HJuY7eW/p89osKMfixPQsp9vQMgzph6Qa
          lY7j4MB7y5ROJYsTr1/fFwmj/yhkHwpbpzed1LE=
        </Y>
      </DSAKeyValue>
    </KeyValue>
    <X509Data>
      <X509IssuerSerial>
        <X509IssuerName>CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US</X509IssuerName>
        <X509SerialNumber>983556890</X509SerialNumber></X509IssuerSerial>
      <X509SubjectName>CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US</X509SubjectName>
      <X509Certificate>
MIIDLzCCAu0CBDqf4xowCwYHKoZIzjgEAwUAMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcg
WW9yazERMA8GA1UEBxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlN
ZXRyb3RlY2gxHjAcBgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDAeFw0wMTAzMDIxODE0NTBa
Fw0wMTA1MzExODE0NTBaMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcgWW9yazERMA8GA1UE
BxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlNZXRyb3RlY2gxHjAc
BgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDCCAbgwggEsBgcqhkjOOAQBMIIBHwKBgQD9f1OB
HXUSKVLfSpwu7OTn9hG3UjzvRADDHj+AtlEmaUVdQCJR+1k9jVj6v8X1ujD2y5tVbNeBO4AdNG/y
ZmC3a5lQpaSfn+gEexAiwk+7qdf+t8Yb+DtX58aophUPBPuD9tPFHsMCNVQTWhaRMvZ1864rYdcq
7/IiAxmd0UgBxwIVAJdgUI8VIwvMspK5gqLrhAvwWBz1AoGBAPfhoIXWmz3ey7yrXDa4V7l5lK+7
+jrqgvlXTAs9B4JnUVlXjrrUWU/mcQcQgYC0SRZxI+hMKBYTt88JMozIpuE8FnqLVHyNKOCjrh4r
s6Z1kW6jfwv6ITVi8ftiegEkO8yk8b6oUZCJqIPf4VrlnwaSi2ZegHtVJWQBTDv+z0kqA4GFAAKB
gQDqMqk2eaSRZ4Cuyfk556DaeNzP6dd2TR/2Rkjz3Z12VHwuDVoyE94VNi6ircjqd4WVsGNbO6S0
1kqJdgF8qxJMHxTT11OImjaKvccm5jt5b+nz2iwox+LE9Cyn29AyDOmHpBqVjuPgwHvLlE4lixOv
X98XCaP/KGQfClunN53UsTALBgcqhkjOOAQDBQADLwAwLAIUODqxsFzS96BjrVA4LVo5FzuWBRMC
FC0xfXxbaJaCJuVqtcBv4bqwV0EX
      </X509Certificate>
    </X509Data>
  </KeyInfo>
</Signature>

To Learn More

XML InfoSet Specification: http://www.w3.org/TR/xml-infoset
Canonical XML Specification: http://www.w3.org/TR/xml-c14n
XML Signature Specification: http://www.w3.org/TR/xmldsig-core/

Part II: Schemas

Schemas are not the salvation for the world of Markup Languages, just as DTDs aren't the embodiment of evil.

--Ann Navarro on the XHTML-L mailing list

What are Schemas?

Generically, a document that describes what a correct document may contain
Specifically, a W3C Recommendation for an XML-document syntax that describes the permissible contents of XML documents

About Schemas

Created by W3C XML Schema Working Group based on many different submissions
No known patent, trademark, or other IP restrictions
XML Schema Part 1: Structures: http://www.w3.org/TR/xmlschema-1/
XML Schema Part 2: Datatypes: http://www.w3.org/TR/xmlschema-2/

What's Wrong with DTDs?

Unusual, non-XML like syntax
No data typing, especially for element content
Limited extensibility
Only marginally compatible with namespaces
Cannot use mixed content and enforce order and number of child elements
Cannot enforce number of child elements without also enforcing order. (i.e. no & operator from SGML)

Schema versions

Last call working draft from April 7, 2000
Candidate Recommendation October 24, 2000
Proposed Recommendation March 16, 2001
2nd Proposed Recommendation March 30, 2001

greeting.xml

<?xml version="1.0"?>
<GREETING>
Hello XML!
</GREETING>

greeting.xsd

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 
  <xsd:element name="GREETING" type="xsd:string"/>

</xsd:schema>

Attaching the schema to the document without namespaces

xsi:noNamespaceSchemaLocation attribute on root element
xsi prefix is mapped to http://www.w3.org/2001/XMLSchema-instance URI

For example,

<?xml version="1.0"?>
<GREETING xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="greeting.xsd">
Hello XML!
</GREETING>

Other means of connecting schemas to documents are allowed

Validating the document with Xerces-J 1.3.0

D:\schemas\examples>java sax.SAX2Count -v greeting2.xml
greeting2.xml: 701 ms (1 elems, 1 attrs, 0 spaces, 12 chars)

An Invalid Document

<?xml version="1.0"?>
<GREETING xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="greeting.xsd">
  <P>Hello XML!</P>
</GREETING>

Checking the Invalid Document

D:\speaking\Software Development 2001 West\schemas\examples>java sax.SAX2Count -v greeting3.xml
[Error] greeting3.xml:4:6: Element type "P" must be declared.
[Error] greeting3.xml:5:13: Datatype error: In element 'GREETING' : Can not have
 element children within a simple type content.
greeting3.xml: 781 ms (2 elems, 1 attrs, 0 spaces, 14 chars)
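
The same two checks can be made programmatically. This sketch uses the javax.xml.validation API from later JDKs rather than the Xerces-J 1.3.0 sample program shown above, and it passes the schema explicitly instead of relying on xsi:noNamespaceSchemaLocation. GreetingValidator is an illustrative class name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class GreetingValidator {

  // The greeting.xsd schema, inlined as a string for the example.
  static final String SCHEMA =
      "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xsd:element name=\"GREETING\" type=\"xsd:string\"/>"
      + "</xsd:schema>";

  // True if the document is valid against the schema.
  static boolean isValid(String schemaText, String xml) {
    try {
      Schema schema = SchemaFactory
          .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
          .newSchema(new StreamSource(new StringReader(schemaText)));
      schema.newValidator().validate(new StreamSource(new StringReader(xml)));
      return true;
    } catch (SAXException e) {
      // Thrown for validity errors (and for a malformed schema or document).
      return false;
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(isValid(SCHEMA, "<GREETING>Hello XML!</GREETING>"));
    System.out.println(isValid(SCHEMA, "<GREETING><P>Hello XML!</P></GREETING>"));
  }
}
```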

A More Complex Document

<?xml version="1.0"?>
<SONG xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="simple_song.xsd">
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>  
</SONG>

Complex vs. Simple Types

Complex types can have child elements and attributes
Simple types cannot have children or attributes

Three main schema elements:

xsd:element declares an element and assigns it a type
xsd:attribute declares an attribute and assigns it a type
xsd:complexType defines a new type

A More Complex Schema

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="SONG" type="songType"/>
 
  <xsd:complexType name="songType">
    <xsd:sequence>
      <xsd:element name="TITLE"     type="xsd:string"
        minOccurs="1" maxOccurs="1"/>
      <xsd:element name="COMPOSER"  type="xsd:string" 
        minOccurs="1" maxOccurs="unbounded"/>
      <xsd:element name="PRODUCER"  type="xsd:string" 
        minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="PUBLISHER" type="xsd:string" 
        minOccurs="0" maxOccurs="1"/>
      <xsd:element name="LENGTH"    type="xsd:timeDuration"
        minOccurs="1" maxOccurs="1"/>
      <xsd:element name="YEAR"      type="xsd:gYear"
        minOccurs="1" maxOccurs="1"/>
      <xsd:element name="ARTIST"    type="xsd:string" 
        minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
 
</xsd:schema>

Validating the Song Document

D:\speaking\Software Development 2001 West\schemas\examples>java sax.SAX2Count -v hotcop.xml
[Error] hotcop.xml:10:25: Datatype error: java.text.ParseException: Illegal or misplaced separator.

Here's the problem:

<?xml version="1.0"?>
<SONG xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="song.xsd">
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>

6:20 is not in the schema time duration format, which is the ISO 8601 form "PnYnMnDTnHnMnS, where nY represents the number of years, nM the number of months, nD the number of days, 'T' is the date/time separator, nH the number of hours, nM the number of minutes and nS the number of seconds. The number of seconds can include decimal digits to arbitrary precision. An optional preceding minus sign ('-') is allowed, to indicate a negative duration."
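
The format is easy to experiment with from Java. The sketch below uses javax.xml.datatype.Duration, which follows the duration type of the final XML Schema Recommendation (assumed here to accept the same lexical form as the draft-era timeDuration). SongLength and its methods are illustrative names.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.Duration;

class SongLength {

  private static DatatypeFactory factory() {
    try {
      return DatatypeFactory.newInstance();
    } catch (DatatypeConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  // True if the string is a legal PnYnMnDTnHnMnS duration.
  static boolean isDuration(String lexical) {
    try {
      factory().newDuration(lexical);
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  // Hours, minutes, and whole seconds only; years, months, and days are
  // ignored because their length in seconds is not fixed.
  static int totalSeconds(String lexical) {
    Duration d = factory().newDuration(lexical);
    return d.getHours() * 3600 + d.getMinutes() * 60 + d.getSeconds();
  }

  public static void main(String[] args) {
    System.out.println(isDuration("6:20"));         // false
    System.out.println(isDuration("P0YT6M20S"));    // true
    System.out.println(totalSeconds("P0YT6M20S"));  // 380
  }
}
```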

Fixed Hot Cop

<?xml version="1.0"?>
<SONG xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="simple_song.xsd">
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>P0YT6M20S</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>

Xerces doesn't get this one right yet!

A Smaller Schema

Default value of minOccurs is 1
Default value of maxOccurs is 1

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="SONG" type="songType"/>
 
  <xsd:complexType name="songType">
    <xsd:sequence>
      <xsd:element name="TITLE"     type="xsd:string"/>
      <xsd:element name="COMPOSER"  type="xsd:string" 
        maxOccurs="unbounded"/>
      <xsd:element name="PRODUCER"  type="xsd:string" 
        minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="PUBLISHER" type="xsd:string" 
        minOccurs="0"/>
      <xsd:element name="LENGTH"    type="xsd:timeDuration"/>
      <xsd:element name="YEAR"      type="xsd:gYear"/>
      <xsd:element name="ARTIST"    type="xsd:string" 
        maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
 
</xsd:schema>

Anonymous Types

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="SONG">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="TITLE"     type="xsd:string"/>
        <xsd:element name="COMPOSER"  type="xsd:string" 
          minOccurs="1" maxOccurs="unbounded"/>
        <xsd:element name="PRODUCER"  type="xsd:string" 
          minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element name="PUBLISHER" type="xsd:string" 
          minOccurs="0" maxOccurs="1"/>
        <xsd:element name="LENGTH"    type="xsd:timeDuration"/>
        <xsd:element name="YEAR"      type="xsd:string"/>
        <xsd:element name="ARTIST"    type="xsd:string" 
          minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
 
</xsd:schema>

Data Typing

Consider this document:

<foo>
    <value>45.67</value>
</foo>

What is the type of value?

Possible types

A decimal monetary type, as in COBOL
A fixed point number
An infinitely precise floating point number such as represented by the java.math.BigDecimal class
An IEEE754 double
A Java double
An IEEE 754 float
A VAX Fortran REAL
An imprecisely known decimal number with 4 significant digits that's plus or minus 1 in the last place.
An imprecisely known decimal number with 4 significant digits that's plus or minus 5 in the last place.
Build 67 of version 45 of Microsoft Word
A regular expression matching all strings that begin with the two characters '4' and '5', followed by a single character, followed by the two characters '6' and '7'.
A string of characters a monkey typed on a keyboard

Other interpretations are doubtless possible, and even make sense in particular contexts. There's no guarantee that the string 45.67 in fact represents any particular type.
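
A few of these readings can be tried directly in Java. This is only an illustration of the point, with hypothetical names throughout: the same five characters yield different values depending on the type the program assumes.

```java
import java.math.BigDecimal;
import java.util.regex.Pattern;

class FortyFivePointSixSeven {

  static final String LEXICAL = "45.67";

  // Read as an exact decimal number.
  static BigDecimal asDecimal() {
    return new BigDecimal(LEXICAL);
  }

  // The exact value of the nearest IEEE 754 double.
  static BigDecimal exactValueOfDouble() {
    return new BigDecimal(Double.parseDouble(LEXICAL));
  }

  // The exact value of the nearest IEEE 754 float (widened to double first).
  static BigDecimal exactValueOfFloat() {
    return new BigDecimal(Float.parseFloat(LEXICAL));
  }

  // Read as a regular expression, where '.' matches any single character.
  static boolean asRegexMatches(String candidate) {
    return Pattern.matches(LEXICAL, candidate);
  }

  public static void main(String[] args) {
    System.out.println(asDecimal());
    System.out.println(exactValueOfDouble());
    System.out.println(exactValueOfFloat());
    System.out.println(asRegexMatches("45x67"));
  }
}
```

None of the three numeric values is equal to either of the others, which is why a schema has to say which type is meant.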

The PSVI

A schema assigns an identifiable type to each element
Schema validation produces a Post Schema Validation Infoset, PSVI for short
Schema aware applications using schema aware parsers and APIs can make use of the types of elements

Primitive Data Types for Schemas

Boolean
String
URIs
Numeric types
Time types
XML types
No money types. However, these can be derived

Numeric Data Types for Schemas

XML Schema Built-In Simple Types
Name	Type	Examples
float	IEEE 754 32-bit floating point number	-INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN
double	IEEE 754 64-bit floating point number	-INF, 1.401E-90, -1E4, -0, 0, 12.78E-2, 12, INF, NaN, 3.4E42
decimal	arbitrary precision, decimal numbers	-2.7E400, 5.7E-444, -3.1415292, 0, 7.8, 90200.76, 3.4E1024
integer	an arbitrarily large or small integer	-500000000000000000000000, -9223372036854775809, -126789, -1, 0, 1, 5, 23, 42, 126789, 9223372036854775808, 456734987324983264987362495809587095720978
nonPositiveInteger	an integer less than or equal to zero	0, -1, -2, -3, -4, -5, ...
negativeInteger	an integer strictly less than zero	-1, -2, -3, -4, -5, ...
long	an eight-byte two's complement integer such as Java's `long` type	-9223372036854775808, -12678967543233, -1, 9223372036854775807
int	an integer that can be represented as a four-byte, two's complement number such as Java's `int` type	-2147483648, -1, 0, 1, 5, 23, 42, 2147483647
short	an integer that can be represented as a two-byte, two's complement number such as Java's `short` type	-32768, -1, 0, 1, 5, 23, 42, 32767
byte	an integer that can be represented as a one-byte, two's complement number such as Java's `byte` type	-128, -1, 0, 1, 5, 23, 42, 127
nonNegativeInteger	an integer greater than or equal to zero	0, 1, 2, 3, 4, 5, ...
unsignedLong	an eight-byte unsigned integer	0, 1, 2, 3, 4, 5, ...18446744073709551614, 18446744073709551615
unsignedInt	a four-byte unsigned integer	0, 1, 2, 3, 4, 5, ...4294967294, 4294967295
unsignedShort	a two-byte unsigned integer	0, 1, 2, 3, 4, 5, ...65534, 65535
unsignedByte	a one-byte unsigned integer	0, 1, 2, 3, 4, 5, ...254, 255
positiveInteger	an integer strictly greater than zero	1, 2, 3, 4, 5, 6, ...
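
The boundary between integer and long in the table above matters when mapping schema types to Java types. A small sketch, with the hypothetical class name IntegerRanges: values typed as integer need java.math.BigInteger in general, and only some of them fit in a Java long.

```java
import java.math.BigInteger;

class IntegerRanges {

  // True if the lexical value fits in an eight-byte two's complement long.
  static boolean fitsInLong(String lexical) {
    // bitLength() excludes the sign bit, so 63 bits is the limit.
    return new BigInteger(lexical).bitLength() <= 63;
  }

  public static void main(String[] args) {
    String[] examples = {
      "-9223372036854775809", "-9223372036854775808",
      "9223372036854775807", "9223372036854775808"
    };
    for (String example : examples) {
      System.out.println(example + " fits in long: " + fitsInLong(example));
    }
  }
}
```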

Time Data Types for Schemas

XML Schema Built-In Simple Types
Name	Type	Examples
timeInstant	a particular moment in Coordinated Universal Time; up to an arbitrarily small fraction of a second	1999-05-31T13:20:00.000-05:00
gMonth	A given month in a given year	2000-10
gYear	a given year	2000
recurringDate	a date in no particular year, or rather in every year	--10-31
recurringDay	a day in no particular month, or rather in every month	----31
timeDuration	a length of time, without fixed endpoints, to an arbitrary fraction of a second	P2000Y10M31DT09H32M7.4312S
date	a specific day in history	2000-10-31
time	a specific time of day, that recurs every day	14:30:00.000, 09:30:00.000-05:00

XML Data Types for Schemas

XML Schema Built-In Simple Types
Name	Type	Examples
ID	XML 1.0 ID attribute type	any XML name that's unique among ID type attributes
IDREF	XML 1.0 IDREF attribute type	any XML name that's used as an ID type attribute elsewhere in the document
ENTITY	XML 1.0 ENTITY attribute type	any XML name that's declared as an unparsed entity in the DTD
NOTATION	XML 1.0 NOTATION attribute type	any XML name that's declared as a notation name in the DTD
language	valid values for xml:lang as defined in XML 1.0	en-GB, en-US, fr
IDREFS	XML 1.0 IDREFS attribute type	a white space separated list of IDREF names
ENTITIES	XML 1.0 ENTITIES attribute type	a white space separated list of ENTITY names
NMTOKEN	XML 1.0 NMTOKEN attribute type	12 are you ready
NMTOKENS	XML 1.0 NMTOKENS attribute type	a white space separated list of name tokens
Name	An XML 1.0 Name	set, title, rdf, math, math123, href
QName	a prefixed name	song:title
NCName	a local name without any colons	title

Assorted Data Types for Schemas

XML Schema Built-In Simple Types
Name	Type	Examples
string	Parsed Character Data; #PCDATA	Hot Cop
normalizedString	A string that does not contain any tabs, carriage returns, or linefeeds	PIC1, PIC2, PIC3, cow_movie, MonaLisa, Hello World , Warhol, red green
token	A string with no leading or trailing white space, no tabs, no linefeeds, and not more than one consecutive space	p1 p2, ss123 45 6789, _92, red, green, NT Decl, seventeenp1, p2, 123 45 6789, ^&^&_92, red green blue, NT-Decl, seventeen; Mary had a little lamb, The love of money is the root of all Evil.
boolean	C++'s bool type	true, false, 1, 0
anyURI	relative or absolute URI	http://www.w3.org/TR/2000/WD-xmlschema-2-20000407/#timeDuration, /javafaq/reports/JCE1.2.1.html
hexBinary	Arbitrary binary data encoded in hexadecimal form	A4E345EC54CC8D52198000FFEA6C
base64Binary	Arbitrary binary data encoded in Base64	6jKpNnmkkWeArsn5Oeeg2njcz+nXdk0f9kZI892ddlR8Lg1aMhPeFTYuoq3I6neFlb BjWzuktNZKiXYBfKsSTB8U09dTiJo2ir3HJuY7eW/p89osKMfixPQsp9vQMgzph6Qa lY7j4MB7y5ROJYsTr1/fFwmj/yhkHwpbpzed1LE=

Derived Types

You can derive new simple types from existing types by restricting the type to a subset of its normal values
An xsd:simpleType element defines the restricted type
The name attribute of xsd:simpleType assigns a name to the new type, by which it can be referred to in xsd:element type attributes.
An xsd:restriction child element specifies what type is being restricted via its base attribute.
Facet children of xsd:restriction specify the constraints on the type.

For example, this xsd:simpleType element defines a phonoYear as any year from 1877 (the year Edison invented the phonograph) on:

<xsd:simpleType name="phonoYear">
  <xsd:restriction base="xsd:gYear">
    <xsd:minInclusive value="1877"/>
  </xsd:restriction>
</xsd:simpleType>

Then you declare the year element like this:
<xsd:element name="YEAR" type="phonoYear"/>

Facets

Facets include:
- length
- minLength
- maxLength
- pattern
- enumeration
- whiteSpace
- maxInclusive
- maxExclusive
- minInclusive
- minExclusive
- totalDigits
- fractionDigits
- period
- duration
Not all facets apply to all types.

Facets for strings: length, minLength, maxLength

The number of characters allowed in a string
Must be a non-negative integer
Applies to string, normalizedString, token, hexBinary, base64Binary, QName, NCName, ID, IDREF, IDREFS, language, anyURI, ENTITY, ENTITIES, NOTATION, NMTOKEN and NMTOKENS type items

For example, to say that all names and titles must contain between 1 and 255 characters:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="SONG" type="songType"/>
 
  <xsd:simpleType name="Str255">
    <xsd:restriction base="xsd:string">
       <xsd:minLength value="1"/>
       <xsd:maxLength value="255"/>
    </xsd:restriction>
  </xsd:simpleType>
 
  <xsd:complexType name="songType">
    <xsd:sequence>
      <xsd:element name="TITLE"     type="Str255"/>
      <xsd:element name="COMPOSER"  type="Str255"
        maxOccurs="unbounded"/>
      <xsd:element name="PRODUCER"  type="Str255" 
        minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="PUBLISHER" type="Str255" 
        minOccurs="0"/>
      <xsd:element name="LENGTH"    type="xsd:duration"/>
      <xsd:element name="YEAR"      type="xsd:gYear"/>
      <xsd:element name="ARTIST"    type="Str255" 
        maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
 
</xsd:schema>

Facets for ordered items: minExclusive, maxExclusive, minInclusive, maxInclusive

Determines the minimum and maximum allowed values
Applies to ordered simple types including byte, unsignedByte, integer, positiveInteger, negativeInteger, nonNegativeInteger, nonPositiveInteger, int, unsignedInt, long, unsignedLong, short, unsignedShort, decimal, float, double, duration, dateTime, time, date, gYearMonth, gYear, gMonthDay, gDay, and gMonth.

For example, to say that the year must be between 1877 and 2100:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="SONG" type="songType"/>
 
  <xsd:simpleType name="phonoYear">
    <xsd:restriction base="xsd:gYear">
      <xsd:minInclusive value="1877"/>
      <xsd:maxInclusive value="2100"/>
    </xsd:restriction>
  </xsd:simpleType> 
 
  <xsd:complexType name="songType">
    <xsd:sequence>
      <xsd:element name="TITLE"     type="xsd:string"/>
      <xsd:element name="COMPOSER"  type="xsd:string"
        maxOccurs="unbounded"/>
      <xsd:element name="PRODUCER"  type="xsd:string" 
        minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="PUBLISHER" type="xsd:string" 
        minOccurs="0"/>
      <xsd:element name="LENGTH"    type="xsd:duration"/>
      <xsd:element name="YEAR"      type="phonoYear"/>
      <xsd:element name="ARTIST"    type="xsd:string" 
        maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
 
</xsd:schema>

whiteSpace

Determines what the application should do with white space found in the content
Three possible values:
- preserve: The white space in the input document is left unchanged
- replace: Each tab, carriage return and linefeed is replaced with a single space.
- collapse: Each tab, carriage return and linefeed is replaced with a single space. Furthermore, after this replacement is performed, all runs of multiple spaces are condensed to a single space, and leading and trailing white space is deleted.
No effect on validation
Applies to string, normalizedString and token type items
Per XML 1.0, white space in attributes is normalized regardless of the schema

For example, to say that white space should be collapsed in all names and titles:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="SONG" type="songType"/>

  <xsd:simpleType name="NormalizedString">
    <xsd:restriction base="xsd:string">
       <xsd:whiteSpace value="collapse"/>
    </xsd:restriction>
  </xsd:simpleType>
 
  <xsd:complexType name="songType">
    <xsd:sequence>
      <xsd:element name="TITLE"     type="NormalizedString"/>
      <xsd:element name="COMPOSER"  type="NormalizedString"
        maxOccurs="unbounded"/>
      <xsd:element name="PRODUCER"  type="NormalizedString" 
        minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="PUBLISHER" type="NormalizedString" 
        minOccurs="0"/>
      <xsd:element name="LENGTH"    type="xsd:duration"/>
      <xsd:element name="YEAR"      type="xsd:gYear"/>
      <xsd:element name="ARTIST"    type="NormalizedString" 
        maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
 
</xsd:schema>

Facets for decimal numbers: totalDigits and fractionDigits

The totalDigits facet specifies the maximum number of decimal digits in a number as a positive integer
The fractionDigits facet specifies the maximum number of decimal digits to the right of the decimal point as a non-negative integer
Applies to decimal and the types derived from it (integer, positiveInteger, negativeInteger, nonNegativeInteger, nonPositiveInteger, long, unsignedLong, int, unsignedInt, short, unsignedShort, byte and unsignedByte); does not apply to float, double, base64Binary or hexBinary
These facets set upper bounds only: you can require at most two fractional digits or at most seven total digits, but you cannot require at least two fractional digits or exactly seven total digits
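
As a sketch of how the two facets combine, the schema below restricts xsd:decimal to at most seven digits in total, at most two of them after the decimal point. The type name price and the limits chosen are illustrative assumptions, not part of the song schemas in this section.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="PRICE" type="price"/>

  <!-- Hypothetical type: a decimal number with bounded digits -->
  <xsd:simpleType name="price">
    <xsd:restriction base="xsd:decimal">
      <!-- At most seven digits overall, e.g. 12345.67 -->
      <xsd:totalDigits value="7"/>
      <!-- At most two digits after the decimal point -->
      <xsd:fractionDigits value="2"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```

Under this type 1.35 and 12345.67 are valid, while 1.355 (three fractional digits) and 123456.78 (eight digits) are not.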

Facets for time: period and duration

The period facet defines the frequency of recurrence (after what duration it recurs) for time types. Its value is a time duration.
The duration facet defines the length of the duration for time types. Its value is also a time duration.
Applies to time types: time, timeInstant, timePeriod, date, month, year, recurringDay, and recurringDate.
For example, you might use these facets to define a twoWeek type with a fourteen-day period and a fourteen-day duration for paychecks.
Both facets come from earlier drafts of the schema specification; the final XML Schema Part 2 Recommendation drops them, along with timeInstant, timePeriod, recurringDay, and recurringDate.

Enumeration

The enumeration facet lists all allowed values
Applies to all simple types except boolean

For example, to say that the publisher must be one of the oligopoly that controls 90% of U.S. music (Warner-Elektra-Atlantic, Universal Music Group, Sony Music Entertainment, Inc., Capitol Records, Inc., BMG Music):

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="SONG" type="songType"/>
 
  <xsd:simpleType name="oligopolyMember">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="Warner-Elektra-Atlantic"/>
      <xsd:enumeration value="Universal Music Group"/>
      <xsd:enumeration value="Sony Music Entertainment, Inc."/>
      <xsd:enumeration value="Capitol Records, Inc."/>
      <xsd:enumeration value="BMG Music"/>
    </xsd:restriction>
  </xsd:simpleType> 
 
  <xsd:complexType name="songType">
    <xsd:sequence>
      <xsd:element name="TITLE"     type="xsd:string"/>
      <xsd:element name="COMPOSER"  type="xsd:string"
        maxOccurs="unbounded"/>
      <xsd:element name="PRODUCER"  type="xsd:string" 
        minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="PUBLISHER" type="oligopolyMember" 
        minOccurs="0"/>
      <xsd:element name="LENGTH"    type="xsd:duration"/>
      <xsd:element name="YEAR"      type="xsd:gYear"/>
      <xsd:element name="ARTIST"    type="xsd:string" 
        maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
 
</xsd:schema>

Adding a Price

<?xml version="1.0"?>
<SONG xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="priced_song.xsd">
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>  
  <PRICE>$1.35</PRICE>  
</SONG>

The pattern facet

Suppose you want a money type to specify that the PRICE element content must look like $1.35 or ¥11000
Derive this from the xsd:string type by restriction
Use the pattern facet to specify a regular expression instances must match

Regular Expressions

More or less Perl-like with some Unicode extensions
The money regular expression:
\p{Sc}\p{Nd}+(\.\p{Nd}\p{Nd})?

\p{Sc}

Any Unicode currency indicator; e.g. $, ¥, £, ¤, etc.

\p{Nd}
A Unicode decimal digit character

\p{Nd}+
One or more Unicode decimal digit characters

\.
The period character

(\.\p{Nd}\p{Nd})
A string with a period followed by two decimal digits like .35

(\.\p{Nd}\p{Nd})?
Zero or one strings of the form .35
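
To make the pieces above concrete, here are a few sample values checked by hand against the money expression. The SAMPLES wrapper element is hypothetical and exists only to keep the fragment well-formed. A pattern facet has to match the whole value, not just part of it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical wrapper holding sample PRICE values for
     \p{Sc}\p{Nd}+(\.\p{Nd}\p{Nd})? -->
<SAMPLES>
  <PRICE>$1.35</PRICE>     <!-- matches: symbol, digits, two-digit fraction -->
  <PRICE>¥11000</PRICE>    <!-- matches: the fraction is optional -->
  <PRICE>1.35</PRICE>      <!-- no match: no currency symbol -->
  <PRICE>$1.3</PRICE>      <!-- no match: only one fractional digit -->
  <PRICE>$1,000.00</PRICE> <!-- no match: a comma is not a decimal digit -->
</SAMPLES>
```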

The Price Schema

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 
  <xsd:element name="SONG" type="SongType"/>

  <xsd:simpleType name="money">
    <xsd:restriction base="xsd:string">             
      <xsd:pattern value="\p{Sc}\p{Nd}+(\.\p{Nd}\p{Nd})?"/>
    <!-- 
       Regular Expression:
       \p{Sc}             Any Unicode currency indicator; e.g. $, &#xA5;, &#xA3;, &#xA4;, etc.
       \p{Nd}             A Unicode decimal digit character
       \p{Nd}+            One or more Unicode decimal digit characters
       \.                 The period character
       (\.\p{Nd}\p{Nd})   A period followed by two decimal digits like .35
       (\.\p{Nd}\p{Nd})?  Zero or one strings of the form .35
       
       This works for any decimalized currency. 
       
    -->
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:complexType name="SongType">
    <xsd:sequence>
      <xsd:element name="TITLE"     type="xsd:string"/>
      <xsd:element name="COMPOSER"  type="xsd:string" 
        maxOccurs="unbounded"/>
      <xsd:element name="PRODUCER"  type="xsd:string" 
        minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="PUBLISHER" type="xsd:string" 
        minOccurs="0"/>
      <xsd:element name="YEAR"   type="xsd:gYear"/>
      <xsd:element name="ARTIST" type="xsd:string" 
        maxOccurs="unbounded"/>
      <xsd:element name="PRICE" type="money"/>      
     </xsd:sequence> 
  </xsd:complexType>
  
</xsd:schema>

Complex Types

Elements that contain child elements or have attributes or both
Defined by an xsd:complexType element

A Document with Attributes

<?xml version="1.0"?>
<SONG xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="attribute_song.xsd">
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>  
</SONG>

Declaring Attributes

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 
  <xsd:element name="SONG" type="SongType"/>

  <!-- An empty element -->
  <xsd:complexType name="PhotoType">
    <xsd:complexContent>
       <xsd:restriction base="xsd:anyType">
         <xsd:attribute name="ALT"    type="xsd:string"/>
         <xsd:attribute name="WIDTH"  type="xsd:nonNegativeInteger"/>
         <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> 
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:complexType name="SongType">
    <xsd:sequence>
      <xsd:element name="TITLE"     type="xsd:string"/>
      <xsd:element name="PHOTO"     type="PhotoType"  
        minOccurs="0"/>
      <xsd:element name="COMPOSER"  type="xsd:string" 
        maxOccurs="unbounded"/>
      <xsd:element name="PRODUCER"  type="xsd:string" 
        minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="PUBLISHER" type="xsd:string" 
        minOccurs="0"/>
      <xsd:element name="YEAR"   type="xsd:gYear"/>
      <xsd:element name="ARTIST" type="xsd:string" 
        maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>

Element Content

<?xml version="1.0"?>
<SONG xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="nested_song.xsd">
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>
    <NAME>
      <GIVEN>Jacques</GIVEN>
      <FAMILY>Morali</FAMILY>
    </NAME>
  </COMPOSER>
  <COMPOSER>
    <NAME>
      <GIVEN>Henri</GIVEN>
      <FAMILY>Belolo</FAMILY>
    </NAME>
  </COMPOSER>
  <COMPOSER>
    <NAME>
      <GIVEN>Victor</GIVEN>
      <FAMILY>Willis</FAMILY>
    </NAME>
  </COMPOSER>
  <PRODUCER>
    <NAME>
      <GIVEN>Jacques</GIVEN>
      <FAMILY>Morali</FAMILY>
    </NAME>
  </PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>  
</SONG>

Declaring Complex Types

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 
  <xsd:element name="SONG" type="SongType"/>

  <xsd:complexType name="ComposerType">
    <xsd:sequence>
      <xsd:element name="NAME">
        <xsd:complexType>
          <xsd:sequence>
             <xsd:element name="GIVEN"  type="xsd:string"/>
             <xsd:element name="FAMILY" type="xsd:string"/>  
          </xsd:sequence>    
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="ProducerType">
    <xsd:sequence>
      <xsd:element name="NAME">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="GIVEN"  type="xsd:string"/>
            <xsd:element name="FAMILY" type="xsd:string"/>
          </xsd:sequence>      
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="PhotoType">
    <xsd:complexContent>
       <xsd:restriction base="xsd:anyType">
         <xsd:attribute name="ALT"    type="xsd:string"/>
         <xsd:attribute name="WIDTH"  type="xsd:nonNegativeInteger"/>
         <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> 
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
  
  <xsd:complexType name="SongType">
    <xsd:sequence>
      <xsd:element name="TITLE"     type="xsd:string"/>
      <xsd:element name="PHOTO"     type="PhotoType"  
        minOccurs="0"/>
      <xsd:element name="COMPOSER"  type="ComposerType" 
        maxOccurs="unbounded"/>
      <xsd:element name="PRODUCER"  type="ProducerType" 
        minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="PUBLISHER" type="xsd:string" 
        minOccurs="0"/>
      <xsd:element name="YEAR"   type="xsd:gYear"/>
      <xsd:element name="ARTIST" type="xsd:string" 
        maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>

Sharing Content Models

PRODUCER and COMPOSER are really the same type.

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 
  <xsd:element name="SONG" type="SongType"/>

  <xsd:complexType name="PersonType">
    <xsd:sequence>
      <xsd:element name="NAME">
        <xsd:complexType>
          <xsd:sequence>
             <xsd:element name="GIVEN"  type="xsd:string"/>
             <xsd:element name="FAMILY" type="xsd:string"/>  
          </xsd:sequence>    
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="SongType">
    <xsd:sequence>
      <xsd:element name="TITLE"     type="xsd:string"/>
      <xsd:element name="PHOTO"     type="PhotoType"  
        minOccurs="0"/>
      <xsd:element name="COMPOSER"  type="PersonType" 
        maxOccurs="unbounded"/>
      <xsd:element name="PRODUCER"  type="PersonType" 
        minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="PUBLISHER" type="xsd:string" 
        minOccurs="0"/>
      <xsd:element name="YEAR"   type="xsd:gYear"/>
      <xsd:element name="ARTIST" type="xsd:string" 
        maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="PhotoType">
    <xsd:complexContent>
       <xsd:restriction base="xsd:anyType">
         <xsd:attribute name="ALT"    type="xsd:string"/>
         <xsd:attribute name="WIDTH"  type="xsd:nonNegativeInteger"/>
         <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> 
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
  
</xsd:schema>

Mixed Content

Schemas let you enforce order and appearance of elements in mixed content.

<?xml version="1.0"?>
<SONG xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="mixed_song.xsd">
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>
    <NAME>Mr. <GIVEN>Jacques</GIVEN> <FAMILY>Morali</FAMILY> Esq.</NAME>
  </COMPOSER>
  <COMPOSER>
    <NAME>Mr. <GIVEN>Henri</GIVEN> L. <FAMILY>Belolo</FAMILY>, M.D.</NAME>
  </COMPOSER>
  <COMPOSER>
    <NAME>Mr. <GIVEN>Victor</GIVEN> C. <FAMILY>Willis</FAMILY></NAME>
  </COMPOSER>
  <PRODUCER>
    <NAME>Mr. <GIVEN>Jacques</GIVEN> S. <FAMILY>Morali</FAMILY></NAME>
  </PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>  
</SONG>

Declaring Mixed Content

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 
  <xsd:element name="SONG" type="SongType"/>

  <xsd:complexType name="PersonType">
    <xsd:sequence>
      <xsd:element name="NAME">
        <xsd:complexType mixed="true">
          <xsd:sequence>
            <xsd:element name="GIVEN"  type="xsd:string"/>
            <xsd:element name="FAMILY" type="xsd:string"/>
           </xsd:sequence>      
         </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="SongType">
    <xsd:sequence>
      <xsd:element name="TITLE"     type="xsd:string"/>
      <xsd:element name="PHOTO"     type="PhotoType"  
        minOccurs="0"/>
      <xsd:element name="COMPOSER"  type="PersonType" 
        maxOccurs="unbounded"/>
      <xsd:element name="PRODUCER"  type="PersonType" 
        minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="PUBLISHER" type="xsd:string" 
        minOccurs="0"/>
      <xsd:element name="YEAR"   type="xsd:gYear"/>
      <xsd:element name="ARTIST" type="xsd:string" 
        maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="PhotoType">
    <xsd:complexContent>
       <xsd:restriction base="xsd:anyType">
         <xsd:attribute name="ALT"    type="xsd:string"/>
         <xsd:attribute name="WIDTH"  type="xsd:nonNegativeInteger"/>
         <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> 
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
  
</xsd:schema>

When Order Doesn't Matter

<?xml version="1.0"?>
<SONG xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="unordered_song.xsd">
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>
    <NAME><FAMILY>Morali</FAMILY> <GIVEN>Jacques</GIVEN></NAME>
  </COMPOSER>
  <COMPOSER>
    <NAME><GIVEN>Henri</GIVEN> <FAMILY>Belolo</FAMILY></NAME>
  </COMPOSER>
  <COMPOSER>
    <NAME><FAMILY>Willis</FAMILY> <GIVEN>Victor</GIVEN></NAME>
  </COMPOSER>
  <PRODUCER>
    <NAME><GIVEN>Jacques</GIVEN> <FAMILY>Morali</FAMILY></NAME>
  </PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>  
</SONG>

The xsd:all Group

Each element in the xsd:all group must occur zero times or once; that is, minOccurs and maxOccurs must each be 0 or 1
The xsd:all group must be the top-level element of its type
The xsd:all group may contain only individual element declarations; no choices or sequences

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 
  <xsd:element name="SONG" type="SongType"/>

  <xsd:complexType name="PersonType">
    <xsd:sequence>
      <xsd:element name="NAME">
        <xsd:complexType>
          <xsd:all>
            <xsd:element name="GIVEN"  type="xsd:string"/>
            <xsd:element name="FAMILY" type="xsd:string"/> 
          </xsd:all>     
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="SongType">
    <xsd:sequence>
      <xsd:element name="TITLE"     type="xsd:string"/>
      <xsd:element name="PHOTO"     type="PhotoType"  
        minOccurs="0"/>
      <xsd:element name="COMPOSER"  type="PersonType" 
        maxOccurs="unbounded"/>
      <xsd:element name="PRODUCER"  type="PersonType" 
        minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="PUBLISHER" type="xsd:string" 
        minOccurs="0"/>
      <xsd:element name="YEAR"   type="xsd:gYear"/>
      <xsd:element name="ARTIST" type="xsd:string" 
        maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- An empty element -->
  <xsd:complexType name="PhotoType">
    <xsd:complexContent>
       <xsd:restriction base="xsd:anyType">
         <xsd:attribute name="ALT"    type="xsd:string"/>
         <xsd:attribute name="WIDTH"  type="xsd:nonNegativeInteger"/>
         <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> 
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
  
</xsd:schema>

Choices

xsd:choice requires exactly one of a group of specified elements to appear
The choice can have minOccurs and maxOccurs attributes that change how many times a choice is made, from zero to any given number.
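
A minimal sketch of a repeated choice. The CREDITS element and CreditsType are hypothetical names that do not appear in the song schemas in this section. Each credit is either a COMPOSER or a PRODUCER, and the occurrence attributes on xsd:choice let that pick be made one or more times.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="CREDITS" type="CreditsType"/>

  <!-- Hypothetical type: one or more credits, each of which is
       either a COMPOSER or a PRODUCER, in any mix and order -->
  <xsd:complexType name="CreditsType">
    <xsd:choice minOccurs="1" maxOccurs="unbounded">
      <xsd:element name="COMPOSER" type="xsd:string"/>
      <xsd:element name="PRODUCER" type="xsd:string"/>
    </xsd:choice>
  </xsd:complexType>

</xsd:schema>
```

Without the maxOccurs attribute, CREDITS would have to contain exactly one COMPOSER or exactly one PRODUCER.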

Sequences

xsd:sequence requires each child element it specifies to appear in the specified order
The sequence can have minOccurs and maxOccurs attributes that let the whole sequence repeat from zero to any given number of times.
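
A minimal sketch of a repeated sequence. The NAMES element and NamesType are hypothetical names used only for illustration. GIVEN must come before FAMILY, and the pair as a whole may repeat.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="NAMES" type="NamesType"/>

  <!-- Hypothetical type: GIVEN then FAMILY, with the pair as a whole
       allowed any number of times, including not at all -->
  <xsd:complexType name="NamesType">
    <xsd:sequence minOccurs="0" maxOccurs="unbounded">
      <xsd:element name="GIVEN"  type="xsd:string"/>
      <xsd:element name="FAMILY" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>
```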

Default Namespace

<?xml version="1.0"?>
<GREETING 
  xmlns="http://ibiblio.org/xml/schemas/greeting/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://ibiblio.org/xml/schemas/greeting/
                      greeting_defaultNS.xsd">
  Hello XML!
</GREETING>

The targetNamespace attribute

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://ibiblio.org/xml/schemas/greeting/"
>
 
  <xsd:element name="GREETING" type="xsd:string"/>

</xsd:schema>

A Song with a Namespace

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<SONG xmlns="http://ibiblio.org/xml/namespace/song"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation = 
       "http://ibiblio.org/xml/namespace/song namespace_song.xsd"
>
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>  
</SONG>

A Schema for a Document that Uses the Default Namespace

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns="http://ibiblio.org/xml/namespace/song"
  targetNamespace="http://ibiblio.org/xml/namespace/song"
  elementFormDefault="qualified"
  attributeFormDefault="unqualified"
>
 
  <xsd:element name="SONG" type="SongType"/>

  <xsd:complexType name="SongType">
    <xsd:sequence>
      <xsd:element name="TITLE" type="xsd:string"/>
      <xsd:element name="PHOTO" type="PhotoType"  
        minOccurs="0"/>
      <xsd:element name="COMPOSER"  type="xsd:string" 
        maxOccurs="unbounded"/>
      <xsd:element name="PRODUCER"  type="xsd:string" 
        minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="PUBLISHER" type="xsd:string" 
        minOccurs="0"/>    
      <xsd:element name="YEAR" type="xsd:gYear"/>
      <xsd:element name="ARTIST" type="xsd:string" 
        minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="PhotoType">
    <xsd:complexContent>
       <xsd:restriction base="xsd:anyType">
         <xsd:attribute name="ALT"    type="xsd:string"/>
         <xsd:attribute name="WIDTH"  type="xsd:nonNegativeInteger"/>
         <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> 
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
  
</xsd:schema>

Multiple Namespaces, Multiple Schemas

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<SONG xmlns="http://ibiblio.org/xml/namespace/song"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation = 
       "http://ibiblio.org/xml/namespace/song xlink_song.xsd
        http://www.w3.org/1999/xlink xlink.xsd"
>
  <TITLE>Hot Cop</TITLE>
  <PHOTO xlink:type="simple" xlink:href="hotcop.jpg"
         xlink:actuate="onLoad" xlink:show="embed"
    ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>  
</SONG>

XLink Schema

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns="http://www.w3.org/1999/xlink"
  targetNamespace="http://www.w3.org/1999/xlink"
  attributeFormDefault="unqualified"
>

  <xsd:attribute name="type" type="xsd:string" 
                 fixed="simple"  />
  <xsd:attribute name="href" type="xsd:anyURI"/>
  <xsd:attribute name="actuate" type="xsd:string"
                 fixed="onLoad"  />
  <xsd:attribute name="show" type="xsd:string"
                 fixed="embed"  />

</xsd:schema>

Song Schema with XLink Support

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns="http://ibiblio.org/xml/namespace/song"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  targetNamespace="http://ibiblio.org/xml/namespace/song"
  elementFormDefault="qualified"
  attributeFormDefault="unqualified"
>

  <xsd:import namespace="http://www.w3.org/1999/xlink"
              schemaLocation="xlink.xsd"/>

  <xsd:element name="SONG" type="SongType"/>

  <xsd:complexType name="PhotoType">
    <xsd:attribute name="WIDTH"  type="xsd:positiveInteger"
                   use="required" />
    <xsd:attribute name="HEIGHT" type="xsd:positiveInteger"
                   use="required" />
    <xsd:attribute name="ALT"    type="xsd:string"
                   use="required" />
    <xsd:attribute ref="xlink:type"/>
    <xsd:attribute ref="xlink:href" use="required"/>
    <xsd:attribute ref="xlink:actuate"/>
    <xsd:attribute ref="xlink:show"/>                 
  </xsd:complexType>

  <xsd:complexType name="SongType">
    <xsd:sequence>
      <xsd:element name="TITLE"     type="xsd:string"/>
      <xsd:element name="PHOTO"     type="PhotoType"/>
      <xsd:element name="COMPOSER"  type="xsd:string"
                   maxOccurs="unbounded"/>
      <xsd:element name="PRODUCER"  type="xsd:string"
                   minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="PUBLISHER" type="xsd:string"
                   minOccurs="0"/>
      <xsd:element name="LENGTH"    type="xsd:duration"/>
      <xsd:element name="YEAR"      type="xsd:gYear"/>
      <xsd:element name="ARTIST"    type="xsd:string"
                   maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>

Annotations

The top-level xsd:annotation element describes the schema
Its xsd:documentation child element describes the schema for human readers
Its xsd:appinfo child element describes the schema for computer programs; e.g. stylesheet instructions

  <xsd:annotation>
   <xsd:documentation>
    Song schema for XML and Java Example at Software Development 2001 West
    Copyright 2001 Elliotte Rusty Harold. 
   </xsd:documentation>
  </xsd:annotation>
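
A sketch of an annotation that carries both kinds of content. The Recommendation spells the machine-readable child xsd:appinfo, all lowercase. The stylesheet element inside it is a hypothetical, application-defined element; a schema validator does not interpret it.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      Text for human readers goes here.
    </xsd:documentation>
    <xsd:appinfo>
      <!-- Hypothetical application-defined content -->
      <stylesheet href="song.css"/>
    </xsd:appinfo>
  </xsd:annotation>

  <xsd:element name="SONG" type="xsd:string"/>

</xsd:schema>
```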

What Schemas don't do

Cannot declare entities
Parent models
Extra-document validation

Schema Alternatives

Rick Jelliffe's Schematron
Murata Makoto's RELAX
James Clark's TREX
Rick Jelliffe's Hook
DTDs

Schematron

According to Schematron inventor Rick Jelliffe:
The Schematron differs in basic concept from other schema languages in that it is not based on grammars but on finding tree patterns in the parsed document. This approach allows many kinds of structures to be represented which are inconvenient and difficult in grammar-based schema languages.
XSLT/XPath based
W3C Schemas are conservative: everything not permitted is forbidden.
Schematron is liberal: everything not forbidden is permitted.
No data typing; validation only
Handles unordered structures very well
Handles descendant constraints very well
Almost self-documenting
http://www.ascc.net/xml/resource/schematron/schematron.html

A Schematron schema for songs

A schema contains a title and a pattern
Each pattern contains rules
Each rule contains assert and report elements and has a context attribute
Each assert and report element has a test attribute containing an XPath expression which returns a boolean.
The content of each assert element is printed if the assertion test fails
The content of each report element is printed if the report test succeeds

<?xml version="1.0"?>
<schema xmlns="http://www.ascc.net/xml/schematron">
  <title>A Schematron Schema for Songs</title>
  <pattern>
    <rule context="SONG">
      <assert test="TITLE">
        A SONG must contain an initial TITLE element.
      </assert>
      <assert test="*[1][self::TITLE]">
        The TITLE element must be the initial element of the SONG element.
      </assert>
      <assert test="COMPOSER">
        A SONG must contain at least one COMPOSER element.
      </assert>
      <assert test="ARTIST">
        A SONG must contain at least one ARTIST element.
      </assert>
    </rule>
  </pattern>
</schema>
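
The schema above uses only assert. As a sketch of the difference, the hypothetical schema below pairs a report with an assert in the same rule; the rule text and messages are illustrative assumptions. The report message is printed when its test is true, the assert message when its test is false.

```xml
<?xml version="1.0"?>
<schema xmlns="http://www.ascc.net/xml/schematron">
  <title>Reporting on Songs</title>
  <pattern>
    <rule context="SONG">
      <!-- Printed when the test is true -->
      <report test="count(COMPOSER) &gt; 1">
        This SONG has more than one COMPOSER.
      </report>
      <!-- Printed when the test is false -->
      <assert test="YEAR">
        A SONG must contain a YEAR element.
      </assert>
    </rule>
  </pattern>
</schema>
```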

RELAX

Murata Makoto
JIS standard/Proposed ISO standard
Uses W3C Schema data types
No derived types
Mostly DTD-like structures
http://www.xml.gr.jp/relax/

A RELAX schema for songs

<?xml version="1.0"?>
<module
      moduleVersion="1.2"
      relaxCoreVersion="1.0"
      targetNamespace=""
      xmlns="http://www.xml.gr.jp/xmlns/relaxCore">

  <!-- Elements allowed as document roots -->
  <interface>
    <export label="SONG"/>
  </interface>

  <elementRule role="SONG">
    <sequence>
      <ref label="TITLE"/>
      <ref label="COMPOSER" occurs="*"/>
      <ref label="PRODUCER" occurs="*"/>
      <ref label="PUBLISHER" occurs="?"/>
      <ref label="YEAR"/>
      <ref label="ARTIST" occurs="+"/>
      <ref label="PRICE"  occurs="?"/>
    </sequence>
  </elementRule>
  
  <elementRule role="TITLE"     type="string"/>
  <elementRule role="COMPOSER"  type="string"/>
  <elementRule role="PRODUCER"  type="string"/>
  <elementRule role="PUBLISHER" type="string"/>
  <elementRule role="YEAR"      type="year"/>
  <elementRule role="ARTIST"    type="string"/>
  <elementRule role="PRICE"     type="string"/>

</module>

TREX

Tree Regular Expressions for XML
Invented by James Clark of XSLT fame
Proposed OASIS standard
Uses W3C Schema data types
http://www.thaiopensource.com/trex/
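
A sketch of what a TREX pattern for songs might look like. It assumes the pattern vocabulary (element, oneOrMore, zeroOrMore, optional, anyString) and the namespace URI used in the TREX tutorial; check both against the specification before relying on them. No data typing is applied here, so every element simply holds a string.

```xml
<?xml version="1.0"?>
<!-- Assumed TREX namespace and pattern names; see the lead-in -->
<element name="SONG" xmlns="http://www.thaiopensource.com/trex">
  <element name="TITLE"><anyString/></element>
  <oneOrMore>
    <element name="COMPOSER"><anyString/></element>
  </oneOrMore>
  <zeroOrMore>
    <element name="PRODUCER"><anyString/></element>
  </zeroOrMore>
  <optional>
    <element name="PUBLISHER"><anyString/></element>
  </optional>
  <element name="YEAR"><anyString/></element>
  <oneOrMore>
    <element name="ARTIST"><anyString/></element>
  </oneOrMore>
</element>
```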

Hook

Rick Jelliffe's Hook: A One-Element Language for Validation of XML Documents based on Partial Order
XSLT/XPath based
No data typing; validation only

A Hook schema for XHTML (adapted from Rick Jelliffe):

<hook:order targetNamespace="http://www.w3.org/1999/xhtml" >
  html head  [ title; meta. link. base. ]   body
  [ a br. blockquote caption; div  dl; h1; h2; h3; h4; h5; h6;  
    img. ol; p; pre; table; ul; ]  
  [ tr;  dt; dd; li; ]  td 
  [ a br. blockquote div  form img. ol; ul; li; ]  
  [ input; label; select; textarea; ]  [ option. ]
  [ abbr acronym address cite code dfn em kbd q samp span strong var object; ] 
  param 
</hook:order>

http://www.ascc.net/xml/hook/

DTDs aren't Dead!

To Learn More

XML Bible, second edition, Chapter 24
- Elliotte Rusty Harold
- Hungry Minds, 2001
- ISBN 0-7645-4760-7
This presentation: http://www.ibiblio.org/xml/slides/sd2001west/schemas
W3C Schema Primer: http://www.w3.org/TR/xmlschema-0/
XML Schema Part 1: Structures: http://www.w3.org/TR/xmlschema-1/
XML Schema Part 2: Datatypes: http://www.w3.org/TR/xmlschema-2/

Part III: XSLT 1.1 and Beyond

In SQL, the query language is not expressed in tables and rows. In XQuery, the query language is not expressed in XML. Why is this a problem?

--Jonathan Robie on the xml-dev mailing list

XSLT 1.1

XSLT 1.0 has been very successful. XSLT 1.1 just adds a few small pieces and cleans up a couple of holes.
Multiple output documents
Variables can be set to node sets; no more result tree fragments.
Extension functions defined in style sheets with Java and ECMAScript
Standard Java and JavaScript bindings for extension functions
Existing elements and functions hardly change at all

Identifying 1.1 compliant stylesheets

Namespace is still http://www.w3.org/1999/XSL/Transform
version attribute of xsl:stylesheet has value 1.1

<xsl:stylesheet version="1.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Top level elements -->

</xsl:stylesheet>

No result tree fragments

The result tree fragment data-type has been eliminated.
Variable-binding elements with content now construct node-sets
These node sets can now be operated on by templates
Functionality previously available with saxon:nodeSet() and similar extension functions

Multiple Output Documents

Allows you to generate multiple documents from one source document
Previously available with extension functions like xt:document and saxon:output

Syntax modeled on xsl:output

<xsl:document
    href = { uri-reference }
    method = { "xml" | "html" | "text" | qname-but-not-ncname }
    version = { nmtoken }
    encoding = { string }
    omit-xml-declaration = { "yes" | "no" }
    standalone = { "yes" | "no" }
    doctype-public = { string }
    doctype-system = { string }
    cdata-section-elements = { qnames }
    indent = { "yes" | "no" }
    media-type = { string }
    <!-- Content: template -->
</xsl:document>

xsl:document Example

Partially supported by Saxon 6.2

     <xsl:document method="html" encoding="ISO-8859-1" href="index.html">
       <html>
         <head>
           <title><xsl:value-of select="title"/></title>         
         </head>
         <body> 
           <h1 align="center"><xsl:value-of select="title"/></h1> 
           <ul>
             <xsl:for-each select="slide">
               <li><a href="{format-number(position(),'00')}.html"><xsl:value-of select="title"/></a></li>
             </xsl:for-each>    
           </ul>           
           
           <p><a href="{translate(title,' ', '_')}.html">Entire Presentation as Single File</a></p>
              
           <hr/>
           <div align="center">
             <A HREF="01.html">Start</A> | <A HREF="/xml/">Cafe con Leche</A>
           </div>
           <hr/>
           <font size="-1">
              Copyright 2001 
              <a href="http://www.macfaq.com/personal.html">Elliotte Rusty Harold</a><br/>       
              <a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a><br/>
              Last Modified <xsl:apply-templates select="last_modified" mode="lm"/>
           </font>
         </body>     
       </html>     
     </xsl:document>

xsl:script Top-level Element

Defines an extension function, possibly inline

Syntax:

<xsl:script
  implements-prefix = ncname
  language = "ecmascript" | "javascript" | "java" | qname-but-not-ncname
  src = uri-reference
  archive = uri-references>
  <!-- Content: #PCDATA -->
</xsl:script>

Partially supported by Saxon 6.2 for Java only

xsl:script with Java

<?xml version="1.0"?>
<xsl:stylesheet version="1.1"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:date="http://www.cafeconleche.org/ns/"
>

  <xsl:template match="/">
    <xsl:value-of select="date:new()"/>
  </xsl:template>

  <xsl:script
    implements-prefix="date"
    language="java"
    src="java:java.util.Date"
  />

</xsl:stylesheet>

xsl:script with JavaScript

<?xml version="1.0"?>
<xsl:stylesheet version="1.1"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:date="http://www.cafeconleche.org/ns/date"
>

  <xsl:template match="/">
    <xsl:value-of select="date:clock()"/>
  </xsl:template>

  <xsl:script
    implements-prefix="date"
    language="javascript">
    
    <![CDATA[
    function clock() {
      var time = new Date();
      var hours = time.getHours();
      var min = time.getMinutes();
      var sec = time.getSeconds();
      var status = "AM";
      if (hours > 11) {
        status = "PM";
      }
      // Convert 13..23 to 1..11 for the 12-hour display
      if (hours > 12) {
        hours -= 12;
      }
      if (min < 10) {
        min = "0" + min;
      }
      if (sec < 10) {
        sec = "0" + sec;
      }
      return hours + ":" + min + ":" + sec + " " + status;
    }
    ]]>
   
  </xsl:script>  

</xsl:stylesheet>
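
The same 12-hour formatting can be sketched in Java, the other language xsl:script allows. Clock12 and its method names are hypothetical, and how (or whether) a given processor would bind this class to a prefix is not shown here:

```java
import java.time.LocalTime;

// Hypothetical class for illustration; the name and methods are assumptions.
final class Clock12 {

  // Formats a 24-hour time as h:mm:ss AM/PM, showing hour 0 and hour 12 as 12.
  static String format(int hours, int min, int sec) {
    String status = hours > 11 ? "PM" : "AM";
    int h = hours % 12;
    if (h == 0) {
      h = 12;
    }
    return h + ":" + pad(min) + ":" + pad(sec) + " " + status;
  }

  // Left-pads single digits with a zero.
  private static String pad(int n) {
    return n < 10 ? "0" + n : String.valueOf(n);
  }

  // Current local time in the same format.
  static String clock() {
    LocalTime now = LocalTime.now();
    return format(now.getHour(), now.getMinute(), now.getSecond());
  }

  public static void main(String[] args) {
    System.out.println(clock());
  }
}
```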

XPath 2.0

Used for XSLT 2.0 and XQuery
Schema Aware

XPath 2.0 Goals

Simplify manipulation of XML Schema-typed content
Simplify manipulation of string content
Support related XML standards
Improve ease of use
Improve interoperability
Improve internationalization (i18n) support
Maintain backward compatibility
Enable improved processor efficiency

XPath 2.0 Requirements

Must express data model in terms of the Infoset
Must provide common core syntax and semantics for XSLT 2.0 and XML Query 1.0
Must support explicit "for any" or "for all" comparison and equality semantics
Must add min() and max() functions
Any valid XPath 1.0 expression SHOULD also be a valid XPath 2.0 expression when operating in the absence of XML Schema type information.
Should provide intersection and difference functions
Must loosen restrictions on location steps
Must provide a conditional expression (e.g. ternary ?: operator in Java and C)
Should support additional string functions, possibly including space padding, string replacement and conversion to upper or lower case
Must support regular expression string matching using the regexp syntax from schemas
Must add support for XML Schema primitive datatypes
Should add support for XML Schema structures

XSLT 2.0

Uses XPath 2.0
Schema Aware

XSLT 2.0 Goals

Simplify manipulation of XML Schema-typed content
Simplify manipulation of string content
Support related XML standards
Improve ease of use
Improve interoperability
Improve i18n support
Maintain backward compatibility
Enable improved processor efficiency

XSLT 2.0 Non-goals

Simplifying the ability to parse unstructured information to produce structured results.
Turning XSLT into a general-purpose programming language

XSLT 2.0 Requirements

Must maintain backwards compatibility with XSLT 1.1
Should be able to match elements and attributes whose value is explicitly null.
Should allow included documents to encapsulate local stylesheets
Could support accessing infoset items for XML declaration
Could provide qualified name aware string functions
Could enable constructing a namespace with computed name
Could simplify resolving prefix conflicts in qname-valued attributes
Could support XHTML output method
Must allow matching on default namespace without explicit prefix
Must add date formatting functions
Must simplify accessing IDs and keys in other documents
Should provide function to absolutize relative URIs
Should include unparsed text from an external resource
Should allow authoring extension functions in XSLT
Should output character entity references instead of numeric character entities
Should construct entity reference by name
Should support Unicode string normalization
Should standardize extension element language bindings
Could improve efficiency of transformations on large documents
Could support reverse IDREF attributes
Could support case-insensitive comparisons
Could support lexicographic string comparisons
Could allow comparing nodes based on document order
Could improve support for unparsed entities
Could allow processing a node with the "next best matching" template
Could make coercions symmetric by allowing scalar to nodeset conversion
Must support XML schema
Must simplify constructing and copying typed content
Must support sorting nodes based on XML schema type
Could support scientific notation in number formatting
Could provide ability to detect whether "rich" schema information is available
Must simplify grouping

XQuery

Three parts:

A data model for XML documents based on the XML Infoset
A mathematically precise query algebra; i.e. a set of query operators on that data model
A query language based on these query operators and this algebra

XQuery Language

A fourth generation declarative language like SQL; not a procedural language like Java or a functional language like XSLT
Queries operate on single documents or fixed collections of documents.
Queries select whole documents or subtrees of documents that match conditions defined on document content and structure
Can construct new documents based on what is selected
No updates or inserts!

Documents to Query

Narrative documents and collections of such documents; e.g. generate a table of contents for a book
Data-oriented documents; e.g. SQL-like queries of an XML dump of a database
Streams of XML such as logs of email messages, network packets, stock market data, newswire feeds, EDI, or weather data; e.g. filter and route messages, extract data from the stream, or transform data in the stream
XML views of non-XML data

Physical Representations to Query

Files on a disk
Native-XML databases like Software AG's Tamino
DOM trees in memory
Streaming data
Other representations of the infoset

Where is XQuery used?

Direct query tools at command line
GUI query tools
JSP, ASP, PHP, and other such server side technologies
Programs written in Java, C++, and other languages that need to extract data from XML documents
Others are possible
Anywhere SQL is used to extract data from a database, XQuery is used to extract data from an XML document.
SQL is a non-compiled language that must be processed by some other tool to extract data from a database. So is XQuery.

The XML Model vs. the Relational Model

A relational database contains tables	An XML database contains collections
A relational table contains records with the same schema	A collection contains XML documents with the same DTD
A relational record is an unordered list of named values	An XML document is a tree of nodes
A SQL query returns an unordered set of records	An XQuery returns an ordered node set

Query Data Types

XML 1.0 #PCDATA
Schema primitive types: positiveInteger, string, float, double, unsignedLong, year, date, time, boolean, etc.
Schema complex types
Collections of these types
References to these types

An example document to query

Most of the examples in this talk query this bibliography document at the (fictional) URL http://www.bn.com/bib.xml (some of the queries abbreviate this to http://www.bn.com):

<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>

  <book year="1992">
    <title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>

  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>

  <book year="1999">
    <title>The Economics of Technology and Content for Digital TV</title>
    <editor>
      <last>Gerbarg</last><first>Darcy</first>
      <affiliation>CITI</affiliation>
    </editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>

</bib>

Adapted from Mary Fernandez, Jerome Simeon, and Phil Wadler: XML Query Languages: Experiences and Exemplars, 1999, as adapted in XML Query Use Cases

Bibliography DTD

<!ELEMENT bib (book* )>
<!ELEMENT book  (title,  (author+ | editor+ ), publisher, price )>
<!ATTLIST book year CDATA  #REQUIRED >

<!ELEMENT author      (last, first )>
<!ELEMENT editor      (last, first, affiliation )>
<!ELEMENT title       (#PCDATA )>
<!ELEMENT last        (#PCDATA )>
<!ELEMENT first       (#PCDATA )>
<!ELEMENT affiliation (#PCDATA )>
<!ELEMENT publisher   (#PCDATA )>
<!ELEMENT price       (#PCDATA )>

Adapted from XML Query Use Cases

The XQuery FLWR

FOR: each node selected by an XPath 2.0 location path
LET: a new variable have a specified value
WHERE: a condition expressed in XPath is true
RETURN: this node set
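
For readers who think in Java, the four clauses map roughly onto a loop: FOR is the iteration, LET a local variable, WHERE an if, and RETURN what gets collected. A minimal sketch using the JDK's javax.xml.xpath (XPath 1.0, not XPath 2.0); FlwrSketch and its cut-down BIB string are assumptions made for this illustration, not how an XQuery engine is implemented:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class for illustration only.
final class FlwrSketch {

  // A cut-down copy of a bibliography document (assumed sample data).
  static final String BIB = "<bib>"
      + "<book year='1994'><title>TCP/IP Illustrated</title>"
      + "<publisher>Addison-Wesley</publisher></book>"
      + "<book year='1992'><title>Advanced Programming in the Unix Environment</title>"
      + "<publisher>Addison-Wesley</publisher></book>"
      + "<book year='2000'><title>Data on the Web</title>"
      + "<publisher>Morgan Kaufmann Publishers</publisher></book>"
      + "</bib>";

  // Titles of books from the given publisher with a year later than afterYear.
  static List<String> titles(String xml, String publisher, int afterYear) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      XPath xpath = XPathFactory.newInstance().newXPath();
      List<String> result = new ArrayList<>();

      // FOR $b IN /bib/book
      NodeList books = (NodeList) xpath.evaluate("/bib/book", doc, XPathConstants.NODESET);
      for (int i = 0; i < books.getLength(); i++) {
        Node b = books.item(i);
        // LET $year := $b/@year
        double year = (Double) xpath.evaluate("number(@year)", b, XPathConstants.NUMBER);
        // WHERE $b/publisher = publisher AND $year > afterYear
        if (publisher.equals(xpath.evaluate("publisher", b)) && year > afterYear) {
          // RETURN $b/title (here just its text)
          result.add(xpath.evaluate("title", b));
        }
      }
      return result;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(titles(BIB, "Addison-Wesley", 1993)); // [TCP/IP Illustrated]
  }
}
```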

Query: List titles of all books

   FOR $t IN document("http://www.bn.com")/bib/book/title
   RETURN
     $t

Adapted from XML Query Use Cases

Query Result: Book Titles

  <title>TCP/IP Illustrated</title>
  <title>Advanced Programming in the Unix Environment</title>
  <title>Data on the Web</title>
  <title>The Economics of Technology and Content for Digital TV</title>

Adapted from XML Query Use Cases

Element Constructors

List titles of all books in a bib element. Put each title in a book element.

<bib>
   FOR $t IN document("http://www.bn.com")/bib/book/title
   RETURN
    <book>
     $t
    </book>
</bib>

Adapted from XML Query Use Cases

Query Result: Book Titles

<bib>
  <book>
    <title>TCP/IP Illustrated</title>
  </book>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
  </book>
  <book>
    <title>Data on the Web</title>
  </book>
  <book>
    <title>The Economics of Technology and Content for Digital TV</title>
  </book>
</bib>

Adapted from XML Query Use Cases

Query with WHERE

List titles of books published by Addison-Wesley

<bib>
   FOR $b IN document("http://www.bn.com")/bib/book
   WHERE $b/publisher = "Addison-Wesley"
   RETURN
     $b/title
</bib>

This WHERE clause could be replaced by an XPath predicate:

<bib>
   FOR $b IN document("http://www.bn.com")/bib/book[publisher="Addison-Wesley"]
   RETURN
     $b/title
</bib>

But WHERE clauses can combine multiple variables from multiple documents

Adapted from XML Query Use Cases

Query Result: Titles of books published by Addison-Wesley

<bib>
    <title>TCP/IP Illustrated</title>
    <title>Advanced Programming in the Unix Environment</title>
</bib>

Adapted from XML Query Use Cases

Query with Booleans

XQuery booleans include:
- AND
- OR
- NOT()

List books published by Addison-Wesley after 1993:

<bib>
   FOR $b IN document("http://www.bn.com")/bib/book
   WHERE $b/publisher = "Addison-Wesley" AND $b/@year > 1993
   RETURN
     $b/title
</bib>

Adapted from XML Query Use Cases

Query Result: books published by Addison-Wesley after 1993

<bib>
    <title>TCP/IP Illustrated</title>
</bib>

Adapted from XML Query Use Cases

Attribute Constructors

List books published by Addison-Wesley after 1993, including their year and title:

<bib>
   FOR $b IN document("http://www.bn.com")/bib/book
   WHERE $b/publisher = "Addison-Wesley" AND $b/@year > 1993
   RETURN
    <book year = $b/@year>
     $b/title
    </book>
</bib>

This is not well-formed XML!

Adapted from XML Query Use Cases

Query Result: books published by Addison-Wesley after 1993, including their year and title.

<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
  </book>
</bib>

Adapted from XML Query Use Cases

Query with multiple variables

Create a list of all the title-author pairs, with each pair enclosed in a result element.

<results>
   FOR $b IN document("http://www.bn.com")/bib/book,
     $t IN $b/title,
     $a IN $b/author
   RETURN
    <result>
     $t,
     $a
    </result>
</results>

Adapted from XML Query Use Cases

Query Result: A list of all the title-author pairs

<results>
    <result>
         <title>TCP/IP Illustrated</title>
         <author><last>Stevens</last><first>W.</first></author>
    </result>
    <result>
         <title>Advanced Programming in the Unix Environment</title>
         <author><last>Stevens</last><first>W.</first></author>
    </result>
    <result>
         <title>Data on the Web</title>
         <author><last>Abiteboul</last><first>Serge</first></author>
    </result>
    <result>
         <title>Data on the Web</title>
         <author><last>Buneman</last><first>Peter</first></author>
    </result>
    <result>
         <title>Data on the Web</title>
         <author><last>Suciu</last><first>Dan</first></author>
    </result>
</results>

Adapted from XML Query Use Cases

Nested Queries

For each book in the bibliography, list the title and authors, grouped inside a result element.

<results>
   FOR $b IN document("http://www.bn.com")/bib/book
   RETURN
    <result>
     $b/title,
     FOR $a IN $b/author
     RETURN $a
    </result>
</results>

Adapted from XML Query Use Cases

Query Result: A list of the title and authors of each book in the bibliography

<results>
  <result>
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
  </result>

  <result>
    <title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author>
  </result>

  <result>
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
  </result>

  <result>
    <title>The Economics of Technology and Content for Digital TV</title>
  </result>
</results>

Adapted from XML Query Use Cases
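
The nested FOR is just a loop inside a loop. Because the outer loop visits every book, a book with no author children still yields a result, only with an empty author list. A rough Java sketch with JAXP and XPath 1.0; NestedQuerySketch and its two-book sample string are assumptions for illustration:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class for illustration only.
final class NestedQuerySketch {

  // Assumed sample data: one book with authors, one with only an editor.
  static final String BIB = "<bib>"
      + "<book><title>Data on the Web</title>"
      + "<author><last>Abiteboul</last></author>"
      + "<author><last>Buneman</last></author>"
      + "<author><last>Suciu</last></author></book>"
      + "<book><title>The Economics of Technology and Content for Digital TV</title>"
      + "<editor><last>Gerbarg</last></editor></book>"
      + "</bib>";

  // Maps each book title to the last names of its authors, in document order.
  static Map<String, List<String>> authorsByTitle(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      XPath xpath = XPathFactory.newInstance().newXPath();
      Map<String, List<String>> results = new LinkedHashMap<>();

      // Outer FOR $b IN /bib/book
      NodeList books = (NodeList) xpath.evaluate("/bib/book", doc, XPathConstants.NODESET);
      for (int i = 0; i < books.getLength(); i++) {
        Node b = books.item(i);
        List<String> authors = new ArrayList<>();
        // Inner FOR $a IN $b/author
        NodeList as = (NodeList) xpath.evaluate("author/last", b, XPathConstants.NODESET);
        for (int j = 0; j < as.getLength(); j++) {
          authors.add(as.item(j).getTextContent());
        }
        // One result per book, whether or not it has authors
        results.put(xpath.evaluate("title", b), authors);
      }
      return results;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(authorsByTitle(BIB));
  }
}
```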

Query with distinct

For each author in the bibliography, list the author's name and the titles of all books by that author, grouped inside a result element.

<results>
   FOR $a IN distinct(document("http://www.bn.com")//author)
   RETURN
    <result>
     $a,
     FOR $b IN document("http://www.bn.com")/bib/book[author=$a]
     RETURN $b/title
    </result>
</results>

Adapted from XML Query Use Cases

Query Result

<results>
  <result>
    <author><last>Stevens</last><first>W.</first></author>
    <title>TCP/IP Illustrated</title>
    <title>Advanced Programming in the Unix Environment</title>
  </result>

  <result>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <title>Data on the Web</title>
  </result>

  <result>
    <author><last>Buneman</last><first>Peter</first></author>
    <title>Data on the Web</title>
  </result>

  <result>
    <author><last>Suciu</last><first>Dan</first></author>
    <title>Data on the Web</title>
  </result>
</results>

Adapted from XML Query Use Cases
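
In Java terms, distinct() is a de-duplication step before the outer loop, and the inner query rescans the books once per author. A hedged sketch with JAXP and XPath 1.0; DistinctSketch, its sample data, and the choice to compare authors by a "last, first" string are all assumptions for illustration:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class for illustration only.
final class DistinctSketch {

  static final String BIB = "<bib>"
      + "<book><title>TCP/IP Illustrated</title>"
      + "<author><last>Stevens</last><first>W.</first></author></book>"
      + "<book><title>Advanced Programming in the Unix Environment</title>"
      + "<author><last>Stevens</last><first>W.</first></author></book>"
      + "<book><title>Data on the Web</title>"
      + "<author><last>Abiteboul</last><first>Serge</first></author>"
      + "<author><last>Suciu</last><first>Dan</first></author></book>"
      + "</bib>";

  // Authors are compared by this string; an assumption of the sketch.
  private static String name(XPath xpath, Node author) throws Exception {
    return xpath.evaluate("concat(last, ', ', first)", author);
  }

  // Maps each distinct author to the titles of that author's books.
  static Map<String, List<String>> titlesByAuthor(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      XPath xpath = XPathFactory.newInstance().newXPath();

      // distinct(//author): drop duplicates, keep first-seen order
      Set<String> authors = new LinkedHashSet<>();
      NodeList all = (NodeList) xpath.evaluate("//author", doc, XPathConstants.NODESET);
      for (int i = 0; i < all.getLength(); i++) {
        authors.add(name(xpath, all.item(i)));
      }

      NodeList books = (NodeList) xpath.evaluate("/bib/book", doc, XPathConstants.NODESET);
      Map<String, List<String>> results = new LinkedHashMap<>();
      for (String a : authors) {                        // FOR $a IN distinct(...)
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {   // FOR $b IN /bib/book[author=$a]
          Node b = books.item(i);
          NodeList bookAuthors = (NodeList) xpath.evaluate("author", b, XPathConstants.NODESET);
          for (int j = 0; j < bookAuthors.getLength(); j++) {
            if (a.equals(name(xpath, bookAuthors.item(j)))) {
              titles.add(xpath.evaluate("title", b));
              break;
            }
          }
        }
        results.put(a, titles);
      }
      return results;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(titlesByAuthor(BIB));
  }
}
```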

IF THEN ELSE

Query: For each book that has at least one author, list the title and first two authors, and an empty "et-al" element if the book has additional authors

<bib>
   FOR $b IN document("http://www.bn.com/bib.xml")//book
   WHERE count($b/author) > 0
   RETURN
    <book>
     $b/title,
     FOR $a IN $b/author[RANGE 1 TO 2] RETURN $a,
     IF count($b/author) > 2 THEN <et-al/> ELSE [ ]
    </book>
</bib>

Adapted from XML Query Use Cases

Query Result: Books with the first two authors

<bib>
  <book>
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
  </book>

  <book>
    <title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author>
  </book>

  <book>
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <et-al/>
  </book>
</bib>

Adapted from XML Query Use Cases
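
The WHERE, RANGE and IF pieces of this query correspond to a filter, a bounded loop and a conditional. A minimal Java sketch with JAXP and XPath 1.0; EtAlSketch, its sample data and the semicolon-separated output strings are assumptions made for illustration rather than the XML the query constructs:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class for illustration only.
final class EtAlSketch {

  static final String BIB = "<bib>"
      + "<book><title>TCP/IP Illustrated</title>"
      + "<author><last>Stevens</last></author></book>"
      + "<book><title>Data on the Web</title>"
      + "<author><last>Abiteboul</last></author>"
      + "<author><last>Buneman</last></author>"
      + "<author><last>Suciu</last></author></book>"
      + "<book><title>The Economics of Technology and Content for Digital TV</title>"
      + "<editor><last>Gerbarg</last></editor></book>"
      + "</bib>";

  // One line per book that has authors: title, first two authors, optional "et al."
  static List<String> summaries(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      XPath xpath = XPathFactory.newInstance().newXPath();
      List<String> result = new ArrayList<>();

      NodeList books = (NodeList) xpath.evaluate("//book", doc, XPathConstants.NODESET);
      for (int i = 0; i < books.getLength(); i++) {
        Node b = books.item(i);
        NodeList authors = (NodeList) xpath.evaluate("author/last", b, XPathConstants.NODESET);
        int count = authors.getLength();
        // WHERE count($b/author) > 0
        if (count == 0) {
          continue;
        }
        StringBuilder line = new StringBuilder(xpath.evaluate("title", b));
        // $b/author[RANGE 1 TO 2]
        for (int j = 0; j < Math.min(2, count); j++) {
          line.append("; ").append(authors.item(j).getTextContent());
        }
        // IF count($b/author) > 2 THEN <et-al/> ELSE [ ]
        if (count > 2) {
          line.append("; et al.");
        }
        result.add(line.toString());
      }
      return result;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    for (String line : summaries(BIB)) {
      System.out.println(line);
    }
  }
}
```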

Query with sorting

List the titles and years of all books published by Addison-Wesley after 1991, in alphabetic order.

<bib>
   FOR $b IN document("http://www.bn.com/bib.xml")//book
    [publisher = "Addison-Wesley" AND @year > "1991"]
   RETURN
    <book>
     $b/@year,
     $b/title
    </book> SORTBY (title)
</bib>

Adapted from XML Query Use Cases

Query Result

<bib>
  <book year="1992">
    <title>Advanced Programming in the Unix Environment</title>
  </book>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
  </book>
</bib>

Adapted from XML Query Use Cases

Queries with string functions

Find books in which some element has a tag ending in "or" and the same element contains the string "Suciu" (at any level of nesting). For each such book, return the title and the qualifying element.

FOR $b IN document("http://www.bn.com/bib.xml")//book,
  $e IN $b/*[contains(string(.), "Suciu")]
WHERE ends_with(name($e), "or")
RETURN
   <book>
    $b/title,
    $e
   </book>

Adapted from XML Query Use Cases

Query Result

<book>
  <title>Data on the Web</title>
  <author><last>Suciu</last><first>Dan</first></author>
</book>

Adapted from XML Query Use Cases

A different document about books

Amazon sample data at "http://www.amazon.com/reviews.xml":

<reviews>
    <entry>
      <title>Data on the Web</title>
      <price>34.95</price>
      <review>
         A very good discussion of semi-structured database
         systems and XML.
      </review>
    </entry>
  
    <entry>
      <title>Advanced Programming in the Unix Environment</title>
      <price>65.95</price>
      <review>
         A clear and detailed discussion of UNIX programming.
      </review>
    </entry>
  
    <entry>
      <title>TCP/IP Illustrated</title>
      <price>65.95</price>
      <review>
         One of the best books on TCP/IP.
      </review>
    </entry>
  
</reviews>

Adapted from XML Query Use Cases

This document uses a different DTD

  <!ELEMENT reviews (entry*)>
  <!ELEMENT entry   (title, price, review)>
  <!ELEMENT title   (#PCDATA)>
  <!ELEMENT price   (#PCDATA)>
  <!ELEMENT review  (#PCDATA)>

Query that joins two documents

For each book found at both bn.com and amazon.com, list the title of the book and its price from each source.

<books-with-prices>
   FOR $b IN document("http://www.bn.com/bib.xml")//book,
     $a IN document("http://www.amazon.com/reviews.xml")//entry
   WHERE $b/title = $a/title
   RETURN
    <book-with-prices>
     $b/title,
     <price-amazon> $a/price/text() </price-amazon>,
     <price-bn> $b/price/text() </price-bn>
    </book-with-prices>
</books-with-prices>

Adapted from XML Query Use Cases

Result

<books-with-prices>
  <book-with-prices>
    <title>TCP/IP Illustrated</title>
    <price-amazon>65.95</price-amazon>
    <price-bn>65.95</price-bn>
  </book-with-prices>

  <book-with-prices>
    <title>Advanced Programming in the Unix Environment</title>
    <price-amazon>65.95</price-amazon>
    <price-bn>65.95</price-bn>
  </book-with-prices>

  <book-with-prices>
    <title>Data on the Web</title>
    <price-amazon>34.95</price-amazon>
    <price-bn>39.95</price-bn>
  </book-with-prices>
</books-with-prices>

Adapted from XML Query Use Cases
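
A two-variable FOR over two documents is a nested loop, and the WHERE clause is the join condition. A hedged Java sketch with JAXP and XPath 1.0; JoinSketch, the BN and AMAZON sample strings and the output format are assumptions for illustration. Titles are compared exactly, so stray whitespace in either document would prevent a match:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class for illustration only.
final class JoinSketch {

  static final String BN = "<bib>"
      + "<book><title>TCP/IP Illustrated</title><price>65.95</price></book>"
      + "<book><title>Data on the Web</title><price>39.95</price></book>"
      + "<book><title>The Economics of Technology and Content for Digital TV</title>"
      + "<price>129.95</price></book>"
      + "</bib>";

  static final String AMAZON = "<reviews>"
      + "<entry><title>Data on the Web</title><price>34.95</price></entry>"
      + "<entry><title>TCP/IP Illustrated</title><price>65.95</price></entry>"
      + "</reviews>";

  private static Document parse(String xml) throws Exception {
    return DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
  }

  // One row per title found in both documents, in the order of the first document.
  static List<String> join(String bnXml, String amazonXml) {
    try {
      XPath xpath = XPathFactory.newInstance().newXPath();
      NodeList books = (NodeList) xpath.evaluate("//book", parse(bnXml), XPathConstants.NODESET);
      NodeList entries = (NodeList) xpath.evaluate("//entry", parse(amazonXml), XPathConstants.NODESET);
      List<String> rows = new ArrayList<>();
      for (int i = 0; i < books.getLength(); i++) {        // FOR $b IN ...//book,
        Node b = books.item(i);
        String title = xpath.evaluate("title", b);
        for (int j = 0; j < entries.getLength(); j++) {    //     $a IN ...//entry
          Node a = entries.item(j);
          if (title.equals(xpath.evaluate("title", a))) {  // WHERE $b/title = $a/title
            rows.add(title + ": amazon " + xpath.evaluate("price", a)
                + ", bn " + xpath.evaluate("price", b));
          }
        }
      }
      return rows;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    for (String row : join(BN, AMAZON)) {
      System.out.println(row);
    }
  }
}
```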

Prices DTD

The next query also uses an input document named "prices.xml", with this DTD:

  <!ELEMENT prices (book*)>
  <!ELEMENT book   (title, source, price)>
  <!ELEMENT title  (#PCDATA)>
  <!ELEMENT source (#PCDATA)>
  <!ELEMENT price  (#PCDATA)>

prices.xml Query Sample Data

<prices>
    <book>
      <title>Advanced Programming in the Unix Environment</title>
      <source>www.amazon.com</source>
      <price>65.95</price>
    </book>
  
    <book>
      <title>Advanced Programming in the Unix Environment</title>
      <source>www.bn.com</source>
      <price>65.95</price>
    </book>
  
    <book>
      <title>TCP/IP Illustrated</title>
      <source>www.amazon.com</source>
      <price>65.95</price>
    </book>
  
    <book>
      <title>TCP/IP Illustrated</title>
      <source>www.bn.com</source>
      <price>65.95</price>
    </book>
  
    <book>
      <title>Data on the Web</title>
      <source>www.amazon.com</source>
      <price>34.95</price>
    </book>
  
    <book>
      <title>Data on the Web</title>
      <source>www.bn.com</source>
      <price>39.95</price>
    </book>
</prices>

Adapted from XML Query Use Cases

Query with reused variables

In the document "prices.xml", find the minimum price for each book, in the form of a minprice element with the book title as its title attribute.

<results>
   LET $doc := document("prices.xml")
   FOR $t IN distinct($doc/prices/book/title)
   LET $p := $doc/prices/book[title = $t]/price
   RETURN
    <minprice title = $t/text()>
     min($p)
    </minprice>
</results>

Adapted from XML Query Use Cases

Query Result

<results>
  <minprice title="Advanced Programming in the Unix Environment"> 65.95 </minprice>
  <minprice title="TCP/IP Illustrated"> 65.95 </minprice>
  <minprice title="Data on the Web"> 34.95 </minprice>
</results>

Adapted from XML Query Use Cases
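
In Java the same answer can be had in one pass: instead of distinct() followed by min() over a second selection, keep a map from title to the smallest price seen so far. This is a different technique from the query, swapped in because it is the natural Java idiom. MinPriceSketch and its sample data are assumptions for illustration:

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class for illustration only.
final class MinPriceSketch {

  static final String PRICES = "<prices>"
      + "<book><title>Data on the Web</title>"
      + "<source>www.amazon.com</source><price>34.95</price></book>"
      + "<book><title>Data on the Web</title>"
      + "<source>www.bn.com</source><price>39.95</price></book>"
      + "<book><title>TCP/IP Illustrated</title>"
      + "<source>www.bn.com</source><price>65.95</price></book>"
      + "</prices>";

  // Maps each distinct title to its lowest price, in first-seen order.
  static Map<String, Double> minPrices(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      XPath xpath = XPathFactory.newInstance().newXPath();
      Map<String, Double> result = new LinkedHashMap<>();

      NodeList books = (NodeList) xpath.evaluate("/prices/book", doc, XPathConstants.NODESET);
      for (int i = 0; i < books.getLength(); i++) {
        Node b = books.item(i);
        String title = xpath.evaluate("title", b);
        double price = Double.parseDouble(xpath.evaluate("price", b).trim());
        // Keep the smaller of the stored price and this one
        result.merge(title, price, (x, y) -> Math.min(x, y));
      }
      return result;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(minPrices(PRICES));
  }
}
```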

Multiple FLWR Queries

For each book with an author, return a book with its title and authors. For each book with an editor, return a reference with the book title and the editor's affiliation.

<bib>
   FOR $b IN document("http://www.bn.com/bib.xml")//book[author]
   RETURN
    <book>
     $b/title,
     $b/author
    </book>,
   FOR $b IN document("http://www.bn.com/bib.xml")//book[editor]
   RETURN
    <reference>
     $b/title,
     <org> $b/editor/affiliation/text() </org>
    </reference>
</bib>

Adapted from XML Query Use Cases

Query Result

<bib>
    <book>
         <title>TCP/IP Illustrated</title>
         <author><last>Stevens</last><first>W.</first></author>
    </book>

    <book>
         <title>Advanced Programming in the Unix Environment</title>
         <author><last>Stevens</last><first>W.</first></author>
    </book>

    <book>
         <title>Data on the Web</title>
         <author><last>Abiteboul</last><first>Serge</first></author>
         <author><last>Buneman</last><first>Peter</first></author>
         <author><last>Suciu</last><first>Dan</first></author>
    </book>

    <reference>
        <title>The Economics of Technology and Content for Digital TV</title>
        <org>CITI</org>
    </reference>
</bib>

Adapted from XML Query Use Cases

Query Software

Quilt: http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html
Kweelt: http://db.cis.upenn.edu/Kweelt/
Tamino
Ipedo

To Learn More

XSLT 1.1 Working Draft: http://www.w3.org/TR/xslt11/
XPath 2.0 Requirements: http://www.w3.org/TR/2001/WD-xpath20req-20010214
XSLT 2.0 Requirements: http://www.w3.org/TR/2001/WD-xslt20req-20010214
XQuery: A Query Language for XML: http://www.w3.org/TR/xquery/
XML Query Requirements: http://www.w3.org/TR/xmlquery-req
XML Query Use Cases: http://www.w3.org/TR/xmlquery-use-cases
XML Query Data Model: http://www.w3.org/TR/query-datamodel/
The XML Query Algebra: http://www.w3.org/TR/query-algebra/

Part IV: DOM Level 3

Trees

An XML document is a tree.
It has a root.
It has nodes.
It is amenable to recursive processing.
Not all applications agree on what the root is.
Not all applications agree on what is and isn't a node.

Document Object Model

Defines how XML and HTML documents are represented as objects in programs
W3C Standard
Defined in the Interface Definition Language (IDL) from the OMG; thus language independent
HTML as well as XML
Writing as well as reading
More complete than SAX; covers everything except internal and external DTD subsets
DOM focuses more on the document; SAX focuses more on the parser.

DOM Evolution

DOM Level 0: what was implemented for JavaScript in Netscape 3/IE 3
DOM Level 1, a W3C Standard
DOM Level 2, a W3C Standard
DOM Level 3: Several Working Drafts

DOM Parsers for Java

Apache XML Project's Xerces Java: http://xml.apache.org/xerces-j/
IBM's XML for Java: http://www.alphaworks.ibm.com/formula/xml
Sun's Java API for XML: http://java.sun.com/products/xml
None yet support DOM3

Eight Modules:

- Core: org.w3c.dom *
- HTML: org.w3c.dom.html
- Views: org.w3c.dom.views
- StyleSheets: org.w3c.dom.stylesheets
- CSS: org.w3c.dom.css
- Events: org.w3c.dom.events *
- Traversal: org.w3c.dom.traversal *
- Range: org.w3c.dom.range
Only the core and traversal modules really apply to XML. The other six are for HTML.
* indicates Xerces support

DOM Trees

Entire document is represented as a tree.
A tree contains nodes.
Some nodes may contain other nodes (depending on node type).
Each document node contains:
- zero or one doctype nodes
- one root element node
- zero or more comment and processing instruction nodes

The Core: org.w3c.dom

17 interfaces:
- Attr
- CDATASection
- CharacterData
- Comment
- Document
- DocumentFragment
- DocumentType
- DOMImplementation
- Element
- Entity
- EntityReference
- NamedNodeMap
- Node
- NodeList
- Notation
- ProcessingInstruction
- Text
plus one exception: DOMException
Plus a bunch of HTML stuff in org.w3c.dom.html and other packages we will ignore

The Node Interface

package org.w3c.dom;

public interface Node {

  // NodeType
  public static final short ELEMENT_NODE                = 1;
  public static final short ATTRIBUTE_NODE              = 2;
  public static final short TEXT_NODE                   = 3;
  public static final short CDATA_SECTION_NODE          = 4;
  public static final short ENTITY_REFERENCE_NODE       = 5;
  public static final short ENTITY_NODE                 = 6;
  public static final short PROCESSING_INSTRUCTION_NODE = 7;
  public static final short COMMENT_NODE                = 8;
  public static final short DOCUMENT_NODE               = 9;
  public static final short DOCUMENT_TYPE_NODE          = 10;
  public static final short DOCUMENT_FRAGMENT_NODE      = 11;
  public static final short NOTATION_NODE               = 12;

  public String       getNodeName();
  public String       getNodeValue() throws DOMException;
  public void         setNodeValue(String nodeValue) throws DOMException;
  public short        getNodeType();
  public Node         getParentNode();
  public NodeList     getChildNodes();
  public Node         getFirstChild();
  public Node         getLastChild();
  public Node         getPreviousSibling();
  public Node         getNextSibling();
  public NamedNodeMap getAttributes();
  public Document     getOwnerDocument();
  public Node         insertBefore(Node newChild, Node refChild) throws DOMException;
  public Node         replaceChild(Node newChild, Node oldChild) throws DOMException;
  public Node         removeChild(Node oldChild) throws DOMException;
  public Node         appendChild(Node newChild) throws DOMException;
  public boolean      hasChildNodes();
  public Node         cloneNode(boolean deep);
  public void         normalize();
  public boolean      isSupported(String feature, String version);
  public String       getNamespaceURI();
  public String       getPrefix();
  public void         setPrefix(String prefix) throws DOMException;
  public String       getLocalName();
}

The NodeList Interface

package org.w3c.dom;

public interface NodeList {
  public Node item(int index);
  public int  getLength();
}

Now we're really ready to read a document

Node Reporter

import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;


public class NodeReporter {

  public static void main(String[] args) {
     
    DOMParser parser = new DOMParser();
    NodeReporter iterator = new NodeReporter();
    
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]); 
       
        Document doc = parser.getDocument();
        iterator.followNode(doc);
      }
      catch (SAXException e) {
        System.err.println(e); 
      }
      catch (IOException e) {
        System.err.println(e); 
      }
      
    }
  
  } // end main

  // note use of recursion
  public void followNode(Node node) {
    
    processNode(node);
    if (node.hasChildNodes()) {
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        followNode(children.item(i));
      } 
    }
    
  }

  public void processNode(Node node) {
    
    String name = node.getNodeName();
    String type = getTypeName(node.getNodeType());
    System.out.println("Type " + type + ": " + name);
    
  }
  
  public static String getTypeName(int type) {
    
    switch (type) {
      case Node.ELEMENT_NODE: return "Element";
      case Node.ATTRIBUTE_NODE: return "Attribute";
      case Node.TEXT_NODE: return "Text";
      case Node.CDATA_SECTION_NODE: return "CDATA Section";
      case Node.ENTITY_REFERENCE_NODE: return "Entity Reference";
      case Node.ENTITY_NODE: return "Entity";
      case Node.PROCESSING_INSTRUCTION_NODE: return "Processing Instruction";
      case Node.COMMENT_NODE : return "Comment";
      case Node.DOCUMENT_NODE: return "Document";
      case Node.DOCUMENT_TYPE_NODE: return "Document Type Declaration";
      case Node.DOCUMENT_FRAGMENT_NODE: return "Document Fragment";
      case Node.NOTATION_NODE: return "Notation";
      default: return "Unknown Type"; 
    }
    
  }

}

Node Reporter Output

% java NodeReporter hotcop.xml
Type Document: #document
Type Processing Instruction: xml-stylesheet
Type Document Type Declaration: SONG
Type Element: SONG
Type Text: #text
Type Element: TITLE
Type Text: #text
Type Text: #text
Type Element: PHOTO
Type Text: #text
Type Element: COMPOSER
Type Text: #text
Type Text: #text
Type Element: COMPOSER
Type Text: #text
Type Text: #text
Type Element: COMPOSER
Type Text: #text
Type Text: #text
Type Element: PRODUCER
Type Text: #text
Type Text: #text
Type Comment: #comment
Type Text: #text
Type Element: PUBLISHER
Type Text: #text
Type Text: #text
Type Element: LENGTH
Type Text: #text
Type Text: #text
Type Element: YEAR
Type Text: #text
Type Text: #text
Type Element: ARTIST
Type Text: #text
Type Text: #text
Type Comment: #comment

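Attributes are missing from this output. Attr objects implement Node, but they are not children of their element, so getChildNodes() never returns them. They are reached through getAttributes().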
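
A minimal sketch of reporting attributes alongside the tree walk. It assumes the JAXP DocumentBuilderFactory bundled with the JDK instead of the Xerces DOMParser class used above; AttributeReporter and report() are illustrative names only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AttributeReporter {

  // Walks the tree recursively, listing each element's attributes as well
  static void followNode(Node node, StringBuilder out) {
    out.append(node.getNodeName()).append('\n');
    NamedNodeMap attributes = node.getAttributes(); // null unless node is an element
    if (attributes != null) {
      for (int i = 0; i < attributes.getLength(); i++) {
        Node attribute = attributes.item(i);
        out.append("  @").append(attribute.getNodeName())
           .append('=').append(attribute.getNodeValue()).append('\n');
      }
    }
    for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
      followNode(child, out);
    }
  }

  // Parses the XML text and returns one line per node and attribute
  static String report(String xml) {
    try {
      Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      StringBuilder out = new StringBuilder();
      followNode(doc, out);
      return out.toString();
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.print(report("<PHOTO HEIGHT=\"200\" WIDTH=\"100\"/>"));
  }

}
```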

Node Values as returned by getNodeValue()

Node Type	Node Value
element node	null
attribute node	attribute value
text node	text of the node
CDATA section node	text of the section
entity reference node	null
entity node	null
processing instruction node	content of the processing instruction, not including the target
comment node	text of the comment
document node	null
document type declaration node	null
document fragment node	null
notation node	null
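
The table can be checked with a short sketch that prints getNodeName() and getNodeValue() for every node in a small document. It assumes the JDK's JAXP parser with its default settings (CDATA sections are not merged into text); NodeValueReporter is a made-up class name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeValueReporter {

  // Appends "name -> value" for the node, its attributes, and its descendants
  static void collect(Node node, StringBuilder out) {
    line(node, out);
    NamedNodeMap attributes = node.getAttributes(); // null for non-elements
    if (attributes != null) {
      for (int i = 0; i < attributes.getLength(); i++) {
        line(attributes.item(i), out);
      }
    }
    for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
      collect(child, out);
    }
  }

  private static void line(Node node, StringBuilder out) {
    out.append(node.getNodeName()).append(" -> ")
       .append(node.getNodeValue()).append('\n');
  }

  static String report(String xml) {
    try {
      Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      StringBuilder out = new StringBuilder();
      collect(doc, out);
      return out.toString();
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.print(report("<a x=\"1\">t<!--c--><?p d?><![CDATA[z]]></a>"));
  }

}
```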

New Features in DOM Level 3

Grammar access, a.k.a. content models (DTDs and schemas)
Extra attributes on Entity, Document, Node, and Text interfaces
Standard means of loading and saving XML documents.
Bootstrapping new documents
Key events

DOM Level 3 Core Additions

DOMKey
Node3
Document3
Text3
Entity3
Bootstrapping

DOMKey

The DOM implementation automatically generates a unique key for every node.
Type, attributes, and methods of the DOMKey interface remain to be determined

Node3

Extends DOM2 Node.
Adds:

baseURI

The URI this document came from. May be null.

Document order

The order of a node relative to another reference node in document order

Tree Position

The order of a node relative to another reference node in tree order

Methods to test for equality

Methods to work with namespaces

In IDL:

interface Node3 {

  readonly attribute DOMString baseURI;

  typedef enum _DocumentOrder {
    DOCUMENT_ORDER_PRECEDING,
    DOCUMENT_ORDER_FOLLOWING,
    DOCUMENT_ORDER_SAME,
    DOCUMENT_ORDER_UNORDERED
  } DocumentOrder;
  DocumentOrder compareDocumentOrder(in Node other) raises(DOMException);

  typedef enum _TreePosition {
    TREE_POSITION_PRECEDING,
    TREE_POSITION_FOLLOWING,
    TREE_POSITION_ANCESTOR,
    TREE_POSITION_DESCENDANT,
    TREE_POSITION_SAME,
    TREE_POSITION_UNORDERED
  } TreePosition;
  TreePosition compareTreePosition(in Node other) raises(DOMException);
           attribute DOMString textContent;
  readonly attribute DOMKey    key;
  
  boolean    isSameNode(in Node other);
  DOMString  lookupNamespacePrefix(in DOMString namespaceURI);
  DOMString  lookupNamespaceURI(in DOMString prefix);
  void       normalizeNS();
  boolean    equalsNode(in Node arg, in boolean deep);
};

Java binding:

package org.w3c.dom;

public interface Node3 {

  public String getBaseURI();

  public static final int DOCUMENT_ORDER_PRECEDING = 1;
  public static final int DOCUMENT_ORDER_FOLLOWING = 2;
  public static final int DOCUMENT_ORDER_SAME      = 3;
  public static final int DOCUMENT_ORDER_UNORDERED = 4;
  
  public int compareDocumentOrder(Node other) throws DOMException;

  public static final int TREE_POSITION_PRECEDING  = 1;
  public static final int TREE_POSITION_FOLLOWING  = 2;
  public static final int TREE_POSITION_ANCESTOR   = 3;
  public static final int TREE_POSITION_DESCENDANT = 4;
  public static final int TREE_POSITION_SAME       = 5;
  public static final int TREE_POSITION_UNORDERED  = 6;

  public int compareTreePosition(Node other) throws DOMException;

  public String getTextContent();
  public void   setTextContent(String textContent);

  public boolean isSameNode(Node other);
  public boolean equalsNode(Node arg, boolean deep);

  public String lookupNamespacePrefix(String namespaceURI);
  public String lookupNamespaceURI(String prefix);
  public void   normalizeNS();

  public Object getKey();

}
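
For experimenting today, the org.w3c.dom.Node interface that ships with the JDK carries methods that look like the descendants of these draft additions: compareDocumentPosition() returns a bitmask instead of the two enums, and isEqualNode(), isSameNode(), getTextContent(), and lookupNamespaceURI() sit directly on Node rather than on a separate Node3. The sketch below assumes that JDK API; Node3Demo and its helper names are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class Node3Demo {

  // Namespace-aware parse so that lookupNamespaceURI() has something to find
  static Document parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // True if other comes after node in document order
  static boolean precedes(Node node, Node other) {
    return (node.compareDocumentPosition(other) & Node.DOCUMENT_POSITION_FOLLOWING) != 0;
  }

  // True if other is a descendant of node
  static boolean isAncestorOf(Node node, Node other) {
    return (node.compareDocumentPosition(other) & Node.DOCUMENT_POSITION_CONTAINED_BY) != 0;
  }

  public static void main(String[] args) {
    Document doc = parse("<s:SONG xmlns:s=\"http://www.ibiblio.org/xml/namespace/song\">"
        + "<s:TITLE>Hot Cop</s:TITLE><s:YEAR>1978</s:YEAR></s:SONG>");
    Node song = doc.getDocumentElement();
    Node title = song.getFirstChild();
    Node year = song.getLastChild();

    System.out.println("title precedes year: " + precedes(title, year));
    System.out.println("song contains title: " + isAncestorOf(song, title));
    System.out.println("text content: " + song.getTextContent());
    System.out.println("namespace of s: " + title.lookupNamespaceURI("s"));

    Node copy = song.cloneNode(true);
    System.out.println("copy is equal: " + song.isEqualNode(copy));
    System.out.println("copy is same:  " + song.isSameNode(copy));
  }

}
```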

Entity3

XML documents may be built from multiple parsed entities, each of which is not necessarily a well-formed XML document, but is at least a plausible part of a well-formed XML document.
Each external parsed entity may have its own text declaration. This is like an XML declaration with no standalone declaration, an optional version, and a required encoding declaration:
```
<?xml version="1.0" encoding="ISO-8859-9"?>
<?xml encoding="ISO-8859-9"?>
```
DOM3 Entity3 extends DOM2 Entity to add information from text declarations
Adds:

encoding

A string specifying what encoding the text declaration claims this entity uses. This is null if this entity is not an external parsed entity.

actualEncoding

A string specifying the actual encoding of this entity. This is null if this entity is not an external parsed entity.

version

A string specifying the XML version given in the text declaration. This is null if this entity is not an external parsed entity.

In IDL:

interface Entity3 : Entity {
  attribute DOMString  actualEncoding;
  attribute DOMString  encoding;
  attribute DOMString  version;
};

Java binding:


package org.w3c.dom;

public interface Entity3 extends Entity {
  
  public String getActualEncoding();
  public void   setActualEncoding(String actualEncoding);
  public String getEncoding();
  public void   setEncoding(String encoding);
  public String getVersion();
  public void   setVersion(String version);
  
}

Document3

Extends DOM2 Document.

Adds:

XML Declaration

encoding, version, and standalone attributes:

<?xml version="1.0"?>
<?xml version="1.0" encoding="ISO-8859-9"?>
<?xml version="1.0" encoding="ISO-8859-9" standalone="no"?>
<?xml version="1.0" standalone="yes"?>

adoptNode()

Method to move node from one document to another

In IDL:

interface Document3 : Document {
  attribute DOMString actualEncoding;
  attribute DOMString encoding;
  attribute boolean   standalone;
  attribute boolean   strictErrorChecking;
  attribute DOMString version;
  
  Node adoptNode(in Node source) raises(DOMException);
};

Java binding:

package org.w3c.dom;

public interface Document3 extends Document {

  public String  getActualEncoding();
  public void    setActualEncoding(String actualEncoding);

  public String  getEncoding();
  public void    setEncoding(String encoding);

  public boolean getStandalone();
  public void    setStandalone(boolean standalone);

  public boolean getStrictErrorChecking();
  public void    setStrictErrorChecking(boolean strictErrorChecking);

  public String  getVersion();
  public void    setVersion(String version);

  public Node    adoptNode(Node source) throws DOMException;

}
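
A rough sketch of the same ideas against the Document interface bundled with the JDK, which exposes the XML declaration through getXmlVersion(), getXmlEncoding(), and getXmlStandalone(), and also has adoptNode(). The names differ from the draft binding above, so treat the mapping as an assumption. Document3Demo and its helper methods are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class Document3Demo {

  // Parses from bytes so the parser sees the encoding declaration itself
  static Document parse(byte[] xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml));
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  static String describeDeclaration(Document doc) {
    return "version=" + doc.getXmlVersion()
        + " encoding=" + doc.getXmlEncoding()
        + " standalone=" + doc.getXmlStandalone();
  }

  // Moves the first child of from's root element under to's root element
  static Node moveFirstChild(Document from, Document to) {
    Node child = from.getDocumentElement().getFirstChild();
    Node adopted = to.adoptNode(child); // detaches child and changes its owner document
    if (adopted == null) {
      throw new IllegalStateException("This implementation could not adopt the node");
    }
    to.getDocumentElement().appendChild(adopted);
    return adopted;
  }

  public static void main(String[] args) {
    Document song = parse(
        ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\" standalone=\"yes\"?>"
        + "<SONG><TITLE>Hot Cop</TITLE></SONG>").getBytes(StandardCharsets.ISO_8859_1));
    System.out.println(describeDeclaration(song));

    Document album = parse("<ALBUM/>".getBytes(StandardCharsets.UTF_8));
    Node moved = moveFirstChild(song, album);
    System.out.println("owner changed: " + (moved.getOwnerDocument() == album));
  }

}
```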

Text3

Extends DOM2 Text interface
Adds:

isWhitespaceInElementContent()

Returns true if this node contains "ignorable" whitespace

In IDL:

interface Text3 : Text {
  readonly attribute boolean isWhitespaceInElementContent;
};

Java binding:

package org.w3c.dom;

public interface Text3 extends Text {

  public boolean getIsWhitespaceInElementContent();

}

Bootstrapping

DOM2 has no implementation-independent means to create a new Document object
Implementation-dependent methods tend to be fairly complex. For example, in Xerces-J:
DOMImplementation impl = DOMImplementationImpl.getDOMImplementation();
Document fibonacci = impl.createDocument(null, "Fibonacci_Numbers", null);

DOM3 Bootstrapping

Still no language-independent means to create a new Document object
Does provide an implementation-independent method for Java only:
DOMImplementation impl = DOMImplementationFactory.getDOMImplementation();

package org.w3c.dom;

public abstract class DOMImplementationFactory {

  // The system property to specify the DOMImplementation class name.
  private static String property = "org.w3c.dom.DOMImplementation";

  // The default DOMImplementation class name to use.
  private static String defaultImpl = "NO DEFAULT IMPLEMENTATION SET";

   public static DOMImplementation getDOMImplementation()
     throws ClassNotFoundException, InstantiationException,
            IllegalAccessException, ClassCastException {
     // Retrieve the system property
     String impl;
     try {
       impl = System.getProperty(property, defaultImpl);
     } 
     catch (SecurityException e) {
       // fallback on default implementation in case of security problem
       impl = defaultImpl;
     }

     // Attempt to load, instantiate and return the implementation class
     return (DOMImplementation) Class.forName(impl).newInstance();
  }
   
}
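
The JDK ships a bootstrapping class with a different name, org.w3c.dom.bootstrap.DOMImplementationRegistry, which plays the role sketched by DOMImplementationFactory above. The following is a minimal sketch assuming that class and the "XML 3.0" feature string from its documentation; Dom3Bootstrap and newDocument() are illustrative names.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

class Dom3Bootstrap {

  // Creates an empty document with the given root element, without naming a parser class
  static Document newDocument(String rootName) {
    try {
      DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
      DOMImplementation impl = registry.getDOMImplementation("XML 3.0");
      if (impl == null) {
        throw new IllegalStateException("No registered DOM implementation supports XML 3.0");
      }
      return impl.createDocument(null, rootName, null);
    }
    catch (ClassNotFoundException | InstantiationException | IllegalAccessException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Document fibonacci = newDocument("Fibonacci_Numbers");
    System.out.println(fibonacci.getDocumentElement().getNodeName());
  }

}
```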

Load and Save

Loading: parsing an existing XML document to produce a Document object
Saving: serializing a Document object into a file or onto a stream
Completely implementation dependent in DOM2

The DOM Process

Library specific code creates a parser
The parser parses the document and returns a DOM org.w3c.dom.Document object.
The entire document is stored in memory.
DOM methods and interfaces are used to extract data from this object

Parsing documents with DOM2

This program parses with Xerces. Other parsers are different.

import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMParserMaker {

  public static void main(String[] args) {
     
    DOMParser parser = new DOMParser();
    
    for (int i = 0; i < args.length; i++) {
      try {
        parser.parse(args[i]); 
       
        Document d = parser.getDocument();
      }
      catch (SAXException e) {
        System.err.println(e); 
      }
      catch (IOException e) {
        System.err.println(e); 
      }
      
    }
   
  }

}

Parsing documents with DOM3

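import org.w3c.dom.*;
import org.w3c.dom.loadSave.*;

public class DOM3ParserMaker {

  // getDOMImplementation() declares checked exceptions,
  // so main() simply passes them on
  public static void main(String[] args) throws Exception {

    DOMImplementationLS impl =
      (DOMImplementationLS) DOMImplementationFactory.getDOMImplementation();
    DOMBuilder parser = impl.createDOMBuilder();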

    for (int i = 0; i < args.length; i++) {
      try {
        Document d = parser.parseURI(args[i]);
      }
      catch (DOMSystemException e) {
        System.err.println(e);
      }
      catch (DOMException e) {
        System.err.println(e);
      }

    }

  }

}

This code will not actually compile or run until some parser supports DOM3 Load and Save.
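
For comparison, a sketch that does run on a current JDK. It assumes the org.w3c.dom.ls package bundled there, in which the parser interface is named LSParser rather than DOMBuilder and the input source is LSInput rather than DOMInputSource. LSParserMaker, loadAndSave(), and parseString() are illustrative names.

```java
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

class LSParserMaker {

  // Finds a DOM implementation that supports the Load and Save feature
  static DOMImplementationLS loadAndSave() {
    try {
      Object impl = DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
      if (impl == null) {
        throw new IllegalStateException("No DOM implementation supports Load and Save");
      }
      return (DOMImplementationLS) impl;
    }
    catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  // Parses XML held in a string instead of at a URI
  static Document parseString(String xml) {
    DOMImplementationLS impl = loadAndSave();
    LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
    LSInput input = impl.createLSInput();
    input.setStringData(xml);
    return parser.parse(input);
  }

  public static void main(String[] args) {
    LSParser parser =
        loadAndSave().createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
    for (String uri : args) {
      try {
        Document d = parser.parseURI(uri);
        System.out.println(uri + ": " + d.getDocumentElement().getNodeName());
      }
      catch (LSException e) {
        System.err.println(e);
      }
    }
  }

}
```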

Load and Save

DOMImplementationLS: A new DOMImplementation interface that provides the factory methods for creating the objects required for loading and saving.
DOMBuilder: A parser interface
DOMInputSource: Encapsulates information about the source of the XML to be loaded, like SAX's InputSource
DOMEntityResolver: During loading, provides a way for applications to redirect references to external entities.
DOMBuilderFilter: Provides the ability to examine and optionally remove Element nodes as they are being processed during the parsing of a document, like SAX filters.
DOMWriter: An interface for serializing DOM documents onto a stream.

DOMImplementationLS

Factory interface to create new DOMBuilder and DOMWriter implementations.

IDL:

  interface DOMImplementationLS {
    DOMBuilder  createDOMBuilder();
    DOMWriter   createDOMWriter();
  };

Java Binding:

package org.w3c.dom.loadSave;

public interface DOMImplementationLS {

    public DOMBuilder createDOMBuilder();
    public DOMWriter  createDOMWriter();

}

DOMBuilder

Provides an implementation-independent API for parsing XML documents to produce a DOM Document object.
Instances are built by the createDOMBuilder() method in DOMImplementationLS.

IDL:

  interface DOMBuilder {
    attribute DOMEntityResolver entityResolver;
    attribute DOMErrorHandler   errorHandler;
    attribute DOMBuilderFilter  filter;
             
    void      setFeature(in DOMString name, 
                         in boolean state)
                            raises(dom::DOMException);
    boolean   supportsFeature(in DOMString name);
    boolean   canSetFeature(in DOMString name, 
                            in boolean state);
    boolean   getFeature(in DOMString name)
                          raises(dom::DOMException);
    Document  parseURI(in DOMString uri)
                        raises(dom::DOMException, dom::DOMSystemException);
    Document  parseDOMInputSource(in DOMInputSource is)
                                     raises(dom::DOMException, 
                                            dom::DOMSystemException);
  };

Java Binding:

package org.w3c.dom.loadSave;

public interface DOMBuilder {

    public DOMEntityResolver getEntityResolver();
    public void setEntityResolver(DOMEntityResolver entityResolver);

    public DOMErrorHandler getErrorHandler();
    public void setErrorHandler(DOMErrorHandler errorHandler);

    public DOMBuilderFilter getFilter();
    public void setFilter(DOMBuilderFilter filter);

    public void setFeature(String name, boolean state)
      throws DOMException;
    public boolean supportsFeature(String name);
    public boolean canSetFeature(String name, boolean state);
    public boolean getFeature(String name) throws DOMException;
    public Document parseURI(String uri)
      throws DOMException, DOMSystemException;
    public Document parseDOMInputSource(DOMInputSource is)
     throws DOMException, DOMSystemException;

}

DOMInputSource

Like SAX2's InputSource class, this interface is an abstraction of all the different things (streams, files, byte arrays, sockets, URLs, etc.) from which an XML document can be read.

IDL:

  interface DOMInputSource {
    attribute DOMInputStream  byteStream;
    attribute DOMReader       characterStream;
    attribute DOMString       encoding;
    attribute DOMString       publicId;
    attribute DOMString       systemId;
  };

Java Binding:

package org.w3c.dom.loadSave;

import java.io.InputStream;
import java.io.Reader;

public interface DOMInputSource {

    public InputStream getByteStream();
    public void        setByteStream(InputStream in);
    public Reader      getCharacterStream();
    public void        setCharacterStream(Reader in);

    public String getEncoding();
    public void   setEncoding(String encoding);
    public String getPublicId();
    public void   setPublicId(String publicId);
    public String getSystemId();
    public void   setSystemId(String systemId);

}

DOMEntityResolver

Like SAX2's EntityResolver interface, this interface lets applications redirect references to external entities.

IDL:

  interface DOMEntityResolver {
    DOMInputSource resolveEntity(in DOMString publicId, 
                                 in DOMString systemId )
                                    raises(dom::DOMSystemException);
  };

Java Binding:

package org.w3c.dom.loadSave;

public interface DOMEntityResolver {

    public DOMInputSource resolveEntity(String publicId, 
      String systemId ) throws DOMSystemException;

}
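
Since no parser implements DOMEntityResolver yet, the redirection idea is sketched here with the SAX EntityResolver that it is modeled on, plugged into a JAXP DocumentBuilder. This swaps in a different API from the one above. The DTD URL, the entity name, and the class name RedirectingResolverDemo are all made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class RedirectingResolverDemo {

  // Parses xml, answering every external entity request with the given DTD text
  // instead of fetching the system ID, and returns the root element's text
  static String parseWithLocalDtd(String xml, String dtd) {
    try {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      builder.setEntityResolver(
          (publicId, systemId) -> new InputSource(new StringReader(dtd)));
      Document doc = builder.parse(new InputSource(new StringReader(xml)));
      return doc.getDocumentElement().getTextContent();
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String xml = "<!DOCTYPE SONG SYSTEM \"http://example.invalid/song.dtd\">"
        + "<SONG>&COPY01;</SONG>";
    String dtd = "<!ENTITY COPY01 \"Copyright 2001 Elliotte Harold\">";
    System.out.println(parseWithLocalDtd(xml, dtd));
  }

}
```

The resolver never touches the network, which is the usual reason for installing one: the document keeps its public system identifier while the application supplies a local copy.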

DOMWriter

Provides an API for serializing (writing) a DOM document out as a sequence of bytes onto a stream, file, socket, byte array, etc.

IDL:

  interface DOMWriter {
             attribute DOMString encoding;
    readonly attribute DOMString lastEncoding;
             attribute unsigned short format;
    // Modified in DOM Level 3:
             attribute DOMString newLine;
    void  writeNode(in DOMOutputStream destination, in Node node)
                       raises(dom::DOMSystemException);
  };

Java Binding:

package org.w3c.dom.loadSave;

import java.io.OutputStream;

public interface DOMWriter {

    public String getEncoding();
    public void   setEncoding(String encoding);
    public String getLastEncoding();
    public short  getFormat();
    public void   setFormat(short format);
    public String getNewLine();
    public void   setNewLine(String newLine);
    public void   writeNode(OutputStream out, Node node)
      throws DOMSystemException;

}
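
A sketch of serialization that runs on a current JDK. It assumes the bundled org.w3c.dom.ls package, where the writer is named LSSerializer and the destination and its encoding are wrapped in an LSOutput rather than set on the writer. LSWriterDemo and writeSong() are illustrative names.

```java
import java.io.ByteArrayOutputStream;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

class LSWriterDemo {

  // Builds a tiny SONG document and serializes it as bytes in the given encoding
  static byte[] writeSong(String title, String encoding) {
    DOMImplementation impl;
    try {
      impl = DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
    }
    catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
    if (impl == null) {
      throw new IllegalStateException("No DOM implementation supports Load and Save");
    }

    Document doc = impl.createDocument(null, "SONG", null);
    Element titleElement = doc.createElementNS(null, "TITLE");
    titleElement.setTextContent(title);
    doc.getDocumentElement().appendChild(titleElement);

    DOMImplementationLS ls = (DOMImplementationLS) impl;
    LSSerializer serializer = ls.createLSSerializer();
    serializer.setNewLine("\n");

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    LSOutput output = ls.createLSOutput();
    output.setByteStream(bytes);
    output.setEncoding(encoding);
    serializer.write(doc, output);
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws Exception {
    System.out.write(writeSong("Hot Cop", "ISO-8859-1"));
    System.out.println();
  }

}
```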

DOMBuilderFilter

Lets applications examine element nodes as they are being constructed during a parse.
As each element is examined, it may be modified or removed, or parsing may be aborted.

IDL:

  interface DOMBuilderFilter {
    boolean endElement(in Element element);
  };

Java Binding:

package org.w3c.dom.loadSave;

public interface DOMBuilderFilter {

  public boolean endElement(Element element);

}
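
The filtering idea can be tried on a current JDK through org.w3c.dom.ls.LSParserFilter, which has a different shape from the single-method draft interface above: it returns FILTER_ACCEPT, FILTER_REJECT, FILTER_SKIP, or FILTER_INTERRUPT rather than a boolean. The sketch below assumes that interface; ElementDroppingFilter and parseWithout() are invented names.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

class ElementDroppingFilter implements LSParserFilter {

  private final String rejectedName;

  ElementDroppingFilter(String rejectedName) {
    this.rejectedName = rejectedName;
  }

  // Called when a start-tag has been read; let everything through at this point
  @Override
  public short startElement(Element element) {
    return FILTER_ACCEPT;
  }

  // Called when an element is complete; reject the ones with the unwanted name
  @Override
  public short acceptNode(Node node) {
    return rejectedName.equals(node.getNodeName()) ? FILTER_REJECT : FILTER_ACCEPT;
  }

  // Only elements are passed to acceptNode()
  @Override
  public int getWhatToShow() {
    return NodeFilter.SHOW_ELEMENT;
  }

  // Parses xml while dropping every non-root element named rejectedName
  static Document parseWithout(String xml, String rejectedName) {
    DOMImplementationLS impl;
    try {
      impl = (DOMImplementationLS)
          DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
    }
    catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
    LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
    parser.setFilter(new ElementDroppingFilter(rejectedName));
    LSInput input = impl.createLSInput();
    input.setStringData(xml);
    return parser.parse(input);
  }

  public static void main(String[] args) {
    Document doc = parseWithout(
        "<SONG><TITLE>Hot Cop</TITLE><PRODUCER>Jacques Morali</PRODUCER></SONG>",
        "PRODUCER");
    System.out.println(doc.getElementsByTagName("PRODUCER").getLength());
  }

}
```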

Grammar Access/Content Models

Content Model Interfaces

Content Model and CM-Editing Interfaces:
- CMModel
- CMExternalModel
- CMNode
- CMNodeList
- CMNamedNodeMap
- CMDataType
- ElementDeclaration
- CMChildren
- AttributeDeclaration
- EntityDeclaration
- CMNotationDeclaration
Validation and Other Interfaces:
- Document
- DocumentCM
- DOMImplementationCM
Document-Editing Interfaces:
- NodeCM
- ElementCM
- CharacterDataCM
- DocumentTypeCM
- AttributeCM
DOM Error Handler Interfaces:
- DOMErrorHandler
- DOMLocator

Content Model and CM-Editing Interfaces

DOMImplementation.hasFeature("CM-EDIT") returns true if a given DOM supports these interfaces for editing content models.
- CMModel
- CMExternalModel
- CMNode
- CMNodeList
- CMNamedNodeMap
- CMDataType
- ElementDeclaration
- CMChildren
- AttributeDeclaration
- EntityDeclaration
- CMNotationDeclaration

The CMModel Interface

Represents an abstract content model that could be a DTD, an XML Schema, a database schema, or something else. It has both an internal and external subset.

IDL:

  interface CMModel : CMNode {
    readonly attribute boolean          isNamespaceAware;
    readonly attribute ElementDeclaration  rootElementDecl;
    DOMString          getLocation();
    nsElement          getCMNamespace();
    CMNamedNodeMap     getCMNodes();
    boolean            removeNode(in CMNode node);
    boolean            insertBefore(in CMNode newNode, 
                                    in CMNode refNode);
    boolean            validate();
  };

Java binding:

package org.w3c.dom.contentModel;

public interface CMModel extends CMNode {

    public boolean getIsNamespaceAware();
    public ElementDeclaration getRootElementDecl();
    public String getLocation();
    public nsElement getCMNamespace();
    public CMNamedNodeMap getCMNodes();
    public boolean removeNode(CMNode node);
    public boolean insertBefore(CMNode newNode, CMNode refNode);
    public boolean validate();  

}

The CMExternalModel Interface

A CMModel that is not bound to a particular document, and can thus be shared among documents.

IDL:

  interface CMExternalModel : CMModel {
  };

Java binding:

package org.w3c.dom.contentModel;

public interface CMExternalModel extends CMModel {
}

The CMNode Interface

The node for the various kinds of declarations out of which CMModels are built

IDL:

  interface CMNode {
    const unsigned short ELEMENT_DECLARATION     = 1;
    const unsigned short ATTRIBUTE_DECLARATION   = 2;
    const unsigned short CM_NOTATION_DECLARATION = 3;
    const unsigned short ENTITY_DECLARATION      = 4;
    const unsigned short CM_CHILDREN             = 5;
    const unsigned short CM_MODEL                = 6;
    const unsigned short CM_EXTERNALMODEL        = 7;
    
    readonly attribute unsigned short cmNodeType;
    
    CMNode             cloneCM();
    CMNode             cloneExternalCM();
  };

Java binding:

package org.w3c.dom.contentModel;

public interface CMNode {

    public static final short ELEMENT_DECLARATION       = 1;
    public static final short ATTRIBUTE_DECLARATION     = 2;
    public static final short CM_NOTATION_DECLARATION   = 3;
    public static final short ENTITY_DECLARATION        = 4;
    public static final short CM_CHILDREN               = 5;
    public static final short CM_MODEL                  = 6;
    public static final short CM_EXTERNALMODEL          = 7;

    public short  getCmNodeType();
    public CMNode cloneCM();
    public CMNode cloneExternalCM();

}

The CMNodeList Interface

An ordered list of the nodes in a content model
IDL:
```
  interface CMNodeList {
  };
```

Java binding:

package org.w3c.dom.contentModel;

public interface CMNodeList {
}

The CMNamedNodeMap Interface

An unordered set of CM nodes
IDL:
```
  interface CMNamedNodeMap {
  };
```

Java binding:

package org.w3c.dom.contentModel;

public interface CMNamedNodeMap {
}

The CMDataType Interface

Primitive data types used in content models
This one is a little weak

IDL:

  interface CMDataType {
    const     short STRING_DATATYPE  = 1;
    const     short BOOLEAN_DATATYPE = 2;
    const     short FLOAT_DATATYPE   = 3;
    const     short DOUBLE_DATATYPE  = 4;
    const     short LONG_DATATYPE    = 5;
    const     short INT_DATATYPE     = 6;
    const     short SHORT_DATATYPE   = 7;
    const     short BYTE_DATATYPE    = 8;
    
    attribute int   lowValue;
    attribute int   highValue;
    
              short getPrimitiveType();
  };

Java binding:

package org.w3c.dom.contentModel;

public interface CMDataType {

    public static final short STRING_DATATYPE  = 1;
    public static final short BOOLEAN_DATATYPE = 2;
    public static final short FLOAT_DATATYPE   = 3;
    public static final short DOUBLE_DATATYPE  = 4;
    public static final short LONG_DATATYPE    = 5;
    public static final short INT_DATATYPE     = 6;
    public static final short SHORT_DATATYPE   = 7;
    public static final short BYTE_DATATYPE    = 8;
    
    public int  getLowValue();
    public void setLowValue(int lowValue);
    public int  getHighValue();
    public void setHighValue(int highValue);

    public short getPrimitiveType();

}

The ElementDeclaration Interface

Represents a declaration of an element such as <!ELEMENT TIME (#PCDATA)> or an xsd:element schema element

IDL:

  interface ElementDeclaration {
    int            getContentType();
    CMChildren     getCMChildren();
    CMNamedNodeMap getCMAttributes();
    CMNamedNodeMap getCMGrandChildren();
  };

Java binding:

package org.w3c.dom.contentModel;

public interface ElementDeclaration {

    public int            getContentType();
    public CMChildren     getCMChildren();
    public CMNamedNodeMap getCMAttributes();
    public CMNamedNodeMap getCMGrandChildren();

}

The CMChildren Interface

Represents an element in the context of a CMNode.

IDL:

  interface CMChildren {
             attribute DOMString      listOperator;
             attribute CMDataType     elementType;
             attribute int            multiplicity;
             attribute CMNamedNodeMap subModels;
    readonly attribute boolean        isPCDataOnly;
  };

Java binding:

package org.w3c.dom.contentModel;

public interface CMChildren {

    public String     getListOperator();
    public void       setListOperator(String listOperator);
    public CMDataType getElementType();
    public void       setElementType(CMDataType elementType);

    public int            getMultiplicity();
    public void           setMultiplicity(int multiplicity);
    public CMNamedNodeMap getSubModels();
    public void           setSubModels(CMNamedNodeMap subModels);
    public boolean        getIsPCDataOnly();

}

The AttributeDeclaration Interface

Represents a declaration of an attribute; e.g. an xsd:attribute schema element or
<!ATTLIST TIME HOURS CDATA #IMPLIED>

IDL:

  interface AttributeDeclaration {
    const short NO_VALUE_CONSTRAINT      = 0;
    const short DEFAULT_VALUE_CONSTRAINT = 1;
    const short FIXED_VALUE_CONSTRAINT   = 2;
    
    readonly attribute DOMString  attrName;
             attribute CMDataType attrType;
             attribute DOMString  attributeValue;
             attribute DOMString  enumAttr;
             attribute CMNodeList ownerElement;
             attribute short      constraintType;
  };

Java binding:

package org.w3c.dom.contentModel;

public interface AttributeDeclaration {

    public static final short NO_VALUE_CONSTRAINT       = 0;
    public static final short DEFAULT_VALUE_CONSTRAINT  = 1;
    public static final short FIXED_VALUE_CONSTRAINT    = 2;

    public String     getAttrName();
    public CMDataType getAttrType();
    public void       setAttrType(CMDataType attrType);
    public String     getAttributeValue();
    public void       setAttributeValue(String value);
    public String     getEnumAttr();
    public void       setEnumAttr(String enumAttr);
    public CMNodeList getOwnerElement();
    public void       setOwnerElement(CMNodeList ownerElement);
    public short      getConstraintType();
    public void       setConstraintType(short constraintType);

}

The EntityDeclaration Interface

Represents a declaration of an entity; e.g.
<!ENTITY COPY01 "Copyright 2001 Elliotte Harold">
IDL:
```
  interface EntityDeclaration {
  };
```

Java binding:

package org.w3c.dom.contentModel;

public interface EntityDeclaration {
}

The CMNotationDeclaration Interface

Represents a declaration of a notation; e.g.
<!NOTATION TXT SYSTEM "text/plain">

IDL:

  interface CMNotationDeclaration {
             attribute DOMString strSystemIdentifier;
             attribute DOMString strPublicIdentifier;
  };

Java binding:

package org.w3c.dom.contentModel;

public interface CMNotationDeclaration {

    public String getStrSystemIdentifier();
    public void   setStrSystemIdentifier(String strSystemIdentifier);
    public String getStrPublicIdentifier();
    public void   setStrPublicIdentifier(String strPublicIdentifier);

}

Validation and Other Interfaces:

Document
DocumentCM
DOMImplementationCM

The Document Interface

The DOM2 Document interface gets a new setErrorHandler() method

IDL:

  interface Document {
    void setErrorHandler(in DOMErrorHandler handler);
  };

Java binding:

package org.w3c.dom.contentModel;

public interface Document {

    public void setErrorHandler(DOMErrorHandler handler);

}

The different specs aren't synced up on this one yet.

The DocumentCM Interface

Extends the Document interface with additional methods for both document and CM editing.

IDL:

interface DocumentCM : Document {
  int                numCMs();
  CMModel            getInternalCM();
  CMExternalModel *  getCMs();
  CMModel            getActiveCM();
  void               addCM(in CMModel cm);
  void               removeCM(in CMModel cm);
  boolean            activateCM(in CMModel cm);
};

Java binding:

package org.w3c.dom.contentModel;

public interface DocumentCM extends Document {

    public int             numCMs();
    public CMModel         getInternalCM();
    public CMExternalModel getCMs();
    public CMModel         getActiveCM();
    public void            addCM(CMModel cm);
    public void            removeCM(CMModel cm);
    public boolean         activateCM(CMModel cm);

}

The DOMImplementationCM Interface

Extends the DOM2 DOMImplementation interface with factory methods to create content models

IDL:

  interface DOMImplementationCM : DOMImplementation {
    CMModel         createCM();
    CMExternalModel createExternalCM();
  };

Java binding:

package org.w3c.dom.contentModel;

public interface DOMImplementationCM extends DOMImplementation {

    public CMModel         createCM();
    public CMExternalModel createExternalCM();

}

Document-Editing Interfaces:

DOMImplementation.hasFeature("CM-DOC") returns true if a given DOM supports these capabilities.
- NodeCM
- ElementCM
- CharacterDataCM
- DocumentTypeCM
- AttributeCM

The NodeCM Interface

Extends the DOM2 Node interface with methods for guided document editing.

IDL:

  interface NodeCM : Node {
    boolean canInsertBefore(in Node newChild, 
                            in Node refChild)
                               raises(dom::DOMException);
    boolean canRemoveChild(in Node oldChild)
                              raises(dom::DOMException);
    boolean canReplaceChild(in Node newChild, 
                            in Node oldChild)
                               raises(dom::DOMException);
    boolean canAppendChild(in Node newChild)
                              raises(dom::DOMException);
    boolean isValid();
  };

Java binding:

package org.w3c.dom.contentModel;

public interface NodeCM extends Node {

    public boolean canInsertBefore(Node newChild, Node refChild)
      throws DOMException;
    public boolean canRemoveChild(Node oldChild)
      throws DOMException;
    public boolean canReplaceChild(Node newChild, Node oldChild)
      throws DOMException;
    public boolean canAppendChild(Node newChild)
      throws DOMException;
    public boolean isValid();

}
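
Until a parser implements NodeCM, guided editing can be approximated with the javax.xml.validation package in the JDK: make the edit tentatively, validate the whole document against a schema, then undo the edit. This is a stand-in technique, not the content model API described here, and it is far more expensive than a real canAppendChild() would be. The schema and the names GuidedEditing, isValid(), canAppendChild(), and canAppend() are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class GuidedEditing {

  // Hypothetical schema: a SONG holds one TITLE followed by any number of COMPOSERs
  static final String SONG_SCHEMA =
      "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
    + "<xs:element name=\"SONG\"><xs:complexType><xs:sequence>"
    + "<xs:element name=\"TITLE\" type=\"xs:string\"/>"
    + "<xs:element name=\"COMPOSER\" type=\"xs:string\""
    + " minOccurs=\"0\" maxOccurs=\"unbounded\"/>"
    + "</xs:sequence></xs:complexType></xs:element>"
    + "</xs:schema>";

  static Schema compile(String xsd) {
    try {
      return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
          .newSchema(new StreamSource(new StringReader(xsd)));
    }
    catch (SAXException e) {
      throw new IllegalStateException(e);
    }
  }

  // Stand-in for NodeCM.isValid(), applied to the whole document
  static boolean isValid(Schema schema, Document doc) {
    try {
      schema.newValidator().validate(new DOMSource(doc));
      return true;
    }
    catch (SAXException e) {
      return false; // the default error handling throws on the first validity error
    }
    catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  // Stand-in for NodeCM.canAppendChild(): append, validate, then undo
  static boolean canAppendChild(Schema schema, Node parent, Node newChild) {
    parent.appendChild(newChild);
    try {
      return isValid(schema, parent.getOwnerDocument());
    }
    finally {
      parent.removeChild(newChild);
    }
  }

  // Convenience wrapper: could an empty childName element be appended to the root?
  static boolean canAppend(String xml, String childName) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      Document doc = factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      return canAppendChild(compile(SONG_SCHEMA), doc.getDocumentElement(),
          doc.createElementNS(null, childName));
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String song = "<SONG><TITLE>Hot Cop</TITLE></SONG>";
    System.out.println("COMPOSER allowed: " + canAppend(song, "COMPOSER"));
    System.out.println("PRODUCER allowed: " + canAppend(song, "PRODUCER"));
  }

}
```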

The ElementCM Interface

Extends the DOM2 Element interface with methods for guided document editing.

IDL:

  interface ElementCM : Element {
    int                contentType();
    ElementDeclaration getElementDeclaration()
                                        raises(dom::DOMException);
    boolean            canSetAttribute(in DOMString attrname, 
                                       in DOMString attrval);
    boolean            canSetAttributeNode(in Node node);
    boolean            canSetAttributeNodeNS(in Node node, 
                                             in DOMString namespaceURI, 
                                             in DOMString localName);
    boolean            canSetAttributeNS(in DOMString attrname, 
                                         in DOMString attrval, 
                                         in DOMString namespaceURI, 
                                         in DOMString localName);
  };

Java binding:

package org.w3c.dom.contentModel;

public interface ElementCM extends Element {

    public int contentType();

    public ElementDeclaration getElementDeclaration()
     throws DOMException;

    public boolean canSetAttribute(String attrname, String attrval);
    public boolean canSetAttributeNode(Node node);
    public boolean canSetAttributeNodeNS(Node node, 
     String namespaceURI, String localName);
    public boolean canSetAttributeNS(String attrname, 
     String attrval, String namespaceURI, String localName);
     
}

The CharacterDataCM Interface

Extends the DOM2 Text interface (which itself extends the DOM2 CharacterData interface) with methods for guided document editing.

IDL:

  interface CharacterDataCM : Text {
    boolean isWhitespaceOnly();
    boolean canSetData(in unsigned long offset, 
                       in DOMString arg)
                          raises(dom::DOMException);
    boolean canAppendData(in DOMString arg)
                             raises(dom::DOMException);
    boolean canReplaceData(in unsigned long offset, 
                           in unsigned long count, 
                           in DOMString arg)
                              raises(dom::DOMException);
    boolean canInsertData(in unsigned long offset, 
                          in DOMString arg)
                             raises(dom::DOMException);
    boolean canDeleteData(in unsigned long offset, 
                          in DOMString arg)
                             raises(dom::DOMException);
  };

Java binding:

package org.w3c.dom.contentModel;

public interface CharacterDataCM extends Text {

    public boolean isWhitespaceOnly();
    public boolean canSetData(int offset, String arg)
      throws DOMException;
    public boolean canAppendData(String arg)
      throws DOMException;
    public boolean canReplaceData(int offset, int count, String arg)
      throws DOMException;
    public boolean canInsertData(int offset, String arg)
      throws DOMException;
    public boolean canDeleteData(int offset, String arg)
      throws DOMException;

}

The DocumentTypeCM Interface

Extends the DOM2 DocumentType interface with methods for guided document editing.

IDL:

  interface DocumentTypeCM : DocumentType {
    boolean isElementDefined(in DOMString elemTypeName);
    boolean isElementDefinedNS(in DOMString elemTypeName, 
                               in DOMString namespaceURI, 
                               in DOMString localName);
    boolean isAttributeDefined(in DOMString elemTypeName, 
                               in DOMString attrName);
    boolean isAttributeDefinedNS(in DOMString elemTypeName, 
                                 in DOMString attrName, 
                                 in DOMString namespaceURI, 
                                 in DOMString localName);
    boolean isEntityDefined(in DOMString entName);
  };

Java binding:

package org.w3c.dom.contentModel;

public interface DocumentTypeCM extends DocumentType {

    public boolean isElementDefined(String elemTypeName);
    public boolean isElementDefinedNS(String elemTypeName, 
                                      String namespaceURI, 
                                      String localName);
    public boolean isAttributeDefined(String elemTypeName, 
                                      String attrName);
    public boolean isAttributeDefinedNS(String elemTypeName, 
                                        String attrName, 
                                        String namespaceURI, 
                                        String localName);
    public boolean isEntityDefined(String entName);

}

The AttributeCM Interface

Extends the DOM2 Attr interface with methods for guided document editing.

IDL:

  interface AttributeCM : Attr {
    AttributeDeclaration  getAttributeDeclaration();
    CMNotationDeclaration getNotation() raises(dom::DOMException);
  };

Java binding:

package org.w3c.dom.contentModel;

public interface AttributeCM extends Attr {

    public AttributeDeclaration  getAttributeDeclaration();
    public CMNotationDeclaration getNotation()
      throws DOMException;

}

DOM Error Handler Interfaces

DOMErrorHandler
DOMLocator

The DOMErrorHandler Interface

Similar to SAX2's ErrorHandler interface.
A callback interface
An application implements this interface and then registers it with the setErrorHandler() method to receive warnings, errors, and fatal errors.

IDL:

  interface DOMErrorHandler {
    void warning(in DOMLocator where, 
                 in DOMString how, 
                 in DOMString why)
                    raises(dom::DOMSystemException);
    void fatalError(in DOMLocator where, 
                    in DOMString how, 
                    in DOMString why)
                       raises(dom::DOMSystemException);
    void error(in DOMLocator where, 
               in DOMString how, 
               in DOMString why)
                  raises(dom::DOMSystemException);
  };

Java binding:

package org.w3c.dom.contentModel;

public interface DOMErrorHandler {

    public void warning(DOMLocator where, String how,  String why)
      throws DOMSystemException;
    public void fatalError(DOMLocator where, String how, String why)
      throws DOMSystemException;
    public void error(DOMLocator where, String how, String why)
      throws DOMSystemException;

}
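
For comparison, here is a minimal sketch of the SAX2 callback pattern that DOMErrorHandler resembles. It uses org.xml.sax.ErrorHandler and the JAXP DocumentBuilder bundled with later JDKs, not the draft DOM Level 3 interfaces shown above. The class name CollectingErrorHandler and its check() method are invented for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical example class: records every problem the parser reports.
class CollectingErrorHandler implements ErrorHandler {

  final List<String> messages = new ArrayList<>();

  @Override
  public void warning(SAXParseException e) {
    record("warning", e);
  }

  @Override
  public void error(SAXParseException e) {
    record("error", e);
  }

  @Override
  public void fatalError(SAXParseException e) throws SAXException {
    record("fatalError", e);
    throw e; // a fatal error ends the parse
  }

  private void record(String level, SAXParseException e) {
    messages.add(level + " at line " + e.getLineNumber() + ": " + e.getMessage());
  }

  // Parses the string and returns whatever the handler was told.
  static List<String> check(String xml) {
    CollectingErrorHandler handler = new CollectingErrorHandler();
    try {
      DocumentBuilder builder
       = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      builder.setErrorHandler(handler); // register the callback object
      builder.parse(new InputSource(new StringReader(xml)));
    }
    catch (SAXException e) {
      // already recorded by fatalError()
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return handler.messages;
  }

  public static void main(String[] args) {
    System.out.println(check("<a><b></a>"));
  }

}
```

The parser, not the application, decides when each method is called; the application only decides what to do with the report.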

The DOMLocator Interface

Similar to SAX2's Locator interface. An application can implement this interface and then register it with the setLocator() method to find out the file, line, and column in which a given node appears.

IDL:

  interface DOMLocator {
    int        getColumnNumber();
    int        getLineNumber();
    DOMString  getPublicID();
    DOMString  getSystemID();
    Node       getNode();
  };

Java binding:

package org.w3c.dom.contentModel;

public interface DOMLocator {

    public int    getColumnNumber();
    public int    getLineNumber();
    public String getPublicID();
    public String getSystemID();
    public Node   getNode();

}
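
As a point of comparison, this sketch shows the SAX2 Locator in use, assuming the JAXP SAX parser bundled with later JDKs. In SAX2 the parser supplies the Locator and the application merely keeps a reference to it. The class name ElementLocations and its locate() method are invented for this illustration, and SAX parsers are not required to supply a locator at all.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: notes the line on which each start-tag ends.
class ElementLocations extends DefaultHandler {

  private Locator locator;
  final List<String> found = new ArrayList<>();

  // The parser hands over its Locator before reporting any other event.
  @Override
  public void setDocumentLocator(Locator locator) {
    this.locator = locator;
  }

  @Override
  public void startElement(String uri, String localName,
   String qName, Attributes atts) {
    // -1 means the parser did not supply a locator
    int line = (locator == null) ? -1 : locator.getLineNumber();
    found.add(qName + "@" + line);
  }

  static List<String> locate(String xml) {
    ElementLocations handler = new ElementLocations();
    try {
      SAXParserFactory.newInstance().newSAXParser()
       .parse(new InputSource(new StringReader(xml)), handler);
    }
    catch (Exception e) {
      throw new IllegalArgumentException("Could not parse document", e);
    }
    return handler.found;
  }

  public static void main(String[] args) {
    System.out.println(locate("<a>\n  <b/>\n</a>"));
  }

}
```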

To Learn More

Document Object Model (DOM) Level 3 Content Models and Load and Save Specification: http://www.w3.org/TR/DOM-Level-3-CMLS/
Document Object Model (DOM) Level 3 Core Specification Version 1.0: http://www.w3.org/TR/DOM-Level-3-Core
Document Object Model (DOM) Requirements: http://www.w3.org/TR/DOM-Requirements/
Document Object Model (DOM) Level 3 Views and Formatting Specification: http://www.w3.org/TR/DOM-Level-3-Views/

Part V: JDOM

There is no compelling reason for a Java API to manipulate XML to be complex, tricky, unintuitive, or a pain in the neck.

--JDOM Mission Statement

Where we're going

Writing XML with JDOM
Reading XML through JDOM
The JDOM Classes

Trees

An XML document is a tree.
It has a root.
It has nodes.
It is amenable to recursive processing.
Not all applications agree on what the root is.
Not all applications agree on what is and isn't a node.
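
A minimal sketch of recursive processing, assuming the JAXP DOM parser bundled with later JDKs rather than JDOM. The class name TreeWalk and its methods are invented for this illustration. It also shows the disagreement about roots: counting from the Document node gives one more node than counting from the root element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class: walks a DOM tree recursively.
class TreeWalk {

  // Parses a string into a DOM tree using the JDK's built-in parser.
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
       .newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)));
    }
    catch (Exception e) {
      throw new IllegalArgumentException("Could not parse document", e);
    }
  }

  // Counts a node plus all of its descendants.
  // DOM does not treat attributes as children, so they are not counted.
  static int countNodes(Node node) {
    int count = 1;
    for (Node child = node.getFirstChild(); child != null;
     child = child.getNextSibling()) {
      count += countNodes(child);
    }
    return count;
  }

  public static void main(String[] args) {
    Document doc = parse("<GREETING>Hello <b>tree</b>!</GREETING>");
    System.out.println(countNodes(doc));
    System.out.println(countNodes(doc.getDocumentElement()));
  }

}
```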

Processing XML with JDOM is easy

You need a JDK.
You need some free class libraries.
You need a text editor.
You need some data to process.

What is JDOM?

A Pure Java API for reading and writing XML Documents
A Java-oriented API for reading and writing XML Documents
A tree-oriented API for reading and writing XML Documents
A parser independent API for reading and writing XML Documents

About JDOM

Created by Brett McLaughlin and Jason Hunter. (James Duncan Davidson is an unindicted coconspirator.)
Open source with an Apache-like license
http://www.jdom.org/

JDOM versions

1.0 Beta 5, from October 7, 2000, is the current tarball
The three weeks since then have added some functionality to the API
This presentation is based on the October 26, 2000 CVS version
cvs.jdom.org

Four packages:

org.jdom: the classes that represent an XML document and its parts
org.jdom.input: classes for reading a document into memory
org.jdom.output: classes for writing a document onto a stream or other target (e.g. SAX or DOM app)
org.jdom.adapters: classes for hooking up to DOM implementations

The org.jdom package

The classes that represent an XML document and its parts

Attribute
Comment
DocType
Document
Element
Entity
ProcessingInstruction
plus Verifier
plus assorted exceptions

The org.jdom.input package

Classes for reading a document into memory from a file or other source

DOMBuilder
SAXBuilder

The org.jdom.output package

The classes for writing a document to a file or other target

XMLOutputter
SAXOutputter (under development)
DOMOutputter (under development)

The org.jdom.adapters package

Classes for hooking up JDOM to DOM implementations:
- AbstractDOMAdapter
- OracleV1DOMAdapter
- OracleV2DOMAdapter
- ProjectXDOMAdapter
- XercesDOMAdapter
You rarely need to access these directly.

Writing XML Documents with JDOM

JDOM is for both input and output
New documents can be read from a stream or constructed in memory
An org.jdom.output.XMLOutputter sends a document from memory to an OutputStream or Writer
A JDOM document can also be sent to a SAX ContentHandler or DOM org.w3c.dom.Document for further processing with a different API

A JDOM program that writes this XML document

<?xml version="1.0"?>
<GREETING>
  Hello JDOM!
</GREETING>

Hello JDOM

import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class HelloJDOM {

  public static void main(String[] args) {
   
    Element root = new Element("GREETING");
    	
    root.setText("Hello JDOM!");
         
    Document doc = new Document(root);      
    
    // At this point the document only exists in memory.
    // We still need to serialize it
    XMLOutputter outputter = new XMLOutputter();
    try {
      outputter.output(doc, System.out);       
    }
    catch (Exception e) {
      System.err.println(e);
    }

  }

}

Actual Output

<?xml version="1.0" encoding="UTF-8"?><GREETING>Hello JDOM!</GREETING>

This is more or less what we wanted, modulo white space.

Hello DOM

Here's the same program using DOM instead of JDOM. Which is simpler?

import java.io.*;
import org.w3c.dom.*;
import org.apache.xerces.dom.*;
import org.apache.xml.serialize.*;


public class HelloDOM {

  public static void main(String[] args) {

    try {

      DOMImplementation impl = DOMImplementationImpl.getDOMImplementation();
      //                       ^^^^^^^^^^^^^^^^^^^^^
      //                       Xerces Specific class

      Document hello = impl.createDocument(null, "GREETING", null);
      //                                   ^^^^              ^^^^
      //                               Namespace URI       DocType

      Element root = hello.getDocumentElement();

      // We can't use a raw string. Instead we have to first create
      // a text node.
      Text text = hello.createTextNode("Hello DOM!");
      root.appendChild(text);

      // Now that the document is created we need to *serialize* it
      try {
        OutputFormat format = new OutputFormat(hello);
        XMLSerializer serializer = new XMLSerializer(System.out, format);
        serializer.serialize(root);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
    catch (DOMException e) {
      e.printStackTrace();
    }

  }

}
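
The DOM program above depends on Xerces-specific classes for both bootstrapping and serialization. As a hedged alternative, this sketch uses only JAXP, assuming a later JDK that bundles javax.xml.parsers and javax.xml.transform. The class name HelloJAXP and its build() method are invented for this illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example class: builds and serializes a DOM tree
// without naming any parser vendor's classes.
class HelloJAXP {

  static String build() {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
       .newDocumentBuilder().newDocument();
      Element root = doc.createElement("GREETING");
      root.appendChild(doc.createTextNode("Hello DOM!"));
      doc.appendChild(root);

      // An identity transform copies the tree to the output unchanged
      Transformer serializer = TransformerFactory.newInstance().newTransformer();
      serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      serializer.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(build());
  }

}
```

This removes the vendor dependency but not the ceremony: a factory, a text node, and a separate serializer are still required.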

A Java program that writes Fibonacci numbers into a text file

import java.math.*;
import java.io.*;


public class FibonacciText {

  public static void main(String[] args) {

    try {
      FileOutputStream fout = new FileOutputStream("fibonacci.txt");
      OutputStreamWriter out = new OutputStreamWriter(fout, "8859_1");

      BigInteger low = BigInteger.ZERO;
      BigInteger high = BigInteger.ONE;

      for (int i = 0; i <= 25; i++) {
        out.write(low.toString() + "\r\n");
        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }

      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}
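
The text, JDOM, and DOM versions of this program all repeat the same low/high loop. One possible refactoring, sketched here under the assumption that a later JDK with generics is available, separates generating the numbers from writing them. The class name FibonacciList and the method upTo() are invented for this illustration.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class: generates the numbers once so that
// any writer (text, JDOM, DOM) can iterate over the same list.
class FibonacciList {

  // Returns f(0) through f(maxIndex), inclusive.
  static List<BigInteger> upTo(int maxIndex) {
    List<BigInteger> result = new ArrayList<>();
    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;
    for (int i = 0; i <= maxIndex; i++) {
      result.add(low);
      BigInteger temp = high;
      high = high.add(low);
      low = temp;
    }
    return result;
  }

  public static void main(String[] args) {
    for (BigInteger number : upTo(25)) {
      System.out.println(number);
    }
  }

}
```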

fibonacci.txt

fibonacci.xml

Suppose we want that data in an XML document that looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<Fibonacci_Numbers>
  <fibonacci index="0">0</fibonacci>
  <fibonacci index="1">1</fibonacci>
  <fibonacci index="2">1</fibonacci>
  <fibonacci index="3">2</fibonacci>
  <fibonacci index="4">3</fibonacci>
  <fibonacci index="5">5</fibonacci>
  <fibonacci index="6">8</fibonacci>
  <fibonacci index="7">13</fibonacci>
  <fibonacci index="8">21</fibonacci>
  <fibonacci index="9">34</fibonacci>
  <fibonacci index="10">55</fibonacci>
  <fibonacci index="11">89</fibonacci>
  <fibonacci index="12">144</fibonacci>
  <fibonacci index="13">233</fibonacci>
  <fibonacci index="14">377</fibonacci>
  <fibonacci index="15">610</fibonacci>
  <fibonacci index="16">987</fibonacci>
  <fibonacci index="17">1597</fibonacci>
  <fibonacci index="18">2584</fibonacci>
  <fibonacci index="19">4181</fibonacci>
  <fibonacci index="20">6765</fibonacci>
  <fibonacci index="21">10946</fibonacci>
  <fibonacci index="22">17711</fibonacci>
  <fibonacci index="23">28657</fibonacci>
  <fibonacci index="24">46368</fibonacci>
  <fibonacci index="25">75025</fibonacci>
</Fibonacci_Numbers>

A JDOM program that writes Fibonacci numbers into an XML file

import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;


public class FibonacciJDOM {

  public static void main(String[] args) {

    Element root = new Element("Fibonacci_Numbers");

    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;

    for (int i = 0; i <= 25; i++) {
      Element fibonacci = new Element("fibonacci");
      fibonacci.addAttribute("index", String.valueOf(i));
      fibonacci.setText(low.toString());
      root.addContent(fibonacci);

      BigInteger temp = high;
      high = high.add(low);
      low = temp;
    }

    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("fibonacci.xml");
      XMLOutputter serializer = new XMLOutputter();
      serializer.output(doc, out);
      out.flush();
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

Output

Again, modulo white space this is correct

<?xml version="1.0" encoding="UTF-8"?><Fibonacci_Numbers><fibonacci index="0">0</fibonacci><fibonacci index="1">1</fibonacci><fibonacci index="2">1</fibonacci><fibonacci index="3">2</fibonacci><fibonacci index="4">3</fibonacci><fibonacci index="5">5</fibonacci><fibonacci index="6">8</fibonacci><fibonacci index="7">13</fibonacci><fibonacci index="8">21</fibonacci><fibonacci index="9">34</fibonacci><fibonacci index="10">55</fibonacci><fibonacci index="11">89</fibonacci><fibonacci index="12">144</fibonacci><fibonacci index="13">233</fibonacci><fibonacci index="14">377</fibonacci><fibonacci index="15">610</fibonacci><fibonacci index="16">987</fibonacci><fibonacci index="17">1597</fibonacci><fibonacci index="18">2584</fibonacci><fibonacci index="19">4181</fibonacci><fibonacci index="20">6765</fibonacci><fibonacci index="21">10946</fibonacci><fibonacci index="22">17711</fibonacci><fibonacci index="23">28657</fibonacci><fibonacci index="24">46368</fibonacci><fibonacci index="25">75025</fibonacci></Fibonacci_Numbers>

A DOM program that writes Fibonacci numbers into an XML file

Here's the same program using DOM instead of JDOM. Which is simpler?

import java.math.*;
import java.io.*;
import org.w3c.dom.*;
import org.apache.xerces.dom.*;
import org.apache.xml.serialize.*;


public class FibonacciDOM {

  public static void main(String[] args) {

    try {

      DOMImplementation impl = DOMImplementationImpl.getDOMImplementation();

      Document fibonacci = impl.createDocument(null, "Fibonacci_Numbers", null);

      BigInteger low  = BigInteger.ZERO;
      BigInteger high = BigInteger.ONE;

      Element root = fibonacci.getDocumentElement();

      for (int i = 0; i <= 25; i++) {
        Element number = fibonacci.createElement("fibonacci");
        number.setAttribute("index", Integer.toString(i));
        Text text = fibonacci.createTextNode(low.toString());
        number.appendChild(text);
        root.appendChild(number);

        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }

      try {
        // Now that the document is created we need to *serialize* it
        FileOutputStream out = new FileOutputStream("fibonacci_dom.xml");
        OutputFormat format = new OutputFormat(fibonacci);
        XMLSerializer serializer = new XMLSerializer(out, format);
        serializer.serialize(root);
        out.flush();
        out.close();
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
    catch (DOMException e) {
      e.printStackTrace();
    }

  }

}

62 lines vs. 42 lines

Suppose you want to include a DTD

Suppose we have this DTD at the relative URL fibonacci.dtd:

<!ELEMENT Fibonacci_Numbers (fibonacci*)>
<!ELEMENT fibonacci (#PCDATA)>
<!ATTLIST fibonacci index CDATA #IMPLIED>

We need this DOCTYPE declaration:

<!DOCTYPE Fibonacci_Numbers SYSTEM "fibonacci.dtd">
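
To see what this DTD accepts and rejects, here is a sketch that validates small documents with the JAXP parser bundled with later JDKs. So that the example needs no external file, it assumes the three declarations are placed in an internal DTD subset instead of fibonacci.dtd. The class name DtdCheck and its isValid() method are invented for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: checks content against the Fibonacci DTD.
class DtdCheck {

  // The same declarations as fibonacci.dtd, inlined for this sketch
  static final String DTD
   = "<!ELEMENT Fibonacci_Numbers (fibonacci*)>"
   + "<!ELEMENT fibonacci (#PCDATA)>"
   + "<!ATTLIST fibonacci index CDATA #IMPLIED>";

  // Wraps the content in a root element and validates the result.
  static boolean isValid(String rootContent) {
    String xml = "<!DOCTYPE Fibonacci_Numbers [" + DTD + "]>"
     + "<Fibonacci_Numbers>" + rootContent + "</Fibonacci_Numbers>";
    final boolean[] valid = { true };
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setValidating(true);
      DocumentBuilder builder = factory.newDocumentBuilder();
      // Validity errors are reported through error(), not thrown
      builder.setErrorHandler(new DefaultHandler() {
        @Override
        public void error(SAXParseException e) {
          valid[0] = false;
        }
      });
      builder.parse(new InputSource(new StringReader(xml)));
    }
    catch (SAXException e) {
      return false; // not even well-formed
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return valid[0];
  }

  public static void main(String[] args) {
    System.out.println(isValid("<fibonacci index=\"0\">0</fibonacci>"));
    System.out.println(isValid("<number>0</number>"));
  }

}
```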

ValidFibonacci

Use the DocType class to insert a document type declaration
JDOM does not support internal DTD subsets.
JDOM does not let you output a DTD.

import java.math.*;
import java.io.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class ValidFibonacci {

  public static void main(String[] args) {
   
    Element root = new Element("Fibonacci_Numbers");	
  	      
    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;      
    
    for (int i = 0; i <= 25; i++) {
      Element fibonacci = new Element("fibonacci");
      Attribute index = new Attribute("index", String.valueOf(i));
      fibonacci.addAttribute(index);
      fibonacci.setText(low.toString());
      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(fibonacci);
    }
 
    DocType type = new DocType("Fibonacci_Numbers", "fibonacci.dtd");
 
    Document doc = new Document(root, type);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("validfibonacci.xml");
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, out);
      out.flush();	
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

View Output in Browser

validfibonacci.xml

<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Fibonacci_Numbers SYSTEM "fibonacci.dtd"><Fibonacci_Numbers><fibonacci index="0">0</fibonacci><fibonacci index="1">1</fibonacci><fibonacci index="2">1</fibonacci><fibonacci index="3">2</fibonacci><fibonacci index="4">3</fibonacci><fibonacci index="5">5</fibonacci><fibonacci index="6">8</fibonacci><fibonacci index="7">13</fibonacci><fibonacci index="8">21</fibonacci><fibonacci index="9">34</fibonacci><fibonacci index="10">55</fibonacci><fibonacci index="11">89</fibonacci><fibonacci index="12">144</fibonacci><fibonacci index="13">233</fibonacci><fibonacci index="14">377</fibonacci><fibonacci index="15">610</fibonacci><fibonacci index="16">987</fibonacci><fibonacci index="17">1597</fibonacci><fibonacci index="18">2584</fibonacci><fibonacci index="19">4181</fibonacci><fibonacci index="20">6765</fibonacci><fibonacci index="21">10946</fibonacci><fibonacci index="22">17711</fibonacci><fibonacci index="23">28657</fibonacci><fibonacci index="24">46368</fibonacci><fibonacci index="25">75025</fibonacci></Fibonacci_Numbers>

Using Namespaces

Suppose we want some MathML like this:

<?xml version="1.0" encoding="UTF-8"?>
<mathml:math xmlns:mathml="http://www.w3.org/1998/Math/MathML">
  <mathml:mrow>
    <mathml:mi>f(0)</mathml:mi>
    <mathml:mo>=</mathml:mo>
    <mathml:mn>0</mathml:mn>
  </mathml:mrow>
  <mathml:mrow>
    <mathml:mi>f(1)</mathml:mi>
    <mathml:mo>=</mathml:mo>
    <mathml:mn>1</mathml:mn>
  </mathml:mrow>
  <mathml:mrow>
    <mathml:mi>f(2)</mathml:mi>
    <mathml:mo>=</mathml:mo>
    <mathml:mn>1</mathml:mn>
  </mathml:mrow>
</mathml:math>

Do not use the qualified names like mathml:mn.
Instead use prefixes like mathml, local names like mn, and URIs like http://www.w3.org/1998/Math/MathML to create the elements.
Do not include xmlns attributes like xmlns:mathml="http://www.w3.org/1998/Math/MathML".
XMLOutputter will decide where to put the xmlns attributes when the document is serialized.
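
The three parts of a namespaced name can be seen by reading a document back in. This sketch assumes the namespace-aware JAXP DOM parser bundled with later JDKs, not JDOM. The class name NameParts and its describe() method are invented for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example class: splits the root element's name
// into prefix, local name, and namespace URI.
class NameParts {

  static String describe(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true); // off by default in JAXP
      Element root = factory.newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)))
       .getDocumentElement();
      return root.getPrefix() + " | " + root.getLocalName()
       + " | " + root.getNamespaceURI();
    }
    catch (Exception e) {
      throw new IllegalArgumentException("Could not parse document", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(describe(
     "<mathml:math xmlns:mathml=\"http://www.w3.org/1998/Math/MathML\"/>"));
  }

}
```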

With Namespace Prefixes

import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;


public class PrefixedFibonacci {

  public static void main(String[] args) {

    Element root = new Element("math", "mathml",
     "http://www.w3.org/1998/Math/MathML");

    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;

    for (int i = 0; i <= 25; i++) {

      Element mrow = new Element("mrow", "mathml",
       "http://www.w3.org/1998/Math/MathML");

      Element mi = new Element("mi", "mathml",
       "http://www.w3.org/1998/Math/MathML");
      mi.setText("f(" + i + ")");
      mrow.addContent(mi);

      Element mo = new Element("mo", "mathml",
       "http://www.w3.org/1998/Math/MathML");
      mo.setText("=");
      mrow.addContent(mo);

      Element mn = new Element("mn", "mathml",
       "http://www.w3.org/1998/Math/MathML");
      mn.setText(low.toString());
      mrow.addContent(mn);

      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(mrow);

    }

    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("prefixed_fibonacci.xml");
      XMLOutputter serializer = new XMLOutputter();
      serializer.output(doc, out);
      out.flush();
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

View Output in Browser

The Default, Unprefixed Namespace

Suppose you want some MathML like this:

<?xml version="1.0" encoding="UTF-8"?>
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mi>f(0)</mi>
    <mo>=</mo>
    <mn>0</mn>
  </mrow>
  <mrow>
    <mi>f(1)</mi>
    <mo>=</mo>
    <mn>1</mn>
  </mrow>
  <mrow>
    <mi>f(2)</mi>
    <mo>=</mo>
    <mn>1</mn>
  </mrow>
</math>

Do not use bare local names like mn on their own.
Instead use local names like mn together with URIs like http://www.w3.org/1998/Math/MathML to create the elements.
Do not include xmlns attributes like xmlns="http://www.w3.org/1998/Math/MathML".
XMLOutputter will decide where to put the xmlns attribute when the document is serialized.
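
Whether a prefix or the default namespace is used, the element has the same name as far as a namespace-aware processor is concerned: only the local name and the URI count. A sketch of that comparison, assuming the JAXP DOM parser bundled with later JDKs; the class name SameName and its methods are invented for this illustration.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example class: compares root element names
// by namespace URI and local name, ignoring prefixes.
class SameName {

  static Element root(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      return factory.newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)))
       .getDocumentElement();
    }
    catch (Exception e) {
      throw new IllegalArgumentException("Could not parse document", e);
    }
  }

  static boolean sameName(String xml1, String xml2) {
    Element a = root(xml1);
    Element b = root(xml2);
    // The namespace URI is null for an element in no namespace
    return Objects.equals(a.getNamespaceURI(), b.getNamespaceURI())
     && a.getLocalName().equals(b.getLocalName());
  }

  public static void main(String[] args) {
    String uri = "http://www.w3.org/1998/Math/MathML";
    System.out.println(sameName(
     "<mathml:math xmlns:mathml=\"" + uri + "\"/>",
     "<math xmlns=\"" + uri + "\"/>"));
  }

}
```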

With Default Namespace

import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;


public class UnprefixedFibonacci {

  public static void main(String[] args) {
   
    Element root = new Element("math", "http://www.w3.org/1998/Math/MathML");	
  	      
    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;      
    
    for (int i = 0; i <= 25; i++) {
        
      Element mrow = new Element("mrow", "http://www.w3.org/1998/Math/MathML");
      
      Element mi = new Element("mi", "http://www.w3.org/1998/Math/MathML");
      mi.setText("f(" + i + ")"); 
      mrow.addContent(mi);
      
      Element mo = new Element("mo", "http://www.w3.org/1998/Math/MathML");
      mo.setText("="); 
      mrow.addContent(mo);
      
      Element mn = new Element("mn", "http://www.w3.org/1998/Math/MathML");
      mn.setText(low.toString());
      mrow.addContent(mn);

      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(mrow);
      
    }
 
    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("unprefixed_fibonacci.xml");
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, out);
      out.flush();	
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

View Output in Browser

Converting data to XML

Sample Tab Delimited Data: Baseball Statistics



Surname	FirstName	Team	Position	Games Played	Games Started	AtBats	Runs	Hits	Doubles	Triples	Home runs	RBI	Stolen Bases	Caught Stealing	Sacrifice Hits	Sacrifice Flies	Errors	PB	Walks	Strike outs	Hit by pitch
Anderson	Garret	ANA	Outfield	156	151	622	62	183	41	7	15	79	8	3	3	3	6	0	29	80	1
Baughman	Justin	ANA	Second Base	62	54	196	24	50	9	1	1	20	10	4	5	3	8	0	6	36	1
Bolick	Frank	ANA	Third Base	21	11	45	3	7	2	0	1	2	0	0	0	0	0	0	11	8	0
Disarcina	Gary	ANA	Shortstop	157	155	551	73	158	39	3	3	56	12	7	12	3	14	0	21	51	8
Edmonds	Jim	ANA	Outfield	154	150	599	115	184	42	1	25	91	7	5	1	1	5	0	57	114	1
Erstad	Darin	ANA	Outfield	133	129	537	84	159	39	3	19	82	20	6	1	3	3	0	43	77	6
Garcia	Carlos	ANA	Second Base	19	10	35	4	5	1	0	0	0	2	0	1	0	1	0	3	11	1
Glaus	Troy	ANA	Third Base	48	45	165	19	36	9	0	1	23	1	0	0	2	7	0	15	51	0
Greene	Todd	ANA	Outfield	29	15	71	3	18	4	0	1	7	0	0	0	0	0	0	2	20	0
Helfand	Eric	ANA	Catcher	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
Hollins	Dave	ANA	Third Base	101	98	363	60	88	16	2	11	39	11	3	2	2	17	0	44	69	7
Jefferies	Gregg	ANA	Outfield	19	18	72	7	25	6	0	1	10	1	0	0	0	0	0	0	5	0
Johnson	Mark	ANA	First Base	10	2	14	1	1	0	0	0	0	0	0	0	0	0	0	0	6	0
Kreuter	Chad	ANA	Catcher	96	74	252	27	63	10	1	2	33	1	0	5	1	9	5	33	49	3
Martin	Norberto	ANA	Second Base	79	50	195	20	42	2	0	1	13	3	1	3	2	4	0	6	29	0
Mashore	Damon	ANA	Outfield	43	24	98	13	23	6	0	2	11	1	0	1	0	0	0	9	22	3
Molina	Ben	ANA	Catcher	2	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
Nevin	Phil	ANA	Catcher	75	65	237	27	54	8	1	8	27	0	0	0	2	5	20	17	67	5
O'Brien	Charlie	ANA	Catcher	62	58	175	13	45	9	0	4	18	0	0	3	3	4	1	10	33	2
Palmeiro	Orlando	ANA	Outfield	74	34	165	28	53	7	2	0	21	5	4	7	0	0	0	20	11	0
Pritchett	Chris	ANA	First Base	31	19	80	12	23	2	1	2	8	2	0	0	0	1	0	4	16	0
Salmon	Tim	ANA	Designated Hitter	136	130	463	84	139	28	1	26	88	0	1	0	10	2	0	90	100	3
Shipley	Craig	ANA	Third Base	77	32	147	18	38	7	1	2	17	0	4	4	1	3	0	5	22	5
Velarde	Randy	ANA	Second Base	51	50	188	29	49	13	1	4	26	7	2	0	1	4	0	34	42	1
Walbeck	Matt	ANA	Catcher	108	91	338	41	87	15	2	6	46	1	1	5	5	7	8	30	68	2
Williams	Reggie	ANA	Outfield	29	7	36	7	13	1	0	1	5	3	3	1	0	0	0	7	11	1

A Program to convert tab delimited data to XML

import java.io.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class BaseballTabToXML {

  public static void main(String[] args) {
     
    Element root = new Element("players");
    
    try {
      FileInputStream fin = new FileInputStream(args[0]);
      BufferedReader in 
       = new BufferedReader(new InputStreamReader(fin));    

      String playerStats;  
      while ((playerStats = in.readLine()) != null) {
        String[] stats = splitLine(playerStats);
        
        Element player = new Element("player");

        Element first_name = new Element("first_name");
        first_name.setText(stats[1]);
        player.addContent(first_name);
        
        Element surname = new Element("surname");
        surname.setText(stats[0]);
        player.addContent(surname);
       
        Element games_played = new Element("games_played");
        games_played.setText(stats[4]);
        player.addContent(games_played);
       
        Element at_bats = new Element("at_bats");
        at_bats.setText(stats[6]);
        player.addContent(at_bats);
       
        Element runs = new Element("runs");
        runs.setText(stats[7]);
        player.addContent(runs);
       
        Element hits = new Element("hits");
        hits.setText(stats[8]);
        player.addContent(hits);
       
        Element doubles = new Element("doubles");
        doubles.setText(stats[9]);
        player.addContent(doubles);
       
        Element triples = new Element("triples");
        triples.setText(stats[10]);
        player.addContent(triples); 

        Element home_runs = new Element("home_runs");
        home_runs.setText(stats[11]);
        player.addContent(home_runs); 

        Element runs_batted_in = new Element("runs_batted_in");
        runs_batted_in.setText(stats[12]);
        player.addContent(runs_batted_in); 

        Element stolen_bases = new Element("stolen_bases");
        stolen_bases.setText(stats[13]);
        player.addContent(stolen_bases); 

        Element caught_stealing = new Element("caught_stealing");
        caught_stealing.setText(stats[14]);
        player.addContent(caught_stealing); 

        Element sacrifice_hits = new Element("sacrifice_hits");
        sacrifice_hits.setText(stats[15]);
        player.addContent(sacrifice_hits); 

        Element sacrifice_flies = new Element("sacrifice_flies");
        sacrifice_flies.setText(stats[16]);
        player.addContent(sacrifice_flies); 

        Element errors = new Element("errors");
        errors.setText(stats[17]);
        player.addContent(errors); 

        Element passed_by_ball = new Element("passed_by_ball");
        passed_by_ball.setText(stats[18]);
        player.addContent(passed_by_ball); 

        Element walks = new Element("walks");
        walks.setText(stats[19]);
        player.addContent(walks); 

        Element strike_outs = new Element("strike_outs");
        strike_outs.setText(stats[20]);
        player.addContent(strike_outs); 

        Element hit_by_pitch = new Element("hit_by_pitch");
        hit_by_pitch.setText(stats[21]);
        player.addContent(hit_by_pitch); 
        
        root.addContent(player);
      }  
      
      Document doc = new Document(root);
      // serialize it into a file
      FileOutputStream fout 
       = new FileOutputStream("baseballstats.xml");
      
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, fout);
      fout.flush();	
      fout.close();
      in.close();
      
    }
    catch (IOException e) {
      System.err.println(e);
    }
    catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("Usage: java BaseballTabToXML input_file.tab");
    }

  }

  public static String[] splitLine(String playerStats) {
    
    // count the number of tabs
    int numTabs = 0;
    for (int i = 0; i < playerStats.length(); i++) {
      if (playerStats.charAt(i) == '\t') numTabs++;
    }
    int numFields = numTabs + 1;
    String[] fields = new String[numFields];
    int position = 0;
    for (int i = 0; i < numFields; i++) {
      StringBuffer field = new StringBuffer();
      while (position < playerStats.length() 
       && playerStats.charAt(position++) != '\t') {
        field.append(playerStats.charAt(position-1));
      }
      fields[i] = field.toString();
    }    
    return fields;
    
  }

}
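
The hand-written splitLine() method predates String.split(), which arrived in Java 1.4. On a later JDK the same result can be had in one line, as this sketch suggests. The class name TabSplit is invented for this illustration. The limit argument of -1 keeps trailing empty fields, which matches the tab-counting logic of the original method.

```java
// Hypothetical example class: a shorter stand-in for splitLine().
class TabSplit {

  // Splits on every tab; a negative limit preserves empty fields,
  // including any at the end of the line.
  static String[] splitLine(String playerStats) {
    return playerStats.split("\t", -1);
  }

  public static void main(String[] args) {
    String[] stats = splitLine("Baughman\tJustin\tANA\tSecond Base\t62");
    for (int i = 0; i < stats.length; i++) {
      System.out.println(i + ": " + stats[i]);
    }
  }

}
```

Fields such as Second Base survive intact because only tabs, not spaces, separate the columns.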

View Output in Browser

Baseball Stats in XML

<?xml version="1.0"?>
<players>
  <player>
    <first_name>FirstName</first_name>
    <surname>Surname</surname>
    <games_played>Games Played</games_played>
    <at_bats>AtBats</at_bats>
    <runs>Runs</runs>
    <hits>Hits</hits>
    <doubles>Doubles</doubles>
    <triples>Triples</triples>
    <home_runs>Home runs</home_runs>
    <runs_batted_in>RBI</runs_batted_in>
    <stolen_bases>Stolen Bases</stolen_bases>
    <caught_stealing>Caught Stealing</caught_stealing>
    <sacrifice_hits>Sacrifice Hits</sacrifice_hits>
    <sacrifice_flies>Sacrifice Flies</sacrifice_flies>
    <errors>Errors</errors>
    <passed_by_ball>PB</passed_by_ball>
    <walks>Walks</walks>
    <strike_outs>Strike outs</strike_outs>
    <hit_by_pitch>Hit by pitch</hit_by_pitch>
  </player>
  <player>
    <first_name>Garret </first_name>
    <surname>Anderson</surname>
    <games_played>156</games_played>
    <at_bats>622</at_bats>
    <runs>62</runs>
    <hits>183</hits>
    <doubles>41</doubles>
    <triples>7</triples>
    <home_runs>15</home_runs>
    <runs_batted_in>79</runs_batted_in>
    <stolen_bases>8</stolen_bases>
    <caught_stealing>3</caught_stealing>
    <sacrifice_hits>3</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>6</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>29</walks>
    <strike_outs>80</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Justin </first_name>
    <surname>Baughman</surname>
    <games_played>62</games_played>
    <at_bats>196</at_bats>
    <runs>24</runs>
    <hits>50</hits>
    <doubles>9</doubles>
    <triples>1</triples>
    <home_runs>1</home_runs>
    <runs_batted_in>20</runs_batted_in>
    <stolen_bases>10</stolen_bases>
    <caught_stealing>4</caught_stealing>
    <sacrifice_hits>5</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>8</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>6</walks>
    <strike_outs>36</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Frank </first_name>
    <surname>Bolick</surname>
    <games_played>21</games_played>
    <at_bats>45</at_bats>
    <runs>3</runs>
    <hits>7</hits>
    <doubles>2</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <runs_batted_in>2</runs_batted_in>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>11</walks>
    <strike_outs>8</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Gary </first_name>
    <surname>Disarcina</surname>
    <games_played>157</games_played>
    <at_bats>551</at_bats>
    <runs>73</runs>
    <hits>158</hits>
    <doubles>39</doubles>
    <triples>3</triples>
    <home_runs>3</home_runs>
    <runs_batted_in>56</runs_batted_in>
    <stolen_bases>12</stolen_bases>
    <caught_stealing>7</caught_stealing>
    <sacrifice_hits>12</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>14</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>21</walks>
    <strike_outs>51</strike_outs>
    <hit_by_pitch>8</hit_by_pitch>
  </player>
  <player>
    <first_name>Jim </first_name>
    <surname>Edmonds</surname>
    <games_played>154</games_played>
    <at_bats>599</at_bats>
    <runs>115</runs>
    <hits>184</hits>
    <doubles>42</doubles>
    <triples>1</triples>
    <home_runs>25</home_runs>
    <stolen_bases>91</stolen_bases>
    <caught_stealing>5</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>5</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>57</walks>
    <strike_outs>114</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Darin </first_name>
    <surname>Erstad</surname>
    <games_played>133</games_played>
    <at_bats>537</at_bats>
    <runs>84</runs>
    <hits>159</hits>
    <doubles>39</doubles>
    <triples>3</triples>
    <home_runs>19</home_runs>
    <stolen_bases>82</stolen_bases>
    <caught_stealing>6</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>3</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>43</walks>
    <strike_outs>77</strike_outs>
    <hit_by_pitch>6</hit_by_pitch>
  </player>
  <player>
    <first_name>Carlos </first_name>
    <surname>Garcia</surname>
    <games_played>19</games_played>
    <at_bats>35</at_bats>
    <runs>4</runs>
    <hits>5</hits>
    <doubles>1</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>1</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>3</walks>
    <strike_outs>11</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Troy </first_name>
    <surname>Glaus</surname>
    <games_played>48</games_played>
    <at_bats>165</at_bats>
    <runs>19</runs>
    <hits>36</hits>
    <doubles>9</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>23</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>7</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>15</walks>
    <strike_outs>51</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Todd </first_name>
    <surname>Greene</surname>
    <games_played>29</games_played>
    <at_bats>71</at_bats>
    <runs>3</runs>
    <hits>18</hits>
    <doubles>4</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>7</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>2</walks>
    <strike_outs>20</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Eric </first_name>
    <surname>Helfand</surname>
    <games_played>0</games_played>
    <at_bats>0</at_bats>
    <runs>0</runs>
    <hits>0</hits>
    <doubles>0</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>0</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Dave </first_name>
    <surname>Hollins</surname>
    <games_played>101</games_played>
    <at_bats>363</at_bats>
    <runs>60</runs>
    <hits>88</hits>
    <doubles>16</doubles>
    <triples>2</triples>
    <home_runs>11</home_runs>
    <stolen_bases>39</stolen_bases>
    <caught_stealing>3</caught_stealing>
    <sacrifice_hits>2</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>17</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>44</walks>
    <strike_outs>69</strike_outs>
    <hit_by_pitch>7</hit_by_pitch>
  </player>
  <player>
    <first_name>Gregg </first_name>
    <surname>Jefferies</surname>
    <games_played>19</games_played>
    <at_bats>72</at_bats>
    <runs>7</runs>
    <hits>25</hits>
    <doubles>6</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>10</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>5</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Mark </first_name>
    <surname>Johnson</surname>
    <games_played>10</games_played>
    <at_bats>14</at_bats>
    <runs>1</runs>
    <hits>1</hits>
    <doubles>0</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>6</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Chad </first_name>
    <surname>Kreuter</surname>
    <games_played>96</games_played>
    <at_bats>252</at_bats>
    <runs>27</runs>
    <hits>63</hits>
    <doubles>10</doubles>
    <triples>1</triples>
    <home_runs>2</home_runs>
    <stolen_bases>33</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>5</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>9</errors>
    <passed_by_ball>5</passed_by_ball>
    <walks>33</walks>
    <strike_outs>49</strike_outs>
    <hit_by_pitch>3</hit_by_pitch>
  </player>
  <player>
    <first_name>Norberto </first_name>
    <surname>Martin</surname>
    <games_played>79</games_played>
    <at_bats>195</at_bats>
    <runs>20</runs>
    <hits>42</hits>
    <doubles>2</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>13</stolen_bases>
    <caught_stealing>1</caught_stealing>
    <sacrifice_hits>3</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>4</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>6</walks>
    <strike_outs>29</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Damon </first_name>
    <surname>Mashore</surname>
    <games_played>43</games_played>
    <at_bats>98</at_bats>
    <runs>13</runs>
    <hits>23</hits>
    <doubles>6</doubles>
    <triples>0</triples>
    <home_runs>2</home_runs>
    <stolen_bases>11</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>9</walks>
    <strike_outs>22</strike_outs>
    <hit_by_pitch>3</hit_by_pitch>
  </player>
  <player>
    <first_name>Ben </first_name>
    <surname>Molina</surname>
    <games_played>2</games_played>
    <at_bats>1</at_bats>
    <runs>0</runs>
    <hits>0</hits>
    <doubles>0</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>0</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Phil </first_name>
    <surname>Nevin</surname>
    <games_played>75</games_played>
    <at_bats>237</at_bats>
    <runs>27</runs>
    <hits>54</hits>
    <doubles>8</doubles>
    <triples>1</triples>
    <home_runs>8</home_runs>
    <stolen_bases>27</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>5</errors>
    <passed_by_ball>20</passed_by_ball>
    <walks>17</walks>
    <strike_outs>67</strike_outs>
    <hit_by_pitch>5</hit_by_pitch>
  </player>
  <player>
    <first_name>Charlie </first_name>
    <surname>Obrien</surname>
    <games_played>62</games_played>
    <at_bats>175</at_bats>
    <runs>13</runs>
    <hits>45</hits>
    <doubles>9</doubles>
    <triples>0</triples>
    <home_runs>4</home_runs>
    <stolen_bases>18</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>3</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>4</errors>
    <passed_by_ball>1</passed_by_ball>
    <walks>10</walks>
    <strike_outs>33</strike_outs>
    <hit_by_pitch>2</hit_by_pitch>
  </player>
  <player>
    <first_name>Orlando </first_name>
    <surname>Palmeiro</surname>
    <games_played>74</games_played>
    <at_bats>165</at_bats>
    <runs>28</runs>
    <hits>53</hits>
    <doubles>7</doubles>
    <triples>2</triples>
    <home_runs>0</home_runs>
    <stolen_bases>21</stolen_bases>
    <caught_stealing>4</caught_stealing>
    <sacrifice_hits>7</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>20</walks>
    <strike_outs>11</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Chris </first_name>
    <surname>Pritchett</surname>
    <games_played>31</games_played>
    <at_bats>80</at_bats>
    <runs>12</runs>
    <hits>23</hits>
    <doubles>2</doubles>
    <triples>1</triples>
    <home_runs>2</home_runs>
    <stolen_bases>8</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>1</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>4</walks>
    <strike_outs>16</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Tim </first_name>
    <surname>Salmon</surname>
    <games_played>136</games_played>
    <at_bats>463</at_bats>
    <runs>84</runs>
    <hits>139</hits>
    <doubles>28</doubles>
    <triples>1</triples>
    <home_runs>26</home_runs>
    <stolen_bases>88</stolen_bases>
    <caught_stealing>1</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>10</sacrifice_flies>
    <errors>2</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>90</walks>
    <strike_outs>100</strike_outs>
    <hit_by_pitch>3</hit_by_pitch>
  </player>
  <player>
    <first_name>Craig </first_name>
    <surname>Shipley</surname>
    <games_played>77</games_played>
    <at_bats>147</at_bats>
    <runs>18</runs>
    <hits>38</hits>
    <doubles>7</doubles>
    <triples>1</triples>
    <home_runs>2</home_runs>
    <stolen_bases>17</stolen_bases>
    <caught_stealing>4</caught_stealing>
    <sacrifice_hits>4</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>3</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>5</walks>
    <strike_outs>22</strike_outs>
    <hit_by_pitch>5</hit_by_pitch>
  </player>
  <player>
    <first_name>Randy </first_name>
    <surname>Velarde</surname>
    <games_played>51</games_played>
    <at_bats>188</at_bats>
    <runs>29</runs>
    <hits>49</hits>
    <doubles>13</doubles>
    <triples>1</triples>
    <home_runs>4</home_runs>
    <stolen_bases>26</stolen_bases>
    <caught_stealing>2</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>4</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>34</walks>
    <strike_outs>42</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Matt </first_name>
    <surname>Walbeck</surname>
    <games_played>108</games_played>
    <at_bats>338</at_bats>
    <runs>41</runs>
    <hits>87</hits>
    <doubles>15</doubles>
    <triples>2</triples>
    <home_runs>6</home_runs>
    <stolen_bases>46</stolen_bases>
    <caught_stealing>1</caught_stealing>
    <sacrifice_hits>5</sacrifice_hits>
    <sacrifice_flies>5</sacrifice_flies>
    <errors>7</errors>
    <passed_by_ball>8</passed_by_ball>
    <walks>30</walks>
    <strike_outs>68</strike_outs>
    <hit_by_pitch>2</hit_by_pitch>
  </player>
  <player>
    <first_name>Reggie </first_name>
    <surname>Williams</surname>
    <games_played>29</games_played>
    <at_bats>36</at_bats>
    <runs>7</runs>
    <hits>13</hits>
    <doubles>1</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>5</stolen_bases>
    <caught_stealing>3</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>7</walks>
    <strike_outs>11</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
</players>

A Shortcut

import java.io.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class BaseballTabToXMLShortcut {

  public static void main(String[] args) {
     
    Element root = new Element("players");
    
    try {
      FileInputStream fin = new FileInputStream(args[0]);
      BufferedReader in 
       = new BufferedReader(new InputStreamReader(fin));    

      String playerStats;  
      while ((playerStats = in.readLine()) != null) {
        String[] stats = splitLine(playerStats);
        
        Element player = new Element("player");

        player.addContent((new Element("first_name")).setText(stats[1]));
        player.addContent((new Element("surname")).setText(stats[0]));
        player.addContent((new Element("games_played")).setText(stats[4]));
        player.addContent((new Element("at_bats")).setText(stats[6]));
        player.addContent((new Element("runs")).setText(stats[7]));
        player.addContent((new Element("hits")).setText(stats[8]));
        player.addContent((new Element("doubles")).setText(stats[9]));
        player.addContent((new Element("triples")).setText(stats[10]));
        player.addContent((new Element("home_runs")).setText(stats[11]));
        player.addContent((new Element("runs_batted_in")).setText(stats[12]));
        player.addContent((new Element("stolen_bases")).setText(stats[13]));
        player.addContent((new Element("caught_stealing")).setText(stats[14]));
        player.addContent((new Element("sacrifice_hits")).setText(stats[15]));
        player.addContent((new Element("sacrifice_flies")).setText(stats[16]));
        player.addContent((new Element("errors")).setText(stats[17]));
        player.addContent((new Element("passed_by_ball")).setText(stats[18]));
        player.addContent((new Element("walks")).setText(stats[19]));
        player.addContent((new Element("strike_outs")).setText(stats[20]));
        player.addContent((new Element("hit_by_pitch")).setText(stats[21]));
        
        root.addContent(player);
      }  
      
      Document doc = new Document(root);
      // serialize it into a file
      FileOutputStream fout 
       = new FileOutputStream("baseballstats.xml");
      
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, fout);
      fout.flush();	
      fout.close();
      in.close();
      
    }
    catch (IOException e) {
      System.err.println(e);
    }
    catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("Usage: java BaseballTabToXMLShortcut input_file.tab");
    }

  }

  public static String[] splitLine(String playerStats) {
    
    // count the number of tabs
    int numTabs = 0;
    for (int i = 0; i < playerStats.length(); i++) {
      if (playerStats.charAt(i) == '\t') numTabs++;
    }
    int numFields = numTabs + 1;
    String[] fields = new String[numFields];
    int position = 0;
    for (int i = 0; i < numFields; i++) {
      StringBuffer field = new StringBuffer();
      while (position < playerStats.length() 
       && playerStats.charAt(position++) != '\t') {
        field.append(playerStats.charAt(position-1));
      }
      fields[i] = field.toString();
    }    
    return fields;
    
  }

}

Converting data to XML while Processing it

import java.io.*;
import java.text.*;
import java.util.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;

public class BattingAverage {

  public static void main(String[] args) {
     
    Element root = new Element("players");
     
    try {
      FileInputStream fin = new FileInputStream(args[0]);
      BufferedReader in 
       = new BufferedReader(new InputStreamReader(fin));
      
      String playerStats;
      
      // for formatting batting averages
      DecimalFormat averages = (DecimalFormat) 
       NumberFormat.getNumberInstance(Locale.US);
      averages.setMaximumFractionDigits(3);
      averages.setMinimumFractionDigits(3);
      averages.setMinimumIntegerDigits(0);
      
      while ((playerStats = in.readLine()) != null) {
        String[] stats = splitLine(playerStats);
        
        String formattedAverage;
        try {
          int atBats         = Integer.parseInt(stats[6]);
          int hits           = Integer.parseInt(stats[8]);
          int walks          = Integer.parseInt(stats[19]);
          int hitByPitch     = Integer.parseInt(stats[21]);
          int sacrificeFlies = Integer.parseInt(stats[16]);
          int sacrificeHits  = Integer.parseInt(stats[15]);
        
          int officialAtBats 
           = atBats - walks - hitByPitch - sacrificeHits;
          if (officialAtBats <= 0) formattedAverage = "N/A";
          else {
            double average = hits / (double) officialAtBats;
            formattedAverage = averages.format(average);
          }       
        }
        catch (Exception e) {
          // skip this player
          continue; 
        }

        Element player = new Element("player");

        Element first_name = new Element("first_name");
        first_name.setText(stats[1]);
        player.addContent(first_name);
             
        Element surname = new Element("surname");
        surname.setText(stats[0]);
        player.addContent(surname);
       
        Element battingAverage = new Element("batting_average");
        battingAverage.setText(formattedAverage);
        player.addContent(battingAverage);
   
        root.addContent(player);
        
      }  
      
      
      Document doc = new Document(root);
      // serialize it into a file
      FileOutputStream fout 
       = new FileOutputStream("battingaverages.xml");
      
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, fout);
      fout.flush();	
      fout.close();
      in.close();

    }
    catch (IOException e) {
      System.err.println(e);
    }
    catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("Usage: java BattingAverage input_file.tab");
    }

  }

  public static String[] splitLine(String playerStats) {
    
    // count the number of tabs
    int numTabs = 0;
    for (int i = 0; i < playerStats.length(); i++) {
      if (playerStats.charAt(i) == '\t') numTabs++;
    }
    int numFields = numTabs + 1;
    String[] fields = new String[numFields];
    int position = 0;
    for (int i = 0; i < numFields; i++) {
      StringBuffer field = new StringBuffer();
      while (position < playerStats.length() 
       && playerStats.charAt(position++) != '\t') {
        field.append(playerStats.charAt(position-1));
      }
      fields[i] = field.toString();
    }    
    return fields;
    
  }

}

Batting Averages in XML

<?xml version="1.0"?>
<players>
  <player>
    <first_name>Garret </first_name>
    <surname>Anderson</surname>
    <batting_average>.311</batting_average>
  </player>
  <player>
    <first_name>Justin </first_name>
    <surname>Baughman</surname>
    <batting_average>.272</batting_average>
  </player>
  <player>
    <first_name>Frank </first_name>
    <surname>Bolick</surname>
    <batting_average>.206</batting_average>
  </player>
  <player>
    <first_name>Gary </first_name>
    <surname>Disarcina</surname>
    <batting_average>.310</batting_average>
  </player>
  <player>
    <first_name>Jim </first_name>
    <surname>Edmonds</surname>
    <batting_average>.341</batting_average>
  </player>
  <player>
    <first_name>Darin </first_name>
    <surname>Erstad</surname>
    <batting_average>.326</batting_average>
  </player>
  <player>
    <first_name>Carlos </first_name>
    <surname>Garcia</surname>
    <batting_average>.167</batting_average>
  </player>
  <player>
    <first_name>Troy </first_name>
    <surname>Glaus</surname>
    <batting_average>.240</batting_average>
  </player>
  <player>
    <first_name>Todd </first_name>
    <surname>Greene</surname>
    <batting_average>.261</batting_average>
  </player>
  <player>
    <first_name>Eric </first_name>
    <surname>Helfand</surname>
    <batting_average>N/A</batting_average>
  </player>
  <player>
    <first_name>Dave </first_name>
    <surname>Hollins</surname>
    <batting_average>.284</batting_average>
  </player>
  <player>
    <first_name>Gregg </first_name>
    <surname>Jefferies</surname>
    <batting_average>.347</batting_average>
  </player>
  <player>
    <first_name>Mark </first_name>
    <surname>Johnson</surname>
    <batting_average>.071</batting_average>
  </player>
  <player>
    <first_name>Chad </first_name>
    <surname>Kreuter</surname>
    <batting_average>.299</batting_average>
  </player>
  <player>
    <first_name>Norberto </first_name>
    <surname>Martin</surname>
    <batting_average>.226</batting_average>
  </player>
  <player>
    <first_name>Damon </first_name>
    <surname>Mashore</surname>
    <batting_average>.271</batting_average>
  </player>
  <player>
    <first_name>Ben </first_name>
    <surname>Molina</surname>
    <batting_average>.000</batting_average>
  </player>
  <player>
    <first_name>Phil </first_name>
    <surname>Nevin</surname>
    <batting_average>.251</batting_average>
  </player>
  <player>
    <first_name>Charlie </first_name>
    <surname>Obrien</surname>
    <batting_average>.281</batting_average>
  </player>
  <player>
    <first_name>Orlando </first_name>
    <surname>Palmeiro</surname>
    <batting_average>.384</batting_average>
  </player>
  <player>
    <first_name>Chris </first_name>
    <surname>Pritchett</surname>
    <batting_average>.303</batting_average>
  </player>
  <player>
    <first_name>Tim </first_name>
    <surname>Salmon</surname>
    <batting_average>.376</batting_average>
  </player>
  <player>
    <first_name>Craig </first_name>
    <surname>Shipley</surname>
    <batting_average>.286</batting_average>
  </player>
  <player>
    <first_name>Randy </first_name>
    <surname>Velarde</surname>
    <batting_average>.320</batting_average>
  </player>
  <player>
    <first_name>Matt </first_name>
    <surname>Walbeck</surname>
    <batting_average>.289</batting_average>
  </player>
  <player>
    <first_name>Reggie </first_name>
    <surname>Williams</surname>
    <batting_average>.481</batting_average>
  </player>
</players>

Advantages of JDOM for Writing Documents

You don't need to worry about well-formedness rules
Very configurable output
You can pick any encoding Java supports.
One caveat: validity is not automatically maintained.

Reading XML with JDOM

The stereotypical "Desperate Perl Hacker" (DPH) is supposed to be able to write an XML parser in a weekend.
The parser does the hard work for you.
Your code reads the document by hooking JDOM up to the parser.
JDOM can connect to any parser that supports SAX or DOM.

Parser APIs

SAX, the Simple API for XML
- SAX1
- SAX2
DOM, the Document Object Model
- DOM Level 0
- DOM Level 1
- DOM Level 2
- DOM Level 3
Proprietary APIs
- Parser specific APIs
- Sun's Java API for XML Parsing = SAX1 + DOM1 + a few factory classes
- JSR-000031 XML Data Binding Specification from Bluestone, Sun, WebMethods et al.
  The proposed specification will define an XML data-binding facility for the Java™ Platform. Such a facility compiles an XML schema into one or more Java classes. These automatically-generated classes handle the translation between XML documents that follow the schema and interrelated instances of the derived classes. They also ensure that the constraints expressed in the schema are maintained as instances of the classes are manipulated.
And of course JDOM

JDOM Compatible Parsers for Java

Any SAX or DOM compatible parser including:

Apache XML Project's Xerces Java: http://xml.apache.org/xerces-j/index.html
Oracle's XML Parser for Java: http://technet.oracle.com/tech/xml/parser_java2
Sun's Java API for XML Parsing: http://java.sun.com/products/xml

SAX

Public domain, developed on xml-dev mailing list
Maintained by David Megginson
org.xml.sax package
http://www.megginson.com/SAX/
Event based
Read-only
SAX omits DTD declarations

SAX2

Adds:
- Namespace support
- Optional Validation
- Optional Lexical events for comments, CDATA sections, entity references
A lot more configurable
Deprecates a lot of SAX1
Adapter classes convert between SAX2 and SAX1 parsers.
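
As a rough illustration of the configurability listed above, the sketch below switches SAX2 features on and off by their standard feature URIs and then reads the namespace URI the parser reports for the root element. The class name SAX2FeaturesSketch, the method rootNamespace(), and the sample document are invented for this example; recent JDKs mark XMLReaderFactory as deprecated, though it still works.

```java
import java.io.StringReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical example class; not part of the original slides
class SAX2FeaturesSketch {

  // Standard SAX2 feature names
  static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
  static final String VALIDATION = "http://xml.org/sax/features/validation";

  // Returns the namespace URI reported for the root element
  static String rootNamespace(String xml) throws Exception {
    XMLReader parser = XMLReaderFactory.createXMLReader();
    parser.setFeature(NAMESPACES, true);   // SAX2 namespace support
    parser.setFeature(VALIDATION, false);  // validation is optional

    final String[] uri = new String[1];
    parser.setContentHandler(new DefaultHandler() {
      public void startElement(String namespaceURI, String localName,
       String qName, Attributes atts) {
        // remember only the first (root) element's namespace
        if (uri[0] == null) uri[0] = namespaceURI;
      }
    });
    parser.parse(new InputSource(new StringReader(xml)));
    return uri[0];
  }

  public static void main(String[] args) throws Exception {
    String xml = "<s:song xmlns:s=\"http://www.example.com/song\"/>";
    System.out.println(rootNamespace(xml));
  }

}
```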

The SAX Process

Use the XMLReaderFactory class to get a parser-specific implementation of the XMLReader interface
Your code registers a ContentHandler with the parser
An InputSource feeds the document into the parser
As the document is read, the parser calls back to the methods of the ContentHandler to tell it what it's seeing in the document.
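
The four steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions rather than code from the examples in this talk: the class name SAXProcessSketch, the ElementCounter handler, and the inline sample document are made up, and the handler extends DefaultHandler, a convenience class that implements ContentHandler with do-nothing methods. Recent JDKs mark XMLReaderFactory as deprecated, though it still works.

```java
import java.io.StringReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical example class; not part of the original slides
class SAXProcessSketch {

  // A ContentHandler that counts the start tags the parser reports
  static class ElementCounter extends DefaultHandler {
    int count = 0;

    public void startElement(String uri, String localName,
     String qName, Attributes atts) {
      count++;
    }
  }

  static int countElements(String xml) throws Exception {
    // 1. get a parser-specific XMLReader
    XMLReader parser = XMLReaderFactory.createXMLReader();
    // 2. register the ContentHandler
    ElementCounter counter = new ElementCounter();
    parser.setContentHandler(counter);
    // 3. feed the document in through an InputSource
    // 4. the parser calls back to the handler while it reads
    parser.parse(new InputSource(new StringReader(xml)));
    return counter.count;
  }

  public static void main(String[] args) throws Exception {
    String xml
     = "<players><player><surname>Anderson</surname></player></players>";
    System.out.println(countElements(xml) + " elements"); // prints "3 elements"
  }

}
```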

Event Based API Caveats

You do not always have all the information you need at the time of a given callback
You may need to store information in various data structures (stacks, queues, vectors, arrays, etc.) and act on it at a later point
For example, the characters() method is not guaranteed to give you the maximum number of contiguous characters. It may split a single run of characters over multiple method calls.
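
One common way to cope with split characters() calls is to accumulate text in a buffer and act on it only when the end tag arrives. The sketch below assumes that approach; the class name SurnameCollector, the collect() method, and the sample document are hypothetical. An entity reference is placed inside the text because parsers often break a run of characters at that point.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical example class; not part of the original slides
class SurnameCollector extends DefaultHandler {

  private final StringBuilder text = new StringBuilder();
  private final List<String> surnames = new ArrayList<String>();

  public void startElement(String uri, String localName,
   String qName, Attributes atts) {
    text.setLength(0); // start a fresh buffer for each element
  }

  public void characters(char[] ch, int start, int length) {
    // may be called several times for one run of text, so append
    text.append(ch, start, length);
  }

  public void endElement(String uri, String localName, String qName) {
    // fall back to the qualified name if no local name is reported
    String name = localName.length() > 0 ? localName : qName;
    if (name.equals("surname")) surnames.add(text.toString());
  }

  static List<String> collect(String xml) throws Exception {
    SurnameCollector handler = new SurnameCollector();
    XMLReader parser = XMLReaderFactory.createXMLReader();
    parser.setContentHandler(handler);
    parser.parse(new InputSource(new StringReader(xml)));
    return handler.surnames;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<players><player><surname>O&apos;Brien</surname>"
     + "</player></players>";
    System.out.println(collect(xml)); // prints [O'Brien]
  }

}
```

Acting on the text inside characters() itself would risk seeing only part of the surname.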

Document Object Model

Defines how XML and HTML documents are represented as objects in programs
W3C Standard
Defined in IDL; thus language independent
HTML as well as XML
Writing as well as reading
More complete than SAX or JDOM; covers everything except internal and external DTD subsets
DOM focuses more on the document; SAX focuses more on the parser.

The Design of the DOM API

Parser independent interfaces; parser dependent implementation classes. Most programs must use the parser dependent classes. JAXP helps solve this, but so far only for DOM Level 1.
Everything's a Node:
- Extensive use of polymorphism
- Lots of casting
Language independence means there's very limited use of the Java class library; various features are reinvented
Language independence requires no method overloading because not all languages support it.
Several features are poor design in Java, if not in other languages:
- Named constants are often shorts
- Only one kind of exception; details provided by constants
- No Java-specific utility methods like equals(), hashCode(), clone(), or toString()
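
To make the "everything's a Node" point concrete, here is a small sketch of the switch-then-cast idiom that DOM code tends to need. It assumes JAXP's DocumentBuilderFactory as a parser-independent way to get a Document; the class name NodeCastingSketch and the methods describe() and parse() are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of the original slides
class NodeCastingSketch {

  // Node types are short constants, so code switches on them
  // and then casts to the more specific interface
  static String describe(Node node) {
    switch (node.getNodeType()) {
      case Node.ELEMENT_NODE:
        Element element = (Element) node;
        return "element " + element.getTagName();
      case Node.TEXT_NODE:
        Text text = (Text) node;
        return "text \"" + text.getData() + "\"";
      case Node.COMMENT_NODE:
        return "comment";
      default:
        return "other node of type " + node.getNodeType();
    }
  }

  static Document parse(String xml) throws Exception {
    return DocumentBuilderFactory.newInstance().newDocumentBuilder()
     .parse(new InputSource(new StringReader(xml)));
  }

  public static void main(String[] args) throws Exception {
    Document doc = parse(
     "<player><surname>Anderson</surname><!-- a comment --></player>");
    Node root = doc.getDocumentElement();
    System.out.println(describe(root));
    NodeList children = root.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      System.out.println(describe(children.item(i)));
    }
  }

}
```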

DOM Evolution

DOM Level 0:
DOM Level 1, a W3C Standard
DOM Level 2, a W3C Standard
DOM Level 3, eventual W3C Standard to add schema and DTD support

Eight Modules:

Module	Java package
Core	org.w3c.dom *
HTML	org.w3c.dom.html
Views	org.w3c.dom.views
StyleSheets	org.w3c.dom.stylesheets
CSS	org.w3c.dom.css
Events	org.w3c.dom.events *
Traversal	org.w3c.dom.traversal *
Range	org.w3c.dom.range

Only the core and traversal modules really apply to XML. The other six are for HTML.
* indicates Xerces support

DOM Trees

Each XML document should contain exactly one tree.
A tree contains nodes.
Some nodes may contain other nodes (depending on node type).
Each document node contains:
- zero or one doctype nodes
- one root element node
- zero or more comment and processing instruction nodes

org.w3c.dom

17 interfaces:

DOM Interface	JDOM Equivalent
`Attr`	`Attribute`
`CDATASection`	`CDATA`
`CharacterData`
`Comment`	`Comment`
`Document`	`Document`
`DocumentFragment`
`DocumentType`	`DocType`
`DOMImplementation`
`Element`	`Element`
`Entity`	`Entity`
`EntityReference`
`NamedNodeMap`
`Node`
`NodeList`
`Notation`
`ProcessingInstruction`	`ProcessingInstruction`
`Text`	`java.lang.String`

plus one exception: DOMException
Plus a bunch of HTML stuff in org.w3c.dom.html and other packages we will ignore

The DOM Process

Library specific code creates a parser
The parser parses the document and returns an org.w3c.dom.Document object.
The entire document is stored in memory.
DOM methods and interfaces are used to extract data from this object
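
A minimal sketch of this process follows. It swaps the library-specific parser creation for JAXP's DocumentBuilderFactory so that it runs without naming a particular parser class; that substitution, the class name DOMProcessSketch, and the surnames() method are assumptions made for the example. Only DOM Level 1 methods are used to pull the data out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of the original slides
class DOMProcessSketch {

  static List<String> surnames(String xml) throws Exception {
    // create a parser (here through JAXP rather than a parser class)
    DocumentBuilder builder
     = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    // parse; the whole document is now held in memory
    Document doc = builder.parse(new InputSource(new StringReader(xml)));

    // extract data with DOM methods
    NodeList nodes = doc.getElementsByTagName("surname");
    List<String> result = new ArrayList<String>();
    for (int i = 0; i < nodes.getLength(); i++) {
      // the element's text lives in a child Text node
      Node first = nodes.item(i).getFirstChild();
      result.add(first == null ? "" : first.getNodeValue());
    }
    return result;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<players>"
     + "<player><surname>Anderson</surname></player>"
     + "<player><surname>Baughman</surname></player>"
     + "</players>";
    System.out.println(surnames(xml)); // prints [Anderson, Baughman]
  }

}
```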

The JDOM Process

Construct an org.jdom.input.SAXBuilder or an org.jdom.input.DOMBuilder; no parser specific code is needed!
Invoke the builder's build() method to build a Document object from a
- Reader
- InputStream
- URL
- File
- String containing a SYSTEM ID
If there's a problem building the document, a JDOMException is thrown
Work with the resulting Document object

Parsing a Document with JDOM

import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;


public class JDOMChecker {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java JDOMChecker URL1 URL2..."); 
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        builder.build(args[i]);
        // If there are no well-formedness errors, 
        // then no exception is thrown
        System.out.println(args[i] + " is well formed.");
      }
      catch (JDOMException e) { // indicates a well-formedness or other error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

}

Parser Results

% java JDOMChecker shortlogs.xml HelloJDOM.java
shortlogs.xml is well formed.
HelloJDOM.java is not well formed.
The markup in the document preceding the root element must be well-formed.: 
Error on line 1 of XML document: The markup in the document preceding the 
root element must be well-formed.

Turning on Validation in JDOM

Not all parsers are validating but Xerces-J is.
Validity errors are not fatal; therefore they do not necessarily cause a JDOMException
However, you can tell the builder you want it to validate by passing true to its constructor:
```
    SAXBuilder builder = new SAXBuilder(true);
```

JDOM Validator

import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;


public class Validator {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java Validator URL1 URL2..."); 
    } 
      
    SAXBuilder builder = new SAXBuilder(true);
                             /*         ^^^^       */
                             /* Turn on validation */
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        builder.build(args[i]);
        // If there are no well-formedness or validity errors, 
        // then no exception is thrown
        System.out.println(args[i] + " is valid.");
      }
      catch (JDOMException e) { // indicates a well-formedness or validity error
        System.out.println(args[i] + " is not valid.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

}

Validation Output

% java Validator invalid_fibonacci.xml
invalid_fibonacci.xml is not valid.
Element type "title" must be declared.: Error on line 8 of XML document: 
Element type "title" must be declared.

% java Validator validfibonacci.xml
validfibonacci.xml is valid.

Building with DOM instead of SAX

Use DOMBuilder instead of SAXBuilder
Must have an existing DOM tree, specifically an org.w3c.dom.Document (Note the name conflict with org.jdom.Document)
DOM validation is currently broken.
Approximately doubles the memory usage.
In general, SAX is easier and more efficient.

DOMBuilder Example

import org.jdom.*;
import org.jdom.input.DOMBuilder;
import org.apache.xerces.parsers.*;


public class DOMValidator {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java DOMValidator URL1 URL2..."); 
    }      
      
    DOMBuilder builder = new DOMBuilder(true);
                             /*         ^^^^       */
                             /* Turn on validation */
    // start parsing... 
    DOMParser parser = new DOMParser();  // Xerces specific class
    for (int i = 0; i < args.length; i++) {
        
      try {
        // Read the entire document into memory
        parser.parse(args[i]); 
    
        org.w3c.dom.Document domDoc  = parser.getDocument();
        org.jdom.Document    jdomDoc = builder.build(domDoc);

        // If there are no validity errors, 
        // then no exception is thrown
        System.out.println(args[i] + " is valid.");
      }
      catch (Exception e) { // indicates a well-formedness or validity error
        System.out.println(args[i] + " is not valid.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

}

Reading XML Documents

One program, three implementations:

SAX
DOM
JDOM

UserLand's RSS based list of Web logs

UserLand's RSS based list of Web logs at http://static.userland.com/weblogMonitor/logs.xml:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd">
<weblogs>
    <log>
        <name>MozillaZine</name>
        <url>http://www.mozillazine.org</url>
        <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>
        <ownerName>Jason Kersey</ownerName>
        <ownerEmail>kerz@en.com</ownerEmail>
        <description>THE source for news on the Mozilla Organization.  DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
        <imageUrl></imageUrl>
        <adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif</adImageUrl>
        </log>
    <log>
        <name>SalonHerringWiredFool</name>
        <url>http://www.salonherringwiredfool.com/</url>
        <ownerName>Some Random Herring</ownerName>
        <ownerEmail>salonfool@wiredherring.com</ownerEmail>
        <description></description>
        </log>
    <log>
        <name>SlashDot.Org</name>
        <url>http://www.slashdot.org/</url>
        <ownerName>Simply a friend</ownerName>
        <ownerEmail>afriendofweblogs@weblogs.com</ownerEmail>
        <description>News for Nerds, Stuff that Matters.</description>
        </log>
    </weblogs>

Full list

Goal: Return a list of all the URLs in this list as java.net.URL objects

Design Decisions

Should we return an array, an Enumeration, a List, or what?
Perhaps we should use multiple threads?

The SAX ContentHandler interface

package org.xml.sax;


public interface ContentHandler {

  public void setDocumentLocator(Locator locator);
    
  public void startDocument() throws SAXException;
    
  public void endDocument() throws SAXException;
    
  public void startPrefixMapping(String prefix, String uri) 
   throws SAXException;

  public void endPrefixMapping(String prefix) throws SAXException;

  public void startElement(String namespaceURI, String localName,
   String rawName, Attributes atts) throws SAXException;

  public void endElement(String namespaceURI, String localName,
   String qualifiedName) throws SAXException;

  public void characters(char[] ch, int start, int length) 
   throws SAXException;

  public void ignorableWhitespace(char[] ch, int start, int length)
   throws SAXException;

  public void processingInstruction(String target, String data)
   throws SAXException;

  public void skippedEntity(String name) throws SAXException;
     
}

SAX Design

We do not know how many URLs there will be when we start parsing, so let's use a Vector.
Single threaded for simplicity, but a real program would use multiple threads:
- One to load and parse the data
- Another thread (probably the main thread) to serve the data
- Early data could be provided before the entire document had been read
The character data of each url element needs to be stored. Everything else can be ignored.
A startElement() with the name url indicates that we need to start storing this data.
An endElement() with the name url indicates that we need to stop storing this data, convert it to a URL, and put it in the Vector.
Should we hide the XML parsing inside a non-public class to avoid accidentally calling the methods from unexpected places or threads?
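
The multi-threaded variant mentioned above could look roughly like this. It is a sketch only, not part of the original example: the class name ThreadedGrabber, the END sentinel, and the inline test document are all invented, and it uses java.util.concurrent and the JDK's bundled JAXP parser, neither of which the single-threaded code in these slides relies on. One thread parses and queues each URL as soon as its end-tag is seen; the calling thread consumes them without waiting for the end of the document.

```java
import java.io.StringReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ThreadedGrabber {

  // Sentinel marking the end of the stream (a hypothetical convention).
  static final Object END = new Object();

  // Parses xml on a background thread and returns the queue it fills.
  static BlockingQueue<Object> startParsing(final String xml) {
    final BlockingQueue<Object> queue = new LinkedBlockingQueue<Object>();
    Thread loader = new Thread(new Runnable() {
      public void run() {
        try {
          SAXParserFactory.newInstance().newSAXParser().parse(
              new InputSource(new StringReader(xml)),
              new DefaultHandler() {
                private StringBuffer buffer;  // null outside a url element
                public void startElement(String uri, String localName,
                    String qName, Attributes atts) {
                  if (qName.equals("url")) buffer = new StringBuffer();
                }
                public void characters(char[] text, int start, int length) {
                  if (buffer != null) buffer.append(text, start, length);
                }
                public void endElement(String uri, String localName,
                    String qName) {
                  if (!qName.equals("url")) return;
                  try {
                    queue.add(new URL(buffer.toString().trim()));
                  }
                  catch (MalformedURLException e) {
                    // skip this url
                  }
                  buffer = null;
                }
              });
        }
        catch (Exception e) {
          System.err.println(e);
        }
        finally {
          queue.add(END);  // always unblock the consumer
        }
      }
    });
    loader.start();
    return queue;
  }

  // Drains the queue on the calling thread until the sentinel arrives.
  static List<URL> collect(String xml) {
    BlockingQueue<Object> queue = startParsing(xml);
    List<URL> urls = new ArrayList<URL>();
    try {
      for (Object o = queue.take(); o != END; o = queue.take()) {
        urls.add((URL) o);
      }
    }
    catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return urls;
  }

  public static void main(String[] args) {
    String xml = "<weblogs><log><url>http://www.slashdot.org/</url></log>"
               + "<log><url>http://www.mozillazine.org</url></log></weblogs>";
    System.out.println(collect(xml));
  }

}
```

A server would call startParsing() directly and hand out URLs as they arrive; collect() is there only to show the consumer side in a few lines.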

User Interface Class

import org.xml.sax.*;
import org.xml.sax.helpers.XMLReaderFactory;
import java.util.*;
import java.io.*;


public class WeblogsSAX {
     
  public static List listChannels() 
   throws IOException, SAXException {
    return listChannels(
     "http://static.userland.com/weblogMonitor/logs.xml"); 
  }
  
  public static List listChannels(String uri) 
   throws IOException, SAXException {
    
    XMLReader parser = XMLReaderFactory.createXMLReader();
    Vector urls = new Vector(1000);
    URIGrabber u = new URIGrabber(urls);
    parser.setContentHandler(u);
    parser.parse(uri);
    return urls;
    
  }
  
  public static void main(String[] args) {
   
    try {
      List urls;
      if (args.length > 0) urls = listChannels(args[0]);
      else urls = listChannels();
      Iterator iterator = urls.iterator();
      while (iterator.hasNext()) {
        System.out.println(iterator.next()); 
      }
    }
    catch (IOException e) {
      System.err.println(e); 
    }
    catch (SAXParseException e) {
      System.err.println(e); 
      System.err.println("at line " + e.getLineNumber() 
       + ", column " + e.getColumnNumber()); 
    }
    catch (SAXException e) {
      System.err.println(e); 
    }
    catch (/* Unexpected */ Exception e) {
      e.printStackTrace(); 
    }
    
  }
  
}

ContentHandler Class

import org.xml.sax.*;
import java.net.*;
import java.util.Vector;

             // conflicts with java.net.ContentHandler
class URIGrabber implements org.xml.sax.ContentHandler {
    
  private Vector urls;
     
  URIGrabber(Vector urls) {
    this.urls = urls;
  }
    
  // do nothing methods  
  public void setDocumentLocator(Locator locator) {}
  public void startDocument() throws SAXException {}
  public void endDocument() throws SAXException {}
  public void startPrefixMapping(String prefix, String uri) 
   throws SAXException {}
  public void endPrefixMapping(String prefix) throws SAXException {}
  public void skippedEntity(String name) throws SAXException {}  
  public void ignorableWhitespace(char[] text, int start, int length)
   throws SAXException {}
  public void processingInstruction(String target, String data)
   throws SAXException {}
  
  
  // Remember, there's no guarantee all the text of the
  // url element will be returned in a single call to characters
  private StringBuffer urlBuffer;
  private boolean collecting = false;
  
  public void startElement(String namespaceURI, String localName,
   String rawName, Attributes atts) throws SAXException {
	  
    if (rawName.equals("url")) {
      collecting = true;
      urlBuffer = new StringBuffer();
    } 
    
  }
  
  public void characters(char[] text, int start, int length) 
   throws SAXException {
    
    if (collecting) {
      urlBuffer.append(text, start, length);
    } 
    
  }
  
  public void endElement(String namespaceURI, String localName,
   String rawName) throws SAXException {
	  
    if (rawName.equals("url")) {
      collecting = false;
      String url = urlBuffer.toString();
      try {
        urls.addElement(new URL(url));
      }
      catch (MalformedURLException e) {
        // skip this url
      }
    }
    
  } 
    
}

Weblogs Output

% java WeblogsSAX shortlogs.xml
http://www.mozillazine.org
http://www.salonherringwiredfool.com/
http://www.slashdot.org/

Weblogs with DOM

This example is very sequential so SAX fits it nicely
Let's look at the port to DOM

DOM Design

We cannot easily find out how many URLs there will be when we start parsing, even though they're all in memory.
Single threaded by nature; no benefit to multiple threads since no data will be available until the entire document has been read and parsed.
The character data of each url element needs to be read. Everything else can be ignored.
We can use NodeIterator to walk the tree.
We can use NodeFilter to select only the url elements.
The XML parsing is so straightforward it can be done inside one method. No extra class is required.

The Node Interface

package org.w3c.dom;

public interface Node {

  // NodeType
  public static final short ELEMENT_NODE                = 1;
  public static final short ATTRIBUTE_NODE              = 2;
  public static final short TEXT_NODE                   = 3;
  public static final short CDATA_SECTION_NODE          = 4;
  public static final short ENTITY_REFERENCE_NODE       = 5;
  public static final short ENTITY_NODE                 = 6;
  public static final short PROCESSING_INSTRUCTION_NODE = 7;
  public static final short COMMENT_NODE                = 8;
  public static final short DOCUMENT_NODE               = 9;
  public static final short DOCUMENT_TYPE_NODE          = 10;
  public static final short DOCUMENT_FRAGMENT_NODE      = 11;
  public static final short NOTATION_NODE               = 12;

  public String       getNodeName();
  public String       getNodeValue() throws DOMException;
  public void         setNodeValue(String nodeValue) throws DOMException;
  public short        getNodeType();
  public Node         getParentNode();
  public NodeList     getChildNodes();
  public Node         getFirstChild();
  public Node         getLastChild();
  public Node         getPreviousSibling();
  public Node         getNextSibling();
  public NamedNodeMap getAttributes();
  public Document     getOwnerDocument();
  public Node         insertBefore(Node newChild, Node refChild) throws DOMException;
  public Node         replaceChild(Node newChild, Node oldChild) throws DOMException;
  public Node         removeChild(Node oldChild) throws DOMException;
  public Node         appendChild(Node newChild) throws DOMException;
  public boolean      hasChildNodes();
  public Node         cloneNode(boolean deep);
  public void         normalize();
  public boolean      supports(String feature, String version);
  public String       getNamespaceURI();
  public String       getPrefix();
  public void         setPrefix(String prefix) throws DOMException;
  public String       getLocalName();
  
}

The NodeIterator Interface

package org.w3c.dom.traversal;

public interface NodeIterator {

  public Node       nextNode()     throws DOMException;
  public Node       previousNode() throws DOMException;
  public int        getWhatToShow();
  public NodeFilter getFilter();
  public boolean    getExpandEntityReferences();
  public void       detach();
    
}

The NodeFilter Interface

package org.w3c.dom.traversal;

public interface NodeFilter {

    // Constants returned by acceptNode
    public static final short   FILTER_ACCEPT = 1;
    public static final short   FILTER_REJECT = 2;
    public static final short   FILTER_SKIP   = 3;

    public short acceptNode(Node n);
    
    // Constants for whatToShow
    public static final int     SHOW_ALL                    = 0x0000FFFF;
    public static final int     SHOW_ELEMENT                = 0x00000001;
    public static final int     SHOW_ATTRIBUTE              = 0x00000002;
    public static final int     SHOW_TEXT                   = 0x00000004;
    public static final int     SHOW_CDATA_SECTION          = 0x00000008;
    public static final int     SHOW_ENTITY_REFERENCE       = 0x00000010;
    public static final int     SHOW_ENTITY                 = 0x00000020;
    public static final int     SHOW_PROCESSING_INSTRUCTION = 0x00000040;
    public static final int     SHOW_COMMENT                = 0x00000080;
    public static final int     SHOW_DOCUMENT               = 0x00000100;
    public static final int     SHOW_DOCUMENT_TYPE          = 0x00000200;
    public static final int     SHOW_DOCUMENT_FRAGMENT      = 0x00000400;
    public static final int     SHOW_NOTATION               = 0x00000800;

}
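
How whatToShow and acceptNode() work together is easiest to see in a small run. The sketch below is illustrative only: the class name UrlTextIterator and the inline test document are invented, and it assumes the JDK's bundled DOM implementation, whose Document objects also implement the standard DocumentTraversal interface. SHOW_TEXT restricts the iterator to text nodes before the filter is consulted, so the filter only has to check the parent.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

class UrlTextIterator {

  // Returns the trimmed value of every text node whose parent is a url element.
  static List<String> urlTexts(String xml) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
    if (!(doc instanceof DocumentTraversal)) {
      throw new IllegalStateException("This DOM does not support traversal");
    }
    DocumentTraversal traversal = (DocumentTraversal) doc;

    // Only text nodes reach this filter because of SHOW_TEXT below
    NodeFilter urlOnly = new NodeFilter() {
      public short acceptNode(Node n) {
        Node parent = n.getParentNode();
        if (parent != null && parent.getNodeName().equals("url")) {
          return NodeFilter.FILTER_ACCEPT;
        }
        return NodeFilter.FILTER_SKIP;
      }
    };
    NodeIterator iterator = traversal.createNodeIterator(
        doc, NodeFilter.SHOW_TEXT, urlOnly, true);

    List<String> result = new ArrayList<String>();
    for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
      result.add(n.getNodeValue().trim());
    }
    iterator.detach();  // tell the implementation we are done
    return result;
  }

  public static void main(String[] args) {
    String xml = "<weblogs><log><name>SlashDot.Org</name>"
               + "<url>http://www.slashdot.org/</url></log></weblogs>";
    System.out.println(urlTexts(xml));
  }

}
```

Casting to DocumentTraversal instead of a parser-specific class keeps the code independent of any one DOM implementation, at the cost of the instanceof check.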

Weblogs with DOM

import org.w3c.dom.*;
import org.w3c.dom.traversal.*;
import org.xml.sax.SAXException;
import java.io.IOException;
import java.util.*;
import java.net.*;


public class WeblogsDOM {

  public static String DEFAULT_URL 
   = "http://static.userland.com/weblogMonitor/logs.xml"; 
     
  public static List listChannels() throws DOMException {
    return listChannels(DEFAULT_URL); 
  }
  
  public static List listChannels(String uri) throws DOMException {
    
    if (uri == null) {
      throw new NullPointerException("URL must be non-null");   
    }

    org.apache.xerces.parsers.DOMParser parser 
     = new org.apache.xerces.parsers.DOMParser();
    
    Vector urls = new Vector(100);  // never null, even if parsing fails
    
    try {
      // Read the entire document into memory
      parser.parse(uri); 
      Document doc = parser.getDocument();
      org.apache.xerces.dom.DocumentImpl impl 
       = (org.apache.xerces.dom.DocumentImpl) doc;
      NodeIterator iterator = impl.createNodeIterator(doc, 
       NodeFilter.SHOW_ALL, new URLFilter(), true);

      Node current = null;
      while ((current = iterator.nextNode()) != null) {
        try {
          String content = current.getNodeValue();
          URL u = new URL(content);
          urls.addElement(u);
        }
        catch (MalformedURLException e) {
          // bad input data from one third party; just ignore it 
        }
      }
    }
    catch (SAXException e) {
      System.err.println(e); 
    }
    catch (IOException e) {
      System.err.println(e); 
    }
    
    return urls;
    
  }
  
  static class URLFilter implements NodeFilter {
        
    public short acceptNode(Node n) {
      
      if (n instanceof Text) {
        Node parent = n.getParentNode();
        if (parent instanceof Element) {
          Element e = (Element) parent;
          if (e.getTagName().equals("url")) {
            return NodeFilter.FILTER_ACCEPT;       
          }
        }
      }
      
      return NodeFilter.FILTER_REJECT;
      
    }
    
  }
    
  public static void main(String[] args) {
     
    try {
      List urls;
      if (args.length > 0) {
        try {
          URL url = new URL(args[0]);
          urls = listChannels(args[0]);
        }
        catch (MalformedURLException e) {
          System.err.println("Usage: java WeblogsDOM url");
          return;
        }
      }
      else {
        urls = listChannels();
      }
      Iterator iterator = urls.iterator();
      while (iterator.hasNext()) {
        System.out.println(iterator.next()); 
      }
    }
    catch (/* Unexpected */ Exception e) {
      e.printStackTrace(); 
    }
    
  } // end main

}

Weblogs Output

% java WeblogsDOM
http://2020Hindsight.editthispage.com/
http://www.sff.net/people/mitchw/weblog/weblog.htp
http://nate.weblogs.com/
http://plugins.launchpoint.net
http://404.psistorm.net
http://home.att.net/~geek9000
http://daubnet.tzo.com/weblog
several hundred more...

Weblogs with JDOM

Let's look at the port to JDOM

JDOM Design

We can easily find out how many URLs there will be when we start parsing.
Single threaded by nature; no benefit to multiple threads since no data will be available until the entire document has been read and parsed.
The character data of each url element needs to be read. Everything else can be ignored.
The format is very straightforward so we don't need to traverse the entire tree.
The XML parsing is so straightforward it can be done inside one method. No extra class is required.

Weblogs with JDOM

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;
import java.net.*;


public class WeblogsJDOM {
   
  public static String DEFAULT_SYSTEM_ID 
   = "http://static.userland.com/weblogMonitor/logs.xml"; 
     
  public static List listChannels() throws JDOMException {
    return listChannels(DEFAULT_SYSTEM_ID); 
  }
  
  public static List listChannels(String systemID) 
   throws JDOMException, NullPointerException {
    
    if (systemID == null) {
      throw new NullPointerException("URL must be non-null");   
    }
    
    SAXBuilder builder = new SAXBuilder();
    // Load the entire document into memory 
    // from the network or file system
    Document doc = builder.build(systemID);
    
    // Descend the tree and find the URLs. It helps that
    // the document has a very regular structure.
    Element weblogs = doc.getRootElement();
    List logs = weblogs.getChildren("log");
    Vector urls = new Vector(logs.size());
    Iterator iterator = logs.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      Element log = (Element) o;
      try {
                         // This will probably be changed to 
                         //  getElement() or getChildElement() 
        Element url = log.getChild("url"); 
        if (url == null) continue;
        String content = url.getTextTrim();
        URL u = new URL(content);
        urls.addElement(u);
      }
      catch (MalformedURLException e) {
        // bad input data from one third party; just ignore it 
      }
    }
    return urls;
    
  }
  
  public static void main(String[] args) {
   
    try {
      List urls;
      if (args.length > 0) {
        urls = listChannels(args[0]);
      }
      else {
        urls = listChannels();
      }
      Iterator iterator = urls.iterator();
      while (iterator.hasNext()) {
        System.out.println(iterator.next()); 
      }
    }
    catch (/* Unexpected */ Exception e) {
      e.printStackTrace(); 
    }
    
  }
  
}

Weblogs Output

% java WeblogsJDOM
http://2020Hindsight.editthispage.com/
http://www.sff.net/people/mitchw/weblog/weblog.htp
http://nate.weblogs.com/
http://plugins.launchpoint.net
http://404.psistorm.net
http://home.att.net/~geek9000
http://daubnet.tzo.com/weblog
several hundred more...

The org.jdom Package

The classes that represent an XML document and its parts

Document
Element
Attribute
Comment
DocType
Entity
ProcessingInstruction
Verifier
plus assorted exceptions

The Document Node

The root node containing the entire document; not the same as the root element
Contains:
- one element
- zero or more processing instructions
- zero or more comments
- zero or one document type declarations

The Document Class

package org.jdom;

public class Document implements Serializable, Cloneable {

  protected List    content;
  protected Element rootElement;
  protected DocType docType;

  protected Document() {}
  public    Document(Element rootElement) {}
  public    Document(Element rootElement, DocType docType) {}

  public Element   getRootElement() {}
  public Document  setRootElement(Element rootElement) {}
  public DocType   getDocType() {}
  public Document  setDocType(DocType docType) {}
  public List      getProcessingInstructions() {}
  public List      getProcessingInstructions(String target) {}
  public ProcessingInstruction getProcessingInstruction(String target)
    throws NoSuchProcessingInstructionException {}
  public Document  addProcessingInstruction(ProcessingInstruction pi) {}
  public Document  addProcessingInstruction(String target, String data) {}
  public Document  addProcessingInstruction(String target, Map data) {}
  public Document  setProcessingInstructions(List processingInstructions) {}
  public boolean   removeProcessingInstruction(ProcessingInstruction processingInstruction) {}
  public boolean   removeProcessingInstruction(String target) {}
  public boolean   removeProcessingInstructions(String target) {}
  public Document  addComment(Comment comment) {}
  public List      getMixedContent() {}
  
  // basic utility methods
  public final String  toString() {}
  public final String  getSerializedForm() {}  // going away
  public final boolean equals(Object ob) {}
  public final int     hashCode() {}
  public final Object  clone() {}

}

Document Example

import org.jdom.Document;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import java.io.IOException;


public class XMLPrinter {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java XMLPrinter URL1 URL2..."); 
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        System.out.println("*************" + args[i] + "*************");
        XMLOutputter outputter = new XMLOutputter();
        outputter.output(doc, System.out);
      }
      catch (JDOMException e) { // indicates a well-formedness or other error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
      catch (IOException e) { // shouldn't happen because System.out eats exceptions
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

}

Output from XMLPrinter

% java XMLPrinter shortlogs.xml
*************shortlogs.xml*************
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd"><weblogs>
        <log>
                <name>MozillaZine</name>
                <url>http://www.mozillazine.org</url>
                <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>

                <ownerName>Jason Kersey</ownerName>
                <ownerEmail>kerz@en.com</ownerEmail>
                <description>THE source for news on the Mozilla Organization.  DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
                <imageUrl />
                <adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif</adImageUrl>
                </log>
        <log>
                <name>SalonHerringWiredFool</name>
                <url>http://www.salonherringwiredfool.com/</url>
                <ownerName>Some Random Herring</ownerName>
                <ownerEmail>salonfool@wiredherring.com</ownerEmail>
                <description />
                </log>
        <log>
                <name>SlashDot.Org</name>
                <url>http://www.slashdot.org/</url>
                <ownerName>Simply a friend</ownerName>
                <ownerEmail>afriendofweblogs@weblogs.com</ownerEmail>
                <description>News for Nerds, Stuff that Matters.</description>
                </log>
        </weblogs>

Element Nodes

Represents a complete element including its start tag, end tag, and content
Contains:
- Child Elements
- Processing Instructions
- Comments
- Text
JDOM enforces restrictions on element names and possibly values; e.g. a name cannot start with a digit.
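
To give a feel for the kind of name check involved, here is a deliberately simplified sketch. It is not JDOM's code: the class NameCheck and its method are hypothetical, and it handles ASCII only, whereas real XML names also allow many non-ASCII letters. Colons are rejected on the assumption that prefixes are supplied separately through a Namespace.

```java
class NameCheck {

  // Returns null if the name passes, otherwise the reason it was rejected.
  // ASCII-only approximation of the XML name rules.
  static String checkName(String name) {
    if (name == null || name.length() == 0) {
      return "Names cannot be null or empty";
    }
    char first = name.charAt(0);
    if (!isAsciiLetter(first) && first != '_') {
      return "Names cannot start with '" + first + "'";
    }
    for (int i = 1; i < name.length(); i++) {
      char c = name.charAt(i);
      boolean ok = isAsciiLetter(c) || (c >= '0' && c <= '9')
                || c == '_' || c == '-' || c == '.';
      if (!ok) {
        return "Names cannot contain '" + c + "'";
      }
    }
    return null;
  }

  private static boolean isAsciiLetter(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
  }

  public static void main(String[] args) {
    String[] names = {"TITLE", "_private", "3D", "my name"};
    for (int i = 0; i < names.length; i++) {
      String reason = checkName(names[i]);
      System.out.println(names[i] + ": " + (reason == null ? "OK" : reason));
    }
  }

}
```

Returning a reason string rather than a boolean lets the caller put a useful message into whatever exception it throws.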

Element Class Implementation

The content is stored as a java.util.List which contains
- One String object per text node
- One Element object per child element
- One Comment object per comment
- One CDATA object per CDATA section
- One ProcessingInstruction object per processing instruction
Use the regular methods of java.util.List to add, remove, and inspect the contents of an element
Since the methods of java.util.List expect to work with Object objects, casting back to JDOM types and String is frequent
Various utility methods mean you don't always have to work with the full list.
Attributes are available as a separate List since attributes are not children.
This list only contains Attribute objects.

The Element Class

package org.jdom;

public class Element implements Serializable, Cloneable {

    protected String    name;
    protected Namespace namespace;
    protected Element   parent;
    protected boolean   isRootElement;
    protected List      attributes;
    protected List      content;

    protected Element() {}
    public    Element(String name, Namespace namespace) {}
    public    Element(String name) {}
    public    Element(String name, String uri) {}
    public    Element(String name, String prefix, String uri) {}

    public String     getName() {}
    public Namespace  getNamespace() {}
    public String     getNamespacePrefix() {}
    public String     getNamespaceURI() {}
    public String     getQualifiedName() {}
    
    public Element    getParent() {}
    protected Element setParent(Element parent) {}
    public boolean    isRootElement() {}
    protected Element setIsRootElement(boolean isRootElement) {}
    

    public String    getText() {} 
    public String    getTextTrim() {} 
    public boolean   hasMixedContent() {} 
    public List      getMixedContent() {}
    public String    getChildText(String name) {} 
    public String    getChildTextTrim(String name) {} 
    public String    getChildText(String name, Namespace ns) {} 

    public Element   setMixedContent(List mixedContent) {} 
    public List      getChildren() {} 
    public Element   setChildren(List children) {} 
    public List      getChildren(String name, Namespace ns) {} 
    // will be renamed, probably getElement() {}
    public Element   getChild(String name, Namespace ns) {} 
    public Element   getChild(String name) {} 
    public boolean   removeChild(String name) {} 
    public boolean   removeChild(String name, Namespace ns) {} 
    public boolean   removeChildren(String name) {}
    public boolean   removeChildren(String name, Namespace ns) {} 
    public boolean   removeChildren() {} 
    
    public Element   addContent(String text) {}
    public Element   addContent(Element element) {} 
    public Element   addContent(ProcessingInstruction pi) {} 
    public Element   addContent(Entity entity) {} 
    public Element   addContent(Comment comment) {} 
    public Element   addContent(CDATA cdata) {} 
    public boolean   removeContent(Element element) {} 
    public boolean   removeContent(ProcessingInstruction pi) {} 
    public boolean   removeContent(Entity entity) {} 
    public boolean   removeContent(Comment comment) {} 
    
    public List      getAttributes() {} 
    public Attribute getAttribute(String name) {} 
    public Attribute getAttribute(String name, Namespace ns) {} 
    public String    getAttributeValue(String name) {} 
    public String    getAttributeValue(String name, Namespace ns) {} 
    public Element   setAttributes(List attributes) {} 
    public Element   addAttribute(Attribute attribute) {}
    public Element   addAttribute(String name, String value) {} 
    public boolean   removeAttribute(String name, String uri) {} 
    public boolean   removeAttribute(String name) {} 
    public boolean   removeAttribute(String name, Namespace ns) {} 
    
    public Element   getCopy(String name, Namespace ns) {}
    public Element   getCopy(String name, String uri) {}
    public Element   getCopy(String name, String prefix, String uri) {}

    /////////////////////////////////////////////////////////////////
    // Basic Utility Methods
    ///////////////////////////////////////////////////////////////// 
    public final String  toString() {}
    public final String  getSerializedForm() {}  // will be removed
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}
    
}

Element Example: XCount

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;


public class XCount {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java XCount URL1 URL2..."); 
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    System.out.println(
     "File\tElements\tAttributes\tComments\tProcessing Instructions\tCharacters");
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        System.out.print(args[i] + ":\t");
        String result = count(doc);
        System.out.println(result);
      }
      catch (JDOMException e) { // indicates a well-formedness or other error
        System.out.println(args[i] + " is not a well formed XML document.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }  

  private static int numCharacters             = 0;
  private static int numComments               = 0;
  private static int numElements               = 0;
  private static int numAttributes             = 0;
  private static int numProcessingInstructions = 0;
      
  public static String count(Document doc) {

    numCharacters = 0;
    numComments = 0;
    numElements = 0;
    numAttributes = 0;
    numProcessingInstructions = 0;  

    List children = doc.getMixedContent();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Element) {
        numElements++;
        count((Element) o);
      }
      else if (o instanceof Comment) numComments++;
      else if (o instanceof ProcessingInstruction) numProcessingInstructions++;   
    }
    
    String result = numElements + "\t" + numAttributes + "\t" 
     + numComments + "\t" + numProcessingInstructions + "\t" + numCharacters;
    return result;
       
  }     

  public static void count(Element element) {

    List attributes = element.getAttributes();
    numAttributes += attributes.size();
    List children = element.getMixedContent();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Element) {
        numElements++;
        count((Element) o);
      }
      else if (o instanceof Comment) numComments++;
      else if (o instanceof ProcessingInstruction) numProcessingInstructions++;   
      else if (o instanceof String) {
        String s = (String) o;
        numCharacters += s.length();
      }   
    }
        
  }  

}

XCount Output

% java XCount shortlogs.xml hotcop.xml
File    Elements        Attributes      Comments        Processing Instructions Characters
shortlogs.xml:  30      0       0       0       736
hotcop.xml:     11      8       2       1       95

Handling Attributes in JDOM

Each attribute is represented as an Attribute object
Each Attribute has:
- A local name, a String
- A value, a String
- A Namespace object (which may be Namespace.NO_NAMESPACE)
Everything else can be determined from these three items.

Convenience methods can convert the attribute value to various types like int or double
JDOM enforces restrictions on attribute names and values; e.g. value may not contain < or >
Attributes are stored in a java.util.List in the Element that contains them
This list only contains Attribute objects.
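
The convenience conversions come in two flavors: one that throws when the value cannot be converted and one that takes a default. The default-taking pattern can be sketched as below. This is not JDOM's source; the class AttributeValues and its methods are hypothetical stand-ins that operate on a plain String, such as one returned by getAttributeValue().

```java
class AttributeValues {

  // Parses value as an int, falling back to defaultValue when the
  // attribute is missing (null) or is not a well-formed integer.
  static int intValue(String value, int defaultValue) {
    if (value == null) return defaultValue;
    try {
      return Integer.parseInt(value.trim());
    }
    catch (NumberFormatException e) {
      return defaultValue;
    }
  }

  // Same idea for double values.
  static double doubleValue(String value, double defaultValue) {
    if (value == null) return defaultValue;
    try {
      return Double.parseDouble(value.trim());
    }
    catch (NumberFormatException e) {
      return defaultValue;
    }
  }

  public static void main(String[] args) {
    System.out.println(intValue("200", 0));     // a well-formed HEIGHT
    System.out.println(intValue("tall", 100));  // falls back to the default
    System.out.println(doubleValue("6.5", 0.0));
  }

}
```

The default-taking form suits optional attributes; the throwing form is the better choice when a bad value should not pass silently.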

The Attribute Class

package org.jdom;

public class Attribute implements Serializable, Cloneable {

    protected String    name;
    protected Namespace namespace;
    protected String    value;

    protected Attribute() {}
    public    Attribute(String name, String value, Namespace namespace) {}
    public    Attribute(String name, String prefix, String uri, String value) {}
    public    Attribute(String name, String value) {}

    public String    getName() {}
    public String    getQualifiedName() {}
    public String    getNamespacePrefix() {}
    public String    getNamespaceURI() {}
    public Namespace getNamespace() {}
    public String    getValue() {}
    public void      setValue(String value) {}

    /////////////////////////////////////////////////////////////////
    // Basic Utility Methods
    /////////////////////////////////////////////////////////////////

    public final String  toString() {}
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}

    /////////////////////////////////////////////////////////////////
    // Convenience Methods below here
    /////////////////////////////////////////////////////////////////

    public String  getValue(String defaultValue) {}
    public int     getIntValue(int defaultValue) {}
    public int     getIntValue() throws DataConversionException {}
    public long    getLongValue(long defaultValue) {}
    public long    getLongValue() throws DataConversionException {}
    public float   getFloatValue(float defaultValue) {}
    public float   getFloatValue() throws DataConversionException {}
    public double  getDoubleValue(double defaultValue) {}
    public double  getDoubleValue() throws DataConversionException {}
    public boolean getBooleanValue(boolean defaultValue) {}
    public boolean getBooleanValue() throws DataConversionException {}
    public char    getCharValue(char defaultValue) {}
    public char    getCharValue() throws DataConversionException {}

}
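
The convenience getters above come in two flavors: one throws DataConversionException when the value cannot be converted, the other returns a caller-supplied default. The pattern can be sketched in plain Java without JDOM. The names AttributeValues and ConversionException are hypothetical stand-ins, and the treatment of a null value is an assumption, not documented JDOM behavior.

```java
// Hypothetical stand-in for the two conversion styles; not JDOM code.
class AttributeValues {

  // Checked exception playing the role of DataConversionException
  static class ConversionException extends Exception {
    ConversionException(String message) {
      super(message);
    }
  }

  // Strict form: reports a bad value with a checked exception
  static int intValue(String value) throws ConversionException {
    if (value == null) {
      throw new ConversionException("No value to convert"); // assumption
    }
    try {
      return Integer.parseInt(value);
    }
    catch (NumberFormatException e) {
      throw new ConversionException("Not an int: " + value);
    }
  }

  // Lenient form: falls back to the caller's default instead of throwing
  static int intValue(String value, int defaultValue) {
    try {
      return intValue(value);
    }
    catch (ConversionException e) {
      return defaultValue;
    }
  }

  public static void main(String[] args) throws ConversionException {
    System.out.println(intValue("200"));        // 200
    System.out.println(intValue("wide", 100));  // 100
  }

}
```

The lenient form suits optional attributes such as HEIGHT or WIDTH; the strict form suits values the program cannot proceed without.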

IDTagger

import java.io.IOException;
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import java.util.*;


public class IDTagger {

  private static int id = 1;

  public static void processElement(Element element) {

    if (element.getAttribute("ID") == null) {
      element.addAttribute(new Attribute("ID", "_" + id));
      id = id + 1; 
    }
    
    // recursion
    List children = element.getChildren();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      processElement((Element) iterator.next());   
    }
    
  }

  public static void main(String[] args) {
     
    SAXBuilder builder = new SAXBuilder();
    
    for (int i = 0; i < args.length; i++) {
        
      try {
        // Read the entire document into memory
        Document document = builder.build(args[i]); 
       
        processElement(document.getRootElement());
        
        // now we serialize the document...
        XMLOutputter serializer = new XMLOutputter(); 
        serializer.output(document, System.out);
        System.out.flush();	        
      }
      catch (JDOMException e) {
        System.err.println(e);
        continue; 
      }
      catch (IOException e) {
        System.err.println(e);
        continue; 
      }
      
    }
  
  } // end main

}

Before IDTagger

<?xml version="1.0"?><backslash
xmlns:backslash="http://slashdot.org/backslash.dtd">

 <story>
    <title>The Onion to buy the New York Times</title>
    <url>http://slashdot.org/articles/00/02/19/1128240.shtml</url>
    <time>2000-02-19 17:25:15</time>
    <author>CmdrTaco</author>
    <department>stuff-to-read</department>
    <topic>media</topic>
    <comments>20</comments>
    <section>articles</section>
    <image>topicmedia.gif</image>
  </story>
 <story>
    <title>Al Gore's Webmaster Answers Your Questions</title>
    <url>http://slashdot.org/interviews/00/02/19/0932207.shtml</url>
    <time>2000-02-19 17:00:52</time>
    <author>Roblimo</author>
    <department>political-process-online</department>
    <topic>usa</topic>
    <comments>49</comments>
    <section>interviews</section>
    <image>topicus.gif</image>
  </story>
 <story>
    <title>Open Source Africa</title>
    <url>http://slashdot.org/articles/00/02/19/1016216.shtml</url>
    <time>2000-02-19 16:05:58</time>
    <author>emmett</author>
    <department>songs-by-toto</department>
    <topic>linux</topic>
    <comments>50</comments>
    <section>articles</section>
    <image>topiclinux.gif</image>
  </story>
 <story>
    <title>Microsoft Funded by NSA, Helps Spy on Win Users?</title>
    <url>http://slashdot.org/articles/00/02/19/0750247.shtml</url>
    <time>2000-02-19 14:07:04</time>
    <author>Roblimo</author>
    <department>deep-dark-conspiracy-theories</department>
    <topic>microsoft</topic>
    <comments>154</comments>
    <section>articles</section>
    <image>topicms.gif</image>
  </story>
 <story>
    <title>X-Men Trailer Released</title>
    <url>http://slashdot.org/articles/00/02/18/0829209.shtml</url>
    <time>2000-02-19 13:47:06</time>
    <author>emmett</author>
    <department>mutant</department>
    <topic>movies</topic>
    <comments>70</comments>
    <section>articles</section>
    <image>topicmovies.gif</image>
  </story>
 <story>
    <title>Connell Replies to "Grok" Comments</title>
    <url>http://slashdot.org/articles/00/02/18/202240.shtml</url>
    <time>2000-02-19 05:01:37</time>
    <author>Hemos</author>
    <department>replying-to-things</department>
    <topic>linux</topic>
    <comments>197</comments>
    <section>articles</section>
    <image>topiclinux.gif</image>
  </story>
 <story>
    <title>etoy.com Returns</title>
    <url>http://slashdot.org/yro/00/02/18/1739216.shtml</url>
    <time>2000-02-19 02:35:06</time>
    <author>nik</author>
    <department>NP:-gimme-shelter</department>
    <topic>internet</topic>
    <comments>77</comments>
    <section>yro</section>
    <image>topicinternet.jpg</image>
  </story>
 <story>
    <title>New Propaganda Series: Rebirth</title>
    <url>http://slashdot.org/articles/00/02/18/205232.shtml</url>
    <time>2000-02-19 01:05:26</time>
    <author>Hemos</author>
    <department>as-pretty-as-always</department>
    <topic>graphics</topic>
    <comments>120</comments>
    <section>articles</section>
    <image>topicgraphics3.gif</image>
  </story>
 <story>
    <title>Giving Back</title>
    <url>http://slashdot.org/features/00/02/18/1631224.shtml</url>
    <time>2000-02-18 22:27:26</time>
    <author>emmett</author>
    <department>salvation-army</department>
    <topic>news</topic>
    <comments>122</comments>
    <section>features</section>
    <image>topicnews.gif</image>
  </story>
 <story>
    <title>Connectix Considering Open Sourcing VGS?</title>
    <url>http://slashdot.org/articles/00/02/18/1050225.shtml</url>
    <time>2000-02-18 20:46:20</time>
    <author>emmett</author>
    <department>grain-of-salt</department>
    <topic>news</topic>
    <comments>93</comments>
    <section>articles</section>
    <image>topicnews.gif</image>
  </story>
</backslash>

After IDTagger

<?xml version="1.0" encoding="UTF-8"?>
<backslash ID="_1">
  <story ID="_2">
    <title ID="_3">The Onion to buy the New York Times</title>
    <url ID="_4">http://slashdot.org/articles/00/02/19/1128240.shtml</url>
    <time ID="_5">2000-02-19 17:25:15</time>
    <author ID="_6">CmdrTaco</author>
    <department ID="_7">stuff-to-read</department>
    <topic ID="_8">media</topic>
    <comments ID="_9">20</comments>
    <section ID="_10">articles</section>
    <image ID="_11">topicmedia.gif</image>
  </story>
  <story ID="_12">
    <title ID="_13">Al Gore's Webmaster Answers Your Questions</title>
    <url ID="_14">http://slashdot.org/interviews/00/02/19/0932207.shtml</url>
    <time ID="_15">2000-02-19 17:00:52</time>
    <author ID="_16">Roblimo</author>
    <department ID="_17">political-process-online</department>
    <topic ID="_18">usa</topic>
    <comments ID="_19">49</comments>
    <section ID="_20">interviews</section>
    <image ID="_21">topicus.gif</image>
  </story>
  <story ID="_22">
    <title ID="_23">Open Source Africa</title>
    <url ID="_24">http://slashdot.org/articles/00/02/19/1016216.shtml</url>
    <time ID="_25">2000-02-19 16:05:58</time>
    <author ID="_26">emmett</author>
    <department ID="_27">songs-by-toto</department>
    <topic ID="_28">linux</topic>
    <comments ID="_29">50</comments>
    <section ID="_30">articles</section>
    <image ID="_31">topiclinux.gif</image>
  </story>
  <story ID="_32">
    <title ID="_33">Microsoft Funded by NSA, Helps Spy on Win Users?</title>
    <url ID="_34">http://slashdot.org/articles/00/02/19/0750247.shtml</url>
    <time ID="_35">2000-02-19 14:07:04</time>
    <author ID="_36">Roblimo</author>
    <department ID="_37">deep-dark-conspiracy-theories</department>
    <topic ID="_38">microsoft</topic>
    <comments ID="_39">154</comments>
    <section ID="_40">articles</section>
    <image ID="_41">topicms.gif</image>
  </story>
  <story ID="_42">
    <title ID="_43">X-Men Trailer Released</title>
    <url ID="_44">http://slashdot.org/articles/00/02/18/0829209.shtml</url>
    <time ID="_45">2000-02-19 13:47:06</time>
    <author ID="_46">emmett</author>
    <department ID="_47">mutant</department>
    <topic ID="_48">movies</topic>
    <comments ID="_49">70</comments>
    <section ID="_50">articles</section>
    <image ID="_51">topicmovies.gif</image>
  </story>
  <story ID="_52">
    <title ID="_53">Connell Replies to "Grok" Comments</title>
    <url ID="_54">http://slashdot.org/articles/00/02/18/202240.shtml</url>
    <time ID="_55">2000-02-19 05:01:37</time>
    <author ID="_56">Hemos</author>
    <department ID="_57">replying-to-things</department>
    <topic ID="_58">linux</topic>
    <comments ID="_59">197</comments>
    <section ID="_60">articles</section>
    <image ID="_61">topiclinux.gif</image>
  </story>
  <story ID="_62">
    <title ID="_63">etoy.com Returns</title>
    <url ID="_64">http://slashdot.org/yro/00/02/18/1739216.shtml</url>
    <time ID="_65">2000-02-19 02:35:06</time>
    <author ID="_66">nik</author>
    <department ID="_67">NP:-gimme-shelter</department>
    <topic ID="_68">internet</topic>
    <comments ID="_69">77</comments>
    <section ID="_70">yro</section>
    <image ID="_71">topicinternet.jpg</image>
  </story>
  <story ID="_72">
    <title ID="_73">New Propaganda Series: Rebirth</title>
    <url ID="_74">http://slashdot.org/articles/00/02/18/205232.shtml</url>
    <time ID="_75">2000-02-19 01:05:26</time>
    <author ID="_76">Hemos</author>
    <department ID="_77">as-pretty-as-always</department>
    <topic ID="_78">graphics</topic>
    <comments ID="_79">120</comments>
    <section ID="_80">articles</section>
    <image ID="_81">topicgraphics3.gif</image>
  </story>
  <story ID="_82">
    <title ID="_83">Giving Back</title>
    <url ID="_84">http://slashdot.org/features/00/02/18/1631224.shtml</url>
    <time ID="_85">2000-02-18 22:27:26</time>
    <author ID="_86">emmett</author>
    <department ID="_87">salvation-army</department>
    <topic ID="_88">news</topic>
    <comments ID="_89">122</comments>
    <section ID="_90">features</section>
    <image ID="_91">topicnews.gif</image>
  </story>
  <story ID="_92">
    <title ID="_93">Connectix Considering Open Sourcing VGS?</title>
    <url ID="_94">http://slashdot.org/articles/00/02/18/1050225.shtml</url>
    <time ID="_95">2000-02-18 20:46:20</time>
    <author ID="_96">emmett</author>
    <department ID="_97">grain-of-salt</department>
    <topic ID="_98">news</topic>
    <comments ID="_99">93</comments>
    <section ID="_100">articles</section>
    <image ID="_101">topicnews.gif</image>
  </story>
</backslash>

Handling Entities in JDOM

Unparsed entities really aren't handled at all.
In general, the parser resolves parsed entities and you never see them.
When writing, the outputter outputs entity references but not the entity's content.
The Entity class represents a parsed entity.
The API is mostly like the API of Element
This one is still being thought out.

The Entity Class

package org.jdom;

public class Entity implements Serializable, Cloneable {

    protected String name;
    protected List   content;

    protected Entity() {}
    public    Entity(String name) {}
    
    public String  getName() {}
    public String  getContent() {}
    public Entity  setContent(String textContent) {}
    public boolean hasMixedContent() {}
    public List    getMixedContent() {}
    public Entity  setMixedContent(List mixedContent) {}
    public List    getChildren() {}
    public Entity  setChildren(List children) {}
    public Entity  addChild(Element element) {}
    public Entity  addChild(String s) {}
    public Entity  addText(String text) {}
    
    public final String  toString() {}
    public final String  getSerializedForm() {} // will be removed
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}

}

Handling Comments in JDOM

A Comment object represents a comment like this example from the XML 1.0 spec:

<!--* N.B. some readers (notably JC) find the following
paragraph awkward and redundant.  I agree it's logically redundant:
it *says* it is summarizing the logical implications of
matching the grammar, and that means by definition it's
logically redundant.  I don't think it's rhetorically
redundant or unnecessary, though, so I'm keeping it.  It
could however use some recasting when the editors are feeling
stronger. -MSM *-->

No children
JDOM checks the content to make sure it's legal (i.e. does not contain a double-hyphen)
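
The double-hyphen rule comes from the XML 1.0 comment production, which also forbids comment text that ends in a hyphen. A minimal sketch of such a check in plain Java follows. The class and method names are hypothetical, this is not JDOM's own code, and it does not test for characters that are illegal anywhere in XML.

```java
// Hypothetical helper; a sketch of the comment-content rule, not JDOM code.
class CommentCheck {

  // True if the text can appear between <!-- and --> in XML 1.0.
  // Only the hyphen rules are checked here.
  static boolean isLegalCommentData(String data) {
    if (data == null) return false;
    // "--" may not appear inside a comment
    if (data.indexOf("--") != -1) return false;
    // a trailing hyphen would run into the closing -->
    if (data.endsWith("-")) return false;
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isLegalCommentData("* N.B. some readers *")); // true
    System.out.println(isLegalCommentData("a -- b"));                // false
    System.out.println(isLegalCommentData("trailing hyphen-"));      // false
  }

}
```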

The Comment Class

package org.jdom;

public class Comment implements Serializable, Cloneable {

    protected String text;

    protected Comment() {}
    public    Comment(String text) {}
    
    public String getText() {}
    public void   setText(String text) {}
    
    public final String  toString() {}
    public final String  getSerializedForm() {} // will be removed
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}

}

Comment Example

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;


public class CommentReader {

  public static void main(String[] args) {
     
    SAXBuilder builder = new SAXBuilder();
    
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        Document doc = builder.build(args[i]);
        List content = doc.getMixedContent();
        Iterator iterator = content.iterator();
        while (iterator.hasNext()) {
          Object o = iterator.next();
          if (o instanceof Comment) {
            Comment c = (Comment) o;
            System.out.println(c.getText());     
            System.out.println();     
          }
          else if (o instanceof Element) {
            processElement((Element) o);   
          }
        }
      }
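      catch (JDOMException e) {
        System.err.println(e); 
        // the root cause is null when no other exception was wrapped
        Throwable rootCause = e.getRootCause();
        if (rootCause != null) rootCause.printStackTrace(); 
      }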
      
    }
  
  } // end main

  // note use of recursion
  public static void processElement(Element element) {
    
    List content = element.getMixedContent();
    Iterator iterator = content.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Comment) {
        Comment c = (Comment) o;
        System.out.println(c.getText());     
        System.out.println();     
      }
      else if (o instanceof Element) {
        processElement((Element) o);   
      }
    } // end while
    
  }

}

CommentReader Output

% java CommentReader hotcop.xml
 The publisher is actually Polygram but I needed
       an example of a general entity reference.

 You can tell what album I was
     listening to when I wrote this example

ProcessingInstruction Nodes

Represents a processing instruction like
<?robots index="yes" follow="no"?>
No children

Some have pseudo-attributes; some don't:

<?php 
  mysql_connect("database.unc.edu", "clerk", "password"); 
  $result = mysql("music", "SELECT LastName, FirstName FROM Employees 
    ORDER BY LastName, FirstName"); 
  $i = 0;
  while ($i < mysql_numrows ($result)) {
     $fields = mysql_fetch_row($result);
     echo "<person>$fields[1] $fields[0] </person>\r\n";
     $i++;
  }
  mysql_close();
?>

A ProcessingInstruction is represented as either
- Target and Value
- Target and Pseudo-attributes
As usual JDOM checks the contents of each ProcessingInstruction object for well-formedness
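
Pseudo-attributes are only a convention layered on the processing instruction's data string. A rough sketch of how that data might be split into name-value pairs with plain Java is shown here. The class name and the regular expression are assumptions made for illustration; this is not how JDOM itself parses the data.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a sketch, not JDOM's parser.
class PseudoAttributes {

  // Matches name="value" or name='value'
  private static final Pattern PAIR = Pattern.compile(
      "([A-Za-z_][\\w.-]*)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

  // Returns the pseudo-attributes found in the data, in document order.
  // Data with no name="value" pairs yields an empty map.
  static Map<String, String> parse(String data) {
    Map<String, String> result = new LinkedHashMap<String, String>();
    Matcher matcher = PAIR.matcher(data);
    while (matcher.find()) {
      // group 3 holds a double-quoted value, group 4 a single-quoted one
      String value = matcher.group(3) != null
          ? matcher.group(3) : matcher.group(4);
      result.put(matcher.group(1), value);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> robots = parse("index=\"yes\" follow=\"no\"");
    System.out.println(robots.get("index"));                // yes
    System.out.println(robots.get("follow"));               // no
    System.out.println(parse("mysql_close();").isEmpty());  // true
  }

}
```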

The ProcessingInstruction Class

package org.jdom;

public class ProcessingInstruction implements Serializable, Cloneable {

    protected String target;
    protected String rawData;
    protected Map    mapData;

    protected ProcessingInstruction() {}
    public    ProcessingInstruction(String target, Map data) {}
    public    ProcessingInstruction(String target, String data) {}
    
    public String                getTarget() {}
    public String                getData() {}
    public ProcessingInstruction setData(String data) {}
    public ProcessingInstruction setData(Map data) {}
    public String                getValue(String name) {}
    public ProcessingInstruction setValue(String name, String value) {}
    public boolean               removeValue(String name) {}

    public final String toString() {}
    public final String getSerializedForm() {} // will be removed
    public final boolean equals(Object ob) {}
    public final int hashCode() {}
    public final Object clone() {}
}

XLinkSpider that Respects the robots Processing Instruction

import java.io.*;
import java.util.*;
import org.jdom.*;
import org.jdom.input.SAXBuilder;


public class XLinkSpider {

  private static SAXBuilder builder = new SAXBuilder();
  
  private static Vector visited = new Vector();
  
  private static int maxDepth = 5;
  private static int currentDepth = 0; 
  
  public static void listURIs(String systemID) {
    
    currentDepth++;
    try {
      if (currentDepth < maxDepth) {

        Document document = builder.build(systemID); 
                
        // check to see if we're allowed to spider
        boolean index = true;
        boolean follow = true;
        ProcessingInstruction robots 
         = document.getProcessingInstruction("robots");
        if (robots != null) {
          // either pseudo-attribute may be missing, so guard against null
          String indexValue = robots.getValue("index");
          if ("no".equalsIgnoreCase(indexValue)) index = false;
          String followValue = robots.getValue("follow");
          if ("no".equalsIgnoreCase(followValue)) follow = false;
        }
        Vector uris = new Vector();
        // search the document for uris, 
        // store them in vector, and print them
        if (follow) searchForURIs(document.getRootElement(), uris);
    
        Enumeration e = uris.elements();
        while (e.hasMoreElements()) {
          String uri = (String) e.nextElement();
          visited.addElement(uri);
          if (index) listURIs(uri); 
        }
      
      }
    
    }
    catch (JDOMException e) {
      // couldn't load the document, 
      // probably not well-formed XML, skip it 
    }
    finally { 
      currentDepth--;
      System.out.flush();     
    }
      
  }
  
  private static Namespace xlink = Namespace.getNamespace("http://www.w3.org/1999/xlink");
  
  // use recursion 
  public static void searchForURIs(Element element, Vector uris) {
    
    // look for XLinks in this element
    String uri = element.getAttributeValue("href", xlink);
    if (uri != null && !uri.equals("") && !visited.contains(uri) && !uris.contains(uri)) {
      System.out.println(uri);
      uris.addElement(uri);
    }
    
    // process child elements recursively
    List children = element.getChildren();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      searchForURIs((Element) iterator.next(), uris); 
    }
    
  }

  public static void main(String[] args) {
    
    if (args.length == 0) {
      System.out.println("Usage: java XLinkSpider URL1 URL2..."); 
    } 
      
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      System.err.println(args[i]); 
      listURIs(args[i]);
    } // end for
  
  } // end main

} // end XLinkSpider

Handling Namespaces

JDOM is fully namespace aware
Namespaces are represented by instances of the Namespace class rather than by attributes or raw strings
Always ask for elements and attributes by local names and namespace URIs
Elements and attributes that are not in any namespace can be asked for by local name alone
Never identify an element or attribute by qualified name
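
The reason for the last rule is that prefixes are arbitrary: two elements with different prefixes can have the same namespace URI and local name. The sketch below shows this using the JAXP DOM classes that ship with Java rather than JDOM, so the calls are DOM's, not JDOM's. The class name NamespaceLookup and the sample document are made up for illustration; the namespace URI is the song namespace used earlier in these notes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses JAXP DOM, not JDOM.
class NamespaceLookup {

  static final String SONG_NS = "http://www.ibiblio.org/xml/namespace/song";

  // Counts the elements with the given namespace URI and local name,
  // whatever prefix each one happens to use.
  static int count(String xml, String uri, String localName) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true); // JAXP parsers ignore namespaces by default
      DocumentBuilder builder = factory.newDocumentBuilder();
      Document doc = builder.parse(new InputSource(new StringReader(xml)));
      return doc.getElementsByTagNameNS(uri, localName).getLength();
    }
    catch (Exception e) {
      throw new IllegalArgumentException("Could not parse document", e);
    }
  }

  public static void main(String[] args) {
    // Same namespace, two different prefixes
    String xml = "<s:SONG xmlns:s='" + SONG_NS + "'>"
               + "<s:TITLE>Hot Cop</s:TITLE>"
               + "<t:COMPOSER xmlns:t='" + SONG_NS + "'>Jacques Morali</t:COMPOSER>"
               + "<s:COMPOSER>Henri Belolo</s:COMPOSER>"
               + "</s:SONG>";
    System.out.println(count(xml, SONG_NS, "COMPOSER")); // 2
  }

}
```

A search for the qualified name s:COMPOSER would have found only one of the two composers.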

The Namespace Class

Mostly for internal parser use
Occasionally useful for tasks like finding out whether a document contains any XLinks

The Namespace Class

package org.jdom;

public final class Namespace {

  public static final Namespace NO_NAMESPACE = new Namespace("", "");
  public static final Namespace XML_NAMESPACE = 
   new Namespace("xml", "http://www.w3.org/XML/1998/namespace");

  // factory methods
  public static Namespace getNamespace(String prefix, String uri) {}
  public static Namespace getNamespace(String uri) {}

  // getter methods
  public String  getPrefix() {}
  public String  getURI() {}

  // utility methods
  public boolean equals(Object ob) {}
  public String  toString() {}
  public int     hashCode() {}

}

DocType Nodes

Represents a document type declaration
Has no children

The DocType class

package org.jdom;

public class DocType implements Serializable, Cloneable {

    protected String elementName;
    protected String publicID;
    protected String systemID;

    protected DocType() {}
    public    DocType(String rootElementName, String publicID, String systemID) {}
    public    DocType(String rootElementName, String systemID) {}
    public    DocType(String rootElementName) {}

    public String  getElementName() {}
    public String  getPublicID() {}
    public DocType setPublicID(String publicID) {}
    public String  getSystemID() {}
    public DocType setSystemID(String systemID) {}

    // Usual utility methods
    public final String  toString() {}
    public final String  getSerializedForm() {} // will be removed
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}
    
}

Example of the DocType Class

Verify that a document is correct XHTML
From the XHTML 1.0 spec:
1. It must validate against one of the three DTDs found in Appendix A.
2. The root element of the document must be <html>.
3. The root element of the document must designate the XHTML namespace using the xmlns attribute [XMLNAMES]. The namespace for XHTML is defined to be http://www.w3.org/1999/xhtml.
4. There must be a DOCTYPE declaration in the document prior to the root element. The public identifier included in the DOCTYPE declaration must reference one of the three DTDs found in Appendix A using the respective Formal Public Identifier. The system identifier may be changed to reflect local system conventions.
```
<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "DTD/xhtml1-strict.dtd">

<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "DTD/xhtml1-transitional.dtd">

<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
     "DTD/xhtml1-frameset.dtd">
```

XHTMLValidator

import java.io.*;
import org.jdom.*;
import org.jdom.input.SAXBuilder;


public class XHTMLValidator {

  public static void main(String[] args) {
    
    for (int i = 0; i < args.length; i++) {
      validate(args[i]);
    }   
    
  }

  private static SAXBuilder builder = new SAXBuilder(true);
                                                 /*  ^^^^ */
                                              /* turn on validation  */
  
  // not thread safe
  public static void validate(String source) {
        
      Document document;
      try {
        document = builder.build(source); 
      }
      catch (JDOMException e) {  
        System.out.println("Error: " + e.getMessage()); 
        e.printStackTrace();
        return; 
      }
      
      // If we get this far, then the document is valid XML.
      // Check to see whether the document is actually XHTML        
      DocType doctype = document.getDocType();
    
      if (doctype == null) {
        System.out.println("No DOCTYPE"); 
        return;
      }

      String name     = doctype.getElementName();
      String systemID = doctype.getSystemID();
      String publicID = doctype.getPublicID();
      
      if (!name.equals("html")) {
        System.out.println("Incorrect root element name " + name); 
      }
    
      if (publicID == null
       || (!publicID.equals("-//W3C//DTD XHTML 1.0 Strict//EN")
           && !publicID.equals("-//W3C//DTD XHTML 1.0 Transitional//EN")
           && !publicID.equals("-//W3C//DTD XHTML 1.0 Frameset//EN"))) {
        System.out.println(source + " does not seem to use an XHTML 1.0 DTD");
      }
    
      // Check the namespace on the root element
      Element root = document.getRootElement();
      Namespace namespace = root.getNamespace();
      String prefix = namespace.getPrefix();
      String uri = namespace.getURI();
      if (!uri.equals("http://www.w3.org/1999/xhtml")) {
        System.out.println(source 
         + " does not properly declare the"
         + " http://www.w3.org/1999/xhtml namespace"
         + " on the root element");        
      }
      if (!prefix.equals("")) {
        System.out.println(source 
         + " does not use the empty prefix for XHTML");        
      }
    
  }

}

Using the XHTMLValidator

% java XHTMLValidator http://www.w3.org/TR/xhtml1
Error: File "http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not found.: Error on 
line -1 of XML document: File "http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not 
found.
org.jdom.JDOMException: File "http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not 
found.: Error on line -1 of XML document: File 
"http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not found.
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:227)
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:359)
        at XHTMLValidator.validate(XHTMLValidator.java:25)
        at XHTMLValidator.main(XHTMLValidator.java:11)
Root cause: org.jdom.JDOMException: Error on line -1 of XML document: File 
"http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not found.
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:228)
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:359)
        at XHTMLValidator.validate(XHTMLValidator.java:25)
        at XHTMLValidator.main(XHTMLValidator.java:11)

The Verifier Class

Checks a variety of strings to see if they're legal for particular uses in XML as specified by XML 1.0 and Namespaces in XML.
Mostly for internal parser use

The Verifier Class

package org.jdom;

public final class Verifier {

    public static final String checkElementName(String name) {}
    public static final String checkAttributeName(String name) {}
    public static final String checkCharacterData(String text) {}
    public static final String checkNamespacePrefix(String prefix) {}
    public static final String checkNamespaceURI(String uri) {}
    public static final String checkProcessingInstructionTarget(String target) {}
    public static final String checkCommentData(String data) {}
 
    public static boolean isXMLCharacter(char c) {}
    public static boolean isXMLNameCharacter(char c) {}
    public static boolean isXMLNameStartCharacter(char c) {}
    public static boolean isXMLLetterOrDigit(char c) {}
    public static boolean isXMLLetter(char c) {}
    public static boolean isXMLCombiningChar(char c) {}
    public static boolean isXMLExtender(char c) {}
    public static boolean isXMLDigit(char c) {}

}

JDOMException

A checked exception, so you must either catch it or declare it
Wraps other exceptions that are thrown during JDOM operations like IOException or SAXException
Root cause of exception (if any) is accessible through the getRootCause() method:
public Throwable getRootCause()
Subclasses:
- DataConversionException
- NoSuchAttributeException
- NoSuchChildException
- NoSuchProcessingInstructionException
Unchecked exceptions (subclasses of IllegalArgumentException rather than JDOMException):
- IllegalAddException
- IllegalDataException
- IllegalNameException
- IllegalTargetException

JDOMException Class

package org.jdom;

public class JDOMException extends Exception {

    protected Throwable rootCause;

    public JDOMException() {}
    public JDOMException(String message)  {}
    public JDOMException(String message, Throwable rootCause)  {} 
       
    public String    getMessage() {}
    public void      printStackTrace() {}
    public void      printStackTrace(PrintStream s) {}
    public void      printStackTrace(PrintWriter w) {}
    public Throwable getRootCause()  {}

}

The org.jdom.output Package

DOMOutputter
SAXOutputter
XMLOutputter

Serialization

The process of taking an in-memory JDOM Document and converting it to a stream of characters that can be written onto an output stream
The org.jdom.output.XMLOutputter class
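
For comparison, the same idea (in-memory tree in, characters out, with a chosen encoding and indentation) can be sketched with the JAXP classes bundled with Java. This uses a DOM Document and an identity Transformer, not JDOM or XMLOutputter, and the class name SerializeSketch is made up for illustration. The exact whitespace of the output depends on the Transformer implementation.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; JAXP serialization, not JDOM's XMLOutputter.
class SerializeSketch {

  // Builds a tiny SONG document in memory and returns it as a string
  static String serialize(String title) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder().newDocument();
      Element song = doc.createElement("SONG");
      Element titleElement = doc.createElement("TITLE");
      titleElement.appendChild(doc.createTextNode(title));
      song.appendChild(titleElement);
      doc.appendChild(song);

      // A Transformer with no stylesheet copies its input to its output
      Transformer serializer = TransformerFactory.newInstance().newTransformer();
      serializer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
      serializer.setOutputProperty(OutputKeys.INDENT, "yes");

      StringWriter out = new StringWriter();
      serializer.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
    }
    catch (Exception e) {
      throw new IllegalStateException("Serialization failed", e);
    }
  }

  public static void main(String[] args) {
    System.out.print(serialize("Hot Cop"));
  }

}
```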

XMLOutputter

This class is still undergoing API changes.

package org.jdom.output;

public class XMLOutputter implements Cloneable {

    protected static final String STANDARD_INDENT = "  ";
    
    public XMLOutputter() {}
    public XMLOutputter(String indent) {}
    public XMLOutputter(String indent, boolean newlines) {}
    public XMLOutputter(String indent, boolean newlines, String encoding) {}
    public XMLOutputter(XMLOutputter that) {}
    
    public void setLineSeparator(String separator) {}
    public void setNewlines(boolean newlines) {}
    public void setEncoding(String encoding) {}
    public void setOmitEncoding(boolean omitEncoding) {}
    public void setSuppressDeclaration(boolean suppressDeclaration) {}
    public void setExpandEmptyElements(boolean expandEmptyElements) {}
    public void setTrimText(boolean trimText) {}
    public void setPadText(boolean padText) {}
    public void setIndent(String indent) {}
    public void setIndent(boolean doIndent) {}
    public void setIndentLevel(int indentLevel) {}
    public void setIndentSize(int indentSize) {}

    protected void indent(Writer out, int level) throws IOException {}
    protected void maybePrintln(Writer out) throws IOException  {}
    protected Writer makeWriter(OutputStream out) 
     throws java.io.UnsupportedEncodingException {}
    protected Writer makeWriter(OutputStream out, String encoding) 
     throws java.io.UnsupportedEncodingException {}
     
    public void output(Document doc, OutputStream out) throws IOException {}
    public void output(Document doc, Writer writer) throws IOException {}
    public void output(Element element, Writer out) throws IOException {}
    public void output(Element element, OutputStream out) throws IOException {}
    public void outputElementContent(Element element, Writer out) throws IOException {}
    public void output(CDATA cdata, Writer out) throws IOException {}
    public void output(CDATA cdata, OutputStream out) throws IOException {}
    public void output(Comment comment, Writer out) throws IOException {}
    public void output(Comment comment, OutputStream out) throws IOException {}
    public void output(String string, Writer out) throws IOException {}
    public void output(String string, OutputStream out) throws IOException {}
    public void output(Entity entity, Writer out) throws IOException {}
    public void output(Entity entity, OutputStream out) throws IOException {}
    public void output(ProcessingInstruction processingInstruction, Writer out)
      throws IOException {}
    public void output(ProcessingInstruction processingInstruction, OutputStream out)
     throws IOException {}
    public String outputString(Document doc) throws IOException {}
    public String outputString(Element element) throws IOException {}

    // internal printing methods
    protected void printDeclaration(Document doc, Writer out, String encoding) 
     throws IOException {}    
    protected void printDocType(DocType docType, Writer out) throws IOException {}
    protected void printComment(Comment comment, Writer out, int indentLevel) 
     throws IOException {}
    protected void printProcessingInstruction(ProcessingInstruction pi,
     Writer out, int indentLevel) throws IOException {}
    protected void printCDATASection(CDATA cdata, Writer out, int indentLevel) 
     throws IOException {}
    protected void printElement(Element element, Writer out,
     int indentLevel, NamespaceStack namespaces) throws IOException {}
    protected void printElementContent(Element element, Writer out,
     int indentLevel, NamespaceStack namespaces, List mixedContent) 
     throws IOException {}
    protected void printString(String s, Writer out) throws IOException {}
    protected void printEntity(Entity entity, Writer out) throws IOException {}
    protected void printNamespace(Namespace ns, Writer out) throws IOException {}
    protected void printAttributes(List attributes, Element parent, 
     Writer out, NamespaceStack namespaces)  
     throws IOException {}
    
    public int parseArgs(String[] args, int i) {} 
    
}

Using the XMLOutputter Class Directly

Configured with three variables passed to the constructor:

indent
a String added at each level of output; e.g. two spaces or a tab

lineSeparator
the String to break lines with, no line breaking is performed if this is null or the empty string

encoding
The name of the encoding to use for output; e.g. UTF-16 or ISO-8859-1

Options can be set with these twelve methods:

  public void setLineSeparator(String separator) {}
  public void setNewlines(boolean newlines) {}
  public void setEncoding(String encoding) {}
  public void setOmitEncoding(boolean omitEncoding) {}
  public void setSuppressDeclaration(boolean suppressDeclaration) {}
  public void setExpandEmptyElements(boolean expandEmptyElements) {}
  public void setTrimText(boolean trimText) {}
  public void setPadText(boolean padText) {}
  public void setIndent(String indent) {}
  public void setIndent(boolean doIndent) {}
  public void setIndentLevel(int indentLevel) {} 
  public void setIndentSize(int indentSize) {}

The output() method writes a Document onto a given OutputStream or Writer:

  public void output(Document doc, OutputStream out) throws IOException {}
  public void output(Document doc, Writer writer) throws IOException {}

There are also output() methods for other JDOM classes:

  public void output(Element element, Writer out) throws IOException {}
  public void output(Element element, OutputStream out) {}
  public void outputElementContent(Element element, Writer out) throws IOException {}
  public void output(CDATA cdata, Writer out) throws IOException {}
  public void output(CDATA cdata, OutputStream out) throws IOException {}
  public void output(Comment comment, Writer out) throws IOException {}
  public void output(Comment comment, OutputStream out) throws IOException {}
  public void output(String string, Writer out) throws IOException {}
  public void output(String string, OutputStream out) throws IOException {}
  public void output(Entity entity, Writer out) throws IOException {}
  public void output(Entity entity, OutputStream out) throws IOException {}
  public void output(ProcessingInstruction processingInstruction, Writer out)
    throws IOException {}
  public void output(ProcessingInstruction processingInstruction, OutputStream out)
   throws IOException {}
  public String outputString(Document doc) throws IOException {}
  public String outputString(Element element) throws IOException {}

Using the XMLOutputter Class Indirectly

Configured by overriding protected methods:

  protected void printDeclaration(Document doc, Writer out, String encoding) 
  throws IOException {}    
  protected void printDocType(DocType docType, Writer out) throws IOException {}
  protected void printComment(Comment comment, Writer out, int indentLevel) 
   throws IOException {}
  protected void printProcessingInstruction(ProcessingInstruction pi,
   Writer out, int indentLevel) throws IOException {}
  protected void printCDATASection(CDATA cdata, Writer out, int indentLevel) 
   throws IOException {}
  protected void printElement(Element element, Writer out,
   int indentLevel, NamespaceStack namespaces) throws IOException {}
  protected void printElementContent(Element element, Writer out,
   int indentLevel, NamespaceStack namespaces, List mixedContent) 
   throws IOException {}
  protected void printString(String s, Writer out) throws IOException {}
  protected void printEntity(Entity entity, Writer out) throws IOException {}
  protected void printNamespace(Namespace ns, Writer out) throws IOException {}
  protected void printAttributes(List attributes, Element parent, 
   Writer out, NamespaceStack namespaces)  
   throws IOException {}

JDOM based TagStripper

A bug in the current version of JDOM prevents this from working.

import org.jdom.*;
import org.jdom.output.XMLOutputter;
import org.jdom.input.SAXBuilder;
import java.io.*;
import java.util.*;


public class TagStripper extends XMLOutputter {

  public TagStripper() {
    super();
  }

  // Things we won't print at all
  protected void printDeclaration(Document doc, Writer out, String encoding) {}
  protected void printComment(Comment comment, Writer out, int indentLevel) {}
  protected void printDocType(DocType docType, Writer out) {}
  protected void printProcessingInstruction(ProcessingInstruction pi, 
   Writer out, int indentLevel) {}
  protected void printNamespace(Namespace ns, Writer out) {}
  protected void printAttributes(List attributes, Element parent, 
   Writer out, NamespaceStack namespaces) {}
  
  protected void printElement(Element element, Writer out, 
   int indentLevel, NamespaceStack namespaces) throws IOException {
    
    List content = element.getMixedContent();
    Iterator iterator = content.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof String) {
        out.write((String) o);
        this.maybePrintln(out);
      }
      else if (o instanceof Element) {
        printElement((Element) o, out, indentLevel, namespaces);
      }
    }
          
  }

  // Could easily have put main() method in a separate class
  public static void main(String[] args) {
     
    if (args.length == 0) {
      System.out.println(
       "Usage: java TagStripper URL1 URL2..."); 
      return;
    } 
      
    TagStripper stripper = new TagStripper();
    SAXBuilder builder   = new SAXBuilder();
    
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        stripper.output(doc, System.out);
      }
      catch (JDOMException e) { // a well-formedness error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
      catch (IOException e) { // an I/O error
        System.out.println(e.getMessage());
      }
      
    }  
  
  }

}

Output from a JDOM based TagStripper

% java TagStripper hotcop.xml
Hot Cop
Jacques Morali
Henri Belolo
Victor Willis
Jacques Morali
A & M Records
6:20
1978
Village People

Talking to DOM Programs

The process of taking an in-memory JDOM Document and converting it to an org.w3c.dom.Document object

The org.jdom.output.DOMOutputter class:


package org.jdom.output;

public class DOMOutputter {

  // Constructors
  public DOMOutputter() {}

  // Outputter methods
  public org.w3c.dom.Document output(Document document) {}
  public org.w3c.dom.Element  output(Element element) {}
  public org.w3c.dom.Element  output(Element element, String domAdapterClass) {}
  public org.w3c.dom.Document output(Document document, String domAdapterClass) {}

  // utility methods
  protected void buildDOMTree(Object content, org.w3c.dom.Document doc, 
   org.w3c.dom.Element current, boolean atRoot, LinkedList namespaces) {}
  public String getXmlnsTagFor(Namespace ns) {}
    
}

Talking to SAX Programs

The process of taking an in-memory JDOM Document and walking its tree while firing off SAX events
The org.jdom.output.SAXOutputter class
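
The idea of walking a tree while firing SAX events can be sketched with the JDK alone. This is not JDOM's SAXOutputter; it is a minimal stand-in that assumes a W3C DOM tree in place of a JDOM Document, and the class name TreeWalker and its trace() helper are made up for illustration. Attributes, namespaces, comments, and processing instructions are deliberately left out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class TreeWalker {

  // Recursively visit the tree, firing the matching SAX event for each node
  static void fire(Node node, ContentHandler handler) throws SAXException {
    switch (node.getNodeType()) {
      case Node.DOCUMENT_NODE:
        handler.startDocument();
        fireChildren(node, handler);
        handler.endDocument();
        break;
      case Node.ELEMENT_NODE:
        String name = node.getNodeName();
        // Attributes and namespaces are omitted to keep the sketch short
        handler.startElement("", name, name, new AttributesImpl());
        fireChildren(node, handler);
        handler.endElement("", name, name);
        break;
      case Node.TEXT_NODE:
        char[] text = node.getNodeValue().toCharArray();
        handler.characters(text, 0, text.length);
        break;
      default:
        break; // comments, processing instructions, etc. are ignored here
    }
  }

  static void fireChildren(Node parent, ContentHandler handler) 
   throws SAXException {
    for (Node child = parent.getFirstChild(); child != null; 
     child = child.getNextSibling()) {
      fire(child, handler);
    }
  }

  // Hypothetical helper: builds a tree from a string, walks it, 
  // and returns a log of the events the handler received
  static String trace(String xml) {
    final StringBuilder log = new StringBuilder();
    try {
      Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)));
      fire(doc, new DefaultHandler() {
        public void startElement(String uri, String local, String qName, 
         Attributes atts) {
          log.append("<" + qName + ">");
        }
        public void endElement(String uri, String local, String qName) {
          log.append("</" + qName + ">");
        }
        public void characters(char[] ch, int start, int length) {
          log.append(ch, start, length);
        }
      });
    }
    catch (Exception e) {
      throw new IllegalStateException("Could not walk the tree", e);
    }
    return log.toString();
  }

  public static void main(String[] args) {
    System.out.println(trace("<SONG><TITLE>Hot Cop</TITLE></SONG>"));
  }

}
```

Any existing SAX ContentHandler can be passed to fire() in place of the logging handler, which is the point of the technique: code written for a parser's event stream runs unchanged against a tree already in memory.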

What JDOM doesn't do

Documents larger than available memory
Byte-for-byte faithful round trips
DTDs
XPath Queries (may be added in 1.1)

To Learn More

JavaWorld: http://javaworld.com/javaworld/jw-05-2000/jw-0518-jdom.html
JDOM Web Site, http://www.jdom.org/
Java and XML, Brett McLaughlin, O'Reilly & Associates, 2000, ISBN 0-596-00016-2, http://www.oreilly.com/catalog/javaxml/

Part VI: XML Hypertext

The problem is that we're not providing the tools. We're providing the specs. That's a whole different ball game. If tools existed for actually making really interesting use of RDF and XLink and XInclude then people would use them. If IE and/or Mozilla supported the full gamut of specs, from XSLT 1.0 to XLink and XInclude (OK, so they're not quite REC's, but with time...) then you would find people using them more.

--Matt Sergeant on the xml-dev mailing list

HTML Hypertext is Limited

The Web conquered gopher for one reason: HTML made it possible to embed hypertext links in documents.
HTML linking has limits

You can only link to one document at a time
You must link to the entire document.
Once the link is traversed the trail of where you've been is lost.

Includes are server dependent and don't work across domains
Links break

XML Hypertext

Linking in XML is divided into multiple parts:

A Uniform Resource Identifier (URI) names or locates a resource
An XLink defines connections between two or more documents identified by URIs
XPath identifies particular nodes within a document
An XPointer adds an XPath to a URI
XBase defines the URI against which relative URIs are resolved
XInclude embeds a document identified by a URI inside an XML document.
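
What XBase contributes can be sketched with java.net.URI. The class name XBaseDemo, the base URI, and the relative references below are all invented for illustration; the sketch only shows how a base URI, such as one supplied by an xml:base attribute, changes what a relative URI means.

```java
import java.net.URI;

class XBaseDemo {

  // Resolve a possibly relative reference against the base URI
  // that an xml:base attribute would supply
  static String resolve(String xmlBase, String href) {
    return URI.create(xmlBase).resolve(href).toString();
  }

  public static void main(String[] args) {
    String base = "http://www.example.com/books/"; // hypothetical base
    System.out.println(resolve(base, "chapter1.xml"));
    System.out.println(resolve(base, "../index.xml"));
    // An absolute URI is unaffected by the base
    System.out.println(resolve(base, "http://www.example.org/other.xml"));
  }

}
```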

XML Hypertext Example

<?xml version="1.0"?>
<story date="January 9, 2001"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:xinclude="http://www.w3.org/1999/XML/xinclude"
       xml:base="http://www.cafeaulait.org/">

  <p>
    The W3C XML Linking Working Group has pushed the 
    <link xlink:type="simple"
      xlink:href="http://www.w3.org/TR/2001/WD-xptr-20010108">
      XPointer specification
    </link> 
    back to working draft status. The specific issue that was 
    uncovered during Candidate Recommendation was some 
    <link xlink:type="simple"
      xlink:href="http://www.w3.org/TR/xptr#xpointer(//div[@class='div3'][7])">
      confusion
    </link> 
    over how to integrate XPointers, particularly those in non-XML documents, 
    with namespaces. 
   </p>

   <p>
     It's also come to light in this draft that Sun has 
     <link xlink:type="simple"
      xlink:href=
      "http://lists.w3.org/Archives/Public/www-xml-linking-comments/2000OctDec/0092.html"
      >
      claimed a patent</link> on some of the technologies needed to 
      implement XPointer. I think this is particularly offensive because Eve 
      L. Maler, a Sun employee, served as co-chair of the XML Linking 
      Working Group and a co-editor of the XPointer specification. As usual 
      Sun wants to use this as a club to lock implementers and users into a 
      licensing agreement that goes beyond what Sun and the W3C could 
      otherwise demand. The specific patent is <cite>United States Patent 
      No. 5,659,729, Method and system for implementing hypertext scroll 
      attributes</cite>, issued to Jakob Nielsen in 1997. The patent was 
      filed on February 1, 1996. It claims:
  </p>
  <blockquote>
    <xinclude:include 
      href=
      "http://www.delphion.com/details?&amp;pn=US05659729__#xpointer(//abstract)"
      >
    </xinclude:include>
  </blockquote>
  
</story>

Versions

This talk covers:

XLinks: December 20, 2000 Proposed Recommendation
XPointers: January 8, 2001 2nd Last Call Working Draft
XPath: November 16, 1999 1.0 Specification
XInclude: October 26, 2000 Working Draft
XBase: December 20, 2000 Proposed Recommendation

Part I: XLinks

Once you've tasted XLink's Chunky Monkey, it's hard to reconcile yourself to HTML's vanilla.

--John E. Simpson on the xsl-list mailing list

XLinks are More Powerful

Designed especially for use with XML
Multidirectional
Any element can be a link, not just <A>
Can link to arbitrary positions in the document

Application Support

No general-purpose Web browsers or other applications support arbitrary XLinks.
XLinks have a much broader base of applicability than HTML links. They can be used by any custom application that needs to establish connections between documents and parts of documents, for any reason.
Even when XLinks are fully implemented in browsers they may not always be blue underlined text that you click to jump to another page.

Linking Elements

Any element can be a link
XLink elements are identified by an xlink:type attribute with one of these six values:
- simple
- extended
- locator
- arc
- resource
- title
Linking elements are identified by an xlink:type attribute with one of these two values:
- simple
- extended
Each linking element contains an xlink:href attribute whose value is the URI of the resource being linked to.
An xmlns:xlink attribute associates the xlink prefix with the http://www.w3.org/1999/xlink namespace.

For example

<FOOTNOTE xmlns:xlink="http://www.w3.org/1999/xlink"
          xlink:type="simple"
          xlink:href="footnote7.xml">7</FOOTNOTE>
<COMPOSER xmlns:xlink="http://www.w3.org/1999/xlink"
          xlink:type="simple"
          xlink:href="http://www.interport.net/~beand/">
    Beth Anderson
</COMPOSER>
<IMAGE xmlns:xlink="http://www.w3.org/1999/xlink"
       xlink:type="simple" xlink:href="logo.gif"/>
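
Because any element can be a link, an application has to look at attributes rather than element names to find them. The following is a minimal sketch of that lookup using the JDK's DOM parser; the class name SimpleLinkFinder and the DOC wrapper element in main() are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SimpleLinkFinder {

  static final String XLINK = "http://www.w3.org/1999/xlink";

  // Returns the xlink:href values of all simple links in document order
  static List<String> findSimpleLinks(String xml) {
    Document doc;
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true); // required for getAttributeNS()
      doc = factory.newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)));
    }
    catch (Exception e) {
      throw new IllegalArgumentException("Could not parse document", e);
    }

    List<String> hrefs = new ArrayList<String>();
    NodeList all = doc.getElementsByTagName("*");
    for (int i = 0; i < all.getLength(); i++) {
      Element element = (Element) all.item(i);
      // The element name is irrelevant; only xlink:type matters
      if ("simple".equals(element.getAttributeNS(XLINK, "type"))) {
        hrefs.add(element.getAttributeNS(XLINK, "href"));
      }
    }
    return hrefs;
  }

  public static void main(String[] args) {
    String xml = "<DOC xmlns:xlink='http://www.w3.org/1999/xlink'>"
     + "<FOOTNOTE xlink:type='simple' xlink:href='footnote7.xml'>7</FOOTNOTE>"
     + "<IMAGE xlink:type='simple' xlink:href='logo.gif'/>"
     + "<NOTE>not a link</NOTE>"
     + "</DOC>";
    System.out.println(findSimpleLinks(xml));
  }

}
```

Matching on the namespace URI rather than the literal prefix xlink means the sketch still works if a document binds the XLink namespace to some other prefix.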

Declaring XLink Attributes in DTDs

<!ELEMENT FOOTNOTE (#PCDATA)>
<!ATTLIST FOOTNOTE
  xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  CDATA #FIXED "simple"
  xlink:href  CDATA #REQUIRED
>
<!ELEMENT COMPOSER (#PCDATA)>
<!ATTLIST COMPOSER
  xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  CDATA #FIXED "simple"
  xlink:href  CDATA #REQUIRED
>
<!ELEMENT IMAGE EMPTY>
<!ATTLIST IMAGE
  xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  CDATA #FIXED "simple"
  xlink:href  CDATA #REQUIRED
>

Fixed Attributes

<FOOTNOTE xlink:href="footnote7.xml">7</FOOTNOTE>
<COMPOSER xlink:href="http://www.interport.net/~beand/">
  Beth Anderson
</COMPOSER>
<IMAGE xlink:href="logo.gif"/>

Other Attributes

<!ELEMENT IMAGE EMPTY>
<!ATTLIST IMAGE
  xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  CDATA #FIXED "simple"
  xlink:href  CDATA #REQUIRED
  ALT         CDATA #REQUIRED
  HEIGHT      CDATA #REQUIRED
  WIDTH       CDATA #REQUIRED
>

Descriptions of the Remote Resource

A link element may contain optional xlink:role and xlink:title attributes that describe the remote resource, that is, the document or other resource to which the link points.
The title contains a short plain text description.

The role contains a URI pointing to a long description.

<AUTHOR 
 xmlns:xlink="http://www.w3.org/1999/xlink"
 xlink:href="mailto:elharo@metalab.unc.edu"
 xlink:title="Send email to Elliotte Rusty Harold" 
 xlink:role="http://www.macfaq.com/personal.html">
  Please drop me a line.
</AUTHOR>

As with all other attributes, the xlink:title and xlink:role attributes should be declared in the DTD for all the elements to which they belong. For example, this is a reasonable declaration for the above AUTHOR element:

<!ELEMENT AUTHOR (#PCDATA)>
<!ATTLIST AUTHOR
  xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  CDATA #FIXED "simple"
  xlink:href  CDATA #REQUIRED
  xlink:title CDATA #IMPLIED
  xlink:role  CDATA #IMPLIED
>

Link Behavior

Linking elements can contain two more optional attributes that suggest to applications how the remote resource is associated with the current page. These are:

xlink:show suggests where the content should be displayed when the link is activated
xlink:actuate suggests whether the link should be traversed automatically or whether a specific user request is required
These are application dependent, however, and applications are free to ignore the suggestions.

xlink:show

The xlink:show attribute has five predefined values:
- replace
- new
- embed
- other
- none

Like all attributes in valid documents, the xlink:show attribute must be declared in a <!ATTLIST> declaration for the DTD's link element. For example:

<!ELEMENT WEBSITE (#PCDATA)>
<!ATTLIST WEBSITE 
    xmlns:xlink CDATA  #FIXED "http://www.w3.org/1999/xlink"
    xlink:type CDATA   #FIXED "simple"
    xlink:href CDATA   #REQUIRED
    xlink:show (new | replace | embed) "replace"
>

xlink:actuate

A linking element's xlink:actuate attribute has four predefined values:

- onRequest
- onLoad
- other
- none

<IMAGE 
  xmlns:xlink="http://www.w3.org/1999/xlink" 
       xlink:type="simple" xlink:href="logo.gif"
       xlink:actuate="onLoad"/>

Like all attributes in valid documents, the actuate attribute must be declared in the DTD in a <!ATTLIST> declaration for the link elements in which it appears. For example:

<!ELEMENT IMAGE EMPTY>
<!ATTLIST IMAGE 
 xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
  xlink:type CDATA #FIXED "simple"
  xlink:href CDATA #REQUIRED
  xlink:show    (new | replace | embed) "embed"
  xlink:actuate (onRequest | onLoad)    "onLoad"
>
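
How an application might act on the two behavior attributes can be sketched as a simple lookup. Since applications are free to ignore these suggestions, this is one possible reading and not required behavior; the class name LinkBehavior and the wording of the descriptions are invented here, and other, none, and missing values are all treated as "the application decides".

```java
class LinkBehavior {

  // Turns the xlink:show and xlink:actuate values of a link
  // into a plain-language description of what to do
  static String describe(String show, String actuate) {
    String when;
    if ("onLoad".equals(actuate)) when = "as soon as the document loads";
    else if ("onRequest".equals(actuate)) when = "when the user asks";
    else when = "whenever the application decides";

    String where;
    if ("new".equals(show)) where = "in a new window";
    else if ("replace".equals(show)) where = "in place of the current document";
    else if ("embed".equals(show)) where = "inside the current document";
    else where = "wherever the application decides";

    return "Traverse " + when + " and display " + where;
  }

  public static void main(String[] args) {
    // The IMAGE element above: an embedded picture loaded automatically
    System.out.println(describe("embed", "onLoad"));
    // A conventional hypertext link
    System.out.println(describe("replace", "onRequest"));
  }

}
```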

Parameter Entities for Link Attributes

<!ENTITY % link-attributes
   "xlink:type     CDATA  #FIXED 'simple'
    xlink:role     CDATA  #IMPLIED
    xlink:title    CDATA  #IMPLIED

    xmlns:xlink    CDATA  #FIXED 'http://www.w3.org/1999/xlink'
    xlink:href     CDATA  #REQUIRED
    xlink:show     (new | replace | embed) 'replace'
    xlink:actuate  (onRequest | onLoad)    'onRequest'"
>

<!ELEMENT COMPOSER (#PCDATA)>
<!ATTLIST COMPOSER 
    %link-attributes;
>
<!ELEMENT AUTHOR (#PCDATA)>
<!ATTLIST AUTHOR
    %link-attributes;
>
<!ELEMENT WEBSITE (#PCDATA)>
<!ATTLIST WEBSITE
    %link-attributes;
>

Extended Links

Simple links are very similar to HTML links, one-directional, one-element-to-one-document links
Extended links are multi-directional, many-to-many links
An extended link is a list of nodes and a list of the connections between them

Extended Links

An extended link is included in an XML document as an element of some arbitrary type like COMPOSER or TEAM that has an xlink:type attribute with the value extended.

<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink"
         xlink:type="extended">
 ...
</WEBSITE>

Resources

Extended links generally point to more than one target and from more than one source. Both sources and targets are called by the more generic word resource.
Resources are divided into remote resources and local resources.
A local resource is actually contained inside the extended link element. It is enclosed in an element of arbitrary type that has an xlink:type attribute with the value resource.
A remote resource exists outside the extended link element, very possibly in another document. The extended link element contains locator child elements that point to the remote resource. These are elements with any name that have an xlink:type attribute with the value locator. Each locator element has an xlink:href attribute whose value is a URI locating the remote resource.

Resource Example

<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink" 
         xlink:type="extended">
  <NAME xlink:type="resource">Cafe au Lait</NAME>
  <HOMESITE xlink:type="locator" 
            xlink:href="http://ibiblio.org/javafaq/"/>
  <MIRROR xlink:type="locator" 
          xlink:href="http://sunsite.kth.se/javafaq"/>
  <MIRROR xlink:type="locator" 
          xlink:href="http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
  <MIRROR xlink:type="locator" 
          xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
</WEBSITE>

This WEBSITE element describes an extended link with five resources:

The text "Cafe au Lait", a local resource
The document at http://ibiblio.org/javafaq/, a remote resource
The document at http://sunsite.kth.se/javafaq, a remote resource
The document at http://sunsite.informatik.rwth-aachen.de/javafaq/, a remote resource
The document at http://sunsite.cnlab-switch.ch/javafaq/, a remote resource

Since one of the resources referenced by this extended link is contained in the extended link, it is called an inline link. It will be included as part of one of the documents it connects.
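
The distinction between local and remote resources can be sketched in code: walk the children of the extended link and sort them by their xlink:type. This is a minimal illustration using the JDK's DOM parser; the class name ResourceLister and the "local:" and "remote:" output format are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ResourceLister {

  static final String XLINK = "http://www.w3.org/1999/xlink";

  // Lists the resources of an extended link whose root element is the link
  static List<String> listResources(String xml) {
    Document doc;
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      doc = factory.newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)));
    }
    catch (Exception e) {
      throw new IllegalArgumentException("Could not parse link", e);
    }

    List<String> result = new ArrayList<String>();
    Element link = doc.getDocumentElement();
    if (!"extended".equals(link.getAttributeNS(XLINK, "type"))) return result;

    for (Node child = link.getFirstChild(); child != null; 
     child = child.getNextSibling()) {
      if (!(child instanceof Element)) continue;
      Element element = (Element) child;
      String type = element.getAttributeNS(XLINK, "type");
      if ("resource".equals(type)) {
        // A local resource is the content of the element itself
        result.add("local: " + element.getTextContent().trim());
      }
      else if ("locator".equals(type)) {
        // A remote resource is identified by the locator's URI
        result.add("remote: " + element.getAttributeNS(XLINK, "href"));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    String xml = "<WEBSITE xmlns:xlink='http://www.w3.org/1999/xlink'"
     + " xlink:type='extended'>"
     + "<NAME xlink:type='resource'>Cafe au Lait</NAME>"
     + "<HOMESITE xlink:type='locator'"
     + " xlink:href='http://ibiblio.org/javafaq/'/>"
     + "<MIRROR xlink:type='locator'"
     + " xlink:href='http://sunsite.kth.se/javafaq'/>"
     + "</WEBSITE>";
    System.out.println(listResources(xml));
  }

}
```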

Resource Example Diagram

This picture shows the WEBSITE extended link element and five resources, one of which WEBSITE contains, the other four of which are referred to by URLs. However, this just describes these resources. No connections are implied between them.

One local and four remote resources with no connections

Roles and Titles for Resources

Both the extended link element itself and the individual locator children may have descriptive attributes such as xlink:role and xlink:title.
The xlink:role and xlink:title attributes of the extended link element provide default roles and titles for each of the individual locator child elements.
Individual resource and locator elements may override these defaults with xlink:role and xlink:title attributes of their own.

<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink" 
         xlink:type="extended" xlink:title="Cafe au Lait">
  <NAME xlink:type="resource" 
        xlink:role="http://ibiblio.org/javafaq/">
    Cafe au Lait
  </NAME>
  <HOMESITE xlink:type="locator" 
          xlink:href="http://ibiblio.org/javafaq/"
          xlink:role="http://ibiblio.org/"/>
  <MIRROR xlink:type="locator" 
         xlink:title="Cafe au Lait Swedish Mirror"
         xlink:role="http://sunsite.kth.se/"
         xlink:href="http://sunsite.kth.se/javafaq"/>
  <MIRROR xlink:type="locator" 
         xlink:title="Cafe au Lait German Mirror"
         xlink:role="http://sunsite.informatik.rwth-aachen.de/"
         xlink:href=
          "http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
  <MIRROR xlink:type="locator" 
         xlink:title="Cafe au Lait Swiss Mirror"
         xlink:role="http://sunsite.cnlab-switch.ch/"
         xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
</WEBSITE>

DTD for Extended Links

<!ELEMENT WEBSITE (NAME, HOMESITE, MIRROR*) >
<!ATTLIST WEBSITE
  xmlns:xlink  CDATA     #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  (extended) #FIXED  "extended"
  xlink:title  CDATA     #IMPLIED
  xlink:role   CDATA     #IMPLIED
>

<!ELEMENT NAME (#PCDATA)>
<!ATTLIST NAME
   xlink:type  (resource) #FIXED    "resource"
   xlink:role   CDATA     #IMPLIED
   xlink:title  CDATA     #IMPLIED
>

<!ELEMENT HOMESITE (#PCDATA)>
<!ATTLIST HOMESITE
   xlink:type  (locator)  #FIXED    "locator"
   xlink:href   CDATA     #REQUIRED
   xlink:role   CDATA     #IMPLIED
   xlink:title  CDATA     #IMPLIED
>

<!ELEMENT MIRROR (#PCDATA)>
<!ATTLIST MIRROR
   xlink:type  (locator)  #FIXED    "locator"
   xlink:href   CDATA     #REQUIRED
   xlink:role   CDATA     #IMPLIED
   xlink:title  CDATA     #IMPLIED
>

Another Shortcut for the DTD

<!ENTITY % extended.att
  "xlink:type   CDATA    #FIXED 'extended'
   xmlns:xlink  CDATA    #FIXED 'http://www.w3.org/1999/xlink'
   xlink:role   CDATA    #IMPLIED
   xlink:title  CDATA    #IMPLIED"
>

<!ENTITY % resource.att
  "xlink:type (resource) #FIXED  'resource'
   xlink:role    CDATA   #IMPLIED
   xlink:title   CDATA   #IMPLIED"
>

<!ENTITY % locator.att
  "xlink:type (locator)  #FIXED  'locator'
   xlink:href    CDATA   #REQUIRED
   xlink:role    CDATA   #IMPLIED
   xlink:title   CDATA   #IMPLIED"
>

<!ELEMENT WEBSITE (NAME, HOMESITE, MIRROR*) >
<!ATTLIST WEBSITE
   %extended.att;
>

<!ELEMENT NAME (#PCDATA)>
<!ATTLIST NAME
   %resource.att;
>

<!ELEMENT HOMESITE (#PCDATA)>
<!ATTLIST HOMESITE
   %locator.att;
>

<!ELEMENT MIRROR (#PCDATA)>
<!ATTLIST MIRROR
   %locator.att;
>

Arcs

In an extended link with three resources, A, B, and C, there are nine different possible traversals:
- A --> A
- B --> B
- C --> C
- A --> B
- B --> A
- A --> C
- C --> A
- B --> C
- C --> B
These potential traversals are called arcs
Arcs are represented in XML by elements that have an xlink:type attribute with the value arc.
Traversal rules are defined by attaching xlink:actuate and xlink:show attributes to arc elements.
An arc element has an xlink:from attribute and an xlink:to attribute.
These attributes match the xlink:label attributes of the resource and locator elements in the extended link: xlink:from names the resources from which traversal is initiated, and xlink:to names the resources to which the traversal goes.
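
How labels turn one arc element into a set of traversals can be sketched in a few lines. The class name ArcExpander and the use of parallel arrays are choices made for this illustration. The sketch assumes that an omitted from or to, modeled here as null, stands for every labeled resource in the link, and that several resources may share one label.

```java
import java.util.ArrayList;
import java.util.List;

class ArcExpander {

  // labels[i] is the xlink:label of resources[i]. A null from or to 
  // is assumed to match every labeled resource in the link.
  static List<String> expand(String[] labels, String[] resources, 
   String from, String to) {
    List<String> traversals = new ArrayList<String>();
    for (int i = 0; i < labels.length; i++) {
      if (from != null && !from.equals(labels[i])) continue;
      for (int j = 0; j < labels.length; j++) {
        if (to != null && !to.equals(labels[j])) continue;
        traversals.add(resources[i] + " --> " + resources[j]);
      }
    }
    return traversals;
  }

  public static void main(String[] args) {
    String[] labels    = {"a", "b", "c"};
    String[] resources = {"A", "B", "C"};
    // No from and no to: all nine traversals listed above
    System.out.println(expand(labels, resources, null, null));
    // A fully specified arc: a single traversal
    System.out.println(expand(labels, resources, "a", "c"));
  }

}
```

When several locators share a label, a single arc element naming that label expands to one traversal per matching resource.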

Arc Example

<?xml version="1.0"?>
<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink" 
         xlink:type="extended" xlink:title="Cafe au Lait">
         
  <NAME xlink:type="resource" xlink:label="source">
    Cafe au Lait
  </NAME>

  <HOMESITE xlink:type="locator" 
            xlink:href="http://ibiblio.org/javafaq/"
            xlink:label="us"/>
  
  <MIRROR xlink:type="locator" 
          xlink:title="Cafe au Lait Swedish Mirror"
          xlink:label="se"
          xlink:href="http://sunsite.kth.se/javafaq"/>
  
  <MIRROR xlink:type="locator" 
          xlink:title="Cafe au Lait German Mirror"
          xlink:label="de"
          xlink:href="http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
  
  <MIRROR xlink:type="locator" 
          xlink:title="Cafe au Lait Swiss Mirror"
          xlink:label="ch"
          xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
  
  <CONNECTION xlink:type="arc" xlink:from="source" 
              xlink:to="ch"    xlink:show="replace" 
              xlink:actuate="onRequest"/>
  <CONNECTION xlink:type="arc" xlink:from="source" 
              xlink:to="us"    xlink:show="replace" 
              xlink:actuate="onRequest"/>
  <CONNECTION xlink:type="arc" xlink:from="source" 
              xlink:to="se"    xlink:show="replace" 
              xlink:actuate="onRequest"/>
  <CONNECTION xlink:type="arc" xlink:from="source" 
              xlink:to="de"    xlink:show="replace" 
              xlink:actuate="onRequest"/>
  
</WEBSITE>

Arc Example Diagram

Arc Example

<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink" 
         xlink:type="extended" xlink:title="Cafe au Lait">
         
  <NAME xlink:type="resource" xlink:label="source">
    Cafe au Lait
  </NAME>

  <HOMESITE xlink:type="locator" 
           xlink:href="http://ibiblio.org/javafaq/"
           xlink:label="us"/>
  
  <MIRROR xlink:type="locator" 
         xlink:title="Cafe au Lait Swedish Mirror"
         xlink:label="mirror"
         xlink:href="http://sunsite.kth.se/javafaq"/>
  
  <MIRROR xlink:type="locator" 
         xlink:title="Cafe au Lait German Mirror"
         xlink:label="mirror"
         xlink:href="http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
  
  <MIRROR xlink:type="locator" 
         xlink:title="Cafe au Lait Swiss Mirror"
         xlink:label="mirror"
         xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
  
  <CONNECTION xlink:type="arc"  xlink:from="source" 
              xlink:to="mirror" xlink:show="replace" 
              xlink:actuate="onRequest"/>

</WEBSITE>

Arc Example Diagram

Arc Example with omitted to attribute

<?xml version="1.0"?>
<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink" 
         xlink:type="extended" xlink:title="Cafe au Lait">
         
  <NAME xlink:type="resource" xlink:label="source">
    Cafe au Lait
  </NAME>

  <HOMESITE xlink:type="locator" 
            xlink:href="http://ibiblio.org/javafaq/"
            xlink:label="us"/>
  
  <MIRROR xlink:type="locator" 
          xlink:title="Cafe au Lait Swedish Mirror"
          xlink:label="se"
          xlink:href="http://sunsite.kth.se/javafaq"/>
  
  <MIRROR xlink:type="locator" 
          xlink:title="Cafe au Lait German Mirror"
          xlink:label="de"
          xlink:href="http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
  
  <MIRROR xlink:type="locator" 
          xlink:title="Cafe au Lait Swiss Mirror"
          xlink:label="ch"
          xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
  
  <CONNECTION xlink:type="arc" xlink:from="source" 
              xlink:show="replace" xlink:actuate="onRequest"/>

</WEBSITE>

Arc Example Diagram

Arcs can return to the same resource they started from

Arc DTD Fragment

<!ELEMENT WEBSITE (NAME, HOMESITE, MIRROR*, CONNECTION*) >
<!ATTLIST WEBSITE
  xmlns:xlink  CDATA  #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  (extended) #FIXED  "extended"
  xlink:title  CDATA     #IMPLIED
  xlink:role   CDATA     #IMPLIED
>

<!ELEMENT HOMESITE (#PCDATA)>
<!ATTLIST HOMESITE
   xlink:type     (locator) #FIXED  "locator"
   xlink:href      CDATA    #REQUIRED
   xlink:label     CDATA    #REQUIRED
   xlink:title     CDATA    #IMPLIED
>

<!ELEMENT MIRROR (#PCDATA)>
<!ATTLIST MIRROR
   xlink:type     (locator) #FIXED  "locator"
   xlink:href      CDATA    #REQUIRED
   xlink:label     CDATA    #REQUIRED
   xlink:title     CDATA    #IMPLIED
>

<!ELEMENT CONNECTION EMPTY>
<!ATTLIST CONNECTION
  xlink:type     (arc)               #FIXED   "arc"
  xlink:from     CDATA               #IMPLIED
  xlink:to       CDATA               #IMPLIED
  xlink:show    (replace)            "replace"
  xlink:actuate (onRequest | onLoad) "onRequest"
>

Out-of-Line Links

Inline links, such as the familiar A element from HTML, are themselves part of the source or target of the link. The source of the link, that is the blue underlined text, is included inside the A element that defines the link. Most simple links are inline.
An out-of-line link does not contain any part of any of the resources it connects. Instead, the links are stored in a separate document called the linkbase.
Out of line links allow you to add links to and from documents that can't be modified such as a page on someone else's web site.
Out of line links allow you to add links to different parts of non-XML content.
Out of line links are not yet supported by software.

Out of line Link Example

<COURSE xmlns:xlink="http://www.w3.org/1999/xlink"
         xlink:type="extended">

  <TOC xlink:type="locator" 
          xlink:href="http://www.ibiblio.org/javafaq/course/" 
          xlink:label="index"/>

  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week1.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week2.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week3.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week4.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week5.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week6.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week7.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week8.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week9.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week10.xml"/> 
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week11.xml"/> 
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week12.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week13.xml"/>
  
  <CONNECTION xlink:type="arc" xlink:from="index" xlink:to="class"/>
  <CONNECTION xlink:type="arc" xlink:from="class" xlink:to="index"/>
  
</COURSE>

Another Out of line Link Example

<COURSE xmlns:xlink="http://www.w3.org/1999/xlink"
         xlink:type="extended">

  <CLASS xlink:type="locator" xlink:label="1"
         xlink:href="http://www.ibiblio.org/javafaq/course/week1.xml"/>
  <CLASS xlink:type="locator" xlink:label="2" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week2.xml"/>
  <CLASS xlink:type="locator" xlink:label="3"
         xlink:href="http://www.ibiblio.org/javafaq/course/week3.xml"/>
  <CLASS xlink:type="locator" xlink:label="4"
         xlink:href="http://www.ibiblio.org/javafaq/course/week4.xml"/>
  <CLASS xlink:type="locator" xlink:label="5" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week5.xml"/>
  <CLASS xlink:type="locator" xlink:label="6"
         xlink:href="http://www.ibiblio.org/javafaq/course/week6.xml"/>
  <CLASS xlink:type="locator"  xlink:label="7" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week7.xml"/>
  <CLASS xlink:type="locator"   xlink:label="8"
         xlink:href="http://www.ibiblio.org/javafaq/course/week8.xml"/>
  <CLASS xlink:type="locator" xlink:label="9"
         xlink:href="http://www.ibiblio.org/javafaq/course/week9.xml"/>
  <CLASS xlink:type="locator"  xlink:label="10"
         xlink:href="http://www.ibiblio.org/javafaq/course/week10.xml"/> 
  <CLASS xlink:type="locator" xlink:label="11" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week11.xml"/> 
  <CLASS xlink:type="locator" xlink:label="12" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week12.xml"/>
  <CLASS xlink:type="locator" xlink:label="13" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week13.xml"/>
  
  <!-- Previous Links --> 
  <CONNECTION xlink:type="arc" xlink:from="2" xlink:to="1"/>
  <CONNECTION xlink:type="arc" xlink:from="3" xlink:to="2"/>
  <CONNECTION xlink:type="arc" xlink:from="4" xlink:to="3"/>
  <CONNECTION xlink:type="arc" xlink:from="5" xlink:to="4"/>
  <CONNECTION xlink:type="arc" xlink:from="6" xlink:to="5"/>
  <CONNECTION xlink:type="arc" xlink:from="7" xlink:to="6"/>
  <CONNECTION xlink:type="arc" xlink:from="8" xlink:to="7"/>
  <CONNECTION xlink:type="arc" xlink:from="9" xlink:to="8"/>
  <CONNECTION xlink:type="arc" xlink:from="10" xlink:to="9"/>
  <CONNECTION xlink:type="arc" xlink:from="11" xlink:to="10"/> 
  <CONNECTION xlink:type="arc" xlink:from="12" xlink:to="11"/> 
  <CONNECTION xlink:type="arc" xlink:from="13" xlink:to="12"/>
  
  <!-- Next Links --> 
  <CONNECTION xlink:type="arc" xlink:from="1" xlink:to="2"/>
  <CONNECTION xlink:type="arc" xlink:from="2" xlink:to="3"/>
  <CONNECTION xlink:type="arc" xlink:from="3" xlink:to="4"/>
  <CONNECTION xlink:type="arc" xlink:from="4" xlink:to="5"/>
  <CONNECTION xlink:type="arc" xlink:from="5" xlink:to="6"/>
  <CONNECTION xlink:type="arc" xlink:from="6" xlink:to="7"/>
  <CONNECTION xlink:type="arc" xlink:from="7" xlink:to="8"/>
  <CONNECTION xlink:type="arc" xlink:from="8" xlink:to="9"/>
  <CONNECTION xlink:type="arc" xlink:from="9" xlink:to="10"/>
  <CONNECTION xlink:type="arc" xlink:from="10" xlink:to="11"/> 
  <CONNECTION xlink:type="arc" xlink:from="11" xlink:to="12"/> 
  <CONNECTION xlink:type="arc" xlink:from="12" xlink:to="13"/>
  
</COURSE>
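
The arcs in the two examples above connect labels, not URIs, so an application has to map each label back to the locators that carry it. The following is a minimal Python sketch of that lookup, assuming a shortened three-week version of the COURSE link with relative hrefs; the helper names xl and resolve_arcs are hypothetical and are not part of any XLink library.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# A shortened, hypothetical version of the COURSE link with only three weeks.
DOC = """
<COURSE xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
  <CLASS xlink:type="locator" xlink:label="1" xlink:href="week1.xml"/>
  <CLASS xlink:type="locator" xlink:label="2" xlink:href="week2.xml"/>
  <CLASS xlink:type="locator" xlink:label="3" xlink:href="week3.xml"/>
  <CONNECTION xlink:type="arc" xlink:from="2" xlink:to="1"/>
  <CONNECTION xlink:type="arc" xlink:from="1" xlink:to="2"/>
  <CONNECTION xlink:type="arc" xlink:from="2" xlink:to="3"/>
</COURSE>
"""


def xl(name):
    """Return the ElementTree key for an attribute in the XLink namespace."""
    return "{%s}%s" % (XLINK, name)


def resolve_arcs(link):
    """Return (from_href, to_href) pairs for every arc in one extended link."""
    # Several locators may share one label, so each label maps to a list.
    by_label = {}
    for child in link:
        if child.get(xl("type")) == "locator":
            by_label.setdefault(child.get(xl("label")), []).append(child.get(xl("href")))
    pairs = []
    for child in link:
        if child.get(xl("type")) == "arc":
            # An arc connects every resource with the "from" label
            # to every resource with the "to" label.
            for source in by_label.get(child.get(xl("from")), []):
                for target in by_label.get(child.get(xl("to")), []):
                    pairs.append((source, target))
    return pairs


arcs = resolve_arcs(ET.fromstring(DOC))
for source, target in arcs:
    print(source, "->", target)
```

Because each label maps to a list, the same function also handles the first example, where thirteen locators share the label class.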

Linkbases

A single XML document may contain multiple out-of-line extended links. However, the current XLink specification is relatively silent on exactly what the format of such a compound document should look like. About all it says is that such a document must be a well-formed XML document. An XLink processor would presumably read the entire document and extract any extended links that indicate connections to or from the current document.
A browser or other application that's reading the individual pages needs to be informed that there is a separate linkbase elsewhere that it should read and parse so that it can show the links to the user.
Ideally it would be handled through some external mechanism like HTTP headers.
The only currently defined way to do this is to add an arc element inside the documents the out-of-line link connects. This arc has an xlink:arcrole attribute with the value http://www.w3.org/1999/xlink/properties/linkbase. Its xlink:to attribute points to the linkbase.

<METADATA xlink:type="extended"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <LINKBASE xlink:type="arc"
            xlink:arcrole="http://www.w3.org/1999/xlink/properties/linkbase"
            xlink:to="courselinks"/>
  <RESOURCE xlink:type="locator" xlink:href="courselinks.xml" 
            xlink:label="courselinks"/>
</METADATA>
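
As a rough illustration of how an application might discover the linkbase, the Python sketch below looks for arcs with the linkbase arcrole and follows their xlink:to label to a locator. It assumes the METADATA element uses xlink:type="extended" and that the locator carries an xlink:href attribute; the function name linkbase_hrefs is hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"
LINKBASE_ARCROLE = "http://www.w3.org/1999/xlink/properties/linkbase"

DOC = """<METADATA xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
  <LINKBASE xlink:type="arc"
            xlink:arcrole="http://www.w3.org/1999/xlink/properties/linkbase"
            xlink:to="courselinks"/>
  <RESOURCE xlink:type="locator" xlink:href="courselinks.xml"
            xlink:label="courselinks"/>
</METADATA>"""


def linkbase_hrefs(root):
    """Collect the hrefs of all linkbases announced anywhere under root."""
    hrefs = []
    for link in root.iter():
        if link.get(XLINK + "type") != "extended":
            continue
        locators = [c for c in link if c.get(XLINK + "type") == "locator"]
        for arc in link:
            if arc.get(XLINK + "type") != "arc":
                continue
            if arc.get(XLINK + "arcrole") != LINKBASE_ARCROLE:
                continue
            # The arc names its target by label; find the matching locators.
            for locator in locators:
                if locator.get(XLINK + "label") == arc.get(XLINK + "to"):
                    hrefs.append(locator.get(XLINK + "href"))
    return hrefs


found = linkbase_hrefs(ET.fromstring(DOC))
print(found)
```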

XLink Summary

XLinks can do everything HTML links can do and quite a bit more, but they aren't supported by current applications.
XLink elements of all types are placed in the http://www.w3.org/1999/xlink namespace, normally with the xlink prefix.
Simple links behave much like HTML links, but they are not restricted to a single <A> tag.
Linking elements are identified by xlink:type attributes.
Simple link elements are identified by xlink:type attributes with the value simple.
Linking elements can describe the resource they're linking to with xlink:title and xlink:role attributes.
Linking elements can use the xlink:show attribute to tell the application how the content should be displayed when the link is activated, for example, by opening a new window.
Linking elements can use the xlink:actuate attribute to tell the application whether the link should be traversed without a specific user request.
Extended link elements are identified by xlink:type attributes with the value extended.
Extended links can contain multiple locators, resources, and arcs. Currently, it's left to the application to decide how to choose between different alternatives.
A resource element represents a local, inline resource. It is identified by an xlink:type attribute with the value resource.
A locator element represents a remote, out-of-line resource. It is identified by an xlink:type attribute with the value locator.
Both locator and resource elements can be labeled by xlink:label attributes. These labels are used to define arcs between resources.
A locator element has an xlink:href attribute whose value is the URI of the resource it locates.
Arc elements are identified by xlink:type attributes with the value arc.
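Arc elements have xlink:from and xlink:to attributes that identify the resources they connect by their labels. The values are plain names matched against xlink:label values, not IDREFs, so several resources can share one label.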
Arc elements may have xlink:show and xlink:actuate attributes to determine when and how traversal of the link occurs.
An out-of-line link is a link that does not contain any local resources.
A linkbase is a document containing multiple out-of-line, extended link elements.
A linkbase is found when a document containing an arc with the xlink:arcrole value http://www.w3.org/1999/xlink/properties/linkbase is read.

To Learn More

This presentation: http://www.ibiblio.org/xml/slides/xmlonelondon2001/xlinks/
XLink Specification: http://www.w3.org/TR/xlink/
Chapter 16 of the XML Bible: http://www.ibiblio.org/xml/books/bible/updates/16.html
Chapter 10 of XML in a Nutshell

Part II: XPointers

The many advantages of descriptive pointing are crucial for a scalable, generic pointing system. Descriptive pointing is crucial for all the same reasons that descriptive markup is crucial to documents, and that making links first-class objects is crucial to linking. It is also clearly feasible, as shown by multiple implementations of the prior WDs from the XML WG, and of TEI extended pointers.

--XML Linking Working Group, XML XPointer Requirements

XPointers

Why Use XPointers?
XPointer Examples
A Concrete Example
Location Paths, Steps, and Sets
Axes
Node Tests
Predicates
Functions that Return Node Sets
Points
Ranges
Child Sequences

What are XPointers?

XPointer, the XML Pointer Language, defines an addressing scheme for individual parts of an XML document.
XLinks point to a URI (in practice, a URL) that specifies a particular resource.
The URI may include an XPointer part that more specifically identifies the desired part or element of the targeted resource or document.
XPointers use the same XPath syntax you're familiar with from XSL transformations to identify the parts of the document they point to, along with a few additional pieces.

Why Use XPointers?

The element with a given ID
All elements that possess a certain attribute
The first element of a certain type
The last element whose class attribute has the value pending
The seventh element of a given type
The first child of the seventh element
and many more including combinations of these addresses...

XPointer Examples

xpointer(id("ebnf"))
xpointer(descendant::language[position()=2])
ebnf
xpointer(/child::spec/child::body/child::*/child::language[position()=2])
/1/14/2
xpointer(id("ebnf"))xpointer(id("EBNF"))

XPointers in URIs

The XPointer does not specify the document. A URI does.
XPointers can be used as fragment identifiers in a URI after a #.
For example,
http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(id("ebnf"))
http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(descendant::language[position()=2])
http://www.w3.org/TR/1998/REC-xml-19980210.xml#ebnf
http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(/child::spec/child::body/child::*/child::language[position()=2])
http://www.w3.org/TR/1998/REC-xml-19980210.xml#/1/14/2
http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(id("ebnf"))xpointer(id("EBNF"))
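
To make the division of labor concrete, here is a small Python sketch that splits one of these URIs into the part that names the document and the XPointer that follows the #. It uses only the standard urldefrag function; the variable names are illustrative.

```python
from urllib.parse import urldefrag

uri = 'http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(id("ebnf"))'

# urldefrag separates the URI proper from the fragment identifier.
document_uri, xpointer = urldefrag(uri)

print(document_uri)  # the resource to fetch
print(xpointer)      # the XPointer to evaluate inside that resource
```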

XPointers in XLinks

<SPECIFICATION xmlns:xlink="http://www.w3.org/1999/xlink"
               xlink:type="simple"
               xlink:href="http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(id('ebnf'))"
               xlink:actuate="onRequest" xlink:show="replace">
  Extensible Markup Language (XML) 1.0
</SPECIFICATION>

A Concrete Example

<?xml version="1.0"?>
<!DOCTYPE FAMILYTREE [

  <!ELEMENT FAMILYTREE (PERSON | FAMILY)*>

  <!-- PERSON elements --> 
  <!ELEMENT PERSON (NAME*, BORN*, DIED*, SPOUSE*)>
  <!ATTLIST PERSON 
    ID      ID     #REQUIRED
    FATHER  CDATA  #IMPLIED
    MOTHER  CDATA  #IMPLIED
  >
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT BORN (#PCDATA)>
  <!ELEMENT DIED  (#PCDATA)>
  <!ELEMENT SPOUSE EMPTY>
  <!ATTLIST SPOUSE IDREF IDREF #REQUIRED>
  
  <!--FAMILY--> 
  <!ELEMENT FAMILY (HUSBAND?, WIFE?, CHILD*) >
  <!ATTLIST FAMILY ID ID #REQUIRED>
  
  <!ELEMENT HUSBAND EMPTY>
  <!ATTLIST HUSBAND IDREF IDREF #REQUIRED>
  <!ELEMENT WIFE EMPTY>
  <!ATTLIST WIFE IDREF IDREF #REQUIRED>
  <!ELEMENT CHILD EMPTY>
  <!ATTLIST CHILD IDREF IDREF #REQUIRED>

]>
<FAMILYTREE>

  <PERSON ID="p1">
    <NAME>Domeniquette Celeste Baudean</NAME>
    <BORN>21 Apr 1836</BORN>
    <DIED>Unknown</DIED>
    <SPOUSE IDREF="p2"/>
  </PERSON>

  <PERSON ID="p2">
    <NAME>Jean Francois Bellau</NAME>
    <SPOUSE IDREF="p1"/>
  </PERSON>

  <PERSON ID="p3" FATHER="p2" MOTHER="p1">
    <NAME>Elodie Bellau</NAME>
    <BORN>11 Feb 1858</BORN>
    <DIED>12 Apr 1898</DIED>
    <SPOUSE IDREF="p4"/>
  </PERSON>

  <PERSON ID="p4">
    <NAME>John P. Muller</NAME>
    <SPOUSE IDREF="p3"/>
  </PERSON>

  <PERSON ID="p7">
    <NAME>Adolf Eno</NAME>
    <SPOUSE IDREF="p6"/>
  </PERSON>

  <PERSON ID="p6" FATHER="p2" MOTHER="p1">
    <NAME>Maria Bellau</NAME>
    <SPOUSE IDREF="p7"/>
  </PERSON>

  <PERSON ID="p5" FATHER="p2" MOTHER="p1">
    <NAME>Eugene Bellau</NAME>
  </PERSON>

  <PERSON ID="p8" FATHER="p2" MOTHER="p1">
    <NAME>Louise Pauline Bellau</NAME>
    <BORN>29 Oct 1868</BORN>
    <DIED>3 May 1938</DIED>
    <SPOUSE IDREF="p9"/>
  </PERSON>

  <PERSON ID="p9">
    <NAME>Charles Walter Harold</NAME>
    <BORN>about 1861</BORN>
    <DIED>about 1938</DIED>
    <SPOUSE IDREF="p8"/>
  </PERSON>

  <PERSON ID="p10" FATHER="p2" MOTHER="p1">
    <NAME>Victor Joseph Bellau</NAME>
    <SPOUSE IDREF="p11"/>
  </PERSON>

  <PERSON ID="p11">
    <NAME>Ellen Gilmore</NAME>
    <SPOUSE IDREF="p10"/>
  </PERSON>

  <PERSON ID="p12" FATHER="p2" MOTHER="p1">
    <NAME>Honore Bellau</NAME>
  </PERSON>

  <FAMILY ID="f1">
    <HUSBAND IDREF="p2"/>
    <WIFE IDREF="p1"/>
    <CHILD IDREF="p3"/>
    <CHILD IDREF="p5"/>
    <CHILD IDREF="p6"/>
    <CHILD IDREF="p8"/>
    <CHILD IDREF="p10"/>
    <CHILD IDREF="p12"/>
  </FAMILY>

  <FAMILY ID="f2">
    <HUSBAND IDREF="p7"/>
    <WIFE IDREF="p6"/>
  </FAMILY>

</FAMILYTREE>

Location Paths, Steps, and Sets

Many (though not all) XPointers are location paths. These are the same location paths used by XSLT.
Location paths are built from location steps.
Each location step specifies a point in the targeted document, generally relative to some other well-known point such as the start of the document or another location step. This well-known point is called the context node.

Location Steps

A location step has three parts:
- The axis
- The node test
- An optional predicate
axis::node-test[predicate]
child::PERSON[position()=2]
The axis tells you in what direction to search from the context node.
The node test tells you which nodes to consider along the axis.
The predicate is a boolean expression that tests each node in that set. If that expression returns false, then the node is removed from the set.

Location Paths

xpointer(/child::FAMILYTREE/child::PERSON[position()=3])
The location path of this XPointer is /child::FAMILYTREE/child::PERSON[position()=3].
It is built from two location steps:
- /child::FAMILYTREE
- child::PERSON[position()=3]

It identifies the single node:

  <PERSON ID="p3" FATHER="p2" MOTHER="p1">
    <NAME>Elodie Bellau</NAME>
    <BORN>11 Feb 1858</BORN>
    <DIED>12 Apr 1898</DIED>
    <SPOUSE IDREF="p4"/>
  </PERSON>
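
For readers who want to experiment, the following Python sketch evaluates a rough equivalent of this location path with the standard library's ElementTree. ElementTree understands only an abbreviated subset of XPath, so child::PERSON[position()=3] is written as PERSON[3], and the family tree is cut down to four PERSON elements purely for illustration.

```python
import xml.etree.ElementTree as ET

# A shortened family tree; only the first four PERSON elements are kept.
DOC = """<FAMILYTREE>
  <PERSON ID="p1"><NAME>Domeniquette Celeste Baudean</NAME></PERSON>
  <PERSON ID="p2"><NAME>Jean Francois Bellau</NAME></PERSON>
  <PERSON ID="p3"><NAME>Elodie Bellau</NAME></PERSON>
  <PERSON ID="p4"><NAME>John P. Muller</NAME></PERSON>
</FAMILYTREE>"""

# fromstring returns the document element, so the first location step
# (/child::FAMILYTREE) has already been taken.
root = ET.fromstring(DOC)

# Second step: child::PERSON[position()=3]
third = root.findall("PERSON[3]")

# ElementTree has no position()>3 test, so slice the list of PERSON children.
after_third = root.findall("PERSON")[3:]

print([p.get("ID") for p in third])
print([p.get("ID") for p in after_third])
```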

Location Paths that Identify Multiple Nodes

xpointer(/child::FAMILYTREE/child::PERSON[position()>3])
Identifies all PERSON element nodes after Elodie Bellau

Axes

XPath defines thirteen axes along which an XPointer may search for nodes. Twelve of them are covered here; the namespace axis is omitted.
These depend on context to determine exactly what they point to.
For instance, consider this location path:
id("p6")/child::NAME
It begins with the id() function that returns a node set containing the element with the ID type attribute whose value is p6. This provides a context node for the following location step along the relative child axis.
Other axes include
- ancestor
- descendant
- self
- ancestor-or-self
- descendant-or-self
- attribute
Each selects nodes from a particular subset of the nodes in the document. For instance, the following axis selects from nodes that come after the context node. The preceding axis selects from nodes that come before the context node.

Location Step Axes

Axis	Selects From
`ancestor`	the parent of the context node, the parent of the parent of the context node, the parent of the parent of the parent of the context node, and so forth back to the root node
`ancestor-or-self`	the ancestors of the context node and the context node itself
`attribute`	the attributes of the context node
`child`	the immediate children of the context node
`descendant`	the children of the context node, the children of the children of the context node, and so forth
`descendant-or-self`	the context node itself and its descendants
`following`	all nodes that start after the end of the context node, excluding attribute and namespace nodes
`following-sibling`	all nodes that start after the end of the context node and have the same parent as the context node
`parent`	the unique parent node of the context node
`preceding`	all nodes that end before the beginning of the context node, excluding attribute and namespace nodes
`preceding-sibling`	all nodes that start before the beginning of the context node and have the same parent as the context node
`self`	the context node
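
A few of these axes can be approximated in Python to show what "selects from" means in practice. This is only a sketch under stated assumptions: ElementTree keeps no parent pointers, so a parent map is built first, and it models element nodes only, so the root node, text nodes, and attributes do not appear. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<FAMILYTREE>
  <PERSON ID="p1"><NAME>Domeniquette Celeste Baudean</NAME></PERSON>
  <PERSON ID="p2"><NAME>Jean Francois Bellau</NAME></PERSON>
  <PERSON ID="p3"><NAME>Elodie Bellau</NAME></PERSON>
</FAMILYTREE>"""

root = ET.fromstring(DOC)

# Map every element to its parent so the tree can be walked upward.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Elements on the ancestor axis, nearest first."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result


def following_siblings(node):
    """Elements on the following-sibling axis, in document order."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]


def preceding_siblings(node):
    """Elements on the preceding-sibling axis, in document order."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[:siblings.index(node)]


name = root.find("PERSON[2]/NAME")   # context node: the NAME of p2
person = parent_of[name]             # the parent axis

print([e.tag for e in ancestors(name)])
print([e.get("ID") for e in following_siblings(person)])
print([e.get("ID") for e in preceding_siblings(person)])
```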

Node Tests

There are ten node tests in XPointer, eight from XPath and two new ones:
- name
- *
- prefix:*
- @name
- node()
- text()
- comment()
- processing-instruction()
- point()
- range()
A node test is attached to an axis to specify which nodes along the axis are chosen.
For example:
/descendant::body/child::*/attribute::xlink:*

Predicates

Each location step can contain zero or more predicates that further restrict which nodes an XPointer points to. In most non-trivial cases a predicate is necessary to pick the one node from a node set that you want.
Each predicate contains a boolean expression in square brackets ([]) that further winnows the node set.
This allows an XPointer to select nodes according to many different criteria. For example, you can select:
- All elements that have a specified attribute
- All elements that have a specified attribute with a specified value
- The first element that contains a specified child element
- An element whose text content includes a specified string
- All elements that are not the first or last children of their parents
- All elements whose value is a number
- All elements whose value is a number greater than 100
These are just a small sampling of the selections that predicates make possible.

Boolean Conversion

XPath predicate expressions are ultimately converted to a boolean after all calculations are finished. Non-boolean results are converted as follows:
- A number is true if it's equal to the position of the context node, false otherwise.
- An empty node set is false; all other node sets are true.
- A zero length string is false; all other strings are true (including the string "false")
The predicate expression is evaluated for each node in the context node list. Each node for which the expression ultimately evaluates to false is removed from the list. Thus only those nodes that satisfy the predicate remain.
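
The conversion rules and the filtering step can be sketched in a few lines of Python. This is an illustration of the rules as stated above, not code from any XPath engine; node sets are modeled as plain lists, and the names predicate_is_true and apply_predicate are hypothetical.

```python
def predicate_is_true(value, position):
    """Convert a predicate result to a boolean using the rules above."""
    if isinstance(value, bool):          # already a boolean
        return value
    if isinstance(value, (int, float)):  # a number: compare with the position
        return value == position
    if isinstance(value, str):           # a string: true unless empty
        return len(value) > 0
    return len(value) > 0                # a node set (list): true unless empty


def apply_predicate(nodes, predicate):
    """Keep only the nodes for which the predicate converts to true."""
    kept = []
    for position, node in enumerate(nodes, start=1):
        if predicate_is_true(predicate(node, position), position):
            kept.append(node)
    return kept


people = ["p1", "p2", "p3"]

# [2] keeps the node whose position equals the number.
second = apply_predicate(people, lambda node, position: 2)

# The non-empty string "false" is true, so every node survives.
everyone = apply_predicate(people, lambda node, position: "false")

# An empty node set is false, so no node survives.
nobody = apply_predicate(people, lambda node, position: [])

print(second, everyone, nobody)
```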

The position() function

Probably the function most frequently used in XPointer predicates is position(). This returns the index of the node in the context node list. This allows you to find the first, second, third, or other indexed node.

You can compare positions using the various relational operators like <, >, =, !=, >=, and <=.

xpointer(/child::FAMILYTREE/child::*[position()=1])
xpointer(/child::FAMILYTREE/child::*[position()=2])
xpointer(/child::FAMILYTREE/child::*[position()=3])
xpointer(/child::FAMILYTREE/child::*[position()=4])
xpointer(/child::FAMILYTREE/child::*[position()=5])
xpointer(/child::FAMILYTREE/child::*[position()=6])
xpointer(/child::FAMILYTREE/child::*[position()=7])
xpointer(/child::FAMILYTREE/child::*[position()=8])
xpointer(/child::FAMILYTREE/child::*[position()=9])
xpointer(/child::FAMILYTREE/child::*[position()=10])
xpointer(/child::FAMILYTREE/child::*[position()=11])
xpointer(/child::FAMILYTREE/child::*[position()=12])
xpointer(/child::FAMILYTREE/child::*[position()=13])
xpointer(/child::FAMILYTREE/child::*[position()=14])

Identifying an element by its position

xpointer(/child::FAMILYTREE/child::*[1])
xpointer(/child::FAMILYTREE/child::*[2])
xpointer(/child::FAMILYTREE/child::*[3])
xpointer(/child::FAMILYTREE/child::*[4])
xpointer(/child::FAMILYTREE/child::*[5])
xpointer(/child::FAMILYTREE/child::*[6])
xpointer(/child::FAMILYTREE/child::*[7])
xpointer(/child::FAMILYTREE/child::*[8])
xpointer(/child::FAMILYTREE/child::*[9])
xpointer(/child::FAMILYTREE/child::*[10])
xpointer(/child::FAMILYTREE/child::*[11])
xpointer(/child::FAMILYTREE/child::*[12])
xpointer(/child::FAMILYTREE/child::*[13])
xpointer(/child::FAMILYTREE/child::*[14])

Functions that Return Node Sets


        id()
        here()
        origin()

The last two, here() and origin(), are XPointer extensions to XPath that are not available in XSLT.

id()

The id() function selects the element in the document that has an ID type attribute with a specified value.
For example, consider the URI http://www.theharolds.com/genealogy.xml#xpointer(id("p12")). If you look back at the family tree document in A Concrete Example, you find this element:
```
<PERSON ID="p12" FATHER="p2" MOTHER="p1">
  <NAME>Honore Bellau</NAME>
</PERSON>
```
Since ID pointers are so common and so useful, there's also a shortcut for this. If all you want to do is point to a particular element with a particular ID, you can skip all the xpointer(id("")) frou-frou and just use the bare ID after the # like this:
http://www.theharolds.com/genealogy.xml#p12
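
A minimal Python sketch of both forms follows. One loudly stated assumption: ElementTree does not read the DTD, so it cannot know which attributes have ID type, and the code simply treats any attribute named ID as the identifier. The function name element_for_fragment and the regular expression are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

DOC = """<FAMILYTREE>
  <PERSON ID="p11"><NAME>Ellen Gilmore</NAME></PERSON>
  <PERSON ID="p12" FATHER="p2" MOTHER="p1"><NAME>Honore Bellau</NAME></PERSON>
</FAMILYTREE>"""

# Matches xpointer(id("...")) or xpointer(id('...')) and captures the ID value.
ID_POINTER = re.compile(r"""^xpointer\(id\((["'])(.+?)\1\)\)$""")


def element_for_fragment(root, fragment):
    """Resolve either xpointer(id("p12")) or the bare name p12."""
    match = ID_POINTER.match(fragment)
    wanted = match.group(2) if match else fragment
    # Assumption: the attribute literally named ID is the ID-type attribute.
    for element in root.iter():
        if element.get("ID") == wanted:
            return element
    return None


root = ET.fromstring(DOC)
full_form = element_for_fragment(root, 'xpointer(id("p12"))')
bare_form = element_for_fragment(root, "p12")

print(full_form.find("NAME").text)
print(full_form is bare_form)
```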

here()

Consider a simple slide show. In this example, here()/following::SLIDE[1] refers to the next slide in the show. here()/preceding::SLIDE[1] refers to the previous slide in the show. Presumably this would be used in conjunction with a style sheet that showed one slide at a time.

<?xml version="1.0"?>
<SLIDESHOW xmlns:xlink="http://www.w3.org/1999/xlink">
  <SLIDE>
    <H1>Welcome to the slide show!</H1>
    <BUTTON xlink:type="simple"
            xlink:href="#xpointer(here()/following::SLIDE[1])">
      Next
    </BUTTON>
  </SLIDE>
  <SLIDE>
    <H1>This is the second slide</H1>
    <BUTTON xlink:type="simple" 
            xlink:href="#xpointer(here()/preceding::SLIDE[1])">
      Previous
    </BUTTON>
    <BUTTON xlink:type="simple" 
            xlink:href="#xpointer(here()/following::SLIDE[1])">
      Next
    </BUTTON>
  </SLIDE>
  <SLIDE>
    <H1>This is the third slide</H1>
    <BUTTON xlink:type="simple" 
            xlink:href="#xpointer(here()/preceding::SLIDE[1])">
      Previous
    </BUTTON>
    <BUTTON xlink:type="simple" 
            xlink:href="#xpointer(here()/following::SLIDE[1])">
      Next
    </BUTTON>
  </SLIDE>
  <SLIDE>
    <H1>This is the fourth slide</H1>
    <BUTTON xlink:type="simple" 
            xlink:href="#xpointer(here()/preceding::SLIDE[1])">
      Previous
    </BUTTON>
    <BUTTON xlink:type="simple" 
            xlink:href="#xpointer(here()/following::SLIDE[1])">
      Next
    </BUTTON>
  </SLIDE>
  ...
  <SLIDE>
    <H1>This is the last slide</H1>
    <BUTTON xlink:type="simple"
            xlink:href="#xpointer(here()/preceding::SLIDE[1])">
      Previous
    </BUTTON>
  </SLIDE>

</SLIDESHOW>

Generally, the here() function is only used in fully relative URIs in XLinks. If any URI part is included, it must be the same as the URI of the current document.

origin()

The origin() function is much the same as here(); that is, it refers to the source of a link. However, origin() is used in out-of-line links where the link is not actually present in the source document. It points to the element in the source document from which the user activated the link.

Points

Every point is either between two nodes or between two characters in the parsed character data of a document. To make sense of this you have to remember that parsed character data is part of a text node. For instance, consider this very simple but well-formed XML document:

<GREETING>
  Hello
</GREETING>

Tree Structure

There are exactly three nodes and 13 distinct points in this document. In order the points are:

The point before the root node
The point before the GREETING element node
The point before the text node containing the text "Hello" (as well as assorted white space)
The point before the white space between <GREETING> and Hello.
The point before the first H in Hello
The point between the H and the e in Hello
The point between the e and the l in Hello
The point between the l and the l in Hello
The point between the l and the o in Hello
The point after the o in Hello
The point after the white space between Hello and </GREETING>.
The point after the GREETING element.
The point after the root node.

The exact details of the white space in the document are not considered here. XPointer collapses all runs of white space to a single space.

Point Expressions

A point is selected using an XPath expression with the point() node test.
A predicate can indicate which of several points is chosen.
child::point()[position()=n]
The index refers to the point before the nth child element if the context node is an element or root node, or to the nth character of the string value of the node otherwise.
For example, to select the point immediately before the D in Domeniquette Celeste Baudean's NAME element,
/child::FAMILYTREE/descendant::*[position()=1]/child::NAME/child::text()/child::point()[position()=0]
To select the point after the last e in Domeniquette, since there are 12 letters in Domeniquette,
/child::FAMILYTREE/descendant::*[position()=1]/child::NAME/child::text()/child::point()[position()=12]

Ranges

In some applications it may be important to specify a range across a document rather than a particular point in the document. For instance, the selection a user makes with a mouse is not necessarily going to match up with any one element or node. It may start in the middle of one paragraph, extend across a heading and a picture and then into the middle of another paragraph two pages down. Any such contiguous area of a document can be described with a range.

A range begins at one point and continues until another point.
The endpoints of the range are identified by location paths.
If the starting path points to a node set rather than a point, then the first point in the location set the XPointer identifies is the start point.
If the ending location path points to a node set rather than a point, then the last point in the location set the XPointer identifies is the end point of the range.

Range Expressions

To specify a range, you append /range-to(end-point) to a location path specifying the start point of the range.
The parentheses contain a location path specifying the endpoint of the range.
For example, suppose you want to select everything between the first PERSON element and the last PERSON element:
xpointer(/child::FAMILYTREE/child::PERSON[position() = 1]/range-to(/child::FAMILYTREE/child::PERSON[position() = last()]))

Range Functions

range(location-set): Returns a location set containing one range for each location in the argument.
The range is the minimum range necessary to cover the entire location.
range-inside(location-set): Returns a location set containing the interiors of each of the locations in the input.
start-point(location-set): Returns a location set that contains one point representing the first point of each location in the input location set. For example, start-point(//PERSON[1]) returns the point immediately before the first PERSON element. start-point(//PERSON) returns the set of points immediately before each PERSON element.
end-point(location-set): The same as start-point() except that it returns the points immediately after each location in its input.

String Ranges

string-range(node-set,substring,index,length)

A string range points to an occurrence of a specified string, or a substring of a given string in the text (not markup) of the document.
string-range() takes as arguments a node set to search and a substring to search for.
string-range() returns a location set containing one range for each non-overlapping match to the string.
By default, the range returned starts before the first matched character and encompasses all the matched characters.
You can also provide optional index and length arguments indicating how many characters after the match the range should start and how many characters after the start the range should continue.
For example, this XPointer finds all occurrences of the string "Harold":
xpointer(string-range(/,"Harold"))
You can change the first argument to specify what nodes you want to look in. For example, this XPointer finds all occurrences of the string "Harold" in NAME elements:
xpointer(string-range(//NAME,"Harold"))
String ranges may have predicates. Thus this XPointer finds only the first occurrence of the string "Harold" in the document:
xpointer(string-range(/,"Harold")[position()=1])

This targets the position immediately preceding the word Harold in Charles Walter Harold's NAME element. This is not the same as pointing at the entire NAME element as an element-based selector would do.
A third numeric argument targets a particular position in the string. For example, this targets the point immediately following the first occurrence of the string "Harold" because Harold has six letters:
xpointer(string-range(/,"Harold",6)[position()=1])
An optional fourth argument specifies the number of characters to select. For example, this XPointer selects the "old" from the first occurrence of the entire string "Harold":
xpointer(string-range(/,"Harold",4,3)[position()=1])
When matching strings, case is considered. All white space is condensed to a single space. Markup characters are ignored.
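
The matching rules can be imitated in Python to see how the index and length arguments carve a range out of each match. This is only a sketch, not an XPointer implementation: it assumes the index is 1-based (as the "old" example implies) and that an omitted length runs to the end of the match, both of which are assumptions of this example. The B element and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def searchable_text(element):
    """Text of an element with markup ignored and white space condensed."""
    raw = "".join(element.itertext())
    return " ".join(raw.split())


def string_ranges(text, substring, index=1, length=None):
    """Return (begin, end) character offsets, one per non-overlapping match."""
    if length is None:
        # Assumption: with no length, the range runs to the end of the match.
        length = len(substring) - (index - 1)
    ranges = []
    start = text.find(substring)          # case-sensitive search
    while start != -1:
        begin = start + index - 1         # index 1 is the first matched character
        ranges.append((begin, begin + length))
        start = text.find(substring, start + len(substring))
    return ranges


# The B tags and the extra spaces do not affect matching.
name = ET.fromstring("<NAME>Charles   Walter <B>Har</B>old</NAME>")
text = searchable_text(name)

whole = string_ranges(text, "Harold")
old = string_ranges(text, "Harold", 4, 3)

print(text)
print([text[b:e] for b, e in whole])
print([text[b:e] for b, e in old])
```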

XPointers and Namespaces

XPointers may appear in non-XML documents where namespace prefixes are not defined.
You use an xmlns() scheme to map a prefix to a URI. For example,
xmlns(svg=http://www.w3.org/2000/svg) xpointer(//svg:polygon[3])
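
The same idea of supplying the prefix binding from outside the document shows up in most XPath APIs. The Python sketch below pulls the binding out of the xmlns() part with a regular expression and hands it to ElementTree. The tiny SVG document and its id values are made up for illustration, and the pattern is a simplification rather than a full XPointer parser.

```python
import re
import xml.etree.ElementTree as ET

# A hypothetical SVG document; note that it declares no svg prefix itself.
SVG_DOC = """<svg xmlns="http://www.w3.org/2000/svg">
  <polygon id="a" points="0,0 4,0 2,3"/>
  <polygon id="b" points="5,0 9,0 7,3"/>
  <polygon id="c" points="10,0 14,0 12,3"/>
</svg>"""

pointer = "xmlns(svg=http://www.w3.org/2000/svg) xpointer(//svg:polygon[3])"

# Collect every prefix=URI pair declared by an xmlns() part.
prefixes = dict(re.findall(r"xmlns\((\w+)=([^)]+)\)", pointer))

root = ET.fromstring(SVG_DOC)

# ElementTree spells the leading // as .// relative to the document element.
matches = root.findall(".//svg:polygon[3]", prefixes)

print(prefixes)
print([m.get("id") for m in matches])
```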

Child Sequences

A child sequence is a shortcut for XPointers that consist of nothing but a series of child relative location steps counting down from the root node, each of which selects a particular child by position only.
The shortcut is to use only the position number and the slashes that separate individual elements from each other, like this:
http://www.theharolds.com/genealogy.xml#/1/4
/1/4 is a child sequence that says to select the fourth child element of the first child element of the root.
Child sequences may include an initial ID. In that case the counting begins from the element with that ID rather than from the root. For example, John P. Muller's PERSON element has an ID attribute with the value p4. Consequently the XPointer p4/1 points to his NAME element and p4/2 points to his SPOUSE element.
Each child sequence always points to a single element. You cannot use child sequences with any other relative location steps. You cannot use them to select elements of a particular type. You cannot use them to select attributes or strings. You can only use them to select a single element by its relative location in the tree.
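
Because a child sequence is nothing but counting, it is easy to resolve by hand. The Python sketch below walks a shortened family tree, assuming as before that the attribute named ID is the ID-type attribute, since ElementTree does not read the DTD. The function name resolve_child_sequence is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<FAMILYTREE>
  <PERSON ID="p1"><NAME>Domeniquette Celeste Baudean</NAME></PERSON>
  <PERSON ID="p2"><NAME>Jean Francois Bellau</NAME></PERSON>
  <PERSON ID="p3"><NAME>Elodie Bellau</NAME></PERSON>
  <PERSON ID="p4"><NAME>John P. Muller</NAME><SPOUSE IDREF="p3"/></PERSON>
</FAMILYTREE>"""


def resolve_child_sequence(document_element, sequence):
    """Resolve a child sequence such as /1/4 or p4/1; return None on failure."""
    steps = sequence.split("/")
    start, numbers = steps[0], [int(step) for step in steps[1:]]
    if start:
        # Counting begins at the element with the given ID.
        current = None
        for element in document_element.iter():
            if element.get("ID") == start:
                current = element
                break
    else:
        # Counting begins at the root node, whose only element child
        # is the document element, so the first number must be 1.
        if not numbers or numbers[0] != 1:
            return None
        current, numbers = document_element, numbers[1:]
    for number in numbers:
        if current is None:
            return None
        children = list(current)   # child elements only
        current = children[number - 1] if 1 <= number <= len(children) else None
    return current


root = ET.fromstring(DOC)
fourth = resolve_child_sequence(root, "/1/4")
name = resolve_child_sequence(root, "p4/1")
spouse = resolve_child_sequence(root, "p4/2")

print(fourth.get("ID"), name.tag, spouse.tag)
```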

XPointer Summary

XPointers refer to particular parts of or locations in XML documents.
The syntax of an XPointer is the keyword xpointer, followed by parentheses containing an XPath expression that returns a node set.
The id() function points to an element with a specified value for an ID type attribute.
Location steps can be chained to make more sophisticated location paths.
Each location step contains an axis, a node test, and zero or more predicates.
Relative location steps select nodes in a document based on their relationship to a context node.
The self axis points to the context node. It can be abbreviated as a period (.).
The parent axis points to the node that contains the context node. It can be abbreviated as a double period (..).
The child axis points to immediate children of the context node. It can be abbreviated simply by a node test.
The descendant axis points to all elements contained in the context node.
The descendant-or-self axis points to all elements contained in the context node as well as the context node itself. The step /descendant-or-self::node()/ can be abbreviated as a double slash (//).
The ancestor axis points to an element that contains the context node.
The ancestor-or-self axis points to all elements that contain the context node as well as the context node itself.
The preceding axis points to any element that comes before the context node.
The following axis points to any element following the context node.
The preceding-sibling axis selects from sibling elements that precede the context node.
The following-sibling axis selects from sibling elements that follow the context node.
The attribute axis points to an attribute of the context node. It can be abbreviated as a @ sign.
The node test of a relative location step is normally an element name, but may also be * to select all elements, @* to select all attributes, @name to select all attributes with the given name, prefix:* to select all elements in the specified namespace, or one of the keywords comment(), text(), processing-instruction(), node(), point() or range().
The optional predicate of a relative location step is an XPath boolean expression enclosed in square brackets that further narrows down the node set the XPointer refers to.
A point indicates a position preceding or following a node or a character.
A range identifies the parsed character data between two points.
The string-range() function points to a specified block of text.
A child sequence points to an element by counting children from the root.

To Learn More

This presentation: http://www.ibiblio.org/xml/slides/xmlonelondon2001/xlinks
XPointer Specification: http://www.w3.org/TR/xptr
Chapter 17 of the XML Bible: http://www.ibiblio.org/xml/books/bible/updates/17.html
Chapter 10 of XML in a Nutshell

Part III: XML Base

What is XML Base?

An in-band means of specifying the proper URI for a document that can succeed even if out-of-band mechanisms aren't available.
A means of specifying the proper base URI which relative URLs are relative to, even if the document itself is copied to a different location.
An XML replacement for the HTML BASE element
W3C Proposed Recommendation, December 20, 2000

The xml:base attribute

<slide xml:base="http://www.ibiblio.org/xml/slides/xmlonelondon2001/xlinks/">
  <title>The xml:base attribute</title>
  ...
  <previous xlink:type="simple" xlink:href="What_Is_XBase.xml"/>
  <next xlink:type="simple" xlink:href="xbaseexample.xml"/>
</slide>

May be attached to any element to set the base URI for that element and its descendants
The xml prefix is automatically bound to the http://www.w3.org/XML/1998/namespace URI
The value should be an absolute URI
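
Because xml:base applies to an element and its descendants, an application can find the base in effect by walking up from the element to the nearest ancestor that carries the attribute. The following is a minimal DOM sketch of that lookup. It assumes every xml:base value is absolute, as the slide recommends, and so ignores relative xml:base values. The names XmlBaseLookup, baseFor, and first are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XmlBaseLookup {

  static final String XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace";

  // Sample slide modeled on the example above (XLink attributes omitted)
  static final String SLIDE =
      "<slide xml:base=\"http://www.ibiblio.org/xml/slides/xmlonelondon2001/xlinks/\">"
    + "<title>The xml:base attribute</title>"
    + "<next href=\"xbaseexample.xml\"/></slide>";

  // Returns the nearest xml:base on the element or an ancestor, else the fallback
  static String baseFor(Element element, String fallback) {
    for (Node n = element; n instanceof Element; n = n.getParentNode()) {
      String base = ((Element) n).getAttributeNS(XML_NAMESPACE, "base");
      if (base != null && base.length() > 0) return base;
    }
    return fallback;
  }

  // Parses the XML with namespace support and returns the first element with this name
  static Element first(String xml, String name) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true); // needed for getAttributeNS to see xml:base
      Document doc = factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      return (Element) doc.getElementsByTagName(name).item(0);
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Element next = first(SLIDE, "next");
    // The base is inherited from the enclosing slide element
    System.out.println(baseFor(next, null));
  }
}
```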

XML Base Example

<COURSE xmlns:xlink="http://www.w3.org/1999/xlink"
         xml:base="http://www.ibiblio.org/javafaq/course/"
         xlink:type="extended">

  <TOC xlink:type="locator" xlink:href="index.html" xlink:label="index"/>

  <CLASS xlink:type="locator" xlink:label="class"
         xlink:href="week1.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week2.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week3.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week4.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week5.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week6.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week7.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week8.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week9.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week10.xml"/> 
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week11.xml"/> 
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week12.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week13.xml"/>
  
  <CONNECTION xlink:type="arc" xlink:from="index" xlink:to="class"/>
  <CONNECTION xlink:type="arc" xlink:from="class" xlink:to="index"/>
  
</COURSE>

"index.html" now resolves to the URI "http://www.ibiblio.org/javafaq/course/index.html"
"week1.xml" resolves to the URI "http://www.ibiblio.org/javafaq/course/week1.xml"
"week2.xml" resolves to the URI "http://www.ibiblio.org/javafaq/course/week2.xml"
"week3.xml" resolves to the URI "http://www.ibiblio.org/javafaq/course/week3.xml"
etc.
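
The resolution listed above is standard relative URI resolution, which can be reproduced with java.net.URI. This is a small sketch rather than anything an XLink processor is required to do. The class name BaseResolution is hypothetical. The second call shows why the trailing slash on the base matters.

```java
import java.net.URI;

class BaseResolution {

  // Resolves a relative reference against a base URI
  static String resolve(String base, String relative) {
    return URI.create(base).resolve(relative).toString();
  }

  public static void main(String[] args) {
    // prints http://www.ibiblio.org/javafaq/course/week1.xml
    System.out.println(resolve("http://www.ibiblio.org/javafaq/course/", "week1.xml"));
    // Without the trailing slash the last path segment is replaced:
    // prints http://www.ibiblio.org/javafaq/week1.xml
    System.out.println(resolve("http://www.ibiblio.org/javafaq/course", "week1.xml"));
  }
}
```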

Open Issues

How does it interact with XHTML, in particular the XHTML base element?
Browser and other application support?

To Learn More

XML Base Specification: http://www.w3.org/TR/xmlbase

Part IV: XInclude

The problem is that we're not providing the tools. We're providing the specs. That's a whole different ball game. If tools existed for actually making really interesting use of RDF and XLink and XInclude then people would use them. If IE and/or Mozilla supported the full gamut of specs, from XSLT 1.0 to XLink and XInclude (OK, so they're not quite REC's, but with time...) then you would find people using them more.

--Matt Sergeant on the xml-dev mailing list

What is XInclude?

A means of including one XML document inside another, irrespective of validation.
W3C Working Draft, October 26, 2000
Based on the XML Infoset; a source infoset is transformed into a result infoset

Alternatives (and why they don't work)

xlink:show="embed" only graphically includes, like the IMG element in HTML. It does not merge infosets.
External parsed entities:
- Require a DTD
- Can only handle very limited documents; i.e. not all well-formed XML documents are well-formed external parsed entities. In particular, XML declarations can be a problem, and document type declarations always are.
- Don't allow unparsed text to be inserted as CDATA
XSLT document() function
- Only works inside an XSLT stylesheet
- No unparsed, pure-text includes
Custom code or XSLT extension functions

The include element

href attribute identifies the document (or part thereof) to be included
In the http://www.w3.org/1999/XML/xinclude namespace.
The prefix xinclude is customary.

<book xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <title>Processing XML with Java</title>
  <chapter><xinclude:include href="dom.xml"/></chapter>
  <chapter><xinclude:include href="sax.xml"/></chapter>
  <chapter><xinclude:include href="jdom.xml"/></chapter>
</book>
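
Since the prefix is only customary, a processor has to recognize include elements by namespace URI and local name. The following is a minimal sketch that lists the href values in a document like the one above using the DOM. The class name IncludeFinder and the method hrefs are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class IncludeFinder {

  // Namespace URI used by the working draft described in these slides
  static final String XINCLUDE_NAMESPACE = "http://www.w3.org/1999/XML/xinclude";

  static final String BOOK =
      "<book xmlns:xinclude=\"http://www.w3.org/1999/XML/xinclude\">"
    + "<title>Processing XML with Java</title>"
    + "<chapter><xinclude:include href=\"dom.xml\"/></chapter>"
    + "<chapter><xinclude:include href=\"sax.xml\"/></chapter>"
    + "<chapter><xinclude:include href=\"jdom.xml\"/></chapter>"
    + "</book>";

  // Returns the href of every include element, in document order
  static List<String> hrefs(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true); // match on namespace URI, not on prefix
      Document doc = factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      NodeList includes = doc.getElementsByTagNameNS(XINCLUDE_NAMESPACE, "include");
      List<String> result = new ArrayList<String>();
      for (int i = 0; i < includes.getLength(); i++) {
        result.add(((Element) includes.item(i)).getAttribute("href"));
      }
      return result;
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(hrefs(BOOK)); // prints [dom.xml, sax.xml, jdom.xml]
  }
}
```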

The parse attribute

parse="xml": The resource must be parsed as XML and the infosets merged. This is the default.
parse="text": The resource must be treated as pure text and inserted as a text node. When serialized, this means that characters like < will change to &lt; and so forth.

<slide xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <title>The href attribute</title>
  
<ul>
  <li>Identifies the document to be included with a URI</li>
  <li>The document at the URI replaces the <code>include</code> 
      element in the including document</li>
  <li>The <code>xinclude</code> prefix is bound to the http://www.w3.org/1999/XML/xinclude
  namespace URI. 
  </li>
</ul>  

<pre><code><xinclude:include parse="text" href="processing_xml_with_java.xml"/>
</code></pre>
        
  <description>
      A slide from Elliotte Rusty Harold's XML and Hypertext seminar at
      <host_ref/>, <date_ref/>
    </description>
  <last_modified>October 26, 2000</last_modified>
</slide>
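
The point about serialization can be seen without an XInclude processor. A text node holds the raw characters, and they become entity references only when the tree is written out. This minimal sketch uses the JDK's DOM and identity Transformer. The class name TextIncludeDemo and the sample text are assumptions for illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class TextIncludeDemo {

  // Places the text in a <code> element as a single text node and serializes it
  static String serializeWithText(String included) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder().newDocument();
      Element code = doc.createElement("code");
      doc.appendChild(code);
      // The node stores the characters exactly as given
      code.appendChild(doc.createTextNode(included));

      Transformer identity = TransformerFactory.newInstance().newTransformer();
      identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      identity.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // The serializer, not the include step, turns < into &lt; and & into &amp;
    System.out.println(serializeWithText("if (a < b && c)"));
  }
}
```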

Implementation as JDOM

/*--

 Copyright 2000 Elliotte Rusty Harold.
 All rights reserved.

 I haven't yet decided on a license.
 It will be some form of open source.

 THIS SOFTWARE IS PROVIDED "AS IS" AND ANY EXPRESSED OR IMPLIED
 WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 DISCLAIMED.  IN NO EVENT SHALL ELLIOTTE RUSTY HAROLD OR ANY
 OTHER CONTRIBUTORS TO THIS PACKAGE
 BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 SUCH DAMAGE.

 */

package com.macfaq.xml;

import java.net.URL;
import java.net.MalformedURLException;
import java.util.Stack;
import java.util.Iterator;
import java.util.List;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.BufferedInputStream;
import java.io.InputStream;
import org.jdom.Namespace;
import org.jdom.Comment;
import org.jdom.CDATA;
import org.jdom.JDOMException;
import org.jdom.Attribute;
import org.jdom.Element;
import org.jdom.ProcessingInstruction;
import org.jdom.Document;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;

/**
 * <p><code>XIncluder</code> provides methods to
 * resolve JDOM elements and documents to produce
 * a new Document or Element with all
 * XInclude references resolved.
 * </p>
 *
 *
 * @author Elliotte Rusty Harold
 * @version 1.0d2
 */
public class XIncluder {

  public final static Namespace XINCLUDE_NAMESPACE
    = Namespace.getNamespace("xinclude", "http://www.w3.org/1999/XML/xinclude");

  // No instances allowed
  private XIncluder() {}

  private static SAXBuilder builder = new SAXBuilder();

  /**
    * <p>
    * This method resolves a JDOM <code>Document</code>
    * and merges in all XInclude references.
    * If a referenced document cannot be found it is replaced with
    * an error message. The Document object returned is a new document.
    * The original <code>Document</code> is not changed.
    * </p>
    *
    * @param original <code>Document</code> that will be processed
    * @param base     <code>String</code> form of the base URI against which
    *                 relative URLs will be resolved. This can be null if the
    *                 document includes an <code>xml:base</code> attribute.
    * @return Document new <code>Document</code> object in which all
    *                  XInclude elements have been replaced.
    * @throws CircularIncludeException if this document possesses a cycle of
    *                                  XIncludes.
    * @throws MalformedURLException if Java cannot parse the base URI using
    *                               the normal methods of java.net.URL.
    */
    public static Document resolve(Document original, String base)
      throws CircularIncludeException, MalformedURLException {

        if (original == null) throw new NullPointerException("Document must not be null");

        Element root = original.getRootElement();
        Element resolved = (Element) resolve(root, base);

        // catch a ClassCastException if a String is returned????
        // Is the root element allowed to be replaced by
        // an parse="text"

        Document result = new Document(resolved, original.getDocType());

        Iterator iterator = original.getMixedContent().iterator();
        while (iterator.hasNext()) {
          Object o = iterator.next();
          if (o instanceof Comment) {
            Comment c = (Comment) o;
            result.addContent((Comment) c.clone());
          }
          else if (o instanceof ProcessingInstruction) {
            ProcessingInstruction pi =(ProcessingInstruction) o;
            result.addContent((ProcessingInstruction) pi.clone());
          }
        }

        return result;
  }

  /**
    * <p>
    * This method resolves a JDOM <code>Element</code>
    * and merges in all XInclude references. This process is recursive.
    * The element returned contains no XInclude elements.
    * If a referenced document cannot be found it is replaced with
    * an error message. The <code>Element</code> object returned is a new element.
    * The original <code>Element</code> is not changed.
    * </p>
    *
    * @param original <code>Element</code> that will be processed
    * @param base     <code>String</code> form of the base URI against which
    *                 relative URLs will be resolved. This can be null if the
    *                 element includes an <code>xml:base</code> attribute.
    * @return Object  Either an <code>Element</code>
    *                 (<code>parse="xml"</code>) or a <code>String</code>
    *                 (<code>parse="text"</code>)
    * @throws CircularIncludeException if this <code>Element</code> contains an XInclude element
    *                                  that attempts to include a document in which
    *                                  this element is directly or indirectly included.
    */
    public static Object resolve(Element original, String base)
     throws CircularIncludeException, MalformedURLException {

        if (original == null) {
          throw new NullPointerException("You can't XInclude a null element.");
        }
        Stack bases = new Stack();
        if (base != null) bases.push(base);

        Object result = resolve(original, bases);
        // nothing was pushed when base is null, so only pop a non-empty stack
        if (!bases.isEmpty()) bases.pop();
        return result;

    }

    private static boolean isIncludeElement(Element element) {

        if (element.getName().equals("include") &&
            element.getNamespace().equals(XINCLUDE_NAMESPACE)) {
          return true;
        }
        return false;

    }


  /**
    * <p>
    * This method resolves a JDOM <code>Element</code>
    * and merges in all XInclude references. This process is recursive.
    * The element returned contains no XInclude elements.
    * If a referenced document cannot be found it is replaced with
    * an error message. The <code>Element</code> object returned is a new element.
    * The original <code>Element</code> is not changed.
    * </p>
    *
    * @param original <code>Element</code> that will be processed
    * @param bases    <code>Stack</code> containing the string forms of
    *                 all the URIs of documents which contain this element
    *                 through XIncludes. This is used to detect if a circular
    *                 reference is being used.
    * @return Object  Either an <code>Element</code>
    *                 (<code>parse="xml"</code>) or a <code>String</code>
    *                 (<code>parse="text"</code>)
    * @throws CircularIncludeException if this <code>Element</code> contains an XInclude element
    *                                  that attempts to include a document in which
    *                                  this element is directly or indirectly included.
    */
  protected static Object resolve(Element original, Stack bases)
   throws CircularIncludeException {

    Element result;
    // Start with no base; an empty string would always fail URL parsing below
    String base = null;
    if (bases.size() != 0) base = (String) bases.peek();

    if (isIncludeElement(original)) {
      Attribute href = original.getAttribute("href");
      if (href == null) { // illegal, what kind of exception????
        throw new IllegalArgumentException("Missing href attribute");
      }
      Attribute baseAttribute
       = original.getAttribute("base", Namespace.XML_NAMESPACE);
      if (baseAttribute != null) base = baseAttribute.getValue();
      boolean parse = true;
      Attribute parseAttribute = original.getAttribute("parse");
      if (parseAttribute != null) {
        if (parseAttribute.getValue().equals("text")) parse = false;
      }

      URL remote;
      if (base != null) {
        try {
          URL context = new URL(base);
          remote = new URL(context, href.getValue());
        }
        catch (MalformedURLException ex) {
          return "Unresolvable URL " + base + "/" + href.getValue();
        }
      }
      else {
        try {
          remote = new URL(href.getValue());
        }
        catch (MalformedURLException ex) {
          return "Unresolvable URL " + href.getValue();
        }
      }

      if (parse) {
                 // checks for equality (OK) or identity (not OK)????
        if (bases.contains(remote.toExternalForm())) {
          // need to figure out how to get file and number where
          // bad include occurs
          throw new CircularIncludeException(
            "Circular XInclude Reference to "
           + remote.toExternalForm() + " in " );
        }

        try {
          Document doc = builder.build(remote);
          bases.push(remote.toExternalForm());
          result = (Element) resolve(doc.getRootElement(), bases);
          bases.pop();
        }
        // Make this configurable
        catch (JDOMException e) {
           return "Document not found: " + remote.toExternalForm()
            + "\r\n" + e.getMessage();
        }
      }
      else { // insert text
        return downloadTextDocument(remote);
      }

    }
    // not an include element
    else { // recursively process children
       result = new Element(original.getName(), original.getNamespace());
       Iterator attributes = original.getAttributes().iterator();
       while (attributes.hasNext()) {
         Attribute a = (Attribute) attributes.next();
         result.addAttribute((Attribute) a.clone());
       }
       List children = original.getMixedContent();

       Iterator iterator = children.iterator();
       while (iterator.hasNext()) {
         Object o = iterator.next();
         if (o instanceof Element) {
           Element e = (Element) o;
           Object resolved = resolve(e, bases);
           if (resolved instanceof String) {
               result.addContent((String) resolved);
           }
           else result.addContent((Element) resolved);
         }
         else if (o instanceof String) {
           result.addContent((String) o);
         }
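         else if (o instanceof Comment) {
           // clone so the original tree keeps its own nodes
           result.addContent((Comment) ((Comment) o).clone());
         }
         else if (o instanceof CDATA) {
           result.addContent((CDATA) ((CDATA) o).clone());
         }
         else if (o instanceof ProcessingInstruction) {
           result.addContent(
            (ProcessingInstruction) ((ProcessingInstruction) o).clone());
         }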
       }
    }

    return result;

  }

  /**
    * <p>
    * This utility method reads a document at a specified URL
    * and returns the contents of that document as a <code>String</code>.
    * It's used to include files with <code>parse="text"</code>
    * </p>
    *
    * <p>
    * If the document cannot be located due to an IOException,
    * then an error message string is returned. I'm not yet convinced this
    * is the right behavior. Perhaps I should pass on the exception?
    * </p>
    *
    * @param source   <code>URL</code> of the document that will be stored in
    *                 <code>String</code>.
    * @return String  The document retrieved from the source <code>URL</code>
    *                 or an error message if the document can't be retrieved.
    *                 Note: throwing an exception might be better here. I should
    *                 at least allow the setting of the error message.
    */
    public static String downloadTextDocument(URL source) {

        StringBuffer s = new StringBuffer();
        try {
          InputStream in = new BufferedInputStream(source.openStream());
          // does XInclude give you anything to specify the character set????
          InputStreamReader reader = new InputStreamReader(in, "8859_1");
          int c;
          // read decoded characters from the reader, not raw bytes from the stream
          while ((c = reader.read()) != -1) {
            if (c == '<') s.append("&lt;");
            else if (c == '&') s.append("&amp;");
            else s.append((char) c);
          }
          return s.toString();
        }
        catch (IOException e) {
          return "Document not found: " + source.toExternalForm();
        }

    }

    /**
      * <p>
      * The driver method for the XIncluder program.
      * I'll probably move this to a separate class soon.
      * </p>
      *
      * @param args  <code>args[0]</code> contains the URL or file name
      *              of the document to be processed.
      */
    public static void main(String[] args) {

        SAXBuilder builder = new SAXBuilder();
        XMLOutputter outputter = new XMLOutputter();
        for (int i = 0; i < args.length; i++) {
          try {
            Document input = builder.build(args[i]);
            // absolutize URL
            String base = args[i];
            if (base.indexOf(':') < 0) {
              File f = new File(base);
              base = f.toURL().toExternalForm();
            }
            Document output = resolve(input, base);
            // need to set encoding on this to Latin-1 and check what
            // happens to UTF-8 curly quotes
            outputter.output(output, System.out);
          }
          catch (Exception e) {
            System.err.println(e);
            e.printStackTrace();
          }
        }

    }

}
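
The circular-reference check in the listing above relies on the stack's contains() method. The java.util collections compare elements with equals() there, so two equal URL strings match even when they are different objects. The following is a stripped-down sketch of the same idea over an in-memory table of includes instead of real documents. The class name IncludeCycleCheck and the table are hypothetical, and IllegalStateException stands in for CircularIncludeException, whose source is not shown here.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IncludeCycleCheck {

  // Walks the includes of url depth-first; open holds the chain being expanded
  static void check(String url, Map<String, List<String>> includes,
                    Deque<String> open) {
    if (open.contains(url)) { // equals() comparison, not identity
      throw new IllegalStateException(
        "Circular XInclude reference to " + url + " in " + open.peek());
    }
    open.push(url);
    List<String> children = includes.get(url);
    if (children != null) {
      for (String child : children) check(child, includes, open);
    }
    open.pop(); // finished with this document; it may be included again elsewhere
  }

  // Convenience wrapper that reports a cycle as a boolean
  static boolean hasCycle(Map<String, List<String>> includes, String start) {
    try {
      check(start, includes, new ArrayDeque<String>());
      return false;
    }
    catch (IllegalStateException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Map<String, List<String>> includes = new HashMap<String, List<String>>();
    includes.put("book.xml", Arrays.asList("dom.xml", "sax.xml"));
    includes.put("dom.xml", Arrays.asList("book.xml")); // points back at its includer
    System.out.println(hasCycle(includes, "book.xml")); // prints true
  }
}
```

Popping after each document means that a document included twice along different branches is not mistaken for a cycle.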

Implementation as DOM

/*--

 Copyright 2000 Elliotte Rusty Harold.
 All rights reserved.

 I haven't yet decided on a license.
 It will be some form of open source.

 THIS SOFTWARE IS PROVIDED "AS IS" AND ANY EXPRESSED OR IMPLIED
 WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 DISCLAIMED.  IN NO EVENT SHALL ELLIOTTE RUSTY HAROLD OR ANY
 OTHER CONTRIBUTORS TO THIS PACKAGE
 BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 SUCH DAMAGE.

 */

package com.macfaq.xml;

import java.net.URL;
import java.net.MalformedURLException;
import java.util.Stack;
import org.xml.sax.SAXException;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.BufferedInputStream;
import java.io.InputStream;
import org.w3c.dom.Element;
import org.w3c.dom.Document;
import org.w3c.dom.Attr;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.DocumentType;
import org.w3c.dom.DOMImplementation;
import org.apache.xerces.parsers.DOMParser;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;

/**
 * <p><code>DOMXIncluder</code> provides methods to
 * resolve DOM elements and documents to produce
 * a new <code>Document</code> or <code>Element</code> with all
 * XInclude references resolved.
 * </p>
 *
 *
 * @author Elliotte Rusty Harold
 * @version 1.0d1
 */
public class DOMXIncluder {

  public final static String XINCLUDE_NAMESPACE
   = "http://www.w3.org/1999/XML/xinclude";

  // No instances allowed
  private DOMXIncluder() {}

  private static DOMParser parser = new DOMParser();

  /**
    * <p>
    * This method resolves a DOM <code>Document</code>
    * and merges in all XInclude references.
    * If a referenced document cannot be found it is replaced with
    * an error message. The <code>Document</code>
    * object returned is a new document.
    * The original <code>Document</code> object is not changed.
    * </p>
    *
    * @param original <code>Document</code> that will be processed
    * @param base     <code>String</code> form of the base URI against which
    *                 relative URLs will be resolved. This can be null if the
    *                 document includes an <code>xml:base</code> attribute.
    * @return Document new <code>Document</code> object in which all
    *                  XInclude elements have been replaced.
    * @throws CircularIncludeException if this document possesses a cycle of
    *                                  XIncludes.
    * @throws NullPointerException  if the original argument is null.
    */
    public static Document resolve(Document original, String base)
      throws CircularIncludeException, NullPointerException {

        if (original == null) {
          throw new NullPointerException("Document must not be null");
        }

        Element root = original.getDocumentElement();

        // catch a ClassCastException if a Text is returned????
        // Is the root element allowed to be replaced by
        // an parse="text"

        DOMImplementation impl = original.getImplementation();

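        DocumentType oldDoctype = original.getDoctype();
        // getDoctype() returns null for documents without a DOCTYPE
        DocumentType newDoctype = null;
        if (oldDoctype != null) {
          newDoctype = impl.createDocumentType(
           oldDoctype.getName(),
           oldDoctype.getPublicId(),
           oldDoctype.getSystemId());
        }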

        Document resultDocument
         = impl.createDocument(root.getNamespaceURI(),
           root.getTagName(),
           newDoctype);
        // check that tag name is qualified name

        NodeList children = original.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
          Node n = children.item(i);
          if (n instanceof Element) { // root element
              resultDocument.replaceChild(
               resolve(root, base, resultDocument),
               resultDocument.getDocumentElement()
             );
          }
          else if (n instanceof DocumentType) {
              // skip it, already cloned
          }
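          else {
              // import into the result document; a plain clone still belongs
              // to the original document and cannot be appended here
              resultDocument.appendChild(resultDocument.importNode(n, true));
          }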
        }

        return resultDocument;
  }

  /**
    * <p>
    * This method resolves a DOM <code>Element</code>
    * and merges in all XInclude references. This process is recursive.
    * The element returned contains no XInclude elements.
    * If a referenced document cannot be found it is replaced with
    * an error message. The <code>Element</code> object returned is a new element.
    * The original <code>Element</code> is not changed.
    * </p>
    *
    * @param original <code>Element</code> that will be processed
    * @param base     <code>String</code> form of the base URI against which
    *                 relative URLs will be resolved. This can be null if the
    *                 element includes an <code>xml:base</code> attribute.
    * @param resolved <code>Document</code> into which the resolved element will be placed.
    * @return Node    Either an <code>Element</code>
    *                 (<code>parse="xml"</code>) or a <code>Text</code>
    *                 (<code>parse="text"</code>)
    * @throws CircularIncludeException if this <code>Element</code> contains an XInclude element
    *                                  that attempts to include a document in which
    *                                  this element is directly or indirectly included.
    * @throws NullPointerException  if the original argument is null.
    */
    public static Node resolve(Element original, String base, Document resolved)
     throws CircularIncludeException,  NullPointerException {

        if (original == null) {
          throw new NullPointerException(
           "You can't XInclude a null element."
          );
        }
        Stack bases = new Stack();
        if (base != null) bases.push(base);

        Node result = resolve(original, bases, resolved);
        // nothing was pushed when base is null, so only pop a non-empty stack
        if (!bases.isEmpty()) bases.pop();
        return result;

    }

    private static boolean isIncludeElement(Element element) {

        // constants first: getLocalName() and getNamespaceURI() may return null
        if ("include".equals(element.getLocalName()) &&
            XINCLUDE_NAMESPACE.equals(element.getNamespaceURI())) {
          return true;
        }
        return false;

    }


  /**
    * <p>
    * This method resolves a DOM <code>Element</code>
    * and merges in all XInclude references. This process is recursive.
    * The element returned contains no XInclude elements.
    * If a referenced document cannot be found it is replaced with
    * an error message. The <code>Element</code> object returned is a new element.
    * The original <code>Element</code> is not changed.
    * </p>
    *
    * @param original <code>Element</code> that will be processed
    * @param bases    <code>Stack</code> containing the string forms of
    *                 all the URIs of documents which contain this element
    *                 through XIncludes. This is used to detect if a circular
    *                 reference is being used.
    * @param resolved <code>Document</code> into which the resolved element will be placed.
    * @return Node    Either an <code>Element</code>
    *                 (<code>parse="xml"</code>) or a <code>Text</code>
    *                 (<code>parse="text"</code>)
    * @throws CircularIncludeException if this <code>Element</code> contains an XInclude element
    *                                  that attempts to include a document in which
    *                                  this element is directly or indirectly included.
    * @throws IllegalArgumentException if the href attribute is missing from an include element.
    */
  private static Node resolve(Element original, Stack bases, Document resolved)
   throws CircularIncludeException, IllegalArgumentException {

    Element result;
    // Start with no base; an empty string would always fail URL parsing below
    String base = null;
    if (bases.size() != 0) base = (String) bases.peek();

    if (isIncludeElement(original)) {
      String href = original.getAttribute("href");
      if (href == null || href.equals("")) { // illegal, what kind of exception????
        throw new IllegalArgumentException("Missing href attribute");
      }
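      String baseAttribute
       = original.getAttributeNS("http://www.w3.org/XML/1998/namespace", "base");
      // getAttributeNS returns "" when xml:base is absent; only a real value overrides
      if (baseAttribute != null && !baseAttribute.equals("")) {
        base = baseAttribute;
      }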
      boolean parse = true;
      String parseAttribute = original.getAttribute("parse");
      if (parseAttribute != null && parseAttribute.equals("text")) {
          parse = false;
      }

      String remote;
      if (base != null) {
        try {
          URL context = new URL(base);
          URL u = new URL(context, href);
          remote = u.toExternalForm();
        }
        catch (MalformedURLException ex) {
          return resolved.createTextNode("Unresolvable URL "
           + base + "/" + href);
        }
      }
      else {
          remote = href;
      }

      if (parse) {
                 // checks for equality (OK) or identity (not OK)????
        if (bases.contains(remote)) {
          // need to figure out how to get file and number where
          // bad include occurs
          throw new CircularIncludeException(
            "Circular XInclude Reference to "
           + remote + " in " );
        }

        try {
          parser.parse(remote);
          Document doc = parser.getDocument();
          bases.push(remote);
          result = (Element) resolve(doc.getDocumentElement(), bases, resolved);
          bases.pop();
        }
        // Make this configurable
        catch (SAXException e) {
           return resolved.createTextNode("Document "
            + remote + " is not well-formed.\r\n" + e.getMessage());
        }
        catch (IOException e) {
           return resolved.createTextNode("Document not found: "
            + remote + "\r\n" + e.getMessage());
        }
      }
      else { // insert text
        String s = downloadTextDocument(remote);
        return resolved.createTextNode(s);
      }

    }
    // not an include element
    else { // recursively process children
       // still need to adjust bases here????
       result = (Element) resolved.importNode(original, false);
       NodeList children = original.getChildNodes();
       for (int i = 0; i < children.getLength(); i++) {
         Node n = children.item(i);
         if (n instanceof Element) {
           Element e = (Element) n;
           result.appendChild(resolve(e, bases, resolved));
         }
         else {
           result.appendChild(resolved.importNode(n,true));
         }
       }
    }

    return result;

  }

  /**
    * <p>
    * This utility method reads a document at a specified URL
    * and returns the contents of that document as a <code>String</code>.
    * It's used to include files with <code>parse="text"</code>
    * </p>
    *
    * <p>
    * If the document cannot be located due to an IOException,
    * then an error message string is returned. I'm not yet convinced this
    * is the right behavior. Perhaps I should pass on the exception?
    * </p>
    *
    * @param url      URL of the document that will be stored in
    *                 <code>String</code>.
    * @return String  The document retrieved from the source <code>URL</code>
    *                 or an error message if the document can't be retrieved.
    *                 Note: throwing an exception might be better here. I should
    *                 at least allow the setting of the error message.
    */
    public static String downloadTextDocument(String url) {

        URL source;
        try {
          source = new URL(url);
        }
        catch (MalformedURLException ex) {
          return "Unresolvable URL " + url;
        }
        StringBuffer s = new StringBuffer();
        try {
          InputStream in = new BufferedInputStream(source.openStream());
          // does XInclude give you anything to specify the character set????
          InputStreamReader reader = new InputStreamReader(in, "8859_1");
          int c;
          // read decoded characters from the reader, not raw bytes from the stream
          while ((c = reader.read()) != -1) {
            if (c == '<') s.append("&lt;");
            else if (c == '&') s.append("&amp;");
            else s.append((char) c);
          }
          return s.toString();
        }
        catch (IOException e) {
          return "Document not found: " + source.toExternalForm();
        }

    }

    /**
      * <p>
      * The driver method for the XIncluder program.
      * I'll probably move this to a separate class soon.
      * </p>
      *
      * @param args  <code>args[0]</code> contains the URL or file name
      *              of the document to be processed.
      */
    public static void main(String[] args) {

        DOMParser parser = new DOMParser();
        for (int i = 0; i < args.length; i++) {
          try {
            parser.parse(args[i]);
            Document input = parser.getDocument();
            // absolutize URL
            String base = args[i];
            if (base.indexOf(':') < 0) {
              File f = new File(base);
              base = f.toURL().toExternalForm();
            }
            Document output = resolve(input, base);
            // need to set encoding on this to Latin-1 and check what
            // happens to UTF-8 curly quotes

            OutputFormat format = new OutputFormat("XML", "ISO-8859-1", false);
            format.setPreserveSpace(true);
            XMLSerializer serializer
             = new XMLSerializer(System.out, format);
            serializer.serialize(output);
          }
          catch (Exception e) {
            System.err.println(e);
            e.printStackTrace();
          }
        }

    }

}
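
The Javadoc for downloadTextDocument() asks whether the method should pass on the exception instead of returning an error message, and whether the character set can be specified. The following is a minimal sketch of that alternative, not part of the XIncluder class above. The class name TextIncludeSketch, the helper readEscaped(), and the extra encoding parameter are all assumptions made for illustration; it also uses try-with-resources, which needs a newer Java than the listing above was written for.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.net.URL;

// Hypothetical class; a sketch of an exception-throwing variant
// of downloadTextDocument(), not the author's implementation.
public class TextIncludeSketch {

    // Reads every character from the reader, escaping < and &
    // so the result can be placed in an XML text node.
    public static String readEscaped(Reader reader) throws IOException {
        StringBuilder s = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (c == '<') s.append("&lt;");
            else if (c == '&') s.append("&amp;");
            else s.append((char) c);
        }
        return s.toString();
    }

    // Assumption: the caller supplies the encoding name.
    // A malformed URL or a failed download is reported by throwing
    // IOException (MalformedURLException is a subclass of it)
    // rather than by returning an error message string.
    public static String downloadTextDocument(String url, String encoding)
            throws IOException {
        URL source = new URL(url);
        try (InputStream in = source.openStream();
             Reader reader = new BufferedReader(
                 new InputStreamReader(in, encoding))) {
            return readEscaped(reader);
        }
    }

    public static void main(String[] args) throws IOException {
        // No network access needed to see the escaping at work
        System.out.println(readEscaped(new StringReader("a < b && c")));
        // prints: a &lt; b &amp;&amp; c
    }
}
```

With this shape the caller decides what a failed include means: it can substitute its own error text, skip the include, or abort processing, which the string-returning version does not allow.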

To Learn More

XInclude Specification: http://www.w3.org/TR/xinclude

Part VII: The Oracle Speaks, Predictions for the Future

XSLT 1.1 as successful as XSLT 1.0

XSLT 2.0

Too dependent on schemas
Loses momentum of XSLT 1.0 and 1.1
But succeeds anyway

XQuery

Too early to call
Do we really need native XML databases?

XInclude succeeds once parsers support it

DOM Level 3 succeeds

JDOM succeeds, much to the consternation of the W3C

The triumph of worse is better

Schemas, a partial success

Developers need them desperately
Far too complex to be used as broadly as they're needed; experts only
Specification is poorly written, incomplete, and riddled with known problems; recommendation by exhaustion
Will be replaced within ten years; much like Java has replaced C

Stuff we didn't talk about

XPointers
XLinks
XSL-FO
XHTML
Schema Repositories
MathML
SVG
Browser support

XLinks

Won't succeed unless and until there's a killer app
First company to define the killer app gets to fill in the holes in the spec over the protests of the W3C and the hypertext community

XPointers; the same story

Won't succeed unless and until there's a killer app
First company to define the killer app gets to fill in the holes in the spec over the protests of the W3C and the hypertext community

XSL-FO

Slow but successful adoption; steady linear growth

XHTML Fails

Too complex
Too little tool support
Too poorly documented
Offers no benefits to web page authors; the only people benefited are the tool vendors

Schema Repositories all fail

Commerce One
UDDI
BizTalk
xml.org
etc.

MathML succeeds

Mozilla will save this

SVG Takes Off in 2001

Illustrator supports it now
Several browser plug-ins are available
Many tools
We needed this 10 years ago

Browser Support

We won't see reliable browser support for XML until at least 2002
Non-PC devices will become common, necessitating a move to browser-independent layout
Mozilla 2.0 knocks off IE

Invent the Future!

The best way to predict the future is to invent it.

--Alan Kay

To Learn More

This presentation: http://www.ibiblio.org/xml/slides/sd2000east/advancedxml
JDOM Web Site: http://www.jdom.org
XML InfoSet Specification: http://www.w3.org/TR/xml-infoset
XML Base Specification: http://www.w3.org/TR/xmlbase
XInclude Specification: http://www.w3.org/TR/xinclude
W3C Schema Primer: http://www.w3.org/TR/xmlschema-0/
XML Schema Part 1: Structures: http://www.w3.org/TR/xmlschema-1/
XML Schema Part 2: Datatypes: http://www.w3.org/TR/xmlschema-2/


Surname	FirstName	Team	Position	Games Played	Games Started	AtBats	Runs	Hits	Doubles	Triples	Home runs	RBI	Stolen Bases	Caught Stealing	Sacrifice Hits	Sacrifice Flies	Errors	PB	Walks	Strike outs	Hit by pitch
Anderson	Garret	ANA	Outfield	156	151	622	62	183	41	7	15	79	8	3	3	3	6	0	29	80	1
Baughman	Justin	ANA	Second Base	62	54	196	24	50	9	1	1	20	10	4	5	3	8	0	6	36	1
Bolick	Frank	ANA	Third Base	21	11	45	3	7	2	0	1	2	0	0	0	0	0	0	11	8	0
Disarcina	Gary	ANA	Shortstop	157	155	551	73	158	39	3	3	56	12	7	12	3	14	0	21	51	8
Edmonds	Jim	ANA	Outfield	154	150	599	115	184	42	1	25	91	7	5	1	1	5	0	57	114	1
Erstad	Darin	ANA	Outfield	133	129	537	84	159	39	3	19	82	20	6	1	3	3	0	43	77	6
Garcia	Carlos	ANA	Second Base	19	10	35	4	5	1	0	0	0	2	0	1	0	1	0	3	11	1
Glaus	Troy	ANA	Third Base	48	45	165	19	36	9	0	1	23	1	0	0	2	7	0	15	51	0
Greene	Todd	ANA	Outfield	29	15	71	3	18	4	0	1	7	0	0	0	0	0	0	2	20	0
Helfand	Eric	ANA	Catcher	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
Hollins	Dave	ANA	Third Base	101	98	363	60	88	16	2	11	39	11	3	2	2	17	0	44	69	7
Jefferies	Gregg	ANA	Outfield	19	18	72	7	25	6	0	1	10	1	0	0	0	0	0	0	5	0
Johnson	Mark	ANA	First Base	10	2	14	1	1	0	0	0	0	0	0	0	0	0	0	0	6	0
Kreuter	Chad	ANA	Catcher	96	74	252	27	63	10	1	2	33	1	0	5	1	9	5	33	49	3
Martin	Norberto	ANA	Second Base	79	50	195	20	42	2	0	1	13	3	1	3	2	4	0	6	29	0
Mashore	Damon	ANA	Outfield	43	24	98	13	23	6	0	2	11	1	0	1	0	0	0	9	22	3
Molina	Ben	ANA	Catcher	2	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
Nevin	Phil	ANA	Catcher	75	65	237	27	54	8	1	8	27	0	0	0	2	5	20	17	67	5
O'Brien	Charlie	ANA	Catcher	62	58	175	13	45	9	0	4	18	0	0	3	3	4	1	10	33	2
Palmeiro	Orlando	ANA	Outfield	74	34	165	28	53	7	2	0	21	5	4	7	0	0	0	20	11	0
Pritchett	Chris	ANA	First Base	31	19	80	12	23	2	1	2	8	2	0	0	0	1	0	4	16	0
Salmon	Tim	ANA	Designated Hitter	136	130	463	84	139	28	1	26	88	0	1	0	10	2	0	90	100	3
Shipley	Craig	ANA	Third Base	77	32	147	18	38	7	1	2	17	0	4	4	1	3	0	5	22	5
Velarde	Randy	ANA	Second Base	51	50	188	29	49	13	1	4	26	7	2	0	1	4	0	34	42	1
Walbeck	Matt	ANA	Catcher	108	91	338	41	87	15	2	6	46	1	1	5	5	7	8	30	68	2
Williams	Reggie	ANA	Outfield	29	7	36	7	13	1	0	1	5	3	3	1	0	0	0	7	11	1

Core	`org.w3c.dom` ^*
HTML	`org.w3c.dom.html`
Views	`org.w3c.dom.views`
StyleSheets	`org.w3c.dom.stylesheets`
CSS	`org.w3c.dom.css`
Events	`org.w3c.dom.events` ^*
Traversal	`org.w3c.dom.traversal` ^*
Range	`org.w3c.dom.range`