The Bleeding Edge of XML

Elliotte Rusty Harold

XMLOne San Jose 2001 West

Monday, October 1, 2001

elharo@metalab.unc.edu

http://www.ibiblio.org/xml/

Outline

Part I: XML Infoset, Canonical XML, and Digital Signatures
Part II: XSLT 2.0 and Beyond
Part III: SAX 2.1
Part IV: DOM Level 3
Part V: JDOM
Part VI: The Oracle Speaks

Part I: XML Infoset

The Infoset is the unfortunate standard to which those in retreat from the radical and most useful implications of well-formedness have rallied. At its core the Infoset insists that there is 'more' to XML than the straightforward syntax of well-formedness. By imposing its canonical semantics the Infoset obviates the infinite other semantic outcomes which might be elaborated in particular unique circumstances from an instance of well-formed XML 1.0 syntax. The question we should be asking is not whether the Infoset has chosen the correct canonical semantics, but whether the syntactic possibilities of XML 1.0 should be curtailed in this way at all.

--Walter Perry on the xml-dev mailing list

A normal XML document

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/css" href="song.css"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG xmlns="http://www.ibiblio.org/xml/namespace/song"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO 
    xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
    ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <!-- The publisher is actually Polygram but I needed 
       an example of a general entity reference. -->
  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
<!-- You can tell what album I was 
     listening to when I wrote this example -->

A canonical XML document

<?xml-stylesheet type="text/css" href="song.css"?>
<SONG xmlns="http://www.ibiblio.org/xml/namespace/song" xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" HEIGHT="200" WIDTH="100" xlink:href="hotcop.jpg" xlink:show="onLoad" xlink:type="simple"></PHOTO>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  
  <PUBLISHER xlink:href="http://www.amrecords.com/" xlink:type="simple">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>

An org.w3c.dom.Document object formed by reading hotcop.xml

import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMHotCop {

  public static void main(String[] args) {

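    // Xerces-specific DOM parser; it builds the whole tree in memory
    DOMParser parser = new DOMParser();

    try {
      // Read and parse the document from the network
      parser.parse("http://www.ibiblio.org/xml/examples/hot_cop.xml");
      // The Document object is the object form of the XML document
      Document d = parser.getDocument();
    }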
    catch (SAXException e) {
      System.err.println(e);
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

Are these three the same thing or not?

The customary form of an XML document
The canonical form of an XML document
The object form of an XML document

What is the XML Infoset?

A W3C proposed recommendation providing "a consistent set of definitions for use in other specifications that need to refer to the information in a well-formed XML document." This is considerably weaker than originally planned.
What it used to be: A W3C standard for what is and is not significant in an XML document
Not everyone agrees that this is a good thing, or that this is the right list!

The Infoset defines 11 kinds of Information Items

The Document Information Item
Element Information Items
Attribute Information Items
Processing Instruction Information Items
Unparsed Entity Information Items
Unexpanded Entity Information Items
Character Information Items
Comment Information Items
The Document Type Declaration Information Item
Notation Information Items
Namespace Information Items

The Document Information Item

Represents the entire document; not just the root element
Properties:
- Children
  - One Element Information Item for the root element
  - One Comment Information Item for each Comment
  - One Processing Instruction Information Item for each Processing Instruction
- Document Element
- Character Encoding Scheme
- Notation Declarations
- Entity Declarations
- Base URI
- Standalone Declaration
- Version Declaration
- All declarations processed*

Elements

<PHOTO 
  xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
  ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  
<COMPOSER>Jacques Morali</COMPOSER>

<COMPOSER>
  <PERSON>
    <NAME>
      <FIRST>Henri</FIRST>
      <LAST>Belolo</LAST>
    </NAME>
  </PERSON>
</COMPOSER>

<rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#">
  <rdf:Description xmlns:dc="http://purl.org/dc/"
     about="http://www.ibiblio.org/examples/impressionists.xml">
    <dc:title> Impressionist Paintings </dc:title>
    <dc:creator> Elliotte Rusty Harold </dc:creator>
    <dc:description> 
      A list of famous impressionist paintings organized 
      by painter and date 
    </dc:description>
    <dc:date>2000-08-22</dc:date>
  </rdf:Description>
</rdf:RDF>

Element Information Items

An Element Information Item Includes:

namespace name; i.e. the absolute URI for the element's namespace
local name
prefix
children: a list of element, processing instruction, reference to skipped entity, character, and comment information items, one for each element, processing instruction, reference to an unprocessed external entity, data character, and comment appearing immediately within the current element
attributes: an unordered set of attribute information items, one for each of the attributes (specified or defaulted from the DTD) of this element. Namespace declaration (xmlns) attributes are not included.
namespace attributes: an unordered set of attribute information items, one for each of the namespaces declared either in the start-tag of this element or defaulted from the DTD.
in-scope namespaces: An unordered set of namespace information items, one for each of the namespaces in effect for this element
base URI: The absolute URI of the external entity in which this element appears, as defined in XML Base. If this is not known, this property is null.
parent
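
A minimal sketch of how a few of these properties surface in the DOM, assuming the JAXP parser bundled with the JDK. The class name ElementProperties and the method describeRoot are hypothetical names used only for illustration. Note one difference from the Infoset: the DOM keeps namespace declarations in the same attribute map as ordinary attributes, so the count below includes xmlns:rdf.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class ElementProperties {

  // Parses a document and reports a few Infoset-like properties of its root element.
  static String describeRoot(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      // Without namespace awareness the DOM cannot report namespace name, local name, or prefix
      factory.setNamespaceAware(true);
      Element root = factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)))
          .getDocumentElement();
      return "namespace name=" + root.getNamespaceURI()
          + ", local name=" + root.getLocalName()
          + ", prefix=" + root.getPrefix()
          + ", attributes=" + root.getAttributes().getLength();
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse document", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(describeRoot(
        "<rdf:RDF xmlns:rdf=\"http://www.w3.org/TR/REC-rdf-syntax#\"/>"));
  }
}
```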

Attributes

xlink:type="simple"
xlink:href="http://www.amrecords.com/"
xlink:type =  "simple"
xlink:show = "onLoad"
xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit"
WIDTH=" 100 "
HEIGHT=' 200 '

An Attribute Information Item Includes:

namespace name
local name
prefix
normalized value
specified: A flag indicating whether this attribute was actually specified in the start-tag of its element, or was defaulted from the DTD
attribute type:
- ID
- IDREF
- IDREFS
- ENTITY
- ENTITIES
- NMTOKEN
- NMTOKENS
- NOTATION
- CDATA
- ENUMERATED
owner element
references: only for IDREF, IDREFS, ENTITY, ENTITIES, and NOTATION type attributes; an ordered list of the things this attribute points to
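
The specified flag can be observed through the DOM. This is a sketch only, assuming the JDK's bundled parser reads the internal DTD subset (its default behavior); the class name SpecifiedFlag and its methods are hypothetical. The ATTLIST declaration is borrowed from the song DTD used elsewhere in this talk.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class SpecifiedFlag {

  // Parses the document and returns its root element.
  static Element parseRoot(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)))
          .getDocumentElement();
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse document", e);
    }
  }

  // True if the attribute was written in the start-tag, false if it was defaulted from the DTD.
  static boolean isSpecified(String xml, String attributeName) {
    Attr attr = parseRoot(xml).getAttributeNode(attributeName);
    if (attr == null) {
      throw new IllegalArgumentException("No such attribute: " + attributeName);
    }
    return attr.getSpecified();
  }

  public static void main(String[] args) {
    String xml = "<!DOCTYPE PHOTO [<!ATTLIST PHOTO xlink:type CDATA #FIXED \"simple\">]>"
        + "<PHOTO ALT=\"Victor Willis in Cop Outfit\"/>";
    System.out.println("ALT specified: " + isSpecified(xml, "ALT"));
    System.out.println("xlink:type specified: " + isSpecified(xml, "xlink:type"));
  }
}
```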

Comments

  <!-- The publisher is actually Polygram but I needed 
       an example of a general entity reference. -->
<!--  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG> -->
<!-- You can tell what album I was 
     listening to when I wrote this example -->

A Comment Information Item includes:

content
parent

Processing Instructions

<?robots index="yes" follow="no"?>
<?php 
  mysql_connect("database.unc.edu", "clerk", "password"); 
  $result = mysql("CYNW", "SELECT LastName, FirstName FROM Employees 
    ORDER BY LastName, FirstName"); 
  $i = 0;
  while ($i < mysql_numrows ($result)) {
     $fields = mysql_fetch_row($result);
     echo "<person>$fields[1] $fields[0] </person>\r\n";
     $i++;
  }
  mysql_close();
?>

A Processing Instruction Information Item includes:

target
content
notation
base URI
parent

Characters

A character is one Unicode character in the content of an element, attribute value, comment or processing instruction data.
A Character Information Item includes:

character code
The Unicode value in the range 0 to #x10FFFF of the character

element content whitespace
A flag indicating whether the character is whitespace appearing within element content

parent
Note that Unicode is not a two-byte character set
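
The last point is easy to trip over in Java, where a char is a 16-bit UTF-16 code unit rather than a Unicode character. A small sketch, with the hypothetical class name CodePoints, that counts characters the way the Infoset does (one per code point):

```java
public class CodePoints {

  // Counts Unicode characters (code points), not 16-bit Java chars.
  static int countCharacters(String text) {
    return text.codePointCount(0, text.length());
  }

  public static void main(String[] args) {
    // U+1D11E MUSICAL SYMBOL G CLEF lies outside the 16-bit range,
    // so Java stores it as a surrogate pair of two chars.
    String clef = "\uD834\uDD1E";
    System.out.println("Java chars: " + clef.length());
    System.out.println("Unicode characters: " + countCharacters(clef));
    System.out.println("Character code: #x"
        + Integer.toHexString(clef.codePointAt(0)).toUpperCase());
  }
}
```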

Namespaces

An element has one namespace information item for each namespace in scope on the element. This is not the same as the namespaces declared on the element.
A Namespace Information Item includes:
- prefix
- namespace name
There is no obvious representation of namespace information items in the syntax of an XML document.

These are namespace declaration attributes, not namespace information items:

xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#"
xmlns:dc="http://purl.org/dc/"
xmlns="http://www.w3.org/2000/svg"

Consider this document:

<svg:svg width="5cm" height="4cm"
 xmlns:svg="http://www.w3.org/2000/svg">
  <svg:desc>Two rectangles</svg:desc>
  <svg:rect x="1.5cm" y="3.5cm" width="12cm" height="9.9cm"/>
  <svg:rect x="2.5cm" y="2.8cm" width="3cm" height="17cm"/>
</svg:svg>

Each of the four elements has a namespace information item with the prefix svg and the namespace name http://www.w3.org/2000/svg
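
One way to check this claim from code is DOM Level 3's lookupNamespaceURI(), which resolves a prefix against the namespaces in scope on a node. This is a sketch that assumes the JDK's bundled parser; the class name InScopeNamespaces and its method are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InScopeNamespaces {

  // Counts the elements on which the prefix is bound to the given namespace name.
  static int countElementsInScope(String xml, String prefix, String namespaceName) {
    Document doc;
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse document", e);
    }
    NodeList elements = doc.getElementsByTagName("*"); // every element, root included
    int count = 0;
    for (int i = 0; i < elements.getLength(); i++) {
      if (namespaceName.equals(elements.item(i).lookupNamespaceURI(prefix))) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    String svg = "<svg:svg width=\"5cm\" height=\"4cm\""
        + " xmlns:svg=\"http://www.w3.org/2000/svg\">"
        + "<svg:desc>Two rectangles</svg:desc>"
        + "<svg:rect x=\"1.5cm\" y=\"3.5cm\" width=\"12cm\" height=\"9.9cm\"/>"
        + "<svg:rect x=\"2.5cm\" y=\"2.8cm\" width=\"3cm\" height=\"17cm\"/>"
        + "</svg:svg>";
    System.out.println(countElementsInScope(svg, "svg", "http://www.w3.org/2000/svg"));
  }
}
```

Only the root element carries the declaration, yet the binding is in scope on all of its descendants as well.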

Document Type Declaration

<!DOCTYPE SONG SYSTEM "song.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                       "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

A Document Type Declaration Information Item includes:

SYSTEM ID
PUBLIC ID
children: only the comment and processing instruction information items in the internal DTD subset and external DTD subsets.
parent

Document Type Definition

<!ELEMENT SONG (TITLE, PHOTO?, COMPOSER+, PRODUCER*, 
 PUBLISHER*, LENGTH?, YEAR?, ARTIST+)>
<!ATTLIST SONG xmlns       CDATA #REQUIRED
               xmlns:xlink CDATA #REQUIRED>
<!ELEMENT TITLE (#PCDATA)>

<!ELEMENT PHOTO EMPTY>
<!ATTLIST PHOTO xlink:type CDATA #FIXED "simple"
                xlink:href CDATA #REQUIRED
                xlink:show CDATA #IMPLIED
                ALT        CDATA #REQUIRED
                WIDTH      CDATA #REQUIRED
                HEIGHT     CDATA #REQUIRED
>

<!ELEMENT COMPOSER (#PCDATA)>
<!ELEMENT PRODUCER (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ATTLIST PUBLISHER xlink:type CDATA #IMPLIED
                    xlink:href CDATA #IMPLIED
>

<!ELEMENT LENGTH (#PCDATA)>
<!-- This should be a four digit year like "1999",
     not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>

<!ELEMENT ARTIST (#PCDATA)>

There is no information item for this.
Comments and processing instructions in the DTD are reported as children of the Document Type Declaration information item
Notation and general entity declarations are reported as properties of the Document information item
Attribute types and default values are reported on the actual attributes in the document instance.
Everything else is not reported!

Entities

An XML document is made up of one or more physical storage units called entities
Entity references:
- Parsed internal general entity references like &amp;
- Parsed external general entity references
- Unparsed external general entity references
- External parameter entity references
- Internal parameter entity references
Reading an XML document is not the same thing as reading an XML file

The XML file contains entity references.
The XML document contains the entities' replacement text.
When you use a parser to read a document you'll get the text including characters like <. You will not see the entity references.
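
A short sketch of the difference between the file and the document, assuming the JDK's bundled DOM parser; the class name ReplacementText and the method textOf are hypothetical. The file contains the entity reference, but the parser hands back only the replacement text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class ReplacementText {

  // Returns the text content of the root element as the parser reports it.
  static String textOf(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)))
          .getDocumentElement()
          .getTextContent();
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse document", e);
    }
  }

  public static void main(String[] args) {
    // The "file" holds the five characters of the entity reference;
    // the parsed document holds the single ampersand they stand for.
    System.out.println(textOf("<PUBLISHER>A &amp; M Records</PUBLISHER>"));
  }
}
```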

Entity Information Items

Two kinds of entity information items:
- Unparsed Entity Information Item
- Unexpanded Entity Information Items
Other entities are not reported

Unparsed Entity Information Items

name
system identifier
public identifier
Notation

Unexpanded Entity Information Items

name
entity
parent

The Infoset Omits:

The internal and external DTD subsets; especially ELEMENT and ATTLIST declarations
Document encoding
CDATA sections
Character references
Expanded, parsed entity references
Whether an empty element uses two tags or one
What kind of quotes surround attributes
Insignificant white space in attributes
White space that occurs between attributes
Attribute order
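
Several of these omissions can be seen in the DOM too. The sketch below assumes the JDK's bundled parser and uses DOM Level 3's isEqualNode(); the class name InsignificantSyntax and its methods are hypothetical. DOM equality is not the Infoset, so this is an illustration rather than an Infoset comparison.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class InsignificantSyntax {

  // Parses the document and returns its root element.
  static Element parseRoot(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)))
          .getDocumentElement();
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse document", e);
    }
  }

  // True if the two documents' root elements are equal as DOM trees.
  static boolean sameElement(String xml1, String xml2) {
    return parseRoot(xml1).isEqualNode(parseRoot(xml2));
  }

  public static void main(String[] args) {
    // Differences: quote style, attribute order, white space between
    // attributes, and one tag versus two for the empty element.
    String one = "<PHOTO WIDTH=\"100\" HEIGHT='200'/>";
    String two = "<PHOTO   HEIGHT=\"200\"  WIDTH=\"100\"></PHOTO>";
    System.out.println(sameElement(one, two));
    // A character reference versus the literal character
    System.out.println(sameElement("<TITLE>&#72;ot Cop</TITLE>", "<TITLE>Hot Cop</TITLE>"));
  }
}
```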

Canonical XML

A W3C proposed standard serialization format of an XML document instance
Not everyone agrees that this is a good thing, or that this is the right format! It's totally unsuitable for editors and validation.
Based on the XPath data model
Not really Infoset compatible
Something of this nature is nonetheless clearly needed for non-XML aware tools like digital signatures, change management, hash functions, and the like.

How are documents canonicalized?

The document is encoded in UTF-8
Line breaks are normalized to a linefeed (ASCII 10, \n)
Attribute values are normalized, as if by a validating processor
Character and parsed entity references are replaced
CDATA sections are replaced with their character content
The XML and document type declarations are removed
Empty elements are converted to start tag-end tag pairs
White space outside of the document element and within start and end tags is normalized
All white space in character content is retained (except for characters removed during linefeed normalization)
Attribute value delimiters are set to double quotes
Special characters in attribute values and character content are replaced by character references
Superfluous namespace declarations are removed from each element
Default attributes are added to each element
Lexicographic order is imposed on the namespace declarations and attributes of each element
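
A deliberately partial sketch of three of these steps (double-quoted attribute values, start tag-end tag pairs for empty elements, and the ordering of namespace declarations and attributes), assuming the JDK's bundled parser. It handles only a childless root element and compares strings by Java's UTF-16 ordering, so it is not a conforming canonicalizer. The class name EmptyElementSketch and its methods are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

public class EmptyElementSketch {

  // Escapes the characters that Canonical XML escapes in attribute values.
  static String escape(String value) {
    return value.replace("&", "&amp;").replace("<", "&lt;")
        .replace("\"", "&quot;").replace("\t", "&#x9;")
        .replace("\n", "&#xA;").replace("\r", "&#xD;");
  }

  static boolean isNamespaceDeclaration(Attr attr) {
    return attr.getName().equals("xmlns") || attr.getName().startsWith("xmlns:");
  }

  // Sort key: the namespace name, with "no namespace" sorting first.
  static String namespaceOf(Attr attr) {
    return attr.getNamespaceURI() == null ? "" : attr.getNamespaceURI();
  }

  // Serializes a childless root element as a canonical start tag-end tag pair.
  static String canonicalizeEmptyRoot(String xml) {
    Element root;
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      root = factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse document", e);
    }
    List<Attr> declarations = new ArrayList<>();
    List<Attr> attributes = new ArrayList<>();
    NamedNodeMap map = root.getAttributes();
    for (int i = 0; i < map.getLength(); i++) {
      Attr attr = (Attr) map.item(i);
      (isNamespaceDeclaration(attr) ? declarations : attributes).add(attr);
    }
    // Namespace declarations come first: default namespace, then by prefix
    declarations.sort((a, b) -> a.getName().compareTo(b.getName()));
    // Then attributes, ordered by namespace name and then by local name
    attributes.sort((a, b) -> {
      int byNamespace = namespaceOf(a).compareTo(namespaceOf(b));
      return byNamespace != 0
          ? byNamespace : a.getLocalName().compareTo(b.getLocalName());
    });
    declarations.addAll(attributes);
    StringBuilder out = new StringBuilder("<").append(root.getTagName());
    for (Attr attr : declarations) {
      out.append(' ').append(attr.getName())
         .append("=\"").append(escape(attr.getValue())).append('"');
    }
    return out.append("></").append(root.getTagName()).append('>').toString();
  }

  public static void main(String[] args) {
    System.out.println(canonicalizeEmptyRoot(
        "<PHOTO xmlns:xlink='http://www.w3.org/1999/xlink' xlink:type='simple'"
        + " xlink:href='hotcop.jpg' WIDTH='100' HEIGHT='200' ALT='Cop'/>"));
  }
}
```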

Canonicalization software

XML Canonicalizer from IBM's XML Security Suite: http://www.alphaworks.ibm.com/tech/xmlsecuritysuite
C14nDOM reads an XML document from stdin and writes the canonicalized output to stdout:
% java C14nDOM -xpath < hotcop.xml > canonicalized_hotcop.xml
-xpath option necessary to support October 26, 2000 working draft and later versions.

Digital Signatures

W3C/IETF Joint Proposed Recommendation, August 20, 2001
XML Signatures provide:

Integrity
Message authentication
Signer authentication

For data of any type

Not Just for Signing XML

Signed data can be located within the XML that includes the signature or elsewhere.
An enveloped signature is enclosed inside the XML element it signs
An enveloping signature signs XML data it contains.
A detached signature signs data external to the Signature element, possibly in another document entirely.

Signature Process

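The signature processor digests a data object.
The processor places the digest value in a Reference element inside the Signature element's SignedInfo child.
The processor canonicalizes and digests the SignedInfo element.
The processor cryptographically signs that digest and stores the result in the SignatureValue element.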
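
The first step, digesting a data object, can be sketched with nothing but the JDK. SHA-1 and Base64 are used here because that is what the sample signature later in this part declares (xmldsig#sha1); the class name DigestSketch and the method digestValue are hypothetical, and this is not the code of any particular signature toolkit.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class DigestSketch {

  // Returns the Base64-encoded SHA-1 digest of the data,
  // the form in which a digest appears inside a DigestValue element.
  static String digestValue(byte[] data) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-1").digest(data);
      return Base64.getEncoder().encodeToString(digest);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-1 is not available", e);
    }
  }

  public static void main(String[] args) {
    byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
    System.out.println("<DigestValue>" + digestValue(data) + "</DigestValue>");
  }
}
```

Changing a single byte of the data changes the digest, which is how a verifier detects tampering with the signed content.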

XML Digital Signature software

SampleSign2 and VerifyGUI from IBM's XML Security Suite: http://www.alphaworks.ibm.com/tech/xmlsecuritysuite
First use the JDK's keytool to generate a key:
% keytool -genkey -dname "CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, S=New York, C=US" -alias elharo -storepass mypassword -keypass mykeypassword
SampleSign2 reads an XML document from stdin and writes the signature to stdout:
C:\> java SampleSign2 elharo mypassword mykeypassword -ext http://www.ibiblio.org/xml/slides/hoffman/fundamentals/examples/hotcop.xml > hotcop_signature.xml
Key store: C:\Documents and Settings\Administrator\.keystore
Sign: 7030ms
VerifyGUI reads the signature from stdin and warns of changes to signed content.
C:\>java VerifyGUI < hotcop_signature.xml
The signature has a KeyValue element.
The signature has one or more X509Data elements.
Checks an X509Data:
It has 1 certificate(s).
Certificate Information:
Version: 1
Validity: OK
SubjectDN: CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US
IssuerDN: CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US
Serial#: 983556890
Time to verify: 951 [msec]

A Detached Signature for hotcop.xml

<?xml version='1.0' encoding='UTF-8'?>
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
  <SignedInfo>
    <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2000/WD-xml-c14n-20000119"/>
    <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"/>
    <Reference URI="http://www.ibiblio.org/xml/slides/hoffman/fundamentals/examples/hotcop.xml">
      <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
      <DigestValue>nvfYilfgN/rICyzhGmjidKCFoC8=</DigestValue>
    </Reference>
  </SignedInfo>
  <SignatureValue>
    hfowa4qdbuMkoZfX1/VXd4UBpIpZMM5+6CElmY7jOIKFqvXq5A5VKw==
  </SignatureValue>
  <KeyInfo>
    <KeyValue>
      <DSAKeyValue>
        <P>
          /X9TgR11EilS30qcLuzk5/YRt1I870QAwx4/gLZRJmlFXUAiUftZPY1Y+r/F9bow9s
          ubVWzXgTuAHTRv8mZgt2uZUKWkn5/oBHsQIsJPu6nX/rfGG/g7V+fGqKYVDwT7g/bT
          xR7DAjVUE1oWkTL2dfOuK2HXKu/yIgMZndFIAcc=
        </P>
        <Q>l2BQjxUjC8yykrmCouuEC/BYHPU=</Q>
        <G>
          9+GghdabPd7LvKtcNrhXuXmUr7v6OuqC+VdMCz0HgmdRWVeOutRZT+ZxBxCBgLRJFn
          Ej6EwoFhO3zwkyjMim4TwWeotUfI0o4KOuHiuzpnWRbqN/C/ohNWLx+2J6ASQ7zKTx
          vqhRkImog9/hWuWfBpKLZl6Ae1UlZAFMO/7PSSo=
        </G>
        <Y>
          6jKpNnmkkWeArsn5Oeeg2njcz+nXdk0f9kZI892ddlR8Lg1aMhPeFTYuoq3I6neFlb
          BjWzuktNZKiXYBfKsSTB8U09dTiJo2ir3HJuY7eW/p89osKMfixPQsp9vQMgzph6Qa
          lY7j4MB7y5ROJYsTr1/fFwmj/yhkHwpbpzed1LE=
        </Y>
      </DSAKeyValue>
    </KeyValue>
    <X509Data>
      <X509IssuerSerial>
        <X509IssuerName>CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US</X509IssuerName>
        <X509SerialNumber>983556890</X509SerialNumber></X509IssuerSerial>
      <X509SubjectName>CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US</X509SubjectName>
      <X509Certificate>
MIIDLzCCAu0CBDqf4xowCwYHKoZIzjgEAwUAMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcg
WW9yazERMA8GA1UEBxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlN
ZXRyb3RlY2gxHjAcBgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDAeFw0wMTAzMDIxODE0NTBa
Fw0wMTA1MzExODE0NTBaMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcgWW9yazERMA8GA1UE
BxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlNZXRyb3RlY2gxHjAc
BgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDCCAbgwggEsBgcqhkjOOAQBMIIBHwKBgQD9f1OB
HXUSKVLfSpwu7OTn9hG3UjzvRADDHj+AtlEmaUVdQCJR+1k9jVj6v8X1ujD2y5tVbNeBO4AdNG/y
ZmC3a5lQpaSfn+gEexAiwk+7qdf+t8Yb+DtX58aophUPBPuD9tPFHsMCNVQTWhaRMvZ1864rYdcq
7/IiAxmd0UgBxwIVAJdgUI8VIwvMspK5gqLrhAvwWBz1AoGBAPfhoIXWmz3ey7yrXDa4V7l5lK+7
+jrqgvlXTAs9B4JnUVlXjrrUWU/mcQcQgYC0SRZxI+hMKBYTt88JMozIpuE8FnqLVHyNKOCjrh4r
s6Z1kW6jfwv6ITVi8ftiegEkO8yk8b6oUZCJqIPf4VrlnwaSi2ZegHtVJWQBTDv+z0kqA4GFAAKB
gQDqMqk2eaSRZ4Cuyfk556DaeNzP6dd2TR/2Rkjz3Z12VHwuDVoyE94VNi6ircjqd4WVsGNbO6S0
1kqJdgF8qxJMHxTT11OImjaKvccm5jt5b+nz2iwox+LE9Cyn29AyDOmHpBqVjuPgwHvLlE4lixOv
X98XCaP/KGQfClunN53UsTALBgcqhkjOOAQDBQADLwAwLAIUODqxsFzS96BjrVA4LVo5FzuWBRMC
FC0xfXxbaJaCJuVqtcBv4bqwV0EX
      </X509Certificate>
    </X509Data>
  </KeyInfo>
</Signature>

To Learn More

XML Infoset Specification: http://www.w3.org/TR/xml-infoset
Canonical XML Specification: http://www.w3.org/TR/xml-c14n
XML Signature Specification: http://www.w3.org/TR/xmldsig-core/

Part II: XSLT 2.0 and Beyond

In SQL, the query language is not expressed in tables and rows. In XQuery, the query language is not expressed in XML. Why is this a problem?

--Jonathan Robie on the xml-dev mailing list

XPath 2.0

Used for XSLT 2.0 and XQuery
Schema Aware

XPath 2.0 Goals

Simplify manipulation of XML Schema-typed content
Simplify manipulation of string content
Support related XML standards
Improve ease of use
Improve interoperability
Improve internationalization (i18n) support
Maintain backward compatibility
Enable improved processor efficiency

XPath 2.0 Requirements

Must express data model in terms of the Infoset
Must provide common core syntax and semantics for XSLT 2.0 and XML Query 1.0
Must support explicit "for any" or "for all" comparison and equality semantics
Must add min() and max() functions
Any valid XPath 1.0 expression SHOULD also be a valid XPath 2.0 expression when operating in the absence of XML Schema type information.
Should provide intersection and difference functions
Must loosen restrictions on location steps
Must provide a conditional expression (e.g. ternary ?: operator in Java and C)
Should support additional string functions, possibly including space padding, string replacement and conversion to upper or lower case
Must support regular expression string matching using the regexp syntax from schemas
Must add support for XML Schema primitive datatypes
Should add support for XML Schema structures

XSLT 2.0

Uses XPath 2.0
Schema Aware

XSLT 2.0 Goals

Simplify manipulation of XML Schema-typed content
Simplify manipulation of string content
Support related XML standards
Improve ease of use
Improve interoperability
Improve i18n support
Maintain backward compatibility
Enable improved processor efficiency

XSLT 2.0 Non-goals

Simplifying the ability to parse unstructured information to produce structured results.
Turning XSLT into a general-purpose programming language

XSLT 2.0 Requirements

Must maintain backwards compatibility with XSLT 1.1
Should be able to match elements and attributes whose value is explicitly null.
Should allow included documents to encapsulate local stylesheets
Could support accessing infoset items for XML declaration
Could provide qualified name aware string functions
Could enable constructing a namespace with computed name
Could simplify resolving prefix conflicts in qname-valued attributes
Could support XHTML output method
Must allow matching on default namespace without explicit prefix
Must add date formatting functions
Must simplify accessing IDs and keys in other documents
Should provide function to absolutize relative URIs
Should include unparsed text from an external resource
Should allow authoring extension functions in XSLT
Should output character entity references instead of numeric character entities
Should construct entity reference by name
Should support Unicode string normalization
Should standardize extension element language bindings
Could improve efficiency of transformations on large documents
Could support reverse IDREF attributes
Could support case-insensitive comparisons
Could support lexicographic string comparisons
Could allow comparing nodes based on document order
Could improve support for unparsed entities
Could allow processing a node with the "next best matching" template
Could make coercions symmetric by allowing scalar to nodeset conversion
Must support XML schema
Must simplify constructing and copying typed content
Must support sorting nodes based on XML schema type
Could support scientific notation in number formatting
Could provide ability to detect whether "rich" schema information is available
Must simplify grouping

Some specific improvements that are likely

Multiple output documents
Variables can be set to node sets; no more result tree fragments.
Extension functions defined in style sheets with Java and ECMAScript
Existing elements and functions hardly change at all
And just maybe standard Java and JavaScript bindings for extension functions

Identifying 2.0 compliant stylesheets

Namespace is still http://www.w3.org/1999/XSL/Transform
version attribute of xsl:stylesheet has value 2.0

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Top level elements -->

</xsl:stylesheet>
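
Because the namespace does not change, a tool that needs to tell the versions apart has to look at the version attribute. A rough sketch of such a check, assuming the JDK's bundled DOM parser; the class name StylesheetVersion and the method versionOf are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class StylesheetVersion {

  static final String XSLT_NAMESPACE = "http://www.w3.org/1999/XSL/Transform";

  // Returns the version attribute of the root xsl:stylesheet (or its synonym
  // xsl:transform), or null if the root element is not an XSLT stylesheet element.
  static String versionOf(String stylesheet) {
    Element root;
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      root = factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(stylesheet)))
          .getDocumentElement();
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse stylesheet", e);
    }
    boolean isStylesheet = XSLT_NAMESPACE.equals(root.getNamespaceURI())
        && ("stylesheet".equals(root.getLocalName())
            || "transform".equals(root.getLocalName()));
    return isStylesheet ? root.getAttribute("version") : null;
  }

  public static void main(String[] args) {
    System.out.println(versionOf(
        "<xsl:stylesheet version=\"2.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"/>"));
  }
}
```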

No result tree fragments

The result tree fragment data-type has been eliminated.
Variable-binding elements with content now construct node-sets
These node sets can now be operated on by templates
Functionality previously available with saxon:nodeSet() and similar extension functions

Multiple Output Documents

Allows you to generate multiple documents from one source document
Previously available with extension elements like xt:document and saxon:output

Syntax modeled on xsl:output

<xsl:document
    href = { uri-reference }
    method = { "xml" | "html" | "text" | qname-but-not-ncname }
    version = { nmtoken }
    encoding = { string }
    omit-xml-declaration = { "yes" | "no" }
    standalone = { "yes" | "no" }
    doctype-public = { string }
    doctype-system = { string }
    cdata-section-elements = { qnames }
    indent = { "yes" | "no" }
    media-type = { string }
    <!-- Content: template -->
</xsl:document>

xsl:document Example

Partially supported by Saxon 6.2 and later

     <xsl:document method="html" encoding="ISO-8859-1" href="index.html">
       <html>
         <head>
           <title><xsl:value-of select="title"/></title>         
         </head>
         <body> 
           <h1 align="center"><xsl:value-of select="title"/></h1> 
           <ul>
             <xsl:for-each select="slide">
               <li><a href="{format-number(position(),'00')}.html"><xsl:value-of select="title"/></a></li>
             </xsl:for-each>    
           </ul>           
           
           <p><a href="{translate(title,' ', '_')}.html">Entire Presentation as Single File</a></p>
              
           <hr/>
           <div align="center">
             <A HREF="01.html">Start</A> | <A HREF="/xml/">Cafe con Leche</A>
           </div>
           <hr/>
           <font size="-1">
              Copyright 2001 
              <a href="http://www.macfaq.com/personal.html">Elliotte Rusty Harold</a><br/>       
              <a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a><br/>
              Last Modified <xsl:apply-templates select="last_modified" mode="lm"/>
           </font>
         </body>     
       </html>     
     </xsl:document>

xsl:script Top-level Element

Defines an extension function, possibly inline

Syntax:

<xsl:script
  implements-prefix = ncname
  language = "ecmascript" | "javascript" | "java" | qname-but-not-ncname
  src = uri-reference
  archive = uri-references>
  <!-- Content: #PCDATA -->
</xsl:script>

Partially supported by Saxon 6.2 for Java only

xsl:script with Java

<?xml version="1.0"?>
<xsl:stylesheet version="1.1"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:date="http://www.cafeconleche.org/ns/"
>

  <xsl:template match="/">
    <xsl:value-of select="date:new()"/>
  </xsl:template>

  <xsl:script
    implements-prefix="date"
    language="java"
    src="java:java.util.Date"
  />

</xsl:stylesheet>

xsl:script with JavaScript

<?xml version="1.0"?>
<xsl:stylesheet version="1.1"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:date="http://www.cafeconleche.org/ns/date"
>

  <xsl:template match="/">
    <xsl:value-of select="date:clock()"/>
  </xsl:template>

  <xsl:script
    implements-prefix="date"
    language="javascript">
    
    function clock() {
      var time = new Date();
      var hours = time.getHours();
      var min = time.getMinutes();
      var sec = time.getSeconds();
      var status = "AM";
      if (hours > 11) {
        status = "PM";
      }
      if (hours > 12) {
        hours -= 12;
      }
      if (hours == 0) {
        hours = 12;
      }
      if (min &lt; 10) {
        min = "0" + min;
      }
      if (sec &lt; 10) {
        sec = "0" + sec;
      }
      return hours + ":" + min + ":" + sec + " " + status;
   }
   
  </xsl:script>  

</xsl:stylesheet>

XQuery

Three parts:

A data model for XML documents based on the XML Infoset
A mathematically precise query algebra; that is, a set of query operators on that data model
A query language based on these query operators and this algebra

XQuery Language

A fourth generation declarative language like SQL; not a procedural language like Java or a functional language like XSLT
Queries operate on single documents or fixed collections of documents.
Queries select whole documents or subtrees of documents that match conditions defined on document content and structure
Can construct new documents based on what is selected
No updates or inserts!

Documents to Query

Narrative documents and collections of such documents; e.g. generate a table of contents for a book
Data-oriented documents; e.g. SQL-like queries of an XML dump of a database
Filtering streams to process logs of email messages, network packets, stock market data, newswire feeds, EDI, or weather data to filter and route messages represented in XML, to extract data from XML streams, or to transform data in XML streams.
XML views of non-XML data

Physical Representations to Query

Files on a disk
Native-XML databases like Software AG's Tamino
DOM trees in memory
Streaming data
Other representations of the infoset

Where is XQuery used?

Direct query tools at command line
GUI query tools
JSP, ASP, PHP, and other such server side technologies
Programs written in Java, C++, and other languages that need to extract data from XML documents
Others are possible
Anywhere SQL is used to extract data from a database, XQuery is used to extract data from an XML document.
SQL is a non-compiled language that must be processed by some other tool to extract data from a database. So is XQuery.

The XML Model vs. the Relational Model

A relational database contains tables	An XML database contains collections
A relational table contains records with the same schema	A collection contains XML documents with the same DTD
A relational record is an unordered list of named values	An XML document is a tree of nodes
A SQL query returns an unordered set of records	An XQuery returns an ordered node set

Query Data Types

XML 1.0 #PCDATA
Schema primitive types: positiveInteger, string, float, double, unsignedLong, gYear, date, time, boolean, etc.
Schema complex types
Collections of these types
References to these types

An example document to query

Most of the examples in this talk query this bibliography document at the (relative) URL bib.xml:

<bib>
<book year="1994">
<title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
<price> 65.95</price>
</book>

<book year="1992">
<title>Advanced Programming in the Unix Environment</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
<price>65.95</price>
</book>

<book year="2000">
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
<publisher>Morgan Kaufmann Publishers</publisher>
<price> 39.95</price>
</book>

<book year="1999">
<title>The Economics of Technology and Content for Digital TV</title>
<editor>
<last>Gerbarg</last><first>Darcy</first>
<affiliation>CITI</affiliation>
</editor>
<publisher>Kluwer Academic Publishers</publisher>
<price>129.95</price>
</book>

</bib>

Adapted from Mary Fernandez, Jerome Simeon, and Phil Wadler: XML Query Languages: Experiences and Exemplars, 1999, as adapted in XML Query Use Cases

The XQuery FLWR

FOR: each node selected by an XPath 2.0 location path
LET: a new variable have a specified value
WHERE: a condition expressed in XPath is true
RETURN: this node set

Query: List titles of all books

   FOR $t IN document("bib.xml")/bib/book/title
   RETURN
      $t

Adapted from XML Query Use Cases

Query Result: Book Titles

  <title>TCP/IP Illustrated</title>
  <title>Advanced Programming in the Unix Environment</title>
  <title>Data on the Web</title>
  <title>The Economics of Technology and Content for Digital TV</title>

Adapted from XML Query Use Cases

XQueryX

An XML Syntax for XQuery
Intended for machine processing and programmer convenience, not for human legibility

In XQuery:

   FOR $t IN document("bib.xml")/bib/book/title
   RETURN
      $t

In XQueryX:

<?xml version="1.0"?>
<xq:query xmlns:xq="http://www.w3.org/2001/06/xqueryx">
  <xq:flwr>
    <xq:forAssignment variable="$t">
      <xq:step axis="CHILD">
        <xq:function name="document">
          <xq:constant datatype="CHARSTRING">bib.xml</xq:constant>
        </xq:function>
        <xq:identifier>bib</xq:identifier>
      </xq:step>
      <xq:step axis="CHILD">
        <xq:identifier>book</xq:identifier>
      </xq:step>
      <xq:step axis="CHILD">
        <xq:identifier>title</xq:identifier>
      </xq:step>
    </xq:forAssignment>
    <xq:return>
      <xq:variable>$t</xq:variable>
    </xq:return>
  </xq:flwr>
</xq:query>

Element Constructors

Tags are given as literals
XQuery expression which is evaluated to become the contents of the element is enclosed in curly braces
The contents can also contain literal text outside the braces

List titles of all books in a bib element. Put each title in a book element.

<bib>
  {
   FOR $t IN document("bib.xml")/bib/book/title
   RETURN
    <book>
     { $t }
    </book>
  }
</bib>

Adapted from XML Query Use Cases

Query Result: Book Titles

<bib>
  <book>
    <title>TCP/IP Illustrated</title>
  </book>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
  </book>
  <book>
    <title>Data on the Web</title>
  </book>
  <book>
    <title>The Economics of Technology and Content for Digital TV</title>
  </book>
</bib>

Adapted from XML Query Use Cases

Query with WHERE

List titles of books published by Addison-Wesley

<bib>
 {
   FOR $b IN document("bib.xml")/bib/book
   WHERE $b/publisher = "Addison-Wesley"
   RETURN
      $b/title 
  }
</bib>

This WHERE clause could be replaced by an XPath predicate:

<bib>
 {
   FOR $b IN document("bib.xml")/bib/book[publisher="Addison-Wesley"]
   RETURN
      $b/title 
 }
</bib>

But WHERE clauses can combine multiple variables from multiple documents

Adapted from XML Query Use Cases

Query Result: Titles of books published by Addison-Wesley

<bib>
    <title>TCP/IP Illustrated</title>
    <title>Advanced Programming in the Unix Environment</title>
</bib>

Adapted from XML Query Use Cases
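
For comparison, the predicate form of this query needs nothing beyond XPath 1.0. The following is a minimal sketch, not XQuery: it assumes the JAXP XPath API (javax.xml.xpath) that ships with later JDKs, and an abbreviated inline copy of bib.xml. The class and method names are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AddisonWesleyTitles {

  // Abbreviated copy of bib.xml: only the elements this query touches.
  static final String BIB =
      "<bib>"
    + "<book year='1994'><title>TCP/IP Illustrated</title>"
    + "<publisher>Addison-Wesley</publisher></book>"
    + "<book year='1992'><title>Advanced Programming in the Unix Environment</title>"
    + "<publisher>Addison-Wesley</publisher></book>"
    + "<book year='2000'><title>Data on the Web</title>"
    + "<publisher>Morgan Kaufmann Publishers</publisher></book>"
    + "</bib>";

  /** Returns the titles of the Addison-Wesley books, in document order. */
  static List<String> titles(String xml) {
    XPath xpath = XPathFactory.newInstance().newXPath();
    try {
      // Same location path as the FOR clause with the predicate.
      NodeList nodes = (NodeList) xpath.evaluate(
          "/bib/book[publisher='Addison-Wesley']/title",
          new InputSource(new StringReader(xml)),
          XPathConstants.NODESET);
      List<String> result = new ArrayList<>();
      for (int i = 0; i < nodes.getLength(); i++) {
        result.add(nodes.item(i).getTextContent());
      }
      return result;
    } catch (XPathExpressionException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    for (String title : titles(BIB)) {
      System.out.println(title);
    }
  }
}
```

What XPath alone cannot do is wrap the result in a new bib element or combine variables from several documents; that is the part XQuery adds.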

Query with Booleans

XQuery booleans include:
- AND
- OR
- NOT()

List books published by Addison-Wesley after 1993:

<bib>
 {
   FOR $b IN document("bib.xml")/bib/book
   WHERE $b/publisher = "Addison-Wesley" AND $b/@year > 1993
   RETURN
      $b/title 
 }
</bib>

Adapted from XML Query Use Cases

Query Result: books published by Addison-Wesley after 1993

<bib>
    <title>TCP/IP Illustrated</title>
</bib>

Adapted from XML Query Use Cases

Attribute Constructors

List books published by Addison-Wesley after 1993, including their year and title:

<bib>
 {
   FOR $b IN document("bib.xml")/bib/book
   WHERE $b/publisher = "Addison-Wesley" AND $b/@year > 1993
   RETURN
    <book year = { $b/@year }>
     { $b/title }
    </book>
 }
</bib>

This is not well-formed XML!

Adapted from XML Query Use Cases

Query Result: books published by Addison-Wesley after 1993, including their year and title.

<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
  </book>
</bib>

Adapted from XML Query Use Cases

Query with multiple variables

Create a list of all the title-author pairs, with each pair enclosed in a result element.

<results>
 {
   FOR $b IN document("bib.xml")/bib/book,
     $t IN $b/title,
     $a IN $b/author
   RETURN
    <result>
    { $t }
    { $a }
    </result>
  }
</results>

Adapted from XML Query Use Cases

Query Result: A list of all the title-author pairs

<results>
    <result>
         <title>TCP/IP Illustrated</title>
         <author><last>Stevens</last><first>W.</first></author>
    </result>
    <result>
         <title>Advanced Programming in the Unix Environment</title>
         <author><last>Stevens</last><first>W.</first></author>
    </result>
    <result>
         <title>Data on the Web</title>
         <author><last>Abiteboul</last><first>Serge</first></author>
    </result>
    <result>
         <title>Data on the Web</title>
         <author><last>Buneman</last><first>Peter</first></author>
    </result>
    <result>
         <title>Data on the Web</title>
         <author><last>Suciu</last><first>Dan</first></author>
    </result>
</results>

Adapted from XML Query Use Cases

Nested Queries

For each book in the bibliography, list the title and authors, grouped inside a result element.

<results>
 {
   FOR $b IN document("bib.xml")/bib/book
   RETURN
    <result>
     { $b/title }
     {  
       FOR $a IN $b/author
       RETURN $a
     }
    </result>
 }
</results>

Adapted from XML Query Use Cases

Query Result: A list of the title and authors of each book in the bibliography

<?xml version="1.0"?>
<results xmlns:ino="http://namespaces.softwareag.com/tamino/response2" xmlns:xql="http://metalab.unc.edu/xql/">
  <result>
    <title>TCP/IP Illustrated</title>
    <author>
      <last>Stevens</last>
      <first>W.</first>
    </author>
  </result>
  <result>
    <title>Advanced Programming in the Unix Environment</title>
    <author>
      <last>Stevens</last>
      <first>W.</first>
    </author>
  </result>
  <result>
    <title>Data on the Web</title>
    <author>
      <last>Abiteboul</last>
      <first>Serge</first>
    </author>
    <author>
      <last>Buneman</last>
      <first>Peter</first>
    </author>
    <author>
      <last>Suciu</last>
      <first>Dan</first>
    </author>
  </result>
  <result>
    <title>The Economics of Technology and Content for Digital TV</title>
  </result>
</results>

Adapted from XML Query Use Cases

Query with distinct

For each author in the bibliography, list the author's name and the titles of all books by that author, grouped inside a result element.

<results>
 {
   FOR $a IN distinct(document("bib.xml")//author)
   RETURN
    <result>
     { $a }
     {  FOR $b IN document("bib.xml")/bib/book[author=$a]
        RETURN $b/title
     }
    </result>
 }
</results>

Adapted from XML Query Use Cases

Query Result

<results>
  <result>
    <author><last>Stevens</last><first>W.</first></author>
    <title>TCP/IP Illustrated</title>
    <title>Advanced Programming in the Unix Environment</title>
  </result>

  <result>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <title>Data on the Web</title>
  </result>

  <result>
    <author><last>Buneman</last><first>Peter</first></author>
    <title>Data on the Web</title>
  </result>

  <result>
    <author><last>Suciu</last><first>Dan</first></author>
    <title>Data on the Web</title>
  </result>
</results>

Adapted from XML Query Use Cases

Query with sorting

List the titles and years of all books published by Addison-Wesley after 1991, in alphabetic order.

<bib>
 {
   FOR $b IN document("bib.xml")//book
    [publisher = "Addison-Wesley" AND @year > "1991"]
   RETURN
    <book>
     { $b/@year } { $b/title }
    </book> SORTBY (title)
 }
</bib>

Adapted from XML Query Use Cases

Query Result

<bib>
  <book year="1992">
    <title>Advanced Programming in the Unix Environment</title>
  </book>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
  </book>
</bib>

Adapted from XML Query Use Cases

Queries with functions

Find books in which some element has a tag ending in "or" and the same element contains the string "Suciu" (at any level of nesting). For each such book, return the title and the qualifying element.

<result xmlns:xf="http://www.w3.org/2001/08/xquery-operators">
 {
  FOR $b IN document("bib.xml")//book,
    $e IN $b/*[contains(string(.), "Suciu")]
  WHERE xf:ends-with(name($e), "or")
  RETURN
   <book>
    { $b/title } { $e }
   </book>
 }
</result>

Not supported by Quip yet

Adapted from XML Query Use Cases

Query Result

<result>
 <book>
  <title> Data on the Web </title>
  <author> <last> Suciu </last> <first> Dan </first> </author>
 </book>
</result>

Adapted from XML Query Use Cases

Tentative Function List

Numeric Constructors

xf:decimal
xf:integer
xf:long
xf:int
xf:short
xf:byte
xf:float
xf:double

Functions on Numeric Values

xf:floor
xf:ceiling
xf:round

String Constructors

xf:string
xf:normalizedString
xf:token
xf:language
xf:Name
xf:NMTOKEN
xf:NCName
xf:ID
xf:IDREF
xf:ENTITY

Equality and Comparison of Strings

xf:codepoint-compare
xf:compare

Functions on String Values

xf:concat
xf:starts-with
xf:ends-with
xf:codepoint-contains
xf:contains
xf:substring
xf:string-length
xf:codepoint-substring-before
xf:substring-before
xf:codepoint-substring-after
xf:substring-after
xf:normalize-space
xf:normalize-unicode
xf:upper-case
xf:lower-case
xf:translate
xf:string-pad-beginning
xf:string-pad-end
xf:match
xf:replace

Boolean Constructors

xf:true
xf:false
xf:boolean-from-string

Functions on Boolean Values

xf:not

Duration and Datetime Constructors

xf:duration
xf:dateTime
xf:date
xf:time
xf:gYearMonth
xf:gYear
xf:gMonthDay
xf:gMonth
xf:gDay
xf:currentDateTime

Component Extraction Functions on Datetime Values

xf:get-Century-from-dateTime
xf:get-Century-from-date
xf:get-Century-from-gYear
xf:get-Century-from-gYearMonth
xf:get-gYear-from-dateTime
xf:get-gYear-from-date
xf:get-gYear-from-gYearMonth
xf:get-gMonth-from-dateTime
xf:get-gMonth-from-date
xf:get-gMonth-from-gYearMonth
xf:get-gMonth-from-gMonthDay
xf:get-gDay-from-dateTime
xf:get-gDay-from-date
xf:get-gDay-from-gMonthDay
xf:get-hour-from-dateTime
xf:get-hour-from-time
xf:get-minutes-from-dateTime
xf:get-minutes-from-time
xf:get-seconds-from-dateTime
xf:get-seconds-from-time
xf:get-timezone-from-dateTime
xf:get-timezone-from-date
xf:get-timezone-from-time
xf:get-timezone-from-gYear
xf:get-timezone-from-gYearMonth
xf:get-timezone-from-gMonth
xf:get-timezone-from-gMonthDay
xf:get-timezone-from-gDay

Component Extraction Functions on Duration Values

xf:get-years
xf:get-months
xf:get-days
xf:get-hours
xf:get-minutes
xf:get-seconds

Arithmetic Functions on Dates

xf:add-days
xf:add-months
xf:add-years
xf:add-gMonth
xf:add-gYear

Functions on TimePeriod Values

xf:get-duration
xf:get-end
xf:get-start
xf:temporal-dateTimes-contains
xf:temporal-dateTimeDuration-contains
xf:temporal-durationDateTime-contains

Constructors for QNames

xf:QName-from-uri
xf:QName-from-prefix
xf:QName

Functions on QNames

xf:get-local-name
xf:get-namespace-uri

Constructor for anyURI

xf:anyURI

NOTATION Constructor

xf:NOTATION

Functions on Nodes

xf:local-name
xf:namespace-uri
xf:number
xf:node-equal
xf:value-equal
xf:node-before
xf:node-after
xf:copy
xf:shallow
xf:boolean

Constructors on Sequences

TO

Functions on Sequences

xf:position
xf:last
xf:item-at
xf:index-of
xf:empty
xf:exists
xf:identity-distinct
xf:value-distinct
xf:sort
xf:reverse-sort
xf:insert
xf:sublist-before
xf:sublist-after
xf:sublist
xf:sequence-pad-beginning
xf:sequence-pad-end
xf:truncate-beginning
xf:truncate-end
xf:resize-beginning
xf:resize-end
xf:unordered

Equals, Union, Intersection and Except

xf:sequence-value-equal
xf:sequence-node-equal
xf:union
xf:union-all
xf:intersect
xf:intersect-all
xf:except
xf:except-all

Aggregate Functions

xf:count
xf:avg
xf:max
xf:min
xf:sum

Functions that Generate Sequences

xf:id
xf:idref
xf:filter
xf:document

Casting Functions

Casting to string and its derived types
Casting to numeric types
Casting to datetime and duration types
Casting to all other simple types

Miscellaneous casting functions

xf:boolean
xf:string

A different document about books

Sample data at "reviews.xml":

<reviews>
  <entry>
    <title>Data on the Web</title>
    <price>34.95</price>
    <review>
       A very good discussion of semi-structured database
       systems and XML.
    </review>
  </entry>
  <entry>
    <title>Advanced Programming in the Unix Environment</title>
    <price>65.95</price>
    <review>
      A clear and detailed discussion of UNIX programming.
    </review>
  </entry>
  <entry>
    <title>TCP/IP Illustrated</title>
    <price>65.95</price>
    <review>
      One of the best books on TCP/IP.
    </review>
  </entry>
</reviews>

Adapted from XML Query Use Cases

This document uses a different DTD

<!ELEMENT reviews (entry*)>
<!ELEMENT entry   (title, price, review)>
<!ELEMENT title   (#PCDATA)>
<!ELEMENT price   (#PCDATA)>
<!ELEMENT review  (#PCDATA)>

Query that joins two documents

For each book found in both bib.xml and reviews.xml, list the title of the book and its price from each source.

<books-with-prices>
 {
   FOR $b IN document("bib.xml")//book,
     $a IN document("reviews.xml")//entry
   WHERE $b/title = $a/title
   RETURN
    <book-with-prices>
     { $b/title }
       <price-amazon> { $a/price/text() } </price-amazon>
       <price-bn> { $b/price/text() } </price-bn>
    </book-with-prices>
 }
</books-with-prices>

Adapted from XML Query Use Cases

Result

<books-with-prices>
  <book-with-prices>
    <title>TCP/IP Illustrated</title>
    <price-amazon>65.95</price-amazon>
    <price-bn>65.95</price-bn>
  </book-with-prices>

  <book-with-prices>
    <title>Advanced Programming in the Unix Environment</title>
    <price-amazon>65.95</price-amazon>
    <price-bn>65.95</price-bn>
  </book-with-prices>

  <book-with-prices>
    <title>Data on the Web</title>
    <price-amazon>34.95</price-amazon>
    <price-bn>39.95</price-bn>
  </book-with-prices>
</books-with-prices>

Adapted from XML Query Use Cases

prices.xml Query Sample Data

The next query also uses an input document named "prices.xml":

<prices>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
    <source>www.amazon.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
    <source>www.bn.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>www.amazon.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>www.bn.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>www.amazon.com</source>
    <price>34.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>www.bn.com</source>
    <price>39.95</price>
  </book>
</prices>

Adapted from XML Query Use Cases

Query with reused variables

In the document "prices.xml", find the minimum price for each book, in the form of a minprice element with the book title as its title attribute.

<results>
 {
   LET $doc := document("prices.xml")
   FOR $t IN distinct($doc//book/title)
   LET $p := $doc//book[title = $t]/price
   RETURN
    <minprice title = { $t/text() } >
     { min($p) }
    </minprice>
 }
</results>

Adapted from XML Query Use Cases

Query Result

<results>
  <minprice title="Advanced Programming in the Unix Environment"> 65.95 </minprice>
  <minprice title="TCP/IP Illustrated"> 65.95 </minprice>
  <minprice title="Data on the Web"> 34.95 </minprice>
</results>

Adapted from XML Query Use Cases
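
The query amounts to a group-by on title followed by a minimum over each group. As a rough sketch of the same computation in plain Java (assuming the JAXP DOM parser bundled with the JDK, an abbreviated inline copy of prices.xml, and invented class and method names):

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MinPrices {

  // Abbreviated copy of prices.xml: two of the three titles.
  static final String PRICES =
      "<prices>"
    + "<book><title>TCP/IP Illustrated</title>"
    + "<source>www.amazon.com</source><price>65.95</price></book>"
    + "<book><title>TCP/IP Illustrated</title>"
    + "<source>www.bn.com</source><price>65.95</price></book>"
    + "<book><title>Data on the Web</title>"
    + "<source>www.amazon.com</source><price>34.95</price></book>"
    + "<book><title>Data on the Web</title>"
    + "<source>www.bn.com</source><price>39.95</price></book>"
    + "</prices>";

  /** Maps each distinct title to its lowest price, in first-seen order. */
  static Map<String, Double> minPrices(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      Map<String, Double> min = new LinkedHashMap<>();
      NodeList books = doc.getElementsByTagName("book");
      for (int i = 0; i < books.getLength(); i++) {
        Element book = (Element) books.item(i);
        String title = childText(book, "title");
        double price = Double.parseDouble(childText(book, "price"));
        // The map key plays the role of distinct(); merge() plays min().
        min.merge(title, price, Double::min);
      }
      return min;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  /** Trimmed text of the first descendant element with the given name. */
  static String childText(Element parent, String name) {
    return parent.getElementsByTagName(name).item(0).getTextContent().trim();
  }

  public static void main(String[] args) {
    minPrices(PRICES).forEach((title, price) ->
        System.out.println(title + ": " + price));
  }
}
```

The XQuery version says the same thing in five lines, which is rather the point.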

Multiple FLWR Queries

For each book with an author, return a book with its title and authors. For each book with an editor, return a reference with the book title and the editor's affiliation.

<bib>
 {
   FOR $b IN document("bib.xml")//book[author]
   RETURN
    <book>
     { $b/title }
     { $b/author }
    </book>,
   FOR $b IN document("bib.xml")//book[editor]
   RETURN
    <reference>
     { $b/title }
     <org> { $b/editor/affiliation/text() } </org>
    </reference>
 }
</bib>

Adapted from XML Query Use Cases

Query Result

<bib>
    <book>
         <title>TCP/IP Illustrated</title>
         <author><last>Stevens</last><first>W.</first></author>
    </book>

    <book>
         <title>Advanced Programming in the Unix Environment</title>
         <author><last>Stevens</last><first>W.</first></author>
    </book>

    <book>
         <title>Data on the Web</title>
         <author><last>Abiteboul</last><first>Serge</first></author>
         <author><last>Buneman</last><first>Peter</first></author>
         <author><last>Suciu</last><first>Dan</first></author>
    </book>

    <reference>
        <title>The Economics of Technology and Content for Digital TV</title>
        <org>CITI</org>
    </reference>
</bib>

Adapted from XML Query Use Cases

Query Software

Quip: http://www.softwareag.com/developer/quip/
Quilt: http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html
Kweelt: http://db.cis.upenn.edu/Kweelt/
Tamino: http://www.softwareag.com/tamino/
Ipedo: http://www.ipedo.com/

To Learn More

This presentation: http://www.ibiblio.org/xml/slides/xmlonesanjose2001/advancedxml/
XSLT 1.1 Working Draft: http://www.w3.org/TR/xslt11/
XPath 2.0 Requirements: http://www.w3.org/TR/2001/WD-xpath20req-20010214
XSLT 2.0 Requirements: http://www.w3.org/TR/2001/WD-xslt20req-20010214
XQuery: A Query Language for XML: http://www.w3.org/TR/xquery/
XML Query Requirements: http://www.w3.org/TR/xmlquery-req
XML Query Use Cases: http://www.w3.org/TR/xmlquery-use-cases
XML Query Data Model: http://www.w3.org/TR/query-datamodel/
The XML Query Algebra: http://www.w3.org/TR/query-algebra/
XML Syntax for XQuery 1.0 (XQueryX): http://www.w3.org/TR/xqueryx
XQuery 1.0 and XPath 2.0 Functions and Operators Version 1.0: http://www.w3.org/TR/xquery-operators/

Part III: SAX 2.1

Actually, SAX2 has ** MUCH ** better infoset support than DOM does. Yes, I've done the detailed analysis.

--David Brownell on the xml-dev mailing list

Goals

Full Infoset support
Backwards compatible with SAX2
Far less radical changes than the move from SAX1 to SAX2

Specified vs. Defaulted Attributes

Infoset includes a flag saying whether a given attribute value was specified in the instance document or defaulted from the DTD.
DOM also wants to know this

Solution:

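package org.xml.sax.ext;

import org.xml.sax.Attributes;

public interface Attributes2 extends Attributes {

  // true if the attribute was specified in the instance document,
  // false if its value was defaulted from the DTD
  public boolean isSpecified (int index);
  public boolean isSpecified (String uri, String localName);
  public boolean isSpecified (String qualifiedName);
  
}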

This interface would be implemented by SAX 2.1 Attributes objects provided in startElement() callbacks
The read-only http://xml.org/sax/features/use-attributes2 feature specifies whether Attributes2 is available
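
An interface of this shape is present in the org.xml.sax.ext package of current JDKs. The sketch below shows how a handler might use it; it assumes the JDK's bundled parser hands Attributes2 objects to startElement() and falls back silently when a parser does not. The sample document and the class name are invented for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.helpers.DefaultHandler;

class SpecifiedAttributes {

  // The DTD defaults SHOW to "onLoad"; only HREF appears in the instance.
  static final String DOC =
      "<!DOCTYPE PHOTO [\n"
    + "  <!ELEMENT PHOTO EMPTY>\n"
    + "  <!ATTLIST PHOTO HREF CDATA #REQUIRED\n"
    + "                  SHOW CDATA 'onLoad'>\n"
    + "]>\n"
    + "<PHOTO HREF='hotcop.jpg'/>";

  /** Maps each attribute name to true if it was specified in the instance. */
  static Map<String, Boolean> specified(String xml) {
    final Map<String, Boolean> result = new LinkedHashMap<>();
    DefaultHandler handler = new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName,
                               String qName, Attributes atts) {
        if (!(atts instanceof Attributes2)) {
          return; // this parser does not provide the extension
        }
        Attributes2 atts2 = (Attributes2) atts;
        for (int i = 0; i < atts2.getLength(); i++) {
          result.put(atts2.getQName(i), atts2.isSpecified(i));
        }
      }
    };
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(specified(DOC));
  }
}
```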

standalone declaration

<?xml version="1.0" standalone="yes"?>

The XML Infoset includes a standalone property for documents
Not currently exposed by SAX2
Solution: Define a new read-only feature: http://xml.org/sax/features/is-standalone
Open issue: distinguish between standalone="no" and omitted standalone declaration?

The version and encoding properties

<?xml version="1.0" encoding="UTF-16"?>

Infoset includes the version and encoding from the XML declaration; SAX2 does not.
Unlike standalone, these apply to all parsed entities; not just the document entity
Solution:
```
package org.xml.sax.ext;

import org.xml.sax.Locator;

public interface Locator2 extends Locator {
  public String getXMLVersion ();
  public String getEncoding ();
}
```
This would be implemented by Locator objects passed to setDocumentLocator() methods

The read-only feature http://xml.org/sax/features/use-locator2 says whether Locator2 objects are used.
To make matters worse, there can be as many as three encodings:
- What's declared in the document using an encoding declaration in the XML declaration
- The MIME type encoding, as specified by the HTTP header
- The name of the encoding used by a java.io.InputStreamReader (UTF8 vs. UTF-8)
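
A Locator2 interface with these two methods is present in the org.xml.sax.ext package of current JDKs. The following sketch reads the version and encoding of the document entity; it assumes the bundled parser passes a Locator2 to setDocumentLocator(), and it waits until the first start-tag so the XML declaration has certainly been read. The class name is invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

class DeclarationInfo extends DefaultHandler {

  private Locator locator;
  String version;   // stays null if the parser offers no Locator2
  String encoding;

  @Override
  public void setDocumentLocator(Locator locator) {
    this.locator = locator;
  }

  @Override
  public void startElement(String uri, String localName,
                           String qName, Attributes atts) {
    // Record the values once, at the root element's start-tag.
    if (version == null && locator instanceof Locator2) {
      Locator2 locator2 = (Locator2) locator;
      version = locator2.getXMLVersion();
      encoding = locator2.getEncoding();
    }
  }

  /** Parses the bytes and returns the handler holding what was found. */
  static DeclarationInfo read(byte[] xml) {
    DeclarationInfo info = new DeclarationInfo();
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new ByteArrayInputStream(xml), info);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return info;
  }

  public static void main(String[] args) {
    byte[] doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><SONG/>"
        .getBytes(StandardCharsets.UTF_8);
    DeclarationInfo info = read(doc);
    System.out.println(info.version + " " + info.encoding);
  }
}
```

Because the locator tracks the current entity, calling the same methods from inside an external parsed entity would report that entity's values instead.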

Feature/Property discovery

There's no way to find out what features and properties a given XMLReader recognizes.
Solution: Define two new read-only properties:

http://xml.org/sax/properties/supported-features

The value of this property is an array of Strings containing the names of all the features supported by this XMLReader.

http://xml.org/sax/properties/supported-properties

The value of this property is an array of Strings containing the names of all the properties supported by this XMLReader.
Or perhaps a method instead of a property?
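
Until something like this exists, the only portable approach is to probe candidate names one at a time and catch the exception. A minimal sketch of that workaround, using only the standard XMLReader.getFeature() call; the class name and the list of candidates are invented for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

class FeatureProbe {

  /** Reports, for each candidate name, whether the reader recognizes it. */
  static Map<String, Boolean> recognized(XMLReader reader, String... names) {
    Map<String, Boolean> result = new LinkedHashMap<>();
    for (String name : names) {
      try {
        reader.getFeature(name);
        result.put(name, true);
      } catch (SAXNotRecognizedException e) {
        result.put(name, false);
      } catch (SAXNotSupportedException e) {
        // Recognized, but the value cannot be read at this moment.
        result.put(name, true);
      }
    }
    return result;
  }

  /** Convenience: an XMLReader from the JDK's default JAXP factory. */
  static XMLReader newReader() {
    try {
      return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(recognized(newReader(),
        "http://xml.org/sax/features/namespaces",
        "http://xml.org/sax/features/validation",
        "http://example.invalid/no-such-feature"));
  }
}
```

Probing only answers questions about names the application already knows, which is exactly the gap the proposed properties would close.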

DefaultHandler infoset extensions

The DeclHandler and LexicalHandler extension handlers are not supported by the DefaultHandler convenience class.

Solution:

Define a new org.xml.sax.ext class implementing those two interfaces, inheriting from org.xml.sax.helpers.DefaultHandler

package org.xml.sax.ext;

import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DefaultHandler2 extends DefaultHandler 
  implements DeclHandler, LexicalHandler { 
  
    // LexicalHandler methods (all do nothing by default)
    public void startDTD(String name, String publicId, String systemId)
      throws SAXException {}
    public void endDTD() throws SAXException {} 
    public void startEntity(String name) throws SAXException {}
    public void endEntity(String name) throws SAXException {}
    public void startCDATA() throws SAXException {}
    public void endCDATA() throws SAXException {}
    public void comment(char[] ch, int start, int length)
      throws SAXException {}
      
    // DeclHandler methods (all do nothing by default)
    public void elementDecl(String name, String model)
      throws SAXException {}
    public void attributeDecl(String elementName,
      String attributeName, String type,
      String valueDefault, String value)
      throws SAXException {}
    public void internalEntityDecl(String name, String value)
      throws SAXException {}
    public void externalEntityDecl(String name, String publicID, 
      String systemID) throws SAXException {}
  
}

Alternately, update DefaultHandler.
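
The org.xml.sax.ext package in current JDKs contains a DefaultHandler2 class along these lines. A short sketch of the convenience it buys: a comment counter that overrides a single LexicalHandler method. The class name and sample document are invented; the lexical-handler property URI is the standard SAX2 one.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class CommentCounter extends DefaultHandler2 {

  int comments;

  // The only callback we care about; everything else stays a no-op.
  @Override
  public void comment(char[] ch, int start, int length) {
    comments++;
  }

  /** Parses the document and returns the number of comments seen. */
  static int count(String xml) {
    CommentCounter handler = new CommentCounter();
    try {
      XMLReader reader =
          SAXParserFactory.newInstance().newSAXParser().getXMLReader();
      reader.setContentHandler(handler);
      // Lexical events are delivered only if the handler is registered
      // through this property.
      reader.setProperty(
          "http://xml.org/sax/properties/lexical-handler", handler);
      reader.parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return handler.comments;
  }

  public static void main(String[] args) {
    System.out.println(count(
        "<SONG><!-- one --><TITLE>Hot Cop</TITLE><!-- two --></SONG>"));
  }
}
```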

Parser identification

Problem: There is no conventional way for applications to identify the version of the parser they are using, for purposes of diagnostics or other kinds of troubleshooting.
The best the JVM supports is the JDK 1.2 java.lang.Package facility, which is dependent on the JAR file metadata. It provides a partial solution, at the price of portability (JDK 1.1 APIs are much more portable) and assumptions like "one parser per package".
Solution: Define a new standard read-only property:

http://xml.org/sax/properties/reader-version

Returns a string identifying the reader and its version for use in diagnostics.

Parsers could support that if desired, probably using some sort of resource-based mechanism (not necessarily Package) to keep such release-specific strings out of the source code.
Open issue: Should there be separate strings to ID the reader (likely a constant value) and its version (ideally assigned in release engineering)?

A Verifier Class as in JDOM

package org.jdom;

public final class Verifier {

    public static final String checkElementName(String name) {}
    public static final String checkAttributeName(String name) {}
    public static final String checkCharacterData(String text) {}
    public static final String checkNamespacePrefix(String prefix) {}
    public static final String checkNamespaceURI(String uri) {}
    public static final String checkProcessingInstructionTarget(String target) {}
    public static final String checkCommentData(String data) {}
 
    public static boolean isXMLCharacter(char c) {}
    public static boolean isXMLNameCharacter(char c) {}
    public static boolean isXMLNameStartCharacter(char c) {}
    public static boolean isXMLLetterOrDigit(char c) {}
    public static boolean isXMLLetter(char c) {}
    public static boolean isXMLCombiningChar(char c) {}
    public static boolean isXMLExtender(char c) {}
    public static boolean isXMLDigit(char c) {}

}
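
To give a feel for what such checks involve, here is a minimal sketch of two of them, written from the Char production of XML 1.0 rather than copied from JDOM. It assumes, as a convention, that a check method returns null for legal input and an explanatory message otherwise; the class name and the message wording are invented.

```java
class CharCheck {

  /** True if the code point matches the Char production of XML 1.0. */
  static boolean isXMLCharacter(int c) {
    return c == 0x9 || c == 0xA || c == 0xD
        || (c >= 0x20 && c <= 0xD7FF)
        || (c >= 0xE000 && c <= 0xFFFD)
        || (c >= 0x10000 && c <= 0x10FFFF);
  }

  /** Returns null if the text is legal character data, else a reason. */
  static String checkCharacterData(String text) {
    if (text == null) {
      return "null is not legal character data";
    }
    // Walk by code point so surrogate pairs are judged as one character
    // and unpaired surrogates are rejected.
    for (int i = 0; i < text.length(); ) {
      int c = text.codePointAt(i);
      if (!isXMLCharacter(c)) {
        return "0x" + Integer.toHexString(c)
            + " is not a legal XML character";
      }
      i += Character.charCount(c);
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(checkCharacterData("Hot Cop"));
    System.out.println(checkCharacterData("bad" + (char) 0));
  }
}
```

The name checks are longer because the XML 1.0 name productions enumerate many separate ranges, but they follow the same pattern.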

To Learn More

http://sax.sourceforge.net/
Subscribe to the xml-dev mailing list, http://lists.xml.org/archives/xml-dev/

Part III.V: XML Blueberry

Discriminating against languages simply because their scripts were not encoded in Unicode 2.0 is inherently unjust.

--XML Blueberry Requirements

XML Blueberry

Changes the definition of white space
Enables native language markup in Ethiopic, Burmese, and Cambodian
Breaks compatibility with XML 1.0

White Space

[3] S ::= (#x20 | #x9 | #xD | #xA)+
With Blueberry this becomes
[3] S ::= (#x20 | #x9 | #xD | #xA | #x85)+
Supports IBM mainframe editors
Breaks everybody else's software

Native language markup

Currently only scripts defined in Unicode 2.0 are allowed in XML element and attribute names
All scripts defined in Unicode are allowed in element and attribute content
Unicode 3.0 adds:
- Ethiopic (Amharic, Geez, etc.)
- Burmese
- Cambodian
- Mongolian
- Dhivehi
Also:
- Cherokee
- Canadian aboriginal languages
Perhaps:
- Japanese
- Cantonese
Is this enough to justify breaking compatibility?

Unicode still isn't finished

Ogham, Runic
Tengwar, Cirth
Unicode 4.0?

How to identify Blueberry

<?xml version="1.1"?>
<?xml version="1.0.1"?>
<?xml version="1.0" unicode="3.1"?>
<?xml version="1.0" blueberry="true"?>

Harm Reduction proposals

Mandate non-Blueberry for documents that don't use Blueberry
Well-formedness error?
Non-fatal error?

Part IV: DOM Level 3

of all of the things the W3C has given us, the DOM is probably the one with the least value.

--Michael Brennan on the xml-dev mailing list

DOM Evolution

DOM Level 0: what was implemented for JavaScript in Netscape 3/IE3
DOM Level 1, a W3C Standard
DOM Level 2, a W3C Standard
DOM Level 3: Several Working Drafts

New Features in DOM Level 3

Grammar access, a.k.a. abstract schemas (DTDs, TREX schemas, W3C Schema Language schemas)
Extra attributes on Entity, Document, Node, and Text interfaces
Standard means of loading and saving XML documents.
Bootstrapping new documents
Key events

DOM Level 3 Core Changes

DOMKey (new)
Node
Document
Text
Entity
Bootstrapping

DOMKey

Every node gets a unique key automatically generated by the DOM implementation to uniquely identify DOM nodes.
Type, attributes, and methods of the DOMKey interface remain to be determined for Java. It will likely just be declared an Object.
A Number in JavaScript

New methods in Node interface

Adds:

Base URI

The URI this document came from. May be null.

Document order

The order of a node relative to another reference node in document order

Tree position

The order of a node relative to another reference node in tree order

Methods to test for equality

Methods to work with namespaces
I will only show the new methods. Currently, the plan is to simply add these to the existing Node interface.

Java binding:

package org.w3c.dom;

public interface Node {

  public String getBaseURI();

  public static final int DOCUMENT_ORDER_PRECEDING = 1;
  public static final int DOCUMENT_ORDER_FOLLOWING = 2;
  public static final int DOCUMENT_ORDER_SAME      = 3;
  public static final int DOCUMENT_ORDER_UNORDERED = 4;
  
  public int compareDocumentOrder(Node other) throws DOMException;

  public static final int TREE_POSITION_PRECEDING  = 1;
  public static final int TREE_POSITION_FOLLOWING  = 2;
  public static final int TREE_POSITION_ANCESTOR   = 3;
  public static final int TREE_POSITION_DESCENDANT = 4;
  public static final int TREE_POSITION_SAME       = 5;
  public static final int TREE_POSITION_UNORDERED  = 6;

  public int compareTreePosition(Node other) throws DOMException;

  public String getTextContent();
  public void   setTextContent(String textContent);

  public boolean isSameNode(Node other);
  public boolean equalsNode(Node arg, boolean deep);

  public String lookupNamespacePrefix(String namespaceURI);
  public String lookupNamespaceURI(String prefix);
  public void   normalizeNS();
  public Node   getAs(String feature);

  public Object getKey();

}

In IDL:

interface Node {

  readonly attribute DOMString baseURI;

  typedef enum _DocumentOrder {
    DOCUMENT_ORDER_PRECEDING,
    DOCUMENT_ORDER_FOLLOWING,
    DOCUMENT_ORDER_SAME,
    DOCUMENT_ORDER_UNORDERED
  } DocumentOrder;
  DocumentOrder compareDocumentOrder(in Node other) raises(DOMException);

  typedef enum _TreePosition {
    TREE_POSITION_PRECEDING,
    TREE_POSITION_FOLLOWING,
    TREE_POSITION_ANCESTOR,
    TREE_POSITION_DESCENDANT,
    TREE_POSITION_SAME,
    TREE_POSITION_UNORDERED
  } TreePosition;
  TreePosition compareTreePosition(in Node other) raises(DOMException);

           attribute DOMString textContent;
  readonly attribute DOMKey    key;
  
  boolean    isSameNode(in Node other);
  DOMString  lookupNamespacePrefix(in DOMString namespaceURI);
  DOMString  lookupNamespaceURI(in DOMString prefix);
  void       normalizeNS();
  boolean    equalsNode(in Node arg, in boolean deep);
  Node       getAs(in DOMString feature);

};
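
The names above come from the working draft. The org.w3c.dom package shipped with current JDKs implements the finished DOM Level 3, where equalsNode() became isEqualNode(), the two compare methods merged into compareDocumentPosition(), and lookupNamespacePrefix() became lookupPrefix(). A minimal sketch using those final names, with an abbreviated song document and an invented class name:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class Level3NodeDemo {

  static final String SONG =
      "<SONG xmlns='http://www.ibiblio.org/xml/namespace/song'"
    + " xmlns:xlink='http://www.w3.org/1999/xlink'>"
    + "<TITLE>Hot Cop</TITLE><YEAR>1978</YEAR></SONG>";

  /** Namespace-aware parse of an in-memory document. */
  static Document parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      return factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Element song = parse(SONG).getDocumentElement();
    Node title = song.getFirstChild();
    Node year = song.getLastChild();
    Node copy = title.cloneNode(true);

    System.out.println(title.getTextContent());            // text content
    System.out.println(song.lookupNamespaceURI("xlink"));  // namespace lookup
    System.out.println(title.isSameNode(copy));            // identity test
    System.out.println(title.isEqualNode(copy));           // structural test

    // Bit mask describing where year sits relative to title.
    boolean yearFollows = (title.compareDocumentPosition(year)
        & Node.DOCUMENT_POSITION_FOLLOWING) != 0;
    System.out.println(yearFollows);
  }
}
```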

New methods in Entity

XML documents may be built from multiple parsed entities, each of which is not necessarily a well-formed XML document, but is at least a plausible part of a well-formed XML document.
Each entity may have its own text declaration. This is like an XML declaration without a standalone attribute and with an optional version attribute:
```
<?xml version="1.0"?>
<?xml version="1.0" encoding="ISO-8859-9"?>
<?xml encoding="ISO-8859-9"?>
```
DOM3 adds:

encoding

A string specifying what encoding the text declaration claims this entity uses. This is null if this entity is not an external parsed entity.

actualEncoding

A string specifying the actual encoding of this entity. This is null if this entity is not an external parsed entity.

version

A string specifying the XML version given in the text declaration. This is null if this entity is not an external parsed entity.

Java binding:


package org.w3c.dom;

public interface Entity extends Node {
  
  public String getActualEncoding();
  public void   setActualEncoding(String actualEncoding);
  public String getEncoding();
  public void   setEncoding(String encoding);
  public String getVersion();
  public void   setVersion(String version);
  
}

In IDL:

interface Entity : Node {
  attribute DOMString  actualEncoding;
  attribute DOMString  encoding;
  attribute DOMString  version;
};

New methods in Document

Adds:
XML Declaration
encoding, version, and standalone attributes:
```
<?xml version="1.0"?>
<?xml version="1.0" encoding="ISO-8859-9"?>
<?xml version="1.0" encoding="ISO-8859-9" standalone="no"?>
<?xml version="1.0" standalone="yes"?>
```
adoptNode()

Method to move node from one document to another

setBaseURI()

Method to set the base URI of the document

Java binding:

package org.w3c.dom;

public interface Document extends Node {

  public String  getActualEncoding();
  public void    setActualEncoding(String actualEncoding);

  public String  getEncoding();
  public void    setEncoding(String encoding);

  public boolean getStandalone();
  public void    setStandalone(boolean standalone);

  public boolean getStrictErrorChecking();
  public void    setStrictErrorChecking(boolean strictErrorChecking);

  public String  getVersion();
  public void    setVersion(String version);

  public Node    adoptNode(Node source) throws DOMException;
  public void    setBaseURI(String baseURI) throws DOMException;

}

In IDL:

interface Document : Node {

  attribute DOMString actualEncoding;
  attribute DOMString encoding;
  attribute boolean   standalone;
  attribute boolean   strictErrorChecking;
  attribute DOMString version;
  
  Node adoptNode(in Node source) raises(DOMException);
  void setBaseURI(in DOMString baseURI) raises(DOMException);
  
};
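
The finished DOM Level 3, as shipped in the JDK's org.w3c.dom package, renamed the declaration properties to getXmlVersion(), getXmlEncoding(), and getXmlStandalone(); adoptNode() kept its name. A minimal sketch under those final names, with invented sample documents and class name:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class Level3DocumentDemo {

  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  /** Moves the first child of source's root element under target's root. */
  static Node moveFirstChild(Document source, Document target) {
    Node child = source.getDocumentElement().getFirstChild();
    // adoptNode() detaches the node from its old parent and changes its
    // owner document; it returns null if the implementation refuses.
    Node adopted = target.adoptNode(child);
    if (adopted == null) {
      throw new IllegalStateException("adoptNode is not supported here");
    }
    target.getDocumentElement().appendChild(adopted);
    return adopted;
  }

  public static void main(String[] args) {
    Document song = parse("<?xml version='1.0' standalone='yes'?>"
        + "<SONG><TITLE>Hot Cop</TITLE></SONG>");
    Document album = parse("<ALBUM/>");

    System.out.println(song.getXmlVersion());
    System.out.println(song.getXmlStandalone());

    Node moved = moveFirstChild(song, album);
    System.out.println(moved.getOwnerDocument() == album);
  }
}
```

Unlike importNode() from DOM2, which copies, adoptNode() moves: the TITLE element is gone from the song document afterwards.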

New methods in Text

Adds:

isWhitespaceInElementContent()

Returns true if this node contains "ignorable" whitespace

wholeText()

Returns all text of Text nodes logically-adjacent to this node; i.e. the XPath value of the text node

Java binding:

package org.w3c.dom;

public interface Text extends CharacterData {

  public boolean getIsWhitespaceInElementContent();

  public String  getWholeText();
  public Text    replaceWholeText(String content) 
   throws DOMException;

}

In IDL:

interface Text : CharacterData {

  readonly attribute boolean   isWhitespaceInElementContent;
  readonly attribute DOMString wholeText;

  Text replaceWholeText(in DOMString content) raises(DOMException);

};
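
Adjacent Text nodes arise easily when a tree is built by hand. The sketch below shows what wholeText buys in that case, using the getWholeText() and replaceWholeText() methods as they exist in the JDK's org.w3c.dom.Text (where the whitespace test ended up named isElementContentWhitespace()). The class name and content are invented for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

class WholeTextDemo {

  /** Builds a TITLE element whose content is split over two Text nodes. */
  static Element buildTitle() {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder().newDocument();
      Element title = doc.createElement("TITLE");
      doc.appendChild(title);
      title.appendChild(doc.createTextNode("Hot "));
      title.appendChild(doc.createTextNode("Cop"));
      return title;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Element title = buildTitle();
    Text first = (Text) title.getFirstChild();

    System.out.println(first.getData());       // this node only
    System.out.println(first.getWholeText());  // this node plus neighbors

    // Replaces the whole run of adjacent text with a single node.
    first.replaceWholeText("In the Navy");
    System.out.println(title.getChildNodes().getLength());
  }
}
```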

Bootstrapping

DOM2 has no implementation-independent means to create a new Document object
Implementation-dependent methods tend to be fairly complex. For example, in Xerces-J:
DOMImplementation impl = DOMImplementationImpl.getDOMImplementation();
Document fibonacci = impl.createDocument(null, "Fibonacci_Numbers", null);

DOM3 Bootstrapping

Still no language-independent means to create a new Document object
Does provide an implementation-independent method for Java only:
DOMImplementation impl = DOMImplementationRegistry.getDOMImplementation("XML");

package org.w3c.dom;

public class DOMImplementationRegistry { 

  // The system property to specify the DOMImplementationSource class names. 
  public static String PROPERTY = "org.w3c.dom.DOMImplementationSourceList";

  public static DOMImplementation getDOMImplementation(String features)
   throws ClassNotFoundException, InstantiationException, IllegalAccessException;
  public static void addSource(DOMImplementationSource s)
   throws ClassNotFoundException, InstantiationException, IllegalAccessException;

}
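
For comparison, a sketch of the same bootstrap against the registry that actually ships in current JDKs. This is an assumption about the reader's environment, not the draft shown on this slide: in the finalized API the class lives in org.w3c.dom.bootstrap and is obtained through newInstance() rather than used through static methods. BootstrapDemo is an invented name.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

class BootstrapDemo {

  // Creates an empty document with the given root element name
  // without naming any parser-specific class.
  static Document newDocument(String rootName) throws Exception {
    DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
    DOMImplementation impl = registry.getDOMImplementation("XML");
    return impl.createDocument(null, rootName, null);
  }

  public static void main(String[] args) throws Exception {
    Document fibonacci = newDocument("Fibonacci_Numbers");
    System.out.println(fibonacci.getDocumentElement().getTagName());
  }
}
```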

Load and Save

Loading: parsing an existing XML document to produce a Document object
Saving: serializing a Document object into a file or onto a stream
Completely implementation dependent in DOM2

The DOM Process

Library specific code creates a parser
The parser parses the document and returns a DOM org.w3c.dom.Document object.
The entire document is stored in memory.
DOM methods and interfaces are used to extract data from this object

Parsing documents with DOM2

This program parses with Xerces. Other parsers are different.

import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMParserMaker {

  public static void main(String[] args) {
     
    DOMParser parser = new DOMParser();
    
    for (int i = 0; i < args.length; i++) {
      try {
        parser.parse(args[i]); 
       
        Document d = parser.getDocument();
      }
      catch (SAXException e) {
        System.err.println(e); 
      }
      catch (IOException e) {
        System.err.println(e); 
      }
      
    }
   
  }

}

Parsing documents with DOM3

import org.w3c.dom.*;
import org.w3c.dom.loadSave.*;

public class DOM3ParserMaker {

  public static void main(String[] args) {

    // Cast the generic implementation to its Load and Save interface
    DOMImplementationLS impl =
      (DOMImplementationLS) DOMImplementationFactory.getDOMImplementation();
    DOMBuilder parser = impl.createDOMBuilder();

    for (int i = 0; i < args.length; i++) {
      try {
        Document d = parser.parseURI(args[i]);
      }
      catch (DOMSystemException e) {
        System.err.println(e);
      }
      catch (DOMException e) {
        System.err.println(e);
      }

    }

  }

}

This code will not actually compile or run until some parser supports DOM3 Load and Save.
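
For readers with a current JDK, here is a hedged sketch of the same load step that does compile. It assumes the finalized Load and Save API, in which the package became org.w3c.dom.ls, DOMBuilder became LSParser, and DOMInputSource became LSInput. The class LoadDemo and its parseString() helper are invented for this example.

```java
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

class LoadDemo {

  // Parses an XML string using only DOM interfaces, no parser-specific classes.
  static Document parseString(String xml) throws Exception {
    DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
    DOMImplementationLS impl =
        (DOMImplementationLS) registry.getDOMImplementation("LS");

    // A synchronous parser returns the finished Document from parse().
    LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

    LSInput input = impl.createLSInput();
    input.setStringData(xml);
    return parser.parse(input);
  }

  public static void main(String[] args) throws Exception {
    Document d = parseString("<SONG><TITLE>Hot Cop</TITLE></SONG>");
    System.out.println(d.getDocumentElement().getTagName());
  }
}
```

To load from a URL instead of a string, LSParser also offers parseURI(String), which corresponds to the parseURI() call in the program above.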

Load and Save

DOMImplementationLS: A new DOMImplementation interface that provides the factory methods for creating the objects required for loading and saving.
DOMBuilder: A parser interface.
DOMInputSource: Encapsulates information about the source of the XML to be loaded, like SAX's InputSource.
DOMEntityResolver: During loading, provides a way for applications to redirect references to external entities.
DOMBuilderFilter: Provides the ability to examine and optionally remove Element nodes as they are being processed during the parsing of a document, like SAX filters.
DOMWriter: An interface for serializing DOM documents onto a stream.
DOMCMBuilder: An interface for parsing content models (e.g. DTDs and schemas) and building the corresponding CMModel tree.
DOMCMWriter: An interface for serializing content models.
DocumentLS: A "mechanism by which the content of a document can be replaced with the DOM tree produced when loading a URL, or parsing a string."
ParserErrorEvent: Some sort of error detected in the input document (well-formedness? validity?)

DOMImplementationLS

Factory interface to create new DOMBuilder and DOMWriter implementations.

Java Binding:

package org.w3c.dom.loadSave;

public interface DOMImplementationLS {

    public DOMBuilder createDOMBuilder();
    public DOMWriter  createDOMWriter();

}

IDL:

  interface DOMImplementationLS {
    DOMBuilder  createDOMBuilder();
    DOMWriter   createDOMWriter();
  };

DOMBuilder

Provides an implementation-independent API for parsing XML documents to produce a DOM Document object.
Instances are built by the createDOMBuilder() method in DOMImplementationLS.
Java Binding:

package org.w3c.dom.loadSave;

public interface DOMBuilder {

  public DOMEntityResolver getEntityResolver();
  public void setEntityResolver(DOMEntityResolver entityResolver);

  public DOMErrorHandler getErrorHandler();
  public void setErrorHandler(DOMErrorHandler errorHandler);

  public DOMBuilderFilter getFilter();
  public void setFilter(DOMBuilderFilter filter);

  public boolean getMimeTypeCheck();
  public void setMimeTypeCheck(boolean mimeTypeCheck);

  public void setFeature(String name, boolean state)
   throws DOMException;
  public boolean supportsFeature(String name);
  public boolean canSetFeature(String name, boolean state);
  public boolean getFeature(String name) throws DOMException;
  public Document parseURI(String uri)
   throws DOMException, DOMSystemException;
  public Document parseDOMInputSource(DOMInputSource is)
   throws DOMException, DOMSystemException;

}

IDL:

  interface DOMBuilder {
    attribute DOMEntityResolver entityResolver;
    attribute DOMErrorHandler   errorHandler;
    attribute DOMBuilderFilter  filter;
    attribute boolean           mimeTypeCheck;
             
    void      setFeature(in DOMString name, 
                         in boolean state)
                          raises(DOMException);
    boolean   supportsFeature(in DOMString name);
    boolean   canSetFeature(in DOMString name, 
                            in boolean state);
    boolean   getFeature(in DOMString name)
                          raises(dom::DOMException);
    Document  parseURI(in DOMString uri)
                        raises(DOMException, DOMSystemException);
    Document  parseDOMInputSource(in DOMInputSource is)
                        raises(DOMException, DOMSystemException);
  };

DOMInputSource

Like SAX2's InputSource class, this interface is an abstraction of all the different things (streams, files, byte arrays, sockets, URLs, etc.) from which an XML document can be read.

Java Binding:

package org.w3c.dom.loadSave;

public interface DOMInputSource {

    public InputStream getByteStream();
    public void        setByteStream(InputStream in);
    public Reader      getCharacterStream();
    public void        setCharacterStream(Reader in);

    public String getEncoding();
    public void   setEncoding(String encoding);
    public String getPublicId();
    public void   setPublicId(String publicId);
    public String getSystemId();
    public void   setSystemId(String systemId);

}

IDL:

  interface DOMInputSource {
    attribute DOMInputStream  byteStream;
    attribute DOMReader       characterStream;
    attribute DOMString       encoding;
    attribute DOMString       publicId;
    attribute DOMString       systemId;
  };

DOMEntityResolver

Like SAX2's EntityResolver interface, this interface lets applications redirect references to external entities.

Java Binding:

package org.w3c.dom.loadSave;

public interface DOMEntityResolver {

    public DOMInputSource resolveEntity(String publicId, 
      String systemId ) throws DOMSystemException;

}

IDL:

  interface DOMEntityResolver {
    DOMInputSource resolveEntity(in DOMString publicId, 
                                 in DOMString systemId )
                                    raises(DOMSystemException);
  };
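
A sketch of entity redirection, under the assumption that the reader is using the finalized Load and Save API in the JDK. There the resolver role is played by LSResourceResolver, installed through the parser's "resource-resolver" configuration parameter, rather than by the DOMEntityResolver drafted here. ResolverDemo, its helper, and the example.com system identifier are all hypothetical.

```java
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSResourceResolver;

class ResolverDemo {

  // Parses xml, answering every external reference with the same
  // replacement text instead of fetching the system identifier.
  static Document parseWithCannedEntity(String xml, String replacement) throws Exception {
    DOMImplementationLS impl = (DOMImplementationLS)
        DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
    LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

    LSResourceResolver resolver = (type, namespaceURI, publicId, systemId, baseURI) -> {
      LSInput canned = impl.createLSInput();
      canned.setStringData(replacement);
      return canned;
    };
    parser.getDomConfig().setParameter("resource-resolver", resolver);

    LSInput input = impl.createLSInput();
    input.setStringData(xml);
    return parser.parse(input);
  }

  public static void main(String[] args) throws Exception {
    String xml =
        "<!DOCTYPE NOTE [<!ENTITY legal SYSTEM \"http://www.example.com/legal.txt\">]>"
        + "<NOTE>&legal;</NOTE>";
    Document note = parseWithCannedEntity(xml, "Copyright 2001");
    System.out.println(note.getDocumentElement().getTextContent());
  }
}
```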

DOMWriter

Provides an API for serializing (writing) a DOM document out as a sequence of bytes onto a stream, file, socket, byte array, etc.

Java Binding:

package org.w3c.dom.loadSave;

public interface DOMWriter {

    public String getEncoding();
    public void   setEncoding(String encoding);
    public String getLastEncoding();
    public short  getFormat();
    public void   setFormat(short format);
    public String getNewLine();
    public void   setNewLine(String newLine);
    public void   writeNode(OutputStream out, Node node)
      throws DOMSystemException;

}

IDL:

interface DOMWriter {
           attribute DOMString encoding;
  readonly attribute DOMString lastEncoding;
           attribute unsigned short format;
  // Modified in DOM Level 3:
           attribute DOMString newLine;
           
  void  writeNode(in DOMOutputStream destination, in Node node)
                     raises(DOMSystemException);
};
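
A sketch of serialization for readers with a current JDK. It assumes the finalized Load and Save API, where the serializer interface is named LSSerializer rather than DOMWriter and options such as the XML declaration are set as configuration parameters. SaveDemo and its helper methods are invented names.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

class SaveDemo {

  // Looks up an implementation that supports the Load and Save feature.
  static DOMImplementation implementation() throws Exception {
    return DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
  }

  // Serializes a node to a string without an XML declaration.
  static String serialize(Node node) throws Exception {
    DOMImplementationLS ls = (DOMImplementationLS) implementation();
    LSSerializer writer = ls.createLSSerializer();
    writer.setNewLine("\n");
    writer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
    return writer.writeToString(node);
  }

  // Builds a tiny document to serialize.
  static Document sampleSong() throws Exception {
    Document doc = implementation().createDocument(null, "SONG", null);
    Element title = doc.createElement("TITLE");
    title.setTextContent("Hot Cop");
    doc.getDocumentElement().appendChild(title);
    return doc;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(serialize(sampleSong()));
  }
}
```

To write bytes onto a stream, as writeNode() does above, the finalized API pairs LSSerializer.write() with an LSOutput that wraps the OutputStream.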

DOMBuilderFilter

Lets applications examine element nodes as they are being constructed during a parse.
As each element is examined, it may be modified or removed, or parsing may be aborted.

Java Binding:

package org.w3c.dom.loadSave;

public interface DOMBuilderFilter {

  public boolean startElement(Element element);
  public boolean endElement(Element element);

}

IDL:

  interface DOMBuilderFilter {
    boolean startElement(in Element element);
    boolean endElement(in Element element);
  };
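
A sketch of filtering during a parse, assuming the finalized Load and Save API in the JDK. There the filter interface is LSParserFilter: startElement() returns a short code instead of a boolean, and completed nodes are passed to acceptNode(). FilterDemo and DropElements are invented for illustration.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

class FilterDemo {

  // Rejects every element with the given name once it has been fully parsed.
  static final class DropElements implements LSParserFilter {
    private final String name;

    DropElements(String name) { this.name = name; }

    public short startElement(Element element) { return FILTER_ACCEPT; }

    public short acceptNode(Node node) {
      return name.equals(node.getNodeName()) ? FILTER_REJECT : FILTER_ACCEPT;
    }

    // Only element nodes are passed to acceptNode().
    public int getWhatToShow() { return NodeFilter.SHOW_ELEMENT; }
  }

  // Parses xml, leaving out all elements named unwanted.
  static Document parseWithout(String xml, String unwanted) throws Exception {
    DOMImplementationLS impl = (DOMImplementationLS)
        DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
    LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
    parser.setFilter(new DropElements(unwanted));

    LSInput input = impl.createLSInput();
    input.setStringData(xml);
    return parser.parse(input);
  }

  public static void main(String[] args) throws Exception {
    Document d = parseWithout("<SONG><TITLE>Hot Cop</TITLE><PHOTO/></SONG>", "PHOTO");
    System.out.println(d.getElementsByTagName("PHOTO").getLength());
  }
}
```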

DOMCMBuilder

A DTD parser, a schema parser, etc.

Java Binding:

package org.w3c.dom.loadSave;

public interface DOMCMBuilder extends DOMBuilder {

    public CMModel parseCMURI(String uri)
      throws DOMException, DOMSystemException;
    public CMModel parseCMInputSource(DOMInputSource is)
      throws DOMException, DOMSystemException;

}

IDL:

interface DOMCMBuilder : DOMBuilder {
  CMModel parseCMURI(in DOMString uri)
               raises(DOMException, DOMSystemException);
  CMModel parseCMInputSource(in DOMInputSource is)
               raises(DOMException, DOMSystemException);
};

DOMCMWriter

Serializes DTDs, schemas, and other content models

Java Binding:

package org.w3c.dom.loadSave;

public interface DOMCMWriter extends DOMWriter {

  public void writeCMModel(OutputStream destination, CMModel model)
   throws DOMSystemException;

}

IDL:

interface DOMCMWriter : DOMWriter {
  void writeCMModel(in DOMOutputStream destination, 
                    in CMModel model)
                  raises(DOMSystemException);
};

DocumentLS

A "mechanism by which the content of a document can be replaced with the DOM tree produced when loading a URL, or parsing a string."
An instance of the DocumentLS interface can be obtained by using binding-specific casting methods on an instance of the Document interface.

Java Binding:

package org.w3c.dom.loadSave;

import org.w3c.dom.Node;
import org.w3c.dom.DOMException;

public interface DocumentLS {

  public boolean getAsync();
  public void    setAsync(boolean async);

  public void    abort();
  public boolean load(String url);
  public boolean loadXML(String source);
  public String  saveXML(Node node) throws DOMException;

}

IDL:

interface DocumentLS {
  attribute boolean  async;
  
  void      abort();
  boolean   load(in DOMString url);
  boolean   loadXML(in DOMString source);
  DOMString saveXML(in Node node) raises(DOMException);
  
};

ParserErrorEvent

Represents an error (of what kind?) in the document being parsed

Java Binding:

package org.w3c.dom.loadSave;

public interface ParserErrorEvent {

  public int    getErrorCode();
  public int    getFilepos();
  public int    getLine();
  public int    getLinepos();
  public String getReason();
  public String getSrcText();
  public String getUrl();

}

IDL:

interface ParserErrorEvent {
  readonly attribute long      errorCode;
  readonly attribute long      filepos;
  readonly attribute long      line;
  readonly attribute long      linepos;
  readonly attribute DOMString reason;
  readonly attribute DOMString srcText;
  readonly attribute DOMString url;
};
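
A sketch of collecting parse errors with line and column positions. It does not use ParserErrorEvent, which is not available in the JDK; it assumes the DOMErrorHandler, DOMError and DOMLocator interfaces of the finalized DOM Level 3 Core instead. ErrorReportDemo and collectErrors() are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.DOMLocator;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

class ErrorReportDemo {

  // Parses xml and returns one "line:column message" string per reported problem.
  static List<String> collectErrors(String xml) throws Exception {
    DOMImplementationLS impl = (DOMImplementationLS)
        DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
    LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

    List<String> problems = new ArrayList<>();
    DOMErrorHandler handler = error -> {
      DOMLocator where = error.getLocation();
      String position = (where == null)
          ? "?:?"
          : where.getLineNumber() + ":" + where.getColumnNumber();
      problems.add(position + " " + error.getMessage());
      return true; // ask the parser to keep going if it can
    };
    parser.getDomConfig().setParameter("error-handler", handler);

    LSInput input = impl.createLSInput();
    input.setStringData(xml);
    try {
      parser.parse(input);
    } catch (LSException e) {
      // A fatal error stops the parse; the handler has already recorded it.
    }
    return problems;
  }

  public static void main(String[] args) throws Exception {
    for (String problem : collectErrors("<a><b></a>")) {
      System.out.println(problem);
    }
  }
}
```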

Grammar Access/Abstract Schemas

Abstract Schemas (AS) include DTDs, W3C XML Schema Language Schemas, TREX, and more
It should be possible to access their information without binding yourself too tightly to any one schema language

Abstract Schema Interfaces

Abstract Schema and AS-Editing Interfaces:
- ASModel
- ASExternalModel
- ASNode
- ASNodeList
- ASDOMStringList
- ASNamedNodeMap
- ASDataType
- ASPrimitiveType
- ASElementDeclaration
- ASChildren
- ASAttributeDeclaration
- ASEntityDeclaration
- ASNotationDeclaration
Validation and Other Interfaces:
- Document
- DocumentAS
- DOMImplementationAS
Document-Editing Interfaces:
- NodeAS
- ElementAS
- CharacterDataAS
- DocumentTypeAS
- AttributeAS
DOM Error Handler Interfaces:
- DOMErrorHandler
- DOMLocator

Abstract Schema and AS-Editing Interfaces

DOMImplementation.hasFeature("AS-EDIT") returns true if a given DOM supports these interfaces for editing abstract schemas:
- ASModel
- ASExternalModel
- ASNode
- ASNodeList
- ASNamedNodeMap
- ASDataType
- ASElementDeclaration
- ASChildren
- ASAttributeDeclaration
- ASEntityDeclaration
- ASNotationDeclaration

The ASModel Interface

Represents an abstract content model that could be a DTD, an XML Schema, or something else. It has both an internal and external subset.

Java binding:

package org.w3c.dom.abstractSchemas;

import org.w3c.dom.DOMException;

public interface ASModel extends ASNode {

  public boolean getIsNamespaceAware();

  public ASElementDeclaration getRootElementDecl();
  public void setRootElementDecl(ASElementDeclaration rootElementDecl);

  public String getSystemId();
  public void setSystemId(String systemId);

  public String getPublicId();
  public void setPublicId(String publicId);

  public ASNodeList getASNodes();

  public boolean removeNode(ASNode node);
  public boolean insertBefore(ASNode newNode, ASNode refNode);

  public boolean validate();

  public ASElementDeclaration createASElementDeclaration(String namespaceURI, 
   String qualifiedElementName) throws DOMException;

  public ASAttributeDeclaration createASAttributeDeclaration(String namespaceURI, 
   String qualifiedName) throws DOMException;

  public ASNotationDeclaration createASNotationDeclaration(String namespaceURI, 
   String qualifiedElementName, String systemIdentifier, String publicIdentifier)
   throws DOMException;

  public ASEntityDeclaration createASEntityDeclaration(String name)
                                                       throws DOMException;

  public ASChildren createASChildren(int minOccurs,  int maxOccurs, 
    short operator) throws DOMException;

}

IDL:

interface ASModel : ASNode {
  readonly attribute boolean               isNamespaceAware;
           attribute ASElementDeclaration  rootElementDecl;
           attribute DOMString             systemId;
           attribute DOMString             publicId;
           
  ASNodeList             getASNodes();
  boolean                removeNode(in ASNode node);
  boolean                insertBefore(in ASNode newNode, in ASNode refNode);
  boolean                validate();
  ASElementDeclaration   createASElementDeclaration(inout DOMString namespaceURI, 
                                                  in DOMString qualifiedElementName)
                                        raises(DOMException);
  ASAttributeDeclaration createASAttributeDeclaration(inout DOMString namespaceURI, 
                                                      in DOMString qualifiedName)
                                        raises(DOMException);
  ASNotationDeclaration  createASNotationDeclaration(inout DOMString namespaceURI, 
                                                    in DOMString qualifiedElementName, 
                                                    in DOMString systemIdentifier, 
                                                    inout DOMString publicIdentifier)
                                        raises(DOMException);
  ASEntityDeclaration createASEntityDeclaration(in DOMString name)
                                        raises(DOMException);
  ASChildren          createASChildren(in unsigned long minOccurs, 
                                      in unsigned long maxOccurs, 
                                      inout unsigned short operator)
                                        raises(DOMException);
};

The ASExternalModel Interface

An ASModel that is not bound to a particular document, and can thus be shared among documents.

Java binding:

package org.w3c.dom.abstractSchemas;

public interface ASExternalModel extends ASModel {
}

IDL:

  interface ASExternalModel : ASModel {
  };

The ASNode Interface

The node for the various kinds of declarations out of which ASModels are built

Java binding:

package org.w3c.dom.abstractSchemas;

public interface ASNode {

    public static final short AS_ELEMENT_DECLARATION    = 1;
    public static final short AS_ATTRIBUTE_DECLARATION  = 2;
    public static final short AS_NOTATION_DECLARATION   = 3;
    public static final short AS_ENTITY_DECLARATION     = 4;
    public static final short AS_CHILDREN               = 5;
    public static final short AS_MODEL                  = 6;
    public static final short AS_EXTERNALMODEL          = 7;

    public short   getCmNodeType();
    public ASModel getOwnerASModel();
    public void    setOwnerASModel(ASModel ownerASModel);
    public String  getNodeName();
    public void    setNodeName(String nodeName);
    public String  getPrefix();
    public void    setPrefix(String prefix);
    public String  getLocalName();
    public void    setLocalName(String localName);
    public String  getNamespaceURI();
    public void    setNamespaceURI(String namespaceURI);
    public ASNode  cloneASNode();

}

IDL:

interface ASNode {

  const unsigned short AS_ELEMENT_DECLARATION   = 1;
  const unsigned short AS_ATTRIBUTE_DECLARATION = 2;
  const unsigned short AS_NOTATION_DECLARATION  = 3;
  const unsigned short AS_ENTITY_DECLARATION    = 4;
  const unsigned short AS_CHILDREN              = 5;
  const unsigned short AS_MODEL                 = 6;
  const unsigned short AS_EXTERNALMODEL         = 7;
    
  readonly attribute unsigned short   cmNodeType;
           attribute ASModel          ownerASModel;
           attribute DOMString        nodeName;
           attribute DOMString        prefix;
           attribute DOMString        localName;
           attribute DOMString        namespaceURI;
          
  ASNode cloneASNode();
};

The ASNodeList Interface

An ordered list of the nodes in a content model

Java binding:

package org.w3c.dom.abstractSchemas;

public interface ASNodeList {

    public int getLength();
    public ASNode item(int index);

}

IDL:

interface ASNodeList {

  readonly attribute int length;
  
  ASNode item(in int index);
  
};

The ASNamedNodeMap Interface

An unordered set of AS nodes

Java binding:

package org.w3c.dom.abstractSchemas;

import org.w3c.dom.DOMASException;

public interface ASNamedNodeMap {

    public int getLength();
    public ASNode getNamedItem(String name);
    public ASNode getNamedItemNS(String namespaceURI, 
                                 String localName);
    public ASNode item(int index);
    public ASNode removeNamedItem(String name);
    public ASNode removeNamedItemNS(String namespaceURI, 
                                    String localName);
    public ASNode setNamedItem(ASNode newASNode)
                               throws DOMASException;
    public ASNode setNamedItemNS(ASNode newASNode)
                                 throws DOMASException;

}

IDL:

interface ASNamedNodeMap {
  readonly attribute int length;
  
  ASNode getNamedItem(inout DOMString name);
  ASNode getNamedItemNS(in DOMString namespaceURI, inout DOMString localName);
  ASNode item(in int index);
  ASNode removeNamedItem(in DOMString name);
  ASNode removeNamedItemNS(in DOMString namespaceURI, in DOMString localName);
  ASNode setNamedItem(inout ASNode newASNode)
    raises(DOMASException);
  ASNode setNamedItemNS(inout ASNode newASNode)
    raises(DOMASException);
};

The ASDataType Interface

Data types used in content models
This one is a little weak

Java binding:

package org.w3c.dom.abstractSchemas;

public interface ASDataType {

  public static final short STRING_DATATYPE = 1;
  public short getASPrimitiveType();

}

IDL:

interface ASDataType {
  const short STRING_DATATYPE = 1;
        short getASPrimitiveType();
};

The ASPrimitiveType Interface

Primitive data types used in content models
This one is a little weak

Java binding:

package org.w3c.dom.abstractSchemas;

import org.w3c.dom.decimal;

public interface ASPrimitiveType extends ASDataType {

  public static final short BOOLEAN_DATATYPE      = 2;
  public static final short FLOAT_DATATYPE        = 3;
  public static final short DOUBLE_DATATYPE       = 4;
  public static final short DECIMAL_DATATYPE      = 5;
  public static final short HEXBINARY_DATATYPE    = 6;
  public static final short BASE64BINARY_DATATYPE = 7;
  public static final short ANYURI_DATATYPE       = 8;
  public static final short QNAME_DATATYPE        = 9;
  public static final short DURATION_DATATYPE     = 10;
  public static final short DATETIME_DATATYPE     = 11;
  public static final short DATE_DATATYPE         = 12;
  public static final short TIME_DATATYPE         = 13;
  public static final short YEARMONTH_DATATYPE    = 14;
  public static final short YEAR_DATATYPE         = 15;
  public static final short MONTHDAY_DATATYPE     = 16;
  public static final short DAY_DATATYPE          = 17;
  public static final short MONTH_DATATYPE        = 18;
  public static final short NOTATION_DATATYPE     = 19;
  public decimal getLowValue();
  public void setLowValue(decimal lowValue);

  public decimal getHighValue();
  public void setHighValue(decimal highValue);

}

IDL:

interface ASPrimitiveType : ASDataType {
  const short BOOLEAN_DATATYPE      = 2;
  const short FLOAT_DATATYPE        = 3;
  const short DOUBLE_DATATYPE       = 4;
  const short DECIMAL_DATATYPE      = 5;
  const short HEXBINARY_DATATYPE    = 6;
  const short BASE64BINARY_DATATYPE = 7;
  const short ANYURI_DATATYPE       = 8;
  const short QNAME_DATATYPE        = 9;
  const short DURATION_DATATYPE     = 10;
  const short DATETIME_DATATYPE     = 11;
  const short DATE_DATATYPE         = 12;
  const short TIME_DATATYPE         = 13;
  const short YEARMONTH_DATATYPE    = 14;
  const short YEAR_DATATYPE         = 15;
  const short MONTHDAY_DATATYPE     = 16;
  const short DAY_DATATYPE          = 17;
  const short MONTH_DATATYPE        = 18;
  const short NOTATION_DATATYPE     = 19;
  
  attribute decimal lowValue;
  attribute decimal highValue;
};

The ASElementDeclaration Interface

Represents a declaration of an element such as <!ELEMENT TIME (#PCDATA)> or an xsd:element schema element

Java binding:

package org.w3c.dom.abstractSchemas;

import org.w3c.dom.DOMASException;

public interface ASElementDeclaration extends ASNode {

  public static final short EMPTY_CONTENTTYPE         = 1;
  public static final short ANY_CONTENTTYPE           = 2;
  public static final short MIXED_CONTENTTYPE         = 3;
  public static final short ELEMENTS_CONTENTTYPE      = 4;
  public boolean getStrictMixedContent();
  public void setStrictMixedContent(boolean strictMixedContent);

  public ASDataType getElementType();
  public void setElementType(ASDataType elementType);
  public boolean getIsPCDataOnly();
  public void   setIsPCDataOnly(boolean isPCDataOnly);
  public short  getContentType();
  public void   setContentType(short contentType);
  public String getTagName();
  public void   setTagName(String tagName);
  public ASChildren getASChildren();
  public void setASChildren(ASChildren elementContent)
                            throws DOMASException;
  public ASNamedNodeMap getASAttributeDecls();
  public void setASAttributeDecls(ASNamedNodeMap attributes);
  public void addASAttributeDecl(ASAttributeDeclaration attributeDecl);
  public ASAttributeDeclaration removeASAttributeDecl(ASAttributeDeclaration attributeDecl);

}

IDL:

 interface ASElementDeclaration : ASNode {
  const short EMPTY_CONTENTTYPE    = 1;
  const short ANY_CONTENTTYPE      = 2;
  const short MIXED_CONTENTTYPE    = 3;
  const short ELEMENTS_CONTENTTYPE = 4;
  
  attribute boolean    strictMixedContent;
  attribute ASDataType elementType;
  attribute boolean    isPCDataOnly;
  attribute short      contentType;
  attribute DOMString  tagName;
  
  ASChildren         getASChildren();
  void               setASChildren(inout ASChildren elementContent)
                                        raises(DOMASException);
  ASNamedNodeMap     getASAttributeDecls();
  void               setASAttributeDecls(inout ASNamedNodeMap attributes);
  void               addASAttributeDecl(in ASAttributeDeclaration attributeDecl);
  ASAttributeDeclaration removeASAttributeDecl(in ASAttributeDeclaration attributeDecl);
};

The ASChildren Interface

Represents the list of child elements in a content model of an element declaration

Java binding:

package org.w3c.dom.abstractSchemas;

public interface ASChildren extends ASNode {

  public static final int UNBOUNDED  = Integer.MAX_VALUE; // MAX_LONG in the IDL
  public static final short NONE     = 0;
  public static final short SEQUENCE = 1;
  public static final short CHOICE   = 2;
  
  public short getListOperator();
  public void  setListOperator(short listOperator);
  public int   getMinOccurs();
  public void  setMinOccurs(int minOccurs);
  public int   getMaxOccurs();
  public void  setMaxOccurs(int maxOccurs);
  public ASNodeList getSubModels();
  public void       setSubModels(ASNodeList subModels);
  public ASNode removeASNode(int nodeIndex);
  public int insertASNode(int nodeIndex, ASNode newNode);
  public int appendASNode(ASNode newNode);

}

IDL:

interface ASChildren : ASNode {

  const unsigned long   UNBOUNDED = MAX_LONG;
  const unsigned short  NONE      = 0;
  const unsigned short  SEQUENCE  = 1;
  const unsigned short  CHOICE    = 2;
  
  attribute unsigned short   listOperator;
  attribute unsigned long    minOccurs;
  attribute unsigned long    maxOccurs;
  attribute ASNodeList       subModels;
  
  ASNode removeASNode(in unsigned long nodeIndex);
  int    insertASNode(in unsigned long nodeIndex, 
                                  in ASNode newNode);
  int    appendASNode(in ASNode newNode);
};
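
To make the listOperator, minOccurs and maxOccurs fields concrete, here is a self-contained sketch that models a content model and prints it in DTD syntax. It does not implement ASChildren. Particle is a hypothetical, much simplified stand-in defined only for this example, and the constants merely mirror the ones on this slide.

```java
import java.util.ArrayList;
import java.util.List;

class ContentModelSketch {

  // Stand-ins for the ASChildren constants.
  static final int UNBOUNDED = Integer.MAX_VALUE;
  static final short NONE = 0, SEQUENCE = 1, CHOICE = 2;

  // A particle is either a named element (NONE) or a SEQUENCE/CHOICE group.
  static final class Particle {
    final String name;  // element name, or null for a group
    final short listOperator;
    final int minOccurs;
    final int maxOccurs;
    final List<Particle> subModels = new ArrayList<>();

    private Particle(String name, short listOperator, int minOccurs, int maxOccurs) {
      this.name = name;
      this.listOperator = listOperator;
      this.minOccurs = minOccurs;
      this.maxOccurs = maxOccurs;
    }

    static Particle element(String name, int min, int max) {
      return new Particle(name, NONE, min, max);
    }

    static Particle group(short operator, int min, int max, Particle... parts) {
      Particle g = new Particle(null, operator, min, max);
      for (Particle p : parts) g.subModels.add(p);
      return g;
    }

    // Renders the particle in DTD content-model syntax.
    String toDtd() {
      StringBuilder sb = new StringBuilder();
      if (listOperator == NONE) {
        sb.append(name);
      } else {
        String separator = (listOperator == SEQUENCE) ? ", " : " | ";
        sb.append('(');
        for (int i = 0; i < subModels.size(); i++) {
          if (i > 0) sb.append(separator);
          sb.append(subModels.get(i).toDtd());
        }
        sb.append(')');
      }
      return sb.append(occurrenceSuffix()).toString();
    }

    // Maps (minOccurs, maxOccurs) onto the DTD suffixes ?, * and +.
    private String occurrenceSuffix() {
      if (minOccurs == 1 && maxOccurs == 1) return "";
      if (minOccurs == 0 && maxOccurs == 1) return "?";
      if (minOccurs == 0 && maxOccurs == UNBOUNDED) return "*";
      if (minOccurs == 1 && maxOccurs == UNBOUNDED) return "+";
      throw new IllegalStateException(
          "not expressible in a DTD: " + minOccurs + ".." + maxOccurs);
    }
  }

  public static void main(String[] args) {
    Particle song = Particle.group(SEQUENCE, 1, 1,
        Particle.element("TITLE", 1, 1),
        Particle.element("COMPOSER", 1, UNBOUNDED),
        Particle.element("PRODUCER", 0, UNBOUNDED));
    // prints <!ELEMENT SONG (TITLE, COMPOSER+, PRODUCER*)>
    System.out.println("<!ELEMENT SONG " + song.toDtd() + ">");
  }
}
```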

The ASAttributeDeclaration Interface

Represents a declaration of an attribute; e.g. an xsd:attribute schema element or
<!ATTLIST TIME HOURS CDATA #IMPLIED>

Java binding:

package org.w3c.dom.abstractSchemas;

public interface ASAttributeDeclaration extends ASNode {

    public static final short NO_VALUE_CONSTRAINT       = 0;
    public static final short DEFAULT_VALUE_CONSTRAINT  = 1;
    public static final short FIXED_VALUE_CONSTRAINT    = 2;
    public String getAttrName();
    public void setAttrName(String attrName);

    public ASDataType getAttrType();
    public void setAttrType(ASDataType attrType);

    public String getAttributeValue();
    public void   setAttributeValue(String attributeValue);

    public String getEnumAttr();
    public void setEnumAttr(String enumAttr);

    public ASNodeList getOwnerElement();
    public void setOwnerElement(ASNodeList ownerElement);

    public short getConstraintType();
    public void  setConstraintType(short constraintType);

}

IDL:

interface ASAttributeDeclaration : ASNode {

  const short NO_VALUE_CONSTRAINT      = 0;
  const short DEFAULT_VALUE_CONSTRAINT = 1;
  const short FIXED_VALUE_CONSTRAINT   = 2;
  
  attribute DOMString  attrName;
  attribute ASDataType attrType;
  attribute DOMString  attributeValue;
  attribute DOMString  enumAttr;
  attribute ASNodeList ownerElement;
  attribute short      constraintType;
};

The ASEntityDeclaration Interface

Represents a declaration of an entity; e.g.
<!ENTITY COPY01 "Copyright 2001 Elliotte Harold">

Java binding:

package org.w3c.dom.abstractSchemas;

public interface ASEntityDeclaration extends ASNode {

    public static final short INTERNAL_ENTITY = 1;
    public static final short EXTERNAL_ENTITY = 2;
    
    public short  getEntityType();
    public void   setEntityType(short entityType);
    public String getEntityName();
    public void   setEntityName(String entityName);
    public String getEntityValue();
    public void   setEntityValue(String entityValue);
    public String getSystemId();
    public void   setSystemId(String systemId);
    public String getPublicId();
    public void   setPublicId(String publicId);
    public String getNotationName();
    public void   setNotationName(String notationName);

}

IDL:

interface ASEntityDeclaration : ASNode {

  const short INTERNAL_ENTITY = 1;
  const short EXTERNAL_ENTITY = 2;
  
  attribute short      entityType;
  attribute DOMString  entityName;
  attribute DOMString  entityValue;
  attribute DOMString  systemId;
  attribute DOMString  publicId;
  attribute DOMString  notationName;
};

The ASNotationDeclaration Interface

Represents a declaration of a notation; e.g.
<!NOTATION TXT SYSTEM "text/plain">

Java binding:

package org.w3c.dom.abstractSchemas;

public interface ASNotationDeclaration extends ASNode {

  public String getNotationName();
  public void   setNotationName(String notationName);
  public String getSystemId();
  public void   setSystemId(String systemId);
  public String getPublicId();
  public void   setPublicId(String publicId);

}

IDL:

interface ASNotationDeclaration : ASNode {

  attribute DOMString notationName;
  attribute DOMString systemId;
  attribute DOMString publicId;
  
};

Validation and Other Interfaces:

Document
DocumentAS
DOMImplementationAS

The Document Interface

The DOM2 Document interface gets a new setErrorHandler() method

Java binding:

package org.w3c.dom.contentModel;

public interface Document {

    public void setErrorHandler(DOMErrorHandler handler);

}

IDL:

  interface Document {
    void setErrorHandler(in DOMErrorHandler handler);
  };

The different specs aren't synced up on this one yet.

The DocumentAS Interface

Extends the Document interface with additional methods for both document and abstract schema editing.

Java binding:

package org.w3c.dom.abstractSchemas;

public interface DocumentAS extends Document {

    public boolean getContinuousValidityChecking();
    public void setContinuousValidityChecking(boolean continuousValidityChecking);

    public int numASs();
    public ASModel getInternalAS();
    public ASNodeList getASs();
    public ASModel getActiveAS();
    public void addAS(ASModel cm);
    public void removeAS(ASModel cm);
    public boolean activateAS(ASModel cm);

}

IDL:

interface DocumentAS : Document {
  attribute boolean          continuousValidityChecking;
  
  int        numASs();
  ASModel    getInternalAS();
  ASNodeList getASs();
  ASModel    getActiveAS();
  void       addAS(in ASModel cm);
  void       removeAS(in ASModel cm);
  boolean    activateAS(in ASModel cm);
};

The DOMImplementationAS Interface

Extends the DOM2 DOMImplementation interface with factory methods to create schema documents

Java binding:

package org.w3c.dom.abstractSchemas;

public interface DOMImplementationAS extends DOMImplementation {

    public ASModel createAS();
    public ASExternalModel createExternalAS();

}

IDL:

interface DOMImplementationAS : DOMImplementation {

  ASModel         createAS();
  ASExternalModel createExternalAS();
};

Schema-guided Document-Editing Interfaces:

Allows you to determine whether or not it's valid to add or delete a node at a particular position in a document. This is called guided document editing.
DOMImplementation.hasFeature("AS-DOC") returns true if a given DOM supports these capabilities.
- NodeAS
- ElementAS
- CharacterDataAS
- DocumentTypeAS
- AttributeAS

The NodeAS Interface

Extends the DOM2 Node interface with methods for guided document editing.

Java binding:

package org.w3c.dom.abstractSchemas;

public interface NodeAS extends Node {

  public static final short WF_CHECK               = 1;
  public static final short NS_WF_CHECK            = 2;
  public static final short PARTIAL_VALIDITY_CHECK = 3;
  public static final short STRICT_VALIDITY_CHECK  = 4;
  
  public short getWfValidityCheckLevel();
  public void setWfValidityCheckLevel(short wfValidityCheckLevel);

  public boolean canInsertBefore(Node newChild, Node refChild) 
   throws DOMException;
  public boolean canRemoveChild(Node oldChild)
   throws DOMException;
  public boolean canReplaceChild(Node newChild, Node oldChild)
   throws DOMException;
  public boolean canAppendChild(Node newChild)
   throws DOMException;
  public boolean isValid(boolean deep) throws DOMException;

}

IDL:

interface NodeAS : Node {

  const short WF_CHECK               = 1;
  const short NS_WF_CHECK            = 2;
  const short PARTIAL_VALIDITY_CHECK = 3;
  const short STRICT_VALIDITY_CHECK  = 4;
  
  attribute short wfValidityCheckLevel;
  
  boolean canInsertBefore(in Node newChild, in Node refChild) 
   raises(DOMException);
  boolean canRemoveChild(in Node oldChild) raises(DOMException);
  boolean canReplaceChild(in Node newChild, in Node oldChild)
   raises(DOMException);
  boolean canAppendChild(in Node newChild)  raises(DOMException);
  boolean isValid(in boolean deep) raises(DOMException);
};
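
The idea behind all the can...() methods is to ask before you mutate. Here is a rough sketch of that pattern using only DOM2 calls. NodeAS itself is not used: the ALLOWED table is a hypothetical stand-in for a real schema, and GuidedAppend, canAppendChild(), and appendIfAllowed() are illustrative names rather than the draft's API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class GuidedAppend {

  // Hypothetical stand-in for a schema: parent name -> allowed child names
  static final Map<String, Set<String>> ALLOWED = new HashMap<>();
  static {
    ALLOWED.put("SONG", new HashSet<>(Arrays.asList("TITLE", "COMPOSER")));
  }

  // Plays the role of canAppendChild(): it answers but does not modify
  static boolean canAppendChild(Element parent, Element child) {
    Set<String> names = ALLOWED.get(parent.getTagName());
    return names != null && names.contains(child.getTagName());
  }

  // Appends only when the check passes; reports whether it did
  static boolean appendIfAllowed(Element parent, Element child) {
    if (!canAppendChild(parent, child)) return false;
    parent.appendChild(child);
    return true;
  }

  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance()
       .newDocumentBuilder().newDocument();
    }
    catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Document doc = newDocument();
    Element song = doc.createElement("SONG");
    doc.appendChild(song);
    System.out.println(appendIfAllowed(song, doc.createElement("TITLE")));
    System.out.println(appendIfAllowed(song, doc.createElement("PRICE")));
  }

}
```

This prints true and then false; the PRICE element is never attached, so the tree is never left in a state the table forbids.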

The ElementAS Interface

Extends the DOM2 Element interface with methods for guided document editing.

Java binding:

package org.w3c.dom.abstractSchemas;

import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.DOMException;

public interface ElementAS extends Element {

  public short contentType();
  public ASElementDeclaration getElementDeclaration()
   throws DOMException;
  public boolean canSetAttribute(String attrname, String attrval);
  public boolean canSetAttributeNode(Node node);
  public boolean canSetAttributeNodeNS(Node node);
  public boolean canSetAttributeNS(String attrname, 
   String attrval, String namespaceURI, String localName);
  public boolean canRemoveAttribute(String attrname);
  public boolean canRemoveAttributeNS(String attrname, String namespaceURI);
  public boolean canRemoveAttributeNode(Node node);
  public ASDOMStringList getChildElements();
  public ASDOMStringList getParentElements();
  public ASDOMStringList getAttributeList();

}

IDL:

interface ElementAS : Element {

  short                contentType();
  ASElementDeclaration getElementDeclaration() raises(DOMException);
  boolean            canSetAttribute(in DOMString attrname, 
                                     in DOMString attrval);
  boolean            canSetAttributeNode(in Node node);
  boolean            canSetAttributeNodeNS(in Node node);
  boolean            canSetAttributeNS(in DOMString attrname, 
                                       in DOMString attrval, 
                                       in DOMString namespaceURI, 
                                       in DOMString localName);
  boolean            canRemoveAttribute(in DOMString attrname);
  boolean            canRemoveAttributeNS(in DOMString attrname, 
                                          inout DOMString namespaceURI);
  boolean            canRemoveAttributeNode(in Node node);
  ASDOMStringList    getChildElements();
  ASDOMStringList    getParentElements();
  ASDOMStringList    getAttributeList();
};

The CharacterDataAS Interface

Extends the DOM2 CharacterData interface (the parent of the DOM2 Text interface) with methods for guided document editing.

Java binding:

package org.w3c.dom.abstractSchemas;

public interface CharacterDataAS extends CharacterData {

  public boolean isWhitespaceOnly();

  public boolean canSetData(int offset, String arg)
   throws DOMException;
  public boolean canAppendData(String arg)
   throws DOMException;
  public boolean canReplaceData(int offset, int count, String arg)
   throws DOMException;
  public boolean canInsertData(int offset, String arg)
   throws DOMException;
  public boolean canDeleteData(int offset, String arg)
   throws DOMException;

}

IDL:

interface CharacterDataAS : CharacterData {
  boolean isWhitespaceOnly();
  boolean canSetData(in unsigned long offset, in DOMString arg)
   raises(DOMException);
  boolean canAppendData(in DOMString arg) raises(DOMException);
  boolean canReplaceData(in unsigned long offset, 
   in unsigned long count, in DOMString arg)
   raises(DOMException);
  boolean canInsertData(in unsigned long offset, in DOMString arg)
   raises(DOMException);
  boolean canDeleteData(in unsigned long offset, in DOMString arg)
   raises(DOMException);
};

The DocumentTypeAS Interface

Extends the DOM2 DocumentType interface with methods for guided document editing.

Java binding:

package org.w3c.dom.abstractSchemas;

public interface DocumentTypeAS extends DocumentType {

    public ASDOMStringList getDefinedElementTypes();

    public boolean isElementDefined(String elemTypeName);

    public boolean isElementDefinedNS(String elemTypeName, 
     String namespaceURI, String localName);

    public boolean isAttributeDefined(String elemTypeName, 
     String attrName);

    public boolean isAttributeDefinedNS(String elemTypeName, 
     String attrName, String namespaceURI, String localName);
     
    public boolean isEntityDefined(String entName);

}

IDL:

interface DocumentTypeAS : DocumentType {

  readonly attribute ASDOMStringList  definedElementTypes;
  boolean            isElementDefined(in DOMString elemTypeName);
  boolean            isElementDefinedNS(in DOMString elemTypeName, 
                                        in DOMString namespaceURI, 
                                        in DOMString localName);
  boolean            isAttributeDefined(in DOMString elemTypeName, 
                                        in DOMString attrName);
  boolean            isAttributeDefinedNS(in DOMString elemTypeName, 
                                          in DOMString attrName, 
                                          in DOMString namespaceURI, 
                                          in DOMString localName);
  boolean            isEntityDefined(in DOMString entName);
  
};

The AttributeAS Interface

Extends the DOM2 Attr interface with methods for guided document editing.

Java binding:

package org.w3c.dom.abstractSchemas;

import org.w3c.dom.DOMException;
import org.w3c.dom.Attr;

public interface AttributeAS extends Attr {

  public ASAttributeDeclaration getAttributeDeclaration();
  public ASNotationDeclaration getNotation() throws DOMException;

}

IDL:

interface AttributeAS : Attr {
  ASAttributeDeclaration getAttributeDeclaration();
  ASNotationDeclaration getNotation() raises(DOMException);
};

DOM Error Handler Interfaces

DOMErrorHandler
DOMLocator

The DOMErrorHandler Interface

Similar to SAX2's ErrorHandler interface.
A callback interface
An application implements this interface and then registers it with the setErrorHandler() method to receive warnings, errors, and fatal errors.

Java binding:

package org.w3c.dom.contentModel;

public interface DOMErrorHandler {

    public void warning(DOMLocator where, String how,  String why)
      throws DOMSystemException;
    public void fatalError(DOMLocator where, String how, String why)
      throws DOMSystemException;
    public void error(DOMLocator where, String how, String why)
      throws DOMSystemException;

}

IDL:

  interface DOMErrorHandler {
    void warning(in DOMLocator where, 
                 in DOMString how, 
                 in DOMString why)
                    raises(dom::DOMSystemException);
    void fatalError(in DOMLocator where, 
                    in DOMString how, 
                    in DOMString why)
                       raises(dom::DOMSystemException);
    void error(in DOMLocator where, 
               in DOMString how, 
               in DOMString why)
                  raises(dom::DOMSystemException);
  };
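
Since the draft interface is modeled on SAX2's ErrorHandler, the SAX2 original gives a feel for how such a callback is used. This sketch uses org.xml.sax.ErrorHandler with a JAXP parser, not the draft DOMErrorHandler; CollectingErrorHandler and check() are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class CollectingErrorHandler implements ErrorHandler {

  final List<String> messages = new ArrayList<>();

  public void warning(SAXParseException e)    { record("warning", e); }
  public void error(SAXParseException e)      { record("error", e); }
  public void fatalError(SAXParseException e) { record("fatal", e); }

  private void record(String level, SAXParseException e) {
    messages.add(level + " at line " + e.getLineNumber()
     + ": " + e.getMessage());
  }

  // Parses the text and returns whatever the parser reported
  static List<String> check(String xml) {
    CollectingErrorHandler handler = new CollectingErrorHandler();
    try {
      DocumentBuilder builder
       = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      builder.setErrorHandler(handler);
      builder.parse(new InputSource(new StringReader(xml)));
    }
    catch (Exception e) {
      // The parser still gives up after a fatal error;
      // by then the handler has already recorded it
    }
    return handler.messages;
  }

  public static void main(String[] args) {
    System.out.println(check("<GREETING>Hello</GREETING>"));
    System.out.println(check("<GREETING>Hello</GRETING>"));
  }

}
```

The first document produces an empty list. The second, with its mismatched end-tag, produces one fatal error.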

The DOMLocator Interface

Similar to SAX2's Locator interface. An application can implement this interface and then register it with the setLocator() method to find out the line, column, and file in which a given node appears.

Java binding:

package org.w3c.dom.contentModel;

public interface DOMLocator {

    public int    getColumnNumber();
    public int    getLineNumber();
    public String getPublicID();
    public String getSystemID();
    public Node   getNode();

}

IDL:

  interface DOMLocator {
    int        getColumnNumber();
    int        getLineNumber();
    DOMString  getPublicID();
    DOMString  getSystemID();
    Node       getNode();
  };
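
For comparison, this is roughly how the SAX2 Locator that DOMLocator imitates is used today. It is a sketch with org.xml.sax.Locator and a JAXP SAX parser, not the draft interface; ElementLines and locate() are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

class ElementLines extends DefaultHandler {

  private Locator locator;
  final List<String> report = new ArrayList<>();

  // The parser hands over its locator before startDocument()
  public void setDocumentLocator(Locator locator) {
    this.locator = locator;
  }

  public void startElement(String uri, String localName,
   String qName, Attributes atts) {
    // The locator reports the position where the start-tag ends
    report.add(qName + "@" + locator.getLineNumber());
  }

  static List<String> locate(String xml) {
    ElementLines handler = new ElementLines();
    try {
      SAXParserFactory.newInstance().newSAXParser()
       .parse(new InputSource(new StringReader(xml)), handler);
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return handler.report;
  }

  public static void main(String[] args) {
    System.out.println(locate("<SONG>\n  <TITLE>Hot Cop</TITLE>\n</SONG>"));
  }

}
```

This prints [SONG@1, TITLE@2]. The SAX locator is only meaningful during a callback, whereas the draft DOMLocator also carries the node itself through getNode().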

To Learn More

Document Object Model (DOM) Level 3 Content Models and Load and Save Specification: http://www.w3.org/TR/DOM-Level-3-CMLS/
Document Object Model (DOM) Level 3 Core Specification Version 1.0: http://www.w3.org/TR/DOM-Level-3-Core
Document Object Model (DOM) Requirements: http://www.w3.org/TR/DOM-Requirements/
Document Object Model (DOM) Level 3 Views and Formatting Specification: http://www.w3.org/TR/DOM-Level-3-Views/

Part V: JDOM

There is no compelling reason for a Java API to manipulate XML to be complex, tricky, unintuitive, or a pain in the neck.

--JDOM Mission Statement

Where we're going

Writing XML with JDOM
Reading XML through JDOM
The JDOM Classes

What is JDOM?

A Pure Java API for reading and writing XML Documents
A Java-oriented API for reading and writing XML Documents
A tree-oriented API for reading and writing XML Documents
A parser independent API for reading and writing XML Documents

About JDOM

Created by Brett McLaughlin and Jason Hunter. (James Duncan Davidson is an unindicted coconspirator.)
Alex Chaffee, Alex Rosen, Jools Enticknap, and Philip Nelson are also major contributors.
Open source with an Apache-like license
http://www.jdom.org/

JDOM versions

1.0 Beta 7, from June 2001, is the current tarball
The last couple of months have added some functionality to the API
This presentation is based on the June 22, 2001 CVS version
cvs.jdom.org

Five packages:

org.jdom: the classes that represent an XML document and its parts
org.jdom.input: classes for reading a document into memory
org.jdom.output: classes for writing a document onto a stream or other target (e.g. SAX or DOM app)
org.jdom.adapters: classes for hooking up to DOM implementations
org.jdom.transform: XSLT support

The org.jdom package

The classes that represent an XML document and its parts

Attribute
Comment
DocType
Document
Element
Text (incomplete)
CDATA (may be going away)
EntityRef
ProcessingInstruction
plus Verifier
plus assorted exceptions

The org.jdom.input package

Classes for reading a document into memory from a file or other source

DOMBuilder
SAXBuilder
BuilderErrorHandler
DefaultJDOMFactory
SAXHandler

The org.jdom.output package

The classes for writing a document to a file or other target

XMLOutputter
SAXOutputter
DOMOutputter

The org.jdom.adapters package

Classes for hooking up JDOM to DOM implementations:
- AbstractDOMAdapter
- OracleV1DOMAdapter
- OracleV2DOMAdapter
- ProjectXDOMAdapter
- XercesDOMAdapter
- JAXPDOMAdapter
- CrimsonDOMAdapter
- XML4JDOMAdapter
You rarely need to access these directly.

The org.jdom.transform package

Classes for XSLT support:

JDOMResult
JDOMSource

Writing XML Documents with JDOM

JDOM is for both input and output
New documents can be read from a stream or constructed in memory
An org.jdom.output.XMLOutputter sends a document from memory to an OutputStream or Writer
A JDOM document can also be sent to a SAX ContentHandler or DOM org.w3c.dom.Document for further processing with a different API

A JDOM program that writes this XML document

<?xml version="1.0"?>
<GREETING>
  Hello JDOM!
</GREETING>

Hello JDOM

import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class HelloJDOM {

  public static void main(String[] args) {
   
    Element root = new Element("GREETING");
    	
    root.setText("Hello JDOM!");
         
    Document doc = new Document(root);      
    
    // At this point the document only exists in memory.
    // We still need to serialize it
    XMLOutputter outputter = new XMLOutputter();
    try {
      outputter.output(doc, System.out);       
    }
    catch (Exception e) {
      System.err.println(e);
    }

  }

}

Actual Output

<?xml version="1.0" encoding="UTF-8"?>
<GREETING>Hello JDOM!</GREETING>

This is more or less what we wanted, modulo white space.

Hello JDOM 2

With white space:

import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class HelloJDOM2 {

  public static void main(String[] args) {
   
    Element root = new Element("GREETING");
    	
    root.setText("\r\n  Hello JDOM!\r\n");
         
    Document doc = new Document(root);      
    
    // At this point the document only exists in memory.
    // We still need to serialize it
    XMLOutputter outputter = new XMLOutputter();
    try {
      outputter.output(doc, System.out);       
    }
    catch (Exception e) {
      System.err.println(e);
    }

  }

}

Hello DOM

Here's the same program using DOM instead of JDOM. Which is simpler?

import java.io.*;
import org.w3c.dom.*;
import org.apache.xerces.dom.*;
import org.apache.xml.serialize.*;


public class HelloDOM {

  public static void main(String[] args) {

    try {

      DOMImplementation impl = DOMImplementationImpl.getDOMImplementation();
      //                       ^^^^^^^^^^^^^^^^^^^^^
      //                       Xerces Specific class

      Document hello = impl.createDocument(null, "GREETING", null);
      //                                   ^^^^              ^^^^
      //                               Namespace URI       DocType

      Element root = hello.getDocumentElement();

      // We can't use a raw string. Instead we have to first create
      // a text node.
      Text text = hello.createTextNode("Hello DOM!");
      root.appendChild(text);

      // Now that the document is created we need to *serialize* it
      try {
        OutputFormat format = new OutputFormat(hello);
        XMLSerializer serializer = new XMLSerializer(System.out, format);
        serializer.serialize(root);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
    catch (DOMException e) {
      e.printStackTrace();
    }

  }

}
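
The Xerces-specific classes can be avoided by going through JAXP, at the cost of a little more ceremony. In this sketch DocumentBuilderFactory locates the DOMImplementation and an identity Transformer does the serializing. It assumes a JAXP 1.1 or later implementation on the class path; HelloJAXP and build() are illustrative names.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;

class HelloJAXP {

  static String build() {
    try {
      // No parser-specific class names appear anywhere
      DOMImplementation impl = DocumentBuilderFactory.newInstance()
       .newDocumentBuilder().getDOMImplementation();
      Document hello = impl.createDocument(null, "GREETING", null);
      hello.getDocumentElement()
       .appendChild(hello.createTextNode("Hello DOM!"));

      // An identity transform doubles as a serializer
      Transformer serializer
       = TransformerFactory.newInstance().newTransformer();
      serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      serializer.transform(new DOMSource(hello), new StreamResult(out));
      return out.toString();
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(build());
  }

}
```

The result is portable across parsers but no shorter than the Xerces version, and still well short of JDOM's three lines of document construction.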

fibonacci.xml

Suppose we want data in an XML document that looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<Fibonacci_Numbers>
  <fibonacci index="0">0</fibonacci>
  <fibonacci index="1">1</fibonacci>
  <fibonacci index="2">1</fibonacci>
  <fibonacci index="3">2</fibonacci>
  <fibonacci index="4">3</fibonacci>
  <fibonacci index="5">5</fibonacci>
  <fibonacci index="6">8</fibonacci>
  <fibonacci index="7">13</fibonacci>
  <fibonacci index="8">21</fibonacci>
  <fibonacci index="9">34</fibonacci>
  <fibonacci index="10">55</fibonacci>
  <fibonacci index="11">89</fibonacci>
  <fibonacci index="12">144</fibonacci>
  <fibonacci index="13">233</fibonacci>
  <fibonacci index="14">377</fibonacci>
  <fibonacci index="15">610</fibonacci>
  <fibonacci index="16">987</fibonacci>
  <fibonacci index="17">1597</fibonacci>
  <fibonacci index="18">2584</fibonacci>
  <fibonacci index="19">4181</fibonacci>
  <fibonacci index="20">6765</fibonacci>
  <fibonacci index="21">10946</fibonacci>
  <fibonacci index="22">17711</fibonacci>
  <fibonacci index="23">28657</fibonacci>
  <fibonacci index="24">46368</fibonacci>
  <fibonacci index="25">75025</fibonacci>
</Fibonacci_Numbers>

A JDOM program that writes Fibonacci numbers into an XML file

import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;


public class FibonacciJDOM {

  public static void main(String[] args) {

    Element root = new Element("Fibonacci_Numbers");

    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;

    for (int i = 0; i <= 25; i++) {
      Element fibonacci = new Element("fibonacci");
      fibonacci.addAttribute("index", String.valueOf(i));
      fibonacci.setText(low.toString());
      root.addContent(fibonacci);

      BigInteger temp = high;
      high = high.add(low);
      low = temp;
    }

    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("fibonacci.xml");
      XMLOutputter serializer = new XMLOutputter();
      serializer.output(doc, out);
      out.flush();
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

Output

Again, modulo white space this is correct

<?xml version="1.0" encoding="UTF-8"?>
<Fibonacci_Numbers><fibonacci index="0">0</fibonacci><fibonacci index="1">1</fibonacci><fibonacci index="2">1</fibonacci><fibonacci index="3">2</fibonacci><fibonacci index="4">3</fibonacci><fibonacci index="5">5</fibonacci><fibonacci index="6">8</fibonacci><fibonacci index="7">13</fibonacci><fibonacci index="8">21</fibonacci><fibonacci index="9">34</fibonacci><fibonacci index="10">55</fibonacci><fibonacci index="11">89</fibonacci><fibonacci index="12">144</fibonacci><fibonacci index="13">233</fibonacci><fibonacci index="14">377</fibonacci><fibonacci index="15">610</fibonacci><fibonacci index="16">987</fibonacci><fibonacci index="17">1597</fibonacci><fibonacci index="18">2584</fibonacci><fibonacci index="19">4181</fibonacci><fibonacci index="20">6765</fibonacci><fibonacci index="21">10946</fibonacci><fibonacci index="22">17711</fibonacci><fibonacci index="23">28657</fibonacci><fibonacci index="24">46368</fibonacci><fibonacci index="25">75025</fibonacci></Fibonacci_Numbers>
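
If the white space matters, one option that does not depend on any particular JDOM release is to run the serialized document through a JAXP identity transform with indenting turned on. This is a sketch of that alternative, not of XMLOutputter's own formatting options; Indenter and indent() are hypothetical names, and how far each level is indented varies between transformer implementations.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class Indenter {

  // Copies the document unchanged except for added line breaks
  static String indent(String xml) {
    try {
      Transformer t = TransformerFactory.newInstance().newTransformer();
      t.setOutputProperty(OutputKeys.INDENT, "yes");
      StringWriter out = new StringWriter();
      t.transform(new StreamSource(new StringReader(xml)),
       new StreamResult(out));
      return out.toString();
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(indent("<Fibonacci_Numbers>"
     + "<fibonacci index=\"0\">0</fibonacci>"
     + "<fibonacci index=\"1\">1</fibonacci>"
     + "</Fibonacci_Numbers>"));
  }

}
```

Because Fibonacci_Numbers contains only child elements, the added white space does not change the data.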

Suppose you want to include a DTD

Suppose we have this DTD at the relative URL fibonacci.dtd:

<!ELEMENT Fibonacci_Numbers (fibonacci*)>
<!ELEMENT fibonacci (#PCDATA)>
<!ATTLIST fibonacci index CDATA #IMPLIED>

We need this DOCTYPE declaration:

<!DOCTYPE Fibonacci_Numbers SYSTEM "fibonacci.dtd">

ValidFibonacci

Use the DocType class to insert a document type declaration
JDOM does not support internal DTD subsets.
JDOM does not let you output a DTD.

import java.math.*;
import java.io.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class ValidFibonacci {

  public static void main(String[] args) {
   
    Element root = new Element("Fibonacci_Numbers");	
  	      
    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;      
    
    for (int i = 0; i <= 25; i++) {
      Element fibonacci = new Element("fibonacci");
      Attribute index = new Attribute("index", String.valueOf(i));
      fibonacci.addAttribute(index);
      fibonacci.setText(low.toString());
      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(fibonacci);
    }
 
    DocType type = new DocType("Fibonacci_Numbers", "fibonacci.dtd");
 
    Document doc = new Document(root, type);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("validfibonacci.xml");
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, out);
      out.flush();	
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

validfibonacci.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Fibonacci_Numbers SYSTEM "fibonacci.dtd">
<Fibonacci_Numbers><fibonacci index="0">0</fibonacci><fibonacci index="1">1</fibonacci><fibonacci index="2">1</fibonacci><fibonacci index="3">2</fibonacci><fibonacci index="4">3</fibonacci><fibonacci index="5">5</fibonacci><fibonacci index="6">8</fibonacci><fibonacci index="7">13</fibonacci><fibonacci index="8">21</fibonacci><fibonacci index="9">34</fibonacci><fibonacci index="10">55</fibonacci><fibonacci index="11">89</fibonacci><fibonacci index="12">144</fibonacci><fibonacci index="13">233</fibonacci><fibonacci index="14">377</fibonacci><fibonacci index="15">610</fibonacci><fibonacci index="16">987</fibonacci><fibonacci index="17">1597</fibonacci><fibonacci index="18">2584</fibonacci><fibonacci index="19">4181</fibonacci><fibonacci index="20">6765</fibonacci><fibonacci index="21">10946</fibonacci><fibonacci index="22">17711</fibonacci><fibonacci index="23">28657</fibonacci><fibonacci index="24">46368</fibonacci><fibonacci index="25">75025</fibonacci></Fibonacci_Numbers>
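
Whether a document like this is actually valid is something a validating parser can check. The following is a sketch using JAXP rather than JDOM. To stay self-contained it inlines the three declarations from fibonacci.dtd as an internal DTD subset instead of pointing at the external file, and FibonacciValidator and countErrors() are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class FibonacciValidator {

  // The declarations from fibonacci.dtd, inlined as an internal subset
  static final String DOCTYPE
   = "<!DOCTYPE Fibonacci_Numbers [\n"
   + "<!ELEMENT Fibonacci_Numbers (fibonacci*)>\n"
   + "<!ELEMENT fibonacci (#PCDATA)>\n"
   + "<!ATTLIST fibonacci index CDATA #IMPLIED>\n"
   + "]>\n";

  // Returns the number of validity errors the parser reports
  static int countErrors(String rootElement) {
    final int[] errors = {0};
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setValidating(true);  // validation is off by default
      DocumentBuilder builder = factory.newDocumentBuilder();
      builder.setErrorHandler(new DefaultHandler() {
        // Validity errors are recoverable, so they arrive here
        public void error(SAXParseException e) { errors[0]++; }
      });
      builder.parse(new InputSource(
       new StringReader(DOCTYPE + rootElement)));
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return errors[0];
  }

  public static void main(String[] args) {
    System.out.println(countErrors("<Fibonacci_Numbers>"
     + "<fibonacci index='0'>0</fibonacci></Fibonacci_Numbers>"));
    System.out.println(countErrors("<Fibonacci_Numbers>"
     + "<lucas>2</lucas></Fibonacci_Numbers>"));
  }

}
```

The first call reports no errors. The second reports at least one, because lucas is not declared.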

Using Namespaces

Suppose we want some MathML like this:

<?xml version="1.0" encoding="UTF-8"?>
<mathml:math xmlns:mathml="http://www.w3.org/1998/Math/MathML">
  <mathml:mrow>
    <mathml:mi>f(0)</mathml:mi>
    <mathml:mo>=</mathml:mo>
    <mathml:mn>0</mathml:mn>
  </mathml:mrow>
  <mathml:mrow>
    <mathml:mi>f(1)</mathml:mi>
    <mathml:mo>=</mathml:mo>
    <mathml:mn>1</mathml:mn>
  </mathml:mrow>
  <mathml:mrow>
    <mathml:mi>f(2)</mathml:mi>
    <mathml:mo>=</mathml:mo>
    <mathml:mn>1</mathml:mn>
  </mathml:mrow>
</mathml:math>

Rules for Using Namespaces

Do not pass qualified names like mathml:mn to the Element constructor.
Instead pass the local name (mn), the prefix (mathml), and the namespace URI (http://www.w3.org/1998/Math/MathML) as separate arguments to create the elements.
Do not include xmlns attributes like xmlns:mathml="http://www.w3.org/1998/Math/MathML".
XMLOutputter will decide where to put the xmlns attributes when the document is serialized.
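
The three pieces named in these rules are the same three pieces a namespace-aware parser hands back. A small sketch with JAXP and DOM2 (not JDOM) that takes a qualified name apart; NameParts and rootParts() are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NameParts {

  // Returns {prefix, local name, namespace URI} of the root element
  static String[] rootParts(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);  // off by default in JAXP
      Element root = factory.newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)))
       .getDocumentElement();
      return new String[] {
        root.getPrefix(), root.getLocalName(), root.getNamespaceURI()
      };
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String[] p = rootParts("<mathml:math "
     + "xmlns:mathml='http://www.w3.org/1998/Math/MathML'/>");
    System.out.println(p[0] + " | " + p[1] + " | " + p[2]);
  }

}
```

This prints mathml | math | http://www.w3.org/1998/Math/MathML. The xmlns:mathml attribute is only the syntax that binds the prefix to the URI.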

With Namespace Prefixes

import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;


public class PrefixedFibonacci {

  public static void main(String[] args) {

    Element root = new Element("math", "mathml",
     "http://www.w3.org/1998/Math/MathML");

    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;

    for (int i = 0; i <= 25; i++) {

      Element mrow = new Element("mrow", "mathml",
       "http://www.w3.org/1998/Math/MathML");

      Element mi = new Element("mi", "mathml",
       "http://www.w3.org/1998/Math/MathML");
      mi.setText("f(" + i + ")");
      mrow.addContent(mi);

      Element mo = new Element("mo", "mathml",
       "http://www.w3.org/1998/Math/MathML");
      mo.setText("=");
      mrow.addContent(mo);

      Element mn = new Element("mn", "mathml",
       "http://www.w3.org/1998/Math/MathML");
      mn.setText(low.toString());
      mrow.addContent(mn);

      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(mrow);

    }

    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("prefixed_fibonacci.xml");
      XMLOutputter serializer = new XMLOutputter();
      serializer.output(doc, out);
      out.flush();
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

The Default, Unprefixed Namespace

Suppose you want some MathML like this:

<?xml version="1.0" encoding="UTF-8"?>
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mi>f(0)</mi>
    <mo>=</mo>
    <mn>0</mn>
  </mrow>
  <mrow>
    <mi>f(1)</mi>
    <mo>=</mo>
    <mn>1</mn>
  </mrow>
  <mrow>
    <mi>f(2)</mi>
    <mo>=</mo>
    <mn>1</mn>
  </mrow>
</math>

Rules for Using Default Namespace

Do not use bare local names like mn on their own; that creates elements in no namespace at all.
Instead use both the local names like mn and URIs like http://www.w3.org/1998/Math/MathML to create the elements.
Do not include xmlns attributes like xmlns="http://www.w3.org/1998/Math/MathML".
XMLOutputter will decide where to put the xmlns attribute when the document is serialized.
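
To a namespace-aware processor the prefixed and unprefixed forms name the same element; only the URI and the local name count. The following sketch checks that with JAXP and DOM2 rather than JDOM; SameName, root(), and sameName() are hypothetical names.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class SameName {

  static Element root(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      return factory.newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)))
       .getDocumentElement();
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // Two elements have the same name if URI and local name match;
  // the prefix, or the lack of one, is ignored
  static boolean sameName(String xml1, String xml2) {
    Element a = root(xml1);
    Element b = root(xml2);
    return Objects.equals(a.getNamespaceURI(), b.getNamespaceURI())
     && a.getLocalName().equals(b.getLocalName());
  }

  public static void main(String[] args) {
    String uri = "http://www.w3.org/1998/Math/MathML";
    String prefixed   = "<mathml:mn xmlns:mathml='" + uri + "'>1</mathml:mn>";
    String unprefixed = "<mn xmlns='" + uri + "'>1</mn>";
    System.out.println(sameName(prefixed, unprefixed));
    System.out.println(sameName(unprefixed, "<mn>1</mn>"));
  }

}
```

This prints true and then false: an mn with no namespace declaration at all is a different element from the MathML mn.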

With Default Namespace

import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;


public class UnprefixedFibonacci {

  public static void main(String[] args) {
   
    Element root 
     = new Element("math", "http://www.w3.org/1998/Math/MathML");	
  	      
    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;      
    
    for (int i = 0; i <= 25; i++) {
        
      Element mrow 
       = new Element("mrow", "http://www.w3.org/1998/Math/MathML");
      
      Element mi 
       = new Element("mi", "http://www.w3.org/1998/Math/MathML");
      mi.setText("f(" + i + ")"); 
      mrow.addContent(mi);
      
      Element mo 
       = new Element("mo", "http://www.w3.org/1998/Math/MathML");
      mo.setText("="); 
      mrow.addContent(mo);
      
      Element mn 
       = new Element("mn", "http://www.w3.org/1998/Math/MathML");
      mn.setText(low.toString());
      mrow.addContent(mn);

      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(mrow);
      
    }
 
    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("unprefixed_fibonacci.xml");
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, out);
      out.flush();	
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

Converting data to XML

Sample Tab Delimited Data: Baseball Statistics



Surname FirstName Team Position Games Played Games Started AtBats Runs Hits Doubles Triples Home runs RBI Stolen Bases Caught Stealing Sacrifice Hits Sacrifice Flies Errors PB Walks Strike outs Hit by pitch 
Anderson Garret ANA Outfield 156 151 622 62 183 41 7 15 79 8 3 3 3 6 0 29 80 1 
Baughman Justin ANA Second Base 62 54 196 24 50 9 1 1 20 10 4 5 3 8 0 6 36 1 
Bolick Frank ANA Third Base 21 11 45 3 7 2 0 1 2 0 0 0 0 0 0 11 8 0 
Disarcina Gary ANA Shortstop 157 155 551 73 158 39 3 3 56 12 7 12 3 14 0 21 51 8 
Edmonds Jim ANA Outfield 154 150 599 115 184 42 1 25 91 7 5 1 1 5 0 57 114 1 
Erstad Darin ANA Outfield 133 129 537 84 159 39 3 19 82 20 6 1 3 3 0 43 77 6 
Garcia Carlos ANA Second Base 19 10 35 4 5 1 0 0 0 2 0 1 0 1 0 3 11 1 
Glaus Troy ANA Third Base 48 45 165 19 36 9 0 1 23 1 0 0 2 7 0 15 51 0 
Greene Todd ANA Outfield 29 15 71 3 18 4 0 1 7 0 0 0 0 0 0 2 20 0 
Helfand Eric ANA Catcher 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Hollins Dave ANA Third Base 101 98 363 60 88 16 2 11 39 11 3 2 2 17 0 44 69 7 
Jefferies Gregg ANA Outfield 19 18 72 7 25 6 0 1 10 1 0 0 0 0 0 0 5 0 
Johnson Mark ANA First Base 10 2 14 1 1 0 0 0 0 0 0 0 0 0 0 0 6 0 
Kreuter Chad ANA Catcher 96 74 252 27 63 10 1 2 33 1 0 5 1 9 5 33 49 3 
Martin Norberto ANA Second Base 79 50 195 20 42 2 0 1 13 3 1 3 2 4 0 6 29 0 
Mashore Damon ANA Outfield 43 24 98 13 23 6 0 2 11 1 0 1 0 0 0 9 22 3 
Molina Ben ANA Catcher 2 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Nevin Phil ANA Catcher 75 65 237 27 54 8 1 8 27 0 0 0 2 5 20 17 67 5 
O'Brien Charlie ANA Catcher 62 58 175 13 45 9 0 4 18 0 0 3 3 4 1 10 33 2 
Palmeiro Orlando ANA Outfield 74 34 165 28 53 7 2 0 21 5 4 7 0 0 0 20 11 0 
Pritchett Chris ANA First Base 31 19 80 12 23 2 1 2 8 2 0 0 0 1 0 4 16 0 
Salmon Tim ANA Designated Hitter 136 130 463 84 139 28 1 26 88 0 1 0 10 2 0 90 100 3 
Shipley Craig ANA Third Base 77 32 147 18 38 7 1 2 17 0 4 4 1 3 0 5 22 5 
Velarde Randy ANA Second Base 51 50 188 29 49 13 1 4 26 7 2 0 1 4 0 34 42 1 
Walbeck Matt ANA Catcher 108 91 338 41 87 15 2 6 46 1 1 5 5 7 8 30 68 2 
Williams Reggie ANA Outfield 29 7 36 7 13 1 0 1 5 3 3 1 0 0 0 7 11 1

A Program to convert tab delimited data to XML

import java.io.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class JDOMBaseballTabToXML {

  public static void main(String[] args) {
     
    Element root = new Element("players");
    
    try {
      FileInputStream fin = new FileInputStream(args[0]);
      BufferedReader in 
       = new BufferedReader(new InputStreamReader(fin));    

      String playerStats;  
      while ((playerStats = in.readLine()) != null) {
        String[] stats = splitLine(playerStats);
        
        Element player = new Element("player");

        Element first_name = new Element("first_name");
        first_name.setText(stats[1]);
        player.addContent(first_name);
        
        Element surname = new Element("surname");
        surname.setText(stats[0]);
        player.addContent(surname);
       
        Element games_played = new Element("games_played");
        games_played.setText(stats[4]);
        player.addContent(games_played);
       
        Element at_bats = new Element("at_bats");
        at_bats.setText(stats[6]);
        player.addContent(at_bats);
       
        Element runs = new Element("runs");
        runs.setText(stats[7]);
        player.addContent(runs);
       
        Element hits = new Element("hits");
        hits.setText(stats[8]);
        player.addContent(hits);
       
        Element doubles = new Element("doubles");
        doubles.setText(stats[9]);
        player.addContent(doubles);
       
        Element triples = new Element("triples");
        triples.setText(stats[10]);
        player.addContent(triples); 

        Element home_runs = new Element("home_runs");
        home_runs.setText(stats[11]);
        player.addContent(home_runs); 

        Element runs_batted_in = new Element("runs_batted_in");
        runs_batted_in.setText(stats[12]);
        player.addContent(runs_batted_in); 

        Element stolen_bases = new Element("stolen_bases");
        stolen_bases.setText(stats[13]);
        player.addContent(stolen_bases); 

        Element caught_stealing = new Element("caught_stealing");
        caught_stealing.setText(stats[14]);
        player.addContent(caught_stealing); 

        Element sacrifice_hits = new Element("sacrifice_hits");
        sacrifice_hits.setText(stats[15]);
        player.addContent(sacrifice_hits); 

        Element sacrifice_flies = new Element("sacrifice_flies");
        sacrifice_flies.setText(stats[16]);
        player.addContent(sacrifice_flies); 

        Element errors = new Element("errors");
        errors.setText(stats[17]);
        player.addContent(errors); 

        Element passed_by_ball = new Element("passed_by_ball");
        passed_by_ball.setText(stats[18]);
        player.addContent(passed_by_ball); 

        Element walks = new Element("walks");
        walks.setText(stats[19]);
        player.addContent(walks); 

        Element strike_outs = new Element("strike_outs");
        strike_outs.setText(stats[20]);
        player.addContent(strike_outs); 

        Element hit_by_pitch = new Element("hit_by_pitch");
        hit_by_pitch.setText(stats[21]);
        player.addContent(hit_by_pitch); 
        
        root.addContent(player);
      }  
      
      Document doc = new Document(root);
      // serialize it into a file
      FileOutputStream fout 
       = new FileOutputStream("baseballstats.xml");
      
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, fout);
      fout.flush();	
      fout.close();
      in.close();
      
    }
    catch (IOException e) {
      System.err.println(e);
    }
    catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("Usage: java JDOMBaseballTabToXML input_file.tab");
    }

  }

  public static String[] splitLine(String playerStats) {
    
    // count the number of tabs
    int numTabs = 0;
    for (int i = 0; i < playerStats.length(); i++) {
      if (playerStats.charAt(i) == '\t') numTabs++;
    }
    int numFields = numTabs + 1;
    String[] fields = new String[numFields];
    int position = 0;
    for (int i = 0; i < numFields; i++) {
      StringBuffer field = new StringBuffer();
      while (position < playerStats.length() 
       && playerStats.charAt(position++) != '\t') {
        field.append(playerStats.charAt(position-1));
      }
      fields[i] = field.toString();
    }    
    return fields;
    
  }

}
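
On a VM that has String.split() (Java 1.4 and later, which is an assumption about your environment), the hand-written splitLine() method can shrink to one line. A minimal sketch; TabSplit is a hypothetical class name.

```java
import java.util.Arrays;

class TabSplit {

  // A limit of -1 keeps trailing empty fields, so a line with
  // n tabs always yields n+1 fields, like the hand-written loop
  static String[] splitLine(String playerStats) {
    return playerStats.split("\t", -1);
  }

  public static void main(String[] args) {
    String line = "Anderson\tGarret\tANA\tOutfield\t156";
    System.out.println(Arrays.toString(splitLine(line)));
  }

}
```

This prints [Anderson, Garret, ANA, Outfield, 156]. Without the -1 limit, split() silently drops empty fields at the end of a line, which would shift nothing here but would shorten the array.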

Baseball Stats in XML

<?xml version="1.0"?>
<players>
  <player>
    <first_name>FirstName</first_name>
    <surname>Surname</surname>
    <games_played>Games Played</games_played>
    <at_bats>AtBats</at_bats>
    <runs>Runs</runs>
    <hits>Hits</hits>
    <doubles>Doubles</doubles>
    <triples>Triples</triples>
    <home_runs>Home runs</home_runs>
    <runs_batted_in>RBI</runs_batted_in>
    <stolen_bases>Stolen Bases</stolen_bases>
    <caught_stealing>Caught Stealing</caught_stealing>
    <sacrifice_hits>Sacrifice Hits</sacrifice_hits>
    <sacrifice_flies>Sacrifice Flies</sacrifice_flies>
    <errors>Errors</errors>
    <passed_by_ball>PB</passed_by_ball>
    <walks>Walks</walks>
    <strike_outs>Strike outs</strike_outs>
    <hit_by_pitch>Hit by pitch</hit_by_pitch>
  </player>
  <player>
    <first_name>Garret </first_name>
    <surname>Anderson</surname>
    <games_played>156</games_played>
    <at_bats>622</at_bats>
    <runs>62</runs>
    <hits>183</hits>
    <doubles>41</doubles>
    <triples>7</triples>
    <home_runs>15</home_runs>
    <stolen_bases>79</stolen_bases>
    <caught_stealing>3</caught_stealing>
    <sacrifice_hits>3</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>6</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>29</walks>
    <strike_outs>80</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Justin </first_name>
    <surname>Baughman</surname>
    <games_played>62</games_played>
    <at_bats>196</at_bats>
    <runs>24</runs>
    <hits>50</hits>
    <doubles>9</doubles>
    <triples>1</triples>
    <home_runs>1</home_runs>
    <stolen_bases>20</stolen_bases>
    <caught_stealing>4</caught_stealing>
    <sacrifice_hits>5</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>8</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>6</walks>
    <strike_outs>36</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Frank </first_name>
    <surname>Bolick</surname>
    <games_played>21</games_played>
    <at_bats>45</at_bats>
    <runs>3</runs>
    <hits>7</hits>
    <doubles>2</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>2</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>11</walks>
    <strike_outs>8</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Gary </first_name>
    <surname>Disarcina</surname>
    <games_played>157</games_played>
    <at_bats>551</at_bats>
    <runs>73</runs>
    <hits>158</hits>
    <doubles>39</doubles>
    <triples>3</triples>
    <home_runs>3</home_runs>
    <stolen_bases>56</stolen_bases>
    <caught_stealing>7</caught_stealing>
    <sacrifice_hits>12</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>14</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>21</walks>
    <strike_outs>51</strike_outs>
    <hit_by_pitch>8</hit_by_pitch>
  </player>
  <player>
    <first_name>Jim </first_name>
    <surname>Edmonds</surname>
    <games_played>154</games_played>
    <at_bats>599</at_bats>
    <runs>115</runs>
    <hits>184</hits>
    <doubles>42</doubles>
    <triples>1</triples>
    <home_runs>25</home_runs>
    <stolen_bases>91</stolen_bases>
    <caught_stealing>5</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>5</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>57</walks>
    <strike_outs>114</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Darin </first_name>
    <surname>Erstad</surname>
    <games_played>133</games_played>
    <at_bats>537</at_bats>
    <runs>84</runs>
    <hits>159</hits>
    <doubles>39</doubles>
    <triples>3</triples>
    <home_runs>19</home_runs>
    <stolen_bases>82</stolen_bases>
    <caught_stealing>6</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>3</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>43</walks>
    <strike_outs>77</strike_outs>
    <hit_by_pitch>6</hit_by_pitch>
  </player>
  <player>
    <first_name>Carlos </first_name>
    <surname>Garcia</surname>
    <games_played>19</games_played>
    <at_bats>35</at_bats>
    <runs>4</runs>
    <hits>5</hits>
    <doubles>1</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>1</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>3</walks>
    <strike_outs>11</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Troy </first_name>
    <surname>Glaus</surname>
    <games_played>48</games_played>
    <at_bats>165</at_bats>
    <runs>19</runs>
    <hits>36</hits>
    <doubles>9</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>23</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>7</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>15</walks>
    <strike_outs>51</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Todd </first_name>
    <surname>Greene</surname>
    <games_played>29</games_played>
    <at_bats>71</at_bats>
    <runs>3</runs>
    <hits>18</hits>
    <doubles>4</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>7</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>2</walks>
    <strike_outs>20</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Eric </first_name>
    <surname>Helfand</surname>
    <games_played>0</games_played>
    <at_bats>0</at_bats>
    <runs>0</runs>
    <hits>0</hits>
    <doubles>0</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>0</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Dave </first_name>
    <surname>Hollins</surname>
    <games_played>101</games_played>
    <at_bats>363</at_bats>
    <runs>60</runs>
    <hits>88</hits>
    <doubles>16</doubles>
    <triples>2</triples>
    <home_runs>11</home_runs>
    <stolen_bases>39</stolen_bases>
    <caught_stealing>3</caught_stealing>
    <sacrifice_hits>2</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>17</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>44</walks>
    <strike_outs>69</strike_outs>
    <hit_by_pitch>7</hit_by_pitch>
  </player>
  <player>
    <first_name>Gregg </first_name>
    <surname>Jefferies</surname>
    <games_played>19</games_played>
    <at_bats>72</at_bats>
    <runs>7</runs>
    <hits>25</hits>
    <doubles>6</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>10</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>5</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Mark </first_name>
    <surname>Johnson</surname>
    <games_played>10</games_played>
    <at_bats>14</at_bats>
    <runs>1</runs>
    <hits>1</hits>
    <doubles>0</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>6</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Chad </first_name>
    <surname>Kreuter</surname>
    <games_played>96</games_played>
    <at_bats>252</at_bats>
    <runs>27</runs>
    <hits>63</hits>
    <doubles>10</doubles>
    <triples>1</triples>
    <home_runs>2</home_runs>
    <stolen_bases>33</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>5</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>9</errors>
    <passed_by_ball>5</passed_by_ball>
    <walks>33</walks>
    <strike_outs>49</strike_outs>
    <hit_by_pitch>3</hit_by_pitch>
  </player>
  <player>
    <first_name>Norberto </first_name>
    <surname>Martin</surname>
    <games_played>79</games_played>
    <at_bats>195</at_bats>
    <runs>20</runs>
    <hits>42</hits>
    <doubles>2</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>13</stolen_bases>
    <caught_stealing>1</caught_stealing>
    <sacrifice_hits>3</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>4</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>6</walks>
    <strike_outs>29</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Damon </first_name>
    <surname>Mashore</surname>
    <games_played>43</games_played>
    <at_bats>98</at_bats>
    <runs>13</runs>
    <hits>23</hits>
    <doubles>6</doubles>
    <triples>0</triples>
    <home_runs>2</home_runs>
    <stolen_bases>11</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>9</walks>
    <strike_outs>22</strike_outs>
    <hit_by_pitch>3</hit_by_pitch>
  </player>
  <player>
    <first_name>Ben </first_name>
    <surname>Molina</surname>
    <games_played>2</games_played>
    <at_bats>1</at_bats>
    <runs>0</runs>
    <hits>0</hits>
    <doubles>0</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>0</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Phil </first_name>
    <surname>Nevin</surname>
    <games_played>75</games_played>
    <at_bats>237</at_bats>
    <runs>27</runs>
    <hits>54</hits>
    <doubles>8</doubles>
    <triples>1</triples>
    <home_runs>8</home_runs>
    <stolen_bases>27</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>5</errors>
    <passed_by_ball>20</passed_by_ball>
    <walks>17</walks>
    <strike_outs>67</strike_outs>
    <hit_by_pitch>5</hit_by_pitch>
  </player>
  <player>
    <first_name>Charlie </first_name>
    <surname>Obrien</surname>
    <games_played>62</games_played>
    <at_bats>175</at_bats>
    <runs>13</runs>
    <hits>45</hits>
    <doubles>9</doubles>
    <triples>0</triples>
    <home_runs>4</home_runs>
    <stolen_bases>18</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>3</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>4</errors>
    <passed_by_ball>1</passed_by_ball>
    <walks>10</walks>
    <strike_outs>33</strike_outs>
    <hit_by_pitch>2</hit_by_pitch>
  </player>
  <player>
    <first_name>Orlando </first_name>
    <surname>Palmeiro</surname>
    <games_played>74</games_played>
    <at_bats>165</at_bats>
    <runs>28</runs>
    <hits>53</hits>
    <doubles>7</doubles>
    <triples>2</triples>
    <home_runs>0</home_runs>
    <stolen_bases>21</stolen_bases>
    <caught_stealing>4</caught_stealing>
    <sacrifice_hits>7</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>20</walks>
    <strike_outs>11</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Chris </first_name>
    <surname>Pritchett</surname>
    <games_played>31</games_played>
    <at_bats>80</at_bats>
    <runs>12</runs>
    <hits>23</hits>
    <doubles>2</doubles>
    <triples>1</triples>
    <home_runs>2</home_runs>
    <stolen_bases>8</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>1</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>4</walks>
    <strike_outs>16</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Tim </first_name>
    <surname>Salmon</surname>
    <games_played>136</games_played>
    <at_bats>463</at_bats>
    <runs>84</runs>
    <hits>139</hits>
    <doubles>28</doubles>
    <triples>1</triples>
    <home_runs>26</home_runs>
    <stolen_bases>88</stolen_bases>
    <caught_stealing>1</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>10</sacrifice_flies>
    <errors>2</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>90</walks>
    <strike_outs>100</strike_outs>
    <hit_by_pitch>3</hit_by_pitch>
  </player>
  <player>
    <first_name>Craig </first_name>
    <surname>Shipley</surname>
    <games_played>77</games_played>
    <at_bats>147</at_bats>
    <runs>18</runs>
    <hits>38</hits>
    <doubles>7</doubles>
    <triples>1</triples>
    <home_runs>2</home_runs>
    <stolen_bases>17</stolen_bases>
    <caught_stealing>4</caught_stealing>
    <sacrifice_hits>4</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>3</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>5</walks>
    <strike_outs>22</strike_outs>
    <hit_by_pitch>5</hit_by_pitch>
  </player>
  <player>
    <first_name>Randy </first_name>
    <surname>Velarde</surname>
    <games_played>51</games_played>
    <at_bats>188</at_bats>
    <runs>29</runs>
    <hits>49</hits>
    <doubles>13</doubles>
    <triples>1</triples>
    <home_runs>4</home_runs>
    <stolen_bases>26</stolen_bases>
    <caught_stealing>2</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>4</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>34</walks>
    <strike_outs>42</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Matt </first_name>
    <surname>Walbeck</surname>
    <games_played>108</games_played>
    <at_bats>338</at_bats>
    <runs>41</runs>
    <hits>87</hits>
    <doubles>15</doubles>
    <triples>2</triples>
    <home_runs>6</home_runs>
    <stolen_bases>46</stolen_bases>
    <caught_stealing>1</caught_stealing>
    <sacrifice_hits>5</sacrifice_hits>
    <sacrifice_flies>5</sacrifice_flies>
    <errors>7</errors>
    <passed_by_ball>8</passed_by_ball>
    <walks>30</walks>
    <strike_outs>68</strike_outs>
    <hit_by_pitch>2</hit_by_pitch>
  </player>
  <player>
    <first_name>Reggie </first_name>
    <surname>Williams</surname>
    <games_played>29</games_played>
    <at_bats>36</at_bats>
    <runs>7</runs>
    <hits>13</hits>
    <doubles>1</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>5</stolen_bases>
    <caught_stealing>3</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>7</walks>
    <strike_outs>11</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
</players>

A Shortcut

import java.io.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class BaseballTabToXMLShortcut {

  public static void main(String[] args) {
     
    Element root = new Element("players");
    
    try {
      FileInputStream fin = new FileInputStream(args[0]);
      BufferedReader in 
       = new BufferedReader(new InputStreamReader(fin));    

      String playerStats;  
      while ((playerStats = in.readLine()) != null) {
        String[] stats = splitLine(playerStats);
        
        Element player = new Element("player");

        player.addContent((new Element("first_name")).setText(stats[1]));
        player.addContent((new Element("surname")).setText(stats[0]));
        player.addContent((new Element("games_played")).setText(stats[4]));
        player.addContent((new Element("at_bats")).setText(stats[6]));
        player.addContent((new Element("runs")).setText(stats[7]));
        player.addContent((new Element("hits")).setText(stats[8]));
        player.addContent((new Element("doubles")).setText(stats[9]));
        player.addContent((new Element("triples")).setText(stats[10]));
        player.addContent((new Element("home_runs")).setText(stats[11]));
        player.addContent((new Element("runs_batted_in")).setText(stats[12]));
        player.addContent((new Element("stolen_bases")).setText(stats[13]));
        player.addContent((new Element("caught_stealing")).setText(stats[14]));
        player.addContent((new Element("sacrifice_hits")).setText(stats[15]));
        player.addContent((new Element("sacrifice_flies")).setText(stats[16]));
        player.addContent((new Element("errors")).setText(stats[17]));
        player.addContent((new Element("passed_by_ball")).setText(stats[18]));
        player.addContent((new Element("walks")).setText(stats[19]));
        player.addContent((new Element("strike_outs")).setText(stats[20]));
        player.addContent((new Element("hit_by_pitch")).setText(stats[21]));
        
        root.addContent(player);
      }  
      
      Document doc = new Document(root);
      // serialize it into a file
      FileOutputStream fout 
       = new FileOutputStream("baseballstats.xml");
      
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, fout);
      fout.flush();	
      fout.close();
      in.close();
      
    }
    catch (IOException e) {
      System.err.println(e);
    }
    catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("Usage: java BaseballTabToXMLShortcut input_file.tab");
    }

  }

  public static String[] splitLine(String playerStats) {
    
    // count the number of tabs
    int numTabs = 0;
    for (int i = 0; i < playerStats.length(); i++) {
      if (playerStats.charAt(i) == '\t') numTabs++;
    }
    int numFields = numTabs + 1;
    String[] fields = new String[numFields];
    int position = 0;
    for (int i = 0; i < numFields; i++) {
      StringBuffer field = new StringBuffer();
      while (position < playerStats.length() 
       && playerStats.charAt(position++) != '\t') {
        field.append(playerStats.charAt(position-1));
      }
      fields[i] = field.toString();
    }    
    return fields;
    
  }

}

Converting data to XML while Processing it

import java.io.*;
import java.text.*;
import java.util.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;

public class JDOMBattingAverage {

  public static void main(String[] args) {
     
    Element root = new Element("players");
     
    try {
      FileInputStream fin = new FileInputStream(args[0]);
      BufferedReader in 
       = new BufferedReader(new InputStreamReader(fin));
      
      String playerStats;
      
      // for formatting batting averages
      DecimalFormat averages = (DecimalFormat) 
       NumberFormat.getNumberInstance(Locale.US);
      averages.setMaximumFractionDigits(3);
      averages.setMinimumFractionDigits(3);
      averages.setMinimumIntegerDigits(0);
      
      while ((playerStats = in.readLine()) != null) {
        String[] stats = splitLine(playerStats);
        
        String formattedAverage;
        try {
          int atBats         = Integer.parseInt(stats[6]);
          int hits           = Integer.parseInt(stats[8]);
        
          if (atBats <= 0) formattedAverage = "N/A";
          else {
            double average = hits / (double) atBats;
            formattedAverage = averages.format(average);
          }       
        }
        catch (Exception e) {
          // skip this player
          continue; 
        }

        Element player = new Element("player");

        Element first_name = new Element("first_name");
        first_name.setText(stats[1]);
        player.addContent(first_name);
             
        Element surname = new Element("surname");
        surname.setText(stats[0]);
        player.addContent(surname);
       
        Element battingAverage = new Element("batting_average");
        battingAverage.setText(formattedAverage);
        player.addContent(battingAverage);
   
        root.addContent(player);
        
      }  
      
      
      Document doc = new Document(root);
      // serialize it into a file
      FileOutputStream fout 
       = new FileOutputStream("battingaverages.xml");
      
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, fout);
      fout.flush();	
      fout.close();
      in.close();

    }
    catch (IOException e) {
      System.err.println(e);
    }
    catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("Usage: java JDOMBattingAverage input_file.tab");
    }

  }

  public static String[] splitLine(String playerStats) {
    
    // count the number of tabs
    int numTabs = 0;
    for (int i = 0; i < playerStats.length(); i++) {
      if (playerStats.charAt(i) == '\t') numTabs++;
    }
    int numFields = numTabs + 1;
    String[] fields = new String[numFields];
    int position = 0;
    for (int i = 0; i < numFields; i++) {
      StringBuffer field = new StringBuffer();
      while (position < playerStats.length() 
       && playerStats.charAt(position++) != '\t') {
        field.append(playerStats.charAt(position-1));
      }
      fields[i] = field.toString();
    }    
    return fields;
    
  }

}

Batting Averages in XML

<?xml version="1.0"?>
<players>
  <player>
    <first_name>Garret </first_name>
    <surname>Anderson</surname>
    <batting_average>.311</batting_average>
  </player>
  <player>
    <first_name>Justin </first_name>
    <surname>Baughman</surname>
    <batting_average>.272</batting_average>
  </player>
  <player>
    <first_name>Frank </first_name>
    <surname>Bolick</surname>
    <batting_average>.206</batting_average>
  </player>
  <player>
    <first_name>Gary </first_name>
    <surname>Disarcina</surname>
    <batting_average>.310</batting_average>
  </player>
  <player>
    <first_name>Jim </first_name>
    <surname>Edmonds</surname>
    <batting_average>.341</batting_average>
  </player>
  <player>
    <first_name>Darin </first_name>
    <surname>Erstad</surname>
    <batting_average>.326</batting_average>
  </player>
  <player>
    <first_name>Carlos </first_name>
    <surname>Garcia</surname>
    <batting_average>.167</batting_average>
  </player>
  <player>
    <first_name>Troy </first_name>
    <surname>Glaus</surname>
    <batting_average>.240</batting_average>
  </player>
  <player>
    <first_name>Todd </first_name>
    <surname>Greene</surname>
    <batting_average>.261</batting_average>
  </player>
  <player>
    <first_name>Eric </first_name>
    <surname>Helfand</surname>
    <batting_average>N/A</batting_average>
  </player>
  <player>
    <first_name>Dave </first_name>
    <surname>Hollins</surname>
    <batting_average>.284</batting_average>
  </player>
  <player>
    <first_name>Gregg </first_name>
    <surname>Jefferies</surname>
    <batting_average>.347</batting_average>
  </player>
  <player>
    <first_name>Mark </first_name>
    <surname>Johnson</surname>
    <batting_average>.071</batting_average>
  </player>
  <player>
    <first_name>Chad </first_name>
    <surname>Kreuter</surname>
    <batting_average>.299</batting_average>
  </player>
  <player>
    <first_name>Norberto </first_name>
    <surname>Martin</surname>
    <batting_average>.226</batting_average>
  </player>
  <player>
    <first_name>Damon </first_name>
    <surname>Mashore</surname>
    <batting_average>.271</batting_average>
  </player>
  <player>
    <first_name>Ben </first_name>
    <surname>Molina</surname>
    <batting_average>.000</batting_average>
  </player>
  <player>
    <first_name>Phil </first_name>
    <surname>Nevin</surname>
    <batting_average>.251</batting_average>
  </player>
  <player>
    <first_name>Charlie </first_name>
    <surname>Obrien</surname>
    <batting_average>.281</batting_average>
  </player>
  <player>
    <first_name>Orlando </first_name>
    <surname>Palmeiro</surname>
    <batting_average>.384</batting_average>
  </player>
  <player>
    <first_name>Chris </first_name>
    <surname>Pritchett</surname>
    <batting_average>.303</batting_average>
  </player>
  <player>
    <first_name>Tim </first_name>
    <surname>Salmon</surname>
    <batting_average>.376</batting_average>
  </player>
  <player>
    <first_name>Craig </first_name>
    <surname>Shipley</surname>
    <batting_average>.286</batting_average>
  </player>
  <player>
    <first_name>Randy </first_name>
    <surname>Velarde</surname>
    <batting_average>.320</batting_average>
  </player>
  <player>
    <first_name>Matt </first_name>
    <surname>Walbeck</surname>
    <batting_average>.289</batting_average>
  </player>
  <player>
    <first_name>Reggie </first_name>
    <surname>Williams</surname>
    <batting_average>.481</batting_average>
  </player>
</players>

Advantages of JDOM for Writing Documents

You don't need to worry about well-formedness rules
Very configurable output
You can pick any encoding Java supports.
One caveat: validity is not automatically maintained.

Reading XML with JDOM

The stereotypical "Desperate Perl Hacker" (DPH) is supposed to be able to write an XML parser in a weekend.
The parser does the hard work for you.
Your code reads the document by hooking JDOM up to the parser.
JDOM can connect to any parser that supports SAX or DOM.

JDOM Compatible Parsers for Java

Any SAX or DOM compatible parser including:

Apache XML Project's Xerces Java: http://xml.apache.org/xerces-j/index.html
Oracle's XML Parser for Java: http://technet.oracle.com/tech/xml/parser_java2
Sun's Java API for XML: http://java.sun.com/products/xml

The Design of the DOM API

Parser independent interfaces; parser dependent implementation classes. Most programs must use the parser dependent classes. JAXP helps solve this.
Everything's a Node:
- Extensive use of polymorphism
- Lots of casting
Language independence means there's very limited use of the Java class library; various features are reinvented.
Language independence also rules out method overloading, because not all languages support it.
Several features are poor design in Java, if not in other languages:
- Named constants are often shorts
- Only one kind of exception; details provided by constants
- No Java-specific utility methods like equals(), hashCode(), clone(), or toString()
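
The casting and short-constant points can be made concrete with a minimal sketch that uses only the DOM and JAXP classes bundled with the JDK. The class name DomCastingDemo, the helper childElementNames(), and the sample document are assumptions made for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomCastingDemo {

  // Hypothetical helper: lists the names of the root's element children
  public static String childElementNames(String xml) throws Exception {
    // JAXP hides the parser dependent implementation classes
    DocumentBuilder builder
     = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.parse(new InputSource(new StringReader(xml)));

    StringBuilder names = new StringBuilder();
    NodeList children = doc.getDocumentElement().getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node child = children.item(i);
      // the node type is a short constant, so it is tested before casting
      if (child.getNodeType() == Node.ELEMENT_NODE) {
        Element element = (Element) child;
        if (names.length() > 0) names.append(',');
        names.append(element.getTagName());
      }
    }
    return names.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(childElementNames(
     "<player><first_name>Tim</first_name><surname>Salmon</surname></player>"));
  }

}
```

Every child comes back as a Node, including text nodes, so the type test and the cast are needed before any Element-specific method can be called.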

The JDOM Process

Construct an org.jdom.input.SAXBuilder or an org.jdom.input.DOMBuilder; no parser specific code is needed!
Invoke the builder's build() method to build a Document object from a
- Reader
- InputStream
- URL
- File
- String containing a SYSTEM ID
If there's a problem building the document, a JDOMException is thrown
Work with the resulting Document object

Parsing a Document with JDOM

import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;


public class JDOMChecker {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java JDOMChecker URL1 URL2..."); 
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        builder.build(args[i]);
        // If there are no well-formedness errors, 
        // then no exception is thrown
        System.out.println(args[i] + " is well formed.");
      }
      // indicates a well-formedness or other error
      catch (JDOMException e) { 
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

}

Parser Results

% java JDOMChecker shortlogs.xml HelloJDOM.java
shortlogs.xml is well formed.
HelloJDOM.java is not well formed.
The markup in the document preceding the root element must be well-formed.: 
Error on line 1 of XML document: The markup in the document preceding the 
root element must be well-formed.

Turning on Validation in JDOM

Not all parsers are validating, but Xerces-J is.
Validity errors are not fatal; therefore they do not necessarily cause a JDOMException.
However, you can tell the builder to validate by passing true to its constructor:
```
    SAXBuilder builder = new SAXBuilder(true);
```
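
Why validity errors are not fatal can be seen one level down, in SAX, where a validating parser reports them through ErrorHandler.error() instead of fatalError(). The following is a minimal sketch that uses plain JAXP and SAX from the JDK rather than JDOM; the class name ValidityErrorDemo, the helper validityErrors(), and the sample documents are assumptions made for illustration:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ValidityErrorDemo {

  // Hypothetical helper: returns the validity errors reported for a document
  public static List<String> validityErrors(String xml) throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setValidating(true); // turn on DTD validation
    XMLReader reader = factory.newSAXParser().getXMLReader();

    final List<String> errors = new ArrayList<String>();
    reader.setErrorHandler(new DefaultHandler() {
      // validity errors arrive here, and the parse continues afterwards
      public void error(SAXParseException e) {
        errors.add(e.getMessage());
      }
      // well-formedness errors go to fatalError(),
      // which DefaultHandler rethrows as an exception
    });
    reader.parse(new InputSource(new StringReader(xml)));
    return errors;
  }

  public static void main(String[] args) throws Exception {
    String dtd = "<!DOCTYPE numbers [<!ELEMENT numbers (#PCDATA)>]>";
    System.out.println(validityErrors(dtd + "<numbers>1 1 2 3</numbers>"));
    System.out.println(validityErrors(dtd + "<numbers><title/></numbers>"));
  }

}
```

The second document is well-formed but invalid, so the parse finishes normally and the problems show up only in the collected list.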

JDOM Validator

import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;


public class JDOMValidator {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java JDOMValidator URL1 URL2..."); 
    } 
      
    SAXBuilder builder = new SAXBuilder(true);
                             /*         ^^^^       */
                             /* Turn on validation */

    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        builder.build(args[i]);
        // If there are no well-formedness or validity errors, 
        // then no exception is thrown
        System.out.println(args[i] + " is valid.");
      }
      catch (JDOMException e) { // indicates a well-formedness or validity error
        System.out.println(args[i] + " is not valid.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

}

Validation Output

% java JDOMValidator invalid_fibonacci.xml
invalid_fibonacci.xml is not valid.
Element type "title" must be declared.: Error on line 8 of XML document: 
Element type "title" must be declared.

% java JDOMValidator validfibonacci.xml
validfibonacci.xml is valid.

Building with DOM instead of SAX

Use DOMBuilder instead of SAXBuilder
Must have an existing DOM tree, specifically an org.w3c.dom.Document (Note the name conflict with org.jdom.Document)
DOM validation is currently broken.
Approximately doubles the memory usage.
In general, SAX is easier and more efficient.

DOMBuilder Example

import org.jdom.*;
import org.jdom.input.DOMBuilder;
import org.apache.xerces.parsers.*;


public class DOMValidator {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java DOMValidator URL1 URL2..."); 
    }      
      
    DOMBuilder builder = new DOMBuilder(true);
                             /*         ^^^^       */
                             /* Turn on validation */
    // start parsing... 
    DOMParser parser = new DOMParser();  // Xerces specific class
    for (int i = 0; i < args.length; i++) {
        
      try {
        // Read the entire document into memory
        parser.parse(args[i]); 
    
        org.w3c.dom.Document domDoc  = parser.getDocument();
        org.jdom.Document    jdomDoc = builder.build(domDoc);

        // If there are no validity errors, 
        // then no exception is thrown
        System.out.println(args[i] + " is valid.");
      }
      catch (Exception e) { // indicates a well-formedness or validity error
        System.out.println(args[i] + " is not valid.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

}

Weblogs: One Task, Three Implementations

UserLand's RSS based list of Web logs at http://static.userland.com/weblogMonitor/logs.xml:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd">
<weblogs>
    <log>
        <name>MozillaZine</name>
        <url>http://www.mozillazine.org</url>
        <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>
        <ownerName>Jason Kersey</ownerName>
        <ownerEmail>kerz@en.com</ownerEmail>
        <description>THE source for news on the Mozilla Organization.  DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
        <imageUrl></imageUrl>
        <adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif</adImageUrl>
        </log>
    <log>
        <name>SalonHerringWiredFool</name>
        <url>http://www.salonherringwiredfool.com/</url>
        <ownerName>Some Random Herring</ownerName>
        <ownerEmail>salonfool@wiredherring.com</ownerEmail>
        <description></description>
        </log>
    <log>
        <name>SlashDot.Org</name>
        <url>http://www.slashdot.org/</url>
        <ownerName>Simply a friend</ownerName>
        <ownerEmail>afriendofweblogs@weblogs.com</ownerEmail>
        <description>News for Nerds, Stuff that Matters.</description>
        </log>
    </weblogs>

Goal: Return a list of all the URLs in this list as java.net.URL objects

Design Decisions

Should we return an array, an Enumeration, a List, or what?
Perhaps we should use multiple threads?

SAX Design

We do not know how many URLs there will be when we start parsing so let's use a Vector
Single threaded for simplicity but a real program would use multiple threads
- One to load and parse the data
- Another thread (probably the main thread) to serve the data
- Early data could be provided before the entire document had been read
The character data of each url element needs to be stored. Everything else can be ignored.
A startElement() with the name url indicates that we need to start storing this data.
An endElement() with the name url indicates that we need to stop storing this data, convert it to a URL, and put it in the Vector
Should we hide the XML parsing inside a non-public class to avoid accidentally calling the methods from unexpected places or threads?
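
The multithreaded variant described above might look something like the following sketch. It is not part of the original example: the names StreamingUrlReader, QueueingHandler, and END are hypothetical, and it assumes the JDK's JAXP SAXParserFactory and java.util.concurrent rather than the Vector used in these slides. One thread parses and queues each URL as soon as its end tag is seen, so the consumer can receive early data before the whole document has been read.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class StreamingUrlReader {

  // Hypothetical end-of-stream marker; compared by identity only
  static final Object END = new Object();

  // Starts a parser thread and returns the queue it feeds
  static BlockingQueue<Object> start(final InputSource in) {
    final BlockingQueue<Object> queue = new LinkedBlockingQueue<Object>();
    Thread parserThread = new Thread(new Runnable() {
      public void run() {
        try {
          SAXParserFactory.newInstance().newSAXParser()
              .parse(in, new QueueingHandler(queue));
        } catch (Exception e) {
          System.err.println(e); // a real program would tell the consumer
        } finally {
          queue.add(END); // always unblock the consumer
        }
      }
    });
    parserThread.start();
    return queue;
  }

  // Consumer side: takes URLs as they arrive until END is seen
  static List<URL> drain(BlockingQueue<Object> queue) {
    List<URL> urls = new ArrayList<URL>();
    try {
      for (Object o = queue.take(); o != END; o = queue.take()) {
        urls.add((URL) o);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // stop early, keep what we have
    }
    return urls;
  }
}

// DefaultHandler supplies the do-nothing methods for us
final class QueueingHandler extends DefaultHandler {

  private final BlockingQueue<Object> queue;
  private StringBuilder buffer; // non-null only inside a url element

  QueueingHandler(BlockingQueue<Object> queue) {
    this.queue = queue;
  }

  @Override
  public void startElement(String uri, String localName,
      String qualifiedName, Attributes atts) {
    if (qualifiedName.equals("url")) buffer = new StringBuilder();
  }

  @Override
  public void characters(char[] text, int start, int length) {
    if (buffer != null) buffer.append(text, start, length);
  }

  @Override
  public void endElement(String uri, String localName, String qualifiedName) {
    if (!qualifiedName.equals("url")) return;
    try {
      queue.add(new URL(buffer.toString().trim()));
    } catch (MalformedURLException e) {
      // skip this url
    } finally {
      buffer = null;
    }
  }
}
```

A caller would typically pass the queue returned by start() to whatever thread serves the data; drain() is only one simple way to consume it.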

User Interface Class

import org.xml.sax.*;
import org.xml.sax.helpers.XMLReaderFactory;
import java.util.*;
import java.io.*;


public class WeblogsSAX {
     
  public static List listChannels() 
   throws IOException, SAXException {
    return listChannels(
     "http://static.userland.com/weblogMonitor/logs.xml"); 
  }
  
  public static List listChannels(String uri) 
   throws IOException, SAXException {
    
    XMLReader parser = XMLReaderFactory.createXMLReader();
    Vector urls = new Vector(1000);
    URIGrabber u = new URIGrabber(urls);
    parser.setContentHandler(u);
    parser.parse(uri);
    return urls;
    
  }
  
  public static void main(String[] args) {
   
    try {
      List urls;
      if (args.length > 0) urls = listChannels(args[0]);
      else urls = listChannels();
      Iterator iterator = urls.iterator();
      while (iterator.hasNext()) {
        System.out.println(iterator.next()); 
      }
    }
    catch (IOException e) {
      System.err.println(e); 
    }
    catch (SAXParseException e) {
      System.err.println(e); 
      System.err.println("at line " + e.getLineNumber() 
       + ", column " + e.getColumnNumber()); 
    }
    catch (SAXException e) {
      System.err.println(e); 
    }
    catch (/* Unexpected */ Exception e) {
      e.printStackTrace(); 
    }
    
  }
  
}

ContentHandler Class

import org.xml.sax.*;
import java.net.*;
import java.util.Vector;

             // conflicts with java.net.ContentHandler
class URIGrabber implements org.xml.sax.ContentHandler {
    
  private Vector urls;
     
  URIGrabber(Vector urls) {
    this.urls = urls;
  }
    
  // do nothing methods  
  public void setDocumentLocator(Locator locator) {}
  public void startDocument() throws SAXException {}
  public void endDocument() throws SAXException {}
  public void startPrefixMapping(String prefix, String uri) 
   throws SAXException {}
  public void endPrefixMapping(String prefix) throws SAXException {}
  public void skippedEntity(String name) throws SAXException {}  
  public void ignorableWhitespace(char[] text, int start, int length)
   throws SAXException {}
  public void processingInstruction(String target, String data)
   throws SAXException {}
  
  
  // Remember, there's no guarantee all the text of the
  // url element will be returned in a single call to characters
  private StringBuffer urlBuffer;
  private boolean collecting = false;
  
  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes atts) throws SAXException {
	  
    if (qualifiedName.equals("url")) {
      collecting = true;
      urlBuffer = new StringBuffer();
    } 
    
  }
  
  public void characters(char[] text, int start, int length) 
   throws SAXException {
    
    if (collecting) {
      urlBuffer.append(text, start, length);
    } 
    
  }
  
  public void endElement(String namespaceURI, String localName,
   String qualifiedName) throws SAXException {
	  
    if (qualifiedName.equals("url")) {
      collecting = false;
      String url = urlBuffer.toString();
      try {
        urls.addElement(new URL(url));
      }
      catch (MalformedURLException e) {
        // skip this url
      }
    }
    
  } 
    
}

Weblogs Output

% java WeblogsSAX shortlogs.xml
http://www.mozillazine.org
http://www.salonherringwiredfool.com/
http://www.slashdot.org/

DOM Design

We can easily find out how many URLs there will be when we start parsing, since they're all in memory.
Single threaded by nature; no benefit to multiple threads since no data will be available until the entire document has been read and parsed.
The character data of each url element needs to be read. Everything else can be ignored.
The getElementsByTagName() method in Document gives us a quick list of all the url elements.
The XML parsing is so straightforward it can be done inside one method. No extra class is required.
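
The getElementsByTagName() approach mentioned above can be sketched as follows. This is an illustration under assumptions, not the listing from these slides: it uses the JDK's JAXP DocumentBuilderFactory instead of the Xerces-specific DOMParser, it relies on the DOM Level 3 getTextContent() method, and the class name UrlLister and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class UrlLister {

  // Collects the text of every url element, skipping malformed URLs
  static List<URL> listUrls(Document doc) {
    NodeList elements = doc.getElementsByTagName("url");
    // The count is known up front because the whole tree is in memory
    List<URL> urls = new ArrayList<URL>(elements.getLength());
    for (int i = 0; i < elements.getLength(); i++) {
      // getTextContent() concatenates all descendant text nodes
      String content = elements.item(i).getTextContent().trim();
      try {
        urls.add(new URL(content));
      } catch (MalformedURLException e) {
        // bad input data from a third party; just ignore it
      }
    }
    return urls;
  }

  // Parses a string into a DOM tree; wraps checked exceptions for brevity
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException(e);
    }
  }
}
```

The trade-off against a NodeIterator with a filter is mostly one of taste: getElementsByTagName() needs no extra class, but it matches url elements anywhere in the document rather than only those you choose to accept.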

Weblogs with DOM

import org.w3c.dom.*;
import org.w3c.dom.traversal.*;
import org.xml.sax.SAXException;
import java.io.IOException;
import java.util.*;
import java.net.*;


public class WeblogsDOM {

  public static String DEFAULT_URL 
   = "http://static.userland.com/weblogMonitor/logs.xml"; 
     
  public static List listChannels() throws DOMException {
    return listChannels(DEFAULT_URL); 
  }
  
  public static List listChannels(String uri) throws DOMException {
    
    if (uri == null) {
      throw new NullPointerException("URL must be non-null");   
    }

    org.apache.xerces.parsers.DOMParser parser 
     = new org.apache.xerces.parsers.DOMParser();
    
    Vector urls = new Vector(); // returned empty, not null, if parsing fails
    
    try {
      // Read the entire document into memory
      parser.parse(uri); 
      Document doc = parser.getDocument();
      org.apache.xerces.dom.DocumentImpl impl 
       = (org.apache.xerces.dom.DocumentImpl) doc;
      NodeIterator iterator = impl.createNodeIterator(doc, 
       NodeFilter.SHOW_ALL, new URLFilter(), true);
      urls = new Vector(100);

      Node current = null;
      while ((current = iterator.nextNode()) != null) {
        try {
          String content = current.getNodeValue();
          URL u = new URL(content);
          urls.addElement(u);
        }
        catch (MalformedURLException e) {
          // bad input data from one third party; just ignore it 
        }
      }
    }
    catch (SAXException e) {
      System.err.println(e); 
    }
    catch (IOException e) {
      System.err.println(e); 
    }
    
    return urls;
    
  }
  
  static class URLFilter implements NodeFilter {
        
    public short acceptNode(Node n) {
      
      if (n instanceof Text) {
        Node parent = n.getParentNode();
        if (parent instanceof Element) {
          Element e = (Element) parent;
          if (e.getTagName().equals("url")) {
            return NodeFilter.FILTER_ACCEPT;       
          }
        }
      }
      
      return NodeFilter.FILTER_REJECT;
      
    }
    
  }
    
  public static void main(String[] args) {
     
    try {
      List urls;
      if (args.length > 0) {
        try {
          URL url = new URL(args[0]);
          urls = listChannels(args[0]);
        }
        catch (MalformedURLException e) {
          System.err.println("Usage: java WeblogsDOM url");
          return;
        }
      }
      else {
        urls = listChannels();
      }
      Iterator iterator = urls.iterator();
      while (iterator.hasNext()) {
        System.out.println(iterator.next()); 
      }
    }
    catch (/* Unexpected */ Exception e) {
      e.printStackTrace(); 
    }
    
  } // end main

}

Weblogs Output

% java WeblogsDOM
http://2020Hindsight.editthispage.com/
http://www.sff.net/people/mitchw/weblog/weblog.htp
http://nate.weblogs.com/
http://plugins.launchpoint.net
http://404.psistorm.net
http://home.att.net/~geek9000
http://daubnet.tzo.com/weblog
several hundred more...

JDOM Design

We can easily find out how many URLs there will be when we start parsing.
Single threaded by nature; no benefit to multiple threads since no data will be available until the entire document has been read and parsed.
The character data of each url element needs to be read. Everything else can be ignored.
The format is very straightforward so we don't need to traverse the entire tree.
The XML parsing is so straightforward it can be done inside one method. No extra class is required.

Weblogs with JDOM

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;
import java.net.*;


public class WeblogsJDOM {
   
  public static String DEFAULT_SYSTEM_ID 
   = "http://static.userland.com/weblogMonitor/logs.xml"; 
     
  public static List listChannels() throws JDOMException {
    return listChannels(DEFAULT_SYSTEM_ID); 
  }
  
  public static List listChannels(String systemID) 
   throws JDOMException, NullPointerException {
    
    if (systemID == null) {
      throw new NullPointerException("URL must be non-null");   
    }
    
    SAXBuilder builder = new SAXBuilder();
    // Load the entire document into memory 
    // from the network or file system
    Document doc = builder.build(systemID);
    
    // Descend the tree and find the URLs. It helps that
    // the document has a very regular structure.
    Element weblogs = doc.getRootElement();
    List logs = weblogs.getChildren("log");
    Vector urls = new Vector(logs.size());
    Iterator iterator = logs.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      Element log = (Element) o;
      try {
                         // This will probably be changed to 
                         //  getElement() or getChildElement() 
        Element url = log.getChild("url"); 
        if (url == null) continue;
        String content = url.getTextTrim();
        URL u = new URL(content);
        urls.addElement(u);
      }
      catch (MalformedURLException e) {
        // bad input data from one third party; just ignore it 
      }
    }
    return urls;
    
  }
  
  public static void main(String[] args) {
   
    try {
      List urls;
      if (args.length > 0) {
        urls = listChannels(args[0]);
      }
      else {
        urls = listChannels();
      }
      Iterator iterator = urls.iterator();
      while (iterator.hasNext()) {
        System.out.println(iterator.next()); 
      }
    }
    catch (/* Unexpected */ Exception e) {
      e.printStackTrace(); 
    }
    
  }
  
}

Weblogs Output

% java WeblogsJDOM
http://2020Hindsight.editthispage.com/
http://www.sff.net/people/mitchw/weblog/weblog.htp
http://nate.weblogs.com/
http://plugins.launchpoint.net
http://404.psistorm.net
http://home.att.net/~geek9000
http://daubnet.tzo.com/weblog
several hundred more...

The org.jdom Package

The classes that represent an XML document and its parts

Document
Element
Attribute
Comment
DocType
EntityRef
Text
CDATA
ProcessingInstruction
Verifier
plus assorted exceptions

The Document Node

The root node containing the entire document; not the same as the root element
Contains:
- one element
- zero or more processing instructions
- zero or more comments
- zero or one document type declarations
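
To see why the document node is not the same thing as the root element, it can help to list a document's top-level children. The sketch below is only an illustration: it uses the W3C DOM bundled with the JDK rather than JDOM, so its API differs from the JDOM Document class, and the class name TopLevel is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TopLevel {

  // Returns the node names of the document's direct children, in order
  static List<String> childNames(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      NodeList children = doc.getChildNodes();
      List<String> names = new ArrayList<String>();
      for (int i = 0; i < children.getLength(); i++) {
        names.add(children.item(i).getNodeName());
      }
      return names;
    } catch (Exception e) {
      // wrap parser and I/O failures so callers stay short
      throw new IllegalArgumentException(e);
    }
  }
}
```

For the input `<!-- c --><SONG/>` this returns two names, `#comment` and `SONG`: the comment is a child of the document, not of the root element.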

The Document Class

package org.jdom;

public class Document implements Serializable, Cloneable {

  protected List    content;
  protected DocType docType;

  protected Document() {}
  public    Document(Element rootElement) {}
  public    Document(Element rootElement, DocType docType) {}
  public    Document(List content) {}
  public    Document(List content, DocType doctype) {}

  public Element   getRootElement() {}
  public Document  setRootElement(Element rootElement) {}
  public DocType   getDocType() {}
  public Document  setDocType(DocType docType) {}
  public List      getMixedContent() {}
  public Document  addContent(ProcessingInstruction pi) {}
  public Document  addContent(Comment comment) {}
  public Document  setMixedContent(List mixedContent) {}
  
  // basic utility methods
  public final String  toString() {}
  public final boolean equals(Object ob) {}
  public final int     hashCode() {}
  public final Object  clone() {}

}

Document Example

import org.jdom.Document;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import java.io.IOException;


public class XMLPrinter {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java XMLPrinter URL1 URL2..."); 
      return;
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        System.out.println("*************" + args[i] + "*************");
        XMLOutputter outputter = new XMLOutputter();
        outputter.output(doc, System.out);
      }
      catch (JDOMException e) { // indicates a well-formedness or other error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
      catch (IOException e) { // shouldn't happen because System.out eats exceptions
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

}

Output from XMLPrinter

% java XMLPrinter shortlogs.xml
*************shortlogs.xml*************
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd"><weblogs>
        <log>
                <name>MozillaZine</name>
                <url>http://www.mozillazine.org</url>
                <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>

                <ownerName>Jason Kersey</ownerName>
                <ownerEmail>kerz@en.com</ownerEmail>
                <description>THE source for news on the Mozilla Organization.  DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
                <imageUrl />
                <adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif</adImageUrl>
                </log>
        <log>
                <name>SalonHerringWiredFool</name>
                <url>http://www.salonherringwiredfool.com/</url>
                <ownerName>Some Random Herring</ownerName>
                <ownerEmail>salonfool@wiredherring.com</ownerEmail>
                <description />
                </log>
        <log>
                <name>SlashDot.Org</name>
                <url>http://www.slashdot.org/</url>
                <ownerName>Simply a friend</ownerName>
                <ownerEmail>afriendofweblogs@weblogs.com</ownerEmail>
                <description>News for Nerds, Stuff that Matters.</description>
                </log>
        </weblogs>

Element Nodes

Represents a complete element including its start tag, end tag, and content
Contains:
- Child Elements
- Processing Instructions
- Comments
- Text
- CDATA sections
- Entity references
JDOM enforces restrictions on element names and possibly values; e.g. a name cannot start with a digit.
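
The kind of name check involved can be sketched in a few lines. This is a deliberately simplified, ASCII-only approximation with a hypothetical class name; it is not JDOM's own verification code, and the real XML 1.0 Name production admits many non-ASCII letters that this sketch rejects.

```java
final class NameCheck {

  // True if name looks like a legal, unprefixed element name (ASCII subset only)
  static boolean isPlausibleName(String name) {
    if (name == null || name.length() == 0) return false;
    char first = name.charAt(0);
    // A name must start with a letter or underscore, never a digit
    if (!isAsciiLetter(first) && first != '_') return false;
    for (int i = 1; i < name.length(); i++) {
      char c = name.charAt(i);
      boolean ok = isAsciiLetter(c) || (c >= '0' && c <= '9')
          || c == '_' || c == '-' || c == '.';
      if (!ok) return false; // spaces, colons, and other punctuation are refused
    }
    return true;
  }

  private static boolean isAsciiLetter(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
  }
}
```

With this sketch, isPlausibleName("url") is true and isPlausibleName("1url") is false. Checking names when an element is constructed means a malformed document is caught at the point of the mistake rather than when it is serialized.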

Element Class Implementation

The content is stored as a java.util.List which contains
- One String (soon to be Text) object per text node
- One Element object per child element
- One Comment object per comment
- One CDATA object per CDATA section (Text?)
- One ProcessingInstruction object per processing instruction
Use the regular methods of java.util.List to add, remove, and inspect the contents of an element
Since the methods of java.util.List expect to work with Object objects, casting back to JDOM types and String is frequent
Various utility methods mean you don't always have to work with the full list.
Attributes and namespaces are available as separate lists since these are not children.

The Element Class

package org.jdom;

public class Element implements Serializable, Cloneable {

    protected           String    name;
    protected           Namespace namespace;
    protected           Object    parent;
    protected           List      attributes;
    protected transient ArrayList additionalNamespaces;
    protected           ArrayList content;

    protected Element() {}
    public    Element(String name, Namespace namespace) {}
    public    Element(String name) {}
    public    Element(String name, String uri) {}
    public    Element(String name, String prefix, String uri) {}

    public String     getName() {}
    public Namespace  getNamespace() {}
    public Namespace  getNamespace(String prefix) {}
    public String     getNamespacePrefix() {}
    public String     getNamespaceURI() {}
    public String     getQualifiedName() {}
    public Element    getParent() {}
    
    protected Element setParent(Element parent) {}
    public    boolean isRootElement() {}
    protected Element setIsRootElement(boolean isRootElement) {}
    public    Element setChildren(List children) {}
    protected Element setDocument(Document document) {}
    public    Element setMixedContent(List mixedContent) {}
    public    Element setName(String name) {}
    public    Element setNamespace(Namespace namespace) {}
    public    Element setText(String text) {}

    public String    getText() {} 
    public String    getTextTrim() {} 
    public String    getTextNormalize() {} 
    public List      getMixedContent() {}
    public String    getChildText(String name) {} 
    public String    getChildTextTrim(String name) {} 
    public String    getChildText(String name, Namespace ns) {} 

    public Element   setMixedContent(List mixedContent) {} 
    public List      getChildren() {} 
    public Element   setChildren(List children) {} 
    public List      getChildren(String name) {} 
    public List      getChildren(String name, Namespace ns) {} 
    public Element   getChild(String name, Namespace ns) {} 
    public Element   getChild(String name) {} 
    public boolean   removeChild(String name) {} 
    public boolean   removeChild(String name, Namespace ns) {} 
    public boolean   removeChildren(String name) {}
    public boolean   removeChildren(String name, Namespace ns) {} 
    public boolean   removeChildren() {} 
    
    public Element   addContent(String text) {}
    public Element   addContent(Element element) {} 
    public Element   addContent(ProcessingInstruction pi) {} 
    public Element   addContent(EntityRef entity) {} 
    public Element   addContent(Comment comment) {} 
    public Element   addContent(CDATA cdata) {} 
    public boolean   removeContent(Element element) {} 
    public boolean   removeContent(CDATA cdata) {} 
    public boolean   removeContent(ProcessingInstruction pi) {} 
    public boolean   removeContent(EntityRef entity) {} 
    public boolean   removeContent(Comment comment) {} 
    
    public List      getAttributes() {} 
    public Attribute getAttribute(String name) {} 
    public Attribute getAttribute(String name, Namespace ns) {} 
    public String    getAttributeValue(String name) {} 
    public String    getAttributeValue(String name, Namespace ns) {} 
    public Element   setAttribute(Attribute attribute) {} 
    public Element   setAttributes(List attributes) {} 
    public boolean   removeAttribute(String name) {} 
    public boolean   removeAttribute(String name, Namespace ns) {} 

    public void addNamespaceDeclaration(Namespace additionalNamespace) {}
    public void removeNamespaceDeclaration(Namespace additionalNamespace) {}
    public List getAdditionalNamespaces() {}

    public Element detach() {}
    
    ///////////////////////////////////////
    // Basic Utility Methods
    /////////////////////////////////////// 
    public final String  toString() {}
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}
    
}

Element Example: XCount

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;


public class XCount {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java XCount URL1 URL2..."); 
      return;
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    System.out.println(
     "File\tElements\tAttributes\tComments\tProcessing Instructions\tCharacters");
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        System.out.print(args[i] + ":\t");
        String result = count(doc);
        System.out.println(result);
      }
      catch (JDOMException e) { // indicates a well-formedness or other error
        System.out.println(args[i] + " is not a well formed XML document.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }  

  private static int numCharacters             = 0;
  private static int numComments               = 0;
  private static int numElements               = 0;
  private static int numAttributes             = 0;
  private static int numProcessingInstructions = 0;
      
  public static String count(Document doc) {

    numCharacters = 0;
    numComments = 0;
    numElements = 0;
    numAttributes = 0;
    numProcessingInstructions = 0;  

    List children = doc.getMixedContent();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Element) {
        numElements++;
        count((Element) o);
      }
      else if (o instanceof Comment) numComments++;
      else if (o instanceof ProcessingInstruction) numProcessingInstructions++;   
    }
    
    String result = numElements + "\t" + numAttributes + "\t" 
     + numComments + "\t" + numProcessingInstructions + "\t" + numCharacters;
    return result;
       
  }     

  public static void count(Element element) {

    List attributes = element.getAttributes();
    numAttributes += attributes.size();
    List children = element.getMixedContent();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Element) {
        numElements++;
        count((Element) o);
      }
      else if (o instanceof Comment) numComments++;
      else if (o instanceof ProcessingInstruction) numProcessingInstructions++;   
      else if (o instanceof String) {
        String s = (String) o;
        numCharacters += s.length();
      }   
    }
        
  }  

}

XCount Output

% java XCount shortlogs.xml hotcop.xml
File    Elements        Attributes      Comments        Processing Instructions Characters
shortlogs.xml:  30      0       0       0       736
hotcop.xml:     11      8       2       1       95

Handling Attributes in JDOM

Each attribute is represented as an Attribute object
Each Attribute has:
- A local name, a String
- A value, a String
- A Namespace object (which may be Namespace.NO_NAMESPACE)
Everything else can be determined from these three items.

Convenience methods can convert the attribute value to various types like int or double
JDOM enforces restrictions on attribute names and values; e.g. value may not contain < or >
Attributes are stored in a java.util.List in the Element that contains them
This list only contains Attribute objects.
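
The convenience conversions come in two flavors, one that throws when the value cannot be converted and one that falls back to a default. The sketch below illustrates that pattern on a plain String. It is an assumption-laden stand-in, not JDOM's code: the class AttributeValues is hypothetical, and NumberFormatException takes the place of JDOM's DataConversionException.

```java
final class AttributeValues {

  // Strict form: fails loudly if the value is not an integer
  static int intValue(String value) {
    return Integer.parseInt(value.trim());
  }

  // Lenient form: returns the supplied default instead of throwing
  static int intValue(String value, int defaultValue) {
    if (value == null) return defaultValue;
    try {
      return Integer.parseInt(value.trim());
    } catch (NumberFormatException e) {
      return defaultValue;
    }
  }
}
```

For example, intValue("100") returns 100, while intValue("6:20", -1) returns -1 because "6:20" is not an integer. The lenient form suits optional attributes; the strict form suits attributes whose absence or corruption should stop processing.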

The Attribute Class

package org.jdom;

public class Attribute implements Serializable, Cloneable {

    protected String    name;
    protected Namespace namespace;
    protected String    value;
    protected Element   parent;

    protected Attribute() {}
    public    Attribute(String name, String value) {}
    public    Attribute(String name, String value, Namespace namespace) {}

    public String    getName() {}
    public Attribute setName(String name) {}
    public String    getQualifiedName() {}
    public String    getNamespacePrefix() {}
    public String    getNamespaceURI() {}
    public Namespace getNamespace() {}
    public String    getValue() {}
    public Attribute setValue(String value) {}
    protected Attribute setParent(Element parent) {}
    
    public Attribute detach() {}

    /////////////////////////////////////////////////////////////////
    // Basic Utility Methods
    /////////////////////////////////////////////////////////////////

    public final String  toString() {}
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}

    /////////////////////////////////////////////////////////////////
    // Convenience Methods below here
    /////////////////////////////////////////////////////////////////

    public String  getValue(String defaultValue) {}
    public int     getIntValue(int defaultValue) {}
    public int     getIntValue() throws DataConversionException {}
    public long    getLongValue(long defaultValue) {}
    public long    getLongValue() throws DataConversionException {}
    public float   getFloatValue(float defaultValue) {}
    public float   getFloatValue() throws DataConversionException {}
    public double  getDoubleValue(double defaultValue) {}
    public double  getDoubleValue() throws DataConversionException {}
    public boolean getBooleanValue(boolean defaultValue) {}
    public boolean getBooleanValue() throws DataConversionException {}
    public char    getCharValue(char defaultValue) {}
    public char    getCharValue() throws DataConversionException {}

}

IDTagger

import java.io.IOException;
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import java.util.*;


public class JDOMIDTagger {

  private static int id = 1;

  public static void processElement(Element element) {

    if (element.getAttribute("ID") == null) {
      element.setAttribute(new Attribute("ID", "_" + id));
      id = id + 1; 
    }
    
    // recursion
    List children = element.getChildren();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      processElement((Element) iterator.next());   
    }
    
  }

  public static void main(String[] args) {
     
    SAXBuilder builder = new SAXBuilder();
    
    for (int i = 0; i < args.length; i++) {
        
      try {
        // Read the entire document into memory
        Document document = builder.build(args[i]); 
       
        processElement(document.getRootElement());
        
        // now we serialize the document...
        XMLOutputter serializer = new XMLOutputter(); 
        serializer.output(document, System.out);
        System.out.flush();	        
      }
      catch (JDOMException e) {
        System.err.println(e);
        continue; 
      }
      catch (IOException e) {
        System.err.println(e);
        continue; 
      }
      
    }
  
  } // end main

}

Before IDTagger

<?xml version="1.0"?><backslash
xmlns:backslash="http://slashdot.org/backslash.dtd">

 <story>
    <title>The Onion to buy the New York Times</title>
    <url>http://slashdot.org/articles/00/02/19/1128240.shtml</url>
    <time>2000-02-19 17:25:15</time>
    <author>CmdrTaco</author>
    <department>stuff-to-read</department>
    <topic>media</topic>
    <comments>20</comments>
    <section>articles</section>
    <image>topicmedia.gif</image>
  </story>
 <story>
    <title>Al Gore's Webmaster Answers Your Questions</title>
    <url>http://slashdot.org/interviews/00/02/19/0932207.shtml</url>
    <time>2000-02-19 17:00:52</time>
    <author>Roblimo</author>
    <department>political-process-online</department>
    <topic>usa</topic>
    <comments>49</comments>
    <section>interviews</section>
    <image>topicus.gif</image>
  </story>
 <story>
    <title>Open Source Africa</title>
    <url>http://slashdot.org/articles/00/02/19/1016216.shtml</url>
    <time>2000-02-19 16:05:58</time>
    <author>emmett</author>
    <department>songs-by-toto</department>
    <topic>linux</topic>
    <comments>50</comments>
    <section>articles</section>
    <image>topiclinux.gif</image>
  </story>
 <story>
    <title>Microsoft Funded by NSA, Helps Spy on Win Users?</title>
    <url>http://slashdot.org/articles/00/02/19/0750247.shtml</url>
    <time>2000-02-19 14:07:04</time>
    <author>Roblimo</author>
    <department>deep-dark-conspiracy-theories</department>
    <topic>microsoft</topic>
    <comments>154</comments>
    <section>articles</section>
    <image>topicms.gif</image>
  </story>
 <story>
    <title>X-Men Trailer Released</title>
    <url>http://slashdot.org/articles/00/02/18/0829209.shtml</url>
    <time>2000-02-19 13:47:06</time>
    <author>emmett</author>
    <department>mutant</department>
    <topic>movies</topic>
    <comments>70</comments>
    <section>articles</section>
    <image>topicmovies.gif</image>
  </story>
 <story>
    <title>Connell Replies to "Grok" Comments</title>
    <url>http://slashdot.org/articles/00/02/18/202240.shtml</url>
    <time>2000-02-19 05:01:37</time>
    <author>Hemos</author>
    <department>replying-to-things</department>
    <topic>linux</topic>
    <comments>197</comments>
    <section>articles</section>
    <image>topiclinux.gif</image>
  </story>
 <story>
    <title>etoy.com Returns</title>
    <url>http://slashdot.org/yro/00/02/18/1739216.shtml</url>
    <time>2000-02-19 02:35:06</time>
    <author>nik</author>
    <department>NP:-gimme-shelter</department>
    <topic>internet</topic>
    <comments>77</comments>
    <section>yro</section>
    <image>topicinternet.jpg</image>
  </story>
 <story>
    <title>New Propaganda Series: Rebirth</title>
    <url>http://slashdot.org/articles/00/02/18/205232.shtml</url>
    <time>2000-02-19 01:05:26</time>
    <author>Hemos</author>
    <department>as-pretty-as-always</department>
    <topic>graphics</topic>
    <comments>120</comments>
    <section>articles</section>
    <image>topicgraphics3.gif</image>
  </story>
 <story>
    <title>Giving Back</title>
    <url>http://slashdot.org/features/00/02/18/1631224.shtml</url>
    <time>2000-02-18 22:27:26</time>
    <author>emmett</author>
    <department>salvation-army</department>
    <topic>news</topic>
    <comments>122</comments>
    <section>features</section>
    <image>topicnews.gif</image>
  </story>
 <story>
    <title>Connectix Considering Open Sourcing VGS?</title>
    <url>http://slashdot.org/articles/00/02/18/1050225.shtml</url>
    <time>2000-02-18 20:46:20</time>
    <author>emmett</author>
    <department>grain-of-salt</department>
    <topic>news</topic>
    <comments>93</comments>
    <section>articles</section>
    <image>topicnews.gif</image>
  </story>
</backslash>

After IDTagger

<?xml version="1.0" encoding="UTF-8"?>
<backslash ID="_1">
  <story ID="_2">
    <title ID="_3">The Onion to buy the New York Times</title>
    <url ID="_4">http://slashdot.org/articles/00/02/19/1128240.shtml</url>
    <time ID="_5">2000-02-19 17:25:15</time>
    <author ID="_6">CmdrTaco</author>
    <department ID="_7">stuff-to-read</department>
    <topic ID="_8">media</topic>
    <comments ID="_9">20</comments>
    <section ID="_10">articles</section>
    <image ID="_11">topicmedia.gif</image>
  </story>
  <story ID="_12">
    <title ID="_13">Al Gore's Webmaster Answers Your Questions</title>
    <url ID="_14">http://slashdot.org/interviews/00/02/19/0932207.shtml</url>
    <time ID="_15">2000-02-19 17:00:52</time>
    <author ID="_16">Roblimo</author>
    <department ID="_17">political-process-online</department>
    <topic ID="_18">usa</topic>
    <comments ID="_19">49</comments>
    <section ID="_20">interviews</section>
    <image ID="_21">topicus.gif</image>
  </story>
  <story ID="_22">
    <title ID="_23">Open Source Africa</title>
    <url ID="_24">http://slashdot.org/articles/00/02/19/1016216.shtml</url>
    <time ID="_25">2000-02-19 16:05:58</time>
    <author ID="_26">emmett</author>
    <department ID="_27">songs-by-toto</department>
    <topic ID="_28">linux</topic>
    <comments ID="_29">50</comments>
    <section ID="_30">articles</section>
    <image ID="_31">topiclinux.gif</image>
  </story>
  <story ID="_32">
    <title ID="_33">Microsoft Funded by NSA, Helps Spy on Win Users?</title>
    <url ID="_34">http://slashdot.org/articles/00/02/19/0750247.shtml</url>
    <time ID="_35">2000-02-19 14:07:04</time>
    <author ID="_36">Roblimo</author>
    <department ID="_37">deep-dark-conspiracy-theories</department>
    <topic ID="_38">microsoft</topic>
    <comments ID="_39">154</comments>
    <section ID="_40">articles</section>
    <image ID="_41">topicms.gif</image>
  </story>
  <story ID="_42">
    <title ID="_43">X-Men Trailer Released</title>
    <url ID="_44">http://slashdot.org/articles/00/02/18/0829209.shtml</url>
    <time ID="_45">2000-02-19 13:47:06</time>
    <author ID="_46">emmett</author>
    <department ID="_47">mutant</department>
    <topic ID="_48">movies</topic>
    <comments ID="_49">70</comments>
    <section ID="_50">articles</section>
    <image ID="_51">topicmovies.gif</image>
  </story>
  <story ID="_52">
    <title ID="_53">Connell Replies to "Grok" Comments</title>
    <url ID="_54">http://slashdot.org/articles/00/02/18/202240.shtml</url>
    <time ID="_55">2000-02-19 05:01:37</time>
    <author ID="_56">Hemos</author>
    <department ID="_57">replying-to-things</department>
    <topic ID="_58">linux</topic>
    <comments ID="_59">197</comments>
    <section ID="_60">articles</section>
    <image ID="_61">topiclinux.gif</image>
  </story>
  <story ID="_62">
    <title ID="_63">etoy.com Returns</title>
    <url ID="_64">http://slashdot.org/yro/00/02/18/1739216.shtml</url>
    <time ID="_65">2000-02-19 02:35:06</time>
    <author ID="_66">nik</author>
    <department ID="_67">NP:-gimme-shelter</department>
    <topic ID="_68">internet</topic>
    <comments ID="_69">77</comments>
    <section ID="_70">yro</section>
    <image ID="_71">topicinternet.jpg</image>
  </story>
  <story ID="_72">
    <title ID="_73">New Propaganda Series: Rebirth</title>
    <url ID="_74">http://slashdot.org/articles/00/02/18/205232.shtml</url>
    <time ID="_75">2000-02-19 01:05:26</time>
    <author ID="_76">Hemos</author>
    <department ID="_77">as-pretty-as-always</department>
    <topic ID="_78">graphics</topic>
    <comments ID="_79">120</comments>
    <section ID="_80">articles</section>
    <image ID="_81">topicgraphics3.gif</image>
  </story>
  <story ID="_82">
    <title ID="_83">Giving Back</title>
    <url ID="_84">http://slashdot.org/features/00/02/18/1631224.shtml</url>
    <time ID="_85">2000-02-18 22:27:26</time>
    <author ID="_86">emmett</author>
    <department ID="_87">salvation-army</department>
    <topic ID="_88">news</topic>
    <comments ID="_89">122</comments>
    <section ID="_90">features</section>
    <image ID="_91">topicnews.gif</image>
  </story>
  <story ID="_92">
    <title ID="_93">Connectix Considering Open Sourcing VGS?</title>
    <url ID="_94">http://slashdot.org/articles/00/02/18/1050225.shtml</url>
    <time ID="_95">2000-02-18 20:46:20</time>
    <author ID="_96">emmett</author>
    <department ID="_97">grain-of-salt</department>
    <topic ID="_98">news</topic>
    <comments ID="_99">93</comments>
    <section ID="_100">articles</section>
    <image ID="_101">topicnews.gif</image>
  </story>
</backslash>

Handling Entities in JDOM

Unparsed entities really aren't handled at all.
Most of the time, the parser resolves general entity references and you never see them.
If the parser doesn't resolve a general entity reference, an EntityRef object will be left in the tree.
When writing, the outputter outputs entity references but not the entity's content.
This one is still being thought out.

The EntityRef Class

package org.jdom;

public class EntityRef implements Serializable, Cloneable {

    protected String name;
    protected String publicID;
    protected String systemID;
    protected Element parent;
    protected Document document;

    protected EntityRef() {}
    public EntityRef(String name) {}
    public EntityRef(String name, String publicID, String systemID) {}
    
    public EntityRef detach() {}
    
    public Document  getDocument() {}
    public String    getName() {}
    public Element   getParent() {}
    public String    getPublicID()  {}
    public String    getSystemID() {}

    protected EntityRef setParent(Element parent) {}
    public    EntityRef setName(String newName) {}
    public    EntityRef setPublicID(String newPublicID) {}
    public    EntityRef setSystemID(String newSystemID) {}

    public Object clone() {}
    public final boolean equals(Object o) {}
    public final int hashCode() {}
    public String toString() {}
    
}

Handling Comments in JDOM

A Comment object represents a comment like this example from the XML 1.0 spec:

<!--* N.B. some readers (notably JC) find the following
paragraph awkward and redundant.  I agree it's logically redundant:
it *says* it is summarizing the logical implications of
matching the grammar, and that means by definition it's
logically redundant.  I don't think it's rhetorically
redundant or unnecessary, though, so I'm keeping it.  It
could however use some recasting when the editors are feeling
stronger. -MSM *-->

No children
JDOM checks the content to make sure it's legal (i.e. does not contain a double-hyphen)
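
The legality check can be sketched in plain Java. This is only a minimal sketch, not JDOM's implementation: the class CommentCheck and the method checkCommentText are hypothetical names, and the trailing-hyphen rule is taken from the comment grammar in XML 1.0 rather than from anything stated here about JDOM.

```java
// Hypothetical helper, not part of JDOM: a sketch of a comment legality check.
class CommentCheck {

    // Returns null if the text is legal comment content,
    // otherwise a short description of the problem.
    static String checkCommentText(String text) {
        if (text == null) {
            return "Comment text cannot be null";
        }
        if (text.indexOf("--") != -1) {
            return "Comments cannot contain a double hyphen (--)";
        }
        // XML 1.0 grammar: a comment may not end with a hyphen either,
        // because that would produce "--->" in the serialized form
        if (text.endsWith("-")) {
            return "Comments cannot end with a hyphen";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(checkCommentText(" a legal comment "));
        System.out.println(checkCommentText(" not -- legal "));
    }

}
```

Returning a reason string instead of a boolean lets the caller put the explanation straight into an exception message.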

The Comment Class

package org.jdom;

public class Comment implements Serializable, Cloneable {

    protected String text;

    protected Comment() {}
    public    Comment(String text) {}
    
    public String     getText() {}
    public void       setText(String text) {}
    public Comment    detach() {}
    public Document   getDocument() {}
    protected Comment setDocument(Document document) {}
    public Element    getParent() {}
    protected Comment setParent(Element parent){}
    
    public final String  toString() {}
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}

}

Comment Example

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;


public class CommentReader {

  public static void main(String[] args) {
     
    SAXBuilder builder = new SAXBuilder();
    
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        Document doc = builder.build(args[i]);
        List content = doc.getMixedContent();
        Iterator iterator = content.iterator();
        while (iterator.hasNext()) {
          Object o = iterator.next();
          if (o instanceof Comment) {
            Comment c = (Comment) o;
            System.out.println(c.getText());     
            System.out.println();     
          }
          else if (o instanceof Element) {
            processElement((Element) o);   
          }
        }
      }
      catch (JDOMException e) {
        System.err.println(e); 
        // the root cause may be null if no other exception was wrapped
        if (e.getRootCause() != null) e.getRootCause().printStackTrace(); 
      }
      
    }
  
  } // end main

  // note use of recursion
  public static void processElement(Element element) {
    
    List content = element.getMixedContent();
    Iterator iterator = content.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Comment) {
        Comment c = (Comment) o;
        System.out.println(c.getText());     
        System.out.println();     
      }
      else if (o instanceof Element) {
        processElement((Element) o);   
      }
    } // end while
    
  }

}

CommentReader Output

% java CommentReader hotcop.xml
 The publisher is actually Polygram but I needed
       an example of a general entity reference.

 You can tell what album I was
     listening to when I wrote this example

Or try http://www.w3.org/TR/1998/REC-xml-19980210.xml for more interesting output.

ProcessingInstruction Nodes

Represents a processing instruction like
<?robots index="yes" follow="no"?>
No children

Some have pseudo-attributes; some don't:

<?php 
  mysql_connect("database.unc.edu", "clerk", "password"); 
  $result = mysql("music", "SELECT LastName, FirstName  
    FROM Employees ORDER BY LastName, FirstName"); 
  $i = 0;
  while ($i < mysql_numrows ($result)) {
     $fields = mysql_fetch_row($result);
     echo "<person>$fields[1] $fields[0] </person>\r\n";
     $i++;
  }
  mysql_close();
?>

A ProcessingInstruction is represented as either
- Target and Value
- Target and Pseudo-attributes
As usual JDOM checks the contents of each ProcessingInstruction object for well-formedness
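
To illustrate the second representation, here is a rough sketch of how pseudo-attributes might be pulled out of a processing instruction's data using only the JDK. The class PseudoAttributes and its parse() method are hypothetical and are not JDOM's code. The pattern is deliberately simple and assumes ASCII names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JDOM.
class PseudoAttributes {

    // Matches name="value" or name='value' (simplified: ASCII names only)
    private static final Pattern PAIR = Pattern.compile(
        "([A-Za-z_][\\w.-]*)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    // Returns the pseudo-attributes found in the data, in document order.
    // Data with no name="value" pairs yields an empty map.
    static Map<String, String> parse(String data) {
        Map<String, String> result = new LinkedHashMap<String, String>();
        Matcher matcher = PAIR.matcher(data);
        while (matcher.find()) {
            // group 3 is a double-quoted value, group 4 a single-quoted one
            String value = matcher.group(3) != null
                ? matcher.group(3) : matcher.group(4);
            result.put(matcher.group(1), value);
        }
        return result;
    }

    public static void main(String[] args) {
        // the data of <?robots index="yes" follow="no"?>
        System.out.println(parse("index=\"yes\" follow=\"no\""));
    }

}
```

A processing instruction such as the PHP example has no pseudo-attribute structure, so the target-and-value view is the one that fits it.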

The ProcessingInstruction Class

package org.jdom;

public class ProcessingInstruction implements Serializable, Cloneable {

    protected String target;
    protected String rawData;
    protected Map    mapData;
    protected Document document;
    protected Element parent;
    
    protected ProcessingInstruction() {}
    public    ProcessingInstruction(String target, Map data) {}
    public    ProcessingInstruction(String target, String data) {}
    
    public String                getTarget() {}
    public String                getData() {}
    public ProcessingInstruction setData(String data) {}
    public ProcessingInstruction setData(Map data) {}
    public String                getValue(String name) {}
    public ProcessingInstruction setValue(String name, String value) {}
    public boolean               removeValue(String name) {}

    public    Document              getDocument() {}
    protected ProcessingInstruction setDocument(Document document) {}
    public    Element               getParent() {}
    protected ProcessingInstruction setParent(Element parent){}

    public final String  toString() {}
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}
}

XLinkSpider that Respects the robots Processing Instruction

import java.io.*;
import java.util.*;
import org.jdom.*;
import org.jdom.input.SAXBuilder;


public class XLinkSpider {

  private static SAXBuilder builder = new SAXBuilder();
  
  private static Vector visited = new Vector();
  
  private static int maxDepth = 5;
  private static int currentDepth = 0; 
  
  public static void listURIs(String systemID) {
    
    currentDepth++;
    try {
      if (currentDepth < maxDepth) {

        Document document = builder.build(systemID); 
                
        // check to see if we're allowed to spider
        boolean index = true;
        boolean follow = true;
        ProcessingInstruction robots 
         = document.getProcessingInstruction("robots");
        if (robots != null) {
          String indexValue = robots.getValue("index");
          // the pseudo-attribute may be missing, so compare null-safely
          if ("no".equalsIgnoreCase(indexValue)) index = false;
          String followValue = robots.getValue("follow");
          if ("no".equalsIgnoreCase(followValue)) follow = false;
        }
        Vector uris = new Vector();
        // search the document for uris, 
        // store them in vector, and print them
        if (follow) searchForURIs(document.getRootElement(), uris);
    
        Enumeration e = uris.elements();
        while (e.hasMoreElements()) {
          String uri = (String) e.nextElement();
          visited.addElement(uri);
          if (index) listURIs(uri); 
        }
      
      }
    
    }
    catch (JDOMException e) {
      // couldn't load the document, 
      // probably not well-formed XML, skip it 
    }
    finally { 
      currentDepth--;
      System.out.flush();     
    }
      
  }
  
  private static Namespace xlink 
   = Namespace.getNamespace("http://www.w3.org/1999/xlink");
  
  // use recursion 
  public static void searchForURIs(Element element, Vector uris) {
    
    // look for XLinks in this element
    String uri = element.getAttributeValue("href", xlink);
    if (uri != null && !uri.equals("") 
     && !visited.contains(uri) && !uris.contains(uri)) {
      System.out.println(uri);
      uris.addElement(uri);
    }
    
    // process child elements recursively
    List children = element.getChildren();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      searchForURIs((Element) iterator.next(), uris); 
    }
    
  }

  public static void main(String[] args) {
    
    if (args.length == 0) {
      System.out.println("Usage: java XLinkSpider URL1 URL2..."); 
    } 
      
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      System.err.println(args[i]); 
      listURIs(args[i]);
    } // end for
  
  } // end main

} // end XLinkSpider

Handling Namespaces

JDOM is fully namespace aware
Namespaces are represented by instances of the Namespace class rather than by attributes or raw strings
Always ask for elements and attributes by local names and namespace URIs
Elements and attributes that are not in any namespace can be asked for by local name alone
Never identify an element or attribute by qualified name
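
The reason for the last rule is that the prefix is arbitrary; only the local name and namespace URI identify an element. The sketch below makes the point with the JDK's built-in DOM rather than JDOM, so that it runs without extra libraries. The class NamespaceLookup and the method countTitles are hypothetical names; the namespace URI is the one used in the song example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses JAXP and DOM, not JDOM.
class NamespaceLookup {

    static final String SONG_NS = "http://www.ibiblio.org/xml/namespace/song";

    // Counts TITLE elements in the song namespace, whatever their prefix
    static int countTitles(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers default to false
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagNameNS(SONG_NS, "TITLE").getLength();
        }
        catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String unprefixed = "<SONG xmlns='" + SONG_NS
         + "'><TITLE>Hot Cop</TITLE></SONG>";
        String prefixed = "<s:SONG xmlns:s='" + SONG_NS
         + "'><s:TITLE>Hot Cop</s:TITLE></s:SONG>";
        String noNamespace = "<SONG><TITLE>Hot Cop</TITLE></SONG>";
        System.out.println(countTitles(unprefixed));
        System.out.println(countTitles(prefixed));
        System.out.println(countTitles(noNamespace));
    }

}
```

The first two documents differ only in prefix and are found by the same query; the third uses the same tag names but is in no namespace, so the query does not match it.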

The Namespace Class

Mostly for internal parser use
Occasionally useful for tasks like finding out whether a document contains any XLinks

The Namespace Class

package org.jdom;

public final class Namespace {

  public static final Namespace NO_NAMESPACE = new Namespace("", "");
  public static final Namespace XML_NAMESPACE = 
   new Namespace("xml", "http://www.w3.org/XML/1998/namespace");

  // factory methods
  public static Namespace getNamespace(String prefix, String uri) {}
  public static Namespace getNamespace(String uri) {}

  // getter methods
  public String  getPrefix() {}
  public String  getURI() {}

  // utility methods
  public boolean equals(Object ob) {}
  public String  toString() {}
  public int     hashCode() {}

}

DocType Nodes

Represents a document type declaration
Has no children

The DocType class

package org.jdom;

public class DocType implements Serializable, Cloneable {

    protected String elementName;
    protected String publicID;
    protected String systemID;

    protected DocType() {}
    public    DocType(String rootElementName, String publicID, String systemID) {}
    public    DocType(String rootElementName, String systemID) {}
    public    DocType(String rootElementName) {}

    public String  getElementName() {}
    public String  getPublicID() {}
    public DocType setPublicID(String publicID) {}
    public String  getSystemID() {}
    public DocType setSystemID(String systemID) {}

    // Usual utility methods
    public final String  toString() {}
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}
    
}

Example of the DocType Class

Verify that a document is correct XHTML
From the XHTML 1.0 spec:
1. It must validate against one of the three DTDs found in Appendix A.
2. The root element of the document must be <html>.
3. The root element of the document must designate the XHTML namespace using the xmlns attribute [XMLNAMES]. The namespace for XHTML is defined to be http://www.w3.org/1999/xhtml.
4. There must be a DOCTYPE declaration in the document prior to the root element. The public identifier included in the DOCTYPE declaration must reference one of the three DTDs found in Appendix A using the respective Formal Public Identifier. The system identifier may be changed to reflect local system conventions.
```
<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "DTD/xhtml1-strict.dtd">

<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "DTD/xhtml1-transitional.dtd">

<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
     "DTD/xhtml1-frameset.dtd">
```

XHTMLValidator

import java.io.*;
import org.jdom.*;
import org.jdom.input.SAXBuilder;


public class JDOMXHTMLValidator {

  public static void main(String[] args) {
    
    for (int i = 0; i < args.length; i++) {
      validate(args[i]);
    }   
    
  }

  private static SAXBuilder builder = new SAXBuilder(true);
                                                 /*  ^^^^ */
                                              /* turn on validation  */
  
  // not thread safe
  public static void validate(String source) {
        
      Document document;
      try {
        document = builder.build(source); 
      }
      catch (JDOMException e) {  
        System.out.println("Error: " + e.getMessage()); 
        e.printStackTrace();
        return; 
      }
      
      // If we get this far, then the document is valid XML.
      // Check to see whether the document is actually XHTML        
      DocType doctype = document.getDocType();
    
      if (doctype == null) {
        System.out.println("No DOCTYPE"); 
        return;
      }

      String name     = doctype.getElementName();
      String systemID = doctype.getSystemID();
      String publicID = doctype.getPublicID();
      
      if (!name.equals("html")) {
        System.out.println("Incorrect root element name " + name); 
      }
    
      if (publicID == null
       || (!publicID.equals("-//W3C//DTD XHTML 1.0 Strict//EN")
           && !publicID.equals("-//W3C//DTD XHTML 1.0 Transitional//EN")
           && !publicID.equals("-//W3C//DTD XHTML 1.0 Frameset//EN"))) {
        System.out.println(source + " does not seem to use an XHTML 1.0 DTD");
      }
    
      // Check the namespace on the root element
      Element root = document.getRootElement();
      Namespace namespace = root.getNamespace();
      String prefix = namespace.getPrefix();
      String uri = namespace.getURI();
      if (!uri.equals("http://www.w3.org/1999/xhtml")) {
        System.out.println(source 
         + " does not properly declare the"
         + " http://www.w3.org/1999/xhtml namespace"
         + " on the root element");        
      }
      if (!prefix.equals("")) {
        System.out.println(source 
         + " does not use the empty prefix for XHTML");        
      }
    
  }

}

Using the XHTMLValidator

% java JDOMXHTMLValidator http://www.w3.org/TR/xhtml1
Error: File "http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not found.: Error on 
line -1 of XML document: File "http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not 
found.
org.jdom.JDOMException: File "http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not 
found.: Error on line -1 of XML document: File 
"http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not found.
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:227)
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:359)
        at XHTMLValidator.validate(XHTMLValidator.java:25)
        at XHTMLValidator.main(XHTMLValidator.java:11)
Root cause: org.jdom.JDOMException: Error on line -1 of XML document: File 
"http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not found.
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:228)
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:359)
        at XHTMLValidator.validate(XHTMLValidator.java:25)
        at XHTMLValidator.main(XHTMLValidator.java:11)

The Verifier Class

Checks a variety of strings to see if they're legal for particular uses in XML as specified by XML 1.0 and Namespaces in XML.
Mostly for internal parser use
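
As an illustration of the kind of check involved, the following sketch tests text against the Char production of XML 1.0. It is not the Verifier source: the class XmlChars is hypothetical, it works on code points rather than char values, and it assumes the convention that a check method returns null for legal input and a reason string otherwise.

```java
// Hypothetical sketch of a character check; not JDOM's Verifier.
class XmlChars {

    // XML 1.0 production [2]:
    // Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
    //        | [#x10000-#x10FFFF]
    static boolean isXmlChar(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Returns null if every character is legal, otherwise a reason.
    // An unpaired surrogate shows up as a code point in the range
    // 0xD800-0xDFFF and is therefore rejected.
    static String checkCharacterData(String text) {
        if (text == null) {
            return "Text cannot be null";
        }
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            if (!isXmlChar(codePoint)) {
                return "0x" + Integer.toHexString(codePoint)
                    + " is not a legal XML character";
            }
            i += Character.charCount(codePoint);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(checkCharacterData("Hot Cop"));
        System.out.println(checkCharacterData("bad" + (char) 1));
    }

}
```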

The Verifier Class

package org.jdom;

public final class Verifier {

    public static final String checkElementName(String name) {}
    public static final String checkAttributeName(String name) {}
    public static final String checkCharacterData(String text) {}
    public static final String checkNamespacePrefix(String prefix) {}
    public static final String checkNamespaceURI(String uri) {}
    public static final String checkProcessingInstructionTarget(String target) {}
    public static final String checkCommentData(String data) {}
 
    public static boolean isXMLCharacter(char c) {}
    public static boolean isXMLNameCharacter(char c) {}
    public static boolean isXMLNameStartCharacter(char c) {}
    public static boolean isXMLLetterOrDigit(char c) {}
    public static boolean isXMLLetter(char c) {}
    public static boolean isXMLCombiningChar(char c) {}
    public static boolean isXMLExtender(char c) {}
    public static boolean isXMLDigit(char c) {}

}

JDOMException

A checked exception so you must catch it
Wraps other exceptions that are thrown during JDOM operations like IOException or SAXException
Root cause of exception (if any) is accessible through the getRootCause() method:
public Throwable getRootCause()
Subclasses:
- DataConversionException
Unchecked exceptions (subclasses of IllegalArgumentException, not of JDOMException):
- IllegalAddException
- IllegalDataException
- IllegalNameException
- IllegalTargetException
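
The wrapping pattern can be sketched without JDOM. In the sketch below WrappingException is a hypothetical stand-in for JDOMException, and load() merely simulates a failure; the point is how a caller reaches the underlying exception and why it should allow for a missing root cause.

```java
import java.io.IOException;

// Hypothetical demo of the wrap-and-unwrap pattern; not JDOM code.
class RootCauseDemo {

    // Stand-in for a checked exception that wraps another exception
    static class WrappingException extends Exception {

        private final Throwable rootCause;

        WrappingException(String message, Throwable rootCause) {
            super(message);
            this.rootCause = rootCause;
        }

        Throwable getRootCause() {
            return rootCause;
        }

    }

    // Simulates a build that fails with a low-level I/O problem
    static void load(String systemID) throws WrappingException {
        try {
            throw new IOException("cannot open " + systemID);
        }
        catch (IOException e) {
            throw new WrappingException("Error loading " + systemID, e);
        }
    }

    // Reports the root cause if there is one, else the wrapper's message
    static String describe(String systemID) {
        try {
            load(systemID);
            return "ok";
        }
        catch (WrappingException e) {
            Throwable cause = e.getRootCause();
            if (cause == null) return e.getMessage();
            return cause.getClass().getName() + ": " + cause.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("song.xml"));
    }

}
```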

JDOMException Class

package org.jdom;

public class JDOMException extends Exception {

    protected Throwable cause;

    public JDOMException() {}
    public JDOMException(String message)  {}
    public JDOMException(String message, Throwable rootCause)  {} 
       
    public String    getMessage() {}
    public void      printStackTrace() {}
    public void      printStackTrace(PrintStream s) {}
    public void      printStackTrace(PrintWriter w) {}
    public Throwable getRootCause()  {}

}

The org.jdom.output Package

DOMOutputter
SAXOutputter
XMLOutputter

Serialization

The process of taking an in-memory JDOM Document and converting it to a stream of characters that can be written onto an output stream
The org.jdom.output.XMLOutputter class

XMLOutputter

package org.jdom.output;

public class XMLOutputter implements Cloneable {

    protected static final String STANDARD_INDENT = "  ";
    
    public XMLOutputter() {}
    public XMLOutputter(String indent) {}
    public XMLOutputter(String indent, boolean newlines) {}
    public XMLOutputter(String indent, boolean newlines, String encoding) {}
    public XMLOutputter(XMLOutputter that) {}
    
    public void setLineSeparator(String separator) {}
    public void setNewlines(boolean newlines) {}
    public void setEncoding(String encoding) {}
    public void setOmitEncoding(boolean omitEncoding) {}
    public void setOmitDeclaration(boolean omitDeclaration) {}
    public void setExpandEmptyElements(boolean expandEmptyElements) {}
    public void setIndent(String indent) {}
    public void setIndent(boolean doIndent) {}
    public void setIndentSize(int indentSize) {}
    public void setTextNormalize(boolean textNormalize) {}

    protected String escapeAttributeEntities(String s) {} 
    protected String escapeElementEntities(String s) {}

    protected void indent(Writer out, int level) throws IOException {}
    protected void maybePrintln(Writer out) throws IOException  {}
    protected Writer makeWriter(OutputStream out) 
     throws java.io.UnsupportedEncodingException {}
    protected Writer makeWriter(OutputStream out, String encoding) 
     throws java.io.UnsupportedEncodingException {}
     
    public void output(Document doc, OutputStream out) throws IOException {}
    public void output(Document doc, Writer writer) throws IOException {}
    public void output(Element element, Writer out) throws IOException {}
    public void output(Element element, OutputStream out) throws IOException {}
    public void output(CDATA cdata, Writer out) throws IOException {}
    public void output(CDATA cdata, OutputStream out) throws IOException {}
    public void output(Comment comment, Writer out) throws IOException {}
    public void output(Comment comment, OutputStream out) throws IOException {}
    public void output(String string, Writer out) throws IOException {}
    public void output(String string, OutputStream out) throws IOException {}
    public void output(EntityRef entity, Writer out) throws IOException {}
    public void output(EntityRef entity, OutputStream out) throws IOException {}
    public void output(ProcessingInstruction processingInstruction, Writer out)
      throws IOException {}
    public void output(ProcessingInstruction processingInstruction, OutputStream out)
     throws IOException {}
     
    public void outputElementContent(Element element, OutputStream out)
     throws IOException {}
    public void outputElementContent(Element element, Writer out)
     throws IOException {}

    public String outputString(Document doc) throws IOException {}
    public String outputString(Element element) throws IOException {}
    public String outputString(CDATA cdata) {}
    public String outputString(Comment comment) {}
    public String outputString(DocType doctype) {}
    public String outputString(EntityRef entity) {}
    public String outputString(ProcessingInstruction pi) {}

    // internal printing methods
    protected void printDeclaration(Document doc, Writer out, String encoding) 
     throws IOException {}    
    protected void printDocType(DocType docType, Writer out) throws IOException {}
    protected void printComment(Comment comment, Writer out, int indentLevel) 
     throws IOException {}
    protected void printProcessingInstruction(ProcessingInstruction pi,
     Writer out, int indentLevel) throws IOException {}
    protected void printCDATASection(CDATA cdata, Writer out, int indentLevel) 
     throws IOException {}
    protected void printElement(Element element, Writer out,
     int indentLevel, NamespaceStack namespaces) throws IOException {}
    protected void printElementContent(Element element, Writer out,
     int indentLevel, NamespaceStack namespaces, List mixedContent) 
     throws IOException {}
    protected void printString(String s, Writer out) throws IOException {}
    protected void printEntity(EntityRef entity, Writer out) throws IOException {}
    protected void printNamespace(Namespace ns, Writer out) throws IOException {}
    protected void printAttributes(List attributes, Element parent, 
     Writer out, NamespaceStack namespaces)  
     throws IOException {}
    
    public int parseArgs(String[] args, int i) {} 
    
}

Using the XMLOutputter Class Directly

Configured with up to three arguments passed to the constructor:

indent
a String added at each level of output; e.g. two spaces or a tab

newlines
a boolean; true to break lines in the output, false for no line breaking. The string used to break lines is set separately with setLineSeparator()

encoding
The name of the encoding to use for output; e.g. UTF-16 or ISO-8859-1
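
The encoding argument matters because the same characters become different bytes in different encodings. The sketch below shows this with the JDK alone; EncodingSketch and encode() are hypothetical names, and wrapping the stream in an OutputStreamWriter is only an assumption about how an outputter might turn an encoding name into a Writer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

// Hypothetical demo; not part of JDOM.
class EncodingSketch {

    // Writes the text through a Writer for the named encoding
    // and returns the bytes that reach the underlying stream.
    static byte[] encode(String text, String encoding) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            Writer writer = new OutputStreamWriter(bytes, encoding);
            writer.write(text);
            writer.flush();
            return bytes.toByteArray();
        }
        catch (IOException e) {
            // also covers UnsupportedEncodingException for unknown names
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // four characters, one of them non-ASCII
        String[] encodings = {"ISO-8859-1", "UTF-8", "UTF-16"};
        for (int i = 0; i < encodings.length; i++) {
            System.out.println(encodings[i] + ": "
             + encode(text, encodings[i]).length + " bytes");
        }
    }

}
```

Whatever encoding is chosen for the bytes should also be the one named in the XML declaration, or parsers will misread the document.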

Options can be set with these 10 methods:

    public void setLineSeparator(String separator) {}
    public void setNewlines(boolean newlines) {}
    public void setEncoding(String encoding) {}
    public void setOmitEncoding(boolean omitEncoding) {}
    public void setOmitDeclaration(boolean omitDeclaration) {}
    public void setExpandEmptyElements(boolean expandEmptyElements) {}
    public void setIndent(String indent) {}
    public void setIndent(boolean doIndent) {}
    public void setIndentSize(int indentSize) {}
    public void setTextNormalize(boolean textNormalize) {}

The output() method writes a Document onto a given OutputStream or Writer:

  public void output(Document doc, OutputStream out) throws IOException {}
  public void output(Document doc, Writer writer) throws IOException {}

There are also output() methods for other JDOM classes:

  public void output(Element element, Writer out) throws IOException {}
  public void output(Element element, OutputStream out) throws IOException {}
  public void outputElementContent(Element element, Writer out) throws IOException {}
  public void output(CDATA cdata, Writer out) throws IOException {}
  public void output(CDATA cdata, OutputStream out) throws IOException {}
  public void output(Comment comment, Writer out) throws IOException {}
  public void output(Comment comment, OutputStream out) throws IOException {}
  public void output(String string, Writer out) throws IOException {}
  public void output(String string, OutputStream out) throws IOException {}
  public void output(EntityRef entity, Writer out) throws IOException {}
  public void output(EntityRef entity, OutputStream out) throws IOException {}
  public void output(ProcessingInstruction processingInstruction, Writer out)
    throws IOException {}
  public void output(ProcessingInstruction processingInstruction, OutputStream out)
   throws IOException {}
  public String outputString(Document doc) throws IOException {}
  public String outputString(Element element) throws IOException {}

Use the outputString() methods to store a document in a string

Using the XMLOutputter Class Indirectly

Configured by overriding protected methods:

  protected void printDeclaration(Document doc, Writer out, String encoding) 
  throws IOException {}    
  protected void printDocType(DocType docType, Writer out) throws IOException {}
  protected void printComment(Comment comment, Writer out, int indentLevel) 
   throws IOException {}
  protected void printProcessingInstruction(ProcessingInstruction pi,
   Writer out, int indentLevel) throws IOException {}
  protected void printCDATASection(CDATA cdata, Writer out, int indentLevel) 
   throws IOException {}
  protected void printElement(Element element, Writer out,
   int indentLevel, NamespaceStack namespaces) throws IOException {}
  protected void printElementContent(Element element, Writer out,
   int indentLevel, NamespaceStack namespaces, List mixedContent) 
   throws IOException {}
  protected void printString(String s, Writer out) throws IOException {}
  protected void printEntity(EntityRef entity, Writer out) throws IOException {}
  protected void printNamespace(Namespace ns, Writer out) throws IOException {}
  protected void printAttributes(List attributes, Element parent, 
   Writer out, NamespaceStack namespaces)  
   throws IOException {}

JDOM based TagStripper

A bug in the current version of JDOM prevents this from working.

import org.jdom.*;
import org.jdom.output.XMLOutputter;
import org.jdom.input.SAXBuilder;
import java.io.*;
import java.util.*;


public class TagStripper extends XMLOutputter {

  public TagStripper() {
    super();
  }

  // Things we won't print at all
  protected void printDeclaration(Document doc, Writer out, String encoding) {}
  protected void printComment(Comment comment, Writer out, int indentLevel) {}
  protected void printDocType(DocType docType, Writer out) {}
  protected void printProcessingInstruction(ProcessingInstruction pi, 
   Writer out, int indentLevel) {}
  protected void printNamespace(Namespace ns, Writer out) {}
  protected void printAttributes(List attributes, Element parent, 
   Writer out, NamespaceStack namespaces) {}
  
  protected void printElement(Element element, Writer out, 
   int indentLevel, NamespaceStack namespaces) throws IOException {
    
    List content = element.getMixedContent();
    Iterator iterator = content.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof String) {
        out.write((String) o);
        this.maybePrintln(out);
      }
      else if (o instanceof Element) {
        printElement((Element) o, out, indentLevel, namespaces);
      }
    }
          
  }

  // Could easily have put main() method in a separate class
  public static void main(String[] args) {
     
    if (args.length == 0) {
      System.out.println(
       "Usage: java TagStripper URL1 URL2..."); 
    } 
      
    TagStripper stripper = new TagStripper();
    SAXBuilder builder   = new SAXBuilder();
    
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        stripper.output(doc, System.out);
      }
      catch (JDOMException e) { // a well-formedness error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
      catch (IOException e) { // an I/O error while writing the output
        System.out.println(e.getMessage());
      }
      
    }  
  
  }

}

Output from a JDOM based TagStripper

% java TagStripper hotcop.xml
Hot Cop
Jacques Morali
Henri Belolo
Victor Willis
Jacques Morali
A & M Records
6:20
1978
Village People

Talking to DOM Programs

The process of taking an in-memory JDOM Document and converting it to an org.w3c.dom.Document object

The org.jdom.output.DOMOutputter class:

package org.jdom.output;

public class DOMOutputter {

  // Constructors
  public DOMOutputter() {}

  // Outputter methods
  public org.w3c.dom.Document output(Document document) {}
  public org.w3c.dom.Element  output(Element element) {}
  public org.w3c.dom.Element  output(Element element, String domAdapterClass) {}
  public org.w3c.dom.Document output(Document document, String domAdapterClass) {}

  // utility methods
  protected void buildDOMTree(Object content, org.w3c.dom.Document doc, 
   org.w3c.dom.Element current, boolean atRoot, LinkedList namespaces) {}
  public String getXmlnsTagFor(Namespace ns) {}
    
}

Talking to SAX Programs

The process of taking an in-memory JDOM Document and walking its tree while firing off SAX events
The org.jdom.output.SAXOutputter class
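
The idea of walking a tree while firing SAX events can be sketched with nothing but the JDK. This is not the SAXOutputter source. It is a minimal illustration that assumes an org.w3c.dom tree as a stand-in for the JDOM tree, so it runs without JDOM on the class path. The class name TreeToSax and its fire() and trace() methods are hypothetical, and attributes, namespaces, comments, and processing instructions are skipped to keep it short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class; a stand-in for the idea behind SAXOutputter
class TreeToSax {

  // Reports an in-memory document to a SAX ContentHandler
  static void fire(Document doc, ContentHandler handler) throws SAXException {
    handler.startDocument();
    walk(doc.getDocumentElement(), handler);
    handler.endDocument();
  }

  // Depth-first walk: start event, children in document order, end event
  private static void walk(Node node, ContentHandler handler) throws SAXException {
    if (node.getNodeType() == Node.ELEMENT_NODE) {
      String name = node.getNodeName();
      // Assumption: attributes and namespaces are ignored in this sketch
      handler.startElement("", name, name, new AttributesImpl());
      for (Node child = node.getFirstChild(); child != null; 
       child = child.getNextSibling()) {
        walk(child, handler);
      }
      handler.endElement("", name, name);
    }
    else if (node.getNodeType() == Node.TEXT_NODE) {
      char[] text = node.getNodeValue().toCharArray();
      handler.characters(text, 0, text.length);
    }
  }

  // Parses a string, replays the tree as SAX events, and returns a trace of them
  static String trace(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
     .parse(new InputSource(new StringReader(xml)));
    final List<String> events = new ArrayList<String>();
    fire(doc, new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName, String qName, 
       Attributes atts) {
        events.add("start(" + qName + ")");
      }
      @Override
      public void endElement(String uri, String localName, String qName) {
        events.add("end(" + qName + ")");
      }
      @Override
      public void characters(char[] ch, int start, int length) {
        events.add("text(" + new String(ch, start, length) + ")");
      }
    });
    return String.join(" ", events);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(trace("<SONG><TITLE>Hot Cop</TITLE></SONG>"));
  }

}
```

Any existing ContentHandler can be passed to fire() in place of the tracing handler, which is the point of the technique: code written for SAX can consume a document that already lives in memory.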

What JDOM doesn't do

Documents larger than available memory
Byte-for-byte faithful round trips
DTDs
XPath Queries (may be added in 1.1)

To Learn More

JavaWorld: http://javaworld.com/javaworld/jw-05-2000/jw-0518-jdom.html
JDOM Web Site: http://www.jdom.org/
Java and XML, 2nd Edition, Brett McLaughlin, O'Reilly & Associates, 2001, ISBN 0-596-00197-5, http://www.oreilly.com/catalog/javaxml/

Part VI: The Oracle Speaks, Predictions for the Future

XSLT 2.0

Too dependent on schemas
Loses momentum of XSLT 1.0
But succeeds anyway

XQuery

Too early to call
Do we really need native XML databases?

DOM Level 3 succeeds

JDOM succeeds, much to the consternation of the W3C

The triumph of worse is better

Stuff we didn't talk about

XPointers
XLinks
XInclude
XSL-FO
XHTML
Schemas
Schema Repositories
MathML
SVG
Browser support

XInclude succeeds once parsers support it

Schemas, a partial success

Developers need them desperately
Far too complex to be used as broadly as they're needed; experts only
Specification is poorly written, incomplete, and riddled with known problems; recommendation by exhaustion
Will be replaced within ten years, much as Java has replaced C

XLinks

Won't succeed unless and until there's a killer app
First company to define the killer app gets to fill in the holes in the spec over the protests of the W3C and the hypertext community

XPointers; the same story

Won't succeed unless and until there's a killer app
First company to define the killer app gets to fill in the holes in the spec over the protests of the W3C and the hypertext community

XSL-FO

Slow but successful adoption; steady linear growth

XHTML Fails

Too complex
Too little tool support
Too poorly documented
Offers no benefits to web page authors; the only people who benefit are the tool vendors

Schema Repositories all fail

Commerce One
UDDI
BizTalk
xml.org
etc.

MathML succeeds

Mozilla will save this

SVG Takes Off in 2002

Illustrator supports it now
Several browser plug-ins are available
Many tools
We needed this 10 years ago

Browser Support

We won't see reliable browser support for XML until at least 2002
Non-PC devices will become common, necessitating a move to browser-independent layout
Mozilla knocks off IE

Invent the Future!

The best way to predict the future is to invent it.

--Alan Kay

To Learn More

This presentation: http://www.ibiblio.org/xml/slides/xmlonesanjose2001/advancedxml
JDOM Web Site: http://www.jdom.org
XML Infoset Specification: http://www.w3.org/TR/xml-infoset


Surname	FirstName	Team	Position	Games Played	Games Started	AtBats	Runs	Hits	Doubles	Triples	Home runs	RBI	Stolen Bases	Caught Stealing	Sacrifice Hits	Sacrifice Flies	Errors	PB	Walks	Strike outs	Hit by pitch
Anderson	Garret	ANA	Outfield	156	151	622	62	183	41	7	15	79	8	3	3	3	6	0	29	80	1
Baughman	Justin	ANA	Second Base	62	54	196	24	50	9	1	1	20	10	4	5	3	8	0	6	36	1
Bolick	Frank	ANA	Third Base	21	11	45	3	7	2	0	1	2	0	0	0	0	0	0	11	8	0
Disarcina	Gary	ANA	Shortstop	157	155	551	73	158	39	3	3	56	12	7	12	3	14	0	21	51	8
Edmonds	Jim	ANA	Outfield	154	150	599	115	184	42	1	25	91	7	5	1	1	5	0	57	114	1
Erstad	Darin	ANA	Outfield	133	129	537	84	159	39	3	19	82	20	6	1	3	3	0	43	77	6
Garcia	Carlos	ANA	Second Base	19	10	35	4	5	1	0	0	0	2	0	1	0	1	0	3	11	1
Glaus	Troy	ANA	Third Base	48	45	165	19	36	9	0	1	23	1	0	0	2	7	0	15	51	0
Greene	Todd	ANA	Outfield	29	15	71	3	18	4	0	1	7	0	0	0	0	0	0	2	20	0
Helfand	Eric	ANA	Catcher	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
Hollins	Dave	ANA	Third Base	101	98	363	60	88	16	2	11	39	11	3	2	2	17	0	44	69	7
Jefferies	Gregg	ANA	Outfield	19	18	72	7	25	6	0	1	10	1	0	0	0	0	0	0	5	0
Johnson	Mark	ANA	First Base	10	2	14	1	1	0	0	0	0	0	0	0	0	0	0	0	6	0
Kreuter	Chad	ANA	Catcher	96	74	252	27	63	10	1	2	33	1	0	5	1	9	5	33	49	3
Martin	Norberto	ANA	Second Base	79	50	195	20	42	2	0	1	13	3	1	3	2	4	0	6	29	0
Mashore	Damon	ANA	Outfield	43	24	98	13	23	6	0	2	11	1	0	1	0	0	0	9	22	3
Molina	Ben	ANA	Catcher	2	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
Nevin	Phil	ANA	Catcher	75	65	237	27	54	8	1	8	27	0	0	0	2	5	20	17	67	5
O'Brien	Charlie	ANA	Catcher	62	58	175	13	45	9	0	4	18	0	0	3	3	4	1	10	33	2
Palmeiro	Orlando	ANA	Outfield	74	34	165	28	53	7	2	0	21	5	4	7	0	0	0	20	11	0
Pritchett	Chris	ANA	First Base	31	19	80	12	23	2	1	2	8	2	0	0	0	1	0	4	16	0
Salmon	Tim	ANA	Designated Hitter	136	130	463	84	139	28	1	26	88	0	1	0	10	2	0	90	100	3
Shipley	Craig	ANA	Third Base	77	32	147	18	38	7	1	2	17	0	4	4	1	3	0	5	22	5
Velarde	Randy	ANA	Second Base	51	50	188	29	49	13	1	4	26	7	2	0	1	4	0	34	42	1
Walbeck	Matt	ANA	Catcher	108	91	338	41	87	15	2	6	46	1	1	5	5	7	8	30	68	2
Williams	Reggie	ANA	Outfield	29	7	36	7	13	1	0	1	5	3	3	1	0	0	0	7	11	1