Advanced XML

Elliotte Rusty Harold

SDExpo 2000 East

Monday, October 30, 2000

elharo@metalab.unc.edu

http://www.ibiblio.org/xml/


Outline


Part I: XML Infoset

The Infoset is the unfortunate standard to which those in retreat from the radical and most useful implications of well-formedness have rallied. At its core the Infoset insists that there is 'more' to XML than the straightforward syntax of well-formedness. By imposing its canonical semantics the Infoset obviates the infinite other semantic outcomes which might be elaborated in particular unique circumstances from an instance of well-formed XML 1.0 syntax. The question we should be asking is not whether the Infoset has chosen the correct canonical semantics, but whether the syntactic possibilities of XML 1.0 should be curtailed in this way at all.
--Walter Perry on the xml-dev mailing list


A normal XML document

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/css" href="song.css"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG xmlns="http://metalab.unc.edu/xml/namespace/song"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO 
    xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
    ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <!-- The publisher is actually Polygram but I needed 
       an example of a general entity reference. -->
  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
<!-- You can tell what album I was 
     listening to when I wrote this example -->

A canonical XML document

<?xml-stylesheet type="text/css" href="song.css"?><SONG xmlns="http://metalab.unc.edu/xml/namespace/song"
xmlns:xlink="http://www.w3.org/1999/xlink">&#10; <TITLE>Hot Cop</TITLE>&#10; <PHOTO ALT="Victor Willis in
Cop Outfit" HEIGHT="200" WIDTH="100" xlink:href="hotcop.jpg" xlink:show="onLoad"
xlink:type="simple"></PHOTO>&#10; <COMPOSER>Jacques Morali</COMPOSER>&#10; <COMPOSER>Henri
Belolo</COMPOSER>&#10; <COMPOSER>Victor Willis</COMPOSER>&#10; <PRODUCER>Jacques
Morali</PRODUCER>&#10; &#10; <PUBLISHER xlink:href="http://www.amrecords.com/" xlink:type="simple">&#10; A
&amp; M Records&#10; </PUBLISHER>&#10; <LENGTH>6:20</LENGTH>&#10; <YEAR>1978</YEAR>&#10;
<ARTIST>Village People</ARTIST>&#10;</SONG>

An org.w3c.dom.Document object formed by reading hotcop.xml

import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMHotCop {

  public static void main(String[] args) {
     
    DOMParser parser = new DOMParser();
    
    try {
      parser.parse("http://metalab.unc.edu/xml/examples/hot_cop.xml"); 
      Document d = parser.getDocument();
    }
    catch (SAXException e) {
      System.err.println(e); 
    }
    catch (IOException e) {
      System.err.println(e); 
    }
   
  }

}

Are these three the same thing or not?


What is the XML InfoSet?


The InfoSet defines 15 kinds of Information Items


The Document Information Item


Elements

<PHOTO 
  xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
  ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  
<COMPOSER>Jacques Morali</COMPOSER>

<COMPOSER>
  <PERSON>
    <NAME>
      <FIRST>Henri</FIRST>
      <LAST>Belolo</LAST>
    </NAME>
  </PERSON>
</COMPOSER>

<rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#">
  <rdf:Description xmlns:dc="http://purl.org/dc/"
     about="http://www.ibiblio.org/examples/impressionists.xml">
    <dc:title> Impressionist Paintings </dc:title>
    <dc:creator> Elliotte Rusty Harold </dc:creator>
    <dc:description> 
      A list of famous impressionist paintings organized 
      by painter and date 
    </dc:description>
    <dc:date>2000-08-22</dc:date>
  </rdf:Description>
</rdf:RDF>

Element Information Items

An Element Information Item Includes:


Attributes

xlink:type="simple"
xlink:href="http://www.amrecords.com/"
xlink:type =  "simple"
xlink:show = "onLoad"
xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit"
WIDTH=" 100 "
HEIGHT=' 200 '
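
The variants above differ only in quote style and in the white space around the equals sign. A parser reports the same name and value either way. The following is a minimal sketch of that point using the JAXP parser bundled with current JDKs; the class and method names are made up for illustration. Note that white space inside the quotes is part of the value and is kept when no DTD declares the attribute's type.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class AttributeSyntax {

  // Parses a one-element document and returns the value of the
  // named attribute on its root element.
  public static String attributeValue(String xml, String name) {
    try {
      Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml))).getDocumentElement();
      return root.getAttribute(name);
    }
    catch (Exception e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    // Same attributes, written two different ways
    String compact = "<PHOTO xlink:type=\"simple\" WIDTH=\" 100 \"/>";
    String spaced  = "<PHOTO WIDTH = ' 100 '   xlink:type =  'simple' />";

    // The parser is not namespace aware by default, so the
    // undeclared xlink prefix is just part of the attribute name here.
    System.out.println(attributeValue(compact, "xlink:type")
        .equals(attributeValue(spaced, "xlink:type")));
    System.out.println("[" + attributeValue(spaced, "WIDTH") + "]");
  }

}
```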

An Attribute Information Item Includes:


Comments

  <!-- The publisher is actually Polygram but I needed 
       an example of a general entity reference. -->
<!--  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG> -->
<!-- You can tell what album I was 
     listening to when I wrote this example -->

A Comment Information Item Includes:


A Processing Instruction Information Item Includes:

<?robots index="yes" follow="no"?>
<?php 
  mysql_connect("database.unc.edu", "clerk", "password"); 
  $result = mysql("CYNW", "SELECT LastName, FirstName FROM Employees 
    ORDER BY LastName, FirstName"); 
  $i = 0;
  while ($i < mysql_numrows ($result)) {
     $fields = mysql_fetch_row($result);
     echo "<person>$fields[1] $fields[0] </person>\r\n";
     $i++;
  }
  mysql_close();
?>
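
To a parser, a processing instruction is just a target plus a string of data. The pseudo-attributes in the robots example are not attributes at all. The following is a small sketch, assuming the JAXP parser bundled with the JDK; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

public class PITargetAndData {

  // Returns {target, data} for the first processing instruction that is
  // a direct child of the document, or null if there is none.
  public static String[] firstPI(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      for (Node node = doc.getFirstChild(); node != null;
           node = node.getNextSibling()) {
        if (node.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
          ProcessingInstruction pi = (ProcessingInstruction) node;
          return new String[] {pi.getTarget(), pi.getData()};
        }
      }
      return null;
    }
    catch (Exception e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    String[] pi = firstPI("<?robots index=\"yes\" follow=\"no\"?><SONG/>");
    System.out.println("target: " + pi[0]);
    System.out.println("data:   " + pi[1]);
  }

}
```

Any further structure inside the data (pseudo-attributes, PHP code, and so on) is for the application to parse.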

Characters


Namespace Declarations

xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#"
xmlns:dc="http://purl.org/dc/"
xmlns="http://www.w3.org/Graphics/SVG/SVG-19991203.dtd"
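
A namespace declaration binds a prefix (or the default, unprefixed namespace) to a URI. What matters to a namespace-aware application is the URI and the local name, not the prefix. The following is a minimal sketch of this, assuming the JAXP parser bundled with the JDK with namespace awareness switched on; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NamespaceBinding {

  // Returns {namespace URI, local name, prefix} for the root element.
  public static String[] rootName(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);  // JAXP leaves this off by default
      Element root = factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml))).getDocumentElement();
      return new String[] {
        root.getNamespaceURI(), root.getLocalName(), root.getPrefix()
      };
    }
    catch (Exception e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    String[] prefixed = rootName(
        "<dc:title xmlns:dc=\"http://purl.org/dc/\">Impressionist Paintings</dc:title>");
    String[] unprefixed = rootName(
        "<title xmlns=\"http://purl.org/dc/\">Impressionist Paintings</title>");

    // Both elements have the same URI and local name; only the prefix differs
    System.out.println(prefixed[0].equals(unprefixed[0]));
    System.out.println(prefixed[1].equals(unprefixed[1]));
  }

}
```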

A Namespace Declaration Information Item includes:


Document Type Declaration

<!DOCTYPE SONG SYSTEM "song.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                       "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

A Document Type Declaration Information Item includes:

external DTD
An entity information item for the external DTD subset.
children
Only the comment and processing instruction information items in the internal DTD subset and external DTD subsets.
parent

Document Type Definition

<!ELEMENT SONG (TITLE, PHOTO?, COMPOSER+, PRODUCER*, 
 PUBLISHER*, LENGTH?, YEAR?, ARTIST+)>
<!ATTLIST SONG xmlns       CDATA #REQUIRED
               xmlns:xlink CDATA #REQUIRED>
<!ELEMENT TITLE (#PCDATA)>

<!ELEMENT PHOTO EMPTY>
<!ATTLIST PHOTO xlink:type CDATA #FIXED "simple"
                xlink:href CDATA #REQUIRED
                xlink:show CDATA #IMPLIED
                ALT        CDATA #REQUIRED
                WIDTH      CDATA #REQUIRED
                HEIGHT     CDATA #REQUIRED
>

<!ELEMENT COMPOSER (#PCDATA)>
<!ELEMENT PRODUCER (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ATTLIST PUBLISHER xlink:type CDATA #IMPLIED
                    xlink:href CDATA #IMPLIED
>

<!ELEMENT LENGTH (#PCDATA)>
<!-- This should be a four digit year like "1999",
     not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>

<!ELEMENT ARTIST (#PCDATA)>

Entities


Entity Marker Information Items


Entity Declaration Information Items

Each entity declaration information item includes


The InfoSet Omits:


What is Canonical XML?


How are documents canonicalized?

  1. The document is encoded in UTF-8

  2. Line breaks are normalized to a linefeed (ASCII 10, \n)

  3. Attribute values are normalized, as if by a validating processor

  4. Character and parsed entity references are replaced

  5. CDATA sections are replaced with their character content

  6. The XML and document type declarations are removed

  7. Empty elements are converted to start-end tag pairs

  8. White space outside of the document element and within start and end tags is normalized

  9. All white space in character content is retained (except for characters removed during linefeed normalization)

  10. Attribute value delimiters are set to double quotes

  11. Special characters in attribute values and character content are replaced by character references

  12. Superfluous namespace declarations are removed from each element

  13. Default attributes are added to each element

  14. Lexicographic order is imposed on the namespace declarations and attributes of each element
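
A few of these steps can be sketched in a handful of lines. The class below is only an illustration, not a conforming canonicalizer. It covers steps 5, 7, 10, 11 (using the predefined entity references) and 14, sorts attributes by qualified name only, and skips comments, processing instructions and all namespace handling. It assumes the JAXP parser bundled with the JDK, and the class name is made up.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CanonicalSketch {

  // Parses the document and re-serializes its root element
  // in a simplified canonical form.
  public static String canonicalize(String xml) {
    try {
      Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml))).getDocumentElement();
      StringBuilder out = new StringBuilder();
      write(root, out);
      return out.toString();
    }
    catch (Exception e) {
      throw new IllegalArgumentException(e);
    }
  }

  private static void write(Element element, StringBuilder out) {
    out.append('<').append(element.getTagName());

    // Step 14: a TreeMap keeps the attributes sorted by name
    Map<String, String> sorted = new TreeMap<String, String>();
    NamedNodeMap attributes = element.getAttributes();
    for (int i = 0; i < attributes.getLength(); i++) {
      Node attribute = attributes.item(i);
      sorted.put(attribute.getNodeName(), attribute.getNodeValue());
    }
    // Step 10: values are always delimited by double quotes
    for (Map.Entry<String, String> entry : sorted.entrySet()) {
      out.append(' ').append(entry.getKey()).append("=\"")
         .append(escape(entry.getValue(), true)).append('"');
    }
    out.append('>');

    for (Node child = element.getFirstChild(); child != null;
         child = child.getNextSibling()) {
      short type = child.getNodeType();
      if (type == Node.ELEMENT_NODE) {
        write((Element) child, out);
      }
      else if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
        // Step 5: CDATA sections become plain character content
        out.append(escape(child.getNodeValue(), false));
      }
    }
    // Step 7: empty elements still get an explicit end-tag
    out.append("</").append(element.getTagName()).append('>');
  }

  // Step 11, simplified: escape only the markup-significant characters
  private static String escape(String s, boolean inAttribute) {
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '&') result.append("&amp;");
      else if (c == '<') result.append("&lt;");
      else if (c == '>' && !inAttribute) result.append("&gt;");
      else if (c == '"' && inAttribute) result.append("&quot;");
      else result.append(c);
    }
    return result.toString();
  }

  public static void main(String[] args) {
    System.out.println(canonicalize(
        "<PHOTO WIDTH='100' ALT='Cop &amp; Outfit' HEIGHT='200'/>"));
  }

}
```

Run as is, this prints <PHOTO ALT="Cop &amp; Outfit" HEIGHT="200" WIDTH="100"></PHOTO>, with the attributes reordered, the quotes changed and the empty-element tag expanded.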


Digital Signatures


To Learn More


Questions?


Part II: JDOM

There is no compelling reason for a Java API to manipulate XML to be complex, tricky, unintuitive, or a pain in the neck.
--JDOM Mission Statement


Where we're going


Trees


Processing XML with JDOM is easy


What is JDOM?


About JDOM


JDOM versions


Four packages:

org.jdom
the classes that represent an XML document and its parts
org.jdom.input
classes for reading a document into memory
org.jdom.output
classes for writing a document onto a stream or other target (e.g. SAX or DOM app)
org.jdom.adapters
classes for hooking up to DOM implementations

The org.jdom package

The classes that represent an XML document and its parts


The org.jdom.input package

Classes for reading a document into memory from a file or other source


The org.jdom.output package

The classes for writing a document to a file or other target


The org.jdom.adapters package


Writing XML Documents with JDOM


A JDOM program that writes this XML document

<?xml version="1.0"?>
<GREETING>
  Hello JDOM!
</GREETING>

Hello JDOM

import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class HelloJDOM {

  public static void main(String[] args) {
   
    Element root = new Element("GREETING");
    	
    root.setText("Hello JDOM!");
         
    Document doc = new Document(root);      
    
    // At this point the document only exists in memory.
    // We still need to serialize it
    XMLOutputter outputter = new XMLOutputter();
    try {
      outputter.output(doc, System.out);       
    }
    catch (Exception e) {
      System.err.println(e);
    }

  }

}

Actual Output

<?xml version="1.0" encoding="UTF-8"?><GREETING>Hello JDOM!</GREETING>

This is more or less what we wanted, modulo white space.


Hello DOM

Here's the same program using DOM instead of JDOM. Which is simpler?

import java.io.*;
import org.w3c.dom.*;
import org.apache.xerces.dom.*;
import org.apache.xml.serialize.*; 


public class HelloDOM {

  public static void main(String[] args) {
   
    try {
      
      DOMImplementationImpl impl = (DOMImplementationImpl) 
       DOMImplementationImpl.getDOMImplementation();
       
      DocumentType type = impl.createDocumentType("GREETING", null, null);
      
      // type is supposed to be able to be null, 
      // but in practice that didn't work                     
      DocumentImpl hello 
       = (DocumentImpl) impl.createDocument(null, "GREETING", type);
      
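      // createDocument() has already created the root element; fetch it
      // rather than creating a second, detached GREETING element.
      Element root = hello.getDocumentElement();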
      
      // We can't use a raw string. Instead we have to first create 
      // a text node.
      Text text = hello.createTextNode("Hello DOM!");
      root.appendChild(text);
      
      // Now that the document is created we need to *serialize* it
      try {
        OutputFormat format = new OutputFormat(hello);
        XMLSerializer serializer = new XMLSerializer(System.out, format);
        serializer.serialize(root);
      }
      catch (IOException e) {
        System.err.println(e); 
      }
    }
    catch (DOMException e) {
      e.printStackTrace();
    }

  }

}

A Java program that writes Fibonacci numbers into a text file

import java.math.*;
import java.io.*;


public class FibonacciText {

  public static void main(String[] args) {

    try {
      FileOutputStream fout = new FileOutputStream("fibonacci.txt");
      OutputStreamWriter out = new OutputStreamWriter(fout, "8859_1");

      BigInteger low = BigInteger.ZERO;
      BigInteger high = BigInteger.ONE;

      for (int i = 0; i <= 25; i++) {
        out.write(low.toString() + "\r\n");
        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }
      out.write(high.toString() + "\r\n");

      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

fibonacci.txt

0
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
1597
2584
4181
6765
10946
17711
28657
46368
75025
196418

fibonacci.xml

Suppose we want that data in an XML document that looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<Fibonacci_Numbers>
  <fibonacci index="0">0</fibonacci>
  <fibonacci index="1">1</fibonacci>
  <fibonacci index="2">1</fibonacci>
  <fibonacci index="3">2</fibonacci>
  <fibonacci index="4">3</fibonacci>
  <fibonacci index="5">5</fibonacci>
  <fibonacci index="6">8</fibonacci>
  <fibonacci index="7">13</fibonacci>
  <fibonacci index="8">21</fibonacci>
  <fibonacci index="9">34</fibonacci>
  <fibonacci index="10">55</fibonacci>
  <fibonacci index="11">89</fibonacci>
  <fibonacci index="12">144</fibonacci>
  <fibonacci index="13">233</fibonacci>
  <fibonacci index="14">377</fibonacci>
  <fibonacci index="15">610</fibonacci>
  <fibonacci index="16">987</fibonacci>
  <fibonacci index="17">1597</fibonacci>
  <fibonacci index="18">2584</fibonacci>
  <fibonacci index="19">4181</fibonacci>
  <fibonacci index="20">6765</fibonacci>
  <fibonacci index="21">10946</fibonacci>
  <fibonacci index="22">17711</fibonacci>
  <fibonacci index="23">28657</fibonacci>
  <fibonacci index="24">46368</fibonacci>
  <fibonacci index="25">75025</fibonacci>
</Fibonacci_Numbers>

A JDOM program that writes Fibonacci numbers into an XML file

import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;


public class FibonacciJDOM {

  public static void main(String[] args) {

    Element root = new Element("Fibonacci_Numbers");

    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;

    for (int i = 0; i <= 25; i++) {
      Element fibonacci = new Element("fibonacci");
      fibonacci.addAttribute("index", String.valueOf(i));
      fibonacci.setText(low.toString());
      root.addContent(fibonacci);

      BigInteger temp = high;
      high = high.add(low);
      low = temp;
    }

    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("fibonacci.xml");
      XMLOutputter serializer = new XMLOutputter();
      serializer.output(doc, out);
      out.flush();
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

Output

Again, modulo white space this is correct

<?xml version="1.0" encoding="UTF-8"?><Fibonacci_Numbers><fibonacci index="0">0</fibonacci><fibonacci index="1">1</fibonacci><fibonacci index="2">1</fibonacci><fibonacci index="3">2</fibonacci><fibonacci index="4">3</fibonacci><fibonacci index="5">5</fibonacci><fibonacci index="6">8</fibonacci><fibonacci index="7">13</fibonacci><fibonacci index="8">21</fibonacci><fibonacci index="9">34</fibonacci><fibonacci index="10">55</fibonacci><fibonacci index="11">89</fibonacci><fibonacci index="12">144</fibonacci><fibonacci index="13">233</fibonacci><fibonacci index="14">377</fibonacci><fibonacci index="15">610</fibonacci><fibonacci index="16">987</fibonacci><fibonacci index="17">1597</fibonacci><fibonacci index="18">2584</fibonacci><fibonacci index="19">4181</fibonacci><fibonacci index="20">6765</fibonacci><fibonacci index="21">10946</fibonacci><fibonacci index="22">17711</fibonacci><fibonacci index="23">28657</fibonacci><fibonacci index="24">46368</fibonacci><fibonacci index="25">75025</fibonacci></Fibonacci_Numbers>

A DOM program that writes Fibonacci numbers into an XML file

Here's the same program using DOM instead of JDOM. Which is simpler?

import java.math.*;
import java.io.*;
import org.w3c.dom.*;
import org.apache.xerces.dom.*;
import org.apache.xml.serialize.*; 


public class FibonacciDOM {

  public static void main(String[] args) {
   
    try {
      
      DOMImplementationImpl impl = (DOMImplementationImpl) 
       DOMImplementationImpl.getDOMImplementation();
       
      DocumentType type = impl.createDocumentType("Fibonacci_Numbers",
       null, null);
      
      // type is supposed to be able to be null, 
      // but in practice that didn't work                     
      DocumentImpl fibonacci 
       = (DocumentImpl) impl.createDocument(null, "Fibonacci_Numbers", type);
      
      BigInteger low  = BigInteger.ZERO;
      BigInteger high = BigInteger.ONE;      
      
      // createDocument() has already created the root element,
      // so fetch it instead of creating a second, detached one.
      Element root = fibonacci.getDocumentElement();

      for (int i = 0; i <= 25; i++) {
        Element number = fibonacci.createElement("fibonacci");
        number.setAttribute("index", Integer.toString(i));
        Text text = fibonacci.createTextNode(low.toString());
        number.appendChild(text);
        root.appendChild(number);
        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }
      
      try {
        // Now that the document is created we need to *serialize* it
        FileOutputStream out = new FileOutputStream("fibonacci_8859_1.xml");
        OutputFormat format = new OutputFormat(fibonacci);
        XMLSerializer serializer = new XMLSerializer(out, format);
        serializer.serialize(root);
        out.flush();
        out.close();
      }
      catch (IOException e) {
        System.err.println(e); 
      }
    }
    catch (DOMException e) {
      e.printStackTrace();
    }

  }

}

62 lines vs. 42 lines


Suppose you want to include a DTD


ValidFibonacci

import java.math.*;
import java.io.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class ValidFibonacci {

  public static void main(String[] args) {
   
    Element root = new Element("Fibonacci_Numbers");	
  	      
    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;      
    
    for (int i = 0; i <= 25; i++) {
      Element fibonacci = new Element("fibonacci");
      Attribute index = new Attribute("index", String.valueOf(i));
      fibonacci.addAttribute(index);
      fibonacci.setText(low.toString());
      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(fibonacci);
    }
 
    DocType type = new DocType("Fibonacci_Numbers", "fibonacci.dtd");
 
    Document doc = new Document(root, type);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("validfibonacci.xml");
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, out);
      out.flush();	
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

validfibonacci.xml

<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Fibonacci_Numbers SYSTEM "fibonacci.dtd"><Fibonacci_Numbers><fibonacci index="0">0</fibonacci><fibonacci index="1">1</fibonacci><fibonacci index="2">1</fibonacci><fibonacci index="3">2</fibonacci><fibonacci index="4">3</fibonacci><fibonacci index="5">5</fibonacci><fibonacci index="6">8</fibonacci><fibonacci index="7">13</fibonacci><fibonacci index="8">21</fibonacci><fibonacci index="9">34</fibonacci><fibonacci index="10">55</fibonacci><fibonacci index="11">89</fibonacci><fibonacci index="12">144</fibonacci><fibonacci index="13">233</fibonacci><fibonacci index="14">377</fibonacci><fibonacci index="15">610</fibonacci><fibonacci index="16">987</fibonacci><fibonacci index="17">1597</fibonacci><fibonacci index="18">2584</fibonacci><fibonacci index="19">4181</fibonacci><fibonacci index="20">6765</fibonacci><fibonacci index="21">10946</fibonacci><fibonacci index="22">17711</fibonacci><fibonacci index="23">28657</fibonacci><fibonacci index="24">46368</fibonacci><fibonacci index="25">75025</fibonacci></Fibonacci_Numbers>

Using Namespaces


With Namespace Prefixes

import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;


public class PrefixedFibonacci {

  public static void main(String[] args) {

    Element root = new Element("math", "mathml",
     "http://www.w3.org/1998/Math/MathML");

    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;

    for (int i = 0; i <= 25; i++) {

      Element mrow = new Element("mrow", "mathml",
       "http://www.w3.org/1998/Math/MathML");

      Element mi = new Element("mi", "mathml",
       "http://www.w3.org/1998/Math/MathML");
      mi.setText("f(" + i + ")");
      mrow.addContent(mi);

      Element mo = new Element("mo", "mathml",
       "http://www.w3.org/1998/Math/MathML");
      mo.setText("=");
      mrow.addContent(mo);

      Element mn = new Element("mn", "mathml",
       "http://www.w3.org/1998/Math/MathML");
      mn.setText(low.toString());
      mrow.addContent(mn);

      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(mrow);

    }

    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("prefixed_fibonacci.xml");
      XMLOutputter serializer = new XMLOutputter();
      serializer.output(doc, out);
      out.flush();
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

The Default, Unprefixed Namespace


With Default Namespace

import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;


public class UnprefixedFibonacci {

  public static void main(String[] args) {
   
    Element root = new Element("math", "http://www.w3.org/1998/Math/MathML");	
  	      
    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;      
    
    for (int i = 0; i <= 25; i++) {
        
      Element mrow = new Element("mrow", "http://www.w3.org/1998/Math/MathML");
      
      Element mi = new Element("mi", "http://www.w3.org/1998/Math/MathML");
      mi.setText("f(" + i + ")"); 
      mrow.addContent(mi);
      
      Element mo = new Element("mo", "http://www.w3.org/1998/Math/MathML");
      mo.setText("="); 
      mrow.addContent(mo);
      
      Element mn = new Element("mn", "http://www.w3.org/1998/Math/MathML");
      mn.setText(low.toString());
      mrow.addContent(mn);

      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(mrow);
      
    }
 
    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("unprefixed_fibonacci.xml");
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, out);
      out.flush();	
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

Converting data to XML


Sample Tab Delimited Data: Baseball Statistics

Surname	FirstName	Team	Position	Games Played	Games Started	AtBats	Runs	Hits	Doubles	Triples	Home runs	RBI	Stolen Bases	Caught Stealing	Sacrifice Hits	Sacrifice Flies	Errors	PB	Walks	Strike outs	Hit by pitch
AndersonGarret ANAOutfield15615162262183417157983336029801
BaughmanJustin ANASecond Base625419624509112010453806361
BolickFrank ANAThird Base2111453720120000001180
DisarcinaGary ANAShortstop1571555517315839335612712314021518
EdmondsJim ANAOutfield1541505991151844212591751150571141
ErstadDarin ANAOutfield133129537841593931982206133043776
GarciaCarlos ANASecond Base1910354510002010103111
GlausTroy ANAThird Base484516519369012310027015510
GreeneTodd ANAOutfield29157131840170000002200
HelfandEric ANACatcher000000000000000000
HollinsDave ANAThird Base10198363608816211391132217044697
JefferiesGregg ANAOutfield19187272560110100000050
JohnsonMark ANAFirst Base10214110000000000060
KreuterChad ANACatcher9674252276310123310519533493
MartinNorberto ANASecond Base79501952042201133132406290
MashoreDamon ANAOutfield4324981323602111010009223
MolinaBen ANACatcher201000000000000000
NevinPhil ANACatcher7565237275481827000252017675
O'BrienCharlie ANACatcher625817513459041800334110332
PalmeiroOrlando ANAOutfield743416528537202154700020110
PritchettChris ANAFirst Base311980122321282000104160
SalmonTim ANADesignated Hitter1361304638413928126880101020901003
ShipleyCraig ANAThird Base77321471838712170441305225
VelardeRandy ANASecond Base5150188294913142672014034421
WalbeckMatt ANACatcher10891338418715264611557830682
WilliamsReggie ANAOutfield2973671310153310007111

A Program to convert tab delimited data to XML

import java.io.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class BaseballTabToXML {

  public static void main(String[] args) {
     
    Element root = new Element("players");
    
    try {
      FileInputStream fin = new FileInputStream(args[0]);
      BufferedReader in 
       = new BufferedReader(new InputStreamReader(fin));    

      String playerStats;  
      while ((playerStats = in.readLine()) != null) {
        String[] stats = splitLine(playerStats);
        
        Element player = new Element("player");

        Element first_name = new Element("first_name");
        first_name.setText(stats[1]);
        player.addContent(first_name);
        
        Element surname = new Element("surname");
        surname.setText(stats[0]);
        player.addContent(surname);
       
        Element games_played = new Element("games_played");
        games_played.setText(stats[4]);
        player.addContent(games_played);
       
        Element at_bats = new Element("at_bats");
        at_bats.setText(stats[6]);
        player.addContent(at_bats);
       
        Element runs = new Element("runs");
        runs.setText(stats[7]);
        player.addContent(runs);
       
        Element hits = new Element("hits");
        hits.setText(stats[8]);
        player.addContent(hits);
       
        Element doubles = new Element("doubles");
        doubles.setText(stats[9]);
        player.addContent(doubles);
       
        Element triples = new Element("triples");
        triples.setText(stats[10]);
        player.addContent(triples); 

        Element home_runs = new Element("home_runs");
        home_runs.setText(stats[11]);
        player.addContent(home_runs); 

        Element runs_batted_in = new Element("runs_batted_in");
        runs_batted_in.setText(stats[12]);
        player.addContent(runs_batted_in); 

        Element stolen_bases = new Element("stolen_bases");
        stolen_bases.setText(stats[13]);
        player.addContent(stolen_bases); 

        Element caught_stealing = new Element("caught_stealing");
        caught_stealing.setText(stats[14]);
        player.addContent(caught_stealing); 

        Element sacrifice_hits = new Element("sacrifice_hits");
        sacrifice_hits.setText(stats[15]);
        player.addContent(sacrifice_hits); 

        Element sacrifice_flies = new Element("sacrifice_flies");
        sacrifice_flies.setText(stats[16]);
        player.addContent(sacrifice_flies); 

        Element errors = new Element("errors");
        errors.setText(stats[17]);
        player.addContent(errors); 

        Element passed_by_ball = new Element("passed_by_ball");
        passed_by_ball.setText(stats[18]);
        player.addContent(passed_by_ball); 

        Element walks = new Element("walks");
        walks.setText(stats[19]);
        player.addContent(walks); 

        Element strike_outs = new Element("strike_outs");
        strike_outs.setText(stats[20]);
        player.addContent(strike_outs); 

        Element hit_by_pitch = new Element("hit_by_pitch");
        hit_by_pitch.setText(stats[21]);
        player.addContent(hit_by_pitch); 
        
        root.addContent(player);
      }  
      
      Document doc = new Document(root);
      // serialize it into a file
      FileOutputStream fout 
       = new FileOutputStream("baseballstats.xml");
      
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, fout);
      fout.flush();	
      fout.close();
      in.close();
      
    }
    catch (IOException e) {
      System.err.println(e);
    }
    catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("Usage: java BaseballTabToXML input_file.tab");
    }

  }

  public static String[] splitLine(String playerStats) {
    
    // count the number of tabs
    int numTabs = 0;
    for (int i = 0; i < playerStats.length(); i++) {
      if (playerStats.charAt(i) == '\t') numTabs++;
    }
    int numFields = numTabs + 1;
    String[] fields = new String[numFields];
    int position = 0;
    for (int i = 0; i < numFields; i++) {
      StringBuffer field = new StringBuffer();
      while (position < playerStats.length() 
       && playerStats.charAt(position++) != '\t') {
        field.append(playerStats.charAt(position-1));
      }
      fields[i] = field.toString();
    }    
    return fields;
    
  }

}
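
The hand-rolled splitLine() method counts tabs and then copies characters field by field. On Java 1.4 or later, String.split() can do the same job. The following is a sketch of a drop-in replacement, not part of the original program; the class name is hypothetical.

```java
public class TabSplit {

  // Same contract as splitLine() in BaseballTabToXML: one array entry
  // per tab-separated field, with empty fields preserved.
  public static String[] splitLine(String playerStats) {
    // A negative limit tells split() to keep trailing empty strings,
    // so a line ending in a tab still yields a final empty field.
    return playerStats.split("\t", -1);
  }

  public static void main(String[] args) {
    String[] fields = splitLine("Anderson\tGarret \tANA\tOutfield\t156\t");
    System.out.println(fields.length);  // six fields; the last one is empty
    System.out.println(fields[0]);
  }

}
```

Without the negative limit, split() drops trailing empty fields, and a row whose last column is blank would come back one field short.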

Baseball Stats in XML

<?xml version="1.0"?>
<players>
  <player>
    <first_name>FirstName</first_name>
    <surname>Surname</surname>
    <games_played>Games Played</games_played>
    <at_bats>AtBats</at_bats>
    <runs>Runs</runs>
    <hits>Hits</hits>
    <doubles>Doubles</doubles>
    <triples>Triples</triples>
    <home_runs>Home runs</home_runs>
    <stolen_bases>RBI</stolen_bases>
    <caught_stealing>Caught Stealing</caught_stealing>
    <sacrifice_hits>Sacrifice Hits</sacrifice_hits>
    <sacrifice_flies>Sacrifice Flies</sacrifice_flies>
    <errors>Errors</errors>
    <passed_by_ball>PB</passed_by_ball>
    <walks>Walks</walks>
    <strike_outs>Strike outs</strike_outs>
    <hit_by_pitch>Hit by pitch</hit_by_pitch>
  </player>
  <player>
    <first_name>Garret </first_name>
    <surname>Anderson</surname>
    <games_played>156</games_played>
    <at_bats>622</at_bats>
    <runs>62</runs>
    <hits>183</hits>
    <doubles>41</doubles>
    <triples>7</triples>
    <home_runs>15</home_runs>
    <stolen_bases>79</stolen_bases>
    <caught_stealing>3</caught_stealing>
    <sacrifice_hits>3</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>6</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>29</walks>
    <strike_outs>80</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Justin </first_name>
    <surname>Baughman</surname>
    <games_played>62</games_played>
    <at_bats>196</at_bats>
    <runs>24</runs>
    <hits>50</hits>
    <doubles>9</doubles>
    <triples>1</triples>
    <home_runs>1</home_runs>
    <stolen_bases>20</stolen_bases>
    <caught_stealing>4</caught_stealing>
    <sacrifice_hits>5</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>8</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>6</walks>
    <strike_outs>36</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Frank </first_name>
    <surname>Bolick</surname>
    <games_played>21</games_played>
    <at_bats>45</at_bats>
    <runs>3</runs>
    <hits>7</hits>
    <doubles>2</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>2</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>11</walks>
    <strike_outs>8</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Gary </first_name>
    <surname>Disarcina</surname>
    <games_played>157</games_played>
    <at_bats>551</at_bats>
    <runs>73</runs>
    <hits>158</hits>
    <doubles>39</doubles>
    <triples>3</triples>
    <home_runs>3</home_runs>
    <stolen_bases>56</stolen_bases>
    <caught_stealing>7</caught_stealing>
    <sacrifice_hits>12</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>14</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>21</walks>
    <strike_outs>51</strike_outs>
    <hit_by_pitch>8</hit_by_pitch>
  </player>
  <player>
    <first_name>Jim </first_name>
    <surname>Edmonds</surname>
    <games_played>154</games_played>
    <at_bats>599</at_bats>
    <runs>115</runs>
    <hits>184</hits>
    <doubles>42</doubles>
    <triples>1</triples>
    <home_runs>25</home_runs>
    <stolen_bases>91</stolen_bases>
    <caught_stealing>5</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>5</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>57</walks>
    <strike_outs>114</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Darin </first_name>
    <surname>Erstad</surname>
    <games_played>133</games_played>
    <at_bats>537</at_bats>
    <runs>84</runs>
    <hits>159</hits>
    <doubles>39</doubles>
    <triples>3</triples>
    <home_runs>19</home_runs>
    <stolen_bases>82</stolen_bases>
    <caught_stealing>6</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>3</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>43</walks>
    <strike_outs>77</strike_outs>
    <hit_by_pitch>6</hit_by_pitch>
  </player>
  <player>
    <first_name>Carlos </first_name>
    <surname>Garcia</surname>
    <games_played>19</games_played>
    <at_bats>35</at_bats>
    <runs>4</runs>
    <hits>5</hits>
    <doubles>1</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>1</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>3</walks>
    <strike_outs>11</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Troy </first_name>
    <surname>Glaus</surname>
    <games_played>48</games_played>
    <at_bats>165</at_bats>
    <runs>19</runs>
    <hits>36</hits>
    <doubles>9</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>23</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>7</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>15</walks>
    <strike_outs>51</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Todd </first_name>
    <surname>Greene</surname>
    <games_played>29</games_played>
    <at_bats>71</at_bats>
    <runs>3</runs>
    <hits>18</hits>
    <doubles>4</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>7</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>2</walks>
    <strike_outs>20</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Eric </first_name>
    <surname>Helfand</surname>
    <games_played>0</games_played>
    <at_bats>0</at_bats>
    <runs>0</runs>
    <hits>0</hits>
    <doubles>0</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>0</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Dave </first_name>
    <surname>Hollins</surname>
    <games_played>101</games_played>
    <at_bats>363</at_bats>
    <runs>60</runs>
    <hits>88</hits>
    <doubles>16</doubles>
    <triples>2</triples>
    <home_runs>11</home_runs>
    <stolen_bases>39</stolen_bases>
    <caught_stealing>3</caught_stealing>
    <sacrifice_hits>2</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>17</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>44</walks>
    <strike_outs>69</strike_outs>
    <hit_by_pitch>7</hit_by_pitch>
  </player>
  <player>
    <first_name>Gregg </first_name>
    <surname>Jefferies</surname>
    <games_played>19</games_played>
    <at_bats>72</at_bats>
    <runs>7</runs>
    <hits>25</hits>
    <doubles>6</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>10</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>5</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Mark </first_name>
    <surname>Johnson</surname>
    <games_played>10</games_played>
    <at_bats>14</at_bats>
    <runs>1</runs>
    <hits>1</hits>
    <doubles>0</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>6</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Chad </first_name>
    <surname>Kreuter</surname>
    <games_played>96</games_played>
    <at_bats>252</at_bats>
    <runs>27</runs>
    <hits>63</hits>
    <doubles>10</doubles>
    <triples>1</triples>
    <home_runs>2</home_runs>
    <stolen_bases>33</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>5</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>9</errors>
    <passed_by_ball>5</passed_by_ball>
    <walks>33</walks>
    <strike_outs>49</strike_outs>
    <hit_by_pitch>3</hit_by_pitch>
  </player>
  <player>
    <first_name>Norberto </first_name>
    <surname>Martin</surname>
    <games_played>79</games_played>
    <at_bats>195</at_bats>
    <runs>20</runs>
    <hits>42</hits>
    <doubles>2</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>13</stolen_bases>
    <caught_stealing>1</caught_stealing>
    <sacrifice_hits>3</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>4</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>6</walks>
    <strike_outs>29</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Damon </first_name>
    <surname>Mashore</surname>
    <games_played>43</games_played>
    <at_bats>98</at_bats>
    <runs>13</runs>
    <hits>23</hits>
    <doubles>6</doubles>
    <triples>0</triples>
    <home_runs>2</home_runs>
    <stolen_bases>11</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>9</walks>
    <strike_outs>22</strike_outs>
    <hit_by_pitch>3</hit_by_pitch>
  </player>
  <player>
    <first_name>Ben </first_name>
    <surname>Molina</surname>
    <games_played>2</games_played>
    <at_bats>1</at_bats>
    <runs>0</runs>
    <hits>0</hits>
    <doubles>0</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>0</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Phil </first_name>
    <surname>Nevin</surname>
    <games_played>75</games_played>
    <at_bats>237</at_bats>
    <runs>27</runs>
    <hits>54</hits>
    <doubles>8</doubles>
    <triples>1</triples>
    <home_runs>8</home_runs>
    <stolen_bases>27</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>5</errors>
    <passed_by_ball>20</passed_by_ball>
    <walks>17</walks>
    <strike_outs>67</strike_outs>
    <hit_by_pitch>5</hit_by_pitch>
  </player>
  <player>
    <first_name>Charlie </first_name>
    <surname>Obrien</surname>
    <games_played>62</games_played>
    <at_bats>175</at_bats>
    <runs>13</runs>
    <hits>45</hits>
    <doubles>9</doubles>
    <triples>0</triples>
    <home_runs>4</home_runs>
    <stolen_bases>18</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>3</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>4</errors>
    <passed_by_ball>1</passed_by_ball>
    <walks>10</walks>
    <strike_outs>33</strike_outs>
    <hit_by_pitch>2</hit_by_pitch>
  </player>
  <player>
    <first_name>Orlando </first_name>
    <surname>Palmeiro</surname>
    <games_played>74</games_played>
    <at_bats>165</at_bats>
    <runs>28</runs>
    <hits>53</hits>
    <doubles>7</doubles>
    <triples>2</triples>
    <home_runs>0</home_runs>
    <stolen_bases>21</stolen_bases>
    <caught_stealing>4</caught_stealing>
    <sacrifice_hits>7</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>20</walks>
    <strike_outs>11</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Chris </first_name>
    <surname>Pritchett</surname>
    <games_played>31</games_played>
    <at_bats>80</at_bats>
    <runs>12</runs>
    <hits>23</hits>
    <doubles>2</doubles>
    <triples>1</triples>
    <home_runs>2</home_runs>
    <stolen_bases>8</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>1</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>4</walks>
    <strike_outs>16</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Tim </first_name>
    <surname>Salmon</surname>
    <games_played>136</games_played>
    <at_bats>463</at_bats>
    <runs>84</runs>
    <hits>139</hits>
    <doubles>28</doubles>
    <triples>1</triples>
    <home_runs>26</home_runs>
    <stolen_bases>88</stolen_bases>
    <caught_stealing>1</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>10</sacrifice_flies>
    <errors>2</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>90</walks>
    <strike_outs>100</strike_outs>
    <hit_by_pitch>3</hit_by_pitch>
  </player>
  <player>
    <first_name>Craig </first_name>
    <surname>Shipley</surname>
    <games_played>77</games_played>
    <at_bats>147</at_bats>
    <runs>18</runs>
    <hits>38</hits>
    <doubles>7</doubles>
    <triples>1</triples>
    <home_runs>2</home_runs>
    <stolen_bases>17</stolen_bases>
    <caught_stealing>4</caught_stealing>
    <sacrifice_hits>4</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>3</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>5</walks>
    <strike_outs>22</strike_outs>
    <hit_by_pitch>5</hit_by_pitch>
  </player>
  <player>
    <first_name>Randy </first_name>
    <surname>Velarde</surname>
    <games_played>51</games_played>
    <at_bats>188</at_bats>
    <runs>29</runs>
    <hits>49</hits>
    <doubles>13</doubles>
    <triples>1</triples>
    <home_runs>4</home_runs>
    <stolen_bases>26</stolen_bases>
    <caught_stealing>2</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>4</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>34</walks>
    <strike_outs>42</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Matt </first_name>
    <surname>Walbeck</surname>
    <games_played>108</games_played>
    <at_bats>338</at_bats>
    <runs>41</runs>
    <hits>87</hits>
    <doubles>15</doubles>
    <triples>2</triples>
    <home_runs>6</home_runs>
    <stolen_bases>46</stolen_bases>
    <caught_stealing>1</caught_stealing>
    <sacrifice_hits>5</sacrifice_hits>
    <sacrifice_flies>5</sacrifice_flies>
    <errors>7</errors>
    <passed_by_ball>8</passed_by_ball>
    <walks>30</walks>
    <strike_outs>68</strike_outs>
    <hit_by_pitch>2</hit_by_pitch>
  </player>
  <player>
    <first_name>Reggie </first_name>
    <surname>Williams</surname>
    <games_played>29</games_played>
    <at_bats>36</at_bats>
    <runs>7</runs>
    <hits>13</hits>
    <doubles>1</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>5</stolen_bases>
    <caught_stealing>3</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>7</walks>
    <strike_outs>11</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
</players>

A Shortcut

import java.io.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class BaseballTabToXMLShortcut {

  public static void main(String[] args) {
     
    Element root = new Element("players");
    
    try {
      FileInputStream fin = new FileInputStream(args[0]);
      BufferedReader in 
       = new BufferedReader(new InputStreamReader(fin));    

      String playerStats;  
      while ((playerStats = in.readLine()) != null) {
        String[] stats = splitLine(playerStats);
        
        Element player = new Element("player");

        player.addContent((new Element("first_name")).setText(stats[1]));
        player.addContent((new Element("surname")).setText(stats[0]));
        player.addContent((new Element("games_played")).setText(stats[4]));
        player.addContent((new Element("at_bats")).setText(stats[6]));
        player.addContent((new Element("runs")).setText(stats[7]));
        player.addContent((new Element("hits")).setText(stats[8]));
        player.addContent((new Element("doubles")).setText(stats[9]));
        player.addContent((new Element("triples")).setText(stats[10]));
        player.addContent((new Element("home_runs")).setText(stats[11]));
        player.addContent((new Element("runs_batted_in")).setText(stats[12]));
        player.addContent((new Element("stolen_bases")).setText(stats[13]));
        player.addContent((new Element("caught_stealing")).setText(stats[14]));
        player.addContent((new Element("sacrifice_hits")).setText(stats[15]));
        player.addContent((new Element("sacrifice_flies")).setText(stats[16]));
        player.addContent((new Element("errors")).setText(stats[17]));
        player.addContent((new Element("passed_by_ball")).setText(stats[18]));
        player.addContent((new Element("walks")).setText(stats[19]));
        player.addContent((new Element("strike_outs")).setText(stats[20]));
        player.addContent((new Element("hit_by_pitch")).setText(stats[21]));
        
        root.addContent(player);
      }  
      
      Document doc = new Document(root);
      // serialize it into a file
      FileOutputStream fout 
       = new FileOutputStream("baseballstats.xml");
      
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, fout);
      fout.flush();	
      fout.close();
      in.close();
      
    }
    catch (IOException e) {
      System.err.println(e);
    }
    catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("Usage: java BaseballTabToXMLShortcut input_file.tab");
    }

  }

  public static String[] splitLine(String playerStats) {
    
    // count the number of tabs
    int numTabs = 0;
    for (int i = 0; i < playerStats.length(); i++) {
      if (playerStats.charAt(i) == '\t') numTabs++;
    }
    int numFields = numTabs + 1;
    String[] fields = new String[numFields];
    int position = 0;
    for (int i = 0; i < numFields; i++) {
      StringBuffer field = new StringBuffer();
      while (position < playerStats.length() 
       && playerStats.charAt(position++) != '\t') {
        field.append(playerStats.charAt(position-1));
      }
      fields[i] = field.toString();
    }    
    return fields;
    
  }

}
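
The hand-written splitLine() method above reflects the Java class library available when these slides were written. As a sketch only, and assuming a later JDK (String.split was added to Java after this talk), the same tab splitting can be written in one line. The class name SplitLineSketch and the sample line are made up for illustration.

```java
import java.util.Arrays;

public class SplitLineSketch {

  // Same contract as splitLine() above: one field per tab-separated
  // column, with empty fields preserved
  public static String[] splitLine(String playerStats) {
    // The -1 limit keeps trailing empty fields,
    // which split() would otherwise drop
    return playerStats.split("\t", -1);
  }

  public static void main(String[] args) {
    // Hypothetical input line with an empty third field
    String line = "Baughman\tJustin \t\t62";
    String[] fields = splitLine(line);
    System.out.println(fields.length + " fields: " + Arrays.toString(fields));
  }

}
```

The negative limit matters: without it a line ending in empty columns would produce a shorter array, and the fixed indexes such as stats[21] would then throw ArrayIndexOutOfBoundsException.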

Converting data to XML while Processing it

import java.io.*;
import java.text.*;
import java.util.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;

public class BattingAverage {

  public static void main(String[] args) {
     
    Element root = new Element("players");
     
    try {
      FileInputStream fin = new FileInputStream(args[0]);
      BufferedReader in 
       = new BufferedReader(new InputStreamReader(fin));
      
      String playerStats;
      
      // for formatting batting averages
      DecimalFormat averages = (DecimalFormat) 
       NumberFormat.getNumberInstance(Locale.US);
      averages.setMaximumFractionDigits(3);
      averages.setMinimumFractionDigits(3);
      averages.setMinimumIntegerDigits(0);
      
      while ((playerStats = in.readLine()) != null) {
        String[] stats = splitLine(playerStats);
        
        String formattedAverage;
        try {
          int atBats         = Integer.parseInt(stats[6]);
          int hits           = Integer.parseInt(stats[8]);
          int walks          = Integer.parseInt(stats[19]);
          int hitByPitch     = Integer.parseInt(stats[21]);
          int sacrificeFlies = Integer.parseInt(stats[16]);
          int sacrificeHits  = Integer.parseInt(stats[15]);
        
          int officialAtBats 
           = atBats - walks - hitByPitch - sacrificeHits;
          if (officialAtBats <= 0) formattedAverage = "N/A";
          else {
            double average = hits / (double) officialAtBats;
            formattedAverage = averages.format(average);
          }       
        }
        catch (Exception e) {
          // skip this player
          continue; 
        }

        Element player = new Element("player");

        Element first_name = new Element("first_name");
        first_name.setText(stats[1]);
        player.addContent(first_name);
             
        Element surname = new Element("surname");
        surname.setText(stats[0]);
        player.addContent(surname);
       
        Element battingAverage = new Element("batting_average");
        battingAverage.setText(formattedAverage);
        player.addContent(battingAverage);
   
        root.addContent(player);
        
      }  
      
      
      Document doc = new Document(root);
      // serialize it into a file
      FileOutputStream fout 
       = new FileOutputStream("battingaverages.xml");
      
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, fout);
      fout.flush();	
      fout.close();
      in.close();

    }
    catch (IOException e) {
      System.err.println(e);
    }
    catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("Usage: java BattingAverage input_file.tab");
    }

  }

  public static String[] splitLine(String playerStats) {
    
    // count the number of tabs
    int numTabs = 0;
    for (int i = 0; i < playerStats.length(); i++) {
      if (playerStats.charAt(i) == '\t') numTabs++;
    }
    int numFields = numTabs + 1;
    String[] fields = new String[numFields];
    int position = 0;
    for (int i = 0; i < numFields; i++) {
      StringBuffer field = new StringBuffer();
      while (position < playerStats.length() 
       && playerStats.charAt(position++) != '\t') {
        field.append(playerStats.charAt(position-1));
      }
      fields[i] = field.toString();
    }    
    return fields;
    
  }

}

Batting Averages in XML

<?xml version="1.0"?>
<players>
  <player>
    <first_name>Garret </first_name>
    <surname>Anderson</surname>
    <batting_average>.311</batting_average>
  </player>
  <player>
    <first_name>Justin </first_name>
    <surname>Baughman</surname>
    <batting_average>.272</batting_average>
  </player>
  <player>
    <first_name>Frank </first_name>
    <surname>Bolick</surname>
    <batting_average>.206</batting_average>
  </player>
  <player>
    <first_name>Gary </first_name>
    <surname>Disarcina</surname>
    <batting_average>.310</batting_average>
  </player>
  <player>
    <first_name>Jim </first_name>
    <surname>Edmonds</surname>
    <batting_average>.341</batting_average>
  </player>
  <player>
    <first_name>Darin </first_name>
    <surname>Erstad</surname>
    <batting_average>.326</batting_average>
  </player>
  <player>
    <first_name>Carlos </first_name>
    <surname>Garcia</surname>
    <batting_average>.167</batting_average>
  </player>
  <player>
    <first_name>Troy </first_name>
    <surname>Glaus</surname>
    <batting_average>.240</batting_average>
  </player>
  <player>
    <first_name>Todd </first_name>
    <surname>Greene</surname>
    <batting_average>.261</batting_average>
  </player>
  <player>
    <first_name>Eric </first_name>
    <surname>Helfand</surname>
    <batting_average>N/A</batting_average>
  </player>
  <player>
    <first_name>Dave </first_name>
    <surname>Hollins</surname>
    <batting_average>.284</batting_average>
  </player>
  <player>
    <first_name>Gregg </first_name>
    <surname>Jefferies</surname>
    <batting_average>.347</batting_average>
  </player>
  <player>
    <first_name>Mark </first_name>
    <surname>Johnson</surname>
    <batting_average>.071</batting_average>
  </player>
  <player>
    <first_name>Chad </first_name>
    <surname>Kreuter</surname>
    <batting_average>.299</batting_average>
  </player>
  <player>
    <first_name>Norberto </first_name>
    <surname>Martin</surname>
    <batting_average>.226</batting_average>
  </player>
  <player>
    <first_name>Damon </first_name>
    <surname>Mashore</surname>
    <batting_average>.271</batting_average>
  </player>
  <player>
    <first_name>Ben </first_name>
    <surname>Molina</surname>
    <batting_average>.000</batting_average>
  </player>
  <player>
    <first_name>Phil </first_name>
    <surname>Nevin</surname>
    <batting_average>.251</batting_average>
  </player>
  <player>
    <first_name>Charlie </first_name>
    <surname>Obrien</surname>
    <batting_average>.281</batting_average>
  </player>
  <player>
    <first_name>Orlando </first_name>
    <surname>Palmeiro</surname>
    <batting_average>.384</batting_average>
  </player>
  <player>
    <first_name>Chris </first_name>
    <surname>Pritchett</surname>
    <batting_average>.303</batting_average>
  </player>
  <player>
    <first_name>Tim </first_name>
    <surname>Salmon</surname>
    <batting_average>.376</batting_average>
  </player>
  <player>
    <first_name>Craig </first_name>
    <surname>Shipley</surname>
    <batting_average>.286</batting_average>
  </player>
  <player>
    <first_name>Randy </first_name>
    <surname>Velarde</surname>
    <batting_average>.320</batting_average>
  </player>
  <player>
    <first_name>Matt </first_name>
    <surname>Walbeck</surname>
    <batting_average>.289</batting_average>
  </player>
  <player>
    <first_name>Reggie </first_name>
    <surname>Williams</surname>
    <batting_average>.481</batting_average>
  </player>
</players>
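
To see where one of these numbers comes from, the following sketch repeats the arithmetic and number formatting of BattingAverage for a single player, without JDOM or an input file. The class name BattingAverageSketch and the format() helper are made up for illustration; the formula is the one the program uses as written, not an independent definition of a batting average.

```java
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Locale;

public class BattingAverageSketch {

  // Mirrors the calculation in BattingAverage.main()
  public static String format(int atBats, int hits, int walks,
   int hitByPitch, int sacrificeHits) {

    DecimalFormat averages = (DecimalFormat)
     NumberFormat.getNumberInstance(Locale.US);
    averages.setMaximumFractionDigits(3);
    averages.setMinimumFractionDigits(3);
    averages.setMinimumIntegerDigits(0); // no leading zero

    int officialAtBats = atBats - walks - hitByPitch - sacrificeHits;
    if (officialAtBats <= 0) return "N/A";
    return averages.format(hits / (double) officialAtBats);

  }

  public static void main(String[] args) {
    // Justin Baughman: 196 at bats, 50 hits, 6 walks,
    // 1 hit by pitch, 5 sacrifice hits
    System.out.println(format(196, 50, 6, 1, 5)); // 50/184, prints .272
    // Eric Helfand: no at bats
    System.out.println(format(0, 0, 0, 0, 0));    // prints N/A
  }

}
```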

Advantages of JDOM for Writing Documents


Reading XML with JDOM


Parser APIs


JDOM Compatible Parsers for Java

Any SAX or DOM compatible parser including:


SAX


SAX2


The SAX Process


Event Based API Caveats


Document Object Model


The Design of the DOM API


DOM Evolution


Eight Modules:


DOM Trees


org.w3c.dom


The DOM Process


The JDOM Process


Parsing a Document with JDOM

import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;


public class JDOMChecker {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java JDOMChecker URL1 URL2..."); 
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        builder.build(args[i]);
        // If there are no well-formedness errors, 
        // then no exception is thrown
        System.out.println(args[i] + " is well formed.");
      }
      catch (JDOMException e) { // indicates a well-formedness or other error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

}

Parser Results

% java JDOMChecker shortlogs.xml HelloJDOM.java
shortlogs.xml is well formed.
HelloJDOM.java is not well formed.
The markup in the document preceding the root element must be well-formed.: 
Error on line 1 of XML document: The markup in the document preceding the 
root element must be well-formed.

Turning on Validation in JDOM


JDOM Validator

import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;


public class Validator {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java Validator URL1 URL2..."); 
    } 
      
    SAXBuilder builder = new SAXBuilder(true);
                             /*         ^^^^       */
                             /* Turn on validation */
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        builder.build(args[i]);
        // If there are no well-formedness or validity errors, 
        // then no exception is thrown
        System.out.println(args[i] + " is valid.");
      }
      catch (JDOMException e) { // indicates a well-formedness or validity error
        System.out.println(args[i] + " is not valid.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

}

Validation Output

% java Validator invalid_fibonacci.xml
invalid_fibonacci.xml is not valid.
Element type "title" must be declared.: Error on line 8 of XML document: 
Element type "title" must be declared.

% java Validator validfibonacci.xml
validfibonacci.xml is valid.
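
The same valid/invalid distinction can be tried without JDOM on the class path. The sketch below is a stand-in, not the JDOM API: it assumes the JAXP SAX parser bundled with current JDKs, and the class name ValiditySketch, the isValid() helper, and the two inline documents are made up for illustration. A validity error is reported to the ErrorHandler's error() method, which here rethrows it so that it can be caught like a well-formedness error.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ValiditySketch {

  // A tiny internal DTD subset, hypothetical and unrelated to the
  // fibonacci files named in the output above
  public static final String DTD =
     "<?xml version=\"1.0\"?>\n"
   + "<!DOCTYPE numbers [\n"
   + "  <!ELEMENT numbers (n*)>\n"
   + "  <!ELEMENT n (#PCDATA)>\n"
   + "]>\n";

  // Returns true only if the text is both well formed and valid
  public static boolean isValid(String xml) {
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setValidating(true);   // turn on validation
      XMLReader reader = factory.newSAXParser().getXMLReader();
      reader.setErrorHandler(new DefaultHandler() {
        // Validity errors are not fatal by default; rethrow them
        public void error(SAXParseException e) throws SAXParseException {
          throw e;
        }
      });
      reader.parse(new InputSource(new StringReader(xml)));
      return true;
    }
    catch (Exception e) { // well-formedness, validity, or setup error
      return false;
    }
  }

  public static void main(String[] args) {
    String good = DTD + "<numbers><n>1</n><n>1</n></numbers>";
    String bad  = DTD + "<numbers><title>Fibonacci</title></numbers>";
    System.out.println("good is valid: " + isValid(good));
    System.out.println("bad is valid: " + isValid(bad));
  }

}
```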

Building with DOM instead of SAX


DOMBuilder Example

import org.jdom.*;
import org.jdom.input.DOMBuilder;
import org.apache.xerces.parsers.*;


public class DOMValidator {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java DOMValidator URL1 URL2..."); 
    }      
      
    DOMBuilder builder = new DOMBuilder(true);
                             /*         ^^^^       */
                             /* Turn on validation */
    // start parsing... 
    DOMParser parser = new DOMParser();  // Xerces specific class
    for (int i = 0; i < args.length; i++) {
        
      try {
        // Read the entire document into memory
        parser.parse(args[i]); 
    
        org.w3c.dom.Document domDoc  = parser.getDocument();
        org.jdom.Document    jdomDoc = builder.build(domDoc);

        // If there are no validity errors, 
        // then no exception is thrown
        System.out.println(args[i] + " is valid.");
      }
      catch (Exception e) { // indicates a well-formedness or validity error
        System.out.println(args[i] + " is not valid.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

}

Reading XML Documents

One program, three implementations:


UserLand's RSS based list of Web logs

Full list

Goal: Return a list of all the URLs in this list as java.net.URL objects

Design Decisions


The SAX ContentHandler interface

package org.xml.sax;


public interface ContentHandler {

  public void setDocumentLocator(Locator locator);
    
  public void startDocument() throws SAXException;
    
  public void endDocument() throws SAXException;
    
  public void startPrefixMapping(String prefix, String uri) 
   throws SAXException;

  public void endPrefixMapping(String prefix) throws SAXException;

  public void startElement(String namespaceURI, String localName,
   String rawName, Attributes atts) throws SAXException;

  public void endElement(String namespaceURI, String localName,
   String rawName) throws SAXException;

  public void characters(char[] ch, int start, int length) 
   throws SAXException;

  public void ignorableWhitespace(char[] ch, int start, int length)
   throws SAXException;

  public void processingInstruction(String target, String data)
   throws SAXException;

  public void skippedEntity(String name) throws SAXException;
     
}

SAX Design


User Interface Class

import org.xml.sax.*;
import org.xml.sax.helpers.XMLReaderFactory;
import java.util.*;
import java.io.*;


public class WeblogsSAX {
     
  public static List listChannels() 
   throws IOException, SAXException {
    return listChannels(
     "http://static.userland.com/weblogMonitor/logs.xml"); 
  }
  
  public static List listChannels(String uri) 
   throws IOException, SAXException {
    
    XMLReader parser = XMLReaderFactory.createXMLReader();
    Vector urls = new Vector(1000);
    URIGrabber u = new URIGrabber(urls);
    parser.setContentHandler(u);
    parser.parse(uri);
    return urls;
    
  }
  
  public static void main(String[] args) {
   
    try {
      List urls;
      if (args.length > 0) urls = listChannels(args[0]);
      else urls = listChannels();
      Iterator iterator = urls.iterator();
      while (iterator.hasNext()) {
        System.out.println(iterator.next()); 
      }
    }
    catch (IOException e) {
      System.err.println(e); 
    }
    catch (SAXParseException e) {
      System.err.println(e); 
      System.err.println("at line " + e.getLineNumber() 
       + ", column " + e.getColumnNumber()); 
    }
    catch (SAXException e) {
      System.err.println(e); 
    }
    catch (/* Unexpected */ Exception e) {
      e.printStackTrace(); 
    }
    
  }
  
}

ContentHandler Class

import org.xml.sax.*;
import java.net.*;
import java.util.Vector;

             // conflicts with java.net.ContentHandler
class URIGrabber implements org.xml.sax.ContentHandler {
    
  private Vector urls;
     
  URIGrabber(Vector urls) {
    this.urls = urls;
  }
    
  // do nothing methods  
  public void setDocumentLocator(Locator locator) {}
  public void startDocument() throws SAXException {}
  public void endDocument() throws SAXException {}
  public void startPrefixMapping(String prefix, String uri) 
   throws SAXException {}
  public void endPrefixMapping(String prefix) throws SAXException {}
  public void skippedEntity(String name) throws SAXException {}  
  public void ignorableWhitespace(char[] text, int start, int length)
   throws SAXException {}
  public void processingInstruction(String target, String data)
   throws SAXException {}
  
  
  // Remember, there's no guarantee all the text of the
  // url element will be returned in a single call to characters
  private StringBuffer urlBuffer;
  private boolean collecting = false;
  
  public void startElement(String namespaceURI, String localName,
   String rawName, Attributes atts) throws SAXException {
	  
    if (rawName.equals("url")) {
      collecting = true;
      urlBuffer = new StringBuffer();
    } 
    
  }
  
  public void characters(char[] text, int start, int length) 
   throws SAXException {
    
    if (collecting) {
      urlBuffer.append(text, start, length);
    } 
    
  }
  
  public void endElement(String namespaceURI, String localName,
   String rawName) throws SAXException {
	  
    if (rawName.equals("url")) {
      collecting = false;
      String url = urlBuffer.toString();
      try {
        urls.addElement(new URL(url));
      }
      catch (MalformedURLException e) {
        // skip this url
      }
    }
    
  } 
    
}
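
The buffering in URIGrabber can be tried without the network. The sketch below is an illustration under stated assumptions: it uses the JAXP SAX parser bundled with current JDKs and DefaultHandler instead of a full ContentHandler implementation, and the class name UrlBufferSketch, the listUrls() helper, and the inline document (including its weblogs and log element names and the example.com address) are made up. Only the url element name comes from the code above.

```java
import java.io.StringReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class UrlBufferSketch {

  public static List<URL> listUrls(String xml) throws Exception {

    final List<URL> urls = new ArrayList<URL>();
    DefaultHandler handler = new DefaultHandler() {

      private StringBuffer urlBuffer;
      private boolean collecting = false;

      public void startElement(String namespaceURI, String localName,
       String rawName, Attributes atts) {
        if (rawName.equals("url")) {
          collecting = true;
          urlBuffer = new StringBuffer();
        }
      }

      // May be called several times for one url element,
      // so the text is accumulated rather than used directly
      public void characters(char[] text, int start, int length) {
        if (collecting) urlBuffer.append(text, start, length);
      }

      public void endElement(String namespaceURI, String localName,
       String rawName) {
        if (rawName.equals("url")) {
          collecting = false;
          try {
            urls.add(new URL(urlBuffer.toString()));
          }
          catch (MalformedURLException e) {
            // skip this url
          }
        }
      }

    };

    SAXParserFactory.newInstance().newSAXParser()
     .parse(new InputSource(new StringReader(xml)), handler);
    return urls;

  }

  public static void main(String[] args) throws Exception {
    // Hypothetical document; the second URL contains an entity
    // reference, where parsers commonly split the text
    String xml = "<weblogs>"
     + "<log><url>http://www.slashdot.org/</url></log>"
     + "<log><url>http://example.com/?a=1&amp;b=2</url></log>"
     + "<log><url>not a url</url></log>"
     + "</weblogs>";
    for (URL url : listUrls(xml)) System.out.println(url);
  }

}
```

Whether a given parser actually splits the second URL across several characters() calls is implementation dependent; the buffer makes the result the same either way.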

Weblogs Output

% java WeblogsSAX shortlogs.xml
http://www.mozillazine.org
http://www.salonherringwiredfool.com/
http://www.slashdot.org/

Weblogs with DOM


DOM Design


The Node Interface

package org.w3c.dom;

public interface Node {

  // NodeType
  public static final short ELEMENT_NODE                = 1;
  public static final short ATTRIBUTE_NODE              = 2;
  public static final short TEXT_NODE                   = 3;
  public static final short CDATA_SECTION_NODE          = 4;
  public static final short ENTITY_REFERENCE_NODE       = 5;
  public static final short ENTITY_NODE                 = 6;
  public static final short PROCESSING_INSTRUCTION_NODE = 7;
  public static final short COMMENT_NODE                = 8;
  public static final short DOCUMENT_NODE               = 9;
  public static final short DOCUMENT_TYPE_NODE          = 10;
  public static final short DOCUMENT_FRAGMENT_NODE      = 11;
  public static final short NOTATION_NODE               = 12;

  public String       getNodeName();
  public String       getNodeValue() throws DOMException;
  public void         setNodeValue(String nodeValue) throws DOMException;
  public short        getNodeType();
  public Node         getParentNode();
  public NodeList     getChildNodes();
  public Node         getFirstChild();
  public Node         getLastChild();
  public Node         getPreviousSibling();
  public Node         getNextSibling();
  public NamedNodeMap getAttributes();
  public Document     getOwnerDocument();
  public Node         insertBefore(Node newChild, Node refChild) throws DOMException;
  public Node         replaceChild(Node newChild, Node oldChild) throws DOMException;
  public Node         removeChild(Node oldChild) throws DOMException;
  public Node         appendChild(Node newChild) throws DOMException;
  public boolean      hasChildNodes();
  public Node         cloneNode(boolean deep);
  public void         normalize();
  public boolean      supports(String feature, String version);
  public String       getNamespaceURI();
  public String       getPrefix();
  public void         setPrefix(String prefix) throws DOMException;
  public String       getLocalName();
  
}

The NodeIterator Interface

package org.w3c.dom.traversal;

public interface NodeIterator {

  public Node       nextNode()     throws DOMException;
  public Node       previousNode() throws DOMException;
  public int        getWhatToShow();
  public NodeFilter getFilter();
  public boolean    getExpandEntityReferences();
  public void       detach();
    
}

The NodeFilter Interface

package org.w3c.dom.traversal;

public interface NodeFilter {

    // Constants returned by acceptNode
    public static final short   FILTER_ACCEPT = 1;
    public static final short   FILTER_REJECT = 2;
    public static final short   FILTER_SKIP   = 3;

    public short acceptNode(Node n);
    
    // Constants for whatToShow
    public static final int     SHOW_ALL                    = 0x0000FFFF;
    public static final int     SHOW_ELEMENT                = 0x00000001;
    public static final int     SHOW_ATTRIBUTE              = 0x00000002;
    public static final int     SHOW_TEXT                   = 0x00000004;
    public static final int     SHOW_CDATA_SECTION          = 0x00000008;
    public static final int     SHOW_ENTITY_REFERENCE       = 0x00000010;
    public static final int     SHOW_ENTITY                 = 0x00000020;
    public static final int     SHOW_PROCESSING_INSTRUCTION = 0x00000040;
    public static final int     SHOW_COMMENT                = 0x00000080;
    public static final int     SHOW_DOCUMENT               = 0x00000100;
    public static final int     SHOW_DOCUMENT_TYPE          = 0x00000200;
    public static final int     SHOW_DOCUMENT_FRAGMENT      = 0x00000400;
    public static final int     SHOW_NOTATION               = 0x00000800;

}
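
The whatToShow constants are bit flags, so several node types can be requested at once by combining them with the | operator. The sketch below is one way to use that, under stated assumptions: it relies on the JAXP parser bundled with current JDKs, whose Document objects are assumed to implement DocumentTraversal (the code checks this at run time), and the names UrlTextIterator, UrlChildFilter, and urlTexts are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class, not from the original slides
public class UrlTextIterator {

  // Accepts only nodes whose parent element is named url
  static class UrlChildFilter implements NodeFilter {
    public short acceptNode(Node n) {
      Node parent = n.getParentNode();
      if (parent != null && "url".equals(parent.getNodeName())) {
        return NodeFilter.FILTER_ACCEPT;
      }
      return NodeFilter.FILTER_SKIP;
    }
  }

  public static List<String> urlTexts(String xml) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)));
    }
    catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("Could not parse document", e);
    }
    if (!(doc instanceof DocumentTraversal)) {
      throw new UnsupportedOperationException("Traversal not supported");
    }

    // Bit flags combine with |: visit text and CDATA nodes, nothing else
    int whatToShow = NodeFilter.SHOW_TEXT | NodeFilter.SHOW_CDATA_SECTION;
    NodeIterator iterator = ((DocumentTraversal) doc).createNodeIterator(
     doc, whatToShow, new UrlChildFilter(), true);

    List<String> result = new ArrayList<String>();
    Node current;
    while ((current = iterator.nextNode()) != null) {
      result.add(current.getNodeValue().trim());
    }
    iterator.detach();
    return result;
  }

  public static void main(String[] args) {
    String xml = "<weblogs><log><name>MozillaZine</name>"
     + "<url>http://www.mozillazine.org</url></log>"
     + "<log><url>http://www.slashdot.org/</url></log></weblogs>";
    for (String url : urlTexts(xml)) {
      System.out.println(url);
    }
  }

}
```

Narrowing whatToShow means the filter is never consulted for elements, comments, or processing instructions, so acceptNode() only has to inspect the parent. For a NodeIterator, FILTER_SKIP and FILTER_REJECT behave the same way; the distinction only matters for a TreeWalker.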

Weblogs with DOM

import org.w3c.dom.*;
import org.w3c.dom.traversal.*;
import org.xml.sax.SAXException;
import java.io.IOException;
import java.util.*;
import java.net.*;


public class WeblogsDOM {

  public static String DEFAULT_URL 
   = "http://static.userland.com/weblogMonitor/logs.xml"; 
     
  public static List listChannels() throws DOMException {
    return listChannels(DEFAULT_URL); 
  }
  
  public static List listChannels(String uri) throws DOMException {
    
    if (uri == null) {
      throw new NullPointerException("URL must be non-null");   
    }

    org.apache.xerces.parsers.DOMParser parser 
     = new org.apache.xerces.parsers.DOMParser();
    
    // Start with an empty list so callers never get null back,
    // even if parsing fails below
    Vector urls = new Vector(100);
    
    try {
      // Read the entire document into memory
      parser.parse(uri); 
      Document doc = parser.getDocument();
      org.apache.xerces.dom.DocumentImpl impl 
       = (org.apache.xerces.dom.DocumentImpl) doc;
      NodeIterator iterator = impl.createNodeIterator(doc, 
       NodeFilter.SHOW_ALL, new URLFilter(), true);

      Node current = null;
      while ((current = iterator.nextNode()) != null) {
        try {
          String content = current.getNodeValue();
          URL u = new URL(content);
          urls.addElement(u);
        }
        catch (MalformedURLException e) {
          // bad input data from one third party; just ignore it 
        }
      }
    }
    catch (SAXException e) {
      System.err.println(e); 
    }
    catch (IOException e) {
      System.err.println(e); 
    }
    
    return urls;
    
  }
  
  static class URLFilter implements NodeFilter {
        
    public short acceptNode(Node n) {
      
      if (n instanceof Text) {
        Node parent = n.getParentNode();
        if (parent instanceof Element) {
          Element e = (Element) parent;
          if (e.getTagName().equals("url")) {
            return NodeFilter.FILTER_ACCEPT;       
          }
        }
      }
      
      return NodeFilter.FILTER_REJECT;
      
    }
    
  }
    
  public static void main(String[] args) {
     
    try {
      List urls;
      if (args.length > 0) {
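        try {
          // Constructing a URL checks that the argument is well-formed
          // before any parsing starts; the object itself is not needed
          new URL(args[0]);
          urls = listChannels(args[0]);
        }
        catch (MalformedURLException e) {
          System.err.println("Usage: java WeblogsDOM url");
          return;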
        }
      }
      else {
        urls = listChannels();
      }
      Iterator iterator = urls.iterator();
      while (iterator.hasNext()) {
        System.out.println(iterator.next()); 
      }
    }
    catch (/* Unexpected */ Exception e) {
      e.printStackTrace(); 
    }
    
  } // end main

}

Weblogs Output

% java WeblogsDOM
http://2020Hindsight.editthispage.com/
http://www.sff.net/people/mitchw/weblog/weblog.htp
http://nate.weblogs.com/
http://plugins.launchpoint.net
http://404.psistorm.net
http://home.att.net/~geek9000
http://daubnet.tzo.com/weblog
several hundred more...

Weblogs with JDOM


JDOM Design


Weblogs with JDOM

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;
import java.net.*;


public class WeblogsJDOM {
   
  public static String DEFAULT_SYSTEM_ID 
   = "http://static.userland.com/weblogMonitor/logs.xml"; 
     
  public static List listChannels() throws JDOMException {
    return listChannels(DEFAULT_SYSTEM_ID); 
  }
  
  public static List listChannels(String systemID) 
   throws JDOMException, NullPointerException {
    
    if (systemID == null) {
      throw new NullPointerException("URL must be non-null");   
    }
    
    SAXBuilder builder = new SAXBuilder();
    // Load the entire document into memory 
    // from the network or file system
    Document doc = builder.build(systemID);
    
    // Descend the tree and find the URLs. It helps that
    // the document has a very regular structure.
    Element weblogs = doc.getRootElement();
    List logs = weblogs.getChildren("log");
    Vector urls = new Vector(logs.size());
    Iterator iterator = logs.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      Element log = (Element) o;
      try {
        // getChild() will probably be renamed to 
        // getElement() or getChildElement() 
        Element url = log.getChild("url"); 
        if (url == null) continue;
        String content = url.getTextTrim();
        URL u = new URL(content);
        urls.addElement(u);
      }
      catch (MalformedURLException e) {
        // bad input data from one third party; just ignore it 
      }
    }
    return urls;
    
  }
  
  public static void main(String[] args) {
   
    try {
      List urls;
      if (args.length > 0) {
        urls = listChannels(args[0]);
      }
      else {
        urls = listChannels();
      }
      Iterator iterator = urls.iterator();
      while (iterator.hasNext()) {
        System.out.println(iterator.next()); 
      }
    }
    catch (/* Unexpected */ Exception e) {
      e.printStackTrace(); 
    }
    
  }
  
}

Weblogs Output

% java WeblogsJDOM
http://2020Hindsight.editthispage.com/
http://www.sff.net/people/mitchw/weblog/weblog.htp
http://nate.weblogs.com/
http://plugins.launchpoint.net
http://404.psistorm.net
http://home.att.net/~geek9000
http://daubnet.tzo.com/weblog
several hundred more...

The org.jdom Package

The classes that represent an XML document and its parts


The Document Node


The Document Class

package org.jdom;

public class Document implements Serializable, Cloneable {

  protected List    content;
  protected Element rootElement;
  protected DocType docType;

  protected Document() {}
  public    Document(Element rootElement) {}
  public    Document(Element rootElement, DocType docType) {}

  public Element   getRootElement() {}
  public Document  setRootElement(Element rootElement) {}
  public DocType   getDocType() {}
  public Document  setDocType(DocType docType) {}
  public List      getProcessingInstructions() {}
  public List      getProcessingInstructions(String target) {}
  public ProcessingInstruction getProcessingInstruction(String target)
    throws NoSuchProcessingInstructionException {}
  public Document  addProcessingInstruction(ProcessingInstruction pi) {}
  public Document  addProcessingInstruction(String target, String data) {}
  public Document  addProcessingInstruction(String target, Map data) {}
  public Document  setProcessingInstructions(List processingInstructions) {}
  public boolean   removeProcessingInstruction(ProcessingInstruction processingInstruction) {}
  public boolean   removeProcessingInstruction(String target) {}
  public boolean   removeProcessingInstructions(String target) {}
  public Document  addComment(Comment comment) {}
  public List      getMixedContent() {}
  
  // basic utility methods
  public final String  toString() {}
  public final String  getSerializedForm() {}  // going away
  public final boolean equals(Object ob) {}
  public final int     hashCode() {}
  public final Object  clone() {}

}

Document Example

import org.jdom.Document;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import java.io.IOException;


public class XMLPrinter {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java XMLPrinter URL1 URL2..."); 
      return;
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        System.out.println("*************" + args[i] + "*************");
        XMLOutputter outputter = new XMLOutputter();
        outputter.output(doc, System.out);
      }
      catch (JDOMException e) { // indicates a well-formedness or other error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
      catch (IOException e) { // shouldn't happen because System.out eats exceptions
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

}

Output from XMLPrinter

% java XMLPrinter shortlogs.xml
*************shortlogs.xml*************
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd"><weblogs>
        <log>
                <name>MozillaZine</name>
                <url>http://www.mozillazine.org</url>
                <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>

                <ownerName>Jason Kersey</ownerName>
                <ownerEmail>kerz@en.com</ownerEmail>
                <description>THE source for news on the Mozilla Organization.  DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
                <imageUrl />
                <adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif</adImageUrl>
                </log>
        <log>
                <name>SalonHerringWiredFool</name>
                <url>http://www.salonherringwiredfool.com/</url>
                <ownerName>Some Random Herring</ownerName>
                <ownerEmail>salonfool@wiredherring.com</ownerEmail>
                <description />
                </log>
        <log>
                <name>SlashDot.Org</name>
                <url>http://www.slashdot.org/</url>
                <ownerName>Simply a friend</ownerName>
                <ownerEmail>afriendofweblogs@weblogs.com</ownerEmail>
                <description>News for Nerds, Stuff that Matters.</description>
                </log>
        </weblogs>

Element Nodes


Element Class Implementation


The Element Class

package org.jdom;

public class Element implements Serializable, Cloneable {

    protected String    name;
    protected Namespace namespace;
    protected Element   parent;
    protected boolean   isRootElement;
    protected List      attributes;
    protected List      content;

    protected Element() {}
    public    Element(String name, Namespace namespace) {}
    public    Element(String name) {}
    public    Element(String name, String uri) {}
    public    Element(String name, String prefix, String uri) {}

    public String     getName() {}
    public Namespace  getNamespace() {}
    public String     getNamespacePrefix() {}
    public String     getNamespaceURI() {}
    public String     getQualifiedName() {}
    
    public Element    getParent() {}
    protected Element setParent(Element parent) {}
    public boolean    isRootElement() {}
    protected Element setIsRootElement(boolean isRootElement) {}
    

    public String    getText() {} 
    public String    getTextTrim() {} 
    public boolean   hasMixedContent() {} 
    public List      getMixedContent() {}
    public String    getChildText(String name) {} 
    public String    getChildTextTrim(String name) {} 
    public String    getChildText(String name, Namespace ns) {} 

    public Element   setMixedContent(List mixedContent) {} 
    public List      getChildren() {} 
    public Element   setChildren(List children) {} 
    public List      getChildren(String name, Namespace ns) {} 
    // getChild() will be renamed, probably to getElement()
    public Element   getChild(String name, Namespace ns) {} 
    public Element   getChild(String name) {} 
    public boolean   removeChild(String name) {} 
    public boolean   removeChild(String name, Namespace ns) {} 
    public boolean   removeChildren(String name) {}
    public boolean   removeChildren(String name, Namespace ns) {} 
    public boolean   removeChildren() {} 
    
    public Element   addContent(String text) {}
    public Element   addContent(Element element) {} 
    public Element   addContent(ProcessingInstruction pi) {} 
    public Element   addContent(Entity entity) {} 
    public Element   addContent(Comment comment) {} 
    public Element   addContent(CDATA cdata) {} 
    public boolean   removeContent(Element element) {} 
    public boolean   removeContent(ProcessingInstruction pi) {} 
    public boolean   removeContent(Entity entity) {} 
    public boolean   removeContent(Comment comment) {} 
    
    public List      getAttributes() {} 
    public Attribute getAttribute(String name) {} 
    public Attribute getAttribute(String name, Namespace ns) {} 
    public String    getAttributeValue(String name) {} 
    public String    getAttributeValue(String name, Namespace ns) {} 
    public Element   setAttributes(List attributes) {} 
    public Element   addAttribute(Attribute attribute) {}
    public Element   addAttribute(String name, String value) {} 
    public boolean   removeAttribute(String name, String uri) {} 
    public boolean   removeAttribute(String name) {} 
    public boolean   removeAttribute(String name, Namespace ns) {} 
    
    public Element   getCopy(String name, Namespace ns) {}
    public Element   getCopy(String name, String uri) {}
    public Element   getCopy(String name, String prefix, String uri) {}

    /////////////////////////////////////////////////////////////////
    // Basic Utility Methods
    ///////////////////////////////////////////////////////////////// 
    public final String  toString() {}
    public final String  getSerializedForm() {}  // will be removed
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}
    
}

Element Example: XCount

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;


public class XCount {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java XCount URL1 URL2..."); 
      return;
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    System.out.println(
     "File\tElements\tAttributes\tComments\tProcessing Instructions\tCharacters");
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        System.out.print(args[i] + ":\t");
        String result = count(doc);
        System.out.println(result);
      }
      catch (JDOMException e) { // indicates a well-formedness or other error
        System.out.println(args[i] + " is not a well formed XML document.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }  

  private static int numCharacters             = 0;
  private static int numComments               = 0;
  private static int numElements               = 0;
  private static int numAttributes             = 0;
  private static int numProcessingInstructions = 0;
      
  public static String count(Document doc) {

    numCharacters = 0;
    numComments = 0;
    numElements = 0;
    numAttributes = 0;
    numProcessingInstructions = 0;  

    List children = doc.getMixedContent();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Element) {
        numElements++;
        count((Element) o);
      }
      else if (o instanceof Comment) numComments++;
      else if (o instanceof ProcessingInstruction) numProcessingInstructions++;   
    }
    
    String result = numElements + "\t" + numAttributes + "\t" 
     + numComments + "\t" + numProcessingInstructions + "\t" + numCharacters;
    return result;
       
  }     

  public static void count(Element element) {

    List attributes = element.getAttributes();
    numAttributes += attributes.size();
    List children = element.getMixedContent();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Element) {
        numElements++;
        count((Element) o);
      }
      else if (o instanceof Comment) numComments++;
      else if (o instanceof ProcessingInstruction) numProcessingInstructions++;   
      else if (o instanceof String) {
        String s = (String) o;
        numCharacters += s.length();
      }   
    }
        
  }  

}

XCount Output

% java XCount shortlogs.xml hotcop.xml
File    Elements        Attributes      Comments        Processing Instructions Characters
shortlogs.xml:  30      0       0       0       736
hotcop.xml:     11      8       2       1       95

Handling Attributes in JDOM


The Attribute Class

package org.jdom;

public class Attribute implements Serializable, Cloneable {

    protected String    name;
    protected Namespace namespace;
    protected String    value;

    protected Attribute() {}
    public    Attribute(String name, String value, Namespace namespace) {}
    public    Attribute(String name, String prefix, String uri, String value) {}
    public    Attribute(String name, String value) {}

    public String    getName() {}
    public String    getQualifiedName() {}
    public String    getNamespacePrefix() {}
    public String    getNamespaceURI() {}
    public Namespace getNamespace() {}
    public String    getValue() {}
    public void      setValue(String value) {}

    /////////////////////////////////////////////////////////////////
    // Basic Utility Methods
    /////////////////////////////////////////////////////////////////

    public final String  toString() {}
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}

    /////////////////////////////////////////////////////////////////
    // Convenience Methods below here
    /////////////////////////////////////////////////////////////////

    public String  getValue(String defaultValue) {}
    public int     getIntValue(int defaultValue) {}
    public int     getIntValue() throws DataConversionException {}
    public long    getLongValue(long defaultValue) {}
    public long    getLongValue() throws DataConversionException {}
    public float   getFloatValue(float defaultValue) {}
    public float   getFloatValue() throws DataConversionException {}
    public double  getDoubleValue(double defaultValue) {}
    public double  getDoubleValue() throws DataConversionException {}
    public boolean getBooleanValue(boolean defaultValue) {}
    public boolean getBooleanValue() throws DataConversionException {}
    public char    getCharValue(char defaultValue) {}
    public char    getCharValue() throws DataConversionException {}

}

IDTagger

import java.io.IOException;
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import java.util.*;


public class IDTagger {

  private static int id = 1;

  public static void processElement(Element element) {

    if (element.getAttribute("ID") == null) {
      element.addAttribute(new Attribute("ID", "_" + id));
      id = id + 1; 
    }
    
    // recursion
    List children = element.getChildren();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      processElement((Element) iterator.next());   
    }
    
  }

  public static void main(String[] args) {
     
    SAXBuilder builder = new SAXBuilder();
    
    for (int i = 0; i < args.length; i++) {
        
      try {
        // Read the entire document into memory
        Document document = builder.build(args[i]); 
       
        processElement(document.getRootElement());
        
        // now we serialize the document...
        XMLOutputter serializer = new XMLOutputter(); 
        serializer.output(document, System.out);
        System.out.flush();	        
      }
      catch (JDOMException e) {
        System.err.println(e);
        continue; 
      }
      catch (IOException e) {
        System.err.println(e);
        continue; 
      }
      
    }
  
  } // end main

}

Before IDTagger

<?xml version="1.0"?><backslash
xmlns:backslash="http://slashdot.org/backslash.dtd">

 <story>
    <title>The Onion to buy the New York Times</title>
    <url>http://slashdot.org/articles/00/02/19/1128240.shtml</url>
    <time>2000-02-19 17:25:15</time>
    <author>CmdrTaco</author>
    <department>stuff-to-read</department>
    <topic>media</topic>
    <comments>20</comments>
    <section>articles</section>
    <image>topicmedia.gif</image>
  </story>
 <story>
    <title>Al Gore's Webmaster Answers Your Questions</title>
    <url>http://slashdot.org/interviews/00/02/19/0932207.shtml</url>
    <time>2000-02-19 17:00:52</time>
    <author>Roblimo</author>
    <department>political-process-online</department>
    <topic>usa</topic>
    <comments>49</comments>
    <section>interviews</section>
    <image>topicus.gif</image>
  </story>
 <story>
    <title>Open Source Africa</title>
    <url>http://slashdot.org/articles/00/02/19/1016216.shtml</url>
    <time>2000-02-19 16:05:58</time>
    <author>emmett</author>
    <department>songs-by-toto</department>
    <topic>linux</topic>
    <comments>50</comments>
    <section>articles</section>
    <image>topiclinux.gif</image>
  </story>
 <story>
    <title>Microsoft Funded by NSA, Helps Spy on Win Users?</title>
    <url>http://slashdot.org/articles/00/02/19/0750247.shtml</url>
    <time>2000-02-19 14:07:04</time>
    <author>Roblimo</author>
    <department>deep-dark-conspiracy-theories</department>
    <topic>microsoft</topic>
    <comments>154</comments>
    <section>articles</section>
    <image>topicms.gif</image>
  </story>
 <story>
    <title>X-Men Trailer Released</title>
    <url>http://slashdot.org/articles/00/02/18/0829209.shtml</url>
    <time>2000-02-19 13:47:06</time>
    <author>emmett</author>
    <department>mutant</department>
    <topic>movies</topic>
    <comments>70</comments>
    <section>articles</section>
    <image>topicmovies.gif</image>
  </story>
 <story>
    <title>Connell Replies to "Grok" Comments</title>
    <url>http://slashdot.org/articles/00/02/18/202240.shtml</url>
    <time>2000-02-19 05:01:37</time>
    <author>Hemos</author>
    <department>replying-to-things</department>
    <topic>linux</topic>
    <comments>197</comments>
    <section>articles</section>
    <image>topiclinux.gif</image>
  </story>
 <story>
    <title>etoy.com Returns</title>
    <url>http://slashdot.org/yro/00/02/18/1739216.shtml</url>
    <time>2000-02-19 02:35:06</time>
    <author>nik</author>
    <department>NP:-gimme-shelter</department>
    <topic>internet</topic>
    <comments>77</comments>
    <section>yro</section>
    <image>topicinternet.jpg</image>
  </story>
 <story>
    <title>New Propaganda Series: Rebirth</title>
    <url>http://slashdot.org/articles/00/02/18/205232.shtml</url>
    <time>2000-02-19 01:05:26</time>
    <author>Hemos</author>
    <department>as-pretty-as-always</department>
    <topic>graphics</topic>
    <comments>120</comments>
    <section>articles</section>
    <image>topicgraphics3.gif</image>
  </story>
 <story>
    <title>Giving Back</title>
    <url>http://slashdot.org/features/00/02/18/1631224.shtml</url>
    <time>2000-02-18 22:27:26</time>
    <author>emmett</author>
    <department>salvation-army</department>
    <topic>news</topic>
    <comments>122</comments>
    <section>features</section>
    <image>topicnews.gif</image>
  </story>
 <story>
    <title>Connectix Considering Open Sourcing VGS?</title>
    <url>http://slashdot.org/articles/00/02/18/1050225.shtml</url>
    <time>2000-02-18 20:46:20</time>
    <author>emmett</author>
    <department>grain-of-salt</department>
    <topic>news</topic>
    <comments>93</comments>
    <section>articles</section>
    <image>topicnews.gif</image>
  </story>
</backslash>

After IDTagger

<?xml version="1.0" encoding="UTF-8"?>
<backslash ID="_1">
  <story ID="_2">
    <title ID="_3">The Onion to buy the New York Times</title>
    <url ID="_4">http://slashdot.org/articles/00/02/19/1128240.shtml</url>
    <time ID="_5">2000-02-19 17:25:15</time>
    <author ID="_6">CmdrTaco</author>
    <department ID="_7">stuff-to-read</department>
    <topic ID="_8">media</topic>
    <comments ID="_9">20</comments>
    <section ID="_10">articles</section>
    <image ID="_11">topicmedia.gif</image>
  </story>
  <story ID="_12">
    <title ID="_13">Al Gore's Webmaster Answers Your Questions</title>
    <url ID="_14">http://slashdot.org/interviews/00/02/19/0932207.shtml</url>
    <time ID="_15">2000-02-19 17:00:52</time>
    <author ID="_16">Roblimo</author>
    <department ID="_17">political-process-online</department>
    <topic ID="_18">usa</topic>
    <comments ID="_19">49</comments>
    <section ID="_20">interviews</section>
    <image ID="_21">topicus.gif</image>
  </story>
  <story ID="_22">
    <title ID="_23">Open Source Africa</title>
    <url ID="_24">http://slashdot.org/articles/00/02/19/1016216.shtml</url>
    <time ID="_25">2000-02-19 16:05:58</time>
    <author ID="_26">emmett</author>
    <department ID="_27">songs-by-toto</department>
    <topic ID="_28">linux</topic>
    <comments ID="_29">50</comments>
    <section ID="_30">articles</section>
    <image ID="_31">topiclinux.gif</image>
  </story>
  <story ID="_32">
    <title ID="_33">Microsoft Funded by NSA, Helps Spy on Win Users?</title>
    <url ID="_34">http://slashdot.org/articles/00/02/19/0750247.shtml</url>
    <time ID="_35">2000-02-19 14:07:04</time>
    <author ID="_36">Roblimo</author>
    <department ID="_37">deep-dark-conspiracy-theories</department>
    <topic ID="_38">microsoft</topic>
    <comments ID="_39">154</comments>
    <section ID="_40">articles</section>
    <image ID="_41">topicms.gif</image>
  </story>
  <story ID="_42">
    <title ID="_43">X-Men Trailer Released</title>
    <url ID="_44">http://slashdot.org/articles/00/02/18/0829209.shtml</url>
    <time ID="_45">2000-02-19 13:47:06</time>
    <author ID="_46">emmett</author>
    <department ID="_47">mutant</department>
    <topic ID="_48">movies</topic>
    <comments ID="_49">70</comments>
    <section ID="_50">articles</section>
    <image ID="_51">topicmovies.gif</image>
  </story>
  <story ID="_52">
    <title ID="_53">Connell Replies to "Grok" Comments</title>
    <url ID="_54">http://slashdot.org/articles/00/02/18/202240.shtml</url>
    <time ID="_55">2000-02-19 05:01:37</time>
    <author ID="_56">Hemos</author>
    <department ID="_57">replying-to-things</department>
    <topic ID="_58">linux</topic>
    <comments ID="_59">197</comments>
    <section ID="_60">articles</section>
    <image ID="_61">topiclinux.gif</image>
  </story>
  <story ID="_62">
    <title ID="_63">etoy.com Returns</title>
    <url ID="_64">http://slashdot.org/yro/00/02/18/1739216.shtml</url>
    <time ID="_65">2000-02-19 02:35:06</time>
    <author ID="_66">nik</author>
    <department ID="_67">NP:-gimme-shelter</department>
    <topic ID="_68">internet</topic>
    <comments ID="_69">77</comments>
    <section ID="_70">yro</section>
    <image ID="_71">topicinternet.jpg</image>
  </story>
  <story ID="_72">
    <title ID="_73">New Propaganda Series: Rebirth</title>
    <url ID="_74">http://slashdot.org/articles/00/02/18/205232.shtml</url>
    <time ID="_75">2000-02-19 01:05:26</time>
    <author ID="_76">Hemos</author>
    <department ID="_77">as-pretty-as-always</department>
    <topic ID="_78">graphics</topic>
    <comments ID="_79">120</comments>
    <section ID="_80">articles</section>
    <image ID="_81">topicgraphics3.gif</image>
  </story>
  <story ID="_82">
    <title ID="_83">Giving Back</title>
    <url ID="_84">http://slashdot.org/features/00/02/18/1631224.shtml</url>
    <time ID="_85">2000-02-18 22:27:26</time>
    <author ID="_86">emmett</author>
    <department ID="_87">salvation-army</department>
    <topic ID="_88">news</topic>
    <comments ID="_89">122</comments>
    <section ID="_90">features</section>
    <image ID="_91">topicnews.gif</image>
  </story>
  <story ID="_92">
    <title ID="_93">Connectix Considering Open Sourcing VGS?</title>
    <url ID="_94">http://slashdot.org/articles/00/02/18/1050225.shtml</url>
    <time ID="_95">2000-02-18 20:46:20</time>
    <author ID="_96">emmett</author>
    <department ID="_97">grain-of-salt</department>
    <topic ID="_98">news</topic>
    <comments ID="_99">93</comments>
    <section ID="_100">articles</section>
    <image ID="_101">topicnews.gif</image>
  </story>
</backslash>
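
For comparison, the same recursive tagging can be sketched against the W3C DOM. This is an illustration under assumptions, not the author's code: it uses the JAXP parser that ships with current JDKs, and the class name DOMIDTagger and its parse() helper are invented here. The main difference from the JDOM version is that a DOM element's children include text nodes and comments, so each child has to be tested before recursing, whereas JDOM's getChildren() returns only child elements.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical DOM counterpart to IDTagger, not from the original slides
public class DOMIDTagger {

  private int id = 1;

  // Adds an ID attribute to the element and to every descendant element
  // that does not already have one
  public void processElement(Element element) {

    if (!element.hasAttribute("ID")) {
      element.setAttribute("ID", "_" + id);
      id++;
    }

    // Children may be text, comments, and so on; recurse into elements only
    for (Node child = element.getFirstChild(); child != null;
     child = child.getNextSibling()) {
      if (child instanceof Element) processElement((Element) child);
    }

  }

  // Parses an XML string into a DOM tree
  public static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)));
    }
    catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("Could not parse document", e);
    }
  }

  public static void main(String[] args) {
    Document doc = parse(
     "<backslash><story><title>Giving Back</title></story></backslash>");
    new DOMIDTagger().processElement(doc.getDocumentElement());
    Element story = (Element) doc.getDocumentElement().getFirstChild();
    System.out.println(doc.getDocumentElement().getAttribute("ID"));
    System.out.println(story.getAttribute("ID"));
  }

}
```

The counter is an instance field here rather than a static one, so each DOMIDTagger starts numbering from _1; that is a design choice of this sketch, not something taken from the IDTagger listing.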

Handling Entities in JDOM


The Entity Class

package org.jdom;

public class Entity implements Serializable, Cloneable {

    protected String name;
    protected List   content;

    protected Entity() {}
    public    Entity(String name) {}
    
    public String  getName() {}
    public String  getContent() {}
    public Entity  setContent(String textContent) {}
    public boolean hasMixedContent() {}
    public List    getMixedContent() {}
    public Entity  setMixedContent(List mixedContent) {}
    public List    getChildren() {}
    public Entity  setChildren(List children) {}
    public Entity  addChild(Element element) {}
    public Entity  addChild(String s) {}
    public Entity  addText(String text) {}
    
    public final String  toString() {}
    public final String  getSerializedForm() {} // will be removed
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}

}

Handling Comments in JDOM


The Comment Class

package org.jdom;

public class Comment implements Serializable, Cloneable {

    protected String text;

    protected Comment() {}
    public    Comment(String text) {}
    
    public String getText() {}
    public void   setText(String text) {}
    
    public final String  toString() {}
    public final String  getSerializedForm() {} // will be removed
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}

}

Comment Example

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;


public class CommentReader {

  public static void main(String[] args) {
     
    SAXBuilder builder = new SAXBuilder();
    
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        Document doc = builder.build(args[i]);
        List content = doc.getMixedContent();
        Iterator iterator = content.iterator();
        while (iterator.hasNext()) {
          Object o = iterator.next();
          if (o instanceof Comment) {
            Comment c = (Comment) o;
            System.out.println(c.getText());     
            System.out.println();     
          }
          else if (o instanceof Element) {
            processElement((Element) o);   
          }
        }
      }
      catch (JDOMException e) {
        System.err.println(e); 
        // there may be no root cause if JDOM itself raised the exception
        if (e.getRootCause() != null) e.getRootCause().printStackTrace(); 
      }
      
    }
  
  } // end main

  // note use of recursion
  public static void processElement(Element element) {
    
    List content = element.getMixedContent();
    Iterator iterator = content.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Comment) {
        Comment c = (Comment) o;
        System.out.println(c.getText());     
        System.out.println();     
      }
      else if (o instanceof Element) {
        processElement((Element) o);   
      }
    } // end while
    
  }

}

CommentReader Output

% java CommentReader hotcop.xml
 The publisher is actually Polygram but I needed
       an example of a general entity reference.

 You can tell what album I was
     listening to when I wrote this example

ProcessingInstruction Nodes


The ProcessingInstruction Class

package org.jdom;

public class ProcessingInstruction implements Serializable, Cloneable {

    protected String target;
    protected String rawData;
    protected Map    mapData;

    protected ProcessingInstruction() {}
    public    ProcessingInstruction(String target, Map data) {}
    public    ProcessingInstruction(String target, String data) {}
    
    public String                getTarget() {}
    public String                getData() {}
    public ProcessingInstruction setData(String data) {}
    public ProcessingInstruction setData(Map data) {}
    public String                getValue(String name) {}
    public ProcessingInstruction setValue(String name, String value) {}
    public boolean               removeValue(String name) {}

    public final String toString() {}
    public final String getSerializedForm() {} // will be removed
    public final boolean equals(Object ob) {}
    public final int hashCode() {}
    public final Object clone() {}
}
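
The getValue() and setValue() methods above treat the data of a processing instruction such as <?robots index="no" follow="yes"?> as a list of name="value" pairs. The following is a minimal JDK-only sketch of that idea, not JDOM's actual implementation. The class name PseudoAttributes and its parse() method are hypothetical, and the regular expression is a simplification that assumes well-behaved quoting.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of JDOM
class PseudoAttributes {

  // Matches name="value" or name='value'; a simplification for illustration
  private static final Pattern PAIR
   = Pattern.compile("([A-Za-z_][\\w.-]*)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

  // Returns the pairs in the order they appear. A name that is absent is
  // simply missing from the map, so get() returns null for it.
  static Map<String, String> parse(String data) {
    Map<String, String> result = new LinkedHashMap<String, String>();
    Matcher matcher = PAIR.matcher(data);
    while (matcher.find()) {
      // group 3 holds a double-quoted value, group 4 a single-quoted one
      String value = matcher.group(3) != null ? matcher.group(3) : matcher.group(4);
      result.put(matcher.group(1), value);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> robots = parse("index=\"no\" follow='yes'");
    System.out.println(robots.get("index"));   // no
    System.out.println(robots.get("follow"));  // yes
    System.out.println(robots.get("archive")); // null
  }

}
```

Because a missing name yields null in this sketch, callers need a null-safe comparison such as "no".equalsIgnoreCase(value).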

XLinkSpider that Respects the robots Processing Instruction

import java.io.*;
import java.util.*;
import org.jdom.*;
import org.jdom.input.SAXBuilder;


public class XLinkSpider {

  private static SAXBuilder builder = new SAXBuilder();
  
  private static Vector visited = new Vector();
  
  private static int maxDepth = 5;
  private static int currentDepth = 0; 
  
  public static void listURIs(String systemID) {
    
    currentDepth++;
    try {
      if (currentDepth < maxDepth) {

        Document document = builder.build(systemID); 
                
        // check to see if we're allowed to spider
        boolean index = true;
        boolean follow = true;
        ProcessingInstruction robots 
         = document.getProcessingInstruction("robots");
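        if (robots != null) {
          // getValue() may return null when a pseudo-attribute is missing,
          // so put the literal first to avoid a NullPointerException
          String indexValue = robots.getValue("index");
          if ("no".equalsIgnoreCase(indexValue)) index = false;
          String followValue = robots.getValue("follow");
          if ("no".equalsIgnoreCase(followValue)) follow = false;
        }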
        Vector uris = new Vector();
        // search the document for uris, 
        // store them in vector, and print them
        if (follow) searchForURIs(document.getRootElement(), uris);
    
        Enumeration e = uris.elements();
        while (e.hasMoreElements()) {
          String uri = (String) e.nextElement();
          visited.addElement(uri);
          if (index) listURIs(uri); 
        }
      
      }
    
    }
    catch (JDOMException e) {
      // couldn't load the document, 
      // probably not well-formed XML, skip it 
    }
    finally { 
      currentDepth--;
      System.out.flush();     
    }
      
  }
  
  private static Namespace xlink = Namespace.getNamespace("http://www.w3.org/1999/xlink");
  
  // use recursion 
  public static void searchForURIs(Element element, Vector uris) {
    
    // look for XLinks in this element
    String uri = element.getAttributeValue("href", xlink);
    if (uri != null && !uri.equals("") && !visited.contains(uri) && !uris.contains(uri)) {
      System.out.println(uri);
      uris.addElement(uri);
    }
    
    // process child elements recursively
    List children = element.getChildren();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      searchForURIs((Element) iterator.next(), uris); 
    }
    
  }

  public static void main(String[] args) {
    
    if (args.length == 0) {
      System.out.println("Usage: java XLinkSpider URL1 URL2..."); 
    } 
      
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      System.err.println(args[i]); 
      listURIs(args[i]);
    } // end for
  
  } // end main

} // end XLinkSpider

Handling Namespaces


The Namespace Class


package org.jdom;

public final class Namespace {

  public static final Namespace NO_NAMESPACE = new Namespace("", "");
  public static final Namespace XML_NAMESPACE = 
   new Namespace("xml", "http://www.w3.org/XML/1998/namespace");

  // factory methods
  public static Namespace getNamespace(String prefix, String uri) {}
  public static Namespace getNamespace(String uri) {}

  // getter methods
  public String  getPrefix() {}
  public String  getURI() {}

  // utility methods
  public boolean equals(Object ob) {}
  public String  toString() {}
  public int     hashCode() {}

}
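
A namespace is identified by its URI, not by its prefix, and unprefixed attributes belong to no namespace at all. This minimal sketch uses the JDK's built-in DOM parser rather than JDOM to show the lookup on a cut-down PHOTO element from the Hot Cop example. The class name NamespaceLookup, its parseRoot() helper and the xl prefix are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses JDK DOM, not JDOM
class NamespaceLookup {

  static final String XLINK = "http://www.w3.org/1999/xlink";

  // Same XLink namespace URI as the Hot Cop example, but a different prefix
  static final String PHOTO
   = "<PHOTO xmlns:xl='http://www.w3.org/1999/xlink'"
   + " xl:type='simple' xl:href='hotcop.jpg' ALT='Victor Willis in Cop Outfit'/>";

  static Element parseRoot(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true); // JAXP parsers ignore namespaces by default
      return factory.newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }
    catch (Exception e) {
      throw new IllegalArgumentException("Could not parse: " + e);
    }
  }

  public static void main(String[] args) {
    Element photo = parseRoot(PHOTO);
    // found by namespace URI even though the prefix is xl, not xlink
    System.out.println(photo.getAttributeNS(XLINK, "href"));             // hotcop.jpg
    // ALT has no prefix, so it is in no namespace
    System.out.println(photo.getAttributeNode("ALT").getNamespaceURI()); // null
    // asking for ALT in the XLink namespace finds nothing
    System.out.println(photo.getAttributeNS(XLINK, "ALT").length());     // 0
  }

}
```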

DocType Nodes


The DocType class

package org.jdom;

public class DocType implements Serializable, Cloneable {

    protected String elementName;
    protected String publicID;
    protected String systemID;

    protected DocType() {}
    public    DocType(String rootElementName, String publicID, String systemID) {}
    public    DocType(String rootElementName, String systemID) {}
    public    DocType(String rootElementName) {}

    public String  getElementName() {}
    public String  getPublicID() {}
    public DocType setPublicID(String publicID) {}
    public String  getSystemID() {}
    public DocType setSystemID(String systemID) {}

    // Usual utility methods
    public final String  toString() {}
    public final String  getSerializedForm() {} // will be removed
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}
    
}

Example of the DocType Class


XHTMLValidator

import java.io.*;
import org.jdom.*;
import org.jdom.input.SAXBuilder;


public class XHTMLValidator {

  public static void main(String[] args) {
    
    for (int i = 0; i < args.length; i++) {
      validate(args[i]);
    }   
    
  }

  private static SAXBuilder builder = new SAXBuilder(true);
                                                 /*  ^^^^ */
                                              /* turn on validation  */
  
  // not thread safe
  public static void validate(String source) {
        
      Document document;
      try {
        document = builder.build(source); 
      }
      catch (JDOMException e) {  
        System.out.println("Error: " + e.getMessage()); 
        e.printStackTrace();
        return; 
      }
      
      // If we get this far, then the document is valid XML.
      // Check to see whether the document is actually XHTML        
      DocType doctype = document.getDocType();
    
      if (doctype == null) {
        System.out.println("No DOCTYPE"); 
        return;
      }

      String name     = doctype.getElementName();
      String systemID = doctype.getSystemID();
      String publicID = doctype.getPublicID();
      
      if (!name.equals("html")) {
        System.out.println("Incorrect root element name " + name); 
      }
    
      if (publicID == null
       || (!publicID.equals("-//W3C//DTD XHTML 1.0 Strict//EN")
           && !publicID.equals("-//W3C//DTD XHTML 1.0 Transitional//EN")
           && !publicID.equals("-//W3C//DTD XHTML 1.0 Frameset//EN"))) {
        System.out.println(source + " does not seem to use an XHTML 1.0 DTD");
      }
    
      // Check the namespace on the root element
      Element root = document.getRootElement();
      Namespace namespace = root.getNamespace();
      String prefix = namespace.getPrefix();
      String uri = namespace.getURI();
      if (!uri.equals("http://www.w3.org/1999/xhtml")) {
        System.out.println(source 
         + " does not properly declare the"
         + " http://www.w3.org/1999/xhtml namespace"
         + " on the root element");        
      }
      if (!prefix.equals("")) {
        System.out.println(source 
         + " does not use the empty prefix for XHTML");        
      }
    
  }

}

Using the XHTMLValidator

% java XHTMLValidator http://www.w3.org/TR/xhtml1
Error: File "http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not found.: Error on 
line -1 of XML document: File "http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not 
found.
org.jdom.JDOMException: File "http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not 
found.: Error on line -1 of XML document: File 
"http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not found.
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:227)
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:359)
        at XHTMLValidator.validate(XHTMLValidator.java:25)
        at XHTMLValidator.main(XHTMLValidator.java:11)
Root cause: org.jdom.JDOMException: Error on line -1 of XML document: File 
"http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not found.
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:228)
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:359)
        at XHTMLValidator.validate(XHTMLValidator.java:25)
        at XHTMLValidator.main(XHTMLValidator.java:11)

The Verifier Class


package org.jdom;

public final class Verifier {

    public static final String checkElementName(String name) {}
    public static final String checkAttributeName(String name) {}
    public static final String checkCharacterData(String text) {}
    public static final String checkNamespacePrefix(String prefix) {}
    public static final String checkNamespaceURI(String uri) {}
    public static final String checkProcessingInstructionTarget(String target) {}
    public static final String checkCommentData(String data) {}
 
    public static boolean isXMLCharacter(char c) {}
    public static boolean isXMLNameCharacter(char c) {}
    public static boolean isXMLNameStartCharacter(char c) {}
    public static boolean isXMLLetterOrDigit(char c) {}
    public static boolean isXMLLetter(char c) {}
    public static boolean isXMLCombiningChar(char c) {}
    public static boolean isXMLExtender(char c) {}
    public static boolean isXMLDigit(char c) {}

}

JDOMException


JDOMException Class

package org.jdom;

public class JDOMException extends Exception {

    protected Throwable rootCause;

    public JDOMException() {}
    public JDOMException(String message)  {}
    public JDOMException(String message, Throwable rootCause)  {} 
       
    public String    getMessage() {}
    public void      printStackTrace() {}
    public void      printStackTrace(PrintStream s) {}
    public void      printStackTrace(PrintWriter w) {}
    public Throwable getRootCause()  {}

}

The org.jdom.output Package


Serialization


XMLOutputter

This class is still undergoing API changes.

package org.jdom.output;

public class XMLOutputter implements Cloneable {

    protected static final String STANDARD_INDENT = "  ";
    
    public XMLOutputter() {}
    public XMLOutputter(String indent) {}
    public XMLOutputter(String indent, boolean newlines) {}
    public XMLOutputter(String indent, boolean newlines, String encoding) {}
    public XMLOutputter(XMLOutputter that) {}
    
    public void setLineSeparator(String separator) {}
    public void setNewlines(boolean newlines) {}
    public void setEncoding(String encoding) {}
    public void setOmitEncoding(boolean omitEncoding) {}
    public void setSuppressDeclaration(boolean suppressDeclaration) {}
    public void setExpandEmptyElements(boolean expandEmptyElements) {}
    public void setTrimText(boolean trimText) {}
    public void setPadText(boolean padText) {}
    public void setIndent(String indent) {}
    public void setIndent(boolean doIndent) {}
    public void setIndentLevel(int indentLevel) {}
    public void setIndentSize(int indentSize) {}

    protected void indent(Writer out, int level) throws IOException {}
    protected void maybePrintln(Writer out) throws IOException  {}
    protected Writer makeWriter(OutputStream out) 
     throws java.io.UnsupportedEncodingException {}
    protected Writer makeWriter(OutputStream out, String encoding) 
     throws java.io.UnsupportedEncodingException {}
     
    public void output(Document doc, OutputStream out) throws IOException {}
    public void output(Document doc, Writer writer) throws IOException {}
    public void output(Element element, Writer out) throws IOException {}
    public void output(Element element, OutputStream out) {}
    public void outputElementContent(Element element, Writer out) throws IOException {}
    public void output(CDATA cdata, Writer out) throws IOException {}
    public void output(CDATA cdata, OutputStream out) throws IOException {}
    public void output(Comment comment, Writer out) throws IOException {}
    public void output(Comment comment, OutputStream out) throws IOException {}
    public void output(String string, Writer out) throws IOException {}
    public void output(String string, OutputStream out) throws IOException {}
    public void output(Entity entity, Writer out) throws IOException {}
    public void output(Entity entity, OutputStream out) throws IOException {}
    public void output(ProcessingInstruction processingInstruction, Writer out)
      throws IOException {}
    public void output(ProcessingInstruction processingInstruction, OutputStream out)
     throws IOException {}
    public String outputString(Document doc) throws IOException {}
    public String outputString(Element element) throws IOException {}

    // internal printing methods
    protected void printDeclaration(Document doc, Writer out, String encoding) 
     throws IOException {}    
    protected void printDocType(DocType docType, Writer out) throws IOException {}
    protected void printComment(Comment comment, Writer out, int indentLevel) 
     throws IOException {}
    protected void printProcessingInstruction(ProcessingInstruction pi,
     Writer out, int indentLevel) throws IOException {}
    protected void printCDATASection(CDATA cdata, Writer out, int indentLevel) 
     throws IOException {}
    protected void printElement(Element element, Writer out,
     int indentLevel, NamespaceStack namespaces) throws IOException {}
    protected void printElementContent(Element element, Writer out,
     int indentLevel, NamespaceStack namespaces, List mixedContent) 
     throws IOException {}
    protected void printString(String s, Writer out) throws IOException {}
    protected void printEntity(Entity entity, Writer out) throws IOException {}
    protected void printNamespace(Namespace ns, Writer out) throws IOException {}
    protected void printAttributes(List attributes, Element parent, 
     Writer out, NamespaceStack namespaces)  
     throws IOException {}
    
    public int parseArgs(String[] args, int i) {} 
    
}

Using the XMLOutputter Class Directly


Using the XMLOutputter Class Indirectly


JDOM based TagStripper

A bug in the current version of JDOM prevents this from working.

import org.jdom.*;
import org.jdom.output.XMLOutputter;
import org.jdom.input.SAXBuilder;
import java.io.*;
import java.util.*;


public class TagStripper extends XMLOutputter {

  public TagStripper() {
    super();
  }

  // Things we won't print at all
  protected void printDeclaration(Document doc, Writer out, String encoding) {}
  protected void printComment(Comment comment, Writer out, int indentLevel) {}
  protected void printDocType(DocType docType, Writer out) {}
  protected void printProcessingInstruction(ProcessingInstruction pi, 
   Writer out, int indentLevel) {}
  protected void printNamespace(Namespace ns, Writer out) {}
  protected void printAttributes(List attributes, Element parent, 
   Writer out, NamespaceStack namespaces) {}
  
  protected void printElement(Element element, Writer out, 
   int indentLevel, NamespaceStack namespaces) throws IOException {
    
    List content = element.getMixedContent();
    Iterator iterator = content.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof String) {
        out.write((String) o);
        this.maybePrintln(out);
      }
      else if (o instanceof Element) {
        printElement((Element) o, out, indentLevel, namespaces);
      }
    }
          
  }

  // Could easily have put main() method in a separate class
  public static void main(String[] args) {
     
    if (args.length == 0) {
      System.out.println(
       "Usage: java TagStripper URL1 URL2..."); 
    } 
      
    TagStripper stripper = new TagStripper();
    SAXBuilder builder   = new SAXBuilder();
    
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        stripper.output(doc, System.out);
      }
      catch (JDOMException e) { // a well-formedness error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
      catch (IOException e) { // an I/O error while writing the output
        System.out.println(e.getMessage());
      }
      
    }  
  
  }

}

Output from a JDOM based TagStripper

% java TagStripper hotcop.xml
Hot Cop
Jacques Morali
Henri Belolo
Victor Willis
Jacques Morali
A & M Records
6:20
1978
Village People

Talking to DOM Programs


Talking to SAX Programs


What JDOM doesn't do


To Learn More



Questions?


Part III: XML Base and XInclude

The problem is that we're not providing the tools. We're providing the specs. That's a whole different ball game. If tools existed for actually making really interesting use of RDF and XLink and XInclude then people would use them. If IE and/or Mozilla supported the full gamut of specs, from XSLT 1.0 to XLink and XInclude (OK, so they're not quite REC's, but with time...) then you would find people using them more.
--Matt Sergeant on the xml-dev mailing list


What is XML Base?


The xml:base attribute

<slide xml:base="http://www.ibiblio.org/xml/slides/sd2000east/advancedxml">
  <title>The xml:base attribute</title>
  ...
  <previous xlink:type="simple" xlink:href="What_Is_XBase.xml"/>
  <next xlink:type="simple" xlink:href="xbaseexample.xml"/>
</slide>


XML Base Example

Adapted from the XML Base spec:

<?xml version="1.0"?>
<doc xml:base="http://example.org/today/"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <paragraph>
      See <link xlink:type="simple" xlink:href="new.xml">what's
      new</link>!</paragraph>
    <paragraph>Check out the hot picks of the day!</paragraph>
    <olist xml:base="/hotpicks/">
      <item>
        <link xlink:type="simple" xlink:href="pick1.xml">Hot Pick #1</link>
      </item>
      <item>
        <link xlink:type="simple" xlink:href="pick2.xml">Hot Pick #2</link>
      </item>
      <item>
        <link xlink:type="simple" xlink:href="pick3.xml">Hot Pick #3</link>
      </item>
    </olist>
  </body>
</doc>
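
In this example the xml:base="/hotpicks/" attribute on olist is itself resolved against the base it inherits from doc, so "what's new" points to http://example.org/today/new.xml while Hot Pick #1 points to http://example.org/hotpicks/pick1.xml. A minimal sketch of that resolution, assuming a JDK that provides java.net.URI; the class name XmlBaseDemo and its resolve() helper are invented for the illustration.

```java
import java.net.URI;

// Hypothetical demo class for illustration
class XmlBaseDemo {

  // Resolves a possibly relative reference against a base URI
  static String resolve(String base, String reference) {
    return URI.create(base).resolve(reference).toString();
  }

  public static void main(String[] args) {
    String docBase = "http://example.org/today/";

    // xml:base on olist is resolved against the base inherited from doc
    String olistBase = resolve(docBase, "/hotpicks/");

    System.out.println(resolve(docBase, "new.xml"));     // http://example.org/today/new.xml
    System.out.println(olistBase);                       // http://example.org/hotpicks/
    System.out.println(resolve(olistBase, "pick1.xml")); // http://example.org/hotpicks/pick1.xml
  }

}
```

The trailing slash on the base matters: without it, the last path segment is treated as a file name and is replaced during resolution.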

Open Issues


What is XInclude?


Alternatives (and why they don't work)


The include element

<book xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <title>Processing XML with Java</title>
  <chapter><xinclude:include href="dom.xml"/></chapter>
  <chapter><xinclude:include href="sax.xml"/></chapter>
  <chapter><xinclude:include href="jdom.xml"/></chapter>
</book>

The parse attribute

parse="xml"
The resource must be parsed as XML and the infosets merged. This is the default.
parse="text"
The resource must be treated as pure text and inserted as a text node. When serialized, this means that characters like < will change to &lt; and so forth.
<slide xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <title>The href attribute</title>
  
<ul>
  <li>Identifies the document to be included with a URI</li>
  <li>The document at the URI replaces the <code>include</code> 
      element in the including document</li>
  <li>The <code>xinclude</code> prefix is bound to the http://www.w3.org/1999/XML/xinclude
  namespace URI. 
  </li>
</ul>  

<pre><code><xinclude:include parse="text" href="processing_xml_with_java.xml"/>
</code></pre>
        
  <description>
      A slide from Elliotte Rusty Harold's Advanced XML course at
      <host_ref/>, <date_ref/>
    </description>
  <last_modified>October 26, 2000</last_modified>
</slide>
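
With parse="text" the included characters become a single text node, and the escaping only appears when that node is serialized. A minimal sketch of this using the JDK's DOM and identity transformer instead of JDOM; the class name TextInclusion and its includeAsText() method are hypothetical, and wrapping the text in a pre element is just a convenient container for the demo.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; uses JDK DOM, not JDOM
class TextInclusion {

  // Wraps raw text in a <pre> element and serializes the result
  static String includeAsText(String rawText) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
       .newDocumentBuilder().newDocument();
      Element pre = doc.createElement("pre");
      // stored as one text node; the markup characters are not parsed
      pre.appendChild(doc.createTextNode(rawText));
      doc.appendChild(pre);

      Transformer serializer = TransformerFactory.newInstance().newTransformer();
      serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      serializer.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
    }
    catch (Exception e) {
      throw new IllegalStateException("Serialization failed: " + e);
    }
  }

  public static void main(String[] args) {
    // the escaping happens at serialization time, not when the text is stored
    System.out.println(includeAsText("<title>A & M</title>"));
  }

}
```

The design point is that the including program never needs to escape the text itself; the serializer does it.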


Implementation as a SAX filter


Implementation as JDOM

package com.macfaq.xml;

import java.net.*;
import java.util.*;
import java.io.*;
import org.jdom.*;
import org.jdom.input.*;
import org.jdom.output.*;

public class XIncluder {

  public final static Namespace XINCLUDE_NAMESPACE
    = Namespace.getNamespace("xinclude", "http://www.w3.org/1999/XML/xinclude");

  private static SAXBuilder builder = new SAXBuilder();

  public static Document resolve(Document original, String base)
   throws IOException, JDOMException {

    if (original == null) throw new NullPointerException("Document must not be null");

    Element  root     = original.getRootElement();

    // check to see if root element has an xml:base ????

    Element  resolved = (Element) resolve(root, base);

    // catch a ClassCastException if a String is returned????

    Document result   = new Document(resolved, original.getDocType());

    Iterator iterator = original.getMixedContent().iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Comment) {
        Comment c = (Comment) o;
        result.addContent((Comment) c.clone());
      }
      else if (o instanceof ProcessingInstruction) {
        ProcessingInstruction pi =(ProcessingInstruction) o;
        result.addContent((ProcessingInstruction) pi.clone());
      }
    }

    return result;
  }

  // either returns an Element or a String
  public static Object resolve(Element original, String base)
   throws IOException, JDOMException {

    if (original == null) throw new NullPointerException("You can't XInclude a null element.");
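    Stack bases = new Stack();
    if (base != null) bases.push(base);

    // The stack is local to this call, so it does not need to be popped.
    // Popping it unconditionally would throw an EmptyStackException
    // whenever base is null.
    return resolve(original, bases);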

  }


  // either returns an Element or a String
  protected static Object resolve(Element original, Stack bases)
   throws IOException, JDOMException {

    Element result;
    // null means no base URL is known; "" would make new URL(base) fail below
    String base = null;
    if (bases.size() != 0) base = (String) bases.peek();
    // The include element is written <xinclude:include href="..."/>. Its href
    // and parse attributes are unprefixed, so they are in no namespace.
    Attribute href = null;
    if (original.getName().equals("include")
     && XINCLUDE_NAMESPACE.getURI().equals(original.getNamespaceURI())) {
      href = original.getAttribute("href");
    }
    Attribute baseAttribute = original.getAttribute("base", Namespace.XML_NAMESPACE);
    if (baseAttribute != null) base = baseAttribute.getValue();

    if (href == null) { // recursively process children
       result = new Element(original.getName(), original.getNamespace());
       Iterator attributes = original.getAttributes().iterator();
       while (attributes.hasNext()) {
         Attribute a = (Attribute) attributes.next();
         result.addAttribute((Attribute) a.clone());
       }
       List children = original.getMixedContent();

       Iterator iterator = children.iterator();
       while (iterator.hasNext()) {
         Object o = iterator.next();
         if (o instanceof Element) {
           Element e = (Element) o;
           Object resolved = resolve(e, bases);
           if (resolved instanceof String) result.addContent((String) resolved);
           else result.addContent((Element) resolved);
         }
         else if (o instanceof String) {
           result.addContent((String) o);
         }
         else if (o instanceof Comment) {
           result.addContent((Comment) o);
         }
         else if (o instanceof CDATA) {
           result.addContent((CDATA) o);
         }
         else if (o instanceof ProcessingInstruction) {
           result.addContent((ProcessingInstruction) o);
         }
       }
    }
    else {
      boolean parse = true;
      Attribute parseAttribute = original.getAttribute("parse");
      if (parseAttribute != null) {
        if (parseAttribute.getValue().equals("text")) parse = false;
      }
      URL remote;
      if (base != null) {
        URL context = new URL(base);
        remote = new URL(context, href.getValue());
      }
      else {
        remote = new URL(href.getValue());
      }

      // need to handle unparsed results too
      // need to watch out for loops
      if (parse) {
        // Stack.contains() compares with equals(), so this is an
        // equality check on the URL strings, not an identity check
        if (bases.contains(remote.toExternalForm())) {
          throw new RuntimeException("Circular XInclude Reference!");
        }
        Document doc = builder.build(remote);
        bases.push(remote.toExternalForm());
        result = (Element) resolve(doc.getRootElement(), bases);
        bases.pop();
      }
      else { // insert text
        return getURL(remote);
      }
    }
    return result;

  }

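  public static String getURL(URL source) throws IOException {
    StringBuffer s = new StringBuffer();
    InputStream in = new BufferedInputStream(source.openStream());
    try {
      // does XInclude give you anything to specify the character set????
      InputStreamReader reader = new InputStreamReader(in, "8859_1");
      int c;
      // read decoded characters from the reader, not raw bytes from the stream
      while ((c = reader.read()) != -1) {
        // No manual escaping here: the text is stored as a plain String, and
        // the outputter changes < to &lt; and & to &amp; when it serializes.
        // Escaping here as well would produce &amp;lt; in the output.
        s.append((char) c);
      }
    }
    finally {
      in.close();
    }
    return s.toString();
  }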

  public static void main(String[] args) {

    SAXBuilder builder = new SAXBuilder();
    XMLOutputter outputter = new XMLOutputter();
    for (int i = 0; i < args.length; i++) {
      try {
        Document input = builder.build(args[i]);
        // absolutize URL
        String base = args[i];
        if (base.indexOf(':') < 0) {
          File f = new File(base);
          base = f.toURL().toExternalForm();
        }
        Document output = resolve(input, base);
        // need to set encoding on this to Latin-1 and check what
        // happens to UTF-8 curly quotes
        outputter.output(output, System.out);
      }
      catch (Exception e) {
        System.err.println(e);
        e.printStackTrace();
      }
    }

  }

}

Implementation as DOM


To Learn More


Questions?


Part IV: Schemas

Schemas are not the salvation for the world of Markup Languages, just as DTDs aren't the embodiment of evil.
--Ann Navarro on the XHTML-L mailing list


What are Schemas?


About Schemas


What's Wrong with DTDs?


DTDs vs. Schemas

DTDs                      Schemas
<!ELEMENT> declaration    xsd:element element
<!ATTLIST> declaration    xsd:attribute element
<!NOTATION> declaration   -
<!ENTITY> declaration     -
-                         Data types

Schema versions


greeting.xml

<?xml version="1.0"?>
<GREETING>
Hello XML!
</GREETING>

greeting.xsd according to the April 7 Working Draft

<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">
 
  <xsd:element name="GREETING" type="xsd:string"/>

</xsd:schema>

Attaching the schema to the document without namespaces


Validating the document with Xerces-J 1.2.0

D:\schemas\examples>java sax.SAX2Count -v greeting2.xml
greeting2.xml: 701 ms (1 elems, 1 attrs, 0 spaces, 12 chars)

An Invalid Document

<?xml version="1.0"?>
<GREETING xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="greeting.xsd">
  <P>Hello XML!</P>
</GREETING>

Checking the Invalid Document

D:\speaking\SDExpo 2000 East\schemas\examples>java sax.SAX2Count -v greeting3.xml
[Error] greeting3.xml:4:6: Element type "P" must be declared.
[Error] greeting3.xml:5:13: Datatype error: In element 'GREETING' : Can not have
 element children within a simple type content.
greeting3.xml: 781 ms (2 elems, 1 attrs, 0 spaces, 14 chars)
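
The same check can be sketched with the JDK's javax.xml.validation API instead of the Xerces command-line sample. This is an illustration under loud assumptions: the JDK validator only understands the final http://www.w3.org/2001/XMLSchema namespace, not the 1999 draft namespace used on these slides, and the schema is supplied programmatically rather than through xsi:noNamespaceSchemaLocation. The class name GreetingValidator and its isValid() method are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class for illustration
class GreetingValidator {

  // ASSUMPTION: final 2001 schema namespace, not the draft one on the slide
  static final String SCHEMA
   = "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
   + "<xsd:element name='GREETING' type='xsd:string'/>"
   + "</xsd:schema>";

  // Returns true if the document is valid against SCHEMA
  static boolean isValid(String xml) {
    try {
      SchemaFactory factory
       = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
      Schema schema = factory.newSchema(new StreamSource(new StringReader(SCHEMA)));
      // with no ErrorHandler installed, the first error is thrown
      schema.newValidator().validate(new StreamSource(new StringReader(xml)));
      return true;
    }
    catch (SAXException e) {
      System.out.println("Invalid: " + e.getMessage());
      return false;
    }
    catch (IOException e) {
      throw new IllegalStateException("Could not read the document: " + e);
    }
  }

  public static void main(String[] args) {
    System.out.println(isValid("<GREETING>Hello XML!</GREETING>"));        // true
    // a P child is not allowed inside a simple xsd:string element
    System.out.println(isValid("<GREETING><P>Hello XML!</P></GREETING>")); // false
  }

}
```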

greeting.xsd in the Candidate Recommendation

The namespace URIs have changed.

<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
 
  <xsd:element name="GREETING" type="xsd:string"/>

</xsd:schema>

New XSI namespace


Validating the document with XSV


An Invalid Document

<?xml version="1.0"?>
<GREETING xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="greeting.xsd">
  <P>Hello XML!</P>
</GREETING>

Checking the Invalid Document


A More Complex Document

<?xml version="1.0"?>
<SONG xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="song.xsd">
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>  
</SONG>

Complex vs. Simple Types


A More Complex Schema

<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">

  <xsd:element name="SONG" type="songType"/>
 
  <xsd:complexType name="songType">
    <xsd:sequence>
      <xsd:element name="TITLE"     type="xsd:string"/>
      <xsd:element name="COMPOSER"  type="xsd:string" 
        minOccurs="1" maxOccurs="unbounded"/>
      <xsd:element name="PRODUCER"  type="xsd:string" 
        minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="PUBLISHER" type="xsd:string" 
        minOccurs="0" maxOccurs="1"/>
      <xsd:element name="LENGTH"    type="xsd:timeDuration"/>
      <xsd:element name="YEAR"      type="xsd:string"/>
      <xsd:element name="ARTIST"    type="xsd:string" 
        minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
 
</xsd:schema>

Three main schema elements:


Validating the Song Document

D:\speaking\SDExpo 2000 East\schemas\examples>java sax.SAX2Count -v hotcop.xml
[Error] hotcop.xml:10:25: Datatype error: java.text.ParseException: Illegal or misplaced separator.


Here's the problem:

<?xml version="1.0"?>
<SONG xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="song.xsd">
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>

This is not in the schema time duration format, which is the ISO 8601 format "PnYnMnDTnHnMnS, where nY represents the number of years, nM the number of months, nD the number of days, 'T' is the date/time separator, nH the number of hours, nM the number of minutes and nS the number of seconds. The number of seconds can include decimal digits to arbitrary precision. An optional preceding minus sign ('-') is allowed, to indicate a negative duration."
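
As a rough illustration of the format just quoted, here is a Python sketch that tests a string against the PnYnMnDTnHnMnS shape and converts a minutes:seconds track length into it. The regular expression is a simplification written for this example (it does not reject a bare "P" or a dangling "T"), and the helper name clock_to_duration is hypothetical.

```python
import re

# Lexical shape PnYnMnDTnHnMnS with every field optional.
# Simplification: a bare "P" or a trailing "T" is not rejected.
DURATION = re.compile(
    r"^-?P(\d+Y)?(\d+M)?(\d+D)?"
    r"(T(\d+H)?(\d+M)?(\d+(\.\d+)?S)?)?$"
)

def clock_to_duration(text):
    """Turn 'M:SS' (such as a track length) into a duration like PT6M20S."""
    minutes, seconds = text.split(":")
    return f"PT{int(minutes)}M{int(seconds)}S"

print(bool(DURATION.match("6:20")))       # False
print(clock_to_duration("6:20"))          # PT6M20S
print(bool(DURATION.match("P0YT6M20S")))  # True
```

Both PT6M20S and P0YT6M20S fit the pattern; the leading P0Y is optional.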


Fixed Hot Cop

<?xml version="1.0"?>
<SONG xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="song.xsd">
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>P0YT6M20S</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>

Xerces doesn't get this one right yet!


Primitive Data Types for Schemas


Numeric Data Types for Schemas

XML Schema Built-In Simple Types
Name | Type | Examples
float | IEEE 754 32-bit floating point number | -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN
double | IEEE 754 64-bit floating point number | -INF, 1.401E-90, -1E4, -0, 0, 12.78E-2, 12, INF, NaN, 3.4E42
decimal | arbitrary precision, decimal numbers | -2.7E400, 5.7E-444, -3.1415292, 0, 7.8, 90200.76, 3.4E1024
binary | a binary number made up of zeroes and ones | 10000100111
integer | an arbitrarily large or small integer | -500000000000000000000000, -9223372036854775809, -126789, -1, 0, 1, 5, 23, 42, 126789, 9223372036854775808, 456734987324983264987362495809587095720978
nonPositiveInteger | an integer less than or equal to zero | 0, -1, -2, -3, -4, -5, ...
negativeInteger | an integer strictly less than zero | -1, -2, -3, -4, -5, ...
long | an eight-byte two's complement integer such as Java's long type | -9223372036854775808, -12678967543233, -1, 9223372036854775807
int | an integer that can be represented as a four-byte, two's complement number such as Java's int type | -2147483648, -1, 0, 1, 5, 23, 42, 2147483647
short | an integer that can be represented as a two-byte, two's complement number such as Java's short type | -32768, -1, 0, 1, 5, 23, 42, 32767
byte | an integer that can be represented as a one-byte, two's complement number such as Java's byte type | -128, -1, 0, 1, 5, 23, 42, 127
nonNegativeInteger | an integer greater than or equal to zero | 0, 1, 2, 3, 4, 5, ...
unsignedLong | an eight-byte unsigned integer | 0, 1, 2, 3, 4, 5, ... 18446744073709551614, 18446744073709551615
unsignedInt | a four-byte unsigned integer | 0, 1, 2, 3, 4, 5, ... 4294967294, 4294967295
unsignedShort | a two-byte unsigned integer | 0, 1, 2, 3, 4, 5, ... 65534, 65535
unsignedByte | a one-byte unsigned integer | 0, 1, 2, 3, 4, 5, ... 254, 255
positiveInteger | an integer strictly greater than zero | 1, 2, 3, 4, 5, 6, ...
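
The fixed-width integer types differ only in their value ranges. A small Python sketch of that range check follows; the function name fits and the RANGES table are written for this example, and the lexical check (optional sign plus ASCII digits) is a simplifying assumption rather than the full schema rule.

```python
import re

# Value ranges for the fixed-width integer types listed above.
RANGES = {
    "byte": (-2**7, 2**7 - 1),
    "short": (-2**15, 2**15 - 1),
    "int": (-2**31, 2**31 - 1),
    "long": (-2**63, 2**63 - 1),
    "unsignedByte": (0, 2**8 - 1),
    "unsignedShort": (0, 2**16 - 1),
    "unsignedInt": (0, 2**32 - 1),
    "unsignedLong": (0, 2**64 - 1),
}

def fits(type_name, text):
    """True when text is an integer literal inside the range of type_name."""
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        return False
    lo, hi = RANGES[type_name]
    return lo <= int(text) <= hi

print(fits("byte", "127"))                  # True
print(fits("byte", "128"))                  # False
print(fits("long", "9223372036854775808"))  # False
```

The last value is one past the largest long, which is why the table lists it as an example of integer rather than long.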

Time Data Types for Schemas

XML Schema Built-In Simple Types
Name | Type | Examples
timeInstant | a particular moment in Coordinated Universal Time, to an arbitrarily small fraction of a second | 1999-05-31T13:20:00.000-05:00
month | a given month in a given year | 2000-10
year | a given year | 2000
century | a specified century | 19
recurringDate | a date in no particular year, or rather in every year | --10-31
recurringDay | a day in no particular month, or rather in every month | ----31
timeDuration | a length of time, without fixed endpoints, to an arbitrary fraction of a second | P2000Y10M31DT09H32M7.4312S
date | a specific day in history | 2000-10-31
time | a specific time of day, that recurs every day | 14:30:00.000, 09:30:00.000-05:00

XML Data Types for Schemas

XML Schema Built-In Simple Types
Name | Type | Examples
ID | XML 1.0 ID attribute type | any XML name that's unique among ID type attributes
IDREF | XML 1.0 IDREF attribute type | any XML name that's used as an ID type attribute elsewhere in the document
ENTITY | XML 1.0 ENTITY attribute type | any XML name that's declared as an unparsed entity in the DTD
NOTATION | XML 1.0 NOTATION attribute type | any XML name that's declared as a notation name in the DTD
language | valid values for xml:lang as defined in XML 1.0 | en-GB, en-US, fr
IDREFS | XML 1.0 IDREFS attribute type | a white space separated list of IDREF names
ENTITIES | XML 1.0 ENTITIES attribute type | a white space separated list of ENTITY names
NMTOKEN | XML 1.0 NMTOKEN attribute type | 12, are, you, ready
NMTOKENS | XML 1.0 NMTOKENS attribute type | a white space separated list of name tokens
Name | an XML 1.0 Name | set, title, rdf, math, math123, href
QName | a prefixed name | song:title
NCName | a local name without any colons | title

Assorted Data Types for Schemas

XML Schema Built-In Simple Types
Name | Type | Examples
string | Parsed Character Data; #PCDATA | Hot Cop
boolean | C++'s bool type | true, false, 1, 0
uriReference | relative or absolute URI | http://www.w3.org/TR/2000/WD-xmlschema-2-20000407/#timeDuration, /javafaq/reports/JCE1.2.1.html

A Document with Attributes

<?xml version="1.0"?>
<SONG xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="attribute_song.xsd">
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>  
</SONG>

Declaring Attributes

<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
 
  <xsd:element name="SONG" type="SongType"/>

  <!-- An empty element -->
  <xsd:complexType name="PhotoType">
    <xsd:complexContent>
       <xsd:restriction base="xsd:anyType">
         <xsd:attribute name="ALT"    type="xsd:string"/>
         <xsd:attribute name="WIDTH"  type="xsd:nonNegativeInteger"/>
         <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> 
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:complexType name="SongType">
  
    <xsd:element name="TITLE"     type="xsd:string" 
      minOccurs="1" maxOccurs="1"/>
    <xsd:element name="PHOTO"     type="PhotoType"  
      minOccurs="0" maxOccurs="1"/>
    <xsd:element name="COMPOSER"  type="xsd:string" 
      minOccurs="1" maxOccurs="unbounded"/>
    <xsd:element name="PRODUCER"  type="xsd:string" 
      minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="PUBLISHER" type="xsd:string" 
      minOccurs="0" maxOccurs="1"/>
    <xsd:element name="YEAR"   type="xsd:year"
       minOccurs="1" maxOccurs="1"/>
    <xsd:element name="ARTIST" type="xsd:string" 
      minOccurs="1" maxOccurs="unbounded"/>
    
  </xsd:complexType>

</xsd:schema>

Element Content

<?xml version="1.0"?>
<SONG xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="nested_song.xsd">
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>
    <NAME>
      <GIVEN>Jacques</GIVEN>
      <FAMILY>Morali</FAMILY>
    </NAME>
  </COMPOSER>
  <COMPOSER>
    <NAME>
      <GIVEN>Henri</GIVEN>
      <FAMILY>Belolo</FAMILY>
    </NAME>
  </COMPOSER>
  <COMPOSER>
    <NAME>
      <GIVEN>Victor</GIVEN>
      <FAMILY>Willis</FAMILY>
    </NAME>
  </COMPOSER>
  <PRODUCER>
    <NAME>
      <GIVEN>Jacques</GIVEN>
      <FAMILY>Morali</FAMILY>
    </NAME>
  </PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>  
</SONG>

Declaring Complex Types

<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
 
  <xsd:element name="SONG" type="SongType"/>

  <xsd:complexType name="ComposerType">
    <xsd:element name="NAME">
      <xsd:complexType>
         <xsd:element name="GIVEN"  type="xsd:string" 
           minOccurs="1" maxOccurs="1"/>
         <xsd:element name="FAMILY" type="xsd:string" 
           minOccurs="1" maxOccurs="1"/>      
      </xsd:complexType>
    </xsd:element>
  </xsd:complexType>

  <xsd:complexType name="ProducerType">
    <xsd:element name="NAME">
      <xsd:complexType>
        <xsd:element name="GIVEN"  type="xsd:string" 
          minOccurs="1" maxOccurs="1"/>
        <xsd:element name="FAMILY" type="xsd:string" 
          minOccurs="1" maxOccurs="1"/>      
      </xsd:complexType>
    </xsd:element>
  </xsd:complexType>

  <xsd:complexType name="PhotoType">
    <xsd:complexContent>
       <xsd:restriction base="xsd:anyType">
         <xsd:attribute name="ALT"    type="xsd:string"/>
         <xsd:attribute name="WIDTH"  type="xsd:nonNegativeInteger"/>
         <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> 
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
  
  <xsd:complexType name="SongType">
  
    <xsd:element name="TITLE"     type="xsd:string" 
      minOccurs="1" maxOccurs="1"/>
    <xsd:element name="PHOTO"     type="PhotoType"  
      minOccurs="0" maxOccurs="1"/>
    <xsd:element name="COMPOSER"  type="ComposerType" 
      minOccurs="1" maxOccurs="unbounded"/>
    <xsd:element name="PRODUCER"  type="ProducerType" 
      minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="PUBLISHER" type="xsd:string" 
      minOccurs="0" maxOccurs="1"/>
    <xsd:element name="YEAR"   type="xsd:year" 
      minOccurs="1" maxOccurs="1"/>
    <xsd:element name="ARTIST" type="xsd:string" 
      minOccurs="1" maxOccurs="unbounded"/>
    
  </xsd:complexType>

</xsd:schema>

Sharing Content Models

<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
 
  <xsd:element name="SONG" type="SongType"/>

  <xsd:complexType name="PersonType">
    <xsd:element name="NAME">
      <xsd:complexType>
        <xsd:element name="GIVEN"  type="xsd:string" 
          minOccurs="1" maxOccurs="1"/>
        <xsd:element name="FAMILY" type="xsd:string" 
          minOccurs="1" maxOccurs="1"/>      
      </xsd:complexType>
    </xsd:element>
  </xsd:complexType>

  <xsd:complexType name="SongType">
  
    <xsd:element name="TITLE"     type="xsd:string" 
      minOccurs="1" maxOccurs="1"/>
    <xsd:element name="PHOTO"     type="PhotoType"  
      minOccurs="0" maxOccurs="1"/>
    <xsd:element name="COMPOSER"  type="PersonType" 
      minOccurs="1" maxOccurs="unbounded"/>
    <xsd:element name="PRODUCER"  type="PersonType" 
      minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="PUBLISHER" type="xsd:string" 
      minOccurs="0" maxOccurs="1"/>
    <xsd:element name="YEAR"   type="xsd:year" 
      minOccurs="1" maxOccurs="1"/>
    <xsd:element name="ARTIST" type="xsd:string" 
      minOccurs="1" maxOccurs="unbounded"/>
    
  </xsd:complexType>

  <xsd:complexType name="PhotoType">
    <xsd:complexContent>
       <xsd:restriction base="xsd:anyType">
         <xsd:attribute name="ALT"    type="xsd:string"/>
         <xsd:attribute name="WIDTH"  type="xsd:nonNegativeInteger"/>
         <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> 
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
  
</xsd:schema>

Mixed Content

<?xml version="1.0"?>
<SONG xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="mixed_song.xsd">
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>
    <NAME>Mr. <GIVEN>Jacques</GIVEN> <FAMILY>Morali</FAMILY> Esq.</NAME>
  </COMPOSER>
  <COMPOSER>
    <NAME>Mr. <GIVEN>Henri</GIVEN> L. <FAMILY>Belolo</FAMILY>, M.D.</NAME>
  </COMPOSER>
  <COMPOSER>
    <NAME>Mr. <GIVEN>Victor</GIVEN> C. <FAMILY>Willis</FAMILY></NAME>
  </COMPOSER>
  <PRODUCER>
    <NAME>Mr. <GIVEN>Jacques</GIVEN> S. <FAMILY>Morali</FAMILY></NAME>
  </PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>  
</SONG>

Declaring Mixed Content

<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
 
  <xsd:element name="SONG" type="SongType"/>

  <xsd:complexType name="PersonType">
    <xsd:element name="NAME">
      <xsd:complexType content="mixed">
        <xsd:element name="GIVEN"  type="xsd:string" 
          minOccurs="1" maxOccurs="1"/>
        <xsd:element name="FAMILY" type="xsd:string" 
          minOccurs="1" maxOccurs="1"/>      
      </xsd:complexType>
    </xsd:element>
  </xsd:complexType>

  <xsd:complexType name="SongType" content="elementOnly">
  
    <xsd:element name="TITLE"     type="xsd:string" 
      minOccurs="1" maxOccurs="1"/>
    <xsd:element name="PHOTO"     type="PhotoType"  
      minOccurs="0" maxOccurs="1"/>
    <xsd:element name="COMPOSER"  type="PersonType" 
      minOccurs="1" maxOccurs="unbounded"/>
    <xsd:element name="PRODUCER"  type="PersonType" 
      minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="PUBLISHER" type="xsd:string" 
      minOccurs="0" maxOccurs="1"/>
    <xsd:element name="YEAR"   type="xsd:year" 
      minOccurs="1" maxOccurs="1"/>
    <xsd:element name="ARTIST" type="xsd:string" 
      minOccurs="1" maxOccurs="unbounded"/>
    
  </xsd:complexType>

  <xsd:complexType name="PhotoType">
    <xsd:complexContent>
       <xsd:restriction base="xsd:anyType">
         <xsd:attribute name="ALT"    type="xsd:string"/>
         <xsd:attribute name="WIDTH"  type="xsd:nonNegativeInteger"/>
         <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> 
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
  
</xsd:schema>

When Order Doesn't Matter

<?xml version="1.0"?>
<SONG xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="unordered_song.xsd">
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>
    <NAME><FAMILY>Morali</FAMILY> <GIVEN>Jacques</GIVEN></NAME>
  </COMPOSER>
  <COMPOSER>
    <NAME><GIVEN>Henri</GIVEN> <FAMILY>Belolo</FAMILY></NAME>
  </COMPOSER>
  <COMPOSER>
    <NAME><FAMILY>Willis</FAMILY> <GIVEN>Victor</GIVEN></NAME>
  </COMPOSER>
  <PRODUCER>
    <NAME><GIVEN>Jacques</GIVEN> <FAMILY>Morali</FAMILY></NAME>
  </PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>  
</SONG>

The xsd:all Group

<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
 
  <xsd:element name="SONG" type="SongType"/>

  <xsd:complexType name="PersonType">
    <xsd:element name="NAME">
      <xsd:complexType>
        <xsd:all>
          <xsd:element name="GIVEN"  type="xsd:string" 
            minOccurs="1" maxOccurs="1"/>
          <xsd:element name="FAMILY" type="xsd:string" 
            minOccurs="1" maxOccurs="1"/> 
        </xsd:all>     
      </xsd:complexType>
    </xsd:element>
  </xsd:complexType>

  <xsd:complexType name="SongType">
      <xsd:element name="TITLE" type="xsd:string" 
        minOccurs="1" maxOccurs="1"/>
      <xsd:element name="PHOTO" type="PhotoType"  
        minOccurs="0" maxOccurs="1"/>
      <xsd:element name="COMPOSER" type="PersonType" 
        minOccurs="1" maxOccurs="unbounded"/>
      <xsd:element name="PRODUCER" type="PersonType" 
        minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="PUBLISHER" type="xsd:string" 
        minOccurs="0" maxOccurs="1"/>
  
      <xsd:element name="YEAR" type="xsd:year" 
        minOccurs="1" maxOccurs="1"/>

      <xsd:element name="ARTIST" type="xsd:string" 
        minOccurs="1" maxOccurs="unbounded"/>
  </xsd:complexType>

  <!-- An empty element -->
  <xsd:complexType name="PhotoType" content="empty">
    <xsd:complexContent>
       <xsd:restriction base="xsd:anyType">
         <xsd:attribute name="ALT"    type="xsd:string"/>
         <xsd:attribute name="WIDTH"  type="xsd:nonNegativeInteger"/>
         <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> 
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
  
</xsd:schema>
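
What xsd:all asks of NAME here can be stated in a few lines: GIVEN and FAMILY must each appear exactly once, in either order. The Python sketch below expresses only that rule; the function name matches_all_group is hypothetical, and it assumes every member of the group is required, as in the schema above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def matches_all_group(element, required):
    """True when each required child appears exactly once, in any order."""
    return Counter(child.tag for child in element) == Counter(required)

family_first = ET.fromstring(
    "<NAME><FAMILY>Morali</FAMILY> <GIVEN>Jacques</GIVEN></NAME>")
given_first = ET.fromstring(
    "<NAME><GIVEN>Henri</GIVEN> <FAMILY>Belolo</FAMILY></NAME>")
given_only = ET.fromstring("<NAME><GIVEN>Victor</GIVEN></NAME>")

print(matches_all_group(family_first, ["GIVEN", "FAMILY"]))  # True
print(matches_all_group(given_first, ["GIVEN", "FAMILY"]))   # True
print(matches_all_group(given_only, ["GIVEN", "FAMILY"]))    # False
```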

Choices


Sequences


Adding a Price

<?xml version="1.0"?>
<SONG xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="derived_song.xsd">
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>
    <NAME><FAMILY>Morali</FAMILY> <GIVEN>Jacques</GIVEN></NAME>
  </COMPOSER>
  <COMPOSER>
    <NAME><GIVEN>Henri</GIVEN> <FAMILY>Belolo</FAMILY></NAME>
  </COMPOSER>
  <COMPOSER>
    <NAME><FAMILY>Willis</FAMILY> <GIVEN>Victor</GIVEN></NAME>
  </COMPOSER>
  <PRODUCER>
    <NAME><GIVEN>Jacques</GIVEN> <FAMILY>Morali</FAMILY></NAME>
  </PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>  
  <PRICE>$1.35</PRICE>  
</SONG>

Derived Types


Regular Expressions


The xsd:simpleType element

  <xsd:simpleType base="xsd:string" name="money">
    <xsd:pattern value="\p{Sc}\p{Nd}+(\.\p{Nd}\p{Nd})?"/>
  </xsd:simpleType>

The Price Schema

<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
 
  <xsd:element name="SONG" type="SongType"/>

  <xsd:simpleType base="xsd:string" name="money">
    <xsd:pattern value="\p{Sc}\p{Nd}+(\.\p{Nd}\p{Nd})?"/>
    <!-- 
       Regular Expression:
       \p{Sc}             Any Unicode currency symbol; e.g. $, &#xA5;, &#xA3;, &#xA4;, etc.
       \p{Nd}             A Unicode decimal digit character
       \p{Nd}+            One or more Unicode decimal digit characters
       \.                 The period character
       (\.\p{Nd}\p{Nd})   A period followed by exactly two decimal digits; e.g. .35
       (\.\p{Nd}\p{Nd})?  Zero or one strings of the form .35
       
       This works for any decimalized currency. 
       
    -->
  </xsd:simpleType>

  <xsd:complexType name="SongType">
      <xsd:element name="TITLE"     type="xsd:string" 
        minOccurs="1" maxOccurs="1"/>
      <xsd:element name="PHOTO"     type="PhotoType"  
        minOccurs="0" maxOccurs="1"/>
      <xsd:element name="COMPOSER"  type="PersonType" 
        minOccurs="1" maxOccurs="unbounded"/>
      <xsd:element name="PRODUCER"  type="PersonType" 
        minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="PUBLISHER" type="xsd:string" 
        minOccurs="0" maxOccurs="1"/>
      <xsd:element name="YEAR"   type="xsd:year" 
        minOccurs="1" maxOccurs="1"/>
      <xsd:element name="ARTIST" type="xsd:string" 
        minOccurs="1" maxOccurs="unbounded"/>
      <xsd:element name="PRICE" type="money" 
        minOccurs="1" maxOccurs="1"/>      
      
  </xsd:complexType>

  <xsd:complexType name="PersonType">
    <xsd:element name="NAME">
      <xsd:complexType>
        <xsd:all>
          <xsd:element name="GIVEN"  type="xsd:string" 
            minOccurs="1" maxOccurs="1"/>
          <xsd:element name="FAMILY" type="xsd:string" 
            minOccurs="1" maxOccurs="1"/> 
        </xsd:all>     
      </xsd:complexType>
    </xsd:element>
  </xsd:complexType>

  <xsd:complexType name="PhotoType">
    <xsd:complexContent>
       <xsd:restriction base="xsd:anyType">
         <xsd:attribute name="ALT"    type="xsd:string"/>
         <xsd:attribute name="WIDTH"  type="xsd:nonNegativeInteger"/>
         <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> 
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
  
</xsd:schema>
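
To see what the money pattern accepts, here is a Python sketch that follows the same three steps: a currency symbol, one or more digits, then optionally a period and exactly two digits. Python's standard re module does not support \p{...} classes, so this version uses unicodedata categories instead of a regular expression; that substitution, and the function name is_money, are choices made for this example.

```python
import unicodedata

def is_money(text):
    # Step 1: \p{Sc}, a single Unicode currency symbol.
    if not text or unicodedata.category(text[0]) != "Sc":
        return False
    whole, dot, cents = text[1:].partition(".")
    # Step 2: \p{Nd}+, one or more decimal digits.
    if not whole or any(unicodedata.category(ch) != "Nd" for ch in whole):
        return False
    # Step 3: (\.\p{Nd}\p{Nd})?, optionally a period and exactly two digits.
    if not dot:
        return True
    return len(cents) == 2 and all(
        unicodedata.category(ch) == "Nd" for ch in cents)

print(is_money("$1.35"))  # True
print(is_money("1.35"))   # False
print(is_money("$1.3"))   # False
```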

Default Namespace

<?xml version="1.0"?>
<GREETING 
  xmlns="http://ibiblio.org/xml/schemas/greeting/"
  xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
  xsi:schemaLocation="http://ibiblio.org/xml/schemas/greeting/
                      greeting_defaultNS.xsd">
  Hello XML!
</GREETING>

The targetNamespace attribute

<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
  targetNamespace="http://ibiblio.org/xml/schemas/greeting/"
>
 
  <xsd:element name="GREETING" type="xsd:string"/>

</xsd:schema>
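
The targetNamespace has to match the namespace the instance document's elements are actually in. As a rough check, the Python sketch below parses the GREETING document from the previous slide with the standard library's namespace-aware parser and splits the root element's name into namespace and local part. It only inspects the name; it performs no schema validation.

```python
import xml.etree.ElementTree as ET

GREETING = """<GREETING
  xmlns="http://ibiblio.org/xml/schemas/greeting/"
  xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
  xsi:schemaLocation="http://ibiblio.org/xml/schemas/greeting/
                      greeting_defaultNS.xsd">
  Hello XML!
</GREETING>"""

root = ET.fromstring(GREETING)
# ElementTree reports qualified names as "{namespace}local".
namespace, _, local = root.tag[1:].partition("}")
print(namespace)  # http://ibiblio.org/xml/schemas/greeting/
print(local)      # GREETING
```

The namespace printed is the same string the schema gives as its targetNamespace.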

A Song with a Namespace

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<SONG xmlns="http://ibiblio.org/xml/namespace/song"
      xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
      xsi:schemaLocation = 
       "http://ibiblio.org/xml/namespace/song namespace_song.xsd"
>
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>  
</SONG>

A Schema for a Document that Uses the Default Namespace

<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
  xmlns="http://ibiblio.org/xml/namespace/song"
  targetNamespace="http://ibiblio.org/xml/namespace/song"
  elementFormDefault="qualified"
  attributeFormDefault="unqualified"
>
 
  <xsd:element name="SONG" type="SongType"/>

  <xsd:complexType name="SongType">
  
    <xsd:element name="TITLE" type="xsd:string" 
      minOccurs="1" maxOccurs="1"/>
    <xsd:element name="PHOTO" type="PhotoType"  
      minOccurs="0" maxOccurs="1"/>
    <xsd:element name="COMPOSER"  type="xsd:string" 
      minOccurs="1" maxOccurs="unbounded"/>
    <xsd:element name="PRODUCER"  type="xsd:string" 
      minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="PUBLISHER" type="xsd:string" 
      minOccurs="0" maxOccurs="1"/>    
    <xsd:element name="YEAR" type="xsd:year" 
      minOccurs="1" maxOccurs="1"/>
    <xsd:element name="ARTIST" type="xsd:string" 
      minOccurs="0" maxOccurs="unbounded"/>
    
  </xsd:complexType>

  <xsd:complexType name="PhotoType">
    <xsd:complexContent>
       <xsd:restriction base="xsd:anyType">
         <xsd:attribute name="ALT"    type="xsd:string"/>
         <xsd:attribute name="WIDTH"  type="xsd:nonNegativeInteger"/>
         <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> 
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
  
</xsd:schema>

Multiple Namespaces, Multiple Schemas

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<SONG xmlns="http://ibiblio.org/xml/namespace/song"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
      xsi:schemaLocation = 
       "http://ibiblio.org/xml/namespace/song namespace_song.xsd
        http://www.w3.org/1999/xlink xlink.xsd"
>
  <TITLE>Hot Cop</TITLE>
  <PHOTO xlink:type="simple" xlink:href="hotcop.jpg"
    ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>  
</SONG>
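
The xsi:schemaLocation value is a white space separated list that alternates namespace URI and schema location. A short Python sketch of pulling those pairs apart follows; the function name schema_locations is made up for this example, and the document is a trimmed copy of the one above.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2000/10/XMLSchema-instance"

def schema_locations(root):
    """Map each namespace URI in xsi:schemaLocation to its schema location."""
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    return dict(zip(tokens[0::2], tokens[1::2]))

SONG = """<SONG xmlns="http://ibiblio.org/xml/namespace/song"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
      xsi:schemaLocation =
       "http://ibiblio.org/xml/namespace/song namespace_song.xsd
        http://www.w3.org/1999/xlink xlink.xsd">
  <TITLE>Hot Cop</TITLE>
</SONG>"""

for namespace, location in schema_locations(ET.fromstring(SONG)).items():
    print(namespace, "->", location)
```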

XLink Schema

<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  targetNamespace="http://www.w3.org/1999/xlink"
  attributeFormDefault="qualified"
>

  <xsd:attributeGroup name="XLinkAttributes">
    <!-- TODO: should xlink:type be fixed, with the default value "simple"? -->
    <xsd:attribute name="type" type="xsd:string"/>
    <xsd:attribute name="href" type="xsd:uriReference"/>
  </xsd:attributeGroup>
  
</xsd:schema>

Song Schema with XLink Support

<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
  xmlns="http://ibiblio.org/xml/namespace/song"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  targetNamespace="http://ibiblio.org/xml/namespace/song"
  elementFormDefault="qualified"
  attributeFormDefault="unqualified"
>
 
  <xsd:import namespace="http://www.w3.org/1999/xlink" 
              schemaLocation="xlink.xsd"/>

  <xsd:complexType name="PhotoType">
    <xsd:complexContent>
       <xsd:restriction base="xsd:anyType">
         <xsd:attributeGroup ref="xlink:XLinkAttributes"/>
         <xsd:attribute name="ALT"    type="xsd:string"/>
         <xsd:attribute name="WIDTH"  type="xsd:nonNegativeInteger"/>
         <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> 
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="SONG" type="SongType"/>

  <xsd:complexType name="SongType">
  
    <xsd:element name="TITLE"     type="xsd:string" 
      minOccurs="1" maxOccurs="1"/>
    <xsd:element name="PHOTO"     type="PhotoType"  
      minOccurs="0" maxOccurs="1"/>
    <xsd:element name="COMPOSER"  type="xsd:string" 
      minOccurs="1" maxOccurs="unbounded"/>
    <xsd:element name="PRODUCER"  type="xsd:string" 
      minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="PUBLISHER" type="xsd:string" 
      minOccurs="0" maxOccurs="1"/>    
    <xsd:element name="YEAR"   type="xsd:year" 
      minOccurs="1" maxOccurs="1"/>
    <xsd:element name="ARTIST" type="xsd:string" 
      minOccurs="0" maxOccurs="unbounded"/>
    
  </xsd:complexType>

</xsd:schema>

Annotations

  <xsd:annotation>
   <xsd:documentation>
    Song schema for XML and Java Example at SDExpo 2000 East
    Copyright 2000 Elliotte Rusty Harold. 
   </xsd:documentation>
  </xsd:annotation>

What Schemas don't do


Schema Alternatives


Schematron


RELAX



DTDs aren't Dead!


To Learn More


Questions?


Part V: XLinks

Once you've tasted XLink's Chunky Monkey, it's hard to reconcile yourself to HTML's vanilla.
--John E. Simpson on the xsl-list mailing list


Three Technologies

Linking in XML is divided into three parts: XLink, XPointer, and XPath.

XLink, the XML Linking Language, defines how one document links to another document. XPointer, the XML Pointer Language, defines how individual parts of a document are addressed. XPath is a syntax used in XPointers for identifying particular nodes in an XML document's tree.

An XLink points to a URI (in practice, a URL) that specifies a particular resource. This URL may include an XPointer part that more specifically identifies the desired part or section of the targeted resource or document. XPointers use the XPath syntax shared with XSL to identify particular elements in the document tree.
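
In other words, the part of the link target before the # names the resource, and the part after it is where an XPointer goes. A small Python sketch of that split follows. The URI, including the document name and the XPointer itself, is invented for illustration.

```python
from urllib.parse import urldefrag

# Hypothetical link target: the host, file name and XPointer are made up.
uri = "http://www.example.com/hotcop.xml#xpointer(/SONG/TITLE)"

resource, pointer = urldefrag(uri)
print(resource)  # http://www.example.com/hotcop.xml
print(pointer)   # xpointer(/SONG/TITLE)
```

Only the first part is sent to the server; interpreting the XPointer is left to the application that retrieves the document.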


Versions

This talk covers:


HTML Links are Limited


XLinks are More Powerful


Application Support

Currently, there are no general-purpose applications that support arbitrary XLinks. That's because XLinks have a much broader base of applicability than HTML links. XLinks are not just used for hypertext connections and embedding images in documents. They can be used by any custom application that needs to establish connections between documents and parts of documents, for any reason. Thus, even when XLinks are fully implemented in browsers they may not always be blue underlined text that you click to jump to another page. They can be that, but they can also be both more and less, depending on your needs.


Linking Elements


For example

<FOOTNOTE xmlns:xlink="http://www.w3.org/1999/xlink"
          xlink:type="simple"
          xlink:href="footnote7.xml">7</FOOTNOTE>
<COMPOSER xmlns:xlink="http://www.w3.org/1999/xlink"
          xlink:type="simple"
          xlink:href="http://www.interport.net/~beand/">
    Beth Anderson
</COMPOSER>
<IMAGE xmlns:xlink="http://www.w3.org/1999/xlink"
       xlink:type="simple" xlink:href="logo.gif"/>
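
Because a simple link is marked only by its xlink:type attribute, any element can be one, and a custom application finds links by looking at attributes rather than element names. A Python sketch of that lookup follows. The DOC wrapper element is hypothetical, added so the three fragments above form one well-formed document, and the function name simple_links is made up for this example.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

def simple_links(root):
    """Yield (element name, href) for every simple link at or below root."""
    for element in root.iter():
        if element.get(f"{{{XLINK}}}type") == "simple":
            yield element.tag, element.get(f"{{{XLINK}}}href")

# DOC is a hypothetical wrapper around the three linking elements.
DOC = """<DOC xmlns:xlink="http://www.w3.org/1999/xlink">
  <FOOTNOTE xlink:type="simple" xlink:href="footnote7.xml">7</FOOTNOTE>
  <COMPOSER xlink:type="simple"
            xlink:href="http://www.interport.net/~beand/">Beth Anderson</COMPOSER>
  <IMAGE xlink:type="simple" xlink:href="logo.gif"/>
</DOC>"""

for name, href in simple_links(ET.fromstring(DOC)):
    print(name, href)
```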

Declaring XLink Attributes in DTDs

<!ELEMENT FOOTNOTE (#PCDATA)>
<!ATTLIST FOOTNOTE
  xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  CDATA #FIXED "simple"
  xlink:href  CDATA #REQUIRED
>
<!ELEMENT COMPOSER (#PCDATA)>
<!ATTLIST COMPOSER
  xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  CDATA #FIXED "simple"
  xlink:href  CDATA #REQUIRED
>
<!ELEMENT IMAGE EMPTY>
<!ATTLIST IMAGE
  xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  CDATA #FIXED "simple"
  xlink:href  CDATA #REQUIRED
>

Fixed Attributes

<FOOTNOTE xlink:href="footnote7.xml">7</FOOTNOTE>
<COMPOSER xlink:href="http://www.interport.net/~beand/">
  Beth Anderson
</COMPOSER>
<IMAGE xlink:href="logo.gif"/>

Other Attributes

<!ELEMENT IMAGE EMPTY>
<!ATTLIST IMAGE
  xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  CDATA #FIXED "simple"
  xlink:href  CDATA #REQUIRED
  ALT         CDATA #REQUIRED
  HEIGHT      CDATA #REQUIRED
  WIDTH       CDATA #REQUIRED
>

Descriptions of the Remote Resource

<AUTHOR 
 xmlns:xlink="http://www.w3.org/1999/xlink"
 xlink:href="http://www.macfaq.com/personal.html"
 xlink:title="Elliotte Rusty Harold's personal home page" 
 xlink:role="http://www.macfaq.com/about.html">
  Elliotte Rusty Harold
</AUTHOR>

As with all other attributes, the xlink:title and xlink:role attributes should be declared in the DTD for all the elements to which they belong. For example, this is a reasonable declaration for the above AUTHOR element:

<!ELEMENT AUTHOR (#PCDATA)>
<!ATTLIST AUTHOR
  xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  CDATA #FIXED "simple"
  xlink:href  CDATA #REQUIRED
  xlink:title CDATA #IMPLIED
  xlink:role  CDATA #IMPLIED
>

Link Behavior

Linking elements can contain two more optional attributes that suggest to applications how the remote resource is associated with the current page. These are xlink:show and xlink:actuate.


xlink:show


xlink:actuate

A linking element's xlink:actuate attribute has four predefined values:

<IMAGE xmlns:xlink="http://www.w3.org/1999/xlink"
       xlink:type="simple" xlink:href="logo.gif"
       xlink:actuate="onLoad"/>

Like all attributes in valid documents, the actuate attribute must be declared in the DTD in a <!ATTLIST> declaration for the link elements in which it appears. For example:

<!ELEMENT IMAGE EMPTY>
<!ATTLIST IMAGE 
 xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
  xlink:type CDATA #FIXED "simple"
  xlink:href CDATA #REQUIRED
  xlink:show    (new | replace | embed) "embed"
  xlink:actuate (onRequest | onLoad)    "onLoad"
>
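
A parser that does not read the DTD will not supply these default values, so an application may have to fall back on them itself. The Python sketch below shows one way to do that for the IMAGE element. The IMAGE_DEFAULTS table simply repeats the defaults from the declaration above, and the function name link_behavior is hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# Defaults repeated from the IMAGE declaration above.
IMAGE_DEFAULTS = {"show": "embed", "actuate": "onLoad"}

def link_behavior(element, defaults):
    """Return the show/actuate values, using defaults for missing attributes."""
    return {
        name: element.get(f"{{{XLINK}}}{name}", default)
        for name, default in defaults.items()
    }

image = ET.fromstring(
    '<IMAGE xmlns:xlink="http://www.w3.org/1999/xlink" '
    'xlink:type="simple" xlink:href="logo.gif" xlink:actuate="onRequest"/>'
)
print(link_behavior(image, IMAGE_DEFAULTS))
# {'show': 'embed', 'actuate': 'onRequest'}
```

The explicit xlink:actuate wins over the default, while the missing xlink:show falls back to embed.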

Parameter Entities for Link Attributes

<!ENTITY % link-attributes
   "xlink:type     CDATA  #FIXED 'simple'
    xlink:role     CDATA  #IMPLIED
    xlink:title    CDATA  #IMPLIED

    xmlns:xlink    CDATA  #FIXED 'http://www.w3.org/1999/xlink'
    xlink:href     CDATA  #REQUIRED
    xlink:show     (new | replace | embed) 'replace'
    xlink:actuate  (onRequest | onLoad)    'onRequest'"
>

<!ELEMENT COMPOSER (#PCDATA)>
<!ATTLIST COMPOSER 
    %link-attributes;
>
<!ELEMENT AUTHOR (#PCDATA)>
<!ATTLIST AUTHOR
    %link-attributes;
>
<!ELEMENT WEBSITE (#PCDATA)>
<!ATTLIST WEBSITE
    %link-attributes;
>

Extended Links


Extended Links

<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink"
         xlink:type="extended">
 ...
</WEBSITE>

Resources


Resource Example

<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink" 
         xlink:type="extended">
  <NAME xlink:type="resource">Cafe au Lait</NAME>
  <HOMESITE xlink:type="locator" 
            xlink:href="http://ibiblio.org/javafaq/"/>
  <MIRROR xlink:type="locator" 
          xlink:href="http://sunsite.kth.se/javafaq"/>
  <MIRROR xlink:type="locator" 
          xlink:href="http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
  <MIRROR xlink:type="locator" 
          xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
</WEBSITE>

This WEBSITE element describes an extended link with five resources:

Since one of the resources referenced by this extended link is contained in the extended link, it is called an inline link. It will be included as part of one of the documents it connects.
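
The distinction between the local resource and the remote ones comes down to the xlink:type of each child: resource for content held inside the link, locator for content named by a URL. A Python sketch that sorts the children of the WEBSITE element accordingly follows; the function name resources is made up for this example.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

def resources(extended):
    """Split an extended link's children into local and remote resources."""
    local, remote = [], []
    for child in extended:
        kind = child.get(f"{{{XLINK}}}type")
        if kind == "resource":
            local.append((child.text or "").strip())
        elif kind == "locator":
            remote.append(child.get(f"{{{XLINK}}}href"))
    return local, remote

WEBSITE = """<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink"
         xlink:type="extended">
  <NAME xlink:type="resource">Cafe au Lait</NAME>
  <HOMESITE xlink:type="locator"
            xlink:href="http://ibiblio.org/javafaq/"/>
  <MIRROR xlink:type="locator"
          xlink:href="http://sunsite.kth.se/javafaq"/>
  <MIRROR xlink:type="locator"
          xlink:href="http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
  <MIRROR xlink:type="locator"
          xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
</WEBSITE>"""

local, remote = resources(ET.fromstring(WEBSITE))
print(len(local), "local,", len(remote), "remote")  # 1 local, 4 remote
```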


Resource Example Diagram

This picture shows the WEBSITE extended link element and five resources, one of which WEBSITE contains, the other four of which are referred to by URLs. However, this just describes these resources. No connections are implied between them.

One local and four remote resources with no connections

Roles and Titles for Resources

<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink" 
         xlink:type="extended" xlink:title="Cafe au Lait">
  <NAME xlink:type="resource" 
        xlink:role="http://ibiblio.org/javafaq/">
    Cafe au Lait
  </NAME>
  <HOMESITE xlink:type="locator" 
          xlink:href="http://ibiblio.org/javafaq/"
          xlink:role="http://ibiblio.org/"/>
  <MIRROR xlink:type="locator" 
         xlink:title="Cafe au Lait Swedish Mirror"
         xlink:role="http://sunsite.kth.se/"
         xlink:href="http://sunsite.kth.se/javafaq"/>
  <MIRROR xlink:type="locator" 
         xlink:title="Cafe au Lait German Mirror"
         xlink:role="http://sunsite.informatik.rwth-aachen.de/"
         xlink:href=
          "http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
  <MIRROR xlink:type="locator" 
         xlink:title="Cafe au Lait Swiss Mirror"
         xlink:role="http://sunsite.cnlab-switch.ch/"
         xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
</WEBSITE>

DTD for Extended Links

<!ELEMENT WEBSITE (NAME, HOMESITE, MIRROR*) >
<!ATTLIST WEBSITE
  xmlns:xlink  CDATA     #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  (extended) #FIXED  "extended"
  xlink:title  CDATA     #IMPLIED
  xlink:role   CDATA     #IMPLIED
>

<!ELEMENT NAME (#PCDATA)>
<!ATTLIST NAME
   xlink:type  (resource) #FIXED    "resource"
   xlink:role   CDATA     #IMPLIED
   xlink:title  CDATA     #IMPLIED
>

<!ELEMENT HOMESITE (#PCDATA)>
<!ATTLIST HOMESITE
   xlink:type  (locator)  #FIXED    "locator"
   xlink:href   CDATA     #REQUIRED
   xlink:role   CDATA     #IMPLIED
   xlink:title  CDATA     #IMPLIED
>

<!ELEMENT MIRROR (#PCDATA)>
<!ATTLIST MIRROR
   xlink:type  (locator)  #FIXED  "locator"
   xlink:href   CDATA     #REQUIRED
   xlink:role   CDATA     #IMPLIED
   xlink:title  CDATA     #IMPLIED
>

Another Shortcut for the DTD

<!ENTITY % extended.att
  "xlink:type   CDATA    #FIXED 'extended'
   xmlns:xlink  CDATA    #FIXED 'http://www.w3.org/1999/xlink'
   xlink:role   CDATA    #IMPLIED
   xlink:title  CDATA    #IMPLIED"
>

<!ENTITY % resource.att
  "xlink:type (resource) #FIXED  'resource'
   xlink:role    CDATA   #IMPLIED
   xlink:title   CDATA   #IMPLIED"
>

<!ENTITY % locator.att
  "xlink:type (locator)  #FIXED  'locator'
   xlink:href    CDATA   #REQUIRED
   xlink:role    CDATA   #IMPLIED
   xlink:title   CDATA   #IMPLIED"
>

<!ELEMENT WEBSITE (NAME, HOMESITE, MIRROR*) >
<!ATTLIST WEBSITE
   %extended.att;
>

<!ELEMENT NAME (#PCDATA)>
<!ATTLIST NAME
   %resource.att;
>

<!ELEMENT HOMESITE (#PCDATA)>
<!ATTLIST HOMESITE
   %locator.att;
>

<!ELEMENT MIRROR (#PCDATA)>
<!ATTLIST MIRROR
   %locator.att;
>

Arcs


Arc Example

<?xml version="1.0"?>
<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink" 
         xlink:type="extended" xlink:title="Cafe au Lait">
         
  <NAME xlink:type="resource" xlink:label="source">
    Cafe au Lait
  </NAME>

  <HOMESITE xlink:type="locator" 
            xlink:href="http://ibiblio.org/javafaq/"
            xlink:label="us"/>
  
  <MIRROR xlink:type="locator" 
          xlink:title="Cafe au Lait Swedish Mirror"
          xlink:label="se"
          xlink:href="http://sunsite.kth.se/javafaq"/>
  
  <MIRROR xlink:type="locator" 
          xlink:title="Cafe au Lait German Mirror"
          xlink:label="de"
          xlink:href="http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
  
  <MIRROR xlink:type="locator" 
          xlink:title="Cafe au Lait Swiss Mirror"
          xlink:label="ch"
          xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
  
  <CONNECTION xlink:type="arc" xlink:from="source" 
              xlink:to="ch"    xlink:show="replace" 
              xlink:actuate="onRequest"/>
  <CONNECTION xlink:type="arc" xlink:from="source" 
              xlink:to="us"    xlink:show="replace" 
              xlink:actuate="onRequest"/>
  <CONNECTION xlink:type="arc" xlink:from="source" 
              xlink:to="se"    xlink:show="replace" 
              xlink:actuate="onRequest"/>
  <CONNECTION xlink:type="arc" xlink:from="source" 
              xlink:to="de"    xlink:show="replace" 
              xlink:actuate="onRequest"/>
  
</WEBSITE>

Arc Example Diagram

An extended link with arcs

Arc Example

<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink" 
         xlink:type="extended" xlink:title="Cafe au Lait">
         
  <NAME xlink:type="resource" xlink:label="source">
    Cafe au Lait
  </NAME>

  <HOMESITE xlink:type="locator" 
           xlink:href="http://ibiblio.org/javafaq/"
           xlink:label="us"/>
  
  <MIRROR xlink:type="locator" 
         xlink:title="Cafe au Lait Swedish Mirror"
         xlink:label="mirror"
         xlink:href="http://sunsite.kth.se/javafaq"/>
  
  <MIRROR xlink:type="locator" 
         xlink:title="Cafe au Lait German Mirror"
         xlink:label="mirror"
         xlink:href="http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
  
  <MIRROR xlink:type="locator" 
         xlink:title="Cafe au Lait Swiss Mirror"
         xlink:label="mirror"
         xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
  
  <CONNECTION xlink:type="arc"  xlink:from="source" 
              xlink:to="mirror" xlink:show="replace" 
              xlink:actuate="onRequest"/>

</WEBSITE>

Arc Example Diagram

mirror role diagram

Arc Example with omitted to attribute

<?xml version="1.0"?>
<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink" 
         xlink:type="extended" xlink:title="Cafe au Lait">
         
  <NAME xlink:type="resource" xlink:label="source">
    Cafe au Lait
  </NAME>

  <HOMESITE xlink:type="locator" 
            xlink:href="http://ibiblio.org/javafaq/"
            xlink:label="us"/>
  
  <MIRROR xlink:type="locator" 
          xlink:title="Cafe au Lait Swedish Mirror"
          xlink:label="se"
          xlink:href="http://sunsite.kth.se/javafaq"/>
  
  <MIRROR xlink:type="locator" 
          xlink:title="Cafe au Lait German Mirror"
          xlink:label="sk"
          xlink:href="http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
  
  <MIRROR xlink:type="locator" 
          xlink:title="Cafe au Lait Swiss Mirror"
          xlink:label="ch"
          xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
  
  <CONNECTION xlink:type="arc" xlink:from="source" 
              xlink:show="replace" xlink:actuate="onRequest"/>

</WEBSITE>

Arc Example Diagram

Arcs can return to the same resource they started from
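
The way one arc element can stand for several connections can be sketched in a few lines of Python. This is only an illustration of the label matching described in the XLink specification, where an omitted from or to attribute stands for every labeled resource in the link; expand_arcs and q are names made up for this example, and the sample link is a shortened variant of the WEBSITE element.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"


def q(name):
    """Return the ElementTree name of an attribute in the XLink namespace."""
    return f"{{{XLINK}}}{name}"


def expand_arcs(link):
    """Yield (from_element, to_element) pairs for every arc in an extended link."""
    participants = [e for e in link
                    if e.get(q("type")) in ("resource", "locator")]

    def labelled(label):
        # An omitted from/to stands for every participating resource.
        if label is None:
            return participants
        return [e for e in participants if e.get(q("label")) == label]

    for arc in link:
        if arc.get(q("type")) != "arc":
            continue
        for start in labelled(arc.get(q("from"))):
            for end in labelled(arc.get(q("to"))):
                yield start, end


LINK = ET.fromstring("""<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink"
         xlink:type="extended">
  <NAME xlink:type="resource" xlink:label="source">Cafe au Lait</NAME>
  <MIRROR xlink:type="locator" xlink:label="mirror"
          xlink:href="http://sunsite.kth.se/javafaq"/>
  <MIRROR xlink:type="locator" xlink:label="mirror"
          xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
  <CONNECTION xlink:type="arc" xlink:from="source" xlink:to="mirror"/>
</WEBSITE>""")

# One CONNECTION element, two connections: source to each mirror.
pairs = [(a.tag, b.get(q("href"))) for a, b in expand_arcs(LINK)]
print(pairs)
```

Dropping xlink:to from the CONNECTION element would add a third pair that runs from NAME back to NAME itself.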

Arc DTD Fragment

<!ELEMENT WEBSITE (NAME, HOMESITE, MIRROR*, CONNECTION*) >
<!ATTLIST WEBSITE
  xmlns:xlink  CDATA  #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  (extended) #FIXED  "extended"
  xlink:title  CDATA     #IMPLIED
  xlink:role   CDATA     #IMPLIED
>

<!ELEMENT NAME (#PCDATA)>
<!ATTLIST NAME
   xlink:type     (resource) #FIXED "resource"
   xlink:label     CDATA    #REQUIRED
   xlink:title     CDATA    #IMPLIED
>

<!ELEMENT HOMESITE (#PCDATA)>
<!ATTLIST HOMESITE
   xlink:type     (locator) #FIXED  "locator"
   xlink:href      CDATA    #REQUIRED
   xlink:label     CDATA    #REQUIRED
   xlink:title     CDATA    #IMPLIED
>

<!ELEMENT MIRROR (#PCDATA)>
<!ATTLIST MIRROR
   xlink:type     (locator) #FIXED  "locator"
   xlink:href      CDATA    #REQUIRED
   xlink:label     CDATA    #REQUIRED
   xlink:title     CDATA    #IMPLIED
>

<!ELEMENT CONNECTION EMPTY>
<!ATTLIST CONNECTION
  xlink:type     (arc)               #FIXED   "arc"
  xlink:from     CDATA               #IMPLIED
  xlink:to       CDATA               #IMPLIED
  xlink:show    (replace)            "replace"
  xlink:actuate (onRequest | onLoad) "onRequest"
>

Out-of-Line Links


Out of line Link example


Out of line Link example


Out of line Link Example

<COURSE xmlns:xlink="http://www.w3.org/1999/xlink"
         xlink:type="extended">

  <TOC xlink:type="locator" xlink:href="index.xml" xlink:label="index"/>

  <CLASS xlink:type="locator" xlink:href="week1.xml" xlink:label="class"/>
  <CLASS xlink:type="locator" xlink:href="week2.xml" xlink:label="class"/>
  <CLASS xlink:type="locator" xlink:href="week3.xml" xlink:label="class"/>
  <CLASS xlink:type="locator" xlink:href="week4.xml" xlink:label="class"/>
  <CLASS xlink:type="locator" xlink:href="week5.xml" xlink:label="class"/>
  <CLASS xlink:type="locator" xlink:href="week6.xml" xlink:label="class"/>
  <CLASS xlink:type="locator" xlink:href="week7.xml" xlink:label="class"/>
  <CLASS xlink:type="locator" xlink:href="week8.xml" xlink:label="class"/>
  <CLASS xlink:type="locator" xlink:href="week9.xml" xlink:label="class"/>
  <CLASS xlink:type="locator" xlink:href="week10.xml" xlink:label="class"/> 
  <CLASS xlink:type="locator" xlink:href="week11.xml" xlink:label="class"/> 
  <CLASS xlink:type="locator" xlink:href="week12.xml" xlink:label="class"/>
  <CLASS xlink:type="locator" xlink:href="week13.xml" xlink:label="class"/>
  
  <CONNECTION xlink:type="arc" xlink:from="index" xlink:to="class"/>
  <CONNECTION xlink:type="arc" xlink:from="class" xlink:to="index"/>
  
</COURSE>

Another Out of line Link Example

<COURSE xmlns:xlink="http://www.w3.org/1999/xlink"
         xlink:type="extended">

  <CLASS xlink:type="locator" xlink:href="week1.xml"  xlink:label="1"/>
  <CLASS xlink:type="locator" xlink:href="week2.xml"  xlink:label="2"/>
  <CLASS xlink:type="locator" xlink:href="week3.xml"  xlink:label="3"/>
  <CLASS xlink:type="locator" xlink:href="week4.xml"  xlink:label="4"/>
  <CLASS xlink:type="locator" xlink:href="week5.xml"  xlink:label="5"/>
  <CLASS xlink:type="locator" xlink:href="week6.xml"  xlink:label="6"/>
  <CLASS xlink:type="locator" xlink:href="week7.xml"  xlink:label="7"/>
  <CLASS xlink:type="locator" xlink:href="week8.xml"  xlink:label="8"/>
  <CLASS xlink:type="locator" xlink:href="week9.xml"  xlink:label="9"/>
  <CLASS xlink:type="locator" xlink:href="week10.xml" xlink:label="10"/> 
  <CLASS xlink:type="locator" xlink:href="week11.xml" xlink:label="11"/> 
  <CLASS xlink:type="locator" xlink:href="week12.xml" xlink:label="12"/>
  <CLASS xlink:type="locator" xlink:href="week13.xml" xlink:label="13"/>
  
  <!-- Previous Links --> 
  <CONNECTION xlink:type="arc" xlink:from="2" xlink:to="1"/>
  <CONNECTION xlink:type="arc" xlink:from="3" xlink:to="2"/>
  <CONNECTION xlink:type="arc" xlink:from="4" xlink:to="3"/>
  <CONNECTION xlink:type="arc" xlink:from="5" xlink:to="4"/>
  <CONNECTION xlink:type="arc" xlink:from="6" xlink:to="5"/>
  <CONNECTION xlink:type="arc" xlink:from="7" xlink:to="6"/>
  <CONNECTION xlink:type="arc" xlink:from="8" xlink:to="7"/>
  <CONNECTION xlink:type="arc" xlink:from="9" xlink:to="8"/>
  <CONNECTION xlink:type="arc" xlink:from="10" xlink:to="9"/>
  <CONNECTION xlink:type="arc" xlink:from="11" xlink:to="10"/> 
  <CONNECTION xlink:type="arc" xlink:from="12" xlink:to="11"/> 
  <CONNECTION xlink:type="arc" xlink:from="13" xlink:to="12"/>
  
  <!-- Next Links --> 
  <CONNECTION xlink:type="arc" xlink:from="1" xlink:to="2"/>
  <CONNECTION xlink:type="arc" xlink:from="2" xlink:to="3"/>
  <CONNECTION xlink:type="arc" xlink:from="3" xlink:to="4"/>
  <CONNECTION xlink:type="arc" xlink:from="4" xlink:to="5"/>
  <CONNECTION xlink:type="arc" xlink:from="5" xlink:to="6"/>
  <CONNECTION xlink:type="arc" xlink:from="6" xlink:to="7"/>
  <CONNECTION xlink:type="arc" xlink:from="7" xlink:to="8"/>
  <CONNECTION xlink:type="arc" xlink:from="8" xlink:to="9"/>
  <CONNECTION xlink:type="arc" xlink:from="9" xlink:to="10"/>
  <CONNECTION xlink:type="arc" xlink:from="10" xlink:to="11"/> 
  <CONNECTION xlink:type="arc" xlink:from="11" xlink:to="12"/> 
  <CONNECTION xlink:type="arc" xlink:from="12" xlink:to="13"/>
  
</COURSE>
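
Writing two dozen nearly identical arcs by hand invites typos, so in practice a list like this would probably be generated. A minimal sketch in Python, assuming the labels are simply the week numbers as above; sequence_arcs is a name invented for this example.

```python
def sequence_arcs(count):
    """Return CONNECTION elements linking labels 1..count to their neighbors."""
    template = '<CONNECTION xlink:type="arc" xlink:from="{}" xlink:to="{}"/>'
    # Every label except the first gets an arc to the one before it.
    previous = [template.format(n, n - 1) for n in range(2, count + 1)]
    # Every label except the last gets an arc to the one after it.
    following = [template.format(n, n + 1) for n in range(1, count)]
    return previous, following


previous, following = sequence_arcs(13)
print("\n".join(previous + following))
```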

Linkbases

<METADATA xlink:type="extended"
  xmlns:xlink="http://www.w3.org/1999/xlink">
  <LINKBASE xlink:type="arc"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xlink:arcrole="http://www.w3.org/1999/xlink/properties/linkbase"
  xlink:to="courselinks"/>
  <RESOURCE xlink:type="locator" xlink:href="courselinks.xml" 
             xlink:label="courselinks"/>
</METADATA>

XLink Summary


To Learn More



Questions?


Part VI: XPointers

The many advantages of descriptive pointing are crucial for a scalable, generic pointing system. Descriptive pointing is crucial for all the same reasons that descriptive markup is crucial to documents, and that making links first-class objects is crucial to linking. It is also clearly feasible, as shown by multiple implementations of the prior WDs from the XML WG, and of TEI extended pointers.
--XML Linking Working Group, XML XPointer Requirements


XPointers


What are XPointers?

XPointer, the XML Pointer Language, defines an addressing scheme for individual parts of an XML document. XLinks point to a URI (in practice, a URL) that specifies a particular resource. The URI may include an XPointer part that more specifically identifies the desired part or element of the targeted resource or document. XPointers use the same XPath syntax you're familiar with from XSL transformations to identify the parts of the document they point to, along with a few additional pieces.


Why Use XPointers?


XPointer Examples

xpointer(id("ebnf"))
xpointer(descendant::language[position()=2])
ebnf
xpointer(/child::spec/child::body/child::*/child::language[position()=2])
/1/14/2
xpointer(id("ebnf"))xpointer(id("EBNF"))

The document is not specified in the XPointer; rather, the XLink specifies the document. The XLinks you saw in the previous chapter did not contain XPointers, but it isn't hard to add XPointers to them. Most of the time you simply append the XPointer to the URI separated by a #, just as you do with named anchors in HTML. For example, the above list of XPointers could be suffixed to URLs and come out looking like the following:

http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(id("ebnf"))
http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(descendant::language[position()=2])
http://www.w3.org/TR/1998/REC-xml-19980210.xml#ebnf
http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(/child::spec/child::body/child::*/child::language[position()=2])
http://www.w3.org/TR/1998/REC-xml-19980210.xml#/1/14/2
http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(id("ebnf"))xpointer(id("EBNF"))

Normally these are used as values of the xlink:href attribute of a linking element. For example:

<SPECIFICATION xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" 
 xlink:href="http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(id('ebnf'))"
 xlink:actuate="onRequest" xlink:show="replace">
  Extensible Markup Language (XML) 1.0
</SPECIFICATION>
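
Software that follows such a link first has to separate the document's address from the XPointer. A minimal sketch using Python's standard urllib.parse module; it only splits the string at the # and makes no attempt to evaluate the XPointer.

```python
from urllib.parse import urldefrag

uri = 'http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(id("ebnf"))'

# urldefrag splits at the first '#': everything before it names the
# document, everything after it is the fragment, here an XPointer.
document, pointer = urldefrag(uri)
print(document)
print(pointer)
```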

A Concrete Example

<?xml version="1.0"?>
<!DOCTYPE FAMILYTREE [

  <!ELEMENT FAMILYTREE (PERSON | FAMILY)*>

  <!-- PERSON elements --> 
  <!ELEMENT PERSON (NAME*, BORN*, DIED*, SPOUSE*)>
  <!ATTLIST PERSON 
    ID      ID     #REQUIRED
    FATHER  CDATA  #IMPLIED
    MOTHER  CDATA  #IMPLIED
  >
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT BORN (#PCDATA)>
  <!ELEMENT DIED  (#PCDATA)>
  <!ELEMENT SPOUSE EMPTY>
  <!ATTLIST SPOUSE IDREF IDREF #REQUIRED>
  
  <!--FAMILY--> 
  <!ELEMENT FAMILY (HUSBAND?, WIFE?, CHILD*) >
  <!ATTLIST FAMILY ID ID #REQUIRED>
  
  <!ELEMENT HUSBAND EMPTY>
  <!ATTLIST HUSBAND IDREF IDREF #REQUIRED>
  <!ELEMENT WIFE EMPTY>
  <!ATTLIST WIFE IDREF IDREF #REQUIRED>
  <!ELEMENT CHILD EMPTY>
  <!ATTLIST CHILD IDREF IDREF #REQUIRED>

]>
<FAMILYTREE>

  <PERSON ID="p1">
    <NAME>Domeniquette Celeste Baudean</NAME>
    <BORN>21 Apr 1836</BORN>
    <DIED>Unknown</DIED>
    <SPOUSE IDREF="p2"/>
  </PERSON>

  <PERSON ID="p2">
    <NAME>Jean Francois Bellau</NAME>
    <SPOUSE IDREF="p1"/>
  </PERSON>

  <PERSON ID="p3" FATHER="p2" MOTHER="p1">
    <NAME>Elodie Bellau</NAME>
    <BORN>11 Feb 1858</BORN>
    <DIED>12 Apr 1898</DIED>
    <SPOUSE IDREF="p4"/>
  </PERSON>

  <PERSON ID="p4">
    <NAME>John P. Muller</NAME>
    <SPOUSE IDREF="p3"/>
  </PERSON>

  <PERSON ID="p7">
    <NAME>Adolf Eno</NAME>
    <SPOUSE IDREF="p6"/>
  </PERSON>

  <PERSON ID="p6" FATHER="p2" MOTHER="p1">
    <NAME>Maria Bellau</NAME>
    <SPOUSE IDREF="p7"/>
  </PERSON>

  <PERSON ID="p5" FATHER="p2" MOTHER="p1">
    <NAME>Eugene Bellau</NAME>
  </PERSON>

  <PERSON ID="p8" FATHER="p2" MOTHER="p1">
    <NAME>Louise Pauline Bellau</NAME>
    <BORN>29 Oct 1868</BORN>
    <DIED>3 May 1938</DIED>
    <SPOUSE IDREF="p9"/>
  </PERSON>

  <PERSON ID="p9">
    <NAME>Charles Walter Harold</NAME>
    <BORN>about 1861</BORN>
    <DIED>about 1938</DIED>
    <SPOUSE IDREF="p8"/>
  </PERSON>

  <PERSON ID="p10" FATHER="p2" MOTHER="p1">
    <NAME>Victor Joseph Bellau</NAME>
    <SPOUSE IDREF="p11"/>
  </PERSON>

  <PERSON ID="p11">
    <NAME>Ellen Gilmore</NAME>
    <SPOUSE IDREF="p10"/>
  </PERSON>

  <PERSON ID="p12" FATHER="p2" MOTHER="p1">
    <NAME>Honore Bellau</NAME>
  </PERSON>

  <FAMILY ID="f1">
    <HUSBAND IDREF="p2"/>
    <WIFE IDREF="p1"/>
    <CHILD IDREF="p3"/>
    <CHILD IDREF="p5"/>
    <CHILD IDREF="p6"/>
    <CHILD IDREF="p8"/>
    <CHILD IDREF="p10"/>
    <CHILD IDREF="p12"/>
  </FAMILY>

  <FAMILY ID="f2">
    <HUSBAND IDREF="p7"/>
    <WIFE IDREF="p6"/>
  </FAMILY>

</FAMILYTREE>

Location Paths, Steps, and Sets


Location Steps


Location Paths

xpointer(/child::FAMILYTREE/child::PERSON[position()=3])

The location path of this XPointer is /child::FAMILYTREE/child::PERSON[position()=3]. It is built from two location steps:


Location Paths that Identify Multiple Nodes

xpointer(/child::FAMILYTREE/child::PERSON[position()>3])


Axes

XPath defines thirteen axes along which a location step may search for nodes; XPointers use the same axes as XSLT. The twelve covered here are all of them except the namespace axis. These depend on context to determine exactly what they point to. For instance, consider this location path:

id("p6")/child::NAME

It begins with the id() function that returns a node set containing the element with the ID type attribute whose value is p6. This provides a context node for the following location step along the relative child axis. Other axes include ancestor, descendant, self, ancestor-or-self, descendant-or-self, attribute, and more. Each serves to select a particular subset of the elements in the document. For instance, the following axis selects from nodes that come after the context node. The preceding axis selects from nodes that come before the context node.


Location Step Axes

Axis Selects From
ancestor the parent of the context node, the parent of the parent of the context node, the parent of the parent of the parent of the context node, and so forth back to the root node
ancestor-or-self the ancestors of the context node and the context node itself
attribute the attributes of the context node
child the immediate children of the context node
descendant the children of the context node, the children of the children of the context node, and so forth
descendant-or-self the context node itself and its descendants
following all nodes that start after the end of the context node, excluding attribute and namespace nodes
following-sibling all nodes that start after the end of the context node and have the same parent as the context node
parent the unique parent node of the context node
preceding all nodes that start before the beginning of the context node, excluding attribute and namespace nodes
preceding-sibling all nodes that start before the beginning of the context node and have the same parent as the context node
self the context node

The child Axis

The child axis selects from the children of the context node. For example, consider this XPointer:

xpointer(/child::FAMILYTREE/child::PERSON[position()=3]/child::NAME)

Reading from right to left, it selects the NAME child of the third PERSON element that's a child of the FAMILYTREE element that's a child of the root node. In this example, there's only one such element, but if there is more than one, all are returned. For instance, consider this XPointer:

xpointer(/child::FAMILYTREE/child::PERSON/child::NAME)

This selects all NAME children of PERSON elements that are children of FAMILYTREE elements that are children of the root node. There are a dozen of these in the family tree example above.

It's important to note that the child axis only selects from the immediate children of the context node. For example, consider this URI:

http://www.theharolds.com/genealogy.xml#xpointer(/child::NAME)

This points nowhere because there are no NAME elements in the document that are direct, immediate children of the root node. There are a dozen NAME elements further down the tree, but they are descendants, not children. If you'd like to refer to these, use the descendant axis instead of child.

As in XSLT, the child axis is implied if no explicit axis name is present. For instance, the above three XPointers would more likely be written in this abbreviated form:

xpointer(/FAMILYTREE/PERSON[position()=3]/NAME)
xpointer(/FAMILYTREE/PERSON/NAME)
xpointer(/NAME)
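
These abbreviated paths can be tried out with Python's standard xml.etree.ElementTree module. Treat this as a sketch rather than an XPointer implementation: ElementTree supports only a subset of XPath, and its paths are evaluated relative to the root element, so the leading /FAMILYTREE step is implicit. The three-person document is a cut-down stand-in for the family tree.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<FAMILYTREE>
  <PERSON ID="p1"><NAME>Domeniquette Celeste Baudean</NAME></PERSON>
  <PERSON ID="p2"><NAME>Jean Francois Bellau</NAME></PERSON>
  <PERSON ID="p3"><NAME>Elodie Bellau</NAME></PERSON>
</FAMILYTREE>""")

# /FAMILYTREE/PERSON/NAME: every NAME child of every PERSON child
all_names = [n.text for n in root.findall("PERSON/NAME")]

# /FAMILYTREE/PERSON[position()=3]/NAME: NAME child of the third PERSON
third_name = [n.text for n in root.findall("PERSON[3]/NAME")]

# /NAME only matches if the root element itself is called NAME
root_is_name = root.tag == "NAME"

print(all_names)
print(third_name)
print(root_is_name)
```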

The descendant Axis

The descendant axis searches through all the descendants of the context node, not just the immediate children. For example, /descendant::BORN[position()=3] selects the third BORN element encountered in a depth-first search of the document tree. (Depth first is the order you get if you simply read through the XML document from top to bottom.) In the family tree example above, that selects Louise Pauline Bellau's birthday, <BORN>29 Oct 1868</BORN>.

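A double slash in place of a single slash is an abbreviation for /descendant-or-self::node()/, which in most cases behaves like the descendant axis. For example, //NAME selects all NAME elements in the document, and //PERSON/NAME selects all NAME children of PERSON elements. Be careful with positions, though. //BORN[position()=3] does not select the third BORN element in the document. It selects every BORN element that is the third BORN child of its own parent, and there are none of those here. To pick the third BORN element in the whole document, write /descendant::BORN[position()=3] or (//BORN)[position()=3].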
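
How a position predicate interacts with the double slash is easy to check by experiment. The sketch below uses Python's standard xml.etree.ElementTree, which implements only a subset of XPath but applies position predicates per parent, as XPath does; the three-person document is a cut-down stand-in for the family tree.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<FAMILYTREE>
  <PERSON ID="p1"><NAME>Domeniquette Celeste Baudean</NAME><BORN>21 Apr 1836</BORN></PERSON>
  <PERSON ID="p3"><NAME>Elodie Bellau</NAME><BORN>11 Feb 1858</BORN></PERSON>
  <PERSON ID="p8"><NAME>Louise Pauline Bellau</NAME><BORN>29 Oct 1868</BORN></PERSON>
</FAMILYTREE>""")

# /descendant::BORN[position()=3]: collect the BORN descendants in
# document order first, then take the third one.
third_in_document = root.findall(".//BORN")[2]
print(third_in_document.text)

# //BORN[position()=3]: the predicate is applied per parent, so this asks
# for BORN elements that are the third BORN child of their own parent.
third_per_parent = root.findall(".//BORN[3]")
print(len(third_per_parent))
```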


The descendant-or-self Axis

The descendant-or-self axis searches the context node itself and all of its descendants. For example, id("p11")/descendant-or-self::PERSON refers to all PERSON descendants of the element with ID p11 as well as that element itself, since it is of type PERSON. The axis name itself has no abbreviation, although a double slash is shorthand for /descendant-or-self::node()/.


The parent Axis

The parent axis refers to the node that's the immediate parent of the context node. For example, /descendant::HUSBAND[position()=1]/parent::* refers to the parent element of the first HUSBAND element in the document.

The step parent::node() can be abbreviated as a double period, as in /descendant::HUSBAND[position()=1]/.. (the path ends with the two periods).


The self Axis

The self axis refers to the context node. It's sometimes useful when making relative links. For example, /self::node() selects the root node of the document (which is not the same as the root element of the document; that would be selected by /child::* or, in this example, /child::FAMILYTREE). The step self::node() can be abbreviated by a single period. However, this axis is rarely used in XPointers. It's more useful for XSLT select expressions.


The ancestor Axis

The ancestor axis selects all nodes that contain the context node, starting with its parent. For example, /descendant::BORN[position()=2]/ancestor::*[position()=1] selects the element which contains the second BORN element. In this example, it selects Elodie Bellau's PERSON element. There's no abbreviation for the ancestor axis.


The ancestor-or-self Axis

The ancestor-or-self axis selects the context node and all nodes that contain it. For example, id("p1")/ancestor-or-self::* selects a node set containing Domeniquette Celeste Baudean's PERSON element that has ID p1 and its parent, the FAMILYTREE element. The root node is not included because the * node test matches only elements; id("p1")/ancestor-or-self::node() would include it as well. There's also no abbreviation for the ancestor-or-self axis.


The preceding Axis

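The preceding axis selects all nodes that come before the context node in document order, excluding its ancestors and any attribute or namespace nodes. Apart from skipping ancestors, the preceding axis has no respect for hierarchy. Positions along it count backward from the context node, so the nearest preceding node is number 1. For example, consider this rule:

/descendant::BORN[position()=3]/preceding::*[position()=5]

This says go to the third BORN element from the root, Louise Pauline Bellau's birthday, <BORN>29 Oct 1868</BORN>, and then move back five elements. This lands on Maria Bellau's NAME element, <NAME>Maria Bellau</NAME>, after passing back through Louise Pauline Bellau's NAME element, Eugene Bellau's NAME element, Eugene Bellau's PERSON element, and Maria Bellau's SPOUSE element, in that order. Louise Pauline Bellau's own PERSON element is not counted because it is an ancestor of the context node. There's no abbreviation for the preceding axis.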


The following Axis

The following axis selects all nodes that come after the context node in document order, excluding its descendants and any attribute or namespace nodes. Like preceding, following has no respect for hierarchy. An element is counted when its start tag or empty-element tag is encountered after the end of the context node. For example, consider this rule:

/descendant::BORN[position()=2]/following::*[position()=5]

This says go to Elodie Bellau's birthday, <BORN>11 Feb 1858</BORN>, and then move forward five elements. This lands on John P. Muller's SPOUSE element, <SPOUSE IDREF="p3"/>, after passing through Elodie Bellau's DIED element, Elodie Bellau's SPOUSE element, John P. Muller's PERSON element, and John P. Muller's NAME element, in that order. Elodie Bellau's own PERSON element is not counted because it started before the context node. There's no abbreviation for the following axis.
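
Counting along these two axes by eye is error-prone, so here is a small sketch that does the counting mechanically. It follows the XPath 1.0 definitions (ancestors are left out of preceding, descendants are left out of following, and preceding counts backward from the context node) and handles element nodes only. Python's ElementTree has no support for these axes, so the functions following and preceding are written by hand for this example; the document is the first eight PERSON elements of the family tree.

```python
import xml.etree.ElementTree as ET

DOC = """<FAMILYTREE>
<PERSON ID="p1"><NAME>Domeniquette Celeste Baudean</NAME><BORN>21 Apr 1836</BORN><DIED>Unknown</DIED><SPOUSE IDREF="p2"/></PERSON>
<PERSON ID="p2"><NAME>Jean Francois Bellau</NAME><SPOUSE IDREF="p1"/></PERSON>
<PERSON ID="p3"><NAME>Elodie Bellau</NAME><BORN>11 Feb 1858</BORN><DIED>12 Apr 1898</DIED><SPOUSE IDREF="p4"/></PERSON>
<PERSON ID="p4"><NAME>John P. Muller</NAME><SPOUSE IDREF="p3"/></PERSON>
<PERSON ID="p7"><NAME>Adolf Eno</NAME><SPOUSE IDREF="p6"/></PERSON>
<PERSON ID="p6"><NAME>Maria Bellau</NAME><SPOUSE IDREF="p7"/></PERSON>
<PERSON ID="p5"><NAME>Eugene Bellau</NAME></PERSON>
<PERSON ID="p8"><NAME>Louise Pauline Bellau</NAME><BORN>29 Oct 1868</BORN><DIED>3 May 1938</DIED><SPOUSE IDREF="p9"/></PERSON>
</FAMILYTREE>"""


def following(root, context):
    """Elements after the context node in document order, minus its descendants."""
    order = list(root.iter())                # all elements, document order
    inside = set(context.iter())             # the context node and its descendants
    start = order.index(context)
    return [e for e in order[start + 1:] if e not in inside]


def preceding(root, context):
    """Elements before the context node, nearest first, minus its ancestors."""
    parent = {child: p for p in root.iter() for child in p}
    ancestors = set()
    node = context
    while node in parent:                    # walk up to the root element
        node = parent[node]
        ancestors.add(node)
    order = list(root.iter())
    start = order.index(context)
    return [e for e in reversed(order[:start]) if e not in ancestors]


root = ET.fromstring(DOC)
born = root.findall(".//BORN")               # all BORN elements, document order

step = following(root, born[1])[4]           # /descendant::BORN[2]/following::*[5]
print(step.tag, step.attrib)

step = preceding(root, born[2])[4]           # /descendant::BORN[3]/preceding::*[5]
print(step.tag, step.text)
```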


The preceding-sibling Axis

The preceding-sibling axis selects elements that precede the context node in the same parent element. For example, /descendant::BORN[position()=2]/preceding-sibling::*[position()=1] selects Elodie Bellau's NAME element, <NAME>Elodie Bellau</NAME>. /descendant::BORN[position()=2]/preceding-sibling::*[position()=2] doesn't point to anything because there's only one sibling of Elodie Bellau's BORN element before it. There's no abbreviation for the preceding-sibling axis.


The following-sibling Axis

The following-sibling axis selects elements that follow the context node in the same parent element. For example, /descendant::BORN[position()=2]/following-sibling::*[position()=1] selects Elodie Bellau's DIED element, <DIED>12 Apr 1898</DIED>. /descendant::BORN[position()=2]/following-sibling::*[position()=3] doesn't point to anything because there are only two sibling elements following Elodie Bellau's BORN element. There's no abbreviation for the following-sibling axis.


The attribute Axis

The attribute axis selects an attribute node contained by the context node. For example, the XPointer /descendant::SPOUSE/attribute::IDREF selects all IDREF attributes of all SPOUSE elements in the document. The attribute axis can be abbreviated by an @ sign. Thus //SPOUSE/@IDREF selects all IDREF attributes of all SPOUSE elements in the document. @* is a general abbreviation for an attribute with any name. Thus //SPOUSE/@* indicates all attributes of all SPOUSE elements.

For another example, to find all PERSON elements in the document http://www.theharolds.com/genealogy.xml whose FATHER attribute is Jean Francois Bellau (ID p2), you could write //PERSON[@FATHER="p2"].
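
Attribute predicates are among the parts of XPath that Python's standard xml.etree.ElementTree does support, so this last query can be tried directly. A small sketch on a four-person excerpt of the family tree; note that ElementTree has no attribute nodes, so //SPOUSE/@IDREF has to be imitated by reading the attribute from each SPOUSE element.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<FAMILYTREE>
  <PERSON ID="p1"><NAME>Domeniquette Celeste Baudean</NAME><SPOUSE IDREF="p2"/></PERSON>
  <PERSON ID="p2"><NAME>Jean Francois Bellau</NAME><SPOUSE IDREF="p1"/></PERSON>
  <PERSON ID="p3" FATHER="p2" MOTHER="p1"><NAME>Elodie Bellau</NAME></PERSON>
  <PERSON ID="p5" FATHER="p2" MOTHER="p1"><NAME>Eugene Bellau</NAME></PERSON>
</FAMILYTREE>""")

# //PERSON[@FATHER="p2"]: PERSON elements whose FATHER attribute is p2
children = [p.findtext("NAME") for p in root.findall(".//PERSON[@FATHER='p2']")]

# //SPOUSE/@IDREF: read the attribute value from each matching element
spouse_refs = [s.get("IDREF") for s in root.findall(".//SPOUSE")]

print(children)
print(spouse_refs)
```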


Node Tests

Most of the time the node test part of a location step is simply an element name like PERSON or BORN. However, there are seven other possibilities:

  <CITATION CLASS="TURING" ID="C2">
    <AUTHOR>Turing, Alan M.</AUTHOR>
    "<TITLE>On Computable Numbers,
      With an Application to the Entscheidungs-problem</TITLE>"
    <JOURNAL>
      Proceedings of the London Mathematical Society</JOURNAL>,
    <SERIES>Series 2</SERIES>,
    <VOLUME>42</VOLUME>
    (<YEAR>1936</YEAR>):
    <PAGES>230-65</PAGES>.
  </CITATION>

The following XPointer refers to the quotation mark before the TITLE element.

 id("C2")/child::text()[position()=2] 

The first text node in this fragment is the whitespace between <CITATION CLASS="TURING" ID="C2"> and <AUTHOR>. Technically, this XPointer refers to all text between </AUTHOR> and <TITLE>, including the whitespace and not just the quotation mark.
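
The whitespace that travels with the quotation mark is easy to see in code. This sketch uses Python's standard xml.etree.ElementTree, which does not expose text nodes as such: the text before the first child is the parent's text, and the text after a child element is that child's tail. The CITATION is shortened to its first two child elements.

```python
import xml.etree.ElementTree as ET

citation = ET.fromstring("""<CITATION CLASS="TURING" ID="C2">
    <AUTHOR>Turing, Alan M.</AUTHOR>
    "<TITLE>On Computable Numbers,
      With an Application to the Entscheidungs-problem</TITLE>"
</CITATION>""")

author = citation.find("AUTHOR")

# First text node: the whitespace between the CITATION start tag and <AUTHOR>.
first_text = citation.text

# Second text node: everything between </AUTHOR> and <TITLE>.
second_text = author.tail

print(repr(first_text))
print(repr(second_text))
```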

Because character data does not contain any child nodes, most relative location steps may not be attached to an XPointer that selects a text node. The exception is the point() node test which will be discussed later.

The comment() node test specifically refers to comments. For example, this XPointer points to the third comment in the document:

/descendant::comment()[position()=3]

Because comments do not contain attributes or elements, you cannot add an additional child, descendant, or attribute relative location step after the first term that selects a comment.

Finally, the processing-instruction() node test selects any processing instructions that occur along the chosen axis. You can use it without any arguments to select any processing instructions, or with arguments to specify the particular processing instruction targets you want to select. For example, /descendant::processing-instruction() selects all processing instructions in the document. However, /descendant::processing-instruction(xml-stylesheet) only finds processing instructions that begin <?xml-stylesheet. /descendant::processing-instruction(php) only finds processing instructions intended for PHP. As with comments, because processing instructions do not contain attributes or elements, you cannot add an additional child, descendant, or attribute relative location term after the first term that selects a processing instruction.

The point() and range() node tests refer to new ways of dividing an XML document. They will be discussed below.

Although the other node tests all end with parentheses, none of them except processing-instruction() actually takes any arguments.


Predicates

These are just a small sampling of the selections that predicates make possible.


Boolean Conversion

XPath predicate expressions are ultimately converted to a boolean after all calculations are finished. Non-boolean results are converted as follows:

The predicate expression is evaluated for each node in the context node list. Each node for which the expression ultimately evaluates to false is removed from the list. Thus only those nodes that satisfy the predicate remain. I will not repeat here the discussion of the operators and functions available for use in expressions. However, I will show you a few examples of predicates using the expression syntax as it's likely to be used in XPointers.
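
The filtering step can be sketched in Python. The conversion rules coded here are the ones XPath 1.0 gives for predicates (a number is compared with the node's position, a string is true when it is not empty, a node set is true when it is not empty); predicate_holds and apply_predicate are names made up for this illustration, and plain Python values stand in for XPath values.

```python
def predicate_holds(value, position):
    """Convert the result of a predicate expression to a boolean."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # A bare number n is shorthand for position() = n.
        return value == position
    if isinstance(value, str):
        # Strings are true unless they are empty.
        return len(value) > 0
    # Anything else is treated as a node set: true unless it is empty.
    return len(value) > 0


def apply_predicate(nodes, expression):
    """Keep the nodes for which expression(node, position) converts to true."""
    return [node
            for position, node in enumerate(nodes, start=1)   # positions start at 1
            if predicate_holds(expression(node, position), position)]


people = ["p1", "p2", "p3", "p4"]
print(apply_predicate(people, lambda node, pos: 3))           # like [3]
print(apply_predicate(people, lambda node, pos: pos > 2))     # like [position()>2]
```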


The position() function

Probably the function most frequently used in XPointer predicates is position(). This returns the index of the node in the context node list. This allows you to find the first, second, third, or other indexed node. You can compare positions using the various relational operators like <, >, =, !=, >=, and <=.

xpointer(/child::FAMILYTREE/child::*[position()=1])
xpointer(/child::FAMILYTREE/child::*[position()=2])
xpointer(/child::FAMILYTREE/child::*[position()=3])
xpointer(/child::FAMILYTREE/child::*[position()=4])
xpointer(/child::FAMILYTREE/child::*[position()=5])
xpointer(/child::FAMILYTREE/child::*[position()=6])
xpointer(/child::FAMILYTREE/child::*[position()=7])
xpointer(/child::FAMILYTREE/child::*[position()=8])
xpointer(/child::FAMILYTREE/child::*[position()=9])
xpointer(/child::FAMILYTREE/child::*[position()=10])
xpointer(/child::FAMILYTREE/child::*[position()=11])
xpointer(/child::FAMILYTREE/child::*[position()=12])
xpointer(/child::FAMILYTREE/child::*[position()=13])
xpointer(/child::FAMILYTREE/child::*[position()=14])

Identifying an element by its position

xpointer(/child::FAMILYTREE/child::*[1])
xpointer(/child::FAMILYTREE/child::*[2])
xpointer(/child::FAMILYTREE/child::*[3])
xpointer(/child::FAMILYTREE/child::*[4])
xpointer(/child::FAMILYTREE/child::*[5])
xpointer(/child::FAMILYTREE/child::*[6])
xpointer(/child::FAMILYTREE/child::*[7])
xpointer(/child::FAMILYTREE/child::*[8])
xpointer(/child::FAMILYTREE/child::*[9])
xpointer(/child::FAMILYTREE/child::*[10])
xpointer(/child::FAMILYTREE/child::*[11])
xpointer(/child::FAMILYTREE/child::*[12])
xpointer(/child::FAMILYTREE/child::*[13])
xpointer(/child::FAMILYTREE/child::*[14])
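
One thing to keep in mind when moving between XPath and a programming language is that XPath positions start at 1. A small Python sketch, with at_position and after_position invented for this example and a five-child document standing in for the fourteen children of the real FAMILYTREE element:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<FAMILYTREE>
  <PERSON ID="p1"/><PERSON ID="p2"/><PERSON ID="p3"/>
  <FAMILY ID="f1"/><FAMILY ID="f2"/>
</FAMILYTREE>""")

children = list(root)              # /child::FAMILYTREE/child::*


def at_position(nodes, n):
    """child::*[position()=n]; XPath positions start at 1, not 0."""
    return nodes[n - 1:n]          # empty list when n is out of range


def after_position(nodes, n):
    """child::*[position()>n]"""
    return nodes[n:]


print([e.get("ID") for e in at_position(children, 4)])
print([e.get("ID") for e in after_position(children, 3)])
```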

Functions that Return Node Sets

The last two, here() and origin() are XPointer extensions to XPath that are not available in XSLT.


id()

The id() function is one of the simplest and most robust means of identifying an element node. It selects the element in the document that has an ID type attribute with a specified value. For example, consider the URI http://www.theharolds.com/genealogy.xml#xpointer(id("p12")). If you look back at the family tree example, you find this element:

<PERSON ID="p12" FATHER="p2" MOTHER="p1">
  <NAME>Honore Bellau</NAME>
</PERSON>

Since ID pointers are so common and so useful, there's also a shortcut for this. If all you want to do is point to a particular element with a particular ID, you can skip all the xpointer(id("")) frou-frou and just use the bare ID after the # like this:

http://www.theharolds.com/genealogy.xml#p12

When an XPointer contains more than one part, the parts are evaluated from left to right. The first match found is returned, so the backup is only used if an ID type attribute with the value p12 can't be found.
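
The left-to-right fallback can be sketched as a simple loop. This is only an illustration in Python: ElementTree does not read the DTD, so the sketch assumes that every attribute named ID is an ID type attribute, and first_match is a name made up for this example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<FAMILYTREE>
  <PERSON ID="p11"><NAME>Ellen Gilmore</NAME></PERSON>
  <PERSON ID="p12"><NAME>Honore Bellau</NAME></PERSON>
</FAMILYTREE>""")

# ASSUMPTION: without the DTD there is no way to know which attributes
# have type ID, so this index simply trusts attributes named "ID".
by_id = {e.get("ID"): e for e in root.iter() if e.get("ID") is not None}


def first_match(candidates):
    """Try each ID from left to right and return the first element found."""
    for candidate in candidates:
        if candidate in by_id:
            return by_id[candidate]
    return None


hit = first_match(["P12", "p12"])   # "P12" fails, so the backup "p12" is used
print(hit.findtext("NAME"))
```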


here()

Consider a simple slide show. In this example, here()/following::SLIDE[1] refers to the next slide in the show. here()/preceding::SLIDE[1] refers to the previous slide in the show. Presumably this would be used in conjunction with a style sheet that showed one slide at a time.

<?xml version="1.0"?>
<SLIDESHOW xmlns:xlink="http://www.w3.org/1999/xlink">
  <SLIDE>
    <H1>Welcome to the slide show!</H1>
    <BUTTON xlink:type="simple"
            xlink:href="here()/following::SLIDE[1]">
      Next
    </BUTTON>
  </SLIDE>
  <SLIDE>
    <H1>This is the second slide</H1>
    <BUTTON xlink:type="simple" 
            xlink:href="here()/preceding::SLIDE[1]">
      Previous
    </BUTTON>
    <BUTTON xlink:type="simple" 
            xlink:href="here()/following::SLIDE[1]">
      Next
    </BUTTON>
  </SLIDE>
  <SLIDE>
    <H1>This is the third slide</H1>
    <BUTTON xlink:type="simple" 
            xlink:href="here()/preceding::SLIDE[1]">
      Previous
    </BUTTON>
    <BUTTON xlink:type="simple" 
            xlink:href="here()/following::SLIDE[1]">
      Next
    </BUTTON>
  </SLIDE>
  ...
  <SLIDE>
    <H1>This is the last slide</H1>
    <BUTTON xlink:type="simple"
            xlink:href="here()/preceding::SLIDE[1]">
      Previous
    </BUTTON>
  </SLIDE>

</SLIDESHOW>

Generally, the here() function is only used in fully relative URIs in XLinks. If any URI part is included, it must be the same as the URI of the current document.
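
To make the slide show navigation concrete, here is a rough Python sketch of what here()/following::SLIDE[1] and here()/preceding::SLIDE[1] select. It is not an XPointer processor. The trimmed-down SHOW document and the function names following() and preceding() are made up for illustration, and the element holding the link is simply passed in as the here argument.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down version of the slide show (no XLink attributes).
SHOW = """<SLIDESHOW>
  <SLIDE><H1>One</H1><BUTTON>Next</BUTTON></SLIDE>
  <SLIDE><H1>Two</H1><BUTTON>Previous</BUTTON><BUTTON>Next</BUTTON></SLIDE>
  <SLIDE><H1>Three</H1><BUTTON>Previous</BUTTON></SLIDE>
</SLIDESHOW>"""


def following(root, here, tag):
    """Elements named tag after `here` in document order, excluding its descendants."""
    order = list(root.iter())                      # document order
    inside = {id(e) for e in here.iter()}          # here and its descendants
    after = order[order.index(here) + 1:]
    return [e for e in after if e.tag == tag and id(e) not in inside]


def preceding(root, here, tag):
    """Elements named tag before `here`, nearest first, excluding its ancestors."""
    parents = {child: parent for parent in root.iter() for child in parent}
    ancestors = set()
    node = here
    while node in parents:                         # walk up to the document element
        node = parents[node]
        ancestors.add(id(node))
    order = list(root.iter())
    before = order[:order.index(here)]
    return [e for e in reversed(before) if e.tag == tag and id(e) not in ancestors]


root = ET.fromstring(SHOW)
slides = root.findall("SLIDE")
next_button = slides[1].findall("BUTTON")[1]       # the element that contains the link
next_slide = following(root, next_button, "SLIDE")[:1]
previous_slide = preceding(root, next_button, "SLIDE")[:1]
```

Taking the first item of each list corresponds to the [1] predicate. On the last slide following() returns an empty list, which is why that slide has no Next button.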


origin()

The origin() function is much the same as here(); that is, it refers to the source of a link. However, origin() is used in out-of-line links where the link is not actually present in the source document. It points to the element in the source document from which the user activated the link.


Points

<BORN>11 Feb 1858</BORN>

Every point is either between two nodes or between two characters in the parsed character data of a document. To make sense of this you have to remember that parsed character data is part of a text node. For instance, consider this very simple but well-formed XML document:

<GREETING>
  Hello
</GREETING>

Tree Structure

There are exactly three nodes and 13 distinct points in this document. In order the points are:

The exact details of the white space in the document are not considered here. XPointer collapses all runs of white space to a single space.


Point Expressions

A point is selected by writing an XPath expression that points at a node and then suffixing it with /point()[position()=n], where n is the index of the point you want within that node. The index refers to the point before the nth child element if the context node is an element or root node, or to a position in the string value of the node otherwise, with 0 being the point before the first character. For example, to select the point immediately before the D in Domeniquette Celeste Baudean's NAME element, use:

/child::FAMILYTREE/descendant::*[position()=1]/child::NAME/child::text()/point()[position()=0]

There are 12 letters in Domeniquette, so to select the point after the last e, use:

/child::FAMILYTREE/descendant::*[position()=1]/child::NAME/child::text()/point()[position()=12]
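
The arithmetic behind those two indexes can be checked with a few lines of Python. This is only a sketch of how character points are counted in a text node, not part of any XPointer implementation, and the function name character_points() is invented for the purpose.

```python
def character_points(text):
    """List the points in a string as (index, text before the point, text after it)."""
    # A string of n characters has n + 1 points: one before each character
    # plus one after the last character.
    return [(i, text[:i], text[i:]) for i in range(len(text) + 1)]


points = character_points("Domeniquette")
first_point = points[0]    # the point immediately before the D
last_point = points[12]    # the point immediately after the last e
```

Since the string has 12 characters, there are 13 points in it, numbered 0 through 12.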


Ranges

In some applications it may be important to specify a range across a document rather than a particular point in the document. For instance, the selection a user makes with a mouse is not necessarily going to match up with any one element or node. It may start in the middle of one paragraph, extend across a heading and a picture and then into the middle of another paragraph two pages down.

Any such contiguous area of a document can be described with a range. A range begins at one point and continues until another point. Each point is identified by a location path. If the starting path points to a node set rather than a point, then the first point in the location set the XPointer identifies is the start point. If the ending location path points to a node set rather than a point, then the last point in the location set the XPointer identifies is the end point of the range.


Range Expressions

To specify a range, you append /range-to(end-point) to a location path specifying the start point of the range. The parentheses contain a location path specifying the end point of the range. For example, suppose you want to select everything between the first PERSON element and the last PERSON element. PERSON elements are children of the FAMILYTREE root element, so the XPointer is:

xpointer(/child::FAMILYTREE/child::PERSON[position() = 1]/range-to(/child::FAMILYTREE/child::PERSON[position() = last()]))


Range Functions

range(location-set)
Returns a location set containing one range for each location in the argument. Each range is the minimum range necessary to cover the entire location.
range-inside(location-set)
Returns a location set containing the interiors of each of the locations in the input.
start-point(location-set)
Returns a location set that contains one point representing the first point of each location in the input location set. For example, start-point(//PERSON[1]) returns the point immediately before the first PERSON element, and start-point(//PERSON) returns the set of points immediately before each PERSON element.
end-point(location-set)
The same as start-point() except that it returns the points immediately after each location in its input.
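
One way to picture these functions is to flatten the document into a string and treat every location as a pair of character offsets. The Python sketch below does exactly that. The offsets are invented, the function names merely echo the XPointer ones, and a real processor works on the node tree rather than on offsets.

```python
# Assumption: each location is a (start, end) pair of offsets into a flattened
# document, and a point is a single offset. The numbers are made up.
persons = [(10, 40), (45, 80), (90, 120)]


def start_point(locations):
    """One point per location: the point immediately before it."""
    return [start for start, _ in locations]


def end_point(locations):
    """One point per location: the point immediately after it."""
    return [end for _, end in locations]


def range_to(start_locations, end_locations):
    """A range from the first point of the start set to the last point of the end set."""
    return (min(start_point(start_locations)), max(end_point(end_locations)))


# Everything from the first PERSON through the last PERSON.
whole_range = range_to(persons[:1], persons[-1:])
```

Here whole_range covers all three locations along with whatever lies between them, which is the effect of ranging from the first PERSON to the last.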

String Ranges

XPointer provides some very basic string matching capabilities through the string-range() function. This function takes as arguments a node set to search and a substring to search for. It returns a location set containing one range for each non-overlapping match to the string. You can also provide optional index and length arguments indicating at which character of the match the range should start and how many characters after the start the range should continue. The basic syntax is:

string-range(node-set,substring,index,length)

The first node-set argument is an XPath expression specifying which part of the document to search for a matching string. The second substring argument is the actual string to search for. By default, the range returned starts before the first matched character and encompasses all the matched characters. However, the index argument can be set to a number greater than 1 to start the range after the start of the match. For instance, setting it to 2 would indicate that the range starts after the first matched character. The length argument can specify how many characters to include in the range.

A string range points to an occurrence of a specified string, or a substring of a given string in the text (not markup) of the document. For example, this XPointer finds all occurrences of the string "Harold":

xpointer(string-range(/,"Harold"))

You can change the first argument to specify what nodes you want to look in. For example, this XPointer finds all occurrences of the string "Harold" in NAME elements:

xpointer(string-range(//NAME,"Harold"))

String ranges may be followed by predicates. Thus this XPointer finds only the first occurrence of the string "Harold" in the document:

xpointer(string-range(/,"Harold")[position()=1])

This targets the range that starts immediately before the word Harold in Charles Walter Harold's NAME element and covers just that word. This is not the same as pointing at the entire NAME element as an element-based selector would do.

A third numeric argument targets a particular position in the matched string, where 1 is the position before the first matched character. For example, this targets the point immediately following the first occurrence of the string "Harold". Because Harold has six letters, that point is at index 7:

xpointer(string-range(/,"Harold",7)[position()=1])

An optional fourth argument specifies the number of characters to select. For example, this XPointer selects the "old" from the first occurrence of the entire string "Harold":

xpointer(string-range(/,"Harold",4,3)[position()=1])

If the substring argument is the empty string, then positions in the text contents of the searched nodes are selected by the index and length arguments alone. For example, the following XPointer targets the first six characters of the document's parsed character data:

xpointer(string-range(/,"",1,6)[position()=1])

For another example, let's suppose you want to find the year of birth for all people born in the nineteenth century. The following will accomplish that:

xpointer(string-range(//BORN, " 18", 2, 4))

This says to look in all BORN elements for the string " 18". (The initial space is important to avoid accidentally matching someone born in 1918 or on the 18th day of the month.) When it's found, move one character ahead (to skip the space) and return a range covering the next four characters.

When matching strings, case is considered. All white space is condensed to a single space. Markup characters are ignored.
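
The index and length arithmetic is easier to follow when you can run it. The following Python sketch mimics string-range() on a plain string rather than on a node set. It is an approximation and not an XPointer implementation: the function name string_range() and its return value (a list of start and end offsets) are choices made for this example, and the text is assumed to have had its white space condensed already.

```python
def string_range(text, substring, index=1, length=None):
    """Return (start, end) offsets, one pair per non-overlapping match of substring."""
    if length is None:
        # By default the range runs from the index to the end of the match.
        length = max(len(substring) - (index - 1), 0)
    ranges = []
    position = 0
    while True:
        hit = text.find(substring, position)       # case-sensitive search
        if hit == -1:
            break
        start = min(hit + index - 1, len(text))    # index 1 = before the first matched character
        end = min(start + length, len(text))
        ranges.append((start, end))
        # Continue after this match; step by one for the empty string.
        position = hit + max(len(substring), 1)
    return ranges


name = "Charles Walter Harold"
born = "11 Feb 1858"

start, end = string_range(name, "Harold", 4, 3)[0]
print(name[start:end])                             # the last three letters of Harold

start, end = string_range(born, " 18", 2, 4)[0]
print(born[start:end])                             # the year of birth
```

Note that the XPointer index counts from 1 while Python offsets count from 0, which is why the code subtracts 1 when it computes the start of each range.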


Child Sequences

A child sequence is a shortcut for XPointers such as xpointer(/child::FAMILYTREE/child::*[4]); that is, an XPointer that consists of nothing but a series of child relative location steps counting down from the root node, each of which selects a particular child by position only. The shortcut is to use only the position number and the slashes that separate individual elements from each other, like this:

http://www.theharolds.com/genealogy.xml#/1/4

/1/4 is a child sequence that says to select the fourth child element of the first child element of the root. This syntax can be extended for any depth of child elements. For example these two URIs point to John P. Muller's NAME and SPOUSE elements respectively:

http://www.theharolds.com/genealogy.xml#/1/4/1
http://www.theharolds.com/genealogy.xml#/1/4/2

Child sequences may include an initial ID. In that case the counting begins from the element with that ID rather than from the root. For example, John P. Muller's PERSON element has an ID attribute with the value p4. Consequently the XPointer p4/1 points to his NAME element and p4/2 points to his SPOUSE element.

Each child sequence always points to a single element. You cannot use child sequences with any other relative location steps. You cannot use them to select elements of a particular type. You cannot use them to select attributes or strings. You can only use them to select a single element by its relative location in the tree.
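
Child sequences are simple enough that resolving one takes only a few lines of code. The Python sketch below walks a child sequence through a tree built with the standard xml.etree.ElementTree module. The FAMILY_TREE document is a made-up stand-in rather than the real listing, and the function name resolve_child_sequence() is invented. Because ElementTree does not read DTDs, the sketch also assumes that ID type attributes are simply the ones named ID.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the family tree document; only p4 follows the text.
FAMILY_TREE = """<FAMILYTREE>
  <PERSON ID="p1"><NAME>Person One</NAME></PERSON>
  <PERSON ID="p2"><NAME>Person Two</NAME></PERSON>
  <PERSON ID="p3"><NAME>Person Three</NAME></PERSON>
  <PERSON ID="p4"><NAME>John P. Muller</NAME><SPOUSE PERSON="p5"/></PERSON>
</FAMILYTREE>"""


def resolve_child_sequence(document_element, sequence, id_attribute="ID"):
    """Return the element a child sequence such as /1/4/2 or p4/1 points to, or None."""
    head, *steps = sequence.split("/")
    if head:
        # The sequence starts from an ID, as in p4/1 (or a bare ID such as p4).
        current = next(
            (e for e in document_element.iter() if e.get(id_attribute) == head),
            None)
    else:
        # The sequence starts from the root node, whose only child element
        # is the document element, so the first step has to be 1.
        if not steps or steps[0] != "1":
            return None
        current = document_element
        steps = steps[1:]
    for step in steps:
        if current is None:
            return None
        children = list(current)                   # child elements only
        position = int(step)                       # positions count from 1
        current = children[position - 1] if 1 <= position <= len(children) else None
    return current


root = ET.fromstring(FAMILY_TREE)
name_element = resolve_child_sequence(root, "/1/4/1")
spouse_element = resolve_child_sequence(root, "p4/2")
```

The function returns either one element or None, which matches the rule that a child sequence always points to a single element.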


Summary


To Learn More



Questions?


Part VII: The Oracle Speaks, Predictions for the Future


XInclude succeeds once parsers support it


JDOM succeeds, much to the consternation of the W3C


Schemas, a partial success


XLinks


XPointers; the same story


Stuff we didn't talk about


XSLT 1.1


XSL-FO


DOM Level III


XHTML Fails


XML Query Languages


Schema Repositories all fail


MathML succeeds


SVG Takes Off in 2001


Browser Support


Invent the Future!

The best way to predict the future is to invent it.
--Alan Kay


To Learn More


Questions?



Copyright 2000 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified November 7, 2000