JDOM

Intro to JDOM

Elliotte Rusty Harold

New York XML Users Group

Tuesday, August 8, 2000

elharo@metalab.unc.edu

http://metalab.unc.edu/xml/

Where we're going

A Very Brief Review of XML Rules and Terminology
Writing XML with JDOM
Reading XML through JDOM (plus a little SAX)
The JDOM Classes

Prerequisites

Are familiar with Java including I/O, classes, objects, polymorphism, collections API, etc.
Know XML including well-formedness, validity, namespaces, and so forth

A simple example

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/css" href="song.css"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG xmlns="http://metalab.unc.edu/xml/namespace/song"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO 
    xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
    ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <!-- The publisher is actually Polygram but I needed 
       an example of a general entity reference. -->
  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
<!-- You can tell what album I was 
     listening to when I wrote this example -->

View in Browser

Document Type Definition

<!ELEMENT SONG (TITLE, PHOTO?, COMPOSER+, PRODUCER*, 
 PUBLISHER*, LENGTH?, YEAR?, ARTIST+)>
<!ATTLIST SONG xmlns       CDATA #REQUIRED
               xmlns:xlink CDATA #REQUIRED>
<!ELEMENT TITLE (#PCDATA)>

<!ELEMENT PHOTO EMPTY>
<!ATTLIST PHOTO xlink:type CDATA #FIXED "simple"
                xlink:href CDATA #REQUIRED
                xlink:show CDATA #IMPLIED
                ALT        CDATA #REQUIRED
                WIDTH      CDATA #REQUIRED
                HEIGHT     CDATA #REQUIRED
>

<!ELEMENT COMPOSER (#PCDATA)>
<!ELEMENT PRODUCER (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ATTLIST PUBLISHER xlink:type CDATA #IMPLIED
                    xlink:href CDATA #IMPLIED
>

<!ELEMENT LENGTH (#PCDATA)>
<!-- This should be a four digit year like "1999",
     not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>

<!ELEMENT ARTIST (#PCDATA)>

Trees

An XML document is a tree.
It has a root.
It has nodes.
It is amenable to recursive processing.
Not all applications agree on what the root is.
Not all applications agree on what is and isn't a node.

Processing XML with Java is easy

You need a JDK
You need some free class libraries
You need a text editor
You need some data to process

What is JDOM?

A Pure Java API for reading and writing XML Documents
A Java-oriented API for reading and writing XML Documents
A tree-oriented API for reading and writing XML Documents
A parser independent API for reading and writing XML Documents

About JDOM

Created by Brett McLaughlin and Jason Hunter
Open source with an Apache-like license
http://www.jdom.org/

JDOM versions

1.0 Beta 4 is current tarball from June 9, 2000
Last two months have significantly changed the API
This presentation is based on the August 4, 2000 version
cvs.jdom.org

Four packages:

org.jdom: the classes that represent an XML document and its parts
org.jdom.input: classes for reading a document into memory
org.jdom.output: classes for writing a document onto a stream or other target (e.g. SAX or DOM app)
org.jdom.adapters: classes for hooking up to DOM implementations

The org.jdom package

The classes that represent an XML document and its parts

Attribute
Comment
DocType
Document
Element
Entity
ProcessingInstruction
plus Verifier
plus assorted exceptions

The org.jdom.input package

Classes for reading a document into memory from a file or other source

DOMBuilder
SAXBuilder

The org.jdom.output package

The classes for writing a document to a file or other target

XMLOutputter
SAXOutputter (under development)
DOMOutputter (under development)

The org.jdom.adapters package

Classes for hooking up JDOM to DOM implementations:
- AbstractDOMAdapter
- OracleV1DOMAdapter
- OracleV2DOMAdapter
- ProjectXDOMAdapter
- XercesDOMAdapter
- XercesDOMAdapter
You rarely need to access these directly.

Part II: Writing XML Documents with JDOM

JDOM is for both input and output
New documents can be read from a stream or constructed in memory
An org.jdom.output.XMLOutputter sends a document from memory to an OutputStream or Writer
A JDOM document can also be sent to a SAX ContentHandler or DOM org.w3c.dom.Document for further processing with a different API

A JDOM program that writes this XML document

<?xml version="1.0"?>
<GREETING>
  Hello JDOM!
</GREETING>

Hello JDOM

import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class HelloJDOM {

  public static void main(String[] args) {
   
    Element root = new Element("GREETING");
    	
    root.setText("Hello JDOM!");
         
    Document doc = new Document(root);      
    
    // At this point the document only exists in memory.
    // We still need to serialize it
    XMLOutputter outputter = new XMLOutputter();
    try {
      outputter.output(doc, System.out);       
    }
    catch (Exception e) {
      System.err.println(e);
    }

  }

}

Actual Output

<?xml version="1.0" encoding="UTF-8"?>
<GREETING>Hello JDOM!</GREETING>

This is more or less what we wanted, modulo white space.

Hello DOM

Here's the same program using DOM instead of JDOM. Which is simpler?

import java.io.*;
import org.w3c.dom.*;
import org.apache.xerces.dom.*;
import org.apache.xml.serialize.*; 


public class HelloDOM {

  public static void main(String[] args) {
   
    try {
      
      DOMImplementationImpl impl = (DOMImplementationImpl) 
       DOMImplementationImpl.getDOMImplementation();
       
      DocumentType type = impl.createDocumentType("GREETING", null, null);
      
      // type is supposed to be able to be null, 
      // but in practice that didn't work                     
      DocumentImpl hello 
       = (DocumentImpl) impl.createDocument(null, "GREETING", type);
      
      Element root = hello.createElement("GREETING");
      
      // We can't use a raw string. Instead we have to first create 
      // a text node.
      Text text = hello.createTextNode("Hello DOM!");
      root.appendChild(text);
      
      // Now that the document is created we need to *serialize* it
      try {
        OutputFormat format = new OutputFormat(hello);
        XMLSerializer serializer = new XMLSerializer(System.out, format);
        serializer.serialize(root);
      }
      catch (IOException e) {
        System.err.println(e); 
      }
    }
    catch (DOMException e) {
      e.printStackTrace();
    }

  }

}

A Java program that writes Fibonacci numbers into a text file

import java.math.*;
import java.io.*;


public class FibonacciText {

  public static void main(String[] args) {
   
    try {
      FileOutputStream fout = new FileOutputStream("fibonacci.txt");
      OutputStreamWriter out = new OutputStreamWriter(fout, "8859_1");      
      
      BigInteger low = BigInteger.ZERO;
      BigInteger high = BigInteger.ONE;      
      
      int i = 0;  
      do {
        out.write(low.toString() + "\r\n");
        BigInteger temp = high;
        high = high.add(low);
        low = temp;
        i++;
      }
      while (i <= 25);
      out.write(high.toString() + "\r\n");
     
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

fibonacci.txt

fibonacci.xml

Suppose we want that data in an XML document that looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<Fibonacci_Numbers>
  <fibonacci index="0">0</fibonacci>
  <fibonacci index="1">1</fibonacci>
  <fibonacci index="2">1</fibonacci>
  <fibonacci index="3">2</fibonacci>
  <fibonacci index="4">3</fibonacci>
  <fibonacci index="5">5</fibonacci>
  <fibonacci index="6">8</fibonacci>
  <fibonacci index="7">13</fibonacci>
  <fibonacci index="8">21</fibonacci>
  <fibonacci index="9">34</fibonacci>
  <fibonacci index="10">55</fibonacci>
  <fibonacci index="11">89</fibonacci>
  <fibonacci index="12">144</fibonacci>
  <fibonacci index="13">233</fibonacci>
  <fibonacci index="14">377</fibonacci>
  <fibonacci index="15">610</fibonacci>
  <fibonacci index="16">987</fibonacci>
  <fibonacci index="17">1597</fibonacci>
  <fibonacci index="18">2584</fibonacci>
  <fibonacci index="19">4181</fibonacci>
  <fibonacci index="20">6765</fibonacci>
  <fibonacci index="21">10946</fibonacci>
  <fibonacci index="22">17711</fibonacci>
  <fibonacci index="23">28657</fibonacci>
  <fibonacci index="24">46368</fibonacci>
  <fibonacci index="25">75025</fibonacci>
</Fibonacci_Numbers>

A JDOM program that writes Fibonacci numbers into an XML file

import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;


public class FibonacciJDOM {

  public static void main(String[] args) {
   
    Element root = new Element("Fibonacci_Numbers");	
  	      
    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;      
    
    for (int i = 0; i <= 25; i++) {
      Element fibonacci = new Element("fibonacci");
      fibonacci.addAttribute("index", String.valueOf(i));
      fibonacci.setText(low.toString());
      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(fibonacci);
    }
 
    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("fibonacci.xml");
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, out);
      out.flush();	
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

A DOM program that writes Fibonacci numbers into an XML file

Here's the same program using DOM instead of JDOM. Which is simpler?

import java.math.*;
import java.io.*;
import org.w3c.dom.*;
import org.apache.xerces.dom.*;
import org.apache.xml.serialize.*; 


public class FibonacciDOM {

  public static void main(String[] args) {
   
    try {
      
      DOMImplementationImpl impl = (DOMImplementationImpl) 
       DOMImplementationImpl.getDOMImplementation();
       
      DocumentType type = impl.createDocumentType("Fibonacci_Numbers",
       null, null);
      
      // type is supposed to be able to be null, 
      // but in practice that didn't work                     
      DocumentImpl fibonacci 
       = (DocumentImpl) impl.createDocument(null, "Fibonacci_Numbers", type);
      
      BigInteger low  = BigInteger.ZERO;
      BigInteger high = BigInteger.ONE;      
      
      Element root = fibonacci.createElement("Fibonacci_Numbers");
      // This not only creates the element; it also makes it the
      // root element of the document. 

      for (int i = 0; i <= 25; i++) {
        Element number = fibonacci.createElement("fibonacci");
        number.setAttribute("index", Integer.toString(i));
        Text text = fibonacci.createTextNode(low.toString());
        number.appendChild(text);
        root.appendChild(number);
        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }
      
      try {
        // Now that the document is created we need to *serialize* it
        FileOutputStream out = new FileOutputStream("fibonacci_8859_1.xml");
        OutputFormat format = new OutputFormat(fibonacci);
        XMLSerializer serializer = new XMLSerializer(out, format);
        serializer.serialize(root);
        out.flush();
        out.close();
      }
      catch (IOException e) {
        System.err.println(e); 
      }
    }
    catch (DOMException e) {
      e.printStackTrace();
    }

  }

}

62 lines vs. 42 lines

Changing Encoding

Default output encoding is UTF-8
Can change this in several ways using methods of XMLOutputter

import java.math.*;
import java.io.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class FibonacciJDOMLatin1 {

  public static void main(String[] args) {
   
    Element root = new Element("Fibonacci_Numbers");	
  	      
    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;      
    
    for (int i = 0; i <= 25; i++) {
      Element fibonacci = new Element("fibonacci");
      Attribute index = new Attribute("index", String.valueOf(i));
      fibonacci.addAttribute(index);
      fibonacci.setText(low.toString());
      BigInteger temp = high;
      high = high.add(low);
      low  = temp;
      root.addContent(fibonacci);
    }
 
    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("fibonacci_8859_1.xml");
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, out, "ISO-8859-1");
      out.flush();	
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

Suppose you want to include a DTD

Suppose we have this DTD at the relative URL fibonacci.dtd:

<!ELEMENT Fibonacci_Numbers (fibonacci*)>
<!ELEMENT fibonacci (#PCDATA)>
<!ATTLIST fibonacci index CDATA #IMPLIED>

We need this DOCTYPE declaration:

<!DOCTYPE Fibonacci_Numbers SYSTEM "fibonacci.dtd">

ValidFibonacci

Use the DocType class to insert a document type declaration
JDOM does not support internal DTD subsets.
JDOM does not let you output a DTD.

import java.math.*;
import java.io.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class ValidFibonacci {

  public static void main(String[] args) {
   
    Element root = new Element("Fibonacci_Numbers");	
  	      
    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;      
    
    for (int i = 0; i <= 25; i++) {
      Element fibonacci = new Element("fibonacci");
      Attribute index = new Attribute("index", String.valueOf(i));
      fibonacci.addAttribute(index);
      fibonacci.setText(low.toString());
      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(fibonacci);
    }
 
    DocType type = new DocType("Fibonacci_Numbers", "fibonacci.dtd");
 
    Document doc = new Document(root, type);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("validfibonacci.xml");
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, out);
      out.flush();	
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

valid_fibonacci.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Fibonacci_Numbers SYSTEM "fibonacci.dtd">
<Fibonacci_Numbers>
  <fibonacci index="0">0</fibonacci>
  <fibonacci index="1">1</fibonacci>
  <fibonacci index="2">1</fibonacci>
  <fibonacci index="3">2</fibonacci>
  <fibonacci index="4">3</fibonacci>
  <fibonacci index="5">5</fibonacci>
  <fibonacci index="6">8</fibonacci>
  <fibonacci index="7">13</fibonacci>
  <fibonacci index="8">21</fibonacci>
  <fibonacci index="9">34</fibonacci>
  <fibonacci index="10">55</fibonacci>
  <fibonacci index="11">89</fibonacci>
  <fibonacci index="12">144</fibonacci>
  <fibonacci index="13">233</fibonacci>
  <fibonacci index="14">377</fibonacci>
  <fibonacci index="15">610</fibonacci>
  <fibonacci index="16">987</fibonacci>
  <fibonacci index="17">1597</fibonacci>
  <fibonacci index="18">2584</fibonacci>
  <fibonacci index="19">4181</fibonacci>
  <fibonacci index="20">6765</fibonacci>
  <fibonacci index="21">10946</fibonacci>
  <fibonacci index="22">17711</fibonacci>
  <fibonacci index="23">28657</fibonacci>
  <fibonacci index="24">46368</fibonacci>
  <fibonacci index="25">75025</fibonacci>
</Fibonacci_Numbers>

Using Namespaces

Suppose we want some MathML like this:

<?xml version="1.0" encoding="UTF-8"?>
<mathml:math xmlns:mathml="http://www.w3.org/1998/Math/MathML">
  <mathml:mrow>
    <mathml:mi>f(0)</mathml:mi>
    <mathml:mo>=</mathml:mo>
    <mathml:mn>0</mathml:mn>
  </mathml:mrow>
  <mathml:mrow>
    <mathml:mi>f(1)</mathml:mi>
    <mathml:mo>=</mathml:mo>
    <mathml:mn>1</mathml:mn>
  </mathml:mrow>
  <mathml:mrow>
    <mathml:mi>f(2)</mathml:mi>
    <mathml:mo>=</mathml:mo>
    <mathml:mn>1</mathml:mn>
  </mathml:mrow>
</mathml:math>

Do not use the qualified names like mathml:mn.
Instead use the prefixes mathml, local names like mn, and URIs like http://www.w3.org/1998/Math/MathML to create the elements.
Do not include xmlns attributes like xmlns:mathml="http://www.w3.org/1998/Math/MathML".
XMLOutputter will decide where to put the xmlns attributes when the document is serialized.

With Namespace Prefixes

import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;


public class PrefixedFibonacci {

  public static void main(String[] args) {
   
    Element root = new Element("math", "mathml", "http://www.w3.org/1998/Math/MathML");	
  	      
    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;      
    
    for (int i = 0; i <= 25; i++) {
        
      Element mrow = new Element("mrow", "mathml", "http://www.w3.org/1998/Math/MathML");
      
      Element mi = new Element("mi", "mathml", "http://www.w3.org/1998/Math/MathML");
      mi.setText("f(" + i + ")"); 
      mrow.addContent(mi);
      
      Element mo = new Element("mo", "mathml", "http://www.w3.org/1998/Math/MathML");
      mo.setText("="); 
      mrow.addContent(mo);
      
      Element mn = new Element("mn", "mathml", "http://www.w3.org/1998/Math/MathML");
      mn.setText(low.toString());
      mrow.addContent(mn);

      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(mrow);
      
    }
 
    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("prefixed_fibonacci.xml");
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, out);
      out.flush();	
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

View Output in Browser

The Default, Unprefixed Namespace

Suppose you want some MathML like this:

<?xml version="1.0" encoding="UTF-8"?>
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mi>f(0)</mi>
    <mo>=</mo>
    <mn>0</mn>
  </mrow>
  <mrow>
    <mi>f(1)</mi>
    <mo>=</mo>
    <mn>1</mn>
  </mrow>
  <mrow>
    <mi>f(2)</mi>
    <mo>=</mo>
    <mn>1</mn>
  </mrow>
</math>

Do not use the local names like mn.
Instead use the local names like mn, and URIs like http://www.w3.org/1998/Math/MathML to create the elements.
Do not include xmlns attributes like xmlns="http://www.w3.org/1998/Math/MathML".
XMLOutputter will decide where to put the xmlns attribute when the document is serialized.

With Default Namespace

import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;


public class UnprefixedFibonacci {

  public static void main(String[] args) {
   
    Element root = new Element("math", "http://www.w3.org/1998/Math/MathML");	
  	      
    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;      
    
    for (int i = 0; i <= 25; i++) {
        
      Element mrow = new Element("mrow", "http://www.w3.org/1998/Math/MathML");
      
      Element mi = new Element("mi", "http://www.w3.org/1998/Math/MathML");
      mi.setText("f(" + i + ")"); 
      mrow.addContent(mi);
      
      Element mo = new Element("mo", "http://www.w3.org/1998/Math/MathML");
      mo.setText("="); 
      mrow.addContent(mo);
      
      Element mn = new Element("mn", "http://www.w3.org/1998/Math/MathML");
      mn.setText(low.toString());
      mrow.addContent(mn);

      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(mrow);
      
    }
 
    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("unprefixed_fibonacci.xml");
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, out);
      out.flush();	
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

View Output in Browser

Converting data to XML

Sample Tab Delimited Data: Baseball Statistics




Surname FirstName Team Position Games Played Games Started AtBats Runs Hits Doubles Triples Home runs RBI Stolen Bases Caught Stealing Sacrifice Hits Sacrifice Flies Errors PB Walks Strike outs Hit by pitch 


Anderson Garret ANA Outfield 156 151 622 62 183 41 7 15 79 8 3 3 3 6 0 29 80 1 


Baughman Justin ANA Second Base 62 54 196 24 50 9 1 1 20 10 4 5 3 8 0 6 36 1 


Bolick Frank ANA Third Base 21 11 45 3 7 2 0 1 2 0 0 0 0 0 0 11 8 0 


Disarcina Gary ANA Shortstop 157 155 551 73 158 39 3 3 56 12 7 12 3 14 0 21 51 8 


Edmonds Jim ANA Outfield 154 150 599 115 184 42 1 25 91 7 5 1 1 5 0 57 114 1 


Erstad Darin ANA Outfield 133 129 537 84 159 39 3 19 82 20 6 1 3 3 0 43 77 6 


Garcia Carlos ANA Second Base 19 10 35 4 5 1 0 0 0 2 0 1 0 1 0 3 11 1 


Glaus Troy ANA Third Base 48 45 165 19 36 9 0 1 23 1 0 0 2 7 0 15 51 0 


Greene Todd ANA Outfield 29 15 71 3 18 4 0 1 7 0 0 0 0 0 0 2 20 0 


Helfand Eric ANA Catcher 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 


Hollins Dave ANA Third Base 101 98 363 60 88 16 2 11 39 11 3 2 2 17 0 44 69 7 


Jefferies Gregg ANA Outfield 19 18 72 7 25 6 0 1 10 1 0 0 0 0 0 0 5 0 


Johnson Mark ANA First Base 10 2 14 1 1 0 0 0 0 0 0 0 0 0 0 0 6 0 


Kreuter Chad ANA Catcher 96 74 252 27 63 10 1 2 33 1 0 5 1 9 5 33 49 3 


Martin Norberto ANA Second Base 79 50 195 20 42 2 0 1 13 3 1 3 2 4 0 6 29 0 


Mashore Damon ANA Outfield 43 24 98 13 23 6 0 2 11 1 0 1 0 0 0 9 22 3 


Molina Ben ANA Catcher 2 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 


Nevin Phil ANA Catcher 75 65 237 27 54 8 1 8 27 0 0 0 2 5 20 17 67 5 


Obrien Charlie ANA Catcher 62 58 175 13 45 9 0 4 18 0 0 3 3 4 1 10 33 2 


Palmeiro Orlando ANA Outfield 74 34 165 28 53 7 2 0 21 5 4 7 0 0 0 20 11 0 


Pritchett Chris ANA First Base 31 19 80 12 23 2 1 2 8 2 0 0 0 1 0 4 16 0 


Salmon Tim ANA Designated Hitter 136 130 463 84 139 28 1 26 88 0 1 0 10 2 0 90 100 3 


Shipley Craig ANA Third Base 77 32 147 18 38 7 1 2 17 0 4 4 1 3 0 5 22 5 


Velarde Randy ANA Second Base 51 50 188 29 49 13 1 4 26 7 2 0 1 4 0 34 42 1 


Walbeck Matt ANA Catcher 108 91 338 41 87 15 2 6 46 1 1 5 5 7 8 30 68 2 


Williams Reggie ANA Outfield 29 7 36 7 13 1 0 1 5 3 3 1 0 0 0 7 11 1

A Program to convert tab delimited data to XML

import java.io.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class BaseballTabToXML {

  public static void main(String[] args) {
     
    Element root = new Element("players");
    
    try {
      FileInputStream fin = new FileInputStream(args[0]);
      BufferedReader in 
       = new BufferedReader(new InputStreamReader(fin));    

      String playerStats;  
      while ((playerStats = in.readLine()) != null) {
        String[] stats = splitLine(playerStats);
        
        Element player = new Element("player");

        Element first_name = new Element("first_name");
        first_name.setText(stats[1]);
        player.addContent(first_name);
        
        Element surname = new Element("surname");
        surname.setText(stats[0]);
        player.addContent(surname);
       
        Element games_played = new Element("games_played");
        games_played.setText(stats[4]);
        player.addContent(games_played);
       
        Element at_bats = new Element("at_bats");
        at_bats.setText(stats[6]);
        player.addContent(at_bats);
       
        Element runs = new Element("runs");
        runs.setText(stats[7]);
        player.addContent(runs);
       
        Element hits = new Element("hits");
        hits.setText(stats[8]);
        player.addContent(hits);
       
        Element doubles = new Element("doubles");
        doubles.setText(stats[9]);
        player.addContent(doubles);
       
        Element triples = new Element("triples");
        triples.setText(stats[10]);
        player.addContent(triples); 

        Element home_runs = new Element("home_runs");
        home_runs.setText(stats[11]);
        player.addContent(home_runs); 

        Element runs_batted_in = new Element("runs_batted_in");
        runs_batted_in.setText(stats[12]);
        player.addContent(runs_batted_in); 

        Element stolen_bases = new Element("stolen_bases");
        stolen_bases.setText(stats[13]);
        player.addContent(stolen_bases); 

        Element caught_stealing = new Element("caught_stealing");
        caught_stealing.setText(stats[14]);
        player.addContent(caught_stealing); 

        Element sacrifice_hits = new Element("sacrifice_hits");
        sacrifice_hits.setText(stats[15]);
        player.addContent(sacrifice_hits); 

        Element sacrifice_flies = new Element("sacrifice_flies");
        sacrifice_flies.setText(stats[16]);
        player.addContent(sacrifice_flies); 

        Element errors = new Element("errors");
        errors.setText(stats[17]);
        player.addContent(errors); 

        Element passed_by_ball = new Element("passed_by_ball");
        passed_by_ball.setText(stats[18]);
        player.addContent(passed_by_ball); 

        Element walks = new Element("walks");
        walks.setText(stats[19]);
        player.addContent(walks); 

        Element strike_outs = new Element("strike_outs");
        strike_outs.setText(stats[20]);
        player.addContent(strike_outs); 

        Element hit_by_pitch = new Element("hit_by_pitch");
        hit_by_pitch.setText(stats[21]);
        player.addContent(hit_by_pitch); 
        
        root.addContent(player);
      }  
      
      Document doc = new Document(root);
      // serialize it into a file
      FileOutputStream fout 
       = new FileOutputStream("baseballstats.xml");
      
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, fout);
      fout.flush();	
      fout.close();
      in.close();
      
    }
    catch (IOException e) {
      System.err.println(e);
    }
    catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("Usage: java BaseballTabToXML input_file.tab");
    }

  }

  public static String[] splitLine(String playerStats) {
    
    // count the number of tabs
    int numTabs = 0;
    for (int i = 0; i < playerStats.length(); i++) {
      if (playerStats.charAt(i) == '\t') numTabs++;
    }
    int numFields = numTabs + 1;
    String[] fields = new String[numFields];
    int position = 0;
    for (int i = 0; i < numFields; i++) {
      StringBuffer field = new StringBuffer();
      while (position < playerStats.length() 
       && playerStats.charAt(position++) != '\t') {
        field.append(playerStats.charAt(position-1));
      }
      fields[i] = field.toString();
    }    
    return fields;
    
  }

}

Baseball Stats in XML

<?xml version="1.0"?>
<players>
  <player>
    <first_name>FirstName</first_name>
    <surname>Surname</surname>
    <games_played>Games Played</games_played>
    <at_bats>AtBats</at_bats>
    <runs>Runs</runs>
    <hits>Hits</hits>
    <doubles>Doubles</doubles>
    <triples>Triples</triples>
    <home_runs>Home runs</home_runs>
    <stolen_bases>RBI</stolen_bases>
    <caught_stealing>Caught Stealing</caught_stealing>
    <sacrifice_hits>Sacrifice Hits</sacrifice_hits>
    <sacrifice_flies>Sacrifice Flies</sacrifice_flies>
    <errors>Errors</errors>
    <passed_by_ball>PB</passed_by_ball>
    <walks>Walks</walks>
    <strike_outs>Strike outs</strike_outs>
    <hit_by_pitch>Hit by pitch</hit_by_pitch>
  </player>
  <player>
    <first_name>Garret </first_name>
    <surname>Anderson</surname>
    <games_played>156</games_played>
    <at_bats>622</at_bats>
    <runs>62</runs>
    <hits>183</hits>
    <doubles>41</doubles>
    <triples>7</triples>
    <home_runs>15</home_runs>
    <stolen_bases>79</stolen_bases>
    <caught_stealing>3</caught_stealing>
    <sacrifice_hits>3</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>6</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>29</walks>
    <strike_outs>80</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Justin </first_name>
    <surname>Baughman</surname>
    <games_played>62</games_played>
    <at_bats>196</at_bats>
    <runs>24</runs>
    <hits>50</hits>
    <doubles>9</doubles>
    <triples>1</triples>
    <home_runs>1</home_runs>
    <stolen_bases>20</stolen_bases>
    <caught_stealing>4</caught_stealing>
    <sacrifice_hits>5</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>8</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>6</walks>
    <strike_outs>36</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Frank </first_name>
    <surname>Bolick</surname>
    <games_played>21</games_played>
    <at_bats>45</at_bats>
    <runs>3</runs>
    <hits>7</hits>
    <doubles>2</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>2</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>11</walks>
    <strike_outs>8</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Gary </first_name>
    <surname>Disarcina</surname>
    <games_played>157</games_played>
    <at_bats>551</at_bats>
    <runs>73</runs>
    <hits>158</hits>
    <doubles>39</doubles>
    <triples>3</triples>
    <home_runs>3</home_runs>
    <stolen_bases>56</stolen_bases>
    <caught_stealing>7</caught_stealing>
    <sacrifice_hits>12</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>14</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>21</walks>
    <strike_outs>51</strike_outs>
    <hit_by_pitch>8</hit_by_pitch>
  </player>
  <player>
    <first_name>Jim </first_name>
    <surname>Edmonds</surname>
    <games_played>154</games_played>
    <at_bats>599</at_bats>
    <runs>115</runs>
    <hits>184</hits>
    <doubles>42</doubles>
    <triples>1</triples>
    <home_runs>25</home_runs>
    <stolen_bases>91</stolen_bases>
    <caught_stealing>5</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>5</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>57</walks>
    <strike_outs>114</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Darin </first_name>
    <surname>Erstad</surname>
    <games_played>133</games_played>
    <at_bats>537</at_bats>
    <runs>84</runs>
    <hits>159</hits>
    <doubles>39</doubles>
    <triples>3</triples>
    <home_runs>19</home_runs>
    <stolen_bases>82</stolen_bases>
    <caught_stealing>6</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>3</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>43</walks>
    <strike_outs>77</strike_outs>
    <hit_by_pitch>6</hit_by_pitch>
  </player>
  <player>
    <first_name>Carlos </first_name>
    <surname>Garcia</surname>
    <games_played>19</games_played>
    <at_bats>35</at_bats>
    <runs>4</runs>
    <hits>5</hits>
    <doubles>1</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>1</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>3</walks>
    <strike_outs>11</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Troy </first_name>
    <surname>Glaus</surname>
    <games_played>48</games_played>
    <at_bats>165</at_bats>
    <runs>19</runs>
    <hits>36</hits>
    <doubles>9</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>23</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>7</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>15</walks>
    <strike_outs>51</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Todd </first_name>
    <surname>Greene</surname>
    <games_played>29</games_played>
    <at_bats>71</at_bats>
    <runs>3</runs>
    <hits>18</hits>
    <doubles>4</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>7</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>2</walks>
    <strike_outs>20</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Eric </first_name>
    <surname>Helfand</surname>
    <games_played>0</games_played>
    <at_bats>0</at_bats>
    <runs>0</runs>
    <hits>0</hits>
    <doubles>0</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>0</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Dave </first_name>
    <surname>Hollins</surname>
    <games_played>101</games_played>
    <at_bats>363</at_bats>
    <runs>60</runs>
    <hits>88</hits>
    <doubles>16</doubles>
    <triples>2</triples>
    <home_runs>11</home_runs>
    <stolen_bases>39</stolen_bases>
    <caught_stealing>3</caught_stealing>
    <sacrifice_hits>2</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>17</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>44</walks>
    <strike_outs>69</strike_outs>
    <hit_by_pitch>7</hit_by_pitch>
  </player>
  <player>
    <first_name>Gregg </first_name>
    <surname>Jefferies</surname>
    <games_played>19</games_played>
    <at_bats>72</at_bats>
    <runs>7</runs>
    <hits>25</hits>
    <doubles>6</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>10</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>5</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Mark </first_name>
    <surname>Johnson</surname>
    <games_played>10</games_played>
    <at_bats>14</at_bats>
    <runs>1</runs>
    <hits>1</hits>
    <doubles>0</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>6</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Chad </first_name>
    <surname>Kreuter</surname>
    <games_played>96</games_played>
    <at_bats>252</at_bats>
    <runs>27</runs>
    <hits>63</hits>
    <doubles>10</doubles>
    <triples>1</triples>
    <home_runs>2</home_runs>
    <stolen_bases>33</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>5</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>9</errors>
    <passed_by_ball>5</passed_by_ball>
    <walks>33</walks>
    <strike_outs>49</strike_outs>
    <hit_by_pitch>3</hit_by_pitch>
  </player>
  <player>
    <first_name>Norberto </first_name>
    <surname>Martin</surname>
    <games_played>79</games_played>
    <at_bats>195</at_bats>
    <runs>20</runs>
    <hits>42</hits>
    <doubles>2</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>13</stolen_bases>
    <caught_stealing>1</caught_stealing>
    <sacrifice_hits>3</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>4</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>6</walks>
    <strike_outs>29</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Damon </first_name>
    <surname>Mashore</surname>
    <games_played>43</games_played>
    <at_bats>98</at_bats>
    <runs>13</runs>
    <hits>23</hits>
    <doubles>6</doubles>
    <triples>0</triples>
    <home_runs>2</home_runs>
    <stolen_bases>11</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>9</walks>
    <strike_outs>22</strike_outs>
    <hit_by_pitch>3</hit_by_pitch>
  </player>
  <player>
    <first_name>Ben </first_name>
    <surname>Molina</surname>
    <games_played>2</games_played>
    <at_bats>1</at_bats>
    <runs>0</runs>
    <hits>0</hits>
    <doubles>0</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>0</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Phil </first_name>
    <surname>Nevin</surname>
    <games_played>75</games_played>
    <at_bats>237</at_bats>
    <runs>27</runs>
    <hits>54</hits>
    <doubles>8</doubles>
    <triples>1</triples>
    <home_runs>8</home_runs>
    <stolen_bases>27</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>5</errors>
    <passed_by_ball>20</passed_by_ball>
    <walks>17</walks>
    <strike_outs>67</strike_outs>
    <hit_by_pitch>5</hit_by_pitch>
  </player>
  <player>
    <first_name>Charlie </first_name>
    <surname>Obrien</surname>
    <games_played>62</games_played>
    <at_bats>175</at_bats>
    <runs>13</runs>
    <hits>45</hits>
    <doubles>9</doubles>
    <triples>0</triples>
    <home_runs>4</home_runs>
    <stolen_bases>18</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>3</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>4</errors>
    <passed_by_ball>1</passed_by_ball>
    <walks>10</walks>
    <strike_outs>33</strike_outs>
    <hit_by_pitch>2</hit_by_pitch>
  </player>
  <player>
    <first_name>Orlando </first_name>
    <surname>Palmeiro</surname>
    <games_played>74</games_played>
    <at_bats>165</at_bats>
    <runs>28</runs>
    <hits>53</hits>
    <doubles>7</doubles>
    <triples>2</triples>
    <home_runs>0</home_runs>
    <stolen_bases>21</stolen_bases>
    <caught_stealing>4</caught_stealing>
    <sacrifice_hits>7</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>20</walks>
    <strike_outs>11</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Chris </first_name>
    <surname>Pritchett</surname>
    <games_played>31</games_played>
    <at_bats>80</at_bats>
    <runs>12</runs>
    <hits>23</hits>
    <doubles>2</doubles>
    <triples>1</triples>
    <home_runs>2</home_runs>
    <stolen_bases>8</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>1</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>4</walks>
    <strike_outs>16</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Tim </first_name>
    <surname>Salmon</surname>
    <games_played>136</games_played>
    <at_bats>463</at_bats>
    <runs>84</runs>
    <hits>139</hits>
    <doubles>28</doubles>
    <triples>1</triples>
    <home_runs>26</home_runs>
    <stolen_bases>88</stolen_bases>
    <caught_stealing>1</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>10</sacrifice_flies>
    <errors>2</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>90</walks>
    <strike_outs>100</strike_outs>
    <hit_by_pitch>3</hit_by_pitch>
  </player>
  <player>
    <first_name>Craig </first_name>
    <surname>Shipley</surname>
    <games_played>77</games_played>
    <at_bats>147</at_bats>
    <runs>18</runs>
    <hits>38</hits>
    <doubles>7</doubles>
    <triples>1</triples>
    <home_runs>2</home_runs>
    <stolen_bases>17</stolen_bases>
    <caught_stealing>4</caught_stealing>
    <sacrifice_hits>4</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>3</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>5</walks>
    <strike_outs>22</strike_outs>
    <hit_by_pitch>5</hit_by_pitch>
  </player>
  <player>
    <first_name>Randy </first_name>
    <surname>Velarde</surname>
    <games_played>51</games_played>
    <at_bats>188</at_bats>
    <runs>29</runs>
    <hits>49</hits>
    <doubles>13</doubles>
    <triples>1</triples>
    <home_runs>4</home_runs>
    <stolen_bases>26</stolen_bases>
    <caught_stealing>2</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>4</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>34</walks>
    <strike_outs>42</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Matt </first_name>
    <surname>Walbeck</surname>
    <games_played>108</games_played>
    <at_bats>338</at_bats>
    <runs>41</runs>
    <hits>87</hits>
    <doubles>15</doubles>
    <triples>2</triples>
    <home_runs>6</home_runs>
    <stolen_bases>46</stolen_bases>
    <caught_stealing>1</caught_stealing>
    <sacrifice_hits>5</sacrifice_hits>
    <sacrifice_flies>5</sacrifice_flies>
    <errors>7</errors>
    <passed_by_ball>8</passed_by_ball>
    <walks>30</walks>
    <strike_outs>68</strike_outs>
    <hit_by_pitch>2</hit_by_pitch>
  </player>
  <player>
    <first_name>Reggie </first_name>
    <surname>Williams</surname>
    <games_played>29</games_played>
    <at_bats>36</at_bats>
    <runs>7</runs>
    <hits>13</hits>
    <doubles>1</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>5</stolen_bases>
    <caught_stealing>3</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>7</walks>
    <strike_outs>11</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
</players>

A Shortcut

import java.io.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class BaseballTabToXMLShortcut {

  public static void main(String[] args) {
     
    Element root = new Element("players");
    
    try {
      FileInputStream fin = new FileInputStream(args[0]);
      BufferedReader in 
       = new BufferedReader(new InputStreamReader(fin));    

      String playerStats;  
      while ((playerStats = in.readLine()) != null) {
        String[] stats = splitLine(playerStats);
        
        Element player = new Element("player");

        player.addContent((new Element("first_name")).setText(stats[1]));
        player.addContent((new Element("surname")).setText(stats[0]));
        player.addContent((new Element("games_played")).setText(stats[4]));
        player.addContent((new Element("at_bats")).setText(stats[6]));
        player.addContent((new Element("runs")).setText(stats[7]));
        player.addContent((new Element("hits")).setText(stats[8]));
        player.addContent((new Element("doubles")).setText(stats[9]));
        player.addContent((new Element("triples")).setText(stats[10]));
        player.addContent((new Element("home_runs")).setText(stats[11]));
        player.addContent((new Element("runs_batted_in")).setText(stats[12]));
        player.addContent((new Element("stolen_bases")).setText(stats[13]));
        player.addContent((new Element("caught_stealing")).setText(stats[14]));
        player.addContent((new Element("sacrifice_hits")).setText(stats[15]));
        player.addContent((new Element("sacrifice_flies")).setText(stats[16]));
        player.addContent((new Element("errors")).setText(stats[17]));
        player.addContent((new Element("passed_by_ball")).setText(stats[18]));
        player.addContent((new Element("walks")).setText(stats[19]));
        player.addContent((new Element("strike_outs")).setText(stats[20]));
        player.addContent((new Element("hit_by_pitch")).setText(stats[21]));
        
        root.addContent(player);
      }  
      
      Document doc = new Document(root);
      // serialize it into a file
      FileOutputStream fout 
       = new FileOutputStream("baseballstats.xml");
      
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, fout);
      fout.flush();	
      fout.close();
      in.close();
      
    }
    catch (IOException e) {
      System.err.println(e);
    }
    catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("Usage: java BaseballTabToXML input_file.tab");
    }

  }

  public static String[] splitLine(String playerStats) {
    
    // count the number of tabs
    int numTabs = 0;
    for (int i = 0; i < playerStats.length(); i++) {
      if (playerStats.charAt(i) == '\t') numTabs++;
    }
    int numFields = numTabs + 1;
    String[] fields = new String[numFields];
    int position = 0;
    for (int i = 0; i < numFields; i++) {
      StringBuffer field = new StringBuffer();
      while (position < playerStats.length() 
       && playerStats.charAt(position++) != '\t') {
        field.append(playerStats.charAt(position-1));
      }
      fields[i] = field.toString();
    }    
    return fields;
    
  }

}

Converting data to XML while Processing it

import java.io.*;
import java.text.*;
import java.util.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;

public class BattingAverage {

  public static void main(String[] args) {
     
    Element root = new Element("players");
     
    try {
      FileInputStream fin = new FileInputStream(args[0]);
      BufferedReader in 
       = new BufferedReader(new InputStreamReader(fin));
      
      String playerStats;
      
      // for formatting batting averages
      DecimalFormat averages = (DecimalFormat) 
       NumberFormat.getNumberInstance(Locale.US);
      averages.setMaximumFractionDigits(3);
      averages.setMinimumFractionDigits(3);
      averages.setMinimumIntegerDigits(0);
      
      while ((playerStats = in.readLine()) != null) {
        String[] stats = splitLine(playerStats);
        
        String formattedAverage;
        try {
          int atBats         = Integer.parseInt(stats[6]);
          int hits           = Integer.parseInt(stats[8]);
          int walks          = Integer.parseInt(stats[19]);
          int hitByPitch     = Integer.parseInt(stats[21]);
          int sacrificeFlies = Integer.parseInt(stats[16]);
          int sacrificeHits  = Integer.parseInt(stats[15]);
        
          int officialAtBats 
           = atBats - walks - hitByPitch - sacrificeHits;
          if (officialAtBats <= 0) formattedAverage = "N/A";
          else {
            double average = hits / (double) officialAtBats;
            formattedAverage = averages.format(average);
          }       
        }
        catch (Exception e) {
          // skip this player
          continue; 
        }

        Element player = new Element("player");

        Element first_name = new Element("first_name");
        first_name.setText(stats[1]);
        player.addContent(first_name);
             
        Element surname = new Element("surname");
        surname.setText(stats[0]);
        player.addContent(surname);
       
        Element battingAverage = new Element("batting_average");
        battingAverage.setText(formattedAverage);
        player.addContent(battingAverage);
   
        root.addContent(player);
        
      }  
      
      
      Document doc = new Document(root);
      // serialize it into a file
      FileOutputStream fout 
       = new FileOutputStream("battingaverages.xml");
      
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, fout);
      fout.flush();	
      fout.close();
      in.close();

    }
    catch (IOException e) {
      System.err.println(e);
    }
    catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("Usage: java BattingAverage input_file.tab");
    }

  }

  public static String[] splitLine(String playerStats) {
    
    // count the number of tabs
    int numTabs = 0;
    for (int i = 0; i < playerStats.length(); i++) {
      if (playerStats.charAt(i) == '\t') numTabs++;
    }
    int numFields = numTabs + 1;
    String[] fields = new String[numFields];
    int position = 0;
    for (int i = 0; i < numFields; i++) {
      StringBuffer field = new StringBuffer();
      while (position < playerStats.length() 
       && playerStats.charAt(position++) != '\t') {
        field.append(playerStats.charAt(position-1));
      }
      fields[i] = field.toString();
    }    
    return fields;
    
  }

}

Batting Averages in XML

<?xml version="1.0"?>
<players>
  <player>
    <first_name>Garret </first_name>
    <surname>Anderson</surname>
    <batting_average>.311</batting_average>
  </player>
  <player>
    <first_name>Justin </first_name>
    <surname>Baughman</surname>
    <batting_average>.272</batting_average>
  </player>
  <player>
    <first_name>Frank </first_name>
    <surname>Bolick</surname>
    <batting_average>.206</batting_average>
  </player>
  <player>
    <first_name>Gary </first_name>
    <surname>Disarcina</surname>
    <batting_average>.310</batting_average>
  </player>
  <player>
    <first_name>Jim </first_name>
    <surname>Edmonds</surname>
    <batting_average>.341</batting_average>
  </player>
  <player>
    <first_name>Darin </first_name>
    <surname>Erstad</surname>
    <batting_average>.326</batting_average>
  </player>
  <player>
    <first_name>Carlos </first_name>
    <surname>Garcia</surname>
    <batting_average>.167</batting_average>
  </player>
  <player>
    <first_name>Troy </first_name>
    <surname>Glaus</surname>
    <batting_average>.240</batting_average>
  </player>
  <player>
    <first_name>Todd </first_name>
    <surname>Greene</surname>
    <batting_average>.261</batting_average>
  </player>
  <player>
    <first_name>Eric </first_name>
    <surname>Helfand</surname>
    <batting_average>N/A</batting_average>
  </player>
  <player>
    <first_name>Dave </first_name>
    <surname>Hollins</surname>
    <batting_average>.284</batting_average>
  </player>
  <player>
    <first_name>Gregg </first_name>
    <surname>Jefferies</surname>
    <batting_average>.347</batting_average>
  </player>
  <player>
    <first_name>Mark </first_name>
    <surname>Johnson</surname>
    <batting_average>.071</batting_average>
  </player>
  <player>
    <first_name>Chad </first_name>
    <surname>Kreuter</surname>
    <batting_average>.299</batting_average>
  </player>
  <player>
    <first_name>Norberto </first_name>
    <surname>Martin</surname>
    <batting_average>.226</batting_average>
  </player>
  <player>
    <first_name>Damon </first_name>
    <surname>Mashore</surname>
    <batting_average>.271</batting_average>
  </player>
  <player>
    <first_name>Ben </first_name>
    <surname>Molina</surname>
    <batting_average>.000</batting_average>
  </player>
  <player>
    <first_name>Phil </first_name>
    <surname>Nevin</surname>
    <batting_average>.251</batting_average>
  </player>
  <player>
    <first_name>Charlie </first_name>
    <surname>Obrien</surname>
    <batting_average>.281</batting_average>
  </player>
  <player>
    <first_name>Orlando </first_name>
    <surname>Palmeiro</surname>
    <batting_average>.384</batting_average>
  </player>
  <player>
    <first_name>Chris </first_name>
    <surname>Pritchett</surname>
    <batting_average>.303</batting_average>
  </player>
  <player>
    <first_name>Tim </first_name>
    <surname>Salmon</surname>
    <batting_average>.376</batting_average>
  </player>
  <player>
    <first_name>Craig </first_name>
    <surname>Shipley</surname>
    <batting_average>.286</batting_average>
  </player>
  <player>
    <first_name>Randy </first_name>
    <surname>Velarde</surname>
    <batting_average>.320</batting_average>
  </player>
  <player>
    <first_name>Matt </first_name>
    <surname>Walbeck</surname>
    <batting_average>.289</batting_average>
  </player>
  <player>
    <first_name>Reggie </first_name>
    <surname>Williams</surname>
    <batting_average>.481</batting_average>
  </player>
</players>

Advantages of JDOM for Writing Documents

You don't need to worry about well-formedness rules
Pretty printing is almost automatic.
You can pick any encoding Java supports.
Validity is not automatically maintained.

Part III: Parser APIs

The stereotypical "Desperate Perl Hacker" (DPH) is supposed to be able to write an XML parser in a weekend.
The parser does the hard work for you.
Your code reads the document through by hooking up JDOM to the parser.
JDOM can connect to any parser that supports SAX or DOM.

Parser APIs

SAX, the Simple API for XML
- SAX1
- SAX2
DOM, the Document Object Model
- DOM Level 0
- DOM Level 1
- DOM Level 2
- DOM Level 3
Proprietary APIs
- Parser specific APIs
- Sun's Java API for XML Parsing = SAX1 + DOM1 + a few factory classes
- JSR-000031 XML Data Binding Specification from Bluestone, Sun, WebMethods et al.
  The proposed specification will define an XML data-binding facility for the JavaTM Platform. Such a facility compiles an XML schema into one or more Java classes. These automatically-generated classes handle the translation between XML documents that follow the schema and interrelated instances of the derived classes. They also ensure that the constraints expressed in the schema are maintained as instances of the classes are manipulated.
And of course JDOM

JDOM Compatible Parsers for Java

Any SAX or DOM compatible parser including:

Apache XML Project's Xerces Java: http://xml.apache.org/xerces-j/index.html
Oracle's XML Parser for Java: http://technet.oracle.com/tech/xml/parser_java2
Sun's Java API for XML http://java.sun.com/products/xml

SAX

Public domain, developed on xml-dev mailing list
Maintained by David Megginson
org.xml.sax package
http://www.megginson.com/SAX/
Event based
Read-only
SAX omits DTD declarations

SAX2

Adds:
- Namespace support
- Optional Validation
- Optional Lexical events for comments, CDATA sections, entity references
A lot more configurable
Deprecates a lot of SAX1
Adapter classes convert between SAX2 and SAX1 parsers.

The SAX Process

Contruct a parser-specific implementation of the XMLReader interface
Your code registers a ContentHandler with the parser
An InputSource feeds the document into the parser
As the document is read, the parser calls back to the methods of the methods of the ContentHandler to tell it what it's seeing in the document.

Event Based API Caveats

You do not always have all the information you need at the time of a given callback
You may need to store information in various data structures (stacks, queues,vectors, arrays, etc.) and act on it at a later point
For example, the characters() method is not guaranteed to give you the maximum number of contiguous characters. It may split a single run of characters over multiple method calls.

Document Object Model

Defines how XML and HTML documents are represented as objects in programs
W3C Standard
Defined in IDL; thus language independent
HTML as well as XML
Writing as well as reading
More complete than SAX or JDOM; covers everything except internal and external DTD subsets
DOM focuses more on the document; SAX focuses more on the parser.

The Design of the DOM API

Parser independent interfaces; parser dependent implementation classes. Most programs must use the parser dependent classes. JAXP helps solve this, but so far only for DOM Level 1.
Everything's a Node:
- Extensive use of polymorphism
- Lots of casting
Language independence means there's very limited use of the Java class library; Various features are reinvented
Language independence requires no method overloading because not all languages support it.
Several features are poor design in Java, if not in other languages:
- Named constants are often shorts
- Only one kind of exception; details provided by constants
- No Java-specific utility methods like equals(), hashCode(), clone(), or toString()

DOM Evolution

DOM Level 0:
DOM Level 1, a W3C Standard
DOM Level 2, a W3C Standard
DOM Level 3, eventual W3C Standard to add schema and DTD support

Eight Modules:

Eight Modules:

Core org.w3c.dom ^*

HTML org.w3c.dom.html

Views org.w3c.dom.views

StyleSheets org.w3c.dom.stylesheets

CSS org.w3c.dom.css

Events org.w3c.dom.events ^*

Traversal org.w3c.dom.traversal ^*

Range org.w3c.dom.range

Only the core and traversal modules really apply to XML. The other six are for HTML.
^* indicates Xerces support

DOM Trees

Entire document is represented as a grove of trees.
Each XML document should contain exactly one tree.
A tree contains nodes.
Some nodes may contain other nodes (depending on node type).
Each document node contains:
- zero or one doctype nodes
- one root element node
- zero or more comment and processing instruction nodes

org.w3c.dom

17 interfaces:

DOM Interface JDOM Equivalent

Attr Attribute

CDATASection

CharacterData

Comment Comment

Document Document

DocumentFragment

DocumentType DocType

DOMImplementation

Element Element

Entity Entity

EntityReference

NamedNodeMap

Node

NodeList

Notation

ProcessingInstruction ProcessingInstruction

Text java.lang.String

plus one exception: DOMException
Plus a bunch of HTML stuff in org.w3c.dom.html and other packages we will ignore

DOM Interface	JDOM Equivalent
`Attr`	`Attribute`
`CDATASection`
`CharacterData`
`Comment`	`Comment`
`Document`	`Document`
`DocumentFragment`
`DocumentType`	`DocType`
`DOMImplementation`
`Element`	`Element`
`Entity`	`Entity`
`EntityReference`
`NamedNodeMap`
`Node`
`NodeList`
`Notation`
`ProcessingInstruction`	`ProcessingInstruction`
`Text`	`java.lang.String`

The DOM Process

Library specific code creates a parser
The parser parses the document and returns an org.w3c.dom.Document object.
The entire document is stored in memory.
DOM methods and interfaces are used to extract data from this object

The JDOM Process

Construct an org.jdom.input.SAXBuilder or an org.jdom.input.DOMBuilder; no parser specific code is needed!
Invoke the builder's build() method to build a Document object from a
- Reader
- InputStream
- URL
- File
- String containing a SYSTEM ID
If there's a problem building the document, a JDOMException is thrown
Work with the resulting Document object

Parsing a Document with JDOM

import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;


public class JDOMChecker {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java JDOMChecker URL1 URL2..."); 
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        builder.build(args[i]);
        // If there are no well-formedness errors, 
        // then no exception is thrown
        System.out.println(args[i] + " is well formed.");
      }
      catch (JDOMException e) { // indicates a well-formedness or other error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

}

Parser Results

% java JDOMChecker shortlogs.xml HelloJDOM.java
shortlogs.xml is well formed.
HelloJDOM.java is not well formed.
The markup in the document preceding the root element must be well-formed.: 
Error on line 1 of XML document: The markup in the document preceding the 
root element must be well-formed.

Turning on Validation in JDOM

Not all parsers are validating but Xerces-J is.
Validity errors are not fatal; therefore they do not necessarily cause a JDOMException
However, you can tell the builder you want it to validate by passing true to its constructor:
```
    SAXBuilder builder = new SAXBuilder(true);
```

JDOM Validator

import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;


public class Validator {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java Validator URL1 URL2..."); 
    } 
      
    SAXBuilder builder = new SAXBuilder(true);
                             /*         ^^^^       */
                             /* Turn on validation */
    // start parsing... 
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
          builder.build(args[i]);
        // If there are no well-formedness errors, 
        // then no exception is thrown
        System.out.println(args[i] + " is well formed.");
      }
      catch (JDOMException e) { // indicates a well-formedness or validity error
        System.out.println(args[i] + " is not valid.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

}

Validation Output

% java Validator invalid_fibonacci.xml
invalid_fibonacci.xml is not valid.
Element type "title" must be declared.: Error on line 8 of XML document: 
Element type "title" must be declared.

% java Validator validfibonacci.xml
validfibonacci.xml is valid.

Building with DOM instead of SAX

Use DOMBuilder instead of SAXBuilder
Must have an existing DOM tree, specifically an org.w3c.dom.Document (Note the name conflict with org.jdom.Document)
DOM validation is broken as of JDOM beta 4.
In general, SAX is easier.

DOMBuilder Example

import org.jdom.*;
import org.jdom.input.DOMBuilder;
import org.apache.xerces.parsers.*;


public class DOMValidator {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java DOMValidator URL1 URL2..."); 
    }      
      
    DOMBuilder builder = new DOMBuilder(true);
                             /*         ^^^^       */
                             /* Turn on validation */
    // start parsing... 
    DOMParser parser = new DOMParser();  // Xerces specific class
    for (int i = 0; i < args.length; i++) {
        
      try {
        // Read the entire document into memory
        parser.parse(args[i]); 
    
        org.w3c.dom.Document domDoc  = parser.getDocument();
        org.jdom.Document    jdomDoc = builder.build(domDoc);

        // If there are no validity errors, 
        // then no exception is thrown
        System.out.println(args[i] + " is valid.");
      }
      catch (Exception e) { // indicates a well-formedness or validity error
        System.out.println(args[i] + " is not valid.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

}

Schemas

An XML syntax
Let you specify the contents of elements
Type derivation
Xerces can validate against schemas.
Standard is not quite finished yet.
JDOM does not specifically support this.

Part IV: Reading XML Documents

One program, three implementations:

SAX
DOM
JDOM

UserLand's RSS based list of Web logs

UserLand's RSS based list of Web logs at http://static.userland.com/weblogMonitor/logs.xml:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd">
<weblogs>
	<log>
		<name>MozillaZine</name>
		<url>http://www.mozillazine.org</url>
		<changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>
		<ownerName>Jason Kersey</ownerName>
		<ownerEmail>kerz@en.com</ownerEmail>
		<description>THE source for news on the Mozilla Organization.  DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
		<imageUrl></imageUrl>
		<adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif</adImageUrl>
		</log>
	<log>
		<name>SalonHerringWiredFool</name>
		<url>http://www.salonherringwiredfool.com/</url>
		<ownerName>Some Random Herring</ownerName>
		<ownerEmail>salonfool@wiredherring.com</ownerEmail>
		<description></description>
		</log>
	<log>
		<name>Scripting News</name>
		<url>http://www.scripting.com/</url>
		<ownerName>Dave Winer</ownerName>
		<ownerEmail>dave@userland.com</ownerEmail>
		<description>News and commentary from the cross-platform scripting community.</description>
		<imageUrl>http://www.scripting.com/gifs/tinyScriptingNews.gif</imageUrl>
		<adImageUrl>http://static.userland.com/weblogMonitor/ads/dave@userland.com.gif</adImageUrl>
		</log>
	<log>
		<name>SlashDot.Org</name>
		<url>http://www.slashdot.org/</url>
		<ownerName>Simply a friend</ownerName>
		<ownerEmail>afriendofweblogs@weblogs.com</ownerEmail>
		<description>News for Nerds, Stuff that Matters.</description>
		</log>
	</weblogs>

Full list

Goal: Return a list of all the URLs in this list as java.net.URL objects

Design Decisions

Should we return an array, an Enumeration, a List, or what?
Perhaps we should use multiple threads?

The SAX ContentHandler interface

package org.xml.sax;


public interface ContentHandler {

    public void setDocumentLocator(Locator locator);
    
    public void startDocument() throws SAXException;
    
    public void endDocument()	throws SAXException;
    
    public void startPrefixMapping(String prefix, String uri) 
     throws SAXException;

    public void endPrefixMapping(String prefix) throws SAXException;

    public void startElement(String namespaceURI, String localName,
		 String rawName, Attributes atts) throws SAXException;

    public void endElement(String namespaceURI, String localName,
     String rawName) throws SAXException;

    public void characters(char[] ch, int start, int length) 
     throws SAXException;

    public void ignorableWhitespace(char ch[], int start, int length)
     throws SAXException;

    public void processingInstruction(String target, String data)
     throws SAXException;

    public void skippedEntity(String name) throws SAXException;
     
}

SAX Design

We do not know how many URLs there will be when we start parsing so let's use a Vector
Single threaded for simplicity but a real program would use multiple threads
- One to load and parse the data
- Another thread (probably the main thread) to serve the data
- Early data could be provided before the entire document had been read
The character data of each url element needs to be stored. Everything else can be ignored.
A startElement() with the name url indicates that we need to start storing this data.
A stopElement() with the name url indicates that we need to stop storing this data, convert it to a URL and put it in the Vector
Should we hide the XML parsing inside a non-public class to avoid accidentally calling the methods from unexpected places or threads?

User Interface Class

import org.xml.sax.*;
import org.apache.xerces.parsers.*;
import java.util.*;
import java.io.*;


public class WeblogsSAX {
     
  public static List listChannels() 
   throws IOException, SAXException {
    return listChannels(
     "http://static.userland.com/weblogMonitor/logs.xml"); 
  }
  
  public static List listChannels(String uri) 
   throws IOException, SAXException {
    
    XMLReader parser = new SAXParser();
    Vector urls = new Vector(1000);
    URIGrabber u = new URIGrabber(urls);
    parser.setContentHandler(u);
    parser.parse(uri);
    return urls;
    
  }
  
  public static void main(String[] args) {
   
    try {
      List urls;
      if (args.length > 0) urls = listChannels(args[0]);
      else urls = listChannels();
      Iterator iterator = urls.iterator();
      while (iterator.hasNext()) {
        System.out.println(iterator.next()); 
      }
    }
    catch (IOException e) {
      System.err.println(e); 
    }
    catch (SAXParseException e) {
      System.err.println(e); 
      System.err.println("at line " + e.getLineNumber() 
       + ", column " + e.getColumnNumber()); 
    }
    catch (SAXException e) {
      System.err.println(e); 
    }
    catch (/* Unexpected */ Exception e) {
      e.printStackTrace(); 
    }
    
  }
  
}

ContentHandler Class

import org.xml.sax.*;
import java.net.*;
import java.util.Vector;

             // conflicts with java.net.ContentHandler
class URIGrabber implements org.xml.sax.ContentHandler {
    
  private Vector urls;
     
  URIGrabber(Vector urls) {
    this.urls = urls;
  }
    
  // do nothing methods  
  public void setDocumentLocator(Locator locator) {}
  public void startDocument() throws SAXException {}
  public void endDocument() throws SAXException {}
  public void startPrefixMapping(String prefix, String uri) 
   throws SAXException {}
  public void endPrefixMapping(String prefix) throws SAXException {}
  public void skippedEntity(String name) throws SAXException {}  
  public void ignorableWhitespace(char[] text, int start, int length)
   throws SAXException {}
  public void processingInstruction(String target, String data)
   throws SAXException {}
  
  
  // Remember, there's no guarantee all the text of the
  // url element will be returned in a single call to characters
  private StringBuffer urlBuffer;
  private boolean collecting = false;
  
  public void startElement(String namespaceURI, String localName,
   String rawName, Attributes atts) throws SAXException {
	  
    if (rawName.equals("url")) {
      collecting = true;
      urlBuffer = new StringBuffer();
    } 
    
  }
  
  public void characters(char[] text, int start, int length) 
   throws SAXException {
    
    if (collecting) {
      urlBuffer.append(text, start, length);
    } 
    
  }
  
  public void endElement(String namespaceURI, String localName,
   String rawName) throws SAXException {
	  
    if (rawName.equals("url")) {
      collecting = false;
      String url = urlBuffer.toString();
      try {
        urls.addElement(new URL(url));
      }
      catch (MalformedURLException e) {
        // skip this url
      }
    }
    
  } 
    
}

Weblogs Output

% java Weblogs shortlogs.xml
http://www.mozillazine.org
http://www.salonherringwiredfool.com/
http://www.scripting.com/
http://www.slashdot.org/

Weblogs with DOM

This example is very sequential so SAX fits it nicely
Let's look at the port to DOM

DOM Design

We cannot easily find out how many URLs there will be when we start parsing, even though they're all in memory.
Single threaded by nature; no benefit to mutiple threads since no data will be available until the entire document has been read and parsed.
The character data of each url element needs to be read. Everything else can be ignored.
We can use NodeIterator to walk the tree.
We can use NodeIterator to select only the url elements.
The XML parsing is so straight-forward it can be done inside one method. No extra class is required.

The Node Interface

package org.w3c.dom;

public interface Node {

  // NodeType
  public static final short ELEMENT_NODE                = 1;
  public static final short ATTRIBUTE_NODE              = 2;
  public static final short TEXT_NODE                   = 3;
  public static final short CDATA_SECTION_NODE          = 4;
  public static final short ENTITY_REFERENCE_NODE       = 5;
  public static final short ENTITY_NODE                 = 6;
  public static final short PROCESSING_INSTRUCTION_NODE = 7;
  public static final short COMMENT_NODE                = 8;
  public static final short DOCUMENT_NODE               = 9;
  public static final short DOCUMENT_TYPE_NODE          = 10;
  public static final short DOCUMENT_FRAGMENT_NODE      = 11;
  public static final short NOTATION_NODE               = 12;

  public String       getNodeName();
  public String       getNodeValue() throws DOMException;
  public void         setNodeValue(String nodeValue) throws DOMException;
  public short        getNodeType();
  public Node         getParentNode();
  public NodeList     getChildNodes();
  public Node         getFirstChild();
  public Node         getLastChild();
  public Node         getPreviousSibling();
  public Node         getNextSibling();
  public NamedNodeMap getAttributes();
  public Document     getOwnerDocument();
  public Node         insertBefore(Node newChild, Node refChild) throws DOMException;
  public Node         replaceChild(Node newChild, Node oldChild) throws DOMException;
  public Node         removeChild(Node oldChild) throws DOMException;
  public Node         appendChild(Node newChild) throws DOMException;
  public boolean      hasChildNodes();
  public Node         cloneNode(boolean deep);
  public void         normalize();
  public boolean      supports(String feature, String version);
  public String       getNamespaceURI();
  public String       getPrefix();
  public void         setPrefix(String prefix) throws DOMException;
  public String       getLocalName();
  
}

The NodeIterator Interface

package org.w3c.dom.traversal;

public interface NodeIterator {

  public Node       nextNode()     throws DOMException;
  public Node       previousNode() throws DOMException;
  public int        getWhatToShow();
  public NodeFilter getFilter();
  public boolean    getExpandEntityReferences();
  public void       detach();
    
}

The NodeFilter Interface

package org.w3c.dom.traversal;

public interface NodeFilter {

    // Constants returned by acceptNode
    public static final short   FILTER_ACCEPT = 1;
    public static final short   FILTER_REJECT = 2;
    public static final short   FILTER_SKIP   = 3;

    public short acceptNode(Node n);
    
    // Constants for whatToShow
    public static final int     SHOW_ALL                    = 0x0000FFFF;
    public static final int     SHOW_ELEMENT                = 0x00000001;
    public static final int     SHOW_ATTRIBUTE              = 0x00000002;
    public static final int     SHOW_TEXT                   = 0x00000004;
    public static final int     SHOW_CDATA_SECTION          = 0x00000008;
    public static final int     SHOW_ENTITY_REFERENCE       = 0x00000010;
    public static final int     SHOW_ENTITY                 = 0x00000020;
    public static final int     SHOW_PROCESSING_INSTRUCTION = 0x00000040;
    public static final int     SHOW_COMMENT                = 0x00000080;
    public static final int     SHOW_DOCUMENT               = 0x00000100;
    public static final int     SHOW_DOCUMENT_TYPE          = 0x00000200;
    public static final int     SHOW_DOCUMENT_FRAGMENT      = 0x00000400;
    public static final int     SHOW_NOTATION               = 0x00000800;

}

Weblogs with DOM

import org.w3c.dom.*;
import org.w3c.dom.traversal.*;
import org.xml.sax.SAXException;
import java.io.IOException;
import java.util.*;
import java.net.*;


public class WeblogsDOM {

  public static String DEFAULT_URL 
   = "http://static.userland.com/weblogMonitor/logs.xml"; 
     
  public static List listChannels() throws DOMException {
    return listChannels(DEFAULT_URL); 
  }
  
  public static List listChannels(String uri) throws DOMException {
    
    if (uri == null) {
      throw new NullPointerException("URL must be non-null");   
    }

    org.apache.xerces.parsers.DOMParser parser 
     = new org.apache.xerces.parsers.DOMParser();
    
    Vector urls = null;
    
    try {
      // Read the entire document into memory
      parser.parse(uri); 
      Document doc = parser.getDocument();
      org.apache.xerces.dom.DocumentImpl impl 
       = (org.apache.xerces.dom.DocumentImpl) doc;
      NodeIterator iterator = impl.createNodeIterator(doc, 
       NodeFilter.SHOW_ALL, new URLFilter(), true);
      urls = new Vector(100);

      Node current = null;
      while ((current = iterator.nextNode()) != null) {
        try {
          String content = current.getNodeValue();
          URL u = new URL(content);
          urls.addElement(u);
        }
        catch (MalformedURLException e) {
          // bad input data from one third party; just ignore it 
        }
      }
    }
    catch (SAXException e) {
      System.err.println(e); 
    }
    catch (IOException e) {
      System.err.println(e); 
    }
    
    return urls;
    
  }
  
  static class URLFilter implements NodeFilter {
        
    public short acceptNode(Node n) {
      
      if (n instanceof Text) {
        Node parent = n.getParentNode();
        if (parent instanceof Element) {
          Element e = (Element) parent;
          if (e.getTagName().equals("url")) {
            return NodeFilter.FILTER_ACCEPT;       
          }
        }
      }
      
      return NodeFilter.FILTER_REJECT;
      
    }
    
  }
    
  public static void main(String[] args) {
     
    try {
      List urls;
      if (args.length > 0) {
        try {
          URL url = new URL(args[0]);
          urls = listChannels(args[0]);
        }
        catch (MalformedURLException e) {
          System.err.println("Usage: java WeblogsJDOM url");
          return;
        }
      }
      else {
        urls = listChannels();
      }
      Iterator iterator = urls.iterator();
      while (iterator.hasNext()) {
        System.out.println(iterator.next()); 
      }
    }
    catch (/* Unexpected */ Exception e) {
      e.printStackTrace(); 
    }
    
  } // end main

}

Weblogs Output

% java WeblogsDOM
http://2020Hindsight.editthispage.com/
http://www.sff.net/people/mitchw/weblog/weblog.htp
http://nate.weblogs.com/
http://plugins.launchpoint.net
http://404.psistorm.net
http://home.att.net/~geek9000
http://daubnet.tzo.com/weblog
several hundred more...

Weblogs with JDOM

Let's look at the port to JDOM

JDOM Design

We can easily find out how many URLs there will be when we start parsing.
Single threaded by nature; no benefit to mutiple threads since no data will be available until the entire document has been read and parsed.
The character data of each url element needs to be read. Everything else can be ignored.
The format is very straight-forward so we don't need to traverse the entire tree.
The XML parsing is so straight-forward it can be done inside one method. No extra class is required.

Weblogs with JDOM

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;
import java.net.*;


public class WeblogsJDOM {
   
  public static String DEFAULT_SYSTEM_ID 
   = "http://static.userland.com/weblogMonitor/logs.xml"; 
     
  public static List listChannels() throws JDOMException {
    return listChannels(DEFAULT_SYSTEM_ID); 
  }
  
  public static List listChannels(String systemID) 
   throws JDOMException, NullPointerException {
    
    if (systemID == null) {
      throw new NullPointerException("URL must be non-null");   
    }
    
    SAXBuilder builder = new SAXBuilder();
    // Load the entire document into memory 
    // from the network or file system
    Document doc = builder.build(systemID);
    
    // Descend the tree and find the URLs. It helps that
    // the document has a very regular structure.
    Element weblogs = doc.getRootElement();
    List logs = weblogs.getChildren("log");
    Vector urls = new Vector(logs.size());
    Iterator iterator = logs.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      Element log = (Element) o;
      try {
                         // This will probably be changed to 
                         //  getElement() or getChildElement() 
        Element url = log.getChild("url"); 
        if (url == null) continue;
        String content = url.getTextTrim();
        URL u = new URL(content);
        urls.addElement(u);
      }
      catch (MalformedURLException e) {
        // bad input data from one third party; just ignore it 
      }
    }
    return urls;
    
  }
  
  public static void main(String[] args) {
   
    try {
      List urls;
      if (args.length > 0) {
        urls = listChannels(args[0]);
      }
      else {
        urls = listChannels();
      }
      Iterator iterator = urls.iterator();
      while (iterator.hasNext()) {
        System.out.println(iterator.next()); 
      }
    }
    catch (/* Unexpected */ Exception e) {
      e.printStackTrace(); 
    }
    
  }
  
}

Weblogs Output

% java WeblogsJDOM
http://2020Hindsight.editthispage.com/
http://www.sff.net/people/mitchw/weblog/weblog.htp
http://nate.weblogs.com/
http://plugins.launchpoint.net
http://404.psistorm.net
http://home.att.net/~geek9000
http://daubnet.tzo.com/weblog
several hundred more...

Part V: The org.jdom Package

The classes that represent an XML document and its parts

Document
Element
Attribute
Comment
DocType
Entity
ProcessingInstruction
Verifier
plus assorted exceptions

The Document Node

The root node containing the entire document; not the same as the root element
Contains:
- one element
- zero or more processing instructions
- zero or more comments
- zero or one document type declarations

The Document Class

package org.jdom;

public class Document implements Serializable, Cloneable {

  protected List    content;
  protected Element rootElement;
  protected DocType docType;

  protected Document() {}
  public    Document(Element rootElement) {}
  public    Document(Element rootElement, DocType docType) {}

  public Element   getRootElement() {}
  public Document  setRootElement(Element rootElement) {}
  public DocType   getDocType() {}
  public Document  setDocType(DocType docType) {}
  public List      getProcessingInstructions() {}
  public List      getProcessingInstructions(String target) {}
  public ProcessingInstruction getProcessingInstruction(String target)
    throws NoSuchProcessingInstructionException {}
  public Document  addProcessingInstruction(ProcessingInstruction pi) {}
  public Document  addProcessingInstruction(String target, String data) {}
  public Document  addProcessingInstruction(String target, Map data) {}
  public Document  setProcessingInstructions(List processingInstructions) {}
  public boolean   removeProcessingInstruction(ProcessingInstruction processingInstruction) {}
  public boolean   removeProcessingInstruction(String target) {}
  public boolean   removeProcessingInstructions(String target) {}
  public Document  addComment(Comment comment) {}
  public List      getMixedContent() {}
  
  // basic utility methods
  public final String  toString() {}
  public final String  getSerializedForm() {}  // going away
  public final boolean equals(Object ob) {}
  public final int     hashCode() {}
  public final Object  clone() {}

}

Document Example

import org.jdom.Document;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import java.io.IOException;


public class XMLPrinter {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java XMLPrinter URL1 URL2..."); 
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        System.out.println("*************" + args[i] + "*************");
        XMLOutputter outputter = new XMLOutputter();
        outputter.output(doc, System.out, "UTF-8");
      }
      catch (JDOMException e) { // indicates a well-formedness or other error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
      catch (IOException e) { // shouldn't happen beacuse System.out eats exceptions
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

}

Output from XMLPrinter

% java XMLPrinter shortlogs.xml|more
*************shortlogs.xml*************
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd">
<weblogs>

          <log>

                    <name>MozillaZine</name>

                    <url>http://www.mozillazine.org</url>

                    <changesUrl>http://www.mozillazine.org/contents.rdf</changes
Url>

                    <ownerName>Jason Kersey</ownerName>

                    <ownerEmail>kerz@en.com</ownerEmail>

                    <description>THE source for news on the Mozilla Organization
.  DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>

                    <imageUrl />
etc.

Element Nodes

Represents a complete element including its start tag, end tag, and content
Contains:
- Child Elements
- Processing Instructions
- Comments
- Text
JDOM enforces restrictions on element names and possibly values; e.g. name cannot contain start with a digit.

Element Class Implementation

The content is stored as a java.util.List which contains
- One String object per text node
- One Element object per child element
- One Comment object per comment
- One ProcessingInstruction object per processing instruction
Use the regular methods of java.util.List to add, remove, and inspect the contents of an element
Since the methods of java.util.List expect to work with Object objects, casting back to JDOM types and String is frequent
Various utility methods mean you don't always have to work with the full list.
Attributes are available as a separate List since attributes are not children.
This list only contains Attribute objects.

The Element Class

package org.jdom;

public class Element implements Serializable, Cloneable {

    protected String    name;
    protected Namespace namespace;
    protected Element   parent;
    protected boolean   isRootElement;
    protected List      attributes;
    protected List      content;

    protected Element() {}
    public    Element(String name, Namespace namespace) {}
    public    Element(String name) {}
    public    Element(String name, String uri) {}
    public    Element(String name, String prefix, String uri) {}

    public String     getName() {}
    public Namespace  getNamespace() {}
    public String     getNamespacePrefix() {}
    public String     getNamespaceURI() {}
    public String     getQualifiedName() {}
    
    public Element    getParent() {}
    protected Element setParent(Element parent) {}
    public boolean    isRootElement() {}
    protected Element setIsRootElement(boolean isRootElement) {}
    

    public String    getText() {} 
    public String    getTextTrim() {} 
    public boolean   hasMixedContent() {} 
    public List      getMixedContent() {}
    public String    getChildText(String name) {} 
    public String    getChildTextTrim(String name) {} 
    public String    getChildText(String name, Namespace ns) {} 

    public Element   setMixedContent(List mixedContent) {} 
    public List      getChildren() {} 
    public Element   setChildren(List children) {} 
    public List      getChildren(String name, Namespace ns) {} 
    // will be renamed, probably getElement() {}
    public Element   getChild(String name, Namespace ns) {} 
    public Element   getChild(String name) {} 
    public boolean   removeChild(String name) {} 
    public boolean   removeChild(String name, Namespace ns) {} 
    public boolean   removeChildren(String name) {}
    public boolean   removeChildren(String name, Namespace ns) {} 
    public boolean   removeChildren() {} 
    
    public Element   addContent(String text) {}
    public Element   addContent(Element element) {} 
    public Element   addContent(ProcessingInstruction pi) {} 
    public Element   addContent(Entity entity) {} 
    public Element   addContent(Comment comment) {} 
    public boolean   removeContent(Element element) {} 
    public boolean   removeContent(ProcessingInstruction pi) {} 
    public boolean   removeContent(Entity entity) {} 
    public boolean   removeContent(Comment comment) {} 
    
    public List      getAttributes() {} 
    public Attribute getAttribute(String name) {} 
    public Attribute getAttribute(String name, Namespace ns) {} 
    public String    getAttributeValue(String name) {} 
    public String    getAttributeValue(String name, Namespace ns) {} 
    public Element   setAttributes(List attributes) {} 
    public Element   addAttribute(Attribute attribute) {}
    public Element   addAttribute(String name, String value) {} 
    public boolean   removeAttribute(String name, String uri) {} 
    public boolean   removeAttribute(String name) {} 
    public boolean   removeAttribute(String name, Namespace ns) {} 
    
    public Element   getCopy(String name, Namespace ns) {}
    public Element   getCopy(String name, String uri) {}
    public Element   getCopy(String name, String prefix, String uri) {}

    /////////////////////////////////////////////////////////////////
    // Basic Utility Methods
    ///////////////////////////////////////////////////////////////// 
    public final String  toString() {}
    public final String  getSerializedForm() {}  // will be removed
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}
    
}

Element Example: XCount

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;


public class XCount {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java XCount URL1 URL2..."); 
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    System.out.println(
     "File\tElements\tAttributes\tComments\tProcessing Instructions\tCharacters");
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        System.out.print(args[i] + ":\t");
        String result = count(doc);
        System.out.println(result);
      }
      catch (JDOMException e) { // indicates a well-formedness or other error
        System.out.println(args[i] + " is not a well formed XML document.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }  

  private static int numCharacters             = 0;
  private static int numComments               = 0;
  private static int numElements               = 0;
  private static int numAttributes             = 0;
  private static int numProcessingInstructions = 0;
      
  public static String count(Document doc) {

    numCharacters = 0;
    numComments = 0;
    numElements = 0;
    numAttributes = 0;
    numProcessingInstructions = 0;  

    List children = doc.getMixedContent();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Element) {
        numElements++;
        count((Element) o);
      }
      else if (o instanceof Comment) numComments++;
      else if (o instanceof ProcessingInstruction) numProcessingInstructions++;   
    }
    
    String result = numElements + "\t" + numAttributes + "\t" 
     + numComments + "\t" + numProcessingInstructions + "\t" + numCharacters;
    return result;
       
  }     

  public static void count(Element element) {

    List attributes = element.getAttributes();
    numAttributes += attributes.size();
    List children = element.getMixedContent();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Element) {
        numElements++;
        count((Element) o);
      }
      else if (o instanceof Comment) numComments++;
      else if (o instanceof ProcessingInstruction) numProcessingInstructions++;   
      else if (o instanceof String) {
        String s = (String) o;
        numCharacters += s.length();
      }   
    }
        
  }  

}

XCount Output

% java XCount shortlogs.xml hotcop.xml
File    Elements        Attributes      Comments        Processing Instructions
Characters
shortlogs.xml:  30      0       0       0       736
hotcop.xml:     11      8       2       1       95

Handling Attributes in JDOM

Each attribute is represented as an Attribute object
Each Attribute has:
- A local name, a String
- A value, a String
- A Namespace object (which may be Namespace.NO_NAMESPACE)
Everything else can be determined from these three items.

Convenience methods can convert the attribute value to various types like int or double
JDOM enforces restrictions on attribute names and values; e.g. value may not contain < or >
Attributes are stored in a java.util.List in the Element that contains them
This list only contains Attribute objects.

The Attribute Class

package org.jdom;

public class Attribute implements Serializable, Cloneable {

    protected String    name;
    protected Namespace namespace;
    protected String    value;

    protected Attribute() {}
    public    Attribute(String name, String value, Namespace namespace) {}
    public    Attribute(String name, String prefix, String uri, String value) {}
    public    Attribute(String name, String value) {}

    public String    getName() {}
    public String    getQualifiedName() {}
    public String    getNamespacePrefix() {}
    public String    getNamespaceURI() {}
    public Namespace getNamespace() {}
    public String    getValue() {}
    public void      setValue(String value) {}

    /////////////////////////////////////////////////////////////////
    // Basic Utility Methods
    /////////////////////////////////////////////////////////////////

    public final String  toString() {}
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}

    /////////////////////////////////////////////////////////////////
    // Convenience Methods below here
    /////////////////////////////////////////////////////////////////

    public String  getValue(String defaultValue) {}
    public int     getIntValue(int defaultValue) {}
    public int     getIntValue() throws DataConversionException {}
    public long    getLongValue(long defaultValue) {}
    public long    getLongValue() throws DataConversionException {}
    public float   getFloatValue(float defaultValue) {}
    public float   getFloatValue() throws DataConversionException {}
    public double  getDoubleValue(double defaultValue) {}
    public double  getDoubleValue() throws DataConversionException {}
    public boolean getBooleanValue(boolean defaultValue) {}
    public boolean getBooleanValue() throws DataConversionException {}
    public char    getCharValue(char defaultValue) {}
    public char    getCharValue() throws DataConversionException {}

}

IDTagger

import java.io.IOException;
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import java.util.*;


public class IDTagger {

  private static int id = 1;

  public static void processElement(Element element) {
    

    if (element.getAttribute("ID") == null) {
      element.addAttribute(new Attribute("ID", "_" + id));
      id = id + 1; 
    }
    
    // recursion
    List children = element.getChildren();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      processElement((Element) iterator.next());   
    }
    
  }

  public static void main(String[] args) {
     
    SAXBuilder builder = new SAXBuilder();
    
    for (int i = 0; i < args.length; i++) {
        
      try {
        // Read the entire document into memory
        Document document = builder.build(args[i]); 
       
        processElement(document.getRootElement());
        
        // now we serialize the document...
        XMLOutputter serializer = new XMLOutputter(); 
        serializer.output(document, System.out);
        System.out.flush();	        
      }
      catch (JDOMException e) {
        System.err.println(e);
        continue; 
      }
      catch (IOException e) {
        System.err.println(e);
        continue; 
      }
      
    }
  
  } // end main

}

Before IDTagger

<?xml version="1.0"?><backslash
xmlns:backslash="http://slashdot.org/backslash.dtd">

 <story>
    <title>The Onion to buy the New York Times</title>
    <url>http://slashdot.org/articles/00/02/19/1128240.shtml</url>
    <time>2000-02-19 17:25:15</time>
    <author>CmdrTaco</author>
    <department>stuff-to-read</department>
    <topic>media</topic>
    <comments>20</comments>
    <section>articles</section>
    <image>topicmedia.gif</image>
  </story>
 <story>
    <title>Al Gore's Webmaster Answers Your Questions</title>
    <url>http://slashdot.org/interviews/00/02/19/0932207.shtml</url>
    <time>2000-02-19 17:00:52</time>
    <author>Roblimo</author>
    <department>political-process-online</department>
    <topic>usa</topic>
    <comments>49</comments>
    <section>interviews</section>
    <image>topicus.gif</image>
  </story>
 <story>
    <title>Open Source Africa</title>
    <url>http://slashdot.org/articles/00/02/19/1016216.shtml</url>
    <time>2000-02-19 16:05:58</time>
    <author>emmett</author>
    <department>songs-by-toto</department>
    <topic>linux</topic>
    <comments>50</comments>
    <section>articles</section>
    <image>topiclinux.gif</image>
  </story>
 <story>
    <title>Microsoft Funded by NSA, Helps Spy on Win Users?</title>
    <url>http://slashdot.org/articles/00/02/19/0750247.shtml</url>
    <time>2000-02-19 14:07:04</time>
    <author>Roblimo</author>
    <department>deep-dark-conspiracy-theories</department>
    <topic>microsoft</topic>
    <comments>154</comments>
    <section>articles</section>
    <image>topicms.gif</image>
  </story>
 <story>
    <title>X-Men Trailer Released</title>
    <url>http://slashdot.org/articles/00/02/18/0829209.shtml</url>
    <time>2000-02-19 13:47:06</time>
    <author>emmett</author>
    <department>mutant</department>
    <topic>movies</topic>
    <comments>70</comments>
    <section>articles</section>
    <image>topicmovies.gif</image>
  </story>
 <story>
    <title>Connell Replies to "Grok" Comments</title>
    <url>http://slashdot.org/articles/00/02/18/202240.shtml</url>
    <time>2000-02-19 05:01:37</time>
    <author>Hemos</author>
    <department>replying-to-things</department>
    <topic>linux</topic>
    <comments>197</comments>
    <section>articles</section>
    <image>topiclinux.gif</image>
  </story>
 <story>
    <title>etoy.com Returns</title>
    <url>http://slashdot.org/yro/00/02/18/1739216.shtml</url>
    <time>2000-02-19 02:35:06</time>
    <author>nik</author>
    <department>NP:-gimme-shelter</department>
    <topic>internet</topic>
    <comments>77</comments>
    <section>yro</section>
    <image>topicinternet.jpg</image>
  </story>
 <story>
    <title>New Propaganda Series: Rebirth</title>
    <url>http://slashdot.org/articles/00/02/18/205232.shtml</url>
    <time>2000-02-19 01:05:26</time>
    <author>Hemos</author>
    <department>as-pretty-as-always</department>
    <topic>graphics</topic>
    <comments>120</comments>
    <section>articles</section>
    <image>topicgraphics3.gif</image>
  </story>
 <story>
    <title>Giving Back</title>
    <url>http://slashdot.org/features/00/02/18/1631224.shtml</url>
    <time>2000-02-18 22:27:26</time>
    <author>emmett</author>
    <department>salvation-army</department>
    <topic>news</topic>
    <comments>122</comments>
    <section>features</section>
    <image>topicnews.gif</image>
  </story>
 <story>
    <title>Connectix Considering Open Sourcing VGS?</title>
    <url>http://slashdot.org/articles/00/02/18/1050225.shtml</url>
    <time>2000-02-18 20:46:20</time>
    <author>emmett</author>
    <department>grain-of-salt</department>
    <topic>news</topic>
    <comments>93</comments>
    <section>articles</section>
    <image>topicnews.gif</image>
  </story>
</backslash>

View Input in Browser

After IDTagger

<?xml version="1.0" encoding="UTF-8"?>
<backslash ID="_1">
  <story ID="_2">
    <title ID="_3">The Onion to buy the New York Times</title>
    <url ID="_4">http://slashdot.org/articles/00/02/19/1128240.shtml</url>
    <time ID="_5">2000-02-19 17:25:15</time>
    <author ID="_6">CmdrTaco</author>
    <department ID="_7">stuff-to-read</department>
    <topic ID="_8">media</topic>
    <comments ID="_9">20</comments>
    <section ID="_10">articles</section>
    <image ID="_11">topicmedia.gif</image>
  </story>
  <story ID="_12">
    <title ID="_13">Al Gore's Webmaster Answers Your Questions</title>
    <url ID="_14">http://slashdot.org/interviews/00/02/19/0932207.shtml</url>
    <time ID="_15">2000-02-19 17:00:52</time>
    <author ID="_16">Roblimo</author>
    <department ID="_17">political-process-online</department>
    <topic ID="_18">usa</topic>
    <comments ID="_19">49</comments>
    <section ID="_20">interviews</section>
    <image ID="_21">topicus.gif</image>
  </story>
  <story ID="_22">
    <title ID="_23">Open Source Africa</title>
    <url ID="_24">http://slashdot.org/articles/00/02/19/1016216.shtml</url>
    <time ID="_25">2000-02-19 16:05:58</time>
    <author ID="_26">emmett</author>
    <department ID="_27">songs-by-toto</department>
    <topic ID="_28">linux</topic>
    <comments ID="_29">50</comments>
    <section ID="_30">articles</section>
    <image ID="_31">topiclinux.gif</image>
  </story>
  <story ID="_32">
    <title ID="_33">Microsoft Funded by NSA, Helps Spy on Win Users?</title>
    <url ID="_34">http://slashdot.org/articles/00/02/19/0750247.shtml</url>
    <time ID="_35">2000-02-19 14:07:04</time>
    <author ID="_36">Roblimo</author>
    <department ID="_37">deep-dark-conspiracy-theories</department>
    <topic ID="_38">microsoft</topic>
    <comments ID="_39">154</comments>
    <section ID="_40">articles</section>
    <image ID="_41">topicms.gif</image>
  </story>
  <story ID="_42">
    <title ID="_43">X-Men Trailer Released</title>
    <url ID="_44">http://slashdot.org/articles/00/02/18/0829209.shtml</url>
    <time ID="_45">2000-02-19 13:47:06</time>
    <author ID="_46">emmett</author>
    <department ID="_47">mutant</department>
    <topic ID="_48">movies</topic>
    <comments ID="_49">70</comments>
    <section ID="_50">articles</section>
    <image ID="_51">topicmovies.gif</image>
  </story>
  <story ID="_52">
    <title ID="_53">Connell Replies to "Grok" Comments</title>
    <url ID="_54">http://slashdot.org/articles/00/02/18/202240.shtml</url>
    <time ID="_55">2000-02-19 05:01:37</time>
    <author ID="_56">Hemos</author>
    <department ID="_57">replying-to-things</department>
    <topic ID="_58">linux</topic>
    <comments ID="_59">197</comments>
    <section ID="_60">articles</section>
    <image ID="_61">topiclinux.gif</image>
  </story>
  <story ID="_62">
    <title ID="_63">etoy.com Returns</title>
    <url ID="_64">http://slashdot.org/yro/00/02/18/1739216.shtml</url>
    <time ID="_65">2000-02-19 02:35:06</time>
    <author ID="_66">nik</author>
    <department ID="_67">NP:-gimme-shelter</department>
    <topic ID="_68">internet</topic>
    <comments ID="_69">77</comments>
    <section ID="_70">yro</section>
    <image ID="_71">topicinternet.jpg</image>
  </story>
  <story ID="_72">
    <title ID="_73">New Propaganda Series: Rebirth</title>
    <url ID="_74">http://slashdot.org/articles/00/02/18/205232.shtml</url>
    <time ID="_75">2000-02-19 01:05:26</time>
    <author ID="_76">Hemos</author>
    <department ID="_77">as-pretty-as-always</department>
    <topic ID="_78">graphics</topic>
    <comments ID="_79">120</comments>
    <section ID="_80">articles</section>
    <image ID="_81">topicgraphics3.gif</image>
  </story>
  <story ID="_82">
    <title ID="_83">Giving Back</title>
    <url ID="_84">http://slashdot.org/features/00/02/18/1631224.shtml</url>
    <time ID="_85">2000-02-18 22:27:26</time>
    <author ID="_86">emmett</author>
    <department ID="_87">salvation-army</department>
    <topic ID="_88">news</topic>
    <comments ID="_89">122</comments>
    <section ID="_90">features</section>
    <image ID="_91">topicnews.gif</image>
  </story>
  <story ID="_92">
    <title ID="_93">Connectix Considering Open Sourcing VGS?</title>
    <url ID="_94">http://slashdot.org/articles/00/02/18/1050225.shtml</url>
    <time ID="_95">2000-02-18 20:46:20</time>
    <author ID="_96">emmett</author>
    <department ID="_97">grain-of-salt</department>
    <topic ID="_98">news</topic>
    <comments ID="_99">93</comments>
    <section ID="_100">articles</section>
    <image ID="_101">topicnews.gif</image>
  </story>
</backslash>

View Output in Browser

Handling Entities in JDOM

Unparsed entities really aren't handled at all.
In general, the parser resolves parsed entities and you never see them.
When writing, the outputter outputs entity references but not the entity's content.
The Entity class represents a parsed entity.
The API is mostly like the API of Element
This one is still being thought out.

The Entity Class

package org.jdom;

public class Entity implements Serializable, Cloneable {

    protected String name;
    protected List   content;

    protected Entity() {}
    public    Entity(String name) {}
    
    public String  getName() {}
    public String  getContent() {}
    public Entity  setContent(String textContent) {}
    public boolean hasMixedContent() {}
    public List    getMixedContent() {}
    public Entity  setMixedContent(List mixedContent) {}
    public List    getChildren() {}
    public Entity  setChildren(List children) {}
    public Entity  addChild(Element element) {}
    public Entity  addChild(String s) {}
    public Entity  addText(String text) {}
    
    public final String  toString() {}
    public final String  getSerializedForm() {} // will be removed
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}

}

Handling Comments in JDOM

A Comment object Represents a comment like this example from the XML 1.0 spec:

<!--* N.B. some readers (notably JC) find the following
paragraph awkward and redundant.  I agree it's logically redundant:
it *says* it is summarizing the logical implications of
matching the grammar, and that means by definition it's
logically redundant.  I don't think it's rhetorically
redundant or unnecessary, though, so I'm keeping it.  It
could however use some recasting when the editors are feeling
stronger. -MSM *-->

No children
JDOM checks the content to make sure it's legal (i.e. does not contain a double-hyphen)

The Comment Class

package org.jdom;

public class Comment implements Serializable, Cloneable {

    protected String text;

    protected Comment() {}
    public    Comment(String text) {}
    
    public String getText() {}
    public void   setText(String text) {}
    
    public final String  toString() {}
    public final String  getSerializedForm() {} // will be removed
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}

}

Comment Example

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;


public class CommentReader {

  public static void main(String[] args) {
     
    SAXBuilder builder = new SAXBuilder();
    
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        Document doc = builder.build(args[i]);
        List content = doc.getMixedContent();
        Iterator iterator = content.iterator();
        while (iterator.hasNext()) {
          Object o = iterator.next();
          if (o instanceof Comment) {
            Comment c = (Comment) o;
            System.out.println(c.getText());     
            System.out.println();     
          }
          else if (o instanceof Element) {
            processElement((Element) o);   
          }
        }
      }
      catch (JDOMException e) {
        System.err.println(e); 
      }
      
    }
  
  } // end main

  // note use of recursion
  public static void processElement(Element element) {
    
    List content = element.getMixedContent();
    Iterator iterator = content.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Comment) {
        Comment c = (Comment) o;
        System.out.println(c.getText());     
        System.out.println();     
      }
      else if (o instanceof Element) {
        processElement((Element) o);   
      }
    } // end while
    
  }

}

CommentReader Output

% java CommentReader hotcop.xml
 The publisher is actually Polygram but I needed
       an example of a general entity reference.

 You can tell what album I was
     listening to when I wrote this example

Or try http://www.w3.org/TR/1998/REC-xml-19980210.xml for more interesting output. (Actually this document exposes a bug in JDOM right now.????)

ProcessingInstruction Nodes

Represents a processing instruction like
<?robots index="yes" follow="no"?>
No children

Some have pseudo-attributes; some don't:

<?php 
  mysql_connect("database.unc.edu", "clerk", "password"); 
  $result = mysql("music", "SELECT LastName, FirstName FROM Employees 
    ORDER BY LastName, FirstName"); 
  $i = 0;
  while ($i < mysql_numrows ($result)) {
     $fields = mysql_fetch_row($result);
     echo "<person>$fields[1] $fields[0] </person>\r\n";
     $i++;
  }
  mysql_close();
?>

A ProcessingInstruction is represented as either
- Target and Value
- Target and Pseudo-attributes
As usual JDOM checks the contents of each processingInstruction object for well-formedness

The ProcessingInstruction Class

package org.jdom;

public class ProcessingInstruction implements Serializable, Cloneable {

    protected String target;
    protected String rawData;
    protected Map    mapData;

    protected ProcessingInstruction() {}
    public    ProcessingInstruction(String target, Map data) {}
    public    ProcessingInstruction(String target, String data) {}
    
    public String                getTarget() {}
    public String                getData() {}
    public ProcessingInstruction setData(String data) {}
    public ProcessingInstruction setData(Map data) {}
    public String                getValue(String name) {}
    public ProcessingInstruction setValue(String name, String value) {}
    public boolean               removeValue(String name) {}

    public final String toString() {}
    public final String getSerializedForm() {} // will be removed
    public final boolean equals(Object ob) {}
    public final int hashCode() {}
    public final Object clone() {}
}

XLinkSpider that Respects the robots Processing Instruction

import java.io.*;
import java.util.*;
import org.jdom.*;
import org.jdom.input.SAXBuilder;


public class XLinkSpider {

  private static SAXBuilder builder = new SAXBuilder();
  
  private static Vector visited = new Vector();
  
  private static int maxDepth = 5;
  private static int currentDepth = 0; 
  
  public static void listURIs(String systemID) {
    
    currentDepth++;
    try {
      if (currentDepth < maxDepth) {

        Document document = builder.build(systemID); 
                
        // check to see if we're allowed to spider
        boolean index = true;
        boolean follow = true;
        ProcessingInstruction robots 
         = document.getProcessingInstruction("robots");
        if (robots != null) {
          String indexValue = robots.getValue("index");
          if (indexValue.equalsIgnoreCase("no")) index = false;
          String followValue = robots.getValue("follow");
          if (followValue.equalsIgnoreCase("no")) follow = false;
        }
        Vector uris = new Vector();
        // search the document for uris, 
        // store them in vector, and print them
        if (follow) searchForURIs(document.getRootElement(), uris);
    
        Enumeration e = uris.elements();
        while (e.hasMoreElements()) {
          String uri = (String) e.nextElement();
          visited.addElement(uri);
          if (index) listURIs(uri); 
        }
      
      }
    
    }
    catch (JDOMException e) {
      // couldn't load the document, 
      // probably not well-formed XML, skip it 
    }
    finally { 
      currentDepth--;
      System.out.flush();     
    }
      
  }
  
  private static Namespace xlink = Namespace.getNamespace("http://www.w3.org/1999/xlink");
  
  // use recursion 
  public static void searchForURIs(Element element, Vector uris) {
    
    // look for XLinks in this element
    String uri = element.getAttributeValue("href", xlink);
    if (uri != null && !uri.equals("") && !visited.contains(uri) && !uris.contains(uri)) {
      System.out.println(uri);
      uris.addElement(uri);
    }
    
    // process child elements recursively
    List children = element.getChildren();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      searchForURIs((Element) iterator.next(), uris); 
    }
    
  }

  public static void main(String[] args) {
    
    if (args.length == 0) {
      System.out.println("Usage: java XLinkSpider URL1 URL2..."); 
    } 
      
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      System.err.println(args[i]); 
      listURIs(args[i]);
    } // end for
  
  } // end main

} // end XLinkSpider

Handling Namespaces

JDOM is fully namespace aware
Namespaces are represented by instances of the Namespace class rather than by attributes or raw strings
Always ask for elements and attributes by local names and namespace URIs
Elements and attributes that are not in any namespace can be asked for by local name alone
Never identify an element or attribute by qualified name

The Namespace Class

Mostly for internal parser use
Occasionally useful for tasks like finding out whether a document contains any XLinks

The Namespace Class

package org.jdom;

public final class Namespace {

  public static final Namespace NO_NAMESPACE = new Namespace("", "");

  // factory methods
  public static Namespace getNamespace(String prefix, String uri) {}
  public static Namespace getNamespace(String uri) {}

  // getter methods
  public String  getPrefix() {}
  public String  getURI() {}

  // utility methods
  public boolean equals(Object ob) {}
  public String  toString() {}
  public int     hashCode() {}

}

DocType Nodes

Represents a document type declaration
Has no children

The DocType class

package org.jdom;

public class DocType implements Serializable, Cloneable {

    protected String elementName;
    protected String publicID;
    protected String systemID;

    protected DocType() {}
    public    DocType(String rootElementName, String publicID, String systemID) {}
    public    DocType(String rootElementName, String systemID) {}
    public    DocType(String rootElementName) {}

    public String  getElementName() {}
    public String  getPublicID() {}
    public DocType setPublicID(String publicID) {}
    public String  getSystemID() {}
    public DocType setSystemID(String systemID) {}

    // Usual utility methods
    public final String  toString() {}
    public final String  getSerializedForm() {} // will be removed
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}
    
}

Example of the DocType Class

Verify that a document is correct XHTML
From the XHTML 1.0 spec:
1. It must validate against one of the three DTDs found in Appendix A.
2. The root element of the document must be <html>.
3. The root element of the document must designate the XHTML namespace using the xmlns attribute [XMLNAMES]. The namespace for XHTML is defined to be http://www.w3.org/1999/xhtml.
4. There must be a DOCTYPE declaration in the document prior to the root element. The public identifier included in the DOCTYPE declaration must reference one of the three DTDs found in Appendix A using the respective Formal Public Identifier. The system identifier may be changed to reflect local system conventions.
```
<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "DTD/xhtml1-strict.dtd">

<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "DTD/xhtml1-transitional.dtd">

<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
     "DTD/xhtml1-frameset.dtd">
```

XHTMLValidator

import java.io.*;
import org.jdom.*;
import org.jdom.input.SAXBuilder;


public class XHTMLValidator {

  public static void main(String[] args) {
    
    for (int i = 0; i < args.length; i++) {
      validate(args[i]);
    }   
    
  }

  private static SAXBuilder builder = new SAXBuilder(true);
                                                 /*  ^^^^ */
                                              /* turn on validation  */
  
  // not thread safe
  public static void validate(String source) {
        
      Document document;
      try {
        document = builder.build(source); 
      }
      catch (JDOMException e) {  
        System.out.println("Error: " + e.getMessage()); 
        e.printStackTrace();
        return; 
      }
      
      // If we get this far, then the document is valid XML.
      // Check to see whether the document is actually XHTML        
      DocType doctype = document.getDocType();
    
      if (doctype == null) {
        System.out.println("No DOCTYPE"); 
        return;
      }

      String name     = doctype.getElementName();
      String systemID = doctype.getSystemID();
      String publicID = doctype.getPublicID();
      
      if (!name.equals("html")) {
        System.out.println("Incorrect root element name " + name); 
      }
    
      if (publicID == null
       || (!publicID.equals("-//W3C//DTD XHTML 1.0 Strict//EN")
           && !publicID.equals("-//W3C//DTD XHTML 1.0 Transitional//EN")
           && !publicID.equals("-//W3C//DTD XHTML 1.0 Frameset//EN"))) {
        System.out.println(source + " does not seem to use an XHTML 1.0 DTD");
      }
    
      // Check the namespace on the root element
      Element root = document.getRootElement();
      Namespace namespace = root.getNamespace();
      String prefix = namespace.getPrefix();
      String uri = namespace.getURI();
      if (!uri.equals("http://www.w3.org/1999/xhtml")) {
        System.out.println(source 
         + " does not properly declare the http://www.w3.org/1999/xhtml namespace on the root element");        
      }
      if (!prefix.equals("")) {
        System.out.println(source 
         + " does not use the empty prefix for XHTML");        
      }
    
  }

}

Using the XHTMLValidator

% java XHTMLValidator http://www.w3.org/TR/xhtml1
Error: File "http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not found.: Error on 
line -1 of XML document: File "http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not 
found.
org.jdom.JDOMException: File "http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not 
found.: Error on line -1 of XML document: File 
"http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not found.
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:227)
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:359)
        at XHTMLValidator.validate(XHTMLValidator.java:25)
        at XHTMLValidator.main(XHTMLValidator.java:11)
Root cause: org.jdom.JDOMException: Error on line -1 of XML document: File 
"http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not found.
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:228)
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:359)
        at XHTMLValidator.validate(XHTMLValidator.java:25)
        at XHTMLValidator.main(XHTMLValidator.java:11)

The Verifier Class

Checks a variety of strings to see if they're legal for partiuclar uses in XML. as specified by XML 1.0 and Namespaces in XML.
Mostly for internal parser use

The Verifier Class

package org.jdom;

public final class Verifier {

    public static final String checkElementName(String name) {}
    public static final String checkAttributeName(String name) {}
    public static final String checkCharacterData(String text) {}
    public static final String checkNamespacePrefix(String prefix) {}
    public static final String checkNamespaceURI(String uri) {}
    public static final String checkProcessingInstructionTarget(String target) {}
    public static final String checkCommentData(String data) {}
 
    public static boolean isXMLCharacter(char c) {}
    public static boolean isXMLNameCharacter(char c) {}
    public static boolean isXMLNameStartCharacter(char c) {}
    public static boolean isXMLLetterOrDigit(char c) {}
    public static boolean isXMLLetter(char c) {}
    public static boolean isXMLCombiningChar(char c) {}
    public static boolean isXMLExtender(char c) {}
    public static boolean isXMLDigit(char c) {}

}

JDOMException

A checked exception so you must catch it
Wraps other exceptions that are thrown during JDOM operations like IOException or SAXException
Root cause of exception (if any) is accessible through the getRootCause() method:
public Throwable getRootCause()
Subclasses:
- DataConversionException
- NoSuchAttributeException
- NoSuchChildException
- NoSuchProcessingInstructionException
IllegalArgumentException subclasses:
- IllegalAddException
- IllegalDataException
- IllegalNameException
- IllegalTargetException

JDOMException Class

package org.jdom;

public class JDOMException extends Exception {

    protected Throwable rootCause;

    public JDOMException() {}
    public JDOMException(String message)  {}
    public JDOMException(String message, Throwable rootCause)  {} 
       
    public String    getMessage() {}
    public void      printStackTrace() {}
    public void      printStackTrace(PrintStream s) {}
    public void      printStackTrace(PrintWriter w) {}
    public Throwable getRootCause()  {}

}

Part V: The org.jdom.output Package

DOMOutputter
SAXOutputter
XMLOutputter

Serialization

The process of taking an in-memory JDOM Document and converting it to a stream of characters that can be written onto an output stream
The org.jdom.output.XMLOutputter class

XMLOutputter

This class is still undergoing major API changes.

package org.jdom.output;

public class XMLOutputter {

    // constructors
    public XMLOutputter() {}
    public XMLOutputter(String indent) {}
    public XMLOutputter(String indent, String lineSeparator) {}
    public XMLOutputter(String indent, String lineSeparator, String encoding) {}
    
    
    // Basic output methods
    public void output(Document doc, OutputStream out) throws IOException {}
    public void output(Document doc, OutputStream out, String encoding) 
     throws IOException {}
    
    // configuration methods
    public void setLineSeparator(String lineSeparator) {}
    public String getLineSeparator() {}
    
    // Methods for subclasses to call
    protected void indent(Writer out, int level) throws IOException {}
    protected void maybePrintln(Writer out) throws IOException  {}

    // Methods for subclasses to override
    protected void printComment(Comment comment, Writer out, int indentLevel) 
     throws IOException {}
    protected void printProcessingInstruction(ProcessingInstruction pi, Writer out, 
     int indentLevel) throws IOException {}
    protected void printDeclaration(Document doc, Writer out, String encoding)  
     throws IOException {}
    protected void printDocType(DocType docType, Writer out) throws IOException {}
    protected void printElement(Element element, Writer out, int indentLevel)
     throws IOException
    protected void printEntity(Entity entity, Writer out) throws IOException {}
    protected void printNamespace(Namespace ns, Writer out) throws IOException {}
    protected void printAttributes(List attributes, Writer out) throws IOException {}
    
}

Using the XMLOutputter Class Directly

Configured with three variables passed to the constructor:
- indent: a String added at each level of output; e.g. two spaces or a tab
- lineSeparator: the String to break lines with, no line breaking is performed if this is null or the empty string
- encoding: The name of the encoding to use for output; e.g. UTF-16 or ISO-8859-1

The output() method writes a Document onto a given OutputStream:


public void output(Document doc, OutputStream out, String encoding) throws IOException
public void output(Document doc, OutputStream out) throws IOException

Using the XMLOutputter Class Indirectly

Configured by overriding protected methods:

    protected void printComment(Comment comment, Writer out, int indentLevel) 
     throws IOException {}
    protected void printProcessingInstruction(ProcessingInstruction pi, Writer out, 
     int indentLevel) throws IOException {}
    protected void printDeclaration(Document doc, Writer out, String encoding)  
     throws IOException {}
    protected void printDocType(DocType docType, Writer out) throws IOException {}
    protected void printElement(Element element, Writer out, int indentLevel)
     throws IOException
    protected void printEntity(Entity entity, Writer out) throws IOException {}
    protected void printNamespace(Namespace ns, Writer out) throws IOException {}
    protected void printAttributes(List attributes, Writer out) throws IOException {}

JDOM based TagStripper

import org.jdom.*;
import org.jdom.output.XMLOutputter;
import org.jdom.input.SAXBuilder;
import java.io.*;
import java.util.*;


public class TagStripper extends XMLOutputter {

  public TagStripper() {
    super();
  }

  // Things we won't print at all
  protected void printDeclaration(Document doc, Writer out, String encoding) {}
  protected void printComment(Comment comment, Writer out, int indentLevel) {}
  protected void printDocType(DocType docType, Writer out) {}
  protected void printProcessingInstruction(ProcessingInstruction pi, 
   Writer out, int indentLevel) {}
  protected void printNamespace(Namespace ns, Writer out) {}
  protected void printAttributes(List attributes, Writer out) {}
  
  
  protected void printElement(Element element, Writer out, int indentLevel)
   throws IOException {
    
    List content = element.getMixedContent();
    Iterator iterator = content.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof String) {
        out.write((String) o);
        this.maybePrintln(out);
      }
      else if (o instanceof Element) {
        printElement((Element) o, out, indentLevel);
      }
    }
          
  }

  // Could easily have put main() method in a separate class
  public static void main(String[] args) {
     
    if (args.length == 0) {
      System.out.println(
       "Usage: java TagStripper URL1 URL2..."); 
    } 
      
    TagStripper stripper = new TagStripper();
    SAXBuilder builder   = new SAXBuilder();
    
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        stripper.output(doc, System.out);
      }
      catch (JDOMException e) { // a well-formedness error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
      catch (IOException e) { // a well-formedness error
        System.out.println(e.getMessage());
      }
      
    }  
  
  }

}

Output from a JDOM based TagStripper

% java TagStripper hotcop.xml
Hot Cop
Jacques Morali
Henri Belolo
Victor Willis
Jacques Morali
A & M Records
6:20
1978
Village People

Talking to DOM Programs

The process of taking an in-memory JDOM Document and converting it to an org.w3c.dom.Document object

The org.jdom.output.DOMOutputter class:


package org.jdom.output;

public class DOMOutputter {

    // constructor
    public DOMOutputter() {}
    
    // output methods
    public org.w3c.dom.Document output(Document document) {}
    public org.w3c.dom.Document output(Document document, String domAdapterClass) {}
    
    // utility methods
    public String getXmlnsTagFor(Namespace ns) {}
    protected void buildDOMTree(Object content, org.w3c.dom.Document doc, 
     org.w3c.dom.Element current, boolean atRoot) {}
    
}

Talking to SAX Programs

The process of taking an in-memory JDOM Document and walking its tree while firing off SAX events
The org.jdom.output.SAXOutputter class
Not implemented yet

Part VI: The org.jdom.input package

DOMBuilder
SAXBuilder

SAXBuilder

package org.jdom.input;

public class SAXBuilder {

  public SAXBuilder(String saxDriverClassName, boolean validate) {}
  public SAXBuilder(String saxDriverClassName) {}
  public SAXBuilder(boolean validate) {}
  public SAXBuilder() {}

  // The basic builder method
  protected Document build(InputSource in) throws JDOMException {}

  // These all just convert their arguments to a SAX InputSource
  // and call the protected build() method:
  public Document build(InputStream in) throws JDOMException {}
  public Document build(File file) throws JDOMException {}
  public Document build(URL url) throws JDOMException {}
  public Document build(InputStream in, String systemID) throws JDOMException {}
  public Document build(Reader in) throws JDOMException {}
  public Document build(Reader in, String systemID) throws JDOMException {}
  public Document build(String systemID) {}

  // Configuration methods
  public void setErrorHandler(org.xml.sax.ErrorHandler errorHandler) {}
  public void setEntityResolver(org.xml.sax.EntityResolver entityResolver) {}

}

DOMBuilder

package org.jdom.input;

public class DOMBuilder {

  public DOMBuilder(String adapterClass, boolean validate) {}
  public DOMBuilder(String adapterClass) {}
  public DOMBuilder(boolean validate) {}
  public DOMBuilder() {}

  public Document build(org.w3c.dom.Document domDocument) {}
  public Document build(InputStream in) throws JDOMException
  public Document build(File file) throws JDOMException
  public Document build(URL url) throws JDOMException
        
}

What JDOM doesn't do

Documents larger than available memory
Byte-for-byte faithful round trips
Maintaining namespace prefixes (will be added before 1.0)
DTDs
XPath Queries (may be added in 1.1)

To Learn More

This presentation: http://metalab.unc.edu/xml/slides/xmlsig/jdom/
JavaWorld: http://javaworld.com/javaworld/jw-05-2000/jw-0518-jdom.html
JDOM Web Site, http://www.jdom.org/
Java and XML, Brett McLaughlin, O'Reilly & Associates, 2000, ISBN 0-596-00016-2, http://www.oreilly.com/catalog/javaxml/

Questions?

Index | Cafe con Leche

Surname	FirstName	Team	Position	Games Played	Games Started	AtBats	Runs	Hits	Doubles	Triples	Home runs	RBI	Stolen Bases	Caught Stealing	Sacrifice Hits	Sacrifice Flies	Errors	PB	Walks	Strike outs	Hit by pitch
Anderson	Garret	ANA	Outfield	156	151	622	62	183	41	7	15	79	8	3	3	3	6	0	29	80	1
Baughman	Justin	ANA	Second Base	62	54	196	24	50	9	1	1	20	10	4	5	3	8	0	6	36	1
Bolick	Frank	ANA	Third Base	21	11	45	3	7	2	0	1	2	0	0	0	0	0	0	11	8	0
Disarcina	Gary	ANA	Shortstop	157	155	551	73	158	39	3	3	56	12	7	12	3	14	0	21	51	8
Edmonds	Jim	ANA	Outfield	154	150	599	115	184	42	1	25	91	7	5	1	1	5	0	57	114	1
Erstad	Darin	ANA	Outfield	133	129	537	84	159	39	3	19	82	20	6	1	3	3	0	43	77	6
Garcia	Carlos	ANA	Second Base	19	10	35	4	5	1	0	0	0	2	0	1	0	1	0	3	11	1
Glaus	Troy	ANA	Third Base	48	45	165	19	36	9	0	1	23	1	0	0	2	7	0	15	51	0
Greene	Todd	ANA	Outfield	29	15	71	3	18	4	0	1	7	0	0	0	0	0	0	2	20	0
Helfand	Eric	ANA	Catcher	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
Hollins	Dave	ANA	Third Base	101	98	363	60	88	16	2	11	39	11	3	2	2	17	0	44	69	7
Jefferies	Gregg	ANA	Outfield	19	18	72	7	25	6	0	1	10	1	0	0	0	0	0	0	5	0
Johnson	Mark	ANA	First Base	10	2	14	1	1	0	0	0	0	0	0	0	0	0	0	0	6	0
Kreuter	Chad	ANA	Catcher	96	74	252	27	63	10	1	2	33	1	0	5	1	9	5	33	49	3
Martin	Norberto	ANA	Second Base	79	50	195	20	42	2	0	1	13	3	1	3	2	4	0	6	29	0
Mashore	Damon	ANA	Outfield	43	24	98	13	23	6	0	2	11	1	0	1	0	0	0	9	22	3
Molina	Ben	ANA	Catcher	2	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
Nevin	Phil	ANA	Catcher	75	65	237	27	54	8	1	8	27	0	0	0	2	5	20	17	67	5
Obrien	Charlie	ANA	Catcher	62	58	175	13	45	9	0	4	18	0	0	3	3	4	1	10	33	2
Palmeiro	Orlando	ANA	Outfield	74	34	165	28	53	7	2	0	21	5	4	7	0	0	0	20	11	0
Pritchett	Chris	ANA	First Base	31	19	80	12	23	2	1	2	8	2	0	0	0	1	0	4	16	0
Salmon	Tim	ANA	Designated Hitter	136	130	463	84	139	28	1	26	88	0	1	0	10	2	0	90	100	3
Shipley	Craig	ANA	Third Base	77	32	147	18	38	7	1	2	17	0	4	4	1	3	0	5	22	5
Velarde	Randy	ANA	Second Base	51	50	188	29	49	13	1	4	26	7	2	0	1	4	0	34	42	1
Walbeck	Matt	ANA	Catcher	108	91	338	41	87	15	2	6	46	1	1	5	5	7	8	30	68	2
Williams	Reggie	ANA	Outfield	29	7	36	7	13	1	0	1	5	3	3	1	0	0	0	7	11	1

Core	`org.w3c.dom` ^*
HTML	`org.w3c.dom.html`
Views	`org.w3c.dom.views`
StyleSheets	`org.w3c.dom.stylesheets`
CSS	`org.w3c.dom.css`
Events	`org.w3c.dom.events` ^*
Traversal	`org.w3c.dom.traversal` ^*
Range	`org.w3c.dom.range`