Processing XML with Java



Elliotte Rusty Harold

OOP 2003

Friday, January 24, 2003

elharo@metalab.unc.edu

http://www.cafeconleche.org/


Where we're going


Processing XML with Java is easy


Prerequisites


XML API Styles


Parser APIs


Part I: XML Infoset

The Infoset is the unfortunate standard to which those in retreat from the radical and most useful implications of well-formedness have rallied. At its core the Infoset insists that there is 'more' to XML than the straightforward syntax of well-formedness. By imposing its canonical semantics the Infoset obviates the infinite other semantic outcomes which might be elaborated in particular unique circumstances from an instance of well-formed XML 1.0 syntax. The question we should be asking is not whether the Infoset has chosen the correct canonical semantics, but whether the syntactic possibilities of XML 1.0 should be curtailed in this way at all.
--Walter Perry on the xml-dev mailing list


A simple example

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/css" href="song.css"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG xmlns="http://www.cafeconleche.org/namespace/song"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO 
    xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
    ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <!-- The publisher is actually Polygram but I needed 
       an example of a general entity reference. -->
  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
<!-- You can tell what album I was 
     listening to when I wrote this example -->

Markup and Character Data


Markup and Character Data Example

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/css" href="song.css"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG xmlns="http://www.cafeconleche.org/namespace/song"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO 
    xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
    ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <!-- The publisher is actually Polygram but I needed 
       an example of a general entity reference. -->
  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
<!-- You can tell what album I was 
     listening to when I wrote this example -->

Elements and Tags


Entities


Parsed Character Data


CDATA sections


Comments


Processing Instructions


The XML Declaration

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

Document Type Declaration

<!DOCTYPE SONG SYSTEM "song.dtd">


Document Type Definition (DTD)

<!ELEMENT SONG (TITLE, PHOTO?, COMPOSER+, PRODUCER*, 
 PUBLISHER*, LENGTH?, YEAR?, ARTIST+)>

<!ELEMENT TITLE     (#PCDATA)>
<!ELEMENT COMPOSER  (#PCDATA)>
<!ELEMENT PRODUCER  (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ELEMENT LENGTH    (#PCDATA)>
<!-- This should be a four digit year like "1999",
     not a two-digit year like "99" -->
<!ELEMENT YEAR   (#PCDATA)>
<!ELEMENT ARTIST (#PCDATA)>
<!ELEMENT PHOTO EMPTY>
<!ATTLIST PHOTO xlink:type (simple) #FIXED "simple" 
                xlink:show (onLoad) #FIXED "onLoad" 
                xlink:href CDATA #REQUIRED
                ALT CDATA #REQUIRED
                WIDTH NMTOKEN #REQUIRED
                HEIGHT NMTOKEN #REQUIRED
>
<!ATTLIST PUBLISHER xlink:type (simple) #FIXED "simple" 
                    xlink:href CDATA #REQUIRED

>
<!ATTLIST SONG xmlns CDATA       #FIXED "http://www.cafeconleche.org/namespace/song"
               xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
>

XML Names


XML Namespaces


Namespace Syntax


Namespace URIs


Binding Prefixes to Namespace URIs


The Default Namespace


How Parsers Handle Namespaces


Questions?


Three Variations on a Theme


A normal XML document

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/css" href="song.css"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG xmlns="http://www.cafeconleche.org/namespace/song"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO 
    xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
    ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <!-- The publisher is actually Polygram but I needed 
       an example of a general entity reference. -->
  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
<!-- You can tell what album I was 
     listening to when I wrote this example -->

A canonical XML document

<?xml-stylesheet type="text/css" href="song.css"?>
<SONG xmlns="http://www.cafeconleche.org/namespace/song" xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" HEIGHT="200" WIDTH="100" xlink:href="hotcop.jpg" xlink:show="onLoad" xlink:type="simple"></PHOTO>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  
  <PUBLISHER xlink:href="http://www.amrecords.com/" xlink:type="simple">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>

An org.w3c.dom.Document object formed by reading hotcop.xml


import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMHotCop {

  public static void main(String[] args) {
     
    DOMParser parser = new DOMParser();
    
    try {
      parser.parse("http://www.cafeconleche.org/examples/hotcop.xml"); 
      Document d = parser.getDocument();
    }
    catch (SAXException e) {
      System.err.println(e); 
    }
    catch (IOException e) {
      System.err.println(e); 
    }
   
  }

}
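
The listing above depends on the Xerces-specific DOMParser class and a network connection. As a rough alternative, here is a minimal sketch that builds the same kind of org.w3c.dom.Document through the parser-independent JAXP API. The class name JAXPHotCop and the abbreviated inline document are assumptions made for illustration; they are not part of the original example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class JAXPHotCop {

  // Parses an XML string into a DOM Document with namespace support turned on
  public static Document parse(String xml)
   throws ParserConfigurationException, SAXException, IOException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true); // JAXP factories are not namespace aware by default
    DocumentBuilder builder = factory.newDocumentBuilder();
    return builder.parse(new InputSource(new StringReader(xml)));
  }

  public static void main(String[] args) throws Exception {
    // Abbreviated stand-in for hotcop.xml (hypothetical, for illustration only)
    String xml
     = "<SONG xmlns=\"http://www.cafeconleche.org/namespace/song\">"
     + "<TITLE>Hot Cop</TITLE></SONG>";
    Document d = parse(xml);
    Element root = d.getDocumentElement();
    System.out.println(root.getLocalName() + " in " + root.getNamespaceURI());
  }

}
```

Passing a URL string to builder.parse() instead of an InputSource would read the remote document, much as the Xerces listing does.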

Are these three the same thing or not?


What is the XML InfoSet?


The InfoSet defines 11 Kinds of Information Items

Not everyone agrees that this is a good thing, or that this is the right list!


The Document Information Item


Element Information Items

An Element Information Item Includes:


Attributes

xlink:type="simple"
xlink:href="http://www.amrecords.com/"
xlink:type =  "simple"
xlink:show = "onLoad"
xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit"
WIDTH=" 100 "
HEIGHT=' 200 '
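
The padded WIDTH and HEIGHT values above matter because an attribute information item carries the normalized value, and normalization depends on the declared type. The following is a small sketch, assuming a JAXP parser that reads the internal DTD subset; the class name AttributeNormalization and the cut-down DTD are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class AttributeNormalization {

  // Parses xml with a non-validating parser and returns its root element
  public static Element parseRoot(String xml) throws Exception {
    DocumentBuilder builder
     = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    return builder.parse(new InputSource(new StringReader(xml)))
     .getDocumentElement();
  }

  public static void main(String[] args) throws Exception {
    // ALT is declared CDATA, WIDTH is declared NMTOKEN (hypothetical mini DTD)
    String xml
     = "<!DOCTYPE PHOTO [\n"
     + "  <!ELEMENT PHOTO EMPTY>\n"
     + "  <!ATTLIST PHOTO ALT   CDATA   #REQUIRED\n"
     + "                  WIDTH NMTOKEN #REQUIRED>\n"
     + "]>\n"
     + "<PHOTO ALT=' Victor Willis ' WIDTH=' 100 '/>";
    Element photo = parseRoot(xml);
    // Brackets make any surviving leading or trailing spaces visible
    System.out.println("ALT:   [" + photo.getAttribute("ALT") + "]");
    System.out.println("WIDTH: [" + photo.getAttribute("WIDTH") + "]");
  }

}
```

Under the XML 1.0 normalization rules, the CDATA attribute should keep its surrounding spaces while the NMTOKEN attribute should have them stripped, so a program normally sees 100 rather than the padded literal.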

An Attribute Information Item Includes:


Comments

  <!-- The publisher is actually Polygram but I needed 
       an example of a general entity reference. -->
<!--  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG> -->
<!-- You can tell what album I was 
     listening to when I wrote this example -->

A Comment Information Item Includes:


A Processing Instruction Information Item Includes:

<?robots index="yes" follow="no"?>
<?php 
  mysql_connect("database.unc.edu", "clerk", "password"); 
  $result = mysql("CYNW", "SELECT LastName, FirstName FROM Employees 
    ORDER BY LastName, FirstName"); 
  $i = 0;
  while ($i < mysql_numrows ($result)) {
     $fields = mysql_fetch_row($result);
     echo "<person>$fields[1] $fields[0] </person>\r\n";
     $i++;
  }
  mysql_close();
?>
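
In the Infoset, a processing instruction is essentially a target plus a content string; the pseudo-attributes in the robots example have no special status. Here is a minimal sketch of how the DOM exposes those two properties. The class name PIReader and the one-line test document are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

public class PIReader {

  // Returns the first processing instruction that is a child of the
  // document node, or null if there is none
  public static ProcessingInstruction firstPI(String xml) throws Exception {
    DocumentBuilder builder
     = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document d = builder.parse(new InputSource(new StringReader(xml)));
    for (Node n = d.getFirstChild(); n != null; n = n.getNextSibling()) {
      if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
        return (ProcessingInstruction) n;
      }
    }
    return null;
  }

  public static void main(String[] args) throws Exception {
    ProcessingInstruction pi = firstPI(
     "<?xml-stylesheet type=\"text/css\" href=\"song.css\"?><SONG/>");
    System.out.println("target: " + pi.getTarget());
    // The data is one opaque string; the program must parse pseudo-attributes itself
    System.out.println("data:   " + pi.getData());
  }

}
```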

Characters


Namespace Information Items


Document Type Declaration

<!DOCTYPE SONG SYSTEM "song.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

A Document Type Declaration Information Item includes:

system identifier
public identifier
children
Only the processing instruction information items in the internal and external DTD subsets.
parent

Unparsed Entity Information Items

Each unparsed entity information item includes


The InfoSet Omits:


To Learn More


Part II: Writing XML Documents with Java

I have learned to be even more skeptical than before about the slew of APIs doing the rounds in the XML development community. An XML instance is just a document, guys; you need to understand the document structure and document interchange choreography of your systems. Don't let some API get in the way of your understanding of XML systems at the document level. If you do, you run the risk of becoming a slave to the APIs and hitting a wall when the APIs fail you.

--Sean McGrath
Read the rest in ITworld.com - XML IN PRACTICE - APIs Considered Harmful


You don't always need a new API


Unicode


Readers and Writers


A Java program that writes Fibonacci numbers into a text file

import java.math.BigInteger;
import java.io.*;


public class FibonacciText {

  public static void main(String[] args) {

    try {
      OutputStream fout = new FileOutputStream("fibonacci.txt");
      Writer out = new OutputStreamWriter(fout, "8859_1");

      BigInteger low  = BigInteger.ONE;
      BigInteger high = BigInteger.ONE;

      for (int i = 1; i <= 25; i++) {
        out.write(low.toString() + "\r\n");
        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }

      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

fibonacci.txt

1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
1597
2584
4181
6765
10946
17711
28657
46368
75025

A Java program that writes Fibonacci numbers into an XML file

import java.math.BigInteger;
import java.io.*;


public class FibonacciXML {

  public static void main(String[] args) {
   
    try {
      OutputStream  fout = new FileOutputStream("fibonacci.xml");
      Writer out = new OutputStreamWriter(fout, "UTF-8");      
      
      BigInteger low  = BigInteger.ONE;
      BigInteger high = BigInteger.ONE;      
      
      out.write("<?xml version=\"1.0\"?>\r\n");  
      out.write("<Fibonacci_Numbers>\r\n");  
      for (int i = 1; i <= 25; i++) {
        out.write("  <fibonacci index=\"" + i + "\">");
        out.write(low.toString());
        out.write("</fibonacci>\r\n");
        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }
      out.write("</Fibonacci_Numbers>");  
 
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

fibonacci.xml

<?xml version="1.0"?>
<Fibonacci_Numbers>
  <fibonacci index="1">1</fibonacci>
  <fibonacci index="2">1</fibonacci>
  <fibonacci index="3">2</fibonacci>
  <fibonacci index="4">3</fibonacci>
  <fibonacci index="5">5</fibonacci>
  <fibonacci index="6">8</fibonacci>
  <fibonacci index="7">13</fibonacci>
  <fibonacci index="8">21</fibonacci>
  <fibonacci index="9">34</fibonacci>
  <fibonacci index="10">55</fibonacci>
  <fibonacci index="11">89</fibonacci>
  <fibonacci index="12">144</fibonacci>
  <fibonacci index="13">233</fibonacci>
  <fibonacci index="14">377</fibonacci>
  <fibonacci index="15">610</fibonacci>
  <fibonacci index="16">987</fibonacci>
  <fibonacci index="17">1597</fibonacci>
  <fibonacci index="18">2584</fibonacci>
  <fibonacci index="19">4181</fibonacci>
  <fibonacci index="20">6765</fibonacci>
  <fibonacci index="21">10946</fibonacci>
  <fibonacci index="22">17711</fibonacci>
  <fibonacci index="23">28657</fibonacci>
  <fibonacci index="24">46368</fibonacci>
  <fibonacci index="25">75025</fibonacci>
</Fibonacci_Numbers>

Suppose we want to use a different encoding than UTF-8

import java.math.BigInteger;
import java.io.*;


public class FibonacciLatin1 {

  public static void main(String[] args) {
   
    try {
      OutputStream fout = new FileOutputStream("fibonacci_8859_1.xml");
      Writer out = new OutputStreamWriter(fout, "8859_1");      
      
      BigInteger low  = BigInteger.ONE;
      BigInteger high = BigInteger.ONE;      
      
      // The XML declaration needs the IANA name ISO-8859-1,
      // not Java's "8859_1" alias
      out.write("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\r\n");  
      out.write("<Fibonacci_Numbers>\r\n");  
      for (int i = 1; i <= 25; i++) {
        out.write("  <fibonacci index=\"" + i + "\">");
        out.write(low.toString());
        out.write("</fibonacci>\r\n");
        
        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }
      out.write("</Fibonacci_Numbers>");  
 
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

fibonacci_8859_1.xml

<?xml version="1.0" encoding="ISO-8859-1"?>
<Fibonacci_Numbers>
  <fibonacci index="1">1</fibonacci>
  <fibonacci index="2">1</fibonacci>
  <fibonacci index="3">2</fibonacci>
  <fibonacci index="4">3</fibonacci>
  <fibonacci index="5">5</fibonacci>
  <fibonacci index="6">8</fibonacci>
  <fibonacci index="7">13</fibonacci>
  <fibonacci index="8">21</fibonacci>
  <fibonacci index="9">34</fibonacci>
  <fibonacci index="10">55</fibonacci>
  <fibonacci index="11">89</fibonacci>
  <fibonacci index="12">144</fibonacci>
  <fibonacci index="13">233</fibonacci>
  <fibonacci index="14">377</fibonacci>
  <fibonacci index="15">610</fibonacci>
  <fibonacci index="16">987</fibonacci>
  <fibonacci index="17">1597</fibonacci>
  <fibonacci index="18">2584</fibonacci>
  <fibonacci index="19">4181</fibonacci>
  <fibonacci index="20">6765</fibonacci>
  <fibonacci index="21">10946</fibonacci>
  <fibonacci index="22">17711</fibonacci>
  <fibonacci index="23">28657</fibonacci>
  <fibonacci index="24">46368</fibonacci>
  <fibonacci index="25">75025</fibonacci>
</Fibonacci_Numbers>
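
The encoding handed to the OutputStreamWriter and the encoding named in the XML declaration have to agree, and it is easy to change one without the other. One way to keep them in step is to derive both from a single java.nio.charset.Charset, as in this sketch. The class name EncodingDeclaration and the method open() are hypothetical. Java's canonical charset names match the IANA names for the common encodings, but that is not guaranteed for every charset, so treat this as an assumption to check.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

public class EncodingDeclaration {

  // Opens a Writer on out and writes an XML declaration naming the same charset
  public static Writer open(OutputStream out, String charsetName)
   throws IOException {
    // forName() accepts Java aliases such as "8859_1"
    Charset charset = Charset.forName(charsetName);
    Writer writer = new OutputStreamWriter(out, charset);
    // name() returns the canonical name, e.g. ISO-8859-1 for the alias above
    writer.write("<?xml version=\"1.0\" encoding=\""
     + charset.name() + "\"?>\r\n");
    return writer;
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    Writer out = open(bytes, "8859_1");
    out.write("<name>caf\u00E9</name>\r\n");
    out.close();
    System.out.print(bytes.toString("ISO-8859-1"));
  }

}
```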

Suppose you want to include a DTD

import java.math.BigInteger;
import java.io.*;


public class FibonacciDTD {

  public static void main(String[] args) {
   
    try {
      OutputStream fout = new FileOutputStream("valid_fibonacci.xml");
      Writer out = new OutputStreamWriter(fout, "UTF-8");      
      
      BigInteger low  = BigInteger.ONE;
      BigInteger high = BigInteger.ONE;      
      
      out.write("<?xml version=\"1.0\"?>\r\n");  
      out.write("<!DOCTYPE Fibonacci_Numbers [\r\n");
      out.write("  <!ELEMENT Fibonacci_Numbers (fibonacci*)>\r\n");      
      out.write("  <!ELEMENT fibonacci (#PCDATA)>\r\n");      
      out.write("  <!ATTLIST fibonacci index CDATA #IMPLIED>\r\n");      
      out.write("]>\r\n");  
      out.write("<Fibonacci_Numbers>\r\n");  
      for (int i = 1; i <= 25; i++) {
        out.write("  <fibonacci index=\"" + i + "\">");
        out.write(low.toString());
        out.write("</fibonacci>\r\n");
        
        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }
      out.write("</Fibonacci_Numbers>");  
 
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

valid_fibonacci.xml

<?xml version="1.0"?>
<!DOCTYPE Fibonacci_Numbers [
  <!ELEMENT Fibonacci_Numbers (fibonacci*)>
  <!ELEMENT fibonacci (#PCDATA)>
  <!ATTLIST fibonacci index CDATA #IMPLIED>
]>
<Fibonacci_Numbers>
  <fibonacci index="1">1</fibonacci>
  <fibonacci index="2">1</fibonacci>
  <fibonacci index="3">2</fibonacci>
  <fibonacci index="4">3</fibonacci>
  <fibonacci index="5">5</fibonacci>
  <fibonacci index="6">8</fibonacci>
  <fibonacci index="7">13</fibonacci>
  <fibonacci index="8">21</fibonacci>
  <fibonacci index="9">34</fibonacci>
  <fibonacci index="10">55</fibonacci>
  <fibonacci index="11">89</fibonacci>
  <fibonacci index="12">144</fibonacci>
  <fibonacci index="13">233</fibonacci>
  <fibonacci index="14">377</fibonacci>
  <fibonacci index="15">610</fibonacci>
  <fibonacci index="16">987</fibonacci>
  <fibonacci index="17">1597</fibonacci>
  <fibonacci index="18">2584</fibonacci>
  <fibonacci index="19">4181</fibonacci>
  <fibonacci index="20">6765</fibonacci>
  <fibonacci index="21">10946</fibonacci>
  <fibonacci index="22">17711</fibonacci>
  <fibonacci index="23">28657</fibonacci>
  <fibonacci index="24">46368</fibonacci>
  <fibonacci index="25">75025</fibonacci>
</Fibonacci_Numbers>
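
Writing a DTD into the output does not by itself prove the document is valid. A quick way to check is to read the result back with a validating parser. This is a minimal sketch using JAXP; the class name ValidityCheck and the deliberately invalid lucas element are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ValidityCheck {

  // Returns true if xml is well-formed and valid against its own DTD
  public static boolean isValid(String xml)
   throws ParserConfigurationException, IOException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(true); // DTD validation is off by default
    DocumentBuilder builder = factory.newDocumentBuilder();
    // Validity errors go to the handler instead of being thrown,
    // so rethrow them to make parse() stop at the first problem
    builder.setErrorHandler(new ErrorHandler() {
      public void warning(SAXParseException e) {}
      public void error(SAXParseException e) throws SAXException { throw e; }
      public void fatalError(SAXParseException e) throws SAXException { throw e; }
    });
    try {
      builder.parse(new InputSource(new StringReader(xml)));
      return true;
    }
    catch (SAXException e) {
      return false;
    }
  }

  public static void main(String[] args) throws Exception {
    String dtd
     = "<!DOCTYPE Fibonacci_Numbers [\r\n"
     + "  <!ELEMENT Fibonacci_Numbers (fibonacci*)>\r\n"
     + "  <!ELEMENT fibonacci (#PCDATA)>\r\n"
     + "  <!ATTLIST fibonacci index CDATA #IMPLIED>\r\n"
     + "]>\r\n";
    System.out.println(isValid(dtd
     + "<Fibonacci_Numbers><fibonacci index=\"1\">1</fibonacci></Fibonacci_Numbers>"));
    // lucas is not declared in the DTD, so this document should be rejected
    System.out.println(isValid(dtd
     + "<Fibonacci_Numbers><lucas>2</lucas></Fibonacci_Numbers>"));
  }

}
```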

Questions?


Converting data to XML


Sample Tab Delimited Data: Baseball Statistics

Surname	FirstName	Team	Position	Games Played	Games Started	AtBats	Runs	Hits	Doubles	Triples	Home runs	RBI	Stolen Bases	Caught Stealing	Sacrifice Hits	Sacrifice Flies	Errors	PB	Walks	Strike outs	Hit by pitch
AndersonGarret ANAOutfield15615162262183417157983336029801
BaughmanJustin ANASecond Base625419624509112010453806361
BolickFrank ANAThird Base2111453720120000001180
DisarcinaGary ANAShortstop1571555517315839335612712314021518
EdmondsJim ANAOutfield1541505991151844212591751150571141
ErstadDarin ANAOutfield133129537841593931982206133043776
GarciaCarlos ANASecond Base1910354510002010103111
GlausTroy ANAThird Base484516519369012310027015510
GreeneTodd ANAOutfield29157131840170000002200
HelfandEric ANACatcher000000000000000000
HollinsDave ANAThird Base10198363608816211391132217044697
JefferiesGregg ANAOutfield19187272560110100000050
JohnsonMark ANAFirst Base10214110000000000060
KreuterChad ANACatcher9674252276310123310519533493
MartinNorberto ANASecond Base79501952042201133132406290
MashoreDamon ANAOutfield4324981323602111010009223
MolinaBen ANACatcher201000000000000000
NevinPhil ANACatcher7565237275481827000252017675
ObrienCharlie ANACatcher625817513459041800334110332
PalmeiroOrlando ANAOutfield743416528537202154700020110
PritchettChris ANAFirst Base311980122321282000104160
SalmonTim ANADesignated Hitter1361304638413928126880101020901003
ShipleyCraig ANAThird Base77321471838712170441305225
VelardeRandy ANASecond Base5150188294913142672014034421
WalbeckMatt ANACatcher10891338418715264611557830682
WilliamsReggie ANAOutfield2973671310153310007111

A Program to convert tab delimited data to XML

import java.io.*;


public class BaseballTabToXML {

  public static void main(String[] args) {
     
    try {
      FileInputStream fin = new FileInputStream(args[0]);
      BufferedReader in 
       = new BufferedReader(new InputStreamReader(fin));
      
      FileOutputStream fout 
       = new FileOutputStream("baseballstats.xml");
      Writer out = new OutputStreamWriter(fout, "UTF-8");      
      out.write("<?xml version=\"1.0\"?>\r\n");  
      out.write("<players>\r\n");
      String playerStats;  
      while ((playerStats = in.readLine()) != null) {
        String[] stats = splitLine(playerStats);         
        out.write("  <player>\r\n");
          out.write("    <first_name>" + stats[1] + "</first_name>\r\n");
          out.write("    <surname>" + stats[0] + "</surname>\r\n");
          out.write("    <games_played>" + stats[4] + "</games_played>\r\n");
          out.write("    <at_bats>" + stats[6] + "</at_bats>\r\n");
          out.write("    <runs>" + stats[7] + "</runs>\r\n");
          out.write("    <hits>" + stats[8] + "</hits>\r\n");
          out.write("    <doubles>" + stats[9] + "</doubles>\r\n");
          out.write("    <triples>" + stats[10] + "</triples>\r\n");
          out.write("    <home_runs>" + stats[11] + "</home_runs>\r\n");
          // Field 12 is RBI; stolen bases are field 13
          out.write("    <stolen_bases>" + stats[13] + "</stolen_bases>\r\n");
          out.write("    <caught_stealing>" + stats[14] + "</caught_stealing>\r\n");
          out.write("    <sacrifice_hits>" + stats[15] + "</sacrifice_hits>\r\n");
          out.write("    <sacrifice_flies>" + stats[16] + "</sacrifice_flies>\r\n");
          out.write("    <errors>" + stats[17] + "</errors>\r\n");
          out.write("    <passed_by_ball>" + stats[18] + "</passed_by_ball>\r\n");
          out.write("    <walks>" + stats[19] + "</walks>\r\n");
          out.write("    <strike_outs>" + stats[20] + "</strike_outs>\r\n");
          out.write("    <hit_by_pitch>" + stats[21] + "</hit_by_pitch>\r\n");
        out.write("  </player>\r\n");
      }  
      out.write("</players>\r\n");  
      out.close();
      in.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }
    catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("Usage: java BaseballTabToXML input_file.tab");
    }

  }

  public static String[] splitLine(String playerStats) {
    
    // count the number of tabs
    int numTabs = 0;
    for (int i = 0; i < playerStats.length(); i++) {
      if (playerStats.charAt(i) == '\t') numTabs++;
    }
    int numFields = numTabs + 1;
    String[] fields = new String[numFields];
    int position = 0;
    for (int i = 0; i < numFields; i++) {
      StringBuffer field = new StringBuffer();
      while (position < playerStats.length() 
       && playerStats.charAt(position++) != '\t') {
        field.append(playerStats.charAt(position-1));
      }
      fields[i] = field.toString();
    }    
    return fields;
    
  }

}
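
BaseballTabToXML copies each field into the output unchanged. That works for this data, but a field containing & or < would make the result malformed. Here is a minimal sketch of an escaping helper that each stats[i] could be passed through before it is written; the class name XMLText and the method escapeText() are hypothetical and not part of the original program.

```java
public class XMLText {

  // Escapes the characters that cannot appear literally in element content
  public static String escapeText(String s) {
    StringBuilder result = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '&': result.append("&amp;"); break;
        case '<': result.append("&lt;");  break;
        // > only has to be escaped after ]], but escaping it always is harmless
        case '>': result.append("&gt;");  break;
        default:  result.append(c);
      }
    }
    return result.toString();
  }

  public static void main(String[] args) {
    System.out.println("<publisher>" + escapeText("A & M Records")
     + "</publisher>");
  }

}
```

Attribute values would additionally need the quote character escaped; this sketch covers element content only.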

Baseball Stats in XML

<?xml version="1.0"?>
<players>
  <player>
    <first_name>FirstName</first_name>
    <surname>Surname</surname>
    <games_played>Games Played</games_played>
    <at_bats>AtBats</at_bats>
    <runs>Runs</runs>
    <hits>Hits</hits>
    <doubles>Doubles</doubles>
    <triples>Triples</triples>
    <home_runs>Home runs</home_runs>
    <stolen_bases>RBI</stolen_bases>
    <caught_stealing>Caught Stealing</caught_stealing>
    <sacrifice_hits>Sacrifice Hits</sacrifice_hits>
    <sacrifice_flies>Sacrifice Flies</sacrifice_flies>
    <errors>Errors</errors>
    <passed_by_ball>PB</passed_by_ball>
    <walks>Walks</walks>
    <strike_outs>Strike outs</strike_outs>
    <hit_by_pitch>Hit by pitch</hit_by_pitch>
  </player>
  <player>
    <first_name>Garret </first_name>
    <surname>Anderson</surname>
    <games_played>156</games_played>
    <at_bats>622</at_bats>
    <runs>62</runs>
    <hits>183</hits>
    <doubles>41</doubles>
    <triples>7</triples>
    <home_runs>15</home_runs>
    <stolen_bases>79</stolen_bases>
    <caught_stealing>3</caught_stealing>
    <sacrifice_hits>3</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>6</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>29</walks>
    <strike_outs>80</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Justin </first_name>
    <surname>Baughman</surname>
    <games_played>62</games_played>
    <at_bats>196</at_bats>
    <runs>24</runs>
    <hits>50</hits>
    <doubles>9</doubles>
    <triples>1</triples>
    <home_runs>1</home_runs>
    <stolen_bases>20</stolen_bases>
    <caught_stealing>4</caught_stealing>
    <sacrifice_hits>5</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>8</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>6</walks>
    <strike_outs>36</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Frank </first_name>
    <surname>Bolick</surname>
    <games_played>21</games_played>
    <at_bats>45</at_bats>
    <runs>3</runs>
    <hits>7</hits>
    <doubles>2</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>2</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>11</walks>
    <strike_outs>8</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Gary </first_name>
    <surname>Disarcina</surname>
    <games_played>157</games_played>
    <at_bats>551</at_bats>
    <runs>73</runs>
    <hits>158</hits>
    <doubles>39</doubles>
    <triples>3</triples>
    <home_runs>3</home_runs>
    <stolen_bases>56</stolen_bases>
    <caught_stealing>7</caught_stealing>
    <sacrifice_hits>12</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>14</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>21</walks>
    <strike_outs>51</strike_outs>
    <hit_by_pitch>8</hit_by_pitch>
  </player>
  <player>
    <first_name>Jim </first_name>
    <surname>Edmonds</surname>
    <games_played>154</games_played>
    <at_bats>599</at_bats>
    <runs>115</runs>
    <hits>184</hits>
    <doubles>42</doubles>
    <triples>1</triples>
    <home_runs>25</home_runs>
    <stolen_bases>91</stolen_bases>
    <caught_stealing>5</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>5</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>57</walks>
    <strike_outs>114</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Darin </first_name>
    <surname>Erstad</surname>
    <games_played>133</games_played>
    <at_bats>537</at_bats>
    <runs>84</runs>
    <hits>159</hits>
    <doubles>39</doubles>
    <triples>3</triples>
    <home_runs>19</home_runs>
    <stolen_bases>82</stolen_bases>
    <caught_stealing>6</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>3</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>43</walks>
    <strike_outs>77</strike_outs>
    <hit_by_pitch>6</hit_by_pitch>
  </player>
  <player>
    <first_name>Carlos </first_name>
    <surname>Garcia</surname>
    <games_played>19</games_played>
    <at_bats>35</at_bats>
    <runs>4</runs>
    <hits>5</hits>
    <doubles>1</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>1</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>3</walks>
    <strike_outs>11</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Troy </first_name>
    <surname>Glaus</surname>
    <games_played>48</games_played>
    <at_bats>165</at_bats>
    <runs>19</runs>
    <hits>36</hits>
    <doubles>9</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>23</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>7</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>15</walks>
    <strike_outs>51</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Todd </first_name>
    <surname>Greene</surname>
    <games_played>29</games_played>
    <at_bats>71</at_bats>
    <runs>3</runs>
    <hits>18</hits>
    <doubles>4</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>7</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>2</walks>
    <strike_outs>20</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Eric </first_name>
    <surname>Helfand</surname>
    <games_played>0</games_played>
    <at_bats>0</at_bats>
    <runs>0</runs>
    <hits>0</hits>
    <doubles>0</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>0</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Dave </first_name>
    <surname>Hollins</surname>
    <games_played>101</games_played>
    <at_bats>363</at_bats>
    <runs>60</runs>
    <hits>88</hits>
    <doubles>16</doubles>
    <triples>2</triples>
    <home_runs>11</home_runs>
    <stolen_bases>39</stolen_bases>
    <caught_stealing>3</caught_stealing>
    <sacrifice_hits>2</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>17</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>44</walks>
    <strike_outs>69</strike_outs>
    <hit_by_pitch>7</hit_by_pitch>
  </player>
  <player>
    <first_name>Gregg </first_name>
    <surname>Jefferies</surname>
    <games_played>19</games_played>
    <at_bats>72</at_bats>
    <runs>7</runs>
    <hits>25</hits>
    <doubles>6</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>10</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>5</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Mark </first_name>
    <surname>Johnson</surname>
    <games_played>10</games_played>
    <at_bats>14</at_bats>
    <runs>1</runs>
    <hits>1</hits>
    <doubles>0</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>6</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Chad </first_name>
    <surname>Kreuter</surname>
    <games_played>96</games_played>
    <at_bats>252</at_bats>
    <runs>27</runs>
    <hits>63</hits>
    <doubles>10</doubles>
    <triples>1</triples>
    <home_runs>2</home_runs>
    <stolen_bases>33</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>5</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>9</errors>
    <passed_by_ball>5</passed_by_ball>
    <walks>33</walks>
    <strike_outs>49</strike_outs>
    <hit_by_pitch>3</hit_by_pitch>
  </player>
  <player>
    <first_name>Norberto </first_name>
    <surname>Martin</surname>
    <games_played>79</games_played>
    <at_bats>195</at_bats>
    <runs>20</runs>
    <hits>42</hits>
    <doubles>2</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>13</stolen_bases>
    <caught_stealing>1</caught_stealing>
    <sacrifice_hits>3</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>4</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>6</walks>
    <strike_outs>29</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Damon </first_name>
    <surname>Mashore</surname>
    <games_played>43</games_played>
    <at_bats>98</at_bats>
    <runs>13</runs>
    <hits>23</hits>
    <doubles>6</doubles>
    <triples>0</triples>
    <home_runs>2</home_runs>
    <stolen_bases>11</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>9</walks>
    <strike_outs>22</strike_outs>
    <hit_by_pitch>3</hit_by_pitch>
  </player>
  <player>
    <first_name>Ben </first_name>
    <surname>Molina</surname>
    <games_played>2</games_played>
    <at_bats>1</at_bats>
    <runs>0</runs>
    <hits>0</hits>
    <doubles>0</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>0</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Phil </first_name>
    <surname>Nevin</surname>
    <games_played>75</games_played>
    <at_bats>237</at_bats>
    <runs>27</runs>
    <hits>54</hits>
    <doubles>8</doubles>
    <triples>1</triples>
    <home_runs>8</home_runs>
    <stolen_bases>27</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>5</errors>
    <passed_by_ball>20</passed_by_ball>
    <walks>17</walks>
    <strike_outs>67</strike_outs>
    <hit_by_pitch>5</hit_by_pitch>
  </player>
  <player>
    <first_name>Charlie </first_name>
    <surname>Obrien</surname>
    <games_played>62</games_played>
    <at_bats>175</at_bats>
    <runs>13</runs>
    <hits>45</hits>
    <doubles>9</doubles>
    <triples>0</triples>
    <home_runs>4</home_runs>
    <stolen_bases>18</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>3</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>4</errors>
    <passed_by_ball>1</passed_by_ball>
    <walks>10</walks>
    <strike_outs>33</strike_outs>
    <hit_by_pitch>2</hit_by_pitch>
  </player>
  <player>
    <first_name>Orlando </first_name>
    <surname>Palmeiro</surname>
    <games_played>74</games_played>
    <at_bats>165</at_bats>
    <runs>28</runs>
    <hits>53</hits>
    <doubles>7</doubles>
    <triples>2</triples>
    <home_runs>0</home_runs>
    <stolen_bases>21</stolen_bases>
    <caught_stealing>4</caught_stealing>
    <sacrifice_hits>7</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>20</walks>
    <strike_outs>11</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Chris </first_name>
    <surname>Pritchett</surname>
    <games_played>31</games_played>
    <at_bats>80</at_bats>
    <runs>12</runs>
    <hits>23</hits>
    <doubles>2</doubles>
    <triples>1</triples>
    <home_runs>2</home_runs>
    <stolen_bases>8</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>1</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>4</walks>
    <strike_outs>16</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Tim </first_name>
    <surname>Salmon</surname>
    <games_played>136</games_played>
    <at_bats>463</at_bats>
    <runs>84</runs>
    <hits>139</hits>
    <doubles>28</doubles>
    <triples>1</triples>
    <home_runs>26</home_runs>
    <stolen_bases>88</stolen_bases>
    <caught_stealing>1</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>10</sacrifice_flies>
    <errors>2</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>90</walks>
    <strike_outs>100</strike_outs>
    <hit_by_pitch>3</hit_by_pitch>
  </player>
  <player>
    <first_name>Craig </first_name>
    <surname>Shipley</surname>
    <games_played>77</games_played>
    <at_bats>147</at_bats>
    <runs>18</runs>
    <hits>38</hits>
    <doubles>7</doubles>
    <triples>1</triples>
    <home_runs>2</home_runs>
    <stolen_bases>17</stolen_bases>
    <caught_stealing>4</caught_stealing>
    <sacrifice_hits>4</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>3</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>5</walks>
    <strike_outs>22</strike_outs>
    <hit_by_pitch>5</hit_by_pitch>
  </player>
  <player>
    <first_name>Randy </first_name>
    <surname>Velarde</surname>
    <games_played>51</games_played>
    <at_bats>188</at_bats>
    <runs>29</runs>
    <hits>49</hits>
    <doubles>13</doubles>
    <triples>1</triples>
    <home_runs>4</home_runs>
    <stolen_bases>26</stolen_bases>
    <caught_stealing>2</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>4</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>34</walks>
    <strike_outs>42</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Matt </first_name>
    <surname>Walbeck</surname>
    <games_played>108</games_played>
    <at_bats>338</at_bats>
    <runs>41</runs>
    <hits>87</hits>
    <doubles>15</doubles>
    <triples>2</triples>
    <home_runs>6</home_runs>
    <stolen_bases>46</stolen_bases>
    <caught_stealing>1</caught_stealing>
    <sacrifice_hits>5</sacrifice_hits>
    <sacrifice_flies>5</sacrifice_flies>
    <errors>7</errors>
    <passed_by_ball>8</passed_by_ball>
    <walks>30</walks>
    <strike_outs>68</strike_outs>
    <hit_by_pitch>2</hit_by_pitch>
  </player>
  <player>
    <first_name>Reggie </first_name>
    <surname>Williams</surname>
    <games_played>29</games_played>
    <at_bats>36</at_bats>
    <runs>7</runs>
    <hits>13</hits>
    <doubles>1</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>5</stolen_bases>
    <caught_stealing>3</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>7</walks>
    <strike_outs>11</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
</players>

Converting Data to XML While Processing It

import java.io.*;
import java.text.*;
import java.util.*;

public class BattingAverage {

  public static void main(String[] args) {

    try {
      FileInputStream fin = new FileInputStream(args[0]);
      BufferedReader in
       = new BufferedReader(new InputStreamReader(fin));

      FileOutputStream fout
       = new FileOutputStream("battingaverages.xml");
      Writer out = new OutputStreamWriter(fout, "UTF-8");
      out.write("<?xml version=\"1.0\"?>\r\n");
      out.write("<players>\r\n");
      String playerStats;

      // for formatting batting averages
      DecimalFormat averages = (DecimalFormat)
        NumberFormat.getNumberInstance(Locale.US);
      averages.setMaximumFractionDigits(3);
      averages.setMinimumFractionDigits(3);
      averages.setMinimumIntegerDigits(0);

      while ((playerStats = in.readLine()) != null) {
        String[] stats = splitLine(playerStats);

        String formattedAverage;
        try {
          int atBats         = Integer.parseInt(stats[6]);
          int hits           = Integer.parseInt(stats[8]);

          if (atBats <= 0) formattedAverage = "N/A";
          else {
            double average = hits / (double) atBats;
            formattedAverage = averages.format(average);
          }
        }
        catch (Exception e) {
          // skip this player
          continue;
        }

        out.write("  <player>\r\n");
        out.write("    <first_name>" + stats[1] + "</first_name>\r\n");
        out.write("    <surname>" + stats[0] + "</surname>\r\n");
        out.write("    <batting_average>" + formattedAverage
         + "</batting_average>\r\n");
        out.write("  </player>\r\n");
      }
      out.write("</players>\r\n");
      out.close();
      in.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }
    catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("Usage: java BattingAverage input_file.tab");
    }

  }

  public static String[] splitLine(String playerStats) {

    // count the number of tabs
    int numTabs = 0;
    for (int i = 0; i < playerStats.length(); i++) {
      if (playerStats.charAt(i) == '\t') numTabs++;
    }
    int numFields = numTabs + 1;
    String[] fields = new String[numFields];
    int position = 0;
    for (int i = 0; i < numFields; i++) {
      StringBuffer field = new StringBuffer();
      while (position < playerStats.length()
       && playerStats.charAt(position++) != '\t') {
        field.append(playerStats.charAt(position-1));
      }
      fields[i] = field.toString();
    }
    return fields;

  }

}

Batting Averages in XML

<?xml version="1.0"?>
<players>
  <player>
    <first_name>Garret </first_name>
    <surname>Anderson</surname>
    <batting_average>.294</batting_average>
  </player>
  <player>
    <first_name>Justin </first_name>
    <surname>Baughman</surname>
    <batting_average>.255</batting_average>
  </player>
  <player>
    <first_name>Frank </first_name>
    <surname>Bolick</surname>
    <batting_average>.156</batting_average>
  </player>
  <player>
    <first_name>Gary </first_name>
    <surname>Disarcina</surname>
    <batting_average>.287</batting_average>
  </player>
  <player>
    <first_name>Jim </first_name>
    <surname>Edmonds</surname>
    <batting_average>.307</batting_average>
  </player>
  <player>
    <first_name>Darin </first_name>
    <surname>Erstad</surname>
    <batting_average>.296</batting_average>
  </player>
  <player>
    <first_name>Carlos </first_name>
    <surname>Garcia</surname>
    <batting_average>.143</batting_average>
  </player>
  <player>
    <first_name>Troy </first_name>
    <surname>Glaus</surname>
    <batting_average>.218</batting_average>
  </player>
  <player>
    <first_name>Todd </first_name>
    <surname>Greene</surname>
    <batting_average>.254</batting_average>
  </player>
  <player>
    <first_name>Eric </first_name>
    <surname>Helfand</surname>
    <batting_average>N/A</batting_average>
  </player>
  <player>
    <first_name>Dave </first_name>
    <surname>Hollins</surname>
    <batting_average>.242</batting_average>
  </player>
  <player>
    <first_name>Gregg </first_name>
    <surname>Jefferies</surname>
    <batting_average>.347</batting_average>
  </player>
  <player>
    <first_name>Mark </first_name>
    <surname>Johnson</surname>
    <batting_average>.071</batting_average>
  </player>
  <player>
    <first_name>Chad </first_name>
    <surname>Kreuter</surname>
    <batting_average>.250</batting_average>
  </player>
  <player>
    <first_name>Norberto </first_name>
    <surname>Martin</surname>
    <batting_average>.215</batting_average>
  </player>
  <player>
    <first_name>Damon </first_name>
    <surname>Mashore</surname>
    <batting_average>.235</batting_average>
  </player>
  <player>
    <first_name>Ben </first_name>
    <surname>Molina</surname>
    <batting_average>.000</batting_average>
  </player>
  <player>
    <first_name>Phil </first_name>
    <surname>Nevin</surname>
    <batting_average>.228</batting_average>
  </player>
  <player>
    <first_name>Charlie </first_name>
    <surname>Obrien</surname>
    <batting_average>.257</batting_average>
  </player>
  <player>
    <first_name>Orlando </first_name>
    <surname>Palmeiro</surname>
    <batting_average>.321</batting_average>
  </player>
  <player>
    <first_name>Chris </first_name>
    <surname>Pritchett</surname>
    <batting_average>.288</batting_average>
  </player>
  <player>
    <first_name>Tim </first_name>
    <surname>Salmon</surname>
    <batting_average>.300</batting_average>
  </player>
  <player>
    <first_name>Craig </first_name>
    <surname>Shipley</surname>
    <batting_average>.259</batting_average>
  </player>
  <player>
    <first_name>Randy </first_name>
    <surname>Velarde</surname>
    <batting_average>.261</batting_average>
  </player>
  <player>
    <first_name>Matt </first_name>
    <surname>Walbeck</surname>
    <batting_average>.257</batting_average>
  </player>
  <player>
    <first_name>Reggie </first_name>
    <surname>Williams</surname>
    <batting_average>.361</batting_average>
  </player>
</players>

The point is this:


Questions?


To Learn More


Part III: Reading XML Documents with SAX

Actually, SAX2 has ** MUCH ** better infoset support than DOM does. Yes, I've done the detailed analysis.

--David Brownell on the xml-dev mailing list


Reading XML Documents


SAX


SAX Parsers for Java

Parser URL Validating Namespaces DOM1 DOM2 SAX1 SAX2 License
Yuval Oren's Piccolo http://piccolo.sourceforge.net/ X X X LGPL
Apache XML Project's Xerces Java http://xml.apache.org/xerces2-j/index.html X X X X X X Apache Software License, Version 1.1
IBM's XML for Java http://www.alphaworks.ibm.com/formula/xml X X X X X X Apache Software License, Version 1.1
Microstar/David Brownell's Ælfred http://www.gnu.org/software/classpathx/jaxp/jaxp.html X X X   X X GPL with library exception
Silfide's SXP http://www.loria.fr/projets/XSilfide/EN/sxp/    X   X  Non-GPL viral open source license
Sun's Crimson http://xml.apache.org/crimson/ X X X   X   Apache
Oracle's XML Parser for Java http://technet.oracle.com/ X X X   X  free beer

SAX1


SAX2


The SAX2 Process

  1. Use the factory method XMLReaderFactory.createXMLReader() to retrieve a parser-specific implementation of the XMLReader interface

  2. Your code registers a ContentHandler with the parser

  3. An InputSource feeds the document into the parser

  4. As the document is read, the parser calls back to the methods of the ContentHandler to tell it what it's seeing in the document.
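
The four steps above can be sketched in one short program. This is a minimal illustration, not code from the talk: the class name SAX2ProcessSketch and the helper elementNames() are made up for the example, and the document is fed in from a string rather than a URL. It extends DefaultHandler so that only startElement() needs to be written. Recent JDKs mark XMLReaderFactory as deprecated, but it is still present.

```java
import java.io.StringReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

public class SAX2ProcessSketch {

  // Hypothetical helper: parses an XML string and returns the local
  // names of its elements in the order the parser reports them.
  public static String elementNames(String xml) throws Exception {

    final StringBuilder names = new StringBuilder();

    // Step 1: ask the factory for a parser-specific XMLReader
    XMLReader parser = XMLReaderFactory.createXMLReader();

    // Step 2: register a ContentHandler. DefaultHandler supplies
    // do-nothing versions of every method not overridden here.
    parser.setContentHandler(new DefaultHandler() {
      public void startElement(String namespaceURI, String localName,
       String qualifiedName, Attributes atts) {
        if (names.length() > 0) names.append(' ');
        // localName is always available while namespace
        // processing is on, which is the SAX2 default
        names.append(localName);
      }
    });

    // Step 3: an InputSource feeds the document into the parser
    InputSource source = new InputSource(new StringReader(xml));

    // Step 4: the parser calls back to the handler as it reads
    parser.parse(source);
    return names.toString();

  }

  public static void main(String[] args) throws Exception {
    // prints: SONG TITLE YEAR
    System.out.println(elementNames(
     "<SONG><TITLE>Hot Cop</TITLE><YEAR>1978</YEAR></SONG>"));
  }

}
```

Passing a system ID string to parse(), as the later examples do, is shorthand for wrapping that string in an InputSource.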


Making an XMLReader


Parsing a Document with XMLReader

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;


public class SAX2Checker {

  public static void main(String[] args) {

    XMLReader parser;
    try {
      parser = XMLReaderFactory.createXMLReader();
    }
    catch (SAXException ex) {
      try {
        parser = XMLReaderFactory.createXMLReader(
         "org.apache.xerces.parsers.SAXParser");
      }
      catch (SAXException ex2) {
        System.out.println("Could not locate a parser. "
         + "Please set the org.xml.sax.driver property.");
        return;
      }
    }

    if (args.length == 0) {
      System.out.println("Usage: java SAX2Checker URL1 URL2...");
    }

    // start parsing...
    for (int i = 0; i < args.length; i++) {

      // command line should offer URIs or file names
      try {
        parser.parse(args[i]);
        // If there are no well-formedness errors
        // then no exception is thrown
        System.out.println(args[i] + " is well formed.");
      }
      catch (SAXParseException e) { // well-formedness error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage()
         + " at line " + e.getLineNumber()
         + ", column " + e.getColumnNumber());
      }
      catch (SAXException e) { // some other kind of error
        System.out.println(e.getMessage());
      }
      catch (IOException e) {
        System.out.println("Could not check " + args[i]
         + " because of the IOException " + e);
      }

    }

  }

}

Sample Output from SAX2Checker

C:\>java SAX2Checker http://www.cafeconleche.org/
http://www.cafeconleche.org/ is not well formed.
The element type "dt" must be terminated by the 
matching end-tag "</dt>". 
at line 186, column 5

The ContentHandler interface

package org.xml.sax;


public interface ContentHandler {

    public void setDocumentLocator(Locator locator);
    
    public void startDocument() throws SAXException;
    
    public void endDocument() throws SAXException;
    
    public void startPrefixMapping(String prefix, String uri) 
     throws SAXException;

    public void endPrefixMapping(String prefix) throws SAXException;

    public void startElement(String namespaceURI, String localName,
     String qualifiedName, Attributes atts) throws SAXException;

    public void endElement(String namespaceURI, String localName,
     String qualifiedName) throws SAXException;

    public void characters(char[] text, int start, int length) 
     throws SAXException;

    public void ignorableWhitespace(char[] text, int start, int length)
     throws SAXException;

    public void processingInstruction(String target, String data)
     throws SAXException;

    public void skippedEntity(String name) throws SAXException;
     
}

SAX2 Event Reporter

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;

public class EventReporter implements ContentHandler {

  public void setDocumentLocator(Locator locator) {
    System.out.println("setDocumentLocator(" + locator + ")");
  }

  public void startDocument() throws SAXException {
    System.out.println("startDocument()");
  }

  public void endDocument() throws SAXException {
    System.out.println("endDocument()");
  }

  public void startElement(String namespaceURI, String localName, 
   String qualifiedName, Attributes atts)
   throws SAXException {
    namespaceURI = '"' + namespaceURI + '"';
    localName = '"' + localName + '"';
    qualifiedName = '"' + qualifiedName + '"';
    String attributeString = "{";
    for (int i = 0; i < atts.getLength(); i++) {
      attributeString += atts.getQName(i) + "=\"" 
       + atts.getValue(i) + "\"";
      if (i != atts.getLength()-1) attributeString += ", ";
    }
    attributeString += "}";
    System.out.println("startElement(" + namespaceURI + ", " 
     + localName + ", " + qualifiedName + ", " + attributeString + ")");
  }

  public void endElement(String namespaceURI, String localName, 
   String qualifiedName)
   throws SAXException {
    namespaceURI = '"' + namespaceURI + '"';
    localName = '"' + localName + '"';
    qualifiedName = '"' + qualifiedName + '"';
    System.out.println("endElement(" + namespaceURI + ", " 
     + localName + ", " + qualifiedName + ")");
  }

  public void characters(char[] text, int start, int length)
   throws SAXException {
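    // Note: this prints the parser's whole buffer, not only the
    // length characters beginning at start that make up this event
    String textString = "[" + new String(text) + "]";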
    System.out.println("characters(" + textString + ", " 
     + start + ", " +  length + ")");
  }

  public void ignorableWhitespace(char[] text, int start, int length)
   throws SAXException {
    System.out.println("ignorableWhitespace()");
  }

  public void processingInstruction(String target, String data)
   throws SAXException {
    System.out.println("processingInstruction(" + target + ", " 
     + data + ")");
  }

  public void startPrefixMapping(String prefix, String uri)
   throws SAXException {
    System.out.println("startPrefixMapping(\"" + prefix + "\", \"" 
     + uri + "\")");
  }

  public void endPrefixMapping(String prefix) throws SAXException {
    System.out.println("endPrefixMapping(\"" + prefix + "\")");
  }

  public void skippedEntity(String name) throws SAXException {
    System.out.println("skippedEntity(" + name + ")");
  }

  // Could easily have put main() method in a separate class
  public static void main(String[] args) {

    XMLReader parser;
    try {
     parser = XMLReaderFactory.createXMLReader();
    }
    catch (Exception e) {
      // fall back on Xerces parser by name
      try {
        parser = XMLReaderFactory.createXMLReader(
         "org.apache.xerces.parsers.SAXParser");
      }
      catch (Exception ee) {
        System.err.println("Couldn't locate a SAX parser");
        return;
      }
    }


    if (args.length == 0) {
      System.out.println(
       "Usage: java EventReporter URL1 URL2...");
    }

    // Install the content handler
    parser.setContentHandler(new EventReporter());

    // start parsing...
    for (int i = 0; i < args.length; i++) {

      // command line should offer URIs or file names
      try {
        parser.parse(args[i]);
      }
      catch (SAXParseException e) { // well-formedness error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage()
         + " at line " + e.getLineNumber()
         + ", column " + e.getColumnNumber());
      }
      catch (SAXException e) { // some other kind of error
        System.out.println(e.getMessage());
      }
      catch (IOException e) {
        System.out.println("Could not report on " + args[i]
         + " because of the IOException " + e);
      }

    }

  }

}

Event Reporter Output

A Sample Application

Full list

Goal: Return a list of all the URLs in this list as java.net.URL objects

Design Decisions


SAX Design


User Interface Class

import org.xml.sax.*;
import org.xml.sax.helpers.XMLReaderFactory;
import java.util.*;
import java.io.*;


public class WeblogsSAX {
     
  public static List listChannels() 
   throws IOException, SAXException {
    return listChannels(
     "http://static.userland.com/weblogMonitor/logs.xml"); 
  }
  
  public static List listChannels(String uri) 
   throws IOException, SAXException {
    
    XMLReader parser;
    try {
      parser = XMLReaderFactory.createXMLReader();
    }
    catch (SAXException ex) {
      parser = XMLReaderFactory.createXMLReader(
       "org.apache.xerces.parsers.SAXParser"
      );
    }
    Vector urls = new Vector(1000);
    ContentHandler handler = new URIGrabber(urls);
    parser.setContentHandler(handler);
    parser.parse(uri);
    return urls;
    
  }
  
  public static void main(String[] args) {
   
    try {
      List urls;
      if (args.length > 0) urls = listChannels(args[0]);
      else urls = listChannels();
      Iterator iterator = urls.iterator();
      while (iterator.hasNext()) {
        System.out.println(iterator.next()); 
      }
    }
    catch (IOException e) {
      System.err.println(e); 
    }
    catch (SAXParseException e) {
      System.err.println(e); 
      System.err.println("at line " + e.getLineNumber() 
       + ", column " + e.getColumnNumber()); 
    }
    catch (SAXException e) {
      System.err.println(e); 
    }
    catch (/* Unexpected */ Exception e) {
      e.printStackTrace(); 
    }
    
  }
  
}

ContentHandler Class

import org.xml.sax.*;
import java.net.*;
import java.util.Vector;

             // conflicts with java.net.ContentHandler
class URIGrabber implements org.xml.sax.ContentHandler {

  private Vector urls;

  URIGrabber(Vector urls) {
    this.urls = urls;
  }

  // do nothing methods
  public void setDocumentLocator(Locator locator) {}
  public void startDocument() throws SAXException {}
  public void endDocument() throws SAXException {}
  public void startPrefixMapping(String prefix, String uri)
   throws SAXException {}
  public void endPrefixMapping(String prefix) throws SAXException {}
  public void skippedEntity(String name) throws SAXException {}
  public void ignorableWhitespace(char[] text, int start, int length)
   throws SAXException {}
  public void processingInstruction(String target, String data)
   throws SAXException {}


  // Remember, there's no guarantee all the text of the
  // url element will be returned in a single call to characters
  private StringBuffer urlBuffer;
  private boolean collecting = false;

  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes atts) throws SAXException {

    if (qualifiedName.equals("url")) {
      collecting = true;
      urlBuffer = new StringBuffer();
    }

  }

  public void characters(char[] text, int start, int length)
   throws SAXException {

    if (collecting) {
      urlBuffer.append(text, start, length);
    }

  }

  public void endElement(String namespaceURI, String localName,
   String qualifiedName) throws SAXException {

    if (qualifiedName.equals("url")) {
      collecting = false;
      String url = urlBuffer.toString();
      try {
        urls.addElement(new URL(url));
      }
      catch (MalformedURLException e) {
        // skip this url
      }
    }

  }

}

Weblogs Output

% java WeblogsSAX shortlogs.xml
http://www.mozillazine.org
http://www.salonherringwiredfool.com/
http://www.slashdot.org/

Features and Properties


Feature/Property SAXExceptions

SAXNotRecognizedException
The parser does not recognize this feature or property name, so it can never be set or read
SAXNotSupportedException
The parser does not allow this value for a requested feature/property, or the feature/property is read-only, or the feature/property cannot be read/written at this moment in the parsing process.
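
As a rough sketch of how the two exceptions differ in practice, the program below asks a parser about one standard feature name and one name that no parser should know. The class name FeatureProbe, the helper probe(), and the example.com feature URI are invented for this illustration; what a given parser actually reports for a given feature depends on the parser.

```java
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class FeatureProbe {

  // Hypothetical helper: describes how the parser reacts
  // when asked for the current value of a feature
  public static String probe(XMLReader parser, String feature) {
    try {
      boolean value = parser.getFeature(feature);
      return "recognized, currently " + value;
    }
    catch (SAXNotRecognizedException e) {
      // the parser has never heard of this name
      return "not recognized";
    }
    catch (SAXNotSupportedException e) {
      // the parser knows the name but cannot report it right now
      return "recognized, but not readable at this moment";
    }
  }

  public static void main(String[] args) throws SAXException {
    XMLReader parser = XMLReaderFactory.createXMLReader();
    System.out.println(probe(parser,
     "http://xml.org/sax/features/namespaces"));
    // made-up feature name, used only to provoke the exception
    System.out.println(probe(parser,
     "http://www.example.com/features/no-such-feature"));
  }

}
```

Catching the two exceptions separately lets a program tell "this parser can never do that" apart from "this parser cannot do that here and now".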

Required Features


Core Features

adapted from SAX2 documentation by David Megginson


Turning on Validation


Three Levels of Errors

In increasing order of severity:

  1. A warning; e.g. ambiguous content model, a constraint for compatibility

  2. A recoverable error: typically a validity error

  3. A fatal error: typically a well-formedness error
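
One way to see the three levels apart is to count how often the parser reports each one. The sketch below is an assumption-laden illustration: the class ErrorLevelCounter and its check() helper are invented here, the documents are tiny strings with an internal DTD subset, and the exact number of errors reported for an invalid document can vary from parser to parser.

```java
import java.io.IOException;
import java.io.StringReader;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class ErrorLevelCounter implements ErrorHandler {

  public int warnings = 0;
  public int errors = 0;
  public int fatalErrors = 0;

  public void warning(SAXParseException ex) { warnings++; }
  public void error(SAXParseException ex) { errors++; }
  public void fatalError(SAXParseException ex) { fatalErrors++; }

  // Hypothetical helper: parses the document with validation
  // turned on and returns the counts for each level
  public static ErrorLevelCounter check(String xml)
   throws SAXException, IOException {

    XMLReader parser = XMLReaderFactory.createXMLReader();
    parser.setFeature("http://xml.org/sax/features/validation", true);
    ErrorLevelCounter counter = new ErrorLevelCounter();
    parser.setErrorHandler(counter);
    try {
      parser.parse(new InputSource(new StringReader(xml)));
    }
    catch (SAXParseException ex) {
      // a fatal error normally ends the parse with this exception,
      // after fatalError() has already been called
    }
    return counter;

  }

  public static void main(String[] args) throws Exception {
    String dtd = "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY>]>";
    // well formed but invalid: the required b child is missing
    ErrorLevelCounter invalid = check(dtd + "<a/>");
    // not well formed: b is never closed
    ErrorLevelCounter malformed = check(dtd + "<a><b></a>");
    System.out.println("invalid:   " + invalid.errors
     + " error(s), " + invalid.fatalErrors + " fatal error(s)");
    System.out.println("malformed: " + malformed.errors
     + " error(s), " + malformed.fatalErrors + " fatal error(s)");
  }

}
```

The validity problem goes to error() and parsing continues; the well-formedness problem goes to fatalError() and parsing stops.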


The ErrorHandler interface

package org.xml.sax;

public interface ErrorHandler {
 
  public void warning(SAXParseException exception)
   throws SAXException;

  public void error(SAXParseException exception)
   throws SAXException;
    
  public void fatalError(SAXParseException exception)
   throws SAXException;
    
}

An ErrorHandler for Reporting Validity Errors

import org.xml.sax.*;
import java.io.*;


public class ValidityErrorReporter implements ErrorHandler {
 
  private Writer out;
 
  public ValidityErrorReporter(Writer out) {
    this.out = out;
  }
 
  public ValidityErrorReporter() {
    this(new OutputStreamWriter(System.out));
  }
 
  public void warning(SAXParseException ex)
   throws SAXException {

    try {
      out.write(ex.getMessage() + "\r\n");
      out.write(" at line " + ex.getLineNumber() + ", column " 
       + ex.getColumnNumber() + "\r\n");
      out.flush();
    }
    catch (IOException e) {
      throw new SAXException(e); 
    }
    
  }

  public void error(SAXParseException ex)
   throws SAXException {
    
    try {
      out.write(ex.getMessage() + "\r\n");
      out.write(" at line " + ex.getLineNumber() + ", column " 
       + ex.getColumnNumber() + "\r\n");
      out.flush();
    }
    catch (IOException e) {
      throw new SAXException(e); 
    }
    
  }
    
  public void fatalError(SAXParseException ex)
   throws SAXException {
    
    try {
      out.write(ex.getMessage() + "\r\n");
      out.write(" at line " + ex.getLineNumber() + ", column " 
       + ex.getColumnNumber() + "\r\n");
      out.flush();
    }
    catch (IOException e) {
      throw new SAXException(e); 
    }
    
  }
    
}

Validating

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;


public class SAX2Validator {

  public static void main(String[] args) {
    
    XMLReader parser;
    try {
      parser = XMLReaderFactory.createXMLReader();
    }
    catch (SAXException ex) {
      try {
        parser = XMLReaderFactory.createXMLReader(
         "org.apache.xerces.parsers.SAXParser"
        );
      }
      catch (SAXException ex2) {
        System.err.println("Could not locate a SAX2 Parser");
        return;
      }
    }
     
    // turn on validation
    try {
      parser.setFeature(
       "http://xml.org/sax/features/validation", true);
      parser.setErrorHandler(new ValidityErrorReporter());
    }
    catch (SAXNotRecognizedException e) {
      System.err.println(
       "Installed XML parser cannot validate;"
       + " checking for well-formedness instead...");
    } 
    catch (SAXNotSupportedException e) {
      System.err.println(
       "Cannot turn on validation here; "
       + "checking for well-formedness instead...");
    } 
     
    if (args.length == 0) {
      System.out.println("Usage: java SAX2Validator URL1 URL2..."); 
    } 
      
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        parser.parse(args[i]);
        // If there are no well-formedness errors, 
        // then no exception is thrown
        System.out.println(args[i] + " is well formed.");
      }
      catch (SAXParseException e) { // well-formedness error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage()
         + " at line " + e.getLineNumber() 
         + ", column " + e.getColumnNumber());
      }
      catch (SAXException e) { // some other kind of error
        System.out.println(e.getMessage());
      }
      catch (IOException e) {
        System.out.println("Could not check " + args[i] 
         + " because of the IOException " + e);
      }
      
    }  
  
  }

}

Core Properties

adapted from SAX2 documentation by David Megginson


Nonstandard Features in Xerces


Nonstandard Properties in Xerces

http://apache.org/xml/properties/schema/external-schemaLocation
http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation

Properties for Extension Handlers


Handling Attributes in SAX2


Attributes Example

import org.xml.sax.*;
import java.io.*;
import java.util.*;
import org.xml.sax.helpers.*;


public class XLinkSpider extends DefaultHandler {

  public static Enumeration listURIs(String systemId) 
   throws SAXException, IOException {
    
    // set up the parser 
    XMLReader parser;
    try {
      parser = XMLReaderFactory.createXMLReader();
    } 
    catch (SAXException e) {
      try {
        parser = XMLReaderFactory.createXMLReader(
         "org.apache.xerces.parsers.SAXParser");
      }
      catch (SAXException e2) {
        System.err.println("Error: could not locate a parser.");
        return null;
      }
    }
      
    // Install the Content Handler   
    XLinkSpider spider = new XLinkSpider();   
    parser.setContentHandler(spider);
    parser.parse(systemId);
    return spider.uris.elements();
      
  }
  
  private Vector uris = new Vector();

  public void startElement(String namespaceURI, String localName, 
   String rawName, Attributes atts) throws SAXException {
    
     String uri = atts.getValue(
      "http://www.w3.org/1999/xlink", "href");
     if (uri != null) uris.addElement(uri);
    
  }
  

  public static void main(String[] args) {
    
    if (args.length == 0) {
      System.out.println("Usage: java XLinkSpider URL1 URL2..."); 
    } 
      
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      try {
        Enumeration uris = listURIs(args[i]);
        while (uris.hasMoreElements()) {
          String s = (String) uris.nextElement();
          System.out.println(s);
        }
      }
      catch (Exception e) {
        System.err.println(e);
        e.printStackTrace(); 
      }
      
    } // end for
  
  } // end main

} // end XLinkSpider

Resolving Entities


EntityResolver Example

import org.xml.sax.*;

public class RSSResolver implements EntityResolver {

  public InputSource resolveEntity(String publicID, String systemID) {

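    // publicID may be null (a SYSTEM-only DOCTYPE), so the
    // comparisons start from the constants to avoid a
    // NullPointerException
    if ("-//Netscape Communications//DTD RSS 0.91//EN"
          .equals(publicID)
     || "http://my.netscape.com/publish/formats/rss-0.91.dtd"
          .equals(systemID)) {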
      return new InputSource(
       "http://www.cafeconleche.org/dtds/rss.dtd");
    } 
    else {
      // use the default behaviour
      return null;
    }
    
  }
   
}
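
A resolver has no effect until it is registered with the parser through setEntityResolver(). The sketch below shows that step with a resolver that serves a DTD from memory, so it runs without a network connection. Everything named here is made up for the illustration: the class ResolverSketch, the helper parseWithResolver(), the example.com system ID, and the one-line DTD.

```java
import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

public class ResolverSketch {

  // Hypothetical system ID and DTD text, used only for this example
  static final String DTD_ID = "http://www.example.com/greeting.dtd";
  static final String DTD_TEXT = "<!ENTITY hello \"Hello, SAX\">";

  // Parses the document and returns all of its character data
  public static String parseWithResolver(String xml) throws Exception {

    XMLReader parser = XMLReaderFactory.createXMLReader();

    // Register the resolver before parsing begins
    parser.setEntityResolver(new EntityResolver() {
      public InputSource resolveEntity(String publicID,
       String systemID) {
        if (DTD_ID.equals(systemID)) {
          // substitute the in-memory copy for the remote DTD
          InputSource local
           = new InputSource(new StringReader(DTD_TEXT));
          local.setSystemId(systemID);
          return local;
        }
        return null; // use the default behaviour
      }
    });

    final StringBuilder text = new StringBuilder();
    parser.setContentHandler(new DefaultHandler() {
      public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
      }
    });

    parser.parse(new InputSource(new StringReader(xml)));
    return text.toString();

  }

  public static void main(String[] args) throws Exception {
    String doc = "<!DOCTYPE greeting SYSTEM \"" + DTD_ID + "\">"
     + "<greeting>&hello;</greeting>";
    System.out.println(parseWithResolver(doc));
  }

}
```

Because the hello entity is declared only in the substituted DTD, the reference can be expanded only if the parser really did consult the resolver.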
 

Questions?


Handling DTDs


DTDHandler Example


TextEntityReplacer

import org.xml.sax.*;
import java.util.*;
import java.net.*;
import java.io.*;


public class TextEntityReplacer implements DTDHandler {

  /* This class stores the notation and entity declarations 
     for a single document. It is not designed to be reused
     for multiple parses, though that would be a straightforward
     extension. The public and system IDs of the document
     being parsed are set in the constructor.    
  */ 
  
  private URL systemID;
  private String publicID;
  
  public TextEntityReplacer(String publicID, String systemID) 
   throws MalformedURLException {
    this.publicID = publicID;
    this.systemID = new URL(systemID);
  }

  // store all notations in a hashtable. We'll need them later
  private Hashtable notations = new Hashtable();

  // for the DTDHandler interface
  public void notationDecl(String name, String publicID, 
   String systemID)
   throws SAXException {
    
    Notation n = new Notation(name, publicID, systemID);
    notations.put(name, n);
    
  }
  
  private class Notation {
    
    String name;
    String publicID;
    String systemID;
    
    Notation(String name, String publicID, String systemID) {
      this.name = name;
      this.publicID = publicID;
      this.systemID = systemID;
    } 
    
  }
 
   
  // store all unparsed entities in a hashtable. We'll need them later
  private Hashtable unparsedEntities = new Hashtable();

  // for the DTDHandler interface
  public void unparsedEntityDecl(String name, String publicID, 
   String systemID, String notationName) throws SAXException {
    
    UnparsedEntity e = new UnparsedEntity(name, publicID, 
     systemID, notationName);
    unparsedEntities.put(name, e);
    
  }    

  private class UnparsedEntity {
    
    String name;
    String publicID;
    String systemID;
    String notationName;
    
    UnparsedEntity(String name, String publicID, 
     String systemID, String notationName) {
      this.name = name;
      this.notationName = notationName;
      this.publicID = publicID;
      this.systemID = systemID;
    } 
    
  }


  public boolean isText(String notationName) {
    
    Object o = notations.get(notationName);
    if (o == null) return false;
    Notation n = (Notation) o;
    // a notation declared with only a public ID has a null system ID
    return n.systemID != null && n.systemID.startsWith("text/");
    
  }
  
  public String getText(String entityName) throws IOException {
    
    Object o = unparsedEntities.get(entityName);
    if (o == null) return "";
    UnparsedEntity entity = (UnparsedEntity) o;
    if (!isText(entity.notationName)) {
      return " binary data "; // could throw an exception instead
    }
    
    URL source;
    try {
      source = new URL(systemID, entity.systemID);     
    }
    catch (Exception e) {
      return " unresolvable entity "; // could throw an exception instead
    }
    
    // I'm not really handling character encodings here. 
    // A more detailed look at the MIME media type would allow that.
    Reader in = new BufferedReader(
      new InputStreamReader(source.openStream())
    );
    StringBuffer result = new StringBuffer();
    try {
      int c;
      while ((c = in.read()) != -1) {
        result.append((char) c); 
      }
    }
    finally {
      in.close(); // release the connection even if the read fails
    }
    
    return result.toString();
    
  }

}
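
A DTDHandler is installed with setDTDHandler() before parsing begins. The following is a minimal sketch of what such a handler receives, not a replacement for TextEntityReplacer: the NotationLister class, the inline document, and the names plain and notes are invented for illustration. It assumes the parser bundled with the JDK, obtained through JAXP. System IDs are left out of the recorded events because a parser may expand relative ones into absolute URLs before reporting them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.DTDHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical demo class: records each notation and unparsed entity
// declaration that the parser reports through DTDHandler.
class NotationLister implements DTDHandler {

  final List<String> events = new ArrayList<String>();

  public void notationDecl(String name, String publicID,
   String systemID) {
    events.add("notation " + name);
  }

  public void unparsedEntityDecl(String name, String publicID,
   String systemID, String notationName) {
    events.add("unparsed entity " + name
     + " (notation " + notationName + ")");
  }

  // Parses a small inline document and returns the recorded events.
  static List<String> list() throws Exception {
    String xml =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE doc [\n"
      + "  <!ELEMENT doc EMPTY>\n"
      + "  <!ATTLIST doc source ENTITY #REQUIRED>\n"
      + "  <!NOTATION plain SYSTEM \"text/plain\">\n"
      + "  <!ENTITY notes SYSTEM \"notes.txt\" NDATA plain>\n"
      + "]>\n"
      + "<doc source=\"notes\"/>";

    XMLReader parser
     = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    NotationLister lister = new NotationLister();
    parser.setDTDHandler(lister);
    parser.parse(new InputSource(new StringReader(xml)));
    return lister.events;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(list());
  }

}
```

The unparsed entity itself is never read by the parser; only its declaration is reported, which is why notes.txt does not need to exist.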

Handling Declarations


The DeclHandler interface:

package org.xml.sax.ext;

import org.xml.sax.SAXException;


public interface DeclHandler {

  public void elementDecl(String name, String model)
   throws SAXException;

  public void attributeDecl(String elementName, String attributeName, 
   String type, String mode, String defaultValue) 
   throws SAXException;

  public void internalEntityDecl(String name, String value)
   throws SAXException;

  public void externalEntityDecl(String name, String publicID,
   String systemID) throws SAXException;

}

DTDMerger

import org.xml.sax.*;
import org.xml.sax.ext.DeclHandler;
import org.xml.sax.helpers.XMLReaderFactory;
import java.io.IOException;


public class DTDMerger implements DeclHandler {

  public void elementDecl(String name, String model)
   throws SAXException {
    System.out.println("<!ELEMENT " + name + " " + model + " >");
  }
  
  public void attributeDecl(String elementName, 
   String attributeName, String type, String mode, 
   String defaultValue) throws SAXException {
     
    System.out.print("<!ATTLIST ");
    System.out.print(elementName);
    System.out.print(" ");
    System.out.print(attributeName);
    System.out.print(" ");
    System.out.print(type);
    System.out.print(" ");
    if (mode != null) {
      System.out.print(mode + " ");
    }
    if (defaultValue != null) {
      System.out.print('"' + defaultValue + "\" ");
    }
    System.out.println(">");   
     
  }
  
  public void internalEntityDecl(String name, 
   String value) throws SAXException {
     
    if (!name.startsWith("%")) { // ignore parameter entities
      System.out.println("<!ENTITY " + name + " \"" 
       + value + "\">");        
    }
    
  }
  
  public void externalEntityDecl(String name, 
   String publicID, String systemID) throws SAXException {
     
    if (!name.startsWith("%")) { // ignore parameter entities
      if (publicID != null) { 
        System.out.println("<!ENTITY " + name + " PUBLIC \"" 
         + publicID + "\" \"" + systemID + "\">");        
      
      }
      else {
        System.out.println("<!ENTITY " + name + " SYSTEM \"" 
         + systemID + "\">");        
      }
    }
    
  }

  public static void main(String[] args) {

    if (args.length <= 0) {
      System.out.println("Usage: java DTDMerger URL");
      return;
    }
    String document = args[0];
    
    XMLReader parser = null;
    try {
      parser = XMLReaderFactory.createXMLReader();
      DeclHandler handler = new DTDMerger();
      parser.setProperty(
       "http://xml.org/sax/properties/declaration-handler", 
       handler);
      parser.parse(document);
    }
    catch (SAXNotRecognizedException e) {
      System.err.println(parser.getClass() 
       + " does not support declaration handlers.");
    }
    catch (SAXNotSupportedException e) {
      System.err.println(parser.getClass() 
       + " does not support declaration handlers.");

    }
    catch (SAXException e) {
      System.err.println(e);
      // As long as we finished with the DTD we really don't care
    }
    catch (IOException e) { 
      System.out.println(
       "Due to an IOException, the parser could not check " 
       + document
      ); 
    }
   
  }
   
}

Handling Lexical Events


The LexicalHandler interface

package org.xml.sax.ext;

import org.xml.sax.SAXException;


public interface LexicalHandler {

  public void startDTD(String name, String publicID, String systemID)
   throws SAXException;
  public void endDTD() throws SAXException;
  public void startEntity(String name) throws SAXException;
  public void endEntity(String name) throws SAXException;
  public void startCDATA() throws SAXException;
  public void endCDATA() throws SAXException;
  public void comment (char[] text, int start, int length) 
   throws SAXException;

}

LexicalHandler Example

import org.xml.sax.*;
import org.xml.sax.ext.*;
import org.xml.sax.helpers.*;
import java.io.IOException;


public class SAXCommentReader implements LexicalHandler {

  public void startDTD(String name, String publicId, String systemId)
   throws SAXException {}
  public void endDTD() throws SAXException {}
  public void startEntity(String name) throws SAXException {}
  public void endEntity(String name) throws SAXException {}
  public void startCDATA() throws SAXException {}
  public void endCDATA() throws SAXException {}

  public void comment (char[] text, int start, int length)
   throws SAXException {

    String comment = new String(text, start, length);
    System.out.println(comment);

  }

  public static void main(String[] args) {

    // set up the parser
    XMLReader parser;
    try {
      parser = XMLReaderFactory.createXMLReader();
    }
    catch (SAXException e) {
      try {
        parser = XMLReaderFactory.createXMLReader(
         "org.apache.xerces.parsers.SAXParser");
      }
      catch (SAXException e2) {
        System.err.println("Error: could not locate a parser.");
        return;
      }
    }

    // turn on comment handling
    try {
      parser.setProperty(
       "http://xml.org/sax/properties/lexical-handler",
       new SAXCommentReader()
      );
    }
    catch (SAXNotRecognizedException e) {
      System.err.println(
       "Installed XML parser does not provide lexical events...");
      return;
    }
    catch (SAXNotSupportedException e) {
      System.err.println(
       "Cannot turn on comment processing here");
      return;
    }

    if (args.length == 0) {
      System.out.println("Usage: java SAXCommentReader URL1 URL2...");
    }

    // start parsing...
    for (int i = 0; i < args.length; i++) {

      try {
        parser.parse(args[i]);
      }
      catch (SAXParseException e) { // well-formedness error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage()
         + " at line " + e.getLineNumber()
         + ", column " + e.getColumnNumber());
      }
      catch (SAXException e) { // some other kind of error
        System.out.println(e.getMessage());
      }
      catch (IOException e) {
        System.out.println("Could not check " + args[i]
         + " because of the IOException " + e);
      }

    }

  }

}

SAXCommentReader Output

C:\EXAMPLES>java SAXCommentReader hotcop.xml
 This should be a four digit year like "1999",
     not a two-digit year like "99"
 The publisher is actually Polygram but I needed
       an example of a general entity reference.
 You can tell what album I was
     listening to when I wrote this example

Or try http://www.w3.org/TR/2000/REC-xml-20001006.xml


The Locator interface


Locator Example

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;


public class LocationReporter implements ContentHandler {

  private Locator locator = null;

  public void setDocumentLocator(Locator locator) {
    this.locator = locator;  
  }
  
  private String reportPosition() {
    
    if (locator != null) {
      
      String publicID = locator.getPublicId();
      String systemID = locator.getSystemId();
      int line        = locator.getLineNumber();
      int column      = locator.getColumnNumber();
      
      String name;
      if (publicID != null) name = publicID;
      else name = systemID;
      
      return " in " + name + " at line " + line 
       + ", column " + column;
    }
    return "";
    
  }
  
  public void startDocument() throws SAXException {
    System.out.println("Document started" + reportPosition()); 
  }

  public void endDocument() throws SAXException {
    System.out.println("Document ended" + reportPosition()); 
  }
  
  public void characters(char[] text, int start, int length) 
   throws SAXException {
    System.out.println("Got some characters" + reportPosition()); 
  }
  
  public void ignorableWhitespace(char[] text, int start, int length)
   throws SAXException {
    System.out.println("Got some ignorable white space" 
     + reportPosition()); 
  }
  
  public void processingInstruction(String target, String data)
   throws SAXException {
    System.out.println("Got a processing instruction" 
     + reportPosition()); 
  }
  
  // Changed methods for SAX2
  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes atts) throws SAXException {
    System.out.println("Element " + qualifiedName + " started" 
     + reportPosition()); 
  }
  
  public void endElement(String namespaceURI, String localName,
   String qualifiedName) throws SAXException {
    System.out.println("Element " + qualifiedName + " ended" 
     + reportPosition()); 
  } 

  // new methods for SAX2
  public void startPrefixMapping(String prefix, String uri) 
   throws SAXException {
    System.out.println("Started mapping prefix " + prefix 
     + " to URI " + uri + reportPosition());     
  }

  public void endPrefixMapping(String prefix) throws SAXException {
    System.out.println("Stopped mapping prefix " 
     + prefix + reportPosition());         
  }

  public void skippedEntity(String name) throws SAXException {
    System.out.println("Skipped entity " + name + reportPosition());         
  }  

  // Could easily have put main() method in a separate class
  public static void main(String[] args) {
    
    XMLReader parser;
    try {
      parser = XMLReaderFactory.createXMLReader();
    }
    catch (SAXException ex) {
      try {
        parser = XMLReaderFactory.createXMLReader(
         "org.apache.xerces.parsers.SAXParser");
      }
      catch (SAXException e2) {
        System.err.println("Error: no parser found!");
        return; 
      }
    }
     
    if (args.length == 0) {
      System.out.println(
       "Usage: java LocationReporter URL1 URL2..."); 
    } 
      
    // Install the Content Handler      
    parser.setContentHandler(new LocationReporter());
    
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        parser.parse(args[i]);
      }
      catch (SAXParseException e) { // well-formedness error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage()
         + " at line " + e.getLineNumber() 
         + ", column " + e.getColumnNumber());
      }
      catch (SAXException e) { // some other kind of error
        System.out.println(e.getMessage());
      }
      catch (IOException e) {
        System.out.println("Could not report on " + args[i] 
         + " because of the IOException " + e);
      }
      
    }  
  
  }

}

Locator Example

Document started in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 1, column 1
Got a processing instruction in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 2, column 51
Started mapping prefix  to URI http://metalab.unc.edu/xml/namespace/song in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 5, column 50
Started mapping prefix xlink to URI http://www.w3.org/1999/xlink in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 5, column 50
Element SONG started in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 5, column 50
Got some ignorable white space in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 6, column 3
Element TITLE started in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 6, column 10
Got some characters in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 6, column 17
Element TITLE ended in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 6, column 26
Got some ignorable white space in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 7, column 3
Element PHOTO started in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 9, column 65
Element PHOTO ended in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 9, column 65
Got some ignorable white space in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 10, column 3
Element COMPOSER started in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 10, column 13
Got some characters in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 10, column 27
Element COMPOSER ended in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 10, column 39
Got some ignorable white space in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 11, column 3
Element COMPOSER started in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 11, column 13
Got some characters in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 11, column 25
Element COMPOSER ended in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 11, column 37
Got some ignorable white space in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 12, column 3
Element COMPOSER started in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 12, column 13
Got some characters in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 12, column 26
Element COMPOSER ended in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 12, column 38
Got some ignorable white space in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 13, column 3
Element PRODUCER started in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 13, column 13
Got some characters in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 13, column 27
Element PRODUCER ended in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 13, column 39
Got some ignorable white space in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 14, column 3
Got some ignorable white space in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 16, column 3
Element PUBLISHER started in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 16, column 73
Got some characters in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 17, column 7
Got some characters in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 17, column 12
Got some characters in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 18, column 3
Element PUBLISHER ended in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 18, column 16
Got some ignorable white space in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 19, column 3
Element LENGTH started in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 19, column 11
Got some characters in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 19, column 15
Element LENGTH ended in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 19, column 25
Got some ignorable white space in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 20, column 3
Element YEAR started in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 20, column 9
Got some characters in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 20, column 13
Element YEAR ended in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 20, column 21
Got some ignorable white space in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 21, column 3
Element ARTIST started in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 21, column 11
Got some characters in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 21, column 25
Element ARTIST ended in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 21, column 35
Got some ignorable white space in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 22, column 1
Element SONG ended in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 22, column 9
Stopped mapping prefix xlink in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 22, column 9
Stopped mapping prefix  in file:///J:/KENTUCKY/xmlandjava/EXAMPLES/hotcop.xml at line 22, column 9
Document ended in Null Entity at line -1, column -1

The DefaultHandler class
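
DefaultHandler (org.xml.sax.helpers) implements ContentHandler, DTDHandler, EntityResolver, and ErrorHandler with do-nothing methods, so a subclass overrides only the callbacks it cares about. The following is a minimal sketch: the ElementCounter class is invented for illustration, and the parser is assumed to be the one bundled with the JDK, obtained through JAXP.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: counts elements by overriding one callback.
// Every other ContentHandler method is inherited as a no-op.
class ElementCounter extends DefaultHandler {

  private int count = 0;

  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes atts) {
    count++;
  }

  // Parses the given document text and returns the number of elements.
  static int count(String xml) throws Exception {
    XMLReader parser
     = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    ElementCounter counter = new ElementCounter();
    parser.setContentHandler(counter);
    parser.parse(new InputSource(new StringReader(xml)));
    return counter.count;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(
     count("<SONG><TITLE>Hot Cop</TITLE><YEAR>1978</YEAR></SONG>"));
  }

}
```

Compare this with LocationReporter, which implements ContentHandler directly and therefore has to spell out all eleven methods.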


The NamespaceSupport class
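
NamespaceSupport (org.xml.sax.helpers) keeps a stack of prefix mappings and splits qualified names into a namespace URI and a local name. The following is a minimal sketch of its use outside a parser: the NamespaceSupportDemo class and its resolve() helper are invented for illustration, and the declarations mirror the SONG element of the sample document.

```java
import org.xml.sax.helpers.NamespaceSupport;

// Hypothetical demo class, not part of SAX
class NamespaceSupportDemo {

  // Resolves a qualified name against the SONG element's declarations
  // and returns "{uri}localName", or null for an undeclared prefix.
  static String resolve(String qualifiedName, boolean isAttribute) {

    NamespaceSupport support = new NamespaceSupport();
    support.pushContext();   // entering the SONG element
    support.declarePrefix("",
     "http://www.cafeconleche.org/namespace/song");
    support.declarePrefix("xlink", "http://www.w3.org/1999/xlink");

    // processName() fills the array with URI, local name, qualified name
    String[] parts = new String[3];
    parts = support.processName(qualifiedName, parts, isAttribute);
    support.popContext();    // leaving the SONG element

    if (parts == null) return null;
    return "{" + parts[0] + "}" + parts[1];

  }

  public static void main(String[] args) {
    System.out.println(resolve("TITLE", false));
    System.out.println(resolve("xlink:href", true));
    System.out.println(resolve("ALT", true));
  }

}
```

The isAttribute flag matters because the default namespace applies to unprefixed element names but never to unprefixed attribute names.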


Filtering XML


XMLFilter Example

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.util.*;
import java.io.IOException;


public class UnparsedTextFilter extends XMLFilterImpl {

  private TextEntityReplacer replacer;

  public UnparsedTextFilter(XMLReader parent) {
    super(parent);
  }

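  public void parse(InputSource input) 
   throws IOException, SAXException {

    replacer = new TextEntityReplacer(input.getPublicId(), 
     input.getSystemId());
    // Notation and unparsed entity declarations are passed on to
    // the replacer, which startElement() consults
    this.setDTDHandler(replacer); 
    // XMLFilterImpl.parse() registers this filter as the parent's
    // handlers and then starts the parent parsing
    super.parse(input);
  }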
  // The other parse() method just calls this one 

  public void parse(String systemId) 
   throws IOException, SAXException {
    parse(new InputSource(systemId)); 
  }

  public void startElement(String uri, String localName, 
   String qualifiedName, Attributes attributes) throws SAXException {
    Vector extraText = new Vector();

    // Are there any unparsed entities in the attributes?
    for (int i = 0; i < attributes.getLength(); i++) {
      if (attributes.getType(i).equals("ENTITY")) {
        try {
          String s = replacer.getText(attributes.getValue(i));
          if (s != null) extraText.addElement(s);
        }
        catch (IOException e) {
          System.err.println(e); 
        }
      } 
      
    }    

    super.startElement(uri, localName, qualifiedName, attributes);
    
    // Now spew out the values of the unparsed entities:
    Enumeration e = extraText.elements();
    while (e.hasMoreElements()) {
      Object o = e.nextElement();
      String s = (String) o;
      super.characters(s.toCharArray(), 0, s.length()); 
    }
    
  }

}

TextMerger

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.util.*;
import java.io.IOException;
import org.apache.xml.serialize.*;


public class TextMerger {

  public static void main(String[] args) {
  
    XMLReader base;
    try {
     base = XMLReaderFactory.createXMLReader(
      "org.apache.xerces.parsers.SAXParser");
    }
    catch (Exception e) {
      // fall back on default parser
      try {
        base = XMLReaderFactory.createXMLReader();
      }
      catch (Exception ee) {
        System.err.println("Couldn't locate a SAX parser");
        return;          
      }
    }
    
    XMLReader parser = new UnparsedTextFilter(base);
    
    //essentially a pretty printer
    XMLSerializer printer 
     = new XMLSerializer(System.out, new OutputFormat());
    
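    // Attach the printer to the filter, not to the base parser:
    // XMLFilterImpl replaces the base parser's handlers when it parses
    parser.setContentHandler(printer);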
    
    for (int i = 0; i < args.length; i++) {
      try {
        System.out.println("Parsing " + args[i]);
        parser.parse(args[i]);
      }
      catch (SAXParseException e) { // well-formedness error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage()
         + " at line " + e.getLineNumber() 
         + ", column " + e.getColumnNumber());
      }
      catch (SAXException e) { // some other kind of error
        System.out.println(e.getMessage());
      }
      catch (IOException e) {
        System.out.println("Could not report on " + args[i] 
         + " because of the IOException " + e);
      }      
    } // end for
    System.out.flush();
  
  }

}

InputSource


The InputSource class

package org.xml.sax;

import java.io.*;

public class InputSource {

  public InputSource() 
  public InputSource(String systemID) 
  public InputSource(InputStream in)
  public InputSource(Reader in)

  public void   setPublicId(String publicID)
  public String getPublicId()
  public void   setSystemId(String systemID)
  public String getSystemId()

  public void        setByteStream(InputStream byteStream)
  public InputStream getByteStream()
  public void        setEncoding(String encoding)
  public String      getEncoding()
  public void        setCharacterStream(Reader characterStream)
  public Reader      getCharacterStream()

}

Example of InputSource

import org.xml.sax.*;
import java.io.*;
import java.net.*;
import java.util.zip.*;
...
try {

  URL u = new URL(
   "http://www.cafeconleche.org/examples/1998validstats.xml.gz"); 
  InputStream raw = u.openStream();
  InputStream decompressed = new GZIPInputStream(raw);
  InputSource in = new InputSource(decompressed);
  // read the document... 

}
catch (IOException e) {
  System.err.println(e);
}
catch (SAXException e) {
  System.err.println(e);
}
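
An InputSource can also wrap a Reader, which is how a document that is already in memory gets parsed. The following is a minimal sketch: the StringSourceDemo class and the example.com system ID are invented for illustration, and the parser is assumed to be the one bundled with the JDK, obtained through JAXP.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class
class StringSourceDemo {

  // Returns the name of the root element of a document held in a String.
  static String rootName(String xml) throws Exception {

    InputSource in = new InputSource(new StringReader(xml));
    // The system ID is optional for a character stream, but it gives
    // the parser a base for resolving relative URLs. This one is made up.
    in.setSystemId("http://www.example.com/in-memory.xml");

    final String[] root = new String[1];
    XMLReader parser
     = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    parser.setContentHandler(new DefaultHandler() {
      public void startElement(String namespaceURI, String localName,
       String qualifiedName, Attributes atts) {
        if (root[0] == null) root[0] = qualifiedName;
      }
    });
    parser.parse(in);
    return root[0];

  }

  public static void main(String[] args) throws Exception {
    System.out.println(rootName("<SONG><TITLE>Hot Cop</TITLE></SONG>"));
  }

}
```

When both a character stream and a system ID are set, the parser reads from the stream and uses the system ID only for error messages and for resolving relative references.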

What SAX2 doesn't do


Event Based API Caveats


To Learn More



Questions?


Part IV: DOM, The Document Object Model

The DOM (like XML) is not a triumph of elegance; it's a triumph of "if we do not hang together, we shall hang separately." At least the Browser Wars were not followed by API Wars. Better a common API that we all love to hate than a bazillion contending APIs that carve the Web up into contending enclaves of True Believers.

--Mike Champion on the xml-dev mailing list, Thursday, September 27, 2001


Where we're going


Trees


Document Object Model


DOM Evolution


DOM Implementations for Java


Eight Modules:


DOM Trees


org.w3c.dom


The DOM Process

  1. Library-specific code creates a parser.

  2. The parser parses the document and returns a DOM org.w3c.dom.Document object.

  3. The entire document is stored in memory.

  4. DOM methods and interfaces are used to extract data from this object.


Parsing documents with a DOM Parser Example

import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMParserMaker {

  public static void main(String[] args) {
     
    // This is simpler but less flexible than the SAX approach.
    // Perhaps a good creational design pattern is needed here?   
  
    DOMParser parser = new DOMParser();
    
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]); 
       
        Document d = parser.getDocument();
        // work with the document...
      }
      catch (SAXException e) {
        System.err.println(e); 
      }
      catch (IOException e) {
        System.err.println(e); 
      }
      
    }
   
  }

}

The JAXP Process

  1. javax.xml.parsers.DocumentBuilderFactory.newInstance() creates a DocumentBuilderFactory.

  2. The factory's newDocumentBuilder() method creates a DocumentBuilder.

  3. The builder parses the document and returns a DOM org.w3c.dom.Document object.

  4. The entire document is stored in memory.

  5. DOM methods and interfaces are used to extract data from this object.


Parsing documents with a JAXP DocumentBuilder

import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class JAXPParserMaker {

  public static void main(String[] args) {
     
    try {       
      DocumentBuilderFactory builderFactory 
       = DocumentBuilderFactory.newInstance();
      DocumentBuilder parser 
       = builderFactory.newDocumentBuilder();
    
      for (int i = 0; i < args.length; i++) {
        try {
          // Read the entire document into memory
          Document d = parser.parse(args[i]); 
          // work with the document...
        }
        catch (SAXException e) {
          System.err.println(e); 
        }
        catch (IOException e) {
          System.err.println(e); 
        }
      
      } // end for
      
    }
    catch (ParserConfigurationException e) {
      System.err.println("You need to install a JAXP aware parser.");
    }
   
  }

}

The Node Interface

package org.w3c.dom;

public interface Node {

  // NodeType
  public static final short ELEMENT_NODE                = 1;
  public static final short ATTRIBUTE_NODE              = 2;
  public static final short TEXT_NODE                   = 3;
  public static final short CDATA_SECTION_NODE          = 4;
  public static final short ENTITY_REFERENCE_NODE       = 5;
  public static final short ENTITY_NODE                 = 6;
  public static final short PROCESSING_INSTRUCTION_NODE = 7;
  public static final short COMMENT_NODE                = 8;
  public static final short DOCUMENT_NODE               = 9;
  public static final short DOCUMENT_TYPE_NODE          = 10;
  public static final short DOCUMENT_FRAGMENT_NODE      = 11;
  public static final short NOTATION_NODE               = 12;

  public String       getNodeName();
  public String       getNodeValue() throws DOMException;
  public void         setNodeValue(String nodeValue) throws DOMException;
  public short        getNodeType();
  public Node         getParentNode();
  public NodeList     getChildNodes();
  public Node         getFirstChild();
  public Node         getLastChild();
  public Node         getPreviousSibling();
  public Node         getNextSibling();
  public NamedNodeMap getAttributes();
  public Document     getOwnerDocument();
  public Node         insertBefore(Node newChild, Node refChild) throws DOMException;
  public Node         replaceChild(Node newChild, Node oldChild) throws DOMException;
  public Node         removeChild(Node oldChild) throws DOMException;
  public Node         appendChild(Node newChild) throws DOMException;
  public boolean      hasChildNodes();
  public Node         cloneNode(boolean deep);
  public void         normalize();
  public boolean      isSupported(String feature, String version);
  public String       getNamespaceURI();
  public String       getPrefix();
  public void         setPrefix(String prefix) throws DOMException;
  public String       getLocalName();
  public boolean      hasAttributes();
  
}

The NodeList Interface

package org.w3c.dom;

public interface NodeList {
  public Node item(int index);
  public int  getLength();
}

Now we're really ready to read a document


Node Reporter

import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;


public class NodeReporter {

  public static void main(String[] args) {
     
    try {       
      DocumentBuilderFactory builderFactory 
       = DocumentBuilderFactory.newInstance();
      DocumentBuilder parser 
       = builderFactory.newDocumentBuilder();
      NodeReporter iterator = new NodeReporter();
        
      for (int i = 0; i < args.length; i++) {
        try {
          // Read the entire document into memory
          Document doc = parser.parse(args[i]); 
          iterator.followNode(doc);
        }
        catch (SAXException ex) {
          System.err.println(args[i] + " is not well-formed."); 
        }
        catch (IOException ex) {
          System.err.println(ex); 
        }
      }
    }
    catch (ParserConfigurationException ex) {
      System.err.println("You need to install a JAXP aware parser.");
    }
  
  } // end main

  // note use of recursion
  public void followNode(Node node) {
    
    processNode(node);
    if (node.hasChildNodes()) {
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        followNode(children.item(i));
      } 
    }
    
  }

  public void processNode(Node node) {
    
    String name = node.getNodeName();
    String type = getTypeName(node.getNodeType());
    System.out.println("Type " + type + ": " + name);
    
  }
  
  public static String getTypeName(int type) {
    
    switch (type) {
      case Node.ELEMENT_NODE: 
        return "Element";
      case Node.ATTRIBUTE_NODE: 
        return "Attribute";
      case Node.TEXT_NODE: 
        return "Text";
      case Node.CDATA_SECTION_NODE: 
        return "CDATA Section";
      case Node.ENTITY_REFERENCE_NODE: 
        return "Entity Reference";
      case Node.ENTITY_NODE: 
        return "Entity";
      case Node.PROCESSING_INSTRUCTION_NODE: 
        return "Processing Instruction";
      case Node.COMMENT_NODE : 
        return "Comment";
      case Node.DOCUMENT_NODE: 
        return "Document";
      case Node.DOCUMENT_TYPE_NODE: 
        return "Document Type Declaration";
      case Node.DOCUMENT_FRAGMENT_NODE: 
        return "Document Fragment";
      case Node.NOTATION_NODE: 
        return "Notation";
      default: 
        return "Unknown Type"; 
    }
    
  }

}

Node Reporter Output

% java NodeReporter hotcop.xml
Type Document: #document
Type Processing Instruction: xml-stylesheet
Type Document Type Declaration: SONG
Type Element: SONG
Type Text: #text
Type Element: TITLE
Type Text: #text
Type Text: #text
Type Element: PHOTO
Type Text: #text
Type Element: COMPOSER
Type Text: #text
Type Text: #text
Type Element: COMPOSER
Type Text: #text
Type Text: #text
Type Element: COMPOSER
Type Text: #text
Type Text: #text
Type Element: PRODUCER
Type Text: #text
Type Text: #text
Type Comment: #comment
Type Text: #text
Type Element: PUBLISHER
Type Text: #text
Type Text: #text
Type Element: LENGTH
Type Text: #text
Type Text: #text
Type Element: YEAR
Type Text: #text
Type Text: #text
Type Element: ARTIST
Type Text: #text
Type Text: #text
Type Comment: #comment

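Attributes are missing from this output. DOM represents each attribute as an Attr node, but attributes are not children of their element, so a walk over getChildNodes() never visits them. They are reached through the element's getAttributes() method instead.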
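
Attributes can still be read from each element through getAttributes(), which returns a NamedNodeMap. The following is a minimal sketch: the AttributeReporter class is invented for illustration, and the input is a cut-down version of the PHOTO element. The result goes into a sorted map because a NamedNodeMap does not promise any particular order.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class
class AttributeReporter {

  // Returns the attributes of the root element as name/value pairs.
  static Map<String, String> attributesOf(String xml) throws Exception {

    DocumentBuilder parser
     = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = parser.parse(new InputSource(new StringReader(xml)));
    Element root = doc.getDocumentElement();

    NamedNodeMap attributes = root.getAttributes();
    Map<String, String> result = new TreeMap<String, String>();
    for (int i = 0; i < attributes.getLength(); i++) {
      Node attribute = attributes.item(i);
      result.put(attribute.getNodeName(), attribute.getNodeValue());
    }
    return result;

  }

  public static void main(String[] args) throws Exception {
    System.out.println(attributesOf(
     "<PHOTO ALT=\"Victor Willis in Cop Outfit\""
     + " WIDTH=\"100\" HEIGHT=\"200\"/>"));
  }

}
```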


Node Values as returned by getNodeValue()

Node Type                          Node Value
element node                       null
attribute node                     attribute value
text node                          text of the node
CDATA section node                 text of the section
entity reference node              null
entity node                        null
processing instruction node        content of the processing instruction, not including the target
comment node                       text of the comment
document node                      null
document type declaration node     null
document fragment node             null
notation node                      null
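
The values listed above can be checked with a short program. The following is a minimal sketch: the NodeValueReporter class and its inline document are invented for illustration. It walks the tree the same way NodeReporter does but records getNodeValue() next to each node name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class
class NodeValueReporter {

  // Returns one "name: value" line per node, in document order.
  static List<String> report() throws Exception {

    String xml =
        "<?xml-stylesheet type=\"text/css\" href=\"song.css\"?>"
      + "<SONG><!--a comment--><TITLE>Hot Cop</TITLE></SONG>";
    Document doc = DocumentBuilderFactory.newInstance()
     .newDocumentBuilder()
     .parse(new InputSource(new StringReader(xml)));

    List<String> lines = new ArrayList<String>();
    follow(doc, lines);
    return lines;

  }

  // note use of recursion
  private static void follow(Node node, List<String> lines) {
    lines.add(node.getNodeName() + ": " + node.getNodeValue());
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      follow(children.item(i), lines);
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> lines = report();
    for (int i = 0; i < lines.size(); i++) {
      System.out.println(lines.get(i));
    }
  }

}
```

The document, SONG, and TITLE nodes report null, while the processing instruction, comment, and text nodes report their content.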

The Document Node


The Document Interface

package org.w3c.dom;

public interface Document extends Node {

  public DocumentType      getDoctype();
  public DOMImplementation getImplementation();
  public Element           getDocumentElement();
  public Element           createElement(String tagName) throws DOMException;
  public Element           createElementNS(String namespaceURI, String qualifiedName) throws DOMException;
  public DocumentFragment  createDocumentFragment();
  public Text              createTextNode(String data);
  public Comment           createComment(String data);
  public CDATASection      createCDATASection(String data) throws DOMException;
  public ProcessingInstruction createProcessingInstruction(String target, String data)
   throws DOMException;
  public Attr              createAttribute(String name) throws DOMException;
  public Attr              createAttributeNS(String namespaceURI, String qualifiedName) throws DOMException;
  public EntityReference   createEntityReference(String name) throws DOMException;
  public NodeList          getElementsByTagName(String tagname);
  public NodeList          getElementsByTagNameNS(String namespaceURI, String localName);
  public Element           getElementById(String elementId);
  public Node              importNode(Node importedNode, boolean deep) throws DOMException;

}

A Sample Application

Full list

DOM Design


Weblogs with DOM

import org.w3c.dom.*;
import org.xml.sax.SAXException;
import java.io.IOException;
import java.util.*;
import java.net.*;


public class WeblogsDOM {

  public static String DEFAULT_URL
   = "http://static.userland.com/weblogMonitor/logs.xml";

  public static List listChannels() throws DOMException {
    return listChannels(DEFAULT_URL);
  }

  public static List listChannels(String uri) throws DOMException {

    if (uri == null) {
      throw new NullPointerException("URL must be non-null");
    }

    org.apache.xerces.parsers.DOMParser parser
     = new org.apache.xerces.parsers.DOMParser();

    // Start with an empty list so that a failed parse never returns null
    Vector urls = new Vector();

    try {
      // Read the entire document into memory
      parser.parse(uri);
      Document doc = parser.getDocument();
      NodeList logs = doc.getElementsByTagName("url");

      urls = new Vector(logs.getLength());

      for (int i = 0; i < logs.getLength(); i++) {
        try {
          Node element = logs.item(i);
          Node text = element.getFirstChild();
          String content = text.getNodeValue();
          URL u = new URL(content);
          urls.addElement(u);
        }
        catch (MalformedURLException e) {
          // bad input data from one third party; just ignore it
        }
      }
    }
    catch (SAXException e) {
      System.err.println(e);
    }
    catch (IOException e) {
      System.err.println(e);
    }

    return urls;

  }

  public static void main(String[] args) {

    try {
      List urls;
      if (args.length > 0) {
        try {
          URL url = new URL(args[0]);
          urls = listChannels(args[0]);
        }
        catch (MalformedURLException e) {
          System.err.println("Usage: java WeblogsDOM url");
          return;
        }
      }
      else {
        urls = listChannels();
      }
      Iterator iterator = urls.iterator();
      while (iterator.hasNext()) {
        System.out.println(iterator.next());
      }
    }
    catch (/* Unexpected */ Exception e) {
      e.printStackTrace();
    }

  } // end main

}

Weblogs Output

% java WeblogsDOM
http://2020Hindsight.editthispage.com/
http://www.sff.net/people/mitchw/weblog/weblog.htp
http://nate.weblogs.com/
http://plugins.launchpoint.net
http://404.psistorm.net
http://home.att.net/~geek9000
http://daubnet.tzo.com/weblog
several hundred more...

Element Nodes


The Element Interface

package org.w3c.dom;

public interface Element extends Node {

  public String   getTagName();

  public NodeList getElementsByTagName(String name);
  public NodeList getElementsByTagNameNS(String namespaceURI, 
   String localName);

  public String   getAttribute(String name);
  public String   getAttributeNS(String namespaceURI, 
   String localName);
  public void     setAttribute(String name, String value) 
   throws DOMException;
  public void     setAttributeNS(String namespaceURI, 
   String qualifiedName, String value) throws DOMException;
  public void     removeAttribute(String name) throws DOMException;
  public void     removeAttributeNS(String namespaceURI, 
   String localName) throws DOMException;
  public Attr     getAttributeNode(String name);
  public Attr     getAttributeNodeNS(String namespaceURI, String localName);
  public Attr     setAttributeNode(Attr newAttr) throws DOMException;
  public Attr     setAttributeNodeNS(Attr newAttr) throws DOMException;
  public Attr     removeAttributeNode(Attr oldAttr) throws DOMException;

}

IDTagger

import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.IOException;
import org.apache.xml.serialize.*;


public class IDTagger {

  int id = 1;

  public void processNode(Node node) {
    
    if (node.getNodeType() == Node.ELEMENT_NODE) {
      
      Element element = (Element) node;
      String currentID = element.getAttribute("ID");
      if (currentID == null || currentID.equals("")) {
        element.setAttribute("ID", "_" + id);
        id = id + 1; 
      }
    }
    
  }

  // note use of recursion
  public void followNode(Node node) {
    
    processNode(node);
    if (node.hasChildNodes()) {
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        followNode(children.item(i));
      } 
    }
    
  }

  public static void main(String[] args) {
     
    DOMParser parser  = new DOMParser();
    IDTagger iterator = new IDTagger();
    
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]); 
       
        Document document = parser.getDocument();
        iterator.followNode(document);
        
        // now we serialize the document...
        OutputFormat format = new OutputFormat(document);
        XMLSerializer serializer 
         = new XMLSerializer(System.out, format);
        serializer.serialize(document);       
        
      }
      catch (SAXException e) {
        System.err.println(e); 
      }
      catch (IOException e) {
        System.err.println(e); 
      }
      
    }
  
  } // end main

}

Output from IDTagger

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<?xml-stylesheet type="text/css" href="song.css"?><!-- This should be a four digit year like "1999",
     not a two-digit year like "99" --><SONG xmlns="http://www.cafeconleche.org/namespace/song" ID="_1" xmlns:xlink="http://www.w3.org/1999/xlink">   <TITLE ID="_2">Hot Cop</TITLE>   <PHOTO ALT="Victor Willis in Cop Outfit" HEIGHT="200" ID="_3" WIDTH="100" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="hotcop.jpg" xlink:show="onLoad" xlink:type="simple"/>   <COMPOSER ID="_4">Jacques Morali</COMPOSER>   <COMPOSER ID="_5">Henri Belolo</COMPOSER>   <COMPOSER ID="_6">Victor Willis</COMPOSER>   <PRODUCER ID="_7">Jacques Morali</PRODUCER>   <!-- The publisher is actually Polygram but I needed 
       an example of a general entity reference. -->   <PUBLISHER ID="_8" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.amrecords.com/" xlink:type="simple">     A &amp; M Records   </PUBLISHER>   <LENGTH ID="_9">6:20</LENGTH>   <YEAR ID="_10">1978</YEAR>   <ARTIST ID="_11">Village People</ARTIST> </SONG><!-- You can tell what album I was 
     listening to when I wrote this example -->

CharacterData interface


The CharacterData Interface

package org.w3c.dom;

public interface CharacterData extends Node {

  public String getData() throws DOMException;
  public void   setData(String data) throws DOMException;
  public int    getLength();
  public String substringData(int offset, int count) 
   throws DOMException;
  public void   appendData(String arg) 
   throws DOMException;
  public void   insertData(int offset, String arg) 
   throws DOMException;
  public void   deleteData(int offset, int count) 
   throws DOMException;
  public void   replaceData(int offset, int count, String arg) 
   throws DOMException;
  
}
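The editing methods all work on offsets into the node's data. A short sketch, assuming a JAXP DocumentBuilder is available to supply the Document that creates the text node; CharacterDataDemo and edit() are hypothetical names:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Text;

// Hypothetical demonstration class, not part of the original slides
class CharacterDataDemo {

  // Text nodes can only be created by a Document, so make an empty one
  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance()
       .newDocumentBuilder().newDocument();
    }
    catch (ParserConfigurationException e) {
      throw new IllegalStateException("No JAXP parser available", e);
    }
  }

  // Applies each CharacterData editing method in turn.
  // The comments show the data when the argument is "Hot Cop".
  static String edit(String original) {
    Text text = newDocument().createTextNode(original);
    text.insertData(0, "Very ");     // "Very Hot Cop"
    text.replaceData(5, 3, "Cool");  // "Very Cool Cop"
    text.deleteData(0, 5);           // "Cool Cop"
    text.appendData("!");            // "Cool Cop!"
    return text.getData();
  }

  public static void main(String[] args) {
    System.out.println(edit("Hot Cop"));
  }

}
```

An offset or count outside the data makes these methods throw a DOMException rather than a StringIndexOutOfBoundsException.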

ROT13 XML Text

import org.apache.xerces.parsers.DOMParser;
import org.apache.xml.serialize.*;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import java.io.IOException;


public class ROT13XML {

  public void processNode(Node node) {
    
    if (node.getNodeType() == Node.TEXT_NODE
     || node.getNodeType() == Node.COMMENT_NODE
     || node.getNodeType() == Node.CDATA_SECTION_NODE) {
      CharacterData text = (CharacterData) node;
      String data = text.getData();
      text.setData(rot13(data));
    }
    
  }

  // note use of recursion
  public void followNode(Node node) {
    
    processNode(node);
    if (node.hasChildNodes()) {
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        followNode(children.item(i));
      } 
    }
    
  }
  
  public static String rot13(String s) {
    
    StringBuffer result = new StringBuffer(s.length());
    for (int i = 0; i < s.length(); i++) {
      int c = s.charAt(i);
      if (c >= 'A' && c <= 'M') result.append((char) (c+13));
      else if (c >= 'N' && c <= 'Z') result.append((char) (c-13));
      else if (c >= 'a' && c <= 'm') result.append((char) (c+13));
      else if (c >= 'n' && c <= 'z') result.append((char) (c-13));
      else result.append((char) c);
      
    } 
    return result.toString();
    
  }

  public static void main(String[] args) {
     
    DOMParser parser   = new DOMParser();
    ROT13XML  iterator = new ROT13XML();
    
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]); 
       
        Document document = parser.getDocument();
        iterator.followNode(document);
        
        // now we serialize the document...
        OutputFormat format = new OutputFormat(document);
        XMLSerializer serializer 
         = new XMLSerializer(System.out, format);
        serializer.serialize(document);
               
      }
      catch (SAXException e) {
        System.err.println(e); 
      }
      catch (IOException e) {
        System.err.println(e); 
      }
      
    }
  
  } // end main

}

ROT13 XML Output

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<?xml-stylesheet type="text/css" href="song.css"?>
<SONG xmlns="http://www.cafeconleche.org/namespace/song"
xmlns:xlink="http://www.w3.org/1999/xlink">   <TITLE>Ubg Pbc</TITLE>
<PHOTO ALT="Victor Willis in Cop Outfit" HEIGHT="200" WIDTH="100"
xlink:href="hotcop.jpg" xlink:show="onLoad" xlink:type="simple"/>
<COMPOSER>Wnpdhrf Zbenyv</COMPOSER>   <COMPOSER>Uraev Orybyb</COMPOSER>
<COMPOSER>Ivpgbe Jvyyvf</COMPOSER>   <PRODUCER>Wnpdhrf Zbenyv</PRODUCER>
<!-- Gur choyvfure vf npghnyyl Cbyltenz ohg V arrqrq         na rknzcyr
bs n trareny ragvgl ersrerapr. -->   <PUBLISHER
xlink:href="http://www.amrecords.com/" xlink:type="simple">     N &amp;
Z Erpbeqf   </PUBLISHER>   <LENGTH>6:20</LENGTH>   <YEAR>1978</YEAR>
<ARTIST>Ivyyntr Crbcyr</ARTIST> </SONG>
<!-- Lbh pna gryy jung nyohz V jnf 
     yvfgravat gb jura V jebgr guvf rknzcyr -->

Text Nodes


The Text Interface

package org.w3c.dom;

public interface Text extends CharacterData {

  public Text splitText(int offset) throws DOMException;
  
}
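splitText() breaks one text node into two adjacent siblings, and Node.normalize() merges adjacent text nodes back together. A small sketch under the assumption that a JAXP DocumentBuilder is available; SplitTextDemo and buildTitle() are names invented here:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

// Hypothetical demonstration class, not part of the original slides
class SplitTextDemo {

  // Build <TITLE>data</TITLE> in a fresh document and return the element
  static Element buildTitle(String data) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
       .newDocumentBuilder().newDocument();
      Element title = doc.createElement("TITLE");
      doc.appendChild(title);
      title.appendChild(doc.createTextNode(data));
      return title;
    }
    catch (ParserConfigurationException e) {
      throw new IllegalStateException("No JAXP parser available", e);
    }
  }

  public static void main(String[] args) {
    Element title = buildTitle("Hot Cop");
    Text head = (Text) title.getFirstChild();

    // The new node holds everything from the offset onward
    // and is inserted as the next sibling of the original node
    Text tail = head.splitText(3);
    System.out.println("[" + head.getData() + "] [" + tail.getData() + "]");
    System.out.println("children: " + title.getChildNodes().getLength());

    // normalize() merges the two adjacent text nodes again
    title.normalize();
    System.out.println("children: " + title.getChildNodes().getLength());
  }

}
```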

CDATA section Nodes


The CDATASection Interface

package org.w3c.dom;

public interface CDATASection extends Text {
}

DocumentType Nodes


The DocumentType Interface

package org.w3c.dom;

public interface DocumentType extends Node {

  public String       getName();
  public NamedNodeMap getEntities();
  public NamedNodeMap getNotations();
  public String       getPublicId();
  public String       getSystemId();
  public String       getInternalSubset();
  
}

Example of the DocumentType Interface


XHTMLValidator

import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.*;
import org.xml.sax.*;


public class XHTMLValidator {

  public static void main(String[] args) {
    
    if (args.length == 0) {
       System.err.println("Usage: java XHTMLValidator URL");
       return;   
    }

    try {
      DocumentBuilderFactory builderFactory 
       = DocumentBuilderFactory.newInstance();
      builderFactory.setNamespaceAware(true);
      builderFactory.setValidating(true);
      DocumentBuilder parser 
       = builderFactory.newDocumentBuilder();
      parser.setErrorHandler(new ValidityErrorReporter());

      Document document;
      try {
        document = parser.parse(args[0]); 
        // ValidityErrorReporter prints any validity errors detected
      }
      catch (SAXException e) {  
        System.out.println(args[0] + " is not valid."); 
        return; 
      }
      
      // If we get this far, then the document is valid XML.
      // Check to see whether the document is actually XHTML    
      DocumentType doctype = document.getDoctype();
  
      if (doctype == null) {
        System.out.println("No DOCTYPE"); 
        return;
      }
  
      String name     = doctype.getName();
      String systemID = doctype.getSystemId();
      String publicID = doctype.getPublicId();
    
      // Track the result so that a failed check suppresses the final message
      boolean isXHTML = true;

      if (!name.equals("html")) {
        System.out.println("Incorrect root element name " + name); 
        isXHTML = false;
      }
  
      if (publicID == null
       || (!publicID.equals("-//W3C//DTD XHTML 1.0 Strict//EN")
           && !publicID.equals(
                "-//W3C//DTD XHTML 1.0 Transitional//EN")
           && !publicID.equals(
                "-//W3C//DTD XHTML 1.0 Frameset//EN"))) {
        System.out.println(args[0] 
         + " does not seem to use an XHTML 1.0 DTD");
        isXHTML = false;
      }
  
      // Check the namespace on the root element
      Element root = document.getDocumentElement();
      String xmlnsValue = root.getAttribute("xmlns");
      if (!xmlnsValue.equals("http://www.w3.org/1999/xhtml")) {
        System.out.println(args[0] 
         + " does not properly declare the"
         + " http://www.w3.org/1999/xhtml"
         + " namespace on the root element");        
        isXHTML = false;
      }
      
      if (isXHTML) {
        System.out.println(args[0] + " is valid XHTML.");
      }
      
    }
    catch (IOException e) {
      System.err.println("Could not read " + args[0]);
    }
    catch (Exception e) {
      System.err.println(e);
      e.printStackTrace();
    }
    
  }

}

EntityReference Nodes


The EntityReference Interface

package org.w3c.dom;

public interface EntityReference extends Node {

}

Attr Nodes


The Attr Interface

package org.w3c.dom;

public interface Attr extends Node {

  public String   getName();
  public boolean  getSpecified();
  public String   getValue();
  public void     setValue(String value) throws DOMException;
  public Element  getOwnerElement();
  
}
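getSpecified() distinguishes attributes written in the instance document from attributes the parser supplied from a DTD default. The sketch below assumes a JAXP parser that reads the internal DTD subset and applies attribute defaults, which is the usual behavior; SpecifiedDemo, describe(), and the UNITS attribute are invented for the illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demonstration class, not part of the original slides
class SpecifiedDemo {

  // Parse a string into a DOM Document, wrapping the checked exceptions
  static Document parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      return factory.newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)));
    }
    catch (Exception e) {
      throw new IllegalStateException("Could not parse document", e);
    }
  }

  // Report the owner element, name, value, and origin of an attribute
  static String describe(Attr attr) {
    return attr.getOwnerElement().getTagName() + "/@" + attr.getName()
     + "=" + attr.getValue()
     + (attr.getSpecified() ? " (specified)" : " (defaulted)");
  }

  public static void main(String[] args) {
    // UNITS is never written on the element; the ATTLIST supplies it
    Document doc = parse(
     "<!DOCTYPE PHOTO [<!ATTLIST PHOTO UNITS CDATA \"pixels\">]>"
     + "<PHOTO WIDTH=\"100\"/>");
    Element photo = doc.getDocumentElement();
    System.out.println(describe(photo.getAttributeNode("WIDTH")));
    System.out.println(describe(photo.getAttributeNode("UNITS")));
  }

}
```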

XLinkSpider with DOM

import org.xml.sax.*;
import java.io.*;
import java.util.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;


public class DOMSpider {

  private static DocumentBuilder parser;
  
  // namespace support is turned off by default in JAXP
  static {
    try {
      DocumentBuilderFactory builderFactory 
       = DocumentBuilderFactory.newInstance();
      builderFactory.setNamespaceAware(true);
      parser = builderFactory.newDocumentBuilder();
    }
    catch (Exception ex) {
      throw new RuntimeException("Couldn't build a parser!");
    }
  }
  
  private static Vector visited = new Vector();
  
  private static int maxDepth = 5;
  private static int currentDepth = 0; 
  
  public static void listURIs(String systemId) {
    
    currentDepth++;
    try {
      if (currentDepth < maxDepth) {
        Document document = parser.parse(systemId);
    
        Vector uris = new Vector();
        // search the document for uris, 
        // store them in vector, and print them
        searchForURIs(document.getDocumentElement(), uris); 
    
        Enumeration e = uris.elements();
        while (e.hasMoreElements()) {
          String uri = (String) e.nextElement();
          visited.addElement(uri);
          listURIs(uri); 
        }
      
      }
    
    }
    catch (SAXException e) {
      // couldn't load the document, 
      // probably not well-formed XML, skip it 
    }
    catch (IOException e) {
      // couldn't load the document, 
      // likely network failure, skip it 
    }
    finally { 
      currentDepth--;
      System.out.flush();     
    }
      
  }
  
  // use recursion 
  public static void searchForURIs(Element element, Vector uris) {
    
    // look for XLinks in this element
    String uri = element.getAttributeNS("http://www.w3.org/1999/xlink", "href");

    if (uri != null && !uri.equals("") 
         && !visited.contains(uri) 
         && !uris.contains(uri)) {
      System.out.println(uri);
      uris.addElement(uri);
    }
    
    // process child elements recursively
    NodeList children = element.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node n = children.item(i);
      if (n instanceof Element) {
        searchForURIs((Element) n, uris);
      } 
    }
    
  }
  

  public static void main(String[] args) {
    
    if (args.length == 0) {
      System.out.println("Usage: java DOMSpider URL1 URL2..."); 
    } 
      
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      try {
        listURIs(args[i]);
      }
      catch (Exception e) {
        System.err.println(e);
        e.printStackTrace(); 
      }
      
    } // end for
  
  } // end main

} // end DOMSpider

ProcessingInstruction Nodes


The ProcessingInstruction Interface

package org.w3c.dom;

public interface ProcessingInstruction extends Node {

  public String getTarget();
  public String getData();
  public void   setData(String data) throws DOMException;
  
}

XLinkSpider that Respects robots processing instruction

import org.xml.sax.*;
import java.io.*;
import java.util.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;


public class PoliteDOMSpider {

  private static DocumentBuilder parser;
  
  // namespace support is turned off by default in JAXP
  static {
    try {
      DocumentBuilderFactory builderFactory 
       = DocumentBuilderFactory.newInstance();
      builderFactory.setNamespaceAware(true);
      parser = builderFactory.newDocumentBuilder();
    }
    catch (Exception ex) {
      throw new RuntimeException("Couldn't build a parser!");
    }
  }
  
  private static Vector visited = new Vector();
  
  private static int maxDepth = 5;
  private static int currentDepth = 0; 

  public static boolean robotsAllowed(Document document) {
    
    NodeList children = document.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node n = children.item(i);
      if (n instanceof ProcessingInstruction) {
        ProcessingInstruction pi = (ProcessingInstruction) n;
        if (pi.getTarget().equals("robots")) {
          String data = pi.getData();
          if (data.indexOf("follow=\"no\"") >= 0) {
            return false; 
          } 
        }
      }
    }
    
    return true;
    
  }
  
  public static void listURIs(String systemId) {
    
    currentDepth++;
    try {
      if (currentDepth < maxDepth) {
        Document document = parser.parse(systemId);
    
        if (robotsAllowed(document)) {
          Vector uris = new Vector();
          // search the document for uris,
          // store them in vector, print them
          searchForURIs(document.getDocumentElement(), uris);
    
          Enumeration e = uris.elements();
          while (e.hasMoreElements()) {
            String uri = (String) e.nextElement();
            visited.addElement(uri);
            listURIs(uri); 
          }
          
        }
      
      }
    
    }
    catch (SAXException e) {
      // couldn't load the document, 
      // probably not well-formed XML, skip it 
    }
    catch (IOException e) {
      // couldn't load the document, 
      // likely network failure, skip it 
    }
    finally { 
      currentDepth--;
      System.out.flush();     
    }
      
  }
  
  // use recursion 
  public static void searchForURIs(Element element, Vector uris) {
    
    // look for XLinks in this element
    String uri = element.getAttributeNS("http://www.w3.org/1999/xlink", "href");

    if (uri != null && !uri.equals("") 
         && !visited.contains(uri) 
         && !uris.contains(uri)) {
      System.out.println(uri);
      uris.addElement(uri);
    }
    
    // process child elements recursively
    NodeList children = element.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node n = children.item(i);
      if (n instanceof Element) {
        searchForURIs((Element) n, uris);
      } 
    }
    
  }
  

  public static void main(String[] args) {
    
    if (args.length == 0) {
      System.out.println("Usage: java PoliteDOMSpider URL1 URL2..."); 
    } 
      
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      try {
        listURIs(args[i]);
      }
      catch (Exception e) {
        System.err.println(e);
        e.printStackTrace(); 
      }
      
    } // end for
  
  } // end main

} // end PoliteDOMSpider
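robotsAllowed() looks for the exact substring follow="no", so it would miss follow='no' or follow = "no". DOM hands back the processing instruction data as one unparsed string; any pseudo-attribute structure has to be extracted by the program. One possible sketch using java.util.regex, where the class PseudoAttributes and its methods are hypothetical and the pattern is an approximation of attribute syntax, not a full XML name parser:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original slides
class PseudoAttributes {

  // name="value" or name='value', with optional white space around =
  private static final Pattern PSEUDO_ATTRIBUTE = Pattern.compile(
   "([A-Za-z_][\\w.-]*)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

  // Returns the value of the named pseudo-attribute, or null if absent
  static String get(String data, String name) {
    Matcher matcher = PSEUDO_ATTRIBUTE.matcher(data);
    while (matcher.find()) {
      if (matcher.group(1).equals(name)) {
        // group 2 is the double-quoted form, group 3 the single-quoted form
        return matcher.group(2) != null ? matcher.group(2) : matcher.group(3);
      }
    }
    return null;
  }

  // Same policy as robotsAllowed(): only an explicit follow="no" forbids
  static boolean followAllowed(String data) {
    return !"no".equals(get(data, "follow"));
  }

  public static void main(String[] args) {
    System.out.println(followAllowed("index=\"yes\" follow = 'no'"));
    System.out.println(followAllowed("index=\"no\""));
  }

}
```

In PoliteDOMSpider the call would take pi.getData() as its argument.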

Comment Nodes


The Comment Interface

package org.w3c.dom;

public interface Comment extends CharacterData {
}

Comment Example

import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;


public class DOMCommentReader {

  public static void main(String[] args) {
     
    DOMParser parser = new DOMParser();
    
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]); 
       
        Document d = parser.getDocument();
        processNode(d);
      }
      catch (SAXException e) {
        System.err.println(e); 
      }
      catch (IOException e) {
        System.err.println(e); 
      }
      
    }
  
  } // end main

  // note use of recursion
  public static void processNode(Node node) {
    
    int type = node.getNodeType();
    if (type == Node.COMMENT_NODE) {
      System.out.println(node.getNodeValue());
      System.out.println();
    }
    else {
      if (node.hasChildNodes()) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
          processNode(children.item(i));
        } 
      }
    }
    
  }

}

DOMCommentReader Output

% java DOMCommentReader hotcop.xml
 The publisher is actually Polygram but I needed
       an example of a general entity reference.

 You can tell what album I was
     listening to when I wrote this example

Or try http://www.w3.org/TR/1998/REC-xml-19980210.xml for more interesting output


Entity Nodes


The Entity Interface

package org.w3c.dom;

public interface Entity extends Node {

  public String  getPublicId();
  public String  getSystemId();
  public String  getNotationName();
  
}

DOMException
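DOMException is a runtime exception whose public code field says what went wrong. The sketch below triggers two common codes and then repairs the second with importNode(). It assumes a JAXP DOM implementation with error checking enabled, which is the default in the implementations I know of; DOMExceptionDemo and its methods are names made up for the example:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demonstration class, not part of the original slides
class DOMExceptionDemo {

  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance()
       .newDocumentBuilder().newDocument();
    }
    catch (ParserConfigurationException e) {
      throw new IllegalStateException("No JAXP parser available", e);
    }
  }

  // Element names may not contain spaces; returns the error code, or 0
  static short badName() {
    try {
      newDocument().createElement("bad name");
      return 0;
    }
    catch (DOMException e) {
      return e.code;
    }
  }

  // A node belongs to the document that created it; returns the code, or 0
  static short wrongDocument() {
    Document a = newDocument();
    Document b = newDocument();
    Element root = a.createElement("root");
    a.appendChild(root);
    Element foreign = b.createElement("child");
    try {
      root.appendChild(foreign);
      return 0;
    }
    catch (DOMException e) {
      // importNode() makes a copy owned by document a, which can be added
      root.appendChild(a.importNode(foreign, true));
      return e.code;
    }
  }

  public static void main(String[] args) {
    System.out.println(badName() == DOMException.INVALID_CHARACTER_ERR);
    System.out.println(wrongDocument() == DOMException.WRONG_DOCUMENT_ERR);
  }

}
```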


Questions?


The org.w3c.dom.traversal Package

Four interfaces: NodeIterator, NodeFilter, TreeWalker, and DocumentTraversal


NodeIterator

package org.w3c.dom.traversal;

public interface NodeIterator {

  public int        getWhatToShow();
  public NodeFilter getFilter();
  public boolean    getExpandEntityReferences();
  public Node       nextNode() throws DOMException;
  public Node       previousNode() throws DOMException;
  public void       detach();
    
}

ValueReporter

import org.apache.xerces.parsers.*;
import org.apache.xerces.dom.*;
import org.w3c.dom.*;
import org.w3c.dom.traversal.*;
import org.xml.sax.*;
import java.io.*;


public class ValueReporter {

  public static void main(String[] args) {
     
    DOMParser parser = new DOMParser();
    
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]); 
       
        Document doc = parser.getDocument();
        DocumentImpl impl = (DocumentImpl) doc;
        NodeIterator iterator = impl.createNodeIterator(
         doc.getDocumentElement(), NodeFilter.SHOW_ALL, null, true
        );
        Node node;
        while ((node = iterator.nextNode()) != null) {
          processNode(node);      
        }
      }
      catch (SAXException e) {
        System.err.println(e); 
      }
      catch (IOException e) {
        System.err.println(e); 
      }
      
    }
  
  } // end main

  public static void processNode(Node node) {
    
    String name = node.getNodeName();
    String type = getTypeName(node.getNodeType());
    String value = node.getNodeValue();
    System.out.println("Type " + type + ": " + name 
     + " \"" + value + "\"");
    
  }
  
  public static String getTypeName(int type) {
    
    switch (type) {
      case Node.ELEMENT_NODE: 
        return "Element";
      case Node.ATTRIBUTE_NODE: 
        return "Attribute";
      case Node.TEXT_NODE: 
        return "Text";
      case Node.CDATA_SECTION_NODE: 
        return "CDATA Section";
      case Node.ENTITY_REFERENCE_NODE: 
        return "Entity Reference";
      case Node.ENTITY_NODE: 
        return "Entity";
      case Node.PROCESSING_INSTRUCTION_NODE: 
        return "Processing Instruction";
      case Node.COMMENT_NODE: 
        return "Comment";
      case Node.DOCUMENT_NODE: 
        return "Document";
      case Node.DOCUMENT_TYPE_NODE: 
        return "Document Type Declaration";
      case Node.DOCUMENT_FRAGMENT_NODE: 
        return "Document Fragment";
      case Node.NOTATION_NODE: 
        return "Notation";
      default: 
        return "Unknown Type"; 
    }
    
  }

}
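ValueReporter reaches createNodeIterator() by casting to the Xerces DocumentImpl class. The method is declared in the standard DocumentTraversal interface, so the cast can target that interface instead. A sketch assuming the parser's Document object implements DocumentTraversal, which is checked at run time; PortableIterator and elementNames() are hypothetical names:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

// Hypothetical demonstration class, not part of the original slides
class PortableIterator {

  // Parse a string into a DOM Document, wrapping the checked exceptions
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)));
    }
    catch (Exception e) {
      throw new IllegalStateException("Could not parse document", e);
    }
  }

  // Collect element names in document order without naming a parser class
  static List<String> elementNames(Document doc) {
    if (!(doc instanceof DocumentTraversal)) {
      throw new UnsupportedOperationException(
       "This DOM implementation does not support traversal");
    }
    DocumentTraversal traversal = (DocumentTraversal) doc;
    NodeIterator iterator = traversal.createNodeIterator(
     doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);

    List<String> names = new ArrayList<String>();
    Node node;
    while ((node = iterator.nextNode()) != null) {
      names.add(node.getNodeName());
    }
    iterator.detach();  // release the iterator when finished
    return names;
  }

  public static void main(String[] args) {
    Document doc = parse(
     "<SONG><TITLE>Hot Cop</TITLE><YEAR>1978</YEAR></SONG>");
    System.out.println(elementNames(doc));
  }

}
```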

ValueReporter Output

% java ValueReporter hotcop.xml
Type Element: SONG "null"
Type Text: #text "
  "
Type Element: TITLE "null"
Type Text: #text "Hot Cop"
Type Text: #text "
  "
Type Element: PHOTO "null"
Type Text: #text "
  "
Type Element: COMPOSER "null"
Type Text: #text "Jacques Morali"
Type Text: #text "
  "
Type Element: COMPOSER "null"
Type Text: #text "Henri Belolo"
Type Text: #text "
  "
Type Element: COMPOSER "null"
Type Text: #text "Victor Willis"
Type Text: #text "
  "
Type Element: PRODUCER "null"
Type Text: #text "Jacques Morali"
Type Text: #text "
  "
Type Comment: #comment " The publisher is actually Polygram but I needed
       an example of a general entity reference. "
Type Text: #text "
  "
Type Element: PUBLISHER "null"
Type Text: #text "
    A & M Records
  "
Type Text: #text "
  "
Type Element: LENGTH "null"
Type Text: #text "6:20"
Type Text: #text "
  "
Type Element: YEAR "null"
Type Text: #text "1978"
Type Text: #text "
  "
Type Element: ARTIST "null"
Type Text: #text "Village People"
Type Text: #text "
"

Attributes are missing from this output. They are not children. They are properties of nodes.


NodeFilter

package org.w3c.dom.traversal;

public interface NodeFilter {

  // Constants returned by acceptNode
  public static final short FILTER_ACCEPT             = 1;
  public static final short FILTER_REJECT             = 2;
  public static final short FILTER_SKIP               = 3;

  // Constants for whatToShow
  public static final int   SHOW_ALL                  = 0xFFFFFFFF;
  public static final int   SHOW_ELEMENT              = 0x00000001;
  public static final int   SHOW_ATTRIBUTE            = 0x00000002;
  public static final int   SHOW_TEXT                 = 0x00000004;
  public static final int   SHOW_CDATA_SECTION        = 0x00000008;
  public static final int   SHOW_ENTITY_REFERENCE     = 0x00000010;
  public static final int   SHOW_ENTITY               = 0x00000020;
  public static final int   SHOW_PROCESSING_INSTRUCTION = 0x00000040;
  public static final int   SHOW_COMMENT              = 0x00000080;
  public static final int   SHOW_DOCUMENT             = 0x00000100;
  public static final int   SHOW_DOCUMENT_TYPE        = 0x00000200;
  public static final int   SHOW_DOCUMENT_FRAGMENT    = 0x00000400;
  public static final int   SHOW_NOTATION             = 0x00000800;

  public short        acceptNode(Node n);
    
}
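The whatToShow constants select by node type; acceptNode() is for finer decisions. As an illustration, the filter below passes only text nodes that contain something other than white space. It is a sketch that assumes the parser's Document implements DocumentTraversal; WhitespaceFilterDemo, NON_BLANK_TEXT, and textOf() are invented names:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

// Hypothetical demonstration class, not part of the original slides
class WhitespaceFilterDemo {

  // Accepts only nodes whose value contains non-white-space characters
  static final NodeFilter NON_BLANK_TEXT = new NodeFilter() {
    public short acceptNode(Node node) {
      String value = node.getNodeValue();
      if (value == null || value.trim().length() == 0) {
        return NodeFilter.FILTER_SKIP;
      }
      return NodeFilter.FILTER_ACCEPT;
    }
  };

  // Parse a string into a DOM Document, wrapping the checked exceptions
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)));
    }
    catch (Exception e) {
      throw new IllegalStateException("Could not parse document", e);
    }
  }

  // SHOW_TEXT narrows by type first; the filter then sees text nodes only
  static List<String> textOf(Document doc) {
    DocumentTraversal traversal = (DocumentTraversal) doc;
    NodeIterator iterator = traversal.createNodeIterator(
     doc.getDocumentElement(), NodeFilter.SHOW_TEXT, NON_BLANK_TEXT, true);
    List<String> result = new ArrayList<String>();
    Node node;
    while ((node = iterator.nextNode()) != null) {
      result.add(node.getNodeValue());
    }
    iterator.detach();
    return result;
  }

  public static void main(String[] args) {
    Document doc = parse(
     "<SONG>\n  <TITLE>Hot Cop</TITLE>\n  <YEAR>1978</YEAR>\n</SONG>");
    System.out.println(textOf(doc));
  }

}
```

With a NodeIterator, FILTER_SKIP and FILTER_REJECT have the same effect. They differ only for a TreeWalker, where FILTER_REJECT also excludes the node's descendants.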

DOM based TagStripper

import org.apache.xerces.parsers.*;
import org.apache.xerces.dom.*;
import org.w3c.dom.*;
import org.w3c.dom.traversal.*;
import org.xml.sax.SAXException;
import java.io.IOException;


public class DOMTagStripper {

  public static void main(String[] args) {
     
    DOMParser parser = new DOMParser();
    
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]); 
       
        Document doc = parser.getDocument();
        DocumentImpl impl = (DocumentImpl) doc;
        NodeIterator iterator = impl.createNodeIterator(
         doc.getDocumentElement(), NodeFilter.SHOW_TEXT, null, true
        );
        Node node;
        while ((node = iterator.nextNode()) != null) {
          System.out.print(node.getNodeValue());      
        }
      }
      catch (SAXException e) {
        System.err.println(e); 
      }
      catch (IOException e) {
        System.err.println(e); 
      }
      
    }
  
  } // end main

}

Output from a DOM based TagStripper

% java DOMTagStripper hotcop.xml

  Hot Cop
  Jacques Morali
  Henri Belolo
  Victor Willis
  Jacques Morali

  A & M Records
  6:20
  1978
  Village People

Writing XML Documents with DOM


org.apache.xerces.dom.DOMImplementationImpl

package org.apache.xerces.dom;

public class DOMImplementationImpl implements DOMImplementation {

  public boolean hasFeature(String feature, String version) 
  
  public static DOMImplementation getDOMImplementation()
  
  public DocumentType createDocumentType(String qualifiedName, 
   String publicID, String systemID) throws DOMException
                                          
  public Document createDocument(String namespaceURI, 
   String qualifiedName, DocumentType doctype)
   throws DOMException

} 

A Xerces/DOM program that writes Fibonacci numbers into an XML document

import java.math.BigInteger;
import java.io.*;
import org.w3c.dom.*;
import org.apache.xerces.dom.*;


public class FibonacciDOM {

  public static void main(String[] args) {

    try {

      DOMImplementation impl 
       = DOMImplementationImpl.getDOMImplementation();

      Document fibonacci 
       = impl.createDocument(null, "Fibonacci_Numbers", null);

      BigInteger low  = BigInteger.ONE;
      BigInteger high = BigInteger.ONE;

      Element root = fibonacci.getDocumentElement();

      for (int i = 1; i <= 25; i++) {
        Element number = fibonacci.createElement("fibonacci");
        number.setAttribute("index", Integer.toString(i));
        Text text = fibonacci.createTextNode(low.toString());
        number.appendChild(text);
        root.appendChild(number);

        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }

      // Now the document has been created and exists in memory
    }
    catch (DOMException e) {
      e.printStackTrace();
    }

  }

}

A JAXP/DOM program that writes Fibonacci numbers into an XML document

import java.math.BigInteger;
import java.io.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;


public class FibonacciJAXP {

  public static void main(String[] args) {

    try {       
      DocumentBuilderFactory factory 
       = DocumentBuilderFactory.newInstance();
      DocumentBuilder builder = factory.newDocumentBuilder();
      DOMImplementation impl = builder.getDOMImplementation();

      Document fibonacci 
       = impl.createDocument(null, "Fibonacci_Numbers", null);

      BigInteger low  = BigInteger.ONE;
      BigInteger high = BigInteger.ONE;

      Element root = fibonacci.getDocumentElement();

      for (int i = 1; i <= 25; i++) {
        Element number = fibonacci.createElement("fibonacci");
        number.setAttribute("index", Integer.toString(i));
        Text text = fibonacci.createTextNode(low.toString());
        number.appendChild(text);
        root.appendChild(number);

        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }

      // Now the document has been created and exists in memory
    }
    catch (DOMException e) {
      e.printStackTrace();
    }
    catch (ParserConfigurationException e) {
      System.err.println("You need to install a JAXP aware DOM implementation.");
    }
    
  }

}

Serialization


A DOM program that writes Fibonacci numbers onto System.out

import java.math.BigInteger;
import java.io.*;
import org.w3c.dom.*;
import org.apache.xerces.dom.*;
import org.apache.xml.serialize.*; 


public class FibonacciDOMSerializer {

  public static void main(String[] args) {
   
    try {
      
      DOMImplementation impl 
       = DOMImplementationImpl.getDOMImplementation();

      Document fibonacci 
       = impl.createDocument(null, "Fibonacci_Numbers", null);
      
      BigInteger low  = BigInteger.ONE;
      BigInteger high = BigInteger.ONE;      
      
      Element root = fibonacci.getDocumentElement(); 

      for (int i = 1; i <= 25; i++) {
        Element number = fibonacci.createElement("fibonacci");
        number.setAttribute("index", Integer.toString(i));
        Text text = fibonacci.createTextNode(low.toString());
        number.appendChild(text);
        root.appendChild(number);
        
        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }
      
      try {
        // Now that the document is created we need to *serialize* it
        OutputFormat format = new OutputFormat(fibonacci);
        XMLSerializer serializer 
         = new XMLSerializer(System.out, format);
        serializer.serialize(fibonacci);
      }
      catch (IOException e) {
        System.err.println(e); 
      }
    }
    catch (DOMException e) {
      e.printStackTrace();
    }

  }

}

fibonacci.xml

<?xml version="1.0" encoding="UTF-8"?>
<Fibonacci_Numbers><fibonacci index="1">1</fibonacci><fibonacci index="2">1</fibonacci><fibonacci index="3">2</fibonacci><fibonacci index="4">3</fibonacci><fibonacci index="5">5</fibonacci><fibonacci index="6">8</fibonacci><fibonacci index="7">13</fibonacci><fibonacci index="8">21</fibonacci><fibonacci index="9">34</fibonacci><fibonacci index="10">55</fibonacci><fibonacci index="11">89</fibonacci><fibonacci index="12">144</fibonacci><fibonacci index="13">233</fibonacci><fibonacci index="14">377</fibonacci><fibonacci index="15">610</fibonacci><fibonacci index="16">987</fibonacci><fibonacci index="17">1597</fibonacci><fibonacci index="18">2584</fibonacci><fibonacci index="19">4181</fibonacci><fibonacci index="20">6765</fibonacci><fibonacci index="21">10946</fibonacci><fibonacci index="22">17711</fibonacci><fibonacci index="23">28657</fibonacci><fibonacci index="24">46368</fibonacci><fibonacci index="25">75025</fibonacci></Fibonacci_Numbers>
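
XMLSerializer and OutputFormat are Xerces-specific classes. As a rough alternative, the sketch below serializes the same kind of document with nothing but JAXP, using the identity transformer from javax.xml.transform. It assumes a JAXP 1.1 or later implementation is available. The class name JAXPSerializerSketch and the helpers buildDocument and serialize are invented for this illustration.

```java
import java.io.StringWriter;
import java.math.BigInteger;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;


public class JAXPSerializerSketch {

  // Builds the Fibonacci document the same way FibonacciJAXP does
  public static Document buildDocument(int count) {
    try {
      DOMImplementation impl = DocumentBuilderFactory.newInstance()
       .newDocumentBuilder().getDOMImplementation();
      Document doc = impl.createDocument(null, "Fibonacci_Numbers", null);
      Element root = doc.getDocumentElement();

      BigInteger low  = BigInteger.ONE;
      BigInteger high = BigInteger.ONE;
      for (int i = 1; i <= count; i++) {
        Element number = doc.createElement("fibonacci");
        number.setAttribute("index", Integer.toString(i));
        number.appendChild(doc.createTextNode(low.toString()));
        root.appendChild(number);

        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }
      return doc;
    }
    catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  // Copies the DOM tree into a string with an identity transform
  public static String serialize(Document doc) {
    try {
      Transformer identity
       = TransformerFactory.newInstance().newTransformer();
      StringWriter out = new StringWriter();
      identity.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
    }
    catch (TransformerException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(serialize(buildDocument(25)));
  }

}
```

The trade-off is portability against control: this version runs on any JAXP implementation, but it gives up the finer-grained options of OutputFormat.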

OutputFormat

package org.apache.xml.serialize;

public class OutputFormat extends Object {

  public OutputFormat()
  public OutputFormat(String method, 
   String encoding, boolean indenting)
  public OutputFormat(Document doc)
  public OutputFormat(Document doc, 
   String encoding, boolean indenting)
  
  public String   getMethod()
  public void     setMethod(String method)
  public String   getVersion()
  public void     setVersion(String version)
  public int      getIndent()
  public boolean  getIndenting()
  public void     setIndent(int indent)
  public void     setIndenting(boolean on)
  public String   getEncoding()
  public void     setEncoding(String encoding)
  public String   getMediaType()
  public void     setMediaType(String mediaType)
  public void     setDoctype(String publicID, String systemID)
  public String   getDoctypePublic()
  public String   getDoctypeSystem()
  public boolean  getOmitXMLDeclaration()
  public void     setOmitXMLDeclaration(boolean omit)
  public boolean  getStandalone()
  public void     setStandalone(boolean standalone)
  public String[] getCDataElements()
  public boolean  isCDataElement(String tagName)
  public void     setCDataElements(String[] cdataElements)
  public String[] getNonEscapingElements()
  public boolean  isNonEscapingElement(String tagName)
  public void     setNonEscapingElements(String[] nonEscapingElements)
  public String   getLineSeparator()
  public void     setLineSeparator(String lineSeparator)
  public boolean  getPreserveSpace()
  public void     setPreserveSpace(boolean preserve)
  public int      getLineWidth()
  public void     setLineWidth(int lineWidth)
  public char     getLastPrintable()
  
  public static String whichMethod(Document doc)
  public static String whichDoctypePublic(Document doc)
  public static String whichDoctypeSystem(Document doc)
  public static String whichMediaType(String method)
  
}

Better formatted output

try {
  // Now that the document is created we need to *serialize* it
  OutputFormat format = new OutputFormat(fibonacci, "ISO-8859-1", true);
  format.setLineSeparator("\r\n");
  format.setLineWidth(72);
  format.setDoctype(null, "fibonacci.dtd");
  XMLSerializer serializer = new XMLSerializer(System.out, format);
  serializer.serialize(root);
}
catch (IOException e) {
  System.err.println(e); 
}

formatted_fibonacci.xml

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Fibonacci_Numbers SYSTEM "fibonacci.dtd">
<Fibonacci_Numbers>
    <fibonacci index="1">1</fibonacci>
    <fibonacci index="2">1</fibonacci>
    <fibonacci index="3">2</fibonacci>
    <fibonacci index="4">3</fibonacci>
    <fibonacci index="5">5</fibonacci>
    <fibonacci index="6">8</fibonacci>
    <fibonacci index="7">13</fibonacci>
    <fibonacci index="8">21</fibonacci>
    <fibonacci index="9">34</fibonacci>
    <fibonacci index="10">55</fibonacci>
    <fibonacci index="11">89</fibonacci>
    <fibonacci index="12">144</fibonacci>
    <fibonacci index="13">233</fibonacci>
    <fibonacci index="14">377</fibonacci>
    <fibonacci index="15">610</fibonacci>
    <fibonacci index="16">987</fibonacci>
    <fibonacci index="17">1597</fibonacci>
    <fibonacci index="18">2584</fibonacci>
    <fibonacci index="19">4181</fibonacci>
    <fibonacci index="20">6765</fibonacci>
    <fibonacci index="21">10946</fibonacci>
    <fibonacci index="22">17711</fibonacci>
    <fibonacci index="23">28657</fibonacci>
    <fibonacci index="24">46368</fibonacci>
    <fibonacci index="25">75025</fibonacci>
</Fibonacci_Numbers>
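
Several of the OutputFormat settings have approximate counterparts among the standard JAXP output properties. The sketch below is one way to request indenting, an encoding, and a document type declaration through javax.xml.transform.OutputKeys. The class name FormattedOutputSketch, its helper methods, and the three-element sample document are assumptions made for this illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;


public class FormattedOutputSketch {

  // Builds a three-element version of the Fibonacci document
  public static Document buildDocument() {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
       .newDocumentBuilder().newDocument();
      Element root = doc.createElement("Fibonacci_Numbers");
      doc.appendChild(root);
      String[] values = {"1", "1", "2"};
      for (int i = 0; i < values.length; i++) {
        Element number = doc.createElement("fibonacci");
        number.setAttribute("index", Integer.toString(i + 1));
        number.appendChild(doc.createTextNode(values[i]));
        root.appendChild(number);
      }
      return doc;
    }
    catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  // Serializes with indenting, an explicit encoding, and a DOCTYPE
  public static String format(Document doc) {
    try {
      Transformer transformer
       = TransformerFactory.newInstance().newTransformer();
      transformer.setOutputProperty(OutputKeys.INDENT, "yes");
      transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
      transformer.setOutputProperty(
       OutputKeys.DOCTYPE_SYSTEM, "fibonacci.dtd");
      StringWriter out = new StringWriter();
      transformer.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
    }
    catch (TransformerException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(format(buildDocument()));
  }

}
```

How far each level is indented is left to the transformer implementation. As far as I know, the standard output properties have no direct counterpart to setIndent() or setLineWidth().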

DOM based XMLPrettyPrinter

import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;
import org.apache.xerces.dom.*;
import org.apache.xml.serialize.*; 


public class DOMPrettyPrinter {

  public static void main(String[] args) { 
     
    DOMParser parser = new DOMParser();
    
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]); 
       
        Document document = parser.getDocument();
        
        OutputFormat format 
         = new OutputFormat(document, "UTF-8", true);
        format.setLineSeparator("\r\n");
        format.setIndenting(true);
        format.setIndent(2);
        format.setLineWidth(72);
        format.setPreserveSpace(false);
        XMLSerializer serializer 
         = new XMLSerializer(System.out, format);
        serializer.serialize(document);     
      }
      catch (SAXException e) {
        System.err.println(e); 
      }
      catch (IOException e) {
        System.err.println(e); 
      }
      
    }
  
  } // end main

}

Output from a DOM based XMLPrettyPrinter

<?xml version="1.0" encoding="UTF-8"?>
<!-- <!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd"> -->
<weblogs>
  <log>
    <name>MozillaZine</name>
    <url>http://www.mozillazine.org</url>
    <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>
    <ownerName>Jason Kersey</ownerName>
    <ownerEmail>kerz@en.com</ownerEmail>
    <description>THE source for news on the Mozilla Organization.
      DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
    <imageUrl/>
    <adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif</adImageUrl>
  </log>
  <log>
    <name>SalonHerringWiredFool</name>
    <url>http://www.salonherringwiredfool.com/</url>
    <ownerName>Some Random Herring</ownerName>
    <ownerEmail>salonfool@wiredherring.com</ownerEmail>
    <description/>
  </log>
  <log>
    <name>Scripting News</name>
    <url>http://www.scripting.com/</url>
    <ownerName>Dave Winer</ownerName>
    <ownerEmail>dave@userland.com</ownerEmail>
    <description>News and commentary from the cross-platform scripting community.</description>
    <imageUrl>http://www.scripting.com/gifs/tinyScriptingNews.gif</imageUrl>
    <adImageUrl>http://static.userland.com/weblogMonitor/ads/dave@userland.com.gif</adImageUrl>
  </log>
  <log>
    <name>SlashDot.Org</name>
    <url>http://www.slashdot.org/</url>
    <ownerName>Simply a friend</ownerName>
    <ownerEmail>afriendofweblogs@weblogs.com</ownerEmail>
    <description>News for Nerds, Stuff that Matters.</description>
  </log>
</weblogs>

The point is this:


Questions?


To Learn More



Part V: JDOM

There is no compelling reason for a Java API to manipulate XML to be complex, tricky, unintuitive, or a pain in the neck.
--JDOM Mission Statement


Where we're going


What is JDOM?


About JDOM


JDOM versions


Six packages:

org.jdom
the classes that represent an XML document and its parts
org.jdom.input
classes for reading a document into memory
org.jdom.output
classes for writing a document onto a stream or other target (e.g. SAX or DOM app)
org.jdom.adapters
classes for hooking up to DOM implementations
org.jdom.filter
classes to mask parts of tree while navigating
org.jdom.transform
XSLT support via TrAX

The org.jdom package

The classes that represent an XML document and its parts


The org.jdom.input package

Classes for reading a document into memory from a file or other source


The org.jdom.output package

The classes for writing a document to a file or other target


The org.jdom.filter package

Classes and interfaces for masking out parts of a JDOM tree before navigating it:


The org.jdom.adapters package


The org.jdom.transform package

Classes for XSLT support:


Writing XML Documents with JDOM


A JDOM program that writes this XML document

<?xml version="1.0"?>
<GREETING>
  Hello JDOM!
</GREETING>

Hello JDOM

import org.jdom.*;
import org.jdom.output.XMLOutputter;
import java.io.IOException;


public class HelloJDOM {

  public static void main(String[] args) {
   
    Element root = new Element("GREETING");
    	
    root.setText("Hello JDOM!");
         
    Document doc = new Document(root);      
    
    // At this point the document only exists in memory.
    // We still need to serialize it
    XMLOutputter outputter = new XMLOutputter();
    try {
      outputter.output(doc, System.out);       
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

Actual Output

<?xml version="1.0" encoding="UTF-8"?>
<GREETING>Hello JDOM!</GREETING>

This is more or less what we wanted, modulo white space.


Hello DOM

Here's the same program using DOM instead of JDOM. Which is simpler?

import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.apache.xerces.dom.*;
import org.apache.xml.serialize.*;


public class HelloDOM {

  public static void main(String[] args) {

    try {

      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      DocumentBuilder builder = factory.newDocumentBuilder();
      DOMImplementation impl = builder.getDOMImplementation();

      Document hello = impl.createDocument(null, "GREETING", null);
      //                                   ^^^^              ^^^^
      //                               Namespace URI       DocType

      Element root = hello.getDocumentElement();

      // We can't use a raw string. Instead we must first create
      // a text node.
      Text text = hello.createTextNode("Hello DOM!");
      root.appendChild(text);

      // Now that the document is created we need to *serialize* it
      try {
        OutputFormat format = new OutputFormat(hello);
        XMLSerializer serializer 
         = new XMLSerializer(System.out, format);
        serializer.serialize(root);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
    catch (DOMException e) {
      e.printStackTrace();
    }
    catch (ParserConfigurationException e) {
      System.err.println(e);
    }

  }

}

White space is significant


Actual Output

<?xml version="1.0" encoding="UTF-8"?>
<GREETING>
  Hello JDOM!
</GREETING>
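
To see why the white space matters, here is a small sketch using only JAXP and DOM. It parses the compact and the indented versions of the GREETING document and compares the text a program actually receives. The class name WhiteSpaceSketch and the helper rootText are made up for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;


public class WhiteSpaceSketch {

  // Returns the text content of the root element, white space included
  public static String rootText(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
       .newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)));
      return doc.getDocumentElement().getTextContent();
    }
    catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String compact = "<GREETING>Hello JDOM!</GREETING>";
    String spaced  = "<GREETING>\n  Hello JDOM!\n</GREETING>";

    // The parser reports the line breaks and indentation as text
    System.out.println(rootText(compact).equals(rootText(spaced)));
    // prints false

    // Only after trimming do the two strings compare equal
    System.out.println(rootText(compact).equals(rootText(spaced).trim()));
    // prints true
  }

}
```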

fibonacci.xml

Suppose we want data in an XML document that looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<Fibonacci_Numbers>
  <fibonacci index="1">1</fibonacci>
  <fibonacci index="2">1</fibonacci>
  <fibonacci index="3">2</fibonacci>
  <fibonacci index="4">3</fibonacci>
  <fibonacci index="5">5</fibonacci>
  <fibonacci index="6">8</fibonacci>
  <fibonacci index="7">13</fibonacci>
  <fibonacci index="8">21</fibonacci>
  <fibonacci index="9">34</fibonacci>
  <fibonacci index="10">55</fibonacci>
  <fibonacci index="11">89</fibonacci>
  <fibonacci index="12">144</fibonacci>
  <fibonacci index="13">233</fibonacci>
  <fibonacci index="14">377</fibonacci>
  <fibonacci index="15">610</fibonacci>
  <fibonacci index="16">987</fibonacci>
  <fibonacci index="17">1597</fibonacci>
  <fibonacci index="18">2584</fibonacci>
  <fibonacci index="19">4181</fibonacci>
  <fibonacci index="20">6765</fibonacci>
  <fibonacci index="21">10946</fibonacci>
  <fibonacci index="22">17711</fibonacci>
  <fibonacci index="23">28657</fibonacci>
  <fibonacci index="24">46368</fibonacci>
  <fibonacci index="25">75025</fibonacci>
</Fibonacci_Numbers>

A JDOM program that writes Fibonacci numbers into an XML file

import org.jdom.*;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;


public class FibonacciJDOM {

  public static void main(String[] args) {

    Element root = new Element("Fibonacci_Numbers");

    BigInteger low  = BigInteger.ONE;
    BigInteger high = BigInteger.ONE;

    for (int i = 1; i <= 25; i++) {
      Element fibonacci = new Element("fibonacci");
      Attribute index = new Attribute("index", String.valueOf(i));
      fibonacci.setAttribute(index);
      fibonacci.setText(low.toString());
      root.addContent(fibonacci);

      BigInteger temp = high;
      high = high.add(low);
      low = temp;
    }

    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out 
       = new FileOutputStream("fibonacci_jdom.xml");
      XMLOutputter serializer = new XMLOutputter();
      serializer.output(doc, out);
      out.flush();
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

Output

Again, modulo white space this is correct

<?xml version="1.0" encoding="UTF-8"?>
<Fibonacci_Numbers><fibonacci index="1">1</fibonacci><fibonacci index="2">1</fibonacci><fibonacci index="3">2</fibonacci><fibonacci index="4">3</fibonacci><fibonacci index="5">5</fibonacci><fibonacci index="6">8</fibonacci><fibonacci index="7">13</fibonacci><fibonacci index="8">21</fibonacci><fibonacci index="9">34</fibonacci><fibonacci index="10">55</fibonacci><fibonacci index="11">89</fibonacci><fibonacci index="12">144</fibonacci><fibonacci index="13">233</fibonacci><fibonacci index="14">377</fibonacci><fibonacci index="15">610</fibonacci><fibonacci index="16">987</fibonacci><fibonacci index="17">1597</fibonacci><fibonacci index="18">2584</fibonacci><fibonacci index="19">4181</fibonacci><fibonacci index="20">6765</fibonacci><fibonacci index="21">10946</fibonacci><fibonacci index="22">17711</fibonacci><fibonacci index="23">28657</fibonacci><fibonacci index="24">46368</fibonacci><fibonacci index="25">75025</fibonacci></Fibonacci_Numbers>

Controlling white space on output

Pass an indent string and a boolean that says whether to add newlines to the XMLOutputter constructor.

import org.jdom.*;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;


public class PrettyFibonacciJDOM {

  public static void main(String[] args) {

    Element root = new Element("Fibonacci_Numbers");

    BigInteger low  = BigInteger.ONE;
    BigInteger high = BigInteger.ONE;

    for (int i = 1; i <= 25; i++) {
      Element fibonacci = new Element("fibonacci");
      Attribute index = new Attribute("index", String.valueOf(i));
      fibonacci.setAttribute(index);
      fibonacci.setText(low.toString());
      root.addContent(fibonacci);

      BigInteger temp = high;
      high = high.add(low);
      low = temp;
    }

    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out 
       = new FileOutputStream("pretty_fibonacci_jdom.xml");
      XMLOutputter serializer = new XMLOutputter("  ", true);
      serializer.output(doc, out);
      out.flush();
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

Output

This time the white space matches too:

<?xml version="1.0" encoding="UTF-8"?>
<Fibonacci_Numbers>
  <fibonacci index="1">1</fibonacci>
  <fibonacci index="2">1</fibonacci>
  <fibonacci index="3">2</fibonacci>
  <fibonacci index="4">3</fibonacci>
  <fibonacci index="5">5</fibonacci>
  <fibonacci index="6">8</fibonacci>
  <fibonacci index="7">13</fibonacci>
  <fibonacci index="8">21</fibonacci>
  <fibonacci index="9">34</fibonacci>
  <fibonacci index="10">55</fibonacci>
  <fibonacci index="11">89</fibonacci>
  <fibonacci index="12">144</fibonacci>
  <fibonacci index="13">233</fibonacci>
  <fibonacci index="14">377</fibonacci>
  <fibonacci index="15">610</fibonacci>
  <fibonacci index="16">987</fibonacci>
  <fibonacci index="17">1597</fibonacci>
  <fibonacci index="18">2584</fibonacci>
  <fibonacci index="19">4181</fibonacci>
  <fibonacci index="20">6765</fibonacci>
  <fibonacci index="21">10946</fibonacci>
  <fibonacci index="22">17711</fibonacci>
  <fibonacci index="23">28657</fibonacci>
  <fibonacci index="24">46368</fibonacci>
  <fibonacci index="25">75025</fibonacci>
</Fibonacci_Numbers>


Suppose you want to include a DTD


ValidFibonacci

import java.math.BigInteger;
import java.io.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class ValidFibonacci {

  public static void main(String[] args) {
   
    Element root = new Element("Fibonacci_Numbers");	
  	      
    BigInteger low  = BigInteger.ONE;
    BigInteger high = BigInteger.ONE;      
    
    for (int i = 1; i <= 25; i++) {
      Element fibonacci = new Element("fibonacci");
      Attribute index = new Attribute("index", String.valueOf(i));
      fibonacci.setAttribute(index);
      fibonacci.setText(low.toString());
      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(fibonacci);
    }
 
    DocType type = new DocType("Fibonacci_Numbers", "fibonacci.dtd");
 
    Document doc = new Document(root, type);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("validfibonacci.xml");
      XMLOutputter serializer = new XMLOutputter("  ", true); 
      serializer.output(doc, out);
      out.flush();	
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

validfibonacci.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Fibonacci_Numbers SYSTEM "fibonacci.dtd">
<Fibonacci_Numbers>
  <fibonacci index="1">1</fibonacci>
  <fibonacci index="2">1</fibonacci>
  <fibonacci index="3">2</fibonacci>
  <fibonacci index="4">3</fibonacci>
  <fibonacci index="5">5</fibonacci>
  <fibonacci index="6">8</fibonacci>
  <fibonacci index="7">13</fibonacci>
  <fibonacci index="8">21</fibonacci>
  <fibonacci index="9">34</fibonacci>
  <fibonacci index="10">55</fibonacci>
  <fibonacci index="11">89</fibonacci>
  <fibonacci index="12">144</fibonacci>
  <fibonacci index="13">233</fibonacci>
  <fibonacci index="14">377</fibonacci>
  <fibonacci index="15">610</fibonacci>
  <fibonacci index="16">987</fibonacci>
  <fibonacci index="17">1597</fibonacci>
  <fibonacci index="18">2584</fibonacci>
  <fibonacci index="19">4181</fibonacci>
  <fibonacci index="20">6765</fibonacci>
  <fibonacci index="21">10946</fibonacci>
  <fibonacci index="22">17711</fibonacci>
  <fibonacci index="23">28657</fibonacci>
  <fibonacci index="24">46368</fibonacci>
  <fibonacci index="25">75025</fibonacci>
</Fibonacci_Numbers>

Internal DTD Subsets

import java.math.BigInteger;
import java.io.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class InternalValidFibonacci {

  public static void main(String[] args) {
   
    Element root = new Element("Fibonacci_Numbers");	
  	      
    BigInteger low  = BigInteger.ONE;
    BigInteger high = BigInteger.ONE;      
    
    for (int i = 1; i <= 25; i++) {
      Element fibonacci = new Element("fibonacci");
      Attribute index = new Attribute("index", String.valueOf(i));
      fibonacci.setAttribute(index);
      fibonacci.setText(low.toString());
      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(fibonacci);
    }
 
    String dtd = "<!ELEMENT Fibonacci_Numbers (fibonacci*)>\r\n";
    dtd += "<!ELEMENT fibonacci (#PCDATA)>\r\n";
    dtd += "<!ATTLIST fibonacci index CDATA #IMPLIED>\r\n";

    DocType type = new DocType("Fibonacci_Numbers");
    type.setInternalSubset(dtd);
 
    Document doc = new Document(root, type);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("internalvalidfibonacci.xml");
      XMLOutputter serializer = new XMLOutputter("  ", true); 
      serializer.output(doc, out);
      out.flush();	
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

internalvalidfibonacci.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Fibonacci_Numbers [
<!ELEMENT Fibonacci_Numbers (fibonacci*)>
<!ELEMENT fibonacci (#PCDATA)>
<!ATTLIST fibonacci index CDATA #IMPLIED>
]>
<Fibonacci_Numbers>
  <fibonacci index="1">1</fibonacci>
  <fibonacci index="2">1</fibonacci>
  <fibonacci index="3">2</fibonacci>
  <fibonacci index="4">3</fibonacci>
  <fibonacci index="5">5</fibonacci>
  <fibonacci index="6">8</fibonacci>
  <fibonacci index="7">13</fibonacci>
  <fibonacci index="8">21</fibonacci>
  <fibonacci index="9">34</fibonacci>
  <fibonacci index="10">55</fibonacci>
  <fibonacci index="11">89</fibonacci>
  <fibonacci index="12">144</fibonacci>
  <fibonacci index="13">233</fibonacci>
  <fibonacci index="14">377</fibonacci>
  <fibonacci index="15">610</fibonacci>
  <fibonacci index="16">987</fibonacci>
  <fibonacci index="17">1597</fibonacci>
  <fibonacci index="18">2584</fibonacci>
  <fibonacci index="19">4181</fibonacci>
  <fibonacci index="20">6765</fibonacci>
  <fibonacci index="21">10946</fibonacci>
  <fibonacci index="22">17711</fibonacci>
  <fibonacci index="23">28657</fibonacci>
  <fibonacci index="24">46368</fibonacci>
  <fibonacci index="25">75025</fibonacci>
</Fibonacci_Numbers>
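
A document that carries its own internal DTD subset can be checked without any external file. The sketch below uses a validating JAXP parser and counts the validity errors it reports. The class name InternalSubsetCheckSketch, the DOCTYPE constant, and the invalid sample element lucas are assumptions introduced for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;


public class InternalSubsetCheckSketch {

  // The same declarations InternalValidFibonacci writes out
  public static final String DOCTYPE
   = "<!DOCTYPE Fibonacci_Numbers [\r\n"
   + "<!ELEMENT Fibonacci_Numbers (fibonacci*)>\r\n"
   + "<!ELEMENT fibonacci (#PCDATA)>\r\n"
   + "<!ATTLIST fibonacci index CDATA #IMPLIED>\r\n"
   + "]>\r\n";

  // Parses with DTD validation turned on and returns
  // the number of validity errors the parser reports
  public static int countValidityErrors(String xml) {
    final int[] errors = {0};
    try {
      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      factory.setValidating(true);
      DocumentBuilder builder = factory.newDocumentBuilder();
      builder.setErrorHandler(new DefaultHandler() {
        public void error(SAXParseException e) {
          errors[0]++;  // validity errors are recoverable; keep going
        }
      });
      builder.parse(new InputSource(new StringReader(xml)));
    }
    catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException(e);
    }
    return errors[0];
  }

  public static void main(String[] args) {
    String valid = DOCTYPE + "<Fibonacci_Numbers>"
     + "<fibonacci index=\"1\">1</fibonacci></Fibonacci_Numbers>";
    String invalid = DOCTYPE + "<Fibonacci_Numbers>"
     + "<lucas>2</lucas></Fibonacci_Numbers>";

    System.out.println("valid document errors: "
     + countValidityErrors(valid));
    System.out.println("invalid document has errors: "
     + (countValidityErrors(invalid) > 0));
  }

}
```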

Using Namespaces


Rules for Using Namespaces


With Namespace Prefixes

import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;


public class PrefixedFibonacci {

  public static void main(String[] args) {

    Element root = new Element("math", "mathml",
     "http://www.w3.org/1998/Math/MathML");

    BigInteger low  = BigInteger.ONE;
    BigInteger high = BigInteger.ONE;

    for (int i = 1; i <= 25; i++) {

      Element mrow = new Element("mrow", "mathml",
       "http://www.w3.org/1998/Math/MathML");

      Element mi = new Element("mi", "mathml",
       "http://www.w3.org/1998/Math/MathML");
      mi.setText("f(" + i + ")");
      mrow.addContent(mi);

      Element mo = new Element("mo", "mathml",
       "http://www.w3.org/1998/Math/MathML");
      mo.setText("=");
      mrow.addContent(mo);

      Element mn = new Element("mn", "mathml",
       "http://www.w3.org/1998/Math/MathML");
      mn.setText(low.toString());
      mrow.addContent(mn);

      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(mrow);

    }

    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out 
       = new FileOutputStream("prefixed_fibonacci.xml");
      XMLOutputter serializer = new XMLOutputter("  ", true); 
      serializer.output(doc, out);
      out.flush();
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

The Default, Unprefixed Namespace


Rules for Using Default Namespace


With Default Namespace

import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;


public class UnprefixedFibonacci {

  public static void main(String[] args) {
   
    Element root = new Element("math", 
     "http://www.w3.org/1998/Math/MathML");	
  	      
    BigInteger low  = BigInteger.ONE;
    BigInteger high = BigInteger.ONE;      
    
    for (int i = 1; i <= 25; i++) {
        
      Element mrow = new Element("mrow", 
       "http://www.w3.org/1998/Math/MathML");
      
      Element mi = new Element("mi", 
       "http://www.w3.org/1998/Math/MathML");
      mi.setText("f(" + i + ")"); 
      mrow.addContent(mi);
      
      Element mo = new Element("mo", 
       "http://www.w3.org/1998/Math/MathML");
      mo.setText("="); 
      mrow.addContent(mo);
      
      Element mn = new Element("mn", 
       "http://www.w3.org/1998/Math/MathML");
      mn.setText(low.toString());
      mrow.addContent(mn);

      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(mrow);
      
    }
 
    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out 
       = new FileOutputStream("unprefixed_fibonacci.xml");
      XMLOutputter serializer = new XMLOutputter("  ", true); 
      serializer.output(doc, out);
      out.flush();	
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}
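
PrefixedFibonacci and UnprefixedFibonacci write different qualified names (mathml:math versus math), but a namespace-aware parser treats the two as the same element. The sketch below checks this with plain JAXP and DOM. The class name NamespaceNamesSketch and its helpers parseRoot and sameName are invented for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;


public class NamespaceNamesSketch {

  // Parses the string and returns its root element
  public static Element parseRoot(String xml) {
    try {
      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      // JAXP parsers are not namespace aware unless asked
      factory.setNamespaceAware(true);
      return factory.newDocumentBuilder()
       .parse(new InputSource(new StringReader(xml)))
       .getDocumentElement();
    }
    catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException(e);
    }
  }

  // True if both elements have the same namespace URI and local name
  public static boolean sameName(Element a, Element b) {
    return Objects.equals(a.getNamespaceURI(), b.getNamespaceURI())
     && Objects.equals(a.getLocalName(), b.getLocalName());
  }

  public static void main(String[] args) {
    Element prefixed = parseRoot("<mathml:math "
     + "xmlns:mathml=\"http://www.w3.org/1998/Math/MathML\"/>");
    Element unprefixed = parseRoot(
     "<math xmlns=\"http://www.w3.org/1998/Math/MathML\"/>");

    System.out.println(prefixed.getNodeName());          // mathml:math
    System.out.println(unprefixed.getNodeName());        // math
    System.out.println(sameName(prefixed, unprefixed));  // true
  }

}
```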

Converting data to XML


Sample Tab Delimited Data: Baseball Statistics

Surname	FirstName	Team	Position	Games Played	Games Started	AtBats	Runs	Hits	Doubles	Triples	Home runs	RBI	Stolen Bases	Caught Stealing	Sacrifice Hits	Sacrifice Flies	Errors	PB	Walks	Strike outs	Hit by pitch
AndersonGarret ANAOutfield15615162262183417157983336029801
BaughmanJustin ANASecond Base625419624509112010453806361
BolickFrank ANAThird Base2111453720120000001180
DisarcinaGary ANAShortstop1571555517315839335612712314021518
EdmondsJim ANAOutfield1541505991151844212591751150571141
ErstadDarin ANAOutfield133129537841593931982206133043776
GarciaCarlos ANASecond Base1910354510002010103111
GlausTroy ANAThird Base484516519369012310027015510
GreeneTodd ANAOutfield29157131840170000002200
HelfandEric ANACatcher000000000000000000
HollinsDave ANAThird Base10198363608816211391132217044697
JefferiesGregg ANAOutfield19187272560110100000050
JohnsonMark ANAFirst Base10214110000000000060
KreuterChad ANACatcher9674252276310123310519533493
MartinNorberto ANASecond Base79501952042201133132406290
MashoreDamon ANAOutfield4324981323602111010009223
MolinaBen ANACatcher201000000000000000
NevinPhil ANACatcher7565237275481827000252017675
O'BrienCharlie ANACatcher625817513459041800334110332
PalmeiroOrlando ANAOutfield743416528537202154700020110
PritchettChris ANAFirst Base311980122321282000104160
SalmonTim ANADesignated Hitter1361304638413928126880101020901003
ShipleyCraig ANAThird Base77321471838712170441305225
VelardeRandy ANASecond Base5150188294913142672014034421
WalbeckMatt ANACatcher10891338418715264611557830682
WilliamsReggie ANAOutfield2973671310153310007111

A Program to convert tab delimited data to XML

import java.io.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class JDOMBaseballTabToXML {

  public static void main(String[] args) {
     
    Element root = new Element("players");
    
    try {
      FileInputStream fin = new FileInputStream(args[0]);
      BufferedReader in 
       = new BufferedReader(new InputStreamReader(fin));    

      String playerStats;  
      while ((playerStats = in.readLine()) != null) {
        String[] stats = splitLine(playerStats);
        
        Element player = new Element("player");

        Element first_name = new Element("first_name");
        first_name.setText(stats[1]);
        player.addContent(first_name);
        
        Element surname = new Element("surname");
        surname.setText(stats[0]);
        player.addContent(surname);
       
        Element games_played = new Element("games_played");
        games_played.setText(stats[4]);
        player.addContent(games_played);
       
        Element at_bats = new Element("at_bats");
        at_bats.setText(stats[6]);
        player.addContent(at_bats);
       
        Element runs = new Element("runs");
        runs.setText(stats[7]);
        player.addContent(runs);
       
        Element hits = new Element("hits");
        hits.setText(stats[8]);
        player.addContent(hits);
       
        Element doubles = new Element("doubles");
        doubles.setText(stats[9]);
        player.addContent(doubles);
       
        Element triples = new Element("triples");
        triples.setText(stats[10]);
        player.addContent(triples); 

        Element home_runs = new Element("home_runs");
        home_runs.setText(stats[11]);
        player.addContent(home_runs); 

        Element runs_batted_in = new Element("runs_batted_in");
        runs_batted_in.setText(stats[12]);
        player.addContent(runs_batted_in); 

        Element stolen_bases = new Element("stolen_bases");
        stolen_bases.setText(stats[13]);
        player.addContent(stolen_bases); 

        Element caught_stealing = new Element("caught_stealing");
        caught_stealing.setText(stats[14]);
        player.addContent(caught_stealing); 

        Element sacrifice_hits = new Element("sacrifice_hits");
        sacrifice_hits.setText(stats[15]);
        player.addContent(sacrifice_hits); 

        Element sacrifice_flies = new Element("sacrifice_flies");
        sacrifice_flies.setText(stats[16]);
        player.addContent(sacrifice_flies); 

        Element errors = new Element("errors");
        errors.setText(stats[17]);
        player.addContent(errors); 

        Element passed_by_ball = new Element("passed_by_ball");
        passed_by_ball.setText(stats[18]);
        player.addContent(passed_by_ball); 

        Element walks = new Element("walks");
        walks.setText(stats[19]);
        player.addContent(walks); 

        Element strike_outs = new Element("strike_outs");
        strike_outs.setText(stats[20]);
        player.addContent(strike_outs); 

        Element hit_by_pitch = new Element("hit_by_pitch");
        hit_by_pitch.setText(stats[21]);
        player.addContent(hit_by_pitch); 
        
        root.addContent(player);
      }  
      
      Document doc = new Document(root);
      // serialize it into a file
      FileOutputStream fout 
       = new FileOutputStream("baseballstats.xml");
      
      XMLOutputter serializer = new XMLOutputter("  ", true); 
      serializer.output(doc, fout);
      fout.flush();	
      fout.close();
      in.close();
      
    }
    catch (IOException e) {
      System.err.println(e);
    }
    catch (ArrayIndexOutOfBoundsException e) {
      System.err.println("Usage: java JDOMBaseballTabToXML input_file.tab");
    }

  }

  public static String[] splitLine(String playerStats) {
    
    // count the number of tabs
    int numTabs = 0;
    for (int i = 0; i < playerStats.length(); i++) {
      if (playerStats.charAt(i) == '\t') numTabs++;
    }
    int numFields = numTabs + 1;
    String[] fields = new String[numFields];
    int position = 0;
    for (int i = 0; i < numFields; i++) {
      StringBuffer field = new StringBuffer();
      while (position < playerStats.length() 
       && playerStats.charAt(position++) != '\t') {
        field.append(playerStats.charAt(position-1));
      }
      fields[i] = field.toString();
    }    
    return fields;
    
  }

}
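
The splitLine() method above counts tabs by hand. On Java 1.4 and later, String.split() can do the same job, as long as a negative limit is passed so that trailing empty fields are kept. This is only a sketch of that alternative, and the class name SplitLineSketch is made up for the illustration.

```java
public class SplitLineSketch {

  // Splits on tabs; the negative limit keeps trailing empty fields,
  // so the array length is always the number of tabs plus one
  public static String[] splitLine(String playerStats) {
    return playerStats.split("\t", -1);
  }

  public static void main(String[] args) {
    String[] stats
     = splitLine("Anderson\tGarret \tANA\tOutfield\t156\t151");
    System.out.println(stats.length);  // 6
    System.out.println(stats[0]);      // Anderson

    // Empty fields survive, including the one after the last tab
    System.out.println(splitLine("a\t\tb\t").length);  // 4
  }

}
```

Without the negative limit, split() silently drops trailing empty strings, and a row whose last columns are blank would come back with too few fields.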

Baseball Stats in XML

<?xml version="1.0"?>
<players>
  <player>
    <first_name>FirstName</first_name>
    <surname>Surname</surname>
    <games_played>Games Played</games_played>
    <at_bats>AtBats</at_bats>
    <runs>Runs</runs>
    <hits>Hits</hits>
    <doubles>Doubles</doubles>
    <triples>Triples</triples>
    <home_runs>Home runs</home_runs>
    <stolen_bases>RBI</stolen_bases>
    <caught_stealing>Caught Stealing</caught_stealing>
    <sacrifice_hits>Sacrifice Hits</sacrifice_hits>
    <sacrifice_flies>Sacrifice Flies</sacrifice_flies>
    <errors>Errors</errors>
    <passed_by_ball>PB</passed_by_ball>
    <walks>Walks</walks>
    <strike_outs>Strike outs</strike_outs>
    <hit_by_pitch>Hit by pitch</hit_by_pitch>
  </player>
  <player>
    <first_name>Garret </first_name>
    <surname>Anderson</surname>
    <games_played>156</games_played>
    <at_bats>622</at_bats>
    <runs>62</runs>
    <hits>183</hits>
    <doubles>41</doubles>
    <triples>7</triples>
    <home_runs>15</home_runs>
    <stolen_bases>79</stolen_bases>
    <caught_stealing>3</caught_stealing>
    <sacrifice_hits>3</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>6</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>29</walks>
    <strike_outs>80</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Justin </first_name>
    <surname>Baughman</surname>
    <games_played>62</games_played>
    <at_bats>196</at_bats>
    <runs>24</runs>
    <hits>50</hits>
    <doubles>9</doubles>
    <triples>1</triples>
    <home_runs>1</home_runs>
    <stolen_bases>20</stolen_bases>
    <caught_stealing>4</caught_stealing>
    <sacrifice_hits>5</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>8</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>6</walks>
    <strike_outs>36</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Frank </first_name>
    <surname>Bolick</surname>
    <games_played>21</games_played>
    <at_bats>45</at_bats>
    <runs>3</runs>
    <hits>7</hits>
    <doubles>2</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>2</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>11</walks>
    <strike_outs>8</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Gary </first_name>
    <surname>Disarcina</surname>
    <games_played>157</games_played>
    <at_bats>551</at_bats>
    <runs>73</runs>
    <hits>158</hits>
    <doubles>39</doubles>
    <triples>3</triples>
    <home_runs>3</home_runs>
    <stolen_bases>56</stolen_bases>
    <caught_stealing>7</caught_stealing>
    <sacrifice_hits>12</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>14</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>21</walks>
    <strike_outs>51</strike_outs>
    <hit_by_pitch>8</hit_by_pitch>
  </player>
  <player>
    <first_name>Jim </first_name>
    <surname>Edmonds</surname>
    <games_played>154</games_played>
    <at_bats>599</at_bats>
    <runs>115</runs>
    <hits>184</hits>
    <doubles>42</doubles>
    <triples>1</triples>
    <home_runs>25</home_runs>
    <stolen_bases>91</stolen_bases>
    <caught_stealing>5</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>5</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>57</walks>
    <strike_outs>114</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Darin </first_name>
    <surname>Erstad</surname>
    <games_played>133</games_played>
    <at_bats>537</at_bats>
    <runs>84</runs>
    <hits>159</hits>
    <doubles>39</doubles>
    <triples>3</triples>
    <home_runs>19</home_runs>
    <stolen_bases>82</stolen_bases>
    <caught_stealing>6</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>3</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>43</walks>
    <strike_outs>77</strike_outs>
    <hit_by_pitch>6</hit_by_pitch>
  </player>
  <player>
    <first_name>Carlos </first_name>
    <surname>Garcia</surname>
    <games_played>19</games_played>
    <at_bats>35</at_bats>
    <runs>4</runs>
    <hits>5</hits>
    <doubles>1</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>1</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>3</walks>
    <strike_outs>11</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Troy </first_name>
    <surname>Glaus</surname>
    <games_played>48</games_played>
    <at_bats>165</at_bats>
    <runs>19</runs>
    <hits>36</hits>
    <doubles>9</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>23</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>7</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>15</walks>
    <strike_outs>51</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Todd </first_name>
    <surname>Greene</surname>
    <games_played>29</games_played>
    <at_bats>71</at_bats>
    <runs>3</runs>
    <hits>18</hits>
    <doubles>4</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>7</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>2</walks>
    <strike_outs>20</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Eric </first_name>
    <surname>Helfand</surname>
    <games_played>0</games_played>
    <at_bats>0</at_bats>
    <runs>0</runs>
    <hits>0</hits>
    <doubles>0</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>0</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Dave </first_name>
    <surname>Hollins</surname>
    <games_played>101</games_played>
    <at_bats>363</at_bats>
    <runs>60</runs>
    <hits>88</hits>
    <doubles>16</doubles>
    <triples>2</triples>
    <home_runs>11</home_runs>
    <stolen_bases>39</stolen_bases>
    <caught_stealing>3</caught_stealing>
    <sacrifice_hits>2</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>17</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>44</walks>
    <strike_outs>69</strike_outs>
    <hit_by_pitch>7</hit_by_pitch>
  </player>
  <player>
    <first_name>Gregg </first_name>
    <surname>Jefferies</surname>
    <games_played>19</games_played>
    <at_bats>72</at_bats>
    <runs>7</runs>
    <hits>25</hits>
    <doubles>6</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>10</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>5</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Mark </first_name>
    <surname>Johnson</surname>
    <games_played>10</games_played>
    <at_bats>14</at_bats>
    <runs>1</runs>
    <hits>1</hits>
    <doubles>0</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>6</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Chad </first_name>
    <surname>Kreuter</surname>
    <games_played>96</games_played>
    <at_bats>252</at_bats>
    <runs>27</runs>
    <hits>63</hits>
    <doubles>10</doubles>
    <triples>1</triples>
    <home_runs>2</home_runs>
    <stolen_bases>33</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>5</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>9</errors>
    <passed_by_ball>5</passed_by_ball>
    <walks>33</walks>
    <strike_outs>49</strike_outs>
    <hit_by_pitch>3</hit_by_pitch>
  </player>
  <player>
    <first_name>Norberto </first_name>
    <surname>Martin</surname>
    <games_played>79</games_played>
    <at_bats>195</at_bats>
    <runs>20</runs>
    <hits>42</hits>
    <doubles>2</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>13</stolen_bases>
    <caught_stealing>1</caught_stealing>
    <sacrifice_hits>3</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>4</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>6</walks>
    <strike_outs>29</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Damon </first_name>
    <surname>Mashore</surname>
    <games_played>43</games_played>
    <at_bats>98</at_bats>
    <runs>13</runs>
    <hits>23</hits>
    <doubles>6</doubles>
    <triples>0</triples>
    <home_runs>2</home_runs>
    <stolen_bases>11</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>9</walks>
    <strike_outs>22</strike_outs>
    <hit_by_pitch>3</hit_by_pitch>
  </player>
  <player>
    <first_name>Ben </first_name>
    <surname>Molina</surname>
    <games_played>2</games_played>
    <at_bats>1</at_bats>
    <runs>0</runs>
    <hits>0</hits>
    <doubles>0</doubles>
    <triples>0</triples>
    <home_runs>0</home_runs>
    <stolen_bases>0</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>0</walks>
    <strike_outs>0</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Phil </first_name>
    <surname>Nevin</surname>
    <games_played>75</games_played>
    <at_bats>237</at_bats>
    <runs>27</runs>
    <hits>54</hits>
    <doubles>8</doubles>
    <triples>1</triples>
    <home_runs>8</home_runs>
    <stolen_bases>27</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>2</sacrifice_flies>
    <errors>5</errors>
    <passed_by_ball>20</passed_by_ball>
    <walks>17</walks>
    <strike_outs>67</strike_outs>
    <hit_by_pitch>5</hit_by_pitch>
  </player>
  <player>
    <first_name>Charlie </first_name>
    <surname>Obrien</surname>
    <games_played>62</games_played>
    <at_bats>175</at_bats>
    <runs>13</runs>
    <hits>45</hits>
    <doubles>9</doubles>
    <triples>0</triples>
    <home_runs>4</home_runs>
    <stolen_bases>18</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>3</sacrifice_hits>
    <sacrifice_flies>3</sacrifice_flies>
    <errors>4</errors>
    <passed_by_ball>1</passed_by_ball>
    <walks>10</walks>
    <strike_outs>33</strike_outs>
    <hit_by_pitch>2</hit_by_pitch>
  </player>
  <player>
    <first_name>Orlando </first_name>
    <surname>Palmeiro</surname>
    <games_played>74</games_played>
    <at_bats>165</at_bats>
    <runs>28</runs>
    <hits>53</hits>
    <doubles>7</doubles>
    <triples>2</triples>
    <home_runs>0</home_runs>
    <stolen_bases>21</stolen_bases>
    <caught_stealing>4</caught_stealing>
    <sacrifice_hits>7</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>20</walks>
    <strike_outs>11</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Chris </first_name>
    <surname>Pritchett</surname>
    <games_played>31</games_played>
    <at_bats>80</at_bats>
    <runs>12</runs>
    <hits>23</hits>
    <doubles>2</doubles>
    <triples>1</triples>
    <home_runs>2</home_runs>
    <stolen_bases>8</stolen_bases>
    <caught_stealing>0</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>1</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>4</walks>
    <strike_outs>16</strike_outs>
    <hit_by_pitch>0</hit_by_pitch>
  </player>
  <player>
    <first_name>Tim </first_name>
    <surname>Salmon</surname>
    <games_played>136</games_played>
    <at_bats>463</at_bats>
    <runs>84</runs>
    <hits>139</hits>
    <doubles>28</doubles>
    <triples>1</triples>
    <home_runs>26</home_runs>
    <stolen_bases>88</stolen_bases>
    <caught_stealing>1</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>10</sacrifice_flies>
    <errors>2</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>90</walks>
    <strike_outs>100</strike_outs>
    <hit_by_pitch>3</hit_by_pitch>
  </player>
  <player>
    <first_name>Craig </first_name>
    <surname>Shipley</surname>
    <games_played>77</games_played>
    <at_bats>147</at_bats>
    <runs>18</runs>
    <hits>38</hits>
    <doubles>7</doubles>
    <triples>1</triples>
    <home_runs>2</home_runs>
    <stolen_bases>17</stolen_bases>
    <caught_stealing>4</caught_stealing>
    <sacrifice_hits>4</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>3</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>5</walks>
    <strike_outs>22</strike_outs>
    <hit_by_pitch>5</hit_by_pitch>
  </player>
  <player>
    <first_name>Randy </first_name>
    <surname>Velarde</surname>
    <games_played>51</games_played>
    <at_bats>188</at_bats>
    <runs>29</runs>
    <hits>49</hits>
    <doubles>13</doubles>
    <triples>1</triples>
    <home_runs>4</home_runs>
    <stolen_bases>26</stolen_bases>
    <caught_stealing>2</caught_stealing>
    <sacrifice_hits>0</sacrifice_hits>
    <sacrifice_flies>1</sacrifice_flies>
    <errors>4</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>34</walks>
    <strike_outs>42</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
  <player>
    <first_name>Matt </first_name>
    <surname>Walbeck</surname>
    <games_played>108</games_played>
    <at_bats>338</at_bats>
    <runs>41</runs>
    <hits>87</hits>
    <doubles>15</doubles>
    <triples>2</triples>
    <home_runs>6</home_runs>
    <stolen_bases>46</stolen_bases>
    <caught_stealing>1</caught_stealing>
    <sacrifice_hits>5</sacrifice_hits>
    <sacrifice_flies>5</sacrifice_flies>
    <errors>7</errors>
    <passed_by_ball>8</passed_by_ball>
    <walks>30</walks>
    <strike_outs>68</strike_outs>
    <hit_by_pitch>2</hit_by_pitch>
  </player>
  <player>
    <first_name>Reggie </first_name>
    <surname>Williams</surname>
    <games_played>29</games_played>
    <at_bats>36</at_bats>
    <runs>7</runs>
    <hits>13</hits>
    <doubles>1</doubles>
    <triples>0</triples>
    <home_runs>1</home_runs>
    <stolen_bases>5</stolen_bases>
    <caught_stealing>3</caught_stealing>
    <sacrifice_hits>1</sacrifice_hits>
    <sacrifice_flies>0</sacrifice_flies>
    <errors>0</errors>
    <passed_by_ball>0</passed_by_ball>
    <walks>7</walks>
    <strike_outs>11</strike_outs>
    <hit_by_pitch>1</hit_by_pitch>
  </player>
</players>

A Shortcut

import java.io.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class BaseballTabToXMLShortcut {

  public static void main(String[] args) {
     
    Element root = new Element("players");
    
    try {
      FileInputStream fin = new FileInputStream(args[0]);
      BufferedReader in 
       = new BufferedReader(new InputStreamReader(fin));    

      String playerStats;  
      while ((playerStats = in.readLine()) != null) {
        String[] stats = splitLine(playerStats);
        
        Element player = new Element("player");

        player.addContent((new Element("first_name")).setText(stats[1]));
        player.addContent((new Element("surname")).setText(stats[0]));
        player.addContent((new Element("games_played")).setText(stats[4]));
        player.addContent((new Element("at_bats")).setText(stats[6]));
        player.addContent((new Element("runs")).setText(stats[7]));
        player.addContent((new Element("hits")).setText(stats[8]));
        player.addContent((new Element("doubles")).setText(stats[9]));
        player.addContent((new Element("triples")).setText(stats[10]));
        player.addContent((new Element("home_runs")).setText(stats[11]));
        player.addContent((new Element("runs_batted_in")).setText(stats[12]));
        player.addContent((new Element("stolen_bases")).setText(stats[13]));
        player.addContent((new Element("caught_stealing")).setText(stats[14]));
        player.addContent((new Element("sacrifice_hits")).setText(stats[15]));
        player.addContent((new Element("sacrifice_flies")).setText(stats[16]));
        player.addContent((new Element("errors")).setText(stats[17]));
        player.addContent((new Element("passed_by_ball")).setText(stats[18]));
        player.addContent((new Element("walks")).setText(stats[19]));
        player.addContent((new Element("strike_outs")).setText(stats[20]));
        player.addContent((new Element("hit_by_pitch")).setText(stats[21]));
        
        root.addContent(player);
      }  
      
      Document doc = new Document(root);
      // serialize it into a file
      FileOutputStream fout 
       = new FileOutputStream("baseballstats.xml");
      
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, fout);
      fout.flush();	
      fout.close();
      in.close();
      
    }
    catch (IOException e) {
      System.err.println(e);
    }
    catch (ArrayIndexOutOfBoundsException e) {
      System.out.println(
       "Usage: java BaseballTabToXMLShortcut input_file.tab");
    }

  }

  public static String[] splitLine(String playerStats) {
    
    // count the number of tabs
    int numTabs = 0;
    for (int i = 0; i < playerStats.length(); i++) {
      if (playerStats.charAt(i) == '\t') numTabs++;
    }
    int numFields = numTabs + 1;
    String[] fields = new String[numFields];
    int position = 0;
    for (int i = 0; i < numFields; i++) {
      StringBuffer field = new StringBuffer();
      while (position < playerStats.length() 
       && playerStats.charAt(position++) != '\t') {
        field.append(playerStats.charAt(position-1));
      }
      fields[i] = field.toString();
    }    
    return fields;
    
  }

}
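
A further shortcut, offered only as a sketch: the hand-written splitLine() method can probably be replaced by String.split(), which has been in the JDK since Java 1.4. The class name SplitLineSketch is made up for illustration and is not part of the original program. The limit argument of -1 keeps trailing empty fields, which matches what splitLine() does with a line that ends in a tab.

```java
import java.util.Arrays;

// Hypothetical class name; not part of the original slides
class SplitLineSketch {

  // Same contract as splitLine(): one array entry per tab-separated
  // field, including empty fields in the middle or at the end.
  static String[] splitLine(String playerStats) {
    // A negative limit tells split() not to drop trailing empty strings
    return playerStats.split("\t", -1);
  }

  public static void main(String[] args) {
    String[] fields = splitLine("Anderson\tGarret \t\t156");
    System.out.println(fields.length + " fields: " + Arrays.asList(fields));
  }

}
```

Without the -1 limit, split() would silently drop empty fields at the end of a line, and the stats[] indexes used above could then run past the end of the array.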

Converting data to XML while Processing it

import java.io.*;
import java.text.*;
import java.util.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;

public class JDOMBattingAverage {

  public static void main(String[] args) {
     
    Element root = new Element("players");
     
    try {
      FileInputStream fin = new FileInputStream(args[0]);
      BufferedReader in 
       = new BufferedReader(new InputStreamReader(fin));
      
      String playerStats;
      
      // for formatting batting averages
      DecimalFormat averages = (DecimalFormat) 
       NumberFormat.getNumberInstance(Locale.US);
      averages.setMaximumFractionDigits(3);
      averages.setMinimumFractionDigits(3);
      averages.setMinimumIntegerDigits(0);
      
      while ((playerStats = in.readLine()) != null) {
        String[] stats = splitLine(playerStats);
        
        String formattedAverage;
        try {
          int atBats         = Integer.parseInt(stats[6]);
          int hits           = Integer.parseInt(stats[8]);
        
          if (atBats <= 0) formattedAverage = "N/A";
          else {
            double average = hits / (double) atBats;
            formattedAverage = averages.format(average);
          }       
        }
        catch (Exception e) {
          // skip this player
          continue; 
        }

        Element player = new Element("player");

        Element first_name = new Element("first_name");
        first_name.setText(stats[1]);
        player.addContent(first_name);
             
        Element surname = new Element("surname");
        surname.setText(stats[0]);
        player.addContent(surname);
       
        Element battingAverage = new Element("batting_average");
        battingAverage.setText(formattedAverage);
        player.addContent(battingAverage);
   
        root.addContent(player);
        
      }  
      
      
      Document doc = new Document(root);
      // serialize it into a file
      FileOutputStream fout 
       = new FileOutputStream("battingaverages.xml");
      
      XMLOutputter serializer = new XMLOutputter("  ", true); 
      serializer.output(doc, fout);
      fout.flush();	
      fout.close();
      in.close();

    }
    catch (IOException e) {
      System.err.println(e);
    }
    catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("Usage: java JDOMBattingAverage input_file.tab");
    }

  }

  public static String[] splitLine(String playerStats) {
    
    // count the number of tabs
    int numTabs = 0;
    for (int i = 0; i < playerStats.length(); i++) {
      if (playerStats.charAt(i) == '\t') numTabs++;
    }
    int numFields = numTabs + 1;
    String[] fields = new String[numFields];
    int position = 0;
    for (int i = 0; i < numFields; i++) {
      StringBuffer field = new StringBuffer();
      while (position < playerStats.length() 
       && playerStats.charAt(position++) != '\t') {
        field.append(playerStats.charAt(position-1));
      }
      fields[i] = field.toString();
    }    
    return fields;
    
  }

}

Batting Averages in XML

<?xml version="1.0"?>
<players>
  <player>
    <first_name>Garret </first_name>
    <surname>Anderson</surname>
    <batting_average>.294</batting_average>
  </player>
  <player>
    <first_name>Justin </first_name>
    <surname>Baughman</surname>
    <batting_average>.255</batting_average>
  </player>
  <player>
    <first_name>Frank </first_name>
    <surname>Bolick</surname>
    <batting_average>.156</batting_average>
  </player>
  <player>
    <first_name>Gary </first_name>
    <surname>Disarcina</surname>
    <batting_average>.287</batting_average>
  </player>
  <player>
    <first_name>Jim </first_name>
    <surname>Edmonds</surname>
    <batting_average>.307</batting_average>
  </player>
  <player>
    <first_name>Darin </first_name>
    <surname>Erstad</surname>
    <batting_average>.296</batting_average>
  </player>
  <player>
    <first_name>Carlos </first_name>
    <surname>Garcia</surname>
    <batting_average>.143</batting_average>
  </player>
  <player>
    <first_name>Troy </first_name>
    <surname>Glaus</surname>
    <batting_average>.218</batting_average>
  </player>
  <player>
    <first_name>Todd </first_name>
    <surname>Greene</surname>
    <batting_average>.254</batting_average>
  </player>
  <player>
    <first_name>Eric </first_name>
    <surname>Helfand</surname>
    <batting_average>N/A</batting_average>
  </player>
  <player>
    <first_name>Dave </first_name>
    <surname>Hollins</surname>
    <batting_average>.242</batting_average>
  </player>
  <player>
    <first_name>Gregg </first_name>
    <surname>Jefferies</surname>
    <batting_average>.347</batting_average>
  </player>
  <player>
    <first_name>Mark </first_name>
    <surname>Johnson</surname>
    <batting_average>.071</batting_average>
  </player>
  <player>
    <first_name>Chad </first_name>
    <surname>Kreuter</surname>
    <batting_average>.250</batting_average>
  </player>
  <player>
    <first_name>Norberto </first_name>
    <surname>Martin</surname>
    <batting_average>.215</batting_average>
  </player>
  <player>
    <first_name>Damon </first_name>
    <surname>Mashore</surname>
    <batting_average>.235</batting_average>
  </player>
  <player>
    <first_name>Ben </first_name>
    <surname>Molina</surname>
    <batting_average>.000</batting_average>
  </player>
  <player>
    <first_name>Phil </first_name>
    <surname>Nevin</surname>
    <batting_average>.228</batting_average>
  </player>
  <player>
    <first_name>Charlie </first_name>
    <surname>Obrien</surname>
    <batting_average>.257</batting_average>
  </player>
  <player>
    <first_name>Orlando </first_name>
    <surname>Palmeiro</surname>
    <batting_average>.321</batting_average>
  </player>
  <player>
    <first_name>Chris </first_name>
    <surname>Pritchett</surname>
    <batting_average>.288</batting_average>
  </player>
  <player>
    <first_name>Tim </first_name>
    <surname>Salmon</surname>
    <batting_average>.300</batting_average>
  </player>
  <player>
    <first_name>Craig </first_name>
    <surname>Shipley</surname>
    <batting_average>.259</batting_average>
  </player>
  <player>
    <first_name>Randy </first_name>
    <surname>Velarde</surname>
    <batting_average>.261</batting_average>
  </player>
  <player>
    <first_name>Matt </first_name>
    <surname>Walbeck</surname>
    <batting_average>.257</batting_average>
  </player>
  <player>
    <first_name>Reggie </first_name>
    <surname>Williams</surname>
    <batting_average>.361</batting_average>
  </player>
</players>
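
The averages above have three fraction digits and no leading zero. The sketch below isolates just that formatting step from JDOMBattingAverage so it can be tried without JDOM or an input file. The class name AverageFormatSketch and the method formatAverage() are invented for illustration; the DecimalFormat settings are the ones used in the program.

```java
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical class name; not part of the original slides
class AverageFormatSketch {

  // Formats hits/atBats the way JDOMBattingAverage does:
  // three fraction digits and no leading zero, or "N/A"
  // when the player has no at bats.
  static String formatAverage(int hits, int atBats) {
    if (atBats <= 0) return "N/A";
    DecimalFormat averages = (DecimalFormat) 
     NumberFormat.getNumberInstance(Locale.US);
    averages.setMaximumFractionDigits(3);
    averages.setMinimumFractionDigits(3);
    averages.setMinimumIntegerDigits(0);
    // cast one operand to double to avoid integer division
    return averages.format(hits / (double) atBats);
  }

  public static void main(String[] args) {
    System.out.println(formatAverage(183, 622)); // Garret Anderson's hits and at bats
    System.out.println(formatAverage(0, 0));     // Eric Helfand's hits and at bats
  }

}
```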

Advantages of JDOM for Writing Documents


Questions?


Reading XML with JDOM


JDOM Compatible Parsers for Java

Any SAX or DOM compatible parser including:


The JDOM Process


Parsing a Document with JDOM

import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import java.io.IOException;


public class JDOMChecker {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java JDOMChecker URL1 URL2..."); 
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        builder.build(args[i]);
        // If no exception is thrown, then there are
        // no well-formedness errors.
        System.out.println(args[i] + " is well-formed.");
      }
             // indicates a well-formedness error
      catch (JDOMException e) { 
        System.out.println(args[i] + " is not well-formed.");
        System.out.println(e.getMessage());
      }
      catch (IOException e) { 
        System.out.println("Could not check " + args[i]);
        System.out.println("because " + e.getMessage());
      }
      
    }   
  
  }

}

Parser Results

% java JDOMChecker shortlogs.xml HelloJDOM.java
shortlogs.xml is well-formed.
HelloJDOM.java is not well-formed.
The markup in the document preceding the root element must be well-formed.: 
Error on line 1 of XML document: The markup in the document preceding the 
root element must be well-formed.

Turning on Validation in JDOM


JDOM Validator

import org.jdom.input.*;
import org.jdom.JDOMException;
import org.xml.sax.*;
import java.io.*;


public class JDOMValidator {

  public static void main(String[] args) {

    SAXBuilder parser = new SAXBuilder(true);

    if (args.length == 0) {
      System.out.println("Usage: java JDOMValidator URL1 URL2...");
    }

    // start parsing...
    for (int i = 0; i < args.length; i++) {

      // command line should offer URIs or file names
      try {
        parser.build(args[i]);
        // If there are no well-formedness or validity errors,
        // then no exception is thrown
        System.out.println(args[i] + " is valid.");
      }
      catch (JDOMException e) { 
        System.out.println(args[i] + " is not valid.");
        System.out.println(e.getMessage());
      }
      catch (IOException e) { 
        System.out.println("Could not check " + args[i]);
        System.out.println("because " + e.getMessage());
      }

    }

  }

}

Validation Output

% java JDOMValidator invalid_fibonacci.xml
invalid_fibonacci.xml is not valid.
Element type "title" must be declared.: Error on line 8 of XML document: 
Element type "title" must be declared.

% java JDOMValidator validfibonacci.xml
validfibonacci.xml is valid.
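
Passing true to the SAXBuilder constructor asks the underlying SAX parser to validate against the DTD. For comparison only, the sketch below makes roughly the same check with the SAX parser bundled in the JDK instead of JDOM, so it is a swapped-in technique and not JDOM's own code. The class name SaxValiditySketch, the isValid() method, and the tiny inline DTD are all invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; uses plain SAX from the JDK, not JDOM
class SaxValiditySketch {

  // Returns true only if the document is well-formed and valid
  static boolean isValid(String xml) {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setValidating(true); // turn on DTD validation

    DefaultHandler handler = new DefaultHandler() {
      // Validity errors are recoverable, so by default SAX only
      // reports them. Rethrow to stop at the first one.
      @Override
      public void error(SAXParseException e) throws SAXException {
        throw e;
      }
    };

    try {
      InputSource source = new InputSource(new StringReader(xml));
      factory.newSAXParser().parse(source, handler);
      return true;
    }
    // well-formedness errors and rethrown validity errors end up here
    catch (SAXException e) { return false; }
    catch (ParserConfigurationException e) { return false; }
    catch (IOException e) { return false; }
  }

  public static void main(String[] args) {
    String dtd = "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b (#PCDATA)>]>";
    System.out.println(isValid(dtd + "<a><b>x</b></a>"));
    System.out.println(isValid(dtd + "<a><title/></a>"));
  }

}
```

The point of the anonymous handler is that a validity error does not stop a SAX parse unless something rethrows it.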

Weblogs with JDOM

Full list

Goal: Return a list of all the URLs in this list as java.net.URL objects

Design Decisions


JDOM Design


Weblogs with JDOM

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.io.IOException;
import java.util.*;
import java.net.*;


public class WeblogsJDOM {
   
  public static String DEFAULT_SYSTEM_ID 
   = "http://static.userland.com/weblogMonitor/logs.xml"; 
     
  public static List listChannels() 
   throws JDOMException, IOException {
    return listChannels(DEFAULT_SYSTEM_ID); 
  }
  
  public static List listChannels(String systemID) 
   throws JDOMException, IOException, NullPointerException {
    
    if (systemID == null) {
      throw new NullPointerException("URL must be non-null");   
    }
    
    SAXBuilder builder = new SAXBuilder();
    // Load the entire document into memory 
    // from the network or file system
    Document doc = builder.build(systemID);
    
    // Descend the tree and find the URLs. It helps that
    // the document has a very regular structure.
    Element weblogs = doc.getRootElement();
    List logs = weblogs.getChildren("log");
    Vector urls = new Vector(logs.size());
    Iterator iterator = logs.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      Element log = (Element) o;
      try {
                         // This will probably be changed to 
                         //  getElement() or getChildElement() 
        Element url = log.getChild("url"); 
        if (url == null) continue;
        String content = url.getTextTrim();
        URL u = new URL(content);
        urls.addElement(u);
      }
      catch (MalformedURLException e) {
        // bad input data from one third party; just ignore it 
      }
    }
    return urls;
    
  }
  
  public static void main(String[] args) {
   
    try {
      List urls;
      if (args.length > 0) {
        urls = listChannels(args[0]);
      }
      else {
        urls = listChannels();
      }
      Iterator iterator = urls.iterator();
      while (iterator.hasNext()) {
        System.out.println(iterator.next()); 
      }
    }
    catch (/* Unexpected */ Exception e) {
      e.printStackTrace(); 
    }
    
  }
  
}

Weblogs Output

% java WeblogsJDOM
http://2020Hindsight.editthispage.com/
http://www.sff.net/people/mitchw/weblog/weblog.htp
http://nate.weblogs.com/
http://plugins.launchpoint.net
http://404.psistorm.net
http://home.att.net/~geek9000
http://daubnet.tzo.com/weblog
several hundred more...
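
The part of listChannels() that turns text into java.net.URL objects and skips bad entries can be tried on its own, without JDOM or a network connection. This is a minimal sketch under assumptions: the class name UrlListSketch and the method toUrls() are invented, and it uses generics, which are newer than the Java in the listing above.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; not part of the original slides
class UrlListSketch {

  // Converts each string to a java.net.URL, silently dropping
  // the ones that are not syntactically valid URLs.
  static List<URL> toUrls(List<String> candidates) {
    List<URL> urls = new ArrayList<URL>(candidates.size());
    for (String candidate : candidates) {
      try {
        // trim() plays the role of getTextTrim() in the JDOM version
        urls.add(new URL(candidate.trim()));
      }
      catch (MalformedURLException e) {
        // bad input data from a third party; just ignore it
      }
    }
    return urls;
  }

  public static void main(String[] args) {
    List<String> raw = new ArrayList<String>();
    raw.add(" http://nate.weblogs.com/ ");
    raw.add("no protocol here");
    System.out.println(toUrls(raw));
  }

}
```

Catching MalformedURLException inside the loop means one bad entry does not cost you the rest of the list.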

The org.jdom Package

The classes that represent an XML document and its parts


The Document Node


The Document Class

package org.jdom;

public class Document implements Serializable, Cloneable {

  protected ContentList content;
  protected DocType docType;

  public Document()
  public Document(Element rootElement, DocType docType)
  public Document(Element rootElement) 
  public Document(List newContent, DocType docType) 
  public Document(List content)

  public boolean  hasRootElement()
  public Element  getRootElement()
  public Document setRootElement(Element rootElement)
  public Element  detachRootElement() 
  
  public DocType  getDocType()
  public Document setDocType(DocType docType)
  
  public Document addContent(ProcessingInstruction pi)
  public Document addContent(Comment comment) 
  public List     getContent()
  public List     getContent(Filter filter)
  public Document setContent(List newContent)
  public boolean  removeContent(ProcessingInstruction pi)
  public boolean  removeContent(Comment comment) 
  
  // Java utility methods
  public String toString()
  public final boolean equals(Object ob)
  public final int hashCode()
  public Object clone() 
  
}

Document Example

import org.jdom.Document;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import java.io.IOException;


public class XMLPrinter {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java XMLPrinter URL1 URL2..."); 
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        System.out.println("*************" + args[i] 
         + "*************");
        XMLOutputter outputter = new XMLOutputter();
        outputter.output(doc, System.out);
      }
      // indicates a well-formedness or other error
      catch (JDOMException e) { 
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
      // thrown if the document can't be read; 
      // System.out itself eats exceptions
      catch (IOException e) { 
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

}

Output from XMLPrinter

% java XMLPrinter shortlogs.xml
*************shortlogs.xml*************
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd"><weblogs>
        <log>
                <name>MozillaZine</name>
                <url>http://www.mozillazine.org</url>
                <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>

                <ownerName>Jason Kersey</ownerName>
                <ownerEmail>kerz@en.com</ownerEmail>
                <description>THE source for news on the Mozilla Organization.  DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
                <imageUrl />
                <adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif</adImageUrl>
                </log>
        <log>
                <name>SalonHerringWiredFool</name>
                <url>http://www.salonherringwiredfool.com/</url>
                <ownerName>Some Random Herring</ownerName>
                <ownerEmail>salonfool@wiredherring.com</ownerEmail>
                <description />
                </log>
        <log>
                <name>SlashDot.Org</name>
                <url>http://www.slashdot.org/</url>
                <ownerName>Simply a friend</ownerName>
                <ownerEmail>afriendofweblogs@weblogs.com</ownerEmail>
                <description>News for Nerds, Stuff that Matters.</description>
                </log>
        </weblogs>

Element Nodes


Element Class Implementation


The Element Class

package org.jdom;

public class Element implements Serializable, Cloneable {

    protected           String        name;
    protected transient Namespace     namespace;
    protected           Object        parent;
    protected           AttributeList attributes;
    protected transient List          additionalNamespaces;
    protected           List          content;

    protected Element() {}
    public    Element(String name, Namespace namespace) {}
    public    Element(String name) {}
    public    Element(String name, String uri) {}
    public    Element(String name, String prefix, String uri) {}

    public String     getName() {}
    public Namespace  getNamespace() {}
    public Namespace  getNamespace(String prefix) {}
    public String     getNamespacePrefix() {}
    public String     getNamespaceURI() {}
    public String     getQualifiedName() {}
    public Element    getParent() {}
    
    protected Element setParent(Element parent) {}
    public    boolean isRootElement() {}
    protected Element setIsRootElement(boolean isRootElement) {}
    protected Element setDocument(Document document) {}
    public    Element setName(String name) {}
    public    Element setNamespace(Namespace namespace) {}
    public    Element setText(String text) {}

    public String    getText() {} 
    public String    getTextTrim() {} 
    public String    getTextNormalize() {} 
    
    public String    getChildText(String name) {} 
    public String    getChildTextTrim(String name) {} 
    public String    getChildTextNormalize(String name) {} 
    public String    getChildText(String name, Namespace ns) {} 
    public String    getChildTextTrim(String name, Namespace ns) {} 
    public String    getChildTextNormalize(String name, Namespace ns) {} 

    public List      getChildren() {} 
    public Element   setChildren(List children) {} 
    public List      getChildren(String name) {} 
    public List      getChildren(String name, Namespace ns) {} 
    public Element   getChild(String name, Namespace ns) {} 
    public Element   getChild(String name) {} 
    public boolean   removeChild(String name) {} 
    public boolean   removeChild(String name, Namespace ns) {} 
    public boolean   removeChildren(String name) {}
    public boolean   removeChildren(String name, Namespace ns) {} 
    public boolean   removeChildren() {} 
    
    public List      getContent() {}
    public List      getContent(Filter filter) {}
    public Element   setContent(List newContent) {}
    public Element   addContent(String text) {}
    public Element   addContent(Text text) {}
    public Element   addContent(Element element) {} 
    public Element   addContent(ProcessingInstruction pi) {} 
    public Element   addContent(EntityRef entity) {} 
    public Element   addContent(Comment comment) {} 
    public Element   addContent(CDATA cdata) {} 
    public boolean   removeContent(Element element) {} 
    public boolean   removeContent(CDATA cdata) {} 
    public boolean   removeContent(ProcessingInstruction pi) {} 
    public boolean   removeContent(EntityRef entity) {} 
    public boolean   removeContent(Comment comment) {} 
    
    public List      getAttributes() {} 
    public Attribute getAttribute(String name) {} 
    public Attribute getAttribute(String name, Namespace ns) {} 
    public String    getAttributeValue(String name) {} 
    public String    getAttributeValue(String name, Namespace ns) {} 
    public Element   setAttribute(Attribute attribute) {} 
    public Element   setAttributes(List attributes) {} 
    public boolean   removeAttribute(String name) {} 
    public boolean   removeAttribute(String name, Namespace ns) {} 

    public void addNamespaceDeclaration(Namespace additionalNamespace) {}
    public void removeNamespaceDeclaration(Namespace additionalNamespace) {}
    public List getAdditionalNamespaces() {}

    public Element detach() {}
    
    ///////////////////////////////////////
    // Basic Utility Methods
    /////////////////////////////////////// 
    public final String  toString() {}
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}
    
}

Element Example: XCount

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;


public class XCount {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java XCount URL1 URL2..."); 
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    System.out.println(
     "File\tElements\tAttributes\tComments\tProcessing Instructions\tCharacters");
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
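      try {
        Document doc = builder.build(args[i]);
        System.out.print(args[i] + ":\t");
        String result = count(doc);
        System.out.println(result);
      }
      // indicates a well-formedness or other error
      catch (JDOMException e) { 
        System.out.println(args[i] 
         + " is not a well formed XML document.");
        System.out.println(e.getMessage());
      }
      // indicates the document could not be read
      catch (java.io.IOException e) { 
        System.out.println(args[i] + " could not be read.");
        System.out.println(e.getMessage());
      }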
      
    }   
  
  }  

  private static int numCharacters             = 0;
  private static int numComments               = 0;
  private static int numElements               = 0;
  private static int numAttributes             = 0;
  private static int numProcessingInstructions = 0;
      
  public static String count(Document doc) {

    numCharacters = 0;
    numComments = 0;
    numElements = 0;
    numAttributes = 0;
    numProcessingInstructions = 0;  

    List children = doc.getContent();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Element) {
        numElements++;
        count((Element) o);
      }
      else if (o instanceof Comment) numComments++;
      else if (o instanceof ProcessingInstruction) {
        numProcessingInstructions++;   
      }
    }
    
    String result = numElements + "\t" + numAttributes + "\t" 
     + numComments + "\t" + numProcessingInstructions + "\t" 
     + numCharacters;
    return result;
       
  }     

  public static void count(Element element) {

    List attributes = element.getAttributes();
    numAttributes += attributes.size();
    List children = element.getContent();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Element) {
        numElements++;
        count((Element) o);
      }
      else if (o instanceof Comment) numComments++;
      else if (o instanceof ProcessingInstruction) {
        numProcessingInstructions++;   
      }
      else if (o instanceof Text) {
        Text t = (Text) o;
        String s = t.getText();
        numCharacters += s.length();
      }   
      else if (o instanceof CDATA) {
        CDATA c = (CDATA) o;
        String s = c.getText();
        numCharacters += s.length();
      }   
    }
        
  }  

}

XCount Output

% java XCount shortlogs.xml hotcop.xml
File    Elements        Attributes      Comments        Processing Instructions Characters
shortlogs.xml:  30      0       0       0       736
hotcop.xml:     11      8       2       1       95

Handling Attributes in JDOM


The Attribute Class

package org.jdom;

public class Attribute implements Serializable, Cloneable {

    protected String    name;
    protected Namespace namespace;
    protected String    value;
    protected Element   parent;

    protected Attribute() {}
    public    Attribute(String name, String value) {}
    public    Attribute(String name, String value, Namespace namespace) {}

    public String    getName() {}
    public Attribute setName(String name) {}
    public String    getQualifiedName() {}
    public String    getNamespacePrefix() {}
    public String    getNamespaceURI() {}
    public Namespace getNamespace() {}
    public String    getValue() {}
    public Attribute setValue(String value) {}
    protected Attribute setParent(Element parent) {}
    
    public Attribute detach() {}

    /////////////////////////////////////////////////////////////////
    // Basic Utility Methods
    /////////////////////////////////////////////////////////////////

    public final String  toString() {}
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}

    /////////////////////////////////////////////////////////////////
    // Convenience Methods below here
    /////////////////////////////////////////////////////////////////

    public int     getIntValue() throws DataConversionException {}
    public long    getLongValue() throws DataConversionException {}
    public float   getFloatValue() throws DataConversionException {}
    public double  getDoubleValue() throws DataConversionException {}
    public boolean getBooleanValue() throws DataConversionException {}
    
}

XLinkSpider with JDOM

import java.io.*;
import java.util.*;
import org.jdom.*;
import org.jdom.input.SAXBuilder;


public class BasicXLinkSpider {

  private static SAXBuilder builder = new SAXBuilder();
  
  private static Vector visited = new Vector();
  
  private static int maxDepth = 5;
  private static int currentDepth = 0; 
  
  public static void listURIs(String systemID) {
    
    currentDepth++;
    try {
      if (currentDepth < maxDepth) {

        Document document = builder.build(systemID); 
                
        Vector uris = new Vector();
        // search the document for uris, 
        // store them in vector, and print them
        searchForURIs(document.getRootElement(), uris);
    
        Enumeration e = uris.elements();
        while (e.hasMoreElements()) {
          String uri = (String) e.nextElement();
          visited.addElement(uri);
          listURIs(uri); 
        }
      
      }
    
    }
    catch (JDOMException ex) {
      // couldn't load the document, 
      // probably not well-formed XML, skip it 
    }
    catch (IOException ex) {
      // couldn't load the document, 
      // probably broken link, skip it 
    }
    finally { 
      currentDepth--;
      System.out.flush();     
    }
      
  }
  
  private static Namespace xlink 
   = Namespace.getNamespace("http://www.w3.org/1999/xlink");
  
  // use recursion 
  public static void searchForURIs(Element element, Vector uris) {
    
    // look for XLinks in this element
    String uri = element.getAttributeValue("href", xlink);
    if (uri != null && !uri.equals("") 
     && !visited.contains(uri) && !uris.contains(uri)) {
      System.out.println(uri);
      uris.addElement(uri);
    }
    
    // process child elements recursively
    List children = element.getChildren();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      searchForURIs((Element) iterator.next(), uris); 
    }
    
  }

  public static void main(String[] args) {
    
    if (args.length == 0) {
      System.out.println("Usage: java BasicXLinkSpider URL1 URL2..."); 
    } 
      
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      System.err.println(args[i]); 
      listURIs(args[i]);
    } // end for
  
  } // end main

} // end BasicXLinkSpider

IDTagger

import java.io.IOException;
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import java.util.*;


public class JDOMIDTagger {

  private static int id = 1;

  public static void processElement(Element element) {

    if (element.getAttribute("ID") == null) {
      element.setAttribute(new Attribute("ID", "_" + id));
      id = id + 1; 
    }
    
    // recursion
    List children = element.getChildren();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      processElement((Element) iterator.next());   
    }
    
  }

  public static void main(String[] args) {
     
    SAXBuilder builder = new SAXBuilder();
    
    for (int i = 0; i < args.length; i++) {
        
      try {
        // Read the entire document into memory
        Document document = builder.build(args[i]); 
       
        processElement(document.getRootElement());
        
        // now we serialize the document...
        XMLOutputter serializer = new XMLOutputter(); 
        serializer.output(document, System.out);
        System.out.flush();	        
      }
      catch (JDOMException e) {
        System.err.println(e);
        continue; 
      }
      catch (IOException e) {
        System.err.println(e);
        continue; 
      }
      
    }
  
  } // end main

}

Before IDTagger

<?xml version="1.0"?><backslash
xmlns:backslash="http://slashdot.org/backslash.dtd">

 <story>
    <title>The Onion to buy the New York Times</title>
    <url>http://slashdot.org/articles/00/02/19/1128240.shtml</url>
    <time>2000-02-19 17:25:15</time>
    <author>CmdrTaco</author>
    <department>stuff-to-read</department>
    <topic>media</topic>
    <comments>20</comments>
    <section>articles</section>
    <image>topicmedia.gif</image>
  </story>
 <story>
    <title>Al Gore's Webmaster Answers Your Questions</title>
    <url>http://slashdot.org/interviews/00/02/19/0932207.shtml</url>
    <time>2000-02-19 17:00:52</time>
    <author>Roblimo</author>
    <department>political-process-online</department>
    <topic>usa</topic>
    <comments>49</comments>
    <section>interviews</section>
    <image>topicus.gif</image>
  </story>
 <story>
    <title>Open Source Africa</title>
    <url>http://slashdot.org/articles/00/02/19/1016216.shtml</url>
    <time>2000-02-19 16:05:58</time>
    <author>emmett</author>
    <department>songs-by-toto</department>
    <topic>linux</topic>
    <comments>50</comments>
    <section>articles</section>
    <image>topiclinux.gif</image>
  </story>
 <story>
    <title>Microsoft Funded by NSA, Helps Spy on Win Users?</title>
    <url>http://slashdot.org/articles/00/02/19/0750247.shtml</url>
    <time>2000-02-19 14:07:04</time>
    <author>Roblimo</author>
    <department>deep-dark-conspiracy-theories</department>
    <topic>microsoft</topic>
    <comments>154</comments>
    <section>articles</section>
    <image>topicms.gif</image>
  </story>
 <story>
    <title>X-Men Trailer Released</title>
    <url>http://slashdot.org/articles/00/02/18/0829209.shtml</url>
    <time>2000-02-19 13:47:06</time>
    <author>emmett</author>
    <department>mutant</department>
    <topic>movies</topic>
    <comments>70</comments>
    <section>articles</section>
    <image>topicmovies.gif</image>
  </story>
 <story>
    <title>Connell Replies to "Grok" Comments</title>
    <url>http://slashdot.org/articles/00/02/18/202240.shtml</url>
    <time>2000-02-19 05:01:37</time>
    <author>Hemos</author>
    <department>replying-to-things</department>
    <topic>linux</topic>
    <comments>197</comments>
    <section>articles</section>
    <image>topiclinux.gif</image>
  </story>
 <story>
    <title>etoy.com Returns</title>
    <url>http://slashdot.org/yro/00/02/18/1739216.shtml</url>
    <time>2000-02-19 02:35:06</time>
    <author>nik</author>
    <department>NP:-gimme-shelter</department>
    <topic>internet</topic>
    <comments>77</comments>
    <section>yro</section>
    <image>topicinternet.jpg</image>
  </story>
 <story>
    <title>New Propaganda Series: Rebirth</title>
    <url>http://slashdot.org/articles/00/02/18/205232.shtml</url>
    <time>2000-02-19 01:05:26</time>
    <author>Hemos</author>
    <department>as-pretty-as-always</department>
    <topic>graphics</topic>
    <comments>120</comments>
    <section>articles</section>
    <image>topicgraphics3.gif</image>
  </story>
 <story>
    <title>Giving Back</title>
    <url>http://slashdot.org/features/00/02/18/1631224.shtml</url>
    <time>2000-02-18 22:27:26</time>
    <author>emmett</author>
    <department>salvation-army</department>
    <topic>news</topic>
    <comments>122</comments>
    <section>features</section>
    <image>topicnews.gif</image>
  </story>
 <story>
    <title>Connectix Considering Open Sourcing VGS?</title>
    <url>http://slashdot.org/articles/00/02/18/1050225.shtml</url>
    <time>2000-02-18 20:46:20</time>
    <author>emmett</author>
    <department>grain-of-salt</department>
    <topic>news</topic>
    <comments>93</comments>
    <section>articles</section>
    <image>topicnews.gif</image>
  </story>
</backslash>

After IDTagger

<?xml version="1.0" encoding="UTF-8"?>
<backslash ID="_1">
  <story ID="_2">
    <title ID="_3">The Onion to buy the New York Times</title>
    <url ID="_4">http://slashdot.org/articles/00/02/19/1128240.shtml</url>
    <time ID="_5">2000-02-19 17:25:15</time>
    <author ID="_6">CmdrTaco</author>
    <department ID="_7">stuff-to-read</department>
    <topic ID="_8">media</topic>
    <comments ID="_9">20</comments>
    <section ID="_10">articles</section>
    <image ID="_11">topicmedia.gif</image>
  </story>
  <story ID="_12">
    <title ID="_13">Al Gore's Webmaster Answers Your Questions</title>
    <url ID="_14">http://slashdot.org/interviews/00/02/19/0932207.shtml</url>
    <time ID="_15">2000-02-19 17:00:52</time>
    <author ID="_16">Roblimo</author>
    <department ID="_17">political-process-online</department>
    <topic ID="_18">usa</topic>
    <comments ID="_19">49</comments>
    <section ID="_20">interviews</section>
    <image ID="_21">topicus.gif</image>
  </story>
  <story ID="_22">
    <title ID="_23">Open Source Africa</title>
    <url ID="_24">http://slashdot.org/articles/00/02/19/1016216.shtml</url>
    <time ID="_25">2000-02-19 16:05:58</time>
    <author ID="_26">emmett</author>
    <department ID="_27">songs-by-toto</department>
    <topic ID="_28">linux</topic>
    <comments ID="_29">50</comments>
    <section ID="_30">articles</section>
    <image ID="_31">topiclinux.gif</image>
  </story>
  <story ID="_32">
    <title ID="_33">Microsoft Funded by NSA, Helps Spy on Win Users?</title>
    <url ID="_34">http://slashdot.org/articles/00/02/19/0750247.shtml</url>
    <time ID="_35">2000-02-19 14:07:04</time>
    <author ID="_36">Roblimo</author>
    <department ID="_37">deep-dark-conspiracy-theories</department>
    <topic ID="_38">microsoft</topic>
    <comments ID="_39">154</comments>
    <section ID="_40">articles</section>
    <image ID="_41">topicms.gif</image>
  </story>
  <story ID="_42">
    <title ID="_43">X-Men Trailer Released</title>
    <url ID="_44">http://slashdot.org/articles/00/02/18/0829209.shtml</url>
    <time ID="_45">2000-02-19 13:47:06</time>
    <author ID="_46">emmett</author>
    <department ID="_47">mutant</department>
    <topic ID="_48">movies</topic>
    <comments ID="_49">70</comments>
    <section ID="_50">articles</section>
    <image ID="_51">topicmovies.gif</image>
  </story>
  <story ID="_52">
    <title ID="_53">Connell Replies to "Grok" Comments</title>
    <url ID="_54">http://slashdot.org/articles/00/02/18/202240.shtml</url>
    <time ID="_55">2000-02-19 05:01:37</time>
    <author ID="_56">Hemos</author>
    <department ID="_57">replying-to-things</department>
    <topic ID="_58">linux</topic>
    <comments ID="_59">197</comments>
    <section ID="_60">articles</section>
    <image ID="_61">topiclinux.gif</image>
  </story>
  <story ID="_62">
    <title ID="_63">etoy.com Returns</title>
    <url ID="_64">http://slashdot.org/yro/00/02/18/1739216.shtml</url>
    <time ID="_65">2000-02-19 02:35:06</time>
    <author ID="_66">nik</author>
    <department ID="_67">NP:-gimme-shelter</department>
    <topic ID="_68">internet</topic>
    <comments ID="_69">77</comments>
    <section ID="_70">yro</section>
    <image ID="_71">topicinternet.jpg</image>
  </story>
  <story ID="_72">
    <title ID="_73">New Propaganda Series: Rebirth</title>
    <url ID="_74">http://slashdot.org/articles/00/02/18/205232.shtml</url>
    <time ID="_75">2000-02-19 01:05:26</time>
    <author ID="_76">Hemos</author>
    <department ID="_77">as-pretty-as-always</department>
    <topic ID="_78">graphics</topic>
    <comments ID="_79">120</comments>
    <section ID="_80">articles</section>
    <image ID="_81">topicgraphics3.gif</image>
  </story>
  <story ID="_82">
    <title ID="_83">Giving Back</title>
    <url ID="_84">http://slashdot.org/features/00/02/18/1631224.shtml</url>
    <time ID="_85">2000-02-18 22:27:26</time>
    <author ID="_86">emmett</author>
    <department ID="_87">salvation-army</department>
    <topic ID="_88">news</topic>
    <comments ID="_89">122</comments>
    <section ID="_90">features</section>
    <image ID="_91">topicnews.gif</image>
  </story>
  <story ID="_92">
    <title ID="_93">Connectix Considering Open Sourcing VGS?</title>
    <url ID="_94">http://slashdot.org/articles/00/02/18/1050225.shtml</url>
    <time ID="_95">2000-02-18 20:46:20</time>
    <author ID="_96">emmett</author>
    <department ID="_97">grain-of-salt</department>
    <topic ID="_98">news</topic>
    <comments ID="_99">93</comments>
    <section ID="_100">articles</section>
    <image ID="_101">topicnews.gif</image>
  </story>
</backslash>

Handling Entities in JDOM


The EntityRef Class

package org.jdom;

public class EntityRef implements Serializable, Cloneable {

    protected String   name;
    protected String   publicID;
    protected String   systemID;
    protected Element  parent;
    protected Document document;

    protected EntityRef() {}
    public EntityRef(String name) {}
    public EntityRef(String name, String publicID, String systemID) {}
    
    public EntityRef detach() {}
    
    public Document  getDocument() {}
    public String    getName() {}
    public Element   getParent() {}
    public String    getPublicID()  {}
    public String    getSystemID() {}

    protected EntityRef setParent(Element parent) {}
    public    EntityRef setName(String name) {}
    public    EntityRef setPublicID(String newPublicID) {}
    public    EntityRef setSystemID(String newSystemID) {}

    public Object clone() {}
    public final boolean equals(Object o) {}
    public final int hashCode() {}
    public String toString() {}
    
}

Handling Comments in JDOM


The Comment Class

package org.jdom;

public class Comment implements Serializable, Cloneable {

    protected String text;

    protected Comment() {}
    public    Comment(String text) {}
    
    public String     getText() {}
    public void       setText(String text) {}
    public Comment    detach() {}
    public Document   getDocument() {}
    protected Comment setDocument(Document document) {}
    public Element    getParent() {}
    protected Comment setParent(Element parent){}
    
    public final String  toString() {}
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}

}

Comment Example

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;


public class CommentReader {

  public static void main(String[] args) {
     
    SAXBuilder builder = new SAXBuilder();
    
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        Document doc = builder.build(args[i]);
        List content = doc.getContent();
        Iterator iterator = content.iterator();
        while (iterator.hasNext()) {
          Object o = iterator.next();
          if (o instanceof Comment) {
            Comment c = (Comment) o;
            System.out.println(c.getText());     
            System.out.println();     
          }
          else if (o instanceof Element) {
            processElement((Element) o);   
          }
        }
      }
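      catch (JDOMException e) {
        System.err.println(e); 
        // print the underlying parser exception, if there is one
        if (e.getCause() != null) e.getCause().printStackTrace(); 
      }
      catch (java.io.IOException e) {
        System.err.println(e); 
      }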
      
    }
  
  } // end main

  // note use of recursion
  public static void processElement(Element element) {
    
    List content = element.getContent();
    Iterator iterator = content.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Comment) {
        Comment c = (Comment) o;
        System.out.println(c.getText());     
        System.out.println();     
      }
      else if (o instanceof Element) {
        processElement((Element) o);   
      }
    } // end while
    
  }

}

CommentReader Output

% java CommentReader hotcop.xml
 The publisher is actually Polygram but I needed
       an example of a general entity reference.

 You can tell what album I was
     listening to when I wrote this example

Or try http://www.w3.org/TR/1998/REC-xml-19980210.xml for more interesting output.


ProcessingInstruction Nodes


The ProcessingInstruction Class

package org.jdom;

public class ProcessingInstruction implements Serializable, Cloneable {

    protected String   target;
    protected String   rawData;
    protected Map      mapData;
    protected Element  parent;
    
    protected ProcessingInstruction() {}
    public    ProcessingInstruction(String target, Map data) {}
    public    ProcessingInstruction(String target, String data) {}
    
    public String                getTarget() {}
    public String                getData() {}
    public ProcessingInstruction setData(String data) {}
    public ProcessingInstruction setData(Map data) {}
    public String                getValue(String name) {}
    public ProcessingInstruction setValue(String name, String value) {}
    public boolean               removeValue(String name) {}

    public    Document              getDocument() {}
    protected ProcessingInstruction setDocument(Document document) {}
    public    Element               getParent() {}
    protected ProcessingInstruction setParent(Element parent){}
    public    ProcessingInstruction detach() {}

    public final String  toString() {}
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}
    
}
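
The getValue(), setValue(), and removeValue() methods treat the data of a processing instruction as a series of name="value" pseudo-attributes, as in <?robots index="yes" follow="no"?>. The following is only a rough sketch of how such data could be split into pairs using nothing but the JDK. The class name PseudoAttributes and its parse() method are invented for this illustration; JDOM's own parsing may differ in its details.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JDOM
class PseudoAttributes {

  // matches name="value" or name='value'
  private static final Pattern PAIR
   = Pattern.compile("([^\\s=]+)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

  // Splits processing instruction data into name/value pairs,
  // preserving the order in which they appear
  public static Map<String, String> parse(String data) {
    Map<String, String> result = new LinkedHashMap<String, String>();
    Matcher matcher = PAIR.matcher(data);
    while (matcher.find()) {
      // group 3 holds a double quoted value, group 4 a single quoted one
      String value = matcher.group(3) != null
       ? matcher.group(3) : matcher.group(4);
      result.put(matcher.group(1), value);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> robots = parse("index=\"yes\" follow='no'");
    System.out.println(robots.get("index"));   // prints yes
    System.out.println(robots.get("follow"));  // prints no
    System.out.println(robots.get("archive")); // prints null
  }

}
```

A name that is not present comes back as null in this sketch, so callers have to allow for a missing pseudo-attribute.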

XLinkSpider that Respects the robots Processing Instruction

import java.io.*;
import java.util.*;
import org.jdom.*;
import org.jdom.input.SAXBuilder;


public class AdvancedSpider {

  private static SAXBuilder builder = new SAXBuilder();
  
  private static Vector visited = new Vector();
  
  private static int maxDepth = 5;
  private static int currentDepth = 0; 
  
  public static void listURIs(String systemID) {
    
    currentDepth++;
    try {
      if (currentDepth < maxDepth) {

        Document document = builder.build(systemID); 
                
        // check to see if we're allowed to spider
        boolean index = true;
        boolean follow = true;
        ProcessingInstruction robots = findRobots(document);
        if (robots != null) {
          // the pseudo-attribute may be missing, 
          // so compare in a null-safe way
          String indexValue = robots.getValue("index");
          if ("no".equalsIgnoreCase(indexValue)) index = false;
          String followValue = robots.getValue("follow");
          if ("no".equalsIgnoreCase(followValue)) follow = false;
        }
        Vector uris = new Vector();
        // search the document for uris, 
        // store them in vector, and print them
        if (follow) searchForURIs(document.getRootElement(), uris);
    
        Enumeration e = uris.elements();
        while (e.hasMoreElements()) {
          String uri = (String) e.nextElement();
          visited.addElement(uri);
          if (index) listURIs(uri); 
        }
      
      }
    
    }
    catch (JDOMException e) {
      // couldn't load the document, 
      // probably not well-formed XML, skip it 
    }
    catch (IOException ex) {
      // couldn't load the document, 
      // probably broken link, skip it 
    }
    finally { 
      currentDepth--;
      System.out.flush();     
    }
      
  }
  
  private static ProcessingInstruction findRobots(Document doc) {
    List content = doc.getContent();
    Iterator children = content.iterator();
    while (children.hasNext()) {
       Object o = children.next(); 
       if (o instanceof Element) return null; 
       if (o instanceof ProcessingInstruction) {
          ProcessingInstruction candidate = (ProcessingInstruction) o; 
          if (candidate.getTarget().equals("robots")) return candidate;
       }
    }
    
    return null;
  }
  
  private static Namespace xlink 
   = Namespace.getNamespace("http://www.w3.org/1999/xlink");
  
  // use recursion 
  public static void searchForURIs(Element element, Vector uris) {
    
    // look for XLinks in this element
    String uri = element.getAttributeValue("href", xlink);
    if (uri != null && !uri.equals("") 
     && !visited.contains(uri) && !uris.contains(uri)) {
      System.out.println(uri);
      uris.addElement(uri);
    }
    
    // process child elements recursively
    List children = element.getChildren();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      searchForURIs((Element) iterator.next(), uris); 
    }
    
  }

  public static void main(String[] args) {
    
    if (args.length == 0) {
      System.out.println("Usage: java AdvancedSpider URL1 URL2..."); 
    } 
      
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      System.err.println(args[i]); 
      listURIs(args[i]);
    } // end for
  
  } // end main

} // end AdvancedSpider
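
JDOM is a separate download, so for readers who want to try the prolog scan in findRobots() without it, here is a rough equivalent written against the DOM parser bundled with the JDK. This swaps JDOM for DOM, and it is only a sketch: the class name DomRobotsFinder, the method findRobotsData(), and the sample document are made up for the illustration. Like findRobots(), it gives up as soon as it reaches the root element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

// Hypothetical DOM counterpart of AdvancedSpider.findRobots()
class DomRobotsFinder {

  // Returns the data of the first robots processing instruction
  // in the prolog, or null if the root element comes first
  public static String findRobotsData(String xml) {
    Document doc;
    try {
      DocumentBuilder parser
       = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      doc = parser.parse(new InputSource(new StringReader(xml)));
    }
    catch (Exception e) {
      // malformed document or misconfigured parser
      throw new IllegalArgumentException("Could not parse document", e);
    }

    NodeList children = doc.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node node = children.item(i);
      // anything after the root element is not part of the prolog
      if (node.getNodeType() == Node.ELEMENT_NODE) return null;
      if (node.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
        ProcessingInstruction candidate = (ProcessingInstruction) node;
        if (candidate.getTarget().equals("robots")) {
          return candidate.getData();
        }
      }
    }
    return null;
  }

  public static void main(String[] args) {
    String xml = "<?xml version=\"1.0\"?>"
     + "<?robots index=\"yes\" follow=\"no\"?>"
     + "<page/>";
    System.out.println(findRobotsData(xml));
  }

}
```

DOM hands back the processing instruction data as one raw string, here index="yes" follow="no", so the individual pseudo-attributes still have to be picked apart by hand. JDOM's getValue() saves that step.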

Handling Namespaces


The Namespace Class


package org.jdom;

public final class Namespace {

  public static final Namespace NO_NAMESPACE = new Namespace("", "");
  public static final Namespace XML_NAMESPACE = 
   new Namespace("xml", "http://www.w3.org/XML/1998/namespace");

  // factory methods
  public static Namespace getNamespace(String prefix, String uri) {}
  public static Namespace getNamespace(String uri) {}

  // getter methods
  public String  getPrefix() {}
  public String  getURI() {}

  // utility methods
  public boolean equals(Object ob) {}
  public String  toString() {}
  public int     hashCode() {}

}

DocType Nodes


The DocType class

package org.jdom;

public class DocType implements Serializable, Cloneable {

  protected String   elementName;
  protected String   publicID;
  protected String   systemID;
  protected Document document;
  protected String   internalSubset;

  protected DocType() {}
  public DocType(String elementName, String publicID, 
   String systemID) {}
  public DocType(String elementName, String systemID) {}
  public DocType(String elementName) {}

  public String   getElementName() {}
  public DocType  setElementName(String elementName) {}
  public String   getPublicID() {}
  public DocType  setPublicID(String publicID) {}
  public String   getSystemID() {}
  public DocType  setSystemID(String systemID) {}
  public Document getDocument() {}
  public void     setInternalSubset(String newData) {}

  protected DocType setDocument(Document document) {}

  public String getInternalSubset() {}

  public String toString() {}
  public final boolean equals(Object o) {}
  public final int hashCode() {}

  public Object clone() {}

}

Example of the DocType Class


XHTMLValidator

import java.io.*;
import org.jdom.*;
import org.jdom.input.SAXBuilder;


public class JDOMXHTMLValidator {

  public static void main(String[] args) {
    
    for (int i = 0; i < args.length; i++) {
      validate(args[i]);
    }   
    
  }

  private static SAXBuilder builder = new SAXBuilder(true);
                                                 /*  ^^^^ */
                                              /* turn on validation  */
  
  // not thread safe
  public static void validate(String source) {
        
      Document document;
      try {
        document = builder.build(source); 
      }
      catch (JDOMException e) {  
        System.out.println("Error: " + e.getMessage()); 
        e.printStackTrace();
        return; 
      }
      
      // If we get this far, then the document is valid XML.
      // Check to see whether the document is actually XHTML        
      DocType doctype = document.getDocType();
    
      if (doctype == null) {
        System.out.println("No DOCTYPE"); 
        return;
      }

      String name     = doctype.getElementName();
      String systemID = doctype.getSystemID();
      String publicID = doctype.getPublicID();
      
      if (!name.equals("html")) {
        System.out.println("Incorrect root element name " + name); 
      }
    
      if (publicID == null
       || (!publicID.equals("-//W3C//DTD XHTML 1.0 Strict//EN")
           && !publicID.equals("-//W3C//DTD XHTML 1.0 Transitional//EN")
           && !publicID.equals("-//W3C//DTD XHTML 1.0 Frameset//EN"))) {
        System.out.println(source + " does not seem to use an XHTML 1.0 DTD");
      }
    
      // Check the namespace on the root element
      Element root = document.getRootElement();
      Namespace namespace = root.getNamespace();
      String prefix = namespace.getPrefix();
      String uri = namespace.getURI();
      if (!uri.equals("http://www.w3.org/1999/xhtml")) {
        System.out.println(source 
         + " does not properly declare the"
         + " http://www.w3.org/1999/xhtml namespace"
         + " on the root element");        
      }
      if (!prefix.equals("")) {
        System.out.println(source 
         + " does not use the empty prefix for XHTML");        
      }
    
  }

}
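
A rough DOM equivalent of the DOCTYPE and namespace checks, using only the JDK. The class DomDoctypeCheck and its problems() method are hypothetical names. Unlike JDOMXHTMLValidator, this sketch does not validate: as an assumption made to keep it offline, an EntityResolver substitutes an empty stream for the DTD, so only the declaration itself is inspected.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomDoctypeCheck {

  private static final List<String> XHTML_PUBLIC_IDS = Arrays.asList(
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Frameset//EN");

  // Returns a list of complaints; an empty list means every check passed
  static List<String> problems(String xml) {

    List<String> problems = new ArrayList<String>();
    Document doc;
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      DocumentBuilder builder = factory.newDocumentBuilder();
      // Assumption for this sketch: never fetch the DTD. Every external
      // entity resolves to an empty stream, so nothing is validated.
      builder.setEntityResolver(
        (publicId, systemId) -> new InputSource(new StringReader("")));
      doc = builder.parse(new InputSource(new StringReader(xml)));
    }
    catch (ParserConfigurationException | SAXException | IOException ex) {
      problems.add("Error: " + ex.getMessage());
      return problems;
    }

    DocumentType doctype = doc.getDoctype();
    if (doctype == null) {
      problems.add("No DOCTYPE");
      return problems;
    }

    if (!"html".equals(doctype.getName())) {
      problems.add("Incorrect root element name " + doctype.getName());
    }
    // contains(null) is simply false, so a missing public ID is reported too
    if (!XHTML_PUBLIC_IDS.contains(doctype.getPublicId())) {
      problems.add("Does not seem to use an XHTML 1.0 DTD");
    }

    // Check the namespace on the root element
    String uri = doc.getDocumentElement().getNamespaceURI();
    if (!"http://www.w3.org/1999/xhtml".equals(uri)) {
      problems.add("Root element is not in the XHTML namespace");
    }
    return problems;
  }
}
```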

Using the XHTMLValidator

% java JDOMXHTMLValidator http://www.w3.org/TR/xhtml1
Error: File "http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not found.: Error on 
line -1 of XML document: File "http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not 
found.
org.jdom.JDOMException: File "http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not 
found.: Error on line -1 of XML document: File 
"http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not found.
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:227)
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:359)
        at XHTMLValidator.validate(XHTMLValidator.java:25)
        at XHTMLValidator.main(XHTMLValidator.java:11)
Root cause: org.jdom.JDOMException: Error on line -1 of XML document: File 
"http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not found.
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:228)
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:359)
        at XHTMLValidator.validate(XHTMLValidator.java:25)
        at XHTMLValidator.main(XHTMLValidator.java:11)

The Verifier Class


package org.jdom;

public final class Verifier {

    public static final String checkElementName(String name) {}
    public static final String checkAttributeName(String name) {}
    public static final String checkCharacterData(String text) {}
    public static final String checkNamespacePrefix(String prefix) {}
    public static final String checkNamespaceURI(String uri) {}
    public static final String checkProcessingInstructionTarget(String target) {}
    public static final String checkCommentData(String data) {}
 
    public static boolean isXMLCharacter(char c) {}
    public static boolean isXMLNameCharacter(char c) {}
    public static boolean isXMLNameStartCharacter(char c) {}
    public static boolean isXMLLetterOrDigit(char c) {}
    public static boolean isXMLLetter(char c) {}
    public static boolean isXMLCombiningChar(char c) {}
    public static boolean isXMLExtender(char c) {}
    public static boolean isXMLDigit(char c) {}

    public static final String checkNamespaceCollision(
     Namespace namespace, Namespace other) {}
    public static final String checkNamespaceCollision(
     Attribute attribute, Namespace other) {}
    public static final String checkNamespaceCollision(
     Namespace namespace, Element element) {}
    public static final String checkNamespaceCollision(
     Namespace namespace, Attribute attribute) {}
    public static final String checkNamespaceCollision(
     Namespace namespace, List list) {}

}
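
Each check method returns a String rather than a boolean. To the best of my knowledge, JDOM's convention is that null means the value is acceptable and a non-null result is the reason it was rejected. The sketch below imitates that convention with a hypothetical, ASCII-only name check. It is not JDOM's implementation: the real rules for XML names admit many non-ASCII characters, and rejecting the colon here is an assumption that prefixes are checked separately.

```java
class NameCheckSketch {

  // Hypothetical approximation of an element-name check.
  // Returns null if the name is acceptable, otherwise the reason it is not.
  static String checkAsciiElementName(String name) {

    if (name == null || name.isEmpty()) {
      return "Element names cannot be null or empty";
    }

    char first = name.charAt(0);
    if (!isAsciiLetter(first) && first != '_') {
      return "Element names cannot begin with '" + first + "'";
    }

    for (int i = 1; i < name.length(); i++) {
      char c = name.charAt(i);
      if (!isAsciiLetter(c) && !isAsciiDigit(c)
       && c != '_' && c != '-' && c != '.') {
        return "Element names cannot contain '" + c + "'";
      }
    }

    return null; // no complaint
  }

  private static boolean isAsciiLetter(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
  }

  private static boolean isAsciiDigit(char c) {
    return c >= '0' && c <= '9';
  }

  public static void main(String[] args) {
    String[] candidates = {"SONG", "1SONG", "xlink:href", "my-name.v2"};
    for (int i = 0; i < candidates.length; i++) {
      String reason = checkAsciiElementName(candidates[i]);
      System.out.println(candidates[i] + ": "
       + (reason == null ? "OK" : reason));
    }
  }
}
```

The caller tests the result against null and, if a reason comes back, can put it straight into an exception message.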

JDOMException


JDOMException Class

package org.jdom;

public class JDOMException extends Exception {

    protected Throwable cause;

    public JDOMException() {}
    public JDOMException(String message)  {}
    public JDOMException(String message, Throwable rootCause)  {} 
       
    public String    getMessage() {}
    public void      printStackTrace() {}
    public void      printStackTrace(PrintStream s) {}
    public void      printStackTrace(PrintWriter w) {}
    public Throwable getCause()  {}

}

The org.jdom.output Package


Serialization


XMLOutputter

package org.jdom.output;

public class XMLOutputter implements Cloneable {

    public XMLOutputter() {}
    public XMLOutputter(String indent) {}
    public XMLOutputter(String indent, boolean newlines) {}
    public XMLOutputter(String indent, boolean newlines, String encoding) {}
    public XMLOutputter(XMLOutputter that) {}
    
    public void setLineSeparator(String separator) {}
    public void setNewlines(boolean newlines) {}
    public void setEncoding(String encoding) {}
    public void setOmitEncoding(boolean omitEncoding) {}
    public void setOmitDeclaration(boolean omitDeclaration) {}
    public void setExpandEmptyElements(boolean expandEmptyElements) {}
    public void setIndent(String indent) {}
    
    public void setTrimAllWhite(boolean trimAllWhite) {}
    public void setTextTrim(boolean textTrim) {}
    public void setTextNormalize(boolean textNormalize) {}

    protected String escapeAttributeEntities(String s) {} 
    protected String escapeElementEntities(String s) {}

    protected void indent(Writer out, int level) throws IOException {}
    protected Writer makeWriter(OutputStream out) 
     throws java.io.UnsupportedEncodingException {}
    protected Writer makeWriter(OutputStream out, String encoding) 
     throws java.io.UnsupportedEncodingException {}
    protected XMLOutputter.NamespaceStack createNamespaceStack() {}

    public void output(Document doc, OutputStream out) throws IOException {}
    public void output(Document doc, Writer writer) throws IOException {}
    public void output(Element element, Writer out) throws IOException {}
    public void output(Element element, OutputStream out) throws IOException {}
    public void output(CDATA cdata, Writer out) throws IOException {}
    public void output(CDATA cdata, OutputStream out) throws IOException {}
    public void output(Comment comment, Writer out) throws IOException {}
    public void output(Comment comment, OutputStream out) throws IOException {}
    public void output(EntityRef entity, Writer out) throws IOException {}
    public void output(EntityRef entity, OutputStream out) throws IOException {}
    public void output(ProcessingInstruction processingInstruction, Writer out)
      throws IOException {}
    public void output(ProcessingInstruction processingInstruction, OutputStream out)
     throws IOException {}
    public void output(Text text, OutputStream out) throws IOException {}
    public void output(Text text, Writer out) throws IOException {}
     
    public void outputElementContent(Element element, OutputStream out)
     throws IOException {}
    public void outputElementContent(Element element, Writer out)
     throws IOException {}

    public String outputString(Document doc) throws IOException {}
    public String outputString(Element element) throws IOException {}
    public String outputString(CDATA cdata) {}
    public String outputString(Comment comment) {}
    public String outputString(DocType doctype) {}
    public String outputString(EntityRef entity) {}
    public String outputString(ProcessingInstruction pi) {}
    public String outputString(Text text) {}

    // internal printing methods
    protected void printDeclaration(Document doc, Writer out, String encoding) 
     throws IOException {}    
    protected void printDocType(DocType docType, Writer out) throws IOException {}
    protected void printComment(Comment comment, Writer out, int indentLevel) 
     throws IOException {}
    protected void printProcessingInstruction(ProcessingInstruction pi,
     Writer out) throws IOException {}
    protected void printCDATA(CDATA cdata, Writer out, int indentLevel) 
     throws IOException {}
    protected void printText(Text text, Writer out) throws IOException {}
    protected void printElement(Element element, Writer out,
     int indentLevel, NamespaceStack namespaces) throws IOException {}
    protected void printString(String s, Writer out) throws IOException {}
    protected void printEntity(Entity entity, Writer out) throws IOException {}
    protected void printNamespace(Namespace ns, Writer out) throws IOException {}
    protected void printAttributes(List attributes, Element parent, 
     Writer out, NamespaceStack namespaces)  
     throws IOException {}
    
    public int parseArgs(String[] args, int i) {} 
    
}

Using the XMLOutputter Class Directly


Using the XMLOutputter Class Indirectly


JDOM based TagStripper

import org.jdom.*;
import org.jdom.output.XMLOutputter;
import org.jdom.input.SAXBuilder;
import java.io.*;
import java.util.*;


public class JDOMTagStripper extends XMLOutputter {

  public JDOMTagStripper() {
    super();
  }

  // Things we won't print at all
  protected void printDeclaration(Document doc, Writer out, String encoding) {}
  protected void printComment(Comment comment, Writer out, int indentLevel) {}
  protected void printDocType(DocType docType, Writer out) {}
  protected void printProcessingInstruction(ProcessingInstruction pi, 
   Writer out) {}
  protected void printNamespace(Namespace ns, Writer out) {}
  protected void printAttributes(List attributes, Element parent, 
   Writer out, NamespaceStack namespaces) {}
  
  protected void printElement(Element element, Writer out, 
   int indentLevel, NamespaceStack namespaces) throws IOException {
    
    List content = element.getContent();
    Iterator iterator = content.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Text) {
        Text t = (Text) o;
        out.write(t.getText());
      }
      else if (o instanceof CDATA) {
        CDATA t = (CDATA) o;
        out.write(t.getText());
      }
      else if (o instanceof Element) {
        printElement((Element) o, out, indentLevel, namespaces);
      }
    }
          
  }

  // Could easily have put main() method in a separate class
  public static void main(String[] args) {
     
    if (args.length == 0) {
      System.out.println(
       "Usage: java JDOMTagStripper URL1 URL2..."); 
      return;
    } 
      
    JDOMTagStripper stripper = new JDOMTagStripper();
    SAXBuilder builder = new SAXBuilder();
    
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        stripper.output(doc, System.out);
      }
      catch (JDOMException e) { // a well-formedness error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
      catch (IOException e) { // an I/O error, not a well-formedness error
        System.out.println(e.getMessage());
      }
      
    }  
  
  }

}

Output from a JDOM based TagStripper

% java JDOMTagStripper hotcop.xml

 Hot Cop

 Jacques Morali
 Henri Belolo
 Victor Willis
 Jacques Morali


   A & M Records

 6:20
 1978
 Village People
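
The ampersand in A & M Records is written raw because printElement() passes getText() straight to the Writer, bypassing XMLOutputter's escaping. That is fine for plain text output. If the stripped text were going back into an XML document it would need escaping first. A minimal sketch, assuming a hypothetical escapeText() helper (this is not JDOM's own escapeElementEntities()):

```java
class TextEscaper {

  // Hypothetical helper: escapes the three characters that are
  // unsafe (or at least awkward) in XML element content
  static String escapeText(String s) {
    StringBuilder result = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '&': result.append("&amp;"); break;
        case '<': result.append("&lt;");  break;
        case '>': result.append("&gt;");  break;
        default:  result.append(c);
      }
    }
    return result.toString();
  }

  public static void main(String[] args) {
    System.out.println(escapeText("A & M Records"));
  }
}
```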


Talking to DOM Programs


Talking to SAX Programs


What JDOM doesn't do


To Learn More


Part VI: Pull Parsing

pull parsing is the way to go in the future. The first 3 XML parsers (Lark, NXP, and expat) all were event-driven because... er well that was 1996, can't exactly remember, seemed like a good idea at the time.

--Tim Bray on the xml-dev mailing list, Wednesday, September 18, 2002


Pull Parsing is


Pull APIs


XMLPULL


Only Three Classes:

XmlPullParser:
an interface that represents the parser
XmlPullParserFactory:
the factory class that instantiates an implementation dependent implementation of XmlPullParser
XmlPullParserException:
the generic class for everything other than an IOException that might go wrong when parsing an XML document, particularly well-formedness errors and tokens that don't have the expected type
XmlSerializer:
Under development; planned for 1.2

Simple Wellformedness Checker

import org.xmlpull.v1.*;
import java.net.*;
import java.io.*;

 
public class PullChecker {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java PullChecker url" );
      return;   
    }
        
    try {
      XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
      XmlPullParser parser = factory.newPullParser();

      InputStream in;
      try {
        URL u = new URL(args[0]);
        in = u.openStream();
      }
      catch (MalformedURLException ex) {
          // Maybe it's a file name
          in = new FileInputStream(args[0]);
      }
      parser.setInput(in, null);
        
      while (parser.next() != XmlPullParser.END_DOCUMENT) {
        // reading the document...   
      }
            
      // If we get here there are no exceptions
      System.out.println(args[0] + " is well-formed");      
    }
    catch (XmlPullParserException ex) {
       System.out.println(args[0] + " is not well-formed"); 
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println(args[0] + " could not be checked due to an " 
       + ex.getClass().getName());   
      ex.printStackTrace();      
    }
        
  }

}

Output from a Simple Wellformedness Checker

% java PullChecker http://www.rddl.org/
http://www.rddl.org/ is well-formed
% java PullChecker http://www.cafeconleche.org/
http://www.cafeconleche.org/ is well-formed
% java PullChecker http://www.cafeaulait.org
http://www.cafeaulait.org is not well-formed
org.xmlpull.v1.XmlPullParserException: attribute value must start with quotation or 
apostrophe not j (position: TEXT seen ...rogramming, Javabeans, 
\r\nnetwork programming">\r\n<script language=j... @16:19) 

Event Codes


Listening to Events

import org.xmlpull.v1.*;
import java.net.*;
import java.io.*;

 
public class EventLister {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java EventLister url" );
      return;
    }
        
    try {
      XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
      XmlPullParser parser = factory.newPullParser();

      InputStream in;
      try {
        URL u = new URL(args[0]);
        in = u.openStream();
      }
      catch (MalformedURLException ex) {
          // Maybe it's a file name
          in = new FileInputStream(args[0]);
      }
      parser.setInput(in, null);
        
      while (true) {
         int event = parser.nextToken();
         if (event == XmlPullParser.START_TAG) {
             System.out.println("Start tag");
         }
         else if (event == XmlPullParser.END_TAG) {
             System.out.println("End tag");
         }
         else if (event == XmlPullParser.START_DOCUMENT) {
             System.out.println("Start document");
         }
         else if (event == XmlPullParser.TEXT) {
             System.out.println("Text");
         }
         else if (event == XmlPullParser.CDSECT) {
             System.out.println("CDATA Section");
         }
         else if (event == XmlPullParser.COMMENT) {
             System.out.println("Comment");
         }
         else if (event == XmlPullParser.DOCDECL) {
             System.out.println("Document type declaration");
         }
         else if (event == XmlPullParser.ENTITY_REF) {
             System.out.println("Entity Reference");
         }
         else if (event == XmlPullParser.IGNORABLE_WHITESPACE) {
             System.out.println("Ignorable white space");
         }
         else if (event == XmlPullParser.PROCESSING_INSTRUCTION) {
             System.out.println("Processing Instruction");
         }
         else if (event == XmlPullParser.END_DOCUMENT) {
             System.out.println("End Document");
             break;
         }
      }           
    }
    catch (XmlPullParserException ex) {
       System.out.println(ex);  
    }
    catch (IOException e) {
      System.out.println("IOException while parsing " + args[0]);   
    }
        
  }

}

Output from EventLister

~/speaking/oop2003/xmlandjava/examples% java EventLister hotcop.xml
Ignorable white space
Processing Instruction
Ignorable white space
Document type declaration
Ignorable white space
Start tag
Text
Start tag
Text
End tag
Text
Start tag
End tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
Comment
Text
Start tag
Text
Entity Reference
Text
End tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
End tag
Ignorable white space
Comment
Ignorable white space
End Document

getText()

The getText() method returns the text of the current event:

public String getText()

Exactly what this is depends on the type of the event:


getText() Example

import org.xmlpull.v1.*;
import java.net.*;
import java.io.*;

 
public class EventText {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java EventText url" );
      return;
    }
        
    try {
      XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
      factory.setNamespaceAware(true);
      XmlPullParser parser = factory.newPullParser();

      
      InputStream in;
      try {
        URL u = new URL(args[0]);
        in = u.openStream();
      }
      catch (MalformedURLException ex) {
          // Maybe it's a file name
          in = new FileInputStream(args[0]);
      }
      parser.setInput(in, null);
        
      while (true) {
         int event = parser.nextToken();
         if (event == XmlPullParser.START_TAG) {
             System.out.println("Start-tag: " + parser.getText());
         }
         else if (event == XmlPullParser.END_TAG) {
             System.out.println("End-tag: " + parser.getText());
         }
         else if (event == XmlPullParser.START_DOCUMENT) {
             System.out.println("Start document: "  + parser.getText());
         }
         else if (event == XmlPullParser.TEXT) {
             System.out.println("Text: " + parser.getText());
         }
         else if (event == XmlPullParser.CDSECT) {
             System.out.println("CDATA Section: " + parser.getText());
         }
         else if (event == XmlPullParser.COMMENT) {
             System.out.println("Comment: " + parser.getText());
         }
         else if (event == XmlPullParser.DOCDECL) {
             System.out.println("Document type declaration: " + parser.getText());
         }
         else if (event == XmlPullParser.ENTITY_REF) {
             System.out.println("Entity Reference: " + parser.getText());
         }
         else if (event == XmlPullParser.IGNORABLE_WHITESPACE) {
             System.out.println("Ignorable white space: " + parser.getText());
         }
         else if (event == XmlPullParser.PROCESSING_INSTRUCTION) {
             System.out.println("Processing Instruction: " + parser.getText());
         }
         else if (event == XmlPullParser.END_DOCUMENT) {
             System.out.println("End Document: " + parser.getText());
             break;
         } // end else if
      }  // end while
    } // end try
    catch (XmlPullParserException ex) {
       System.out.println(ex);
    }
    catch (IOException e) {
      System.out.println("IOException while parsing " + args[0]);
    }
        
  }
 
}

Things to note


Names

If the event is a tag, then the following methods in XmlPullParser also work:

public String getName()
public String getNamespace()
public String getPrefix()

Names Example

import org.xmlpull.v1.*;
import java.net.*;
import java.io.*;

 
public class NamePrinter {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java NamePrinter url" );
      return;   
    }
        
    try {
      XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
      factory.setNamespaceAware(true);
      XmlPullParser parser = factory.newPullParser();
      
      InputStream in;
      try {
        URL u = new URL(args[0]);
        in = u.openStream();
      }
      catch (MalformedURLException ex) {
          // Maybe it's a file name
          in = new FileInputStream(args[0]);
      }
      parser.setInput(in, null);
        
      while (true) {
         int event = parser.nextToken();
         if (event == XmlPullParser.START_TAG) {
             System.out.println("Start tag: ");
             printEvent(parser);
         }
         else if (event == XmlPullParser.END_TAG) {
             System.out.println("End tag");
             printEvent(parser);
         }
         else if (event == XmlPullParser.START_DOCUMENT) {
             System.out.println("Start document");
         }
         else if (event == XmlPullParser.TEXT) {
             System.out.println("Text");
             printEvent(parser);
         }
         else if (event == XmlPullParser.CDSECT) {
             System.out.println("CDATA Section");
             printEvent(parser);
         }
         else if (event == XmlPullParser.COMMENT) {
             System.out.println("Comment");
             printEvent(parser);
         }
         else if (event == XmlPullParser.DOCDECL) {
             System.out.println("Document type declaration");
             printEvent(parser);
         }
         else if (event == XmlPullParser.ENTITY_REF) {
             System.out.println("Entity Reference");
             printEvent(parser);
         }
         else if (event == XmlPullParser.IGNORABLE_WHITESPACE) {
             System.out.println("Ignorable white space");
             printEvent(parser);
         }
         else if (event == XmlPullParser.PROCESSING_INSTRUCTION) {
             System.out.println("Processing Instruction");
             printEvent(parser);
         }
         else if (event == XmlPullParser.END_DOCUMENT) {
             System.out.println("End Document");
             break;
         } // end else if
      }  // end while
    }
    catch (XmlPullParserException ex) {
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println("IOException while parsing " + args[0]);   
      ex.printStackTrace();
    }
        
  }
  
  private static void printEvent(XmlPullParser parser) {
      String localName = parser.getName();
      String prefix = parser.getPrefix();
      String uri = parser.getNamespace();
      
      if (localName != null) System.out.println("\tName: " + localName);
      if (prefix != null) System.out.println("\tPrefix: " + prefix);
      if (uri != null) System.out.println("\tNamespace URI: " + uri);
      System.out.println();
  }

}

The next() method


next() Example

List all the titles in an RSS 0.91 document:

<?xml version="1.0" encoding="iso-8859-1" ?>
<!-- generator="HPE/1.0" -->
<!-- Copyright (C) 2000-2002 News Is Free. Terms Of Service http://www.newsisfree.com/termsofservice.php -->

<rss version="0.91">
<channel>
<title>Ananova: <!-- interrupting comment -->Archeology</title>
<link>http://www.ananova.com/news/index.html?keywords=Archaeology&amp;menu=news.scienceanddiscovery.archaeology</link>
<description>Ananova: News on the move from the leading site for breaking 
UK and world news, sport, entertainment, business and weather stories and information. 
(By http://www.newsisfree.com/syndicate.php 
- FOR PERSONAL AND NON COMMERCIAL USE ONLY!)</description>
<language>en</language>
<webMaster>mkrus@newsisfree.com</webMaster>

<lastBuildDate>11/05/02 22:16 CET</lastBuildDate>
<image>
  <link>http://www.newsisfree.com/sources/info/3389/</link>
  <url>http://www.newsisfree.com/HPE/Images/button.gif</url>
  <title>Powered by News Is Free</title><width>88</width>
  <height>31</height>
</image>

<item>
<title>Britain's earliest leprosy victim may have been found</title>
<link>http://www.newsisfree.com/click/-2,9782455,3389/</link>
</item>
<item>
<title>20th anniversary of Mary Rose recovery</title>

<link>http://www.newsisfree.com/click/-2,9773139,3389/</link>
</item>
<item>
<title>'Proof of Jesus' burial box damaged on way to Canada</title>
<link>http://www.newsisfree.com/click/-6,9663454,3389/</link>
</item>
<item>
<title>Remains of four woolly rhinos give new insight into Ice Age</title>
<link>http://www.newsisfree.com/click/-4,9533904,3389/</link>
</item>
<item>
<title>Experts solve crop lines mystery</title>

<link>http://www.newsisfree.com/click/-5,9352720,3389/</link>
</item>
</channel>
</rss>

RSSLister

import org.xmlpull.v1.*;
import java.net.*;
import java.io.*;

 
public class RSSTitles {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java RSSTitles url" );
      return;   
    }
        
    try {
      XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
      XmlPullParser parser = factory.newPullParser();
      
      InputStream in;
      try {
        URL u = new URL(args[0]);
        in = u.openStream();
      }
      catch (MalformedURLException ex) {
          // Maybe it's a file name
          in = new FileInputStream(args[0]);
      }
      parser.setInput(in, null);
      
      boolean printing = false;
      while (true) {
         int event = parser.next();
         if (event == XmlPullParser.START_TAG) {
             String name = parser.getName();
             if (name.equals("title")) printing = true;
         }
         else if (event == XmlPullParser.END_TAG) {
             String name = parser.getName();
             if (name.equals("title")) printing = false;
         }
         else if (event == XmlPullParser.TEXT) {
             if (printing) System.out.println(parser.getText());
         }
         else if (event == XmlPullParser.END_DOCUMENT) {
             break;
         } // end else if
      }  // end while
    }
    catch (XmlPullParserException ex) {
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println("IOException while parsing " + args[0]);   
    }
        
  }

}

Improved RSSLister

Print only item titles:

import org.xmlpull.v1.*;
import java.net.*;
import java.io.*;

 
public class BetterRSSLister {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java BetterRSSLister url" );
      return;   
    }
        
    try {
      XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
      XmlPullParser parser = factory.newPullParser();
      
      InputStream in;
      try {
        URL u = new URL(args[0]);
        in = u.openStream();
      }
      catch (MalformedURLException ex) {
          // Maybe it's a file name
          in = new FileInputStream(args[0]);
      }
      parser.setInput(in, null);
      
      boolean inItem = false;
      boolean inTitle = false;
      // Nested elements could be handled by incrementing
      // and decrementing an integer instead
      // of a simple boolean.
      while (true) {
         int event = parser.next();
         if (event == XmlPullParser.START_TAG) {
             String name = parser.getName();
             if (name.equals("title")) inTitle = true;
             if (name.equals("item")) inItem = true;
         }
         else if (event == XmlPullParser.END_TAG) {
             String name = parser.getName();
             if (name.equals("title")) inTitle = false;
             if (name.equals("item")) inItem = false;
         }
         else if (event == XmlPullParser.TEXT) {
             if (inTitle && inItem) System.out.println(parser.getText());
         }
         else if (event == XmlPullParser.END_DOCUMENT) {
             break;
         } // end else if
      }  // end while
    }
    catch (XmlPullParserException ex) {
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println("IOException while parsing " + args[0]);   
    }
        
  }

}
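
The comment in BetterRSSLister suggests counters instead of booleans for nested elements. A minimal sketch of that idea follows. It uses the JDK's StAX XMLStreamReader as a stand-in pull parser rather than XMLPULL, so the event constants and method names differ; the class DepthCountingTitles and its itemTitles() method are hypothetical. It also gathers the text pieces, so a title interrupted by a comment comes back as one string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class DepthCountingTitles {

  static List<String> itemTitles(String xml) {

    List<String> titles = new ArrayList<String>();
    int itemDepth = 0;   // greater than 0 while inside any item element
    int titleDepth = 0;  // greater than 0 while inside any title element
    StringBuilder current = new StringBuilder();

    try {
      XMLInputFactory factory = XMLInputFactory.newInstance();
      XMLStreamReader reader
       = factory.createXMLStreamReader(new StringReader(xml));

      while (reader.hasNext()) {
        int event = reader.next();
        if (event == XMLStreamConstants.START_ELEMENT) {
          String name = reader.getLocalName();
          if (name.equals("item")) itemDepth++;
          if (name.equals("title")) titleDepth++;
        }
        else if (event == XMLStreamConstants.END_ELEMENT) {
          String name = reader.getLocalName();
          if (name.equals("item")) itemDepth--;
          if (name.equals("title")) {
            titleDepth--;
            // the outermost title inside an item has just closed
            if (titleDepth == 0 && itemDepth > 0) {
              titles.add(current.toString());
              current.setLength(0);
            }
          }
        }
        else if (event == XMLStreamConstants.CHARACTERS
              || event == XMLStreamConstants.CDATA) {
          if (itemDepth > 0 && titleDepth > 0) {
            current.append(reader.getText());
          }
        }
      }
      reader.close();
    }
    catch (XMLStreamException ex) {
      throw new IllegalArgumentException("Not well-formed", ex);
    }
    return titles;
  }
}
```

Because the counters only return to zero when the outermost matching element closes, an item nested inside another item no longer switches the state off too early.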

The nextTag() method


The nextText() method


Attributes


Attributes Example: XLinkSpider

import org.xmlpull.v1.*;
import java.net.*;
import java.io.*;
import java.util.*;

public class PullSpider {

  // Need to keep track of where we've been 
  // so we don't get stuck in an infinite loop
  private List spideredURIs = new Vector();

  // This linked list keeps track of where we're going.
  // Although the LinkedList class does not guarantee queue like
  // access, I always access it in a first-in/first-out fashion.
  private LinkedList queue = new LinkedList();
  
  private URL currentURL;
  private XmlPullParser parser;
  
  public PullSpider() {
      try {
        XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
        factory.setNamespaceAware(true);
        this.parser = factory.newPullParser();
      }
      catch (XmlPullParserException ex) {
         throw new RuntimeException("Could not locate a pull parser");   
      }
  }

  private void processStartTag() {
    
    String type 
     = parser.getAttributeValue("http://www.w3.org/1999/xlink", "type");
    if (type != null) {
      String href 
       = parser.getAttributeValue("http://www.w3.org/1999/xlink", "href");
      if (href != null) {
        try {
          URL foundURL = new URL(currentURL, href);
          if (!spideredURIs.contains(foundURL)) {
            queue.addFirst(foundURL);
          }
        }
        catch (MalformedURLException ex) {
          // skip it   
        }
      }
    }
  }
  
  public void spider(URL uri) {
      
    System.out.println("Spidering " + uri);
    currentURL = uri;
    try {
      parser.setInput(this.currentURL.openStream(), null);
      spideredURIs.add(currentURL);
      
      for (int event = parser.next(); event != XmlPullParser.END_DOCUMENT; event = parser.next()) {
         if (event == XmlPullParser.START_TAG) {
             processStartTag();
         }
       }  // end for
       System.out.println("Visited " + uri);
      
       while (!queue.isEmpty()) {
         URL nextURL = (URL) queue.removeLast();
         spider(nextURL);
       }
      
    }
    catch (Exception ex) {
       // skip this document
    }
    
  }

  public static void main(String[] args) throws Exception {
        
    if (args.length == 0) {
      System.err.println("Usage: java PullSpider url" );
       return;  
    }
        
    PullSpider spider = new PullSpider();
    spider.spider(new URL(args[0]));
        
  } // end main

} // end PullSpider
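
PullSpider's comments describe first-in/first-out access to a LinkedList through addFirst() and removeLast(). The sketch below isolates that queue discipline so the visiting order is easy to follow. It is not the spider itself: the link map is a hypothetical in-memory stand-in for parsed documents, the class and method names are invented for illustration, and it uses a plain loop where PullSpider recurses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CrawlOrderSketch {

  // links is a hypothetical stand-in for "the XLinks found in each document"
  static List<String> crawl(String start, Map<String, List<String>> links) {
    Set<String> visited = new HashSet<String>();
    LinkedList<String> queue = new LinkedList<String>();
    List<String> order = new ArrayList<String>();

    queue.addFirst(start);
    while (!queue.isEmpty()) {
      // New entries go in at the front, so the oldest is at the back
      String current = queue.removeLast();
      if (!visited.add(current)) continue;   // already processed
      order.add(current);
      List<String> found
       = links.getOrDefault(current, Collections.<String>emptyList());
      for (String next : found) {
        if (!visited.contains(next)) queue.addFirst(next);
      }
    }
    return order;
  }

  public static void main(String[] args) {
    Map<String, List<String>> links = new HashMap<String, List<String>>();
    links.put("a", Arrays.asList("b", "c"));
    links.put("b", Arrays.asList("a", "d"));
    links.put("c", Arrays.asList("d"));
    System.out.println(crawl("a", links));   // prints [a, b, c, d]
  }
}
```

Checking the visited set when an entry leaves the queue, rather than only when it enters, keeps a document that was queued twice from being processed twice.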


Output from the PullSpider

Spidering http://www.rddl.org
Visited http://www.rddl.org
Spidering http://www.rddl.org/natures
Spidering http://www.rddl.org/purposes
Visited http://www.rddl.org/purposes
Spidering http://www.rddl.org/xrd.css
Spidering http://www.rddl.org/rddl-xhtml.dtd
Spidering http://www.rddl.org/rddl-qname-1.mod
Spidering http://www.rddl.org/rddl-resource-1.mod
Spidering http://www.rddl.org/xhtml-arch-1.mod
Spidering http://www.rddl.org/xhtml-attribs-1.mod
Spidering http://www.rddl.org/xhtml-base-1.mod
Spidering http://www.rddl.org/xhtml-basic-form-1.mod
Spidering http://www.rddl.org/xhtml-basic-table-1.mod
Spidering http://www.rddl.org/xhtml-blkphras-1.mod
Spidering http://www.rddl.org/xhtml-blkstruct-1.mod
Spidering http://www.rddl.org/xhtml-charent-1.mod
Spidering http://www.rddl.org/xhtml-datatypes-1.mod
Spidering http://www.rddl.org/xhtml-framework-1.mod
Spidering http://www.rddl.org/xhtml-hypertext-1.mod
Spidering http://www.rddl.org/xhtml-image-1.mod
Spidering http://www.rddl.org/xhtml-inlphras-1.mod
Spidering http://www.rddl.org/xhtml-inlstruct-1.mod
Spidering http://www.rddl.org/xhtml-lat1.ent
Spidering http://www.rddl.org/xhtml-link-1.mod
Spidering http://www.rddl.org/xhtml-meta-1.mod
Spidering http://www.rddl.org/xhtml-notations-1.mod
Spidering http://www.rddl.org/xhtml-object-1.mod
Spidering http://www.rddl.org/xhtml-param-1.mod
Spidering http://www.rddl.org/xhtml-qname-1.mod
Spidering http://www.rddl.org/xhtml-rddl-model-1.mod
Spidering http://www.rddl.org/xhtml-special.ent
Spidering http://www.rddl.org/xhtml-struct-1.mod
Spidering http://www.rddl.org/xhtml-symbol.ent
Spidering http://www.rddl.org/xhtml-text-1.mod
Spidering http://www.rddl.org/xlink-module-1.mod
Spidering http://www.rddl.org/rddl.rdfs
Visited http://www.rddl.org/rddl.rdfs
Spidering http://www.rddl.org/rddl-integration.rxg
Visited http://www.rddl.org/rddl-integration.rxg
Spidering http://www.rddl.org/modules/rddl-1.rxm
Spidering http://www.rddl.org/modules/xhtml-attribs-1.rxm
Spidering http://www.rddl.org/modules/xhtml-base-1.rxm
Visited http://www.rddl.org/modules/xhtml-base-1.rxm
Spidering http://www.rddl.org/modules/xhtml-basic-form-1.rxm
Spidering http://www.rddl.org/modules/xhtml-basic-table-1.rxm
Spidering http://www.rddl.org/modules/xhtml-basic10-model-1.rxm
Visited http://www.rddl.org/modules/xhtml-basic10-model-1.rxm
Spidering http://www.rddl.org/modules/xhtml-basic10.rxm
Spidering http://www.rddl.org/modules/xhtml-blkphras-1.rxm
Visited http://www.rddl.org/modules/xhtml-blkphras-1.rxm
Spidering http://www.rddl.org/modules/xhtml-blkstruct-1.rxm
Visited http://www.rddl.org/modules/xhtml-blkstruct-1.rxm
Spidering http://www.rddl.org/modules/xhtml-for-rddl.rxm
Spidering http://www.rddl.org/modules/xhtml-framework-1.rxm
Visited http://www.rddl.org/modules/xhtml-framework-1.rxm
Spidering http://www.rddl.org/modules/xhtml-hypertext-1.rxm
Spidering http://www.rddl.org/modules/xhtml-image-1.rxm
Spidering http://www.rddl.org/modules/xhtml-inlphras-1.rxm
Visited http://www.rddl.org/modules/xhtml-inlphras-1.rxm
Spidering http://www.rddl.org/modules/xhtml-inlstruct-1.rxm
Visited http://www.rddl.org/modules/xhtml-inlstruct-1.rxm
Spidering http://www.rddl.org/modules/xhtml-link-1.rxm
Spidering http://www.rddl.org/modules/xhtml-list-1.rxm
Visited http://www.rddl.org/modules/xhtml-list-1.rxm
Spidering http://www.rddl.org/modules/xhtml-meta-1.rxm
Visited http://www.rddl.org/modules/xhtml-meta-1.rxm
Spidering http://www.rddl.org/modules/xhtml-object-1.rxm
Spidering http://www.rddl.org/modules/xhtml-param-1.rxm
Spidering http://www.rddl.org/modules/xhtml-text-1.rxm
Visited http://www.rddl.org/modules/xhtml-text-1.rxm
Spidering http://www.rddl.org/xhtml-rddl.rng
Visited http://www.rddl.org/xhtml-rddl.rng
Spidering http://www.rddl.org/modules/attribs.rng
Visited http://www.rddl.org/modules/attribs.rng
Spidering http://www.rddl.org/modules/base.rng
Visited http://www.rddl.org/modules/base.rng
Spidering http://www.rddl.org/modules/basic-form.rng
Visited http://www.rddl.org/modules/basic-form.rng
Spidering http://www.rddl.org/modules/basic-table.rng
Visited http://www.rddl.org/modules/basic-table.rng
Spidering http://www.rddl.org/modules/datatypes.rng
Visited http://www.rddl.org/modules/datatypes.rng
Spidering http://www.rddl.org/modules/struct.rng
Visited http://www.rddl.org/modules/struct.rng
Spidering http://www.rddl.org/modules/text.rng
Visited http://www.rddl.org/modules/text.rng
Spidering http://www.rddl.org/modules/hypertext.rng
Visited http://www.rddl.org/modules/hypertext.rng
Spidering http://www.rddl.org/modules/list.rng
Visited http://www.rddl.org/modules/list.rng
Spidering http://www.rddl.org/modules/image.rng
Visited http://www.rddl.org/modules/image.rng
Spidering http://www.rddl.org/modules/param.rng
Visited http://www.rddl.org/modules/param.rng
Spidering http://www.rddl.org/modules/object.rng
Visited http://www.rddl.org/modules/object.rng
Spidering http://www.rddl.org/modules/meta.rng
Visited http://www.rddl.org/modules/meta.rng
Spidering http://www.rddl.org/modules/link.rng
Visited http://www.rddl.org/modules/link.rng
Spidering http://www.rddl.org/modules/xlink.rng
Visited http://www.rddl.org/modules/xlink.rng
Spidering http://www.rddl.org/modules/resource.rng
Visited http://www.rddl.org/modules/resource.rng
Spidering http://www.rddl.org/rddl.sch
Visited http://www.rddl.org/rddl.sch
Spidering http://www.rddl.org/rddl-schematron.xsl
Visited http://www.rddl.org/rddl-schematron.xsl
Spidering http://www.rddl.org/rddl.soc
Spidering http://www.rddl.org/xhtml-rddl.trex
Visited http://www.rddl.org/xhtml-rddl.trex
Spidering http://www.rddl.org/rddl-20010122.zip
Spidering http://www.rddl.org/RDDL-JOM.html
Visited http://www.rddl.org/RDDL-JOM.html
Spidering http://www.rddl.org/rddl.jar
Spidering http://www.rddl.org/rddlapi.xsl
Visited http://www.rddl.org/rddlapi.xsl
Spidering http://www.rddl.org/rddlview.xsl
Visited http://www.rddl.org/rddlview.xsl
Spidering http://www.rddl.org/rddl2rdf.xsl
Visited http://www.rddl.org/rddl2rdf.xsl
Spidering http://www.rddl.org/rddl2rss.xsl
Visited http://www.rddl.org/rddl2rss.xsl
Spidering http://www.injektilo.org/rddl/RDDL.NET.zip
Spidering http://www.rddl.org/rddl.htc
Spidering http://www.rddl.org/home
Visited http://www.rddl.org/home
Spidering http://www.w3.org/TR/REC-xml-names
Spidering http://www.ietf.org/rfc/rfc2396.txt
Spidering http://www.w3.org/tr/xlink
Spidering http://www.w3.org/TR/xhtml-basic
Visited http://www.w3.org/TR/xhtml-basic
Spidering http://www.w3.org/TR/xmlbase/
Spidering http://www.w3.org/tr/xptr
Spidering http://www.w3.org/TR/xml-infoset/
Spidering http://www.w3.org/tr/xhtml1
Visited http://www.w3.org/tr/xhtml1
Spidering http://www.w3.org/TR/xlink2rdf/
Spidering http://www.w3.org/TR/xhtml-modularization/
Visited http://www.w3.org/TR/xhtml-modularization/
Spidering http://www.rddl.org/purposes#canonicalization
Visited http://www.rddl.org/purposes#canonicalization
Spidering http://www.rddl.org/purposes#target
Visited http://www.rddl.org/purposes#target
Spidering http://www.rddl.org/purposes#target
Visited http://www.rddl.org/purposes#target
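
The output ends with http://www.rddl.org/purposes being fetched again once per fragment identifier. That follows from java.net.URL.equals() comparing the fragment along with the rest of the URL, so purposes, purposes#canonicalization and purposes#target all count as different entries. One possible remedy, offered only as a sketch with hypothetical names and not as part of PullSpider, is to strip the fragment with java.net.URI before checking or queuing a link:

```java
import java.net.URI;
import java.net.URISyntaxException;

final class FragmentSketch {

  // Rebuilds the URI from its scheme and scheme-specific part,
  // passing null for the fragment so that it is dropped
  static URI withoutFragment(URI uri) throws URISyntaxException {
    return new URI(uri.getScheme(), uri.getSchemeSpecificPart(), null);
  }

  public static void main(String[] args) throws URISyntaxException {
    URI base = URI.create("http://www.rddl.org/");
    URI found = base.resolve("purposes#target");
    System.out.println(found);                   // http://www.rddl.org/purposes#target
    System.out.println(withoutFragment(found));  // http://www.rddl.org/purposes
  }
}
```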

Processing Instructions


Pull Processing Instructions Example

import org.xmlpull.v1.*;
import java.net.*;
import java.io.*;

 
public class PILister {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java PILister url" );
     return;    
    }
        
    try {
      XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
      XmlPullParser parser = factory.newPullParser();

      InputStream in;
      try {
        URL u = new URL(args[0]);
        in = u.openStream();
      }
      catch (MalformedURLException ex) {
          // Maybe it's a file name
          in = new FileInputStream(args[0]);
      }
      parser.setInput(in, null);
        
      while (true) {
         int event = parser.nextToken();
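         if (event == XmlPullParser.PROCESSING_INSTRUCTION) {
             // getName() is null for this event; getText() holds
             // the target followed by the data
             String pi = parser.getText().trim();
             int end = 0;
             while (end < pi.length() 
              && !Character.isWhitespace(pi.charAt(end))) end++;
             System.out.println("Target: " + pi.substring(0, end));
             System.out.println("Data: " + pi.substring(end).trim());
             System.out.println();
         }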
         else if (event == XmlPullParser.END_DOCUMENT) {
            break;   
         }
      }           
    }
    catch (XmlPullParserException ex) {
       System.out.println(ex);  
    }
    catch (IOException e) {
      System.out.println("IOException while parsing " + args[0]);   
    }
        
  }

}
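
PILister tries its argument as a URL first and falls back to a file name, and it never closes the stream it opens. A minimal sketch of that fallback factored into a helper follows. The class name and the openSource name are hypothetical, and try-with-resources is used here to close the stream, which the listing above does not do.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class SourceOpenerSketch {

  // Hypothetical helper: URL first, then file name
  static InputStream openSource(String systemId) throws IOException {
    try {
      return new URL(systemId).openStream();
    }
    catch (MalformedURLException ex) {
      // No recognizable protocol, so maybe it's a file name
      return new FileInputStream(systemId);
    }
  }

  public static void main(String[] args) throws IOException {
    Path temp = Files.createTempFile("hotcop", ".xml");
    Files.write(temp, "<SONG/>".getBytes(StandardCharsets.UTF_8));
    // The stream is closed when the try block exits
    try (InputStream in = openSource(temp.toString())) {
      System.out.println((char) in.read());   // prints <
    }
    Files.delete(temp);
  }
}
```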

Output from PILister

????

Comments

import org.xmlpull.v1.*;
import java.net.*;
import java.io.*;

 
public class CommentPuller {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java CommentPuller url" );
      return;   
    }
        
    try {
      XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
      XmlPullParser parser = factory.newPullParser();

      InputStream in;
      try {
        URL u = new URL(args[0]);
        in = u.openStream();
      }
      catch (MalformedURLException ex) {
          // Maybe it's a file name
          in = new FileInputStream(args[0]);
      }
      parser.setInput(in, null);
        
      while (true) {
         int event = parser.nextToken();
         if (event == XmlPullParser.COMMENT) {
             System.out.println(parser.getText());
         }
         else if (event == XmlPullParser.END_DOCUMENT) {
            break;   
         }
      }           
    }
    catch (XmlPullParserException ex) {
       System.out.println(ex);  
    }
    catch (IOException e) {
      System.out.println("IOException while parsing " + args[0]);   
    }
        
  }

}

Output from CommentPuller

% java CommentPuller hotcop.xml
 The publisher is actually Polygram but I needed 
       an example of a general entity reference. 
 You can tell what album I was 
     listening to when I wrote this example 

Features and Properties

    public void setFeature(String name, boolean state) 
     throws XmlPullParserException;
    public boolean getFeature(String name);
    public void setProperty(String name, Object value)
     throws XmlPullParserException;
    public Object getProperty(String name);

Required Features


Optional Features


Example: PullValidator

import org.xmlpull.v1.*;
import java.net.*;
import java.io.*;

 
public class PullValidator {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java PullValidator url" );
     return;    
    }
        
    try {
      XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
      XmlPullParser parser = factory.newPullParser();
      try {
        parser.setFeature(XmlPullParser.FEATURE_VALIDATION, true);
      }
      catch (XmlPullParserException ex) {
         System.err.println("This is not a validating parser");   
         return;
      }

      InputStream in;
      try {
        URL u = new URL(args[0]);
        in = u.openStream();
      }
      catch (MalformedURLException ex) {
          // Maybe it's a file name
          in = new FileInputStream(args[0]);
      }
      parser.setInput(in, null);
        
      for (int event = parser.next(); 
           event != XmlPullParser.END_DOCUMENT ;
            event = parser.next()) ;
            
      // If we get here there are no exceptions
      System.out.println(args[0] + " is valid");      
    }
    catch (XmlPullParserException ex) {
       System.out.println(args[0] + " is not valid");   
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println(args[0] + " could not be checked due to an " 
       + ex.getClass().getName());   
      ex.printStackTrace();      
    }
        
  }

}

XML Declaration

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>


Example: PullDeclaration

import org.xmlpull.v1.*;
import java.net.*;
import java.io.*;

 
public class PullDeclaration {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java PullDeclaration url" );
     return;    
    }
        
    try {
      XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
      XmlPullParser parser = factory.newPullParser();

      InputStream in;
      try {
        URL u = new URL(args[0]);
        in = u.openStream();
      }
      catch (MalformedURLException ex) {
          // Maybe it's a file name
          in = new FileInputStream(args[0]);
      }
      parser.setInput(in, null);
        
      for (int event = parser.next(); 
           event != XmlPullParser.START_TAG;
            event = parser.next()) ;
            
      String version = (String) parser.getProperty(
       "http://xmlpull.org/v1/doc/properties.html#xmldecl-version");
      Boolean standalone = (Boolean) parser.getProperty(
       "http://xmlpull.org/v1/doc/properties.html#xmldecl-standalone");
      if (standalone == null) standalone = Boolean.FALSE;
      String encoding = parser.getInputEncoding();

      System.out.println("version=\"" + version + "\"");   
      System.out.println("standalone=\"" + standalone + "\"");   
      System.out.println("encoding=\"" + encoding + "\"");   
       
    }
    catch (XmlPullParserException ex) {
       System.out.println(args[0] + " is not well-formed");   
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println(args[0] + " could not be checked due to an " 
       + ex.getClass().getName());   
      ex.printStackTrace();      
    }
        
  }

}

Output from PullDeclaration

% java PullDeclaration hotcop.xml
version="1.0"
standalone="false"
encoding="UTF-8"

Namespaces


Requirements


XmlPullParserFactory

package org.xmlpull.v1;

public class XmlPullParserFactory {

  public static final String PROPERTY_NAME =
        "org.xmlpull.v1.XmlPullParserFactory";

  public void    setFeature(String name, boolean state) 
   throws XmlPullParserException;
  public boolean getFeature (String name);
  public void    setNamespaceAware(boolean awareness);
  public boolean isNamespaceAware();
  public void    setValidating(boolean validating) ;
  public boolean isValidating();
  
  public        XmlPullParser        newPullParser()
   throws XmlPullParserException;
  public        XmlSerializer        newSerializer()
   throws XmlPullParserException;
  public static XmlPullParserFactory newInstance() 
   throws XmlPullParserException;
  public static XmlPullParserFactory newInstance(String classNames, Class context)
   throws XmlPullParserException;
   
}

XmlPullParser

package org.xmlpull.v1;

public interface XmlPullParser {

    public final static String NO_NAMESPACE = "";

    public final static int START_DOCUMENT;
    public final static int END_DOCUMENT;
    public final static int START_TAG;
    public final static int END_TAG;
    public final static int TEXT;
    public final static int CDSECT;
    public final static int ENTITY_REF;
    public final static int IGNORABLE_WHITESPACE;
    public final static int PROCESSING_INSTRUCTION;
    public final static int COMMENT;
    public final static int DOCDECL;

    public final static String [] TYPES = {
        "START_DOCUMENT",
        "END_DOCUMENT",
        "START_TAG",
        "END_TAG",
        "TEXT",
        "CDSECT",
        "ENTITY_REF",
        "IGNORABLE_WHITESPACE",
        "PROCESSING_INSTRUCTION",
        "COMMENT",
        "DOCDECL"
    };

    public final static String FEATURE_PROCESS_NAMESPACES =
        "http://xmlpull.org/v1/doc/features.html#process-namespaces";
    public final static String FEATURE_REPORT_NAMESPACE_ATTRIBUTES =
        "http://xmlpull.org/v1/doc/features.html#report-namespace-prefixes";
    public final static String FEATURE_PROCESS_DOCDECL =
        "http://xmlpull.org/v1/doc/features.html#process-docdecl";
    public final static String FEATURE_VALIDATION =
        "http://xmlpull.org/v1/doc/features.html#validation";

    public void setFeature(String name, boolean state) 
     throws XmlPullParserException;
    public boolean getFeature(String name);
    public void setProperty(String name, Object value)
     throws XmlPullParserException;
    public Object getProperty(String name);

    public void setInput(Reader in) throws XmlPullParserException;
    public void setInput(InputStream inputStream, String inputEncoding)
        throws XmlPullParserException;

    // actual parsing methods
    public int getEventType()
        throws XmlPullParserException;
    public int next()
        throws XmlPullParserException, IOException;
    public int nextToken()
        throws XmlPullParserException, IOException;
        
    // Utility methods
    public void require(int type, String namespace, String name)
        throws XmlPullParserException, IOException;
    public String nextText() throws XmlPullParserException, IOException;
    public int    nextTag() throws XmlPullParserException, IOException;        
        
    public String getInputEncoding();
    public void defineEntityReplacementText( String entityName,
     String replacementText ) throws XmlPullParserException;
    public int getNamespaceCount(int depth) 
     throws XmlPullParserException;
     
   public String getNamespacePrefix(int position) throws XmlPullParserException;
   public String getNamespaceUri(int position) throws XmlPullParserException;
   public String getNamespace(String prefix);
   public int    getDepth();
   public String getPositionDescription();
   public int    getLineNumber();
   public int    getColumnNumber();

   // Text methods
   public boolean isWhitespace() throws XmlPullParserException;
   public String  getText();
   public char[]  getTextCharacters(int[] holderForStartAndLength);

    // Tag methods
    public String  getNamespace();
    public String  getName();
    public String  getPrefix();
    public boolean isEmptyElementTag() throws XmlPullParserException;

    // Attribute methods
    public int     getAttributeCount();
    public String  getAttributeNamespace(int index);
    public String  getAttributeName(int index);
    public String  getAttributePrefix(int index);
    public String  getAttributeType(int index);
    public boolean isAttributeDefault(int index);
    public String  getAttributeValue(int index);
    public String  getAttributeValue(String namespace, String name);
}

XmlPullParserException

package org.xmlpull.v1;

public class XmlPullParserException extends Exception {

    public XmlPullParserException(String message);
    public XmlPullParserException(String message, Throwable throwable);
    public XmlPullParserException(String message, int row, int column);
    public XmlPullParserException(String message, XmlPullParser parser, Throwable chain);

    public Throwable getDetail();
    public void printStackTrace();

}

XmlSerializer

package org.xmlpull.v1;

public interface XmlSerializer {

  public void setFeature(String name, boolean state)
   throws IllegalArgumentException, IllegalStateException;
  public boolean getFeature(String name);
  public void setProperty(String name, Object value)
   throws IllegalArgumentException, IllegalStateException;
  public Object getProperty(String name);

  public void setOutput(OutputStream out, String encoding)
   throws IOException, IllegalArgumentException, IllegalStateException;
  public void setOutput(Writer out)
   throws IOException, IllegalArgumentException, IllegalStateException;

  public void startDocument(String encoding, Boolean standalone)
   throws IOException, IllegalArgumentException, IllegalStateException;
  public void endDocument()
   throws IOException, IllegalArgumentException, IllegalStateException;
  public void setPrefix(String prefix, String namespace)
   throws IOException, IllegalArgumentException, IllegalStateException;
  public String getPrefix(String namespace, boolean generatePrefix)
   throws IllegalArgumentException;
  public int getDepth();
  public String getNamespace();
  public String getName();

  public XmlSerializer startTag(String namespace, String name)
   throws IOException, IllegalArgumentException, IllegalStateException;
  public XmlSerializer attribute(String namespace, String name, String value)
   throws IOException, IllegalArgumentException, IllegalStateException;
  public XmlSerializer endTag(String namespace, String name)
   throws IOException, IllegalArgumentException, IllegalStateException;
  public XmlSerializer text(String text)
   throws IOException, IllegalArgumentException, IllegalStateException;
  public XmlSerializer text(char [] buf, int start, int len)
   throws IOException, IllegalArgumentException, IllegalStateException;

  public void cdsect(String text)
   throws IOException, IllegalArgumentException, IllegalStateException;
  public void entityRef(String text)  throws IOException,
        IllegalArgumentException, IllegalStateException;
  public void processingInstruction(String text)
   throws IOException, IllegalArgumentException, IllegalStateException;
  public void comment(String text)
   throws IOException, IllegalArgumentException, IllegalStateException;
  public void docdecl(String text)
   throws IOException, IllegalArgumentException, IllegalStateException;
  public void ignorableWhitespace(String text)
   throws IOException, IllegalArgumentException, IllegalStateException;

  public void flush() throws IOException;

}

Serializer Example: Convert RDDL to XHTML


Example: RDDLStripper

import org.xmlpull.v1.*;
import java.net.*;
import java.io.*;

 
public class RDDLStripper {
    
  public final static String RDDL_NS = "http://www.rddl.org/";

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java RDDLStripper url" );
      return;    
    }
        
    try {
      XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
      factory.setNamespaceAware(true);
      XmlPullParser parser = factory.newPullParser();
      XmlSerializer serializer = factory.newSerializer();
      serializer.setOutput(System.out, "ISO-8859-1");
      
      InputStream in;
      try {
        URL u = new URL(args[0]);
        in = u.openStream();
      }
      catch (MalformedURLException ex) {
          // Maybe it's a file name
          in = new FileInputStream(args[0]);
      }
      parser.setInput(in, null);
        
      while (true) {
         int event = parser.nextToken();
         if (event == XmlPullParser.START_TAG) {
             String namespaceURI = parser.getNamespace();
             if (!namespaceURI.equals(RDDL_NS)) {
                 String prefix = parser.getPrefix();
                 if (prefix == null) prefix = "";
                 if (namespaceURI != null) {
                     serializer.setPrefix(prefix, namespaceURI);
                 }
                 serializer.startTag(namespaceURI, parser.getName());
                 // add attributes
                 for (int i = 0; i < parser.getAttributeCount(); i++) {
                     serializer.attribute(
                       parser.getAttributeNamespace(i),
                       parser.getAttributeName(i),
                       parser.getAttributeValue(i)
                     );
                     // How to define attribute prefixes????
                 }
             }
         }
         else if (event == XmlPullParser.END_TAG) {
             String namespaceURI = parser.getNamespace();
             if (!namespaceURI.equals(RDDL_NS)) {
                 serializer.endTag(namespaceURI, parser.getName());
             }
         }
         else if (event == XmlPullParser.TEXT) {
             serializer.text(parser.getText());
         }
         else if (event == XmlPullParser.CDSECT) {
             serializer.cdsect(parser.getText());
         }
         else if (event == XmlPullParser.COMMENT) {
             serializer.comment(parser.getText());
         }
         else if (event == XmlPullParser.DOCDECL) {
             serializer.docdecl(parser.getText());
         }
         else if (event == XmlPullParser.ENTITY_REF) {
             serializer.entityRef(parser.getName());
         }
         else if (event == XmlPullParser.IGNORABLE_WHITESPACE) {
             serializer.ignorableWhitespace(parser.getText());
         }
         else if (event == XmlPullParser.PROCESSING_INSTRUCTION) {
             serializer.processingInstruction(parser.getText());
         }
         else if (event == XmlPullParser.END_DOCUMENT) {
            serializer.flush();
            break;
         }
      }           
    }
    catch (XmlPullParserException ex) {
       System.out.println(ex);  
    }
    catch (IOException e) {
      System.out.println("IOException while parsing " + args[0]);   
    }
        
  }

}

One of my favorite features


Java Issues


XML Issues


NekoPull


XMLEvent


XMLEvent Subclasses

NekoPull Class Hierarchy diagram

Parsing Documents


Simple Wellformedness Checker

import org.apache.xerces.xni.parser.XMLInputSource;
import org.apache.xerces.xni.XNIException;
import org.cyberneko.pull.*;
import org.cyberneko.pull.event.*;
import org.cyberneko.pull.parsers.Xerces2;
import java.io.IOException;

 
public class NekoChecker {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java NekoChecker url" );
      return;   
    }
        
    try {
      XMLPullParser parser = new Xerces2();
      XMLInputSource source = new XMLInputSource(null, args[0], null);
      parser.setInputSource(source);
        
      // read entire document
      while (parser.nextEvent() != null) ;
            
      // If we get here there are no exceptions
      System.out.println(args[0] + " is well-formed");      
    }
    catch (XNIException ex) {
       System.out.println(args[0] + " is not well-formed"); 
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println(args[0] 
       + " could not be checked due to an " 
       + ex.getClass().getName());   
      ex.printStackTrace();      
    }
        
  }

}

Listening to Events

import org.apache.xerces.xni.parser.XMLInputSource;
import org.apache.xerces.xni.XNIException;
import org.cyberneko.pull.*;
import org.cyberneko.pull.event.*;
import org.cyberneko.pull.parsers.Xerces2;
import java.io.IOException;

 
public class NekoLister {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java NekoLister url" );
      return;   
    }
        
    try {
      XMLPullParser parser = new Xerces2();
      XMLInputSource source = new XMLInputSource(null, args[0], null);
      parser.setInputSource(source);
        
      XMLEvent event;
      while ((event = parser.nextEvent()) != null) {
        switch (event.type) {
          case XMLEvent.ELEMENT: 
            System.out.println("Element");
            break;
          case XMLEvent.DOCUMENT: 
            System.out.println("Document");
            break;
          case XMLEvent.CHARACTERS: 
            System.out.println("Characters");
            break;
          case XMLEvent.PREFIX_MAPPING: 
            System.out.println("Prefix mapping");
            break;
          case XMLEvent.GENERAL_ENTITY: 
            System.out.println("General Entity");
            break;
          case XMLEvent.PROCESSING_INSTRUCTION: 
            System.out.println("Processing instruction");
            break;
          case XMLEvent.CDATA: 
            System.out.println("CDATA section");
            break;
          case XMLEvent.TEXT_DECL: 
            System.out.println("Text declaration");
            break;
          case XMLEvent.DOCTYPE_DECL: 
            System.out.println("Document type declaration");
            break;
          default:
            System.out.println("Unexpected event");
        } 
      }
    }
    catch (XNIException ex) {
       System.out.println(args[0] + " is not well-formed"); 
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println(args[0] + " could not be checked due to an " + ex.getClass().getName());   
      ex.printStackTrace();      
    }
        
  }

}

BoundedEvent

package org.cyberneko.pull.event;

public abstract class BoundedEvent extends XMLEvent {

    public boolean start;

    protected BoundedEvent(short type);

} 

ElementEvent

package org.cyberneko.pull.event;

public class ElementEvent extends BoundedEvent {

    public QName element;
    public XMLAttributes attributes;
    public boolean empty;

    public ElementEvent();

} 

QName class

package org.apache.xerces.xni;

public class QName implements Cloneable {

    public String prefix;
    public String localpart;
    public String rawname;
    public String uri;

    public QName();
    public QName(String prefix, String localpart, String rawname, String uri);
    public QName(QName qname);
    
    public void setValues(QName qname);
    public void setValues(String prefix, String localpart, String rawname, String uri);
    public void clear();
    
    public Object  clone();
    public int     hashCode();
    public boolean equals(Object object);
    public String  toString();

}

CharactersEvent

package org.cyberneko.pull.event;

public class CharactersEvent extends XMLEvent {

    public XMLString text;
    public boolean ignorable;

    public CharactersEvent();

}

NekoRSSLister

import org.apache.xerces.xni.parser.XMLInputSource;
import org.apache.xerces.xni.XNIException;
import org.cyberneko.pull.*;
import org.cyberneko.pull.event.*;
import org.cyberneko.pull.parsers.Xerces2;
import java.io.IOException;

 
public class NekoRSSLister {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java NekoRSSLister url");
      return;   
    }
        
    try {
      XMLPullParser parser = new Xerces2();
      XMLInputSource source = new XMLInputSource(null, args[0], null);
      parser.setInputSource(source);
        
      XMLEvent event;
      boolean inTitle = false;
      while ((event = parser.nextEvent()) != null) {
        switch (event.type) {
          case XMLEvent.ELEMENT: 
            ElementEvent element = (ElementEvent) event;
            // The QName field of ElementEvent is named "element"
            String name = element.element.localpart;
            if (name.equals("title") && element.element.uri == null) {
                if (element.start) inTitle = true;
                else inTitle = false;
            }
            break;
          case XMLEvent.CHARACTERS: 
            if (inTitle) {
              CharactersEvent text = (CharactersEvent) event;
              System.out.println(text.text);
            }
            break;
          case XMLEvent.CDATA: 
            if (inTitle) {
              CDATAEvent text = (CDATAEvent) event;
              System.out.println(text.text);
            }
            break;
          default:
            // do nothing
        } 
      }
    }
    catch (XNIException ex) {
       System.out.println(args[0] + " is not well-formed"); 
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println(args[0] + " could not be checked due to an " 
       + ex.getClass().getName());  
      ex.printStackTrace();      
    }
        
  }

}

Attributes

package org.apache.xerces.xni;

public interface XMLAttributes {

  public int     getLength();
  public int     getIndex(String qualifiedName);
  public int     getIndex(String uri, String localPart);
  public void    setName(int index, QName name);
  public void    getName(int index, QName name);
  public String  getPrefix(int index);
  public String  getURI(int index);
  public String  getLocalName(int index);
  public String  getQName(int index);
  
  public void    setValue(int index, String value);
  public String  getValue(int index);
  public String  getValue(String qualifiedName);
  public String  getValue(String uri, String localName);
  public void    setNonNormalizedValue(int index, String value);
  public String  getNonNormalizedValue(int index); 
  
  public void    setType(int index, String type);
  public String  getType(int index);
  public String  getType(String qualifiedName);
  public String  getType(String uri, String localName);
  public void    setSpecified(int index, boolean specified);
  public boolean isSpecified(int index);
  
  public int  addAttribute(QName name, String type, String value);
  public void removeAllAttributes();
  public void removeAttributeAt(int index);  
  
  public Augmentations getAugmentations (int attributeIndex);
  public Augmentations getAugmentations (String uri, String localPart);
  public Augmentations getAugmentations(String qualifiedName);

}

NekoSpider

import org.apache.xerces.xni.*;
import org.apache.xerces.xni.parser.XMLInputSource;
import org.cyberneko.pull.*;
import org.cyberneko.pull.event.*;
import org.cyberneko.pull.parsers.Xerces2;
import java.net.*;
import java.io.*;
import java.util.*;

public class NekoSpider {

  // Need to keep track of where we've been 
  // so we don't get stuck in an infinite loop
  private List spideredURIs = new Vector();

  // This linked list keeps track of where we're going.
  // Although the LinkedList class does not guarantee queue-like
  // access, I always access it in a first-in/first-out fashion.
  private LinkedList queue = new LinkedList();
  
  private URL currentURL;
  private XMLPullParser parser;
  
  public NekoSpider() {
      this.parser = new Xerces2();
  }

  private void processStartTag(ElementEvent element) {
    
    XMLAttributes attributes = element.attributes;
    String type = attributes.getValue("http://www.w3.org/1999/xlink", "type");
    if (type != null) {
      String href = attributes.getValue("http://www.w3.org/1999/xlink", "href");
      if (href != null) {
        try {
          URL foundURL = new URL(currentURL, href);
          if (!spideredURIs.contains(foundURL)) {
            queue.addFirst(foundURL);
          }
        }
        catch (MalformedURLException ex) {
          // skip it   
        }
      }
    }
  }
  
  public void spider(URL uri) {
      
    System.out.println("Spidering " + uri);
    try {
      XMLInputSource source 
       = new XMLInputSource(null, uri.toExternalForm(), null);
      parser.setInputSource(source);
      spideredURIs.add(uri);
      
      XMLEvent event;
      while ((event = parser.nextEvent()) != null) {
         if (event.type == XMLEvent.ELEMENT) {
             ElementEvent element = (ElementEvent) event;
             if (element.start) processStartTag(element);
         }
       }  // end while
      
       while (!queue.isEmpty()) {
         URL nextURL = (URL) queue.removeLast();
         spider(nextURL);
       }
      
    }
    catch (Exception ex) {
       // skip this document
    }
    
  }

  public static void main(String[] args) throws Exception {
        
    if (args.length == 0) {
      System.err.println("Usage: java NekoSpider url" );
       return;  
    }
        
    NekoSpider spider = new NekoSpider();
    spider.spider(new URL(args[0]));
        
  } // end main

} // end NekoSpider


DocumentEvent

package org.cyberneko.pull.event;

public class DocumentEvent extends BoundedEvent {

    public XMLLocator locator;
    public String encoding;

    public DocumentEvent();

}

ProcessingInstructionEvent

package org.cyberneko.pull.event;

public class ProcessingInstructionEvent extends XMLEvent {

    public String target;
    public XMLString data;

    public ProcessingInstructionEvent();

}

NekoPILister

import org.apache.xerces.xni.parser.XMLInputSource;
import org.apache.xerces.xni.XNIException;
import org.cyberneko.pull.*;
import org.cyberneko.pull.event.*;
import org.cyberneko.pull.parsers.Xerces2;
import java.io.IOException;

 
public class NekoPILister {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java NekoPILister url" );
      return;   
    }
        
    try {
      XMLPullParser parser = new Xerces2();
      XMLInputSource source = new XMLInputSource(null, args[0], null);
      parser.setInputSource(source);
        
      XMLEvent event;
      while ((event = parser.nextEvent()) != null) {
        if (event.type == XMLEvent.PROCESSING_INSTRUCTION) { 
            ProcessingInstructionEvent instruction 
             = (ProcessingInstructionEvent) event;
            System.out.println("Target: " + instruction.target);
            System.out.println("Data:   " + instruction.data);
            System.out.println();
        }
      }
    }
    catch (XNIException ex) {
       System.out.println(args[0] + " is not well-formed"); 
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println(args[0] + " could not be checked due to an " 
       + ex.getClass().getName());  
      ex.printStackTrace();      
    }
        
  }

}

CommentEvent

package org.cyberneko.pull.event;

public class CommentEvent extends XMLEvent {

    public XMLString text;

    public CommentEvent();

} // class CommentEvent

NekoCommentReader

import org.apache.xerces.xni.parser.XMLInputSource;
import org.apache.xerces.xni.XNIException;
import org.cyberneko.pull.*;
import org.cyberneko.pull.event.*;
import org.cyberneko.pull.parsers.Xerces2;
import java.io.IOException;

 
public class NekoCommentReader {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java NekoCommentReader url" );
      return;   
    }
        
    try {
      XMLPullParser parser = new Xerces2();
      XMLInputSource source = new XMLInputSource(null, args[0], null);
      parser.setInputSource(source);
        
      XMLEvent event;
      while ((event = parser.nextEvent()) != null) {
        if (event.type == XMLEvent.COMMENT) { 
            CommentEvent comment = (CommentEvent) event;
            System.out.println(comment.text);
        }
      }
    }
    catch (XNIException ex) {
       System.out.println(args[0] + " is not well-formed"); 
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println(args[0] + " could not be checked due to an " 
       + ex.getClass().getName());   
      ex.printStackTrace();      
    }
        
  }

}

TextDeclEvent

package org.cyberneko.pull.event;

public class TextDeclEvent extends XMLEvent {

    public boolean xmldecl;
    public String  version;
    public String  encoding;
    public String  standalone;

    public TextDeclEvent();

}

PrefixMappingEvent

package org.cyberneko.pull.event;

public class PrefixMappingEvent extends BoundedEvent {

    public String prefix;
    public String uri;

    public PrefixMappingEvent();

} 

PrefixLister

import org.apache.xerces.xni.parser.XMLInputSource;
import org.apache.xerces.xni.XNIException;
import org.cyberneko.pull.*;
import org.cyberneko.pull.event.*;
import org.cyberneko.pull.parsers.Xerces2;
import java.io.IOException;

 
public class PrefixLister {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java PrefixLister url" );
      return;   
    }
        
    try {
      XMLPullParser parser = new Xerces2();
      XMLInputSource source = new XMLInputSource(null, args[0], null);
      parser.setInputSource(source);
        
      XMLEvent event;
      while ((event = parser.nextEvent()) != null) {
        if (event.type == XMLEvent.PREFIX_MAPPING) { 
            PrefixMappingEvent mapping = (PrefixMappingEvent) event;
            System.out.println("Prefix: " + mapping.prefix);
            System.out.println("URI:    " + mapping.uri);
            System.out.println();
        }
      }
    }
    catch (XNIException ex) {
       System.out.println(args[0] + " is not well-formed"); 
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println(args[0] + " could not be checked due to an " 
       + ex.getClass().getName());  
      ex.printStackTrace();      
    }
        
  }

}

GeneralEntityEvent

package org.cyberneko.pull.event;

public class GeneralEntityEvent extends BoundedEvent {

  public String name;
  public String publicId;
  public String baseSystemId;
  public String literalSystemId;
  public String expandedSystemId;
  public String encoding;

  public GeneralEntityEvent();

}

EntityLister

import org.apache.xerces.xni.parser.XMLInputSource;
import org.apache.xerces.xni.XNIException;
import org.cyberneko.pull.*;
import org.cyberneko.pull.event.*;
import org.cyberneko.pull.parsers.Xerces2;
import java.io.IOException;

 
public class EntityLister {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java EntityLister url" );
      return;   
    }
        
    try {
      XMLPullParser parser = new Xerces2();
      XMLInputSource source = new XMLInputSource(null, args[0], null);
      parser.setInputSource(source);
        
      XMLEvent event;
      while ((event = parser.nextEvent()) != null) {
        if (event.type == XMLEvent.GENERAL_ENTITY) { 
            GeneralEntityEvent entity = (GeneralEntityEvent) event;
            if (entity.start) {
              System.out.println("Name:               " + entity.name);
              System.out.println("Public ID:          " + entity.publicId);
              System.out.println("Base System ID:     " + entity.baseSystemId);
              System.out.println("Literal System ID:  " + entity.literalSystemId);
              System.out.println("Expanded System ID: " + entity.expandedSystemId);
              System.out.println("Encoding:           " + entity.encoding);
              System.out.println();
           }
        }
      }
    }
    catch (XNIException ex) {
       System.out.println(args[0] + " is not well-formed"); 
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println(args[0] + " could not be checked due to an " 
       + ex.getClass().getName());  
      ex.printStackTrace();      
    }
        
  }

}

XMLPullParser

package org.cyberneko.pull;

public interface XMLPullParser 
  extends XMLEventIterator, XMLComponentManager {

    public void setInputSource(XMLInputSource inputSource)
      throws XMLConfigurationException, IOException;
    public void cleanup();
    
    public void setErrorHandler(XMLErrorHandler errorHandler);
    public XMLErrorHandler getErrorHandler();

    public void setEntityResolver(XMLEntityResolver entityResolver);
    public XMLEntityResolver getEntityResolver();

    public void setLocale(Locale locale) throws XNIException;
    public Locale getLocale();

    public boolean getFeature(String featureId)
      throws XMLConfigurationException;
    public void setFeature(String featureId, boolean state)
      throws XMLConfigurationException;
    public void setProperty(String propertyId, Object value)
      throws XMLConfigurationException;
    public Object getProperty(String propertyId)
      throws XMLConfigurationException;

    public XMLEvent nextEvent() throws XNIException, IOException;
    
}

StAX


To Learn More


Part VII: XOM


To Learn More



Part VIII: TrAX


What is TrAX


TrAX Classes

There are four main classes and interfaces in TrAX, all in the javax.xml.transform package:

Transformer

The class that represents the style sheet. It transforms a Source into a Result.

TransformerFactory

A factory class that reads a stylesheet to produce a new Transformer.

Source

The interface that represents the input XML document to be transformed, whether presented as a DOM tree, an InputStream, or a SAX event sequence.

Result

The interface that represents the XML document produced by the transformation, whether generated as a DOM tree, an OutputStream, or a SAX event sequence.


The Process of a TrAX Transformation

  1. Load the TransformerFactory with the static TransformerFactory.newInstance() factory method.

  2. Form a Source object from the XSLT stylesheet.

  3. Pass this Source object to the factory’s newTransformer() factory method to build a Transformer object.

  4. Build a Source object from the input XML document you wish to transform.

  5. Build a Result object for the target of the transformation.

  6. Pass both the source and the result to the Transformer object’s transform() method.

Steps four through six can be repeated for as many different input documents as you want. You can reuse the same Transformer object repeatedly in series, though you can’t use it in multiple threads in parallel.


TrAX Example

try {
  TransformerFactory xformFactory = TransformerFactory.newInstance();
  Source xsl = new StreamSource("stylesheet.xsl");
  Transformer stylesheet = xformFactory.newTransformer(xsl);

  Source request  = new StreamSource(in);
  Result response = new StreamResult(out);
  stylesheet.transform(request, response);
}
catch (TransformerException e) {
  System.err.println(e); 
}
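
The fragment above assumes that the streams in and out already exist. As a minimal self-contained sketch of the same six steps, the following class keeps both the stylesheet and the input document in memory. The class name InMemoryTransform and the one-template stylesheet are made up for illustration; only the javax.xml.transform calls come from TrAX itself.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class InMemoryTransform {

  // Hypothetical stylesheet: writes the text of SONG/TITLE as plain text
  static final String XSL =
      "<xsl:stylesheet version='1.0'"
    + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:output method='text'/>"
    + "<xsl:template match='/'>"
    + "<xsl:value-of select='SONG/TITLE'/>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  public static String transform(String xml) throws TransformerException {
    // Step 1: load the factory
    TransformerFactory xformFactory = TransformerFactory.newInstance();
    // Step 2: form a Source from the stylesheet
    Source xsl = new StreamSource(new StringReader(XSL));
    // Step 3: build the Transformer
    Transformer stylesheet = xformFactory.newTransformer(xsl);
    // Step 4: build a Source from the input document
    Source request = new StreamSource(new StringReader(xml));
    // Step 5: build a Result for the output
    StringWriter out = new StringWriter();
    Result response = new StreamResult(out);
    // Step 6: run the transformation
    stylesheet.transform(request, response);
    return out.toString();
  }

  public static void main(String[] args) throws TransformerException {
    System.out.println(transform("<SONG><TITLE>Hot Cop</TITLE></SONG>"));
  }

}
```

Because everything is read from strings, this runs against whatever XSLT processor the JDK supplies by default, with no files on disk.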

Thread Safety

  TransformerFactory xformFactory = TransformerFactory.newInstance();
  Source xsl = new StreamSource("stylesheet.xsl");
  Templates templates = xformFactory.newTemplates(xsl);
  ...
while (true) {
  InputStream  in   = getNextDocument();
  OutputStream out  = getNextTarget();
  Source request    = new StreamSource(in);
  Result response   = new StreamResult(out);
  Transformer transformer = templates.newTransformer();
  transformer.transform(request, response);
}
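
The loop above still runs in a single thread. As a sketch of what the Templates object makes possible, the following class shares one Templates instance across a small thread pool and gives each task its own Transformer. The class name TemplatesPool, the item-counting stylesheet, and the pool size of four are all assumptions chosen for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TemplatesPool {

  // Hypothetical stylesheet: prints the number of item elements
  static final String XSL =
      "<xsl:stylesheet version='1.0'"
    + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:output method='text'/>"
    + "<xsl:template match='/'>"
    + "<xsl:value-of select='count(//item)'/>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  public static List<String> countItems(List<String> documents)
   throws Exception {

    TransformerFactory xformFactory = TransformerFactory.newInstance();
    // The stylesheet is compiled once; the Templates object is shared
    Templates templates
     = xformFactory.newTemplates(new StreamSource(new StringReader(XSL)));

    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Future<String>> pending = new ArrayList<>();
      for (String document : documents) {
        pending.add(pool.submit(() -> {
          // Each task gets its own Transformer; these are not shared
          Transformer transformer = templates.newTransformer();
          StringWriter out = new StringWriter();
          transformer.transform(
           new StreamSource(new StringReader(document)),
           new StreamResult(out));
          return out.toString();
        }));
      }
      // Collect the results in the same order as the inputs
      List<String> results = new ArrayList<>();
      for (Future<String> result : pending) {
        results.add(result.get());
      }
      return results;
    }
    finally {
      pool.shutdown();
    }
  }

}
```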

Locating Transformers


The xml-stylesheet processing instruction

public abstract Source getAssociatedStylesheet(Source xmlDocument, String media, String title, String charset) throws TransformerConfigurationException;

// The InputStream in contains the XML document to be transformed
try {
  Source inputDocument = new StreamSource(in);
  TransformerFactory xformFactory = TransformerFactory.newInstance();
  Source xsl = xformFactory.getAssociatedStylesheet(inputDocument, "print", null, null);
  Transformer stylesheet = xformFactory.newTransformer(xsl);

  Result outputDocument = new StreamResult(out);
  stylesheet.transform(inputDocument, outputDocument);
}
catch (TransformerConfigurationException e) {
  System.err.println("Problem with the xml-stylesheet processing instruction"); 
}
catch (TransformerException e) {
  System.err.println("Problem with the stylesheet"); 
}

Features


Features Example

import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import javax.xml.transform.sax.*;


public class TrAXFeatureTester {

  public static void main(String[] args) {
  
    TransformerFactory xformFactory = TransformerFactory.newInstance();
      
    String name = xformFactory.getClass().getName();

    if (xformFactory.getFeature(DOMResult.FEATURE)) {
      System.out.println(name + " supports DOM output."); 
    }
    else {
      System.out.println(name + " does not support DOM output."); 
    }
    if (xformFactory.getFeature(DOMSource.FEATURE)) {
      System.out.println(name + " supports DOM input."); 
    }
    else {
      System.out.println(name + " does not support DOM input."); 
    }
    
    if (xformFactory.getFeature(SAXResult.FEATURE)) {
      System.out.println(name + " supports SAX output."); 
    }
    else {
      System.out.println(name + " does not support SAX output."); 
    }
    if (xformFactory.getFeature(SAXSource.FEATURE)) {
      System.out.println(name + " supports SAX input."); 
    }
    else {
      System.out.println(name + " does not support SAX input."); 
    }
    
    if (xformFactory.getFeature(StreamResult.FEATURE)) {
      System.out.println(name + " supports stream output."); 
    }
    else {
      System.out.println(name + " does not support stream output."); 
    }
    if (xformFactory.getFeature(StreamSource.FEATURE)) {
      System.out.println(name + " supports stream input."); 
    }
    else {
      System.out.println(name + " does not support stream input."); 
    }
    
    if (xformFactory.getFeature(SAXTransformerFactory.FEATURE)) {
      System.out.println(name + " returns SAXTransformerFactory "
       + "objects from TransformerFactory.newInstance()."); 
    }
    else {
      System.out.println(name 
       + " does not use SAXTransformerFactory."); 
    }
    if (xformFactory.getFeature(SAXTransformerFactory.FEATURE_XMLFILTER)) {
      System.out.println( 
       name + " supports the newXMLFilter() methods."); 
    }
    else {
      System.out.println( 
       name + " does not support the newXMLFilter() methods."); 
    }
  
  }

}

Feature Tester Output

Here are the results of running this program against Saxon 6.5.1:

C:\XMLJAVA>java -Djavax.xml.transform.TransformerFactory=com.icl.saxon.TransformerFactoryImpl TrAXFeatureTester
com.icl.saxon.TransformerFactoryImpl supports DOM output.
com.icl.saxon.TransformerFactoryImpl supports DOM input.
com.icl.saxon.TransformerFactoryImpl supports SAX output.
com.icl.saxon.TransformerFactoryImpl supports SAX input.
com.icl.saxon.TransformerFactoryImpl supports stream output.
com.icl.saxon.TransformerFactoryImpl supports stream input.
com.icl.saxon.TransformerFactoryImpl returns 
 SAXTransformerFactory objects from 
 TransformerFactory.newInstance().
com.icl.saxon.TransformerFactoryImpl supports the newXMLFilter() 
 methods.

XSLT Processor Attributes

Some XSLT processors provide non-standard, custom attributes that control their behavior. Like features, these are also named via URIs. For example, Xalan-J 2.3 defines these three attributes:

http://apache.org/xalan/features/optimize

By default, Xalan rewrites stylesheets in an attempt to optimize them (similar to the behavior of an optimizing compiler for Java or other languages). This can confuse tools that need direct access to the stylesheet such as XSLT profilers and debuggers. If you’re using such a tool with Xalan, you should set this attribute to false.

http://apache.org/xalan/features/incremental

Setting this feature to true allows Xalan to begin producing output before it has finished processing the entire input document. This may cause problems if an error is detected late in the process, but it shouldn’t be a big problem in fully debugged and tested environments.

http://apache.org/xalan/features/source_location

Setting this to true tells Xalan to provide a JAXP SourceLocator a program can use to determine the location (line numbers, column numbers, system IDs, and public IDs) of individual nodes during the transform. However, it engenders a substantial performance hit so it’s turned off by default.

Other processors define their own attributes. Although TrAX is designed as a generic API, it does let you access such custom features with these two methods:

public abstract void setAttribute(String name, Object value) throws IllegalArgumentException;
public abstract Object getAttribute(String name) throws IllegalArgumentException;

For example, this code tries to turn on incremental output:

TransformerFactory xformFactory 
 = TransformerFactory.newInstance();
try {
  xformFactory.setAttribute(
   "http://apache.org/xalan/features/incremental", Boolean.TRUE);
}
catch (IllegalArgumentException e) { 
  // This XSLT processor does not support the
  // http://apache.org/xalan/features/incremental attribute,
  // but we can still use the processor anyway
}

URI Resolution

package javax.xml.transform;

public interface URIResolver {

  public Source resolve(String href, String base) 
   throws TransformerException;
   
}

A URIResolver class

import javax.xml.transform.*;
import javax.xml.transform.stream.StreamSource;
import java.util.zip.GZIPInputStream;
import java.net.URL;
import java.io.InputStream;


public class GZipURIResolver implements URIResolver {

  public Source resolve(String href, String base) {
   
    try {
      href = href + ".gz";
      URL context = new URL(base);
      URL u = new URL(context, href); 
      InputStream in = u.openStream();
      GZIPInputStream gin = new GZIPInputStream(in);
      return new StreamSource(gin, u.toString());
    }
    catch (Exception e) {
      // If anything goes wrong, just return null and let
      // the default resolver try.
    }
    return null;
  }

}

The following two methods in TransformerFactory set and get the URIResolver that Transformer objects created by this factory will use to resolve URIs:

public abstract void setURIResolver(URIResolver resolver);
public abstract URIResolver getURIResolver();

For example,

URIResolver resolver = new GZipURIResolver();
factory.setURIResolver(resolver);

Error Handling

XSLT transformations can fail for any of several reasons.

By default, any such problems are reported by printing them on System.err. However, you can provide more sophisticated error handling, reporting, and logging by implementing the ErrorListener interface.

package javax.xml.transform;

public interface ErrorListener {

  public void warning(TransformerException exception)
   throws TransformerException;
  public void error(TransformerException exception)
   throws TransformerException;
  public void fatalError(TransformerException exception)
   throws TransformerException;
     
}

ErrorListener Example

import javax.xml.transform.*;
import java.util.logging.*;


public class LoggingErrorListener implements ErrorListener {

  private Logger logger;
  
  public LoggingErrorListener(Logger logger) {
    this.logger = logger;
  }
  
  public void warning(TransformerException exception) {
   
    logger.log(Level.WARNING, exception.getMessage(), exception);
   
    // Don't throw an exception and stop the processor
    // just for a warning; but do log the problem
  }
  
  public void error(TransformerException exception)
   throws TransformerException {
    
    logger.log(Level.SEVERE, exception.getMessage(), exception);
    // XSLT is not as draconian as XML. There are numerous errors
    // which the processor may but does not have to recover from; 
    // e.g. multiple templates that match a node with the same
    // priority. I do not want to allow that so I throw this 
    // exception here.
    throw exception;
    
  }
  
  public void fatalError(TransformerException exception)
   throws TransformerException {
    
    logger.log(Level.SEVERE, exception.getMessage(), exception);

    // This is an error which the processor cannot recover from; 
    // e.g. a malformed stylesheet or input document
    // so I must throw this exception here.
    throw exception;
    
  }
     
}

The following two methods appear in both TransformerFactory and Transformer. They enable you to set and get the ErrorListener that the object will report problems to:

public abstract void setErrorListener(ErrorListener listener)
    throws IllegalArgumentException;

public abstract ErrorListener getErrorListener();

An ErrorListener registered with a Transformer will report errors with the transformation. An ErrorListener registered with a TransformerFactory will report errors with the factory’s attempts to create new Transformer objects. For example, this code fragment installs separate LoggingErrorListeners on the TransformerFactory and the Transformer object it creates that will record messages in two different logs.

TransformerFactory factory = TransformerFactory.newInstance();
Logger factoryLogger 
 = Logger.getLogger("com.macfaq.trax.factory");
ErrorListener factoryListener 
 = new LoggingErrorListener(factoryLogger);
factory.setErrorListener(factoryListener);
Source source = new StreamSource("FibonacciXMLRPC.xsl");
Transformer stylesheet = factory.newTransformer(source);
Logger transformerLogger 
 = Logger.getLogger("com.macfaq.trax.transformer");
ErrorListener transformerListener 
 = new LoggingErrorListener(transformerLogger);
stylesheet.setErrorListener(transformerListener);

Passing Parameters to Style Sheets

Top-level xsl:param and xsl:variable elements both define variables by binding a name to a value. The variable can be dereferenced elsewhere in the stylesheet using the form $name. Once set, the value of an XSLT variable is fixed and cannot be changed. However, if the variable is defined with a top-level xsl:param element instead of an xsl:variable element, then the default value can be changed before the transformation begins.

For example, the DocBook XSL stylesheets have a number of parameters that set various formatting options. I use these settings:

  <xsl:param name="fop.extensions">1</xsl:param>
  <xsl:param name="page.width.portrait">7.375in</xsl:param>
  <xsl:param name="page.height.portrait">9.25in</xsl:param>

  <xsl:param name="page.margin.top">0.5in</xsl:param>
  <xsl:param name="page.margin.bottom">0.5in</xsl:param>
  <xsl:param name="region.before.extent">0.5in</xsl:param>
  <xsl:param name="body.margin.top">0.5in</xsl:param>

  <xsl:param name="page.margin.outer">1.0in</xsl:param>
  <xsl:param name="page.margin.inner">1.0in</xsl:param>
  <xsl:param name="body.font.family">Times</xsl:param>
  <xsl:param name="variablelist.as.blocks" select="1"/>

  <xsl:param name="generate.section.toc.level" select="1"/>
  <xsl:param name="generate.component.toc" select="0"/>

The initial (and thus final) value of any parameter can be changed inside your Java code using these three methods of the Transformer class:

public abstract void setParameter(String name, Object value);
public abstract Object getParameter(String name);
public abstract void clearParameters();

The setParameter() method provides a value for a parameter that overrides any value used in the stylesheet itself. The processor is responsible for converting the Java object type passed to a reasonable XSLT equivalent. This should work well enough for String, Integer, Double, and Boolean as well as DOM types like Node and NodeList. However, I wouldn’t rely on it for anything more complex like a File or a Frame.

The getParameter() method returns the value of a parameter previously set by Java. It will not return any value from the stylesheet itself, even if it has not been overridden by the Java code. Finally, the clearParameters() method eliminates all Java mappings of parameters so that those variables are returned to whatever value is specified in the stylesheet.

For example, in Java the above list of parameters for the DocBook stylesheets could be set with a JAXP Transformer object like this:

transformer.setParameter("fop.extensions", "1");
transformer.setParameter("page.width.portrait", "7.375in");
transformer.setParameter("page.height.portrait", "9.25in");
transformer.setParameter("page.margin.top", "0.5in");
transformer.setParameter("region.before.extent", "0.5in");
transformer.setParameter("body.margin.top", "0.5in");
transformer.setParameter("page.margin.bottom", "0.5in");
transformer.setParameter("page.margin.outer", "1.0in");
transformer.setParameter("page.margin.inner", "1.0in");
transformer.setParameter("body.font.family", "Times");
transformer.setParameter("variablelist.as.blocks", "1");
transformer.setParameter("generate.section.toc.level", "1");
transformer.setParameter("generate.component.toc", "0");

Here I used strings for all the values. However, in a few cases I could have used a Number of some kind instead.
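
To see the override and the reset in action without the DocBook stylesheets, here is a small sketch built around a one-parameter stylesheet. The class name ParameterDemo and the greeting parameter are made up for illustration. The same Transformer is run three times: with the stylesheet default, with a value set from Java, and again after clearParameters().

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ParameterDemo {

  // Hypothetical stylesheet: prints the value of the greeting parameter
  static final String XSL =
      "<xsl:stylesheet version='1.0'"
    + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:output method='text'/>"
    + "<xsl:param name='greeting'>Hello</xsl:param>"
    + "<xsl:template match='/'>"
    + "<xsl:value-of select='$greeting'/>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  private static String run(Transformer transformer)
   throws TransformerException {
    StringWriter out = new StringWriter();
    transformer.transform(new StreamSource(new StringReader("<doc/>")),
     new StreamResult(out));
    return out.toString();
  }

  public static List<String> demo() throws TransformerException {

    Transformer transformer = TransformerFactory.newInstance()
     .newTransformer(new StreamSource(new StringReader(XSL)));

    List<String> outputs = new ArrayList<>();
    outputs.add(run(transformer));   // stylesheet default

    transformer.setParameter("greeting", "Goodbye");
    outputs.add(run(transformer));   // value supplied from Java

    transformer.clearParameters();
    outputs.add(run(transformer));   // back to the stylesheet default

    return outputs;
  }

  public static void main(String[] args) throws TransformerException {
    System.out.println(demo());
  }

}
```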


Output Properties

The xsl:output instruction controls the details of serialization. For example, it can specify XML, HTML, or plain text output. It can specify the encoding of the output, what the document type declaration points to, whether the elements should be indented, what the value of the standalone declaration is, where CDATA sections should be used, and more. For example, adding this xsl:output element to a stylesheet would produce plain text output instead of XML:

<xsl:output
  method="text"
  encoding="US-ASCII"
  media-type="text/plain"
/>

This xsl:output element asks for pretty-printed XML:

<xsl:output
  method="xml"
  encoding="UTF-16"
  indent="yes"
  media-type="text/xml"
  standalone="yes"
/>

In all, there are ten attributes of the xsl:output element that control serialization of the result tree:

method="xml | html | text"

The output method. xml is the default. html uses classic HTML syntax such as <hr> instead of <hr />. text outputs plain text but no markup.

version="1.0"

The version number used in the XML declaration. Currently, this should always have the value 1.0.

encoding="UTF-8 | UTF-16 | ISO-8859-1 | …"

The encoding used for the output and in the encoding declaration of the output document.

omit-xml-declaration="yes | no"

yes if the XML declaration should be omitted, no if it should be included. The default is no.

standalone="yes | no"

The value of the standalone attribute for the XML declaration; either yes or no

doctype-public="public ID"

The public identifier used in the DOCTYPE declaration

doctype-system="URI"

The URL used as a system identifier in the DOCTYPE declaration

cdata-section-elements="element_name_1 element_name_2 …"

A white space separated list of the qualified names of the elements whose content should be output as CDATA sections

indent="yes | no"

yes if extra white space should be added to pretty-print the result, no otherwise. The default is no.

media-type="text/xml | text/html | text/plain | application/xml… "

The MIME media type of the output such as text/html, application/xml, or image/svg+xml


Controlling Output Properties from Java
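
The Transformer class has a setOutputProperty() method, and the javax.xml.transform.OutputKeys class supplies named constants for the xsl:output attributes. As a minimal sketch, the following class uses an identity transformer (one created without a stylesheet) and sets the output method and the omit-xml-declaration property from Java. The class name OutputPropertyDemo and its serialize() helper are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OutputPropertyDemo {

  public static String serialize(String xml, String method,
   boolean omitDeclaration) throws TransformerException {

    // newTransformer() with no arguments copies input to output
    Transformer identity
     = TransformerFactory.newInstance().newTransformer();

    // Equivalent to method="..." and omit-xml-declaration="..."
    // on an xsl:output element
    identity.setOutputProperty(OutputKeys.METHOD, method);
    identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION,
     omitDeclaration ? "yes" : "no");

    StringWriter out = new StringWriter();
    identity.transform(new StreamSource(new StringReader(xml)),
     new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws TransformerException {
    System.out.println(serialize("<a><b/></a>", "xml", true));
    System.out.println(serialize("<a>Hot <b>Cop</b></a>", "text", true));
  }

}
```

With the text method only the character data survives, so the second call prints the text content without any tags.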


Sources and Results

The Source and Result interfaces abstract out the API-dependent details of exactly how an XML document is represented. You can construct sources from DOM nodes, SAX event sequences, and raw streams. You can target the result of a transform at a DOM Node, a SAX ContentHandler, or a stream-based target such as an OutputStream, Writer, File, or String. Other models may also provide their own implementations of these interfaces. For instance, JDOM has an org.jdom.transform package that includes JDOMSource and JDOMResult classes.

In fact, these different models have very little in common, other than that they all hold an XML document. Consequently, the Source and Result interfaces don’t themselves provide much of the functionality you need, just methods to get and set the system ID of the document. Everything else is deferred to the implementations.


DOMSource and DOMResult

package javax.xml.transform.dom;

public class DOMSource implements Source {

  public static final String FEATURE =
    "http://javax.xml.transform.dom.DOMSource/feature";

  public DOMSource();
  public DOMSource(Node node);
  public DOMSource(Node node, String systemID);

  public void   setNode(Node node);
  public Node   getNode();
  public void   setSystemId(String baseID);
  public String getSystemId();

}

In theory, you should be able to convert any DOM Node object into a DOMSource and transform it. In practice, only transforming document nodes is truly reliable. (It’s not even clear that the XSLT processing model applies to anything that isn’t a complete document.) In my tests, Xalan-J could transform all the nodes I threw at it. However, Saxon could only transform Document objects and Element objects that were part of a document tree.
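The reliable case, transforming a complete Document, might look like the sketch below. It assumes an identity transform and builds a tiny document in memory; the class name DOMSourceDemo and the buildSong() and serialize() helpers are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DOMSourceDemo {

  // Hypothetical helper: builds <SONG><TITLE>Hot Cop</TITLE></SONG> in memory
  public static Document buildSong() throws ParserConfigurationException {
    Document doc = DocumentBuilderFactory.newInstance()
                                         .newDocumentBuilder()
                                         .newDocument();
    Element song = doc.createElement("SONG");
    Element title = doc.createElement("TITLE");
    title.appendChild(doc.createTextNode("Hot Cop"));
    song.appendChild(title);
    doc.appendChild(song);
    return doc;
  }

  // Wraps the whole Document in a DOMSource and copies it to a string
  public static String serialize(Document doc) throws TransformerException {
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter out = new StringWriter();
    transformer.transform(new DOMSource(doc), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(serialize(buildSong()));
  }
}
```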

package javax.xml.transform.dom;

public class DOMResult implements Result {

  public static final String FEATURE =
    "http://javax.xml.transform.dom.DOMResult/feature";

  public DOMResult();
  public DOMResult(Node node);
  public DOMResult(Node node, String systemID);
  
  public void setNode(Node node);
  public Node getNode();
  public void setSystemId(String systemId);
  public String getSystemId();
  
}

If you specify a Node for the result, either via the constructor or by calling setNode(), then the output of the transform will be appended to that node’s children. Otherwise, the transform output will be appended to a new Document or DocumentFragment Node. The getNode() method returns this Node.
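Both cases can be sketched roughly as follows, again with an identity transform. The class name DOMResultDemo, its two helper methods, and the wrapper, first, and item element names are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class DOMResultDemo {

  // Case 1: no node is supplied, so the processor creates one
  // and getNode() returns it after the transform.
  public static Node transformToNewNode(String xml) throws TransformerException {
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    DOMResult result = new DOMResult();
    transformer.transform(new StreamSource(new StringReader(xml)), result);
    return result.getNode();
  }

  // Case 2: a node is supplied, so the output is appended to its children.
  public static Element appendToExisting(String xml)
      throws TransformerException, ParserConfigurationException {
    Document doc = DocumentBuilderFactory.newInstance()
                                         .newDocumentBuilder()
                                         .newDocument();
    Element parent = doc.createElement("wrapper");
    doc.appendChild(parent);
    parent.appendChild(doc.createElement("first")); // already present

    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.transform(new StreamSource(new StringReader(xml)),
                          new DOMResult(parent));
    return parent;
  }

  public static void main(String[] args) throws Exception {
    Node created = transformToNewNode("<SONG><TITLE>Hot Cop</TITLE></SONG>");
    System.out.println(created.getFirstChild().getNodeName());

    Element parent = appendToExisting("<item/>");
    System.out.println(parent.getChildNodes().getLength());
  }
}
```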


SAXSource and SAXResult

package javax.xml.transform.sax;

public class SAXSource implements Source {

  public static final String FEATURE =
   "http://javax.xml.transform.sax.SAXSource/feature";

  public SAXSource();
  public SAXSource(XMLReader reader, InputSource inputSource);
  public SAXSource(InputSource inputSource);
  
  public void        setXMLReader(XMLReader reader);
  public XMLReader   getXMLReader();
  public void        setInputSource(InputSource inputSource);
  public InputSource getInputSource();
  public void        setSystemId(String systemID);
  public String      getSystemId();
  
  public static InputSource sourceToInputSource(Source source);
  
}


package javax.xml.transform.sax;

public class SAXResult implements Result {

  public static final String FEATURE =
   "http://javax.xml.transform.sax.SAXResult/feature";

  public SAXResult();
  public SAXResult(ContentHandler handler);
  
  public void           setHandler(ContentHandler handler);
  public ContentHandler getHandler();
  public void           setLexicalHandler(LexicalHandler handler);
  public LexicalHandler getLexicalHandler();
  public void           setSystemId(String systemId);
  public String         getSystemId();
  
}
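As a rough illustration of the two classes working together, the sketch below reads a document through a SAXSource and delivers the output as SAX events to a ContentHandler wrapped in a SAXResult. The class name SAXPipelineDemo and the ElementCounter handler are hypothetical, and an identity transform stands in for a real stylesheet.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SAXPipelineDemo {

  // Hypothetical handler that simply counts startElement events
  static class ElementCounter extends DefaultHandler {
    int count = 0;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
      count++;
    }
  }

  public static int countElements(String xml) throws TransformerException {
    ElementCounter counter = new ElementCounter();
    Transformer transformer = TransformerFactory.newInstance().newTransformer();

    // No XMLReader is supplied, so the processor picks one itself
    SAXSource source = new SAXSource(new InputSource(new StringReader(xml)));

    // The output of the transform arrives as callbacks on the handler
    transformer.transform(source, new SAXResult(counter));
    return counter.count;
  }

  public static void main(String[] args) throws TransformerException {
    System.out.println(countElements(
        "<SONG><TITLE>Hot Cop</TITLE><YEAR>1978</YEAR></SONG>"));
  }
}
```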

StreamSource and StreamResult

The StreamSource and StreamResult classes are used as sources and targets for transforms from sequences of bytes and characters. This includes streams, readers, writers, strings, and files. What unifies these is that none of them know they contain an XML document. Indeed, on input they may not contain an XML document at all. If a stylesheet source doesn’t, an exception will be thrown as soon as you attempt to build a Transformer or a Templates object from the StreamSource.
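A small sketch of that failure mode follows. The class name StylesheetCheckDemo and the compiles() helper are made up, and the exact error messages a processor reports along the way will vary.

```java
import java.io.StringReader;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

public class StylesheetCheckDemo {

  // Hypothetical helper: returns true if a Transformer could be built
  // from the text, false if the processor rejected it.
  public static boolean compiles(String stylesheetText) {
    StreamSource source = new StreamSource(new StringReader(stylesheetText));
    try {
      TransformerFactory.newInstance().newTransformer(source);
      return true;
    } catch (TransformerConfigurationException e) {
      // Thrown while building the Transformer, before any transform runs
      return false;
    }
  }

  public static void main(String[] args) {
    String stylesheet =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'><out/></xsl:template>"
      + "</xsl:stylesheet>";
    System.out.println(compiles(stylesheet));
    System.out.println(compiles("This is not an XML document"));
  }
}
```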

package javax.xml.transform.stream;

public class StreamSource implements Source {

  public static final String FEATURE =
   "http://javax.xml.transform.stream.StreamSource/feature";

  public StreamSource();
  public StreamSource(InputStream inputStream);
  public StreamSource(InputStream inputStream, String systemID);
  public StreamSource(Reader reader);
  public StreamSource(Reader reader, String systemID);
  public StreamSource(String systemID);
  public StreamSource(File f);
  
  public void        setInputStream(InputStream inputStream);
  public InputStream getInputStream();
  public void        setReader(Reader reader);
  public Reader      getReader();
  public void        setPublicId(String publicID);
  public String      getPublicId();
  public void        setSystemId(String systemID);
  public String      getSystemId();
  public void        setSystemId(File f);
  
}

You should not specify both an InputStream and a Reader. If you do, which one the processor reads from is implementation dependent. If neither an InputStream nor a Reader is available, then the processor will attempt to open a connection to the URI specified by the system ID. You should set the system ID even if you do specify an InputStream or a Reader because this will be needed to resolve relative URLs that appear inside the stylesheet and input document.
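For instance, a StreamSource that reads from a Reader but still carries a system ID could be set up along these lines. The class name StreamSourceDemo, its helpers, and the http://www.example.com/ URL are placeholders rather than anything from a real system.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StreamSourceDemo {

  // Hypothetical helper: the Reader supplies the content, while the
  // system ID supplies the base for resolving relative URLs.
  public static StreamSource fromString(String xml, String systemId) {
    StreamSource source = new StreamSource(new StringReader(xml));
    source.setSystemId(systemId);
    return source;
  }

  // Copies the source to a string with an identity transform
  public static String copy(StreamSource source) throws TransformerException {
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter out = new StringWriter();
    transformer.transform(source, new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws TransformerException {
    StreamSource source = fromString(
        "<SONG><TITLE>Hot Cop</TITLE></SONG>",
        "http://www.example.com/songs/hotcop.xml"); // placeholder URL
    System.out.println(source.getSystemId());
    System.out.println(copy(source));
  }
}
```

Only a Reader is set here, never an InputStream as well, so there is no ambiguity about where the processor reads from.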

package javax.xml.transform.stream;

public class StreamResult implements Result {

  public static final String FEATURE =
   "http://javax.xml.transform.stream.StreamResult/feature";

  public StreamResult();
  public StreamResult(OutputStream outputStream);
  public StreamResult(Writer writer);
  public StreamResult(String systemID);
  public StreamResult(File f);
  
  public void         setOutputStream(OutputStream outputStream);
  public OutputStream getOutputStream();
  public void         setWriter(Writer writer);
  public Writer       getWriter();
  public void         setSystemId(String systemID);
  public void         setSystemId(File f);
  public String       getSystemId();
  
}

You should specify the system ID URL and one of the other identifiers (File, OutputStream, Writer, or String). If you specify more than one possible target, which one the processor chooses is implementation dependent.
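A minimal sketch of a single-target StreamResult, here an in-memory OutputStream, is shown below. The class name StreamResultDemo and the toBytes() helper are hypothetical, and for brevity the sketch leaves the system ID unset.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StreamResultDemo {

  // Hypothetical helper: runs an identity transform and returns the
  // serialized bytes in the requested encoding.
  public static byte[] toBytes(String xml, String encoding)
      throws TransformerException {
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.ENCODING, encoding);

    // Exactly one target: an OutputStream. No Writer or File is set.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    StreamResult result = new StreamResult(out);

    transformer.transform(new StreamSource(new StringReader(xml)), result);
    return out.toByteArray();
  }

  public static void main(String[] args) throws TransformerException {
    byte[] bytes = toBytes("<SONG><TITLE>Hot Cop</TITLE></SONG>", "ISO-8859-1");
    System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));
  }
}
```

With an OutputStream the processor does the character encoding itself; with a Writer, the Writer's own encoding decides which bytes are eventually produced.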


To Learn More




Copyright 2000-2003 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified January 16, 2003