JDOM

Intro to JDOM

Elliotte Rusty Harold

Extreme Markup Languages 2000

Wednesday, August 16, 2000

elharo@metalab.unc.edu

http://metalab.unc.edu/xml/

What is JDOM?

A Pure Java API for reading and writing XML Documents
A Java-oriented API for reading and writing XML Documents
A tree-oriented API for reading and writing XML Documents
A parser independent API for reading and writing XML Documents

About JDOM

Created by Brett McLaughlin and Jason Hunter
Open source with an Apache-like license
http://www.jdom.org/

What's Wrong with DOM?

Language independent; therefore language inconvenient:
- Very limited use of the Java class library; many features are reinvented:
  - NodeIterator instead of java.util.Iterator
- Bad design:
  - Method overloading isn't used because JavaScript doesn't support it.
  - Named constants are often shorts
  - Only one kind of exception; details provided by constants
  - No Java-specific utility methods like equals(), hashCode(), clone(), or toString()
Unexpected behavior:
- The value of a non-empty element is null
- DOM implementations allow the creation of malformed documents
Parser independent interfaces but parser dependent implementation classes. Most programs must use the parser dependent classes. JAXP helps solve this, but so far only for DOM Level 1.
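
Two of the complaints above are easy to see with a few lines of plain DOM code. The sketch below is not from the original slides; it assumes a JDK that bundles a JAXP DOM parser, and the class name DomQuirks is made up for illustration. It shows that the value of a non-empty element is null and that a NodeList has to be walked by index rather than with a java.util.Iterator.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses only the DOM and JAXP classes in the JDK.
class DomQuirks {

  // Parses an XML string with whichever DOM parser JAXP locates
  // and returns the root element.
  static Element parseRoot(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)))
          .getDocumentElement();
    }
    catch (Exception e) {
      throw new IllegalStateException("Could not parse: " + xml, e);
    }
  }

  // Counts child elements. NodeList is not a java.util collection,
  // so it is walked by index, and node types are short constants.
  static int countChildElements(Element parent) {
    NodeList children = parent.getChildNodes();
    int count = 0;
    for (int i = 0; i < children.getLength(); i++) {
      if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    Element greeting = parseRoot("<GREETING>Hello DOM!</GREETING>");
    // The element's own value is null even though it has content.
    System.out.println(greeting.getNodeValue());
    // The text lives in a child Text node instead.
    System.out.println(greeting.getFirstChild().getNodeValue());
    System.out.println(countChildElements(parseRoot("<a><b/>text<c/></a>")));
  }

}
```

Run as written, this should print null, then Hello DOM!, then 2.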

DOM vs. JDOM

DOM Interface	JDOM Class
`Attr`	`Attribute`
`CDATASection`
`CharacterData`
`Comment`	`Comment`
`Document`	`Document`
`DocumentFragment`
`DocumentType`	`DocType`
`DOMImplementation`
`Element`	`Element`
`Entity`	`Entity`
`EntityReference`
`NamedNodeMap`
`Node`
`NodeList`
`Notation`
`ProcessingInstruction`	`ProcessingInstruction`
`Text`	`java.lang.String`
	`Verifier`
`DOMException`	`JDOMException`, `IllegalAddException`, `IllegalDataException`, `IllegalNameException`, `IllegalTargetException`

JDOM versions

1.0 Beta 4 is current tarball from June 9, 2000
Last two months have significantly changed the API
This presentation is based on the August 4, 2000 version
cvs.jdom.org

Four packages:

org.jdom: the classes that represent an XML document and its parts
org.jdom.input: classes for reading a document into memory
org.jdom.output: classes for writing a document onto a stream or other target (e.g. SAX or DOM app)
org.jdom.adapters: classes for hooking up to DOM implementations

The org.jdom package

The classes that represent an XML document and its parts

Attribute
Comment
DocType
Document
Element
Entity
ProcessingInstruction
plus Verifier
plus assorted exceptions

The org.jdom.input package

Classes for reading a document into memory from a file or other source

DOMBuilder
SAXBuilder

The org.jdom.output package

The classes for writing a document to a file or other target

XMLOutputter
SAXOutputter (under development)
DOMOutputter (under development)

The org.jdom.adapters package

Classes for hooking up JDOM to DOM implementations:
- AbstractDOMAdapter
- OracleV1DOMAdapter
- OracleV2DOMAdapter
- ProjectXDOMAdapter
- XercesDOMAdapter
You rarely need to access these directly.

Writing XML Documents with JDOM

JDOM is for both input and output
New documents can be read from a stream or constructed in memory
An org.jdom.output.XMLOutputter sends a document from memory to an OutputStream or Writer
A JDOM document can also be sent to a SAX ContentHandler or DOM org.w3c.dom.Document for further processing with a different API

A JDOM program that writes this XML document

<?xml version="1.0"?>
<GREETING>
  Hello JDOM!
</GREETING>

Hello JDOM

import org.jdom.*;
import org.jdom.output.XMLOutputter;


public class HelloJDOM {

  public static void main(String[] args) {
   
    Element root = new Element("GREETING");
    	
    root.setText("Hello JDOM!");
         
    Document doc = new Document(root);      
    
    // At this point the document only exists in memory.
    // We still need to serialize it
    XMLOutputter outputter = new XMLOutputter();
    try {
      outputter.output(doc, System.out);       
    }
    catch (Exception e) {
      System.err.println(e);
    }

  }

}

Actual Output

<?xml version="1.0" encoding="UTF-8"?>
<GREETING>Hello JDOM!</GREETING>

This is more or less what we wanted, modulo white space.

Hello DOM

Here's the same program using DOM instead of JDOM. Which is simpler?

import java.io.*;
import org.w3c.dom.*;
import org.apache.xerces.dom.*;
import org.apache.xml.serialize.*; 


public class HelloDOM {

  public static void main(String[] args) {
   
    try {
      
      DOMImplementationImpl impl = (DOMImplementationImpl) 
       DOMImplementationImpl.getDOMImplementation();
       
      DocumentType type = impl.createDocumentType("GREETING", null, null);
      
      // type is supposed to be able to be null, 
      // but in practice that didn't work                     
      DocumentImpl hello 
       = (DocumentImpl) impl.createDocument(null, "GREETING", type);
      
      Element root = hello.createElement("GREETING");
      
      // We can't use a raw string. Instead we have to first create 
      // a text node.
      Text text = hello.createTextNode("Hello DOM!");
      root.appendChild(text);
      
      // Now that the document is created we need to *serialize* it
      try {
        OutputFormat format = new OutputFormat(hello);
        XMLSerializer serializer = new XMLSerializer(System.out, format);
        serializer.serialize(root);
      }
      catch (IOException e) {
        System.err.println(e); 
      }
    }
    catch (DOMException e) {
      e.printStackTrace();
    }

  }

}

fibonacci.xml

Suppose we want the first 26 Fibonacci numbers (indexes 0 through 25) in an XML document that looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<Fibonacci_Numbers>
  <fibonacci index="0">0</fibonacci>
  <fibonacci index="1">1</fibonacci>
  <fibonacci index="2">1</fibonacci>
  <fibonacci index="3">2</fibonacci>
  <fibonacci index="4">3</fibonacci>
  <fibonacci index="5">5</fibonacci>
  <fibonacci index="6">8</fibonacci>
  <fibonacci index="7">13</fibonacci>
  <fibonacci index="8">21</fibonacci>
  <fibonacci index="9">34</fibonacci>
  <fibonacci index="10">55</fibonacci>
  <fibonacci index="11">89</fibonacci>
  <fibonacci index="12">144</fibonacci>
  <fibonacci index="13">233</fibonacci>
  <fibonacci index="14">377</fibonacci>
  <fibonacci index="15">610</fibonacci>
  <fibonacci index="16">987</fibonacci>
  <fibonacci index="17">1597</fibonacci>
  <fibonacci index="18">2584</fibonacci>
  <fibonacci index="19">4181</fibonacci>
  <fibonacci index="20">6765</fibonacci>
  <fibonacci index="21">10946</fibonacci>
  <fibonacci index="22">17711</fibonacci>
  <fibonacci index="23">28657</fibonacci>
  <fibonacci index="24">46368</fibonacci>
  <fibonacci index="25">75025</fibonacci>
</Fibonacci_Numbers>

A JDOM program that writes Fibonacci numbers into an XML file

import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;


public class FibonacciJDOM {

  public static void main(String[] args) {
   
    Element root = new Element("Fibonacci_Numbers");	
  	      
    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;      
    
    for (int i = 0; i <= 25; i++) {
      Element fibonacci = new Element("fibonacci");
      fibonacci.addAttribute("index", String.valueOf(i));
      fibonacci.setText(low.toString());
      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(fibonacci);
    }
 
    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("fibonacci.xml");
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, out);
      out.flush();	
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}
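
The low/high bookkeeping in the loop above can be checked separately from the XML output. This is only a sketch: the class and method names are invented here, and it uses nothing beyond java.math.BigInteger. It runs the same swap and returns the value that would be written for a given index.

```java
import java.math.BigInteger;

// Hypothetical helper class, not part of the original program.
class FibonacciCheck {

  // Returns the Fibonacci number for the given index using the same
  // low/high swap as the loop in FibonacciJDOM.
  static BigInteger fibonacci(int index) {
    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;
    for (int i = 0; i < index; i++) {
      BigInteger temp = high;
      high = high.add(low);
      low = temp;
    }
    return low;
  }

  public static void main(String[] args) {
    // Spot-check a few indexes against fibonacci.xml.
    int[] indexes = {0, 1, 10, 25};
    for (int i = 0; i < indexes.length; i++) {
      System.out.println(indexes[i] + ": " + fibonacci(indexes[i]));
    }
  }

}
```

The printed values for indexes 0, 1, 10 and 25 should be 0, 1, 55 and 75025, matching the corresponding fibonacci elements in the document.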

Using Namespaces

Suppose we want some MathML like this:

<?xml version="1.0" encoding="UTF-8"?>
<mathml:math xmlns:mathml="http://www.w3.org/1998/Math/MathML">
  <mathml:mrow>
    <mathml:mi>f(0)</mathml:mi>
    <mathml:mo>=</mathml:mo>
    <mathml:mn>0</mathml:mn>
  </mathml:mrow>
  <mathml:mrow>
    <mathml:mi>f(1)</mathml:mi>
    <mathml:mo>=</mathml:mo>
    <mathml:mn>1</mathml:mn>
  </mathml:mrow>
  <mathml:mrow>
    <mathml:mi>f(2)</mathml:mi>
    <mathml:mo>=</mathml:mo>
    <mathml:mn>1</mathml:mn>
  </mathml:mrow>
</mathml:math>

Rule for Using Namespaces

Do not use the qualified names like mathml:mn.
Instead use the prefixes mathml, local names like mn, and URIs like http://www.w3.org/1998/Math/MathML to create the elements.
Do not include xmlns attributes like xmlns:mathml="http://www.w3.org/1998/Math/MathML".
XMLOutputter will decide where to put the xmlns attributes when the document is serialized.

With Namespace Prefixes

import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;


public class PrefixedFibonacci {

  public static void main(String[] args) {
   
    Element root = new Element("math", "mathml", "http://www.w3.org/1998/Math/MathML");	
  	      
    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;      
    
    for (int i = 0; i <= 25; i++) {
        
      Element mrow = new Element("mrow", "mathml", "http://www.w3.org/1998/Math/MathML");
      
      Element mi = new Element("mi", "mathml", "http://www.w3.org/1998/Math/MathML");
      mi.setText("f(" + i + ")"); 
      mrow.addContent(mi);
      
      Element mo = new Element("mo", "mathml", "http://www.w3.org/1998/Math/MathML");
      mo.setText("="); 
      mrow.addContent(mo);
      
      Element mn = new Element("mn", "mathml", "http://www.w3.org/1998/Math/MathML");
      mn.setText(low.toString());
      mrow.addContent(mn);

      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(mrow);
      
    }
 
    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("prefixed_fibonacci.xml");
      XMLOutputter serializer = new XMLOutputter(); 
      serializer.output(doc, out);
      out.flush();	
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

Parsing Existing Documents

The stereotypical "Desperate Perl Hacker" (DPH) is supposed to be able to write an XML parser in a weekend.
The parser does the hard work for you.
Your code reads the document by hooking JDOM up to the parser.
JDOM can connect to any parser that supports SAX or DOM.
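
To see what "the parser does the hard work" means, here is a sketch that talks to a SAX parser directly, with no JDOM involved. It assumes the JAXP SAX parser bundled with the JDK; the class name SaxWellFormedness and its method are made up for illustration. A builder such as SAXBuilder sits on top of this kind of parser and turns its events into a tree.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; uses only the SAX and JAXP classes in the JDK.
class SaxWellFormedness {

  // Returns true if the parser reads the whole text without reporting
  // a well-formedness error, false otherwise.
  static boolean isWellFormed(String xml) {
    try {
      SAXParserFactory.newInstance()
          .newSAXParser()
          .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
      return true;
    }
    catch (SAXException e) {
      // The parser found a well-formedness problem.
      return false;
    }
    catch (IOException | ParserConfigurationException e) {
      // Not a problem with the document itself.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(isWellFormed("<GREETING>Hello</GREETING>"));
    System.out.println(isWellFormed("public class HelloJDOM {"));
  }

}
```

The first call should print true and the second false, since Java source is not an XML document.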

JDOM Compatible Parsers for Java

Any SAX or DOM compatible parser including:

Apache XML Project's Xerces Java: http://xml.apache.org/xerces-j/index.html
Oracle's XML Parser for Java: http://technet.oracle.com/tech/xml/parser_java2
Sun's Java API for XML: http://java.sun.com/products/xml

The JDOM Process

Construct an org.jdom.input.SAXBuilder or an org.jdom.input.DOMBuilder; no parser specific code is needed!
Invoke the builder's build() method to build a Document object from a
- Reader
- InputStream
- URL
- File
- String containing a SYSTEM ID
If there's a problem building the document, a JDOMException is thrown
Work with the resulting Document object

Parsing a Document with JDOM

import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;


public class JDOMChecker {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java JDOMChecker URL1 URL2..."); 
      return;
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        builder.build(args[i]);
        // If there are no well-formedness errors, 
        // then no exception is thrown
        System.out.println(args[i] + " is well formed.");
      }
      catch (JDOMException e) { // indicates a well-formedness or other error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

}

Parser Results

% java JDOMChecker shortlogs.xml HelloJDOM.java
shortlogs.xml is well formed.
HelloJDOM.java is not well formed.
The markup in the document preceding the root element must be well-formed.: 
Error on line 1 of XML document: The markup in the document preceding the 
root element must be well-formed.

UserLand's RSS based list of Web logs

UserLand's RSS based list of Web logs at http://static.userland.com/weblogMonitor/logs.xml:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd">
<weblogs>
	<log>
		<name>MozillaZine</name>
		<url>http://www.mozillazine.org</url>
		<changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>
		<ownerName>Jason Kersey</ownerName>
		<ownerEmail>kerz@en.com</ownerEmail>
		<description>THE source for news on the Mozilla Organization.  DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
		<imageUrl></imageUrl>
		<adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif</adImageUrl>
		</log>
	<log>
		<name>SalonHerringWiredFool</name>
		<url>http://www.salonherringwiredfool.com/</url>
		<ownerName>Some Random Herring</ownerName>
		<ownerEmail>salonfool@wiredherring.com</ownerEmail>
		<description></description>
		</log>
	<log>
		<name>Scripting News</name>
		<url>http://www.scripting.com/</url>
		<ownerName>Dave Winer</ownerName>
		<ownerEmail>dave@userland.com</ownerEmail>
		<description>News and commentary from the cross-platform scripting community.</description>
		<imageUrl>http://www.scripting.com/gifs/tinyScriptingNews.gif</imageUrl>
		<adImageUrl>http://static.userland.com/weblogMonitor/ads/dave@userland.com.gif</adImageUrl>
		</log>
	<log>
		<name>SlashDot.Org</name>
		<url>http://www.slashdot.org/</url>
		<ownerName>Simply a friend</ownerName>
		<ownerEmail>afriendofweblogs@weblogs.com</ownerEmail>
		<description>News for Nerds, Stuff that Matters.</description>
		</log>
	</weblogs>

Goal: Return a list of all the URLs in this list as java.net.URL objects

Design Decisions

Should we return an array, an Enumeration, a List, or what?
Perhaps we should use multiple threads?

JDOM Design

We can easily find out how many URLs there will be when we start parsing.
Single threaded by nature; no benefit to multiple threads since no data will be available until the entire document has been read and parsed.
The character data of each url element needs to be read. Everything else can be ignored.
The format is very straightforward so we don't need to traverse the entire tree.
The XML parsing is so straightforward it can be done inside one method. No extra class is required.

Weblogs with JDOM

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;
import java.net.*;


public class WeblogsJDOM {
   
  public static String DEFAULT_SYSTEM_ID 
   = "http://static.userland.com/weblogMonitor/logs.xml"; 
     
  public static List listChannels() throws JDOMException {
    return listChannels(DEFAULT_SYSTEM_ID); 
  }
  
  public static List listChannels(String systemID) 
   throws JDOMException, NullPointerException {
    
    if (systemID == null) {
      throw new NullPointerException("URL must be non-null");   
    }
    
    SAXBuilder builder = new SAXBuilder();
    // Load the entire document into memory 
    // from the network or file system
    Document doc = builder.build(systemID);
    
    // Descend the tree and find the URLs. It helps that
    // the document has a very regular structure.
    Element weblogs = doc.getRootElement();
    List logs = weblogs.getChildren("log");
    Vector urls = new Vector(logs.size());
    Iterator iterator = logs.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      Element log = (Element) o;
      try {
        // This will probably be changed to
        // getElement() or getChildElement()
        Element url = log.getChild("url"); 
        if (url == null) continue;
        String content = url.getTextTrim();
        URL u = new URL(content);
        urls.addElement(u);
      }
      catch (MalformedURLException e) {
        // bad input data from one third party; just ignore it 
      }
    }
    return urls;
    
  }
  
  public static void main(String[] args) {
   
    try {
      List urls;
      if (args.length > 0) {
        urls = listChannels(args[0]);
      }
      else {
        urls = listChannels();
      }
      Iterator iterator = urls.iterator();
      while (iterator.hasNext()) {
        System.out.println(iterator.next()); 
      }
    }
    catch (/* Unexpected */ Exception e) {
      e.printStackTrace(); 
    }
    
  }
  
}

Weblogs Output

% java WeblogsJDOM
http://2020Hindsight.editthispage.com/
http://www.sff.net/people/mitchw/weblog/weblog.htp
http://nate.weblogs.com/
http://plugins.launchpoint.net
http://404.psistorm.net
http://home.att.net/~geek9000
http://daubnet.tzo.com/weblog
several hundred more...

The org.jdom Package

The classes that represent an XML document and its parts

Document
Element
Attribute
Comment
DocType
Entity
ProcessingInstruction
Verifier
plus assorted exceptions

The Document Node

The root node containing the entire document; not the same as the root element
Contains:
- one element
- zero or more processing instructions
- zero or more comments
- zero or one document type declarations

The Document Class

package org.jdom;

public class Document implements Serializable, Cloneable {

  protected List    content;
  protected Element rootElement;
  protected DocType docType;

  protected Document() {}
  public    Document(Element rootElement) {}
  public    Document(Element rootElement, DocType docType) {}

  public Element   getRootElement() {}
  public Document  setRootElement(Element rootElement) {}
  public DocType   getDocType() {}
  public Document  setDocType(DocType docType) {}
  public List      getProcessingInstructions() {}
  public List      getProcessingInstructions(String target) {}
  public ProcessingInstruction getProcessingInstruction(String target)
    throws NoSuchProcessingInstructionException {}
  public Document  addProcessingInstruction(ProcessingInstruction pi) {}
  public Document  addProcessingInstruction(String target, String data) {}
  public Document  addProcessingInstruction(String target, Map data) {}
  public Document  setProcessingInstructions(List processingInstructions) {}
  public boolean   removeProcessingInstruction(ProcessingInstruction processingInstruction) {}
  public boolean   removeProcessingInstruction(String target) {}
  public boolean   removeProcessingInstructions(String target) {}
  public Document  addComment(Comment comment) {}
  public List      getMixedContent() {}
  
  // basic utility methods
  public final String  toString() {}
  public final String  getSerializedForm() {}  // going away
  public final boolean equals(Object ob) {}
  public final int     hashCode() {}
  public final Object  clone() {}

}

Element Nodes

Represents a complete element including its start tag, end tag, and content
Contains:
- Child Elements
- Processing Instructions
- Comments
- Text
JDOM enforces restrictions on element names and possibly values; e.g. a name cannot start with a digit.

Element Class Implementation

The content is stored as a java.util.List which contains
- One String object per text node
- One Element object per child element
- One Comment object per comment
- One ProcessingInstruction object per processing instruction
Use the regular methods of java.util.List to add, remove, and inspect the contents of an element
Since the methods of java.util.List expect to work with Object objects, casting back to JDOM types and String is frequent
Various utility methods mean you don't always have to work with the full list.
Attributes are available as a separate List since attributes are not children.
This list only contains Attribute objects.
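
The list-of-Objects design described above can be tried without JDOM on the class path. In this sketch FakeElement and FakeComment are hypothetical stand-ins, not the org.jdom classes; only the pattern is the point: one java.util.List holds Strings and node objects side by side, and the reader sorts them out with instanceof and casts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of mixed content held in a plain java.util.List.
class MixedContentSketch {

  // Stand-in for an element; NOT org.jdom.Element.
  static final class FakeElement {
    final String name;
    final List<Object> content = new ArrayList<Object>();
    FakeElement(String name) { this.name = name; }
  }

  // Stand-in for a comment; NOT org.jdom.Comment.
  static final class FakeComment {
    final String text;
    FakeComment(String text) { this.text = text; }
  }

  // Builds the content of <p>Hello <b/><!--note-->world</p>.
  static List<Object> sampleContent() {
    FakeElement p = new FakeElement("p");
    p.content.add("Hello ");
    p.content.add(new FakeElement("b"));
    p.content.add(new FakeComment("note"));
    p.content.add("world");
    return p.content;
  }

  // Concatenates only the String items of a mixed-content list.
  static String textOf(List<Object> content) {
    StringBuilder text = new StringBuilder();
    for (Object o : content) {
      if (o instanceof String) {
        text.append((String) o);
      }
    }
    return text.toString();
  }

  // Counts the element items, ignoring text and comments.
  static int countElements(List<Object> content) {
    int count = 0;
    for (Object o : content) {
      if (o instanceof FakeElement) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    List<Object> content = sampleContent();
    System.out.println(textOf(content));
    System.out.println(countElements(content));
  }

}
```

This should print Hello world and then 1. The casting burden shown here is what utility methods on a real element class are meant to reduce.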

The Element Class

package org.jdom;

public class Element implements Serializable, Cloneable {

    protected String    name;
    protected Namespace namespace;
    protected Element   parent;
    protected boolean   isRootElement;
    protected List      attributes;
    protected List      content;

    protected Element() {}
    public    Element(String name, Namespace namespace) {}
    public    Element(String name) {}
    public    Element(String name, String uri) {}
    public    Element(String name, String prefix, String uri) {}

    public String     getName() {}
    public Namespace  getNamespace() {}
    public String     getNamespacePrefix() {}
    public String     getNamespaceURI() {}
    public String     getQualifiedName() {}
    
    public Element    getParent() {}
    protected Element setParent(Element parent) {}
    public boolean    isRootElement() {}
    protected Element setIsRootElement(boolean isRootElement) {}
    

    public String    getText() {} 
    public String    getTextTrim() {} 
    public boolean   hasMixedContent() {} 
    public List      getMixedContent() {}
    public String    getChildText(String name) {} 
    public String    getChildTextTrim(String name) {} 
    public String    getChildText(String name, Namespace ns) {} 

    public Element   setMixedContent(List mixedContent) {} 
    public List      getChildren() {} 
    public Element   setChildren(List children) {} 
    public List      getChildren(String name, Namespace ns) {} 
    // will be renamed, probably getElement() {}
    public Element   getChild(String name, Namespace ns) {} 
    public Element   getChild(String name) {} 
    public boolean   removeChild(String name) {} 
    public boolean   removeChild(String name, Namespace ns) {} 
    public boolean   removeChildren(String name) {}
    public boolean   removeChildren(String name, Namespace ns) {} 
    public boolean   removeChildren() {} 
    
    public Element   addContent(String text) {}
    public Element   addContent(Element element) {} 
    public Element   addContent(ProcessingInstruction pi) {} 
    public Element   addContent(Entity entity) {} 
    public Element   addContent(Comment comment) {} 
    public boolean   removeContent(Element element) {} 
    public boolean   removeContent(ProcessingInstruction pi) {} 
    public boolean   removeContent(Entity entity) {} 
    public boolean   removeContent(Comment comment) {} 
    
    public List      getAttributes() {} 
    public Attribute getAttribute(String name) {} 
    public Attribute getAttribute(String name, Namespace ns) {} 
    public String    getAttributeValue(String name) {} 
    public String    getAttributeValue(String name, Namespace ns) {} 
    public Element   setAttributes(List attributes) {} 
    public Element   addAttribute(Attribute attribute) {}
    public Element   addAttribute(String name, String value) {} 
    public boolean   removeAttribute(String name, String uri) {} 
    public boolean   removeAttribute(String name) {} 
    public boolean   removeAttribute(String name, Namespace ns) {} 
    
    public Element   getCopy(String name, Namespace ns) {}
    public Element   getCopy(String name, String uri) {}
    public Element   getCopy(String name, String prefix, String uri) {}

    /////////////////////////////////////////////////////////////////
    // Basic Utility Methods
    ///////////////////////////////////////////////////////////////// 
    public final String  toString() {}
    public final String  getSerializedForm() {}  // will be removed
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}
    
}

Element Example: XCount

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;


public class XCount {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java XCount URL1 URL2..."); 
      return;
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    System.out.println(
     "File\tElements\tAttributes\tComments\tProcessing Instructions\tCharacters");
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        System.out.print(args[i] + ":\t");
        String result = count(doc);
        System.out.println(result);
      }
      catch (JDOMException e) { // indicates a well-formedness or other error
        System.out.println(args[i] + " is not a well formed XML document.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }  

  private static int numCharacters             = 0;
  private static int numComments               = 0;
  private static int numElements               = 0;
  private static int numAttributes             = 0;
  private static int numProcessingInstructions = 0;
      
  public static String count(Document doc) {

    numCharacters = 0;
    numComments = 0;
    numElements = 0;
    numAttributes = 0;
    numProcessingInstructions = 0;  

    List children = doc.getMixedContent();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Element) {
        numElements++;
        count((Element) o);
      }
      else if (o instanceof Comment) numComments++;
      else if (o instanceof ProcessingInstruction) numProcessingInstructions++;   
    }
    
    String result = numElements + "\t" + numAttributes + "\t" 
     + numComments + "\t" + numProcessingInstructions + "\t" + numCharacters;
    return result;
       
  }     

  public static void count(Element element) {

    List attributes = element.getAttributes();
    numAttributes += attributes.size();
    List children = element.getMixedContent();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Element) {
        numElements++;
        count((Element) o);
      }
      else if (o instanceof Comment) numComments++;
      else if (o instanceof ProcessingInstruction) numProcessingInstructions++;   
      else if (o instanceof String) {
        String s = (String) o;
        numCharacters += s.length();
      }   
    }
        
  }  

}

XCount Output

% java XCount shortlogs.xml hotcop.xml
File    Elements        Attributes      Comments        Processing Instructions Characters
shortlogs.xml:  30      0       0       0       736
hotcop.xml:     11      8       2       1       95

Handling Attributes in JDOM

Each attribute is represented as an Attribute object
Each Attribute has:
- A local name, a String
- A value, a String
- A Namespace object (which may be Namespace.NO_NAMESPACE)
Everything else can be determined from these three items.

Convenience methods can convert the attribute value to various types like int or double
JDOM enforces restrictions on attribute names and values; e.g. value may not contain < or >
Attributes are stored in a java.util.List in the Element that contains them
This list only contains Attribute objects.
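
The idea behind the conversion methods, and in particular the variants that take a default value, can be sketched in plain Java. The helpers below are hypothetical and are not JDOM's implementation; how JDOM itself treats whitespace or unparseable values is an assumption not covered here. They only show the pattern: try to parse the attribute's String value, and fall back to the caller's default if that fails.

```java
// Hypothetical helpers illustrating value conversion with a fallback.
class AttributeValueSketch {

  // Parses raw as an int, or returns defaultValue if raw is null
  // or is not a valid integer.
  static int intValueOr(String raw, int defaultValue) {
    if (raw == null) return defaultValue;
    try {
      return Integer.parseInt(raw.trim());
    }
    catch (NumberFormatException e) {
      return defaultValue;
    }
  }

  // Parses raw as a double, or returns defaultValue if raw is null
  // or is not a valid number.
  static double doubleValueOr(String raw, double defaultValue) {
    if (raw == null) return defaultValue;
    try {
      return Double.parseDouble(raw.trim());
    }
    catch (NumberFormatException e) {
      return defaultValue;
    }
  }

  public static void main(String[] args) {
    System.out.println(intValueOr("25", -1));
    System.out.println(intValueOr("twenty-five", -1));
    System.out.println(doubleValueOr("2.5", 0.0));
  }

}
```

This should print 25, -1 and 2.5. A version that throws an exception instead of returning a default is the other half of the same design.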

The Attribute Class

package org.jdom;

public class Attribute implements Serializable, Cloneable {

    protected String    name;
    protected Namespace namespace;
    protected String    value;

    protected Attribute() {}
    public    Attribute(String name, String value, Namespace namespace) {}
    public    Attribute(String name, String prefix, String uri, String value) {}
    public    Attribute(String name, String value) {}

    public String    getName() {}
    public String    getQualifiedName() {}
    public String    getNamespacePrefix() {}
    public String    getNamespaceURI() {}
    public Namespace getNamespace() {}
    public String    getValue() {}
    public void      setValue(String value) {}

    /////////////////////////////////////////////////////////////////
    // Basic Utility Methods
    /////////////////////////////////////////////////////////////////

    public final String  toString() {}
    public final boolean equals(Object ob) {}
    public final int     hashCode() {}
    public final Object  clone() {}

    /////////////////////////////////////////////////////////////////
    // Convenience Methods below here
    /////////////////////////////////////////////////////////////////

    public String  getValue(String defaultValue) {}
    public int     getIntValue(int defaultValue) {}
    public int     getIntValue() throws DataConversionException {}
    public long    getLongValue(long defaultValue) {}
    public long    getLongValue() throws DataConversionException {}
    public float   getFloatValue(float defaultValue) {}
    public float   getFloatValue() throws DataConversionException {}
    public double  getDoubleValue(double defaultValue) {}
    public double  getDoubleValue() throws DataConversionException {}
    public boolean getBooleanValue(boolean defaultValue) {}
    public boolean getBooleanValue() throws DataConversionException {}
    public char    getCharValue(char defaultValue) {}
    public char    getCharValue() throws DataConversionException {}

}

IDTagger

import java.io.IOException;
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import java.util.*;


public class IDTagger {

  private static int id = 1;

  public static void processElement(Element element) {
    

    if (element.getAttribute("ID") == null) {
      element.addAttribute(new Attribute("ID", "_" + id));
      id = id + 1; 
    }
    
    // recursion
    List children = element.getChildren();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      processElement((Element) iterator.next());   
    }
    
  }

  public static void main(String[] args) {
     
    SAXBuilder builder = new SAXBuilder();
    
    for (int i = 0; i < args.length; i++) {
        
      try {
        // Read the entire document into memory
        Document document = builder.build(args[i]); 
       
        processElement(document.getRootElement());
        
        // now we serialize the document...
        XMLOutputter serializer = new XMLOutputter(); 
        serializer.output(document, System.out);
        System.out.flush();	        
      }
      catch (JDOMException e) {
        System.err.println(e);
        continue; 
      }
      catch (IOException e) {
        System.err.println(e);
        continue; 
      }
      
    }
  
  } // end main

}

Before IDTagger

<?xml version="1.0"?><backslash
xmlns:backslash="http://slashdot.org/backslash.dtd">

 <story>
    <title>The Onion to buy the New York Times</title>
    <url>http://slashdot.org/articles/00/02/19/1128240.shtml</url>
    <time>2000-02-19 17:25:15</time>
    <author>CmdrTaco</author>
    <department>stuff-to-read</department>
    <topic>media</topic>
    <comments>20</comments>
    <section>articles</section>
    <image>topicmedia.gif</image>
  </story>
 <story>
    <title>Al Gore's Webmaster Answers Your Questions</title>
    <url>http://slashdot.org/interviews/00/02/19/0932207.shtml</url>
    <time>2000-02-19 17:00:52</time>
    <author>Roblimo</author>
    <department>political-process-online</department>
    <topic>usa</topic>
    <comments>49</comments>
    <section>interviews</section>
    <image>topicus.gif</image>
  </story>
 <story>
    <title>Open Source Africa</title>
    <url>http://slashdot.org/articles/00/02/19/1016216.shtml</url>
    <time>2000-02-19 16:05:58</time>
    <author>emmett</author>
    <department>songs-by-toto</department>
    <topic>linux</topic>
    <comments>50</comments>
    <section>articles</section>
    <image>topiclinux.gif</image>
  </story>
 <story>
    <title>Microsoft Funded by NSA, Helps Spy on Win Users?</title>
    <url>http://slashdot.org/articles/00/02/19/0750247.shtml</url>
    <time>2000-02-19 14:07:04</time>
    <author>Roblimo</author>
    <department>deep-dark-conspiracy-theories</department>
    <topic>microsoft</topic>
    <comments>154</comments>
    <section>articles</section>
    <image>topicms.gif</image>
  </story>
 <story>
    <title>X-Men Trailer Released</title>
    <url>http://slashdot.org/articles/00/02/18/0829209.shtml</url>
    <time>2000-02-19 13:47:06</time>
    <author>emmett</author>
    <department>mutant</department>
    <topic>movies</topic>
    <comments>70</comments>
    <section>articles</section>
    <image>topicmovies.gif</image>
  </story>
 <story>
    <title>Connell Replies to "Grok" Comments</title>
    <url>http://slashdot.org/articles/00/02/18/202240.shtml</url>
    <time>2000-02-19 05:01:37</time>
    <author>Hemos</author>
    <department>replying-to-things</department>
    <topic>linux</topic>
    <comments>197</comments>
    <section>articles</section>
    <image>topiclinux.gif</image>
  </story>
 <story>
    <title>etoy.com Returns</title>
    <url>http://slashdot.org/yro/00/02/18/1739216.shtml</url>
    <time>2000-02-19 02:35:06</time>
    <author>nik</author>
    <department>NP:-gimme-shelter</department>
    <topic>internet</topic>
    <comments>77</comments>
    <section>yro</section>
    <image>topicinternet.jpg</image>
  </story>
 <story>
    <title>New Propaganda Series: Rebirth</title>
    <url>http://slashdot.org/articles/00/02/18/205232.shtml</url>
    <time>2000-02-19 01:05:26</time>
    <author>Hemos</author>
    <department>as-pretty-as-always</department>
    <topic>graphics</topic>
    <comments>120</comments>
    <section>articles</section>
    <image>topicgraphics3.gif</image>
  </story>
 <story>
    <title>Giving Back</title>
    <url>http://slashdot.org/features/00/02/18/1631224.shtml</url>
    <time>2000-02-18 22:27:26</time>
    <author>emmett</author>
    <department>salvation-army</department>
    <topic>news</topic>
    <comments>122</comments>
    <section>features</section>
    <image>topicnews.gif</image>
  </story>
 <story>
    <title>Connectix Considering Open Sourcing VGS?</title>
    <url>http://slashdot.org/articles/00/02/18/1050225.shtml</url>
    <time>2000-02-18 20:46:20</time>
    <author>emmett</author>
    <department>grain-of-salt</department>
    <topic>news</topic>
    <comments>93</comments>
    <section>articles</section>
    <image>topicnews.gif</image>
  </story>
</backslash>

After IDTagger

<?xml version="1.0" encoding="UTF-8"?>
<backslash ID="_1">
  <story ID="_2">
    <title ID="_3">The Onion to buy the New York Times</title>
    <url ID="_4">http://slashdot.org/articles/00/02/19/1128240.shtml</url>
    <time ID="_5">2000-02-19 17:25:15</time>
    <author ID="_6">CmdrTaco</author>
    <department ID="_7">stuff-to-read</department>
    <topic ID="_8">media</topic>
    <comments ID="_9">20</comments>
    <section ID="_10">articles</section>
    <image ID="_11">topicmedia.gif</image>
  </story>
  <story ID="_12">
    <title ID="_13">Al Gore's Webmaster Answers Your Questions</title>
    <url ID="_14">http://slashdot.org/interviews/00/02/19/0932207.shtml</url>
    <time ID="_15">2000-02-19 17:00:52</time>
    <author ID="_16">Roblimo</author>
    <department ID="_17">political-process-online</department>
    <topic ID="_18">usa</topic>
    <comments ID="_19">49</comments>
    <section ID="_20">interviews</section>
    <image ID="_21">topicus.gif</image>
  </story>
  <story ID="_22">
    <title ID="_23">Open Source Africa</title>
    <url ID="_24">http://slashdot.org/articles/00/02/19/1016216.shtml</url>
    <time ID="_25">2000-02-19 16:05:58</time>
    <author ID="_26">emmett</author>
    <department ID="_27">songs-by-toto</department>
    <topic ID="_28">linux</topic>
    <comments ID="_29">50</comments>
    <section ID="_30">articles</section>
    <image ID="_31">topiclinux.gif</image>
  </story>
  <story ID="_32">
    <title ID="_33">Microsoft Funded by NSA, Helps Spy on Win Users?</title>
    <url ID="_34">http://slashdot.org/articles/00/02/19/0750247.shtml</url>
    <time ID="_35">2000-02-19 14:07:04</time>
    <author ID="_36">Roblimo</author>
    <department ID="_37">deep-dark-conspiracy-theories</department>
    <topic ID="_38">microsoft</topic>
    <comments ID="_39">154</comments>
    <section ID="_40">articles</section>
    <image ID="_41">topicms.gif</image>
  </story>
  <story ID="_42">
    <title ID="_43">X-Men Trailer Released</title>
    <url ID="_44">http://slashdot.org/articles/00/02/18/0829209.shtml</url>
    <time ID="_45">2000-02-19 13:47:06</time>
    <author ID="_46">emmett</author>
    <department ID="_47">mutant</department>
    <topic ID="_48">movies</topic>
    <comments ID="_49">70</comments>
    <section ID="_50">articles</section>
    <image ID="_51">topicmovies.gif</image>
  </story>
  <story ID="_52">
    <title ID="_53">Connell Replies to "Grok" Comments</title>
    <url ID="_54">http://slashdot.org/articles/00/02/18/202240.shtml</url>
    <time ID="_55">2000-02-19 05:01:37</time>
    <author ID="_56">Hemos</author>
    <department ID="_57">replying-to-things</department>
    <topic ID="_58">linux</topic>
    <comments ID="_59">197</comments>
    <section ID="_60">articles</section>
    <image ID="_61">topiclinux.gif</image>
  </story>
  <story ID="_62">
    <title ID="_63">etoy.com Returns</title>
    <url ID="_64">http://slashdot.org/yro/00/02/18/1739216.shtml</url>
    <time ID="_65">2000-02-19 02:35:06</time>
    <author ID="_66">nik</author>
    <department ID="_67">NP:-gimme-shelter</department>
    <topic ID="_68">internet</topic>
    <comments ID="_69">77</comments>
    <section ID="_70">yro</section>
    <image ID="_71">topicinternet.jpg</image>
  </story>
  <story ID="_72">
    <title ID="_73">New Propaganda Series: Rebirth</title>
    <url ID="_74">http://slashdot.org/articles/00/02/18/205232.shtml</url>
    <time ID="_75">2000-02-19 01:05:26</time>
    <author ID="_76">Hemos</author>
    <department ID="_77">as-pretty-as-always</department>
    <topic ID="_78">graphics</topic>
    <comments ID="_79">120</comments>
    <section ID="_80">articles</section>
    <image ID="_81">topicgraphics3.gif</image>
  </story>
  <story ID="_82">
    <title ID="_83">Giving Back</title>
    <url ID="_84">http://slashdot.org/features/00/02/18/1631224.shtml</url>
    <time ID="_85">2000-02-18 22:27:26</time>
    <author ID="_86">emmett</author>
    <department ID="_87">salvation-army</department>
    <topic ID="_88">news</topic>
    <comments ID="_89">122</comments>
    <section ID="_90">features</section>
    <image ID="_91">topicnews.gif</image>
  </story>
  <story ID="_92">
    <title ID="_93">Connectix Considering Open Sourcing VGS?</title>
    <url ID="_94">http://slashdot.org/articles/00/02/18/1050225.shtml</url>
    <time ID="_95">2000-02-18 20:46:20</time>
    <author ID="_96">emmett</author>
    <department ID="_97">grain-of-salt</department>
    <topic ID="_98">news</topic>
    <comments ID="_99">93</comments>
    <section ID="_100">articles</section>
    <image ID="_101">topicnews.gif</image>
  </story>
</backslash>

Handling Namespaces

JDOM is fully namespace aware
Namespaces are represented by instances of the Namespace class rather than by attributes or raw strings
Always ask for elements and attributes by local names and namespace URIs
Elements and attributes that are not in any namespace can be asked for by local name alone
Never identify an element or attribute by qualified name
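
The rule about qualified names can be illustrated with a small sketch. To stay runnable without extra libraries it uses the JDK's own namespace-aware DOM parser rather than JDOM, so it shows the principle, not JDOM's API. The namespace URI, the two sample documents, and the class and method names are all invented for this example. Both documents contain the same two rect elements in the same namespace; only the prefix differs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustration only: JDK DOM parser, not JDOM. All names here are hypothetical.
class NamespaceLookupSketch {

  // Invented namespace URI and sample documents
  static final String SHAPES_NS = "http://www.example.com/shapes";
  static final String PREFIXED =
      "<s:drawing xmlns:s='" + SHAPES_NS + "'><s:rect/><s:rect/></s:drawing>";
  static final String DEFAULTED =
      "<drawing xmlns='" + SHAPES_NS + "'><rect/><rect/></drawing>";

  // Parses a string with namespace processing switched on
  static Document parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      DocumentBuilder builder = factory.newDocumentBuilder();
      return builder.parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse sample document", e);
    }
  }

  // Robust lookup: namespace URI plus local name
  static int countByNamespace(String xml, String uri, String localName) {
    return parse(xml).getElementsByTagNameNS(uri, localName).getLength();
  }

  // Fragile lookup: depends on whichever prefix the author happened to choose
  static int countByQualifiedName(String xml, String qualifiedName) {
    return parse(xml).getElementsByTagName(qualifiedName).getLength();
  }

  public static void main(String[] args) {
    System.out.println(countByNamespace(PREFIXED, SHAPES_NS, "rect"));   // 2
    System.out.println(countByNamespace(DEFAULTED, SHAPES_NS, "rect"));  // 2
    System.out.println(countByQualifiedName(PREFIXED, "s:rect"));        // 2
    System.out.println(countByQualifiedName(DEFAULTED, "s:rect"));       // 0
  }
}
```

The lookup by URI and local name finds the elements in both documents. The lookup by qualified name silently misses them as soon as the document switches to a default namespace or a different prefix.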

The Namespace Class

Mostly for internal parser use
Occasionally useful for tasks like finding out whether a document contains any XLinks

The Namespace Class

package org.jdom;

public final class Namespace {

  public static final Namespace NO_NAMESPACE = new Namespace("", "");

  // factory methods
  public static Namespace getNamespace(String prefix, String uri) {}
  public static Namespace getNamespace(String uri) {}

  // getter methods
  public String  getPrefix() {}
  public String  getURI() {}

  // utility methods
  public boolean equals(Object ob) {}
  public String  toString() {}
  public int     hashCode() {}

}
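
A class with this shape (no public constructor, static factory methods, value-style equals() and hashCode()) could be filled in roughly as follows. This is a hypothetical stand-in named NamespaceSketch, not JDOM's source code. The instance cache and the decision to compare namespaces by URI alone are assumptions of the sketch, chosen because a prefix is only a local alias for the URI.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration; not JDOM's actual implementation.
final class NamespaceSketch {

  // Cache so that repeated requests for the same prefix/URI pair share one object
  private static final Map<String, NamespaceSketch> CACHE = new HashMap<>();

  static final NamespaceSketch NO_NAMESPACE = getNamespace("", "");

  private final String prefix;
  private final String uri;

  // Private constructor: callers must go through the factory methods
  private NamespaceSketch(String prefix, String uri) {
    this.prefix = prefix;
    this.uri = uri;
  }

  // Factory methods
  static synchronized NamespaceSketch getNamespace(String prefix, String uri) {
    // A prefix can never contain '&', so the first '&' separates the two parts
    String key = prefix + "&" + uri;
    return CACHE.computeIfAbsent(key, k -> new NamespaceSketch(prefix, uri));
  }

  static NamespaceSketch getNamespace(String uri) {
    return getNamespace("", uri);
  }

  // Getter methods
  String getPrefix() { return prefix; }
  String getURI()    { return uri; }

  // Utility methods. Assumption of this sketch: equality ignores the prefix.
  @Override
  public boolean equals(Object ob) {
    return ob instanceof NamespaceSketch && uri.equals(((NamespaceSketch) ob).uri);
  }

  @Override
  public int hashCode() {
    return uri.hashCode();
  }

  @Override
  public String toString() {
    return "[Namespace: prefix \"" + prefix + "\" is mapped to URI \"" + uri + "\"]";
  }
}
```

Under these assumptions, NamespaceSketch.getNamespace("xlink", "http://www.w3.org/1999/xlink") and NamespaceSketch.getNamespace("xl", "http://www.w3.org/1999/xlink") are distinct objects that compare equal, so a check such as "is this attribute in the XLink namespace?" does not depend on the prefix the document author chose.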

What JDOM doesn't do

Documents larger than available memory
Byte-for-byte faithful round trips
Maintaining namespace prefixes (will be added before 1.0)
DTDs
XPath Queries (may be added in 1.1)

To Learn More

This week's presentation: http://metalab.unc.edu/xml/slides/extreme/jdom/
Last week's presentation: http://metalab.unc.edu/xml/slides/xmlsig/jdom/
JavaWorld: http://javaworld.com/javaworld/jw-05-2000/jw-0518-jdom.html
JDOM Web Site: http://www.jdom.org/
Java and XML, Brett McLaughlin, O'Reilly & Associates, 2000, ISBN 0-596-00016-2, http://www.oreilly.com/catalog/javaxml/

Questions?
