XOM Makes XML Easier

Elliotte Rusty Harold

XML Developers Network of the Capital District

Tuesday, January 17, 2006

elharo@metalab.unc.edu

http://www.xom.nu/


Outline


A few opinions

XML was, as has been fretted over before, ugly, hard, and boring to code with. Not any more :). XOM rocks! I'm using it in all my projects now.

Keep it up!

--Patrick Collison


I did some XML Programming during the last month with Java's DOM. This was not funny!! I also played with Ruby's powerful REXML. This is a great API because it uses the power of Ruby and it was designed for Ruby and is not a generic interface like DOM. This is why REXML is so popular in the Ruby world.

And this is why I like XOM. For me it fits much better to Java than DOM. I hope that XOM will become for Java what REXML is for Ruby now.

--Markus Jais


Overall, I found XOM to be an amazingly well-organized, intuitive API that's easy to learn and to use. I like how it reinforces good practices and provides insight about XML -- such as the lack of whitespace when XML is produced without a serializer and the identical treatment of text whether it consists of character entities, CDATA sections, or regular characters.

I can't compare it to JDOM, but it's appreciably more pleasant to work with than the Simple API for XML Processing.

--Rogers Cadenhead


i spent yesterday writing the code to render my application config as xml. using xom was like falling off a log. no muss, no fuss, the methods did what i expected, and any confusion was quickly ironed out by a visit to the (copious) examples, or the javadocs. i did run into what might be a bug, but it only showed up because i made a dumb cut-n-paste error (see my other email).

after i get the output tidied up, i'll move on to reading it back in. i'm confident that that will be almost as easy...

--Dirk Bergstrom


XOM has the best API ever.

In my app we churn business objects into XHTML then XSL:FO and finally PDF. XOM makes it super easy to build the XHTML tree. And if I play my cards right, I might be able to turn that XHTML into FO without serializing it to bytes first. Amazing.

XOM makes XML fun again! Get rid of SAX, DOM and hardcoded "<html>". Get XOM, be happy.

--Jesse Wilson


I just started to use XOM in my beanshell scripts and have found it intuitive and very simple to use. It produces code that is very clear at a higher level of abstraction than I usually am forced to work.

--Gary Furash


XOM is the most correct and easiest to use XML tree and streaming API I've come across so far.

--Wolfgang Hoschek


Current version


Why Me?


Four Styles of XML API


Push APIs


Pull APIs


Data Binding APIs


Tree APIs


DOM


DOM Ugliness


Reasons for DOM Ugliness


What I learned from DOM


JDOM


Is JDOM too Java-centric?


What I learned from JDOM


dom4j


Conclusion: We can do better


nu.xom: A New XML Object Model


Design Goals


Design Principles


Principles of API Design


XML Principles


Java Design Principles


Development Style


Create and serialize a document

import java.math.BigInteger;
import nu.xom.Element;
import nu.xom.Document;

public class FibonacciXML {

  public static void main(String[] args) {
   
      BigInteger low  = BigInteger.ONE;
      BigInteger high = BigInteger.ONE;      
      
      Element root = new Element("Fibonacci_Numbers");  
      for (int i = 1; i <= 10; i++) {
        Element fibonacci = new Element("fibonacci");
        fibonacci.appendChild(low.toString());
        root.appendChild(fibonacci);
		
        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }
      Document doc = new Document(root);
      System.out.println(doc.toXML());  

  }

}

FibonacciXML Output

% java -classpath ~/XOM/build/classes:. FibonacciXML
<?xml version="1.0"?>
<Fibonacci_Numbers><fibonacci>1</fibonacci><fibonacci>1</fibonacci><fibonacci>2</fibonacci><fibonacci>3</fibonacci><fibonacci>5</fibonacci><fibonacci>8</fibonacci><fibonacci>13</fibonacci><fibonacci>21</fibonacci><fibonacci>34</fibonacci><fibonacci>55</fibonacci></Fibonacci_Numbers>

Parsing a document

try {
  Builder parser = new Builder();
  Document doc = parser.build(url);
  System.out.println(doc.toXML());
}
catch (ParsingException ex) {
  System.out.println(url + " is not well-formed.");
  System.out.println(ex.getMessage());
}
catch (IOException ex) { 
  System.out.println("Due to an IOException, "
  + "the parser could not check " + url); 
}

The Node Class

public abstract class Node {

  public       String     getValue();
  public final Document   getDocument();
  public       String     getBaseURI();
  public final ParentNode getParent();
  public       Node       getChild(int position);
  public       int        getChildCount();

  public final void       detach();
  public       Node       copy();    
  public       String     toXML(); 
  
  public final boolean    equals(Object o);
  public final int        hashCode();
      
}

Example: PropertyPrinter

import java.io.*;
import nu.xom.*;

public class PropertyPrinter {

    private Writer out;
    
    public PropertyPrinter(Writer out) {
      if (out == null) {
        throw new NullPointerException("Writer must be non-null.");
      }
      this.out = out;
    }
    
    public PropertyPrinter() {
      this(new OutputStreamWriter(System.out));
    }
    
    private int nodeCount = 0;
    
    public void writeNode(Node node) throws IOException {
      
        if (node == null) {
            throw new NullPointerException("Node must be non-null.");
        }
        if (node instanceof Document) { 
            // starting a new document, reset the node count
            nodeCount = 1; 
        }
      
        String type      = node.getClass().getName(); // never null
        String value     = node.getValue();
        
        String name      = null; 
        String localName = null;
        String uri       = null;
        String prefix    = null;

        if (node instanceof Element) {
            Element element = (Element) node;
            name = element.getQualifiedName();
            localName = element.getLocalName();
            uri = element.getNamespaceURI();
            prefix = element.getNamespacePrefix();
        }
        else if (node instanceof Attribute) {
            Attribute attribute = (Attribute) node;
            name = attribute.getQualifiedName();
            localName = attribute.getLocalName();
            uri = attribute.getNamespaceURI();
            prefix = attribute.getNamespacePrefix();
        }
      
        StringBuffer result = new StringBuffer();
        result.append("Node " + nodeCount + ":\r\n");
        result.append("  Type: " + type + "\r\n");
        if (name != null) {
            result.append("  Name: " + name + "\r\n");
        }
        if (localName != null) {
            result.append("  Local Name: " + localName + "\r\n");
        }
        if (prefix != null) {
            result.append("  Prefix: " + prefix + "\r\n");
        }
        if (uri != null) {
            result.append("  Namespace URI: " + uri + "\r\n");
        }
        if (value != null) {
            result.append("  Value: " + value + "\r\n");
        }
      
        out.write(result.toString());
        out.write("\r\n");
        out.flush();
      
        nodeCount++;
      
    }
    
    public static void main(String[] args) throws Exception {
     
      Builder builder = new Builder();
      for (int i = 0; i < args.length; i++) {
          PropertyPrinter p = new PropertyPrinter();
          File f = new File(args[i]);
          Document doc = builder.build(f);
          p.writeNode(doc);
      }   
        
    }
}

PropertyPrinter Output

% java -classpath ~/XOM/build/classes:. PropertyPrinter hotcop.xml
Node 1:
  Type: nu.xom.Document
  Value:
  Hot Cop

  Jacques Morali
  Henri Belolo
  Victor Willis
  Jacques Morali


    A & M Records

  6:20
  1978
  Village People

Example: TreeReporter

import java.io.IOException;
import nu.xom.*;

public class TreeReporter {

    public static void main(String[] args) {
     
        if (args.length <= 0) {
          System.out.println("Usage: java TreeReporter URL");
          return; 
        }
         
        TreeReporter iterator = new TreeReporter();
        try {
          Builder parser = new Builder();
          
          // Read the entire document into memory
          Node document = parser.build(args[0]); 
          
          // Process it starting at the root
          iterator.followNode(document);
    
        }
        catch (IOException ex) { 
          System.out.println(ex); 
        }
        catch (ParsingException ex) { 
          System.out.println(ex); 
        }
  
    } // end main

    private PropertyPrinter printer = new PropertyPrinter();
  
    // note use of recursion
    public void followNode(Node node) throws IOException {
    
        printer.writeNode(node);
        for (int i = 0; i < node.getChildCount(); i++) {
            followNode(node.getChild(i));
        }
    
    }
}

TreeReporter Output

% java -classpath ~/XOM/build/classes:. TreeReporter hotcop.xml
Node 1:
  Type: nu.xom.Document
  Value:
  Hot Cop

  Jacques Morali
  Henri Belolo
  Victor Willis
  Jacques Morali


    A & M Records

  6:20
  1978
  Village People


Node 2:
  Type: nu.xom.ProcessingInstruction
  Value: type="text/css" href="song.css"

Node 3:
  Type: nu.xom.DocType
  Value:

Node 4:
  Type: nu.xom.Element
  Name: SONG
  Local Name: SONG
  Prefix:
  Namespace URI: http://metalab.unc.edu/xml/namespace/song
  Value:
  Hot Cop

  Jacques Morali
  Henri Belolo
  Victor Willis
  Jacques Morali


    A & M Records

  6:20
  1978
  Village People

...

The Document Class

package nu.xom;

public class Document extends ParentNode {

  public Document(Element root);
  public Document(Document doc);
  
  public final DocType getDocType() ;
  public final Element getRootElement();
  public       void    setRootElement(Element root);
  public       void    setBaseURI(String URI);
  public final String  getBaseURI();
  
  public       void    insertChild(int position, Node c);
  public       void    removeChild(int position);
  public       void    removeChild(Node child);

  public final String  getValue() ;
  public final String  toXML();
  public       Node    copy();
  
}

Example: Validating XHTML


Verify Root Element is html in the XHTML namespace

      boolean valid = true;       
      DocType doctype = document.getDocType();
    
      if (doctype == null) {
        valid = false;
      }
      else {
        // check doctype
      }
    
      Element root = document.getRootElement();
      String uri = root.getNamespaceURI();
      String prefix = root.getNamespacePrefix();
      if (!uri.equals("http://www.w3.org/1999/xhtml")) {
        valid = false;
      }
      if (!prefix.equals("")) valid = false;

The Element Class


Element Constructors:

    public Element(String name);
    public Element(String name, String uri);
    public Element(Element element);

    Element para = new Element("para");
    Element p = new Element("p", "http://www.w3.org/1999/xhtml");
    Element text = new Element("svg:text", "http://www.w3.org/2000/svg");

Element Properties


Methods to get child elements

    public final Elements getChildElements(String name);
    public final Elements getChildElements(String localName, String namespace);
    public final Element  getFirstChildElement(String name);
    public final Element  getFirstChildElement(String localName, String namespace);

The Elements class

public final class Elements {

    public int     size();
    public Element get(int index);
    
}

Recursive Descent

public void process(Element element) {

  Elements children = element.getChildElements();
  for (int i = 0; i < children.size(); i++) {
    process(children.get(i));
  }

}

Example: TreeViewer

import javax.swing.*;
import javax.swing.tree.*;
import nu.xom.*;

public class TreeViewer {

    // Initialize the per-element data structures
    public static MutableTreeNode processElement(Element element) {

        String data;
        if (element.getNamespaceURI().equals(""))
            data = element.getLocalName();
        else {
            data =
                '{'
                    + element.getNamespaceURI()
                    + "} "
                    + element.getQualifiedName();
        }

        MutableTreeNode node = new DefaultMutableTreeNode(data);
        Elements children = element.getChildElements();
        for (int i = 0; i < children.size(); i++) {
            node.insert(processElement(children.get(i)), i);
        }

        return node;

    }

    public static void display(Document doc) {

        Element root = doc.getRootElement();
        JTree tree = new JTree(processElement(root));
        JScrollPane treeView = new JScrollPane(tree);
        JFrame f = new JFrame("XML Tree");


        String version = System.getProperty("java.version");
        if (version.startsWith("1.2") || version.startsWith("1.1")) {
            f.setDefaultCloseOperation(JFrame.HIDE_ON_CLOSE); 
        }
        else {
            // JFrame.EXIT_ON_CLOSE == 3 but this named constant is not
            // available in Java 1.2
            f.setDefaultCloseOperation(3);
        }
        f.getContentPane().add(treeView);
        f.pack();
        f.show();

    }

    public static void main(String[] args) {

        try {
            Builder builder = new Builder();
            for (int i = 0; i < args.length; i++) {
                Document doc = builder.build(args[i]);
                display(doc);
            }
        }
        catch (Exception ex) {
            System.err.println(ex);
        }

    } // end main()

} // end TreeViewer

Attribute Methods on Element

    public       void      addAttribute(Attribute attribute);
    public       void      removeAttribute(Attribute attribute);
    public final Attribute getAttribute(String name);
    public final Attribute getAttribute(String localName, String namespaceURI);
    public final String    getAttributeValue(String name);
    public final String    getAttributeValue(String localName, String namespaceURI);
    public final int       getAttributeCount();
    public final Attribute getAttribute(int i);

Example: IDTagger

import java.io.IOException;
import nu.xom.*;

public class IDTagger {

  private static int id = 1;

  public static void processElement(Element element) {
    
    if (element.getAttribute("ID") == null) {
      element.addAttribute(new Attribute("ID", "_" + id));
      id = id + 1; 
    }
    
    // recursion
    Elements children = element.getChildElements();
    for (int i = 0; i < children.size(); i++) {
      processElement(children.get(i));   
    }
    
  }

  public static void main(String[] args) {
     
    Builder builder = new Builder();
    
    for (int i = 0; i < args.length; i++) {
        
      try {
        // Read the entire document into memory
        Document document = builder.build(args[i]); 
       
        processElement(document.getRootElement());
        
        System.out.println(document.toXML());         
      }
      catch (ParsingException ex) {
        System.err.println(ex);
        continue; 
      }
      catch (IOException ex) {
        System.err.println(ex);
        continue; 
      }
      
    }
  
  } // end main

}

Additional Namespaces

public void addNamespaceDeclaration(String prefix, String URI);
public void removeNamespaceDeclaration(String prefix);

Enumerating Namespaces


The Text Class

package nu.xom;

public class Text extends Node {

  public Text(String data);
  public Text(Text text);

  public       void   setValue(String data);
  public final String getValue();
  
  public final Node getChild(int i);
  public final int  getChildCount();

  public final String toString();

  public       Node    copy();
  public final String  toXML();

}

ROT13XML

import java.io.IOException;
import nu.xom.*;

public class ROT13XML {

    // note use of recursion
    public static void encode(Node node) {
    
        if (node instanceof Text) {
          Text text = (Text) node;
          String data = text.getValue();
          text.setValue(rot13(data));
        }
        
        // recurse the children
        for (int i = 0; i < node.getChildCount(); i++) {
            encode(node.getChild(i));
        } 
    
    }
  
    public static String rot13(String s) {
    
        StringBuffer out = new StringBuffer(s.length());
        for (int i = 0; i < s.length(); i++) {
          int c = s.charAt(i);
          if (c >= 'A' && c <= 'M') out.append((char) (c+13));
          else if (c >= 'N' && c <= 'Z') out.append((char) (c-13));
          else if (c >= 'a' && c <= 'm') out.append((char) (c+13));
          else if (c >= 'n' && c <= 'z') out.append((char) (c-13));
          else out.append((char) c);
        } 
        return out.toString();
    
    }

  public static void main(String[] args) {

    if (args.length <= 0) {
      System.out.println("Usage: java ROT13XML URL");
      return;
    }
    
    String url = args[0];
    
    try {
      Builder parser = new Builder();
      
      // Read the document
      Document document = parser.build(url); 
      
      // Modify the document
      ROT13XML.encode(document);

      // Write it out again
      System.out.println(document.toXML());

    }
    catch (IOException ex) { 
      System.out.println(
      "Due to an IOException, the parser could not encode " + url
      ); 
    }
    catch (ParsingException ex) { 
      System.out.println(ex);
    }
     
  } // end main

}

ROT13XML Output

% java -classpath ~/XOM/build/classes:. ROT13XML hotcop.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="song.css"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG xmlns="http://metalab.unc.edu/xml/namespace/song" xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Ubg Pbc</TITLE>
  <PHOTO xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg" ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200" />
  <COMPOSER>Wnpdhrf Zbenyv</COMPOSER>
  <COMPOSER>Uraev Orybyb</COMPOSER>
  <COMPOSER>Ivpgbe Jvyyvf</COMPOSER>
  <PRODUCER>Wnpdhrf Zbenyv</PRODUCER>
  <!-- The publisher is actually Polygram but I needed
       an example of a general entity reference. -->
  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    N &amp; Z Erpbeqf
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Ivyyntr Crbcyr</ARTIST>
</SONG>
<!-- You can tell what album I was
     listening to when I wrote this example -->
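
The character rotation used by ROT13XML can be checked on its own, without XOM on the classpath. The following is a minimal standalone sketch, not part of the XOM samples; the class name Rot13Check is hypothetical and the method simply repeats the rot13 logic from the listing above. It illustrates that the encoding is its own inverse, so running ROT13XML on its own output should restore the original text.

```java
public class Rot13Check {

    // Same letter rotation as ROT13XML.rot13; non-letters pass through unchanged
    public static String rot13(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if ((c >= 'A' && c <= 'M') || (c >= 'a' && c <= 'm')) {
                out.append((char) (c + 13));
            }
            else if ((c >= 'N' && c <= 'Z') || (c >= 'n' && c <= 'z')) {
                out.append((char) (c - 13));
            }
            else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String title = "Hot Cop";
        String encoded = rot13(title);
        System.out.println(encoded);         // Ubg Pbc, matching the TITLE element above
        System.out.println(rot13(encoded));  // Hot Cop
    }
}
```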

The Attribute Class


Attribute Constructors

   public Attribute(String localName, String value);
   public Attribute(String localName, String value, Type type);
   public Attribute(String name, String URI, String value, Type type);
   public Attribute(Attribute attribute);

Attribute Getter and Setter Methods

public final Type   getType();
public       void   setType(Type type);
public final String getValue();
public       void   setValue(String value);
public final String getLocalName();
public       void   setLocalName(String localName);
public final String getQualifiedName();
public final String getNamespaceURI();
public final String getNamespacePrefix();
public       void   setNamespace(String prefix, String URI);

Example: XLinkSpider

import java.net.*;
import java.util.*;
import nu.xom.*;

public class XLinkSpider {

    private Set spidered = new HashSet();
    private Builder parser = new Builder();
    private List queue = new LinkedList();
    
    public static final String XLINK_NS 
      = "http://www.w3.org/1999/xlink";
    
    public void search(URL url) {
        try {
            String systemID = url.toExternalForm();
            Document doc = parser.build(systemID);
            System.out.println(url);
            search(doc.getRootElement());
        }
        catch (Exception ex) {
            // just skip this document
        }
        
        if (queue.isEmpty()) return;
        
        URL discovered = (URL) queue.remove(0);
        spidered.add(discovered);
        search(discovered);      
    }

    private void search(Element element) {

        Attribute href = element.getAttribute("href", XLINK_NS);
        
        URL base = null;
        try {
            base = new URL(element.getBaseURI());
        }
        catch (MalformedURLException ex) {
            // Probably just no protocol handler for the 
            // kind of URLs used inside this element
            return;
        }
        if (href != null) {
            String uri = href.getValue();
            // absolutize URL
            try {
                URL discovered = new URL(base, uri);
                // remove fragment identifier if any
                discovered = new URL(
                  discovered.getProtocol(),
                  discovered.getHost(),
                  discovered.getFile()
                );
                
                if (!spidered.contains(discovered) 
                  && !queue.contains(discovered)) {
                    queue.add(discovered);   
                }
            }
            catch (MalformedURLException ex) {
                // skip this one   
            }
        }
        Elements children = element.getChildElements();
        for (int i = 0; i < children.size(); i++) {
            search(children.get(i));
        }
        
    }

    public static void main(String[] args) {
      
        XLinkSpider spider = new XLinkSpider();
        for (int i = 0; i < args.length; i++) { 
            try { 
                spider.search(new URL(args[i]));
            }
            catch (MalformedURLException ex) {
                System.err.println(ex);   
            }
        }
      
    }  // end main()

}

XLinkSpider Output

% java -classpath ~/XOM/build/classes:. XLinkSpider http://www.rddl.org
http://www.rddl.org
http://www.rddl.org/purposes
http://www.rddl.org/rddl.rdfs
http://www.rddl.org/rddl-integration.rxg
http://www.rddl.org/modules/rddl-1.rxm
http://www.rddl.org/modules/xhtml-attribs-1.rxm
http://www.rddl.org/modules/xhtml-base-1.rxm
http://www.rddl.org/modules/xhtml-basic-form-1.rxm
http://www.rddl.org/modules/xhtml-basic-table-1.rxm
http://www.rddl.org/modules/xhtml-basic10-model-1.rxm
http://www.rddl.org/modules/xhtml-basic10.rxm
http://www.rddl.org/modules/xhtml-blkphras-1.rxm
http://www.rddl.org/modules/xhtml-blkstruct-1.rxm
http://www.rddl.org/modules/xhtml-for-rddl.rxm
http://www.rddl.org/modules/xhtml-framework-1.rxm
http://www.rddl.org/modules/xhtml-hypertext-1.rxm
http://www.rddl.org/modules/xhtml-image-1.rxm
http://www.rddl.org/modules/xhtml-inlphras-1.rxm
http://www.rddl.org/modules/xhtml-inlstruct-1.rxm
http://www.rddl.org/modules/xhtml-link-1.rxm
http://www.rddl.org/modules/xhtml-list-1.rxm
http://www.rddl.org/modules/xhtml-meta-1.rxm
...
http://www.w3.org/TR/xhtml-basic
http://www.w3.org/TR/xml-infoset/
http://www.w3.org/tr/xhtml1
http://www.w3.org/TR/xhtml-modularization/
http://www.rddl.org/purposes/software
http://www.ascc.net/xml/schematron
http://www.w3.org/2001/XMLSchema
http://www.examplotron.org
...
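
One detail of the spider worth a second look is how links are absolutized and stripped of fragment identifiers. Rebuilding the URL from protocol, host, and file drops any non-default port, and java.net.URL's equals and hashCode can trigger host name resolution when URLs are stored in a HashSet. The sketch below is one possible alternative using java.net.URI; it is not the approach taken in XLinkSpider. The class name LinkNormalizer and the port 8080 address are made up for illustration.

```java
import java.net.URI;

public class LinkNormalizer {

    // Resolve href against base, then drop the fragment identifier if there is one.
    // URI.create throws IllegalArgumentException for strings that are not legal URIs.
    public static URI normalize(String base, String href) {
        URI resolved = URI.create(base).resolve(href);
        String s = resolved.toString();
        int hash = s.indexOf('#');
        return hash < 0 ? resolved : URI.create(s.substring(0, hash));
    }

    public static void main(String[] args) {
        // Fragment removed: http://www.rddl.org/purposes
        System.out.println(normalize("http://www.rddl.org/", "purposes#software"));
        // Port kept (hypothetical address): http://www.rddl.org:8080/modules/rddl-1.rxm
        System.out.println(normalize("http://www.rddl.org:8080/modules/", "rddl-1.rxm"));
    }
}
```

Because URI compares by syntax rather than by resolved address, a Set of URI values could stand in for the spidered set without network lookups during comparison.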

Attribute.Type


The ProcessingInstruction Class

package nu.xom;

public class ProcessingInstruction extends Node {

  public ProcessingInstruction(String target, String data);
  public ProcessingInstruction(ProcessingInstruction instruction);

  public final String getTarget();
  public       void   setTarget(String target);
  public final String getValue();
  public       void   setValue(String data);
  
  public final Node getChild(int i);
  public final int  getChildCount();

  public final Node   copy();
  public final String toXML();

  public final String toString();

}

Example: PoliteSpider

import java.net.*;
import java.util.*;
import nu.xom.*;

public class PoliteSpider {

    private Set spidered = new HashSet();
    private Builder parser = new Builder();
    private List queue = new LinkedList();
    
    public static final String XLINK_NS 
     = "http://www.w3.org/1999/xlink";
    
    public void search(URL url) {
        
        try {
            String systemID = url.toExternalForm();
            Document doc = parser.build(systemID);
            
            boolean follow = true;
            boolean index = true;
            for (int i = 0; i < doc.getChildCount(); i++) {
                Node child = doc.getChild(i); 
                if (child instanceof Element) break;  
                if (child instanceof ProcessingInstruction){
                    ProcessingInstruction instruction 
                      = (ProcessingInstruction) child;
                    if (instruction.getTarget().equals("robots")) {
                        Element data 
                          = PseudoAttributes.getAttributes(instruction); 
                        Attribute indexAtt = data.getAttribute("index"); 
                        if (indexAtt != null) {
                            String value = indexAtt.getValue().trim();
                            if (value.equals("no")) index = false;
                        }
                        Attribute followAtt = data.getAttribute("follow"); 
                        if (followAtt != null) {
                            String value = followAtt.getValue().trim();
                            if (value.equals("no")) follow = false;
                        }
                    }   
                }  
            }
            
            if (index) System.out.println(url);
            if (follow) search(doc.getRootElement());
        }
        catch (Exception ex) {
            // just skip this document
        }
        
        if (queue.isEmpty()) return;
        
        URL discovered = (URL) queue.remove(0);
        spidered.add(discovered);
        search(discovered);      
        
    }

    private void search(Element element) {

        Attribute href = element.getAttribute("href", XLINK_NS);
        
        URL base = null;
        try {
            base = new URL(element.getBaseURI());
        }
        catch (MalformedURLException ex) {
            // Probably just no protocol handler for the 
            // kind of URLs used inside this element
            return;
        }
        if (href != null) {
            String uri = href.getValue();
            // absolutize URL
            try {
                URL discovered = new URL(base, uri);
                // remove fragment identifier if any
                discovered = new URL(
                  discovered.getProtocol(),
                  discovered.getHost(),
                  discovered.getFile()
                );
                
                if (!spidered.contains(discovered) 
                  && !queue.contains(discovered)) {
                    queue.add(discovered);   
                }
            }
            catch (MalformedURLException ex) {
                // skip this one   
            }
        }
        Elements children = element.getChildElements();
        for (int i = 0; i < children.size(); i++) {
            search(children.get(i));
        }
    }

    public static void main(String[] args) {
      
        PoliteSpider spider = new PoliteSpider();
        for (int i = 0; i < args.length; i++) { 
            try { 
                spider.search(new URL(args[i]));
            }
            catch (MalformedURLException ex) {
                System.err.println(ex);   
            }
        }
      
    } // end main()
}
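
PoliteSpider relies on a PseudoAttributes helper that is not shown on these slides. As a rough idea of what such a helper has to do, here is a hypothetical stand-in that pulls name="value" pairs out of processing instruction data with a regular expression. It returns a Map rather than an Element so that it runs without XOM, and it makes no attempt to handle entity references or malformed data; the class name PseudoAttributeSketch is an assumption, not part of XOM.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PseudoAttributeSketch {

    // name = "value" or name = 'value'; group 1 is the name,
    // group 2 a double-quoted value, group 3 a single-quoted value
    private static final Pattern PAIR = Pattern.compile(
        "([A-Za-z_][\\w.-]*)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    public static Map<String, String> parse(String data) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(data);
        while (m.find()) {
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            result.put(m.group(1), value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Data of a hypothetical <?robots index="no" follow="yes"?> instruction
        Map<String, String> robots = parse("index=\"no\" follow=\"yes\"");
        boolean index = !"no".equals(robots.get("index"));
        boolean follow = !"no".equals(robots.get("follow"));
        System.out.println("index: " + index + ", follow: " + follow);
    }
}
```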

The DocType Class

public class DocType extends Node {

 public DocType(String rootElementName, String publicID, String systemID);
 public DocType(String rootElementName, String systemID);
 public DocType(String rootElementName);
 public DocType(DocType doctype);
    
 public       void   setRootElementName(String name);
 public final String getRootElementName();
 public final String getInternalDTDSubset();
 public       void   setInternalDTDSubset(String subset); // 1.1 and later
 public       void   setPublicID(String id);
 public final String getPublicID();
 public       void   setSystemID(String id);
 public final String getSystemID();
 
 public final Node getChild(int i);
 public final int  getChildCount();

 public final Node   copy();
 public final String toXML();
 
}

Validating XHTML


Three XHTML DTDs:


XHTMLValidator

import java.io.IOException;
import nu.xom.*;

public class XHTMLValidator {

  public static void main(String[] args) {
    
    for (int i = 0; i < args.length; i++) {
      validate(args[i]);
    }   
    
  }

  private static Builder builder = new Builder(true);
                         /* turn on validation ^^^^ */
  
  // not thread safe
  public static void validate(String source) {
        
      Document document;
      try {
        document = builder.build(source); 
      }
      catch (ParsingException ex) {  
        System.out.println(source 
         + " is invalid XML, and thus not XHTML."); 
        return; 
      }
      catch (IOException ex) {  
        System.out.println("Could not read: " + source); 
        return; 
      }
      
      // If we get this far, then the document is valid XML.
      // Check to see whether the document is actually XHTML 
      boolean valid = true;       
      DocType doctype = document.getDocType();
    
      if (doctype == null) {
        System.out.println("No DOCTYPE");
        valid = false;
      }
      else {
        // verify the DOCTYPE
        String name     = doctype.getRootElementName();
        String publicID = doctype.getPublicID();
      
        if (!name.equals("html")) {
          System.out.println(
           "Incorrect root element name " + name);
          valid = false;
        }
    
        if (publicID == null
         || (!publicID.equals("-//W3C//DTD XHTML 1.0 Strict//EN")
           && !publicID.equals(
            "-//W3C//DTD XHTML 1.0 Transitional//EN")
           && !publicID.equals(
            "-//W3C//DTD XHTML 1.0 Frameset//EN"))) {
          valid = false;
          System.out.println(source 
           + " does not seem to use an XHTML 1.0 DTD");
        }
      }
    
    
      // Check the namespace on the root element
      Element root = document.getRootElement();
      String uri = root.getNamespaceURI();
      String prefix = root.getNamespacePrefix();
      if (!uri.equals("http://www.w3.org/1999/xhtml")) {
        valid = false;
        System.out.println(source 
         + " does not properly declare the"
         + " http://www.w3.org/1999/xhtml namespace"
         + " on the root element");        
      }
      if (!prefix.equals("")) {
        valid = false;
        System.out.println(source 
         + " does not use the empty prefix for XHTML");        
      }
      
      if (valid) System.out.println(source + " is valid XHTML.");
    
  }

}

XHTMLValidator Output

% java -classpath ~/XOM/build/classes:. XHTMLValidator http://www.w3.org/ http://www.cafeconleche.org/
http://www.w3.org/ is valid XHTML.
http://www.cafeconleche.org/ is invalid XML, and thus not XHTML.
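
The chained comparison of public identifiers in validate() could also be written as a set lookup, which may be easier to extend if more DTDs need to be accepted. This is a small sketch of that idea using only the three public IDs that appear in XHTMLValidator; the class and method names are hypothetical, and the sketch does not touch the namespace checks.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class XhtmlDoctypes {

    // The three XHTML 1.0 public identifiers checked by XHTMLValidator
    private static final Set<String> XHTML_1_0_PUBLIC_IDS =
        Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "-//W3C//DTD XHTML 1.0 Strict//EN",
            "-//W3C//DTD XHTML 1.0 Transitional//EN",
            "-//W3C//DTD XHTML 1.0 Frameset//EN")));

    // A null public ID (no PUBLIC identifier in the DOCTYPE) is never XHTML 1.0
    public static boolean isXhtml10(String publicID) {
        return publicID != null && XHTML_1_0_PUBLIC_IDS.contains(publicID);
    }

    public static void main(String[] args) {
        System.out.println(isXhtml10("-//W3C//DTD XHTML 1.0 Strict//EN")); // true
        System.out.println(isXhtml10(null));                               // false
    }
}
```

Inside validate(), the call would take the value returned by doctype.getPublicID().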

The Comment Class

package nu.xom;

public class Comment extends Node {

  public Comment(String data);
  public Comment(Comment comment);

  public final String getValue();
  public       void   setValue(String data);
  
  public final Node getChild(int i);
  public final int  getChildCount();
  
  public final Node   copy();
  public final String toXML();
  
  public final String toString();
	
}

Example: CommentReader

import java.io.IOException;
import nu.xom.*;

public class CommentReader {

    public static void list(Node node) {
        
        for (int i = 0; i < node.getChildCount(); i++) {           
            Node child = node.getChild(i);
            if (child instanceof Comment) {
                System.out.println(child.toXML());
            }
            else {
                list(child);   
            }
        }
        
    } 

    public static void main(String[] args) {
  
        if (args.length <= 0) {
          System.out.println("Usage: java CommentReader URL");
          return;
        }
        
        try {
          Builder parser = new Builder();
          Document doc = parser.build(args[0]);
          list(doc);
        }
        catch (ParsingException ex) {
          System.out.println(args[0] + " is not well-formed.");
          System.out.println(ex.getMessage());
        }
        catch (IOException ex) { 
          System.out.println(
           "Due to an IOException, the parser could not read " 
           + args[0]
          ); 
        }
  
    }

}

CommentReader Output

% java -classpath ~/XOM/build/classes:. CommentReader http://www.w3.org/TR/2004/REC-DOM-Level-3-Val-20040127/xml-source.xml
<!-- $Id: xml-source.xml,v 1.7 2004/01/26 22:31:28 plehegar Exp $ -->
<!--
  *************************************************************************
  * FRONT MATTER                                                          *
  *************************************************************************
  -->
<!--
  ******************************************************
  | filenames to be used for each section              |
  ******************************************************
-->
<!--
    ******************************************************
    * DOCUMENT ABSTRACT                                  *
    ******************************************************
    -->
<!-- $Id: xml-source.xml,v 1.7 2004/01/26 22:31:28 plehegar Exp $ -->
<!-- $Id: xml-source.xml,v 1.7 2004/01/26 22:31:28 plehegar Exp $ -->
<!--
 *************************************************************************
 * BEGINNING OF COPYRIGHT NOTICE                                         *
 *************************************************************************
-->
<!--
 *************************************************************************
 * END OF COPYRIGHT NOTICE                                               *
 *************************************************************************
-->
<!-- $Id: xml-source.xml,v 1.7 2004/01/26 22:31:28 plehegar Exp $ -->
<!--
 *************************************************************************
 * BEGINNING OF VALIDATION
 *************************************************************************
-->
<!--
  ******************************************************
  Last known edit 12/03/2003
  Suggestions welcome, especially if accompanied by
  proposed revisions already marked up as per spec.dtd!
  ******************************************************
  -->
<!--
  ******************************************************
  | OVERVIEW                                            |
  ******************************************************
  -->
<!--
  ******************************************************
  | ISSUES                                             |
  ******************************************************
<div2 id="Level-3-VAL-Issue-List">
  <head>Issue List</head>

  <div3 id="VAL-Issues-List-Resolved">
    <head>Resolved Issues</head>

    <issue id="VAL-Issue-8" status="open">
      <p>For Validation interfaces there should be no dependency on DOM Core.
      </p>
      <p>The <code>NodeEditVAL</code> interface will not extend DOM Core.  It is simply an object that expresses similar interfaces.</p>
    </issue>

  </div3>

-->...

The Builder Class

package nu.xom;

public class Builder {

    public Builder();
    public Builder(boolean validate);
    public Builder(boolean validate, NodeFactory factory);
    public Builder(XMLReader parser);
    public Builder(NodeFactory factory);
    public Builder(XMLReader parser, boolean validate);
    public Builder(XMLReader parser, boolean validate, NodeFactory factory);
    
    public Document build(String systemID) 
      throws ParsingException, ValidityException, IOException;
    public Document build(InputStream in) 
      throws ParsingException, ValidityException, IOException;
    public Document build(InputStream in, String baseURI) 
      throws ParsingException, ValidityException, IOException;
    public Document build(File in) 
      throws ParsingException, ValidityException, IOException;
    public Document build(Reader in) 
      throws ParsingException, ValidityException, IOException;
    public Document build(Reader in, String baseURI) 
      throws ParsingException, ValidityException, IOException;
    public Document build(String document, String baseURI) 
      throws ParsingException, ValidityException, IOException;
      
    public NodeFactory getNodeFactory();
    
}
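
Of the build() overloads listed above, the one taking a String document is handy when the XML is already in memory. A minimal sketch, assuming XOM is on the classpath; the class name InMemoryBuild, the rootName method, and the base URI http://www.example.com/ are placeholders chosen for illustration.

```java
import java.io.IOException;
import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.ParsingException;

// Hypothetical demo class; Builder, Document, and ParsingException are XOM's.
public class InMemoryBuild {

    // Placeholder base URI, used only to resolve relative references
    private static final String BASE = "http://www.example.com/";

    // Returns the root element's name, or null if the text is malformed.
    public static String rootName(String xml) {
        try {
            Builder parser = new Builder();
            Document doc = parser.build(xml, BASE);
            return doc.getRootElement().getQualifiedName();
        }
        catch (ParsingException ex) {
            return null;
        }
        catch (IOException ex) {
            // Not expected when the document is already in memory
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootName("<greeting>Hello</greeting>"));
        System.out.println(rootName("<greeting>Hello</oops>"));
    }

}
```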

Example: Schema Validating

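import java.io.IOException;
import nu.xom.*;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class SchemaValidator {

    public static void main(String[] args) {
        if (args.length <= 0) {
          System.out.println("Usage: java SchemaValidator URL");
          return;
        }
        String url = args[0];

        try {
          // Schema validation is switched on through a Xerces-specific feature
          XMLReader xerces = XMLReaderFactory.createXMLReader(
           "org.apache.xerces.parsers.SAXParser");
          xerces.setFeature(
           "http://apache.org/xml/features/validation/schema",
            true);
          Builder parser = new Builder(xerces, true);
          parser.build(url);
          System.out.println(url + " is schema valid.");
        }
        catch (SAXException ex) {
          System.out.println("Could not load Xerces.");
        }
        catch (ParsingException ex) {
          System.out.println(url + " is not schema valid.");
          System.out.println(ex.getMessage());
        }
        catch (IOException ex) {
          System.out.println("Due to an IOException, Xerces could not check "
           + url);
        }

    }

}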

Serializer

public class Serializer {

    public Serializer(OutputStream out);
    public Serializer(OutputStream out, String encoding);
 
    public int     getIndent();
    public void    setIndent(int indent);
    public String  getLineSeparator();
    public void    setLineSeparator(String lineSeparator);
    public int     getMaxLength();
    public void    setMaxLength(int length);
    public boolean getPreserveBaseURI();
    public void    setPreserveBaseURI(boolean preserve);
    public boolean getUnicodeNormalizationFormC();
    public void    setUnicodeNormalizationFormC(boolean normalize);

    public void    write(Document doc) throws IOException;
    public void    flush() throws IOException;

}
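
A Serializer writes to any OutputStream, not only System.out. A minimal sketch, assuming XOM is on the classpath, that serializes a small hand-built document into a String; the class name SerializeToString and the order/item vocabulary are invented for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import nu.xom.Document;
import nu.xom.Element;
import nu.xom.Serializer;

// Hypothetical demo class; Document, Element, and Serializer are XOM's.
public class SerializeToString {

    // Builds a tiny document in memory instead of parsing one.
    public static Document sample() {
        Element root = new Element("order");
        Element item = new Element("item");
        item.appendChild("widget");
        root.appendChild(item);
        return new Document(root);
    }

    // Serializes the document as indented UTF-8 and returns the text.
    public static String serialize(Document doc) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Serializer serializer = new Serializer(out, "UTF-8");
            serializer.setIndent(2);
            serializer.setLineSeparator("\n");
            serializer.write(doc);
            serializer.flush();
            return out.toString("UTF-8");
        }
        catch (IOException ex) {
            // Not expected when writing to an in-memory stream
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.print(serialize(sample()));
    }

}
```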

Example: Pretty Printing

import java.io.IOException;
import nu.xom.*;

public class PrettyPrinter {

    public static void main(String[] args) {
  
        if (args.length <= 0) {
          System.out.println("Usage: java PrettyPrinter URL");
          return;
        }
        
        try {
          Builder parser = new Builder();
          Document doc = parser.build(args[0]);
          Serializer serializer = new Serializer(System.out, "ISO-8859-1");
          serializer.setIndent(4);
          serializer.setMaxLength(64);
          serializer.setPreserveBaseURI(true);
          serializer.write(doc);
          serializer.flush();
        }
        catch (ParsingException ex) {
          System.out.println(args[0] + " is not well-formed.");
          System.out.println(ex.getMessage());
        }
        catch (IOException ex) { 
          System.out.println(
           "Due to an IOException, the document could not be processed: " 
           + args[0]
          ); 
        }
  
    }

}

Encoding


Connecting to other Models


The Wrong Side of 80/20


Subclassing


NodeFactory

package nu.xom;

public class NodeFactory {

    public Element  makeRootElement(String name, String namespace);
    public Element  startMakingElement(String name, String namespace);
    public Nodes    finishMakingElement(Element element);
    
    public Document startMakingDocument();
    public void     finishMakingDocument(Document document);
    public Nodes    makeAttribute(String name, String URI, String value, Attribute.Type type);
    public Nodes    makeComment(String data);
    public Nodes    makeDocType(String rootElementName, String publicID, String systemID);
    public Nodes    makeText(String data);
    public Nodes    makeProcessingInstruction(String target, String data);
    
}
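
Overriding a single method is enough to filter a document while it is being built. A minimal sketch, assuming XOM is on the classpath, of a factory that drops every comment; the class name NoComments, the strip method, and the base URI are placeholders chosen for illustration.

```java
import java.io.IOException;
import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.NodeFactory;
import nu.xom.Nodes;
import nu.xom.ParsingException;

// Hypothetical factory; NodeFactory, Nodes, and Builder are XOM's.
public class NoComments extends NodeFactory {

    // An empty Nodes list means nothing is added to the tree
    public Nodes makeComment(String data) {
        return new Nodes();
    }

    // Parses the text with this factory and returns the root's markup.
    public static String strip(String xml) {
        try {
            Builder parser = new Builder(new NoComments());
            // Placeholder base URI for the in-memory document
            Document doc = parser.build(xml, "http://www.example.com/");
            return doc.getRootElement().toXML();
        }
        catch (ParsingException ex) {
            throw new IllegalArgumentException(ex.getMessage());
        }
        catch (IOException ex) {
            // Not expected when the document is already in memory
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(strip("<a><!--x-->b</a>"));
    }

}
```

All the other make methods keep their default behavior, so everything except comments ends up in the tree unchanged.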

Factories


Processing Arbitrarily Large Documents


Streaming Processing of Large Documents

import java.io.IOException;
import nu.xom.*;

public class RSSHeadlines extends NodeFactory {

    private boolean inTitle = false;
    private Nodes empty = new Nodes();

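    // Build only title elements; returning null tells the Builder
    // not to create an element for anything else.
    public Element startMakingElement(String name, String namespace) {
        if ("title".equals(name) ) {
            inTitle = true;
            return new Element(name, namespace);
        }
        return null;
    }

    // Print each title as soon as its end-tag is seen, then return an
    // empty list so the finished element is not kept in the tree.
    public Nodes finishMakingElement(Element element) {
        if ("title".equals(element.getQualifiedName()) ) {
            System.out.println(element.getValue());
            inTitle = false;
        }
        return empty;
    }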

    public Nodes makeComment(String data) {
        return empty;  
    }    

    public Element makeRootElement(String name, String namespace) {
        return new Element(name, namespace); 
    }

    public Nodes makeAttribute(String name, String namespace, 
      String value, Attribute.Type type) {
        return empty;
    }

    public Nodes makeDocType(String rootElementName, 
      String publicID, String systemID) {
        return empty;    
    }

    public Nodes makeProcessingInstruction(
      String target, String data) {
        return empty; 
    }    
    
    public static void main(String[] args) {
  
        String url = "http://www.bbc.co.uk/syndication/feeds/news/ukfs_news/world/rss091.xml";
        if (args.length > 0) {
          url = args[0];
        }
        
        try {
          Builder parser = new Builder(new RSSHeadlines());
          parser.build(url);
        }
        catch (ParsingException ex) {
          System.out.println(url + " is not well-formed.");
          System.out.println(ex.getMessage());
        }
        catch (IOException ex) { 
          System.out.println(
           "Due to an IOException, the parser could not read " + url
          ); 
        }
  
    }

}

Output of RSS Headlines

% java -classpath ~/XOM/build/classes:. RSSHeadlines
BBC News | World | UK Edition
BBC News
Ailing Pope to stay in hospital
UK's Kenya envoy in fresh attack
Tsunami survivors found on island
'Nepal crisis cabinet' unveiled
Bush to make key policy speech
Sunnis say Iraq poll illegitimate
Egypt to host Middle East summit
Germany renews pledge to Israel
US hostage photo 'is doll hoax'
Golf: Langer gives up captaincy
Football: Ref scandal escalates
Zimbabwe expels SA union leaders
Africans 'worst-hit by warming'
Clinton made UN's tsunami envoy
Jet skids off New Jersey runway
Trauma risk for tsunami survivors
US 'ties N Korea to nuclear deal'
Five million Germans out of work
Conference examines Roma plight
Syria and Jordan talk about peace
Ex-UN chief warns of water wars
South Asia group postpones talks
Couple arrested over tsunami baby
Heroes who defied the Holocaust
...

XPath Processing of Small-to-Medium Documents

import java.io.IOException;
import nu.xom.*;

public class XPathHeadlines {
    
    public static void main(String[] args) {
  
        String url = "http://www.bbc.co.uk/syndication/feeds/news/ukfs_news/world/rss091.xml";
        if (args.length > 0) {
          url = args[0];
        }
        
        try {
          Builder parser = new Builder();
          Document doc = parser.build(url);
          Nodes titles = doc.query("//title");
          for (int i = 0; i < titles.size(); i++) {
            System.out.println(titles.get(i).getValue());
          }
        }
        catch (ParsingException ex) {
          System.out.println(url + " is not well-formed.");
          System.out.println(ex.getMessage());
        }
        catch (IOException ex) { 
          System.out.println(
           "Due to an IOException, the parser could not read " + url
          ); 
        }
  
    }

}
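
One caveat: in XPath 1.0 an unprefixed name such as title matches only elements in no namespace, so //title finds nothing in an XHTML document. A minimal sketch, assuming XOM 1.1 is on the classpath, that binds a prefix with XPathContext; the class name NamespacedTitles and the prefix html are arbitrary choices made for illustration.

```java
import nu.xom.Document;
import nu.xom.Element;
import nu.xom.XPathContext;

// Hypothetical demo class; Document, Element, and XPathContext are XOM's.
public class NamespacedTitles {

    private static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Builds a tiny XHTML document in memory instead of parsing one.
    public static Document sample() {
        Element html = new Element("html", XHTML);
        Element head = new Element("head", XHTML);
        Element title = new Element("title", XHTML);
        title.appendChild("XOM Makes XML Easier");
        head.appendChild(title);
        html.appendChild(head);
        return new Document(html);
    }

    // Unprefixed query: only matches title elements in no namespace.
    public static int countUnprefixed(Document doc) {
        return doc.query("//title").size();
    }

    // Prefixed query: the context maps "html" to the XHTML namespace.
    public static int countPrefixed(Document doc) {
        XPathContext context = new XPathContext("html", XHTML);
        return doc.query("//html:title", context).size();
    }

    public static void main(String[] args) {
        Document doc = sample();
        System.out.println(countUnprefixed(doc));
        System.out.println(countPrefixed(doc));
    }

}
```

The prefix used in the query does not have to match any prefix in the document; only the namespace URI matters.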

Performance


Candidates for Optimization


How does XOM differ from JDOM?


In XOM's Favor


XOM is simpler!

Number of public methods (and constructors) in each API

Class                    DOM2    JDOM      XOM 1.1
Node                       25       8 *         13
Attribute                   5      29           20
Element                    16      73           37
ProcessingInstruction       3      14            9
Comment                     0       5            9
Builder                   N/A      32 **        16
Document                   17      41           13
Total                      66     202          117

* Content

** SAXBuilder


Future Directions


Props


To Learn More



Copyright 2004-2006 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified January 18, 2006