Talking to SAX Programs

JDOM works very well with SAX parsers. SAX is an almost ideal event model for building a JDOM tree; and when the tree is complete, JDOM makes it easy to walk the tree, firing off SAX events as you go. Because SAX is fast and memory-efficient, it adds little overhead to JDOM programs.

Configuring SAXBuilder

When reading a file or stream through a SAX parser, you can set various properties on the parser including the ErrorHandler, DTDHandler, EntityResolver, and any custom features or properties that are supported by the underlying SAX XMLReader. SAXBuilder includes several methods that just delegate these configurations to the underlying XMLReader:

public void setErrorHandler(ErrorHandler errorHandler);
public void setEntityResolver(EntityResolver entityResolver);
public void setDTDHandler(DTDHandler dtdHandler);
public void setIgnoringElementContentWhitespace(boolean ignoreWhitespace);
public void setFeature(String name, boolean value);
public void setProperty(String name, Object value);

For example, suppose you want to schema validate documents before using them. This requires three additional steps beyond the norm:

  1. Explicitly pick a parser class that is known to be able to schema validate, such as org.apache.xerces.parsers.SAXParser. (Most parsers can’t schema validate.)

  2. Install a SAX ErrorHandler that reports validity errors.

  3. Set the SAX feature that turns on schema validation to true. Which feature this is depends on which parser you picked in step 1. In Xerces, it’s http://apache.org/xml/features/validation/schema and you also need to turn validation on using the standard SAX feature http://xml.org/sax/features/validation.

Example 14.11 is a simple JDOM program that uses Xerces to schema validate the document at a URL named on the command line. This is similar to the earlier JDOMValidator example. Here, however, the installed ErrorHandler (BestSAXChecker from Chapter 7) merely prints validity error messages on System.out and does not throw an exception, so validity errors do not terminate the parse. The Document object is still built as long as it’s well-formed, whether or not it’s valid. You could, of course, change this behavior by using a more draconian ErrorHandler that did throw exceptions for validity errors.

Example 14.11. A JDOM program that schema validates documents

import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import java.io.IOException;


public class JDOMSchemaValidator {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java JDOMSchemaValidator URL"); 
      return;
    } 
      
    SAXBuilder builder = new SAXBuilder(
     "org.apache.xerces.parsers.SAXParser");
    builder.setValidation(true);
    builder.setErrorHandler(new BestSAXChecker());
                             // ^^^^^^^^^^^^^^
                             // From Chapter 7
    // turn on schema support
    builder.setFeature(
      "http://apache.org/xml/features/validation/schema", true);                         
                             
    // command line should offer URIs or file names
    try {
      builder.build(args[0]);
    }
    // indicates a well-formedness error
    catch (JDOMException e) { 
      System.out.println(args[0] + " is not well-formed.");
      System.out.println(e.getMessage());
    }  
    catch (IOException e) { 
      System.out.println("Could not check " + args[0]);
      System.out.println(" because " + e.getMessage());
    }  
  
  }

}

Here’s the result when I used this program to check a mildly invalid document. One error was reported:

D:\books\XMLJAVA\examples\14>java JDOMSchemaValidator original_hotcop.xml
Error: cvc-type.3.1.3: The value '6:20' of element 'LENGTH' is 
 not valid. 
 at line 10, column 24
 in entity file:///D:/books/XMLJAVA/examples/14/original_hotcop.xml
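
The more draconian ErrorHandler mentioned above might look something like the following sketch. The class name DraconianErrorHandler is my own invention for illustration; it is not part of JDOM, SAX, or the Chapter 7 examples. It relies only on the standard org.xml.sax interfaces: warnings are printed, while both validity errors and well-formedness errors are rethrown so that the parse stops.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical class for illustration; not part of JDOM or SAX
public class DraconianErrorHandler implements ErrorHandler {

  // Warnings are not errors, so report them and keep parsing
  public void warning(SAXParseException exception) {
    System.out.println("Warning: " + exception.getMessage());
  }

  // Validity errors are reported here; rethrowing aborts the parse
  public void error(SAXParseException exception)
   throws SAXParseException {
    throw exception;
  }

  // Well-formedness errors are reported here
  public void fatalError(SAXParseException exception)
   throws SAXParseException {
    throw exception;
  }

}
```

If this handler were installed in Example 14.11 with builder.setErrorHandler(new DraconianErrorHandler()), I would expect build() to fail with a JDOMException at the first validity error. In that case the message printed by the catch block should be reworded, since the exception would no longer necessarily indicate a well-formedness error.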

Caution

You should only use setFeature() and setProperty() for non-standard features and properties like http://apache.org/xml/features/validation/schema. SAXBuilder requires certain settings of the standard features such as http://xml.org/sax/features/namespace-prefixes and standard properties such as http://xml.org/sax/properties/lexical-handler in order to work properly. If you change these, the document may not be built correctly.

Another interesting possibility is the option to set a SAX filter that is applied to the document as it’s read:

public void setXMLFilter(XMLFilter filter);

If you use this, the JDOM Document will include only the filtered content.
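
To make this concrete, here is a minimal sketch of a filter that could be passed to setXMLFilter(). ElementStripper is a hypothetical class of my own, built on the standard org.xml.sax.helpers.XMLFilterImpl. It drops every element with a given qualified name, along with everything that element contains. It assumes the parser reports qualified names, and it matches on those rather than on namespace URI and local name.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter for illustration; not part of JDOM or SAX
public class ElementStripper extends XMLFilterImpl {

  private final String unwantedName;
  // greater than zero while inside a stripped element
  private int depth = 0;

  public ElementStripper(String unwantedName) {
    this.unwantedName = unwantedName;
  }

  public void startDocument() throws SAXException {
    depth = 0; // reset so the filter can be reused
    super.startDocument();
  }

  public void startElement(String uri, String localName,
   String qName, Attributes atts) throws SAXException {
    if (depth > 0 || qName.equals(unwantedName)) depth++;
    else super.startElement(uri, localName, qName, atts);
  }

  public void endElement(String uri, String localName,
   String qName) throws SAXException {
    if (depth > 0) depth--;
    else super.endElement(uri, localName, qName);
  }

  public void characters(char[] text, int start, int length)
   throws SAXException {
    if (depth == 0) super.characters(text, start, length);
  }

  public void ignorableWhitespace(char[] text, int start,
   int length) throws SAXException {
    if (depth == 0) super.ignorableWhitespace(text, start, length);
  }

  public void processingInstruction(String target, String data)
   throws SAXException {
    if (depth == 0) super.processingInstruction(target, data);
  }

}
```

With a filter like this, calling builder.setXMLFilter(new ElementStripper("LENGTH")) before build() should produce a Document with no LENGTH elements. Comments are reported through LexicalHandler, which XMLFilterImpl does not intercept, so this sketch makes no attempt to strip comments inside the removed elements.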

SAXOutputter

Besides reading a file or stream through a SAX parser, you can also feed a JDOM document into a SAX ContentHandler using the org.jdom.output.SAXOutputter class. This class is initially configured with a ContentHandler and optionally an ErrorHandler, DTDHandler, EntityResolver, and/or LexicalHandler. The output() method walks the tree, firing off events to these handlers as it does so.

For example, suppose you’ve built a document in memory that happens to contain some XInclude elements and you’d like to resolve them. JDOM does not have any built-in support for XInclude. To JDOM, an XInclude element is just an element that happens to have the local name include and the namespace URI http://www.w3.org/2001/XInclude. However, GNU JAXP does include a filter that can resolve XIncludes. Unfortunately it’s a SAX filter rather than a JDOM filter. Not to worry. It’s straightforward to feed a JDOM document into the GNU JAXP gnu.xml.pipeline.XIncludeFilter using a SAXOutputter as shown in Example 14.12:

Example 14.12. A JDOM program that passes documents to a SAX ContentHandler

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import org.jdom.output.SAXOutputter;
import java.io.IOException;
import gnu.xml.pipeline.*;
import org.xml.sax.SAXException;


public class XIncluder {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java XIncluder URL"); 
      return;
    } 
      
    SAXBuilder builder = new SAXBuilder(
     "gnu.xml.aelfred2.XmlReader");

    // command line should offer URIs or file names
    try {
      Document doc = builder.build(args[0]);
      XIncludeFilter filter = new XIncludeFilter(
        new TextConsumer(System.out)
      );
      SAXOutputter outputter = new SAXOutputter(filter);
      outputter.setContentHandler(filter);
      outputter.setDTDHandler(filter);
      outputter.setLexicalHandler(filter);
      outputter.output(doc);
    }
    // indicates a well-formedness error
    catch (JDOMException e) { 
      System.out.println(args[0] + " is not well-formed.");
      System.out.println(e.getMessage());
    }  
    catch (SAXException e) { 
      System.out.println(e.getMessage());
    }  
    catch (IOException e) { 
      System.out.println("Could not merge " + args[0]);
      System.out.println(" because " + e.getMessage());
    }  
  
  }

}

Here the XIncludeFilter is itself hooked up to another GNU JAXP class, TextConsumer, which merely prints the document on a specified OutputStream.
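
The receiving end of a SAXOutputter does not have to be a GNU JAXP filter; any ContentHandler will do. As a rough sketch, the hypothetical ElementCounter class below (my own illustration, built on the standard org.xml.sax.helpers.DefaultHandler) tallies the elements it is told about by qualified name.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical handler for illustration; not part of JDOM or SAX
public class ElementCounter extends DefaultHandler {

  // qualified element name -> number of start-tags seen
  private final Map<String, Integer> counts
   = new TreeMap<String, Integer>();

  public void startDocument() {
    counts.clear(); // start fresh for each document
  }

  public void startElement(String uri, String localName,
   String qName, Attributes atts) {
    Integer old = counts.get(qName);
    counts.put(qName, old == null ? 1 : old + 1);
  }

  public Map<String, Integer> getCounts() {
    return counts;
  }

}
```

Given a JDOM Document doc, something like new SAXOutputter(counter).output(doc) should walk the tree and leave the totals available from counter.getCounts(), without the document ever being serialized and reparsed.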


Copyright 2001, 2002 Elliotte Rusty Harold (elharo@metalab.unc.edu). Last Modified April 18, 2002