What's New in XML in Java 5 and 6

Elliotte Rusty Harold

Thursday, March 22, 2007

elharo@metalab.unc.edu

http://www.cafeconleche.org/

Where We're Going

JAXP
DOM 3
Java XPath API
Java Validation API
Java XML Digital Signature API
StAX

JAXP, the Java API for XML Processing

SAX
DOM
TrAX
Factory Classes
StAX starting in version 1.4

JAXP Versions

JAXP 1.0 supported SAX 1.0 and DOM 1.0
JAXP 1.1 supports SAX 2 and DOM 2, bundled with Java 1.4
JAXP 1.2 adds support for W3C XML schemas
JAXP 1.3 moves from Crimson to Xerces; adds DOM 3; bundled with Java 5
JAXP 1.4 adds support for StAX; bundled with Java 6

DOM Level 3

Core Changes
Bootstrapping
Loading/Parsing/Building
Filters
Error Handling
Serialization

DOM Level 3

of all of the things the W3C has given us, the DOM is probably the one with the least value.

--Michael Brennan on the xml-dev mailing list

DOM Level 3 Core Changes

Node
UserDataHandler
DOMConfiguration
Document
Text
Element
Attr
Entity

New methods in the Node interface

Adds:

Base URI

The URI this document came from. May be null.

Tree position

The order of a node relative to another reference node in tree order

Methods to set the text content of a node

Methods to test for equality

Methods to work with namespaces

Methods to associate user data with each node
I will only show the new members.

Java binding:

package org.w3c.dom;

public interface Node {

  public String getBaseURI();

  public static final short DOCUMENT_POSITION_DISCONNECTED = 0x01;
  public static final short DOCUMENT_POSITION_PRECEDING = 0x02;
  public static final short DOCUMENT_POSITION_FOLLOWING = 0x04;
  public static final short DOCUMENT_POSITION_CONTAINS = 0x08;
  public static final short DOCUMENT_POSITION_IS_CONTAINED = 0x10;

  public static final short DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC = 0x20;

  public short   compareDocumentPosition(Node other) throws DOMException;

  public String  getTextContent() throws DOMException;
  public void    setTextContent(String textContent) throws DOMException;

  public void    normalize();

  public boolean isSameNode(Node other);
  public boolean isEqualNode(Node arg);

  public String  lookupPrefix(String namespaceURI);
  public boolean isDefaultNamespace(String namespaceURI);
  public String  lookupNamespaceURI(String prefix);

  public Node    getFeature(String feature, String version);
    
  public Object  setUserData(String key, Object data, UserDataHandler handler);
  public Object  getUserData(String key);

}
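A minimal sketch of several of the new Node methods in use, assuming the DOM implementation bundled with the JDK. The class name Dom3NodeDemo, the namespace URI, and the element names are made up for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class Dom3NodeDemo {

    // Hypothetical namespace URI, used only for this illustration
    static final String NS = "http://www.example.org/books";

    // Builds a <catalog> element with two identical <book> children
    static Document build() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element root = doc.createElementNS(NS, "catalog");
            doc.appendChild(root);
            for (int i = 0; i < 2; i++) {
                Element book = doc.createElementNS(NS, "book");
                // setTextContent replaces any children with a single text node
                book.setTextContent("Cryptonomicon");
                root.appendChild(book);
            }
            return doc;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    static String report() {
        Document doc = build();
        Node first = doc.getDocumentElement().getFirstChild();
        Node second = first.getNextSibling();
        // The result is a bit mask describing where 'second' sits relative to 'first'
        short position = first.compareDocumentPosition(second);
        boolean follows = (position & Node.DOCUMENT_POSITION_FOLLOWING) != 0;
        return first.getTextContent()
            + " equal=" + first.isEqualNode(second)   // same structure and content
            + " same=" + first.isSameNode(second)     // same object
            + " follows=" + follows
            + " defaultNS=" + first.isDefaultNamespace(NS);
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

isEqualNode() compares structure and content, while isSameNode() tests identity, so two separately built but identical elements are equal without being the same.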

User Data

A user-defined callback class that is invoked when a node is cloned, imported, deleted, adopted, or renamed

package org.w3c.dom;

public interface UserDataHandler {

  // OperationType
  public static final short NODE_CLONED   = 1;
  public static final short NODE_IMPORTED = 2;
  public static final short NODE_DELETED  = 3;
  public static final short NODE_RENAMED  = 4;
  public static final short NODE_ADOPTED  = 5;

  public void handle(short operation, String key, Object data, Node src, Node dst);

}
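A sketch of a handler that carries user data over to a clone, assuming the JDK's bundled DOM implementation. DOM does not copy user data by itself; the handler decides what to do. The names UserDataDemo, CopyOnClone, and the "index" key are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.UserDataHandler;

class UserDataDemo {

    // Re-attaches the user data (and this handler) to the clone
    static class CopyOnClone implements UserDataHandler {
        public void handle(short operation, String key, Object data, Node src, Node dst) {
            if (operation == NODE_CLONED && dst != null) {
                dst.setUserData(key, data, this);
            }
        }
    }

    // Returns the user data found on a clone of an annotated element
    static Object userDataOfClone() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element original = doc.createElement("fibonacci");
            original.setUserData("index", Integer.valueOf(7), new CopyOnClone());
            Node copy = original.cloneNode(true);
            return copy.getUserData("index");
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(userDataOfClone());
    }
}
```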

DOMConfiguration

Maintains a table of boolean, String, and Object parameters such as canonical-form and error-handler
Makes it possible to change what normalizeDocument() does by modifying these parameters

Java binding:

package org.w3c.dom;

public interface DOMConfiguration {

  public void    setParameter(String name, Object value) throws DOMException;
  public Object  getParameter(String name) throws DOMException;
  public boolean canSetParameter(String name, Object value);
  public DOMStringList getParameterNames();

}

Standard parameters include:

canonical-form, default false, optional

Canonicalize the document according to the rules of Canonical XML within the limits of DOM

cdata-sections, default true, required

Keep CDATASection nodes in the document

check-character-normalization, default false, optional

Check if the characters in the document are Unicode normalized in normalization form C

comments, default true, required

Retain Comment nodes in the document.

datatype-normalization, default false, optional

Use schema-normalized values

element-content-whitespace, default true, optional

Retain white space in element content (a.k.a ignorable white space)

entities, default true, required

Include EntityReference nodes in the document

error-handler, required (non-boolean)

A DOMErrorHandler object

infoset, default true, required

Set various parameters to keep all the information defined in the XML InfoSet

namespaces, default true, optional

Use namespace rules

namespace-declarations, default true, required

Include namespace declaration attributes as Attr nodes in the document.

normalize-characters, default false, optional

Normalize characters according to Unicode NFC

schema-location, optional

a list of schema URIs, separated by whitespace

schema-type, String, optional

an absolute URI such as http://www.w3.org/2001/XMLSchema or http://www.w3.org/TR/REC-xml (DTD) representing the type of the schema language

split-cdata-sections, default true, required

Split any CDATA section that contains the ]]> termination marker into multiple sections

validate, default false, optional

Validate against the schema when normalizing

validate-if-schema, default false, optional

Validate against the schema when normalizing if one is found; otherwise don't bother

well-formed, default true, optional

fully check nodes for well-formedness
Implementations may also define their own custom parameters
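A sketch of how one parameter changes what normalizeDocument() does, assuming the JDK's bundled DOM implementation. Only the standard "comments" parameter is used; the class and method names are invented for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomConfigDemo {

    // Counts the comment children of the root after normalizeDocument()
    static int commentsAfterNormalize(boolean keepComments) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            // createElementNS produces a namespace-aware (DOM Level 2) node
            Element root = doc.createElementNS(null, "Fibonacci_Numbers");
            doc.appendChild(root);
            root.appendChild(doc.createComment("generated"));
            root.appendChild(doc.createTextNode("1 1 2 3 5"));

            DOMConfiguration config = doc.getDomConfig();
            Boolean value = Boolean.valueOf(keepComments);
            // Ask first: optional settings may not be supported everywhere
            if (config.canSetParameter("comments", value)) {
                config.setParameter("comments", value);
            }
            doc.normalizeDocument();

            int count = 0;
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.COMMENT_NODE) count++;
            }
            return count;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println("comments=true:  " + commentsAfterNormalize(true));
        System.out.println("comments=false: " + commentsAfterNormalize(false));
    }
}
```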

New methods in Text

Adds:

isElementContentWhitespace()

Returns true if this node contains "ignorable" whitespace

wholeText

Returns all text of Text nodes logically adjacent to this node; i.e. the XPath value of the text node

Java binding:

package org.w3c.dom;
  
public interface Text extends Node {
  
  public boolean isElementContentWhitespace();
    
  public String  getWholeText();
  public Text    replaceWholeText(String content) throws DOMException;

}
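A small sketch of getWholeText() and replaceWholeText() on two adjacent text nodes, assuming the JDK's bundled DOM implementation. The class name and the sample strings are illustrative only.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

class WholeTextDemo {

    static String run() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element para = doc.createElement("para");
            doc.appendChild(para);

            // Two separate but logically adjacent text nodes
            Text first = doc.createTextNode("Hello, ");
            para.appendChild(first);
            para.appendChild(doc.createTextNode("world"));

            // Concatenates the text of all adjacent text nodes
            String whole = first.getWholeText();

            // Replaces the whole run of adjacent text with a single node
            first.replaceWholeText("Goodbye");

            return whole + "|" + para.getChildNodes().getLength()
                + "|" + para.getTextContent();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```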

New methods in Element

Adds:

schemaTypeInfo

A TypeInfo object that provides a name and URI for the element's type, as given in the document's schema.

setIdAttribute

Methods to set ID type attributes

Java binding:

package org.w3c.dom;
  
public interface Element extends Node {
  
  public TypeInfo getSchemaTypeInfo();

  public void setIdAttribute(String name, boolean isId) throws DOMException;
  public void setIdAttributeNS(String namespaceURI, String localName, boolean isId) 
    throws DOMException;
  public void setIdAttributeNode(Attr idAttr, boolean isId) throws DOMException;

}

New methods in Attr

Adds:

schemaTypeInfo

A TypeInfo object that provides a name and URI for the attribute's type, as given in the document's schema.

isId

Method to tell if an attribute is an ID type

Java binding:

package org.w3c.dom;
  
public interface Attr extends Node {
  
  public TypeInfo getSchemaTypeInfo();
  public boolean isId();

}
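A sketch of marking an attribute as an ID at run time, which makes getElementById() usable without a DTD or schema. It assumes the JDK's bundled DOM implementation; the element name, attribute name, and value are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class IdAttributeDemo {

    static String run() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element root = doc.createElement("people");
            doc.appendChild(root);
            Element person = doc.createElement("person");
            person.setAttribute("key", "p42");
            root.appendChild(person);

            // Nothing has declared "key" to be an ID yet
            boolean foundBefore = doc.getElementById("p42") != null;

            // Declare it programmatically
            person.setIdAttribute("key", true);
            Attr key = person.getAttributeNode("key");
            Element found = doc.getElementById("p42");

            return foundBefore + " " + key.isId() + " "
                + (found != null && found.isSameNode(person));
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```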

Bootstrapping

DOM2 has no implementation-independent means to create a new Document object
Implementation-dependent methods tend to be fairly complex. For example, in Xerces-J:
DOMImplementation impl = DOMImplementationImpl.getDOMImplementation();
Document fibonacci = impl.createDocument(null, "Fibonacci_Numbers", null);

DOM3 Bootstrapping

Still no language-independent means to create a new Document object

Does provide an implementation-independent method for Java only:

  DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
  DOMImplementation impl = registry.getDOMImplementation("XML");

package org.w3c.dom.bootstrap;

public class DOMImplementationRegistry { 

  public final static String PROPERTY =
        "org.w3c.dom.DOMImplementationSourceList";

  public static DOMImplementationRegistry newInstance()       
            throws ClassNotFoundException, InstantiationException, 
            IllegalAccessException;
  public DOMImplementation getDOMImplementation(String features)
            throws ClassNotFoundException,
            InstantiationException, IllegalAccessException, ClassCastException;
  public DOMImplementationList getDOMImplementationList(String features)
            throws ClassNotFoundException,
            InstantiationException, IllegalAccessException, ClassCastException;
  public void addSource(DOMImplementationSource s)
            throws ClassNotFoundException,
            InstantiationException, IllegalAccessException;
            
}

Bootstrapping Example

getDOMImplementation() returns a DOMImplementation object that supports the features given in the argument, or null if no such implementation can be found.

Request a DOMImplementation that supports XML DOM Level 1, any version of the traversal module, and DOM Level 2 events:

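try {
  // getDOMImplementation() is an instance method, so obtain a registry first
  DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
  DOMImplementation impl = registry
   .getDOMImplementation("XML 1.0 Traversal Events 2.0");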
  if (impl != null) { 
    DocumentType svgDOCTYPE = impl.createDocumentType("svg", 
     "-//W3C//DTD SVG 1.0//EN", 
     "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd");
    Document svgDoc = impl.createDocument(
     "http://www.w3.org/2000/svg", "svg", svgDOCTYPE
    );  
    // work with the document...
  }
}
catch (Exception ex) { 
  System.out.println(ex); 
}

Be sure to check whether the implementation returned is null before using it. Many installations may not be able to support all the features you ask for.
DOMImplementationRegistry searches for DOMImplementation classes by reading the value of the org.w3c.dom.DOMImplementationSourceList Java system property. This property should contain a white space separated list of DOMImplementationSource class names.

DOM Error Handler Interfaces

DOMErrorHandler
DOMLocator
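A sketch of a DOMErrorHandler that records each problem together with the line number reported by its DOMLocator. It assumes the Load and Save implementation bundled with the JDK, found through the registry's default lookup. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.DOMError;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.DOMLocator;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

class ErrorHandlerDemo {

    // Parses the string and returns the problems reported to the handler
    static List<String> parseAndCollectErrors(String xml) {
        final List<String> problems = new ArrayList<String>();
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS impl =
                (DOMImplementationLS) registry.getDOMImplementation("LS");
            LSParser parser =
                impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

            // The handler is installed through the parser's DOMConfiguration
            parser.getDomConfig().setParameter("error-handler", new DOMErrorHandler() {
                public boolean handleError(DOMError error) {
                    DOMLocator where = error.getLocation();
                    int line = (where == null) ? -1 : where.getLineNumber();
                    problems.add("line " + line + ": " + error.getMessage());
                    // Returning false asks the processor to stop
                    return error.getSeverity() != DOMError.SEVERITY_FATAL_ERROR;
                }
            });

            LSInput input = impl.createLSInput();
            input.setStringData(xml);
            try {
                parser.parse(input);
            } catch (RuntimeException ex) {
                // A fatal error ends the parse with an unchecked exception;
                // the handler has already recorded the details.
            }
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(parseAndCollectErrors("<a><b></a>"));
    }
}
```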

Load and Save

Loading: parsing an existing XML document to produce a Document object
Saving: serializing a Document object into a file or onto a stream
Completely implementation dependent in DOM2

Parsing documents with DOM3

import org.w3c.dom.*;
import org.w3c.dom.ls.*;
import org.w3c.dom.bootstrap.*;


public class DOM3ParserMaker {

  public static void main(String[] args) 
    throws ClassNotFoundException, InstantiationException, IllegalAccessException {

    System.setProperty(DOMImplementationRegistry.PROPERTY,
      "org.apache.xerces.dom.DOMImplementationSourceImpl");
    DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
 
    DOMImplementation impl = registry.getDOMImplementation("LS");
    if (impl == null) {
        System.err.println("Could not locate a DOM3 Parser");
        return;    
    }
    
    DOMImplementationLS implls = (DOMImplementationLS) impl;
    LSParser parser = implls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

    for (int i = 0; i < args.length; i++) {
      try {
        Document d = parser.parseURI(args[i]);
      }
      catch (DOMException ex) {
        System.err.println(ex);
      }
      
    }

  }

}

The Load and Save Package: org.w3c.dom.ls

DOMImplementationLS: A sub-interface of DOMImplementation that provides the factory methods for creating the objects required for loading and saving.
LSParser: A parser interface
LSInput: Encapsulate information about the source of the XML to be loaded, like SAX's InputSource
LSResourceResolver: During loading, provides a way for applications to redirect references to external entities.
LSParserFilter: Provides the ability to examine and optionally remove Element nodes as they are being processed during the parsing of a document, like SAX filters.
LSSerializer: An interface for serializing DOM documents onto a stream or string.
LSSerializerFilter: Provide the ability to examine and optionally remove or modify nodes as they are being output.
LSLoadEvent: A document has been completely loaded
LSProgressEvent: A document has been partially loaded

DOMImplementationLS

Factory interface to create new LSParser and LSSerializer implementations.

Java Binding:

package org.w3c.dom.ls;

public interface DOMImplementationLS {

  public static final short MODE_SYNCHRONOUS  = 1;
  public static final short MODE_ASYNCHRONOUS = 2;

  public LSParser     createLSParser(short mode, String schemaType) 
    throws DOMException;
  public LSSerializer createLSSerializer();
  public LSInput      createLSInput();
  public LSOutput     createLSOutput();

}

Creating DOMImplementationLS Objects

Use the feature "LS" or "LS-Async" to find a DOMImplementation object that supports Load and Save.
Cast the DOMImplementation object to DOMImplementationLS.

System.setProperty(DOMImplementationRegistry.PROPERTY,
  "org.apache.xerces.dom.DOMImplementationSourceImpl");
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementation impl = registry.getDOMImplementation("XML 1.0 LS 3.0");
if (impl != null) {
  DOMImplementationLS implls = (DOMImplementationLS) impl;
  // ...
}

LSParser

Provides an implementation-independent API for parsing XML documents to produce a DOM Document object.
Instances are built by the createLSParser() method in DOMImplementationLS.

Java Binding:

package org.w3c.dom.ls;

public interface LSParser {

  public DOMConfiguration getDomConfig();
  
  public LSParserFilter getFilter();
  public void           setFilter(LSParserFilter filter);

  public boolean getAsync();
  public boolean getBusy();

  public Document parse(LSInput input) throws DOMException, LSException;
  public Document parseURI(String uri) throws DOMException, LSException;

  // ACTION_TYPES
  public static final short ACTION_APPEND_AS_CHILDREN = 1;
  public static final short ACTION_REPLACE_CHILDREN   = 2;
  public static final short ACTION_INSERT_BEFORE      = 3;
  public static final short ACTION_INSERT_AFTER       = 4;
  public static final short ACTION_REPLACE            = 5;

  public Node parseWithContext(LSInput input, Node contextArg, short action)
    throws DOMException, LSException;

  public void abort();

}

LSInput

Like SAX2's InputSource class, this interface is an abstraction of all the different things (streams, files, byte arrays, sockets, URLs, etc.) from which an XML document can be read.

Java Binding:

package org.w3c.dom.ls;

public interface LSInput {

  public Reader getCharacterStream();
  public void setCharacterStream(Reader in);

  public InputStream getByteStream();
  public void        setByteStream(InputStream in);

  public String getStringData();
  public void   setStringData(String stringData);

  public String getSystemId();
  public void   setSystemId(String systemId);

  public String getPublicId();
  public void   setPublicId(String publicId);

  public String getBaseURI();
  public void   setBaseURI(String baseURI);

  public String getEncoding();
  public void   setEncoding(String encoding);

  public boolean getCertifiedText(); // known to be in NFC
  public void    setCertifiedText(boolean certifiedText);

}

LSOutput

An abstraction of all the different things (streams, files, byte arrays, sockets, strings, etc.) to which an XML document can be written
Created by DOMImplementationLS's createLSOutput() method

Java Binding:

package org.w3c.dom.ls;

public interface LSOutput {

    public Writer getCharacterStream();
    public void   setCharacterStream(java.io.Writer characterStream);

    public OutputStream getByteStream();
    public void         setByteStream(OutputStream byteStream);

    public String getSystemId();
    public void   setSystemId(String systemId);

    public String getEncoding();
    public void   setEncoding(String encoding);

}

LSResourceResolver

Like SAX2's EntityResolver interface, this interface lets applications redirect references to external entities.

Java Binding:

package org.w3c.dom.ls;

public interface LSResourceResolver {

  public LSInput resolveResource(String type, String namespaceURI, 
    String publicID, String systemID, String baseURI);

}

LSSerializer

Provides an API for serializing (writing) a DOM document out as a sequence of bytes onto a stream, file, socket, byte array, etc.

Java Binding:

package org.w3c.dom.ls;

public interface LSSerializer {

  public DOMConfiguration getDomConfig();

  public String getNewLine();
  public void   setNewLine(String newLine);

  public LSSerializerFilter getFilter();
  public void               setFilter(LSSerializerFilter filter);

  public boolean write(Node nodeArg, LSOutput destination) throws LSException;
  public boolean writeToURI(Node nodeArg, String uri) throws LSException;
  public String  writeToString(Node node) throws DOMException, LSException;

}
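A sketch of a full load and save round trip held in memory: an LSInput wraps a string, an LSParser builds the Document, and LSSerializer.writeToString() writes it back out. It assumes the JDK's bundled implementation, and that the serializer honors the standard "xml-declaration" parameter. The class name and sample markup are illustrative.

```java
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSSerializer;

class LoadSaveRoundTrip {

    static String roundTrip(String xml) {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS impl =
                (DOMImplementationLS) registry.getDOMImplementation("LS");

            // Load: read the document from a string
            LSInput input = impl.createLSInput();
            input.setStringData(xml);
            LSParser parser =
                impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
            Document doc = parser.parse(input);

            // Change something so the output differs from the input
            doc.getDocumentElement().setAttribute("checked", "true");

            // Save: serialize to a string, without the XML declaration
            LSSerializer serializer = impl.createLSSerializer();
            serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
            return serializer.writeToString(doc);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<greeting>hello</greeting>"));
    }
}
```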

Fibonacci with DOM3

import java.math.*;
import java.io.*;
import org.w3c.dom.*;
import org.w3c.dom.bootstrap.*;
import org.w3c.dom.ls.*;


public class FibonacciDOM3 {

  public static void main(String[] args) throws Exception {

      System.setProperty(DOMImplementationRegistry.PROPERTY,
        "org.apache.xerces.dom.DOMImplementationSourceImpl");
      DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
      DOMImplementation impl = registry.getDOMImplementation("XML 1.0 LS");
      if (impl == null) {
         System.err.println("Oops! Couldn't find DOM3 implementation");   
         return;
      }
      Document fibonacci = impl.createDocument(null, "Fibonacci_Numbers", null );

      BigInteger low  = BigInteger.ZERO;
      BigInteger high = BigInteger.ONE;

      Element root = fibonacci.getDocumentElement();

      for (int i = 0; i <= 25; i++) {
        Element number = fibonacci.createElement("fibonacci");
        number.setAttribute("index", Integer.toString(i));
        Text text = fibonacci.createTextNode(low.toString());
        number.appendChild(text);
        root.appendChild(number);

        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }

      // Now that the document is created we need to *serialize* it
      DOMImplementationLS implls = (DOMImplementationLS) impl;
      LSSerializer serializer = implls.createLSSerializer();
      LSOutput output = implls.createLSOutput();
      output.setByteStream(new FileOutputStream("fibonacci_dom.xml"));
      
      serializer.write(fibonacci, output);

  }

}

LSParserFilter

Lets applications examine nodes as they are being constructed during a parse.
As each node is examined, it may be modified or removed, or parsing may be aborted.

Java Binding:

package org.w3c.dom.ls;

public interface LSParserFilter {

    // Constants returned by startElement and acceptNode
  public static final short FILTER_ACCEPT    = 1;
  public static final short FILTER_REJECT    = 2;
  public static final short FILTER_SKIP      = 3;
  public static final short FILTER_INTERRUPT = 4;

  public short startElement(Element element);
  public short acceptNode(Node node);
  public int   getWhatToShow();

}
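A sketch of a parser filter that drops every element named secret, along with its subtree, while the document is being built. It assumes the JDK's bundled Load and Save implementation, which does not pass the root element to startElement(). The element names and class names are hypothetical.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

class SecretFilterDemo {

    static class DropSecrets implements LSParserFilter {
        // Called when a start-tag has been read, before any children exist
        public short startElement(Element element) {
            return "secret".equals(element.getNodeName())
                ? FILTER_REJECT : FILTER_ACCEPT;
        }
        // Called once a node is complete; keep everything that got this far
        public short acceptNode(Node node) {
            return FILTER_ACCEPT;
        }
        // Only elements are passed to acceptNode()
        public int getWhatToShow() {
            return NodeFilter.SHOW_ELEMENT;
        }
    }

    // Returns "<number of secret elements left>:<remaining text>"
    static String filter(String xml) {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS impl =
                (DOMImplementationLS) registry.getDOMImplementation("LS");
            LSParser parser =
                impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
            parser.setFilter(new DropSecrets());
            LSInput input = impl.createLSInput();
            input.setStringData(xml);
            Document doc = parser.parse(input);
            return doc.getElementsByTagName("secret").getLength()
                + ":" + doc.getDocumentElement().getTextContent();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(filter(
            "<memo><public>hello</public><secret>launch codes</secret></memo>"));
    }
}
```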

LSSerializerFilter

Lets applications examine nodes as they are being output.
As each element is examined, it may be modified or removed, or output may be aborted.

Java Binding:

package org.w3c.dom.ls;

public interface LSSerializerFilter extends NodeFilter {

  public int getWhatToShow();

}

XPath

Query without detailed DOM navigation
e.g. //book[author="Neal Stephenson"]/title to find all books by Neal Stephenson
XPath is not Turing complete but Java is.

Java XPath API

javax.xml.xpath package
XPath 1.0. No XPath 2.
XQuery API is under development.

Using the XPath API

import java.io.IOException;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import javax.xml.parsers.*;
import javax.xml.xpath.*;

public class XPathExample {

  public static void main(String[] args) 
   throws ParserConfigurationException, SAXException, 
          IOException, XPathExpressionException {

    // 1. Parse a document with JAXP
    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
    domFactory.setNamespaceAware(true); // never forget this!
    DocumentBuilder builder = domFactory.newDocumentBuilder();
    Document doc = builder.parse("books.xml");

    // 2. Compile the expression
    XPathFactory factory = XPathFactory.newInstance();
    XPath xpath = factory.newXPath();
    XPathExpression expr 
     = xpath.compile("//book[author='Neal Stephenson']/title/text()");

    // 3. Make the query
    Object result = expr.evaluate(doc, XPathConstants.NODESET);

    // 4. Get the result.
    NodeList nodes = (NodeList) result;
    for (int i = 0; i < nodes.getLength(); i++) {
        System.out.println(nodes.item(i).getNodeValue()); 
    }

  }

}

XPath Data Model

evaluate() is declared to return Object.
Actual return type depends on the result of the XPath expression:
- number maps to a java.lang.Double
- string maps to a java.lang.String
- boolean maps to a java.lang.Boolean
- node-set maps to an org.w3c.dom.NodeList
Second argument specifies the return type you want
- XPathConstants.NODESET
- XPathConstants.BOOLEAN
- XPathConstants.NUMBER
- XPathConstants.STRING
- XPathConstants.NODE
If the requested conversion can't be made, evaluate() throws an XPathExpressionException (a subclass of XPathException).
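A sketch of the same document queried with different requested return types. The class name and the sample data are made up; the cast on each result matches the XPathConstants value passed in.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XPathTypesDemo {

    static final String BOOKS = "<books>"
        + "<book><author>Neal Stephenson</author><title>Snow Crash</title></book>"
        + "<book><author>Neal Stephenson</author><title>Zodiac</title></book>"
        + "</books>";

    // Evaluates the expression against BOOKS, asking for the given type
    static Object query(String expression, QName returnType) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(BOOKS)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc, returnType);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Double count = (Double) query("count(//book)", XPathConstants.NUMBER);
        String title = (String) query("//book[1]/title", XPathConstants.STRING);
        Boolean any = (Boolean) query(
            "//book[author='Neal Stephenson']", XPathConstants.BOOLEAN);
        Node second = (Node) query("//book[2]/title", XPathConstants.NODE);

        System.out.println(count);
        System.out.println(title);
        System.out.println(any);
        System.out.println(second.getTextContent());
    }
}
```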

Namespaces

//pre:book[pre:author='Neal Stephenson']/pre:title/text()

XSLT uses namespaces in scope to resolve namespace prefixes in XPath expressions.
But Java is not XML. There are no namespaces in scope in a Java program.
How to solve this?

NamespaceContext

Supply a NamespaceContext object that can resolve the prefixes to the XPath object, using setNamespaceContext(), before compiling or evaluating the expression:

package javax.xml.namespace;

public interface NamespaceContext {

 String   getNamespaceURI(String prefix);
 String   getPrefix(String namespaceURI);
 Iterator getPrefixes(String namespaceURI);

}

Java 7/JAXP 1.5 should add a standard implementation of this interface.

NamespaceContext Example

import java.util.Iterator;
import javax.xml.*;
import javax.xml.namespace.NamespaceContext;

public class PersonalNamespaceContext implements NamespaceContext {

    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new IllegalArgumentException("Null prefix");
        else if ("pre".equals(prefix)) return "http://www.example.org/books";
        else if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
        return XMLConstants.NULL_NS_URI;
    }

    // This method isn't necessary for XPath processing.
    public String getPrefix(String uri) {
        throw new UnsupportedOperationException();
    }

    // This method isn't necessary for XPath processing either.
    public Iterator getPrefixes(String uri) {
        throw new UnsupportedOperationException();
    }

}

XPath query that uses namespaces

  XPathFactory factory = XPathFactory.newInstance();
  XPath xpath = factory.newXPath();
  xpath.setNamespaceContext(new PersonalNamespaceContext());
  XPathExpression expr = xpath.compile(
    "//pre:book[pre:author='Neal Stephenson']/pre:title/text()"
  );

  Object result = expr.evaluate(doc, XPathConstants.NODESET);
  NodeList nodes = (NodeList) result;
  for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getNodeValue()); 
  }

XPath Extension Functions

Do what's inconvenient with XPath
Implement the javax.xml.xpath.XPathFunction interface.
public Object evaluate(List args) throws XPathFunctionException
Return one of these five types:
- java.lang.String
- java.lang.Double
- java.lang.Boolean
- org.w3c.dom.NodeList
- org.w3c.dom.Node

Extension Function that verifies ISBN checksums

import java.util.List;
import javax.xml.xpath.*;
import org.w3c.dom.*;

public class ISBNValidator implements XPathFunction {
    
  public Object evaluate(List args) throws XPathFunctionException {

    if (args.size() != 1) {
      throw new XPathFunctionException(
       "Wrong number of arguments to valid-isbn()");
    }

    String isbn;
    Object o = args.get(0);

    // perform conversions
    if (o instanceof String) isbn = (String) args.get(0);
    else if (o instanceof Boolean) isbn = o.toString();
    else if (o instanceof Double) isbn = o.toString();
    else if (o instanceof NodeList) {
        NodeList list = (NodeList) o;
        Node node = list.item(0);
        // getTextContent is available in Java 5 and DOM 3.
        // In Java 1.4 and DOM 2, you'd need to recursively 
        // accumulate the content.
        isbn= node.getTextContent();
    }
    else {
      throw new XPathFunctionException("Could not convert argument type");
    }

    char[] data = isbn.toCharArray();
    if (data.length != 10) return Boolean.FALSE;
    int checksum = 0;
    for (int i = 0; i < 9; i++) {
        checksum += (i+1) * (data[i]-'0');
    }
    int checkdigit = checksum % 11;

    if (checkdigit + '0' == data[9] 
     || (data[9] == 'X' && checkdigit == 10)) {
        return Boolean.TRUE;
    }
    return Boolean.FALSE;

  }

}
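An extension function only becomes callable once an XPathFunctionResolver maps a qualified name to it. A sketch of that wiring, assuming the JDK's default XPath implementation with extension functions enabled. The function here is a deliberately simple stand-in (it only checks for ten characters), and the namespace URI, prefix, and function name are all hypothetical.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFunction;
import javax.xml.xpath.XPathFunctionException;
import javax.xml.xpath.XPathFunctionResolver;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ExtensionFunctionDemo {

    static final String EXT_NS = "http://www.example.org/functions"; // hypothetical

    // Stand-in function: true if the argument's string value has ten characters
    static class HasTenChars implements XPathFunction {
        public Object evaluate(List args) throws XPathFunctionException {
            if (args.size() != 1) {
                throw new XPathFunctionException("Expected one argument");
            }
            Object o = args.get(0);
            String s;
            if (o instanceof NodeList) {
                NodeList list = (NodeList) o;
                s = (list.getLength() == 0) ? "" : list.item(0).getTextContent();
            } else {
                s = String.valueOf(o);
            }
            return Boolean.valueOf(s.length() == 10);
        }
    }

    // Maps {EXT_NS}has-ten-chars with one argument to the function above
    static class Resolver implements XPathFunctionResolver {
        public XPathFunction resolveFunction(QName name, int arity) {
            boolean match = EXT_NS.equals(name.getNamespaceURI())
                && "has-ten-chars".equals(name.getLocalPart()) && arity == 1;
            return match ? new HasTenChars() : null;
        }
    }

    // Binds the "ext" prefix used in the expression
    static class Prefixes implements NamespaceContext {
        public String getNamespaceURI(String prefix) {
            if (prefix == null) throw new IllegalArgumentException("Null prefix");
            return "ext".equals(prefix) ? EXT_NS : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) { throw new UnsupportedOperationException(); }
        public Iterator getPrefixes(String uri) { throw new UnsupportedOperationException(); }
    }

    static boolean check(String isbn) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new Prefixes());
            xpath.setXPathFunctionResolver(new Resolver());
            InputSource doc = new InputSource(
                new StringReader("<book isbn='" + isbn + "'/>"));
            return ((Boolean) xpath.evaluate("ext:has-ten-chars(/book/@isbn)",
                doc, XPathConstants.BOOLEAN)).booleanValue();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(check("0596007647"));
    }
}
```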

Java Validation API

Validation, augmentation, and type information
javax.xml.validation
Schema language independent
JDK bundles W3C schema support
Third-party libraries add other languages such as RELAX NG

Schema Validation Example

import java.io.*;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
import org.xml.sax.SAXException;

public class DocbookXSDCheck {

  public static void main(String[] args) throws SAXException, IOException {

    // 1. Look up a factory for the W3C XML Schema language
    SchemaFactory factory = 
      SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
    
    // 2. Compile the schema. 
    // Here the schema is loaded from a java.io.File, but you could use 
    // a java.net.URL or a javax.xml.transform.Source instead.
    File schemaLocation = new File("/opt/xml/docbook/xsd/docbook.xsd");
    Schema schema = factory.newSchema(schemaLocation);
  
    // 3. Get a validator from the schema.
    Validator validator = schema.newValidator();
    
    // 4. Wrap the document you want to check in a Source.
    Source source = new StreamSource(args[0]);
    
    // 5. Check the document
    try {
      validator.validate(source);
      System.out.println(args[0] + " is valid.");
    }
    catch (SAXException ex) {
      System.out.println(args[0] + " is not valid because ");
      System.out.println(ex.getMessage());
    }  
    
  }

}

Validate against a document-specified schema

SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Schema schema = factory.newSchema();

Other schema languages

XMLConstants.W3C_XML_SCHEMA_NS_URI: http://www.w3.org/2001/XMLSchema
XMLConstants.RELAXNG_NS_URI: http://relaxng.org/ns/structure/1.0
XMLConstants.XML_DTD_NS_URI: http://www.w3.org/TR/REC-xml
Other languages can be added with libraries
Only W3C schema supported out-of-the-box in JDK 5/6

Schema Augmentation

Adding information to the document from the schema such as default attribute values.
Usually a bad idea.

import java.io.*;
import javax.xml.transform.dom.*;
import javax.xml.validation.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.SAXException;

public class DocbookXSDAugmenter {

    public static void main(String[] args) 
      throws SAXException, IOException, ParserConfigurationException {

        SchemaFactory factory 
         = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
        File schemaLocation = new File("/opt/xml/docbook/xsd/docbook.xsd");
        Schema schema = factory.newSchema(schemaLocation);
        Validator validator = schema.newValidator();
        
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true); // never forget this
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        Document doc = builder.parse(new File(args[0]));
        
        DOMSource source = new DOMSource(doc);
        DOMResult result = new DOMResult();
        
        try {
            validator.validate(source, result);
            Document augmented = (Document) result.getNode();
            // do whatever you need to do with the augmented document...
        }
        catch (SAXException ex) {
            System.out.println(args[0] + " is not valid because ");
            System.out.println(ex.getMessage());
        }  
        
    }

}

Reporting Type Information

DOM 3 TypeInfo tells you what the type is
The Schema object provides a ValidatorHandler, an implementation of SAX's ContentHandler interface that you install in a SAX parser.
You also install your own ContentHandler in the ValidatorHandler (not the parser).
The ValidatorHandler makes available a TypeInfoProvider that your ContentHandler can call to determine the type of the current element or one of its attributes.

DOM 3 TypeInfo interface

package org.w3c.dom;
  
public interface TypeInfo {

 public String  getTypeName();
 public String  getTypeNamespace();
 public boolean isDerivedFrom(String typeNamespace, 
   String typeName, int derivationMethod);
 
 public static final int DERIVATION_RESTRICTION = 0x00000001;
 public static final int DERIVATION_EXTENSION   = 0x00000002;
 public static final int DERIVATION_UNION       = 0x00000004;
 public static final int DERIVATION_LIST        = 0x00000008;

}

JAXP TypeInfo provider class

package javax.xml.validation;

public abstract class TypeInfoProvider {

  public abstract TypeInfo getElementTypeInfo();
  public abstract TypeInfo getAttributeTypeInfo(int index);
  public abstract boolean  isIdAttribute(int index);
  public abstract boolean  isSpecified(int index);

}

Program to Report All Types

import java.io.*;
import javax.xml.validation.*;

import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class TypeLister extends DefaultHandler {

    private TypeInfoProvider provider;
    
    public TypeLister(TypeInfoProvider provider) {
        this.provider = provider;
    }

    public static void main(String[] args) throws SAXException, IOException {

        SchemaFactory factory 
         = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
        File schemaLocation = new File("/opt/xml/docbook/xsd/docbook.xsd");
        Schema schema = factory.newSchema(schemaLocation);
    
        ValidatorHandler vHandler = schema.newValidatorHandler();
        TypeInfoProvider provider = vHandler.getTypeInfoProvider();
        ContentHandler   cHandler = new TypeLister(provider);
        vHandler.setContentHandler(cHandler);
        
        XMLReader parser = XMLReaderFactory.createXMLReader();
        parser.setContentHandler(vHandler);
        parser.parse(args[0]);
        
    }
    
    public void startElement(String namespace, String localName,
      String qualifiedName, Attributes atts) throws SAXException {
        String type = provider.getElementTypeInfo().getTypeName();
        System.out.println(qualifiedName + ": " + type);
    }

}

Output

book: #AnonType_book
title: #AnonType_title
subtitle: #AnonType_subtitle
info: #AnonType_info
copyright: #AnonType_copyright
year: #AnonType_year
holder: #AnonType_holder
author: #AnonType_author
personname: #AnonType_personname
firstname: #AnonType_firstname
othername: #AnonType_othername
surname: #AnonType_surname
personblurb: #AnonType_personblurb
para: #AnonType_para
link: #AnonType_link

Java XML Digital Signatures API

Digital Signatures

W3C/IETF Joint Proposed Recommendation, August 20, 2001
XML Signatures provide:

Integrity
Message authentication
Signer authentication

For data of any type

Not Just for Signing XML

Signed data can be located within the XML that includes the signature or elsewhere.
An enveloped signature is enclosed inside the XML element it signs.
An enveloping signature signs XML data it contains.
A detached signature signs data external to the Signature element, possibly in another document entirely.

Generic Digital Signature Process

The signature processor calculates a hash code for some data using a strong, one-way hash function.
The processor encrypts the hash code using a private key.
The verifier calculates the hash code for the data it's received.
It then decrypts the encrypted hash code using the public key to see if the hash codes match.
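The generic process above can be sketched with the java.security.Signature class, which performs the hashing and the private-key operation in one step. The RSA key size and the SHA256withRSA algorithm are choices made for this illustration, and the class name is invented.

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

class SignVerifyDemo {

    // Signs signedText, then checks that signature against receivedText
    static boolean roundTrip(String signedText, String receivedText) {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            KeyPair keys = generator.generateKeyPair();

            // Signer: hash the data and sign the hash with the private key
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(keys.getPrivate());
            signer.update(signedText.getBytes("UTF-8"));
            byte[] signature = signer.sign();

            // Verifier: hash the received data and compare it, using the
            // public key, with the hash protected by the signature
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(keys.getPublic());
            verifier.update(receivedText.getBytes("UTF-8"));
            return verifier.verify(signature);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("pay 10 dollars", "pay 10 dollars"));
        System.out.println(roundTrip("pay 10 dollars", "pay 99 dollars"));
    }
}
```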

XML Signature Process

The signature processor digests (calculates the hash code for) a data object.
The processor places the digest value in a Reference inside the SignedInfo element of a Signature element.
The processor canonicalizes and digests the SignedInfo element.
The processor cryptographically signs that digest and stores the result in the SignatureValue element.
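
The first step, digesting a data object, can be sketched as follows. This only shows how a Base64 value of the kind found in a DigestValue element is derived from raw bytes with SHA-1; a real XML signature digests the octets produced after any transforms and canonicalization, which this sketch skips. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class DigestValueSketch {

  // SHA-1 digest of the bytes, Base64-encoded as in a DigestValue element
  static String digestValue(byte[] data) {
    try {
      MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
      return Base64.getEncoder().encodeToString(sha1.digest(data));
    } catch (NoSuchAlgorithmException ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    byte[] data = "abc".getBytes(StandardCharsets.UTF_8);
    System.out.println(digestValue(data));
  }
}
```

A 20-byte SHA-1 digest always encodes to 28 Base64 characters, which matches the length of the DigestValue in the detached signature example.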

A Detached Signature

<?xml version='1.0' encoding='UTF-8'?>
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
  <SignedInfo>
    <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2000/WD-xml-c14n-20000119"/>
    <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"/>
    <Reference URI="http://www.cafeconleche.org/slides/hoffman/fundamentals/examples/hotcop.xml">
      <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
      <DigestValue>nvfYilfgN/rICyzhGmjidKCFoC8=</DigestValue>
    </Reference>
  </SignedInfo>
  <SignatureValue>
    hfowa4qdbuMkoZfX1/VXd4UBpIpZMM5+6CElmY7jOIKFqvXq5A5VKw==
  </SignatureValue>
  <KeyInfo>
    <KeyValue>
      <DSAKeyValue>
        <P>
          /X9TgR11EilS30qcLuzk5/YRt1I870QAwx4/gLZRJmlFXUAiUftZPY1Y+r/F9bow9s
          ubVWzXgTuAHTRv8mZgt2uZUKWkn5/oBHsQIsJPu6nX/rfGG/g7V+fGqKYVDwT7g/bT
          xR7DAjVUE1oWkTL2dfOuK2HXKu/yIgMZndFIAcc=
        </P>
        <Q>l2BQjxUjC8yykrmCouuEC/BYHPU=</Q>
        <G>
          9+GghdabPd7LvKtcNrhXuXmUr7v6OuqC+VdMCz0HgmdRWVeOutRZT+ZxBxCBgLRJFn
          Ej6EwoFhO3zwkyjMim4TwWeotUfI0o4KOuHiuzpnWRbqN/C/ohNWLx+2J6ASQ7zKTx
          vqhRkImog9/hWuWfBpKLZl6Ae1UlZAFMO/7PSSo=
        </G>
        <Y>
          6jKpNnmkkWeArsn5Oeeg2njcz+nXdk0f9kZI892ddlR8Lg1aMhPeFTYuoq3I6neFlb
          BjWzuktNZKiXYBfKsSTB8U09dTiJo2ir3HJuY7eW/p89osKMfixPQsp9vQMgzph6Qa
          lY7j4MB7y5ROJYsTr1/fFwmj/yhkHwpbpzed1LE=
        </Y>
      </DSAKeyValue>
    </KeyValue>
    <X509Data>
      <X509IssuerSerial>
        <X509IssuerName>CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US</X509IssuerName>
        <X509SerialNumber>983556890</X509SerialNumber>
      </X509IssuerSerial>
      <X509SubjectName>CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US</X509SubjectName>
      <X509Certificate>
MIIDLzCCAu0CBDqf4xowCwYHKoZIzjgEAwUAMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcg
WW9yazERMA8GA1UEBxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlN
ZXRyb3RlY2gxHjAcBgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDAeFw0wMTAzMDIxODE0NTBa
Fw0wMTA1MzExODE0NTBaMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcgWW9yazERMA8GA1UE
BxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlNZXRyb3RlY2gxHjAc
BgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDCCAbgwggEsBgcqhkjOOAQBMIIBHwKBgQD9f1OB
HXUSKVLfSpwu7OTn9hG3UjzvRADDHj+AtlEmaUVdQCJR+1k9jVj6v8X1ujD2y5tVbNeBO4AdNG/y
ZmC3a5lQpaSfn+gEexAiwk+7qdf+t8Yb+DtX58aophUPBPuD9tPFHsMCNVQTWhaRMvZ1864rYdcq
7/IiAxmd0UgBxwIVAJdgUI8VIwvMspK5gqLrhAvwWBz1AoGBAPfhoIXWmz3ey7yrXDa4V7l5lK+7
+jrqgvlXTAs9B4JnUVlXjrrUWU/mcQcQgYC0SRZxI+hMKBYTt88JMozIpuE8FnqLVHyNKOCjrh4r
s6Z1kW6jfwv6ITVi8ftiegEkO8yk8b6oUZCJqIPf4VrlnwaSi2ZegHtVJWQBTDv+z0kqA4GFAAKB
gQDqMqk2eaSRZ4Cuyfk556DaeNzP6dd2TR/2Rkjz3Z12VHwuDVoyE94VNi6ircjqd4WVsGNbO6S0
1kqJdgF8qxJMHxTT11OImjaKvccm5jt5b+nz2iwox+LE9Cyn29AyDOmHpBqVjuPgwHvLlE4lixOv
X98XCaP/KGQfClunN53UsTALBgcqhkjOOAQDBQADLwAwLAIUODqxsFzS96BjrVA4LVo5FzuWBRMC
FC0xfXxbaJaCJuVqtcBv4bqwV0EX
      </X509Certificate>
    </X509Data>
  </KeyInfo>
</Signature>

Signing a Document: Step 1. Create the Signer Object

XMLSignatureFactory factory = XMLSignatureFactory.getInstance("DOM");

DigestMethod sha1 = factory.newDigestMethod(DigestMethod.SHA1, null);
CanonicalizationMethod inclusive = factory.newCanonicalizationMethod
 (CanonicalizationMethod.INCLUSIVE, (C14NMethodParameterSpec) null);
SignatureMethod rsasha1 
 = factory.newSignatureMethod(SignatureMethod.RSA_SHA1, null);

Transform enveloped = factory.newTransform
  (Transform.ENVELOPED, (TransformParameterSpec) null);
List transforms = Collections.singletonList(enveloped);

// empty string means sign the current, complete document;
// the last two arguments are the optional Type and Id attributes
Reference ref = factory.newReference("", sha1, transforms, null, null);
List references = Collections.singletonList(ref);

SignedInfo signer = factory.newSignedInfo(inclusive, rsasha1, references);

Signing a Document: Step 2. Create the Key

char[] password = "secret".toCharArray();
KeyStore store = KeyStore.getInstance("JKS");
InputStream keys = new FileInputStream("keys.jks");
store.load(keys, password);
KeyStore.PrivateKeyEntry entry = (KeyStore.PrivateKeyEntry) store.getEntry
  ("theKey", new KeyStore.PasswordProtection(password));
X509Certificate cert = (X509Certificate) entry.getCertificate();

KeyInfoFactory keyFactory = factory.getKeyInfoFactory();
List certs = new ArrayList();
certs.add(cert.getSubjectX500Principal().getName());
certs.add(cert);
X509Data data = keyFactory.newX509Data(certs);
List dataList = Collections.singletonList(data);
KeyInfo key = keyFactory.newKeyInfo(dataList);

Signing a Document: Step 3. Sign the Document

Document doc = getDOMDocument( /* wherever you like */ );

DOMSignContext context 
  = new DOMSignContext(entry.getPrivateKey(), doc.getDocumentElement());

XMLSignature signature = factory.newXMLSignature(signer, key);
signature.sign(context);
// The Signature element has now been added to the Document.
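
The three steps can be combined into one self-contained sketch. Several things here are assumptions made so the example runs without a keystore: the key pair is generated on the fly, the public key is embedded as a KeyValue rather than X509Data, and SHA-256 algorithms replace the SHA-1 ones on the slides, because recent JDK releases may reject SHA-1 based signatures by default when validating. The class name EnvelopedSignatureSketch and the tamper flag are hypothetical.

```java
import java.io.StringReader;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.util.Collections;
import javax.xml.crypto.KeySelector;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.DigestMethod;
import javax.xml.crypto.dsig.Reference;
import javax.xml.crypto.dsig.SignedInfo;
import javax.xml.crypto.dsig.Transform;
import javax.xml.crypto.dsig.XMLSignature;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.dom.DOMSignContext;
import javax.xml.crypto.dsig.dom.DOMValidateContext;
import javax.xml.crypto.dsig.keyinfo.KeyInfo;
import javax.xml.crypto.dsig.keyinfo.KeyInfoFactory;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;
import javax.xml.crypto.dsig.spec.TransformParameterSpec;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class EnvelopedSignatureSketch {

  // Signs the document with an enveloped signature, optionally changes
  // the signed content afterwards, and reports whether validation succeeds.
  static boolean signThenValidate(String xml, boolean tamper) {
    try {
      DocumentBuilderFactory builders = DocumentBuilderFactory.newInstance();
      builders.setNamespaceAware(true);
      Document doc = builders.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));

      // Assumption: a throwaway key pair instead of a keystore entry
      KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
      generator.initialize(2048);
      KeyPair pair = generator.generateKeyPair();

      // Step 1: describe what is signed and how
      XMLSignatureFactory factory = XMLSignatureFactory.getInstance("DOM");
      Transform enveloped = factory.newTransform(
          Transform.ENVELOPED, (TransformParameterSpec) null);
      Reference ref = factory.newReference("",
          factory.newDigestMethod(DigestMethod.SHA256, null),
          Collections.singletonList(enveloped), null, null);
      SignedInfo signedInfo = factory.newSignedInfo(
          factory.newCanonicalizationMethod(
              CanonicalizationMethod.INCLUSIVE, (C14NMethodParameterSpec) null),
          // RSA with SHA-256, given by its algorithm URI
          factory.newSignatureMethod(
              "http://www.w3.org/2001/04/xmldsig-more#rsa-sha256", null),
          Collections.singletonList(ref));

      // Step 2: publish the public key as a KeyValue
      KeyInfoFactory keyFactory = factory.getKeyInfoFactory();
      KeyInfo keyInfo = keyFactory.newKeyInfo(
          Collections.singletonList(keyFactory.newKeyValue(pair.getPublic())));

      // Step 3: sign; the Signature element is appended to the root element
      factory.newXMLSignature(signedInfo, keyInfo)
          .sign(new DOMSignContext(pair.getPrivate(), doc.getDocumentElement()));

      if (tamper) {
        // Change the text of the first child element after signing
        doc.getDocumentElement().getFirstChild().setTextContent("changed");
      }

      Node signatureNode
          = doc.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature").item(0);
      DOMValidateContext context = new DOMValidateContext(
          KeySelector.singletonKeySelector(pair.getPublic()), signatureNode);
      return factory.unmarshalXMLSignature(context).validate(context);
    } catch (Exception ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    String xml = "<doc><msg>hello</msg></doc>";
    System.out.println("untouched: " + signThenValidate(xml, false));
    System.out.println("tampered:  " + signThenValidate(xml, true));
  }
}
```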

Verifying a Signature

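NodeList nodes 
  = doc.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature");
// X509KeySelector is an application-supplied subclass of
// javax.xml.crypto.KeySelector; it is not part of the JDK.
DOMValidateContext dvc 
  = new DOMValidateContext(new X509KeySelector(), nodes.item(0));
XMLSignature signature = factory.unmarshalXMLSignature(dvc);
// validate() returns true when the signature checks out
if (!signature.validate(dvc)) {
  System.err.println(
    "Signature failed! Document may have been tampered with.");
}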
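
The verification code needs a KeySelector, and the JDK leaves that class to the application. As a rough sketch of what one looks like, here is a selector that simply returns the public key found in a KeyValue element. The class name KeyValueKeySelector and the roundTrips() helper are hypothetical. Note the limitation: trusting whatever key the document carries demonstrates integrity only, not who signed it; a production selector would check the key or certificate against something already trusted.

```java
import java.security.Key;
import java.security.KeyException;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.util.Collections;
import javax.xml.crypto.AlgorithmMethod;
import javax.xml.crypto.KeySelector;
import javax.xml.crypto.KeySelectorException;
import javax.xml.crypto.KeySelectorResult;
import javax.xml.crypto.XMLCryptoContext;
import javax.xml.crypto.dsig.keyinfo.KeyInfo;
import javax.xml.crypto.dsig.keyinfo.KeyInfoFactory;
import javax.xml.crypto.dsig.keyinfo.KeyValue;

class KeyValueKeySelector extends KeySelector {

  // Returns the first public key found in a KeyValue child of KeyInfo
  @Override
  public KeySelectorResult select(KeyInfo keyInfo, KeySelector.Purpose purpose,
      AlgorithmMethod method, XMLCryptoContext context)
      throws KeySelectorException {
    if (keyInfo == null) {
      throw new KeySelectorException("No KeyInfo element");
    }
    for (Object item : keyInfo.getContent()) {
      if (item instanceof KeyValue) {
        try {
          final PublicKey key = ((KeyValue) item).getPublicKey();
          return new KeySelectorResult() {
            public Key getKey() {
              return key;
            }
          };
        } catch (KeyException ex) {
          throw new KeySelectorException(ex);
        }
      }
    }
    throw new KeySelectorException("No KeyValue element found");
  }

  // Demo helper: wraps a fresh public key in a KeyInfo and checks
  // that the selector hands the same key back
  static boolean roundTrips() {
    try {
      PublicKey original
          = KeyPairGenerator.getInstance("RSA").generateKeyPair().getPublic();
      KeyInfoFactory keyFactory = KeyInfoFactory.getInstance("DOM");
      KeyInfo info = keyFactory.newKeyInfo(
          Collections.singletonList(keyFactory.newKeyValue(original)));
      Key selected = new KeyValueKeySelector()
          .select(info, KeySelector.Purpose.VERIFY, null, null).getKey();
      return original.equals(selected);
    } catch (Exception ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println("selector returned the embedded key: " + roundTrips());
  }
}
```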

StAX

XML API Styles

Push: SAX, XNI
Tree: DOM, JDOM, XOM, ElectricXML, dom4j, Sparta
Data binding: Castor, Zeus, JAXB
Pull: XMLPULL, StAX, NekoPull
Transform: XSLT, TrAX, XQuery

Pull Parsing

pull parsing is the way to go in the future. The first 3 XML parsers (Lark, NXP, and expat) all were event-driven because... er well that was 1996, can't exactly remember, seemed like a good idea at the time.

--Tim Bray on the xml-dev mailing list, Wednesday, September 18, 2002

Pull Parsing is

Fast
Memory efficient
Streamable
Read-only

StAX

Streaming API for XML
Package: javax.xml.stream
JSR-173, proposed by BEA Systems:
Two recently proposed JSRs, JAXB and JAX-RPC, highlight the need for an XML Streaming API. Both data binding and remote procedure calling (RPC) require processing of XML as a stream of events, where the current context of the XML defines subsequent processing of the XML. A streaming API makes this type of code much more natural to write than SAX, and much more efficient than DOM.
Goals:
- Develop APIs and conventions that allow a user to programmatically pull parse events from an XML input stream.
- Develop APIs that allow a user to write events to an XML output stream.
- Develop a set of objects and interfaces that encapsulate the information contained in an XML stream.
The specification should be easy to use, efficient, and not require a grammar. It should include support for namespaces, and associated XML constructs. The specification will make reasonable efforts to define APIs that are "pluggable".
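
The second goal, writing events to an XML output stream, is covered by XMLStreamWriter in the same package. A minimal sketch follows; the class name StreamWriterSketch and the SONG/TITLE/ARTIST element names (borrowed from hotcop.xml) are choices made for this illustration only.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StreamWriterSketch {

  // Writes a tiny SONG document, one output event per method call
  static String writeSong(String title, String artist) {
    try {
      StringWriter out = new StringWriter();
      XMLStreamWriter writer
          = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
      writer.writeStartDocument();
      writer.writeStartElement("SONG");
      writer.writeStartElement("TITLE");
      writer.writeCharacters(title);   // reserved characters are escaped
      writer.writeEndElement();
      writer.writeStartElement("ARTIST");
      writer.writeCharacters(artist);
      writer.writeEndElement();
      writer.writeEndElement();        // closes SONG
      writer.writeEndDocument();
      writer.close();
      return out.toString();
    } catch (XMLStreamException ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(writeSong("Hot Cop", "Village People"));
  }
}
```

The writer does not indent or add line breaks on its own; the exact form of the XML declaration it emits depends on the implementation.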

Major Classes and Interfaces

XMLStreamReader: an interface that represents the parser
XMLInputFactory: the factory class that instantiates an implementation-dependent implementation of XMLStreamReader
XMLStreamException: the generic class for everything other than an IOException that might go wrong when parsing an XML document, particularly well-formedness errors

Simple Wellformedness Checker

import javax.xml.stream.*;
import java.net.*;
import java.io.*;

 
public class StAXChecker {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java StAXChecker url" );
      return;   
    }
        
    try {

      InputStream in;
      try {
        URL u = new URL(args[0]);
        in = u.openStream();
      }
      catch (MalformedURLException ex) {
          // Maybe it's a file name
          in = new FileInputStream(args[0]);
      }
      
      XMLInputFactory factory = XMLInputFactory.newInstance();
      XMLStreamReader parser = factory.createXMLStreamReader(in);
        
      while (true) {
           int event = parser.next();
           if (event == XMLStreamConstants.END_DOCUMENT) {
                break;
           }
      }
      parser.close();
            
      // If we get here there are no exceptions
      System.out.println(args[0] + " is well-formed");      
    }
    catch (XMLStreamException ex) {
       System.out.println(args[0] + " is not well-formed"); 
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println(args[0] + " could not be checked due to an " 
       + ex.getClass().getName());   
      ex.printStackTrace();      
    }
        
  }

}

Output from a Simple Wellformedness Checker

$ java -classpath stax.jar:.:bea.jar StAXChecker http://www.cafeconleche.org/
http://www.cafeconleche.org/ is well-formed
$ java -classpath stax.jar:.:bea.jar StAXChecker http://www.xml.com/
http://www.xml.com/ is not well-formed
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[44,7]
Message: could not resolve entity named 'nbsp'

Event Codes

The XMLStreamConstants interface defines the int event codes returned by next() to tell you what kind of node the parser read.
15 event codes in the final API (the START_ENTITY and END_ENTITY codes from early drafts were dropped):
- XMLStreamConstants.START_DOCUMENT
- XMLStreamConstants.END_DOCUMENT
- XMLStreamConstants.START_ELEMENT
- XMLStreamConstants.END_ELEMENT
- XMLStreamConstants.ATTRIBUTE
- XMLStreamConstants.CHARACTERS
- XMLStreamConstants.CDATA
- XMLStreamConstants.SPACE
- XMLStreamConstants.PROCESSING_INSTRUCTION
- XMLStreamConstants.COMMENT
- XMLStreamConstants.ENTITY_REFERENCE
- XMLStreamConstants.NOTATION_DECLARATION
- XMLStreamConstants.ENTITY_DECLARATION
- XMLStreamConstants.NAMESPACE
- XMLStreamConstants.DTD
Depending on the current event, different methods are available on the XMLStreamReader.
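
Since the codes are plain ints, a switch is a natural way to turn them into something readable. The sketch below is an illustration under assumptions: the class name EventNames and both helper methods are invented for the example, and it reads from a string rather than a URL.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class EventNames {

  // Maps an int event code to the name of its XMLStreamConstants field
  static String name(int event) {
    switch (event) {
      case XMLStreamConstants.START_DOCUMENT: return "START_DOCUMENT";
      case XMLStreamConstants.END_DOCUMENT: return "END_DOCUMENT";
      case XMLStreamConstants.START_ELEMENT: return "START_ELEMENT";
      case XMLStreamConstants.END_ELEMENT: return "END_ELEMENT";
      case XMLStreamConstants.ATTRIBUTE: return "ATTRIBUTE";
      case XMLStreamConstants.CHARACTERS: return "CHARACTERS";
      case XMLStreamConstants.CDATA: return "CDATA";
      case XMLStreamConstants.SPACE: return "SPACE";
      case XMLStreamConstants.PROCESSING_INSTRUCTION:
        return "PROCESSING_INSTRUCTION";
      case XMLStreamConstants.COMMENT: return "COMMENT";
      case XMLStreamConstants.ENTITY_REFERENCE: return "ENTITY_REFERENCE";
      case XMLStreamConstants.NOTATION_DECLARATION:
        return "NOTATION_DECLARATION";
      case XMLStreamConstants.ENTITY_DECLARATION: return "ENTITY_DECLARATION";
      case XMLStreamConstants.NAMESPACE: return "NAMESPACE";
      case XMLStreamConstants.DTD: return "DTD";
      default: return "UNKNOWN(" + event + ")";
    }
  }

  // Lists the events of a document, starting with the reader's initial state
  static List<String> events(String xml) {
    try {
      XMLStreamReader parser = XMLInputFactory.newInstance()
          .createXMLStreamReader(new StringReader(xml));
      List<String> names = new ArrayList<String>();
      // The reader starts out positioned on START_DOCUMENT
      names.add(name(parser.getEventType()));
      while (parser.hasNext()) {
        names.add(name(parser.next()));
      }
      parser.close();
      return names;
    } catch (XMLStreamException ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    for (String event : events("<a><!--c-->text<?pi data?></a>")) {
      System.out.println(event);
    }
  }
}
```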

Listening to Events

import javax.xml.stream.*;
import java.net.*;
import java.io.*;

 
public class EventLister {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java EventLister url" );
     return;    
    }
        
    try {

      InputStream in;
      try {
        URL u = new URL(args[0]);
        in = u.openStream();
      }
      catch (MalformedURLException ex) {
          // Maybe it's a file name
          in = new FileInputStream(args[0]);
      }

      XMLInputFactory factory = XMLInputFactory.newInstance();
      XMLStreamReader parser = factory.createXMLStreamReader(in);
        
      while (true) {
         int event = parser.next();
         if (event == XMLStreamConstants.START_ELEMENT) {
             System.out.println("Start tag");
         }
         else if (event == XMLStreamConstants.END_ELEMENT) {
             System.out.println("End tag");
         }
         else if (event == XMLStreamConstants.START_DOCUMENT) {
             System.out.println("Start document");
         }
         else if (event == XMLStreamConstants.CHARACTERS) {
             System.out.println("Text");
         }
         else if (event == XMLStreamConstants.CDATA) {
             System.out.println("CDATA Section");
         }
         else if (event == XMLStreamConstants.COMMENT) {
             System.out.println("Comment");
         }
         else if (event == XMLStreamConstants.DTD) {
             System.out.println("Document type declaration");
         }
         else if (event == XMLStreamConstants.ENTITY_REFERENCE) {
             System.out.println("Entity Reference");
         }
         else if (event == XMLStreamConstants.SPACE) {
             System.out.println("Ignorable white space");
         }
         else if (event == XMLStreamConstants.NOTATION_DECLARATION) {
             System.out.println("Notation Declaration");
         }
         else if (event == XMLStreamConstants.ENTITY_DECLARATION) {
             System.out.println("Entity Declaration");
         }
         else if (event == XMLStreamConstants.PROCESSING_INSTRUCTION) {
             System.out.println("Processing Instruction");
         }
         else if (event == XMLStreamConstants.END_DOCUMENT) {
             System.out.println("End Document");
             break;
         }
      }           
    }
    catch (XMLStreamException ex) {
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println("IOException while parsing " + args[0]);   
    }
        
  }

}

Output from EventLister

% java -classpath stax.jar:.:bea.jar EventLister hotcop.xml
Ignorable white space
Start tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
End tag
Ignorable white space
End Document

XMLStreamReader getter methods depend on the current state

Invokable methods
Event Type	Valid Methods
START_ELEMENT	next(), getName(), getLocalName(), hasName(), getPrefix(), getAttributeCount(), getAttributeName(int index), getAttributeNamespace(int index), getAttributePrefix(int index), getAttributeLocalName(int index), getAttributeType(int index), getAttributeValue(int index), getAttributeValue(String namespaceURI, String localName), isAttributeSpecified(int index), getNamespaceContext(), getNamespaceCount(), getNamespacePrefix(int index), getNamespaceURI(), getNamespaceURI(int index), getNamespaceURI(String prefix), getElementText(), nextTag()
ATTRIBUTE	next(), nextTag(), getAttributeCount(), getAttributeName(int index), getAttributeNamespace(int index), getAttributePrefix(int index), getAttributeLocalName(int index), getAttributeType(int index), getAttributeValue(int index), getAttributeValue(String namespaceURI, String localName), isAttributeSpecified(int index)
NAMESPACE	next(), nextTag(), getNamespaceContext(), getNamespaceCount(), getNamespacePrefix(int index), getNamespaceURI(), getNamespaceURI(int index), getNamespaceURI(String prefix)
END_ELEMENT	next(), getName(), getLocalName(), hasName(), getPrefix(), getNamespaceContext(), getNamespaceCount(), getNamespacePrefix(int index), getNamespaceURI(), getNamespaceURI(int index), getNamespaceURI(String prefix), nextTag()
CHARACTERS	next(), getText(), getTextCharacters(), getTextCharacters(int sourceStart, char[] target, int targetStart, int length), getTextLength(), nextTag()
CDATA	next(), getText(), getTextCharacters(), getTextCharacters(int sourceStart, char[] target, int targetStart, int length), getTextLength(), nextTag()
COMMENT	next(), getText(), getTextCharacters(), getTextCharacters(int sourceStart, char[] target, int targetStart, int length), getTextLength(), nextTag()
SPACE	next(), getText(), getTextCharacters(), getTextCharacters(int sourceStart, char[] target, int targetStart, int length), getTextLength(), nextTag()
START_DOCUMENT	next(), getEncoding(), getPrefix(), getVersion(), isStandalone(), standaloneSet(), getCharacterEncodingScheme(), nextTag()
END_DOCUMENT	close()
PROCESSING_INSTRUCTION	next(), getPITarget(), getPIData(), nextTag()
ENTITY_REFERENCE	next(), getLocalName(), getText(), nextTag()
DTD	next(), getText(), nextTag()

getText()

The getText() method returns the text of the current event:
```
public String getText()
```
It works for CHARACTERS, CDATA, SPACE, ENTITY_REFERENCE, DTD, and COMMENT events.
It is not guaranteed to return a maximal contiguous run of text unless the javax.xml.stream.isCoalescing property is true.
For all other event types it throws a java.lang.IllegalStateException.
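
To see what coalescing changes, the following sketch collects the text reported by getText() with the javax.xml.stream.isCoalescing property switched on or off. The class name CoalescingSketch and the sample document are assumptions for illustration; how many pieces a non-coalescing parser reports is implementation dependent, so the sketch makes no promise about that case beyond the concatenated text being the same.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CoalescingSketch {

  // Returns one list entry per text-bearing event the parser reports
  static List<String> textEvents(String xml, boolean coalesce) {
    try {
      XMLInputFactory factory = XMLInputFactory.newInstance();
      factory.setProperty(XMLInputFactory.IS_COALESCING,
          Boolean.valueOf(coalesce));
      XMLStreamReader parser
          = factory.createXMLStreamReader(new StringReader(xml));
      List<String> runs = new ArrayList<String>();
      while (parser.hasNext()) {
        int event = parser.next();
        if (event == XMLStreamConstants.CHARACTERS
            || event == XMLStreamConstants.CDATA
            || event == XMLStreamConstants.SPACE) {
          runs.add(parser.getText());
        }
      }
      parser.close();
      return runs;
    } catch (XMLStreamException ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    String xml = "<a>one<![CDATA[two]]>three</a>";
    System.out.println("coalescing:     " + textEvents(xml, true));
    System.out.println("not coalescing: " + textEvents(xml, false));
  }
}
```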

getText() Example

import javax.xml.stream.*;
import java.net.*;
import java.io.*;

 
public class EventText {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java EventText url" );
      return;    
    }
        
    try {

      InputStream in;
      try {
        URL u = new URL(args[0]);
        in = u.openStream();
      }
      catch (MalformedURLException ex) {
          // Maybe it's a file name
          in = new FileInputStream(args[0]);
      }

      XMLInputFactory factory = XMLInputFactory.newInstance();
      XMLStreamReader parser = factory.createXMLStreamReader(in);
        
      for (int event = parser.next(); 
           event != XMLStreamConstants.END_DOCUMENT; 
           event = parser.next()) {
         if (event == XMLStreamConstants.CHARACTERS 
           || event == XMLStreamConstants.SPACE 
           || event == XMLStreamConstants.CDATA) {
             System.out.println(parser.getText());
         }
         else if (event == XMLStreamConstants.COMMENT) {
             System.out.println("<!-- " + parser.getText() + "-->");
         }
      }           
    }
    catch (XMLStreamException ex) {
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println("IOException while parsing " + args[0]);   
    }
        
  }

}

Output

$ java -classpath stax.jar:.:bea.jar EventText hotcop.xml




Hot Cop


Jacques Morali


Henri Belolo


Victor Willis


Jacques Morali


PolyGram Records


6:20


1978


Village People

isFoo() and hasFoo()

Rather than testing for type, it's sometimes useful to ask if the current event can be queried for a certain characteristic:

public boolean isStartElement()
public boolean isEndElement()
public boolean isCharacters()
public boolean isWhiteSpace()
public boolean hasText()
public boolean hasName()
public boolean hasNext()
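
A small sketch of these predicates in use, counting tags and gathering non-whitespace text without comparing event codes at all. The class name PredicateSketch and the summary format are invented for this example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PredicateSketch {

  // Counts start-tags and end-tags and collects text that is not
  // purely white space, using only the isFoo() methods
  static String summarize(String xml) {
    try {
      XMLStreamReader parser = XMLInputFactory.newInstance()
          .createXMLStreamReader(new StringReader(xml));
      int starts = 0;
      int ends = 0;
      StringBuilder text = new StringBuilder();
      while (parser.hasNext()) {
        parser.next();
        if (parser.isStartElement()) {
          starts++;
        }
        else if (parser.isEndElement()) {
          ends++;
        }
        else if (parser.isCharacters() && !parser.isWhiteSpace()) {
          text.append(parser.getText());
        }
      }
      parser.close();
      return starts + " start, " + ends + " end, text=" + text;
    } catch (XMLStreamException ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(summarize("<a> <b>hi</b> </a>"));
  }
}
```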

hasText() Example

import javax.xml.stream.*;
import java.net.*;
import java.io.*;

 
public class SimplerEventText {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java SimplerEventText url" );
      return;    
    }
        
    try {

      InputStream in;
      try {
        URL u = new URL(args[0]);
        in = u.openStream();
      }
      catch (MalformedURLException ex) {
          // Maybe it's a file name
          in = new FileInputStream(args[0]);
      }

      XMLInputFactory factory = XMLInputFactory.newInstance();
      XMLStreamReader parser = factory.createXMLStreamReader(in);
        
      for (int event = parser.next(); 
           parser.hasNext(); 
           event = parser.next()) {
         if (parser.hasText()) {
             System.out.println(parser.getText());
         }
      }           
    }
    catch (XMLStreamException ex) {
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println("IOException while parsing " + args[0]);   
    }
        
  }

}

More efficient way of getting text

public char[] getTextCharacters() 
public int    getTextStart()
public int    getTextLength()

The char array returned may be reused, and is good only until the next call to next()

import javax.xml.stream.*;
import java.net.*;
import java.io.*;

 
public class EfficientEventText {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java EfficientEventText url" );
      return;    
    }
        
    try {

      InputStream in;
      try {
        URL u = new URL(args[0]);
        in = u.openStream();
      }
      catch (MalformedURLException ex) {
          // Maybe it's a file name
          in = new FileInputStream(args[0]);
      }

      XMLInputFactory factory = XMLInputFactory.newInstance();
      XMLStreamReader parser = factory.createXMLStreamReader(in);
        
      Writer out = new OutputStreamWriter(System.out);
      for (int event = parser.next(); 
           event != XMLStreamConstants.END_DOCUMENT; 
           event = parser.next()) {
         if (event == XMLStreamConstants.CHARACTERS 
           || event == XMLStreamConstants.SPACE 
           || event == XMLStreamConstants.CDATA) {
             out.write(parser.getTextCharacters(), 
              parser.getTextStart(), parser.getTextLength());
         }
      }          
      out.flush();
      out.close();
    }
    catch (XMLStreamException ex) {
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println("IOException while parsing " + args[0]);   
    }
        
  }

}

Reusable Text Arrays

public int getTextCharacters(int sourceStart, char[] target, int targetStart, int length)
  throws XMLStreamException, IndexOutOfBoundsException, 
         UnsupportedOperationException, IllegalStateException
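
This four-argument form copies text into a buffer the caller owns, so one array can be reused across events. A minimal sketch of the usual loop, assuming a deliberately tiny buffer to force several copies; the class name ChunkedTextSketch and the bufferSize parameter are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class ChunkedTextSketch {

  // Copies all character data out of the document through one
  // caller-owned buffer of the given size
  static String copyText(String xml, int bufferSize) {
    try {
      XMLInputFactory factory = XMLInputFactory.newInstance();
      factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
      XMLStreamReader parser
          = factory.createXMLStreamReader(new StringReader(xml));
      char[] buffer = new char[bufferSize];
      StringBuilder result = new StringBuilder();
      while (parser.hasNext()) {
        if (parser.next() == XMLStreamConstants.CHARACTERS) {
          for (int sourceStart = 0; ; sourceStart += bufferSize) {
            int copied = parser.getTextCharacters(
                sourceStart, buffer, 0, bufferSize);
            result.append(buffer, 0, copied);
            // A short copy means the text of this event is exhausted
            if (copied < bufferSize) break;
          }
        }
      }
      parser.close();
      return result.toString();
    } catch (XMLStreamException ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(copyText("<t>abcdefghij</t>", 4));
  }
}
```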

Names

If the event is START_ELEMENT or END_ELEMENT, then the following methods in XMLStreamReader also work:

public String getLocalName()
public String getPrefix()
public QName getName()

getLocalName() returns the local (unprefixed) name of the element.
getName() returns a QName object for this element.
getPrefix() returns the prefix of the element, or null if the element does not have a prefix.

QName Class

package javax.xml.namespace;

public class QName {

    public QName(String localPart);
    public QName(String namespaceURI, String localPart);
    public QName(String namespaceURI, String localPart, String prefix);
    
    public String getLocalPart();
    public String getPrefix();
    public String getNamespaceURI();
    
    public static QName valueOf(String qNameAsString);

    public int     hashCode();
    public boolean equals(Object object);
    public String  toString();

}
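
A short sketch of how QName behaves in practice. Two points are worth seeing run: equality looks only at the namespace URI and local part, not the prefix, and toString() uses the {namespaceURI}localPart form that valueOf() parses back. The class name QNameSketch and the qualified() helper are inventions for this example.

```java
import javax.xml.namespace.QName;

class QNameSketch {

  // Rebuilds the prefix:local form that appears in the document
  static String qualified(QName name) {
    String prefix = name.getPrefix();
    if (prefix == null || prefix.length() == 0) {
      return name.getLocalPart();
    }
    return prefix + ":" + name.getLocalPart();
  }

  public static void main(String[] args) {
    QName a = new QName("http://www.w3.org/1999/xlink", "href", "xlink");
    QName b = new QName("http://www.w3.org/1999/xlink", "href", "xl");

    // The prefix is ignored when comparing names
    System.out.println("equal despite prefixes: " + a.equals(b));

    // toString() and valueOf() round-trip the namespace and local part
    System.out.println("toString: " + a);
    System.out.println("round trip: " + QName.valueOf(a.toString()).equals(a));

    System.out.println("qualified: " + qualified(a));
  }
}
```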

Names Example

import javax.xml.stream.*;
import java.net.*;
import java.io.*;

 
public class NamePrinter {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java NamePrinter url" );
      return;   
    }
        
    try {

      InputStream in;
      try {
        URL u = new URL(args[0]);
        in = u.openStream();
      }
      catch (MalformedURLException ex) {
          // Maybe it's a file name
          in = new FileInputStream(args[0]);
      }
      
      XMLInputFactory factory = XMLInputFactory.newInstance();
      XMLStreamReader parser = factory.createXMLStreamReader(in);
              
      while (true) {
         int event = parser.next();
         if (event == XMLStreamConstants.START_ELEMENT) {
             System.out.println("Start tag: ");
             printEvent(parser);
         }
         else if (event == XMLStreamConstants.END_ELEMENT) {
             System.out.println("End tag");
             printEvent(parser);
         }
         else if (event == XMLStreamConstants.START_DOCUMENT) {
             System.out.println("Start document");
         }
         else if (event == XMLStreamConstants.CHARACTERS) {
             System.out.println("Text");
             printEvent(parser);
         }
         else if (event == XMLStreamConstants.CDATA) {
             System.out.println("CDATA Section");
             printEvent(parser);
         }
         else if (event == XMLStreamConstants.COMMENT) {
             System.out.println("Comment");
             printEvent(parser);
         }
         else if (event == XMLStreamConstants.DTD) {
             System.out.println("Document type declaration");
             printEvent(parser);
         }
         else if (event == XMLStreamConstants.ENTITY_REFERENCE) {
             System.out.println("Entity Reference");
             printEvent(parser);
         }
         else if (event == XMLStreamConstants.SPACE) {
             System.out.println("Ignorable white space");
             printEvent(parser);
         }
         else if (event == XMLStreamConstants.PROCESSING_INSTRUCTION) {
             System.out.println("Processing Instruction");
             printEvent(parser);
         }
         else if (event == XMLStreamConstants.END_DOCUMENT) {
             System.out.println("End Document");
             break;
         } // end else if
      }  // end while
    }
    catch (XMLStreamException ex) {
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println("IOException while parsing " + args[0]);   
      ex.printStackTrace();
    }
        
  }
  
  private static void printEvent(XMLStreamReader parser) {
      // The name accessors are only defined for start-tags and end-tags;
      // some implementations throw IllegalStateException for other events.
      if (parser.hasName()) {
          String localName = parser.getLocalName();
          String prefix = parser.getPrefix();
          String uri = parser.getNamespaceURI();
      
          if (localName != null) System.out.println("\tLocal name: " + localName);
          if (prefix != null) System.out.println("\tPrefix: " + prefix);
          if (uri != null) System.out.println("\tNamespace URI: " + uri);
      }
      System.out.println();
  }

}

Names Example Output

$ java -classpath .:bea.jar:stax.jar NamePrinter hotcop.xml
Ignorable white space

Start tag: 
        Local name: SONG
        Namespace URI: 

Text

Start tag: 
        Local name: TITLE
        Namespace URI: 

Text

End tag
        Local name: TITLE
        Namespace URI: 

Text

Start tag: 
        Local name: COMPOSER
        Namespace URI: 

Text

End tag
        Local name: COMPOSER
        Namespace URI: 

Text

Start tag: 
        Local name: COMPOSER
        Namespace URI: 

Text

End tag
        Local name: COMPOSER
        Namespace URI: 

Text

Start tag: 
        Local name: COMPOSER
        Namespace URI: 

Text

End tag
        Local name: COMPOSER
        Namespace URI: 

Text

Start tag: 
        Local name: PRODUCER
        Namespace URI: 

Text

End tag
        Local name: PRODUCER
        Namespace URI: 

Text

Start tag: 
        Local name: PUBLISHER
        Namespace URI: 

Text

End tag
        Local name: PUBLISHER
        Namespace URI: 

Text

Start tag: 
        Local name: LENGTH
        Namespace URI: 

Text

End tag
        Local name: LENGTH
        Namespace URI: 

Text

Start tag: 
        Local name: YEAR
        Namespace URI: 

Text

End tag
        Local name: YEAR
        Namespace URI: 

Text

Start tag: 
        Local name: ARTIST
        Namespace URI: 

Text

End tag
        Local name: ARTIST
        Namespace URI: 

Text

End tag
        Local name: SONG
        Namespace URI: 

Ignorable white space

End Document

RSSLister

import javax.xml.stream.*;
import java.net.*;
import java.io.*;

 
public class RSSLister {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java RSSLister url" );
      return;    
    }
        
    try {

      InputStream in;
      try {
        URL u = new URL(args[0]);
        in = u.openStream();
      }
      catch (MalformedURLException ex) {
          // Maybe it's a file name
          in = new FileInputStream(args[0]);
      }

      XMLInputFactory factory = XMLInputFactory.newInstance();
      XMLStreamReader parser = factory.createXMLStreamReader(in);
        
      boolean printing = false;
      for (int event = parser.next(); 
           parser.hasNext(); 
           event = parser.next()) {
         if (event == XMLStreamConstants.START_ELEMENT) {
             String name = parser.getLocalName();
             if (name.equals("title")) printing = true;
         }
         else if (event == XMLStreamConstants.END_ELEMENT) {
             String name = parser.getLocalName();
             if (name.equals("title")) printing = false;
         }
         else if (parser.hasText() && event != XMLStreamConstants.COMMENT) {
             if (printing) System.out.println(parser.getText());
         }
      }  
      parser.close();
         
    }
    catch (XMLStreamException ex) {
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println("IOException while parsing " + args[0]);   
    }
        
  }

}

RSSLister Output

$ java -classpath stax.jar:.:bea.jar RSSLister ananova.rss
Ananova:
Archeology
Powered by News Is Free
Britain's earliest leprosy victim may have been found
20th anniversary of Mary Rose recovery
'Proof of Jesus' burial box damaged on way to Canada
Remains of four woolly rhinos give new insight into Ice Age
Experts solve crop lines mystery

Improved RSSLister

Print only item titles:

import javax.xml.stream.*;
import java.net.*;
import java.io.*;

 
public class BetterRSSLister {

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java BetterRSSLister url" );
      return;    
    }
        
    try {

      InputStream in;
      try {
        URL u = new URL(args[0]);
        in = u.openStream();
      }
      catch (MalformedURLException ex) {
          // Maybe it's a file name
          in = new FileInputStream(args[0]);
      }

      XMLInputFactory factory = XMLInputFactory.newInstance();
      XMLStreamReader parser = factory.createXMLStreamReader(in);
        
      boolean inItem = false;
      boolean inTitle = false;
      // I am relying on no recursion here. To fix this
      // just keep an int count rather than a boolean
      for (int event = parser.nextTag(); 
           parser.hasNext(); 
           event = parser.next()) {
         if (event == XMLStreamConstants.START_ELEMENT) {
             String name = parser.getLocalName();
             if (name.equals("title")) inTitle = true;
             else if (name.equals("item")) inItem = true;
         }
         else if (event == XMLStreamConstants.END_ELEMENT) {
             String name = parser.getLocalName();
             if (name.equals("title")) inTitle = false;
             else if (name.equals("item")) inItem = false;
          }
         else if (parser.hasText() && event != XMLStreamConstants.COMMENT) {
             if (inItem && inTitle) System.out.println(parser.getText());
         }
      }  
      parser.close();
      
    }
    catch (XMLStreamException ex) {
       System.out.println(ex);  
    }
    catch (IOException ex) {
      System.out.println("IOException while parsing " + args[0]);   
    }
        
  }

}

BetterRSSLister Output

$ java -classpath stax.jar:.:bea.jar BetterRSSLister ananova.rss
Archeology
Powered by News Is Free
Britain's earliest leprosy victim may have been found
20th anniversary of Mary Rose recovery
'Proof of Jesus' burial box damaged on way to Canada
Remains of four woolly rhinos give new insight into Ice Age
Experts solve crop lines mystery

The nextTag() method

Skips ahead to the next start-tag or end-tag
Ignores comments, processing instructions, and boundary white space
Throws an XMLStreamException if it encounters non-whitespace text
Useful for processing record-like XML
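
A sketch of nextTag() on record-like data, paired with getElementText() to read each field. The people/person/name/age vocabulary and the class name NextTagSketch are made up for the example, and the code assumes every person has exactly a name followed by an age.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class NextTagSketch {

  // Reads <person><name>..</name><age>..</age></person> records,
  // letting nextTag() skip the white space and comments between tags
  static List<String> readPeople(String xml) {
    try {
      XMLStreamReader parser = XMLInputFactory.newInstance()
          .createXMLStreamReader(new StringReader(xml));
      List<String> people = new ArrayList<String>();
      parser.nextTag();                              // <people>
      while (parser.nextTag() == XMLStreamConstants.START_ELEMENT) {
        // now on <person>
        parser.nextTag();                            // <name>
        String name = parser.getElementText();       // leaves us on </name>
        parser.nextTag();                            // <age>
        String age = parser.getElementText();        // leaves us on </age>
        parser.nextTag();                            // </person>
        people.add(name + " (" + age + ")");
      }
      // the loop ends when nextTag() lands on </people>
      parser.close();
      return people;
    } catch (XMLStreamException ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    String xml = "<people>\n"
        + "  <person>\n    <name>Ann</name>\n    <age>30</age>\n  </person>\n"
        + "  <!-- a comment between records -->\n"
        + "  <person><name>Bob</name><age>41</age></person>\n"
        + "</people>";
    System.out.println(readPeople(xml));
  }
}
```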

Attributes

These methods are invokable when the event type is START_ELEMENT:

  
  public int     getAttributeCount()
  public String  getAttributeNamespace(int index)
  public String  getAttributeLocalName(int index)
  public QName   getAttributeName(int index)
  public String  getAttributePrefix(int index)
  public String  getAttributeType(int index)
  public boolean isAttributeSpecified(int index)
  public String  getAttributeValue(int index)
  public String  getAttributeValue(String namespace, String name)

By default, xmlns and xmlns:prefix attributes are not reported as attributes.
If the javax.xml.stream.isNamespaceAware property is false, xmlns and xmlns:prefix attributes are reported like any other attribute.
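
A sketch of walking the attributes of a start-tag by index. It uses getAttributeLocalName(int), which is the name this accessor has in the javax.xml.stream package bundled with Java 6. The class name AttributeSketch and the sample element are assumptions; the prefix check allows for either null or the empty string on unprefixed attributes, since the sketch does not rely on which one an implementation returns.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class AttributeSketch {

  // Lists the attributes of the root element as name=value strings
  static List<String> rootAttributes(String xml) {
    try {
      XMLStreamReader parser = XMLInputFactory.newInstance()
          .createXMLStreamReader(new StringReader(xml));
      parser.nextTag();   // move to the root element's start-tag
      List<String> result = new ArrayList<String>();
      for (int i = 0; i < parser.getAttributeCount(); i++) {
        String prefix = parser.getAttributePrefix(i);
        String local = parser.getAttributeLocalName(i);
        String name = (prefix == null || prefix.length() == 0)
            ? local : prefix + ":" + local;
        result.add(name + "=" + parser.getAttributeValue(i));
      }
      parser.close();
      return result;
    } catch (XMLStreamException ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    String xml = "<a xmlns:xlink='http://www.w3.org/1999/xlink' "
        + "xlink:href='x.xml' id='7'/>";
    // The xmlns:xlink declaration is not among the reported attributes
    System.out.println(rootAttributes(xml));
  }
}
```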

Attributes Example: PullSpider, an XLink Spider

import javax.xml.stream.*;
import java.net.*;
import java.io.*;
import java.util.*;

public class PullSpider {

  // Need to keep track of where we've been 
  // so we don't get stuck in an infinite loop
  private List spideredURIs = new Vector();

  // This linked list keeps track of where we're going.
  // Although the LinkedList class does not guarantee queue like
  // access, I always access it in a first-in/first-out fashion.
  private LinkedList queue = new LinkedList();
  
  private URL currentURL;
  private XMLInputFactory factory;
  
  public PullSpider() {
      this.factory = XMLInputFactory.newInstance();
  }

  private void processStartTag(XMLStreamReader parser) {
    
    String type 
     = parser.getAttributeValue("http://www.w3.org/1999/xlink", "type");
    if (type != null) {
      String href 
       = parser.getAttributeValue("http://www.w3.org/1999/xlink", "href");
      if (href != null) {
        try {
          URL foundURL = new URL(currentURL, href);
          // Skip URLs already visited or already waiting in the queue
          if (!spideredURIs.contains(foundURL) 
           && !queue.contains(foundURL)) {
            queue.addFirst(foundURL);
          }
        }
        catch (MalformedURLException ex) {
          // skip this URL  
        }
      }
    }
  }
  
  public void spider(URL url) {
      
    System.out.println("Spidering " + url);
    currentURL = url;
    try {
      XMLStreamReader parser = factory.createXMLStreamReader(currentURL.openStream());
      spideredURIs.add(currentURL);
      
      for (int event = parser.next(); 
           parser.hasNext(); 
           event = parser.next()) {
         if (event == XMLStreamConstants.START_ELEMENT) {
             processStartTag(parser);
         }
       }  // end for
       parser.close();
       
       while (!queue.isEmpty()) {
         URL nextURL = (URL) queue.removeLast();
         spider(nextURL);
       }
      
    }
    catch (Exception ex) {
       // skip this document
    }
    
  }

  public static void main(String[] args) throws Exception {
        
    if (args.length == 0) {
      System.err.println("Usage: java PullSpider url" );
       return;  
    }
        
    PullSpider spider = new PullSpider();
    spider.spider(new URL(args[0]));
        
  } // end main

} // end PullSpider

Output from the PullSpider

$ java -classpath stax.jar:.:bea.jar PullSpider http://www.rddl.org
Spidering http://www.rddl.org
Spidering http://www.rddl.org/natures
Spidering http://www.rddl.org/purposes
Spidering http://www.rddl.org/xrd.css
Spidering http://www.rddl.org/rddl-xhtml.dtd
Spidering http://www.rddl.org/rddl-qname-1.mod
Spidering http://www.rddl.org/rddl-resource-1.mod
Spidering http://www.rddl.org/xhtml-arch-1.mod
Spidering http://www.rddl.org/xhtml-attribs-1.mod
Spidering http://www.rddl.org/xhtml-base-1.mod
Spidering http://www.rddl.org/xhtml-basic-form-1.mod
Spidering http://www.rddl.org/xhtml-basic-table-1.mod
Spidering http://www.rddl.org/xhtml-blkphras-1.mod
Spidering http://www.rddl.org/xhtml-blkstruct-1.mod
Spidering http://www.rddl.org/xhtml-charent-1.mod
Spidering http://www.rddl.org/xhtml-datatypes-1.mod
Spidering http://www.rddl.org/xhtml-framework-1.mod
Spidering http://www.rddl.org/xhtml-hypertext-1.mod
Spidering http://www.rddl.org/xhtml-image-1.mod
Spidering http://www.rddl.org/xhtml-inlphras-1.mod
Spidering http://www.rddl.org/xhtml-inlstruct-1.mod
Spidering http://www.rddl.org/xhtml-lat1.ent
Spidering http://www.rddl.org/xhtml-link-1.mod
Spidering http://www.rddl.org/xhtml-meta-1.mod
Spidering http://www.rddl.org/xhtml-notations-1.mod
Spidering http://www.rddl.org/xhtml-object-1.mod
Spidering http://www.rddl.org/xhtml-param-1.mod
Spidering http://www.rddl.org/xhtml-qname-1.mod
Spidering http://www.rddl.org/xhtml-rddl-model-1.mod
Spidering http://www.rddl.org/xhtml-special.ent
Spidering http://www.rddl.org/xhtml-struct-1.mod
Spidering http://www.rddl.org/xhtml-symbol.ent
Spidering http://www.rddl.org/xhtml-text-1.mod
Spidering http://www.rddl.org/xlink-module-1.mod
Spidering http://www.rddl.org/rddl.rdfs
Spidering http://www.rddl.org/rddl-integration.rxg
Spidering http://www.rddl.org/modules/rddl-1.rxm

Namespaces

Namespace support is turned on by default.
xmlns and xmlns:prefix attributes are not reported as attributes
These methods report the namespace declarations for a START_ELEMENT event (bindings going into scope) or an END_ELEMENT event (bindings going out of scope):
```
public int    getNamespaceCount()
public String getNamespacePrefix(int index)
public String getNamespaceURI(int index)
```
These methods return namespace information for the current element in a START_ELEMENT event or an END_ELEMENT event:
```
public String getNamespaceURI()
public String getPrefix()
```
This method returns a namespace context that can be used to query all the namespaces in scope inside a particular element (START_ELEMENT event or an END_ELEMENT event):
public NamespaceContext getNamespaceContext()
This is only valid until the next call to next() (or nextTag()).
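
A small sketch of reading the declarations that come into scope on each start-tag. The class name NamespaceLister, the "(default)" label, and the example.com URIs are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class NamespaceLister {

  // Collects "element: prefix -> URI" for each declaration on each start-tag
  public static List<String> listDeclarations(String xml) throws XMLStreamException {
    XMLStreamReader parser
     = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    List<String> result = new ArrayList<String>();
    while (parser.hasNext()) {
      if (parser.next() == XMLStreamConstants.START_ELEMENT) {
        for (int i = 0; i < parser.getNamespaceCount(); i++) {
          String prefix = parser.getNamespacePrefix(i);
          // The default namespace declaration has no prefix
          if (prefix == null || prefix.length() == 0) prefix = "(default)";
          result.add(parser.getLocalName() + ": " + prefix
           + " -> " + parser.getNamespaceURI(i));
        }
      }
    }
    parser.close();
    return result;
  }

  public static void main(String[] args) throws XMLStreamException {
    String xml = "<doc xmlns='http://example.com/d' xmlns:x='http://example.com/x'>"
     + "<x:item/></doc>";
    // Both declarations are reported on doc; x:item declares nothing itself
    for (String declaration : listDeclarations(xml)) {
      System.out.println(declaration);
    }
  }

}
```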

The NamespaceContext Interface

package javax.xml.namespace;

import java.util.Iterator;

public interface NamespaceContext {

  public String   getNamespaceURI(String prefix);
  public String   getPrefix(String namespaceURI);
  public Iterator getPrefixes(String namespaceURI);
  
}
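
Since this is an interface, you sometimes need your own implementation, for instance to hand to XMLStreamWriter.setNamespaceContext(). Here is a minimal map-backed sketch. The class name MapNamespaceContext and its bind() method are my own invention, not part of any standard API, and this is one possible way to meet the contract rather than a reference implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

public class MapNamespaceContext implements NamespaceContext {

  private final Map<String, String> bindings = new HashMap<String, String>();

  public MapNamespaceContext() {
    // The xml and xmlns prefixes are always bound to fixed URIs
    bindings.put(XMLConstants.XML_NS_PREFIX, XMLConstants.XML_NS_URI);
    bindings.put(XMLConstants.XMLNS_ATTRIBUTE, XMLConstants.XMLNS_ATTRIBUTE_NS_URI);
  }

  // Hypothetical convenience method; returns this so calls can be chained
  public MapNamespaceContext bind(String prefix, String namespaceURI) {
    bindings.put(prefix, namespaceURI);
    return this;
  }

  public String getNamespaceURI(String prefix) {
    if (prefix == null) throw new IllegalArgumentException("null prefix");
    String uri = bindings.get(prefix);
    // Unbound prefixes map to the empty string, not null
    return uri == null ? XMLConstants.NULL_NS_URI : uri;
  }

  public String getPrefix(String namespaceURI) {
    Iterator<String> prefixes = getPrefixes(namespaceURI);
    return prefixes.hasNext() ? prefixes.next() : null;
  }

  public Iterator<String> getPrefixes(String namespaceURI) {
    if (namespaceURI == null) throw new IllegalArgumentException("null URI");
    List<String> result = new ArrayList<String>();
    for (Map.Entry<String, String> entry : bindings.entrySet()) {
      if (entry.getValue().equals(namespaceURI)) result.add(entry.getKey());
    }
    return result.iterator();
  }

  public static void main(String[] args) {
    NamespaceContext context = new MapNamespaceContext()
     .bind("xlink", "http://www.w3.org/1999/xlink");
    System.out.println(context.getNamespaceURI("xlink"));
    System.out.println(context.getPrefix("http://www.w3.org/1999/xlink"));
  }

}
```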

XMLStreamWriter

An event based API for creating XML documents
Minimal well-formedness checking

Instances are created by the XMLOutputFactory.createXMLStreamWriter() factory method:


XMLOutputFactory factory = XMLOutputFactory.newInstance();
XMLStreamWriter serializer = factory.createXMLStreamWriter(System.out);

package javax.xml.stream;

public interface XMLStreamWriter {
  
  public void writeStartElement(String localName) 
    throws XMLStreamException;
  public void writeStartElement(String namespaceURI, String localName) 
    throws XMLStreamException;
  public void writeStartElement(String prefix,
                                String localName,
                                String namespaceURI) 
    throws XMLStreamException;

  public void writeEmptyElement(String namespaceURI, String localName) 
    throws XMLStreamException;
  public void writeEmptyElement(String prefix, String localName, String namespaceURI) 
    throws XMLStreamException;
  public void writeEmptyElement(String localName) 
    throws XMLStreamException;
    
  public void writeEndElement() 
    throws XMLStreamException;
    
  public void writeEndDocument() 
    throws XMLStreamException;

  public void writeAttribute(String localName, String value) 
    throws XMLStreamException;
  public void writeAttribute(String prefix,
                             String namespaceURI,
                             String localName,
                             String value) 
    throws XMLStreamException;
  public void writeAttribute(String namespaceURI,
                             String localName,
                             String value) 
    throws XMLStreamException;

  public void writeNamespace(String prefix, String namespaceURI) 
    throws XMLStreamException;
  public void writeDefaultNamespace(String namespaceURI)
    throws XMLStreamException;

  public void writeComment(String data) 
    throws XMLStreamException;
  public void writeProcessingInstruction(String target) 
    throws XMLStreamException;
  public void writeProcessingInstruction(String target,
                                         String data) 
    throws XMLStreamException;
  public void writeCData(String data) 
    throws XMLStreamException;
  public void writeDTD(String dtd) 
    throws XMLStreamException;
  public void writeEntityRef(String name) 
    throws XMLStreamException;
  public void writeStartDocument() 
    throws XMLStreamException;
  public void writeStartDocument(String version) 
    throws XMLStreamException;
  public void writeStartDocument(String encoding,
                                 String version) 
    throws XMLStreamException;
  public void writeCharacters(String text) 
    throws XMLStreamException;
    
  public void writeCharacters(char[] text, int start, int len) 
    throws XMLStreamException;

  public String getPrefix(String uri) 
    throws XMLStreamException;
  public void setPrefix(String prefix, String uri) 
    throws XMLStreamException;
  public void setDefaultNamespace(String uri) 
    throws XMLStreamException;
  public void setNamespaceContext(NamespaceContext context)
    throws XMLStreamException;
  public NamespaceContext getNamespaceContext();

  public void close() throws XMLStreamException;
  public void flush() throws XMLStreamException;  
  
  public Object getProperty(java.lang.String name) throws IllegalArgumentException;

}
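
A short sketch of the calls in use, writing a tiny document with no namespaces into a string. The class name WriterDemo and the channel/title/item vocabulary are made up for illustration. The exact serialization (for example, white space inside the XML declaration) may vary between implementations.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class WriterDemo {

  public static String build() throws XMLStreamException {
    StringWriter out = new StringWriter();
    XMLStreamWriter serializer
     = XMLOutputFactory.newInstance().createXMLStreamWriter(out);

    serializer.writeStartDocument();
    serializer.writeStartElement("channel");
    // Attributes must be written right after their start-tag
    serializer.writeAttribute("version", "1.0");
    serializer.writeStartElement("title");
    // The writer escapes & and < in character data
    serializer.writeCharacters("Archeology & History");
    serializer.writeEndElement();   // closes title
    serializer.writeEmptyElement("item");
    serializer.writeEndElement();   // closes channel
    serializer.writeEndDocument();
    serializer.flush();
    serializer.close();             // does not close the underlying StringWriter

    return out.toString();
  }

  public static void main(String[] args) throws XMLStreamException {
    System.out.println(build());
  }

}
```

Note that writeEndElement() takes no arguments. The writer keeps track of which element is open, so you cannot mismatch tag names, though you can still forget to close one.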

XMLStreamWriter Example: Convert RDDL to XHTML

Goal: Convert a RDDL document to pure XHTML.
RDDL is just XHTML Basic plus one extra element, rddl:resource, which can appear anywhere a p element can appear and can contain anything a div element can contain.

The customary rddl prefix is mapped to the http://www.rddl.org/ namespace URL:

<rddl:resource id="rec-xhtml"
        xlink:title="W3C REC XHTML"
        xlink:role="http://www.w3.org/1999/xhtml"
        xlink:arcrole="http://www.rddl.org/purposes#reference"
        xlink:href="http://www.w3.org/tr/xhtml1"
        >
<li><a href="http://www.w3.org/tr/xhtml1">W3C XHTML 1.0</a></li>
</rddl:resource>

The program needs to throw away the <rddl:resource> start-tag and </rddl:resource> end-tag while leaving everything else intact.

Example: RDDLStripper

import javax.xml.stream.*;
import java.net.*;
import java.io.*;

 
public class RDDLStripper {
    
  public final static String RDDL_NS = "http://www.rddl.org/";

  public static void main(String[] args) {
        
    if (args.length == 0) {
      System.err.println("Usage: java RDDLStripper url" );
      return;    
    }
        
    try {      
      InputStream in;
      try {
        URL u = new URL(args[0]);
        in = u.openStream();
      }
      catch (MalformedURLException ex) {
        // Maybe it's a file name
        in = new FileInputStream(args[0]);
      }
      
      XMLStreamReader parser 
       = XMLInputFactory.newInstance().createXMLStreamReader(in);
      XMLStreamWriter serializer 
       = XMLOutputFactory.newInstance().createXMLStreamWriter(System.out);
        
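      while (true) {
         int event = parser.next();
         if (parser.isStartElement()) {
             // getNamespaceURI() and getPrefix() may return null,
             // but the writer does not accept null arguments
             String namespaceURI = parser.getNamespaceURI();
             if (namespaceURI == null) namespaceURI = "";
             if (!namespaceURI.equals(RDDL_NS)) {
                 String prefix = parser.getPrefix();
                 if (prefix == null) prefix = "";
                 serializer.writeStartElement(prefix, parser.getLocalName(), namespaceURI);
                 // add namespace declarations
                 for (int i = 0; i < parser.getNamespaceCount(); i++) {
                     String uri = parser.getNamespaceURI(i);
                     if (!RDDL_NS.equals(uri)) {
                       serializer.writeNamespace(parser.getNamespacePrefix(i), uri);
                     }
                 }
                 // add attributes
                 for (int i = 0; i < parser.getAttributeCount(); i++) {
                     String attPrefix = parser.getAttributePrefix(i);
                     String attNamespace = parser.getAttributeNamespace(i);
                     serializer.writeAttribute(
                       attPrefix == null ? "" : attPrefix,
                       attNamespace == null ? "" : attNamespace,
                       parser.getAttributeLocalName(i),
                       parser.getAttributeValue(i)
                     );
                 }
             }
         }
         else if (parser.isEndElement()) {
             // compare this way round because the namespace URI may be null
             if (!RDDL_NS.equals(parser.getNamespaceURI())) {
                 serializer.writeEndElement();
             }
         }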
         else if (event == XMLStreamConstants.CHARACTERS
           || event == XMLStreamConstants.SPACE) {
             serializer.writeCharacters(parser.getText());
         }
         else if (event == XMLStreamConstants.CDATA) {
             serializer.writeCData(parser.getText());
         }
         else if (event == XMLStreamConstants.COMMENT) {
             serializer.writeComment(parser.getText());
         }
         else if (event == XMLStreamConstants.DTD) {
             serializer.writeDTD(parser.getText());
         }
         else if (event == XMLStreamConstants.ENTITY_REFERENCE) {
             serializer.writeEntityRef(parser.getLocalName());
         }
         else if (event == XMLStreamConstants.PROCESSING_INSTRUCTION) {
             serializer.writeProcessingInstruction(parser.getPITarget(), parser.getPIData());
         }
         else if (event == XMLStreamConstants.END_DOCUMENT) {
            serializer.flush();
            break;
         }
      }  
      serializer.close();         
      parser.close();
      
    }
    catch (XMLStreamException ex) {
       System.err.println(ex);  
    }
    catch (IOException ex) {
      System.err.println("IOException while parsing " + args[0]);   
    }
        
  }

}

One of my favorite features

Makes certain kinds of programs really easy:
- Filter out certain kinds of nodes
- Filter out certain tags
- Convert processing instructions to elements
- Comment reader
- Change names of elements
- Add attributes to elements
Changes have to be local to be easy:
- Start-tag changes based on name, namespace, and attributes
- End-tag changes based on name and namespace
- Event changes based on that event only
I don't know whether these programs are realistic patterns or just common tutorial examples
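
As one hedged illustration of a local change, here is a sketch that renames elements while copying everything else through. The class name ElementRenamer is invented. To stay short it assumes the input uses no namespaces, and it drops comments and processing instructions instead of copying them.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public class ElementRenamer {

  // Copies xml to a string, renaming every element called from to to.
  // Assumes a namespace-free document.
  public static String rename(String xml, String from, String to)
    throws XMLStreamException {

    XMLStreamReader parser
     = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    StringWriter out = new StringWriter();
    XMLStreamWriter serializer
     = XMLOutputFactory.newInstance().createXMLStreamWriter(out);

    while (parser.hasNext()) {
      int event = parser.next();
      if (event == XMLStreamConstants.START_ELEMENT) {
        // The decision depends only on this start-tag's own name
        String name = parser.getLocalName();
        serializer.writeStartElement(name.equals(from) ? to : name);
        for (int i = 0; i < parser.getAttributeCount(); i++) {
          serializer.writeAttribute(
            parser.getAttributeLocalName(i), parser.getAttributeValue(i));
        }
      }
      else if (event == XMLStreamConstants.END_ELEMENT) {
        // No name needed; the writer closes whatever element is open
        serializer.writeEndElement();
      }
      else if (event == XMLStreamConstants.CHARACTERS
        || event == XMLStreamConstants.SPACE) {
        serializer.writeCharacters(parser.getText());
      }
    }
    serializer.flush();
    serializer.close();
    parser.close();
    return out.toString();
  }

  public static void main(String[] args) throws XMLStreamException {
    System.out.println(
      rename("<p>Some <b class=\"x\">bold</b> text</p>", "b", "strong"));
  }

}
```

The end-tag never needs to know it was renamed, which is exactly why this kind of change is easy. A change that depended on something later in the document would need buffering.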

Future Work

Java XQuery API, JSR 225, currently in early draft review
Java XML Encryption API, JSR 106, currently in public review; no progress in the last year

To Learn More

This presentation: http://www.cafeconleche.org/slides/sd2007west/java5xml/
Document Object Model (DOM) Level 3 Core Specification Version 1.0: http://www.w3.org/TR/DOM-Level-3-Core
Document Object Model (DOM) Level 3 Load and Save Specification: http://www.w3.org/TR/DOM-Level-3-LS/
Document Object Model (DOM) Requirements: http://www.w3.org/TR/DOM-Requirements/
Java XPath API: http://www-128.ibm.com/developerworks/xml/library/x-javaxpathapi.html
Java Validation API: http://www-128.ibm.com/developerworks/xml/library/x-javaxmlvalidapi.html
XML Digital Signatures API: http://java.sun.com/developer/technicalArticles/xml/dig_signature_api/
StAX: http://www.cafeconleche.org/slides/sd2004west/stax
An introduction to StAX: http://www.xml.com/pub/a/2003/09/17/stax.html
