JAXP
DOM 3
Java XPath API
Java Validation API
Java XML Digital Signature API
StAX
SAX
DOM
TrAX
Factory Classes
StAX starting in version 1.4
JAXP 1.0 supported SAX 1.0 and DOM 1.0
JAXP 1.1 supports SAX 2 and DOM 2, bundled with Java 1.4
JAXP 1.2 adds support for W3C XML schemas
JAXP 1.3 moves from Crimson to Xerces; adds DOM 3; bundled with Java 5
JAXP 1.4 adds support for StAX; bundled with Java 6
Core Changes
Bootstrapping
Loading/Parsing/Building
Filters
Error Handling
Serialization
of all of the things the W3C has given us, the DOM is probably the one with the least value.
--Michael Brennan on the xml-dev mailing list
Node
UserDataHandler
DOMConfiguration
Document
Text
Element
Attr
Entity
Adds:
I will only show the new members.
Java binding:
package org.w3c.dom;
public interface Node {
public String getBaseURI();
public static final short DOCUMENT_POSITION_DISCONNECTED = 0x01;
public static final short DOCUMENT_POSITION_PRECEDING = 0x02;
public static final short DOCUMENT_POSITION_FOLLOWING = 0x04;
public static final short DOCUMENT_POSITION_CONTAINS = 0x08;
public static final short DOCUMENT_POSITION_IS_CONTAINED = 0x10;
public static final short DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC = 0x20;
public short compareDocumentPosition(Node other) throws DOMException;
public String getTextContent() throws DOMException;
public void setTextContent(String textContent) throws DOMException;
public void normalize();
public boolean isSameNode(Node other);
public boolean isEqualNode(Node arg);
public String lookupPrefix(String namespaceURI);
public boolean isDefaultNamespace(String namespaceURI);
public String lookupNamespaceURI(String prefix);
public Node getFeature(String feature, String version);
public Object setUserData(String key, Object data, UserDataHandler handler);
public Object getUserData(String key);
}
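A minimal sketch of a few of the new Node members, assuming the DOM implementation bundled with the JDK. The class name Node3Demo and the element names are made up for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class Node3Demo {

    // Builds <a><b>one</b><c>two</c></a> in memory; the names are made up.
    static Document build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element a = doc.createElement("a");
        doc.appendChild(a);
        Element b = doc.createElement("b");
        b.setTextContent("one");   // replaces any children with a single Text node
        a.appendChild(b);
        Element c = doc.createElement("c");
        c.setTextContent("two");
        a.appendChild(c);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Element a = build().getDocumentElement();
        Node b = a.getFirstChild();
        Node c = a.getLastChild();

        // Concatenated text of all descendants
        System.out.println(a.getTextContent());

        // The result is a bit mask describing where c lies relative to b
        short position = b.compareDocumentPosition(c);
        System.out.println((position & Node.DOCUMENT_POSITION_FOLLOWING) != 0);

        // A deep clone is equal to the original but is not the same node
        Node copy = b.cloneNode(true);
        System.out.println(b.isEqualNode(copy));
        System.out.println(b.isSameNode(copy));
    }
}
```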
A user-defined callback class that is invoked when a node is cloned, imported, deleted, adopted, or renamed
package org.w3c.dom;
public interface UserDataHandler {
// OperationType
public static final short NODE_CLONED = 1;
public static final short NODE_IMPORTED = 2;
public static final short NODE_DELETED = 3;
public static final short NODE_RENAMED = 4;
public static final short NODE_ADOPTED = 5;
public void handle(short operation, String key, Object data, Node src, Node dst);
}
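A sketch of how a handler might carry user data across a clone. The key name and the CopyOnClone class are hypothetical, and the sketch assumes the implementation notifies handlers when cloneNode() is called, as the description above says it should.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.UserDataHandler;

public class UserDataDemo {

    static final String KEY = "source-line"; // hypothetical key name

    // Copies the user data onto the new node whenever the source is cloned.
    static class CopyOnClone implements UserDataHandler {
        public void handle(short operation, String key, Object data,
                           Node src, Node dst) {
            if (operation == NODE_CLONED && dst != null) {
                dst.setUserData(key, data, this);
            }
        }
    }

    static Node annotateAndClone() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        // Attach application data plus the handler that will be called back
        root.setUserData(KEY, Integer.valueOf(42), new CopyOnClone());
        return root.cloneNode(false);
    }

    public static void main(String[] args) throws Exception {
        // Prints whatever the handler copied onto the clone
        System.out.println(annotateAndClone().getUserData(KEY));
    }
}
```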
Maintains a table of boolean, String, and Object parameters such as canonical-form and error-handler
Makes it possible to change what normalizeDocument()
does by modifying these parameters
Java binding:
package org.w3c.dom;
public interface DOMConfiguration {
public void setParameter(String name, Object value) throws DOMException;
public Object getParameter(String name) throws DOMException;
public boolean canSetParameter(String name, Object value);
public DOMStringList getParameterNames();
}
Standard parameters include:
canonical-form: default false, optional
cdata-sections: default true, required; keeps CDATASection nodes in the document
check-character-normalization: default false, optional
comments: default true; keeps Comment nodes in the document
datatype-normalization: default false, optional
element-content-whitespace: default true, optional
entities: default true, required; keeps EntityReference nodes in the document
error-handler: required (non-boolean); a DOMErrorHandler object
infoset: default true, required
namespaces: default true, optional
namespace-declarations: default true, required; keeps namespace declaration Attr nodes in the document
normalize-characters: default false, optional
schema-location: optional
schema-type: String, optional
split-cdata-sections: default true, required; splits CDATA sections that contain ]]>
validate: default false, optional
validate-if-schema
well-formed: default true, optional
Implementations may also define their own custom parameters
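A sketch of changing what normalizeDocument() does through a DOMConfiguration parameter. It turns off the comments parameter so that normalization strips comment nodes. The class and helper names are invented, and the behavior shown is what the JDK's bundled implementation is expected to do.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class NormalizeDemo {

    // Counts the comment children of a node.
    static int countComments(Node parent) {
        int count = 0;
        for (Node child = parent.getFirstChild(); child != null;
             child = child.getNextSibling()) {
            if (child.getNodeType() == Node.COMMENT_NODE) count++;
        }
        return count;
    }

    static Element buildAndNormalize() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        // createElementNS keeps the tree namespace-aware for normalization
        Element root = doc.createElementNS(null, "root");
        doc.appendChild(root);
        root.appendChild(doc.createComment("scratch note"));
        root.appendChild(doc.createTextNode("payload"));

        DOMConfiguration config = doc.getDomConfig();
        // Ask first: not every implementation supports every value
        if (config.canSetParameter("comments", Boolean.FALSE)) {
            config.setParameter("comments", Boolean.FALSE);
        }
        doc.normalizeDocument();
        return root;
    }

    public static void main(String[] args) throws Exception {
        Element root = buildAndNormalize();
        System.out.println("Comments left: " + countComments(root));
        System.out.println("Text: " + root.getTextContent());
    }
}
```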
Adds:
isElementContentWhitespace()
wholeText: the text of all Text nodes logically adjacent to this node; i.e. the XPath value of the text node
Java binding:
package org.w3c.dom;
public interface Text extends Node {
public boolean isElementContentWhitespace();
public String getWholeText();
public Text replaceWholeText(String content) throws DOMException;
}
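A small sketch of getWholeText() and replaceWholeText() on two adjacent text nodes. The class name and sample strings are made up.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

public class WholeTextDemo {

    // Builds <p> with two adjacent Text children and returns the second one.
    static Text secondOfTwoTextNodes() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element p = doc.createElement("p");
        doc.appendChild(p);
        p.appendChild(doc.createTextNode("Hello, "));
        Text second = doc.createTextNode("world");
        p.appendChild(second);
        return second;
    }

    public static void main(String[] args) throws Exception {
        Text second = secondOfTwoTextNodes();

        // Sees through the split into two nodes
        System.out.println(second.getWholeText());

        // Collapses the logically adjacent nodes into a single node
        Text merged = second.replaceWholeText("Goodbye");
        System.out.println(merged.getData());
        System.out.println(merged.getParentNode().getChildNodes().getLength());
    }
}
```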
Adds:
schemaTypeInfo: a TypeInfo object that provides a name and URI for the element's type, as given in the document's schema.
setIdAttribute
Java binding:
package org.w3c.dom;
public interface Element extends Node {
public TypeInfo getSchemaTypeInfo();
public void setIdAttribute(String name, boolean isId) throws DOMException;
public void setIdAttributeNS(String namespaceURI, String localName, boolean isId)
throws DOMException;
public void setIdAttributeNode(Attr idAttr, boolean isId) throws DOMException;
}
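A sketch of setIdAttribute() marking an attribute as an ID at runtime, without a DTD or schema, so that getElementById() can find the element. The element name, attribute name, and value are illustrative only.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class IdAttributeDemo {

    // Builds <books><book isbn="0596007647"/></books>; the value is sample data.
    static Document build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element root = doc.createElement("books");
        doc.appendChild(root);
        Element book = doc.createElement("book");
        book.setAttribute("isbn", "0596007647");
        root.appendChild(book);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = build();
        Element book = (Element) doc.getDocumentElement().getFirstChild();

        // Without a DTD or schema, no attribute is known to be an ID yet
        System.out.println(doc.getElementById("0596007647"));

        // Declare the attribute to be an ID
        book.setIdAttribute("isbn", true);
        System.out.println(book.getAttributeNode("isbn").isId());
        System.out.println(doc.getElementById("0596007647") == book);
    }
}
```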
Adds:
schemaTypeInfo: a TypeInfo object that provides a name and URI for the attribute's type, as given in the document's schema.
isId
Java binding:
package org.w3c.dom;
public interface Attr extends Node {
public TypeInfo getSchemaTypeInfo();
public boolean isId();
}
DOM2 has no implementation-independent means to create a new Document object
Implementation-dependent methods tend to be fairly complex. For example, in Xerces-J:
DOMImplementation impl = DOMImplementationImpl.getDOMImplementation();
Document fibonacci = impl.createDocument(null, "Fibonacci_Numbers", null);
Still no language-independent means to create a new Document object
Does provide an implementation-independent method for Java only:
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementation impl = registry.getDOMImplementation("XML");
package org.w3c.dom.bootstrap;
public class DOMImplementationRegistry {
public final static String PROPERTY =
"org.w3c.dom.DOMImplementationSourceList";
public static DOMImplementationRegistry newInstance()
throws ClassNotFoundException, InstantiationException,
IllegalAccessException;
public DOMImplementation getDOMImplementation(String features)
throws ClassNotFoundException,
InstantiationException, IllegalAccessException, ClassCastException;
public DOMImplementationList getDOMImplementationList(String features)
throws ClassNotFoundException,
InstantiationException, IllegalAccessException, ClassCastException;
public void addSource(DOMImplementationSource s)
throws ClassNotFoundException,
InstantiationException, IllegalAccessException;
}
getDOMImplementation() returns a DOMImplementation object that supports the features given in the argument, or null if no such implementation can be found.
Request a DOMImplementation that supports XML DOM Level 1, any version of the traversal module, and DOM Level 2 events:
try {
DOMImplementation impl = DOMImplementationRegistry.newInstance()
.getDOMImplementation("XML 1.0 Traversal Events 2.0");
if (impl != null) {
DocumentType svgDOCTYPE = impl.createDocumentType("svg",
"-//W3C//DTD SVG 1.0//EN",
"http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd");
Document svgDoc = impl.createDocument(
"http://www.w3.org/2000/svg", "svg", svgDOCTYPE
);
// work with the document...
}
}
catch (Exception ex) {
System.out.println(ex);
}
Be sure to check whether the implementation returned is null before using it. Many installations may not be able to support all the features you ask for.
DOMImplementationRegistry searches for DOMImplementation classes by reading the value of the org.w3c.dom.DOMImplementationSourceList Java system property.
This property should contain a white space separated list of DOMImplementationSource class names.
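A complete bootstrapping sketch. It assumes the system property is left unset, in which case the JDK is expected to fall back to its bundled implementation. The class name and the requested feature string are choices made for this example.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

public class BootstrapDemo {

    // Creates an empty document with the given root element name,
    // without naming any parser-specific class.
    static Document newDocument(String rootName) throws Exception {
        DOMImplementationRegistry registry
            = DOMImplementationRegistry.newInstance();
        DOMImplementation impl = registry.getDOMImplementation("XML 1.0");
        if (impl == null) {
            throw new IllegalStateException("No DOM implementation found");
        }
        return impl.createDocument(null, rootName, null);
    }

    public static void main(String[] args) throws Exception {
        Document doc = newDocument("Fibonacci_Numbers");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```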
DOMErrorHandler
DOMLocator
Loading: parsing an existing XML document to produce a Document object
Saving: serializing a Document object into a file or onto a stream
Completely implementation dependent in DOM2
import org.w3c.dom.*;
import org.w3c.dom.ls.*;
import org.w3c.dom.bootstrap.*;
public class DOM3ParserMaker {
  public static void main(String[] args)
   throws ClassNotFoundException, InstantiationException, IllegalAccessException {
    System.setProperty(DOMImplementationRegistry.PROPERTY,
     "org.apache.xerces.dom.DOMImplementationSourceImpl");
    DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
    // "LS" is the feature name for synchronous Load and Save
    DOMImplementation impl = registry.getDOMImplementation("LS");
    if (impl == null) {
      System.err.println("Could not locate a DOM3 Parser");
      return;
    }
    DOMImplementationLS implls = (DOMImplementationLS) impl;
    LSParser parser = implls.createLSParser(
     DOMImplementationLS.MODE_SYNCHRONOUS, null);
    for (int i = 0; i < args.length; i++) {
      try {
        Document d = parser.parseURI(args[i]);
      }
      catch (DOMException ex) {
        System.err.println(ex);
      }
      catch (LSException ex) {
        // thrown when the document cannot be loaded or parsed
        System.err.println(ex);
      }
    }
  }
}
DOMImplementationLS: a DOMImplementation that provides the factory methods for creating the objects required for loading and saving.
LSParser
LSInput: like SAX's InputSource
LSResourceResolver
LSParserFilter: lets applications examine Element nodes as they are being processed during the parsing of a document; like SAX filters.
LSSerializer
LSSerializerFilter
LSLoadEvent
LSProgressEvent
Factory interface to create new LSParser and LSSerializer implementations.
Java Binding:
package org.w3c.dom.ls;
public interface DOMImplementationLS {
public static final short MODE_SYNCHRONOUS = 1;
public static final short MODE_ASYNCHRONOUS = 2;
public LSParser createLSParser(short mode, String schemaType)
throws DOMException;
public LSSerializer createLSSerializer();
public LSInput createLSInput();
public LSOutput createLSOutput();
}
Use the feature "LS" or "LS-Async" to find a DOMImplementation object that supports Load and Save.
Cast the DOMImplementation object to DOMImplementationLS.
System.setProperty(DOMImplementationRegistry.PROPERTY,
"org.apache.xerces.dom.DOMImplementationSourceImpl");
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementation impl = registry.getDOMImplementation("XML 1.0 LS 3.0");
if (impl != null) {
DOMImplementationLS implls = (DOMImplementationLS) impl;
// ...
}
Provides an implementation-independent API for parsing XML documents to produce a DOM Document object.
Instances are built by the createLSParser() method in DOMImplementationLS.
Java Binding:
package org.w3c.dom.ls;
public interface LSParser {
public DOMConfiguration getDomConfig();
public LSParserFilter getFilter();
public void setFilter(LSParserFilter filter);
public boolean getAsync();
public boolean getBusy();
public Document parse(LSInput input) throws DOMException, LSException;
public Document parseURI(String uri) throws DOMException, LSException;
// ACTION_TYPES
public static final short ACTION_APPEND_AS_CHILDREN = 1;
public static final short ACTION_REPLACE_CHILDREN = 2;
public static final short ACTION_INSERT_BEFORE = 3;
public static final short ACTION_INSERT_AFTER = 4;
public static final short ACTION_REPLACE = 5;
public Node parseWithContext(LSInput input, Node contextArg, short action)
throws DOMException, LSException;
public void abort();
}
Like SAX2's InputSource class, this interface is an abstraction of all the different things (streams, files, byte arrays, sockets, URLs, etc.) from which an XML document can be read.
Java Binding:
package org.w3c.dom.ls;
public interface LSInput {
public Reader getCharacterStream();
public void setCharacterStream(Reader in);
public InputStream getByteStream();
public void setByteStream(InputStream in);
public String getStringData();
public void setStringData(String stringData);
public String getSystemId();
public void setSystemId(String systemId);
public String getPublicId();
public void setPublicId(String publicId);
public String getBaseURI();
public void setBaseURI(String baseURI);
public String getEncoding();
public void setEncoding(String encoding);
public boolean getCertifiedText(); // known to be in NFC
public void setCertifiedText(boolean certifiedText);
}
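A sketch of feeding an in-memory string to an LSParser through LSInput.setStringData(). It relies on the JDK's default DOM implementation supporting the "LS" feature. The class name and sample document are made up.

```java
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

public class ParseStringDemo {

    // Parses XML held in a String rather than in a file or at a URL.
    static Document parse(String xml) throws Exception {
        DOMImplementationRegistry registry
            = DOMImplementationRegistry.newInstance();
        DOMImplementationLS ls
            = (DOMImplementationLS) registry.getDOMImplementation("LS");
        if (ls == null) {
            throw new IllegalStateException("No Load and Save support found");
        }
        LSParser parser
            = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<greeting>Hello</greeting>");
        System.out.println(doc.getDocumentElement().getNodeName()
            + ": " + doc.getDocumentElement().getTextContent());
    }
}
```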
An abstraction of all the different things (streams, files, byte arrays, sockets, strings, etc.) to which an XML document can be written
Created by DOMImplementationLS's createLSOutput() method
Java Binding:
package org.w3c.dom.ls;
public interface LSOutput {
public Writer getCharacterStream();
public void setCharacterStream(java.io.Writer characterStream);
public OutputStream getByteStream();
public void setByteStream(OutputStream byteStream);
public String getSystemId();
public void setSystemId(String systemId);
public String getEncoding();
public void setEncoding(String encoding);
}
Like SAX2's EntityResolver interface, this interface lets applications redirect references to external entities.
Java Binding:
package org.w3c.dom.ls;
public interface LSResourceResolver {
public LSInput resolveResource(String type, String namespaceURI,
String publicID, String systemID, String baseURI);
}
Provides an API for serializing (writing) a DOM document out as a sequence of bytes onto a stream, file, socket, byte array, etc.
Java Binding:
package org.w3c.dom.ls;
public interface LSSerializer {
public DOMConfiguration getDomConfig();
public String getNewLine();
public void setNewLine(String newLine);
public LSSerializerFilter getFilter();
public void setFilter(LSSerializerFilter filter);
public boolean write(Node nodeArg, LSOutput destination) throws LSException;
public boolean writeToURI(Node nodeArg, String uri) throws LSException;
public String writeToString(Node node) throws DOMException, LSException;
}
import java.math.*;
import java.io.*;
import org.w3c.dom.*;
import org.w3c.dom.bootstrap.*;
import org.w3c.dom.ls.*;
public class FibonacciDOM3 {
  public static void main(String[] args) throws Exception {
    System.setProperty(DOMImplementationRegistry.PROPERTY,
     "org.apache.xerces.dom.DOMImplementationSourceImpl");
    DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
    DOMImplementation impl = registry.getDOMImplementation("XML 1.0 LS");
    if (impl == null) {
      System.err.println("Oops! Couldn't find DOM3 implementation");
      return;
    }
    Document fibonacci = impl.createDocument(null, "Fibonacci_Numbers", null);
    BigInteger low = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;
    Element root = fibonacci.getDocumentElement();
    for (int i = 0; i <= 25; i++) {
      Element number = fibonacci.createElement("fibonacci");
      number.setAttribute("index", Integer.toString(i));
      Text text = fibonacci.createTextNode(low.toString());
      number.appendChild(text);
      root.appendChild(number);
      BigInteger temp = high;
      high = high.add(low);
      low = temp;
    }
    // Now that the document is created we need to *serialize* it
    DOMImplementationLS implls = (DOMImplementationLS) impl;
    LSSerializer serializer = implls.createLSSerializer();
    LSOutput output = implls.createLSOutput();
    OutputStream out = new FileOutputStream("fibonacci_dom.xml");
    try {
      output.setByteStream(out);
      serializer.write(fibonacci, output);
    }
    finally {
      out.close(); // release the file even if serialization fails
    }
  }
}
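When a file is not needed, writeToString() is a shorter route. The following sketch assumes the JDK's bundled implementation and its support for the xml-declaration serializer parameter. The class name and the element names are invented for the example.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

public class SerializeToStringDemo {

    // Builds a tiny document and serializes it to a String.
    static String serialize() throws Exception {
        DOMImplementationRegistry registry
            = DOMImplementationRegistry.newInstance();
        DOMImplementation impl = registry.getDOMImplementation("XML 1.0 LS");
        if (impl == null) {
            throw new IllegalStateException("No DOM3 LS implementation found");
        }
        Document doc = impl.createDocument(null, "numbers", null);
        // createElementNS keeps the tree namespace-aware for the serializer
        Element n = doc.createElementNS(null, "n");
        n.setTextContent("1");
        doc.getDocumentElement().appendChild(n);

        LSSerializer serializer
            = ((DOMImplementationLS) impl).createLSSerializer();
        // Leave out the XML declaration to get just the markup
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize());
    }
}
```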
Lets applications examine nodes as they are being constructed during a parse.
As each node is examined, it may be modified or removed, or parsing may be aborted.
Java Binding:
package org.w3c.dom.ls;
public interface LSParserFilter {
// Constants returned by startElement and acceptNode
public static final short FILTER_ACCEPT = 1;
public static final short FILTER_REJECT = 2;
public static final short FILTER_SKIP = 3;
public static final short FILTER_INTERRUPT = 4;
public short startElement(Element element);
public short acceptNode(Node node);
public int getWhatToShow();
}
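A sketch of a parser filter that drops every element named secret, along with its contents, while the tree is being built. The element name and class name are hypothetical, and the sketch assumes the JDK's bundled LSParser honors FILTER_REJECT from startElement().

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

public class SkipSecretsFilter implements LSParserFilter {

    // Called when a start-tag has been read, before any children exist.
    public short startElement(Element element) {
        return "secret".equals(element.getNodeName())
            ? FILTER_REJECT : FILTER_ACCEPT;
    }

    // Called when a node is complete; this sketch keeps everything else.
    public short acceptNode(Node node) {
        return FILTER_ACCEPT;
    }

    public int getWhatToShow() {
        return NodeFilter.SHOW_ELEMENT;
    }

    static Document parseFiltered(String xml) throws Exception {
        DOMImplementationLS ls = (DOMImplementationLS)
            DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        LSParser parser
            = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        parser.setFilter(new SkipSecretsFilter());
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseFiltered(
            "<doc><public>a</public><secret>b</secret></doc>");
        System.out.println(doc.getElementsByTagName("secret").getLength());
    }
}
```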
Lets applications examine nodes as they are being output.
As each element is examined, it may be modified or removed, or output may be aborted.
Java Binding:
package org.w3c.dom.ls;
public interface LSSerializerFilter extends NodeFilter {
public int getWhatToShow();
}
Query without detailed DOM navigation
e.g. //book[author="Neal Stephenson"]/title
to find the titles of all books by Neal Stephenson
XPath is not Turing complete but Java is.
javax.xml.xpath package
XPath 1.0. No XPath 2.
XQuery API is under development.
import java.io.IOException;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
public class XPathExample {
public static void main(String[] args)
throws ParserConfigurationException, SAXException,
IOException, XPathExpressionException {
// 1. Parse a document with JAXP
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true); // never forget this!
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse("books.xml");
// 2. Compile the expression
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr
= xpath.compile("//book[author='Neal Stephenson']/title/text()");
// 3. Make the query
Object result = expr.evaluate(doc, XPathConstants.NODESET);
// 4. Get the result.
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
}
}
evaluate() is declared to return Object.
Actual return type depends on the result of the XPath expression:
number maps to a java.lang.Double
string maps to a java.lang.String
boolean maps to a java.lang.Boolean
node-set maps to an org.w3c.dom.NodeList
Second argument specifies the return type you want
XPathConstants.NODESET
XPathConstants.BOOLEAN
XPathConstants.NUMBER
XPathConstants.STRING
XPathConstants.NODE
If the requested conversion can't be made, evaluate() throws an XPathExpressionException.
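A sketch of how the second argument selects the Java type that comes back. The sample document is made up for the example.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XPathTypesDemo {

    // Sample data invented for this sketch
    static final String BOOKS =
        "<books>"
      + "<book><author>Neal Stephenson</author><title>Snow Crash</title></book>"
      + "<book><author>Neal Stephenson</author><title>Zodiac</title></book>"
      + "</books>";

    // Evaluates an expression against BOOKS and asks for the given type.
    static Object query(String expression, QName type) throws Exception {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        Document doc = domFactory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(BOOKS)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc, type);
    }

    public static void main(String[] args) throws Exception {
        Double count = (Double) query("count(//book)", XPathConstants.NUMBER);
        String first = (String) query("//book[1]/title", XPathConstants.STRING);
        Boolean found = (Boolean) query(
            "//book[author='Neal Stephenson']", XPathConstants.BOOLEAN);
        Node title = (Node) query("//title", XPathConstants.NODE);
        System.out.println(count + " " + first + " " + found
            + " " + title.getNodeName());
    }
}
```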
//pre:book[pre:author='Neal Stephenson']/pre:title/text()
XSLT uses namespaces in scope to resolve namespace prefixes in XPath expressions.
But Java is not XML. There are no namespaces in scope in a Java program.
How to solve this?
Supply a NamespaceContext object to the XPath object, through setNamespaceContext(), that can resolve the prefixes:
package javax.xml.namespace;
public interface NamespaceContext {
String getNamespaceURI(String prefix);
String getPrefix(String namespaceURI);
Iterator getPrefixes(String namespaceURI);
}
Java 7/JAXP 1.5 should add a standard implementation of this interface.
import java.util.Iterator;
import javax.xml.*;
import javax.xml.namespace.NamespaceContext;
public class PersonalNamespaceContext implements NamespaceContext {
public String getNamespaceURI(String prefix) {
if (prefix == null) throw new NullPointerException("Null prefix");
else if ("pre".equals(prefix)) return "http://www.example.org/books";
else if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
return XMLConstants.NULL_NS_URI;
}
// This method isn't necessary for XPath processing.
public String getPrefix(String uri) {
throw new UnsupportedOperationException();
}
// This method isn't necessary for XPath processing either.
public Iterator getPrefixes(String uri) {
throw new UnsupportedOperationException();
}
}
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
xpath.setNamespaceContext(new PersonalNamespaceContext());
XPathExpression expr = xpath.compile(
"//pre:book[pre:author='Neal Stephenson']/pre:title/text()"
);
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
Do what's inconvenient with XPath
Implement the javax.xml.xpath.XPathFunction
interface.
public Object evaluate(List args) throws XPathFunctionException
Return one of these five types:
java.lang.String
java.lang.Double
java.lang.Boolean
org.w3c.dom.NodeList
org.w3c.dom.Node
import java.util.List;
import javax.xml.xpath.*;
import org.w3c.dom.*;
public class ISBNValidator implements XPathFunction {
public Object evaluate(List args) throws XPathFunctionException {
if (args.size() != 1) {
throw new XPathFunctionException(
"Wrong number of arguments to valid-isbn()");
}
String isbn;
Object o = args.get(0);
// perform conversions
if (o instanceof String) isbn = (String) args.get(0);
else if (o instanceof Boolean) isbn = o.toString();
else if (o instanceof Double) isbn = o.toString();
else if (o instanceof NodeList) {
NodeList list = (NodeList) o;
Node node = list.item(0);
// getTextContent is available in Java 5 and DOM 3.
// In Java 1.4 and DOM 2, you'd need to recursively
// accumulate the content.
isbn= node.getTextContent();
}
else {
throw new XPathFunctionException("Could not convert argument type");
}
char[] data = isbn.toCharArray();
if (data.length != 10) return Boolean.FALSE;
int checksum = 0;
for (int i = 0; i < 9; i++) {
checksum += (i+1) * (data[i]-'0');
}
int checkdigit = checksum % 11;
if (checkdigit + '0' == data[9]
|| (data[9] == 'X' && checkdigit == 10)) {
return Boolean.TRUE;
}
return Boolean.FALSE;
}
}
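An extension function only takes effect once the XPath object can find it. One way to arrange that, sketched here under assumptions, is an XPathFunctionResolver together with a NamespaceContext for the function's prefix. The namespace URI, the ext prefix, and the deliberately trivial ten-chars function are all invented for this example. A real application would return its own XPathFunction, such as a validator, from the resolver.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.namespace.QName;
import javax.xml.xpath.*;
import org.xml.sax.InputSource;

public class FunctionResolverDemo {

    static final String NS = "http://www.example.org/functions"; // hypothetical

    // A stand-in extension function: true if its argument has 10 characters.
    static class TenChars implements XPathFunction {
        public Object evaluate(List args) throws XPathFunctionException {
            if (args.size() != 1) {
                throw new XPathFunctionException("Expected one argument");
            }
            return Boolean.valueOf(String.valueOf(args.get(0)).length() == 10);
        }
    }

    static boolean hasTenCharIsbn(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Maps the ext prefix used in the expression to the function namespace
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "ext".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                throw new UnsupportedOperationException();
            }
            public Iterator<String> getPrefixes(String uri) {
                throw new UnsupportedOperationException();
            }
        });
        // Hands back the function when the name and arity match
        xpath.setXPathFunctionResolver(new XPathFunctionResolver() {
            public XPathFunction resolveFunction(QName name, int arity) {
                boolean match = NS.equals(name.getNamespaceURI())
                    && "ten-chars".equals(name.getLocalPart()) && arity == 1;
                return match ? new TenChars() : null;
            }
        });
        return (Boolean) xpath.evaluate("ext:ten-chars(string(/book/isbn))",
            new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            hasTenCharIsbn("<book><isbn>0596007647</isbn></book>"));
    }
}
```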
Validation, augmentation, and type information
javax.xml.validation
Schema language independent
JDK bundles W3C schema support
Third-party libraries add other languages such as RELAX NG
import java.io.*;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
import org.xml.sax.SAXException;
public class DocbookXSDCheck {
public static void main(String[] args) throws SAXException, IOException {
// 1. Lookup a factory for the W3C XML Schema language
SchemaFactory factory =
SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
// 2. Compile the schema.
// Here the schema is loaded from a java.io.File, but you could use
// a java.net.URL or a javax.xml.transform.Source instead.
File schemaLocation = new File("/opt/xml/docbook/xsd/docbook.xsd");
Schema schema = factory.newSchema(schemaLocation);
// 3. Get a validator from the schema.
Validator validator = schema.newValidator();
// 4. Parse the document you want to check.
Source source = new StreamSource(args[0]);
// 5. Check the document
try {
validator.validate(source);
System.out.println(args[0] + " is valid.");
}
catch (SAXException ex) {
System.out.println(args[0] + " is not valid because ");
System.out.println(ex.getMessage());
}
}
}
SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Schema schema = factory.newSchema();
XMLConstants.W3C_XML_SCHEMA_NS_URI: http://www.w3.org/2001/XMLSchema
XMLConstants.RELAXNG_NS_URI: http://relaxng.org/ns/structure/1.0
XMLConstants.XML_DTD_NS_URI: http://www.w3.org/TR/REC-xml
Other languages can be added with libraries
Only W3C schema supported out-of-the-box in JDK 5/6
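A self-contained sketch that selects the schema language with the XMLConstants constant rather than a string literal and validates in-memory documents. The one-element schema and the sample documents are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class InlineSchemaDemo {

    // A tiny schema invented for this sketch: one element of type xs:gYear
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='year' type='xs:gYear'/>"
      + "</xs:schema>";

    static boolean isValid(String xml) throws SAXException, IOException {
        SchemaFactory factory
            = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema
            = factory.newSchema(new StreamSource(new StringReader(SCHEMA)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        }
        catch (SAXException ex) {
            // The default error handling reports invalidity as an exception
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<year>2006</year>"));
        System.out.println(isValid("<year>last year</year>"));
    }
}
```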
Adding information to the document from the schema such as default attribute values.
Usually a bad idea.
import java.io.*;
import javax.xml.transform.dom.*;
import javax.xml.validation.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
public class DocbookXSDAugmenter {
public static void main(String[] args)
throws SAXException, IOException, ParserConfigurationException {
SchemaFactory factory
= SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
File schemaLocation = new File("/opt/xml/docbook/xsd/docbook.xsd");
Schema schema = factory.newSchema(schemaLocation);
Validator validator = schema.newValidator();
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true); // never forget this
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(new File(args[0]));
DOMSource source = new DOMSource(doc);
DOMResult result = new DOMResult();
try {
validator.validate(source, result);
Document augmented = (Document) result.getNode();
// do whatever you need to do with the augmented document...
}
catch (SAXException ex) {
System.out.println(args[0] + " is not valid because ");
System.out.println(ex.getMessage());
}
}
}
DOM 3 TypeInfo tells you what the type is
The Schema object provides a ValidatorHandler that implements SAX's ContentHandler interface that you install in a SAX parser.
You also install your own ContentHandler in the ValidatorHandler (not the parser).
The ValidatorHandler makes available a TypeInfoProvider that your ContentHandler can call to determine the type of the current element or one of its attributes.
package org.w3c.dom;
public interface TypeInfo {
public String getTypeName();
public String getTypeNamespace();
public boolean isDerivedFrom(String typeNamespace,
String typeName, int derivationMethod);
public static final int DERIVATION_RESTRICTION = 0x00000001;
public static final int DERIVATION_EXTENSION = 0x00000002;
public static final int DERIVATION_UNION = 0x00000004;
public static final int DERIVATION_LIST = 0x00000008;
}
package javax.xml.validation;
public abstract class TypeInfoProvider {
public abstract TypeInfo getElementTypeInfo();
public abstract TypeInfo getAttributeTypeInfo(int index);
public abstract boolean isIdAttribute(int index);
public abstract boolean isSpecified(int index);
}
import java.io.*;
import javax.xml.validation.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
public class TypeLister extends DefaultHandler {
private TypeInfoProvider provider;
public TypeLister(TypeInfoProvider provider) {
this.provider = provider;
}
public static void main(String[] args) throws SAXException, IOException {
SchemaFactory factory
= SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
File schemaLocation = new File("/opt/xml/docbook/xsd/docbook.xsd");
Schema schema = factory.newSchema(schemaLocation);
ValidatorHandler vHandler = schema.newValidatorHandler();
TypeInfoProvider provider = vHandler.getTypeInfoProvider();
ContentHandler cHandler = new TypeLister(provider);
vHandler.setContentHandler(cHandler);
XMLReader parser = XMLReaderFactory.createXMLReader();
parser.setContentHandler(vHandler);
parser.parse(args[0]);
}
public void startElement(String namespace, String localName,
String qualifiedName, Attributes atts) throws SAXException {
String type = provider.getElementTypeInfo().getTypeName();
System.out.println(qualifiedName + ": " + type);
}
}
book: #AnonType_book
title: #AnonType_title
subtitle: #AnonType_subtitle
info: #AnonType_info
copyright: #AnonType_copyright
year: #AnonType_year
holder: #AnonType_holder
author: #AnonType_author
personname: #AnonType_personname
firstname: #AnonType_firstname
othername: #AnonType_othername
surname: #AnonType_surname
personblurb: #AnonType_personblurb
para: #AnonType_para
link: #AnonType_link
W3C/IETF Joint Proposed Recommendation, August 20, 2001
XML Signatures provide:
Integrity
Message authentication
Signer authentication
For data of any type
Signed data can be located within the XML that includes the signature or elsewhere.
An enveloped signature is enclosed inside the XML element it signs
An enveloping signature signs XML data it contains.
A detached signature signs data external to the Signature element, possibly in another document entirely.
The signature processor calculates a hash code for some data using a strong, one-way hash function.
The processor encrypts the hash code using a private key.
The verifier calculates the hash code for the data it's received.
It then decrypts the encrypted hash code using the public key to see if the hash codes match.
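The four steps above can be sketched with the plain java.security API, outside of XML entirely. This is an illustration under assumptions, not what an XML signature processor literally runs: the SHA256withRSA algorithm and the 2048-bit key size are choices made for the example, and the Signature class performs the hashing and the private-key operation together.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

public class DigestAndSignDemo {

    // Steps 1 and 2: hash the data and sign the hash with the private key.
    static byte[] sign(PrivateKey key, byte[] data)
      throws GeneralSecurityException {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(key);
        signer.update(data);
        return signer.sign();
    }

    // Steps 3 and 4: rehash the received data and check it against the
    // signature using the public key.
    static boolean verify(PublicKey key, byte[] data, byte[] signature)
      throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(key);
        verifier.update(data);
        return verifier.verify(signature);
    }

    public static void main(String[] args) throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair pair = generator.generateKeyPair();

        byte[] data = "<doc>original</doc>".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(pair.getPrivate(), data);

        byte[] tampered = "<doc>modified</doc>".getBytes(StandardCharsets.UTF_8);
        System.out.println(verify(pair.getPublic(), data, signature));
        System.out.println(verify(pair.getPublic(), tampered, signature));
    }
}
```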
The signature processor digests (calculates the hash code for) a data object.
The processor places the digest value in a Signature element.
The processor digests the Signature element.
The processor cryptographically signs the Signature element.
<?xml version='1.0' encoding='UTF-8'?>
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
<SignedInfo>
<CanonicalizationMethod Algorithm="http://www.w3.org/TR/2000/WD-xml-c14n-20000119"/>
<SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"/>
<Reference URI="http://www.cafeconleche.org/slides/hoffman/fundamentals/examples/hotcop.xml">
<DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
<DigestValue>nvfYilfgN/rICyzhGmjidKCFoC8=</DigestValue>
</Reference>
</SignedInfo>
<SignatureValue>
hfowa4qdbuMkoZfX1/VXd4UBpIpZMM5+6CElmY7jOIKFqvXq5A5VKw==
</SignatureValue>
<KeyInfo>
<KeyValue>
<DSAKeyValue>
<P>
/X9TgR11EilS30qcLuzk5/YRt1I870QAwx4/gLZRJmlFXUAiUftZPY1Y+r/F9bow9s
ubVWzXgTuAHTRv8mZgt2uZUKWkn5/oBHsQIsJPu6nX/rfGG/g7V+fGqKYVDwT7g/bT
xR7DAjVUE1oWkTL2dfOuK2HXKu/yIgMZndFIAcc=
</P>
<Q>l2BQjxUjC8yykrmCouuEC/BYHPU=</Q>
<G>
9+GghdabPd7LvKtcNrhXuXmUr7v6OuqC+VdMCz0HgmdRWVeOutRZT+ZxBxCBgLRJFn
Ej6EwoFhO3zwkyjMim4TwWeotUfI0o4KOuHiuzpnWRbqN/C/ohNWLx+2J6ASQ7zKTx
vqhRkImog9/hWuWfBpKLZl6Ae1UlZAFMO/7PSSo=
</G>
<Y>
6jKpNnmkkWeArsn5Oeeg2njcz+nXdk0f9kZI892ddlR8Lg1aMhPeFTYuoq3I6neFlb
BjWzuktNZKiXYBfKsSTB8U09dTiJo2ir3HJuY7eW/p89osKMfixPQsp9vQMgzph6Qa
lY7j4MB7y5ROJYsTr1/fFwmj/yhkHwpbpzed1LE=
</Y>
</DSAKeyValue>
</KeyValue>
<X509Data>
<X509IssuerSerial>
<X509IssuerName>CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US</X509IssuerName>
<X509SerialNumber>983556890</X509SerialNumber></X509IssuerSerial>
<X509SubjectName>CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US</X509SubjectName>
<X509Certificate>
MIIDLzCCAu0CBDqf4xowCwYHKoZIzjgEAwUAMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcg
WW9yazERMA8GA1UEBxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlN
ZXRyb3RlY2gxHjAcBgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDAeFw0wMTAzMDIxODE0NTBa
Fw0wMTA1MzExODE0NTBaMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcgWW9yazERMA8GA1UE
BxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlNZXRyb3RlY2gxHjAc
BgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDCCAbgwggEsBgcqhkjOOAQBMIIBHwKBgQD9f1OB
HXUSKVLfSpwu7OTn9hG3UjzvRADDHj+AtlEmaUVdQCJR+1k9jVj6v8X1ujD2y5tVbNeBO4AdNG/y
ZmC3a5lQpaSfn+gEexAiwk+7qdf+t8Yb+DtX58aophUPBPuD9tPFHsMCNVQTWhaRMvZ1864rYdcq
7/IiAxmd0UgBxwIVAJdgUI8VIwvMspK5gqLrhAvwWBz1AoGBAPfhoIXWmz3ey7yrXDa4V7l5lK+7
+jrqgvlXTAs9B4JnUVlXjrrUWU/mcQcQgYC0SRZxI+hMKBYTt88JMozIpuE8FnqLVHyNKOCjrh4r
s6Z1kW6jfwv6ITVi8ftiegEkO8yk8b6oUZCJqIPf4VrlnwaSi2ZegHtVJWQBTDv+z0kqA4GFAAKB
gQDqMqk2eaSRZ4Cuyfk556DaeNzP6dd2TR/2Rkjz3Z12VHwuDVoyE94VNi6ircjqd4WVsGNbO6S0
1kqJdgF8qxJMHxTT11OImjaKvccm5jt5b+nz2iwox+LE9Cyn29AyDOmHpBqVjuPgwHvLlE4lixOv
X98XCaP/KGQfClunN53UsTALBgcqhkjOOAQDBQADLwAwLAIUODqxsFzS96BjrVA4LVo5FzuWBRMC
FC0xfXxbaJaCJuVqtcBv4bqwV0EX
</X509Certificate>
</X509Data>
</KeyInfo>
</Signature>
XMLSignatureFactory factory = XMLSignatureFactory.getInstance("DOM");
DigestMethod sha1 = factory.newDigestMethod(DigestMethod.SHA1, null);
CanonicalizationMethod inclusive = factory.newCanonicalizationMethod
(CanonicalizationMethod.INCLUSIVE, (C14NMethodParameterSpec) null);
SignatureMethod rsasha1
= factory.newSignatureMethod(SignatureMethod.RSA_SHA1, null);
Transform enveloped = factory.newTransform
(Transform.ENVELOPED, (TransformParameterSpec) null);
List transforms = Collections.singletonList(enveloped);
// empty string means sign the current, complete document
Reference ref = factory.newReference("", sha1, transforms);
List references = Collections.singletonList(ref);
SignedInfo signer = factory.newSignedInfo(inclusive, rsasha1, references);
char[] password = "secret".toCharArray();
KeyStore store = KeyStore.getInstance("JKS");
InputStream keys = new FileInputStream("keys.jks");
store.load(keys, password);
KeyStore.PrivateKeyEntry entry = (KeyStore.PrivateKeyEntry) store.getEntry
("theKey", new KeyStore.PasswordProtection(password));
X509Certificate cert = (X509Certificate) entry.getCertificate();
KeyInfoFactory keyFactory = factory.getKeyInfoFactory();
List certs = new ArrayList();
certs.add(cert.getSubjectX500Principal().getName());
certs.add(cert);
X509Data data = keyFactory.newX509Data(certs);
List dataList = Collections.singletonList(data);
KeyInfo key = keyFactory.newKeyInfo(dataList);
Document doc = getDOMDocument( /* wherever you like */ );
DOMSignContext context
= new DOMSignContext(entry.getPrivateKey(), doc.getDocumentElement());
XMLSignature signature = factory.newXMLSignature(signer, key);
signature.sign(context);
// The Signature element has now been added to the Document.
NodeList nodes
= doc.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature");
DOMValidateContext dvc
= new DOMValidateContext(new X509KeySelector(), nodes.item(0));
XMLSignature signature = factory.unmarshalXMLSignature(dvc);
if (!signature.validate(dvc)) {
System.err.println(
"Signature failed! Document may have been tampered with.");
}
Push: SAX, XNI
Tree: DOM, JDOM, XOM, ElectricXML, dom4j, Sparta
Data binding: Castor, Zeus, JAXB
Pull: XMLPULL, StAX, NekoPull
Transform: XSLT, TrAX, XQuery
pull parsing is the way to go in the future. The first 3 XML parsers (Lark, NXP, and expat) all were event-driven because... er well that was 1996, can't exactly remember, seemed like a good idea at the time.
--Tim Bray on the xml-dev mailing list, Wednesday, September 18, 2002
Fast
Memory efficient
Streamable
Read-only
Streaming API for XML
javax.xml.stream.
JSR-173, proposed by BEA Systems:
Two recently proposed JSRs, JAXB and JAX-RPC, highlight the need for an XML Streaming API. Both data binding and remote procedure calling (RPC) require processing of XML as a stream of events, where the current context of the XML defines subsequent processing of the XML. A streaming API makes this type of code much more natural to write than SAX, and much more efficient than DOM.
Goals:
Develop APIs and conventions that allow a user to programmatically pull parse events from an XML input stream.
Develop APIs that allow a user to write events to an XML output stream.
Develop a set of objects and interfaces that encapsulate the information contained in an XML stream.
The specification should be easy to use, efficient, and not require a grammar. It should include support for namespaces, and associated XML constructs. The specification will make reasonable efforts to define APIs that are "pluggable".
XMLStreamReader: the parser
XMLInputFactory: the factory class that creates an XMLStreamReader
XMLStreamException: anything other than an IOException that might go wrong when parsing an
XML document, particularly well-formedness errors
import javax.xml.stream.*;
import java.net.*;
import java.io.*;
public class StAXChecker {
public static void main(String[] args) {
if (args.length == 0) {
System.err.println("Usage: java StAXChecker url" );
return;
}
try {
InputStream in;
try {
URL u = new URL(args[0]);
in = u.openStream();
}
catch (MalformedURLException ex) {
// Maybe it's a file name
in = new FileInputStream(args[0]);
}
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader parser = factory.createXMLStreamReader(in);
while (true) {
int event = parser.next();
if (event == XMLStreamConstants.END_DOCUMENT) {
// the parser is closed once, after the loop
break;
}
}
parser.close();
// If we get here there are no exceptions
System.out.println(args[0] + " is well-formed");
}
catch (XMLStreamException ex) {
System.out.println(args[0] + " is not well-formed");
System.out.println(ex);
}
catch (IOException ex) {
System.out.println(args[0] + " could not be checked due to an "
+ ex.getClass().getName());
ex.printStackTrace();
}
}
}
$ java -classpath stax.jar:.:bea.jar StAXChecker http://www.cafeconleche.org/
http://www.cafeconleche.org/ is well-formed
$ java -classpath stax.jar:.:bea.jar StAXChecker http://www.xml.com/
http://www.xml.com/ is not well-formed
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[44,7]
Message: could not resolve entity named 'nbsp'
The XMLStreamConstants interface defines the int event codes returned by next() to tell you what kind of node the parser read.
15 event codes:
XMLStreamConstants.START_DOCUMENT
XMLStreamConstants.END_DOCUMENT
XMLStreamConstants.START_ELEMENT
XMLStreamConstants.END_ELEMENT
XMLStreamConstants.ATTRIBUTE
XMLStreamConstants.CHARACTERS
XMLStreamConstants.CDATA
XMLStreamConstants.SPACE
XMLStreamConstants.PROCESSING_INSTRUCTION
XMLStreamConstants.COMMENT
XMLStreamConstants.ENTITY_REFERENCE
XMLStreamConstants.NOTATION_DECLARATION
XMLStreamConstants.ENTITY_DECLARATION
XMLStreamConstants.NAMESPACE
XMLStreamConstants.DTD
Depending on what the event is, different methods are available on the XMLStreamReader
import javax.xml.stream.*;
import java.net.*;
import java.io.*;
public class EventLister {
public static void main(String[] args) {
if (args.length == 0) {
System.err.println("Usage: java EventLister url" );
return;
}
try {
InputStream in;
try {
URL u = new URL(args[0]);
in = u.openStream();
}
catch (MalformedURLException ex) {
// Maybe it's a file name
in = new FileInputStream(args[0]);
}
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader parser = factory.createXMLStreamReader(in);
while (true) {
int event = parser.next();
if (event == XMLStreamConstants.START_ELEMENT) {
System.out.println("Start tag");
}
else if (event == XMLStreamConstants.END_ELEMENT) {
System.out.println("End tag");
}
else if (event == XMLStreamConstants.START_DOCUMENT) {
System.out.println("Start document");
}
else if (event == XMLStreamConstants.CHARACTERS) {
System.out.println("Text");
}
else if (event == XMLStreamConstants.CDATA) {
System.out.println("CDATA Section");
}
else if (event == XMLStreamConstants.COMMENT) {
System.out.println("Comment");
}
else if (event == XMLStreamConstants.DTD) {
System.out.println("Document type declaration");
}
else if (event == XMLStreamConstants.ENTITY_REFERENCE) {
System.out.println("Entity Reference");
}
else if (event == XMLStreamConstants.SPACE) {
System.out.println("Ignorable white space");
}
else if (event == XMLStreamConstants.NOTATION_DECLARATION) {
System.out.println("Notation Declaration");
}
else if (event == XMLStreamConstants.ENTITY_DECLARATION) {
System.out.println("Entity Declaration");
}
else if (event == XMLStreamConstants.PROCESSING_INSTRUCTION) {
System.out.println("Processing Instruction");
}
else if (event == XMLStreamConstants.END_DOCUMENT) {
System.out.println("End Document");
break;
}
}
}
catch (XMLStreamException ex) {
System.out.println(ex);
}
catch (IOException ex) {
System.out.println("IOException while parsing " + args[0]);
}
}
}
% java -classpath stax.jar:.:bea.jar EventLister hotcop.xml
Ignorable white space
Start tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
Start tag
Text
End tag
Text
End tag
Ignorable white space
End Document
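For quick experiments it can be handy to parse a string rather than a URL and to collect the event names in a list. The following is only a sketch: the class name EventNames and its helper methods are made up for illustration, and it names only the most common event codes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.*;

public class EventNames {

  // Hypothetical helper: maps an event code to a readable name
  static String name(int event) {
    switch (event) {
      case XMLStreamConstants.START_DOCUMENT: return "START_DOCUMENT";
      case XMLStreamConstants.END_DOCUMENT: return "END_DOCUMENT";
      case XMLStreamConstants.START_ELEMENT: return "START_ELEMENT";
      case XMLStreamConstants.END_ELEMENT: return "END_ELEMENT";
      case XMLStreamConstants.CHARACTERS: return "CHARACTERS";
      case XMLStreamConstants.CDATA: return "CDATA";
      case XMLStreamConstants.SPACE: return "SPACE";
      case XMLStreamConstants.COMMENT: return "COMMENT";
      case XMLStreamConstants.PROCESSING_INSTRUCTION: return "PROCESSING_INSTRUCTION";
      case XMLStreamConstants.DTD: return "DTD";
      case XMLStreamConstants.ENTITY_REFERENCE: return "ENTITY_REFERENCE";
      default: return "OTHER(" + event + ")";
    }
  }

  // Pulls every event from the document and records its name
  static List<String> list(String xml) throws XMLStreamException {
    XMLStreamReader parser
     = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    List<String> names = new ArrayList<String>();
    try {
      while (parser.hasNext()) {
        names.add(name(parser.next()));
      }
    }
    finally {
      parser.close();
    }
    return names;
  }

  public static void main(String[] args) throws XMLStreamException {
    System.out.println(list("<a>hi<!-- note --></a>"));
  }
}
```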
Invokable methods

Event Type | Valid Methods |
---|---|
START_ELEMENT | next(), getName(), getLocalName(), hasName(), getPrefix(), getAttributeCount(), getAttributeLocalName(int index), getAttributeName(int index), getAttributeNamespace(int index), getAttributePrefix(int index), getAttributeType(int index), getAttributeValue(int index), getAttributeValue(String namespaceURI, String localName), isAttributeSpecified(int index), getNamespaceContext(), getNamespaceCount(), getNamespacePrefix(int index), getNamespaceURI(), getNamespaceURI(int index), getNamespaceURI(String prefix), getElementText(), nextTag() |
ATTRIBUTE | next(), nextTag(), getAttributeCount(), getAttributeLocalName(int index), getAttributeName(int index), getAttributeNamespace(int index), getAttributePrefix(int index), getAttributeType(int index), getAttributeValue(int index), getAttributeValue(String namespaceURI, String localName), isAttributeSpecified(int index) |
NAMESPACE | next(), nextTag(), getNamespaceContext(), getNamespaceCount(), getNamespacePrefix(int index), getNamespaceURI(), getNamespaceURI(int index), getNamespaceURI(String prefix) |
END_ELEMENT | next(), getName(), getLocalName(), hasName(), getPrefix(), getNamespaceContext(), getNamespaceCount(), getNamespacePrefix(int index), getNamespaceURI(), getNamespaceURI(int index), getNamespaceURI(String prefix), nextTag() |
CHARACTERS | next(), getText(), getTextCharacters(), getTextCharacters(int sourceStart, char[] target, int targetStart, int length), getTextLength(), nextTag() |
CDATA | next(), getText(), getTextCharacters(), getTextCharacters(int sourceStart, char[] target, int targetStart, int length), getTextLength(), nextTag() |
COMMENT | next(), getText(), getTextCharacters(), getTextCharacters(int sourceStart, char[] target, int targetStart, int length), getTextLength(), nextTag() |
SPACE | next(), getText(), getTextCharacters(), getTextCharacters(int sourceStart, char[] target, int targetStart, int length), getTextLength(), nextTag() |
START_DOCUMENT | next(), getEncoding(), getPrefix(), getVersion(), isStandalone(), standaloneSet(), getCharacterEncodingScheme(), nextTag() |
END_DOCUMENT | close() |
PROCESSING_INSTRUCTION | next(), getPITarget(), getPIData(), nextTag() |
ENTITY_REFERENCE | next(), getLocalName(), getText(), nextTag() |
DTD | next(), getText(), nextTag() |
The getText() method returns the text of the current event:
public String getText()
It works for CHARACTERS, CDATA, SPACE, ENTITY_REFERENCE, DTD, and COMMENT events.
It is not guaranteed to return a maximum contiguous run of text unless the javax.xml.stream.isCoalescing property is true.
For all other event types it throws a java.lang.IllegalStateException.
import javax.xml.stream.*;
import java.net.*;
import java.io.*;
public class EventText {
public static void main(String[] args) {
if (args.length == 0) {
System.err.println("Usage: java EventText url" );
return;
}
try {
InputStream in;
try {
URL u = new URL(args[0]);
in = u.openStream();
}
catch (MalformedURLException ex) {
// Maybe it's a file name
in = new FileInputStream(args[0]);
}
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader parser = factory.createXMLStreamReader(in);
for (int event = parser.next();
event != XMLStreamConstants.END_DOCUMENT;
event = parser.next()) {
if (event == XMLStreamConstants.CHARACTERS
|| event == XMLStreamConstants.SPACE
|| event == XMLStreamConstants.CDATA) {
System.out.println(parser.getText());
}
else if (event == XMLStreamConstants.COMMENT) {
System.out.println("<!-- " + parser.getText() + "-->");
}
}
}
catch (XMLStreamException ex) {
System.out.println(ex);
}
catch (IOException ex) {
System.out.println("IOException while parsing " + args[0]);
}
}
}
$ java -classpath stax.jar:.:bea.jar EventText hotcop.xml
Hot Cop
Jacques Morali
Henri Belolo
Victor Willis
Jacques Morali
PolyGram Records
6:20
1978
Village People
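Whether getText() hands back one long run or several short ones depends on the javax.xml.stream.isCoalescing property, which is exposed as the constant XMLInputFactory.IS_COALESCING. Here is a small sketch for comparing the two settings. The class name CoalescingDemo and the sample document are invented for the example, and how the text is split when coalescing is off is up to the implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.*;

public class CoalescingDemo {

  // Returns one list entry per text-bearing event the parser reports
  static List<String> textRuns(String xml, boolean coalesce)
   throws XMLStreamException {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.valueOf(coalesce));
    XMLStreamReader parser = factory.createXMLStreamReader(new StringReader(xml));
    List<String> runs = new ArrayList<String>();
    try {
      while (parser.hasNext()) {
        int event = parser.next();
        if (event == XMLStreamConstants.CHARACTERS
         || event == XMLStreamConstants.CDATA
         || event == XMLStreamConstants.SPACE) {
          runs.add(parser.getText());
        }
      }
    }
    finally {
      parser.close();
    }
    return runs;
  }

  public static void main(String[] args) throws XMLStreamException {
    String xml = "<a>one<![CDATA[ two ]]>three</a>";
    System.out.println("Coalescing off: " + textRuns(xml, false));
    System.out.println("Coalescing on:  " + textRuns(xml, true));
  }
}
```

With coalescing on, adjacent character data and CDATA sections should arrive as a single run. Either way the concatenation of the runs is the same text.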
Rather than testing for type, it's sometimes useful to ask if the current event can be queried for a certain characteristic:
public boolean isStartElement()
public boolean isEndElement()
public boolean isCharacters()
public boolean isWhiteSpace()
public boolean hasText()
public boolean hasName()
public boolean hasNext()
import javax.xml.stream.*;
import java.net.*;
import java.io.*;
public class SimplerEventText {
public static void main(String[] args) {
if (args.length == 0) {
System.err.println("Usage: java SimplerEventText url" );
return;
}
try {
InputStream in;
try {
URL u = new URL(args[0]);
in = u.openStream();
}
catch (MalformedURLException ex) {
// Maybe it's a file name
in = new FileInputStream(args[0]);
}
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader parser = factory.createXMLStreamReader(in);
for (int event = parser.next();
parser.hasNext();
event = parser.next()) {
if (parser.hasText()) {
System.out.println(parser.getText());
}
}
}
catch (XMLStreamException ex) {
System.out.println(ex);
}
catch (IOException ex) {
System.out.println("IOException while parsing " + args[0]);
}
}
}
public char[] getTextCharacters()
public int getTextStart()
public int getTextLength()
The char array returned may be reused, and is good only until the next call to next()
import javax.xml.stream.*;
import java.net.*;
import java.io.*;
public class EfficientEventText {
public static void main(String[] args) {
if (args.length == 0) {
System.err.println("Usage: java EfficientEventText url" );
return;
}
try {
InputStream in;
try {
URL u = new URL(args[0]);
in = u.openStream();
}
catch (MalformedURLException ex) {
// Maybe it's a file name
in = new FileInputStream(args[0]);
}
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader parser = factory.createXMLStreamReader(in);
Writer out = new OutputStreamWriter(System.out);
for (int event = parser.next();
event != XMLStreamConstants.END_DOCUMENT;
event = parser.next()) {
if (event == XMLStreamConstants.CHARACTERS
|| event == XMLStreamConstants.SPACE
|| event == XMLStreamConstants.CDATA) {
out.write(parser.getTextCharacters(),
parser.getTextStart(), parser.getTextLength());
}
}
out.flush();
out.close();
}
catch (XMLStreamException ex) {
System.out.println(ex);
}
catch (IOException ex) {
System.out.println("IOException while parsing " + args[0]);
}
}
}
public int getTextCharacters(int sourceStart, char[] target, int targetStart, int length)
throws XMLStreamException, IndexOutOfBoundsException,
UnsupportedOperationException, IllegalStateException
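The four-argument getTextCharacters() copies text into a buffer you own, so the characters stay valid after the next call to next(). A sketch of draining each text event through a small fixed-size buffer follows. The class name ChunkedText and the buffer size are arbitrary choices for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.*;

public class ChunkedText {

  // Collects all character data, copying it through a caller-owned buffer
  static String allText(String xml, int bufferSize) throws XMLStreamException {
    XMLStreamReader parser
     = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    char[] buffer = new char[bufferSize];
    StringBuilder result = new StringBuilder();
    try {
      while (parser.hasNext()) {
        int event = parser.next();
        if (event == XMLStreamConstants.CHARACTERS
         || event == XMLStreamConstants.CDATA
         || event == XMLStreamConstants.SPACE) {
          int length = parser.getTextLength();
          int copied = 0;
          while (copied < length) {
            // never ask for more than the buffer can hold
            int wanted = Math.min(buffer.length, length - copied);
            int n = parser.getTextCharacters(copied, buffer, 0, wanted);
            if (n <= 0) break; // defensive: nothing more to copy
            result.append(buffer, 0, n);
            copied += n;
          }
        }
      }
    }
    finally {
      parser.close();
    }
    return result.toString();
  }

  public static void main(String[] args) throws XMLStreamException {
    System.out.println(allText("<a>hello <b>streaming</b> world</a>", 4));
  }
}
```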
If the event is START_ELEMENT or END_ELEMENT, then the following methods in XMLStreamReader also work:
public String getLocalName()
public String getPrefix()
public QName getName()
getLocalName() returns the local (unprefixed) name of the element
getName() returns a QName object for this element
getPrefix() returns the prefix of the element, or null if the element does not have a prefix
package javax.xml.namespace;
public class QName {
public QName(String localPart);
public QName(String namespaceURI, String localPart);
public QName(String namespaceURI, String localPart, String prefix);
public String getLocalPart();
public String getPrefix();
public String getNamespaceURI();
public static QName valueOf(String qNameAsString);
public int hashCode();
public boolean equals(Object object);
public String toString();
}
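A short sketch of how QName behaves in practice. The class name QNameDemo is made up, and the XLink namespace is just a convenient sample. Note that equality considers only the namespace URI and the local part; the prefix is ignored.

```java
import javax.xml.namespace.QName;

public class QNameDemo {

  static final String XLINK_NS = "http://www.w3.org/1999/xlink";

  // A name with all three parts: namespace URI, local part, prefix
  static QName href() {
    return new QName(XLINK_NS, "href", "xlink");
  }

  public static void main(String[] args) {
    QName name = href();
    System.out.println(name.getPrefix() + ":" + name.getLocalPart());
    // toString() uses the {namespaceURI}localPart form, which valueOf() reads back
    System.out.println(name);
    QName parsed = QName.valueOf("{" + XLINK_NS + "}href");
    System.out.println(name.equals(parsed));
  }
}
```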
import javax.xml.stream.*;
import java.net.*;
import java.io.*;
public class NamePrinter {
public static void main(String[] args) {
if (args.length == 0) {
System.err.println("Usage: java NamePrinter url" );
return;
}
try {
InputStream in;
try {
URL u = new URL(args[0]);
in = u.openStream();
}
catch (MalformedURLException ex) {
// Maybe it's a file name
in = new FileInputStream(args[0]);
}
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader parser = factory.createXMLStreamReader(in);
while (true) {
int event = parser.next();
if (event == XMLStreamConstants.START_ELEMENT) {
System.out.println("Start tag: ");
printEvent(parser);
}
else if (event == XMLStreamConstants.END_ELEMENT) {
System.out.println("End tag");
printEvent(parser);
}
else if (event == XMLStreamConstants.START_DOCUMENT) {
System.out.println("Start document");
}
else if (event == XMLStreamConstants.CHARACTERS) {
System.out.println("Text");
printEvent(parser);
}
else if (event == XMLStreamConstants.CDATA) {
System.out.println("CDATA Section");
printEvent(parser);
}
else if (event == XMLStreamConstants.COMMENT) {
System.out.println("Comment");
printEvent(parser);
}
else if (event == XMLStreamConstants.DTD) {
System.out.println("Document type declaration");
printEvent(parser);
}
else if (event == XMLStreamConstants.ENTITY_REFERENCE) {
System.out.println("Entity Reference");
printEvent(parser);
}
else if (event == XMLStreamConstants.SPACE) {
System.out.println("Ignorable white space");
printEvent(parser);
}
else if (event == XMLStreamConstants.PROCESSING_INSTRUCTION) {
System.out.println("Processing Instruction");
printEvent(parser);
}
else if (event == XMLStreamConstants.END_DOCUMENT) {
System.out.println("End Document");
break;
} // end else if
} // end while
}
catch (XMLStreamException ex) {
System.out.println(ex);
}
catch (IOException ex) {
System.out.println("IOException while parsing " + args[0]);
ex.printStackTrace();
}
}
private static void printEvent(XMLStreamReader parser) {
// Names are only defined for start-tags and end-tags. Some parsers
// throw an IllegalStateException if asked for the name of any other event.
if (parser.hasName()) {
String localName = parser.getLocalName();
String prefix = parser.getPrefix();
String uri = parser.getNamespaceURI();
if (localName != null) System.out.println("\tLocal name: " + localName);
if (prefix != null) System.out.println("\tPrefix: " + prefix);
if (uri != null) System.out.println("\tNamespace URI: " + uri);
}
System.out.println();
}
}
$ java -classpath .:bea.jar:stax.jar NamePrinter hotcop.xml
Ignorable white space
Start tag:
	Local name: SONG
	Namespace URI:
Text
Start tag:
	Local name: TITLE
	Namespace URI:
Text
End tag
	Local name: TITLE
	Namespace URI:
Text
Start tag:
	Local name: COMPOSER
	Namespace URI:
Text
End tag
	Local name: COMPOSER
	Namespace URI:
Text
Start tag:
	Local name: COMPOSER
	Namespace URI:
Text
End tag
	Local name: COMPOSER
	Namespace URI:
Text
Start tag:
	Local name: COMPOSER
	Namespace URI:
Text
End tag
	Local name: COMPOSER
	Namespace URI:
Text
Start tag:
	Local name: PRODUCER
	Namespace URI:
Text
End tag
	Local name: PRODUCER
	Namespace URI:
Text
Start tag:
	Local name: PUBLISHER
	Namespace URI:
Text
End tag
	Local name: PUBLISHER
	Namespace URI:
Text
Start tag:
	Local name: LENGTH
	Namespace URI:
Text
End tag
	Local name: LENGTH
	Namespace URI:
Text
Start tag:
	Local name: YEAR
	Namespace URI:
Text
End tag
	Local name: YEAR
	Namespace URI:
Text
Start tag:
	Local name: ARTIST
	Namespace URI:
Text
End tag
	Local name: ARTIST
	Namespace URI:
Text
End tag
	Local name: SONG
	Namespace URI:
Ignorable white space
End Document
import javax.xml.stream.*;
import java.net.*;
import java.io.*;
public class RSSLister {
public static void main(String[] args) {
if (args.length == 0) {
System.err.println("Usage: java RSSLister url" );
return;
}
try {
InputStream in;
try {
URL u = new URL(args[0]);
in = u.openStream();
}
catch (MalformedURLException ex) {
// Maybe it's a file name
in = new FileInputStream(args[0]);
}
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader parser = factory.createXMLStreamReader(in);
boolean printing = false;
for (int event = parser.next();
parser.hasNext();
event = parser.next()) {
if (event == XMLStreamConstants.START_ELEMENT) {
String name = parser.getLocalName();
if (name.equals("title")) printing = true;
}
else if (event == XMLStreamConstants.END_ELEMENT) {
String name = parser.getLocalName();
if (name.equals("title")) printing = false;
}
else if (parser.hasText() && event != XMLStreamConstants.COMMENT) {
if (printing) System.out.println(parser.getText());
}
}
parser.close();
}
catch (XMLStreamException ex) {
System.out.println(ex);
}
catch (IOException ex) {
System.out.println("IOException while parsing " + args[0]);
}
}
}
$ java -classpath stax.jar:.:bea.jar RSSLister ananova.rss
Ananova: Archeology
Powered by News Is Free
Britain's earliest leprosy victim may have been found
20th anniversary of Mary Rose recovery
'Proof of Jesus' burial box damaged on way to Canada
Remains of four woolly rhinos give new insight into Ice Age
Experts solve crop lines mystery
Print only item titles:
import javax.xml.stream.*;
import java.net.*;
import java.io.*;
public class BetterRSSLister {
public static void main(String[] args) {
if (args.length == 0) {
System.err.println("Usage: java BetterRSSLister url" );
return;
}
try {
InputStream in;
try {
URL u = new URL(args[0]);
in = u.openStream();
}
catch (MalformedURLException ex) {
// Maybe it's a file name
in = new FileInputStream(args[0]);
}
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader parser = factory.createXMLStreamReader(in);
boolean inItem = false;
boolean inTitle = false;
// I am relying on no recursion here. To fix this
// just keep an int count rather than a boolean
for (int event = parser.nextTag();
parser.hasNext();
event = parser.next()) {
if (event == XMLStreamConstants.START_ELEMENT) {
String name = parser.getLocalName();
if (name.equals("title")) inTitle = true;
else if (name.equals("item")) inItem = true;
}
else if (event == XMLStreamConstants.END_ELEMENT) {
String name = parser.getLocalName();
if (name.equals("title")) inTitle = false;
else if (name.equals("item")) inItem = false;
}
else if (parser.hasText() && event != XMLStreamConstants.COMMENT) {
if (inItem && inTitle) System.out.println(parser.getText());
}
}
parser.close();
}
catch (XMLStreamException ex) {
System.out.println(ex);
}
catch (IOException ex) {
System.out.println("IOException while parsing " + args[0]);
}
}
}
$ java -classpath stax.jar:.:bea.jar BetterRSSLister ananova.rss
Archeology
Powered by News Is Free
Britain's earliest leprosy victim may have been found
20th anniversary of Mary Rose recovery
'Proof of Jesus' burial box damaged on way to Canada
Remains of four woolly rhinos give new insight into Ice Age
Experts solve crop lines mystery
nextTag()
Skips ahead to the next start-tag or end-tag
Ignores comments, processing instructions, and boundary white space
Throws an exception if it encounters non-whitespace text
Useful for processing record-like XML
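A sketch of the record-like case, combining nextTag() with getElementText(). It assumes a rigid, invented format in which every song element holds exactly a title followed by a year; the class name RecordReader is also made up. Anything that departs from that layout makes nextTag() or getElementText() throw an XMLStreamException.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.*;

public class RecordReader {

  // Reads <songs><song><title>..</title><year>..</year></song>...</songs>
  static List<String> readSongs(String xml) throws XMLStreamException {
    XMLStreamReader parser
     = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    List<String> songs = new ArrayList<String>();
    try {
      parser.nextTag();                         // <songs>
      // each pass handles one <song>; the loop ends on </songs>
      while (parser.nextTag() == XMLStreamConstants.START_ELEMENT) {
        parser.nextTag();                       // <title>
        String title = parser.getElementText(); // parser is now on </title>
        parser.nextTag();                       // <year>
        String year = parser.getElementText();  // parser is now on </year>
        parser.nextTag();                       // </song>
        songs.add(title + " (" + year + ")");
      }
    }
    finally {
      parser.close();
    }
    return songs;
  }

  public static void main(String[] args) throws XMLStreamException {
    String xml = "<songs>\n"
     + "  <song>\n    <title>Hot Cop</title>\n    <year>1978</year>\n  </song>\n"
     + "</songs>";
    System.out.println(readSongs(xml));
  }
}
```

The white space between the tags never shows up in the code because nextTag() skips it.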
These methods are invokable when the event type is START_ELEMENT:
public int getAttributeCount()
public String getAttributeNamespace(int index)
public String getAttributeLocalName(int index)
public QName getAttributeName(int index)
public String getAttributePrefix(int index)
public String getAttributeType(int index)
public boolean isAttributeSpecified(int index)
public String getAttributeValue(int index)
public String getAttributeValue(String namespace, String name)
xmlns and xmlns:prefix attributes are not reported.
If the javax.xml.stream.isNamespaceAware property is false, xmlns and xmlns:prefix attributes are reported.
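Attributes can also be walked by index. The sketch below collects the attributes of the root element into a map keyed by prefixed name. The class name AttributeLister is invented for the example. In the javax.xml.stream API bundled with Java 6, the String local name comes from getAttributeLocalName(int), which is what this sketch uses.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.*;

public class AttributeLister {

  // Maps each attribute of the root element to its value
  static Map<String, String> attributes(String xml) throws XMLStreamException {
    XMLStreamReader parser
     = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    Map<String, String> result = new LinkedHashMap<String, String>();
    try {
      parser.nextTag(); // move to the root element's start-tag
      for (int i = 0; i < parser.getAttributeCount(); i++) {
        String prefix = parser.getAttributePrefix(i);
        String local = parser.getAttributeLocalName(i);
        // unprefixed attributes may report a null or an empty prefix
        String name = (prefix == null || prefix.length() == 0)
         ? local : prefix + ":" + local;
        result.put(name, parser.getAttributeValue(i));
      }
    }
    finally {
      parser.close();
    }
    return result;
  }

  public static void main(String[] args) throws XMLStreamException {
    String xml = "<a id='1' xlink:href='x.xml'"
     + " xmlns:xlink='http://www.w3.org/1999/xlink'/>";
    // the xmlns:xlink declaration is not listed among the attributes
    System.out.println(attributes(xml));
  }
}
```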
import javax.xml.stream.*;
import java.net.*;
import java.io.*;
import java.util.*;
public class PullSpider {
// Need to keep track of where we've been
// so we don't get stuck in an infinite loop
private List spideredURIs = new Vector();
// This linked list keeps track of where we're going.
// Although the LinkedList class does not guarantee queue like
// access, I always access it in a first-in/first-out fashion.
private LinkedList queue = new LinkedList();
private URL currentURL;
private XMLInputFactory factory;
public PullSpider() {
this.factory = XMLInputFactory.newInstance();
}
private void processStartTag(XMLStreamReader parser) {
String type
= parser.getAttributeValue("http://www.w3.org/1999/xlink", "type");
if (type != null) {
String href
= parser.getAttributeValue("http://www.w3.org/1999/xlink", "href");
if (href != null) {
try {
URL foundURL = new URL(currentURL, href);
if (!spideredURIs.contains(foundURL) && !queue.contains(foundURL)) {
queue.addFirst(foundURL);
}
}
catch (MalformedURLException ex) {
// skip this URL
}
}
}
}
public void spider(URL url) {
System.out.println("Spidering " + url);
currentURL = url;
try {
XMLStreamReader parser = factory.createXMLStreamReader(currentURL.openStream());
spideredURIs.add(currentURL);
for (int event = parser.next();
parser.hasNext();
event = parser.next()) {
if (event == XMLStreamConstants.START_ELEMENT) {
processStartTag(parser);
}
} // end for
parser.close();
while (!queue.isEmpty()) {
URL nextURL = (URL) queue.removeLast();
spider(nextURL);
}
}
catch (Exception ex) {
// skip this document
}
}
public static void main(String[] args) throws Exception {
if (args.length == 0) {
System.err.println("Usage: java PullSpider url" );
return;
}
PullSpider spider = new PullSpider();
spider.spider(new URL(args[0]));
} // end main
} // end PullSpider
$ java -classpath stax.jar:.:bea.jar PullSpider http://www.rddl.org
Spidering http://www.rddl.org
Spidering http://www.rddl.org/natures
Spidering http://www.rddl.org/purposes
Spidering http://www.rddl.org/xrd.css
Spidering http://www.rddl.org/rddl-xhtml.dtd
Spidering http://www.rddl.org/rddl-qname-1.mod
Spidering http://www.rddl.org/rddl-resource-1.mod
Spidering http://www.rddl.org/xhtml-arch-1.mod
Spidering http://www.rddl.org/xhtml-attribs-1.mod
Spidering http://www.rddl.org/xhtml-base-1.mod
Spidering http://www.rddl.org/xhtml-basic-form-1.mod
Spidering http://www.rddl.org/xhtml-basic-table-1.mod
Spidering http://www.rddl.org/xhtml-blkphras-1.mod
Spidering http://www.rddl.org/xhtml-blkstruct-1.mod
Spidering http://www.rddl.org/xhtml-charent-1.mod
Spidering http://www.rddl.org/xhtml-datatypes-1.mod
Spidering http://www.rddl.org/xhtml-framework-1.mod
Spidering http://www.rddl.org/xhtml-hypertext-1.mod
Spidering http://www.rddl.org/xhtml-image-1.mod
Spidering http://www.rddl.org/xhtml-inlphras-1.mod
Spidering http://www.rddl.org/xhtml-inlstruct-1.mod
Spidering http://www.rddl.org/xhtml-lat1.ent
Spidering http://www.rddl.org/xhtml-link-1.mod
Spidering http://www.rddl.org/xhtml-meta-1.mod
Spidering http://www.rddl.org/xhtml-notations-1.mod
Spidering http://www.rddl.org/xhtml-object-1.mod
Spidering http://www.rddl.org/xhtml-param-1.mod
Spidering http://www.rddl.org/xhtml-qname-1.mod
Spidering http://www.rddl.org/xhtml-rddl-model-1.mod
Spidering http://www.rddl.org/xhtml-special.ent
Spidering http://www.rddl.org/xhtml-struct-1.mod
Spidering http://www.rddl.org/xhtml-symbol.ent
Spidering http://www.rddl.org/xhtml-text-1.mod
Spidering http://www.rddl.org/xlink-module-1.mod
Spidering http://www.rddl.org/rddl.rdfs
Spidering http://www.rddl.org/rddl-integration.rxg
Spidering http://www.rddl.org/modules/rddl-1.rxm
Namespace support is turned on by default.
xmlns and xmlns:prefix attributes are not reported as attributes
These methods report the namespace declarations for a START_ELEMENT event (bindings going into scope) or an END_ELEMENT event (bindings going out of scope):
public int getNamespaceCount()
public String getNamespacePrefix(int index)
public String getNamespaceURI(int index)
These methods return namespace information for the current element in a START_ELEMENT event or an END_ELEMENT event:
public String getNamespaceURI()
public String getPrefix()
This method returns a namespace context that can be used to query all the namespaces in scope inside a particular element (START_ELEMENT event or an END_ELEMENT event):
public NamespaceContext getNamespaceContext()
The returned context is transient; it is only valid until the next call to next() (or nextTag()).
package javax.xml.namespace;
public interface NamespaceContext {
public String getNamespaceURI(String prefix);
public String getPrefix(String namespaceURI);
public Iterator getPrefixes(String namespaceURI);
}
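The difference between declarations made on a tag and bindings merely in scope there is easiest to see side by side. In this sketch the class name NamespaceDemo, its helper methods, and the sample document are all invented for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.NamespaceContext;
import javax.xml.stream.*;

public class NamespaceDemo {

  // Advances a new parser to the first start-tag with the given local name
  private static XMLStreamReader seek(String xml, String localName)
   throws XMLStreamException {
    XMLStreamReader parser
     = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    while (parser.hasNext()) {
      if (parser.next() == XMLStreamConstants.START_ELEMENT
       && localName.equals(parser.getLocalName())) {
        return parser;
      }
    }
    parser.close();
    throw new XMLStreamException("No element named " + localName);
  }

  // Number of namespace declarations written on the element itself
  static int declaredOn(String xml, String localName) throws XMLStreamException {
    XMLStreamReader parser = seek(xml, localName);
    try {
      return parser.getNamespaceCount();
    }
    finally {
      parser.close();
    }
  }

  // URI bound to the prefix anywhere in scope at the element
  static String inScope(String xml, String localName, String prefix)
   throws XMLStreamException {
    XMLStreamReader parser = seek(xml, localName);
    try {
      // query the context right away; it is not meant to be kept
      NamespaceContext context = parser.getNamespaceContext();
      return context.getNamespaceURI(prefix);
    }
    finally {
      parser.close();
    }
  }

  public static void main(String[] args) throws XMLStreamException {
    String xml = "<doc xmlns:xlink='http://www.w3.org/1999/xlink'>"
     + "<link xlink:href='a.xml'/></doc>";
    System.out.println(declaredOn(xml, "doc"));
    System.out.println(declaredOn(xml, "link"));
    System.out.println(inScope(xml, "link", "xlink"));
  }
}
```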
An event based API for creating XML documents
Minimal well-formedness checking
Instances are created by the XMLOutputFactory.createXMLStreamWriter() factory method:
XMLOutputFactory factory = XMLOutputFactory.newInstance();
XMLStreamWriter serializer = factory.createXMLStreamWriter(System.out);
package javax.xml.stream;
public interface XMLStreamWriter {
public void writeStartElement(String localName)
throws XMLStreamException;
public void writeStartElement(String namespaceURI, String localName)
throws XMLStreamException;
public void writeStartElement(String prefix,
String localName,
String namespaceURI)
throws XMLStreamException;
public void writeEmptyElement(String namespaceURI, String localName)
throws XMLStreamException;
public void writeEmptyElement(String prefix, String localName, String namespaceURI)
throws XMLStreamException;
public void writeEmptyElement(String localName)
throws XMLStreamException;
public void writeEndElement()
throws XMLStreamException;
public void writeEndDocument()
throws XMLStreamException;
public void writeAttribute(String localName, String value)
throws XMLStreamException;
public void writeAttribute(String prefix,
String namespaceURI,
String localName,
String value)
throws XMLStreamException;
public void writeAttribute(String namespaceURI,
String localName,
String value)
throws XMLStreamException;
public void writeNamespace(String prefix, String namespaceURI)
throws XMLStreamException;
public void writeDefaultNamespace(String namespaceURI)
throws XMLStreamException;
public void writeComment(String data)
throws XMLStreamException;
public void writeProcessingInstruction(String target)
throws XMLStreamException;
public void writeProcessingInstruction(String target,
String data)
throws XMLStreamException;
public void writeCData(String data)
throws XMLStreamException;
public void writeDTD(String dtd)
throws XMLStreamException;
public void writeEntityRef(String name)
throws XMLStreamException;
public void writeStartDocument()
throws XMLStreamException;
public void writeStartDocument(String version)
throws XMLStreamException;
public void writeStartDocument(String encoding,
String version)
throws XMLStreamException;
public void writeCharacters(String text)
throws XMLStreamException;
public void writeCharacters(char[] text, int start, int len)
throws XMLStreamException;
public String getPrefix(String uri)
throws XMLStreamException;
public void setPrefix(String prefix, String uri)
throws XMLStreamException;
public void setDefaultNamespace(String uri)
throws XMLStreamException;
public void setNamespaceContext(NamespaceContext context)
throws XMLStreamException;
public NamespaceContext getNamespaceContext();
public void close() throws XMLStreamException;
public void flush() throws XMLStreamException;
public Object getProperty(java.lang.String name) throws IllegalArgumentException;
}
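A minimal sketch of the writer in use, building a small document in memory. The class name SongWriter and the element names are borrowed from the hotcop.xml example purely for illustration. The exact form of the XML declaration and of any quoting is left to the implementation.

```java
import java.io.StringWriter;
import javax.xml.stream.*;

public class SongWriter {

  // Writes <SONG><TITLE>..</TITLE><ARTIST>..</ARTIST></SONG> to a string
  static String write(String title, String artist) throws XMLStreamException {
    StringWriter out = new StringWriter();
    XMLStreamWriter serializer
     = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
    serializer.writeStartDocument();
    serializer.writeStartElement("SONG");
    serializer.writeStartElement("TITLE");
    serializer.writeCharacters(title);   // reserved characters are escaped
    serializer.writeEndElement();        // closes TITLE
    serializer.writeStartElement("ARTIST");
    serializer.writeCharacters(artist);
    serializer.writeEndElement();        // closes ARTIST
    serializer.writeEndElement();        // closes SONG
    serializer.writeEndDocument();
    serializer.flush();
    serializer.close();
    return out.toString();
  }

  public static void main(String[] args) throws XMLStreamException {
    System.out.println(write("Hot Cop", "Village People"));
  }
}
```

writeEndElement() takes no arguments because the writer keeps track of which element is currently open.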
Goal: Convert a RDDL document to pure XHTML.
RDDL is just an XHTML Basic document in which there's one extra element, rddl:resource, which can appear anywhere a p element can appear, and can contain anything a div element can contain.
The customary rddl prefix is mapped to the http://www.rddl.org/ namespace URI:
<rddl:resource id="rec-xhtml"
xlink:title="W3C REC XHTML"
xlink:role="http://www.w3.org/1999/xhtml"
xlink:arcrole="http://www.rddl.org/purposes#reference"
xlink:href="http://www.w3.org/tr/xhtml1"
>
<li><a href="http://www.w3.org/tr/xhtml1">W3C XHTML 1.0</a></li>
</rddl:resource>
The program needs to throw away the <rddl:resource> start-tag and </rddl:resource> end-tag while leaving everything else intact.
import javax.xml.stream.*;
import java.net.*;
import java.io.*;
public class RDDLStripper {
public final static String RDDL_NS = "http://www.rddl.org/";
public static void main(String[] args) {
if (args.length == 0) {
System.err.println("Usage: java RDDLStripper url" );
return;
}
try {
InputStream in;
try {
URL u = new URL(args[0]);
in = u.openStream();
}
catch (MalformedURLException ex) {
// Maybe it's a file name
in = new FileInputStream(args[0]);
}
XMLStreamReader parser
= XMLInputFactory.newInstance().createXMLStreamReader(in);
XMLStreamWriter serializer
= XMLOutputFactory.newInstance().createXMLStreamWriter(System.out);
while (true) {
int event = parser.next();
if (parser.isStartElement()) {
String namespaceURI = parser.getNamespaceURI();
// the URI may be null for an element in no namespace
if (!RDDL_NS.equals(namespaceURI)) {
serializer.writeStartElement(namespaceURI, parser.getLocalName());
// add attributes
for (int i = 0; i < parser.getAttributeCount(); i++) {
serializer.writeAttribute(
parser.getAttributeNamespace(i),
parser.getAttributeLocalName(i),
parser.getAttributeValue(i)
);
}
// add namespace declarations
for (int i = 0; i < parser.getNamespaceCount(); i++) {
String uri = parser.getNamespaceURI(i);
if (!RDDL_NS.equals(uri)) {
serializer.writeNamespace(parser.getNamespacePrefix(i), uri);
}
}
}
}
else if (parser.isEndElement()) {
String namespaceURI = parser.getNamespaceURI();
// the URI may be null for an element in no namespace
if (!RDDL_NS.equals(namespaceURI)) {
serializer.writeEndElement();
}
}
else if (event == XMLStreamConstants.CHARACTERS
|| event == XMLStreamConstants.SPACE) {
serializer.writeCharacters(parser.getText());
}
else if (event == XMLStreamConstants.CDATA) {
serializer.writeCData(parser.getText());
}
else if (event == XMLStreamConstants.COMMENT) {
serializer.writeComment(parser.getText());
}
else if (event == XMLStreamConstants.DTD) {
serializer.writeDTD(parser.getText());
}
else if (event == XMLStreamConstants.ENTITY_REFERENCE) {
serializer.writeEntityRef(parser.getLocalName());
}
else if (event == XMLStreamConstants.PROCESSING_INSTRUCTION) {
serializer.writeProcessingInstruction(parser.getPITarget(), parser.getPIData());
}
else if (event == XMLStreamConstants.END_DOCUMENT) {
serializer.flush();
break;
}
}
serializer.close();
parser.close();
}
catch (XMLStreamException ex) {
System.out.println(ex);
}
catch (IOException ex) {
System.out.println("IOException while parsing " + args[0]);
}
}
}
Makes certain kinds of programs really easy:
Filter out certain kinds of nodes
Filter out certain tags
Convert processing instructions to elements
Comment reader
Change names of elements
Add attributes to elements
Changes have to be local to be easy:
Start-tag changes based on name, namespace, and attributes
End-tag changes based on name and namespace
Event changes based on that event only
I don't know whether these programs are realistic patterns or just common tutorial examples
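As one illustration of a local change, here is a sketch of the "change names of elements" case. It assumes documents without namespaces, and it copies only elements, attributes, and text. The class name ElementRenamer is made up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.*;

public class ElementRenamer {

  // Copies the document, renaming every element called from to to
  static String rename(String xml, String from, String to)
   throws XMLStreamException {
    XMLStreamReader parser
     = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    StringWriter out = new StringWriter();
    XMLStreamWriter serializer
     = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
    while (parser.hasNext()) {
      int event = parser.next();
      if (event == XMLStreamConstants.START_ELEMENT) {
        String name = parser.getLocalName();
        serializer.writeStartElement(name.equals(from) ? to : name);
        for (int i = 0; i < parser.getAttributeCount(); i++) {
          serializer.writeAttribute(
            parser.getAttributeLocalName(i), parser.getAttributeValue(i));
        }
      }
      else if (event == XMLStreamConstants.END_ELEMENT) {
        // the writer remembers the (renamed) element it has open
        serializer.writeEndElement();
      }
      else if (event == XMLStreamConstants.CHARACTERS
       || event == XMLStreamConstants.SPACE
       || event == XMLStreamConstants.CDATA) {
        serializer.writeCharacters(parser.getText());
      }
    }
    serializer.flush();
    serializer.close();
    parser.close();
    return out.toString();
  }

  public static void main(String[] args) throws XMLStreamException {
    System.out.println(
      rename("<SONG><TITLE>Hot Cop</TITLE></SONG>", "TITLE", "NAME"));
  }
}
```

Only the start-tag branch needs to know about the renaming, which is what keeps the change local.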
Java XQuery API, JSR 225, currently in early draft review
Java XML Encryption API, JSR 106, currently in public review; no progress in the last year
This presentation: http://www.cafeconleche.org/slides/sd2007west/java5xml/
Document Object Model (DOM) Level 3 Core Specification Version 1.0: http://www.w3.org/TR/DOM-Level-3-Core
Document Object Model (DOM) Level 3 Load and Save Specification: http://www.w3.org/TR/DOM-Level-3-LS/
Document Object Model (DOM) Requirements: http://www.w3.org/TR/DOM-Requirements/
Java XPath API: http://www-128.ibm.com/developerworks/xml/library/x-javaxpathapi.html
Java Validation API: http://www-128.ibm.com/developerworks/xml/library/x-javaxmlvalidapi.html
XML Digital Signatures API: http://java.sun.com/developer/technicalArticles/xml/dig_signature_api/
An introduction to StAX: http://www.xml.com/pub/a/2003/09/17/stax.html