The XML specification defines three classes of problems that can occur in an XML document. In order of decreasing severity, these are:
Fatal error: A well-formedness error. As soon as the parser detects one, it must throw in the towel and stop parsing. The parse() method throws a SAXParseException when it detects a fatal error. Parsers have a little leeway in whether they detect fatal errors. In particular, non-validating parsers may not catch certain fatal errors that occur in the external DTD subset, and many parsers don’t actually check everything they’re supposed to check. However, if a parser does detect a fatal error, it must give up and stop parsing.
Error: An error that is not a well-formedness error. The most common such error is a validity error, though there are a few other kinds as well. Some parsers classify violations of namespace well-formedness as errors. Parsers may or may not detect these errors. If a parser does detect one, it may or may not throw a SAXParseException, and it may or may not continue parsing. (Validity errors generally do not cause SAXParseExceptions. Other kinds of errors may, depending on the parser.) These sorts of errors are a source of some interoperability problems in XML, because two parsers may behave differently given the same document.
Warning: Not itself an error, though it may nonetheless indicate a mistake of some kind in the document. For example, a parser might issue a warning if it encountered an element named XMLDocument, because all names beginning with “XML” (in any combination of case) are reserved by the W3C for future standards. Parsers may or may not detect problems like this. If a parser does detect one, it will not throw an exception, and it will continue parsing.
In addition, a parser may encounter an I/O problem that has nothing to do with XML. For example, your cat might knock the Ethernet cable out of the back of your PC while you’re downloading a large XML document from a remote web server.
If the parser detects a well-formedness error in the document it’s parsing, parse() throws a SAXException (specifically, a SAXParseException). In the event of an I/O error, it throws an IOException. The parser may or may not throw a SAXException in the event of a non-fatal error, and it will not throw an exception for a warning.
As you can see, the only kind of problem the parser is guaranteed to tell you about through an exception is the well-formedness error. If you want to be informed of the other kinds of errors and possible problems, you have to implement the ErrorHandler interface, and register your ErrorHandler implementation with the XMLReader.
The SAXException class, Example 7.4, is the generic exception class for almost anything other than an I/O problem that can go wrong while processing an XML document with SAX. Not only the parse() method but most of the callback methods in the various SAX interfaces are declared to throw this exception. If you detect a problem while processing an XML document, your code can throw its own SAXException.
Example 7.4. The SAXException class
package org.xml.sax;

public class SAXException extends Exception {

  public SAXException()
  public SAXException(String message)
  public SAXException(Exception rootCause)
  public SAXException(String message, Exception e)

  public String    getMessage()
  public Exception getException()
  public String    toString()

}
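Throwing your own SAXException from a callback can be sketched like this. The "no script elements" rule and the NoScriptHandler class are hypothetical, invented only for illustration. The sketch also assumes JAXP's SAXParserFactory (bundled with the JDK) as the source of the XMLReader instead of XMLReaderFactory, so it runs without setting org.xml.sax.driver.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical rule for illustration: documents must not contain script elements.
class NoScriptHandler extends DefaultHandler {

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        if ("script".equals(qName)) {
            // An application-level problem, reported the same way the parser reports its own
            throw new SAXException("script elements are not allowed");
        }
    }

    // Returns null if the document passes, otherwise the message of
    // whatever SAXException stopped the parse.
    static String check(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();
            parser.setContentHandler(new NoScriptHandler());
            parser.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not create or run the parser", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(check("<html><body/></html>"));   // prints null
        System.out.println(check("<html><script/></html>")); // prints the rule's message
    }
}
```

The parser passes the handler's SAXException through to the caller of parse(), so client code cannot tell it apart from one the parser raised itself unless it inspects the message or type.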
SAXException may not always be the exception you want to throw, however. For example, suppose you’re parsing a document containing an XML digital signature, and the endElement() method notices that the Base-64 encoded text in the P element, which represents the prime modulus of a DSA key, does not decode to a prime number as it’s supposed to. You naturally want to throw a java.security.InvalidKeyException to warn the client application of this. However, endElement() cannot throw a java.security.InvalidKeyException, only a SAXException. In this case, you wrap the exception you really want to throw inside a SAXException and throw the SAXException instead. For example:
Exception nestedException = new InvalidKeyException("Modulus is not prime!");
SAXException e = new SAXException(nestedException);
throw e;
The code that catches the SAXException can retrieve the original exception using the getException() method. For example, the client application method might be declared to throw an InvalidKeyException, in which case it can cast the nested exception to its real type and rethrow it to the appropriate catch block further up the call chain.
catch (SAXException e) {
  Exception rootCause = e.getException();
  if (rootCause == null) {
    // handle it as an XML problem...
  }
  else {
    if (rootCause instanceof InvalidKeyException) {
      InvalidKeyException ike = (InvalidKeyException) rootCause;
      throw ike;
    }
    else if (rootCause instanceof SomeOtherException) {
      SomeOtherException soe = (SomeOtherException) rootCause;
      throw soe;
    }
    …
  }
}
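Here is a self-contained sketch of the whole round trip: wrapping in a method that may only throw SAXException, then unwrapping in client code. The class and method names are hypothetical. checkModulus() stands in for endElement() (no real signature decoding happens), and a boolean replaces the actual primality test.

```java
import java.security.InvalidKeyException;
import org.xml.sax.SAXException;

class NestedExceptionDemo {

    // Stands in for a callback such as endElement(), which can throw only SAXException.
    static void checkModulus(boolean isPrime) throws SAXException {
        if (!isPrime) {
            Exception nestedException = new InvalidKeyException("Modulus is not prime!");
            throw new SAXException(nestedException);
        }
    }

    // Stands in for client code that is declared to throw InvalidKeyException.
    static void verify(boolean isPrime) throws InvalidKeyException, SAXException {
        try {
            checkModulus(isPrime);
        } catch (SAXException e) {
            Exception rootCause = e.getException();
            if (rootCause instanceof InvalidKeyException) {
                throw (InvalidKeyException) rootCause; // rethrow the real problem
            }
            throw e; // an XML problem, or a root cause not handled here
        }
    }

    // Returns the message of the unwrapped InvalidKeyException, or null if nothing was thrown.
    static String describe(boolean isPrime) {
        try {
            verify(isPrime);
            return null;
        } catch (InvalidKeyException e) {
            return e.getMessage();
        } catch (SAXException e) {
            return "XML problem: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(false)); // prints "Modulus is not prime!"
        System.out.println(describe(true));  // prints "null"
    }
}
```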
SAX defines several more specific subclasses of SAXException for particular problems, though most methods are declared to throw only a generic SAXException. These subclasses include SAXParseException, SAXNotRecognizedException, and SAXNotSupportedException. In addition, parsers can extend SAXException with their own custom subclasses, though few do this.
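SAXNotRecognizedException and SAXNotSupportedException typically show up when configuring a parser, for instance when calling setFeature(). The sketch below asks a parser to turn on a feature and reports which exception, if any, comes back. It assumes the JDK's built-in parser, obtained through SAXParserFactory, and the feature URI under example.com is a made-up placeholder.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

class FeatureProbe {

    // Asks a parser to turn a feature on and reports which exception, if any, it throws.
    static String probe(String featureURI) {
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            parser.setFeature(featureURI, true);
            return "supported";
        } catch (SAXNotRecognizedException e) {
            return "not recognized";  // the parser has never heard of this feature
        } catch (SAXNotSupportedException e) {
            return "not supported";   // known feature, but this value can't be set now
        } catch (SAXException | ParserConfigurationException e) {
            throw new IllegalStateException("Could not create the parser", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(probe("http://xml.org/sax/features/namespaces"));
        System.out.println(probe("http://example.com/features/made-up")); // hypothetical URI
    }
}
```

Because both exceptions extend SAXException, the specific catch blocks must come before the generic one.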
A SAXParseException indicates a fatal error, error, or warning in an XML document. The parse() method of the XMLReader interface throws this when it encounters a well-formedness error. SAXParseException is also passed as an argument to the methods of the ErrorHandler interface to signal any of the three kinds of problems an XML document may contain.
Besides the usual exception methods such as getMessage() and printStackTrace() that SAXParseException inherits from its superclasses, it provides methods to get the public ID and system ID of the file where the well-formedness error occurs, as well as the line and column numbers where the error appears within that file. (Remember, XML documents that use external parsed entities can be broken up over multiple separate files.)
Example 7.5. The SAXParseException class
package org.xml.sax;

public class SAXParseException extends SAXException {

  public SAXParseException(String message, Locator locator)
  public SAXParseException(String message, Locator locator,
   Exception e)
  public SAXParseException(String message, String publicID,
   String systemID, int lineNumber, int columnNumber)
  public SAXParseException(String message, String publicID,
   String systemID, int lineNumber, int columnNumber, Exception e)

  public String getPublicId()
  public String getSystemId()
  public int    getLineNumber()
  public int    getColumnNumber()

}
The line and column numbers reported by the parser for the problem may not always be perfectly accurate. Nonetheless, they should be close to where the problem begins or ends. (Some parsers give the line and column numbers for the start-tag of a problem element. Others give the line and column numbers for the end-tag.) If the document is so malformed that the parser can’t even begin working with it, especially if it isn’t an XML document at all, then the parser will probably indicate that the error occurred at line -1, column -1.
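Since -1 means "not available" rather than a real position, a reporting routine may want to say so instead of printing -1. The following is a minimal sketch, with a hypothetical ErrorLocation class, that builds SAXParseExceptions by hand through the five-argument constructor listed above. The system ID file:///tmp/example.xml is a placeholder.

```java
import org.xml.sax.SAXParseException;

class ErrorLocation {

    // SAX uses -1 for "not available"; anything below 1 is treated as unknown here.
    static String position(int n) {
        return n >= 1 ? String.valueOf(n) : "unknown";
    }

    // Formats where a problem was reported without printing a misleading "-1".
    static String describe(SAXParseException e) {
        String entity = e.getSystemId() != null ? e.getSystemId() : "an unknown entity";
        return "line " + position(e.getLineNumber())
             + ", column " + position(e.getColumnNumber())
             + " in " + entity;
    }

    public static void main(String[] args) {
        // Both exceptions are built by hand; the system ID is hypothetical.
        SAXParseException located = new SAXParseException(
            "example problem", null, "file:///tmp/example.xml", 64, 64);
        SAXParseException lost = new SAXParseException(
            "example problem", null, null, -1, -1);
        System.out.println(describe(located)); // line 64, column 64 in file:///tmp/example.xml
        System.out.println(describe(lost));    // line unknown, column unknown in an unknown entity
    }
}
```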
Example 7.6 enhances last chapter’s SAXChecker program so that it reports the line numbers of any well-formedness errors. Since there are two catch blocks, one for SAXParseException and one for the more generic SAXException, it’s possible to distinguish between well-formedness errors and other problems such as not being able to find the right XMLReader implementation class.
Example 7.6. A SAX program that parses a document and identifies the line numbers of any well-formedness errors
import org.xml.sax.*;
import org.xml.sax.helpers.XMLReaderFactory;
import java.io.IOException;


public class BetterSAXChecker {

  public static void main(String[] args) {

    if (args.length <= 0) {
      System.out.println("Usage: java BetterSAXChecker URL");
      return;
    }
    String document = args[0];

    try {
      XMLReader parser = XMLReaderFactory.createXMLReader();
      parser.parse(document);
      System.out.println(document + " is well-formed.");
    }
    catch (SAXParseException e) {
      System.out.print(document + " is not well-formed at ");
      System.out.print("Line " + e.getLineNumber()
       + ", column " + e.getColumnNumber() );
      System.out.println(" in the entity " + e.getSystemId());
    }
    catch (SAXException e) {
      System.out.println("Could not check document because "
       + e.getMessage());
    }
    catch (IOException e) {
      System.out.println(
       "Due to an IOException, the parser could not check "
       + document
      );
    }

  }

}
Here’s the output I got when I first ran it across my Cafe con Leche home page. The first time, I did not specify a parser, which produced a generic SAXException. The second time, I corrected that mistake, and a SAXParseException signaled a well-formedness error.
%java BetterSAXChecker http://www.cafeconleche.org
Could not check document because System property org.xml.sax.driver not specified
%java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser BetterSAXChecker http://www.cafeconleche.org
http://www.cafeconleche.org is not well-formed at Line 64, column 64 in the entity http://www.cafeconleche.org/
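The same SAXParseException-versus-SAXException split also works for documents held in memory. In the sketch below, a hypothetical InMemoryChecker helper parses a string through an InputSource wrapping a StringReader and returns the SAXParseException so the caller can inspect its location. It again assumes the JDK's built-in parser via SAXParserFactory rather than a Xerces class named on the command line. With no system ID supplied, getSystemId() will be null.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class InMemoryChecker {

    // Returns null if xml is well-formed, otherwise the SAXParseException for the fatal error.
    static SAXParseException firstFatalError(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();
            parser.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e; // a well-formedness error, with its location
        } catch (SAXException | ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not check document", e);
        }
    }

    public static void main(String[] args) {
        // The end-tag on line 2 does not match the open b element.
        SAXParseException e = firstFatalError("<a>\n<b></a>");
        System.out.println("Line " + e.getLineNumber() + ", column "
            + e.getColumnNumber() + ": " + e.getMessage());
    }
}
```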
XML includes a few errors that fall into a gray area. These are errors, but they are neither fatal well-formedness errors nor non-fatal validity errors. The most common such error is an ambiguous content model in an element declaration. For example, consider this declaration, which states that an Actor can have between 0 and 2 Parts:
<!ELEMENT Actor (Part?, Part?)>
The problem occurs with an Actor element that has one Part like this:
<Actor>
  <Part>Cyrano</Part>
</Actor>
Does this one Part match the first Part in the content model or the second? There’s no way to tell. Some parsers have trouble with this construct while other parsers don’t notice any problem at all. The XML specification says this is an error, but does not classify it as a fatal error.
Different parsers treat these errors, which are not necessarily fatal, in different ways. Some parsers throw a SAXParseException when they encounter one. Other parsers let them pass without comment. Still others report them some other way but do not throw an exception. For maximum compatibility, design your DTDs and instance documents so that these errors cannot arise.
Throwing an exception aborts the parsing process. Not all problems encountered in an XML document necessarily require such a radical step. In particular, validity errors are not signaled by an exception because that would stop parsing. If you want your program to be informed of non-fatal errors, you must register an ErrorHandler object with the XMLReader. Then the parser will tell you about problems in the document by passing (not throwing!) a SAXParseException to one of the methods in this object.
Example 7.7 summarizes the ErrorHandler interface. As you can see, it has three callback methods, corresponding to the three kinds of problems a parser may detect. When the parser detects one of these problems, it passes a SAXParseException to the appropriate method. If you want to treat errors or warnings as fatal, you can throw the exception you were passed. (The parse() method will always throw an exception for a fatal error, even if you don’t.) If you don’t want to treat them as fatal (and mostly you don’t), you can do something else with the information wrapped in the exception.
Example 7.7. The ErrorHandler interface
package org.xml.sax;

public interface ErrorHandler {

  public void warning(SAXParseException exception)
   throws SAXException;
  public void error(SAXParseException exception)
   throws SAXException;
  public void fatalError(SAXParseException exception)
   throws SAXException;

}
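To illustrate the "throw the exception you were passed" option, here is a sketch of a hypothetical StrictErrorHandler that promotes errors (but not warnings) to fatal status by rethrowing them. To stay self-contained, main() calls the callbacks directly with a hand-built SAXParseException instead of running a parser.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// A hypothetical handler that promotes errors (but not warnings) to fatal status.
class StrictErrorHandler implements ErrorHandler {

    int warnings = 0;

    @Override
    public void warning(SAXParseException exception) {
        // Record the warning and let parsing continue.
        warnings++;
        System.err.println("Warning at line " + exception.getLineNumber()
            + ": " + exception.getMessage());
    }

    @Override
    public void error(SAXParseException exception) throws SAXException {
        throw exception; // throwing here aborts the parse
    }

    @Override
    public void fatalError(SAXParseException exception) throws SAXException {
        throw exception; // parsing stops after a fatal error in any case
    }

    public static void main(String[] args) {
        StrictErrorHandler handler = new StrictErrorHandler();
        SAXParseException problem =
            new SAXParseException("example problem", null, null, 3, 7);
        handler.warning(problem); // returns normally
        try {
            handler.error(problem);
        } catch (SAXException e) {
            System.out.println("error() rethrew: " + e.getMessage());
        }
    }
}
```

Installed with parser.setErrorHandler(new StrictErrorHandler()), a handler like this would make a validating parse stop at the first validity error rather than carrying on.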
XMLReader provides two methods for installing an ErrorHandler and retrieving the one currently installed:
public void setErrorHandler(ErrorHandler handler);
public ErrorHandler getErrorHandler();
You can uninstall an ErrorHandler by passing null to setErrorHandler().
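A short sketch of installing, retrieving, and uninstalling a handler follows. It uses org.xml.sax.helpers.DefaultHandler, which implements ErrorHandler (ignoring warnings and errors, and throwing on fatal errors), and assumes the JDK's built-in parser via SAXParserFactory. HandlerSwap is a hypothetical name.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class HandlerSwap {

    // Obtains an XMLReader from the JDK's built-in JAXP implementation.
    static XMLReader newReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        XMLReader parser = newReader();
        ErrorHandler handler = new DefaultHandler(); // DefaultHandler implements ErrorHandler
        parser.setErrorHandler(handler);               // install
        System.out.println(parser.getErrorHandler() == handler); // true
        parser.setErrorHandler(null);                  // uninstall
        System.out.println(parser.getErrorHandler()); // null
    }
}
```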
Example 7.8 is a program that checks documents for well-formedness errors and other problems. All errors detected are reported, no matter how small, through the ErrorHandler interface.
Example 7.8. A SAX program that reports all problems found in an XML document
import org.xml.sax.*;
import org.xml.sax.helpers.XMLReaderFactory;
import java.io.IOException;


public class BestSAXChecker implements ErrorHandler {

  public void warning(SAXParseException exception) {
    System.out.println("Warning: " + exception.getMessage());
    System.out.println(" at line " + exception.getLineNumber()
     + ", column " + exception.getColumnNumber());
    System.out.println(" in entity " + exception.getSystemId());
  }

  public void error(SAXParseException exception) {
    System.out.println("Error: " + exception.getMessage());
    System.out.println(" at line " + exception.getLineNumber()
     + ", column " + exception.getColumnNumber());
    System.out.println(" in entity " + exception.getSystemId());
  }

  public void fatalError(SAXParseException exception) {
    System.out.println("Fatal Error: " + exception.getMessage());
    System.out.println(" at line " + exception.getLineNumber()
     + ", column " + exception.getColumnNumber());
    System.out.println(" in entity " + exception.getSystemId());
  }

  public static void main(String[] args) {

    if (args.length <= 0) {
      System.out.println("Usage: java BestSAXChecker URL");
      return;
    }
    String document = args[0];

    try {
      XMLReader parser = XMLReaderFactory.createXMLReader();
      ErrorHandler handler = new BestSAXChecker();
      parser.setErrorHandler(handler);
      parser.parse(document);
      // If the document isn't well-formed, an exception has
      // already been thrown and this has been skipped.
      System.out.println(document + " is well-formed.");
    }
    catch (SAXParseException e) {
      System.out.print(document + " is not well-formed at ");
      System.out.println("Line " + e.getLineNumber()
       + ", column " + e.getColumnNumber() );
      System.out.println(" in entity " + e.getSystemId());
    }
    catch (SAXException e) {
      System.out.println("Could not check document because "
       + e.getMessage());
    }
    catch (IOException e) {
      System.out.println(
       "Due to an IOException, the parser could not check "
       + document
      );
    }

  }

}
Here’s the output from running BestSAXChecker across the DocBook XML source code for an early version of this chapter:
%java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser BestSAXChecker xmlreader.xml
Error: The namespace prefix "xinclude" was not declared.
 at line 349, column 92
 in entity file:///D:/books/XMLJAVA/xmlreader.xml
Error: The namespace prefix "xinclude" was not declared.
 at line 530, column 95
 in entity file:///D:/books/XMLJAVA/xmlreader.xml
Error: The namespace prefix "xinclude" was not declared.
 at line 545, column 84
 in entity file:///D:/books/XMLJAVA/xmlreader.xml
Error: The namespace prefix "xinclude" was not declared.
 at line 688, column 93
 in entity file:///D:/books/XMLJAVA/xmlreader.xml
Fatal Error: The element type "para" must be terminated by the matching end-tag "</para>".
 at line 706, column 42
 in entity file:///D:/books/XMLJAVA/xmlreader.xml
Could not check document because Stopping after fatal error: The element type "para" must be terminated by the matching end-tag "</para>".
BestSAXChecker complains several times about an undeclared namespace prefix for the XInclude elements I use to merge in source code examples like Example 7.8. Then about three quarters of the way through the document, it encounters a well-formedness error where I neglected to put an end-tag in the right place. At this point parsing stops. If there are any errors after that point, they aren’t reported. Once I fixed those problems, the file became well-formed and valid:
%java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser BestSAXChecker xmlreader.xml
xmlreader.xml is well-formed.
Beyond simple well-formedness, which errors this program catches depends on the underlying parser. All conformant parsers detect all well-formedness errors. Most modern parsers should also catch any violations of namespace well-formedness. Whether this program catches validity errors depends on the parser. Most parsers do not validate by default. Instead they require the client application to explicitly request validation by setting the http://xml.org/sax/features/validation feature to true. I take this subject up next.
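As a brief preview of that step, the sketch below turns on the http://xml.org/sax/features/validation feature and counts the validity errors delivered to error(). The ValidityCounter class and the tiny internal-subset documents are hypothetical test inputs. The parser is assumed to be the JDK's built-in one obtained through SAXParserFactory. Because ValidityCounter extends DefaultHandler, warnings are ignored and fatal errors are still thrown.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ValidityCounter extends DefaultHandler {

    int errors = 0;

    @Override
    public void error(SAXParseException e) {
        errors++; // validity errors arrive here; parsing continues
    }

    // Parses xml with validation turned on and returns the number of non-fatal errors reported.
    static int countErrors(String xml) {
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            parser.setFeature("http://xml.org/sax/features/validation", true);
            ValidityCounter counter = new ValidityCounter();
            parser.setErrorHandler(counter);
            parser.parse(new InputSource(new StringReader(xml)));
            return counter.errors;
        } catch (SAXException | ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not check document", e);
        }
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE a [<!ELEMENT a EMPTY>]>";
        System.out.println(countErrors(dtd + "<a/>"));        // valid: no errors
        System.out.println(countErrors(dtd + "<a><b/></a>")); // invalid: a must be empty
    }
}
```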
Copyright 2001, 2002 Elliotte Rusty Harold | elharo@metalab.unc.edu | Last Modified May 26, 2002 |