SAX is mostly about the instance document, not the DTD or schema. However, given a validating parser, or at least an internal DTD subset, the DTD can affect the contents of the instance document in six ways:
It can provide default values for attributes.
It can assign types to attributes, which affects their normalized values.
It can distinguish between ignorable and non-ignorable white space.
It can declare general entities.
It can declare unparsed entities.
It can declare notations.
The first four are resolved silently. For instance, when applying a default value for an attribute to an element, the parser simply adds that attribute to the Attributes object it passes to startElement(). It doesn’t tell you that it’s done it. It just does it.
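The silent effects can be observed with a small experiment. The following is a minimal sketch, not part of the original examples: the class name SilentDtdEffects and the test document are hypothetical, and it assumes the parser bundled with the JDK, obtained through JAXP. The internal DTD subset declares a default value for a version attribute and gives the code attribute the type NMTOKEN, so a conforming parser should report a version attribute that never appears in the instance document, and should strip the padding from code.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SilentDtdEffects {

  // Hypothetical test document: the internal DTD subset declares a
  // defaulted attribute and an NMTOKEN-typed attribute.
  static final String DOC =
      "<!DOCTYPE doc [\n"
    + "  <!ATTLIST doc version CDATA   \"1.0\"\n"
    + "                code    NMTOKEN #IMPLIED>\n"
    + "]>\n"
    + "<doc code='  abc  '/>";

  // Parses DOC and returns the attributes reported for the doc element.
  // Checked exceptions are wrapped to keep the sketch short.
  static Map<String, String> reportedAttributes() {
    final Map<String, String> seen = new LinkedHashMap<String, String>();
    try {
      XMLReader reader
       = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
      reader.setContentHandler(new DefaultHandler() {
        @Override
        public void startElement(String namespaceURI, String localName,
         String qualifiedName, Attributes attributes) {
          // The parser does not say which attributes came from the DTD;
          // defaulted and specified attributes arrive together.
          for (int i = 0; i < attributes.getLength(); i++) {
            seen.put(attributes.getQName(i), attributes.getValue(i));
          }
        }
      });
      reader.parse(new InputSource(new StringReader(DOC)));
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return seen;
  }

  public static void main(String[] args) {
    System.out.println(reportedAttributes());
  }

}
```

Nothing in the callback distinguishes the defaulted attribute from the one the document author typed. If that distinction matters, the SAX2 extension interface org.xml.sax.ext.Attributes2 offers an isSpecified() method, though not every parser supports it.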
The DTDHandler interface covers the last two. Because notations and unparsed entities are so rarely used, they're not part of the main ContentHandler interface. Instead they get their own callback interface, DTDHandler, which is summarized in Example 7.16. Those few programmers who need this functionality can use it. Everyone else can ignore it.
Example 7.16. The DTDHandler interface
package org.xml.sax;

public interface DTDHandler {

  public void notationDecl(String name, String publicID,
   String systemID) throws SAXException;

  public void unparsedEntityDecl(String name, String publicID,
   String systemID, String notationName) throws SAXException;

}
As with the other callback interfaces, you implement DTDHandler in a class of your own choosing. An instance of that class is registered with the XMLReader through its setDTDHandler() method. For parallelism, there's also a getDTDHandler() method, though it isn't much needed in practice:
public void setDTDHandler(DTDHandler handler);
public DTDHandler getDTDHandler();
As with the other callback interfaces, you can uninstall a DTDHandler by passing null to setDTDHandler().
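Registration can be sketched in a few lines. This is an illustrative fragment rather than one of the numbered examples: the class name DtdHandlerRegistration is hypothetical, and it assumes the parser bundled with the JDK, obtained through JAXP. It relies on the fact that DefaultHandler implements DTDHandler with do-nothing methods.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.DTDHandler;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class DtdHandlerRegistration {

  // Obtains a reader through JAXP. Checked exceptions are wrapped
  // to keep the sketch short.
  static XMLReader newReader() {
    try {
      return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    XMLReader parser = newReader();

    // DefaultHandler provides no-op implementations of both
    // DTDHandler methods, so it serves as a placeholder handler here.
    DTDHandler handler = new DefaultHandler();

    parser.setDTDHandler(handler);
    System.out.println("Installed: " + (parser.getDTDHandler() == handler));

    // Passing null uninstalls the handler again.
    parser.setDTDHandler(null);
    System.out.println("After uninstall: " + parser.getDTDHandler());
  }

}
```

Some older SAX implementations and adapter classes rejected a null handler with a NullPointerException, so it may be worth checking the behavior of the parser you actually deploy.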
The most common thing to do with a DTDHandler is simply store all the information provided about the notations and unparsed entities. Then the ContentHandler can refer back to this when it needs to resolve an unparsed entity. For instance, Example 7.17 is a simple DTDHandler implementation that stores the notations and unparsed entities declared in the DTD in two hash tables.
Example 7.17. A caching DTDHandler
import org.xml.sax.*;
import java.util.Hashtable;

public class UnparsedCache implements DTDHandler {

  // Both tables are keyed by the declared name
  private Hashtable notations = new Hashtable();
  private Hashtable entities = new Hashtable();

  public void notationDecl(String name, String publicID,
   String systemID) {
    notations.put(name, new Notation(name, publicID, systemID));
  }

  public void unparsedEntityDecl(String name, String publicID,
   String systemID, String notationName) {
    entities.put(name, new UnparsedEntity(name, publicID, systemID,
     notationName));
  }

  // Returns null if no unparsed entity with this name was declared
  public UnparsedEntity getUnparsedEntity(String name) {
    return (UnparsedEntity) entities.get(name);
  }

  // Returns null if no notation with this name was declared
  public Notation getNotation(String name) {
    return (Notation) notations.get(name);
  }

}
For the convenience of tracking the several strings associated with each notation and unparsed entity, I wrap each one in a very simple class that just has a constructor, some getter methods, the equals() and hashCode() methods needed to store these objects in hash tables, and a toString() method for convenient output. The Notation class is shown in Example 7.18. The UnparsedEntity class is shown in Example 7.19. Once you learn about DOM, an alternative would be to use that API’s Notation and Entity classes instead.
Example 7.18. A Notation utility class
public class Notation {

  private String name;
  private String publicID;
  private String systemID;

  public Notation(String name, String publicID, String systemID) {
    this.name = name;
    this.publicID = publicID;
    this.systemID = systemID;
  }

  public String getName() {
    return this.name;
  }

  public String getSystemID() {
    return this.systemID;
  }

  public String getPublicID() {
    return this.publicID;
  }

  public boolean equals(Object o) {
    if (o instanceof Notation) {
      Notation n = (Notation) o;
      // Well-formedness requires every notation to have
      // at least a SYSTEM or a PUBLIC ID so both should not be
      // simultaneously null as long as the UnparsedCache built
      // this object
      if (publicID == null) {
        return name.equals(n.name)
         && systemID.equals(n.systemID);
      }
      else if (systemID == null) {
        return name.equals(n.name)
         && publicID.equals(n.publicID);
      }
      else {
        return name.equals(n.name)
         && publicID.equals(n.publicID)
         && systemID.equals(n.systemID);
      }
    }
    return false;
  }

  public int hashCode() {
    if (publicID == null) {
      return name.hashCode() ^ systemID.hashCode();
    }
    else if (systemID == null) {
      return name.hashCode() ^ publicID.hashCode();
    }
    else {
      return name.hashCode() ^ publicID.hashCode()
       ^ systemID.hashCode();
    }
  }

  public String toString() {
    StringBuffer result = new StringBuffer(name);
    if (publicID != null) {
      result.append(" PUBLIC ");
      result.append(publicID);
      if (systemID != null) {
        result.append(" ");
        result.append(systemID);
      }
    }
    else {
      result.append(" SYSTEM ");
      result.append(systemID);
    }
    return result.toString();
  }

}
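Most of the branching in equals() and hashCode() exists only to cope with a null public or system ID. On Java 7 or later, which postdates the code above, the null-safe helpers in java.util.Objects are one possible way to shorten it. The following is a sketch under that assumption; the class name NotationKey is hypothetical and chosen only to avoid clashing with Example 7.18.

```java
import java.util.Objects;

// Hypothetical variant of the Notation class from Example 7.18 that
// delegates null handling to java.util.Objects (Java 7 or later).
final class NotationKey {

  private final String name;
  private final String publicID;  // may be null
  private final String systemID;  // may be null

  NotationKey(String name, String publicID, String systemID) {
    this.name = name;
    this.publicID = publicID;
    this.systemID = systemID;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof NotationKey)) return false;
    NotationKey n = (NotationKey) o;
    // Objects.equals() treats two nulls as equal and a null and a
    // non-null value as unequal, so no explicit branching is needed.
    return Objects.equals(name, n.name)
     && Objects.equals(publicID, n.publicID)
     && Objects.equals(systemID, n.systemID);
  }

  @Override
  public int hashCode() {
    // Objects.hash() accepts null arguments
    return Objects.hash(name, publicID, systemID);
  }

}
```

Because every field is compared on both sides, this version also stays symmetric when one object has a public ID and the other does not, a case the branching version does not check.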
Example 7.19. An UnparsedEntity utility class
public class UnparsedEntity {

  private String name;
  private String publicID;
  private String systemID;
  private String notationName;

  public UnparsedEntity(String name, String publicID,
   String systemID, String notationName) {
    this.name = name;
    this.publicID = publicID;
    this.systemID = systemID;
    this.notationName = notationName;
  }

  public String getName() {
    return this.name;
  }

  public String getSystemID() {
    return this.systemID;
  }

  public String getPublicID() {
    return this.publicID;
  }

  public String getNotationName() {
    return this.notationName;
  }

  public boolean equals(Object o) {
    if (o instanceof UnparsedEntity) {
      UnparsedEntity entity = (UnparsedEntity) o;
      // An unparsed entity always has a system ID and a notation
      // name; only the public ID is optional
      if (publicID == null) {
        return name.equals(entity.name)
         && systemID.equals(entity.systemID)
         && notationName.equals(entity.notationName);
      }
      else {
        return name.equals(entity.name)
         && systemID.equals(entity.systemID)
         && publicID.equals(entity.publicID)
         && notationName.equals(entity.notationName);
      }
    }
    return false;
  }

  public int hashCode() {
    if (publicID == null) {
      return name.hashCode() ^ systemID.hashCode()
       ^ notationName.hashCode();
    }
    else {
      return name.hashCode() ^ publicID.hashCode()
       ^ systemID.hashCode() ^ notationName.hashCode();
    }
  }

  public String toString() {
    StringBuffer result = new StringBuffer(name);
    // Print the public ID only when there is one
    if (publicID != null) {
      result.append(" PUBLIC ");
      result.append(publicID);
      result.append(" ");
    }
    else {
      result.append(" SYSTEM ");
    }
    result.append(systemID);
    return result.toString();
  }

}
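To see what the parser actually passes to the two callbacks, it can help to run a handler against a tiny in-memory document before wiring up the full cache. The following self-contained sketch does that. The class name DtdCallbackDemo, the test document, and the example.com URL are all hypothetical, and the sketch assumes the parser bundled with the JDK. The notation is declared with a public ID only, and the entity with an absolute URL, so that the reported values do not depend on how a particular parser resolves relative system IDs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class DtdCallbackDemo {

  // Hypothetical test document with one notation and one unparsed entity
  static final String DOC =
      "<!DOCTYPE image [\n"
    + "  <!NOTATION gif PUBLIC \"image/gif\">\n"
    + "  <!ENTITY logo SYSTEM \"http://www.example.com/logo.gif\" NDATA gif>\n"
    + "]>\n"
    + "<image/>";

  // Parses DOC and returns one line of text per DTDHandler callback.
  // Checked exceptions are wrapped to keep the sketch short.
  static List<String> declarations() {
    final List<String> events = new ArrayList<String>();
    try {
      XMLReader parser
       = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
      // DefaultHandler implements DTDHandler, so only the two
      // callbacks of interest need to be overridden.
      parser.setDTDHandler(new DefaultHandler() {
        @Override
        public void notationDecl(String name, String publicID,
         String systemID) {
          events.add("notation " + name + " PUBLIC " + publicID);
        }
        @Override
        public void unparsedEntityDecl(String name, String publicID,
         String systemID, String notationName) {
          events.add("entity " + name + " NDATA " + notationName);
        }
      });
      parser.parse(new InputSource(new StringReader(DOC)));
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return events;
  }

  public static void main(String[] args) {
    for (String event : declarations()) {
      System.out.println(event);
    }
  }

}
```

Declaring an unparsed entity does not cause the parser to fetch it. The URL is simply handed to the application, which decides what, if anything, to do with it.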
When you later encounter an attribute of type ENTITY, ENTITIES, or NOTATION in the ContentHandler, you can use the cache's getUnparsedEntity() and getNotation() methods to retrieve the relevant data for that item. For example, Example 7.20 is a simple program to list the unparsed entities and notations discovered in an XML document.
Example 7.20. A program that lists the unparsed entities and notations used in an XML document
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.util.StringTokenizer;

public class EntityLister extends DefaultHandler {

  private UnparsedCache cache;

  public EntityLister(UnparsedCache cache) {
    this.cache = cache;
  }

  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes attributes) {

    for (int i = 0; i < attributes.getLength(); i++) {
      if (attributes.getType(i).equals("NOTATION")) {
        Notation n = cache.getNotation(attributes.getValue(i));
        System.out.println("Element " + qualifiedName
         + " has notation " + n);
      }
      else if (attributes.getType(i).equals("ENTITY")) {
        UnparsedEntity e = cache.getUnparsedEntity(
         attributes.getValue(i));
        System.out.println("Entity: " + e);
      }
      else if (attributes.getType(i).equals("ENTITIES")) {
        // The value is a white space separated list of entity names
        String entityNames = attributes.getValue(i);
        StringTokenizer st = new StringTokenizer(entityNames);
        while (st.hasMoreTokens()) {
          String name = st.nextToken();
          UnparsedEntity e = cache.getUnparsedEntity(name);
          System.out.println("Entity: " + e);
        }
      }
    }

  }

  public static void main(String[] args) {

    if (args.length <= 0) {
      System.out.println("Usage: java EntityLister URL");
      return;
    }
    String document = args[0];

    try {
      XMLReader parser = XMLReaderFactory.createXMLReader();

      // I want to use qualified names
      parser.setFeature(
       "http://xml.org/sax/features/namespace-prefixes", true);

      UnparsedCache cache = new UnparsedCache();
      parser.setDTDHandler(cache);
      parser.setContentHandler(new EntityLister(cache));
      parser.parse(document);
    }
    catch (Exception e) {
      System.out.println("Could not read document because "
       + e.getMessage());
    }

  }

}
It took me a while to find an XML document in the wild that actually used notations and unparsed entities. However, David Carlisle pointed out to me that DocBook uses notations to identify preformatted elements in which white space should be preserved, and since this book is written in DocBook, I ran EntityLister across a rough draft of this chapter. Here's what came out:
% java EntityLister xmlreader.xml
Element screen has notation linespecific SYSTEM linespecific
Element programlisting has notation linespecific SYSTEM linespecific
Element programlisting has notation linespecific SYSTEM linespecific
Element programlisting has notation linespecific SYSTEM linespecific
Element programlisting has notation linespecific SYSTEM linespecific
Element programlisting has notation linespecific SYSTEM linespecific
Element programlisting has notation linespecific SYSTEM linespecific
Element programlisting has notation linespecific SYSTEM linespecific
Element programlisting has notation linespecific SYSTEM linespecific
Element programlisting has notation linespecific SYSTEM linespecific
Element programlisting has notation linespecific SYSTEM linespecific
Element programlisting has notation linespecific SYSTEM linespecific
Element programlisting has notation linespecific SYSTEM linespecific
Element programlisting has notation linespecific SYSTEM linespecific
Element programlisting has notation linespecific SYSTEM linespecific
Element screen has notation linespecific SYSTEM linespecific
Element programlisting has notation linespecific SYSTEM linespecific
…
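Since real documents that use unparsed entities are hard to come by, a small in-memory test document is a convenient way to exercise the ENTITY branch as well. The following is a compact, self-contained sketch of the same technique, not a replacement for Example 7.20: the class name EntityAttributeDemo, the test document, and the example.com URL are hypothetical, the cache is reduced to a single map from entity name to notation name, and the parser bundled with the JDK is assumed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class EntityAttributeDemo extends DefaultHandler {

  // Hypothetical test document: the source attribute has type ENTITY
  static final String DOC =
      "<!DOCTYPE image [\n"
    + "  <!NOTATION gif PUBLIC \"image/gif\">\n"
    + "  <!ENTITY logo SYSTEM \"http://www.example.com/logo.gif\" NDATA gif>\n"
    + "  <!ATTLIST image source ENTITY #REQUIRED>\n"
    + "]>\n"
    + "<image source='logo'/>";

  // Entity name -> notation name, filled in by unparsedEntityDecl()
  private final Map<String, String> entityNotations
   = new HashMap<String, String>();
  private final List<String> found = new ArrayList<String>();

  @Override
  public void unparsedEntityDecl(String name, String publicID,
   String systemID, String notationName) {
    entityNotations.put(name, notationName);
  }

  @Override
  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes attributes) {
    for (int i = 0; i < attributes.getLength(); i++) {
      if ("ENTITY".equals(attributes.getType(i))) {
        // The attribute value is the name of an unparsed entity
        String entity = attributes.getValue(i);
        found.add(qualifiedName + "/@" + attributes.getQName(i)
         + " -> " + entity + " (" + entityNotations.get(entity) + ")");
      }
    }
  }

  // Parses DOC with one object acting as both DTDHandler and
  // ContentHandler. Checked exceptions are wrapped for brevity.
  static List<String> resolvedEntityAttributes() {
    EntityAttributeDemo handler = new EntityAttributeDemo();
    try {
      XMLReader parser
       = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
      parser.setDTDHandler(handler);
      parser.setContentHandler(handler);
      parser.parse(new InputSource(new StringReader(DOC)));
    }
    catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return handler.found;
  }

  public static void main(String[] args) {
    for (String line : resolvedEntityAttributes()) {
      System.out.println(line);
    }
  }

}
```

The lookup works because SAX reports all DTD declarations before the startElement() call for the root element, so the map is already complete by the time the first attribute is examined.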
Copyright 2001, 2002 Elliotte Rusty Harold | elharo@metalab.unc.edu | Last Modified July 26, 2002