The Document Object Model defines a tree-based representation of XML documents. The org.w3c.dom package contains the basic node classes that represent the different components that make up the tree. The org.w3c.dom.traversal package includes some useful utility classes for navigating, searching, and querying the tree.
DOM Level 2, the version described here, is incomplete. It does not define how a DOMImplementation is loaded, how a document is parsed, or how a document is serialized. For the moment, JAXP provides a stopgap solution. Eventually, DOM Level 3 will fill in these holes. However, since DOM Level 3 was far from complete at the time of this writing, this appendix covers DOM Level 2 exclusively.
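As a rough illustration of the JAXP stopgap, the following sketch parses a string with a DocumentBuilder and serializes the resulting tree with an identity Transformer. The class name JaxpRoundTrip and its helper methods are hypothetical; only the JAXP and DOM calls are standard API, and other parsing or serialization strategies are equally possible.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; not part of DOM or JAXP.
final class JaxpRoundTrip {

  // Parsing: DOM2 is silent on this, so JAXP's DocumentBuilder does the work.
  static Document parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true); // JAXP factories are not namespace aware by default
      DocumentBuilder builder = factory.newDocumentBuilder();
      return builder.parse(new InputSource(new StringReader(xml)));
    } catch (ParserConfigurationException | SAXException | IOException ex) {
      throw new IllegalStateException("Could not parse the document", ex);
    }
  }

  // Serializing: an identity transform copies the DOM tree to a character stream.
  static String serialize(Document doc) {
    try {
      Transformer identity = TransformerFactory.newInstance().newTransformer();
      identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      identity.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
    } catch (TransformerException ex) {
      throw new IllegalStateException("Could not serialize the document", ex);
    }
  }

  public static void main(String[] args) {
    Document doc = parse("<greeting>Hello DOM</greeting>");
    System.out.println(serialize(doc));
  }
}
```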
Table A.1 summarizes the DOM data model with the name, value, parent, and possible children for each kind of node.
Table A.1. Node properties
Node type | name | value | parent | children |
---|---|---|---|---|
Document | #document | null | null | Comment, processing instruction, zero or one document type, one element |
DocumentType | Root element name specified by the DOCTYPE declaration | null | Document | none |
Element | prefixed name | null | Element, Document, or Document fragment | Comment, Processing Instruction, Text, Element, Entity reference, CDATA section |
Text | #text | text of the node | Element, Attr, Entity, or Entity reference | none |
Attr | prefixed name | normalized attribute value | Element | Text, Entity reference |
Comment | #comment | text of comment | Element, Document, or Document fragment | none |
Processing Instruction | target | data | Element, Document, or Document fragment | none |
Entity Reference | name | null | Element or Document Fragment | Comment, Processing Instruction, Text, Element, Entity reference, CDATA section |
Entity | entity name | null | null | Comment, Processing Instruction, Text, Element, Entity Reference, CDATA section |
CDATA section | #cdata-section | text of the section | Element, Entity, or Entity reference | none |
Notation | notation name | null | null | none |
Document fragment | #document-fragment | null | null | Comment, Processing Instruction, Text, Element, Entity reference, CDATA section |
Keep in mind the parts of the XML document that are not exposed in this data model:
The XML declaration, including the version, standalone, and encoding declarations. These will be added as properties of the document node in DOM3, but they are not provided by current parsers.
Most information from the DTD and/or schema, including element and attribute types and content models. DOM Level 3 will add some of this.
Any white space outside the root element.
Whether or not each character was provided by a character reference. Parsers may provide information about entity references, but are not required to do so.
A DOM program cannot manipulate any of these constructs. It cannot, for example, read in an XML document and then write it out again in the same encoding the original document used, because it doesn’t know what encoding the original document used. It cannot treat $var differently than &#x24;var because it doesn’t know which was originally written.
The org.w3c.dom package contains the core interfaces that are used to form DOM documents. Node is the common superinterface all these node types share. In addition, this package contains a few data structures used to hold collections of DOM nodes and one exception class.
The Attr interface represents an attribute node. Its node properties are defined as follows:
node name | The full name of the attribute, including a prefix and a colon if the attribute is in a namespace |
node value | The attribute’s normalized value |
local name | The local part of the attribute’s name |
namespace URI | The namespace URI of the attribute or null if the attribute does not have a prefix |
namespace prefix | The namespace prefix of the attribute or null if the attribute is not in a namespace |
Furthermore, Attr objects are not part of the tree. They have neither parents nor siblings. getParentNode(), getPreviousSibling(), and getNextSibling() all return null when invoked on an Attr object. Attr objects do have children (Text and EntityReference objects) but it's generally best to ignore this fact, and just use the getValue() method to read the value of an attribute.
package org.w3c.dom;
public interface Attr extends Node {
  public String getName();
  public boolean getSpecified();
  public String getValue();
  public void setValue(String value) throws DOMException;
  public Element getOwnerElement();
}
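The points about Attr objects can be checked with a small sketch like the one below. The class name AttrDemo and the element and attribute names are made up for illustration; the sketch assumes a JAXP implementation is available to create the Document.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; "item" and "id" are arbitrary example names.
final class AttrDemo {

  static String describe() {
    Document doc = newDocument();
    Element item = doc.createElement("item");
    item.setAttribute("id", "p17");

    Attr id = item.getAttributeNode("id");
    return id.getName() + "=" + id.getValue()
        + ", parent=" + id.getParentNode()               // null: not part of the tree
        + ", owner=" + id.getOwnerElement().getTagName(); // the element carrying it
  }

  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException ex) {
      throw new IllegalStateException("No JAXP DOM implementation available", ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(describe()); // id=p17, parent=null, owner=item
  }
}
```

Note that getOwnerElement(), not getParentNode(), is the way to get from an attribute back to its element.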
The CDATASection interface represents a CDATA section. DOM parsers are not required to use this interface to report CDATA sections. They may just use Text objects to report the content of CDATA sections. Do not write code that depends on recognizing CDATA sections in text. The node properties of CDATASection are defined as follows:
node name | #cdata-section |
node value | The text of the CDATA section |
local name | null |
namespace URI | null |
namespace prefix | null |
package org.w3c.dom;
public interface CDATASection extends Text {
}
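One way to follow the advice about not depending on CDATA sections is to treat Text and CDATASection children alike. The sketch below, with the hypothetical class name CdataAgnostic, concatenates both kinds of node, so it gives the same answer whether or not the parser reports CDATA sections separately.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; not part of DOM.
final class CdataAgnostic {

  // Concatenates the immediate character data of an element,
  // whether it is reported as Text nodes or as CDATASection nodes.
  static String immediateText(Element element) {
    StringBuilder result = new StringBuilder();
    for (Node child = element.getFirstChild(); child != null;
         child = child.getNextSibling()) {
      short type = child.getNodeType();
      if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
        result.append(child.getNodeValue());
      }
    }
    return result.toString();
  }

  static String text(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      return immediateText(doc.getDocumentElement());
    } catch (ParserConfigurationException | SAXException | IOException ex) {
      throw new IllegalStateException("Could not parse the document", ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(text("<p>a<![CDATA[<b>]]>c</p>")); // a<b>c
  }
}
```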
The CharacterData interface is the generic superinterface for those nodes composed of plain text: Comment, Text, and CDATASection. All actual instances of CharacterData should be instances of one of these subinterfaces. The node properties depend on the specific subinterface.
package org.w3c.dom;
public interface CharacterData extends Node {
  public String getData() throws DOMException;
  public void setData(String data) throws DOMException;
  public int getLength();
  public String substringData(int offset, int count) throws DOMException;
  public void appendData(String s) throws DOMException;
  public void insertData(int offset, String s) throws DOMException;
  public void deleteData(int offset, int count) throws DOMException;
  public void replaceData(int offset, int count, String s) throws DOMException;
}
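The editing methods are easiest to understand by watching them work on one string. The following sketch uses a hypothetical class named CharacterDataDemo and an arbitrary sample string; the comments trace the data after each call.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;

// Hypothetical demo class; the sample text is arbitrary.
final class CharacterDataDemo {

  static String edit() {
    Document doc = newDocument();
    CharacterData text = doc.createTextNode("Hello World");
    text.appendData("!");           // Hello World!
    text.insertData(6, "DOM ");     // Hello DOM World!
    text.deleteData(0, 6);          // DOM World!
    text.replaceData(0, 3, "XML");  // XML World!
    return text.getData() + "|" + text.substringData(4, 5) + "|" + text.getLength();
  }

  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException ex) {
      throw new IllegalStateException("No JAXP DOM implementation available", ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(edit()); // XML World!|World|10
  }
}
```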
The Comment interface represents a comment node. All its methods are inherited from the CharacterData and Node superinterfaces. Its node properties are defined as follows:
node name | #comment |
node value | The text of the comment, not including <!-- and --> |
local name | null |
namespace URI | null |
namespace prefix | null |
package org.w3c.dom;
public interface Comment extends CharacterData {
}
The Document interface represents the root node of the tree. It also serves as an abstract factory to create the other kinds of nodes (element, attribute, comment, etc.) which will be stored in the tree. Its node properties are defined as follows:
node name | #document |
node value | null |
local name | null |
namespace URI | null |
namespace prefix | null |
package org.w3c.dom;
public interface Document extends Node {
  public DocumentType getDoctype();
  public DOMImplementation getImplementation();
  public Element getDocumentElement();
  public Element createElement(String tagName) throws DOMException;
  public Element createElementNS(String namespaceURI,
   String qualifiedName) throws DOMException;
  public Attr createAttribute(String name) throws DOMException;
  public Attr createAttributeNS(String namespaceURI,
   String qualifiedName) throws DOMException;
  public DocumentFragment createDocumentFragment();
  public Text createTextNode(String data);
  public Comment createComment(String data);
  public CDATASection createCDATASection(String data) throws DOMException;
  public ProcessingInstruction createProcessingInstruction(
   String target, String data) throws DOMException;
  public EntityReference createEntityReference(String name)
   throws DOMException;
  public NodeList getElementsByTagName(String tagName);
  public Node importNode(Node importedNode, boolean deep)
   throws DOMException;
  public NodeList getElementsByTagNameNS(String namespaceURI,
   String localName);
  public Element getElementById(String id);
}
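The factory role of Document can be sketched as follows. Each create method only makes a detached node owned by the document; the node joins the tree when it is appended somewhere. The class name DocumentFactoryDemo and the library and book vocabulary are invented for this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical demo class; the element names are arbitrary.
final class DocumentFactoryDemo {

  static Document build() {
    Document doc = newDocument();
    Element library = doc.createElement("library"); // created, but not yet in the tree
    doc.appendChild(library);                       // now it is the root element
    library.appendChild(doc.createComment(" two books "));
    for (String title : new String[] {"Dune", "Emma"}) {
      Element book = doc.createElement("book");
      book.appendChild(doc.createTextNode(title));
      library.appendChild(book);
    }
    return doc;
  }

  static String summary() {
    Document doc = build();
    NodeList books = doc.getElementsByTagName("book");
    return doc.getDocumentElement().getTagName() + ":" + books.getLength()
        + ":" + books.item(1).getFirstChild().getNodeValue();
  }

  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException ex) {
      throw new IllegalStateException("No JAXP DOM implementation available", ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(summary()); // library:2:Emma
  }
}
```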
The DocumentFragment interface is used to hold lists of element, text, comment, CDATA section, and processing instruction nodes when those nodes do not have a parent. It’s convenient for cutting and pasting or inserting and moving fragments of an XML document that don’t necessarily contain a single element. Its node properties are defined as follows:
node name | #document-fragment |
node value | null |
local name | null |
namespace URI | null |
namespace prefix | null |
package org.w3c.dom;
public interface DocumentFragment extends Node {
}
This interface is for advanced use only. DOM trees created by a parser won’t contain any DocumentFragment objects and adding a DocumentFragment to a Document actually adds the contents of the fragment instead.
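The way a fragment dissolves on insertion can be seen in a short sketch. The class name FragmentDemo and the element names are hypothetical; the point is that after appendChild() the children belong to the target element and the fragment itself is empty.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;

// Hypothetical demo class; the element names are arbitrary.
final class FragmentDemo {

  static String moveFragment() {
    Document doc = newDocument();
    Element root = doc.createElement("root");
    doc.appendChild(root);

    DocumentFragment fragment = doc.createDocumentFragment();
    fragment.appendChild(doc.createElement("first"));
    fragment.appendChild(doc.createTextNode("between"));
    fragment.appendChild(doc.createElement("second"));

    // Appending the fragment moves its three children; the fragment node
    // itself never becomes part of the tree.
    root.appendChild(fragment);

    return "root=" + root.getChildNodes().getLength()
        + ", fragment=" + fragment.getChildNodes().getLength()
        + ", firstChild=" + root.getFirstChild().getNodeName();
  }

  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException ex) {
      throw new IllegalStateException("No JAXP DOM implementation available", ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(moveFragment()); // root=3, fragment=0, firstChild=first
  }
}
```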
The DocumentType interface represents a document type declaration. It contains the root element name it declares, the system ID and public ID for the external DTD subset, and the complete internal DTD subset as a String. It also contains lists of the notations and general entities declared in the DTD. Other than this it contains no information from the DTD. The node properties of a DocumentType object are defined as follows:
node name | declared root element name |
node value | null |
local name | null |
namespace URI | null |
namespace prefix | null |
package org.w3c.dom;
public interface DocumentType extends Node {
  public String getName();
  public String getPublicId();
  public String getSystemId();
  public String getInternalSubset();
  public NamedNodeMap getEntities();
  public NamedNodeMap getNotations();
}
In DOM Level 2 the entire DocumentType object is read-only. No part of it can be modified. Furthermore a Document object’s DocumentType cannot be changed after the Document object is created. This restriction is lifted in DOM Level 3.
DOM does not provide any representation of the document type definition as distinguished from the document type declaration.
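A minimal sketch of reading a DocumentType follows. It assumes a parser that accepts DOCTYPE declarations in its default configuration and records general entity declarations in the entities map; the class name DoctypeDemo and the sample document are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; the DTD and entity name are made up.
final class DoctypeDemo {

  static final String SAMPLE =
      "<!DOCTYPE note [\n"
      + "  <!ELEMENT note (#PCDATA)>\n"
      + "  <!ENTITY copyright \"Copyright 2002\">\n"
      + "]>\n"
      + "<note>&copyright;</note>";

  static String describe() {
    Document doc = parse(SAMPLE);
    DocumentType doctype = doc.getDoctype();  // null if there is no DOCTYPE
    NamedNodeMap entities = doctype.getEntities();
    return doctype.getName()
        + ", system ID: " + doctype.getSystemId()
        + ", declares copyright: " + (entities.getNamedItem("copyright") != null);
  }

  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (ParserConfigurationException | SAXException | IOException ex) {
      throw new IllegalStateException("Could not parse the document", ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(describe());
  }
}
```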
DOMImplementation is an abstract factory used to create new Document and DocumentType objects. The javax.xml.parsers.DocumentBuilder class can create new DOMImplementation objects.
package org.w3c.dom;
public interface DOMImplementation {
  public DocumentType createDocumentType(String qualifiedName,
   String publicID, String systemID) throws DOMException;
  public Document createDocument(String namespaceURI,
   String qualifiedName, DocumentType doctype) throws DOMException;
  public boolean hasFeature(String feature, String version);
}
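For example, a new SVG document with a document type declaration might be created along the following lines. The DOMImplementation is obtained through the getDOMImplementation() method of DocumentBuilder; the class name DomImplementationDemo is hypothetical, and SVG is used only as a familiar vocabulary.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

// Hypothetical demo class; SVG is just an example vocabulary.
final class DomImplementationDemo {

  static Document newSvgDocument() {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      DOMImplementation impl = factory.newDocumentBuilder().getDOMImplementation();

      // The DocumentType must be created first; in DOM Level 2 it cannot be
      // attached to the Document after the Document exists.
      DocumentType doctype = impl.createDocumentType("svg",
          "-//W3C//DTD SVG 1.1//EN",
          "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd");

      // createDocument() also creates the root element.
      return impl.createDocument("http://www.w3.org/2000/svg", "svg", doctype);
    } catch (ParserConfigurationException ex) {
      throw new IllegalStateException("No JAXP DOM implementation available", ex);
    }
  }

  static String describe() {
    Document doc = newSvgDocument();
    return doc.getDoctype().getName() + " "
        + doc.getDocumentElement().getNamespaceURI();
  }

  public static void main(String[] args) {
    System.out.println(describe()); // svg http://www.w3.org/2000/svg
  }
}
```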
The Element interface represents an element node. The most important methods for this interface are inherited from the Node superinterface. Its node properties are defined as follows:
node name | The qualified name of the element, possibly including a prefix and a colon |
node value | null |
local name | The local part of the element name |
namespace URI | The namespace URI of the element, or null if this element is not in a namespace |
namespace prefix | The namespace prefix of the element, or null if this element is in the default namespace or no namespace at all |
package org.w3c.dom;
public interface Element extends Node {
  public String getTagName();
  public NodeList getElementsByTagNameNS(String namespaceURI,
   String localName);
  public NodeList getElementsByTagName(String name);
  public String getAttribute(String name);
  public void setAttribute(String name, String value) throws DOMException;
  public void removeAttribute(String name) throws DOMException;
  public Attr getAttributeNode(String name);
  public Attr setAttributeNode(Attr newAttr) throws DOMException;
  public Attr removeAttributeNode(Attr oldAttr) throws DOMException;
  public String getAttributeNS(String namespaceURI, String localName);
  public void setAttributeNS(String namespaceURI, String qualifiedName,
   String value) throws DOMException;
  public void removeAttributeNS(String namespaceURI, String localName)
   throws DOMException;
  public Attr getAttributeNodeNS(String namespaceURI, String localName);
  public Attr setAttributeNodeNS(Attr newAttr) throws DOMException;
  public boolean hasAttribute(String name);
  public boolean hasAttributeNS(String namespaceURI, String localName);
}
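The attribute methods come in plain and namespace-aware pairs. The sketch below exercises both on one element; the class name ElementDemo and the example URL are hypothetical. One detail worth noticing is that getAttribute() returns the empty string, not null, for a missing attribute, which is why hasAttribute() exists.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; the link target is an arbitrary example URL.
final class ElementDemo {

  static final String XHTML = "http://www.w3.org/1999/xhtml";
  static final String XLINK = "http://www.w3.org/1999/xlink";

  static String demo() {
    Document doc = newDocument();
    Element link = doc.createElementNS(XHTML, "html:a");
    link.setAttribute("href", "http://www.example.com/");    // no namespace
    link.setAttributeNS(XLINK, "xlink:type", "simple");      // qualified name when setting

    String href = link.getAttribute("href");
    String type = link.getAttributeNS(XLINK, "type");        // local name when getting
    link.removeAttribute("href");

    return link.getTagName() + "|" + link.getLocalName() + "|" + href + "|" + type
        + "|" + link.hasAttribute("href") + "|[" + link.getAttribute("href") + "]";
  }

  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException ex) {
      throw new IllegalStateException("No JAXP DOM implementation available", ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo()); // html:a|a|http://www.example.com/|simple|false|[]
  }
}
```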
The Entity interface represents an entity node. It does not appear directly in the tree. Instead an EntityReference node appears in the tree. The name of the EntityReference identifies a member of the document’s entities map, which is accessible through the DocumentType interface. If the Entity object represents a parsed entity, and the parser resolved the entity, then this node will have children representing its replacement text. However all aspects of the Entity object including all its children are read-only. They may not be modified or changed in any way.
package org.w3c.dom;
public interface Entity extends Node {
  public String getPublicId();
  public String getSystemId();
  public String getNotationName();
}
The node properties of Entity are defined as follows:
node name | The name of the entity |
node value | null |
local name | null |
namespace URI | null |
namespace prefix | null |
Since Entity objects are not part of the tree, they have neither parents nor siblings. getParentNode(), getPreviousSibling(), and getNextSibling() all return null when invoked on an Entity object.
The EntityReference interface represents a parsed entity reference which appears in the document tree. Parsers are not required to use this interface. Some parsers silently resolve all entity references to their replacement text. If a parser does not resolve external entity references, then it must include EntityReference objects instead, though the only information available from these objects will be the name. A parser that does resolve external entity references and chooses to include EntityReference objects anyway will also set the children of this node so as to represent the entity’s replacement text. In this case, you can use the methods inherited from the Node superinterface to walk the entity’s tree. However, all these children and their descendants are completely read-only. You cannot change them in any way. If you need to modify them, you must first clone each of the EntityReference’s children, and replace the EntityReference with the cloned children.
package org.w3c.dom;
public interface EntityReference extends Node {
}
EntityReference objects are never used for the five predefined entity references (&lt;, &gt;, &amp;, &quot;, and &apos;) or for character references such as &#xA0; or &#160;.
The node properties of EntityReference are defined as follows:
node name | The name of the entity |
node value | null |
local name | null |
namespace URI | null |
namespace prefix | null |
DOM uses NamedNodeMap data structures to hold unordered sets of attributes, notations, and entities. You can iterate through a map using the item() and getLength() methods. The first item in the map is at index 0. However, the particular order the implementation chooses is not significant or even reproducible.
package org.w3c.dom;
public interface NamedNodeMap {
  public Node getNamedItem(String name);
  public Node setNamedItem(Node node) throws DOMException;
  public Node removeNamedItem(String name) throws DOMException;
  public Node item(int index);
  public int getLength();
  public Node getNamedItemNS(String namespaceURI, String localName);
  public Node setNamedItemNS(Node node) throws DOMException;
  public Node removeNamedItemNS(String namespaceURI, String localName)
   throws DOMException;
}
NamedNodeMaps are live. That is, adding an item to the map or removing an item from the map will add it to or remove it from whatever construct produced the map in the first place.
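Both points, the unspecified order and the liveness, show up in a sketch like the following. Because the order of item() is not significant, the hypothetical helper attributesOf() copies the attributes into a sorted map before printing. The class and element names are invented for the example.

```java
import java.util.SortedMap;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical demo class; "point", "x", and "y" are arbitrary names.
final class NamedNodeMapDemo {

  // Copies the attributes into a sorted map so the output order is predictable.
  static SortedMap<String, String> attributesOf(Element element) {
    NamedNodeMap map = element.getAttributes();
    SortedMap<String, String> result = new TreeMap<>();
    for (int i = 0; i < map.getLength(); i++) {
      Node attribute = map.item(i);
      result.put(attribute.getNodeName(), attribute.getNodeValue());
    }
    return result;
  }

  static String demo() {
    Document doc = newDocument();
    Element point = doc.createElement("point");
    point.setAttribute("x", "1");
    point.setAttribute("y", "2");
    String before = attributesOf(point).toString();

    // The map is live: removing from the map removes the attribute from the element.
    point.getAttributes().removeNamedItem("x");

    return before + " -> " + attributesOf(point) + ", hasX=" + point.hasAttribute("x");
  }

  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException ex) {
      throw new IllegalStateException("No JAXP DOM implementation available", ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo()); // {x=1, y=2} -> {y=2}, hasX=false
  }
}
```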
Node is the key superinterface for almost all the other interfaces in the org.w3c.dom package. It is the primary means by which you navigate, search, query, and occasionally even update an XML document with DOM.
package org.w3c.dom;
public interface Node {
  // Node type constants
  public static final short ELEMENT_NODE;
  public static final short ATTRIBUTE_NODE;
  public static final short TEXT_NODE;
  public static final short CDATA_SECTION_NODE;
  public static final short ENTITY_REFERENCE_NODE;
  public static final short ENTITY_NODE;
  public static final short PROCESSING_INSTRUCTION_NODE;
  public static final short COMMENT_NODE;
  public static final short DOCUMENT_NODE;
  public static final short DOCUMENT_TYPE_NODE;
  public static final short DOCUMENT_FRAGMENT_NODE;
  public static final short NOTATION_NODE;
  // Basic getter methods
  public String getNodeName();
  public String getNodeValue() throws DOMException;
  public void setNodeValue(String value) throws DOMException;
  public short getNodeType();
  public String getNamespaceURI();
  public String getPrefix();
  public void setPrefix(String prefix) throws DOMException;
  public String getLocalName();
  // Navigation methods
  public Node getParentNode();
  public boolean hasChildNodes();
  public NodeList getChildNodes();
  public Node getFirstChild();
  public Node getLastChild();
  public Node getPreviousSibling();
  public Node getNextSibling();
  public Document getOwnerDocument();
  // Attribute methods
  public boolean hasAttributes();
  public NamedNodeMap getAttributes();
  // Tree modification methods
  public Node insertBefore(Node newChild, Node refChild)
   throws DOMException;
  public Node replaceChild(Node newChild, Node oldChild)
   throws DOMException;
  public Node removeChild(Node oldChild) throws DOMException;
  public Node appendChild(Node newChild) throws DOMException;
  // Utility methods
  public Node cloneNode(boolean deep);
  public void normalize();
  public boolean isSupported(String feature, String version);
}
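Since every kind of node implements Node, a single recursive method can walk an entire document. The sketch below prints an indented outline using only the getter and navigation methods, and counts elements with getNodeType(). The class name NodeOutline and its output format are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; the outline format is arbitrary.
final class NodeOutline {

  // Appends one line per node: indentation, the node name, and the value if any.
  static void outline(Node node, int depth, StringBuilder out) {
    for (int i = 0; i < depth; i++) out.append("  ");
    out.append(node.getNodeName());
    if (node.getNodeValue() != null) out.append(": ").append(node.getNodeValue());
    out.append('\n');
    for (Node child = node.getFirstChild(); child != null;
         child = child.getNextSibling()) {
      outline(child, depth + 1, out);
    }
  }

  // Counts the element nodes in the subtree rooted at the given node.
  static int countElements(Node node) {
    int count = node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0;
    for (Node child = node.getFirstChild(); child != null;
         child = child.getNextSibling()) {
      count += countElements(child);
    }
    return count;
  }

  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (ParserConfigurationException | SAXException | IOException ex) {
      throw new IllegalStateException("Could not parse the document", ex);
    }
  }

  static String outline(String xml) {
    StringBuilder out = new StringBuilder();
    outline(parse(xml), 0, out);
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.print(outline("<a><b>hi</b><!--c--></a>"));
  }
}
```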
NodeList is the basic DOM list type. These are most commonly used for lists of children of an Element or Document. The index of the first item in the list is 0, like Java arrays.
The actual data structure used to implement the list can vary from implementation to implementation. However, one constant is that the lists are live. In other words, if a node is deleted or moved from its parent, then it is also deleted from all lists that were built from the children of that parent. Similarly, if a new node is added to some node, then it is also added to all lists that point to the children of that node.
package org.w3c.dom;
public interface NodeList {
  public Node item(int index);
  public int getLength();
}
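Liveness matters most when a loop modifies the tree it is iterating over. In the sketch below, with the hypothetical class name LiveNodeListDemo, the list shrinks as soon as a child is removed, so the remaining children are removed by counting down rather than up; a forward loop would skip every other node.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical demo class; "list" and "item" are arbitrary names.
final class LiveNodeListDemo {

  static String demo() {
    Document doc = newDocument();
    Element list = doc.createElement("list");
    doc.appendChild(list);
    for (int i = 0; i < 3; i++) {
      list.appendChild(doc.createElement("item"));
    }

    NodeList children = list.getChildNodes();
    int before = children.getLength();

    // The same NodeList object reflects the removal; no need to ask for it again.
    list.removeChild(list.getFirstChild());
    int afterOneRemoval = children.getLength();

    // Counting down keeps the indexes of the not-yet-visited nodes stable.
    for (int i = children.getLength() - 1; i >= 0; i--) {
      list.removeChild(children.item(i));
    }
    return before + " " + afterOneRemoval + " " + children.getLength();
  }

  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException ex) {
      throw new IllegalStateException("No JAXP DOM implementation available", ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo()); // 3 2 0
  }
}
```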
The Notation interface represents a notation declared in the document’s DTD. It does not have a position in the tree. However, the complete list of notations in the document is accessible through the getNotations() method of the DocumentType interface. Both this list and the individual Notation objects are read-only.
package org.w3c.dom;
public interface Notation extends Node {
  public String getPublicId();
  public String getSystemId();
}
The node properties of Notation are defined as follows:
node name | notation name |
node value | null |
local name | null |
namespace URI | null |
namespace prefix | null |
The ProcessingInstruction interface represents a processing instruction node. Its node properties are defined as follows:
node name | the target |
node value | the data |
local name | null |
namespace URI | null |
namespace prefix | null |
package org.w3c.dom;
public interface ProcessingInstruction extends Node {
  public String getTarget();
  public String getData();
  public void setData(String data) throws DOMException;
}
The Text interface represents a text node. It can contain any characters that are legal in XML text including characters like < and & that may need to be escaped when the document is serialized. When a parser reads an XML document and builds a DOM tree, each Text object will contain the longest possible contiguous run of text. However, DOM does not maintain this constraint as the document is manipulated in memory. Its node properties are defined as follows:
node name | #text |
node value | the text of the node |
local name | null |
namespace URI | null |
namespace prefix | null |
The Text interface only declares one method of its own, splitText(). Most of its functionality is inherited from the superinterfaces CharacterData and Node.
package org.w3c.dom;
public interface Text extends CharacterData {
  public Text splitText(int offset) throws DOMException;
}
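The following sketch shows how splitText() breaks the longest-run constraint and how the normalize() method inherited from Node restores it. The class name TextDemo and the sample text are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

// Hypothetical demo class; the sample text is arbitrary.
final class TextDemo {

  static String demo() {
    Document doc = newDocument();
    Element p = doc.createElement("p");
    Text head = doc.createTextNode("Hello World");
    p.appendChild(head);

    // head keeps the first five characters; the rest moves into a new
    // Text node inserted as head's next sibling.
    Text tail = head.splitText(5);
    String split = head.getData() + "|" + tail.getData()
        + "|" + p.getChildNodes().getLength();

    // normalize() merges adjacent Text nodes back into one.
    p.normalize();
    return split + "|" + p.getChildNodes().getLength()
        + "|" + p.getFirstChild().getNodeValue();
  }

  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException ex) {
      throw new IllegalStateException("No JAXP DOM implementation available", ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo()); // Hello| World|2|1|Hello World
  }
}
```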
DOM Level 2 defines only one exception class, DOMException. This is a runtime exception used for almost anything that can go wrong while constructing or manipulating a DOM Document. The details are provided by a short field, code, which is set to any of several named constants.
package org.w3c.dom;
public class DOMException extends RuntimeException {
  public short code;
  public static final short INDEX_SIZE_ERR;
  public static final short DOMSTRING_SIZE_ERR;
  public static final short HIERARCHY_REQUEST_ERR;
  public static final short WRONG_DOCUMENT_ERR;
  public static final short INVALID_CHARACTER_ERR;
  public static final short NO_DATA_ALLOWED_ERR;
  public static final short NO_MODIFICATION_ALLOWED_ERR;
  public static final short NOT_FOUND_ERR;
  public static final short NOT_SUPPORTED_ERR;
  public static final short INUSE_ATTRIBUTE_ERR;
  public static final short INVALID_STATE_ERR;
  public static final short SYNTAX_ERR;
  public static final short INVALID_MODIFICATION_ERR;
  public static final short NAMESPACE_ERR;
  public static final short INVALID_ACCESS_ERR;
  public DOMException(short code, String message);
}
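Because DOMException is a runtime exception, nothing forces you to catch it, and the only reliable way to tell the failures apart is the code field. The sketch below provokes three common mistakes and reports the code for each. The class name DomExceptionDemo and the helper codeOf() are hypothetical; the sketch assumes an implementation with error checking enabled, which is the usual default.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; not part of DOM.
final class DomExceptionDemo {

  // Runs the action and returns the DOMException code, or 0 if nothing was thrown.
  static short codeOf(Runnable action) {
    try {
      action.run();
      return 0;
    } catch (DOMException ex) {
      return ex.code;
    }
  }

  // A space is not legal in an XML name.
  static short invalidName() {
    Document doc = newDocument();
    return codeOf(() -> doc.createElement("not a name"));
  }

  // A document can have only one root element.
  static short secondRoot() {
    Document doc = newDocument();
    doc.appendChild(doc.createElement("first"));
    return codeOf(() -> doc.appendChild(doc.createElement("second")));
  }

  // A node created by one document cannot be inserted into another
  // without importNode().
  static short foreignNode() {
    Document one = newDocument();
    Document two = newDocument();
    Element root = one.createElement("root");
    one.appendChild(root);
    return codeOf(() -> root.appendChild(two.createElement("guest")));
  }

  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException ex) {
      throw new IllegalStateException("No JAXP DOM implementation available", ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(invalidName() == DOMException.INVALID_CHARACTER_ERR);
    System.out.println(secondRoot() == DOMException.HIERARCHY_REQUEST_ERR);
    System.out.println(foreignNode() == DOMException.WRONG_DOCUMENT_ERR);
  }
}
```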
The DOM traversal API in the org.w3c.dom.traversal package provides some convenience classes for navigating and searching an XML document. The most useful aspect of this package is the ability to get lists and trees that contain the kinds of nodes that you’re interested in while ignoring everything else.
DocumentTraversal is a factory interface for creating new NodeIterator and TreeWalker objects that present a filtered view of the content of an element or a document. (You can filter other kinds of nodes too, but there’s not a lot of point to this if they don’t have any children.)
In implementations that support the traversal API (which can be determined by invoking hasFeature("Traversal", "2.0") on the DOMImplementation or isSupported("Traversal", "2.0") on the Document) all objects that implement Document also implement DocumentTraversal. That is, to create a DocumentTraversal object, just cast a Document to DocumentTraversal.
package org.w3c.dom.traversal;
public interface DocumentTraversal {
  public NodeIterator createNodeIterator(Node root, int whatToShow,
   NodeFilter filter, boolean expandEntities) throws DOMException;
  public TreeWalker createTreeWalker(Node root, int whatToShow,
   NodeFilter filter, boolean expandEntities) throws DOMException;
}
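Putting the feature test and the cast together, a sketch that collects all the text nodes of a document in document order might look like this. The class name TextIteratorDemo is hypothetical, and the sketch assumes the parser's DOM implementation supports traversal, which it checks before casting.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of DOM.
final class TextIteratorDemo {

  static String textNodes(String xml) {
    Document doc = parse(xml);
    if (!doc.getImplementation().hasFeature("Traversal", "2.0")) {
      throw new UnsupportedOperationException("This DOM does not support traversal");
    }
    DocumentTraversal traversal = (DocumentTraversal) doc;

    // Show only text nodes; no custom filter; expand entity references.
    NodeIterator iterator = traversal.createNodeIterator(
        doc.getDocumentElement(), NodeFilter.SHOW_TEXT, null, true);
    List<String> values = new ArrayList<>();
    for (Node node = iterator.nextNode(); node != null; node = iterator.nextNode()) {
      values.add(node.getNodeValue());
    }
    iterator.detach(); // release the iterator when finished
    return String.join(",", values);
  }

  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (ParserConfigurationException | SAXException | IOException ex) {
      throw new IllegalStateException("Could not parse the document", ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(textNodes("<a><b>one</b><c>two<d>three</d></c></a>"));
  }
}
```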
The NodeFilter interface is used by NodeIterators and TreeWalkers to determine which nodes are included in the view of the document they present to the client. Each node in the subtree will be passed to the filter’s acceptNode() method. This returns one of the three named constants NodeFilter.FILTER_ACCEPT (include the node), NodeFilter.FILTER_REJECT (do not include the node or any of its descendants when tree walking, do not include the node but do include its descendants when iterating), or NodeFilter.FILTER_SKIP (do not include the node but do include its children if they pass the filter individually).
In addition this interface has thirteen named constants that can be combined with the bitwise operators and passed to createNodeIterator() and createTreeWalker() to specify which kinds of nodes should be included in their views.
package org.w3c.dom.traversal;
public interface NodeFilter {
  public static final short FILTER_ACCEPT;
  public static final short FILTER_REJECT;
  public static final short FILTER_SKIP;
  public static final int SHOW_ALL;
  public static final int SHOW_ELEMENT;
  public static final int SHOW_ATTRIBUTE;
  public static final int SHOW_TEXT;
  public static final int SHOW_CDATA_SECTION;
  public static final int SHOW_ENTITY_REFERENCE;
  public static final int SHOW_ENTITY;
  public static final int SHOW_PROCESSING_INSTRUCTION;
  public static final int SHOW_COMMENT;
  public static final int SHOW_DOCUMENT;
  public static final int SHOW_DOCUMENT_TYPE;
  public static final int SHOW_DOCUMENT_FRAGMENT;
  public static final int SHOW_NOTATION;
  public short acceptNode(Node node);
}
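The difference between FILTER_REJECT and FILTER_SKIP is easiest to see by running the same TreeWalker twice with different verdicts for one element. In this sketch the class name FilterVerdictDemo, the draft, note, and chapter element names, and the sample document are all invented. Since NodeFilter declares a single method, the filter is written as a lambda.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; the element names are arbitrary.
final class FilterVerdictDemo {

  static final String SAMPLE = "<book><draft><note/></draft><chapter/></book>";

  // Lists the elements below the root that a TreeWalker can see when the
  // filter returns the given verdict for every draft element.
  static String visibleElements(String xml, short verdictForDraft) {
    Document doc = parse(xml);
    NodeFilter filter = node -> "draft".equals(node.getNodeName())
        ? verdictForDraft : NodeFilter.FILTER_ACCEPT;
    TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
        doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, filter, true);
    List<String> names = new ArrayList<>();
    for (Node node = walker.nextNode(); node != null; node = walker.nextNode()) {
      names.add(node.getNodeName());
    }
    return String.join(",", names);
  }

  static String rejectDraft() {
    return visibleElements(SAMPLE, NodeFilter.FILTER_REJECT);
  }

  static String skipDraft() {
    return visibleElements(SAMPLE, NodeFilter.FILTER_SKIP);
  }

  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (ParserConfigurationException | SAXException | IOException ex) {
      throw new IllegalStateException("Could not parse the document", ex);
    }
  }

  public static void main(String[] args) {
    System.out.println("REJECT: " + rejectDraft()); // the note inside draft is hidden
    System.out.println("SKIP:   " + skipDraft());   // the note inside draft is visible
  }
}
```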
The NodeIterator interface presents a subset of nodes from the document as a list in document order. The list is live; that is, changes to the document are reflected in the list.
package org.w3c.dom.traversal;
public interface NodeIterator {
  public Node nextNode() throws DOMException;
  public Node previousNode() throws DOMException;
  public Node getRoot();
  public int getWhatToShow();
  public NodeFilter getFilter();
  public boolean getExpandEntityReferences();
  public void detach();
}
The TreeWalker interface presents a subset of nodes from the document as a tree. Walking the TreeWalker is much like walking a full Document or Element, except that many of the node’s descendants which you aren’t interested in can be filtered out so they don’t get in your way. The tree is live; that is, changes to the document are reflected in the tree.
package org.w3c.dom.traversal;
public interface TreeWalker {
  public Node parentNode();
  public Node firstChild();
  public Node lastChild();
  public Node previousSibling();
  public Node nextSibling();
  public Node previousNode();
  public Node nextNode();
  public Node getRoot();
  public int getWhatToShow();
  public NodeFilter getFilter();
  public boolean getExpandEntityReferences();
  public Node getCurrentNode();
  public void setCurrentNode(Node node) throws DOMException;
}
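As an illustration of tree-style navigation, the sketch below walks the child elements of a list while the white space and comments between them stay out of the way. The class name TreeWalkerDemo and the sample markup are hypothetical. Note that the walker keeps a current position: nextSibling() returning null leaves it on the last item, so parentNode() then moves back up to the root.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; the sample markup is arbitrary.
final class TreeWalkerDemo {

  static final String SAMPLE =
      "<ul>\n  <li>one</li>\n  <!-- gap -->\n  <li>two</li>\n</ul>";

  static String listItems(String xml) {
    Document doc = parse(xml);
    TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
        doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);

    // In the filtered view the li elements are adjacent siblings.
    List<String> items = new ArrayList<>();
    for (Node li = walker.firstChild(); li != null; li = walker.nextSibling()) {
      items.add(li.getFirstChild().getNodeValue());
    }

    // The walker is still positioned on the last li; step back up to the root.
    Node parent = walker.parentNode();
    return parent.getNodeName() + ":" + String.join(",", items);
  }

  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (ParserConfigurationException | SAXException | IOException ex) {
      throw new IllegalStateException("Could not parse the document", ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(listItems(SAMPLE)); // ul:one,two
  }
}
```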
Copyright 2001, 2002 Elliotte Rusty Harold | elharo@metalab.unc.edu | Last Modified July 27, 2002 |