XML comments don’t have a lot of structure. They’re really just some undifferentiated text inside <!-- and -->. Therefore, the Comment interface, shown in Example 11.19, is a subinterface of CharacterData that declares no methods of its own; everything it can do is inherited from CharacterData. However, your code can use the type to determine that a node is a comment, and treat it appropriately. Serializers will be smart enough to output a Comment with the right markup around it.
Example 11.19. The Comment interface
package org.w3c.dom;

public interface Comment extends CharacterData {

}
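Because Comment adds nothing of its own, working with one means calling the methods it inherits from CharacterData and Node. The following is a minimal sketch rather than one of the book's numbered examples. The class name CommentDemo, its helper methods, and the sample comment text are made up for illustration, and I've assumed the JAXP identity Transformer as the serializer; other serializers should produce equivalent markup.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical demo class; not part of the DOM API
public class CommentDemo {

  // Builds a one-element document in memory and returns its comment node
  public static Comment makeComment() throws ParserConfigurationException {
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder().newDocument();
    Element root = doc.createElement("root");
    doc.appendChild(root);
    Comment comment = doc.createComment("first draft");
    root.appendChild(comment);
    return comment;
  }

  // Serializes a node with the JAXP identity transform (an assumed choice)
  public static String serialize(Node node) throws TransformerException {
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter out = new StringWriter();
    transformer.transform(new DOMSource(node), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    Comment comment = makeComment();

    // These methods are all inherited from CharacterData
    System.out.println(comment.getData());            // first draft
    System.out.println(comment.getLength());          // 11
    comment.appendData(", revised");
    System.out.println(comment.substringData(0, 5));  // first

    // The node type, not any extra method, marks this as a comment
    System.out.println(comment.getNodeType() == Node.COMMENT_NODE);  // true

    // The serializer supplies the <!-- and --> delimiters itself
    System.out.println(serialize(comment.getOwnerDocument()));
  }

}
```

Note that getData() returns only the text between the delimiters. The <!-- and --> appear only when the node is serialized.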
In Chapter 7, I demonstrated a SAX program that read comments. Example 11.20 shows the DOM equivalent. The approach is different (actively walking a tree instead of passively receiving events), but the effect is the same: it prints the contents of comments, and only comments, on System.out.
Example 11.20. Printing comments
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import java.io.IOException;


public class DOMCommentReader {

  // note use of recursion
  public static void printComments(Node node) {

    int type = node.getNodeType();
    if (type == Node.COMMENT_NODE) {
      Comment comment = (Comment) node;
      System.out.println(comment.getData());
      System.out.println();
    }
    else {
      if (node.hasChildNodes()) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
          printComments(children.item(i));
        }
      }
    }

  }

  public static void main(String[] args) {

    if (args.length <= 0) {
      System.out.println("Usage: java DOMCommentReader URL");
      return;
    }

    String url = args[0];

    try {
      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      DocumentBuilder parser = factory.newDocumentBuilder();

      // Read the document
      Document document = parser.parse(url);

      // Process the document
      DOMCommentReader.printComments(document);

    }
    catch (SAXException e) {
      System.out.println(url + " is not well-formed.");
    }
    catch (IOException e) {
      System.out.println(
       "Due to an IOException, the parser could not check " + url
      );
    }
    catch (FactoryConfigurationError e) {
      System.out.println("Could not locate a factory class");
    }
    catch (ParserConfigurationException e) {
      System.out.println("Could not locate a JAXP parser");
    }

  } // end main

}
Here’s the result of running this program on the XML Schema Datatypes specification:
D:\books\XMLJAVA>java DOMCommentReader http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/datatypes.xml
 commenting these out means only that they won't show up in the
stylesheet generated "Revisions from previous draft" appendix

Changes before Sept public draft commented out...
<sitem>
19990521: PVB: corrected definition of length and maxLengths facet
for strings to be in terms of <emph>characters</emph> not
<emph>bytes</emph>
</sitem>
<sitem>
19990521: PVB: removed issue "other-date-representations". We don't
want other separators, left mention of aggregate reps for dates
as an ednote.
</sitem>
<sitem>
19990521: PVB: fixed "holidays" example, "-0101" ==> "==0101"
(where == in the correction should be two hyphens, but that would
not allow us to comment out this sitem)
…
It’s not obvious from this output sample, but there is a big difference between the behavior of the SAX and DOM versions of this program. The SAX version works in streaming mode, so it begins producing output almost immediately. The DOM version first has to read the entire document from the remote URL and parse it; only then can it begin walking the tree to look for comments. Both versions are limited by the speed of the network connection, so they take about the same total time to run on the same input data. What differs is when the first results appear. This may not be a big concern in a batch-mode application, but it can be very important when there is a human user. The SAX version will feel a lot more responsive.
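To make the contrast concrete, here is a minimal sketch of what a streaming comment reader can look like. It is not the Chapter 7 listing. The class names StreamingCommentSketch and CommentPrinter and the readComments() helper are made up for illustration, and I've assumed a JAXP SAXParser whose underlying parser supports the standard lexical-handler property. Each comment is printed from inside the callback, while the rest of the document may still be arriving over the network.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical sketch; not the SAX program from Chapter 7
public class StreamingCommentSketch {

  // DefaultHandler2 supplies do-nothing implementations of
  // ContentHandler and LexicalHandler, so only comment() is overridden
  static class CommentPrinter extends DefaultHandler2 {

    final List<String> seen = new ArrayList<String>();

    @Override
    public void comment(char[] text, int start, int length) {
      String data = new String(text, start, length);
      seen.add(data);
      // Output happens here, as soon as the parser reaches the comment
      System.out.println(data);
      System.out.println();
    }

  }

  // Parses the input, printing comments as they are encountered,
  // and returns them in document order
  public static List<String> readComments(InputSource in)
   throws ParserConfigurationException, SAXException, IOException {

    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    CommentPrinter handler = new CommentPrinter();
    // Comments are reported through LexicalHandler, which has to be
    // registered as a property rather than passed to parse()
    parser.setProperty(
     "http://xml.org/sax/properties/lexical-handler", handler);
    parser.parse(in, handler);
    return handler.seen;

  }

  public static void main(String[] args) throws Exception {

    if (args.length <= 0) {
      System.out.println("Usage: java StreamingCommentSketch URL");
      return;
    }
    readComments(new InputSource(args[0]));

  }

}
```

In the DOM version, by contrast, nothing can be printed until parse() has returned the finished Document. One more difference is worth keeping in mind: a LexicalHandler is also told about comments inside the DTD, so the two programs will not necessarily report exactly the same set of comments.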
Copyright 2001, 2002 Elliotte Rusty Harold | elharo@metalab.unc.edu | Last Modified May 26, 2002 |