The detailed behavior of a serializer is controlled by an OutputFormat object. This class can configure almost any aspect of serialization, including setting the maximum line length, changing the indenting, specifying which elements have their text escaped as CDATA sections, and more. There are even a few options that have the potential to make your documents malformed. For instance, if you add an element to the list of non-escaping elements, then any reserved characters like < and & that appear in its text content will be output as themselves rather than escaped as &lt; and &amp;.
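To see why skipping the escaping step is dangerous, here is a minimal sketch that uses only the standard JAXP parser, not Xerces' OutputFormat. The class name EscapingDemo and the helper methods escapeText() and isWellFormed() are invented for this illustration. The sketch escapes a piece of text by hand, then checks whether the escaped and unescaped versions parse.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class EscapingDemo {

  // Hypothetical helper: escape the two characters that are
  // always reserved in element content. The ampersand must be
  // replaced first so the new entity references are not re-escaped.
  static String escapeText(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;");
  }

  // Hypothetical helper: returns true if the string parses as a
  // well-formed XML document, false if the parser rejects it.
  static boolean isWellFormed(String xml) {
    try {
      DocumentBuilder builder
       = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // DefaultHandler rethrows fatal errors without printing them
      builder.setErrorHandler(new DefaultHandler());
      builder.parse(new InputSource(new StringReader(xml)));
      return true;
    }
    catch (SAXException e) {
      return false;
    }
    catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    String text = "if (a < b && c) return;";
    String escaped = "<code>" + escapeText(text) + "</code>";
    String raw = "<code>" + text + "</code>";
    System.out.println(escaped + "  well-formed: " + isWellFormed(escaped));
    System.out.println(raw + "  well-formed: " + isWellFormed(raw));
  }

}
```

The escaped version parses; the raw version does not. That is the trade you make when you put an element on the non-escaping list: the serializer trusts you that the text is already safe to write as is.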
One of the most frequent requests for serializers is pretty printing data with extra line breaks and indentation. Within reasonable limits, the OutputFormat class can provide this. Simply pass true to setIndenting(), pass the number of spaces you want each level to be indented to setIndent(), and pass the maximum line length to setLineWidth(). Example 13.1 demonstrates.
Example 13.1. Using Xerces’ OutputFormat class to pretty print XML
import java.math.*;
import java.io.IOException;
import org.w3c.dom.*;
import javax.xml.parsers.*;
import org.apache.xml.serialize.*;


public class IndentedFibonacci {

  public static void main(String[] args) {

    try {

      // Find the implementation
      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      DocumentBuilder builder = factory.newDocumentBuilder();
      DOMImplementation impl = builder.getDOMImplementation();

      // Create the document
      Document doc = impl.createDocument(null,
       "Fibonacci_Numbers", null);

      // Fill the document
      BigInteger low  = BigInteger.ONE;
      BigInteger high = BigInteger.ONE;

      Element root = doc.getDocumentElement();

      for (int i = 0; i < 10; i++) {
        Element number = doc.createElement("fibonacci");
        Text text = doc.createTextNode(low.toString());
        number.appendChild(text);
        root.appendChild(number);

        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }

      // Serialize the document
      OutputFormat format = new OutputFormat(doc);
      format.setLineWidth(65);
      format.setIndenting(true);
      format.setIndent(2);
      XMLSerializer serializer
       = new XMLSerializer(System.out, format);
      serializer.serialize(doc);

    }
    catch (FactoryConfigurationError e) {
      System.out.println("Could not locate a JAXP factory class");
    }
    catch (ParserConfigurationException e) {
      System.out.println(
        "Could not locate a JAXP DocumentBuilder class"
      );
    }
    catch (DOMException e) {
      System.err.println(e);
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}
When run, this program produces the following output:
C:\XMLJAVA>java IndentedFibonacci
<?xml version="1.0" encoding="UTF-8"?>
<Fibonacci_Numbers>
  <fibonacci>1</fibonacci>
  <fibonacci>1</fibonacci>
  <fibonacci>2</fibonacci>
  <fibonacci>3</fibonacci>
  <fibonacci>5</fibonacci>
  <fibonacci>8</fibonacci>
  <fibonacci>13</fibonacci>
  <fibonacci>21</fibonacci>
  <fibonacci>34</fibonacci>
  <fibonacci>55</fibonacci>
</Fibonacci_Numbers>
I think you’ll agree that this looks much more attractive than the smushed together output from the bare serialization without any extra white space. One warning, however: white space is significant in XML. Adding this white space has changed the document. This is not the same document as existed before it was pretty printed. For this application the extra white space is insignificant. However, this is not true in all XML applications.
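To make the warning concrete, here is a small sketch using only the standard JAXP parser. The class name WhiteSpaceDemo and the helper countRootChildren() are invented for this illustration. It parses a compact document and an indented version of the same document, then counts the child nodes of the root element in each.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class WhiteSpaceDemo {

  // Hypothetical helper: parse the string and return the number
  // of child nodes (elements and text) of the root element.
  static int countRootChildren(String xml) {
    try {
      DocumentBuilder builder
       = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc
       = builder.parse(new InputSource(new StringReader(xml)));
      return doc.getDocumentElement().getChildNodes().getLength();
    }
    catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    String compact
     = "<Fibonacci_Numbers><fibonacci>1</fibonacci>"
     + "<fibonacci>1</fibonacci></Fibonacci_Numbers>";
    String indented
     = "<Fibonacci_Numbers>\n"
     + "  <fibonacci>1</fibonacci>\n"
     + "  <fibonacci>1</fibonacci>\n"
     + "</Fibonacci_Numbers>";

    // Two element children and nothing else
    System.out.println("compact:  " + countRootChildren(compact));  // 2
    // Two elements plus three white space text nodes
    System.out.println("indented: " + countRootChildren(indented)); // 5
  }

}
```

Code that walks the children of the root by position will behave differently on the two documents, even though they look the same to a person reading them.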
White space is just the beginning of what the OutputFormat class can control. Other features include the MIME media type, the XML declaration, the system and public IDs for the document type, which elements’ content should be escaped as CDATA sections, and more. Here are the various properties you can control by invoking various methods on OutputFormat. In some cases the default is document dependent. When it’s not, the default value is given in parentheses.
The method is normally set to one of the three values xml, html, or text, indicating the type of output that is desired. The serializer uses this value to configure itself. The default value is determined by the type of document being serialized.
public void setMethod(String method);
public String getMethod();
public static String whichMethod(Document doc);
The MIME media type for the output, such as application/xml or application/xhtml+xml. This will not be included in the document itself, but may be used as part of the stream's metadata if it's written into a file system or onto an HTTP connection or some such.
public void setMediaType(String mediaType);
public String getMediaType();
public static String whichMediaType(String method);
The version number used in the XML declaration. This should always be "1.0". Do not change this.
public void setVersion(String version);
public String getVersion();
The value of the standalone attribute in the XML declaration. This should be true for "yes" and false for "no".
public void setStandalone(boolean standalone);
public boolean getStandalone();
The encoding specified in the encoding attribute in the XML declaration and used to convert characters to bytes when serializing onto an OutputStream.
public void setEncoding(String encoding);
public String getEncoding();
If true, then no XML declaration is output. If false, an XML declaration is written.
public void setOmitXMLDeclaration(boolean omitXMLDeclaration);
public boolean getOmitXMLDeclaration();
This specifies the system and public IDs of the external DTD subset given in the document type declaration. These values are used only if the Document being serialized does not contain a DocumentType object of its own.
public void setDoctype(String publicID, String systemID);
public String getDoctypePublic();
public String getDoctypeSystem();
public static String whichDoctypePublic(Document doc);
public static String whichDoctypeSystem(Document doc);
If true, then no document type declaration is output. If false, a document type declaration is written. If the document does not have a document type declaration and none has been set with setDoctype(), then no document type declaration will be written, regardless of the value of this property.
public void setOmitDocumentType(boolean omitDocumentType);
public boolean getOmitDocumentType();
The elements whose text node children should not be escaped using entity references.
public void setNonEscapingElements(String[] elementNames);
public String[] getNonEscapingElements();
public boolean isNonEscapingElement(String name);
The elements whose text content should be enclosed in a CDATA section.
public void setCDataElements(String[] elementNames);
public String[] getCDataElements();
public boolean isCDataElement(String name);
If true, then comments in the document are not written onto the output. If false, they are written.
public void setOmitComments(boolean omitComments);
public boolean getOmitComments();
If true, then the serializer will add indents at each level and wrap lines that exceed the maximum line width. If false it won't. The number of spaces to indent is set by the indent property, and the column to wrap at is set by the line width property.
public void setIndenting(boolean indenting);
public boolean getIndenting();
The number of spaces to indent each level if indenting is true.
public void setIndent(int indent);
public int getIndent();
The maximum number of characters in a line when indenting is true. Setting this to zero turns off line wrapping completely.
public void setLineWidth(int width);
public int getLineWidth();
The character or characters to use for a line break. You should only set this property to a carriage return, a linefeed, or a carriage return-linefeed pair.
public void setLineSeparator(String separator);
public String getLineSeparator();
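Of the properties above, the CDATA elements list is the one most often misunderstood, so a short illustration may help. Wrapping text in a CDATA section is only a different way of escaping it. The following sketch uses only the standard JAXP parser, not OutputFormat, and the class name CDataDemo and helper textOf() are invented for this illustration. It parses the same text written both ways and compares what a DOM program sees.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CDataDemo {

  // Hypothetical helper: parse the string and return the text
  // content of the root element.
  static String textOf(String xml) {
    try {
      DocumentBuilder builder
       = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc
       = builder.parse(new InputSource(new StringReader(xml)));
      return doc.getDocumentElement().getTextContent();
    }
    catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    // The same text, once with entity references...
    String escaped
     = "<script>if (a &lt; b &amp;&amp; c) run();</script>";
    // ...and once inside a CDATA section
    String cdata
     = "<script><![CDATA[if (a < b && c) run();]]></script>";

    System.out.println(textOf(escaped));
    System.out.println(textOf(cdata));
    System.out.println("same text: "
     + textOf(escaped).equals(textOf(cdata)));
  }

}
```

Both forms carry the same text. The CDATA form is simply easier for people to read when the content is full of < and & characters, which is why you might list elements like script there.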
Example 13.2 uses these methods to create a valid MathML document encoded in ISO-8859-1 with a document type declaration, an XML declaration, no comments, a 65 character maximum line width, a two space indent, a standalone declaration with the value yes, and the MIME media type application/xml:
Example 13.2. Using Xerces’ OutputFormat class to pretty print MathML
import java.math.*;
import java.io.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;
import org.apache.xml.serialize.*;


public class ValidFibonacciMathML {

  public static String MATHML_NS
   = "http://www.w3.org/1998/Math/MathML";

  public static void main(String[] args) {

    try {

      // Find the implementation
      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      DocumentBuilder builder = factory.newDocumentBuilder();
      DOMImplementation impl = builder.getDOMImplementation();

      // Create the document
      Document doc = impl.createDocument(MATHML_NS, "math", null);

      // Fill the document
      BigInteger low  = BigInteger.ONE;
      BigInteger high = BigInteger.ONE;

      Element root = doc.getDocumentElement();
      root.setAttribute("xmlns", MATHML_NS);

      for (int i = 1; i <= 10; i++) {
        Element mrow = doc.createElementNS(MATHML_NS, "mrow");

        Element mi = doc.createElementNS(MATHML_NS, "mi");
        Text function = doc.createTextNode("f(" + i + ")");
        mi.appendChild(function);

        Element mo = doc.createElementNS(MATHML_NS, "mo");
        Text equals = doc.createTextNode("=");
        mo.appendChild(equals);

        Element mn = doc.createElementNS(MATHML_NS, "mn");
        Text value = doc.createTextNode(low.toString());
        mn.appendChild(value);

        mrow.appendChild(mi);
        mrow.appendChild(mo);
        mrow.appendChild(mn);
        root.appendChild(mrow);

        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }

      // Serialize the document onto System.out
      OutputFormat format = new OutputFormat(doc);
      format.setLineWidth(65);
      format.setIndenting(true);
      format.setIndent(2);
      format.setEncoding("ISO-8859-1");
      format.setDoctype("-//W3C//DTD MathML 2.0//EN",
       "http://www.w3.org/TR/MathML2/dtd/mathml2.dtd");
      format.setMediaType("application/xml");
      format.setOmitComments(true);
      format.setOmitXMLDeclaration(false);
      format.setVersion("1.0");
      format.setStandalone(true);
      XMLSerializer serializer
       = new XMLSerializer(System.out, format);
      serializer.serialize(doc);

    }
    catch (FactoryConfigurationError e) {
      System.out.println("Could not locate a JAXP factory class");
    }
    catch (ParserConfigurationException e) {
      System.out.println(
        "Could not locate a JAXP DocumentBuilder class"
      );
    }
    catch (DOMException e) {
      System.err.println(e);
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}
Here’s the beginning of the output:
C:\XMLJAVA>java ValidFibonacciMathML
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN"
                      "http://www.w3.org/TR/MathML2/dtd/mathml2.dtd">
<math>
  <mrow>
    <mi>f(1)</mi>
    <mo>=</mo>
    <mn>1</mn>
  </mrow>
  <mrow>
    <mi>f(2)</mi>
    <mo>=</mo>
    <mn>1</mn>
  </mrow>
…
You can imagine other requests for the serializer. For example, maybe you want a line break after each </mrow> end-tag but no line breaks inside mrow elements. OutputFormat doesn’t give you enough control to arrange serialization at this level of detail, but you could write a custom subclass of XMLSerializer that accomplishes this.
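I won't show a real XMLSerializer subclass here, but the following sketch gives a rough idea of the kind of layout meant. It is a hand-written DOM walker built on the standard DOM API, not a subclass of XMLSerializer, and the class name RowPerLineSerializer and all its methods are invented for this illustration. It handles only elements and text (no attributes, namespaces, comments, or declarations), and it writes each child of the root element on a line of its own with no line breaks inside it.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class RowPerLineSerializer {

  static String escapeText(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;")
            .replace(">", "&gt;");
  }

  // Writes an element or text node and its descendants with no
  // added white space. Other node types are silently skipped.
  static void writeCompact(Node node, StringBuilder out) {
    if (node.getNodeType() == Node.TEXT_NODE) {
      out.append(escapeText(node.getNodeValue()));
    }
    else if (node.getNodeType() == Node.ELEMENT_NODE) {
      out.append('<').append(node.getNodeName()).append('>');
      for (Node child = node.getFirstChild(); child != null;
       child = child.getNextSibling()) {
        writeCompact(child, out);
      }
      out.append("</").append(node.getNodeName()).append('>');
    }
  }

  // Writes the root start-tag, then each child of the root
  // followed by a line break, then the root end-tag.
  static String serialize(Document doc) {
    Element root = doc.getDocumentElement();
    StringBuilder out = new StringBuilder();
    out.append('<').append(root.getNodeName()).append(">\n");
    for (Node child = root.getFirstChild(); child != null;
     child = child.getNextSibling()) {
      writeCompact(child, out);
      out.append('\n');
    }
    out.append("</").append(root.getNodeName()).append(">\n");
    return out.toString();
  }

  static Element textElement(Document doc, String name, String text) {
    Element element = doc.createElement(name);
    element.appendChild(doc.createTextNode(text));
    return element;
  }

  // Builds a two-row document shaped like the MathML example
  static Document buildSample() {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
       .newDocumentBuilder().newDocument();
      Element math = doc.createElement("math");
      doc.appendChild(math);
      for (int i = 1; i <= 2; i++) {
        Element mrow = doc.createElement("mrow");
        mrow.appendChild(textElement(doc, "mi", "f(" + i + ")"));
        mrow.appendChild(textElement(doc, "mo", "="));
        mrow.appendChild(textElement(doc, "mn", "1"));
        math.appendChild(mrow);
      }
      return doc;
    }
    catch (ParserConfigurationException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    System.out.print(serialize(buildSample()));
  }

}
```

A production serializer would also have to deal with attributes, namespaces, encodings, and the other node types, which is exactly the work you inherit for free by subclassing an existing serializer instead.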
Copyright 2001, 2002 Elliotte Rusty Harold | elharo@metalab.unc.edu | Last Modified June 05, 2002 |