java.lang.Object
  nu.xom.Serializer
A serializer outputs a Document
object in a specific
encoding using various options for controlling white space,
indenting, line breaking, and base URIs. However, in general these
options do affect the document's infoset. In particular, if you set either
the maximum line length or the indent size to a positive value,
then the serializer will not respect input white space. It
may trim leading and trailing space, condense runs of white
space to a single space, convert carriage returns and line
feeds to spaces, add extra space where none was present before,
and otherwise muck with the document's white space.
The defaults, however, preserve all significant white space
including ignorable white space and boundary white space.
Constructor Summary | |
Serializer(java.io.OutputStream out)
Create a new serializer that uses the UTF-8 encoding. |
|
Serializer(java.io.OutputStream out,
java.lang.String encoding)
Create a new serializer that uses a specified encoding. |
Method Summary | |
protected void |
breakLine()
Writes the current line break string onto the underlying output stream and indents as specified by the current level and the indent property. |
void |
flush()
Flushes the data onto the output stream. |
protected int |
getColumnNumber()
Returns the current column number of the output stream. |
java.lang.String |
getEncoding()
Returns the name of the character encoding used by this Serializer . |
int |
getIndent()
Returns the number of spaces this serializer indents. |
java.lang.String |
getLineSeparator()
Returns the String used as a line separator. |
int |
getMaxLength()
Returns the preferred maximum line length. |
boolean |
getPreserveBaseURI()
Returns true if this serializer preserves the original base URIs by inserting extra xml:base attributes. |
boolean |
getUnicodeNormalizationFormC()
If true, this property indicates serialization will perform Unicode normalization on all data using normalization form C (NFC). |
void |
setIndent(int indent)
Sets the number of additional spaces to add to each successive level in the hierarchy. |
void |
setLineSeparator(java.lang.String lineSeparator)
Sets the lineSeparator. |
void |
setMaxLength(int maxLength)
Sets the suggested maximum line length for this serializer. |
void |
setOutputStream(java.io.OutputStream out)
Flushes the previous output stream and redirects further output to the new output stream. |
void |
setPreserveBaseURI(boolean preserve)
Determines whether this Serializer inserts
extra xml:base attributes to attempt to
preserve base URI information from the document. |
void |
setUnicodeNormalizationFormC(boolean normalize)
If true, this property indicates serialization will perform Unicode normalization on all data using normalization form C (NFC). |
protected void |
write(Attribute attribute)
Writes an attribute in the form name="value" . |
protected void |
write(Comment comment)
Writes a Comment object
onto the output stream using the current options. |
protected void |
write(DocType doctype)
Writes a DocType object
onto the output stream using the current options. |
void |
write(Document doc)
Serializes a document onto the output stream using the current options. |
protected void |
write(Element element)
Serializes an element onto the output stream using the current options. |
protected void |
write(ProcessingInstruction instruction)
Writes a ProcessingInstruction object
onto the output stream using the current options. |
protected void |
write(Text text)
Writes a Text object
onto the output stream using the current options. |
protected void |
writeAttributes(Element element)
Writes all the attributes of the specified element onto the output stream, one at a time, separated by white space. |
protected void |
writeAttributeValue(java.lang.String value)
Writes a string onto the underlying output stream. |
protected void |
writeChild(Node node)
Writes a child node onto the output stream using the current options. |
protected void |
writeEmptyElementTag(Element element)
Writes an empty-element tag for the element including all its namespace declarations and attributes. |
protected void |
writeEndTag(Element element)
Writes the end-tag for an element in the form </name> . |
protected void |
writeEscaped(java.lang.String text)
Writes a string onto the underlying output stream. |
protected void |
writeNamespaceDeclaration(java.lang.String prefix,
java.lang.String uri)
Writes a namespace declaration in the form xmlns:prefix="uri" or
xmlns="uri" . |
protected void |
writeNamespaceDeclarations(Element element)
Writes all the namespace declaration attributes of the specified element onto the output stream, one at a time, separated by white space. |
protected void |
writeRaw(java.lang.String text)
Writes a string onto the underlying output stream. |
protected void |
writeStartTag(Element element)
Writes the start-tag for the element including all its namespace declarations and attributes. |
protected void |
writeXMLDeclaration()
Writes the XML declaration onto the output stream, followed by a line break. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
public Serializer(java.io.OutputStream out)
Create a new serializer that uses the UTF-8 encoding.
out
- the output stream to write the document on
java.lang.NullPointerException
- if out
is null
public Serializer(java.io.OutputStream out, java.lang.String encoding) throws java.io.UnsupportedEncodingException
Create a new serializer that uses a specified encoding. The encoding must be recognized by the Java virtual machine. Currently the following encodings are recognized by XOM:
More will be added in the future. You can use
encodings not in this list as long as the local virtual
machine supports them. However, characters may unnecessarily
be output as character references. Conversely, not all
versions of Java support all of these encodings. If you
attempt to use an encoding that the local Java virtual
machine does not support, the constructor will throw an
UnsupportedEncodingException.
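Whether the local virtual machine supports a given encoding can be probed before a serializer is constructed. The sketch below does not use XOM. It relies on java.io.OutputStreamWriter, which throws the same UnsupportedEncodingException for encoding names the VM does not know. The names EncodingProbe and isSupported are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;

// Hypothetical helper, not part of XOM.
final class EncodingProbe {

    // Returns true if this JVM can create a writer for the named encoding.
    static boolean isSupported(String encoding) {
        try {
            new OutputStreamWriter(new ByteArrayOutputStream(), encoding);
            return true;
        } catch (UnsupportedEncodingException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isSupported("UTF-8"));
        System.out.println(isSupported("no-such-encoding"));
    }
}
```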
out
- the output stream to write the document on
encoding
- the character encoding for the serialization
java.lang.NullPointerException
- if out
or encoding
is null
java.io.UnsupportedEncodingException
- if the VM does not
support the requested encoding
Method Detail |
public void setOutputStream(java.io.OutputStream out) throws java.io.IOException
Flushes the previous output stream and redirects further output to the new output stream.
out
- the output stream to write the document on
java.lang.NullPointerException
- if out
is null
java.io.IOException
- if the previous output stream
encounters an I/O error when flushed
public void write(Document doc) throws java.io.IOException
Serializes a document onto the output stream using the current options.
doc
- the Document
to serialize
java.io.IOException
- if the underlying output stream
encounters an I/O error
java.lang.NullPointerException
- if doc
is null
UnavailableCharacterException
- if the document contains an unescapable
character (e.g. in an element name) that is not available
in the current encoding
protected void writeXMLDeclaration() throws java.io.IOException
Writes the XML declaration onto the output stream, followed by a line break.
java.io.IOException
- if the underlying output stream
encounters an I/O error
protected void write(Element element) throws java.io.IOException
Serializes an element onto the output stream using the current
options. The result is guaranteed to be well-formed. If
element
does not have a parent element, the output
will also be namespace well-formed.
If the element is empty, this method invokes
writeEmptyElementTag. If the element is not
empty, then:
1. It calls writeStartTag.
2. It passes each of the element's children to write in order.
3. It calls writeEndTag.
It may break lines or add white space if the serializer has been configured to indent or use a maximum line length.
element
- the Element
to serialize
java.io.IOException
- if the underlying output stream
encounters an I/O error
UnavailableCharacterException
- if the element name contains
a character that is not available in the current encoding
protected void writeEndTag(Element element) throws java.io.IOException
Writes the end-tag for an element in the form </name>.
element
- the element whose end-tag is written
java.io.IOException
- if the underlying output stream
encounters an I/O error
UnavailableCharacterException
- if the element name contains
a character that is not available in the current encoding
protected void writeStartTag(Element element) throws java.io.IOException
Writes the start-tag for the element including all its namespace declarations and attributes.
The writeAttributes
method is called to write
all the non-namespace-declaration attributes.
The writeNamespaceDeclarations
method
is called to write all the namespace declaration attributes.
element
- the element whose start-tag is written
java.io.IOException
- if the underlying output stream
encounters an I/O error
UnavailableCharacterException
- if the name of the element or the name of
any of its attributes contains a character that is not
available in the current encoding
protected void writeEmptyElementTag(Element element) throws java.io.IOException
Writes an empty-element tag for the element including all its namespace declarations and attributes.
The writeAttributes
method is called to write
all the non-namespace-declaration attributes.
The writeNamespaceDeclarations
method
is called to write all the namespace declaration attributes.
If subclasses don't wish empty-element tags to be used,
they can override this method to simply invoke
writeStartTag followed by writeEndTag.
element
- the element whose empty-element tag is written
java.io.IOException
- if the underlying output stream
encounters an I/O error
UnavailableCharacterException
- if the name of the element or the name of
any of its attributes contains a character that is not
available in the current encoding
protected void writeAttributes(Element element) throws java.io.IOException
Writes all the attributes of the specified
element onto the output stream, one at a time, separated
by white space. If preserveBaseURI is true, and it is
necessary to add an xml:base
attribute
to the element in order to preserve the base URI, then
that attribute is also written here.
Each individual attribute is written by invoking
write(Attribute)
.
element
- the Element
whose attributes are
written
java.io.IOException
- if the underlying output stream
encounters an I/O error
UnavailableCharacterException
- if the name of any of the element's
attributes contains a character that is not
available in the current encoding
protected void writeNamespaceDeclarations(Element element) throws java.io.IOException
Writes all the namespace declaration
attributes of the specified element onto the output stream,
one at a time, separated by white space. Each individual
declaration is written by invoking
writeNamespaceDeclaration
.
element
- the Element
whose attributes are
written
java.io.IOException
- if the underlying output stream
encounters an I/O error
UnavailableCharacterException
- if any of the element's namespace prefixes
contains a character that is not available in the current
encoding
protected void writeNamespaceDeclaration(java.lang.String prefix, java.lang.String uri) throws java.io.IOException
Writes a namespace declaration in the form
xmlns:prefix="uri"
or
xmlns="uri"
. It does not write
the spaces on either side of the namespace declaration.
These are written by writeNamespaceDeclarations
.
prefix
- the namespace prefix; the empty string for the
default namespace
uri
- the namespace URI
java.io.IOException
- if the underlying output stream
encounters an I/O error
UnavailableCharacterException
- if the namespace prefix contains a
character that is not available in the current encoding
protected void write(Attribute attribute) throws java.io.IOException
Writes an attribute in the form
name="value"
.
Characters in the attribute value are escaped as necessary.
attribute
- the Attribute
to write
java.io.IOException
- if the underlying output stream
encounters an I/O error
UnavailableCharacterException
- if the attribute name contains a character
that is not available in the current encoding
protected void write(Comment comment) throws java.io.IOException
Writes a Comment
object
onto the output stream using the current options.
Since character and entity references are not resolved in comments, comments can only be serialized when all characters they contain are available in the current encoding.
comment
- the Comment
to serialize
java.io.IOException
- if the underlying output stream
encounters an I/O error
UnavailableCharacterException
- if the comment contains a character that is
not available in the current encoding
protected void write(ProcessingInstruction instruction) throws java.io.IOException
Writes a ProcessingInstruction
object
onto the output stream using the current options.
Since character and entity references are not resolved in processing instructions, processing instructions can only be serialized when all characters they contain are available in the current encoding.
instruction
- the ProcessingInstruction
to serialize
java.io.IOException
- if the underlying output stream
encounters an I/O error
UnavailableCharacterException
- if the processing instruction contains a character that is
not available in the current encoding
protected void write(Text text) throws java.io.IOException
Writes a Text
object
onto the output stream using the current options.
Reserved characters such as <, > and " are escaped using the standard entity references such as &lt;, &gt;, and &quot;.
Characters which cannot be encoded in the current character set (for example, Ω in ISO-8859-1) are encoded using character references. For encodings that XOM does not recognize, all non-ASCII characters are written as character references, even when that is not necessary.
text
- the Text
to serialize
java.io.IOException
- if the underlying output stream
encounters an I/O error
protected void write(DocType doctype) throws java.io.IOException
Writes a DocType
object
onto the output stream using the current options.
doctype
- the document type declaration to serialize
java.io.IOException
- if the underlying
output stream encounters an I/O error
UnavailableCharacterException
- if the document type declaration contains
a character that is not available in the current encoding
protected void writeChild(Node node) throws java.io.IOException
Writes a child node onto the output stream using the
current options. It is invoked when walking the tree to
serialize the entire document. It is not called, and indeed
should not be called, for either the Document
node or for attributes.
node
- the Node
to serialize
java.io.IOException
- if the underlying output stream
encounters an I/O error
XMLException
- if an Attribute
or a
Document
is passed to this method
protected final void writeEscaped(java.lang.String text) throws java.io.IOException
Writes a string onto the underlying output stream.
Non-ASCII characters that are not available in the
current character set are hexadecimally escaped.
The three reserved characters <, >, and &
are escaped using the standard entity references
&lt;, &gt;, and &amp;.
Double and single quotes are not escaped.
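The escaping rules described above can be illustrated with a stand-alone sketch. It is not XOM's implementation: the class EscapeSketch and its escape method are hypothetical, and java.nio.charset.CharsetEncoder is used as an assumed way to decide whether a character is available in the target encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical stand-alone sketch; XOM's own implementation may differ.
final class EscapeSketch {

    static String escape(String text, String encoding) {
        CharsetEncoder encoder = Charset.forName(encoding).newEncoder();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            i += ch.length();
            if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (cp == '&') {
                out.append("&amp;");
            } else if (cp < 0x80 || encoder.canEncode(ch)) {
                out.append(ch); // quotes and other encodable characters pass through
            } else {
                // Not available in the encoding: hexadecimal character reference.
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u03A9 is the Greek capital omega, which ISO-8859-1 cannot encode.
        System.out.println(escape("a < b & \u03A9 \"q\"", "ISO-8859-1"));
    }
}
```

With UTF-8 as the encoding, the omega would be written as is, because every Unicode character is available in that encoding.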
text
- the String
to serialize
java.io.IOException
- if the underlying output stream
encounters an I/O error
protected final void writeAttributeValue(java.lang.String value) throws java.io.IOException
Writes a string onto the underlying output stream.
Non-ASCII characters that are not available in the
current character set are escaped using hexadecimal numeric
character references. Carriage returns, line feeds, and tabs
are also escaped using hexadecimal numeric character
references in order to ensure their preservation on a round
trip. The four reserved characters <, >, &,
and " are escaped using the standard entity references
&lt;, &gt;, &amp;, and &quot;.
The single quote is not escaped.
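A stand-alone sketch of the attribute-value rules may make the difference from plain text escaping easier to see. It is an illustration under stated assumptions rather than XOM's code: AttributeValueSketch is a hypothetical name, availability in the encoding is checked with java.nio.charset.CharsetEncoder, and the exact spelling of the character references (lowercase hexadecimal, no leading zero) is an arbitrary choice.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical stand-alone sketch; XOM's own implementation may differ.
final class AttributeValueSketch {

    static String escape(String value, String encoding) {
        CharsetEncoder encoder = Charset.forName(encoding).newEncoder();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            i += ch.length();
            switch (cp) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\t':
                case '\n':
                case '\r':
                    // Escaped so that white space survives attribute value normalization.
                    out.append("&#x").append(Integer.toHexString(cp)).append(';');
                    break;
                default:
                    if (cp < 0x80 || encoder.canEncode(ch)) {
                        out.append(ch); // includes the single quote
                    } else {
                        out.append("&#x").append(Integer.toHexString(cp)).append(';');
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a\tb\n\"c\" 'd'", "UTF-8"));
    }
}
```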
value
- the String
to serialize
java.io.IOException
- if the underlying output stream
encounters an I/O error
protected final void writeRaw(java.lang.String text) throws java.io.IOException
Writes a string onto the underlying output stream
without escaping any characters.
Non-ASCII characters that are not available in the
current character set cause an IOException.
text
- the String
to serialize
java.io.IOException
- if the underlying output stream
encounters an I/O error or text
contains
characters not available in the current character set
protected final void breakLine() throws java.io.IOException
Writes the current line break string onto the underlying output stream and indents as specified by the current level and the indent property.
java.io.IOException
- if the underlying output stream
encounters an I/O error
public void flush() throws java.io.IOException
Flushes the data onto the output stream. It is not enough to flush the output stream. You must flush the serializer object itself because it uses some internal buffering. The serializer will flush the underlying output stream.
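The reason for flushing the outer object rather than the stream can be shown with an analogy from the JDK. The sketch below does not use XOM and says nothing about how the serializer buffers internally. It only demonstrates the general effect with a java.io.BufferedWriter layered over a byte stream. FlushSketch is a hypothetical name.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;

// Analogy only: a buffering writer stands in for the serializer.
final class FlushSketch {

    // Returns the byte counts seen after flushing only the stream,
    // and then after flushing the buffering writer itself.
    static int[] demo() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            BufferedWriter writer =
                new BufferedWriter(new OutputStreamWriter(bytes, "UTF-8"));
            writer.write("<root/>");
            bytes.flush();              // does not reach into the writer's buffer
            int afterStreamFlush = bytes.size();
            writer.flush();             // pushes buffered data down to the stream
            int afterWriterFlush = bytes.size();
            return new int[] { afterStreamFlush, afterWriterFlush };
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        int[] sizes = demo();
        System.out.println(sizes[0] + " bytes, then " + sizes[1] + " bytes");
    }
}
```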
java.io.IOException
- if the underlying
output stream encounters an I/O error
public int getIndent()
Returns the number of spaces this serializer indents.
public void setIndent(int indent)
Sets the number of additional spaces to add to each successive level in the hierarchy. Use 0 for no extra indenting. The maximum indentation is limited to approximately half the maximum line length. The serializer will not indent further than that no matter how many levels deep the hierarchy is.
When this variable is set to a value greater than 0, the serializer does not preserve white space. Spaces, tabs, carriage returns, and line feeds can all be interchanged at the serializer's discretion, and additional white space may be added before and after tags. Carriage returns, line feeds, and tabs will not be escaped with numeric character references.
Inside elements with an xml:space="preserve"
attribute, white space is preserved and no indenting
takes place, regardless of the setting of the indent
property, unless, of course, an
xml:space="default"
attribute overrides the
xml:space="preserve"
attribute.
The default value for indent is 0; that is, the default is not to add or subtract any white space from the source document.
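The relationship between nesting level, indent, and the cap at roughly half the maximum line length can be sketched as simple arithmetic. The exact rule the serializer applies is not stated here, so the formula below (level times indent, capped at exactly maxLength / 2 when a maximum line length is set) is an assumption, and IndentSketch is a hypothetical name.

```java
// Hypothetical arithmetic sketch; the serializer's exact cap is only "approximately" half.
final class IndentSketch {

    // Number of spaces written before a tag at the given nesting level.
    static int indentWidth(int level, int indent, int maxLength) {
        if (indent < 0) {
            throw new IllegalArgumentException("indent must not be negative");
        }
        int width = level * indent;
        if (maxLength > 0) {
            // Assumed cap: never indent past half of the maximum line length.
            width = Math.min(width, maxLength / 2);
        }
        return width;
    }

    public static void main(String[] args) {
        System.out.println(indentWidth(3, 4, 0));
        System.out.println(indentWidth(10, 4, 40));
    }
}
```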
indent
- the number of spaces to indent
each successive level of the hierarchy
java.lang.IllegalArgumentException
- if indent is less than zero
public java.lang.String getLineSeparator()
Returns the String used as a line separator.
This is always "\n", "\r", or "\r\n".
public void setLineSeparator(java.lang.String lineSeparator)
Sets the lineSeparator. This can only be one of the
three strings "\n", "\r", or "\r\n". All other values are forbidden.
If this method is invoked, then
line separators in the character data will be changed to this
string. Line separators in attribute values will be changed
to the hexadecimal numeric character references corresponding
to this string.
The default line separator is "\r\n". However,
line separators in character data and attribute values are not
changed to this string, unless this method is called first.
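The effect on character data can be illustrated without XOM. The sketch below validates a separator against the three permitted strings and then rewrites every line break in a string to that separator. LineSeparatorSketch and normalize are hypothetical names, and the regular expression is only one assumed way to find line breaks.

```java
import java.util.regex.Matcher;

// Hypothetical stand-alone sketch; not XOM's implementation.
final class LineSeparatorSketch {

    static String normalize(String text, String lineSeparator) {
        boolean legal = "\n".equals(lineSeparator)
            || "\r".equals(lineSeparator)
            || "\r\n".equals(lineSeparator);
        if (!legal) {
            throw new IllegalArgumentException("Illegal line separator");
        }
        // Match CRLF first so that a pair is replaced once, not twice.
        return text.replaceAll("\r\n|\r|\n", Matcher.quoteReplacement(lineSeparator));
    }

    public static void main(String[] args) {
        String result = normalize("a\r\nb\rc\nd", "\n");
        System.out.println(result.split("\n").length + " lines");
    }
}
```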
lineSeparator
- the lineSeparator to set
java.lang.IllegalArgumentException
- if you attempt to use any line
separator other than "\n", "\r", or "\r\n".
public int getMaxLength()
Returns the preferred maximum line length.
public void setMaxLength(int maxLength)
Sets the suggested maximum line length for this serializer. Setting this to 0 indicates that no automatic wrapping is to be performed. When a line approaches this length, the serializer begins looking for opportunities to break the line. Generally it will break on any ASCII white space character (tab, carriage return, linefeed, and space). In some circumstances the serializer may not be able to break the line before the maximum length is reached. For instance, if an element name is longer than the maximum line length the only way to correctly serialize it is to exceed the maximum line length. In this case, the serializer will exceed the maximum line length.
The default value for max line length is 0, which is interpreted as no maximum line length. Setting this to a negative value just sets it to 0.
When this variable is set to a value greater than 0, the serializer does not preserve white space. Spaces, tabs, carriage returns, and line feeds can all be interchanged at the serializer's discretion. Carriage returns, line feeds, and tabs will not be escaped with numeric character references.
Inside elements with an xml:space="preserve"
attribute, the maximum line length is not enforced,
regardless of the setting of this property, unless,
of course, an xml:space="default" attribute
overrides the xml:space="preserve" attribute.
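The wrapping behavior described for this property resembles a greedy line breaker. The sketch below is a simplification that works on plain words rather than on markup, and it is not the serializer's algorithm. WrapSketch and wrap are hypothetical names. It shows the three cases the description mentions: no wrapping for 0 or negative values, breaking at white space, and exceeding the limit when a single token is too long.

```java
// Hypothetical greedy wrapper over plain words; the real serializer works on markup.
final class WrapSketch {

    static String wrap(String text, int maxLength) {
        if (maxLength <= 0) {
            return text; // 0 means no wrapping; negative values are treated as 0
        }
        StringBuilder out = new StringBuilder();
        int column = 0;
        for (String word : text.trim().split("\\s+")) {
            if (column > 0 && column + 1 + word.length() > maxLength) {
                out.append('\n'); // break instead of writing a space
                column = 0;
            } else if (column > 0) {
                out.append(' ');
                column++;
            }
            out.append(word);     // an over-long word simply exceeds the limit
            column += word.length();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wrap("aa bb cc", 5));
    }
}
```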
maxLength
- the suggested maximum line length
public boolean getPreserveBaseURI()
Returns true if this serializer preserves the original
base URIs by inserting extra xml:base
attributes.
true if this Serializer inserts
extra xml:base attributes to attempt to
preserve base URI information from the document.
public void setPreserveBaseURI(boolean preserve)
Determines whether this Serializer
inserts
extra xml:base
attributes to attempt to
preserve base URI information from the document.
The default is false, do not preserve base URI information.
xml:base
attributes that are part of the document's
infoset are always output. This property only determines
whether or not extra xml:base
attributes are added.
preserve
- true if xml:base
attributes should be added as necessary
to preserve base URI information
public java.lang.String getEncoding()
Returns the name of the character encoding used by
this Serializer
.
public void setUnicodeNormalizationFormC(boolean normalize)
If true, this property indicates serialization will perform Unicode normalization on all data using normalization form C (NFC). Performing Unicode normalization may change the document's infoset. The default is false; do not normalize.
The implementation used is IBM's International Components for Unicode for Java (ICU4J) 2.6. This version is based on Unicode 4.0.
This feature has not yet been benchmarked or optimized. It may result in substantially slower code.
If all your data is in the first 256 code points of Unicode (i.e. the ISO-8859-1, Latin-1 character set) then it's already in normalization form C and renormalizing won't change anything.
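What normalization form C does to character data can be seen with the JDK's own java.text.Normalizer. This is a stand-in chosen so that the sketch runs without extra libraries. The serializer itself is described above as using ICU4J, and results may differ for characters added in later Unicode versions. NfcSketch is a hypothetical name.

```java
import java.text.Normalizer;

// Stand-in using java.text.Normalizer rather than ICU4J.
final class NfcSketch {

    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301"; // e followed by combining acute accent
        String latin1 = "caf\u00E9";   // already precomposed, within Latin-1
        System.out.println(decomposed.length() + " -> " + toNfc(decomposed).length());
        System.out.println(toNfc(latin1).equals(latin1));
    }
}
```

The decomposed pair is composed into the single character U+00E9, which is why normalization can change the infoset, while the Latin-1 string comes back unchanged.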
normalize
- true if normalization is performed;
false if it isn't
public boolean getUnicodeNormalizationFormC()
If true, this property indicates serialization will perform Unicode normalization on all data using normalization form C (NFC). The default is false; do not normalize.
protected final int getColumnNumber()
Returns the current column number of the output stream. This method is useful for subclasses that implement their own pretty printing strategies by inserting white space and line breaks at appropriate points.
Columns are counted based on Unicode characters, not Java chars. A surrogate pair counts as one character in this context, not two. However, a character followed by a combining character (e.g. e followed by combining accent acute) counts as two characters. This latter choice (treating combining characters like regular characters) is under review, and may change in the future if it's not too big a performance hit.
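The counting rule corresponds to counting Unicode code points rather than Java chars. A minimal sketch with the standard String.codePointCount method follows. ColumnCountSketch is a hypothetical name, and the sketch ignores line breaks, which reset the column in real output.

```java
// Hypothetical sketch: one column per Unicode code point.
final class ColumnCountSketch {

    static int columns(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String clef = "\uD834\uDD1E"; // U+1D11E, a surrogate pair in UTF-16
        String accented = "e\u0301";  // e followed by combining acute accent
        System.out.println(clef.length() + " chars, " + columns(clef) + " column(s)");
        System.out.println(accented.length() + " chars, " + columns(accented) + " column(s)");
    }
}
```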