Web Services
File formats: OpenOffice, Word 12, etc.
Config files: Apple's plist format
RSS/Atom
And this must be tested!
Cannot do a straight binary compare
Cannot do a straight text compare
Must use a parser based test tool
CDATA sections vs. entity references:
<![CDATA[<Oxygen/> has an Eclipse plugin for editing XML]]>
&lt;Oxygen/> has an Eclipse plugin for editing XML
Entity references vs. numeric character references
&lt;Oxygen/> has an Eclipse plugin for editing XML
&#60;Oxygen/> has an Eclipse plugin for editing XML
Decimal vs. hexadecimal character references
&#60;Oxygen/> has an Eclipse plugin for editing XML
&#x3C;Oxygen/> has an Eclipse plugin for editing XML
Attribute order
<property name="packages" value="nu.xom.*"/>
<property value="nu.xom.*" name="packages" />
White space inside tags
<property name="packages" value="nu.xom.*"/>
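The differences listed above vanish once both documents are parsed. As a minimal sketch using only the JDK's DOM parser (the class name `ParsedEquality` and its helper methods are made up for this illustration), two strings that differ as text can still compare equal as trees:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library discussed in this talk.
class ParsedEquality {

    // Parse a string into a namespace-aware DOM tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setCoalescing(true); // report CDATA sections as ordinary text
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        doc.normalize(); // merge adjacent text nodes
        return doc;
    }

    // True if the root elements of both documents are equal as DOM trees.
    static boolean sameTree(String xml1, String xml2) throws Exception {
        return parse(xml1).getDocumentElement()
                .isEqualNode(parse(xml2).getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        String a = "<property name=\"packages\" value=\"nu.xom.*\"/>";
        String b = "<property value='nu.xom.*'  name='packages' ></property>";
        System.out.println(a.equals(b));    // false: the text differs
        System.out.println(sameTree(a, b)); // true: the parsed trees match

        String cdata = "<p><![CDATA[<Oxygen/> has an Eclipse plugin]]></p>";
        String entity = "<p>&lt;Oxygen/> has an Eclipse plugin</p>";
        String hex = "<p>&#x3C;Oxygen/> has an Eclipse plugin</p>";
        System.out.println(sameTree(cdata, entity));
        System.out.println(sameTree(entity, hex));
    }
}
```

Note that `isEqualNode` still treats namespace prefixes, comments, and boundary white space as significant, so this is only a first step.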
Unexpected content
Different order
Comments
Processing instructions
Namespace prefixes
Boundary whitespace
Does this document contain the information it needs to contain?
The question is not: "Does it not contain anything else?"
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>CFBundleDevelopmentRegion</key>
<string>English</string>
<key>CFBundleExecutable</key>
<string>thunderbird-bin</string>
<key>CFBundleGetInfoString</key>
<string>Thunderbird 1.0.2, © 2005 The Mozilla Organization</string>
<key>CFBundleIconFile</key>
<string>thunderbird</string>
<key>CFBundleIdentifier</key>
<string>org.mozilla.thunderbird</string>
<key>CFBundleInfoDictionaryVersion</key>
<string>6.0</string>
<key>CFBundleName</key>
<string>Thunderbird</string>
<key>CFBundlePackageType</key>
<string>APPL</string>
<key>CFBundleShortVersionString</key>
<string>1.0.2</string>
<key>CFBundleSignature</key>
<string>MOZM</string>
<key>CFBundleVersion</key>
<string>1.0.2</string>
<key>NSAppleScriptEnabled</key>
<true/>
</dict>
</plist>
<?xml version="1.0" encoding="MacRoman"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd" [
<!ENTITY version "1.0.2">
]>
<plist version = '1.0'>
<dict>
<key>CFBundleDevelopmentRegion</key>
<string>English</string>
<key>CFBundleExecutable</key>
<string>thunderbird-bin</string>
<key>CFBundleGetInfoString</key>
<string>Thunderbird &version;, © 2005 The Mozilla Organization</string>
<key>CFBundleIconFile</key>
<string>thunderbird</string>
<key>CFBundleIdentifier</key>
<string>org.mozilla.thunderbird</string>
<key>CFBundleInfoDictionaryVersion</key>
<string>6.0</string>
<key>CFBundleName</key>
<string>Thunderbird</string>
<key>CFBundlePackageType</key>
<string>APPL</string>
<key>CFBundleShortVersionString</key>
<string>&version;</string>
<key>CFBundleSignature</key>
<string>MOZM</string>
<key>CFBundleVersion</key>
<string>&version;</string>
<key>NSAppleScriptEnabled</key>
<true/>
</dict>
</plist>
<?xml version="1.0" encoding="MacRoman"?>
<?xml-stylesheet href="plist.css" type="text/css"?>
<!-- Removing all the white space may not make this document as easy to
read, but it could make it faster to parse since there are fewer nodes
to handle. -->
<!DOCTYPE plist SYSTEM "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0"><dict>
<key>CFBundleVersion</key><string>1.0.2</string>
<key>NSAppleScriptEnabled</key><true/>
<key>CFBundleIdentifier</key><string>org.mozilla.thunderbird</string>
<key>CFBundleInfoDictionaryVersion</key><string>6.0</string>
<key>CFBundleName</key><string>Thunderbird</string>
<key>CFBundlePackageType</key><string>APPL</string>
<key>CFBundleShortVersionString</key><string>1.0.2</string>
<key>CFBundleSignature</key><string>MOZM</string>
<key>CFBundleDevelopmentRegion</key><string>English</string>
<key>CFBundleExecutable</key><string>thunderbird-bin</string>
<key>CFBundleGetInfoString</key><string>Thunderbird 1.0.2, © 2005 The Mozilla Organization</string>
<key>CFBundleIconFile</key><string>thunderbird</string>
</dict></plist>
The InfoSet defines 11 Kinds of Information Items
The Document Information Item
Element Information Items
Attribute Information Items
Processing instruction Information Items
Unexpanded Entity Reference Information Items
Character Information Items
Comment Information Items
The Document Type Declaration Information Item
Unparsed Entity Information Items
Notation Information Items
Namespace Declaration Information Items
Not everyone agrees that this is a good thing, or that this is the right list!
An Element Information Item includes:
namespace name
local name
children: a list of element, processing instruction, unexpanded entity reference, character, and comment information items, one for each element, processing instruction, unexpanded entity reference, data character, and comment appearing immediately within the current element
attributes: an unordered set of attribute information items, one for each of the attributes (specified or defaulted from the DTD) of this element. Namespace declarations (xmlns attributes) are not included.
declared namespaces: an unordered set of namespace declaration information items, one for each of the namespaces declared either in the start-tag of this element or defaulted from the DTD.
in-scope namespaces: An unordered set of namespace declaration information items, one for each of the namespaces in effect for this element
base URI: The absolute URI of the external entity in which this element appears, as defined in XML Base. If this is not known, this property is null.
parent
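Most of these properties have direct DOM counterparts. The sketch below (the class `ElementProperties` and its methods are invented for this example) reads them from a parsed element. One mismatch is worth seeing in code: DOM keeps xmlns declarations in the attribute map, while the Infoset attributes property leaves them out.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class ElementProperties {

    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Count attributes the Infoset way: skip namespace declarations,
    // which DOM reports in the http://www.w3.org/2000/xmlns/ namespace.
    static int infosetAttributeCount(Element element) {
        NamedNodeMap attributes = element.getAttributes();
        int count = 0;
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            if (!XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attribute.getNamespaceURI())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        Element root = parseRoot(
                "<p:a xmlns:p='http://www.example.org/' id='1'><p:b/>text</p:a>");
        System.out.println("namespace name: " + root.getNamespaceURI());
        System.out.println("local name:     " + root.getLocalName());
        System.out.println("children:       " + root.getChildNodes().getLength());
        System.out.println("DOM attributes: " + root.getAttributes().getLength());
        System.out.println("Infoset attrs:  " + infosetAttributeCount(root));
        System.out.println("base URI:       " + root.getBaseURI());
        System.out.println("parent:         " + root.getParentNode().getNodeName());
    }
}
```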
The internal and external DTD subsets; especially ELEMENT and ATTLIST declarations
Whether an empty element uses two tags or one
What kind of quotes surround attributes
Insignificant white space in attributes
White space that occurs between attributes
Attribute order
CDATA sections
Parsed entities
Comments in the DTD
You may well wish to ignore more details when comparing:
Boundary whitespace
Child element order
Comments
Processing instructions
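One way to ignore some of these extra details is to prune them from both trees before comparing. The following is a rough sketch with the JDK's DOM API; the class `LenientCompare` is hypothetical, and it assumes every white-space-only text node is boundary white space, which is not true of every vocabulary. Child element order is still significant here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class LenientCompare {

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Remove comments, processing instructions, and white-space-only
    // text nodes from the subtree rooted at node, in place.
    static void strip(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // fetch before a possible removal
            short type = child.getNodeType();
            boolean blankText = type == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty();
            if (type == Node.COMMENT_NODE
                    || type == Node.PROCESSING_INSTRUCTION_NODE
                    || blankText) {
                node.removeChild(child);
            } else {
                strip(child);
            }
            child = next;
        }
    }

    static boolean equalIgnoringNoise(String xml1, String xml2) throws Exception {
        Document doc1 = parse(xml1);
        Document doc2 = parse(xml2);
        strip(doc1);
        strip(doc2);
        doc1.normalize(); // rejoin text that a removed comment had split
        doc2.normalize();
        return doc1.getDocumentElement().isEqualNode(doc2.getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        String compact = "<dict><key>CFBundleName</key><string>Thunderbird</string></dict>";
        String pretty = "<?xml-stylesheet href='plist.css' type='text/css'?>\n"
                + "<dict>\n  <!-- name -->\n  <key>CFBundleName</key>\n"
                + "  <string>Thunderbird</string>\n</dict>";
        System.out.println(equalIgnoringNoise(compact, pretty)); // true
    }
}
```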
More or less Infoset based
Supported out of the box in Java 1.4 and later
But very painful: JDOM, XOM, etc. are much easier to use
Basic approach (irrespective of API):
Parse document (possibly in a fixture)
Navigate to the piece you want to test
Use Java (or Python, or C#, or whatever) to make the test
private Document plist;
protected void setUp()
throws IOException, ParserConfigurationException, SAXException {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true); // NEVER FORGET THIS!
DocumentBuilder builder = factory.newDocumentBuilder();
plist = builder.parse(new File("thunderbirdplist.xml"));
}
public void testNoTwoKeyElementsAreAdjacentDOM() {
Element root = plist.getDocumentElement();
Element dict = (Element) root.getElementsByTagName("dict").item(0);
NodeList children = dict.getElementsByTagName("*");
for (int i = 0; i < children.getLength(); i++) {
Node element = children.item(i);
if (element.getNodeName().equals("key")) {
Node next = children.item(i+1);
assertNotNull("key has no following element", next);
assertFalse(next.getNodeName().equals("key"));
// effectively also tests that every key
// is followed by something
}
}
}
Resolves all purely syntactic differences so binary comparisons are possible.
Equal infosets compare equal; non-equal infosets compare unequal
May be too strong: counts boundary white space, element order, etc.
Occasionally too weak: misses attribute types and document type declaration
Comments are included or excluded at user option
Exclusive XML canonicalization avoids a few bugs in the spec
No DOCTYPE
No entity references except for the five predefined ones
No numeric character references
UTF-8
No XML declaration
No empty-element tags
Double quotes on normalized attribute values
No extra white space inside tags
<plist version="1.0">
<dict>
<key>CFBundleDevelopmentRegion</key>
<string>English</string>
<key>CFBundleExecutable</key>
<string>thunderbird-bin</string>
<key>CFBundleGetInfoString</key>
<string>Thunderbird 1.0.2, © 2005 The Mozilla Organization</string>
<key>CFBundleIconFile</key>
<string>thunderbird</string>
<key>CFBundleIdentifier</key>
<string>org.mozilla.thunderbird</string>
<key>CFBundleInfoDictionaryVersion</key>
<string>6.0</string>
<key>CFBundleName</key>
<string>Thunderbird</string>
<key>CFBundlePackageType</key>
<string>APPL</string>
<key>CFBundleShortVersionString</key>
<string>1.0.2</string>
<key>CFBundleSignature</key>
<string>MOZM</string>
<key>CFBundleVersion</key>
<string>1.0.2</string>
<key>NSAppleScriptEnabled</key>
<true></true>
</dict>
</plist>
import java.io.*;
import nu.xom.*;
import nu.xom.canonical.*;
import junit.framework.Assert;
public class CanonicalAssert extends Assert {
public void assertCanonicalEquals(Document expected, Document actual) {
ByteArrayOutputStream expectedBytes = new ByteArrayOutputStream();
ByteArrayOutputStream actualBytes = new ByteArrayOutputStream();
try {
Canonicalizer expectedCanonicalizer
= new Canonicalizer(expectedBytes);
expectedCanonicalizer.write(expected);
byte[] expectedArray = expectedBytes.toByteArray();
Canonicalizer actualCanonicalizer
= new Canonicalizer(actualBytes);
actualCanonicalizer.write(actual);
byte[] actualArray = actualBytes.toByteArray();
assertEquals(expectedArray.length, actualArray.length);
for (int i = 0; i < expectedArray.length; i++) {
assertEquals(expectedArray[i], actualArray[i]);
}
}
catch (IOException ex) {
fail("IOException while canonicalizing");
}
}
}
Use an XPath to select the part of the document to canonicalize
Result may not be well-formed, but will be a byte sequence
Infoset inclusions and omissions pretty much the same as with full document canonicalization
Inheritance of xml: attributes (xml:lang, xml:space, etc. are copied down from omitted ancestors)
public void assertCanonicalEquals(Document expected, Document actual, String xpath) {
ByteArrayOutputStream expectedBytes = new ByteArrayOutputStream();
ByteArrayOutputStream actualBytes = new ByteArrayOutputStream();
try {
Canonicalizer expectedCanonicalizer = new Canonicalizer(expectedBytes);
Nodes expectedNodes = expected.query(xpath);
expectedCanonicalizer.write(expectedNodes);
byte[] expectedArray = expectedBytes.toByteArray();
Canonicalizer actualCanonicalizer = new Canonicalizer(actualBytes);
Nodes actualNodes = actual.query(xpath);
actualCanonicalizer.write(actualNodes);
byte[] actualArray = actualBytes.toByteArray();
assertEquals(expectedArray.length, actualArray.length);
for (int i = 0; i < expectedArray.length; i++) {
assertEquals(expectedArray[i], actualArray[i]);
}
}
catch (IOException ex) {
fail("IOException while canonicalizing");
}
}
Same as canonical XML for full documents.
No inheritance of xml: attributes
Namespaces in scope are preserved for document subsets
Normally the better choice for document subset canonicalization
public void assertCanonicalEquals(Document expected, Document actual, String xpath) {
ByteArrayOutputStream expectedBytes = new ByteArrayOutputStream();
ByteArrayOutputStream actualBytes = new ByteArrayOutputStream();
try {
Canonicalizer expectedCanonicalizer = new Canonicalizer(
expectedBytes, Canonicalizer.EXCLUSIVE_XML_CANONICALIZATION);
Nodes expectedNodes = expected.query(xpath);
expectedCanonicalizer.write(expectedNodes);
byte[] expectedArray = expectedBytes.toByteArray();
Canonicalizer actualCanonicalizer = new Canonicalizer(
actualBytes, Canonicalizer.EXCLUSIVE_XML_CANONICALIZATION);
Nodes actualNodes = actual.query(xpath);
actualCanonicalizer.write(actualNodes);
byte[] actualArray = actualBytes.toByteArray();
assertEquals(expectedArray.length, actualArray.length);
for (int i = 0; i < expectedArray.length; i++) {
assertEquals(expectedArray[i], actualArray[i]);
}
}
catch (IOException ex) {
fail("IOException while canonicalizing");
}
}
Apache XML Security: Java and C++
Validity is not required, but it's very useful for testing
Very easy to use (as testing XML goes)
Great tool support
Well understood and documented
<!ENTITY % plistObject
"(array | data | date | dict | real | integer | string | true | false )" >
<!ELEMENT plist %plistObject;>
<!ATTLIST plist version CDATA "1.0" >
<!-- Collections -->
<!ELEMENT array (%plistObject;)*>
<!ELEMENT dict (key, %plistObject;)*>
<!ELEMENT key (#PCDATA)>
<!--- Primitive types -->
<!ELEMENT string (#PCDATA)>
<!ELEMENT data (#PCDATA)> <!-- Contents interpreted as Base-64 encoded -->
<!ELEMENT date (#PCDATA)> <!-- Contents should conform to a subset of ISO 8601
(in particular, YYYY '-' MM '-' DD 'T' HH ':' MM ':' SS 'Z'.
Smaller units may be omitted with a loss of precision) -->
<!-- Numerical primitives -->
<!ELEMENT true EMPTY> <!-- Boolean constant true -->
<!ELEMENT false EMPTY> <!-- Boolean constant false -->
<!ELEMENT real (#PCDATA)> <!-- Contents should represent a
floating point number matching
("+" | "-")? d+ ("."d*)? ("E" ("+" | "-") d+)?
where d is a digit 0-9. -->
<!ELEMENT integer (#PCDATA)> <!-- Contents should represent a (possibly signed)
integer number in base 10 -->
public void testValidOutput() throws SAXException, IOException {
File f = new File("filename.xml");
InputSource in = new InputSource(new FileInputStream(f));
XMLReader parser = XMLReaderFactory.createXMLReader();
parser.setFeature("http://xml.org/sax/features/validation", true);
parser.setErrorHandler(new ErrorHandler() {
public void warning(SAXParseException exception) {
// skip
}
public void error(SAXParseException exception) throws SAXException {
throw exception;
}
public void fatalError(SAXParseException exception) throws SAXException {
throw exception;
}
});
parser.parse(in);
}
DTD validation is always relative to DTD specified by DOCTYPE.
Sometimes you need to check a document that has no DOCTYPE.
Sometimes you want to substitute a different DTD
Use an EntityResolver
import org.xml.sax.*;
import java.io.*;
public class LocalResolver implements EntityResolver {
public InputSource resolveEntity(String publicID, String systemID)
throws SAXException, IOException {
// publicID is null when the DOCTYPE has only a system identifier
if ("-//Apple Computer//DTD PLIST 1.0//EN".equals(publicID)) {
InputStream in = new FileInputStream("plist.dtd");
return new InputSource(in);
}
else {
return null;
}
}
}
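To show how such a resolver gets wired into a validating parse, here is a self-contained sketch. It assumes a cut-down stand-in DTD held in a string rather than Apple's real plist.dtd, and the class `ResolverDemo` is invented for the example. Because the resolver answers for the public ID, the parser never goes to the network for the DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; the DTD below is a simplified stand-in.
class ResolverDemo {
    static final String PUBLIC_ID = "-//Apple Computer//DTD PLIST 1.0//EN";
    static final String DOCTYPE = "<!DOCTYPE plist PUBLIC \"" + PUBLIC_ID
            + "\" \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\">";
    static final String DTD =
            "<!ELEMENT plist (dict)>\n"
            + "<!ATTLIST plist version CDATA \"1.0\">\n"
            + "<!ELEMENT dict (key, (string | true | false))*>\n"
            + "<!ELEMENT key (#PCDATA)>\n"
            + "<!ELEMENT string (#PCDATA)>\n"
            + "<!ELEMENT true EMPTY>\n"
            + "<!ELEMENT false EMPTY>\n";

    // Returns false if the document is invalid (or malformed).
    static boolean isValid(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Substitute the in-memory DTD whenever the plist public ID is requested.
        builder.setEntityResolver((publicID, systemID) -> {
            if (!PUBLIC_ID.equals(publicID)) return null;
            InputSource dtd = new InputSource(new StringReader(DTD));
            dtd.setSystemId(systemID);
            return dtd;
        });
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException ex) { /* skip */ }
            public void error(SAXParseException ex) throws SAXException { throw ex; }
            public void fatalError(SAXParseException ex) throws SAXException { throw ex; }
        });
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException ex) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String good = DOCTYPE + "<plist version=\"1.0\"><dict>"
                + "<key>CFBundleName</key><string>Thunderbird</string></dict></plist>";
        String bad = DOCTYPE + "<plist version=\"1.0\"><dict>"
                + "<key>CFBundleName</key><key>CFBundleVersion</key></dict></plist>";
        System.out.println(isValid(good));
        System.out.println(isValid(bad));
    }
}
```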
Trickier when there's no DOCTYPE at all.
Need to add a DOCTYPE directly into the stream the parser reads.
In Java, do this with SequenceInputStream and mark and reset
Simon St. Laurent's DOCTYPEChanger: http://www.simonstl.com/projects/doctypes/
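The SequenceInputStream idea can be sketched as follows. This version makes a loud simplifying assumption: the incoming document is UTF-8 and has no XML declaration or byte order mark, so the DOCTYPE can simply be placed in front of it. Handling a declaration is where mark and reset would come in, and that part is not shown. The class `DoctypePrepender` and the inline DTD are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper; assumes the document has no XML declaration.
class DoctypePrepender {
    // A simplified stand-in DTD carried in the internal subset.
    static final String DOCTYPE =
            "<!DOCTYPE plist [\n"
            + "<!ELEMENT plist (dict)>\n"
            + "<!ATTLIST plist version CDATA \"1.0\">\n"
            + "<!ELEMENT dict (key, (string | true | false))*>\n"
            + "<!ELEMENT key (#PCDATA)>\n"
            + "<!ELEMENT string (#PCDATA)>\n"
            + "<!ELEMENT true EMPTY>\n"
            + "<!ELEMENT false EMPTY>\n"
            + "]>\n";

    // Chain the DOCTYPE bytes in front of the document bytes.
    static InputStream withDoctype(InputStream document) {
        InputStream doctype = new ByteArrayInputStream(
                DOCTYPE.getBytes(StandardCharsets.UTF_8));
        return new SequenceInputStream(doctype, document);
    }

    // Returns false if the document is invalid (or malformed).
    static boolean isValid(String xmlWithoutDoctype) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException ex) { /* skip */ }
            public void error(SAXParseException ex) throws SAXException { throw ex; }
            public void fatalError(SAXParseException ex) throws SAXException { throw ex; }
        });
        InputStream in = withDoctype(new ByteArrayInputStream(
                xmlWithoutDoctype.getBytes(StandardCharsets.UTF_8)));
        try {
            builder.parse(in);
            return true;
        } catch (SAXParseException ex) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(
                "<plist><dict><key>NSAppleScriptEnabled</key><true/></dict></plist>"));
        System.out.println(isValid("<plist><dict><string>no key</string></dict></plist>"));
    }
}
```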
Additional Data Typing
Much easier to attach different schemas
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="plist">
<xsd:complexType>
<xsd:sequence minOccurs="1" maxOccurs="1">
<xsd:element name="dict">
<xsd:complexType>
<xsd:sequence minOccurs="1" maxOccurs="123">
<xsd:element name="key" type="xsd:token"/>
<xsd:choice>
<xsd:element name="string" type="xsd:string"/>
<xsd:element name="true"></xsd:element>
<xsd:element name="false"></xsd:element>
</xsd:choice>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
<xsd:attribute name="version" type="xsd:string" fixed="1.0"/>
</xsd:complexType>
</xsd:element>
</xsd:schema>
public void testSchemaValidOutput() throws SAXException, IOException {
File f = new File("filename.xml");
InputSource in = new InputSource(new FileInputStream(f));
XMLReader parser = XMLReaderFactory.createXMLReader(
"org.apache.xerces.parsers.SAXParser");
parser.setFeature("http://xml.org/sax/features/validation", true);
parser.setFeature("http://apache.org/xml/features/validation/schema", true);
parser.setProperty(
"http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation",
"examples/plist.xsd");
// also http://apache.org/xml/properties/schema/external-schemaLocation
parser.setErrorHandler(new ErrorHandler() {
public void warning(SAXParseException exception) throws SAXException {
// skip
}
public void error(SAXParseException exception) throws SAXException {
throw exception;
}
public void fatalError(SAXParseException exception) throws SAXException {
throw exception;
}
});
parser.parse(in);
}
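The test above depends on Xerces-specific feature and property names. As a parser-neutral alternative, the same check can be sketched with the javax.xml.validation API that ships with Java 5 and later. The schema here is a trimmed version of the one above held in a string, and the class `SchemaCheck` is invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper; the schema is a simplified stand-in.
class SchemaCheck {
    static final String XSD =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
            + "<xsd:element name='plist'><xsd:complexType>"
            + "<xsd:sequence><xsd:element name='dict'><xsd:complexType>"
            + "<xsd:sequence minOccurs='0' maxOccurs='unbounded'>"
            + "<xsd:element name='key' type='xsd:token'/>"
            + "<xsd:choice>"
            + "<xsd:element name='string' type='xsd:string'/>"
            + "<xsd:element name='true'/>"
            + "<xsd:element name='false'/>"
            + "</xsd:choice>"
            + "</xsd:sequence>"
            + "</xsd:complexType></xsd:element></xsd:sequence>"
            + "<xsd:attribute name='version' type='xsd:string' fixed='1.0'/>"
            + "</xsd:complexType></xsd:element>"
            + "</xsd:schema>";

    // Returns false if the document is invalid (or malformed).
    static boolean isValid(String xml) throws SAXException, IOException {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, validation errors are thrown.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException ex) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<plist version='1.0'><dict>"
                + "<key>CFBundleName</key><string>Thunderbird</string></dict></plist>"));
        System.out.println(isValid("<plist version='2.0'><dict/></plist>"));
    }
}
```

Since the schema is attached in code rather than named by the document, swapping in a different schema for a different test is a one-line change.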
The most powerful schema language of all
Quite easy to specify different schemas
namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
plistObject =
array | data | date | dict | real | integer | \string | true | false
plist = element plist { attlist.plist, plistObject }
attlist.plist &= [ a:defaultValue = "1.0" ] attribute version { text }?
# Collections
array = element array { attlist.array, plistObject* }
attlist.array &= empty
dict = element dict { attlist.dict, (key, plistObject)* }
attlist.dict &= empty
key = element key { attlist.key, text }
attlist.key &= empty
# - Primitive types
\string = element string { attlist.string, text }
attlist.string &= empty
data = element data { attlist.data, text }
attlist.data &= empty
# Contents interpreted as Base-64 encoded
date = element date { attlist.date, text }
attlist.date &= empty
# Contents should conform to a subset of ISO 8601 (in particular, YYYY '-' MM '-' DD 'T' HH ':' MM ':' SS 'Z'. Smaller units may be omitted with a loss of precision)
# Numerical primitives
true = element true { attlist.true, empty }
attlist.true &= empty
# Boolean constant true
false = element false { attlist.false, empty }
attlist.false &= empty
# Boolean constant false
real = element real { attlist.real, text }
attlist.real &= empty
# Contents should represent a floating point number matching ("+" | "-")? d+ ("."d*)? ("E" ("+" | "-") d+)? where d is a digit 0-9.
integer = element integer { attlist.integer, text }
attlist.integer &= empty
start = plist
Not bundled with JDK; must install third party library
Can still use the javax.xml.validation API.
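public void testRELAXNGValid()
throws ParserConfigurationException, SAXException, IOException {
// some of this might be moved into fixtures
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
builderFactory.setNamespaceAware(true); // NEVER FORGET THIS!
DocumentBuilder parser = builderFactory.newDocumentBuilder();
Document document = parser.parse(new File("filename.xml"));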
SchemaFactory factory
= SchemaFactory.newInstance(XMLConstants.RELAXNG_NS_URI);
Source source = new StreamSource(new File("plist.rnc"));
Schema schema = factory.newSchema(source);
Validator validator = schema.newValidator();
validator.validate(new DOMSource(document));
// throws exception if document is invalid
}
More declarative
Query for presence (or absence) of specific content
Ignore everything else.
More robust, less specific navigation with // and the descendant axis
The boolean() function reduces many XPaths to true-false answers
Can be plugged into various APIs: DOM, XOM, JDOM, etc.
There is a CFBundleExecutable: boolean(//key[. = 'CFBundleExecutable'])
There is exactly one CFBundleIconFile: count(//key[. = 'CFBundleIconFile']) = 1
The software is copyrighted by Mozilla: contains(//key[. = 'CFBundleGetInfoString']/following-sibling::string, '© 2005 The Mozilla Organization')
The CFBundleSignature is four letters: string-length(//key[. = 'CFBundleSignature']/following-sibling::string) = 4
No two key elements are adjacent: count(//key/following-sibling::*[1]/self::key) = 0
On top of javax.xml.xpath
Bundled with Java 1.5; a standard extension for Java 1.4 and earlier
There are other frameworks you could use
import org.xml.sax.InputSource;
import javax.xml.xpath.*;
import junit.framework.*;
import java.io.*;
public class PListXPathTest extends TestCase {
private InputSource plist;
private XPath query;
protected void setUp() throws IOException {
plist = new InputSource(new FileInputStream("thunderbirdplist.xml"));
query = XPathFactory.newInstance().newXPath();
}
public void testNoTwoKeyElementsAreAdjacent()
throws XPathExpressionException {
// //key/following-sibling::*[1]/self::key is empty
Boolean result = (Boolean) query.evaluate(
"//key/following-sibling::*[1]/self::key",
plist, XPathConstants.BOOLEAN);
assertFalse(result.booleanValue());
}
}
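The other XPath constraints listed earlier can be run the same way. The sketch below evaluates all five against a document parsed once into a DOM tree, so each expression can reuse it (an InputSource wrapping a stream can be read only once). It uses no test framework, and the class `PlistXPathChecks` and the sample document are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class PlistXPathChecks {
    // The five constraints from the earlier list; \u00A9 is the copyright sign.
    static final String[] CHECKS = {
        "boolean(//key[. = 'CFBundleExecutable'])",
        "count(//key[. = 'CFBundleIconFile']) = 1",
        "contains(//key[. = 'CFBundleGetInfoString']/following-sibling::string, "
            + "'\u00A9 2005 The Mozilla Organization')",
        "string-length(//key[. = 'CFBundleSignature']/following-sibling::string) = 4",
        "count(//key/following-sibling::*[1]/self::key) = 0"
    };

    // A small made-up sample that satisfies every constraint.
    static final String SAMPLE = "<plist version='1.0'><dict>"
            + "<key>CFBundleExecutable</key><string>thunderbird-bin</string>"
            + "<key>CFBundleGetInfoString</key>"
            + "<string>Thunderbird 1.0.2, \u00A9 2005 The Mozilla Organization</string>"
            + "<key>CFBundleIconFile</key><string>thunderbird</string>"
            + "<key>CFBundleSignature</key><string>MOZM</string>"
            + "</dict></plist>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Evaluate one expression as a boolean against the document.
    static boolean holds(Document doc, String expression)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Boolean) xpath.evaluate(expression, doc, XPathConstants.BOOLEAN);
    }

    static boolean allHold(Document doc) throws XPathExpressionException {
        for (String check : CHECKS) {
            if (!holds(doc, check)) return false;
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        for (String check : CHECKS) {
            System.out.println(holds(doc, check) + "  " + check);
        }
    }
}
```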
Very convenient for authoring tests
Doesn't work so well for unit testing
Relatively hard to debug when something breaks
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:if test="not(//key[. = 'CFBundleExecutable'])">
No CFBundleExecutable
</xsl:if>
<xsl:if test="count(//key[. = 'CFBundleIconFile']) = 0">
There is no CFBundleIconFile
</xsl:if>
<xsl:if test="count(//key[. = 'CFBundleIconFile']) > 1">
There is more than one CFBundleIconFile
</xsl:if>
<xsl:if test="not(contains(//key[. = 'CFBundleGetInfoString']/following-sibling::string, '© 2005 The Mozilla Organization'))">
Missing copyright
</xsl:if>
<xsl:if test="string-length(//key[. = 'CFBundleSignature']/following-sibling::string) != 4">
The CFBundleSignature is not four letters
</xsl:if>
<xsl:if test="count(//key/following-sibling::*[1]/self::key) != 0">
Adjacent key elements
</xsl:if>
</xsl:template>
</xsl:stylesheet>
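A stylesheet like this can be driven from Java with the JDK's javax.xml.transform API: run it, and treat any output as a list of failures. The sketch below uses a shortened two-test stylesheet held in a string. The class `XsltCheck` is invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper; the stylesheet is a shortened stand-in.
class XsltCheck {
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' "
            + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'>"
            + "<xsl:if test=\"not(//key[. = 'CFBundleExecutable'])\">"
            + "No CFBundleExecutable\n</xsl:if>"
            + "<xsl:if test=\"count(//key/following-sibling::*[1]/self::key) != 0\">"
            + "Adjacent key elements\n</xsl:if>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Returns the failure messages; an empty string means every test passed.
    static String failures(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String good = "<plist><dict><key>CFBundleExecutable</key>"
                + "<string>thunderbird-bin</string></dict></plist>";
        String bad = "<plist><dict><key>CFBundleExecutable</key>"
                + "<key>CFBundleName</key></dict></plist>";
        System.out.println("[" + failures(good) + "]");
        System.out.println("[" + failures(bad) + "]");
    }
}
```

In a JUnit test the returned string can serve as both the assertion message and the value compared against the empty string.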
XPath is not Turing complete.
Some things are easier (or possible) if you don't do them in pure XPath.
Use XPath to select the relevant element: //key[. = 'CFBundleVersion']/following-sibling::string[1]
Then test its value with Java.
CFBundleVersion looks like a version string
Find it with XPath
Test it with a regular expression: \d+\.\d+(\.\d+)?
Could use XPath 2.0, but not yet widely supported
public void testCFBundleVersionFormat()
throws XPathExpressionException {
String regex = "\\d+\\.\\d+(\\.\\d+)?";
String xpath = "//key[. = 'CFBundleVersion']/following-sibling::string[1]";
String version = (String) query.evaluate(xpath, plist, XPathConstants.STRING);
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(version);
assertTrue(matcher.matches());
}
According to Schematron inventor Rick Jelliffe:
The Schematron differs in basic concept from other schema languages in that it is not based on grammars but on finding tree patterns in the parsed document. This approach allows many kinds of structures to be represented which are inconvenient and difficult in grammar-based schema languages.
Makes it easy to write tests for individual units of the XML document
XPath Based
Validator is implemented in XSLT
W3C Schemas are conservative: everything not permitted is forbidden.
Schematron is liberal: everything not forbidden is permitted.
Handles unordered structures very well
Handles descendant constraints very well
Almost self-documenting
A schema contains a title and a pattern
Each pattern contains rule child elements
Each rule contains assert and report elements and has a context attribute
Each assert and report element has a test attribute containing an XPath expression which returns a boolean.
The contents of each assert element are printed if the assertion test fails
The contents of each report element are printed if the report test succeeds
Includes all the constraints previously listed as XPaths:
<?xml version="1.0"?>
<schema xmlns="http://www.ascc.net/xml/schematron">
<title>A Schematron Schema for the Thunderbird PLists</title>
<pattern>
<rule context="plist">
<assert test="//key[. = 'CFBundleExecutable']">
There is a CFBundleExecutable.
</assert>
<assert test="count(//key[. = 'CFBundleIconFile']) = 1">
There is exactly one CFBundleIconFile
</assert>
<assert test="contains(//key[. = 'CFBundleGetInfoString']/following-sibling::string, '© 2005 The Mozilla Organization')">
The software is copyrighted by Mozilla.
</assert>
<assert test="string-length(//key[. = 'CFBundleSignature']/following-sibling::string) = 4">
The CFBundleSignature is four letters
</assert>
</rule>
</pattern>
<!-- some tests are simpler here -->
<pattern>
<rule context="key">
<assert test="name(following-sibling::*[1]) != 'key'">
No two key elements are adjacent.
</assert>
</rule>
</pattern>
</schema>
Use skeleton1-5.xsl to generate an XSLT stylesheet:
$ xsltproc skeleton1-5.xsl plist.sct
<?xml version="1.0" standalone="yes"?>
<axsl:stylesheet xmlns:axsl="http://www.w3.org/1999/XSL/Transform" xmlns:sch="http://www.ascc.net/xml/schematron" version="1.0">
<axsl:template match="*|@*" mode="schematron-get-full-path">
<axsl:apply-templates select="parent::*" mode="schematron-get-full-path"/>
<axsl:text>/</axsl:text>
<axsl:if test="count(. | ../@*) = count(../@*)">@</axsl:if>
<axsl:value-of select="name()"/>
...
Apply the stylesheet to the input documents:
$ xsltproc plist.xsl thunderbirdplist.xml
$
Or after deliberately invalidating the input:
$ xsltproc plist.xsl thunderbirdplist.xml
<?xml version="1.0"?>
The CFBundleSignature is four letters
$
You can customize the skeleton to produce different output.
The generated stylesheet can be integrated into testing like any other XSLT solution.
With a little meta work, the original Schematron schema can be integrated into the unit test.
public void testWithSchematron() throws TransformerException, IOException {
StreamSource skeleton = new StreamSource(new File("skeleton1-5.xsl"));
StreamSource schema = new StreamSource(new File("plist.sct"));
StringWriter temp = new StringWriter();
StreamResult result = new StreamResult(temp);
// generate the stylesheet
TransformerFactory factory = TransformerFactory.newInstance();
Transformer xform = factory.newTransformer(skeleton);
xform.transform(schema, result);
temp.flush();
temp.close();
String stylesheet = temp.toString();
// now flip
StringReader in = new StringReader(stylesheet);
StreamSource sheet = new StreamSource(in);
Transformer validator = factory.newTransformer(sheet);
validator.setOutputProperty("method", "text");
temp = new StringWriter();
result = new StreamResult(temp);
validator.transform(new StreamSource(new File("thunderbirdplist.xml")), result);
temp.flush();
String output = temp.toString();
// Check for no output if all tests pass.
assertEquals(output, "", output);
// note use of output for both assertion message
// and test
}
Same issue as handcrafted XSLT-stylesheet-based tests: not good for unit tests.
Hard to find the place to set the breakpoint in the debugger and hard to step through since you end up deep in the XSLT code. After all the code is really XSLT, not Java. It's like trying to debug a Python program using an assembly level debugger.
The assertion is funny. We're basically checking that the stylesheet produces no output. This requires the schema to only use asserts; no reports. It also requires the text output method.
From the XMLUnit web page:
For those of you who've got into it you'll know that test driven development is great. It gives you the confidence to change code safe in the knowledge that if something breaks you'll know about it. Except for those bits you don't know how to test. Until now XML has been one of them. Oh sure you can use
"<stuff></stuff>".equals("<stuff></stuff>");
but is that really gonna work when some joker decides to output a <stuff/>? -- damned right it's not ;-)
XML can be used for just about anything so deciding if two documents are equal to each other isn't as easy as a character for character match. Sometimes
<stuff-doc>
<stuff>
Stuff Stuff Stuff
</stuff>
<more-stuff>
Some More Stuff
</more-stuff>
</stuff-doc>
equals
<stuff-doc>
<more-stuff>
Some More Stuff</more-stuff>
<stuff>Stuff Stuff Stuff</stuff>
</stuff-doc>
and sometimes it doesn't... With XMLUnit you get the control, and you get to decide.
Developed by Jeff Martin and Tim Bacon
Open Source: BSD license
Addresses many of the issues we've been discussing today, but wraps it in a nice JUnit based framework
Based on JAXP and DOM
Can use XPath
Can parse badly formed HTML (or just use TagSoup)
import java.io.*;
import javax.xml.parsers.*;
import org.custommonkey.xmlunit.*;
import org.xml.sax.*;
public class SimpleTest extends XMLTestCase {
public void testHelloWorld()
throws SAXException, IOException, ParserConfigurationException {
String expected = "<GREETING>Hello World!</GREETING>";
String actual = "<GREETING>Hello World!</GREETING>";
assertXMLEqual(expected, actual);
}
}
public void testHelloWorld2()
throws SAXException, IOException, ParserConfigurationException {
String expected = "<?xml version='1.0'?><GREETING >Hello World!</GREETING>";
String actual = "<GREETING>Hello World!</GREETING>";
assertXMLEqual(expected, actual);
}
Can also compare two java.io.Reader objects whose content will be parsed
public void testHelloWorld3()
throws SAXException, IOException, ParserConfigurationException {
Reader in1 = new InputStreamReader(new FileInputStream("hello1.xml"), "UTF-8");
Reader in2 = new InputStreamReader(new FileInputStream("hello2.xml"), "UTF-8");
assertXMLEqual(in1, in2);
}
This is poor design. Readers do not handle XML encoding properly. Do not use these methods.
Instead compare DOM documents:
public void testHelloWorld4()
throws SAXException, IOException, ParserConfigurationException {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true); // NEVER FORGET THIS!
DocumentBuilder builder = factory.newDocumentBuilder();
Document in1 = builder.parse(new File("hello1.xml"));
Document in2 = builder.parse(new File("hello2.xml"));
assertXMLEqual(in1, in2);
}
Of course you can provide your own assertion message:
public void testHelloWorld5()
throws SAXException, IOException, ParserConfigurationException {
String expected = "<GREETING>Hello World!</GREETING>";
String actual = "<GREETING>\nHello World!\n</GREETING>";
assertXMLEqual("White space seems to count", expected, actual);
}
Identical: no DOM level differences inside the root element (XMLUnit always ignores the prolog.)
Similar: some differences allowed:
Element Order
Namespace prefixes
Attribute defaulted or present
Boundary whitespace (optional)
Similarity is what assertXMLEqual()/assertXMLNotEqual() test.
Not equal: clearly different information content
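To make the identical/similar distinction concrete for one case, namespace prefixes, here is a plain-DOM sketch. It is not XMLUnit's implementation: the class `PrefixBlindCompare` is invented for the example, and it deliberately compares only element names, nesting, and the text of leaf elements, leaving attributes out for brevity.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not XMLUnit code.
class PrefixBlindCompare {

    static Element root(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getLocalName()
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) result.add((Element) n);
        }
        return result;
    }

    // Compare by namespace URI and local name, never by prefix.
    static boolean similar(Element a, Element b) {
        if (!Objects.equals(a.getNamespaceURI(), b.getNamespaceURI())) return false;
        if (!a.getLocalName().equals(b.getLocalName())) return false;
        List<Element> childrenA = childElements(a);
        List<Element> childrenB = childElements(b);
        if (childrenA.size() != childrenB.size()) return false;
        if (childrenA.isEmpty()) {
            return a.getTextContent().equals(b.getTextContent());
        }
        for (int i = 0; i < childrenA.size(); i++) {
            if (!similar(childrenA.get(i), childrenB.get(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        Element expected = root("<a xmlns='http://www.example.org'><x/></a>");
        Element actual = root(
                "<pre:a xmlns:pre='http://www.example.org'><pre:x/></pre:a>");
        System.out.println(expected.isEqualNode(actual)); // false: prefixes differ
        System.out.println(similar(expected, actual));    // true: same names
    }
}
```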
These tests all pass:
public void testSiblingOrder()
throws SAXException, IOException, ParserConfigurationException {
String expected = "<a><x/><y/></a>";
String actual = "<a><y/><x/></a>";
assertXMLEqual("Sibling order seems to count", expected, actual);
}
public void testNamespacePrefix()
throws SAXException, IOException, ParserConfigurationException {
String expected = "<a xmlns='http://www.example.org'><x/></a>";
String actual = "<pre:a xmlns:pre='http://www.example.org'><pre:x/></pre:a>";
assertXMLEqual(expected, actual);
}
public void testDOCTYPE()
throws SAXException, IOException, ParserConfigurationException {
String expected = "<!DOCTYPE a [<!ATTLIST a b CDATA 'test'>]>\n" +
"<a><x/></a>";
String actual = "<a b='test'><x/></a>";
assertXMLEqual(expected, actual);
}
public void testCommentInProlog()
throws SAXException, IOException, ParserConfigurationException {
String expected = "<!-- test -->" +
"<a><x/></a>";
String actual = "<a><x/></a>";
assertXMLEqual(expected, actual);
}
public void testProcessingInstructionInProlog()
throws SAXException, IOException, ParserConfigurationException {
String expected = "<?xml-stylesheet type='text/css' href='file.css'?>" +
"<a><x/></a>";
String actual = "<a><x/></a>";
assertXMLEqual(expected, actual);
}
This test fails
public void testCDATA()
throws SAXException, IOException, ParserConfigurationException {
String expected = "<a>Hello</a>";
String actual = "<a><![CDATA[Hello]]></a>";
assertXMLEqual(expected, actual);
}
More detailed comparisons
Can compare:
public Diff(String control, String test) throws SAXException, IOException, ParserConfigurationException
public Diff(Reader control, Reader test) throws SAXException, IOException, ParserConfigurationException
public Diff(Document controlDoc, Document testDoc)
public Diff(String control, Transform testTransform) throws IOException, TransformerException, ParserConfigurationException, SAXException
public Diff(InputSource control, InputSource test) throws SAXException, IOException, ParserConfigurationException
public Diff(DOMSource control, DOMSource test)
Distinguishes between similarity and identity:
public boolean similar()
public boolean identical()
Supports custom rules through configurable DifferenceEngine and ElementQualifier:
public Diff(Document controlDoc, Document testDoc, DifferenceEngine comparator)
public Diff(Document controlDoc, Document testDoc, DifferenceEngine comparator, ElementQualifier elementQualifier)
public void overrideDifferenceListener(DifferenceListener delegate)
public void overrideElementQualifier(ElementQualifier delegate)
These tests all fail:
public void testSiblingOrderIdentity()
throws SAXException, IOException, ParserConfigurationException {
String expected = "<a><x/><y/></a>";
String actual = "<a><y/><x/></a>";
Diff diff = new Diff(expected, actual);
assertTrue(diff.identical());
}
public void testNamespacePrefixIdentity()
throws SAXException, IOException, ParserConfigurationException {
String expected = "<a xmlns='http://www.example.org'><x/></a>";
String actual = "<pre:a xmlns:pre='http://www.example.org'><pre:x/></pre:a>";
Diff diff = new Diff(expected, actual);
assertTrue(diff.identical());
}
public void testDOCTYPEIdentity()
throws SAXException, IOException, ParserConfigurationException {
String expected = "<!DOCTYPE a [<!ATTLIST a b CDATA 'test'>]>\n" +
"<a><x/></a>";
String actual = "<a b='test'><x/></a>";
Diff diff = new Diff(expected, actual);
assertTrue(diff.identical());
}
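The namespace prefix case shows the gap between similar and identical most directly. As a rough illustration using only JDK DOM classes (this is not XMLUnit's implementation, and the class PrefixDemo and its methods are hypothetical): the two root elements share a namespace URI and local name, yet their qualified names differ.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; it does not reproduce XMLUnit's comparison rules.
class PrefixDemo {

    // Parse a string with a namespace-aware parser and return the root element.
    static Element root(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse: " + xml, ex);
        }
    }

    // Same namespace URI and local name: the prefix is ignored.
    static boolean sameExpandedName(Element a, Element b) {
        return Objects.equals(a.getNamespaceURI(), b.getNamespaceURI())
                && a.getLocalName().equals(b.getLocalName());
    }

    // Same qualified name: the prefix must match too.
    static boolean sameQualifiedName(Element a, Element b) {
        return a.getNodeName().equals(b.getNodeName());
    }

    public static void main(String[] args) {
        Element expected = root("<a xmlns='http://www.example.org'><x/></a>");
        Element actual = root(
                "<pre:a xmlns:pre='http://www.example.org'><pre:x/></pre:a>");
        System.out.println(sameExpandedName(expected, actual));
        System.out.println(sameQualifiedName(expected, actual));
    }
}
```

A test that only cares about the expanded name should ask for similarity, not identity.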
Beware assertXMLIdentical. It's at least confusing and possibly exactly backwards.
assertXpathExists: assert that an XPath expression selects at least one node
assertXpathNotExists: assert that an XPath expression does not select any nodes
assertXpathsEqual: assert that the node-sets obtained by evaluating two XPath expressions are similar
assertXpathsNotEqual: assert that the node-sets obtained by evaluating two XPath expressions are different
assertXpathValuesEqual: assert that the string-values of two XPath expressions, evaluated against two context nodes, are similar
assertXpathValuesNotEqual: assert that the string-values of two XPath expressions are different
assertXpathEvaluatesTo: assert that the string-value of an XPath expression is equal to a specified string
private Document plist;
protected void setUp()
throws IOException, ParserConfigurationException, SAXException {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true); // NEVER FORGET THIS!
DocumentBuilder builder = factory.newDocumentBuilder();
plist = builder.parse(new File("thunderbirdplist.xml"));
}
public void testNoTwoKeyElementsAreAdjacent()
throws TransformerException {
assertXpathNotExists(
"//key/following-sibling::*[1]/self::key",
plist);
}
public void testCreatorCodeIsMOZM() throws TransformerException {
assertXpathEvaluatesTo("MOZM",
"//key[. = 'CFBundleSignature']/following-sibling::string",
plist);
}
public void testThereIsAnIcon() throws TransformerException {
assertXpathExists(
"//key[. = 'CFBundleIconFile']",
plist);
assertXpathExists(
"//key[. = 'CFBundleIconFile']/following-sibling::string",
plist);
}
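To check what these XPath expressions select without running the whole suite, the same expressions can be evaluated with the JDK's javax.xml.xpath API. This is a minimal sketch under stated assumptions: the class PlistXPathDemo, its helpers, and the cut-down inline plist are hypothetical, and the DOCTYPE is left out so the parser never tries to fetch the DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; it uses the JDK XPath API, not XMLUnit.
class PlistXPathDemo {

    // A cut-down plist with three key/string pairs and no DOCTYPE.
    static final String PLIST =
            "<plist version='1.0'><dict>"
            + "<key>CFBundleIconFile</key><string>thunderbird</string>"
            + "<key>CFBundleSignature</key><string>MOZM</string>"
            + "<key>CFBundleVersion</key><string>1.0.2</string>"
            + "</dict></plist>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse document", ex);
        }
    }

    // Number of nodes the expression selects.
    static int count(String expression, Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception ex) {
            throw new IllegalStateException("bad XPath: " + expression, ex);
        }
    }

    // String-value of the expression: for a node-set, the string-value
    // of its first node in document order.
    static String stringValue(String expression, Document doc) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception ex) {
            throw new IllegalStateException("bad XPath: " + expression, ex);
        }
    }

    public static void main(String[] args) {
        Document plist = parse(PLIST);
        System.out.println(count("//key/following-sibling::*[1]/self::key", plist));
        System.out.println(stringValue(
                "//key[. = 'CFBundleSignature']/following-sibling::string", plist));
        System.out.println(count(
                "//key[. = 'CFBundleIconFile']/following-sibling::string", plist));
    }
}
```

Note that following-sibling::string selects every later string element, not just the adjacent one; appending a step such as following-sibling::*[1]/self::string pins the check to the element immediately after the key.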
DifferenceListener interface compares two nodes and tells whether they're identical, similar, or different.
differenceFound method is invoked for non-identical nodes
This is one-way: we can say two different nodes are identical or similar, but we can't say two identical nodes aren't equal
Can't control the tree walking order or skip nodes completely
package org.custommonkey.xmlunit;
public interface DifferenceListener {
public final int RETURN_ACCEPT_DIFFERENCE = 0;
public final int RETURN_IGNORE_DIFFERENCE_NODES_IDENTICAL = 1;
public final int RETURN_IGNORE_DIFFERENCE_NODES_SIMILAR = 2;
public int differenceFound(Difference difference);
public void skippedComparison(Node control, Node test);
}
import org.custommonkey.xmlunit.*;
import org.w3c.dom.Node;
public class CDATAEqualsText implements DifferenceListener {
public int differenceFound(Difference diff) {
Node expected = diff.getControlNodeDetail().getNode();
Node actual = diff.getTestNodeDetail().getNode();
if ((expected.getNodeType() == Node.CDATA_SECTION_NODE
&& actual.getNodeType() == Node.TEXT_NODE)
||
(actual.getNodeType() == Node.CDATA_SECTION_NODE
&& expected.getNodeType() == Node.TEXT_NODE)) {
if (expected.getNodeValue().equals(actual.getNodeValue())) {
return RETURN_IGNORE_DIFFERENCE_NODES_IDENTICAL;
}
}
return RETURN_ACCEPT_DIFFERENCE;
// We could really use something like DOM's NodeFilter
// to indicate whether to process or skip the children
}
public void skippedComparison(Node expected, Node actual) {}
}
String expected = "<root>Hello</root>";
String actual = "<root><![CDATA[Hello]]></root>";
DifferenceListener listener = new CDATAEqualsText();
Diff myDiff = new Diff(expected, actual);
myDiff.overrideDifferenceListener(listener);
assertTrue(myDiff.identical());
ElementQualifier determines which nodes to compare
Important for comparing elements in different order:
package org.custommonkey.xmlunit;
import org.w3c.dom.Element;
public interface ElementQualifier {
public boolean qualifyForComparison(Element control, Element test);
}
Return true if the two elements should be compared, false otherwise
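To make the idea concrete without depending on XMLUnit, here is a sketch of how a yes/no qualifier can drive pairing of child elements, either by position or in any order. Everything in it is an assumption for illustration: the class QualifierDemo, the BY_NAME predicate, and the use of java.util.function.BiPredicate as a stand-in for the interface above. It is not how XMLUnit itself walks the tree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.BiPredicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; BiPredicate stands in for a qualifier.
class QualifierDemo {

    // Qualify two elements when namespace URI and local name agree.
    static final BiPredicate<Element, Element> BY_NAME = (control, test) ->
            Objects.equals(control.getNamespaceURI(), test.getNamespaceURI())
                    && control.getLocalName().equals(test.getLocalName());

    static Element root(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse: " + xml, ex);
        }
    }

    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Pair the i-th control child with the i-th test child.
    static boolean pairInOrder(Element control, Element test,
                               BiPredicate<Element, Element> qualifier) {
        List<Element> controls = childElements(control);
        List<Element> tests = childElements(test);
        if (controls.size() != tests.size()) {
            return false;
        }
        for (int i = 0; i < controls.size(); i++) {
            if (!qualifier.test(controls.get(i), tests.get(i))) {
                return false;
            }
        }
        return true;
    }

    // Pair each control child with the first unused test child that qualifies.
    static boolean pairInAnyOrder(Element control, Element test,
                                  BiPredicate<Element, Element> qualifier) {
        List<Element> controls = childElements(control);
        List<Element> unmatched = childElements(test);
        if (controls.size() != unmatched.size()) {
            return false;
        }
        for (Element c : controls) {
            Element found = null;
            for (Element t : unmatched) {
                if (qualifier.test(c, t)) {
                    found = t;
                    break;
                }
            }
            if (found == null) {
                return false;
            }
            unmatched.remove(found);
        }
        return true;
    }

    public static void main(String[] args) {
        Element expected = root("<a><x/><y/></a>");
        Element actual = root("<a><y/><x/></a>");
        System.out.println(pairInOrder(expected, actual, BY_NAME));
        System.out.println(pairInAnyOrder(expected, actual, BY_NAME));
    }
}
```

The qualifier only answers whether two elements belong together; whether the paired elements are then equal is a separate comparison.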
The XMLUnit class contains various global configuration methods:
package org.custommonkey.xmlunit;
public final class XMLUnit {
public static void setControlParser(String className) throws FactoryConfigurationError;
public static DocumentBuilder getControlParser() throws ParserConfigurationException;
public static DocumentBuilderFactory getControlDocumentBuilderFactory();
public static void setControlDocumentBuilderFactory(DocumentBuilderFactory factory);
public static void setTestParser(String className) throws FactoryConfigurationError;
public static DocumentBuilder getTestParser() throws ParserConfigurationException;
public static DocumentBuilderFactory getTestDocumentBuilderFactory() ;
public static void setTestDocumentBuilderFactory(DocumentBuilderFactory factory);
public static void setIgnoreWhitespace(boolean ignore);
public static boolean getIgnoreWhitespace();
}
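The setIgnoreWhitespace switch addresses boundary whitespace. What such a setting has to accomplish can be approximated with plain DOM calls: drop whitespace-only text nodes from both trees before comparing. This sketch is an assumption about the general idea, not a description of what XMLUnit does internally, and the class WhitespaceDemo and its methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; it approximates the idea with JDK DOM calls only.
class WhitespaceDemo {

    static Element root(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse: " + xml, ex);
        }
    }

    // Recursively remove text nodes that contain nothing but whitespace.
    static void stripBlankText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripBlankText(child);
            }
            child = next;
        }
    }

    // Structural comparison with whitespace left in place.
    static boolean equalExactly(String expected, String actual) {
        return root(expected).isEqualNode(root(actual));
    }

    // Structural comparison after blank text nodes are stripped from both trees.
    static boolean equalIgnoringBlankText(String expected, String actual) {
        Element control = root(expected);
        Element test = root(actual);
        stripBlankText(control);
        stripBlankText(test);
        return control.isEqualNode(test);
    }

    public static void main(String[] args) {
        String expected = "<a><x/><y/></a>";
        String actual = "<a>\n  <x/>\n  <y/>\n</a>";
        System.out.println(equalExactly(expected, actual));
        System.out.println(equalIgnoringBlankText(expected, actual));
    }
}
```

Stripping only whitespace-only nodes leaves leading and trailing spaces inside real text untouched, so text content still has to match exactly.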
This presentation: http://www.cafeconleche.org/slides/sdbestpractices2005/testingxml/
XMLUnit: http://xmlunit.sourceforge.net/
XOM: http://www.xom.nu/
Schematron: http://www.schematron.com/resources.html
Canonical XML: http://www.w3.org/TR/xml-c14n
XML Infoset: http://www.w3.org/TR/xml-infoset