The Bleeding Edge of XML
Elliotte Rusty Harold
XML and Web Services 2003 London
Monday, March 17, 2003
elharo@metalab.unc.edu
http://www.cafeconleche.org/
Part I: XML Infoset, Canonical XML, Digital Signatures, and Encryption
Part II: XML 1.1
Part III: XInclude
Part IV: SAX 2.1
Part V: DOM Level 3
Part VI: XOM
The Infoset is the unfortunate standard to which those in retreat from the radical and most useful implications of well-formedness have rallied. At its core the Infoset insists that there is 'more' to XML than the straightforward syntax of well-formedness. By imposing its canonical semantics the Infoset obviates the infinite other semantic outcomes which might be elaborated in particular unique circumstances from an instance of well-formed XML 1.0 syntax. The question we should be asking is not whether the Infoset has chosen the correct canonical semantics, but whether the syntactic possibilities of XML 1.0 should be curtailed in this way at all.
--Walter Perry on the xml-dev mailing list
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/css" href="song.css"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG xmlns="http://www.cafeconleche.org/namespace/song"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO
    xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
    ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <!-- The publisher is actually Polygram but I needed
       an example of a general entity reference. -->
  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
<!-- You can tell what album I was
     listening to when I wrote this example -->
<?xml-stylesheet type="text/css" href="song.css"?>
<SONG xmlns="http://www.cafeconleche.org/namespace/song" xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" HEIGHT="200" WIDTH="100" xlink:href="hotcop.jpg" xlink:show="onLoad" xlink:type="simple"></PHOTO>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER xlink:href="http://www.amrecords.com/" xlink:type="simple">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMHotCop {

  public static void main(String[] args) {

    // Xerces-specific DOM parser class
    DOMParser parser = new DOMParser();
    try {
      parser.parse("hot_cop.xml");
      // The object form of the document: a tree of DOM nodes
      Document d = parser.getDocument();
    }
    catch (SAXException e) {
      // Typically thrown when the document is not well-formed
      System.err.println(e);
    }
    catch (IOException e) {
      // The file could not be read
      System.err.println(e);
    }

  }

}
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE EncryptedData SYSTEM "song.dtd"> <?xml-stylesheet type="text/css" href="song.css"?><EncryptedData Id="ed1" Type="http://www.w3.org/2001/04/xmlenc#Element" xmlns="http://www.w3.org/2001/04/xmlenc#"> <EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/> <KeyInfo xmlns="http://www.w3.org/2000/09/xmldsig#"> <EncryptedKey xmlns="http://www.w3.org/2001/04/xmlenc#"> <EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#rsa-1_5"/> <KeyInfo xmlns="http://www.w3.org/2000/09/xmldsig#"> <KeyName>Alice</KeyName> </KeyInfo> <CipherData> <CipherValue/> </CipherData> </EncryptedKey> </KeyInfo> <CipherData> <CipherValue>yyIIMYu1mpIm+5MVokRnJ0hnfvZt/x/3ly311l6dK0v1GvynMJP1rkb+/YGfr6Zy00nL4plqxFgo5pJuVmxzj+R7q6f7sF6acfU0XBABICE9ZXfJ5gnainHVuaWbHnVPgT3fi2ohxhEmXp/JF7NhqDvsH9PULZLCIaRS9tKsrNrzdX/EQM3enQHkyc0aJAuAFLTwU710Hta7pf3qXX62i3UGSqjxy2Di8fOs+d/P4nysE9428SZmOM6fe4/m8YyRayRxMNr2RoOQIiYkiJ1krEGQzQ0XGJwIWmAR56CsMljTyT1G/2BDp39k/jCEiqARPekTwHZ1m7Pyh81nr4lnfm9lF3/NzlYe7wpnfSBp2u6IytoWWOeP27h5HTsu5jYfkRhht2h2R4nyIj07YkOsPmd9ubu3cq/SYU4DuvtKrKEIkhnYg4ZUVGjMKlffGzLNAaS2G1PRVIENJHNRoJwivY6+cPqjOhXUvioNQ/WQTOeo5cvTlJaD/od5VWGTJ75ZR8tkZfwFbop8JbhNN6ZODZNSNndnMJ1jEJeeFobOel5Vw0/ClPGh12LxkEJX/h3A+GyUtEfoAmB8ANb3xTsqiTyea1ZBJaS9hhcAFt3Ck+gTHPzwYS+y6x5qRTCfPyZS5PHvKjjIkAEXv+0p9zlQT9hBH1BJB6jXtWjd5sZAE3rMQC/7MXyXvN3ms/TFypBaQsWzKRg+JvxToErD1MtJXT1g8uZr59ubVlBcyjTWcCLMf+QUDxaY0iqPneNSGHAr1isuFc8PZOwJemYjnsySB0R8NN2LcCdFtK8IcB2+QLY7QCj8CAPy4uIZyHbCx6ojg5KWyGOIM5vmWGq6p6Tg+Y3nbc1uFOr1CbXCIbaNC9DI3N+HAcnW7439/JpMhMRa9s02RZsVqhjo4rYz04lkjI/44ffrBVsxk0/sk6XyCnZHQAwpd4y5gXofyPzW83yXA1iXZh7SQfs=</CipherValue> </CipherData> </EncryptedData><!-- You can tell what album I was listening to when I wrote this example -->
The customary form of an XML document
The canonical form of an XML document
The object form of an XML document
The encrypted form of an XML document
A W3C proposed recommendation providing "a consistent set of definitions for use in other specifications that need to refer to the information in a well-formed XML document." This is considerably weaker than originally planned.
What it used to be: A W3C standard for what is and is not significant in an XML document
Not everyone agrees that this is a good thing! or that this is the right list!
The Document Information Item
Element Information Items
Attribute Information Items
Processing Instruction Information Items
Unparsed Entity Information Items
Unexpanded Entity Information Items
Character Information Items
Comment Information Items
The Document Type Declaration Information Item
Notation Information Items
Namespace Information Items
Represents the entire document; not just the root element
Properties:
Children
One Element Information Item for the root element
One Comment Information Item for each Comment
One Processing Instruction Information Item for each Processing Instruction
Document Element
Character Encoding Scheme
Notation Declarations
Entity Declarations
Base URI
Standalone Declaration
Version Declaration
All declarations processed
<PHOTO
xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
<COMPOSER>Jacques Morali</COMPOSER>
<COMPOSER>
<PERSON>
<NAME>
<FIRST>Henri</FIRST>
<LAST>Belolo</LAST>
</NAME>
</PERSON>
</COMPOSER>
<rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#">
<rdf:Description xmlns:dc="http://purl.org/dc/"
about="http://www.ibiblio.org/examples/impressionists.xml">
<dc:title> Impressionist Paintings </dc:title>
<dc:creator> Elliotte Rusty Harold </dc:creator>
<dc:description>
A list of famous impressionist paintings organized
by painter and date
</dc:description>
<dc:date>2000-08-22</dc:date>
</rdf:Description>
</rdf:RDF>
An Element Information Item Includes:
namespace name; e.g. the absolute URI for the element's namespace
local name
prefix
children: a list of element, processing instruction, unexpanded entity, character, and comment information items, one for each element, processing instruction, unexpanded entity, data character, and comment appearing immediately within the current element
attributes: an unordered set of attribute information items, one for each of the attributes (specified or defaulted from the DTD) of this element. xmlns attributes are not included.
namespace attributes: an unordered set of attribute information items, one for each of the namespaces declared either in the start-tag of this element or defaulted from the DTD.
in-scope namespaces: An unordered set of namespace information items, one for each of the namespaces in effect for this element
base URI: The absolute URI of the external entity in which this element appears, as defined in XML Base. If this is not known, this property is null.
parent
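As a rough illustration of how these properties surface in an API, the following sketch reads them through the namespace-aware DOM that ships with the JDK. The DOM is not the Infoset, so the mapping is only approximate; the class name ElementItemDemo and the cut-down SONG document are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

class ElementItemDemo {

    // Hypothetical sample: a cut-down version of the SONG document
    static final String XML =
        "<SONG xmlns='http://www.cafeconleche.org/namespace/song' "
        + "xmlns:xlink='http://www.w3.org/1999/xlink'>"
        + "<PHOTO xlink:type='simple' xlink:href='hotcop.jpg' WIDTH='100'/>"
        + "</SONG>";

    // Parses the sample with namespace support and returns the PHOTO element
    static Element photo() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return (Element) doc.getDocumentElement().getFirstChild();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Element photo = photo();
        System.out.println("namespace name: " + photo.getNamespaceURI());
        System.out.println("local name:     " + photo.getLocalName());
        System.out.println("prefix:         " + photo.getPrefix());
        System.out.println("parent:         " + photo.getParentNode().getNodeName());
        // PHOTO carries no xmlns declarations itself, so only ordinary attributes appear
        NamedNodeMap attributes = photo.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Attr a = (Attr) attributes.item(i);
            System.out.println("attribute: {" + a.getNamespaceURI() + "}"
                + a.getLocalName() + " = " + a.getValue());
        }
    }
}
```

Unlike the Infoset, the DOM reports xmlns declarations as ordinary attributes on the element that carries them, so code that wants the Infoset view has to filter them out.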
xlink:type="simple"
xlink:href="http://www.amrecords.com/"
xlink:type = "simple"
xlink:show = "onLoad"
xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit"
WIDTH=" 100 "
HEIGHT=' 200 '
An Attribute Information Item Includes:
namespace name
local name
prefix
normalized value
specified: A flag indicating whether this attribute was actually specified in the start-tag of its element, or was defaulted from the DTD
attribute type:
ID
IDREF
IDREFS
ENTITY
ENTITIES
NMTOKEN
NMTOKENS
NOTATION
CDATA
ENUMERATED
owner element
references: only for IDREF, IDREFS, ENTITY, ENTITIES, and NOTATION type attributes; an ordered list of the things this attribute points to
<!-- The publisher is actually Polygram but I needed
an example of a general entity reference. -->
<!-- <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
A & M Records
</PUBLISHER>
<LENGTH>6:20</LENGTH>
<YEAR>1978</YEAR>
<ARTIST>Village People</ARTIST>
</SONG> -->
<!-- You can tell what album I was
listening to when I wrote this example -->
A Comment Information Item includes:
content
parent
<?robots index="yes" follow="no"?>
<?php
mysql_connect("database.unc.edu", "clerk", "password");
$result = mysql("CYNW", "SELECT LastName, FirstName FROM Employees
ORDER BY LastName, FirstName");
$i = 0;
while ($i < mysql_numrows ($result)) {
$fields = mysql_fetch_row($result);
echo "<person>$fields[1] $fields[0] </person>\r\n";
$i++;
}
mysql_close();
?>
A Processing Instruction Information Item includes:
target
content
notation
base URI
parent
A character is one Unicode character in the content of an element, attribute value, comment or processing instruction data.
A Character Information Item includes:
character code
element content whitespace
parent
Note that Unicode is not a two-byte character set
An element has one namespace information item for each namespace in scope on the element. This is not the same as the namespaces declared on the element.
A Namespace Information Item includes:
prefix
namespace name
There is no obvious representation of namespace information items in the syntax of an XML document.
These are namespace declaration attributes, not namespace information items:
xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#"
xmlns:dc="http://purl.org/dc/"
xmlns="http://www.w3.org/2000/svg"
Consider this document:
<svg:svg width="5cm" height="4cm"
xmlns:svg="http://www.w3.org/2000/svg">
<svg:desc>Two rectangles</svg:desc>
<svg:rect x="1.5cm" y="3.5cm" width="12cm" height="9.9cm"/>
<svg:rect x="2.5cm" y="2.8cm" width="3cm" height="17cm"/>
</svg:svg>
Each of the four elements has a namespace information item with the prefix svg and the namespace name http://www.w3.org/2000/svg
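The difference between a declaration and an in-scope namespace can be sketched with the DOM Level 3 method lookupNamespaceURI. The class name InScopeNamespaces is an assumption for this example, and the DOM lookup only approximates the Infoset's in-scope namespaces property.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InScopeNamespaces {

    // The SVG document from the slide, without the white space between elements
    static final String SVG =
        "<svg:svg width='5cm' height='4cm' xmlns:svg='http://www.w3.org/2000/svg'>"
        + "<svg:desc>Two rectangles</svg:desc>"
        + "<svg:rect x='1.5cm' y='3.5cm' width='12cm' height='9.9cm'/>"
        + "<svg:rect x='2.5cm' y='2.8cm' width='3cm' height='17cm'/>"
        + "</svg:svg>";

    // One line per element: is xmlns:svg declared here, and what does svg map to here?
    static List<String> report() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(SVG)));
            List<String> lines = new ArrayList<>();
            NodeList all = doc.getElementsByTagNameNS("*", "*");
            for (int i = 0; i < all.getLength(); i++) {
                Element element = (Element) all.item(i);
                lines.add(element.getNodeName()
                    + " declared-here=" + element.hasAttribute("xmlns:svg")
                    + " in-scope=" + element.lookupNamespaceURI("svg"));
            }
            return lines;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        for (String line : report()) {
            System.out.println(line);
        }
    }
}
```

Only the root element carries the declaration, but the lookup succeeds on all four elements.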
<!DOCTYPE SONG SYSTEM "song.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
A Document Type Declaration Information Item includes:
SYSTEM ID
PUBLIC ID
children: only the comment and processing instruction information items in the internal DTD subset and external DTD subsets.
parent
<!ELEMENT SONG (TITLE, PHOTO?, COMPOSER+, PRODUCER*, PUBLISHER*, LENGTH?, YEAR?, ARTIST+)>
<!ATTLIST SONG xmlns       CDATA #REQUIRED
               xmlns:xlink CDATA #REQUIRED>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT PHOTO EMPTY>
<!ATTLIST PHOTO xlink:type CDATA #FIXED "simple"
                xlink:href CDATA #REQUIRED
                xlink:show CDATA #IMPLIED
                ALT        CDATA #REQUIRED
                WIDTH      CDATA #REQUIRED
                HEIGHT     CDATA #REQUIRED
>
<!ELEMENT COMPOSER (#PCDATA)>
<!ELEMENT PRODUCER (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ATTLIST PUBLISHER xlink:type CDATA #IMPLIED
                    xlink:href CDATA #IMPLIED
>
<!ELEMENT LENGTH (#PCDATA)>
<!-- This should be a four digit year like "1999",
     not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>
<!ELEMENT ARTIST (#PCDATA)>
There is no information item for this.
Comments and processing instructions in the DTD are reported as children of the Document Type Declaration information item
Notation and general entity declarations are reported as properties of the Document information item
Attribute types and default values are reported on the actual attributes in the document instance.
Everything else is not reported!
An XML document is made up of one or more physical storage units called entities
Entity references :
Parsed internal general entity references like &amp;
Parsed external general entity references
Unparsed external general entity references
External parameter entity references
Internal parameter entity references
Reading an XML document is not the same thing as reading an XML file
The XML file contains entity references.
The XML document contains the entities' replacement text.
When you use a parser to read a document you'll get the text including characters like <. You will not see the entity references.
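A small sketch of that behavior, using the JDK's built-in DOM parser. The class name EntityExpansion and the sample PUBLISHER element are assumptions for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityExpansion {

    // What is stored in the file: entity references, not the characters themselves
    static final String FILE_TEXT = "<PUBLISHER>A &amp; M Records &lt;UK&gt;</PUBLISHER>";

    // What the application sees after parsing: the replacement text
    static String publisherText() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(FILE_TEXT)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println("file:     " + FILE_TEXT);
        System.out.println("document: " + publisherText());
    }
}
```

The parsed content contains the literal & and < characters; nothing in the result records that they were written as references.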
Two kinds of entity information items:
Unparsed Entity Information Item
Unexpanded Entity Information Items
Other entities are not reported
name
system identifier
public identifier
Notation
name
entity
parent
The internal and external DTD subsets; especially ELEMENT and ATTLIST declarations
Schema types
CDATA sections
Character references
Expanded, parsed entity references
Whether an empty element uses two tags or one
What kind of quotes surround attributes
Insignificant white space in attributes
White space that occurs between attributes
Attribute order
A schema assigns a type to each element
Schema validation produces a Post Schema Validation Infoset, PSVI for short
Schema aware applications using schema aware parsers and APIs can make use of the types of elements
A W3C proposed standard serialization format of an XML document instance
Not everyone agrees that this is a good thing! or that this is the right format! It's totally unsuitable for editors and validation.
Based on the XPath 1.0 data model
Not really Infoset compatible
Something of this nature is nonetheless clearly needed for non-XML aware tools like digital signatures, change management, hash functions, and the like.
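To see why byte-oriented tools need a canonical form, the sketch below hashes two serializations of the same PHOTO element. The class name DigestDemo and the two sample strings are assumptions for this example; it does not perform canonicalization itself.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class DigestDemo {

    // Two serializations an XML parser would treat as the same element
    static final String CUSTOMARY = "<PHOTO WIDTH='100' HEIGHT=\"200\"/>";
    static final String CANONICAL = "<PHOTO HEIGHT=\"200\" WIDTH=\"100\"></PHOTO>";

    // Base64-encoded SHA-1 digest of the UTF-8 bytes of the text
    static String sha1Base64(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(hash);
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println("customary: " + sha1Base64(CUSTOMARY));
        System.out.println("canonical: " + sha1Base64(CANONICAL));
    }
}
```

A hash function sees only bytes, so attribute order, quote style, and empty-element syntax all change the digest even though the XML content is unchanged.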
The document is encoded in UTF-8
Line breaks are normalized to a linefeed (ASCII 10, \n)
Attribute values are normalized, as if by a validating processor
Character and parsed entity references are replaced
CDATA sections are replaced with their character content
The XML and document type declarations are removed
Empty elements are converted to start tag-end tag pairs
White space outside of the document element and within start and end tags is normalized
All white space in character content is retained (except for characters removed during linefeed normalization)
Attribute value delimiters are set to double quotes
Special characters in attribute values and character content are replaced by character references
Superfluous namespace declarations are removed from each element
Default attributes are added to each element
Lexicographic order is imposed on the namespace declarations and attributes of each element
c14n.C14nDOM reads an XML document from stdin and writes the canonicalized output to stdout:
% java c14n.C14nDOM -xpath < hotcop.xml > canonicalized_hotcop.xml
-xpath option necessary to support the final draft of Canonical XML 1.0.
API in com.ibm.xml.dsig.Canonicalizer
package com.ibm.xml.dsig;
public interface Canonicalizer {
public static final String W3C
= "http://www.w3.org/TR/2000/WD-xml-c14n-20000119";
public static final java.lang.String W3C2
= "http://www.w3.org/TR/2001/REC-xml-c14n-20010315";
public static final java.lang.String W3C2WC
= "http://www.w3.org/TR/2001/REC-xml-c14n-20010315#WithComments";
public static final java.lang.String EXCLUSIVE;
public static final java.lang.String EXCLUSIVEWC;
public String getURI();
public void canonicalize(org.w3c.dom.Node node, OutputStream stream)
throws IOException;
}
Implementations include:
com.ibm.xml.dsig.transform.ExclusiveC11r
com.ibm.xml.dsig.transform.ExclusiveC11rWC
com.ibm.xml.dsig.transform.W3CCanonicalizer
com.ibm.xml.dsig.transform.W3CCanonicalizer2
com.ibm.xml.dsig.transform.W3CCanonicalizer2WC
DOMWriter
W3C/IETF Joint Recommendation, February 12, 2002
XML Signatures provide:
Integrity
Message authentication
Signer authentication
For data of any type
Signed data can be located within the XML that includes the signature or elsewhere.
An enveloped signature is enclosed inside the XML element it signs
An enveloping signature signs XML data it contains.
A detached signature signs data external to the Signature element, possibly in another document entirely.
The signature processor calculates a hash code for some data using a strong, one-way hash function.
The processor encrypts the hash code using a private key.
The verifier calculates the hash code for the data it's received.
It then decrypts the encrypted hash code using the public key to see if the hash codes match.
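The four steps above can be sketched with the JDK's java.security.Signature class, which performs the hashing and the private-key operation in one call. This sketch uses SHA-256 with RSA rather than the DSA-SHA1 combination that appears in the signature examples in these slides; the class name SignVerifyDemo is an assumption for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

class SignVerifyDemo {

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Steps 1 and 2: hash the data and sign the hash with the private key
    static byte[] sign(byte[] data, PrivateKey privateKey) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(privateKey);
            signer.update(data);
            return signer.sign();
        } catch (GeneralSecurityException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Steps 3 and 4: hash the received data and check it against the signature
    static boolean verify(byte[] data, byte[] signature, PublicKey publicKey) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(publicKey);
            verifier.update(data);
            return verifier.verify(signature);
        } catch (GeneralSecurityException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        KeyPair keys = newKeyPair();
        byte[] data = "<TITLE>Hot Cop</TITLE>".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(data, keys.getPrivate());
        byte[] tampered = "<TITLE>Cold Cop</TITLE>".getBytes(StandardCharsets.UTF_8);
        System.out.println("original verifies: " + verify(data, signature, keys.getPublic()));
        System.out.println("tampered verifies: " + verify(tampered, signature, keys.getPublic()));
    }
}
```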
The signature processor digests (calculates the hash code for) a data object.
The processor places the digest value in a Reference element inside the Signature element's SignedInfo child.
The processor canonicalizes and digests the SignedInfo element.
The processor cryptographically signs that digest and stores the result in the SignatureValue element.
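For comparison with the IBM tools shown next, here is a minimal sketch of an enveloped signature using the javax.xml.crypto.dsig API bundled with later JDKs. It is not the XML Security Suite API. The class name EnvelopedSignatureDemo, the RSA-SHA256 algorithm choice, and the sample document are assumptions for this example.

```java
import java.io.StringReader;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.util.Collections;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.DigestMethod;
import javax.xml.crypto.dsig.Reference;
import javax.xml.crypto.dsig.SignedInfo;
import javax.xml.crypto.dsig.Transform;
import javax.xml.crypto.dsig.XMLSignature;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.dom.DOMSignContext;
import javax.xml.crypto.dsig.dom.DOMValidateContext;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;
import javax.xml.crypto.dsig.spec.TransformParameterSpec;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class EnvelopedSignatureDemo {

    // Algorithm identifier chosen for this sketch (the slides' examples use dsa-sha1)
    static final String RSA_SHA256 = "http://www.w3.org/2001/04/xmldsig-more#rsa-sha256";

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Parses the XML and appends a Signature element to the root element
    static Document sign(String xml, PrivateKey privateKey) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
            // URI "" refers to the whole document; the enveloped transform excludes
            // the Signature element itself from the digest
            Reference ref = fac.newReference("",
                fac.newDigestMethod(DigestMethod.SHA256, null),
                Collections.singletonList(
                    fac.newTransform(Transform.ENVELOPED, (TransformParameterSpec) null)),
                null, null);
            SignedInfo signedInfo = fac.newSignedInfo(
                fac.newCanonicalizationMethod(CanonicalizationMethod.INCLUSIVE,
                    (C14NMethodParameterSpec) null),
                fac.newSignatureMethod(RSA_SHA256, null),
                Collections.singletonList(ref));
            XMLSignature signature = fac.newXMLSignature(signedInfo, null);
            signature.sign(new DOMSignContext(privateKey, doc.getDocumentElement()));
            return doc;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Checks both the reference digest and the signature value
    static boolean validate(Document doc, PublicKey publicKey) {
        try {
            Node sigNode = doc.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature").item(0);
            DOMValidateContext context = new DOMValidateContext(publicKey, sigNode);
            XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
            return fac.unmarshalXMLSignature(context).validate(context);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        KeyPair keys = newKeyPair();
        Document doc = sign("<SONG><TITLE>Hot Cop</TITLE></SONG>", keys.getPrivate());
        System.out.println("valid after signing: " + validate(doc, keys.getPublic()));
    }
}
```

No KeyInfo is included here, so the verifier must already hold the public key; the examples in these slides embed KeyValue and X509Data instead.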
SampleSign2 and VerifyGUI from IBM's XML Security Suite: http://www.alphaworks.ibm.com/tech/xmlsecuritysuite
First use the JDK's keytool to generate a key:
% keytool -genkey -dname "CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, S=New York, C=US" -alias elharo -storepass mypassword -keypass mykeypassword
SampleSign2 reads an XML document from stdin and writes the signature to stdout:
% java dsig.SampleSign2 elharo mypassword mykeypassword -ext file:///home/elharo/speaking/xmlone/london2003/bleeding/examples/hotcop.xml > hotcop_signature.xml
Key store: C:\Documents and Settings\Administrator\.keystore
Sign: 7030ms
VerifyGUI reads signature from stdin and warns of changes to signed content.
C:\>java dsig.VerifyGUI < hotcop_signature.xml
The signature has a KeyValue element.
The signature has one or more X509Data elements.
Checks an X509Data:
It has 1 certificate(s).
Certificate Information:
Version: 1
Validity: OK
SubjectDN: CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US
IssuerDN: CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US
Serial#: 983556890
Time to verify: 951 [msec]
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
<SignedInfo>
<CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"></CanonicalizationMethod>
<SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"></SignatureMethod>
<Reference URI="file:///home/elharo/speaking/xmlone/london2003/bleeding/examples/hotcop.xml">
<DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"></DigestMethod>
<DigestValue>i9y8sGT7gjSa0tkQI9aVsl7P6zg=</DigestValue>
</Reference>
</SignedInfo>
<SignatureValue>
RBIqdgwjHB8yufwiwScaf/L1P95u4SknSU2NLEeBH1yUAdyzjD/B3A==
</SignatureValue>
<KeyInfo>
<KeyValue>
<DSAKeyValue>
<P>
/X9TgR11EilS30qcLuzk5/YRt1I870QAwx4/gLZRJmlFXUAiUftZPY1Y+r/F9bow9s
ubVWzXgTuAHTRv8mZgt2uZUKWkn5/oBHsQIsJPu6nX/rfGG/g7V+fGqKYVDwT7g/bT
xR7DAjVUE1oWkTL2dfOuK2HXKu/yIgMZndFIAcc=
</P>
<Q>l2BQjxUjC8yykrmCouuEC/BYHPU=</Q>
<G>
9+GghdabPd7LvKtcNrhXuXmUr7v6OuqC+VdMCz0HgmdRWVeOutRZT+ZxBxCBgLRJFn
Ej6EwoFhO3zwkyjMim4TwWeotUfI0o4KOuHiuzpnWRbqN/C/ohNWLx+2J6ASQ7zKTx
vqhRkImog9/hWuWfBpKLZl6Ae1UlZAFMO/7PSSo=
</G>
<Y>
/U3X04lzUKj+2NSxcV1SHBQe8Jyvhj2sMneglMBDZ9nwdTvyuYG10uMgHYmd5Id9lr
vGbGSz2O+xBU2oh20hR5knKx4MmPZsbheKlUFrpd+3z71CzN8isfDuyvjT7hUt6Br8
zDx/N5Av8Y205khGFwgE9qkabH20u2JG4LW+LLo=
</Y>
</DSAKeyValue>
</KeyValue>
<X509Data>
<X509IssuerSerial>
<X509IssuerName>CN=Elliotte Rusty Harold,OU=Metrotech,O=Polytechnic,L=Brooklyn,ST=New York,C=US</X509IssuerName>
<X509SerialNumber>1047659081</X509SerialNumber>
</X509IssuerSerial>
<X509SubjectName>CN=Elliotte Rusty Harold,OU=Metrotech,O=Polytechnic,L=Brooklyn,ST=New York,C=US</X509SubjectName>
<X509Certificate>
MIIDLzCCAu0CBD5yAkkwCwYHKoZIzjgEAwUAMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcg
WW9yazERMA8GA1UEBxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlN
ZXRyb3RlY2gxHjAcBgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDAeFw0wMzAzMTQxNjI0NDFa
Fw0wMzA2MTIxNjI0NDFaMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcgWW9yazERMA8GA1UE
BxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlNZXRyb3RlY2gxHjAc
BgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDCCAbgwggEsBgcqhkjOOAQBMIIBHwKBgQD9f1OB
HXUSKVLfSpwu7OTn9hG3UjzvRADDHj+AtlEmaUVdQCJR+1k9jVj6v8X1ujD2y5tVbNeBO4AdNG/y
ZmC3a5lQpaSfn+gEexAiwk+7qdf+t8Yb+DtX58aophUPBPuD9tPFHsMCNVQTWhaRMvZ1864rYdcq
7/IiAxmd0UgBxwIVAJdgUI8VIwvMspK5gqLrhAvwWBz1AoGBAPfhoIXWmz3ey7yrXDa4V7l5lK+7
+jrqgvlXTAs9B4JnUVlXjrrUWU/mcQcQgYC0SRZxI+hMKBYTt88JMozIpuE8FnqLVHyNKOCjrh4r
s6Z1kW6jfwv6ITVi8ftiegEkO8yk8b6oUZCJqIPf4VrlnwaSi2ZegHtVJWQBTDv+z0kqA4GFAAKB
gQD9TdfTiXNQqP7Y1LFxXVIcFB7wnK+GPawyd6CUwENn2fB1O/K5gbXS4yAdiZ3kh32Wu8ZsZLPY
77EFTaiHbSFHmScrHgyY9mxuF4qVQWul37fPvULM3yKx8O7K+NPuFS3oGvzMPH83kC/xjbTmSEYX
CAT2qRpsfbS7Ykbgtb4sujALBgcqhkjOOAQDBQADLwAwLAIUKtIOsax3UbphktK0CnWEWz0yJ5gC
FAX5zyBcEp0+mYauptGaIjw7drSZ
</X509Certificate>
</X509Data>
</KeyInfo>
</Signature>
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
<SignedInfo>
<CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"></CanonicalizationMethod>
<SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"></SignatureMethod>
<Reference URI="#Res0">
<Transforms>
<Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"></Transform>
</Transforms>
<DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"></DigestMethod>
<DigestValue>D9oQKdI9sEcHhJM6CTHDDKVjwSo=</DigestValue>
</Reference>
</SignedInfo>
<SignatureValue>
AqD5Zjfr0S64qAMPrOtznEhFBl1bXJ7aCosaY5pPMLHzGuzN7u1doQ==
</SignatureValue>
<KeyInfo>
<KeyValue>
<DSAKeyValue>
<P>
/X9TgR11EilS30qcLuzk5/YRt1I870QAwx4/gLZRJmlFXUAiUftZPY1Y+r/F9bow9s
ubVWzXgTuAHTRv8mZgt2uZUKWkn5/oBHsQIsJPu6nX/rfGG/g7V+fGqKYVDwT7g/bT
xR7DAjVUE1oWkTL2dfOuK2HXKu/yIgMZndFIAcc=
</P>
<Q>l2BQjxUjC8yykrmCouuEC/BYHPU=</Q>
<G>
9+GghdabPd7LvKtcNrhXuXmUr7v6OuqC+VdMCz0HgmdRWVeOutRZT+ZxBxCBgLRJFn
Ej6EwoFhO3zwkyjMim4TwWeotUfI0o4KOuHiuzpnWRbqN/C/ohNWLx+2J6ASQ7zKTx
vqhRkImog9/hWuWfBpKLZl6Ae1UlZAFMO/7PSSo=
</G>
<Y>
/U3X04lzUKj+2NSxcV1SHBQe8Jyvhj2sMneglMBDZ9nwdTvyuYG10uMgHYmd5Id9lr
vGbGSz2O+xBU2oh20hR5knKx4MmPZsbheKlUFrpd+3z71CzN8isfDuyvjT7hUt6Br8
zDx/N5Av8Y205khGFwgE9qkabH20u2JG4LW+LLo=
</Y>
</DSAKeyValue>
</KeyValue>
<X509Data>
<X509IssuerSerial>
<X509IssuerName>CN=Elliotte Rusty Harold,OU=Metrotech,O=Polytechnic,L=Brooklyn,ST=New York,C=US</X509IssuerName>
<X509SerialNumber>1047659081</X509SerialNumber>
</X509IssuerSerial>
<X509SubjectName>CN=Elliotte Rusty Harold,OU=Metrotech,O=Polytechnic,L=Brooklyn,ST=New York,C=US</X509SubjectName>
<X509Certificate>
MIIDLzCCAu0CBD5yAkkwCwYHKoZIzjgEAwUAMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcg
WW9yazERMA8GA1UEBxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlN
ZXRyb3RlY2gxHjAcBgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDAeFw0wMzAzMTQxNjI0NDFa
Fw0wMzA2MTIxNjI0NDFaMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcgWW9yazERMA8GA1UE
BxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlNZXRyb3RlY2gxHjAc
BgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDCCAbgwggEsBgcqhkjOOAQBMIIBHwKBgQD9f1OB
HXUSKVLfSpwu7OTn9hG3UjzvRADDHj+AtlEmaUVdQCJR+1k9jVj6v8X1ujD2y5tVbNeBO4AdNG/y
ZmC3a5lQpaSfn+gEexAiwk+7qdf+t8Yb+DtX58aophUPBPuD9tPFHsMCNVQTWhaRMvZ1864rYdcq
7/IiAxmd0UgBxwIVAJdgUI8VIwvMspK5gqLrhAvwWBz1AoGBAPfhoIXWmz3ey7yrXDa4V7l5lK+7
+jrqgvlXTAs9B4JnUVlXjrrUWU/mcQcQgYC0SRZxI+hMKBYTt88JMozIpuE8FnqLVHyNKOCjrh4r
s6Z1kW6jfwv6ITVi8ftiegEkO8yk8b6oUZCJqIPf4VrlnwaSi2ZegHtVJWQBTDv+z0kqA4GFAAKB
gQD9TdfTiXNQqP7Y1LFxXVIcFB7wnK+GPawyd6CUwENn2fB1O/K5gbXS4yAdiZ3kh32Wu8ZsZLPY
77EFTaiHbSFHmScrHgyY9mxuF4qVQWul37fPvULM3yKx8O7K+NPuFS3oGvzMPH83kC/xjbTmSEYX
CAT2qRpsfbS7Ykbgtb4sujALBgcqhkjOOAQDBQADLwAwLAIUKtIOsax3UbphktK0CnWEWz0yJ5gC
FAX5zyBcEp0+mYauptGaIjw7drSZ
</X509Certificate>
</X509Data>
</KeyInfo>
<dsig:Object xmlns="" xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" Id="Res0"><?xml-stylesheet type="text/css" href="song.css"?><SONG xmlns="http://www.cafeconleche.org/namespace/song" xmlns:xlink="http://www.w3.org/1999/xlink">
<TITLE>Hot Cop</TITLE>
<PHOTO ALT="Victor Willis in Cop Outfit" HEIGHT="200" WIDTH="100" xlink:href="hotcop.jpg" xlink:show="onLoad" xlink:type="simple"></PHOTO>
<COMPOSER>Jacques Morali</COMPOSER>
<COMPOSER>Henri Belolo</COMPOSER>
<COMPOSER>Victor Willis</COMPOSER>
<PRODUCER>Jacques Morali</PRODUCER>
<!-- The publisher is actually Polygram but I needed
an example of a general entity reference. -->
<PUBLISHER xlink:href="http://www.amrecords.com/" xlink:type="simple">
A &amp; M Records
</PUBLISHER>
<LENGTH>6:20</LENGTH>
<YEAR>1978</YEAR>
<ARTIST>Village People</ARTIST>
</SONG><!-- You can tell what album I was
listening to when I wrote this example --></dsig:Object>
</Signature>
Can encrypt:
An XML element
The content of an XML element
Arbitrary binary data with a URI
The ciphertext can be stored in an EncryptedData element or referenced (through a URI) by an EncryptedData element.
Arbitrary encryption algorithms are supported.
Required encryption algorithms include:
AES with CMS keylength
3DES
RSA-OAEP used with AES
RSA-v1.5 used with 3DES
Required key transport algorithms include:
RSA-OAEP used with AES
RSA-v1.5 used with 3DES
Required Symmetric Key Wrap algorithms include:
AES KeyWrap
CMS-KeyWrap-3DES
From the spec:
REQUIRED TRIPLEDES
http://www.w3.org/2001/04/xmlenc#tripledes-cbc
REQUIRED AES-128
http://www.w3.org/2001/04/xmlenc#aes128-cbc
REQUIRED AES-256
http://www.w3.org/2001/04/xmlenc#aes256-cbc
OPTIONAL AES-192
http://www.w3.org/2001/04/xmlenc#aes192-cbc
REQUIRED RSA-v1.5
http://www.w3.org/2001/04/xmlenc#rsa-1_5
REQUIRED RSA-OAEP
http://www.w3.org/2001/04/xmlenc#rsa-oaep-mgf1p
OPTIONAL Diffie-Hellman
http://www.w3.org/2001/04/xmlenc#dh
REQUIRED TRIPLEDES KeyWrap
http://www.w3.org/2001/04/xmlenc#kw-tripledes
REQUIRED AES-128 KeyWrap
http://www.w3.org/2001/04/xmlenc#kw-aes128
REQUIRED AES-256 KeyWrap
http://www.w3.org/2001/04/xmlenc#kw-aes256
OPTIONAL AES-192 KeyWrap
http://www.w3.org/2001/04/xmlenc#kw-aes192
REQUIRED SHA1
http://www.w3.org/2000/09/xmldsig#sha1
RECOMMENDED SHA256
http://www.w3.org/2001/04/xmlenc#sha256
OPTIONAL SHA512
http://www.w3.org/2001/04/xmlenc#sha512
OPTIONAL RIPEMD-160
http://www.w3.org/2001/04/xmlenc#ripemd160
RECOMMENDED XML Digital Signature
http://www.w3.org/TR/2001/CR-xmldsig-core-20010419/
OPTIONAL Canonical XML with Comments
http://www.w3.org/TR/2001/REC-xml-c14n-20010315#WithComments
OPTIONAL Canonical XML (omits comments)
http://www.w3.org/TR/2001/REC-xml-c14n-20010315
REQUIRED base64
http://www.w3.org/2000/09/xmldsig#base64
Namespace URI http://www.w3.org/2001/04/xmlenc# (Normally mapped to the xenc prefix or default namespace)
Uses some elements from XML digital signatures for keys
Typical form:
<EncryptedData Id="unique_value"
Type="http://www.w3.org/2001/04/xmlenc#Element |
http://www.w3.org/2001/04/xmlenc#Content |
MIME media type URI">
<EncryptionMethod Algorithm="URI"/>
<ds:KeyInfo>
<ds:KeyName>Plain text name of key</ds:KeyName>
<ds:RetrievalMethod URI="key location"
Type="http://www.w3.org/2001/04/xmlenc#EncryptedKey" />
</ds:KeyInfo>
<CipherData>
<CipherValue>Base-64 encoded cipher text</CipherValue>
<CipherReference URI="URL of cipher text">
<Transforms>
<ds:Transform
Algorithm="http://www.w3.org/TR/1999/REC-xpath-19991116">
<ds:XPath xmlns:rep="http://www.example.org/repository">
self::text()[parent::CipherValue[@id="example1"]]
</ds:XPath>
</ds:Transform>
<ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#base64"/>
</Transforms>
</CipherReference>
</CipherData>
</EncryptedData>
At a minimum, each EncryptedData must contain a CipherData which contains either a CipherValue or a CipherReference.
Everything else is optional.
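The data flow behind a minimal EncryptedData element can be sketched with javax.crypto. This is an illustration only, not a conforming XML Encryption processor: the class name EncryptedDataSketch, the PKCS5 padding, and the choice to prepend the IV to the ciphertext are assumptions for this example, and interoperability with real implementations has not been checked.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

class EncryptedDataSketch {

    static final String XMLENC = "http://www.w3.org/2001/04/xmlenc#";

    static SecretKey newKey() {
        try {
            return KeyGenerator.getInstance("DESede").generateKey();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Encrypts the serialized element and wraps the result in the minimum structure
    static String encryptElement(String elementXml, SecretKey key) {
        try {
            byte[] iv = new byte[8];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("DESede/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
            byte[] cipherText = cipher.doFinal(elementXml.getBytes(StandardCharsets.UTF_8));
            // Assumption of this sketch: store the IV in front of the ciphertext
            byte[] ivAndCipherText = new byte[iv.length + cipherText.length];
            System.arraycopy(iv, 0, ivAndCipherText, 0, iv.length);
            System.arraycopy(cipherText, 0, ivAndCipherText, iv.length, cipherText.length);
            return "<EncryptedData xmlns=\"" + XMLENC + "\" Type=\"" + XMLENC + "Element\">"
                + "<EncryptionMethod Algorithm=\"" + XMLENC + "tripledes-cbc\"/>"
                + "<CipherData><CipherValue>"
                + Base64.getEncoder().encodeToString(ivAndCipherText)
                + "</CipherValue></CipherData></EncryptedData>";
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Pulls the Base64 text out of the CipherValue element by simple string search
    static String cipherValueOf(String encryptedData) {
        int start = encryptedData.indexOf("<CipherValue>") + "<CipherValue>".length();
        return encryptedData.substring(start, encryptedData.indexOf("</CipherValue>"));
    }

    static String decrypt(String cipherValue, SecretKey key) {
        try {
            byte[] ivAndCipherText = Base64.getDecoder().decode(cipherValue);
            byte[] iv = Arrays.copyOfRange(ivAndCipherText, 0, 8);
            byte[] cipherText = Arrays.copyOfRange(ivAndCipherText, 8, ivAndCipherText.length);
            Cipher cipher = Cipher.getInstance("DESede/CBC/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
            return new String(cipher.doFinal(cipherText), StandardCharsets.UTF_8);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        String encrypted = encryptElement("<Number>1234 5678 9012 3456</Number>", key);
        System.out.println(encrypted);
        System.out.println(decrypt(cipherValueOf(encrypted), key));
    }
}
```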
<?xml version='1.0'?>
<PaymentInfo xmlns="http://example.org/payment">
<Name>Elliotte Rusty Harold</Name>
<CreditCard Limit='1000' Currency='USD'>
<Number>1234 5678 9012 3456</Number>
<Issuer>Citibank</Issuer>
<Expiration>03/02</Expiration>
</CreditCard>
</PaymentInfo>
<?xml version='1.0'?>
<PaymentInfo xmlns="http://example.org/payment">
<Name>Elliotte Rusty Harold</Name>
<EncryptedData Type='http://www.w3.org/2001/04/xmlenc#Element'
xmlns='http://www.w3.org/2001/04/xmlenc#'>
<EncryptionMethod
Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
<CipherData>
<CipherValue>A23B45C56CABE4BE33327</CipherValue>
</CipherData>
</EncryptedData>
</PaymentInfo>
<?xml version='1.0'?>
<PaymentInfo xmlns="http://example.org/payment">
<Name>Elliotte Rusty Harold</Name>
<CreditCard Limit="1000" Currency="USD">
<EncryptedData Type="http://www.w3.org/2001/04/xmlenc#Content"
xmlns="http://www.w3.org/2001/04/xmlenc#">
<EncryptionMethod
Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
<CipherData>
<CipherValue>A23B45C56CABE4BE3</CipherValue>
</CipherData>
</EncryptedData>
</CreditCard>
</PaymentInfo>
<?xml version='1.0'?>
<PaymentInfo xmlns="http://example.org/payment">
<Name>Elliotte Rusty Harold</Name>
<CreditCard Limit='1000' Currency='USD'>
<Number>
<EncryptedData Type="http://www.w3.org/2001/04/xmlenc#Content"
xmlns="http://www.w3.org/2001/04/xmlenc#">
<EncryptionMethod
Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
<CipherData>
<CipherValue>A23B45C56CABE4BE</CipherValue>
</CipherData>
</EncryptedData>
</Number>
<Issuer>Citibank</Issuer>
<Expiration>03/02</Expiration>
</CreditCard>
</PaymentInfo>
<?xml version='1.0'?>
<EncryptedData
Type="http://www.isi.edu/in-notes/iana/assignments/media-types/text/xml"
xmlns="http://www.w3.org/2001/04/xmlenc#">
<EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
<CipherData>
<CipherValue>A23B45C56CABE4BE7687989219C4E5DEADBEEFCAFEBABE</CipherValue>
</CipherData>
</EncryptedData>
enc.XMLCipher2 reads an XML document and encrypts the part of it specified by an XPath expression using a template file:
% java enc.XMLCipher2 -e keyinfo.xml hotcop.xml /SONG/PUBLISHER template1.xml
API
Authentication
Authorization
Access Control
XML Infoset Specification: http://www.w3.org/TR/xml-infoset
Canonical XML Specification: http://www.w3.org/TR/xml-c14n
XML Signature Specification: http://www.w3.org/TR/xmldsig-core/
XML Encryption Requirements: http://www.w3.org/TR/xml-encryption-req
XML Encryption Syntax and Processing: http://www.w3.org/TR/xmlenc-core/
Discriminating against languages simply because their scripts were not encoded in Unicode 2.0 is inherently unjust.
--XML Blueberry Requirements
Changes the definition of white space
Enables native language markup in Ethiopic, Burmese, and Cambodian
Breaks compatibility with XML 1.0
Undeclare namespace prefixes
C1 controls must be represented as character references
C0 controls can be used but must be represented as character references (except for tab, line feed, and carriage return)
Most well-formed and valid XML 1.0 documents remain well-formed and valid XML 1.1 documents.
The version attribute in the XML declaration has the value 1.1:
<?xml version="1.1"?>
Characters from many more scripts are allowed as name characters, that is, in tag names, attribute names, entity names, ID attribute values, etc.
Most C0 controls (vertical tab, bell, formfeed, etc.) are now allowed in element content. However, they must be escaped as character references such as &#x7; (bell), &#xB; (vertical tab), etc.
NEL (0x85) is now allowed as white space along with carriage return and linefeed.
All of these cause major interoperability problems and should not be used.
XML 1.0 defines white space thusly:
[3] S ::= (#x20 | #x9 | #xD | #xA)+
With XML 1.1 this is the same, but the parser converts #x85 to #xA on input
Supports IBM mainframe editors
Breaks everybody else's software
Currently only scripts defined in Unicode 2.0 are allowed in XML element and attribute names
All scripts defined in Unicode are allowed in element and attribute content
Unicode 3.0 adds:
Ethiopic (Amharic, Geez, etc.)
Burmese
Cambodian
Mongolian
Dhivehi
Yi syllabary
Also:
Cherokee
Canadian aboriginal languages
Perhaps:
Japanese
Cantonese
Unicode 3.1 adds:
Deseret
Old Italic
Gothic
Unicode 4.0 adds:
Linear B
Ugaritic
Shavian
Osmanya
Is this enough to justify breaking compatibility?
XML 1.0 explicitly lists them; everything not permitted is forbidden
XML 1.1:
[4] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#x02FF]
| [#x0370-#x037D] | [#x037F-#x2027]
| [#x202A-#x218F] | [#x2800-#xD7FF]
| [#xE000-#xFDCF] | [#xFDE0-#xFFEF]
| [#x10000-#x10FFFF]
[4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F]
Many of these characters aren't defined yet in Unicode
Many of these characters are very surprising; e.g. musical and mathematical symbols
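To see how permissive these productions are, here is a rough Java sketch of a name checker. It copies the ranges exactly as printed on this slide; the ranges in the final Recommendation may differ, and the class and method names are invented for illustration.

```java
final class Xml11Names {

    // Ranges of production [4] as shown on the slide (inclusive bounds)
    private static final int[][] START_RANGES = {
        {':', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
        {0xC0, 0x02FF}, {0x0370, 0x037D}, {0x037F, 0x2027},
        {0x202A, 0x218F}, {0x2800, 0xD7FF}, {0xE000, 0xFDCF},
        {0xFDE0, 0xFFEF}, {0x10000, 0x10FFFF}
    };

    static boolean isNameStartChar(int codePoint) {
        for (int[] range : START_RANGES) {
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    // Production [4a]: everything above plus a few extra characters
    static boolean isNameChar(int codePoint) {
        return isNameStartChar(codePoint)
            || codePoint == '-' || codePoint == '.'
            || (codePoint >= '0' && codePoint <= '9')
            || codePoint == 0xB7
            || (codePoint >= 0x0300 && codePoint <= 0x036F);
    }

    static boolean isName(String candidate) {
        if (candidate.isEmpty()) {
            return false;
        }
        int first = candidate.codePointAt(0);
        if (!isNameStartChar(first)) {
            return false;
        }
        // Walk by code point so characters beyond the BMP are handled
        for (int i = Character.charCount(first); i < candidate.length(); ) {
            int codePoint = candidate.codePointAt(i);
            if (!isNameChar(codePoint)) {
                return false;
            }
            i += Character.charCount(codePoint);
        }
        return true;
    }
}
```

With these ranges, `isNameStartChar(0x1D11E)` is true: a musical G clef can start an element name.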
1.0.1
Mandate non-1.1 for documents that don't use 1.1
Well-formedness error?
Non-fatal error?
All of these were rejected by the working group.
Requires XML 1.1
Uses IRIs rather than URIs
Can undeclare prefixes
Internationalized Resource Identifiers
Not limited to ASCII
For example,
http://www.example.fr/présidence
http://www.example.he/λεξικό
Percent escapes are not decoded when comparing. These are all different:
http://www.example.fr/présidence
http://www.example.fr/pr%a9sidence
http://www.example.fr/pr%A9sidence
XML escapes are decoded. These are the same:
xmlns:λεξ="http://www.example.he/λεξικό"
xmlns:λεξ="http://www.example.he/λεξικ&#x3CC;"
xmlns:λεξ="http://www.example.he/λεξικ&#972;"
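Both rules can be checked with any namespace-aware parser. The following is a minimal sketch using the JAXP DOM parser bundled with the JDK; the class name, method names, and the example.org test URIs are assumptions made for the illustration. The parser resolves character references before the application sees the namespace name, but it never touches percent escapes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NamespaceNameComparison {

    /** Returns the namespace name of the root element as the parser reports it. */
    static String rootNamespace(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Namespace names are compared character by character, with no decoding. */
    static boolean sameNamespace(String xml1, String xml2) {
        return rootNamespace(xml1).equals(rootNamespace(xml2));
    }
}
```

For example, documents declaring `http://www.example.org/wine` and `http://www.example.org/w&#x69;ne` share a namespace, while names differing only in the case of a percent escape do not.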
xmlns:prefix="" is now legal
Important in applications such as XSLT and W3C XML Schema Language that use prefixes in attribute values and element content
Example from the spec:
<?xml version="1.1"?>
<x xmlns:n1="http://www.w3.org">
  <n1:a/> <!-- legal; the prefix n1 is bound to http://www.w3.org -->
  <x xmlns:n1="">
    <n1:a/> <!-- illegal; the prefix n1 is not bound here -->
    <x xmlns:n1="http://www.w3.org">
      <n1:a/> <!-- legal; the prefix n1 is bound again -->
    </x>
  </x>
</x>
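One way to picture the scoping in the spec's example is a stack of prefix maps in which an empty namespace name hides any outer binding. The sketch below is a toy model, not part of any parser API; the class and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class PrefixScopes {

    // One map of prefix declarations per open element, innermost first
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    void startElement(Map<String, String> declarations) {
        scopes.push(new HashMap<>(declarations));
    }

    void endElement() {
        scopes.pop();
    }

    /** Returns the namespace name bound to the prefix, or null if unbound. */
    String lookup(String prefix) {
        // ArrayDeque iterates from the most recently pushed scope outward
        for (Map<String, String> scope : scopes) {
            if (scope.containsKey(prefix)) {
                String uri = scope.get(prefix);
                // xmlns:prefix="" undeclares the prefix for this scope
                return uri.isEmpty() ? null : uri;
            }
        }
        return null;
    }
}
```

Pushing the three declarations from the example in order yields a bound, an unbound, and again a bound n1, matching the comments in the spec.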
The problem is that we're not providing the tools. We're providing the specs. That's a whole different ball game. If tools existed for actually making really interesting use of RDF and XLink and XInclude then people would use them. If IE and/or Mozilla supported the full gamut of specs, from XSLT 1.0 to XLink and XInclude (OK, so they're not quite REC's, but with time...) then you would find people using them more.--Matt Sergeant on the xml-dev mailing list
A means of including one XML document inside another, irrespective of validation.
Based on the XML Infoset; a source infoset is transformed into a result infoset
xlink:show="embed" only graphically includes, like the IMG element in HTML. It does not merge infosets.
External parsed entities:
Require a DTD
Can only handle very limited documents; i.e. not all well-formed XML documents are well-formed external parsed entities. In particular, XML declarations can be a problem, and document type declarations always are.
Don't allow unparsed text to be inserted as CDATA
XSLT document() function
Only handles XSLT
No unparsed, pure-text includes
Server side includes:
HTML only
Server dependent
Custom code or XSLT extension functions
The href attribute identifies the document (or part thereof) to be included
The include element is in the http://www.w3.org/2001/XInclude namespace.
The prefixes xinclude or xi are customary.
<book xmlns:xinclude="http://www.w3.org/2001/XInclude">
<title>Processing XML with Java</title>
<chapter><xinclude:include href="dom.xml"/></chapter>
<chapter><xinclude:include href="sax.xml"/></chapter>
<chapter><xinclude:include href="jdom.xml"/></chapter>
</book>
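As a rough sketch of what processing such a document looks like, the JAXP parser shipped with later JDKs can resolve include elements while it builds the DOM tree, assuming the bundled parser supports `setXIncludeAware`. The class name, the temporary files, and their contents are invented for this illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class XIncludeDemo {

    /** Parses a file with XInclude processing switched on. */
    static Document parse(File file) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);  // XInclude requires namespaces
            factory.setXIncludeAware(true);
            return factory.newDocumentBuilder().parse(file);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes a tiny book plus one chapter file, then returns the merged text. */
    static String includedChapterText() {
        try {
            Path dir = Files.createTempDirectory("xinclude-demo");
            Files.write(dir.resolve("dom.xml"),
                "<section>DOM chapter text</section>".getBytes(StandardCharsets.UTF_8));
            Path book = dir.resolve("book.xml");
            String bookXml =
                "<book xmlns:xinclude=\"http://www.w3.org/2001/XInclude\">"
                + "<chapter><xinclude:include href=\"dom.xml\"/></chapter></book>";
            Files.write(book, bookXml.getBytes(StandardCharsets.UTF_8));
            // The include element is replaced by the content of dom.xml
            return parse(book.toFile()).getDocumentElement().getTextContent();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The relative href is resolved against the location of book.xml, so the including document must be parsed from a file or URL rather than from an anonymous stream.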
parse="xml"
parse="text"
< will change to &lt; and so forth.
<slide xmlns:xinclude="http://www.w3.org/2001/XInclude">
<title>The href attribute</title>
<ul>
<li>Identifies the document to be included with a URI</li>
<li>The document at the URI replaces the <code>include</code>
element in the including document</li>
<li>The <code>xinclude</code> prefix is bound to the http://www.w3.org/2001/XInclude
namespace URI.
</li>
</ul>
<pre><code><xinclude:include parse="text" href="processing_xml_with_java.xml"/>
</code></pre>
<description>
A slide from Elliotte Rusty Harold's Bleeding Edge of XML presentation at
<host_ref/>, <date_ref/>
</description>
<last_modified>October 26, 2000</last_modified>
</slide>
Used when parse="text"
Value is the name of the text file's character encoding, as in the encoding declaration in the XML declaration
e.g. ISO-8859-1, UTF-8, UTF-16, MacRoman, etc.
<slide xmlns:xinclude="http://www.w3.org/2001/XInclude">
<title>The href attribute</title>
<ul>
<li>Identifies the document to be included with a URI</li>
<li>The document at the URI replaces the <code>include</code>
element in the including document</li>
<li>The <code>xinclude</code> prefix is bound to the http://www.w3.org/2001/XInclude
namespace URI.
</li>
</ul>
<pre><code><xinclude:include parse="text" encoding="ISO-8859-1"
href="processing_xml_with_java.xml"/>
</code></pre>
<description>
A slide from Elliotte Rusty Harold's Bleeding Edge of XML presentation at
<host_ref/>, <date_ref/>
</description>
<last_modified>October 26, 2000</last_modified>
</slide>
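What parse="text" together with encoding amounts to can be sketched as: decode the bytes with the named encoding, then escape the markup characters so the result is plain character data. The helper below is illustrative only; its name and signature are assumptions, not part of any XInclude API.

```java
import java.nio.charset.Charset;

final class TextInclusion {

    /** Decodes a resource and escapes it for insertion as character data. */
    static String includeAsText(byte[] resource, String encoding) {
        String text = new String(resource, Charset.forName(encoding));
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }
}
```

Getting the encoding wrong does not raise an error here; it silently produces the wrong characters, which is why the attribute matters for non-XML resources that cannot declare their own encoding.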
This presentation: http://www.cafeconleche.org/slides/xmlone/amsterdam2001/hypertext/
XInclude Specification: http://www.w3.org/TR/xinclude
XML Bible, Gold edition
Elliotte Rusty Harold
Hungry Minds, 2001
ISBN 0-7645-4819-0
Actually, SAX2 has ** MUCH ** better infoset support than DOM does. Yes, I've done the detailed analysis.
--David Brownell on the xml-dev mailing list
Full Infoset support
Backwards compatible with SAX2
Much less radical changes than from SAX1 to SAX2
Infoset includes a flag saying whether a given attribute value was specified in the instance document or defaulted from the DTD.
DOM also wants to know this
Solution:
package org.xml.sax.ext;
public interface Attributes2 extends Attributes {
public boolean isSpecified (int index);
public boolean isSpecified (String uri, String localName);
public boolean isSpecified (String qualifiedName);
public boolean isDeclared (int index);
public boolean isDeclared (String uri, String localName);
public boolean isDeclared (String qualifiedName);
}
The isDeclared methods return false unless the attribute was declared in the DTD.
This interface would be implemented by SAX 2.1 Attributes objects provided in startElement() callbacks
The read-only http://xml.org/sax/features/use-attributes2 feature specifies whether Attributes2 is available
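An interface with these methods is available as org.xml.sax.ext.Attributes2 in the SAX extensions bundled with current JDKs, so the idea can be tried out. The sketch below assumes the JDK's default parser hands Attributes2 objects to startElement(); the class and method names are made up.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.helpers.DefaultHandler;

final class SpecifiedAttributes {

    /** Maps each attribute name to true if it was written in the instance. */
    static Map<String, Boolean> specifiedFlags(String xml) {
        final Map<String, Boolean> flags = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Not every parser provides the extension, so test first
                if (atts instanceof Attributes2) {
                    Attributes2 atts2 = (Attributes2) atts;
                    for (int i = 0; i < atts.getLength(); i++) {
                        flags.put(atts.getQName(i), atts2.isSpecified(i));
                    }
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return flags;
    }
}
```

An attribute defaulted from an ATTLIST declaration in the internal DTD subset is reported as not specified; one typed into the start-tag is reported as specified.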
<?xml version="1.0" standalone="yes"?>
The XML Infoset includes a standalone property for documents
Not currently exposed by SAX2
Solution: Define a new read-only feature: http://xml.org/sax/features/is-standalone
<?xml version="1.0" encoding="UTF-16"?>
Infoset includes the version and encoding from the XML declaration; SAX2 does not.
Unlike standalone, these apply to all parsed entities; not just the document entity
Solution:
package org.xml.sax.ext;
public interface Locator2 extends Locator {
public String getXMLVersion ();
public String getEncoding ();
}
This would be implemented by Locator objects passed to setDocumentLocator() methods
The read-only feature http://xml.org/sax/features/use-locator2 says whether Locator2 objects are used.
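A Locator2 interface with these two methods is part of the org.xml.sax.ext package in current JDKs. A minimal sketch of reading the version from it follows, assuming the default JDK parser supplies a Locator2; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

final class DeclarationInfo {

    /** Returns the XML version of the document entity, or null if unavailable. */
    static String xmlVersion(String xml) {
        final String[] version = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // By the first start-tag the XML declaration has been read
                if (version[0] == null && locator instanceof Locator2) {
                    version[0] = ((Locator2) locator).getXMLVersion();
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return version[0];
    }
}
```

The locator is queried inside a callback because its values are only meaningful while the parser is reporting an event.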
To make matters worse, there can be as many as three encodings:
What's declared in the document using an encoding declaration in the XML declaration
The MIME type encoding, as specified by the HTTP header
The name of the encoding used by a java.io.InputStreamReader (UTF8 vs. UTF-8)
There's no way to find out what features and properties a given XMLReader recognizes.
Solution: Define two new read-only properties:
XMLReader
.
XMLReader
.
Or perhaps a method instead of a property?
The DeclHandler and LexicalHandler extension handlers are not supported by the DefaultHandler convenience class.
Solution:
Define a new org.xml.sax.ext class implementing those two interfaces, inheriting from org.xml.sax.helpers.DefaultHandler
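import java.io.IOException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DeclHandler;
import org.xml.sax.ext.EntityResolver2;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.DefaultHandler;

public class DefaultHandler2 extends DefaultHandler
    implements DeclHandler, LexicalHandler, EntityResolver2 {

  // LexicalHandler methods
  public void startDTD(String name, String publicId, String systemId)
      throws SAXException {}
  public void endDTD() throws SAXException {}
  public void startEntity(String name) throws SAXException {}
  public void endEntity(String name) throws SAXException {}
  public void startCDATA() throws SAXException {}
  public void endCDATA() throws SAXException {}
  public void comment(char[] ch, int start, int length)
      throws SAXException {}

  // DeclHandler methods
  public void elementDecl(String name, String model)
      throws SAXException {}
  public void attributeDecl(String elementName,
      String attributeName, String type,
      String valueDefault, String value)
      throws SAXException {}
  public void internalEntityDecl(String name, String value)
      throws SAXException {}
  public void externalEntityDecl(String name, String publicID,
      String systemID) throws SAXException {}

  // EntityResolver2 methods; returning null leaves the decision to the parser
  public InputSource getExternalSubset(String name, String baseURI)
      throws SAXException, IOException { return null; }
  public InputSource resolveEntity(String name, String publicID,
      String baseURI, String systemID)
      throws SAXException, IOException { return null; }
}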
Alternately, update DefaultHandler.
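A class of this name and shape is available as org.xml.sax.ext.DefaultHandler2 in current JDKs. Here is a small sketch of why it is convenient: one object serves as both content handler and lexical handler, and only the callback of interest is overridden. The CommentCounter class is invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

final class CommentCounter {

    /** Counts the comments in a document using the LexicalHandler callback. */
    static int countComments(String xml) {
        final int[] count = new int[1];
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void comment(char[] ch, int start, int length) {
                count[0]++;
            }
        };
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            // Extension handlers are registered as properties, not with setters
            reader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }
}
```

Without the convenience class, the handler would have to implement all seven LexicalHandler methods just to observe comments.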
Problem: There is no conventional way for applications to identify the version of the parser they are using, for purposes of diagnostics or other kinds of troubleshooting.
The best the JVM supports is the JDK 1.2 java.lang.Package facility, which is dependent on the JAR file metadata. It provides a partial solution, at the price of portability (JDK 1.1 APIs are much more portable) and assumptions like "one parser per package".
Solution: Define a new standard read-only property:
Returns a string identifying the reader and its version for use in diagnostics.
Parsers could support that if desired, probably using some sort of resource-based mechanism (not necessarily Package) to keep such release-specific strings out of the source code.
Open issue: Should there be separate strings to ID the reader (likely a constant value) and its version (ideally assigned in release engineering)?
The http://xml.org/sax/features/xmlns-uris feature "Controls whether, when the namespace-prefixes feature is set, the parser treats namespace declaration attributes as being in the http://www.w3.org/2000/xmlns/ namespace. By default, SAX2 conforms to the original "Namespaces in XML" Recommendation, which explicitly states that such attributes are not in any namespace. Setting this optional flag to true makes the SAX2 events conform to a later backwards-incompatible revision of that recommendation, placing those attributes in a namespace."
SAXParseException has a new getExceptionId() method to identify which kind of error is being reported:
public String getExceptionId()
Since diagnostic messages vary between parsers, these identifiers are URIs so parsers can define nonstandard IDs.
Standard IDs look like:
http://xml.org/sax/exception/xml/rule-66
http://xml.org/sax/exception/xml/wfc-PEInInternalSubset
http://xml.org/sax/exception/xml/vc-roottype
http://xml.org/sax/exception/xml/nsc-NSDeclared
http://xml.org/sax/exception/xml/rule-number indicates a BNF grammar violation by production
http://xml.org/sax/exception/xml/wfc-id indicates a well-formedness constraint violation by ID attribute
http://xml.org/sax/exception/xml/vc-id indicates a validity constraint violation by ID attribute
http://xml.org/sax/exception/xmlns/nsc-id indicates a namespace violation by ID attribute
Some well-formedness violations can be attributed to more than one grammar rule. The most specific applicable rule should be used.
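An application could sort such identifiers into broad categories with plain string handling. The sketch below only inspects the ID strings shown on this slide; it does not call getExceptionId(), which is a proposal here, and the ExceptionIds class is hypothetical.

```java
final class ExceptionIds {

    private static final String STANDARD_PREFIX = "http://xml.org/sax/exception/";

    /** Classifies an exception ID by the naming pattern of its last segment. */
    static String kindOf(String exceptionId) {
        if (!exceptionId.startsWith(STANDARD_PREFIX)) {
            // Parsers are free to define their own URIs
            return "nonstandard";
        }
        String name = exceptionId.substring(exceptionId.lastIndexOf('/') + 1);
        if (name.startsWith("rule-")) {
            return "grammar";
        }
        if (name.startsWith("wfc-")) {
            return "well-formedness";
        }
        if (name.startsWith("vc-")) {
            return "validity";
        }
        if (name.startsWith("nsc-")) {
            return "namespace";
        }
        return "unknown";
    }
}
```

Because the identifiers are stable strings rather than localized messages, a test suite could compare them across parsers.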
Subscribe to the xml-dev mailing list, http://lists.xml.org/archives/xml-dev/
of all of the things the W3C has given us, the DOM is probably the one with the least value.
--Michael Brennan on the xml-dev mailing list
DOM Level 0: what was implemented for JavaScript in Netscape 3/IE3
DOM Level 1, a W3C Standard
DOM Level 2, a W3C Standard
DOM Level 3: Several Working Drafts:
Validation
Extra (IDL) attributes on Entity, Document, Node, and Text interfaces
Document normalization
Standard means of loading and saving XML documents.
Bootstrapping new documents
Key events
DOMUserData
Node
Document
Text
Entity
Bootstrapping
Adds:
I will only show the new members. Currently, the plan is to simply add these to the existing Node interface.
Java binding:
package org.w3c.dom;
public interface Node {
public String getBaseURI();
public static final short DOCUMENT_POSITION_DISCONNECTED = 0x01;
public static final short DOCUMENT_POSITION_PRECEDING = 0x02;
public static final short DOCUMENT_POSITION_FOLLOWING = 0x04;
public static final short DOCUMENT_POSITION_CONTAINS = 0x08;
public static final short DOCUMENT_POSITION_IS_CONTAINED = 0x10;
public static final short DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC = 0x20;
public short compareDocumentPosition(Node other) throws DOMException;
public String getTextContent() throws DOMException;
public void setTextContent(String textContent) throws DOMException;
public boolean isSameNode(Node other);
public String lookupPrefix(String namespaceURI);
public boolean isDefaultNamespace(String namespaceURI);
public String lookupNamespaceURI(String prefix);
public boolean isEqualNode(Node arg);
public Node getFeature(String feature, String version);
public Object setUserData(String key, Object data, UserDataHandler handler);
public Object getUserData(String key);
}
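Several of these members can be exercised with the DOM implementation bundled with current JDKs, which includes DOM Level 3 Core. The following is a small sketch; the NodeLevel3Demo class and its helpers are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class NodeLevel3Demo {

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);  // needed for namespace lookups
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** True for two distinct nodes with the same name, attributes, and content. */
    static boolean equalButNotSame(Node a, Node b) {
        return a.isEqualNode(b) && !a.isSameNode(b);
    }

    /** True if b comes after a in document order. */
    static boolean precedes(Node a, Node b) {
        // The result is a bit mask describing where b is relative to a
        return (a.compareDocumentPosition(b)
                & Node.DOCUMENT_POSITION_FOLLOWING) != 0;
    }
}
```

getTextContent() returns the concatenated text of all descendants, replacing the recursive loop most DOM2 programs had to write by hand.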
XML documents may be built from multiple parsed entities, each of which is not necessarily a well-formed XML document, but is at least a plausible part of a well-formed XML document.
Each entity may have its own text declaration.
This is like an XML declaration without a standalone attribute and with an optional version attribute; the encoding declaration is required:
<?xml version="1.0" encoding="ISO-8859-9"?>
<?xml encoding="ISO-8859-9"?>
DOM3 adds:
Java binding:
package org.w3c.dom;
public interface Entity extends Node {
public String getActualEncoding();
public void setActualEncoding(String actualEncoding);
public String getEncoding();
public void setEncoding(String encoding);
public String getVersion();
public void setVersion(String version);
}
Adds:
<?xml version="1.0"?>
<?xml version="1.0" encoding="ISO-8859-9"?>
<?xml version="1.0" encoding="ISO-8859-9" standalone="no"?>
<?xml version="1.0" standalone="yes"?>
adoptNode()
setBaseURI()
renameNode()
A DOMErrorHandler that will be called in the event "that an error is encountered while performing an operation on a document"
Java binding:
package org.w3c.dom;
public interface Document extends Node {
public String getActualEncoding();
public void setActualEncoding(String actualEncoding);
public String getEncoding();
public void setEncoding(String encoding);
public boolean getStandalone();
public void setStandalone(boolean standalone);
public boolean getStrictErrorChecking();
public void setStrictErrorChecking(boolean strictErrorChecking);
public String getVersion();
public void setVersion(String version);
public Node adoptNode(Node source) throws DOMException;
public DOMErrorHandler getErrorHandler();
public void setErrorHandler(DOMErrorHandler errorHandler);
public String getDocumentURI();
public void setDocumentURI(String documentURI);
public void normalizeDocument();
public Node renameNode(Node n, String namespaceURI, String qualifiedName)
throws DOMException;
public DOMConfiguration getConfig();
}
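adoptNode() and renameNode() are both present in the DOM Level 3 implementation bundled with current JDKs (some of the getters and setters listed here were renamed before the final Recommendation). A hedged sketch of the two operations follows; the class and helper names are invented.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class DocumentLevel3Demo {

    /** Creates an empty document with a single root element. */
    static Document newDocument(String rootName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement(rootName));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Moves a node out of its current document and under the target's root. */
    static Node moveInto(Document target, Node node) {
        // Unlike importNode(), adoptNode() moves the node rather than copying it
        Node adopted = target.adoptNode(node);
        target.getDocumentElement().appendChild(adopted);
        return adopted;
    }

    /** Renames an element in place, keeping it out of any namespace. */
    static Node rename(Node node, String newName) {
        return node.getOwnerDocument().renameNode(node, null, newName);
    }
}
```

In DOM2 both operations required creating a new node and copying children and attributes across by hand.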
Adds:
isWhitespaceInElementContent()
wholeText(): the text of all Text nodes logically adjacent to this node; i.e. the XPath value of the text node
Java binding:
package org.w3c.dom;
public interface Text extends Node {
public boolean getIsWhitespaceInElementContent();
public String getWholeText();
public Text replaceWholeText(String content) throws DOMException;
}
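The difference between a single Text node and its whole text shows up whenever a tree has not been normalized. Here is a minimal sketch using the DOM implementation in current JDKs; the WholeTextDemo class is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

final class WholeTextDemo {

    /** Builds <p> with two adjacent Text children and returns the first one. */
    static Text firstOfTwoAdjacentTextNodes() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element p = doc.createElement("p");
            doc.appendChild(p);
            Text first = doc.createTextNode("Hello, ");
            p.appendChild(first);
            // A second append creates a separate, logically adjacent Text node
            p.appendChild(doc.createTextNode("world"));
            return first;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling getData() on the returned node gives only "Hello, ", while getWholeText() gives the text of both siblings, which is what an XPath text node would contain.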
DOM2 has no implementation-independent means to create a new Document object
Implementation-dependent methods tend to be fairly complex. For example, in Xerces-J:
DOMImplementation impl = DOMImplementationImpl.getDOMImplementation();
Document fibonacci = impl.createDocument(null, "Fibonacci_Numbers", null);
Still no language-independent means to create a new Document object
Does provide an implementation-independent method for Java only:
DOMImplementation impl = DOMImplementationRegistry.newInstance().getDOMImplementation("XML");
package org.w3c.dom.bootstrap;
public class DOMImplementationRegistry {
public final static String PROPERTY =
"org.w3c.dom.DOMImplementationSourceList";
public static DOMImplementationRegistry newInstance()
throws ClassNotFoundException, InstantiationException,
IllegalAccessException;
public DOMImplementation getDOMImplementation(String features)
throws ClassNotFoundException,
InstantiationException, IllegalAccessException, ClassCastException;
public DOMImplementationList getDOMImplementations(String features)
throws ClassNotFoundException,
InstantiationException, IllegalAccessException, ClassCastException;
public void addSource(DOMImplementationSource s)
throws ClassNotFoundException,
InstantiationException, IllegalAccessException;
}
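The registry is included in current JDKs as org.w3c.dom.bootstrap.DOMImplementationRegistry, so the bootstrap can be tried without naming a parser class. A sketch under that assumption follows; the Bootstrap class name and the "XML 3.0" feature string are choices made for the example.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

final class Bootstrap {

    /** Creates a new document without referring to any parser-specific class. */
    static Document newDocument(String rootName) {
        try {
            DOMImplementationRegistry registry =
                DOMImplementationRegistry.newInstance();
            // Ask for any implementation supporting the XML feature, version 3.0
            DOMImplementation impl = registry.getDOMImplementation("XML 3.0");
            if (impl == null) {
                throw new IllegalStateException("No suitable DOM implementation");
            }
            return impl.createDocument(null, rootName, null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For instance, `Bootstrap.newDocument("Fibonacci_Numbers")` yields the same kind of document as the Xerces-specific code, with the implementation chosen at run time.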
DOMErrorHandler
DOMLocator
Similar to SAX2's ErrorHandler interface.
A callback interface
An application implements this interface and then registers it with the setErrorHandler() method to receive warnings, errors, and fatal errors.
Java binding:
package org.w3c.dom;
public interface DOMErrorHandler {
public boolean handleError(DOMError error);
}
package org.w3c.dom;
public interface DOMError {
// ErrorSeverity
public static final short SEVERITY_WARNING = 0;
public static final short SEVERITY_ERROR = 1;
public static final short SEVERITY_FATAL_ERROR = 2;
public short getSeverity();
public String getMessage();
public String getType();
public Object getRelatedException();
public Object getRelatedData();
public DOMLocator getLocation();
}
Similar to SAX2's Locator interface.
An application can implement this interface and then register it with the setLocator() method to find out in which line and column and file a given node appears.
Java binding:
package org.w3c.dom;
public interface DOMLocator {
public int getLineNumber();
public int getColumnNumber();
public int getOffset();
public Node getRelatedNode();
public String getUri();
}
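In the API as it ships in current JDKs, a handler is registered through the "error-handler" configuration parameter rather than a setErrorHandler() method, and the locator arrives attached to each DOMError. The sketch below assumes that final API and the JDK's bundled parser; the DomErrorDemo class is invented.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.DOMLocator;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

final class DomErrorDemo {

    /** Parses the text and returns one message per reported DOMError. */
    static List<String> parseAndCollect(String xml) {
        final List<String> messages = new ArrayList<>();
        DOMErrorHandler handler = error -> {
            DOMLocator where = error.getLocation();
            int line = (where == null) ? -1 : where.getLineNumber();
            messages.add("line " + line + ": " + error.getMessage());
            return false;  // ask the parser to stop if it can
        };
        try {
            DOMImplementationLS ls = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
            LSParser parser =
                ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
            parser.getDomConfig().setParameter("error-handler", handler);
            LSInput input = ls.createLSInput();
            input.setStringData(xml);
            parser.parse(input);
        } catch (LSException e) {
            // Expected for malformed input; the handler has already been told
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return messages;
    }
}
```

A well-formed document produces an empty list; a document with mismatched tags produces at least one entry.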
Loading: parsing an existing XML document to produce a Document object
Saving: serializing a Document object into a file or onto a stream
Completely implementation dependent in DOM2
Library specific code creates a parser
The parser parses the document and returns a DOM org.w3c.dom.Document object.
The entire document is stored in memory.
DOM methods and interfaces are used to extract data from this object
This program parses with Xerces. Other parsers are different.
import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMParserMaker {

  public static void main(String[] args) {

    DOMParser parser = new DOMParser();

    for (int i = 0; i < args.length; i++) {
      try {
        parser.parse(args[i]);
        Document d = parser.getDocument();
      }
      catch (SAXException e) {
        System.err.println(e);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }

  }

}
import javax.xml.parsers.*; // JAXP
import org.xml.sax.SAXException;
import java.io.IOException;

public class JAXPParserMaker {

  public static void main(String[] args) {

    if (args.length <= 0) {
      System.out.println("Usage: java JAXPParserMaker URL");
      return;
    }
    String document = args[0];

    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      DocumentBuilder parser = factory.newDocumentBuilder();
      parser.parse(document);
      System.out.println(document + " is well-formed.");
    }
    catch (SAXException e) {
      System.out.println(document + " is not well-formed.");
    }
    catch (IOException e) {
      System.out.println(
        "Due to an IOException, the parser could not check " + document
      );
    }
    catch (FactoryConfigurationError e) {
      // JAXP suffers from excessive brain-damage caused by
      // intellectual in-breeding at Sun. (Basically the Sun
      // engineers spend way too much time talking to each other
      // and not nearly enough time talking to people outside
      // Sun.) Fortunately, you can happily ignore most of the
      // JAXP brain damage and not be any the poorer for it.
      // This, however, is one of the few problems you can't
      // avoid if you're going to use JAXP at all.
      // DocumentBuilderFactory.newInstance() should throw a
      // ClassNotFoundException if it can't locate the factory
      // class. However, what it does throw is an Error,
      // specifically a FactoryConfigurationError. Very few
      // programs are prepared to respond to errors as opposed
      // to exceptions. You should catch this error in your
      // JAXP programs as quickly as possible even though the
      // compiler won't require you to, and you should
      // never rethrow it or otherwise let it escape from the
      // method that produced it.
      System.out.println("Could not locate a factory class");
    }
    catch (ParserConfigurationException e) {
      System.out.println("Could not locate a JAXP parser");
    }

  }

}
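import org.w3c.dom.*;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.*;

public class DOM3ParserMaker {

  public static void main(String[] args) throws Exception {

    // The registry is obtained with newInstance(), and builders are
    // created with createDOMBuilder(), per the draft interfaces listed
    // in this section
    DOMImplementation impl = DOMImplementationRegistry.newInstance()
        .getDOMImplementation("XML 1.0 LS-Load 3.0");
    DOMImplementationLS implls = (DOMImplementationLS) impl;
    DOMBuilder parser
        = implls.createDOMBuilder(DOMImplementationLS.MODE_SYNCHRONOUS);

    for (int i = 0; i < args.length; i++) {
      try {
        Document d = parser.parseURI(args[i]);
      }
      catch (DOMSystemException e) {
        System.err.println(e);
      }
      catch (DOMException e) {
        System.err.println(e);
      }
      catch (Exception e) {
        // parseURI() is declared to throw Exception in the draft
        System.err.println(e);
      }
    }

  }

}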
This code will not actually compile or run until some parser supports DOM3 Load and Save.
DOMImplementationLS: a DOMImplementation that provides the factory methods for creating the objects required for loading and saving.
DOMBuilder
DOMInputSource: like SAX2's InputSource
DOMEntityResolver
DOMBuilderFilter: examines Element nodes as they are being processed during the parsing of a document, like SAX filters.
DOMWriter
DOMWriterFilter
DocumentLS
ParseErrorEvent
LSLoadEvent
LSProgressEvent
Factory interface to create new DOMBuilder and DOMWriter implementations.
Java Binding:
package org.w3c.dom.ls;
public interface DOMImplementationLS {
public static final short MODE_SYNCHRONOUS = 1;
public static final short MODE_ASYNCHRONOUS = 2;
public DOMBuilder createDOMBuilder(short mode) throws DOMException;
public DOMWriter createDOMWriter();
public DOMInputSource createDOMInputSource();
}
Use the feature "LS-Load" to find a DOMImplementation object that supports Load and Save.
Cast the DOMImplementation object to DOMImplementationLS.
DOMImplementation impl
= DOMImplementationRegistry.newInstance().getDOMImplementation("XML 1.0 LS-Load 3.0");
if (impl != null) {
  DOMImplementationLS implls = (DOMImplementationLS) impl;
  // ...
}
Provides an implementation-independent API for parsing XML documents to produce a DOM Document object.
Instances are built by the createDOMBuilder() method in DOMImplementationLS.
Java Binding:
package org.w3c.dom.ls;
public interface DOMBuilder {
public DOMEntityResolver getEntityResolver();
public void setEntityResolver(DOMEntityResolver entityResolver);
public DOMErrorHandler getErrorHandler();
public void setErrorHandler(DOMErrorHandler errorHandler);
public DOMBuilderFilter getFilter();
public void setFilter(DOMBuilderFilter filter);
public void setFeature(String name, boolean state)
throws DOMException;
public boolean canSetFeature(String name, boolean state);
public boolean getFeature(String name)
throws DOMException;
public Document parseURI(String uri) throws Exception;
public Document parse(DOMInputSource is) throws Exception;
// ACTION_TYPES
public static final short ACTION_APPEND_AS_CHILDREN = 1;
public static final short ACTION_REPLACE_CHILDREN = 2;
public static final short ACTION_INSERT_BEFORE = 3;
public static final short ACTION_INSERT_AFTER = 4;
public static final short ACTION_REPLACE = 5;
public void parseWithContext(DOMInputSource is,
Node contextNode, short action) throws DOMException;
public DOMConfiguration getConfig();
public boolean getAsync();
public boolean getBusy();
public void abort();
}
Like SAX2's InputSource class, this interface is an abstraction of all the different things (streams, files, byte arrays, sockets, URLs, etc.) from which an XML document can be read.
Java Binding:
package org.w3c.dom.ls;
public interface DOMInputSource {
public InputStream getByteStream();
public void setByteStream(InputStream in);
public Reader getCharacterStream();
public void setCharacterStream(Reader in);
public String getStringData();
public void setStringData(String data);
public String getEncoding();
public void setEncoding(String encoding);
public String getPublicId();
public void setPublicId(String publicId);
public String getSystemId();
public void setSystemId(String systemId);
}
Like SAX2's EntityResolver interface, this interface lets applications redirect references to external entities.
Java Binding:
package org.w3c.dom.ls;
public interface DOMEntityResolver {
public DOMInputSource resolveEntity(String publicID,
String systemID, String baseURI) throws DOMSystemException;
}
Provides an API for serializing (writing) a DOM document out as a sequence of bytes onto a stream, file, socket, byte array, etc.
Java Binding:
package org.w3c.dom.ls;
public interface DOMWriter {
public DOMConfiguration getConfig();
public DOMWriterFilter getFilter();
public void setFilter(DOMWriterFilter filter);
public String getEncoding();
public void setEncoding(String encoding);
public String getNewLine();
public void setNewLine(String newLine);
public boolean writeNode(OutputStream destination, Node wnode)
throws Exception;
public String writeToString(Node node) throws DOMException;
}
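Load and Save did eventually ship, though with different names from the draft shown here: in the org.w3c.dom.ls package of current JDKs, DOMBuilder became LSParser, DOMInputSource became LSInput, and DOMWriter became LSSerializer. A parse-and-serialize round trip against that final API can be sketched as follows; the LoadSaveRoundTrip class is hypothetical.

```java
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSSerializer;

final class LoadSaveRoundTrip {

    /** Parses the text into a DOM tree and serializes it back to a string. */
    static String roundTrip(String xml) {
        try {
            DOMImplementationRegistry registry =
                DOMImplementationRegistry.newInstance();
            DOMImplementationLS ls =
                (DOMImplementationLS) registry.getDOMImplementation("LS");

            // Load: the counterpart of the draft's DOMBuilder and DOMInputSource
            LSParser parser =
                ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
            LSInput input = ls.createLSInput();
            input.setStringData(xml);
            Document doc = parser.parse(input);

            // Save: the counterpart of the draft's DOMWriter
            LSSerializer serializer = ls.createLSSerializer();
            return serializer.writeToString(doc);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The serializer may prepend an XML declaration, so the output is not necessarily identical to the input even when the element content is unchanged.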
Lets applications examine nodes as they are being constructed during a parse.
As each node is examined, it may be modified or removed, or parsing may be aborted.
Java Binding:
package org.w3c.dom.ls;
public interface DOMBuilderFilter {
// Constants returned by startElement and acceptNode
public static final short FILTER_ACCEPT = 1;
public static final short FILTER_REJECT = 2;
public static final short FILTER_SKIP = 3;
public static final short FILTER_INTERRUPT = 4;
public short startElement(Element element);
public short acceptNode(Node node);
public int getWhatToShow();
}
Lets applications examine nodes as they are being output.
As each element is examined, it may be modified or removed, or output may be aborted.
Java Binding:
package org.w3c.dom.ls;
public interface DOMWriterFilter extends NodeFilter {
public int getWhatToShow();
}
An instance of the DocumentLS interface can be obtained by casting an instance of the Document interface to DocumentLS.
Java Binding:
package org.w3c.dom.ls;
import org.w3c.dom.Node;
import org.w3c.dom.DOMException;
public interface DocumentLS {
public boolean getAsync();
public void setAsync(boolean async);
public void abort();
public boolean load(String url);
public boolean loadXML(String source);
public String saveXML(Node node) throws DOMException;
}
Represents an error (of what kind?) in the document being parsed
Java Binding:
package org.w3c.dom.ls;
public interface ParseErrorEvent extends Event {
public DOMError getError();
}
Document Object Model (DOM) Level 3 Core Specification Version 1.0: http://www.w3.org/TR/DOM-Level-3-Core
Document Object Model (DOM) Level 3 Validation Specification: http://www.w3.org/TR/DOM-Level-3-Val/
Document Object Model (DOM) Level 3 Load and Save Specification: http://www.w3.org/TR/DOM-Level-3-LS/
Document Object Model (DOM) Requirements: http://www.w3.org/TR/DOM-Requirements/
Document Object Model (DOM) Level 3 Views and Formatting Specification: http://www.w3.org/TR/DOM-Level-3-Views/
Document Object Model (DOM) Level 3 XPath Specification: http://www.w3.org/TR/DOM-Level-3-XPath/
Document Object Model (DOM) Level 3 Events Specification: http://www.w3.org/TR/DOM-Level-3-Events/
This presentation: http://www.cafeconleche.org/slides/xmlone/london2003/bleeding/
XOM Web Site: http://www.cafeconleche.org/XOM
XML Infoset Specification: http://www.w3.org/TR/xml-infoset
Processing XML with Java: http://www.cafeconleche.org/books/xmljava/
XInclude Specification: http://www.w3.org/TR/xinclude