The Bleeding Edge of XML
Elliotte Rusty Harold
XML and Web Services 2002 London
Monday, March 11, 2002
elharo@metalab.unc.edu
http://www.cafeconleche.org/
Part I: XML Infoset, Canonical XML, Digital Signatures, and Encryption
Part II: XML 1.1
Part III: XPath 2.0 and Beyond
Part IV: SAX 2.1
Part V: DOM Level 3
The Infoset is the unfortunate standard to which those in retreat from the radical and most useful implications of well-formedness have rallied. At its core the Infoset insists that there is 'more' to XML than the straightforward syntax of well-formedness. By imposing its canonical semantics the Infoset obviates the infinite other semantic outcomes which might be elaborated in particular unique circumstances from an instance of well-formed XML 1.0 syntax. The question we should be asking is not whether the Infoset has chosen the correct canonical semantics, but whether the syntactic possibilities of XML 1.0 should be curtailed in this way at all.
--Walter Perry on the xml-dev mailing list
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/css" href="song.css"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG xmlns="http://www.cafeconleche.org/namespace/song"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO
    xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
    ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <!-- The publisher is actually Polygram but I needed
       an example of a general entity reference. -->
  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
<!-- You can tell what album I was
     listening to when I wrote this example -->
<?xml-stylesheet type="text/css" href="song.css"?>
<SONG xmlns="http://www.cafeconleche.org/namespace/song" xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" HEIGHT="200" WIDTH="100" xlink:href="hotcop.jpg" xlink:show="onLoad" xlink:type="simple"></PHOTO>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER xlink:href="http://www.amrecords.com/" xlink:type="simple">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMHotCop {

  public static void main(String[] args) {

    DOMParser parser = new DOMParser();
    try {
      parser.parse("hot_cop.xml");
      Document d = parser.getDocument();
    }
    catch (SAXException e) {
      System.err.println(e);
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}
The customary form of an XML document
The canonical form of an XML document
The object form of an XML document
The encrypted form of an XML document
A W3C proposed recommendation providing "a consistent set of definitions for use in other specifications that need to refer to the information in a well-formed XML document." This is considerably weaker than originally planned.
What it used to be: A W3C standard for what is and is not significant in an XML document
Not everyone agrees that this is a good thing! or that this is the right list!
The Document Information Item
Element Information Items
Attribute Information Items
Processing Instruction Information Items
Unparsed Entity Information Items
Unexpanded Entity Information Items
Character Information Items
Comment Information Items
The Document Type Declaration Information Item
Notation Information Items
Namespace Information Items
Represents the entire document; not just the root element
Properties:
Children
One Element Information Item for the root element
One Comment Information Item for each Comment
One Processing Instruction Information Item for each Processing Instruction
Document Element
Character Encoding Scheme
Notation Declarations
Entity Declarations
Base URI
Standalone Declaration
Version Declaration
All declarations processed
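A rough way to see the document item's children is to walk the top level of a DOM tree. The sketch below uses the JAXP parser bundled with the JDK rather than the Xerces DOMParser shown earlier; the class name DocumentChildren and the helper describeChildren are made up for illustration, and DOM nodes are only an approximation of Infoset information items.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DocumentChildren {

    // Hypothetical helper: list the kinds of nodes that sit directly under the document node.
    static List<String> describeChildren(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> result = new ArrayList<>();
            for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
                switch (n.getNodeType()) {
                    case Node.ELEMENT_NODE:
                        result.add("element " + n.getNodeName());
                        break;
                    case Node.COMMENT_NODE:
                        result.add("comment");
                        break;
                    case Node.PROCESSING_INSTRUCTION_NODE:
                        // For a processing instruction the node name is its target.
                        result.add("processing instruction " + n.getNodeName());
                        break;
                    default:
                        result.add("other");
                        break;
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml-stylesheet type=\"text/css\" href=\"song.css\"?>"
                + "<SONG><TITLE>Hot Cop</TITLE></SONG>"
                + "<!-- You can tell what album I was listening to -->";
        describeChildren(xml).forEach(System.out::println);
    }
}
```

The root element is only one of the children; the processing instruction before it and the comment after it hang off the document as well.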
<PHOTO
xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
<COMPOSER>Jacques Morali</COMPOSER>
<COMPOSER>
<PERSON>
<NAME>
<FIRST>Henri</FIRST>
<LAST>Belolo</LAST>
</NAME>
</PERSON>
</COMPOSER>
<rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#">
<rdf:Description xmlns:dc="http://purl.org/dc/"
about="http://www.ibiblio.org/examples/impressionists.xml">
<dc:title> Impressionist Paintings </dc:title>
<dc:creator> Elliotte Rusty Harold </dc:creator>
<dc:description>
A list of famous impressionist paintings organized
by painter and date
</dc:description>
<dc:date>2000-08-22</dc:date>
</rdf:Description>
</rdf:RDF>
An Element Information Item Includes:
namespace name; e.g. the absolute URI for the element's namespace
local name
prefix
children: a list of element, processing instruction, unexpanded entity, character, and comment information items, one for each element, processing instruction, unexpanded entity, data character, and comment appearing immediately within the current element
attributes: an unordered set of attribute information items, one for each of the attributes (specified or defaulted from the DTD) of this element. xmlns attributes are not included.
namespace attributes: an unordered set of attribute information items, one for each of the namespaces declared either in the start-tag of this element or defaulted from the DTD.
in-scope namespaces: An unordered set of namespace information items, one for each of the namespaces in effect for this element
base URI: The absolute URI of the external entity in which this element appears, as defined in XML Base. If this is not known, this property is null.
parent
xlink:type="simple"
xlink:href="http://www.amrecords.com/"
xlink:type = "simple"
xlink:show = "onLoad"
xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit"
WIDTH=" 100 "
HEIGHT=' 200 '
An Attribute Information Item Includes:
namespace name
local name
prefix
normalized value
specified: A flag indicating whether this attribute was actually specified in the start-tag of its element, or was defaulted from the DTD
attribute type:
ID
IDREF
IDREFS
ENTITY
ENTITIES
NMTOKEN
NMTOKENS
NOTATION
CDATA
ENUMERATED
owner element
references: only for IDREF, IDREFS, ENTITY, ENTITIES, and NOTATION type attributes; an ordered list of the things this attribute points to
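Several of these properties have close DOM counterparts. The following is a minimal sketch, assuming the JDK's bundled parser reads the internal DTD subset (it normally does, even without validation); the class AttributeProperties and its attribute helper are hypothetical names. It shows the normalized value, the specified flag, and the owner element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class AttributeProperties {

    // Hypothetical helper: parse the document and return one attribute of the root element.
    static Attr attribute(String xml, String qualifiedName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttributeNode(qualifiedName);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // xlink:type is not written in the start-tag; it is defaulted from the DTD.
        // The ALT value contains a literal line break, which normalization turns into a space.
        String xml = "<!DOCTYPE PHOTO [\n"
                + "  <!ATTLIST PHOTO xlink:type CDATA #FIXED \"simple\">\n"
                + "]>\n"
                + "<PHOTO ALT=\"Victor Willis\nin Cop Outfit\" WIDTH=\" 100 \"/>";
        for (String name : new String[] {"ALT", "WIDTH", "xlink:type"}) {
            Attr a = attribute(xml, name);
            System.out.println(name + " = [" + a.getValue() + "]"
                    + " specified=" + a.getSpecified()
                    + " owner=" + a.getOwnerElement().getNodeName());
        }
    }
}
```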
<!-- The publisher is actually Polygram but I needed
an example of a general entity reference. -->
<!-- <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
A & M Records
</PUBLISHER>
<LENGTH>6:20</LENGTH>
<YEAR>1978</YEAR>
<ARTIST>Village People</ARTIST>
</SONG> -->
<!-- You can tell what album I was
listening to when I wrote this example -->
A Comment Information Item includes:
content
parent
<?robots index="yes" follow="no"?>
<?php
mysql_connect("database.unc.edu", "clerk", "password");
$result = mysql("CYNW", "SELECT LastName, FirstName FROM Employees
ORDER BY LastName, FirstName");
$i = 0;
while ($i < mysql_numrows ($result)) {
$fields = mysql_fetch_row($result);
echo "<person>$fields[1] $fields[0] </person>\r\n";
$i++;
}
mysql_close();
?>
target
content
notation
base URI
parent
A character is one Unicode character in the content of an element, attribute value, comment or processing instruction data.
A Character Information Item includes:
Note that Unicode is not a two-byte character set
An element has one namespace information item for each namespace in scope on the element. This is not the same as the namespaces declared on the element.
A Namespace Information Item includes:
prefix
namespace name
There is no obvious representation of namespace information items in the syntax of an XML document.
These are namespace declaration attributes, not namespace information items:
xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#"
xmlns:dc="http://purl.org/dc/"
xmlns="http://www.w3.org/2000/svg"
Consider this document:
<svg:svg width="5cm" height="4cm"
xmlns:svg="http://www.w3.org/2000/svg">
<svg:desc>Two rectangles</svg:desc>
<svg:rect x="1.5cm" y="3.5cm" width="12cm" height="9.9cm"/>
<svg:rect x="2.5cm" y="2.8cm" width="3cm" height="17cm"/>
</svg:svg>
Each of the four elements has a namespace information item with the prefix svg and the namespace name http://www.w3.org/2000/svg
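The difference between a namespace declared on an element and a namespace in scope on it can be checked with DOM Level 3's lookupNamespaceURI. This is a sketch only: the class InScopeNamespaces and its report helper are invented for the example, and DOM does not expose namespace information items directly.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class InScopeNamespaces {

    // Hypothetical helper: for every element, report whether it carries an xmlns:svg
    // declaration itself and which URI the svg prefix is bound to in its scope.
    static List<String> report(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList all = doc.getElementsByTagNameNS("*", "*");
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                lines.add(e.getTagName()
                        + " declares=" + e.hasAttribute("xmlns:svg")
                        + " inScope=" + e.lookupNamespaceURI("svg"));
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<svg:svg width=\"5cm\" height=\"4cm\""
                + " xmlns:svg=\"http://www.w3.org/2000/svg\">"
                + "<svg:desc>Two rectangles</svg:desc>"
                + "<svg:rect x=\"1.5cm\" y=\"3.5cm\" width=\"12cm\" height=\"9.9cm\"/>"
                + "<svg:rect x=\"2.5cm\" y=\"2.8cm\" width=\"3cm\" height=\"17cm\"/>"
                + "</svg:svg>";
        report(xml).forEach(System.out::println);
    }
}
```

Only the root element carries the declaration attribute, yet the prefix resolves on all four elements.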
<!DOCTYPE SONG SYSTEM "song.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
A Document Type Declaration Information Item includes:
SYSTEM ID
PUBLIC ID
children: only the comment and processing instruction information items in the internal DTD subset and external DTD subsets.
parent
<!ELEMENT SONG (TITLE, PHOTO?, COMPOSER+, PRODUCER*,
 PUBLISHER*, LENGTH?, YEAR?, ARTIST+)>
<!ATTLIST SONG xmlns       CDATA #REQUIRED
               xmlns:xlink CDATA #REQUIRED>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT PHOTO EMPTY>
<!ATTLIST PHOTO xlink:type CDATA #FIXED "simple"
                xlink:href CDATA #REQUIRED
                xlink:show CDATA #IMPLIED
                ALT        CDATA #REQUIRED
                WIDTH      CDATA #REQUIRED
                HEIGHT     CDATA #REQUIRED
>
<!ELEMENT COMPOSER (#PCDATA)>
<!ELEMENT PRODUCER (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ATTLIST PUBLISHER xlink:type CDATA #IMPLIED
                    xlink:href CDATA #IMPLIED
>
<!ELEMENT LENGTH (#PCDATA)>
<!-- This should be a four digit year like "1999",
     not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>
<!ELEMENT ARTIST (#PCDATA)>
There is no information item for this.
Comments and processing instructions in the DTD are reported as children of the Document Type Declaration information item
Notation and general entity declarations are reported as properties of the Document information item
Attribute types and default values are reported on the actual attributes in the document instance.
Everything else is not reported!
An XML document is made up of one or more physical storage units called entities
Entity references :
Parsed internal general entity references like &amp;
Parsed external general entity references
Unparsed external general entity references
External parameter entity references
Internal parameter entity references
Reading an XML document is not the same thing as reading an XML file
The XML file contains entity references.
The XML document contains the entities' replacement text.
When you use a parser to read a document you'll get the text including characters like <. You will not see the entity references.
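The file-versus-document distinction is easy to observe from code. In this sketch (the class ExpandedEntities and the helper textOf are illustrative names, and the JDK's default JAXP parser is assumed), the text handed back by the parser contains replacement text, never the references themselves.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ExpandedEntities {

    // Hypothetical helper: parse a document and return the text content of its root element.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A predefined entity reference: the file says &amp; but the document says &.
        System.out.println(textOf("<PUBLISHER>A &amp; M Records</PUBLISHER>"));

        // An internal general entity declared in the internal DTD subset.
        System.out.println(textOf(
                "<!DOCTYPE PUBLISHER [<!ENTITY label \"Polygram\">]>"
                + "<PUBLISHER>&label;</PUBLISHER>"));
    }
}
```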
Two kinds of entity information items:
Unparsed Entity Information Item
Unexpanded Entity Information Items
Other entities are not reported
name
system identifier
public identifier
Notation
name
entity
parent
The internal and external DTD subsets; especially ELEMENT and ATTLIST declarations
Schema types
CDATA sections
Character references
Expanded, parsed entity references
Whether an empty element uses two tags or one
What kind of quotes surround attributes
Insignificant white space in attributes
White space that occurs between attributes
Attribute order
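One way to convince yourself that these details really are thrown away is to parse two spellings of the same element and compare the trees. The sketch below leans on DOM Level 3's isEqualNode as a stand-in for "same Infoset", which is an approximation; the class InsignificantSyntax and the helper sameTree are names invented here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class InsignificantSyntax {

    private static Element root(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setCoalescing(true); // ask the parser to report CDATA sections as plain text
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        root.normalize(); // merge adjacent text nodes before comparing
        return root;
    }

    // Hypothetical helper: true if both strings parse to equal element trees.
    static boolean sameTree(String a, String b) {
        try {
            return root(a).isEqualNode(root(b));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Quote style, attribute order, white space between attributes, one tag versus two
        System.out.println(sameTree(
                "<PHOTO WIDTH=\"100\" HEIGHT='200'/>",
                "<PHOTO   HEIGHT=\"200\"   WIDTH=\"100\"></PHOTO>"));
        // A character reference versus the literal character
        System.out.println(sameTree("<YEAR>1978</YEAR>", "<YEAR>&#49;978</YEAR>"));
        // A CDATA section versus escaped text
        System.out.println(sameTree(
                "<PUBLISHER><![CDATA[A & M Records]]></PUBLISHER>",
                "<PUBLISHER>A &amp; M Records</PUBLISHER>"));
    }
}
```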
A schema assigns a type to each element
Schema validation produces a Post Schema Validation Infoset, PSVI for short
Schema aware applications using schema aware parsers and APIs can make use of the types of elements
A W3C proposed standard serialization format of an XML document instance
Not everyone agrees that this is a good thing! or that this is the right format! It's totally unsuitable for editors and validation.
Based on the XPath 1.0 data model
Not really Infoset compatible
Something of this nature is nonetheless clearly needed for non-XML aware tools like digital signatures, change management, hash functions, and the like.
The document is encoded in UTF-8
Line breaks are normalized to a linefeed (ASCII 10, \n)
Attribute values are normalized, as if by a validating processor
Character and parsed entity references are replaced
CDATA sections are replaced with their character content
The XML and document type declarations are removed
Empty elements are converted to start tag-end tag pairs
White space outside of the document element and within start and end tags is normalized
All white space in character content is retained (except for characters removed during linefeed normalization)
Attribute value delimiters are set to double quotes
Special characters in attribute values and character content are replaced by character references
Superfluous namespace declarations are removed from each element
Default attributes are added to each element
Lexicographic order is imposed on the namespace declarations and attributes of each element
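A few of these rules are simple enough to sketch by hand. The toy serializer below is not a Canonical XML implementation: it only covers start tag-end tag pairs, double-quoted attribute values, namespace declarations before attributes, attribute sorting, and character escaping, and it ignores comments, superfluous namespace declarations, UTF-8 output, and content outside the root element. The class name ToyCanonicalizer is made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class ToyCanonicalizer {

    // Serializes the root element of the given document using a subset of the rules above.
    static String canonicalize(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setCoalescing(true); // CDATA sections become character content
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            write(doc.getDocumentElement(), out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    private static void write(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                out.append('<').append(node.getNodeName());
                List<Attr> attrs = new ArrayList<>();
                NamedNodeMap map = node.getAttributes();
                for (int i = 0; i < map.getLength(); i++) attrs.add((Attr) map.item(i));
                attrs.sort(ToyCanonicalizer::compareAttributes);
                for (Attr a : attrs) {
                    out.append(' ').append(a.getName()).append("=\"")
                       .append(escapeAttribute(a.getValue())).append('"');
                }
                out.append('>');
                for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) write(c, out);
                out.append("</").append(node.getNodeName()).append('>'); // never <x/>
                break;
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                out.append(escapeText(node.getNodeValue()));
                break;
            default:
                break; // comments and everything else are dropped in this sketch
        }
    }

    // Namespace declarations first (by prefix), then attributes by namespace URI and local name.
    private static int compareAttributes(Attr a, Attr b) {
        boolean aNs = isDeclaration(a), bNs = isDeclaration(b);
        if (aNs != bNs) return aNs ? -1 : 1;
        if (aNs) return declaredPrefix(a).compareTo(declaredPrefix(b));
        int byUri = orEmpty(a.getNamespaceURI()).compareTo(orEmpty(b.getNamespaceURI()));
        if (byUri != 0) return byUri;
        return localName(a).compareTo(localName(b));
    }

    private static boolean isDeclaration(Attr a) {
        return a.getName().equals("xmlns") || a.getName().startsWith("xmlns:");
    }

    private static String declaredPrefix(Attr a) {
        return a.getName().equals("xmlns") ? "" : a.getName().substring("xmlns:".length());
    }

    private static String localName(Attr a) {
        return a.getLocalName() != null ? a.getLocalName() : a.getName();
    }

    private static String orEmpty(String s) {
        return s == null ? "" : s;
    }

    private static String escapeText(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\r", "&#xD;");
    }

    private static String escapeAttribute(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;")
                .replace("\t", "&#x9;").replace("\n", "&#xA;").replace("\r", "&#xD;");
    }

    public static void main(String[] args) {
        System.out.println(canonicalize(
                "<PHOTO xlink:type='simple' WIDTH=\"100\" ALT='Victor Willis in Cop Outfit'"
                + " xmlns:xlink=\"http://www.w3.org/1999/xlink\"/>"));
    }
}
```

For real signatures use a tested canonicalizer; the point here is only that two differently typed start-tags come out byte-for-byte identical.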
c14n.C14nDOM reads an XML document from stdin and writes the canonicalized output to stdout:
% java c14n.C14nDOM -xpath < hotcop.xml > canonicalized_hotcop.xml
-xpath option necessary to support the final draft of Canonical XML 1.0.
API in com.ibm.xml.dsig.Canonicalizer
package com.ibm.xml.dsig;
public interface Canonicalizer {
public static final String W3C
= "http://www.w3.org/TR/2000/WD-xml-c14n-20000119";
public static final java.lang.String W3C2
= "http://www.w3.org/TR/2001/REC-xml-c14n-20010315";
public static final java.lang.String W3C2WC
= "http://www.w3.org/TR/2001/REC-xml-c14n-20010315#WithComments";
public static final java.lang.String EXCLUSIVE;
public static final java.lang.String EXCLUSIVEWC;
public String getURI();
public void canonicalize(org.w3c.dom.Node node, OutputStream stream)
throws IOException;
}
Implementations include:
com.ibm.xml.dsig.transform.ExclusiveC11r
com.ibm.xml.dsig.transform.ExclusiveC11rWC
com.ibm.xml.dsig.transform.W3CCanonicalizer
com.ibm.xml.dsig.transform.W3CCanonicalizer2
com.ibm.xml.dsig.transform.W3CCanonicalizer2WC
CVS only; won't build
DOMWriter
W3C/IETF Joint Proposed Recommendation, August 20, 2001
XML Signatures provide:
Integrity
Message authentication
Signer authentication
For data of any type
Signed data can be located within the XML that includes the signature or elsewhere.
An enveloped signature is enclosed inside the XML element it signs
An enveloping signature signs XML data it contains.
A detached signature signs data external to the Signature element, possibly in another document entirely.
The signature processor calculates a hash code for some data using a strong, one-way hash function.
The processor encrypts the hash code using a private key.
The verifier calculates the hash code for the data it's received.
It then decrypts the encrypted hash code using the public key to see if the hash codes match.
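The four steps above can be sketched with the JDK's java.security.Signature class, which performs the hashing and the private-key operation in a single call. The algorithm choice here (SHA256withRSA with a 2048-bit key) is an assumption made for the example, not what the XML Security Suite samples use, and the class name HashAndSign is invented.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

final class HashAndSign {

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Steps 1 and 2: hash the data and transform the hash with the private key.
    static byte[] sign(byte[] data, PrivateKey key) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(key);
            signer.update(data);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Steps 3 and 4: hash the received data and check it against the signature with the public key.
    static boolean verify(byte[] data, byte[] signature, PublicKey key) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(key);
            verifier.update(data);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair pair = newKeyPair();
        byte[] original = "<TITLE>Hot Cop</TITLE>".getBytes(StandardCharsets.UTF_8);
        byte[] tampered = "<TITLE>Hot Cap</TITLE>".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(original, pair.getPrivate());
        System.out.println("original verifies: " + verify(original, signature, pair.getPublic()));
        System.out.println("tampered verifies: " + verify(tampered, signature, pair.getPublic()));
    }
}
```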
The signature processor digests (calculates the hash code for) a data object.
The processor places the digest value in a Signature element.
The processor digests the Signature element.
The processor cryptographically signs the Signature element.
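The first of these steps, producing the Base64 text that goes into a DigestValue element, needs nothing beyond the JDK. This sketch uses SHA-1 because that is the digest method named in the sample signature in this deck; the class name DigestValueSketch is made up, and fetching and canonicalizing the referenced data is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class DigestValueSketch {

    // Returns the Base64-encoded SHA-1 digest of the data, the form used inside DigestValue.
    static String digestValue(byte[] data) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            return Base64.getEncoder().encodeToString(sha1.digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "<TITLE>Hot Cop</TITLE>".getBytes(StandardCharsets.UTF_8);
        System.out.println("<DigestValue>" + digestValue(data) + "</DigestValue>");
    }
}
```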
SampleSign2 and VerifyGUI from IBM's XML Security Suite: http://www.alphaworks.ibm.com/tech/xmlsecuritysuite
First use the JDK's keytool to generate a key:
% keytool -genkey -dname "CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, S=New York, C=US" -alias elharo -storepass mypassword -keypass mykeypassword
SampleSign2 reads an XML document from stdin and writes the signature to stdout:
C:\> java SampleSign2 elharo mypassword mykeypassword -ext
http://www.cafeconleche.org/slides/hoffman/fundamentals/examples/hotcop.xml > hotcop_signature.xml
Key store: C:\Documents and Settings\Administrator\.keystore
Sign: 7030ms
VerifyGUI reads signature from stdin and warns of changes to signed content.
C:\>java VerifyGUI < hotcop_signature.xml
The signature has a KeyValue element.
The signature has one or more X509Data elements.
Checks an X509Data:
It has 1 certificate(s).
Certificate Information:
Version: 1
Validity: OK
SubjectDN: CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US
IssuerDN: CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US
Serial#: 983556890
Time to verify: 951 [msec]
<?xml version='1.0' encoding='UTF-8'?>
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
<SignedInfo>
<CanonicalizationMethod Algorithm="http://www.w3.org/TR/2000/WD-xml-c14n-20000119"/>
<SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"/>
<Reference URI="http://www.cafeconleche.org/slides/hoffman/fundamentals/examples/hotcop.xml">
<DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
<DigestValue>nvfYilfgN/rICyzhGmjidKCFoC8=</DigestValue>
</Reference>
</SignedInfo>
<SignatureValue>
hfowa4qdbuMkoZfX1/VXd4UBpIpZMM5+6CElmY7jOIKFqvXq5A5VKw==
</SignatureValue>
<KeyInfo>
<KeyValue>
<DSAKeyValue>
<P>
/X9TgR11EilS30qcLuzk5/YRt1I870QAwx4/gLZRJmlFXUAiUftZPY1Y+r/F9bow9s
ubVWzXgTuAHTRv8mZgt2uZUKWkn5/oBHsQIsJPu6nX/rfGG/g7V+fGqKYVDwT7g/bT
xR7DAjVUE1oWkTL2dfOuK2HXKu/yIgMZndFIAcc=
</P>
<Q>l2BQjxUjC8yykrmCouuEC/BYHPU=</Q>
<G>
9+GghdabPd7LvKtcNrhXuXmUr7v6OuqC+VdMCz0HgmdRWVeOutRZT+ZxBxCBgLRJFn
Ej6EwoFhO3zwkyjMim4TwWeotUfI0o4KOuHiuzpnWRbqN/C/ohNWLx+2J6ASQ7zKTx
vqhRkImog9/hWuWfBpKLZl6Ae1UlZAFMO/7PSSo=
</G>
<Y>
6jKpNnmkkWeArsn5Oeeg2njcz+nXdk0f9kZI892ddlR8Lg1aMhPeFTYuoq3I6neFlb
BjWzuktNZKiXYBfKsSTB8U09dTiJo2ir3HJuY7eW/p89osKMfixPQsp9vQMgzph6Qa
lY7j4MB7y5ROJYsTr1/fFwmj/yhkHwpbpzed1LE=
</Y>
</DSAKeyValue>
</KeyValue>
<X509Data>
<X509IssuerSerial>
<X509IssuerName>CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US</X509IssuerName>
<X509SerialNumber>983556890</X509SerialNumber></X509IssuerSerial>
<X509SubjectName>CN=Elliotte Rusty Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, ST=New York, C=US</X509SubjectName>
<X509Certificate>
MIIDLzCCAu0CBDqf4xowCwYHKoZIzjgEAwUAMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcg
WW9yazERMA8GA1UEBxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlN
ZXRyb3RlY2gxHjAcBgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDAeFw0wMTAzMDIxODE0NTBa
Fw0wMTA1MzExODE0NTBaMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcgWW9yazERMA8GA1UE
BxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlNZXRyb3RlY2gxHjAc
BgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDCCAbgwggEsBgcqhkjOOAQBMIIBHwKBgQD9f1OB
HXUSKVLfSpwu7OTn9hG3UjzvRADDHj+AtlEmaUVdQCJR+1k9jVj6v8X1ujD2y5tVbNeBO4AdNG/y
ZmC3a5lQpaSfn+gEexAiwk+7qdf+t8Yb+DtX58aophUPBPuD9tPFHsMCNVQTWhaRMvZ1864rYdcq
7/IiAxmd0UgBxwIVAJdgUI8VIwvMspK5gqLrhAvwWBz1AoGBAPfhoIXWmz3ey7yrXDa4V7l5lK+7
+jrqgvlXTAs9B4JnUVlXjrrUWU/mcQcQgYC0SRZxI+hMKBYTt88JMozIpuE8FnqLVHyNKOCjrh4r
s6Z1kW6jfwv6ITVi8ftiegEkO8yk8b6oUZCJqIPf4VrlnwaSi2ZegHtVJWQBTDv+z0kqA4GFAAKB
gQDqMqk2eaSRZ4Cuyfk556DaeNzP6dd2TR/2Rkjz3Z12VHwuDVoyE94VNi6ircjqd4WVsGNbO6S0
1kqJdgF8qxJMHxTT11OImjaKvccm5jt5b+nz2iwox+LE9Cyn29AyDOmHpBqVjuPgwHvLlE4lixOv
X98XCaP/KGQfClunN53UsTALBgcqhkjOOAQDBQADLwAwLAIUODqxsFzS96BjrVA4LVo5FzuWBRMC
FC0xfXxbaJaCJuVqtcBv4bqwV0EX
</X509Certificate>
</X509Data>
</KeyInfo>
</Signature>
Can encrypt:
An XML element
The content of an XML element
Arbitrary binary data with a URI
The ciphertext can be stored in an EncryptedData element or referenced (through a URI) by an EncryptedData element.
Arbitrary encryption algorithms are supported.
Required encryption algorithms include:
AES with CMS keylength
3DES
RSA-OAEP used with AES
RSA-v1.5 used with 3DES
Required key transport algorithms include:
RSA-OAEP used with AES
RSA-v1.5 used with 3DES
Required Symmetric Key Wrap algorithms include:
AES KeyWrap
CMS-KeyWrap-3DES
From the spec:
REQUIRED TRIPLEDES
http://www.w3.org/2001/04/xmlenc#tripledes-cbc
REQUIRED AES-128
http://www.w3.org/2001/04/xmlenc#aes128-cbc
REQUIRED AES-256
http://www.w3.org/2001/04/xmlenc#aes256-cbc
OPTIONAL AES-192
http://www.w3.org/2001/04/xmlenc#aes192-cbc
REQUIRED RSA-v1.5
http://www.w3.org/2001/04/xmlenc#rsa-1_5
REQUIRED RSA-OAEP
http://www.w3.org/2001/04/xmlenc#rsa-oaep-mgf1p
OPTIONAL Diffie-Hellman
http://www.w3.org/2001/04/xmlenc#dh
REQUIRED TRIPLEDES KeyWrap
http://www.w3.org/2001/04/xmlenc#kw-tripledes
REQUIRED AES-128 KeyWrap
http://www.w3.org/2001/04/xmlenc#kw-aes128
REQUIRED AES-256 KeyWrap
http://www.w3.org/2001/04/xmlenc#kw-aes256
OPTIONAL AES-192 KeyWrap
http://www.w3.org/2001/04/xmlenc#kw-aes192
REQUIRED SHA1
http://www.w3.org/2000/09/xmldsig#sha1
RECOMMENDED SHA256
http://www.w3.org/2001/04/xmlenc#sha256
OPTIONAL SHA512
http://www.w3.org/2001/04/xmlenc#sha512
OPTIONAL RIPEMD-160
http://www.w3.org/2001/04/xmlenc#ripemd160
RECOMMENDED XML Digital Signature
http://www.w3.org/TR/2001/CR-xmldsig-core-20010419/
OPTIONAL Canonical XML with Comments
http://www.w3.org/TR/2001/REC-xml-c14n-20010315#WithComments
OPTIONAL Canonical XML (omits comments)
http://www.w3.org/TR/2001/REC-xml-c14n-20010315
REQUIRED base64
http://www.w3.org/2000/09/xmldsig#base64
Namespace URI http://www.w3.org/2001/04/xmlenc#
(Normally mapped to the xenc prefix)
Uses some elements from XML digital signatures for keys
Typical form:
<EncryptedData Id="unique_value"
Type="http://www.w3.org/2001/04/xmlenc#Element |
http://www.w3.org/2001/04/xmlenc#Content |
MIME media type URI">
<EncryptionMethod Algorithm="URI"/>
<ds:KeyInfo>
<ds:KeyName>Plain text name of key</ds:KeyName>
<ds:RetrievalMethod URI="key location"
Type="http://www.w3.org/2001/04/xmlenc#EncryptedKey" />
</ds:KeyInfo>
<CipherData Nonce="Base-64 encoded salt">
<CipherValue>Base-64 encoded cipher text</CipherValue>
<CipherReference URI="URL of cipher text">
<Transforms>
<ds:Transform
Algorithm="http://www.w3.org/TR/1999/REC-xpath-19991116">
<ds:XPath xmlns:rep="http://www.example.org/repository">
self::text()[parent::CipherValue[@id="example1"]]
</ds:XPath>
</ds:Transform>
<ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#base64"/>
</Transforms>
</CipherReference>
</CipherData>
</EncryptedData>
At a minimum, each EncryptedData must contain a CipherData which contains either a CipherValue or a CipherReference.
Everything else is optional.
<?xml version='1.0'?>
<PaymentInfo xmlns="http://example.org/payment">
<Name>Elliotte Rusty Harold</Name>
<CreditCard Limit='1000' Currency='USD'>
<Number>1234 5678 9012 3456</Number>
<Issuer>Citibank</Issuer>
<Expiration>03/02</Expiration>
</CreditCard>
</PaymentInfo>
<?xml version='1.0'?>
<PaymentInfo xmlns="http://example.org/payment">
<Name>Elliotte Rusty Harold</Name>
<EncryptedData Type='http://www.w3.org/2001/04/xmlenc#Element'
xmlns='http://www.w3.org/2001/04/xmlenc#'>
<EncryptionMethod
Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
<CipherData>
<CipherValue>A23B45C56CABE4BE33327</CipherValue>
</CipherData>
</EncryptedData>
</PaymentInfo>
<?xml version='1.0'?>
<PaymentInfo xmlns="http://example.org/payment">
<Name>Elliotte Rusty Harold</Name>
<CreditCard Limit="1000" Currency="USD">
<EncryptedData Type="http://www.w3.org/2001/04/xmlenc#Content"
xmlns="http://www.w3.org/2001/04/xmlenc#">
<EncryptionMethod
Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
<CipherData>
<CipherValue>A23B45C56CABE4BE3</CipherValue>
</CipherData>
</EncryptedData>
</CreditCard>
</PaymentInfo>
<?xml version='1.0'?>
<PaymentInfo xmlns="http://example.org/payment">
<Name>Elliotte Rusty Harold</Name>
<CreditCard Limit='1000' Currency='USD'>
<Number>
<EncryptedData Type="http://www.w3.org/2001/04/xmlenc#Content"
xmlns="http://www.w3.org/2001/04/xmlenc#">
<EncryptionMethod
Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
<CipherData>
<CipherValue>A23B45C56CABE4BE</CipherValue>
</CipherData>
</EncryptedData>
</Number>
<Issuer>Citibank</Issuer>
<Expiration>03/02</Expiration>
</CreditCard>
</PaymentInfo>
<?xml version='1.0'?>
<EncryptedData
Type="http://www.isi.edu/in-notes/iana/assignments/media-types/text/xml"
xmlns="http://www.w3.org/2001/04/xmlenc#">
<EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
<CipherData>
<CipherValue>A23B45C56CABE4BE7687989219C4E5DEADBEEFCAFEBABE</CipherValue>
</CipherData>
</EncryptedData>
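The CipherValue text in these examples is placeholder data. A rough idea of how real content gets there, using only javax.crypto: encrypt the serialized element with Triple DES in CBC mode, put the initialization vector in front of the ciphertext, and Base64-encode the result. This is a sketch under assumptions: the JDK's PKCS5Padding stands in for the padding scheme the XML Encryption spec defines, key management is ignored, and the class name CipherValueSketch is invented.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

final class CipherValueSketch {

    private static final String TRANSFORMATION = "DESede/CBC/PKCS5Padding";
    private static final int IV_LENGTH = 8; // Triple DES block size in bytes

    static SecretKey newKey() {
        try {
            return KeyGenerator.getInstance("DESede").generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns Base64(IV followed by ciphertext), suitable as the text of a CipherValue element.
    static String encrypt(String plaintext, SecretKey key) {
        try {
            byte[] iv = new byte[IV_LENGTH];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
            byte[] body = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] all = new byte[iv.length + body.length];
            System.arraycopy(iv, 0, all, 0, iv.length);
            System.arraycopy(body, 0, all, iv.length, body.length);
            return Base64.getEncoder().encodeToString(all);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reverses encrypt(): splits off the IV, then decrypts the remainder.
    static String decrypt(String cipherValue, SecretKey key) {
        try {
            byte[] all = Base64.getDecoder().decode(cipherValue);
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(all, 0, IV_LENGTH));
            byte[] plain = cipher.doFinal(all, IV_LENGTH, all.length - IV_LENGTH);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        String element = "<Number>1234 5678 9012 3456</Number>";
        String cipherValue = encrypt(element, key);
        System.out.println("<CipherValue>" + cipherValue + "</CipherValue>");
        System.out.println(decrypt(cipherValue, key));
    }
}
```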
enc.XMLCipher2 reads an XML document and encrypts the part of it specified by an XPath expression using a template file:
% java enc.XMLCipher2 -e keyinfo.xml hotcop.xml /SONG/PUBLISHER template1.xml
API
org.apache.xml.security.c14n.Canonicalizer
I have not been able to build this. No precompiled binaries yet.
Authentication
Authorization
Access Control
XML Infoset Specification: http://www.w3.org/TR/xml-infoset
Canonical XML Specification: http://www.w3.org/TR/xml-c14n
XML Signature Specification: http://www.w3.org/TR/xmldsig-core/
XML Encryption Requirements: http://www.w3.org/TR/xml-encryption-req
XML Encryption Syntax and Processing: http://www.w3.org/TR/xmlenc-core/
Discriminating against languages simply because their scripts were not encoded in Unicode 2.0 is inherently unjust.
--XML Blueberry Requirements
Changes the definition of white space
Enables native language markup in Ethiopic, Burmese, and Cambodian
Breaks compatibility with XML 1.0
XML 1.0 defines white space thusly:
[3] S ::= (#x20 | #x9 | #xD | #xA)+
With XML 1.1 this becomes
[3] S ::= (#x9 | #x20 | #xA | #xD | #x85 | #x2028)+
Supports IBM mainframe editors
Breaks everybody else's software
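The two productions translate directly into predicates, which makes the incompatibility concrete: a parser built on the first rejects markup that the second accepts. The class and method names below are invented, and the second predicate follows the draft production exactly as quoted on this slide.

```java
final class WhiteSpaceRules {

    // XML 1.0 production [3]: space, tab, carriage return, linefeed
    static boolean isXml10Space(int c) {
        return c == 0x20 || c == 0x9 || c == 0xD || c == 0xA;
    }

    // The XML 1.1 production as quoted above: adds NEL (#x85) and LINE SEPARATOR (#x2028)
    static boolean isXml11DraftSpace(int c) {
        return isXml10Space(c) || c == 0x85 || c == 0x2028;
    }

    public static void main(String[] args) {
        int[] candidates = {0x20, 0x9, 0xA, 0xD, 0x85, 0x2028};
        for (int c : candidates) {
            System.out.printf("#x%X  1.0=%b  1.1=%b%n", c, isXml10Space(c), isXml11DraftSpace(c));
        }
    }
}
```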
Currently only scripts defined in Unicode 2.0 are allowed in XML element and attribute names
All scripts defined in Unicode are allowed in element and attribute content
Unicode 3.0 adds:
Ethiopic (Amharic, Geez, etc.)
Burmese
Cambodian
Mongolian
Dhivehi
Yi syllabary
Also:
Cherokee
Canadian aboriginal languages
Perhaps:
Japanese
Cantonese
Is this enough to justify breaking compatibility?
XML 1.0 explicitly lists them; everything not permitted is forbidden
XML 1.1:
[4] NameStartChar := ":" | [A-Z] | "_" | [a-z] | [#xC0-#x02FF]
| [#x0370-#x037D] | [#x037F-#x2027]
| [#x202A-#x218F] | [#x2800-#xD7FF]
| [#xE000-#xFDCF] | [#xFDE0-#xFFEF]
| [#x10000-#x10FFFF]
[4a] NameChar := NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F]
Many of these characters aren't defined yet in Unicode
Many of these characters are very surprising; e.g. musical and mathematical symbols
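Written out as code, the "everything not forbidden is permitted" approach is just a handful of range checks. This sketch transcribes productions [4] and [4a] exactly as quoted above (a draft, so the ranges may differ from any later version); the class NameChars and its methods are illustrative names.

```java
final class NameChars {

    // Production [4] as quoted above
    static boolean isNameStartChar(int c) {
        return c == ':' || c == '_'
                || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= 0xC0 && c <= 0x02FF)
                || (c >= 0x0370 && c <= 0x037D)
                || (c >= 0x037F && c <= 0x2027)
                || (c >= 0x202A && c <= 0x218F)
                || (c >= 0x2800 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFDCF)
                || (c >= 0xFDE0 && c <= 0xFFEF)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }

    // Production [4a] as quoted above
    static boolean isNameChar(int c) {
        return isNameStartChar(c) || c == '-' || c == '.'
                || (c >= '0' && c <= '9') || c == 0xB7
                || (c >= 0x0300 && c <= 0x036F);
    }

    // A name is one NameStartChar followed by any number of NameChars.
    static boolean isName(String s) {
        if (s.isEmpty()) return false;
        int first = s.codePointAt(0);
        if (!isNameStartChar(first)) return false;
        return s.codePoints().skip(1).allMatch(NameChars::isNameChar);
    }

    public static void main(String[] args) {
        System.out.println(isName("SONG"));                               // ASCII letters
        System.out.println(isName("1999"));                               // starts with a digit
        System.out.println(isName("\u1200\u1201"));                       // Ethiopic syllables
        System.out.println(isName(new String(Character.toChars(0x1D11E)))); // musical symbol G clef
    }
}
```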
1.0.1
Mandate non-1.1 for documents that don't use 1.1
Well-formedness error?
Non-fatal error?
All of these were rejected by the working group.
In SQL, the query language is not expressed in tables and rows. In XQuery, the query language is not expressed in XML. Why is this a problem?
--Jonathan Robie on the xml-dev mailing list
Used for XSLT 2.0 and XQuery
Schema Aware
Partially implemented by Michael Kay's Saxon 7.0, http://saxon.sourceforge.net/
Simplify manipulation of XML Schema-typed content
Simplify manipulation of string content
Support related XML standards
Improve ease of use
Improve interoperability
Improve internationalization (i18n) support
Maintain backward compatibility
Enable improved processor efficiency
Must express data model in terms of the Infoset
Must provide common core syntax and semantics for XSLT 2.0 and XML Query 1.0
Must support explicit "for any" or "for all" comparison and equality semantics
Must add min() and max() functions
Any valid XPath 1.0 expression SHOULD also be a valid XPath 2.0 expression when operating in the absence of XML Schema type information.
Should provide intersection and difference functions
Must loosen restrictions on location steps
Must provide a conditional expression (e.g. ternary ?: operator in Java and C)
Should support additional string functions, possibly including space padding, string replacement and conversion to upper or lower case
Must support regular expression string matching using the regexp syntax from schemas
Must add support for XML Schema primitive datatypes
Should add support for XML Schema structures
(Adapted from Jeni Tennison)
The first class objects are strings, numbers, booleans, and node-sets (plus result tree fragments for XSLT)
Node-sets contain nodes (which are not first-class objects)
Nodes have various properties, including children - a node set (the order of the children can be worked out from the nodes' document order)
Seven node types: document, element, attribute, text, namespace, processing instruction, and comment
There are conceptually two kinds of node-sets:
Node-sets containing new nodes (result tree fragments) can only be generated using XSLT
Node-sets containing existing nodes can only be generated using XPath
No list data types, only node-sets but no number sets
Not Infoset compatible
(Adapted from Jeni Tennison)
The first class object type is a sequence; i.e. an ordered list
Sequences contain items of two types: simple typed values or nodes. (They may not contain other sequences.)
A sequence containing one item is the same as the item.
Simple typed values have W3C XML Schema Language simple types: xsd:gYear, xsd:int, xsd:decimal, xsd:date, etc.
Seven node types: document, element, attribute, text, namespace, processing instruction, and comment
Nodes have these properties:
node-kind: either "document", "element", "attribute", "text", "namespace", "processing-instruction", or "comment".
name: a sequence containing one expanded QName if the node has a name (elements, attributes, etc.) or an empty sequence if the node doesn't have a name (comments, text nodes, etc.)
parent: a sequence containing the unique parent node; the empty sequence is returned for parentless nodes, particularly document and namespace nodes
base-uri: URI from which this particular node came
string-value: same as XPath 1.0
typed-value: a sequence of simple typed values corresponding to the node (always the empty sequence for anything other than elements and attributes)
children: a sequence of child nodes (empty except for element and document nodes)
attributes: a sequence of attribute nodes; empty except for element nodes
namespaces: a sequence of namespace nodes in-scope on the node
declaration: a sequence containing 0 or 1 schema component
type: a sequence containing 0 or 1 schema component
unique-ID: a sequence containing 0 or 1 xsd:ID type node
Infoset compatible
{-- This is an XPath comment --}
<xsl:apply-templates
select="{-- The difference between the context node and the
current node is crucial here --}
../composition[@composer=current()/@id]"/>
<xsl:template match="*:set">
This matches MathML set elements, SVG set elements, set
elements in no namespace at all, etc.
</xsl:template>
The document() function returns the root of a document at a given URL
document("http://www.cafeconleche.org/")//today
/child::contacts/(child::personal | child::business)/child::name
Abbreviated: /contacts/(personal | business)/name
Map an IDREF attribute node to the element it refers to
Composers and their compositions are linked through the ID-type id attribute of the composer element and the IDREF-type composers attribute of the composition element:
<composer id="c3">
  <name>
    <first_name>Beth</first_name>
    <middle_name></middle_name>
    <last_name>Anderson</last_name>
  </name>
</composer>
<composition composers="c3">
  <title>Trio: Dream in D</title>
  <date><year>(1980)</year></date>
  <length>10'</length>
  <instruments>fl, pn, vc, or vn, pn, vc</instruments>
  <description>
    Rhapsodic. Passionate. Available on CD
    <cite><a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid%3D913265342/sr%3D1-2/">Two by Three</a></cite>
    from North/South Consonance (1998).
  </description>
  <publisher></publisher>
</composition>
With XPath 1.0:
<xsl:template match="composition">
<h2>
<xsl:value-of select="title"/> by
<xsl:value-of select="../composer[@id=current()/@composers]"/>
</h2>
</xsl:template>
With XPath 2.0:
<xsl:template match="composition">
<h2>
<xsl:value-of select="title"/> by
<xsl:value-of select="@composers=>composer/name"/>
</h2>
</xsl:template>
(1, 3, 2, 34, 76, "fnord")
(1 to 12)
Using constructors: (xf:date("2002-03-11"), xf:date("2002-03-12"), xf:date("2002-03-13"),
xf:date("2002-03-14"), xf:date("2002-03-15"))
Sequences can have mixed types: (xf:date("2002-03-11"), "Hello", 15)
Sequences do not nest; that is, a sequence cannot be a member of a sequence
Sequences are not sets: they are ordered and can contain duplicates
A single item is the same as a one-element sequence containing that item.
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <numbers>
      <xsl:for-each select="(1 to 10)">
        <integer>
          <xsl:value-of select="."/>
        </integer>
      </xsl:for-each>
    </numbers>
  </xsl:template>

</xsl:stylesheet>
Output (modulo white space):
<?xml version="1.0" encoding="utf-8"?>
<numbers>
<integer>1</integer>
<integer>2</integer>
<integer>3</integer>
<integer>4</integer>
<integer>5</integer>
<integer>6</integer>
<integer>7</integer>
<integer>8</integer>
<integer>9</integer>
<integer>10</integer>
</numbers>
union or |
Duplicates are eliminated
<?xml version="1.0"?> <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output indent="yes"/> <xsl:template match="/"> <numbers> <xsl:for-each select='(3 to 10) | (5 to 12) | (20 to 23)'> <integer> <xsl:value-of select="."/> </integer> </xsl:for-each> </numbers> </xsl:template> </xsl:stylesheet>
Output:
<numbers>
<integer>3</integer>
<integer>4</integer>
<integer>5</integer>
<integer>6</integer>
<integer>7</integer>
<integer>8</integer>
<integer>9</integer>
<integer>10</integer>
<integer>11</integer>
<integer>12</integer>
<integer>20</integer>
<integer>21</integer>
<integer>22</integer>
<integer>23</integer>
</numbers>
intersect
<?xml version="1.0"?> <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output indent="yes"/> <xsl:template match="/"> <numbers> <xsl:for-each select='(3 to 10) intersect (5 to 12)'> <integer> <xsl:value-of select="."/> </integer> </xsl:for-each> </numbers> </xsl:template> </xsl:stylesheet>
Output:
<numbers>
<integer>5</integer>
<integer>6</integer>
<integer>7</integer>
<integer>8</integer>
<integer>9</integer>
<integer>10</integer>
</numbers>
except
<?xml version="1.0"?> <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output indent="yes"/> <xsl:template match="/"> <numbers> <xsl:for-each select='(3 to 10) except (5 to 12)'> <integer> <xsl:value-of select="."/> </integer> </xsl:for-each> </numbers> </xsl:template> </xsl:stylesheet>
Output:
<numbers>
<integer>3</integer>
<integer>4</integer>
</numbers>
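The three outputs above can be reproduced with a rough Python model of union, intersect, and except. This is a sketch under assumptions: the function names are hypothetical, integers stand in for the sequence members as on the slides, and results keep first-appearance order, which happens to match the outputs shown here.

```python
def union(*sequences):
    """Combine sequences, keeping the first occurrence of each item."""
    seen = set()
    result = []
    for sequence in sequences:
        for item in sequence:
            if item not in seen:
                seen.add(item)
                result.append(item)
    return result


def intersect(left, right):
    """Items of left that also occur in right, without duplicates."""
    right_items = set(right)
    return union([item for item in left if item in right_items])


def except_(left, right):
    """Items of left that do not occur in right, without duplicates."""
    right_items = set(right)
    return union([item for item in left if item not in right_items])


# Python's range(3, 11) plays the role of (3 to 10).
union_result = union(range(3, 11), range(5, 13), range(20, 24))
intersect_result = intersect(range(3, 11), range(5, 13))
except_result = except_(range(3, 11), range(5, 13))
```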
Compare single values and sequences of single or no values:
eq
ne
lt
le
gt
ge
These operators return either true, false, the empty sequence, an error, or a type exception.
Differ from value comparisons in that the condition needs to be true for only some pair of items, one from each sequence
=
!=
<
<=
>
>=
These operators always return either true or false.
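The difference between the two families of operators is easier to see in a small model. This Python sketch is only an approximation of the draft semantics: value_compare and general_compare are hypothetical names, Python lists stand in for sequences, None stands in for the empty sequence, and TypeError stands in for the error case.

```python
import operator


def value_compare(op, left, right):
    """Model of eq, ne, lt, le, gt, ge: operands hold at most one value."""
    if len(left) == 0 or len(right) == 0:
        return None  # stands in for the empty sequence
    if len(left) > 1 or len(right) > 1:
        raise TypeError("value comparison needs single values")
    return op(left[0], right[0])


def general_compare(op, left, right):
    """Model of =, !=, <, <=, >, >=: true if any pair of items satisfies op."""
    return any(op(a, b) for a in left for b in right)


# Both = and != can be true for the same operands,
# because each only needs one satisfying pair.
both_equal = general_compare(operator.eq, [1, 2], [1, 2])
both_not_equal = general_compare(operator.ne, [1, 2], [1, 2])
```

With an empty operand, general_compare returns false while value_compare returns the stand-in for the empty sequence.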
== and != have the same semantics as Java's == operator (identity), not the equals() method (equality)
>> and << compare single nodes and sequences of single nodes for document order
The precedes operator returns true if the first operand node is reachable from the second operand node using the preceding axis; otherwise it returns false.
The follows operator returns true if the first operand node is reachable from the second operand node using the following axis; otherwise it returns false.
Useful for joining documents
Useful for restructuring data.
Syntax:
for $var1 in expression, $var2 in expression...
return expression
Consider the list of weblogs at http://static.userland.com/weblogMonitor/logs.xml
<?xml version="1.0" encoding="ISO-8859-1" ?> <!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd"> <weblogs> <log> <name>MozillaZine</name> <url>http://www.mozillazine.org</url> <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl> <ownerName>Jason Kersey</ownerName> <ownerEmail>kerz@en.com</ownerEmail> <description>THE source for news on the Mozilla Organization. DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description> <imageUrl></imageUrl> <adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif</adImageUrl> </log> <log> <name>SalonHerringWiredFool</name> <url>http://www.salonherringwiredfool.com/</url> <ownerName>Some Random Herring</ownerName> <ownerEmail>salonfool@wiredherring.com</ownerEmail> <description></description> </log> <log> <name>SlashDot.Org</name> <url>http://www.slashdot.org/</url> <ownerName>Simply a friend</ownerName> <ownerEmail>afriendofweblogs@weblogs.com</ownerEmail> <description>News for Nerds, Stuff that Matters.</description> </log> </weblogs>
The changesUrl element points to a document like this:
<?xml version="1.0"?> <!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd"> <rss version="0.91"> <channel> <title>MozillaZine</title> <link>http://www.mozillazine.org/</link> <language>en-us</language> <description>Your source for Mozilla news, advocacy, interviews, builds, and more!</description> <copyright>Copyright 1998-2002, The MozillaZine Organization</copyright> <managingEditor>jason@mozillazine.org</managingEditor> <webMaster>jason@mozillazine.org</webMaster> <image> <title>MozillaZine</title> <url>http://www.mozillazine.org/image/mynetscape88.gif</url> <description>Your source for Mozilla news, advocacy, interviews, builds, and more!</description> <link>http://www.mozillazine.org/</link> </image> <item> <title>BugDays Are Back!</title> <link>http://www.mozillazine.org/talkback.html?article=2151</link> </item> <item> <title>Independent Status Reports</title> <link>http://www.mozillazine.org/talkback.html?article=2150</link> </item> </channel> </rss>
We want to process all the item elements from each weblog.
<xsl:template match="weblogs">
<xsl:apply-templates select="
for $url in log/changesUrl
return document($url)/rss/channel/item
"/>
</xsl:template>
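A for expression evaluates its return clause once per binding and concatenates the results into one flat sequence. The Python sketch below models just that behavior. The name for_expression, the feeds dictionary, and the example.com URL are made up for illustration; a dictionary lookup stands in for document().

```python
def for_expression(bindings, body):
    """Evaluate body once per bound value and splice the results together."""
    result = []
    for value in bindings:
        result.extend(body(value))
    return result


# Hypothetical stand-in for document($url)/rss/channel/item:
# each URL maps to the item titles found in that feed.
feeds = {
    "http://www.mozillazine.org/contents.rdf": [
        "BugDays Are Back!",
        "Independent Status Reports",
    ],
    "http://www.example.com/changes.rss": ["A made-up item"],
}

all_items = for_expression(feeds.keys(), lambda url: feeds[url])
```

The result is a single list of three items rather than a list of two lists, which matches the rule that sequences do not nest.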
if ( expression) then expression else expression
Not all weblogs have a changesUrl
<xsl:template match="log">
<xsl:apply-templates select="
if (changesUrl)
then document(changesUrl)
else document(url)"/>
</xsl:template>
some $QualifiedName in expression satisfies expression
every $QualifiedName in expression satisfies expression
Both return boolean values, true or false
<xsl:template match="weblogs">
<xsl:if test="some $log in log satisfies $log/changesUrl">
????
</xsl:if>
</xsl:template>
<xsl:template match="weblogs">
<xsl:if test="every $log in log satisfies $log/url">
????
</xsl:if>
</xsl:template>
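The quantifiers behave much like Python's any() and all(). The sketch below models the three log elements from the weblogs document shown earlier as dictionaries; that representation is an assumption made for illustration, not anything XPath prescribes.

```python
# Each dictionary stands in for one log element; only MozillaZine
# carries a changesUrl in the sample document.
logs = [
    {
        "name": "MozillaZine",
        "url": "http://www.mozillazine.org",
        "changesUrl": "http://www.mozillazine.org/contents.rdf",
    },
    {"name": "SalonHerringWiredFool", "url": "http://www.salonherringwiredfool.com/"},
    {"name": "SlashDot.Org", "url": "http://www.slashdot.org/"},
]

# some $log in log satisfies $log/changesUrl
some_have_changes_url = any("changesUrl" in log for log in logs)

# every $log in log satisfies $log/url
every_log_has_url = all("url" in log for log in logs)

# every $log in log satisfies $log/changesUrl
every_log_has_changes_url = all("changesUrl" in log for log in logs)
```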
Functions are in the http://www.w3.org/2001/12/xquery-functions namespace, which is customarily mapped to the xf prefix
The function namespace name and prefix are understood in XSLT without being explicitly stated.
Operators are in the http://www.w3.org/2001/12/xquery-operators namespace
XPath implementations such as XQuery and XSLT map the operators to symbols like * and +
xf:node-kind(Node)
xf:name(Node)
xf:string(Object)
xf:data(Node)
xf:base-uri(node)
xf:unique-ID(element)
Create a simple type from a string
Numeric constructors:
xf:decimal(string $srcval) => decimal
xf:integer(string $srcval) => integer
xf:long(string $srcval) => integer
xf:int(string $srcval) => integer
xf:short(string $srcval) => integer
xf:byte(string $srcval) => integer
xf:float(string $srcval) => float
xf:double(string $srcval) => double
String constructors
xf:string(string $srcval) => string
xf:normalizedString(string $srcval) => normalizedString
xf:token(string $srcval) => token
xf:language(string $srcval) => language
xf:Name(string $srcval) => Name
xf:NMTOKEN(string $srcval) => NMTOKEN
xf:NCName(string $srcval) => NCName
xf:ID(string $srcval) => ID
xf:IDREF(string $srcval) => IDREF
xf:ENTITY(string $srcval) => ENTITY
Boolean constructors:
xf:true() => boolean
xf:false() => boolean
xf:boolean-from-string(string $srcval) => boolean
Duration and Datetime constructors:
xf:duration(string $srcval) => duration
xf:dateTime(string $srcval) => dateTime
xf:date(string $srcval) => date
xf:time(string $srcval) => time
xf:gYearMonth(string $srcval) => gYearMonth
xf:gYear(string $srcval) => gYear
xf:gMonthDay(string $srcval) => gMonthDay
xf:gMonth(string $srcval) => gMonth
xf:gDay(string $srcval) => gDay
Constructors for QNames
xf:QName-from-uri(string $paramURI, string $paramLocal) => QName
xf:QName-from-string(string $param) => QName
xf:QName(string $paramLocal) => QName
Constructor for anyURI:
xf:anyURI(string $srcval) => anyURI
Constructors for NOTATION:
xf:NOTATION(string $srcval) => NOTATION
op:multiply(numeric $operand1, numeric $operand2) => numeric
op:numeric-add(numeric $operand1, numeric $operand2) => numeric
op:numeric-subtract(numeric $operand1, numeric $operand2) => numeric
op:numeric-multiply(numeric $operand1, numeric $operand2) => numeric
op:numeric-divide(numeric $operand1, numeric $operand2) => numeric
op:numeric-mod(numeric $operand1, numeric $operand2) => numeric
op:numeric-unary-plus(numeric $operand) => numeric
op:numeric-unary-minus(numeric $operand) => numeric
op:numeric-equal(numeric $operand1, numeric $operand2) => boolean
op:numeric-less-than(numeric $operand1, numeric $operand2) => boolean
op:numeric-greater-than(numeric $operand1, numeric $operand2) => boolean
op:numeric-less-than-or-equal(numeric $operand1, numeric $operand2) => boolean
op:numeric-greater-than-or-equal(numeric $operand1, numeric $operand2) => boolean
op:numeric-not-equal(numeric $operand1, numeric $operand2) => boolean
xf:floor(double? $srcval) => integer?
xf:ceiling(double? $srcval) => integer?
xf:round(double? $srcval) => integer?
xf:compare(string? $comparand1, string? $comparand2) => integer?
xf:compare(string? $comparand1, string? $comparand2, anyURI $collationLiteral) => integer?
xf:concat() => string
xf:concat(string? $op1) => string
xf:concat(string? $op1, string? $op2, ...) => string
xf:starts-with(string? $operand1, string? $operand2) => boolean?
xf:starts-with(string? $operand1, string? $operand2, anyURI $collationLiteral) => boolean?
xf:ends-with(string? $operand1, string? $operand2) => boolean?
xf:ends-with(string? $operand1, string? $operand2, anyURI $collationLiteral) => boolean?
xf:contains(string? $operand1, string? $operand2) => boolean?
xf:contains(string? $operand1, string? $operand2, anyURI $collationLiteral) => boolean?
xf:substring(string? $sourceString, decimal? $startingLoc) => string?
xf:substring(string? $sourceString, decimal? $startingLoc, decimal? $length) => string?
xf:string-length(string? $srcval) => integer?
xf:substring-before(string? $operand1, string? $operand2) => string?
xf:substring-before(string? $operand1, string? $operand2, anyURI $collationLiteral) => string?
xf:substring-after(string? $operand1, string? $operand2) => string?
xf:substring-after(string? $operand1, string? $operand2, anyURI $collationLiteral) => string?
xf:normalize-space(string? $srcval) => string?
xf:normalize-unicode(string? $srcval, string $normalizationForm) => string?
xf:upper-case(string? $srcval) => string?
xf:lower-case(string? $srcval) => string?
xf:translate(string? $srcval, string? $mapString, string? $transString) => string?
xf:string-pad(string? $padString, decimal? $padCount) => string?
xf:match(string? $srcval, string? $regexp) => integer*
xf:replace(string? $srcval, string? $regexp, string? $repval) => string?
Syntax for xf:match() is based on W3C XML Schema Language regular expressions.
Syntax for xf:replace() is based on W3C XML Schema Language regular expressions plus $N in replace patterns to indicate the Nth match.
op:boolean-and(boolean $value1, boolean $value2) => boolean
op:boolean-or(boolean $value1, boolean $value2) => boolean
op:boolean-equal(boolean? $value1, boolean? $value2) => boolean?
xf:not(boolean? $srcval) => boolean
xf:not3(boolean? $srcval) => boolean?
Comparisons of Duration and Datetime Values:
op:duration-equal(duration $operand1, duration $operand2) => boolean
op:duration-less-than(duration $operand1, duration $operand2) => boolean
op:duration-greater-than(duration $operand1, duration $operand2) => boolean
op:duration-less-than-or-equal(duration $operand1, duration $operand2) => boolean
op:duration-greater-than-or-equal(duration $operand1, duration $operand2) => boolean
op:duration-not-equal(duration $operand1, duration $operand2) => boolean
op:datetime-equal(dateTime $operand1, dateTime $operand2) => boolean
op:datetime-less-than(dateTime $operand1, dateTime $operand2) => boolean
op:datetime-greater-than(dateTime $operand1, dateTime $operand2) => boolean
op:datetime-less-than-or-equal(dateTime $operand1, dateTime $operand2) => boolean
op:datetime-greater-than-or-equal(dateTime $operand1, dateTime $operand2) => boolean
op:datetime-not-equal(dateTime $operand1, dateTime $operand2) => boolean
Component Extraction Functions on Datetime Values:
xf:get-Century-from-dateTime(dateTime? $srcval) => integer?
xf:get-Century-from-date(date? $srcval) => integer?
xf:get-hour-from-dateTime(dateTime? $srcval) => integer?
xf:get-hour-from-time(time? $srcval) => integer?
xf:get-minutes-from-dateTime(dateTime? $srcval) => integer?
xf:get-minutes-from-time(time? $srcval) => integer?
xf:get-seconds-from-dateTime(dateTime? $srcval) => decimal?
xf:get-seconds-from-time(time? $srcval) => decimal?
xf:get-timezone-from-dateTime(dateTime? $srcval) => string?
xf:get-timezone-from-date(date? $srcval) => string?
xf:get-timezone-from-time(time? $srcval) => string?
Arithmetic Functions on Dates:
xf:add-days(date? $dateParam, decimal? $incrDays) => date?
Functions and Operators on TimePeriod Values:
op:get-duration(dateTime $parameter1, dateTime $parameter2) => duration
op:get-end-dateTime(dateTime $parameter1, duration $parameter2) => dateTime
xf:get-start-dateTime(dateTime $parameter1, duration $parameter2) => dateTime?
xf:get-local-name(QName? $srcval) => string?
xf:get-namespace-uri(QName? $srcval) => anyURI?
op:hex-binary-equal(hexBinary $value1, hexBinary $value2) => boolean
op:base64-binary-equal(base64Binary $value1, base64Binary $value2) => boolean
xf:local-name() => string
xf:local-name(node $srcval) => string
xf:number() => anySimpleType
xf:number(node $srcval) => anySimpleType
op:node-equal(node $parameter1, node $parameter2) => boolean
xf:deep-equal(node $parameter1, node $parameter2) => boolean
xf:deep-equal(node $parameter1, node $parameter2, anyURI $collation) => boolean
op:node-before(node $parameter1, node $parameter2) => boolean
op:node-after(node $parameter1, node $parameter2) => boolean
xf:copy(node? $srcval) => node?
xf:shallow(node? $srcval) => node?
xf:if-absent((elementNode | attributeNode)? $node, anySimpleType $value) => (elementNode | attributeNode | anySimpleType)?
xf:if-empty((elementNode | attributeNode)? $node, anySimpleType $value) => (elementNode | attributeNode | anySimpleType)
op:to(decimal $firstval, decimal $lastval) => sequence
xf:boolean(item* $srcval) => boolean
op:concatenate(item* $seq1, item* $seq2) => item*
op:item-at(item* $seqParam, decimal $posParam) => item?
xf:index-of(item* $seqParam, item $srchParam) => unsignedInt?
xf:index-of(item* $seqParam, item $srchParam, anyURI $collationLiteral) => unsignedInt?
xf:empty(item* $srcval) => boolean
xf:exists(item* $srcval) => boolean
xf:distinct-nodes(node* $srcval) => node*
xf:distinct-values(item* $srcval) => item*
xf:distinct-values(item* $srcval, anyURI $collationLiteral) => item*
xf:insert(item* $target, decimal $position, item* $inserts) => item*
xf:remove(item* $target, decimal $position) => item*
xf:sublist(item* $sourceSeq, decimal $startingLoc) => item*
xf:sublist(item* $sourceSeq, decimal $startingLoc, decimal $length) => item*
xf:sequence-deep-equal(item* $parameter1, item* $parameter2) => boolean?
xf:sequence-deep-equal(item* $parameter1, item* $parameter2, anyURI $collationLiteral) => boolean?
xf:sequence-node-equal(item*? $parameter1, item*? $parameter2) => boolean?
op:union(item* $parameter1, item* $parameter2) => item*
op:intersect(item* $parameter1, item* $parameter2) => item*
op:except(item* $parameter1, item* $parameter2) => item*
xf:count(item* $srcval) => unsignedInt
xf:avg(item* $srcval) => double?
xf:max(item* $srcval) => anySimpleType?
xf:max(item* $srcval, anyURI $collationLiteral) => anySimpleType?
xf:min(item* $srcval) => anySimpleType?
xf:min(item* $srcval, anyURI $collationLiteral) => anySimpleType?
xf:sum(item* $srcval) => double?
xf:id(IDREF* $srcval) => elementNode*
xf:idref(string* $srcval) => elementNode*
xf:filter(expression $srcval) => node*
xf:document(string? $srcval) => node?
op:context-item() => item
xf:position() => unsignedInt
xf:last() => unsignedInt
op:context-document() => DocumentNode
xf:current-dateTime() => dateTime
Uses XPath 2.0
Schema Aware
Partially implemented by Michael Kay's Saxon 7.0, http://saxon.sourceforge.net/
Simplify manipulation of XML Schema-typed content
Simplify manipulation of string content
Support related XML standards
Improve ease of use
Improve interoperability
Improve i18n support
Maintain backward compatibility
Enable improved processor efficiency
Simplifying the ability to parse unstructured information to produce structured results.
Turning XSLT into a general-purpose programming language
Must maintain backwards compatibility with XSLT 1.1
Should be able to match elements and attributes whose value is explicitly null.
Should allow included documents to encapsulate local stylesheets
Could support accessing infoset items for XML declaration
Could provide qualified name aware string functions
Could enable constructing a namespace with computed name
Could simplify resolving prefix conflicts in qname-valued attributes
Could support XHTML output method
Must allow matching on default namespace without explicit prefix
Must add date formatting functions
Must simplify accessing IDs and keys in other documents
Should provide function to absolutize relative URIs
Should include unparsed text from an external resource
Should allow authoring extension functions in XSLT
Should output character entity references instead of numeric character entities
Should construct entity reference by name
Should support Unicode string normalization
Should standardize extension element language bindings
Could improve efficiency of transformations on large documents
Could support reverse IDREF attributes
Could support case-insensitive comparisons
Could support lexicographic string comparisons
Could allow comparing nodes based on document order
Could improve support for unparsed entities
Could allow processing a node with the "next best matching" template
Could make coercions symmetric by allowing scalar to nodeset conversion
Must support XML schema
Must simplify constructing and copying typed content
Must support sorting nodes based on XML schema type
Could support scientific notation in number formatting
Could provide ability to detect whether "rich" schema information is available
Must simplify grouping
Multiple output documents
Variables can be set to node sets; no more result tree fragments.
Existing elements and functions hardly change at all
Namespace is still http://www.w3.org/1999/XSL/Transform
version attribute of xsl:stylesheet has value 2.0
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Top level elements -->
</xsl:stylesheet>
The result tree fragment data-type has been eliminated.
Variable-binding elements with content now construct node-sets
These node sets can now be operated on by templates
Functionality previously available with saxon:nodeSet() and similar extension functions
Allows pipelining of templates
Like xsl:for-each, but orders elements differently
Works well with flat structures
Replaces Muenchian method
Basic syntax:
<xsl:for-each-group
select = expression
group-by = "string expression"
group-adjacent = "string expression"
group-starting-with = pattern>
<!-- Content: (xsl:sort*, content-constructor) -->
</xsl:for-each-group>
The select attribute selects the population to be grouped.
The group-by attribute calculates a string value for each node in the population.
Nodes with the same value are grouped together.
The group-adjacent attribute calculates a string value for each node in the population.
Every time the value changes, a new group is started.
The group-starting-with attribute starts a new group every time its pattern is matched.
group-by, group-adjacent, and group-starting-with are mutually exclusive.
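The three grouping modes can be compared with a small Python model. This is a sketch under assumptions rather than the XSLT 2.0 algorithm itself: the function names are hypothetical, (title, section) tuples stand in for story elements, and a predicate function stands in for a match pattern.

```python
from itertools import groupby


def group_by(population, key):
    """One group per distinct key value, in order of first appearance."""
    groups = {}
    for item in population:
        groups.setdefault(key(item), []).append(item)
    return list(groups.values())


def group_adjacent(population, key):
    """Start a new group every time the key value changes."""
    return [list(members) for _, members in groupby(population, key=key)]


def group_starting_with(population, matches):
    """Start a new group at every item for which matches(item) is true."""
    groups = []
    for item in population:
        if matches(item) or not groups:
            groups.append([])
        groups[-1].append(item)
    return groups


stories = [
    ("ROX Desktop Update", "developers"),
    ("HP Selling Systems With Linux", "articles"),
    ("Excellent Hacks to the ReplayTV 4000", "articles"),
    ("Re-Building the Wright Flyer", "science"),
    ("How to Fix the Unix Configuration Nightmare", "articles"),
]

by_section = group_by(stories, key=lambda story: story[1])
adjacent_sections = group_adjacent(stories, key=lambda story: story[1])
```

With this data group_by yields three groups, because the last story joins the earlier articles group, while group_adjacent yields four, because the science story interrupts the run of articles.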
Task: Arrange articles in a large, flat document like this by section:
<?xml version="1.0"?> <backslash> <story> <title>ROX Desktop Update</title> <url>http://slashdot.org/article.pl?sid=02/02/18/180240</url> <time>2002-02-18 18:50:13</time> <author>timothy</author> <department>small-simple-swift</department> <topic>104</topic> <comments>32</comments> <section>developers</section> <image>topicx.jpg</image> </story> <story> <title>HP Selling Systems With Linux</title> <url>http://slashdot.org/article.pl?sid=02/02/18/1712241</url> <time>2002-02-18 17:37:20</time> <author>timothy</author> <department>wish-this-wasn't-remarkable</department> <topic>173</topic> <comments>188</comments> <section>articles</section> <image>topichp.gif</image> </story> <story> <title>Excellent Hacks to the ReplayTV 4000</title> <url>http://slashdot.org/article.pl?sid=02/02/18/1619213</url> <time>2002-02-18 16:46:04</time> <author>CmdrTaco</author> <department>hardware-I-lust-after</department> <topic>129</topic> <comments>117</comments> <section>articles</section> <image>topictv.jpg</image> </story> <story> <title>Peek-a-Boo(ty)</title> <url>http://slashdot.org/article.pl?sid=02/02/18/1546226</url> <time>2002-02-18 15:58:06</time> <author>Hemos</author> <department>pirate-treasure</department> <topic>158</topic> <comments>207</comments> <section>articles</section> <image>topicprivacy.gif</image> </story> <story> <title>Self-Shredding E-Mail</title> <url>http://slashdot.org/article.pl?sid=02/02/18/1343202</url> <time>2002-02-18 14:37:45</time> <author>timothy</author> <department>plausible-deniability</department> <topic>158</topic> <comments>170</comments> <section>articles</section> <image>topicprivacy.gif</image> </story> <story> <title>CIA &amp; KGB Gadgets On Display</title> <url>http://slashdot.org/article.pl?sid=02/02/18/0252219</url> <time>2002-02-18 13:52:04</time> <author>Hemos</author> <department>looking-a-tthe-gear</department> <topic>126</topic> <comments>103</comments> <section>articles</section> <image>topictech2.gif</image> </story> <story> 
<title>Re-Building the Wright Flyer</title> <url>http://slashdot.org/article.pl?sid=02/02/18/060257</url> <time>2002-02-18 12:29:12</time> <author>timothy</author> <department>we-hope-they-wear-modern-helmets</department> <topic>126</topic> <comments>132</comments> <section>science</section> <image>topictech2.gif</image> </story> <story> <title>How to Fix the Unix Configuration Nightmare</title> <url>http://slashdot.org/article.pl?sid=02/02/18/0248248</url> <time>2002-02-18 10:48:36</time> <author>Hemos</author> <department>fixing-the-problem</department> <topic>130</topic> <comments>367</comments> <section>articles</section> <image>topicunix.jpg</image> </story> <story> <title>Sleep Less, Live Longer</title> <url>http://slashdot.org/article.pl?sid=02/02/18/0511253</url> <time>2002-02-18 07:38:15</time> <author>timothy</author> <department>if-you're-reading-this</department> <topic>134</topic> <comments>309</comments> <section>science</section> <image>topicscience.gif</image> </story> <story> <title>Warming and Slowing the World</title> <url>http://slashdot.org/article.pl?sid=02/02/18/0243253</url> <time>2002-02-18 04:39:39</time> <author>Hemos</author> <department>slowing-things-down</department> <topic>134</topic> <comments>312</comments> <section>science</section> <image>topicscience.gif</image> </story> </backslash>
<?xml version="1.0"?> <forwardslash> <section> <title>developers</title> <story> <title>ROX Desktop Update</title> <url>http://slashdot.org/article.pl?sid=02/02/18/180240</url> <time>2002-02-18 18:50:13</time> <author>timothy</author> <department>small-simple-swift</department> <topic>104</topic> <comments>32</comments> <image>topicx.jpg</image> </story> </section> <section> <title>articles</title> <story> <title>HP Selling Systems With Linux</title> <url>http://slashdot.org/article.pl?sid=02/02/18/1712241</url> <time>2002-02-18 17:37:20</time> <author>timothy</author> <department>wish-this-wasn't-remarkable</department> <topic>173</topic> <comments>188</comments> <image>topichp.gif</image> </story> <story> <title>Excellent Hacks to the ReplayTV 4000</title> <url>http://slashdot.org/article.pl?sid=02/02/18/1619213</url> <time>2002-02-18 16:46:04</time> <author>CmdrTaco</author> <department>hardware-I-lust-after</department> <topic>129</topic> <comments>117</comments> <image>topictv.jpg</image> </story> <story> <title>Peek-a-Boo(ty)</title> <url>http://slashdot.org/article.pl?sid=02/02/18/1546226</url> <time>2002-02-18 15:58:06</time> <author>Hemos</author> <department>pirate-treasure</department> <topic>158</topic> <comments>207</comments> <image>topicprivacy.gif</image> </story> <story> <title>Self-Shredding E-Mail</title> <url>http://slashdot.org/article.pl?sid=02/02/18/1343202</url> <time>2002-02-18 14:37:45</time> <author>timothy</author> <department>plausible-deniability</department> <topic>158</topic> <comments>170</comments> <image>topicprivacy.gif</image> </story> <story> <title>CIA &amp; KGB Gadgets On Display</title> <url>http://slashdot.org/article.pl?sid=02/02/18/0252219</url> <time>2002-02-18 13:52:04</time> <author>Hemos</author> <department>looking-a-tthe-gear</department> <topic>126</topic> <comments>103</comments> <image>topictech2.gif</image> </story> <story> <title>How to Fix the Unix Configuration Nightmare</title> 
<url>http://slashdot.org/article.pl?sid=02/02/18/0248248</url> <time>2002-02-18 10:48:36</time> <author>Hemos</author> <department>fixing-the-problem</department> <topic>130</topic> <comments>367</comments> <image>topicunix.jpg</image> </story> </section> <section> <title>science</title> <story> <title>Re-Building the Wright Flyer</title> <url>http://slashdot.org/article.pl?sid=02/02/18/060257</url> <time>2002-02-18 12:29:12</time> <author>timothy</author> <department>we-hope-they-wear-modern-helmets</department> <topic>126</topic> <comments>132</comments> <image>topictech2.gif</image> </story> <story> <title>Sleep Less, Live Longer</title> <url>http://slashdot.org/article.pl?sid=02/02/18/0511253</url> <time>2002-02-18 07:38:15</time> <author>timothy</author> <department>if-you're-reading-this</department> <topic>134</topic> <comments>309</comments> <image>topicscience.gif</image> </story> <story> <title>Warming and Slowing the World</title> <url>http://slashdot.org/article.pl?sid=02/02/18/0243253</url> <time>2002-02-18 04:39:39</time> <author>Hemos</author> <department>slowing-things-down</department> <topic>134</topic> <comments>312</comments> <image>topicscience.gif</image> </story> </section> </forwardslash>
<?xml version="1.0"?> <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <forwardslash> <xsl:apply-templates select="*"/> </forwardslash> </xsl:template> <xsl:template match="backslash"> <xsl:for-each-group select="story" group-by="section"> <section> <title><xsl:value-of select="section"/></title> <xsl:apply-templates select="current-group()"/> </section> </xsl:for-each-group> </xsl:template> <xsl:template match="story"> <story> <xsl:apply-templates/> </story> </xsl:template> <xsl:template match="*"> <xsl:copy-of select="."/> </xsl:template> <xsl:template match="section"/> </xsl:stylesheet>
Determines the URI of the principal result tree; i.e. the main output document
Syntax:
<!-- Category: declaration -->
<xsl:destination
format = "QualifiedName"
href = "uri-reference" />
The format attribute names an xsl:output element for this result document.
Determines the URI of a secondary result tree; there can be several of these.
Allows you to generate multiple documents from one source document
Previously available with extension elements like xt:document and saxon:output
Syntax:
<!-- Category: instruction -->
<xsl:result-document
format = "QualifiedName"
href = "uri-reference">
<!-- Content: content-constructor -->
</xsl:result-document>
The format attribute names an xsl:output element for this result document.
<xsl:output name="ccl:html" method="html" encoding="ISO-8859-1" />
<xsl:result-document href="index.html" format="ccl:html">
<html>
<head>
<title><xsl:value-of select="title"/></title>
</head>
<body>
<h1 align="center"><xsl:value-of select="title"/></h1>
<ul>
<xsl:for-each select="slide">
<li><a href="{format-number(position(),'00')}.html"><xsl:value-of select="title"/></a></li>
</xsl:for-each>
</ul>
<p><a href="{translate(title,' ', '_')}.html">Entire Presentation as Single File</a></p>
<hr/>
<div align="center">
<A HREF="01.html">Start</A> | <A HREF="/xml/">Cafe con Leche</A>
</div>
<hr/>
<font size="-1">
Copyright 2002
<a href="http://www.elharo.com/">Elliotte Rusty Harold</a><br/>
<a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a><br/>
Last Modified <xsl:apply-templates select="last_modified" mode="lm"/>
</font>
</body>
</html>
</xsl:result-document>
<xsl:sort-key
name = "QualifiedName">
<!-- Content: (xsl:sort+) -->
</xsl:sort-key>
Attaches an additional namespace node to a result tree element
Rarely necessary; normally the usual XSLT 1.0 namespace declarations are sufficient.
Occasionally useful if the output document uses a namespace prefix exclusively in element content or attribute values
<xsl:namespace name="xsd">http://www.w3.org/2001/XMLSchema</xsl:namespace>
The separator attribute specifies the string placed between the string values of the members of a sequence
<x><xsl:value-of select="(1,2,3,4)" separator=" | "/></x>
<x>1 | 2 | 3 | 4</x>
An attribute that specifies the default namespace in effect for unprefixed element names used in XPath expressions within this element and its descendants
Can be used on literal result elements, in which case it is in the XSLT namespace and the attribute is prefixed as xsl:default-xpath-namespace
XPath expressions must use a prefix to match XHTML element names.
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml" > <xsl:output method="html" encoding="ISO-8859-1"/> <xsl:template match="week"> <html xml:lang="en" lang="en"> <head><title><xsl:value-of select="//html:h1[1]"/></title></head> <body bgcolor="#ffffff" text="#000000"> <xsl:apply-templates select="html:body"/> <font size="-1">Last Modified Mon June 5, 2001<br /> Copyright 2001 Elliotte Rusty Harold<br /> <a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a> </font> </body> </html> </xsl:template> <xsl:template match="html:body"> <xsl:apply-templates select="text()[count(following-sibling::html:hr)>1]|*[count(following-sibling::html:hr)>1]" /> <hr/> </xsl:template> <xsl:template match="html:*"> <xsl:copy> <xsl:for-each select="@*"> <xsl:copy-of select="."/> </xsl:for-each> <xsl:apply-templates/> </xsl:copy> </xsl:template> <xsl:template match="html:font[@size='-1']"></xsl:template> <xsl:template match="html:a"> <xsl:apply-templates/> </xsl:template> <xsl:template match="html:applet"> <xsl:apply-templates/> </xsl:template> <xsl:template match="html:param"/> </xsl:stylesheet>
XPath expressions can use customary, non-prefixed XHTML element names.
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/1999/xhtml" default-xpath-namespace="http://www.w3.org/1999/xhtml" > <xsl:output method="html" encoding="ISO-8859-1"/> <xsl:template match="week"> <html xml:lang="en" lang="en"> <head><title><xsl:value-of select="//h1[1]"/></title></head> <body bgcolor="#ffffff" text="#000000"> <xsl:apply-templates select="body"/> <font size="-1">Last Modified Mon June 5, 2001<br /> Copyright 2001 Elliotte Rusty Harold<br /> <a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a> </font> </body> </html> </xsl:template> <xsl:template match="body"> <xsl:apply-templates select="text()[count(following-sibling::hr)>1]|*[count(following-sibling::hr)>1]"/> <hr/> </xsl:template> <xsl:template match="*"> <xsl:copy> <xsl:for-each select="@*"> <xsl:copy-of select="."/> </xsl:for-each> <xsl:apply-templates/> </xsl:copy> </xsl:template> <xsl:template match="font[@size='-1']"></xsl:template> <xsl:template match="a"> <xsl:apply-templates/> </xsl:template> <xsl:template match="applet"> <xsl:apply-templates/> </xsl:template> <xsl:template match="param"/> </xsl:stylesheet>
Top-level elements in some namespace other than the XSLT namespace (No namespace is not allowed.)
Exact interpretation is processor-specific, but may not change the meaning of customary elements
Possible uses:
Data for extension instructions and extension functions
Information about what to do with the result tree
Information about how to obtain the source tree
Optimization hints
Metadata about the stylesheet
Documentation for the stylesheet
Variables can have a type:
Syntax:
<xsl:variable
name = "QualifiedName"
select = expression
type = datatype>
<!-- Content: content-constructor -->
</xsl:variable>
<xsl:param
name = "QualifiedName"
select = expression
type = datatype>
<!-- Content: content-constructor -->
</xsl:param>
Constants for types remain to be determined
<xsl:function name="math:factorial"
  xmlns:math="http://www.example.com/math"
  exclude-result-prefixes="math">
  <xsl:param name="index" type="xsd:nonNegativeInteger"/>
  <xsl:result type="xsd:positiveInteger"
    select="if ($index eq 0) then 1
            else $index * math:factorial($index - 1)"/>
</xsl:function>
sequence unparsed-text(sequence uris, String encoding?)
For example,
<include_as_text source="bib.xml"/>
<xsl:template match="include_as_text">
<xsl:value-of select="unparsed-text(@source)"/>
</xsl:template>
Three parts:
A data model for XML documents based on the XML Infoset
A mathematically precise query algebra; that is, a set of query operators on that data model
A query language based on these query operators and this algebra
A fourth generation declarative language like SQL; not a procedural language like Java or a functional language like XSLT
Queries operate on single documents or fixed collections of documents.
Queries select whole documents or subtrees of documents that match conditions defined on document content and structure
Can construct new documents based on what is selected
No updates or inserts!
Narrative documents and collections of such documents; e.g. generate a table of contents for a book
Data-oriented documents; e.g. SQL-like queries of an XML dump of a database
Filtering streams to process logs of email messages, network packets, stock market data, newswire feeds, EDI, or weather data to filter and route messages represented in XML, to extract data from XML streams, or to transform data in XML streams.
XML views of non-XML data
Files on a disk
Native-XML databases like Software AG's Tamino
DOM trees in memory
Streaming data
Other representations of the infoset
Direct query tools at command line
GUI query tools
JSP, ASP, PHP, and other such server side technologies
Programs written in Java, C++, and other languages that need to extract data from XML documents
Others are possible
Anywhere SQL is used to extract data from a database, XQuery is used to extract data from an XML document.
SQL is a non-compiled language that must be processed by some other tool to extract data from a database. So is XQuery.
A relational database contains tables | An XML database contains collections |
A relational table contains records with the same schema | A collection contains XML documents with the same DTD |
A relational record is an unordered list of named values | An XML document is a tree of nodes |
A SQL query returns an unordered set of records | An XQuery returns an ordered sequence of nodes |
XML 1.0 #PCDATA
Schema primitive types: positiveInteger, String, float, double, unsignedLong, gYear, date, time, boolean, etc.
Schema complex types
Collections of these types
References to these types
Most of the examples in this talk query this bibliography document at the (relative) URL bib.xml:
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
  <book year="1999">
    <title>The Economics of Technology and Content for Digital TV</title>
    <editor>
      <last>Gerbarg</last><first>Darcy</first>
      <affiliation>CITI</affiliation>
    </editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>
</bib>
Adapted from Mary Fernandez, Jerome Simeon, and Phil Wadler: XML Query Languages: Experiences and Exemplars, 1999, as adapted in XML Query Use Cases
FOR
: each node selected by an XPath 2.0 location path
LET
: a new variable have a specified value
WHERE
: a condition expressed in XPath is true
RETURN
: this node set
FOR $t IN document("bib.xml")/bib/book/title
RETURN
$t
Adapted from XML Query Use Cases
<title>TCP/IP Illustrated</title>
<title>Advanced Programming in the Unix Environment</title>
<title>Data on the Web</title>
<title>The Economics of Technology and Content for Digital TV</title>
Adapted from XML Query Use Cases
An XML Syntax for XQuery
Intended for machine processing and programmer convenience, not for human legibility
In XQuery:
FOR $t IN document("bib.xml")/bib/book/title
RETURN
$t
In XQueryX:
<?xml version="1.0"?>
<xq:query xmlns:xq="http://www.w3.org/2001/06/xqueryx">
<xq:flwr>
<xq:forAssignment variable="$t">
<xq:step axis="CHILD">
<xq:function name="document">
<xq:constant datatype="CHARSTRING">bib.xml</xq:constant>
</xq:function>
<xq:identifier>bib</xq:identifier>
</xq:step>
<xq:step axis="CHILD">
<xq:identifier>book</xq:identifier>
</xq:step>
<xq:step axis="CHILD">
<xq:identifier>title</xq:identifier>
</xq:step>
</xq:forAssignment>
<xq:return>
<xq:variable>$t</xq:variable>
</xq:return>
</xq:flwr>
</xq:query>
Tags are given as literals
XQuery expression which is evaluated to become the contents of the element is enclosed in curly braces
The contents can also contain literal text outside the braces
List titles of all books in a bib
element.
Put each title in a book
element.
<bib>
{
FOR $t IN document("bib.xml")/bib/book/title
RETURN
<book>
{ $t }
</book>
}
</bib>
Adapted from XML Query Use Cases
<bib>
<book>
<title>TCP/IP Illustrated</title>
</book>
<book>
<title>Advanced Programming in the Unix Environment</title>
</book>
<book>
<title>Data on the Web</title>
</book>
<book>
<title>The Economics of Technology and Content for Digital TV</title>
</book>
</bib>
Adapted from XML Query Use Cases
List titles of books published by Addison-Wesley
<bib>
{
FOR $b IN document("bib.xml")/bib/book
WHERE $b/publisher = "Addison-Wesley"
RETURN
$b/title
}
</bib>
This WHERE
clause could be replaced by an XPath predicate:
<bib>
{
FOR $b IN document("bib.xml")/bib/book[publisher="Addison-Wesley"]
RETURN
$b/title
}
</bib>
But WHERE
clauses can combine
multiple variables from multiple documents
Adapted from XML Query Use Cases
<bib>
<title>TCP/IP Illustrated</title>
<title>Advanced Programming in the Unix Environment</title>
</bib>
Adapted from XML Query Use Cases
XQuery booleans include:
AND
OR
NOT()
List books published by Addison-Wesley after 1993:
<bib>
{
FOR $b IN document("bib.xml")/bib/book
WHERE $b/publisher = "Addison-Wesley" AND $b/@year > 1993
RETURN
$b/title
}
</bib>
Adapted from XML Query Use Cases
<bib>
<title>TCP/IP Illustrated</title>
</bib>
Adapted from XML Query Use Cases
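The same filter can be tried today with the XPath 1.0 engine that ships with Java. This is only a sketch for checking the expected result: the class name `BibFilter` and the abbreviated inline copy of bib.xml are assumptions made for illustration, and XPath 1.0 spells the operator `and` in lower case.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class BibFilter {

    // Abbreviated copy of bib.xml: only the fields this query reads.
    static final String BIB =
          "<bib>"
        + "<book year='1994'><title>TCP/IP Illustrated</title>"
        + "<publisher>Addison-Wesley</publisher></book>"
        + "<book year='1992'><title>Advanced Programming in the Unix Environment</title>"
        + "<publisher>Addison-Wesley</publisher></book>"
        + "<book year='2000'><title>Data on the Web</title>"
        + "<publisher>Morgan Kaufmann Publishers</publisher></book>"
        + "</bib>";

    // XPath 1.0 counterpart of:
    //   WHERE $b/publisher = "Addison-Wesley" AND $b/@year > 1993
    static final String QUERY =
        "/bib/book[publisher = 'Addison-Wesley' and @year > 1993]/title";

    // Returns the text of every title element the query selects.
    static List<String> titles(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            QUERY, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titles(BIB));
    }
}
```

Only the 1994 book satisfies both conditions, so the list holds the single title TCP/IP Illustrated.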
List books published by Addison-Wesley after 1993, including their year and title:
<bib>
{
FOR $b IN document("bib.xml")/bib/book
WHERE $b/publisher = "Addison-Wesley" AND $b/@year > 1993
RETURN
<book year = { $b/@year }>
{ $b/title }
</book>
}
</bib>
This is not well-formed XML!
Adapted from XML Query Use Cases
<bib>
<book year="1994">
<title>TCP/IP Illustrated</title>
</book>
</bib>
Adapted from XML Query Use Cases
Create a list of all the title-author pairs, with each pair enclosed in
a result
element.
<results>
{
FOR $b IN document("bib.xml")/bib/book,
$t IN $b/title,
$a IN $b/author
RETURN
<result>
{ $t }
{ $a }
</result>
}
</results>
Adapted from XML Query Use Cases
<results>
<result>
<title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author>
</result>
<result>
<title>Advanced Programming in the Unix Environment</title>
<author><last>Stevens</last><first>W.</first></author>
</result>
<result>
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
</result>
<result>
<title>Data on the Web</title>
<author><last>Buneman</last><first>Peter</first></author>
</result>
<result>
<title>Data on the Web</title>
<author><last>Suciu</last><first>Dan</first></author>
</result>
</results>
Adapted from XML Query Use Cases
For each book in the bibliography, list the title and authors, grouped inside
a result
element.
<results>
{
FOR $b IN document("bib.xml")/bib/book
RETURN
<result>
{ $b/title }
{
FOR $a IN $b/author
RETURN $a
}
</result>
}
</results>
Adapted from XML Query Use Cases
<?xml version="1.0"?>
<results xmlns:ino="http://namespaces.softwareag.com/tamino/response2" xmlns:xql="http://metalab.unc.edu/xql/">
<result>
<title>TCP/IP Illustrated</title>
<author>
<last>Stevens</last>
<first>W.</first>
</author>
</result>
<result>
<title>Advanced Programming in the Unix Environment</title>
<author>
<last>Stevens</last>
<first>W.</first>
</author>
</result>
<result>
<title>Data on the Web</title>
<author>
<last>Abiteboul</last>
<first>Serge</first>
</author>
<author>
<last>Buneman</last>
<first>Peter</first>
</author>
<author>
<last>Suciu</last>
<first>Dan</first>
</author>
</result>
<result>
<title>The Economics of Technology and Content for Digital TV</title>
</result>
</results>
Adapted from XML Query Use Cases
For each author in the bibliography, list the author's name and the titles of
all books by that author, grouped inside a result
element.
<results>
{
FOR $a IN distinct-values(document("bib.xml")//author)
RETURN
<result>
{ $a }
{ FOR $b IN document("bib.xml")/bib/book[author=$a]
RETURN $b/title
}
</result>
}
</results>
Adapted from XML Query Use Cases
<results>
<result>
<author><last>Stevens</last><first>W.</first></author>
<title>TCP/IP Illustrated</title>
<title>Advanced Programming in the Unix Environment</title>
</result>
<result>
<author><last>Abiteboul</last><first>Serge</first></author>
<title>Data on the Web</title>
</result>
<result>
<author><last>Buneman</last><first>Peter</first></author>
<title>Data on the Web</title>
</result>
<result>
<author><last>Suciu</last><first>Dan</first></author>
<title>Data on the Web</title>
</result>
</results>
Adapted from XML Query Use Cases
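For comparison, here is roughly what the same grouping costs in hand-written DOM code. It is a sketch under stated assumptions: `TitlesByAuthor` and its helper are hypothetical names, the inline document is a cut-down bib.xml, and authors are keyed on a "last, first" string, which is a simplification of what `distinct-values` does in the query.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TitlesByAuthor {

    // Cut-down bib.xml: titles plus authors or editors only.
    static final String BIB =
          "<bib>"
        + "<book><title>TCP/IP Illustrated</title>"
        + "<author><last>Stevens</last><first>W.</first></author></book>"
        + "<book><title>Advanced Programming in the Unix Environment</title>"
        + "<author><last>Stevens</last><first>W.</first></author></book>"
        + "<book><title>Data on the Web</title>"
        + "<author><last>Abiteboul</last><first>Serge</first></author>"
        + "<author><last>Buneman</last><first>Peter</first></author>"
        + "<author><last>Suciu</last><first>Dan</first></author></book>"
        + "<book><title>The Economics of Technology and Content for Digital TV</title>"
        + "<editor><last>Gerbarg</last><first>Darcy</first></editor></book>"
        + "</bib>";

    // Maps "last, first" to that author's titles, in first-seen order.
    static Map<String, List<String>> group(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Map<String, List<String>> byAuthor = new LinkedHashMap<>();
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            String title = text(book, "title");
            NodeList authors = book.getElementsByTagName("author");
            for (int j = 0; j < authors.getLength(); j++) {
                Element author = (Element) authors.item(j);
                String name = text(author, "last") + ", " + text(author, "first");
                byAuthor.computeIfAbsent(name, k -> new ArrayList<>()).add(title);
            }
        }
        return byAuthor;
    }

    // Text of the first descendant element with the given name.
    private static String text(Element parent, String child) {
        return parent.getElementsByTagName(child).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        group(BIB).forEach((author, titles) -> System.out.println(author + " -> " + titles));
    }
}
```

The book with only an editor contributes no entry, which matches the query result above.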
List the titles and years of all books published by Addison-Wesley after 1991, in alphabetic order.
<bib>
{
FOR $b IN document("bib.xml")//book
[publisher = "Addison-Wesley" AND @year > "1991"]
RETURN
<book>
{ $b/@year } { $b/title }
</book> SORTBY (title)
}
</bib>
Adapted from XML Query Use Cases
<bib>
<book year="1992">
<title>Advanced Programming in the Unix Environment</title>
</book>
<book year="1994">
<title>TCP/IP Illustrated</title>
</book>
</bib>
Adapted from XML Query Use Cases
Find books in which some element has a tag ending in "or" and the same element contains the string "Suciu" (at any level of nesting). For each such book, return the title and the qualifying element.
<result>
{
FOR $b IN document("bib.xml")//book,
$e IN $b/*[contains(string(.), "Suciu")]
WHERE ends_with(name($e), "or")
RETURN
<book>
{ $b/title } { $e }
</book>
}
</result>
Not supported by Quip yet
Adapted from XML Query Use Cases
<result>
<book>
<title> Data on the Web </title>
<author> <last> Suciu </last> <first> Dan </first> </author>
</book>
</result>
Adapted from XML Query Use Cases
Sample data at "reviews.xml":
<reviews>
  <entry>
    <title>Data on the Web</title>
    <price>34.95</price>
    <review>
      A very good discussion of semi-structured database
      systems and XML.
    </review>
  </entry>
  <entry>
    <title>Advanced Programming in the Unix Environment</title>
    <price>65.95</price>
    <review>
      A clear and detailed discussion of UNIX programming.
    </review>
  </entry>
  <entry>
    <title>TCP/IP Illustrated</title>
    <price>65.95</price>
    <review>
      One of the best books on TCP/IP.
    </review>
  </entry>
</reviews>
Adapted from XML Query Use Cases
<!ELEMENT reviews (entry*)>
<!ELEMENT entry (title, price, review)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT review (#PCDATA)>
For each book found in both bib.xml and reviews.xml, list the title of the book and its price from each source.
<books-with-prices>
{
FOR $b IN document("bib.xml")//book,
$a IN document("reviews.xml")//entry
WHERE $b/title = $a/title
RETURN
<book-with-prices>
{ $b/title }
<price-amazon> { $a/price/text() } </price-amazon>
<price-bn> { $b/price/text() } </price-bn>
</book-with-prices>
}
</books-with-prices>
Adapted from XML Query Use Cases
<books-with-prices>
<book-with-prices>
<title>TCP/IP Illustrated</title>
<price-amazon>65.95</price-amazon>
<price-bn>65.95</price-bn>
</book-with-prices>
<book-with-prices>
<title>Advanced Programming in the Unix Environment</title>
<price-amazon>65.95</price-amazon>
<price-bn>65.95</price-bn>
</book-with-prices>
<book-with-prices>
<title>Data on the Web</title>
<price-amazon>34.95</price-amazon>
<price-bn>39.95</price-bn>
</book-with-prices>
</books-with-prices>
Adapted from XML Query Use Cases
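The two-document join is the part that a single XPath predicate cannot express. A rough Java equivalent, assuming hypothetical names (`PriceJoin`, `elements`, `child`) and trimmed-down copies of bib.xml and reviews.xml, indexes one document by title and then probes it while walking the other:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PriceJoin {

    // Trimmed-down copies of the two sample documents.
    static final String BIB =
          "<bib>"
        + "<book><title>TCP/IP Illustrated</title><price>65.95</price></book>"
        + "<book><title>Data on the Web</title><price>39.95</price></book>"
        + "<book><title>The Economics of Technology and Content for Digital TV</title>"
        + "<price>129.95</price></book>"
        + "</bib>";
    static final String REVIEWS =
          "<reviews>"
        + "<entry><title>Data on the Web</title><price>34.95</price></entry>"
        + "<entry><title>TCP/IP Illustrated</title><price>65.95</price></entry>"
        + "</reviews>";

    static NodeList elements(String xml, String name) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getElementsByTagName(name);
    }

    static String child(Element parent, String name) {
        return parent.getElementsByTagName(name).item(0).getTextContent().trim();
    }

    // One row per title present in both documents: "title: review price / bib price".
    static List<String> join(String bibXml, String reviewsXml) throws Exception {
        Map<String, String> reviewPrices = new HashMap<>();
        NodeList entries = elements(reviewsXml, "entry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            reviewPrices.put(child(entry, "title"), child(entry, "price"));
        }
        List<String> rows = new ArrayList<>();
        NodeList books = elements(bibXml, "book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            String title = child(book, "title");
            String reviewPrice = reviewPrices.get(title);
            if (reviewPrice != null) { // the WHERE $b/title = $a/title condition
                rows.add(title + ": " + reviewPrice + " / " + child(book, "price"));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        join(BIB, REVIEWS).forEach(System.out::println);
    }
}
```

Rows come out in bib.xml order, and the book that has no review is dropped, as in an inner join.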
The next query also uses an input document named "prices.xml":
<prices>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
    <source>www.amazon.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
    <source>www.bn.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>www.amazon.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>www.bn.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>www.amazon.com</source>
    <price>34.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>www.bn.com</source>
    <price>39.95</price>
  </book>
</prices>
Adapted from XML Query Use Cases
In the document "prices.xml", find the minimum price for each book, in the
form of a minprice
element with the book title as its
title
attribute.
<results>
{
FOR $t IN distinct-values(document("prices.xml")/prices/book/title)
LET $p := document("prices.xml")/prices/book[title = $t]/price
RETURN
<minprice title = { $t/text() } >
{ min($p) }
</minprice>
}
</results>
Adapted from XML Query Use Cases
<results>
<minprice title="Advanced Programming in the Unix Environment"> 65.95 </minprice>
<minprice title="TCP/IP Illustrated"> 65.95 </minprice>
<minprice title="Data on the Web"> 34.95 </minprice>
</results>
Adapted from XML Query Use Cases
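The LET clause binds all prices for one title at once, and `min()` then reduces them. A sketch of the same aggregation in Java follows; `MinPrice` is a hypothetical class name, the inline data is a shortened prices.xml, and `BigDecimal` is my choice for exact decimal comparison rather than anything the query language prescribes.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class MinPrice {

    // Shortened prices.xml: two sources for each of two titles.
    static final String PRICES =
          "<prices>"
        + "<book><title>TCP/IP Illustrated</title>"
        + "<source>www.amazon.com</source><price>65.95</price></book>"
        + "<book><title>TCP/IP Illustrated</title>"
        + "<source>www.bn.com</source><price>65.95</price></book>"
        + "<book><title>Data on the Web</title>"
        + "<source>www.amazon.com</source><price>34.95</price></book>"
        + "<book><title>Data on the Web</title>"
        + "<source>www.bn.com</source><price>39.95</price></book>"
        + "</prices>";

    // Lowest price seen for each distinct title, in first-seen order.
    static Map<String, BigDecimal> minimumByTitle(String xml) throws Exception {
        NodeList books = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getElementsByTagName("book");
        Map<String, BigDecimal> minimum = new LinkedHashMap<>();
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            String title = text(book, "title");
            BigDecimal price = new BigDecimal(text(book, "price"));
            // Keep the smaller of the stored price and the new one.
            minimum.merge(title, price, BigDecimal::min);
        }
        return minimum;
    }

    private static String text(Element parent, String child) {
        return parent.getElementsByTagName(child).item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        minimumByTitle(PRICES).forEach((title, price) ->
            System.out.println(title + ": " + price));
    }
}
```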
For each book with an author, return a
book
with its title and authors. For
each book with an editor, return a
reference
with the book title and the
editor's affiliation.
<bib>
{
FOR $b IN document("bib.xml")//book[author]
RETURN
<book>
{ $b/title }
{ $b/author }
</book>,
FOR $b IN document("bib.xml")//book[editor]
RETURN
<reference>
{ $b/title }
<org> { $b/editor/affiliation/text() } </org>
</reference>
}
</bib>
Adapted from XML Query Use Cases
<bib>
<book>
<title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author>
</book>
<book>
<title>Advanced Programming in the Unix Environment</title>
<author><last>Stevens</last><first>W.</first></author>
</book>
<book>
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
</book>
<reference>
<title>The Economics of Technology and Content for Digital TV</title>
<org>CITI</org>
</reference>
</bib>
Adapted from XML Query Use Cases
Quilt: http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html
Kweelt: http://db.cis.upenn.edu/Kweelt/
Ipedo: http://www.ipedo.com/
XSLT is document-driven; XQuery is program-driven
XSLT is functional; XQuery is imperative
XSLT is written in XML; XQuery is not
An assertion (unproven): XSLT 2.0 can do everything XQuery can do
This presentation: http://www.cafeconleche.org/slides/xmlone/london2002/advancedxml/
XSLT 2.0 Working Draft: http://www.w3.org/TR/xslt20/
XPath 2.0 Requirements: http://www.w3.org/TR/2001/WD-xpath20req-20010214
XSLT 2.0 Requirements: http://www.w3.org/TR/2001/WD-xslt20req-20010214
XQuery: A Query Language for XML: http://www.w3.org/TR/xquery/
XML Query Requirements: http://www.w3.org/TR/xmlquery-req
XML Query Use Cases: http://www.w3.org/TR/xmlquery-use-cases
XML Query Data Model: http://www.w3.org/TR/query-datamodel/
The XML Query Algebra: http://www.w3.org/TR/query-algebra/
XML Syntax for XQuery 1.0 (XQueryX): http://www.w3.org/TR/xqueryx
XQuery 1.0 and XPath 2.0 Functions and Operators Version 1.0: http://www.w3.org/TR/xquery-operators/
Actually, SAX2 has ** MUCH ** better infoset support than DOM does. Yes, I've done the detailed analysis.
--David Brownell on the xml-dev mailing list
Full Infoset support
Backwards compatible with SAX2
Much less radical changes than from SAX1 to SAX2
Infoset includes a flag saying whether a given attribute value was specified in the instance document or defaulted from the DTD.
DOM also wants to know this
Solution:
package org.xml.sax.ext;
public interface Attributes2 extends Attributes {
public boolean isSpecified (int index);
public boolean isSpecified (String uri, String localName);
public boolean isSpecified (String qualifiedName);
}
This interface would be implemented by SAX 2.1
Attributes
objects provided in
startElement()
callbacks
The read-only
http://xml.org/sax/features/use-attributes2 feature
specifies whether Attributes2
is available
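A sketch of how an application might use the proposed interface. `Attributes2` with these three `isSpecified()` overloads did later ship in `org.xml.sax.ext`, and the parser bundled with current JDKs passes such objects to `startElement()`; the class name `SpecifiedAttributes` and the sample document are assumptions made for this example. The `instanceof` check keeps the code safe with parsers that only supply plain `Attributes`.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.helpers.DefaultHandler;

final class SpecifiedAttributes {

    // "lang" is defaulted by the internal DTD subset; "year" is written in the instance.
    static final String DOC =
          "<!DOCTYPE SONG [<!ATTLIST SONG lang CDATA 'en'>]>"
        + "<SONG year='1978'/>";

    // Maps each attribute's qualified name to whether it was specified in the document.
    static Map<String, Boolean> specified(String xml) throws Exception {
        Map<String, Boolean> result = new LinkedHashMap<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (!(atts instanceof Attributes2)) {
                    return; // parser does not expose the extension
                }
                Attributes2 atts2 = (Attributes2) atts;
                for (int i = 0; i < atts.getLength(); i++) {
                    result.put(atts.getQName(i), atts2.isSpecified(i));
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(specified(DOC));
    }
}
```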
<?xml version="1.0" standalone="yes"?>
The XML Infoset includes a standalone property for documents
Not currently exposed by SAX2
Solution: Define a new read-only feature: http://xml.org/sax/features/is-standalone
Open issue: distinguish between standalone="no"
and omitted standalone declaration?
<?xml version="1.0" encoding="UTF-16"?>
Infoset includes the version and encoding from the XML declaration; SAX2 does not.
Unlike standalone, these apply to all parsed entities; not just the document entity
Solution:
package org.xml.sax.ext;
public interface Locator2 extends Locator {
public String getXMLVersion ();
public String getEncoding ();
}
This would be implemented by
Locator
objects passed to
setDocumentLocator()
methods
The read-only feature http://xml.org/sax/features/use-locator2
says whether Locator2
's are used.
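A minimal sketch of reading these values from a handler, assuming the parser hands out `Locator2` objects (the one bundled with current JDKs does). `EntityInfo` is a hypothetical class name. The values are read in `startElement()` rather than `startDocument()`, on the assumption that the XML declaration has certainly been processed by then; what `getEncoding()` reports for a given input is left unasserted here because it depends on the parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

final class EntityInfo {

    // Returns { XML version, encoding } as reported for the document entity,
    // or nulls if the parser does not supply a Locator2.
    static String[] versionAndEncoding(byte[] xml) throws Exception {
        String[] info = new String[2];
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Record the values once, at the root element.
                if (info[0] == null && locator instanceof Locator2) {
                    Locator2 locator2 = (Locator2) locator;
                    info[0] = locator2.getXMLVersion();
                    info[1] = locator2.getEncoding();
                }
            }
        });
        reader.parse(new InputSource(new ByteArrayInputStream(xml)));
        return info;
    }

    public static void main(String[] args) throws Exception {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><SONG/>"
            .getBytes(StandardCharsets.ISO_8859_1);
        String[] info = versionAndEncoding(doc);
        System.out.println("version=" + info[0] + " encoding=" + info[1]);
    }
}
```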
To make matters worse, there can be as many as three encodings:
What's declared in the document using an encoding declaration in the XML declaration
The MIME type encoding, as specified by the HTTP header
The name of the encoding used by a java.io.InputStreamReader
(UTF8 vs. UTF-8)
There's no way to find out what features
and properties a given XMLReader
recognizes.
Solution: Define two new read-only properties:
XMLReader
.
XMLReader
.
Or perhaps a method instead of a property?
The DeclHandler
and LexicalHandler
extension handlers are not supported by the
DefaultHandler
convenience class.
Solution:
Define a new org.xml.sax.ext class
implementing those two
interfaces, inheriting from
org.xml.sax.helpers.DefaultHandler
public class DefaultHandler2 extends DefaultHandler
implements DeclHandler, LexicalHandler {
// LexicalHandler methods
public void startDTD(String name, String publicId, String systemId)
throws SAXException {}
public void endDTD() throws SAXException {}
public void startEntity(String name) throws SAXException {}
public void endEntity(String name) throws SAXException {}
public void startCDATA() throws SAXException {}
public void endCDATA() throws SAXException {}
public void comment(char[] ch, int start, int length)
throws SAXException {}
// DeclHandler methods
public void elementDecl(String name, String model)
throws SAXException {}
public void attributeDecl(String elementName,
String attributeName, String type,
String valueDefault, String value)
throws SAXException {}
public void internalEntityDecl(String name, String value)
throws SAXException {}
public void externalEntityDecl(String name, String publicID,
String systemID) throws SAXException {}
}
Alternately,
update DefaultHandler
.
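A class along these lines did end up in `org.xml.sax.ext` as `DefaultHandler2`. The sketch below shows the convenience it buys: override only the callback of interest, then register the same object as both content handler and lexical handler. `CommentCounter` is a hypothetical name for this example; the `lexical-handler` property URI is the standard SAX2 one.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

final class CommentCounter extends DefaultHandler2 {

    private int comments;

    // The only LexicalHandler callback this example cares about;
    // every other callback keeps its inherited do-nothing body.
    @Override
    public void comment(char[] ch, int start, int length) {
        comments++;
    }

    // Parses the document and returns how many comments it contains.
    static int count(String xml) throws Exception {
        CommentCounter handler = new CommentCounter();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // Lexical events are delivered only to a handler registered through this property.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.comments;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("<SONG><!-- one --><TITLE/><!-- two --></SONG>"));
    }
}
```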
Problem: There is no conventional way for applications to identify the version of the parser they are using, for purposes of diagnostics or other kinds of troubleshooting.
The best the JVM supports is the JDK 1.2
java.lang.Package
facility,
which is dependent on the JAR file metadata. It provides a partial solution, at
the price of portability (JDK 1.1 APIs are much more portable) and
assumptions like "one parser per package".
Solution: Define a new standard read-only property:
Returns a string identifying the reader and its version for use in diagnostics.
Parsers could support that if desired, probably using some sort of resource-based mechanism (not necessarily Package) to keep such release-specific strings out of the source code.
Open issue: Should there be separate strings to ID the reader (likely a constant value) and its version (ideally assigned in release engineering)?
package org.jdom;
public final class Verifier {
public static final String checkElementName(String name) {}
public static final String checkAttributeName(String name) {}
public static final String checkCharacterData(String text) {}
public static final String checkNamespacePrefix(String prefix) {}
public static final String checkNamespaceURI(String uri) {}
public static final String checkProcessingInstructionTarget(String target) {}
public static final String checkCommentData(String data) {}
public static boolean isXMLCharacter(char c) {}
public static boolean isXMLNameCharacter(char c) {}
public static boolean isXMLNameStartCharacter(char c) {}
public static boolean isXMLLetterOrDigit(char c) {}
public static boolean isXMLLetter(char c) {}
public static boolean isXMLCombiningChar(char c) {}
public static boolean isXMLExtender(char c) {}
public static boolean isXMLDigit(char c) {}
}
Subscribe to the xml-dev mailing list, http://lists.xml.org/archives/xml-dev/
of all of the things the W3C has given us, the DOM is probably the one with the least value.
--Michael Brennan on the xml-dev mailing list
DOM Level 0: what was implemented for JavaScript in Netscape 3/IE3
DOM Level 1, a W3C Standard
DOM Level 2, a W3C Standard
DOM Level 3: Several Working Drafts:
Grammar access; a.k.a abstract schemas (DTDs, RELAX NG schemas, W3C XML Schema Language schemas)
Extra (IDL) attributes on Entity
,
Document
, Node
,
and Text
interfaces
Document normalization
Standard means of loading and saving XML documents.
Bootstrapping new documents
Key events
DOMKey
Node
Document
Text
Entity
Bootstrapping
Every node gets a unique key automatically generated by the DOM implementation to uniquely identify DOM nodes.
For Java this is just an
Object
. (Implementations may use more detailed subclasses.)
Adds:
I will only show the new methods. Currently, the plan is to
simply add these to the existing Node
interface.
Java binding:
package org.w3c.dom;
public interface Node {
public String getBaseURI();
public static final short TREE_POSITION_PRECEDING = 0x01;
public static final short TREE_POSITION_FOLLOWING = 0x02;
public static final short TREE_POSITION_ANCESTOR = 0x04;
public static final short TREE_POSITION_DESCENDANT = 0x08;
public static final short TREE_POSITION_EQUIVALENT = 0x10;
public static final short TREE_POSITION_SAME_NODE = 0x20;
public static final short TREE_POSITION_DISCONNECTED = 0x00;
public int compareTreePosition(Node other) throws DOMException;
public String getTextContent() throws DOMException;
public void setTextContent(String textContent) throws DOMException;
public Object setUserData(String key, Object data, UserDataHandler handler);
public Object getUserData(String key);
public boolean isSameNode(Node other);
public boolean isEqualNode(Node arg, boolean deep);
public String lookupNamespacePrefix(String namespaceURI);
public String lookupNamespaceURI(String prefix);
public void normalizeNS();
public Node getAs(String feature);
public Object getKey();
}
In IDL:
interface Node {
readonly attribute DOMString baseURI;
const unsigned short TREE_POSITION_PRECEDING = 0x01;
const unsigned short TREE_POSITION_FOLLOWING = 0x02;
const unsigned short TREE_POSITION_ANCESTOR = 0x04;
const unsigned short TREE_POSITION_DESCENDANT = 0x08;
const unsigned short TREE_POSITION_EQUIVALENT = 0x10;
const unsigned short TREE_POSITION_SAME_NODE = 0x20;
const unsigned short TREE_POSITION_DISCONNECTED = 0x00;
unsigned short compareTreePosition(in Node other);
attribute DOMString textContent; // raises(DOMException) on setting
// raises(DOMException) on retrieval
boolean isSameNode(in Node other);
DOMString lookupNamespacePrefix(in DOMString namespaceURI);
DOMString lookupNamespaceURI(in DOMString prefix);
boolean isEqualNode(in Node arg, in boolean deep);
Node getInterface(in DOMString feature);
DOMKeyObject setUserData(in DOMString key, in DOMKeyObject data,
in UserDataHandler handler);
DOMKeyObject getUserData(in DOMString key);
};
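Several of these methods changed names before DOM Level 3 Core became a Recommendation: `compareTreePosition()` became `compareDocumentPosition()`, `lookupNamespacePrefix()` became `lookupPrefix()`, and `isEqualNode()` lost its `deep` argument. The sketch below uses the final names as found in the JDK's `org.w3c.dom.Node`; the class name `Dom3NodeDemo` and the sample document are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class Dom3NodeDemo {

    static final String SONG =
          "<SONG xmlns:xlink='http://www.w3.org/1999/xlink'>"
        + "<TITLE>Hot Cop</TITLE><LENGTH>6:20</LENGTH>"
        + "</SONG>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for the namespace lookups
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // True if the root's first child precedes its last child in document order.
    static boolean firstPrecedesLast(Document doc) {
        Element root = doc.getDocumentElement();
        Node first = root.getFirstChild();
        Node last = root.getLastChild();
        // The result describes the argument's position relative to the receiver.
        short position = last.compareDocumentPosition(first);
        return (position & Node.DOCUMENT_POSITION_PRECEDING) != 0;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SONG);
        Element root = doc.getDocumentElement();
        System.out.println(root.getTextContent());            // text of all descendants
        System.out.println(root.lookupNamespaceURI("xlink")); // prefix to URI
        System.out.println(root.lookupPrefix("http://www.w3.org/1999/xlink"));
        Node copy = root.cloneNode(true);
        System.out.println(copy.isEqualNode(root)); // same content
        System.out.println(copy.isSameNode(root));  // same object
        root.setUserData("reviewed", Boolean.TRUE, null); // no UserDataHandler
        System.out.println(root.getUserData("reviewed"));
        System.out.println(firstPrecedesLast(doc));
    }
}
```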
XML documents may be built from multiple parsed entities, each of which is not necessarily a well-formed XML document, but is at least a plausible part of a well-formed XML document.
Each entity may have its own text declaration.
This is like an XML declaration without a standalone
attribute
and with an optional version
attribute:
<?xml version="1.0"?>
<?xml version="1.0" encoding="ISO-8859-9"?>
<?xml encoding="ISO-8859-9"?>
DOM3 adds:
Java binding:
package org.w3c.dom;
public interface Entity extends Node {
public String getActualEncoding();
public void setActualEncoding(String actualEncoding);
public String getEncoding();
public void setEncoding(String encoding);
public String getVersion();
public void setVersion(String version);
}
In IDL:
interface Entity : Node {
attribute DOMString actualEncoding;
attribute DOMString encoding;
attribute DOMString version;
};
Adds:
<?xml version="1.0"?>
<?xml version="1.0" encoding="ISO-8859-9"?>
<?xml version="1.0" encoding="ISO-8859-9" standalone="no"?>
<?xml version="1.0" standalone="yes"?>
adoptNode()
setBaseURI()
renameNode()
DOMErrorHandler
that will be called in the event "that an error is
encountered while performing an operation on a document"
Java binding:
package org.w3c.dom;
public interface Document extends Node {
public String getActualEncoding();
public void setActualEncoding(String actualEncoding);
public String getEncoding();
public void setEncoding(String encoding);
public boolean getStandalone();
public void setStandalone(boolean standalone);
public boolean getStrictErrorChecking();
public void setStrictErrorChecking(boolean strictErrorChecking);
public String getVersion();
public void setVersion(String version);
public Node adoptNode(Node source) throws DOMException;
public DOMErrorHandler getErrorHandler();
public void setErrorHandler(DOMErrorHandler errorHandler);
public String getDocumentURI();
public void setDocumentURI(String documentURI);
public void normalizeDocument();
public boolean canSetNormalizationFeature(String name, boolean state);
public void setNormalizationFeature(String name, boolean state)
throws DOMException;
public boolean getNormalizationFeature(String name)
throws DOMException;
public Node renameNode(Node n, String namespaceURI, String name)
throws DOMException;
}
In IDL:
interface Document : Node {
attribute DOMString actualEncoding;
attribute DOMString encoding;
attribute boolean standalone;
attribute boolean strictErrorChecking;
attribute DOMString version;
Node adoptNode(in Node source) raises(DOMException);
void setBaseURI(in DOMString baseURI) raises(DOMException);
attribute DOMErrorHandler errorHandler;
attribute DOMString documentURI;
void normalizeDocument();
boolean canSetNormalizationFeature(in DOMString name, in boolean state);
void setNormalizationFeature(in DOMString name, in boolean state) raises(DOMException);
boolean getNormalizationFeature(in DOMString name) raises(DOMException);
Node renameNode(in Node n, in DOMString namespaceURI, in DOMString name) raises(DOMException);
};
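In the final DOM Level 3 Core binding the XML declaration accessors are spelled `getXmlVersion()`, `getXmlStandalone()`, `getXmlEncoding()` and `getInputEncoding()`, while `adoptNode()` and `renameNode()` kept their draft names. A small sketch against the JDK's DOM follows, with `Dom3DocumentDemo` and its helper names invented for the example:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class Dom3DocumentDemo {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Renames the root element in place and returns its new name.
    static String renameRoot(Document doc, String newName) {
        Node renamed = doc.renameNode(doc.getDocumentElement(), null, newName);
        return renamed.getNodeName();
    }

    // Moves the root of source under the root of target.
    // Returns true when the moved node now belongs to target.
    static boolean adoptRoot(Document target, Document source) {
        Node moved = target.adoptNode(source.getDocumentElement());
        if (moved == null) {
            return false; // the implementation declined to adopt the node
        }
        target.getDocumentElement().appendChild(moved);
        return moved.getOwnerDocument() == target;
    }

    public static void main(String[] args) throws Exception {
        Document song = parse(
            "<?xml version='1.0' standalone='yes'?><SONG><TITLE>Hot Cop</TITLE></SONG>");
        System.out.println(song.getXmlVersion());
        System.out.println(song.getXmlStandalone());
        System.out.println(renameRoot(song, "TRACK"));
        Document album = parse("<ALBUM/>");
        System.out.println(adoptRoot(album, song));
    }
}
```

Unlike `importNode()` from DOM2, `adoptNode()` moves the node rather than copying it, so the source document loses its root element here.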
Adds:
isWhitespaceInElementContent()
wholeText()
Java binding:
package org.w3c.dom;
public interface Text extends Node {
public boolean getIsWhitespaceInElementContent();
public String getWholeText();
public Text replaceWholeText(String content) throws DOMException;
}
In IDL:
interface Text : Node {
readonly attribute boolean isWhitespaceInElementContent;
readonly attribute DOMString wholeText;
Text replaceWholeText(in DOMString content) raises(DOMException);
};
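The point of `wholeText` is that one run of character data may be split across several adjacent `Text` nodes. The sketch below builds such a split by hand and then reads and replaces the run as a whole; `WholeTextDemo` and `twoAdjacentTextNodes()` are hypothetical names. In the final Recommendation the whitespace flag is spelled `isElementContentWhitespace()`, which is not exercised here because it needs a DTD to be meaningful.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;

final class WholeTextDemo {

    // Builds <TITLE> holding two adjacent text nodes and returns the first one.
    static Text twoAdjacentTextNodes() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element title = doc.createElement("TITLE");
        doc.appendChild(title);
        Text first = doc.createTextNode("Hot ");
        title.appendChild(first);
        title.appendChild(doc.createTextNode("Cop"));
        return first;
    }

    public static void main(String[] args) throws Exception {
        Text first = twoAdjacentTextNodes();
        Node title = first.getParentNode();
        System.out.println(first.getData());      // this node only
        System.out.println(first.getWholeText()); // this node plus adjacent text
        first.replaceWholeText("Y.M.C.A.");       // collapses the run into one node
        System.out.println(title.getChildNodes().getLength());
        System.out.println(title.getTextContent());
    }
}
```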
DOM2 has no implementation-independent means to create
a new Document
object
Implementation-dependent methods tend to be fairly complex. For example, in Xerces-J:
DOMImplementation impl = DOMImplementationImpl.getDOMImplementation();
Document fibonacci = impl.createDocument(null, "Fibonacci_Numbers", null);
Still no language-independent means to create
a new Document
object
Does provide an implementation-independent method for Java only:
DOMImplementation impl = DOMImplementationRegistry.getDOMImplementation("XML");
package org.w3c.dom;
public class DOMImplementationRegistry {
// The system property to specify the DOMImplementationSource class names.
public static String PROPERTY = "org.w3c.dom.DOMImplementationSourceList";
public static DOMImplementation getDOMImplementation(String features)
throws ClassNotFoundException, InstantiationException, IllegalAccessException;
public static void addSource(DOMImplementationSource s)
throws ClassNotFoundException, InstantiationException, IllegalAccessException;
}
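By the time DOM Level 3 Core was finished, the registry had moved to the `org.w3c.dom.bootstrap` package and become instance-based: call `newInstance()` first, then `getDOMImplementation()`. The following sketch uses that final form as shipped in the JDK; the class name `Bootstrap` and the feature string "XML 3.0" are choices made for this example.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

final class Bootstrap {

    // Creates an empty document with the given root element name
    // without naming any parser-specific class.
    static Document newDocument(String rootName) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementation impl = registry.getDOMImplementation("XML 3.0");
        if (impl == null) {
            throw new IllegalStateException("No DOM implementation supports XML 3.0");
        }
        // Arguments: namespace URI, qualified root name, document type.
        return impl.createDocument(null, rootName, null);
    }

    public static void main(String[] args) throws Exception {
        Document fibonacci = newDocument("Fibonacci_Numbers");
        System.out.println(fibonacci.getDocumentElement().getTagName());
    }
}
```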
DOMErrorHandler
DOMLocator
Similar to SAX2's ErrorHandler
interface.
A callback interface
An application implements this interface and
then registers it with the setErrorHandler()
method to provide
warnings, errors, and fatal errors.
Java binding:
package org.w3c.dom;
public interface DOMErrorHandler {
public boolean handleError(DOMError error);
}
IDL:
interface DOMErrorHandler {
boolean handleError(in DOMError error);
};
Similar to SAX2's Locator
interface.
An application can implement this interface and
then register it with the setLocator()
method to
find out in which line and column and file a given
node appears.
Java binding:
package org.w3c.dom;
public interface DOMLocator {
public int getLineNumber();
public int getColumnNumber();
public int getOffset();
public Node getErrorNode();
public String getUri();
}
IDL:
interface DOMLocator {
readonly attribute long lineNumber;
readonly attribute long columnNumber;
readonly attribute long offset;
readonly attribute Node errorNode;
readonly attribute DOMString uri;
};
Loading: parsing an existing XML document
to produce a Document
object
Saving: serializing a Document
object
into a file or onto a stream
Completely implementation dependent in DOM2
Library specific code creates a parser
The parser parses the document and returns a DOM
org.w3c.dom.Document
object.
The entire document is stored in memory.
DOM methods and interfaces are used to extract data from this object
This program parses with Xerces. Other parsers are different.
import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMParserMaker {

  public static void main(String[] args) {

    DOMParser parser = new DOMParser();
    for (int i = 0; i < args.length; i++) {
      try {
        parser.parse(args[i]);
        Document d = parser.getDocument();
      }
      catch (SAXException e) {
        System.err.println(e);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }

  }

}
import javax.xml.parsers.*; // JAXP
import org.xml.sax.SAXException;
import java.io.IOException;

public class JAXPParserMaker {

  public static void main(String[] args) {

    if (args.length <= 0) {
      System.out.println("Usage: java JAXPParserMaker URL");
      return;
    }
    String document = args[0];

    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      DocumentBuilder parser = factory.newDocumentBuilder();
      parser.parse(document);
      System.out.println(document + " is well-formed.");
    }
    catch (SAXException e) {
      System.out.println(document + " is not well-formed.");
    }
    catch (IOException e) {
      System.out.println(
       "Due to an IOException, the parser could not check " + document
      );
    }
    catch (FactoryConfigurationError e) {
      // JAXP suffers from excessive brain-damage caused by
      // intellectual in-breeding at Sun. (Basically the Sun
      // engineers spend way too much time talking to each other
      // and not nearly enough time talking to people outside
      // Sun.) Fortunately, you can happily ignore most of the
      // JAXP brain damage and not be any the poorer for it.
      // This, however, is one of the few problems you can't
      // avoid if you're going to use JAXP at all.
      // DocumentBuilderFactory.newInstance() should throw a
      // ClassNotFoundException if it can't locate the factory
      // class. However, what it does throw is an Error,
      // specifically a FactoryConfigurationError. Very few
      // programs are prepared to respond to errors as opposed
      // to exceptions. You should catch this error in your
      // JAXP programs as quickly as possible even though the
      // compiler won't require you to, and you should
      // never rethrow it or otherwise let it escape from the
      // method that produced it.
      System.out.println("Could not locate a factory class");
    }
    catch (ParserConfigurationException e) {
      System.out.println("Could not locate a JAXP parser");
    }

  }

}
import org.w3c.dom.*;
import org.w3c.dom.ls.*;

public class DOM3ParserMaker {

  public static void main(String[] args) {
    DOMImplementation impl
     = DOMImplementationRegistry.getDOMImplementation("XML 1.0 LS-Load 3.0");
    DOMImplementationLS implls = (DOMImplementationLS) impl;
    // createDOMBuilder() is the factory method declared in DOMImplementationLS
    DOMBuilder parser
     = implls.createDOMBuilder(DOMImplementationLS.MODE_SYNCHRONOUS);
    for (int i = 0; i < args.length; i++) {
      try {
        Document d = parser.parseURI(args[i]);
      }
      catch (DOMSystemException e) {
        System.err.println(e);
      }
      catch (DOMException e) {
        System.err.println(e);
      }
    }
  }

}
This code will not actually compile or run until some parser supports DOM3 Load and Save.
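DOMImplementationLS: a new DOMImplementation interface that provides the factory methods for creating the objects required for loading and saving.
DOMBuilder: a parser interface.
DOMInputSource: the source a document is read from, like SAX's InputSource.
DOMEntityResolver: lets applications redirect references to external entities.
DOMBuilderFilter: lets applications examine Element nodes as they are being processed during the parsing of a document, like SAX filters.
DOMWriter: serializes a DOM document.
DOMWriterFilter: lets applications examine nodes as they are being output.
DocumentLS
ParseErrorEvent
LSLoadEvent
LSProgressEvent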
Factory interface to create new DOMBuilder and DOMWriter implementations.
Java Binding:
package org.w3c.dom.ls;
public interface DOMImplementationLS {
public static final short MODE_SYNCHRONOUS = 1;
public static final short MODE_ASYNCHRONOUS = 2;
public DOMBuilder createDOMBuilder(short mode) throws DOMException;
public DOMWriter createDOMWriter();
public DOMInputSource createDOMInputSource();
}
IDL:
interface DOMImplementationLS {
const unsigned short MODE_SYNCHRONOUS = 1;
const unsigned short MODE_ASYNCHRONOUS = 2;
DOMBuilder createDOMBuilder(in unsigned short mode)
raises(DOMException);
DOMWriter createDOMWriter();
DOMInputSource createDOMInputSource();
};
Use the feature "LS-Load" to find a DOMImplementation object that supports Load and Save.
Cast the DOMImplementation object to DOMImplementationLS.
DOMImplementation impl
= DOMImplementationRegistry.getDOMImplementation("XML 1.0 LS-Load 3.0");
if (impl != null) {
DOMImplementationLS implls = (DOMImplementationLS) impl;
// ...
}
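The names above come from the working draft. For comparison, here is a sketch of the same two steps against the `org.w3c.dom.ls` and `org.w3c.dom.bootstrap` packages that ship in the JDK, where the registry is obtained with `newInstance()` and the feature is called "LS" rather than "LS-Load". The class name `LoadSaveBootstrap` and the helper `findLoadSave()` are illustrative only.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;

class LoadSaveBootstrap {

  // Returns the Load and Save factory, or null if no registered
  // implementation reports the "LS" feature.
  static DOMImplementationLS findLoadSave() {
    try {
      DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
      DOMImplementation impl = registry.getDOMImplementation("LS");
      if (impl == null || !impl.hasFeature("LS", "3.0")) {
        return null;
      }
      return (DOMImplementationLS) impl;
    }
    catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Could not create a DOM registry", e);
    }
  }

  public static void main(String[] args) {
    DOMImplementationLS implls = findLoadSave();
    System.out.println(implls == null
     ? "Load and Save is not supported"
     : "Load and Save is supported");
  }

}
```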
Provides an implementation-independent API for parsing XML documents to produce a DOM Document object.
Instances are built by the createDOMBuilder() method in DOMImplementationLS.
Java Binding:
package org.w3c.dom.ls;
public interface DOMBuilder {
public DOMEntityResolver getEntityResolver();
public void setEntityResolver(DOMEntityResolver entityResolver);
public DOMErrorHandler getErrorHandler();
public void setErrorHandler(DOMErrorHandler errorHandler);
public DOMBuilderFilter getFilter();
public void setFilter(DOMBuilderFilter filter);
public void setFeature(String name, boolean state)
throws DOMException;
public boolean canSetFeature(String name, boolean state);
public boolean getFeature(String name)
throws DOMException;
public Document parseURI(String uri) throws Exception;
public Document parse(DOMInputSource is) throws Exception;
// ACTION_TYPES
public static final short ACTION_REPLACE = 1;
public static final short ACTION_APPEND = 2;
public static final short ACTION_INSERT_AFTER = 3;
public static final short ACTION_INSERT_BEFORE = 4;
public void parseWithContext(DOMInputSource is,
Node contextNode, short action) throws DOMException;
}
IDL:
interface DOMBuilder {
attribute DOMEntityResolver entityResolver;
attribute DOMErrorHandler errorHandler;
attribute DOMBuilderFilter filter;
void setFeature(in DOMString name, in boolean state)
raises(DOMException);
boolean canSetFeature(in DOMString name, in boolean state);
boolean getFeature(in DOMString name) raises(DOMException);
Document parseURI(in DOMString uri) raises(DOMSystemException);
Document parse(in DOMInputSource is) raises(DOMSystemException);
// ACTION_TYPES
const unsigned short ACTION_REPLACE = 1;
const unsigned short ACTION_APPEND = 2;
const unsigned short ACTION_INSERT_AFTER = 3;
const unsigned short ACTION_INSERT_BEFORE = 4;
void parseWithContext(in DOMInputSource is, in Node cnode,
in unsigned short action) raises(DOMException);
};
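DOMBuilder is a draft interface. In the `org.w3c.dom.ls` package that ships in the JDK, its role is played by `LSParser`, and the `setFeature()`/`canSetFeature()` pair is replaced by parameters on the parser's `DOMConfiguration`. The sketch below parses a URI that way. `ParseUriSketch`, `rootNameOf()`, and `writeTempFile()` are hypothetical names, and the temporary file exists only to give `parseURI()` something to read.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSParser;

class ParseUriSketch {

  static LSParser newParser() {
    try {
      DOMImplementationLS ls = (DOMImplementationLS)
       DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
      // A null schema type means no particular schema language is requested.
      LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
      // Ask before setting, in the spirit of canSetFeature().
      if (parser.getDomConfig().canSetParameter("comments", Boolean.FALSE)) {
        parser.getDomConfig().setParameter("comments", Boolean.FALSE);
      }
      return parser;
    }
    catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Could not create a DOM registry", e);
    }
  }

  // Parses the document at the given URI and returns its root element name.
  static String rootNameOf(String uri) {
    Document doc = newParser().parseURI(uri);
    return doc.getDocumentElement().getTagName();
  }

  // Hypothetical demo helper: stores the text in a temporary file
  // and returns the file's URI.
  static String writeTempFile(String xml) {
    try {
      Path file = Files.createTempFile("song", ".xml");
      Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
      file.toFile().deleteOnExit();
      return file.toUri().toString();
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String uri = writeTempFile("<SONG><TITLE>Hot Cop</TITLE></SONG>");
    System.out.println(rootNameOf(uri));
  }

}
```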
Like SAX2's InputSource class, this interface is an abstraction of all the different things (streams, files, byte arrays, sockets, URLs, etc.) from which an XML document can be read.
Java Binding:
package org.w3c.dom.ls;
public interface DOMInputSource {
public InputStream getByteStream();
public void setByteStream(InputStream in);
public Reader getCharacterStream();
public void setCharacterStream(Reader in);
public String getStringData();
public void setStringData(String data);
public String getEncoding();
public void setEncoding(String encoding);
public String getPublicId();
public void setPublicId(String publicId);
public String getSystemId();
public void setSystemId(String systemId);
}
IDL:
interface DOMInputSource {
attribute DOMInputStream byteStream;
attribute DOMString stringData;
attribute DOMReader characterStream;
attribute DOMString encoding;
attribute DOMString publicId;
attribute DOMString systemId;
};
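To make the abstraction concrete, here is a sketch that feeds the same document to a parser as string data, as a character stream, and as a byte stream. It uses `LSInput`, the name this interface carries in the `org.w3c.dom.ls` package shipped with the JDK, not the draft's DOMInputSource. The class and helper names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

class InputSourceSketch {

  static final DOMImplementationLS LS = load();

  static DOMImplementationLS load() {
    try {
      return (DOMImplementationLS)
       DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
    }
    catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Could not create a DOM registry", e);
    }
  }

  static LSInput fromString(String xml) {
    LSInput in = LS.createLSInput();
    in.setStringData(xml);
    return in;
  }

  static LSInput fromReader(String xml) {
    LSInput in = LS.createLSInput();
    in.setCharacterStream(new StringReader(xml));
    return in;
  }

  static LSInput fromBytes(String xml) {
    LSInput in = LS.createLSInput();
    in.setByteStream(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    in.setEncoding("UTF-8"); // tells the parser how to decode the bytes
    return in;
  }

  // The parser does not care which kind of source it was handed.
  static String rootNameOf(LSInput in) {
    LSParser parser = LS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
    return parser.parse(in).getDocumentElement().getTagName();
  }

  public static void main(String[] args) {
    String xml = "<SONG><TITLE>Hot Cop</TITLE></SONG>";
    System.out.println(rootNameOf(fromString(xml)));
    System.out.println(rootNameOf(fromReader(xml)));
    System.out.println(rootNameOf(fromBytes(xml)));
  }

}
```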
Like SAX2's EntityResolver interface, this interface lets applications redirect references to external entities.
Java Binding:
package org.w3c.dom.ls;
public interface DOMEntityResolver {
public DOMInputSource resolveEntity(String publicID,
String systemID, String baseURI) throws DOMSystemException;
}
IDL:
interface DOMEntityResolver {
DOMInputSource resolveEntity(in DOMString publicID,
in DOMString systemID, in DOMString baseURI)
raises(DOMSystemException);
};
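A sketch of such a redirection, written against the JDK's `org.w3c.dom.ls` package, where the resolver is called `LSResourceResolver` and is installed through the "resource-resolver" configuration parameter instead of a `setEntityResolver()` method. The URL, the entity name `publisher`, and the class name are invented for the example. The resolver answers the request for the external DTD from memory, so nothing is fetched from the network.

```java
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSResourceResolver;

class ResolverSketch {

  static String publisherFromLocalDtd() {
    final DOMImplementationLS ls;
    try {
      ls = (DOMImplementationLS)
       DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
    }
    catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Could not create a DOM registry", e);
    }
    LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

    LSResourceResolver resolver = new LSResourceResolver() {
      public LSInput resolveResource(String type, String namespaceURI,
       String publicId, String systemId, String baseURI) {
        if (systemId != null && systemId.endsWith("song.dtd")) {
          // Substitute an in-memory DTD for the remote one.
          LSInput dtd = ls.createLSInput();
          dtd.setSystemId(systemId);
          dtd.setStringData("<!ENTITY publisher \"Polygram\">");
          return dtd;
        }
        return null; // let the parser resolve everything else itself
      }
    };
    parser.getDomConfig().setParameter("resource-resolver", resolver);
    // Expand entity references into plain text nodes.
    parser.getDomConfig().setParameter("entities", Boolean.FALSE);

    LSInput in = ls.createLSInput();
    in.setStringData(
     "<!DOCTYPE PUBLISHER SYSTEM \"http://example.invalid/song.dtd\">"
     + "<PUBLISHER>&publisher;</PUBLISHER>");
    Document doc = parser.parse(in);
    return doc.getDocumentElement().getTextContent();
  }

  public static void main(String[] args) {
    System.out.println(publisherFromLocalDtd());
  }

}
```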
Provides an API for serializing (writing) a DOM document out as a sequence of bytes onto a stream, file, socket, byte array, etc.
Java Binding:
package org.w3c.dom.ls;
public interface DOMWriter {
public void setFeature(String name, boolean state)
throws DOMException;
public boolean canSetFeature(String name, boolean state);
public boolean getFeature(String name) throws DOMException;
public String getEncoding();
public void setEncoding(String encoding);
public String getLastEncoding();
public String getNewLine();
public void setNewLine(String newLine);
public DOMErrorHandler getErrorHandler();
public void setErrorHandler(DOMErrorHandler errorHandler);
public boolean writeNode(OutputStream destination, Node wnode)
throws Exception;
public String writeToString(Node node) throws DOMException;
}
IDL:
interface DOMWriter {
void setFeature(in DOMString name, in boolean state)
raises(DOMException);
boolean canSetFeature(in DOMString name, in boolean state);
boolean getFeature(in DOMString name) raises(DOMException);
attribute DOMString encoding;
readonly attribute DOMString lastEncoding;
attribute DOMString newLine;
attribute DOMErrorHandler errorHandler;
boolean writeNode(in DOMOutputStream destination, in Node wnode)
raises(DOMSystemException);
DOMString writeToString(in Node wnode) raises(DOMException);
};
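For comparison, a sketch of serialization with the `org.w3c.dom.ls` package in the JDK, where this interface is named `LSSerializer`, the destination is wrapped in an `LSOutput`, and the method is `write(Node, LSOutput)` instead of `writeNode(OutputStream, Node)`. The class and helper names below are illustrative.

```java
import java.io.ByteArrayOutputStream;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

class SerializerSketch {

  static final DOMImplementation IMPL = load();

  static DOMImplementation load() {
    try {
      return DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
    }
    catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Could not create a DOM registry", e);
    }
  }

  // Builds <LENGTH>6:20</LENGTH> in memory.
  static Document sample() {
    Document doc = IMPL.createDocument(null, "LENGTH", null);
    doc.getDocumentElement().setTextContent("6:20");
    return doc;
  }

  // Serializes to a String, leaving out the XML declaration.
  static String toXmlString(Node node) {
    LSSerializer writer = ((DOMImplementationLS) IMPL).createLSSerializer();
    writer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
    return writer.writeToString(node);
  }

  // Serializes to bytes in the requested encoding.
  static byte[] toXmlBytes(Node node, String encoding) {
    DOMImplementationLS ls = (DOMImplementationLS) IMPL;
    LSOutput out = ls.createLSOutput();
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    out.setByteStream(buffer);
    out.setEncoding(encoding);
    ls.createLSSerializer().write(node, out);
    return buffer.toByteArray();
  }

  public static void main(String[] args) {
    System.out.println(toXmlString(sample()));
    System.out.println(toXmlBytes(sample(), "UTF-8").length + " bytes");
  }

}
```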
Lets applications examine nodes as they are being constructed during a parse.
As each node is examined, it may be modified or removed, or parsing may be aborted.
Java Binding:
package org.w3c.dom.ls;
public interface DOMBuilderFilter {
public int startNode(Node snode);
public int endNode(Node enode);
public int getWhatToShow();
}
IDL:
interface DOMBuilderFilter {
unsigned long startNode(in Node snode);
unsigned long endNode(in Node enode);
readonly attribute unsigned long whatToShow;
};
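A sketch of a parse-time filter, written against the JDK's `org.w3c.dom.ls` package. There the interface is named `LSParserFilter` and its callbacks are `startElement()` and `acceptNode()` rather than the draft's `startNode()` and `endNode()`. The filter class `DropElement` and the sample markup are made up for illustration. Rejecting an element in `startElement()` drops it together with its descendants.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

class FilterSketch {

  // Removes every element with the given name while the tree is being built.
  static class DropElement implements LSParserFilter {
    private final String name;

    DropElement(String name) {
      this.name = name;
    }

    public short startElement(Element element) {
      return name.equals(element.getTagName()) ? FILTER_REJECT : FILTER_ACCEPT;
    }

    public short acceptNode(Node node) {
      return FILTER_ACCEPT;
    }

    public int getWhatToShow() {
      return NodeFilter.SHOW_ELEMENT;
    }
  }

  // Parses xml with the filter installed, then counts the surviving
  // elements named 'counted'.
  static int countAfterDropping(String xml, String dropped, String counted) {
    DOMImplementationLS ls;
    try {
      ls = (DOMImplementationLS)
       DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
    }
    catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Could not create a DOM registry", e);
    }
    LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
    parser.setFilter(new DropElement(dropped));
    LSInput in = ls.createLSInput();
    in.setStringData(xml);
    Document doc = parser.parse(in);
    return doc.getElementsByTagName(counted).getLength();
  }

  public static void main(String[] args) {
    String xml = "<SONG><COMPOSER/><PRODUCER><COMPOSER/></PRODUCER><COMPOSER/></SONG>";
    System.out.println(countAfterDropping(xml, "PRODUCER", "COMPOSER"));
  }

}
```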
Lets applications examine nodes as they are being output.
As each element is examined, it may be modified or removed, or output may be aborted.
Java Binding:
package org.w3c.dom.ls;
public interface DOMWriterFilter extends NodeFilter {
public int getWhatToShow();
}
IDL:
interface DOMWriterFilter : traversal::NodeFilter {
readonly attribute unsigned long whatToShow;
};
An instance of the DocumentLS interface can be obtained by casting an instance of the Document interface to DocumentLS.
Java Binding:
package org.w3c.dom.ls;
import org.w3c.dom.Node;
import org.w3c.dom.DOMException;
public interface DocumentLS {
public boolean getAsync();
public void setAsync(boolean async);
public void abort();
public boolean load(String url);
public boolean loadXML(String source);
public String saveXML(Node node) throws DOMException;
}
IDL:
interface DocumentLS {
attribute boolean async;
void abort();
boolean load(in DOMString url);
boolean loadXML(in DOMString source);
DOMString saveXML(in Node node) raises(DOMException);
};
Represents an error (of what kind?) in the document being parsed
Java Binding:
package org.w3c.dom.ls;
public interface ParseErrorEvent extends Event {
public DOMError getError();
}
IDL:
interface ParseErrorEvent : events::Event {
readonly attribute DOMError error;
};
Abstract Schemas (AS) include DTDs, W3C XML Schema Language Schemas, RELAX NG, and more
Should be able to access their information without binding yourself too tightly to any one language
Associating a schema with a document, or changing the current association
Using the same schema with several documents, without having to reload it
Validating documents against a schema
Validating document parts against a schema
Retrieving information from a schema (e.g. default values and attribute types)
Create new schemas (like the DOM creates new instance documents)
Save an in-memory schema to a file.
Modify in-memory schemas
Provide the guidance necessary so that valid instance documents can be modified and remain valid.
Abstract Schema and AS-Editing Interfaces:
ASObject
ASModel
ASException
ASExceptionCode
ASContentModel
ASObjectList
ASNamedObjectMap
ASDataType
ASElementDecl
ASAttributeDecl
ASEntityDecl
ASNotationDecl
Validation and Other Interfaces:
DocumentAS
DOMImplementationAS
Load and Save for Abstract Schemas:
ASDOMBuilder
DOMASWriter
Document-Editing Interfaces:
NodeEditAS
ElementEditAS
CharacterDataEditAS
DocumentEditAS
AttributeEditAS
Check to see if the implementation supports the "LS-AS" feature, version "3.0".
Construct a DOMBuilder object
Cast the DOMBuilder to ASDOMBuilder
Call the parseASURI() method to read the schema
try {
  // impl must be declared before it can be tested for the feature
  DOMImplementation impl = DOMImplementationFactory.getDOMImplementation();
  if (impl.hasFeature("LS-AS", "3.0")) {
    DOMImplementationLS implLS = (DOMImplementationLS) impl;
    DOMBuilder parser
     = implLS.createDOMBuilder(DOMImplementationLS.MODE_SYNCHRONOUS);
    ASDOMBuilder schemaParser = (ASDOMBuilder) parser;
    ASModel schema = schemaParser.parseASURI(
     "http://www.openhealth.org/RDDL/rddl-integration.rxg",
     "RELAX");
    // Use the schema...
  }
}
catch (DOMException e) {
  //...
}
Cast a Document to DocumentAS
Add the schema to the DocumentAS with the addAS() method.
Invoke DocumentAS's validate() method
if (impl.hasFeature("AS-DOC", "3.0")) {
  Document doc = parser.parseURI("????");
  DocumentAS docWithSchema = (DocumentAS) doc;
  docWithSchema.addAS(schema);
  docWithSchema.validate();
  // Process the data...
}
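The AS interfaces are a draft, so the snippets above cannot be run. As a stand-in for two of the goals (validating against a schema, and using one loaded schema with several documents), here is a sketch built on a different API, JAXP's `javax.xml.validation` package. The one-element schema and the class name `SchemaReuseSketch` are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaReuseSketch {

  // Illustrative schema, not from the talk: YEAR must be an xsd:gYear.
  static final String YEAR_SCHEMA =
   "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
   + "<xs:element name='YEAR' type='xs:gYear'/>"
   + "</xs:schema>";

  // Loads the schema once; the resulting object can be reused.
  static Schema compile(String xsd) {
    try {
      SchemaFactory factory
       = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
      return factory.newSchema(new StreamSource(new StringReader(xsd)));
    }
    catch (SAXException e) {
      throw new IllegalArgumentException("The schema itself is not valid", e);
    }
  }

  // Validates one document against an already loaded schema.
  static boolean isValid(Schema schema, String xml) {
    Validator validator = schema.newValidator();
    try {
      validator.validate(new StreamSource(new StringReader(xml)));
      return true;
    }
    catch (SAXException e) {
      return false; // with no error handler set, validity errors are thrown
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Schema schema = compile(YEAR_SCHEMA);
    System.out.println(isValid(schema, "<YEAR>1978</YEAR>"));
    System.out.println(isValid(schema, "<YEAR>seventy-eight</YEAR>"));
  }

}
```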
DOMImplementation.hasFeature("AS-EDIT", "3.0") returns true if a given DOM supports these interfaces for editing abstract schemas:
ASObject
ASModel
ASException
ASExceptionCode
ASContentModel
ASObjectList
ASNamedObjectMap
ASDataType
ASElementDecl
ASAttributeDecl
ASEntityDecl
ASNotationDecl
The superinterface for the various kinds of declarations out of which ASModels are built
Java binding:
package org.w3c.dom.as;
public interface ASObject {
// ASObjectType
public static final short AS_ELEMENT_DECLARATION = 1;
public static final short AS_ATTRIBUTE_DECLARATION = 2;
public static final short AS_NOTATION_DECLARATION = 3;
public static final short AS_ENTITY_DECLARATION = 4;
public static final short AS_CONTENTMODEL = 5;
public static final short AS_MODEL = 6;
public short getASObjectType();
public ASModel getOwnerASModel();
public String getObjectName();
public void setObjectName(String objectName);
public String getPrefix();
public void setPrefix(String prefix);
public String getLocalName();
public void setLocalName(String localName);
public String getNamespaceURI();
public void setNamespaceURI(String namespaceURI);
public ASObject cloneASObject(boolean deep);
}
IDL:
interface ASObject {
// ASObjectType
const unsigned short AS_ELEMENT_DECLARATION = 1;
const unsigned short AS_ATTRIBUTE_DECLARATION = 2;
const unsigned short AS_NOTATION_DECLARATION = 3;
const unsigned short AS_ENTITY_DECLARATION = 4;
const unsigned short AS_CONTENTMODEL = 5;
const unsigned short AS_MODEL = 6;
readonly attribute unsigned short ASObjectType;
readonly attribute ASModel ownerASModel;
attribute DOMString objectName;
attribute DOMString prefix;
attribute DOMString localName;
attribute DOMString namespaceURI;
ASObject cloneASObject(in boolean deep);
};
Represents an abstract content model that could be a DTD, an XML Schema, or something else. It has both an internal and external subset.
Java binding:
package org.w3c.dom.as;
public interface ASModel extends ASObject {
// ASMODEL_TYPES
public static final short INTERNAL_SUBSET = 1;
public static final short EXTERNAL_SUBSET = 2;
public static final short NOT_USED = 3;
public boolean getNamespaceAware();
public short getUsage();
public String getLocation();
public void setLocation(String location);
public String getHint();
public void setHint(String hint);
public boolean getContainer();
public ASNamedObjectMap getElementDecls();
public ASNamedObjectMap getAttributeDecls();
public ASNamedObjectMap getNotationDecls();
public ASNamedObjectMap getEntityDecls();
public ASNamedObjectMap getContentModelDecls();
public void addASModel(ASModel abstractSchema);
public ASObjectList getASModels();
public void removeAS(ASModel as);
public boolean validate();
public void importASObject(ASObject asobject);
public void insertASObject(ASObject asobject);
public ASElementDecl createASElementDecl(String namespaceURI,
String name) throws ASException;
public ASAttributeDecl createASAttributeDecl(String namespaceURI,
String name) throws ASException;
public ASNotationDecl createASNotationDecl(String namespaceURI,
String name, String systemId, String publicId) throws ASException;
public ASEntityDecl createASEntityDecl(String name) throws ASException;
public ASContentModel createASContentModel(String name,
String namespaceURI, int minOccurs, int maxOccurs, short operator)
throws ASException;
}
IDL:
interface ASModel : ASObject {
// ASMODEL_TYPES
const unsigned short INTERNAL_SUBSET = 1;
const unsigned short EXTERNAL_SUBSET = 2;
const unsigned short NOT_USED = 3;
readonly attribute boolean NamespaceAware;
readonly attribute unsigned short usage;
attribute DOMString location;
attribute DOMString hint;
readonly attribute boolean container;
readonly attribute ASNamedObjectMap elementDecls;
readonly attribute ASNamedObjectMap attributeDecls;
readonly attribute ASNamedObjectMap notationDecls;
readonly attribute ASNamedObjectMap entityDecls;
readonly attribute ASNamedObjectMap contentModelDecls;
void addASModel(in ASModel abstractSchema);
ASObjectList getASModels();
void removeAS(in ASModel as);
boolean validate();
void importASObject(in ASObject asobject);
void insertASObject(in ASObject asobject);
ASElementDecl createASElementDecl(in DOMString namespaceURI, in DOMString name)
raises(ASException);
ASAttributeDecl createASAttributeDecl(in DOMString namespaceURI, in DOMString name)
raises(ASException);
ASNotationDecl createASNotationDecl(in DOMString namespaceURI,
in DOMString name, in DOMString systemId, in DOMString publicId)
raises(ASException);
ASEntityDecl createASEntityDecl(in DOMString name) raises(ASException);
ASContentModel createASContentModel(in DOMString name,
in DOMString namespaceURI, in unsigned long minOccurs,
in unsigned long maxOccurs, in unsigned short operator)
raises(ASException);
};
Represents the content specification for an element:
In a DTD:
<!ELEMENT name (first, last)>
In a schema this is the contents of an xsd:element element.
Java binding:
package org.w3c.dom.as;
public interface ASContentModel extends ASObject {
public static final int AS_UNBOUNDED = MAX_VALUE;
// ASContentModelType
public static final short AS_SEQUENCE = 0;
public static final short AS_CHOICE = 1;
public static final short AS_ALL = 2;
public static final short AS_NONE = 3;
public static final short AS_UNDEFINED = 4;
public short getListOperator();
public void setListOperator(short listOperator);
public int getMinOccurs();
public void setMinOccurs(int minOccurs);
public int getMaxOccurs();
public void setMaxOccurs(int maxOccurs);
public ASObjectList getSubModels();
public void setSubModels(ASObjectList subModels);
public void removesubModel(ASObject oldObject);
public ASObject insertBeforeSubModel(ASObject newObject, ASObject refObject)
throws ASException;
public int appendsubModel(ASObject newObject) throws ASException;
}
IDL:
interface ASContentModel : ASObject {
const unsigned long AS_UNBOUNDED = MAX_VALUE;
// ASContentModelType
const unsigned short AS_SEQUENCE = 0;
const unsigned short AS_CHOICE = 1;
const unsigned short AS_ALL = 2;
const unsigned short AS_NONE = 3;
const unsigned short AS_UNDEFINED = 4;
attribute unsigned short listOperator;
attribute unsigned long minOccurs;
attribute unsigned long maxOccurs;
attribute ASObjectList subModels;
void removesubModel(in ASObject oldObject);
ASObject insertBeforeSubModel(in ASObject newObject, in ASObject refObject)
raises(ASException);
unsigned long appendsubModel(in ASObject newObject) raises(ASException);
};
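To illustrate how a list operator, occurrence bounds, and sub-models fit together, here is a small self-contained sketch. It does not use the AS interfaces. `Particle`, `ElementRef`, and `Group` are hypothetical stand-ins, and the matcher supports only sequences and choices of child element names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ContentModelSketch {

  static final int UNBOUNDED = Integer.MAX_VALUE;

  enum Operator { SEQUENCE, CHOICE }

  interface Particle {
    // Every index at which a match that begins at 'start' can end.
    Set<Integer> match(List<String> names, int start);
  }

  // A single child element such as "first".
  static final class ElementRef implements Particle {
    private final String name;
    ElementRef(String name) { this.name = name; }
    public Set<Integer> match(List<String> names, int start) {
      Set<Integer> ends = new TreeSet<Integer>();
      if (start < names.size() && names.get(start).equals(name)) {
        ends.add(start + 1);
      }
      return ends;
    }
  }

  // A group with a list operator, occurrence bounds, and sub-models.
  static final class Group implements Particle {
    private final Operator operator;
    private final int minOccurs;
    private final int maxOccurs;
    private final List<Particle> subModels;

    Group(Operator operator, int minOccurs, int maxOccurs, Particle... subModels) {
      this.operator = operator;
      this.minOccurs = minOccurs;
      this.maxOccurs = maxOccurs;
      this.subModels = Arrays.asList(subModels);
    }

    public Set<Integer> match(List<String> names, int start) {
      Set<Integer> ends = new TreeSet<Integer>();
      if (minOccurs == 0) ends.add(start);
      Set<Integer> current = new TreeSet<Integer>();
      current.add(start);
      for (int i = 1; i <= maxOccurs && !current.isEmpty(); i++) {
        current = matchOnce(names, current);
        // Stop once another repetition reaches no new end position.
        if (i >= minOccurs && !ends.addAll(current)) break;
      }
      return ends;
    }

    // One pass through the group, starting from each given position.
    private Set<Integer> matchOnce(List<String> names, Set<Integer> starts) {
      Set<Integer> result = new TreeSet<Integer>();
      if (operator == Operator.CHOICE) {
        for (int s : starts) {
          for (Particle p : subModels) result.addAll(p.match(names, s));
        }
        return result;
      }
      result.addAll(starts);
      for (Particle p : subModels) {
        Set<Integer> next = new TreeSet<Integer>();
        for (int s : result) next.addAll(p.match(names, s));
        result = next;
      }
      return result;
    }
  }

  // (first, last)
  static Particle nameModel() {
    return new Group(Operator.SEQUENCE, 1, 1,
     new ElementRef("first"), new ElementRef("last"));
  }

  // (param*)
  static Particle paramsModel() {
    return new Group(Operator.SEQUENCE, 0, UNBOUNDED, new ElementRef("param"));
  }

  // True if the whole list of child element names fits the model.
  static boolean accepts(Particle model, String... children) {
    List<String> names = Arrays.asList(children);
    return model.match(names, 0).contains(names.size());
  }

  public static void main(String[] args) {
    System.out.println(accepts(nameModel(), "first", "last"));
    System.out.println(accepts(nameModel(), "last", "first"));
    System.out.println(accepts(paramsModel(), "param", "param", "param"));
  }

}
```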
An ordered list of the ASObjects in a content model
Java binding:
package org.w3c.dom.as;
public interface ASObjectList {
public int getLength();
public ASObject item(int index);
}
IDL:
interface ASObjectList {
readonly attribute unsigned long length;
ASObject item(in unsigned long index);
};
An unordered set of AS objects
Java binding:
package org.w3c.dom.as;
public interface ASNamedObjectMap {
public int getLength();
public ASObject getNamedItem(String name);
public ASObject item(int index);
public ASObject removeNamedItem(String name) throws DOMException;
public ASObject setNamedItem(ASObject newASObject)
throws DOMException, ASException;
}
IDL:
interface ASNamedObjectMap {
readonly attribute unsigned long length;
ASObject getNamedItem(in DOMString name);
ASObject item(in unsigned long index);
ASObject removeNamedItem(in DOMString name)
raises(DOMException);
ASObject setNamedItem(in ASObject newASObject)
raises(DOMException, ASException);
};
Data types used in content models
Based on W3C XML Schema language types
Java binding:
package org.w3c.dom.as;
public interface ASDataType {
public short getDataType();
// DATA_TYPES
public static final short STRING_DATATYPE = 1;
public static final short NOTATION_DATATYPE = 10;
public static final short ID_DATATYPE = 11;
public static final short IDREF_DATATYPE = 12;
public static final short IDREFS_DATATYPE = 13;
public static final short ENTITY_DATATYPE = 14;
public static final short ENTITIES_DATATYPE = 15;
public static final short NMTOKEN_DATATYPE = 16;
public static final short NMTOKENS_DATATYPE = 17;
public static final short BOOLEAN_DATATYPE = 100;
public static final short FLOAT_DATATYPE = 101;
public static final short DOUBLE_DATATYPE = 102;
public static final short DECIMAL_DATATYPE = 103;
public static final short HEXBINARY_DATATYPE = 104;
public static final short BASE64BINARY_DATATYPE = 105;
public static final short ANYURI_DATATYPE = 106;
public static final short QNAME_DATATYPE = 107;
public static final short DURATION_DATATYPE = 108;
public static final short DATETIME_DATATYPE = 109;
public static final short DATE_DATATYPE = 110;
public static final short TIME_DATATYPE = 111;
public static final short GYEARMONTH_DATATYPE = 112;
public static final short GYEAR_DATATYPE = 113;
public static final short GMONTHDAY_DATATYPE = 114;
public static final short GDAY_DATATYPE = 115;
public static final short GMONTH_DATATYPE = 116;
public static final short INTEGER = 117;
public static final short NAME_DATATYPE = 200;
public static final short NCNAME_DATATYPE = 201;
public static final short NORMALIZEDSTRING_DATATYPE = 202;
public static final short TOKEN_DATATYPE = 203;
public static final short LANGUAGE_DATATYPE = 204;
public static final short NONPOSITIVEINTEGER_DATATYPE = 205;
public static final short NEGATIVEINTEGER_DATATYPE = 206;
public static final short LONG_DATATYPE = 207;
public static final short INT_DATATYPE = 208;
public static final short SHORT_DATATYPE = 209;
public static final short BYTE_DATATYPE = 210;
public static final short NONNEGATIVEINTEGER_DATATYPE = 211;
public static final short UNSIGNEDLONG_DATATYPE = 212;
public static final short UNSIGNEDINT_DATATYPE = 213;
public static final short UNSIGNEDSHORT_DATATYPE = 214;
public static final short UNSIGNEDBYTE_DATATYPE = 215;
public static final short POSITIVEINTEGER_DATATYPE = 216;
public static final short OTHER_SIMPLE_DATATYPE = 1000;
public static final short COMPLEX_DATATYPE = 1001;
}
IDL:
interface ASDataType {
readonly attribute unsigned short dataType;
// DATA_TYPES
const unsigned short STRING_DATATYPE = 1;
const unsigned short NOTATION_DATATYPE = 10;
const unsigned short ID_DATATYPE = 11;
const unsigned short IDREF_DATATYPE = 12;
const unsigned short IDREFS_DATATYPE = 13;
const unsigned short ENTITY_DATATYPE = 14;
const unsigned short ENTITIES_DATATYPE = 15;
const unsigned short NMTOKEN_DATATYPE = 16;
const unsigned short NMTOKENS_DATATYPE = 17;
const unsigned short BOOLEAN_DATATYPE = 100;
const unsigned short FLOAT_DATATYPE = 101;
const unsigned short DOUBLE_DATATYPE = 102;
const unsigned short DECIMAL_DATATYPE = 103;
const unsigned short HEXBINARY_DATATYPE = 104;
const unsigned short BASE64BINARY_DATATYPE = 105;
const unsigned short ANYURI_DATATYPE = 106;
const unsigned short QNAME_DATATYPE = 107;
const unsigned short DURATION_DATATYPE = 108;
const unsigned short DATETIME_DATATYPE = 109;
const unsigned short DATE_DATATYPE = 110;
const unsigned short TIME_DATATYPE = 111;
const unsigned short GYEARMONTH_DATATYPE = 112;
const unsigned short GYEAR_DATATYPE = 113;
const unsigned short GMONTHDAY_DATATYPE = 114;
const unsigned short GDAY_DATATYPE = 115;
const unsigned short GMONTH_DATATYPE = 116;
const unsigned short INTEGER = 117;
const unsigned short NAME_DATATYPE = 200;
const unsigned short NCNAME_DATATYPE = 201;
const unsigned short NORMALIZEDSTRING_DATATYPE = 202;
const unsigned short TOKEN_DATATYPE = 203;
const unsigned short LANGUAGE_DATATYPE = 204;
const unsigned short NONPOSITIVEINTEGER_DATATYPE = 205;
const unsigned short NEGATIVEINTEGER_DATATYPE = 206;
const unsigned short LONG_DATATYPE = 207;
const unsigned short INT_DATATYPE = 208;
const unsigned short SHORT_DATATYPE = 209;
const unsigned short BYTE_DATATYPE = 210;
const unsigned short NONNEGATIVEINTEGER_DATATYPE = 211;
const unsigned short UNSIGNEDLONG_DATATYPE = 212;
const unsigned short UNSIGNEDINT_DATATYPE = 213;
const unsigned short UNSIGNEDSHORT_DATATYPE = 214;
const unsigned short UNSIGNEDBYTE_DATATYPE = 215;
const unsigned short POSITIVEINTEGER_DATATYPE = 216;
const unsigned short OTHER_SIMPLE_DATATYPE = 1000;
const unsigned short COMPLEX_DATATYPE = 1001;
};
Represents a declaration of an element such as <!ELEMENT TIME (#PCDATA)> or an xsd:element schema element
Java binding:
package org.w3c.dom.as;
public interface ASElementDecl extends ASObject {
// CONTENT_MODEL_TYPES
public static final short EMPTY_CONTENTTYPE = 1;
public static final short ANY_CONTENTTYPE = 2;
public static final short MIXED_CONTENTTYPE = 3;
public static final short ELEMENTS_CONTENTTYPE = 4;
public boolean getStrictMixedContent();
public void setStrictMixedContent(boolean strictMixedContent);
public ASDataType getElementType();
public void setElementType(ASDataType elementType);
public boolean getIsPCDataOnly();
public void setIsPCDataOnly(boolean isPCDataOnly);
public short getContentType();
public void setContentType(short contentType);
public ASContentModel getASContentModel();
public void setASContentModel(ASContentModel ASContentModel);
public ASNamedObjectMap getASAttributeDecls();
public void setASAttributeDecls(ASNamedObjectMap ASAttributeDecls);
public void addASAttributeDecl(ASAttributeDecl attributeDecl);
public ASAttributeDecl removeASAttributeDecl(ASAttributeDecl attributeDecl);
}
IDL:
interface ASElementDecl : ASObject {
// CONTENT_MODEL_TYPES
const unsigned short EMPTY_CONTENTTYPE = 1;
const unsigned short ANY_CONTENTTYPE = 2;
const unsigned short MIXED_CONTENTTYPE = 3;
const unsigned short ELEMENTS_CONTENTTYPE = 4;
attribute boolean strictMixedContent;
attribute ASDataType elementType;
attribute boolean isPCDataOnly;
attribute unsigned short contentType;
attribute ASContentModel ASContentModel;
attribute ASNamedObjectMap ASAttributeDecls;
void addASAttributeDecl(in ASAttributeDecl attributeDecl);
ASAttributeDecl removeASAttributeDecl(in ASAttributeDecl attributeDecl);
};
Represents a declaration of an attribute; e.g. an xsd:attribute schema element or
<!ATTLIST TIME HOURS CDATA #IMPLIED>
Java binding:
package org.w3c.dom.as;
public interface ASAttributeDecl extends ASObject {
public static final short NONE = 0;
public static final short DEFAULT = 1;
public static final short FIXED = 2;
public static final short REQUIRED = 3;
public ASDataType getDataType();
public void setDataType(ASDataType DataType);
public String getDataValue();
public void setDataValue(String DataValue);
public String getEnumAttr();
public void setEnumAttr(String enumAttr);
public ASObjectList getOwnerElements();
public void setOwnerElements(ASObjectList ownerElements);
public short getDefaultType();
public void setDefaultType(short defaultType);
}
IDL:
interface ASAttributeDecl : ASObject {
// VALUE_TYPES
const unsigned short NONE = 0;
const unsigned short DEFAULT = 1;
const unsigned short FIXED = 2;
const unsigned short REQUIRED = 3;
attribute ASDataType DataType;
attribute DOMString DataValue;
attribute DOMString enumAttr;
attribute ASObjectList ownerElements;
attribute unsigned short defaultType;
};
Represents a declaration of a general entity; e.g.
<!ENTITY COPY01 "Copyright 2001 Elliotte Harold">
Java binding:
package org.w3c.dom.as;
public interface ASEntityDecl extends ASObject {
// EntityType
public static final short INTERNAL_ENTITY = 1;
public static final short EXTERNAL_ENTITY = 2;
public short getEntityType();
public void setEntityType(short entityType);
public String getEntityValue();
public void setEntityValue(String entityValue);
public String getSystemId();
public void setSystemId(String systemId);
public String getPublicId();
public void setPublicId(String publicId);
}
IDL:
interface ASEntityDecl : ASObject {
// EntityType
const unsigned short INTERNAL_ENTITY = 1;
const unsigned short EXTERNAL_ENTITY = 2;
attribute unsigned short entityType;
attribute DOMString entityValue;
attribute DOMString systemId;
attribute DOMString publicId;
};
Represents a declaration of a notation; e.g.
<!NOTATION TXT SYSTEM "text/plain">
Java binding:
package org.w3c.dom.as;
public interface ASNotationDecl extends ASObject {
public String getSystemId();
public void setSystemId(String systemId);
public String getPublicId();
public void setPublicId(String publicId);
}
IDL:
interface ASNotationDecl : ASObject {
attribute DOMString systemId;
attribute DOMString publicId;
};
DocumentAS
DOMImplementationAS
Extends the Document interface with additional methods for document editing, abstract schema editing, and validation.
Java binding:
package org.w3c.dom.as;
public interface DocumentAS extends Document {
public ASModel getActiveASModel();
public void setActiveASModel(ASModel activeASModel);
public ASObjectList getBoundASModels();
public void setBoundASModels(ASObjectList boundASModels);
public ASModel getInternalAS();
public void setInternalAS(ASModel as) throws DOMException;
public void addAS(ASModel as);
public void removeAS(ASModel as);
public ASElementDecl getElementDecl() throws DOMException;
public void validate() throws ASException;
}
IDL:
interface DocumentAS : Document {
attribute ASModel activeASModel;
attribute ASObjectList boundASModels;
ASModel getInternalAS();
void setInternalAS(in ASModel as) raises(DOMException);
void addAS(in ASModel as);
void removeAS(in ASModel as);
ASElementDecl getElementDecl() raises(DOMException);
void validate() raises(ASException);
};
Call hasFeature("????", "3.0") to verify that this is supported
Load the document in the usual way
Load the ASModel
Cast the Document to a DocumentAS
Attach the ASModel to the DocumentAS using the addAS() method
Invoke the DocumentAS's validate() method
If the Document is not valid, then an ASException is thrown with the code VALIDATION_ERR
Extends the DOM2 DOMImplementation interface with factory methods to create schema documents
Java binding:
package org.w3c.dom.as;
import org.w3c.dom.DOMImplementation;
public interface DOMImplementationAS extends DOMImplementation {
public boolean getContainer();
public String getSchemaType();
public void setSchemaType(String schemaType);
public ASModel createAS(boolean NamespaceAware, String schemaType);
}
IDL:
interface DOMImplementationAS : DOMImplementation {
readonly attribute boolean container;
attribute DOMString schemaType;
ASModel createAS(in boolean NamespaceAware, in DOMString schemaType);
};
Call hasFeature("AS-EDIT", "3.0") to verify that this is supported
Load a DOMImplementation in the usual way
Cast DOMImplementation to DOMImplementationAS
Invoke the createAS() method to create a new, implementation-specific ASModel object
Use the factory methods in this ASModel to create the schema
<!ELEMENT methodCall (methodName, params)>
<!ELEMENT methodName (#PCDATA)>
<!ELEMENT params (param*)>
<!ELEMENT param (value)>
<!ELEMENT value
(i4|int|string|dateTime.iso8601|double|base64|struct|array)>
<!ELEMENT i4 (#PCDATA)>
<!ELEMENT int (#PCDATA)>
<!ELEMENT string (#PCDATA)>
<!ELEMENT dateTime.iso8601 (#PCDATA)>
<!ELEMENT double (#PCDATA)>
<!ELEMENT base64 (#PCDATA)>
<!ELEMENT array (data)>
<!ELEMENT data (value*)>
<!ELEMENT struct (member+)>
<!ELEMENT member (name, value)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT methodResponse (params | fault)>
<!ELEMENT fault (value)>
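To see what this DTD accepts, the sketch below pastes the declarations into an internal DTD subset and runs a validating JAXP parser over a request. This is ordinary DTD validation, not the AS API. The class name `XmlRpcDtdCheck`, the method name `getQuote`, and the parameter value are invented for the example. Because the DOCTYPE names methodCall as the root, only requests can be checked this way.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class XmlRpcDtdCheck {

  static final String DOCTYPE =
   "<!DOCTYPE methodCall [\n"
   + "<!ELEMENT methodCall (methodName, params)>\n"
   + "<!ELEMENT methodName (#PCDATA)>\n"
   + "<!ELEMENT params (param*)>\n"
   + "<!ELEMENT param (value)>\n"
   + "<!ELEMENT value (i4|int|string|dateTime.iso8601|double|base64|struct|array)>\n"
   + "<!ELEMENT i4 (#PCDATA)>\n"
   + "<!ELEMENT int (#PCDATA)>\n"
   + "<!ELEMENT string (#PCDATA)>\n"
   + "<!ELEMENT dateTime.iso8601 (#PCDATA)>\n"
   + "<!ELEMENT double (#PCDATA)>\n"
   + "<!ELEMENT base64 (#PCDATA)>\n"
   + "<!ELEMENT array (data)>\n"
   + "<!ELEMENT data (value*)>\n"
   + "<!ELEMENT struct (member+)>\n"
   + "<!ELEMENT member (name, value)>\n"
   + "<!ELEMENT name (#PCDATA)>\n"
   + "<!ELEMENT methodResponse (params | fault)>\n"
   + "<!ELEMENT fault (value)>\n"
   + "]>\n";

  // True if the methodCall document is valid according to the DTD above.
  static boolean isValid(String methodCall) {
    final boolean[] valid = { true };
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setValidating(true);
      DocumentBuilder parser = factory.newDocumentBuilder();
      parser.setErrorHandler(new DefaultHandler() {
        @Override
        public void error(SAXParseException e) {
          valid[0] = false; // validity errors arrive here
        }
      });
      parser.parse(new InputSource(new StringReader(DOCTYPE + methodCall)));
    }
    catch (SAXException e) {
      return false; // not even well-formed
    }
    catch (IOException e) {
      return false;
    }
    catch (ParserConfigurationException e) {
      throw new IllegalStateException("Could not locate a JAXP parser", e);
    }
    return valid[0];
  }

  public static void main(String[] args) {
    System.out.println(isValid(
     "<methodCall><methodName>getQuote</methodName><params><param>"
     + "<value><string>XYZ</string></value></param></params></methodCall>"));
    System.out.println(isValid(
     "<methodCall><methodName>getQuote</methodName></methodCall>"));
  }

}
```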
try {
if (impl.hasFeature("AS-EDIT", "3.0")) {
DOMImplementationFactoryLS impl =
(DOMImplementationAS) DOMImplementationFactory.getDOMImplementation();
ASModel dtd = impl.createAS(false, "DTD");
// <!ELEMENT methodCall (methodName, params)>
ASElementDecl methodCall = dtd.createASElementDecl(null, "methodCall");
ASContentModel methodCallModel = dtd.createASContentModel(
"methodCall", null, 1, 1, ASContentModel.AS_SEQUENCE);
methodCall.setASContentModel(methodCallModel);
methodCall.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
// <!ELEMENT methodName (#PCDATA)>
ASElementDecl methodName = dtd.createASElementDecl(null, "methodName");
methodName.setIsPCDataOnly(true);
// <!ELEMENT params (param*)>
ASElementDecl params = dtd.createASElementDecl(null, "params");
ASContentModel paramsModel = dtd.createASContentModel(
"params", "", 0, ASContentModel.AS_UNBOUNDED, ASContentModel.AS_SEQUENCE);
methodCall.setASContentModel(paramsModel);
methodCall.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
// <!ELEMENT param (value)>
ASElementDecl param = dtd.createASElementDecl(null, "param");
ASContentModel paramModel = dtd.createASContentModel(
"param", "", 1, 1, ASContentModel.AS_SEQUENCE);
methodCall.setASContentModel(paramModel);
methodCall.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
// <!ELEMENT value (i4|int|string|dateTime.iso8601|double|base64|struct|array)>
ASElementDecl value = dtd.createASElementDecl(null, "value");
ASContentModel valueModel = dtd.createASContentModel(
"param", "", 1, 1, ASContentModel.AS_CHOICE);
methodCall.setASContentModel(valueModel);
methodCall.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
// <!ELEMENT i4 (#PCDATA)>
// <!ELEMENT int (#PCDATA)>
// <!ELEMENT string (#PCDATA)>
// <!ELEMENT dateTime.iso8601 (#PCDATA)>
// <!ELEMENT double (#PCDATA)>
// <!ELEMENT base64 (#PCDATA)>
ASElementDecl i4 = dtd.createASElementDecl(null, "i4");
i4.setIsPCDataOnly(true);
ASElementDecl intElement = dtd.createASElementDecl(null, "int");
intElement.setIsPCDataOnly(true);
ASElementDecl string = dtd.createASElementDecl(null, "string");
string.setIsPCDataOnly(true);
// dateTime.iso8601 is not a legal Java identifier, so the variable is named dateTime
ASElementDecl dateTime = dtd.createASElementDecl(null, "dateTime.iso8601");
dateTime.setIsPCDataOnly(true);
ASElementDecl base64 = dtd.createASElementDecl(null, "base64");
base64.setIsPCDataOnly(true);
ASElementDecl doubleElement = dtd.createASElementDecl(null, "double");
doubleElement.setIsPCDataOnly(true);
// <!ELEMENT array (data)>
ASElementDecl array = dtd.createASElementDecl(null, "array");
ASContentModel arrayModel = dtd.createASContentModel(
"array", "", 1, 1, ASContentModel.AS_SEQUENCE);
array.setASContentModel(arrayModel);
array.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
// <!ELEMENT data (value*)>
ASElementDecl data = dtd.createASElementDecl(null, "data");
ASContentModel dataModel = dtd.createASContentModel(
"data", "", 0, ASContentModel.AS_UNBOUNDED, ASContentModel.AS_SEQUENCE);
data.setASContentModel(dataModel);
data.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
// <!ELEMENT struct (member+)>
ASElementDecl struct = dtd.createASElementDecl(null, "struct");
ASContentModel structModel = dtd.createASContentModel(
"struct", "", 1, ASContentModel.AS_UNBOUNDED, ASContentModel.AS_SEQUENCE);
struct.setASContentModel(structModel);
struct.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
// <!ELEMENT member (name, value)>
ASElementDecl member = dtd.createASElementDecl(null, "member");
// the (name, value) sequence as a whole occurs exactly once
ASContentModel memberModel = dtd.createASContentModel(
"member", "", 1, 1, ASContentModel.AS_SEQUENCE);
member.setASContentModel(memberModel);
member.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
// <!ELEMENT name (#PCDATA)>
ASElementDecl name = dtd.createASElementDecl(null, "name");
name.setIsPCDataOnly(true);
// <!ELEMENT methodResponse (params | fault)>
ASElementDecl methodResponse = dtd.createASElementDecl(null, "methodResponse");
ASContentModel methodResponseModel = dtd.createASContentModel(
"methodResponse", "", 1, 1, ASContentModel.AS_CHOICE);
methodResponse.setASContentModel(methodResponseModel);
methodResponse.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
// <!ELEMENT fault (value)>
ASElementDecl fault = dtd.createASElementDecl(null, "fault");
ASContentModel faultModel = dtd.createASContentModel(
"fault", "", 1, 1, ASContentModel.AS_SEQUENCE);
fault.setASContentModel(faultModel);
fault.setContentType(ASElementDecl.ELEMENTS_CONTENTTYPE);
// Fill each content model with the element declarations it allows
methodCallModel.appendSubModel(methodName);
methodCallModel.appendSubModel(params);
paramsModel.appendSubModel(param);
paramModel.appendSubModel(value);
valueModel.appendSubModel(i4);
valueModel.appendSubModel(intElement);
valueModel.appendSubModel(string);
valueModel.appendSubModel(dateTime);
valueModel.appendSubModel(doubleElement);
valueModel.appendSubModel(base64);
valueModel.appendSubModel(struct);
valueModel.appendSubModel(array);
arrayModel.appendSubModel(data);
dataModel.appendSubModel(value);
structModel.appendSubModel(member);
methodResponseModel.appendSubModel(params);
methodResponseModel.appendSubModel(fault);
memberModel.appendSubModel(name);
memberModel.appendSubModel(value);
faultModel.appendSubModel(value);
}
}
catch (ASException e) {
System.err.println(e);
}
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- The only two possible root elements are methodResponse and
       methodCall so these are the only two I use a top-level
       declaration for. -->

  <xsd:element name="methodCall">
    <xsd:complexType>
      <xsd:all>
        <xsd:element name="methodName">
          <xsd:simpleType>
            <xsd:restriction base="ASCIIString">
              <xsd:pattern value="([A-Za-z0-9]|/|\.|:|_)*" />
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
        <xsd:element name="params" minOccurs="0" maxOccurs="1">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="param" type="ParamType"
                           minOccurs="0" maxOccurs="unbounded"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:all>
    </xsd:complexType>
  </xsd:element>

  <xsd:element name="methodResponse">
    <xsd:complexType>
      <xsd:choice>
        <xsd:element name="params">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="param" type="ParamType"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="fault">
          <!-- What can appear inside a fault is very restricted -->
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="value">
                <xsd:complexType>
                  <xsd:sequence>
                    <xsd:element name="struct">
                      <xsd:complexType>
                        <xsd:sequence>
                          <xsd:element name="member" type="MemberType">
                          </xsd:element>
                          <xsd:element name="member" type="MemberType">
                          </xsd:element>
                        </xsd:sequence>
                      </xsd:complexType>
                    </xsd:element>
                  </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>

  <xsd:complexType name="ParamType">
    <xsd:sequence>
      <xsd:element name="value" type="ValueType"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="ValueType" mixed="true">
    <!-- I need to figure out how to say that this is either a simple
         xsd:string type or that it contains one of these elements;
         but that otherwise it does not have mixed content -->
    <xsd:choice>
      <xsd:element name="i4" type="xsd:int"/>
      <xsd:element name="int" type="xsd:int"/>
      <xsd:element name="string" type="ASCIIString"/>
      <xsd:element name="double" type="xsd:decimal"/>
      <xsd:element name="base64" type="xsd:base64Binary"/>
      <xsd:element name="boolean" type="NumericBoolean"/>
      <xsd:element name="dateTime.iso8601" type="xsd:dateTime"/>
      <xsd:element name="array" type="ArrayType"/>
      <xsd:element name="struct" type="StructType"/>
    </xsd:choice>
  </xsd:complexType>

  <xsd:complexType name="StructType">
    <xsd:sequence>
      <xsd:element name="member" type="MemberType" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="MemberType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string" />
      <xsd:element name="value" type="ValueType"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="ArrayType">
    <xsd:sequence>
      <xsd:element name="data">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="value" type="ValueType"
                         minOccurs="0" maxOccurs="unbounded"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:simpleType name="ASCIIString">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="([ -~]|\n|\r|\t)*" />
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:simpleType name="NumericBoolean">
    <xsd:restriction base="xsd:boolean">
      <xsd:pattern value="0|1" />
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
Allows you to determine whether or not it's valid to add or delete a node at a particular position in a document. This is called guided document editing.
DOMImplementation.hasFeature("AS-DOC") returns true if a given DOM supports these capabilities.
NodeEditAS
ElementEditAS
CharacterDataEditAS
DocumentEditAS
AttributeEditAS
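The feature test above can be sketched with the standard JAXP bootstrap. This is only an illustration: the class name FeatureCheck and the helper supportsGuidedEditing are made up for this example, and passing null as the version (which hasFeature treats as "any version") is an assumption, since the slide shows only the feature string. Most parsers can be expected to report false for "AS-DOC".

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;

public class FeatureCheck {

    // Hypothetical helper: asks the implementation whether it claims
    // support for the "AS-DOC" feature in any version (null version).
    static boolean supportsGuidedEditing(DOMImplementation impl) {
        return impl.hasFeature("AS-DOC", null);
    }

    public static void main(String[] args) throws Exception {
        // Obtain whatever DOMImplementation the default JAXP parser provides
        DOMImplementation impl = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .getDOMImplementation();
        System.out.println("AS-DOC supported: " + supportsGuidedEditing(impl));
    }
}
```

Checking the feature first lets an editor fall back to ordinary unguided DOM calls when the NodeEditAS family of interfaces is not available.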
Extends the Node interface with methods for guided document editing.
Java binding:
package org.w3c.dom.as;
import org.w3c.dom.Node;
public interface NodeEditAS extends Node {
// ASCheckType
public static final short WF_CHECK = 1;
public static final short NS_WF_CHECK = 2;
public static final short PARTIAL_VALIDITY_CHECK = 3;
public static final short STRICT_VALIDITY_CHECK = 4;
public boolean canInsertBefore(Node newChild, Node refChild);
public boolean canRemoveChild(Node oldChild);
public boolean canReplaceChild(Node newChild, Node oldChild);
public boolean canAppendChild(Node newChild);
public boolean isNodeValid(boolean deep, short wFValidityCheckLevel)
throws ASException;
}
IDL:
interface NodeEditAS : Node {
// ASCheckType
const unsigned short WF_CHECK = 1;
const unsigned short NS_WF_CHECK = 2;
const unsigned short PARTIAL_VALIDITY_CHECK = 3;
const unsigned short STRICT_VALIDITY_CHECK = 4;
boolean canInsertBefore(in Node newChild, in Node refChild);
boolean canRemoveChild(in Node oldChild);
boolean canReplaceChild(in Node newChild, in Node oldChild);
boolean canAppendChild(in Node newChild);
boolean isNodeValid(in boolean deep, in unsigned short wFValidityCheckLevel)
raises(ASException);
};
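The "ask before you edit" pattern behind canAppendChild can be sketched without an AS-aware parser. Everything here is a stand-in: the class GuidedAppend, the ALLOWED table, and the static canAppendChild method are hypothetical and only mimic what an implementation of NodeEditAS might answer for a few XML-RPC elements. Unlike a real content model, this toy check ignores order and cardinality.

```java
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class GuidedAppend {

    // Hypothetical table of permitted child element names per parent,
    // loosely following the XML-RPC DTD. Not part of any DOM API.
    static final Map<String, List<String>> ALLOWED = Map.of(
        "methodCall", List.of("methodName", "params"),
        "params", List.of("param"),
        "param", List.of("value"));

    // Stand-in for NodeEditAS.canAppendChild(Node): name lookup only
    static boolean canAppendChild(Node parent, Node newChild) {
        List<String> allowed = ALLOWED.get(parent.getNodeName());
        return allowed != null && allowed.contains(newChild.getNodeName());
    }

    // Guided edit: append only when the check says the result is acceptable
    static boolean appendIfAllowed(Node parent, Node newChild) {
        if (!canAppendChild(parent, newChild)) {
            return false;
        }
        parent.appendChild(newChild);
        return true;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element methodCall = doc.createElement("methodCall");
        doc.appendChild(methodCall);
        // prints "true": params is a permitted child of methodCall
        System.out.println(appendIfAllowed(methodCall, doc.createElement("params")));
        // prints "false": fault is not, so the tree is left unchanged
        System.out.println(appendIfAllowed(methodCall, doc.createElement("fault")));
    }
}
```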
Extends the DOM NodeEditAS interface with methods for determining the legal attributes and children of an element.
Java binding:
package org.w3c.dom.as;
import org.w3c.dom.Attr;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public interface ElementEditAS extends NodeEditAS {
public NodeList getDefinedElementTypes();
public short contentType();
public boolean canSetAttribute(String name, String value);
public boolean canSetAttributeNode(Attr attrNode);
public boolean canSetAttributeNS(String name, String value, String namespaceURI);
public boolean canRemoveAttribute(String name);
public boolean canRemoveAttributeNS(String name, String namespaceURI);
public boolean canRemoveAttributeNode(Node attrNode);
public NodeList getChildElements();
public NodeList getParentElements();
public NodeList getAttributeList();
public boolean isElementDefined(String elemTypeName);
public boolean isElementDefinedNS(String elemTypeName,
String namespaceURI, String name);
}
IDL:
interface ElementEditAS : NodeEditAS {
readonly attribute NodeList definedElementTypes;
unsigned short contentType();
boolean canSetAttribute(in DOMString attrname, in DOMString attrval);
boolean canSetAttributeNode(in Attr attrNode);
boolean canSetAttributeNS(in DOMString name, in DOMString attrval, in DOMString namespaceURI);
boolean canRemoveAttribute(in DOMString attrname);
boolean canRemoveAttributeNS(in DOMString attrname, in DOMString namespaceURI);
boolean canRemoveAttributeNode(in Node attrNode);
NodeList getChildElements();
NodeList getParentElements();
NodeList getAttributeList();
boolean isElementDefined(in DOMString elemTypeName);
boolean isElementDefinedNS(in DOMString elemTypeName, in DOMString namespaceURI, in DOMString name);
};
Extends the NodeEditAS interface with methods to determine whether or not certain text can be added at a particular place.
Java binding:
package org.w3c.dom.as;
public interface CharacterDataEditAS extends NodeEditAS {
public boolean getIsWhitespaceOnly();
public boolean canSetData(int offset, int count);
public boolean canAppendData(String arg);
public boolean canReplaceData(int offset, int count, String data);
public boolean canInsertData(int offset, String data);
public boolean canDeleteData(int offset, int count);
}
IDL:
interface CharacterDataEditAS : NodeEditAS {
readonly attribute boolean isWhitespaceOnly;
boolean canSetData(in unsigned long offset, in unsigned long count);
boolean canAppendData(in DOMString arg);
boolean canReplaceData(in unsigned long offset, in unsigned long count, in DOMString arg);
boolean canInsertData(in unsigned long offset, in DOMString arg);
boolean canDeleteData(in unsigned long offset, in unsigned long count);
};
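As a rough illustration of the kind of answer isWhitespaceOnly and canAppendData give, the following sketch works on ordinary DOM CharacterData nodes. The class WhitespaceCheck and both static methods are hypothetical, and the rule used for canAppendData (only whitespace may be added inside element-only content) is an assumption based on XML validity, not something taken from the interface above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;

public class WhitespaceCheck {

    // True if the string contains only XML white space
    // (space, tab, line feed, carriage return), or is empty.
    static boolean isWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
                return false;
            }
        }
        return true;
    }

    // Stand-in for CharacterDataEditAS.getIsWhitespaceOnly()
    static boolean isWhitespaceOnly(CharacterData node) {
        return isWhitespace(node.getData());
    }

    // Stand-in for canAppendData: assumes that when the parent has
    // element-only content, only white space may be appended.
    static boolean canAppendData(CharacterData node, String arg,
                                 boolean parentIsElementOnly) {
        return !parentIsElementOnly || isWhitespace(arg);
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        CharacterData indent = doc.createTextNode("\n  ");
        System.out.println(isWhitespaceOnly(indent));              // prints "true"
        System.out.println(canAppendData(indent, "text", true));   // prints "false"
        System.out.println(canAppendData(indent, "text", false));  // prints "true"
    }
}
```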
Extends the NodeEditAS interface with methods to turn continuous validity checking on or off.
Java binding:
package org.w3c.dom.as;
public interface DocumentEditAS extends NodeEditAS {
public boolean getContinuousValidityChecking();
public void setContinuousValidityChecking(boolean continuousValidityChecking);
}
IDL:
interface DocumentEditAS : NodeEditAS {
attribute boolean continuousValidityChecking;
};
Document Object Model (DOM) Level 3 Abstract Schemas and Load and Save Specification: http://www.w3.org/TR/DOM-Level-3-ASLS
Document Object Model (DOM) Level 3 Core Specification Version 1.0: http://www.w3.org/TR/DOM-Level-3-Core
Document Object Model (DOM) Requirements: http://www.w3.org/TR/DOM-Requirements/
Document Object Model (DOM) Level 3 Views and Formatting Specification: http://www.w3.org/TR/DOM-Level-3-Views/
Document Object Model (DOM) Level 3 XPath Specification: http://www.w3.org/TR/DOM-Level-3-XPath/
Document Object Model (DOM) Level 3 Events Specification: http://www.w3.org/TR/DOM-Level-3-Events/
This presentation: http://www.cafeconleche.org/slides/xmlone/london2002/advancedxml/
XML Infoset Specification: http://www.w3.org/TR/xml-infoset
Processing XML with Java: http://www.cafeconleche.org/books/xmljava/