The Bleeding Edge of XML


Elliotte Rusty Harold

XML and Web Services 2003 London

Monday, March 17, 2003

elharo@metalab.unc.edu

http://www.cafeconleche.org/


Outline


Part I: Semantics and Syntax

The Infoset is the unfortunate standard to which those in retreat from the radical and most useful implications of well-formedness have rallied. At its core the Infoset insists that there is 'more' to XML than the straightforward syntax of well-formedness. By imposing its canonical semantics the Infoset obviates the infinite other semantic outcomes which might be elaborated in particular unique circumstances from an instance of well-formed XML 1.0 syntax. The question we should be asking is not whether the Infoset has chosen the correct canonical semantics, but whether the syntactic possibilities of XML 1.0 should be curtailed in this way at all.
--Walter Perry on the xml-dev mailing list


A normal XML document

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/css" href="song.css"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG xmlns="http://www.cafeconleche.org/namespace/song"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO 
    xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
    ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <!-- The publisher is actually Polygram but I needed 
       an example of a general entity reference. -->
  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
<!-- You can tell what album I was 
     listening to when I wrote this example -->

A canonical XML document

<?xml-stylesheet type="text/css" href="song.css"?>
<SONG xmlns="http://www.cafeconleche.org/namespace/song" xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" HEIGHT="200" WIDTH="100" xlink:href="hotcop.jpg" xlink:show="onLoad" xlink:type="simple"></PHOTO>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  
  <PUBLISHER xlink:href="http://www.amrecords.com/" xlink:type="simple">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>

An org.w3c.dom.Document object formed by reading hotcop.xml


import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMHotCop {

  public static void main(String[] args) {

    DOMParser parser = new DOMParser();

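    try {
      // Parse the hotcop.xml document shown on the earlier slides
      parser.parse("hotcop.xml");
      // The parser holds the resulting tree until it is asked for it
      Document d = parser.getDocument();
    }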
    catch (SAXException e) {
      System.err.println(e);
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}

An encrypted hotcop.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE EncryptedData SYSTEM "song.dtd">
<?xml-stylesheet type="text/css" href="song.css"?><EncryptedData Id="ed1" Type="http://www.w3.org/2001/04/xmlenc#Element" xmlns="http://www.w3.org/2001/04/xmlenc#">
  <EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
  <KeyInfo xmlns="http://www.w3.org/2000/09/xmldsig#">
    <EncryptedKey xmlns="http://www.w3.org/2001/04/xmlenc#">
      <EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#rsa-1_5"/>
      <KeyInfo xmlns="http://www.w3.org/2000/09/xmldsig#">
        <KeyName>Alice</KeyName>
      </KeyInfo>
      <CipherData>
        <CipherValue/>
      </CipherData>
    </EncryptedKey>
  </KeyInfo>
  <CipherData>
    <CipherValue>yyIIMYu1mpIm+5MVokRnJ0hnfvZt/x/3ly311l6dK0v1GvynMJP1rkb+/YGfr6Zy00nL4plqxFgo5pJuVmxzj+R7q6f7sF6acfU0XBABICE9ZXfJ5gnainHVuaWbHnVPgT3fi2ohxhEmXp/JF7NhqDvsH9PULZLCIaRS9tKsrNrzdX/EQM3enQHkyc0aJAuAFLTwU710Hta7pf3qXX62i3UGSqjxy2Di8fOs+d/P4nysE9428SZmOM6fe4/m8YyRayRxMNr2RoOQIiYkiJ1krEGQzQ0XGJwIWmAR56CsMljTyT1G/2BDp39k/jCEiqARPekTwHZ1m7Pyh81nr4lnfm9lF3/NzlYe7wpnfSBp2u6IytoWWOeP27h5HTsu5jYfkRhht2h2R4nyIj07YkOsPmd9ubu3cq/SYU4DuvtKrKEIkhnYg4ZUVGjMKlffGzLNAaS2G1PRVIENJHNRoJwivY6+cPqjOhXUvioNQ/WQTOeo5cvTlJaD/od5VWGTJ75ZR8tkZfwFbop8JbhNN6ZODZNSNndnMJ1jEJeeFobOel5Vw0/ClPGh12LxkEJX/h3A+GyUtEfoAmB8ANb3xTsqiTyea1ZBJaS9hhcAFt3Ck+gTHPzwYS+y6x5qRTCfPyZS5PHvKjjIkAEXv+0p9zlQT9hBH1BJB6jXtWjd5sZAE3rMQC/7MXyXvN3ms/TFypBaQsWzKRg+JvxToErD1MtJXT1g8uZr59ubVlBcyjTWcCLMf+QUDxaY0iqPneNSGHAr1isuFc8PZOwJemYjnsySB0R8NN2LcCdFtK8IcB2+QLY7QCj8CAPy4uIZyHbCx6ojg5KWyGOIM5vmWGq6p6Tg+Y3nbc1uFOr1CbXCIbaNC9DI3N+HAcnW7439/JpMhMRa9s02RZsVqhjo4rYz04lkjI/44ffrBVsxk0/sk6XyCnZHQAwpd4y5gXofyPzW83yXA1iXZh7SQfs=</CipherValue>
  </CipherData>
</EncryptedData><!-- You can tell what album I was 
     listening to when I wrote this example -->

Are these four the same thing or not?


What is the XML Infoset?


The Infoset defines 11 kinds of Information Items


The Document Information Item


Elements

<PHOTO 
  xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
  ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  
<COMPOSER>Jacques Morali</COMPOSER>

<COMPOSER>
  <PERSON>
    <NAME>
      <FIRST>Henri</FIRST>
      <LAST>Belolo</LAST>
    </NAME>
  </PERSON>
</COMPOSER>

<rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#">
  <rdf:Description xmlns:dc="http://purl.org/dc/"
     about="http://www.ibiblio.org/examples/impressionists.xml">
    <dc:title> Impressionist Paintings </dc:title>
    <dc:creator> Elliotte Rusty Harold </dc:creator>
    <dc:description> 
      A list of famous impressionist paintings organized 
      by painter and date 
    </dc:description>
    <dc:date>2000-08-22</dc:date>
  </rdf:Description>
</rdf:RDF>

Element Information Items

An Element Information Item Includes:


Attributes

xlink:type="simple"
xlink:href="http://www.amrecords.com/"
xlink:type =  "simple"
xlink:show = "onLoad"
xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit"
WIDTH=" 100 "
HEIGHT=' 200 '

An Attribute Information Item Includes:


Comments

  <!-- The publisher is actually Polygram but I needed 
       an example of a general entity reference. -->
<!--  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG> -->
<!-- You can tell what album I was 
     listening to when I wrote this example -->

A Comment Information Item Includes:


A Processing Instruction Information Item Includes:

<?robots index="yes" follow="no"?>
<?php 
  mysql_connect("database.unc.edu", "clerk", "password"); 
  $result = mysql("CYNW", "SELECT LastName, FirstName FROM Employees 
    ORDER BY LastName, FirstName"); 
  $i = 0;
  while ($i < mysql_numrows ($result)) {
     $fields = mysql_fetch_row($result);
     echo "<person>$fields[1] $fields[0] </person>\r\n";
     $i++;
  }
  mysql_close();
?>

Characters


Namespaces


Document Type Declaration

<!DOCTYPE SONG SYSTEM "song.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                       "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

A Document Type Declaration Information Item includes:


Document Type Definition

<!ELEMENT SONG (TITLE, PHOTO?, COMPOSER+, PRODUCER*, 
 PUBLISHER*, LENGTH?, YEAR?, ARTIST+)>
<!ATTLIST SONG xmlns       CDATA #REQUIRED
               xmlns:xlink CDATA #REQUIRED>
<!ELEMENT TITLE (#PCDATA)>

<!ELEMENT PHOTO EMPTY>
<!ATTLIST PHOTO xlink:type CDATA #FIXED "simple"
                xlink:href CDATA #REQUIRED
                xlink:show CDATA #IMPLIED
                ALT        CDATA #REQUIRED
                WIDTH      CDATA #REQUIRED
                HEIGHT     CDATA #REQUIRED
>

<!ELEMENT COMPOSER (#PCDATA)>
<!ELEMENT PRODUCER (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ATTLIST PUBLISHER xlink:type CDATA #IMPLIED
                    xlink:href CDATA #IMPLIED
>

<!ELEMENT LENGTH (#PCDATA)>
<!-- This should be a four digit year like "1999",
     not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>

<!ELEMENT ARTIST (#PCDATA)>

Entities


Entity Information Items


Unparsed Entity Information Items


Unexpanded Entity Information Items


The Infoset Omits:


The PSVI


Canonical XML


How are documents canonicalized?

  1. The document is encoded in UTF-8

  2. Line breaks are normalized to a linefeed (ASCII 10, \n)

  3. Attribute values are normalized, as if by a validating processor

  4. Character and parsed entity references are replaced

  5. CDATA sections are replaced with their character content

  6. The XML and document type declarations are removed

  7. Empty elements are converted to start tag-end tag pairs

  8. White space outside of the document element and within start and end tags is normalized

  9. All white space in character content is retained (except for characters removed during linefeed normalization)

  10. Attribute value delimiters are set to double quotes

  11. Special characters in attribute values and character content are replaced by character references

  12. Superfluous namespace declarations are removed from each element

  13. Default attributes are added to each element

  14. Lexicographic order is imposed on the namespace declarations and attributes of each element
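A few of the steps above are easy to see in code. The following is only a minimal sketch, assuming a namespace-free document: it illustrates steps 5, 7, 10, 11 and 14 (CDATA replacement, start tag-end tag pairs, double-quoted attributes, escaping, attribute ordering) and nothing else. It drops comments and processing instructions and sorts attributes by name only, so it is not a conforming Canonical XML implementation. The class name CanonicalFormSketch is hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical class: a partial illustration, not a real canonicalizer
class CanonicalFormSketch {

  // Step 11 for character content: escape &, <, > and carriage returns
  static String escapeText(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;")
            .replace(">", "&gt;").replace("\r", "&#xD;");
  }

  // Step 11 for attribute values: escape &, <, ", tab, linefeed, carriage return
  static String escapeAttribute(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;")
            .replace("\"", "&quot;").replace("\t", "&#x9;")
            .replace("\n", "&#xA;").replace("\r", "&#xD;");
  }

  static void write(Node node, StringBuilder out) {
    if (node instanceof Element) {
      Element element = (Element) node;
      out.append('<').append(element.getTagName());
      // Step 14 (simplified): sort attributes by name
      Map<String, String> sorted = new TreeMap<>();
      NamedNodeMap attributes = element.getAttributes();
      for (int i = 0; i < attributes.getLength(); i++) {
        Node attribute = attributes.item(i);
        sorted.put(attribute.getNodeName(), attribute.getNodeValue());
      }
      // Step 10: always delimit attribute values with double quotes
      for (Map.Entry<String, String> entry : sorted.entrySet()) {
        out.append(' ').append(entry.getKey()).append("=\"")
           .append(escapeAttribute(entry.getValue())).append('"');
      }
      out.append('>');
      for (Node child = element.getFirstChild(); child != null;
           child = child.getNextSibling()) {
        write(child, out);
      }
      // Step 7: empty elements still get an explicit end tag
      out.append("</").append(element.getTagName()).append('>');
    }
    else if (node instanceof Text) {
      // Step 5: CDATASection is a subinterface of Text, so CDATA
      // sections are written as ordinary escaped character content
      out.append(escapeText(node.getNodeValue()));
    }
    // Comments and processing instructions are ignored in this sketch
  }

  static String canonicalize(String xml) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    Document doc = factory.newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
    StringBuilder out = new StringBuilder();
    // Step 6: the XML declaration and DOCTYPE are simply never written
    write(doc.getDocumentElement(), out);
    return out.toString();
  }
}
```

For real work, one of the libraries listed on the next slide is the safer choice, since namespace handling is where most of the difficulty in canonicalization lies.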


Canonicalization software

XML Canonicalizer from IBM's XML Security Suite
Apache XML Security Suite
A standard feature of DOM Level 3's DOMWriter
XOM

Digital Signatures


Not Just for Signing XML


Generic Digital Signature Process

  1. The signature processor calculates a hash code for some data using a strong, one-way hash function.

  2. The processor encrypts the hash code using a private key.

  3. The verifier calculates the hash code for the data it's received.

  4. It then decrypts the encrypted hash code using the public key to see if the hash codes match.
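The four steps above map fairly directly onto the java.security.Signature class. This is a minimal sketch, assuming RSA keys and the SHA256withRSA algorithm (the examples on the following slides use DSA with SHA-1 instead). Signature combines the hashing and private-key steps into one call, so steps 1 and 2 happen inside sign() and steps 3 and 4 inside verify(). The class and method names are hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical class illustrating the generic sign/verify cycle
class GenericSignatureSketch {

  // Assumption: a freshly generated 2048-bit RSA key pair stands in
  // for a real, certified key
  static KeyPair newKeyPair() throws GeneralSecurityException {
    KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
    generator.initialize(2048);
    return generator.generateKeyPair();
  }

  // Steps 1 and 2: hash the data, then transform the hash with the private key
  static byte[] sign(byte[] data, PrivateKey key)
      throws GeneralSecurityException {
    Signature signer = Signature.getInstance("SHA256withRSA");
    signer.initSign(key);
    signer.update(data);
    return signer.sign();
  }

  // Steps 3 and 4: rehash the received data and compare it with the
  // hash recovered from the signature using the public key
  static boolean verify(byte[] data, byte[] signature, PublicKey key)
      throws GeneralSecurityException {
    Signature verifier = Signature.getInstance("SHA256withRSA");
    verifier.initVerify(key);
    verifier.update(data);
    return verifier.verify(signature);
  }
}
```

A verifier that receives even slightly altered data computes a different hash in step 3, so verify() returns false.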


XML Signature Process

  1. The signature processor digests (calculates the hash code for) a data object.

  2. The processor places the digest value in a Signature element.

  3. The processor digests the Signature element.

  4. The processor cryptographically signs the Signature element.
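The process above can be sketched with the javax.xml.crypto.dsig API (JSR 105) that later shipped with the JDK. This is a hedged illustration rather than the tool used to produce the signatures on the following slides: it assumes RSA with SHA-256 instead of DSA with SHA-1, signs a tiny SONG element instead of the full hotcop.xml, and uses the hypothetical class name EnvelopingSignatureSketch. The Id value Res0 is borrowed from the enveloping example.

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.util.Collections;
import javax.xml.crypto.dom.DOMStructure;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.DigestMethod;
import javax.xml.crypto.dsig.Reference;
import javax.xml.crypto.dsig.SignedInfo;
import javax.xml.crypto.dsig.XMLObject;
import javax.xml.crypto.dsig.XMLSignature;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.dom.DOMSignContext;
import javax.xml.crypto.dsig.dom.DOMValidateContext;
import javax.xml.crypto.dsig.keyinfo.KeyInfo;
import javax.xml.crypto.dsig.keyinfo.KeyInfoFactory;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class: builds and checks a small enveloping signature
class EnvelopingSignatureSketch {

  static final String SONG_NS = "http://www.cafeconleche.org/namespace/song";
  // Assumption: RSA with SHA-256 rather than the slides' DSA with SHA-1
  static final String RSA_SHA256
      = "http://www.w3.org/2001/04/xmldsig-more#rsa-sha256";

  static KeyPair newKeyPair() throws Exception {
    KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
    generator.initialize(2048);
    return generator.generateKeyPair();
  }

  static Document sign(KeyPair keys) throws Exception {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setNamespaceAware(true);
    Document doc = dbf.newDocumentBuilder().newDocument();

    // The data object to be signed
    Element song = doc.createElementNS(SONG_NS, "SONG");
    // Declare the namespace explicitly so canonicalization can see it
    song.setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns", SONG_NS);
    Element title = doc.createElementNS(SONG_NS, "TITLE");
    title.setTextContent("Hot Cop");
    song.appendChild(title);

    XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
    XMLObject object = fac.newXMLObject(
        Collections.singletonList(new DOMStructure(song)), "Res0", null, null);

    // Steps 1 and 2: the Reference carries the digest of the data object
    Reference ref = fac.newReference("#Res0",
        fac.newDigestMethod(DigestMethod.SHA256, null));
    SignedInfo signedInfo = fac.newSignedInfo(
        fac.newCanonicalizationMethod(CanonicalizationMethod.INCLUSIVE,
            (C14NMethodParameterSpec) null),
        fac.newSignatureMethod(RSA_SHA256, null),
        Collections.singletonList(ref));

    KeyInfoFactory kif = fac.getKeyInfoFactory();
    KeyInfo keyInfo = kif.newKeyInfo(
        Collections.singletonList(kif.newKeyValue(keys.getPublic())));

    // Steps 3 and 4: SignedInfo is canonicalized, digested and signed
    XMLSignature signature = fac.newXMLSignature(signedInfo, keyInfo,
        Collections.singletonList(object), null, null);
    signature.sign(new DOMSignContext(keys.getPrivate(), doc));
    return doc; // the document element is now the Signature element
  }

  static boolean validate(Document doc, PublicKey key) throws Exception {
    XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
    DOMValidateContext context
        = new DOMValidateContext(key, doc.getDocumentElement());
    return fac.unmarshalXMLSignature(context).validate(context);
  }
}
```

Note that only the SignedInfo element is signed directly. The data object is protected indirectly, because its digest sits inside SignedInfo.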


XML Digital Signature software


A Detached Signature for hotcop.xml

<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
  <SignedInfo>
    <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"></CanonicalizationMethod>
    <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"></SignatureMethod>
    <Reference URI="file:///home/elharo/speaking/xmlone/london2003/bleeding/examples/hotcop.xml">
      <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"></DigestMethod>
      <DigestValue>i9y8sGT7gjSa0tkQI9aVsl7P6zg=</DigestValue>
    </Reference>
  </SignedInfo>
  <SignatureValue>
    RBIqdgwjHB8yufwiwScaf/L1P95u4SknSU2NLEeBH1yUAdyzjD/B3A==
  </SignatureValue>
  <KeyInfo>
    <KeyValue>
      <DSAKeyValue>
        <P>
          /X9TgR11EilS30qcLuzk5/YRt1I870QAwx4/gLZRJmlFXUAiUftZPY1Y+r/F9bow9s
          ubVWzXgTuAHTRv8mZgt2uZUKWkn5/oBHsQIsJPu6nX/rfGG/g7V+fGqKYVDwT7g/bT
          xR7DAjVUE1oWkTL2dfOuK2HXKu/yIgMZndFIAcc=
        </P>
        <Q>l2BQjxUjC8yykrmCouuEC/BYHPU=</Q>
        <G>
          9+GghdabPd7LvKtcNrhXuXmUr7v6OuqC+VdMCz0HgmdRWVeOutRZT+ZxBxCBgLRJFn
          Ej6EwoFhO3zwkyjMim4TwWeotUfI0o4KOuHiuzpnWRbqN/C/ohNWLx+2J6ASQ7zKTx
          vqhRkImog9/hWuWfBpKLZl6Ae1UlZAFMO/7PSSo=
        </G>
        <Y>
          /U3X04lzUKj+2NSxcV1SHBQe8Jyvhj2sMneglMBDZ9nwdTvyuYG10uMgHYmd5Id9lr
          vGbGSz2O+xBU2oh20hR5knKx4MmPZsbheKlUFrpd+3z71CzN8isfDuyvjT7hUt6Br8
          zDx/N5Av8Y205khGFwgE9qkabH20u2JG4LW+LLo=
        </Y>
      </DSAKeyValue>
    </KeyValue>
    <X509Data>
      <X509IssuerSerial>
        <X509IssuerName>CN=Elliotte Rusty Harold,OU=Metrotech,O=Polytechnic,L=Brooklyn,ST=New York,C=US</X509IssuerName>
        <X509SerialNumber>1047659081</X509SerialNumber>
      </X509IssuerSerial>
      <X509SubjectName>CN=Elliotte Rusty Harold,OU=Metrotech,O=Polytechnic,L=Brooklyn,ST=New York,C=US</X509SubjectName>
      <X509Certificate>
MIIDLzCCAu0CBD5yAkkwCwYHKoZIzjgEAwUAMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcg
WW9yazERMA8GA1UEBxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlN
ZXRyb3RlY2gxHjAcBgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDAeFw0wMzAzMTQxNjI0NDFa
Fw0wMzA2MTIxNjI0NDFaMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcgWW9yazERMA8GA1UE
BxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlNZXRyb3RlY2gxHjAc
BgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDCCAbgwggEsBgcqhkjOOAQBMIIBHwKBgQD9f1OB
HXUSKVLfSpwu7OTn9hG3UjzvRADDHj+AtlEmaUVdQCJR+1k9jVj6v8X1ujD2y5tVbNeBO4AdNG/y
ZmC3a5lQpaSfn+gEexAiwk+7qdf+t8Yb+DtX58aophUPBPuD9tPFHsMCNVQTWhaRMvZ1864rYdcq
7/IiAxmd0UgBxwIVAJdgUI8VIwvMspK5gqLrhAvwWBz1AoGBAPfhoIXWmz3ey7yrXDa4V7l5lK+7
+jrqgvlXTAs9B4JnUVlXjrrUWU/mcQcQgYC0SRZxI+hMKBYTt88JMozIpuE8FnqLVHyNKOCjrh4r
s6Z1kW6jfwv6ITVi8ftiegEkO8yk8b6oUZCJqIPf4VrlnwaSi2ZegHtVJWQBTDv+z0kqA4GFAAKB
gQD9TdfTiXNQqP7Y1LFxXVIcFB7wnK+GPawyd6CUwENn2fB1O/K5gbXS4yAdiZ3kh32Wu8ZsZLPY
77EFTaiHbSFHmScrHgyY9mxuF4qVQWul37fPvULM3yKx8O7K+NPuFS3oGvzMPH83kC/xjbTmSEYX
CAT2qRpsfbS7Ykbgtb4sujALBgcqhkjOOAQDBQADLwAwLAIUKtIOsax3UbphktK0CnWEWz0yJ5gC
FAX5zyBcEp0+mYauptGaIjw7drSZ
      </X509Certificate>
    </X509Data>
  </KeyInfo>
</Signature>

An enveloping signature for hotcop.xml

<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
  <SignedInfo>
    <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"></CanonicalizationMethod>
    <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"></SignatureMethod>
    <Reference URI="#Res0">
      <Transforms>
        <Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"></Transform>
      </Transforms>
      <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"></DigestMethod>
      <DigestValue>D9oQKdI9sEcHhJM6CTHDDKVjwSo=</DigestValue>
    </Reference>
  </SignedInfo>
  <SignatureValue>
    AqD5Zjfr0S64qAMPrOtznEhFBl1bXJ7aCosaY5pPMLHzGuzN7u1doQ==
  </SignatureValue>
  <KeyInfo>
    <KeyValue>
      <DSAKeyValue>
        <P>
          /X9TgR11EilS30qcLuzk5/YRt1I870QAwx4/gLZRJmlFXUAiUftZPY1Y+r/F9bow9s
          ubVWzXgTuAHTRv8mZgt2uZUKWkn5/oBHsQIsJPu6nX/rfGG/g7V+fGqKYVDwT7g/bT
          xR7DAjVUE1oWkTL2dfOuK2HXKu/yIgMZndFIAcc=
        </P>
        <Q>l2BQjxUjC8yykrmCouuEC/BYHPU=</Q>
        <G>
          9+GghdabPd7LvKtcNrhXuXmUr7v6OuqC+VdMCz0HgmdRWVeOutRZT+ZxBxCBgLRJFn
          Ej6EwoFhO3zwkyjMim4TwWeotUfI0o4KOuHiuzpnWRbqN/C/ohNWLx+2J6ASQ7zKTx
          vqhRkImog9/hWuWfBpKLZl6Ae1UlZAFMO/7PSSo=
        </G>
        <Y>
          /U3X04lzUKj+2NSxcV1SHBQe8Jyvhj2sMneglMBDZ9nwdTvyuYG10uMgHYmd5Id9lr
          vGbGSz2O+xBU2oh20hR5knKx4MmPZsbheKlUFrpd+3z71CzN8isfDuyvjT7hUt6Br8
          zDx/N5Av8Y205khGFwgE9qkabH20u2JG4LW+LLo=
        </Y>
      </DSAKeyValue>
    </KeyValue>
    <X509Data>
      <X509IssuerSerial>
        <X509IssuerName>CN=Elliotte Rusty Harold,OU=Metrotech,O=Polytechnic,L=Brooklyn,ST=New York,C=US</X509IssuerName>
        <X509SerialNumber>1047659081</X509SerialNumber>
      </X509IssuerSerial>
      <X509SubjectName>CN=Elliotte Rusty Harold,OU=Metrotech,O=Polytechnic,L=Brooklyn,ST=New York,C=US</X509SubjectName>
      <X509Certificate>
MIIDLzCCAu0CBD5yAkkwCwYHKoZIzjgEAwUAMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcg
WW9yazERMA8GA1UEBxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlN
ZXRyb3RlY2gxHjAcBgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDAeFw0wMzAzMTQxNjI0NDFa
Fw0wMzA2MTIxNjI0NDFaMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcgWW9yazERMA8GA1UE
BxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlNZXRyb3RlY2gxHjAc
BgNVBAMTFUVsbGlvdHRlIFJ1c3R5IEhhcm9sZDCCAbgwggEsBgcqhkjOOAQBMIIBHwKBgQD9f1OB
HXUSKVLfSpwu7OTn9hG3UjzvRADDHj+AtlEmaUVdQCJR+1k9jVj6v8X1ujD2y5tVbNeBO4AdNG/y
ZmC3a5lQpaSfn+gEexAiwk+7qdf+t8Yb+DtX58aophUPBPuD9tPFHsMCNVQTWhaRMvZ1864rYdcq
7/IiAxmd0UgBxwIVAJdgUI8VIwvMspK5gqLrhAvwWBz1AoGBAPfhoIXWmz3ey7yrXDa4V7l5lK+7
+jrqgvlXTAs9B4JnUVlXjrrUWU/mcQcQgYC0SRZxI+hMKBYTt88JMozIpuE8FnqLVHyNKOCjrh4r
s6Z1kW6jfwv6ITVi8ftiegEkO8yk8b6oUZCJqIPf4VrlnwaSi2ZegHtVJWQBTDv+z0kqA4GFAAKB
gQD9TdfTiXNQqP7Y1LFxXVIcFB7wnK+GPawyd6CUwENn2fB1O/K5gbXS4yAdiZ3kh32Wu8ZsZLPY
77EFTaiHbSFHmScrHgyY9mxuF4qVQWul37fPvULM3yKx8O7K+NPuFS3oGvzMPH83kC/xjbTmSEYX
CAT2qRpsfbS7Ykbgtb4sujALBgcqhkjOOAQDBQADLwAwLAIUKtIOsax3UbphktK0CnWEWz0yJ5gC
FAX5zyBcEp0+mYauptGaIjw7drSZ
      </X509Certificate>
    </X509Data>
  </KeyInfo>
  <dsig:Object xmlns="" xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" Id="Res0"><?xml-stylesheet type="text/css" href="song.css"?><SONG xmlns="http://www.cafeconleche.org/namespace/song" xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" HEIGHT="200" WIDTH="100" xlink:href="hotcop.jpg" xlink:show="onLoad" xlink:type="simple"></PHOTO>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <!-- The publisher is actually Polygram but I needed 
       an example of a general entity reference. -->
  <PUBLISHER xlink:href="http://www.amrecords.com/" xlink:type="simple">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG><!-- You can tell what album I was 
     listening to when I wrote this example --></dsig:Object>
</Signature>

XML Encryption


XML Encryption Algorithms


Complete standard algorithm list

From the spec:

Block Encryption
  1. REQUIRED TRIPLEDES
    http://www.w3.org/2001/04/xmlenc#tripledes-cbc

  2. REQUIRED AES-128
    http://www.w3.org/2001/04/xmlenc#aes128-cbc

  3. REQUIRED AES-256
    http://www.w3.org/2001/04/xmlenc#aes256-cbc

  4. OPTIONAL AES-192
    http://www.w3.org/2001/04/xmlenc#aes192-cbc

Key Transport
  1. REQUIRED RSA-v1.5
    http://www.w3.org/2001/04/xmlenc#rsa-1_5

  2. REQUIRED RSA-OAEP
    http://www.w3.org/2001/04/xmlenc#rsa-oaep-mgf1p

Key Agreement
  1. OPTIONAL Diffie-Hellman
    http://www.w3.org/2001/04/xmlenc#dh

Symmetric Key Wrap
  1. REQUIRED TRIPLEDES KeyWrap
    http://www.w3.org/2001/04/xmlenc#kw-tripledes

  2. REQUIRED AES-128 KeyWrap
    http://www.w3.org/2001/04/xmlenc#kw-aes128

  3. REQUIRED AES-256 KeyWrap
    http://www.w3.org/2001/04/xmlenc#kw-aes256

  4. OPTIONAL AES-192 KeyWrap
    http://www.w3.org/2001/04/xmlenc#kw-aes192

Message Digest
  1. REQUIRED SHA1
    http://www.w3.org/2000/09/xmldsig#sha1

  2. RECOMMENDED SHA256
    http://www.w3.org/2001/04/xmlenc#sha256

  3. OPTIONAL SHA512
    http://www.w3.org/2001/04/xmlenc#sha512

  4. OPTIONAL RIPEMD-160
    http://www.w3.org/2001/04/xmlenc#ripemd160

Message Authentication
  1. RECOMMENDED XML Digital Signature
    http://www.w3.org/TR/2001/CR-xmldsig-core-20010419/

Canonicalization
  1. OPTIONAL Canonical XML with Comments
    http://www.w3.org/TR/2001/REC-xml-c14n-20010315#WithComments

  2. OPTIONAL Canonical XML (omits comments)
    http://www.w3.org/TR/2001/REC-xml-c14n-20010315

Encoding
  1. REQUIRED base64
    http://www.w3.org/2000/09/xmldsig#base64


XML Encryption Syntax


An XML Document containing sensitive information

<?xml version='1.0'?>
<PaymentInfo xmlns="http://example.org/payment">
  <Name>Elliotte Rusty Harold</Name>
  <CreditCard Limit='1000' Currency='USD'>
    <Number>1234 5678 9012 3456</Number>
    <Issuer>Citibank</Issuer>
    <Expiration>03/02</Expiration>
  </CreditCard>
</PaymentInfo>

An XML Document containing an encrypted element

<?xml version='1.0'?>
<PaymentInfo xmlns="http://example.org/payment">
  <Name>Elliotte Rusty Harold</Name>
  <EncryptedData Type='http://www.w3.org/2001/04/xmlenc#Element'
     xmlns='http://www.w3.org/2001/04/xmlenc#'>
    <EncryptionMethod
      Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
    <CipherData>
      <CipherValue>A23B45C56CABE4BE33327</CipherValue>
    </CipherData>
  </EncryptedData>
</PaymentInfo>

An XML Document containing encrypted element data

<?xml version='1.0'?>
<PaymentInfo xmlns="http://example.org/payment">
  <Name>Elliotte Rusty Harold</Name>
  <CreditCard Limit="1000" Currency="USD">
     <EncryptedData Type="http://www.w3.org/2001/04/xmlenc#Content"
        xmlns="http://www.w3.org/2001/04/xmlenc#">
      <EncryptionMethod
        Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
      <CipherData>
        <CipherValue>A23B45C56CABE4BE3</CipherValue>
      </CipherData>
    </EncryptedData>
  </CreditCard>
</PaymentInfo>

An XML Document containing encrypted text

<?xml version='1.0'?>
<PaymentInfo xmlns="http://example.org/payment">
  <Name>Elliotte Rusty Harold</Name>
  <CreditCard Limit='1000' Currency='USD'>
    <Number>
      <EncryptedData Type="http://www.w3.org/2001/04/xmlenc#Content"
                     xmlns="http://www.w3.org/2001/04/xmlenc#">
        <EncryptionMethod
          Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
        <CipherData>
          <CipherValue>A23B45C56CABE4BE</CipherValue>
        </CipherData>
      </EncryptedData>
    </Number>
    <Issuer>Citibank</Issuer>
    <Expiration>03/02</Expiration>
  </CreditCard>
</PaymentInfo>

A completely encrypted XML Document

<?xml version='1.0'?>
<EncryptedData 
   Type="http://www.isi.edu/in-notes/iana/assignments/media-types/text/xml"
   xmlns="http://www.w3.org/2001/04/xmlenc#">
  <EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
  <CipherData>
    <CipherValue>A23B45C56CABE4BE7687989219C4E5DEADBEEFCAFEBABE</CipherValue>
  </CipherData>
</EncryptedData>

XML Encryption Software

xss4j, IBM's XML Security Suite
Apache XML Security Suite
Commercial implementations from Baltimore and Phaos
JSR-106: XML Digital Encryption APIs

Issues XML Encryption doesn't address


To Learn More


Part II: XML 1.1

Discriminating against languages simply because their scripts were not encoded in Unicode 2.0 is inherently unjust.

--XML Blueberry Requirements


New features


Changes in XML 1.1


White Space


Native language markup


Name characters


Harm Reduction proposals


Namespaces 1.1


IRIs


Undeclaring prefixes


Part III: XInclude

The problem is that we're not providing the tools. We're providing the specs. That's a whole different ball game. If tools existed for actually making really interesting use of RDF and XLink and XInclude then people would use them. If IE and/or Mozilla supported the full gamut of specs, from XSLT 1.0 to XLink and XInclude (OK, so they're not quite REC's, but with time...) then you would find people using them more.
--Matt Sergeant on the xml-dev mailing list


What is XInclude?


Alternatives (and why they don't work)


The include element

<book xmlns:xinclude="http://www.w3.org/2001/XInclude">
  <title>Processing XML with Java</title>
  <chapter><xinclude:include href="dom.xml"/></chapter>
  <chapter><xinclude:include href="sax.xml"/></chapter>
  <chapter><xinclude:include href="jdom.xml"/></chapter>
</book>

The parse attribute

parse="xml"
The resource must be parsed as XML and the infosets merged. This is the default.
parse="text"
The resource must be treated as pure text and inserted as a text node. When serialized, this means that characters like < will change to &lt; and so forth.
<slide xmlns:xinclude="http://www.w3.org/2001/XInclude">
  <title>The href attribute</title>
  
<ul>
  <li>Identifies the document to be included with a URI</li>
  <li>The document at the URI replaces the <code>include</code> 
      element in the including document</li>
  <li>The <code>xinclude</code> prefix is bound to the http://www.w3.org/2001/XInclude
  namespace URI. 
  </li>
</ul>  

<pre><code><xinclude:include parse="text" href="processing_xml_with_java.xml"/>
</code></pre>
        
  <description>
      A slide from Elliotte Rusty Harold's Bleeding Edge of XML presentation at
      <host_ref/>, <date_ref/>
    </description>
  <last_modified>October 26, 2000</last_modified>
</slide>
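The difference between parse="xml" and parse="text" can be tried out with the XInclude support that JAXP 1.3 and later expose through DocumentBuilderFactory.setXIncludeAware(). This is a minimal sketch, assuming the JDK's built-in parser. The class name XIncludeSketch is hypothetical, and the file passed in is assumed to contain include elements in the http://www.w3.org/2001/XInclude namespace.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical class: resolves XInclude elements while parsing
class XIncludeSketch {

  static Document parse(File file) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // XInclude is defined in terms of namespaces, so both switches are needed
    factory.setNamespaceAware(true);
    factory.setXIncludeAware(true);
    DocumentBuilder parser = factory.newDocumentBuilder();
    // Relative href values are resolved against the location of this file
    return parser.parse(file);
  }

  public static void main(String[] args) throws Exception {
    Document doc = parse(new File(args[0]));
    // By this point each include element has been replaced by its target:
    // parse="xml" contributes elements, parse="text" contributes a text node
    System.out.println(doc.getDocumentElement().getTextContent());
  }
}
```

With parse="xml" the included document's root element appears in the tree in place of the include element. With parse="text" the same file arrives as a single text node, markup characters and all.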


The encoding attribute

<slide xmlns:xinclude="http://www.w3.org/2001/XInclude">
  <title>The href attribute</title>
  
<ul>
  <li>Identifies the document to be included with a URI</li>
  <li>The document at the URI replaces the <code>include</code> 
      element in the including document</li>
  <li>The <code>xinclude</code> prefix is bound to the http://www.w3.org/2001/XInclude
  namespace URI. 
  </li>
</ul>  

<pre><code><xinclude:include parse="text" encoding="ISO-8859-1" 
                  href="processing_xml_with_java.xml"/>
</code></pre>
        
  <description>
      A slide from Elliotte Rusty Harold's Bleeding Edge of XML presentation at
      <host_ref/>, <date_ref/>
    </description>
  <last_modified>October 26, 2000</last_modified>
</slide>


The fallback element


Implementation as a SAX filter


SAX XInclude Driver


Implementation as JDOM


Implementation as DOM


To Learn More


Part IV: SAX 2.1

Actually, SAX2 has ** MUCH ** better infoset support than DOM does. Yes, I've done the detailed analysis.

--David Brownell on the xml-dev mailing list


Goals


Specified vs. Defaulted Attributes


standalone declaration

<?xml version="1.0" standalone="yes"?>


The version and encoding properties

<?xml version="1.0" encoding="UTF-16"?>


Feature/Property discovery


DefaultHandler infoset extensions


Parser identification


xmlns URI


Error IDs


To Learn More


Part V: DOM Level 3

of all of the things the W3C has given us, the DOM is probably the one with the least value.

--Michael Brennan on the xml-dev mailing list


DOM Evolution


New Features in DOM Level 3


DOM Level 3 Core Changes


New methods in Node interface


User Data


New methods in Entity


New methods in Document


New methods in Text


Bootstrapping


DOM3 Bootstrapping


DOM Error Handler Interfaces


The DOMErrorHandler Interface


The DOMError Interface

package org.w3c.dom;

public interface DOMError {
    // ErrorSeverity
    public static final short SEVERITY_WARNING          = 0;
    public static final short SEVERITY_ERROR            = 1;
    public static final short SEVERITY_FATAL_ERROR      = 2;

    public short  getSeverity();
    public String getMessage();
    public String getType();
    public Object getRelatedException();
    public Object getRelatedData();
    public DOMLocator getLocation();

}

The DOMLocator Interface


Load and Save


The DOM Process

  1. Library specific code creates a parser

  2. The parser parses the document and returns a DOM org.w3c.dom.Document object.

  3. The entire document is stored in memory.

  4. DOM methods and interfaces are used to extract data from this object
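The four steps above look like this in a minimal sketch that assumes JAXP supplies the parser. The class name DOMProcessSketch and the choice to collect COMPOSER elements are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class walking through the four steps of the DOM process
class DOMProcessSketch {

  static List<String> composers(String xml) throws Exception {
    // Step 1: library-specific code (here JAXP) creates a parser
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    DocumentBuilder parser = factory.newDocumentBuilder();

    // Steps 2 and 3: the parser reads the whole document and returns
    // an in-memory org.w3c.dom.Document
    Document doc = parser.parse(new InputSource(new StringReader(xml)));

    // Step 4: standard DOM methods extract data from the tree
    NodeList nodes = doc.getElementsByTagNameNS("*", "COMPOSER");
    List<String> names = new ArrayList<>();
    for (int i = 0; i < nodes.getLength(); i++) {
      names.add(nodes.item(i).getTextContent());
    }
    return names;
  }
}
```

Only step 1 depends on the parser library. Everything from step 2 onward uses nothing but the org.w3c.dom interfaces.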


Parsing documents with DOM2

This program uses the Xerces-specific DOMParser class. DOM Level 2 does not define how a parser is created, so other parsers require different code.

import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMParserMaker {

  public static void main(String[] args) {
     
    DOMParser parser = new DOMParser();
    
    for (int i = 0; i < args.length; i++) {
      try {
        parser.parse(args[i]); 
       
        Document d = parser.getDocument();
      }
      catch (SAXException e) {
        System.err.println(e); 
      }
      catch (IOException e) {
        System.err.println(e); 
      }
      
    }
   
  }

}

Parsing documents with JAXP

import javax.xml.parsers.*; // JAXP
import org.xml.sax.SAXException;
import java.io.IOException;


public class JAXPParserMaker {

  public static void main(String[] args) {
     
    if (args.length <= 0) {
      System.out.println("Usage: java JAXPParserMaker URL");
      return;
    }
    String document = args[0];
    
    try {
      DocumentBuilderFactory factory 
       = DocumentBuilderFactory.newInstance();
      DocumentBuilder parser = factory.newDocumentBuilder();
      parser.parse(document); 
      System.out.println(document + " is well-formed.");
    }
    catch (SAXException e) {
      System.out.println(document + " is not well-formed.");
    }
    catch (IOException e) { 
      System.out.println(
       "Due to an IOException, the parser could not check " 
       + document
      ); 
    }
    catch (FactoryConfigurationError e) { 
      // JAXP suffers from excessive brain-damage caused by 
      // intellectual in-breeding at Sun. (Basically the Sun 
      // engineers spend way too much time talking to each other
      // and not nearly enough time talking to people outside 
      // Sun.) Fortunately, you can happily ignore most of the 
      // JAXP brain damage and not be any the poorer for it.
      
      // This, however, is one of the few problems you can't 
      // avoid if you're going to use JAXP at all. 
      // DocumentBuilderFactory.newInstance() should throw a 
      // ClassNotFoundException if it can't locate the factory
      // class. However, what it does throw is an Error,
      // specifically a FactoryConfigurationError. Very few 
      // programs are prepared to respond to errors as opposed
      // to exceptions. You should catch this error in your 
      // JAXP programs as quickly as possible even though the
      // compiler won't require you to, and you should 
      // never rethrow it or otherwise let it escape from the 
      // method that produced it. 
      System.out.println("Could not locate a factory class"); 
    }
    catch (ParserConfigurationException e) { 
      System.out.println("Could not locate a JAXP parser"); 
    }
   
  }

}

Parsing documents with DOM3

import org.w3c.dom.*;

public class DOM3ParserMaker {

  public static void main(String[] args) {

    DOMImplementation impl 
     = DOMImplementationRegistry.getDOMImplementation("XML 1.0 LS-Load 3.0");
    DOMImplementationLS implls = (DOMImplementationLS) impl;
    DOMBuilder parser = implls.getDOMBuilder();

    for (int i = 0; i < args.length; i++) {
      try {
        Document d = parser.parseURI(args[i]);
      }
      catch (DOMSystemException e) {
        System.err.println(e);
      }
      catch (DOMException e) {
        System.err.println(e);
      }

    }

  }

}

This code will not actually compile or run until some parser supports DOM3 Load and Save.
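For comparison, here is a sketch of the same program against the interface names that appeared in the final DOM Level 3 Load and Save Recommendation (2004) and later in the JDK. In that version DOMBuilder is called LSParser, DOMInputSource is called LSInput, and the registry is obtained through DOMImplementationRegistry.newInstance(). The class name LSParserSketch and the parseString helper are hypothetical additions.

```java
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

// Hypothetical class: DOM3ParserMaker rewritten for the final LS names
class LSParserSketch {

  static DOMImplementationLS loadAndSave() throws Exception {
    DOMImplementationRegistry registry
        = DOMImplementationRegistry.newInstance();
    // "LS" asks for any implementation that supports Load and Save
    return (DOMImplementationLS) registry.getDOMImplementation("LS");
  }

  // Helper for parsing a document that is already in memory
  static Document parseString(String xml) throws Exception {
    DOMImplementationLS implls = loadAndSave();
    LSParser parser
        = implls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
    LSInput input = implls.createLSInput();
    input.setStringData(xml);
    return parser.parse(input);
  }

  public static void main(String[] args) throws Exception {
    LSParser parser = loadAndSave()
        .createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
    for (int i = 0; i < args.length; i++) {
      try {
        Document d = parser.parseURI(args[i]);
        System.out.println(args[i] + ": "
            + d.getDocumentElement().getTagName());
      }
      catch (LSException e) {
        // Thrown when the document cannot be loaded or parsed
        System.err.println(e);
      }
      catch (DOMException e) {
        System.err.println(e);
      }
    }
  }
}
```

Unlike the Xerces and JAXP versions, nothing here names a parser vendor or a javax package. Everything comes from org.w3c.dom.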


Load and Save

DOMImplementationLS
A sub-interface of DOMImplementation that provides the factory methods for creating the objects required for loading and saving.
DOMBuilder
A parser interface
DOMInputSource
Encapsulates information about the source of the XML to be loaded, like SAX's InputSource
DOMEntityResolver
During loading, provides a way for applications to redirect references to external entities.
DOMBuilderFilter
Provides the ability to examine and optionally remove Element nodes as they are being processed during the parsing of a document, like SAX filters.
DOMWriter
An interface for serializing DOM documents onto a stream or string.
DOMWriterFilter
Provides the ability to examine and optionally remove or modify nodes as they are being output.
DocumentLS
A "mechanism by which the content of a document can be replaced with the DOM tree produced when loading a URL, or parsing a string."
ParserErrorEvent
Some sort of error detected in the input document (well-formedness? validity?)
LSLoadEvent
A document has been completely loaded
LSProgressEvent
A document has been partially loaded
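The saving half of the list above can be sketched as follows. In the final Load and Save Recommendation the DOMWriter interface described here is called LSSerializer, and this sketch uses that name so it runs on a current JDK. The class name LSSerializerSketch is hypothetical, and turning off the XML declaration is just one example of a configuration parameter.

```java
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical class: serializes a DOM Document to a string
class LSSerializerSketch {

  static String serialize(Document doc) throws Exception {
    DOMImplementationRegistry registry
        = DOMImplementationRegistry.newInstance();
    DOMImplementationLS implls
        = (DOMImplementationLS) registry.getDOMImplementation("LS");

    // LSSerializer is the final name of the DOMWriter interface
    LSSerializer writer = implls.createLSSerializer();
    // Serialization options are set through a DOMConfiguration object
    writer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
    return writer.writeToString(doc);
  }
}
```

The same LSSerializer can write to a stream instead, through its write() method and an LSOutput obtained from DOMImplementationLS.createLSOutput().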

DOMImplementationLS


Creating DOMImplementationLS Objects

  1. Use the feature "LS-Load" to find a DOMImplementation object that supports Load and Save.

  2. Cast the DOMImplementation object to DOMImplementationLS.

DOMImplementation impl 
 = DOMImplementationRegistry.getDOMImplementation("XML 1.0 LS-Load 3.0");
if (impl != null) {
  DOMImplementationLS implls = (DOMImplementationLS) impl;
  // ...
}

DOMBuilder


DOMInputSource


DOMEntityResolver


DOMWriter


DOMBuilderFilter


DOMWriterFilter


DocumentLS


ParserErrorEvent


Validation


To Learn More


Part VI: XOM


To Learn More


Copyright 2000-2003 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified March 14, 2003