XML Fundamentals

Elliotte Rusty Harold

Bank of America

September 21, 2000

elharo@metalab.unc.edu

http://metalab.unc.edu/xml/


What is XML?


XML is a Meta Markup Language


XML describes structure and semantics, not formatting


A Song Description in HTML

<dt>Hot Cop
<dd> by Jacques Morali, Henri Belolo, and Victor Willis
<ul>
<li>Producer: Jacques Morali
<li>Publisher: PolyGram Records
<li>Length: 6:20
<li>Written: 1978
<li>Artist: Village People
</ul>

A Song Description in XML

<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>

Editing and Saving XML Files


Style Sheets provide formatting

SONG {display: block; font-family: New York, Times New Roman, serif}
TITLE {display: block; font-size: 24pt; 
       font-weight: bold; font-family: Helvetica, sans-serif}
COMPOSER {display: block}
PRODUCER {display: block}
YEAR {display: block}
PUBLISHER {display: block}
LENGTH {display: block}
ARTIST {display: block; font-style: italic}

Attaching style sheets to documents

<?xml-stylesheet type="text/css" href="song.css"?>
<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>


Style Sheet Languages


An XSLT stylesheet

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
      <head><title>Song</title></head>
      <body>
        <xsl:apply-templates select="SONG"/>    
      </body>
    </html>
  </xsl:template>
  
  <xsl:template match="SONG">
    <h1>
      <xsl:value-of select="TITLE"/> 
      by the 
      <xsl:value-of select="ARTIST"/>
    </h1>
    
    <ul>
      <li>Length: <xsl:value-of select="LENGTH"/></li>
      <li>Producer: <xsl:value-of select="PRODUCER"/></li>
      <li>Publisher: <xsl:value-of select="PUBLISHER"/></li>
      <li>Year: <xsl:value-of select="YEAR"/></li>
      <xsl:apply-templates select="COMPOSER"/>
    </ul>
  </xsl:template>

  <xsl:template match="COMPOSER">
    <li>Composer: <xsl:value-of select="."/></li>
  </xsl:template>

</xsl:stylesheet>

Transforming the Document

D:\fundamentals\examples> saxon hotcop.xml song3.xsl
<html>
<head>
<title>Song</title>
</head>
<body>
<h1>Hot Cop
      by the
      Village People</h1>
<ul>
<li>Length: 6:20</li>
<li>Producer: Jacques Morali</li>
<li>Publisher: PolyGram Records</li>
<li>Year: 1978</li>
<li>Composer: Jacques Morali</li>
<li>Composer: Henri Belolo</li>
<li>Composer: Victor Willis</li>
</ul>
</body>
</html>

Or alternately:

% java com.icl.saxon.StyleSheet -x org.apache.xerces.parsers.SAXParser hotcop.xml song3.xsl
<html>
...
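
The same kind of transformation can also be driven from Java through the JAXP javax.xml.transform API that ships with the JDK. The sketch below is not part of the original slides: the class name TransformSketch and the shortened inline copies of the song document and style sheet are assumptions made so the example is self-contained, and it uses whatever XSLT 1.0 processor the JDK provides rather than Saxon.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TransformSketch {

  // Shortened copy of the song document (assumption, for illustration only)
  static final String SONG =
      "<SONG><TITLE>Hot Cop</TITLE><ARTIST>Village People</ARTIST></SONG>";

  // Shortened style sheet that emits plain text instead of HTML (assumption)
  static final String STYLE =
      "<xsl:stylesheet version='1.0'"
    + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:output method='text'/>"
    + "<xsl:template match='SONG'>"
    + "<xsl:value-of select='TITLE'/> by the <xsl:value-of select='ARTIST'/>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  // Compiles the style sheet, applies it to the document, returns the result
  static String transform(String xml, String xsl) {
    try {
      Transformer transformer = TransformerFactory.newInstance()
          .newTransformer(new StreamSource(new StringReader(xsl)));
      StringWriter out = new StringWriter();
      transformer.transform(new StreamSource(new StringReader(xml)),
                            new StreamResult(out));
      return out.toString();
    } catch (TransformerException e) {
      throw new IllegalStateException("transformation failed", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(transform(SONG, STYLE));
  }
}
```

Reading the document and style sheet from files instead only requires passing java.io.File objects to the StreamSource constructors.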



Well-formedness

Rules:


Validity

To be valid, an XML document must:

  1. Be well-formed

  2. Have a DTD

  3. Comply with the constraints specified in the DTD


A DTD for Songs

<!ELEMENT SONG (TITLE, COMPOSER+, PRODUCER*, 
 PUBLISHER*, LENGTH?, YEAR?, ARTIST+)>

<!ELEMENT TITLE (#PCDATA)>

<!ELEMENT COMPOSER (#PCDATA)>
<!ELEMENT PRODUCER (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ELEMENT LENGTH (#PCDATA)>
<!-- This should be a four digit year like "1999",
     not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>

<!ELEMENT ARTIST (#PCDATA)>

A Valid Song Document

<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>

Checking Validity

To check validity, pass the document through a validating parser, which reports any validity errors it finds. For example, with an invalid document:

% java dom.DOMCount -v validhotcop.xml
[Error] validhotcop.xml:13:9: The content of element type "SONG" must match "(TITLE,COMPOSER+,PRODUCER*,PUBLISHER*,LENGTH?,YEAR?)".
validhotcop.xml: 550 ms (10 elems, 0 attrs, 28 spaces, 98 chars)

A valid document:

% java dom.DOMCount -v validhotcop.xml
validhotcop.xml: 291 ms (10 elems, 0 attrs, 28 spaces, 98 chars)
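
A validity check can also be made from your own code instead of a command-line tool. The following is a minimal sketch using the JAXP SAX parser bundled with the JDK, not the DOMCount program shown above. The class name ValiditySketch, the cut-down DTD placed in the internal subset, and the two test documents are assumptions chosen to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ValiditySketch {

  // Cut-down song DTD carried in the internal subset (assumption)
  static final String DTD =
      "<!DOCTYPE SONG ["
    + "<!ELEMENT SONG (TITLE, COMPOSER+, ARTIST+)>"
    + "<!ELEMENT TITLE (#PCDATA)>"
    + "<!ELEMENT COMPOSER (#PCDATA)>"
    + "<!ELEMENT ARTIST (#PCDATA)>"
    + "]>";

  static final String VALID = DTD
    + "<SONG><TITLE>Hot Cop</TITLE><COMPOSER>Jacques Morali</COMPOSER>"
    + "<ARTIST>Village People</ARTIST></SONG>";

  // Same document without the required ARTIST element
  static final String INVALID = DTD
    + "<SONG><TITLE>Hot Cop</TITLE><COMPOSER>Jacques Morali</COMPOSER></SONG>";

  // Returns the error messages reported by the parser; empty means valid
  static List<String> validate(String xml) {
    List<String> errors = new ArrayList<>();
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setValidating(true); // DTD validation is off by default
      XMLReader reader = factory.newSAXParser().getXMLReader();
      reader.setErrorHandler(new DefaultHandler() {
        @Override
        public void error(SAXParseException e) {
          errors.add(e.getMessage()); // validity errors are recoverable
        }
      });
      reader.parse(new InputSource(new StringReader(xml)));
    } catch (SAXException e) {
      errors.add(e.getMessage()); // fatal error: the document is not well-formed
    } catch (ParserConfigurationException | IOException e) {
      throw new IllegalStateException("could not run the parser", e);
    }
    return errors;
  }

  public static void main(String[] args) {
    System.out.println("valid document:   " + validate(VALID));
    System.out.println("invalid document: " + validate(INVALID));
  }
}
```

Validity errors arrive through the error() callback and parsing continues, whereas well-formedness errors are fatal and end the parse with an exception.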

A More Complex Example

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/css" href="song.css"?>
<!DOCTYPE SONG SYSTEM "expanded_song.dtd">
<SONG xmlns="http://metalab.unc.edu/xml/namespace/song"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO 
    xlink:type="simple" xlink:show="embed" xlink:href="hotcop.jpg"
    ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <!-- The publisher is actually Polygram but I needed 
       an example of a general entity reference. -->
  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
<!-- You can tell what album I was 
     listening to when I wrote this example -->

The XML Declaration

<?xml version="1.0" encoding="UTF-8" standalone="no"?>


Attributes

<PHOTO xlink:type="simple" xlink:show="embed" xlink:href="hotcop.jpg" ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200" />


Empty Element Tags

<PHOTO xlink:type="simple" xlink:show="embed" xlink:href="hotcop.jpg" ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200" />


Comments

<!-- You can tell what album I was listening to when I wrote this example -->


Namespaces

<SONG xmlns="http://metalab.unc.edu/xml/namespace/song"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO 
    xlink:type="simple" xlink:show="embed" xlink:href="hotcop.jpg"
    ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <ARTIST>Village People</ARTIST>
</SONG>
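
To see how a parser reports these namespaces, here is a small sketch using the DOM parser bundled with the JDK. The class name NamespaceSketch and the shortened copy of the song document are assumptions for illustration. Note that JAXP parsers are not namespace-aware unless asked to be.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NamespaceSketch {

  static final String SONG_NS = "http://metalab.unc.edu/xml/namespace/song";
  static final String XLINK_NS = "http://www.w3.org/1999/xlink";

  // Shortened copy of the song document (assumption, for illustration only)
  static final String DOC =
      "<SONG xmlns='" + SONG_NS + "' xmlns:xlink='" + XLINK_NS + "'>"
    + "<TITLE>Hot Cop</TITLE>"
    + "<PHOTO xlink:type='simple' xlink:href='hotcop.jpg'"
    + " ALT='Victor Willis in Cop Outfit'/>"
    + "</SONG>";

  static Document parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true); // off by default in JAXP
      return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("could not parse document", e);
    }
  }

  // Looks the PHOTO element up by namespace URI and local name, not by prefix
  static Element photo(Document doc) {
    return (Element) doc.getElementsByTagNameNS(SONG_NS, "PHOTO").item(0);
  }

  public static void main(String[] args) {
    Element photo = photo(parse(DOC));
    // The default namespace declared on SONG applies to its unprefixed children
    System.out.println("PHOTO namespace: " + photo.getNamespaceURI());
    // Prefixed attributes are in the namespace bound to their prefix
    System.out.println("xlink:href:      " + photo.getAttributeNS(XLINK_NS, "href"));
    // Unprefixed attributes are in no namespace, even under a default namespace
    System.out.println("ALT namespace:   "
        + photo.getAttributeNode("ALT").getNamespaceURI());
  }
}
```

The prefix itself carries no meaning: replacing xlink with any other prefix bound to the same URI would produce the same results.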

Entity References

A &amp; M Records


A More Complex DTD

<!ELEMENT SONG (TITLE, PHOTO?, COMPOSER+, PRODUCER*, 
 PUBLISHER*, LENGTH?, YEAR?, ARTIST+)>
<!ATTLIST SONG xmlns       CDATA #REQUIRED
               xmlns:xlink CDATA #REQUIRED>
<!ELEMENT TITLE (#PCDATA)>

<!ELEMENT PHOTO EMPTY>
<!ATTLIST PHOTO xlink:type CDATA #FIXED "simple"
                xlink:href CDATA #REQUIRED
                xlink:show CDATA #IMPLIED
                ALT        CDATA #REQUIRED
                WIDTH      CDATA #REQUIRED
                HEIGHT     CDATA #REQUIRED
>

<!ELEMENT COMPOSER (#PCDATA)>
<!ELEMENT PRODUCER (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ATTLIST PUBLISHER xlink:type CDATA #IMPLIED
                    xlink:href CDATA #IMPLIED
>

<!ELEMENT LENGTH (#PCDATA)>
<!-- This should be a four digit year like "1999",
     not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>

<!ELEMENT ARTIST (#PCDATA)>

What is XML used for?


Domain-Specific Markup Languages


Self-Describing Data


An XML Fragment

<PERSON ID="p1100" SEX="M">
  <NAME>
    <GIVEN>Judson</GIVEN>
    <SURNAME>McDaniel</SURNAME>
  </NAME>
  <BIRTH>
    <DATE>21 Feb 1834</DATE>
  </BIRTH>
  <DEATH>
    <DATE>9 Dec 1905</DATE>
  </DEATH>
</PERSON>

Interchange of Data Among Applications


XML Applications


Example XML Applications


Mathematical Markup Language

<?xml version="1.0"?>
<html xmlns="http://www.w3.org/TR/REC-html40"
      xmlns:m="http://www.w3.org/TR/REC-MathML/"
>
<head>
<title>Fiat Lux</title>
<meta name="GENERATOR" content="amaya V1.3b" />
</head>
<body>

<P>
And God said,
</P>

<math>
  <m:mrow>
    <m:msub>
      <m:mi>&delta;</m:mi>
      <m:mi>&alpha;</m:mi>
    </m:msub>
    <m:msup>
      <m:mi>F</m:mi>
      <m:mi>&alpha;&beta;</m:mi>
    </m:msup>
    <m:mi></m:mi>
    <m:mo>=</m:mo>
    <m:mi></m:mi>
    <m:mfrac>
      <m:mrow>
        <m:mn>4</m:mn>
        <m:mi>&pi;</m:mi>
      </m:mrow>
      <m:mi>c</m:mi>
    </m:mfrac>
    <m:mi></m:mi>
    <m:msup>
      <m:mi>J</m:mi>
      <m:mrow>
        <m:mi>&beta;</m:mi>
        <m:mo></m:mo>
      </m:mrow>
    </m:msup>
  </m:mrow>
</math>

<P>
and there was light
</P>
</body>
</html>

Channel Definition Format

<?xml version="1.0"?>
<CHANNEL HREF="http://metalab.unc.edu/xml/index.html">
  <TITLE>Cafe con Leche</TITLE>
  <ITEM HREF="http://metalab.unc.edu/xml/books.html">
    <TITLE>Books about XML</TITLE>
  </ITEM>
  <ITEM HREF="http://metalab.unc.edu/xml/tradeshows.html">
    <TITLE>Trade shows and conferences about XML</TITLE>
  </ITEM>
  <ITEM HREF="http://metalab.unc.edu/xml/lists.htm">
    <TITLE>Mailing Lists dedicated to XML</TITLE>
  </ITEM>
</CHANNEL>

Classic Literature


Vector Graphics

A VML document

The Resource Description Framework (RDF)


An Example of RDF

<rdf:RDF 
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/DC/">
  <rdf:Description about="http://metalab.unc.edu/xml/">
    <dc:CREATOR>Elliotte Rusty Harold</dc:CREATOR>
    <dc:TITLE>Cafe con Leche</dc:TITLE>
  </rdf:Description>
</rdf:RDF>

XML for XML


XSL: The Extensible Stylesheet Language


W3C XML Schemas

<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">
 
  <xsd:element name="SONG" type="SongType"/>

  <xsd:complexType name="SongType">
  
    <xsd:element name="TITLE"     type="xsd:string" minOccurs="1" maxOccurs="1"/>
    <xsd:element name="COMPOSER"  type="xsd:string" minOccurs="1" maxOccurs="unbounded"/>
    <xsd:element name="PRODUCER"  type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="PUBLISHER" type="xsd:string" minOccurs="0" maxOccurs="1"/>
  
    <xsd:element name="LENGTH" type="xsd:timeDuration" minOccurs="0" maxOccurs="1"/>
    <xsd:element name="YEAR"   type="xsd:year" minOccurs="1" maxOccurs="1"/>

    <xsd:element name="ARTIST" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/>
    
  </xsd:complexType>

</xsd:schema>

XLinks: The Extensible Linking Language

<footnote xlink:type="simple" xlink:href="footnote7.xml">7</footnote>

Non-XML for XML


File Formats, in-house applications, and other behind the scenes uses


A larger XML example: Music Catalog


Sample Catalog

http://metalab.unc.edu/nywc/

Organizing the Data


What is the Root Element?


The Root Element

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
  Everything else will go here...
</catalog>

The Root Element Style Sheet

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
      <head><title></title></head>
      <body>
        <xsl:apply-templates/>    
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>

The Root Element Style Sheet Output

<html>
   <head>
      <meta http-equiv="Content-Type" content="application/xml; charset=utf-8">
   
      <title></title>
   </head>
   <body>
      Everything else will go here...
      
   </body>
</html>

What are the Immediate Children of the Root?


Child Elements

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>

  <category>
    Small chamber ensembles - 2-4 Players by New York Women Composers
  </category>

</catalog>

Style Sheet

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
        <xsl:apply-templates select="catalog"/>    
    </html>
  </xsl:template>

  <xsl:template match="catalog">
      <head><title><xsl:value-of select="category"/></title></head>
      <body>
        <h1><xsl:value-of select="category"/></h1>  
      </body>
  </xsl:template>

</xsl:stylesheet>

Style Sheet Output

<html>
   <head>
      <meta http-equiv="Content-Type" content="application/xml; charset=utf-8">
   
      <title>
         Small chamber ensembles - 2-4 Players by New York Women Composers
         
      </title>
   </head>
   <body>
      <h1>
         Small chamber ensembles - 2-4 Players by New York Women Composers
         
      </h1>
   </body>
</html>

White space in XML is not especially significant

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog><category>Small chamber ensembles 
- 2-4 Players by New York Women Composers</category></catalog>
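
A parser still passes this white space on to the application; it is the application that decides whether the difference matters. As a rough illustration, the sketch below reads the category from the indented and the compact versions of the catalog and compares them before and after XPath's normalize-space() function is applied. The class name WhitespaceSketch is an assumption, and the XML declarations are omitted from the inline copies for brevity.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class WhitespaceSketch {

  // The indented version of the catalog
  static final String INDENTED =
      "<catalog>\n\n"
    + "  <category>\n"
    + "    Small chamber ensembles - 2-4 Players by New York Women Composers\n"
    + "  </category>\n\n"
    + "</catalog>";

  // The compact version with a line break in the middle of the text
  static final String COMPACT =
      "<catalog><category>Small chamber ensembles \n"
    + "- 2-4 Players by New York Women Composers</category></catalog>";

  // Returns the category text, optionally with white space normalized
  static String category(String xml, boolean normalize) {
    String expression = normalize
        ? "normalize-space(/catalog/category)"
        : "string(/catalog/category)";
    XPath xpath = XPathFactory.newInstance().newXPath();
    try {
      return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    } catch (XPathExpressionException e) {
      throw new IllegalArgumentException("could not read category", e);
    }
  }

  public static void main(String[] args) {
    System.out.println("raw text equal:        "
        + category(INDENTED, false).equals(category(COMPACT, false)));
    System.out.println("normalized text equal: "
        + category(INDENTED, true).equals(category(COMPACT, true)));
  }
}
```

The raw strings differ, but once leading, trailing, and repeated white space is collapsed the two documents carry the same category text.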

Composers

Each composer has a name

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>

  <category>
    Small chamber ensembles - 2-4 Players by New York Women Composers
  </category>

  <composer>
    <name>Julie Mandel</name>
  </composer>

  <composer>
    <name>Margaret De Wys</name>
  </composer>  
    
  <composer>
    <name>Beth Anderson</name>
  </composer>
    
  <composer>
    <name>Linda Bouchard</name>
  </composer>

</catalog>

Style Sheet

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
        <xsl:apply-templates select="catalog"/>    
    </html>
  </xsl:template>

  <xsl:template match="catalog">
      <head><title><xsl:value-of select="category"/></title></head>
      <body>
        <h1><xsl:value-of select="category"/></h1> 
        <xsl:apply-templates select="composer"/> 
      </body>
  </xsl:template>

  <xsl:template match="composer">
      <h2><xsl:value-of select="."/></h2>
  </xsl:template>

</xsl:stylesheet>

Style Sheet Output

<html>
   <head>
      <meta http-equiv="Content-Type" content="application/xml; charset=utf-8">
   
      <title>
         Small chamber ensembles - 2-4 Players by New York Women Composers
         
      </title>
   </head>
   <body>
      <h1>
         Small chamber ensembles - 2-4 Players by New York Women Composers
         
      </h1>
      <h2>
         Julie Mandel
         
      </h2>
      <h2>
         Margaret De Wys
         
      </h2>
      <h2>
         Beth Anderson
         
      </h2>
      <h2>
         Linda Bouchard
         
      </h2>
   </body>
</html>

Grandchildren

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>

  <category>
    Small chamber ensembles - 2-4 Players by New York Women Composers
  </category>

  <composer>
    <name>
      <first_name>Julie</first_name> 
      <middle_name></middle_name> 
      <last_name>Mandel</last_name>
    </name>
  </composer>

  <composer>
    <name>
      <first_name>Margaret</first_name> 
      <middle_name>De</middle_name> 
      <last_name>Wys</last_name>
    </name>
  </composer>  
    
  <composer>
    <name>
      <first_name>Beth</first_name> 
      <middle_name></middle_name> 
      <last_name>Anderson</last_name>
    </name>
  </composer>
    
  <composer>
    <name>
      <first_name>Linda</first_name> 
      <middle_name></middle_name> 
      <last_name>Bouchard</last_name>
    </name>
  </composer>

</catalog>

The Same Style Sheet Still Works

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
        <xsl:apply-templates select="catalog"/>    
    </html>
  </xsl:template>

  <xsl:template match="catalog">
      <head><title><xsl:value-of select="category"/></title></head>
      <body>
        <h1><xsl:value-of select="category"/></h1> 
        <xsl:apply-templates select="composer"/> 
      </body>
  </xsl:template>

  <xsl:template match="composer">
      <h2><xsl:value-of select="."/></h2>
  </xsl:template>

</xsl:stylesheet>

Style Sheet Output

<html>
   <head>
      <meta http-equiv="Content-Type" content="application/xml; charset=utf-8">
   
      <title>
         Small chamber ensembles - 2-4 Players by New York Women Composers
         
      </title>
   </head>
   <body>
      <h1>
         Small chamber ensembles - 2-4 Players by New York Women Composers
         
      </h1>
      <h2>
         
         Julie 
          
         Mandel
         
         
      </h2>
      <h2>
         
         Margaret 
         De 
         Wys
         
         
      </h2>
      <h2>
         
         Beth 
          
         Anderson
         
         
      </h2>
      <h2>
         
         Linda 
          
         Bouchard
         
         
      </h2>
   </body>
</html>

Attributes

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>

  <category>
    Small chamber ensembles - 2-4 Players by New York Women Composers
  </category>

  <composer id="c1">
    <name>
      <first_name>Julie</first_name> 
      <middle_name></middle_name> 
      <last_name>Mandel</last_name>
    </name>
  </composer>

  <composer id="c2">
    <name>
      <first_name>Margaret</first_name> 
      <middle_name>De</middle_name> 
      <last_name>Wys</last_name>
    </name>
  </composer>  
    
  <composer id="c3">
    <name>
      <first_name>Beth</first_name> 
      <middle_name></middle_name> 
      <last_name>Anderson</last_name>
    </name>
  </composer>
    
  <composer id="c4">
    <name>
      <first_name>Linda</first_name> 
      <middle_name></middle_name> 
      <last_name>Bouchard</last_name>
    </name>
  </composer>

</catalog>

Style Sheet with Attributes

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
       <xsl:apply-templates select="catalog"/>    
    </html>
  </xsl:template>

  <xsl:template match="catalog">
      <head><title><xsl:value-of select="category"/></title></head>
      <body>
        <h1><xsl:value-of select="category"/></h1> 
        <xsl:apply-templates select="composer"/> 
      </body>
  </xsl:template>

  <xsl:template match="composer">
      <h2 id="{@id}"><xsl:value-of select="."/></h2>
  </xsl:template>

</xsl:stylesheet>

Style Sheet Output

<html>
   <head>
      <meta http-equiv="Content-Type" content="application/xml; charset=utf-8">
   
      <title>
         Small chamber ensembles - 2-4 Players by New York Women Composers
         
      </title>
   </head>
   <body>
      <h1>
         Small chamber ensembles - 2-4 Players by New York Women Composers
         
      </h1>
      <h2 id="c1">
         
         Julie 
          
         Mandel
         
         
      </h2>
      <h2 id="c2">
         
         Margaret 
         De 
         Wys
         
         
      </h2>
      <h2 id="c3">
         
         Beth 
          
         Anderson
         
         
      </h2>
      <h2 id="c4">
         
         Linda 
          
         Bouchard
         
         
      </h2>
   </body>
</html>

Attributes vs. Elements


When not to use attributes


Compositions

Let's look at an example of what we want:

Rendered HTML:

Brass Swale (1988) 5", tbn, 2 Bfl tpts, bar. hn

Tonal. Commissioned/Premiered by the Redlands' New Music Ensemble. (A swale is a meadow or a marsh where a lot of wild plants grow together. The composer discovered the word when a horse named Swale won the Kentucky Derby several years ago. Since her work is primarily collage of newly composed musical swatches, she has used the name extensively.) ACA - American Composers Alliance


Composition Example in HTML

Or in HTML:

<dt><cite>Brass Swale</cite> (1988) 5", tbn, 2 Bfl tpts, bar. hn</dt>
<dd><p>
Tonal. Commissioned/Premiered by the Redlands' New Music 
Ensemble. (A swale is a meadow or a marsh where a lot of 
wild plants grow together. The composer discovered the word 
when a horse named Swale won the Kentucky Derby several 
years ago. Since her work is primarily collage of newly 
composed musical swatches, she has used the name 
extensively.)  ACA - American Composers 
Alliance</p>
</dd>

Each composition has a


Composition Example in XML

  <composition>
    <title>Brass Swale</title>
    <date>1988</date> 
    <length>5"</length>
    <instruments>tbn, 2 Bfl tpts, bar. hn</instruments>
    <description>
	  Tonal. Commissioned/Premiered by the Redlands' New Music
	  Ensemble. (A swale is a meadow or a marsh where a lot of
	  wild plants grow together. The composer discovered the word
	  when a horse named Swale won the Kentucky Derby several
	  years ago. Since her work is primarily collage of newly
	  composed musical swatches, she has used the name
	  extensively.)
    </description>
    <publisher>ACA - American Composers Alliance</publisher>
  </composition>

Style Rule for Compositions

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
       <xsl:apply-templates select="catalog"/>    
    </html>
  </xsl:template>

  <xsl:template match="catalog">
      <head><title><xsl:value-of select="category"/></title></head>
      <body>
        <h1><xsl:value-of select="category"/></h1> 
        <xsl:apply-templates select="composer"/> 
        <dl>
          <xsl:apply-templates select="composition"/> 
        </dl>
      </body>
  </xsl:template>

  <xsl:template match="composer">
    <h2 id="{@id}"><xsl:value-of select="."/></h2>
  </xsl:template>

  <xsl:template match="composition">
    <dt><cite><xsl:value-of select="title"/></cite> 
        (<xsl:value-of select="date"/>)
        <xsl:value-of select="length"/>
        <xsl:value-of select="instruments"/>
    </dt>
    <dd>
      <xsl:value-of select="description"/>
      <xsl:value-of select="publisher"/>
    </dd>
  </xsl:template>

</xsl:stylesheet>

Style Sheet Output

<html>
   <head>
      <meta http-equiv="Content-Type" content="application/xml; charset=utf-8">
   
      <title>
         Small chamber ensembles - 2-4 Players by New York Women Composers
         
      </title>
   </head>
   <body>
      <h1>
         Small chamber ensembles - 2-4 Players by New York Women Composers
         
      </h1>
      <h2 id="c1">
         
         Julie 
          
         Mandel
         
         
      </h2>
      <h2 id="c2">
         
         Margaret 
         De 
         Wys
         
         
      </h2>
      <h2 id="c3">
         
         Beth 
          
         Anderson
         
         
      </h2>
      <h2 id="c4">
         
         Linda 
          
         Bouchard
         
         
      </h2>
      <dl>
         <dt><cite>Trio for Flute, Viola and Harp</cite> 
            (1994)
            13'38"fl, hp, vla
         </dt>
         <dd>
            Premiered at Queens College in April, 1996 by Sue Ann Kahn, 
            Christine Ims, and Susan Jolles. In 3 movements : mvt. 1: 5:01
            mvt. 2: 4:11
            mvt. 3: 4:26
            Theodore Presser
         </dd>
         <dt><cite>Charmonium</cite> 
            (1991)
            9'2 vln, vla, vc
         </dt>
         <dd>
            Commissioned as quartet for the Meridian String Quartet. 
            Sonorous, bold. Moderate difficulty. Tape available.
            
         </dd>
         <dt><cite>Invention for Flute and Piano</cite> 
            (1994)
            fl, pn
         </dt>
         <dd>3 movements</dd>
         <dt><cite>Little Trio</cite> 
            (1984)
            4'fl, guit, va
         </dt>
         <dd>ACA</dd>
         <dt><cite>Dr. Blood's Mermaid Lullaby</cite> 
            (1980)
            3'fl or ob, or vn, or vc, pn
         </dt>
         <dd>ACA</dd>
         <dt><cite>Trio: Dream in D</cite> 
            (1980)
            10'fl, pn, vc, or vn, pn, vc
         </dt>
         <dd>
            Rhapsodic. Passionate. Available on CD
            Two by Three from North/South Consonance (1998).
            
         </dd>
         <dt><cite>Propos II</cite> 
            (1985)
            11'2 tpt
         </dt>
         <dd>Arrangement from Propos</dd>
         <dt><cite>Rictus En Mirroir</cite> 
            (1985)
            14'fl, ob, hpschd, vc
         </dt>
         <dd></dd>
      </dl>
   </body>
</html>

Attaching the Composer to the Composition

  <composition composer="c3">
    <title>Trio: Dream in D</title>
    <date><year>1980</year></date> 
    <length>10'</length>
    <instruments>fl, pn, vc, or vn, pn, vc</instruments>
    <description>
      Rhapsodic. Passionate. Available on CD 
      <cite><a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr%3D1-2/">
       Two by Three</a></cite> from North/South Consonance (1998).
    </description> 
    <publisher></publisher>
  </composition>

Arranging Compositions by Composer

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
       <xsl:apply-templates select="catalog"/>    
    </html>
  </xsl:template>

  <xsl:template match="catalog">
      <head><title><xsl:value-of select="category"/></title></head>
      <body>
        <h1><xsl:value-of select="category"/></h1> 
        <xsl:apply-templates select="composer"/> 

      </body>
  </xsl:template>

  <xsl:template match="composer">
    <h2 id="{@id}"><xsl:value-of select="."/></h2>
    <dl>
      <xsl:apply-templates select="../composition[@composer=current()/@id]"/> 
    </dl>
  </xsl:template>

  <xsl:template match="composition">
    <dt><cite><xsl:value-of select="title"/></cite> 
        (<xsl:value-of select="date"/>)
        <xsl:value-of select="length"/>
        <xsl:value-of select="instruments"/>
    </dt>
    <dd>
      <xsl:value-of select="description"/>
      <xsl:value-of select="publisher"/>
    </dd>
  </xsl:template>

</xsl:stylesheet>
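
The key step in this style sheet is the predicate [@composer=current()/@id], which selects only the compositions whose composer attribute matches the id of the composer being processed. The same lookup can be sketched in Java with the JDK's XPath API. Because current() is an XSLT function and is not available in plain XPath, this sketch passes the id in through an XPath variable instead. The class name GroupingSketch, the variable name $id, and the cut-down catalog are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class GroupingSketch {

  // Cut-down catalog (assumption): two composers, four compositions
  static final String CATALOG =
      "<catalog>"
    + "<composer id='c1'><name>Julie Mandel</name></composer>"
    + "<composer id='c3'><name>Beth Anderson</name></composer>"
    + "<composition composer='c1'><title>Trio for Flute, Viola and Harp</title></composition>"
    + "<composition composer='c3'><title>Little Trio</title></composition>"
    + "<composition composer='c1'><title>Invention for Flute and Piano</title></composition>"
    + "<composition composer='c3'><title>Trio: Dream in D</title></composition>"
    + "</catalog>";

  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("could not parse catalog", e);
    }
  }

  // Titles of the compositions whose composer attribute equals composerId
  static List<String> titlesFor(Document catalog, String composerId) {
    XPath xpath = XPathFactory.newInstance().newXPath();
    // $id stands in for current()/@id in the style sheet
    xpath.setXPathVariableResolver(
        name -> "id".equals(name.getLocalPart()) ? composerId : null);
    try {
      NodeList nodes = (NodeList) xpath.evaluate(
          "/catalog/composition[@composer = $id]/title",
          catalog, XPathConstants.NODESET);
      List<String> titles = new ArrayList<>();
      for (int i = 0; i < nodes.getLength(); i++) {
        titles.add(nodes.item(i).getTextContent());
      }
      return titles;
    } catch (XPathExpressionException e) {
      throw new IllegalStateException("XPath evaluation failed", e);
    }
  }

  public static void main(String[] args) {
    Document catalog = parse(CATALOG);
    for (String id : new String[] {"c1", "c3"}) {
      System.out.println(id + ": " + titlesFor(catalog, id));
    }
  }
}
```

Keeping composers and compositions as siblings linked by an id keeps the data flat, at the cost of a lookup like this one whenever the two need to be presented together.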

Style Sheet Output

<html>
   <head>
      <meta http-equiv="Content-Type" content="application/xml; charset=utf-8">
   
      <title>
         Small chamber ensembles - 2-4 Players by New York Women Composers
         
      </title>
   </head>
   <body>
      <h1>
         Small chamber ensembles - 2-4 Players by New York Women Composers
         
      </h1>
      <h2 id="c1">
         
         Julie 
          
         Mandel
         
         
      </h2>
      <dl>
         <dt><cite>Trio for Flute, Viola and Harp</cite> 
            (1994)
            13'38"fl, hp, vla
         </dt>
         <dd>
            Premiered at Queens College in April, 1996 by Sue Ann Kahn, 
            Christine Ims, and Susan Jolles. In 3 movements :
            
            mvt. 1: 5:01
            mvt. 2: 4:11
            mvt. 3: 4:26
            
            Theodore Presser
         </dd>
         <dt><cite>Invention for Flute and Piano</cite> 
            (1994)
            fl, pn
         </dt>
         <dd>3 movements</dd>
      </dl>
      <h2 id="c2">
         
         Margaret 
         De 
         Wys
         
         
      </h2>
      <dl>
         <dt><cite>Charmonium</cite> 
            (1991)
            9'2 vln, vla, vc
         </dt>
         <dd>
            Commissioned as quartet for the Meridian String Quartet. 
            Sonorous, bold. Moderate difficulty. Tape available.
            
         </dd>
      </dl>
      <h2 id="c3">
         
         Beth 
          
         Anderson
         
         
      </h2>
      <dl>
         <dt><cite>Little Trio</cite> 
            (1984)
            4'fl, guit, va
         </dt>
         <dd>ACA</dd>
         <dt><cite>Dr. Blood's Mermaid Lullaby</cite> 
            (1980)
            3'fl or ob, or vn, or vc, pn
         </dt>
         <dd>ACA</dd>
         <dt><cite>Trio: Dream in D</cite> 
            (1980)
            10'fl, pn, vc, or vn, pn, vc
         </dt>
         <dd>
            Rhapsodic. Passionate. Available on CD 
            Two by Three 
            from North/South Consonance (1998).
            
         </dd>
      </dl>
      <h2 id="c4">
         
         Linda 
          
         Bouchard
         
         
      </h2>
      <dl>
         <dt><cite>Propos II</cite> 
            (1985)
            11'2 tpt
         </dt>
         <dd>Arrangement from Propos</dd>
         <dt><cite>Rictus En Mirroir</cite> 
            (1985)
            14'fl, ob, hpschd, vc
         </dt>
         <dd></dd>
      </dl>
   </body>
</html>

Some Keywords For the Search Engines

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>

  <category>
    Small chamber ensembles - 2-4 Players by New York Women Composers
  </category>

  <cataloging_info>
    <abstract>Compositions by the members of New York Women Composers</abstract>
    <keyword>music publishing</keyword>
    <keyword>scores</keyword>
    <keyword>women composers</keyword>
    <keyword>New York</keyword>
  </cataloging_info>

  <composer id="c1">
    <name>
      <first_name>Julie</first_name> 
      <middle_name></middle_name> 
      <last_name>Mandel</last_name>
    </name>
  </composer>
  
  ...
  
</catalog>

Standard Signature

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
...
  <last_updated>July 28, 1999</last_updated>
  <copyright>1999 New York Women Composers</copyright>
  <maintainer email="elharo@metalab.unc.edu" 
              url="http://www.macfaq.com/personal.html">
    <name>
      <first_name>Elliotte</first_name> 
      <middle_name>Rusty</middle_name> 
      <last_name>Harold</last_name>
    </name>
  </maintainer>

</catalog>

Cascading Style Sheets


A Blank Style Sheet

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/css" href="compositions1.css"?>
<catalog>
...
</catalog>

The Default Rule

catalog { font-family: New York, Times New Roman, serif; 
       font-size: 14pt; 
       background-color: white; 
       color: black; 
       display: block }


A style rule for the category element

category { display: block; 
       font-family: Helvetica, Arial, sans-serif;
       font-size: 32pt; 
       font-weight: bold; 
       text-align: center}
       
catalog { font-family: New York, Times New Roman, serif; 
       font-size: 14pt; 
       background-color: white; 
       color: black; 
       display: block }


A style rule for the composer element

composer { display: block; 
       font-family: Helvetica, Arial, sans-serif;
       font-size: 24pt; 
       font-weight: bold; 
       text-align: left}     


A style rule for the title element

composition title { display: block; 
       font-family: Helvetica, Arial, sans-serif;
       font-size: 18pt; 
       font-weight: bold; 
       text-align: left}                


A style rule for the catalog info

/* cataloging_info is only for search engines */
cataloging_info { display: none;
       color: white}                

display: none requires CSS2:



Style rules for the signature

  <last_updated>July 28, 1999</last_updated>
  <copyright>1999 New York Women Composers</copyright>
  <maintainer email="elharo@metalab.unc.edu" 
              url="http://www.macfaq.com/personal.html">
    <name>
      <first_name>Elliotte</first_name> 
      <middle_name>Rusty</middle_name> 
      <last_name>Harold</last_name>
    </name>
  </maintainer>            

last_updated, copyright, maintainer {display: block;
       font-size: small}
       
copyright:before {content: "Copyright " }

last_updated:before {content: "Last Modified " }

last_updated {margin-top: 2ex }

Again, some of this requires CSS2



Style Rules for composition children

composition * {display:list-item}
       
description {display: block}


Finished Style Sheet

category { display: block; 
       font-family: Helvetica, Arial, sans-serif;
       font-size: 32pt; 
       font-weight: bold; 
       text-align: center}
       
catalog { font-family: New York, Times New Roman, serif; 
       font-size: 14pt; 
       background-color: white; 
       color: black; 
       display: block }
      
composer { display: block; 
       font-family: Helvetica, Arial, sans-serif;
       font-size: 24pt; 
       font-weight: bold; 
       text-align: left}  
       
composition title { display: block; 
       font-family: Helvetica, Arial, sans-serif;
       font-size: 18pt; 
       font-weight: bold; 
       text-align: left}
       
composition * {display:list-item}
       
description {display: block}
              
/* cataloging_info is only for search engines */
cataloging_info { display: none;
       color: #FFFFFF}
       
last_updated, copyright, maintainer {display: block;
       font-size: small}
       
copyright:before {content: "Copyright " }

last_updated:before {content: "Last Modified " }

last_updated {margin-top: 2ex }

Possible Extensions


Possible Solutions


CSS or XSL?


Programming with XML


Several APIs to choose from


SAX


SAX2


The SAX Process


Making an XMLReader


Parsing a Document with XMLReader

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;


public class SAX2Checker {

  public static void main(String[] args) {
    
    if (args.length == 0) {
      System.out.println("Usage: java SAX2Checker URL1 URL2..."); 
      return; // nothing to check
    } 
    
    // set up the parser 
    XMLReader parser;
    try {
      parser = XMLReaderFactory.createXMLReader();
    } 
    catch (SAXException e) {
      try {
        parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
      }
      catch (SAXException e2) {
        System.err.println("Error: could not locate a parser.");
        return;
      }
    }
     
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        parser.parse(args[i]);
        // If there are no well-formedness errors
        // then no exception is thrown
        System.out.println(args[i] + " is well formed.");
      }
      catch (SAXParseException e) { // well-formedness error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage()
         + " at line " + e.getLineNumber() 
         + ", column " + e.getColumnNumber());
      }
      catch (SAXException e) { // some other kind of error
        System.out.println(e.getMessage());
      }
      catch (IOException e) {
        System.out.println("Could not check " + args[i] 
         + " because of the IOException " + e);
      }
      
    }  
  
  }

}
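
SAX2Checker falls back to a hard-coded Xerces class name when no default parser can be found. As a rough alternative sketch, the same well-formedness check can be run through the JAXP SAXParserFactory that ships with the JDK, here against an in-memory string rather than a URL. The class name StringChecker and the method isWellFormed are hypothetical names chosen for this illustration; they do not come from the slides.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class for illustration only
class StringChecker {

  // Returns true if the text parses without a well-formedness error
  static boolean isWellFormed(String xml) {
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      // DefaultHandler ignores content events and rethrows fatal errors
      factory.newSAXParser().parse(
          new InputSource(new StringReader(xml)), new DefaultHandler());
      return true;
    }
    catch (SAXException e) { // well-formedness or other parse error
      return false;
    }
    catch (ParserConfigurationException | IOException e) {
      // no usable parser, or the in-memory reader failed
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(isWellFormed("<SONG><TITLE>Hot Cop</TITLE></SONG>"));
    System.out.println(isWellFormed("<SONG><TITLE>Hot Cop</SONG>"));
  }

}
```

The first call should report true and the second false, because the TITLE element in the second string is never closed.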

The ContentHandler interface

package org.xml.sax;


public interface ContentHandler {

    public void setDocumentLocator(Locator locator);
    
    public void startDocument() throws SAXException;
    
    public void endDocument() throws SAXException;
    
    public void startPrefixMapping(String prefix, String uri) 
     throws SAXException;

    public void endPrefixMapping(String prefix) throws SAXException;

    public void startElement(String namespaceURI, String localName,
     String rawName, Attributes atts) throws SAXException;

    public void endElement(String namespaceURI, String localName,
     String rawName) throws SAXException;

    public void characters(char[] ch, int start, int length) 
     throws SAXException;

    public void ignorableWhitespace(char[] ch, int start, int length)
     throws SAXException;

    public void processingInstruction(String target, String data)
     throws SAXException;

    public void skippedEntity(String name) throws SAXException;
     
}
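
Implementing ContentHandler directly means writing all eleven methods, even the ones a program does not care about. SAX2 also provides org.xml.sax.helpers.DefaultHandler, which implements every callback as a no-op so a subclass can override only what it needs. The following is a minimal sketch under that approach; the class name ElementCounter and its count method are hypothetical, and the parser is obtained from the JDK's JAXP factory rather than from Xerces directly.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; only two callbacks are overridden
class ElementCounter extends DefaultHandler {

  private int numElements;

  @Override
  public void startDocument() {
    numElements = 0; // reset so the handler can be reused
  }

  @Override
  public void startElement(String namespaceURI, String localName,
   String qName, Attributes atts) {
    numElements++; // one start tag (or empty-element tag) per element
  }

  // Parses the text and returns the number of elements it contains
  static int count(String xml) {
    ElementCounter counter = new ElementCounter();
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      factory.newSAXParser().parse(
          new InputSource(new StringReader(xml)), counter);
    }
    catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException(e);
    }
    return counter.numElements;
  }

  public static void main(String[] args) {
    System.out.println(
        count("<SONG><TITLE>Hot Cop</TITLE><YEAR>1978</YEAR></SONG>")
        + " elements");
  }

}
```

Run as written, this prints "3 elements" (SONG, TITLE, and YEAR).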

SAX Example

import org.apache.xerces.parsers.*;
import org.xml.sax.*;
import java.io.IOException;
import java.util.StringTokenizer;


public class SAXWordCount implements ContentHandler {

  private int numWords;
    
  public void startDocument() throws SAXException {
    this.numWords = 0; 
  }

  public void endDocument() throws SAXException {
    System.out.println(numWords + " words");
    System.out.flush();
  }
  
  private StringBuffer sb = new StringBuffer();
  
  public void characters(char[] text, int start, int length) 
   throws SAXException {
    
    sb.append(text, start, length);
    
  }
  
  private void flush() {
    numWords += countWords(sb.toString());
    sb = new StringBuffer();    
  }
  
  // methods that signify a word break
  public void startElement(String namespaceURI, String localName,
	 String rawName, Attributes atts) throws SAXException {
    this.flush(); 
  }
  
  public void endElement(String namespaceURI, String localName,
	 String rawName) throws SAXException {
    this.flush(); 
  }
  
  public void processingInstruction(String target, String data)
   throws SAXException {
    this.flush(); 
  }

  // methods that aren't necessary in this example
  public void startPrefixMapping(String prefix, String uri) 
   throws SAXException {
    // ignore; 
  }

  public void ignorableWhitespace(char[] text, int start, int length)
   throws SAXException {
    // ignore; 
  }
  
  public void endPrefixMapping(String prefix) throws SAXException {
    // ignore; 
  }

  public void skippedEntity(String name) throws SAXException {
    // ignore; 
  }   
  
  public void setDocumentLocator(Locator locator) {}

  private static int countWords(String s) {
    
    if (s == null) return 0;
    s = s.trim();
    if (s.length() == 0) return 0;
    
    StringTokenizer st = new StringTokenizer(s);
    return st.countTokens();
    
  } 

  public static void main(String[] args) {
     
    SAXParser parser = new SAXParser();
    SAXWordCount counter = new SAXWordCount();
    parser.setContentHandler(counter);
    
    for (int i = 0; i < args.length; i++) {
      try {
        parser.parse(args[i]); 
      }
      catch (SAXException e) {
        System.err.println(e); 
      }
      catch (IOException e) {
        System.err.println(e); 
      }
      
    }
  
  } // end main

}

% java SAXWordCount hotcop.xml
16 words

Event Based API Caveats


Document Object Model


The Design of the DOM API


DOM Evolution


Eight Modules:


DOM Trees


org.w3c.dom


The DOM Process


Parsing documents with a DOM Parser Example

import org.apache.xerces.parsers.DOMParser;
import org.xml.sax.SAXException;
import java.io.IOException;
import org.w3c.dom.*;


public class DOMChecker {

  public static void main(String[] args) {
     
    // This is simpler but less flexible than the SAX approach.
    // Perhaps a good creational design pattern is needed here?   
  
    DOMParser parser = new DOMParser();
    
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]); 
       
        Document d = parser.getDocument();
        // work with the document...
      }
      catch (SAXException e) {
        System.err.println(e); 
      }
      catch (IOException e) {
        System.err.println(e); 
      }
      
    }
  
  }

}
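
The comment in DOMChecker asks whether a creational pattern would help, since the code names the Xerces DOMParser class directly. One possible answer, sketched here as an assumption rather than as the approach the slides take, is the JAXP DocumentBuilderFactory, which locates a DOM parser without the program naming a vendor class. The class name FactoryDOMChecker and the method rootName are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class; no parser vendor is named in the code
class FactoryDOMChecker {

  // Parses the text into a DOM tree and returns the root element's name
  static String rootName(String xml) {
    try {
      // The factory chooses whichever DOM implementation is configured
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      DocumentBuilder builder = factory.newDocumentBuilder();

      // Read the entire document into memory
      Document d = builder.parse(new InputSource(new StringReader(xml)));
      return d.getDocumentElement().getTagName();
    }
    catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(rootName("<SONG><TITLE>Hot Cop</TITLE></SONG>"));
  }

}
```

The rest of a DOM program, such as the recursive traversal in the word-counting example, works on the returned Document the same way regardless of which parser built it.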

DOM Example

import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.IOException;
import java.util.StringTokenizer;


public class DOMWordCount {

  public static void main(String[] args) {
     
    DOMParser parser = new DOMParser();
    
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]); 
       
        Document d = parser.getDocument();
        int numWords = countWordsInNode(d);
        System.out.println(numWords + " words");

      }
      catch (SAXException e) {
        System.err.println(e); 
      }
      catch (IOException e) {
        System.err.println(e); 
      }
      
    }
  
  } // end main

  // note use of recursion
  public static int countWordsInNode(Node node) {
    
    int numWords = 0;
    
    if (node.hasChildNodes()) {
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        numWords += countWordsInNode(children.item(i));
      } 
    }  

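    int type = node.getNodeType();
    // CDATA sections hold text too, so count them like ordinary text nodes
    if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
      String s = node.getNodeValue();
      numWords += countWordsInString(s);
    }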
    
    return numWords;  
    
  }
  
  private static int countWordsInString(String s) {
    
    if (s == null) return 0;
    s = s.trim();
    if (s.length() == 0) return 0;
    
    StringTokenizer st = new StringTokenizer(s);
    return st.countTokens();
    
  } 

}

% java DOMWordCount hotcop.xml
16 words

JDOM


The JDOM Process


Parsing a Document with JDOM

import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;


public class JDOMChecker {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java JDOMChecker URL1 URL2..."); 
      return; // nothing to check
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        builder.build(args[i]);
        // If there are no well-formedness errors, 
        // then no exception is thrown
        System.out.println(args[i] + " is well formed.");
      }
      catch (JDOMException e) { // indicates a well-formedness or other error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

}

Parser Results

% java JDOMChecker shortlogs.xml HelloJDOM.java
shortlogs.xml is well formed.
HelloJDOM.java is not well formed.
The markup in the document preceding the root element must be well-formed.: 
Error on line 1 of XML document: The markup in the document preceding the 
root element must be well-formed.

JDOM Example

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;


public class JDOMWordCount {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java JDOMWordCount URL1 URL2..."); 
      return; // nothing to count
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    // start parsing... 
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        Element root = doc.getRootElement();
        int numWords = countWordsInElement(root);
        System.out.println(numWords + " words");

      }
      catch (JDOMException e) { // indicates a well-formedness or other error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

  public static int countWordsInElement(Element element) {
    
    int numWords = 0;
    
    List children = element.getMixedContent();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof String) {
        numWords += countWordsInString((String) o);
      } 
      else if (o instanceof Element) {
        // note use of recursion
        numWords += countWordsInElement((Element) o); 
      } 
    }
    
    return numWords;  
    
  }

  private static int countWordsInString(String s) {
    
    if (s == null) return 0;
    s = s.trim();
    if (s.length() == 0) return 0;
    
    StringTokenizer st = new StringTokenizer(s);
    return st.countTokens();
    
  }

}

% java JDOMWordCount hotcop.xml
16 words

Questions?


To Learn More



Copyright 2000 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified September 19, 2000