Extensible Markup Language
A syntax for documents
A Meta-Markup Language
A Structural and Semantic language, not a formatting language
Not just for Web pages
Not like HTML, troff, LaTeX
Make up the tags you need as you need them
The tags you create can be documented in a Document Type Definition (DTD)
A meta syntax for domain-specific markup languages like MusicML, MathML, and XHTML
XML documents form a tree
Element and attribute names reflect the kind of the element
Formatting can be added with a style sheet
<dt>Hot Cop <dd> by Jacques Morali, Henri Belolo, and Victor Willis <ul> <li>Producer: Jacques Morali <li>Publisher: PolyGram Records <li>Length: 6:20 <li>Written: 1978 <li>Artist: Village People </ul>
<SONG> <TITLE>Hot Cop</TITLE> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <LENGTH>6:20</LENGTH> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
Plain ASCII or UTF-8 text
.xml is the standard file extension
Any standard text editor will work
SONG {display: block; font-family: New York, Times New Roman, serif} TITLE {display: block; font-size: 24pt; font-weight: bold; font-family: Helvetica, sans-serif} COMPOSER {display: block} PRODUCER {display: block} YEAR {display: block} PUBLISHER {display: block} LENGTH {display: block} ARTIST {display: block; font-style: italic}
<?xml-stylesheet type="text/css" href="song1.css"?>
<?xml-stylesheet type="text/css" href="song.css"?> <SONG> <TITLE>Hot Cop</TITLE> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <LENGTH>6:20</LENGTH> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
Cascading Style Sheets Level 1 (CSS1)
Internet Explorer 5.0
Mozilla 5.0
Cascading Style Sheets Level 2 (CSS2)
Internet Explorer 5 (partial)
Mozilla 5.0 (partial)
Extensible Stylesheet Language (XSL)
Internet Explorer 5.0 (older draft, buggy)
LotusXSL, XT, Xalan, Saxon, other non-browser converters
Document Style and Semantics Language (DSSSL)
Jade
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <html> <head><title>Song</title></head> <body> <xsl:apply-templates select="SONG"/> </body> </html> </xsl:template> <xsl:template match="SONG"> <h1> <xsl:value-of select="TITLE"/> by the <xsl:value-of select="ARTIST"/> </h1> <ul> <li>Length: <xsl:value-of select="LENGTH"/></li> <li>Producer: <xsl:value-of select="PRODUCER"/></li> <li>Publisher: <xsl:value-of select="PUBLISHER"/></li> <li>Year: <xsl:value-of select="YEAR"/></li> <xsl:apply-templates select="COMPOSER"/> </ul> </xsl:template> <xsl:template match="COMPOSER"> <li>Composer: <xsl:value-of select="."/></li> </xsl:template> </xsl:stylesheet>
D:\fundamentals\examples> saxon hotcop.xml song3.xsl
<html> <head> <title>Song</title> </head> <body> <h1>Hot Cop by the Village People</h1> <ul> <li>Length: 6:20</li> <li>Producer: Jacques Morali</li> <li>Publisher: PolyGram Records</li> <li>Year: 1978</li> <li>Composer: Jacques Morali</li> <li>Composer: Henri Belolo</li> <li>Composer: Victor Willis</li> </ul> </body> </html>
Or alternately:
% java com.icl.saxon.StyleSheet -x org.apache.xerces.parsers.SAXParser xml_fundamentals.xml slides.xsl hotcop.xml song3.xsl
<html>
...
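The same kind of transformation can also be driven from Java through the JAXP javax.xml.transform API instead of the saxon command line. The following is a minimal sketch, assuming an XSLT 1.0 processor is available through JAXP (the JDK bundles one). The class name SongTransform, the shortened song document, and the text-only stylesheet are illustrative and are not part of the original examples.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SongTransform {

    // Shortened version of the song document (illustrative)
    public static final String SONG =
        "<SONG><TITLE>Hot Cop</TITLE><ARTIST>Village People</ARTIST></SONG>";

    // Shortened stylesheet that emits plain text rather than HTML (illustrative)
    public static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='SONG'>"
      + "<xsl:value-of select='TITLE'/> by the <xsl:value-of select='ARTIST'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the document and returns the result as a string
    public static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SONG, XSL));
    }
}
```

Reading the stylesheet from a file instead only requires passing a java.io.File to the StreamSource constructor.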
Rules:
Open and close all tags
Empty tags end with />
There is a unique root element
Elements may not overlap
Attribute values are quoted
< and & are only used to start tags and entities
Only the five predefined entity references are used
Plus more...
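Several of the rules above can be checked mechanically, because a conforming parser must reject any document that breaks them. The following is a minimal sketch using the JAXP DocumentBuilder that ships with the JDK. The class name WellFormedCheck and the sample strings are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    // Returns true if the string parses as well-formed XML, false otherwise
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Any well-formedness violation surfaces as a SAXException
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<SONG><TITLE>Hot Cop</TITLE></SONG>",   // follows the rules
            "<SONG><TITLE>Hot Cop</SONG></TITLE>",   // overlapping elements
            "<PHOTO WIDTH=100/>",                    // unquoted attribute value
            "<SONG/><SONG/>",                        // two root elements
            "<H1>O'Reilly & Associates</H1>"         // raw ampersand
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```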
To be valid an XML document must be
Well-formed
Must have a DTD
Must comply with the constraints specified in the DTD
<!ELEMENT SONG (TITLE, COMPOSER+, PRODUCER*, PUBLISHER*, LENGTH?, YEAR?, ARTIST+)> <!ELEMENT TITLE (#PCDATA)> <!ELEMENT COMPOSER (#PCDATA)> <!ELEMENT PRODUCER (#PCDATA)> <!ELEMENT PUBLISHER (#PCDATA)> <!ELEMENT LENGTH (#PCDATA)> <!-- This should be a four digit year like "1999", not a two-digit year like "99" --> <!ELEMENT YEAR (#PCDATA)> <!ELEMENT ARTIST (#PCDATA)>
<!DOCTYPE SONG SYSTEM "song.dtd"> <SONG> <TITLE>Hot Cop</TITLE> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <LENGTH>6:20</LENGTH> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
To check validity you pass the document through a validating parser which should report any errors it finds. For example,
% java dom.DOMCount -v validhotcop.xml
[Error] validhotcop.xml:13:9: The content of element type "SONG" must match "(TITLE,COMPOSER+,PRODUCER*,PUBLISHER*,LENGTH?,YEAR?)".
validhotcop.xml: 550 ms (10 elems, 0 attrs, 28 spaces, 98 chars)
A valid document:
% java dom.DOMCount -v validhotcop.xml
validhotcop.xml: 291 ms (10 elems, 0 attrs, 28 spaces, 98 chars)
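Validation can also be switched on programmatically. The following is a minimal sketch using JAXP with setValidating(true). It assumes the DTD is supplied as an internal subset so that the example is self-contained. The class name DtdValidation and the shortened content model are illustrative and are not the song.dtd shown above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdValidation {

    // Internal DTD subset with a shortened content model (illustrative)
    public static final String DOCTYPE =
        "<!DOCTYPE SONG ["
      + "<!ELEMENT SONG (TITLE, COMPOSER+, ARTIST+)>"
      + "<!ELEMENT TITLE (#PCDATA)>"
      + "<!ELEMENT COMPOSER (#PCDATA)>"
      + "<!ELEMENT ARTIST (#PCDATA)>"
      + "]>";

    // Parses with validation turned on and counts the validity errors reported
    public static int countValidityErrors(String xml) {
        final int[] errors = {0};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity errors are recoverable, so they arrive through error()
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors[0]++;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new RuntimeException(e);
        }
        return errors[0];
    }

    public static void main(String[] args) {
        String valid = DOCTYPE + "<SONG><TITLE>Hot Cop</TITLE>"
            + "<COMPOSER>Jacques Morali</COMPOSER>"
            + "<ARTIST>Village People</ARTIST></SONG>";
        // Same document without the required ARTIST element
        String invalid = DOCTYPE + "<SONG><TITLE>Hot Cop</TITLE>"
            + "<COMPOSER>Jacques Morali</COMPOSER></SONG>";
        System.out.println("valid document errors: " + countValidityErrors(valid));
        System.out.println("invalid document errors: " + countValidityErrors(invalid));
    }
}
```

Note that a document can be well-formed and still invalid: the second sample parses without a fatal error but does not match the content model.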
<?xml version="1.0" encoding="UTF-8" standalone="no"?> <?xml-stylesheet type="text/css" href="song.css"?> <!DOCTYPE SONG SYSTEM "expanded_song.dtd"> <SONG xmlns="http://metalab.unc.edu/xml/namespace/song" xmlns:xlink="http://www.w3.org/1999/xlink"> <TITLE>Hot Cop</TITLE> <PHOTO xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg" ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <!-- The publisher is actually Polygram but I needed an example of a general entity reference. --> <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/"> A &amp; M Records </PUBLISHER> <LENGTH>6:20</LENGTH> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG> <!-- You can tell what album I was listening to when I wrote this example -->
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
version attribute: required; always has the value 1.0
standalone attribute: yes or no
encoding attribute: UTF-8, ISO-8859-1, etc.
<PHOTO
xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200" />
name="value"
same as in HTML
Generally used for meta-information
Attribute values are quoted with either single or double quotes:
Good:
<A HREF="http://metalab.unc.edu/xml/">
<DIV ALIGN='CENTER'>
<EMBED SRC="minnesotaswale.aif" hidden="true">
Bad:
<A HREF=http://metalab.unc.edu/xml/>
<DIV ALIGN=CENTER>
<EMBED SRC=minnesotaswale.aif hidden=true>
<EMBED SRC="minnesotaswale.aif" hidden>
<PHOTO
xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200" />
Ends with /> instead of >
<PHOTO/> is semantically the same as <PHOTO></PHOTO>
Just syntactic sugar
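The equivalence of the two spellings can be observed after parsing. The following is a minimal sketch that parses both forms with the JDK's DOM parser and compares the resulting element nodes. The class name EmptyElementDemo is illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class EmptyElementDemo {

    // Parses a document and returns its root element
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new RuntimeException(e);
        }
    }

    // True if the two documents produce structurally equal root elements
    public static boolean sameElement(String xml1, String xml2) {
        return root(xml1).isEqualNode(root(xml2));
    }

    public static void main(String[] args) {
        // The empty-element tag and the start/end tag pair give the same tree
        System.out.println(sameElement("<PHOTO/>", "<PHOTO></PHOTO>"));
        // White space between the tags is content, so this one differs
        System.out.println(sameElement("<PHOTO/>", "<PHOTO> </PHOTO>"));
    }
}
```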
<!-- You can tell what album I was
listening to when I wrote this example -->
Essentially the same as in HTML
Let you mix and match different XML vocabularies
URIs identify elements and attributes that belong to different XML applications
Prefixes can change as long as the URI stays the same
<SONG xmlns="http://metalab.unc.edu/xml/namespace/song"
xmlns:xlink="http://www.w3.org/1999/xlink">
<TITLE>Hot Cop</TITLE>
<PHOTO
xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
<COMPOSER>Jacques Morali</COMPOSER>
<PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
A &amp; M Records
</PUBLISHER>
<ARTIST>Village People</ARTIST>
</SONG>
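The point that prefixes may change while the URI stays fixed can be demonstrated with a namespace-aware parser. The following is a minimal sketch that looks up the XLink href attribute by namespace URI and local name, so that the prefix chosen in the document does not matter. The class name NamespaceDemo and the xl prefix are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NamespaceDemo {

    public static final String XLINK = "http://www.w3.org/1999/xlink";

    // Returns the value of the XLink href attribute on the root element
    public static String xlinkHref(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);   // off by default in JAXP
            Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            // Look up by URI + local name; the prefix is never mentioned
            return root.getAttributeNS(XLINK, "href");
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String withXlinkPrefix =
            "<PHOTO xmlns:xlink='" + XLINK + "' xlink:href='hotcop.jpg'/>";
        String withOtherPrefix =
            "<PHOTO xmlns:xl='" + XLINK + "' xl:href='hotcop.jpg'/>";
        System.out.println(xlinkHref(withXlinkPrefix));
        System.out.println(xlinkHref(withOtherPrefix));
    }
}
```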
A &amp; M Records
< and & are only used to start tags and entities
Good:
<H1>O'Reilly &amp; Associates</H1>
Bad:
<H1>O'Reilly & Associates</H1>
Good:
<CODE>for (int i = 0; i &lt;= args.length; i++ ) { </CODE>
Bad:
<CODE>for (int i = 0; i <= args.length; i++ ) { </CODE>
Only the five predefined entity references are used
Good:
&amp;
&lt;
&gt;
&quot;
&apos;
Bad:
&copy;
&reg;
&tm;
&alpha;
&eacute;
etc.
Entity references must end with a semicolon.
&lt; is good
&lt is bad
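These entity rules are enforced by the parser. The following is a minimal sketch that parses small documents with the JDK's DOM parser and returns the root element's text, or null when the parser rejects the input. The class name EntityDemo and the sample strings are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EntityDemo {

    // Returns the text content of the root element, or null if not well-formed
    public static String textOf(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // no stderr output
            return builder.parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getTextContent();
        } catch (SAXException e) {
            return null;
        } catch (IOException | ParserConfigurationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Predefined entity reference: resolved to a plain ampersand
        System.out.println(textOf("<PUBLISHER>A &amp; M Records</PUBLISHER>"));
        // Not one of the five predefined entities and no DTD declares it
        System.out.println(textOf("<NOTE>&copy; 1999</NOTE>"));
        // Missing semicolon
        System.out.println(textOf("<CODE>i &lt= 1</CODE>"));
    }
}
```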
<!ELEMENT SONG (TITLE, PHOTO?, COMPOSER+, PRODUCER*, PUBLISHER*, LENGTH?, YEAR?, ARTIST+)> <!ATTLIST SONG xmlns CDATA #REQUIRED xmlns:xlink CDATA #REQUIRED> <!ELEMENT TITLE (#PCDATA)> <!ELEMENT PHOTO EMPTY> <!ATTLIST PHOTO xlink:type CDATA #FIXED "simple" xlink:href CDATA #REQUIRED xlink:show CDATA #IMPLIED ALT CDATA #REQUIRED WIDTH CDATA #REQUIRED HEIGHT CDATA #REQUIRED > <!ELEMENT COMPOSER (#PCDATA)> <!ELEMENT PRODUCER (#PCDATA)> <!ELEMENT PUBLISHER (#PCDATA)> <!ATTLIST PUBLISHER xlink:type CDATA #IMPLIED xlink:href CDATA #IMPLIED > <!ELEMENT LENGTH (#PCDATA)> <!-- This should be a four digit year like "1999", not a two-digit year like "99" --> <!ELEMENT YEAR (#PCDATA)> <!ELEMENT ARTIST (#PCDATA)>
Domain-Specific Markup Languages
Self-Describing Data
Interchange of Data Among Applications
Non-proprietary format
Don't pay for what you don't use
Much data is lost due to format problems
XML is very simple
XML is self-describing
XML is well documented
<PERSON ID="p1100" SEX="M">
<NAME>
<GIVEN>Judson</GIVEN>
<SURNAME>McDaniel</SURNAME>
</NAME>
<BIRTH>
<DATE>21 Feb 1834</DATE>
</BIRTH>
<DEATH>
<DATE>9 Dec 1905</DATE>
</DEATH>
</PERSON>
E-commerce
Syndication
EAI and EDI
A specific markup language that uses the XML meta-syntax is called an XML application
Different XML applications have their own more constricted syntaxes and vocabularies within the broader XML syntax
Further syntax can be layered on top of this; e.g. data typing through schemas
Web Pages
Mathematical Equations
Music Notation
Vector Graphics
Metadata
and more...
<?xml version="1.0"?> <html xmlns="http://www.w3.org/TR/REC-html40" xmlns:m="http://www.w3.org/TR/REC-MathML/" > <head> <title>Fiat Lux</title> <meta name="GENERATOR" content="amaya V1.3b" /> </head> <body> <P> And God said, </P> <math> <m:mrow> <m:msub> <m:mi>δ</m:mi> <m:mi>α</m:mi> </m:msub> <m:msup> <m:mi>F</m:mi> <m:mi>αβ</m:mi> </m:msup> <m:mi></m:mi> <m:mo>=</m:mo> <m:mi></m:mi> <m:mfrac> <m:mrow> <m:mn>4</m:mn> <m:mi>π</m:mi> </m:mrow> <m:mi>c</m:mi> </m:mfrac> <m:mi></m:mi> <m:msup> <m:mi>J</m:mi> <m:mrow> <m:mi>β</m:mi> <m:mo></m:mo> </m:mrow> </m:msup> </m:mrow> </math> <P> and there was light </P> </body> </html>
<?xml version="1.0"?>
<CHANNEL HREF="http://metalab.unc.edu/xml/index.html">
<TITLE>Cafe con Leche</TITLE>
<ITEM HREF="http://metalab.unc.edu/xml/books.html">
<TITLE>Books about XML</TITLE>
</ITEM>
<ITEM HREF="http://metalab.unc.edu/xml/tradeshows.html">
<TITLE>Trade shows and conferences about XML</TITLE>
</ITEM>
<ITEM HREF="http://metalab.unc.edu/xml/lists.htm">
<TITLE>Mailing Lists dedicated to XML</TITLE>
</ITEM>
</CHANNEL>
Joseph Conrad's Heart of Darkness
Vector Markup Language (VML)
Internet Explorer 5.0
Microsoft Office 2000
Scalable Vector Graphics (SVG)
Meta-data
Dublin Core
Better Web searching
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/DC/">
<rdf:Description about="http://metalab.unc.edu/xml/">
<dc:CREATOR>Elliotte Rusty Harold</dc:CREATOR>
<dc:TITLE>Cafe con Leche</dc:TITLE>
</rdf:Description>
</rdf:RDF>
XSL: The Extensible Stylesheet Language
An XML syntax to replace DTDs
Data typing of element and attribute content
<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema"> <xsd:element name="SONG" type="SongType"/> <xsd:complexType name="SongType"> <xsd:element name="TITLE" type="xsd:string" minOccurs="1" maxOccurs="1"/> <xsd:element name="COMPOSER" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/> <xsd:element name="PRODUCER" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/> <xsd:element name="PUBLISHER" type="xsd:string" minOccurs="0" maxOccurs="1"/> <xsd:element name="LENGTH" type="xsd:timeDuration" minOccurs="0" maxOccurs="1"/> <xsd:element name="YEAR" type="xsd:year" minOccurs="1" maxOccurs="1"/> <xsd:element name="ARTIST" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/> </xsd:complexType> </xsd:schema>
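Data typing of element content can be tried out with the javax.xml.validation API in the JDK. The following is a minimal sketch under clearly stated assumptions: it uses the final W3C XML Schema namespace (http://www.w3.org/2001/XMLSchema) and the xsd:gYear type rather than the 1999 draft syntax shown above, and the schema is cut down to two elements. The class name SchemaDemo is illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SchemaDemo {

    // Cut-down schema in the final W3C syntax (an assumption, not the draft above)
    public static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='SONG'>"
      + "<xsd:complexType><xsd:sequence>"
      + "<xsd:element name='TITLE' type='xsd:string'/>"
      + "<xsd:element name='YEAR' type='xsd:gYear'/>"
      + "</xsd:sequence></xsd:complexType>"
      + "</xsd:element>"
      + "</xsd:schema>";

    // Compiles the schema; a broken schema is a programming error here
    static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new RuntimeException(e);
        }
    }

    // Returns true if the document is valid against the schema
    public static boolean isValid(String xml) {
        try {
            // With no ErrorHandler set, the validator throws on the first error
            compile().newValidator()
                .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Four-digit year: accepted by xsd:gYear
        System.out.println(isValid("<SONG><TITLE>Hot Cop</TITLE><YEAR>1978</YEAR></SONG>"));
        // Two-digit year: rejected, which a DTD's #PCDATA cannot express
        System.out.println(isValid("<SONG><TITLE>Hot Cop</TITLE><YEAR>78</YEAR></SONG>"));
    }
}
```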
Any element can be a link
Links can be bi-directional
Links can be separated from the documents they connect
<footnote xlink:type="simple" xlink:href="footnote7.xml">7</footnote>
XPath, the XML Path Language
MathML and XSL-FO are intended as output formats only. Other languages will be written and then transformed into these formats.
DTDs are only technically XML
Microsoft Office 2000
Netscape What's Related
Examine the data
Design a vocabulary for the data
Write a style sheet
XML documents are trees.
XML elements contain other elements as well as text
Within these limits there's more than one way to organize the data
Hierarchically
Relationally
Objects
The catalog?
A custom Document element?
Choose catalog for the root element
Everything else will be a descendant of catalog
This is not the only possible choice
<?xml version="1.0" encoding="ISO-8859-1"?> <catalog> Everything else will go here... </catalog>
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <html> <head><title></title></head> <body> <xsl:apply-templates/> </body> </html> </xsl:template> </xsl:stylesheet>
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title></title> </head> <body> Everything else will go here... </body> </html>
Composers?
Songs/Compositions?
Categories?
All of the Above?
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<category>
Small chamber ensembles - 2-4 Players by New York Women Composers
</category>
</catalog>
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <html> <xsl:apply-templates select="catalog"/> </html> </xsl:template> <xsl:template match="catalog"> <head><title><xsl:value-of select="category"/></title></head> <body> <h1><xsl:value-of select="category"/></h1> </body> </xsl:template> </xsl:stylesheet>
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> </body> </html>
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog><category>Small chamber ensembles
- 2-4 Players by New York Women Composers</category></catalog>
Each composer has a name
<?xml version="1.0" encoding="ISO-8859-1"?> <catalog> <category> Small chamber ensembles - 2-4 Players by New York Women Composers </category> <composer> <name>Julie Mandel</name> </composer> <composer> <name>Margaret De Wys</name> </composer> <composer> <name>Beth Anderson</name> </composer> <composer> <name>Linda Bouchard</name> </composer> </catalog>
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <html> <xsl:apply-templates select="catalog"/> </html> </xsl:template> <xsl:template match="catalog"> <head><title><xsl:value-of select="category"/></title></head> <body> <h1><xsl:value-of select="category"/></h1> <xsl:apply-templates select="composer"/> </body> </xsl:template> <xsl:template match="composer"> <h2><xsl:value-of select="."/></h2> </xsl:template> </xsl:stylesheet>
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h2> Julie Mandel </h2> <h2> Margaret De Wys </h2> <h2> Beth Anderson </h2> <h2> Linda Bouchard </h2> </body> </html>
It's better for sorting to divide names into first, middle, and last
Some elements (e.g. middle name) may be empty
<?xml version="1.0" encoding="ISO-8859-1"?> <catalog> <category> Small chamber ensembles - 2-4 Players by New York Women Composers </category> <composer> <name> <first_name>Julie</first_name> <middle_name></middle_name> <last_name>Mandel</last_name> </name> </composer> <composer> <name> <first_name>Margaret</first_name> <middle_name>De</middle_name> <last_name>Wys</last_name> </name> </composer> <composer> <name> <first_name>Beth</first_name> <middle_name></middle_name> <last_name>Anderson</last_name> </name> </composer> <composer> <name> <first_name>Linda</first_name> <middle_name></middle_name> <last_name>Bouchard</last_name> </name> </composer> </catalog>
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <html> <xsl:apply-templates select="catalog"/> </html> </xsl:template> <xsl:template match="catalog"> <head><title><xsl:value-of select="category"/></title></head> <body> <h1><xsl:value-of select="category"/></h1> <xsl:apply-templates select="composer"/> </body> </xsl:template> <xsl:template match="composer"> <h2><xsl:value-of select="."/></h2> </xsl:template> </xsl:stylesheet>
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h2> Julie Mandel </h2> <h2> Margaret De Wys </h2> <h2> Beth Anderson </h2> <h2> Linda Bouchard </h2> </body> </html>
Some people have the same names
Use an ID number to disambiguate
Store the ID number in an id attribute
name=value
An element may not have two attributes with the same name
Attribute values must be quoted
<?xml version="1.0" encoding="ISO-8859-1"?> <catalog> <category> Small chamber ensembles - 2-4 Players by New York Women Composers </category> <composer id="c1"> <name> <first_name>Julie</first_name> <middle_name></middle_name> <last_name>Mandel</last_name> </name> </composer> <composer id="c2"> <name> <first_name>Margaret</first_name> <middle_name>De</middle_name> <last_name>Wys</last_name> </name> </composer> <composer id="c3"> <name> <first_name>Beth</first_name> <middle_name></middle_name> <last_name>Anderson</last_name> </name> </composer> <composer id="c4"> <name> <first_name>Linda</first_name> <middle_name></middle_name> <last_name>Bouchard</last_name> </name> </composer> </catalog>
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <html> <xsl:apply-templates select="catalog"/> </html> </xsl:template> <xsl:template match="catalog"> <head><title><xsl:value-of select="category"/></title></head> <body> <h1><xsl:value-of select="category"/></h1> <xsl:apply-templates select="composer"/> </body> </xsl:template> <xsl:template match="composer"> <h2 id="{@id}"><xsl:value-of select="."/></h2> </xsl:template> </xsl:stylesheet>
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h2 id="c1"> Julie Mandel </h2> <h2 id="c2"> Margaret De Wys </h2> <h2 id="c3"> Beth Anderson </h2> <h2 id="c4"> Linda Bouchard </h2> </body> </html>
Attributes are for meta-data; elements are for data
Does the reader want to see the information? If yes, use element content; if no, use attributes
Attributes are good for ID numbers, URLs, references, and other information not directly relevant to the reader
Attributes can't hold structure well.
Elements allow you to include meta-meta-data (information about the information about the information).
Not everyone always agrees on what is and isn't meta-data.
Elements are more extensible in the face of future changes.
Let's look at an example of what we want:
Rendered HTML:
Tonal. Commissioned/Premiered by the Redlands' New Music Ensemble. (A swale is a meadow or a marsh where a lot of wild plants grow together. The composer discovered the word when a horse named Swale won the Kentucky Derby several years ago. Since her work is primarily collage of newly composed musical swatches, she has used the name extensively.) ACA - American Composers Alliance
Or in HTML:
<dt><cite>Brass Swale</cite> (1988) 5", tbn, 2 Bfl tpts, bar. hn</dt>
<dd><p>
Tonal. Commissioned/Premiered by the Redlands' New Music
Ensemble. (A swale is a meadow or a marsh where a lot of
wild plants grow together. The composer discovered the word
when a horse named Swale won the Kentucky Derby several
years ago. Since her work is primarily collage of newly
composed musical swatches, she has used the name
extensively.) ACA - American Composers
Alliance</p>
</dd>
Title
Date
Description
List of instruments
Length
Publisher
Some pieces may be missing from some compositions
<composition>
<title>Brass Swale</title>
<date>1988</date>
<length>5"</length>
<instruments>tbn, 2 Bfl tpts, bar. hn</instruments>
<description>
Tonal. Commissioned/Premiered by the Redlands' New Music
Ensemble. (A swale is a meadow or a marsh where a lot of
wild plants grow together. The composer discovered the word
when a horse named Swale won the Kentucky Derby several
years ago. Since her work is primarily collage of newly
composed musical swatches, she has used the name
extensively.)
</description>
<publisher>ACA - American Composers Alliance</publisher>
</composition>
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <html> <xsl:apply-templates select="catalog"/> </html> </xsl:template> <xsl:template match="catalog"> <head><title><xsl:value-of select="category"/></title></head> <body> <h1><xsl:value-of select="category"/></h1> <xsl:apply-templates select="composer"/> <dl> <xsl:apply-templates select="composition"/> </dl> </body> </xsl:template> <xsl:template match="composer"> <h2 id="{@id}"><xsl:value-of select="."/></h2> </xsl:template> <xsl:template match="composition"> <dt><cite><xsl:value-of select="title"/></cite> (<xsl:value-of select="date"/>) <xsl:value-of select="length"/> <xsl:value-of select="instruments"/> </dt> <dd> <xsl:value-of select="description"/> <xsl:value-of select="publisher"/> </dd> </xsl:template> </xsl:stylesheet>
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h2 id="c1"> Julie Mandel </h2> <h2 id="c2"> Margaret De Wys </h2> <h2 id="c3"> Beth Anderson </h2> <h2 id="c4"> Linda Bouchard </h2> <dl> <dt><cite>Trio for Flute, Viola and Harp</cite> (1994) 13'38"fl, hp, vla </dt> <dd> Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : mvt. 1: 5:01 mvt. 2: 4:11 mvt. 3: 4:26 Theodore Presser </dd> <dt><cite>Charmonium</cite> (1991) 9'2 vln, vla, vc </dt> <dd> Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. </dd> <dt><cite>Invention for Flute and Piano</cite> (1994) fl, pn </dt> <dd>3 movements</dd> <dt><cite>Little Trio</cite> (1984) 4'fl, guit, va </dt> <dd>ACA</dd> <dt><cite>Dr. Blood's Mermaid Lullaby</cite> (1980) 3'fl or ob, or vn, or vc, pn </dt> <dd>ACA</dd> <dt><cite>Trio: Dream in D</cite> (1980) 10'fl, pn, vc, or vn, pn, vc </dt> <dd> Rhapsodic. Passionate. Available on CD Two by Three from North/South Consonance (1998). </dd> <dt><cite>Propos II</cite> (1985) 11'2 tpt </dt> <dd>Arrangement from Propos</dd> <dt><cite>Rictus En Mirroir</cite> (1985) 14'fl, ob, hpschd, vc </dt> <dd></dd> </dl> </body> </html>
<composition composer="c3">
<title>Trio: Dream in D</title>
<date><year>1980</year></date>
<length>10'</length>
<instruments>fl, pn, vc, or vn, pn, vc</instruments>
<description>
Rhapsodic. Passionate. Available on CD
<cite><a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr%3D1-2/">
Two by Three</a></cite> from North/South Consonance (1998).
</description>
<publisher></publisher>
</composition>
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <html> <xsl:apply-templates select="catalog"/> </html> </xsl:template> <xsl:template match="catalog"> <head><title><xsl:value-of select="category"/></title></head> <body> <h1><xsl:value-of select="category"/></h1> <xsl:apply-templates select="composer"/> </body> </xsl:template> <xsl:template match="composer"> <h2 id="{@id}"><xsl:value-of select="."/></h2> <dl> <xsl:apply-templates select="../composition[@composer=current()/@id]"/> </dl> </xsl:template> <xsl:template match="composition"> <dt><cite><xsl:value-of select="title"/></cite> (<xsl:value-of select="date"/>) <xsl:value-of select="length"/> <xsl:value-of select="instruments"/> </dt> <dd> <xsl:value-of select="description"/> <xsl:value-of select="publisher"/> </dd> </xsl:template> </xsl:stylesheet>
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h2 id="c1"> Julie Mandel </h2> <dl> <dt><cite>Trio for Flute, Viola and Harp</cite> (1994) 13'38"fl, hp, vla </dt> <dd> Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : mvt. 1: 5:01 mvt. 2: 4:11 mvt. 3: 4:26 Theodore Presser </dd> <dt><cite>Invention for Flute and Piano</cite> (1994) fl, pn </dt> <dd>3 movements</dd> </dl> <h2 id="c2"> Margaret De Wys </h2> <dl> <dt><cite>Charmonium</cite> (1991) 9'2 vln, vla, vc </dt> <dd> Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. </dd> </dl> <h2 id="c3"> Beth Anderson </h2> <dl> <dt><cite>Little Trio</cite> (1984) 4'fl, guit, va </dt> <dd>ACA</dd> <dt><cite>Dr. Blood's Mermaid Lullaby</cite> (1980) 3'fl or ob, or vn, or vc, pn </dt> <dd>ACA</dd> <dt><cite>Trio: Dream in D</cite> (1980) 10'fl, pn, vc, or vn, pn, vc </dt> <dd> Rhapsodic. Passionate. Available on CD Two by Three from North/South Consonance (1998). </dd> </dl> <h2 id="c4"> Linda Bouchard </h2> <dl> <dt><cite>Propos II</cite> (1985) 11'2 tpt </dt> <dd>Arrangement from Propos</dd> <dt><cite>Rictus En Mirroir</cite> (1985) 14'fl, ob, hpschd, vc </dt> <dd></dd> </dl> </body> </html>
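The composer-to-composition match that the stylesheet performs with @composer=current()/@id can also be expressed from Java with the javax.xml.xpath API. The following is a minimal sketch. Because current() is an XSLT function and is not available in plain XPath, the id is spliced into the expression instead, which is only safe for trusted values. The class name CompositionLookup and the abbreviated catalog are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CompositionLookup {

    // Abbreviated catalog: titles and composer ids only (illustrative)
    public static final String CATALOG =
        "<catalog>"
      + "<composer id='c1'><name>Julie Mandel</name></composer>"
      + "<composer id='c3'><name>Beth Anderson</name></composer>"
      + "<composition composer='c1'><title>Trio for Flute, Viola and Harp</title></composition>"
      + "<composition composer='c3'><title>Little Trio</title></composition>"
      + "<composition composer='c3'><title>Trio: Dream in D</title></composition>"
      + "</catalog>";

    // Returns the titles of all compositions whose composer attribute equals id
    public static List<String> titlesFor(String catalogXml, String id) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(catalogXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // id is assumed to be a trusted value with no quote characters
            String expr = "/catalog/composition[@composer='" + id + "']/title";
            NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getTextContent());
            }
            return titles;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titlesFor(CATALOG, "c3"));
    }
}
```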
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<category>
Small chamber ensembles - 2-4 Players by New York Women Composers
</category>
<cataloging_info>
<abstract>Compositions by the members of New York Women Composers</abstract>
<keyword>music publishing</keyword>
<keyword>scores</keyword>
<keyword>women composers</keyword>
<keyword>New York</keyword>
</cataloging_info>
<composer id="c1">
<name>
<first_name>Julie</first_name>
<middle_name></middle_name>
<last_name>Mandel</last_name>
</name>
</composer>
...
</catalog>
Copyright notice
Name of maintainer
Email address of maintainer
Last modified date
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
...
<last_updated>July 28, 1999</last_updated>
<copyright>1999 New York Women Composers</copyright>
<maintainer email="elharo@metalab.unc.edu"
url="http://www.macfaq.com/personal.html">
<name>
<first_name>Elliotte</first_name>
<middle_name>Rusty</middle_name>
<last_name>Harold</last_name>
</name>
</maintainer>
</catalog>
Partially supported by Mozilla and IE 5.0
Full W3C Recommendation
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/css" href="compositions1.css"?>
<catalog>
...
</catalog>
Not every element needs a rule
The root element should be at least display: block
catalog { font-family: New York, Times New Roman, serif;
font-size: 14pt;
background-color: white;
color: black;
display: block }
Make it look like an H1 heading
category { display: block;
font-family: Helvetica, Arial, sans-serif;
font-size: 32pt;
font-weight: bold;
text-align: center}
catalog { font-family: New York, Times New Roman, serif;
font-size: 14pt;
background-color: white;
color: black;
display: block }
Make it look like a level 2 head
No need to style the first, middle, and last names separately
composer { display: block;
font-family: Helvetica, Arial, sans-serif;
font-size: 24pt;
font-weight: bold;
text-align: left}
composition title { display: block;
font-family: Helvetica, Arial, sans-serif;
font-size: 18pt;
font-weight: bold;
text-align: left}
/* cataloging_info is only for search engines */
cataloging_info { display: none;
color: white}
display: none
requires CSS2:
<last_updated>July 28, 1999</last_updated>
<copyright>1999 New York Women Composers</copyright>
<maintainer email="elharo@metalab.unc.edu"
url="http://www.macfaq.com/personal.html">
<name>
<first_name>Elliotte</first_name>
<middle_name>Rusty</middle_name>
<last_name>Harold</last_name>
</name>
</maintainer>
last_updated, copyright, maintainer {display: block;
font-size: small}
copyright:before {content: "Copyright " }
last_updated:before {content: "Last Modified " }
last_updated {margin-top: 2ex }
Again, some of this requires CSS2
composition * {display:list-item}
description {display: block}
category { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 32pt; font-weight: bold; text-align: center} catalog { font-family: New York, Times New Roman, serif; font-size: 14pt; background-color: white; color: black; display: block } composer { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 24pt; font-weight: bold; text-align: left} composition title { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 18pt; font-weight: bold; text-align: left} composition * {display:list-item} description {display: block} /* cataloging_info is only for search engines */ cataloging_info { display: none; color: #FFFFFF} last_updated, copyright, maintainer {display: block; font-size: small} copyright:before {content: "Copyright " } last_updated:before {content: "Last Modified " } last_updated {margin-top: 2ex }
Should be able to match composers with compositions
Should be able to sort composers and compositions by name
Should be able to include data from attributes; e.g. the maintainer's email address
Horizontal rules would be nice
Better header (e.g. title and meta tags) would be nice
CSS Level 3?
XSL
XSL + JavaScript
CSS has broader support
CSS is more stable
XSL is much more powerful
XSL can be used without browser support by transforming to HTML on the server side
Java works best
C, Perl, Python etc. can also be used
Unicode support is the biggest issue
SAX
DOM
JDOM
Parser specific APIs
Public domain, developed on xml-dev mailing list
Maintained by David Megginson
org.xml.sax package
Parser independent; programs can plug in different parsers
Event based; the parser pushes data to your handler
Read-only
SAX omits DTD declarations
Adds:
Namespace support
Optional Validation
Optional Lexical events for comments, CDATA sections, entity references
A lot more configurable
Deprecates a lot of SAX1
Adapter classes convert between SAX2 and SAX1 parsers.
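Much of the SAX2 configurability is exposed through named features on the XMLReader. The following is a minimal sketch, assuming the parser bundled with the JDK and obtaining the XMLReader through JAXP's SAXParserFactory rather than a parser-specific class. The class name Sax2Features is illustrative; the two feature URIs are the standard SAX2 names.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class Sax2Features {

    public static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    public static final String VALIDATION = "http://xml.org/sax/features/validation";

    // Creates a reader with namespace processing on and validation off
    public static XMLReader configuredReader() {
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(NAMESPACES, true);
            reader.setFeature(VALIDATION, false);
            return reader;
        } catch (SAXException | ParserConfigurationException e) {
            // Thrown if the parser does not recognize or support a feature
            throw new RuntimeException(e);
        }
    }

    // Reads back the current state of a feature
    public static boolean featureState(XMLReader reader, String feature) {
        try {
            return reader.getFeature(feature);
        } catch (SAXException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        XMLReader reader = configuredReader();
        System.out.println("namespaces: " + featureState(reader, NAMESPACES));
        System.out.println("validation: " + featureState(reader, VALIDATION));
    }
}
```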
Construct a parser-specific implementation of the XMLReader interface
Your code registers a ContentHandler with the parser
An InputSource feeds the document into the parser
As the document is read, the parser calls back to the methods of the ContentHandler to tell it what it's seeing in the document.
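The four steps above can be sketched as a small element counter. This is an illustration under stated assumptions rather than the original slide code: it obtains the XMLReader through JAXP's SAXParserFactory so that it runs without the org.xml.sax.driver system property, and it extends DefaultHandler so that only startElement needs to be written. The class name ElementCounter is illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ElementCounter extends DefaultHandler {

    private int count = 0;

    // Called by the parser once for every start tag (and empty-element tag)
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        count++;
    }

    public static int countElements(String xml) {
        try {
            // Step 1: obtain an XMLReader implementation
            XMLReader parser =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Step 2: register the ContentHandler
            ElementCounter handler = new ElementCounter();
            parser.setContentHandler(handler);
            // Steps 3 and 4: feed in the document; callbacks fire during parse()
            parser.parse(new InputSource(new StringReader(xml)));
            return handler.count;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String song = "<SONG><TITLE>Hot Cop</TITLE>"
            + "<COMPOSER>Jacques Morali</COMPOSER>"
            + "<COMPOSER>Henri Belolo</COMPOSER>"
            + "<COMPOSER>Victor Willis</COMPOSER>"
            + "<PRODUCER>Jacques Morali</PRODUCER>"
            + "<PUBLISHER>PolyGram Records</PUBLISHER>"
            + "<LENGTH>6:20</LENGTH><YEAR>1978</YEAR>"
            + "<ARTIST>Village People</ARTIST></SONG>";
        System.out.println(countElements(song) + " elements");
    }
}
```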
The XMLReaderFactory.createXMLReader() method instantiates the XMLReader implementation named by the org.xml.sax.driver system property:
try {
XMLReader parser = XMLReaderFactory.createXMLReader();
}
catch (SAXException e) {
System.err.println(e);
}
The XMLReaderFactory.createXMLReader(String className) method instantiates the XMLReader implementation named by its argument:
try {
XMLReader parser
= XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
}
catch (SAXException e) {
System.err.println(e);
}
Or you can use the constructor of the parser-specific class:
XMLReader parser = new SAXParser();
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;

public class SAX2Checker {

  public static void main(String[] args) {

    if (args.length == 0) {
      System.out.println("Usage: java SAX2Checker URL1 URL2...");
    }

    // set up the parser
    XMLReader parser;
    try {
      parser = XMLReaderFactory.createXMLReader();
    }
    catch (SAXException e) {
      try {
        parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
      }
      catch (SAXException e2) {
        System.err.println("Error: could not locate a parser.");
        return;
      }
    }

    // start parsing...
    for (int i = 0; i < args.length; i++) {
      // command line should offer URIs or file names
      try {
        parser.parse(args[i]);
        // If there are no well-formedness errors
        // then no exception is thrown
        System.out.println(args[i] + " is well formed.");
      }
      catch (SAXParseException e) { // well-formedness error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage()
         + " at line " + e.getLineNumber()
         + ", column " + e.getColumnNumber());
      }
      catch (SAXException e) { // some other kind of error
        System.out.println(e.getMessage());
      }
      catch (IOException e) {
        System.out.println("Could not check " + args[i]
         + " because of the IOException " + e);
      }
    }

  }

}
package org.xml.sax;

public interface ContentHandler {

  public void setDocumentLocator(Locator locator);
  public void startDocument() throws SAXException;
  public void endDocument() throws SAXException;
  public void startPrefixMapping(String prefix, String uri)
   throws SAXException;
  public void endPrefixMapping(String prefix) throws SAXException;
  public void startElement(String namespaceURI, String localName,
   String rawName, Attributes atts) throws SAXException;
  public void endElement(String namespaceURI, String localName,
   String rawName) throws SAXException;
  public void characters(char[] ch, int start, int length)
   throws SAXException;
  public void ignorableWhitespace(char ch[], int start, int length)
   throws SAXException;
  public void processingInstruction(String target, String data)
   throws SAXException;
  public void skippedEntity(String name) throws SAXException;

}
import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.IOException;
import java.util.StringTokenizer;

public class SAXWordCount implements ContentHandler {

  private int numWords;

  public void startDocument() throws SAXException {
    this.numWords = 0;
  }

  public void endDocument() throws SAXException {
    System.out.println(numWords + " words");
    System.out.flush();
  }

  private StringBuffer sb = new StringBuffer();

  public void characters(char[] text, int start, int length)
   throws SAXException {
    sb.append(text, start, length);
  }

  private void flush() {
    numWords += countWords(sb.toString());
    sb = new StringBuffer();
  }

  // methods that signify a word break
  public void startElement(String namespaceURI, String localName,
   String rawName, Attributes atts) throws SAXException {
    this.flush();
  }

  public void endElement(String namespaceURI, String localName,
   String rawName) throws SAXException {
    this.flush();
  }

  public void processingInstruction(String target, String data)
   throws SAXException {
    this.flush();
  }

  // methods that aren't necessary in this example
  public void startPrefixMapping(String prefix, String uri)
   throws SAXException {
    // ignore;
  }

  public void ignorableWhitespace(char[] text, int start, int length)
   throws SAXException {
    // ignore;
  }

  public void endPrefixMapping(String prefix) throws SAXException {
    // ignore;
  }

  public void skippedEntity(String name) throws SAXException {
    // ignore;
  }

  public void setDocumentLocator(Locator locator) {}

  private static int countWords(String s) {
    if (s == null) return 0;
    s = s.trim();
    if (s.length() == 0) return 0;
    StringTokenizer st = new StringTokenizer(s);
    return st.countTokens();
  }

  public static void main(String[] args) {

    SAXParser parser = new SAXParser();
    SAXWordCount counter = new SAXWordCount();
    parser.setContentHandler(counter);

    for (int i = 0; i < args.length; i++) {
      try {
        parser.parse(args[i]);
      }
      catch (SAXException e) {
        System.err.println(e);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }

  } // end main

}
% java SAXWordCount hotcop.xml
16 words
You do not always have all the information you need at the time of a given callback
You may need to store information in various data structures (stacks, queues, vectors, arrays, etc.) and act on it at a later point
For example, the characters() method is not guaranteed to give you the maximum number of contiguous characters. It may split a single run of characters over multiple method calls.
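One common way to cope with split characters() calls is to accumulate text in a buffer and only act on it when the element ends. The sketch below assumes a JDK with JAXP bundled; the TitleCollector class and its collect() helper are hypothetical names, and TITLE is borrowed from the song example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class TitleCollector extends DefaultHandler {

  private final List<String> titles = new ArrayList<>();
  private StringBuilder buffer; // non-null only while inside a TITLE element

  @Override
  public void startElement(String uri, String localName,
                           String qName, Attributes atts) {
    if ("TITLE".equals(qName)) buffer = new StringBuilder();
  }

  @Override
  public void characters(char[] text, int start, int length) {
    // May be called several times for one run of text, so only append here
    if (buffer != null) buffer.append(text, start, length);
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    // The complete text is known only now, at the end tag
    if ("TITLE".equals(qName) && buffer != null) {
      titles.add(buffer.toString());
      buffer = null;
    }
  }

  // Hypothetical convenience method: returns the text of every TITLE element
  static List<String> collect(String xml) throws Exception {
    XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    TitleCollector handler = new TitleCollector();
    parser.setContentHandler(handler);
    parser.parse(new InputSource(new StringReader(xml)));
    return handler.titles;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<SONG><TITLE>Hot &amp; Cold Cop</TITLE>"
               + "<ARTIST>Village People</ARTIST></SONG>";
    System.out.println(collect(xml));
  }
}
```

An entity reference such as &amp;amp; in the middle of the text is a typical place where a parser may break the run into several calls; the buffer makes the handler indifferent to where the breaks fall.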
Defines how XML and HTML documents are represented as objects in programs
Defined in IDL; thus language independent
HTML as well as XML
Writing as well as reading
More complete than SAX or JDOM; covers everything except internal and external DTD subsets
DOM focuses more on the document; SAX focuses more on the parser.
Parser independent interfaces; parser dependent implementation classes. Most programs must use the parser dependent classes. JAXP helps solve this, but so far only for DOM Level 1.
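For comparison, here is a minimal sketch of the JAXP route, assuming a JDK with JAXP bundled. No parser-specific class is named in the code; DocumentBuilderFactory picks whichever implementation is configured. The class name JaxpDomLoader is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class JaxpDomLoader {

  // Builds a DOM Document without naming any parser-dependent class
  static Document load(String xml) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    return builder.parse(new InputSource(new StringReader(xml)));
  }

  public static void main(String[] args) throws Exception {
    Document doc = load("<SONG><TITLE>Hot Cop</TITLE></SONG>");
    System.out.println(doc.getDocumentElement().getTagName());
  }
}
```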
Everything's a Node:
Extensive use of polymorphism
Lots of casting
Language independence means there's very limited use of the Java class library; various features are reinvented
Language independence requires no method overloading because not all languages support it.
Several features are poor design in Java, if not in other languages:
Named constants are often shorts
Only one kind of exception; details provided by constants
No Java-specific utility methods like equals(), hashCode(), clone(), or toString()
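A short sketch of what these design points look like in practice, assuming a JDK with JAXP bundled. The DomQuirks class and its method names are made up for illustration. It shows the short node-type constants, the cast from Node to Element, and the single DOMException type whose cause is identified by a numeric code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomQuirks {

  static Document parse(String xml) throws Exception {
    return DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
  }

  // Everything is a Node: test the short type constant, then cast
  static List<String> childElementNames(Node parent) {
    List<String> names = new ArrayList<>();
    NodeList children = parent.getChildNodes(); // not a java.util.List
    for (int i = 0; i < children.getLength(); i++) {
      Node child = children.item(i);
      if (child.getNodeType() == Node.ELEMENT_NODE) {
        Element element = (Element) child;
        names.add(element.getTagName());
      }
    }
    return names;
  }

  // One exception class for everything; the detail is a short constant.
  // Returns the code raised by adding a second root element, or 0 if none.
  static short secondRootErrorCode(Document doc) {
    try {
      doc.appendChild(doc.createElement("EXTRA"));
      return 0;
    } catch (DOMException e) {
      return e.code;
    }
  }

  public static void main(String[] args) throws Exception {
    Document doc = parse("<SONG><TITLE>Hot Cop</TITLE> and "
                       + "<ARTIST>Village People</ARTIST></SONG>");
    System.out.println(childElementNames(doc.getDocumentElement()));
    System.out.println(
        secondRootErrorCode(doc) == DOMException.HIERARCHY_REQUEST_ERR);
  }
}
```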
DOM Level 0:
DOM Level 1, a W3C Standard
DOM Level 2, a W3C Standard
Eight Modules:
Core: org.w3c.dom *
HTML: org.w3c.dom.html
Views: org.w3c.dom.views
StyleSheets: org.w3c.dom.stylesheets
CSS: org.w3c.dom.css
Events: org.w3c.dom.events *
Traversal: org.w3c.dom.traversal *
Range: org.w3c.dom.range
Only the core and traversal modules really apply to XML. The other six are for HTML.
* indicates Xerces support
Entire document is represented as a grove of trees.
Each XML document should contain exactly one tree.
A tree contains nodes.
Some nodes may contain other nodes (depending on node type).
Each document node contains:
zero or one doctype nodes
one root element node
zero or more comment and processing instruction nodes
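The children of a document node can be inspected directly. The following sketch assumes a JDK with JAXP bundled and a parser that accepts DOCTYPE declarations (the JDK default); the class name DocumentChildren and the labels it returns are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DocumentChildren {

  // Labels each direct child of the document node by its node type
  static List<String> describe(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
    List<String> kinds = new ArrayList<>();
    NodeList children = doc.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      switch (children.item(i).getNodeType()) {
        case Node.DOCUMENT_TYPE_NODE:
          kinds.add("doctype");
          break;
        case Node.ELEMENT_NODE:
          kinds.add("root element");
          break;
        case Node.COMMENT_NODE:
          kinds.add("comment");
          break;
        case Node.PROCESSING_INSTRUCTION_NODE:
          kinds.add("processing instruction");
          break;
        default:
          kinds.add("other");
      }
    }
    return kinds;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<?xml-stylesheet type='text/css' href='song.css'?>"
               + "<!DOCTYPE SONG>"
               + "<!-- a comment -->"
               + "<SONG><TITLE>Hot Cop</TITLE></SONG>";
    System.out.println(describe(xml));
  }
}
```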
17 interfaces:
DOM Interface | JDOM Equivalent |
---|---|
Attr | Attribute |
CDATASection | |
CharacterData | |
Comment | Comment |
Document | Document |
DocumentFragment | |
DocumentType | DocType |
DOMImplementation | |
Element | Element |
Entity | Entity |
EntityReference | |
NamedNodeMap | |
Node | |
NodeList | |
Notation | |
ProcessingInstruction | ProcessingInstruction |
Text | |
plus one exception: DOMException
Plus a bunch of HTML stuff in org.w3c.dom.html and other packages we will ignore
Library specific code creates a parser
The parser parses the document and returns an org.w3c.dom.Document object.
The entire document is stored in memory.
DOM methods and interfaces are used to extract data from this object
import org.apache.xerces.parsers.DOMParser;
import org.xml.sax.SAXException;
import java.io.IOException;
import org.w3c.dom.*;

public class DOMChecker {

  public static void main(String[] args) {

    // This is simpler but less flexible than the SAX approach.
    // Perhaps a good creational design pattern is needed here?
    DOMParser parser = new DOMParser();

    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]);
        Document d = parser.getDocument();
        // work with the document...
      }
      catch (SAXException e) {
        System.err.println(e);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }

  }

}
import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.IOException;
import java.util.StringTokenizer;

public class DOMWordCount {

  public static void main(String[] args) {

    DOMParser parser = new DOMParser();
    DOMWordCount counter = new DOMWordCount();

    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]);
        Document d = parser.getDocument();
        int numWords = countWordsInNode(d);
        System.out.println(numWords + " words");
      }
      catch (SAXException e) {
        System.err.println(e);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }

  } // end main

  // note use of recursion
  public static int countWordsInNode(Node node) {

    int numWords = 0;

    if (node.hasChildNodes()) {
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        numWords += countWordsInNode(children.item(i));
      }
    }

    int type = node.getNodeType();
    if (type == Node.TEXT_NODE) {
      String s = node.getNodeValue();
      numWords += countWordsInString(s);
    }

    return numWords;

  }

  private static int countWordsInString(String s) {
    if (s == null) return 0;
    s = s.trim();
    if (s.length() == 0) return 0;
    StringTokenizer st = new StringTokenizer(s);
    return st.countTokens();
  }

}
% java DOMWordCount hotcop.xml
16 words
A more Java-like tree-based API
Parser independent classes sit on top of parsers and other APIs
Construct an org.jdom.input.SAXBuilder or an org.jdom.input.DOMBuilder
Invoke the builder's build() method to build a Document object from a
Reader
InputStream
URL
File
System ID String
If there's a problem building the document, a JDOMException is thrown
Work with the resulting Document object
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;

public class JDOMChecker {

  public static void main(String[] args) {

    if (args.length == 0) {
      System.out.println("Usage: java JDOMChecker URL1 URL2...");
    }

    SAXBuilder builder = new SAXBuilder();

    // start parsing...
    for (int i = 0; i < args.length; i++) {
      // command line should offer URIs or file names
      try {
        builder.build(args[i]);
        // If there are no well-formedness errors,
        // then no exception is thrown
        System.out.println(args[i] + " is well formed.");
      }
      catch (JDOMException e) { // indicates a well-formedness or other error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
    }

  }

}
% java JDOMChecker shortlogs.xml HelloJDOM.java
shortlogs.xml is well formed.
HelloJDOM.java is not well formed.
The markup in the document preceding the root element must be well-formed.: Error on line 1 of XML document: The markup in the document preceding the root element must be well-formed.
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;

public class JDOMWordCount {

  public static void main(String[] args) {

    if (args.length == 0) {
      System.out.println("Usage: java JDOMWordCount URL1 URL2...");
    }

    SAXBuilder builder = new SAXBuilder();

    // start parsing...
    for (int i = 0; i < args.length; i++) {
      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        Element root = doc.getRootElement();
        int numWords = countWordsInElement(root);
        System.out.println(numWords + " words");
      }
      catch (JDOMException e) { // indicates a well-formedness or other error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
    }

  }

  public static int countWordsInElement(Element element) {

    int numWords = 0;

    List children = element.getMixedContent();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof String) {
        numWords += countWordsInString((String) o);
      }
      else if (o instanceof Element) {
        // note use of recursion
        numWords += countWordsInElement((Element) o);
      }
    }

    return numWords;

  }

  private static int countWordsInString(String s) {
    if (s == null) return 0;
    s = s.trim();
    if (s.length() == 0) return 0;
    StringTokenizer st = new StringTokenizer(s);
    return st.countTokens();
  }

}
% java JDOMWordCount hotcop.xml
16 words
The XML Bible
Elliotte Rusty Harold
IDG Books, 1999
ISBN: 0-7645-3236-7
This presentation: http://metalab.unc.edu/xml/slides/bank_of_america/fundamentals/