Part I: XML Basics
Part II: Well-formedness, DTDs and Validity
Part III: Namespaces
Part IV: XSL Transformations
Part V: XML and Databases
Part VI: XML Processing APIs
XML succeeded, and in ways that weren't expected - at least not by many. Originally it was conceived as a document-oriented technology for robust quality publishing of documents over networks. The original workplan had three pillars - XML syntax, XML link, and XML stylesheets. Schemas were not high on the agenda and XML was not seen as an infrastructure for middleware or glueware. It was expected that at some stage it would be necessary to manage data but there was little activity in this area in 1997. When developing Chemical Markup Language (which must be one of the first published XML applications), I found the lack of datatypes very frustrating!
Well, XML is now a basic infrastructure of much modern information. I doubt that anyone now designs a protocol, or operating system without including XML. Although this list sometimes complains that XML isn't as clean as we would like, it works, and it works pretty well.
--Peter Murray-Rust on the xml-dev mailing list, Thursday, 07 Feb 2002
Extensible Markup Language
A syntax for documents
A Meta-Markup Language
A Structural and Semantic language, not a formatting language
Not just for Web pages
It has a grammar
It has a vocabulary (sort of)
It can be parsed by machines
It says what things are; not what they do
It is not a programming language
It is not compiled
You can add words to the language
Not like HTML, troff, LaTeX
Make up the tags you need as you need them
The tags you create can be documented in a Document Type Definition (DTD)
A meta syntax for domain-specific markup languages like MusicML, MathML, and XHTML
A specific markup language that uses the XML meta-syntax is called an XML application
Different XML applications have their own more constricted syntaxes and vocabularies within the broader XML syntax
Further syntax can be layered on top of this; e.g. data typing through schemas
Clinical Trial Data Model for drug trials
National Library of Medicine (NLM) XML Data Formats for MEDLINE data over FTP replacing ELHILL Unit Record Format (EURF) on magnetic tape
and many more...
XML documents form a tree
Element and attribute names reflect the kind of the element
Formatting can be added with a style sheet
<dt>Hot Cop <dd> by Jacques Morali, Henri Belolo, and Victor Willis <ul> <li>Producer: Jacques Morali <li>Publisher: PolyGram Records <li>Length: 6:20 <li>Written: 1978 <li>Artist: Village People </ul>
<SONG> <TITLE>Hot Cop</TITLE> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <LENGTH>6:20</LENGTH> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
Plain ASCII or UTF-8 text
.xml is customary file extension
Any normal text editor will work
SONG {display: block; font-family: New York, Times New Roman, serif} TITLE {display: block; font-size: 24pt; font-weight: bold; font-family: Helvetica, sans-serif} COMPOSER {display: block} PRODUCER {display: block} YEAR {display: block} PUBLISHER {display: block} LENGTH {display: block} ARTIST {display: block; font-style: italic}
<?xml-stylesheet type="text/css" href="song1.css"?>
<?xml-stylesheet type="text/css" href="song.css"?> <SONG> <TITLE>Hot Cop</TITLE> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <LENGTH>6:20</LENGTH> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
Cascading Style Sheets Level 1 (CSS1)
Internet Explorer 5.0 and later
Mozilla
Netscape 6
Opera 4.0, 5.0
Cascading Style Sheets Level 2 (CSS2)
Internet Explorer 5.0 and later (partial)
Mozilla
Netscape 6
Opera 4.0, 5.0
Extensible Stylesheet Language (XSL)
Mozilla 0.9.9
Internet Explorer 5.0/5.5 (older draft, buggy)
Internet Explorer 6.0
LotusXSL, Xalan, Saxon, other non-browser converters
Document Style and Semantics Language (DSSSL)
Jade
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <html> <head><title>Song</title></head> <body> <xsl:apply-templates select="SONG"/> </body> </html> </xsl:template> <xsl:template match="SONG"> <h1> <xsl:value-of select="TITLE"/> by the <xsl:value-of select="ARTIST"/> </h1> <ul> <li>Length: <xsl:value-of select="LENGTH"/></li> <li>Producer: <xsl:value-of select="PRODUCER"/></li> <li>Publisher: <xsl:value-of select="PUBLISHER"/></li> <li>Year: <xsl:value-of select="YEAR"/></li> <xsl:apply-templates select="COMPOSER"/> </ul> </xsl:template> <xsl:template match="COMPOSER"> <li>Composer: <xsl:value-of select="."/></li> </xsl:template> </xsl:stylesheet>
D:\fundamentals\examples>saxon hotcop.xml song3.xsl
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Song</title>
</head>
<body>
<h1>Hot Cop
by the
Village People
</h1>
<ul>
<li>Length: 6:20</li>
<li>Producer: Jacques Morali</li>
<li>Publisher: PolyGram Records</li>
<li>Year: 1978</li>
<li>Composer: Jacques Morali</li>
<li>Composer: Henri Belolo</li>
<li>Composer: Victor Willis</li>
</ul>
</body>
</html>
Or alternately:
% java com.icl.saxon.StyleSheet hotcop.xml song3.xsl
<html>
...
CSS has broader support
XSL is much more powerful
XSL can be used without browser support by transforming to HTML on the server side
Rules:
Open and close all tags
Empty-element tags end with />
There is a unique root element
Elements may not overlap
Attribute values are quoted
< and & are only used to start tags and entity references
Only the five predefined entity references are used
Plus more...
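A non-validating parser enforces these rules for you. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are illustrative assumptions, not part of the original examples.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, parser message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# Hypothetical samples, one per rule in the list above.
samples = {
    "good": "<SONG><TITLE>Hot Cop</TITLE></SONG>",
    "unclosed tag": "<SONG><TITLE>Hot Cop</SONG>",
    "overlapping elements": "<SONG><B><I>Hot Cop</B></I></SONG>",
    "unquoted attribute": "<SONG YEAR=1978/>",
    "two root elements": "<SONG/><SONG/>",
    "bare ampersand": "<PUBLISHER>A & M Records</PUBLISHER>",
}

for label, text in samples.items():
    ok, message = is_well_formed(text)
    print(label, "->", "well-formed" if ok else "not well-formed: " + message)
```

Only the first sample should be reported as well-formed; the exact error wording depends on the underlying parser.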
To be valid, an XML document must:
Be well-formed
Have a Document Type Definition (DTD)
Comply with the constraints specified in the DTD
<!ELEMENT SONG (TITLE, COMPOSER+, PRODUCER*, PUBLISHER*, LENGTH?, YEAR?, ARTIST+)> <!ELEMENT TITLE (#PCDATA)> <!ELEMENT COMPOSER (#PCDATA)> <!ELEMENT PRODUCER (#PCDATA)> <!ELEMENT PUBLISHER (#PCDATA)> <!ELEMENT LENGTH (#PCDATA)> <!-- This should be a four digit year like "1999", not a two-digit year like "99" --> <!ELEMENT YEAR (#PCDATA)> <!ELEMENT ARTIST (#PCDATA)>
<!DOCTYPE SONG SYSTEM "song.dtd"> <SONG> <TITLE>Hot Cop</TITLE> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <LENGTH>6:20</LENGTH> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
To check validity you pass the document through a validating parser which should report any errors it finds. For example,
% java dom.Counter -v invalidhotcop.xml
[Error] invalidhotcop.xml:10:8: The content of element type "SONG" must match "(TITLE,COMPOSER+,PRODUCER*,PUBLISHER*,LENGTH?,YEAR?,ARTIST+)".
invalidhotcop.xml: 862;70;0 ms (7 elems, 0 attrs, 19 spaces, 59 chars)
A valid document:
% java dom.Counter -v validhotcop.xml
validhotcop.xml: 671;70;0 ms (10 elems, 0 attrs, 28 spaces, 98 chars)
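Python's standard library does not include a validating parser, so the following is only a rough sketch of what the content-model check for SONG amounts to. It assumes the content model from song.dtd has been rewritten by hand as a regular expression over the child element names; the function name and sample documents are hypothetical, and a real validating parser checks far more than this.

```python
import re
import xml.etree.ElementTree as ET

# (TITLE, COMPOSER+, PRODUCER*, PUBLISHER*, LENGTH?, YEAR?, ARTIST+)
# hand-translated into a regex over space-terminated child names.
SONG_MODEL = re.compile(
    r"(TITLE )(COMPOSER )+(PRODUCER )*(PUBLISHER )*(LENGTH )?(YEAR )?(ARTIST )+"
)


def song_children_match(xml_text):
    """Check only the order and count of SONG's children against the model."""
    song = ET.fromstring(xml_text)
    names = "".join(child.tag + " " for child in song)
    return SONG_MODEL.fullmatch(names) is not None


valid_song = """
<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
"""

# Hypothetical invalid document: the required TITLE is missing.
invalid_song = """
<SONG>
  <COMPOSER>Jacques Morali</COMPOSER>
  <ARTIST>Village People</ARTIST>
</SONG>
"""

print(song_children_match(valid_song))
print(song_children_match(invalid_song))
```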
<?xml version="1.0" encoding="UTF-8" standalone="no"?> <?xml-stylesheet type="text/css" href="song.css"?> <!DOCTYPE SONG SYSTEM "expanded_song.dtd"> <SONG xmlns="http://metalab.unc.edu/xml/namespace/song" xmlns:xlink="http://www.w3.org/1999/xlink"> <TITLE>Hot Cop</TITLE> <PHOTO xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg" ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <!-- The publisher is actually Polygram but I needed an example of a general entity reference. --> <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/"> A &amp; M Records </PUBLISHER> <LENGTH>6:20</LENGTH> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG> <!-- You can tell what album I was listening to when I wrote this example -->
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
version attribute: required; always has the value 1.0
standalone attribute: yes or no
encoding attribute: UTF-8, ISO-8859-1, SJIS, etc.
<PHOTO
xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200" />
name="value"
same as in HTML
Generally used for meta-information
Attribute values are quoted with either single or double quotes:
Good:
<A HREF="http://www.cafeconleche.org/">
<DIV ALIGN='CENTER'>
<A HREF="http://www.cafeconleche.org/">
<EMBED SRC="minnesotaswale.aif" hidden="true">
Bad:
<A HREF=http://www.cafeconleche.org/>
<DIV ALIGN=CENTER>
<EMBED SRC=minnesotaswale.aif hidden=true>
<EMBED SRC="minnesotaswale.aif" hidden>
<PHOTO
xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200" />
Ends with /> instead of >
<PHOTO/> is semantically the same as <PHOTO></PHOTO>
Just syntactic sugar
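One way to see that the two forms are equivalent is to parse both and compare the results. This is a small sketch using Python's xml.etree.ElementTree; the summary helper is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring('<PHOTO WIDTH="100" HEIGHT="200"/>')
long_form = ET.fromstring('<PHOTO WIDTH="100" HEIGHT="200"></PHOTO>')


def summary(element):
    """Name, attributes, text content, and number of children of an element."""
    return (element.tag, dict(element.attrib), element.text or "", len(element))


# Both spellings yield the same element as far as an application can tell.
print(summary(short_form) == summary(long_form))
```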
<!-- You can tell what album I was
listening to when I wrote this example -->
Essentially the same as in HTML
Let you mix and match different XML vocabularies
URIs identify elements and attributes that belong to different XML applications
Prefixes can change as long as the URI stays the same
<SONG xmlns="http://www.cafeconleche.org/namespace/song"
xmlns:xlink="http://www.w3.org/1999/xlink">
<TITLE>Hot Cop</TITLE>
<PHOTO
xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
<COMPOSER>Jacques Morali</COMPOSER>
<PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
A &amp; M Records
</PUBLISHER>
<ARTIST>Village People</ARTIST>
</SONG>
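The point that the URI matters and the prefix does not can be checked with a namespace-aware parser. In this sketch, Python's xml.etree.ElementTree replaces each prefix with its URI in braces; the second document and its xl prefix are made up for the comparison.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# Same namespace URI, two different prefixes (the "xl" prefix is hypothetical).
doc_a = '<PHOTO xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="hotcop.jpg"/>'
doc_b = '<PHOTO xmlns:xl="http://www.w3.org/1999/xlink" xl:href="hotcop.jpg"/>'

attrs_a = dict(ET.fromstring(doc_a).attrib)
attrs_b = dict(ET.fromstring(doc_b).attrib)

# The parser reports the attribute as {URI}local-name; the prefix is gone.
print(attrs_a)
print(attrs_a == attrs_b)
```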
A &amp; M Records
< and & are only used to start tags and entities
Good:
<H1>O'Reilly &amp; Associates</H1>
Bad:
<H1>O'Reilly & Associates</H1>
Good:
<CODE>for (int i = 0; i &lt;= args.length; i++ ) { </CODE>
Bad:
<CODE>for (int i = 0; i <= args.length; i++ ) { </CODE>
Only the five predefined entity references are used
Good:
&amp;
&lt;
&gt;
&quot;
&apos;
Bad:
&copy;
&reg;
&tm;
&alpha;
&eacute;
etc.
Entity references must end with a semicolon.
&lt; is good
&lt is bad
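When text is generated by a program, the escaping can be automated. This sketch uses escape from Python's xml.sax.saxutils, which replaces &, < and > with the corresponding predefined entity references; the sample string is assembled from the examples above for illustration.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = "for (int i = 0; i <= args.length; i++ ) { // O'Reilly & Associates"

# escape() turns & into &amp; and < into &lt; so the text is safe as content.
escaped = escape(raw)
document = "<CODE>" + escaped + "</CODE>"

# Parsing resolves the entity references back to the original characters.
parsed = ET.fromstring(document)
print(escaped)
print(parsed.text == raw)
```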
<!ELEMENT SONG (TITLE, PHOTO?, COMPOSER+, PRODUCER*, PUBLISHER*, LENGTH?, YEAR?, ARTIST+)> <!ATTLIST SONG xmlns CDATA #REQUIRED xmlns:xlink CDATA #REQUIRED> <!ELEMENT TITLE (#PCDATA)> <!ELEMENT PHOTO EMPTY> <!ATTLIST PHOTO xlink:type CDATA #FIXED "simple" xlink:href CDATA #REQUIRED xlink:show CDATA #IMPLIED ALT CDATA #REQUIRED WIDTH CDATA #REQUIRED HEIGHT CDATA #REQUIRED > <!ELEMENT COMPOSER (#PCDATA)> <!ELEMENT PRODUCER (#PCDATA)> <!ELEMENT PUBLISHER (#PCDATA)> <!ATTLIST PUBLISHER xlink:type CDATA #IMPLIED xlink:href CDATA #IMPLIED > <!ELEMENT LENGTH (#PCDATA)> <!-- This should be a four digit year like "1999", not a two-digit year like "99" --> <!ELEMENT YEAR (#PCDATA)> <!ELEMENT ARTIST (#PCDATA)>
Domain-Specific Markup Languages
Self-Describing Data
Interchange of Data Among Applications
Non-proprietary format
Don't pay for what you don't use
You don't have to answer the questions you don't care about.
Many free tools
Huge support infrastructure
Much data is lost due to format problems
XML is very simple
XML is self-describing
XML is well documented
<PERSON ID="p1100" SEX="M">
<NAME>
<GIVEN>Judson</GIVEN>
<SURNAME>McDaniel</SURNAME>
</NAME>
<BIRTH>
<DATE>21 Feb 1834</DATE>
</BIRTH>
<DEATH>
<DATE>9 Dec 1905</DATE>
</DEATH>
</PERSON>
E-commerce
Syndication
EAI and EDI
Web Pages
Mathematical Equations
Music Notation
Vector Graphics
Metadata
and more...
<?xml version="1.0"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN" "http://www.w3.org/TR/MathML2/dtd/xhtml-math11-f.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:m="http://www.w3.org/1998/Math/MathML" > <head> <title>Fiat Lux</title> </head> <body> <p> And God said, </p> <m:math> <m:mrow> <m:msub> <m:mi>δ</m:mi> <m:mi>α</m:mi> </m:msub> <m:msup> <m:mi>F</m:mi> <m:mi>αβ</m:mi> </m:msup> <m:mi> </m:mi> <m:mo>=</m:mo> <m:mi></m:mi> <m:mfrac> <m:mrow> <m:mn>4</m:mn> <m:mi>π</m:mi> </m:mrow> <m:mi>c</m:mi> </m:mfrac> <m:mi> </m:mi> <m:msup> <m:mi>J</m:mi> <m:mrow> <m:mi>β</m:mi> <m:mo> </m:mo> </m:mrow> </m:msup> </m:mrow> </m:math> <p> and there was light </p> </body> </html>
<?xml version="1.0"?>
<CHANNEL HREF="http://www.cafeconleche.org/index.html">
<TITLE>Cafe con Leche</TITLE>
<ITEM HREF="http://www.cafeconleche.org/books.html">
<TITLE>Books about XML</TITLE>
</ITEM>
<ITEM HREF="http://www.cafeconleche.org/tradeshows.html">
<TITLE>Trade shows and conferences about XML</TITLE>
</ITEM>
<ITEM HREF="http://www.cafeconleche.org/lists.htm">
<TITLE>Mailing Lists dedicated to XML</TITLE>
</ITEM>
</CHANNEL>
Joseph Conrad's Heart of Darkness
Vector Markup Language (VML)
Internet Explorer 5.0
Microsoft Office 2000
Scalable Vector Graphics (SVG)
Meta-data
Dublin Core
Better Web searching
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/DC/">
<rdf:Description about="http://www.cafeconleche.org/">
<dc:creator>Elliotte Rusty Harold</dc:creator>
<dc:title>Cafe con Leche</dc:title>
</rdf:Description>
</rdf:RDF>
Microsoft Office 2000
Netscape What's Related
XSL: The Extensible Stylesheet Language
<?xml version="1.0"?> <PERIODIC_TABLE> <ATOM STATE="GAS"> <NAME>Hydrogen</NAME> <SYMBOL>H</SYMBOL> <ATOMIC_NUMBER>1</ATOMIC_NUMBER> <ATOMIC_WEIGHT>1.00794</ATOMIC_WEIGHT> <BOILING_POINT UNITS="Kelvin">20.28</BOILING_POINT> <MELTING_POINT UNITS="Kelvin">13.81</MELTING_POINT> <DENSITY UNITS="grams/cubic centimeter"> <!-- At 300K, 1 atm --> 0.0000899 </DENSITY> </ATOM> <ATOM STATE="GAS"> <NAME>Helium</NAME> <SYMBOL>He</SYMBOL> <ATOMIC_NUMBER>2</ATOMIC_NUMBER> <ATOMIC_WEIGHT>4.0026</ATOMIC_WEIGHT> <BOILING_POINT UNITS="Kelvin">4.216</BOILING_POINT> <MELTING_POINT UNITS="Kelvin">0.95</MELTING_POINT> <DENSITY UNITS="grams/cubic centimeter"><!-- At 300K --> 0.0001785 </DENSITY> </ATOM> </PERIODIC_TABLE>
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format"> <xsl:output indent="yes"/> <xsl:template match="/"> <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"> <fo:layout-master-set> <fo:simple-page-master master-name="only"> <fo:region-body/> </fo:simple-page-master> </fo:layout-master-set> <fo:page-sequence master-reference="only"> <fo:flow flow-name="xsl-region-body"> <xsl:apply-templates select="//ATOM"/> </fo:flow> </fo:page-sequence> </fo:root> </xsl:template> <xsl:template match="ATOM"> <fo:block font-size="20pt" font-family="serif" line-height="30pt"> <xsl:value-of select="NAME"/> </fo:block> </xsl:template> </xsl:stylesheet>
<?xml version="1.0" encoding="utf-8"?> <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"> <fo:layout-master-set> <fo:simple-page-master master-name="only"> <fo:region-body/> </fo:simple-page-master> </fo:layout-master-set> <fo:page-sequence master-reference="only"> <fo:flow flow-name="xsl-region-body"> <fo:block font-size="20pt" font-family="serif" line-height="30pt">Hydrogen</fo:block> <fo:block font-size="20pt" font-family="serif" line-height="30pt">Helium</fo:block> </fo:flow> </fo:page-sequence> </fo:root>
An XML syntax that provides an alternative to DTD validation
Data typing of element and attribute content
<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name="SONG" type="SongType"/> <xsd:complexType name="SongType"> <xsd:sequence> <xsd:element name="TITLE" type="xsd:string" minOccurs="1" maxOccurs="1"/> <xsd:element name="COMPOSER" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/> <xsd:element name="PRODUCER" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/> <xsd:element name="PUBLISHER" type="xsd:string" minOccurs="0" maxOccurs="1"/> <xsd:element name="LENGTH" type="xsd:string" minOccurs="1" maxOccurs="1"/> <xsd:element name="YEAR" type="xsd:gYear" minOccurs="1" maxOccurs="1"/> <xsd:element name="ARTIST" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/> <xsd:element name="PRICE" type="xsd:string" minOccurs="0" maxOccurs="1"/> </xsd:sequence> </xsd:complexType> </xsd:schema>
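Python's standard library has no schema validator, so the following only approximates what declaring YEAR as xsd:gYear buys you. The regular expression is a simplified stand-in for the gYear lexical form (four or more digits, optional sign and time zone), and the function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Simplified approximation of the xsd:gYear lexical form; not the full rule.
G_YEAR = re.compile(r"-?\d{4,}(Z|[+-]\d{2}:\d{2})?")


def year_looks_like_gyear(song_xml):
    """Check only the YEAR child of a SONG against the approximate pattern."""
    year = ET.fromstring(song_xml).findtext("YEAR", default="")
    return G_YEAR.fullmatch(year.strip()) is not None


print(year_looks_like_gyear("<SONG><YEAR>1978</YEAR></SONG>"))
print(year_looks_like_gyear("<SONG><YEAR>78</YEAR></SONG>"))
```

A DTD can only say that YEAR contains text; a typed check like this is what rejects a two-digit year.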
Linking in XML is divided into multiple parts:
A Uniform Resource Identifier (URI) names or locates a resource
An XLink defines connections between two or more documents identified by URIs
XPath identifies particular nodes within a document
An XPointer adds an XPath to a URI
XBase defines the URI against which relative URIs are resolved
XInclude embeds a document identified by a URI inside an XML document.
<?xml version="1.0"?> <story date="January 9, 2001" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xi="http://www.w3.org/2001/XInclude" xml:base="http://www.cafeaulait.org/"> <p> The W3C XML Linking Working Group has pushed the <cite xlink:href="http://www.w3.org/TR/2001/WD-xptr-20010108"> XPointer specification </cite> back to working draft status. The specific issue that was uncovered during Candidate Recommendation was some <em xlink:type="simple" xlink:href="http://www.w3.org/TR/xptr#xpointer(//div[@class='div3'][7])"> confusion </em> over how to integrate XPointers, particularly those in non-XML documents, with namespaces. </p> <p> It's also come to light in this draft that Sun has <em xlink:type="simple" xlink:href= "http://lists.w3.org/Archives/Public/www-xml-linking-comments/2000OctDec/0092.html" > claimed a patent</em> on some of the technologies needed to implement XPointer. I think this is particularly offensive because Eve L. Maler, a Sun employee, served as co-chair of the XML Linking Working Group and a co-editor of the XPointer specification. As usual Sun wants to use this as a club to lock implementers and users into a licensing agreement that goes beyond what Sun and the W3C could otherwise demand. The specific patent is <cite>United States Patent No. 5,659,729, Method and system for implementing hypertext scroll attributes</cite>, issued to Jakob Nielsen in 1997. The patent was filed on February 1, 1996. It claims: </p> <blockquote> <xi:include href= "http://www.delphion.com/details?&pn=US05659729__#xpointer(//abstract)" ></xi:include> </blockquote> </story>
Any element can be a link
Links can be bi-directional
Links can even be multi-directional
Links can be separated from the documents they connect
<footnote xlink:type="simple" xlink:href="footnote7.xml">7</footnote>
Simple links are very similar to HTML links, one-directional, one-element-to-one-document links
Extended links are multi-directional, many-to-many links
An extended link is a list of nodes and a list of the connections between them
<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="extended" xlink:title="Cafe au Lait">
<NAME xlink:type="resource" xlink:label="source">
Cafe au Lait
</NAME>
<HOMESITE xlink:type="locator"
xlink:href="http://www.cafeaulait.org/"
xlink:label="ny"/>
<MIRROR xlink:type="locator"
xlink:title="Cafe au Lait Swedish Mirror"
xlink:label="se"
xlink:href="http://sunsite.kth.se/javafaq"/>
<MIRROR xlink:type="locator"
xlink:title="Cafe au Lait U.S. Mirror"
xlink:label="nc"
xlink:href="http://ibiblio.org/javafaq/"/>
<MIRROR xlink:type="locator"
xlink:title="Cafe au Lait Swiss Mirror"
xlink:label="ch"
xlink:href="http://sunsite.cnlab-switch.ch/javafaq"/>
<CONNECTION xlink:type="arc" xlink:from="source"
xlink:to="ch" xlink:show="replace"
xlink:actuate="onRequest"/>
<CONNECTION xlink:type="arc" xlink:from="source"
xlink:to="ny" xlink:show="replace"
xlink:actuate="onRequest"/>
<CONNECTION xlink:type="arc" xlink:from="source"
xlink:to="se" xlink:show="replace"
xlink:actuate="onRequest"/>
<CONNECTION xlink:type="arc" xlink:from="source"
xlink:to="nc" xlink:show="replace"
xlink:actuate="onRequest"/>
</WEBSITE>
A means of merging multiple XML documents or parts thereof
Not yet finished
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE book SYSTEM "book.dtd" >
<book xmlns:xinclude="http://www.w3.org/2001/XInclude">
<title>The Java Developer's Resource</title>
<last_modified>December 3, 2000</last_modified>
<xinclude:include href="getting_started.xml"/>
<xinclude:include href="procedural_java.xml"/>
</book>
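Python's xml.etree.ElementInclude module implements a subset of XInclude and can illustrate the merge. In this sketch the two chapter files are replaced by in-memory strings with made-up contents, and a custom loader serves them; the DOCTYPE and last_modified element are left out for brevity.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical stand-ins for the contents of the two included files.
PARTS = {
    "getting_started.xml": "<chapter><title>Getting Started</title></chapter>",
    "procedural_java.xml": "<chapter><title>Procedural Java</title></chapter>",
}


def loader(href, parse, encoding=None):
    """Return the parsed element for href; only parse="xml" is handled here."""
    return ET.fromstring(PARTS[href])


book = ET.fromstring(
    '<book xmlns:xinclude="http://www.w3.org/2001/XInclude">'
    "<title>The Java Developer's Resource</title>"
    '<xinclude:include href="getting_started.xml"/>'
    '<xinclude:include href="procedural_java.xml"/>'
    "</book>"
)

# Replaces each include element in place with the element the loader returns.
ElementInclude.include(book, loader=loader)
print([child.tag for child in book])
```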
XPath, the XML Path Language
MathML and XSL-FO are intended as output formats only. Other languages will be written and then transformed into these formats.
DTDs are only technically XML
A syntax for addressing into an XML document
Used in XPointer and XSLT
Basis for XML Query Language
descendant::language[position()=2]
/child::spec/child::body/child::*/child::language[2]
/spec/body/*/language[2]
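Python's xml.etree.ElementTree supports a small subset of the abbreviated XPath syntax, enough to try the last expression. The spec document below is a made-up miniature, and because ElementTree does not accept absolute paths on an element, the expression is written relative to the spec root.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of a spec document, for illustration only.
spec = ET.fromstring(
    "<spec><body>"
    "<div1><language>SGML</language><language>XML</language></div1>"
    "<div1><language>HTML</language></div1>"
    "</body></spec>"
)

# Equivalent of /spec/body/*/language[2], evaluated from the spec element:
# the second language child of any child of body.
matches = spec.findall("./body/*/language[2]")
print([m.text for m in matches])
```

Only the first div1 has a second language child, so a single element is selected.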
A syntax for addressing into an XML document
Extends XPath to support non-well-formed points and ranges
Used by XLink and XInclude
xpointer(id("ebnf"))
xpointer(descendant::language[position()=2])
ebnf
xpointer(/child::spec/child::body/child::*/child::language[2])
xpointer(/spec/body/*/language[2])
/1/14/2
xpointer(id("ebnf"))xpointer(id("EBNF"))
XPointers are normally attached to the end of URIs as fragment identifiers
This is how they're used by XLink and XInclude
http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(id("ebnf"))
http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(descendant::language[position()=2])
http://www.w3.org/TR/1998/REC-xml-19980210.xml#ebnf
http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(/child::spec/child::body/child::*/child::language[2])
http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(/spec/body/*/language[2])
http://www.w3.org/TR/1998/REC-xml-19980210.xml#/1/14/2
http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(id("ebnf"))xpointer(id("EBNF"))
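Because an XPointer is just the fragment identifier, generic URI tools can separate it from the document address. This sketch uses urldefrag from Python's urllib.parse; it only splits the string and does not evaluate the XPointer.

```python
from urllib.parse import urldefrag

uri = 'http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(id("ebnf"))'

# Everything after the first # is the fragment, which here is the XPointer.
document_uri, pointer = urldefrag(uri)
print(document_uri)
print(pointer)
```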
Examine the data
Design a vocabulary for the data
Write a style sheet
XML documents are trees.
XML elements contain other elements as well as text
Within these limits there's more than one way to organize the data
Hierarchically
Relationally
Objects
The catalog?
A custom Document element?
Choose catalog
for the root element
Everything else will be a descendant of catalog
This is not the only possible choice
<?xml version="1.0" encoding="ISO-8859-1"?> <catalog> Everything else will go here... </catalog>
Composers?
Songs/Compositions?
Categories?
All of the Above?
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<category>
Small chamber ensembles - 2-4 Players by New York Women Composers
</category>
</catalog>
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog><category>Small chamber ensembles
- 2-4 Players by New York Women Composers</category></catalog>
Each composer has a name
<?xml version="1.0" encoding="ISO-8859-1"?> <catalog> <category> Small chamber ensembles - 2-4 Players by New York Women Composers </category> <composer> <name>Julie Mandel</name> </composer> <composer> <name>Margaret De Wys</name> </composer> <composer> <name>Beth Anderson</name> </composer> <composer> <name>Linda Bouchard</name> </composer> </catalog>
It's better for sorting to divide names into first, middle, and last
Some (e.g. middle name) elements may be empty
<?xml version="1.0" encoding="ISO-8859-1"?> <catalog> <category> Small chamber ensembles - 2-4 Players by New York Women Composers </category> <composer> <name> <first_name>Julie</first_name> <middle_name></middle_name> <last_name>Mandel</last_name> </name> </composer> <composer> <name> <first_name>Margaret</first_name> <middle_name>De</middle_name> <last_name>Wys</last_name> </name> </composer> <composer> <name> <first_name>Beth</first_name> <middle_name></middle_name> <last_name>Anderson</last_name> </name> </composer> <composer> <name> <first_name>Linda</first_name> <middle_name></middle_name> <last_name>Bouchard</last_name> </name> </composer> </catalog>
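Splitting the name pays off as soon as the data has to be sorted. This sketch parses a trimmed copy of the catalog with Python's xml.etree.ElementTree and orders the composers by last name, then first name; the sort_key helper is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

catalog = ET.fromstring("""
<catalog>
  <composer><name><first_name>Julie</first_name><middle_name></middle_name><last_name>Mandel</last_name></name></composer>
  <composer><name><first_name>Margaret</first_name><middle_name>De</middle_name><last_name>Wys</last_name></name></composer>
  <composer><name><first_name>Beth</first_name><middle_name></middle_name><last_name>Anderson</last_name></name></composer>
  <composer><name><first_name>Linda</first_name><middle_name></middle_name><last_name>Bouchard</last_name></name></composer>
</catalog>
""")


def sort_key(composer):
    """Sort by last name, breaking ties with the first name."""
    return (composer.findtext("name/last_name"), composer.findtext("name/first_name"))


ordered = sorted(catalog.findall("composer"), key=sort_key)
last_names = [c.findtext("name/last_name") for c in ordered]
print(last_names)
```

With a single undivided name element, the same sort would need to guess where each last name begins.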
Some people have the same names
Use an ID number to disambiguate
Store the ID number in an id
attribute
name=value
An element may not have two attributes with the same name
Attribute values must be quoted
<?xml version="1.0" encoding="ISO-8859-1"?> <catalog> <category> Small chamber ensembles - 2-4 Players by New York Women Composers </category> <composer id="c1"> <name> <first_name>Julie</first_name> <middle_name></middle_name> <last_name>Mandel</last_name> </name> </composer> <composer id="c2"> <name> <first_name>Margaret</first_name> <middle_name>De</middle_name> <last_name>Wys</last_name> </name> </composer> <composer id="c3"> <name> <first_name>Beth</first_name> <middle_name></middle_name> <last_name>Anderson</last_name> </name> </composer> <composer id="c4"> <name> <first_name>Linda</first_name> <middle_name></middle_name> <last_name>Bouchard</last_name> </name> </composer> </catalog>
Attributes are for meta-data; elements are for data.
Does the reader want to see the information? If yes, use element content; if no, use attributes
Attributes are good for ID numbers, URLs, references, and other information not directly relevant to the reader
Attributes can't hold structure well.
Elements allow you to include meta-meta-data (information about the information about the information).
Not everyone always agrees on what is and isn't meta-data.
Elements are more extensible in the face of future changes.
Let's look at an example of what we want:
Rendered HTML:
Tonal. Commissioned/Premiered by the Redlands' New Music Ensemble. (A swale is a meadow or a marsh where a lot of wild plants grow together. The composer discovered the word when a horse named Swale won the Kentucky Derby several years ago. Since her work is primarily collage of newly composed musical swatches, she has used the name extensively.) ACA - American Composers Alliance
Or in HTML:
<dt><cite>Brass Swale</cite> (1988) 5", tbn, 2 Bfl tpts, bar. hn</dt>
<dd><p>
Tonal. Commissioned/Premiered by the Redlands' New Music
Ensemble. (A swale is a meadow or a marsh where a lot of
wild plants grow together. The composer discovered the word
when a horse named Swale won the Kentucky Derby several
years ago. Since her work is primarily collage of newly
composed musical swatches, she has used the name
extensively.) ACA - American Composers
Alliance</p>
</dd>
Title
Date
Description
List of instruments
Length
Publisher
Some pieces may be missing from some compositions
<composition>
<title>Brass Swale</title>
<date>1988</date>
<length>5"</length>
<instruments>tbn, 2 Bfl tpts, bar, hn</instruments>
<description>
Tonal. Commissioned/Premiered by the Redlands' New Music
Ensemble. (A swale is a meadow or a marsh where a lot of
wild plants grow together. The composer discovered the word
when a horse named Swale won the Kentucky Derby several
years ago. Since her work is primarily collage of newly
composed musical swatches, she has used the name
extensively.)
</description>
<publisher>ACA - American Composers Alliance</publisher>
</composition>
<composition>
<title>Trio for Flute, Viola and Harp</title>
<date><year>1994</year></date>
<length>13'38"</length>
<instruments>fl, hp, vla</instruments>
<description>
<p>Premiered at Queens College in April, 1996 by Sue Ann Kahn,
Christine Ims, and Susan Jolles. In 3 movements :</p>
<ul>
<li>mvt. 1: 5:01</li>
<li>mvt. 2: 4:11</li>
<li>mvt. 3: 4:26</li>
</ul>
</description>
<publisher>Theodore Presser</publisher>
</composition>
<composition composer="c3">
<title>Trio: Dream in D</title>
<date><year>1980</year></date>
<length>10'</length>
<instruments>fl, pn, vc, or vn, pn, vc</instruments>
<description>
Rhapsodic. Passionate. Available on CD
<cite><a href=
"http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr%3D1-2/">
Two by Three
</a></cite> from North/South Consonance (1998).
</description>
<publisher></publisher>
</composition>
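The composer attribute on a composition refers back to a composer's id, which lets a program join the two. This sketch uses Python's xml.etree.ElementTree on a trimmed catalog containing only composer c3 and this composition; the dictionary lookup is one possible approach, not the only one.

```python
import xml.etree.ElementTree as ET

catalog = ET.fromstring("""
<catalog>
  <composer id="c3">
    <name>
      <first_name>Beth</first_name>
      <middle_name></middle_name>
      <last_name>Anderson</last_name>
    </name>
  </composer>
  <composition composer="c3">
    <title>Trio: Dream in D</title>
  </composition>
</catalog>
""")

# Index composers by their id attribute, then resolve each composition.
composers = {c.get("id"): c for c in catalog.findall("composer")}

credits = []
for composition in catalog.findall("composition"):
    composer = composers[composition.get("composer")]
    credits.append(
        (composition.findtext("title"), composer.findtext("name/last_name"))
    )

print(credits)
```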
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<category>
Small chamber ensembles - 2-4 Players by New York Women Composers
</category>
<cataloging_info>
<abstract>Compositions by the members of New York Women Composers</abstract>
<keyword>music publishing</keyword>
<keyword>scores</keyword>
<keyword>women composers</keyword>
<keyword>New York</keyword>
</cataloging_info>
<composer id="c1">
<name>
<first_name>Julie</first_name>
<middle_name></middle_name>
<last_name>Mandel</last_name>
</name>
</composer>
...
</catalog>
Copyright notice
Name of maintainer
Email address of maintainer
Last modified date
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
...
<last_updated>July 28, 1999</last_updated>
<copyright>1999 New York Women Composers</copyright>
<maintainer email="elharo@metalab.unc.edu"
url="http://www.elharo.com/">
<name>
<first_name>Elliotte</first_name>
<middle_name>Rusty</middle_name>
<last_name>Harold</last_name>
</name>
</maintainer>
</catalog>
Partially supported by Mozilla, IE 5.0, and Opera 4.0
Full W3C Recommendation
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/css" href="compositions1.css"?>
<catalog>
...
</catalog>
Not every element needs a rule
The root element should be at least display: block
catalog { font-family: "New York", "Times New Roman", serif;
font-size: 14pt;
background-color: white;
color: black;
display: block }
Make it look like an H1 heading
category { display: block;
font-family: Helvetica, Arial, sans-serif;
font-size: 32pt;
font-weight: bold;
text-align: center
}
catalog { font-family: New York, Times New Roman, serif;
font-size: 14pt;
background-color: white;
color: black;
display: block
}
Make it look like a level 2 head
No need to stylize the first, middle, and last names separately
composer { display: block;
font-family: Helvetica, Arial, sans-serif;
font-size: 24pt;
font-weight: bold;
text-align: left
}
composition title { display: block;
font-family: Helvetica, Arial, sans-serif;
font-size: 18pt;
font-weight: bold;
text-align: left
}
/* cataloging_info is only for search engines */
cataloging_info { display: none;
color: white}
display: none
requires CSS2:
<last_updated>July 28, 1999</last_updated>
<copyright>1999 New York Women Composers</copyright>
<maintainer email="elharo@metalab.unc.edu"
url="http://www.elharo.com/">
<name>
<first_name>Elliotte</first_name>
<middle_name>Rusty</middle_name>
<last_name>Harold</last_name>
</name>
</maintainer>
last_updated, copyright, maintainer {display: block;
font-size: small}
copyright:before {content: "Copyright " }
last_updated:before {content: "Last Modified " }
last_updated {margin-top: 2ex }
Again, some of this requires CSS2
composition * {display:list-item}
description {display: block}
category { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 32pt; font-weight: bold; text-align: center } catalog { font-family: "New York", "Times New Roman", serif; font-size: 14pt; background-color: white; color: black; display: block } composer { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 24pt; font-weight: bold; text-align: left } composition title { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 18pt; font-weight: bold; text-align: left} composition * {display:list-item} description {display: block} /* cataloging_info is only for search engines */ cataloging_info { display: none; color: #FFFFFF} last_updated, copyright, maintainer {display: block; font-size: small} copyright:before {content: "Copyright " } last_updated:before {content: "Last Modified " } last_updated {margin-top: 2ex }
Should be able to match composers with compositions
Should be able to sort composers and compositions by name
Should be able to include data from attributes; e.g. the maintainer's email address
Horizontal rules would be nice
Better header (e.g. title and meta tags) would be nice
CSS Level 3?
XSL
XSL + JavaScript
CSS has broader support
CSS is more stable
XSL is much more powerful
XSL can be used without browser support by transforming to HTML on the server side
The great innovation of SGML was requiring documents to define their own type. The great innovation of XML was to remove this requirement.
--Joe English on the xml-dev mailing list, Tue, 26 Mar 2002
There are two levels of conformance to XML
Well-formed documents are correct with or without a DTD. They adhere to the basic syntax rules of XML
Valid documents also adhere to the constraints specified in a DTD
All valid documents are well-formed; not all well-formed documents are valid.
Open and close all tags
Empty-element tags end with />
There is a unique root element
Elements may not overlap
Attribute values are quoted
< and & are only used to start tags and entity references
Only the five predefined entity references are used
Plus more...
Good:
<p>The quick brown fox jumped over the lazy dog</p>
<li>A very <B>important</B> point</li>
Copyright 1999 Elliotte Rusty Harold<br></br>
Bad:
The quick brown fox jumped over the lazy dog<p>
<li>A very <B>important point
Copyright 1999 Elliotte Rusty Harold<br>
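A quick way to see the rule in action is to hand the Good and Bad examples to a non-validating parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed and the wrapper <p> and <doc> elements around the text-only examples are assumptions added here so that each fragment has a single root element.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Return True if the parser accepts the fragment as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

# Every start tag has a matching end tag.
good = [
    "<p>The quick brown fox jumped over the lazy dog</p>",
    "<li>A very <B>important</B> point</li>",
    "<p>Copyright 1999 Elliotte Rusty Harold<br></br></p>",
]

# Each of these leaves a tag open.
bad = [
    "<doc>The quick brown fox jumped over the lazy dog<p></doc>",
    "<li>A very <B>important point</li>",
    "<p>Copyright 1999 Elliotte Rusty Harold<br></p>",
]

for fragment in good + bad:
    print(is_well_formed(fragment), fragment)
```

The parser stops at the first well-formedness error, so a single unclosed tag is enough to reject the whole document.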
<BR/>, <HR/>, and <IMG/> instead of <BR>, <HR>, and <IMG>
Web browsers deal inconsistently with these
Can use <BR></BR>, <HR></HR>, and <IMG></IMG> instead
<BR CLASS="EMPTY"/> seems to work best.
One element completely contains all other elements of the document
This is HTML in HTML files
The XML declaration and xml-stylesheet processing instruction are not elements
If an element contains a start tag for an element, it must also contain the corresponding end tag
Empty elements may appear anywhere
Every non root element has a parent element
Good:
<A HREF="http://www.cafeconleche.org/">
<DIV ALIGN="CENTER">
<A HREF="http://www.cafeconleche.org/">
<EMBED SRC="minnesotaswale.aif" hidden="hidden">
Bad:
<A HREF=http://www.cafeconleche.org/>
<DIV ALIGN=CENTER>
<EMBED SRC=minnesotaswale.aif hidden=hidden>
<EMBED SRC="minnesotaswale.aif" hidden>
Good:
<H1>O'Reilly &amp; Associates</H1>
Bad:
<H1>O'Reilly & Associates</H1>
Good:
<CODE>for (int i = 0; i &lt;= args.length; i++ ) { </CODE>
Bad:
<CODE>for (int i = 0; i <= args.length; i++ ) { </CODE>
Good:
&amp;
&lt;
&gt;
&quot;
&apos;
Bad:
&copy;
&reg;
&tm;
&alpha;
&eacute;
etc.
Entity references must end with a semicolon.
&lt; is good
&lt is bad
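The entity rules can be checked the same way. This is a minimal sketch with Python's standard xml.etree.ElementTree; the helper name parses and the sample fragments are illustrative assumptions, not part of the original slides.

```python
import xml.etree.ElementTree as ET

def parses(fragment):
    """Return True if the parser accepts the fragment as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

# A predefined entity reference is replaced by the character it stands for.
heading = ET.fromstring("<H1>O'Reilly &amp; Associates</H1>")
print(heading.text)

# Each of these breaks one of the rules above.
print(parses("<H1>O'Reilly & Associates</H1>"))  # raw ampersand
print(parses("<P>&copy; 1999</P>"))              # entity never declared
print(parses("<P>1 &lt 2</P>"))                  # missing semicolon
```

All three of the last calls should report a failure. An HTML entity such as &copy; only becomes legal once a DTD declares it.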
Decimal:
&#161; | ¡ |
&#162; | ¢ |
&#163; | £ |
&#164; | ¤ |
&#165; | ¥ |
&#166; | ¦ |
etc. for all other Unicode values that are allowed in XML documents |
Hexadecimal:
&#xA1; | ¡ |
&#xA2; | ¢ |
&#xA3; | £ |
&#xA4; | ¤ |
&#xA5; | ¥ |
&#xA6; | ¦ |
etc. for all other Unicode values that are allowed in XML documents |
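Decimal and hexadecimal character references are two spellings of the same code point. The sketch below, which assumes Python's standard xml.etree.ElementTree and an arbitrary wrapper element named c, parses both forms and compares the results.

```python
import xml.etree.ElementTree as ET

# The same three characters written as decimal and as hexadecimal references.
decimal = ET.fromstring("<c>&#161;&#162;&#163;</c>").text
hexadecimal = ET.fromstring("<c>&#xA1;&#xA2;&#xA3;</c>").text

print(decimal == hexadecimal)
print([hex(ord(ch)) for ch in decimal])
```

Unlike entity references, character references need no declaration, so they work in any document regardless of its DTD.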
Not like HTML, troff, LaTeX
Make up the tags you need as you need them
The tags you create can be documented in a Document Type Definition (DTD)
A meta syntax for domain-specific markup languages like MusicML, MathML, and CML
<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
<!ELEMENT SONG (TITLE, COMPOSER+, PRODUCER*, PUBLISHER*, LENGTH?, YEAR?, ARTIST+)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT COMPOSER (#PCDATA)>
<!ELEMENT PRODUCER (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ELEMENT LENGTH (#PCDATA)>
<!-- This should be a four digit year like "1999",
     not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>
<!ELEMENT ARTIST (#PCDATA)>
<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
To be valid, an XML document must:
Be well-formed
Have a document type declaration
Comply with the constraints specified in the DTD
To check validity you pass the document through a validating parser which should report any errors it finds. For example,
% java sax.Counter -v invalidhotcop.xml
Error at (file file:/D:/speaking/SD99EAST/dtds/invalidhotcop.xml, line 10, char 8): Element "<SONG>" is not valid because it does not follow the rule, "(TITLE,COMPOSER+,PRODUCER*,PUBLISHER*,LENGTH?,YEAR?,ARTIST+)".
invalidhotcop.xml: 281 ms
A valid document:
% java sax.Counter -v validhotcop.xml
validhotcop.xml: 170 ms
<?xml version="1.0"?>
<!DOCTYPE SONG [
  <!ELEMENT SONG (TITLE, COMPOSER+, PRODUCER*, PUBLISHER*, LENGTH?, YEAR?, ARTIST+)>
  <!ELEMENT TITLE (#PCDATA)>
  <!ELEMENT COMPOSER (#PCDATA)>
  <!ELEMENT PRODUCER (#PCDATA)>
  <!ELEMENT PUBLISHER (#PCDATA)>
  <!ELEMENT LENGTH (#PCDATA)>
  <!-- This should be a four digit year like "1999",
       not a two-digit year like "99" -->
  <!ELEMENT YEAR (#PCDATA)>
  <!ELEMENT ARTIST (#PCDATA)>
]>
<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
Domain-Specific Markup Languages
Self-Describing Data
Interchange of Data Among Applications
A DTD precisely describes the format
DTDs verify that documents adhere to the format
Ensures interoperability of unrelated tools
DTDs explain the format so reverse engineering isn't as necessary
Comments in DTDs can go even further
<!-- This should be a four digit year like "1999",
not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>
E-commerce and syndication
DTDs make sure that two independent applications speak the same language
DTDs detect malformed data
DTDs verify correct data
A specific markup language that uses the XML meta-syntax is called an XML application
Different XML applications have their own more constricted syntaxes and vocabularies within the broader XML syntax
The DTD documents this syntax
Further syntax can be layered on top of this; e.g. data typing through schemas
A Document Type Definition describes the elements and attributes that may appear in a document
Validation compares a particular document against a DTD
Well-formedness is a prerequisite for validity
A DTD lists the elements, attributes, and entities contained in a document
A DTD defines the relationships between different elements and attributes
internal vs. external DTDs
Ensures that data is correct before feeding it into a program
Ensures that a format is followed
Establishes what must be supported
Not all documents need to be valid; sometimes well-formed is enough
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE catalog SYSTEM "compositions.dtd">
<catalog>
  <category>
    Small chamber ensembles - 2-4 Players by New York Women Composers
  </category>
  <cataloging_info>
    <abstract>Compositions by the members of New York Women Composers</abstract>
    <keyword>music publishing</keyword>
    <keyword>scores</keyword>
    <keyword>women composers</keyword>
    <keyword>New York</keyword>
  </cataloging_info>
  <last_updated>July 28, 1999</last_updated>
  <copyright>1999 New York Women Composers</copyright>
  <maintainer email="elharo@metalab.unc.edu" url="http://www.elharo.com/">
    <name>
      <first_name>Elliotte</first_name>
      <middle_name>Rusty</middle_name>
      <last_name>Harold</last_name>
    </name>
  </maintainer>
  <composer id="c1">
    <name>
      <first_name>Julie</first_name>
      <middle_name></middle_name>
      <last_name>Mandel</last_name>
    </name>
  </composer>
  <composer id="c2">
    <name>
      <first_name>Margaret</first_name>
      <middle_name>De</middle_name>
      <last_name>Wys</last_name>
    </name>
  </composer>
  <composer id="c3">
    <name>
      <first_name>Beth</first_name>
      <middle_name></middle_name>
      <last_name>Anderson</last_name>
    </name>
  </composer>
  <composer id="c4">
    <name>
      <first_name>Linda</first_name>
      <middle_name></middle_name>
      <last_name>Bouchard</last_name>
    </name>
  </composer>
  <composition composer="c1">
    <title>Trio for Flute, Viola and Harp</title>
    <date><year>1994</year></date>
    <length>13'38"</length>
    <instruments>fl, hp, vla</instruments>
    <description>
      <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn,
      Christine Ims, and Susan Jolles. In 3 movements :</p>
      <ul>
        <li>mvt. 1: 5:01</li>
        <li>mvt. 2: 4:11</li>
        <li>mvt. 3: 4:26</li>
      </ul>
    </description>
    <publisher>Theodore Presser</publisher>
  </composition>
  <composition composer="c2">
    <title>Charmonium</title>
    <date><year>1991</year></date>
    <length>9'</length>
    <instruments>2 vln, vla, vc</instruments>
    <description>
      <p>Commissioned as quartet for the Meridian String Quartet.
      Sonorous, bold. Moderate difficulty. Tape available.</p>
    </description>
  </composition>
  <composition composer="c1">
    <title>Invention for Flute and Piano</title>
    <date><year>1994</year></date>
    <instruments>fl, pn</instruments>
    <description><p>3 movements</p></description>
  </composition>
  <composition composer="c3">
    <title>Little Trio</title>
    <date><year>1984</year></date>
    <length>4'</length>
    <instruments>fl, guit, va</instruments>
    <publisher>ACA</publisher>
  </composition>
  <composition composer="c3">
    <title>Dr. Blood's Mermaid Lullaby</title>
    <date><year>1980</year></date>
    <length>3'</length>
    <instruments>fl or ob, or vn, or vc, pn</instruments>
    <publisher>ACA</publisher>
  </composition>
  <composition composer="c3">
    <title>Trio: Dream in D</title>
    <date><year>1980</year></date>
    <length>10'</length>
    <instruments>fl, pn, vc, or vn, pn, vc</instruments>
    <description>
      <p>Rhapsodic. Passionate. Available on CD
      <cite>
        <a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr=1-2/">
        Two by Three</a></cite>
      from North/South Consonance (1998).</p>
    </description>
  </composition>
  <composition composer="c4">
    <title>Propos II</title>
    <date><year>1985</year></date>
    <length>11'</length>
    <instruments>2 tpt</instruments>
    <description><p>Arrangement from Propos</p></description>
  </composition>
  <composition composer="c4">
    <title>Rictus En Mirroir</title>
    <date><year>1985</year></date>
    <length>14'</length>
    <instruments>fl, ob, hpschd, vc</instruments>
  </composition>
</catalog>
Each element must be declared in a <!ELEMENT> declaration.
A <!ELEMENT> declaration gives the name and content specification of the element
The content specification uses a simple regular expression-like grammar to precisely specify what is and isn't allowed in an element
ANY
#PCDATA
Sequences
Choices
Mixed Content
Modifiers
EMPTY
<!ELEMENT catalog ANY>
A catalog can contain any child element and/or raw text (parsed character data)
Parsed Character Data; i.e. raw text, no markup. For example,
<year>1984</year>
<!ELEMENT year (#PCDATA)>
Valid:
<year>1999</year>
<year>99</year>
<year>1999 C.E.</year>
<year>
The year of our Lord one thousand, nine hundred, and ninety-nine
</year>
Invalid:
<year>
<month>January</month>
<month>February</month>
<month>March</month>
<month>April</month>
<month>May</month>
<month>June</month>
<month>July</month>
<month>August</month>
<month>September</month>
<month>October</month>
<month>November</month>
<month>December</month>
</year>
There are a number of elements in the example document that only contain PCDATA:
<!ELEMENT category (#PCDATA)>
<!ELEMENT abstract (#PCDATA)>
<!ELEMENT keyword (#PCDATA)>
<!ELEMENT last_updated (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT middle_name (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT instruments (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT length (#PCDATA)>
DTDs seem fundamentally more obfuscated than C.
Comments can improve this by giving example elements
Comments are the same as in HTML; e.g. <!-- Comment -->
<!-- e.g. "1999 New York Women Composers",
not "Copyright 1999 New York Women Composers" -->
<!ELEMENT copyright (#PCDATA)>
<date><year>1994</year></date>
To declare that a date element must have a year child:
<!ELEMENT date (year)>
You only have to declare the immediate children
<maintainer email="elharo@metalab.unc.edu"
url="http://www.elharo.com/">
<name>
<first_name>Elliotte</first_name>
<middle_name>Rusty</middle_name>
<last_name>Harold</last_name>
</name>
</maintainer>
<composer id="c1">
<name>
<first_name>Julie</first_name>
<middle_name></middle_name>
<last_name>Mandel</last_name>
</name>
</composer>
To declare that a maintainer element must have a name child:
<!ELEMENT maintainer (name)>
<!ELEMENT composer (name)>
<name>
<first_name>Elliotte</first_name>
<middle_name>Rusty</middle_name>
<last_name>Harold</last_name>
</name>
Separate multiple required child elements with commas; e.g.
<!ELEMENT name (first_name, middle_name, last_name)>
A list of child elements separated by commas is called a sequence
ELEMENT
The element being described must have only child elements, no mixed content
You must know the order of the child elements
You must know the type of each child element
You must know the number of child elements
The number can be relaxed with wild cards
<cataloging_info>
<abstract>Compositions by the members of New York Women Composers</abstract>
<keyword>music publishing</keyword>
<keyword>scores</keyword>
<keyword>women composers</keyword>
<keyword>New York</keyword>
</cataloging_info>
The + suffix indicates that one or more of that element is required at that point
<!ELEMENT cataloging_info (abstract, keyword+)>
The * suffix indicates that zero or more of that element are allowed at that point
<!ELEMENT catalog (category, cataloging_info, last_updated, copyright,
maintainer, composer*, composition*)>
<composition composer="c1">
<title>Trio for Flute, Viola and Harp</title>
<date><year>1994</year></date>
<length>13'38"</length>
<instruments>fl, hp, vla</instruments>
<description>
<p>Premiered at Queens College in April, 1996 by Sue Ann Kahn,
Christine Ims, and Susan Jolles. In 3 movements :</p>
<ul>
<li>mvt. 1: 5:01</li>
<li>mvt. 2: 4:11</li>
<li>mvt. 3: 4:26</li>
</ul>
</description>
<publisher>Theodore Presser</publisher>
</composition>
Suffixing an element name with a question mark (?) in the content model indicates that either 0 or 1 (but not more than one) of that element is expected at that position
<!ELEMENT composition
(title, date, length?, instruments, description?, publisher?)>
A choice indicates one element or another but not both
A choice is signified by a vertical bar |
There can be two or more elements in a choice
<!ELEMENT date (year | ISODate)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT ISODate (#PCDATA)>
Parentheses combine several elements into a single element.
Parenthesized elements can be nested inside other parentheses in place of a single element.
The parenthesized elements can be suffixed with a plus sign, an asterisk, or a question mark.
<!ELEMENT dl (dt, dd)*>
<!ELEMENT ARTICLE (TITLE, (P | PHOTO | GRAPH | SIDEBAR | PULLQUOTE | SUBHEAD)*, BYLINE?)>
<!ELEMENT catalog (category, cataloging_info, last_updated,
copyright, maintainer, (composer | composition)*)>
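Because content models are regular-expression-like, the sequence, choice, suffix and parenthesis rules can be illustrated by translating a model into an ordinary regular expression over the names of an element's children. The sketch below is only an illustration under stated assumptions: the function names content_model_to_regex and children_match are hypothetical, it handles element content only (not #PCDATA, mixed content, EMPTY or ANY), and it is not how a real validating parser is implemented.

```python
import re

def content_model_to_regex(model):
    """Translate a DTD element-content model into a regex over child names.

    Each child name is matched followed by one space, so the child list
    ["TITLE", "ARTIST"] is tested as the string "TITLE ARTIST ".
    """
    pattern = []
    for token in re.findall(r"[A-Za-z_][\w.\-]*|[()|,?*+]", model):
        if token == "(":
            pattern.append("(?:")       # grouping only, nothing is captured
        elif token == ",":
            continue                    # a sequence is plain concatenation
        elif token in ")|?*+":
            pattern.append(token)       # same meaning in DTDs and regexes
        else:
            pattern.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(pattern))

def children_match(model, children):
    """Return True if the list of child element names satisfies the model."""
    text = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(text) is not None

SONG = "(TITLE, COMPOSER+, PRODUCER*, PUBLISHER*, LENGTH?, YEAR?, ARTIST+)"
print(children_match(SONG, ["TITLE", "COMPOSER", "COMPOSER", "ARTIST"]))
print(children_match(SONG, ["TITLE", "ARTIST"]))  # no COMPOSER
print(children_match("(dt, dd)*", ["dt", "dd", "dt", "dd"]))
print(children_match("(year | ISODate)", ["year", "ISODate"]))  # both, not one
```

The second and fourth calls should fail: the first of them omits the required COMPOSER, and the other supplies both alternatives of a choice.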
<!ELEMENT description (#PCDATA | ul | a | cite | p)*>
<!ELEMENT cite (#PCDATA | a)*>
<!ELEMENT ul (li*)>
<!ELEMENT li (#PCDATA)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT p (#PCDATA)>
Mixed content is both #PCDATA and child elements in a choice, followed by an asterisk
This is the only way to combine PCDATA with child elements in a content specification
#PCDATA must come first
#PCDATA cannot be used in a sequence
<!ELEMENT BR EMPTY>
<!ELEMENT IMG EMPTY>
<!ELEMENT HR EMPTY>
Mixed content with other content models
Exactly one element of a given type but in any position (The SGML & operator)
Between M and N of a given element
Restrictions on the PCDATA; e.g. that the year element must contain a four-digit year
Recall this element:
<maintainer email="elharo@metalab.unc.edu"
url="http://www.elharo.com/">
<name>
<first_name>Elliotte</first_name>
<middle_name>Rusty</middle_name>
<last_name>Harold</last_name>
</name>
</maintainer>
It is declared like this:
<!ELEMENT maintainer (name)>
<!ATTLIST maintainer email CDATA "webmaster@nywc.org">
<!ATTLIST maintainer url CDATA "http://www.ibiblio.org/nywc">
The general format of an <!ATTLIST> declaration is:
<!ATTLIST Element_name Attribute_name Type Default_value>
<maintainer email="elharo@metalab.unc.edu"
url="http://www.elharo.com/">
<name>
<first_name>Elliotte</first_name>
<middle_name>Rusty</middle_name>
<last_name>Harold</last_name>
</name>
</maintainer>
It is declared like this:
<!ELEMENT maintainer (name)>
<!ATTLIST maintainer email CDATA "webmaster@nywc.org">
<!ATTLIST maintainer url CDATA "http://www.ibiblio.org/nywc">
But it can also be declared in a single <!ATTLIST> declaration like this:
<!ATTLIST maintainer email
CDATA "webmaster@nywc.org" url CDATA "http://www.ibiblio.org/nywc/">
This is more obvious with better indentation:
<!ATTLIST maintainer email CDATA "webmaster@nywc.org"
url CDATA "http://www.ibiblio.org/nywc/">
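A literal default means the parser supplies the attribute whenever the document omits it. The sketch below is a hedged illustration using Python's standard xml.etree.ElementTree, which in CPython sits on the expat parser and reads defaults from the internal DTD subset. The simplified name element with text content is an assumption made to keep the example short.

```python
import xml.etree.ElementTree as ET

# The document gives a url but no email, so the email default should apply.
doc = """<!DOCTYPE maintainer [
  <!ELEMENT maintainer (name)>
  <!ELEMENT name (#PCDATA)>
  <!ATTLIST maintainer email CDATA "webmaster@nywc.org"
                       url   CDATA "http://www.ibiblio.org/nywc/">
]>
<maintainer url="http://www.elharo.com/"><name>Elliotte Rusty Harold</name></maintainer>"""

maintainer = ET.fromstring(doc)
print(maintainer.get("url"))    # value written in the document
print(maintainer.get("email"))  # value taken from the ATTLIST default
```

Defaults declared in an external DTD are a different matter: a non-validating parser is not required to read the external subset, so relying on them is less portable.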
A literal string value
One of these three keywords
#REQUIRED
#IMPLIED
#FIXED
No default value is provided in the DTD
Document authors must provide an attribute value for each element
<!ELEMENT maintainer (name)>
<!ATTLIST maintainer email CDATA #REQUIRED
url CDATA #REQUIRED>
<!ELEMENT a (#PCDATA)>
<!ATTLIST a href CDATA #IMPLIED>
No default value in the DTD
Author may (but does not have to) provide a value with each element
Value is the same for all elements
Default value must be provided in DTD
Document author may not change default value
<!ELEMENT maintainer (name)>
<!ATTLIST maintainer email CDATA #FIXED "webmaster@nywc.org"
url CDATA #REQUIRED>
CDATA
ID
IDREF
IDREFS
ENTITY
ENTITIES
NOTATION
NMTOKEN
NMTOKENS
Enumerated
Most general attribute type
Value can be any string of text not containing a raw less-than sign (<) or quotation marks (")
<!ATTLIST maintainer email CDATA #REQUIRED
url CDATA #IMPLIED>
Value must be an XML name
May include letters, digits, underscores, hyphens, and periods
May not include whitespace
May or may not have the name "id" or "ID"
May contain colons only if used for namespaces
Value must be unique within ID type attributes in the document
Generally the default value is #REQUIRED
<!ELEMENT composer (name)>
<!ATTLIST composer id ID #REQUIRED>
Value matches the ID of an element in the same document
Used for links and the like
Multiple elements may share the same IDREF values
<!ELEMENT composition (title, date, length?,
instruments, description?, publisher?)>
<!ATTLIST composition composer IDREF #REQUIRED>
A list of ID values in the same document
Separated by white space
<!ELEMENT composition (title, date, length?,
instruments, description?, publisher?)>
<!ATTLIST composition composer IDREFS #REQUIRED>
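The ID and IDREFS constraints amount to two checks: every id is unique, and every reference points at an id that exists. A non-validating parser does not enforce them, so the sketch below performs the checks by hand with Python's standard xml.etree.ElementTree. The function name check_ids, the shortened composer names and the last two compositions are hypothetical, added only to exercise the checks.

```python
import xml.etree.ElementTree as ET

catalog = ET.fromstring("""
<catalog>
  <composer id="c1"><name>Julie Mandel</name></composer>
  <composer id="c3"><name>Beth Anderson</name></composer>
  <composition composer="c1"><title>Trio for Flute, Viola and Harp</title></composition>
  <composition composer="c3"><title>Little Trio</title></composition>
  <composition composer="c1 c3"><title>Hypothetical joint work</title></composition>
  <composition composer="c9"><title>Hypothetical orphan</title></composition>
</catalog>
""")

def check_ids(root):
    """Return (duplicate_ids, dangling_refs) for composer and composition elements."""
    seen, duplicates = set(), []
    for composer in root.iter("composer"):
        ident = composer.get("id")
        if ident in seen:
            duplicates.append(ident)
        seen.add(ident)
    # An IDREFS value is a whitespace-separated list of IDs.
    dangling = [ref
                for composition in root.iter("composition")
                for ref in composition.get("composer", "").split()
                if ref not in seen]
    return duplicates, dangling

print(check_ids(catalog))
```

Here the only problem reported should be the reference to c9, which matches no composer.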
Not a keyword
Refers to a list of possible values from which one must be chosen
Default value is generally provided explicitly
<!ATTLIST P VISIBLE (TRUE | FALSE) "TRUE">
<!ELEMENT category (#PCDATA)>
<!ELEMENT abstract (#PCDATA)>
<!ELEMENT keyword (#PCDATA)>
<!ELEMENT last_updated (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT instruments (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT length (#PCDATA)>
<!ELEMENT date (year | ISODate)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT ISODate (#PCDATA)>
<!ELEMENT catalog (category, cataloging_info, last_updated,
copyright, maintainer, (composer | composition)*)>
<!ELEMENT cataloging_info (abstract, keyword+)>
<!ELEMENT description (#PCDATA | ul | a | cite | p)*>
<!ELEMENT cite (#PCDATA | a)*>
<!ELEMENT ul (li*)>
<!ELEMENT li (#PCDATA)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT p (#PCDATA)>
<!ELEMENT maintainer (name)>
<!ELEMENT name (first_name, middle_name, last_name)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT middle_name (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>
<!ATTLIST maintainer email CDATA #REQUIRED
url CDATA #IMPLIED>
<!ELEMENT composer (name)>
<!ATTLIST composer id ID #REQUIRED>
<!ELEMENT composition (title, date, length?, instruments, description?, publisher?)>
<!ATTLIST composition composer IDREFS #REQUIRED>
<!ATTLIST a href CDATA #REQUIRED>
An abbreviation for commonly used or hard to type text
Begin with an ampersand and end with a semicolon
&alpha;
&quot;
&copyright;
&signature;
Declared in a <!ENTITY> declaration
<!ENTITY copyright "Copyright 2000">
<!ENTITY quot "&#34;">
<!ENTITY signature
"<SIGNATURE>
<COPYRIGHT>2000 Elliotte Rusty Harold</COPYRIGHT>
<EMAIL>elharo@metalab.unc.edu</EMAIL>
<LAST_MODIFIED>March 10, 2000</LAST_MODIFIED>
</SIGNATURE>"
>
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
  <!ENTITY ERH "Elliotte Rusty Harold">
  <!ELEMENT DOCUMENT (TITLE, SIGNATURE)>
  <!ELEMENT TITLE (#PCDATA)>
  <!ELEMENT COPYRIGHT (#PCDATA)>
  <!ELEMENT EMAIL (#PCDATA)>
  <!ELEMENT LAST_MODIFIED (#PCDATA)>
  <!ELEMENT SIGNATURE (COPYRIGHT, EMAIL, LAST_MODIFIED)>
]>
<DOCUMENT>
  <TITLE>&ERH;</TITLE>
  <SIGNATURE>
    <COPYRIGHT>1999 &ERH;</COPYRIGHT>
    <EMAIL>elharo@metalab.unc.edu</EMAIL>
    <LAST_MODIFIED>March 10, 1999</LAST_MODIFIED>
  </SIGNATURE>
</DOCUMENT>
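An internal general entity is expanded by the parser before the application sees the text. The sketch below, assuming Python's standard xml.etree.ElementTree and a cut-down version of the document above, shows the replacement text arriving in place of each &ERH; reference.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE DOCUMENT [
  <!ENTITY ERH "Elliotte Rusty Harold">
]>
<DOCUMENT>
  <TITLE>&ERH;</TITLE>
  <COPYRIGHT>1999 &ERH;</COPYRIGHT>
</DOCUMENT>"""

root = ET.fromstring(doc)
title = root.findtext("TITLE")
copyright_line = root.findtext("COPYRIGHT")
print(title)
print(copyright_line)
```

The application never sees the reference itself, only the text it stands for.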
A general entity reference that refers to a different file
Parsed and Unparsed
<!ENTITY AlLeiter SYSTEM "mets/AlLeiter.xml">
<!ENTITY ArmandoReynoso SYSTEM "mets/ArmandoReynoso.xml">
<!ENTITY BobbyJones SYSTEM "mets/BobbyJones.xml">
<!ENTITY BradClontz SYSTEM "mets/BradClontz.xml">
<!ENTITY DennisCook SYSTEM "mets/DennisCook.xml">
<!ENTITY GregMcmichael SYSTEM "mets/GregMcMichael.xml">
<!ENTITY HideoNomo SYSTEM "mets/HideoNomo.xml">
<!ENTITY JohnFranco SYSTEM "mets/JohnFranco.xml">
<!ENTITY JosiasManzanillo SYSTEM "mets/JosiasManzanillo.xml">
<!ENTITY OctavioDotel SYSTEM "mets/OctavioDotel.xml">
<!ENTITY RickReed SYSTEM "mets/RickReed.xml">
<!ENTITY RigoBeltran SYSTEM "mets/RigoBeltran.xml">
<!ENTITY WillieBlair SYSTEM "mets/WillieBlair.xml">
Prolog is only a text declaration
Document is not valid and may not be well-formed because it may not have a root element.
<?xml version="1.0" encoding="UTF-8"?>
<PLAYER>
  <GIVEN_NAME>Al</GIVEN_NAME>
  <SURNAME>Leiter</SURNAME>
  <P>Starting Pitcher</P>
  <G>28</G>
  <GS>28</GS>
  <W>17</W>
  <L>6</L>
  <SV>0</SV>
  <CG>4</CG>
  <SO>2</SO>
  <ERA>2.47</ERA>
  <IP>193</IP>
  <HRA>8</HRA>
  <RA>55</RA>
  <ER>53</ER>
  <HB>11</HB>
  <WP>4</WP>
  <B>1</B>
  <WB>71</WB>
  <K>174</K>
</PLAYER>
<?xml version="1.0" standalone="no"?>
<!DOCTYPE TEAM SYSTEM "team.dtd" [
  <!ENTITY % players SYSTEM "mets.dtd">
  %players;
]>
<TEAM>
  <TEAM_CITY>New York</TEAM_CITY>
  <TEAM_NAME>Mets</TEAM_NAME>
  &AlLeiter;
  &ArmandoReynoso;
  &BobbyJones;
  &BradClontz;
  &DennisCook;
  &GregMcmichael;
  &HideoNomo;
  &JohnFranco;
  &JosiasManzanillo;
  &OctavioDotel;
  &RickReed;
  &RigoBeltran;
  &WillieBlair;
</TEAM>
Only used in DTDs
Use a % instead of an &:
%inlines;
%block;
%mathml-prefix;
%mathml-colon;
Declared in a <!ENTITY %> declaration
<!ENTITY % ERH "Elliotte Rusty Harold">
<!ENTITY COPY99 "Copyright 1999 %ERH;">
<!ENTITY % inlines
"(PERSON | DEGREE | MODEL | PRODUCT | ANIMAL | INGREDIENT)*">
<!ELEMENT PARAGRAPH %inlines;>
<!ELEMENT CELL %inlines;>
<!ELEMENT HEADING %inlines;>
Only used in DTDs
Pull in other DTD fragments
Add a SYSTEM keyword to the declaration:
<!ENTITY % player SYSTEM "player.dtd">
%player;
Can use a full URL:
<!ENTITY % player SYSTEM "http://www.cafeconleche.org/dtds/player.dtd">
%player;
<description>
<p>Premiered at Queens College in April, 1996 by Sue Ann Kahn,
Christine Ims, and Susan Jolles. In 3 movements :</p>
<ul>
<li>mvt. 1: 5:01</li>
<li>mvt. 2: 4:11</li>
<li>mvt. 3: 4:26</li>
</ul>
</description>
<!ELEMENT description (#PCDATA | ul | a | cite | p)*>
<!ELEMENT cite (#PCDATA | a)*>
<!ELEMENT ul (li*)>
<!ELEMENT li (#PCDATA)>
<!ELEMENT p (#PCDATA)>
<!ELEMENT a (#PCDATA)>
<!ATTLIST a href CDATA #IMPLIED>
XHTML is a reformulation of HTML as strict XML
Tags must be closed
Attribute values must be quoted
<br/> instead of <br>, etc.
W3C Recommendation 26 January 2000
Includes three DTDs for HTML:
Strict
Transitional
Frameset
What if we can use one of those DTDs instead of inventing our own?
<!ENTITY % xhtml1 SYSTEM "http://www.w3.org/TR/xhtml1/DTD/strict.dtd">
%xhtml1;
<!ENTITY % xhtml1 SYSTEM "http://www.w3.org/TR/xhtml1/DTD/strict.dtd">
%xhtml1;
<!ELEMENT category (#PCDATA)>
<!ELEMENT abstract (#PCDATA)>
<!ELEMENT keyword (#PCDATA)>
<!ELEMENT last_updated (#PCDATA)>
<!-- e.g. "1999 New York Women Composers",
     not "Copyright 1999 New York Women Composers" -->
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT instruments (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT length (#PCDATA)>
<!ELEMENT date (year | ISODate)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT ISODate (#PCDATA)>
<!ELEMENT catalog (category, cataloging_info, last_updated,
copyright, maintainer, (composer | composition)*)>
<!ELEMENT cataloging_info (abstract, keyword+)>
<!ELEMENT description %Block;>
<!ELEMENT maintainer (name)>
<!ELEMENT name (first_name, middle_name, last_name)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT middle_name (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>
<!ATTLIST maintainer email CDATA #REQUIRED
url CDATA #IMPLIED>
<!ELEMENT composer (name)>
<!ATTLIST composer id ID #REQUIRED>
<!ELEMENT composition (title, date, length?, instruments, description?, publisher?)>
<!ATTLIST composition composer IDREFS #REQUIRED>
<!ATTLIST a href CDATA #IMPLIED>
<?xml version="1.0"?>
<!DOCTYPE document SYSTEM "http://www.w3.org/TR/xhtml1/DTD/transitional.dtd" [
<!ELEMENT document %Block; >
]>
<document>
<p>Hello There!</p>
</document>
Use XML syntax to describe the allowed content of an XML document rather than DTD syntax
Allow restrictions to be placed on PCDATA content; e.g. that the contents of an element must be an integer between 1 and 10
Area of active research and development
markup isn't about meaning at all; XML just gives you a way to send a bundle of labeled strings of text, with recursion and internationalization, from point A to point B. Namespaces allow the labels to come from multiple vocabularies, and make it cheap for software to find the labeled chunks it cares about.
--Tim Bray on the xml-dev mailing list
To distinguish between elements and attributes from different vocabularies with different meanings.
To group all related elements and attributes together so that a parser can easily recognize them.
The XLink specification defines an attribute with the name href.
The XHTML specification also uses href attributes on some elements.
And the XInclude specification uses href attributes.
An XSLT style sheet that will transform XHTML documents containing both Scalable Vector Graphics (SVG) pictures and MathML equations into XSL-Formatting object documents.
The a, title, script, style and font elements in XHTML and SVG
The table element in XHTML and XSL-FO
The text element in XSLT and SVG
The set element in MathML and SVG
An XSLT stylesheet that transforms a style sheet in an older version of the XSLT specification to a style sheet in a newer version of the XSLT specification.
Namespaces disambiguate elements with the same name from each other by attaching different prefixes to names from different XML applications.
Each prefix is associated with a URI.
Names whose prefixes are associated with the same URI are in the same namespace.
Names whose prefixes are associated with different URIs are in different namespaces.
Elements and attributes that are in namespaces have names that contain exactly one colon. They look like this:
rdf:description
xlink:type
xsl:template
Everything before the colon is called the prefix
Everything after the colon is called the local part.
The complete name including the colon is called the qualified name.
Each prefix in a qualified name is associated with a URI.
For example, all elements in XSLT 1.0 style sheets are associated with the http://www.w3.org/1999/XSL/Transform URI.
The customary prefix xsl is a shorthand for the longer URI http://www.w3.org/1999/XSL/Transform.
You can't use the URI in the element name directly.
{http://www.w3.org/1999/XSL/Transform}template
Prefixes are bound to namespace URIs by attaching an xmlns:prefix attribute to the prefixed element or one of its ancestors.
<svg:svg xmlns:svg="http://www.w3.org/2000/svg"
width="12cm" height="10cm">
<svg:ellipse rx="110" ry="130" />
<svg:rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg:svg>
Bindings have scope within the element where they're declared.
An SVG processor can recognize all three of these elements as SVG elements because they all have prefixes bound to the particular URI defined by the SVG specification.
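A namespace-aware API shows what the processor actually sees. In the sketch below, Python's standard xml.etree.ElementTree reports each element name in the {URI}local form mentioned earlier, with the prefix already resolved; the variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

svg = ET.fromstring("""
<svg:svg xmlns:svg="http://www.w3.org/2000/svg" width="12cm" height="10cm">
  <svg:ellipse rx="110" ry="130" />
  <svg:rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg:svg>
""")

# Each tag is reported as {namespace URI}local part; the prefix is gone.
tags = [element.tag for element in svg.iter()]
for tag in tags:
    print(tag)

# Software recognizes SVG elements by the URI, not by the svg prefix.
print(all(tag.startswith("{" + SVG_NS + "}") for tag in tags))
```

The binding declared on the root element is in scope for both children, which is why all three names resolve to the same URI.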
<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:xlink="http://www.w3.org/1999/xlink">
<xhtml:head><xhtml:title>Three Namespaces</xhtml:title></xhtml:head>
<xhtml:body>
<xhtml:h1 align="center">An Ellipse and a Rectangle</xhtml:h1>
<svg:svg xmlns:svg="http://www.w3.org/2000/svg"
width="12cm" height="10cm">
<svg:ellipse rx="110" ry="130" />
<svg:rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg:svg>
<xhtml:p xlink:type="simple"
xlink:href="ellipses.html">
More about ellipses
</xhtml:p>
<xhtml:p xlink:type="simple" xlink:href="rectangles.html">
More about rectangles
</xhtml:p>
<xhtml:hr/>
<xhtml:p>Last Modified February 13, 2000</xhtml:p>
</xhtml:body>
</xhtml:html>
<!ATTLIST svg:svg xmlns:svg CDATA
#FIXED "http://www.w3.org/2000/svg">
<svg:svg width="12cm" height="10cm">
<svg:ellipse rx="110" ry="130" />
<svg:rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg:svg>
Unprefixed attributes are never in any namespace.
Being an attribute of an element in the http://www.w3.org/1999/xhtml namespace is not sufficient to put the attribute in the http://www.w3.org/1999/xhtml namespace.
The only way an attribute belongs to a namespace is if it has a declared prefix, like xlink:type and xlink:href.
Many XML applications have recommended prefixes. For example, SVG elements often use the prefix svg and Resource Description Framework (RDF) elements often have the prefix rdf. However, these prefixes are simply conventions, and can be changed based on necessity, convenience or whim.
Before a prefix can be used, it must be bound to a URI.
These URIs are standardized, not the prefixes.
The prefix can change as long as the URI stays the same.
Purely formal
Can point somewhere but do not have to
Parsers compare namespace URIs on a character by character basis. These are three different namespaces:
http://www.w3.org/1999/XSL/Transform
http://www.w3.org/1999/XSL/Transform/
http://www.w3.org/1999/XSL/Transform/index.html
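Both points, that the prefix is arbitrary and that URIs are compared character by character, can be seen by parsing three one-element documents. This is a minimal sketch with Python's standard xml.etree.ElementTree; the helper name root_tag and the prefix t are hypothetical.

```python
import xml.etree.ElementTree as ET

def root_tag(xml_text):
    """Return the root element's name in {URI}local form."""
    return ET.fromstring(xml_text).tag

customary = root_tag('<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>')
renamed = root_tag('<t:stylesheet xmlns:t="http://www.w3.org/1999/XSL/Transform"/>')
trailing_slash = root_tag('<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform/"/>')

print(customary == renamed)         # different prefix, same URI
print(customary == trailing_slash)  # same prefix, URI differs by one character
```

The first comparison should succeed and the second should fail, even though the two URIs would probably resolve to the same page if anyone tried to fetch them.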
Indicate that an unprefixed element and all its unprefixed descendant
elements belong to a particular namespace by attaching an xmlns
attribute with no prefix:
<DATASCHEMA xmlns="http://www.w3.org/2000/P3Pv1">
<DATA name="vehicle.make" type="text" short="Make"
category="preference" size="31"/>
<DATA name="vehicle.model" type="text" short="Model"
category="preference" size="31"/>
<DATA name="vehicle.year" type="number" short="Year"
category="preference" size="4"/>
<DATA name="vehicle.license.state." type="postal." short="State"
category="preference" size="2"/>
<DATA name="vehicle.license.number" type="text"
short="License Plate Number" category="preference" size="12"/>
</DATASCHEMA>
Both the DATASCHEMA and DATA elements are in the http://www.w3.org/2000/P3Pv1 namespace.
Default namespaces apply only to elements, not to attributes.
Thus in the above example the name, type, short, category, and size attributes are not in any namespace.
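The difference between elements and attributes under a default namespace shows up directly in a namespace-aware API. The sketch below uses Python's standard xml.etree.ElementTree on a shortened version of the P3P example with a single DATA element.

```python
import xml.etree.ElementTree as ET

schema = ET.fromstring("""
<DATASCHEMA xmlns="http://www.w3.org/2000/P3Pv1">
  <DATA name="vehicle.make" type="text" short="Make"
        category="preference" size="31"/>
</DATASCHEMA>
""")

data = schema[0]
print(schema.tag)           # element name carries the default namespace
print(data.tag)             # so does the unprefixed child
print(sorted(data.attrib))  # attribute names carry no namespace at all
```

The element names come back qualified with the P3P URI, while the attribute names come back bare.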
You can change the default namespace within a particular
element by adding an xmlns
attribute to the element.
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:xlink="http://www.w3.org/1999/xlink">
<head><title>Three Namespaces</title></head>
<body>
<h1 align="center">An Ellipse and a Rectangle</h1>
<svg xmlns="http://www.w3.org/2000/svg"
width="12cm" height="10cm">
<ellipse rx="110" ry="130" />
<rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg>
<p xlink:type="simple" xlink:href="ellipses.html">
More about ellipses
</p>
<p xlink:type="simple" xlink:href="rectangles.html">
More about rectangles
</p>
<hr/>
<p>Last Modified February 13, 2000</p>
</body>
</html>
<!ATTLIST svg xmlns CDATA
#FIXED "http://www.w3.org/2000/svg">
<svg width="12cm" height="10cm">
<ellipse rx="110" ry="130" />
<rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg>
Namespaces were added to XML 1.0 after the fact, but care was taken to ensure backwards compatibility.
An XML 1.0 parser that does not know about namespaces will most likely not have any trouble reading a document that uses namespaces.
A namespace aware parser also checks to see that all prefixes are mapped to URIs. Otherwise it behaves almost exactly like a non-namespace aware parser.
Other software that sits on top of the raw XML parser, an XSLT engine for example, may treat elements differently depending on what namespace they belong to. However, the XML parser itself mostly doesn't care as long as all well-formedness and namespace constraints are met.
A possible exception occurs in the unlikely event that elements with different prefixes belong to the same namespace or elements with the same prefix belong to different namespaces
Many parsers have the option of whether to report namespace violations so that you can turn namespace processing on or off as you see fit.
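As a concrete illustration of switching namespace processing on and off, here is a minimal sketch using the SAX parser in Python's standard library, where the setting is exposed as the feature_namespaces feature. The one-element document and the parses helper are assumptions made up for this example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

# The svg prefix is used but never declared.
DOC = b'<svg:rect width="3cm" height="6cm"/>'

def parses(namespace_aware):
    """Return True if DOC parses without a fatal error."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, namespace_aware)
    parser.setContentHandler(ContentHandler())
    try:
        parser.parse(io.BytesIO(DOC))
        return True
    except xml.sax.SAXParseException:
        return False

print(parses(namespace_aware=False))  # True: svg:rect is just a name with a colon in it
print(parses(namespace_aware=True))   # False: the unbound prefix is reported as an error
```

The document is well-formed XML 1.0 either way; only the namespace-aware run objects to the undeclared prefix.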
DTDs must declare the qualified names
<!ELEMENT svg:text (#PCDATA)>
If the prefix changes, the DTD needs to change too.
Parameter entity references can help when the prefix changes or is removed:
<!ENTITY % mathml-colon ''>
<!ENTITY % mathml-prefix ''>
<!ENTITY % mathml-exp '%mathml-prefix;%mathml-colon;exp' >
<!ENTITY % mathml-abs '%mathml-prefix;%mathml-colon;abs' >
<!ENTITY % mathml-arg '%mathml-prefix;%mathml-colon;arg' >
<!ENTITY % mathml-real '%mathml-prefix;%mathml-colon;real' >
<!ENTITY % mathml-imaginary '%mathml-prefix;%mathml-colon;imaginary' >
<!ELEMENT %mathml-exp; EMPTY>
people are won over to XSLT once they have a chance to get some perspective, and this usually doesn't take long at all. I have had much occasion working with developers who curse and splutter all the time at all the little tripping points of XSLT that Mike Kay points out. But at least twice, I remember a developer saying, after a few days of this, something to the effect of "wow, XSLT is certainly a different way of thinking, but I must say that I accomplished my XML processing task using XSLT in a fraction of the time it took for me to do the same with DOM and Java".
--Uche Ogbuji on the xml-dev mailing list, Mon, 25 Mar 2002
The Extensible Stylesheet Language
Two parts:
A transformation language (XSLT)
A formatting language (XSL-FO)
This talk covers:
XSL Transformations: November 16, 1999 1.0 Specification
XSLT 2.0 is under development.
The XML parser reads an XML document and forms a tree
The tree is passed to the XSLT processor
The XSLT processor compares the nodes in the tree to the instructions in the style sheet
When the XSLT processor finds a match, it outputs a tree fragment
(Optional) The complete output tree is serialized to some other format such as text, HTML, or an XML file
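The steps above can be mimicked in a few lines of Python. This is only a toy sketch, not an XSLT processor: rules are plain Python functions keyed by element name, the fallback is loosely modeled on XSLT's built-in rules (text passes through, children are visited), and the three-element document is a made-up miniature of a catalog.

```python
import xml.etree.ElementTree as ET

DOC = """<catalog>
  <category>Small chamber ensembles</category>
  <composition><title>Charmonium</title><length>9'</length></composition>
  <composition><title>Little Trio</title><length>4'</length></composition>
</catalog>"""

def apply_templates(node, rules):
    """Transform one element: use its rule if there is one, else the fallback."""
    rule = rules.get(node.tag)
    if rule is not None:
        return rule(node, rules)
    # Fallback: copy text through and keep descending into child elements.
    parts = [node.text or ""]
    for child in node:
        parts.append(apply_templates(child, rules))
        parts.append(child.tail or "")
    return "".join(parts)

# One "template rule": a composition becomes an h3 holding its title.
rules = {
    "composition": lambda node, rules: "<h3>%s</h3>" % node.findtext("title"),
}

tree = ET.fromstring(DOC)              # the parser builds a tree
output = apply_templates(tree, rules)  # nodes are compared against the rules
print(output)                          # the result is serialized
```

Elements without a rule contribute only their text, while each composition is replaced by whatever its rule returns, so the length values never reach the output.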
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE catalog SYSTEM "compositions.dtd">
<catalog>
  <category>
    Small chamber ensembles - 2-4 Players by New York Women Composers
  </category>
  <cataloging_info>
    <abstract>Compositions by the members of New York Women Composers</abstract>
    <keyword>music publishing</keyword>
    <keyword>scores</keyword>
    <keyword>women composers</keyword>
    <keyword>New York</keyword>
  </cataloging_info>
  <last_updated>July 28, 1999</last_updated>
  <copyright>1999 New York Women Composers</copyright>
  <maintainer email="elharo@metalab.unc.edu" url="http://www.elharo.com/">
    <name>
      <first_name>Elliotte</first_name>
      <middle_name>Rusty</middle_name>
      <last_name>Harold</last_name>
    </name>
  </maintainer>
  <composer id="c1">
    <name>
      <first_name>Julie</first_name>
      <middle_name></middle_name>
      <last_name>Mandel</last_name>
    </name>
  </composer>
  <composer id="c2">
    <name>
      <first_name>Margaret</first_name>
      <middle_name>De</middle_name>
      <last_name>Wys</last_name>
    </name>
  </composer>
  <composer id="c3">
    <name>
      <first_name>Beth</first_name>
      <middle_name></middle_name>
      <last_name>Anderson</last_name>
    </name>
  </composer>
  <composer id="c4">
    <name>
      <first_name>Linda</first_name>
      <middle_name></middle_name>
      <last_name>Bouchard</last_name>
    </name>
  </composer>
  <composition composer="c1">
    <title>Trio for Flute, Viola and Harp</title>
    <date><year>1994</year></date>
    <length>13'38"</length>
    <instruments>fl, hp, vla</instruments>
    <description>
      <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements :</p>
      <ul>
        <li>mvt. 1: 5:01</li>
        <li>mvt. 2: 4:11</li>
        <li>mvt. 3: 4:26</li>
      </ul>
    </description>
    <publisher>Theodore Presser</publisher>
  </composition>
  <composition composer="c2">
    <title>Charmonium</title>
    <date><year>1991</year></date>
    <length>9'</length>
    <instruments>2 vln, vla, vc</instruments>
    <description>
      <p>Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available.</p>
    </description>
  </composition>
  <composition composer="c1">
    <title>Invention for Flute and Piano</title>
    <date><year>1994</year></date>
    <instruments>fl, pn</instruments>
    <description><p>3 movements</p></description>
  </composition>
  <composition composer="c3">
    <title>Little Trio</title>
    <date><year>1984</year></date>
    <length>4'</length>
    <instruments>fl, guit, va</instruments>
    <publisher>ACA</publisher>
  </composition>
  <composition composer="c3">
    <title>Dr. Blood's Mermaid Lullaby</title>
    <date><year>1980</year></date>
    <length>3'</length>
    <instruments>fl or ob, or vn, or vc, pn</instruments>
    <publisher>ACA</publisher>
  </composition>
  <composition composer="c3">
    <title>Trio: Dream in D</title>
    <date><year>1980</year></date>
    <length>10'</length>
    <instruments>fl, pn, vc, or vn, pn, vc</instruments>
    <description>
      <p>Rhapsodic. Passionate. Available on CD <cite> <a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr=1-2/"> Two by Three</a></cite> from North/South Consonance (1998).</p>
    </description>
  </composition>
  <composition composer="c4">
    <title>Propos II</title>
    <date><year>1985</year></date>
    <length>11'</length>
    <instruments>2 tpt</instruments>
    <description><p>Arrangement from Propos</p></description>
  </composition>
  <composition composer="c4">
    <title>Rictus En Mirroir</title>
    <date><year>1985</year></date>
    <length>14'</length>
    <instruments>fl, ob, hpschd, vc</instruments>
  </composition>
</catalog>
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
</xsl:stylesheet>
Windows executable:
C:\>saxon compositions.xml sheet1.xsl
Java executable:
C:\>java com.icl.saxon.StyleSheet compositions.xml sheet1.xsl output1.html
<html>
...
<?xml version="1.0" encoding="utf-8"?> Small chamber ensembles - 2-4 Players by New York Women Composers Compositions by the members of New York Women Composers music publishing scores women composers New York July 28, 1999 1999 New York Women Composers Elliotte Rusty Harold Julie Mandel Margaret De Wys Beth Anderson Linda Bouchard Trio for Flute, Viola and Harp 1994 13'38" fl, hp, vla Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : mvt. 1: 5:01 mvt. 2: 4:11 mvt. 3: 4:26 Theodore Presser Charmonium 1991 9' 2 vln, vla, vc Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. Invention for Flute and Piano 1994 fl, pn 3 movements Little Trio 1984 4' fl, guit, va ACA Dr. Blood's Mermaid Lullaby 1980 3' fl or ob, or vn, or vc, pn ACA Trio: Dream in D 1980 10' fl, pn, vc, or vn, pn, vc Rhapsodic. Passionate. Available on CD Two by Three from North/South Consonance (1998). Propos II 1985 11' 2 tpt Arrangement from Propos Rictus En Mirroir 1985 14' fl, ob, hpschd, vc
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="composition">
    <h3><xsl:value-of select="title"/></h3>
  </xsl:template>
</xsl:stylesheet>
<?xml version="1.0" encoding="utf-8"?> Small chamber ensembles - 2-4 Players by New York Women Composers Compositions by the members of New York Women Composers music publishing scores women composers New York July 28, 1999 1999 New York Women Composers Elliotte Rusty Harold Julie Mandel Margaret De Wys Beth Anderson Linda Bouchard <h3>Trio for Flute, Viola and Harp</h3> <h3>Charmonium</h3> <h3>Invention for Flute and Piano</h3> <h3>Little Trio</h3> <h3>Dr. Blood's Mermaid Lullaby</h3> <h3>Trio: Dream in D</h3> <h3>Propos II</h3> <h3>Rictus En Mirroir</h3>
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="composition">
    <h3><xsl:value-of select="title"/></h3>
  </xsl:template>
</xsl:stylesheet>
<html> <body></body> </html>
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="composition">
    <h3><xsl:value-of select="title"/></h3>
  </xsl:template>
</xsl:stylesheet>
<html> <body> Small chamber ensembles - 2-4 Players by New York Women Composers Compositions by the members of New York Women Composers music publishing scores women composers New York July 28, 1999 1999 New York Women Composers Elliotte Rusty Harold Julie Mandel Margaret De Wys Beth Anderson Linda Bouchard <h3>Trio for Flute, Viola and Harp</h3> <h3>Charmonium</h3> <h3>Invention for Flute and Piano</h3> <h3>Little Trio</h3> <h3>Dr. Blood's Mermaid Lullaby</h3> <h3>Trio: Dream in D</h3> <h3>Propos II</h3> <h3>Rictus En Mirroir</h3> </body> </html>
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <xsl:apply-templates select="catalog"/>
    </html>
  </xsl:template>
  <xsl:template match="catalog">
    <head>
      <title><xsl:value-of select="category"/></title>
    </head>
    <body>
      <h1><xsl:value-of select="category"/></h1>
      <xsl:apply-templates select="composition"/>
    </body>
  </xsl:template>
  <xsl:template match="composition">
    <h3><xsl:value-of select="title"/></h3>
  </xsl:template>
</xsl:stylesheet>
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h3>Trio for Flute, Viola and Harp</h3> <h3>Charmonium</h3> <h3>Invention for Flute and Piano</h3> <h3>Little Trio</h3> <h3>Dr. Blood's Mermaid Lullaby</h3> <h3>Trio: Dream in D</h3> <h3>Propos II</h3> <h3>Rictus En Mirroir</h3> </body> </html>
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <xsl:apply-templates select="catalog"/>
    </html>
  </xsl:template>
  <xsl:template match="catalog">
    <head>
      <title><xsl:value-of select="category"/></title>
    </head>
    <body>
      <h1><xsl:value-of select="category"/></h1>
      <xsl:apply-templates select="composition"/>
      <hr/>
      Copyright <xsl:value-of select="copyright"/><br/>
      Last Modified: <xsl:value-of select="last_updated"/>
    </body>
  </xsl:template>
  <xsl:template match="composition">
    <h3><xsl:value-of select="title"/></h3>
    <ul>
      <li><xsl:value-of select="date"/></li>
      <li><xsl:value-of select="length"/></li>
      <li><xsl:value-of select="instruments"/></li>
      <li><xsl:value-of select="publisher"/></li>
    </ul>
    <p><xsl:value-of select="description"/></p>
  </xsl:template>
</xsl:stylesheet>
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h3>Trio for Flute, Viola and Harp</h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : mvt. 1: 5:01 mvt. 2: 4:11 mvt. 3: 4:26 </p> <h3>Charmonium</h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> <li></li> </ul> <p> Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. </p> <h3>Invention for Flute and Piano</h3> <ul> <li>1994</li> <li></li> <li>fl, pn</li> <li></li> </ul> <p>3 movements</p> <h3>Little Trio</h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>Dr. Blood's Mermaid Lullaby</h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>Trio: Dream in D</h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> <li></li> </ul> <p> Rhapsodic. Passionate. Available on CD Two by Three from North/South Consonance (1998). </p> <h3>Propos II</h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> <li></li> </ul> <p>Arrangement from Propos</p> <h3>Rictus En Mirroir</h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> <li></li> </ul> <p></p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999 </body> </html>
I want to add something like this to the footer so readers can contact me if there's a problem with the page:
Elliotte Rusty Harold<br/> elharo@metalab.unc.edu
This information comes from the maintainer element:
<maintainer email="elharo@metalab.unc.edu" url="http://www.elharo.com/">
  <name>
    <first_name>Elliotte</first_name>
    <middle_name>Rusty</middle_name>
    <last_name>Harold</last_name>
  </name>
</maintainer>
We need a way to get content from attributes in the input document.
This is accomplished by prefixing the attribute name with @.
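For readers who think in terms of a tree API, the sketch below shows roughly what the two kinds of selection mean, using Python's standard-library ElementTree rather than XSLT. It is an analogy only: get stands in for @email, and joining the descendant text stands in for taking the value of name.

```python
import xml.etree.ElementTree as ET

DOC = """<maintainer email="elharo@metalab.unc.edu" url="http://www.elharo.com/">
  <name>
    <first_name>Elliotte</first_name>
    <middle_name>Rusty</middle_name>
    <last_name>Harold</last_name>
  </name>
</maintainer>"""

maintainer = ET.fromstring(DOC)

# Rough counterpart of select="@email": read an attribute of the context node.
email = maintainer.get("email")

# Rough counterpart of select="name": the text of the element and all its
# descendants, with the whitespace between the parts collapsed for readability.
full_name = " ".join("".join(maintainer.find("name").itertext()).split())

print(full_name)  # Elliotte Rusty Harold
print(email)      # elharo@metalab.unc.edu
```

One difference worth keeping in mind: a missing attribute gives None in ElementTree, whereas xsl:value-of simply produces an empty string.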
<xsl:template match="catalog">
  <head>
    <title><xsl:value-of select="category"/></title>
  </head>
  <body>
    <h1><xsl:value-of select="category"/></h1>
    <xsl:apply-templates select="composition"/>
    <hr/>
    Copyright <xsl:value-of select="copyright"/><br/>
    Last Modified: <xsl:value-of select="last_updated"/><br/>
    <xsl:apply-templates select="maintainer"/>
  </body>
</xsl:template>
<xsl:template match="maintainer">
  <xsl:value-of select="name"/><br/>
  <xsl:value-of select="@email"/>
</xsl:template>
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h3>Trio for Flute, Viola and Harp</h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : mvt. 1: 5:01 mvt. 2: 4:11 mvt. 3: 4:26 </p> <h3>Charmonium</h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> <li></li> </ul> <p> Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. </p> <h3>Invention for Flute and Piano</h3> <ul> <li>1994</li> <li></li> <li>fl, pn</li> <li></li> </ul> <p>3 movements</p> <h3>Little Trio</h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>Dr. Blood's Mermaid Lullaby</h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>Trio: Dream in D</h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> <li></li> </ul> <p> Rhapsodic. Passionate. Available on CD Two by Three from North/South Consonance (1998). </p> <h3>Propos II</h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> <li></li> </ul> <p>Arrangement from Propos</p> <h3>Rictus En Mirroir</h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> <li></li> </ul> <p></p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br> Elliotte Rusty Harold <br>elharo@metalab.unc.edu </body> </html>
Add this to the footer:
<a href="http://www.elharo.com/">Elliotte Rusty Harold</a><br/> <a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a>
We need a way to copy nodes from the input document to attribute values in the output document.
Attribute value templates are the solution: an XPath expression in curly braces inside an attribute value is replaced by its string value.
<xsl:template match="maintainer">
  <a href="{@url}"><xsl:value-of select="name"/></a><br/>
  <a href="mailto:{@email}"><xsl:value-of select="@email"/></a>
</xsl:template>
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h3>Trio for Flute, Viola and Harp</h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : mvt. 1: 5:01 mvt. 2: 4:11 mvt. 3: 4:26 </p> <h3>Charmonium</h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> <li></li> </ul> <p> Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. </p> <h3>Invention for Flute and Piano</h3> <ul> <li>1994</li> <li></li> <li>fl, pn</li> <li></li> </ul> <p>3 movements</p> <h3>Little Trio</h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>Dr. Blood's Mermaid Lullaby</h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>Trio: Dream in D</h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> <li></li> </ul> <p> Rhapsodic. Passionate. Available on CD Two by Three from North/South Consonance (1998). </p> <h3>Propos II</h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> <li></li> </ul> <p>Arrangement from Propos</p> <h3>Rictus En Mirroir</h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> <li></li> </ul> <p></p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br><a href="http://www.elharo.com/"> Elliotte Rusty Harold </a><br><a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a></body> </html>
The descriptions in the output document are pure text.
The descriptions in the input document are somewhat more styled and include paragraphs, unordered lists and citations; e.g.
<description> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements :</p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 3: 4:26</li> </ul> </description>
But all this is stripped by the default template rule used for the description!
Use xsl:copy to move these elements into the output more or less as is:
<!-- pass HTML along unchanged -->
<xsl:template match="p">
  <xsl:copy>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>
<xsl:template match="ul">
  <xsl:copy>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>
<xsl:template match="li">
  <xsl:copy>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>
<xsl:template match="cite">
  <xsl:copy>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>
We also have to apply templates to the description element rather than taking its value:
<xsl:template match="composition">
  <h3><xsl:value-of select="title"/></h3>
  <ul>
    <li><xsl:value-of select="date"/></li>
    <li><xsl:value-of select="length"/></li>
    <li><xsl:value-of select="instruments"/></li>
    <li><xsl:value-of select="publisher"/></li>
  </ul>
  <p><xsl:apply-templates select="description"/></p>
</xsl:template>
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h3>Trio for Flute, Viola and Harp</h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : </p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 3: 4:26</li> </ul> </p> <h3>Charmonium</h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> <li></li> </ul> <p> <p>Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. </p> </p> <h3>Invention for Flute and Piano</h3> <ul> <li>1994</li> <li></li> <li>fl, pn</li> <li></li> </ul> <p> <p>3 movements</p> </p> <h3>Little Trio</h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>Dr. Blood's Mermaid Lullaby</h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>Trio: Dream in D</h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> <li></li> </ul> <p> <p>Rhapsodic. Passionate. Available on CD <cite> Two by Three </cite> from North/South Consonance (1998). </p> </p> <h3>Propos II</h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> <li></li> </ul> <p> <p>Arrangement from Propos</p> </p> <h3>Rictus En Mirroir</h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> <li></li> </ul> <p></p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br><a href="http://www.elharo.com/"> Elliotte Rusty Harold </a><br><a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a></body> </html>
Since all four template rules for the HTML elements have the same content, we can combine them into a single rule that applies to each of the four using the or operator, |:
<xsl:template match="p|ul|li|cite">
  <xsl:copy>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>
Right now the descriptions in the input document only use a few HTML tags, but potentially they could use full HTML up to and including tables, images, styles, and more. You could include separate template rules for each of these, but it's easier to specify a rule that applies to all elements.
<!-- pass unrecognized tags along unchanged -->
<xsl:template match="*">
  <xsl:copy>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>
The * matches all elements that are not matched by a more specific rule. It only matches element nodes, though. It does not match nodes for:
attributes
comments
processing instructions
namespaces
text
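In this case the output changes only slightly: the description and a elements are now copied through as elements, though without their attributes. For a document that used more HTML the difference would be larger.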
To copy everything including:
attributes
comments
processing instructions
namespaces
text
we have to use greedier wild cards:
@* to copy attribute nodes
node() to copy all other nodes
<!-- pass unrecognized nodes along unchanged -->
<xsl:template match="node()|@*">
  <xsl:copy>
    <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
</xsl:template>
This time the output does change: elements such as description and a are copied through together with their attributes, for example the href of each a element.
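The practical difference between copying only elements and copying attributes too can be illustrated outside XSLT. The following Python sketch (standard-library ElementTree) rebuilds a small tree twice; copy_tree is a hypothetical helper, the example.com URL is a placeholder, and comments and processing instructions are ignored because ElementTree's default parser drops them.

```python
import xml.etree.ElementTree as ET

# Placeholder URL, used only for this illustration.
DOC = '<cite><a href="http://www.example.com/cd">Two by Three</a></cite>'

def copy_tree(source, keep_attributes):
    """Rebuild an element tree, optionally carrying attributes along."""
    target = ET.Element(source.tag, dict(source.attrib) if keep_attributes else {})
    target.text, target.tail = source.text, source.tail
    for child in source:
        target.append(copy_tree(child, keep_attributes))
    return target

cite = ET.fromstring(DOC)

# Comparable to match="*": elements and their text survive, attributes do not.
elements_only = ET.tostring(copy_tree(cite, keep_attributes=False), encoding="unicode")
# Comparable to match="node()|@*": attributes are copied as well.
with_attributes = ET.tostring(copy_tree(cite, keep_attributes=True), encoding="unicode")

print(elements_only)
print(with_attributes)
```

The first result keeps the a element but loses its href, which is what makes the link useless; the second is a faithful copy.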
Perhaps this is too greedy. Do we really want to recognize HTML only in the description element? What if somebody puts HTML in a different element, such as instruments?
What if somebody makes a mistake and adds an element that shouldn't be there?
I don't think the rule is too greedy, but it would be possible to use modes or other techniques to make this default rule apply only inside the description element.
<xsl:template match="composition">
  <h3><xsl:value-of select="title"/></h3>
  <ul>
    <xsl:if test="date">
      <li><xsl:value-of select="date"/></li>
    </xsl:if>
    <xsl:if test="length">
      <li><xsl:value-of select="length"/></li>
    </xsl:if>
    <xsl:if test="instruments">
      <li><xsl:value-of select="instruments"/></li>
    </xsl:if>
    <xsl:if test="publisher">
      <li><xsl:value-of select="publisher"/></li>
    </xsl:if>
  </ul>
  <p><xsl:apply-templates select="description"/></p>
</xsl:template>
Non-empty node-sets are true. Empty node-sets are false.
Zero-length strings are false. Other strings are true.
All the comparison operators you expect are available: <, >, =, !=, <= and >=. Inside a stylesheet attribute, < must be written as &lt;.
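The node-set rule is the one the stylesheet relies on: test="publisher" is true exactly when the composition has at least one publisher child. A rough Python counterpart, using the standard-library ElementTree and a made-up two-composition document, might look like this. The list_items helper is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<catalog>
  <composition><title>Charmonium</title><length>9'</length></composition>
  <composition><title>Little Trio</title><length>4'</length><publisher>ACA</publisher></composition>
</catalog>"""

def list_items(composition):
    """Emit an <li> only for the fields that are actually present."""
    items = []
    for field in ("date", "length", "instruments", "publisher"):
        element = composition.find(field)
        # Counterpart of test="date", test="length", ...: true when the
        # selection is non-empty. Compare with None explicitly, because
        # truth-testing an ElementTree element directly is unreliable.
        if element is not None:
            items.append("<li>%s</li>" % (element.text or ""))
    return items

first, second = ET.fromstring(DOC).findall("composition")
print(list_items(first))
print(list_items(second))
```

The first composition yields a single list item and the second yields two, so no empty li elements are produced for missing fields.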
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h3>Trio for Flute, Viola and Harp</h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> <description> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : </p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 3: 4:26</li> </ul> </description> </p> <h3>Charmonium</h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> </ul> <p> <description> <p>Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. </p> </description> </p> <h3>Invention for Flute and Piano</h3> <ul> <li>1994</li> <li>fl, pn</li> </ul> <p> <description> <p>3 movements</p> </description> </p> <h3>Little Trio</h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>Dr. Blood's Mermaid Lullaby</h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>Trio: Dream in D</h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> </ul> <p> <description> <p>Rhapsodic. Passionate. Available on CD <cite> <a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr=1-2/" shape="rect"> Two by Three </a> </cite> from North/South Consonance (1998). 
</p> </description> </p> <h3>Propos II</h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> </ul> <p> <description> <p>Arrangement from Propos</p> </description> </p> <h3>Rictus En Mirroir</h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> </ul> <p></p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br><a href="http://www.elharo.com/"> Elliotte Rusty Harold </a><br><a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a></body> </html>
The composers and their compositions are linked through the id attribute of the composer element and the composer attribute of the composition element.
<composer id="c3">
  <name>
    <first_name>Beth</first_name>
    <middle_name></middle_name>
    <last_name>Anderson</last_name>
  </name>
</composer>
<composition composer="c3">
  <title>Trio: Dream in D</title>
  <date><year>1980</year></date>
  <length>10'</length>
  <instruments>fl, pn, vc, or vn, pn, vc</instruments>
  <description>
    <p>Rhapsodic. Passionate. Available on CD <cite> <a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr=1-2/"> Two by Three</a></cite> from North/South Consonance (1998).</p>
  </description>
</composition>
<xsl:template match="catalog">
  <head>
    <title><xsl:value-of select="category"/></title>
  </head>
  <body>
    <h1><xsl:value-of select="category"/></h1>
    <xsl:apply-templates select="composer"/>
    <hr/>
    Copyright <xsl:value-of select="copyright"/><br/>
    Last Modified: <xsl:value-of select="last_updated"/><br/>
    <xsl:apply-templates select="maintainer"/>
  </body>
</xsl:template>
<xsl:template match="composer">
  <h2><xsl:value-of select="name"/></h2>
  <xsl:apply-templates select="../composition[@composer=current()/@id]"/>
</xsl:template>
.. selects the parent element
/ separates location steps; the step after it selects children of the node selected before it
Square brackets [] enclose a predicate that winnows down the selected nodes
The current() function refers to the matched composer element
@composer and @id take the values of the composer attribute and the id attribute
The = operator compares the two attributes
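The same join can be spelled out with the limited XPath subset in Python's standard-library ElementTree, which may help in checking what the predicate selects. This is a sketch on a made-up miniature catalog. ElementTree has no current() function, so the composer's id is substituted into the path by hand, which is only safe here because the ids contain no quote characters.

```python
import xml.etree.ElementTree as ET

DOC = """<catalog>
  <composer id="c1"><name><last_name>Mandel</last_name></name></composer>
  <composer id="c3"><name><last_name>Anderson</last_name></name></composer>
  <composition composer="c1"><title>Trio for Flute, Viola and Harp</title></composition>
  <composition composer="c3"><title>Little Trio</title></composition>
  <composition composer="c3"><title>Trio: Dream in D</title></composition>
</catalog>"""

catalog = ET.fromstring(DOC)

works_by_composer = {}
for composer in catalog.findall("composer"):
    composer_id = composer.get("id")  # plays the role of current()/@id
    # Counterpart of ../composition[@composer=current()/@id]:
    # sibling compositions whose composer attribute equals this id.
    matches = catalog.findall(f"composition[@composer='{composer_id}']")
    last_name = composer.findtext("name/last_name")
    works_by_composer[last_name] = [m.findtext("title") for m in matches]

for last_name, titles in works_by_composer.items():
    print(last_name, titles)
```

Each composer ends up with exactly the compositions that point back at it, in document order.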
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h2> Julie Mandel </h2> <h3>Trio for Flute, Viola and Harp</h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> <description> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : </p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 3: 4:26</li> </ul> </description> </p> <h3>Invention for Flute and Piano</h3> <ul> <li>1994</li> <li>fl, pn</li> </ul> <p> <description> <p>3 movements</p> </description> </p> <h2> Margaret De Wys </h2> <h3>Charmonium</h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> </ul> <p> <description> <p>Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. </p> </description> </p> <h2> Beth Anderson </h2> <h3>Little Trio</h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>Dr. Blood's Mermaid Lullaby</h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>Trio: Dream in D</h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> </ul> <p> <description> <p>Rhapsodic. Passionate. Available on CD <cite> <a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr=1-2/" shape="rect"> Two by Three </a> </cite> from North/South Consonance (1998). 
</p> </description> </p> <h2> Linda Bouchard </h2> <h3>Propos II</h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> </ul> <p> <description> <p>Arrangement from Propos</p> </description> </p> <h3>Rictus En Mirroir</h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> </ul> <p></p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br><a href="http://www.elharo.com/"> Elliotte Rusty Harold </a><br><a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a></body> </html>
<xsl:template match="catalog">
  <head>
    <title><xsl:value-of select="category"/></title>
  </head>
  <body>
    <h1><xsl:value-of select="category"/></h1>
    <xsl:apply-templates select="composer">
      <xsl:sort select="name/last_name"/>
    </xsl:apply-templates>
    <hr/>
    Copyright <xsl:value-of select="copyright"/><br/>
    Last Modified: <xsl:value-of select="last_updated"/><br/>
    <xsl:apply-templates select="maintainer"/>
  </body>
</xsl:template>
The select attribute provides the key to sort by
Must be a child of xsl:apply-templates or xsl:for-each
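In Python terms, xsl:sort plays roughly the role of the key passed to sorted(), and several sort keys in a row correspond to a tuple key. The sketch below uses the standard-library ElementTree on the four composers from the catalog. sort_key is a hypothetical helper, and Python compares strings by code point, whereas an XSLT processor's default text ordering depends on language settings.

```python
import xml.etree.ElementTree as ET

DOC = """<catalog>
  <composer id="c1"><name><first_name>Julie</first_name><middle_name></middle_name><last_name>Mandel</last_name></name></composer>
  <composer id="c2"><name><first_name>Margaret</first_name><middle_name>De</middle_name><last_name>Wys</last_name></name></composer>
  <composer id="c3"><name><first_name>Beth</first_name><middle_name></middle_name><last_name>Anderson</last_name></name></composer>
  <composer id="c4"><name><first_name>Linda</first_name><middle_name></middle_name><last_name>Bouchard</last_name></name></composer>
</catalog>"""

def sort_key(composer):
    """Primary key last name, then first name, then middle name."""
    name = composer.find("name")
    return (
        name.findtext("last_name", ""),
        name.findtext("first_name", ""),
        name.findtext("middle_name", ""),
    )

catalog = ET.fromstring(DOC)
ordered = sorted(catalog.findall("composer"), key=sort_key)
last_names = [c.findtext("name/last_name") for c in ordered]
print(last_names)
```

The composers come out as Anderson, Bouchard, Mandel, Wys instead of in document order; the later tuple entries only matter when two composers share a last name.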
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h2> Beth Anderson </h2> <h3>Little Trio</h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>Dr. Blood's Mermaid Lullaby</h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>Trio: Dream in D</h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> </ul> <p> <description> <p>Rhapsodic. Passionate. Available on CD <cite> <a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr=1-2/" shape="rect"> Two by Three </a> </cite> from North/South Consonance (1998). </p> </description> </p> <h2> Linda Bouchard </h2> <h3>Propos II</h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> </ul> <p> <description> <p>Arrangement from Propos</p> </description> </p> <h3>Rictus En Mirroir</h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> </ul> <p></p> <h2> Julie Mandel </h2> <h3>Trio for Flute, Viola and Harp</h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> <description> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : </p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 3: 4:26</li> </ul> </description> </p> <h3>Invention for Flute and Piano</h3> <ul> <li>1994</li> <li>fl, pn</li> </ul> <p> <description> <p>3 movements</p> </description> </p> <h2> Margaret De Wys </h2> <h3>Charmonium</h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> </ul> <p> <description> <p>Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. 
</p> </description> </p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br><a href="http://www.elharo.com/"> Elliotte Rusty Harold </a><br><a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a></body> </html>
<xsl:template match="catalog">
  <head>
    <title><xsl:value-of select="category"/></title>
  </head>
  <body>
    <h1><xsl:value-of select="category"/></h1>
    <xsl:apply-templates select="composer">
      <xsl:sort select="name/last_name"/>
      <xsl:sort select="name/first_name"/>
      <xsl:sort select="name/middle_name"/>
    </xsl:apply-templates>
    <hr/>
    Copyright <xsl:value-of select="copyright"/><br/>
    Last Modified: <xsl:value-of select="last_updated"/><br/>
    <xsl:apply-templates select="maintainer"/>
  </body>
</xsl:template>
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h2> Beth Anderson </h2> <h3>Little Trio</h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>Dr. Blood's Mermaid Lullaby</h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>Trio: Dream in D</h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> </ul> <p> <description> <p>Rhapsodic. Passionate. Available on CD <cite> <a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr=1-2/" shape="rect"> Two by Three </a> </cite> from North/South Consonance (1998). </p> </description> </p> <h2> Linda Bouchard </h2> <h3>Propos II</h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> </ul> <p> <description> <p>Arrangement from Propos</p> </description> </p> <h3>Rictus En Mirroir</h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> </ul> <p></p> <h2> Julie Mandel </h2> <h3>Trio for Flute, Viola and Harp</h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> <description> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : </p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 3: 4:26</li> </ul> </description> </p> <h3>Invention for Flute and Piano</h3> <ul> <li>1994</li> <li>fl, pn</li> </ul> <p> <description> <p>3 movements</p> </description> </p> <h2> Margaret De Wys </h2> <h3>Charmonium</h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> </ul> <p> <description> <p>Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. 
</p> </description> </p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br><a href="http://www.elharo.com/"> Elliotte Rusty Harold </a><br><a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a></body> </html>
Sorting by composition title is equally straightforward, but we have to do it in a separate xsl:apply-templates element
<xsl:template match="catalog"> <head> <title><xsl:value-of select="category"/></title> </head> <body> <h1><xsl:value-of select="category"/></h1> <xsl:apply-templates select="composer"> <xsl:sort select="name/last_name"/> <xsl:sort select="name/first_name"/> <xsl:sort select="name/middle_name"/> </xsl:apply-templates> <hr/> Copyright <xsl:value-of select="copyright"/><br/> Last Modified: <xsl:value-of select="last_updated"/><br/> <xsl:apply-templates select="maintainer"/> </body> </xsl:template> <xsl:template match="composer"> <h2><xsl:value-of select="name"/></h2> <xsl:apply-templates select="../composition[@composer=current()/@id]"> <xsl:sort select="title"/> </xsl:apply-templates> </xsl:template>
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h2> Beth Anderson </h2> <h3>Dr. Blood's Mermaid Lullaby</h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>Little Trio</h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>Trio: Dream in D</h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> </ul> <p> <description> <p>Rhapsodic. Passionate. Available on CD <cite> <a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr=1-2/" shape="rect"> Two by Three </a> </cite> from North/South Consonance (1998). </p> </description> </p> <h2> Linda Bouchard </h2> <h3>Propos II</h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> </ul> <p> <description> <p>Arrangement from Propos</p> </description> </p> <h3>Rictus En Mirroir</h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> </ul> <p></p> <h2> Julie Mandel </h2> <h3>Invention for Flute and Piano</h3> <ul> <li>1994</li> <li>fl, pn</li> </ul> <p> <description> <p>3 movements</p> </description> </p> <h3>Trio for Flute, Viola and Harp</h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> <description> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : </p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 3: 4:26</li> </ul> </description> </p> <h2> Margaret De Wys </h2> <h3>Charmonium</h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> </ul> <p> <description> <p>Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. 
</p> </description> </p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br><a href="http://www.elharo.com/"> Elliotte Rusty Harold </a><br><a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a></body> </html>
<xsl:template match="catalog"> <head> <title><xsl:value-of select="category"/></title> </head> <body> <!-- Header --> <h1><xsl:value-of select="category"/></h1> <ul> <xsl:for-each select="composition"> <li><xsl:value-of select="title"/></li> </xsl:for-each> </ul> <!-- Body --> <xsl:apply-templates select="composer"> <xsl:sort select="name/last_name"/> <xsl:sort select="name/first_name"/> <xsl:sort select="name/middle_name"/> </xsl:apply-templates> <!-- Signature --> <hr/> Copyright <xsl:value-of select="copyright"/><br/> Last Modified: <xsl:value-of select="last_updated"/><br/> <xsl:apply-templates select="maintainer"/> </body> </xsl:template>
. selects the context node
xsl:for-each iterates through the selected nodes, setting each one to the current node in turn, but does not apply templates to that node.
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <ul> <li>Trio for Flute, Viola and Harp</li> <li>Charmonium</li> <li>Invention for Flute and Piano</li> <li>Little Trio</li> <li>Dr. Blood's Mermaid Lullaby</li> <li>Trio: Dream in D</li> <li>Propos II</li> <li>Rictus En Mirroir</li> </ul> <h2> Beth Anderson </h2> <h3>Dr. Blood's Mermaid Lullaby</h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>Little Trio</h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>Trio: Dream in D</h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> </ul> <p> <description> <p>Rhapsodic. Passionate. Available on CD <cite> <a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr=1-2/" shape="rect"> Two by Three </a> </cite> from North/South Consonance (1998). </p> </description> </p> <h2> Linda Bouchard </h2> <h3>Propos II</h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> </ul> <p> <description> <p>Arrangement from Propos</p> </description> </p> <h3>Rictus En Mirroir</h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> </ul> <p></p> <h2> Julie Mandel </h2> <h3>Invention for Flute and Piano</h3> <ul> <li>1994</li> <li>fl, pn</li> </ul> <p> <description> <p>3 movements</p> </description> </p> <h3>Trio for Flute, Viola and Harp</h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> <description> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : </p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 
3: 4:26</li> </ul> </description> </p> <h2> Margaret De Wys </h2> <h3>Charmonium</h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> </ul> <p> <description> <p>Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. </p> </description> </p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br><a href="http://www.elharo.com/"> Elliotte Rusty Harold </a><br><a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a></body> </html>
xsl:for-each can have an xsl:sort child just like xsl:apply-templates
<xsl:template match="catalog"> <head> <title><xsl:value-of select="category"/></title> </head> <body> <!-- Header --> <h1><xsl:value-of select="category"/></h1> <ul> <xsl:for-each select="composition"> <xsl:sort select="title"/> <li><xsl:value-of select="title"/></li> </xsl:for-each> </ul> <!-- Body --> <xsl:apply-templates select="composer"> <xsl:sort select="name/last_name"/> <xsl:sort select="name/first_name"/> <xsl:sort select="name/middle_name"/> </xsl:apply-templates> <!-- Signature --> <hr/> Copyright <xsl:value-of select="copyright"/><br/> Last Modified: <xsl:value-of select="last_updated"/><br/> <xsl:apply-templates select="maintainer"/> </body> </xsl:template>
We need to add <a name="some_name">title</a> around each composition title so we have something to link to.
The generate-id() function will choose a unique ID for a particular element.
Here's the new template for the composition
<xsl:template match="composition"> <h3> <a name="{generate-id()}"> <xsl:value-of select="title"/> </a> </h3> <ul> <xsl:if test="string(date)"> <li><xsl:value-of select="date"/></li> </xsl:if> <xsl:if test="string(length)"> <li><xsl:value-of select="length"/></li> </xsl:if> <xsl:if test="string(instruments)"> <li><xsl:value-of select="instruments"/></li> </xsl:if> <xsl:if test="string(publisher)"> <li><xsl:value-of select="publisher"/></li> </xsl:if> </ul> <p><xsl:apply-templates select="description"/></p> </xsl:template>
Here's the new template for the table of contents link
<xsl:template match="catalog"> <head> <title><xsl:value-of select="category"/></title> </head> <body> <!-- Header --> <h1><xsl:value-of select="category"/></h1> <ul> <xsl:for-each select="composition"> <xsl:sort select="title"/> <li> <a href="#{generate-id()}"> <xsl:value-of select="title"/> </a> </li> </xsl:for-each> </ul> <!-- Body --> <xsl:apply-templates select="composer"> <xsl:sort select="name/last_name"/> <xsl:sort select="name/first_name"/> <xsl:sort select="name/middle_name"/> </xsl:apply-templates> <!-- Signature --> <hr/> Copyright <xsl:value-of select="copyright"/><br/> Last Modified: <xsl:value-of select="last_updated"/><br/> <xsl:apply-templates select="maintainer"/> </body> </xsl:template>
Although the ID is generated in two separate places, it is generated for the same node. Consequently, the two IDs are the same.
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <ul> <li><a href="#b1ac21">Charmonium</a></li> <li><a href="#b1ac27">Dr. Blood's Mermaid Lullaby</a></li> <li><a href="#b1ac23">Invention for Flute and Piano</a></li> <li><a href="#b1ac25">Little Trio</a></li> <li><a href="#b1ac31">Propos II</a></li> <li><a href="#b1ac33">Rictus En Mirroir</a></li> <li><a href="#b1ac19">Trio for Flute, Viola and Harp</a></li> <li><a href="#b1ac29">Trio: Dream in D</a></li> </ul> <h2> Beth Anderson </h2> <h3><a name="b1ac27">Dr. Blood's Mermaid Lullaby</a></h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3><a name="b1ac25">Little Trio</a></h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3><a name="b1ac29">Trio: Dream in D</a></h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> </ul> <p> <description> <p>Rhapsodic. Passionate. Available on CD <cite> <a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr=1-2/" shape="rect"> Two by Three </a> </cite> from North/South Consonance (1998). 
</p> </description> </p> <h2> Linda Bouchard </h2> <h3><a name="b1ac31">Propos II</a></h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> </ul> <p> <description> <p>Arrangement from Propos</p> </description> </p> <h3><a name="b1ac33">Rictus En Mirroir</a></h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> </ul> <p></p> <h2> Julie Mandel </h2> <h3><a name="b1ac23">Invention for Flute and Piano</a></h3> <ul> <li>1994</li> <li>fl, pn</li> </ul> <p> <description> <p>3 movements</p> </description> </p> <h3><a name="b1ac19">Trio for Flute, Viola and Harp</a></h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> <description> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : </p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 3: 4:26</li> </ul> </description> </p> <h2> Margaret De Wys </h2> <h3><a name="b1ac21">Charmonium</a></h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> </ul> <p> <description> <p>Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. </p> </description> </p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br><a href="http://www.elharo.com/"> Elliotte Rusty Harold </a><br><a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a></body> </html>
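The claim that generate-id() yields the same value for the same node, wherever it is called, can be checked with a small program. This is only a sketch: it assumes the XSLT 1.0 processor bundled with the JDK (reached through javax.xml.transform), and the miniature catalog, the stylesheet, and the class name GenerateIdDemo are invented for illustration rather than taken from the example files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class GenerateIdDemo {

    // Hypothetical one-composition catalog (not the real example document)
    static final String XML =
        "<catalog><composition><title>Little Trio</title></composition></catalog>";

    // generate-id() is called once inside xsl:for-each and once inside a
    // separate template; both calls see the same composition node.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='catalog'>"
      + "<xsl:for-each select='composition'><xsl:value-of select='generate-id()'/></xsl:for-each>"
      + "<xsl:text>|</xsl:text>"
      + "<xsl:apply-templates select='composition'/>"
      + "</xsl:template>"
      + "<xsl:template match='composition'><xsl:value-of select='generate-id()'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Returns the two generated IDs, in the order they were written.
    static String[] idsFromBothPlaces() {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(XML)),
                                  new StreamResult(out));
            return out.toString().split("\\|");
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] ids = idsFromBothPlaces();
        System.out.println("link target: " + ids[0] + ", anchor name: " + ids[1]);
        System.out.println("same ID? " + ids[0].equals(ids[1]));
    }
}
```

The actual ID strings differ from processor to processor (compare b1ac21 and d1e139 in the outputs shown in these notes), so a program should only rely on the two values matching within one transformation.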
The xsl:number element has a variety of attributes to determine number style, exactly what's counted, where numbering starts, and so forth
The position() function returns the position of the current node in the context node list
<xsl:template match="composition"> <h3><xsl:number value="position()"/>. <a name="{generate-id()}"> <xsl:value-of select="title"/> </a> </h3> <ul> <xsl:if test="date"> <li><xsl:value-of select="date"/></li> </xsl:if> <xsl:if test="length"> <li><xsl:value-of select="length"/></li> </xsl:if> <xsl:if test="instruments"> <li><xsl:value-of select="instruments"/></li> </xsl:if> <xsl:if test="publisher"> <li><xsl:value-of select="publisher"/></li> </xsl:if> </ul> <p><xsl:apply-templates select="description"/></p> </xsl:template>
XPath has a number of basic functions for working with strings:
starts-with(main_string, prefix_string)
contains(containing_string, contained_string)
substring(string, offset, length)
substring-before(string, marker-string)
substring-after(string, marker-string)
string-length(string)
normalize-space(string)
translate(string, replaced_text, replacement_text)
concat(string1, string2, ...)
The strings these operate on are generally the values of nodes
These may be part of any select expression, but are most commonly used in xsl:value-of.
XPath does not, however, provide full Perl or POSIX regular expressions.
XSLT/XPath 2.0 might add these
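One way to experiment with these string functions outside a style sheet is to evaluate them from Java. The sketch below assumes the javax.xml.xpath package that ships with the JDK (an XPath 1.0 engine); the one-composition document and the class name XPathStringDemo are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathStringDemo {

    // Hypothetical sample data, loosely modeled on the catalog example
    static final String XML =
        "<composition><title>Trio: Dream in D</title><date>1980</date></composition>";

    // Evaluates an XPath 1.0 expression against XML and returns its string value.
    static String eval(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "starts-with(/composition/title, 'Trio')",               // true
            "contains(/composition/title, 'Dream')",                 // true
            "substring(/composition/date, 3, 2)",                    // 80
            "substring-before(/composition/title, ':')",             // Trio
            "substring-after(/composition/title, ': ')",             // Dream in D
            "string-length(/composition/date)",                      // 4
            "normalize-space('  fl,   pn ')",                        // fl, pn
            "translate('fl, pn', 'flpn', 'FLPN')",                   // FL, PN
            "concat(/composition/title, ' (', /composition/date, ')')"
        };
        for (String expression : expressions) {
            System.out.println(expression + " = " + eval(expression));
        }
    }
}
```

Node-sets such as /composition/title are converted to their string values automatically, which is why they can be passed straight to the string functions.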
substring(string, offset, length)
1 is the first character
length is optional
Node-sets are automatically converted to their values
<xsl:template match="composition">
<h3><xsl:number value="position()"/>.
<a name="{generate-id()}">
<xsl:value-of select="title"/>
</a>
</h3>
<ul>
<xsl:if test="date">
<!--not Y10K safe! -->
<li><xsl:value-of select="substring(date,3,2)"/></li>
</xsl:if>
<xsl:if test="length">
<li><xsl:value-of select="length"/></li>
</xsl:if>
<xsl:if test="instruments">
<li><xsl:value-of select="instruments"/></li>
</xsl:if>
<xsl:if test="publisher">
<li><xsl:value-of select="publisher"/></li>
</xsl:if>
</ul>
<p><xsl:apply-templates select="description"/></p>
</xsl:template>
<html> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <ul> <li><a href="#d1e139">Charmonium</a></li> <li><a href="#d1e197">Dr. Blood's Mermaid Lullaby</a></li> <li><a href="#d1e161">Invention for Flute and Piano</a></li> <li><a href="#d1e178">Little Trio</a></li> <li><a href="#d1e243">Propos II</a></li> <li><a href="#d1e263">Rictus En Mirroir</a></li> <li><a href="#d1e102">Trio for Flute, Viola and Harp</a></li> <li><a href="#d1e216">Trio: Dream in D</a></li> </ul> <h2> Beth Anderson </h2> <h3>1. <a name="d1e197">Dr. Blood's Mermaid Lullaby</a></h3> <ul> <li>80</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>2. <a name="d1e178">Little Trio</a></h3> <ul> <li>84</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>3. <a name="d1e216">Trio: Dream in D</a></h3> <ul> <li>80</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> </ul> <p> <description> <p>Rhapsodic. Passionate. Available on CD <cite> <a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr=1-2/" shape="rect"> Two by Three</a></cite> from North/South Consonance (1998). </p> </description> </p> <h2> Linda Bouchard </h2> <h3>1. <a name="d1e243">Propos II</a></h3> <ul> <li>85</li> <li>11'</li> <li>2 tpt</li> </ul> <p> <description> <p>Arrangement from Propos</p> </description> </p> <h3>2. <a name="d1e263">Rictus En Mirroir</a></h3> <ul> <li>85</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> </ul> <p></p> <h2> Julie Mandel </h2> <h3>1. <a name="d1e161">Invention for Flute and Piano</a></h3> <ul> <li>94</li> <li>fl, pn</li> </ul> <p> <description> <p>3 movements</p> </description> </p> <h3>2. 
<a name="d1e102">Trio for Flute, Viola and Harp</a></h3> <ul> <li>94</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> <description> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : </p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 3: 4:26</li> </ul> </description> </p> <h2> Margaret De Wys </h2> <h3>1. <a name="d1e139">Charmonium</a></h3> <ul> <li>91</li> <li>9'</li> <li>2 vln, vla, vc</li> </ul> <p> <description> <p>Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. </p> </description> </p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br><a href="http://www.elharo.com/"> Elliotte Rusty Harold </a><br><a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a></body> </html>
XPath has several operators for doing arithmetic:
+
-
*
div
mod
These may be part of any select expression, but are most commonly used in predicates with comparison operators.
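To see the arithmetic operators at work, including inside predicates, the expressions can be evaluated from Java. This is a sketch under assumptions: it uses the JDK's javax.xml.xpath engine, and the catalog with minutes attributes as well as the class name XPathArithmeticDemo are invented for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathArithmeticDemo {

    // Hypothetical catalog: four compositions with lengths in whole minutes
    static final String XML =
        "<catalog>"
      + "<composition minutes='4'/><composition minutes='3'/>"
      + "<composition minutes='10'/><composition minutes='14'/>"
      + "</catalog>";

    // Evaluates an XPath 1.0 expression against XML and returns its string value.
    static String eval(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "2 + 3 * 4",                                          // 14
            "7 div 2",                                            // 3.5 (div, because / is a path separator)
            "7 mod 2",                                            // 1
            "count(/catalog/composition[@minutes mod 2 = 0])",    // 3 even lengths
            "count(/catalog/composition[position() mod 2 = 1])",  // 2 odd positions
            "sum(/catalog/composition/@minutes) div count(/catalog/composition)" // 7.75
        };
        for (String expression : expressions) {
            System.out.println(expression + " = " + eval(expression));
        }
    }
}
```

Put white space around the minus sign when subtracting: a hyphen is a legal name character in XML, so an expression such as minutes-1 may be read as a single element name.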
XPath and XSLT together provide five functions that operate on numbers:
floor(number)
returns the greatest integer less than or equal to the number
ceiling(number)
returns the smallest integer greater than or equal to the number
round(number)
rounds the number to the nearest integer
sum(node-set)
returns the sum of the numeric values of the nodes in the node-set
format-number(number, format-string)
an XSLT function rather than a core XPath function; returns the string form of a number formatted according to the specified format-string as if by Java 1.1's java.text.DecimalFormat class
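The number functions can be tried from Java as well. The sketch below makes two assumptions: the core functions are evaluated with the JDK's javax.xml.xpath engine, and, because format-number() belongs to XSLT rather than XPath, its format strings are illustrated with java.text.DecimalFormat directly, the class its pattern syntax is modeled on. The sample data and the class name NumberFunctionsDemo are hypothetical.

```java
import java.io.StringReader;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class NumberFunctionsDemo {

    // Hypothetical sample data: three lengths in minutes
    static final String XML =
        "<catalog><minutes>4</minutes><minutes>3</minutes><minutes>10</minutes></catalog>";

    // Evaluates an XPath 1.0 expression against XML and returns its string value.
    static String eval(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Formats a number with a DecimalFormat pattern, using fixed US symbols
    // so that the result does not depend on the default locale.
    static String format(String pattern, double value) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(Locale.US);
        return new DecimalFormat(pattern, symbols).format(value);
    }

    public static void main(String[] args) {
        System.out.println(eval("floor(13.63)"));          // 13
        System.out.println(eval("ceiling(13.63)"));        // 14
        System.out.println(eval("round(13.63)"));          // 14
        System.out.println(eval("sum(/catalog/minutes)")); // 17
        System.out.println(format("#,##0.00", 1234.5));    // 1,234.50
        System.out.println(format("000", 7));              // 007
    }
}
```

Note that sum() takes a node-set, not a list of numbers: it converts each node's string value to a number and adds the results.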
There are three primary ways XML documents are transformed into other formats, such as HTML, with an XSLT style sheet:
The XML document and associated style sheet are both served to the client (Web browser), which then transforms the document as specified by the style sheet and presents it to the user.
The server applies an XSL style sheet to an XML document to transform it to some other format (generally HTML) and sends the transformed document to the client (Web browser).
A third program transforms the original XML document into some other format (often HTML) before the document is placed on the server. Both server and client only deal with the post-transform document.
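The second and third approaches both come down to running an XSLT processor yourself, either on the server per request or once in a batch before publishing. A minimal sketch of that step in Java follows, assuming the javax.xml.transform API bundled with the JDK; the tiny catalog, the stylesheet, and the class name BatchTransform are placeholders rather than the example files from these notes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class BatchTransform {

    // Applies the style sheet text to the document text and returns the result.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input document and style sheet
        String xml =
            "<catalog><composition><title>Little Trio</title></composition></catalog>";
        String xslt =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='html'/>"
          + "<xsl:template match='/'>"
          + "<html><body><xsl:apply-templates select='catalog/composition'/></body></html>"
          + "</xsl:template>"
          + "<xsl:template match='composition'><h3><xsl:value-of select='title'/></h3></xsl:template>"
          + "</xsl:stylesheet>";
        // The HTML printed here is what would be sent to, or stored for, the client
        System.out.println(transform(xml, xslt));
    }
}
```

In a real deployment the StreamSource and StreamResult objects would more likely wrap files or a servlet's output stream than strings; strings are used here only to keep the sketch self-contained.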
Place an xml-stylesheet processing instruction in the prolog immediately after the XML declaration (if any) and before the document type declaration (if any).
This processing instruction should have a type attribute with the value text/xml and an href attribute whose value is an absolute or relative URL pointing to the style sheet.
<?xml version="1.0"?> <?xml-stylesheet type="text/xml" href="compositions.xsl"?>
Eventually application/xslt+xml will replace text/xml.
IE uses the non-existent MIME media type text/xsl instead.
This is also how you attach a CSS style sheet to a document. The only difference here is that the type attribute has the value text/xml instead of text/css.
<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl"> <xsl:template match="/"> <html> <xsl:apply-templates select="catalog"/> </html> </xsl:template> <xsl:template match="catalog"> <body> <xsl:apply-templates select="composition"/> </body> </xsl:template> <xsl:template match="composition"> <h3><xsl:value-of select="name"/></h3> </xsl:template> </xsl:stylesheet>
Many more ways to select and match elements including descendants, attributes, comments, processing instructions, and text.
Many more tests for predicates
The xsl:element, xsl:attribute, xsl:processing-instruction, xsl:comment, and xsl:text elements can output elements, attributes, processing instructions, comments, and text calculated from data in the input document.
The xsl:copy-of element to copy nodes from the input to the output with their contents intact
Parameters for passing arguments to templates
Modes for reprocessing the same element in a different fashion
Recursion
The xsl:variable element defines named constants that can clarify your code.
Named templates, variables, and attribute sets help you reuse common template code.
The xsl:choose and xsl:when elements let you select one of several possibilities depending on a condition.
The xsl:import and xsl:include elements merge rules from different style sheets.
Various attributes of the xsl:output element allow you to specify the output document's format, XML declaration, document type declaration, indentation, encoding and MIME type.
Extension functions written in other languages like Java, JavaScript, and C++
Extension elements written in other languages like Java, JavaScript, and C++
A general regular expression language
Non-final variables (hence no side effects)
Loops
The Extensible Stylesheet Language (XSL) comprises two separate XML applications for transforming and formatting XML documents.
An XSL transformation applies rules to a tree read from an XML document to transform it into an output tree written as an XML document.
An XSL template rule is an xsl:template element with a match attribute. Nodes in the input tree are compared against the patterns of the match attributes of the different template elements. When a match is found, the contents of the template are output.
The value of a node is a pure text (no markup) string containing the contents of the node. This can be calculated by the xsl:value-of element.
The xsl:apply-templates element continues processing the children of the current node.
The xsl:if element produces output if, and only if, its test attribute is true.
The xsl:number element inserts the number specified by its value attribute into the output using a specified number format given by the format attribute.
The xsl:sort element can reorder the input nodes before copying them to the output.
The XML Bible, 2nd edition
Elliotte Rusty Harold
IDG Books, 2001
ISBN: 0-7645-4760-7
Chapter 17, XSLT: http://www.cafeconleche.org/books/bible2/chapters/ch18.html
XML in a Nutshell
Elliotte Rusty Harold and W. Scott Means
O'Reilly & Associates, 2001
ISBN 0-596-00058-8
Chapter 9 XPath: http://www.oreilly.com/catalog/xmlnut/chapter/ch09.html
XSL Transformations 1.0 Specification: http://www.w3.org/TR/xslt/
XPath Specification: http://www.w3.org/TR/xpath
until someone comes up with a language with simple SELECT-DELETE-UPDATE semantics, XML databases will be the OODBMSs of the new millenium. I remember being introduced to XML and thinking that the concepts behind relational databases were more complex than those behind the hierarchical structures that encompass XML, amazingly enough the W3C has proved me wrong by producing increasingly complex languages that supposedly deal with handling XML in databases yet have much less functionality than a simple language like SQL.
--Dare Obasanjo on the xml-dev mailing list
XML is not a database, or a replacement for one.
Out of the box, XML doesn't know anything about SQL; SQL doesn't know anything about XML.
But XML works very well with databases
You generally have to provide the integration code yourself, but this is not hard.
There are some third-party products and database add-ons that will do a lot of the work for you.
Two approaches:
Store the document as a field
Store pieces of the document as fields
There are a lot of products that will pull data out of a database and format it as XML:
Stonebroom's ASP2XML
Transparency's Beanstalk
IBM's DatabaseDom
infoteria's iConnector
IBM's DB2 XML Extender
FileMaker Web Companion
Oracle XML SQL Utility for Java
etc.
Or roll your own:
Java + JDBC
PHP
ASP
Visual Basic
etc.
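As a rough idea of what rolling your own looks like in Java, the sketch below turns already-fetched rows into an XML document. It rests on several assumptions: a list of maps stands in for a JDBC ResultSet so that the example runs without a database driver, the element names (compositions, composition, title, year) are invented, and column names are assumed to be legal XML names. It builds the output with the JDK's DOM and javax.xml.transform classes so that characters such as & are escaped automatically.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class RowsToXml {

    // One element per row, one child element per column.
    static String toXml(String rootName, String rowName, List<Map<String, String>> rows) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element root = doc.createElement(rootName);
            doc.appendChild(root);
            for (Map<String, String> row : rows) {
                Element record = doc.createElement(rowName);
                root.appendChild(record);
                for (Map.Entry<String, String> column : row.entrySet()) {
                    // Column name becomes the element name (assumed to be a legal XML name)
                    Element field = doc.createElement(column.getKey());
                    field.appendChild(doc.createTextNode(column.getValue()));
                    record.appendChild(field);
                }
            }
            // An identity transform serializes the DOM tree as text
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for one row of a query result
        Map<String, String> row = new LinkedHashMap<>();
        row.put("title", "Trio for Flute & Harp");
        row.put("year", "1994");
        System.out.println(toXml("compositions", "composition", Arrays.asList(row)));
    }
}
```

With a real database, the inner loop would typically walk a java.sql.ResultSet and its ResultSetMetaData instead of a map; the XML-building part stays the same.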
XML is a very convenient format to move data between systems
It's all text; very platform independent.
Field and record boundaries are very clear
Schemas and DTDs can prevent you from accepting bad data
XML (1.0 or 1.1 as presently conceived) is far from perfect as a serialization format for objects or binary data. BUT once the data is in XML, it is (in principle) liberated from the application or class definitions that produced it. One might think of some SOAP message as a kludgy serialization of some business object, but for others it's an XML "document" that they can whack on with XPath/XQuery/XSLT/SAX/DOM/RDF/godonlyknowswhat. THAT's the real power of XML as an object serialization format, and this totally overwhelms its limitations ... at least today. If someday there are cheap, ubiquitous ASN.1 tools for parsing, transformation, manipulation, display, and querying, then this advantage of XML goes away, and we'll be arguing about this on ASN-DEV or whatever.
--Mike Champion on the xml-dev mailing list
Java works best
C, Perl, Python etc. can also be used
Unicode support is the biggest issue
SAX
DOM
JDOM
dom4j
Parser specific APIs
Public domain, developed on xml-dev mailing list
Maintained by David Megginson
org.xml.sax package
Parser independent; programs can plug in different parsers
Event based; the parser pushes data to your handler
Read-only
SAX omits DTD declarations
Adds:
Namespace support
Optional validation
Optional lexical events for comments, CDATA sections, entity references
A lot more configurable
Deprecates a lot of SAX1
Adapter classes convert between SAX2 and SAX1 parsers.
Build a parser-specific implementation of the XMLReader interface using XMLReaderFactory
Your code registers a ContentHandler with the parser
An InputSource feeds the document into the parser
As the document is read, the parser calls back to the methods of the ContentHandler to tell it what it's seeing in the document.
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;

public class SAX2Checker {

  public static void main(String[] args) {

    if (args.length == 0) {
      System.out.println("Usage: java SAX2Checker URL1 URL2...");
      return;
    }

    // set up the parser
    XMLReader parser;
    try {
      parser = XMLReaderFactory.createXMLReader();
    }
    catch (SAXException e) {
      // no default parser configured; fall back to Xerces by name
      try {
        parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
      }
      catch (SAXException e2) {
        System.err.println("Error: could not locate a parser.");
        return;
      }
    }

    // start parsing...
    for (int i = 0; i < args.length; i++) {

      // command line should offer URIs or file names
      try {
        parser.parse(args[i]);
        // If there are no well-formedness errors
        // then no exception is thrown
        System.out.println(args[i] + " is well formed.");
      }
      catch (SAXParseException e) { // well-formedness error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage()
         + " at line " + e.getLineNumber()
         + ", column " + e.getColumnNumber());
      }
      catch (SAXException e) { // some other kind of error
        System.out.println(e.getMessage());
      }
      catch (IOException e) {
        System.out.println("Could not check " + args[i]
         + " because of the IOException " + e);
      }

    }

  }

}
package org.xml.sax;

public interface ContentHandler {

  public void setDocumentLocator(Locator locator);
  public void startDocument() throws SAXException;
  public void endDocument() throws SAXException;
  public void startPrefixMapping(String prefix, String uri)
   throws SAXException;
  public void endPrefixMapping(String prefix) throws SAXException;
  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes atts) throws SAXException;
  public void endElement(String namespaceURI, String localName,
   String rawName) throws SAXException;
  public void characters(char[] text, int start, int length)
   throws SAXException;
  public void ignorableWhitespace(char[] text, int start, int length)
   throws SAXException;
  public void processingInstruction(String target, String data)
   throws SAXException;
  public void skippedEntity(String name) throws SAXException;

}
import org.apache.xerces.parsers.*;
import org.xml.sax.*;
import java.io.IOException;
import java.util.StringTokenizer;

public class SAXWordCount implements ContentHandler {

  private int numWords;

  public void startDocument() throws SAXException {
    this.numWords = 0;
  }

  public void endDocument() throws SAXException {
    System.out.println(numWords + " words");
    System.out.flush();
  }

  // accumulates text between markup, since characters() may be called
  // several times for one run of text
  private StringBuffer sb = new StringBuffer();

  public void characters(char[] text, int start, int length)
   throws SAXException {
    sb.append(text, start, length);
  }

  private void flush() {
    numWords += countWords(sb.toString());
    sb = new StringBuffer();
  }

  // methods that signify a word break
  public void startElement(String namespaceURI, String localName,
   String rawName, Attributes atts) throws SAXException {
    this.flush();
  }

  public void endElement(String namespaceURI, String localName,
   String rawName) throws SAXException {
    this.flush();
  }

  public void processingInstruction(String target, String data)
   throws SAXException {
    this.flush();
  }

  // methods that aren't necessary in this example
  public void startPrefixMapping(String prefix, String uri)
   throws SAXException {
    // ignore;
  }

  public void ignorableWhitespace(char[] text, int start, int length)
   throws SAXException {
    // ignore;
  }

  public void endPrefixMapping(String prefix) throws SAXException {
    // ignore;
  }

  public void skippedEntity(String name) throws SAXException {
    // ignore;
  }

  public void setDocumentLocator(Locator locator) {}

  private static int countWords(String s) {
    if (s == null) return 0;
    s = s.trim();
    if (s.length() == 0) return 0;
    StringTokenizer st = new StringTokenizer(s);
    return st.countTokens();
  }

  public static void main(String[] args) {

    SAXParser parser = new SAXParser();
    SAXWordCount counter = new SAXWordCount();
    parser.setContentHandler(counter);

    for (int i = 0; i < args.length; i++) {
      try {
        parser.parse(args[i]);
      }
      catch (SAXException e) {
        System.err.println(e);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }

  } // end main

}
% java SAXWordCount hotcop.xml
16 words
You do not always have all the information you need at the time of a given callback
You may need to store information in various data structures (stacks, queues, vectors, arrays, etc.) and act on it at a later point
For example, the characters() method is not guaranteed to give you the maximum number of contiguous characters. It may split a single run of characters over multiple method calls.
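The usual remedy is to buffer text between a start tag and its end tag and only act once the end tag arrives. The following sketch illustrates that pattern under stated assumptions: it uses the JAXP SAXParserFactory and DefaultHandler classes from the JDK rather than the Xerces classes used above, and the class name TitleCollector and the sample document are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class TitleCollector extends DefaultHandler {

    private final List<String> titles = new ArrayList<>();
    private StringBuilder buffer; // non-null only while inside a title element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("title")) buffer = new StringBuilder();
    }

    @Override
    public void characters(char[] text, int start, int length) {
        // May be called several times for one title; just keep appending
        if (buffer != null) buffer.append(text, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("title")) {
            titles.add(buffer.toString()); // the complete text is known only now
            buffer = null;
        }
    }

    // Parses the document text and returns the content of every title element.
    static List<String> collect(String xml) {
        try {
            TitleCollector handler = new TitleCollector();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.titles;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The entity reference makes a split across characters() calls likely
        String xml = "<catalog><composition>"
            + "<title>Dr. Blood&apos;s Mermaid Lullaby</title>"
            + "</composition></catalog>";
        System.out.println(collect(xml)); // [Dr. Blood's Mermaid Lullaby]
    }
}
```

Whether and where a parser splits the text is an implementation detail, so code written this way behaves the same regardless of how many characters() calls it receives.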
Defines how XML and HTML documents are represented as objects in programs
Defined in IDL; thus language independent
HTML as well as XML
Writing as well as reading
More complete than SAX or JDOM; covers everything except internal and external DTD subsets
DOM focuses more on the document; SAX focuses more on the parser.
Parser independent interfaces; parser dependent implementation classes. Most programs must use the parser dependent classes. JAXP helps solve this, but so far only for DOM Level 1.
Everything's a Node:
Extensive use of polymorphism
Lots of casting
Language independence means there's very limited use of the Java class library; Various features are reinvented
Language independence requires no method overloading because not all languages support it.
Several features are poor design in Java, if not in other languages:
Named constants are often shorts
Only one kind of exception; details provided by constants
No Java-specific utility methods like equals(), hashCode(), clone(), or toString()
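A minimal sketch of the single-exception design: every DOM failure arrives as a DOMException, and the caller has to compare its public short code field with named constants to find out what went wrong. The class name DOMExceptionCodes and the helper codeForElementName() are hypothetical, and the document is created through JAXP only to keep the example self-contained.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

// Hypothetical example class; not part of DOM or of the original slides
public class DOMExceptionCodes {

  // Returns the DOMException code raised by createElement(name),
  // or 0 if no exception is thrown (DOM codes start at 1)
  public static short codeForElementName(String name) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
     .newDocumentBuilder().newDocument();
    try {
      doc.createElement(name);
      return 0;
    }
    catch (DOMException e) {
      // There is no subclass to catch; the detail is a short constant
      return e.code;
    }
  }

  public static void main(String[] args) throws Exception {
    short code = codeForElementName("not a name");
    if (code == DOMException.INVALID_CHARACTER_ERR) {
      System.out.println("Illegal character in element name");
    }
    else {
      System.out.println("DOMException code: " + code);
    }
  }

}
```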
DOM Level 0
DOM Level 1, a W3C Standard
DOM Level 2, a W3C Standard
DOM Level 3, several W3C Working Drafts
Eight Modules:
Core: org.w3c.dom *
HTML: org.w3c.dom.html
Views: org.w3c.dom.views
StyleSheets: org.w3c.dom.stylesheets
CSS: org.w3c.dom.css
Events: org.w3c.dom.events *
Traversal: org.w3c.dom.traversal *
Range: org.w3c.dom.range
Only the core and traversal modules really apply to XML. The other six are for HTML.
* indicates Xerces support
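As a rough illustration of what the traversal module adds on top of core, a NodeIterator can visit only the text nodes of a document, with no hand-written recursion. The sketch assumes the parser's Document implementation also implements DocumentTraversal (it checks before casting); the class name TraversalWordCount and its countWords() helper are hypothetical.

```java
import java.io.StringReader;
import java.util.StringTokenizer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of DOM or of the original slides
public class TraversalWordCount {

  public static int countWords(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
     .parse(new InputSource(new StringReader(xml)));
    // Traversal is an optional module, so check before casting
    if (!(doc instanceof DocumentTraversal)) {
      throw new UnsupportedOperationException(
       "This DOM implementation does not support the traversal module");
    }
    DocumentTraversal traversal = (DocumentTraversal) doc;
    // SHOW_TEXT restricts the iteration to text nodes, in document order
    NodeIterator iterator = traversal.createNodeIterator(
     doc, NodeFilter.SHOW_TEXT, null, true);
    int numWords = 0;
    for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
      numWords += new StringTokenizer(n.getNodeValue()).countTokens();
    }
    return numWords;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(
     countWords("<a>one two<b>three</b> four</a>") + " words");
  }

}
```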
Each XML document is a tree.
A tree contains nodes.
Some nodes may contain other nodes (depending on node type).
Each document node contains:
zero or one doctype nodes
one root element node
zero or more comment and processing instruction nodes
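The children of the document node can be listed to see this structure directly. The following is a sketch only: the class name DocumentChildren, the childTypes() helper, and the sample document are all made up for illustration, and the parser is obtained through JAXP.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of DOM or of the original slides
public class DocumentChildren {

  // Names the kind of each direct child of the document node, in order
  public static List<String> childTypes(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
     .parse(new InputSource(new StringReader(xml)));
    List<String> kinds = new ArrayList<String>();
    NodeList children = doc.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      switch (children.item(i).getNodeType()) {
        case Node.DOCUMENT_TYPE_NODE:          kinds.add("doctype"); break;
        case Node.ELEMENT_NODE:                kinds.add("element"); break;
        case Node.COMMENT_NODE:                kinds.add("comment"); break;
        case Node.PROCESSING_INSTRUCTION_NODE: kinds.add("pi");      break;
        default:                               kinds.add("other");   break;
      }
    }
    return kinds;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<?xml-stylesheet type=\"text/css\" href=\"song.css\"?>"
     + "<!DOCTYPE song [<!ELEMENT song (#PCDATA)>]>"
     + "<!-- a comment -->"
     + "<song>Hot Cop</song>";
    System.out.println(childTypes(xml));
  }

}
```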
17 interfaces:
Attr
CDATASection
CharacterData
Comment
Document
DocumentFragment
DocumentType
DOMImplementation
Element
Entity
EntityReference
NamedNodeMap
Node
NodeList
Notation
ProcessingInstruction
Text
plus one exception:
DOMException
Plus a bunch of HTML stuff in org.w3c.dom.html and other packages
Library specific code creates a parser
The parser parses the document and returns an org.w3c.dom.Document object.
The entire document is stored in memory.
DOM methods and interfaces are used to extract data from this object
import org.apache.xerces.parsers.DOMParser;
import org.xml.sax.SAXException;
import java.io.IOException;
import org.w3c.dom.*;

public class DOMChecker {

  public static void main(String[] args) {

    // This is simpler but less flexible than the SAX approach.
    // Perhaps a good creational design pattern is needed here?
    DOMParser parser = new DOMParser();

    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]);
        Document d = parser.getDocument();
        // work with the document...
      }
      catch (SAXException e) {
        System.err.println(e);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }

  }

}
import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.IOException;
import java.util.StringTokenizer;

public class DOMWordCount {

  public static void main(String[] args) {

    DOMParser parser = new DOMParser();
    DOMWordCount counter = new DOMWordCount();

    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]);
        Document d = parser.getDocument();
        int numWords = countWordsInNode(d);
        System.out.println(numWords + " words");
      }
      catch (SAXException e) {
        System.err.println(e);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }

  } // end main

  // note use of recursion
  public static int countWordsInNode(Node node) {

    int numWords = 0;

    if (node.hasChildNodes()) {
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        numWords += countWordsInNode(children.item(i));
      }
    }

    int type = node.getNodeType();
    if (type == Node.TEXT_NODE) {
      String s = node.getNodeValue();
      numWords += countWordsInString(s);
    }
    return numWords;

  }

  private static int countWordsInString(String s) {
    if (s == null) return 0;
    s = s.trim();
    if (s.length() == 0) return 0;
    StringTokenizer st = new StringTokenizer(s);
    return st.countTokens();
  }

}
% java DOMWordCount hotcop.xml
16 words
JAXP 1.1 = SAX2 + DOM2 + TrAX + factory classes
Factory classes are in the javax.xml.parsers package
Bundled with Java 1.4 and later
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.IOException;
import java.util.StringTokenizer;

public class JAXPWordCount {

  public static void main(String[] args) {

    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      DocumentBuilder parser = factory.newDocumentBuilder();
      JAXPWordCount counter = new JAXPWordCount();

      for (int i = 0; i < args.length; i++) {
        try {
          // Read the entire document into memory
          Document d = parser.parse(args[i]);
          int numWords = countWordsInNode(d);
          System.out.println(numWords + " words");
        }
        catch (SAXException e) {
          System.err.println(e);
        }
        catch (IOException e) {
          System.err.println(e);
        }
      } // end for
    } // end try
    catch (ParserConfigurationException e) {
      System.err.println(
       "No parser supporting JAXP could be found in the local class path.");
    }

  } // end main

  // note use of recursion
  public static int countWordsInNode(Node node) {

    int numWords = 0;

    if (node.hasChildNodes()) {
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        numWords += countWordsInNode(children.item(i));
      }
    }

    int type = node.getNodeType();
    if (type == Node.TEXT_NODE) {
      String s = node.getNodeValue();
      numWords += countWordsInString(s);
    }
    return numWords;

  }

  private static int countWordsInString(String s) {
    if (s == null) return 0;
    s = s.trim();
    if (s.length() == 0) return 0;
    StringTokenizer st = new StringTokenizer(s);
    return st.countTokens();
  }

}
% java JAXPWordCount hotcop.xml
16 words
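JAXPWordCount exercises only the DOM factory classes. As a minimal sketch of the TrAX part of JAXP, a Transformer created without a stylesheet performs an identity transform, which is one common way to write a DOM Document back out as text. The class name TrAXSerialize and its serialize() helper are hypothetical, and the exact whitespace and declaration handling of the output can vary between transformer implementations.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of JAXP or of the original slides
public class TrAXSerialize {

  // Parses the XML into a DOM tree, then serializes the tree back to a string
  public static String serialize(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
     .parse(new InputSource(new StringReader(xml)));

    // newTransformer() with no arguments gives an identity transform
    Transformer identity = TransformerFactory.newInstance().newTransformer();
    identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

    StringWriter out = new StringWriter();
    identity.transform(new DOMSource(doc), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(serialize("<song><title>Hot Cop</title></song>"));
  }

}
```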
A more Java-like tree-based API
Parser independent classes sit on top of parsers and other APIs
Construct an org.jdom.input.SAXBuilder or an org.jdom.input.DOMBuilder
Invoke the builder's build() method to build a Document object from a:
Reader
InputStream
URL
File
SYSTEM ID String
If there's a problem building the document, a JDOMException is thrown
Work with the resulting Document object
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;

public class JDOMChecker {

  public static void main(String[] args) {

    if (args.length == 0) {
      System.out.println("Usage: java JDOMChecker URL1 URL2...");
    }

    SAXBuilder builder = new SAXBuilder();

    // start parsing...
    for (int i = 0; i < args.length; i++) {

      // command line should offer URIs or file names
      try {
        builder.build(args[i]);
        // If there are no well-formedness errors,
        // then no exception is thrown
        System.out.println(args[i] + " is well formed.");
      }
      catch (JDOMException e) {
        // indicates a well-formedness or other error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }

    }

  }

}
% java JDOMChecker shortlogs.xml HelloJDOM.java
shortlogs.xml is well formed.
HelloJDOM.java is not well formed.
The markup in the document preceding the root element must be well-formed.: Error on line 1 of XML document: The markup in the document preceding the root element must be well-formed.
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;

public class JDOMWordCount {

  public static void main(String[] args) {

    if (args.length == 0) {
      System.out.println("Usage: java JDOMWordCount URL1 URL2...");
    }

    SAXBuilder builder = new SAXBuilder();

    // start parsing...
    for (int i = 0; i < args.length; i++) {

      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        Element root = doc.getRootElement();
        int numWords = countWordsInElement(root);
        System.out.println(numWords + " words");
      }
      catch (JDOMException e) {
        // indicates a well-formedness or other error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }

    }

  }

  public static int countWordsInElement(Element element) {

    int numWords = 0;

    List children = element.getContent();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Text) {
        numWords += countWordsInString((Text) o);
      }
      else if (o instanceof Element) {
        // note use of recursion
        numWords += countWordsInElement((Element) o);
      }
    }

    return numWords;

  }

  private static int countWordsInString(Text text) {
    if (text == null) return 0;
    String s = text.getText();
    s = s.trim();
    if (s.length() == 0) return 0;
    StringTokenizer st = new StringTokenizer(s);
    return st.countTokens();
  }

}
% java JDOMWordCount hotcop.xml
16 words
This presentation: http://www.cafeconleche.org/slides/sd2002west/introxml
XML in a Nutshell
Elliotte Rusty Harold and W. Scott Means
O'Reilly & Associates, 2001
ISBN 0-596-00058-8
XPath: http://www.oreilly.com/catalog/xmlnut/chapter/ch09.html
XML Bible, second edition
Elliotte Rusty Harold
Hungry Minds, 2001
ISBN 0-7645-4760-7
XSLT: http://www.cafeconleche.org/books/bible2/chapters/18.html
XSL-FO: http://www.cafeconleche.org/books/bible2/chapters/19.html
XLinks: http://www.cafeconleche.org/books/bible2/chapters/20.html
XPointers: http://www.cafeconleche.org/books/bible2/chapters/21.html