Part I: XML Basics
Part II: DTDs and Validity
Part III: Namespaces
Part IV: XSL Transformations
Part V: Programming with XML
Extensible Markup Language
A syntax for documents
A Meta-Markup Language
A Structural and Semantic language, not a formatting language
Not just for Web pages
Not like HTML, troff, LaTeX
Make up the tags you need as you need them
The tags you create can be documented in a Document Type Definition (DTD)
A meta syntax for domain-specific markup languages like MusicML, MathML, and CML
XML documents form a tree
Element and attribute names reflect the kind of the element
Formatting can be added with a style sheet
<dt>Hot Cop <dd> by Jacques Morali, Henri Belolo, and Victor Willis <ul> <li>Producer: Jacques Morali <li>Publisher: PolyGram Records <li>Length: 6:20 <li>Written: 1978 <li>Artist: Village People </ul>
<SONG> <TITLE>Hot Cop</TITLE> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <LENGTH>6:20</LENGTH> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
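The claim that XML documents form a tree can be checked by loading the song into a parser and walking it. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the outline helper and the SONG_XML constant are names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The Hot Cop song document, reindented for readability.
SONG_XML = """<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>"""


def outline(element, depth=0):
    """Return one indented line per element: its name and any text it holds."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


song = ET.fromstring(SONG_XML)

# Element names say what the content is, so selecting by name is enough.
composers = [composer.text for composer in song.findall("COMPOSER")]

print("\n".join(outline(song)))
print(composers)
```

Because the element names describe the content rather than its appearance, the same lookup keeps working whatever style sheet is attached later.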
SONG {display: block; font-family: New York, Times New Roman, serif}
TITLE {display: block; font-size: 24pt; font-weight: bold; font-family: Helvetica, sans-serif}
COMPOSER {display: block}
PRODUCER {display: block}
YEAR {display: block}
PUBLISHER {display: block}
LENGTH {display: block}
ARTIST {display: block; font-style: italic}
<?xml-stylesheet type="text/css" href="song1.css"?>
<?xml-stylesheet type="text/css" href="song.css"?> <SONG> <TITLE>Hot Cop</TITLE> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <LENGTH>6:20</LENGTH> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="SONG"> <html> <body> <h1> <xsl:value-of select="TITLE"/> by the <xsl:value-of select="ARTIST"/> </h1> <ul> <xsl:apply-templates select="COMPOSER"/> <li>Publisher: <xsl:value-of select="PUBLISHER"/></li> <li>Year: <xsl:value-of select="YEAR"/></li> <li>Producer: <xsl:value-of select="PRODUCER"/></li> </ul> </body> </html> </xsl:template> <xsl:template match="COMPOSER"> <li>Composer: <xsl:value-of select="."/></li> </xsl:template> </xsl:stylesheet>
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="SONG">
<html>
<body>
<h1>
<xsl:value-of select="TITLE"/>
</h1>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Domain-Specific Markup Languages
Self-Describing Data
Interchange of Data Among Applications
Structured and Integrated Data
Non proprietary format
Don't pay for what you don't use
Much data is lost due to format problems
XML is very simple
XML is self-describing
XML is well documented
<PERSON ID="p1100" SEX="M">
<NAME>
<GIVEN>Judson</GIVEN>
<SURNAME>McDaniel</SURNAME>
</NAME>
<BIRTH>
<DATE>21 Feb 1834</DATE>
</BIRTH>
<DEATH>
<DATE>9 Dec 1905</DATE>
</DEATH>
</PERSON>
E-commerce
Syndication
EAI and EDI
A document can be assembled from multiple physical storage entities
These may be files, database queries, or anything that can be referred to by a URI
Can even include non-XML content
A specific markup language that uses the XML meta-syntax is called an XML application
Different XML applications have their own more constricted syntaxes and vocabularies within the broader XML syntax
Further syntax can be layered on top of this; e.g. data typing through schemas
Web Pages
Mathematical Equations
Music Notation
Vector Graphics
Metadata
and more...
<?xml version="1.0"?> <html xmlns="http://www.w3.org/TR/REC-html40" xmlns:m="http://www.w3.org/TR/REC-MathML/" > <head> <title>Fiat Lux</title> <meta name="GENERATOR" content="amaya V1.3b" /> </head> <body> <P> And God said, </P> <math> <m:mrow> <m:msub> <m:mi>δ</m:mi> <m:mi>α</m:mi> </m:msub> <m:msup> <m:mi>F</m:mi> <m:mi>αβ</m:mi> </m:msup> <m:mi></m:mi> <m:mo>=</m:mo> <m:mi></m:mi> <m:mfrac> <m:mrow> <m:mn>4</m:mn> <m:mi>π</m:mi> </m:mrow> <m:mi>c</m:mi> </m:mfrac> <m:mi></m:mi> <m:msup> <m:mi>J</m:mi> <m:mrow> <m:mi>β</m:mi> <m:mo></m:mo> </m:mrow> </m:msup> </m:mrow> </math> <P> and there was light </P> </body> </html>
<?xml version="1.0"?>
<CHANNEL HREF="http://www.ibiblio.org/xml/index.html">
<TITLE>Cafe con Leche</TITLE>
<ITEM HREF="http://www.ibiblio.org/xml/books.html">
<TITLE>Books about XML</TITLE>
</ITEM>
<ITEM HREF="http://www.ibiblio.org/xml/tradeshows.html">
<TITLE>Trade shows and conferences about XML</TITLE>
</ITEM>
<ITEM HREF="http://www.ibiblio.org/xml/lists.htm">
<TITLE>Mailing Lists dedicated to XML</TITLE>
</ITEM>
</CHANNEL>
Joseph Conrad's Heart of Darkness
The entire Gutenberg Project
Vector Markup Language (VML)
Internet Explorer 5.0
Microsoft Office 2000
Scalable Vector Graphics (SVG)
Meta-data
Dublin Core
Better Web searching
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/DC/">
<rdf:Description about="http://www.ibiblio.org/xml/">
<dc:CREATOR>Elliotte Rusty Harold</dc:CREATOR>
<dc:TITLE>Cafe con Leche</dc:TITLE>
</rdf:Description>
</rdf:RDF>
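A program reading this metadata has to match elements by namespace URI, not by prefix. The sketch below uses Python's standard xml.etree.ElementTree; the NS mapping and the metadata dictionary are illustrative names, and the document string is the Dublin Core example with its attribute quotes closed.

```python
import xml.etree.ElementTree as ET

RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/DC/">
  <rdf:Description about="http://www.ibiblio.org/xml/">
    <dc:CREATOR>Elliotte Rusty Harold</dc:CREATOR>
    <dc:TITLE>Cafe con Leche</dc:TITLE>
  </rdf:Description>
</rdf:RDF>"""

# Prefixes used in the search paths below. Only the URIs have to agree
# with the document; the prefixes themselves are local to this script.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/DC/",
}

root = ET.fromstring(RDF_XML)
description = root.find("rdf:Description", NS)

metadata = {
    "about": description.get("about"),
    "creator": description.findtext("dc:CREATOR", namespaces=NS),
    "title": description.findtext("dc:TITLE", namespaces=NS),
}
print(metadata)
```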
XSL: The Extensible Stylesheet Language
XLink: The Extensible Linking Language
XSL Transformations
XSL Formatting Objects
Data Typing in XML is Weak
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"> <xsd:annotation> <xsd:documentation> Song schema for XML and Java Example at SD2000 East Copyright 2000 Elliotte Rusty Harold. </xsd:documentation> </xsd:annotation> <xsd:element name="SONG" type="songType"/> <xsd:complexType name="songType" content="elementOnly"> <xsd:element name="TITLE" type="xsd:string"/> <xsd:element name="PHOTO" type="photoType" minOccurs="0" maxOccurs="1"/> <xsd:element name="COMPOSER" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/> <xsd:element name="PRODUCER" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/> <xsd:element ref="PUBLISHER" minOccurs="0" maxOccurs="1"/> <xsd:element name="LENGTH" type="xsd:string"/> <xsd:element name="YEAR" type="xsd:string"/> <xsd:element name="ARTIST" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/> <xsd:attribute name="xmlns" type="xsd:NMTOKEN" use="fixed" value="http://ibiblio.org/xml/namespace/song"/> <xsd:attribute name="xmlns:xlink" type="xsd:NMTOKEN" use="fixed" value="http://www.w3.org/1999/xlink"/> </xsd:complexType> <xsd:complexType name="photoType" content="empty"> <xsd:attribute name="xlink:type" type="xsd:NMTOKEN" use="fixed" value="simple"/> <xsd:attribute name="xlink:show" type="xsd:NMTOKEN" use="fixed" value="embed"/> <xsd:attribute name="xlink:href" type="xsd:uri-reference" use="fixed" value="simple"/> <xsd:attribute name="ALT" type="xsd:string"/> <xsd:attribute name="WIDTH" type="xsd:positive-integer"/> <xsd:attribute name="HEIGHT" type="xsd:positive-integer"/> </xsd:complexType> <xsd:element name="PUBLISHER"> <xsd:complexType base='xsd:string' derivedBy='extension' content="textOnly"> <xsd:attribute name="xlink:type" type="xsd:NMTOKEN" use="fixed" value="simple"/> <xsd:attribute name="xlink:href" type="xsd:uri-reference" /> </xsd:complexType> </xsd:element> </xsd:schema>
Any element can be a link
Links can be bi-directional
Links can be separated from the documents they connect
<footnote xlink:type="simple" xlink:href="footnote7.xml">7</footnote>
Microsoft Office 2000
Netscape What's Related
Plain ASCII or UTF-8 text
.xml is standard file extension
Any standard text editor will work
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?> <FOO> Hello XML! </FOO>
version attribute: required; always has the value 1.0
standalone attribute: yes or no
encoding attribute: UTF-8, ISO-8859-1, etc.
Start tag <FOO>
Contents "Hello XML!"
End tag </FOO>
Examine the data
Design a vocabulary for the data
Write a style sheet
XML documents are trees.
XML elements contain other elements as well as text
Within these limits there's more than one way to organize the data
Hierarchically
Relationally
Objects
The catalog?
A custom Document element?
Choose catalog for the root element
Everything else will be a descendant of catalog
This is not the only possible choice
<?xml version="1.0" encoding="ISO-8859-1"?> <catalog> Everything else will go here... </catalog>
Composers?
Songs/Compositions?
Categories?
All of the Above?
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<category>
Small chamber ensembles - 2-4 Players by New York Women Composers
</category>
</catalog>
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog><category>Small chamber ensembles
- 2-4 Players by New York Women Composers</category></catalog>
Each composer has a name
<?xml version="1.0" encoding="ISO-8859-1"?> <catalog> <category> Small chamber ensembles - 2-4 Players by New York Women Composers </category> <composer> <name>Julie Mandel</name> </composer> <composer> <name>Margaret De Wys</name> </composer> <composer> <name>Beth Anderson</name> </composer> <composer> <name>Linda Bouchard</name> </composer> </catalog>
It's better for sorting to divide names into first, middle, and last
Some (e.g. middle name) elements may be empty
<?xml version="1.0" encoding="ISO-8859-1"?> <catalog> <category> Small chamber ensembles - 2-4 Players by New York Women Composers </category> <composer> <name> <first_name>Julie</first_name> <middle_name></middle_name> <last_name>Mandel</last_name> </name> </composer> <composer> <name> <first_name>Margaret</first_name> <middle_name>De</middle_name> <last_name>Wys</last_name> </name> </composer> <composer> <name> <first_name>Beth</first_name> <middle_name></middle_name> <last_name>Anderson</last_name> </name> </composer> <composer> <name> <first_name>Linda</first_name> <middle_name></middle_name> <last_name>Bouchard</last_name> </name> </composer> </catalog>
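Splitting names into parts pays off as soon as a program needs to order them. The following sketch, assuming Python's standard xml.etree.ElementTree, sorts the four composers by last name and then first name; sort_key and display_name are helper names invented for this example.

```python
import xml.etree.ElementTree as ET

CATALOG_XML = """<catalog>
  <composer><name><first_name>Julie</first_name>
    <middle_name></middle_name><last_name>Mandel</last_name></name></composer>
  <composer><name><first_name>Margaret</first_name>
    <middle_name>De</middle_name><last_name>Wys</last_name></name></composer>
  <composer><name><first_name>Beth</first_name>
    <middle_name></middle_name><last_name>Anderson</last_name></name></composer>
  <composer><name><first_name>Linda</first_name>
    <middle_name></middle_name><last_name>Bouchard</last_name></name></composer>
</catalog>"""


def sort_key(composer):
    """Order by last name, then first name."""
    name = composer.find("name")
    return (name.findtext("last_name", ""), name.findtext("first_name", ""))


def display_name(composer):
    """Join the non-empty name parts; an empty middle_name is skipped."""
    name = composer.find("name")
    parts = [name.findtext(tag, "")
             for tag in ("first_name", "middle_name", "last_name")]
    return " ".join(part for part in parts if part)


catalog = ET.fromstring(CATALOG_XML)
ordered = [display_name(c) for c in sorted(catalog.findall("composer"), key=sort_key)]
print(ordered)
```

With a single name element holding "Julie Mandel", the same sort would need to guess where the last name starts.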
Some people have the same names
Use an ID number to disambiguate
Store the ID number in an id attribute
name=value
An element may not have two attributes with the same name
Attribute values must be quoted
<?xml version="1.0" encoding="ISO-8859-1"?> <catalog> <category> Small chamber ensembles - 2-4 Players by New York Women Composers </category> <composer id="c1"> <name> <first_name>Julie</first_name> <middle_name></middle_name> <last_name>Mandel</last_name> </name> </composer> <composer id="c2"> <name> <first_name>Margaret</first_name> <middle_name>De</middle_name> <last_name>Wys</last_name> </name> </composer> <composer id="c3"> <name> <first_name>Beth</first_name> <middle_name></middle_name> <last_name>Anderson</last_name> </name> </composer> <composer id="c4"> <name> <first_name>Linda</first_name> <middle_name></middle_name> <last_name>Bouchard</last_name> </name> </composer> </catalog>
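An id attribute gives programs a stable handle on each composer. The sketch below, assuming Python's standard xml.etree.ElementTree, builds a lookup table keyed on id and shows that the parser rejects an element with two attributes of the same name; by_id and the shortened catalog are illustrative.

```python
import xml.etree.ElementTree as ET

CATALOG_XML = """<catalog>
  <composer id="c1"><name><last_name>Mandel</last_name></name></composer>
  <composer id="c2"><name><last_name>Wys</last_name></name></composer>
  <composer id="c3"><name><last_name>Anderson</last_name></name></composer>
  <composer id="c4"><name><last_name>Bouchard</last_name></name></composer>
</catalog>"""

catalog = ET.fromstring(CATALOG_XML)

# Index composers by their id attribute for direct lookup.
by_id = {composer.get("id"): composer for composer in catalog.findall("composer")}
print(by_id["c3"].findtext("name/last_name"))

# An element may not carry two attributes with the same name:
# the parser reports this as a well-formedness error.
try:
    ET.fromstring('<composer id="c1" id="c2"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True
print(duplicate_rejected)
```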
Attributes are for meta-data; elements are for data.
Does the reader want to see the information? If yes, use element content; if no, use attributes
Attributes are good for ID numbers, URLs, references, and other information not directly relevant to the reader
Attributes can't hold structure well.
Elements allow you to include meta-meta-data (information about the information about the information).
Not everyone always agrees on what is and isn't meta-data.
Elements are more extensible in the face of future changes.
Let's look at an example of what we want:
Rendered HTML:
Tonal. Commissioned/Premiered by the Redlands' New Music Ensemble. (A swale is a meadow or a marsh where a lot of wild plants grow together. The composer discovered the word when a horse named Swale won the Kentucky Derby several years ago. Since her work is primarily collage of newly composed musical swatches, she has used the name extensively.) ACA - American Composers Alliance
Or in HTML:
<dt><cite>Brass Swale</cite> (1988) 5", tbn, 2 Bfl tpts, bar. hn</dt>
<dd><p>
Tonal. Commissioned/Premiered by the Redlands' New Music
Ensemble. (A swale is a meadow or a marsh where a lot of
wild plants grow together. The composer discovered the word
when a horse named Swale won the Kentucky Derby several
years ago. Since her work is primarily collage of newly
composed musical swatches, she has used the name
extensively.) ACA - American Composers
Alliance</p>
</dd>
Title
Date
Description
List of instruments
Length
Publisher
Some pieces may be missing from some compositions
<composition>
<title>Brass Swale</title>
<date>1988</date>
<length>5"</length>
<instruments>tbn, 2 Bfl tpts, bar, hn</instruments>
<description>
Tonal. Commissioned/Premiered by the Redlands' New Music
Ensemble. (A swale is a meadow or a marsh where a lot of
wild plants grow together. The composer discovered the word
when a horse named Swale won the Kentucky Derby several
years ago. Since her work is primarily collage of newly
composed musical swatches, she has used the name
extensively.)
</description>
<publisher>ACA - American Composers Alliance</publisher>
</composition>
<composition>
<title>Trio for Flute, Viola and Harp</title>
<date><year>1994</year></date>
<length>13'38"</length>
<instruments>fl, hp, vla</instruments>
<description>
<p>Premiered at Queens College in April, 1996 by Sue Ann Kahn,
Christine Ims, and Susan Jolles. In 3 movements :</p>
<ul>
<li>mvt. 1: 5:01</li>
<li>mvt. 2: 4:11</li>
<li>mvt. 3: 4:26</li>
</ul>
</description>
<publisher>Theodore Presser</publisher>
</composition>
<composition composer="c3">
<title>Trio: Dream in D</title>
<date><year>1980</year></date>
<length>10'</length>
<instruments>fl, pn, vc, or vn, pn, vc</instruments>
<description>
Rhapsodic. Passionate. Available on CD
<cite><a href=
"http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr%3D1-2/">
Two by Three
</a></cite> from North/South Consonance (1998).
</description>
<publisher></publisher>
</composition>
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<category>
Small chamber ensembles - 2-4 Players by New York Women Composers
</category>
<cataloging_info>
<abstract>Compositions by the members of New York Women Composers</abstract>
<keyword>music publishing</keyword>
<keyword>scores</keyword>
<keyword>women composers</keyword>
<keyword>New York</keyword>
</cataloging_info>
<composer id="c1">
<name>
<first_name>Julie</first_name>
<middle_name></middle_name>
<last_name>Mandel</last_name>
</name>
</composer>
...
</catalog>
Copyright notice
Name of maintainer
Email address of maintainer
Last modified date
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
...
<last_updated>July 28, 1999</last_updated>
<copyright>1999 New York Women Composers</copyright>
<maintainer email="elharo@metalab.unc.edu"
url="http://www.macfaq.com/personal.html">
<name>
<first_name>Elliotte</first_name>
<middle_name>Rusty</middle_name>
<last_name>Harold</last_name>
</name>
</maintainer>
</catalog>
Partially supported by Mozilla, IE 5.0, and Opera 4.0
Full W3C Recommendation
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/css" href="compositions1.css"?>
<catalog>
...
</catalog>
Not every element needs a rule
The root element should be at least display: block
catalog { font-family: "New York", "Times New Roman", serif;
font-size: 14pt;
background-color: white;
color: black;
display: block }
Make it look like an H1 heading
category { display: block;
font-family: Helvetica, Arial, sans-serif;
font-size: 32pt;
font-weight: bold;
text-align: center
}
catalog { font-family: New York, Times New Roman, serif;
font-size: 14pt;
background-color: white;
color: black;
display: block
}
Make it look like a level 2 head
No need to stylize the first, middle, and last names separately
composer { display: block;
font-family: Helvetica, Arial, sans-serif;
font-size: 24pt;
font-weight: bold;
text-align: left
}
composition title { display: block;
font-family: Helvetica, Arial, sans-serif;
font-size: 18pt;
font-weight: bold;
text-align: left
}
/* cataloging_info is only for search engines */
cataloging_info { display: none;
color: white}
display: none requires CSS2
<last_updated>July 28, 1999</last_updated>
<copyright>1999 New York Women Composers</copyright>
<maintainer email="elharo@metalab.unc.edu"
url="http://www.macfaq.com/personal.html">
<name>
<first_name>Elliotte</first_name>
<middle_name>Rusty</middle_name>
<last_name>Harold</last_name>
</name>
</maintainer>
last_updated, copyright, maintainer {display: block;
font-size: small}
copyright:before {content: "Copyright " }
last_updated:before {content: "Last Modified " }
last_updated {margin-top: 2ex }
Again, some of this requires CSS2
composition * {display:list-item}
description {display: block}
category { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 32pt; font-weight: bold; text-align: center }
catalog { font-family: "New York", "Times New Roman", serif; font-size: 14pt; background-color: white; color: black; display: block }
composer { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 24pt; font-weight: bold; text-align: left }
composition title { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 18pt; font-weight: bold; text-align: left }
composition * { display: list-item }
description { display: block }
/* cataloging_info is only for search engines */
cataloging_info { display: none; color: #FFFFFF }
last_updated, copyright, maintainer { display: block; font-size: small }
copyright:before { content: "Copyright " }
last_updated:before { content: "Last Modified " }
last_updated { margin-top: 2ex }
Should be able to match composers with compositions
Should be able to sort composers and compositions by name
Should be able to include data from attributes; e.g. the maintainer's email address
Horizontal rules would be nice
Better header (e.g. title and meta tags) would be nice
CSS Level 3?
XSL
XSL + JavaScript
CSS has broader support
CSS is more stable
XSL is much more powerful
XSL can be used without browser support by transforming to HTML on the server side
Open and close all tags
Empty tags end with />
There is a unique root element
Elements may not overlap
Attribute values are quoted
< and & are only used to start tags and entity references
Only the five predefined entity references are used
Plus more...
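Any XML parser enforces these rules, so the quickest check is to hand it the text and see whether it objects. The following is a small sketch using Python's standard xml.etree.ElementTree; is_well_formed is a helper name invented here, and it only tests well-formedness, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = [
    "<p>The quick brown fox jumped over the lazy dog</p>",  # tags opened and closed
    "<li>A very <B>important</B> point</li>",               # properly nested
    "<li>A very <B>important point",                        # end tags missing
    "<B><I>overlap</B></I>",                                # elements overlap
    "<DIV ALIGN=CENTER>text</DIV>",                         # attribute value not quoted
    "<H1>O'Reilly & Associates</H1>",                       # raw & in content
    "<H1>O'Reilly &amp; Associates</H1>",                   # & escaped as an entity reference
    "<BR/>",                                                # empty-element tag
]

for sample in samples:
    print(is_well_formed(sample), sample)
```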
Good:
<p>The quick brown fox jumped over the lazy dog</p>
<li>A very <B>important</B> point</li>
Copyright 1999 Elliotte Rusty Harold<br></br>
Bad:
The quick brown fox jumped over the lazy dog<p>
<li>A very <B>important point
Copyright 1999 Elliotte Rusty Harold<br>
<BR/>, <HR/>, and <IMG/> instead of <BR>, <HR>, and <IMG>
Web browsers deal inconsistently with these
Can use <BR></BR> <HR></HR> <IMG></IMG> instead
<BR CLASS="EMPTY"/> seems to work best.
One element completely contains all other elements of the document
This is HTML in HTML files
The XML declaration and xml-stylesheet processing instruction are not elements
If an element contains a start tag for an element, it must also contain the corresponding end tag
Empty elements may appear anywhere
Every non root element has a parent element
Good:
<A HREF="http://www.ibiblio.org/xml/">
<DIV ALIGN="CENTER">
<A HREF="http://www.ibiblio.org/xml/">
<EMBED SRC="minnesotaswale.aif" hidden="hidden">
Bad:
<A HREF=http://www.ibiblio.org/xml/>
<DIV ALIGN=CENTER>
<EMBED SRC=minnesotaswale.aif hidden=hidden>
<EMBED SRC="minnesotaswale.aif" hidden>
Good:
<H1>O'Reilly &amp; Associates</H1>
Bad:
<H1>O'Reilly & Associates</H1>
Good:
<CODE>for (int i = 0; i &lt;= args.length; i++ ) { </CODE>
Bad:
<CODE>for (int i = 0; i <= args.length; i++ ) { </CODE>
Good:
&amp;
&lt;
&gt;
&quot;
&apos;
Bad:
&copy;
&reg;
&tm;
&alpha;
&eacute;
etc.
Entity references must end with a semicolon.
&lt; is good
&lt is bad
Decimal
Hexadecimal
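Characters with no predefined entity can still be written as numeric character references, in decimal or hexadecimal. The sketch below assumes Python's standard library; it uses xml.sax.saxutils.escape to produce the predefined references for markup characters and xml.etree.ElementTree to show that a decimal and a hexadecimal reference resolve to the same character, while an undeclared HTML-style entity is rejected. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() replaces &, < and > with the predefined entity references.
print(escape("O'Reilly & Associates"))
print(escape("for (int i = 0; i <= args.length; i++ ) {"))

# The copyright sign as a decimal and as a hexadecimal character reference.
decimal = ET.fromstring("<c>&#169;</c>").text
hexadecimal = ET.fromstring("<c>&#xA9;</c>").text
print(decimal == hexadecimal)

# &copy; is defined by HTML, not by XML, so without a DTD declaring it
# the parser reports an undefined entity.
try:
    ET.fromstring("<c>&copy;</c>")
    copy_accepted = True
except ET.ParseError:
    copy_accepted = False
print(copy_accepted)
```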
<SONG> <TITLE>Hot Cop</TITLE> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <LENGTH>6:20</LENGTH> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
<!ELEMENT SONG (TITLE, COMPOSER+, PRODUCER*, PUBLISHER*, LENGTH?, YEAR?, ARTIST+)> <!ELEMENT TITLE (#PCDATA)> <!ELEMENT COMPOSER (#PCDATA)> <!ELEMENT PRODUCER (#PCDATA)> <!ELEMENT PUBLISHER (#PCDATA)> <!ELEMENT LENGTH (#PCDATA)> <!-- This should be a four digit year like "1999", not a two-digit year like "99" --> <!ELEMENT YEAR (#PCDATA)> <!ELEMENT ARTIST (#PCDATA)>
<?xml version="1.0"?> <!DOCTYPE SONG SYSTEM "song.dtd"> <SONG> <TITLE>Hot Cop</TITLE> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <LENGTH>6:20</LENGTH> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
To be valid an XML document must be
Well-formed
Must have a document type declaration
Must comply with the constraints specified in the DTD
To check validity you pass the document through a validating parser which should report any errors it finds. For example,
% java sax.SAXCount -v invalidhotcop.xml
Error at (file file:/D:/speaking/SD99EAST/dtds/invalidhotcop.xml, line 10, char 8): Element "<SONG>" is not valid because it does not follow the rule, "(TITLE,COMPOSER+,PRODUCER*,PUBLISHER*,LENGTH?,YEAR?,ARTIST+)".
invalidhotcop.xml: 281 ms
A valid document:
% java sax.SAXCount -v validhotcop.xml
validhotcop.xml: 170 ms
Domain-Specific Markup Languages
Self-Describing Data
Interchange of Data Among Applications
Structured and Integrated Data
A DTD precisely describes the format
DTDs verify that documents adhere to the format
Ensures interoperability of unrelated tools
DTDs explain the format so reverse engineering isn't as necessary
Comments in DTDs can go even further
<!-- This should be a four digit year like "1999",
not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>
E-commerce and syndication
DTDs make sure that two independent applications speak the same language
DTDs detect malformed data
DTDs verify correct data
Can specify relationships between elements using element declarations
Can assemble data from multiple sources using external entity references declared in the DTD
A specific markup language that uses the XML meta-syntax is called an XML application
Different XML applications have their own more constricted syntaxes and vocabularies within the broader XML syntax
The DTD documents this syntax
Further syntax can be layered on top of this; e.g. data typing through schemas
There are two levels of conformance to XML
Well-formed documents are correct with or without a DTD. They adhere to the basic syntax rules of XML
Valid documents also adhere to the constraints specified in a DTD
All valid documents are well-formed; not all well-formed documents are valid.
A Document Type Definition describes the elements and attributes that may appear in a document
Validation compares a particular document against a DTD
Well-formedness is a prerequisite for validity
A DTD lists the elements, attributes, and entities contained in a document
A DTD defines the relationships between different elements and attributes
internal vs. external DTDs
Ensures that data is correct before feeding it into a program
Ensures that a format is followed
Establishes what must be supported
Not all documents need to be valid; sometimes well-formed is enough
<?xml version="1.0" encoding="ISO-8859-1"?> <!DOCTYPE catalog SYSTEM "compositions.dtd"> <catalog> <category> Small chamber ensembles - 2-4 Players by New York Women Composers </category> <cataloging_info> <abstract>Compositions by the members of New York Women Composers</abstract> <keyword>music publishing</keyword> <keyword>scores</keyword> <keyword>women composers</keyword> <keyword>New York</keyword> </cataloging_info> <last_updated>July 28, 1999</last_updated> <copyright>1999 New York Women Composers</copyright> <maintainer email="elharo@metalab.unc.edu" url="http://www.macfaq.com/personal.html"> <name> <first_name>Elliotte</first_name> <middle_name>Rusty</middle_name> <last_name>Harold</last_name> </name> </maintainer> <composer id="c1"> <name> <first_name>Julie</first_name> <middle_name></middle_name> <last_name>Mandel</last_name> </name> </composer> <composer id="c2"> <name> <first_name>Margaret</first_name> <middle_name>De</middle_name> <last_name>Wys</last_name> </name> </composer> <composer id="c3"> <name> <first_name>Beth</first_name> <middle_name></middle_name> <last_name>Anderson</last_name> </name> </composer> <composer id="c4"> <name> <first_name>Linda</first_name> <middle_name></middle_name> <last_name>Bouchard</last_name> </name> </composer> <composition composer="c1"> <title>Trio for Flute, Viola and Harp</title> <date><year>1994</year></date> <length>13'38"</length> <instruments>fl, hp, vla</instruments> <description> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements :</p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 3: 4:26</li> </ul> </description> <publisher>Theodore Presser</publisher> </composition> <composition composer="c2"> <title>Charmonium</title> <date><year>1991</year></date> <length>9'</length> <instruments>2 vln, vla, vc</instruments> <description> <p>Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. 
Tape available.</p> </description> </composition> <composition composer="c1"> <title>Invention for Flute and Piano</title> <date><year>1994</year></date> <instruments>fl, pn</instruments> <description><p>3 movements</p></description> </composition> <composition composer="c3"> <title>Little Trio</title> <date><year>1984</year></date> <length>4'</length> <instruments>fl, guit, va</instruments> <publisher>ACA</publisher> </composition> <composition composer="c3"> <title>Dr. Blood's Mermaid Lullaby</title> <date><year>1980</year></date> <length>3'</length> <instruments>fl or ob, or vn, or vc, pn</instruments> <publisher>ACA</publisher> </composition> <composition composer="c3"> <title>Trio: Dream in D</title> <date><year>1980</year></date> <length>10'</length> <instruments>fl, pn, vc, or vn, pn, vc</instruments> <description> <p>Rhapsodic. Passionate. Available on CD <cite> <a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr=1-2/"> Two by Three</a></cite> from North/South Consonance (1998).</p> </description> </composition> <composition composer="c4"> <title>Propos II</title> <date><year>1985</year></date> <length>11'</length> <instruments>2 tpt</instruments> <description><p>Arrangement from Propos</p></description> </composition> <composition composer="c4"> <title>Rictus En Mirroir</title> <date><year>1985</year></date> <length>14'</length> <instruments>fl, ob, hpschd, vc</instruments> </composition> </catalog>
Each element must be declared in an <!ELEMENT> declaration.
An <!ELEMENT> declaration gives the name and content model of the element
The content model uses a simple regular expression-like grammar to precisely specify what is and isn't allowed in an element
ANY
#PCDATA
Sequences
Choices
Mixed Content
Modifiers
EMPTY
<!ELEMENT catalog ANY>
A catalog can contain any child element and/or raw text (parsed character data)
Parsed Character Data; i.e. raw text, no markup. For example,
<year>1984</year>
<!ELEMENT year (#PCDATA)>
Valid:
<year>1999</year>
<year>99</year>
<year>1999 C.E.</year>
<year>
The year of our Lord one thousand, nine hundred, and ninety-nine
</year>
Invalid:
<year>
<month>January</month>
<month>February</month>
<month>March</month>
<month>April</month>
<month>May</month>
<month>June</month>
<month>July</month>
<month>August</month>
<month>September</month>
<month>October</month>
<month>November</month>
<month>December</month>
</year>
There are a number of elements in the example document that only contain PCDATA:
<!ELEMENT category (#PCDATA)>
<!ELEMENT abstract (#PCDATA)>
<!ELEMENT keyword (#PCDATA)>
<!ELEMENT last_updated (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT middle_name (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT instruments (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT length (#PCDATA)>
DTDs seem fundamentally more obfuscated than C.
Comments can improve this by giving example elements
Comments are the same as in HTML; e.g. <!-- Comment -->
<!-- e.g. "1999 New York Women Composers",
not "Copyright 1999 New York Women Composers" -->
<!ELEMENT copyright (#PCDATA)>
<date><year>1994</year></date>
To declare that a date element must have a year child:
<!ELEMENT date (year)>
You only have to declare the immediate children
<maintainer email="elharo@metalab.unc.edu"
url="http://www.macfaq.com/personal.html">
<name>
<first_name>Elliotte</first_name>
<middle_name>Rusty</middle_name>
<last_name>Harold</last_name>
</name>
</maintainer>
<composer id="c1">
<name>
<first_name>Julie</first_name>
<middle_name></middle_name>
<last_name>Mandel</last_name>
</name>
</composer>
To declare that a maintainer element must have a name child:
<!ELEMENT maintainer (name)>
<!ELEMENT composer (name)>
<name>
<first_name>Elliotte</first_name>
<middle_name>Rusty</middle_name>
<last_name>Harold</last_name>
</name>
Separate multiple required child elements with commas; e.g.
<!ELEMENT name (first_name, middle_name, last_name)>
A list of child elements separated by commas is called a sequence
ELEMENT
The element being described must have only child elements, no mixed content
You must know the order of the child elements
You must know the type of each child element
You must know the number of child elements
The number can be relaxed with wild cards
<cataloging_info>
<abstract>Compositions by the members of New York Women Composers</abstract>
<keyword>music publishing</keyword>
<keyword>scores</keyword>
<keyword>women composers</keyword>
<keyword>New York</keyword>
</cataloging_info>
The + suffix indicates that one or more of that element is required at that point
<!ELEMENT cataloging_info (abstract, keyword+)>
The * suffix indicates that zero or more of that element are allowed at that point
<!ELEMENT catalog (category, cataloging_info, last_updated, copyright,
maintainer, composer*, composition*)>
<composition composer="c1">
<title>Trio for Flute, Viola and Harp</title>
<date><year>1994</year></date>
<length>13'38"</length>
<instruments>fl, hp, vla</instruments>
<description>
<p>Premiered at Queens College in April, 1996 by Sue Ann Kahn,
Christine Ims, and Susan Jolles. In 3 movements :</p>
<ul>
<li>mvt. 1: 5:01</li>
<li>mvt. 2: 4:11</li>
<li>mvt. 3: 4:26</li>
</ul>
</description>
<publisher>Theodore Presser</publisher>
</composition>
Suffixing an element name with a question mark (?) in the content model indicates that either 0 or 1 (but not more than one) of that element are expected at that position
<!ELEMENT composition
(title, date, length?, instruments, description?, publisher?)>
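Because sequences and the ?, *, and + suffixes behave like their regular-expression counterparts, an element-only content model can be checked by turning it into a regular expression over the names of an element's children. The following is a rough sketch of that idea in Python, not how a validating parser is actually implemented; content_model_to_regex and children_match are hypothetical helpers, and #PCDATA, ANY, and EMPTY are deliberately not handled.

```python
import re
import xml.etree.ElementTree as ET


def content_model_to_regex(model):
    """Translate an element-only DTD content model into a compiled regex.

    Each element name becomes a group matching "name "; commas and
    whitespace are dropped; parentheses, |, ?, * and + carry over unchanged.
    """
    def translate(match):
        name = match.group(1)
        return "(?:" + re.escape(name) + " )" if name else ""

    return re.compile(re.sub(r"([A-Za-z_][\w.-]*)|[,\s]+", translate, model))


def children_match(element, model):
    """True if the names of the element's children satisfy the content model."""
    child_names = "".join(child.tag + " " for child in element)
    return content_model_to_regex(model).fullmatch(child_names) is not None


COMPOSITION_MODEL = "(title, date, length?, instruments, description?, publisher?)"

complete = ET.fromstring(
    "<composition><title>Little Trio</title><date><year>1984</year></date>"
    "<length>4'</length><instruments>fl, guit, va</instruments>"
    "<publisher>ACA</publisher></composition>"
)
no_date = ET.fromstring(
    "<composition><title>Charmonium</title>"
    "<instruments>2 vln, vla, vc</instruments></composition>"
)

print(children_match(complete, COMPOSITION_MODEL))
print(children_match(no_date, COMPOSITION_MODEL))
```

The optional length, description, and publisher children may be left out, but a missing date breaks the sequence and the match fails.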
A choice indicates one element or another but not both
A choice is signified by a vertical bar |
There can be two or more elements in a choice
<!ELEMENT date (year | ISODate)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT ISODate (#PCDATA)>
Parentheses combine several elements into a single unit.
Parenthesized elements can be nested inside other parentheses in place of a single element.
The parenthesized elements can be suffixed with a plus sign, an asterisk, or a question mark.
<!ELEMENT dl (dt, dd)*>
<!ELEMENT ARTICLE (TITLE, (P | PHOTO | GRAPH | SIDEBAR | PULLQUOTE | SUBHEAD)*, BYLINE?)>
<!ELEMENT catalog (category, cataloging_info, last_updated,
copyright, maintainer, (composer | composition)*)>
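To make the effect of a suffixed group concrete, here is a sketch of instances checked against the (dt, dd)* model shown above. The terms and definitions are placeholder text chosen only for illustration.

```xml
<!-- Valid for (dt, dd)*: each dt is immediately followed by one dd -->
<dl>
  <dt>XML</dt><dd>Extensible Markup Language</dd>
  <dt>DTD</dt><dd>Document Type Definition</dd>
</dl>

<!-- Also valid: the group may repeat zero times -->
<dl></dl>

<!-- Invalid: the second dt has no dd, so the last group is incomplete -->
<dl>
  <dt>XML</dt><dd>Extensible Markup Language</dd>
  <dt>DTD</dt>
</dl>
```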
<!ELEMENT description (#PCDATA | ul | a | cite | p)*>
<!ELEMENT cite (#PCDATA | a)*>
<!ELEMENT ul (li*)>
<!ELEMENT li (#PCDATA)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT p (#PCDATA)>
Mixed content is both #PCDATA and child elements in a choice, followed by an asterisk
Should be avoided where possible
This is the only way to combine PCDATA with child elements in a content model
#PCDATA must come first
#PCDATA cannot be used in a sequence
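The rules above can be sketched with the cite element from this DTD. The rejected forms are kept inside a comment so the fragment itself stays legal; the sample cite content is invented for illustration.

```xml
<!-- Valid mixed content: #PCDATA first, vertical bars, trailing asterisk -->
<!ELEMENT cite (#PCDATA | a)*>

<!-- Not allowed:
     <!ELEMENT cite (a | #PCDATA)*>     #PCDATA is not the first item
     <!ELEMENT cite (#PCDATA, a)>       #PCDATA used in a sequence
     <!ELEMENT cite (#PCDATA | a)+>     suffix is not an asterisk
-->

<!-- An instance matching the valid declaration (hypothetical text) -->
<cite>See <a>the publisher's listing</a> for details</cite>
```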
<!ELEMENT BR EMPTY>
<!ELEMENT IMG EMPTY>
<!ELEMENT HR EMPTY>
Mixed content with other content models
Exactly one element of a given type but in any position (The SGML & operator)
Between M and N of a given element
Restrictions on the PCDATA; e.g. that the year element must contain a four-digit year
Recall this element:
<maintainer email="elharo@metalab.unc.edu"
url="http://www.macfaq.com/personal.html">
<name>
<first_name>Elliotte</first_name>
<middle_name>Rusty</middle_name>
<last_name>Harold</last_name>
</name>
</maintainer>
It is declared like this:
<!ELEMENT maintainer (name)>
<!ATTLIST maintainer email CDATA "webmaster@nywc.org">
<!ATTLIST maintainer url CDATA "http://www.ibiblio.org/nywc">
The general format of an <!ATTLIST> declaration is:
<!ATTLIST Element_name Attribute_name Type Default_value>
The two maintainer attributes can also be declared in a single <!ATTLIST> declaration like this:
<!ATTLIST maintainer email
CDATA "webmaster@nywc.org" url CDATA "http://www.ibiblio.org/nywc/">
This is more obvious with better indentation:
<!ATTLIST maintainer email CDATA "webmaster@nywc.org"
url CDATA "http://www.ibiblio.org/nywc/">
A literal string value
One of these three keywords
#REQUIRED
#IMPLIED
#FIXED
No default value is provided in the DTD
Document authors must provide an attribute value for each element
<!ELEMENT maintainer (name)>
<!ATTLIST maintainer email CDATA #REQUIRED
url CDATA #REQUIRED>
<!ELEMENT a (#PCDATA)>
<!ATTLIST a href CDATA #IMPLIED>
No default value in the DTD
Author may (but does not have to) provide a value with each element
Value is the same for all elements
Default value must be provided in DTD
Document author may not change default value
<!ELEMENT maintainer (name)>
<!ATTLIST maintainer email CDATA #FIXED "webmaster@nywc.org"
url CDATA #REQUIRED>
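A minimal sketch of how the #FIXED declaration above plays out in a document. To keep it short, name is simplified here to plain text, which is an assumption of this sketch and not how the catalog DTD declares it.

```xml
<?xml version="1.0"?>
<!DOCTYPE maintainer [
  <!ELEMENT maintainer (name)>
  <!-- Simplified for this sketch; the catalog DTD gives name child elements -->
  <!ELEMENT name (#PCDATA)>
  <!ATTLIST maintainer email CDATA #FIXED "webmaster@nywc.org"
                       url   CDATA #REQUIRED>
]>
<!-- email is omitted, so the parser supplies the fixed value;
     url is #REQUIRED and has to be written out -->
<maintainer url="http://www.ibiblio.org/nywc/">
  <name>Elliotte Rusty Harold</name>
</maintainer>
```

Writing email="elharo@metalab.unc.edu" on this element would be a validity error, because a #FIXED attribute may only carry the value given in the DTD.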
CDATA
ID
IDREF
IDREFS
ENTITY
ENTITIES
NOTATION
NMTOKEN
NMTOKENS
Enumerated
Most general attribute type
Value can be any string of text not containing a raw less-than sign (<), a raw ampersand (&), or the quotation mark used to delimit the value
<!ATTLIST maintainer email CDATA #REQUIRED
url CDATA #IMPLIED>
Value must be an XML name
May include letters, digits, underscores, hyphens, and periods, but must begin with a letter or an underscore
May not include whitespace
The attribute itself may or may not be named "id" or "ID"
May contain colons only if used for namespaces
Value must be unique within ID type attributes in the document
The default must be #REQUIRED or #IMPLIED; generally it is #REQUIRED
<!ELEMENT composer (name)>
<!ATTLIST composer id ID #REQUIRED>
Value matches the ID of an element in the same document
Used for links and the like
Multiple elements may share the same IDREF values
<!ELEMENT composition (title, date, length?,
instruments, description?, publisher?)>
<!ATTLIST composition composer IDREF #REQUIRED>
A list of ID values in the same document
Separated by white space
<!ELEMENT composition (title, date, length?,
instruments, description?, publisher?)>
<!ATTLIST composition composer IDREFS #REQUIRED>
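The ID and IDREFS declarations above fit together roughly as follows. The composer names and the composition are hypothetical placeholders, not entries from the real catalog.

```xml
<!-- Each composer carries a document-wide unique ID -->
<composer id="c1">
  <name>
    <first_name>Ann</first_name>
    <middle_name>B.</middle_name>
    <last_name>Example</last_name>
  </name>
</composer>
<composer id="c2">
  <name>
    <first_name>Carol</first_name>
    <middle_name>D.</middle_name>
    <last_name>Sample</last_name>
  </name>
</composer>

<!-- IDREFS: a white space separated list, each item matching some ID -->
<composition composer="c1 c2">
  <title>Duet for Two Flutes</title>
  <date><year>1999</year></date>
  <instruments>2 fl</instruments>
</composition>
```

With the singular IDREF type, the composer attribute could hold only one of these values, such as composer="c1". In either case a validating parser rejects a value that matches no ID in the document.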
<!ELEMENT category (#PCDATA)>
<!ELEMENT abstract (#PCDATA)>
<!ELEMENT keyword (#PCDATA)>
<!ELEMENT last_updated (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT instruments (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT length (#PCDATA)>
<!ELEMENT date (year | ISODate)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT ISODate (#PCDATA)>
<!ELEMENT catalog (category, cataloging_info, last_updated,
copyright, maintainer, (composer | composition)*)>
<!ELEMENT cataloging_info (abstract, keyword+)>
<!ELEMENT description (#PCDATA | ul | a | cite | p)*>
<!ELEMENT cite (#PCDATA | a)*>
<!ELEMENT ul (li*)>
<!ELEMENT li (#PCDATA)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT p (#PCDATA)>
<!ELEMENT maintainer (name)>
<!ELEMENT name (first_name, middle_name, last_name)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT middle_name (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>
<!ATTLIST maintainer email CDATA #REQUIRED
url CDATA #IMPLIED>
<!ELEMENT composer (name)>
<!ATTLIST composer id ID #REQUIRED>
<!ELEMENT composition (title, date, length?, instruments, description?, publisher?)>
<!ATTLIST composition composer IDREFS #REQUIRED>
<!ATTLIST a href CDATA #REQUIRED>
Value is the name of an unparsed general entity declared in the DTD
Value is a list of unparsed general entities declared in the DTD
Separated by white space
Value is the name of a notation declared in the DTD
Value is an XML name token: like an XML name, except that any name character, including a digit, may come first
Value is a list of XML name tokens
Separated by white space
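A brief sketch of how the ENTITY, NOTATION, and NMTOKENS types might be declared and used together. The image element, its attribute names, and logo.gif are all hypothetical names introduced for this illustration.

```xml
<!-- Declare a notation and an unparsed entity that uses it -->
<!NOTATION gif SYSTEM "image/gif">
<!ENTITY logo SYSTEM "logo.gif" NDATA gif>

<!ELEMENT image (#PCDATA)>
<!ATTLIST image source ENTITY         #REQUIRED
                format NOTATION (gif) #IMPLIED
                sizes  NMTOKENS       #IMPLIED>

<!-- source names the unparsed entity, format names the notation,
     and sizes holds name tokens, which may begin with a digit -->
<image source="logo" format="gif" sizes="16x16 32x32">Company logo</image>
```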
Not a keyword
Refers to a list of possible values from which one must be chosen
Default value is generally provided explicitly
<!ATTLIST P VISIBLE (TRUE | FALSE) "TRUE">
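Assuming P is declared elsewhere to hold text, the enumerated declaration above would treat these instances as follows (a sketch, with invented paragraph text):

```xml
<!-- VISIBLE omitted: a validating parser supplies the default, TRUE -->
<P>Shown</P>

<!-- One of the listed values chosen explicitly -->
<P VISIBLE="FALSE">Hidden</P>

<!-- Invalid: MAYBE is not in the list (TRUE | FALSE) -->
<P VISIBLE="MAYBE">Rejected by a validating parser</P>
```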
An abbreviation for commonly used or hard to type text
Begin with an ampersand and end with a semicolon
&alpha;
&quot;
&copyright;
&signature;
Declared in a <!ENTITY> declaration
<!ENTITY copyright "Copyright 2000">
<!ENTITY quot "&#34;">
<!ENTITY signature
"<SIGNATURE>
<COPYRIGHT>2000 Elliotte Rusty Harold</COPYRIGHT>
<EMAIL>elharo@metalab.unc.edu</EMAIL>
<LAST_MODIFIED>March 10, 2000</LAST_MODIFIED>
</SIGNATURE>"
>
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
  <!ENTITY ERH "Elliotte Rusty Harold">
  <!ELEMENT DOCUMENT (TITLE, SIGNATURE)>
  <!ELEMENT TITLE (#PCDATA)>
  <!ELEMENT COPYRIGHT (#PCDATA)>
  <!ELEMENT EMAIL (#PCDATA)>
  <!ELEMENT LAST_MODIFIED (#PCDATA)>
  <!ELEMENT SIGNATURE (COPYRIGHT, EMAIL, LAST_MODIFIED)>
]>
<DOCUMENT>
  <TITLE>&ERH;</TITLE>
  <SIGNATURE>
    <COPYRIGHT>1999 &ERH;</COPYRIGHT>
    <EMAIL>elharo@metalab.unc.edu</EMAIL>
    <LAST_MODIFIED>March 10, 1999</LAST_MODIFIED>
  </SIGNATURE>
</DOCUMENT>
A general entity reference that refers to a different file
Parsed and Unparsed
<!ENTITY AlLeiter SYSTEM "mets/AlLeiter.xml">
<!ENTITY ArmandoReynoso SYSTEM "mets/ArmandoReynoso.xml">
<!ENTITY BobbyJones SYSTEM "mets/BobbyJones.xml">
<!ENTITY BradClontz SYSTEM "mets/BradClontz.xml">
<!ENTITY DennisCook SYSTEM "mets/DennisCook.xml">
<!ENTITY GregMcmichael SYSTEM "mets/GregMcmichael.xml">
<!ENTITY HideoNomo SYSTEM "mets/HideoNomo.xml">
<!ENTITY JohnFranco SYSTEM "mets/JohnFranco.xml">
<!ENTITY JosiasManzanillo SYSTEM "mets/JosiasManzanillo.xml">
<!ENTITY OctavioDotel SYSTEM "mets/OctavioDotel.xml">
<!ENTITY RickReed SYSTEM "mets/RickReed.xml">
<!ENTITY RigoBeltran SYSTEM "mets/RigoBeltran.xml">
<!ENTITY WillieBlair SYSTEM "mets/WillieBlair.xml">
Prolog is only a text declaration
Document is not valid and may not be well-formed because it may not have a root element.
<?xml version="1.0" encoding="UTF-8"?>
<PLAYER>
  <GIVEN_NAME>Al</GIVEN_NAME>
  <SURNAME>Leiter</SURNAME>
  <P>Starting Pitcher</P>
  <G>28</G>
  <GS>28</GS>
  <W>17</W>
  <L>6</L>
  <SV>0</SV>
  <CG>4</CG>
  <SO>2</SO>
  <ERA>2.47</ERA>
  <IP>193</IP>
  <HRA>8</HRA>
  <RA>55</RA>
  <ER>53</ER>
  <HB>11</HB>
  <WP>4</WP>
  <B>1</B>
  <WB>71</WB>
  <K>174</K>
</PLAYER>
<?xml version="1.0" standalone="no"?>
<!DOCTYPE TEAM SYSTEM "team.dtd" [
  <!ENTITY % players SYSTEM "mets.dtd">
  %players;
]>
<TEAM>
  <TEAM_CITY>New York</TEAM_CITY>
  <TEAM_NAME>Mets</TEAM_NAME>
  &AlLeiter;
  &ArmandoReynoso;
  &BobbyJones;
  &BradClontz;
  &DennisCook;
  &GregMcmichael;
  &HideoNomo;
  &JohnFranco;
  &JosiasManzanillo;
  &OctavioDotel;
  &RickReed;
  &RigoBeltran;
  &WillieBlair;
</TEAM>
Only used in DTDs
Use a % instead of an &:
%inlines;
%block;
%mathml-prefix;
%mathml-colon;
Declared in a <!ENTITY %> declaration
<!ENTITY % ERH "Elliotte Rusty Harold">
<!ENTITY COPY99 "Copyright 1999 %ERH;">
<!ENTITY % inlines
"(PERSON | DEGREE | MODEL | PRODUCT | ANIMAL | INGREDIENT)*">
<!ELEMENT PARAGRAPH %inlines;>
<!ELEMENT CELL %inlines;>
<!ELEMENT HEADING %inlines;>
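To show what the parser effectively sees, here is a sketch of the three declarations above after %inlines; has been replaced by its text. Note that a parameter entity reference inside a declaration, as used here, is only permitted in the external DTD subset, not in the internal subset of a document.

```xml
<!-- The declarations above, with %inlines; expanded -->
<!ELEMENT PARAGRAPH (PERSON | DEGREE | MODEL | PRODUCT | ANIMAL | INGREDIENT)*>
<!ELEMENT CELL      (PERSON | DEGREE | MODEL | PRODUCT | ANIMAL | INGREDIENT)*>
<!ELEMENT HEADING   (PERSON | DEGREE | MODEL | PRODUCT | ANIMAL | INGREDIENT)*>
```

Changing the single inlines declaration therefore changes all three content models at once.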
Only used in DTDs
Pull in other DTD fragments
Add a SYSTEM to the declaration:
<!ENTITY % player SYSTEM "player.dtd">
%player;
Can use a full URL:
<!ENTITY % player SYSTEM "http://www.ibiblio.org/xml/dtds/player.dtd">
%player;
<description>
<p>Premiered at Queens College in April, 1996 by Sue Ann Kahn,
Christine Ims, and Susan Jolles. In 3 movements :</p>
<ul>
<li>mvt. 1: 5:01</li>
<li>mvt. 2: 4:11</li>
<li>mvt. 3: 4:26</li>
</ul>
</description>
<!ELEMENT description (#PCDATA | ul | a | cite | p)*>
<!ELEMENT cite (#PCDATA | a)*>
<!ELEMENT ul (li*)>
<!ELEMENT li (#PCDATA)>
<!ELEMENT p (#PCDATA)>
<!ELEMENT a (#PCDATA)>
<!ATTLIST a href CDATA #IMPLIED>
XHTML is a reformulation of HTML as strict XML
Tags must be closed
Attribute values must be quoted
<br/> instead of <br>, etc.
W3C Recommendation 26 January 2000
Includes three DTDs for HTML:
Strict
Transitional
Frameset
What if we can use one of those DTDs instead of inventing our own?
<!-- Extensible HTML version 1.0 Strict DTD This is the same as HTML 4.0 Strict except for changes due to the differences between XML and SGML. Namespace = http://www.w3.org/1999/xhtml For further information, see: http://www.w3.org/TR/xhtml1 Copyright (c) 1998-2000 W3C (MIT, INRIA, Keio), All Rights Reserved. This DTD module is identified by the PUBLIC and SYSTEM identifiers: PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" $Revision: 1.1 $ $Date: 2000/01/26 14:08:56 $ --> <!--================ Character mnemonic entities =========================--> <!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "xhtml-lat1.ent"> %HTMLlat1; <!ENTITY % HTMLsymbol PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN" "xhtml-symbol.ent"> %HTMLsymbol; <!ENTITY % HTMLspecial PUBLIC "-//W3C//ENTITIES Special for XHTML//EN" "xhtml-special.ent"> %HTMLspecial; <!--================== Imported Names ====================================--> <!ENTITY % ContentType "CDATA"> <!-- media type, as per [RFC2045] --> <!ENTITY % ContentTypes "CDATA"> <!-- comma-separated list of media types, as per [RFC2045] --> <!ENTITY % Charset "CDATA"> <!-- a character encoding, as per [RFC2045] --> <!ENTITY % Charsets "CDATA"> <!-- a space separated list of character encodings, as per [RFC2045] --> <!ENTITY % LanguageCode "NMTOKEN"> <!-- a language code, as per [RFC1766] --> <!ENTITY % Character "CDATA"> <!-- a single character from [ISO10646] --> <!ENTITY % Number "CDATA"> <!-- one or more digits --> <!ENTITY % LinkTypes "CDATA"> <!-- space-separated list of link types --> <!ENTITY % MediaDesc "CDATA"> <!-- single or comma-separated list of media descriptors --> <!ENTITY % URI "CDATA"> <!-- a Uniform Resource Identifier, see [RFC2396] --> <!ENTITY % UriList "CDATA"> <!-- a space separated list of Uniform Resource Identifiers --> <!ENTITY % Datetime "CDATA"> <!-- date and time information. 
ISO date format --> <!ENTITY % Script "CDATA"> <!-- script expression --> <!ENTITY % StyleSheet "CDATA"> <!-- style sheet data --> <!ENTITY % Text "CDATA"> <!-- used for titles etc. --> <!ENTITY % FrameTarget "NMTOKEN"> <!-- render in this frame --> <!ENTITY % Length "CDATA"> <!-- nn for pixels or nn% for percentage length --> <!ENTITY % MultiLength "CDATA"> <!-- pixel, percentage, or relative --> <!ENTITY % MultiLengths "CDATA"> <!-- comma-separated list of MultiLength --> <!ENTITY % Pixels "CDATA"> <!-- integer representing length in pixels --> <!-- these are used for image maps --> <!ENTITY % Shape "(rect|circle|poly|default)"> <!ENTITY % Coords "CDATA"> <!-- comma separated list of lengths --> <!--=================== Generic Attributes ===============================--> <!-- core attributes common to most elements id document-wide unique id class space separated list of classes style associated style info title advisory title/amplification --> <!ENTITY % coreattrs "id ID #IMPLIED class CDATA #IMPLIED style %StyleSheet; #IMPLIED title %Text; #IMPLIED" > <!-- internationalization attributes lang language code (backwards compatible) xml:lang language code (as per XML 1.0 spec) dir direction for weak/neutral text --> <!ENTITY % i18n "lang %LanguageCode; #IMPLIED xml:lang %LanguageCode; #IMPLIED dir (ltr|rtl) #IMPLIED" > <!-- attributes for common UI events onclick a pointer button was clicked ondblclick a pointer button was double clicked onmousedown a pointer button was pressed down onmouseup a pointer button was released onmousemove a pointer was moved onto the element onmouseout a pointer was moved away from the element onkeypress a key was pressed and released onkeydown a key was pressed down onkeyup a key was released --> <!ENTITY % events "onclick %Script; #IMPLIED ondblclick %Script; #IMPLIED onmousedown %Script; #IMPLIED onmouseup %Script; #IMPLIED onmouseover %Script; #IMPLIED onmousemove %Script; #IMPLIED onmouseout %Script; #IMPLIED onkeypress %Script; 
#IMPLIED onkeydown %Script; #IMPLIED onkeyup %Script; #IMPLIED" > <!-- attributes for elements that can get the focus accesskey accessibility key character tabindex position in tabbing order onfocus the element got the focus onblur the element lost the focus --> <!ENTITY % focus "accesskey %Character; #IMPLIED tabindex %Number; #IMPLIED onfocus %Script; #IMPLIED onblur %Script; #IMPLIED" > <!ENTITY % attrs "%coreattrs; %i18n; %events;"> <!--=================== Text Elements ====================================--> <!ENTITY % special "br | span | bdo | object | img | map"> <!ENTITY % fontstyle "tt | i | b | big | small"> <!ENTITY % phrase "em | strong | dfn | code | q | sub | sup | samp | kbd | var | cite | abbr | acronym"> <!ENTITY % inline.forms "input | select | textarea | label | button"> <!-- these can occur at block or inline level --> <!ENTITY % misc "ins | del | script | noscript"> <!ENTITY % inline "a | %special; | %fontstyle; | %phrase; | %inline.forms;"> <!-- %Inline; covers inline or "text-level" elements --> <!ENTITY % Inline "(#PCDATA | %inline; | %misc;)*"> <!--================== Block level elements ==============================--> <!ENTITY % heading "h1|h2|h3|h4|h5|h6"> <!ENTITY % lists "ul | ol | dl"> <!ENTITY % blocktext "pre | hr | blockquote | address"> <!ENTITY % block "p | %heading; | div | %lists; | %blocktext; | fieldset | table"> <!ENTITY % Block "(%block; | form | %misc;)*"> <!-- %Flow; mixes Block and Inline and is used for list items etc. 
--> <!ENTITY % Flow "(#PCDATA | %block; | form | %inline; | %misc;)*"> <!--================== Content models for exclusions =====================--> <!-- a elements use %Inline; excluding a --> <!ENTITY % a.content "(#PCDATA | %special; | %fontstyle; | %phrase; | %inline.forms; | %misc;)*"> <!-- pre uses %Inline excluding img, object, big, small, sup or sup --> <!ENTITY % pre.content "(#PCDATA | a | br | span | bdo | map | tt | i | b | %phrase; | %inline.forms;)*"> <!-- form uses %Block; excluding form --> <!ENTITY % form.content "(%block; | %misc;)*"> <!-- button uses %Flow; but excludes a, form and form controls --> <!ENTITY % button.content "(#PCDATA | p | %heading; | div | %lists; | %blocktext; | table | %special; | %fontstyle; | %phrase; | %misc;)*"> <!--================ Document Structure ==================================--> <!-- the namespace URI designates the document profile --> <!ELEMENT html (head, body)> <!ATTLIST html %i18n; xmlns %URI; #FIXED 'http://www.w3.org/1999/xhtml' > <!--================ Document Head =======================================--> <!ENTITY % head.misc "(script|style|meta|link|object)*"> <!-- content model is %head.misc; combined with a single title and an optional base element in any order --> <!ELEMENT head (%head.misc;, ((title, %head.misc;, (base, %head.misc;)?) | (base, %head.misc;, (title, %head.misc;))))> <!ATTLIST head %i18n; profile %URI; #IMPLIED > <!-- The title element is not considered part of the flow of text. It should be displayed, for example as the page header or window title. Exactly one title is required per document. 
--> <!ELEMENT title (#PCDATA)> <!ATTLIST title %i18n;> <!-- document base URI --> <!ELEMENT base EMPTY> <!ATTLIST base href %URI; #IMPLIED > <!-- generic metainformation --> <!ELEMENT meta EMPTY> <!ATTLIST meta %i18n; http-equiv CDATA #IMPLIED name CDATA #IMPLIED content CDATA #REQUIRED scheme CDATA #IMPLIED > <!-- Relationship values can be used in principle: a) for document specific toolbars/menus when used with the link element in document head e.g. start, contents, previous, next, index, end, help b) to link to a separate style sheet (rel="stylesheet") c) to make a link to a script (rel="script") d) by stylesheets to control how collections of html nodes are rendered into printed documents e) to make a link to a printable version of this document e.g. a PostScript or PDF version (rel="alternate" media="print") --> <!ELEMENT link EMPTY> <!ATTLIST link %attrs; charset %Charset; #IMPLIED href %URI; #IMPLIED hreflang %LanguageCode; #IMPLIED type %ContentType; #IMPLIED rel %LinkTypes; #IMPLIED rev %LinkTypes; #IMPLIED media %MediaDesc; #IMPLIED > <!-- style info, which may include CDATA sections --> <!ELEMENT style (#PCDATA)> <!ATTLIST style %i18n; type %ContentType; #REQUIRED media %MediaDesc; #IMPLIED title %Text; #IMPLIED xml:space (preserve) #FIXED 'preserve' > <!-- script statements, which may include CDATA sections --> <!ELEMENT script (#PCDATA)> <!ATTLIST script charset %Charset; #IMPLIED type %ContentType; #REQUIRED src %URI; #IMPLIED defer (defer) #IMPLIED xml:space (preserve) #FIXED 'preserve' > <!-- alternate content container for non script-based rendering --> <!ELEMENT noscript %Block;> <!ATTLIST noscript %attrs; > <!--=================== Document Body ====================================--> <!ELEMENT body %Block;> <!ATTLIST body %attrs; onload %Script; #IMPLIED onunload %Script; #IMPLIED > <!ELEMENT div %Flow;> <!-- generic language/style container --> <!ATTLIST div %attrs; > <!--=================== Paragraphs =======================================--> 
<!ELEMENT p %Inline;> <!ATTLIST p %attrs; > <!--=================== Headings =========================================--> <!-- There are six levels of headings from h1 (the most important) to h6 (the least important). --> <!ELEMENT h1 %Inline;> <!ATTLIST h1 %attrs; > <!ELEMENT h2 %Inline;> <!ATTLIST h2 %attrs; > <!ELEMENT h3 %Inline;> <!ATTLIST h3 %attrs; > <!ELEMENT h4 %Inline;> <!ATTLIST h4 %attrs; > <!ELEMENT h5 %Inline;> <!ATTLIST h5 %attrs; > <!ELEMENT h6 %Inline;> <!ATTLIST h6 %attrs; > <!--=================== Lists ============================================--> <!-- Unordered list --> <!ELEMENT ul (li)+> <!ATTLIST ul %attrs; > <!-- Ordered (numbered) list --> <!ELEMENT ol (li)+> <!ATTLIST ol %attrs; > <!-- list item --> <!ELEMENT li %Flow;> <!ATTLIST li %attrs; > <!-- definition lists - dt for term, dd for its definition --> <!ELEMENT dl (dt|dd)+> <!ATTLIST dl %attrs; > <!ELEMENT dt %Inline;> <!ATTLIST dt %attrs; > <!ELEMENT dd %Flow;> <!ATTLIST dd %attrs; > <!--=================== Address ==========================================--> <!-- information on author --> <!ELEMENT address %Inline;> <!ATTLIST address %attrs; > <!--=================== Horizontal Rule ==================================--> <!ELEMENT hr EMPTY> <!ATTLIST hr %attrs; > <!--=================== Preformatted Text ================================--> <!-- content is %Inline; excluding "img|object|big|small|sub|sup" --> <!ELEMENT pre %pre.content;> <!ATTLIST pre %attrs; xml:space (preserve) #FIXED 'preserve' > <!--=================== Block-like Quotes ================================--> <!ELEMENT blockquote %Block;> <!ATTLIST blockquote %attrs; cite %URI; #IMPLIED > <!--=================== Inserted/Deleted Text ============================--> <!-- ins/del are allowed in block and inline content, but its inappropriate to include block content within an ins element occurring in inline content. 
--> <!ELEMENT ins %Flow;> <!ATTLIST ins %attrs; cite %URI; #IMPLIED datetime %Datetime; #IMPLIED > <!ELEMENT del %Flow;> <!ATTLIST del %attrs; cite %URI; #IMPLIED datetime %Datetime; #IMPLIED > <!--================== The Anchor Element ================================--> <!-- content is %Inline; except that anchors shouldn't be nested --> <!ELEMENT a %a.content;> <!ATTLIST a %attrs; charset %Charset; #IMPLIED type %ContentType; #IMPLIED name NMTOKEN #IMPLIED href %URI; #IMPLIED hreflang %LanguageCode; #IMPLIED rel %LinkTypes; #IMPLIED rev %LinkTypes; #IMPLIED accesskey %Character; #IMPLIED shape %Shape; "rect" coords %Coords; #IMPLIED tabindex %Number; #IMPLIED onfocus %Script; #IMPLIED onblur %Script; #IMPLIED > <!--===================== Inline Elements ================================--> <!ELEMENT span %Inline;> <!-- generic language/style container --> <!ATTLIST span %attrs; > <!ELEMENT bdo %Inline;> <!-- I18N BiDi over-ride --> <!ATTLIST bdo %coreattrs; %events; lang %LanguageCode; #IMPLIED xml:lang %LanguageCode; #IMPLIED dir (ltr|rtl) #REQUIRED > <!ELEMENT br EMPTY> <!-- forced line break --> <!ATTLIST br %coreattrs; > <!ELEMENT em %Inline;> <!-- emphasis --> <!ATTLIST em %attrs;> <!ELEMENT strong %Inline;> <!-- strong emphasis --> <!ATTLIST strong %attrs;> <!ELEMENT dfn %Inline;> <!-- definitional --> <!ATTLIST dfn %attrs;> <!ELEMENT code %Inline;> <!-- program code --> <!ATTLIST code %attrs;> <!ELEMENT samp %Inline;> <!-- sample --> <!ATTLIST samp %attrs;> <!ELEMENT kbd %Inline;> <!-- something user would type --> <!ATTLIST kbd %attrs;> <!ELEMENT var %Inline;> <!-- variable --> <!ATTLIST var %attrs;> <!ELEMENT cite %Inline;> <!-- citation --> <!ATTLIST cite %attrs;> <!ELEMENT abbr %Inline;> <!-- abbreviation --> <!ATTLIST abbr %attrs;> <!ELEMENT acronym %Inline;> <!-- acronym --> <!ATTLIST acronym %attrs;> <!ELEMENT q %Inline;> <!-- inlined quote --> <!ATTLIST q %attrs; cite %URI; #IMPLIED > <!ELEMENT sub %Inline;> <!-- subscript --> <!ATTLIST sub %attrs;> 
<!ELEMENT sup %Inline;> <!-- superscript --> <!ATTLIST sup %attrs;> <!ELEMENT tt %Inline;> <!-- fixed pitch font --> <!ATTLIST tt %attrs;> <!ELEMENT i %Inline;> <!-- italic font --> <!ATTLIST i %attrs;> <!ELEMENT b %Inline;> <!-- bold font --> <!ATTLIST b %attrs;> <!ELEMENT big %Inline;> <!-- bigger font --> <!ATTLIST big %attrs;> <!ELEMENT small %Inline;> <!-- smaller font --> <!ATTLIST small %attrs;> <!--==================== Object ======================================--> <!-- object is used to embed objects as part of HTML pages. param elements should precede other content. Parameters can also be expressed as attribute/value pairs on the object element itself when brevity is desired. --> <!ELEMENT object (#PCDATA | param | %block; | form | %inline; | %misc;)*> <!ATTLIST object %attrs; declare (declare) #IMPLIED classid %URI; #IMPLIED codebase %URI; #IMPLIED data %URI; #IMPLIED type %ContentType; #IMPLIED codetype %ContentType; #IMPLIED archive %UriList; #IMPLIED standby %Text; #IMPLIED height %Length; #IMPLIED width %Length; #IMPLIED usemap %URI; #IMPLIED name NMTOKEN #IMPLIED tabindex %Number; #IMPLIED > <!-- param is used to supply a named property value. In XML it would seem natural to follow RDF and support an abbreviated syntax where the param elements are replaced by attribute value pairs on the object start tag. --> <!ELEMENT param EMPTY> <!ATTLIST param id ID #IMPLIED name CDATA #IMPLIED value CDATA #IMPLIED valuetype (data|ref|object) "data" type %ContentType; #IMPLIED > <!--=================== Images ===========================================--> <!-- To avoid accessibility problems for people who aren't able to see the image, you should provide a text description using the alt and longdesc attributes. In addition, avoid the use of server-side image maps. Note that in this DTD there is no name attribute. That is only available in the transitional and frameset DTD. 
--> <!ELEMENT img EMPTY> <!ATTLIST img %attrs; src %URI; #REQUIRED alt %Text; #REQUIRED longdesc %URI; #IMPLIED height %Length; #IMPLIED width %Length; #IMPLIED usemap %URI; #IMPLIED ismap (ismap) #IMPLIED > <!-- usemap points to a map element which may be in this document or an external document, although the latter is not widely supported --> <!--================== Client-side image maps ============================--> <!-- These can be placed in the same document or grouped in a separate document although this isn't yet widely supported --> <!ELEMENT map ((%block; | form | %misc;)+ | area+)> <!ATTLIST map %i18n; %events; id ID #REQUIRED class CDATA #IMPLIED style %StyleSheet; #IMPLIED title %Text; #IMPLIED name NMTOKEN #IMPLIED > <!ELEMENT area EMPTY> <!ATTLIST area %attrs; shape %Shape; "rect" coords %Coords; #IMPLIED href %URI; #IMPLIED nohref (nohref) #IMPLIED alt %Text; #REQUIRED tabindex %Number; #IMPLIED accesskey %Character; #IMPLIED onfocus %Script; #IMPLIED onblur %Script; #IMPLIED > <!--================ Forms ===============================================--> <!ELEMENT form %form.content;> <!-- forms shouldn't be nested --> <!ATTLIST form %attrs; action %URI; #REQUIRED method (get|post) "get" enctype %ContentType; "application/x-www-form-urlencoded" onsubmit %Script; #IMPLIED onreset %Script; #IMPLIED accept %ContentTypes; #IMPLIED accept-charset %Charsets; #IMPLIED > <!-- Each label must not contain more than ONE field Label elements shouldn't be nested. 
--> <!ELEMENT label %Inline;> <!ATTLIST label %attrs; for IDREF #IMPLIED accesskey %Character; #IMPLIED onfocus %Script; #IMPLIED onblur %Script; #IMPLIED > <!ENTITY % InputType "(text | password | checkbox | radio | submit | reset | file | hidden | image | button)" > <!-- the name attribute is required for all but submit & reset --> <!ELEMENT input EMPTY> <!-- form control --> <!ATTLIST input %attrs; type %InputType; "text" name CDATA #IMPLIED value CDATA #IMPLIED checked (checked) #IMPLIED disabled (disabled) #IMPLIED readonly (readonly) #IMPLIED size CDATA #IMPLIED maxlength %Number; #IMPLIED src %URI; #IMPLIED alt CDATA #IMPLIED usemap %URI; #IMPLIED tabindex %Number; #IMPLIED accesskey %Character; #IMPLIED onfocus %Script; #IMPLIED onblur %Script; #IMPLIED onselect %Script; #IMPLIED onchange %Script; #IMPLIED accept %ContentTypes; #IMPLIED > <!ELEMENT select (optgroup|option)+> <!-- option selector --> <!ATTLIST select %attrs; name CDATA #IMPLIED size %Number; #IMPLIED multiple (multiple) #IMPLIED disabled (disabled) #IMPLIED tabindex %Number; #IMPLIED onfocus %Script; #IMPLIED onblur %Script; #IMPLIED onchange %Script; #IMPLIED > <!ELEMENT optgroup (option)+> <!-- option group --> <!ATTLIST optgroup %attrs; disabled (disabled) #IMPLIED label %Text; #REQUIRED > <!ELEMENT option (#PCDATA)> <!-- selectable choice --> <!ATTLIST option %attrs; selected (selected) #IMPLIED disabled (disabled) #IMPLIED label %Text; #IMPLIED value CDATA #IMPLIED > <!ELEMENT textarea (#PCDATA)> <!-- multi-line text field --> <!ATTLIST textarea %attrs; name CDATA #IMPLIED rows %Number; #REQUIRED cols %Number; #REQUIRED disabled (disabled) #IMPLIED readonly (readonly) #IMPLIED tabindex %Number; #IMPLIED accesskey %Character; #IMPLIED onfocus %Script; #IMPLIED onblur %Script; #IMPLIED onselect %Script; #IMPLIED onchange %Script; #IMPLIED > <!-- The fieldset element is used to group form fields. 
Only one legend element should occur in the content and if present should only be preceded by whitespace. --> <!ELEMENT fieldset (#PCDATA | legend | %block; | form | %inline; | %misc;)*> <!ATTLIST fieldset %attrs; > <!ELEMENT legend %Inline;> <!-- fieldset label --> <!ATTLIST legend %attrs; accesskey %Character; #IMPLIED > <!-- Content is %Flow; excluding a, form and form controls --> <!ELEMENT button %button.content;> <!-- push button --> <!ATTLIST button %attrs; name CDATA #IMPLIED value CDATA #IMPLIED type (button|submit|reset) "submit" disabled (disabled) #IMPLIED tabindex %Number; #IMPLIED accesskey %Character; #IMPLIED onfocus %Script; #IMPLIED onblur %Script; #IMPLIED > <!--======================= Tables =======================================--> <!-- Derived from IETF HTML table standard, see [RFC1942] --> <!-- The border attribute sets the thickness of the frame around the table. The default units are screen pixels. The frame attribute specifies which parts of the frame around the table should be rendered. The values are not the same as CALS to avoid a name clash with the valign attribute. --> <!ENTITY % TFrame "(void|above|below|hsides|lhs|rhs|vsides|box|border)"> <!-- The rules attribute defines which rules to draw between cells: If rules is absent then assume: "none" if border is absent or border="0" otherwise "all" --> <!ENTITY % TRules "(none | groups | rows | cols | all)"> <!-- horizontal placement of table relative to document --> <!ENTITY % TAlign "(left|center|right)"> <!-- horizontal alignment attributes for cell contents char alignment char, e.g. 
char=':' charoff offset for alignment char --> <!ENTITY % cellhalign "align (left|center|right|justify|char) #IMPLIED char %Character; #IMPLIED charoff %Length; #IMPLIED" > <!-- vertical alignment attributes for cell contents --> <!ENTITY % cellvalign "valign (top|middle|bottom|baseline) #IMPLIED" > <!ELEMENT table (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))> <!ELEMENT caption %Inline;> <!ELEMENT thead (tr)+> <!ELEMENT tfoot (tr)+> <!ELEMENT tbody (tr)+> <!ELEMENT colgroup (col)*> <!ELEMENT col EMPTY> <!ELEMENT tr (th|td)+> <!ELEMENT th %Flow;> <!ELEMENT td %Flow;> <!ATTLIST table %attrs; summary %Text; #IMPLIED width %Length; #IMPLIED border %Pixels; #IMPLIED frame %TFrame; #IMPLIED rules %TRules; #IMPLIED cellspacing %Length; #IMPLIED cellpadding %Length; #IMPLIED > <!ENTITY % CAlign "(top|bottom|left|right)"> <!ATTLIST caption %attrs; > <!-- colgroup groups a set of col elements. It allows you to group several semantically related columns together. --> <!ATTLIST colgroup %attrs; span %Number; "1" width %MultiLength; #IMPLIED %cellhalign; %cellvalign; > <!-- col elements define the alignment properties for cells in one or more columns. The width attribute specifies the width of the columns, e.g. width=64 width in screen pixels width=0.5* relative width of 0.5 The span attribute causes the attributes of one col element to apply to more than one column. --> <!ATTLIST col %attrs; span %Number; "1" width %MultiLength; #IMPLIED %cellhalign; %cellvalign; > <!-- Use thead to duplicate headers when breaking table across page boundaries, or for static headers when tbody sections are rendered in scrolling panel. Use tfoot to duplicate footers when breaking table across page boundaries, or for static footers when tbody sections are rendered in scrolling panel. Use multiple tbody sections when rules are needed between groups of table rows. 
--> <!ATTLIST thead %attrs; %cellhalign; %cellvalign; > <!ATTLIST tfoot %attrs; %cellhalign; %cellvalign; > <!ATTLIST tbody %attrs; %cellhalign; %cellvalign; > <!ATTLIST tr %attrs; %cellhalign; %cellvalign; > <!-- Scope is simpler than headers attribute for common tables --> <!ENTITY % Scope "(row|col|rowgroup|colgroup)"> <!-- th is for headers, td for data and for cells acting as both --> <!ATTLIST th %attrs; abbr %Text; #IMPLIED axis CDATA #IMPLIED headers IDREFS #IMPLIED scope %Scope; #IMPLIED rowspan %Number; "1" colspan %Number; "1" %cellhalign; %cellvalign; > <!ATTLIST td %attrs; abbr %Text; #IMPLIED axis CDATA #IMPLIED headers IDREFS #IMPLIED scope %Scope; #IMPLIED rowspan %Number; "1" colspan %Number; "1" %cellhalign; %cellvalign; >
<!ENTITY % xhtml1 SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
%xhtml1;
<!ENTITY % xhtml1 SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
%xhtml1;
<!ELEMENT category (#PCDATA)>
<!ELEMENT abstract (#PCDATA)>
<!ELEMENT keyword (#PCDATA)>
<!ELEMENT last_updated (#PCDATA)>
<!-- e.g. "1999 New York Women Composers", not "Copyright 1999 New York Women Composers" -->
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT instruments (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT length (#PCDATA)>
<!ELEMENT date (year | ISODate)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT ISODate (#PCDATA)>
<!ELEMENT catalog (category, cataloging_info, last_updated,
copyright, maintainer, (composer | composition)*)>
<!ELEMENT cataloging_info (abstract, keyword+)>
<!ELEMENT description %Block;>
<!ELEMENT maintainer (name)>
<!ELEMENT name (first_name, middle_name, last_name)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT middle_name (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>
<!ATTLIST maintainer email CDATA #REQUIRED
url CDATA #IMPLIED>
<!ELEMENT composer (name)>
<!ATTLIST composer id ID #REQUIRED>
<!ELEMENT composition (title, date, length?, instruments, description?, publisher?)>
<!ATTLIST composition composer IDREFS #REQUIRED>
<!ATTLIST a href CDATA #IMPLIED>
<?xml version="1.0"?>
<!DOCTYPE document SYSTEM "http://www.w3.org/TR/xhtml1/DTD/transitional.dtd" [
<!ELEMENT document %BLOCK; >
]>
<document>
<p>Hello There!</p>
</document>
Use XML syntax to describe the allowed content of an XML document rather than DTD syntax
Allow restrictions to be placed on PCDATA content; e.g. that the contents of an element must be an integer between 1 and 10
Area of active research and development
To distinguish between elements and attributes from different vocabularies with different meanings.
To group all related elements and attributes together so that a parser can easily recognize them.
The XLink specification defines an attribute with the name href.
The XHTML specification also uses href attributes on some elements.
And the XInclude specification uses href attributes.
An XSLT style sheet that will transform XHTML documents containing both Scalable Vector Graphics (SVG) pictures and MathML equations into XSL Formatting Objects documents.
The a, title, script, style, and font elements in XHTML and SVG
The table element in XHTML and XSL-FO
The text element in XSLT and SVG
The set element in MathML and SVG
An XSLT stylesheet that transforms a style sheet in an older version of the XSLT specification to a style sheet in a newer version of the XSLT specification.
Namespaces disambiguate elements with the same name from each other by attaching different prefixes to names from different XML applications.
Each prefix is associated with a URI.
Names whose prefixes are associated with the same URI are in the same namespace.
Names whose prefixes are associated with different URIs are in different namespaces.
Elements and attributes that are in namespaces have names that contain exactly one colon. They look like this:
rdf:description
xlink:type
xsl:template
Everything before the colon is called the prefix.
Everything after the colon is called the local part.
The complete name including the colon is called the qualified name.
Each prefix in a qualified name is associated with a URI.
For example, all elements in XSLT 1.0 style sheets are associated with the http://www.w3.org/1999/XSL/Transform URI.
The customary prefix xsl is a shorthand for the longer URI http://www.w3.org/1999/XSL/Transform.
You can't use the URI in the element name directly.
{http://www.w3.org/1999/XSL/Transform}template
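Although the {URI}local-part form is not legal inside an XML document, some APIs report names this way after parsing. As a small sketch (Python's standard xml.etree.ElementTree module is my choice for illustration; the slides do not use it), parsing a prefixed element shows the prefix being replaced by the URI it is bound to:

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# A one-element document that binds the customary xsl prefix to the XSLT URI.
doc = '<xsl:template xmlns:xsl="%s" match="SONG"/>' % XSLT_NS

root = ET.fromstring(doc)

# ElementTree drops the prefix and reports the name as {URI}local-part.
expanded_name = root.tag
print(expanded_name)
```

The prefix itself does not appear in the parsed name; only the URI and the local part do.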
Prefixes are bound to namespace URIs by attaching an xmlns:prefix attribute to the prefixed element or one of its ancestors.
<svg:svg xmlns:svg="http://www.w3.org/Graphics/SVG/SVG-19991203.dtd"
width="12cm" height="10cm">
<svg:ellipse rx="110" ry="130" />
<svg:rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg:svg>
Bindings have scope within the element where they're declared.
An SVG processor can recognize all three of these elements as SVG elements because they all have prefixes bound to the particular URI defined by the SVG specification.
<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:xlink="http://www.w3.org/XML/XLink/0.9">
<xhtml:head><xhtml:title>Three Namespaces</xhtml:title></xhtml:head>
<xhtml:body>
<xhtml:h1 align="center">An Ellipse and a Rectangle</xhtml:h1>
<svg:svg xmlns:svg="http://www.w3.org/Graphics/SVG/SVG-19991203.dtd"
width="12cm" height="10cm">
<svg:ellipse rx="110" ry="130" />
<svg:rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg:svg>
<xhtml:p xlink:type="simple"
xlink:href="ellipses.html">
More about ellipses
</xhtml:p>
<xhtml:p xlink:type="simple" xlink:href="rectangles.html">
More about rectangles
</xhtml:p>
<xhtml:hr/>
<xhtml:p>Last Modified February 13, 2000</xhtml:p>
</xhtml:body>
</xhtml:html>
<!ATTLIST svg:svg xmlns:svg CDATA
#FIXED "http://www.w3.org/Graphics/SVG/SVG-19991203.dtd">
<svg:svg width="12cm" height="10cm">
<svg:ellipse rx="110" ry="130" />
<svg:rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg:svg>
Unprefixed attributes are never in any namespace.
Being an attribute of an element in the http://www.w3.org/1999/xhtml namespace is not sufficient to put the attribute in the http://www.w3.org/1999/xhtml namespace.
The only way an attribute belongs to a namespace is if it has a declared prefix, like xlink:type and xlink:href.
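One way to see this rule in action is to parse an element and list its attribute names. The following is only a sketch using Python's standard xml.etree.ElementTree module (an assumption of mine, not a tool from the slides); the align attribute is added to the paragraph purely for illustration:

```python
import xml.etree.ElementTree as ET

doc = """
<xhtml:p xmlns:xhtml="http://www.w3.org/1999/xhtml"
         xmlns:xlink="http://www.w3.org/XML/XLink/0.9"
         align="center" xlink:type="simple" xlink:href="ellipses.html">
  More about ellipses
</xhtml:p>
"""

p = ET.fromstring(doc)

# Prefixed attributes come back as {URI}local-part; unprefixed ones stay bare.
attribute_names = sorted(p.attrib)
for name in attribute_names:
    print(name)
```

The unprefixed align attribute is reported with no namespace at all, even though its element is in the XHTML namespace.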
Many XML applications have recommended prefixes. For example, SVG elements often use the prefix svg and Resource Description Framework (RDF) elements often have the prefix rdf. However, these prefixes are simply conventions, and can be changed based on necessity, convenience or whim.
Before a prefix can be used, it must be bound to a URI.
These URIs are standardized, not the prefixes.
The prefix can change as long as the URI stays the same.
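A short sketch of this point, again assuming Python's standard xml.etree.ElementTree module rather than anything the slides prescribe: the same element written with two different prefixes (the g prefix is made up for the example) ends up with the same name once parsed.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/Graphics/SVG/SVG-19991203.dtd"

# Same element, same namespace URI, two different prefixes.
with_svg_prefix = ET.fromstring('<svg:rect xmlns:svg="%s"/>' % SVG_NS)
with_odd_prefix = ET.fromstring('<g:rect xmlns:g="%s"/>' % SVG_NS)

# The names compare equal because only the URI and local part matter.
same_name = with_svg_prefix.tag == with_odd_prefix.tag
print(same_name)
```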
Purely formal
Can point somewhere but do not have to
Parsers compare namespace URIs on a character by character basis. These are three different namespaces:
http://www.w3.org/1999/XSL/Transform
http://www.w3.org/1999/XSL/Transform/
http://www.w3.org/1999/XSL/Transform/index.html
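To illustrate the character-by-character comparison, here is a minimal sketch (Python's standard xml.etree.ElementTree is assumed for the demonstration) that binds the same prefix to each of the three URIs and counts how many distinct element names result:

```python
import xml.etree.ElementTree as ET

uris = [
    "http://www.w3.org/1999/XSL/Transform",
    "http://www.w3.org/1999/XSL/Transform/",
    "http://www.w3.org/1999/XSL/Transform/index.html",
]

# Bind the same prefix to each URI in turn and collect the expanded names.
names = {
    ET.fromstring('<xsl:template xmlns:xsl="%s"/>' % uri).tag
    for uri in uris
}

distinct_count = len(names)
print(distinct_count)
```

No URI normalization or dereferencing takes place, so a trailing slash is enough to produce a different namespace.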
Indicate that an unprefixed element and all its unprefixed descendant
elements belong to a particular namespace by attaching an xmlns
attribute with no prefix:
<DATASCHEMA xmlns="http://www.w3.org/2000/P3Pv1">
<DATA name="vehicle.make" type="text" short="Make"
category="preference" size="31"/>
<DATA name="vehicle.model" type="text" short="Model"
category="preference" size="31"/>
<DATA name="vehicle.year" type="number" short="Year"
category="preference" size="4"/>
<DATA name="vehicle.license.state." type="postal." short="State"
category="preference" size="2"/>
<DATA name="vehicle.license.number" type="text"
short="License Plate Number" category="preference" size="12"/>
</DATASCHEMA>
Both the DATASCHEMA and DATA elements are in the http://www.w3.org/2000/P3Pv1 namespace.
Default namespaces apply only to elements, not to attributes.
Thus in the above example the name, type, short, category, and size attributes are not in any namespace.
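The effect of the default namespace can be checked with a parser. This sketch assumes Python's standard xml.etree.ElementTree module and uses a shortened copy of the P3P example with a single DATA element:

```python
import xml.etree.ElementTree as ET

doc = """
<DATASCHEMA xmlns="http://www.w3.org/2000/P3Pv1">
  <DATA name="vehicle.make" type="text" short="Make"
        category="preference" size="31"/>
</DATASCHEMA>
"""

schema = ET.fromstring(doc)
data = schema[0]

# Both elements pick up the default namespace...
print(schema.tag)
print(data.tag)

# ...but the unprefixed attributes do not.
print(sorted(data.attrib))
```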
You can change the default namespace within a particular element by adding an xmlns attribute to the element.
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:xlink="http://www.w3.org/XML/XLink/0.9">
<head><title>Three Namespaces</title></head>
<body>
<h1 align="center">An Ellipse and a Rectangle</h1>
<svg xmlns="http://www.w3.org/Graphics/SVG/SVG-19991203.dtd"
width="12cm" height="10cm">
<ellipse rx="110" ry="130" />
<rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg>
<p xlink:type="simple" xlink:href="ellipses.html">
More about ellipses
</p>
<p xlink:type="simple" xlink:href="rectangles.html">
More about rectangles
</p>
<hr/>
<p>Last Modified February 13, 2000</p>
</body>
</html>
<!ATTLIST svg xmlns CDATA
#FIXED "http://www.w3.org/Graphics/SVG/SVG-19991203.dtd">
<svg width="12cm" height="10cm">
<ellipse rx="110" ry="130" />
<rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg>
Namespaces were added to XML 1.0 after the fact, but care was taken to ensure backwards compatibility.
An XML 1.0 parser that does not know about namespaces will most likely not have any trouble reading a document that uses namespaces.
A namespace-aware parser also checks to see that all prefixes are mapped to URIs. Otherwise it behaves almost exactly like a non-namespace-aware parser.
Other software that sits on top of the raw XML parser, an XSLT engine for example, may treat elements differently depending on what namespace they belong to. However, the XML parser itself mostly doesn't care as long as all well-formedness and namespace constraints are met.
A possible exception occurs in the unlikely event that elements with different prefixes belong to the same namespace or elements with the same prefix belong to different namespaces.
Many parsers have the option of whether to report namespace violations so that you can turn namespace processing on or off as you see fit.
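As an illustration of the extra check a namespace-aware parser performs, the sketch below feeds a document with an undeclared prefix to the parser behind Python's standard xml.etree.ElementTree module, which is namespace aware. The choice of parser is my assumption; other parsers report the problem in their own way.

```python
import xml.etree.ElementTree as ET

# The svg prefix is used here but never bound to a URI.
doc = '<svg:svg width="12cm" height="10cm"><svg:rect/></svg:svg>'

try:
    ET.fromstring(doc)
    rejected = False
except ET.ParseError as error:
    # The namespace-aware parser refuses the unbound prefix.
    rejected = True
    print("rejected:", error)
```

A parser that ignores namespaces would accept the same text, since svg:svg is a perfectly legal XML 1.0 name.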
DTDs must declare the qualified names
<!ELEMENT svg:text (#PCDATA)>
If the prefix changes, the DTD needs to change too.
Parameter entity references can help when the prefix changes or is removed:
<!ENTITY % mathml-colon ''>
<!ENTITY % mathml-prefix ''>
<!ENTITY % mathml-exp '%mathml-prefix;%mathml-colon;exp' >
<!ENTITY % mathml-abs '%mathml-prefix;%mathml-colon;abs' >
<!ENTITY % mathml-arg '%mathml-prefix;%mathml-colon;arg' >
<!ENTITY % mathml-real '%mathml-prefix;%mathml-colon;real' >
<!ENTITY % mathml-imaginary '%mathml-prefix;%mathml-colon;imaginary' >
A transformation language (XSLT)
A formatting language (XSL-FO)
This talk covers:
XSL Transformations: the November 16, 1999 XSLT 1.0 specification
Something reads an XML document and forms a tree
The tree is passed to the XSLT Processor
The XSLT processor compares the nodes in the tree to the instructions in the style sheet
When the XSLT processor finds a match it outputs a tree fragment
(Optional) The complete output tree is serialized to some other format such as text, HTML, or an XML file
<?xml version="1.0" encoding="ISO-8859-1"?> <!DOCTYPE catalog SYSTEM "compositions.dtd"> <catalog> <category> Small chamber ensembles - 2-4 Players by New York Women Composers </category> <cataloging_info> <abstract>Compositions by the members of New York Women Composers</abstract> <keyword>music publishing</keyword> <keyword>scores</keyword> <keyword>women composers</keyword> <keyword>New York</keyword> </cataloging_info> <last_updated>July 28, 1999</last_updated> <copyright>1999 New York Women Composers</copyright> <maintainer email="elharo@metalab.unc.edu" url="http://www.macfaq.com/personal.html"> <name> <first_name>Elliotte</first_name> <middle_name>Rusty</middle_name> <last_name>Harold</last_name> </name> </maintainer> <composer id="c1"> <name> <first_name>Julie</first_name> <middle_name></middle_name> <last_name>Mandel</last_name> </name> </composer> <composer id="c2"> <name> <first_name>Margaret</first_name> <middle_name>De</middle_name> <last_name>Wys</last_name> </name> </composer> <composer id="c3"> <name> <first_name>Beth</first_name> <middle_name></middle_name> <last_name>Anderson</last_name> </name> </composer> <composer id="c4"> <name> <first_name>Linda</first_name> <middle_name></middle_name> <last_name>Bouchard</last_name> </name> </composer> <composition composer="c1"> <title>Trio for Flute, Viola and Harp</title> <date><year>1994</year></date> <length>13'38"</length> <instruments>fl, hp, vla</instruments> <description> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements :</p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 3: 4:26</li> </ul> </description> <publisher>Theodore Presser</publisher> </composition> <composition composer="c2"> <title>Charmonium</title> <date><year>1991</year></date> <length>9'</length> <instruments>2 vln, vla, vc</instruments> <description> <p>Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. 
Tape available.</p> </description> </composition> <composition composer="c1"> <title>Invention for Flute and Piano</title> <date><year>1994</year></date> <instruments>fl, pn</instruments> <description><p>3 movements</p></description> </composition> <composition composer="c3"> <title>Little Trio</title> <date><year>1984</year></date> <length>4'</length> <instruments>fl, guit, va</instruments> <publisher>ACA</publisher> </composition> <composition composer="c3"> <title>Dr. Blood's Mermaid Lullaby</title> <date><year>1980</year></date> <length>3'</length> <instruments>fl or ob, or vn, or vc, pn</instruments> <publisher>ACA</publisher> </composition> <composition composer="c3"> <title>Trio: Dream in D</title> <date><year>1980</year></date> <length>10'</length> <instruments>fl, pn, vc, or vn, pn, vc</instruments> <description> <p>Rhapsodic. Passionate. Available on CD <cite> <a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr=1-2/"> Two by Three</a></cite> from North/South Consonance (1998).</p> </description> </composition> <composition composer="c4"> <title>Propos II</title> <date><year>1985</year></date> <length>11'</length> <instruments>2 tpt</instruments> <description><p>Arrangement from Propos</p></description> </composition> <composition composer="c4"> <title>Rictus En Mirroir</title> <date><year>1985</year></date> <length>14'</length> <instruments>fl, ob, hpschd, vc</instruments> </composition> </catalog>
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> </xsl:stylesheet>
Let's use xt to apply this stylesheet to compositions.xml.
Windows executable:
C:> xt compositions.xml sheet1.xsl
Java executable:
C:> java -Dcom.jclark.xsl.sax.parser=com.jclark.xml.sax.CommentDriver com.jclark.xsl.sax.Driver compositions.xml sheet1.xsl output1.html
<?xml version="1.0" encoding="utf-8"?> Small chamber ensembles - 2-4 Players by New York Women Composers Compositions by the members of New York Women Composers music publishing scores women composers New York July 28, 1999 1999 New York Women Composers Elliotte Rusty Harold Julie Mandel Margaret De Wys Beth Anderson Linda Bouchard Trio for Flute, Viola and Harp 1994 13'38" fl, hp, vla Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : mvt. 1: 5:01 mvt. 2: 4:11 mvt. 3: 4:26 Theodore Presser Charmonium 1991 9' 2 vln, vla, vc Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. Invention for Flute and Piano 1994 fl, pn 3 movements Little Trio 1984 4' fl, guit, va ACA Dr. Blood's Mermaid Lullaby 1980 3' fl or ob, or vn, or vc, pn ACA Trio: Dream in D 1980 10' fl, pn, vc, or vn, pn, vc Rhapsodic. Passionate. Available on CD Two by Three from North/South Consonance (1998). Propos II 1985 11' 2 tpt Arrangement from Propos Rictus En Mirroir 1985 14' fl, ob, hpschd, vc
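This output contains only text because, when no rule in the style sheet matches, XSLT's built-in template rules walk the element tree and copy text nodes while leaving out markup and attribute values. A rough analogue, assuming Python's standard xml.etree.ElementTree module and a much shortened stand-in for compositions.xml (this is not an XSLT processor, just a sketch of the effect):

```python
import xml.etree.ElementTree as ET

# A shortened, hypothetical stand-in for compositions.xml.
doc = """<catalog>
<copyright>1999 New York Women Composers</copyright>
<maintainer email="elharo@metalab.unc.edu">
<name><first_name>Elliotte</first_name> <last_name>Harold</last_name></name>
</maintainer>
</catalog>"""

catalog = ET.fromstring(doc)

# Concatenate every text node in document order; tags and attributes vanish.
text_only = "".join(catalog.itertext())
print(text_only)
```

Note that the email address, which lives in an attribute, does not appear in the result.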
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="composition">
    <h3><xsl:value-of select="title"/></h3>
  </xsl:template>
</xsl:stylesheet>
<?xml version="1.0" encoding="utf-8"?> Small chamber ensembles - 2-4 Players by New York Women Composers Compositions by the members of New York Women Composers music publishing scores women composers New York July 28, 1999 1999 New York Women Composers Elliotte Rusty Harold Julie Mandel Margaret De Wys Beth Anderson Linda Bouchard <h3>Trio for Flute, Viola and Harp</h3> <h3>Charmonium</h3> <h3>Invention for Flute and Piano</h3> <h3>Little Trio</h3> <h3>Dr. Blood's Mermaid Lullaby</h3> <h3>Trio: Dream in D</h3> <h3>Propos II</h3> <h3>Rictus En Mirroir</h3>
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="composition">
    <h3><xsl:value-of select="title"/></h3>
  </xsl:template>
</xsl:stylesheet>
<html> <body></body> </html>
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="composition">
    <h3><xsl:value-of select="title"/></h3>
  </xsl:template>
</xsl:stylesheet>
<html> <body> Small chamber ensembles - 2-4 Players by New York Women Composers Compositions by the members of New York Women Composers music publishing scores women composers New York July 28, 1999 1999 New York Women Composers Elliotte Rusty Harold Julie Mandel Margaret De Wys Beth Anderson Linda Bouchard <h3>Trio for Flute, Viola and Harp</h3> <h3>Charmonium</h3> <h3>Invention for Flute and Piano</h3> <h3>Little Trio</h3> <h3>Dr. Blood's Mermaid Lullaby</h3> <h3>Trio: Dream in D</h3> <h3>Propos II</h3> <h3>Rictus En Mirroir</h3> </body> </html>
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <xsl:apply-templates select="catalog"/>
    </html>
  </xsl:template>
  <xsl:template match="catalog">
    <head>
      <title><xsl:value-of select="category"/></title>
    </head>
    <body>
      <h1><xsl:value-of select="category"/></h1>
      <xsl:apply-templates select="composition"/>
    </body>
  </xsl:template>
  <xsl:template match="composition">
    <h3><xsl:value-of select="title"/></h3>
  </xsl:template>
</xsl:stylesheet>
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h3>Trio for Flute, Viola and Harp</h3> <h3>Charmonium</h3> <h3>Invention for Flute and Piano</h3> <h3>Little Trio</h3> <h3>Dr. Blood's Mermaid Lullaby</h3> <h3>Trio: Dream in D</h3> <h3>Propos II</h3> <h3>Rictus En Mirroir</h3> </body> </html>
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <xsl:apply-templates select="catalog"/>
    </html>
  </xsl:template>
  <xsl:template match="catalog">
    <head>
      <title><xsl:value-of select="category"/></title>
    </head>
    <body>
      <h1><xsl:value-of select="category"/></h1>
      <xsl:apply-templates select="composition"/>
      <hr/>
      Copyright <xsl:value-of select="copyright"/><br/>
      Last Modified: <xsl:value-of select="last_updated"/>
    </body>
  </xsl:template>
  <xsl:template match="composition">
    <h3><xsl:value-of select="title"/></h3>
    <ul>
      <li><xsl:value-of select="date"/></li>
      <li><xsl:value-of select="length"/></li>
      <li><xsl:value-of select="instruments"/></li>
      <li><xsl:value-of select="publisher"/></li>
    </ul>
    <p><xsl:value-of select="description"/></p>
  </xsl:template>
</xsl:stylesheet>
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h3>Trio for Flute, Viola and Harp</h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : mvt. 1: 5:01 mvt. 2: 4:11 mvt. 3: 4:26 </p> <h3>Charmonium</h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> <li></li> </ul> <p> Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. </p> <h3>Invention for Flute and Piano</h3> <ul> <li>1994</li> <li></li> <li>fl, pn</li> <li></li> </ul> <p>3 movements</p> <h3>Little Trio</h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>Dr. Blood's Mermaid Lullaby</h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>Trio: Dream in D</h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> <li></li> </ul> <p> Rhapsodic. Passionate. Available on CD Two by Three from North/South Consonance (1998). </p> <h3>Propos II</h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> <li></li> </ul> <p>Arrangement from Propos</p> <h3>Rictus En Mirroir</h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> <li></li> </ul> <p></p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999 </body> </html>
I want to add something like this to the footer so readers can contact me if there's a problem with the page:
Elliotte Rusty Harold<br/> elharo@metalab.unc.edu
This information comes from the maintainer element:
<maintainer email="elharo@metalab.unc.edu" url="http://www.macfaq.com/personal.html"> <name> <first_name>Elliotte</first_name> <middle_name>Rusty</middle_name> <last_name>Harold</last_name> </name> </maintainer>
So we need a way to get content from attributes in the input document.
This is accomplished by prefixing the attribute name with @.
<xsl:template match="catalog">
  <head>
    <title><xsl:value-of select="category"/></title>
  </head>
  <body>
    <h1><xsl:value-of select="category"/></h1>
    <xsl:apply-templates select="composition"/>
    <hr/>
    Copyright <xsl:value-of select="copyright"/><br/>
    Last Modified: <xsl:value-of select="last_updated"/><br/>
    <xsl:apply-templates select="maintainer"/>
  </body>
</xsl:template>
<xsl:template match="maintainer">
  <xsl:value-of select="name"/><br/>
  <xsl:value-of select="@email"/>
</xsl:template>
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h3>Trio for Flute, Viola and Harp</h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : mvt. 1: 5:01 mvt. 2: 4:11 mvt. 3: 4:26 </p> <h3>Charmonium</h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> <li></li> </ul> <p> Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. </p> <h3>Invention for Flute and Piano</h3> <ul> <li>1994</li> <li></li> <li>fl, pn</li> <li></li> </ul> <p>3 movements</p> <h3>Little Trio</h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>Dr. Blood's Mermaid Lullaby</h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>Trio: Dream in D</h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> <li></li> </ul> <p> Rhapsodic. Passionate. Available on CD Two by Three from North/South Consonance (1998). </p> <h3>Propos II</h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> <li></li> </ul> <p>Arrangement from Propos</p> <h3>Rictus En Mirroir</h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> <li></li> </ul> <p></p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br> Elliotte Rusty Harold <br>elharo@metalab.unc.edu </body> </html>
It would be nice to make the maintainer's name a link to his home page, and his email address a mailto: link. In other words, we want to add this to the footer:
<a href="http://www.macfaq.com/personal.html">Elliotte Rusty Harold</a><br/> <a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a>
So we need a way to copy nodes from the input document to attribute values in the output document. This is done with an attribute value template:
<xsl:template match="maintainer">
  <a href="{@url}"><xsl:value-of select="name"/></a><br/>
  <a href="mailto:{@email}"><xsl:value-of select="@email"/></a>
</xsl:template>
An attribute value template is a select expression inside curly braces, such as {@url}. Here the attribute value template selects attribute values in the input document, but it can also select element content or more complicated expressions. Notice that the attribute value template does not have to be the only thing in the attribute value. There can even be more than one attribute value template in an attribute value.
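The substitution an attribute value template performs can be imitated in a few lines. The sketch below is in Python rather than XSLT; expand_avt is a hypothetical helper of my own that understands only the simple {@name} form, not general XPath expressions, and the third template string is invented to show two templates in one value.

```python
import re
import xml.etree.ElementTree as ET

maintainer = ET.fromstring(
    '<maintainer email="elharo@metalab.unc.edu" '
    'url="http://www.macfaq.com/personal.html"/>'
)

def expand_avt(template, element):
    """Replace each {@name} with the value of that attribute on element.

    Hypothetical helper: handles only the {@name} form, not general XPath.
    A missing attribute expands to the empty string.
    """
    return re.sub(
        r"\{@([\w.-]+)\}",
        lambda match: element.get(match.group(1), ""),
        template,
    )

# The template may be the whole value, part of it, or appear more than once.
print(expand_avt("{@url}", maintainer))
print(expand_avt("mailto:{@email}", maintainer))
print(expand_avt("{@url}?contact={@email}", maintainer))
```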
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h3>Trio for Flute, Viola and Harp</h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : mvt. 1: 5:01 mvt. 2: 4:11 mvt. 3: 4:26 </p> <h3>Charmonium</h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> <li></li> </ul> <p> Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. </p> <h3>Invention for Flute and Piano</h3> <ul> <li>1994</li> <li></li> <li>fl, pn</li> <li></li> </ul> <p>3 movements</p> <h3>Little Trio</h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>Dr. Blood's Mermaid Lullaby</h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>Trio: Dream in D</h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> <li></li> </ul> <p> Rhapsodic. Passionate. Available on CD Two by Three from North/South Consonance (1998). </p> <h3>Propos II</h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> <li></li> </ul> <p>Arrangement from Propos</p> <h3>Rictus En Mirroir</h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> <li></li> </ul> <p></p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br><a href="http://www.macfaq.com/personal.html"> Elliotte Rusty Harold </a><br><a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a></body> </html>
Right now the descriptions in the output document are pure text. The descriptions in the input document are somewhat more styled and include paragraphs, unordered lists and citations; e.g.
<description>
<p>Premiered at Queens College in April, 1996 by Sue Ann Kahn,
Christine Ims, and Susan Jolles. In 3 movements :</p>
<ul>
<li>mvt. 1: 5:01</li>
<li>mvt. 2: 4:11</li>
<li>mvt. 3: 4:26</li>
</ul>
</description>
But all this is stripped by the default template rule used for the description.
We can use xsl:copy to move these elements into the output more or less as is. Here are the necessary rules for the four HTML elements found in compositions.xml:
<xsl:template match="p">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<!-- pass HTML along unchanged -->
<xsl:template match="ul">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="li">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="cite">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
We also have to apply templates to the description element rather than taking its value:
<xsl:template match="composition">
  <h3><xsl:value-of select="title"/></h3>
  <ul>
    <li><xsl:value-of select="date"/></li>
    <li><xsl:value-of select="length"/></li>
    <li><xsl:value-of select="instruments"/></li>
    <li><xsl:value-of select="publisher"/></li>
  </ul>
  <p><xsl:apply-templates select="description"/></p>
</xsl:template>
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h3>Trio for Flute, Viola and Harp</h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : </p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 3: 4:26</li> </ul> </p> <h3>Charmonium</h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> <li></li> </ul> <p> <p>Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. </p> </p> <h3>Invention for Flute and Piano</h3> <ul> <li>1994</li> <li></li> <li>fl, pn</li> <li></li> </ul> <p> <p>3 movements</p> </p> <h3>Little Trio</h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>Dr. Blood's Mermaid Lullaby</h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>Trio: Dream in D</h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> <li></li> </ul> <p> <p>Rhapsodic. Passionate. Available on CD <cite> Two by Three </cite> from North/South Consonance (1998). </p> </p> <h3>Propos II</h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> <li></li> </ul> <p> <p>Arrangement from Propos</p> </p> <h3>Rictus En Mirroir</h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> <li></li> </ul> <p></p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br><a href="http://www.macfaq.com/personal.html"> Elliotte Rusty Harold </a><br><a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a></body> </html>
Since all four template rules for the HTML elements have the same content, we can combine them into a single rule that applies to each of the four using the or operator, |:
<xsl:template match="p|ul|li|cite">
  <xsl:copy>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>
Right now the descriptions in the input document only use a few HTML tags, but potentially they could use full HTML up to and including tables, images, styles, and more. You could include separate template rules for each of these, but it's easier to specify a rule that applies to all elements.
<!-- pass unrecognized tags along unchanged -->
<xsl:template match="*">
  <xsl:copy>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>
The * matches all elements that are not matched by a more specific rule. It only matches element nodes, though. It does not match nodes for:
attributes
comments
processing instructions
namespaces
text
The output is the same in this case, though for a document that used more HTML it might be different.
To copy everything including:
attributes
comments
processing instructions
namespaces
text
we have to use greedier wild cards:
@* to copy attribute nodes
node() to copy all other nodes
<!-- pass unrecognized nodes along unchanged -->
<xsl:template match="node()|@*">
  <xsl:copy>
    <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
</xsl:template>
xt doesn't yet recognize node() in match patterns.
The output is the same in this case, though for a document that used more HTML it might be different.
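The difference between copying only elements and copying everything shows up as soon as an element carries attributes. The following Python sketch uses the standard xml.etree.ElementTree module; both copy functions are hypothetical helpers that only approximate the two XSLT rules, and the href value is a made-up placeholder.

```python
import xml.etree.ElementTree as ET

cite = ET.fromstring('<cite><a href="twobythree.html">Two by Three</a></cite>')

def copy_elements_and_text(node):
    """Rough analogue of the match="*" rule: attributes are not copied."""
    result = ET.Element(node.tag)
    result.text, result.tail = node.text, node.tail
    for child in node:
        result.append(copy_elements_and_text(child))
    return result

def copy_everything(node):
    """Rough analogue of the match="node()|@*" rule: attributes are kept."""
    result = ET.Element(node.tag, dict(node.attrib))
    result.text, result.tail = node.text, node.tail
    for child in node:
        result.append(copy_everything(child))
    return result

elements_only = ET.tostring(copy_elements_and_text(cite), encoding="unicode")
full_copy = ET.tostring(copy_everything(cite), encoding="unicode")
print(elements_only)
print(full_copy)
```

With the element-only copy the link loses its href, which is why the greedier rule matters for descriptions that contain links.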
Perhaps this is too greedy. Do we really want to recognize HTML only in the description element? What if somebody puts HTML in a different element, like instruments? What if somebody makes a mistake and adds an element that shouldn't be there?
I don't think so, but it would be possible to use modes or other techniques to make this default rule apply only inside the description element.
<xsl:template match="composition">
  <h3><xsl:value-of select="title"/></h3>
  <ul>
    <xsl:if test="string(date)">
      <li><xsl:value-of select="date"/></li>
    </xsl:if>
    <xsl:if test="string(length)">
      <li><xsl:value-of select="length"/></li>
    </xsl:if>
    <xsl:if test="string(instruments)">
      <li><xsl:value-of select="instruments"/></li>
    </xsl:if>
    <xsl:if test="string(publisher)">
      <li><xsl:value-of select="publisher"/></li>
    </xsl:if>
  </ul>
  <p><xsl:apply-templates select="description"/></p>
</xsl:template>
The string() function converts the value of the element to a string.
The number() function converts the value of the element to a number.
Zero-length strings are false.
There are all the <, >, =, !=, <=, and >= operators you expect. Because a test is an XML attribute value, < and <= must be written as &lt; and &lt;= in the style sheet.
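The empty-string test used in the xsl:if elements can be mimicked outside XSLT. In this Python sketch (standard xml.etree.ElementTree assumed; string_value is a hypothetical helper that only approximates XPath's string() function), list items are produced only for fields that are present and non-empty:

```python
import xml.etree.ElementTree as ET

composition = ET.fromstring("""
<composition composer="c1">
  <title>Invention for Flute and Piano</title>
  <date><year>1994</year></date>
  <instruments>fl, pn</instruments>
</composition>
""")

def string_value(element, child_name):
    """Approximate XPath string(child): descendant text, or '' if absent."""
    child = element.find(child_name)
    return "" if child is None else "".join(child.itertext())

items = []
for field in ("date", "length", "instruments", "publisher"):
    value = string_value(composition, field)
    if value:  # an empty string is false, as in the xsl:if tests
        items.append(value)

print(items)
```

The missing length and publisher elements yield empty strings and are skipped, which is what removes the empty list items from the output.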
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h3>Trio for Flute, Viola and Harp</h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> <description> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : </p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 3: 4:26</li> </ul> </description> </p> <h3>Charmonium</h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> </ul> <p> <description> <p>Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. </p> </description> </p> <h3>Invention for Flute and Piano</h3> <ul> <li>1994</li> <li>fl, pn</li> </ul> <p> <description> <p>3 movements</p> </description> </p> <h3>Little Trio</h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>Dr. Blood's Mermaid Lullaby</h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>Trio: Dream in D</h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> </ul> <p> <description> <p>Rhapsodic. Passionate. Available on CD <cite> <a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr=1-2/" shape="rect"> Two by Three </a> </cite> from North/South Consonance (1998). 
</p> </description> </p> <h3>Propos II</h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> </ul> <p> <description> <p>Arrangement from Propos</p> </description> </p> <h3>Rictus En Mirroir</h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> </ul> <p></p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br><a href="http://www.macfaq.com/personal.html"> Elliotte Rusty Harold </a><br><a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a></body> </html>
The composers and their compositions are linked through the id attribute of the composer element and the composer attribute of the composition element.
<composer id="c3"> <name> <first_name>Beth</first_name> <middle_name></middle_name> <last_name>Anderson</last_name> </name> </composer> <composition composer="c3"> <title>Trio: Dream in D</title> <date><year>(1980)</year></date> <length>10'</length> <instruments>fl, pn, vc, or vn, pn, vc</instruments> <description> Rhapsodic. Passionate. Available on CD <cite><a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr%3D1-2/"> Two by Three</a></cite> from North/South Consonance (1998). </description> <publisher></publisher> </composition>
<xsl:template match="catalog"> <head> <title><xsl:value-of select="category"/></title> </head> <body> <h1><xsl:value-of select="category"/></h1> <xsl:apply-templates select="composer"/> <hr/> Copyright <xsl:value-of select="copyright"/><br/> Last Modified: <xsl:value-of select="last_updated"/><br/> <xsl:apply-templates select="maintainer"/> </body> </xsl:template> <xsl:template match="composer"> <h2><xsl:value-of select="name"/></h2> <xsl:apply-templates select="../composition[@composer=current()/@id]"/> </xsl:template>
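.. selects the parent element
/ separates the steps of a path; by default each step selects children of the nodes chosen by the step before it
Square brackets [] enclose a predicate that winnows down the selected nodes
The current() function refers to the matched composer element; inside the predicate the context node is the composition being tested, so . would not refer to the composer there
@composer and @id take the values of the composer attribute and the id attribute
The = operator compares the two attributes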
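The join between composers and compositions can be exercised on its own. This is a rough sketch that assumes the XSLT 1.0 processor bundled with the JDK (javax.xml.transform); the class name ComposerJoinDemo, the simplified name elements, and the text output format are illustrative choices, not part of the original catalog.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ComposerJoinDemo {

    // Hypothetical, simplified catalog: composers and compositions are siblings.
    static final String XML =
          "<catalog>"
        + "<composer id='c1'><name>Julie Mandel</name></composer>"
        + "<composer id='c3'><name>Beth Anderson</name></composer>"
        + "<composition composer='c3'><title>Little Trio</title></composition>"
        + "<composition composer='c1'><title>Invention for Flute and Piano</title></composition>"
        + "<composition composer='c3'><title>Trio: Dream in D</title></composition>"
        + "</catalog>";

    // Same select expression as in the slides; output is plain text for brevity.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/catalog'>"
        +   "<xsl:apply-templates select='composer'/>"
        + "</xsl:template>"
        + "<xsl:template match='composer'>"
        +   "<xsl:value-of select='name'/>:"
        +   "<xsl:apply-templates select='../composition[@composer=current()/@id]'/>"
        +   "<xsl:text>&#10;</xsl:text>"
        + "</xsl:template>"
        + "<xsl:template match='composition'> [<xsl:value-of select='title'/>]</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the style sheet to the document and returns the result as a string. */
    public static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Each composer line lists only the compositions whose composer attribute matches its id.
        System.out.print(transform(XML, XSLT));
    }
}
```

Replacing current()/@id with ./@id or @id would compare each composition's composer attribute with that composition's own (missing) id attribute, so nothing would be selected.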
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h2> Julie Mandel </h2> <h3>Trio for Flute, Viola and Harp</h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> <description> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : </p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 3: 4:26</li> </ul> </description> </p> <h3>Invention for Flute and Piano</h3> <ul> <li>1994</li> <li>fl, pn</li> </ul> <p> <description> <p>3 movements</p> </description> </p> <h2> Margaret De Wys </h2> <h3>Charmonium</h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> </ul> <p> <description> <p>Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. </p> </description> </p> <h2> Beth Anderson </h2> <h3>Little Trio</h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>Dr. Blood's Mermaid Lullaby</h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>Trio: Dream in D</h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> </ul> <p> <description> <p>Rhapsodic. Passionate. Available on CD <cite> <a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr=1-2/" shape="rect"> Two by Three </a> </cite> from North/South Consonance (1998). 
</p> </description> </p> <h2> Linda Bouchard </h2> <h3>Propos II</h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> </ul> <p> <description> <p>Arrangement from Propos</p> </description> </p> <h3>Rictus En Mirroir</h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> </ul> <p></p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br><a href="http://www.macfaq.com/personal.html"> Elliotte Rusty Harold </a><br><a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a></body> </html>
<xsl:template match="catalog"> <head> <title><xsl:value-of select="category"/></title> </head> <body> <h1><xsl:value-of select="category"/></h1> <xsl:apply-templates select="composer"> <xsl:sort select="name/last_name"/> </xsl:apply-templates> <hr/> Copyright <xsl:value-of select="copyright"/><br/> Last Modified: <xsl:value-of select="last_updated"/><br/> <xsl:apply-templates select="maintainer"/> </body> </xsl:template>
The select attribute provides the key to sort by
xsl:sort must be a child of xsl:apply-templates or xsl:for-each
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h2> Beth Anderson </h2> <h3>Little Trio</h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>Dr. Blood's Mermaid Lullaby</h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>Trio: Dream in D</h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> </ul> <p> <description> <p>Rhapsodic. Passionate. Available on CD <cite> <a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr=1-2/" shape="rect"> Two by Three </a> </cite> from North/South Consonance (1998). </p> </description> </p> <h2> Linda Bouchard </h2> <h3>Propos II</h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> </ul> <p> <description> <p>Arrangement from Propos</p> </description> </p> <h3>Rictus En Mirroir</h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> </ul> <p></p> <h2> Julie Mandel </h2> <h3>Trio for Flute, Viola and Harp</h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> <description> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : </p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 3: 4:26</li> </ul> </description> </p> <h3>Invention for Flute and Piano</h3> <ul> <li>1994</li> <li>fl, pn</li> </ul> <p> <description> <p>3 movements</p> </description> </p> <h2> Margaret De Wys </h2> <h3>Charmonium</h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> </ul> <p> <description> <p>Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. 
</p> </description> </p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br><a href="http://www.macfaq.com/personal.html"> Elliotte Rusty Harold </a><br><a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a></body> </html>
<xsl:template match="catalog"> <head> <title><xsl:value-of select="category"/></title> </head> <body> <h1><xsl:value-of select="category"/></h1> <xsl:apply-templates select="composer"> <xsl:sort select="name/last_name"/> <xsl:sort select="name/first_name"/> <xsl:sort select="name/middle_name"/> </xsl:apply-templates> <hr/> Copyright <xsl:value-of select="copyright"/><br/> Last Modified: <xsl:value-of select="last_updated"/><br/> <xsl:apply-templates select="maintainer"/> </body> </xsl:template>
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h2> Beth Anderson </h2> <h3>Little Trio</h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>Dr. Blood's Mermaid Lullaby</h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>Trio: Dream in D</h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> </ul> <p> <description> <p>Rhapsodic. Passionate. Available on CD <cite> <a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr=1-2/" shape="rect"> Two by Three </a> </cite> from North/South Consonance (1998). </p> </description> </p> <h2> Linda Bouchard </h2> <h3>Propos II</h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> </ul> <p> <description> <p>Arrangement from Propos</p> </description> </p> <h3>Rictus En Mirroir</h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> </ul> <p></p> <h2> Julie Mandel </h2> <h3>Trio for Flute, Viola and Harp</h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> <description> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : </p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 3: 4:26</li> </ul> </description> </p> <h3>Invention for Flute and Piano</h3> <ul> <li>1994</li> <li>fl, pn</li> </ul> <p> <description> <p>3 movements</p> </description> </p> <h2> Margaret De Wys </h2> <h3>Charmonium</h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> </ul> <p> <description> <p>Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. 
</p> </description> </p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br><a href="http://www.macfaq.com/personal.html"> Elliotte Rusty Harold </a><br><a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a></body> </html>
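The effect of a secondary sort key only shows up when two composers share a last name, which the sample catalog never does. Here is a small sketch under that assumption, again using the JDK's javax.xml.transform processor; the class name SortKeysDemo and the extra composer "Ann Anderson" are invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SortKeysDemo {

    // Hypothetical data: "Ann Anderson" is made up so that two last names collide.
    static final String XML =
          "<catalog>"
        + "<composer><name><first_name>Julie</first_name><last_name>Mandel</last_name></name></composer>"
        + "<composer><name><first_name>Beth</first_name><last_name>Anderson</last_name></name></composer>"
        + "<composer><name><first_name>Linda</first_name><last_name>Bouchard</last_name></name></composer>"
        + "<composer><name><first_name>Ann</first_name><last_name>Anderson</last_name></name></composer>"
        + "</catalog>";

    // The first xsl:sort is the primary key; the second only breaks ties.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/catalog'>"
        +   "<xsl:apply-templates select='composer'>"
        +     "<xsl:sort select='name/last_name'/>"
        +     "<xsl:sort select='name/first_name'/>"
        +   "</xsl:apply-templates>"
        + "</xsl:template>"
        + "<xsl:template match='composer'>"
        +   "<xsl:value-of select='name/last_name'/>, <xsl:value-of select='name/first_name'/>"
        +   "<xsl:text>&#10;</xsl:text>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the style sheet over the sample catalog and returns one "Last, First" line per composer. */
    public static String sortedNames() {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(sortedNames());
    }
}
```

The order of the xsl:sort children matters: swapping them would sort primarily by first name.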
Sorting by composition title is equally straightforward, but we have to do it in a separate xsl:apply-templates element
<xsl:template match="catalog"> <head> <title><xsl:value-of select="category"/></title> </head> <body> <h1><xsl:value-of select="category"/></h1> <xsl:apply-templates select="composer"> <xsl:sort select="name/last_name"/> <xsl:sort select="name/first_name"/> <xsl:sort select="name/middle_name"/> </xsl:apply-templates> <hr/> Copyright <xsl:value-of select="copyright"/><br/> Last Modified: <xsl:value-of select="last_updated"/><br/> <xsl:apply-templates select="maintainer"/> </body> </xsl:template> <xsl:template match="composer"> <h2><xsl:value-of select="name"/></h2> <xsl:apply-templates select="../composition[@composer=current()/@id]"> <xsl:sort select="title"/> </xsl:apply-templates> </xsl:template>
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <h2> Beth Anderson </h2> <h3>Dr. Blood's Mermaid Lullaby</h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>Little Trio</h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>Trio: Dream in D</h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> </ul> <p> <description> <p>Rhapsodic. Passionate. Available on CD <cite> <a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr=1-2/" shape="rect"> Two by Three </a> </cite> from North/South Consonance (1998). </p> </description> </p> <h2> Linda Bouchard </h2> <h3>Propos II</h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> </ul> <p> <description> <p>Arrangement from Propos</p> </description> </p> <h3>Rictus En Mirroir</h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> </ul> <p></p> <h2> Julie Mandel </h2> <h3>Invention for Flute and Piano</h3> <ul> <li>1994</li> <li>fl, pn</li> </ul> <p> <description> <p>3 movements</p> </description> </p> <h3>Trio for Flute, Viola and Harp</h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> <description> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : </p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 3: 4:26</li> </ul> </description> </p> <h2> Margaret De Wys </h2> <h3>Charmonium</h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> </ul> <p> <description> <p>Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. 
</p> </description> </p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br><a href="http://www.macfaq.com/personal.html"> Elliotte Rusty Harold </a><br><a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a></body> </html>
<xsl:template match="catalog"> <head> <title><xsl:value-of select="category"/></title> </head> <body> <!-- Header --> <h1><xsl:value-of select="category"/></h1> <ul> <xsl:for-each select="composition"> <li><xsl:value-of select="title"/></li> </xsl:for-each> </ul> <!-- Body --> <xsl:apply-templates select="composer"> <xsl:sort select="name/last_name"/> <xsl:sort select="name/first_name"/> <xsl:sort select="name/middle_name"/> </xsl:apply-templates> <!-- Signature --> <hr/> Copyright <xsl:value-of select="copyright"/><br/> Last Modified: <xsl:value-of select="last_updated"/><br/> <xsl:apply-templates select="maintainer"/> </body> </xsl:template>
. selects the context node
xsl:for-each iterates through the selected nodes, making each one the current node in turn, but does not apply templates to that node.
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <ul> <li>Trio for Flute, Viola and Harp</li> <li>Charmonium</li> <li>Invention for Flute and Piano</li> <li>Little Trio</li> <li>Dr. Blood's Mermaid Lullaby</li> <li>Trio: Dream in D</li> <li>Propos II</li> <li>Rictus En Mirroir</li> </ul> <h2> Beth Anderson </h2> <h3>Dr. Blood's Mermaid Lullaby</h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3>Little Trio</h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3>Trio: Dream in D</h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> </ul> <p> <description> <p>Rhapsodic. Passionate. Available on CD <cite> <a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr=1-2/" shape="rect"> Two by Three </a> </cite> from North/South Consonance (1998). </p> </description> </p> <h2> Linda Bouchard </h2> <h3>Propos II</h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> </ul> <p> <description> <p>Arrangement from Propos</p> </description> </p> <h3>Rictus En Mirroir</h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> </ul> <p></p> <h2> Julie Mandel </h2> <h3>Invention for Flute and Piano</h3> <ul> <li>1994</li> <li>fl, pn</li> </ul> <p> <description> <p>3 movements</p> </description> </p> <h3>Trio for Flute, Viola and Harp</h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> <description> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : </p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 
3: 4:26</li> </ul> </description> </p> <h2> Margaret De Wys </h2> <h3>Charmonium</h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> </ul> <p> <description> <p>Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. </p> </description> </p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br><a href="http://www.macfaq.com/personal.html"> Elliotte Rusty Harold </a><br><a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a></body> </html>
xsl:for-each can have an xsl:sort child just like xsl:apply-templates
<xsl:template match="catalog"> <head> <title><xsl:value-of select="category"/></title> </head> <body> <!-- Header --> <h1><xsl:value-of select="category"/></h1> <ul> <xsl:for-each select="composition"> <xsl:sort select="title"/> <li><xsl:value-of select="title"/></li> </xsl:for-each> </ul> <!-- Body --> <xsl:apply-templates select="composer"> <xsl:sort select="name/last_name"/> <xsl:sort select="name/first_name"/> <xsl:sort select="name/middle_name"/> </xsl:apply-templates> <!-- Signature --> <hr/> Copyright <xsl:value-of select="copyright"/><br/> Last Modified: <xsl:value-of select="last_updated"/><br/> <xsl:apply-templates select="maintainer"/> </body> </xsl:template>
We need to add <a name="some_name">title</a> around each composition title so we have something to link to.
The generate-id() function will choose a unique ID for a particular element.
Here's the new template for the composition element:
<xsl:template match="composition"> <h3> <a name="{generate-id()}"> <xsl:value-of select="title"/> </a> </h3> <ul> <xsl:if test="string(date)"> <li><xsl:value-of select="date"/></li> </xsl:if> <xsl:if test="string(length)"> <li><xsl:value-of select="length"/></li> </xsl:if> <xsl:if test="string(instruments)"> <li><xsl:value-of select="instruments"/></li> </xsl:if> <xsl:if test="string(publisher)"> <li><xsl:value-of select="publisher"/></li> </xsl:if> </ul> <p><xsl:apply-templates select="description"/></p> </xsl:template>
Here's the new template for the table of contents links:
<xsl:template match="catalog"> <head> <title><xsl:value-of select="category"/></title> </head> <body> <!-- Header --> <h1><xsl:value-of select="category"/></h1> <ul> <xsl:for-each select="composition"> <xsl:sort select="title"/> <li> <a href="#{generate-id()}"> <xsl:value-of select="title"/> </a> </li> </xsl:for-each> </ul> <!-- Body --> <xsl:apply-templates select="composer"> <xsl:sort select="name/last_name"/> <xsl:sort select="name/first_name"/> <xsl:sort select="name/middle_name"/> </xsl:apply-templates> <!-- Signature --> <hr/> Copyright <xsl:value-of select="copyright"/><br/> Last Modified: <xsl:value-of select="last_updated"/><br/> <xsl:apply-templates select="maintainer"/> </body> </xsl:template>
Although the ID is generated in two separate places, it is generated for the same node. Consequently, the two values are the same.
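That claim is easy to check mechanically. The sketch below, which assumes the JDK's bundled XSLT processor, calls generate-id() once from an xsl:for-each and once from a separate template, then compares the two lists; the class name GenerateIdDemo, the two-composition catalog, and the comma and bar separators are illustrative only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class GenerateIdDemo {

    // Hypothetical two-composition catalog.
    static final String XML =
          "<catalog>"
        + "<composition><title>Charmonium</title></composition>"
        + "<composition><title>Little Trio</title></composition>"
        + "</catalog>";

    // Emits the IDs seen by the for-each, then a bar, then the IDs seen by the template.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/catalog'>"
        +   "<xsl:for-each select='composition'>"
        +     "<xsl:value-of select='generate-id()'/><xsl:text>,</xsl:text>"
        +   "</xsl:for-each>"
        +   "<xsl:text>|</xsl:text>"
        +   "<xsl:apply-templates select='composition'/>"
        + "</xsl:template>"
        + "<xsl:template match='composition'>"
        +   "<xsl:value-of select='generate-id()'/><xsl:text>,</xsl:text>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Returns two strings: the IDs generated in the for-each and those generated in the template. */
    public static String[] tocAndAnchorIds() {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString().split("\\|");
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] ids = tocAndAnchorIds();
        System.out.println("for-each:  " + ids[0]);
        System.out.println("template:  " + ids[1]);
        System.out.println("identical: " + ids[0].equals(ids[1]));
    }
}
```

The actual ID strings are chosen by the processor and may differ between processors and between runs, so they should only ever be used for links within a single transformation's output.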
<html> <head> <meta http-equiv="Content-Type" content="application/xml; charset=utf-8"> <title> Small chamber ensembles - 2-4 Players by New York Women Composers </title> </head> <body> <h1> Small chamber ensembles - 2-4 Players by New York Women Composers </h1> <ul> <li><a href="#b1ac21">Charmonium</a></li> <li><a href="#b1ac27">Dr. Blood's Mermaid Lullaby</a></li> <li><a href="#b1ac23">Invention for Flute and Piano</a></li> <li><a href="#b1ac25">Little Trio</a></li> <li><a href="#b1ac31">Propos II</a></li> <li><a href="#b1ac33">Rictus En Mirroir</a></li> <li><a href="#b1ac19">Trio for Flute, Viola and Harp</a></li> <li><a href="#b1ac29">Trio: Dream in D</a></li> </ul> <h2> Beth Anderson </h2> <h3><a name="b1ac27">Dr. Blood's Mermaid Lullaby</a></h3> <ul> <li>1980</li> <li>3'</li> <li>fl or ob, or vn, or vc, pn</li> <li>ACA</li> </ul> <p></p> <h3><a name="b1ac25">Little Trio</a></h3> <ul> <li>1984</li> <li>4'</li> <li>fl, guit, va</li> <li>ACA</li> </ul> <p></p> <h3><a name="b1ac29">Trio: Dream in D</a></h3> <ul> <li>1980</li> <li>10'</li> <li>fl, pn, vc, or vn, pn, vc</li> </ul> <p> <description> <p>Rhapsodic. Passionate. Available on CD <cite> <a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr=1-2/" shape="rect"> Two by Three </a> </cite> from North/South Consonance (1998). 
</p> </description> </p> <h2> Linda Bouchard </h2> <h3><a name="b1ac31">Propos II</a></h3> <ul> <li>1985</li> <li>11'</li> <li>2 tpt</li> </ul> <p> <description> <p>Arrangement from Propos</p> </description> </p> <h3><a name="b1ac33">Rictus En Mirroir</a></h3> <ul> <li>1985</li> <li>14'</li> <li>fl, ob, hpschd, vc</li> </ul> <p></p> <h2> Julie Mandel </h2> <h3><a name="b1ac23">Invention for Flute and Piano</a></h3> <ul> <li>1994</li> <li>fl, pn</li> </ul> <p> <description> <p>3 movements</p> </description> </p> <h3><a name="b1ac19">Trio for Flute, Viola and Harp</a></h3> <ul> <li>1994</li> <li>13'38"</li> <li>fl, hp, vla</li> <li>Theodore Presser</li> </ul> <p> <description> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements : </p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 3: 4:26</li> </ul> </description> </p> <h2> Margaret De Wys </h2> <h3><a name="b1ac21">Charmonium</a></h3> <ul> <li>1991</li> <li>9'</li> <li>2 vln, vla, vc</li> </ul> <p> <description> <p>Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. </p> </description> </p> <hr> Copyright 1999 New York Women Composers<br> Last Modified: July 28, 1999<br><a href="http://www.macfaq.com/personal.html"> Elliotte Rusty Harold </a><br><a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a></body> </html>
<xsl:template match="composition"> <h3><xsl:number value="position()"/>. <a name="{generate-id()}"> <xsl:value-of select="title"/> </a> </h3> <ul> <xsl:if test="string(date)"> <li><xsl:value-of select="date"/></li> </xsl:if> <xsl:if test="string(length)"> <li><xsl:value-of select="length"/></li> </xsl:if> <xsl:if test="string(instruments)"> <li><xsl:value-of select="instruments"/></li> </xsl:if> <xsl:if test="string(publisher)"> <li><xsl:value-of select="publisher"/></li> </xsl:if> </ul> <p><xsl:apply-templates select="description"/></p> </xsl:template>
The xsl:number element has a variety of attributes to determine number style, exactly what's counted, where numbering starts, and so forth
The position() function returns the position of the current node in the context node list
XSL has a number of basic functions for working with strings:
starts-with(main_string, prefix_string)
contains(containing_string, contained_string)
substring(string, offset, length)
substring-before(string, marker-string)
substring-after(string, marker-string)
string-length(string)
normalize-space(string)
translate(string, replaced_text, replacement_text)
concat(string1, string2, ...)
The strings these operate on are generally the values of nodes
These may be part of any select expression, but are most commonly used in xsl:value-of.
XSL does not, however, provide full Perl or POSIX regular expressions.
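The string functions listed above belong to XPath 1.0, so they can be tried without writing a style sheet. This is a minimal sketch assuming the JDK's javax.xml.xpath API; the class name XPathStrings and the sample composition (with the date written as "(1980)", as in the catalog excerpt) are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathStrings {

    // Hypothetical sample data based on one composition from the slides.
    static final String XML =
          "<composition>"
        + "<title>Trio: Dream in D</title>"
        + "<date>(1980)</date>"
        + "</composition>";

    /** Evaluates an XPath 1.0 expression against XML and returns the result as a string. */
    public static String eval(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return (String) xpath.evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "starts-with(/composition/title, 'Trio')",
            "contains(/composition/title, 'Dream')",
            "substring(/composition/date, 2, 4)",        // skips the opening parenthesis
            "substring-before(/composition/title, ':')",
            "substring-after(/composition/title, ': ')",
            "string-length(/composition/title)",
            "normalize-space('  Hot   Cop ')",
            "translate(/composition/date, '()', '')",    // deletes both parentheses
            "concat(/composition/title, ' ', /composition/date)"
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + eval(expression));
        }
    }
}
```

Note that translate() maps individual characters, not substrings: characters in the second argument with no counterpart in the third are simply removed.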
<xsl:template match="composition"> <h3><xsl:number value="position()"/>. <a name="{generate-id()}"> <xsl:value-of select="title"/> </a> </h3> <ul> <xsl:if test="string(date)"> <!--not Y10K safe! --> <li><xsl:value-of select="substring(date,3,2)"/></li> </xsl:if> <xsl:if test="string(length)"> <li><xsl:value-of select="length"/></li> </xsl:if> <xsl:if test="string(instruments)"> <li><xsl:value-of select="instruments"/></li> </xsl:if> <xsl:if test="string(publisher)"> <li><xsl:value-of select="publisher"/></li> </xsl:if> </ul> <p><xsl:apply-templates select="description"/></p> </xsl:template>
substring(string, offset, length)
Offsets are 1-based: 1 is the first character
length is optional; if it is omitted, the substring runs to the end of the string
Node-sets are automatically converted to their string values
XSL has several operators for doing arithmetic:
+
-
*
div
mod
These may be part of any select expression, but are most commonly used in predicates with comparison operators.
XSL includes five functions that operate on numbers:
floor(number) returns the greatest integer less than or equal to the number
ceiling(number) returns the smallest integer greater than or equal to the number
round(number) rounds the number to the nearest integer
sum(node-set) returns the sum of the numeric values of the nodes in the node-set
format-number(number, format-string) returns the string form of a number formatted according to the specified format-string as if by Java 1.1's java.text.DecimalFormat class
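The arithmetic operators and the XPath numeric functions can be checked the same way. The sketch below assumes the JDK's javax.xml.xpath API; the class name XPathNumbers and the piece/mvt sample data (movement lengths in whole minutes) are hypothetical. format-number() is left out because it is an XSLT function rather than a core XPath one, so a plain XPath evaluator is not expected to provide it.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathNumbers {

    // Hypothetical sample data: three movement lengths in minutes.
    static final String XML = "<piece><mvt>5</mvt><mvt>4</mvt><mvt>4</mvt></piece>";

    /** Evaluates an XPath 1.0 expression against XML and returns the result as a number. */
    public static double eval(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return (Double) xpath.evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "7 div 2",          // division is spelled div because / is the path separator
            "7 mod 2",
            "floor(2.5)",
            "ceiling(2.5)",
            "round(2.5)",
            "sum(/piece/mvt)",  // sum() takes a node-set, not a list of numbers
            "sum(/piece/mvt) div count(/piece/mvt)"
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + eval(expression));
        }
    }
}
```

All XPath 1.0 numbers are double-precision floating point, which is why the helper returns a double even for integer results.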
There are three primary ways XML documents are transformed into other formats, such as HTML, with an XSL style sheet:
The XML document and associated style sheet are both served to the client (Web browser), which then transforms the document as specified by the style sheet and presents it to the user.
The server applies an XSL style sheet to an XML document to transform it to some other format (generally HTML) and sends the transformed document to the client (Web browser).
A third program transforms the original XML document into some other format (often HTML) before the document is placed on the server. Both server and client only deal with the post-transform document.
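The third approach, transforming ahead of time, needs only a few lines of code. This is one possible sketch, assuming the javax.xml.transform API that ships with current JDKs (it is not the tool used to produce the slides); the class name OfflineTransform and the three command-line arguments are illustrative.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OfflineTransform {

    /** Applies the style sheet to the XML source and writes the output to the result. */
    public static void transform(Source xml, Source xslt, Result out) {
        try {
            TransformerFactory.newInstance().newTransformer(xslt).transform(xml, out);
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    /** Convenience wrapper for in-memory documents. */
    public static String transformToString(String xml, String xslt) {
        StringWriter out = new StringWriter();
        transform(new StreamSource(new StringReader(xml)),
                  new StreamSource(new StringReader(xslt)),
                  new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("Usage: java OfflineTransform input.xml style.xsl output.html");
            return;
        }
        // The output file is what gets placed on the server; clients never see the XML.
        transform(new StreamSource(new File(args[0])),
                  new StreamSource(new File(args[1])),
                  new StreamResult(new File(args[2])));
    }
}
```

The same transform() method could back the second approach as well, if a server called it per request instead of once before publishing.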
Attaching an XSL style sheet to an XML document is easy. Simply insert an xml-stylesheet processing instruction in the prolog immediately after the XML declaration. This processing instruction should have a type attribute with the value text/xsl and an href attribute whose value is a URL pointing to the style sheet. For example:
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="compositions.xsl"?>
This is also how you attach a CSS style sheet to a document. The only difference here is that the type attribute has the value text/xsl instead of text/css.
<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl"> <xsl:template match="/"> <html> <xsl:apply-templates select="catalog"/> </html> </xsl:template> <xsl:template match="catalog"> <body> <xsl:apply-templates select="composition"/> </body> </xsl:template> <xsl:template match="composition"> <h3><xsl:value-of select="name"/></h3> </xsl:template> </xsl:stylesheet>
Many more ways to select and match elements including descendants, attributes, comments, processing instructions, and text.
Many more tests for predicates including basic arithmetic operations
The xsl:element, xsl:attribute, xsl:processing-instruction, xsl:comment, and xsl:text elements can output elements, attributes, processing instructions, comments, and text calculated from data in the input document.
The xsl:copy-of element copies complete nodes from the input to the output
Parameters for passing arguments to templates
Modes for reprocessing the same element in a different fashion
Recursion
The xsl:variable element defines named constants that can clarify your code.
Named templates, variables, and attribute sets help you reuse common template code.
The xsl:choose and xsl:when elements let you select one of several possibilities depending on a condition.
The xsl:import and xsl:include elements merge rules from different style sheets.
The xsl:message element
Various attributes of the xsl:output element allow you to specify the output document's format, XML declaration, document type declaration, indentation, encoding, and MIME type.
A general regular expression language
Non-final variables (hence no side effects)
Loops
The Extensible Style Language (XSL) comprises two separate XML applications for transforming and formatting XML documents.
An XSL transformation applies rules to a tree read from an XML document to transform it into an output tree written as an XML document.
An XSL template rule is an xsl:template element with a match attribute. Nodes in the input tree are compared against the patterns of the match attributes of the different template elements. When a match is found, the contents of the template are output.
The value of a node is a pure text (no markup) string containing the contents of the node. This can be calculated by the xsl:value-of element.
The xsl:apply-templates element continues processing the children of the current node.
The xsl:if element produces output if, and only if, its test attribute is true.
The xsl:number element inserts the number specified by its value attribute into the output using a specified number format given by the format attribute.
The xsl:sort element can reorder the input nodes before copying them to the output.
The XML Bible
Elliotte Rusty Harold
IDG Books, 1999
ISBN: 0-7645-3236-7
This presentation: http://www.ibiblio.org/xml/slides/sd2000east/introxml/
Chapter 14 of the XML Bible: http://www.ibiblio.org/xml/books/bible/updates/14.html
XSL Transformations Specification: http://www.w3.org/TR/xslt
XPath Specification: http://www.w3.org/TR/xpath
Java works best
C, Perl, Python etc. can also be used
Unicode support is the biggest issue
Event based, stream model
Pushes data to your handler
Programs can plug in different parsers
import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.IOException;
import java.util.StringTokenizer;

public class SAXWordCount implements ContentHandler {

  private int numWords;

  public void startDocument() throws SAXException {
    this.numWords = 0;
  }

  public void endDocument() throws SAXException {
    System.out.println(numWords + " words");
    System.out.flush();
  }

  // text may arrive in several chunks, so it is buffered until the next word break
  private StringBuffer sb = new StringBuffer();

  public void characters(char[] text, int start, int length)
   throws SAXException {
    sb.append(text, start, length);
  }

  private void flush() {
    numWords += countWords(sb.toString());
    sb = new StringBuffer();
  }

  // methods that signify a word break
  public void startElement(String namespaceURI, String localName,
   String rawName, Attributes atts) throws SAXException {
    this.flush();
  }

  public void endElement(String namespaceURI, String localName,
   String rawName) throws SAXException {
    this.flush();
  }

  public void processingInstruction(String target, String data)
   throws SAXException {
    this.flush();
  }

  // methods that aren't necessary in this example
  public void startPrefixMapping(String prefix, String uri)
   throws SAXException {
    // ignore
  }

  public void ignorableWhitespace(char[] text, int start, int length)
   throws SAXException {
    // ignore
  }

  public void endPrefixMapping(String prefix) throws SAXException {
    // ignore
  }

  public void skippedEntity(String name) throws SAXException {
    // ignore
  }

  public void setDocumentLocator(Locator locator) {}

  private static int countWords(String s) {
    if (s == null) return 0;
    s = s.trim();
    if (s.length() == 0) return 0;
    StringTokenizer st = new StringTokenizer(s);
    return st.countTokens();
  }

  public static void main(String[] args) {
    SAXParser parser = new SAXParser();
    SAXWordCount counter = new SAXWordCount();
    parser.setContentHandler(counter);
    for (int i = 0; i < args.length; i++) {
      try {
        parser.parse(args[i]);
      }
      catch (SAXException e) {
        System.err.println(e);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
  } // end main

}
% java SAXWordCount hotcop.xml
15 words
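The point that programs can plug in different parsers can also be illustrated without naming a parser class at all. The following is a sketch only, assuming the JAXP factory API (javax.xml.parsers) bundled with current JDKs instead of the Xerces-specific SAXParser class used above; the class name JaxpSaxWordCount and the in-memory sample document are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.StringTokenizer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class JaxpSaxWordCount extends DefaultHandler {

    private final StringBuilder text = new StringBuilder();
    private int numWords;

    // The parser pushes character data here, possibly in several chunks.
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Element boundaries are treated as word breaks, as in the example above.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    private void flush() {
        numWords += new StringTokenizer(text.toString()).countTokens();
        text.setLength(0);
    }

    /** Parses the document with whatever SAX parser the factory selects and counts its words. */
    public static int countWords(String xml) {
        try {
            JaxpSaxWordCount handler = new JaxpSaxWordCount();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.numWords;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, shortened version of the song document.
        String xml = "<SONG><TITLE>Hot Cop</TITLE><ARTIST>Village People</ARTIST><YEAR>1978</YEAR></SONG>";
        System.out.println(countWords(xml) + " words");
    }
}
```

Because the handler never refers to a concrete parser class, swapping in a different SAX implementation becomes a configuration matter rather than a code change.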
Tree based
Your handler pulls data from the tree
Programs can plug in different parsers
import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.IOException;
import java.util.StringTokenizer;

public class DOMWordCount {

  public static void main(String[] args) {
    DOMParser parser = new DOMParser();
    DOMWordCount counter = new DOMWordCount();
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]);
        Document d = parser.getDocument();
        int numWords = countWordsInNode(d);
        System.out.println(numWords + " words");
      }
      catch (SAXException e) {
        System.err.println(e);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
  } // end main

  // note use of recursion
  public static int countWordsInNode(Node node) {
    int numWords = 0;
    if (node.hasChildNodes()) {
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        numWords += countWordsInNode(children.item(i));
      }
    }
    int type = node.getNodeType();
    if (type == Node.TEXT_NODE) {
      String s = node.getNodeValue();
      numWords += countWordsInString(s);
    }
    return numWords;
  }

  private static int countWordsInString(String s) {
    if (s == null) return 0;
    s = s.trim();
    if (s.length() == 0) return 0;
    StringTokenizer st = new StringTokenizer(s);
    return st.countTokens();
  }

}
% java DOMWordCount hotcop.xml
15 words
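The tree-based approach can likewise be written against a parser-neutral factory. This is a minimal sketch, assuming the JAXP DocumentBuilderFactory bundled with current JDKs rather than the Xerces DOMParser class shown above; the class name JaxpDomWordCount and the in-memory sample document are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.StringTokenizer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class JaxpDomWordCount {

    /** Builds the whole tree in memory, then pulls the text out of it. */
    public static int countWords(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return countWordsInNode(doc);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Recursively visits every node; only text nodes contribute words.
    private static int countWordsInNode(Node node) {
        int numWords = 0;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            numWords += countWordsInNode(children.item(i));
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            numWords += new StringTokenizer(node.getNodeValue()).countTokens();
        }
        return numWords;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened version of the song document.
        String xml = "<SONG><TITLE>Hot Cop</TITLE><ARTIST>Village People</ARTIST><YEAR>1978</YEAR></SONG>";
        System.out.println(countWords(xml) + " words");
    }
}
```

Compared with the SAX version, nothing is counted until the entire document has been read, which is simpler to program but costs memory proportional to the document size.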
The XML Bible
Elliotte Rusty Harold
IDG Books, 1999
ISBN: 0-7645-3236-7
This presentation: http://www.ibiblio.org/xml/slides/sd2000east/introxml
Chapter 14 of the XML Bible: http://www.ibiblio.org/xml/books/bible/updates/14.html
XSL Transformations Specification: http://www.w3.org/TR/xslt
XPath Specification: http://www.w3.org/TR/xpath
XML Specification: http://www.w3.org/TR/REC-xml