Part I: XML Overview
Part II: What is XML Good For?
Part III: A Practical Example
Part IV: Well-formedness
Part V: DTDs and Validity
Part VI: Namespaces
Extensible Markup Language
A syntax for documents
A Meta-Markup Language
A Structural and Semantic language, not a formatting language
Not just for Web pages
Not like HTML, troff, LaTeX
Make up the tags you need as you need them
The tags you create can be documented in a Document Type Definition (DTD)
A meta syntax for domain-specific markup languages like MusicML, MathML, and CML
XML documents form a tree
Element and attribute names reflect the kind of the element
Formatting can be added with a style sheet
<dt>Hot Cop
<dd> by Jacques Morali, Henri Belolo, and Victor Willis
<ul>
<li>Producer: Jacques Morali
<li>Publisher: PolyGram Records
<li>Length: 6:20
<li>Written: 1978
<li>Artist: Village People
</ul>
<?xml version="1.0"?> <SONG> <TITLE>Hot Cop</TITLE> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <LENGTH>6:20</LENGTH> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
Documents are composed primarily of elements
An element is delimited by a start tag and a matching end tag:
<COMPOSER>Jacques Morali</COMPOSER>
Start tag <COMPOSER>
Contents "Jacques Morali"
End tag </COMPOSER>
Elements can contain other elements:
<SONG>
<TITLE>Hot Cop</TITLE>
<COMPOSER>Jacques Morali</COMPOSER>
<COMPOSER>Henri Belolo</COMPOSER>
<COMPOSER>Victor Willis</COMPOSER>
<PRODUCER>Jacques Morali</PRODUCER>
<PUBLISHER>PolyGram Records</PUBLISHER>
<LENGTH>6:20</LENGTH>
<YEAR>1978</YEAR>
<ARTIST>Village People</ARTIST>
</SONG>
Every document has exactly one root element, also known as the document element, that completely contains all other elements.
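As a rough illustration of the tree structure described above, the following Python sketch parses the SONG document with the standard library's xml.etree.ElementTree and walks from the root element to its children. The variable names are illustrative and are not part of the original slides.

```python
import xml.etree.ElementTree as ET

SONG_XML = """<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>"""

# fromstring() returns the root (document) element of the parsed text.
song = ET.fromstring(SONG_XML)

# Iterating over an element yields its child elements in document order.
child_names = [child.tag for child in song]

# findall() with a plain name selects the direct children with that name.
composers = [composer.text for composer in song.findall("COMPOSER")]

print(song.tag)
print(child_names)
print(composers)
```

Every element except SONG is reached through SONG, which is what it means for the root element to contain all the others.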
SONG {display: block} TITLE {display: block; font-family: Helvetica, serif; font-size: 20pt; font-weight: bold} COMPOSER {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt; font-style: italic} ARTIST {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt; font-weight: bold; font-style: italic} PUBLISHER {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt} LENGTH {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt} YEAR {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt}
<?xml-stylesheet type="text/css" href="song1.css"?>
<?xml version="1.0"?> <?xml-stylesheet type="text/css" href="song.css"?> <SONG> <TITLE>Hot Cop</TITLE> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <LENGTH>6:20</LENGTH> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="SONG"> <html> <body> <h1> <xsl:value-of select="TITLE"/> by the <xsl:value-of select="ARTIST"/> </h1> <ul> <xsl:apply-templates select="COMPOSER"/> <li>Publisher: <xsl:value-of select="PUBLISHER"/></li> <li>Year: <xsl:value-of select="YEAR"/></li> <li>Producer: <xsl:value-of select="PRODUCER"/></li> </ul> </body> </html> </xsl:template> <xsl:template match="COMPOSER"> <li>Composer: <xsl:value-of select="."/></li> </xsl:template> </xsl:stylesheet>
Browser support is weak to non-existent.
Can use third party tools like Xalan, Saxon, and XT
Let's use Saxon to apply this stylesheet to hotcop.xml.
Windows executable:
C:\> saxon hotcop.xml song.xsl>hotcop.html
Java executable:
C:\> java com.icl.saxon.StyleSheet hotcop.xml song.xsl>hotcop.html
<html> <body> <h1>Hot Cop by the Village People </h1> <ul> <li>Composer: Jacques Morali</li> <li>Composer: Henri Belolo</li> <li>Composer: Victor Willis</li> <li>Publisher: PolyGram Records</li> <li>Year: 1978</li> <li>Producer: Jacques Morali</li> </ul> </body> </html>
Plain ASCII or UTF-8 text
.xml is standard file extension
Any standard text editor will work
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <?xml-stylesheet type="text/css" href="song.css"?> <SONG> <TITLE>Hot Cop</TITLE> <PHOTO SRC="hotcop.jpg" WIDTH="100" HEIGHT="200" ALT="Victor Willis in Cop Outfit"/> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <!-- The publisher is actually Polygram but I needed an example of a general entity reference. --> <PUBLISHER HREF="http://www.amrecords.com/"> A &amp; M Records </PUBLISHER> <LENGTH>6:20</LENGTH> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG> <!-- You can tell what album I was listening to when I wrote this example -->
New features:
Encoding declaration
Standalone declaration
Attributes
Comments
Empty Element Tags
Entity References
At the top of the document, you normally find an XML declaration:
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
version info
required
always has the value 1.0
standalone document declaration
yes
no
encoding declaration
UTF-8
ISO-8859-1
etc.
name=value
An element may not have two attributes with the same name
Attribute values must be quoted
Attributes are for meta-data; elements are for data.
Does the reader want to see the information? If yes, use element content; if no, use attributes
Attributes are good for ID numbers, URLs, references, and other information not directly relevant to the reader
Attributes can't hold structure well.
Elements allow you to include meta-meta-data (information about the information about the information).
Not everyone always agrees on what is and isn't meta-data.
Elements are more extensible in the face of future changes.
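The split between element content and attributes discussed above also shows up in how a program reads the data. The following Python sketch, using a shortened version of the Hot Cop document, reads the publisher's name from element content and its URL from an attribute. The fragment and variable names are chosen for illustration only.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """<SONG>
  <TITLE>Hot Cop</TITLE>
  <PHOTO SRC="hotcop.jpg" WIDTH="100" HEIGHT="200" ALT="Victor Willis in Cop Outfit"/>
  <PUBLISHER HREF="http://www.amrecords.com/">A &amp; M Records</PUBLISHER>
</SONG>"""

song = ET.fromstring(FRAGMENT)
publisher = song.find("PUBLISHER")
photo = song.find("PHOTO")

# Data the reader sees lives in element content.
publisher_name = publisher.text

# Meta-data such as a URL lives in an attribute.
publisher_url = publisher.get("HREF")

# An empty element carries all of its information in attributes.
photo_attrs = dict(photo.attrib)

print(publisher_name, publisher_url)
print(photo_attrs)
```

Attribute values are flat strings, which is one reason attributes hold structure poorly: there is no way to nest further markup inside WIDTH or ALT.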
Domain-Specific (Vertical) Markup Languages
Self-Describing Data
Interchange of Data Among Applications
Structured and Integrated Data
Markup language for a vertical market
Non-proprietary format
Don't pay for what you don't use
Much data is lost due to format problems
XML is very simple
XML is self-describing
XML is well documented
<PERSON ID="p1100" SEX="M">
<NAME>
<GIVEN>Judson</GIVEN>
<SURNAME>McDaniel</SURNAME>
</NAME>
<BIRTH>
<DATE>21 Feb 1834</DATE>
</BIRTH>
<DEATH>
<DATE>9 Dec 1905</DATE>
</DEATH>
</PERSON>
E-commerce
Syndication
EAI and EDI
A document can be assembled from multiple physical storage entities
These may be files, database queries, or anything that can be referred to by a URI
Can even include non-XML content
A specific markup language that uses the XML meta-syntax is called an XML application
Different XML applications have their own more constricted syntaxes and vocabularies within the broader XML syntax
Further syntax can be layered on top of this; e.g. data typing through schemas
Web Pages
Mathematical Equations
Music Notation
Vector Graphics
Metadata
and more...
<?xml version="1.0"?> <html xmlns="http://www.w3.org/TR/REC-html40" xmlns:m="http://www.w3.org/TR/REC-MathML/" > <head> <title>Fiat Lux</title> <meta name="GENERATOR" content="amaya V1.3b" /> </head> <body> <P> And God said, </P> <math> <m:mrow> <m:msub> <m:mi>δ</m:mi> <m:mi>α</m:mi> </m:msub> <m:msup> <m:mi>F</m:mi> <m:mi>αβ</m:mi> </m:msup> <m:mi></m:mi> <m:mo>=</m:mo> <m:mi></m:mi> <m:mfrac> <m:mrow> <m:mn>4</m:mn> <m:mi>π</m:mi> </m:mrow> <m:mi>c</m:mi> </m:mfrac> <m:mi></m:mi> <m:msup> <m:mi>J</m:mi> <m:mrow> <m:mi>β</m:mi> <m:mo></m:mo> </m:mrow> </m:msup> </m:mrow> </math> <P> and there was light </P> </body> </html>
<?xml version="1.0"?>
<CHANNEL HREF="http://www.ibiblio.org/xml/index.html">
<TITLE>Cafe con Leche</TITLE>
<ITEM HREF="http://www.ibiblio.org/xml/books.html">
<TITLE>Books about XML</TITLE>
</ITEM>
<ITEM HREF="http://www.ibiblio.org/xml/tradeshows.html">
<TITLE>Trade shows and conferences about XML</TITLE>
</ITEM>
<ITEM HREF="http://www.ibiblio.org/xml/lists.htm">
<TITLE>Mailing Lists dedicated to XML</TITLE>
</ITEM>
</CHANNEL>
Joseph Conrad's Heart of Darkness
The entire Project Gutenberg corpus
Vector Markup Language (VML)
Internet Explorer 5.0, 5.5
Microsoft Office 2000
Scalable Vector Graphics (SVG)
Meta-data
Dublin Core
Better Web searching
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/DC/">
<rdf:Description about="http://www.ibiblio.org/xml/">
<dc:CREATOR>Elliotte Rusty Harold</dc:CREATOR>
<dc:TITLE>Cafe con Leche</dc:TITLE>
</rdf:Description>
</rdf:RDF>
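A program that consumes this kind of metadata looks elements up by namespace URI rather than by prefix. The following Python sketch reads the Dublin Core fields from the RDF example; the prefix map NS and the resulting dictionary are illustrative choices, not part of the original slides.

```python
import xml.etree.ElementTree as ET

RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/DC/">
  <rdf:Description about="http://www.ibiblio.org/xml/">
    <dc:CREATOR>Elliotte Rusty Harold</dc:CREATOR>
    <dc:TITLE>Cafe con Leche</dc:TITLE>
  </rdf:Description>
</rdf:RDF>"""

# Prefixes used in the search paths below. They only need to map to the
# same URIs as the document; the prefix names themselves are arbitrary.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/DC/",
}

root = ET.fromstring(RDF_XML)
description = root.find("rdf:Description", NS)

metadata = {
    "about": description.get("about"),
    "creator": description.find("dc:CREATOR", NS).text,
    "title": description.find("dc:TITLE", NS).text,
}
print(metadata)
```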
XSL: The Extensible Stylesheet Language
XLink: The Extensible Linking Language
Data typing in XML is weak
DTDs use a strange non-XML syntax
Limited compatibility with namespaces
Limited extensibility
Schemas fix all these problems
There are multiple schema languages including:
Rick Jelliffe's Schematron
Murata Makoto's RELAX
James Clark's TREX
The W3C XML Schema Language
<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name="SONG" type="SongType"/> <xsd:complexType name="SongType"> <xsd:sequence> <xsd:element name="TITLE" type="xsd:string" minOccurs="1" maxOccurs="1"/> <xsd:element name="COMPOSER" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/> <xsd:element name="PRODUCER" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/> <xsd:element name="PUBLISHER" type="xsd:string" minOccurs="0" maxOccurs="1"/> <xsd:element name="LENGTH" type="xsd:duration" minOccurs="0" maxOccurs="1"/> <xsd:element name="YEAR" type="xsd:gYear" minOccurs="1" maxOccurs="1"/> <xsd:element name="ARTIST" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:schema>
Any element can be a link
Links can be bi-directional
Links can be separated from the documents they connect
<footnote xlink:type="simple" xlink:href="footnote7.xml">7</footnote>
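Because any element can be a link, software recognizes a link by its XLink attributes rather than by its element name. The following Python sketch reads the link target from the footnote element. It assumes the xlink prefix is bound to the XLink namespace URI http://www.w3.org/1999/xlink, a declaration the slide leaves out.

```python
import xml.etree.ElementTree as ET

# Assumption: the xlink prefix is bound to the standard XLink namespace.
XLINK = "http://www.w3.org/1999/xlink"

FOOTNOTE_XML = (
    '<footnote xmlns:xlink="http://www.w3.org/1999/xlink" '
    'xlink:type="simple" xlink:href="footnote7.xml">7</footnote>'
)

footnote = ET.fromstring(FOOTNOTE_XML)

# ElementTree stores namespaced attributes under "{namespace-uri}local-name".
link_type = footnote.get(f"{{{XLINK}}}type")
link_target = footnote.get(f"{{{XLINK}}}href")

print(footnote.tag, link_type, link_target)
```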
Microsoft Office 2000
Netscape What's Related
Examine the data
Design a vocabulary for the data
Write a style sheet
XML documents are trees.
XML elements contain other elements as well as text
Within these limits there's more than one way to organize the data
Hierarchically
Relationally
Objects
The catalog?
A custom Document element?
Choose catalog for the root element
Everything else will be a descendant of catalog
This is not the only possible choice
<?xml version="1.0" encoding="ISO-8859-1"?> <catalog> Everything else will go here... </catalog>
Composers?
Songs/Compositions?
Categories?
All of the Above?
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<category>
Small chamber ensembles - 2-4 Players by New York Women Composers
</category>
</catalog>
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog><category>Small chamber ensembles
- 2-4 Players by New York Women Composers</category></catalog>
Each composer has a name
<?xml version="1.0" encoding="ISO-8859-1"?> <catalog> <category> Small chamber ensembles - 2-4 Players by New York Women Composers </category> <composer> <name>Julie Mandel</name> </composer> <composer> <name>Margaret De Wys</name> </composer> <composer> <name>Beth Anderson</name> </composer> <composer> <name>Linda Bouchard</name> </composer> </catalog>
It's better for sorting to divide names into first, middle, and last
Some (e.g. middle name) elements may be empty
<?xml version="1.0" encoding="ISO-8859-1"?> <catalog> <category> Small chamber ensembles - 2-4 Players by New York Women Composers </category> <composer> <name> <first_name>Julie</first_name> <middle_name></middle_name> <last_name>Mandel</last_name> </name> </composer> <composer> <name> <first_name>Margaret</first_name> <middle_name>De</middle_name> <last_name>Wys</last_name> </name> </composer> <composer> <name> <first_name>Beth</first_name> <middle_name></middle_name> <last_name>Anderson</last_name> </name> </composer> <composer> <name> <first_name>Linda</first_name> <middle_name></middle_name> <last_name>Bouchard</last_name> </name> </composer> </catalog>
Some people have the same names
Use an ID number to disambiguate
Store the ID number in an id attribute
<?xml version="1.0" encoding="ISO-8859-1"?> <catalog> <category> Small chamber ensembles - 2-4 Players by New York Women Composers </category> <composer id="c1"> <name> <first_name>Julie</first_name> <middle_name></middle_name> <last_name>Mandel</last_name> </name> </composer> <composer id="c2"> <name> <first_name>Margaret</first_name> <middle_name>De</middle_name> <last_name>Wys</last_name> </name> </composer> <composer id="c3"> <name> <first_name>Beth</first_name> <middle_name></middle_name> <last_name>Anderson</last_name> </name> </composer> <composer id="c4"> <name> <first_name>Linda</first_name> <middle_name></middle_name> <last_name>Bouchard</last_name> </name> </composer> </catalog>
Let's look at an example of what we want:
Rendered HTML:
Tonal. Commissioned/Premiered by the Redlands' New Music Ensemble. (A swale is a meadow or a marsh where a lot of wild plants grow together. The composer discovered the word when a horse named Swale won the Kentucky Derby several years ago. Since her work is primarily collage of newly composed musical swatches, she has used the name extensively.) ACA - American Composers Alliance
Or in HTML:
<dt><cite>Brass Swale</cite> (1988) 5", tbn, 2 Bfl tpts, bar. hn</dt>
<dd><p>
Tonal. Commissioned/Premiered by the Redlands' New Music
Ensemble. (A swale is a meadow or a marsh where a lot of
wild plants grow together. The composer discovered the word
when a horse named Swale won the Kentucky Derby several
years ago. Since her work is primarily collage of newly
composed musical swatches, she has used the name
extensively.) ACA - American Composers
Alliance</p>
</dd>
Title
Date
Description
List of instruments
Length
Publisher
Some pieces may be missing from some compositions
<composition>
<title>Brass Swale</title>
<date>1988</date>
<length>5"</length>
<instruments>tbn, 2 Bfl tpts, bar, hn</instruments>
<description>
Tonal. Commissioned/Premiered by the Redlands' New Music
Ensemble. (A swale is a meadow or a marsh where a lot of
wild plants grow together. The composer discovered the word
when a horse named Swale won the Kentucky Derby several
years ago. Since her work is primarily collage of newly
composed musical swatches, she has used the name
extensively.)
</description>
<publisher>ACA - American Composers Alliance</publisher>
</composition>
<composition>
<title>Trio for Flute, Viola and Harp</title>
<date><year>1994</year></date>
<length>13'38"</length>
<instruments>fl, hp, vla</instruments>
<description>
<p>Premiered at Queens College in April, 1996 by Sue Ann Kahn,
Christine Ims, and Susan Jolles. In 3 movements :</p>
<ul>
<li>mvt. 1: 5:01</li>
<li>mvt. 2: 4:11</li>
<li>mvt. 3: 4:26</li>
</ul>
</description>
<publisher>Theodore Presser</publisher>
</composition>
<composition composer="c3">
<title>Trio: Dream in D</title>
<date><year>1980</year></date>
<length>10'</length>
<instruments>fl, pn, vc, or vn, pn, vc</instruments>
<description>
Rhapsodic. Passionate. Available on CD
<cite><a href=
"http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr%3D1-2/">
Two by Three
</a></cite> from North/South Consonance (1998).
</description>
<publisher></publisher>
</composition>
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<category>
Small chamber ensembles - 2-4 Players by New York Women Composers
</category>
<cataloging_info>
<abstract>Compositions by the members of New York Women Composers</abstract>
<keyword>music publishing</keyword>
<keyword>scores</keyword>
<keyword>women composers</keyword>
<keyword>New York</keyword>
</cataloging_info>
<composer id="c1">
<name>
<first_name>Julie</first_name>
<middle_name></middle_name>
<last_name>Mandel</last_name>
</name>
</composer>
...
</catalog>
Copyright notice
Name of maintainer
Email address of maintainer
Last modified date
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
...
<last_updated>July 28, 1999</last_updated>
<copyright>1999 New York Women Composers</copyright>
<maintainer email="elharo@metalab.unc.edu"
url="http://www.macfaq.com/personal.html">
<name>
<first_name>Elliotte</first_name>
<middle_name>Rusty</middle_name>
<last_name>Harold</last_name>
</name>
</maintainer>
</catalog>
A simple and straightforward language for applying styles like bold and Helvetica to particular XML elements.
Rather than being stored as part of the document itself, all the style information is placed in a separate document called a style sheet.
Partially supported by Mozilla, Netscape 6, IE 5.0/5.5, and Opera 4.0/5.0
Full W3C Recommendation
category { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 32pt; font-weight: bold; text-align: center }
catalog { font-family: "New York", "Times New Roman", serif; font-size: 14pt; background-color: white; color: black; display: block }
composer { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 24pt; font-weight: bold; text-align: left }
composition title { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 18pt; font-weight: bold; text-align: left}
composition * {display:list-item}
description {display: block}
/* cataloging_info is only for search engines */
cataloging_info { display: none; color: #FFFFFF}
last_updated, copyright, maintainer {display: block; font-size: small}
copyright:before {content: "Copyright " }
last_updated:before {content: "Last Modified " }
last_updated {margin-top: 2ex }
Should be able to match composers with compositions
Should be able to sort composers and compositions by name
Should be able to include data from attributes; e.g. the maintainer's email address
Horizontal rules would be nice
Better header (e.g. title and meta tags) would be nice
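The first three wishes in the list above amount to a join, a sort, and an attribute lookup. As a sketch of what such a transformation has to do, the following Python code performs them on an abridged catalog. It is not XSL and not the approach the slides go on to use; the function name works_by_composer and the shortened document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Abridged catalog: two composers, three compositions, maintainer without a name.
CATALOG_XML = """<catalog>
  <maintainer email="elharo@metalab.unc.edu"/>
  <composer id="c1">
    <name><first_name>Julie</first_name><middle_name></middle_name><last_name>Mandel</last_name></name>
  </composer>
  <composer id="c3">
    <name><first_name>Beth</first_name><middle_name></middle_name><last_name>Anderson</last_name></name>
  </composer>
  <composition composer="c1"><title>Trio for Flute, Viola and Harp</title></composition>
  <composition composer="c3"><title>Trio: Dream in D</title></composition>
  <composition composer="c3"><title>Little Trio</title></composition>
</catalog>"""


def works_by_composer(catalog):
    """Return [(last_name, [titles])] sorted by last name, then by title."""
    listing = []
    for composer in catalog.findall("composer"):
        last_name = composer.findtext("name/last_name")
        # Match each composition to its composer through the id reference.
        titles = sorted(
            composition.findtext("title")
            for composition in catalog.findall("composition")
            if composition.get("composer") == composer.get("id")
        )
        listing.append((last_name, titles))
    return sorted(listing)


catalog = ET.fromstring(CATALOG_XML)
listing = works_by_composer(catalog)

# Data stored in attributes is available too, e.g. the maintainer's email.
maintainer_email = catalog.find("maintainer").get("email")

for last_name, titles in listing:
    print(last_name, titles)
print(maintainer_email)
```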
CSS Level 3
XSL
XSL + JavaScript
CSS has broader support
CSS is more stable
XSL is much more powerful
XSL can be used without browser support by transforming to HTML on the server side
Every XML document must be well-formed.
Parsers are not allowed to accept malformed XML documents.
Open and close all tags
Empty tags end with />
There is a unique root element
Elements may not overlap
Attribute values are quoted
< and & are only used to start tags and entities
Only the five predefined entity references are used
Plus more...
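Since parsers must reject malformed documents, a simple way to test well-formedness is to hand the text to a parser and see whether it raises an error. The following Python sketch does this with the standard library; the helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = [
    "<p>The quick brown fox jumped over the lazy dog</p>",  # tags opened and closed
    "<li>A very <B>important point",                        # unclosed tags
    "<p>Copyright 1999 Elliotte Rusty Harold<br></p>",      # <br> never closed
    "<p><b>bold <i>overlap</b></i></p>",                    # overlapping elements
    "<DIV ALIGN=CENTER></DIV>",                             # unquoted attribute value
    "<a/><b/>",                                             # two root elements
]

for sample in samples:
    print(is_well_formed(sample), sample)
```

A non-validating parser such as this one checks well-formedness only; it says nothing about whether the document is valid against a DTD.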
Good:
<p>The quick brown fox jumped over the lazy dog</p>
<li>A very <B>important</B> point</li>
Copyright 1999 Elliotte Rusty Harold<br></br>
Bad:
The quick brown fox jumped over the lazy dog<p>
<li>A very <B>important point
Copyright 1999 Elliotte Rusty Harold<br>
<BR/>, <HR/>, and <IMG/> instead of <BR>, <HR>, and <IMG>
Web browsers deal inconsistently with these
Can use <BR></BR>, <HR></HR>, <IMG></IMG> instead
<BR CLASS="EMPTY"/> seems to work best.
One element completely contains all other elements of the document
This is HTML in HTML files
The XML declaration and xml-stylesheet processing instruction are not elements
If an element contains a start tag for an element, it must also contain the corresponding end tag
Empty elements may appear anywhere
Every non root element has a parent element
Good:
<A HREF="http://www.ibiblio.org/xml/">
<DIV ALIGN="CENTER">
<A HREF="http://www.ibiblio.org/xml/">
<EMBED SRC="minnesotaswale.aif" hidden="hidden">
Bad:
<A HREF=http://www.ibiblio.org/xml/>
<DIV ALIGN=CENTER>
<EMBED SRC=minnesotaswale.aif hidden=hidden>
<EMBED SRC="minnesotaswale.aif" hidden>
Good:
<H1>O'Reilly &amp; Associates</H1>
Bad:
<H1>O'Reilly & Associates</H1>
Good:
<CODE>for (int i = 0; i &lt;= args.length; i++ ) { </CODE>
Bad:
<CODE>for (int i = 0; i <= args.length; i++ ) { </CODE>
Good:
&amp;
&lt;
&gt;
&quot;
&apos;
Bad:
&copy;
&reg;
&tm;
&alpha;
&eacute;
etc.
Entity references must end with a semicolon.
&lt; is good
&lt is bad
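When XML is generated by a program, the escaping rules above are usually applied by a library function rather than by hand. The following Python sketch uses xml.sax.saxutils.escape, which replaces &, <, and > with the corresponding predefined entity references, and then confirms that a parser turns them back into the original text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

heading = "O'Reilly & Associates"
code = "for (int i = 0; i <= args.length; i++ ) {"

# escape() rewrites &, < and > as &amp;, &lt; and &gt;
escaped_heading = escape(heading)
escaped_code = escape(code)

# Wrapping the escaped text in tags now yields well-formed markup,
# and parsing it recovers the original characters.
parsed_heading = ET.fromstring(f"<H1>{escaped_heading}</H1>").text
parsed_code = ET.fromstring(f"<CODE>{escaped_code}</CODE>").text

print(escaped_heading)
print(escaped_code)
```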
Decimal:
&#161; | ¡ |
&#162; | ¢ |
&#163; | £ |
&#164; | ¤ |
&#165; | ¥ |
&#166; | ¦ |
etc. for all other Unicode values that are allowed in XML documents |
Hexadecimal:
&#xA1; | ¡ |
&#xA2; | ¢ |
&#xA3; | £ |
&#xA4; | ¤ |
&#xA5; | ¥ |
&#xA6; | ¦ |
etc. for all other Unicode values that are allowed in XML documents |
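Decimal and hexadecimal character references are two spellings of the same Unicode code point. The following Python sketch checks this for the six characters in the tables above by building both forms and parsing them; the element name c is arbitrary.

```python
import xml.etree.ElementTree as ET

# Code points 161 through 166 (hex A1 through A6).
code_points = range(161, 167)

results = []
for cp in code_points:
    decimal_ref = f"&#{cp};"
    hex_ref = f"&#x{cp:X};"
    # The parser resolves each reference to the character it names.
    decimal_char = ET.fromstring(f"<c>{decimal_ref}</c>").text
    hex_char = ET.fromstring(f"<c>{hex_ref}</c>").text
    results.append((decimal_ref, hex_ref, decimal_char, hex_char))

for decimal_ref, hex_ref, decimal_char, hex_char in results:
    print(decimal_ref, hex_ref, decimal_char, hex_char)
```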
There are two levels of conformance to XML
Well-formed documents are correct with or without a DTD. They adhere to the basic syntax rules of XML
Valid documents also adhere to the constraints specified in a DTD
All valid documents are well-formed; not all well-formed documents are valid.
A Document Type Definition (DTD) describes the elements and attributes that may appear in a document
Validation compares a particular document against a DTD
Well-formedness is a prerequisite for validity
A DTD lists the elements, attributes, and entities contained in a document
A DTD defines the relationships between different elements and attributes
A DTD for songs:
<!ELEMENT SONG (TITLE, COMPOSER+, PRODUCER*, PUBLISHER*, LENGTH?, YEAR?, ARTIST+)> <!ELEMENT TITLE (#PCDATA)> <!ELEMENT COMPOSER (#PCDATA)> <!ELEMENT PRODUCER (#PCDATA)> <!ELEMENT PUBLISHER (#PCDATA)> <!ELEMENT LENGTH (#PCDATA)> <!-- This should be a four digit year like "1999", not a two-digit year like "99" --> <!ELEMENT YEAR (#PCDATA)> <!ELEMENT ARTIST (#PCDATA)>
Normally stored in a separate file
To be valid an XML document must be
Well-formed
Must have a document type declaration
Must comply with the constraints specified in the DTD
<?xml version="1.0"?> <!DOCTYPE SONG SYSTEM "song.dtd"> <SONG> <TITLE>Hot Cop</TITLE> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <LENGTH>6:20</LENGTH> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
To check validity you pass the document through a validating parser which should report any errors it finds. For example,
% java sax.SAXCount -v invalidhotcop.xml Error at (file file:/D:/speaking/SD99EAST/dtds/invalidhotcop.xml, line 10, char 8): Element "<SONG>" is not valid because it does not follow the rule, "(TITLE,COMPOSER+,PRODUCER*,PUBLISHER*,LENGTH?,YEAR?,ARTIST+)". invalidhotcop.xml: 281 ms
A valid document:
% java sax.SAXCount -v validhotcop.xml validhotcop.xml: 170 ms
<?xml version="1.0"?> <!DOCTYPE SONG [ <!ELEMENT SONG (TITLE, COMPOSER+, PRODUCER*, PUBLISHER*, LENGTH?, YEAR?, ARTIST+)> <!ELEMENT TITLE (#PCDATA)> <!ELEMENT COMPOSER (#PCDATA)> <!ELEMENT PRODUCER (#PCDATA)> <!ELEMENT PUBLISHER (#PCDATA)> <!ELEMENT LENGTH (#PCDATA)> <!-- This should be a four digit year like "1999", not a two-digit year like "99" --> <!ELEMENT YEAR (#PCDATA)> <!ELEMENT ARTIST (#PCDATA)> ]> <SONG> <TITLE>Hot Cop</TITLE> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <LENGTH>6:20</LENGTH> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
Ensures that data is correct before feeding it into a program
Ensures that a format is followed
Establishes what must be supported
Not all documents need to be valid; sometimes well-formed is enough
<?xml version="1.0" encoding="ISO-8859-1"?> <catalog> <category> Small chamber ensembles - 2-4 Players by New York Women Composers </category> <cataloging_info> <abstract>Compositions by the members of New York Women Composers</abstract> <keyword>music publishing</keyword> <keyword>scores</keyword> <keyword>women composers</keyword> <keyword>New York</keyword> </cataloging_info> <last_updated>July 28, 1999</last_updated> <copyright>1999 New York Women Composers</copyright> <maintainer email="elharo@metalab.unc.edu" url="http://www.macfaq.com/personal.html"> <name> <first_name>Elliotte</first_name> <middle_name>Rusty</middle_name> <last_name>Harold</last_name> </name> </maintainer> <composer id="c1"> <name> <first_name>Julie</first_name> <middle_name></middle_name> <last_name>Mandel</last_name> </name> </composer> <composer id="c2"> <name> <first_name>Margaret</first_name> <middle_name>De</middle_name> <last_name>Wys</last_name> </name> </composer> <composer id="c3"> <name> <first_name>Beth</first_name> <middle_name></middle_name> <last_name>Anderson</last_name> </name> </composer> <composer id="c4"> <name> <first_name>Linda</first_name> <middle_name></middle_name> <last_name>Bouchard</last_name> </name> </composer> <composition composer="c1"> <title>Trio for Flute, Viola and Harp</title> <date><year>(1994)</year></date> <length>13'38"</length> <instruments>fl, hp, vla</instruments> <description> <p>Premiered at Queens College in April, 1996 by Sue Ann Kahn, Christine Ims, and Susan Jolles. In 3 movements :</p> <ul> <li>mvt. 1: 5:01</li> <li>mvt. 2: 4:11</li> <li>mvt. 3: 4:26</li> </ul> </description> <publisher>Theodore Presser</publisher> </composition> <composition composer="c2"> <title>Charmonium</title> <date><year>(1991)</year></date> <length>9'</length> <instruments>2 vln, vla, vc</instruments> <description> Commissioned as quartet for the Meridian String Quartet. Sonorous, bold. Moderate difficulty. Tape available. 
</description> <publisher></publisher> </composition> <composition composer="c1"> <title>Invention for Flute and Piano</title> <date><year>(1994)</year></date> <length></length> <instruments>fl, pn</instruments> <description>3 movements</description> <publisher></publisher> </composition> <composition composer="c3"> <title>Little Trio</title> <date><year>(1984)</year></date> <length>4'</length> <instruments>fl, guit, va</instruments> <description></description> <publisher>ACA</publisher> </composition> <composition composer="c3"> <title>Dr. Blood's Mermaid Lullaby</title> <date><year>(1980)</year></date> <length>3'</length> <instruments>fl or ob, or vn, or vc, pn</instruments> <description></description> <publisher>ACA</publisher> </composition> <composition composer="c3"> <title>Trio: Dream in D</title> <date><year>(1980)</year></date> <length>10'</length> <instruments>fl, pn, vc, or vn, pn, vc</instruments> <description> Rhapsodic. Passionate. Available on CD <cite><a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid%3D913265342/sr%3D1-2/">Two by Three</a></cite> from North/South Consonance (1998). </description> <publisher></publisher> </composition> <composition composer="c4"> <title>Propos II</title> <date><year>(1985)</year></date> <length>11'</length> <instruments>2 tpt</instruments> <description>Arrangement from Propos</description> <publisher></publisher> </composition> <composition composer="c4"> <title>Rictus En Mirroir</title> <date><year>(1985)</year></date> <length>14'</length> <instruments>fl, ob, hpschd, vc</instruments> <description></description> <publisher></publisher> </composition> </catalog>
Each element must be declared in a <!ELEMENT> declaration.
A <!ELEMENT> declaration gives the name and content model of the element
The content model uses a simple regular expression-like grammar to precisely specify what is and isn't allowed in an element
ANY
#PCDATA
Sequences
Choices
Mixed Content
Modifiers
EMPTY
<!ELEMENT catalog ANY>
A catalog can contain any child element and/or raw text (parsed character data)
Parsed Character Data; i.e. raw text, no markup. For example,
<year>1984</year>
<!ELEMENT year (#PCDATA)>
Valid:
<year>1999</year>
<year>99</year>
<year>1999 C.E.</year>
<year>
The year of our Lord one thousand, nine hundred, and ninety-nine
</year>
Invalid:
<year>
<month>January</month>
<month>February</month>
<month>March</month>
<month>April</month>
<month>May</month>
<month>June</month>
<month>July</month>
<month>August</month>
<month>September</month>
<month>October</month>
<month>November</month>
<month>December</month>
</year>
<!ELEMENT category (#PCDATA)>
<!ELEMENT abstract (#PCDATA)>
<!ELEMENT keyword (#PCDATA)>
<!ELEMENT last_updated (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT middle_name (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT instruments (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT length (#PCDATA)>
DTDs seem as obfuscated as C.
Comments can improve this by giving example elements
Comments are the same as in HTML; e.g. <!-- Comment -->
<!-- e.g. "1999 New York Women Composers",
not "Copyright 1999 New York Women Composers" -->
<!ELEMENT copyright (#PCDATA)>
<date><year>1994</year></date>
To declare that a date element must have a year child:
<!ELEMENT date (year)>
You only have to declare the immediate children
<maintainer email="elharo@metalab.unc.edu"
url="http://www.macfaq.com/personal.html">
<name>
<first_name>Elliotte</first_name>
<middle_name>Rusty</middle_name>
<last_name>Harold</last_name>
</name>
</maintainer>
<composer id="c1">
<name>
<first_name>Julie</first_name>
<middle_name></middle_name>
<last_name>Mandel</last_name>
</name>
</composer>
To declare that a maintainer element must have a name child:
<!ELEMENT maintainer (name)>
<!ELEMENT composer (name)>
<name>
<first_name>Elliotte</first_name>
<middle_name>Rusty</middle_name>
<last_name>Harold</last_name>
</name>
Separate multiple required child elements with commas; e.g.
<!ELEMENT name (first_name, middle_name, last_name)>
A list of child elements separated by commas is called a sequence
The element being described must have only child elements, no mixed content
You must know the order of the child elements
You must know the type of each child element
You must know the number of child elements
The number can be relaxed with wild cards
<cataloging_info>
<abstract>Compositions by the members of New York Women Composers</abstract>
<keyword>music publishing</keyword>
<keyword>scores</keyword>
<keyword>women composers</keyword>
<keyword>New York</keyword>
</cataloging_info>
The + suffix indicates that one or more of that element is required at that point
<!ELEMENT cataloging_info (abstract, keyword+)>
The * suffix indicates that zero, one, or more of that element is allowed at that point
<!ELEMENT catalog (category, cataloging_info, last_updated, copyright,
maintainer, composer*, composition*)>
<composition composer="c1">
<title>Trio for Flute, Viola and Harp</title>
<date><year>1994</year></date>
<length>13'38"</length>
<instruments>fl, hp, vla</instruments>
<description>
<p>Premiered at Queens College in April, 1996 by Sue Ann Kahn,
Christine Ims, and Susan Jolles. In 3 movements :</p>
<ul>
<li>mvt. 1: 5:01</li>
<li>mvt. 2: 4:11</li>
<li>mvt. 3: 4:26</li>
</ul>
</description>
<publisher>Theodore Presser</publisher>
</composition>
Suffixing an element name with a question mark (?) in the content model indicates that either 0 or 1 (but not more than one) of that element is expected at that position
<!ELEMENT composition
(title, date, length?, instruments, description?, publisher?)>
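Because content models use a regular expression-like grammar, a flat sequence with +, *, and ? suffixes can be translated almost mechanically into an ordinary regular expression over the names of an element's children. The following Python sketch does this for comma-separated sequences only. It is a toy illustration of the idea, not a validating parser, and the function names are invented for this example. It does not handle choices, nested groups, or mixed content.

```python
import re
import xml.etree.ElementTree as ET


def content_model_to_regex(model):
    """Translate a flat sequence such as "(a, b+, c?)" into a compiled regex.

    Each child name is matched together with one trailing space, so the
    regex can be applied to the children's names joined as "a b b c ".
    """
    parts = [part.strip() for part in model.strip().strip("()").split(",")]
    pieces = []
    for part in parts:
        suffix = part[-1] if part[-1] in "+*?" else ""
        name = part.rstrip("+*?")
        pieces.append(f"(?:{re.escape(name)} ){suffix}")
    return re.compile("".join(pieces))


def children_match(element, model):
    """Return True if the element's children satisfy the sequence model."""
    names = "".join(child.tag + " " for child in element)
    return content_model_to_regex(model).fullmatch(names) is not None


COMPOSITION_MODEL = "(title, date, length?, instruments, description?, publisher?)"

complete = ET.fromstring(
    "<composition><title>Brass Swale</title><date>1988</date>"
    "<length>5\"</length><instruments>tbn</instruments>"
    "<description>Tonal.</description><publisher>ACA</publisher></composition>"
)
minimal = ET.fromstring(
    "<composition><title>Little Trio</title><date>1984</date>"
    "<instruments>fl, guit, va</instruments></composition>"
)
untitled = ET.fromstring(
    "<composition><date>1984</date><instruments>fl</instruments></composition>"
)

print(children_match(complete, COMPOSITION_MODEL))
print(children_match(minimal, COMPOSITION_MODEL))
print(children_match(untitled, COMPOSITION_MODEL))
```

The optional children (length, description, publisher) can be dropped without breaking the match, while a missing title or a reordered child makes it fail.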
A choice indicates one element or another but not both
A choice is signified by a vertical bar |
There can be two or more elements in a choice
<!ELEMENT date (year | ISODate)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT ISODate (#PCDATA)>
<!ELEMENT description (#PCDATA | ul | a | cite | p)*>
<!ELEMENT cite (#PCDATA | a)*>
<!ELEMENT ul (li*)>
<!ELEMENT li (#PCDATA)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT p (#PCDATA)>
Mixed content is both #PCDATA and child elements in a choice, followed by an asterisk
Should be avoided where possible
This is the only way to combine PCDATA with child elements in a content model
#PCDATA must come first
#PCDATA cannot be used in a sequence
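Mixed content is also more awkward to process than pure element content, which is one practical reason to avoid it where possible. The following Python sketch shows how the text of a description element ends up split around its child elements in ElementTree. The href value is a placeholder, not the URL used in the slides.

```python
import xml.etree.ElementTree as ET

# The href below is a hypothetical placeholder.
DESCRIPTION_XML = (
    "<description>Rhapsodic. Passionate. Available on CD "
    '<cite><a href="two-by-three.html">Two by Three</a></cite>'
    " from North/South Consonance (1998).</description>"
)

description = ET.fromstring(DESCRIPTION_XML)
cite = description.find("cite")

# Text before the first child belongs to the parent's .text ...
leading_text = description.text
# ... while text after a child is stored on that child's .tail.
trailing_text = cite.tail

# itertext() walks the whole subtree and yields every text piece in order.
full_text = "".join(description.itertext())

print(repr(leading_text))
print(repr(trailing_text))
print(full_text)
```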
Mixed content with other content models
Exactly one element of a given type but in any position (The SGML & operator)
Between M and N of a given element
Restrictions on the PCDATA; e.g. that the year element must contain a four-digit year
Recall this element:
<maintainer email="elharo@metalab.unc.edu"
url="http://www.macfaq.com/personal.html">
<name>
<first_name>Elliotte</first_name>
<middle_name>Rusty</middle_name>
<last_name>Harold</last_name>
</name>
</maintainer>
It is declared like this:
<!ELEMENT maintainer (name)>
<!ATTLIST maintainer email CDATA "webmaster@nywc.org">
<!ATTLIST maintainer url CDATA "http://www.ibiblio.org/nywc">
The general format of an <!ATTLIST> declaration is:
<!ATTLIST Element_name Attribute_name Type Default_value>
The two maintainer attributes can also be declared in a single <!ATTLIST> declaration like this:
<!ATTLIST maintainer email
CDATA "webmaster@nywc.org" url CDATA "http://www.ibiblio.org/nywc/">
This is more obvious with better indentation:
<!ATTLIST maintainer email CDATA "webmaster@nywc.org"
url CDATA "http://www.ibiblio.org/nywc/">
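A sketch of what the literal defaults mean for a document. The name element is simplified to #PCDATA for brevity, which is an assumption of this example.

```xml
<?xml version="1.0"?>
<!DOCTYPE maintainer [
  <!ELEMENT maintainer (name)>
  <!-- Assumption: name reduced to plain text for this sketch -->
  <!ELEMENT name (#PCDATA)>
  <!ATTLIST maintainer email CDATA "webmaster@nywc.org"
                       url   CDATA "http://www.ibiblio.org/nywc/">
]>
<!-- email is given explicitly and overrides the default.
     url is omitted, so a parser that reads the DTD reports
     url="http://www.ibiblio.org/nywc/" for this element. -->
<maintainer email="elharo@metalab.unc.edu">
  <name>Elliotte Rusty Harold</name>
</maintainer>
```

A parser that does not read the DTD will see no url attribute at all, so defaults are best reserved for values a consumer can live without.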
The default value in an <!ATTLIST> declaration is either:
A literal string value
One of these three keywords:
#REQUIRED
#IMPLIED
#FIXED
#REQUIRED attributes
No default value is provided in the DTD
Document authors must provide an attribute value for each instance of the element
<!ELEMENT maintainer (name)>
<!ATTLIST maintainer email CDATA #REQUIRED
url CDATA #REQUIRED>
#IMPLIED attributes
No default value in the DTD
Author may (but does not have to) provide a value with each instance of the element
<!ELEMENT a (#PCDATA)>
<!ATTLIST a href CDATA #IMPLIED>
#FIXED attributes
Value is the same for all instances of the element
The value must be provided in the DTD
Document author may not change the value
<!ELEMENT maintainer (name)>
<!ATTLIST maintainer email CDATA #FIXED "webmaster@nywc.org"
url CDATA #REQUIRED>
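A short instance written against this declaration, with name again simplified to #PCDATA as an assumption of the sketch:

```xml
<?xml version="1.0"?>
<!DOCTYPE maintainer [
  <!ELEMENT maintainer (name)>
  <!-- Assumption: name reduced to plain text for this sketch -->
  <!ELEMENT name (#PCDATA)>
  <!ATTLIST maintainer email CDATA #FIXED "webmaster@nywc.org"
                       url   CDATA #REQUIRED>
]>
<!-- url must be present because it is #REQUIRED.
     email may be omitted; if it is written out, it must be
     exactly webmaster@nywc.org or the document is invalid. -->
<maintainer url="http://www.macfaq.com/personal.html">
  <name>Elliotte Rusty Harold</name>
</maintainer>
```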
CDATA
ID
IDREF
IDREFS
ENTITY
ENTITIES
NOTATION
NMTOKEN
NMTOKENS
Enumerated
CDATA attributes
Most general attribute type
Value can be any string of text not containing a raw less-than sign (<), a raw ampersand (&), or the quotation mark used to delimit the value
<!ATTLIST maintainer email CDATA #REQUIRED
url CDATA #IMPLIED>
ID attributes
Value must be an XML name
May include letters, digits, underscores, hyphens, and periods, but may not begin with a digit, hyphen, or period
May not include whitespace
The attribute itself may or may not be named "id" or "ID"
May contain colons only if used for namespaces
Value must be unique among all ID type attributes in the document
Must be declared #REQUIRED or #IMPLIED; generally #REQUIRED
<!ELEMENT composer (name)>
<!ATTLIST composer id ID #REQUIRED>
IDREF attributes
Value matches the ID of an element in the same document
Used for links and the like
Multiple elements may share the same IDREF value
<!ELEMENT composition (title, date, length?,
instruments, description?, publisher?)>
<!ATTLIST composition composer IDREF #REQUIRED>
IDREFS attributes
A list of ID values in the same document
Separated by white space
<!ELEMENT composition (title, date, length?,
instruments, description?, publisher?)>
<!ATTLIST composition composer IDREFS #REQUIRED>
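A sketch of ID and IDREFS working together. The catalog content model, the reduced composition content model, and all names and titles are assumptions made to keep the example short.

```xml
<?xml version="1.0"?>
<!DOCTYPE catalog [
  <!-- Assumption: simplified content models for this sketch -->
  <!ELEMENT catalog (composer | composition)*>
  <!ELEMENT composer (name)>
  <!ELEMENT name (#PCDATA)>
  <!ATTLIST composer id ID #REQUIRED>
  <!ELEMENT composition (title)>
  <!ELEMENT title (#PCDATA)>
  <!ATTLIST composition composer IDREFS #REQUIRED>
]>
<catalog>
  <composer id="c1"><name>First Hypothetical Composer</name></composer>
  <composer id="c2"><name>Second Hypothetical Composer</name></composer>
  <!-- A single ID is a legal IDREFS value -->
  <composition composer="c1"><title>Solo Work</title></composition>
  <!-- Two IDs separated by white space -->
  <composition composer="c1 c2"><title>Joint Work</title></composition>
</catalog>
```

A validating parser rejects the document if two composer elements share an id, or if a composer attribute names an ID that no element carries.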
<!ELEMENT category (#PCDATA)>
<!ELEMENT abstract (#PCDATA)>
<!ELEMENT keyword (#PCDATA)>
<!ELEMENT last_updated (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT instruments (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT length (#PCDATA)>
<!ELEMENT date (year | ISODate)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT ISODate (#PCDATA)>
<!ELEMENT catalog (category, cataloging_info, last_updated,
copyright, maintainer, (composer | composition)*)>
<!ELEMENT cataloging_info (abstract, keyword+)>
<!ELEMENT description (#PCDATA | ul | a | cite | p)*>
<!ELEMENT cite (#PCDATA | a)*>
<!ELEMENT ul (li*)>
<!ELEMENT li (#PCDATA)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT p (#PCDATA)>
<!ELEMENT maintainer (name)>
<!ELEMENT name (first_name, middle_name, last_name)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT middle_name (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>
<!ATTLIST maintainer email CDATA #REQUIRED
url CDATA #IMPLIED>
<!ELEMENT composer (name)>
<!ATTLIST composer id ID #REQUIRED>
<!ELEMENT composition (title, date, length?,
instruments, description?, publisher?)>
<!ATTLIST composition composer IDREFS #REQUIRED>
<!ATTLIST a href CDATA #REQUIRED>
Namespaces serve two purposes:
To distinguish between elements and attributes from different vocabularies with different meanings.
To group all related elements and attributes together so that a parser can easily recognize them.
The XLink specification defines an attribute with the name href.
The XHTML specification also uses href attributes on some elements.
And the XInclude specification uses href attributes.
An XSLT style sheet that will transform XHTML documents containing both Scalable Vector Graphics (SVG) pictures and MathML equations into XSL Formatting Objects (XSL-FO) documents.
The a, title, script, style, and font elements in XHTML and SVG
The table element in XHTML and XSL-FO
The text element in XSLT and SVG
The set element in MathML and SVG
An XSLT style sheet that transforms a style sheet written for an older version of the XSLT specification into a style sheet for a newer version of the XSLT specification.
Namespaces disambiguate elements with the same name from each other by attaching different prefixes to names from different XML applications.
Each prefix is associated with a URI.
Names whose prefixes are associated with the same URI are in the same namespace.
Names whose prefixes are associated with different URIs are in different namespaces.
Elements and attributes that are placed in a namespace by a prefix have names that contain exactly one colon. They look like this:
rdf:description
xlink:type
xsl:template
Everything before the colon is called the prefix.
Everything after the colon is called the local part.
The complete name including the colon is called the qualified name.
Each prefix in a qualified name is associated with a URI.
For example, all XSLT 1.0 elements are associated with the http://www.w3.org/1999/XSL/Transform URI.
The customary prefix xsl is a shorthand for the longer URI http://www.w3.org/1999/XSL/Transform.
You can't use the URI in the element name directly; a name like this is not a legal XML name:
{http://www.w3.org/1999/XSL/Transform}template
Prefixes are bound to namespace URIs by attaching an xmlns:prefix attribute to the prefixed element or one of its ancestors.
<svg:svg xmlns:svg="http://www.w3.org/2000/svg"
width="12cm" height="10cm">
<svg:ellipse rx="110" ry="130" />
<svg:rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg:svg>
Bindings have scope within the element where they're declared.
An SVG processor can recognize all three of these elements as SVG elements because they all have prefixes bound to the particular URI defined by the SVG specification.
<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:xlink="http://www.w3.org/1999/xlink">
<xhtml:head><xhtml:title>Three Namespaces</xhtml:title></xhtml:head>
<xhtml:body>
<xhtml:h1 align="center">An Ellipse and a Rectangle</xhtml:h1>
<svg:svg xmlns:svg="http://www.w3.org/2000/svg"
width="12cm" height="10cm">
<svg:ellipse rx="110" ry="130" />
<svg:rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg:svg>
<xhtml:p xlink:type="simple"
xlink:href="ellipses.html">
More about ellipses
</xhtml:p>
<xhtml:p xlink:type="simple" xlink:href="rectangles.html">
More about rectangles
</xhtml:p>
<xhtml:hr/>
<xhtml:p>Last Modified February 13, 2000</xhtml:p>
</xhtml:body>
</xhtml:html>
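A namespace binding can also be supplied by a #FIXED attribute default in the DTD, so that instances may omit the xmlns:svg attribute:
<!ATTLIST svg:svg xmlns:svg CDATA
#FIXED "http://www.w3.org/2000/svg">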
<svg:svg width="12cm" height="10cm">
<svg:ellipse rx="110" ry="130" />
<svg:rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg:svg>
Unprefixed attributes are never in any namespace.
Being an attribute of an element in the http://www.w3.org/1999/xhtml namespace is not sufficient to put the attribute in the http://www.w3.org/1999/xhtml namespace.
The only way an attribute belongs to a namespace is if it has a declared prefix, like xlink:type and xlink:href.
Many XML applications have recommended prefixes. For example, SVG elements often use the prefix svg and Resource Description Framework (RDF) elements often have the prefix rdf. However, these prefixes are simply conventions, and can be changed based on necessity, convenience or whim.
Before a prefix can be used, it must be bound to a URI.
The URIs are standardized; the prefixes are not.
The prefix can change as long as the URI stays the same.
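As a quick illustration, the SVG example can be rewritten with a different prefix. The prefix pic is arbitrary and chosen only for this sketch; because it is bound to the same URI, a namespace-aware processor treats these elements as the same SVG elements as before.

```xml
<?xml version="1.0"?>
<!-- pic is a hypothetical prefix; only the namespace URI identifies SVG -->
<pic:svg xmlns:pic="http://www.w3.org/2000/svg"
         width="12cm" height="10cm">
  <pic:ellipse rx="110" ry="130" />
  <pic:rect x="4cm" y="1cm" width="3cm" height="6cm" />
</pic:svg>
```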
Namespace URIs are purely formal identifiers
They can point somewhere but do not have to
Parsers compare namespace URIs character by character, with no normalization. These are three different namespaces:
http://www.w3.org/1999/XSL/Transform
http://www.w3.org/1999/XSL/Transform/
http://www.w3.org/1999/XSL/Transform/index.html
Indicate that an unprefixed element and all its unprefixed descendant elements belong to a particular namespace by attaching an xmlns attribute with no prefix:
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:xlink="http://www.w3.org/1999/xlink">
<head><title>Three Namespaces</title></head>
<body>
<h1 align="center">An Ellipse and a Rectangle</h1>
<svg:svg xmlns:svg="http://www.w3.org/2000/svg"
width="12cm" height="10cm">
<svg:ellipse rx="110" ry="130" />
<svg:rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg:svg>
<p xlink:type="simple"
xlink:href="ellipses.html">
More about ellipses
</p>
<p xlink:type="simple" xlink:href="rectangles.html">
More about rectangles
</p>
<hr/>
<p>Last Modified February 13, 2000</p>
</body>
</html>
The html, p, and other non-prefixed elements are in the http://www.w3.org/1999/xhtml namespace.
Default namespaces only apply to elements, not to attributes.
Thus in the above example the align attribute of the h1 element is not in any namespace.
You can change the default namespace within a particular element by adding an xmlns attribute to the element.
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:xlink="http://www.w3.org/1999/xlink">
<head><title>Three Namespaces</title></head>
<body>
<h1 align="center">An Ellipse and a Rectangle</h1>
<svg xmlns="http://www.w3.org/2000/svg"
width="12cm" height="10cm">
<ellipse rx="110" ry="130" />
<rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg>
<p xlink:type="simple" xlink:href="ellipses.html">
More about ellipses
</p>
<p xlink:type="simple" xlink:href="rectangles.html">
More about rectangles
</p>
<hr/>
<p>Last Modified February 13, 2000</p>
</body>
</html>
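The default namespace can likewise be fixed in the DTD, so that instances may omit the xmlns attribute:
<!ATTLIST svg xmlns CDATA
#FIXED "http://www.w3.org/2000/svg">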
<svg width="12cm" height="10cm">
<ellipse rx="110" ry="130" />
<rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg>
DTDs must declare the qualified names
<!ELEMENT svg:text (#PCDATA)>
If the prefix changes, the DTD needs to change too.
Parameter entity references can help when the prefix changes or is removed:
<!ENTITY % mathml-colon ''>
<!ENTITY % mathml-prefix ''>
<!ENTITY % mathml-exp '%mathml-prefix;%mathml-colon;exp' >
<!ENTITY % mathml-abs '%mathml-prefix;%mathml-colon;abs' >
<!ENTITY % mathml-arg '%mathml-prefix;%mathml-colon;arg' >
<!ENTITY % mathml-real '%mathml-prefix;%mathml-colon;real' >
<!ENTITY % mathml-imaginary '%mathml-prefix;%mathml-colon;imaginary' >
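The entity declarations above only define the names; a sketch of how they might be used and overridden follows. It is a fragment of an external DTD rather than a standalone document, because parameter entity references inside declarations are only allowed in the external subset. The EMPTY content models and the mml prefix are assumptions of this example.

```xml
<!-- Hypothetical continuation of the external DTD shown above:
     element names are written as parameter entity references -->
<!ELEMENT %mathml-exp; EMPTY>
<!ELEMENT %mathml-abs; EMPTY>

<!-- To switch to prefixed names, a document redeclares two entities
     in its internal DTD subset. The internal subset is read first
     and the first declaration of an entity wins, so these override
     the empty defaults:
       <!ENTITY % mathml-prefix 'mml'>
       <!ENTITY % mathml-colon ':'>
     With that override, %mathml-exp; expands to mml:exp. -->
```

The design keeps every qualified name in one place, so a prefix change touches two entity declarations instead of every element and attribute declaration.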
This presentation: http://www.ibiblio.org/xml/slides/oreillyjava2001/fundamentals/
XML in a Nutshell
Elliotte Rusty Harold and W. Scott Means
O'Reilly & Associates, 2001
ISBN 0-596-00058-8
XPath: http://www.oreilly.com/catalog/xmlnut/chapter/ch09.html
XML Bible, second edition
Elliotte Rusty Harold
Hungry Minds, 2001
ISBN 0-7645-4760-7
XLinks: http://www.ibiblio.org/xml/books/bible/updates/16.html
XPointers: http://www.ibiblio.org/xml/books/bible/updates/17.html