Extensible Markup Language
A syntax for documents
A Meta-Markup Language
A Structural and Semantic language, not a formatting language
Not just for Web pages
Not like HTML, troff, LaTeX
Make up the tags you need as you need them
The tags you create can be documented in a Document Type Definition (DTD)
A meta-syntax for domain-specific markup languages like MusicML, MathML, and CML
XML documents form a tree
Element and attribute names reflect the kind of the element
Formatting can be added with a style sheet
<dt>Hot Cop
<dd> by Jacques Morali, Henri Belolo, and Victor Willis
<ul>
<li>Producer: Jacques Morali
<li>Publisher: PolyGram Records
<li>Length: 6:20
<li>Written: 1978
<li>Artist: Village People
</ul>
<SONG>
<TITLE>Hot Cop</TITLE>
<COMPOSER>Jacques Morali</COMPOSER>
<COMPOSER>Henri Belolo</COMPOSER>
<COMPOSER>Victor Willis</COMPOSER>
<PRODUCER>Jacques Morali</PRODUCER>
<PUBLISHER>PolyGram Records</PUBLISHER>
<LENGTH>6:20</LENGTH>
<YEAR>1978</YEAR>
<ARTIST>Village People</ARTIST>
</SONG>
Plain ASCII or UTF-8 text
.xml is the standard file extension
Any standard text editor will work
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<SONG>
<TITLE>Hot Cop</TITLE>
<COMPOSER>Jacques Morali</COMPOSER>
<COMPOSER>Henri Belolo</COMPOSER>
<COMPOSER>Victor Willis</COMPOSER>
<PRODUCER>Jacques Morali</PRODUCER>
<PUBLISHER>PolyGram Records</PUBLISHER>
<LENGTH>6:20</LENGTH>
<YEAR>1978</YEAR>
<ARTIST>Village People</ARTIST>
</SONG>
version attribute: required; always has the value 1.0
standalone attribute: yes or no
encoding attribute: UTF-8, ISO-8859-1, etc.
<TITLE>Hot Cop</TITLE>
Start tag <TITLE>
Contents "Hot Cop" which is character data
End tag </TITLE>
<SONG>
<TITLE>Hot Cop</TITLE>
<COMPOSER>Jacques Morali</COMPOSER>
<COMPOSER>Henri Belolo</COMPOSER>
<COMPOSER>Victor Willis</COMPOSER>
<PRODUCER>Jacques Morali</PRODUCER>
<PUBLISHER>PolyGram Records</PUBLISHER>
<LENGTH>6:20</LENGTH>
<YEAR>1978</YEAR>
<ARTIST>Village People</ARTIST>
</SONG>
Start tag <SONG>
Contents are child elements TITLE, PUBLISHER, ARTIST, etc.
End tag </SONG>
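The start tag, contents, and end tag structure can be inspected from a program. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module (not part of the original slides), that parses the SONG document and lists its child elements:

```python
import xml.etree.ElementTree as ET

# The SONG document from the slides, held in a string for illustration.
SONG_XML = """<SONG>
<TITLE>Hot Cop</TITLE>
<COMPOSER>Jacques Morali</COMPOSER>
<COMPOSER>Henri Belolo</COMPOSER>
<COMPOSER>Victor Willis</COMPOSER>
<PRODUCER>Jacques Morali</PRODUCER>
<PUBLISHER>PolyGram Records</PUBLISHER>
<LENGTH>6:20</LENGTH>
<YEAR>1978</YEAR>
<ARTIST>Village People</ARTIST>
</SONG>"""

song = ET.fromstring(SONG_XML)            # root element of the tree
child_names = [child.tag for child in song]  # names of the child elements
title = song.findtext("TITLE")            # character data inside <TITLE>
composers = [c.text for c in song.findall("COMPOSER")]

print(song.tag, child_names)
print(title, composers)
```

The root element behaves like a list of its children, which mirrors the tree structure described earlier.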
SONG {display: block; font-family: New York, Times New Roman, serif}
TITLE {display: block; font-size: 24pt; font-weight: bold;
font-family: Helvetica, sans-serif}
COMPOSER {display: block}
PRODUCER {display: block}
YEAR {display: block}
PUBLISHER {display: block}
LENGTH {display: block}
ARTIST {display: block; font-style: italic}
<?xml-stylesheet type="text/css" href="song1.css"?>
<?xml-stylesheet type="text/css" href="song.css"?>
<SONG>
<TITLE>Hot Cop</TITLE>
<COMPOSER>Jacques Morali</COMPOSER>
<COMPOSER>Henri Belolo</COMPOSER>
<COMPOSER>Victor Willis</COMPOSER>
<PRODUCER>Jacques Morali</PRODUCER>
<PUBLISHER>PolyGram Records</PUBLISHER>
<LENGTH>6:20</LENGTH>
<YEAR>1978</YEAR>
<ARTIST>Village People</ARTIST>
</SONG>
Cascading Style Sheets Level 1 (CSS1)
Internet Explorer 5.0
Mozilla 5.0
Cascading Style Sheets Level 2 (CSS2)
Internet Explorer 5 (partial)
Mozilla 5.0 (partial)
Extensible Stylesheet Language (XSL)
Internet Explorer 5.0 (older draft, buggy)
LotusXSL, XT, Other non-browser converters
Document Style Semantics and Specification Language (DSSSL)
Jade
Domain-Specific Markup Languages
Self-Describing Data
Interchange of Data Among Applications
Non-proprietary format
Don't pay for what you don't use
Much data is lost due to format problems
XML is very simple
XML is self-describing
XML is well documented
<PERSON ID="p1100" SEX="M">
<NAME>
<GIVEN>Judson</GIVEN>
<SURNAME>McDaniel</SURNAME>
</NAME>
<BIRTH>
<DATE>21 Feb 1834</DATE>
</BIRTH>
<DEATH>
<DATE>9 Dec 1905</DATE>
</DEATH>
</PERSON>
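Because the element and attribute names describe the data, a program can pull out fields by name without knowing anything else about the format. A small sketch, assuming Python's standard xml.etree.ElementTree module; the dictionary keys are illustrative choices, not part of the original:

```python
import xml.etree.ElementTree as ET

PERSON_XML = """<PERSON ID="p1100" SEX="M">
<NAME>
<GIVEN>Judson</GIVEN>
<SURNAME>McDaniel</SURNAME>
</NAME>
<BIRTH>
<DATE>21 Feb 1834</DATE>
</BIRTH>
<DEATH>
<DATE>9 Dec 1905</DATE>
</DEATH>
</PERSON>"""

person = ET.fromstring(PERSON_XML)

# Attributes are read with get(); nested elements with a simple path.
record = {
    "id": person.get("ID"),
    "sex": person.get("SEX"),
    "given": person.findtext("NAME/GIVEN"),
    "surname": person.findtext("NAME/SURNAME"),
    "born": person.findtext("BIRTH/DATE"),
    "died": person.findtext("DEATH/DATE"),
}

print(record)
```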
E-commerce
Syndication
EAI and EDI
A specific markup language that uses the XML meta-syntax is called an XML application
Different XML applications have their own more constricted syntaxes and vocabularies within the broader XML syntax
Further syntax can be layered on top of this; e.g. data typing through schemas
Web Pages
Mathematical Equations
Music Notation
Vector Graphics
Metadata
and more...
<?xml version="1.0"?>
<html xmlns="http://www.w3.org/TR/REC-html40"
xmlns:m="http://www.w3.org/TR/REC-MathML/" >
<head>
<title>Fiat Lux</title>
<meta name="GENERATOR" content="amaya V1.3b" />
</head>
<body>
<P> And God said, </P>
<math>
<m:mrow>
<m:msub> <m:mi>δ</m:mi> <m:mi>α</m:mi> </m:msub>
<m:msup> <m:mi>F</m:mi> <m:mi>αβ</m:mi> </m:msup>
<m:mi></m:mi>
<m:mo>=</m:mo>
<m:mi></m:mi>
<m:mfrac> <m:mrow> <m:mn>4</m:mn> <m:mi>π</m:mi> </m:mrow> <m:mi>c</m:mi> </m:mfrac>
<m:mi></m:mi>
<m:msup> <m:mi>J</m:mi> <m:mrow> <m:mi>β</m:mi> <m:mo></m:mo> </m:mrow> </m:msup>
</m:mrow>
</math>
<P> and there was light </P>
</body>
</html>
<?xml version="1.0"?>
<CHANNEL HREF="http://metalab.unc.edu/xml/index.html">
<TITLE>Cafe con Leche</TITLE>
<ITEM HREF="http://metalab.unc.edu/xml/books.html">
<TITLE>Books about XML</TITLE>
</ITEM>
<ITEM HREF="http://metalab.unc.edu/xml/tradeshows.html">
<TITLE>Trade shows and conferences about XML</TITLE>
</ITEM>
<ITEM HREF="http://metalab.unc.edu/xml/lists.htm">
<TITLE>Mailing Lists dedicated to XML</TITLE>
</ITEM>
</CHANNEL>
Joseph Conrad's Heart of Darkness
Vector Markup Language (VML)
Internet Explorer 5.0
Microsoft Office 2000
Scalable Vector Graphics (SVG)
Meta-data
Dublin Core
Better Web searching
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/DC/">
<rdf:Description about="http://metalab.unc.edu/xml/">
<dc:Creator>Elliotte Rusty Harold</dc:Creator>
<dc:Title>Cafe con Leche</dc:Title>
</rdf:Description>
</rdf:RDF>
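The rdf: and dc: prefixes are only shorthand for the namespace URIs they are bound to. A minimal sketch, assuming Python's standard xml.etree.ElementTree module, of reading the Dublin Core fields by namespace; the NS mapping is a local convenience, not something the document requires:

```python
import xml.etree.ElementTree as ET

RDF_XML = """<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/DC/">
<rdf:Description about="http://metalab.unc.edu/xml/">
<dc:Creator>Elliotte Rusty Harold</dc:Creator>
<dc:Title>Cafe con Leche</dc:Title>
</rdf:Description>
</rdf:RDF>"""

# Prefix-to-URI map used only for lookups in this script.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/DC/",
}

root = ET.fromstring(RDF_XML)
description = root.find("rdf:Description", NS)
about = description.get("about")                      # unprefixed attribute
creator = description.findtext("dc:Creator", namespaces=NS)
title = description.findtext("dc:Title", namespaces=NS)

print(about, creator, title)
```

Note that the parser stores the full URI, not the prefix, so a document that bound the same URIs to different prefixes would yield the same results.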
XSL: The Extensible Stylesheet Language
XLinks: The Extensible Linking Language
XSL Transformations
XSL Formatting Objects
Any element can be a link
Links can be bi-directional
Links can be separated from the documents they connect
<footnote xlink:type="simple" xlink:href="footnote7.xml">7</footnote>
Microsoft Office 2000
Netscape What's Related
Examine the data
Design a vocabulary for the data
Write a style sheet
XML documents are trees.
XML elements contain other elements as well as text
Within these limits there's more than one way to organize the data
Hierarchically
Relationally
Objects
The catalog?
A custom Document element?
Choose catalog for the root element
Everything else will be a descendant of catalog
This is not the only possible choice
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
Everything else will go here...
</catalog>
Composers?
Songs/Compositions?
Categories?
All of the Above?
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<category>
Small chamber ensembles - 2-4 Players by New York Women Composers
</category>
</catalog>
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog><category>Small chamber ensembles
- 2-4 Players by New York Women Composers</category></catalog>
Each composer has a name
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<category>
Small chamber ensembles - 2-4 Players by New York Women Composers
</category>
<composer>
<name>Julie Mandel</name>
</composer>
<composer>
<name>Margaret De Wys</name>
</composer>
<composer>
<name>Beth Anderson</name>
</composer>
<composer>
<name>Linda Bouchard</name>
</composer>
</catalog>
It's better for sorting to divide names into first, middle, and last
Some elements (e.g. middle name) may be empty
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<category>
Small chamber ensembles - 2-4 Players by New York Women Composers
</category>
<composer>
<name>
<first_name>Julie</first_name>
<middle_name></middle_name>
<last_name>Mandel</last_name>
</name>
</composer>
<composer>
<name>
<first_name>Margaret</first_name>
<middle_name>De</middle_name>
<last_name>Wys</last_name>
</name>
</composer>
<composer>
<name>
<first_name>Beth</first_name>
<middle_name></middle_name>
<last_name>Anderson</last_name>
</name>
</composer>
<composer>
<name>
<first_name>Linda</first_name>
<middle_name></middle_name>
<last_name>Bouchard</last_name>
</name>
</composer>
</catalog>
Some people have the same names
Use an ID number to disambiguate
Store the ID number in an id attribute
name=value
An element may not have two attributes with the same name
Attribute values must be quoted
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<category>
Small chamber ensembles - 2-4 Players by New York Women Composers
</category>
<composer id="c1">
<name>
<first_name>Julie</first_name>
<middle_name></middle_name>
<last_name>Mandel</last_name>
</name>
</composer>
<composer id="c2">
<name>
<first_name>Margaret</first_name>
<middle_name>De</middle_name>
<last_name>Wys</last_name>
</name>
</composer>
<composer id="c3">
<name>
<first_name>Beth</first_name>
<middle_name></middle_name>
<last_name>Anderson</last_name>
</name>
</composer>
<composer id="c4">
<name>
<first_name>Linda</first_name>
<middle_name></middle_name>
<last_name>Bouchard</last_name>
</name>
</composer>
</catalog>
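The id attribute gives each composer a handle that a program can use for lookups, and a parser enforces the two attribute rules above. A sketch assuming Python's standard xml.etree.ElementTree module; the is_well_formed helper is a hypothetical name introduced here, and the catalog is abbreviated to two composers:

```python
import xml.etree.ElementTree as ET

CATALOG_XML = """<catalog>
<composer id="c1">
<name><first_name>Julie</first_name><middle_name></middle_name><last_name>Mandel</last_name></name>
</composer>
<composer id="c3">
<name><first_name>Beth</first_name><middle_name></middle_name><last_name>Anderson</last_name></name>
</composer>
</catalog>"""

catalog = ET.fromstring(CATALOG_XML)

# Index the composers by their id attribute.
by_id = {c.get("id"): c for c in catalog.findall("composer")}
last_name = by_id["c3"].findtext("name/last_name")

def is_well_formed(text):
    """Return True if the parser accepts text as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

duplicate_attribute = '<composer id="c1" id="c2"/>'  # same name used twice
unquoted_value = '<composer id=c1/>'                 # value not quoted

print(last_name)
print(is_well_formed(duplicate_attribute), is_well_formed(unquoted_value))
```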
Attributes are for meta-data; elements are for data
Does the reader want to see the information? If yes, use element content; if no, use attributes
Attributes are good for ID numbers, URLs, references, and other information not directly relevant to the reader
Attributes can't hold structure well.
Elements allow you to include meta-meta-data (information about the information about the information).
Not everyone always agrees on what is and isn't meta-data.
Elements are more extensible in the face of future changes.
Let's look at an example of what we want:
Rendered HTML:
Tonal. Commissioned/Premiered by the Redlands' New Music Ensemble. (A swale is a meadow or a marsh where a lot of wild plants grow together. The composer discovered the word when a horse named Swale won the Kentucky Derby several years ago. Since her work is primarily collage of newly composed musical swatches, she has used the name extensively.) ACA - American Composers Alliance
Or in HTML:
<dt><cite>Brass Swale</cite> (1988) 5", tbn, 2 Bfl tpts, bar. hn</dt>
<dd><p>
Tonal. Commissioned/Premiered by the Redlands' New Music
Ensemble. (A swale is a meadow or a marsh where a lot of
wild plants grow together. The composer discovered the word
when a horse named Swale won the Kentucky Derby several
years ago. Since her work is primarily collage of newly
composed musical swatches, she has used the name
extensively.) ACA - American Composers
Alliance</p>
</dd>
Title
Date
Description
List of instruments
Length
Publisher
Some pieces may be missing from some compositions
<composition>
<title>Brass Swale</title>
<date>1988</date>
<length>5"</length>
<instruments>tbn, 2 Bfl tpts, bar, hn</instruments>
<description>
Tonal. Commissioned/Premiered by the Redlands' New Music
Ensemble. (A swale is a meadow or a marsh where a lot of
wild plants grow together. The composer discovered the word
when a horse named Swale won the Kentucky Derby several
years ago. Since her work is primarily collage of newly
composed musical swatches, she has used the name
extensively.)
</description>
<publisher>ACA - American Composers Alliance</publisher>
</composition>
<composition>
<title>Trio for Flute, Viola and Harp</title>
<date><year>1994</year></date>
<length>13'38"</length>
<instruments>fl, hp, vla</instruments>
<description>
<p>Premiered at Queens College in April, 1996 by Sue Ann Kahn,
Christine Ims, and Susan Jolles. In 3 movements :</p>
<ul>
<li>mvt. 1: 5:01</li>
<li>mvt. 2: 4:11</li>
<li>mvt. 3: 4:26</li>
</ul>
</description>
<publisher>Theodore Presser</publisher>
</composition>
<composition composer="c3">
<title>Trio: Dream in D</title>
<date><year>1980</year></date>
<length>10'</length>
<instruments>fl, pn, vc, or vn, pn, vc</instruments>
<description>
Rhapsodic. Passionate. Available on CD
<cite><a href="http://www.amazon.com/exec/obidos/ASIN/B000007NMH/qid=913265342/sr%3D1-2/">
Two by Three</a></cite> from North/South Consonance (1998).
</description>
<publisher></publisher>
</composition>
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<category>
Small chamber ensembles - 2-4 Players by New York Women Composers
</category>
<cataloging_info>
<abstract>Compositions by the members of New York Women Composers</abstract>
<keyword>music publishing</keyword>
<keyword>scores</keyword>
<keyword>women composers</keyword>
<keyword>New York</keyword>
</cataloging_info>
<composer id="c1">
<name>
<first_name>Julie</first_name>
<middle_name></middle_name>
<last_name>Mandel</last_name>
</name>
</composer>
...
</catalog>
Copyright notice
Name of maintainer
Email address of maintainer
Last modified date
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
...
<last_updated>July 28, 1999</last_updated>
<copyright>1999 New York Women Composers</copyright>
<maintainer email="elharo@metalab.unc.edu"
url="http://www.macfaq.com/personal.html">
<name>
<first_name>Elliotte</first_name>
<middle_name>Rusty</middle_name>
<last_name>Harold</last_name>
</name>
</maintainer>
</catalog>
Partially supported by Mozilla and IE 5.0
Full W3C Recommendation
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/css" href="compositions1.css"?>
<catalog>
...
</catalog>
Not every element needs a rule
The root element should be at least display: block
catalog { font-family: New York, Times New Roman, serif;
font-size: 14pt;
background-color: white;
color: black;
display: block }
Make it look like an H1 heading
category { display: block;
font-family: Helvetica, Arial, sans-serif;
font-size: 32pt;
font-weight: bold;
text-align: center}
catalog { font-family: New York, Times New Roman, serif;
font-size: 14pt;
background-color: white;
color: black;
display: block }
Make it look like a level 2 head
No need to style the first, middle, and last names separately
composer { display: block;
font-family: Helvetica, Arial, sans-serif;
font-size: 24pt;
font-weight: bold;
text-align: left}
composition title { display: block;
font-family: Helvetica, Arial, sans-serif;
font-size: 18pt;
font-weight: bold;
text-align: left}
/* cataloging_info is only for search engines */
cataloging_info { display: none;
color: white}
display: none
requires CSS2:
<last_updated>July 28, 1999</last_updated>
<copyright>1999 New York Women Composers</copyright>
<maintainer email="elharo@metalab.unc.edu"
url="http://www.macfaq.com/personal.html">
<name>
<first_name>Elliotte</first_name>
<middle_name>Rusty</middle_name>
<last_name>Harold</last_name>
</name>
</maintainer>
last_updated, copyright, maintainer {display: block;
font-size: small}
copyright:before {content: "Copyright " }
last_updated:before {content: "Last Modified " }
last_updated {margin-top: 2ex }
Again, some of this requires CSS2
composition * {display:list-item}
description {display: block}
category { display: block;
font-family: Helvetica, Arial, sans-serif;
font-size: 32pt;
font-weight: bold;
text-align: center}
catalog { font-family: New York, Times New Roman, serif;
font-size: 14pt;
background-color: white;
color: black;
display: block }
composer { display: block;
font-family: Helvetica, Arial, sans-serif;
font-size: 24pt;
font-weight: bold;
text-align: left}
composition title { display: block;
font-family: Helvetica, Arial, sans-serif;
font-size: 18pt;
font-weight: bold;
text-align: left}
composition * {display:list-item}
description {display: block}
/* cataloging_info is only for search engines */
cataloging_info { display: none;
color: #FFFFFF}
last_updated, copyright, maintainer {display: block;
font-size: small}
copyright:before {content: "Copyright " }
last_updated:before {content: "Last Modified " }
last_updated {margin-top: 2ex }
Should be able to match composers with compositions
Should be able to sort composers and compositions by name
Should be able to include data from attributes; e.g. the maintainer's email address
Horizontal rules would be nice
Better header (e.g. title and meta tags) would be nice
CSS Level 3?
XSL
XSL + JavaScript
CSS has broader support
CSS is more stable
XSL is much more powerful
XSL can be used without browser support by transforming to HTML on the server side
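To make the server-side idea concrete, here is a rough sketch of the kind of transformation an XSL style sheet would perform, written in plain Python with the standard xml.etree.ElementTree module standing in for an XSLT processor. It matches compositions to composers, sorts by name, pulls in an attribute value, and adds a horizontal rule. The abbreviated catalog, the catalog_to_html and full_name helpers, and the composer attribute on Brass Swale are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Abbreviated catalog; composer="c3" on Brass Swale is assumed for the example.
CATALOG_XML = """<catalog>
<composer id="c1">
<name><first_name>Julie</first_name><middle_name></middle_name><last_name>Mandel</last_name></name>
</composer>
<composer id="c3">
<name><first_name>Beth</first_name><middle_name></middle_name><last_name>Anderson</last_name></name>
</composer>
<composition composer="c3"><title>Trio: Dream in D</title></composition>
<composition composer="c3"><title>Brass Swale</title></composition>
<maintainer email="elharo@metalab.unc.edu"/>
</catalog>"""

def full_name(composer):
    """Join the non-empty name parts of a composer element."""
    parts = [composer.findtext("name/" + part, default="")
             for part in ("first_name", "middle_name", "last_name")]
    return " ".join(p for p in parts if p)

def catalog_to_html(xml_text):
    catalog = ET.fromstring(xml_text)
    # Group composition titles by the composer id they reference.
    works = {}
    for composition in catalog.findall("composition"):
        title = composition.findtext("title", default="")
        works.setdefault(composition.get("composer"), []).append(title)
    lines = ["<html><body>"]
    # Sort composers by last name, then list each one's sorted works.
    composers = sorted(catalog.findall("composer"),
                       key=lambda c: c.findtext("name/last_name", default=""))
    for composer in composers:
        lines.append("<h2>%s</h2>" % escape(full_name(composer)))
        titles = sorted(works.get(composer.get("id"), []))
        if titles:
            lines.append("<ul>")
            lines.extend("<li>%s</li>" % escape(t) for t in titles)
            lines.append("</ul>")
    # Data from an attribute, which CSS alone could not display.
    maintainer = catalog.find("maintainer")
    if maintainer is not None:
        lines.append("<hr/>")
        lines.append("<p>Maintainer: %s</p>" % escape(maintainer.get("email", "")))
    lines.append("</body></html>")
    return "\n".join(lines)

html_page = catalog_to_html(CATALOG_XML)
print(html_page)
```

The output is ordinary HTML, so it can be served to browsers that know nothing about XML or XSL.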
Open and close all tags
Empty tags end with />
There is a unique root element
Elements may not overlap
Attribute values are quoted
< and & are only used to start tags and entities
Only the five predefined entity references are used
Plus more...
Good:
<p>The quick brown fox jumped over the lazy dog</p>
<li>A very <B>important</B> point</li>
Copyright 1999 Elliotte Rusty Harold<br></br>
Bad:
The quick brown fox jumped over the lazy dog<p>
<li>A very <B>important point
Copyright 1999 Elliotte Rusty Harold<br>
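A parser will confirm these verdicts. The sketch below assumes Python's standard xml.etree.ElementTree module; because the snippets are fragments, each is wrapped in a made-up doc root element before parsing, and is_well_formed is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Wrap a fragment in a root element and report whether it parses."""
    try:
        ET.fromstring("<doc>" + fragment + "</doc>")
        return True
    except ET.ParseError:
        return False

good = [
    "<p>The quick brown fox jumped over the lazy dog</p>",
    "<li>A very <B>important</B> point</li>",
    "Copyright 1999 Elliotte Rusty Harold<br></br>",
]
bad = [
    "The quick brown fox jumped over the lazy dog<p>",
    "<li>A very <B>important point",
    "Copyright 1999 Elliotte Rusty Harold<br>",
]

good_results = [is_well_formed(s) for s in good]
bad_results = [is_well_formed(s) for s in bad]
print(good_results, bad_results)
```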
<BR/>, <HR/>, and <IMG/> instead of <BR>, <HR>, and <IMG>
Web browsers deal inconsistently with these
Can use <BR></BR>, <HR></HR>, and <IMG></IMG> instead
One element completely contains all other elements of the document
This is HTML in HTML files
The XML declaration and xml-stylesheet processing instruction are not elements
If an element contains a start tag for an element, it must also contain the corresponding end tag
Empty elements may appear anywhere
Every non-root element has a parent element
Good:
<A HREF="http://metalab.unc.edu/xml/">
<DIV ALIGN="CENTER">
<A HREF="http://metalab.unc.edu/xml/">
<EMBED SRC="minnesotaswale.aif" hidden="true">
Bad:
<A HREF=http://metalab.unc.edu/xml/>
<DIV ALIGN=CENTER>
<EMBED SRC=minnesotaswale.aif hidden=true>
<EMBED SRC="minnesotaswale.aif" hidden>
Good:
<H1>O'Reilly &amp; Associates</H1>
Bad:
<H1>O'Reilly & Associates</H1>
Good:
<CODE>for (int i = 0; i &lt;= args.length; i++ ) { </CODE>
Bad:
<CODE>for (int i = 0; i <= args.length; i++ ) { </CODE>
Good:
&amp;
&lt;
&gt;
&quot;
&apos;
Bad:
&copy;
&reg;
&tm;
&alpha;
&eacute;
etc.
Entity references must end with a semicolon.
&lt; is good
&lt is bad
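When text is generated by a program, the escaping can be done mechanically. A minimal sketch, assuming Python's standard xml.sax.saxutils.escape function and xml.etree.ElementTree parser; the parses helper is a hypothetical name:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_code = "for (int i = 0; i <= args.length; i++ ) {"
raw_name = "O'Reilly & Associates"

# escape() replaces &, <, and > with predefined entity references.
escaped_code = escape(raw_code)
escaped_name = escape(raw_name)

# After parsing, the original characters come back unchanged.
code_element = ET.fromstring("<CODE>" + escaped_code + "</CODE>")
name_element = ET.fromstring("<H1>" + escaped_name + "</H1>")

def parses(text):
    """Return True if text is accepted as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(escaped_code)
print(escaped_name)
print(parses("<p>&copy;</p>"), parses("<p>&lt</p>"))
```

The last line checks two of the bad cases: an entity reference that is not one of the five predefined ones and is not declared, and a reference missing its semicolon.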
Java works best
C, Perl, Python etc. can also be used
Unicode support is the biggest issue
Event based
Programs can plug in different parsers
Represents document as a tree of nodes
Loads entire document into memory at one time
Works for HTML too
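The difference between the event-based and tree-based styles is easiest to see side by side. The sketch below uses Python's standard xml.sax and xml.dom.minidom modules as stand-ins for the two approaches; the ComposerHandler class is a hypothetical name, and the SONG document is shortened:

```python
import xml.sax
from xml.dom import minidom

SONG_XML = ("<SONG><TITLE>Hot Cop</TITLE>"
            "<COMPOSER>Jacques Morali</COMPOSER>"
            "<COMPOSER>Henri Belolo</COMPOSER></SONG>")

class ComposerHandler(xml.sax.ContentHandler):
    """Event-based: the parser calls these methods as it reads the input."""

    def __init__(self):
        super().__init__()
        self.composers = []
        self._in_composer = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "COMPOSER":
            self._in_composer = True
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect them.
        if self._in_composer:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "COMPOSER":
            self.composers.append("".join(self._chunks))
            self._in_composer = False

handler = ComposerHandler()
xml.sax.parseString(SONG_XML.encode("utf-8"), handler)
sax_composers = handler.composers

# Tree-based: the whole document is loaded into memory as nodes first.
document = minidom.parseString(SONG_XML)
dom_composers = [node.firstChild.data
                 for node in document.getElementsByTagName("COMPOSER")]

print(sax_composers)
print(dom_composers)
```

The event-based version never holds more than one composer's text at a time, while the tree-based version trades memory for the ability to revisit any node.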
The XML Bible
Elliotte Rusty Harold
IDG Books, 1999
ISBN: 0-7645-3236-7
This presentation: http://metalab.unc.edu/xml/slides/sd2000east/basics/