Part I: XML Infoset, Canonical XML, and Digital Signatures
Part II: JDOM
Part III: XML Base and XInclude
Part IV: Schemas
Part V: XLinks
Part VI: XPointers
Part VII: The Oracle Speaks
The Infoset is the unfortunate standard to which those in retreat from the radical and most useful implications of well-formedness have rallied. At its core the Infoset insists that there is 'more' to XML than the straightforward syntax of well-formedness. By imposing its canonical semantics the Infoset obviates the infinite other semantic outcomes which might be elaborated in particular unique circumstances from an instance of well-formed XML 1.0 syntax. The question we should be asking is not whether the Infoset has chosen the correct canonical semantics, but whether the syntactic possibilities of XML 1.0 should be curtailed in this way at all.--Walter Perry on the xml-dev mailing list
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/css" href="song.css"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG xmlns="http://metalab.unc.edu/xml/namespace/song"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO
    xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
    ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <!-- The publisher is actually Polygram but I needed
       an example of a general entity reference. -->
  <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
<!-- You can tell what album I was
     listening to when I wrote this example -->
<?xml-stylesheet type="text/css" href="song.css"?>
<SONG xmlns="http://metalab.unc.edu/xml/namespace/song" xmlns:xlink="http://www.w3.org/1999/xlink">
  <TITLE>Hot Cop</TITLE>
  <PHOTO ALT="Victor Willis in Cop Outfit" HEIGHT="200" WIDTH="100" xlink:href="hotcop.jpg" xlink:show="onLoad" xlink:type="simple"></PHOTO>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER xlink:href="http://www.amrecords.com/" xlink:type="simple">
    A &amp; M Records
  </PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMHotCop {

  public static void main(String[] args) {

    DOMParser parser = new DOMParser();
    try {
      // Parse the document and build the complete tree in memory
      parser.parse("http://metalab.unc.edu/xml/examples/hot_cop.xml");
      // The Document object is the object form of the XML document
      Document d = parser.getDocument();
    }
    catch (SAXException e) {
      System.err.println(e);
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}
The customary form of an XML document
The canonical form of an XML document
The object form of an XML document
A W3C proposed standard for what is and is not significant in an XML document
Not everyone agrees that this is a good thing, or that this is the right list!
The Document Information Item
Element Information Items
Attribute Information Items
Processing instruction Information Items
Reference to Skipped Entity Information Items
Character Information Items
Comment Information Items
The Document Type Declaration Information Item
Entity Declaration Information Items
Notation Information Items
Entity Start Marker Information Items
Entity End Marker Information Items
CDATA Start Marker Information Items
CDATA End Marker Information Items
Namespace Declaration Information Items
Represents the entire document; not just the root element
Properties:
Children
One Element Information Item for the root element
One Comment Information Item for each Comment
One Processing Instruction Information Item for each Processing Instruction
Notation Declarations
Entity Declarations
Base URI
Standalone Declaration
Version Declaration
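The point that the document is more than its root element can be checked with a small sketch. It uses the JAXP DOM parser bundled with the JDK rather than the Infoset itself (DOM is only one possible object model for these information items), and the class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of any specification or library.
public class DocumentChildren {

  // Returns the node names of the document's direct children in document order.
  public static List<String> topLevelNodeNames(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
    List<String> names = new ArrayList<>();
    for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
      names.add(n.getNodeName());
    }
    return names;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<?xml-stylesheet type=\"text/css\" href=\"song.css\"?>"
               + "<SONG><TITLE>Hot Cop</TITLE></SONG>"
               + "<!-- trailing comment -->";
    // The processing instruction and the comment are siblings of the root element
    System.out.println(topLevelNodeNames(xml));
  }

}
```

The list contains three entries: the processing instruction target, the root element name, and the DOM placeholder name for a comment.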
<PHOTO
xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
<COMPOSER>Jacques Morali</COMPOSER>
<COMPOSER>
<PERSON>
<NAME>
<FIRST>Henri</FIRST>
<LAST>Belolo</LAST>
</NAME>
</PERSON>
</COMPOSER>
<rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#">
<rdf:Description xmlns:dc="http://purl.org/dc/"
about="http://www.ibiblio.org/examples/impressionists.xml">
<dc:title> Impressionist Paintings </dc:title>
<dc:creator> Elliotte Rusty Harold </dc:creator>
<dc:description>
A list of famous impressionist paintings organized
by painter and date
</dc:description>
<dc:date>2000-08-22</dc:date>
</rdf:Description>
</rdf:RDF>
An Element Information Item Includes:
namespace name
local name
children: a list of element, processing instruction, reference to skipped entity, character, and comment information items, one for each element, processing instruction, reference to an unprocessed external entity, data character, and comment appearing immediately within the current element
attributes: an unordered set of attribute information items, one for each of the attributes (specified or defaulted from the DTD) of this element. Namespace declarations (xmlns attributes) are not included.
declared namespaces: an unordered set of namespace declaration information items, one for each of the namespaces declared either in the start-tag of this element or defaulted from the DTD.
in-scope namespaces: An unordered set of namespace declaration information items, one for each of the namespaces in effect for this element
base URI: The absolute URI of the external entity in which this element appears, as defined in XML Base. If this is not known, this property is null.
parent
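As a rough illustration of the namespace name and local name properties, here is a sketch that reads them through the JDK's namespace-aware DOM parser. DOM is standing in for the Infoset here, and the class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
public class ElementNames {

  // Returns { namespace name, local name } of the root element.
  public static String[] names(String xml) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // Without this the parser does not split qualified names at all
    factory.setNamespaceAware(true);
    Element root = factory.newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)))
        .getDocumentElement();
    return new String[] { root.getNamespaceURI(), root.getLocalName() };
  }

  public static void main(String[] args) throws Exception {
    String xml = "<dc:title xmlns:dc=\"http://purl.org/dc/\">"
               + " Impressionist Paintings </dc:title>";
    String[] result = names(xml);
    System.out.println("namespace name: " + result[0]);
    System.out.println("local name:     " + result[1]);
  }

}
```

The prefix dc does not appear in either property; only the URI it is bound to and the part of the name after the colon do.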
xlink:type="simple"
xlink:href="http://www.amrecords.com/"
xlink:type = "simple"
xlink:show = "onLoad"
xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit"
WIDTH=" 100 "
HEIGHT=' 200 '
An Attribute Information Item Includes:
namespace name
local name
normalized value
children: An ordered list of character information items, one for each character appearing in the normalized attribute value
specified: A flag indicating whether this attribute was actually specified in the start-tag of its element, or was defaulted from the DTD
default: An ordered list of character information items, one for each character appearing in the default value specified for this attribute in the DTD, if any.
attribute type:
ID
IDREF
IDREFS
ENTITY
ENTITIES
NMTOKEN
NMTOKENS
NOTATION
CDATA
ENUMERATED
owner element
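The normalized value property can be sketched with the JDK's DOM parser. No DTD is read in this example, so every attribute is treated as CDATA: literal tabs and line breaks in the value become spaces, but leading and trailing spaces are kept. The class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
public class AttributeValues {

  // Returns the normalized value of the named attribute of the root element.
  public static String attribute(String xml, String name) throws Exception {
    Element root = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)))
        .getDocumentElement();
    return root.getAttribute(name);
  }

  public static void main(String[] args) throws Exception {
    // The ALT value contains a literal tab character
    String xml = "<PHOTO WIDTH=\" 100 \" ALT='Victor\tWillis'/>";
    System.out.println("[" + attribute(xml, "WIDTH") + "]");
    System.out.println("[" + attribute(xml, "ALT") + "]");
  }

}
```

If a DTD declared WIDTH with a tokenized type such as NMTOKEN, the surrounding spaces would be trimmed as well. That case is not shown here.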
<!-- The publisher is actually Polygram but I needed
an example of a general entity reference. -->
<!-- <PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
A & M Records
</PUBLISHER>
<LENGTH>6:20</LENGTH>
<YEAR>1978</YEAR>
<ARTIST>Village People</ARTIST>
</SONG> -->
<!-- You can tell what album I was
listening to when I wrote this example -->
A Comment Information Item includes:
content
parent
<?robots index="yes" follow="no"?>
<?php
mysql_connect("database.unc.edu", "clerk", "password");
$result = mysql("CYNW", "SELECT LastName, FirstName FROM Employees
ORDER BY LastName, FirstName");
$i = 0;
while ($i < mysql_numrows ($result)) {
$fields = mysql_fetch_row($result);
echo "<person>$fields[1] $fields[0] </person>\r\n";
$i++;
}
mysql_close();
?>
A Processing Instruction Information Item includes:
target
content
base URI
parent
A character is one Unicode character in the content of an element, attribute value, comment or processing instruction data.
A Character Information Item includes:
xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#"
xmlns:dc="http://purl.org/dc/"
xmlns="http://www.w3.org/Graphics/SVG/SVG-19991203.dtd"
A Namespace Declaration Information Item includes:
prefix
children: An ordered list of character information items making up the URI in the attribute value.
owner element
<!DOCTYPE SONG SYSTEM "song.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
A Document Type Declaration Information Item includes:
<!ELEMENT SONG (TITLE, PHOTO?, COMPOSER+, PRODUCER*,
 PUBLISHER*, LENGTH?, YEAR?, ARTIST+)>
<!ATTLIST SONG xmlns       CDATA #REQUIRED
               xmlns:xlink CDATA #REQUIRED>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT PHOTO EMPTY>
<!ATTLIST PHOTO xlink:type CDATA #FIXED "simple"
                xlink:href CDATA #REQUIRED
                xlink:show CDATA #IMPLIED
                ALT        CDATA #REQUIRED
                WIDTH      CDATA #REQUIRED
                HEIGHT     CDATA #REQUIRED
>
<!ELEMENT COMPOSER (#PCDATA)>
<!ELEMENT PRODUCER (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ATTLIST PUBLISHER xlink:type CDATA #IMPLIED
                    xlink:href CDATA #IMPLIED
>
<!ELEMENT LENGTH (#PCDATA)>
<!-- This should be a four digit year like "1999",
     not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>
<!ELEMENT ARTIST (#PCDATA)>
Comments and processing instructions in the DTD are reported as children of the Document Type Declaration information item
Notation and general entity declarations are reported as properties of the Document information item
Attribute types and default values are reported on the actual attributes in the document instance.
Everything else is not reported!
An XML document is made up of one or more physical storage units called entities
Entity references:
Parsed internal general entity references like &amp;
Parsed external general entity references
Unparsed external general entity references
External parameter entity references
Internal parameter entity references
Reading an XML document is not the same thing as reading an XML file
The file contains entity references.
The document contains the entities' replacement text.
When you use a parser to read a document you'll get the text including characters like <. You will not see the entity references.
Entities are resolved when the document is parsed.
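A minimal sketch of the difference between the file and the document, using the JDK's DOM parser (the class name is invented for illustration): the string handed to the parser contains the entity reference, but the text that comes back contains only the replacement character.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
public class ResolvedEntities {

  // Returns the text content of the root element after parsing.
  public static String rootText(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
    return doc.getDocumentElement().getTextContent();
  }

  public static void main(String[] args) throws Exception {
    // What is in the file: an entity reference
    String file = "<PUBLISHER>A &amp; M Records</PUBLISHER>";
    // What is in the document: the replacement text
    System.out.println(rootText(file));
  }

}
```

The program prints the publisher's name with a plain ampersand; nothing in the parsed text says whether the file used a predefined entity, a character reference, or a CDATA section.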
An entity start marker information item is reported immediately before each entity's replacement text.
An entity end marker information item is reported immediately after each entity's replacement text.
Each entity marker information item includes
entity
parent
Each entity declaration information item includes
entity type:
internal general entity
external general entity
unparsed entity
document entity
external DTD subset
name
system identifier
public identifier
base URI
notation
content: The replacement text of the entity, if it is an internal entity. In all other cases, this is null.
charset
The internal and external DTD subsets; especially ELEMENT and ATTLIST declarations
Document encoding
Whether an empty element uses two tags or one
What kind of quotes surround attributes
Insignificant white space in attributes
White space that occurs between attributes
Attribute order
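Several items on this list can be seen from Java. The sketch below parses two files that differ in quote style, attribute order, white space between attributes, and empty-element syntax, then compares them with DOM's isEqualNode(). DOM equality is not defined by the Infoset; it simply ignores the same lexical details, which makes it a convenient stand-in. The class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
public class LexicalDetails {

  private static Element parse(String xml) throws Exception {
    return DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)))
        .getDocumentElement();
  }

  // True if the two root elements are equal as DOM trees.
  public static boolean sameTree(String first, String second) throws Exception {
    return parse(first).isEqualNode(parse(second));
  }

  public static void main(String[] args) throws Exception {
    String oneTag  = "<PHOTO WIDTH='100' HEIGHT=\"200\"/>";
    String twoTags = "<PHOTO   HEIGHT=\"200\"   WIDTH=\"100\"></PHOTO>";
    System.out.println(sameTree(oneTag, twoTags));
  }

}
```

Changing an attribute value, by contrast, does change the tree, so the comparison then fails.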
A W3C proposed standard serialization format of an XML document instance
Not everyone agrees that this is a good thing, or that this is the right format! It's totally unsuitable for editors and validation.
Based on the XPath data model
Not really InfoSet compatible
Something of this nature is nonetheless clearly needed for non-XML-aware tools like digital signatures, change management, hash functions, and the like.
The document is encoded in UTF-8
Line breaks are normalized to a linefeed (ASCII 10, \n)
Attribute values are normalized, as if by a validating processor
Character and parsed entity references are replaced
CDATA sections are replaced with their character content
The XML and document type declarations are removed
Empty elements are converted to start-end tag pairs
White space outside of the document element and within start and end tags is normalized
All white space in character content is retained (except for characters removed during linefeed normalization)
Attribute value delimiters are set to double quotes
Special characters in attribute values and character content are replaced by character references
Superfluous namespace declarations are removed from each element
Default attributes are added to each element
Lexicographic order is imposed on the namespace declarations and attributes of each element
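A few of these rules are easy to sketch by hand. The toy serializer below is not a conforming Canonical XML implementation: it assumes a document with no namespaces, drops comments and processing instructions, and handles only attribute ordering, double-quote delimiters, start-end tag pairs, CDATA replacement, and escaping of the most common special characters. All names in it are invented for illustration.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Toy illustration of a subset of the rules; NOT real Canonical XML.
public class CanonicalSketch {

  public static String canonicalize(String xml) throws Exception {
    Element root = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)))
        .getDocumentElement();
    StringBuilder out = new StringBuilder();
    write(root, out);
    return out.toString();
  }

  private static void write(Node node, StringBuilder out) {
    short type = node.getNodeType();
    if (type == Node.ELEMENT_NODE) {
      out.append('<').append(node.getNodeName());
      // A TreeMap imposes lexicographic order on the attribute names
      Map<String, String> sorted = new TreeMap<>();
      NamedNodeMap attributes = node.getAttributes();
      for (int i = 0; i < attributes.getLength(); i++) {
        Node a = attributes.item(i);
        sorted.put(a.getNodeName(), a.getNodeValue());
      }
      for (Map.Entry<String, String> a : sorted.entrySet()) {
        // Values are always delimited by double quotes
        out.append(' ').append(a.getKey()).append("=\"")
           .append(escape(a.getValue(), true)).append('"');
      }
      out.append('>');
      for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
        write(c, out);
      }
      // Empty elements still get an explicit end-tag
      out.append("</").append(node.getNodeName()).append('>');
    }
    else if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
      // CDATA sections are replaced with their (escaped) character content
      out.append(escape(node.getNodeValue(), false));
    }
    // Comments and processing instructions are skipped in this sketch
  }

  private static String escape(String s, boolean inAttribute) {
    s = s.replace("&", "&amp;").replace("<", "&lt;");
    return inAttribute ? s.replace("\"", "&quot;") : s.replace(">", "&gt;");
  }

  public static void main(String[] args) throws Exception {
    System.out.println(
        canonicalize("<PHOTO WIDTH='100' HEIGHT=\"200\" ALT='Cop'/>"));
  }

}
```

Two files that differ only in the details covered by the sketch produce identical strings, which is the property a digital signature or hash function needs.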
XML InfoSet Specification: http://www.w3.org/TR/xml-infoset
Canonical XML Specification: http://www.w3.org/TR/xml-c14n
XML Signature Specification: http://www.w3.org/TR/xmldsig-core/
There is no compelling reason for a Java API to manipulate XML to be complex, tricky, unintuitive, or a pain in the neck.--JDOM Mission Statement
Writing XML with JDOM
Reading XML through JDOM
The JDOM Classes
An XML document is a tree.
It has a root.
It has nodes.
It is amenable to recursive processing.
Not all applications agree on what the root is.
Not all applications agree on what is and isn't a node.
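Recursive processing of the tree can be sketched in a few lines. This example uses the DOM parser bundled with the JDK rather than JDOM, and the class and method names are made up for illustration. It starts at the document node, which in DOM sits above the root element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
public class TreeWalker {

  // Counts the element nodes in the subtree rooted at the given node.
  public static int countElements(Node node) {
    int count = (node.getNodeType() == Node.ELEMENT_NODE) ? 1 : 0;
    // Recurse into each child; text, comments and PIs simply add zero
    for (Node child = node.getFirstChild(); child != null;
         child = child.getNextSibling()) {
      count += countElements(child);
    }
    return count;
  }

  public static int countElements(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
    return countElements(doc);
  }

  public static void main(String[] args) throws Exception {
    String xml = "<SONG><TITLE>Hot Cop</TITLE>"
               + "<COMPOSER>Jacques Morali</COMPOSER></SONG>";
    System.out.println(countElements(xml));
  }

}
```

Whether text, comments, or attributes count as nodes depends on the API; here the text and comment children are visited but attributes are not.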
You need a JDK.
You need some free class libraries.
You need a text editor.
You need some data to process.
A Pure Java API for reading and writing XML Documents
A Java-oriented API for reading and writing XML Documents
A tree-oriented API for reading and writing XML Documents
A parser independent API for reading and writing XML Documents
Created by Brett McLaughlin and Jason Hunter. (James Duncan Davidson is an unindicted coconspirator.)
Open source with an Apache-like license
1.0 Beta 5 is the current tarball, from October 7, 2000
The last three weeks have added some functionality to the API
This presentation is based on the October 26, 2000 CVS version
org.jdom
org.jdom.input
org.jdom.output
org.jdom.adapters
The classes that represent an XML document and its parts
Attribute
Comment
DocType
Document
Element
Entity
ProcessingInstruction
plus Verifier
plus assorted exceptions
Classes for reading a document into memory from a file or other source
DOMBuilder
SAXBuilder
The classes for writing a document to a file or other target
XMLOutputter
SAXOutputter (under development)
DOMOutputter (under development)
Classes for hooking up JDOM to DOM implementations:
AbstractDOMAdapter
OracleV1DOMAdapter
OracleV2DOMAdapter
ProjectXDOMAdapter
XercesDOMAdapter
You rarely need to access these directly.
JDOM is for both input and output
New documents can be read from a stream or constructed in memory
An org.jdom.output.XMLOutputter sends a document from memory to an OutputStream or Writer
A JDOM document can also be sent to a SAX ContentHandler or DOM org.w3c.dom.Document for further processing with a different API
<?xml version="1.0"?>
<GREETING>
  Hello JDOM!
</GREETING>
import org.jdom.*;
import org.jdom.output.XMLOutputter;

public class HelloJDOM {

  public static void main(String[] args) {

    Element root = new Element("GREETING");
    root.setText("Hello JDOM!");
    Document doc = new Document(root);

    // At this point the document only exists in memory.
    // We still need to serialize it
    XMLOutputter outputter = new XMLOutputter();
    try {
      outputter.output(doc, System.out);
    }
    catch (Exception e) {
      System.err.println(e);
    }

  }

}
<?xml version="1.0" encoding="UTF-8"?><GREETING>Hello JDOM!</GREETING>
This is more or less what we wanted, modulo white space.
Here's the same program using DOM instead of JDOM. Which is simpler?
import java.io.*;
import org.w3c.dom.*;
import org.apache.xerces.dom.*;
import org.apache.xml.serialize.*;

public class HelloDOM {

  public static void main(String[] args) {

    try {

      DOMImplementationImpl impl = (DOMImplementationImpl)
       DOMImplementationImpl.getDOMImplementation();

      DocumentType type = impl.createDocumentType("GREETING", null, null);

      // type is supposed to be able to be null,
      // but in practice that didn't work
      DocumentImpl hello
       = (DocumentImpl) impl.createDocument(null, "GREETING", type);

      Element root = hello.createElement("GREETING");

      // We can't use a raw string. Instead we have to first create
      // a text node.
      Text text = hello.createTextNode("Hello DOM!");
      root.appendChild(text);

      // Now that the document is created we need to *serialize* it
      try {
        OutputFormat format = new OutputFormat(hello);
        XMLSerializer serializer = new XMLSerializer(System.out, format);
        serializer.serialize(root);
      }
      catch (IOException e) {
        System.err.println(e);
      }

    }
    catch (DOMException e) {
      e.printStackTrace();
    }

  }

}
import java.math.*;
import java.io.*;

public class FibonacciText {

  public static void main(String[] args) {

    try {
      FileOutputStream fout = new FileOutputStream("fibonacci.txt");
      OutputStreamWriter out = new OutputStreamWriter(fout, "8859_1");

      BigInteger low  = BigInteger.ZERO;
      BigInteger high = BigInteger.ONE;

      // Write f(0) through f(25), one number per line.
      // The loop header alone advances i; a second i++ in the body
      // would skip every other index.
      for (int i = 0; i <= 25; i++) {
        out.write(low.toString() + "\r\n");
        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025
Suppose we want that data in an XML document that looks something like this:
<?xml version="1.0" encoding="UTF-8"?>
<Fibonacci_Numbers>
  <fibonacci index="0">0</fibonacci>
  <fibonacci index="1">1</fibonacci>
  <fibonacci index="2">1</fibonacci>
  <fibonacci index="3">2</fibonacci>
  <fibonacci index="4">3</fibonacci>
  <fibonacci index="5">5</fibonacci>
  <fibonacci index="6">8</fibonacci>
  <fibonacci index="7">13</fibonacci>
  <fibonacci index="8">21</fibonacci>
  <fibonacci index="9">34</fibonacci>
  <fibonacci index="10">55</fibonacci>
  <fibonacci index="11">89</fibonacci>
  <fibonacci index="12">144</fibonacci>
  <fibonacci index="13">233</fibonacci>
  <fibonacci index="14">377</fibonacci>
  <fibonacci index="15">610</fibonacci>
  <fibonacci index="16">987</fibonacci>
  <fibonacci index="17">1597</fibonacci>
  <fibonacci index="18">2584</fibonacci>
  <fibonacci index="19">4181</fibonacci>
  <fibonacci index="20">6765</fibonacci>
  <fibonacci index="21">10946</fibonacci>
  <fibonacci index="22">17711</fibonacci>
  <fibonacci index="23">28657</fibonacci>
  <fibonacci index="24">46368</fibonacci>
  <fibonacci index="25">75025</fibonacci>
</Fibonacci_Numbers>
import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;

public class FibonacciJDOM {

  public static void main(String[] args) {

    Element root = new Element("Fibonacci_Numbers");

    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;

    for (int i = 0; i <= 25; i++) {
      Element fibonacci = new Element("fibonacci");
      fibonacci.addAttribute("index", String.valueOf(i));
      fibonacci.setText(low.toString());
      root.addContent(fibonacci);

      BigInteger temp = high;
      high = high.add(low);
      low = temp;
    }

    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("fibonacci.xml");
      XMLOutputter serializer = new XMLOutputter();
      serializer.output(doc, out);
      out.flush();
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}
Again, modulo white space this is correct
<?xml version="1.0" encoding="UTF-8"?><Fibonacci_Numbers><fibonacci index="0">0</fibonacci><fibonacci index="1">1</fibonacci><fibonacci index="2">1</fibonacci><fibonacci index="3">2</fibonacci><fibonacci index="4">3</fibonacci><fibonacci index="5">5</fibonacci><fibonacci index="6">8</fibonacci><fibonacci index="7">13</fibonacci><fibonacci index="8">21</fibonacci><fibonacci index="9">34</fibonacci><fibonacci index="10">55</fibonacci><fibonacci index="11">89</fibonacci><fibonacci index="12">144</fibonacci><fibonacci index="13">233</fibonacci><fibonacci index="14">377</fibonacci><fibonacci index="15">610</fibonacci><fibonacci index="16">987</fibonacci><fibonacci index="17">1597</fibonacci><fibonacci index="18">2584</fibonacci><fibonacci index="19">4181</fibonacci><fibonacci index="20">6765</fibonacci><fibonacci index="21">10946</fibonacci><fibonacci index="22">17711</fibonacci><fibonacci index="23">28657</fibonacci><fibonacci index="24">46368</fibonacci><fibonacci index="25">75025</fibonacci></Fibonacci_Numbers>
Here's the same program using DOM instead of JDOM. Which is simpler?
import java.math.*;
import java.io.*;
import org.w3c.dom.*;
import org.apache.xerces.dom.*;
import org.apache.xml.serialize.*;

public class FibonacciDOM {

  public static void main(String[] args) {

    try {

      DOMImplementationImpl impl = (DOMImplementationImpl)
       DOMImplementationImpl.getDOMImplementation();

      DocumentType type
       = impl.createDocumentType("Fibonacci_Numbers", null, null);

      // type is supposed to be able to be null,
      // but in practice that didn't work
      DocumentImpl fibonacci = (DocumentImpl)
       impl.createDocument(null, "Fibonacci_Numbers", type);

      BigInteger low  = BigInteger.ZERO;
      BigInteger high = BigInteger.ONE;

      // createDocument() has already created the root element
      // and attached it to the document, so fetch that element
      // rather than creating a second, detached one.
      Element root = fibonacci.getDocumentElement();

      for (int i = 0; i <= 25; i++) {
        Element number = fibonacci.createElement("fibonacci");
        number.setAttribute("index", Integer.toString(i));
        Text text = fibonacci.createTextNode(low.toString());
        number.appendChild(text);
        root.appendChild(number);

        BigInteger temp = high;
        high = high.add(low);
        low = temp;
      }

      try {
        // Now that the document is created we need to *serialize* it
        FileOutputStream out = new FileOutputStream("fibonacci_8859_1.xml");
        OutputFormat format = new OutputFormat(fibonacci);
        XMLSerializer serializer = new XMLSerializer(out, format);
        serializer.serialize(root);
        out.flush();
        out.close();
      }
      catch (IOException e) {
        System.err.println(e);
      }

    }
    catch (DOMException e) {
      e.printStackTrace();
    }

  }

}
62 lines vs. 42 lines
Suppose we have this DTD at the relative URL fibonacci.dtd:
<!ELEMENT Fibonacci_Numbers (fibonacci*)> <!ELEMENT fibonacci (#PCDATA)> <!ATTLIST fibonacci index CDATA #IMPLIED>
We need this DOCTYPE declaration:
<!DOCTYPE Fibonacci_Numbers SYSTEM "fibonacci.dtd">
Use the DocType class to insert a document type declaration
JDOM does not support internal DTD subsets.
JDOM does not let you output a DTD.
import java.math.*;
import java.io.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;

public class ValidFibonacci {

  public static void main(String[] args) {

    Element root = new Element("Fibonacci_Numbers");

    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;

    for (int i = 0; i <= 25; i++) {
      Element fibonacci = new Element("fibonacci");
      Attribute index = new Attribute("index", String.valueOf(i));
      fibonacci.addAttribute(index);
      fibonacci.setText(low.toString());

      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(fibonacci);
    }

    // The DocType carries the root element name and the system identifier
    DocType type = new DocType("Fibonacci_Numbers", "fibonacci.dtd");
    Document doc = new Document(root, type);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("validfibonacci.xml");
      XMLOutputter serializer = new XMLOutputter();
      serializer.output(doc, out);
      out.flush();
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Fibonacci_Numbers SYSTEM "fibonacci.dtd"><Fibonacci_Numbers><fibonacci index="0">0</fibonacci><fibonacci index="1">1</fibonacci><fibonacci index="2">1</fibonacci><fibonacci index="3">2</fibonacci><fibonacci index="4">3</fibonacci><fibonacci index="5">5</fibonacci><fibonacci index="6">8</fibonacci><fibonacci index="7">13</fibonacci><fibonacci index="8">21</fibonacci><fibonacci index="9">34</fibonacci><fibonacci index="10">55</fibonacci><fibonacci index="11">89</fibonacci><fibonacci index="12">144</fibonacci><fibonacci index="13">233</fibonacci><fibonacci index="14">377</fibonacci><fibonacci index="15">610</fibonacci><fibonacci index="16">987</fibonacci><fibonacci index="17">1597</fibonacci><fibonacci index="18">2584</fibonacci><fibonacci index="19">4181</fibonacci><fibonacci index="20">6765</fibonacci><fibonacci index="21">10946</fibonacci><fibonacci index="22">17711</fibonacci><fibonacci index="23">28657</fibonacci><fibonacci index="24">46368</fibonacci><fibonacci index="25">75025</fibonacci></Fibonacci_Numbers>
Suppose we want some MathML like this:
<?xml version="1.0" encoding="UTF-8"?>
<mathml:math xmlns:mathml="http://www.w3.org/1998/Math/MathML">
<mathml:mrow>
<mathml:mi>f(0)</mathml:mi>
<mathml:mo>=</mathml:mo>
<mathml:mn>0</mathml:mn>
</mathml:mrow>
<mathml:mrow>
<mathml:mi>f(1)</mathml:mi>
<mathml:mo>=</mathml:mo>
<mathml:mn>1</mathml:mn>
</mathml:mrow>
<mathml:mrow>
<mathml:mi>f(2)</mathml:mi>
<mathml:mo>=</mathml:mo>
<mathml:mn>1</mathml:mn>
</mathml:mrow>
</mathml:math>
Do not use the qualified names like mathml:mn.
Instead use prefixes like mathml, local names like mn, and URIs like http://www.w3.org/1998/Math/MathML to create the elements.
Do not include xmlns attributes like xmlns:mathml="http://www.w3.org/1998/Math/MathML".
XMLOutputter will decide where to put the xmlns attributes when the document is serialized.
import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;

public class PrefixedFibonacci {

  public static void main(String[] args) {

    // Each element is created from a local name, a prefix, and a URI
    Element root = new Element("math", "mathml",
     "http://www.w3.org/1998/Math/MathML");

    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;

    for (int i = 0; i <= 25; i++) {

      Element mrow = new Element("mrow", "mathml",
       "http://www.w3.org/1998/Math/MathML");

      Element mi = new Element("mi", "mathml",
       "http://www.w3.org/1998/Math/MathML");
      mi.setText("f(" + i + ")");
      mrow.addContent(mi);

      Element mo = new Element("mo", "mathml",
       "http://www.w3.org/1998/Math/MathML");
      mo.setText("=");
      mrow.addContent(mo);

      Element mn = new Element("mn", "mathml",
       "http://www.w3.org/1998/Math/MathML");
      mn.setText(low.toString());
      mrow.addContent(mn);

      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(mrow);

    }

    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("prefixed_fibonacci.xml");
      XMLOutputter serializer = new XMLOutputter();
      serializer.output(doc, out);
      out.flush();
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}
Suppose you want some MathML like this:
<?xml version="1.0" encoding="UTF-8"?>
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mrow>
<mi>f(0)</mi>
<mo>=</mo>
<mn>0</mn>
</mrow>
<mrow>
<mi>f(1)</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mrow>
<mi>f(2)</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
</math>
Do not use the local names like mn alone.
Instead use local names like mn together with URIs like http://www.w3.org/1998/Math/MathML to create the elements.
Do not include xmlns attributes like xmlns="http://www.w3.org/1998/Math/MathML".
XMLOutputter will decide where to put the xmlns attribute when the document is serialized.
import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;
import java.math.BigInteger;
import java.io.*;

public class UnprefixedFibonacci {

  public static void main(String[] args) {

    // Each element is created from a local name and a URI; no prefix
    Element root = new Element("math",
     "http://www.w3.org/1998/Math/MathML");

    BigInteger low  = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;

    for (int i = 0; i <= 25; i++) {

      Element mrow = new Element("mrow",
       "http://www.w3.org/1998/Math/MathML");

      Element mi = new Element("mi",
       "http://www.w3.org/1998/Math/MathML");
      mi.setText("f(" + i + ")");
      mrow.addContent(mi);

      Element mo = new Element("mo",
       "http://www.w3.org/1998/Math/MathML");
      mo.setText("=");
      mrow.addContent(mo);

      Element mn = new Element("mn",
       "http://www.w3.org/1998/Math/MathML");
      mn.setText(low.toString());
      mrow.addContent(mn);

      BigInteger temp = high;
      high = high.add(low);
      low = temp;
      root.addContent(mrow);

    }

    Document doc = new Document(root);
    // serialize it into a file
    try {
      FileOutputStream out = new FileOutputStream("unprefixed_fibonacci.xml");
      XMLOutputter serializer = new XMLOutputter();
      serializer.output(doc, out);
      out.flush();
      out.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }

  }

}
Surname FirstName Team Position Games Played Games Started AtBats Runs Hits Doubles Triples Home runs RBI Stolen Bases Caught Stealing Sacrifice Hits Sacrifice Flies Errors PB Walks Strike outs Hit by pitch
Anderson Garret ANA Outfield 156 151 622 62 183 41 7 15 79 8 3 3 3 6 0 29 80 1
Baughman Justin ANA Second Base 62 54 196 24 50 9 1 1 20 10 4 5 3 8 0 6 36 1
Bolick Frank ANA Third Base 21 11 45 3 7 2 0 1 2 0 0 0 0 0 0 11 8 0
Disarcina Gary ANA Shortstop 157 155 551 73 158 39 3 3 56 12 7 12 3 14 0 21 51 8
Edmonds Jim ANA Outfield 154 150 599 115 184 42 1 25 91 7 5 1 1 5 0 57 114 1
Erstad Darin ANA Outfield 133 129 537 84 159 39 3 19 82 20 6 1 3 3 0 43 77 6
Garcia Carlos ANA Second Base 19 10 35 4 5 1 0 0 0 2 0 1 0 1 0 3 11 1
Glaus Troy ANA Third Base 48 45 165 19 36 9 0 1 23 1 0 0 2 7 0 15 51 0
Greene Todd ANA Outfield 29 15 71 3 18 4 0 1 7 0 0 0 0 0 0 2 20 0
Helfand Eric ANA Catcher 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Hollins Dave ANA Third Base 101 98 363 60 88 16 2 11 39 11 3 2 2 17 0 44 69 7
Jefferies Gregg ANA Outfield 19 18 72 7 25 6 0 1 10 1 0 0 0 0 0 0 5 0
Johnson Mark ANA First Base 10 2 14 1 1 0 0 0 0 0 0 0 0 0 0 0 6 0
Kreuter Chad ANA Catcher 96 74 252 27 63 10 1 2 33 1 0 5 1 9 5 33 49 3
Martin Norberto ANA Second Base 79 50 195 20 42 2 0 1 13 3 1 3 2 4 0 6 29 0
Mashore Damon ANA Outfield 43 24 98 13 23 6 0 2 11 1 0 1 0 0 0 9 22 3
Molina Ben ANA Catcher 2 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Nevin Phil ANA Catcher 75 65 237 27 54 8 1 8 27 0 0 0 2 5 20 17 67 5
O'Brien Charlie ANA Catcher 62 58 175 13 45 9 0 4 18 0 0 3 3 4 1 10 33 2
Palmeiro Orlando ANA Outfield 74 34 165 28 53 7 2 0 21 5 4 7 0 0 0 20 11 0
Pritchett Chris ANA First Base 31 19 80 12 23 2 1 2 8 2 0 0 0 1 0 4 16 0
Salmon Tim ANA Designated Hitter 136 130 463 84 139 28 1 26 88 0 1 0 10 2 0 90 100 3
Shipley Craig ANA Third Base 77 32 147 18 38 7 1 2 17 0 4 4 1 3 0 5 22 5
Velarde Randy ANA Second Base 51 50 188 29 49 13 1 4 26 7 2 0 1 4 0 34 42 1
Walbeck Matt ANA Catcher 108 91 338 41 87 15 2 6 46 1 1 5 5 7 8 30 68 2
Williams Reggie ANA Outfield 29 7 36 7 13 1 0 1 5 3 3 1 0 0 0 7 11 1
import java.io.*;
import org.jdom.*;
import org.jdom.output.XMLOutputter;

public class BaseballTabToXML {

  public static void main(String[] args) {

    Element root = new Element("players");

    try {
      FileInputStream fin = new FileInputStream(args[0]);
      BufferedReader in = new BufferedReader(new InputStreamReader(fin));

      String playerStats;
      while ((playerStats = in.readLine()) != null) {
        String[] stats = splitLine(playerStats);

        Element player = new Element("player");

        Element first_name = new Element("first_name");
        first_name.setText(stats[1]);
        player.addContent(first_name);
        Element surname = new Element("surname");
        surname.setText(stats[0]);
        player.addContent(surname);
        Element games_played = new Element("games_played");
        games_played.setText(stats[4]);
        player.addContent(games_played);
        Element at_bats = new Element("at_bats");
        at_bats.setText(stats[6]);
        player.addContent(at_bats);
        Element runs = new Element("runs");
        runs.setText(stats[7]);
        player.addContent(runs);
        Element hits = new Element("hits");
        hits.setText(stats[8]);
        player.addContent(hits);
        Element doubles = new Element("doubles");
        doubles.setText(stats[9]);
        player.addContent(doubles);
        Element triples = new Element("triples");
        triples.setText(stats[10]);
        player.addContent(triples);
        Element home_runs = new Element("home_runs");
        home_runs.setText(stats[11]);
        player.addContent(home_runs);
        Element runs_batted_in = new Element("runs_batted_in");
        runs_batted_in.setText(stats[12]);
        player.addContent(runs_batted_in);
        Element stolen_bases = new Element("stolen_bases");
        stolen_bases.setText(stats[13]);
        player.addContent(stolen_bases);
        Element caught_stealing = new Element("caught_stealing");
        caught_stealing.setText(stats[14]);
        player.addContent(caught_stealing);
        Element sacrifice_hits = new Element("sacrifice_hits");
        sacrifice_hits.setText(stats[15]);
        player.addContent(sacrifice_hits);
        Element sacrifice_flies = new Element("sacrifice_flies");
        sacrifice_flies.setText(stats[16]);
        player.addContent(sacrifice_flies);
        Element errors = new Element("errors");
        errors.setText(stats[17]);
        player.addContent(errors);
        Element passed_by_ball = new Element("passed_by_ball");
        passed_by_ball.setText(stats[18]);
        player.addContent(passed_by_ball);
        Element walks = new Element("walks");
        walks.setText(stats[19]);
        player.addContent(walks);
        Element strike_outs = new Element("strike_outs");
        strike_outs.setText(stats[20]);
        player.addContent(strike_outs);
        Element hit_by_pitch = new Element("hit_by_pitch");
        hit_by_pitch.setText(stats[21]);
        player.addContent(hit_by_pitch);

        root.addContent(player);
      }

      Document doc = new Document(root);
      // serialize it into a file
      FileOutputStream fout = new FileOutputStream("baseballstats.xml");
      XMLOutputter serializer = new XMLOutputter();
      serializer.output(doc, fout);
      fout.flush();
      fout.close();
      in.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }
    catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("Usage: java BaseballTabToXML input_file.tab");
    }

  }

  public static String[] splitLine(String playerStats) {

    // count the number of tabs
    int numTabs = 0;
    for (int i = 0; i < playerStats.length(); i++) {
      if (playerStats.charAt(i) == '\t') numTabs++;
    }
    int numFields = numTabs + 1;
    String[] fields = new String[numFields];
    int position = 0;
    for (int i = 0; i < numFields; i++) {
      // copy characters up to, but not including, the next tab
      StringBuffer field = new StringBuffer();
      while (position < playerStats.length()
       && playerStats.charAt(position++) != '\t') {
        field.append(playerStats.charAt(position-1));
      }
      fields[i] = field.toString();
    }
    return fields;

  }

}
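The hand-written splitLine() method in the program above does by hand what String.split() has offered since Java 1.4. As a sketch for readers on a later JDK (the class name here is hypothetical), the same splitting can be written in one line. The limit of -1 matters: it keeps trailing empty fields, just as the tab-counting loop does.

```java
import java.util.Arrays;

// Hypothetical stand-alone version of the tab-splitting helper.
public class TabSplit {

  // Splits a tab-delimited record into fields, keeping empty fields.
  public static String[] splitLine(String playerStats) {
    return playerStats.split("\t", -1);
  }

  public static void main(String[] args) {
    String record = "Anderson\tGarret\tANA\tOutfield\t156";
    System.out.println(Arrays.toString(splitLine(record)));
  }

}
```

Without the -1 limit, a record ending in one or more tabs would come back with fewer fields than it has columns.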
<?xml version="1.0"?> <players> <player> <first_name>FirstName</first_name> <surname>Surname</surname> <games_played>Games Played</games_played> <at_bats>AtBats</at_bats> <runs>Runs</runs> <hits>Hits</hits> <doubles>Doubles</doubles> <triples>Triples</triples> <home_runs>Home runs</home_runs> <stolen_bases>RBI</stolen_bases> <caught_stealing>Caught Stealing</caught_stealing> <sacrifice_hits>Sacrifice Hits</sacrifice_hits> <sacrifice_flies>Sacrifice Flies</sacrifice_flies> <errors>Errors</errors> <passed_by_ball>PB</passed_by_ball> <walks>Walks</walks> <strike_outs>Strike outs</strike_outs> <hit_by_pitch>Hit by pitch</hit_by_pitch> </player> <player> <first_name>Garret </first_name> <surname>Anderson</surname> <games_played>156</games_played> <at_bats>622</at_bats> <runs>62</runs> <hits>183</hits> <doubles>41</doubles> <triples>7</triples> <home_runs>15</home_runs> <stolen_bases>79</stolen_bases> <caught_stealing>3</caught_stealing> <sacrifice_hits>3</sacrifice_hits> <sacrifice_flies>3</sacrifice_flies> <errors>6</errors> <passed_by_ball>0</passed_by_ball> <walks>29</walks> <strike_outs>80</strike_outs> <hit_by_pitch>1</hit_by_pitch> </player> <player> <first_name>Justin </first_name> <surname>Baughman</surname> <games_played>62</games_played> <at_bats>196</at_bats> <runs>24</runs> <hits>50</hits> <doubles>9</doubles> <triples>1</triples> <home_runs>1</home_runs> <stolen_bases>20</stolen_bases> <caught_stealing>4</caught_stealing> <sacrifice_hits>5</sacrifice_hits> <sacrifice_flies>3</sacrifice_flies> <errors>8</errors> <passed_by_ball>0</passed_by_ball> <walks>6</walks> <strike_outs>36</strike_outs> <hit_by_pitch>1</hit_by_pitch> </player> <player> <first_name>Frank </first_name> <surname>Bolick</surname> <games_played>21</games_played> <at_bats>45</at_bats> <runs>3</runs> <hits>7</hits> <doubles>2</doubles> <triples>0</triples> <home_runs>1</home_runs> <stolen_bases>2</stolen_bases> <caught_stealing>0</caught_stealing> <sacrifice_hits>0</sacrifice_hits> 
<sacrifice_flies>0</sacrifice_flies> <errors>0</errors> <passed_by_ball>0</passed_by_ball> <walks>11</walks> <strike_outs>8</strike_outs> <hit_by_pitch>0</hit_by_pitch> </player> <player> <first_name>Gary </first_name> <surname>Disarcina</surname> <games_played>157</games_played> <at_bats>551</at_bats> <runs>73</runs> <hits>158</hits> <doubles>39</doubles> <triples>3</triples> <home_runs>3</home_runs> <stolen_bases>56</stolen_bases> <caught_stealing>7</caught_stealing> <sacrifice_hits>12</sacrifice_hits> <sacrifice_flies>3</sacrifice_flies> <errors>14</errors> <passed_by_ball>0</passed_by_ball> <walks>21</walks> <strike_outs>51</strike_outs> <hit_by_pitch>8</hit_by_pitch> </player> <player> <first_name>Jim </first_name> <surname>Edmonds</surname> <games_played>154</games_played> <at_bats>599</at_bats> <runs>115</runs> <hits>184</hits> <doubles>42</doubles> <triples>1</triples> <home_runs>25</home_runs> <stolen_bases>91</stolen_bases> <caught_stealing>5</caught_stealing> <sacrifice_hits>1</sacrifice_hits> <sacrifice_flies>1</sacrifice_flies> <errors>5</errors> <passed_by_ball>0</passed_by_ball> <walks>57</walks> <strike_outs>114</strike_outs> <hit_by_pitch>1</hit_by_pitch> </player> <player> <first_name>Darin </first_name> <surname>Erstad</surname> <games_played>133</games_played> <at_bats>537</at_bats> <runs>84</runs> <hits>159</hits> <doubles>39</doubles> <triples>3</triples> <home_runs>19</home_runs> <stolen_bases>82</stolen_bases> <caught_stealing>6</caught_stealing> <sacrifice_hits>1</sacrifice_hits> <sacrifice_flies>3</sacrifice_flies> <errors>3</errors> <passed_by_ball>0</passed_by_ball> <walks>43</walks> <strike_outs>77</strike_outs> <hit_by_pitch>6</hit_by_pitch> </player> <player> <first_name>Carlos </first_name> <surname>Garcia</surname> <games_played>19</games_played> <at_bats>35</at_bats> <runs>4</runs> <hits>5</hits> <doubles>1</doubles> <triples>0</triples> <home_runs>0</home_runs> <stolen_bases>0</stolen_bases> <caught_stealing>0</caught_stealing> 
<sacrifice_hits>1</sacrifice_hits> <sacrifice_flies>0</sacrifice_flies> <errors>1</errors> <passed_by_ball>0</passed_by_ball> <walks>3</walks> <strike_outs>11</strike_outs> <hit_by_pitch>1</hit_by_pitch> </player> <player> <first_name>Troy </first_name> <surname>Glaus</surname> <games_played>48</games_played> <at_bats>165</at_bats> <runs>19</runs> <hits>36</hits> <doubles>9</doubles> <triples>0</triples> <home_runs>1</home_runs> <stolen_bases>23</stolen_bases> <caught_stealing>0</caught_stealing> <sacrifice_hits>0</sacrifice_hits> <sacrifice_flies>2</sacrifice_flies> <errors>7</errors> <passed_by_ball>0</passed_by_ball> <walks>15</walks> <strike_outs>51</strike_outs> <hit_by_pitch>0</hit_by_pitch> </player> <player> <first_name>Todd </first_name> <surname>Greene</surname> <games_played>29</games_played> <at_bats>71</at_bats> <runs>3</runs> <hits>18</hits> <doubles>4</doubles> <triples>0</triples> <home_runs>1</home_runs> <stolen_bases>7</stolen_bases> <caught_stealing>0</caught_stealing> <sacrifice_hits>0</sacrifice_hits> <sacrifice_flies>0</sacrifice_flies> <errors>0</errors> <passed_by_ball>0</passed_by_ball> <walks>2</walks> <strike_outs>20</strike_outs> <hit_by_pitch>0</hit_by_pitch> </player> <player> <first_name>Eric </first_name> <surname>Helfand</surname> <games_played>0</games_played> <at_bats>0</at_bats> <runs>0</runs> <hits>0</hits> <doubles>0</doubles> <triples>0</triples> <home_runs>0</home_runs> <stolen_bases>0</stolen_bases> <caught_stealing>0</caught_stealing> <sacrifice_hits>0</sacrifice_hits> <sacrifice_flies>0</sacrifice_flies> <errors>0</errors> <passed_by_ball>0</passed_by_ball> <walks>0</walks> <strike_outs>0</strike_outs> <hit_by_pitch>0</hit_by_pitch> </player> <player> <first_name>Dave </first_name> <surname>Hollins</surname> <games_played>101</games_played> <at_bats>363</at_bats> <runs>60</runs> <hits>88</hits> <doubles>16</doubles> <triples>2</triples> <home_runs>11</home_runs> <stolen_bases>39</stolen_bases> 
<caught_stealing>3</caught_stealing> <sacrifice_hits>2</sacrifice_hits> <sacrifice_flies>2</sacrifice_flies> <errors>17</errors> <passed_by_ball>0</passed_by_ball> <walks>44</walks> <strike_outs>69</strike_outs> <hit_by_pitch>7</hit_by_pitch> </player> <player> <first_name>Gregg </first_name> <surname>Jefferies</surname> <games_played>19</games_played> <at_bats>72</at_bats> <runs>7</runs> <hits>25</hits> <doubles>6</doubles> <triples>0</triples> <home_runs>1</home_runs> <stolen_bases>10</stolen_bases> <caught_stealing>0</caught_stealing> <sacrifice_hits>0</sacrifice_hits> <sacrifice_flies>0</sacrifice_flies> <errors>0</errors> <passed_by_ball>0</passed_by_ball> <walks>0</walks> <strike_outs>5</strike_outs> <hit_by_pitch>0</hit_by_pitch> </player> <player> <first_name>Mark </first_name> <surname>Johnson</surname> <games_played>10</games_played> <at_bats>14</at_bats> <runs>1</runs> <hits>1</hits> <doubles>0</doubles> <triples>0</triples> <home_runs>0</home_runs> <stolen_bases>0</stolen_bases> <caught_stealing>0</caught_stealing> <sacrifice_hits>0</sacrifice_hits> <sacrifice_flies>0</sacrifice_flies> <errors>0</errors> <passed_by_ball>0</passed_by_ball> <walks>0</walks> <strike_outs>6</strike_outs> <hit_by_pitch>0</hit_by_pitch> </player> <player> <first_name>Chad </first_name> <surname>Kreuter</surname> <games_played>96</games_played> <at_bats>252</at_bats> <runs>27</runs> <hits>63</hits> <doubles>10</doubles> <triples>1</triples> <home_runs>2</home_runs> <stolen_bases>33</stolen_bases> <caught_stealing>0</caught_stealing> <sacrifice_hits>5</sacrifice_hits> <sacrifice_flies>1</sacrifice_flies> <errors>9</errors> <passed_by_ball>5</passed_by_ball> <walks>33</walks> <strike_outs>49</strike_outs> <hit_by_pitch>3</hit_by_pitch> </player> <player> <first_name>Norberto </first_name> <surname>Martin</surname> <games_played>79</games_played> <at_bats>195</at_bats> <runs>20</runs> <hits>42</hits> <doubles>2</doubles> <triples>0</triples> <home_runs>1</home_runs> 
<stolen_bases>13</stolen_bases> <caught_stealing>1</caught_stealing> <sacrifice_hits>3</sacrifice_hits> <sacrifice_flies>2</sacrifice_flies> <errors>4</errors> <passed_by_ball>0</passed_by_ball> <walks>6</walks> <strike_outs>29</strike_outs> <hit_by_pitch>0</hit_by_pitch> </player> <player> <first_name>Damon </first_name> <surname>Mashore</surname> <games_played>43</games_played> <at_bats>98</at_bats> <runs>13</runs> <hits>23</hits> <doubles>6</doubles> <triples>0</triples> <home_runs>2</home_runs> <stolen_bases>11</stolen_bases> <caught_stealing>0</caught_stealing> <sacrifice_hits>1</sacrifice_hits> <sacrifice_flies>0</sacrifice_flies> <errors>0</errors> <passed_by_ball>0</passed_by_ball> <walks>9</walks> <strike_outs>22</strike_outs> <hit_by_pitch>3</hit_by_pitch> </player> <player> <first_name>Ben </first_name> <surname>Molina</surname> <games_played>2</games_played> <at_bats>1</at_bats> <runs>0</runs> <hits>0</hits> <doubles>0</doubles> <triples>0</triples> <home_runs>0</home_runs> <stolen_bases>0</stolen_bases> <caught_stealing>0</caught_stealing> <sacrifice_hits>0</sacrifice_hits> <sacrifice_flies>0</sacrifice_flies> <errors>0</errors> <passed_by_ball>0</passed_by_ball> <walks>0</walks> <strike_outs>0</strike_outs> <hit_by_pitch>0</hit_by_pitch> </player> <player> <first_name>Phil </first_name> <surname>Nevin</surname> <games_played>75</games_played> <at_bats>237</at_bats> <runs>27</runs> <hits>54</hits> <doubles>8</doubles> <triples>1</triples> <home_runs>8</home_runs> <stolen_bases>27</stolen_bases> <caught_stealing>0</caught_stealing> <sacrifice_hits>0</sacrifice_hits> <sacrifice_flies>2</sacrifice_flies> <errors>5</errors> <passed_by_ball>20</passed_by_ball> <walks>17</walks> <strike_outs>67</strike_outs> <hit_by_pitch>5</hit_by_pitch> </player> <player> <first_name>Charlie </first_name> <surname>Obrien</surname> <games_played>62</games_played> <at_bats>175</at_bats> <runs>13</runs> <hits>45</hits> <doubles>9</doubles> <triples>0</triples> 
<home_runs>4</home_runs> <stolen_bases>18</stolen_bases> <caught_stealing>0</caught_stealing> <sacrifice_hits>3</sacrifice_hits> <sacrifice_flies>3</sacrifice_flies> <errors>4</errors> <passed_by_ball>1</passed_by_ball> <walks>10</walks> <strike_outs>33</strike_outs> <hit_by_pitch>2</hit_by_pitch> </player> <player> <first_name>Orlando </first_name> <surname>Palmeiro</surname> <games_played>74</games_played> <at_bats>165</at_bats> <runs>28</runs> <hits>53</hits> <doubles>7</doubles> <triples>2</triples> <home_runs>0</home_runs> <stolen_bases>21</stolen_bases> <caught_stealing>4</caught_stealing> <sacrifice_hits>7</sacrifice_hits> <sacrifice_flies>0</sacrifice_flies> <errors>0</errors> <passed_by_ball>0</passed_by_ball> <walks>20</walks> <strike_outs>11</strike_outs> <hit_by_pitch>0</hit_by_pitch> </player> <player> <first_name>Chris </first_name> <surname>Pritchett</surname> <games_played>31</games_played> <at_bats>80</at_bats> <runs>12</runs> <hits>23</hits> <doubles>2</doubles> <triples>1</triples> <home_runs>2</home_runs> <stolen_bases>8</stolen_bases> <caught_stealing>0</caught_stealing> <sacrifice_hits>0</sacrifice_hits> <sacrifice_flies>0</sacrifice_flies> <errors>1</errors> <passed_by_ball>0</passed_by_ball> <walks>4</walks> <strike_outs>16</strike_outs> <hit_by_pitch>0</hit_by_pitch> </player> <player> <first_name>Tim </first_name> <surname>Salmon</surname> <games_played>136</games_played> <at_bats>463</at_bats> <runs>84</runs> <hits>139</hits> <doubles>28</doubles> <triples>1</triples> <home_runs>26</home_runs> <stolen_bases>88</stolen_bases> <caught_stealing>1</caught_stealing> <sacrifice_hits>0</sacrifice_hits> <sacrifice_flies>10</sacrifice_flies> <errors>2</errors> <passed_by_ball>0</passed_by_ball> <walks>90</walks> <strike_outs>100</strike_outs> <hit_by_pitch>3</hit_by_pitch> </player> <player> <first_name>Craig </first_name> <surname>Shipley</surname> <games_played>77</games_played> <at_bats>147</at_bats> <runs>18</runs> <hits>38</hits> 
<doubles>7</doubles> <triples>1</triples> <home_runs>2</home_runs> <stolen_bases>17</stolen_bases> <caught_stealing>4</caught_stealing> <sacrifice_hits>4</sacrifice_hits> <sacrifice_flies>1</sacrifice_flies> <errors>3</errors> <passed_by_ball>0</passed_by_ball> <walks>5</walks> <strike_outs>22</strike_outs> <hit_by_pitch>5</hit_by_pitch> </player> <player> <first_name>Randy </first_name> <surname>Velarde</surname> <games_played>51</games_played> <at_bats>188</at_bats> <runs>29</runs> <hits>49</hits> <doubles>13</doubles> <triples>1</triples> <home_runs>4</home_runs> <stolen_bases>26</stolen_bases> <caught_stealing>2</caught_stealing> <sacrifice_hits>0</sacrifice_hits> <sacrifice_flies>1</sacrifice_flies> <errors>4</errors> <passed_by_ball>0</passed_by_ball> <walks>34</walks> <strike_outs>42</strike_outs> <hit_by_pitch>1</hit_by_pitch> </player> <player> <first_name>Matt </first_name> <surname>Walbeck</surname> <games_played>108</games_played> <at_bats>338</at_bats> <runs>41</runs> <hits>87</hits> <doubles>15</doubles> <triples>2</triples> <home_runs>6</home_runs> <stolen_bases>46</stolen_bases> <caught_stealing>1</caught_stealing> <sacrifice_hits>5</sacrifice_hits> <sacrifice_flies>5</sacrifice_flies> <errors>7</errors> <passed_by_ball>8</passed_by_ball> <walks>30</walks> <strike_outs>68</strike_outs> <hit_by_pitch>2</hit_by_pitch> </player> <player> <first_name>Reggie </first_name> <surname>Williams</surname> <games_played>29</games_played> <at_bats>36</at_bats> <runs>7</runs> <hits>13</hits> <doubles>1</doubles> <triples>0</triples> <home_runs>1</home_runs> <stolen_bases>5</stolen_bases> <caught_stealing>3</caught_stealing> <sacrifice_hits>1</sacrifice_hits> <sacrifice_flies>0</sacrifice_flies> <errors>0</errors> <passed_by_ball>0</passed_by_ball> <walks>7</walks> <strike_outs>11</strike_outs> <hit_by_pitch>1</hit_by_pitch> </player> </players>
import java.io.*; import org.jdom.*; import org.jdom.output.XMLOutputter; public class BaseballTabToXMLShortcut { public static void main(String[] args) { Element root = new Element("players"); try { FileInputStream fin = new FileInputStream(args[0]); BufferedReader in = new BufferedReader(new InputStreamReader(fin)); String playerStats; while ((playerStats = in.readLine()) != null) { String[] stats = splitLine(playerStats); Element player = new Element("player"); player.addContent((new Element("first_name")).setText(stats[1])); player.addContent((new Element("surname")).setText(stats[0])); player.addContent((new Element("games_played")).setText(stats[4])); player.addContent((new Element("at_bats")).setText(stats[6])); player.addContent((new Element("runs")).setText(stats[7])); player.addContent((new Element("hits")).setText(stats[8])); player.addContent((new Element("doubles")).setText(stats[9])); player.addContent((new Element("triples")).setText(stats[10])); player.addContent((new Element("home_runs")).setText(stats[11])); player.addContent((new Element("runs_batted_in")).setText(stats[12])); player.addContent((new Element("stolen_bases")).setText(stats[13])); player.addContent((new Element("caught_stealing")).setText(stats[14])); player.addContent((new Element("sacrifice_hits")).setText(stats[15])); player.addContent((new Element("sacrifice_flies")).setText(stats[16])); player.addContent((new Element("errors")).setText(stats[17])); player.addContent((new Element("passed_by_ball")).setText(stats[18])); player.addContent((new Element("walks")).setText(stats[19])); player.addContent((new Element("strike_outs")).setText(stats[20])); player.addContent((new Element("hit_by_pitch")).setText(stats[21])); root.addContent(player); } Document doc = new Document(root); // serialize it into a file FileOutputStream fout = new FileOutputStream("baseballstats.xml"); XMLOutputter serializer = new XMLOutputter(); serializer.output(doc, fout); fout.flush(); fout.close(); 
in.close(); } catch (IOException e) { System.err.println(e); } catch (ArrayIndexOutOfBoundsException e) { System.out.println("Usage: java BaseballTabToXMLShortcut input_file.tab"); } } public static String[] splitLine(String playerStats) { // count the number of tabs int numTabs = 0; for (int i = 0; i < playerStats.length(); i++) { if (playerStats.charAt(i) == '\t') numTabs++; } int numFields = numTabs + 1; String[] fields = new String[numFields]; int position = 0; for (int i = 0; i < numFields; i++) { StringBuffer field = new StringBuffer(); while (position < playerStats.length() && playerStats.charAt(position++) != '\t') { field.append(playerStats.charAt(position-1)); } fields[i] = field.toString(); } return fields; } }
import java.io.*; import java.text.*; import java.util.*; import org.jdom.*; import org.jdom.output.XMLOutputter; public class BattingAverage { public static void main(String[] args) { Element root = new Element("players"); try { FileInputStream fin = new FileInputStream(args[0]); BufferedReader in = new BufferedReader(new InputStreamReader(fin)); String playerStats; // for formatting batting averages DecimalFormat averages = (DecimalFormat) NumberFormat.getNumberInstance(Locale.US); averages.setMaximumFractionDigits(3); averages.setMinimumFractionDigits(3); averages.setMinimumIntegerDigits(0); while ((playerStats = in.readLine()) != null) { String[] stats = splitLine(playerStats); String formattedAverage; try { int atBats = Integer.parseInt(stats[6]); int hits = Integer.parseInt(stats[8]); int walks = Integer.parseInt(stats[19]); int hitByPitch = Integer.parseInt(stats[21]); int sacrificeFlies = Integer.parseInt(stats[16]); int sacrificeHits = Integer.parseInt(stats[15]); int officialAtBats = atBats - walks - hitByPitch - sacrificeHits; if (officialAtBats <= 0) formattedAverage = "N/A"; else { double average = hits / (double) officialAtBats; formattedAverage = averages.format(average); } } catch (Exception e) { // skip this player continue; } Element player = new Element("player"); Element first_name = new Element("first_name"); first_name.setText(stats[1]); player.addContent(first_name); Element surname = new Element("surname"); surname.setText(stats[0]); player.addContent(surname); Element battingAverage = new Element("batting_average"); battingAverage.setText(formattedAverage); player.addContent(battingAverage); root.addContent(player); } Document doc = new Document(root); // serialize it into a file FileOutputStream fout = new FileOutputStream("battingaverages.xml"); XMLOutputter serializer = new XMLOutputter(); serializer.output(doc, fout); fout.flush(); fout.close(); in.close(); } catch (IOException e) { System.err.println(e); } catch 
(ArrayIndexOutOfBoundsException e) { System.out.println("Usage: java BattingAverage input_file.tab"); } } public static String[] splitLine(String playerStats) { // count the number of tabs int numTabs = 0; for (int i = 0; i < playerStats.length(); i++) { if (playerStats.charAt(i) == '\t') numTabs++; } int numFields = numTabs + 1; String[] fields = new String[numFields]; int position = 0; for (int i = 0; i < numFields; i++) { StringBuffer field = new StringBuffer(); while (position < playerStats.length() && playerStats.charAt(position++) != '\t') { field.append(playerStats.charAt(position-1)); } fields[i] = field.toString(); } return fields; } }
<?xml version="1.0"?> <players> <player> <first_name>Garret </first_name> <surname>Anderson</surname> <batting_average>.311</batting_average> </player> <player> <first_name>Justin </first_name> <surname>Baughman</surname> <batting_average>.272</batting_average> </player> <player> <first_name>Frank </first_name> <surname>Bolick</surname> <batting_average>.206</batting_average> </player> <player> <first_name>Gary </first_name> <surname>Disarcina</surname> <batting_average>.310</batting_average> </player> <player> <first_name>Jim </first_name> <surname>Edmonds</surname> <batting_average>.341</batting_average> </player> <player> <first_name>Darin </first_name> <surname>Erstad</surname> <batting_average>.326</batting_average> </player> <player> <first_name>Carlos </first_name> <surname>Garcia</surname> <batting_average>.167</batting_average> </player> <player> <first_name>Troy </first_name> <surname>Glaus</surname> <batting_average>.240</batting_average> </player> <player> <first_name>Todd </first_name> <surname>Greene</surname> <batting_average>.261</batting_average> </player> <player> <first_name>Eric </first_name> <surname>Helfand</surname> <batting_average>N/A</batting_average> </player> <player> <first_name>Dave </first_name> <surname>Hollins</surname> <batting_average>.284</batting_average> </player> <player> <first_name>Gregg </first_name> <surname>Jefferies</surname> <batting_average>.347</batting_average> </player> <player> <first_name>Mark </first_name> <surname>Johnson</surname> <batting_average>.071</batting_average> </player> <player> <first_name>Chad </first_name> <surname>Kreuter</surname> <batting_average>.299</batting_average> </player> <player> <first_name>Norberto </first_name> <surname>Martin</surname> <batting_average>.226</batting_average> </player> <player> <first_name>Damon </first_name> <surname>Mashore</surname> <batting_average>.271</batting_average> </player> <player> <first_name>Ben </first_name> <surname>Molina</surname> 
<batting_average>.000</batting_average> </player> <player> <first_name>Phil </first_name> <surname>Nevin</surname> <batting_average>.251</batting_average> </player> <player> <first_name>Charlie </first_name> <surname>Obrien</surname> <batting_average>.281</batting_average> </player> <player> <first_name>Orlando </first_name> <surname>Palmeiro</surname> <batting_average>.384</batting_average> </player> <player> <first_name>Chris </first_name> <surname>Pritchett</surname> <batting_average>.303</batting_average> </player> <player> <first_name>Tim </first_name> <surname>Salmon</surname> <batting_average>.376</batting_average> </player> <player> <first_name>Craig </first_name> <surname>Shipley</surname> <batting_average>.286</batting_average> </player> <player> <first_name>Randy </first_name> <surname>Velarde</surname> <batting_average>.320</batting_average> </player> <player> <first_name>Matt </first_name> <surname>Walbeck</surname> <batting_average>.289</batting_average> </player> <player> <first_name>Reggie </first_name> <surname>Williams</surname> <batting_average>.481</batting_average> </player> </players>
You don't need to worry about well-formedness rules
Very configurable output
You can pick any encoding Java supports.
Validity is not automatically maintained.
The stereotypical "Desperate Perl Hacker" (DPH) is supposed to be able to write an XML parser in a weekend.
The parser does the hard work for you.
Your code reads the document by hooking JDOM up to the parser.
JDOM can connect to any parser that supports SAX or DOM.
SAX, the Simple API for XML
SAX1
SAX2
DOM, the Document Object Model
DOM Level 0
DOM Level 1
DOM Level 2
DOM Level 3
Proprietary APIs
Parser specific APIs
Sun's Java API for XML Parsing = SAX1 + DOM1 + a few factory classes
JSR-000031 XML Data Binding Specification from Bluestone, Sun, WebMethods et al.
The proposed specification will define an XML data-binding facility for the Java™ Platform. Such a facility compiles an XML schema into one or more Java classes. These automatically-generated classes handle the translation between XML documents that follow the schema and interrelated instances of the derived classes. They also ensure that the constraints expressed in the schema are maintained as instances of the classes are manipulated.
And of course JDOM
Any SAX or DOM compatible parser including:
Apache XML Project's Xerces Java: http://xml.apache.org/xerces-j/index.html
Oracle's XML Parser for Java: http://technet.oracle.com/tech/xml/parser_java2
Sun's Java API for XML Parsing: http://java.sun.com/products/xml
Public domain, developed on xml-dev mailing list
Maintained by David Megginson
org.xml.sax package
Event based
Read-only
SAX omits DTD declarations
Adds:
Namespace support
Optional Validation
Optional Lexical events for comments, CDATA sections, entity references
A lot more configurable
Deprecates a lot of SAX1
Adapter classes convert between SAX2 and SAX1 parsers.
Use the XMLReaderFactory class to get a parser-specific implementation of the XMLReader interface
Your code registers a ContentHandler with the parser
An InputSource feeds the document into the parser
As the document is read, the parser calls back to the methods of the ContentHandler to tell it what it's seeing in the document.
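The steps above can be sketched with nothing but the JDK. This is a minimal illustration, not part of the original examples: the class name ElementCounter and its countElements() helper are hypothetical, and the XMLReader is obtained through JAXP's SAXParserFactory as a stand-in for XMLReaderFactory. The handler simply counts start-tags as the parser calls back.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; DefaultHandler supplies empty
// implementations of every ContentHandler method.
class ElementCounter extends DefaultHandler {

  private int count = 0;

  // The parser calls this once for each start-tag it reads
  @Override
  public void startElement(String namespaceURI, String localName,
      String qualifiedName, Attributes atts) {
    count++;
  }

  int getCount() {
    return count;
  }

  static int countElements(String xml) throws Exception {
    // Assumption: JAXP is used here to find a parser;
    // XMLReaderFactory would serve the same purpose.
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    XMLReader parser = factory.newSAXParser().getXMLReader();

    // Register the ContentHandler with the parser
    ElementCounter counter = new ElementCounter();
    parser.setContentHandler(counter);

    // The InputSource feeds the document into the parser
    parser.parse(new InputSource(new StringReader(xml)));
    return counter.getCount();
  }

  public static void main(String[] args) throws Exception {
    String xml = "<weblogs><log><name>MozillaZine</name></log></weblogs>";
    System.out.println(countElements(xml) + " elements");
  }
}
```

The handler never sees the document as a whole; it only learns about each element at the moment the parser reaches it.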
You do not always have all the information you need at the time of a given callback
You may need to store information in various data structures (stacks, queues, vectors, arrays, etc.) and act on it at a later point
For example, the characters() method is not guaranteed to give you the maximum number of contiguous characters. It may split a single run of characters over multiple method calls.
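One common way to cope with this is to append every characters() call to a buffer and only use the text once the end-tag arrives. The sketch below assumes a hypothetical TextAccumulator class and textOf() helper and uses only JDK classes. How many times characters() is called for a given run is up to the parser, so the code counts the calls but does not depend on the number.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: collects the text of one named element
class TextAccumulator extends DefaultHandler {

  private final String targetName;      // element whose text we want
  private StringBuilder buffer = null;  // non-null only inside the target
  private String result = null;         // text of the last target seen
  private int calls = 0;                // characters() calls inside the target

  TextAccumulator(String targetName) {
    this.targetName = targetName;
  }

  @Override
  public void startElement(String uri, String localName,
      String qualifiedName, Attributes atts) {
    if (qualifiedName.equals(targetName)) buffer = new StringBuilder();
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    // May be called several times for one run of text; just keep appending
    if (buffer != null) {
      buffer.append(ch, start, length);
      calls++;
    }
  }

  @Override
  public void endElement(String uri, String localName, String qualifiedName) {
    // Only at the end-tag do we know we have seen all the text
    if (buffer != null && qualifiedName.equals(targetName)) {
      result = buffer.toString();
      buffer = null;
    }
  }

  static String textOf(String xml, String elementName) throws Exception {
    XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    TextAccumulator handler = new TextAccumulator(elementName);
    parser.setContentHandler(handler);
    parser.parse(new InputSource(new StringReader(xml)));
    System.out.println("characters() calls: " + handler.calls);
    return handler.result;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<PUBLISHER>A &amp; M Records</PUBLISHER>";
    System.out.println(textOf(xml, "PUBLISHER"));  // prints "A & M Records"
  }
}
```

The buffered result is the same whether the parser delivers the text in one call or in several, which is the point of storing the data until the element ends.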
Defines how XML and HTML documents are represented as objects in programs
Defined in IDL; thus language independent
HTML as well as XML
Writing as well as reading
More complete than SAX or JDOM; covers everything except internal and external DTD subsets
DOM focuses more on the document; SAX focuses more on the parser.
Parser independent interfaces; parser dependent implementation classes. Most programs must use the parser dependent classes. JAXP helps solve this, but so far only for DOM Level 1.
Everything's a Node:
Extensive use of polymorphism
Lots of casting
Language independence means there's very limited use of the Java class library; various features are reinvented
Language independence requires no method overloading because not all languages support it.
Several features are poor design in Java, if not in other languages:
Named constants are often shorts
Only one kind of exception; details provided by constants
No Java-specific utility methods like equals(), hashCode(), clone(), or toString()
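A short tree walk makes these design points concrete. The sketch below is illustrative only: the NodeWalker class and its describe() methods are hypothetical names, and the document is parsed with the JDK's JAXP DocumentBuilderFactory. Note the switch on a short constant and the cast needed to reach Element-specific methods.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class
class NodeWalker {

  // Appends one line per node, indented by depth
  static void describe(Node node, String indent, StringBuilder out) {
    switch (node.getNodeType()) {   // a short named constant, not a class or enum
      case Node.ELEMENT_NODE:
        Element element = (Element) node;   // cast required for getTagName()
        out.append(indent).append("element ")
           .append(element.getTagName()).append('\n');
        break;
      case Node.TEXT_NODE:
        out.append(indent).append("text \"")
           .append(node.getNodeValue()).append("\"\n");
        break;
      case Node.COMMENT_NODE:
        out.append(indent).append("comment \"")
           .append(node.getNodeValue()).append("\"\n");
        break;
      default:
        out.append(indent).append("node type ")
           .append(node.getNodeType()).append('\n');
    }
    // Every kind of node answers getChildNodes(), even those with no children
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      describe(children.item(i), indent + "  ", out);
    }
  }

  static String describe(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
    StringBuilder out = new StringBuilder();
    describe(doc, "", out);
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.print(describe("<SONG><TITLE>Hot Cop</TITLE><!-- note --></SONG>"));
  }
}
```

Because the generic Node interface carries methods for every node type, the compiler cannot tell which calls make sense for which node; that check is left to the programmer and the node-type constant.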
DOM Level 0:
DOM Level 1, a W3C Standard
DOM Level 2, a W3C Standard
DOM Level 3, eventual W3C Standard to add schema and DTD support
Eight Modules:
Module | Package |
---|---|
Core | org.w3c.dom * |
HTML | org.w3c.dom.html |
Views | org.w3c.dom.views |
StyleSheets | org.w3c.dom.stylesheets |
CSS | org.w3c.dom.css |
Events | org.w3c.dom.events * |
Traversal | org.w3c.dom.traversal * |
Range | org.w3c.dom.range |
Only the core and traversal modules really apply to XML. The other six are for HTML.
* indicates Xerces support
Each XML document should contain exactly one tree.
A tree contains nodes.
Some nodes may contain other nodes (depending on node type).
Each document node contains:
zero or one doctype nodes
one root element node
zero or more comment and processing instruction nodes
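To see the children of a document node directly, one can list them by type. This is a small sketch under stated assumptions: the DocumentChildren class and childKinds() method are hypothetical, the parser is whatever JAXP provides in the JDK, and the sample document uses an internal DTD subset so that nothing has to be fetched.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class
class DocumentChildren {

  // Returns a label for each direct child of the document node, in order
  static List<String> childKinds(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
    List<String> kinds = new ArrayList<>();
    NodeList children = doc.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node child = children.item(i);
      switch (child.getNodeType()) {
        case Node.DOCUMENT_TYPE_NODE:
          kinds.add("doctype");
          break;
        case Node.ELEMENT_NODE:
          kinds.add("root element");
          break;
        case Node.COMMENT_NODE:
          kinds.add("comment");
          break;
        case Node.PROCESSING_INSTRUCTION_NODE:
          kinds.add("processing instruction");
          break;
        default:
          kinds.add("other");
      }
    }
    return kinds;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<?xml-stylesheet type='text/css' href='song.css'?>"
        + "<!DOCTYPE SONG [<!ELEMENT SONG ANY>]>"
        + "<SONG/>"
        + "<!-- trailing comment -->";
    System.out.println(childKinds(xml));
  }
}
```

The XML declaration, if present, does not show up as a child; only the doctype, the root element, comments, and processing instructions do.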
17 interfaces:
DOM Interface | JDOM Equivalent |
---|---|
Attr | Attribute |
CDATASection | CDATA |
CharacterData | |
Comment | Comment |
Document | Document |
DocumentFragment | |
DocumentType | DocType |
DOMImplementation | |
Element | Element |
Entity | Entity |
EntityReference | |
NamedNodeMap | |
Node | |
NodeList | |
Notation | |
ProcessingInstruction | ProcessingInstruction |
Text | java.lang.String |
Plus one exception: DOMException
Plus a bunch of HTML stuff in org.w3c.dom.html and other packages that we will ignore
Library specific code creates a parser
The parser parses the document and returns an org.w3c.dom.Document object.
The entire document is stored in memory.
DOM methods and interfaces are used to extract data from this object
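The process above might look like this when the parser is created through JAXP rather than through library specific code. This is a sketch, not the tutorial's own program: the class WeblogNamesDOM and its names() method are hypothetical, and getTextContent() is a DOM Level 3 convenience method that current JDKs provide.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class
class WeblogNamesDOM {

  // Returns the text of every name element in the document
  static List<String> names(String xml) throws Exception {
    // JAXP locates a parser; no parser-specific class is named
    DocumentBuilder builder =
        DocumentBuilderFactory.newInstance().newDocumentBuilder();

    // The whole document is read into memory before parse() returns
    Document doc = builder.parse(new InputSource(new StringReader(xml)));

    // DOM methods extract the data from the in-memory tree
    NodeList nodes = doc.getElementsByTagName("name");
    List<String> result = new ArrayList<>();
    for (int i = 0; i < nodes.getLength(); i++) {
      result.add(nodes.item(i).getTextContent());
    }
    return result;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<weblogs>"
        + "<log><name>MozillaZine</name><url>http://www.mozillazine.org</url></log>"
        + "<log><name>SlashDot.Org</name><url>http://www.slashdot.org/</url></log>"
        + "</weblogs>";
    System.out.println(names(xml));  // prints [MozillaZine, SlashDot.Org]
  }
}
```

Unlike the SAX approach, nothing is available to the program until the entire document has been parsed.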
Construct an org.jdom.input.SAXBuilder or an org.jdom.input.DOMBuilder; no parser specific code is needed!
Invoke the builder's build() method to build a Document object from one of:
Reader
InputStream
URL
File
String containing a SYSTEM ID
If there's a problem building the document, a JDOMException is thrown
Work with the resulting Document object
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;

public class JDOMChecker {

  public static void main(String[] args) {

    if (args.length == 0) {
      System.out.println("Usage: java JDOMChecker URL1 URL2...");
      return;
    }

    SAXBuilder builder = new SAXBuilder();

    // start parsing...
    for (int i = 0; i < args.length; i++) {

      // command line should offer URIs or file names
      try {
        builder.build(args[i]);
        // If there are no well-formedness errors,
        // then no exception is thrown
        System.out.println(args[i] + " is well formed.");
      }
      catch (JDOMException e) {
        // indicates a well-formedness or other error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }

    }

  }

}
% java JDOMChecker shortlogs.xml HelloJDOM.java
shortlogs.xml is well formed.
HelloJDOM.java is not well formed.
The markup in the document preceding the root element must be well-formed.: Error on line 1 of XML document: The markup in the document preceding the root element must be well-formed.
Not all parsers are validating but Xerces-J is.
Validity errors are not fatal; therefore they do not necessarily cause a JDOMException
However, you can tell the builder you want it to validate by passing true to its constructor:
SAXBuilder builder = new SAXBuilder(true);
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;

public class Validator {

  public static void main(String[] args) {

    if (args.length == 0) {
      System.out.println("Usage: java Validator URL1 URL2...");
      return;
    }

    SAXBuilder builder = new SAXBuilder(true);
                                    /*  ^^^^ */
                                    /* Turn on validation */

    // start parsing...
    for (int i = 0; i < args.length; i++) {

      // command line should offer URIs or file names
      try {
        builder.build(args[i]);
        // If there are no well-formedness or validity errors,
        // then no exception is thrown
        System.out.println(args[i] + " is valid.");
      }
      catch (JDOMException e) {
        // indicates a well-formedness or validity error
        System.out.println(args[i] + " is not valid.");
        System.out.println(e.getMessage());
      }

    }

  }

}
% java Validator invalid_fibonacci.xml
invalid_fibonacci.xml is not valid.
Element type "title" must be declared.: Error on line 8 of XML document: Element type "title" must be declared.
% java Validator validfibonacci.xml
validfibonacci.xml is valid.
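At the SAX level, the reason validity errors are not fatal is that a validating parser reports them through ErrorHandler.error() and keeps going, while well-formedness errors go to fatalError(). The following JDK-only sketch shows that distinction. It is not how JDOM itself is implemented; the ValidityReporter class and validate() method are hypothetical, and the sample documents carry their DTD in the internal subset.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class
class ValidityReporter extends DefaultHandler {

  private final List<String> problems = new ArrayList<>();

  // Recoverable errors, including validity errors, arrive here;
  // the parse continues after this method returns.
  @Override
  public void error(SAXParseException e) {
    problems.add("line " + e.getLineNumber() + ": " + e.getMessage());
  }

  // fatalError() is not overridden: DefaultHandler rethrows the exception,
  // so a well-formedness error still aborts the parse.

  static List<String> validate(String xml) throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setValidating(true);   // turn on DTD validation
    XMLReader parser = factory.newSAXParser().getXMLReader();
    ValidityReporter reporter = new ValidityReporter();
    parser.setErrorHandler(reporter);
    parser.parse(new InputSource(new StringReader(xml)));
    return reporter.problems;
  }

  public static void main(String[] args) throws Exception {
    String dtd = "<!DOCTYPE doc [<!ELEMENT doc (#PCDATA)>]>";
    // title is not declared in the DTD, so this document is invalid
    for (String problem : validate(dtd + "<doc><title>Fibonacci</title></doc>")) {
      System.out.println(problem);
    }
    System.out.println(validate(dtd + "<doc>ok</doc>").size() + " problems");
  }
}
```

A program that wants invalid documents rejected outright can throw the SAXParseException from error() instead of recording it.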
Use DOMBuilder instead of SAXBuilder
Must have an existing DOM tree, specifically an org.w3c.dom.Document (note the name conflict with org.jdom.Document)
DOM validation is currently broken.
Approximately doubles the memory usage.
In general, SAX is easier and more efficient.
import org.jdom.*;
import org.jdom.input.DOMBuilder;
import org.apache.xerces.parsers.*;

public class DOMValidator {

  public static void main(String[] args) {

    if (args.length == 0) {
      System.out.println("Usage: java DOMValidator URL1 URL2...");
      return;
    }

    DOMBuilder builder = new DOMBuilder(true);
                                    /*  ^^^^ */
                                    /* Turn on validation */

    // start parsing...
    DOMParser parser = new DOMParser(); // Xerces specific class
    for (int i = 0; i < args.length; i++) {

      try {
        // Read the entire document into memory
        parser.parse(args[i]);
        org.w3c.dom.Document domDoc = parser.getDocument();
        org.jdom.Document jdomDoc = builder.build(domDoc);
        // If there are no validity errors,
        // then no exception is thrown
        System.out.println(args[i] + " is valid.");
      }
      catch (Exception e) {
        // indicates a well-formedness or validity error
        System.out.println(args[i] + " is not valid.");
        System.out.println(e.getMessage());
      }

    }

  }

}
One program, three implementations:
SAX
DOM
JDOM
UserLand's RSS based list of Web logs at http://static.userland.com/weblogMonitor/logs.xml:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd">
<weblogs>
  <log>
    <name>MozillaZine</name>
    <url>http://www.mozillazine.org</url>
    <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>
    <ownerName>Jason Kersey</ownerName>
    <ownerEmail>kerz@en.com</ownerEmail>
    <description>THE source for news on the Mozilla Organization. DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
    <imageUrl></imageUrl>
    <adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif</adImageUrl>
  </log>
  <log>
    <name>SalonHerringWiredFool</name>
    <url>http://www.salonherringwiredfool.com/</url>
    <ownerName>Some Random Herring</ownerName>
    <ownerEmail>salonfool@wiredherring.com</ownerEmail>
    <description></description>
  </log>
  <log>
    <name>SlashDot.Org</name>
    <url>http://www.slashdot.org/</url>
    <ownerName>Simply a friend</ownerName>
    <ownerEmail>afriendofweblogs@weblogs.com</ownerEmail>
    <description>News for Nerds, Stuff that Matters.</description>
  </log>
</weblogs>
Design Decisions
Should we return an array, an Enumeration, a List, or what?
Perhaps we should use multiple threads?
package org.xml.sax;

public interface ContentHandler {
  public void setDocumentLocator(Locator locator);
  public void startDocument() throws SAXException;
  public void endDocument() throws SAXException;
  public void startPrefixMapping(String prefix, String uri) throws SAXException;
  public void endPrefixMapping(String prefix) throws SAXException;
  public void startElement(String namespaceURI, String localName, String rawName, Attributes atts) throws SAXException;
  public void endElement(String namespaceURI, String localName, String qualifiedName) throws SAXException;
  public void characters(char[] ch, int start, int length) throws SAXException;
  public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException;
  public void processingInstruction(String target, String data) throws SAXException;
  public void skippedEntity(String name) throws SAXException;
}
We do not know how many URLs there will be when we start parsing, so let's use a Vector
Single threaded for simplicity but a real program would use multiple threads
One to load and parse the data
Another thread (probably the main thread) to serve the data
Early data could be provided before the entire document had been read
The character data of each url element needs to be stored.
Everything else can be ignored.
A startElement() with the name url indicates that we need to start storing this data.
An endElement() with the name url indicates that we need to stop storing this data, convert it to a URL, and put it in the Vector
Should we hide the XML parsing inside a non-public class to avoid accidentally calling the methods from unexpected places or threads?
import org.xml.sax.*;
import org.xml.sax.helpers.XMLReaderFactory;
import java.util.*;
import java.io.*;

public class WeblogsSAX {

  public static List listChannels() throws IOException, SAXException {
    return listChannels(
     "http://static.userland.com/weblogMonitor/logs.xml");
  }

  public static List listChannels(String uri) throws IOException, SAXException {
    XMLReader parser = XMLReaderFactory.createXMLReader();
    Vector urls = new Vector(1000);
    URIGrabber u = new URIGrabber(urls);
    parser.setContentHandler(u);
    parser.parse(uri);
    return urls;
  }

  public static void main(String[] args) {
    try {
      List urls;
      if (args.length > 0) urls = listChannels(args[0]);
      else urls = listChannels();
      Iterator iterator = urls.iterator();
      while (iterator.hasNext()) {
        System.out.println(iterator.next());
      }
    }
    catch (IOException e) { System.err.println(e); }
    catch (SAXParseException e) {
      System.err.println(e);
      System.err.println("at line " + e.getLineNumber() + ", column " + e.getColumnNumber());
    }
    catch (SAXException e) { System.err.println(e); }
    catch (/* Unexpected */ Exception e) { e.printStackTrace(); }
  }
}
import org.xml.sax.*;
import java.net.*;
import java.util.Vector;

// conflicts with java.net.ContentHandler
class URIGrabber implements org.xml.sax.ContentHandler {

  private Vector urls;

  URIGrabber(Vector urls) {
    this.urls = urls;
  }

  // do nothing methods
  public void setDocumentLocator(Locator locator) {}
  public void startDocument() throws SAXException {}
  public void endDocument() throws SAXException {}
  public void startPrefixMapping(String prefix, String uri) throws SAXException {}
  public void endPrefixMapping(String prefix) throws SAXException {}
  public void skippedEntity(String name) throws SAXException {}
  public void ignorableWhitespace(char[] text, int start, int length) throws SAXException {}
  public void processingInstruction(String target, String data) throws SAXException {}

  // Remember, there's no guarantee all the text of the
  // url element will be returned in a single call to characters
  private StringBuffer urlBuffer;
  private boolean collecting = false;

  public void startElement(String namespaceURI, String localName, String rawName, Attributes atts) throws SAXException {
    if (rawName.equals("url")) {
      collecting = true;
      urlBuffer = new StringBuffer();
    }
  }

  public void characters(char[] text, int start, int length) throws SAXException {
    if (collecting) {
      urlBuffer.append(text, start, length);
    }
  }

  public void endElement(String namespaceURI, String localName, String rawName) throws SAXException {
    if (rawName.equals("url")) {
      collecting = false;
      String url = urlBuffer.toString();
      try {
        urls.addElement(new URL(url));
      }
      catch (MalformedURLException e) {
        // skip this url
      }
    }
  }
}
% java WeblogsSAX shortlogs.xml
http://www.mozillazine.org
http://www.salonherringwiredfool.com/
http://www.slashdot.org/
This example is very sequential so SAX fits it nicely
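The "early data" idea from the design notes can be sketched without threads: hand each URL to a callback the moment its end tag is seen, instead of waiting for the whole Vector. This is only a sketch under assumptions. The class name StreamingUrlGrabber and the Consumer callback are hypothetical, it extends the SAX2 helper DefaultHandler so the do-nothing methods disappear, and it uses the parser bundled with the JDK rather than XMLReaderFactory.

```java
import java.io.StringReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical variant of URIGrabber; not part of the original program.
final class StreamingUrlGrabber extends DefaultHandler {

  private final Consumer<URL> sink;   // called once per URL, as soon as it is complete
  private StringBuilder buffer;       // non-null only while inside a url element

  StreamingUrlGrabber(Consumer<URL> sink) {
    this.sink = sink;
  }

  @Override
  public void startElement(String uri, String localName, String qName, Attributes atts) {
    if (qName.equals("url")) buffer = new StringBuilder();
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    // The text of one element may arrive in several calls
    if (buffer != null) buffer.append(ch, start, length);
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    if (qName.equals("url")) {
      try {
        sink.accept(new URL(buffer.toString().trim()));
      } catch (MalformedURLException e) {
        // skip this url, as the original does
      }
      buffer = null;
    }
  }

  /** Parses an XML string and returns the well-formed URLs in document order. */
  static List<URL> parse(String xml) {
    List<URL> urls = new ArrayList<>();
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), new StreamingUrlGrabber(urls::add));
    } catch (Exception e) {
      throw new IllegalStateException("could not parse document", e);
    }
    return urls;
  }

  public static void main(String[] args) {
    String xml = "<weblogs><log><name>MozillaZine</name>"
        + "<url>http://www.mozillazine.org</url></log>"
        + "<log><url>not a url</url></log></weblogs>";
    System.out.println(parse(xml));
  }
}
```

Here the callback simply appends to a list, but it could equally push each URL onto a queue read by another thread.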
Let's look at the port to DOM
We cannot easily find out how many URLs there will be when we start parsing, even though they're all in memory.
Single threaded by nature; no benefit to multiple threads since no data will be available until the entire document has been read and parsed.
The character data of each url element needs to be read.
Everything else can be ignored.
We can use NodeIterator to walk the tree.
We can use NodeFilter to select only the url elements.
The XML parsing is so straight-forward it can be done inside one method. No extra class is required.
package org.w3c.dom;
public interface Node {
// NodeType
public static final short ELEMENT_NODE = 1;
public static final short ATTRIBUTE_NODE = 2;
public static final short TEXT_NODE = 3;
public static final short CDATA_SECTION_NODE = 4;
public static final short ENTITY_REFERENCE_NODE = 5;
public static final short ENTITY_NODE = 6;
public static final short PROCESSING_INSTRUCTION_NODE = 7;
public static final short COMMENT_NODE = 8;
public static final short DOCUMENT_NODE = 9;
public static final short DOCUMENT_TYPE_NODE = 10;
public static final short DOCUMENT_FRAGMENT_NODE = 11;
public static final short NOTATION_NODE = 12;
public String getNodeName();
public String getNodeValue() throws DOMException;
public void setNodeValue(String nodeValue) throws DOMException;
public short getNodeType();
public Node getParentNode();
public NodeList getChildNodes();
public Node getFirstChild();
public Node getLastChild();
public Node getPreviousSibling();
public Node getNextSibling();
public NamedNodeMap getAttributes();
public Document getOwnerDocument();
public Node insertBefore(Node newChild, Node refChild) throws DOMException;
public Node replaceChild(Node newChild, Node oldChild) throws DOMException;
public Node removeChild(Node oldChild) throws DOMException;
public Node appendChild(Node newChild) throws DOMException;
public boolean hasChildNodes();
public Node cloneNode(boolean deep);
public void normalize();
public boolean supports(String feature, String version);
public String getNamespaceURI();
public String getPrefix();
public void setPrefix(String prefix) throws DOMException;
public String getLocalName();
}
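As a rough illustration of how the navigation methods fit together, the sketch below walks a tree with getFirstChild() and getNextSibling() and counts nodes by getNodeType(). The class name NodeWalk and the sample document are made up for this example, and it assumes the DOM parser bundled with the JDK rather than Xerces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class NodeWalk {

  /** Counts the nodes of the given type in the subtree rooted at node. */
  static int count(Node node, short type) {
    int total = (node.getNodeType() == type) ? 1 : 0;
    // Attributes are not children, so this loop never visits them
    for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
      total += count(child, type);
    }
    return total;
  }

  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException("could not parse document", e);
    }
  }

  public static void main(String[] args) {
    Document doc = parse("<weblogs><!-- sample --><log><name>A</name>"
        + "<url>http://example.com/</url></log></weblogs>");
    System.out.println("elements: " + count(doc, Node.ELEMENT_NODE));
    System.out.println("text nodes: " + count(doc, Node.TEXT_NODE));
    System.out.println("comments: " + count(doc, Node.COMMENT_NODE));
  }
}
```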
package org.w3c.dom.traversal;
public interface NodeIterator {
public Node nextNode() throws DOMException;
public Node previousNode() throws DOMException;
public int getWhatToShow();
public NodeFilter getFilter();
public boolean getExpandEntityReferences();
public void detach();
}
package org.w3c.dom.traversal;
public interface NodeFilter {
// Constants returned by acceptNode
public static final short FILTER_ACCEPT = 1;
public static final short FILTER_REJECT = 2;
public static final short FILTER_SKIP = 3;
public short acceptNode(Node n);
// Constants for whatToShow
public static final int SHOW_ALL = 0x0000FFFF;
public static final int SHOW_ELEMENT = 0x00000001;
public static final int SHOW_ATTRIBUTE = 0x00000002;
public static final int SHOW_TEXT = 0x00000004;
public static final int SHOW_CDATA_SECTION = 0x00000008;
public static final int SHOW_ENTITY_REFERENCE = 0x00000010;
public static final int SHOW_ENTITY = 0x00000020;
public static final int SHOW_PROCESSING_INSTRUCTION = 0x00000040;
public static final int SHOW_COMMENT = 0x00000080;
public static final int SHOW_DOCUMENT = 0x00000100;
public static final int SHOW_DOCUMENT_TYPE = 0x00000200;
public static final int SHOW_DOCUMENT_FRAGMENT = 0x00000400;
public static final int SHOW_NOTATION = 0x00000800;
}
import org.w3c.dom.*;
import org.w3c.dom.traversal.*;
import org.xml.sax.SAXException;
import java.io.IOException;
import java.util.*;
import java.net.*;

public class WeblogsDOM {

  public static String DEFAULT_URL = "http://static.userland.com/weblogMonitor/logs.xml";

  public static List listChannels() throws DOMException {
    return listChannels(DEFAULT_URL);
  }

  public static List listChannels(String uri) throws DOMException {
    if (uri == null) {
      throw new NullPointerException("URL must be non-null");
    }
    org.apache.xerces.parsers.DOMParser parser = new org.apache.xerces.parsers.DOMParser();
    Vector urls = null;
    try {
      // Read the entire document into memory
      parser.parse(uri);
      Document doc = parser.getDocument();
      org.apache.xerces.dom.DocumentImpl impl = (org.apache.xerces.dom.DocumentImpl) doc;
      NodeIterator iterator = impl.createNodeIterator(doc, NodeFilter.SHOW_ALL, new URLFilter(), true);
      urls = new Vector(100);
      Node current = null;
      while ((current = iterator.nextNode()) != null) {
        try {
          String content = current.getNodeValue();
          URL u = new URL(content);
          urls.addElement(u);
        }
        catch (MalformedURLException e) {
          // bad input data from one third party; just ignore it
        }
      }
    }
    catch (SAXException e) { System.err.println(e); }
    catch (IOException e) { System.err.println(e); }
    return urls;
  }

  // Accepts only text nodes whose parent is a url element
  static class URLFilter implements NodeFilter {
    public short acceptNode(Node n) {
      if (n instanceof Text) {
        Node parent = n.getParentNode();
        if (parent instanceof Element) {
          Element e = (Element) parent;
          if (e.getTagName().equals("url")) {
            return NodeFilter.FILTER_ACCEPT;
          }
        }
      }
      return NodeFilter.FILTER_REJECT;
    }
  }

  public static void main(String[] args) {
    try {
      List urls;
      if (args.length > 0) {
        try {
          // Constructed only to reject a malformed argument before parsing
          URL url = new URL(args[0]);
          urls = listChannels(args[0]);
        }
        catch (MalformedURLException e) {
          System.err.println("Usage: java WeblogsDOM url");
          return;
        }
      }
      else {
        urls = listChannels();
      }
      Iterator iterator = urls.iterator();
      while (iterator.hasNext()) {
        System.out.println(iterator.next());
      }
    }
    catch (/* Unexpected */ Exception e) { e.printStackTrace(); }
  } // end main
}
% java WeblogsDOM
http://2020Hindsight.editthispage.com/
http://www.sff.net/people/mitchw/weblog/weblog.htp
http://nate.weblogs.com/
http://plugins.launchpoint.net
http://404.psistorm.net
http://home.att.net/~geek9000
http://daubnet.tzo.com/weblog
several hundred more...
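The cast to the Xerces DocumentImpl class ties the program to one parser. A possible alternative, sketched here under assumptions, is to ask for the standard org.w3c.dom.traversal.DocumentTraversal interface instead and to narrow whatToShow to SHOW_TEXT so the filter sees less. The class name PortableUrlIterator is hypothetical, the parser is whatever javax.xml.parsers supplies, and the sketch returns strings rather than URL objects to stay short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

// Hypothetical variant of WeblogsDOM; not the original program.
final class PortableUrlIterator {

  /** Returns the text of every url element, in document order. */
  static List<String> urlTexts(String xml) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException("could not parse document", e);
    }
    if (!(doc instanceof DocumentTraversal)) {
      throw new UnsupportedOperationException("this DOM does not implement Traversal");
    }
    // SHOW_TEXT already limits the iterator to text nodes, so the filter
    // only has to look at the name of the parent element.
    NodeFilter urlTextOnly = node -> {
      Node parent = node.getParentNode();
      boolean inUrl = parent != null && "url".equals(parent.getNodeName());
      return inUrl ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_SKIP;
    };
    NodeIterator iterator = ((DocumentTraversal) doc)
        .createNodeIterator(doc, NodeFilter.SHOW_TEXT, urlTextOnly, true);
    List<String> result = new ArrayList<>();
    for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
      result.add(n.getNodeValue().trim());
    }
    iterator.detach();
    return result;
  }

  public static void main(String[] args) {
    String xml = "<weblogs><log><name>SlashDot.Org</name>"
        + "<url>http://www.slashdot.org/</url></log></weblogs>";
    System.out.println(urlTexts(xml));
  }
}
```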
Let's look at the port to JDOM
We can easily find out how many URLs there will be before we start collecting them: the List of log children reports its size.
Single threaded by nature; no benefit to multiple threads since no data will be available until the entire document has been read and parsed.
The character data of each url element needs to be read.
Everything else can be ignored.
The format is very straight-forward so we don't need to traverse the entire tree.
The XML parsing is so straight-forward it can be done inside one method. No extra class is required.
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;
import java.net.*;

public class WeblogsJDOM {

  public static String DEFAULT_SYSTEM_ID = "http://static.userland.com/weblogMonitor/logs.xml";

  public static List listChannels() throws JDOMException {
    return listChannels(DEFAULT_SYSTEM_ID);
  }

  public static List listChannels(String systemID) throws JDOMException, NullPointerException {
    if (systemID == null) {
      throw new NullPointerException("URL must be non-null");
    }
    SAXBuilder builder = new SAXBuilder();
    // Load the entire document into memory
    // from the network or file system
    Document doc = builder.build(systemID);
    // Descend the tree and find the URLs. It helps that
    // the document has a very regular structure.
    Element weblogs = doc.getRootElement();
    List logs = weblogs.getChildren("log");
    Vector urls = new Vector(logs.size());
    Iterator iterator = logs.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      Element log = (Element) o;
      try {
        // This will probably be changed to
        // getElement() or getChildElement()
        Element url = log.getChild("url");
        if (url == null) continue;
        String content = url.getTextTrim();
        URL u = new URL(content);
        urls.addElement(u);
      }
      catch (MalformedURLException e) {
        // bad input data from one third party; just ignore it
      }
    }
    return urls;
  }

  public static void main(String[] args) {
    try {
      List urls;
      if (args.length > 0) {
        urls = listChannels(args[0]);
      }
      else {
        urls = listChannels();
      }
      Iterator iterator = urls.iterator();
      while (iterator.hasNext()) {
        System.out.println(iterator.next());
      }
    }
    catch (/* Unexpected */ Exception e) { e.printStackTrace(); }
  }
}
% java WeblogsJDOM
http://2020Hindsight.editthispage.com/
http://www.sff.net/people/mitchw/weblog/weblog.htp
http://nate.weblogs.com/
http://plugins.launchpoint.net
http://404.psistorm.net
http://home.att.net/~geek9000
http://daubnet.tzo.com/weblog
several hundred more...
The classes that represent an XML document and its parts
Document
Element
Attribute
Comment
DocType
Entity
ProcessingInstruction
Verifier
plus assorted exceptions
The root node containing the entire document; not the same as the root element
Contains:
one element
zero or more processing instructions
zero or more comments
zero or one document type declarations
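The difference between the document and its root element is easy to see by listing what sits at the top level. The sketch below is an analogue only: it uses the W3C DOM that ships with the JDK, not JDOM, so that it runs without extra jars. The class name DocumentChildren and the cut-down song document are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper; a W3C DOM analogue, not JDOM code.
final class DocumentChildren {

  /** Describes each top-level child of the document node, in order. */
  static List<String> topLevelKinds(String xml) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException("could not parse document", e);
    }
    List<String> kinds = new ArrayList<>();
    for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
      switch (n.getNodeType()) {
        case Node.PROCESSING_INSTRUCTION_NODE: kinds.add("processing instruction"); break;
        case Node.DOCUMENT_TYPE_NODE:          kinds.add("doctype"); break;
        case Node.ELEMENT_NODE:                kinds.add("root element " + n.getNodeName()); break;
        case Node.COMMENT_NODE:                kinds.add("comment"); break;
        default:                               kinds.add("other"); break;
      }
    }
    return kinds;
  }

  public static void main(String[] args) {
    String xml = "<?xml-stylesheet type=\"text/css\" href=\"song.css\"?>"
        + "<!DOCTYPE SONG [<!ELEMENT SONG ANY>]>"
        + "<SONG><TITLE>Hot Cop</TITLE></SONG>"
        + "<!-- trailing comment -->";
    System.out.println(topLevelKinds(xml));
  }
}
```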
package org.jdom;
public class Document implements Serializable, Cloneable {
protected List content;
protected Element rootElement;
protected DocType docType;
protected Document() {}
public Document(Element rootElement) {}
public Document(Element rootElement, DocType docType) {}
public Element getRootElement() {}
public Document setRootElement(Element rootElement) {}
public DocType getDocType() {}
public Document setDocType(DocType docType) {}
public List getProcessingInstructions() {}
public List getProcessingInstructions(String target) {}
public ProcessingInstruction getProcessingInstruction(String target)
throws NoSuchProcessingInstructionException {}
public Document addProcessingInstruction(ProcessingInstruction pi) {}
public Document addProcessingInstruction(String target, String data) {}
public Document addProcessingInstruction(String target, Map data) {}
public Document setProcessingInstructions(List processingInstructions) {}
public boolean removeProcessingInstruction(ProcessingInstruction processingInstruction) {}
public boolean removeProcessingInstruction(String target) {}
public boolean removeProcessingInstructions(String target) {}
public Document addComment(Comment comment) {}
public List getMixedContent() {}
// basic utility methods
public final String toString() {}
public final String getSerializedForm() {} // going away
public final boolean equals(Object ob) {}
public final int hashCode() {}
public final Object clone() {}
}
import org.jdom.Document;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import java.io.IOException;

public class XMLPrinter {

  public static void main(String[] args) {

    if (args.length == 0) {
      System.out.println("Usage: java XMLPrinter URL1 URL2...");
    }

    SAXBuilder builder = new SAXBuilder();

    // start parsing...
    for (int i = 0; i < args.length; i++) {
      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        System.out.println("*************" + args[i] + "*************");
        XMLOutputter outputter = new XMLOutputter();
        outputter.output(doc, System.out);
      }
      catch (JDOMException e) { // indicates a well-formedness or other error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
      catch (IOException e) { // shouldn't happen because System.out eats exceptions
        System.out.println(e.getMessage());
      }
    }
  }
}
% java XMLPrinter shortlogs.xml
*************shortlogs.xml*************
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd"><weblogs>
<log>
<name>MozillaZine</name>
<url>http://www.mozillazine.org</url>
<changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>
<ownerName>Jason Kersey</ownerName>
<ownerEmail>kerz@en.com</ownerEmail>
<description>THE source for news on the Mozilla Organization. DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
<imageUrl />
<adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif</adImageUrl>
</log>
<log>
<name>SalonHerringWiredFool</name>
<url>http://www.salonherringwiredfool.com/</url>
<ownerName>Some Random Herring</ownerName>
<ownerEmail>salonfool@wiredherring.com</ownerEmail>
<description />
</log>
<log>
<name>SlashDot.Org</name>
<url>http://www.slashdot.org/</url>
<ownerName>Simply a friend</ownerName>
<ownerEmail>afriendofweblogs@weblogs.com</ownerEmail>
<description>News for Nerds, Stuff that Matters.</description>
</log>
</weblogs>
Represents a complete element including its start tag, end tag, and content
Contains:
Child Elements
Processing Instructions
Comments
Text
JDOM enforces restrictions on element names and possibly values; e.g. a name cannot start with a digit.
The content is stored as a java.util.List which contains
One String object per text node
One Element object per child element
One Comment object per comment
One CDATA object per CDATA section
One ProcessingInstruction object per processing instruction
Use the regular methods of java.util.List to add, remove, and inspect the contents of an element
Since the methods of java.util.List expect to work with Object objects, casting back to JDOM types and String is frequent
Various utility methods mean you don't always have to work with the full list.
Attributes are available as a separate List since attributes are not children.
This list only contains Attribute objects.
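The kind of name restriction mentioned above can be approximated in a few lines. This is a deliberately simplified sketch, not JDOM's actual check: the class and method names are hypothetical, and the rule is narrower than the real XML 1.0 Name production (for instance, it rejects a leading colon, which XML 1.0 itself permits).

```java
// Hypothetical, simplified name check; not JDOM's Verifier.
final class NameCheck {

  /** Returns true if name looks like a legal element name under the simplified rule. */
  static boolean isPlausibleElementName(String name) {
    if (name == null || name.isEmpty()) return false;
    char first = name.charAt(0);
    // A name must start with a letter or underscore, never a digit
    if (!(Character.isLetter(first) || first == '_')) return false;
    for (int i = 1; i < name.length(); i++) {
      char c = name.charAt(i);
      boolean ok = Character.isLetterOrDigit(c)
          || c == '_' || c == '-' || c == '.' || c == ':';
      if (!ok) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isPlausibleElementName("url"));   // true
    System.out.println(isPlausibleElementName("2url"));  // false
  }
}
```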
package org.jdom;
public class Element implements Serializable, Cloneable {
protected String name;
protected Namespace namespace;
protected Element parent;
protected boolean isRootElement;
protected List attributes;
protected List content;
protected Element() {}
public Element(String name, Namespace namespace) {}
public Element(String name) {}
public Element(String name, String uri) {}
public Element(String name, String prefix, String uri) {}
public String getName() {}
public Namespace getNamespace() {}
public String getNamespacePrefix() {}
public String getNamespaceURI() {}
public String getQualifiedName() {}
public Element getParent() {}
protected Element setParent(Element parent) {}
public boolean isRootElement() {}
protected Element setIsRootElement(boolean isRootElement) {}
public String getText() {}
public String getTextTrim() {}
public boolean hasMixedContent() {}
public List getMixedContent() {}
public String getChildText(String name) {}
public String getChildTextTrim(String name) {}
public String getChildText(String name, Namespace ns) {}
public Element setMixedContent(List mixedContent) {}
public List getChildren() {}
public Element setChildren(List children) {}
public List getChildren(String name, Namespace ns) {}
// will be renamed, probably getElement()
public Element getChild(String name, Namespace ns) {}
public Element getChild(String name) {}
public boolean removeChild(String name) {}
public boolean removeChild(String name, Namespace ns) {}
public boolean removeChildren(String name) {}
public boolean removeChildren(String name, Namespace ns) {}
public boolean removeChildren() {}
public Element addContent(String text) {}
public Element addContent(Element element) {}
public Element addContent(ProcessingInstruction pi) {}
public Element addContent(Entity entity) {}
public Element addContent(Comment comment) {}
public Element addContent(CDATA cdata) {}
public boolean removeContent(Element element) {}
public boolean removeContent(ProcessingInstruction pi) {}
public boolean removeContent(Entity entity) {}
public boolean removeContent(Comment comment) {}
public List getAttributes() {}
public Attribute getAttribute(String name) {}
public Attribute getAttribute(String name, Namespace ns) {}
public String getAttributeValue(String name) {}
public String getAttributeValue(String name, Namespace ns) {}
public Element setAttributes(List attributes) {}
public Element addAttribute(Attribute attribute) {}
public Element addAttribute(String name, String value) {}
public boolean removeAttribute(String name, String uri) {}
public boolean removeAttribute(String name) {}
public boolean removeAttribute(String name, Namespace ns) {}
public Element getCopy(String name, Namespace ns) {}
public Element getCopy(String name, String uri) {}
public Element getCopy(String name, String prefix, String uri) {}
/////////////////////////////////////////////////////////////////
// Basic Utility Methods
/////////////////////////////////////////////////////////////////
public final String toString() {}
public final String getSerializedForm() {} // will be removed
public final boolean equals(Object ob) {}
public final int hashCode() {}
public final Object clone() {}
}
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;

public class XCount {

  public static void main(String[] args) {

    if (args.length == 0) {
      System.out.println("Usage: java XCount URL1 URL2...");
    }

    SAXBuilder builder = new SAXBuilder();
    System.out.println(
     "File\tElements\tAttributes\tComments\tProcessing Instructions\tCharacters");

    // start parsing...
    for (int i = 0; i < args.length; i++) {
      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        System.out.print(args[i] + ":\t");
        String result = count(doc);
        System.out.println(result);
      }
      catch (JDOMException e) { // indicates a well-formedness or other error
        System.out.println(args[i] + " is not a well formed XML document.");
        System.out.println(e.getMessage());
      }
    }
  }

  private static int numCharacters = 0;
  private static int numComments = 0;
  private static int numElements = 0;
  private static int numAttributes = 0;
  private static int numProcessingInstructions = 0;

  public static String count(Document doc) {
    numCharacters = 0;
    numComments = 0;
    numElements = 0;
    numAttributes = 0;
    numProcessingInstructions = 0;
    List children = doc.getMixedContent();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Element) {
        numElements++;
        count((Element) o);
      }
      else if (o instanceof Comment) numComments++;
      else if (o instanceof ProcessingInstruction) numProcessingInstructions++;
    }
    String result = numElements + "\t" + numAttributes + "\t"
     + numComments + "\t" + numProcessingInstructions + "\t" + numCharacters;
    return result;
  }

  public static void count(Element element) {
    List attributes = element.getAttributes();
    numAttributes += attributes.size();
    List children = element.getMixedContent();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Element) {
        numElements++;
        count((Element) o);
      }
      else if (o instanceof Comment) numComments++;
      else if (o instanceof ProcessingInstruction) numProcessingInstructions++;
      else if (o instanceof String) {
        String s = (String) o;
        numCharacters += s.length();
      }
    }
  }
}
% java XCount shortlogs.xml hotcop.xml
File            Elements  Attributes  Comments  Processing Instructions  Characters
shortlogs.xml:  30        0           0         0                        736
hotcop.xml:     11        8           2         1                        95
Each attribute is represented as an Attribute object
Each Attribute has:
A local name, a String
A value, a String
A Namespace object (which may be Namespace.NO_NAMESPACE)
Everything else can be determined from these three items.
Convenience methods can convert the attribute value to various types like int or double
JDOM enforces restrictions on attribute names and values; e.g. value may not contain < or >
Attributes are stored in a java.util.List in the Element that contains them
This list only contains Attribute objects.
package org.jdom;
public class Attribute implements Serializable, Cloneable {
protected String name;
protected Namespace namespace;
protected String value;
protected Attribute() {}
public Attribute(String name, String value, Namespace namespace) {}
public Attribute(String name, String prefix, String uri, String value) {}
public Attribute(String name, String value) {}
public String getName() {}
public String getQualifiedName() {}
public String getNamespacePrefix() {}
public String getNamespaceURI() {}
public Namespace getNamespace() {}
public String getValue() {}
public void setValue(String value) {}
/////////////////////////////////////////////////////////////////
// Basic Utility Methods
/////////////////////////////////////////////////////////////////
public final String toString() {}
public final boolean equals(Object ob) {}
public final int hashCode() {}
public final Object clone() {}
/////////////////////////////////////////////////////////////////
// Convenience Methods below here
/////////////////////////////////////////////////////////////////
public String getValue(String defaultValue) {}
public int getIntValue(int defaultValue) {}
public int getIntValue() throws DataConversionException {}
public long getLongValue(long defaultValue) {}
public long getLongValue() throws DataConversionException {}
public float getFloatValue(float defaultValue) {}
public float getFloatValue() throws DataConversionException {}
public double getDoubleValue(double defaultValue) {}
public double getDoubleValue() throws DataConversionException {}
public boolean getBooleanValue(boolean defaultValue) {}
public boolean getBooleanValue() throws DataConversionException {}
public char getCharValue(char defaultValue) {}
public char getCharValue() throws DataConversionException {}
}
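The convenience methods come in two flavors: one that falls back to a default and one that throws when the value cannot be converted. The sketch below imitates that pattern on a plain String so it runs without JDOM. The class AttributeValues is hypothetical, and the strict version throws IllegalArgumentException as a stand-in for DataConversionException; JDOM's own parsing rules may differ in detail.

```java
// Hypothetical stand-in for the two styles of conversion; not JDOM code.
final class AttributeValues {

  /** Lenient style: returns defaultValue when the text is missing or not an int. */
  static int intValue(String value, int defaultValue) {
    if (value == null) return defaultValue;
    try {
      return Integer.parseInt(value.trim());
    } catch (NumberFormatException e) {
      return defaultValue;
    }
  }

  /** Strict style: throws when the text is missing or not an int. */
  static int intValue(String value) {
    if (value == null) throw new IllegalArgumentException("no value");
    try {
      return Integer.parseInt(value.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("not an int: " + value, e);
    }
  }

  public static void main(String[] args) {
    // WIDTH="100" and ALT="Victor Willis in Cop Outfit" come from the PHOTO element
    System.out.println(intValue("100", -1));
    System.out.println(intValue("Victor Willis in Cop Outfit", -1));
  }
}
```

The lenient form suits optional attributes such as WIDTH and HEIGHT; the strict form suits values the program cannot proceed without.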
import java.io.IOException; import org.jdom.*; import org.jdom.input.SAXBuilder; import org.jdom.output.XMLOutputter; import java.util.*; public class IDTagger { private static int id = 1; public static void processElement(Element element) { if (element.getAttribute("ID") == null) { element.addAttribute(new Attribute("ID", "_" + id)); id = id + 1; } // recursion List children = element.getChildren(); Iterator iterator = children.iterator(); while (iterator.hasNext()) { processElement((Element) iterator.next()); } } public static void main(String[] args) { SAXBuilder builder = new SAXBuilder(); for (int i = 0; i < args.length; i++) { try { // Read the entire document into memory Document document = builder.build(args[i]); processElement(document.getRootElement()); // now we serialize the document... XMLOutputter serializer = new XMLOutputter(); serializer.output(document, System.out); System.out.flush(); } catch (JDOMException e) { System.err.println(e); continue; } catch (IOException e) { System.err.println(e); continue; } } } // end main }
<?xml version="1.0"?><backslash xmlns:backslash="http://slashdot.org/backslash.dtd"> <story> <title>The Onion to buy the New York Times</title> <url>http://slashdot.org/articles/00/02/19/1128240.shtml</url> <time>2000-02-19 17:25:15</time> <author>CmdrTaco</author> <department>stuff-to-read</department> <topic>media</topic> <comments>20</comments> <section>articles</section> <image>topicmedia.gif</image> </story> <story> <title>Al Gore's Webmaster Answers Your Questions</title> <url>http://slashdot.org/interviews/00/02/19/0932207.shtml</url> <time>2000-02-19 17:00:52</time> <author>Roblimo</author> <department>political-process-online</department> <topic>usa</topic> <comments>49</comments> <section>interviews</section> <image>topicus.gif</image> </story> <story> <title>Open Source Africa</title> <url>http://slashdot.org/articles/00/02/19/1016216.shtml</url> <time>2000-02-19 16:05:58</time> <author>emmett</author> <department>songs-by-toto</department> <topic>linux</topic> <comments>50</comments> <section>articles</section> <image>topiclinux.gif</image> </story> <story> <title>Microsoft Funded by NSA, Helps Spy on Win Users?</title> <url>http://slashdot.org/articles/00/02/19/0750247.shtml</url> <time>2000-02-19 14:07:04</time> <author>Roblimo</author> <department>deep-dark-conspiracy-theories</department> <topic>microsoft</topic> <comments>154</comments> <section>articles</section> <image>topicms.gif</image> </story> <story> <title>X-Men Trailer Released</title> <url>http://slashdot.org/articles/00/02/18/0829209.shtml</url> <time>2000-02-19 13:47:06</time> <author>emmett</author> <department>mutant</department> <topic>movies</topic> <comments>70</comments> <section>articles</section> <image>topicmovies.gif</image> </story> <story> <title>Connell Replies to "Grok" Comments</title> <url>http://slashdot.org/articles/00/02/18/202240.shtml</url> <time>2000-02-19 05:01:37</time> <author>Hemos</author> <department>replying-to-things</department> <topic>linux</topic> 
<comments>197</comments> <section>articles</section> <image>topiclinux.gif</image> </story> <story> <title>etoy.com Returns</title> <url>http://slashdot.org/yro/00/02/18/1739216.shtml</url> <time>2000-02-19 02:35:06</time> <author>nik</author> <department>NP:-gimme-shelter</department> <topic>internet</topic> <comments>77</comments> <section>yro</section> <image>topicinternet.jpg</image> </story> <story> <title>New Propaganda Series: Rebirth</title> <url>http://slashdot.org/articles/00/02/18/205232.shtml</url> <time>2000-02-19 01:05:26</time> <author>Hemos</author> <department>as-pretty-as-always</department> <topic>graphics</topic> <comments>120</comments> <section>articles</section> <image>topicgraphics3.gif</image> </story> <story> <title>Giving Back</title> <url>http://slashdot.org/features/00/02/18/1631224.shtml</url> <time>2000-02-18 22:27:26</time> <author>emmett</author> <department>salvation-army</department> <topic>news</topic> <comments>122</comments> <section>features</section> <image>topicnews.gif</image> </story> <story> <title>Connectix Considering Open Sourcing VGS?</title> <url>http://slashdot.org/articles/00/02/18/1050225.shtml</url> <time>2000-02-18 20:46:20</time> <author>emmett</author> <department>grain-of-salt</department> <topic>news</topic> <comments>93</comments> <section>articles</section> <image>topicnews.gif</image> </story> </backslash>
<?xml version="1.0" encoding="UTF-8"?> <backslash ID="_1"> <story ID="_2"> <title ID="_3">The Onion to buy the New York Times</title> <url ID="_4">http://slashdot.org/articles/00/02/19/1128240.shtml</url> <time ID="_5">2000-02-19 17:25:15</time> <author ID="_6">CmdrTaco</author> <department ID="_7">stuff-to-read</department> <topic ID="_8">media</topic> <comments ID="_9">20</comments> <section ID="_10">articles</section> <image ID="_11">topicmedia.gif</image> </story> <story ID="_12"> <title ID="_13">Al Gore's Webmaster Answers Your Questions</title> <url ID="_14">http://slashdot.org/interviews/00/02/19/0932207.shtml</url> <time ID="_15">2000-02-19 17:00:52</time> <author ID="_16">Roblimo</author> <department ID="_17">political-process-online</department> <topic ID="_18">usa</topic> <comments ID="_19">49</comments> <section ID="_20">interviews</section> <image ID="_21">topicus.gif</image> </story> <story ID="_22"> <title ID="_23">Open Source Africa</title> <url ID="_24">http://slashdot.org/articles/00/02/19/1016216.shtml</url> <time ID="_25">2000-02-19 16:05:58</time> <author ID="_26">emmett</author> <department ID="_27">songs-by-toto</department> <topic ID="_28">linux</topic> <comments ID="_29">50</comments> <section ID="_30">articles</section> <image ID="_31">topiclinux.gif</image> </story> <story ID="_32"> <title ID="_33">Microsoft Funded by NSA, Helps Spy on Win Users?</title> <url ID="_34">http://slashdot.org/articles/00/02/19/0750247.shtml</url> <time ID="_35">2000-02-19 14:07:04</time> <author ID="_36">Roblimo</author> <department ID="_37">deep-dark-conspiracy-theories</department> <topic ID="_38">microsoft</topic> <comments ID="_39">154</comments> <section ID="_40">articles</section> <image ID="_41">topicms.gif</image> </story> <story ID="_42"> <title ID="_43">X-Men Trailer Released</title> <url ID="_44">http://slashdot.org/articles/00/02/18/0829209.shtml</url> <time ID="_45">2000-02-19 13:47:06</time> <author ID="_46">emmett</author> <department 
ID="_47">mutant</department> <topic ID="_48">movies</topic> <comments ID="_49">70</comments> <section ID="_50">articles</section> <image ID="_51">topicmovies.gif</image> </story> <story ID="_52"> <title ID="_53">Connell Replies to "Grok" Comments</title> <url ID="_54">http://slashdot.org/articles/00/02/18/202240.shtml</url> <time ID="_55">2000-02-19 05:01:37</time> <author ID="_56">Hemos</author> <department ID="_57">replying-to-things</department> <topic ID="_58">linux</topic> <comments ID="_59">197</comments> <section ID="_60">articles</section> <image ID="_61">topiclinux.gif</image> </story> <story ID="_62"> <title ID="_63">etoy.com Returns</title> <url ID="_64">http://slashdot.org/yro/00/02/18/1739216.shtml</url> <time ID="_65">2000-02-19 02:35:06</time> <author ID="_66">nik</author> <department ID="_67">NP:-gimme-shelter</department> <topic ID="_68">internet</topic> <comments ID="_69">77</comments> <section ID="_70">yro</section> <image ID="_71">topicinternet.jpg</image> </story> <story ID="_72"> <title ID="_73">New Propaganda Series: Rebirth</title> <url ID="_74">http://slashdot.org/articles/00/02/18/205232.shtml</url> <time ID="_75">2000-02-19 01:05:26</time> <author ID="_76">Hemos</author> <department ID="_77">as-pretty-as-always</department> <topic ID="_78">graphics</topic> <comments ID="_79">120</comments> <section ID="_80">articles</section> <image ID="_81">topicgraphics3.gif</image> </story> <story ID="_82"> <title ID="_83">Giving Back</title> <url ID="_84">http://slashdot.org/features/00/02/18/1631224.shtml</url> <time ID="_85">2000-02-18 22:27:26</time> <author ID="_86">emmett</author> <department ID="_87">salvation-army</department> <topic ID="_88">news</topic> <comments ID="_89">122</comments> <section ID="_90">features</section> <image ID="_91">topicnews.gif</image> </story> <story ID="_92"> <title ID="_93">Connectix Considering Open Sourcing VGS?</title> <url ID="_94">http://slashdot.org/articles/00/02/18/1050225.shtml</url> <time 
ID="_95">2000-02-18 20:46:20</time> <author ID="_96">emmett</author> <department ID="_97">grain-of-salt</department> <topic ID="_98">news</topic> <comments ID="_99">93</comments> <section ID="_100">articles</section> <image ID="_101">topicnews.gif</image> </story> </backslash>
Unparsed entities really aren't handled at all.
In general, the parser resolves parsed entities and you never see them.
When writing, the outputter outputs entity references but not the entity's content.
The Entity class represents a parsed entity.
The API is mostly like the API of Element.
The Entity class is still being thought out.
package org.jdom;
public class Entity implements Serializable, Cloneable {
protected String name;
protected List content;
protected Entity() {}
public Entity(String name) {}
public String getName() {}
public String getContent() {}
public Entity setContent(String textContent) {}
public boolean hasMixedContent() {}
public List getMixedContent() {}
public Entity setMixedContent(List mixedContent) {}
public List getChildren() {}
public Entity setChildren(List children) {}
public Entity addChild(Element element) {}
public Entity addChild(String s) {}
public Entity addText(String text) {}
public final String toString() {}
public final String getSerializedForm() {} // will be removed
public final boolean equals(Object ob) {}
public final int hashCode() {}
public final Object clone() {}
}
A Comment object represents a comment like this example from the XML 1.0 spec:
<!--* N.B. some readers (notably JC) find the following
paragraph awkward and redundant. I agree it's logically redundant:
it *says* it is summarizing the logical implications of
matching the grammar, and that means by definition it's
logically redundant. I don't think it's rhetorically
redundant or unnecessary, though, so I'm keeping it. It
could however use some recasting when the editors are feeling
stronger. -MSM *-->
No children
JDOM checks the content to make sure it's legal (i.e. does not contain a double-hyphen)
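The legality rule for comment text can be sketched in a few lines of plain Java. This is a minimal sketch of the XML 1.0 rule, not JDOM's own code: the class name CommentTextCheck is hypothetical, and returning null for legal text with a reason string otherwise is an assumed convention. It does not check for characters that are illegal anywhere in XML.

```java
// Hypothetical helper, not part of JDOM: a sketch of the XML 1.0 rule for comment text.
class CommentTextCheck {

    // Returns null if the text is legal comment content, otherwise a reason string.
    static String check(String text) {
        if (text == null) {
            return "Comment text must not be null";
        }
        if (text.contains("--")) {
            return "Comment text must not contain a double hyphen (--)";
        }
        if (text.endsWith("-")) {
            // "<!--" + text + "-->" would end in "--->", which XML 1.0 forbids
            return "Comment text must not end with a hyphen";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(check("a legal comment"));
        System.out.println(check("not -- legal"));
    }
}
```

The trailing-hyphen case is easy to miss: the text itself contains no double hyphen, but the serialized comment would.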
package org.jdom;
public class Comment implements Serializable, Cloneable {
protected String text;
protected Comment() {}
public Comment(String text) {}
public String getText() {}
public void setText(String text) {}
public final String toString() {}
public final String getSerializedForm() {} // will be removed
public final boolean equals(Object ob) {}
public final int hashCode() {}
public final Object clone() {}
}
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.util.*;

public class CommentReader {

  public static void main(String[] args) {

    SAXBuilder builder = new SAXBuilder();

    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        Document doc = builder.build(args[i]);
        List content = doc.getMixedContent();
        Iterator iterator = content.iterator();
        while (iterator.hasNext()) {
          Object o = iterator.next();
          if (o instanceof Comment) {
            Comment c = (Comment) o;
            System.out.println(c.getText());
            System.out.println();
          }
          else if (o instanceof Element) {
            processElement((Element) o);
          }
        }
      }
      catch (JDOMException e) {
        System.err.println(e);
        // there may be no root cause, so check before using it
        Throwable cause = e.getRootCause();
        if (cause != null) cause.printStackTrace();
      }
    }

  } // end main

  // note use of recursion
  public static void processElement(Element element) {

    List content = element.getMixedContent();
    Iterator iterator = content.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Comment) {
        Comment c = (Comment) o;
        System.out.println(c.getText());
        System.out.println();
      }
      else if (o instanceof Element) {
        processElement((Element) o);
      }
    } // end while

  }

}
% java CommentReader hotcop.xml
The publisher is actually Polygram but I needed
an example of a general entity reference.
You can tell what album I was
listening to when I wrote this example
Represents a processing instruction like
<?robots index="yes" follow="no"?>
No children
Some have pseudo-attributes; some don't:
<?php
mysql_connect("database.unc.edu", "clerk", "password");
$result = mysql("music", "SELECT LastName, FirstName FROM Employees
ORDER BY LastName, FirstName");
$i = 0;
while ($i < mysql_numrows ($result)) {
$fields = mysql_fetch_row($result);
echo "<person>$fields[1] $fields[0] </person>\r\n";
$i++;
}
mysql_close();
?>
A ProcessingInstruction is represented as either
Target and Value
Target and Pseudo-attributes
As usual, JDOM checks the contents of each ProcessingInstruction object for well-formedness
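The "target and pseudo-attributes" view can be illustrated by splitting a processing instruction's data into name/value pairs. This is a minimal sketch in plain Java, not JDOM's implementation: the class PseudoAttributes and its regular expression are assumptions made for illustration, and the pattern only accepts simple ASCII names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not JDOM's implementation.
class PseudoAttributes {

    // Matches name="value" or name='value'
    private static final Pattern PAIR =
        Pattern.compile("([A-Za-z_][\\w.-]*)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    // Returns the pseudo-attributes found in the data, in document order.
    static Map<String, String> parse(String data) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(data);
        while (m.find()) {
            // group 3 holds a double-quoted value, group 4 a single-quoted one
            String value = m.group(3) != null ? m.group(3) : m.group(4);
            result.put(m.group(1), value);
        }
        return result;
    }

    public static void main(String[] args) {
        // The data of <?robots index="yes" follow="no"?>
        System.out.println(parse("index=\"yes\" follow=\"no\""));
    }
}
```

Data that contains no pairs produces an empty map, which is why the raw "target and value" view is still needed for instructions that do not use pseudo-attributes.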
package org.jdom;
public class ProcessingInstruction implements Serializable, Cloneable {
protected String target;
protected String rawData;
protected Map mapData;
protected ProcessingInstruction() {}
public ProcessingInstruction(String target, Map data) {}
public ProcessingInstruction(String target, String data) {}
public String getTarget() {}
public String getData() {}
public ProcessingInstruction setData(String data) {}
public ProcessingInstruction setData(Map data) {}
public String getValue(String name) {}
public ProcessingInstruction setValue(String name, String value) {}
public boolean removeValue(String name) {}
public final String toString() {}
public final String getSerializedForm() {} // will be removed
public final boolean equals(Object ob) {}
public final int hashCode() {}
public final Object clone() {}
}
import java.io.*;
import java.util.*;
import org.jdom.*;
import org.jdom.input.SAXBuilder;

public class XLinkSpider {

  private static SAXBuilder builder = new SAXBuilder();

  private static Vector visited = new Vector();

  private static int maxDepth = 5;
  private static int currentDepth = 0;

  public static void listURIs(String systemID) {

    currentDepth++;
    try {
      if (currentDepth < maxDepth) {
        Document document = builder.build(systemID);

        // check to see if we're allowed to spider
        boolean index = true;
        boolean follow = true;
        ProcessingInstruction robots
         = document.getProcessingInstruction("robots");
        if (robots != null) {
          // either pseudo-attribute may be missing,
          // so compare from the literal to avoid a NullPointerException
          String indexValue = robots.getValue("index");
          if ("no".equalsIgnoreCase(indexValue)) index = false;
          String followValue = robots.getValue("follow");
          if ("no".equalsIgnoreCase(followValue)) follow = false;
        }

        Vector uris = new Vector();
        // search the document for uris,
        // store them in vector, and print them
        if (follow) searchForURIs(document.getRootElement(), uris);

        Enumeration e = uris.elements();
        while (e.hasMoreElements()) {
          String uri = (String) e.nextElement();
          visited.addElement(uri);
          if (index) listURIs(uri);
        }
      }
    }
    catch (JDOMException e) {
      // couldn't load the document,
      // probably not well-formed XML, skip it
    }
    finally {
      currentDepth--;
      System.out.flush();
    }

  }

  private static Namespace xlink
   = Namespace.getNamespace("http://www.w3.org/1999/xlink");

  // use recursion
  public static void searchForURIs(Element element, Vector uris) {

    // look for XLinks in this element
    String uri = element.getAttributeValue("href", xlink);
    if (uri != null && !uri.equals("")
     && !visited.contains(uri) && !uris.contains(uri)) {
      System.out.println(uri);
      uris.addElement(uri);
    }

    // process child elements recursively
    List children = element.getChildren();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      searchForURIs((Element) iterator.next(), uris);
    }

  }

  public static void main(String[] args) {

    if (args.length == 0) {
      System.out.println("Usage: java XLinkSpider URL1 URL2...");
      return;
    }

    // start parsing...
    for (int i = 0; i < args.length; i++) {
      System.err.println(args[i]);
      listURIs(args[i]);
    } // end for

  } // end main

} // end XLinkSpider
JDOM is fully namespace aware
Namespaces are represented by instances of the Namespace class rather than by attributes or raw strings
Always ask for elements and attributes by local names and namespace URIs
Elements and attributes that are not in any namespace can be asked for by local name alone
Never identify an element or attribute by qualified name
Mostly for internal parser use
Occasionally useful for tasks like finding out whether a document contains any XLinks
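The advice to match on namespace URI rather than on prefix can be sketched with the JDK's own namespace-aware DOM parser. This swaps JDOM for the standard javax.xml.parsers API so the example is self-contained; the class name XLinkDetector is hypothetical. It reports whether any attribute in a document belongs to the XLink namespace, whatever prefix the author chose.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper built on the JDK DOM API rather than on JDOM.
class XLinkDetector {

    static final String XLINK_URI = "http://www.w3.org/1999/xlink";

    // True if any attribute anywhere in the document is in the XLink namespace.
    static boolean containsXLinks(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return hasXLinkAttribute(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }

    private static boolean hasXLinkAttribute(Node node) {
        NamedNodeMap attributes = node.getAttributes();
        if (attributes != null) {
            for (int i = 0; i < attributes.getLength(); i++) {
                // Compare namespace URIs, never prefixes or qualified names
                if (XLINK_URI.equals(attributes.item(i).getNamespaceURI())) return true;
            }
        }
        for (Node child = node.getFirstChild(); child != null;
             child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE && hasXLinkAttribute(child)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "<a xmlns:x=\"http://www.w3.org/1999/xlink\" x:href=\"b.xml\"/>";
        System.out.println(containsXLinks(doc));
    }
}
```

A search for the literal name xlink:href would miss the document in main(), which binds the XLink namespace to the prefix x.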
package org.jdom;
public final class Namespace {
public static final Namespace NO_NAMESPACE = new Namespace("", "");
public static final Namespace XML_NAMESPACE =
new Namespace("xml", "http://www.w3.org/XML/1998/namespace");
// factory methods
public static Namespace getNamespace(String prefix, String uri) {}
public static Namespace getNamespace(String uri) {}
// getter methods
public String getPrefix() {}
public String getURI() {}
// utility methods
public boolean equals(Object ob) {}
public String toString() {}
public int hashCode() {}
}
Represents a document type declaration
Has no children
package org.jdom;
public class DocType implements Serializable, Cloneable {
protected String elementName;
protected String publicID;
protected String systemID;
protected DocType() {}
public DocType(String rootElementName, String publicID, String systemID) {}
public DocType(String rootElementName, String systemID) {}
public DocType(String rootElementName) {}
public String getElementName() {}
public String getPublicID() {}
public DocType setPublicID(String publicID) {}
public String getSystemID() {}
public DocType setSystemID(String systemID) {}
// Usual utility methods
public final String toString() {}
public final String getSerializedForm() {} // will be removed
public final boolean equals(Object ob) {}
public final int hashCode() {}
public final Object clone() {}
}
Verify that a document is correct XHTML
From the XHTML 1.0 spec:
It must validate against one of the three DTDs found in Appendix A.
The root element of the document must be <html>.
The root element of the document must designate the XHTML namespace using the xmlns attribute [XMLNAMES]. The namespace for XHTML is defined to be http://www.w3.org/1999/xhtml.
There must be a DOCTYPE declaration in the document prior to the root element. The public identifier included in the DOCTYPE declaration must reference one of the three DTDs found in Appendix A using the respective Formal Public Identifier. The system identifier may be changed to reflect local system conventions.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "DTD/xhtml1-transitional.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "DTD/xhtml1-frameset.dtd">
import java.io.*;
import org.jdom.*;
import org.jdom.input.SAXBuilder;

public class XHTMLValidator {

  public static void main(String[] args) {

    for (int i = 0; i < args.length; i++) {
      validate(args[i]);
    }

  }

  // passing true turns on validation
  private static SAXBuilder builder = new SAXBuilder(true);

  // not thread safe
  public static void validate(String source) {

    Document document;
    try {
      document = builder.build(source);
    }
    catch (JDOMException e) {
      System.out.println("Error: " + e.getMessage());
      e.printStackTrace();
      return;
    }

    // If we get this far, then the document is valid XML.
    // Check to see whether the document is actually XHTML
    DocType doctype = document.getDocType();
    if (doctype == null) {
      System.out.println("No DOCTYPE");
      return;
    }

    String name = doctype.getElementName();
    String systemID = doctype.getSystemID();
    String publicID = doctype.getPublicID();

    if (!name.equals("html")) {
      System.out.println("Incorrect root element name " + name);
    }

    if (publicID == null
     || (!publicID.equals("-//W3C//DTD XHTML 1.0 Strict//EN")
      && !publicID.equals("-//W3C//DTD XHTML 1.0 Transitional//EN")
      && !publicID.equals("-//W3C//DTD XHTML 1.0 Frameset//EN"))) {
      System.out.println(source + " does not seem to use an XHTML 1.0 DTD");
    }

    // Check the namespace on the root element
    Element root = document.getRootElement();
    Namespace namespace = root.getNamespace();
    String prefix = namespace.getPrefix();
    String uri = namespace.getURI();
    if (!uri.equals("http://www.w3.org/1999/xhtml")) {
      System.out.println(source + " does not properly declare the"
       + " http://www.w3.org/1999/xhtml namespace"
       + " on the root element");
    }
    if (!prefix.equals("")) {
      System.out.println(source + " does not use the empty prefix for XHTML");
    }

  }

}
% java XHTMLValidator http://www.w3.org/TR/xhtml1
Error: File "http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not found.: Error on
line -1 of XML document: File "http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not
found.
org.jdom.JDOMException: File "http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not
found.: Error on line -1 of XML document: File
"http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not found.
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:227)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:359)
at XHTMLValidator.validate(XHTMLValidator.java:25)
at XHTMLValidator.main(XHTMLValidator.java:11)
Root cause: org.jdom.JDOMException: Error on line -1 of XML document: File
"http://www.w3.org/TR/DTD/xhtml1-strict.dtd" not found.
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:228)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:359)
at XHTMLValidator.validate(XHTMLValidator.java:25)
at XHTMLValidator.main(XHTMLValidator.java:11)
Checks a variety of strings to see if they're legal for particular uses in XML as specified by XML 1.0 and Namespaces in XML.
Mostly for internal parser use
package org.jdom;
public final class Verifier {
public static final String checkElementName(String name) {}
public static final String checkAttributeName(String name) {}
public static final String checkCharacterData(String text) {}
public static final String checkNamespacePrefix(String prefix) {}
public static final String checkNamespaceURI(String uri) {}
public static final String checkProcessingInstructionTarget(String target) {}
public static final String checkCommentData(String data) {}
public static boolean isXMLCharacter(char c) {}
public static boolean isXMLNameCharacter(char c) {}
public static boolean isXMLNameStartCharacter(char c) {}
public static boolean isXMLLetterOrDigit(char c) {}
public static boolean isXMLLetter(char c) {}
public static boolean isXMLCombiningChar(char c) {}
public static boolean isXMLExtender(char c) {}
public static boolean isXMLDigit(char c) {}
}
A checked exception so you must catch it
Wraps other exceptions that are thrown during JDOM operations, like IOException or SAXException
Root cause of exception (if any) is accessible through the getRootCause() method:
public Throwable getRootCause()
Subclasses:
DataConversionException
NoSuchAttributeException
NoSuchChildException
NoSuchProcessingInstructionException
IllegalArgumentException subclasses:
IllegalAddException
IllegalDataException
IllegalNameException
IllegalTargetException
package org.jdom;
public class JDOMException extends Exception {
protected Throwable rootCause;
public JDOMException() {}
public JDOMException(String message) {}
public JDOMException(String message, Throwable rootCause) {}
public String getMessage() {}
public void printStackTrace() {}
public void printStackTrace(PrintStream s) {}
public void printStackTrace(PrintWriter w) {}
public Throwable getRootCause() {}
}
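The wrapping pattern behind getRootCause() can be sketched without JDOM. WrappingException below is a hypothetical stand-in that mirrors only the message-plus-root-cause constructor from the listing above; it is not the JDOM class. The point it illustrates is that the root cause is optional, so callers should check it for null before using it.

```java
import java.io.IOException;

// Hypothetical stand-in for the wrapping pattern; not the JDOM class itself.
class WrappingException extends Exception {

    private final Throwable rootCause;

    WrappingException(String message, Throwable rootCause) {
        super(message);
        this.rootCause = rootCause;
    }

    // May return null when nothing was wrapped.
    Throwable getRootCause() {
        return rootCause;
    }

    // Builds a one-line description, checking the root cause for null first.
    static String describe(WrappingException e) {
        Throwable cause = e.getRootCause();
        if (cause == null) return e.getMessage();
        return e.getMessage() + " (caused by " + cause.getClass().getName() + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(new WrappingException("parse failed", null)));
        System.out.println(describe(
            new WrappingException("parse failed", new IOException("disk error"))));
    }
}
```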
DOMOutputter
SAXOutputter
XMLOutputter
The process of taking an in-memory JDOM Document and converting it to a stream of characters that can be written onto an output stream
The org.jdom.output.XMLOutputter class
This class is still undergoing API changes.
package org.jdom.output;
public class XMLOutputter implements Cloneable {
protected static final String STANDARD_INDENT = " ";
public XMLOutputter() {}
public XMLOutputter(String indent) {}
public XMLOutputter(String indent, boolean newlines) {}
public XMLOutputter(String indent, boolean newlines, String encoding) {}
public XMLOutputter(XMLOutputter that) {}
public void setLineSeparator(String separator) {}
public void setNewlines(boolean newlines) {}
public void setEncoding(String encoding) {}
public void setOmitEncoding(boolean omitEncoding) {}
public void setSuppressDeclaration(boolean suppressDeclaration) {}
public void setExpandEmptyElements(boolean expandEmptyElements) {}
public void setTrimText(boolean trimText) {}
public void setPadText(boolean padText) {}
public void setIndent(String indent) {}
public void setIndent(boolean doIndent) {}
public void setIndentLevel(int indentLevel) {}
public void setIndentSize(int indentSize) {}
protected void indent(Writer out, int level) throws IOException {}
protected void maybePrintln(Writer out) throws IOException {}
protected Writer makeWriter(OutputStream out)
throws java.io.UnsupportedEncodingException {}
protected Writer makeWriter(OutputStream out, String encoding)
throws java.io.UnsupportedEncodingException {}
public void output(Document doc, OutputStream out) throws IOException {}
public void output(Document doc, Writer writer) throws IOException {}
public void output(Element element, Writer out) throws IOException {}
public void output(Element element, OutputStream out) {}
public void outputElementContent(Element element, Writer out) throws IOException {}
public void output(CDATA cdata, Writer out) throws IOException {}
public void output(CDATA cdata, OutputStream out) throws IOException {}
public void output(Comment comment, Writer out) throws IOException {}
public void output(Comment comment, OutputStream out) throws IOException {}
public void output(String string, Writer out) throws IOException {}
public void output(String string, OutputStream out) throws IOException {}
public void output(Entity entity, Writer out) throws IOException {}
public void output(Entity entity, OutputStream out) throws IOException {}
public void output(ProcessingInstruction processingInstruction, Writer out)
throws IOException {}
public void output(ProcessingInstruction processingInstruction, OutputStream out)
throws IOException {}
public String outputString(Document doc) throws IOException {}
public String outputString(Element element) throws IOException {}
// internal printing methods
protected void printDeclaration(Document doc, Writer out, String encoding)
throws IOException {}
protected void printDocType(DocType docType, Writer out) throws IOException {}
protected void printComment(Comment comment, Writer out, int indentLevel)
throws IOException {}
protected void printProcessingInstruction(ProcessingInstruction pi,
Writer out, int indentLevel) throws IOException {}
protected void printCDATASection(CDATA cdata, Writer out, int indentLevel)
throws IOException {}
protected void printElement(Element element, Writer out,
int indentLevel, NamespaceStack namespaces) throws IOException {}
protected void printElementContent(Element element, Writer out,
int indentLevel, NamespaceStack namespaces, List mixedContent)
throws IOException {}
protected void printString(String s, Writer out) throws IOException {}
protected void printEntity(Entity entity, Writer out) throws IOException {}
protected void printNamespace(Namespace ns, Writer out) throws IOException {}
protected void printAttributes(List attributes, Element parent,
Writer out, NamespaceStack namespaces)
throws IOException {}
public int parseArgs(String[] args, int i) {}
}
Configured with three variables passed to the constructor:
indent: String added at each level of output; e.g. two spaces or a tab
lineSeparator: String to break lines with; no line breaking is performed if this is null or the empty string
encoding
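How an indent string and a line separator shape the output can be shown with a small hand-rolled printer. This is a sketch over the JDK's DOM API, not XMLOutputter's code: the class IndentSketch is hypothetical, and it ignores attributes, escaping, and mixed content to stay short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hand-rolled illustration; this is not XMLOutputter's code.
class IndentSketch {

    // Parses a small XML string with the JDK's DOM parser.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse: " + xml, e);
        }
    }

    // A null indent or separator is treated as the empty string.
    static String print(Element element, String indent, String lineSeparator) {
        StringBuilder out = new StringBuilder();
        print(element, indent == null ? "" : indent,
              lineSeparator == null ? "" : lineSeparator, 0, out);
        return out.toString();
    }

    private static void print(Element element, String indent,
                              String lineSeparator, int level, StringBuilder out) {
        // One copy of the indent string per nesting level
        for (int i = 0; i < level; i++) out.append(indent);
        out.append('<').append(element.getTagName()).append('>');
        boolean hasChildElements = false;
        for (Node child = element.getFirstChild(); child != null;
             child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                // Break the line once, before the first child element
                if (!hasChildElements) out.append(lineSeparator);
                hasChildElements = true;
                print((Element) child, indent, lineSeparator, level + 1, out);
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                out.append(child.getNodeValue().trim());
            }
        }
        if (hasChildElements) {
            for (int i = 0; i < level; i++) out.append(indent);
        }
        out.append("</").append(element.getTagName()).append('>').append(lineSeparator);
    }

    public static void main(String[] args) {
        Element root = parse("<a><b>x</b><c>y</c></a>");
        System.out.print(print(root, "  ", "\n"));
    }
}
```

With an empty indent and a null line separator the same document comes out on a single line, which matches the "no line breaking" rule described above.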
Options can be set with these twelve methods:
public void setLineSeparator(String separator) {}
public void setNewlines(boolean newlines) {}
public void setEncoding(String encoding) {}
public void setOmitEncoding(boolean omitEncoding) {}
public void setSuppressDeclaration(boolean suppressDeclaration) {}
public void setExpandEmptyElements(boolean expandEmptyElements) {}
public void setTrimText(boolean trimText) {}
public void setPadText(boolean padText) {}
public void setIndent(String indent) {}
public void setIndent(boolean doIndent) {}
public void setIndentLevel(int indentLevel) {}
public void setIndentSize(int indentSize) {}
The output() method writes a Document onto a given OutputStream or Writer:
public void output(Document doc, OutputStream out) throws IOException {}
public void output(Document doc, Writer writer) throws IOException {}
There are also output() methods for other JDOM classes:
public void output(Element element, Writer out) throws IOException {}
public void output(Element element, OutputStream out) {}
public void outputElementContent(Element element, Writer out) throws IOException {}
public void output(CDATA cdata, Writer out) throws IOException {}
public void output(CDATA cdata, OutputStream out) throws IOException {}
public void output(Comment comment, Writer out) throws IOException {}
public void output(Comment comment, OutputStream out) throws IOException {}
public void output(String string, Writer out) throws IOException {}
public void output(String string, OutputStream out) throws IOException {}
public void output(Entity entity, Writer out) throws IOException {}
public void output(Entity entity, OutputStream out) throws IOException {}
public void output(ProcessingInstruction processingInstruction, Writer out)
throws IOException {}
public void output(ProcessingInstruction processingInstruction, OutputStream out)
throws IOException {}
public String outputString(Document doc) throws IOException {}
public String outputString(Element element) throws IOException {}
Configured by overriding protected methods:
protected void printDeclaration(Document doc, Writer out, String encoding)
throws IOException {}
protected void printDocType(DocType docType, Writer out) throws IOException {}
protected void printComment(Comment comment, Writer out, int indentLevel)
throws IOException {}
protected void printProcessingInstruction(ProcessingInstruction pi,
Writer out, int indentLevel) throws IOException {}
protected void printCDATASection(CDATA cdata, Writer out, int indentLevel)
throws IOException {}
protected void printElement(Element element, Writer out,
int indentLevel, NamespaceStack namespaces) throws IOException {}
protected void printElementContent(Element element, Writer out,
int indentLevel, NamespaceStack namespaces, List mixedContent)
throws IOException {}
protected void printString(String s, Writer out) throws IOException {}
protected void printEntity(Entity entity, Writer out) throws IOException {}
protected void printNamespace(Namespace ns, Writer out) throws IOException {}
protected void printAttributes(List attributes, Element parent,
Writer out, NamespaceStack namespaces)
throws IOException {}
A bug in the current version of JDOM prevents this from working.
import org.jdom.*;
import org.jdom.output.XMLOutputter;
import org.jdom.input.SAXBuilder;
import java.io.*;
import java.util.*;

public class TagStripper extends XMLOutputter {

  public TagStripper() {
    super();
  }

  // Things we won't print at all
  protected void printDeclaration(Document doc, Writer out, String encoding) {}
  protected void printComment(Comment comment, Writer out, int indentLevel) {}
  protected void printDocType(DocType docType, Writer out) {}
  protected void printProcessingInstruction(ProcessingInstruction pi,
   Writer out, int indentLevel) {}
  protected void printNamespace(Namespace ns, Writer out) {}
  // signature matches the four-argument method in the XMLOutputter listing
  protected void printAttributes(List attributes, Element parent,
   Writer out, NamespaceStack namespaces) {}

  protected void printElement(Element element, Writer out,
   int indentLevel, NamespaceStack namespaces) throws IOException {

    List content = element.getMixedContent();
    Iterator iterator = content.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof String) {
        out.write((String) o);
        this.maybePrintln(out);
      }
      else if (o instanceof Element) {
        printElement((Element) o, out, indentLevel, namespaces);
      }
    }

  }

  // Could easily have put main() method in a separate class
  public static void main(String[] args) {

    if (args.length == 0) {
      System.out.println("Usage: java TagStripper URL1 URL2...");
      return;
    }

    TagStripper stripper = new TagStripper();
    SAXBuilder builder = new SAXBuilder();

    // start parsing...
    for (int i = 0; i < args.length; i++) {
      // command line should offer URIs or file names
      try {
        Document doc = builder.build(args[i]);
        stripper.output(doc, System.out);
      }
      catch (JDOMException e) { // a well-formedness error
        System.out.println(args[i] + " is not well formed.");
        System.out.println(e.getMessage());
      }
      catch (IOException e) { // an I/O error while writing the output
        System.out.println(e.getMessage());
      }
    }

  }

}
% java TagStripper hotcop.xml
Hot Cop
Jacques Morali
Henri Belolo
Victor Willis
Jacques Morali
A & M Records
6:20
1978
Village People
The process of taking an in-memory JDOM Document and converting it to an org.w3c.dom.Document object
The org.jdom.output.DOMOutputter class:
package org.jdom.output;
public class DOMOutputter {
// Constructors
public DOMOutputter() {}
// Outputter methods
public org.w3c.dom.Document output(Document document) {}
public org.w3c.dom.Element output(Element element) {}
public org.w3c.dom.Element output(Element element, String domAdapterClass) {}
public org.w3c.dom.Document output(Document document, String domAdapterClass) {}
// utility methods
protected void buildDOMTree(Object content, org.w3c.dom.Document doc,
org.w3c.dom.Element current, boolean atRoot, LinkedList namespaces) {}
public String getXmlnsTagFor(Namespace ns);
}
The process of taking an in-memory JDOM Document and walking its tree while firing off SAX events
The org.jdom.output.SAXOutputter class
Documents larger than available memory
Byte-for-byte faithful round trips
DTDs
XPath Queries (may be added in 1.1)
JavaWorld: http://javaworld.com/javaworld/jw-05-2000/jw-0518-jdom.html
Java and XML, Brett McLaughlin, O'Reilly & Associates, 2000, ISBN 0-596-00016-2, http://www.oreilly.com/catalog/javaxml/
The problem is that we're not providing the tools. We're providing the specs. That's a whole different ball game. If tools existed for actually making really interesting use of RDF and XLink and XInclude then people would use them. If IE and/or Mozilla supported the full gamut of specs, from XSLT 1.0 to XLink and XInclude (OK, so they're not quite REC's, but with time...) then you would find people using them more.--Matt Sergeant on the xml-dev mailing list
An in-band means of specifying the proper URI for a document that can succeed even if out-of-band mechanisms aren't available.
A means of specifying the base URI that relative URLs are resolved against, even if the document itself is copied to a different location.
An XML replacement for the HTML BASE element
<slide xml:base="http://www.ibiblio.org/xml/slides/sd2000east/advancedxml">
<title>The xml:base attribute</title>
...
<previous xlink:type="simple" xlink:href="What_Is_XBase.xml"/>
<next xlink:type="simple" xlink:href="xbaseexample.xml"/>
</slide>
May be attached to any element to set the base URI for that element and its descendants
The xml prefix is automatically bound to the http://www.w3.org/XML/1998/namespace URI
The value should be an absolute URI
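Resolution against a base URI can be tried out with java.net.URI from the JDK. This is a minimal sketch: the class name XmlBaseResolution is hypothetical, and the example.org URIs follow the XML Base specification's example. A nested base such as /hotpicks/ is first resolved against the base in scope, and relative links are then resolved against the result.

```java
import java.net.URI;

// Hypothetical helper showing base URI resolution with the JDK's URI class.
class XmlBaseResolution {

    // Resolves a (possibly relative) reference against a base URI.
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String docBase = "http://example.org/today/";
        // An inner xml:base is itself resolved against the outer one
        String innerBase = resolve(docBase, "/hotpicks/");

        System.out.println(resolve(docBase, "new.xml"));
        System.out.println(innerBase);
        System.out.println(resolve(innerBase, "pick1.xml"));
    }
}
```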
Adapted from the XML Base spec:
<?xml version="1.0"?>
<doc xml:base="http://example.org/today/"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <paragraph>See <link xlink:type="simple" xlink:href="new.xml">what's new</link>!</paragraph>
    <paragraph>Check out the hot picks of the day!</paragraph>
    <olist xml:base="/hotpicks/">
      <item>
        <link xlink:type="simple" xlink:href="pick1.xml">Hot Pick #1</link>
      </item>
      <item>
        <link xlink:type="simple" xlink:href="pick2.xml">Hot Pick #2</link>
      </item>
      <item>
        <link xlink:type="simple" xlink:href="pick3.xml">Hot Pick #3</link>
      </item>
    </olist>
  </body>
</doc>
"what's new" resolves to the URI "http://example.org/today/new.xml"
"Hot Pick #1" resolves to the URI "http://example.org/hotpicks/pick1.xml"
"Hot Pick #2" resolves to the URI "http://example.org/hotpicks/pick2.xml"
"Hot Pick #3" resolves to the URI "http://example.org/hotpicks/pick3.xml"
How does it interact with XHTML? In particular, how does it interact with the XHTML base element?
Will browsers and other applications support it?
A means of including one XML document inside another, irrespective of validation.
Based on the XML Infoset; a source infoset is transformed into a result infoset
XLink show="embed" only graphically includes, like the IMG element in HTML. It does not merge infosets.
External parsed entities:
Require a DTD
Can only handle very limited documents; i.e. not all well-formed XML documents are well-formed external parsed entities. In particular, XML declarations can be a problem, and document type declarations always are.
Don't allow unparsed text to be inserted as CDATA
XSLT document() function
Only handles XSLT
No unparsed, pure-text includes
Custom code or XSLT extension functions
The href attribute identifies the document (or part thereof) to be included
The include element is in the http://www.w3.org/1999/XML/xinclude namespace.
<book xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
<title>Processing XML with Java</title>
<chapter><xinclude:include href="dom.xml"/></chapter>
<chapter><xinclude:include href="sax.xml"/></chapter>
<chapter><xinclude:include href="jdom.xml"/></chapter>
</book>
parse="xml"
parse="text": < will change to &lt; and so forth.
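With parse="text", the markup characters of the included document show up escaped when the result is serialized. The escaping step can be sketched as follows; the class name IncludedTextEscaper is hypothetical, and only the two characters that must always be escaped in character data are handled.

```java
// Hypothetical helper: escapes the characters that would otherwise be read as markup.
class IncludedTextEscaper {

    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '<') out.append("&lt;");
            else if (c == '&') out.append("&amp;");
            else out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("<PUBLISHER>A & M Records</PUBLISHER>"));
    }
}
```

If the tree is later written by a serializer that escapes text itself, this step should be applied only once; escaping twice would turn &lt; into &amp;lt;.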
<slide xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
<title>The href attribute</title>
<ul>
<li>Identifies the document to be included with a URI</li>
<li>The document at the URI replaces the <code>include</code>
element in the including document</li>
<li>The <code>xinclude</code> prefix is bound to the http://www.w3.org/1999/XML/xinclude
namespace URI.
</li>
</ul>
<pre><code><xinclude:include parse="text" href="processing_xml_with_java.xml"/>
</code></pre>
<description>
A slide from Elliotte Rusty Harold's Advanced XML course at
<host_ref/>, <date_ref/>
</description>
<last_modified>October 26, 2000</last_modified>
</slide>
package com.macfaq.xml;

import java.net.*;
import java.util.*;
import java.io.*;
import org.jdom.*;
import org.jdom.input.*;
import org.jdom.output.*;

public class XIncluder {

  public final static Namespace XINCLUDE_NAMESPACE
   = Namespace.getNamespace("xinclude", "http://www.w3.org/1999/XML/xinclude");

  private static SAXBuilder builder = new SAXBuilder();

  public static Document resolve(Document original, String base)
   throws IOException, JDOMException {

    if (original == null) throw new NullPointerException("Document must not be null");

    Element root = original.getRootElement();
    // check to see if root element has an xml:base ????
    Element resolved = (Element) resolve(root, base);
    // catch a ClassCastException if a String is returned????

    Document result = new Document(resolved, original.getDocType());

    // copy the comments and processing instructions outside the root element
    Iterator iterator = original.getMixedContent().iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Comment) {
        Comment c = (Comment) o;
        result.addContent((Comment) c.clone());
      }
      else if (o instanceof ProcessingInstruction) {
        ProcessingInstruction pi = (ProcessingInstruction) o;
        result.addContent((ProcessingInstruction) pi.clone());
      }
    }

    return result;
  }

  // either returns an Element or a String
  public static Object resolve(Element original, String base)
   throws IOException, JDOMException {

    if (original == null) throw new NullPointerException("You can't XInclude a null element.");

    Stack bases = new Stack();
    if (base != null) bases.push(base);

    Object result = resolve(original, bases);
    // only pop what was pushed; popping an empty Stack throws EmptyStackException
    if (base != null) bases.pop();
    return result;
  }

  // either returns an Element or a String
  protected static Object resolve(Element original, Stack bases)
   throws IOException, JDOMException {

    Element result;
    // null, not "", so that the "no base" branch below is reachable
    String base = null;
    if (bases.size() != 0) base = (String) bases.peek();

    Attribute href = original.getAttribute("href", XINCLUDE_NAMESPACE);
    Attribute baseAttribute = original.getAttribute("base", Namespace.XML_NAMESPACE);
    if (baseAttribute != null) base = baseAttribute.getValue();

    if (href == null) { // recursively process children
      result = new Element(original.getName(), original.getNamespace());
      Iterator attributes = original.getAttributes().iterator();
      while (attributes.hasNext()) {
        Attribute a = (Attribute) attributes.next();
        result.addAttribute((Attribute) a.clone());
      }

      List children = original.getMixedContent();
      Iterator iterator = children.iterator();
      while (iterator.hasNext()) {
        Object o = iterator.next();
        if (o instanceof Element) {
          Element e = (Element) o;
          Object resolved = resolve(e, bases);
          if (resolved instanceof String) result.addContent((String) resolved);
          else result.addContent((Element) resolved);
        }
        else if (o instanceof String) {
          result.addContent((String) o);
        }
        else if (o instanceof Comment) {
          result.addContent((Comment) o);
        }
        else if (o instanceof CDATA) {
          result.addContent((CDATA) o);
        }
        else if (o instanceof ProcessingInstruction) {
          result.addContent((ProcessingInstruction) o);
        }
      }
    }
    else {
      boolean parse = true;
      Attribute parseAttribute = original.getAttribute("parse", XINCLUDE_NAMESPACE);
      if (parseAttribute != null) {
        if (parseAttribute.getValue().equals("text")) parse = false;
      }

      URL remote;
      if (base != null) {
        URL context = new URL(base);
        remote = new URL(context, href.getValue());
      }
      else {
        remote = new URL(href.getValue());
      }

      // need to handle unparsed results too
      // need to watch out for loops
      if (parse) {
        // checks for equality (OK) or identity (not OK)????
        if (bases.contains(remote.toExternalForm())) {
          throw new RuntimeException("Circular XInclude Reference!");
        }
        Document doc = builder.build(remote);
        bases.push(remote.toExternalForm());
        result = (Element) resolve(doc.getRootElement(), bases);
        bases.pop();
      }
      else { // insert text
        return getURL(remote);
      }
    }

    return result;
  }

  public static String getURL(URL source) throws IOException {

    StringBuffer s = new StringBuffer();
    InputStream in = new BufferedInputStream(source.openStream());
    // does XInclude give you anything to specify the character set????
    InputStreamReader reader = new InputStreamReader(in, "8859_1");
    int c;
    // read decoded characters from the reader, not raw bytes from the stream
    while ((c = reader.read()) != -1) {
      if (c == '<') s.append("&lt;");
      else if (c == '&') s.append("&amp;");
      else s.append((char) c);
    }
    return s.toString();
  }

  public static void main(String[] args) {

    SAXBuilder builder = new SAXBuilder();
    XMLOutputter outputter = new XMLOutputter();
    for (int i = 0; i < args.length; i++) {
      try {
        Document input = builder.build(args[i]);
        // absolutize URL
        String base = args[i];
        if (base.indexOf(':') < 0) {
          File f = new File(base);
          base = f.toURL().toExternalForm();
        }
        Document output = resolve(input, base);
        // need to set encoding on this to Latin-1 and check what
        // happens to UTF-8 curly quotes
        outputter.output(output, System.out);
      }
      catch (Exception e) {
        System.err.println(e);
        e.printStackTrace();
      }
    }
  }

}
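The two ideas the XIncluder leans on, resolving an href against the current base and refusing to include a document that is already being processed, can be tried in isolation. The following is only a sketch that uses java.net.URI and a Deque rather than the JDOM classes above; the class name, method names, and example.com URLs are made up for illustration. Like Stack.contains(), Deque.contains() compares entries with equals(), so the loop check is a string equality test, not an identity test.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;

class IncludeBaseSketch {

    // Resolve a possibly relative href against the base URI of the including document.
    static String absolutize(String base, String href) {
        return URI.create(base).resolve(href).toString();
    }

    // Record that we are now processing target; refuse it if it is already on the stack.
    static void enter(Deque<String> bases, String target) {
        if (bases.contains(target)) {   // contains() uses equals(), i.e. string equality
            throw new IllegalStateException("Circular XInclude reference: " + target);
        }
        bases.push(target);
    }

    public static void main(String[] args) {
        Deque<String> bases = new ArrayDeque<>();
        enter(bases, "http://www.example.com/docs/book.xml");

        // book.xml includes chapters/ch1.xml
        String chapter = absolutize(bases.peek(), "chapters/ch1.xml");
        System.out.println(chapter);
        enter(bases, chapter);

        // ch1.xml includes ../book.xml, which is already being processed
        String back = absolutize(bases.peek(), "../book.xml");
        try {
            enter(bases, back);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```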
XML Base Specification: http://www.w3.org/TR/xmlbase
XInclude Specification: http://www.w3.org/TR/xinclude
Schemas are not the salvation for the world of Markup Languages, just as DTDs aren't the embodiment of evil.--Ann Navarro on the XHTML-L mailing list
Generically, a document that describes what a correct document may contain
Specifically, a W3C Recommendation for an XML-document syntax that describes the permissible contents of XML documents
Created by W3C XML Schema Working Group based on many different submissions
No known patent, trademark, or other IP restrictions
XML Schema Part 1: Structures: http://www.w3.org/TR/xmlschema-1/
XML Schema Part 2: Datatypes: http://www.w3.org/TR/xmlschema-2/
Unusual, non-XML like syntax
No data typing, especially for element content
Limited extensibility
Only marginally compatible with namespaces
Cannot use mixed content and enforce order and number of child elements
Cannot enforce number of child elements without also enforcing order (i.e. no & operator from SGML)
DTDs | Schemas |
---|---|
<!ELEMENT> declaration | xsd:element element |
<!ATTLIST> declaration | xsd:attribute element |
<!NOTATION> declaration | |
<!ENTITY> declaration | |
Data types |
Last call working draft from April 7, 2000
Candidate Recommendation October 24, 2000
<?xml version="1.0"?>
<GREETING>
Hello XML!
</GREETING>
<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">

  <xsd:element name="GREETING" type="xsd:string"/>

</xsd:schema>
xsi:noNamespaceSchemaLocation attribute on root element
xsi prefix is mapped to http://www.w3.org/1999/XMLSchema-instance URI
For example,
<?xml version="1.0"?>
<GREETING xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
 xsi:noNamespaceSchemaLocation="greeting.xsd">
Hello XML!
</GREETING>
Other means of connecting schemas to documents are allowed
D:\schemas\examples>java sax.SAX2Count -v greeting2.xml
greeting2.xml: 701 ms (1 elems, 1 attrs, 0 spaces, 12 chars)
<?xml version="1.0"?> <GREETING xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance" xsi:noNamespaceSchemaLocation="greeting.xsd"> <P>Hello XML!</P> </GREETING>
D:\speaking\SDExpo 2000 East\schemas\examples>java sax.SAX2Count -v greeting3.xml
[Error] greeting3.xml:4:6: Element type "P" must be declared.
[Error] greeting3.xml:5:13: Datatype error: In element 'GREETING' : Can not have element children within a simple type content.
greeting3.xml: 781 ms (2 elems, 1 attrs, 0 spaces, 14 chars)
The namespace URIs have changed.
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"> <xsd:element name="GREETING" type="xsd:string"/> </xsd:schema>
xsi:noNamespaceSchemaLocation attribute on root element
xsi prefix is mapped to http://www.w3.org/2000/10/XMLSchema-instance URI
For example,
<?xml version="1.0"?> <GREETING xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:noNamespaceSchemaLocation="greeting.xsd"> Hello XML! </GREETING>
<?xml version="1.0"?> <GREETING xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:noNamespaceSchemaLocation="greeting.xsd"> <P>Hello XML!</P> </GREETING>
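The same valid/invalid pair can also be checked from Java code instead of the SAX2Count command line. This is a rough sketch using the javax.xml.validation API that ships with current JDKs, not the Xerces build shown in these slides. It assumes the namespace of the final Recommendation, http://www.w3.org/2001/XMLSchema, since that is what present-day validators implement, and it hands the schema to the validator directly rather than relying on xsi:noNamespaceSchemaLocation. The class and method names are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class GreetingValidation {

    // Assumption: final Recommendation namespace, not the 1999 or 2000/10 draft URIs.
    static final String GREETING_XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "  <xsd:element name='GREETING' type='xsd:string'/>"
      + "</xsd:schema>";

    // Returns true if xml is valid against xsd, false if the document is rejected.
    static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("The schema itself could not be compiled", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // well-formedness or validity error in the instance
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Text-only content satisfies xsd:string
        System.out.println(isValid(GREETING_XSD, "<GREETING>Hello XML!</GREETING>"));
        // A child element inside a simple type is rejected
        System.out.println(isValid(GREETING_XSD, "<GREETING><P>Hello XML!</P></GREETING>"));
    }
}
```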
<?xml version="1.0"?> <SONG xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:noNamespaceSchemaLocation="song.xsd"> <TITLE>Hot Cop</TITLE> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <LENGTH>6:20</LENGTH> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
Complex types can have child elements and attributes
Simple types cannot have children or attributes
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">

  <xsd:element name="SONG" type="songType"/>

  <xsd:complexType name="songType">
    <xsd:element name="TITLE" type="xsd:string"/>
    <xsd:element name="COMPOSER" type="xsd:string"
                 minOccurs="1" maxOccurs="unbounded"/>
    <xsd:element name="PRODUCER" type="xsd:string"
                 minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="PUBLISHER" type="xsd:string"
                 minOccurs="0" maxOccurs="1"/>
    <xsd:element name="LENGTH" type="xsd:timeDuration"/>
    <xsd:element name="YEAR" type="xsd:string"/>
    <xsd:element name="ARTIST" type="xsd:string"
                 minOccurs="1" maxOccurs="unbounded"/>
  </xsd:complexType>

</xsd:schema>
xsd:element declares an element and assigns it a type
xsd:attribute declares an attribute and assigns it a type
xsd:complexType defines a new type
D:\speaking\SDExpo 2000 East\schemas\examples>java sax.SAX2Count -v hotcop.xml
[Error] hotcop.xml:10:25: Datatype error: java.text.ParseException: Illegal or misplaced separator.
Here's the problem:
<?xml version="1.0"?>
<SONG xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="song.xsd">
<TITLE>Hot Cop</TITLE>
<COMPOSER>Jacques Morali</COMPOSER>
<COMPOSER>Henri Belolo</COMPOSER>
<COMPOSER>Victor Willis</COMPOSER>
<PRODUCER>Jacques Morali</PRODUCER>
<PUBLISHER>PolyGram Records</PUBLISHER>
<LENGTH>6:20</LENGTH>
<YEAR>1978</YEAR>
<ARTIST>Village People</ARTIST>
</SONG>
6:20 is not in the schema timeDuration format, which is the ISO 8601 form "PnYnMnDTnHnMnS, where nY represents the number of years, nM the number of months, nD the number of days, 'T' is the date/time separator, nH the number of hours, nM the number of minutes and nS the number of seconds. The number of seconds can include decimal digits to arbitrary precision. An optional preceding minus sign ('-') is allowed, to indicate a negative duration."
<?xml version="1.0"?>
<SONG xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="song.xsd">
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>P0YT6M20S</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
Xerces doesn't get this one right yet!
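If the song lengths come from a program, the minutes:seconds notation can be converted mechanically. A small sketch, assuming input of the form "6:20" and using java.time.Duration, which has nothing to do with Xerces or the schema tools; the class and method names are hypothetical. Duration prints only the time fields, so the result is PT6M20S, the same duration as P0YT6M20S with the zero-years field left out.

```java
import java.time.Duration;

class SongLength {

    // Convert "minutes:seconds" (e.g. "6:20") to an ISO 8601 duration string.
    static String toIsoDuration(String minutesAndSeconds) {
        String[] parts = minutesAndSeconds.split(":");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected m:ss but got " + minutesAndSeconds);
        }
        long minutes = Long.parseLong(parts[0]);
        long seconds = Long.parseLong(parts[1]);
        return Duration.ofMinutes(minutes).plusSeconds(seconds).toString();
    }

    public static void main(String[] args) {
        System.out.println(toIsoDuration("6:20"));   // prints PT6M20S
    }
}
```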
Boolean
String
URIs
Numeric types
Time types
XML types
No money types. However, these can be derived
XML Schema Built-In Simple Types | ||
---|---|---|
Name | Type | Examples |
float | IEEE 754 32-bit floating point number | -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN |
double | IEEE 754 64-bit floating point number | -INF, 1.401E-90, -1E4, -0, 0, 12.78E-2, 12, INF, NaN, 3.4E42 |
decimal | arbitrary precision, decimal numbers | -2.7E400, 5.7E-444, -3.1415292, 0, 7.8, 90200.76, 3.4E1024 |
binary | a binary number made up of zeroes and ones | 10000100111 |
integer | an arbitrarily large or small integer | -500000000000000000000000, -9223372036854775809, -126789, -1, 0, 1, 5, 23, 42, 126789, 9223372036854775808, 456734987324983264987362495809587095720978 |
nonPositiveInteger | an integer less than or equal to zero | 0, -1, -2, -3, -4, -5, ... |
negativeInteger | an integer strictly less than zero | -1, -2, -3, -4, -5, ... |
long | an eight-byte two's complement integer such as Java's long type | -9223372036854775808, -12678967543233, -1, 9223372036854775807 |
int | an integer that can be represented as a four-byte, two's complement number such as Java's int type | -2147483648, -1, 0, 1, 5, 23, 42, 2147483647 |
short | an integer that can be represented as a two-byte, two's complement number such as Java's short type | -32768, -1, 0, 1, 5, 23, 42, 32767 |
byte | an integer that can be represented as a one-byte, two's complement number such as Java's byte type | -128, -1, 0, 1, 5, 23, 42, 127 |
nonNegativeInteger | an integer greater than or equal to zero | 0, 1, 2, 3, 4, 5, ... |
unsignedLong | an eight-byte unsigned integer | 0, 1, 2, 3, 4, 5, ...18446744073709551614, 18446744073709551615 |
unsignedInt | a four-byte unsigned integer | 0, 1, 2, 3, 4, 5, ...4294967294, 4294967295 |
unsignedShort | a two-byte unsigned integer | 0, 1, 2, 3, 4, 5, ...65534, 65535 |
unsignedByte | a one-byte unsigned integer | 0, 1, 2, 3, 4, 5, ...254, 255 |
positiveInteger | an integer strictly greater than zero | 1, 2, 3, 4, 5, 6, ... |
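The bounded integer types in this table are all the unbounded integer type with a minimum and a maximum attached. A sketch of that idea with java.math.BigInteger, using the limits listed in the table; the class and helper names are made up, and a real validator would also check the lexical form before comparing values.

```java
import java.math.BigInteger;

class IntegerRanges {

    // True if the decimal string lexical lies between min and max inclusive.
    static boolean inRange(String lexical, String min, String max) {
        BigInteger value = new BigInteger(lexical);
        return value.compareTo(new BigInteger(min)) >= 0
            && value.compareTo(new BigInteger(max)) <= 0;
    }

    static boolean isLong(String s) {
        return inRange(s, "-9223372036854775808", "9223372036854775807");
    }

    static boolean isUnsignedLong(String s) {
        return inRange(s, "0", "18446744073709551615");
    }

    static boolean isUnsignedByte(String s) {
        return inRange(s, "0", "255");
    }

    public static void main(String[] args) {
        // One past Long.MAX_VALUE: a legal integer and unsignedLong, but not a long
        String big = "9223372036854775808";
        System.out.println(isLong(big));           // false
        System.out.println(isUnsignedLong(big));   // true
        System.out.println(isUnsignedByte("255")); // true
        System.out.println(isUnsignedByte("256")); // false
    }
}
```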
XML Schema Built-In Simple Types | ||
---|---|---|
Name | Type | Examples |
timeInstant | a particular moment in Coordinated Universal Time; up to an arbitrarily small fraction of a second | 1999-05-31T13:20:00.000-05:00 |
month | A given month in a given year | 2000-10 |
year | a given year | 2000 |
century | a specified century | 19 |
recurringDate | a date in no particular year, or rather in every year | --10-31 |
recurringDay | a day in no particular month, or rather in every month | ----31 |
timeDuration | a length of time, without fixed endpoints, to an arbitrary fraction of a second | P2000Y10M31DT09H32M7.4312S |
date | a specific day in history | 2000-10-31 |
time | a specific time of day, that recurs every day | 14:30:00.000, 09:30:00.000-05:00 |
XML Schema Built-In Simple Types | ||
---|---|---|
Name | Type | Examples |
ID | XML 1.0 ID attribute type | any XML name that's unique among ID type attributes |
IDREF | XML 1.0 IDREF attribute type | any XML name that's used as an ID type attribute elsewhere in the document |
ENTITY | XML 1.0 ENTITY attribute type | any XML name that's declared as an unparsed entity in the DTD |
NOTATION | XML 1.0 NOTATION attribute type | any XML name that's declared as a notation name in the DTD |
language | valid values for xml:lang as defined in XML 1.0 | en-GB, en-US, fr |
IDREFS | XML 1.0 IDREFS attribute type | a white space separated list of IDREF names |
ENTITIES | XML 1.0 ENTITIES attribute type | a white space separated list of ENTITY names |
NMTOKEN | XML 1.0 NMTOKEN attribute type | 12 are you ready |
NMTOKENS | XML 1.0 NMTOKENS attribute type | a white space separated list of name tokens |
Name | An XML 1.0 Name | set, title, rdf, math, math123, href |
QName | a prefixed name | song:title |
NCName | a local name without any colons | title |
XML Schema Built-In Simple Types | ||
---|---|---|
Name | Type | Examples |
string | Parsed Character Data; #PCDATA | Hot Cop |
boolean | C++'s bool type | true, false, 1, 0 |
uriReference | relative or absolute URI | http://www.w3.org/TR/2000/WD-xmlschema-2-20000407/#timeDuration, /javafaq/reports/JCE1.2.1.html |
<?xml version="1.0"?> <SONG xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:noNamespaceSchemaLocation="attribute_song.xsd"> <TITLE>Hot Cop</TITLE> <PHOTO ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"> <xsd:element name="SONG" type="SongType"/> <!-- An empty element --> <xsd:complexType name="PhotoType"> <xsd:complexContent> <xsd:restriction base="xsd:anyType"> <xsd:attribute name="ALT" type="xsd:string"/> <xsd:attribute name="WIDTH" type="xsd:nonNegativeInteger"/> <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> </xsd:restriction> </xsd:complexContent> </xsd:complexType> <xsd:complexType name="SongType"> <xsd:element name="TITLE" type="xsd:string" minOccurs="1" maxOccurs="1"/> <xsd:element name="PHOTO" type="PhotoType" minOccurs="0" maxOccurs="1"/> <xsd:element name="COMPOSER" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/> <xsd:element name="PRODUCER" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/> <xsd:element name="PUBLISHER" type="xsd:string" minOccurs="0" maxOccurs="1"/> <xsd:element name="YEAR" type="xsd:year" minOccurs="1" maxOccurs="1"/> <xsd:element name="ARTIST" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/> </xsd:complexType> </xsd:schema>
<?xml version="1.0"?> <SONG xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:noNamespaceSchemaLocation="nested_song.xsd"> <TITLE>Hot Cop</TITLE> <PHOTO ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/> <COMPOSER> <NAME> <GIVEN>Jacques</GIVEN> <FAMILY>Morali</FAMILY> </NAME> </COMPOSER> <COMPOSER> <NAME> <GIVEN>Henri</GIVEN> <FAMILY>Belolo</FAMILY> </NAME> </COMPOSER> <COMPOSER> <NAME> <GIVEN>Victor</GIVEN> <FAMILY>Willis</FAMILY> </NAME> </COMPOSER> <PRODUCER> <NAME> <GIVEN>Jacques</GIVEN> <FAMILY>Morali</FAMILY> </NAME> </PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"> <xsd:element name="SONG" type="SongType"/> <xsd:complexType name="ComposerType"> <xsd:element name="NAME"> <xsd:complexType> <xsd:element name="GIVEN" type="xsd:string" minOccurs="1" maxOccurs="1"/> <xsd:element name="FAMILY" type="xsd:string" minOccurs="1" maxOccurs="1"/> </xsd:complexType> </xsd:element> </xsd:complexType> <xsd:complexType name="ProducerType"> <xsd:element name="NAME"> <xsd:complexType> <xsd:element name="GIVEN" type="xsd:string" minOccurs="1" maxOccurs="1"/> <xsd:element name="FAMILY" type="xsd:string" minOccurs="1" maxOccurs="1"/> </xsd:complexType> </xsd:element> </xsd:complexType> <xsd:complexType name="PhotoType"> <xsd:complexContent> <xsd:restriction base="xsd:anyType"> <xsd:attribute name="ALT" type="xsd:string"/> <xsd:attribute name="WIDTH" type="xsd:nonNegativeInteger"/> <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> </xsd:restriction> </xsd:complexContent> </xsd:complexType> <xsd:complexType name="SongType"> <xsd:element name="TITLE" type="xsd:string" minOccurs="1" maxOccurs="1"/> <xsd:element name="PHOTO" type="PhotoType" minOccurs="0" maxOccurs="1"/> <xsd:element name="COMPOSER" type="ComposerType" minOccurs="1" maxOccurs="unbounded"/> <xsd:element name="PRODUCER" type="ProducerType" minOccurs="0" maxOccurs="unbounded"/> <xsd:element name="PUBLISHER" type="xsd:string" minOccurs="0" maxOccurs="1"/> <xsd:element name="YEAR" type="xsd:year" minOccurs="1" maxOccurs="1"/> <xsd:element name="ARTIST" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/> </xsd:complexType> </xsd:schema>
PRODUCER
and COMPOSER
are
really the same type.
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"> <xsd:element name="SONG" type="SongType"/> <xsd:complexType name="PersonType"> <xsd:element name="NAME"> <xsd:complexType> <xsd:element name="GIVEN" type="xsd:string" minOccurs="1" maxOccurs="1"/> <xsd:element name="FAMILY" type="xsd:string" minOccurs="1" maxOccurs="1"/> </xsd:complexType> </xsd:element> </xsd:complexType> <xsd:complexType name="SongType"> <xsd:element name="TITLE" type="xsd:string" minOccurs="1" maxOccurs="1"/> <xsd:element name="PHOTO" type="PhotoType" minOccurs="0" maxOccurs="1"/> <xsd:element name="COMPOSER" type="PersonType" minOccurs="1" maxOccurs="unbounded"/> <xsd:element name="PRODUCER" type="PersonType" minOccurs="0" maxOccurs="unbounded"/> <xsd:element name="PUBLISHER" type="xsd:string" minOccurs="0" maxOccurs="1"/> <xsd:element name="YEAR" type="xsd:year" minOccurs="1" maxOccurs="1"/> <xsd:element name="ARTIST" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/> </xsd:complexType> <xsd:complexType name="PhotoType"> <xsd:complexContent> <xsd:restriction base="xsd:anyType"> <xsd:attribute name="ALT" type="xsd:string"/> <xsd:attribute name="WIDTH" type="xsd:nonNegativeInteger"/> <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> </xsd:restriction> </xsd:complexContent> </xsd:complexType> </xsd:schema>
Schemas let you enforce order and appearance of elements in mixed content.
<?xml version="1.0"?> <SONG xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:noNamespaceSchemaLocation="mixed_song.xsd"> <TITLE>Hot Cop</TITLE> <PHOTO ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/> <COMPOSER> <NAME>Mr. <GIVEN>Jacques</GIVEN> <FAMILY>Morali</FAMILY> Esq.</NAME> </COMPOSER> <COMPOSER> <NAME>Mr. <GIVEN>Henri</GIVEN> L. <FAMILY>Belolo</FAMILY>, M.D.</NAME> </COMPOSER> <COMPOSER> <NAME>Mr. <GIVEN>Victor</GIVEN> C. <FAMILY>Willis</FAMILY></NAME> </COMPOSER> <PRODUCER> <NAME>Mr. <GIVEN>Jacques</GIVEN> S. <FAMILY>Morali</FAMILY></NAME> </PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"> <xsd:element name="SONG" type="SongType"/> <xsd:complexType name="PersonType"> <xsd:element name="NAME"> <xsd:complexType content="mixed"> <xsd:element name="GIVEN" type="xsd:string" minOccurs="1" maxOccurs="1"/> <xsd:element name="FAMILY" type="xsd:string" minOccurs="1" maxOccurs="1"/> </xsd:complexType> </xsd:element> </xsd:complexType> <xsd:complexType name="SongType" content="elementOnly"> <xsd:element name="TITLE" type="xsd:string" minOccurs="1" maxOccurs="1"/> <xsd:element name="PHOTO" type="PhotoType" minOccurs="0" maxOccurs="1"/> <xsd:element name="COMPOSER" type="PersonType" minOccurs="1" maxOccurs="unbounded"/> <xsd:element name="PRODUCER" type="PersonType" minOccurs="0" maxOccurs="unbounded"/> <xsd:element name="PUBLISHER" type="xsd:string" minOccurs="0" maxOccurs="1"/> <xsd:element name="YEAR" type="xsd:year" minOccurs="1" maxOccurs="1"/> <xsd:element name="ARTIST" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/> </xsd:complexType> <xsd:complexType name="PhotoType"> <xsd:complexContent> <xsd:restriction base="xsd:anyType"> <xsd:attribute name="ALT" type="xsd:string"/> <xsd:attribute name="WIDTH" type="xsd:nonNegativeInteger"/> <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> </xsd:restriction> </xsd:complexContent> </xsd:complexType> </xsd:schema>
<?xml version="1.0"?> <SONG xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:noNamespaceSchemaLocation="unordered_song.xsd"> <TITLE>Hot Cop</TITLE> <PHOTO ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/> <COMPOSER> <NAME><FAMILY>Morali</FAMILY> <GIVEN>Jacques</GIVEN></NAME> </COMPOSER> <COMPOSER> <NAME><GIVEN>Henri</GIVEN> <FAMILY>Belolo</FAMILY></NAME> </COMPOSER> <COMPOSER> <NAME><FAMILY>Willis</FAMILY> <GIVEN>Victor</GIVEN></NAME> </COMPOSER> <PRODUCER> <NAME><GIVEN>Jacques</GIVEN> <FAMILY>Morali</FAMILY></NAME> </PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
Each element in the xsd:all group must occur zero times or once; that is, minOccurs and maxOccurs must each be 0 or 1
The xsd:all group must be the top level element of its type
The xsd:all group may contain only individual element declarations; no choices or sequences
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"> <xsd:element name="SONG" type="SongType"/> <xsd:complexType name="PersonType"> <xsd:element name="NAME"> <xsd:complexType> <xsd:all> <xsd:element name="GIVEN" type="xsd:string" minOccurs="1" maxOccurs="1"/> <xsd:element name="FAMILY" type="xsd:string" minOccurs="1" maxOccurs="1"/> </xsd:all> </xsd:complexType> </xsd:element> </xsd:complexType> <xsd:complexType name="SongType"> <xsd:element name="TITLE" type="xsd:string" minOccurs="1" maxOccurs="1"/> <xsd:element name="PHOTO" type="PhotoType" minOccurs="0" maxOccurs="1"/> <xsd:element name="COMPOSER" type="PersonType" minOccurs="1" maxOccurs="unbounded"/> <xsd:element name="PRODUCER" type="PersonType" minOccurs="0" maxOccurs="unbounded"/> <xsd:element name="PUBLISHER" type="xsd:string" minOccurs="0" maxOccurs="1"/> <xsd:element name="YEAR" type="xsd:year" minOccurs="1" maxOccurs="1"/> <xsd:element name="ARTIST" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/> </xsd:complexType> <!-- An empty element --> <xsd:complexType name="PhotoType" content="empty"> <xsd:complexContent> <xsd:restriction base="xsd:anyType"> <xsd:attribute name="ALT" type="xsd:string"/> <xsd:attribute name="WIDTH" type="xsd:nonNegativeInteger"/> <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> </xsd:restriction> </xsd:complexContent> </xsd:complexType> </xsd:schema>
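The effect of xsd:all on the NAME element can be checked in isolation. As before, this is a sketch rather than the toolchain used in these slides: it assumes the final Recommendation namespace http://www.w3.org/2001/XMLSchema and the javax.xml.validation API from the JDK, and the class and method names are illustrative. GIVEN and FAMILY are accepted in either order, but each must still appear exactly once.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class AllGroupValidation {

    // Assumption: final Recommendation namespace, not the 2000/10 draft URI.
    static final String NAME_XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "  <xsd:element name='NAME'>"
      + "    <xsd:complexType>"
      + "      <xsd:all>"
      + "        <xsd:element name='GIVEN'  type='xsd:string'/>"
      + "        <xsd:element name='FAMILY' type='xsd:string'/>"
      + "      </xsd:all>"
      + "    </xsd:complexType>"
      + "  </xsd:element>"
      + "</xsd:schema>";

    // Returns true if xml is valid against NAME_XSD.
    static boolean isValidName(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(NAME_XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("The schema itself could not be compiled", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // the instance document was rejected
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Either order is accepted
        System.out.println(isValidName(
            "<NAME><FAMILY>Morali</FAMILY> <GIVEN>Jacques</GIVEN></NAME>"));
        System.out.println(isValidName(
            "<NAME><GIVEN>Henri</GIVEN> <FAMILY>Belolo</FAMILY></NAME>"));
        // A missing FAMILY is rejected
        System.out.println(isValidName("<NAME><GIVEN>Victor</GIVEN></NAME>"));
    }
}
```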
xsd:choice requires exactly one of a group of specified elements to appear
The choice can have minOccurs and maxOccurs attributes that adjust this from zero to any given number.
xsd:sequence requires each child element it specifies to appear in the specified order
The sequence can have minOccurs and maxOccurs attributes that repeat each sequence zero to any given number of times.
<?xml version="1.0"?> <SONG xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:noNamespaceSchemaLocation="derived_song.xsd"> <TITLE>Hot Cop</TITLE> <PHOTO ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/> <COMPOSER> <NAME><FAMILY>Morali</FAMILY> <GIVEN>Jacques</GIVEN></NAME> </COMPOSER> <COMPOSER> <NAME><GIVEN>Henri</GIVEN> <FAMILY>Belolo</FAMILY></NAME> </COMPOSER> <COMPOSER> <NAME><FAMILY>Willis</FAMILY> <GIVEN>Victor</GIVEN></NAME> </COMPOSER> <PRODUCER> <NAME><GIVEN>Jacques</GIVEN> <FAMILY>Morali</FAMILY></NAME> </PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> <PRICE>$1.35</PRICE> </SONG>
Suppose you want a money type to specify that the PRICE element content must look like $1.35 or ¥11000
Derive this from the xsd:string type by restriction
Use a regular expression to specify the pattern
More or less Perl-like with some Unicode extensions
The money regular expression:
\p{Sc}\p{Nd}+(\.\p{Nd}\p{Nd})?
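\p{Sc}: any Unicode currency indicator; e.g. $, ¥, £
\p{Nd}: a Unicode decimal digit character
\p{Nd}+: one or more Unicode decimal digit characters
\.: the period character
(\.\p{Nd}\p{Nd}): a period followed by two decimal digits; e.g. .35
(\.\p{Nd}\p{Nd})?: zero or one strings of the form .35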
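The money expression can be tried out with java.util.regex, which understands the same \p{Sc} and \p{Nd} Unicode categories. This is only an approximation of what a schema validator does: schema patterns always apply to the whole value, which matches() reproduces, but the two regular expression dialects are not identical in general. The class and method names are invented for the example.

```java
import java.util.regex.Pattern;

class MoneyPattern {

    // The money regular expression from the slide, with backslashes doubled for Java.
    static final Pattern MONEY =
        Pattern.compile("\\p{Sc}\\p{Nd}+(\\.\\p{Nd}\\p{Nd})?");

    // matches() requires the entire string to fit the pattern.
    static boolean isMoney(String s) {
        return MONEY.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isMoney("$1.35"));        // true
        System.out.println(isMoney("\u00A511000"));  // true: yen sign, no decimal part
        System.out.println(isMoney("1.35"));         // false: no currency symbol
        System.out.println(isMoney("$1.3"));         // false: only one digit after the period
    }
}
```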
The base attribute specifies the type it's derived from
The name attribute specifies the name it will be referred to as
The pattern child element imposes the restriction
<xsd:simpleType base="xsd:string" name="money">
<xsd:pattern value="\p{Sc}\p{Nd}+(\.\p{Nd}\p{Nd})?"/>
</xsd:simpleType>
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"> <xsd:element name="SONG" type="SongType"/> <xsd:simpleType base="xsd:string" name="money"> <xsd:pattern value="\p{Sc}\p{Nd}+(\.\p{Nd}\p{Nd})?"/> <!-- Regular Expression: \p{Sc} Any Unicode currency indicator; e.g. $, ¥, £, &#A4, etc. \p{Nd} A Unicode decimal digit character \p{Nd}+ One or more Unicode decimal digit characters \. The period character (\.\p{Nd}\p{Nd}) (\.\p{Nd}\p{Nd})? Zero or one strings of the form .35 This works for any decimalized currency. --> </xsd:simpleType> <xsd:complexType name="SongType"> <xsd:element name="TITLE" type="xsd:string" minOccurs="1" maxOccurs="1"/> <xsd:element name="PHOTO" type="PhotoType" minOccurs="0" maxOccurs="1"/> <xsd:element name="COMPOSER" type="PersonType" minOccurs="1" maxOccurs="unbounded"/> <xsd:element name="PRODUCER" type="PersonType" minOccurs="0" maxOccurs="unbounded"/> <xsd:element name="PUBLISHER" type="xsd:string" minOccurs="0" maxOccurs="1"/> <xsd:element name="YEAR" type="xsd:year" minOccurs="1" maxOccurs="1"/> <xsd:element name="ARTIST" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/> <xsd:element name="PRICE" type="money" minOccurs="1" maxOccurs="1"/> </xsd:complexType> <xsd:complexType name="PersonType"> <xsd:element name="NAME"> <xsd:complexType> <xsd:all> <xsd:element name="GIVEN" type="xsd:string" minOccurs="1" maxOccurs="1"/> <xsd:element name="FAMILY" type="xsd:string" minOccurs="1" maxOccurs="1"/> </xsd:all> </xsd:complexType> </xsd:element> </xsd:complexType> <xsd:complexType name="PhotoType"> <xsd:complexContent> <xsd:restriction base="xsd:anyType"> <xsd:attribute name="ALT" type="xsd:string"/> <xsd:attribute name="WIDTH" type="xsd:nonNegativeInteger"/> <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> </xsd:restriction> </xsd:complexContent> </xsd:complexType> </xsd:schema>
<?xml version="1.0"?> <GREETING xmlns="http://ibiblio.org/xml/schemas/greeting/" xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:schemaLocation="http://ibiblio.org/xml/schemas/greeting/ greeting_defaultNS.xsd"> Hello XML! </GREETING>
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema" targetNamespace="http://ibiblio.org/xml/schemas/greeting/" > <xsd:element name="GREETING" type="xsd:string"/> </xsd:schema>
<?xml version="1.0" encoding="UTF-8" standalone="no"?> <SONG xmlns="http://ibiblio.org/xml/namespace/song" xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:schemaLocation = "http://ibiblio.org/xml/namespace/song namespace_song.xsd" > <TITLE>Hot Cop</TITLE> <PHOTO ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema" xmlns="http://ibiblio.org/xml/namespace/song" targetNamespace="http://ibiblio.org/xml/namespace/song" elementFormDefault="qualified" attributeFormDefault="unqualified" > <xsd:element name="SONG" type="SongType"/> <xsd:complexType name="SongType"> <xsd:element name="TITLE" type="xsd:string" minOccurs="1" maxOccurs="1"/> <xsd:element name="PHOTO" type="PhotoType" minOccurs="0" maxOccurs="1"/> <xsd:element name="COMPOSER" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/> <xsd:element name="PRODUCER" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/> <xsd:element name="PUBLISHER" type="xsd:string" minOccurs="0" maxOccurs="1"/> <xsd:element name="YEAR" type="xsd:year" minOccurs="1" maxOccurs="1"/> <xsd:element name="ARTIST" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/> </xsd:complexType> <xsd:complexType name="PhotoType"> <xsd:complexContent> <xsd:restriction base="xsd:anyType"> <xsd:attribute name="ALT" type="xsd:string"/> <xsd:attribute name="WIDTH" type="xsd:nonNegativeInteger"/> <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> </xsd:restriction> </xsd:complexContent> </xsd:complexType> </xsd:schema>
<?xml version="1.0" encoding="UTF-8" standalone="no"?> <SONG xmlns="http://ibiblio.org/xml/namespace/song" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:schemaLocation = "http://ibiblio.org/xml/namespace/song namespace_song.xsd http://www.w3.org/1999/xlink xlink.xsd" > <TITLE>Hot Cop</TITLE> <PHOTO xlink:type="simple" xlink:href="hotcop.jpg" ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/> <COMPOSER>Jacques Morali</COMPOSER> <COMPOSER>Henri Belolo</COMPOSER> <COMPOSER>Victor Willis</COMPOSER> <PRODUCER>Jacques Morali</PRODUCER> <PUBLISHER>PolyGram Records</PUBLISHER> <YEAR>1978</YEAR> <ARTIST>Village People</ARTIST> </SONG>
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema" xmlns:xlink="http://www.w3.org/1999/xlink" targetNamespace="http://www.w3.org/1999/xlink" attributeFormDefault="qualified" > <xsd:attributeGroup name="XLinkAttributes"> <!-- should make this fixed and provide the default value simple???? --> <xsd:attribute name="xlink:type" type="xsd:string"/> <xsd:attribute name="xlink:href" type="xsd:uriReference"/> </xsd:attributeGroup> </xsd:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema" xmlns="http://ibiblio.org/xml/namespace/song" xmlns:xlink="http://www.w3.org/1999/xlink" targetNamespace="http://ibiblio.org/xml/namespace/song" elementFormDefault="qualified" attributeFormDefault="unqualified" > <xsd:import namespace="http://www.w3.org/1999/xlink" schemaLocation="xlink.xsd"/> <xsd:complexType name="PhotoType"> <xsd:complexContent> <xsd:restriction base="xsd:anyType"> <xsd:attributeGroup ref="XLinkAttributes"/> <xsd:attribute name="ALT" type="xsd:string"/> <xsd:attribute name="WIDTH" type="xsd:nonNegativeInteger"/> <xsd:attribute name="HEIGHT" type="xsd:nonNegativeInteger"/> </xsd:restriction> </xsd:complexContent> </xsd:complexType> <xsd:element name="SONG" type="SongType"/> <xsd:complexType name="SongType"> <xsd:element name="TITLE" type="xsd:string" minOccurs="1" maxOccurs="1"/> <xsd:element name="PHOTO" type="PhotoType" minOccurs="0" maxOccurs="1"/> <xsd:element name="COMPOSER" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/> <xsd:element name="PRODUCER" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/> <xsd:element name="PUBLISHER" type="xsd:string" minOccurs="0" maxOccurs="1"/> <xsd:element name="YEAR" type="xsd:year" minOccurs="1" maxOccurs="1"/> <xsd:element name="ARTIST" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/> </xsd:complexType> </xsd:schema>
The top-level xsd:annotation element describes the schema
Its xsd:documentation child element describes the schema for human readers
Its xsd:appInfo child element describes the schema for computer programs; e.g. stylesheet instructions
<xsd:annotation>
<xsd:documentation>
Song schema for XML and Java Example at SDExpo 2000 East
Copyright 2000 Elliotte Rusty Harold.
</xsd:documentation>
</xsd:annotation>
Cannot declare entities
Parent models
Extra-document validation
Rick Jelliffe's Schematron
Murata Makoto's RELAX
DTDs
Rick Jelliffe's Schematron:
The Schematron differs in basic concept from other schema languages in that it is not based on grammars but on finding tree patterns in the parsed document. This approach allows many kinds of structures to be represented which are inconvenient and difficult in grammar-based schema languages.
XSLT/XPath Based
Murata Makoto
JIS standard
Uses W3C Schema data types
This presentation: http://www.ibiblio.org/xml/slides/sd2000east/advancedxml/
W3C Schema Primer: http://www.w3.org/TR/xmlschema-0/
XML Schema Part 1: Structures: http://www.w3.org/TR/xmlschema-1/
XML Schema Part 2: Datatypes: http://www.w3.org/TR/xmlschema-2/
Once you've tasted XLink's Chunky Monkey, it's hard to reconcile yourself to HTML's vanilla.--John E. Simpson on the xsl-list mailing list
Linking in XML is divided into three parts: XLink, XPointer, and XPath.
XLinks
XPointers
XPath
XLink, the XML Linking Language, defines how one document links to another document. XPointer, the XML Pointer Language, defines how individual parts of a document are addressed. XPath is a syntax used in XPointers for identifying particular nodes in an XML document's tree.
An XLink points to a URI (in practice, a URL) that specifies a particular resource. This URL may include an XPointer part that more specifically identifies the desired part or section of the targeted resource or document. XPointers use the XPath syntax shared with XSL to identify particular elements in the document tree.
This talk covers:
XPointers: June 7, 2000 Candidate Recommendation
The Web conquered gopher for one reason: HTML made it possible to embed hypertext links in documents.
HTML linking has limits
You can only link to one document at a time
You must link to the entire document.
Once the link is traversed the trail of where you've been is lost.
Designed especially for use with XML
Multidirectional
Any element can be a link, not just <A>
Can link to arbitrary positions in the document
Currently, there are no general-purpose applications that support arbitrary XLinks. That's because XLinks have a much broader base of applicability than HTML links. XLinks are not just used for hypertext connections and embedding images in documents. They can be used by any custom application that needs to establish connections between documents and parts of documents, for any reason. Thus, even when XLinks are fully implemented in browsers they may not always be blue underlined text that you click to jump to another page. They can be that, but they can also be both more and less, depending on your needs.
Any element can be a link
Linking elements are identified by an xlink:type attribute with one of these six values:
simple
extended
locator
arc
resource
title
Each linking element contains an xlink:href attribute whose value is the URI of the resource being linked to.
An xmlns:xlink attribute associates the xlink prefix with the http://www.w3.org/1999/xlink namespace.
<FOOTNOTE xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="simple"
xlink:href="footnote7.xml">7</FOOTNOTE>
<COMPOSER xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="simple"
xlink:href="http://www.interport.net/~beand/">
Beth Anderson
</COMPOSER>
<IMAGE xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="simple" xlink:href="logo.gif"/>
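Because any element can be a link, an application has to recognize links by their attributes rather than by element name. The following is a minimal sketch of that idea in Python, using only the standard library. The DOC wrapper element, the NOTE element, and the simple_links function are hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical wrapper document holding the three example elements.
SAMPLE = """<DOC xmlns:xlink="http://www.w3.org/1999/xlink">
  <FOOTNOTE xlink:type="simple" xlink:href="footnote7.xml">7</FOOTNOTE>
  <COMPOSER xlink:type="simple"
            xlink:href="http://www.interport.net/~beand/">Beth Anderson</COMPOSER>
  <IMAGE xlink:type="simple" xlink:href="logo.gif"/>
  <NOTE>not a link</NOTE>
</DOC>"""


def simple_links(root):
    """Return (element name, href) for every element marked as a simple link."""
    found = []
    for elem in root.iter():
        # ElementTree exposes namespaced attributes as "{namespace-uri}local-name".
        if elem.get(f"{{{XLINK_NS}}}type") == "simple":
            found.append((elem.tag, elem.get(f"{{{XLINK_NS}}}href")))
    return found


links = simple_links(ET.fromstring(SAMPLE))
for name, href in links:
    print(name, "->", href)
```

The check is made against the namespace URI, not the xlink prefix, so the same code works if a document binds a different prefix to the XLink namespace.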
<!ELEMENT FOOTNOTE (#PCDATA)>
<!ATTLIST FOOTNOTE
xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
xlink:type CDATA #FIXED "simple"
xlink:href CDATA #REQUIRED
>
<!ELEMENT COMPOSER (#PCDATA)>
<!ATTLIST COMPOSER
xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
xlink:type CDATA #FIXED "simple"
xlink:href CDATA #REQUIRED
>
<!ELEMENT IMAGE EMPTY>
<!ATTLIST IMAGE
xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
xlink:type CDATA #FIXED "simple"
xlink:href CDATA #REQUIRED
>
<FOOTNOTE xlink:href="footnote7.xml">7</FOOTNOTE>
<COMPOSER xlink:href="http://www.interport.net/~beand/">
Beth Anderson
</COMPOSER>
<IMAGE xlink:href="logo.gif"/>
<!ELEMENT IMAGE EMPTY>
<!ATTLIST IMAGE
xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
xlink:type CDATA #FIXED "simple"
xlink:href CDATA #REQUIRED
ALT CDATA #REQUIRED
HEIGHT CDATA #REQUIRED
WIDTH CDATA #REQUIRED
>
A link element may contain optional xlink:role and xlink:title attributes that describe the remote resource, that is, the document or other resource to which the link points.
The title contains a short plain text description.
The role contains a URI pointing to a long description.
<AUTHOR
xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:href="http://www.macfaq.com/personal.html"
xlink:title="Elliotte Rusty Harold's personal home page"
xlink:role="http://www.macfaq.com/about.html">
Elliotte Rusty Harold
</AUTHOR>
As with all other attributes, the xlink:title and xlink:role attributes should be declared in the DTD for all the elements to which they belong. For example, this is a reasonable declaration for the above AUTHOR element:
<!ELEMENT AUTHOR (#PCDATA)>
<!ATTLIST AUTHOR
xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
xlink:type CDATA #FIXED "simple"
xlink:href CDATA #REQUIRED
xlink:title CDATA #IMPLIED
xlink:role CDATA #IMPLIED
>
Linking elements can contain two more optional attributes that suggest to applications how the remote resource is associated with the current page. These are:
xlink:show
suggests
where the content should be displayed when
the link is activated
xlink:actuate
suggests whether the link should be traversed
automatically or whether a specific user request is required
These are application dependent, however, and applications are free to ignore the suggestions.
The xlink:show
attribute has five predefined values:
replace
new
embed
other
none
Like all attributes in valid documents, the xlink:show attribute must be declared in a <!ATTLIST> declaration for the link element in the DTD. For example:
<!ELEMENT WEBSITE (#PCDATA)>
<!ATTLIST WEBSITE
xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
xlink:type CDATA #FIXED "simple"
xlink:href CDATA #REQUIRED
xlink:show (new | replace | embed) "replace"
>
A linking element's xlink:actuate attribute has four predefined values:
onRequest
onLoad
other
none
<IMAGE
xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="simple" xlink:href="logo.gif"
xlink:actuate="onLoad"/>
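Since xlink:show and xlink:actuate are only suggestions, an application typically reads them and then decides what to do. The sketch below, in Python, shows one way to read the two attributes from an element. The describe function and the one-line descriptions of each value are this sketch's own wording, not text from the XLink specification.

```python
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"

# One-line readings of the predefined values; the wording is illustrative only.
SHOW = {
    "replace": "replace the current document",
    "new": "open in a new window",
    "embed": "embed in the current document",
    "other": "defined by other markup",
    "none": "no suggestion",
}
ACTUATE = {
    "onLoad": "traverse as soon as the document loads",
    "onRequest": "traverse only when the user asks",
    "other": "defined by other markup",
    "none": "no suggestion",
}


def describe(elem):
    """Return (where to show, when to traverse) for a linking element."""
    show = elem.get(XLINK + "show")
    actuate = elem.get(XLINK + "actuate")
    # A missing attribute leaves the decision entirely to the application.
    where = SHOW.get(show, "left to the application")
    when = ACTUATE.get(actuate, "left to the application")
    return where, when


image = ET.fromstring(
    '<IMAGE xmlns:xlink="http://www.w3.org/1999/xlink" '
    'xlink:type="simple" xlink:href="logo.gif" xlink:actuate="onLoad"/>'
)
where, when = describe(image)
print(where, "/", when)
```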
Like all attributes in valid documents, the xlink:actuate attribute must be declared in the DTD in a <!ATTLIST> declaration for the link elements in which it appears. For example:
<!ELEMENT IMAGE EMPTY>
<!ATTLIST IMAGE
xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
xlink:type CDATA #FIXED "simple"
xlink:href CDATA #REQUIRED
xlink:show (new | replace | embed) "embed"
xlink:actuate (onRequest | onLoad) "onLoad"
>
<!ENTITY % link-attributes
"xlink:type CDATA #FIXED 'simple'
xlink:role CDATA #IMPLIED
xlink:title CDATA #IMPLIED
xmlns:xlink CDATA #FIXED 'http://www.w3.org/1999/xlink'
xlink:href CDATA #REQUIRED
xlink:show (new | replace | embed) 'replace'
xlink:actuate (onRequest | onLoad) 'onRequest'"
>
<!ELEMENT COMPOSER (#PCDATA)>
<!ATTLIST COMPOSER
%link-attributes;
>
<!ELEMENT AUTHOR (#PCDATA)>
<!ATTLIST AUTHOR
%link-attributes;
>
<!ELEMENT WEBSITE (#PCDATA)>
<!ATTLIST WEBSITE
%link-attributes;
>
Simple links are very similar to HTML links, one-directional, one-element-to-one-document links
Extended links are multi-directional, many-to-many links
An extended link is a list of nodes and a list of the connections between them
An extended link is included in an XML document as an element of some arbitrary type like COMPOSER or TEAM that has an xlink:type attribute with the value extended.
<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="extended">
...
</WEBSITE>
Extended links generally point to more than one target and from more than one source. Both sources and targets are called by the more generic word resource.
Resources are divided into remote resources and local resources.
A local resource is actually contained inside the extended link element. It is enclosed in an element of arbitrary type that has an xlink:type attribute with the value resource.
A remote resource exists outside the extended link element, very possibly in another document. The extended link element contains locator child elements that point to the remote resource. These are elements with any name that have an xlink:type attribute with the value locator.
Each locator element has an xlink:href attribute whose value is a URI locating the remote resource.
<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="extended">
<NAME xlink:type="resource">Cafe au Lait</NAME>
<HOMESITE xlink:type="locator"
xlink:href="http://ibiblio.org/javafaq/"/>
<MIRROR xlink:type="locator"
xlink:href="http://sunsite.kth.se/javafaq"/>
<MIRROR xlink:type="locator"
xlink:href="http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
<MIRROR xlink:type="locator"
xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
</WEBSITE>
This WEBSITE
element describes an extended link with five resources:
The text "Cafe au Lait", a local resource
The document at http://ibiblio.org/javafaq/, a remote resource
The document at http://sunsite.kth.se/javafaq, a remote resource
The document at http://sunsite.informatik.rwth-aachen.de/javafaq/, a remote resource
The document at http://sunsite.cnlab-switch.ch/javafaq/, a remote resource
Since one of the resources referenced by this extended link is contained in the extended link, it is called an inline link. It will be included as part of one of the documents it connects.
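The distinction between local and remote resources can be read straight off the xlink:type values of the child elements. The following Python sketch lists the five resources of the WEBSITE link. The resources function is a hypothetical helper; it treats the text content of a resource-type element as the local resource.

```python
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"

WEBSITE = """<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink"
         xlink:type="extended">
  <NAME xlink:type="resource">Cafe au Lait</NAME>
  <HOMESITE xlink:type="locator"
            xlink:href="http://ibiblio.org/javafaq/"/>
  <MIRROR xlink:type="locator"
          xlink:href="http://sunsite.kth.se/javafaq"/>
  <MIRROR xlink:type="locator"
          xlink:href="http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
  <MIRROR xlink:type="locator"
          xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
</WEBSITE>"""


def resources(link):
    """Return ("local" | "remote", value) for each resource in an extended link."""
    result = []
    for child in link:
        kind = child.get(XLINK + "type")
        if kind == "resource":
            # Local resource: the content sits inside the link itself.
            result.append(("local", (child.text or "").strip()))
        elif kind == "locator":
            # Remote resource: the locator only points at it.
            result.append(("remote", child.get(XLINK + "href")))
    return result


found = resources(ET.fromstring(WEBSITE))
for kind, value in found:
    print(kind, value)
```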
This picture shows the WEBSITE extended link element and five resources, one of which WEBSITE contains, the other four of which are referred to by URLs. However, this just describes these resources. No connections are implied between them.
Both the extended link element itself and the individual locator children may have descriptive attributes such as xlink:role and xlink:title.
The xlink:role and xlink:title attributes of the extended link element provide default roles and titles for each of the individual locator child elements.
Individual resource and locator elements may override these defaults with xlink:role and xlink:title attributes of their own.
<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="extended" xlink:title="Cafe au Lait">
<NAME xlink:type="resource"
xlink:role="http://ibiblio.org/javafaq/">
Cafe au Lait
</NAME>
<HOMESITE xlink:type="locator"
xlink:href="http://ibiblio.org/javafaq/"
xlink:role="http://ibiblio.org/"/>
<MIRROR xlink:type="locator"
xlink:title="Cafe au Lait Swedish Mirror"
xlink:role="http://sunsite.kth.se/"
xlink:href="http://sunsite.kth.se/javafaq"/>
<MIRROR xlink:type="locator"
xlink:title="Cafe au Lait German Mirror"
xlink:role="http://sunsite.informatik.rwth-aachen.de/"
xlink:href=
"http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
<MIRROR xlink:type="locator"
xlink:title="Cafe au Lait Swiss Mirror"
xlink:role="http://sunsite.cnlab-switch.ch/"
xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
</WEBSITE>
<!ELEMENT WEBSITE (NAME, HOMESITE, MIRROR*) >
<!ATTLIST WEBSITE
xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
xlink:type (extended) #FIXED "extended"
xlink:title CDATA #IMPLIED
xlink:role CDATA #IMPLIED
>
<!ELEMENT NAME (#PCDATA)>
<!ATTLIST NAME
xlink:type (resource) #FIXED "resource"
xlink:role CDATA #IMPLIED
xlink:title CDATA #IMPLIED
>
<!ELEMENT HOMESITE (#PCDATA)>
<!ATTLIST HOMESITE
xlink:type (locator) #FIXED "locator"
xlink:href CDATA #REQUIRED
xlink:role CDATA #IMPLIED
xlink:title CDATA #IMPLIED
>
<!ELEMENT MIRROR (#PCDATA)>
<!ATTLIST MIRROR
xlink:type (locator) #FIXED "locator"
xlink:href CDATA #REQUIRED
xlink:role CDATA #IMPLIED
xlink:title CDATA #IMPLIED
>
<!ENTITY % extended.att
"xlink:type CDATA #FIXED 'extended'
xmlns:xlink CDATA #FIXED 'http://www.w3.org/1999/xlink'
xlink:role CDATA #IMPLIED
xlink:title CDATA #IMPLIED"
>
<!ENTITY % resource.att
"xlink:type (resource) #FIXED 'resource'
xlink:role CDATA #IMPLIED
xlink:title CDATA #IMPLIED"
>
<!ENTITY % locator.att
"xlink:type (locator) #FIXED 'locator'
xlink:href CDATA #REQUIRED
xlink:role CDATA #IMPLIED
xlink:title CDATA #IMPLIED"
>
<!ELEMENT WEBSITE (NAME, HOMESITE, MIRROR*) >
<!ATTLIST WEBSITE
%extended.att;
>
<!ELEMENT NAME (#PCDATA)>
<!ATTLIST NAME
%resource.att;
>
<!ELEMENT HOMESITE (#PCDATA)>
<!ATTLIST HOMESITE
%locator.att;
>
<!ELEMENT MIRROR (#PCDATA)>
<!ATTLIST MIRROR
%locator.att;
>
In an extended link with three resources, A, B, and C, there are nine different possible traversals, counting traversals from a resource to itself.
These potential traversals are called arcs.
Arcs are represented in XML by elements that have an xlink:type attribute with the value arc.
Traversal rules are defined by attaching xlink:actuate and xlink:show attributes to arc elements.
An arc element has an xlink:from attribute and an xlink:to attribute.
These attributes match the xlink:label attributes of the resource and locator elements in the extended link from which traversal is initiated and to which the traversal goes.
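The count of nine comes from pairing each of the three resources with every resource, itself included. A short Python sketch that enumerates the pairs (the resource names A, B, and C are just placeholders):

```python
from itertools import product

resources = ["A", "B", "C"]

# Every ordered (from, to) pair, including a resource paired with itself.
traversals = list(product(resources, repeat=2))

for source, target in traversals:
    print(source, "->", target)
print(len(traversals), "possible traversals")
```

With n resources there are n * n such pairs, which is why extended links name only the arcs they actually want.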
<?xml version="1.0"?>
<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="extended" xlink:title="Cafe au Lait">
<NAME xlink:type="resource" xlink:label="source">
Cafe au Lait
</NAME>
<HOMESITE xlink:type="locator"
xlink:href="http://ibiblio.org/javafaq/"
xlink:label="us"/>
<MIRROR xlink:type="locator"
xlink:title="Cafe au Lait Swedish Mirror"
xlink:label="se"
xlink:href="http://sunsite.kth.se/javafaq"/>
<MIRROR xlink:type="locator"
xlink:title="Cafe au Lait German Mirror"
xlink:label="de"
xlink:href="http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
<MIRROR xlink:type="locator"
xlink:title="Cafe au Lait Swiss Mirror"
xlink:label="ch"
xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
<CONNECTION xlink:type="arc" xlink:from="source"
xlink:to="ch" xlink:show="replace"
xlink:actuate="onRequest"/>
<CONNECTION xlink:type="arc" xlink:from="source"
xlink:to="us" xlink:show="replace"
xlink:actuate="onRequest"/>
<CONNECTION xlink:type="arc" xlink:from="source"
xlink:to="se" xlink:show="replace"
xlink:actuate="onRequest"/>
<CONNECTION xlink:type="arc" xlink:from="source"
xlink:to="de" xlink:show="replace"
xlink:actuate="onRequest"/>
</WEBSITE>
<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="extended" xlink:title="Cafe au Lait">
<NAME xlink:type="resource" xlink:label="source">
Cafe au Lait
</NAME>
<HOMESITE xlink:type="locator"
xlink:href="http://ibiblio.org/javafaq/"
xlink:label="us"/>
<MIRROR xlink:type="locator"
xlink:title="Cafe au Lait Swedish Mirror"
xlink:label="mirror"
xlink:href="http://sunsite.kth.se/javafaq"/>
<MIRROR xlink:type="locator"
xlink:title="Cafe au Lait German Mirror"
xlink:label="mirror"
xlink:href="http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
<MIRROR xlink:type="locator"
xlink:title="Cafe au Lait Swiss Mirror"
xlink:label="mirror"
xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
<CONNECTION xlink:type="arc" xlink:from="source"
xlink:to="mirror" xlink:show="replace"
xlink:actuate="onRequest"/>
</WEBSITE>
<?xml version="1.0"?>
<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="extended" xlink:title="Cafe au Lait">
<NAME xlink:type="resource" xlink:label="source">
Cafe au Lait
</NAME>
<HOMESITE xlink:type="locator"
xlink:href="http://ibiblio.org/javafaq/"
xlink:label="us"/>
<MIRROR xlink:type="locator"
xlink:title="Cafe au Lait Swedish Mirror"
xlink:label="se"
xlink:href="http://sunsite.kth.se/javafaq"/>
<MIRROR xlink:type="locator"
xlink:title="Cafe au Lait German Mirror"
xlink:label="sk"
xlink:href="http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
<MIRROR xlink:type="locator"
xlink:title="Cafe au Lait Swiss Mirror"
xlink:label="ch"
xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
<CONNECTION xlink:type="arc" xlink:from="source"
xlink:show="replace" xlink:actuate="onRequest"/>
</WEBSITE>
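Arcs connect labels, not individual elements, so one arc element can stand for several traversals. The Python sketch below expands the arcs of an extended link into concrete (from, to) pairs. It rests on two assumptions that should be checked against the XLink specification: several resources may share one label, and a missing xlink:from or xlink:to stands for every labeled resource in the link. The expand_arcs function and the shortened sample link are hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"

LINK = """<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink"
         xlink:type="extended">
  <NAME xlink:type="resource" xlink:label="source">Cafe au Lait</NAME>
  <HOMESITE xlink:type="locator" xlink:label="us"
            xlink:href="http://ibiblio.org/javafaq/"/>
  <MIRROR xlink:type="locator" xlink:label="mirror"
          xlink:href="http://sunsite.kth.se/javafaq"/>
  <MIRROR xlink:type="locator" xlink:label="mirror"
          xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
  <CONNECTION xlink:type="arc" xlink:from="source" xlink:to="mirror"/>
</WEBSITE>"""


def expand_arcs(link):
    """Expand each arc element into concrete (from, to) resource pairs."""
    by_label = {}
    for child in link:
        if child.get(XLINK + "type") in ("resource", "locator"):
            label = child.get(XLINK + "label")
            if label is not None:
                # Remote resources are named by URI, local ones by their text.
                value = child.get(XLINK + "href") or (child.text or "").strip()
                by_label.setdefault(label, []).append(value)
    # Assumption: an omitted from/to means "every labeled resource".
    everything = [v for values in by_label.values() for v in values]

    pairs = []
    for child in link:
        if child.get(XLINK + "type") != "arc":
            continue
        src = child.get(XLINK + "from")
        dst = child.get(XLINK + "to")
        sources = everything if src is None else by_label.get(src, [])
        targets = everything if dst is None else by_label.get(dst, [])
        pairs.extend((s, t) for s in sources for t in targets)
    return pairs


pairs = expand_arcs(ET.fromstring(LINK))
for source, target in pairs:
    print(source, "->", target)
```

An arc whose label matches no resource expands to nothing here, which makes mistyped labels easy to spot.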
<!ELEMENT WEBSITE (NAME, HOMESITE, MIRROR*, CONNECTION*) >
<!ATTLIST WEBSITE
xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
xlink:type (extended) #FIXED "extended"
xlink:title CDATA #IMPLIED
xlink:label CDATA #IMPLIED
>
<!ELEMENT NAME (#PCDATA)>
<!ATTLIST NAME
xlink:type (resource) #FIXED "resource"
xlink:label CDATA #REQUIRED
>
<!ELEMENT HOMESITE (#PCDATA)>
<!ATTLIST HOMESITE
xlink:type (locator) #FIXED "locator"
xlink:href CDATA #REQUIRED
xlink:label CDATA #REQUIRED
xlink:title CDATA #IMPLIED
>
<!ELEMENT MIRROR (#PCDATA)>
<!ATTLIST MIRROR
xlink:type (locator) #FIXED "locator"
xlink:href CDATA #REQUIRED
xlink:label CDATA #REQUIRED
xlink:title CDATA #IMPLIED
>
<!ELEMENT CONNECTION EMPTY>
<!ATTLIST CONNECTION
xlink:type (arc) #FIXED "arc"
xlink:from CDATA #IMPLIED
xlink:to CDATA #IMPLIED
xlink:show (replace) "replace"
xlink:actuate (onRequest | onLoad) "onRequest"
>
Inline links, such as the familiar A
element
from HTML, are themselves part of the source or target of the
link. The source of the link, that is the blue underlined text, is
included inside the A
element that defines the link.
Most simple links are inline.
An out-of-line link does not contain any part of any of the resources it connects. Instead, the links are stored in a separate document called the linkbase.
Out-of-line links allow you to add links to and from documents that can't be modified, such as a page on someone else's web site.
Out-of-line links allow you to add links to different parts of non-XML content.
Out-of-line links are not yet supported by software.
<COURSE xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="extended">
<TOC xlink:type="locator" xlink:href="index.xml" xlink:label="index"/>
<CLASS xlink:type="locator" xlink:href="week1.xml" xlink:label="class"/>
<CLASS xlink:type="locator" xlink:href="week2.xml" xlink:label="class"/>
<CLASS xlink:type="locator" xlink:href="week3.xml" xlink:label="class"/>
<CLASS xlink:type="locator" xlink:href="week4.xml" xlink:label="class"/>
<CLASS xlink:type="locator" xlink:href="week5.xml" xlink:label="class"/>
<CLASS xlink:type="locator" xlink:href="week6.xml" xlink:label="class"/>
<CLASS xlink:type="locator" xlink:href="week7.xml" xlink:label="class"/>
<CLASS xlink:type="locator" xlink:href="week8.xml" xlink:label="class"/>
<CLASS xlink:type="locator" xlink:href="week9.xml" xlink:label="class"/>
<CLASS xlink:type="locator" xlink:href="week10.xml" xlink:label="class"/>
<CLASS xlink:type="locator" xlink:href="week11.xml" xlink:label="class"/>
<CLASS xlink:type="locator" xlink:href="week12.xml" xlink:label="class"/>
<CLASS xlink:type="locator" xlink:href="week13.xml" xlink:label="class"/>
<CONNECTION xlink:type="arc" xlink:from="index" xlink:to="class"/>
<CONNECTION xlink:type="arc" xlink:from="class" xlink:to="index"/>
</COURSE>
<COURSE xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="extended">
<CLASS xlink:type="locator" xlink:href="week1.xml" xlink:label="1"/>
<CLASS xlink:type="locator" xlink:href="week2.xml" xlink:label="2"/>
<CLASS xlink:type="locator" xlink:href="week3.xml" xlink:label="3"/>
<CLASS xlink:type="locator" xlink:href="week4.xml" xlink:label="4"/>
<CLASS xlink:type="locator" xlink:href="week5.xml" xlink:label="5"/>
<CLASS xlink:type="locator" xlink:href="week6.xml" xlink:label="6"/>
<CLASS xlink:type="locator" xlink:href="week7.xml" xlink:label="7"/>
<CLASS xlink:type="locator" xlink:href="week8.xml" xlink:label="8"/>
<CLASS xlink:type="locator" xlink:href="week9.xml" xlink:label="9"/>
<CLASS xlink:type="locator" xlink:href="week10.xml" xlink:label="10"/>
<CLASS xlink:type="locator" xlink:href="week11.xml" xlink:label="11"/>
<CLASS xlink:type="locator" xlink:href="week12.xml" xlink:label="12"/>
<CLASS xlink:type="locator" xlink:href="week13.xml" xlink:label="13"/>
<!-- Previous Links -->
<CONNECTION xlink:type="arc" xlink:from="2" xlink:to="1"/>
<CONNECTION xlink:type="arc" xlink:from="3" xlink:to="2"/>
<CONNECTION xlink:type="arc" xlink:from="4" xlink:to="3"/>
<CONNECTION xlink:type="arc" xlink:from="5" xlink:to="4"/>
<CONNECTION xlink:type="arc" xlink:from="6" xlink:to="5"/>
<CONNECTION xlink:type="arc" xlink:from="7" xlink:to="6"/>
<CONNECTION xlink:type="arc" xlink:from="8" xlink:to="7"/>
<CONNECTION xlink:type="arc" xlink:from="9" xlink:to="8"/>
<CONNECTION xlink:type="arc" xlink:from="10" xlink:to="9"/>
<CONNECTION xlink:type="arc" xlink:from="11" xlink:to="10"/>
<CONNECTION xlink:type="arc" xlink:from="12" xlink:to="11"/>
<CONNECTION xlink:type="arc" xlink:from="13" xlink:to="12"/>
<!-- Next Links -->
<CONNECTION xlink:type="arc" xlink:from="1" xlink:to="2"/>
<CONNECTION xlink:type="arc" xlink:from="2" xlink:to="3"/>
<CONNECTION xlink:type="arc" xlink:from="3" xlink:to="4"/>
<CONNECTION xlink:type="arc" xlink:from="4" xlink:to="5"/>
<CONNECTION xlink:type="arc" xlink:from="5" xlink:to="6"/>
<CONNECTION xlink:type="arc" xlink:from="6" xlink:to="7"/>
<CONNECTION xlink:type="arc" xlink:from="7" xlink:to="8"/>
<CONNECTION xlink:type="arc" xlink:from="8" xlink:to="9"/>
<CONNECTION xlink:type="arc" xlink:from="9" xlink:to="10"/>
<CONNECTION xlink:type="arc" xlink:from="10" xlink:to="11"/>
<CONNECTION xlink:type="arc" xlink:from="11" xlink:to="12"/>
<CONNECTION xlink:type="arc" xlink:from="12" xlink:to="13"/>
</COURSE>
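A linkbase with this much repetition is a natural candidate for generation rather than hand editing. The following Python sketch builds an equivalent COURSE element for any number of weeks. The build_course function and the q helper are hypothetical, and the numeric labels simply follow the example above.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
# Ask ElementTree to serialize the namespace with the usual xlink prefix.
ET.register_namespace("xlink", XLINK_NS)


def q(name):
    """Qualify an attribute name with the XLink namespace."""
    return f"{{{XLINK_NS}}}{name}"


def build_course(weeks):
    """Build an extended link with one locator per week plus previous/next arcs."""
    course = ET.Element("COURSE", {q("type"): "extended"})
    for n in range(1, weeks + 1):
        ET.SubElement(course, "CLASS", {
            q("type"): "locator",
            q("href"): f"week{n}.xml",
            q("label"): str(n),
        })
    # Previous links: week n points back to week n-1.
    for n in range(2, weeks + 1):
        ET.SubElement(course, "CONNECTION", {
            q("type"): "arc", q("from"): str(n), q("to"): str(n - 1),
        })
    # Next links: week n points forward to week n+1.
    for n in range(1, weeks):
        ET.SubElement(course, "CONNECTION", {
            q("type"): "arc", q("from"): str(n), q("to"): str(n + 1),
        })
    return course


course = build_course(13)
print(ET.tostring(course, encoding="unicode"))
```

For 13 weeks this yields 13 locators and 24 arcs, matching the hand-written version.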
A single XML document may contain multiple out-of-line extended links. However, the current XLink specification is relatively silent on exactly what the format of such a compound document should look like. About all it says is that such a document must be a well-formed XML document. An XLink processor would presumably read the entire document and extract any extended links that indicate connections to or from the current document.
A browser or other application that's reading the individual pages needs to be informed that there is a separate linkbase elsewhere that it should read and parse so that it can show the links to the user.
Ideally it would be handled through some external mechanism like HTTP headers.
The only currently defined way to do this is to add an arc element inside the documents the out-of-line link connects. This arc has an xlink:arcrole attribute with the value http://www.w3.org/1999/xlink/properties/linkbase. Its xlink:to attribute names the locator that points to the linkbase.
<METADATA xlink:type="extended"
xmlns:xlink="http://www.w3.org/1999/xlink">
<LINKBASE xlink:type="arc"
xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:arcrole="http://www.w3.org/1999/xlink/properties/linkbase"
xlink:to="courselinks"/>
<RESOURCE xlink:type="locator" xlink:href="courselinks.xml"
xlink:label="courselinks"/>
</METADATA>
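An application that wants to honor such a pointer has to find the arc with the linkbase arcrole and then follow its xlink:to label to a locator. The Python sketch below does exactly that for a METADATA element written with the plain value extended and a namespaced xlink:href. The linkbase_urls function is a hypothetical helper, not part of any XLink library.

```python
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"
LINKBASE_ARCROLE = "http://www.w3.org/1999/xlink/properties/linkbase"

DOC = """<METADATA xmlns:xlink="http://www.w3.org/1999/xlink"
          xlink:type="extended">
  <LINKBASE xlink:type="arc"
            xlink:arcrole="http://www.w3.org/1999/xlink/properties/linkbase"
            xlink:to="courselinks"/>
  <RESOURCE xlink:type="locator" xlink:href="courselinks.xml"
            xlink:label="courselinks"/>
</METADATA>"""


def linkbase_urls(root):
    """Return the URLs of all linkbases announced anywhere under root."""
    urls = []
    for link in root.iter():
        if link.get(XLINK + "type") != "extended":
            continue
        # Index this link's locators by label.
        locators = {}
        for child in link:
            if child.get(XLINK + "type") == "locator":
                label = child.get(XLINK + "label")
                locators.setdefault(label, []).append(child.get(XLINK + "href"))
        # Follow every linkbase arc to the locators it names.
        for child in link:
            if (child.get(XLINK + "type") == "arc"
                    and child.get(XLINK + "arcrole") == LINKBASE_ARCROLE):
                urls.extend(locators.get(child.get(XLINK + "to"), []))
    return urls


found = linkbase_urls(ET.fromstring(DOC))
print(found)
```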
XLinks can do everything HTML links can do and quite a bit more, but they aren't supported by current applications.
XLink elements of all types are placed in the http://www.w3.org/1999/xlink namespace, normally with the xlink prefix. However, the URI is likely to change in future revisions to XLink.
Simple links behave much like HTML links, but they are not restricted to a single <A> tag.
Linking elements are identified by xlink:type attributes.
Simple link elements are identified by xlink:type attributes with the value simple.
Linking elements can describe the resource they're linking to with xlink:title and xlink:role attributes.
Linking elements can use the xlink:show attribute to tell the application how the content should be displayed when the link is activated, for example, by opening a new window.
Linking elements can use the xlink:actuate attribute to tell the application whether the link should be traversed without a specific user request.
Extended link elements are identified by xlink:type attributes with the value extended.
Extended links can contain multiple locators, resources, and arcs. Currently, it's left to the application to decide how to choose between different alternatives.
Resource elements are identified by xlink:type attributes with the value resource.
Locator elements are identified by xlink:type attributes with the value locator.
A locator element has an xlink:href attribute whose value is the URI of the resource it locates.
Arc elements are identified by xlink:type attributes with the value arc.
Arc elements have xlink:from and xlink:to attributes that identify the resources they connect by their xlink:label values.
Arc elements may have xlink:show and xlink:actuate attributes to determine when and how traversal of the link occurs.
An out-of-line link is a link that does not contain any local resources.
A linkbase is a document containing multiple out-of-line, extended link elements.
A linkbase is found when a document is read that contains an arc whose xlink:arcrole attribute has the value http://www.w3.org/1999/xlink/properties/linkbase.
This presentation: http://www.ibiblio.org/xml/slides/sd2000east/advancedxml
XLink Specification: http://www.w3.org/TR/xlink/
Chapter 16 of the XML Bible: http://www.ibiblio.org/xml/books/bible/updates/16.html
The many advantages of descriptive pointing are crucial for a scalable, generic pointing system. Descriptive pointing is crucial for all the same reasons that descriptive markup is crucial to documents, and that making links first-class objects is crucial to linking. It is also clearly feasible, as shown by multiple implementations of the prior WDs from the XML WG, and of TEI extended pointers.--XML Linking Working Group, XML XPointer Requirements
Why Use XPointers?
XPointer Examples
A Concrete Example
Location Paths, Steps, and Sets
Axes
Node Tests
Predicates
Functions that Return Node Sets
Points
Ranges
Child Sequences
XPointer, the XML Pointer Language, defines an addressing scheme for individual parts of an XML document. XLinks point to a URI (in practice, a URL) that specifies a particular resource. The URI may include an XPointer part that more specifically identifies the desired part or element of the targeted resource or document. XPointers use the same XPath syntax you're familiar with from XSL transformations to identify the parts of the document they point to, along with a few additional pieces.
The element with a given ID
All elements that possess a certain attribute
The first element of a certain type
The last element whose class attribute has the value pending
The seventh element of a given type
The first child of the seventh element
and many more including combinations of these addresses...
xpointer(id("ebnf"))
xpointer(descendant::language[position()=2])
ebnf
xpointer(/child::spec/child::body/child::*/child::language[position()=2])
/1/14/2
xpointer(id("ebnf"))xpointer(id("EBNF"))
The document is not specified in the XPointer; rather, the XLink specifies the document. The XLinks you saw in the previous chapter did not contain XPointers, but it isn't hard to add XPointers to them. Most of the time you simply append the XPointer to the URI separated by a #, just as you do with named anchors in HTML. For example, the above list of XPointers could be suffixed to URLs and come out looking like the following:
http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(id("ebnf"))
http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(descendant::language[position()=2])
http://www.w3.org/TR/1998/REC-xml-19980210.xml#ebnf
http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(/child::spec/child::body/child::*/child::language[position()=2])
http://www.w3.org/TR/1998/REC-xml-19980210.xml#/1/14/2
http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(id("ebnf"))xpointer(id("EBNF"))
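Going the other way, an application that receives one of these URLs has to split it back into the document address and the XPointer. The Python sketch below does this with the standard library. The split_xpointer function and its three-way classification (full XPointer, child sequence, bare name) are a simplification made for illustration; the real XPointer grammar has more cases.

```python
from urllib.parse import urldefrag


def split_xpointer(uri):
    """Split a URI into (document URL, rough fragment kind, fragment)."""
    url, fragment = urldefrag(uri)
    if not fragment:
        kind = "none"
    elif fragment.startswith("xpointer("):
        kind = "full XPointer"
    elif fragment.startswith("/"):
        kind = "child sequence"
    else:
        kind = "bare name"
    return url, kind, fragment


BASE = "http://www.w3.org/TR/1998/REC-xml-19980210.xml"
examples = [
    BASE + '#xpointer(id("ebnf"))',
    BASE + "#ebnf",
    BASE + "#/1/14/2",
]
for uri in examples:
    url, kind, fragment = split_xpointer(uri)
    print(kind, "|", fragment)
```

The part before the # is what gets fetched; the fragment is interpreted only after the document has been retrieved and parsed.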
Normally these are used as values of the xlink:href attribute of a linking element. For example:
<SPECIFICATION xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple"
xlink:href="http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(id('ebnf'))"
xlink:actuate="onRequest" xlink:show="replace">
Extensible Markup Language (XML) 1.0
</SPECIFICATION>
<?xml version="1.0"?>
<!DOCTYPE FAMILYTREE [
<!ELEMENT FAMILYTREE (PERSON | FAMILY)*>
<!-- PERSON elements -->
<!ELEMENT PERSON (NAME*, BORN*, DIED*, SPOUSE*)>
<!ATTLIST PERSON
ID ID #REQUIRED
FATHER CDATA #IMPLIED
MOTHER CDATA #IMPLIED
>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT BORN (#PCDATA)>
<!ELEMENT DIED (#PCDATA)>
<!ELEMENT SPOUSE EMPTY>
<!ATTLIST SPOUSE IDREF IDREF #REQUIRED>
<!--FAMILY-->
<!ELEMENT FAMILY (HUSBAND?, WIFE?, CHILD*) >
<!ATTLIST FAMILY ID ID #REQUIRED>
<!ELEMENT HUSBAND EMPTY>
<!ATTLIST HUSBAND IDREF IDREF #REQUIRED>
<!ELEMENT WIFE EMPTY>
<!ATTLIST WIFE IDREF IDREF #REQUIRED>
<!ELEMENT CHILD EMPTY>
<!ATTLIST CHILD IDREF IDREF #REQUIRED>
]>
<FAMILYTREE>
<PERSON ID="p1">
<NAME>Domeniquette Celeste Baudean</NAME>
<BORN>21 Apr 1836</BORN>
<DIED>Unknown</DIED>
<SPOUSE IDREF="p2"/>
</PERSON>
<PERSON ID="p2">
<NAME>Jean Francois Bellau</NAME>
<SPOUSE IDREF="p1"/>
</PERSON>
<PERSON ID="p3" FATHER="p2" MOTHER="p1">
<NAME>Elodie Bellau</NAME>
<BORN>11 Feb 1858</BORN>
<DIED>12 Apr 1898</DIED>
<SPOUSE IDREF="p4"/>
</PERSON>
<PERSON ID="p4">
<NAME>John P. Muller</NAME>
<SPOUSE IDREF="p3"/>
</PERSON>
<PERSON ID="p7">
<NAME>Adolf Eno</NAME>
<SPOUSE IDREF="p6"/>
</PERSON>
<PERSON ID="p6" FATHER="p2" MOTHER="p1">
<NAME>Maria Bellau</NAME>
<SPOUSE IDREF="p7"/>
</PERSON>
<PERSON ID="p5" FATHER="p2" MOTHER="p1">
<NAME>Eugene Bellau</NAME>
</PERSON>
<PERSON ID="p8" FATHER="p2" MOTHER="p1">
<NAME>Louise Pauline Bellau</NAME>
<BORN>29 Oct 1868</BORN>
<DIED>3 May 1938</DIED>
<SPOUSE IDREF="p9"/>
</PERSON>
<PERSON ID="p9">
<NAME>Charles Walter Harold</NAME>
<BORN>about 1861</BORN>
<DIED>about 1938</DIED>
<SPOUSE IDREF="p8"/>
</PERSON>
<PERSON ID="p10" FATHER="p2" MOTHER="p1">
<NAME>Victor Joseph Bellau</NAME>
<SPOUSE IDREF="p11"/>
</PERSON>
<PERSON ID="p11">
<NAME>Ellen Gilmore</NAME>
<SPOUSE IDREF="p10"/>
</PERSON>
<PERSON ID="p12" FATHER="p2" MOTHER="p1">
<NAME>Honore Bellau</NAME>
</PERSON>
<FAMILY ID="f1">
<HUSBAND IDREF="p2"/>
<WIFE IDREF="p1"/>
<CHILD IDREF="p3"/>
<CHILD IDREF="p5"/>
<CHILD IDREF="p6"/>
<CHILD IDREF="p8"/>
<CHILD IDREF="p10"/>
<CHILD IDREF="p12"/>
</FAMILY>
<FAMILY ID="f2">
<HUSBAND IDREF="p7"/>
<WIFE IDREF="p6"/>
</FAMILY>
</FAMILYTREE>
Many (though not all) XPointers are location paths. These are the same location paths used by XSLT.
Location paths are built from location steps.
Each location step specifies a point in the targeted document, generally relative to some other well-known point such as the start of the document or another location step. This well-known point is called the context node.
A location step has three parts:
The axis
The node test
An optional predicate
axis::node-test[predicate]
child::PERSON[position()=2]
The axis tells you in what direction to search from the context node.
The node test tells you which nodes to consider along the axis.
The predicate is a boolean expression that tests each node in that set. If that expression returns false, then the node is removed from the set.
xpointer(/child::FAMILYTREE/child::PERSON[position()=3])
The location path of this XPointer is /child::FAMILYTREE/child::PERSON[position()=3].
It is built from two location steps:
/child::FAMILYTREE
child::PERSON[position()=3]
xpointer(/child::FAMILYTREE/child::PERSON[position()>3])
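The way a predicate filters the node set produced by the axis and node test can be imitated in a few lines of Python. This is only a sketch of the idea, not an XPath engine. The child_step function is hypothetical, the sample is a shortened family tree, and ElementTree hands back the root element rather than the root node, so the first step /child::FAMILYTREE is already taken.

```python
import xml.etree.ElementTree as ET

DOC = """<FAMILYTREE>
  <PERSON ID="p1"><NAME>Domeniquette Celeste Baudean</NAME></PERSON>
  <PERSON ID="p2"><NAME>Jean Francois Bellau</NAME></PERSON>
  <PERSON ID="p3"><NAME>Elodie Bellau</NAME></PERSON>
  <PERSON ID="p4"><NAME>John P. Muller</NAME></PERSON>
  <PERSON ID="p7"><NAME>Adolf Eno</NAME></PERSON>
</FAMILYTREE>"""


def child_step(context, name, predicate=lambda position: True):
    """Imitate child::name[predicate] with a predicate on the 1-based position."""
    # Axis and node test: children of the context node with the given name.
    candidates = [child for child in context if child.tag == name]
    # Predicate: drop every node for which the test is false.
    return [node for position, node in enumerate(candidates, start=1)
            if predicate(position)]


root = ET.fromstring(DOC)  # the FAMILYTREE element

third = child_step(root, "PERSON", lambda position: position == 3)
later = child_step(root, "PERSON", lambda position: position > 3)

print([p.get("ID") for p in third])
print([p.get("ID") for p in later])
```

The first call corresponds to child::PERSON[position()=3] and returns one node; the second corresponds to child::PERSON[position()>3] and returns every PERSON after the third.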
XPath defines twelve axes along which an XPointer may search for nodes, all from the same XPath syntax used for XSLT. These depend on context to determine exactly what they point to. For instance, consider this location path:
id("p6")/child::NAME
It begins with the id() function that returns a node set containing the element with the ID type attribute whose value is p6. This provides a context node for the following location step along the relative child axis. Other axes include ancestor, descendant, self, ancestor-or-self, descendant-or-self, attribute, and more. Each serves to select a particular subset of the elements in the document. For instance, the following axis selects from nodes that come after the context node. The preceding axis selects from nodes that come before the context node.
Axis | Selects From
ancestor | the parent of the context node, the parent of the parent of the context node, the parent of the parent of the parent of the context node, and so forth back to the root node
ancestor-or-self | the ancestors of the context node and the context node itself
attribute | the attributes of the context node
child | the immediate children of the context node
descendant | the children of the context node, the children of the children of the context node, and so forth
descendant-or-self | the context node itself and its descendants
following | all nodes that start after the end of the context node, excluding attribute and namespace nodes
following-sibling | all nodes that start after the end of the context node and have the same parent as the context node
parent | the unique parent node of the context node
preceding | all nodes that start before the beginning of the context node, excluding attribute and namespace nodes
preceding-sibling | all nodes that start before the beginning of the context node and have the same parent as the context node
self | the context node
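A few of these axes are easy to imitate over an element tree, which may help make the table concrete. The Python sketch below covers ancestor, descendant, and following-sibling for element nodes only. The function names are hypothetical, and because ElementTree does not store parent pointers, the sketch builds its own parent map.

```python
import xml.etree.ElementTree as ET

DOC = """<FAMILYTREE>
  <PERSON ID="p1"><NAME>Domeniquette Celeste Baudean</NAME><BORN>21 Apr 1836</BORN></PERSON>
  <PERSON ID="p2"><NAME>Jean Francois Bellau</NAME></PERSON>
  <PERSON ID="p3"><NAME>Elodie Bellau</NAME><BORN>11 Feb 1858</BORN></PERSON>
</FAMILYTREE>"""

root = ET.fromstring(DOC)
# Map every element to its parent element.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """ancestor axis: parent, grandparent, and so on up to the root element."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result


def descendants(node):
    """descendant axis: all elements below node, in document order."""
    return list(node.iter())[1:]  # iter() yields node itself first


def following_siblings(node):
    """following-sibling axis: later children of the same parent."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]


born = root.find("PERSON[3]/BORN")
print([a.tag for a in ancestors(born)])
print([s.get("ID") for s in following_siblings(root[0])])
print(len(descendants(root)))
```

Unlike real XPath, this sketch never reaches the root node above FAMILYTREE, and it ignores text, attribute, and namespace nodes.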
The child axis selects from the children of the context node. For example, consider this XPointer:
xpointer(/child::FAMILYTREE/child::PERSON[position()=3]/child::NAME)
Reading from right to left, it selects the NAME child of the third PERSON element that's a child of the FAMILYTREE element that's a child of the root node. In this example, there's only one such element, but if there is more than one then all are returned. For instance, consider this XPointer:
xpointer(/child::FAMILYTREE/child::PERSON/child::NAME)
This selects all NAME children of PERSON elements that are children of FAMILYTREE elements that are children of the root. There are a dozen of these in Listing 17-1.
It's important to note that the child axis only selects from the immediate children of the context node. For example, consider this URI:
http://www.theharolds.com/genealogy.xml#xpointer(/child::NAME)
This points nowhere because there are no NAME elements in the document that are direct, immediate children of the root node. There are a dozen NAME elements that are more distant descendants. If you'd like to refer to these, you should use the descendant axis instead of child.
As in XSLT, the child axis is implied if no explicit axis name is present. For instance, the above three XPointers would more likely be written in this abbreviated form:
xpointer(/FAMILYTREE/PERSON[position()=3]/NAME)
xpointer(/FAMILYTREE/PERSON/NAME)
xpointer(/NAME)
The descendant axis searches through all the descendants of the context node, not just the immediate children. For example, /descendant::BORN[position()=3] selects the third BORN element encountered in a depth-first search of the document tree. (Depth first is the order you get if you simply read through the XML document from top to bottom.) In Listing 17-1, that selects Louise Pauline Bellau's birthday, <BORN>29 Oct 1868</BORN>.
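Document order is easy to check by hand with a few lines of Python. The sketch below walks a shortened version of the family tree (only the persons up to p9 that matter for the count) and picks the third BORN element in document order, which is what /descendant::BORN[position()=3] asks for. It uses plain iteration rather than an XPath engine.

```python
import xml.etree.ElementTree as ET

# Shortened family tree: the persons carrying the first four BORN elements.
DOC = """<FAMILYTREE>
  <PERSON ID="p1"><NAME>Domeniquette Celeste Baudean</NAME><BORN>21 Apr 1836</BORN></PERSON>
  <PERSON ID="p2"><NAME>Jean Francois Bellau</NAME></PERSON>
  <PERSON ID="p3"><NAME>Elodie Bellau</NAME><BORN>11 Feb 1858</BORN></PERSON>
  <PERSON ID="p8"><NAME>Louise Pauline Bellau</NAME><BORN>29 Oct 1868</BORN></PERSON>
  <PERSON ID="p9"><NAME>Charles Walter Harold</NAME><BORN>about 1861</BORN></PERSON>
</FAMILYTREE>"""

root = ET.fromstring(DOC)

# iter() visits elements depth first, that is, in document order.
borns = list(root.iter("BORN"))
third = borns[2]  # XPath positions are 1-based, Python indexes are 0-based

print(third.text)
```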
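A double slash is an abbreviation for /descendant-or-self::node()/,
which usually serves as shorthand for the descendant axis. For example,
//NAME
selects all
NAME
elements in the document.
//PERSON/NAME
selects all NAME
children of PERSON
elements. Be careful with positions, though. A predicate applies to the step it's attached to, so
//BORN[position()=3]
selects every BORN
element that is the third BORN
child of its parent, not the third BORN
element in the document. To select the third one in document order with the abbreviated syntax, parenthesize the path first:
(//BORN)[position()=3]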
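A positional predicate inside a location step counts among the siblings of each parent, while a search of the whole document counts in document order. The sketch below shows the difference using Python's xml.etree.ElementTree, whose limited path syntax counts positions per parent in the same way. The sample document and its names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: every PERSON has exactly one BORN child.
SAMPLE = """<FAMILYTREE>
  <PERSON><NAME>A</NAME><BORN>1 Jan 1801</BORN></PERSON>
  <PERSON><NAME>B</NAME><BORN>2 Feb 1802</BORN></PERSON>
  <PERSON><NAME>C</NAME><BORN>3 Mar 1803</BORN></PERSON>
</FAMILYTREE>"""

root = ET.fromstring(SAMPLE)

# Per-parent counting: BORN elements that are the third BORN child of
# their parent. No PERSON has three BORN children, so this is empty.
third_per_parent = root.findall(".//BORN[3]")

# Per-parent counting again: every BORN is the first BORN of its parent.
first_per_parent = root.findall(".//BORN[1]")

# Document-order counting: the third BORN met while reading top to bottom.
third_in_document = list(root.iter("BORN"))[2]

print(len(third_per_parent), len(first_per_parent), third_in_document.text)
```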
The
descendant-or-self
axis selects the context node itself along with all of its
descendants. For example,
id("p11")/descendant-or-self::PERSON
refers to all
PERSON
descendants of the element with ID p11 as well
as that element itself, since it is of type PERSON.
The
descendant-or-self
axis has no abbreviation of its own, although // is defined as shorthand for /descendant-or-self::node()/.
The parent
axis refers
to the node that's the immediate parent of the context node. For
example,
/descendant::HUSBAND[position()=1]/parent::*
refers
to the parent element of the first HUSBAND
element
in the document.
Without a node test the parent axis can be abbreviated by a
..
as in
//HUSBAND[position()=1]/..
.
The self
axis refers to
the context node. It's sometimes useful when making relative
links. For example, /self::node()
selects the root
node of the document (which is not the same as the root element
of the document; that would be selected by
/child::*
or, in this example,
/child::FAMILYTREE
.) It can be abbreviated by a single
period. However, this axis is rarely used in XPointers. It's
more useful for XSLT select expressions.
The ancestor
axis
selects all nodes that contain the context node, starting with
its parent. For example,
/descendant::BORN[position()=2]/ancestor::*[position()=1]
selects the element which contains the second BORN
element. In this example, it selects Elodie Bellau's
PERSON
element. There's no abbreviation for the
ancestor
axis.
The
ancestor-or-self
axis selects the context node and
all nodes that contain it. For example,
id("p1")/ancestor-or-self::*
selects a node set
including Domeniquette Celeste Baudean's PERSON
element that has ID p1 and its parent, the FAMILYTREE
element. The root node is not included because * matches only elements; ancestor-or-self::node() would add it. There's also no
abbreviation for the ancestor-or-self
axis.
The preceding
axis
selects all elements that occur before the context node. The
preceding
axis has no respect for hierarchy. The
first time it encounters an element's start tag, end tag, or
empty tag, it counts that element. For example, consider this
rule:
/descendant::BORN[position()=3]/preceding::*[position()=5]
This says go to the third BORN
element from the root, Louise
Pauline Bellau's birthday, <BORN>29 Oct 1868</BORN>
,
and then move back five elements. This lands on Maria Bellau's
PERSON
element. There's no abbreviation for the
preceding
axis.
The following
axis selects all elements that occur after the
context node's closing tag. Like preceding
, following
has no respect for hierarchy. The first time it encounters an element's start
tag or empty tag, it counts that element. For example, consider this rule:
/descendant::BORN[position()=2]/following::*[position()=5]
This says go to Elodie Bellau's birthday, <BORN>11 Feb
1858</BORN>
, and then move forward five elements. This lands on
John P. Muller's NAME
element, <NAME>John P.
Muller</NAME>
, after passing through Elodie Bellau's DIED
element, Elodie Bellau's SPOUSE
element, Elodie Bellau's
PERSON
element, and John P. Muller's PERSON
element,
in this order. There's no abbreviation for the following
axis.
The preceding-sibling
axis selects elements that precede the
context node in the same parent element. For example,
/descendant::BORN[position()=2]/preceding-sibling::*[position()=1]
selects Elodie Bellau's NAME
element, <NAME>Elodie Bellau</NAME>. By contrast,
/descendant::BORN[position()=2]/preceding-sibling::*[position()=2]
doesn't point to anything because there's
only one sibling of Elodie Bellau's BORN
element before it. There's
no abbreviation for the preceding-sibling
axis.
The following-sibling
axis selects elements that follow the
context node in the same parent element. For example,
/descendant::BORN[position()=2]/following-sibling::*[position()=1]
selects Elodie Bellau's DIED
element, <DIED>12 Apr 1898</DIED>. By contrast,
/descendant::BORN[position()=2]/following-sibling::*[position()=3]
doesn't point to anything because there are only
two sibling elements following Elodie Bellau's BORN
element.
There's no abbreviation for the following-sibling
axis.
The attribute
axis selects an attribute node contained by the
context node. For example, the XPointer
/descendant::SPOUSE/attribute::IDREF
selects all IDREF
attributes of all SPOUSE
elements in the document. The attribute
axis can be abbreviated by an @
sign. Thus
//SPOUSE/@IDREF
selects all IDREF
attributes of all
SPOUSE
elements in the document. @*
is a general
abbreviation for an attribute with any name. Thus //SPOUSE/@*
indicates all attributes of all SPOUSE
elements.
For another example, to find all PERSON
elements in the document
http://www.theharolds.com/genealogy.xml
whose FATHER
attribute is Jean Francois Bellau (ID p2), you could write
//PERSON[@FATHER="p2"]
.
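As a rough illustration, the attribute selections above can be approximated with Python's xml.etree.ElementTree. ElementTree cannot return attribute nodes, so this sketch reads attribute values from the selected elements instead. The sample document is a hypothetical excerpt that reuses names from the family tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt; only the attributes matter here.
SAMPLE = """<FAMILYTREE>
  <PERSON ID="p1"><NAME>Domeniquette Celeste Baudean</NAME><SPOUSE IDREF="p2"/></PERSON>
  <PERSON ID="p2"><NAME>Jean Francois Bellau</NAME><SPOUSE IDREF="p1"/></PERSON>
  <PERSON ID="p12" FATHER="p2" MOTHER="p1"><NAME>Honore Bellau</NAME></PERSON>
</FAMILYTREE>"""

root = ET.fromstring(SAMPLE)

# Counterpart of //SPOUSE/@IDREF: the IDREF value of every SPOUSE element.
spouse_refs = [spouse.get("IDREF") for spouse in root.iter("SPOUSE")]

# Counterpart of //SPOUSE/@*: every attribute of every SPOUSE element.
spouse_attributes = [dict(spouse.attrib) for spouse in root.iter("SPOUSE")]

# Counterpart of //PERSON[@FATHER="p2"]: PERSON elements whose FATHER is p2.
children_of_p2 = root.findall(".//PERSON[@FATHER='p2']")

print(spouse_refs)
print([person.findtext("NAME") for person in children_of_p2])
```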
Most of the time the node test part of a location step is simply an
element name like PERSON
or BORN.
However, there are seven other possibilities:
*
node()
text()
comment()
processing-instruction()
point()
range()
<CITATION CLASS="TURING" ID="C2">
<AUTHOR>Turing, Alan M.</AUTHOR>
"<TITLE>On Computable Numbers,
With an Application to the Entscheidungs-problem</TITLE>"
<JOURNAL>
Proceedings of the London Mathematical Society</JOURNAL>,
<SERIES>Series 2</SERIES>,
<VOLUME>42</VOLUME>
(<YEAR>1936</YEAR>):
<PAGES>230-65</PAGES>.
</CITATION>
The following XPointer refers to the quotation mark before
the TITLE
element.
id("C2")/child::text()[position()=2]
The first text node in this fragment is the whitespace
between <CITATION CLASS="TURING" ID="C2">
and
<AUTHOR>
. Technically, this XPointer refers
to all text between </AUTHOR>
and
<TITLE>
, including the whitespace and not
just the quotation mark.
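To see what the second text node actually contains, here is a small sketch using Python's xml.etree.ElementTree. ElementTree has no text nodes as such. It stores the text before the first child in the parent's text property and the text after each child in that child's tail property, so those two properties stand in for text()[1] and text()[2]. The citation is shortened from the fragment above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the citation fragment.
CITATION = """<CITATION CLASS="TURING" ID="C2">
<AUTHOR>Turing, Alan M.</AUTHOR>
"<TITLE>On Computable Numbers,
With an Application to the Entscheidungs-problem</TITLE>"
</CITATION>"""

citation = ET.fromstring(CITATION)

# Stand-in for text()[1]: the white space between the start tag and <AUTHOR>.
first_text = citation.text

# Stand-in for text()[2]: everything between </AUTHOR> and <TITLE>.
second_text = citation.find("AUTHOR").tail

print(repr(first_text))
print(repr(second_text))
```

The second value is a line break followed by the quotation mark, not the quotation mark alone.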
Because character data does not contain any child nodes, most
relative location steps may not be attached to an XPointer that
selects a text node. The exception is the
point()
node test
which will be discussed later.
The
comment()
node test specifically refers to
comments. For example, this XPointer points to the third comment
in the document:
/descendant::comment()[position()=3]
Because comments do not contain attributes or elements, you
cannot add an additional child
,
descendant
, or attribute
relative
location step after the first term that selects a comment.
Finally, the processing-instruction()
node test
selects any processing instructions that occur along the chosen
axis. You can use it without any arguments to select any
processing instructions, or with arguments to specify the
particular processing instruction targets you want to select.
For example, /descendant::processing-instruction()
selects all processing instructions in the document. However,
/descendant::processing-instruction("xml-stylesheet")
only finds processing instructions that begin
<?xml-stylesheet.
/descendant::processing-instruction("php")
only finds
processing instructions intended for PHP. As with comments,
because processing instructions do not contain attributes or
elements, you cannot add an additional child
,
descendant
, or attribute
relative
location term after the first term that selects a processing
instruction.
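The effect of the optional target argument can be sketched with Python's xml.dom.minidom, which keeps processing instructions in the tree. The helper name processing_instructions and the sample document are made up for this illustration. This is a plain tree walk, not an XPointer implementation.

```python
from xml.dom import minidom, Node

# Hypothetical sample with two processing instructions.
SAMPLE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="song.css"?>
<SONG>
  <?php echo "hi"; ?>
  <TITLE>Hot Cop</TITLE>
</SONG>"""


def processing_instructions(node, target=None):
    """Yield processing instructions below node in document order.

    With target=None every processing instruction is returned; otherwise
    only those whose target name matches.
    """
    for child in node.childNodes:
        if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            if target is None or child.target == target:
                yield child
        else:
            yield from processing_instructions(child, target)


document = minidom.parseString(SAMPLE)

all_targets = [pi.target for pi in processing_instructions(document)]
php_only = list(processing_instructions(document, "php"))

print(all_targets)
print(len(php_only))
```

Note that the XML declaration is not a processing instruction, so it does not show up in the results.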
The point()
and range()
node tests refer
to new ways of dividing an XML document.
They will be discussed below.
Although the other node tests all end with parentheses, none
of them except processing-instruction()
actually
takes any arguments.
All elements that have a specified attribute
All elements that have a specified attribute with a specified value
The first element that contains a specified child element
An element whose text content includes a specified string
All elements that are not the first or last children of their parents
All elements whose value is a number
All elements whose value is a number greater than 100
These are just a small sampling of the selections that predicates make possible.
XPath predicate expressions are ultimately converted to a boolean after all calculations are finished. Non-boolean results are converted as follows:
A number is true if it's equal to the position of the context node, false otherwise.
An empty node set is false; all other node sets are true.
An empty result fragment is false; all other result fragments are true.
A zero length string is false; all other strings are true (including the string "false")
The predicate expression is evaluated for each node in the context node list. Each node for which the expression ultimately evaluates to false is removed from the list. Thus only those nodes that satisfy the predicate remain. I will not repeat here the discussion of the operators and functions available for use in expressions. However, I will show you a few examples of predicates using the expression syntax as it's likely to be used in XPointers.
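The conversion rules and the filtering step can be modeled in a few lines of Python. This is a toy model, not an XPath engine: node sets are represented as plain lists, and the function names predicate_is_true and apply_predicate are invented for the sketch.

```python
def predicate_is_true(value, position):
    """Convert a predicate result to a boolean, following the rules above."""
    # bool must be checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # A number is true only if it equals the context position.
        return value == position
    if isinstance(value, str):
        # Any non-empty string is true, including the string "false".
        return len(value) > 0
    # Anything else is treated as a node set, modeled here as a list.
    return len(value) > 0


def apply_predicate(nodes, expression):
    """Keep the nodes for which expression(node, position) is true."""
    return [
        node
        for position, node in enumerate(nodes, start=1)
        if predicate_is_true(expression(node, position), position)
    ]


people = ["p1", "p2", "p3"]

# [2] keeps only the second node.
second = apply_predicate(people, lambda node, position: 2)

# [position() != 1] keeps everything except the first node.
not_first = apply_predicate(people, lambda node, position: position != 1)

print(second, not_first)
```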
Probably the function most frequently used in XPointer
predicates is position()
. This returns the index of
the node in the context node list. This allows you to find the
first, second, third, or other indexed node. You can compare
positions using the various relational operators like
<
, >
, =
,
!=
, >=
, and <=
.
xpointer(/child::FAMILYTREE/child::*[position()=1])
xpointer(/child::FAMILYTREE/child::*[position()=2])
xpointer(/child::FAMILYTREE/child::*[position()=3])
xpointer(/child::FAMILYTREE/child::*[position()=4])
xpointer(/child::FAMILYTREE/child::*[position()=5])
xpointer(/child::FAMILYTREE/child::*[position()=6])
xpointer(/child::FAMILYTREE/child::*[position()=7])
xpointer(/child::FAMILYTREE/child::*[position()=8])
xpointer(/child::FAMILYTREE/child::*[position()=9])
xpointer(/child::FAMILYTREE/child::*[position()=10])
xpointer(/child::FAMILYTREE/child::*[position()=11])
xpointer(/child::FAMILYTREE/child::*[position()=12])
xpointer(/child::FAMILYTREE/child::*[position()=13])
xpointer(/child::FAMILYTREE/child::*[position()=14])
xpointer(/child::FAMILYTREE/child::*[1])
xpointer(/child::FAMILYTREE/child::*[2])
xpointer(/child::FAMILYTREE/child::*[3])
xpointer(/child::FAMILYTREE/child::*[4])
xpointer(/child::FAMILYTREE/child::*[5])
xpointer(/child::FAMILYTREE/child::*[6])
xpointer(/child::FAMILYTREE/child::*[7])
xpointer(/child::FAMILYTREE/child::*[8])
xpointer(/child::FAMILYTREE/child::*[9])
xpointer(/child::FAMILYTREE/child::*[10])
xpointer(/child::FAMILYTREE/child::*[11])
xpointer(/child::FAMILYTREE/child::*[12])
xpointer(/child::FAMILYTREE/child::*[13])
xpointer(/child::FAMILYTREE/child::*[14])
id()
root()
here()
origin()
The last two, here()
and origin()
are XPointer extensions to XPath that are not available in XSLT.
The id()
function is one of the simplest and most robust means of
identifying an element node. It selects the element in the
document that has an ID type attribute with a specified value.
For example, consider the URI
http://www.theharolds.com/genealogy.xml#xpointer(id("p12")). If
you look back at Listing 17-1, you find this element:
<PERSON ID="p12" FATHER="p2" MOTHER="p1">
<NAME>Honore Bellau</NAME>
</PERSON>
Since ID pointers are so common and so useful, there's also
a shortcut for this. If all you want to do is point to a
particular element with a particular ID, you can skip all the
xpointer(id(""))
frou-frou and just use the
bare ID after the #
like this:
http://www.theharolds.com/genealogy.xml#p12
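A processor might resolve both forms of ID pointer roughly as follows. This Python sketch rests on two assumptions. ElementTree does not read DTDs, so any attribute literally named ID is treated as an ID type attribute. The helper resolve_fragment is invented for the illustration and understands only these two fragment forms.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical excerpt containing the element discussed above.
SAMPLE = """<FAMILYTREE>
  <PERSON ID="p1"><NAME>Domeniquette Celeste Baudean</NAME></PERSON>
  <PERSON ID="p2"><NAME>Jean Francois Bellau</NAME></PERSON>
  <PERSON ID="p12" FATHER="p2" MOTHER="p1"><NAME>Honore Bellau</NAME></PERSON>
</FAMILYTREE>"""

root = ET.fromstring(SAMPLE)

# Assumption: an attribute named ID plays the role of the ID type attribute.
elements_by_id = {
    element.get("ID"): element
    for element in root.iter()
    if element.get("ID") is not None
}


def resolve_fragment(fragment):
    """Resolve xpointer(id("...")) or a bare ID to an element, or None."""
    match = re.fullmatch(r'xpointer\(id\("([^"]+)"\)\)', fragment)
    identifier = match.group(1) if match else fragment
    return elements_by_id.get(identifier)


full_form = resolve_fragment('xpointer(id("p12"))')
bare_form = resolve_fragment("p12")

print(full_form.findtext("NAME"), full_form is bare_form)
```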
XPointers are evaluated from left to right. The first match found is returned, so the backup is only used if an ID type attribute with the value p12 can't be found.
Consider a simple slide show. In this example,
here()/following::SLIDE[1]
refers to the next slide in the
show. here()/preceding::SLIDE[1]
refers to the previous slide
in the show. Presumably this would be used in conjunction with a
style sheet that showed one slide at a time.
<?xml version="1.0"?>
<SLIDESHOW xmlns:xlink="http://www.w3.org/1999/xlink">
<SLIDE>
<H1>Welcome to the slide show!</H1>
<BUTTON xlink:type="simple"
xlink:href="here()/following::SLIDE[1]">
Next
</BUTTON>
</SLIDE>
<SLIDE>
<H1>This is the second slide</H1>
<BUTTON xlink:type="simple"
xlink:href="here()/preceding::SLIDE[1]">
Previous
</BUTTON>
<BUTTON xlink:type="simple"
xlink:href="here()/following::SLIDE[1]">
Next
</BUTTON>
</SLIDE>
<SLIDE>
<H1>This is the third slide</H1>
<BUTTON xlink:type="simple"
xlink:href="here()/preceding::SLIDE[1]">
Previous
</BUTTON>
<BUTTON xlink:type="simple"
xlink:href="here()/following::SLIDE[1]">
Next
</BUTTON>
</SLIDE>
...
<SLIDE>
<H1>This is the last slide</H1>
<BUTTON xlink:type="simple"
xlink:href="here()/preceding::SLIDE[1]">
Previous
</BUTTON>
</SLIDE>
</SLIDESHOW>
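The way a viewer might follow these links can be sketched in Python. This is not an XPointer implementation. It recognizes only the two href values used in the slide show, and the function names are invented for the example. It relies on the fact that the SLIDE enclosing a button is an ancestor of that button, so the preceding axis skips it and lands on the previous slide.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Shortened version of the slide show above.
SHOW = """<SLIDESHOW xmlns:xlink="http://www.w3.org/1999/xlink">
  <SLIDE>
    <H1>Welcome to the slide show!</H1>
    <BUTTON xlink:type="simple" xlink:href="here()/following::SLIDE[1]">Next</BUTTON>
  </SLIDE>
  <SLIDE>
    <H1>This is the second slide</H1>
    <BUTTON xlink:type="simple" xlink:href="here()/preceding::SLIDE[1]">Previous</BUTTON>
    <BUTTON xlink:type="simple" xlink:href="here()/following::SLIDE[1]">Next</BUTTON>
  </SLIDE>
  <SLIDE>
    <H1>This is the last slide</H1>
    <BUTTON xlink:type="simple" xlink:href="here()/preceding::SLIDE[1]">Previous</BUTTON>
  </SLIDE>
</SLIDESHOW>"""

root = ET.fromstring(SHOW)
slides = root.findall("SLIDE")

# Only the two link forms used in the example are understood.
STEPS = {"here()/following::SLIDE[1]": 1, "here()/preceding::SLIDE[1]": -1}


def containing_slide_index(button):
    """Return the index of the SLIDE that contains the given BUTTON."""
    for index, slide in enumerate(slides):
        if any(element is button for element in slide.iter()):
            return index
    raise ValueError("button is not inside a SLIDE")


def target_slide(button):
    """Return the SLIDE the button's link points to, or None if there is none."""
    target = containing_slide_index(button) + STEPS[button.get(XLINK_HREF)]
    return slides[target] if 0 <= target < len(slides) else None


previous_button, next_button = slides[1].findall("BUTTON")
print(target_slide(previous_button).findtext("H1"))
print(target_slide(next_button).findtext("H1"))
```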
Generally, the here()
location term is only used in fully
relative URIs in XLinks. If any URI part is included, it must be
the same as the URI of the current document.
The origin()
function is much the same as here()
;
that is, it refers to the source of a link. However,
origin()
is used in out-of-line links where the
link is not actually present in the source document. It points to the
element in the source document from which the user activated the link.
<BORN>11 Feb 1858</BORN>
Every point is either between two nodes or between two characters in the parsed character data of a document. To make sense of this you have to remember that parsed character data is part of a text node. For instance, consider this very simple but well-formed XML document:
<GREETING>
Hello
</GREETING>
There are exactly three nodes and 13 distinct points in this document. In order the points are:
The point before the root node
The point before the GREETING
element node
The point before the text node containing the text "Hello" (as well as assorted white space)
The point before the white space between <GREETING>
and Hello.
The point before the first H in Hello
The point between the H and the e in Hello
The point between the e and the l in Hello
The point between the l and the l in Hello
The point between the l and the o in Hello
The point after the o in Hello
The point after the white space between Hello and </GREETING>
.
The point after the GREETING
element.
The point after the root node.
The exact details of the white space in the document are not considered here. XPointer collapses all runs of white space to a single space.
A point is selected by writing an XPath expression that points at
a node and then suffixing it with
/point()[position()=n]
where n
is
the index of the point following that node that you want. The
index refers to the point before the nth child element if the
context node is an element or root node, or to the nth character
of the string value of the node otherwise. For example, to
select the point immediately before the D in Domeniquette
Celeste Baudean's NAME
element, use:
/child::FAMILYTREE/descendant::*[position()=1]/child::NAME/child::text()/point()[position()=0]
To select the point after the last e in Domeniquette, since there are 12 letters in Domeniquette,
/child::FAMILYTREE/descendant::*[position()=1]/child::NAME/child::text()/point()[position()=12]
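Character points behave much like slice offsets in a string: point n sits between the first n characters and the rest. The Python sketch below illustrates that analogy only. The function name split_at_point is made up.

```python
name = "Domeniquette Celeste Baudean"


def split_at_point(text, n):
    """Return the text before and after character point n."""
    if not 0 <= n <= len(text):
        raise IndexError("point index out of range")
    return text[:n], text[n:]


# Point 0 lies immediately before the D.
before_start, after_start = split_at_point(name, 0)

# Point 12 lies immediately after the last e in Domeniquette.
before_twelve, after_twelve = split_at_point(name, 12)

print(repr(before_start), repr(before_twelve))
```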
In some applications it may be important to specify a range across a document rather than a particular point in the document. For instance, the selection a user makes with a mouse is not necessarily going to match up with any one element or node. It may start in the middle of one paragraph, extend across a heading and a picture and then into the middle of another paragraph two pages down.
Any such contiguous area of a document can be described with a range. A range begins at one point and continues until another point. Each point is identified by a location path. If the starting path points to a node set rather than a point, then the first point in the location set the XPointer identifies is the start point. If the ending location path points to a node set rather than a point, then the last point in the location set the XPointer identifies is the end point of the range.
To specify a range, you append
/range-to(end-point)
to a location path specifying the start point of the range. The
parentheses contain a location path specifying the endpoint
of the range. For example, suppose you want to select everything
between the first PERSON
element and the last
PERSON
element:
xpointer(/child::FAMILYTREE/child::PERSON[position() = 1]/range-to(/child::FAMILYTREE/child::PERSON[position() = last()]))
range(location-set)
range-inside(location-set)
start-point(location-set)
start-point(//PERSON[1])
returns the point immediately before the first PERSON
element.
start-point(//PERSON)
returns the set of points immediately before each PERSON
element.
end-point(location-set)
works like start-point()
except that it returns the
points immediately after each location in its input.
XPointer provides some very basic string matching
capabilities through the string-range()
function. This function
takes as an argument a node set to search and a substring to
search for. It returns a location set containing one range for each
non-overlapping match to the string. You can also provide
optional index and length arguments indicating how many
characters after the match the range should start and how many
characters after the start the range should continue. The basic
syntax is:
string-range(node-set,substring,index,length)
The first node-set argument is an XPath expression specifying which part of the document to search for a matching string. The second substring argument is the actual string to search for. By default, the range returned starts before the first matched character and encompasses all the matched characters. However, the index argument can give a positive number to start after the start of the match. For instance, setting it to 2 would indicate that the range starts after the first matched character. The length argument can specify how many characters to include in the range.
A string range points to an occurrence of a specified string, or a substring of a given string in the text (not markup) of the document. For example, this XPointer finds all occurrences of the string "Harold":
xpointer(string-range(/,"Harold"))
You can change the first argument to specify what nodes you want
to look in. For example, this XPointer finds all occurrences of
the string "Harold" in NAME
elements:
xpointer(string-range(//NAME,"Harold"))
A string range may be followed by a predicate. Thus this XPointer finds only the first occurrence of the string "Harold" in the document:
xpointer(string-range(/,"Harold")[position()=1])
This targets the word Harold
in Charles Walter Harold's NAME
element. This is not
the same as pointing at the entire NAME
element as an
element-based selector would do.
A third numeric argument gives the position, counted from the start of the match, at which the range begins. For example, this range begins at the sixth character of the first occurrence of the string "Harold" and, because Harold has six letters, covers only the final d:
xpointer(string-range(/,"Harold",6)[position()=1])
An optional fourth argument specifies the number of characters to select. For example, this URI selects the "old" from the first occurrence of the entire string "Harold":
xpointer(string-range(/,"Harold",4,3)[position()=1])
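The index and length arguments are easier to follow with a small model. The Python function below is an illustrative approximation that works on a plain string rather than a document. It assumes the index counts from 1 at the first matched character and that the default length runs to the end of the match. It does not handle an empty search string, and the name string_ranges is invented.

```python
def string_ranges(text, substring, index=1, length=None):
    """Return (start, end) offsets, one pair per non-overlapping match."""
    if not substring:
        return []  # empty search strings are outside this sketch
    if length is None:
        # Default: from the index position through the end of the match.
        length = max(0, len(substring) - (index - 1))
    ranges = []
    search_from = 0
    while True:
        found = text.find(substring, search_from)
        if found == -1:
            break
        start = found + index - 1
        ranges.append((start, start + length))
        search_from = found + len(substring)  # skip past this match
    return ranges


# Hypothetical text containing two matches.
text = "Charles Walter Harold and Mary Harold"

whole = [text[s:e] for s, e in string_ranges(text, "Harold")]
tail = [text[s:e] for s, e in string_ranges(text, "Harold", 4, 3)]
from_second = [text[s:e] for s, e in string_ranges(text, "Harold", 2)]

print(whole, tail, from_second)
```

Adding a list index such as [0] to any of these results plays the role of the [position()=1] predicate.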
If the string argument is the empty string, then positions in the text content of the searched nodes are selected directly. For example, the following XPointer targets the first six characters of the document's parsed character data:
xpointer(string-range(/,"",1,6)[position()=1])
For another example, let's suppose you want to find the year of birth for all people born in the nineteenth century. The following will accomplish that:
xpointer(string-range(//BORN, " 18", 2, 4))
This says to look in all BORN
elements for the string " 18".
(The initial space is important to avoid accidentally matching
someone born in 1918 or on the 18th day of the month.) When it's
found move one character ahead (to skip the space) and return a
range covering the next four characters.
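The same search can be imitated in Python to check the reasoning about the leading space. The sample document is hypothetical. Its third person is invented to show a date that must not match.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the last BORN value is there to test the leading space.
SAMPLE = """<FAMILYTREE>
  <PERSON><NAME>Elodie Bellau</NAME><BORN>11 Feb 1858</BORN></PERSON>
  <PERSON><NAME>Louise Pauline Bellau</NAME><BORN>29 Oct 1868</BORN></PERSON>
  <PERSON><NAME>Hypothetical Person</NAME><BORN>18 Jun 1918</BORN></PERSON>
</FAMILYTREE>"""

years = []
for born in ET.fromstring(SAMPLE).iter("BORN"):
    text = born.text or ""
    found = text.find(" 18")  # the space rules out "1918" and a leading "18"
    if found != -1:
        # Skip the space, then take the next four characters.
        years.append(text[found + 1:found + 5])

print(years)
```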
When matching strings, case is considered. All white space is condensed to a single space. Markup characters are ignored.
A child sequence is a shortcut for XPointers exemplified by the second example above; that is, an XPointer that consists of nothing but a series of child relative location steps counting down from the root node, each of which selects a particular child by position only. The shortcut is to use only the position number and the slashes that separate individual elements from each other, like this:
http://www.theharolds.com/genealogy.xml#/1/4
/1/4
is a child sequence that says to select the
fourth child element of the first child element of the root.
This syntax can be extended for any depth of child elements. For
example these two URIs point to John P. Muller's
NAME
and SPOUSE
elements respectively:
http://www.theharolds.com/genealogy.xml#/1/4/1
http://www.theharolds.com/genealogy.xml#/1/4/2
Child sequences may include an initial ID. In that case the
counting begins from the element with that ID rather than from
the root. For example, John P. Muller's PERSON
element has an ID
attribute with the value p4
. Consequently the XPointer p4/1
points to his NAME
element and p4/2
points to his SPOUSE
element.
Each child sequence always points to a single element. You cannot use child sequences with any other relative location steps. You cannot use them to select elements of a particular type. You cannot use them to select attributes or strings. You can only use them to select a single element by its relative location in the tree.
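Resolving a child sequence is simple enough to sketch in a few lines of Python. The function resolve_child_sequence and the sample document are invented for this illustration. As in the earlier discussion of IDs, an attribute literally named ID is assumed to be the ID type attribute, because ElementTree does not read DTDs.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt; the ID and IDREF values are assumptions.
SAMPLE = """<FAMILYTREE>
  <PERSON ID="p1"><NAME>Domeniquette Celeste Baudean</NAME></PERSON>
  <PERSON ID="p2"><NAME>Jean Francois Bellau</NAME></PERSON>
  <PERSON ID="p3"><NAME>Elodie Bellau</NAME></PERSON>
  <PERSON ID="p4"><NAME>John P. Muller</NAME><SPOUSE IDREF="p3"/></PERSON>
</FAMILYTREE>"""

root = ET.fromstring(SAMPLE)


def resolve_child_sequence(root_element, sequence):
    """Resolve a child sequence such as /1/4/1 or p4/2 to an element or None."""
    start, *steps = sequence.split("/")
    if start == "":
        # Counting starts at the root node, whose only element child is
        # the root element, so the first step must be 1.
        if not steps or steps[0] != "1":
            return None
        current = root_element
        steps = steps[1:]
    else:
        # Counting starts at the element with the given ID.
        current = next(
            (el for el in root_element.iter() if el.get("ID") == start), None
        )
    for step in steps:
        if current is None:
            return None
        children = list(current)  # child elements in document order
        position = int(step)
        current = children[position - 1] if 1 <= position <= len(children) else None
    return current


name = resolve_child_sequence(root, "/1/4/1")
spouse = resolve_child_sequence(root, "p4/2")

print(name.text, spouse.tag)
```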
XPointers refer to particular parts of or locations in XML documents.
The syntax of an XPointer is the keyword xpointer
, followed
by parentheses containing an XPath expression that returns a
node set.
The id()
function points to an element with a
specified value for an ID type attribute.
Location steps can be chained to make more sophisticated location paths.
Each location step contains an axis, a node test, and zero or more predicates.
Relative location steps select nodes in a document based on their relationship to a context node.
The self
axis points to the context node. It
can be abbreviated as a period (.
).
The parent
axis points to the node that
contains the context node. It can be abbreviated as a double
period (..
).
The child
axis points to immediate children of
the context node. It can be abbreviated simply by a node test.
The descendant
axis points to all elements
contained in the context node. It can be abbreviated as a double
slash (//
).
The descendant-or-self
axis points to all
elements contained in the context node as well as the context
node itself.
The ancestor
axis points to an element that
contains the context node.
The ancestor-or-self
axis points to all
elements that contain the context node as well as the context
node itself.
The preceding
axis points to any element that
comes before the context node.
The following
axis points to any element
following the context node.
The preceding-sibling
axis selects from sibling
elements that precede the context node.
The following-sibling
axis selects from sibling
elements that follow the context node.
The attribute axis points to an attribute of the context
node. It can be abbreviated as a @
sign.
The node test of a relative location step is normally an
element name, but may also be the *
wild card to
select all elements or one of the keywords
comment()
, text()
,
processing-instruction()
, node()
,
point()
or range()
.
The optional predicate of a relative location step is an XPath boolean expression enclosed in square brackets that further narrows down the node set the XPointer refers to.
A point indicates a position preceding or following a node or a character.
A range identifies the parsed character data between two points.
The string-range()
function points to a
specified block of text.
A child sequence points to an element by counting children from the root.
This presentation: http://www.ibiblio.org/xml/slides/sd2000east/advancedxml
XPointer Specification: http://www.w3.org/TR/xptr
Chapter 17 of the XML Bible: http://www.ibiblio.org/xml/books/bible/updates/17.html
The triumph of worse is better
Developers need them desperately
Far too complex to be used as broadly as they're needed; experts only
Will be replaced within ten years; much like Java has replaced C
Won't succeed unless and until there's a killer app
First company to define the killer app gets to fill in the holes in the spec over the protests of the W3C and the hypertext community
Won't succeed unless and until there's a killer app
First company to define the killer app gets to fill in the holes in the spec over the protests of the W3C and the hypertext community
XSLT 1.1
XSL-FO
DOM3
XHTML
Query languages
Schema Repositories
MathML
SVG
Browser support
A standard function to create multiple output documents
A standard function to convert strings to node sets
Slow but successful adoption; steady linear growth
Grammar access
Extra attributes on Entity, Document, Node, and Text interfaces
Standard means of loading and saving XML documents. (This pulls the last leg out from under JAXP.)
Key Events
Too complex
Too little tool support
Too poorly documented
Offer no benefits to web page authors; the only people benefited are the tool vendors
Too early to say
XML still isn't a database
Commerce One
UDDI
BizTalk
xml.org
etc.
Mozilla will save this
Illustrator supports it now
Several browser plug-ins are available
Many tools
We needed this 10 years ago
We won't see reliable browser support for XML until at least 2002
Non-PC devices will become common; necessitating a move to browser-independent layout
Mozilla 2.0 knocks off IE
The best way to predict the future is to invent it.--Alan Kay
This presentation: http://www.ibiblio.org/xml/slides/sd2000east/advancedxml
JDOM Web Site: http://www.jdom.org
XML InfoSet Specification: http://www.w3.org/TR/xml-infoset
XML Base Specification: http://www.w3.org/TR/xmlbase
XInclude Specification: http://www.w3.org/TR/xinclude
XPointers: http://www.ibiblio.org/xml/books/bible/updates/17.html
XLinks: http://www.ibiblio.org/xml/books/bible/updates/16.html
XPath Specification: http://www.w3.org/TR/xpath
W3C Schema Primer: http://www.w3.org/TR/xmlschema-0/
XML Schema Part 1: Structures: http://www.w3.org/TR/xmlschema-1/
XML Schema Part 2: Datatypes: http://www.w3.org/TR/xmlschema-2/