Cutting Edge XML Programming
Elliotte Rusty Harold
XMLOne Europe
Wednesday, September 19, 2001
elharo@metalab.unc.edu
http://www.ibiblio.org/xml/
Part I: SAX 2.1
Part II: DOM Level 3
Part III: XSLT 2.0 and Beyond
Actually, SAX2 has ** MUCH ** better infoset support than DOM does. Yes, I've done the detailed analysis.
--David Brownell on the xml-dev mailing list
Full Infoset support
Backwards compatible with SAX2
Much less radical changes than from SAX1 to SAX2
Infoset includes a flag saying whether a given attribute value was specified in the instance document or defaulted from the DTD.
DOM also wants to know this
Solution:
package org.xml.sax.ext;
public interface Attributes2 extends Attributes {
public boolean isSpecified (int index);
public boolean isSpecified (String uri, String localName);
public boolean isSpecified (String qualifiedName);
}
This interface would be implemented by the SAX 2.1 Attributes objects provided in startElement() callbacks.
The read-only http://xml.org/sax/features/use-attributes2 feature specifies whether Attributes2 is available.
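An interface named Attributes2 with these isSpecified() methods did later ship in org.xml.sax.ext, so the idea can be tried today. The following is only a sketch: the class name SpecifiedAttributes and the sample document are invented for illustration, and because a parser is not obliged to supply Attributes2, the handler checks with instanceof before casting.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of SAX.
public class SpecifiedAttributes {

    // Maps each attribute's qualified name to true if it was written in the
    // instance document, false if it was defaulted from the DTD.
    public static Map<String, Boolean> specifiedFlags(String xml) throws Exception {
        final Map<String, Boolean> flags = new LinkedHashMap<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Parsers without the extension pass a plain Attributes object.
                if (!(atts instanceof Attributes2)) return;
                Attributes2 atts2 = (Attributes2) atts;
                for (int i = 0; i < atts2.getLength(); i++) {
                    flags.put(atts2.getQName(i), atts2.isSpecified(i));
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return flags;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE t [<!ATTLIST t units CDATA \"seconds\">]>"
                   + "<t value=\"5\"/>";
        System.out.println(specifiedFlags(xml));
    }
}
```

If the map comes back empty, the parser in use does not provide Attributes2.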
<?xml version="1.0" standalone="yes"?>
The XML Infoset includes a standalone property for documents
Not currently exposed by SAX2
Solution: Define a new read-only feature: http://xml.org/sax/features/is-standalone
Open issue: distinguish between standalone="no" and an omitted standalone declaration?
<?xml version="1.0" encoding="UTF-16"?>
Infoset includes the version and encoding from the XML declaration; SAX2 does not.
Unlike standalone, these apply to all parsed entities; not just the document entity
Solution:
package org.xml.sax.ext;
public interface Locator2 extends Locator {
public String getXMLVersion ();
public String getEncoding ();
}
This would be implemented by the Locator objects passed to setDocumentLocator() methods.
The read-only feature http://xml.org/sax/features/use-locator2 says whether Locator2 objects are used.
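A Locator2 interface with getXMLVersion() and getEncoding() was later added to org.xml.sax.ext. The sketch below assumes a parser that supplies one and returns null otherwise. The class name EntityInfo is invented for illustration. The locator is queried inside startElement() because the XML declaration has not yet been read when setDocumentLocator() is called.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of SAX.
public class EntityInfo {

    // Returns {version, encoding} as reported at the first start-tag,
    // or null if the parser's locator does not implement Locator2.
    public static String[] versionAndEncoding(byte[] xml) throws Exception {
        final String[] result = new String[2];
        final boolean[] supported = {false};
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (!supported[0] && locator instanceof Locator2) {
                    Locator2 locator2 = (Locator2) locator;
                    result[0] = locator2.getXMLVersion();
                    result[1] = locator2.getEncoding();
                    supported[0] = true;
                }
            }
        });
        reader.parse(new InputSource(new ByteArrayInputStream(xml)));
        return supported[0] ? result : null;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        String[] info = versionAndEncoding(xml);
        System.out.println(info == null ? "no Locator2" : info[0] + " / " + info[1]);
    }
}
```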
To make matters worse, there can be as many as three encodings:
What's declared in the document using an encoding declaration in the XML declaration
The MIME type encoding, as specified by the HTTP header
The name of the encoding used by a java.io.InputStreamReader (UTF8 vs. UTF-8)
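The third naming problem is easy to reproduce with the JDK alone. In this small illustration (the class and method names are made up), InputStreamReader.getEncoding() reports Java's historical name for the charset, while java.nio.charset.Charset reports the canonical name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical demo class.
public class EncodingNames {

    // The name an InputStreamReader reports for the requested charset.
    public static String readerName(String charsetName) throws IOException {
        try (InputStreamReader reader = new InputStreamReader(
                new ByteArrayInputStream(new byte[0]), charsetName)) {
            return reader.getEncoding();
        }
    }

    // The canonical name java.nio uses for the same charset.
    public static String canonicalName(String charsetName) {
        return Charset.forName(charsetName).name();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readerName("UTF-8"));    // prints UTF8
        System.out.println(canonicalName("UTF8"));  // prints UTF-8
    }
}
```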
There's no way to find out what features a given XMLReader recognizes.
Solution: Define two new read-only properties:
XMLReader
.
XMLReader
.
Or perhaps a method instead of a property?
The DeclHandler and LexicalHandler extension handlers are not supported by the DefaultHandler convenience class.
Solution:
Define a new org.xml.sax.ext class implementing those two interfaces, inheriting from org.xml.sax.helpers.DefaultHandler:
public class DefaultHandler2 extends DefaultHandler
implements DeclHandler, LexicalHandler {
// LexicalHandler methods
public void startDTD(String name, String publicId, String systemId)
throws SAXException {}
public void endDTD() throws SAXException {}
public void startEntity(String name) throws SAXException {}
public void endEntity(String name) throws SAXException {}
public void startCDATA() throws SAXException {}
public void endCDATA() throws SAXException {}
public void comment(char[] ch, int start, int length)
throws SAXException {}
// DeclHandler methods
public void elementDecl(String name, String model)
throws SAXException {}
public void attributeDecl(String elementName,
String attributeName, String type,
String valueDefault, String value)
throws SAXException {}
public void internalEntityDecl(String name, String value)
throws SAXException {}
public void externalEntityDecl(String name, String publicID,
String systemID) throws SAXException {}
}
Alternately, update DefaultHandler.
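A class called DefaultHandler2 did later appear in org.xml.sax.ext. Assuming it is available, a subclass overrides only the callbacks it needs. The sketch below collects comments. The class name CommentCollector is invented for illustration, and the handler still has to be registered through the standard lexical-handler property.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class; overrides a single LexicalHandler callback.
public class CommentCollector extends DefaultHandler2 {

    private final List<String> comments = new ArrayList<>();

    @Override
    public void comment(char[] ch, int start, int length) {
        comments.add(new String(ch, start, length));
    }

    // Parses the document and returns the text of every comment, in order.
    public static List<String> collect(String xml) throws Exception {
        CommentCollector handler = new CommentCollector();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // Lexical events are delivered only to a handler registered here.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.comments;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<a><!--first--><b/><!--second--></a>"));
    }
}
```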
Problem: There is no conventional way for applications to identify the version of the parser they are using, for purposes of diagnostics or other kinds of troubleshooting.
The best the JVM supports is the JDK 1.2 java.lang.Package facility, which is dependent on the JAR file metadata. It provides a partial solution, at the price of portability (JDK 1.1 APIs are much more portable) and assumptions like "one parser per package".
Solution: Define a new standard read-only property:
Returns a string identifying the reader and its version for use in diagnostics.
Parsers could support that if desired, probably using some sort of resource-based mechanism (not necessarily Package) to keep such release-specific strings out of the source code.
Open issue: Should there be separate strings to ID the reader (likely a constant value) and its version (ideally assigned in release engineering)?
package org.jdom;
public final class Verifier {
public static final String checkElementName(String name) {}
public static final String checkAttributeName(String name) {}
public static final String checkCharacterData(String text) {}
public static final String checkNamespacePrefix(String prefix) {}
public static final String checkNamespaceURI(String uri) {}
public static final String checkProcessingInstructionTarget(String target) {}
public static final String checkCommentData(String data) {}
public static boolean isXMLCharacter(char c) {}
public static boolean isXMLNameCharacter(char c) {}
public static boolean isXMLNameStartCharacter(char c) {}
public static boolean isXMLLetterOrDigit(char c) {}
public static boolean isXMLLetter(char c) {}
public static boolean isXMLCombiningChar(char c) {}
public static boolean isXMLExtender(char c) {}
public static boolean isXMLDigit(char c) {}
}
Subscribe to the xml-dev mailing list, http://lists.xml.org/archives/xml-dev/
DOM Level 0: what was implemented for JavaScript in Netscape 3/IE3
DOM Level 1, a W3C Standard
DOM Level 2, a W3C Standard
DOM Level 3: Several Working Drafts:
Grammar access; a.k.a abstract schemas (DTDs, TREX schemas, W3C Schema Language schemas)
Extra attributes on Entity, Document, Node, and Text interfaces
Standard means of loading and saving XML documents.
Bootstrapping new documents
Key events
DOMKey
Node3
Document3
Text3
Entity3
Bootstrapping
Every node gets a unique key automatically generated by the DOM implementation to uniquely identify DOM nodes.
Type, attributes, and methods of the DOMKey interface remain to be determined for Java.
It will likely just be declared an Object.
A Number in JavaScript
Extends (or replaces?) DOM2 Node.
Adds:
I will only show the new methods. Currently, the plan is to simply add these to the existing Node interface.
In IDL:
interface Node {
readonly attribute DOMString baseURI;
typedef enum _DocumentOrder {
DOCUMENT_ORDER_PRECEDING,
DOCUMENT_ORDER_FOLLOWING,
DOCUMENT_ORDER_SAME,
DOCUMENT_ORDER_UNORDERED
} DocumentOrder;
DocumentOrder compareDocumentOrder(in Node other) raises(DOMException);
typedef enum _TreePosition {
TREE_POSITION_PRECEDING,
TREE_POSITION_FOLLOWING,
TREE_POSITION_ANCESTOR,
TREE_POSITION_DESCENDANT,
TREE_POSITION_SAME,
TREE_POSITION_UNORDERED
} TreePosition;
TreePosition compareTreePosition(in Node other) raises(DOMException);
attribute DOMString textContent;
readonly attribute DOMKey key;
boolean isSameNode(in Node other);
DOMString lookupNamespacePrefix(in DOMString namespaceURI);
DOMString lookupNamespaceURI(in DOMString prefix);
void normalizeNS();
boolean equalsNode(in Node arg, in boolean deep);
Node getAs(in DOMString feature);
};
Java binding:
package org.w3c.dom;
public interface Node {
public String getBaseURI();
public static final int DOCUMENT_ORDER_PRECEDING = 1;
public static final int DOCUMENT_ORDER_FOLLOWING = 2;
public static final int DOCUMENT_ORDER_SAME = 3;
public static final int DOCUMENT_ORDER_UNORDERED = 4;
public int compareDocumentOrder(Node other) throws DOMException;
public static final int TREE_POSITION_PRECEDING = 1;
public static final int TREE_POSITION_FOLLOWING = 2;
public static final int TREE_POSITION_ANCESTOR = 3;
public static final int TREE_POSITION_DESCENDANT = 4;
public static final int TREE_POSITION_SAME = 5;
public static final int TREE_POSITION_UNORDERED = 6;
public int compareTreePosition(Node other) throws DOMException;
public String getTextContent();
public void setTextContent(String textContent);
public boolean isSameNode(Node other);
public boolean equalsNode(Node arg, boolean deep);
public String lookupNamespacePrefix(String namespaceURI);
public String lookupNamespaceURI(String prefix);
public void normalizeNS();
public Node getAs(String feature);
public Object getKey();
}
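Several of these methods can be tried against the Node interface bundled with current JDKs. The names there differ from this draft: lookupNamespacePrefix() became lookupPrefix() and equalsNode() became isEqualNode(). The sketch below uses those final names. The class name NodeExtras and the sample namespace URI are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class exercising the Node additions.
public class NodeExtras {

    // Parses a string with namespace processing switched on.
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
            "<p:a xmlns:p='http://example.com/ns'>Hello <p:b>World</p:b></p:a>");
        Element root = doc.getDocumentElement();
        Node child = root.getLastChild();

        // Concatenated text of the element and all its descendants
        System.out.println(root.getTextContent());
        // Identity: is this the very same node object in the tree?
        System.out.println(root.isSameNode(doc.getDocumentElement()));
        // Structural equality with a deep copy
        System.out.println(root.isEqualNode(root.cloneNode(true)));
        // Namespace lookups use declarations in scope on ancestors too
        System.out.println(child.lookupNamespaceURI("p"));
        System.out.println(child.lookupPrefix("http://example.com/ns"));
    }
}
```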
XML documents may be built from multiple parsed entities, each of which is not necessarily a well-formed XML document, but is at least a plausible part of a well-formed XML document.
Each entity may have its own text declaration.
This is like an XML declaration without a standalone attribute and with an optional version attribute:
<?xml version="1.0"?>
<?xml version="1.0" encoding="ISO-8859-9"?>
<?xml encoding="ISO-8859-9"?>
DOM3 adds:
In IDL:
interface Entity : Node {
attribute DOMString actualEncoding;
attribute DOMString encoding;
attribute DOMString version;
};
Java binding:
package org.w3c.dom;
public interface Entity extends Node {
public String getActualEncoding();
public void setActualEncoding(String actualEncoding);
public String getEncoding();
public void setEncoding(String encoding);
public String getVersion();
public void setVersion(String version);
}
Adds:
<?xml version="1.0"?>
<?xml version="1.0" encoding="ISO-8859-9"?>
<?xml version="1.0" encoding="ISO-8859-9" standalone="no"?>
<?xml version="1.0" standalone="yes"?>
adoptNode()
setBaseURI()
In IDL:
interface Document : Node {
attribute DOMString actualEncoding;
attribute DOMString encoding;
attribute boolean standalone;
attribute boolean strictErrorChecking;
attribute DOMString version;
Node adoptNode(in Node source) raises(DOMException);
void setBaseURI(in DOMString baseURI) raises(DOMException);
};
Java binding:
package org.w3c.dom;
public interface Document extends Node {
public String getActualEncoding();
public void setActualEncoding(String actualEncoding);
public String getEncoding();
public void setEncoding(String encoding);
public boolean getStandalone();
public void setStandalone(boolean standalone);
public boolean getStrictErrorChecking();
public void setStrictErrorChecking(boolean strictErrorChecking);
public String getVersion();
public void setVersion(String version);
public Node adoptNode(Node source) throws DOMException;
public void setBaseURI(String baseURI) throws DOMException;
}
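The Document interface bundled with current JDKs exposes the same information under different names than this draft uses: getXmlVersion(), getXmlEncoding(), getInputEncoding(), and getXmlStandalone(). A sketch against those final names follows. The class name DeclarationInfo is invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class reading XML declaration details from a Document.
public class DeclarationInfo {

    // Parses from bytes so that the parser sees the encoding declaration.
    public static Document parse(byte[] xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\" "
                    + "standalone=\"yes\"?><a/>")
                .getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(xml);
        System.out.println("version:    " + doc.getXmlVersion());
        System.out.println("declared:   " + doc.getXmlEncoding());
        System.out.println("actual:     " + doc.getInputEncoding());
        System.out.println("standalone: " + doc.getXmlStandalone());
    }
}
```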
Adds:
isWhitespaceInElementContent()
wholeText()
In IDL:
interface Text : Node {
readonly attribute boolean isWhitespaceInElementContent;
readonly attribute DOMString wholeText;
Text replaceWholeText(in DOMString content) raises(DOMException);
};
Java binding:
package org.w3c.dom;
public interface Text extends Node {
public boolean getIsWhitespaceInElementContent();
public String getWholeText();
public Text replaceWholeText(String content)
throws DOMException;
}
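getWholeText() and replaceWholeText() are available on the Text interface bundled with current JDKs. The sketch below shows what "whole text" means for two adjacent text nodes. The class name WholeTextDemo and the element name are invented for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

// Hypothetical demo class for the Text additions.
public class WholeTextDemo {

    // Builds <greeting> with two adjacent, un-normalized text nodes.
    public static Document twoTextNodes() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("greeting");
        doc.appendChild(root);
        root.appendChild(doc.createTextNode("Hello, "));
        root.appendChild(doc.createTextNode("world"));
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = twoTextNodes();
        Element root = doc.getDocumentElement();
        Text first = (Text) root.getFirstChild();

        // Only this node's own data
        System.out.println(first.getData());
        // Data of this node plus all logically adjacent text nodes
        System.out.println(first.getWholeText());

        // Replaces the whole run of adjacent text with a single node
        first.replaceWholeText("Goodbye");
        System.out.println(root.getChildNodes().getLength());
    }
}
```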
DOM2 has no implementation-independent means to create a new Document object
Implementation-dependent methods tend to be fairly complex. For example, in Xerces-J:
DOMImplementation impl = DOMImplementationImpl.getDOMImplementation();
Document fibonacci = impl.createDocument(null, "Fibonacci_Numbers", null);
Still no language-independent means to create a new Document object
Does provide an implementation-independent method for Java only:
DOMImplementation impl = DOMImplementationRegistry.getDOMImplementation("XML");
package org.w3c.dom;
public class DOMImplementationRegistry {
// The system property to specify the DOMImplementationSource class names.
public static String PROPERTY = "org.w3c.dom.DOMImplementationSourceList";
public static DOMImplementation getDOMImplementation(String features)
throws ClassNotFoundException, InstantiationException, IllegalAccessException;
public static void addSource(DOMImplementationSource s)
throws ClassNotFoundException, InstantiationException, IllegalAccessException;
}
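The registry that eventually shipped with the JDK lives in org.w3c.dom.bootstrap and is instantiated with newInstance() rather than used through static methods as in this draft. A sketch against that final form follows. The class name Bootstrap is invented for illustration.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

// Hypothetical demo class: creates a Document without naming a parser vendor.
public class Bootstrap {

    public static Document newDocument(String rootName) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        // Ask for any implementation that supports the XML 1.0 feature.
        DOMImplementation impl = registry.getDOMImplementation("XML 1.0");
        if (impl == null) {
            throw new IllegalStateException("No DOM implementation supports XML 1.0");
        }
        return impl.createDocument(null, rootName, null);
    }

    public static void main(String[] args) throws Exception {
        Document fibonacci = newDocument("Fibonacci_Numbers");
        System.out.println(fibonacci.getDocumentElement().getTagName());
    }
}
```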
Loading: parsing an existing XML document to produce a Document object
Saving: serializing a Document object into a file or onto a stream
Completely implementation dependent in DOM2
Library specific code creates a parser
The parser parses the document and returns a DOM org.w3c.dom.Document object.
The entire document is stored in memory.
DOM methods and interfaces are used to extract data from this object
This program parses with Xerces. Other parsers are different.
import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMParserMaker {

  public static void main(String[] args) {
    DOMParser parser = new DOMParser();
    for (int i = 0; i < args.length; i++) {
      try {
        parser.parse(args[i]);
        Document d = parser.getDocument();
      }
      catch (SAXException e) {
        System.err.println(e);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
  }

}
import org.w3c.dom.*;
import org.w3c.dom.loadSave.*;

public class DOM3ParserMaker {

  public static void main(String[] args) {
    DOMImplementationLS impl = (DOMImplementationLS) DOMImplementationFactory.getDOMImplementation();
    DOMBuilder parser = impl.createDOMBuilder();
    for (int i = 0; i < args.length; i++) {
      try {
        Document d = parser.parseURI(args[i]);
      }
      catch (DOMSystemException e) {
        System.err.println(e);
      }
      catch (DOMException e) {
        System.err.println(e);
      }
    }
  }

}
This code will not actually compile or run until some parser supports DOM3 Load and Save.
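The Load and Save module that was eventually standardized renamed these interfaces. In the org.w3c.dom.ls package bundled with current JDKs, the parser is LSParser and its input is LSInput. A sketch of the same program against those final names follows. The class name LoadDemo is invented for illustration.

```java
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

// Hypothetical demo class using the final Load and Save names.
public class LoadDemo {

    private static DOMImplementationLS loadSave() throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        return (DOMImplementationLS) registry.getDOMImplementation("LS");
    }

    // Parses a document held in a string.
    public static Document load(String xml) throws Exception {
        DOMImplementationLS impl = loadSave();
        LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = impl.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }

    // Parses each URI named on the command line, like DOM3ParserMaker above.
    public static void main(String[] args) throws Exception {
        LSParser parser = loadSave()
                .createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        for (int i = 0; i < args.length; i++) {
            try {
                Document d = parser.parseURI(args[i]);
                System.out.println(d.getDocumentElement().getTagName());
            } catch (RuntimeException e) {
                // Parse failures surface as unchecked exceptions.
                System.err.println(e);
            }
        }
    }
}
```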
DOMImplementationLS: a DOMImplementation interface that provides the factory methods for creating the objects required for loading and saving
DOMBuilder
DOMInputSource: like SAX's InputSource
DOMEntityResolver
DOMBuilderFilter: examines Element nodes as they are being processed during the parsing of a document; like SAX filters
DOMWriter
DOMCMBuilder
DOMCMWriter
DocumentLS
ParserErrorEvent
Factory interface to create new DOMBuilder and DOMWriter implementations.
IDL:
interface DOMImplementationLS {
DOMBuilder createDOMBuilder();
DOMWriter createDOMWriter();
};
Java Binding:
package org.w3c.dom.loadSave;
public interface DOMImplementationLS {
public DOMBuilder createDOMBuilder();
public DOMWriter createDOMWriter();
}
Provides an implementation-independent API for parsing XML documents to produce a DOM Document object.
Instances are built by the createDOMBuilder() method in DOMImplementationLS.
IDL:
interface DOMBuilder {
attribute DOMEntityResolver entityResolver;
attribute DOMErrorHandler errorHandler;
attribute DOMBuilderFilter filter;
attribute boolean mimeTypeCheck;
void setFeature(in DOMString name,
in boolean state)
raises(DOMException);
boolean supportsFeature(in DOMString name);
boolean canSetFeature(in DOMString name,
in boolean state);
boolean getFeature(in DOMString name)
raises(dom::DOMException);
Document parseURI(in DOMString uri)
raises(DOMException, DOMSystemException);
Document parseDOMInputSource(in DOMInputSource is)
raises(DOMException, DOMSystemException);
};
Java Binding:
package org.w3c.dom.loadSave;
public interface DOMBuilder {
public DOMEntityResolver getEntityResolver();
public void setEntityResolver(DOMEntityResolver entityResolver);
public DOMErrorHandler getErrorHandler();
public void setErrorHandler(DOMErrorHandler errorHandler);
public DOMBuilderFilter getFilter();
public void setFilter(DOMBuilderFilter filter);
public boolean getMimeTypeCheck();
public void setMimeTypeCheck(boolean mimeTypeCheck);
public void setFeature(String name, boolean state)
throws DOMException;
public boolean supportsFeature(String name);
public boolean canSetFeature(String name, boolean state);
public boolean getFeature(String name) throws DOMException;
public Document parseURI(String uri)
throws DOMException, DOMSystemException;
public Document parseDOMInputSource(DOMInputSource is)
throws DOMException, DOMSystemException;
}
Like SAX2's InputSource class, this interface is an abstraction of all the different things (streams, files, byte arrays, sockets, URLs, etc.) from which an XML document can be read.
IDL:
interface DOMInputSource {
attribute DOMInputStream byteStream;
attribute DOMReader characterStream;
attribute DOMString encoding;
attribute DOMString publicId;
attribute DOMString systemId;
};
Java Binding:
package org.w3c.dom.loadSave;
public interface DOMInputSource {
public InputStream getByteStream();
public void setByteStream(InputStream in);
public Reader getCharacterStream();
public void setCharacterStream(Reader in);
public String getEncoding();
public void setEncoding(String encoding);
public String getPublicId();
public void setPublicId(String publicId);
public String getSystemId();
public void setSystemId(String systemId);
}
Like SAX2's EntityResolver interface, this interface lets applications redirect references to external entities.
IDL:
interface DOMEntityResolver {
DOMInputSource resolveEntity(in DOMString publicId,
in DOMString systemId )
raises(DOMSystemException);
};
Java Binding:
package org.w3c.dom.loadSave;
public interface DOMEntityResolver {
public DOMInputSource resolveEntity(String publicId,
String systemId ) throws DOMSystemException;
}
Provides an API for serializing (writing) a DOM document out as a sequence of bytes onto a stream, file, socket, byte array, etc.
IDL:
interface DOMWriter {
attribute DOMString encoding;
readonly attribute DOMString lastEncoding;
attribute unsigned short format;
// Modified in DOM Level 3:
attribute DOMString newLine;
void writeNode(in DOMOutputStream destination, in Node node)
raises(DOMSystemException);
};
Java Binding:
package org.w3c.dom.loadSave;
public interface DOMWriter {
public String getEncoding();
public void setEncoding(String encoding);
public String getLastEncoding();
public short getFormat();
public void setFormat(short format);
public String getNewLine();
public void setNewLine(String newLine);
public void writeNode(OutputStream out, Node node)
throws DOMSystemException;
}
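In the Load and Save API that was eventually standardized, the writer is called LSSerializer and the destination is wrapped in an LSOutput. A sketch against those final names, as bundled with current JDKs, follows. The class name SaveDemo and the element name are invented for illustration.

```java
import java.io.ByteArrayOutputStream;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical demo class using the final Load and Save names.
public class SaveDemo {

    private static DOMImplementation implementation() throws Exception {
        return DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
    }

    // A one-element document to serialize.
    public static Document sample() throws Exception {
        return implementation().createDocument(null, "a", null);
    }

    // Serializes the document to bytes in the given encoding, returned as text.
    public static String save(Document doc, String encoding) throws Exception {
        DOMImplementationLS impl = (DOMImplementationLS) implementation();
        LSSerializer writer = impl.createLSSerializer();
        writer.setNewLine("\n");

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        LSOutput output = impl.createLSOutput();
        output.setByteStream(bytes);
        output.setEncoding(encoding);

        writer.write(doc, output);
        return bytes.toString(encoding);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(save(sample(), "UTF-8"));
    }
}
```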
Lets applications examine element nodes as they are being constructed during a parse.
As each element is examined, it may be modified or removed, or parsing may be aborted.
IDL:
interface DOMBuilderFilter {
boolean startElement(in Element element);
boolean endElement(in Element element);
};
Java Binding:
package org.w3c.dom.loadSave;
public interface DOMBuilderFilter {
public boolean startElement(Element element);
public boolean endElement(Element element);
}
A DTD parser, a schema parser, etc.
IDL:
interface DOMCMBuilder : DOMBuilder {
CMModel parseCMURI(in DOMString uri)
raises(DOMException, DOMSystemException);
CMModel parseCMInputSource(in DOMInputSource is)
raises(DOMException, DOMSystemException);
};
Java Binding:
package org.w3c.dom.loadSave;
public interface DOMCMBuilder extends DOMBuilder {
public CMModel parseCMURI(String uri)
throws DOMException, DOMSystemException;
public CMModel parseCMInputSource(DOMInputSource is)
throws DOMException, DOMSystemException;
}
Serializes DTDs, schemas, and other content models
IDL:
interface DOMCMWriter : DOMWriter {
void writeCMModel(in DOMOutputStream destination,
in CMModel model)
raises(DOMSystemException);
};
Java Binding:
package org.w3c.dom.loadSave;
public interface DOMCMWriter extends DOMWriter {
public void writeCMModel(OutputStream destination, CMModel model)
throws DOMSystemException;
}
A "mechanism by which the content of a document can be replaced with the DOM tree produced when loading a URL, or parsing a string."
An instance of the DocumentLS interface can be obtained by using binding-specific casting methods on an instance of the Document interface.
IDL:
interface DocumentLS {
attribute boolean async;
void abort();
boolean load(in DOMString url);
boolean loadXML(in DOMString source);
DOMString saveXML(in Node node) raises(DOMException);
};
Java Binding:
package org.w3c.dom.loadSave;
import org.w3c.dom.Node;
import org.w3c.dom.DOMException;
public interface DocumentLS {
public boolean getAsync();
public void setAsync(boolean async);
public void abort();
public boolean load(String url);
public boolean loadXML(String source);
public String saveXML(Node node) throws DOMException;
}
Represents an error (of what kind?) in the document being parsed
IDL:
interface ParserErrorEvent {
readonly attribute long errorCode;
readonly attribute long filepos;
readonly attribute long line;
readonly attribute long linepos;
readonly attribute DOMString reason;
readonly attribute DOMString srcText;
readonly attribute DOMString url;
};
Java Binding:
package org.w3c.dom.loadSave;
public interface ParserErrorEvent {
public int getErrorCode();
public int getFilepos();
public int getLine();
public int getLinepos();
public String getReason();
public String getSrcText();
public String getUrl();
}
Abstract Schemas (AS) include DTDs, W3C XML Schema Language Schemas, TREX, and more
Should be able to access their information without binding yourself too tightly to any one language
Abstract Schema and AS-Editing Interfaces:
ASModel
ASExternalModel
ASNode
ASNodeList
ASDOMStringList
ASNamedNodeMap
ASDataType
ASPrimitiveType
ASElementDeclaration
ASChildren
ASAttributeDeclaration
ASEntityDeclaration
ASNotationDeclaration
Validation and Other Interfaces:
Document
DocumentAS
DOMImplementationAS
Document-Editing Interfaces:
NodeAS
ElementAS
CharacterDataAS
DocumentTypeAS
AttributeAS
DOM Error Handler Interfaces:
DOMErrorHandler
DOMLocator
DOMImplementation.hasFeature("AS-EDIT") returns true if a given DOM supports these interfaces for editing abstract schemas:
ASModel
ASExternalModel
ASNode
ASNodeList
ASNamedNodeMap
ASDataType
ASElementDeclaration
ASChildren
ASAttributeDeclaration
ASEntityDeclaration
ASNotationDeclaration
Represents an abstract content model that could be a DTD, an XML Schema, or something else. It has both an internal and external subset.
IDL:
interface ASModel : ASNode {
readonly attribute boolean isNamespaceAware;
attribute ASElementDeclaration rootElementDecl;
attribute DOMString systemId;
attribute DOMString publicId;
ASNodeList getASNodes();
boolean removeNode(in ASNode node);
boolean insertBefore(in ASNode newNode, in ASNode refNode);
boolean validate();
ASElementDeclaration createASElementDeclaration(inout DOMString namespaceURI,
in DOMString qualifiedElementName)
raises(DOMException);
ASAttributeDeclaration createASAttributeDeclaration(inout DOMString namespaceURI,
in DOMString qualifiedName)
raises(DOMException);
ASNotationDeclaration createASNotationDeclaration(inout DOMString namespaceURI,
in DOMString qualifiedElementName,
in DOMString systemIdentifier,
inout DOMString publicIdentifier)
raises(DOMException);
ASEntityDeclaration createASEntityDeclaration(in DOMString name)
raises(DOMException);
ASChildren createASChildren(in unsigned long minOccurs,
in unsigned long maxOccurs,
inout unsigned short operator)
raises(DOMException);
};
Java binding:
package org.w3c.dom.abstractSchemas;
import org.w3c.dom.DOMException;
public interface ASModel extends ASNode {
public boolean getIsNamespaceAware();
public ASElementDeclaration getRootElementDecl();
public void setRootElementDecl(ASElementDeclaration rootElementDecl);
public String getSystemId();
public void setSystemId(String systemId);
public String getPublicId();
public void setPublicId(String publicId);
public ASNodeList getASNodes();
public boolean removeNode(ASNode node);
public boolean insertBefore(ASNode newNode, ASNode refNode);
public boolean validate();
public ASElementDeclaration createASElementDeclaration(String namespaceURI,
String qualifiedElementName) throws DOMException;
public ASAttributeDeclaration createASAttributeDeclaration(String namespaceURI,
String qualifiedName) throws DOMException;
public ASNotationDeclaration createASNotationDeclaration(String namespaceURI,
String qualifiedElementName,
String systemIdentifier,
String publicIdentifier)
throws DOMException;
public ASEntityDeclaration createASEntityDeclaration(String name)
throws DOMException;
public ASChildren createASChildren(int minOccurs,
int maxOccurs,
short operator)
throws DOMException;
}
An ASModel that is not bound to a particular document, and can thus be shared among documents.
IDL:
interface ASExternalModel : ASModel {
};
Java binding:
package org.w3c.dom.abstractSchemas;
public interface ASExternalModel extends ASModel {
}
The node for the various kinds of declarations out of which ASModels are built
IDL:
interface ASNode {
const unsigned short AS_ELEMENT_DECLARATION = 1;
const unsigned short AS_ATTRIBUTE_DECLARATION = 2;
const unsigned short AS_NOTATION_DECLARATION = 3;
const unsigned short AS_ENTITY_DECLARATION = 4;
const unsigned short AS_CHILDREN = 5;
const unsigned short AS_MODEL = 6;
const unsigned short AS_EXTERNALMODEL = 7;
readonly attribute unsigned short cmNodeType;
attribute ASModel ownerASModel;
attribute DOMString nodeName;
attribute DOMString prefix;
attribute DOMString localName;
attribute DOMString namespaceURI;
ASNode cloneASNode();
};
Java binding:
package org.w3c.dom.abstractSchemas;
public interface ASNode {
public static final short AS_ELEMENT_DECLARATION = 1;
public static final short AS_ATTRIBUTE_DECLARATION = 2;
public static final short AS_NOTATION_DECLARATION = 3;
public static final short AS_ENTITY_DECLARATION = 4;
public static final short AS_CHILDREN = 5;
public static final short AS_MODEL = 6;
public static final short AS_EXTERNALMODEL = 7;
public short getCmNodeType();
public ASModel getOwnerASModel();
public void setOwnerASModel(ASModel ownerASModel);
public String getNodeName();
public void setNodeName(String nodeName);
public String getPrefix();
public void setPrefix(String prefix);
public String getLocalName();
public void setLocalName(String localName);
public String getNamespaceURI();
public void setNamespaceURI(String namespaceURI);
public ASNode cloneASNode();
}
An ordered list of the nodes in a content model
IDL:
interface ASNodeList {
readonly attribute int length;
ASNode item(in int index);
};
Java binding:
package org.w3c.dom.abstractSchemas;
public interface ASNodeList {
public int getLength();
public ASNode item(int index);
}
An unordered set of AS nodes
IDL:
interface ASNamedNodeMap {
readonly attribute int length;
ASNode getNamedItem(inout DOMString name);
ASNode getNamedItemNS(in DOMString namespaceURI, inout DOMString localName);
ASNode item(in int index);
ASNode removeNamedItem(in DOMString name);
ASNode removeNamedItemNS(in DOMString namespaceURI, in DOMString localName);
ASNode setNamedItem(inout ASNode newASNode)
raises(DOMASException);
ASNode setNamedItemNS(inout ASNode newASNode)
raises(DOMASException);
};
Java binding:
package org.w3c.dom.abstractSchemas;
import org.w3c.dom.DOMASException;
public interface ASNamedNodeMap {
public int getLength();
public ASNode getNamedItem(String name);
public ASNode getNamedItemNS(String namespaceURI,
String localName);
public ASNode item(int index);
public ASNode removeNamedItem(String name);
public ASNode removeNamedItemNS(String namespaceURI,
String localName);
public ASNode setNamedItem(ASNode newASNode)
throws DOMASException;
public ASNode setNamedItemNS(ASNode newASNode)
throws DOMASException;
}
Data types used in content models
This one is a little weak
IDL:
interface ASDataType {
const short STRING_DATATYPE = 1;
short getASPrimitiveType();
};
Java binding:
package org.w3c.dom.abstractSchemas;
public interface ASDataType {
public static final short STRING_DATATYPE = 1;
public short getASPrimitiveType();
}
Primitive data types used in content models
This one is a little weak
IDL:
interface ASPrimitiveType : ASDataType {
const short BOOLEAN_DATATYPE = 2;
const short FLOAT_DATATYPE = 3;
const short DOUBLE_DATATYPE = 4;
const short DECIMAL_DATATYPE = 5;
const short HEXBINARY_DATATYPE = 6;
const short BASE64BINARY_DATATYPE = 7;
const short ANYURI_DATATYPE = 8;
const short QNAME_DATATYPE = 9;
const short DURATION_DATATYPE = 10;
const short DATETIME_DATATYPE = 11;
const short DATE_DATATYPE = 12;
const short TIME_DATATYPE = 13;
const short YEARMONTH_DATATYPE = 14;
const short YEAR_DATATYPE = 15;
const short MONTHDAY_DATATYPE = 16;
const short DAY_DATATYPE = 17;
const short MONTH_DATATYPE = 18;
const short NOTATION_DATATYPE = 19;
attribute decimal lowValue;
attribute decimal highValue;
};
Java binding:
package org.w3c.dom.abstractSchemas;
import org.w3c.dom.decimal;
public interface ASPrimitiveType extends ASDataType {
public static final short BOOLEAN_DATATYPE = 2;
public static final short FLOAT_DATATYPE = 3;
public static final short DOUBLE_DATATYPE = 4;
public static final short DECIMAL_DATATYPE = 5;
public static final short HEXBINARY_DATATYPE = 6;
public static final short BASE64BINARY_DATATYPE = 7;
public static final short ANYURI_DATATYPE = 8;
public static final short QNAME_DATATYPE = 9;
public static final short DURATION_DATATYPE = 10;
public static final short DATETIME_DATATYPE = 11;
public static final short DATE_DATATYPE = 12;
public static final short TIME_DATATYPE = 13;
public static final short YEARMONTH_DATATYPE = 14;
public static final short YEAR_DATATYPE = 15;
public static final short MONTHDAY_DATATYPE = 16;
public static final short DAY_DATATYPE = 17;
public static final short MONTH_DATATYPE = 18;
public static final short NOTATION_DATATYPE = 19;
public decimal getLowValue();
public void setLowValue(decimal lowValue);
public decimal getHighValue();
public void setHighValue(decimal highValue);
}
Represents a declaration of an element such as <!ELEMENT TIME (#PCDATA)> or an xsd:element schema element
IDL:
interface ASElementDeclaration : ASNode {
const short EMPTY_CONTENTTYPE = 1;
const short ANY_CONTENTTYPE = 2;
const short MIXED_CONTENTTYPE = 3;
const short ELEMENTS_CONTENTTYPE = 4;
attribute boolean strictMixedContent;
attribute ASDataType elementType;
attribute boolean isPCDataOnly;
attribute short contentType;
attribute DOMString tagName;
ASChildren getASChildren();
void setASChildren(inout ASChildren elementContent)
raises(DOMASException);
ASNamedNodeMap getASAttributeDecls();
void setASAttributeDecls(inout ASNamedNodeMap attributes);
void addASAttributeDecl(in ASAttributeDeclaration attributeDecl);
ASAttributeDeclaration removeASAttributeDecl(in ASAttributeDeclaration attributeDecl);
};
Java binding:
package org.w3c.dom.abstractSchemas;
import org.w3c.dom.DOMASException;
public interface ASElementDeclaration extends ASNode {
public static final short EMPTY_CONTENTTYPE = 1;
public static final short ANY_CONTENTTYPE = 2;
public static final short MIXED_CONTENTTYPE = 3;
public static final short ELEMENTS_CONTENTTYPE = 4;
public boolean getStrictMixedContent();
public void setStrictMixedContent(boolean strictMixedContent);
public ASDataType getElementType();
public void setElementType(ASDataType elementType);
public boolean getIsPCDataOnly();
public void setIsPCDataOnly(boolean isPCDataOnly);
public short getContentType();
public void setContentType(short contentType);
public String getTagName();
public void setTagName(String tagName);
public ASChildren getASChildren();
public void setASChildren(ASChildren elementContent)
throws DOMASException;
public ASNamedNodeMap getASAttributeDecls();
public void setASAttributeDecls(ASNamedNodeMap attributes);
public void addASAttributeDecl(ASAttributeDeclaration attributeDecl);
public ASAttributeDeclaration removeASAttributeDecl(ASAttributeDeclaration attributeDecl);
}
Represents the list of child elements in a content model of an element declaration
IDL:
interface ASChildren : ASNode {
const unsigned long UNBOUNDED = MAX_LONG;
const unsigned short NONE = 0;
const unsigned short SEQUENCE = 1;
const unsigned short CHOICE = 2;
attribute unsigned short listOperator;
attribute unsigned long minOccurs;
attribute unsigned long maxOccurs;
attribute ASNodeList subModels;
ASNode removeASNode(in unsigned long nodeIndex);
int insertASNode(in unsigned long nodeIndex,
in ASNode newNode);
int appendASNode(in ASNode newNode);
};
Java binding:
package org.w3c.dom.abstractSchemas;
public interface ASChildren extends ASNode {
public static final int UNBOUNDED = MAX_LONG;
public static final short NONE = 0;
public static final short SEQUENCE = 1;
public static final short CHOICE = 2;
public short getListOperator();
public void setListOperator(short listOperator);
public int getMinOccurs();
public void setMinOccurs(int minOccurs);
public int getMaxOccurs();
public void setMaxOccurs(int maxOccurs);
public ASNodeList getSubModels();
public void setSubModels(ASNodeList subModels);
public ASNode removeASNode(int nodeIndex);
public int insertASNode(int nodeIndex, ASNode newNode);
public int appendASNode(ASNode newNode);
}
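As a rough illustration of how listOperator, minOccurs, and maxOccurs fit together, here is a minimal sketch that models a content model such as (title, author+). ContentModelSketch and its methods are hypothetical stand-ins invented for this example; they are not the draft ASChildren interface, and Integer.MAX_VALUE is only assumed to play the role of UNBOUNDED.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for ASChildren; not the W3C interface itself. */
final class ContentModelSketch {
    // Assumption: Integer.MAX_VALUE plays the role of the draft's UNBOUNDED.
    static final int UNBOUNDED = Integer.MAX_VALUE;
    static final short NONE = 0;
    static final short SEQUENCE = 1;
    static final short CHOICE = 2;

    final String name;         // element name for a leaf, null for a group
    final short listOperator;  // NONE for a leaf, SEQUENCE or CHOICE for a group
    final int minOccurs;
    final int maxOccurs;
    final List<ContentModelSketch> subModels = new ArrayList<>();

    private ContentModelSketch(String name, short op, int min, int max) {
        this.name = name;
        this.listOperator = op;
        this.minOccurs = min;
        this.maxOccurs = max;
    }

    static ContentModelSketch leaf(String name, int min, int max) {
        return new ContentModelSketch(name, NONE, min, max);
    }

    static ContentModelSketch group(short op, int min, int max,
                                    ContentModelSketch... parts) {
        ContentModelSketch g = new ContentModelSketch(null, op, min, max);
        g.subModels.addAll(Arrays.asList(parts));
        return g;
    }

    /** Renders the model in DTD content-model notation. */
    String toDtd() {
        if (name != null) {
            return name + suffix();
        }
        String separator = (listOperator == CHOICE) ? " | " : ", ";
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < subModels.size(); i++) {
            if (i > 0) sb.append(separator);
            sb.append(subModels.get(i).toDtd());
        }
        return sb.append(")").append(suffix()).toString();
    }

    // Maps an occurrence range to the DTD shorthand; other ranges have none.
    private String suffix() {
        if (minOccurs == 0 && maxOccurs == 1) return "?";
        if (minOccurs == 0 && maxOccurs == UNBOUNDED) return "*";
        if (minOccurs == 1 && maxOccurs == UNBOUNDED) return "+";
        return "";
    }

    public static void main(String[] args) {
        ContentModelSketch book = group(SEQUENCE, 1, 1,
            leaf("title", 1, 1), leaf("author", 1, UNBOUNDED));
        System.out.println(book.toDtd()); // prints (title, author+)
    }
}
```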
Represents a declaration of an attribute; e.g. an xsd:attribute
schema element
or
<!ATTLIST TIME HOURS CDATA #IMPLIED>
IDL:
interface ASAttributeDeclaration : ASNode {
const short NO_VALUE_CONSTRAINT = 0;
const short DEFAULT_VALUE_CONSTRAINT = 1;
const short FIXED_VALUE_CONSTRAINT = 2;
attribute DOMString attrName;
attribute ASDataType attrType;
attribute DOMString attributeValue;
attribute DOMString enumAttr;
attribute ASNodeList ownerElement;
attribute short constraintType;
};
Java binding:
package org.w3c.dom.abstractSchemas;
public interface ASAttributeDeclaration extends ASNode {
public static final short NO_VALUE_CONSTRAINT = 0;
public static final short DEFAULT_VALUE_CONSTRAINT = 1;
public static final short FIXED_VALUE_CONSTRAINT = 2;
public String getAttrName();
public void setAttrName(String attrName);
public ASDataType getAttrType();
public void setAttrType(ASDataType attrType);
public String getAttributeValue();
public void setAttributeValue(String attributeValue);
public String getEnumAttr();
public void setEnumAttr(String enumAttr);
public ASNodeList getOwnerElement();
public void setOwnerElement(ASNodeList ownerElement);
public short getConstraintType();
public void setConstraintType(short constraintType);
}
Represents a declaration of an entity; e.g.
<!ENTITY COPY01 "Copyright 2001 Elliotte Harold">
IDL:
interface ASEntityDeclaration : ASNode {
const short INTERNAL_ENTITY = 1;
const short EXTERNAL_ENTITY = 2;
attribute short entityType;
attribute DOMString entityName;
attribute DOMString entityValue;
attribute DOMString systemId;
attribute DOMString publicId;
attribute DOMString notationName;
};
Java binding:
package org.w3c.dom.abstractSchemas;
public interface ASEntityDeclaration extends ASNode {
public static final short INTERNAL_ENTITY = 1;
public static final short EXTERNAL_ENTITY = 2;
public short getEntityType();
public void setEntityType(short entityType);
public String getEntityName();
public void setEntityName(String entityName);
public String getEntityValue();
public void setEntityValue(String entityValue);
public String getSystemId();
public void setSystemId(String systemId);
public String getPublicId();
public void setPublicId(String publicId);
public String getNotationName();
public void setNotationName(String notationName);
}
Represents a declaration of a notation; e.g.
<!NOTATION TXT SYSTEM "text/plain">
IDL:
interface ASNotationDeclaration : ASNode {
attribute DOMString notationName;
attribute DOMString systemId;
attribute DOMString publicId;
};
Java binding:
package org.w3c.dom.abstractSchemas;
public interface ASNotationDeclaration extends ASNode {
public String getNotationName();
public void setNotationName(String notationName);
public String getSystemId();
public void setSystemId(String systemId);
public String getPublicId();
public void setPublicId(String publicId);
}
Document
DocumentAS
DOMImplementationAS
The DOM2 Document
interface gets
a new setErrorHandler()
method
IDL:
interface Document {
void setErrorHandler(in DOMErrorHandler handler);
};
Java binding:
package org.w3c.dom.contentModel;
public interface Document {
public void setErrorHandler(DOMErrorHandler handler);
}
The different specs aren't synced up on this one yet.
Extends the
Document
interface with additional methods for both
document and abstract schema
editing.
IDL:
interface DocumentAS : Document {
attribute boolean continuousValidityChecking;
int numASs();
ASModel getInternalAS();
ASNodeList getASs();
ASModel getActiveAS();
void addAS(in ASModel cm);
void removeAS(in ASModel cm);
boolean activateAS(in ASModel cm);
};
Java binding:
package org.w3c.dom.abstractSchemas;
public interface DocumentAS extends Document {
public boolean getContinuousValidityChecking();
public void setContinuousValidityChecking(boolean continuousValidityChecking);
public int numASs();
public ASModel getInternalAS();
public ASNodeList getASs();
public ASModel getActiveAS();
public void addAS(ASModel cm);
public void removeAS(ASModel cm);
public boolean activateAS(ASModel cm);
}
Extends the DOM2
DOMImplementation
interface with factory methods to create
schema documents
IDL:
interface DOMImplementationAS : DOMImplementation {
ASModel createAS();
ASExternalModel createExternalAS();
};
Java binding:
package org.w3c.dom.abstractSchemas;
public interface DOMImplementationAS extends DOMImplementation {
public ASModel createAS();
public ASExternalModel createExternalAS();
}
Allows you to determine whether or not it's valid to add or delete a node at a particular position in a document. This is called guided document editing.
DOMImplementation.hasFeature("AS-DOC")
returns true if
a given DOM supports these capabilities.
NodeAS
ElementAS
CharacterDataAS
DocumentTypeAS
AttributeAS
Extends the DOM2 Node
interface with methods for
guided document editing.
IDL:
interface NodeAS : Node {
const short WF_CHECK = 1;
const short NS_WF_CHECK = 2;
const short PARTIAL_VALIDITY_CHECK = 3;
const short STRICT_VALIDITY_CHECK = 4;
attribute short wfValidityCheckLevel;
boolean canInsertBefore(in Node newChild, in Node refChild)
raises(DOMException);
boolean canRemoveChild(in Node oldChild) raises(DOMException);
boolean canReplaceChild(in Node newChild, in Node oldChild)
raises(DOMException);
boolean canAppendChild(in Node newChild) raises(DOMException);
boolean isValid(in boolean deep) raises(DOMException);
};
Java binding:
package org.w3c.dom.abstractSchemas;
public interface NodeAS extends Node {
public static final short WF_CHECK = 1;
public static final short NS_WF_CHECK = 2;
public static final short PARTIAL_VALIDITY_CHECK = 3;
public static final short STRICT_VALIDITY_CHECK = 4;
public short getWfValidityCheckLevel();
public void setWfValidityCheckLevel(short wfValidityCheckLevel);
public boolean canInsertBefore(Node newChild, Node refChild)
throws DOMException;
public boolean canRemoveChild(Node oldChild)
throws DOMException;
public boolean canReplaceChild(Node newChild, Node oldChild)
throws DOMException;
public boolean canAppendChild(Node newChild)
throws DOMException;
public boolean isValid(boolean deep) throws DOMException;
}
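The idea behind the can*() methods is to ask before editing, so the tree never passes through an invalid state. The sketch below imitates that with the ordinary DOM2 API from JAXP. GuidedEditSketch, its ALLOWED table, and its canAppendChild() helper are hypothetical and drastically simplified (they check only which child names a parent permits, not order or cardinality); they are not the draft NodeAS interface.

```java
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class GuidedEditSketch {
    // Hypothetical toy "schema": parent element name -> permitted child names.
    static final Map<String, Set<String>> ALLOWED = Map.of(
        "bib", Set.of("book"),
        "book", Set.of("title", "author", "editor", "publisher", "price"));

    /** Returns true if the toy schema lets newChild appear inside parent. */
    static boolean canAppendChild(Element parent, Element newChild) {
        Set<String> permitted = ALLOWED.get(parent.getTagName());
        return permitted != null && permitted.contains(newChild.getTagName());
    }

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element bib = doc.createElement("bib");
        doc.appendChild(bib);
        Element book = doc.createElement("book");
        Element title = doc.createElement("title");
        // Only perform the edit once the check has passed.
        if (canAppendChild(bib, book)) {
            bib.appendChild(book);
        }
        System.out.println(canAppendChild(bib, title)); // title is not allowed in bib
    }
}
```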
Extends the DOM2 Element
interface with methods for guided document editing.
IDL:
interface ElementAS : Element {
short contentType();
ASElementDeclaration getElementDeclaration() raises(DOMException);
boolean canSetAttribute(in DOMString attrname,
in DOMString attrval);
boolean canSetAttributeNode(in Node node);
boolean canSetAttributeNodeNS(in Node node);
boolean canSetAttributeNS(in DOMString attrname,
in DOMString attrval,
in DOMString namespaceURI,
in DOMString localName);
boolean canRemoveAttribute(in DOMString attrname);
boolean canRemoveAttributeNS(in DOMString attrname,
inout DOMString namespaceURI);
boolean canRemoveAttributeNode(in Node node);
ASDOMStringList getChildElements();
ASDOMStringList getParentElements();
ASDOMStringList getAttributeList();
};
Java binding:
package org.w3c.dom.abstractSchemas;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.DOMException;
public interface ElementAS extends Element {
public short contentType();
public ASElementDeclaration getElementDeclaration()
throws DOMException;
public boolean canSetAttribute(String attrname, String attrval);
public boolean canSetAttributeNode(Node node);
public boolean canSetAttributeNodeNS(Node node);
public boolean canSetAttributeNS(String attrname,
String attrval, String namespaceURI, String localName);
public boolean canRemoveAttribute(String attrname);
public boolean canRemoveAttributeNS(String attrname, String namespaceURI);
public boolean canRemoveAttributeNode(Node node);
public ASDOMStringList getChildElements();
public ASDOMStringList getParentElements();
public ASDOMStringList getAttributeList();
}
Extends the DOM2 CharacterData
interface with methods for guided document editing.
IDL:
interface CharacterDataAS : CharacterData {
boolean isWhitespaceOnly();
boolean canSetData(in unsigned long offset, in DOMString arg)
raises(DOMException);
boolean canAppendData(in DOMString arg) raises(DOMException);
boolean canReplaceData(in unsigned long offset,
in unsigned long count, in DOMString arg)
raises(DOMException);
boolean canInsertData(in unsigned long offset, in DOMString arg)
raises(DOMException);
boolean canDeleteData(in unsigned long offset, in DOMString arg)
raises(DOMException);
};
Java binding:
package org.w3c.dom.abstractSchemas;
public interface CharacterDataAS extends CharacterData {
public boolean isWhitespaceOnly();
public boolean canSetData(int offset, String arg)
throws DOMException;
public boolean canAppendData(String arg)
throws DOMException;
public boolean canReplaceData(int offset, int count, String arg)
throws DOMException;
public boolean canInsertData(int offset, String arg)
throws DOMException;
public boolean canDeleteData(int offset, String arg)
throws DOMException;
}
Extends the DOM2 DocumentType
interface with methods for guided document editing.
IDL:
interface DocumentTypeAS : DocumentType {
readonly attribute ASDOMStringList definedElementTypes;
boolean isElementDefined(in DOMString elemTypeName);
boolean isElementDefinedNS(in DOMString elemTypeName,
in DOMString namespaceURI,
in DOMString localName);
boolean isAttributeDefined(in DOMString elemTypeName,
in DOMString attrName);
boolean isAttributeDefinedNS(in DOMString elemTypeName,
in DOMString attrName,
in DOMString namespaceURI,
in DOMString localName);
boolean isEntityDefined(in DOMString entName);
};
Java binding:
package org.w3c.dom.abstractSchemas;
public interface DocumentTypeAS extends DocumentType {
public ASDOMStringList getDefinedElementTypes();
public boolean isElementDefined(String elemTypeName);
public boolean isElementDefinedNS(String elemTypeName,
String namespaceURI,
String localName);
public boolean isAttributeDefined(String elemTypeName,
String attrName);
public boolean isAttributeDefinedNS(String elemTypeName,
String attrName,
String namespaceURI,
String localName);
public boolean isEntityDefined(String entName);
}
Extends the DOM2 Attr
interface with methods for guided document editing.
IDL:
interface AttributeAS : Attr {
ASAttributeDeclaration getAttributeDeclaration();
ASNotationDeclaration getNotation() raises(DOMException);
};
Java binding:
package org.w3c.dom.abstractSchemas;
import org.w3c.dom.DOMException;
import org.w3c.dom.Attr;
public interface AttributeAS extends Attr {
public ASAttributeDeclaration getAttributeDeclaration();
public ASNotationDeclaration getNotation() throws DOMException;
}
DOMErrorHandler
DOMLocator
Similar to SAX2's ErrorHandler
interface.
A callback interface
An application implements this interface and
then registers it with the setErrorHandler()
method to receive
warnings, errors, and fatal errors.
IDL:
interface DOMErrorHandler {
void warning(in DOMLocator where,
in DOMString how,
in DOMString why)
raises(dom::DOMSystemException);
void fatalError(in DOMLocator where,
in DOMString how,
in DOMString why)
raises(dom::DOMSystemException);
void error(in DOMLocator where,
in DOMString how,
in DOMString why)
raises(dom::DOMSystemException);
};
Java binding:
package org.w3c.dom.contentModel;
public interface DOMErrorHandler {
public void warning(DOMLocator where, String how, String why)
throws DOMSystemException;
public void fatalError(DOMLocator where, String how, String why)
throws DOMSystemException;
public void error(DOMLocator where, String how, String why)
throws DOMSystemException;
}
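Since DOMErrorHandler is only a draft, the callback pattern can be tried today with the SAX2 ErrorHandler it resembles, which JAXP lets you register on a DocumentBuilder. This is a minimal sketch under that substitution; CollectingErrorHandler and its parse() helper are names invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical handler that records each callback instead of printing it. */
final class CollectingErrorHandler implements ErrorHandler {
    final List<String> messages = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        messages.add("warning at line " + e.getLineNumber());
    }

    @Override
    public void error(SAXParseException e) {
        messages.add("error at line " + e.getLineNumber());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        messages.add("fatalError at line " + e.getLineNumber());
        throw e; // a well-formedness error ends the parse
    }

    /** Parses xml into a DOM tree and returns whatever the handler was told. */
    static List<String> parse(String xml) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(handler);
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Already recorded by fatalError(); nothing more to do here.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.messages;
    }

    public static void main(String[] args) {
        System.out.println(parse("<a><b></a>")); // mismatched tags are a fatal error
        System.out.println(parse("<a/>"));       // well-formed: no callbacks
    }
}
```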
Similar to SAX2's Locator
interface.
An application can implement this interface and
then register it with the setLocator()
method to
find out in which line and column and file a given
node appears.
IDL:
interface DOMLocator {
int getColumnNumber();
int getLineNumber();
DOMString getPublicID();
DOMString getSystemID();
Node getNode();
};
Java binding:
package org.w3c.dom.contentModel;
public interface DOMLocator {
public int getColumnNumber();
public int getLineNumber();
public String getPublicID();
public String getSystemID();
public Node getNode();
}
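The SAX2 Locator that DOMLocator is modeled on already works this way: the parser hands the application a Locator, and the application queries it during callbacks. The following sketch records the line on which each start-tag ends. LineRecorder and record() are hypothetical names, and the example uses SAX rather than the draft DOM interface.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler mapping each element name to a line number. */
final class LineRecorder extends DefaultHandler {
    private Locator locator;
    final Map<String, Integer> lineOf = new LinkedHashMap<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        // The parser supplies the locator before startDocument().
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        if (locator != null) {
            lineOf.put(qName, locator.getLineNumber());
        }
    }

    /** Parses xml and returns element name -> line number, in document order. */
    static Map<String, Integer> record(String xml) {
        LineRecorder recorder = new LineRecorder();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), recorder);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return recorder.lineOf;
    }

    public static void main(String[] args) {
        System.out.println(record("<bib>\n  <book/>\n</bib>"));
    }
}
```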
Document Object Model (DOM) Level 3 Content Models and Load and Save Specification: http://www.w3.org/TR/DOM-Level-3-CMLS/
Document Object Model (DOM) Level 3 Core Specification Version 1.0: http://www.w3.org/TR/DOM-Level-3-Core
Document Object Model (DOM) Requirements: http://www.w3.org/TR/DOM-Requirements/
Document Object Model (DOM) Level 3 Views and Formatting Specification: http://www.w3.org/TR/DOM-Level-3-Views/
In SQL, the query language is not expressed in tables and rows. In XQuery, the query language is not expressed in XML. Why is this a problem?
--Jonathan Robie on the xml-dev mailing list
Used for XSLT 2.0 and XQuery
Schema Aware
Simplify manipulation of XML Schema-typed content
Simplify manipulation of string content
Support related XML standards
Improve ease of use
Improve interoperability
Improve internationalization (i18n) support
Maintain backward compatibility
Enable improved processor efficiency
Must express data model in terms of the Infoset
Must provide common core syntax and semantics for XSLT 2.0 and XML Query 1.0
Must support explicit "for any" or "for all" comparison and equality semantics
Must add min()
and max()
functions
Any valid XPath 1.0 expression SHOULD also be a valid XPath 2.0 expression when operating in the absence of XML Schema type information.
Should provide intersection and difference functions
Must loosen restrictions on location steps
Must provide a conditional expression (e.g. ternary
?:
operator in Java and C)
Should support additional string functions, possibly including space padding, string replacement and conversion to upper or lower case
Must support regular expression string matching using the regexp syntax from schemas
Must add support for XML Schema primitive datatypes
Should add support for XML Schema structures
Uses XPath 2.0
Schema Aware
Simplify manipulation of XML Schema-typed content
Simplify manipulation of string content
Support related XML standards
Improve ease of use
Improve interoperability
Improve i18n support
Maintain backward compatibility
Enable improved processor efficiency
Simplifying the ability to parse unstructured information to produce structured results.
Turning XSLT into a general-purpose programming language
Must maintain backwards compatibility with XSLT 1.1
Should be able to match elements and attributes whose value is explicitly null.
Should allow included documents to encapsulate local stylesheets
Could support accessing infoset items for XML declaration
Could provide qualified name aware string functions
Could enable constructing a namespace with computed name
Could simplify resolving prefix conflicts in qname-valued attributes
Could support XHTML output method
Must allow matching on default namespace without explicit prefix
Must add date formatting functions
Must simplify accessing IDs and keys in other documents
Should provide function to absolutize relative URIs
Should include unparsed text from an external resource
Should allow authoring extension functions in XSLT
Should output character entity references instead of numeric character entities
Should construct entity reference by name
Should support Unicode string normalization
Should standardize extension element language bindings
Could improve efficiency of transformations on large documents
Could support reverse IDREF attributes
Could support case-insensitive comparisons
Could support lexicographic string comparisons
Could allow comparing nodes based on document order
Could improve support for unparsed entities
Could allow processing a node with the "next best matching" template
Could make coercions symmetric by allowing scalar to nodeset conversion
Must support XML schema
Must simplify constructing and copying typed content
Must support sorting nodes based on XML schema type
Could support scientific notation in number formatting
Could provide ability to detect whether "rich" schema information is available
Must simplify grouping
Multiple output documents
Variables can be set to node sets; no more result tree fragments.
Extension functions defined in style sheets with Java and ECMAScript
Standard Java and JavaScript bindings for extension functions
Existing elements and functions hardly change at all
Namespace is still http://www.w3.org/1999/XSL/Transform
version
attribute of
xsl:stylesheet
has value 2.0
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Top level elements -->
</xsl:stylesheet>
The result tree fragment data-type has been eliminated.
Variable-binding elements with content now construct node-sets
These node sets can now be operated on by templates
Functionality previously available with
saxon:nodeSet()
and similar extension functions
Allows you to generate multiple documents from one source document
Previously available with extension functions like
xt:document
and saxon:output
Syntax modeled on xsl:output
<xsl:document
href = { uri-reference }
method = { "xml" | "html" | "text" | qname-but-not-ncname }
version = { nmtoken }
encoding = { string }
omit-xml-declaration = { "yes" | "no" }
standalone = { "yes" | "no" }
doctype-public = { string }
doctype-system = { string }
cdata-section-elements = { qnames }
indent = { "yes" | "no" }
media-type = { string }>
<!-- Content: template -->
</xsl:document>
Partially supported by Saxon 6.2 and later
<xsl:document method="html" encoding="ISO-8859-1" href="index.html">
<html>
<head>
<title><xsl:value-of select="title"/></title>
</head>
<body>
<h1 align="center"><xsl:value-of select="title"/></h1>
<ul>
<xsl:for-each select="slide">
<li><a href="{format-number(position(),'00')}.html"><xsl:value-of select="title"/></a></li>
</xsl:for-each>
</ul>
<p><a href="{translate(title,' ', '_')}.html">Entire Presentation as Single File</a></p>
<hr/>
<div align="center">
<A HREF="01.html">Start</A> | <A HREF="/xml/">Cafe con Leche</A>
</div>
<hr/>
<font size="-1">
Copyright 2001
<a href="http://www.macfaq.com/personal.html">Elliotte Rusty Harold</a><br/>
<a href="mailto:elharo@metalab.unc.edu">elharo@metalab.unc.edu</a><br/>
Last Modified <xsl:apply-templates select="last_modified" mode="lm"/>
</font>
</body>
</html>
</xsl:document>
Defines an extension function, possibly inline
Syntax:
<xsl:script
implements-prefix = ncname
language = "ecmascript" | "javascript" | "java" | qname-but-not-ncname
src = uri-reference
archive = uri-references>
<!-- Content: #PCDATA -->
</xsl:script>
Partially supported by Saxon 6.2 for Java only
<?xml version="1.0"?>
<xsl:stylesheet version="1.1"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:date="http://www.cafeconleche.org/ns/">
  <xsl:template match="/">
    <xsl:value-of select="date:new()"/>
  </xsl:template>
  <xsl:script implements-prefix="date" language="java"
    src="java:java.util.Date"/>
</xsl:stylesheet>
<?xml version="1.0"?>
<xsl:stylesheet version="1.1"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:date="http://www.cafeconleche.org/ns/date">
  <xsl:template match="/">
    <xsl:value-of select="date:clock()"/>
  </xsl:template>
  <xsl:script implements-prefix="date" language="javascript"><![CDATA[
    function clock() {
      var time = new Date();
      var hours = time.getHours();
      var min = time.getMinutes();
      var sec = time.getSeconds();
      var status = "AM";
      if (hours > 11) { status = "PM"; }
      // Convert 13-23 to 1-11 for the 12-hour display
      if (hours > 12) { hours -= 12; }
      if (min < 10) { min = "0" + min; }
      if (sec < 10) { sec = "0" + sec; }
      return hours + ":" + min + ":" + sec + " " + status;
    }
  ]]></xsl:script>
</xsl:stylesheet>
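The intended 12-hour conversion in the clock() function is easy to get wrong, so here is the same logic as a plain Java method that can be tested with fixed inputs. Clock12 and format() are hypothetical names for this sketch; like the script, it shows midnight as hour 0 rather than 12.

```java
/** Hypothetical Java restatement of the clock() formatting logic. */
final class Clock12 {
    /** Formats a 24-hour time as h:mm:ss AM/PM. */
    static String format(int hours, int min, int sec) {
        String status = (hours > 11) ? "PM" : "AM";
        if (hours > 12) {
            hours -= 12; // 13..23 become 1..11
        }
        return hours + ":" + pad(min) + ":" + pad(sec) + " " + status;
    }

    // Left-pads single-digit values with a zero.
    private static String pad(int n) {
        return (n < 10) ? "0" + n : String.valueOf(n);
    }

    public static void main(String[] args) {
        System.out.println(format(13, 5, 9)); // prints 1:05:09 PM
    }
}
```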
Three parts:
A data model for XML documents based on the XML Infoset
A mathematically precise query algebra; i.e. a set of query operators on that data model
A query language based on these query operators and this algebra
A fourth generation declarative language like SQL; not a procedural language like Java or a functional language like XSLT
Queries operate on single documents or fixed collections of documents.
Queries select whole documents or subtrees of documents that match conditions defined on document content and structure
Can construct new documents based on what is selected
No updates or inserts!
Narrative documents and collections of such documents; e.g. generate a table of contents for a book
Data-oriented documents; e.g. SQL-like queries of an XML dump of a database
Filtering streams to process logs of email messages, network packets, stock market data, newswire feeds, EDI, or weather data to filter and route messages represented in XML, to extract data from XML streams, or to transform data in XML streams.
XML views of non-XML data
Files on a disk
Native-XML databases like Software AG's Tamino
DOM trees in memory
Streaming data
Other representations of the infoset
Direct query tools at command line
GUI query tools
JSP, ASP, PHP, and other such server side technologies
Programs written in Java, C++, and other languages that need to extract data from XML documents
Others are possible
Anywhere SQL is used to extract data from a database, XQuery is used to extract data from an XML document.
SQL is a non-compiled language that must be processed by some other tool to extract data from a database. So is XQuery.
A relational database contains tables | An XML database contains collections |
A relational table contains records with the same schema | A collection contains XML documents with the same DTD |
A relational record is an unordered list of named values | An XML document is a tree of nodes |
A SQL query returns an unordered set of records | An XQuery returns an ordered node set |
XML 1.0 #PCDATA
Schema primitive types: positiveInteger, string, float, double, unsignedLong, gYear, date, time, boolean, etc.
Schema complex types
Collections of these types
References to these types
Most of the examples in this talk query this bibliography document at the (relative) URL bib.xml:
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price> 65.95</price>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price> 39.95</price>
  </book>
  <book year="1999">
    <title>The Economics of Technology and Content for Digital TV</title>
    <editor>
      <last>Gerbarg</last><first>Darcy</first>
      <affiliation>CITI</affiliation>
    </editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>
</bib>
Adapted from Mary Fernandez, Jerome Simeon, and Phil Wadler: XML Query Languages: Experiences and Exemplars, 1999, as adapted in XML Query Use Cases
FOR
: each node selected by an XPath 2.0 location path
LET
: a new variable have a specified value
WHERE
: a condition expressed in XPath is true
RETURN
: this node set
FOR $t IN document("bib.xml")/bib/book/title
RETURN
$t
Adapted from XML Query Use Cases
<title>TCP/IP Illustrated</title>
<title>Advanced Programming in the Unix Environment</title>
<title>Data on the Web</title>
<title>The Economics of Technology and Content for Digital TV</title>
Adapted from XML Query Use Cases
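For comparison, a program can get the same four titles today with XPath 1.0 through the JAXP javax.xml.xpath API. This is only a sketch of the equivalent selection, not XQuery itself: TitleQuery and titles() are hypothetical names, and BIB is a trimmed inline copy of bib.xml that keeps just the title elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TitleQuery {
    // Trimmed copy of bib.xml: only the elements this query touches.
    static final String BIB =
        "<bib>"
        + "<book><title>TCP/IP Illustrated</title></book>"
        + "<book><title>Advanced Programming in the Unix Environment</title></book>"
        + "<book><title>Data on the Web</title></book>"
        + "<book><title>The Economics of Technology and Content for Digital TV</title></book>"
        + "</bib>";

    /** Plays the role of FOR $t IN /bib/book/title RETURN $t. */
    static List<String> titles(String xml) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("/bib/book/title",
                          new InputSource(new StringReader(xml)),
                          XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result; // document order, as in the XQuery result
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String title : titles(BIB)) {
            System.out.println(title);
        }
    }
}
```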
An XML Syntax for XQuery
Intended for machine processing and programmer convenience, not for human legibility
In XQuery:
FOR $t IN document("bib.xml")/bib/book/title
RETURN
$t
In XQueryX:
<?xml version="1.0"?>
<xq:query xmlns:xq="http://www.w3.org/2001/06/xqueryx">
<xq:flwr>
<xq:forAssignment variable="$t">
<xq:step axis="CHILD">
<xq:function name="document">
<xq:constant datatype="CHARSTRING">bib.xml</xq:constant>
</xq:function>
<xq:identifier>bib</xq:identifier>
</xq:step>
<xq:step axis="CHILD">
<xq:identifier>book</xq:identifier>
</xq:step>
<xq:step axis="CHILD">
<xq:identifier>title</xq:identifier>
</xq:step>
</xq:forAssignment>
<xq:return>
<xq:variable>$t</xq:variable>
</xq:return>
</xq:flwr>
</xq:query>
Tags are given as literals
An XQuery expression enclosed in curly braces is evaluated to become the contents of the element
The contents can also contain literal text outside the braces
List titles of all books in a bib
element.
Put each title in a book
element.
<bib>
{
FOR $t IN document("bib.xml")/bib/book/title
RETURN
<book>
{ $t }
</book>
}
</bib>
Adapted from XML Query Use Cases
<bib>
<book>
<title>TCP/IP Illustrated</title>
</book>
<book>
<title>Advanced Programming in the Unix Environment</title>
</book>
<book>
<title>Data on the Web</title>
</book>
<book>
<title>The Economics of Technology and Content for Digital TV</title>
</book>
</bib>
Adapted from XML Query Use Cases
List titles of books published by Addison-Wesley
<bib>
{
FOR $b IN document("bib.xml")/bib/book
WHERE $b/publisher = "Addison-Wesley"
RETURN
$b/title
}
</bib>
This WHERE
clause could be replaced by an XPath predicate:
<bib>
{
FOR $b IN document("bib.xml")/bib/book[publisher="Addison-Wesley"]
RETURN
$b/title
}
</bib>
But WHERE
clauses can combine
multiple variables from multiple documents
Adapted from XML Query Use Cases
<bib>
<title>TCP/IP Illustrated</title>
<title>Advanced Programming in the Unix Environment</title>
</bib>
Adapted from XML Query Use Cases
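The equivalence of the WHERE clause and the XPath predicate can be checked with a small Java sketch: one method filters books in a loop, the way WHERE does, and the other pushes the test into the location path. WhereVersusPredicate and its methods are hypothetical names, the code uses XPath 1.0 via JAXP rather than XQuery, and BIB is a trimmed inline copy of bib.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class WhereVersusPredicate {
    // Trimmed copy of bib.xml: only titles and publishers.
    static final String BIB =
        "<bib>"
        + "<book><title>TCP/IP Illustrated</title>"
        + "<publisher>Addison-Wesley</publisher></book>"
        + "<book><title>Advanced Programming in the Unix Environment</title>"
        + "<publisher>Addison-Wesley</publisher></book>"
        + "<book><title>Data on the Web</title>"
        + "<publisher>Morgan Kaufmann Publishers</publisher></book>"
        + "<book><title>The Economics of Technology and Content for Digital TV</title>"
        + "<publisher>Kluwer Academic Publishers</publisher></book>"
        + "</bib>";

    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    /** FOR $b IN /bib/book WHERE $b/publisher = "Addison-Wesley" RETURN $b/title */
    static List<String> withWhere(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList books = (NodeList) xpath.evaluate(
                "/bib/book", parse(xml), XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < books.getLength(); i++) {
                // The WHERE test, applied to one binding of $b at a time.
                String publisher = xpath.evaluate("publisher", books.item(i));
                if ("Addison-Wesley".equals(publisher)) {
                    titles.add(xpath.evaluate("title", books.item(i)));
                }
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** FOR $b IN /bib/book[publisher="Addison-Wesley"] RETURN $b/title */
    static List<String> withPredicate(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                "/bib/book[publisher='Addison-Wesley']/title",
                parse(xml), XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getTextContent());
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(withWhere(BIB).equals(withPredicate(BIB))); // prints true
    }
}
```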
XQuery booleans include:
AND
OR
NOT()
List books published by Addison-Wesley after 1993:
<bib>
{
FOR $b IN document("bib.xml")/bib/book
WHERE $b/publisher = "Addison-Wesley" AND $b/@year > 1993
RETURN
$b/title
}
</bib>
Adapted from XML Query Use Cases
<bib>
<title>TCP/IP Illustrated</title>
</bib>
Adapted from XML Query Use Cases
List books published by Addison-Wesley after 1993, including their year and title:
<bib>
{
FOR $b IN document("bib.xml")/bib/book
WHERE $b/publisher = "Addison-Wesley" AND $b/@year > 1993
RETURN
<book year = { $b/@year }>
{ $b/title }
</book>
}
</bib>
This is not well-formed XML!
Adapted from XML Query Use Cases
<bib>
<book year="1994">
<title>TCP/IP Illustrated</title>
</book>
</bib>
Adapted from XML Query Use Cases
Create a list of all the title-author pairs, with each pair enclosed in
a result
element.
<results>
{
FOR $b IN document("bib.xml")/bib/book,
$t IN $b/title,
$a IN $b/author
RETURN
<result>
{ $t }
{ $a }
</result>
}
</results>
Adapted from XML Query Use Cases
<results>
<result>
<title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author>
</result>
<result>
<title>Advanced Programming in the Unix Environment</title>
<author><last>Stevens</last><first>W.</first></author>
</result>
<result>
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
</result>
<result>
<title>Data on the Web</title>
<author><last>Buneman</last><first>Peter</first></author>
</result>
<result>
<title>Data on the Web</title>
<author><last>Suciu</last><first>Dan</first></author>
</result>
</results>
Adapted from XML Query Use Cases
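A FOR clause with several bindings behaves like nested loops: every title of a book is paired with every author of the same book, so a book with three authors yields three results and a book with no author yields none. The sketch below spells that out with the DOM2 API. TitleAuthorPairs and pairs() are hypothetical names, and BIB is a trimmed inline copy of bib.xml that keeps only last names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TitleAuthorPairs {
    // Trimmed copy of bib.xml: titles plus author/editor last names.
    static final String BIB =
        "<bib>"
        + "<book><title>TCP/IP Illustrated</title>"
        + "<author><last>Stevens</last></author></book>"
        + "<book><title>Advanced Programming in the Unix Environment</title>"
        + "<author><last>Stevens</last></author></book>"
        + "<book><title>Data on the Web</title>"
        + "<author><last>Abiteboul</last></author>"
        + "<author><last>Buneman</last></author>"
        + "<author><last>Suciu</last></author></book>"
        + "<book><title>The Economics of Technology and Content for Digital TV</title>"
        + "<editor><last>Gerbarg</last></editor></book>"
        + "</bib>";

    /** One "title / author" string per ($b, $t, $a) binding. */
    static List<String> pairs(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> result = new ArrayList<>();
            NodeList books = doc.getElementsByTagName("book");
            for (int b = 0; b < books.getLength(); b++) {          // FOR $b
                Element book = (Element) books.item(b);
                NodeList titles = book.getElementsByTagName("title");
                NodeList authors = book.getElementsByTagName("author");
                for (int t = 0; t < titles.getLength(); t++) {      // $t IN $b/title
                    for (int a = 0; a < authors.getLength(); a++) { // $a IN $b/author
                        result.add(titles.item(t).getTextContent()
                            + " / " + authors.item(a).getTextContent());
                    }
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String pair : pairs(BIB)) {
            System.out.println(pair);
        }
    }
}
```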
For each book in the bibliography, list the title and authors, grouped inside
a result
element.
<results>
{
FOR $b IN document("bib.xml")/bib/book
RETURN
<result>
{ $b/title }
{
FOR $a IN $b/author
RETURN $a
}
</result>
}
</results>
Adapted from XML Query Use Cases
<?xml version="1.0"?>
<results xmlns:ino="http://namespaces.softwareag.com/tamino/response2" xmlns:xql="http://metalab.unc.edu/xql/">
<result>
<title>TCP/IP Illustrated</title>
<author>
<last>Stevens</last>
<first>W.</first>
</author>
</result>
<result>
<title>Advanced Programming in the Unix Environment</title>
<author>
<last>Stevens</last>
<first>W.</first>
</author>
</result>
<result>
<title>Data on the Web</title>
<author>
<last>Abiteboul</last>
<first>Serge</first>
</author>
<author>
<last>Buneman</last>
<first>Peter</first>
</author>
<author>
<last>Suciu</last>
<first>Dan</first>
</author>
</result>
<result>
<title>The Economics of Technology and Content for Digital TV</title>
</result>
</results>
Adapted from XML Query Use Cases
For each author in the bibliography, list the author's name and the titles of
all books by that author, grouped inside a result
element.
<results>
{
FOR $a IN distinct(document("bib.xml")//author)
RETURN
<result>
{ $a }
{ FOR $b IN document("bib.xml")/bib/book[author=$a]
RETURN $b/title
}
</result>
}
</results>
Adapted from XML Query Use Cases
<results>
<result>
<author><last>Stevens</last><first>W.</first></author>
<title>TCP/IP Illustrated</title>
<title>Advanced Programming in the Unix Environment</title>
</result>
<result>
<author><last>Abiteboul</last><first>Serge</first></author>
<title>Data on the Web</title>
</result>
<result>
<author><last>Buneman</last><first>Peter</first></author>
<title>Data on the Web</title>
</result>
<result>
<author><last>Suciu</last><first>Dan</first></author>
<title>Data on the Web</title>
</result>
</results>
Adapted from XML Query Use Cases
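The same grouping can be sketched in Java with an insertion-ordered map from author to titles. This is an approximation under stated assumptions: TitlesByAuthor and group() are hypothetical names, authors are identified by last name only (the query compares whole author elements), BIB is a trimmed inline copy of bib.xml, and the code makes one pass over the books instead of re-querying the document for each distinct author.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TitlesByAuthor {
    // Trimmed copy of bib.xml: titles plus author/editor last names.
    static final String BIB =
        "<bib>"
        + "<book><title>TCP/IP Illustrated</title>"
        + "<author><last>Stevens</last></author></book>"
        + "<book><title>Advanced Programming in the Unix Environment</title>"
        + "<author><last>Stevens</last></author></book>"
        + "<book><title>Data on the Web</title>"
        + "<author><last>Abiteboul</last></author>"
        + "<author><last>Buneman</last></author>"
        + "<author><last>Suciu</last></author></book>"
        + "<book><title>The Economics of Technology and Content for Digital TV</title>"
        + "<editor><last>Gerbarg</last></editor></book>"
        + "</bib>";

    /** Maps each distinct author (by last name) to that author's titles. */
    static Map<String, List<String>> group(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // LinkedHashMap keeps authors in order of first appearance.
            Map<String, List<String>> byAuthor = new LinkedHashMap<>();
            NodeList books = doc.getElementsByTagName("book");
            for (int b = 0; b < books.getLength(); b++) {
                Element book = (Element) books.item(b);
                String title =
                    book.getElementsByTagName("title").item(0).getTextContent();
                NodeList authors = book.getElementsByTagName("author");
                for (int a = 0; a < authors.getLength(); a++) {
                    String author = authors.item(a).getTextContent();
                    byAuthor.computeIfAbsent(author, k -> new ArrayList<>())
                            .add(title);
                }
            }
            return byAuthor;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        group(BIB).forEach((author, titles) ->
            System.out.println(author + ": " + titles));
    }
}
```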
List the titles and years of all books published by Addison-Wesley after 1991, in alphabetic order.
<bib>
{
FOR $b IN document("bib.xml")//book
[publisher = "Addison-Wesley" AND @year > "1991"]
RETURN
<book>
{ $b/@year } { $b/title }
</book> SORTBY (title)
}
</bib>
Adapted from XML Query Use Cases
<bib>
<book year="1992">
<title>Advanced Programming in the Unix Environment</title>
</book>
<book year="1994">
<title>TCP/IP Illustrated</title>
</book>
</bib>
Adapted from XML Query Use Cases
Find books in which some element has a tag ending in "or" and the same element contains the string "Suciu" (at any level of nesting). For each such book, return the title and the qualifying element.
<result xmlns:xf="http://www.w3.org/2001/08/xquery-operators">
{
FOR $b IN document("bib.xml")//book,
$e IN $b/*[contains(string(.), "Suciu")]
WHERE xf:ends-with(name($e), "or")
RETURN
<book>
{ $b/title } { $e }
</book>
}
</result>
Not supported by Quip yet
Adapted from XML Query Use Cases
<result>
<book>
<title> Data on the Web </title>
<author> <last> Suciu </last> <first> Dan </first> </author>
</book>
</result>
Adapted from XML Query Use Cases
xf:decimal
xf:integer
xf:long
xf:int
xf:short
xf:byte
xf:float
xf:double
xf:floor
xf:ceiling
xf:round
xf:string
xf:normalizedString
xf:token
xf:language
xf:Name
xf:NMTOKEN
xf:NCName
xf:ID
xf:IDREF
xf:ENTITY
xf:codepoint-compare
xf:compare
xf:concat
xf:starts-with
xf:ends-with
xf:codepoint-contains
xf:contains
xf:substring
xf:string-length
xf:codepoint-substring-before
xf:substring-before
xf:codepoint-substring-after
xf:substring-after
xf:normalize-space
xf:normalize-unicode
xf:upper-case
xf:lower-case
xf:translate
xf:string-pad-beginning
xf:string-pad-end
xf:match
xf:replace
xf:true
xf:false
xf:boolean-from-string
xf:not
xf:duration
xf:dateTime
xf:date
xf:time
xf:gYearMonth
xf:gYear
xf:gMonthDay
xf:gMonth
xf:gDay
xf:currentDateTime
xf:get-Century-from-dateTime
xf:get-Century-from-date
xf:get-Century-from-gYear
xf:get-Century-from-gYearMonth
xf:get-gYear-from-dateTime
xf:get-gYear-from-date
xf:get-gYear-from-gYearMonth
xf:get-gMonth-from-dateTime
xf:get-gMonth-from-date
xf:get-gMonth-from-gYearMonth
xf:get-gMonth-from-gMonthDay
xf:get-gDay-from-dateTime
xf:get-gDay-from-date
xf:get-gDay-from-gMonthDay
xf:get-hour-from-dateTime
xf:get-hour-from-time
xf:get-minutes-from-dateTime
xf:get-minutes-from-time
xf:get-seconds-from-dateTime
xf:get-seconds-from-time
xf:get-timezone-from-dateTime
xf:get-timezone-from-date
xf:get-timezone-from-time
xf:get-timezone-from-gYear
xf:get-timezone-from-gYearMonth
xf:get-timezone-from-gMonth
xf:get-timezone-from-gMonthDay
xf:get-timezone-from-gDay
xf:get-years
xf:get-months
xf:get-days
xf:get-hours
xf:get-minutes
xf:get-seconds
xf:add-days
xf:add-months
xf:add-years
xf:add-gMonth
xf:add-gYear
xf:get-duration
xf:get-end
xf:get-start
xf:temporal-dateTimes-contains
xf:temporal-dateTimeDuration-contains
xf:temporal-durationDateTime-contains
xf:QName-from-uri
xf:QName-from-prefix
xf:QName
xf:get-local-name
xf:get-namespace-uri
xf:anyURI
xf:NOTATION
xf:local-name
xf:namespace-uri
xf:number
xf:node-equal
xf:value-equal
xf:node-before
xf:node-after
xf:copy
xf:shallow
xf:boolean
TO
xf:position
xf:last
xf:item-at
xf:index-of
xf:empty
xf:exists
xf:identity-distinct
xf:value-distinct
xf:sort
xf:reverse-sort
xf:insert
xf:sublist-before
xf:sublist-after
xf:sublist
xf:sequence-pad-beginning
xf:sequence-pad-end
xf:truncate-beginning
xf:truncate-end
xf:resize-beginning
xf:resize-end
xf:unordered
xf:sequence-value-equal
xf:sequence-node-equal
xf:union
xf:union-all
xf:intersect
xf:intersect-all
xf:except
xf:except-all
xf:count
xf:avg
xf:max
xf:min
xf:sum
xf:id
xf:idref
xf:filter
xf:document
Casting to string and its derived types
Casting to numeric types
Casting to datetime and duration types
Casting to all other simple types
xf:boolean
xf:string
Sample data at "reviews.xml":
<reviews>
  <entry>
    <title>Data on the Web</title>
    <price>34.95</price>
    <review>A very good discussion of semi-structured database systems and XML.</review>
  </entry>
  <entry>
    <title>Advanced Programming in the Unix Environment</title>
    <price>65.95</price>
    <review>A clear and detailed discussion of UNIX programming.</review>
  </entry>
  <entry>
    <title>TCP/IP Illustrated</title>
    <price>65.95</price>
    <review>One of the best books on TCP/IP.</review>
  </entry>
</reviews>
Adapted from XML Query Use Cases
<!ELEMENT reviews (entry*)>
<!ELEMENT entry (title, price, review)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT review (#PCDATA)>
For each book found in both bib.xml and reviews.xml, list the title of the book and its price from each source.
<books-with-prices>
{
FOR $b IN document("bib.xml")//book,
$a IN document("reviews.xml")//entry
WHERE $b/title = $a/title
RETURN
<book-with-prices>
{ $b/title }
<price-amazon> { $a/price/text() } </price-amazon>
<price-bn> { $b/price/text() } </price-bn>
</book-with-prices>
}
</books-with-prices>
Adapted from XML Query Use Cases
<books-with-prices>
<book-with-prices>
<title>TCP/IP Illustrated</title>
<price-amazon>65.95</price-amazon>
<price-bn>65.95</price-bn>
</book-with-prices>
<book-with-prices>
<title>Advanced Programming in the Unix Environment</title>
<price-amazon>65.95</price-amazon>
<price-bn>65.95</price-bn>
</book-with-prices>
<book-with-prices>
<title>Data on the Web</title>
<price-amazon>34.95</price-amazon>
<price-bn>39.95</price-bn>
</book-with-prices>
</books-with-prices>
Adapted from XML Query Use Cases
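For readers more at home in the Java APIs from Parts I and II, the join above can be approximated with a nested loop over two DOM trees. This is only a sketch under stated assumptions: the class name PriceJoinSketch, the inline BIB and REVIEWS strings (trimmed stand-ins for bib.xml and reviews.xml that keep only title and price), and the one-line text output are invented for illustration. It also relies on the javax.xml.xpath API, which arrived in Java releases later than this presentation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PriceJoinSketch {

    // Hypothetical, trimmed stand-in for bib.xml: only title and price are kept.
    public static final String BIB = "<bib>"
        + "<book><title>TCP/IP Illustrated</title><price>65.95</price></book>"
        + "<book><title>Advanced Programming in the Unix Environment</title><price>65.95</price></book>"
        + "<book><title>Data on the Web</title><price>39.95</price></book>"
        + "</bib>";

    // Trimmed stand-in for reviews.xml: the review text is omitted.
    public static final String REVIEWS = "<reviews>"
        + "<entry><title>Data on the Web</title><price>34.95</price></entry>"
        + "<entry><title>Advanced Programming in the Unix Environment</title><price>65.95</price></entry>"
        + "<entry><title>TCP/IP Illustrated</title><price>65.95</price></entry>"
        + "</reviews>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Mirrors FOR $b IN //book, $a IN //entry WHERE $b/title = $a/title.
    public static List<String> join(String bibXml, String reviewsXml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList books = (NodeList) xpath.evaluate(
                    "//book", parse(bibXml), XPathConstants.NODESET);
            NodeList entries = (NodeList) xpath.evaluate(
                    "//entry", parse(reviewsXml), XPathConstants.NODESET);
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < books.getLength(); i++) {
                String title = xpath.evaluate("title", books.item(i));
                for (int j = 0; j < entries.getLength(); j++) {
                    if (title.equals(xpath.evaluate("title", entries.item(j)))) {
                        // price-amazon comes from the review entry, price-bn from the book.
                        rows.add(title
                                + " | amazon=" + xpath.evaluate("price", entries.item(j))
                                + " | bn=" + xpath.evaluate("price", books.item(i)));
                    }
                }
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate the join", e);
        }
    }

    public static void main(String[] args) {
        for (String row : join(BIB, REVIEWS)) {
            System.out.println(row);
        }
    }
}
```

The nested loop visits every book/entry pair, which is what the two-variable FOR clause describes. A query processor is free to pick a smarter join strategy, and that is much of the appeal of stating the join declaratively.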
The next query also uses an input document named "prices.xml":
<prices>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
    <source>www.amazon.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
    <source>www.bn.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>www.amazon.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>www.bn.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>www.amazon.com</source>
    <price>34.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>www.bn.com</source>
    <price>39.95</price>
  </book>
</prices>
Adapted from XML Query Use Cases
In the document "prices.xml", find the minimum price for each book, in the form of a minprice element with the book title as its title attribute.
<results>
{
FOR $t IN distinct(document("prices.xml")//book/title)
LET $p := document("prices.xml")//book[title = $t]/price
RETURN
<minprice title = { $t/text() } >
{ min($p) }
</minprice>
}
</results>
Adapted from XML Query Use Cases
<results>
<minprice title="Advanced Programming in the Unix Environment"> 65.95 </minprice>
<minprice title="TCP/IP Illustrated"> 65.95 </minprice>
<minprice title="Data on the Web"> 34.95 </minprice>
</results>
Adapted from XML Query Use Cases
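The grouping that distinct and min perform in this query can be approximated in Java with a map keyed on title. This is a rough sketch, not how any query engine actually works: the class name MinPriceSketch and the inline PRICES string (a stand-in for prices.xml) are assumptions for illustration. It uses getTextContent(), one of the DOM Level 3 additions, plus collection methods from Java versions newer than this talk.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MinPriceSketch {

    // Hypothetical inline stand-in for prices.xml, same data as the sample document.
    public static final String PRICES = "<prices>"
        + book("Advanced Programming in the Unix Environment", "www.amazon.com", "65.95")
        + book("Advanced Programming in the Unix Environment", "www.bn.com", "65.95")
        + book("TCP/IP Illustrated", "www.amazon.com", "65.95")
        + book("TCP/IP Illustrated", "www.bn.com", "65.95")
        + book("Data on the Web", "www.amazon.com", "34.95")
        + book("Data on the Web", "www.bn.com", "39.95")
        + "</prices>";

    // Helper that builds one <book> element as a string.
    static String book(String title, String source, String price) {
        return "<book><title>" + title + "</title><source>" + source
                + "</source><price>" + price + "</price></book>";
    }

    // Text of the first descendant element with the given name, trimmed.
    static String text(Element parent, String name) {
        return parent.getElementsByTagName(name).item(0).getTextContent().trim();
    }

    // Returns title -> minimum price, in order of first appearance.
    public static Map<String, BigDecimal> minPrices(String pricesXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(pricesXml)));
            Map<String, BigDecimal> min = new LinkedHashMap<>();
            NodeList books = doc.getElementsByTagName("book");
            for (int i = 0; i < books.getLength(); i++) {
                Element b = (Element) books.item(i);
                BigDecimal price = new BigDecimal(text(b, "price"));
                // Keep the smaller of the stored price and this one.
                min.merge(text(b, "title"), price, BigDecimal::min);
            }
            return min;
        } catch (Exception e) {
            throw new IllegalStateException("could not compute minimum prices", e);
        }
    }

    public static void main(String[] args) {
        minPrices(PRICES).forEach((title, price) ->
                System.out.println("<minprice title=\"" + title + "\"> " + price + " </minprice>"));
    }
}
```

The sketch walks the document once and folds prices into the map, whereas the query as written re-selects the matching book elements for each distinct title. Both should arrive at the same three minprice values.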
For each book with an author, return a book with its title and authors. For each book with an editor, return a reference with the book title and the editor's affiliation.
<bib>
{
FOR $b IN document("bib.xml")//book[author]
RETURN
<book>
{ $b/title }
{ $b/author }
</book>,
FOR $b IN document("bib.xml")//book[editor]
RETURN
<reference>
{ $b/title }
<org> { $b/editor/affiliation/text() } </org>
</reference>
}
</bib>
Adapted from XML Query Use Cases
<bib>
<book>
<title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author>
</book>
<book>
<title>Advanced Programming in the Unix Environment</title>
<author><last>Stevens</last><first>W.</first></author>
</book>
<book>
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
</book>
<reference>
<title>The Economics of Technology and Content for Digital TV</title>
<org>CITI</org>
</reference>
</bib>
Adapted from XML Query Use Cases
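For comparison, here is roughly what the same restructuring takes with the DOM API from Part II. It is a sketch only: the class name BibReferenceSketch and the inline BIB string are assumptions, and BIB is a trimmed stand-in for bib.xml that keeps just the titles, authors, and editor affiliation visible in the result above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BibReferenceSketch {

    // Hypothetical, trimmed stand-in for bib.xml.
    public static final String BIB = "<bib>"
        + "<book><title>TCP/IP Illustrated</title>"
        + "<author><last>Stevens</last><first>W.</first></author></book>"
        + "<book><title>Advanced Programming in the Unix Environment</title>"
        + "<author><last>Stevens</last><first>W.</first></author></book>"
        + "<book><title>Data on the Web</title>"
        + "<author><last>Abiteboul</last><first>Serge</first></author>"
        + "<author><last>Buneman</last><first>Peter</first></author>"
        + "<author><last>Suciu</last><first>Dan</first></author></book>"
        + "<book><title>The Economics of Technology and Content for Digital TV</title>"
        + "<editor><affiliation>CITI</affiliation></editor></book>"
        + "</bib>";

    public static Document transform(String bibXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document in = builder.parse(new InputSource(new StringReader(bibXml)));
            Document out = builder.newDocument();
            Element bib = out.createElement("bib");
            out.appendChild(bib);
            NodeList books = in.getElementsByTagName("book");

            // First FOR clause: every book that has at least one author.
            for (int i = 0; i < books.getLength(); i++) {
                Element b = (Element) books.item(i);
                NodeList authors = b.getElementsByTagName("author");
                if (authors.getLength() == 0) continue;
                Element book = out.createElement("book");
                book.appendChild(out.importNode(b.getElementsByTagName("title").item(0), true));
                for (int k = 0; k < authors.getLength(); k++) {
                    book.appendChild(out.importNode(authors.item(k), true));
                }
                bib.appendChild(book);
            }

            // Second FOR clause: every book that has at least one editor.
            for (int i = 0; i < books.getLength(); i++) {
                Element b = (Element) books.item(i);
                NodeList editors = b.getElementsByTagName("editor");
                if (editors.getLength() == 0) continue;
                Element reference = out.createElement("reference");
                reference.appendChild(out.importNode(b.getElementsByTagName("title").item(0), true));
                StringBuilder affiliation = new StringBuilder();
                for (int k = 0; k < editors.getLength(); k++) {
                    NodeList affs = ((Element) editors.item(k)).getElementsByTagName("affiliation");
                    for (int m = 0; m < affs.getLength(); m++) {
                        affiliation.append(affs.item(m).getTextContent());
                    }
                }
                Element org = out.createElement("org");
                org.setTextContent(affiliation.toString());
                reference.appendChild(org);
                bib.appendChild(reference);
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not build the result document", e);
        }
    }

    public static void main(String[] args) {
        Document out = transform(BIB);
        System.out.println("books: " + out.getElementsByTagName("book").getLength());
        System.out.println("references: " + out.getElementsByTagName("reference").getLength());
    }
}
```

The two loops correspond to the two comma-separated FOR expressions. As in the query, all book elements are emitted before any reference element, whatever their order in the input.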
Quilt: http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html
Kweelt: http://db.cis.upenn.edu/Kweelt/
Ipedo: http://www.ipedo.com/
XSLT 1.1 Working Draft: http://www.w3.org/TR/xslt11/
XPath 2.0 Requirements: http://www.w3.org/TR/2001/WD-xpath20req-20010214
XSLT 2.0 Requirements: http://www.w3.org/TR/2001/WD-xslt20req-20010214
XQuery: A Query Language for XML: http://www.w3.org/TR/xquery/
XML Query Requirements: http://www.w3.org/TR/xmlquery-req
XML Query Use Cases: http://www.w3.org/TR/xmlquery-use-cases
XML Query Data Model: http://www.w3.org/TR/query-datamodel/
The XML Query Algebra: http://www.w3.org/TR/query-algebra/
XML Syntax for XQuery 1.0 (XQueryX): http://www.w3.org/TR/xqueryx
XQuery 1.0 and XPath 2.0 Functions and Operators Version 1.0: http://www.w3.org/TR/2001/WD-xquery-operators-20010827
This presentation: http://www.ibiblio.org/xml/slides/xmlone/amsterdam2001/cuttingedge/