XML News from Saturday, February 12, 2005

The W3C XSL and XML Query working groups have published nine revised working drafts:

Changes in XPath 2.0 in these drafts include:

The fn:id and fn:idref functions now work on values specified as xs:ID, xs:IDREF and xs:IDREFS as well as the DTD types ID, IDREF and IDREFS and the newly-defined xml:id.
A fn:codepoint-equal function has been added that compares strings based on the Unicode code point collation.
A fn:doc-available function has been added to indicate whether an XML document can be retrieved from a given URI.
xdt:untypedAny has been changed to xdt:untyped.
xs:anyType is no longer an abstract type, but is now used to denote the type of a partially validated element node. Since there is no longer a meaningful distinction between abstract types and concrete types, these terms are no longer used in this document.
Value comparisons now return the empty sequence if either operand is the empty sequence.
The typed value of a namespace node is an instance of xs:string, not xdt:untypedAtomic.
The precedence of the cast and treat operators and unary arithmetic operators has been increased.
A new component has been added to the static context: context item static type.
The XPath 2.0 specification now clearly distinguishes between "statically-known namespaces" (a static property of an expression) and "in-scope namespaces" (a dynamic property of an element).
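
As a rough illustration of two of the XPath changes listed above, here is a minimal Python sketch. It is not taken from the drafts. It models the empty sequence as None, and it assumes fn:codepoint-equal follows the same empty-sequence rule as value comparisons. The function names value_eq and codepoint_equal are hypothetical.

```python
from typing import Optional


def value_eq(left: Optional[str], right: Optional[str]) -> Optional[bool]:
    """Model of a value comparison such as `eq`.

    None stands in for the empty sequence: if either operand is empty,
    the result is empty rather than an error or false.
    """
    if left is None or right is None:
        return None
    return left == right


def codepoint_equal(a: Optional[str], b: Optional[str]) -> Optional[bool]:
    """Compare strings code point by code point, with no normalization.

    Assumption: an empty operand yields an empty result, as for value_eq.
    """
    if a is None or b is None:
        return None
    return [ord(ch) for ch in a] == [ord(ch) for ch in b]


print(value_eq("a", None))                   # None
print(codepoint_equal("\u00e9", "e\u0301"))  # False
```

The second call shows why a code point comparison is stricter than it may look: a precomposed é and an e followed by a combining accent render identically but are different code point sequences.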

XSLT 2 specific changes in these drafts include:

A non-schema-aware processor now allows all the built-in types defined in XML Schema to be used; previously only a subset of the primitive types plus xs:integer were permitted.
Error codes have been assigned to some error conditions that previously had no code assigned.
An xsl:use-when attribute can now appear on any element that is not in the XSLT namespace, whether or not that element is a literal result element. For example, it can usefully appear on an extension instruction.
The behavior of certain constructs in backwards-compatible mode has changed to more closely reflect the XSLT 1.0 behavior. Specifically:
- In backwards compatible mode, the xsl:number instruction now outputs NaN when the supplied value is an empty sequence or non-numeric, rather than signaling an error.
- In backwards compatible mode, parameters passed to a built-in template rule are not passed on.
- If no output method is explicitly requested, and the first element node output appears to be an XHTML document element, then under XSLT 2.0 the output method defaults to XHTML; with backwards compatibility enabled, the XML output method will be used.
- An XSLT 1.0 processor compared the value of the expression in the use attribute of xsl:key to the value supplied in the second argument of the key function by converting both to strings. An XSLT 2.0 processor normally compares the values as supplied. The XSLT 1.0 behavior is emulated if any of the xsl:key elements making up the key definition enables backwards-compatible behavior.
XPath expressions in attribute value templates are now expanded using the same rules as apply to the select attribute in instructions such as xsl:attribute. The effect of the change is that if the value of the expression contains several adjacent text nodes, no whitespace is inserted between the string values of these text nodes.
When the 3-argument form of the key function is used, the search is now restricted to the subtree rooted at the node identified by the third argument. Previously the third argument merely identified the document to be searched.
The rules for the format-number function have been changed so that numbers are never output with a trailing decimal point. (I'm not sure I like this change. It could cause problems when generating C code, for example.)
The undeclare-namespaces attribute has been renamed undeclare-prefixes.
It is now a recoverable error to generate nodes in the result tree using a namespace name that is not a valid instance of xs:anyURI. XSLT 1.0 explicitly stated that this was not an error; however, the XPath 2.0 data model assumes that the name of a node is a valid xs:QName, and the namespace part of a valid xs:QName, if present, must be a valid xs:anyURI. The fact that this error is recoverable, however, gives implementations freedom to avoid strict validation of namespace names if they wish to do so.
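
The change to the 3-argument key function above is easiest to see by analogy. The following Python sketch is only an approximation, not XSLT semantics: it uses xml.etree.ElementTree, treats a plain attribute match as the "key", and the names key_lookup, DOC, and the sample document are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with the same key value in two chapters.
DOC = ET.fromstring(
    "<doc>"
    "<chapter id='c1'><item code='x'/></chapter>"
    "<chapter id='c2'><item code='x'/><item code='y'/></chapter>"
    "</doc>"
)


def key_lookup(document, attr, value, top=None):
    """Return elements whose `attr` equals `value`.

    With no `top`, the whole document is searched. With `top`, only the
    subtree rooted at `top` (the node itself and its descendants) is searched.
    """
    scope = document if top is None else top
    return [el for el in scope.iter() if el.get(attr) == value]


second_chapter = DOC.find("chapter[@id='c2']")
print(len(key_lookup(DOC, "code", "x")))                      # 2
print(len(key_lookup(DOC, "code", "x", top=second_chapter)))  # 1
```

Under the old rule the third argument would only have selected the document, so both calls would have found two items.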

XQuery-specific changes in these drafts include:

An ordering declaration has been added to the Prolog, which affects the ordering semantics of path expressions, FLWOR expressions, and union, intersect, and except expressions. In addition, ordered and unordered operators have been introduced that permit ordering semantics to be controlled at the expression level within a query.
Validation has been separated from construction. Validation now occurs only as a result of an explicit validate expression. Validation modes are strict and lax, and are specified on the validate expression. New construction modes strip and preserve have been defined and are declared in the Prolog. The notion of "validation context" has been deleted. The XQuery definition of validation has been converged with the definition used in XSLT.
Function overloading is now supported. That is, multiple user-defined functions can have the same name as long as they have different numbers of arguments.
xdt:untypedAny is changed to xdt:untyped.
Computed namespace constructors are now completely static and are allowed only inside a computed element constructor. Namespace declarations in a computed element constructor must come before the element content, and must consist entirely of literals. The namespace prefix is optional. If absent, it has the effect of setting the default namespace for elements and types within the scope of the constructed element.
The syntax for variable initialization in the Prolog now uses an assignment operator (":="). Also, circularities in variable initialization are now static errors.
An error is raised if a module attempts to import itself (target namespace of importing module and imported modules are the same).
A schema can now be imported without specifying either a target namespace or a location hint.
Module imports and schema imports now accept multiple location hints, representing multiple physical resources in the same module or schema.
CData Sections are no longer considered to be constructors, but are simply a notational convenience for embedding special characters in the content of an element or attribute constructor.
Three new components have been added to the static context: XQuery Flagger status, XQuery Static Flagger status, and context item static type. (Note: Flagger status items were later deleted.)
An order by clause may now accept values of mixed type if they have a common type that is reachable by numeric type promotion and/or moving up the type derivation hierarchy, and if this common type has a gt operator.
In element and document node constructors, if the content sequence contains a document node, that node is replaced by its children (this was previously treated as an error).
Atomization now applies to the name expression of a computed processing instruction constructor.
It is now implementation-defined whether undeclaration of namespace prefixes in an element constructor is supported. If supported, this feature conforms to the semantics of Namespaces 1.1. In other words, if an element constructor binds a namespace prefix to the zero-length string, any binding of that prefix defined at an outer level is suspended within the scope of the constructed element.
In a computed text node constructor, the expression enclosed in curly braces is no longer optional, since it is not possible to construct an empty text node.
Rules for processing comment constructors have changed, to ensure that the resulting comment does not contain adjacent hyphens or end with a hyphen.
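
The function overloading rule in the list above amounts to identifying a function by its name together with its number of arguments. A minimal Python sketch of that idea follows. It says nothing about how any real XQuery processor is implemented; the class FunctionRegistry and the function name local:area are invented for the example.

```python
class FunctionRegistry:
    """Stores functions under the pair (name, arity)."""

    def __init__(self):
        self._functions = {}

    def declare(self, name, arity, impl):
        # Two declarations clash only if both name and arity match.
        key = (name, arity)
        if key in self._functions:
            raise ValueError(f"{name}#{arity} is already declared")
        self._functions[key] = impl

    def call(self, name, *args):
        # The number of supplied arguments selects the overload.
        key = (name, len(args))
        if key not in self._functions:
            raise LookupError(f"no function {name}#{len(args)}")
        return self._functions[key](*args)


registry = FunctionRegistry()
registry.declare("local:area", 1, lambda side: side * side)
registry.declare("local:area", 2, lambda width, height: width * height)

print(registry.call("local:area", 3))     # 9
print(registry.call("local:area", 3, 4))  # 12
```

Because only the argument count distinguishes overloads, two declarations with the same name and the same arity still conflict, whatever their parameter types.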

For the first time in years, most of the various working drafts are now in sync. (The formal semantics document still hasn't completely caught up.) Overall, the changes aren't major. These still aren't last call working drafts, though, and it seems unlikely the working group will finish this year. Just maybe the final versions will be released in 2006.

The W3C XQuery working group has also published the first public working draft of XQuery Update Facility Requirements. XQuery as it currently exists is basically just SELECT in SQL terms. This is the beginning of work on INSERT, UPDATE, and DELETE. For now it is just a list of proposed requirements for an eventual update language. No actual syntax or behavior is suggested in this draft.

In related news, Michael Kay has released version 8.3 of Saxon, his XSLT 2.0 and XQuery processor. Besides updating Saxon to cover the latest working drafts, this release makes the dependency on JAXP 1.3 a lot softer so that Saxon is much easier to install and run in Java 1.4 environments. Saxon 8.3 is published in two versions, both of which require Java 1.4 or later. Saxon 8.3B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 8.3SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standing XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers." Upgrades from 8.x are free.