2008 XML News

Wednesday, December 31, 2008 (Permalink)

The W3C Web Content Accessibility Guidelines (WCAG) Working Group has published the finished Recommendation of the Web Content Accessibility Guidelines 2.0. "Web Content Accessibility Guidelines (WCAG) 2.0 covers a wide range of recommendations for making Web content more accessible. Following these guidelines will make content accessible to a wider range of people with disabilities, including blindness and low vision, deafness and hearing loss, learning disabilities, cognitive limitations, limited movement, speech difficulties, photosensitivity and combinations of these. Following these guidelines will also often make your Web content more usable to users in general. WCAG 2.0 success criteria are written as testable statements that are not technology-specific."

Saturday, December 27, 2008 (Permalink)

Bare Bones Software has released version 9.1 of BBEdit, my preferred text editor on the Mac, my favorite XML editor on any platform, and what I'm using to type these very words. Version 9.1 "now includes a copy of Consolas Regular, an excellent antialiased code editing font. This font is licensed from Ascender Corporation for use only with BBEdit." New copies cost $125. Upgrades from 9.0 are free. Mac OS X 10.4 or later is required.

Friday, December 26, 2008 (Permalink)

The W3C Web API Working Group has published the finished recommendation of Element Traversal Specification. "This specification defines the ElementTraversal interface, which allows script navigation of the elements of a DOM tree, excluding all other nodes in the DOM, such as text nodes. It also provides an attribute to expose the number of child elements of an element. It is intended to provide a more convenient alternative to existing DOM navigation interfaces, with a low implementation footprint."

ElementTraversal provides some extra properties/methods for navigating only through elements, while ignoring text and white space:

  • firstElementChild
  • lastElementChild
  • previousElementSibling
  • nextElementSibling
  • childElementCount

This makes it easier to process record-like XML, but it is inappropriate for reading documents with mixed content. It may be mildly helpful if it achieves broad adoption in browsers. However, at this point adding more methods to DOM is just putting lipstick on a pig. Until we admit that DOM was a mistake, we can't really begin to address our problems.
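
To make the record-like versus mixed-content distinction concrete, here is a minimal Python sketch that emulates three of these properties on top of xml.dom.minidom, which does not provide them itself. The helper names (first_element_child and so on) and the two sample documents are my own inventions for illustration; they are not part of the specification or of minidom.

```python
from xml.dom import minidom, Node

def element_children(parent):
    """Return the child nodes of parent that are elements, in document order."""
    return [n for n in parent.childNodes if n.nodeType == Node.ELEMENT_NODE]

def first_element_child(parent):
    """Rough analogue of firstElementChild: skip text, comments, etc."""
    kids = element_children(parent)
    return kids[0] if kids else None

def next_element_sibling(node):
    """Rough analogue of nextElementSibling."""
    sib = node.nextSibling
    while sib is not None and sib.nodeType != Node.ELEMENT_NODE:
        sib = sib.nextSibling
    return sib

def child_element_count(parent):
    """Rough analogue of childElementCount."""
    return len(element_children(parent))

def field_names(record):
    """Walk a record element by element, ignoring white space between fields."""
    names = []
    child = first_element_child(record)
    while child is not None:
        names.append(child.tagName)
        child = next_element_sibling(child)
    return names

# Hypothetical sample documents.
RECORD = "<person>\n  <name>Ada</name>\n  <born>1815</born>\n</person>"
MIXED = "<p>Some <em>mixed</em> content here.</p>"

record = minidom.parseString(RECORD).documentElement
mixed = minidom.parseString(MIXED).documentElement

print(field_names(record))         # the fields, with no white space nodes
print(child_element_count(mixed))  # only the em element is counted
```

For the record, element-only traversal conveniently skips the indentation. For the paragraph, it sees only the em element, and the surrounding words, which are the actual content, are never visited.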

Monday, December 22, 2008 (Permalink)

The W3C Scalable Vector Graphics Working Group has posted the finished recommendation of Scalable Vector Graphics (SVG) Tiny 1.2. SVG Tiny is "a language for describing two-dimensional vector and mixed vector/raster graphics in XML. Its goal is to provide the ability to create a whole range of graphical content, from static images to animations to interactive Web applications. SVG 1.2 Tiny is a profile of SVG intended for implementation on a range of devices, from cellphones and PDAs to desktop and laptop computers, and thus includes a subset of the features included in SVG 1.1 Full, along with new features to extend the capabilities of SVG. Further extensions are planned in the form of modules which will be compatible with SVG 1.2 Tiny, and which when combined with this specification, will match and exceed the capabilities of SVG 1.1 Full."

Sunday, December 21, 2008 (Permalink)

Michael Kay has released version 9.1.0.5 of Saxon, his XSLT 2.0 and XQuery processor for Java and .NET. This is a bug fix release.

Saxon is published in two versions, both of which require Java 1.4 or later (or .NET). Saxon 9.1B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 9.1 SA is £300.00 payware. According to Kay,

The most obvious difference between Saxon-SA and Saxon-B is that Saxon-SA is schema-aware: it allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standing XML Schema validator.

In addition Saxon-SA incorporates some advanced extensions and optimizations not available in the Saxon-B product:

  • Saxon-SA is able to compile XQuery code directly into Java classes.

  • Saxon-SA has an advanced optimizer which recognizes joins in XPath expressions, XQuery FLWOR expressions, and in XSLT templates (nested xsl:for-each instructions). Whereas Saxon-B always implements these as nested loops, Saxon-SA uses a variety of strategies including indexes and hash joins. This can give dramatic improvements in execution time for large documents: some of the queries in the XMark benchmark improve by a factor of 300 (from 16 seconds to 45 milliseconds) to process a 10Mbyte source file.

  • Saxon-SA has a facility to process large documents in streaming mode. This enables documents to be handled that are too large to hold in memory (it has been tested up to 20Gb).

  • Additional extensions available in Saxon-SA include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers.

Friday, December 19, 2008 (Permalink)

Planamesa Software has released NeoOffice/J 2.2.5 patch 5, a Mac port of OpenOffice 2.1 using a Java-based GUI. This is a bug fix release. Mac OS X 10.3.9 or later is required.

Thursday, December 18, 2008 (Permalink)

The Mozilla Project has released Firefox 2.0.0.20 to fix a security bug that apparently only affects Windows. Other users don't need to upgrade.

Wednesday, December 17, 2008 (Permalink)

The Mozilla Project has released Firefox 3.0.5 and 2.0.0.19 and SeaMonkey 1.1.14 to fix various security bugs. All users should upgrade.

There are a couple of other improvements of note in Firefox 3.0.5. Most interestingly, it no longer requires you to agree to a license on installation. Open source software doesn't need that and shouldn't do it. AppleScript was supposed to be at least marginally fixed in this release, but that doesn't seem to have happened so I'm still on 2.0.0.19 until at least the next release. :-(

Tuesday, December 16, 2008 (Permalink)

The W3C XQuery working group has published an updated working draft of XQuery 1.1 and XQuery 1.1 Use Cases. New features since XQuery 1.0 include:

  • Added 3.8.4 Group By Clause to FLWOR Expression.
  • Added 3.8.2 Window Clauses to FLWOR Expression.
  • try/catch
  • Nondeterministic external functions
  • "count" clause in FLWOR
  • Outer For
  • Query prolog syntax to specify how decimal numbers are formatted.

Monday, December 15, 2008 (Permalink)

The W3C XQuery working group has posted new working drafts of XQuery Scripting Extension 1.0 and XQuery Scripting Extension 1.0 Use Cases:

The principal extensions introduced by XQSE are as follows:

  1. An ordering is defined on the evaluation of certain kinds of XQuery expressions. An implementation may use any execution strategy as long as the result complies with the semantics of this ordering. The ordering is defined in a way that places no additional constraints on the evaluation of any valid XQUF or [XQuery 1.0] expressions.

  2. Expressions in XQSE may have side-effects that are visible to subsequent expressions (according to the above ordering of evaluation).

  3. XQSE introduces the following new kinds of expressions:

    1. Apply (semicolon) expressions

    2. Blocks

    3. Assignment expressions

    4. Exit expressions

    5. While expressions

  4. XQSE relaxes the constraints on the placement of updating expressions, so that a non-empty XDM instance can be returned by an expression as well as a non-empty pending update list. In order to allow this, new rules to determine the category and resulting pending update list are added to every existing expression.

  5. XQSE introduces a new expression category called sequential expressions. The simple and updating expression categories introduced by XQUF are retained, but the vacuous expression category no longer has significance. Informal definitions of all the expression categories are summarized here. For normative definitions of the categories, see the "Category Rules" that are specified for each kind of expression in [2.3 New Kinds of Expressions] and [2.4 Changes to Existing Expressions].

    1. [Definition: An updating expression is an expression that can return a non-empty pending update list.] Updating expressions include insert, delete, replace, rename, and calls to updating functions, as well as certain other expressions that contain nested updating expressions. An updating expression may return a non-empty XDM instance as well as a non-empty pending update list - however note that it does not actually apply any updates.

    2. [Definition: A sequential expression is an expression that can have side effects other than constructing a new node or raising an error.] Side effects include applying updates to an XDM instance, altering the dynamic context, or affecting the flow of control. Sequential expressions include apply expressions, assignment, exit, while, and calls to sequential functions, as well as certain other expressions that contain nested sequential expressions. The side effects of a sequential expression are immediately effective and are visible to subsequent expressions. Because of their side effects, sequential expressions must be evaluated in a well-defined order. In addition to its side effects, a sequential expression may return a non-empty XDM instance, but it never returns a non-empty pending update list.

    3. [Definition: A simple expression is an expression that is neither an updating expression nor a sequential expression.] A simple expression may return an XDM instance, and it may construct a node or raise an error.

    The classification of each expression into one of the above categories is performed by static analysis. For each kind of expression, XQSE provides rules that specify the required categories of the operand expressions and the category of the expression itself.

Saturday, December 13, 2008 (Permalink)

Automattic has released WordPress 2.7.0, an open source (GPL) blog engine based on PHP and MySQL. Notable new features in this release include a radically revised user interface and more automated upgrades to future versions. I'm going to try to manually upgrade my sites now. Wish me luck.

Friday, December 12, 2008 (Permalink)

Google has released Chrome 1.0, an open source WebKit-based browser for Windows. Annoyingly the website won't let you download Chrome on a Mac. Hasn't Google heard of Parallels, VMWare, and Bootcamp? URIs should not return different content based on the client's platform. Doing so is a major violation of the web architecture. Also, although Chrome claims to be open source (and probably is), there's an annoying page of legalese you have to agree to before you can download it. Would someone please compile it from source and post a no-contract version that can be downloaded on any platform?

On the positive side, Chrome actually asks you who you want your default search engine to be when you start up, something I don't recall any other browser doing. Honestly, this may be going a little too far in the direction of even-handedness. Not all search engines are created equal. By all means let users change their default search engine if they wish, but I don't see anything wrong with simply setting Google as the default and not bothering users about that unless they care. Google's the market leader in search for good reason: they do it way better than anyone else. Choosing the best option for your customers is a good thing, even when the best option is yourself.

Thursday, December 11, 2008 (Permalink)

The W3C has published the first working draft of rdf:text: A Datatype for Internationalized Text:

The datatype identified by the URI http://www.w3.org/1999/02/22-rdf-syntax-ns#text (abbreviated rdf:text) allows for the representation of internationalized text strings. In addition to the RIF and OWL specifications, this datatype is expected to supersede RDF's plain literals with language tags, cf. [5], which is why this datatype has been added into the rdf: namespace.

Value Space. The value space of rdf:text is the set of all pairs of the form ( "text" , "lang" ), where "text" is a string and "lang" is either the empty string "" or a lowercase language tag.

Lexical Space. A lexical value of rdf:text is a string "val" that contains at least one @ character (U+40) and that satisfies the following condition:

Let i be the position of the last @ (U+40) character in "val", and let "abc" and "tag" be the substrings of "val" containing the characters up to and after position i (noninclusive), respectively. Then, "tag" MUST be either empty or a valid language tag.

Each such lexical value is assigned a data value ( "abc", "lc-tag" ), where "lc-tag" is the string "tag" converted to lowercase.

Editor's Note: Open Issues: The definition of the set of characters, particularly the fact that it is infinite, as well as the compatibility with XML strings - whether the string part of the lex & val space should be the same as xs:string - are still under discussion.

Lexical value "Family Guy@en" is mapped to the data value ( "Family Guy" , "en" ), and "Family Guy@" is mapped to ( "Family Guy" , "" ). Furthermore, "Family Guy" is not a valid lexical value of rdf:text because it does not contain the @ (U+40) character.
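
The lexical-to-value mapping quoted above is small enough to sketch in a few lines of Python. This is only an illustration of the draft's rule as I read it: the function name parse_rdf_text is made up, and the language tag check is a deliberately simplified regular expression (an assumption on my part), not full language tag validation.

```python
import re

# Simplified language-tag shape (an assumption for this sketch): a 1-8 letter
# primary subtag followed by optional hyphen-separated alphanumeric subtags.
_LANG_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*\Z")

def parse_rdf_text(val):
    """Map an rdf:text lexical value to a (text, lowercase-tag) pair."""
    i = val.rfind("@")  # position of the LAST @ character
    if i < 0:
        raise ValueError("lexical value must contain at least one '@'")
    text, tag = val[:i], val[i + 1:]
    if tag and not _LANG_TAG.match(tag):
        raise ValueError("invalid language tag: %r" % tag)
    return (text, tag.lower())

print(parse_rdf_text("Family Guy@en"))  # ('Family Guy', 'en')
print(parse_rdf_text("Family Guy@"))    # ('Family Guy', '')
```

Because only the last @ is significant, a string such as "user@example.com@EN-us" splits into the text "user@example.com" and the tag "en-us".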

Wednesday, December 10, 2008 (Permalink)

The W3C has published the candidate recommendation of the CSS Marquee Module Level 3. "When documents (e.g., HTML) are laid out on visual media (e.g., screen or print) and the contents of some element are too large for a given area, CSS allows the designer to specify whether and how the overflow is displayed. One way, available on certain devices, is the “marquee” effect: the content is animated and moves automatically back and forth. This module defines the properties to control that effect."

Tuesday, December 9, 2008 (Permalink)

The Call for Papers for Balisage 2009 has been posted. "We welcome papers about topic maps, document modeling, markup of overlapping structures, ontologies, metadata, content management, and other markup-related topics at Balisage. If you want to talk in detail about XML, XSL, SGML, LMNL, XSL-FO, XTM, RDF, XQuery, Topic Maps, SVG, MathML, OWL, UBL, XSD, TexMECS, RNG, or any other markup-related topic, we urge you to participate in Balisage." Paper submissions are due by April 24. The conference takes place August 11-14.

In addition, the official conference will be preceded by a one-day "International Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth" on the 10th.

Monday, December 8, 2008 (Permalink)

The W3C XML Core Working Group has broken faith with the XML community by publishing an XML 1.0, fifth edition spec that is incompatible with all previous versions. The grammar has changed and previously malformed documents are suddenly well-formed. Existing parsers cannot handle the syntax defined by this edition. XML 1.1 has failed so now the W3C is trying to rewrite history and pretend that this is what they meant all along. (If that were true, why did we waste so much time on XML 1.1?) Apparently stability of standards is no longer a virtue at the W3C. This is even worse than the XML 1.1 debacle. At least there, the W3C admitted they were pushing a new, incompatible version; and gave users a hook to tell which version they were receiving. Now we don't even have that. As if XML weren't already confusing enough for people who don't spend 60 hours a week thinking about this stuff. Now we have to explain that the well-formedness of a document depends on which version of which parser is being used, and which edition of XML the parser implements, and no, there's nothing in the document to tell you which version you should be using.

The ostensible goal of this edition is to improve internationalizability of XML by enabling additional characters that might someday be needed by someone, somewhere to name an element or attribute. (Byzantine Greek musical symbols anyone?) In practice, though, I fear it will do the opposite. The real rules are now far too confusing and far too poorly labeled for any person to follow given the unadvertised version conflicts. The quick and dirty reality is now going to be, "Name everything with Latin-1". If you go beyond that, you're taking your chances that what works with your parser may not work with others'. Great for Western Europeans (except Greeks) and Americans; sucks for everybody else.
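
To see what the new grammar admits, here is a small Python sketch of the fifth edition's name productions as I understand them (NameStartChar and NameChar, written as code point ranges). The function name is mine, and this checks names only; it is not a parser. The fourth edition's name classes, by contrast, contain no characters outside the Basic Multilingual Plane, so a name that passes here can still be rejected by a parser built to the earlier editions.

```python
# NameStartChar ranges from the fifth edition grammar, as code points.
_NAME_START = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Additional characters allowed after the first position (NameChar):
# "-", ".", digits, middle dot, combining marks, and two connectors.
_NAME_EXTRA = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]

def _in_ranges(ch, ranges):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)

def is_fifth_edition_name(s):
    """True if s matches Name ::= NameStartChar (NameChar)* in the 5th edition."""
    if not s or not _in_ranges(s[0], _NAME_START):
        return False
    return all(
        _in_ranges(ch, _NAME_START) or _in_ranges(ch, _NAME_EXTRA)
        for ch in s[1:]
    )

# U+1D000 is BYZANTINE MUSICAL SYMBOL PSILI, in the supplementary planes.
print(is_fifth_edition_name("\U0001D000"))  # True
print(is_fifth_edition_name("1abc"))        # False: a digit cannot start a name
```

Nothing in a document tells the receiving parser which of these rule sets the author had in mind, which is exactly the problem.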

Perhaps the time has come to say that the W3C has outlived its usefulness. Really, has there been any important W3C spec in this millennium that's worth the paper it isn't printed on? The W3C almost killed HTML, and browser vendors have effectively abandoned it. Between schemas and XML 1.0 5th edition, they seem intent on doing the same thing to XML. And don't get me started on the huge amount of effort and brain power being wasted on counting semantic angels on top of a URI-named pin. XSLT 2 and XPath 2 were still-born, and the much more pragmatic XSLT 1.1 was killed. Maybe XQuery, but even that is far more complex and less powerful than it should be due to an excessive number of use cases and a poorly designed schema type system. I think we might all be better off if the W3C had declared victory and closed up shop in 2001.

Thursday, December 4, 2008 (Permalink)

I have uploaded Jaxen 1.1.2, an open source XPath 1.0 engine written in Java that supports multiple object models including DOM, XOM, JDOM, and dom4j. It is also flexible enough to be adapted to XML views of non-XML data structures. For instance, PMD uses it to enable XPath expressions to query compiled Java byte code. Version 1.1.2 is believed to be fully conformant with the XPath 1.0 specification, modulo undiscovered bugs. This release fixes a couple of significant bugs that incorrectly evaluated some XPath expressions. You should upgrade when you get a chance. Jaxen is published under a modified BSD license.

Wednesday, December 3, 2008 (Permalink)

XMLMind has released Qizx/db 2.2, a $600 closed source, embeddable native XML database engine and/or database server written in Java that supports XQuery 1.0. Version 2.2 adds support for XQuery 1.1 "group by" and "for ... window" clauses in FLWOR expressions. The query interpreter part is available under an open source license.

Tuesday, December 2, 2008 (Permalink)

Norm Walsh has posted version 0.9.1 of Calabash, an open source XProc implementation written in Java. Calabash currently passes all the tests in the XProc test suite. Java 5 or later is required. Calabash is published under the GNU General Public License Version 2.0.

Thursday, November 27, 2008 (Permalink)

Sun has released MySQL Server 5.1.30 GA, "the first 5.1 production version of the popular open source database. MySQL 5.1.30 is recommended for use on production systems." Notable new features in this release include table and index partitioning, row-based and mixed replication, a built-in job scheduler, new SQL diagnostic aids and performance utilities, and improved XML handling with XPath support. "ExtractValue() returns the content of a fragment of XML matching a given XPath expression. UpdateXML() replaces the element selected from a fragment of XML by an XPath expression supplied by the user with a second XML fragment (also user-supplied), and returns the modified XML. See Section 11.10, “XML Functions”."

Wednesday, November 26, 2008 (Permalink)

Matt Mullenweg has released WordPress 2.6.5, an open source (GPL) blog engine based on PHP and MySQL. This is yet another security fix. "The security issue is an XSS exploit discovered by Jeremias Reith that fortunately only affects IP-based virtual servers running on Apache 2.x. If you are interested only in the security fix, copy wp-includes/feed.php and wp-includes/version.php from the 2.6.5 release package. 2.6.5 contains three other small fixes in addition to the XSS fix. The first prevents accidentally saving post meta information to a revision. The second prevents XML-RPC from fetching incorrect post types. The third adds some user ID sanitization during bulk delete requests."

Tuesday, November 25, 2008 (Permalink)

The W3C Scalable Vector Graphics Working Group has posted the proposed recommendation of Scalable Vector Graphics (SVG) Tiny 1.2. SVG Tiny is "a language for describing two-dimensional vector and mixed vector/raster graphics in XML. Its goal is to provide the ability to create a whole range of graphical content, from static images to animations to interactive Web applications. SVG 1.2 Tiny is a profile of SVG intended for implementation on a range of devices, from cellphones and PDAs to desktop and laptop computers, and thus includes a subset of the features included in SVG 1.1 Full, along with new features to extend the capabilities of SVG. Further extensions are planned in the form of modules which will be compatible with SVG 1.2 Tiny, and which when combined with this specification, will match and exceed the capabilities of SVG 1.1 Full."

Thursday, November 20, 2008 (Permalink)

The W3C Web API Working Group has published the proposed recommendation of Element Traversal Specification. "This specification defines the ElementTraversal interface, which allows script navigation of the elements of a DOM tree, excluding all other nodes in the DOM, such as text nodes. It also provides an attribute to expose the number of child elements of an element. It is intended to provide a more convenient alternative to existing DOM navigation interfaces, with a low implementation footprint."

ElementTraversal provides some extra properties/methods for navigating only through elements, while ignoring text and white space:

  • firstElementChild
  • lastElementChild
  • previousElementSibling
  • nextElementSibling
  • childElementCount

This makes it easier to process record-like XML, but it is inappropriate for reading documents with mixed content. It may be mildly helpful if it achieves broad adoption in browsers. However, at this point adding more methods to DOM is just putting lipstick on a pig. Until we admit that DOM was a mistake, we can't really begin to address our problems.

Wednesday, November 19, 2008 (Permalink)

The W3C Math Working Group has posted an updated working draft of Mathematical Markup Language (MathML) Version 3.0:

The present draft is an incremental one making public some of the results of Math Working Group work in recent months. The biggest difference this time is in Chapter 4, although there have been smaller ameliorations throughout the specification. A more detailed description of changes from the previous Recommendation follows.

  • With the second Working Draft, much of the non-normative explication that formerly was found in Chapters 1 and 2, and many examples from elsewhere in the previous MathML specifications, were removed from the MathML3 specification and incorporated into a MathML Primer being prepared as a separate document. It is expected this will help the use of this formal MathML3 specification as a reference document in implementations, and offer the new user better help in understanding MathML's deployment. The remaining content of Chapters 1 and 2 is being edited to reflect the changes elsewhere in the document, and in the rapidly evolving Web environment. Some of their text used to go back to early days of the Web and XML, and its explanations are now commonplace.

  • Chapter 3, on presentation-oriented markup, in this draft adds new material on linebreaking and on markup for elementary math notations. Material introduced in the last draft revising the mpadded and maction elements has been further revised as a result of active discussion. It is possible it may undergo further modification. In addition, the layout of schemata such as that for long division and its associated mcolumn element have been carefully revised. Earlier work, as recorded in the W3C Note Arabic mathematical notation, has allowed clarification of the relationship with bidirectional text and examples with RTL text have been added.

  • Chapter 4, on content-oriented markup, contains major changes and additions in this Working Draft. The meaning of the actual content remains as before in principle, but a lot of work has been done on expressing it better. The text of this chapter is generated by filtered extraction from XML Content Dictionaries written in accordance with OpenMath. The details of the Content Dictionary format have been further specified and the generation procedure improved. It is expected that the Content Dictionaries will become a separate joint publication of the W3C and OpenMath referenced by the MathML3 specification. The Content Dictionaries are now publicly available in draft and much work has already been done on refining them. Their format is given in Chapter 8.

  • Chapter 5 is being refined as its purpose has been further clarified. This chapter deals with interrelations of parts of the MathML specification, especially with presentation and content markup.

  • Chapter 6 has been rewritten and reorganized to reflect the new situation in regard to Unicode, and the changed W3C context with regard to named character entities. The new W3C specification of Entity Definitions for Characters in XML, which incorporates those used for mathematics is becoming a public working draft [Entities]. It is expected that some new ancillary tables will be provided that reflect requests the Math WG has received.

  • Chapter 7 has been restored with a new and clearer purpose. This chapter looks outward to the larger world in which MathML must function.

  • Chapter 8 will specify the format of MathML3 Content Dictionaries, as previously handled more briefly in sections 4.5 and 4.6. The DOM for MathML, previously in a chapter at this point, is being prepared as a separate specification.

  • The Appendices, of which there are eight shown, have not been fully reworked. Eventually what amount to revisions of the present appendices A, F, G, H, I and J are all that are expected to remain. Appendix A now contains the new RelaxNG schema for MathML3 as well as discussion of MathML3 DTD issues.

Monday, November 17, 2008 (Permalink)

The W3C Web Content Accessibility Guidelines (WCAG) Working Group has published the Proposed Recommendation of the Web Content Accessibility Guidelines 2.0 and updated Working Drafts of Understanding WCAG 2.0, Techniques for WCAG 2.0, and How to Meet WCAG 2.0. "Web Content Accessibility Guidelines (WCAG) 2.0 covers a wide range of recommendations for making Web content more accessible. Following these guidelines will make content accessible to a wider range of people with disabilities, including blindness and low vision, deafness and hearing loss, learning disabilities, cognitive limitations, limited movement, speech difficulties, photosensitivity and combinations of these. Following these guidelines will also often make your Web content more usable to users in general. WCAG 2.0 success criteria are written as testable statements that are not technology-specific." Comments are due by December 2, 2008.

Saturday, November 15, 2008 (Permalink)

Apple has released Safari 3.2 for Mac and Windows to close various security holes. Interestingly one of the bugs comes from libxslt. "A heap buffer overflow issue exists in the libxslt library. Viewing a maliciously crafted HTML page may lead to an unexpected application termination or arbitrary code execution. Further information on the patch applied is available via http://xmlsoft.org/XSLT/ This issue does not affect Mac OS X systems that have applied Security Update 2008-007. Credit to Anthony de Almeida Lopes of Outpost24 AB, and Chris Evans of the Google Security Team for reporting this issue." All users should upgrade. Software Update doesn't seem to find it. You'll need to download and install it manually.


The Mozilla Project has released Firefox 3.0.4 and 2.0.0.18 and SeaMonkey 1.1.13 to fix various security bugs. All users should upgrade.

Friday, November 14, 2008 (Permalink)

The Call for Papers is now open for ApacheCon US 2009, taking place November 2-6 in Oakland, California. Proposals are due by February 28, 2009. Hmm, Europe gets to hold their conference in Amsterdam and the U.S. gets Oakland? Who decided this? I think I want a recount. How about New Orleans for 2010?

Wednesday, November 12, 2008 (Permalink)

Monkfish Software has released XMLBlueprint 6.2, a $79 payware XML Editor for Windows that features context-sensitive XML completion, schema validation (DTD, XSD, and Relax NG), XSLT, and XPath.

Tuesday, November 11, 2008 (Permalink)

The W3C XML Core Working Group has published a note on Legacy extended IRIs for XML resource identification. "For historic reasons, some formats have allowed variants of IRIs that are somewhat less restricted in syntax, for example XML system identifiers and W3C XML Schema anyURIs. This document provides a definition and a name (Legacy Extended IRI or LEIRI) for these variants for easy reference. These variants have to be used with care; they require further processing before being fully interchangeable as IRIs. New protocols and formats should not use Legacy Extended IRIs." Characters allowed in LEIRIs but not IRIs include:

  • Space (U+0020)
  • Delimiters "<" (U+003C), ">" (U+003E) and '"' (U+0022)
  • Unwise characters "\" (U+005C), "^" (U+005E), "`" (U+0060), "{" (U+007B), "|" (U+007C) and "}" (U+007D)
  • The controls (C0 controls, DEL and C1 controls, U+0000 - U+001F, U+007F - U+009F)
  • Bidi formatting characters (U+200E, U+200F, U+202A-202E)
  • Specials (U+FFF0-FFFD)
  • Private use code points (U+E000-F8FF, U+F0000-FFFFD, U+100000-10FFFD)
  • Tags (U+E0000-E0FFF)
  • Non-characters (U+FDD0-FDEF, U+1FFFE-1FFFF, U+2FFFE-2FFFF, U+3FFFE-3FFFF, U+4FFFE-4FFFF, U+5FFFE-5FFFF, U+6FFFE-6FFFF, U+7FFFE-7FFFF, U+8FFFE-8FFFF, U+9FFFE-9FFFF, U+AFFFE-AFFFF, U+BFFFE-BFFFF, U+CFFFE-CFFFF, U+DFFFE-DFFFF, U+EFFFE-EFFFF, U+FFFFE-FFFFF, U+10FFFE-10FFFF)
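
The "further processing" the note mentions amounts, as I understand it, to percent-encoding each of these extra characters: encode the character as UTF-8 and write each byte as %HH. The following Python sketch does that for the character classes listed above. The function name leiri_to_iri and the sample strings are hypothetical, and the range table is transcribed from this list rather than checked against the note itself.

```python
# Code point ranges allowed in LEIRIs but not in IRIs, per the list above.
_LEIRI_ONLY = [
    (0x20, 0x20),                                        # space
    (0x3C, 0x3C), (0x3E, 0x3E), (0x22, 0x22),            # < > "
    (0x5C, 0x5C), (0x5E, 0x5E), (0x60, 0x60),            # \ ^ `
    (0x7B, 0x7D),                                        # { | }
    (0x00, 0x1F), (0x7F, 0x9F),                          # controls
    (0x200E, 0x200F), (0x202A, 0x202E),                  # bidi formatting
    (0xFFF0, 0xFFFD),                                    # specials
    (0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD),  # private use
    (0xE0000, 0xE0FFF),                                  # tags
    (0xFDD0, 0xFDEF),                                    # non-characters
]
# The last two code points of planes 1 through 16 are also non-characters.
_LEIRI_ONLY += [((p << 16) | 0xFFFE, (p << 16) | 0xFFFF) for p in range(1, 17)]

def _needs_escape(ch):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _LEIRI_ONLY)

def leiri_to_iri(leiri):
    """Percent-encode the UTF-8 bytes of every LEIRI-only character."""
    out = []
    for ch in leiri:
        if _needs_escape(ch):
            out.extend("%%%02X" % b for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)

print(leiri_to_iri("my file<1>.xml"))  # my%20file%3C1%3E.xml
```

Characters that are already legal in an IRI, including non-ASCII letters, pass through untouched.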

Monday, November 10, 2008 (Permalink)

Clinical & Biomedical Computing Ltd. has released XQSharp 0.9, a free-as-in-beer XQuery processor for the .NET platform. It includes a command line processor that can run ad-hoc queries against multiple files. According to Oliver Hallam, "Its API has been modelled closely on the System.XML.XPath classes and operates on the IXPathNavigable interface so fits in well with the rest of the framework classes. We aim to be releasing the API for free in the next release of the product. In the longer term we aim to release a Microsoft Visual Studio plugin allowing convenient editing and running of both ad-hoc and repeated-use queries."

Thursday, November 6, 2008 (Permalink)

IBM's developerWorks has published my latest article, Detect XML document encodings with SAX and XNI. Sometimes when you forward XML documents, you just want to copy the bytes from point A to point B. You don't necessarily want to parse the entire thing, but you do need to determine the character encoding to set the metadata appropriately. In these cases, streaming APIs such as SAX and XNI offer a fast and efficient way to inspect the encoding without paying for full parsing. This article shows you how.
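
The article's code is Java (SAX and XNI). As a rough analogue only, here is a Python sketch of the same idea using the standard library's expat binding: register a callback for the XML declaration, then abandon the parse as soon as the root element's start-tag appears. The function name declared_encoding and the _Stop exception are my own; note that this reports only the encoding named in the declaration, not one inferred from a byte order mark.

```python
import xml.parsers.expat

class _Stop(Exception):
    """Raised to abandon parsing once the prolog has been read."""

def declared_encoding(data):
    """Return the encoding named in the XML declaration, or None if absent."""
    found = []

    def on_xml_decl(version, encoding, standalone):
        # encoding is None when the declaration omits the pseudo-attribute
        found.append(encoding)

    def on_start_element(name, attrs):
        # The declaration, if any, precedes the root element, so stop here
        # instead of paying to parse the rest of the document.
        raise _Stop()

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.StartElementHandler = on_start_element
    try:
        parser.Parse(data, True)
    except _Stop:
        pass
    return found[0] if found else None

doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><root><child/></root>'
print(declared_encoding(doc))
```

When the function returns None, the usual XML defaults apply and the caller would still need to look at the first bytes to distinguish UTF-8 from UTF-16.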

Wednesday, November 5, 2008 (Permalink)

The Business Process Technology research group at the Hasso Plattner Institute of IT Systems Engineering at the University of Potsdam has released the Oryx XForms Editor, a browser-hosted XForms authoring application. You can visually edit forms, export and import them in XForms format. The editor runs in your web browser (Firefox), zero software installation required. The Oryx XForms Editor is published under an MIT license.

Monday, November 3, 2008 (Permalink)

The OpenOffice Project has released OpenOffice 2.4.2, an open source office suite for Linux, Solaris, and Windows that saves all its files as zipped XML. 2.4.2 is a bug fix release.

Thursday, October 30, 2008 (Permalink)

XimpleWare has released VTD-XML 2.4, a free-as-in-speech (GPL) non-extractive Java/C/C# library for processing XML that supports XPath. This appears to be an example of what Sam Wilmot calls "in situ parsing". In other words, rather than creating objects representing the content of an XML document, VTD-XML just passes pointers into the actual, real XML. (These are the abstract pointers of your data structures textbook, not C-style addresses in memory. In this case the pointers are int indexes into the file.) You don't even need to hold the document in memory. It can remain on disk. This should improve speed and memory usage, but I haven't verified that, and I don't trust their own benchmarks. Version 2.4 supports memory mapped files and files up to 256 gigabytes. However, it's still not a minimally conformant XML parser, and doesn't seem likely to become one. In particular, it only supports the five predefined entities, not others that may be declared in the internal DTD subset.
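
To make the "indexes instead of objects" idea concrete, here is a toy sketch of non-extractive tokenizing. It is not VTD-XML's record format or API; the regular expression and the function name `index_start_tags` are assumptions for illustration, and the scan deliberately ignores comments, CDATA sections, and non-ASCII names.

```python
import re

# Matches "<" followed by an ASCII element name; end tags ("</...") do not match.
START_TAG_NAME = re.compile(rb"<([A-Za-z_][\w.\-]*)")


def index_start_tags(buf):
    """Return (offset, length) pairs for each start-tag name in buf."""
    return [(m.start(1), m.end(1) - m.start(1))
            for m in START_TAG_NAME.finditer(buf)]


doc = b"<order><item>tea</item><item>milk</item></order>"
tokens = index_start_tags(doc)

# A name is only materialized when somebody actually asks for it.
view = memoryview(doc)
names = [bytes(view[offset:offset + length]) for offset, length in tokens]
print(tokens, names)
```

The index is just a list of integer pairs, so it stays small even when the buffer is a memory mapped file rather than an in-memory string.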

Wednesday, October 29, 2008 (Permalink)

The W3C Synchronized Multimedia Working Group has posted the proposed recommendation of the Synchronized Multimedia Integration Language 3.0 (SMIL 3.0). SMIL 3.0 has four goals:

  • Define an XML-based language that allows authors to write interactive multimedia presentations. Using SMIL, an author can describe the temporal behaviour of a multimedia presentation, associate hyperlinks with media objects and describe the layout of the presentation on a screen.
  • Allow reusing of SMIL syntax and semantics in other XML-based languages, in particular those who need to represent timing and synchronization. For example, SMIL components are used for integrating timing into XHTML [XHTML10] and into SVG [SVG].
  • Extend the functionalities contained in the SMIL 2.1 [SMIL21] into new or revised SMIL 3.0 modules.
  • Define new SMIL 3.0 Profiles incorporating features useful within the industry.
Tuesday, October 28, 2008 (Permalink)

James Clark has posted updated versions of Jing and Trang. Jing is a Java application that validates XML documents against a RELAX NG schema. Trang is a Java application that converts different XML schema languages to and from RELAX NG. Both are published under a BSD license.

Monday, October 27, 2008 (Permalink)

Codalogic has released a free-as-in-beer version of its LMX C++ XML data binding tool for Windows and Linux.

The Codalogic LMX tool generates application specific C++ code that will read and write XML instances to and from C++ objects. The tool uses a W3C XML schema to define the C++ code that is generated. The generated C++ code consists of C++ classes that mirror the structure specified by the XML schema and the code ensures that the read and written XML instances conform to the constraints specified by the XML schema.

The Codalogic LMX code generator comes in a number of editions including Professional and Standard. The free edition, called the Express Edition, is a stripped down version of the Standard edition, and as such has a number of configuration options disabled.
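
For readers who haven't met XML data binding before, the general pattern looks roughly like the sketch below. This is not LMX output (LMX generates C++); it is a hand-written Python stand-in, and the element names `item` and `quantity`, the `sku` attribute, and the "at least 1" rule are all hypothetical.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Item:
    """Mirrors a hypothetical <item sku="..."><quantity>n</quantity></item>."""
    sku: str
    quantity: int

    @classmethod
    def from_xml(cls, element):
        quantity = int(element.findtext("quantity"))
        if quantity < 1:
            # Stand-in for a schema facet such as minInclusive="1".
            raise ValueError("quantity must be at least 1")
        return cls(sku=element.get("sku"), quantity=quantity)

    def to_xml(self):
        element = ET.Element("item", {"sku": self.sku})
        ET.SubElement(element, "quantity").text = str(self.quantity)
        return element


item = Item.from_xml(ET.fromstring('<item sku="A1"><quantity>3</quantity></item>'))
print(item, ET.tostring(item.to_xml()))
```

A binding generator writes classes like this from the schema, so application code works with typed fields instead of walking a generic tree.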

Friday, October 24, 2008 (Permalink)

The W3C Voice Browser Activity has published the finished recommendation of Pronunciation Lexicon Specification (PLS) Version 1.0. This is an XML syntax for specifying pronunciation lexicons for Automatic Speech Recognition and Speech Synthesis engines in voice browser applications:

The language allows one or more pronunciations for a word or phrase to be specified using a standard pronunciation alphabet or if necessary using vendor specific alphabets. Pronunciations are grouped together into a PLS document which may be referenced from other markup languages, such as the Speech Recognition Grammar Specification [SRGS] and the Speech Synthesis Markup Language [SSML].

In its most general sense, a lexicon is merely a list of words or phrases, possibly containing information associated with and related to the items in the list. This document uses the term "lexicon" in only one specific way, as "pronunciation lexicon". In this particular document, "lexicon" means a mapping between words (or short phrases), their written representations, and their pronunciations suitable for use by an ASR engine or a TTS engine. Pronunciation lexicons are not only useful for voice browsers; they have also proven effective mechanisms to support accessibility for persons with disabilities as well as greater usability for all users. They are used to good effect in screen readers and user agents supporting multimodal interfaces.
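
A small sketch of what such a mapping looks like once loaded may help. The lexicon below is invented for this example (the transcriptions are illustrative, not taken from the specification), and `load_lexicon` is a hypothetical helper; only the `lexicon`, `lexeme`, `grapheme`, and `phoneme` element names and the namespace URI come from PLS.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Invented sample data, for illustration only.
LEXICON = """<lexicon version="1.0" alphabet="ipa" xml:lang="en-US"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>təˈmeɪtoʊ</phoneme>
    <phoneme>təˈmɑːtəʊ</phoneme>
  </lexeme>
</lexicon>
"""


def load_lexicon(xml_text):
    """Map each written form to its list of pronunciations."""
    ns = {"pls": PLS_NS}
    lexicon = {}
    for lexeme in ET.fromstring(xml_text).findall("pls:lexeme", ns):
        phonemes = [p.text for p in lexeme.findall("pls:phoneme", ns)]
        for grapheme in lexeme.findall("pls:grapheme", ns):
            lexicon.setdefault(grapheme.text, []).extend(phonemes)
    return lexicon


print(load_lexicon(LEXICON))
```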

Thursday, October 23, 2008 (Permalink)

SyncroSoft has released <Oxygen/> 10.0, a $366 payware XML editor written in Java. Oxygen supports XML, XSL, DTDs, XQuery, SVG, Relax NG, Schematron, and the W3C XML Schema Language. This feels like a fairly minor update with lots of small improvements rather than big new features. (That happens to most products after a few years if they're any good.) According to the announcement:

One of the most important additions in Oxygen XML Editor and Author version 10 is the bundling of the schema-aware XSLT 2.0 and XQuery processor from Saxonica. Saxon-SA is now available in all Oxygen editions at no additional cost.

Version 10 comes with a large number of improvements including a powerful new XML instance generator, better content completion offering proposals from included or imported XML Schema or XSLT modules, a better integration of the Intel(R) XML Software Suite and updates the document frameworks and XML, XML Schema, XSLT, XPath and FOP processors.

The SVN support was also updated to include Subversion 1.5 features.

I could have sworn Saxon-SA was already bundled in version 9, but maybe I'm wrong about that. I rarely use the SA features anyway.

Wednesday, October 22, 2008 (Permalink)

The W3C XHTML2 Working Group has released XHTML Modularization 1.1. "This modularization provides a means for subsetting and extending XHTML, a feature needed for extending XHTML's reach onto emerging platforms. This specification is intended for use by language designers as they construct new XHTML Family Markup Languages. This specification does not define the semantics of elements and attributes, only how those elements and attributes are assembled into modules, and from those modules into markup languages. This second version of this specification includes several minor updates to provide clarifications and address errors found in the first version. It also provides an implementation using XML Schemas."

Tuesday, October 21, 2008 (Permalink)

Norm Walsh has posted version 0.8.2 of Calabash, an open source XProc implementation written in Java. Calabash currently passes all the tests in the XProc test suite. Java 5 or later is required. Calabash is published under the GNU General Public License Version 2.0.

Friday, October 17, 2008 (Permalink)

The W3C Semantic Web Best Practices and Deployment Working Group and HTML Working Groups have published the finished recommendation of RDFa in XHTML: Syntax and Processing.

The modern Web is made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience: an event on a web page can be directly imported into a user's desktop calendar; a license on a document can be detected so that users can be informed of their rights automatically; a photo's creator, camera setting information, resolution, location and topic can be published as easily as the original photo itself, enabling structured search and sharing.

RDFa is a specification for attributes to express structured data in any markup language. This document specifies how to use RDFa with XHTML. The rendered, hypertext data of XHTML is reused by the RDFa markup, so that publishers don't need to repeat significant data in the document content. The underlying abstract representation is RDF [RDF-PRIMER], which lets publishers build their own vocabulary, extend others, and evolve their vocabulary with maximal interoperability over time. The expressed structure is closely tied to the data, so that rendered data can be copied and pasted along with its relevant structure.

The rules for interpreting the data are generic, so that there is no need for different rules for different formats; this allows authors and publishers of data to define their own formats without having to update software, register formats via a central authority, or worry that two formats may interfere with each other.

RDFa shares some use cases with microformats [MICROFORMATS]. Whereas microformats specify both a syntax for embedding structured data into HTML documents and a vocabulary of specific terms for each microformat, RDFa specifies only a syntax and relies on independent specification of terms (often called vocabularies or taxonomies) by others. RDFa allows terms from multiple independently-developed vocabularies to be freely intermixed and is designed such that the language can be parsed without knowledge of the specific term vocabulary being used.

This document is a detailed syntax specification for RDFa, aimed at:

  • those looking to create an RDFa parser, and who therefore need a detailed description of the parsing rules;
  • those looking to recommend the use of RDFa within their organisation, and who would like to create some guidelines for their users;
  • anyone familiar with RDF, and who wants to understand more about what is happening 'under the hood', when an RDFa parser runs.

For those looking for an introduction to the use of RDFa and some real-world examples, please consult the RDFa Primer.

Here's a syntax example from the draft:

<html
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:cal="http://www.w3.org/2002/12/cal/ical#"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  >
  <head><title>Jo's Friends and Family Blog</title></head>
  <body>
    <p>
      I'm holding
      <span property="cal:summary">
        one last summer Barbecue
      </span>,
      on
      <span property="cal:dtstart" content="20070916T1600-0500"
            datatype="xsd:datetime">
        September 16th at 4pm
      </span>.
    </p>
  </body>
</html>

You'll notice that RDFa manages to avoid using namespace prefixes in attribute names (where they work) but does use them inside attribute values (where they don't). I can't get too worked up over this, though. It's not like anyone is ever going to pay any attention to it anyway. I confidently predict that RDFa will be every bit as successful as RDF itself (which is to say, not at all). RDF has been to informaticians of the 21st century what hot fusion was to physicists of the 20th: a fun way to waste a career on a technology doomed to failure. At least the informaticians won't blow hundreds of millions of research dollars while they discover this.
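
To see why prefixes inside attribute values are awkward: an XML parser resolves prefixes on element and attribute names for you, but a value such as cal:summary is just a string, so the application has to track the namespace declarations itself. The sketch below does that with ElementTree's `iterparse`. It is a simplification, not a conforming RDFa processor: it ignores prefix scoping, and the function name `expanded_properties` and the trimmed-down document are assumptions for this example.

```python
import io
import xml.etree.ElementTree as ET

DOC = b"""<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:cal="http://www.w3.org/2002/12/cal/ical#">
  <body>
    <p>
      <span property="cal:summary">one last summer Barbecue</span>
      <span property="cal:dtstart" content="20070916T1600-0500">
        September 16th at 4pm
      </span>
    </p>
  </body>
</html>"""


def expanded_properties(xml_bytes):
    """Expand prefix:name values of property attributes into full URIs."""
    prefixes = {}
    expanded = []
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns", "start"))
    for event, payload in events:
        if event == "start-ns":
            # payload is a (prefix, uri) pair reported before the element starts.
            prefix, uri = payload
            prefixes[prefix] = uri
        else:
            value = payload.get("property")
            if value and ":" in value:
                prefix, local_name = value.split(":", 1)
                expanded.append(prefixes[prefix] + local_name)
    return expanded


print(expanded_properties(DOC))
```

Any tool that discards the prefix declarations after parsing (and many do) can no longer interpret these values at all.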

Thursday, October 16, 2008 (Permalink)

The Mozilla Project has posted the first beta of Firefox 3.1 for Mac, Linux, and Windows. Version 3.1 is built on Gecko 1.9.1, improves Web standards support, adds a Text API to the Canvas Element, supports border images and JavaScript query selectors, and claims to improve the URL bar (though they've been missing a lot of low-hanging fruit there for years, so I'm skeptical.) This beta adds:

  • Web standards improvements in the Gecko layout engine
  • More CSS 2.1 and CSS 3 properties
  • A new tab-switching shortcut that shows previews of the tab you’re switching to
  • Improved control over the Smart Location Bar using special characters to restrict your search
  • Support for new web technologies such as the <video> and <audio> elements, the W3C Geolocation API, JavaScript query selectors, web worker threads, SVG transforms and offline applications.

Windows 2000 or later and Mac OS X 10.3.9 or later or Linux are required. Windows 98 and earlier and Mac OS X 10.2 and earlier are no longer supported. Final release is not expected until next year.

Tuesday, October 14, 2008 (Permalink)

Mulberry Technologies has announced the second annual Balisage: The Markup Conference, to take place in Montreal August 11-14, 2009. "Balisage: The Markup Conference is designed for markup theoreticians and practitioners who are pushing the boundaries of the field. It's all about the markup: how to create it; what it means; hierarchies and overlap; modeling; taxonomies; transformation; query, searching, and retrieval; and performance. In short, it's a technical XML Conference; it's an XSL Conference; it's a conference about XQuery, RDF, XSD, SGML, LMNL, XSL-FO, XTM, micro-formats, SVG, MathML, OWL, TexMECS, RNG, UBL, and a lot more. Balisage welcomes papers about topic maps, document modeling, archival markup, ontologies and vocabulary development, metadata, content management, versioning, and other markup-related topics." This is the successor to the popular and fun Extreme Markup Languages Conference: same organizers (good), same hotel (bad), but no longer under the auspices of the GCA.

Monday, October 13, 2008 (Permalink)

The OpenOffice Project has released OpenOffice 3.0, an open source office suite for Linux, Solaris, and Windows that saves all its files as zipped XML and uses XForms.

The Writer word processor has a cool new slider control for zooming, allows multi-page display while editing, has powerful new multilingual support, and boasts improved notes capabilities. As well as conventional office documents, Writer can now edit wiki documents for the web.

The Calc spreadsheet has been given another increase in capacity - now up to 1024 columns per sheet. It also has a powerful new equation solver, and a great new collaboration feature for multiple users.

Draw can now cope with poster-size graphics (up to 3sq metres), and Impress supports multiple monitors for presentations. Chart now produces much more clean looking graphics by default, and has a range of additional features requested by power users.

The popular built-in PDF export facility has been further enhanced with PDF/A support and a range of new user-selectable options.

OpenOffice.org 3 is now also available for the first time as a full Mac OS X application, bringing the power of the world's leading open-source office suite to a whole new group of users. And it's even easier than ever to persuade MS-Office users to upgrade to OpenOffice.org, with new support for MS-Access 2007 'accdb' files, improved support for VBA macros, and a new ability to read MS-Office Open XML files (Microsoft Office 2007 and Office 2008 documents)

OpenOffice.org's support for extensions is really coming of age with OpenOffice.org 3. A rapidly expanding number of additional features are available from different developers to add great features such as an Impress presenter console, support for business analytics, PDF import, and a whole new way of supporting additional languages.

Version 3.0 seems to do a pretty decent job of handling simple business documents and heavily formatted Microsoft Word templates and forms and whatnot. However, it still fails as a professional writer's tool relative to Word. There's no plausible outliner, and the lack of a normal view is a deal breaker. I deinstalled Microsoft Office from my system a few months ago, but if I were writing another book I'd have to reinstall it.

Friday, October 10, 2008 (Permalink)

Version 1.5.2 of Chiba Core, an open source XForms library for Java, has been released. This release "adds the new Xforms 1.1 schema which is now configurable. Therefore the nil value is now valid for a bunch of Schema types like data, double and the like. Localisation has been improved and now falls back to US locale as default if incoming value cannot be parsed. A new appender has been added to logging."

Thursday, October 9, 2008 (Permalink)

Opera Software has released version 9.6 of their namesake free-beer web browser for Windows, Mac, and Linux. "Opera 9.6 is a recommended security and stability upgrade." In addition, feed reading has been improved with feed previews and magazine layout.

Wednesday, October 8, 2008 (Permalink)

Daniel Parker has released ServingXML 0.94, an open source Java library for flat/XML data transformations. "It defines an extensible markup vocabulary for expressing flat-XML, XML-flat, flat-flat, and XML-XML processing in pipelines. ServingXML currently has an active user base with applications producing and consuming EDI files, FpML files, and many others." ServingXML is published under the Apache 2.0 license.

Tuesday, October 7, 2008 (Permalink)

The W3C Web API Working Group has posted the second public working draft of XMLHttpRequest Level 2. "XMLHttpRequest Level 2 enhances XMLHttpRequest with new features, such as cross-site requests, progress events, and the handling of byte streams for both sending and receiving."

Monday, October 6, 2008 (Permalink)

The W3C EXI Working Group has published the last call working draft of Efficient XML Interchange. Nothing significant has changed. It's still binary goop.

Friday, October 3, 2008 (Permalink)

The Mozilla Project has released Firefox 3.0.3 to fix a bug that prevented the retrieval of saved passwords.

Wednesday, October 1, 2008 (Permalink)

Code Synthesis has released XSD 3.2.0, a free-as-in-speech (GPL) W3C XML Schema to C++ data binding library. New features in this release include:

  • Support for locating object model nodes with XPath queries.
  • Automatic assignment of namespace prefixes during serialization.
  • Polymorphism-aware object model comparison and printing.
  • Generation of non-copying constructors.
  • Support for the fractionDigits/totalDigits facets during serialization.
  • Generation of the XML Schema namespace into a separate header file.
  • Reduced usage of virtual inheritance which results in a much smaller object code size and faster C++ compilation.
Tuesday, September 30, 2008 (Permalink)

The Apache XML Project has released Xerces-C++ 3.0.0, an open source schema validating XML parser written in reasonably cross-platform C++. Version 3.0 improves support for XInclude, XPath 2 and 64-bit code.

Friday, September 26, 2008 (Permalink)

Several updates from The Mozilla Project today. First up is an important security upgrade to Firefox 3.0.2. This release fixes security issues and other bugs.

The Mozilla Project has also released Firefox 2.0.0.17 that fixes the same security issues for folks like me who are running behind the curve.

Finally the Mac native Camino has been upgraded to Camino 1.6.4, with these and other fixes.

Thursday, September 25, 2008 (Permalink)

Bare Bones Software has released version 9.0.1 of BBEdit, my preferred text editor on the Mac, my favorite XML editor on any platform, what I'm using to type these very words. This is a bug fix release. New copies cost $125. If you bought 8.7 or later this year, upgrades are free. Mac OS X 10.4 or later is required.

Wednesday, September 24, 2008 (Permalink)

The W3C Web Application Formats Working Group has posted a second last call working draft of Widgets 1.0 Requirements. "A widget is an interactive single purpose application for displaying and/or updating local data or data on the Web, packaged in a way to allow a single download and installation on a user's machine or mobile device. A widget may run as a stand-alone application (meaning it can run outside of a Web browser), and it is envisioned that the kind of widgets being standardized by this effort will one day be embedded into Web documents. In this document, the runtime environment in which a widget is run is referred to as a widget user agent. Note that running widgets may be the specific purpose of a widget user agent, or it may be a mode of operation of a more genetic user agent (eg. a Web browser). A widget running on a widget user agent is referred to as an instantiated widget. Prior to instantiation, a widget exists as a widget resource."

Monday, September 22, 2008 (Permalink)

The W3C Voice Browser, Web APIs, and Web Application Formats (WAF) Working Groups have posted a new working draft of Access Control for Cross-site Requests. According to the draft, "This document defines a mechanism to enable client-side cross-site requests. Specifications that want to enable cross-site requests in an API they define can use the algorithms defined by this specification. If such an API is used on http://example.org resources, a resource on http://hello-world.example can opt in using the mechanism described by this specification (e.g., specifying Access-Control-Allow-Origin: http://example.org as response header), which would allow that resource to be fetched cross-site from http://example.org."
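
On the server side, the opt-in described in the draft boils down to echoing an acceptable origin back in a response header. The sketch below shows only that decision; the allow list and the function name `access_control_headers` are hypothetical, and it does not attempt the rest of the draft's algorithm.

```python
# Hypothetical allow list: origins permitted to fetch this resource cross-site.
ALLOWED_ORIGINS = {"http://example.org"}


def access_control_headers(request_origin):
    """Return the extra response headers for a request from request_origin."""
    if request_origin in ALLOWED_ORIGINS:
        return {"Access-Control-Allow-Origin": request_origin}
    # No header means no opt-in, so the browser refuses to share the response.
    return {}


print(access_control_headers("http://example.org"))
print(access_control_headers("http://evil.example"))
```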

Friday, September 19, 2008 (Permalink)

The W3C Scalable Vector Graphics Working Group has posted a new last call working draft of Scalable Vector Graphics (SVG) Tiny 1.2. SVG Tiny is "a language for describing two-dimensional vector and mixed vector/raster graphics in XML. Its goal is to provide the ability to create a whole range of graphical content, from static images to animations to interactive Web applications. SVG Tiny 1.2 is a profile of SVG intended for implementation on a range of devices, from cellphones and PDAs to desktop and laptop computers."

Wednesday, September 17, 2008 (Permalink)

The OpenOffice Project has posted the first release candidate of OpenOffice 3.0, an open source office suite for Linux, Solaris, and Windows that saves all its files as zipped XML. Version 3.0 is the first release to support Mac OS X.


Planamesa Software has released NeoOffice/J 2.2.5, a Mac port of OpenOffice 2.1 using a Java-based GUI. Changes in this release appear minor.

Tuesday, September 16, 2008 (Permalink)

CodeWeavers has released Crossover Chromium, a port of the open source Google Chrome browser to Mac OS X and Linux. The product is a bit of a dancing bear. On the Mac, it feels and looks sort of funny, though I can't always put my finger on why. I do notice the lack of a "File/New..." menu item. I do like the smart bar. It's certainly way more usable than a lot of products. Hopefully, the official Google port will clear up the strange inconsistencies. Mac OS X 10.4 or later is required on the Mac.

Monday, September 15, 2008 (Permalink)

The W3C Semantic Web Best Practices and Deployment Working Group and HTML Working Groups have published the proposed recommendation of RDFa in XHTML: Syntax and Processing.

The modern Web is made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience: an event on a web page can be directly imported into a user's desktop calendar; a license on a document can be detected so that users can be informed of their rights automatically; a photo's creator, camera setting information, resolution, location and topic can be published as easily as the original photo itself, enabling structured search and sharing.

RDFa is a specification for attributes to express structured data in any markup language. This document specifies how to use RDFa with XHTML. The rendered, hypertext data of XHTML is reused by the RDFa markup, so that publishers don't need to repeat significant data in the document content. The underlying abstract representation is RDF [RDF-PRIMER], which lets publishers build their own vocabulary, extend others, and evolve their vocabulary with maximal interoperability over time. The expressed structure is closely tied to the data, so that rendered data can be copied and pasted along with its relevant structure.

The rules for interpreting the data are generic, so that there is no need for different rules for different formats; this allows authors and publishers of data to define their own formats without having to update software, register formats via a central authority, or worry that two formats may interfere with each other.

RDFa shares some use cases with microformats [MICROFORMATS]. Whereas microformats specify both a syntax for embedding structured data into HTML documents and a vocabulary of specific terms for each microformat, RDFa specifies only a syntax and relies on independent specification of terms (often called vocabularies or taxonomies) by others. RDFa allows terms from multiple independently-developed vocabularies to be freely intermixed and is designed such that the language can be parsed without knowledge of the specific term vocabulary being used.

This document is a detailed syntax specification for RDFa, aimed at:

  • those looking to create an RDFa parser, and who therefore need a detailed description of the parsing rules;
  • those looking to recommend the use of RDFa within their organisation, and who would like to create some guidelines for their users;
  • anyone familiar with RDF, and who wants to understand more about what is happening 'under the hood', when an RDFa parser runs.

For those looking for an introduction to the use of RDFa and some real-world examples, please consult the RDFa Primer.

Here's a syntax example from the draft:

<html
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:cal="http://www.w3.org/2002/12/cal/ical#"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  >
  <head><title>Jo's Friends and Family Blog</title></head>
  <body>
    <p>
      I'm holding
      <span property="cal:summary">
        one last summer Barbecue
      </span>,
      on
      <span property="cal:dtstart" content="20070916T1600-0500"
            datatype="xsd:datetime">
        September 16th at 4pm
      </span>.
    </p>
  </body>
</html>

The thing that jumps out at me is the use of namespace prefixes in attribute values. Haven't we learned by now that this is a bad idea?

The working group does not yet seem to have convinced the HTML 5 working group that this is a good idea, and I'm inclined to agree. While I might approve of putting in generic hooks for various metadata systems, I can't see why we should preference RDFa over simpler schemes that may actually work.

Friday, September 12, 2008 (Permalink)

The SIP for Instant Messaging and Presence Leveraging Extensions Working Group has submitted RFC 5261, An Extensible Markup Language (XML) Patch Operations Framework Utilizing XML Path Language (XPath) Selectors, to the IETF. "Extensible Markup Language (XML) documents are widely used as containers for the exchange and storage of arbitrary data in today's systems. In order to send changes to an XML document, an entire copy of the new version must be sent, unless there is a means of indicating only the portions that have changed. This document describes an XML patch framework utilizing XML Path language (XPath) selectors. These selector values and updated new data content constitute the basis of patch operations described in this document. In addition to them, with basic <add>, <replace>, and <remove> directives a set of patches can then be applied to update an existing XML document."
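
The basic shape of such a patch can be sketched in a few lines. This is a deliberately stripped-down toy, not an implementation of RFC 5261: it handles element targets only, ignores namespaces, whitespace, and positioning, and relies on the small XPath subset that Python's ElementTree supports. The `<diff>` wrapper, the sample documents, and the function name `apply_patch` are assumptions for this example; the `sel` attribute and the three directive names follow the RFC.

```python
import copy
import xml.etree.ElementTree as ET


def apply_patch(doc_root, patch_root):
    """Apply <add>, <replace>, and <remove> children of patch_root in order."""
    # Selectors start at the document node, so wrap the root element in a
    # throwaway holder that stands in for it.
    holder = ET.Element("holder")
    holder.append(doc_root)
    for op in patch_root:
        # ElementTree has no parent pointers; rebuild the map for every step.
        parent_of = {child: parent
                     for parent in holder.iter() for child in parent}
        target = holder.find(op.get("sel"))
        if target is None:
            raise ValueError("selector matched nothing: %r" % op.get("sel"))
        if op.tag == "add":
            for new_child in op:
                target.append(copy.deepcopy(new_child))
        elif op.tag in ("replace", "remove"):
            if target is doc_root:
                raise ValueError("this sketch cannot replace or remove the root")
            parent = parent_of[target]
            if op.tag == "remove":
                parent.remove(target)
            else:
                parent[list(parent).index(target)] = copy.deepcopy(op[0])
        else:
            raise ValueError("unknown operation: %r" % op.tag)
    return doc_root


doc = ET.fromstring("<doc><note>old</note><foo id='a'/></doc>")
patch = ET.fromstring("""<diff>
  <add sel="doc"><bar/></add>
  <replace sel="doc/note"><note>new</note></replace>
  <remove sel="doc/foo[@id='a']"/>
</diff>""")
apply_patch(doc, patch)
print(ET.tostring(doc))
```

Everything this toy leaves out (namespaces in selectors, text and attribute targets, whitespace) is exactly where a real implementation has to be careful.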

This reminds me a lot of XQuery Update, XUpdate, and XProc. I'm not sure exactly what this brings to the table, but it may be simpler and easier to comprehend. It would help if there were some clear use cases this was intended to solve.

It's not clear who worked on this, but there are some pretty fundamental mistakes throughout the spec. Namespace handling looks seriously broken, but maybe that's fixable. Character set handling is definitely broken. ID handling is broken.

Perhaps future drafts will improve. Someone remind me: is an IETF spec at this stage fixable or not? Oh damn: looks like it's not. This is what happens when amateurs build on top of specs they don't understand. There may be the nugget of a good idea here, but the implementation is crippled by bad design decisions and misunderstanding of XML. Sadly XML is not as simple as it's supposed to be, or as simple as it appears. That goes double for XPath and triple for namespaces. If you don't live and breathe this stuff, you need to work with someone who does before publishing new specs and tools. Sadly it doesn't look like the SIP for Instant Messaging and Presence Leveraging Extensions Working Group did that, and consequently they produced an incoherent, contradictory, nearly unimplementable spec. :-(

Thursday, September 11, 2008 (Permalink)

The W3C has published the last call working draft of the CSS Backgrounds and Borders Module Level 3. "This draft contains the features of CSS level 3 relating to borders and backgrounds. It includes and extends the functionality of CSS level 2 [CSS21], which builds on CSS level 1 [CSS1]. The main extensions compared to level 2 are borders consisting of images, boxes with multiple backgrounds, boxes with rounded corners and boxes with shadows. This module replaces two earlier drafts: CSS3 Backgrounds and CSS3 Border."

Wednesday, September 10, 2008 (Permalink)

Matt Mullenweg has released WordPress 2.6.2, an open source (GPL) blog engine based on PHP and MySQL. "With open registration enabled, it is possible in WordPress versions 2.6.1 and earlier to craft a username such that it will allow resetting another user’s password to a randomly generated password. The randomly generated password is not disclosed to the attacker, so this problem by itself is annoying but not a security exploit. However, this attack coupled with a weakness in the random number seeding in mt_rand() could be used to predict the randomly generated password." All users should upgrade. If you don't have time to upgrade at the moment, turn off registration.

Tuesday, September 9, 2008 (Permalink)

The Mozilla Project has posted the second alpha of Firefox 3.1 for Mac, Linux, and Windows. Version 3.1 is built on Gecko 1.9.1, improves Web standards support, adds a Text API to the Canvas Element, supports border images and JavaScript query selectors, and claims to improve the URL bar (though they've been missing a lot of low-hanging fruit there for years, so I'm skeptical.)

Windows 2000 or later and Mac OS X 10.3.9 or later or Linux are required. Windows 98 and earlier and Mac OS X 10.2 and earlier are no longer supported. Final release is not expected for another year.

Monday, September 8, 2008 (Permalink)

Mark Volkmann has released WAXy, an open source library for streaming XML output. It looks well done, though the API is a tad overabbreviated for my tastes. And of course allowing "all error checking to be turned off for performance" is a bug, not a feature, and directly contradicts the claim that WAX "always outputs well-formed XML or throws an exception." The checking also doesn't seem to be as strong as it should be. For instance, consider this method:

public ElementWAX cdata(String text) {
    if (checkMe) {
        if (state == State.IN_PROLOG || state == State.AFTER_ROOT) {
            // EMMA incorrectly says this isn't called.
            badState("cdata");
        }
    }
    return text("<![CDATA[" + text + "]]>", true, false);
}

Off the top of my head I see two different ways to get WAXy to emit malformed data through this method. Still, if the API is sensible, the implementation details can be worked out. WAX is published under the GNU Lesser General Public License.
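
The entry doesn't say which two ways are meant. One well-known hazard with this kind of string concatenation is input that itself contains "]]>", which ends the CDATA section early. A writer can guard against that by splitting the text across two sections, as in this Python sketch (the function name `cdata_section` and the sample text are assumptions; this is not WAX's code):

```python
import xml.etree.ElementTree as ET


def cdata_section(text):
    """Wrap text in CDATA, splitting any "]]>" so it cannot end the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


risky = "if (a[b[0]]>1) { ... }"
document = "<code>" + cdata_section(risky) + "</code>"

# Parsing the result gives back exactly the text that went in.
print(ET.fromstring(document).text == risky)
```

Text containing characters that XML forbids outright, such as most control characters, is a separate problem that no amount of CDATA wrapping can fix.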

Thursday, September 4, 2008 (Permalink)

Michael Kay has released versions 9.1.0.2 and 9.0.0.8 of Saxon, his XSLT 2.0 and XQuery processor for Java and .NET. "This is the first maintenance release for Saxon 9.1. It applies all the patches that have been issued to date (2008-09-02) on Subversion." 9.0.0.8 "is a maintenance release that applies a number of bug corrections to the Saxon 9.0 branch. Note that this is NOT the latest Saxon release; it is an update to an older branch. The current branch is 9.1."

Saxon is published in two versions for both of which Java 1.4 or later (or .NET) is required. Saxon 9.0B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 9.0 SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standard XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers."

Wednesday, September 3, 2008 (Permalink)

Norm Walsh has posted Calabash 0.6.0, an open source command line application that runs XProc pipelines. "This version of Calabash is supposed to implement the 14 August 2008 version of the XProc specification....This is a very alpha release; it is implemented in Java and should run on any platform that supports Java 1.5 or later and has a command-line." Calabash is published under the GNU General Public License, Version 2.0.

Tuesday, September 2, 2008 (Permalink)

Apparently a little earlier than planned, Google has announced Chrome, an open source WebKit-based browser for Windows. The actual release should take place at 11:00 AM PDT today. The real purpose of Chrome seems to be to provide a stronger platform for web applications such as GMail and Google Docs. The JavaScript engine is supposed to be faster (though Apple and Mozilla are also planning order of magnitude speedups to JavaScript) and one crashing/hanging tab no longer takes down the entire browser.

The name "Chrome" seems to be a bit of a misnomer. In fact, one of the points of this browser is to get as much browser chrome out of the way as possible to enable the web app to have more space. This is something I remember usability consultants asking for at a WWAC meeting over 10 years ago. It's a shame it took us more than a decade to get this far. I'll also be curious to see if Chrome enables web apps to have better control over keyboard shortcuts, menu bars, and context menus. As much as I like Google Docs and GMail, their interfaces have been somewhat crippled relative to desktop applications by competing with the browser itself.

Interestingly, Google, the ultimate server side vendor, seems to be pushing more data down into the client. I'll have to explore and see how it works, but it looks like Chrome may be able to store and make use of a lot of details like your browsing history locally, without sending them to the mothership. That would be great for privacy if true. Previously, features like autosuggest required way too much information to go back to Google. (I'm not sure about this. I'm just reading between the lines of some of the things they've said. Of course, if I'm wrong, this is all open source, so some hacker may crack out their compiler and make it true.)

Similarly, offline apps are a major focus of Chrome. Google Gears or equivalent is built right in. Will you be able to run Google Docs without ever talking to Google after the initial download? Or edit a file on your disk without ever uploading it to Google's servers? We'll have to wait and see on that one.

You can open private tabs where "nothing that occurs in that windows is ever logged on your computer." That's a little disingenuous. I'm far more concerned about whether anything in that window is ever logged at Google, or elsewhere. For a start, how about refusing to run Google Analytics scripts in that incognito tab? Then don't log any searches or other Google hits I make from that incognito tab. Now that would be a killer feature.

Plug-ins are supported, though they bypass the browser's entire security model. I wonder how long before someone ports AdBlock Plus and Customize Google to Chrome?

Monday, September 1, 2008 (Permalink)

Microsoft has posted the second public beta of Internet Explorer 8 for Windows. There's some neat stuff here, including a privacy mode that's going to play havoc with Google Analytics, Doubleclick, and other services that want to track where you go and what you see on the Web. This will also block some Microsoft services, so it will be interesting to see if the feature survives internal demands to cripple it. On the downside, this beta is no longer using standards mode for all sites by default. I suspect by the time it ships, it will be using standards mode for no sites at all. Microsoft simply refuses to accept that standards conformant rendering is more important than backwards compatibility with old IE bugs. Microsoft has never really believed that there is any such thing as a standard except what Microsoft has itself created. At its core, Microsoft believes that they, and only they, define the standard.

Friday, August 29, 2008 (Permalink)

Bare Bones Software has released version 9.0 of BBEdit, my preferred text editor on the Mac, my favorite XML editor on any platform, what I'm using to type these very words. Some of the many new features in this release include:

  • The text views in browsing windows (disk browsers, search results, P4 opened, and similar) are now editable; rather than having to open a file into a new window from such a browser, you can just edit it right in the window.
  • Scratchpad:

    There is a new command on the Window menu: Show Scratchpad

    The Scratchpad window’s purpose is to be a space where you can manipulate text by performing transforms, manual edits, or batches of copy/paste.

    It is ideal for quickly beating text from one source into submission before pasting it elsewhere.

    The Scratchpad window automatically saves its content and state, eliminating those pesky “Save Untitled 237?” warnings when closing a window, or quitting BBEdit.

    The Scratchpad is also available from BBEdit’s dock menu.

    Finally, there is a new item on BBEdit’s Services menu: Append Selection to Scratchpad. This command will take the selected text, and place it at the end of the scratchpad, attempting to preserve any selection that was previously present. The Scratchpad window does NOT need to be open to use this command. Any text appended in this fashion will be present the next time the window is opened.

  • Improved Ruby support
  • “Save as Styled Text” and “Copy as Styled Text”
  • Reorganized Find. For a long time, BBEdit had what was in my opinion the single best find and replace implementation ever seen in a text editor. It wasn't flawless, just better than everything else. This actually got a little worse in the last couple of releases, but this release corrects at least some of the flaws. Unfortunately it introduces one big new UI flaw that's pretty serious. The Find and Multifile Find dialogs have been split in two.
  • Much improved Ruby support (BBEdit's biggest weakness relative to TextMate)
  • Improved Python support
  • Objective C 2.0 support
  • Ponies
  • Text completion. This is incredibly annoying, and I think I'll have to turn it off. If I don't want pop-ups in my web browser, why would I want them in my text editor? I may be right in the bottom of the trough for this feature: too fast a typist to need it, and too slow a typist to finish my thought before the pop-up appears. What's really weird about this one is it isn't doing simple code completion like most IDEs. It's actually trying to complete ordinary words as I type this very text.

    Furthermore, when faced with the word don't, the autocomplete looks for completions that begin with the letter "t": table, take, taking, etc. That makes no sense at all.

    The pop-up even stays open when there's no possible completion.

    I try to ignore the popups and look at my keyboard like the bad two-finger typist I am. However even then I occasionally find an arrow key or the return key, navigating through the pop-up instead of the text. Worse yet, if a pop-up is open and I click the mouse to reposition the insertion point somewhere outside of it, the pop-up goes away but the insertion point doesn't get repositioned. I end up typing at the old location instead of the new.

    Enough is enough. I'm turning this misfeature off now. I think this is an experimental feature that got out the door way before it was ready.

Bare Bones fixed at least one of the annoying bugs I reported in the HTML tools with 8.7. However the paragraph tool still pops up an annoying modal dialog, and the acronym and abbreviation tools don't. This really needs to be corrected.

Upgrades cost $30. New copies cost $125. If you bought 8.7 this year, upgrades are free. Mac OS X 10.4 or later is required.

Thursday, August 28, 2008 (Permalink)

IDEAlliance has posted the call for participation for XML 2008. This year it moves back to the D.C. area (Marriott Crystal Gateway, Arlington, Virginia to be precise) after several years in Boston. I've had a great time at this show for the last couple of years, and it's the major XML event in the U.S. The show runs from December 8-10. I likely won't be able to attend this year, but I do highly recommend it nonetheless.

Wednesday, August 27, 2008 (Permalink)

The W3C has decided to publish its test suites under a dual license:

Licenses for distribution of W3C Test Suites should satisfy two goals:

  1. Enable developers to use test cases easily, and promote software development and bugtracking.
  2. Enable a W3C Working Group to create a branded, "Authoritative W3C Test Suite" to reflect the group consensus process, and to promote interoperability and stability of performance claims.

To achieve these goals, W3C makes available Test Suites under two distinct licenses for two mutually exclusive uses:

  1. a 3-clause BSD License for software development, bugtracking, and other applications that do not require assertions of performance to the public or implied claims of conformance to a W3C Specification. See summary of 3-clause BSD License.
  2. a W3C Test Suite License for an Authoritative W3C Test Suite or when claims of performance with respect to a specification are required. See summary of W3C Test Suite License.

The choice of license is up to the licensee for every single use of tests from a W3C Test Suite. It will typically depend on usage requirements: the first one allows changes, the second does not.

Tuesday, August 26, 2008 (Permalink)

Olivier Thereaux has added experimental support for HTML 5 to the W3C markup validator "as part of an effort to promote the current work on HTML to web developers." "The result of that work is in CVS, and testable on the dev instance of the validator: http://qa-dev.w3.org/wmvs/HEAD/".

Monday, August 25, 2008 (Permalink)

Opera Software has released version 9.5.2 of their namesake free-beer web browser for Windows, Mac, and Linux. It's not immediately obvious what's new in 9.5.2. Presumably bugs have been fixed.

Sunday, August 24, 2008 (Permalink)

Planamesa Software has released NeoOffice/J 2.2.4 Patch 5, a Mac port of OpenOffice 2.1 using a Java-based GUI. New features since 2.2.3 include a media browser, horizontal scrolling, native floating palettes, multitouch trackpad zoom and scroll up, slideshow remote control support and "Slideshow presentations now run up to 33% faster."

Friday, August 22, 2008 (Permalink)

Benjamin Pasero has posted milestone 8a of RSSOwl 2.0, an open source RSS/Atom reader written in Java and based on SWT, db4o, and Lucene.

Thursday, August 21, 2008 (Permalink)

The W3C POWDER Working Group has updated five working drafts including three in last call.

According to the primer,

POWDER allows a variety of questions to be answered about a given Web resource or group of resources, without having to actually retrieve and inspect the resource(s).

At first a Description Resource is simply a claim: somebody is making some statement about a given resource, or group of resources. However, most users would have to trust the person that made the claim before deciding whether to trust the data. If a DR is made available directly by a well-known content provider that is trusted to uphold a certain level of quality, then the data might readily be trusted. However, this will not always be sufficient. Since a DR may be published by anyone, anywhere, to describe anything, an end user may reasonably want to query the cited author of the DR to ask questions such as: Did you really make that claim? And, if so, when? Would you make the same claim today?

For some situations this might still not be sufficient for the end user. To facilitate the further extension of trust a means has been provided to allow certification of DRs. A Description Resource that has been certified immediately gains in trust, through the verification by a third and trusted party of the original claims made by the DR author.

Through the combination of these tools various questions, not limited to the following, can be answered using a Description Resource, without having to retrieve the resource itself:

  • Which resources does the DR describe?
  • What is the description?
  • Who has created the description?
  • When was the description created?
  • Until when is the description considered valid?
  • From when is the description considered valid?
  • Does anybody agree with this description?
  • Do other descriptions exist about this group of resources?

Wednesday, August 20, 2008 (Permalink)

The W3C Content Transformation Task Force has posted the last call working draft of Content Transformation Landscape 1.0:

From the point of view of this document, Content Transformation is the manipulation in various ways, by proxies, of requests made to and content delivered by an origin server with a view to making it more suitable for mobile presentation.

The W3C Mobile Web Best Practices Working Group neither approves nor disapproves of Content Transformation, but recognizes that is being deployed widely across mobile data access networks. The deployments are widely divergent to each other, with many non-standard HTTP implications, and no well-understood means either of identifying the presence of such transforming proxies, nor of controlling their actions. This document establishes a framework to allow that to happen.

The overall objective of this document is to provide a means, as far as is practical, for users to be provided with at least a "functional user experience" [Device Independence Glossary] of the Web, when mobile, taking into account the fact that an increasing number of content providers create experiences specially tailored to the mobile context which they do not wish to be altered by third parties. Equally it takes into account the fact that there remain a very large number of Web sites that do not provide a functional user experience when perceived on many mobile devices.

Tuesday, August 19, 2008 (Permalink)

Version 1.5 of Chiba Core, an open source XForms library for Java, has been released. Version 1.5 implements most of the functions defined by XForms 1.1 and has switched back from Maven to Ant.

Monday, August 18, 2008 (Permalink)

No sooner do I finally get upgraded to WordPress 2.6.0 than version 2.6.1 rolls out. For a pleasant change, though, there are no security issues fixed in this release so upgrades are optional. I think I'll wait for 2.6.2 or 2.7 myself.

Friday, August 15, 2008 (Permalink)

The W3C XML Processing Model Working Group has published a new last call Working Draft of XProc: An XML Pipeline Language. "This specification describes the syntax and semantics of XProc: An XML Pipeline Language, a language for describing operations to be performed on XML documents. An XML Pipeline specifies a sequence of operations to be performed on zero or more XML documents. Pipelines generally accept zero or more XML documents as input and produce zero or more XML documents as output. Pipelines are made up of simple steps which perform atomic operations on XML documents and constructs similar to conditionals, iteration, and exception handlers which control which steps are executed."

Thursday, August 14, 2008 (Permalink)

The Jakarta Apache Project has released JXPath 1.3, a class library that "applies XPath expressions to graphs of objects of all kinds: JavaBeans, Maps, Servlet contexts, DOM etc, including mixtures thereof." This seems mostly to be a bug fix release.

Wednesday, August 13, 2008 (Permalink)

The W3C Web API Working Group has published the candidate recommendation of Element Traversal Specification. "This specification defines the ElementTraversal interface, which allows script navigation of the elements of a DOM tree, excluding all other nodes in the DOM, such as text nodes. It also provides an attribute to expose the number of child elements of an element. It is intended to provide a more convenient alternative to existing DOM navigation interfaces, with a low implementation footprint."

ElementTraversal provides some extra properties/methods for navigating only through elements, while ignoring text and white space:

  • firstElementChild
  • lastElementChild
  • previousElementSibling
  • nextElementSibling
  • childElementCount

This makes it easier to process record-like XML, but inappropriate for reading documents with mixed content. It may be mildly helpful if it achieves broad adoption in browsers. However at this point adding more methods to DOM is just putting lipstick on a pig. Until we admit that DOM was a mistake, we can't really begin to address our problems.
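
For comparison, here is a minimal sketch in Java of what these properties save you from writing by hand with the plain org.w3c.dom Node API. The static helpers borrow the ElementTraversal property names but are hypothetical stand-ins, not the interface itself. The mixed-content sample in main also shows the limitation: the text between the elements is simply skipped.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ElementOnlyTraversal {
    // First child that is an element, skipping text, comments, etc.
    static Element firstElementChild(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) return (Element) n;
        }
        return null;
    }

    // Next sibling that is an element, or null if there is none.
    static Element nextElementSibling(Node node) {
        for (Node n = node.getNextSibling(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) return (Element) n;
        }
        return null;
    }

    // Number of element children; non-element nodes are not counted.
    static int childElementCount(Node parent) {
        int count = 0;
        for (Element e = firstElementChild(parent); e != null; e = nextElementSibling(e)) {
            count++;
        }
        return count;
    }

    // Convenience for the demo: parse a string and count the root's element children.
    static int countRootChildElements(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return childElementCount(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Mixed content: three text runs and two elements; only the elements are seen.
        String mixed = "<p>Some <b>bold</b> and <i>italic</i> text</p>";
        System.out.println(countRootChildElements(mixed)); // prints 2
    }
}
```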

Tuesday, August 12, 2008 (Permalink)

The W3C Web Accessibility Initiative Protocols & Formats Working Group has posted a working draft of Accessible Rich Internet Applications (WAI-ARIA) Version 1.0:

The domain of Web accessibility defines how to make Web content usable by people with disabilities. People with some types of disabilities use Assistive Technology (AT) to interact with content. AT can transform the presentation of content into a format more suitable to the user, and can allow the user to interact in different ways than the author designed. In order to accomplish this, AT must understand the semantics of the content. Semantics are knowledge of roles, states, and properties, as a person would understand them, that apply to elements within the content. For instance, if a paragraph is semantically identified as such, AT can interact with it as a unit separable from the rest of the content, knowing the exact boundaries of that paragraph. A slider or tree widget is a more complex example, in which various parts of a widget each have semantics that must be properly identified for the computer to support effective interaction.

Established content technologies define semantics for elements commonly used in those technologies. However, new technologies can overlook some of the semantics required for accessibility. Furthermore, new authoring practices evolve which override the intended semantics—elements that have one defined semantic meaning in the technology are used with a different semantic meaning intended to be understood by the user.

For example, Rich Internet Applications developers can create a tree widget in HTML using CSS and JavaScript even though HTML lacks a semantic element for that. A different element must be used, possibly a list element with display instructions to make it look and behave like a tree widget. Assistive technology, however, must present the element in a different modality and the display instructions may not be applicable. The AT will present it as a list, which has very different display and interaction from a tree widget, and the user may be unsuccessful at understanding and operating the widget.

The incorporation of WAI-ARIA is a way for an author to provide proper type semantics on custom widgets (elements with repurposed semantics) to make these widgets accessible, usable and interoperable with assistive technologies. This specification identifies the types of widgets and structures that are recognized by accessibility products, by providing an ontology of corresponding roles that can be attached to content. This allows elements with a given role to be understood as a particular widget or structural type regardless of any semantic inherited from the implementing technology. Roles are a common property of platform Accessibility APIs which applications use to support assistive technologies. Assistive technology can then use the role information to provide effective presentation and interaction with these elements.

This role taxonomy currently includes interaction widget (user interface widget) and structural document (content organization) types of objects. The role taxonomy describes inheritance (widgets that are types of other widgets) and details what states and properties each role supports. When possible, information is provided about mapping of roles to accessibility APIs.

Roles are element types and should not change with time or user actions. Changing the role on an element from its initial value will be treated, via accessibility API events, as the removal of the old element and insertion of a new element with the new role.

Changeable states and properties of elements are also defined in this specification. States and Properties are used to declare important properties of an element that affect and describe interaction. These properties enable the user agent or operating system to properly handle the element even when these properties are altered dynamically by scripts. For example, alternative input and output technology such as screen readers, speech dictation software and on-screen keyboards must recognize the state of an element (such as: if an object is disabled, checked, focused, collapsed, hidden, etc.).

While it is possible for assistive technologies to access these properties through the Document Object Model [DOM], the preferred mechanism is for the user agent to map the States and Properties to the accessibility API of the operating system.

Monday, August 11, 2008 (Permalink)

The W3C Mobile Web Initiative has published the finished recommendation of Mobile Web Best Practices 1.0. Here's the summary of the guidelines:

  1. [THEMATIC_CONSISTENCY] Ensure that content provided by accessing a URI yields a thematically coherent experience when accessed from different devices.

  2. [CAPABILITIES] Exploit device capabilities to provide an enhanced user experience.

  3. [DEFICIENCIES] Take reasonable steps to work around deficient implementations.

  4. [TESTING] Carry out testing on actual devices as well as emulators.

  5. [URIS] Keep the URIs of site entry points short.

  6. [NAVBAR] Provide only minimal navigation at the top of the page.

  7. [BALANCE] Take into account the trade-off between having too many links on a page and asking the user to follow too many links to reach what they are looking for.

  8. [NAVIGATION] Provide consistent navigation mechanisms.

  9. [ACCESS_KEYS] Assign access keys to links in navigational menus and frequently accessed functionality.

  10. [LINK_TARGET_ID] Clearly identify the target of each link.

  11. [LINK_TARGET_FORMAT] Note the target file's format unless you know the device supports it.

  12. [IMAGE_MAPS] Do not use image maps unless you know the device supports them effectively.

  13. [POP_UPS] Do not cause pop-ups or other windows to appear and do not change the current window without informing the user.

  14. [AUTO_REFRESH] Do not create periodically auto-refreshing pages, unless you have informed the user and provided a means of stopping it.

  15. [REDIRECTION] Do not use markup to redirect pages automatically. Instead, configure the server to perform redirects by means of HTTP 3xx codes.

  16. [EXTERNAL_RESOURCES] Keep the number of externally linked resources to a minimum.

  17. [SUITABLE] Ensure that content is suitable for use in a mobile context.

  18. [CLARITY] Use clear and simple language.

  19. [LIMITED] Limit content to what the user has requested.

  20. [PAGE_SIZE_USABLE] Divide pages into usable but limited size portions.

  21. [PAGE_SIZE_LIMIT] Ensure that the overall size of page is appropriate to the memory limitations of the device.

  22. [SCROLLING] Limit scrolling to one direction, unless secondary scrolling cannot be avoided.

  23. [CENTRAL_MEANING] Ensure that material that is central to the meaning of the page precedes material that is not.

  24. [GRAPHICS_FOR_SPACING] Do not use graphics for spacing.

  25. [LARGE_GRAPHICS] Do not use images that cannot be rendered by the device. Avoid large or high resolution images except where critical information would otherwise be lost.

  26. [USE_OF_COLOR] Ensure that information conveyed with color is also available without color.

  27. [COLOR_CONTRAST] Ensure that foreground and background color combinations provide sufficient contrast.

  28. [BACKGROUND_IMAGE_READABILITY] When using background images make sure that content remains readable on the device.

  29. [PAGE_TITLE] Provide a short but descriptive page title.

  30. [NO_FRAMES] Do not use frames.

  31. [STRUCTURE] Use features of the markup language to indicate logical document structure.

  32. [TABLES_SUPPORT] Do not use tables unless the device is known to support them.

  33. [TABLES_NESTED] Do not use nested tables.

  34. [TABLES_LAYOUT] Do not use tables for layout.

  35. [TABLES_ALTERNATIVES] Where possible, use an alternative to tabular presentation.

  36. [NON-TEXT_ALTERNATIVES] Provide a text equivalent for every non-text element.

  37. [OBJECTS_OR_SCRIPT] Do not rely on embedded objects or script.

  38. [IMAGES_SPECIFY_SIZE] Specify the size of images in markup, if they have an intrinsic size.

  39. [IMAGES_RESIZING] Resize images at the server, if they have an intrinsic size.

  40. [VALID_MARKUP] Create documents that validate to published formal grammars.

  41. [MEASURES] Do not use pixel measures and do not use absolute units in markup language attribute values and style sheet property values.

  42. [STYLE_SHEETS_USE] Use style sheets to control layout and presentation, unless the device is known not to support them.

  43. [STYLE_SHEETS_SUPPORT] Organize documents so that if necessary they may be read without style sheets.

  44. [STYLE_SHEETS_SIZE] Keep style sheets small.

  45. [MINIMIZE] Use terse, efficient markup.

  46. [CONTENT_FORMAT_SUPPORT] Send content in a format that is known to be supported by the device.

  47. [CONTENT_FORMAT_PREFERRED] Where possible, send content in a preferred format.

  48. [CHARACTER_ENCODING_SUPPORT] Ensure that content is encoded using a character encoding that is known to be supported by the target device.

  49. [CHARACTER_ENCODING_USE] Indicate in the response the character encoding being used.

  50. [ERROR_MESSAGES] Provide informative error messages and a means of navigating away from an error message back to useful information.

  51. [COOKIES] Do not rely on cookies being available.

  52. [CACHING] Provide caching information in HTTP responses.

  53. [FONTS] Do not rely on support of font related styling.

  54. [MINIMIZE_KEYSTROKES] Keep the number of keystrokes to a minimum.

  55. [AVOID_FREE_TEXT] Avoid free text entry where possible.

  56. [PROVIDE_DEFAULTS] Provide pre-selected default values where possible.

  57. [DEFAULT_INPUT_MODE] Specify a default text entry mode, language and/or input format, if the target device is known to support it.

  58. [TAB_ORDER] Create a logical order through links, form controls and objects.

  59. [CONTROL_LABELLING] Label all form controls appropriately and explicitly associate labels with form controls.

  60. [CONTROL_POSITION] Position labels so they lay out properly in relation to the form controls they refer to.
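
Several of these practices (REDIRECTION, CACHING, CHARACTER_ENCODING_USE) come down to HTTP response headers rather than markup. A minimal sketch in Java of what a server might send follows; the class and method names, the media type, and the cache lifetime are all illustrative assumptions rather than anything the Recommendation prescribes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MobileFriendlyHeaders {
    // Headers for an ordinary page response.
    static Map<String, String> pageHeaders(int maxAgeSeconds) {
        Map<String, String> headers = new LinkedHashMap<>();
        // CHARACTER_ENCODING_USE: state the encoding in the response itself.
        headers.put("Content-Type", "application/xhtml+xml; charset=UTF-8");
        // CACHING: tell the device and any proxies how long the page stays fresh.
        headers.put("Cache-Control", "max-age=" + maxAgeSeconds);
        return headers;
    }

    // REDIRECTION: use an HTTP 3xx status instead of a meta refresh in the markup.
    static int redirectStatus() {
        return 301;
    }

    // Headers that accompany the 3xx status.
    static Map<String, String> redirectHeaders(String targetUri) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Location", targetUri);
        return headers;
    }

    public static void main(String[] args) {
        System.out.println(pageHeaders(3600));
        System.out.println(redirectStatus() + " " + redirectHeaders("http://example.com/m/"));
    }
}
```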

Friday, August 8, 2008 (Permalink)

The Mozilla Project has released Camino 1.6.3, an open source Mac OS X web browser based on the Gecko 1.8 rendering engine and the Quartz GUI toolkit. It supports pretty much all the technologies that Mozilla does: HTML, XHTML, CSS, XML, XSLT, etc. Version 1.6 adds much improved AppleScript support and various user interface enhancements. Mac OS X 10.3 or later is required.

Thursday, August 7, 2008 (Permalink)

The W3C XQuery working group has posted a new candidate recommendation of XQuery Update Facility. XQuery as it currently exists is basically just SELECT in SQL terms. XQuery Update adds INSERT, UPDATE, and DELETE. More specifically it is:

  • upd:mergeUpdates
  • upd:revalidate
  • upd:applyUpdates
  • upd:insertBefore
  • upd:insertAfter
  • upd:insertInto
  • upd:insertIntoAsFirst
  • upd:insertIntoAsLast
  • upd:insertAttributes
  • upd:delete
  • upd:replaceNode
  • upd:replaceValue
  • upd:replaceElementContent
  • upd:rename
  • upd:removeType
  • upd:setToUntyped

This is one of the last two pieces before XQuery 1.0 is really complete. (The other is full-text search.)

Wednesday, August 6, 2008 (Permalink)

The W3C Semantic Web Activity has apparently found it necessary to publish yet another foundational technology for the semantic web, the Rule Interchange Format, "a family of rule interchange dialects that allows rules to be translated between rule languages and thus transferred between rule systems." Hmm, isn't this sort of semantic translation exactly what first RDF and then OWL were supposed to enable? I guess it's still turtles all the way up.

In any case there are now six working drafts:

"RIF Basic Logic Dialect" (BLD) specifies an XML format for rules at an intermediate expressive power. The language is roughly Horn rules with URIs, datatypes, and builtins. This goes beyond datalog (it has function terms), but does not provide any kind of negation. "RIF RDF and OWL Compatibility" explains and specifies how RIF rulesets are to be used in combination with RDF and OWL. Comments on these documents welcome until 19 September. In addition, RIF Production Rule Dialect (PRD) specifies an XML format for the exchange of production rules. PRD and BLD are expected to be the basis of the two main dialect-branches, with RIF Core being the things in common between the two. RIF Framework for Logic Dialects (FLD) and RIF Datatypes and Builtins (DTB) provide common elements for specific dialects to use. RIF Use Cases and Requirements (UCR), last published about two years ago, has been simplified and now has examples written in the PRD and BLD presentation syntaxes.

Tuesday, August 5, 2008 (Permalink)

The W3C has published the last call working draft of the CSS Marquee Module Level 3. "When documents (e.g., HTML) are laid out on visual media (e.g., screen or print) and the contents of some element are too large for a given area, CSS allows the designer to specify whether and how the overflow is displayed. One way, available on certain devices, is the “marquee” effect: the content is animated and moves automatically back and forth. This module defines the properties to control that effect."

Monday, August 4, 2008 (Permalink)

The W3C CSS Working Group has published the Candidate Recommendation of CSS Mobile Profile 2.0. "This specification defines in general a subset of CSS 2.1 [CSS21] that is to be considered a baseline for interoperability between implementations of CSS on constrained devices (e.g. mobile phones). Its intent is not to produce a profile of CSS incompatible with the complete specification, but rather to ensure that implementations that due to platform limitations cannot support the entire specification implement a common subset that is interoperable not only amongst constrained implementations but also with complete ones. Additionally, this specification aligns itself as much as possible with the OMA Wireless CSS 1.1 [WCSS11] specification. At the same time, OMA is doing alignment work in OMA Wireless CSS 1.2 [WCSS12]. It is aimed at aligning the mandatory compliance items between CSS Mobile Profile 2.0 and OMA Wireless CSS 1.2 [WCSS12]."

Friday, August 1, 2008 (Permalink)

Efficient XML Interchange continues to chug right along. The working group has now published one updated and one new working draft:

The latter is particularly interesting:

This document presents the anticipated benefits of the EXI format 1.0 compared to XML and gzipped XML. Additionally, tests for compactness include comparison to ASN.1 PER. [Ed: What? No protobufs?] The points of comparison are the requirements set by the EXI Working Group charter, based on the results of the XML Binary Characterization Working Group.

This summarized evaluation of the EXI format uses the testing framework built during the first phase of the EXI Working Group's work so as to select a baseline candidate technology. Although this evaluation aims at demonstrating EXI benefits in the targeted XBC Use Cases, it can be read as a summary of the EXI measurements Note.

There are some nice graphs and tables that make EXI sound like a good idea when taken at face value, but some of the claims the document makes for XML are simply false, and some of the goals it sets for EXI are actively harmful (specifically the ones that involve schema awareness). I'm getting a little verklempt. Talk amongst yourselves. I'll give you a topic: Efficient XML Interchange is neither efficient, XML, nor interchangeable. Discuss!

Thursday, July 31, 2008 (Permalink)

The W3C XHTML working group has posted the final recommendation of XHTML Basic 1.1:

The XHTML Basic document type includes the minimal set of modules required to be an XHTML host language document type, and in addition it includes images, forms, basic tables, and object support. It is designed for Web clients that do not support the full set of XHTML features; for example, Web clients such as mobile phones, PDAs, pagers, and settop boxes. The document type is rich enough for content authoring.

XHTML Basic is designed as a common base that may be extended. The goal of XHTML Basic is to serve as a common language supported by various kinds of user agents.

This revision, 1.1, supersedes version 1.0 as defined in http://www.w3.org/TR/2000/REC-xhtml-basic-20001219. In this revision, several new features have been incorporated into the language in order to better serve the small-device community that is this language's major user:

  1. XHTML Forms (defined in [XHTMLMOD])
  2. Intrinsic Events (defined in [XHTMLMOD])
  3. The value attribute for the li element (defined in [XHTMLMOD])
  4. The target attribute (defined in [XHTMLMOD])
  5. The style element (defined in [XHTMLMOD])
  6. The style attribute (defined in [XHTMLMOD])
  7. XHTML Presentation module (defined in [XHTMLMOD])
  8. The inputmode attribute (defined in Section 5 of this document)

The document type definition is implemented using XHTML modules as defined in "XHTML Modularization"

Tuesday, July 29, 2008 (Permalink)

The Mozilla Project has posted the first alpha of Firefox 3.1 for Mac, Linux, and Windows. Version 3.1 is built on Gecko 1.9.1, improves Web standards support, adds a Text API to the Canvas Element, supports border images and JavaScript query selectors, and claims to improve the URL bar (though they've been missing a lot of low-hanging fruit there for years, so I'm skeptical.) Windows 2000 or later and Mac OS X 10.3.9 or later or Linux are required. Windows 98 and earlier and Mac OS X 10.2 and earlier are no longer supported. Final release is not expected for another year.

Monday, July 28, 2008 (Permalink)

The Apache XML Project has posted a beta of Xerces-C++ 3.0.0, an open source schema validating XML parser written in reasonably cross-platform C++. Version 3.0 improves support for XPath 2 and 64-bit code.

Friday, July 25, 2008 (Permalink)

Tim Bacon and Jeff Martin have released XMLUnit 1.2, an extension to the popular JUnit testing framework that allows assertions to be made about the equality of whole XML Documents, XPath result trees, and XPath expressions. "The major new feature of XMLUnit for Java 1.2 is an alternative XML validation subsystem built on top of JAXP 1.3 which supports validation against alternative XML Schema languages - if your JAXP implementation supports them - and validation of the Schema definition itself."

Wednesday, July 23, 2008 (Permalink)

The W3C CSS Working Group has published the last call working draft of CSS Color Module Level 3. This draft describes properties such as color, color-profile, and opacity.

Tuesday, July 22, 2008 (Permalink)

The W3C XQuery working group has published the first working drafts of XQuery 1.1 and XQuery 1.1 Use Cases. So far there are just two significant additions listed:

Sunday, July 20, 2008 (Permalink)

Balisage: The Markup Conference has posted the call for posters. The poster session here is somewhat more informal and less peer-reviewed than at most conferences. Short version: anything goes.

Friday, July 18, 2008 (Permalink)

Several updates from The Mozilla Project today. First up is the initial bug fix release in the 3.0 tree, Firefox 3.0.1. This release fixes security issues and other bugs. All 3.x users should upgrade.

The Mozilla Project has also released Firefox 2.0.0.16. This release fixes security issues. All 2.x users should upgrade.

A new version of SeaMonkey has also been posted, though Camino doesn't seem to have been updated yet. Camino users may want to switch to Firefox or Safari for the time being.

Wednesday, July 16, 2008 (Permalink)

JavaRanch is running a special promotion for Refactoring HTML in their HTML and JavaScript Forum:

Some time during the day on Friday, the promotion will end and the forum will be combed for qualifying messages since the promotion began (this is an automated process). From these, four winners will be randomly selected. Be sure to check the list of winners which will be posted in the forum on Friday. Winners must send their snail mail address and day time phone number to bookpromotion AT javaranch DOT com. (We apologize for not having a direct email link but the spammers are killing us.) The publisher will then send each winner a free copy of the book! Remember, to win you must be registered with a name that meets the JavaRanch Naming Policy and your post must be on topic.

Don't forget the best part: During the promotion, the author(s) of the book will be hanging out to answer questions!

And just in case you forgot, that author is me. :-)

Tuesday, July 15, 2008 (Permalink)

The OpenOffice Project has posted the second beta of OpenOffice 3.0, an open source office suite for Linux, Solaris, and Windows that saves all its files as zipped XML. Relative to beta 1, this seems to be a bug fix release.

Monday, July 14, 2008 (Permalink)

The W3C Voice Browser Working Group has posted the last call working draft of the Speech Synthesis Markup Language Version 1.1. "This document enhances SSML 1.0 [SSML] to provide better support for a broader set of natural (human) languages. To determine in what ways, if any, SSML is limited by its design with respect to supporting languages that are in large commercial or emerging markets for speech synthesis technologies but for which there was limited or no participation by either native speakers or experts during the development of SSML 1.0, the W3C held three workshops on the Internationalization of SSML. The first workshop [WS], in Beijing, PRC, in October 2005, focused primarily on Chinese, Korean, and Japanese languages, and the second [WS2], in Crete, Greece, in May 2006, focused primarily on Arabic, Indian, and Eastern European languages. The third workshop [WS3], in Hyderabad, India, in January 2007, focused heavily on Indian and Middle Eastern languages. Information collected during these workshops was used to develop a requirements document [REQS11]. Changes from SSML 1.0 are motivated by these requirements."

Friday, July 11, 2008 (Permalink)

The W3C has published the first working draft of Protocol for Web Description Resources (POWDER): Formal Semantics. "This document underpins the Protocol for Web Description Resources (POWDER). It describes how the relatively simple operational format of a POWDER document can be transformed through two stages, first into a more tightly constrained XML format (POWDER-BASE), and then into an RDF/OWL encoding (POWDER-S) that may be processed by Semantic Web tools. Such processing is only possible, however, if tools implement the semantic extension defined within this document."

Thursday, July 10, 2008 (Permalink)

The W3C POWDER Working Group has published a new working draft of Protocol for Web Description Resources (POWDER): Grouping of Resources. "The Protocol for Web Description Resources (POWDER) facilitates the publication of descriptions of multiple resources such as all those available from a Web site. This document describes how sets of IRIs can be defined such that descriptions or other data can be applied to the resources obtained by dereferencing IRIs that are elements of the set. IRI sets are defined as XML elements with relatively loose operational semantics. This is underpinned by the formal semantics of POWDER which include a semantic extension, both defined separately. A GRDDL transform is associated with the POWDER namespace that maps the operational to the formal semantics."

Wednesday, July 9, 2008 (Permalink)

Sun has released version 0.6 of xmlroff, an open source XSL Formatting Objects to PDF and PostScript converter. xmlroff is written in C for Linux, and relies on libxml2, libxslt, and the GLib and GObject libraries from GTK+ and GNOME (though neither GTK+ nor GNOME is required). It also needs PDFlib, FreeType2, and Fontconfig. xmlroff can be run from the command line. It also includes a libfo library. This version fixes bugs and changes the license to BSD.

Tuesday, July 8, 2008 (Permalink)

Google has released protobufs. Think of protobufs as doing for ASN.1 what XML did for SGML. That is, it's a simpler format for exchanging binary data that mere mortals may be able to use. Libraries are available for C++, Java, and Python; and the format is well-documented for anyone else who wants to work in some other language. According to Google,

Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:

  • are simpler
  • are 3 to 10 times smaller
  • are 20 to 100 times faster
  • are less ambiguous
  • generate data access classes that are easier to use programmatically

For example, let's say you want to model a person with a name and an email. In XML, you need to do:

  <person>
    <name>John Doe</name>

    <email>jdoe@example.com</email>
  </person>

while the corresponding protocol buffer message definition (in protocol buffer text format) is:

  person {
    name = "John Doe"
    email = "jdoe@example.com"
  }

In binary format, this message would probably be 28 bytes long and take around 100-200 nanoseconds to parse. The XML version is at least 69 bytes (if you remove whitespace) and would take around 5,000-10,000 nanoseconds to parse.

Also, manipulating a protocol buffer is much easier:

  cout << "Name: " << person.name() << endl;
  cout << "E-mail: " << person.email() << endl;

Whereas with XML you would have to do something like:

  cout << "Name: "
       << person.getElementsByTagName("name")->item(0)->innerText()
       << endl;
  cout << "E-mail: "
       << person.getElementsByTagName("email")->item(0)->innerText()
       << endl;

However, protocol buffers are not always a better solution than XML – for instance, protocol buffers would not be a good way to model a text-based document with markup (e.g. HTML), since you cannot easily interleave structure with text. In addition, XML is human-readable and human-editable; protocol buffers, at least in their native format, are not. XML is also – to some extent – self-describing. A protocol buffer is only meaningful if you have the message definition (the .proto file).

I think Google is overstating the downsides of XML here. They make the common mistake of conflating a horrible API (DOM) with XML itself. In a sane API, you'd just do something like this:

<xsl:template match="person">
  <xsl:value-of select="name"/>
  <xsl:text>
</xsl:text>
  <xsl:value-of select="email"/>
</xsl:template>

This would look even simpler in XQuery or E4X, but I don't have enough practice with those languages to type them with reasonable confidence before my morning coffee.
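For a rough sense of what a non-DOM API looks like in a general-purpose language, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The choice of Python and the `describe` helper are assumptions made for illustration; neither comes from Google's documentation.

```python
import xml.etree.ElementTree as ET

# The person document from Google's example, with whitespace removed.
doc = "<person><name>John Doe</name><email>jdoe@example.com</email></person>"
person = ET.fromstring(doc)


def describe(person):
    """Format the name and e-mail of a <person> element (hypothetical helper)."""
    name = person.findtext("name")    # text of the first <name> child
    email = person.findtext("email")  # text of the first <email> child
    return f"Name: {name}\nE-mail: {email}"


print(describe(person))
```

Each field is a single `findtext` call, which is about as much ceremony as the generated protobuf accessors.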

Still, maybe this binary format can give the people who really need (or who think they need) a binary format for efficiency or other reasons their own sandbox, so they can stop peeing in ours.

Protobufs do show one lesson learned from experience: they mirror XML's must-ignore semantics. It is possible to put extra fields in a protobuf and not break every downstream consumer that doesn't know about those fields. That's a rare quality in a binary format.
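To make the must-ignore behavior concrete, here is a minimal hand-rolled sketch of the documented wire format in Python. It is not Google's library: the function names, the field numbers (1 for name, 2 for email, 3 for an extra field), and the restriction to string fields are all assumptions made for the example.

```python
def encode_varint(n):
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(number, text):
    """Encode a length-delimited (wire type 2) field holding UTF-8 text."""
    payload = text.encode("utf-8")
    return encode_varint((number << 3) | 2) + encode_varint(len(payload)) + payload


def read_varint(buf, pos):
    """Return (value, next_pos) for the varint starting at buf[pos]."""
    value = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def decode_known_strings(buf, known):
    """Decode the string fields listed in `known`; skip every other field."""
    result, pos = {}, 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:    # varint
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            pos += 8
        elif wire_type == 5:  # fixed 32-bit
            pos += 4
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            if number in known:
                result[known[number]] = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if pos > len(buf):
            raise ValueError("truncated field")
    return result


# Hypothetical schema: 1 = name, 2 = email. Field 3 is unknown to the old reader.
old_message = encode_string_field(1, "John Doe") + encode_string_field(2, "jdoe@example.com")
new_message = old_message + encode_string_field(3, "555-0100")
OLD_READER_FIELDS = {1: "name", 2: "email"}

print(len(old_message))
print(decode_known_strings(new_message, OLD_READER_FIELDS))
```

The old reader can step over field 3 because every field announces its own wire type and length. Under these assumed field numbers, the two-field message also comes out at the 28 bytes Google quotes.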

One question I have is what it means for a protobuf to be malformed. How easy is it to detect a corrupt byte stream? What will happen if someone deliberately attempts to feed bad data to a protobuf consumer? Protobufs are clearly designed with the idea in mind of taking bytes off the wire or from disk and shoving them into memory. This technique has been incredibly dangerous in the past, and led to incredibly brittle software. Whether the protobuf libraries are actually doing that or not, I'm not sure. However, although I do see wire format documentation on Google's site, I don't see an actual BNF grammar anywhere, and that makes me nervous. A good rule of thumb for any wire format or file format (and protobufs are really both) is that consumers must be prepared for absolutely any byte stream as input, whether it's what they expect or not. Any byte stream that does not satisfy the grammar must be detected and rejected. Any byte stream that does satisfy the grammar must be acceptable. Never trust external input to a program without verification. Anything less is insecure and dangerous. I do note that their C++ examples return error codes rather than throwing exceptions on parse failure, which smells bad to this Java programmer, but maybe that's just C++.
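As a sketch of what "detect and reject" could look like, the following Python walks a byte string as a sequence of tagged fields and refuses anything that does not fit. The specific rules are this example's assumptions about a reasonable grammar: no field number 0, only wire types 0, 1, 2, and 5, varints of at most ten bytes, and no field running past the end of the input. It says nothing about what Google's libraries actually check.

```python
def _read_varint(buf, pos):
    """Return (value, next_pos); reject truncated or over-long varints."""
    value = shift = 0
    for _ in range(10):  # a 64-bit value needs at most 10 varint bytes
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7
    raise ValueError("varint longer than 10 bytes")


def check_well_formed(buf):
    """Raise ValueError unless buf is a complete sequence of tagged fields."""
    pos = 0
    while pos < len(buf):
        key, pos = _read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if number == 0:
            raise ValueError("field number 0 is not allowed")
        if wire_type == 0:    # varint
            _, pos = _read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            pos += 8
        elif wire_type == 5:  # fixed 32-bit
            pos += 4
        elif wire_type == 2:  # length-delimited
            length, pos = _read_varint(buf, pos)
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if pos > len(buf):
            raise ValueError("field runs past end of input")


def is_well_formed(buf):
    """Boolean wrapper around check_well_formed."""
    try:
        check_well_formed(buf)
        return True
    except ValueError:
        return False


print(is_well_formed(bytes([0x0A, 0x03]) + b"abc"))  # field 1, 3-byte payload
print(is_well_formed(bytes([0x0A, 0x05]) + b"abc"))  # claims 5 bytes, has 3
print(is_well_formed(b"<person/>"))                  # XML fed to the wrong parser
```

A check like this only establishes that the bytes are structurally plausible. Without the .proto definition, it cannot tell whether a well-formed message means what the consumer expects.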

The real question in my mind is whether protobufs have any hope of working over the public Internet. Schema-dependent, opaque binary formats work a lot better behind the firewall where one group writes the software to both produce and consume the data, than over the heterogeneous world of the Internet where you have little idea who's reading your data or why. In that world, self-describing text makes all the difference, efficiency be damned.

Monday, July 7, 2008 (Permalink)

The W3C XML Schema Working Group has posted new last call working drafts of XML Schema 1.1 Part 1: Structures and XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes. According to the structures draft,

XSD 1.1 retains all the essential features of XSD 1.0, but adds several new features to support functionality requested by users, fixes many errors in XSD 1.0, and clarifies wording.

This draft was published on 20 June 2008. The previous working draft of 30 August 2007 was a Last-Call Working Draft which elicited numerous comments and suggestions for improvements. All substantive issues have now been resolved, although some editorial issues remain open. The major revisions since the previous draft include the following:
  • The minimal subset of XPath which processors were required to support for assertions has been eliminated; processors must support all of XPath.
  • A new wildcard keyword ##definedSibling has been added to allow a wildcard to match any element except one mentioned explicitly elsewhere in the current content model.
  • The definitions of must and ·error· have been revised to require that processors detect and report errors (although the quality and level of detail of the error messages are not constrained).
  • An <override> element has been defined to allow the declarations or definitions of specified components in other schema documents to be overridden.
  • The <redefine> element has been ·deprecated·.
  • XML Representation Constraints no longer refer to the component level; they can now be checked for schema documents in isolation.
  • Numerous editorial changes and clarifications have been made and numerous small errors corrected.

In the datatypes spec,

The previous working draft of 17 February 2006 was a Last-Call Working Draft which elicited numerous comments and suggestions for improvements. All substantive issues have now been resolved, although some editorial issues remain open. Changes since the previous public Working Draft include the following

  • Support has been added for assertions on simple type definitions, analogous to those allowed by [XSD 1.1 Part 1: Structures] for complex type definitions.
  • The requirements of conformance have been clarified in various ways.
    A distinction is now made between ·implementation-defined· and ·implementation-dependent· features, and a list of such features is provided in Implementation-defined and implementation-dependent features (normative) (§H).
    Requirements imposed on host languages which use or incorporate the datatypes defined by this specification are defined.
    The definitions of must, must not, and ·error· have been changed to specify that processors must detect and report errors in schemas and schema documents (although the quality and level of detail in the error report is not constrained).
  • Conforming implementations may now support ·primitive· datatypes and facets in addition to those defined here.
  • A number of syntactic and semantic errors in some of the regular expressions given to describe the lexical spaces of the ·primitive· datatypes have been corrected.
    The character sequence '+INF' has been added to the lexical spaces of float and double.
  • Parts of the grammar of regular expressions given in Regular Expressions (§G) has been revised for clarity. The set of legal regular expressions has not been changed since the previous Working Draft.
  • The lexical mapping of the QName datatype, in particular its dependence on the namespace bindings in scope at the place where the ·literal· appears, has been clarified.
  • The characterization of ·lexical mappings· has been revised to say more clearly when they are functions and when they are not, and when (in the ·special· datatypes) there are values in the ·value space· not mapped to by any members of the ·lexical space·.
  • The nature of equality and identity of lists has been clarified.
  • Enumerations, identity constraints, and value constraints now use equality-based comparisons, not identity-based comparisons, in cases where there is a difference between identity and equality.
  • The mutual relations of lists and unions have been clarified, in particular the restrictions on what kinds of datatypes may appear as the ·item type· of a list or among the ·member types· of a union.
  • Unions with no member types (and thus with empty ·value space· and ·lexical space·) are now explicitly allowed.
  • Cycles in the definitions of ·unions· and in the derivation of simple types are now explicitly forbidden.
  • Errors have been corrected in the description of the canonical mapping of decimal.
  • A variety of clarifications have been introduced connected with the terms "finite-length", "actual value", "type", "datatype", "·special value·", and others.
  • A number of minor errors and obscurities have been fixed.

Comments are due by September 12.

Saturday, July 5, 2008 (Permalink)

The Mozilla Project has released Firefox 2.0.0.15. This release fixes security issues. All 2.x users should upgrade. (That includes me: Firefox 3 broke some of the AppleScript I depend on to manage this site.)

A new version of SeaMonkey has also been posted, though Camino doesn't seem to have been updated yet. Camino users may want to switch to Firefox or Safari for the time being.

Friday, July 4, 2008 (Permalink)

Michael Kay has released version 9.1 of Saxon, his XSLT 2.0 and XQuery processor for Java and .NET. According to Kay,

For the XSLT user the most interesting developments are probably in the area of streaming, allowing large documents to be processed without constructing a complete tree in memory: see

http://www.saxonica.com/documentation/sourcedocs/serial.html

The saxon:stream() extension function is essentially a repackaging of the existing <xsl:copy-of saxon:read-once> instruction, but it becomes a lot more versatile with the new syntax; in addition, a wider class of XPath expressions can now be streamed. Apart from this syntactic change, there are two other significant enhancements:

  • the saxon:iterate extension instruction allows "stateful" streamed processing where the processing of an element in the document can depend on data that was seen earlier in the stream. This was not previously possible.

  • operations that only need to see data near the start of the document will cause the XML parsing to terminate as soon as the required data is available. So you can get the title of a document without parsing the whole document.
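The early-termination idea in the second bullet is not specific to Saxon. As a loose analogy only, and not Saxon's API, here is a minimal Python sketch using the standard library's incremental parser; the `first_title` helper and the synthetic test document are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET


def first_title(stream):
    """Return the text of the first <title> element, then stop reading."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "title":
            return elem.text  # returning here abandons the rest of the input
    return None


# A synthetic document: the title up front, followed by a lot of filler.
data = b"<doc><title>Streaming</title>" + b"<p>filler</p>" * 200000 + b"</doc>"
stream = io.BytesIO(data)

print(first_title(stream))
print(stream.tell() < len(data))  # True when only part of the input was read
```

The parser pulls input in chunks, so it reads somewhat past the title, but nowhere near the whole document.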

Saxon is published in two versions, both of which require Java 1.4 or later (or .NET). Saxon 9.1B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 9.1 SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standing XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers."

Wednesday, July 2, 2008 (Permalink)

SyncroSoft has released <Oxygen/> 9.3, a $345 payware XML editor written in Java. Oxygen supports XML, XSL, DTDs, XQuery, SVG, Relax NG, Schematron, and the W3C XML Schema Language. According to the announcement, 9.3 adds support for OOXML, ODF, and other ZIP-wrapped XML packages.

Tuesday, July 1, 2008 (Permalink)

The W3C POWDER Working Group has published a new working draft of Protocol for Web Description Resources (POWDER): Description Resources.

The Protocol for Web Description Resources (POWDER) facilitates the publication of descriptions of multiple resources such as all those available from a Web site. These descriptions are always attributed to a named individual, organization or entity that may or may not be the creator of the described resources. This contrasts with more usual metadata that typically applies to a single resource, such as a specific document's title, which is usually provided by its author.

This document sets out how Description Resources (DRs) can be created and published, how to link to DRs from other online resources, and, crucially, how DRs may be authenticated and trusted. The aim is to provide a platform through which opinions, claims and assertions about online resources can be expressed by people and exchanged by machines. POWDER has evolved from the data model developed for the final report [XGR] of the Web Content Label Incubator Group [WCL-XG] from which we define a Description Resource as: "a resource that contains a description, a definition of the scope of the description and assertions about both the circumstances of its own creation and the entity that created it."

Monday, June 30, 2008 (Permalink)

Microsoft has released the Office Open XML File Format Converter for Microsoft Office 2004 for the Mac. This updater enables Mac Office 2004 to open and save files in the OOXML format supported by Microsoft Office 2007 for Windows and Microsoft Office 2008 for Mac.

Personally I uninstalled Microsoft Office 2004 from my Mac about a month ago due to massive instability and hangs, and haven't missed it. I tried updating to the latest point release first, but there are about a dozen different updaters that have to be downloaded and applied in a specific order, and who has time for that? Instead, I've been using Google Docs for the limited number of Word docs I need to read. I suppose if I were writing another book, I might have to reinstall Word, but short of that I just don't see the need.

Friday, June 27, 2008 (Permalink)

The W3C Semantic Web Best Practices and Deployment Working Group and HTML Working Group have published a new working draft of RDFa Primer 1.0 and a candidate recommendation of RDFa in XHTML: Syntax and Processing.

The modern Web is made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience: an event on a web page can be directly imported into a user's desktop calendar; a license on a document can be detected so that users can be informed of their rights automatically; a photo's creator, camera setting information, resolution, location and topic can be published as easily as the original photo itself, enabling structured search and sharing.

RDFa is a specification for attributes to be used with languages such as HTML and XHTML to express structured data. The rendered, hypertext data of XHTML is reused by the RDFa markup, so that publishers don't need to repeat significant data in the document content. This document only specifies the use of the RDFa attributes with XHTML. The underlying abstract representation is RDF [RDF-PRIMER], which lets publishers build their own vocabulary, extend others, and evolve their vocabulary with maximal interoperability over time. The expressed structure is closely tied to the data, so that rendered data can be copied and pasted along with its relevant structure.

The rules for interpreting the data are generic, so that there is no need for different rules for different formats; this allows authors and publishers of data to define their own formats without having to update software, register formats via a central authority, or worry that two formats may interfere with each other.

RDFa shares some use cases with microformats [MICROFORMATS]. Whereas microformats specify both a syntax for embedding structured data into HTML documents and a vocabulary of specific terms for each microformat, RDFa specifies only a syntax and relies on independent specification of terms (often called vocabularies or taxonomies) by others. RDFa allows terms from multiple independently-developed vocabularies to be freely intermixed and is designed such that the language can be parsed without knowledge of the specific term vocabulary being used.

This document is a detailed syntax specification for RDFa, aimed at:

  • those looking to create an RDFa parser, and who therefore need a detailed description of the parsing rules;
  • those looking to recommend the use of RDFa within their organisation, and who would like to create some guidelines for their users;
  • anyone familiar with RDF, and who wants to understand more about what is happening 'under the hood', when an RDFa parser runs.

For those looking for an introduction to the use of RDFa and some real-world examples, please consult the RDFa Primer.

Here's a syntax example from the primer draft:

   <div about="/posts/trouble_with_bob">
      <h2 property="dc:title">The trouble with Bob</h2>
      
      The trouble with Bob is that he takes much better photos than I do:
	
      <div about="http://example.com/bob/photos/sunset.jpg">
        <img src="http://example.com/bob/photos/sunset.jpg" />
        <span property="dc:title">Beautiful Sunset</span>

        by <span property="dc:creator">Bob</span>.
      </div>
   </div>

The thing that jumps out at me is the use of namespace prefixes in attribute values. Haven't we learned by now that this is a bad idea?
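One way to see the problem: a namespace-aware XML parser resolves prefixes in element and attribute names, but an attribute value is just a string. In this minimal Python sketch, the two documents mean the same thing yet carry different attribute values. The `<dc:title>` element and the `names_and_values` helper are invented for the demonstration; they are not RDFa markup from the primer.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Same namespace, two different prefixes. The element name is hypothetical.
doc_a = f'<div xmlns:dc="{DC}"><dc:title property="dc:title">Sunset</dc:title></div>'
doc_b = f'<div xmlns:x="{DC}"><x:title property="x:title">Sunset</x:title></div>'


def names_and_values(xml_text):
    """Return (expanded element name, raw property attribute) of the first child."""
    child = ET.fromstring(xml_text)[0]
    return child.tag, child.get("property")


print(names_and_values(doc_a))
print(names_and_values(doc_b))
```

The element names come back identical because the parser expanded them, while the `property` values still carry whatever prefix the author happened to pick. Any tool that reads them has to track the in-scope namespace bindings itself.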

I'm actually designing a significant metadata system at my day job at the moment, and for the life of me I can't figure out why we should use RDF in any shape or form. It doesn't offer clients any useful tools, and just makes the data more opaque. Most of the interesting meta-things we want to say will have to be hand-coded anyway because there are no standards for them. I think we're going to go with a hand-rolled XML syntax as the simplest thing that could possibly work. If anyone asks for RDF, we can always publish a GRDDL or XSLT transform; but RDF just seems pointless.

Thursday, June 26, 2008 (Permalink)

Adobe has released Acrobat 9. You can now embed movies in Acrobat documents. I wonder if this version will include an actually working Firefox plugin and a Reader that doesn't crash on the second page of every document? Hmm, apparently the answer is no. The reader is still the unusably buggy 8.1.2.


The W3C Web Application Formats Working Group has posted the last call working draft of Widgets 1.0 Requirements. "A widget is an interactive single purpose application for displaying and/or updating local data or data on the Web, packaged in a way to allow a single download and installation on a user's machine or mobile device. A widget may run as a stand alone application (meaning it can run outside of a Web browser), or may be embedded into a Web document. In this document, the runtime environment on which a widget is run is referred to as a widget user agent and a running widget is referred to as an instantiated widget. Prior to instantiation, a widget exists as a widget resource."

Thursday, June 19, 2008 (Permalink)

The W3C has published a note on A Prototype Knowledge Base for the Life Sciences. "The prototype we describe is a biomedical knowledge base, constructed for a demonstration at Banff WWW2007, that integrates 15 distinct data sources using currently available Semantic Web technologies such as the W3C standard Web Ontology Language [OWL] and Resource Description Framework [RDF]. This report outlines which resources were integrated, how the knowledge base was constructed using free and open source triple store technology, how it can be queried using the W3C Recommended RDF query language SPARQL [SPARQL], and what resources and inferences are involved in answering complex queries. While the utility of the knowledge base is illustrated by identifying a set of genes involved in Alzheimer's Disease, the approach described here can be applied to any use case that integrates data from multiple domains."

Wednesday, June 18, 2008 (Permalink)

Just in case you missed the sirens and flashing lights, Firefox 3 is now out. Download it. Use it. Love it. (Some older extensions may be incompatible. Downloader assumes all liability. Contents may be hot. Do not use while driving. Free ice cream offer void in Louisiana. Not responsible for typographical errors, or pretty much anything else.)

Tuesday, June 17, 2008 (Permalink)

The W3C has posted a new working draft of HTML 5. "This specification defines the 5th major revision of the core language of the World Wide Web: the Hypertext Markup Language (HTML). In this version, new features are introduced to help Web application authors, new elements are introduced based on research into prevailing authoring practices, and special attention has been given to defining clear conformance criteria for user agents in an effort to improve interoperability." There are also drafts of HTML 5 differences from HTML 4 and HTML 5 Publication Notes. The latter contains a convenient list of changes since the January 22 draft:

  • Implementation and authoring details around the ping attribute have changed.
  • <meta http-equiv=content-type> is now a conforming way to set the character encoding.
  • API for the canvas element has been cleaned up. Text support has been added.
  • globalStorage is now restricted to the same-origin policy and renamed to localStorage. Related event dispatching has been clarified.
  • postMessage() API changed. Only the origin of the message is exposed, no longer the URI. It also requires a second argument that indicates the origin of the target document.
  • Drag and drop API has got clarification. The dataTransfer object now has a types attribute indicating the type of data being transferred.
  • The m element is now called mark.
  • Server-sent events has changed and gotten clarification. It uses a new format so that older implementations are not broken.
  • The figure element no longer requires a caption.
  • The ol element has a new reversed attribute.
  • Character encoding detection has changed in response to feedback.
  • Various changes have been made to the HTML parser section in response to implementation feedback.
  • Various changes to the editing section have been made, including adding queryCommandEnabled() and related methods.
  • The headers attribute has been added for td elements.
  • The table element has a new createTBody() method.
  • MathML support has been added to the HTML parser section. (SVG support is still awaiting input from the SVG WG.)
  • Author defined attributes have been added. Authors can add attributes to elements in the form of data-name and can access these through the DOM using dataset[name] on the element in question.
  • The q element has changed to require punctuation inside rather than having the browser render it.
  • The target attribute can now have the value _blank.
  • The showModalDialog API has been added.
  • The document.domain API has been defined.
  • The source element now has a new pixelratio attribute useful for videos that have some kind of encoding error.
  • bufferedBytes, totalBytes and bufferingThrottled DOM attributes have been added to the video element.
  • Media begin event has been renamed to loadstart for consistency with the Progress Events specification.
  • charset attribute has been added to script.
  • The iframe element has gained the sandbox and seamless attributes which provide sandboxing functionality.
  • The ruby, rt and rp elements have been added to support ruby annotation.
  • A showNotification() method has been added to show notification messages to the user.
  • Support for beforeprint and afterprint events has been added.
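
The author-defined attributes item is the easiest of these to picture in code. What follows is only a rough sketch in Python, not browser code: it assumes, as the draft text above says, that a data-name attribute is exposed under the key name. The class DatasetCollector and the function collect_datasets are hypothetical names invented for this illustration; only html.parser is a real standard-library module.

```python
from html.parser import HTMLParser


class DatasetCollector(HTMLParser):
    """Collects data-* attributes per start tag, loosely mimicking dataset[name]."""

    PREFIX = "data-"

    def __init__(self):
        super().__init__()
        self.datasets = []  # list of (tag name, {name: value}) pairs

    def handle_starttag(self, tag, attrs):
        # Keep only attributes of the form data-name, keyed by the part after "data-".
        dataset = {
            name[len(self.PREFIX):]: value
            for name, value in attrs
            if name.startswith(self.PREFIX) and len(name) > len(self.PREFIX)
        }
        if dataset:
            self.datasets.append((tag, dataset))


def collect_datasets(markup):
    """Return the data-* attributes found on each element of the markup."""
    parser = DatasetCollector()
    parser.feed(markup)
    parser.close()
    return parser.datasets


if __name__ == "__main__":
    sample = '<li class="ship" data-crew="12" data-name="Nostromo">Nostromo</li>'
    print(collect_datasets(sample))
```

Ordinary attributes such as class are ignored; only the data- prefixed ones end up in the dictionary.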

Monday, June 16, 2008 (Permalink)

The W3C XHTML2 Working Group has published proposed recommendations of XHTML Modularization 1.1 and XHTML Basic 1.1. "The former provides a means for subsetting and extending XHTML, a feature needed for extending XHTML's reach onto emerging platforms. This specification is intended for use by language designers as they construct new XHTML Family Markup Languages. This second version of this specification includes several minor updates to provide clarifications and address errors found in the first version. It also provides an implementation using XML Schemas. This version of XHTML Basic, which uses the Modularization approach, has been brought into alignment with the widely deployed XHTML Mobile Profile from the Open Mobile Alliance (OMA). XHTML Basic 1.1 will thus make it easier to author Web pages that work on millions of mobile handsets. Comments on these specifications are welcome through 15 July."

Thursday, June 12, 2008 (Permalink)

Opera Software has released version 9.5 of their namesake free-beer web browser for Windows, Mac, Linux, FreeBSD, and Solaris. This release is supposed to be much faster than previous versions.

I wonder if they've deuglified it yet? Hmm, looks like they tried, but it didn't quite take. They may have hired a real artist to draw the buttons and the icons for the first time, because those are looking good in isolation. However, I'd guess they didn't hire a professional user interface designer to put them all together. The fonts are still wrong (the ones in the UI widgets, that is, not the ones in the web page) and the alignment of various components is way off. This is a frequent problem with cross-platform apps, but Firefox has done a good job with this for years now, so it's certainly possible to get this right. There seem to be multiple other user interface glitches, like a close button (white X on a red background in a widgets pane) that doesn't seem to actually close anything.

Opera may be the fastest browser on the planet, but if it is, I'll never know because it's just too damned ugly to look at for any length of time. Opera should split out the core rendering engine (which isn't bad) from the UI, so someone else can wrap some decent chrome around it. Right now, Opera is like putting a 3900 HF VVT engine in an AMC Pacer body and slapping a fresh coat of paint over it.

Wednesday, June 11, 2008 (Permalink)

The OpenOffice Project has released OpenOffice 2.4.1, an open source office suite for Linux, Solaris, and Windows that saves all its files as zipped XML. 2.4.1 is a bug fix release.

I deleted Microsoft Office from my MacBook a couple of weeks ago, because of severe bugs (startup took forever). I haven't had any trouble living without it, but then I'm no longer a full-time writer. For now I'm just using Google Docs as my replacement. OpenOffice is still too ugly to tolerate.

Tuesday, June 10, 2008 (Permalink)

The W3C XML Security Specifications Maintenance Working Group has published XML Signature Syntax and Processing (Second Edition). "This Second Edition of XML Signature Syntax and Processing adds Canonical XML 1.1 as a required canonicalization algorithm and recommends its use for inclusive canonicalization. This version of Canonical XML enables use of xml:id and xml:base Recommendations with XML Signature and also enables other possible future attributes in the XML namespace. Additional minor changes, including the incorporation of known errata, are documented in Changes in XML Signature Syntax and Processing (Second Edition)."

Monday, June 9, 2008 (Permalink)

Friday is the last day to submit late-breaking news for Balisage this August in Montreal. "Balisage is a peer-reviewed conference designed to meet the needs of markup theoreticians and practitioners who are pushing the boundaries of the field. It's all about the markup: how to create it; what it means; hierarchies and overlap; modeling; taxonomies; transformation; query, searching, and retrieval; presentation and accessibility; making systems that make markup dance (or dance faster to a different tune in a smaller space) — in short, changing the world and the web through the power of marked-up information. It's an XML Conference. It's an XSL Conference. It's a conference about XSD, XQuery, RDF, UBL, SGML, LMNL, XSL-FO, XTM, SVG, MathML, OWL, TexMECS, RNG, and a lot more. We welcome papers about topic maps, document modeling, markup of overlapping structures, ontologies, metadata, content management, and other markup-related topics at Balisage."

Friday, June 6, 2008 (Permalink)

The Mozilla Project has posted the second release candidate of Firefox 3.0 for Mac, Linux, and Windows. Firefox 3 is based on the much improved Gecko 1.9 Web rendering platform. Mostly this release focuses on small user interface improvements, tightened security, and improved performance and under-the-hood architecture, rather than big new features. Still, there are a few new features including:

Thursday, June 5, 2008 (Permalink)

The W3C HTML Working Group has published a note on Offline Web Applications. "HTML 5 contains several features that address the challenge of building Web applications that work while offline. This document highlights these features (SQL, offline application caching APIs as well as online/offline events, status, and the localStorage API) from HTML 5 and provides brief tutorials on how these features might be used to create Web applications that work offline."

Wednesday, June 4, 2008 (Permalink)

The XML Apache Project has posted version 0.95 of FOP, an open source XSL Formatting Objects to PDF/PostScript/RTF converter written in Java. This release fixes bugs, improves table support, and removes the need for some additional libraries. Java 1.4 or later is required.

Tuesday, June 3, 2008 (Permalink)

The W3C Web Accessibility Initiative has published a working draft of Web Accessibility for Older Users: A Literature Review:

There has been extensive development and adoption of the WAI guidelines for Web accessibility for people with disabilities. However, while these guidelines address many of the requirements needed by the ageing population, the relevance of the WAI guidelines to the needs of older people with functional disabilities caused by ageing does not seem to be well understood.

This review examines the literature relating to the use of the Web by older people to primarily look for intersections and differences between the WAI guidelines and recommendations for web design and development issues that will improve the accessibility and usability for older people. It is intended that the review will:

  • better inform the ongoing work of W3C/WAI with regard to the needs of older computer users and their web accessibility related needs
  • inform the development of potential extensions on WAI guidelines and techniques and/or provide direct input into future versions of WAI guidelines
  • lead to the development of educational resources focussed towards industry implementers, and organisations representing and serving ageing communities
  • help foster dialog between ageing communities, disability communities, industry, and other interested parties around issues of web accessibility
  • inform the contributions that W3C makes into the standards development processes in Europe and internationally.

Monday, June 2, 2008 (Permalink)

Michael Kay has released version 9.0.0.6 of Saxon, his XSLT 2.0 and XQuery processor for Java and .NET. This is a bug fix release.

Saxon is published in two versions, both of which require Java 1.4 or later (or .NET). Saxon 9.0B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 9.0 SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standing XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers."

Friday, May 30, 2008 (Permalink)

The W3C has posted a working draft of State Chart XML (SCXML): State Machine Notation for Control Abstraction:

a general-purpose event-based state machine language that can be used in many ways, including:

  • As a high-level dialog language controlling VoiceXML 3.0's encapsulated speech modules (voice form, voice picklist, etc.)
  • As a voice application metalanguage, where in addition to VoiceXML 3.0 functionality, it may also control database access and business logic modules.
  • As a multimodal control language in the MultiModal Interaction framework [W3C MMI], combining VoiceXML 3.0 dialogs with dialogs in other modalities including keyboard and mouse, ink, vision, haptics, etc. It may also control combined modalities such as lipreading (combined speech recognition and vision) speech input with keyboard as fallback, and multiple keyboards for multi-user editing.
  • As the state machine framework for a future version of CCXML.
  • As an extended call center management language, combining CCXML call control functionality with computer-telephony integration for call centers that integrate telephone calls with computer screen pops, as well as other types of message exchange such as chats, instant messaging, etc.
  • As a general process control language in other contexts not involving speech processing.

SCXML combines concepts from CCXML and Harel State Tables. CCXML [W3C CCXML 1.0] is an event-based state machine language designed to support call control features in Voice Applications (specifically including VoiceXML but not limited to it). The CCXML 1.0 specification defines both a state machine and event handling syntax and a standardized set of call control elements. Harel State Tables are a state machine notation that was developed by the mathematician David Harel [Harel and Politi] and is included in UML [UML 2.0]. They offer a clean and well-thought-out semantics for sophisticated constructs such as parallel states. They have been defined as a graphical specification language, however, and hence do not have an XML representation. The goal of this document is to combine Harel semantics with an XML syntax that is a logical extension of CCXML's state and event notation.
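
For readers who haven't met event-based state machines before, here is a minimal sketch of the general idea in Python. It is not SCXML and models none of the Harel extras such as parallel or nested states. The class name StateMachine and the state and event names (idle, ringing, call.incoming, and so on) are all made up for illustration.

```python
class StateMachine:
    """A flat, event-driven state machine: (state, event) -> next state."""

    def __init__(self, initial, transitions):
        self.state = initial
        self.transitions = transitions  # dict keyed by (state, event)

    def send(self, event):
        """Deliver an event; events with no matching transition are ignored."""
        next_state = self.transitions.get((self.state, event))
        if next_state is not None:
            self.state = next_state
        return self.state


# Hypothetical call-control flow, loosely in the spirit of the CCXML use case.
CALL_TRANSITIONS = {
    ("idle", "call.incoming"): "ringing",
    ("ringing", "call.answered"): "connected",
    ("ringing", "call.hangup"): "idle",
    ("connected", "call.hangup"): "idle",
}

if __name__ == "__main__":
    machine = StateMachine("idle", CALL_TRANSITIONS)
    for event in ["call.incoming", "call.answered", "call.hangup"]:
        print(event, "->", machine.send(event))
```

A transition table like this is exactly the sort of thing that is natural to write down as XML, which is presumably the appeal of the spec.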

Wednesday, May 28, 2008 (Permalink)

I've begun serializing the first chapter of Refactoring HTML on The Cafes. The first two sections are posted now:

More are coming tomorrow and Friday.

Tuesday, May 27, 2008

XMLMind has released Qizx/db 2.1, a $3200 closed source, embeddable native XML database engine written in Java that supports XQuery 1.0. Version 2.1 adds support for XQuery Update. The query interpreter part is available under an open source license.

Monday, May 26, 2008 (Permalink)

The W3C XHTML 2 working group has posted the last call working draft of XHTML Access Module: Module to enable generic document accessibility. This module defines access, an empty element that can carry activate, key, targetid, and targetrole attributes.

  • The activate attribute indicates whether a target element should be activated or not once it obtains focus.
  • The key attribute assigns a key mapping to an access shortcut. Triggering an access key defined in an access element changes focus to the next element in navigation order from the current focus that has one of the referenced role or id values.
  • The targetid attribute specifies one or more IDREFs related to target elements for the associated event.
  • The targetrole attribute specifies a space separated list of CURIEs that maps to an element with a role attribute with the same value.

Sunday, May 25, 2008 (Permalink)

The W3C CSS Working Group has posted the Candidate Recommendation of CSS Namespaces Module. This module "defines the syntax for using namespaces in CSS. It defines the @namespace rule for declaring the default namespace and binding namespaces to namespace prefixes, and it also defines a syntax that other specifications can adopt for using those prefixes in namespace-qualified names."

Given the namespace declarations:

@namespace toto "http://toto.example.org";
@namespace "http://example.com/foo";

In a context where the default namespace applies:

  • toto|A represents the name A in the http://toto.example.org namespace.
  • |B represents the name B that belongs to no namespace.
  • *|C represents the name C in any namespace, including no namespace.
  • D represents the name D in the http://example.com/foo namespace.
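
The four cases above amount to a small lookup procedure. Here is a sketch of it in Python, offered as an illustration rather than anything from the spec: the function name resolve_qname and the two sentinel constants are my own inventions, and the sketch only covers contexts where the default namespace applies.

```python
ANY_NAMESPACE = "*"  # sentinel: any namespace, including no namespace
NO_NAMESPACE = ""    # sentinel: the name belongs to no namespace


def resolve_qname(qname, prefixes, default_ns):
    """Return (namespace, local_name) for a CSS qualified name.

    prefixes maps declared prefixes to namespace URIs; default_ns is the
    namespace declared by an @namespace rule that has no prefix.
    """
    if "|" not in qname:
        # Unprefixed name: the default namespace applies.
        return (default_ns, qname)
    prefix, local_name = qname.split("|", 1)
    if prefix == "":
        return (NO_NAMESPACE, local_name)
    if prefix == "*":
        return (ANY_NAMESPACE, local_name)
    if prefix not in prefixes:
        raise ValueError("undeclared namespace prefix: " + prefix)
    return (prefixes[prefix], local_name)


if __name__ == "__main__":
    declared = {"toto": "http://toto.example.org"}
    default = "http://example.com/foo"
    for name in ["toto|A", "|B", "*|C", "D"]:
        print(name, "->", resolve_qname(name, declared, default))
```

Raising an error for an undeclared prefix is a choice made for this sketch; a real CSS parser would instead treat the rule containing it as invalid.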

Saturday, May 24, 2008 (Permalink)

Edwin Dankert has released XML Hammer 1.0, a GUI program written in Java and based on JAXP 1.3 for checking well-formedness, validating, transforming, and querying XML documents. XML Hammer is published under the Mozilla Public License 1.1.

Friday, May 23, 2008 (Permalink)

The W3C Web API Working Group has posted the third public working draft of Progress Events 1.0. This "defines events which can be used to monitor a process and provide feedback to a user, particularly for network-based events." Here's the IDL:

interface ProgressEvent : events::Event {
     readonly attribute boolean         lengthComputable;
     readonly attribute unsigned long   loaded;
     readonly attribute unsigned long   total;
     void               initProgressEvent(in DOMString typeArg,
                                          in boolean       canBubbleArg,
                                          in boolean       cancelableArg,
                                          in boolean       lengthComputableArg,
                                          in unsigned long loadedArg,
                                          in unsigned long totalArg);
     void               initProgressEventNS(in DOMString namespaceURI,
                                            in DOMString typeArg,
                                            in boolean       canBubbleArg,
                                            in boolean       cancelableArg,
                                            in boolean       lengthComputableArg,
                                            in unsigned long loadedArg,
                                            in unsigned long totalArg);
};
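
The three read-only attributes are what a progress bar would actually consume. As a rough sketch in Python (not the DOM binding itself), this is how a listener might turn them into a percentage. The dataclass mirrors only the attribute names from the IDL; percent_complete is a hypothetical helper, and treating a total of zero as "unknown" is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgressEvent:
    """Python stand-in for the three read-only attributes in the IDL."""
    length_computable: bool
    loaded: int
    total: int


def percent_complete(event: ProgressEvent) -> Optional[float]:
    """Return progress as a percentage, or None when it cannot be computed."""
    if not event.length_computable or event.total == 0:
        return None  # caller should show an indeterminate indicator instead
    return 100.0 * event.loaded / event.total


if __name__ == "__main__":
    print(percent_complete(ProgressEvent(True, 50, 200)))   # 25.0
    print(percent_complete(ProgressEvent(False, 50, 0)))    # None
```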

Wednesday, May 21, 2008 (Permalink)

The W3C CSS Working Group has posted the last call working draft of Cascading Style Sheets (CSS) Snapshot 2007:

When the first CSS specification was published, all of CSS was contained in one document that defined CSS Level 1. CSS Level 2 was defined also by a single, multi-chapter document. However for CSS beyond Level 2, the CSS Working Group chose to adopt a modular approach, where each module defines a part of CSS, rather than to define a single monolithic specification. This breaks the specification into more manageable chunks and allows more immediate, incremental improvement to CSS.

Since different CSS modules are at different levels of stability, the CSS Working Group has chosen to publish this profile to define the current scope and state of Cascading Style Sheets as of late 2007. This profile includes only specifications that we consider stable and for which we have enough implementation experience that we are sure of that stability.

Note that this is not intended to be a CSS Desktop Browser Profile: inclusion in this profile is based on feature stability only and not on expected use or Web browser adoption. This profile defines CSS in its most complete form.

Note also that although we don't anticipate significant changes to the specifications that form this snapshot, their inclusion does not mean they are frozen. The Working Group will continue to address problems as they are found in these specs. Implementers should monitor www-style and/or the CSS Working Group Blog for any resulting changes, corrections, or clarifications.

There actually isn't that much that's ready; mostly CSS Level 2 plus CSS Namespaces, Selectors Level 3 and CSS Color Level 3.

Tuesday, May 20, 2008 (Permalink)

The Mozilla Project has posted the first release candidate of Firefox 3.0 for Mac, Linux, and Windows. Firefox 3 is based on the much improved Gecko 1.9 Web rendering platform. Mostly this release focuses on small user interface improvements, tightened security, and improved performance and under-the-hood architecture, rather than big new features. Still, there are a few new features including:

  • Add-ons manager
  • Save tabs on exit
  • Easier password management
  • New Download Manager
  • Resumable downloading
  • Full page zoom

I haven't tried Firefox 3 yet myself, but initial reviews are very positive.

Monday, May 19, 2008 (Permalink)

The W3C XQuery working group has posted the candidate recommendation of XQuery and XPath Full Text 1.0:

XML documents may contain highly structured data (fixed schemas, known types such as numbers, dates), semi-structured data (flexible schemas and types), markup data (text with embedded tags), and unstructured data (untagged free-flowing text). Where a document contains unstructured or semi-structured data, it is important to be able to search using Information Retrieval techniques such as scoring and weighting.

Full-text search is different from substring search in many ways:

  1. A full-text search searches for tokens and phrases rather than substrings. A substring search for news items that contain the string "lease" will return a news item that contains "Foobar Corporation releases the 20.9 version ...". A full-text search for the token "lease" will not.

  2. There is an expectation that a full-text search will support language-based searches which substring search cannot. An example of a language-based search is "find me all the news items that contain a token with the same linguistic stem as 'mouse'" (finds "mouse" and "mice"). Another example based on token proximity is "find me all the news items that contain the tokens 'XML' and 'Query' allowing up to 3 intervening tokens".

  3. Full-text search must address the vagaries and nuances of language. Search results are often of varying usefulness. When you search a web site for cameras that cost less than $100, this is an exact search. There is a set of cameras that matches this search, and a set that does not. Similarly, when you do a string search across news items for "mouse", there is only 1 expected result set. When you do a full-text search for all the news items that contain the token "mouse", you probably expect to find news items containing the token "mice", and possibly "rodents", or possibly "computers". Not all results are equal. Some results are more "mousey" than others. Because full-text search may be inexact, we have the notion of score or relevance. We generally expect to see the most relevant results at the top of the results list.
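
The first two differences are easy to demonstrate. The Python sketch below is not XQuery Full Text syntax and makes no attempt at stemming or scoring. It assumes a deliberately crude tokenizer (lowercased runs of letters and digits), and every function name in it is hypothetical.

```python
import re


def tokenize(text):
    """Crude tokenizer: lowercased runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def substring_match(text, needle):
    """Substring search: matches anywhere, even inside a longer word."""
    return needle.lower() in text.lower()


def token_match(text, token):
    """Token search: matches only whole tokens."""
    return token.lower() in tokenize(text)


def within_window(text, first, second, max_intervening):
    """True if both tokens occur with at most max_intervening tokens between them."""
    tokens = tokenize(text)
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(
        a != b and abs(a - b) - 1 <= max_intervening
        for a in first_positions
        for b in second_positions
    )


if __name__ == "__main__":
    headline = "Foobar Corporation releases the 20.9 version"
    print(substring_match(headline, "lease"))  # True: "lease" is inside "releases"
    print(token_match(headline, "lease"))      # False: no token equals "lease"
    print(within_window("XML and XPath Query", "XML", "Query", 3))  # True
```

The third difference, relevance, is the hard part, and it is exactly what a sketch this small leaves out.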

Note:

As XQuery and XPath evolve, they may apply the notion of score to querying structured data. For example, when making travel plans or shopping for cameras, it is sometimes useful to get an ordered list of near matches in addition to exact matches. If XQuery and XPath define a generalized inexact match, we expect XQuery and XPath to utilize the scoring framework provided by XQuery and XPath Full Text.

[Definition: Full-text queries are performed on tokens and phrases. Tokens and phrases are produced via tokenization.] Informally, tokenization breaks a character string into a sequence of tokens, units of punctuation, and spaces.

Tokenization, in general terms, is the process of converting a text string into smaller units that are used in query processing. Those units, called tokens, are the most basic text units that a full-text search can refer to. Full-text operators typically work on sequences of tokens found in the target text of a search. These tokens are characterized by integers that capture the relative position(s) of the token inside the string, the relative position(s) of the sentence containing the token, and the relative position(s) of the paragraph containing the token. The positions typically comprise a start and an end position.
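
To make the position bookkeeping concrete, here is one possible tokenizer, sketched in Python. Since the spec leaves tokenization implementation-defined, every rule here is an assumption: paragraphs are split on blank lines, sentences end at ., ! or ?, and each token gets a single ordinal position rather than a start and end pair. The names Token and tokenize_with_positions are hypothetical.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str
    position: int   # ordinal of the token within the whole string
    sentence: int   # ordinal of the containing sentence
    paragraph: int  # ordinal of the containing paragraph


def tokenize_with_positions(text: str) -> List[Token]:
    """Split text into tokens tagged with token, sentence, and paragraph numbers."""
    tokens = []
    position = 0
    sentence = 0
    paragraphs = re.split(r"\n\s*\n", text.strip())
    for paragraph_number, paragraph in enumerate(paragraphs, start=1):
        # Assumed rule: a sentence ends at ., ! or ? followed by whitespace.
        for chunk in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            words = re.findall(r"\w+", chunk)
            if not words:
                continue
            sentence += 1
            for word in words:
                position += 1
                tokens.append(Token(word.lower(), position, sentence, paragraph_number))
    return tokens


if __name__ == "__main__":
    sample = "XML is text. Query it.\n\nMice squeak."
    for token in tokenize_with_positions(sample):
        print(token)
```

With positions like these in hand, "same sentence" and "same paragraph" tests reduce to comparing two integers.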

Tokenization, including the definition of the term "tokens", SHOULD be implementation-defined. Implementations SHOULD expose the rules and sample results of tokenization as much as possible to enable users to predict and interpret the results of tokenization. Tokenization is defined more formally in 4.1 Tokenization.

[Definition: A token is a non-empty sequence of characters returned by a tokenizer as a basic unit to be searched. Beyond that, tokens are implementation-defined.] [Definition: A phrase is an ordered sequence of any number of tokens. Beyond that, phrases are implementation-defined.]

Note:

Consecutive tokens need not be separated by either punctuation or space, and tokens may overlap.

Note:

In some natural languages, tokens and words can be used interchangeably.

[Definition: A sentence is an ordered sequence of any number of tokens. Beyond that, sentences are implementation-defined. A tokenizer is not required to support sentences.]

[Definition: A paragraph is an ordered sequence of any number of tokens. Beyond that, paragraphs are implementation-defined. A tokenizer is not required to support paragraphs.]

Some XML elements represent semantic markup, e.g., <title>. Others represent formatting markup, e.g., <b> to indicate bold. Semantic markup serves well as token boundaries. Some formatting markup serves well as token boundaries, for example, paragraphs are most commonly delimited by formatting markup. Other formatting markup may not serve well as token boundaries. Implementations are free to provide implementation-defined ways to differentiate between the markup's effect on token boundaries during tokenization. In the absence of an implementation-defined way to differentiate, element markup (start tags, end tags, and empty-element tags) creates token boundaries.

A sample tokenization is used for the examples in this document. The results might be different for other tokenizations.

Tokenization enables functions and operators that operate on a part or the root of the token (e.g., wildcards, stemming).

Tokenization enables functions and operators which work with the relative positions of tokens (e.g., proximity operators).

This specification focuses on functionality that serves all languages. It also selectively includes functionalities useful within specific families of languages. For example, searching within sentences and paragraphs is useful to many western languages and to some non-western languages, so that functionality is incorporated into this specification.

Friday, May 16, 2008 (Permalink)

SyncroSoft has released <Oxygen/> 9.2, a $345 payware XML editor written in Java. Oxygen supports XML, XSL, DTDs, XQuery, SVG, Relax NG, Schematron, and the W3C XML Schema Language. According to the announcement, 9.2 adds support for the Intel XSLT engine and Saxon 9.0.0.4.

Thursday, May 15, 2008 (Permalink)

The Web Accessibility Initiative has published Web Accessibility for Older Users: A Literature Review. "This document is intended to provide an overview of currently available literature about the needs of older adults with functional impairments accessing the web. It will compare how well these requirements are addressed and communicated by the WAI guidelines. This early version is intended to elicit comment and feedback on the literature collected and discussed so far. In particular we are interested in whether there are gaps in our coverage, or key resources overlooked. It should be noted that this is a work-in-progress and that not all sections are yet complete."

Wednesday, May 14, 2008 (Permalink)

I am pleased to announce that my latest book, Refactoring HTML, has been released by Addison-Wesley. This book endeavors to improve the design of existing web sites along multiple axes: maintainability, security, attractiveness, and performance. It does this by moving sites to web standards: XHTML, CSS, and REST.

Rather than approaching this as a big bang project, small changes can be made in small steps that offer linear improvement. You don't need to spend months of developer time and thousands of dollars before you see any payback. You can improve your site some today, and then some more tomorrow. Refactoring a web site doesn't require large blocks of uninterrupted development time. Add up enough small changes in the little pieces of time scattered throughout the workday, and before you know it, your site is dramatically improved.

Not convinced yet? Let me offer a brief excerpt from Chapter 1:

Refactoring. What is it? Why do it?

In brief, refactoring is the gradual improvement of a code base by making small changes that don’t modify a program’s behavior, usually with the help of some kind of automated tool. The goal of refactoring is to remove the accumulated cruft of years of legacy code and produce cleaner code that is easier to maintain, easier to debug, and easier to add new features to.

Technically, refactoring never actually fixes a bug or adds a feature. However, in practice, when refactoring I almost always uncover bugs that need to be fixed and spot opportunities for new features. Often, refactoring changes difficult problems into tractable and even easy ones. Reorganizing code is the first step in improving it.

The concept of refactoring originally came from the object-oriented programming community, and dates back at least as far as 1990 (William F. Opdyke and Ralph E. Johnson, “Refactoring: An Aid in Designing Application Frameworks and Evolving Object-Oriented Systems,” Proceedings of the Symposium on Object-Oriented Programming Emphasizing Practical Applications [SOOPPA], September 1990, ACM), though likely it was in at least limited use before then. However, the term was popularized by Martin Fowler in 1999 in his book Refactoring (Addison-Wesley, 1999). Since then, numerous IDEs and other tools such as Eclipse, IntelliJ IDEA, and C# Refactory have implemented many of his catalogs of refactorings for languages such as Java and C#, as well as inventing many new ones.

However, it’s not just object-oriented code and object-oriented languages that develop cruft and need to be refactored. In fact, it’s not just programming languages at all. Almost any sufficiently complex system that is developed and maintained over time can benefit from refactoring. The reason is twofold.

  1. Increased knowledge of both the system and the problem domain often reveals details that weren’t apparent to the initial designers. No one ever gets everything right in the first release. You have to see a system in production for a while before some of the problems become apparent.

  2. Over time, functionality increases and new code is written to support this functionality. Even if the original system solved its problem perfectly, the new code written to support new features doesn’t mesh perfectly with the old code. Eventually, you reach a point where the old code base simply cannot support the weight of all the new features you want to add.

When you find yourself with a system that is no longer able to support further developments, you have two choices: You can throw it out and build a new system from scratch, or you can shore up the foundations. In practice, we rarely have the time or budget to create a completely new system just to replace something that already works. It is much more cost-effective to add the struts and supports that the existing system needs before further work. If we can slip these supports in gradually, one at a time, rather than as a big-bang integration, so much the better.

Many sufficiently complex systems with large chunks of code are not object-oriented languages and perhaps are not even programming languages at all. For instance, Scott Ambler and Pramod Sadalage demonstrated how to refactor the SQL databases that support many large applications in Refactoring Databases (Addison-Wesley, 2006). However, while the back end of a large networked application is often a relational database, the front end is a web site. Thin client GUIs delivered in Firefox or Internet Explorer are everywhere, replacing thick client GUIs for all sorts of business applications, such as payroll and lead tracking. Adventurous users at companies such as Sun and Google are going even further and replacing classic desktop applications like word processors and spreadsheets with web apps built out of HTML, CSS, and JavaScript. Finally, the Web and the ubiquity of the web browser have enabled completely new kinds of applications that never existed before, such as eBay, Netflix, PayPal, Google Reader, and Google Maps.

HTML made these applications possible, and it made them faster to develop, but it didn’t make them easy. It didn’t make them simple. It certainly didn’t make them less fundamentally complex. Some of these systems are now on their second, third, or fourth generation; and wouldn’t you know it? Just like any other sufficiently complex, sufficiently long-lived application, these web apps are developing cruft. The new pieces aren’t merging perfectly with the old pieces. Systems are slowing down because the whole structure is just too ungainly. Security is being breached when hackers slip in through the cracks where the new parts meet the old parts. Once again, the choice comes down to throwing out the original application and starting over, or fixing the foundations; but really, there’s no choice. In today’s fast-moving world, nobody can afford to wait for a completely new replacement. The only realistic option is to refactor.

Most of the refactorings in this book focus on upgrading sites to web standards, specifically:

  • XHTML
  • CSS
  • REST

They are going to help you move away from

  • Tag soup
  • Presentation-based markup
  • Stateful applications

These are not binary choices, or all-or-nothing decisions. You can often improve the characteristics of your sites along these three axes without going all the way to one extreme. An important characteristic of refactoring is that it’s linear. Small changes generate small improvements. You do not need to do everything at once. You can implement well-formed XHTML before you implement valid XHTML. You can implement valid XHTML before you move to CSS. You can have a fully valid CSS-laid-out site before you consider what’s required to eliminate sessions and session cookies.

Nor do you have to implement these changes in this order. You can pick and choose the refactorings from the catalog that bring the most benefit to your applications. You may not require XHTML, but you may desperately need CSS. You may want to move your application architecture to REST for increased performance but not care much about converting the documents to XHTML. Ultimately, the decision rests with you. This book presents the choices and options so that you can weigh the costs and benefits for yourself.

It is certainly possible to build web applications using tag-soup table-based layout, image maps, and cookies. However, it’s not possible to scale those applications, at least not without a disproportionate investment in time and resources that most of us can’t afford. Growth both horizontally (more users) and vertically (more features) requires a stronger foundation. This is what XHTML, CSS, and REST provide.

Refactoring HTML is available now at Amazon, Safari, and other fine bookstores everywhere. The price is a very reasonable $39.99, and most stores are offering their customary discounts. (Amazon is 10% off at the moment.) I hope you enjoy it.

Tuesday, May 13, 2008 (Permalink)

The W3C Web Application Formats Working Group has published several new and updated working drafts about Widgets:

A widget is an interactive single purpose application for displaying and/or updating local data or data on the Web, packaged in a way to allow a single download and installation on a user's machine or mobile device. A widget may run as a stand alone application (meaning it can run outside of a Web browser), or may be embedded into a Web document. In this document, the runtime environment on which a widget is run is referred to as a widget user agent and a running widget is referred to as an instantiated widget. Prior to instantiation, a widget exists as a widget resource. For more information about widgets, see the Widget Landscape document.

To be clear, this specification describes the requirements for desktop style widgets (akin to Dashboard, Opera Widgets, and Yahoo! Widgets). This document does not address the requirements of "web widgets", such as iGoogle Gadgets or Windows Live Gadgets.

The drafts include:

Monday, May 12, 2008 (Permalink)

The W3C Web Content Accessibility Guidelines Working Group has updated two working drafts on the Web Content Accessibility Guidelines:

Understanding WCAG 2.0

This document, "Understanding WCAG 2.0," is an essential guide to understanding and using Web Content Accessibility Guidelines 2.0 [WCAG20]. It is part of a series of documents that support WCAG 2.0. Please note that the contents of this document are informative (they provide guidance), and not normative (they do not set requirements for conforming to WCAG 2.0).

WCAG 2.0 establishes a set of Success Criteria to define conformance to the WCAG 2.0 Guidelines. A Success Criterion is a testable statement that will be either true or false when applied to specific Web content. "Understanding WCAG 2.0" provides detailed information about each Success Criterion, including its intent, the key terms that are used in the Success Criterion, and how the Success Criteria in WCAG 2.0 help people with different types of disabilities. This document also provides examples of Web content that meet the success criterion using various Web technologies (for instance, HTML, CSS, XML), and common examples of Web content that does not meet the success criterion.

This document indicates specific techniques to meet each Success Criterion. Details for how to implement each technique are available in Techniques and Failures for WCAG 2.0, but "Understanding WCAG 2.0" provides the information about the relationship of each technique to the Success Criteria. Techniques are categorized by the level of support they provide for the Success Criteria. "Sufficient techniques" are sufficient to meet a particular Success Criterion (either by themselves or in combination with other techniques), while other techniques are advisory and therefore optional. None of the techniques are required to meet WCAG 2.0, although some may be the only known method if a particular technology is used. "Advisory techniques" are not sufficient to meet the Success Criteria on their own (because they are not testable or provide incomplete support) but it is encouraged that authors follow them when possible to provide enhanced accessibility. Another support category is "Failure techniques", which describe authoring practices known to cause Web content not to conform to WCAG 2.0. Although failure techniques provide advisory information about certain authoring practices, authors must avoid those practices in order to meet the WCAG 2.0 Success Criteria.

Techniques for WCAG 2.0

"'Techniques and Failures for WCAG 2.0' provides information to Web content developers who wish to satisfy the success criteria of Web Content Accessibility Guidelines 2.0 (WCAG 2.0). Techniques are specific authoring practices that may be used in support of the WCAG 2.0 success criteria. This document provides "General Techniques" that describe basic practices that are applicable to any technology, and technology-specific techniques that provide information applicable to specific technologies. Currently, technology-specific techniques are available for HTML, CSS, ECMAScript, SMIL, ARIA, and Web servers. The World Wide Web Consortium only documents techniques for non-proprietary technologies; the WCAG Working Group hopes vendors of other technologies will provide similar techniques to describe how to conform to WCAG 2.0 using those technologies. Use of the techniques provided in this document makes it easier for Web content to demonstrate conformance to WCAG 2.0 success criteria than if these techniques are not used."

There's a lot of good information here. These should really be required reading for all HTML authors and web designers. The Techniques spec is probably the most practical, and where most readers should start.

Sunday, May 11, 2008 (Permalink)

The W3C XHTML 2 Working Group has posted the last call working draft of CURIE Syntax 1.0: A syntax for expressing Compact URIs. This is modeled after namespace URIs and qualified names. In brief, it defines a prefix for a known base IRI (a URI that can contain non-ASCII characters like é), then appends a colon and a local part. For example, the CURIE cafe:tradeshows.xml could be shorthand for http://www.cafeaulait.org/tradeshows.xml if the prefix cafe were mapped to the URL http://www.cafeaulait.org/. Exactly how prefixes are mapped to base IRIs is left to the specification of the documents in which the CURIEs appear. However if the CURIEs are in an XML document, then the namespaces in scope define the prefix mappings. The default namespace can be used for prefix-less CURIEs.

Frankly I'm surprised to see this. Namespaces and the namespace syntax are one of the notable failures of the XML ecosystem. Why someone would choose to imitate this now that we know better is beyond me. Based on experience with namespaces, I predict that moving CURIEs from one context to another is going to be especially problematic. Well, we've learned to live with (if not exactly like) namespaces. I guess we can get used to this.
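The prefix expansion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name expand_curie and the idea of passing the prefix table in as a dictionary are hypothetical, since the draft leaves the mapping mechanism to the host language. It also treats only colon-free values as prefix-less, which is a simplification.

```python
# Hypothetical helper, not part of any CURIE library. The caller supplies the
# prefix table because the draft leaves the mapping mechanism to the host language.
def expand_curie(curie, prefixes, default_base=None):
    """Expand a prefix:local CURIE to a full IRI by simple concatenation."""
    prefix, sep, local = curie.partition(":")
    if not sep:
        # No colon at all: a prefix-less CURIE, resolved against the default mapping.
        if default_base is None:
            raise ValueError(f"no default mapping for {curie!r}")
        return default_base + curie
    if prefix not in prefixes:
        raise KeyError(f"unmapped CURIE prefix {prefix!r}")
    return prefixes[prefix] + local


prefixes = {"cafe": "http://www.cafeaulait.org/"}
print(expand_curie("cafe:tradeshows.xml", prefixes))
# prints http://www.cafeaulait.org/tradeshows.xml
```

Note that the same string expands differently, or not at all, once the prefix table changes, which is exactly the portability problem namespaces have.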

Saturday, May 10, 2008 (Permalink)

Planamesa Software has released NeoOffice/J 2.2.3, a Mac port of OpenOffice 2.1 using a Java-based GUI. New features in this latest patch release include a Media Browser, native floating tool windows, and trackpad magnify and swipe gestures. Features since 2.2.2 include grammar checking, importing images from scanners and cameras, QuickTime video, and menu bars that stay open when no window is present.

Friday, May 9, 2008 (Permalink)

SyncroSoft has released <Oxygen/> 9.2, a $345 payware XML editor written in Java. Oxygen supports XML, XSL, DTDs, XQuery, SVG, Relax NG, Schematron, and the W3C XML Schema Language. According to the announcement, "Version 9.2 introduces a new XML Author edition specially tuned for content authors providing a well designed interface for XML editing by keeping only the relevant authoring features. The major additions in Oxygen XML Editor 9.2 are related to the WYSIWYG-like editing support and in particular to the DITA support. The general visual editing improvements include displaying the resolved content in the editor and navigation through links. With the new DITA features that include a new DITA map editor, actions for inserting conref links, a tight integration of the latest version of the DITA Open Toolkit, Oxygen XML Editor becomes the leading DITA editor and the easiest to use. Other improvements are browsing of XML databases using WebDAV connections, better handling of Chinese, Japanese and Korean (CJK) text, support for the Intel® XML Software Suite and multiple component updates."

Thursday, May 8, 2008 (Permalink)

The OpenOffice Project has posted the first beta of OpenOffice 3.0, an open source office suite for Linux, Solaris, and Windows that saves all its files as zipped XML.

The most immediately visible change to OpenOffice.org 3.0 is the new "Start Centre", new fresh-looking icons, and a new zoom control in the status bar. A closer look shows that 3.0 has a myriad of new features. Notable Calc improvements include a new solver component; support for spreadsheet collaboration through workbook sharing; and an increase to 1024 columns per sheet. Writer has an improved notes feature and displays of multiple pages while editing. There are numerous Chart enhancements, and an improved crop feature in Draw and Impress.

Behind the scenes, OpenOffice.org 3.0 will support the upcoming OpenDocument Format (ODF) 1.2 standard, and is capable of opening files created with MS-Office 2007 or MS-Office 2008 for Mac OS X (.docx, .xlsx, .pptx, etc.). This is in addition to read and write support for the MS-Office binary file formats (.doc, .xls, .ppt, etc.).

OpenOffice.org 3.0 will be the first version to run on Mac OS X without X11, with the look and feel of any other Aqua application. It introduces partial VBA support to this platform. In addition, OpenOffice.org 3.0 integrates well with the Mac OS X accessibility APIs, and thus offers better accessibility support than many other Mac OS X applications.

Saturday, May 3, 2008 (Permalink)

The W3C Web Content Accessibility Guidelines Working Group has posted the candidate recommendation of Web Content Accessibility Guidelines 2.0. "Web Content Accessibility Guidelines 2.0 (WCAG 2.0) covers a wide range of recommendations for making Web content more accessible. Following these guidelines will make content accessible to a wider range of people with disabilities, including blindness and low vision, deafness and hearing loss, learning disabilities, cognitive limitations, limited movement, speech difficulties, photosensitivity and combinations of these. Following these guidelines will also often make your Web content more usable to users in general."

Friday, May 2, 2008 (Permalink)

The W3C XML Core Working Group has published the finished recommendation of Canonical XML 1.1. This attempts to address some of the weirdnesses of Canonical XML, such as the movement of xml:id attributes from one element to another and breaking of base URLs when canonicalizing.

Thursday, May 1, 2008 (Permalink)

The W3C XML Processing Model Working Group has published a new Working Draft of XProc: An XML Pipeline Language. According to group lead Norm Walsh, changes in this draft are:

  1. Fairly substantial syntax changes. A <p:pipeline> is now just syntactic sugar for a particular <p:declare-step>.

  2. Significantly reworked the syntax and semantics of variables, options, and parameters. Added <p:variable>. Imposed a syntactic distinction between declaration (<p:option>) and use (<p:with-option>/<p:with-param>) of options and parameters.

  3. Clarified the scope of variables and options.

  4. Removed value attribute from <p:variable>, <p:option>, <p:with-option>, and <p:with-param>.

  5. Removed automatic declaration of parameter input ports; you have to declare them explicitly if you need them.

  6. Added p:base-uri() and p:resolve-uri() XPath extension functions to support (XPath 1.0) pipelines that need access to the base URI of documents.

  7. Removed ignored namespaces, added <p:pipeinfo>.

  8. Redefined the <p:label-elements> step to use a step-local variable in the XPath context.

  9. Added psvi-required attribute to pipelines.

  10. Changed definition of <p:error> to better address localization issues.

The syntax changes, and making <p:pipeline> syntactic sugar for a particular <p:declare-step>, have the effect of making very simple, straight-through pipelines syntactically simple again.

Reorganizing some of the option and parameter elements, and adding a variable element, makes the language bigger (in the sense that it has more elements) but I think it has significantly reduced some of the confusing subtlety that used to exist around declaration and use of options.

In general, I think these are all changes for the better. And I think we're done. This is a Last Call working draft in all but name. The changes are significant enough that we thought it would be best to float them in an ordinary working draft first. That will, I hope, save us the embarrassment of having to do more than two last calls.

Wednesday, April 30, 2008 (Permalink)

Mokka mit Schlag is borked at the moment. I think I know what went wrong with the upgrade, and I'm working on fixing it. In brief, the WordPress user did not have permissions to create and drop tables. This is indicative of a bug in WordPress--it does not verify that it has the necessary permissions before attempting to upgrade, nor does it notice that the upgrade has failed and perform a rollback. However, the host (Pair Networks) has not been quick to respond, and I don't have the root database access necessary to repair the problem myself, so it may take a little while.

Tuesday, April 29, 2008 (Permalink)

Another day, another WordPress security bug. Matt Mullenweg has released WordPress 2.5.1, an open source (GPL) blog engine based on PHP and MySQL. All users should upgrade.

Monday, April 28, 2008 (Permalink)

The W3C has posted the first working draft of Requirements of Japanese Text Layout. "This document describes requirements for general Japanese layout realized with technologies like CSS, SVG and XSL-FO. The document is mainly based on a standard for Japanese layout, JIS X 4051. However, it addresses also areas which are not covered by JIS X 4051. The document is currently in draft stage. This public draft contains the Introduction and section 1 Basics of Japanese Text Layout. Further sections are available in a non-public version of the document and will be integrated into a further public Working Draft."

Friday, April 25, 2008 (Permalink)

Daniel Veillard has released version 2.6.32 of libxml2, the open source XML C library for Gnome. This release fixes assorted bugs including some memory leaks. All users should upgrade.

Thursday, April 24, 2008 (Permalink)

The W3C Web API Working Group has posted the last call working draft of The XMLHttpRequest Object.

The XMLHttpRequest object implements an interface exposed by a scripting engine that allows scripts to perform HTTP client functionality, such as submitting form data or loading data from a server.

The name of the object is XMLHttpRequest for compatibility with the web, though each component of this name is potentially misleading. First, the object supports any text based format, including XML. Second, it can be used to make requests over both HTTP and HTTPS (some implementations support protocols in addition to HTTP and HTTPS, but that functionality is not covered by this specification). Finally, it supports "requests" in a broad sense of the term as it pertains to HTTP; namely all activity involved with HTTP requests or responses for the defined HTTP methods.

Tuesday, April 15, 2008 (Permalink)

Michael Kay has released version 9.0.0.4 of Saxon, his XSLT 2.0 and XQuery processor for Java and .NET. This is a bug fix release. "Although there's a steady stream of new bugs and fixes, I think they are largely problems that affect very few users, so unless you know you're affected by one of the bugs, there's no great urgency to upgrade to the latest maintenance build."

Saxon is published in two versions, both of which require Java 1.4 or later (or .NET). Saxon 9.0B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 9.0 SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standing XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers."


In related news, the fourth edition of Kay's classic XSLT 2.0 and XPath 2.0 Programmer's Reference is scheduled to be released on April 28th. It's in hardcover, over 1300 pages, and is currently available for $37.79 at Amazon.

Monday, April 14, 2008 (Permalink)

XMLMind has released version 3.8.0 of their XML Editor. This $300 payware product features word processor-like and spreadsheet-like views of XML documents. This release adds support for MathML 2 presentation markup. A free-beer hobbled version is also available.

Sunday, April 13, 2008 (Permalink)

The W3C HTML working group has posted the last call working draft of XHTML Role Attribute Module.

The Role Attribute Module defines the role attribute and some values for that attribute in the default vocabulary space. The role attribute takes as its value one or more whitespace separated CURIEs [CURIE]. Any non-qualified value MUST be interpreted as being from the XHTML vocabulary at http://www.w3.org/1999/xhtml/vocab#. For a list of all roles in the default vocabulary, see [XHTMLVOCAB].

The attribute describes the role(s) the current element plays in the context of the document. This can be used, for example, by applications and assistive technologies to determine the purpose of an element. This could allow a user to make informed decisions on which actions may be taken on an element and activate the selected action in a device independent way. It could also be used as a mechanism for annotating portions of a document in a domain specific way (e.g., a legal term taxonomy).

This example is informative
<ul role="navigation sitemap">
    <li href="downloads">Downloads</li>
    <li href="docs">Documentation</li>
    <li href="news">News</li>
</ul>

The following list represents some of the roles defined in the default vocabulary. They are intended to define regions of the document to help orient the user.

banner
A region that contains the prime heading or internal title of a page.

Most of the content of a banner is site-oriented, rather than being page-specific. Site-oriented content typically includes things such as the logo of the site sponsor, the main heading for the page, and site-specific search tool. Typically this appears at the top of the page spanning the full width.

complementary
Any section of the document that supports but is separable from the main content, but is semantically meaningful on its own even when separated from it.

There are various types of content that would appropriately have this role. For example, in the case of a portal, this may include but not be limited to show times, current weather, related articles, or stocks to watch. The content should be relevant to the main content; if it is completely separable, a more general role should be used instead.

contentinfo
Meta information about the content on the page or the page as a whole.

For example, footnotes, copyrights, links to privacy statements, etc. would belong here.

definition
A definition of a term or concept.

A role is not provided to specify the term being defined, although host languages may provide such an element; in XHTML this is the dfn element. The defined term should be included in such an element even when occurring within an element having the definition role.

main
Main content in a document.

This marks the content that is directly related to or expands upon the central topic of the page.

navigation
A collection of links suitable for use when navigating the document or related documents.

note
The content is parenthetic or ancillary to the main content of the resource.

search
The search tool of a web document.

This is typically a form used to submit search requests about the site or to a more general Internet search service.

You can add other values for this attribute by placing the values in a namespace. (Haven't we learned yet that namespaced attribute values are a bad idea?)

Friday, April 11, 2008 (Permalink)

The W3C Web API Working Group has published the second working draft of Language Bindings for DOM Specifications. "“Language Bindings for DOM Specifications” is intended to specify in detail the IDL language used by W3C specifications to define DOM interfaces, and to provide precise conformance requirements for ECMAScript and Java bindings of such interfaces. It is expected that this document acts as a guide to implementors of already-published DOM specifications, and that newly published DOM specifications reference this document to ensure conforming implementations of DOM interfaces are interoperable."

Thursday, April 10, 2008 (Permalink)

The W3C Semantic Web Activity has posted a ?working draft? of Experiences with the conversion of SenseLab databases to RDF/OWL. "One of the challenges facing Semantic Web for Health Care and Life Sciences is that of converting relational databases into Semantic Web format. The issues and the steps involved in such a conversion have not been well documented. To this end, we have created this document to describe the process of converting SenseLab databases into OWL. SenseLab is a collection of relational (Oracle) databases for neuroscientific research. The conversion of these databases into RDF/OWL format is an important step towards realizing the benefits of Semantic Web in integrative neuroscience research. This document describes how we represented some of the SenseLab databases in Resource Description Framework (RDF) and Web Ontology Language (OWL), and discusses the advantages and disadvantages of these representations. Our OWL representation is based on the reuse of existing standard OWL ontologies developed in the biomedical ontology communities. The purpose of this document is to share our implementation experience with the community."

Mildly interesting, but why this is a working draft instead of a note, or why it's even published by the W3C, I can't quite figure out. This is a case study at most, not a specification of anything in particular.

Wednesday, April 9, 2008 (Permalink)

The W3C Math Working Group has posted the third public working draft of Mathematical Markup Language (MathML) Version 3.0. Changes since 2.0 include content dictionaries, "a mechanism for recording that a particular notational structure has a particular mathematical meaning". Version 3.0 is also supposed to enable easier markup of elementary school mathematics.

Tuesday, April 8, 2008 (Permalink)

The Modis Team has released Sedna 3.0, an open source native XML database for Windows and Linux written in C++ and Scheme and published under the Apache License 2.0. Sedna supports XQuery and its own declarative update language. This release fixes bugs and improves transaction support.

Of the open source XML databases, this is the one I know the least about. Anyone want to comment on this one?

Monday, April 7, 2008 (Permalink)

The W3C XHTML 2 Working Group has posted the third public working draft of CURIE Syntax 1.0: A syntax for expressing Compact URIs. This is modeled after namespace URIs and qualified names. In brief, it defines a prefix for a known base IRI (a URI that can contain non-ASCII characters like é), then appends a colon and a local part. For example, the CURIE cafe:tradeshows.xml could be shorthand for http://www.cafeaulait.org/tradeshows.xml if the prefix cafe were mapped to the URL http://www.cafeaulait.org/. Exactly how prefixes are mapped to base IRIs is left to the specification of the documents in which the CURIEs appear. However if the CURIEs are in an XML document, then the namespaces in scope define the prefix mappings. The default namespace can be used for prefix-less CURIEs.

Frankly I'm surprised to see this. Namespaces and the namespace syntax are one of the notable failures of the XML ecosystem. Why someone would choose to imitate this now that we know better is beyond me. Based on experience with namespaces, I predict that moving CURIEs from one context to another is going to be especially problematic. Well, we've learned to live with (if not exactly like) namespaces. I guess we can get used to this.

Sunday, April 6, 2008 (Permalink)

The Unicode Consortium has released Unicode 5.1:

This release contains over 100,000 characters, and provides significant additions and improvements that extend text processing for software worldwide. Some of the key features are: increased security in data exchange, significant character additions for Indic and South East Asian scripts, expanded identifier specifications for Indic and Arabic scripts, improvements in the processing of Tamil and other Indic scripts, linebreaking conformance relaxation for HTML and other protocols, strengthened normalization stability, new case pair stability, plus others given below.

The Version 5.1.0 data files and documentation are final and posted on the Unicode site. In addition to updated existing files, implementers will find new test data files (for example, for linebreaking) and new XML data files that encapsulate all of the Unicode character properties. For details, see the page for Unicode 5.1.0 at http://www.unicode.org/versions/Unicode5.1.0/.

A major feature of Unicode 5.1.0 is the enabling of ideographic variation sequences. These sequences allow standardized representation of glyphic variants needed for Japanese, Chinese, and Korean text. The first registered collection, from Adobe Systems, is now available at http://www.unicode.org/ivd/.

Unicode 5.1 contains significant changes to properties and behavioral specifications. Several important property definitions were extended, improving linebreaking for Polish and Portuguese hyphenation. The Unicode Text Segmentation Algorithms, covering sentences, words, and characters, were greatly enhanced to improve the processing of Tamil and other Indic languages. The Unicode Normalization Algorithm now defines stabilized strings and provides guidelines for buffering. Standardized named sequences are added for Lithuanian, and provisional named sequences for Tamil.

Unicode 5.1.0 adds 1,624 newly encoded characters. These additions include characters required for Malayalam and Myanmar and important individual characters such as Latin capital sharp s for German. Version 5.1 extends support for languages in Africa, India, Indonesia, Myanmar, and Vietnam, with the addition of the Cham, Lepcha, Ol Chiki, Rejang, Saurashtra, Sundanese, and Vai scripts. Scholarly support includes important editorial punctuation marks, as well as the Carian, Lycian, and Lydian scripts, and the Phaistos disc symbols. Other new symbol sets include dominoes, Mahjong, dictionary punctuation marks, and math additions. This latest version of the Unicode Standard has exactly the same character assignments as ISO/IEC 10646:2003 plus Amendments 1 through 4.

The Unicode Collation Algorithm (UCA), the core standard for sorting all text, is also being updated at the same time (see http://www.unicode.org/reports/tr10/). The major changes in UCA include coverage of all Unicode 5.1 characters, tightened conformance for canonical equivalence, clearer definitions of internationalized search and matching, specifications of parameters for customizing collation, and definitions of collation folding. There are also important clarifications on the use of contractions (such as "ch" in Slovak) in collation.

The next version of the Unicode locale project (CLDR) is also being prepared on the basis of Unicode 5.1, and is now open for public data submission (see http://www.unicode.org/cldr/).
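One of the additions mentioned above, the Latin capital sharp s, is easy to poke at from Python. This is a small sketch, assuming a Python 3 interpreter whose bundled Unicode database is version 5.1 or later. The code point U+1E9E is not given in the announcement and is filled in here.

```python
import unicodedata

# U+1E9E is the capital sharp s added in Unicode 5.1 (code point supplied
# here, not taken from the announcement above).
capital_sharp_s = "\u1E9E"

# Which Unicode version this interpreter's character database implements.
print(unicodedata.unidata_version)

# The character's formal name in the Unicode Character Database.
print(unicodedata.name(capital_sharp_s))

# Lowercasing the capital form gives the familiar U+00DF sharp s ...
print(capital_sharp_s.lower() == "\u00DF")

# ... but uppercasing U+00DF still yields "SS", not the new capital letter.
print("\u00DF".upper())
```

The asymmetry in the last two lines is deliberate on Unicode's part: the new character was added without changing the default uppercase mapping of ß.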

Friday, April 4, 2008 (Permalink)

The W3C Web Security Context Working Group has posted an updated public working draft of Web Security Context: Experience, Indicators, and Trust.

This specification deals with the trust decisions that users must make online, and with ways to support them in making safe and informed decisions where possible.

In order to achieve that goal, this specification includes recommendations on the presentation of identity information by Web user agents; on handling errors in security protocols in a way that minimizes the trust decisions left to users, and (we hope) induces them toward safe behavior where they have to make these decisions; and on data entry interactions that (we hope, again) will make it easier for users to enter sensitive data into legitimate sites than to enter them into illegitimate sites.

Where this document specifies user interactions with a goal toward making security usable, no claim is made at this time that this goal is met: As noted in the Status of this Document section, this is an initial draft to trigger discussion and commentary; assume that what is proposed here is untested.

To complement the interaction and decision related parts of this specification, 7 Robustness addresses the question of how the communication of context information needed to make decisions can be made more robust against attacks.

Finally, 8 Authoring and deployment best practices is about practices for those who deploy Web Sites. It complements some of the interaction related techniques recommended in this specification. The aim of this section is to provide guidelines for creating Web sites with reduced attack surfaces against certain threats, and with usefully provided security context information.

This specification comes with two companion documents: [WSC-USECASES] documents the use cases and assumptions that underlie this specification. [WSC-THREATS] documents the Working Group's threat analysis.

Thursday, April 3, 2008 (Permalink)

The W3C XML Core Working Group has a new last call working draft of the XML Linking Language (XLink) Version 1.1. There are three major changes in XLink 1.1 compared to 1.0:

  1. XLinks now contain IRIs rather than URIs.
  2. All attributes in the XLink namespace are now reserved for future versions of XLink.
  3. Most importantly, the xlink:type="simple" attribute is no longer required.

That is, a simple link can now be written like this:

<composer xlink:href="http://www.beand.com/">Beth Anderson</composer>

It's no longer necessary to write this:

<composer xlink:type="simple" xlink:href="http://www.beand.com/">Beth Anderson</composer>

This is a good thing. I'm not sure who first came up with this idea, but I've been advocating it for a while now. This makes XLink a lot more palatable in applications like SVG.
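For anyone consuming such documents, the practical effect can be sketched in a few lines of Python. This is only a rough illustration, assuming the rule is "an xlink:href with no xlink:type is treated as a simple link"; the `work` wrapper element and the `simple_links` helper are my own inventions for the example, not anything from the draft.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK_NS}}}href"
TYPE = f"{{{XLINK_NS}}}type"


def simple_links(root):
    """Yield (element, href) for every element treated as a simple link."""
    for elem in root.iter():
        href = elem.get(HREF)
        link_type = elem.get(TYPE)
        # Assumed XLink 1.1 behavior: an xlink:href with no xlink:type
        # is handled as if xlink:type="simple" had been written.
        if href is not None and link_type in (None, "simple"):
            yield elem, href


# Hypothetical wrapper document holding both spellings of the same link.
doc = ET.fromstring(
    '<work xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<composer xlink:href="http://www.beand.com/">Beth Anderson</composer>'
    '<composer xlink:type="simple" xlink:href="http://www.beand.com/">'
    'Beth Anderson</composer>'
    '<title>No link here</title>'
    '</work>'
)

found = [(elem.tag, href) for elem, href in simple_links(doc)]
print(found)
```

Both `composer` elements come back as links, and the `title` element does not.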

It's not immediately clear what changes necessitated going back from the previous candidate recommendation to a last call status again.

Wednesday, April 2, 2008 (Permalink)

The Mozilla Project has posted the fifth beta of Firefox 3.0 for Mac, Linux, and Windows. "Firefox 3 is based on the Gecko 1.9 Web rendering platform, which has been under development for the past 32 months. Building on the previous release, Gecko 1.9 has more than 12,000 updates including some major re-architecting to provide improved performance, stability, rendering correctness, and code simplification and sustainability. Firefox 3 has been built on top of this new platform resulting in a more secure, easier to use, more personal product with a lot more under the hood to offer website and Firefox add-on developers. [Improved in Beta 5!] Firefox 3 Beta 5 includes more than 750 changes from the previous beta, improving stability and web compatibility, providing platform and user interface enhancements, and resulting in the fastest Firefox ever. Many of these improvements were based on community feedback from the previous beta."

Tuesday, April 1, 2008 (Permalink)

The W3C XML Security Specifications Maintenance Working Group has posted the Proposed Edited Recommendation of XML Signature Syntax and Processing (Second Edition). "This Proposed Second Edition of XML Signature Syntax and Processing adds Canonical XML 1.1 as a required canonicalization algorithm and recommends its use for inclusive canonicalization. This version of Canonical XML enables use of xml:id and xml:base Recommendations with XML Signature and also enables other possible future attributes in the XML namespace. Additional minor changes, including the incorporation of known errata, are documented in Changes in XML Signature Syntax and Processing (Second Edition)." I have to read through the detailed changes, but at first glance this looks like a reasonable adjustment that doesn't break any existing code.

Monday, March 31, 2008 (Permalink)

The W3C XSL Working Group has published the requirements for the XSL Formatting Objects 2.0. "A number of XSL 1.0 implementations already support dynamic inclusion of vector graphics using W3C SVG. The XSL and SVG WGs want to define a tighter interface between XSL-FO and SVG to provide enhanced functionality. Experiments with the use of SVG paths to create non-rectangular text regions, or 'run-arounds', have helped to motivate further work on deeper integration of SVG graphics inside XSL-FO documents, and to work with the SVG WG on specifying the meaning of XSL-FO markup inside SVG graphics. A similar level of integration with MathML is contemplated."

Saturday, March 29, 2008 (Permalink)

Cambridge University's Toby O. H. White has released FoX, an open source, validating XML parser written in Fortran 95. It includes both SAX-like push and DOM interfaces. FoX is published under a BSD license.

Friday, March 28, 2008 (Permalink)

The OpenOffice Project has released OpenOffice 2.4, an open source office suite for Linux, Solaris, and Windows that saves all its files as zipped XML. New features in 2.4 include:

  • Connect to WebDAV servers via HTTPS
  • Custom icons for toolbars are imported
  • Control password-storing with a master password
  • Warning if document is from a newer ODF
  • PDF documents: relative links, document references, PDF/A-1 (ISO 19005-1) supported, and cross-document link behavior options
  • Mac OS X: Quicktime support for movies and sound / use the built in spell checker
  • Print dialog improvements in usability
  • Edit boxes: warning at limit of characters
  • DejaVu font is now default instead of BitStream Vera

Localisation

  • Entries for 10 languages added

Base / DBA

  • Improved rendering of numeric(n) data from JDBC and Oracle
  • Easier choice of table name in "Copy table"
  • Editing of views in HSQLDB
  • Query designer for all properties which allow SQL command
  • Query designer in SQL view
  • Relation design accessible for MySQL databases
  • Setting to check for required fields on forms
  • Support for Access 2007 (.accdb files)

Calc

  • Convert text to columns: with this feature CSV data inside cells can be transformed into columns directly
  • Columns and rows in spreadsheet can be moved with drag and drop
  • Enter key returns to the column where the input started, one row below
  • Formula input: "+" and "-" can also be used to start
  • Individual zoom level per sheet
  • AutoFilter: choices clearer grouped and based on result of filtering in other columns
  • DataPilot: Manual Sorting / Double-click in DataPilot cell provides calculation data of that cell
  • Performance improvement with functions VLOOKUP and MATCH
  • Print dialog for Calc easier to use
  • PageUp and PageDown keys work in print preview
  • Sheet names in cell-hyperlinks: renamed properly

Chart

  • Regression curves: show equations and R² value
  • Reverse axes possible
  • Bars on different axes displayed next to each other
  • Data labels: Number format
  • Data point label: display both value and percentage
  • Data label: display each part in a separate line
  • Data labels: more flexible placement of labels
  • Labels on pie segments: avoiding overlapping
  • Data point label: can be removed with delete key

Draw

  • Navigation (tab) order of page objects
  • PDF export: page names as bookmark
  • Reduce complexity: no longer necessary display options removed

Impress

  • Navigation (tab) order of page objects
  • Thrilling 3D effects in slide transitions
  • Export slide names as PDF bookmarks
  • Easier to insert background picture

Writer

  • Selecting rectangular region of text
  • Find and Replace: backward references in regular expressions
  • Spell checking: easier selecting of the language
  • Insert&Insert Object toolbar redesign - Writer
  • Printing of hidden text can be turned on
  • Printing text place holders can be turned off
  • Shortcuts added for paragraph style Heading 4, Heading 5 and Textbody
  • Ctrl-click behaviour for hyperlinks can be changed
  • Custom document properties: Text fields and UI support

Extensions/ programmability / API

  • Extensible Help System for extensions
  • Extensions can have a separate display name
  • Extensions: support of web based update
  • Extensions: additional information about the publisher and release notes
  • Extensions: check for updates
  • Dialogs can have a wallpaper set
  • Transparent background for controls
  • Remote control presentations via API
  • API: get selected table(s) or query(s) in the main Base window

Thursday, March 27, 2008 (Permalink)

The Mozilla Project has released Firefox 2.0.0.13. This release fixes a number of security issues. All users should upgrade.

A new version of SeaMonkey has also been posted, though Camino doesn't seem to have been updated yet. Camino users may want to switch to Firefox or Safari for the time being.

Tuesday, March 25, 2008 (Permalink)

The W3C Semantic Web Best Practices and Deployment Working Group and HTML Working Groups have published a new working draft of RDFa Primer 1.0.

Current Web pages, written in XHTML, contain inherent structured data: calendar events, contact information, photo captions, song titles, copyright licensing information, etc. When authors and publishers can express this data precisely, and when tools can read it robustly, a new world of user functionality becomes available, letting users transfer structured data between applications and Web sites. An event on a Web page can be directly imported into a desktop calendar. A license on a document can be detected to inform the user of his rights automatically. A photo's creator, camera setting information, resolution, and topic can be published as easily as the original photo itself.

RDFa lets XHTML authors express this structured data using existing XHTML attributes and a handful of new ones. Where data, such as a photo caption, is already present on the page for human readers, the author need not repeat it for automated processes to access it. A Web publisher can easily reuse data fields, e.g. an event's date, defined by other publishers, or create new ones altogether. RDFa gets its expressive power from RDF [RDFPRIMER], though the reader need not understand RDF before reading this document.

For simplicity, instead of using RDF terminology, we use the word "field" to indicate a unit of labeled information, e.g. the "first name" field indicates a person's first name.

RDFa uses Compact URIs, which express a URI using a prefix, e.g. dc:title where dc: stands for http://purl.org/dc/elements/1.1/. In this document, for simplicity's sake, the following prefixes are assumed to be already declared: dc for Dublin Core [DC], foaf for Friend-Of-A-Friend [FOAF], cc for Creative Commons [CC], and xsd for XML Schema Definitions [XSD]:

  • dc: http://purl.org/dc/elements/1.1/
  • foaf: http://xmlns.com/foaf/0.1/
  • cc: http://creativecommons.org/ns#
  • xsd: http://www.w3.org/2001/XMLSchema#

We use standard XHTML notation for elements and attributes: both are denoted using fixed-width lowercase font, e.g. div, and attributes are differentiated using a preceding '@' character, e.g. @href.

Here's a syntax example from the draft:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns:cal="http://www.w3.org/2002/12/cal/ical#"
      xmlns:contact="http://www.w3.org/2001/vcard-rdf/3.0#">
  <head>
    <title>Jo's Friends and Family Blog</title>
  </head>

  <body>
...
  <p instanceof="cal:Vevent">
    I'm holding
    <span property="cal:summary">
      one last summer Barbecue,
    </span>
    on
    <span property="cal:dtstart" content="20070916T1600-0500">

      September 16th at 4pm.
    </span>
  </p>
...
  <p class="contactinfo" about="http://example.org/staff/jo">
    <span property="contact:fn">Jo Smith</span>.
    <span property="contact:title">Web hacker</span>

    at
    <a rel="contact:org" href="http://example.org">
      Example.org
    </a>.
    You can contact me
    <a rel="contact:email" href="mailto:jo@example.org">
      via email
    </a>.
  </p>
...
    </body>

</html>

The thing that jumps out at me is the use of namespace prefixes in attribute values. Haven't we learned by now that this is a bad idea?
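To make the cost concrete, here is a rough Python sketch of what a consumer has to do with values like cal:dtstart. An XML parser expands prefixes on element and attribute names, but not inside attribute values, so the application has to track the in-scope namespace declarations itself. The trimmed-down sample and the `expand_properties` helper are assumptions of mine for illustration; this is nowhere near a conforming RDFa processor.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed-down version of the draft's example (one prefix, one paragraph).
SAMPLE = """<html xmlns:cal="http://www.w3.org/2002/12/cal/ical#">
  <body>
    <p instanceof="cal:Vevent">
      <span property="cal:summary">one last summer Barbecue,</span>
      <span property="cal:dtstart" content="20070916T1600-0500">September 16th at 4pm.</span>
    </p>
  </body>
</html>"""


def expand_properties(xml_text):
    """Return (expanded property URI, value) pairs found in xml_text."""
    bindings = []  # stack of (prefix, uri) declarations currently in scope
    results = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, ("start-ns", "end-ns", "end")):
        if event == "start-ns":
            bindings.append(payload)  # payload is a (prefix, uri) tuple
        elif event == "end-ns":
            bindings.pop()
        else:
            elem = payload
            prop = elem.get("property")
            if prop and ":" in prop:
                prefix, local = prop.split(":", 1)
                scope = dict(bindings)  # inner declarations override outer ones
                if prefix in scope:
                    # Prefer the machine-readable @content over the visible text.
                    value = elem.get("content") or "".join(elem.itertext()).strip()
                    results.append((scope[prefix] + local, value))
    return results


triples = expand_properties(SAMPLE)
for uri, value in triples:
    print(uri, "=", value)
```

All of the scope bookkeeping in that loop exists only because the prefix lives in an attribute value rather than in a name.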

Monday, March 24, 2008 (Permalink)

The W3C has published the second working draft of Cool URIs for the Semantic Web:

The Semantic Web is envisioned as a decentralised world-wide information space for sharing machine-readable data with a minimum of integration costs. Its two core challenges are the distributed modelling of the world with a shared data model, and the infrastructure where data and schemas can be published, found and used. Users benefit from getting information "raw and now" [Give] and in portable data formats [DP]. Providers often publish data embedded in a fixed user interface, in HTML. A basic question is thus how to publish information about resources in a way that allows interested users and software applications to find and interpret them.

On the Semantic Web, all information has to be expressed as statements about resources, like the members of the company Example.com are Alice and Bob or Bob's telephone number is "+1 555 262" or this Web page was created by Alice. Resources are identified by Uniform Resource Identifiers (URIs) [RFC3986]. This modelling approach is at the heart of Resource Description Framework (RDF) [RDFPrimer]. A nice introduction is given in the N3 primer [N3Primer].

Using RDF, the statements can be published on the Web site of the company. Others can read the data and publish their own information, linking to existing resources. This forms a distributed model of the world. It allows the user to pick any application to view and work with the same data, for example to see Alice's published address in your address book.

At the same time, Web documents have always been addressed with URIs (in common parlance often referred as Uniform Resource Locators, URLs). This is useful because it means we can easily make RDF statements about Web pages, but also dangerous because we can easily mix up Web pages and the things, or resources, described on the page.

So the question is, what URIs should we use in RDF? As an example, to identify the frontpage of the Web site of Example Inc., we may use http://www.example.com/. But what URI identifies the company as an organisation, not a Web site? Do we have to serve any content—HTML pages, RDF files—at those URIs? In this document we will answer these questions according to relevant specifications. We explain how to use URIs for things that are not Web pages, such as people, products, places, ideas and concepts such as ontology classes. We give detailed examples how the Semantic Web can (and should) be realised as a part of the Web.

Saturday, March 22, 2008 (Permalink)

Oracle's John Snelson has posted a beta of Faxpp, an open source XML pull parser written in C with an API that can return UTF-8 or UTF-16 strings. Faxpp is published under the Apache License v2.

Friday, March 21, 2008 (Permalink)

The W3C has published a proposed edited recommendation of XML Base (Second Edition). Changes since the first edition include:

  1. The published errata (see http://www.w3.org/2001/06/xmlbase-errata) have been incorporated;

  2. The definition of URI reference has been switched from RFC2396 to 3986;

  3. The xml:base attribute has been redescribed as a Legacy Extended IRI, but this does not change its syntax (the December 2006 PER used the term "XML Resource Identifier" which was to be defined in an XLink revision, but that plan has been superseded by the definition of LEIRI in RFC 3987 bis);

  4. Implementations are now encouraged to return base “URIs” without escaping non-URI characters;

  5. The meanings of xml:base="" and xml:base="#frag" have been clarified;

  6. The expected reference to XML Base in the forthcoming XML Media Types RFC (“son of 3023”) has been noted;

  7. It has been clarified that normal validity rules apply to the xml:base attribute;

  8. The out-of-date appendix describing effects on other standards has been removed;
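
As a reminder of what xml:base actually does, here is a minimal Python sketch of base URI inheritance. It assumes ordinary RFC 3986 style resolution as implemented by the standard library's urljoin, and makes no attempt to cover IRIs, LEIRIs, or the newly clarified xml:base="" and xml:base="#frag" cases. The sample document, the `href` attribute, and the `base_uris` helper are made up for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def base_uris(elem, inherited):
    """Yield (element, base URI) pairs, resolving nested xml:base values."""
    # An absent attribute leaves the inherited base unchanged; a relative
    # xml:base is resolved against the base inherited from the parent.
    base = urljoin(inherited, elem.get(XML_BASE, ""))
    yield elem, base
    for child in elem:
        yield from base_uris(child, base)


# Hypothetical document; the URI passed in below stands in for the
# location the document was retrieved from.
doc = ET.fromstring(
    '<doc xml:base="http://example.org/docs/">'
    '<chapter xml:base="chapter1/"><figure href="fig.png"/></chapter>'
    '</doc>'
)

bases = {elem.tag: base for elem, base in base_uris(doc, "http://example.org/index.xml")}
figure_uri = urljoin(bases["figure"], doc.find("chapter/figure").get("href"))
print(bases)
print(figure_uri)
```

The figure's relative reference ends up resolved against the chapter's base rather than the document's original location.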

Wednesday, March 19, 2008 (Permalink)

Apple has released Safari 3.1 for Mac and Windows. This release speeds up JavaScript and plugs some security holes. New features include:

  • CSS 3 web fonts
  • CSS transforms and transitions
  • HTML 5 video and audio elements
  • HTML 5 offline storage for Web applications in SQL databases
  • The img element and CSS now support SVG images (Is inline SVG supported? I'll have to check. Yep, looks like it works but only if Safari recognizes the document as XHTML, not HTML. Firefox behaves similarly.)
  • Option in Safari preferences to turn on the new Develop menu which contains various web development features
  • Double clicking on the Tab Bar opens new tab
  • Caps Lock icon in password fields

Tuesday, March 18, 2008 (Permalink)

The W3C POWDER Working Group has published a new working draft of Protocol for Web Description Resources (POWDER): Description Resources.

The Protocol for Web Description Resources (POWDER) facilitates the publication of descriptions of multiple resources such as all those available from a Web site. These descriptions are always attributed to a named individual, organization or entity that may or may not be the creator of the described resources. This contrasts with more usual metadata that typically applies to a single resource, such as a specific document's title, which is usually provided by its author.

This document sets out how Description Resources (DRs) can be created and published, whether individually or as bulk data, how to link to DRs from other online resources, and, crucially, how DRs may be authenticated and trusted. The aim is to provide a platform through which opinions, claims and assertions about online resources can be expressed by people and exchanged by machines. POWDER has evolved from the data model developed for the final report [XGR] of the Web Content Label Incubator Group [WCL-XG] from which we define a Description Resource as: "a resource that contains a description, a definition of the scope of the description and assertions about both the circumstances of its own creation and the entity that created it."

The method of defining the scope of a DR, that is, defining what is being described, is provided in a separate document: Grouping of Resources [GROUP]. Companion documents describe the RDF/OWL vocabulary [VOC] and XML data types [WDRD] that are derived from the Grouping of Resources document and this document, with each term's domain, range and constraints defined. As each term is introduced in this document, it is linked to its description in the vocabulary document.

Monday, March 17, 2008 (Permalink)

The W3C XQuery working group has posted the candidate recommendations of XQuery Update Facility, XQuery Update Facility Use Cases, and XQuery Update Facility 1.0 Requirements. XQuery as it currently exists is basically just SELECT in SQL terms. XQuery Update adds INSERT, UPDATE, and DELETE. More specifically, it defines the following update operations:

  • upd:mergeUpdates
  • upd:revalidate
  • upd:applyUpdates
  • upd:insertBefore
  • upd:insertAfter
  • upd:insertInto
  • upd:insertIntoAsFirst
  • upd:insertIntoAsLast
  • upd:insertAttributes
  • upd:delete
  • upd:replaceNode
  • upd:replaceValue
  • upd:replaceElementContent
  • upd:rename
  • upd:removeType
  • upd:setToUntyped

This is one of the last two pieces before XQuery 1.0 is really complete. (The other is full-text search.)
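
The operation names above read like a small tree-editing instruction set. As a rough Python analogue (not XQuery Update syntax), the sketch below collects a pending list of updates while querying and only then applies them to the tree. The operation names are borrowed from the list above, but the `apply_updates` helper, the tuple format, and the inventory document are my own assumptions, and the sketch does not reproduce the spec's ordering or conflict rules.

```python
import xml.etree.ElementTree as ET


def apply_updates(root, pending):
    """Apply a list of (operation, target, argument) tuples to the tree."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for op, target, arg in pending:
        if op == "insertIntoAsLast":
            target.append(arg)
        elif op == "replaceValue":
            target.text = arg
        elif op == "rename":
            target.tag = arg
        elif op == "delete":
            parent_of[target].remove(target)
        else:
            raise ValueError(f"unsupported operation: {op}")


# Hypothetical sample data.
inventory = ET.fromstring(
    "<inventory>"
    "<item><name>widget</name><qty>3</qty></item>"
    "<item><name>gadget</name><qty>0</qty></item>"
    "</inventory>"
)

# Query phase: decide what to change without touching the tree yet.
pending = []
for item in inventory.findall("item"):
    qty = item.find("qty")
    if qty.text == "0":
        pending.append(("delete", item, None))
    else:
        pending.append(("replaceValue", qty, "4"))
        pending.append(("rename", item, "product"))

new_item = ET.Element("item")
ET.SubElement(new_item, "name").text = "sprocket"
pending.append(("insertIntoAsLast", inventory, new_item))

# Update phase: apply everything at once.
apply_updates(inventory, pending)
print(ET.tostring(inventory, encoding="unicode"))
```

Separating the query phase from the update phase means the selection logic never sees a half-modified tree.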

Saturday, March 15, 2008 (Permalink)

The Helsinki University of Technology has released X-Smiles 1.2, a proof-of-concept XForms engine written in Java. Version 1.2 improves support for XBL 2 bindings.

Thursday, March 13, 2008 (Permalink)

The W3C Authoring Tool Accessibility Guidelines Working Group has posted new working drafts of Authoring Tool Accessibility Guidelines 2.0 and Implementation Techniques for Authoring Tool Accessibility Guidelines 2.0. "An authoring tool that conforms to these guidelines will promote accessibility by providing an accessible user interface to authors with disabilities as well as enabling, supporting, and promoting the production of accessible Web content by all authors."

Wednesday, March 12, 2008 (Permalink)

The W3C Web API Working Group has published the last call working draft of ElementTraversal Specification. "This specification defines the ElementTraversal interface, which allows script navigation of the elements of a DOM tree, excluding all other nodes in the DOM, such as text nodes. It also provides a property to expose the number of child elements of an element. It is intended to provide a more convenient alternative to existing DOM navigation interfaces, with a low implementation footprint." Hmm, just what the DOM needs: yet another way to do it.

ElementTraversal provides some extra properties/methods for navigating only through elements, while ignoring text and white space:

  • firstElementChild
  • lastElementChild
  • previousElementSibling
  • nextElementSibling
  • childElementCount

This makes it easier to process record-like XML, but it is inappropriate for reading documents with mixed content.
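
For a feel of what these five members do, here is a Python approximation on top of xml.dom.minidom, which does not offer them itself. The snake_case function names and the sample documents are mine; the sketch only mimics the behavior described in the draft's abstract.

```python
from xml.dom import Node, minidom


def element_children(node):
    """Return the child nodes of node that are elements."""
    return [c for c in node.childNodes if c.nodeType == Node.ELEMENT_NODE]


def first_element_child(node):
    kids = element_children(node)
    return kids[0] if kids else None


def last_element_child(node):
    kids = element_children(node)
    return kids[-1] if kids else None


def next_element_sibling(node):
    sibling = node.nextSibling
    while sibling is not None and sibling.nodeType != Node.ELEMENT_NODE:
        sibling = sibling.nextSibling
    return sibling


def previous_element_sibling(node):
    sibling = node.previousSibling
    while sibling is not None and sibling.nodeType != Node.ELEMENT_NODE:
        sibling = sibling.previousSibling
    return sibling


def child_element_count(node):
    return len(element_children(node))


# Record-like XML: the whitespace text nodes are just noise.
root = minidom.parseString(
    "<people>\n  <person>Ann</person>\n  <person>Bob</person>\n</people>"
).documentElement
first = first_element_child(root)
second = next_element_sibling(first)
print(len(root.childNodes), child_element_count(root))
print(first.firstChild.data, second.firstChild.data)

# Mixed content: an element-only walk silently drops the surrounding text.
mixed = minidom.parseString("<p>Some <em>mixed</em> content</p>").documentElement
only_elements = [c.tagName for c in element_children(mixed)]
print(only_elements)
```

In the second document the words "Some" and "content" never show up in an element-only traversal, which is exactly the mixed content problem.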

Tuesday, March 11, 2008 (Permalink)

The Mozilla Project has posted the fourth beta of Firefox 3.0 for Mac, Linux, and Windows. This is code named "Gran Paradiso". "Firefox 3 is based on the new Gecko 1.9 Web rendering platform, which has been under development for the past 28 months and includes nearly 2 million lines of code changes, fixing more than 11,000 issues. Gecko 1.9 includes some major re-architecting for performance, stability, correctness, and code simplification and sustainability. Firefox 3 has been built on top of this new platform resulting in a more secure, easier to use, more personal product with a lot under the hood to offer website and Firefox add-on developers. [Improved in Beta 4!] Firefox 3 Beta 4 includes more than 900 enhancements from the previous beta, including drastic improvements to performance and memory usage, as well as fixes for stability, platform enhancements and user interface improvements. Many of these improvements were based on community feedback from the previous beta."

Monday, March 10, 2008 (Permalink)

Sun has posted version 0.5.5 of xmlroff, an open source XSL Formatting Objects to PDF and PostScript converter. (Web site not yet updated though.) xmlroff is written in C for Linux, and relies on libxml2, libxslt, and the GLib and GObject libraries from GTK+ and GNOME (though neither GTK+ nor GNOME is required). It also needs PDFlib, FreeType2, and Fontconfig. xmlroff can be run from the command line. It also includes a libfo library. This version improves table rendering.

Friday, March 7, 2008 (Permalink)

I've posted the updated notes from today's XForms talk at SD 2008 West. I suspect I'll be retiring this one after this week. There seems to be very limited interest, and the software is just not making fast enough progress. I last gave this talk three years ago, and the progress since then has been glacial. The action's all in AJAX and, maybe, HTML 5. Waiting for third parties to finish specs and software just doesn't work in Internet time.

Thursday, March 6, 2008 (Permalink)

Microsoft has posted the first public beta of Internet Explorer 8 for Windows:

Beta 1 is a developer preview for web designers and developers to help prepare their websites for the launch of Internet Explorer 8. Some of the new features designed for developers include a developer toolbar and improved interoperability and compatibility.

Internet Explorer 8 is designed to work in standards mode out of the box. However, Microsoft provides a way for users to browse the web in a way similar to Internet Explorer 7 by using the emulate Internet Explorer 7 button on the chrome.

Wednesday, March 5, 2008 (Permalink)

Updates have been and likely will continue to be a little slow this week since I'm busy at SD 2008 West. However, I have posted the notes from my first two sessions, RSS, Atom, APP, and All That and Native XML Databases.

Saturday, March 1, 2008 (Permalink)

The W3C has posted the first public working draft of SKOS Simple Knowledge Organization System Primer:

SKOS — Simple Knowledge Organisation System — provides a model for expressing the basic structure and content of concept schemes such as thesauri, classification schemes, subject heading lists, taxonomies, folksonomies, and other types of controlled vocabulary. As an application of the Resource Description Framework (RDF) SKOS allows concepts to be documented, linked and merged with other data, while still being composed, integrated and published on the World Wide Web.

This document is an implementors guide for those who would like to represent their concept scheme using SKOS.

In basic SKOS, conceptual resources (concepts) can be identified using URIs, labelled with strings in one or more natural languages, documented with various types of notes, semantically related to each other in informal hierarchies and association networks, and aggregated into distinct concept schemes.

In advanced SKOS, conceptual resources can be mapped to conceptual resources in other schemes and grouped into labelled or ordered collections. Concept labels can also be related to each other. Finally, the SKOS vocabulary itself can be extended to suit the needs of particular communities of practice.

This document is a companion to the SKOS Reference, which gives the normative reference on SKOS.

Thursday, February 28, 2008 (Permalink)

The W3C Cascading Style Sheets Working Group has posted the first public working draft of CSSOM View Module. "The APIs introduced by this specification provide authors with a way to inspect and manipulate the view information of a document. This includes getting the position of element layout boxes, obtaining the width of the viewport through script, and also scrolling an element."

Wednesday, February 27, 2008 (Permalink)

The W3C Web API Working Group has posted the first public working draft of XMLHttpRequest Level 2. "XMLHttpRequest Level 2 enhances XMLHttpRequest with new features, such as cross-site requests, progress events, and the handling of byte streams for both sending and receiving." I'm afraid I'm not familiar enough with XMLHttpRequest Level 1 to tell immediately what's new here. Anyone want to summarize?

Tuesday, February 26, 2008 (Permalink)

Addison-Wesley is looking for a few kind folks to contribute cover blurbs for Refactoring HTML, and possibly a foreword. If you're interested, drop me a line, and I'll pass your info along to my editors so you can get a preview copy of the book.

Monday, February 25, 2008 (Permalink)

XimpleWare has released VTD-XML 2.3, a free (GPL) non-extractive Java/C/C# library for processing XML that supports XPath. This appears to be an example of what Sam Wilmot calls "in situ parsing". In other words, rather than creating objects representing the content of an XML document, VTD-XML just passes pointers into the actual, real XML. (These are the abstract pointers of your data structures textbook, not C-style addresses in memory. In this case the pointers are int indexes into the file.) You don't even need to hold the document in memory. It can remain on disk. This should improve speed and memory usage, but I haven't verified that, and I don't trust their own benchmarks. Version 2.3 fixes bugs, adds more encodings, and can dump an in-memory copy of the text. However it's still not a minimally conformant XML parser, and doesn't seem likely to become one. That severely reduces my interest.
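
The general idea of non-extractive parsing can be shown with a toy in Python: instead of copying names out of the document, record where they are. This is emphatically not VTD-XML's record layout or API, and the regular expression is nowhere near a real XML tokenizer (it knows nothing of comments or CDATA sections). The `index_element_names` and `name_at` helpers are invented for the illustration.

```python
import re

# Matches the name that follows "<" in a start tag (ASCII names only).
START_TAG = re.compile(rb"<([A-Za-z_][\w.-]*)")


def index_element_names(buffer):
    """Return (offset, length) records locating each start-tag name in buffer."""
    return [(m.start(1), len(m.group(1))) for m in START_TAG.finditer(buffer)]


def name_at(buffer, record):
    """Materialize a name only when a caller actually asks for it."""
    offset, length = record
    return buffer[offset:offset + length]


buffer = b"<order><item>widget</item><item>gadget</item></order>"
records = index_element_names(buffer)
names = [name_at(buffer, record) for record in records]
print(records)
print(names)
```

The index is just a list of integer pairs; no string is created until `name_at` slices the original buffer.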

Saturday, February 23, 2008 (Permalink)

A "Rough Cut" version of Refactoring HTML is now available on Safari. For some reason, my Safari account doesn't allow me to log in and read this, so I'm not sure exactly which version is there. (I just finished reviewing the copy edits this past week.) Online access is $27.99. If you also want the printed book shipped to you when it's released--hopefully in time for JavaOne in May--the combined price is $53.98. Or you can pre-order the printed book from Amazon for $39.99.

Friday, February 22, 2008 (Permalink)

The W3C Semantic Web Deployment Working Group and XHTML 2 Working Group have posted the last call working draft of RDFa in XHTML: Syntax and Processing.

The modern Web is made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience: an event on a web page can be directly imported into a user's desktop calendar; a license on a document can be detected so that users can be informed of their rights automatically; a photo's creator, camera setting information, resolution, location and topic can be published as easily as the original photo itself, enabling structured search and sharing.

RDFa is a specification for attributes to be used with languages such as HTML and XHTML to express structured data. The rendered, hypertext data of XHTML is reused by the RDFa markup, so that publishers don't need to repeat significant data in the document content. This document only specifies the use of the RDFa attributes with XHTML. The underlying abstract representation is RDF [RDF-PRIMER], which lets publishers build their own vocabulary, extend others, and evolve their vocabulary with maximal interoperability over time. The expressed structure is closely tied to the data, so that rendered data can be copied and pasted along with its relevant structure.

The rules for interpreting the data are generic, so that there is no need for different rules for different formats; this allows authors and publishers of data to define their own formats without having to update software, register formats via a central authority, or worry that two formats may interfere with each other.

RDFa shares some use cases with microformats. Whereas microformats specify both a syntax for embedding structured data into HTML documents and a vocabulary of specific terms for each microformat, RDFa specifies only a syntax and relies on independent specification of terms (often called vocabularies or taxonomies) by others. RDFa allows terms from multiple independently-developed vocabularies to be freely intermixed and is designed such that the language can be parsed without knowledge of the specific term vocabulary being used.

This document is a detailed syntax specification for RDFa, aimed at:

  • those looking to create an RDFa parser, and who therefore need a detailed description of the parsing rules;
  • those looking to recommend the use of RDFa within their organisation, and who would like to create some guidelines for their users;
  • anyone familiar with RDF, and who wants to understand more about what is happening 'under the hood', when an RDFa parser runs.

I think I'm just about ready to declare this a dead technology. The problem is that explicit, semantic markup is simply of little to no interest to the vast majority of content creators. Not enough people are willing to put in the extra effort to identify the relevant parts. The only way to find licenses on a document, events on a web page, and so forth is to make the computers clever enough to recognize such things from the plain text content and associated formatting.

If RDF, XML, Schemas, and everything else we've been working on for the last ten years hasn't succeeded in breaking the WYSIWYG barrier yet, why do we think they're suddenly going to do so now? Sooner or later, it's time to admit that the enterprise is fundamentally misguided. For RDF and the semantic web that time has come.

Thursday, February 21, 2008 (Permalink)

The W3C Voice Browser, Web APIs, and Web Application Formats (WAF) Working Groups have posted a new draft of Access Control for Cross-site Requests (formerly "Enabling Read Access for Web Resources" and "Authorizing Read Access to XML Content Using the <?access-control?> Processing Instruction 1.0"). According to the draft,

Cross-site requests are possible using the HTML img and script elements for instance. However, it is not possible to exchange the contents of resources or manipulate resources "cross-domain". This is to prevent information leakage and to ensure that a malicious site cannot delete your calendar data with cross-site requests using the HTTP DELETE method.

The policy this document introduces allows a resource to opt-in to allowing cross-site data retrieval of it and also enables a mechanism based on the same policy to allow a resource to opt-in to requests using an HTTP method other than GET. This policy builds on top of the existing restrictions already in place. The policy described in this document can only be used by a technology, such as XMLHttpRequest or XBL, when the respective specification of that technology describes how it applies.

The access control policy is defined in the resource that might be obtained and is expected to be enforced by the client that retrieves and processes the resource. Thus the client is trusted and acts as a policy enforcement point.

If you have a simple text resource residing at http://example.com/hello which contains the string "Hello World!" and you would like the hello-world.invalid domain to be able to access it the resource would look as follows (including one HTTP header that is significant):

Access-Control: allow <hello-world.invalid>

Hello World!

The hello-world.invalid can now access this document using XMLHttpRequest for instance with the following code:

// Cross-site GET; the client enforces the resource's Access-Control policy
var client = new XMLHttpRequest();
client.open("GET", "http://example.com/hello");
client.onreadystatechange = function() { /* do something */ };
client.send();

I've had this one explained to me repeatedly, and I still don't understand exactly what's going on here or why it isn't a security hole, but I guess there's a use case for it.
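
As best I can tell, the client-side enforcement step for the example above amounts to something like the following. This is only a sketch of the simplest case, a header of the form allow <host>; the function names parseAllowedHosts and isAccessAllowed are my own inventions, not anything from the draft, and anything beyond that simple form is deliberately not handled.

```javascript
// Hypothetical helper: extract host names from a header value such as
// "allow <hello-world.invalid>". Only this simple form is handled.
function parseAllowedHosts(headerValue) {
  const match = /^\s*allow\s+(.*)$/i.exec(headerValue || "");
  if (!match) return [];
  const hosts = [];
  const pattern = /<([^<>\s]+)>/g;
  let item;
  while ((item = pattern.exec(match[1])) !== null) {
    hosts.push(item[1].toLowerCase());
  }
  return hosts;
}

// Hypothetical check a client might run before exposing the response
// to a script that was loaded from requestingHost.
function isAccessAllowed(headerValue, requestingHost) {
  return parseAllowedHosts(headerValue).includes(requestingHost.toLowerCase());
}

console.log(isAccessAllowed("allow <hello-world.invalid>", "hello-world.invalid"));
console.log(isAccessAllowed("allow <hello-world.invalid>", "evil.invalid"));
```

The point of the sketch is that the server only states the policy; it is the browser that decides whether the requesting page gets to see the response.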

Wednesday, February 20, 2008 (Permalink)

Eve Maler and Jeanne El Andaloussi have published Developing SGML DTDs: From Text to Model to Markup on the Web. A tad dated, but there's still a lot of good stuff here.

Tuesday, February 19, 2008 (Permalink)

The W3C Internationalization Tag Set Working Group has posted the finished note on Best Practices for XML Internationalization. "This document provides a set of guidelines for developing XML documents and schemas that are internationalized properly. Following the best practices describes here allow both the developer of XML applications, as well as the author of XML content to create material in different languages."

Monday, February 18, 2008 (Permalink)

The W3C CSS working group has posted the last call working draft of CSS Module: Namespaces. This module "defines the syntax for using namespaces in CSS. It defines the @namespace rule for declaring the default namespace and binding namespaces to namespace prefixes, and it also defines a syntax that other specifications can adopt for using those prefixes in namespace-qualified names."

Given the namespace declarations:

@namespace toto "http://toto.example.org";
@namespace "http://example.com/foo";

In a context where the default namespace applies

toto|A
represents the name A in the http://toto.example.org namespace.
|B
represents the name B that belongs to no namespace.
*|C
represents the name C in any namespace, including no namespace.
D
represents the name D in the http://example.com/foo namespace.
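
The four cases above can be sketched as a small lookup. This is an illustration only, not how any CSS engine is actually written; the function name resolveQualifiedName and the use of "*" and null as markers are assumptions of mine, as is treating an undeclared prefix as a thrown error.

```javascript
// Prefix bindings from the two @namespace rules in the example above.
const prefixes = { toto: "http://toto.example.org" };
const defaultNamespace = "http://example.com/foo";

// Hypothetical helper: returns { namespace, localName }, where namespace is
// a URI string, null for "no namespace", or "*" for "any namespace".
function resolveQualifiedName(name) {
  const bar = name.indexOf("|");
  if (bar === -1) {
    // No bar at all: the default namespace applies.
    return { namespace: defaultNamespace, localName: name };
  }
  const prefix = name.slice(0, bar);
  const localName = name.slice(bar + 1);
  if (prefix === "") return { namespace: null, localName };
  if (prefix === "*") return { namespace: "*", localName };
  if (!Object.prototype.hasOwnProperty.call(prefixes, prefix)) {
    // This sketch simply rejects prefixes that were never declared.
    throw new Error("Undeclared namespace prefix: " + prefix);
  }
  return { namespace: prefixes[prefix], localName };
}

for (const name of ["toto|A", "|B", "*|C", "D"]) {
  console.log(name, resolveQualifiedName(name));
}
```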

Saturday, February 16, 2008 (Permalink)

Sun has posted version 0.5.4 of xmlroff, an open source XSL Formatting Objects to PDF and PostScript converter. xmlroff is written in C for Linux, and relies on libxml2, libxslt, and the GLib and GObject libraries from GTK+ and GNOME (though neither GTK+ nor GNOME is required). It also needs PDFlib, FreeType2, and Fontconfig. xmlroff can be run from the command line. It also includes a libfo library. This version fixes bugs.

Friday, February 15, 2008 (Permalink)

IBM developerWorks has published my look ahead at The future of XML: How will you use XML in years to come?.

Wednesday, February 13, 2008 (Permalink)

The Mozilla Project has posted the third beta of Firefox 3.0 for Mac, Linux, and Windows. This is code named "Gran Paradiso". "Firefox 3 is based on the new Gecko 1.9 Web rendering platform, which has been under development for the past 28 months and includes nearly 2 million lines of code changes, fixing more than 11,000 issues. Gecko 1.9 includes some major re-architecting for performance, stability, correctness, and code simplification and sustainability. Firefox 3 has been built on top of this new platform resulting in a more secure, easier to use, more personal product with a lot under the hood to offer website and Firefox add-on developers. [Improved in Beta 3!] Firefox 3 Beta 3 includes approximately 1300 individual changes from the previous beta, including fixes for stability, performance, memory usage, platform enhancements and user interface improvements. Many of these improvements were based on community feedback from the previous beta." I recommend skipping this release unless you need to test your own site. It's been breaking some Web-2.0ish sites.

Tuesday, February 12, 2008 (Permalink)

The Mozilla Project has released Firefox 2.0.0.12. "This release fixes a number of security and stability issues discovered in Firefox 2.0.0.11." All users should upgrade.

New versions of SeaMonkey and Camino with these fixes have also been posted.

Sunday, February 10, 2008 (Permalink)

The W3C XML Core Working Group has published a proposed edited recommendation of XML 1.0, fifth edition. "This fifth edition is not a new version of XML. As a convenience to readers, it incorporates the changes dictated by the accumulated errata (available at http://www.w3.org/XML/xml-V10-4e-errata) to the Fourth Edition of XML 1.0, dated 16 August 2006. In particular, erratum [E09] relaxes the restrictions on element and attribute names, thereby providing in XML 1.0 the major end user benefit currently achievable only by using XML 1.1."

Hmm, this certainly looks like a new version of XML to me. The BNF has changed and previously malformed documents are suddenly well-formed. Existing parsers cannot handle the syntax defined by this draft. XML 1.1 has failed so now the W3C is trying to rewrite history and pretend that this is what they meant all along. (If that were true, why did we waste so much time on XML 1.1?) Apparently stability of standards is no longer a virtue at the W3C. This proposed edit is unnecessary and actively harmful to the community. It should be rejected.
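
To make the change concrete, here is a rough sketch of the relaxed name rule, assuming the fifth edition adopts the same Name production as XML 1.1 (which is how I read the draft). The ranges are transcribed by hand and should be checked against the spec before anyone relies on them; the function name isFifthEditionName is hypothetical, and namespace well-formedness is not considered.

```javascript
// Character ranges for the first character of a name (NameStartChar),
// written as the inside of a regular expression character class.
const NAME_START =
  ":A-Z_a-z\\u{C0}-\\u{D6}\\u{D8}-\\u{F6}\\u{F8}-\\u{2FF}\\u{370}-\\u{37D}" +
  "\\u{37F}-\\u{1FFF}\\u{200C}-\\u{200D}\\u{2070}-\\u{218F}\\u{2C00}-\\u{2FEF}" +
  "\\u{3001}-\\u{D7FF}\\u{F900}-\\u{FDCF}\\u{FDF0}-\\u{FFFD}\\u{10000}-\\u{EFFFF}";

// Subsequent characters (NameChar) add hyphen, period, digits, and a few more.
const NAME_CHAR =
  NAME_START + "\\-.0-9\\u{B7}\\u{300}-\\u{36F}\\u{203F}-\\u{2040}";

const NAME_PATTERN = new RegExp("^[" + NAME_START + "][" + NAME_CHAR + "]*$", "u");

// Hypothetical helper: true if s matches the relaxed Name production.
function isFifthEditionName(s) {
  return NAME_PATTERN.test(s);
}

console.log(isFifthEditionName("element"));   // plain ASCII name
console.log(isFifthEditionName("1element"));  // starts with a digit
console.log(isFifthEditionName("\u1200"));    // an Ethiopic syllable
```

A name such as the Ethiopic one in the last line is, as far as I know, exactly the kind of thing a fourth edition parser rejects as malformed, which is why I say existing parsers cannot handle this draft.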

Saturday, February 9, 2008 (Permalink)

Code Synthesis has released XSD 3.1.0, a free-as-in-speech (GPL) C++ W3C XML Schema to C++ data binding library. New features in this release include support for xsi:type and substitution groups.

Friday, February 8, 2008 (Permalink)

The Mozilla Project has posted version 0.8.4 of its XForms extension for Firefox. Mozilla XForms support has been developed by IBM, Novell, and independent contributors. It's not a complete XForms 1.0 or 1.1 implementation yet, but it's getting there.

Thursday, February 7, 2008 (Permalink)

Another day, another WordPress security bug. Matt Mullenweg has released WordPress 2.3.3, an open source (GPL) blog engine based on PHP and MySQL. "If you have registration enabled a flaw was found in the XML-RPC implementation such that a specially crafted request would allow a user to edit posts of other users on that blog. In addition to fixing this security flaw, 2.3.3 fixes a few minor bugs." All users should upgrade.

Saturday, February 2, 2008 (Permalink)

The W3C Semantic Web Deployment Working Group has published a new draft of Best Practice Recipes for Publishing RDF Vocabularies. "This document describes best practice recipes for publishing vocabularies or ontologies on the Web (in RDF Schema or OWL). The features of each recipe are described in details, so that vocabulary designers may choose the recipe best suited to their needs. Each recipe introduces general principles and an example configuration for use with an Apache HTTP server (which may be adapted to other environments). The recipes are all designed to be consistent with the architecture of the Web as currently specified." This contains six recipes:

  • Recipe 1. Minimal configuration for a 'hash namespace'
  • Recipe 2. Minimal configuration for a 'slash namespace'
  • Recipe 3. Extended configuration for a 'hash namespace'
  • Recipe 4. Extended configuration for a 'slash namespace', using a single HTML document
  • Recipe 5. Extended configuration for a 'slash namespace', using multiple HTML documents
  • Recipe 6. Extended configuration for a 'slash namespace', using multiple HTML documents and a query service

Thursday, January 31, 2008 (Permalink)

The W3C XML Core Working Group has published the proposed recommendation Canonical XML 1.1. This attempts to address some of the weirdnesses of Canonical XML, such as the movement of xml:id attributes from one element to another and breaking of base URLs when canonicalizing.

Wednesday, January 30, 2008 (Permalink)

According to XiTi Monitor, Internet Explorer's share of the browser market has dropped to 66.1% and Firefox has risen to 28%. Opera and Safari trail behind with 3.3% and 2% respectively. Firefox seems to be catching on faster in Europe than the U.S. and Asia.

However one has to be a little skeptical of these numbers since XiTi doesn't seem to provide any indication of the uncertainty in their figures. It's hard to believe they can really be accurate to ±0.1% in all these different countries.

Monday, January 28, 2008 (Permalink)

The W3C Synchronized Multimedia Working Group has published what is both the first public and last call working draft of SMIL Timesheets 1.0. "This document defines an XML timing language that makes SMIL 3.0 element and attribute timing control available to a wide range of other XML languages. This language allows SMIL timing to be integrated into a wide variety of a-temporal languages, even when several such languages are combined in a compound document. Because of its similarity with external style and positioning descriptions in the Cascading Style Sheet (CSS) language, this functionality has been termed SMIL Timesheets."

This was formerly part of the SMIL 3.0 spec, so making the same document both first and last draft is not as strange as it seems. Comments are due by February 15.

Friday, January 25, 2008 (Permalink)

RDF, OWL, SPARQL, and now SKOS, the Simple Knowledge Organization System Reference. "Using SKOS, conceptual resources can be identified using URIs, labeled with lexical strings in one or more natural languages, documented with various types of note, linked to each other and organized into informal hierarchies and association networks, aggregated into concept schemes, and mapped to conceptual resources in other schemes. In addition, labels can be related to each other, and conceptual resources can be grouped into labeled and/or ordered collections." How many of these things do we need before the Semantic Web is here? I think Clay Shirky was right: it really is turtles all the way up. The Semantic Web is like an undergraduate paper: never really completed, just abandoned at the point of exhaustion.

Thursday, January 24, 2008 (Permalink)

The W3C has posted the first three working drafts covering OWL 1.1:

OWL 1.1 Web Ontology Language: Mapping to RDF Graphs
"OWL 1.1 extends the W3C OWL Web Ontology Language with a small but useful set of features that have been requested by users, for which effective reasoning algorithms are now available, and that OWL tool developers are willing to support. The new features include extra syntactic sugar, additional property and qualified cardinality constructors, extended datatype support, simple metamodelling, and extended annotations. This document provides a mapping from the functional-style syntax of OWL 1.1 to the RDF exchange syntax for OWL 1.1, and vice versa."
OWL 1.1 Web Ontology Language: Model-Theoretic Semantics
"This document provides a model-theoretic semantics for OWL 1.1."
OWL 1.1 Web Ontology Language: Structural Specification and Functional-Style Syntax
"This document defines a functional-style syntax for OWL 1.1, and provides an informal discussion of the meaning of the additional constructs. As well, an informational structural specification of OWL 1.1 ontologies is provided."

Wednesday, January 23, 2008 (Permalink)

The W3C RDF Data Access Working Group has published the finished recommendations of SPARQL Query Results XML Format, SPARQL Protocol for RDF, and SPARQL Query Language for RDF. According to the latter, "RDF is a directed, labeled graph data format for representing information in the Web. This specification defines the syntax and semantics of the SPARQL query language for RDF. SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware. SPARQL contains capabilities for querying required and optional graph patterns along with their conjunctions and disjunctions. SPARQL also supports extensible value testing and constraining queries by source RDF graph. The results of SPARQL queries can be results sets or RDF graphs."

Tuesday, January 22, 2008 (Permalink)

The W3C HTML Working Group has posted the first public working draft of their version of HTML 5. I haven't had time to do a side-by-side compare, but at first glance this seems to be essentially the same as the current work in the WhatWG. Perhaps there's a little less focus on parsing models. There's also a nice summary of the differences from HTML 4. I noticed for the first time that the acronym element (which I actually use on this site) has been removed.

Monday, January 21, 2008 (Permalink)

The W3C Synchronized Multimedia Working Group has posted the candidate recommendation of the Synchronized Multimedia Integration Language 3.0 (SMIL 3.0). SMIL 3.0 has four goals:

  • Define an XML-based language that allows authors to write interactive multimedia presentations. Using SMIL, an author can describe the temporal behaviour of a multimedia presentation, associate hyperlinks with media objects and describe the layout of the presentation on a screen.
  • Allow reusing of SMIL syntax and semantics in other XML-based languages, in particular those who need to represent timing and synchronization. For example, SMIL components are used for integrating timing into XHTML [XHTML10] and into SVG [SVG].
  • Extend the functionalities contained in the SMIL 2.1 [SMIL21] into new or revised SMIL 3.0 modules.
  • Define new SMIL 3.0 Profiles incorporating features useful within the industry.

Sunday, January 20, 2008 (Permalink)

Michael Kay has released version 9.0.0.3 of Saxon, his XSLT 2.0 and XQuery processor for Java and .NET. This is a bug fix release, a "Maintenance Release clearing all known bugs up to 18 Jan 2008."

Saxon is published in two versions, both of which require Java 1.4 or later (or .NET). Saxon 9.0B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 9.0 SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standing XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers."

Saturday, January 19, 2008 (Permalink)

Henry S. Thompson and Richard Tobin have released XSV 3.1, a partial W3C XML Schema Validator for Linux and Windows. There's also a web form based interface. This is a bug fix release. XSV is published under the GNU General Public License.

Friday, January 18, 2008 (Permalink)

Microsoft has released Mac Office 2008 with support for the Office Open XML document format. Other new features include a simplified toolbar; a toolbox for managing formatting, clip art, research, and bibliography palettes; and a publishing layout. However, Visual Basic for Applications has been removed in this release. Anyone who needs that will have to stick with the previous release or use Windows. :-( Mac Office 2008 is about $131 depending on where you buy it.

Thursday, January 17, 2008 (Permalink)

Daniel Veillard has released version 2.6.31 of libxml2, the open source XML C library for Gnome. This release fixes assorted bugs including a serious security issue in UTF-8 handling. All users should upgrade.

Wednesday, January 16, 2008 (Permalink)

XMLMind has released Qizx/db 2.0, a $3200 closed source, embeddable native XML database engine written in Java that supports XQuery 1.0.

Tuesday, January 15, 2008 (Permalink)

The Free Software Foundation has released GNU IceCat (a.k.a. Gnuzilla), the GNU version of the Mozilla Firefox web browser.

While the basic Mozilla Firefox source code is free software, and we thank them for their significant contributions to the community, some non-free files are distributed in the Firefox source tree, and Firefox can recommend non-free plugins. IceCat is entirely free.

In addition, GNU IceCat includes some privacy protection features:

  1. Some sites refer to zero-size images on other hosts to keep track of cookies. When IceCat detects this mechanism it blocks cookies from the site hosting the zero-length image file. (It is possible to re-enable such a site by removing it from the blocked hosts list.)

  2. Other sites rewrite the host name in links redirecting the user to another site, mainly to "spy" on clicks. When this behavior is detected, IceCat shows a message alerting the user.

Monday, January 14, 2008 (Permalink)

The W3C XHTML 2 working group has posted the first public working draft of XHTML Access Module: Module to enable generic document accessibility. This module defines access, an empty element that can carry activate, key, targetid, and targetrole attributes.

  • The activate attribute indicates whether a target element should be activated or not once it obtains focus.
  • The key attribute assigns a key mapping to an access shortcut. Triggering an access key defined in an access element changes focus to the next element in navigation order from the current focus that has one of the referenced role or id values.
  • The targetid attribute specifies one or more IDREFs related to target elements for the associated event.
  • The targetrole attribute specifies a space separated list of CURIEs that maps to an element with a role attribute with the same value.

Sunday, January 13, 2008 (Permalink)

The W3C Ubiquitous Web Applications Working Group has posted the first working draft of Delivery Context Ontology:

The Delivery Context Ontology provides a formal model of the characteristics of the environment in which devices interact with the Web. The delivery context includes the characteristics of the device, the software used to access the Web and the network providing the connection among others.

The delivery context is an important source of information that can be used to adapt materials from the Web to make them useable on a wide range of different devices with different capabilities.

The ontology is formally specified in the Web Ontology Language [OWL]. This document describes the ontology and gives details of each property that it contains.

Saturday, January 12, 2008 (Permalink)

The XML Apache Project has released Batik 1.7, an open source SVG display engine based on Java 2D. New features in 1.7 include xml:id, data URIs, DOM Level 3 ElementTraversal, an improved WMF transcoder, and a few SVG 1.2 features including handler elements.

Friday, January 11, 2008 (Permalink)

DataDirect Technologies has released XML Converters 3.1, Java and .NET components that provide XML access (SAX, StAX, and DOM) to non-XML files including EDI, flat files and other legacy formats. Version 3.1 adds support for Standard Exchange Format (SEF) and Health Level Seven (HL7), "the health industry's standard for the exchange, management and integration of healthcare information to support patient care. DataDirect XML Converters version 3.1 for both Java and .NET include an implementation of the HL7 standard from the draft 2.1 to the current 2.5 release, across all messages and events". Pricing is roughly $1000 per format converted.

Thursday, January 10, 2008 (Permalink)

NewsGator has released version 3.1 of NetNewsWire, a closed source feed reader for the Mac. The big change in this release is that it's now free-as-in-beer. They've apparently discovered that selling customers' private reading details is a more profitable enterprise than selling software so they want to get it into as many hands as possible. Personally I prefer network clients that don't send my subscriptions and reading lists back to the mother ship.

Wednesday, January 9, 2008 (Permalink)

Altsoft N.V. has released Xml2PDF 2007 1.2, a payware Windows program for converting XSL-FO, SVG, WordML, and XHTML documents into PDF files. New features in this release include Custom XML in Word 2007 source and a COM interface.

Tuesday, January 8, 2008 (Permalink)

The DBIS Group at the University of Konstanz has released BaseX 4.0, an open source native XML database with a GUI frontend that supports most of XQuery 1.0 and some of XQuery Full-Text. It seems to be written in Java, so one presumes it's platform independent.

Monday, January 7, 2008 (Permalink)

The Mac Mini that hosts xom.nu, The Cafes, and Mokka mit Schlag seems to have died. I'll bring them back as soon as I can, but it may take a couple of days.

Sunday, January 6, 2008 (Permalink)

John Cowan has released TagSoup 1.2, an open source, Java-language, SAX parser for nasty, ugly HTML. Version 1.2 changes the license to Apache 2.0. In addition,

  • The default content model for bogons (unknown elements) is now ANY rather than EMPTY. This is a breaking change, which I have done only because there was so much demand for it. It can be undone on the command line with the --emptybogons switch, or programmatically with "parser.setFeature(Parser.emptyBogonsFeature, true)".
  • The processing of entity references in attribute values has finally been fixed to do what browsers do. That is, a reference is only recognized if it is properly terminated by a semicolon; otherwise it is treated as plain text. This means that URIs like "foo?cdown=32&cup=42" are no longer seen as containing an instance of the cup character.
  • Several new switches have been added:
    • --doctype-system and --doctype-public force a DOCTYPE declaration to be output and allow setting the system and public identifiers.
    • --standalone and --version allow control of the XML declaration that is output. (Note that TagSoup's XML output is always version 1.0, even if you use --version=1.1.)
    • --norootbogons causes unknown elements not to be allowed as the document root element. Instead, they are made children of the default root element (the html element for HTML).
  • The TagSoup core now supports character entities with values above U+FFFF. As a consequence, the HTML schema now supports all 2,210 standard character entities from the 2007-12-14 draft of XML Entity Definitions for Characters, except the 94 which require more than one Unicode character to represent.
  • The SAX events startPrefixMapping and endPrefixMapping are now being reported for all cases of foreign elements and attributes.
  • All bugs around newline processing on Windows should now be gone.
  • A number of content models have been loosened to allow elements to appear in new and non-standard (but commonly found) places. In particular, tables are now allowed inside paragraphs, against the letter of the W3C specification.
  • Since the span element is intended for fine control of appearance using CSS, it should never have been a restartable element. This very long-standing bug has now been fixed.
  • The following non-standard elements are now at least partly supported: bgsound, blink, canvas, comment, listing, marquee, nobr, rbc, rb, rp, rtc, rt, ruby, wbr, xmp.
  • In HTML output mode, boolean attributes like checked are now output as such, rather than in XML style as checked="checked".
  • Runs of < characters such as << and <<< are now handled correctly in text rather than being transformed into extremely bogus start-tags.
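
The semicolon rule for entity references in attribute values, described in the list above, can be sketched like this. It is JavaScript written for illustration, not TagSoup's actual Java code; the function name expandAttributeEntities and the four-entry entity table are assumptions, and numeric character references are not handled.

```javascript
// A tiny illustrative entity table; a real HTML table is far larger.
const ENTITIES = { amp: "&", lt: "<", gt: ">", cup: "\u222A" };

// Expand a named reference only when it is terminated by a semicolon.
// Anything else, including unknown names, is left as plain text.
function expandAttributeEntities(value) {
  return value.replace(/&([A-Za-z][A-Za-z0-9]*);/g, (whole, name) =>
    Object.prototype.hasOwnProperty.call(ENTITIES, name) ? ENTITIES[name] : whole
  );
}

console.log(expandAttributeEntities("foo?cdown=32&cup=42")); // left untouched
console.log(expandAttributeEntities("A &cup; B"));           // reference expanded
```

Under this rule the query string in the first call comes back unchanged, because "&cup" is followed by "=" rather than a semicolon.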

Saturday, January 5, 2008 (Permalink)

Andy Clark has posted version 0.9.6 of his CyberNeko Tools HTML Parser for the Xerces Native Interface (NekoXNI). CyberNeko is written in Java. Besides the HTML parser, CyberNeko includes a generic XML pull parser, a DTD parser, a RELAX NG validator, and a DTD to XML converter. According to Clark

the implementation was updated to be compatible with the newest version of Xerces and the latest XNI API changes. And a number of outstanding bugs were fixed.

The only change that could affect users is that the minimum Java version required to run NekoHTML was increased to Java 1.3.

Friday, January 4, 2008 (Permalink)

The W3C has published the first working draft of Cool URIs for the Semantic Web:

The Semantic Web is envisioned as a decentralised world-wide information space for sharing machine-readable data with a minimum of integration costs. Its two core challenges are the distributed modelling of the world with a shared data model, and the infrastructure where data and schemas can be published, found and used. A basic question is thus how to publish information about resources in a way that allows interested users and software applications to find them.

On the Semantic Web, all information has to be expressed as statements about resources, like the members of the company Example.com are Alice and Bob or Bob's telephone number is "+1 555 262 or this Web page was created by Alice. Resources are identified by Uniform Resource Identifiers (URIs) [RFC3986]. This modelling approach is at the heart of Resource Description Framework (RDF) [RDFPrimer].

Using RDF, the statements can be published on the website of the company. Others can read the data and publish their own information, linking to existing resources. This forms a distributed model of the world.

At the same time, Web documents have always been addressed with URIs (in common parlance often referred as Uniform Resource Locators, URLs). This is useful because it means we can easily make RDF statements about Web pages, but also dangerous because we can easily mix up Web pages and the things, or resources, described on the page.

So the question is, what URIs should we use in RDF? As an example, to identify the frontpage of the Web site of Example Inc., we may use http://www.example.com/. But what URI identifies the company as an organisation, not a Web site? Do we have to serve any content—HTML pages, RDF files—at those URIs? In this document we will answer these questions according to relevant specifications. We explain how to use URIs for things that are not Web pages, such as people, products, places, ideas and concepts such as ontology classes. We give detailed examples how the Semantic Web can (and should) be realised as a part of the Web.

Thursday, January 3, 2008 (Permalink)

The OpenOffice Project has released OpenOffice 2.3.1, an open source office suite for Linux, Solaris, and Windows that saves all its files as zipped XML. "This is a minor bug fix release with no new features for users. However, as this release also fixes a security vulnerability with database files, we recommend all affected users should upgrade to this release."

Tuesday, January 1, 2008 (Permalink)

The Efficient XML Interchange (which is in fact none of those things) continues to roll along. The working group has now published three new working drafts:

The best practice I can suggest for this is to ignore it. However, the primer is probably the right place to start if you can't. Looking in the primer I almost immediately found yet another way in which EXI is not an alternative encoding of the XML infoset: it does not guarantee preservation of namespace prefixes. While one would have wished that namespace prefixes were insignificant, that ship sailed long ago. The fact is, any XML processing has to preserve namespace prefixes faithfully or everything from DTDs to XSLT breaks. I don't think XML's namespace syntax is ideal by any means, but if you don't follow it you can't reliably claim to be representing XML.



Copyright 2008 Elliotte Rusty Harold
elharo@ibiblio.org