2006 XML News

Sunday, December 31, 2006 (Permalink)

The W3C Web Services Activity has published new working drafts of Web Services Policy 1.5 - Guidelines for Policy Assertion Authors and Web Services Policy 1.5 - Primer. According to the primer,

Web services are being successfully used for interoperable solutions across various industries. One of the key reasons for interest and investment in Web services is that they are well-suited to enable service-oriented systems. XML-based technologies such as SOAP, XML Schema and WSDL provide a broadly-adopted foundation on which to build interoperable Web services. The WS-Policy and WS-PolicyAttachment specifications extend this foundation and offer mechanisms to represent the capabilities and requirements of Web services as Policies.

Service metadata is an expression of the visible aspects of a Web service, and consists of a mixture of machine- and human-readable languages. Machine-readable languages enable tooling. For example, tools that consume service metadata can automatically generate client code to call the service. Service metadata can describe different parts of a Web service and thus enable different levels of tooling support.

First, service metadata can describe the format of the payloads that a Web service sends and receives. Tools can use this metadata to automatically generate and validate data sent to and from a Web service. The XML Schema language is frequently used to describe the message interchange format within the SOAP message construct, i.e. to represent SOAP Body children and SOAP Header blocks.

Second, service metadata can describe the ‘how’ and ‘where’ a Web service exchanges messages, i.e. how to represent the concrete message format, what headers are used, the transmission protocol, the message exchange pattern and the list of available endpoints. The Web Services Description Language is currently the most common language for describing the ‘how’ and ‘where’ a Web service exchanges messages. WSDL has extensibility points that can be used to expand on the metadata for a Web service.

Third, service metadata can describe the capabilities and requirements of a Web service, i.e. representing whether and how a message must be secured, whether and how a message must be delivered reliably, whether a message must flow a transaction, etc. Exposing this class of metadata about the capabilities and requirements of a Web service enables tools to generate code modules for engaging these behaviors. Tools can use this metadata to check the compatibility of requesters and providers. Web Services Policy can be used to represent the capabilities and requirements of a Web service.

Web Services Policy is a machine-readable language for representing the capabilities and requirements of a Web service. These are called ‘policies’. Web Services Policy offers mechanisms to represent consistent combinations of capabilities and requirements, to determine the compatibility of policies, to name and reference policies and to associate policies with Web service metadata constructs such as service, endpoint and operation. Web Services Policy is a simple language that has four elements - Policy, All, ExactlyOne and PolicyReference - and one attribute - wsp:Optional.
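To make those four elements concrete, here is a minimal Python sketch (not taken from the W3C drafts) that expands a policy into its alternatives. wsp:Policy and wsp:All combine one alternative from each child, wsp:ExactlyOne offers a choice, and an assertion marked wsp:Optional="true" may be present or absent. It assumes the WS-Policy 1.5 namespace URI http://www.w3.org/ns/ws-policy. The ex: assertions are hypothetical, and this sketch does not resolve wsp:PolicyReference.

```python
import itertools
import xml.etree.ElementTree as ET

WSP = "{http://www.w3.org/ns/ws-policy}"

def alternatives(elem):
    """Expand a policy element into a list of alternatives (tuples of assertion tags)."""
    if elem.tag in (WSP + "Policy", WSP + "All"):
        # Conjunction: pick one alternative from every child and merge them.
        child_alts = [alternatives(child) for child in elem]
        return [tuple(a for alt in combo for a in alt)
                for combo in itertools.product(*child_alts)]
    if elem.tag == WSP + "ExactlyOne":
        # Choice: the alternatives of all children, side by side.
        return [alt for child in elem for alt in alternatives(child)]
    if elem.tag == WSP + "PolicyReference":
        raise NotImplementedError("this sketch does not resolve policy references")
    # Anything else is a domain-specific assertion.
    if elem.get(WSP + "Optional") in ("true", "1"):
        return [(elem.tag,), ()]  # with the assertion, or without it
    return [(elem.tag,)]

# Hypothetical policy: the ex: assertion names are made up for illustration.
POLICY = """\
<wsp:Policy xmlns:wsp="http://www.w3.org/ns/ws-policy"
            xmlns:ex="http://example.com/assertions">
  <ex:Addressing/>
  <wsp:ExactlyOne>
    <ex:TransportSecurity/>
    <ex:MessageSecurity/>
  </wsp:ExactlyOne>
  <ex:ReliableDelivery wsp:Optional="true"/>
</wsp:Policy>"""

if __name__ == "__main__":
    for alt in alternatives(ET.fromstring(POLICY)):
        print([tag.split("}")[1] for tag in alt])
```

Running it yields four alternatives: Addressing combined with each of the two security assertions, each with and without ReliableDelivery. Note that an empty wsp:Policy yields one empty alternative, while an empty wsp:ExactlyOne yields none.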

Saturday, December 30, 2006 (Permalink)

Todd Ditchendorf has released TeXMLMate 1.1, a free-beer plug-in for the Macintosh TextMate text editor that "adds an XML parsing palette to the popular TextMate text editor for Mac OS X. While editing an XML (or XHTML) document in TextMate, you can open the TeXMLMate palette to conveniently check your document for well-formedness or validity against a DTD, W3C XML Schema, RELAX NG schema, or Schematron schema."


Peter Borg has released Smultron 2.2.6, an open-source Mac OS X text editor. Features include "line numbers, support for syntax colouring for many different languages, functions list, support for text encodings, snippets, a toolbar, a status bar, HTML preview, split window, multi-document find and replace with regular expressions, possibility to show invisible characters, tabs, authenticated open and saves, command-line utility, .Mac synchronisation, full screen editing and running commands from within Smultron."

Thursday, December 28, 2006 (Permalink)

The W3C XML Protocol Working Group has published four proposed edited recommendations for SOAP:

Besides incorporating errata, the primer adds "an overview of the XML-binary Optimized Packaging, SOAP Message Transmission Optimization Mechanism and Resource Representation SOAP Header Block specifications and their usage." The adjuncts draft "incorporates changes to the SOAP Request Response Message Exchange pattern (MEP) to permit the SOAP envelope in the response to be optional, to allow for one-way message interactions." Comments are due by February 2.

Friday, December 22, 2006 (Permalink)

The W3C XML Core Working Group has published the last call working draft of Canonical XML 1.1. This attempts to address some of the weirdnesses of Canonical XML, such as the movement of xml:id attributes from one element to another and breaking of base URLs when canonicalizing.

Thursday, December 21, 2006 (Permalink)

The W3C has published a proposed edited recommendation of XML Base (Second Edition). The most significant change is the addition of a section on "Interpretation of same-document references":

RFC 3986 defines certain relative URI references, in particular the empty string and those of the form #fragment, as same-document references. Dereferencing of same-document references is handled specially. However, their use as the value of an xml:base attribute does not involve dereferencing, and XML Base processors should resolve them in the usual way. In particular, xml:base="" does not reset the base URI to that of the containing document.

Note:

Some existing processors do treat these xml:base values as resetting the base URI to that of the containing document, so the use of such values is strongly discouraged.

I think that XOM is one of the processors that does treat xml:base="" as resetting the base URI. I'll have to look into that. The question that arises is whether there should be a way to reset the base URI to that of the containing document.
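The rule in the new section can be checked with a small Python sketch. It computes each element's base URI by resolving xml:base against the inherited base, and uses the standard library's urljoin for RFC 3986 resolution. The document and URIs are hypothetical. An empty reference resolves to the current base, so xml:base="" keeps the inherited base, here the outer element's. It does not reset the base to the document URI.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def base_uris(root, document_uri):
    """Return (tag, base URI) pairs for every element, in document order."""
    result = []

    def walk(elem, inherited):
        value = elem.get(XML_BASE)
        # Resolve xml:base the usual RFC 3986 way; "" resolves to the inherited base.
        base = urljoin(inherited, value) if value is not None else inherited
        result.append((elem.tag, base))
        for child in elem:
            walk(child, base)

    walk(root, document_uri)
    return result

# Hypothetical document, assumed to have been loaded from http://example.org/doc.xml
DOC = ('<doc xml:base="http://other.example/dir/">'
       '<section xml:base=""><img xml:base="pic.png"/></section>'
       '</doc>')

if __name__ == "__main__":
    for tag, base in base_uris(ET.fromstring(DOC), "http://example.org/doc.xml"):
        print(tag, base)
```

Here section keeps http://other.example/dir/ and img resolves to http://other.example/dir/pic.png. A processor that treated xml:base="" as a reset would instead give section http://example.org/doc.xml and img http://example.org/pic.png.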

Wednesday, December 20, 2006 (Permalink)

IBM's developerWorks has published XML 2006 Return to where it all began, my wrap-up of this month's XML 2006 conference. In addition to the final emergence from the post dot-bomb malaise and the possible expansion of Bubble 2.0, several factors converged to make this one of the most interesting XML conferences since the late 90s:

  • XQuery
  • Atom
  • Web 2.0

Complete story on developerWorks.

Tuesday, December 19, 2006 (Permalink)

Advanced Software Production Line has released libaxl 0.3, a C parser for XML. However it doesn't seem to be namespace aware, which makes it of very limited use in modern environments. Also, the API misspells "annotate".


XimpleWare has released VTD-XML 1.9, a free (GPL) non-extractive Java/C/C# library for processing XML that supports XPath. This appears to be an example of what Sam Wilmot calls "in situ parsing". In other words, rather than creating objects representing the content of an XML document, VTD-XML just passes pointers into the actual, real XML. (These are the abstract pointers of your data structures textbook, not C-style addresses in memory. In this case the pointers are int indexes into the file.) You don't even need to hold the document in memory. It can remain on disk. This should improve speed and memory usage, but I haven't verified that. Version 1.9 is a bug fix release.
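The in situ idea can be illustrated with a toy Python sketch. It records element names as (offset, length) pairs into the raw bytes instead of building objects, and decodes a name only when asked. This is not VTD-XML's actual record format or API. It also ignores comments, CDATA sections, and the other markup a real parser must handle; a "<a" inside a comment, for instance, would be miscounted as a start tag.

```python
import re

# The name in a start tag or empty-element tag, e.g. "item" in <item id="1">.
# "</", "<?" and "<!" are skipped because the name must start with a letter or "_".
START_TAG = re.compile(rb"<([A-Za-z_][-A-Za-z0-9._:]*)")

def index_start_tags(data):
    """Return (offset, length) tokens pointing into data; only integer pairs are stored."""
    return [(m.start(1), m.end(1) - m.start(1)) for m in START_TAG.finditer(data)]

def name_at(data, token):
    """Decode one element name lazily, straight from the underlying bytes."""
    offset, length = token
    return data[offset:offset + length].decode("utf-8")

DATA = b'<?xml version="1.0"?><doc><item id="1"/><item id="2">text</item></doc>'

if __name__ == "__main__":
    tokens = index_start_tags(DATA)
    print(tokens)
    print([name_at(DATA, t) for t in tokens])
```

Both re and slicing also work on mmap objects, so the same two functions could in principle index a file mapped from disk without first reading it into memory as one string.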


Opera Software has released version 9.10 of their namesake free-beer web browser for Windows, Mac, Linux, FreeBSD, and Solaris. Opera supports XML, CSS, and XSLT. Version 9.10 adds phishing protection.


Syntext has released Serna 3.1, a $268 payware XSL-based WYSIWYG XML document editor for Mac OS X, Windows, and Unix. Features include on-the-fly XSL-driven XML rendering and transformation, on-the-fly XML Schema validation, XInclude, and spell checking. Version 3.1 adds several localizations including Chinese, Dutch, French, German, and Italian. It also fixes bugs.

Monday, December 18, 2006 (Permalink)

The W3C has released version 9.53 of Amaya, their open source testbed web browser and authoring tool for Solaris, Linux, Windows, and Mac OS X that supports HTML 4.01, XHTML 1.0, XHTML Basic, XHTML 1.1, HTTP 1.1, MathML 2.0, SVG, XML, RDF, XPointer, XLink, and much of CSS 2. Version 9.53 is mostly a bug fix release, but it does improve the math support somewhat. They've also released version 8.7.2, a bug fix release for the old user interface.


Recordare has released Dolet 3.5 for Finale, a $129.95 payware plug-in for reading and writing MusicXML files. This release adds support for Java 6 and fixes assorted bugs. Upgrades are $79.95. Finale is required.

Sunday, December 17, 2006 (Permalink)

Opera Software has released version 9.02 of their namesake free-beer web browser for Windows, Mac, Linux, FreeBSD, and Solaris. Opera supports XML, CSS, and XSLT. Version 9.02 is a bug fix release.

Friday, December 15, 2006 (Permalink)

JAPISoft has released EditiX 5.1, a $99 payware cross-platform XML editor written in Java. Features include XPath location and syntax error detection, context sensitive popups based on DTD, W3C XML Schema Language, and RelaxNG schemas, XSLT and XSL-FO previews, XInclude, XML catalogs, an XSLT debugger, DocBook support, and multi-view preview. Version 5.1 adds XPath 2.0 support and automatic restore for the last-open project. EditiX is available for Mac OS X, Linux, and Windows.

Thursday, December 14, 2006 (Permalink)

I've posted beta 12 of Jaxen 1.1, an open source (modified BSD license) XPath 1.0 engine for Java that is adaptable to many different object models including XOM, JDOM, DOM, and dom4j. Beta 12 is the first release candidate and has no known bugs. If no major problems are uncovered in the next couple of weeks, 1.1 will probably be released around the end of the year. Please try this release out, and let us know if you find anything problematic.

The major changes in this release are the removal of the Visitor interface and the matrix-concat extension function. We took them out because they were undocumented and buggy, and no one was willing to maintain them. If someone is willing to commit some resources (either temporal or financial) to them, they could be restored in the future. A few web site issues were fixed too.

Jaxen was originally written by James Strachan and Bob McWhirter. It is published under a modified BSD license.

Wednesday, December 13, 2006 (Permalink)

Oracle has released Berkeley DB XML 2.3.8, an open source "application-specific, embedded data manager for native XML data" based on Berkeley DB. It supports the recent proposed recommendations of XQuery 1.0 and XPath 2.0. It includes C++, Java, Perl, Python, TCL and PHP APIs. According to the announcement, "This release of Berkeley DB XML improves many aspects of query planning and execution. Using indexed node storage, users will generally experience significant speed increase. A new event-style layer allows for tight coupling between Berkeley DB XML and other XML processing code. This greatly enhances integration with programming languages and XML parsing libraries by eliminating the need to create and then re-parse XML content. The W3C XQuery 1.0 specification is nearing completion and this release of Berkeley DB XML is compliant with the current Proposed Recommendation."

I've never really played with this product, but I have a funny feeling I'm going to be looking at it a lot more closely in 2007. Berkeley DB XML is published under a custom, viral license that is compatible with most major open source licenses.

Tuesday, December 12, 2006 (Permalink)

The OpenOffice Project has released OpenOffice 2.1, an open source office suite for Linux and Windows that saves all its files as zipped XML. It also runs on the Mac with X-Windows. "The presentations application, Impress, now supports multiple monitors, with the presenter choosing where to display the presentation. The Calc spreadsheet has an improved HTML export capability, using styles to better recreate in a browser the appearance of the original spreadsheet. The database application, Base, has a number of enhancements, including improved support for Microsoft's Access product. The popular Quickstarter is now available for GNU/Linux users as a GTK application. OpenOffice.org's impressive language support is enhanced with five more localisations." OpenOffice is dual licensed under the LGPL and Sun Industry Standards Source License.

Monday, December 11, 2006 (Permalink)

The Mozilla Project has posted the first alpha of Firefox 3.0 for Mac, Linux, and Windows. This is just the beginning of reorganizing the code, and there are no new features yet, but it's nice to see work starting. The final release is not expected for another year.


The Apache Commons Team has released Digester 1.8, a SAX-based XML to object mapper, designed primarily for parsing XML configuration files though it has other uses too. Digester is configured through an XML to Java object mapping module, which triggers actions whenever a pattern of nested XML elements is recognized. Version 1.8 is now compatible with Kaffe/GNU-Classpath. There are also new setStackAction and getCurrentNamespaces methods.
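For anyone who hasn't used Digester, a minimal Python sketch shows the core idea: a SAX handler tracks the path of open elements and fires a callback whenever that path matches a registered pattern. This is not the Commons Digester API, which is Java and builds objects on a stack. The config format is hypothetical, and unlike Digester this sketch supports only exact paths, not wildcard patterns.

```python
import xml.sax

class RuleHandler(xml.sax.ContentHandler):
    """Fire a callback whenever the current element path matches a registered pattern."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules  # maps patterns such as "config/server" to callables
        self.path = []      # names of the currently open elements

    def startElement(self, name, attrs):
        self.path.append(name)
        action = self.rules.get("/".join(self.path))
        if action is not None:
            action(dict(attrs.items()))

    def endElement(self, name):
        self.path.pop()

def digest(xml_bytes, rules):
    xml.sax.parseString(xml_bytes, RuleHandler(rules))

# Hypothetical configuration file.
CONFIG = b"""<config>
  <server name="alpha" port="80"/>
  <server name="beta" port="8080"/>
</config>"""

if __name__ == "__main__":
    servers = []
    digest(CONFIG, {"config/server":
                    lambda attrs: servers.append((attrs["name"], int(attrs["port"])))})
    print(servers)
```

With the single rule above, servers ends up as [('alpha', 80), ('beta', 8080)]. The parse streams through the document once and never builds a tree.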

Sunday, December 10, 2006 (Permalink)

The W3C XSL Working Group has released the finished recommendation of Extensible Stylesheet Language (XSL) Version 1.1. Despite the name, this actually only covers XSL Formatting Objects, not XSL Transformations. New features in 1.1 include:

  • Multiple flows
  • Change marks
  • Back of the book indexing
  • Bookmarks
  • Markers in tables
  • fo:page-number-citation-last
  • fo:page-sequence-wrapper
  • clear and float inside and outside
  • prefixes and suffixes for page numbers

Saturday, December 9, 2006 (Permalink)

Antenna House, Inc. has released XSL Formatter 4.1 for Mac, Linux, and Windows. This tool converts XSL-FO files to PDF. Version 4.1 adds support for XSL 1.1, PDF 1.6, PDF/X and Tagged PDF. New features in this release are fairly minor; for instance, the printer tray can be specified.

The "lite" version costs $300 and up, but is limited to 300 pages per document and doesn't support right-to-left languages. Prices for the uncrippled version start around $1250. Support costs more.

Friday, December 8, 2006 (Permalink)

Wolfgang Meier has released eXist 1.0.1 and 1.1.1. "eXist is an Open Source native XML database featuring efficient, index-based XQuery processing, automatic indexing, extensions for full-text search, XUpdate support, XQuery update extensions and tight integration with existing XML development tools. The database implements the current XQuery 1.0 working drafts, with exception of the schema import and schema validation features defined as optional in the XQuery specification." The two versions differ primarily in the indexing scheme used. 1.1 should be faster and not have document size limits. It's more bleeding edge, but 1.0 is quite a bit too limited for my tastes.

eXist is probably the major pure XML open source (LGPL) native XML database. However, whether it's ready for production use or not, and how large an application it can support, I don't know. I tend to doubt that it's the equal of DB2 9 or Mark Logic, though I'd love to be proved wrong about that.

Thursday, December 7, 2006 (Permalink)

Microsoft's Craig Kitterman kicks off the morning by talking about "Ecma Office Open XML". That's a disingenuous name. This has nothing to do with OpenOffice. In fact, it's a direct competitor. Is there a trademark attorney in the house?

This is the default format in Office 2007. .docx is the file extension. "100% compatible with previous office documents." In other words, everything in classic Office binary files can be converted to XML with full pixel level fidelity. Licensed under Covenant Not to Sue and Open Specification Promise. The current draft spec is 6,000 pages long.

The basic message of this talk is that the format is an open standard, supported by many players. I don't buy it. ECMA is the rubber stamp of standards organizations, and any company the size of Microsoft can get a few friends to lend their names. 6,000 page specs that document legacy formats aren't open. There's no reasonable way anyone can hope to implement all of this faithfully without Microsoft's legacy code base. I doubt even Microsoft can do it. Documenting all the kinks and corner cases of a 10+ year old legacy format of one product does not turn it into a true open standard. Open standards start from scratch with full consideration for all players. They are not crippled by insistence on compatibility with decades of legacy code from one product and one company. They are independent of particular implementations. This is not a neutral file format. It vastly privileges Microsoft Office.

That said, I'm glad this exists. It is an improvement over Microsoft's classic, undocumented, binary file formats. However, it is not a plausible alternative to OpenDocument. It is far too complex and too baroque.


Paolo Marinelli and Stefano Zacchiroli from the Università di Bologna won the XML 2006 Student Scholarship with Co-constraint Validation in a Streaming Context. Paolo gets to give the morning keynote. Oh great. There's no wireless in the keynote room, again. Bleah.

This was the most technical talk I've seen at this conference. I don't think I followed it all, but the ideas seem good. The notion of automatically rewriting location paths such that reverse axes turn into forward axes was quite clever. e.g. /descendant::x[preceding-sibling::y] becomes /descendant::x[/descendant::y/following-sibling::node() == self::node()].
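The payoff of that kind of rewrite is that the predicate can be decided in a single forward pass. A rough Python sketch of the idea (my own illustration, not the authors' algorithm) evaluates /descendant::x[preceding-sibling::y] over the start and end events from iterparse. It keeps one flag per open element that records whether a y child has already gone by. When an x starts, its parent's flag answers the predicate immediately, with no need to look backwards.

```python
import io
import xml.etree.ElementTree as ET

def xs_with_preceding_y(xml_text):
    """Evaluate /descendant::x[preceding-sibling::y] in one forward pass.

    Returns the document-order positions (0-based, counting only x elements)
    of the x elements that have a y element somewhere before them among
    their siblings.
    """
    seen_y = [False]  # one flag per open element: has a y child gone by yet?
    matches, x_count = [], 0
    events = ET.iterparse(io.BytesIO(xml_text.encode("utf-8")), events=("start", "end"))
    for event, elem in events:
        if event == "start":
            if elem.tag == "x":
                if seen_y[-1]:       # the parent's flag answers the predicate
                    matches.append(x_count)
                x_count += 1
            elif elem.tag == "y":
                seen_y[-1] = True    # later siblings now have a preceding y
            seen_y.append(False)     # fresh flag for this element's children
        else:
            seen_y.pop()
    return matches

if __name__ == "__main__":
    print(xs_with_preceding_y("<r><x/><y/><x/><z><x/><y/></z><x/></r>"))
```

In the sample, only the second and fourth x elements (positions 1 and 3) follow a y sibling. The x inside z comes before its own y sibling, so it does not match.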


I rushed out of the keynote as soon as it was over and actually got to the break area in time to get some coffee this morning. They still ran out, but not before I got some.


For the first morning session, Andrew Savikas from O'Reilly talks about the Atom Publishing Protocol, APP; and there's no wireless again. They use DocBook subsets for Safari and Safari U (but not the same subset). Moving from classic paper book publishing to more continuous, adaptable publishing in a variety of formats is driving some changes in process.

APP supports the creation of arbitrary resources over the Web, not just blog entries. They publish DocBook 4.4, XHTML, PDF, and a variety of image formats. "Having to resolve 10 years of DocBook validity errors takes a while." O'Reilly chose DocBook 4.4 because it's closest to their existing content, and the DocBook XSL stylesheets don't perfectly support DocBook 5 yet (though that's improving fast). The repository is a Mark Logic native XML database.
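For readers who haven't seen APP, here is a minimal Python sketch of its basic operation, in which a client creates a resource by POSTing an Atom entry to a collection URI. The collection URI and entry content are hypothetical, and the request is only built, not sent. Authentication, media (non-Atom) resources, and handling the server's response are all left out.

```python
import uuid
import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace

def build_entry(title, text):
    """Build a small Atom entry document and return it as UTF-8 bytes."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = f"urn:uuid:{uuid.uuid4()}"
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = (
        datetime.now(timezone.utc).isoformat(timespec="seconds"))
    ET.SubElement(entry, f"{{{ATOM}}}content", type="text").text = text
    return ET.tostring(entry, encoding="utf-8")

def create_request(collection_uri, title, text):
    """Prepare (but do not send) the POST that asks the server to create the entry."""
    return urllib.request.Request(
        collection_uri,
        data=build_entry(title, text),
        headers={"Content-Type": "application/atom+xml"},
        method="POST",
    )

if __name__ == "__main__":
    req = create_request("http://example.com/app/drafts", "Chapter 1", "Draft text")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would actually send it. In APP the server, not the client, decides the new resource's URI.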


The next morning session features Michael Kay talking about Meta-stylesheets; i.e. stylesheets that generate or operate on stylesheets. XProc is useful. Pipelines are useful. Schemas are useful for debugging.


Lunch was cold wraps and cookies, not too bad but not as good as the last couple of days. (Ever since first grade, I've always been a fan of hot lunches.) I talked to a few vendors about their XML databases. IBM's DB2 seems worth a closer look.


After the lunch break, I returned to Back Bay A for the DITA (Darwin Information Typing Architecture) panel, despite the nonexistent wireless network in this room. I've heard a lot about DITA, but I'm not really sure what it does. My vague picture is sort of like DocBook but for man pages. Perhaps I'll get an idea whether or not this is worth paying further attention to.

First speaker is Alan Houser from Group Wellesley. He's giving the 30,000 foot view. DITA was developed inside IBM to replace IBMIDDOC, then donated to OASIS. DITA 1.0 is the current spec, and 1.1 is under development. It's an architecture, not just a markup language.

  • Topics are core information units; a stand alone reusable chunk of information
  • Task, concept, and reference information types
  • Specialization and generalization: domain and structural specialization
  • Attribute based formatting; like class attribute in HTML
  • Maps organize topics, blocks, and words/phrases into books, web sites etc.
  • Metadata-based filtering excludes or annotates content at runtime.
  • DITA Open toolkit

Start by creating the map file (the outline) rather than the text.
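A DITA map is essentially nested topicref elements pointing at topic files, so the outline can be read without touching any topic text. The small Python sketch below recovers it. It assumes a map that uses plain topicref elements with href and navtitle attributes; the file names and titles are hypothetical. A real DITA toolkit recognizes topicrefs, including specialized ones, by their class attribute, whereas this sketch only matches element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical map: the outline exists before any of the topic files are written.
MAP = """\
<map title="Widget User Guide">
  <topicref href="intro.dita" navtitle="Introduction"/>
  <topicref href="install.dita" navtitle="Installing">
    <topicref href="requirements.dita" navtitle="System requirements"/>
    <topicref href="setup.dita" navtitle="Running setup"/>
  </topicref>
  <topicref href="reference.dita"/>
</map>"""

def outline(map_xml):
    """Return (depth, label) pairs for each topicref, in document order."""
    result = []

    def walk(parent, depth):
        for ref in parent.findall("topicref"):
            # Fall back to the topic file name when no navtitle is given.
            result.append((depth, ref.get("navtitle") or ref.get("href")))
            walk(ref, depth + 1)

    walk(ET.fromstring(map_xml), 0)
    return result

if __name__ == "__main__":
    for depth, label in outline(MAP):
        print("  " * depth + label)
```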

Second speaker is Sean Angus from XyEnterprise talking about the RIM Blackberry and DITA. Cost of translation reduced 75%. Productivity increased 20%. Investment recouped in 14 months.

Third speaker is Scott Hudson from Flatirons Solutions. He's talking about DocBook vs. DITA. This is interesting, but I'm falling asleep anyway. I may have to split early to hit Dunkin Donuts for some coffee. Twice as much coffee as Starbucks for half the price.


I would have liked to have heard more about DITA, but instead I defected at the half to hear IBM's Elias Torres talk about the Apache Project's Abdera, a Java class library for publishing, consuming, and transmitting APP. There's no client user interface or storage backend. A file system backed storage system is planned but not yet implemented. Lotus Ventura (whatever that is) is a major user.


Kenneth Sall and Ronald R. Reck are talking about applying XQuery, which is "safe to say the hit technology of the conference" (Simon St. Laurent). Specifically, they are talking about applying XQuery and OWL to Wikipedia, the CIA World Factbook, and Project Gutenberg. They want to combine these data sources.

Problem statement: find all the Project Gutenberg books written by male European authors in the 19th century.

  1. First they need to convert Project Gutenberg from text to RDF.
  2. Find the authors of each book in Wikipedia to determine gender and time.
  3. Use the factbook to identify European countries.

Wikipedia wasn't very structured or marked up. They had to use proximity search to determine nationality.

Used the DAML OWL version of the CIA World Factbook.

eXist was their XQuery implementation. It worked well for them. Scalability is not yet known.


The final slot of the conference was the first one where I didn't find at least two sessions I really wanted to see. I decided to stick around in the Web 2.0 track to hear Harry Halpin from the University of Edinburgh talk about Social Semantic Mashups: Exploring Social Networks with Microformats and GRDDL.

Data is trapped within HTML or in relational databases behind firewalls. He wants to liberate the data and put it in a common format: RDF. Microformats are "the lower case semantic web". Many sites are using microformats including Yahoo; but data checks in. It doesn't check out. They are writing XSLT stylesheets to convert microformats documents to RDF. Social networks (LinkedIn, etc.) trap users within their own network. Oh my god! Eigenvectors! There's something I haven't seen or thought of for ten years, probably more.

I think I get GRDDL for the first time. It's just a way of linking an arbitrary namespace well-formed XML document or an XHTML document to a stylesheet that transforms that document into RDF. That's it. You can also put the transformation links in the namespace document (e.g. RDDL) rather than in the document itself.
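A short Python sketch of the discovery step makes the "that's it" concrete. It follows my reading of the GRDDL draft, so treat the details as assumptions. A generic XML document names its transformations in a transformation attribute on the root element, in the namespace http://www.w3.org/2003/g/data-view#. An XHTML page declares the profile http://www.w3.org/2003/g/data-view on head and uses link rel="transformation". The sample documents and stylesheet names are hypothetical. Relative URIs are not resolved, and the namespace-document route is not covered.

```python
import xml.etree.ElementTree as ET

GRDDL = "http://www.w3.org/2003/g/data-view"
XHTML = "{http://www.w3.org/1999/xhtml}"

def transformations(doc):
    """List the transformation URIs a GRDDL-aware agent would apply to doc."""
    root = ET.fromstring(doc)
    # Generic XML: a grddl:transformation attribute on the root element.
    found = root.get("{" + GRDDL + "#}transformation", "").split()
    # XHTML: <link rel="transformation"> in a head that declares the GRDDL profile.
    head = root.find(XHTML + "head")
    if head is not None and GRDDL in head.get("profile", "").split():
        for link in head.iter(XHTML + "link"):
            if "transformation" in link.get("rel", "").split():
                found.append(link.get("href"))
    return found

XML_DOC = ('<prefs xmlns:grddl="http://www.w3.org/2003/g/data-view#" '
           'grddl:transformation="prefs2rdf.xsl"/>')
XHTML_DOC = ('<html xmlns="http://www.w3.org/1999/xhtml">'
             '<head profile="http://www.w3.org/2003/g/data-view">'
             '<title>Contacts</title>'
             '<link rel="transformation" href="hcard2rdf.xsl"/>'
             '</head><body/></html>')

if __name__ == "__main__":
    print(transformations(XML_DOC), transformations(XHTML_DOC))
```

Actually running the stylesheets to produce RDF is a separate step that needs an XSLT processor. The Python standard library does not include one.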

I still don't buy the semantic web though (upper or lower case).


Searched Bloglines for various other people commenting on XML 2006 from the conference. People I've noted include:

Wednesday, December 6, 2006 (Permalink)

The morning begins with yet another sponsored vendor presentation from Just Systems, a Japanese company that's acquired XMetal and is branching out. They're doing client side mashups, but not in a standard browser.


There are two kinds of conference keynotes. Some keynotes are given by techie celebrities talking about interesting things. The second kind are given by some CEO or marketing flack you've never heard of, talking about their company's "vision". The crucial difference between the two is that the conference usually pays a large amount of money for the first and is paid a large amount of money for the second. It's the difference in whether the conference is selling the speakers to the attendees or the attendees to the speakers. Most conferences try to do a little of both to greater or lesser degrees.

This one may actually be the first type. The speaker is Darrin McBeath from Reed Elsevier on Unleashing the Power of XML. Elsevier is not a conference sponsor, and is not selling us a product. They're an XML user, not a vendor. They use DTDs for documents, W3C schemas for some services. They've played with web services, but without a lot of success. They can only make this work with people they've been working with before, and whom they have contracts with; not with end users. (He seems a little surprised by this, but it's pretty much exactly what I'd expect from a SOAP-based system.) He claims web services haven't had a large impact on publishing. That doesn't surprise me either.

Namespaces are a major weakness of XML, mostly due to complexity. Schemas are too complex. In general, complexity of any kind and from any source seems like a big problem for him. Only the techies understand XML.

3 papers on XQuery last year at XML 2005. 9 papers this year. I'm not sure that reflects anything real, or just the preferences of the referees. Most publishers are not yet using either hybrid (XQuery+SQL) or native (XQuery only) XML databases. If anything they're using Saxon or DataDirect XQuery. XQuery+XML database does speed up the development of publishing applications, and the execution speed of these applications. Unlike other search engines, XQuery has no predefined granularity.


Oh damn. Wireless is broken again. I have connectivity. It looks like a DNS problem.


Note to speakers: don't tell us how big your company is, how many employees you have, how much revenue you had last year, how long you've been in business, the number of Ph.D's you've hired, the percentage of Lexus's in the HQ parking lot, or your CEO's golf handicap. We don't care!


First conference session of the morning is Yahoo's Douglas Crockford on "JSON, the Fat Free Alternative to XML". Of course, a fat free diet can kill you. I intend to listen very closely to this talk for advocacy of any nutritional deficiencies, fad diets, or eating disorders. I plan to explain why this approach is broken in my first panel session this afternoon. I expect I'll later write that up in a Cafes article.

"XML on the Web has effectively died" -- Simon St. Laurent.

Douglas Crockford's question is "How should the data be delivered?" More accurately, in what format? He has programmer-colored glasses. He wants all his data to look like a programming language's data structure, specifically, JavaScript. I will explain exactly what is wrong with this goal this afternoon.

  • JSON is language independent, text-based (Unicode; autodetected encoding), lightweight, easy-to-parse.
  • Only for data, not documents. (a false dichotomy, IMNSHO)
  • Based on quasi-literal notation.
  • Types include integers, reals, strings, booleans, null.
  • Objects are collections of name-value pairs. (Really it's just an associative array.)
  • Names are strings that need not be unique (though uniqueness is recommended).
  • Arrays are ordered lists.
  • application/json MIME type. "It appears that compliance with formal standards causes things to break."
  • No version.
  • YAML is a superset of JSON.
  • "JSON has become the X in AJAX"
  • eval is a security risk; use parseJSON instead
  • There are other security issues I don't fully understand
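Two of those points are easy to show in a language other than JavaScript. In this Python sketch, the analogue of avoiding eval is json.loads, which only builds data and never runs code. Because duplicate names are legal but discouraged, a hook can reject them. By default, Python's json module silently keeps the last value for a repeated name. The sample strings are made up.

```python
import json

def reject_duplicate_names(pairs):
    """object_pairs_hook that refuses objects whose names repeat."""
    names = [name for name, _ in pairs]
    if len(names) != len(set(names)):
        raise ValueError(f"duplicate names in JSON object: {names}")
    return dict(pairs)

def parse_untrusted(text):
    # json.loads only builds data structures; unlike eval it never runs code.
    return json.loads(text, object_pairs_hook=reject_duplicate_names)

if __name__ == "__main__":
    print(parse_untrusted('{"name": "XOM", "version": 1.1, "tags": ["xml", "java"], '
                          '"final": true, "parent": null}'))
    try:
        parse_untrusted('{"a": 1, "a": 2}')
    except ValueError as err:
        print("rejected:", err)
```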

"Because the consumer of your data and the generator of your data both tend to be written in programming languages, I contend it's not a problem."

He explains objections to JSON:

  • Namespaces
  • No Validators
  • Not extensible
  • Not XML

I can't type fast enough to explain his arguments or rebut them, but he's missing a lot. He has a very narrow view of the world. A little more on that this afternoon.

There's a JSON-XML mapping, but someone screwed up. It's not fully round-trippable. I don't see any fundamental reason it couldn't be. It just isn't. (XML->JSON is much trickier.)
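I don't see any fundamental obstacle for the JSON-to-XML direction either. A minimal Python sketch shows one approach. It uses its own hypothetical element vocabulary (object, array, string, number, boolean, null), with object member names carried in a name attribute. This is not the mapping discussed in the talk. It converts JSON values to XML and back without loss for the sample used here. Strings containing characters XML 1.0 forbids, or carriage returns that XML parsers normalize, would need extra escaping that this sketch does not attempt.

```python
import json
import xml.etree.ElementTree as ET

def to_xml(value, name=None):
    """Map a JSON value (as parsed by json.loads) to a typed XML element."""
    if isinstance(value, bool):  # check bool before int: True is an int in Python
        elem = ET.Element("boolean")
        elem.text = "true" if value else "false"
    elif value is None:
        elem = ET.Element("null")
    elif isinstance(value, (int, float)):
        elem = ET.Element("number")
        elem.text = json.dumps(value)  # reuse JSON's own number syntax
    elif isinstance(value, str):
        elem = ET.Element("string")
        elem.text = value
    elif isinstance(value, list):
        elem = ET.Element("array")
        elem.extend([to_xml(item) for item in value])
    elif isinstance(value, dict):
        elem = ET.Element("object")
        elem.extend([to_xml(v, k) for k, v in value.items()])
    else:
        raise TypeError(f"not a JSON value: {value!r}")
    if name is not None:
        elem.set("name", name)
    return elem

def from_xml(elem):
    """Invert to_xml."""
    if elem.tag == "null":
        return None
    if elem.tag == "boolean":
        return elem.text == "true"
    if elem.tag == "number":
        return json.loads(elem.text)
    if elem.tag == "string":
        return elem.text or ""  # an empty string serializes as an empty element
    if elem.tag == "array":
        return [from_xml(child) for child in elem]
    if elem.tag == "object":
        return {child.get("name"): from_xml(child) for child in elem}
    raise ValueError(f"unexpected element: {elem.tag}")

if __name__ == "__main__":
    data = {"name": "Jaxen", "version": 1.1, "models": ["XOM", "JDOM"], "final": False}
    xml_text = ET.tostring(to_xml(data), encoding="unicode")
    print(xml_text)
    print(from_xml(ET.fromstring(xml_text)) == data)
```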

JSON used to have comments, but people started putting metadata into comments, so they took them out. (Maybe people do need metadata?)


Next in this track, Jason Hunter discusses Web Publishing 2.0 and XQuery. Content sizes are increasing. People's expectations are growing for immediate, relevant, searchable access. XML doesn't fit well into a relational model. XML is a triangle (tree) and can't fit into rectangular tables (SQL). Excellent visualization. I'll have to remember that one.

He misses one of the crucial lessons of Google, though. Fielded search is powerful for professional librarians, but a failure for end users such as doctors (or really, anyone).

XML is the raw negative. It gives the publisher more information than Google (which only sees the HTML) so they can do better search. (True, but not important outside a few very special areas like Lexis-Nexis). Most searchers want broader search rather than more specialized.

He sees a lot of Web 2.0 startups using MySQL that he thinks should be using a native XML database. Well, maybe; but until there's a decent open source native XML database that's not going to happen. Mark Logic is way too expensive for a startup. Last time I checked, eXist was too unreliable. I do expect to see a solid open source native XML database, but probably not before 2008 at the earliest.

There are some interesting use cases here, such as Safari U, but it's all Web 1.0. So far I don't see any Ajax or interactivity with the client. It's all done on the back end with Mark Logic's XML database. He's proposing some personalization of RSS and so forth, but these examples are hypothetical.

XQuery has much less impedance mismatch for web apps than relational databases and Java. Hibernate == translate Java to relational. JSF == translate Java to HTML. XQuery has less translation to do.


In the first afternoon sessions, Yahoo's Dan Theurer discusses "What Powers Web2.0 mashups?" He's describing Yahoo's APIs for Mail, Answers, etc.


I don't have notes from the last two afternoon sessions because I was in them. Simon St. Laurent led a rollicking panel on Web 2.0 and XML with me, Jason Hunter, and Eric van der Vlist. I'll be writing up some of my remarks for the Cafes on the train home. Look for it this weekend or early next week.


The final session was a panel on next generation XML APIs led by Norman Walsh. I talked about XOM. Eric van der Vlist talked about TreeBind and Philippe Poulard talked about Active Tags. That session didn't go so well since none of us had enough time to really explain our approach, and everyone was rushed. Plus none of the panelists could see or comment on the others' slides.

Most of the panels at this show, including these two, were made by combining proposals the referees didn't think were worth a full 45 minute slot; and that rarely works. In the future I think papers should be accepted or rejected without offering panel positions as consolation prizes. Panels are fun, but they need to be developed as a panel, not as an unrelated collection of 15 minute sessions.

One thought about panels: if there's PowerPoint or other forms of slides, they don't work. Panels are designed for conversation, not for lecture.

Tuesday, December 5, 2006 (Permalink)

The train ran on time, but wireless was nowhere to be found so updates will be time delayed. Jason Hunter suggested EVDO. That's the one where you pay Verizon $60 a month for "unlimited" access until you actually try to use it and Verizon cuts you off. The conference program says "Wireless access will be available in the Registration Area, Tutorial Rooms, Breakout Rooms, Exposition Hall and General Session". I guess that doesn't include the classrooms. :-(


In the keynote (which I missed, so this is secondhand), Oracle announced a new Zorba native XML database.


For the first session of the morning there were at least three interesting talks. Everyone I talked to thought they should go hear Microsoft's talk about the schema adoption study, but decided to go hear the fun one about Mozilla application development instead. (That's the sort of conference this is. Mozilla application development counts as fun. Even for these geeks, W3C schemas aren't fun.)

It's a polite audience, too polite. If this were Extreme, someone would have interrupted the first speaker by now. Ken Holman tells me it's been getting smaller every year. I thought this was supposed to be the big XML conference, but it really doesn't seem any bigger than Extreme, though they do run four concurrent tracks instead of two.

Fabrice Desré from France Telecom is talking about developing applications on top of Mozilla with RDF and XUL. With JavaScript and DOM, it's too hard because of impedance mismatches between objects, XML, relational databases, and so forth. He wants to replace this with an XML centric architecture that stores all data in a native XML database (specifically Berkeley DB XML) and queries it with XQuery. He also introduces something called REX, remote events for XML. XForms are also involved somehow.

REX is new to me. (I feel a developerWorks article coming on.) It's being developed by the W3C Web API working group. It's an XML grammar for representing DOM events, so it can transmit DOM modifications from one endpoint to another. It can stream and supports timestamped events. Supported events include:

  • mutation
  • node removed
  • attribute modified
  • character data modified

XPath-like syntax is used to target the events. XQuery generates the events. (How?)
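The talk didn't show REX's actual vocabulary, so the following is only a sketch of the general idea: replaying path-targeted DOM mutations against a receiver's copy of a document. It assumes an invented `<events>`/`<event>` message format (not the REX grammar) and supports just two event kinds; `resolve` and `apply_events` are hypothetical names. Events are applied in document order, which matters because removing a node shifts the positions of its later siblings.

```python
import xml.etree.ElementTree as ET

# A hypothetical mutation message. The element and attribute names are
# invented for this sketch and are NOT the actual REX vocabulary.
MESSAGE = """<events>
  <event name="attribute-modified" target="/page/item[2]" attr="status" value="done"/>
  <event name="node-removed" target="/page/item[1]"/>
</events>"""


def resolve(root, target):
    """Resolve an absolute path such as /page/item[2] to a single element.

    Only child steps with optional [n] positions are supported, which is
    a subset of XPath that ElementTree's findall() understands.
    """
    steps = target.strip("/").split("/")
    if steps[0] != root.tag:
        raise ValueError(f"{target!r} does not start at <{root.tag}>")
    relative = "/".join(steps[1:])
    matches = root.findall(relative) if relative else [root]
    if len(matches) != 1:
        raise ValueError(f"{target!r} matched {len(matches)} elements")
    return matches[0]


def apply_events(root, message):
    """Replay each event, in document order, against the receiver's tree."""
    for event in ET.fromstring(message).iter("event"):
        node = resolve(root, event.get("target"))
        name = event.get("name")
        if name == "attribute-modified":
            node.set(event.get("attr"), event.get("value"))
        elif name == "node-removed":
            if node is root:
                raise ValueError("cannot remove the root element")
            # ElementTree has no parent pointers, so build a child -> parent map.
            parents = {child: parent for parent in root.iter() for child in parent}
            parents[node].remove(node)
        else:
            raise ValueError(f"unsupported event {name!r}")


page = ET.fromstring('<page><item status="open">a</item><item status="open">b</item></page>')
apply_events(page, MESSAGE)
print(ET.tostring(page, encoding="unicode"))  # prints <page><item status="done">b</item></page>
```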

What version of Mozilla supports this? Possibly a customized version? XPCOM components expose the database and the REX processor.

They've defined two custom URI schemes, xdb to get a document out of the database and xqy to run an XQuery.

JavaScript is still needed because not all XUL widgets expose all their properties as XML. Furthermore, XPCOM components are not accessible from XQuery.

This has only been a high level overview. I'm left wanting to see some actual code, and an actual application.


XML 2007 will also be in Boston, December 3-7.


The second session of the morning is also a tough choice, but I think I'm going to stick around in Back Bay B to hear Robin Hastings from the Missouri River Regional Library talk about PHP and XML. I should probably listen to Michael Sperberg-McQueen discuss "Daddy? Where do schemas come from? Some facts of life for schema users" but the coffee ran out before I could get any; and I'm just not awake enough to pay attention to "rules for finding schema components", even though Michael's usually a wonderful speaker. The Microsoft talk on LINQ and XLinq also sounds interesting, for the ideas at least. But it's likely to be limited to Microsoft platforms and thus unlikely to be directly relevant to anything I do.

She's talking about Magpie RSS, an RSS 1.0 reader written in PHP that supports caching. This might be a nice basis for a custom feed reader I could install on my server just for me. I find I prefer web-based aggregators like Artima or Bloglines from a user interface perspective. However, I don't like letting them know what I read. A custom local Bloglines running on my desktop could be very useful.
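For the custom reader idea, here is a minimal sketch of the two pieces Magpie is described as providing, RSS 1.0 parsing and caching, written in Python rather than PHP. It is not Magpie's API: `parse_rss10`, `CachingReader`, and the injected `fetch` callable are hypothetical, and only item titles and links are extracted.

```python
import time
import xml.etree.ElementTree as ET

RSS = "{http://purl.org/rss/1.0/}"  # RSS 1.0 namespace


def parse_rss10(xml_text):
    """Return [{'title': ..., 'link': ...}] for each item in an RSS 1.0 feed.

    In RSS 1.0 the <item> elements are children of rdf:RDF, not of <channel>.
    """
    root = ET.fromstring(xml_text)
    return [
        {"title": item.findtext(RSS + "title", default=""),
         "link": item.findtext(RSS + "link", default="")}
        for item in root.findall(RSS + "item")
    ]


class CachingReader:
    """Re-fetch a feed only when the cached copy is older than max_age seconds."""

    def __init__(self, fetch, max_age=3600.0, clock=time.monotonic):
        self.fetch = fetch        # hypothetical hook: url -> feed text
        self.max_age = max_age
        self.clock = clock
        self._cache = {}          # url -> (time fetched, parsed items)

    def items(self, url):
        now = self.clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self.max_age:
            return cached[1]
        items = parse_rss10(self.fetch(url))
        self._cache[url] = (now, items)
        return items


FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.com/">
    <title>Example</title>
    <link>http://example.com/</link>
    <description>An example feed</description>
    <items><rdf:Seq><rdf:li rdf:resource="http://example.com/1"/></rdf:Seq></items>
  </channel>
  <item rdf:about="http://example.com/1">
    <title>First post</title>
    <link>http://example.com/1</link>
  </item>
</rdf:RDF>"""

calls = []

def fake_fetch(url):
    calls.append(url)  # count network round trips
    return FEED

reader = CachingReader(fake_fetch)
print(reader.items("http://example.com/feed.rdf"))
reader.items("http://example.com/feed.rdf")  # served from the cache
print(len(calls))
```

Injecting `fetch` and `clock` keeps the sketch testable offline; a real reader would plug in an HTTP client and persist the cache to disk.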


Lunch was tastier than the usual cold sandwiches, but once again I got skunked on the coffee. This conference is suffering from a serious lack of caffeine.

The afternoon commences with a talk from Sam Hiser about the OpenDocument plugin for Microsoft Office in Back Bay A. There is minimal wireless coverage in this room (unlike Back Bay B, which had no wireless access at all). That is, occasionally the wireless cuts in for long enough for me to surf one web page or two, but it doesn't stay up long enough to check e-mail or do anything complicated. I asked someone from the hotel staff about this, and he evinced shock that there was no wireless in the previous room, but I don't believe him. I know I'm not the only one who couldn't connect in that room -- and I'm sure it's been a problem before. I wish hotels would stop lying about their Internet connectivity. They have to know they have problems. It's not like it's hard to set up wireless access. You just put a $50 router in every room. I was told at lunch that the conference is switching hotels next year. Maybe the Marriott will be better.

His comparison between the forces of darkness and the forces of light is too much, even for me. He does everything short of calling Microsoft Nazis. He does use the word "fascists". Dude, it's just a document format, not World War III!

Microsoft Office file formats all "check in through RTF". The Word plugin is about 60% done. Another 3-4 months of work is required, but funding is needed first. The plugin will actually generate version 1.2 of ODF, which isn't finished yet. There are too many incompatibilities between Word and ODF 1.0.


Next up is a panel discussion on using Word and/or OpenOffice to create XML documents. Panelists include John Parsons from XyEnterprise, Marc Jacobson from Really Strategies, and Clyde Hatter from Propylon. According to Parsons, word processors are for one-off, single-use, non-reusable documents; XML is for reusable documents. Jacobson discusses customizing Word to generate XML in various ways. Hatter talks about OpenOffice, which seems to provide far better hooks for extending it as a custom XML editor. What he's demoing is way beyond what Jacobson and Parsons showed with Word.


No coffee at all during the break; just Pepsi, which ran out before I could get any. (Did they underestimate attendance?) Thank the Flying Spaghetti Monster there's a Starbucks in the lobby.


Third afternoon session is Patrick Chanezon with Fun and Profit with the Google Checkout API, in Back Bay B. I'm sitting in the middle of the room instead of the back now; and, fingers crossed, this seems to have almost adequate wireless reception. I'm not sure what this talk is doing at an XML conference. Maybe it uses HTTP to transfer XML documents? We'll see. Yep, that seems to be what it's doing. Basic Auth over SSL. Synchronous and asynchronous. XML Digital Signatures. There's also an option to add a simple HTML form to your page (though it's unsigned).

63% of online shopping carts are abandoned after the customer begins checkout. 37% of online purchases begin with a search. Google Checkout focuses on ease of use to encourage shopping. Transaction fees are cheaper for AdWords buyers; otherwise it's 2% + $0.20 per transaction. Google does not share the customer's e-mail address with the merchant. E-mail from merchant to customer must go through Google (which could read it). I'm not sure whether the credit card statement shows Google or the merchant. This is very important for disputes.
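The quoted fee schedule is easy to sanity-check. This sketch assumes the fee is exactly 2% of the order total plus $0.20, rounded half-up to the cent (the rounding rule is my assumption), and shows how the fixed part dominates small orders. `checkout_fee` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

PERCENT = Decimal("0.02")   # 2% of the order total
FIXED = Decimal("0.20")     # plus $0.20 per transaction


def checkout_fee(order_total):
    """Fee under the quoted 2% + $0.20 schedule, rounded to the cent.

    Half-up rounding to the cent is an assumption of this sketch.
    """
    fee = Decimal(order_total) * PERCENT + FIXED
    return fee.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for total in ("10.00", "50.00", "250.00"):
    fee = checkout_fee(total)
    effective = (fee / Decimal(total) * 100).quantize(Decimal("0.01"))
    print(f"${total}: fee ${fee} ({effective}% of the order)")
```

Pass amounts as strings or `Decimal`, not floats, so that the cents stay exact. On a $10 order the fee works out to 4%; on a $250 order it approaches the nominal 2%.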

There's some back-and-forth from Google to the merchant to calculate shipping and taxes and so forth. The merchant must respond within 3 seconds or default values will be used. That's going to be tricky for small merchants. This is all non-trivial for a merchant to implement. It is much harder than accepting PayPal, for example. I am not sure this is all that much easier than accepting credit cards yourself.
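The 3-second rule is essentially a call with a deadline and a fallback. Here is a sketch of that pattern from the caller's side, assuming a merchant callback that returns a shipping/tax quote; `quote_with_deadline`, `DEFAULT_QUOTE`, and the quote fields are hypothetical and are not part of Google's API.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Hypothetical fallback values; real defaults would come from the merchant's setup.
DEFAULT_QUOTE = {"shipping": "5.00", "tax": "0.00"}


def quote_with_deadline(callback, order, deadline=3.0, defaults=DEFAULT_QUOTE):
    """Ask the merchant callback for a quote, falling back to defaults on timeout."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(callback, order)
    try:
        return future.result(timeout=deadline)
    except FutureTimeout:
        return dict(defaults)
    finally:
        # Don't block on a slow callback; its thread is left to finish on its own.
        pool.shutdown(wait=False)


def fast_merchant(order):
    return {"shipping": "7.50", "tax": "0.62"}


def slow_merchant(order):
    time.sleep(0.5)  # simulates a merchant server that misses the deadline
    return {"shipping": "7.50", "tax": "0.62"}


print(quote_with_deadline(fast_merchant, {"sku": "a1"}, deadline=0.2))
print(quote_with_deadline(slow_merchant, {"sku": "a1"}, deadline=0.2))
```

The slow call isn't cancelled (Python threads can't be killed), so the worker keeps running after the fallback is returned. From the merchant's side, staying under the limit means making the callback itself fast, for instance by precomputing shipping tables.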


The final afternoon session is Norm Walsh and Sam Page on XML Pipelines. This should be interesting. I wrote about the rough approach of pipelines in Item 30 of Effective XML, but now there's a standard and some tools for doing this, instead of just having to roll your own.

Three goals for the XML Processing Model (XProc):

  1. A vocabulary for documents that specify what should happen to a given set of XML documents, and in what sequence (call these processing schemas?)
  2. What are the defaults for processing in the absence of such a document?
  3. Exception handling (very poorly described in the current spec, according to Norm)

Multistage XSLT and XInclude are part of the problem. Validate before or after XInclusion or both?
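A small demonstration of why the order matters, using Python's `xml.etree.ElementInclude`. The "schema" here is a hypothetical one-line check (`has_chapter`) standing in for real validation, and the included file is served from a dictionary by a custom loader. The same document fails before inclusion and passes after it.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical included file, served from memory instead of disk.
FILES = {"ch1.xml": "<chapter><title>One</title></chapter>"}

BOOK = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Example</title>
  <xi:include href="ch1.xml"/>
</book>"""


def loader(href, parse, encoding=None):
    """Minimal ElementInclude loader that only handles parse="xml"."""
    if parse != "xml":
        raise ValueError(f"this sketch only loads XML, not {parse!r}")
    return ET.fromstring(FILES[href])


def has_chapter(book):
    """Stand-in for a real schema: a book must contain at least one chapter."""
    return book.find("chapter") is not None


book = ET.fromstring(BOOK)
valid_before = has_chapter(book)             # the raw document has only xi:include
ElementInclude.include(book, loader=loader)  # expands xi:include in place
valid_after = has_chapter(book)
print(valid_before, valid_after)             # prints False True
```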

Norm is implementing XProc in Java.

Parameters can be passed to components; e.g. XSLT variables. Parameters are strings, not document fragments.

Dana Florescu asks about the data model. It's a good question. Not all the specs involved use the same data model. They are punting on the data model, and I suspect that's the correct answer to Dana's question.

Dana Florescu also is worried about duplication with XQuery. Norm isn't. (me: XQuery duplicates a lot more of XSLT.)

I like what they're trying to do. The angle bracket syntax seems very hard to follow. The same names show up in too many different places, and mean too many different things. step and source are likely to change names.


Final event of the day: PechaKucha, which is Japanese for "Yackity Yack". It's an open mike night. Everyone gets 6 minutes and 40 seconds to talk about whatever they want.

First presentation is Stylus Studio 2007. Blah. Blah. Blah. Marketing drivel. I could have read this in a brochure. It supports pipelines.

Next is Jonathan Robie talking about DataDirect XQuery, an adapter for non-XML data you want to query with XML tools. Is this session all just vendor hype? If the next talk isn't better, I may skip out early.

Ken Holman's a better speaker, but he's still just hyping his wares. I should have submitted to show some of my best bird slides. It would have been more interesting, and about as relevant.

Next is the Oracle XML Query Service. This is followed by more Oracle hype. If this room didn't have the best wireless in the building, I'd be out of here. Oh my god, the next talk is more Oracle, and it's the worst one yet. Binary XML! That's it. I'm gone. See you tomorrow.

Monday, December 4, 2006 (Permalink)

This week I'll be at XML 2006 in Boston. If the trains run on time and wireless access is available, live coverage should commence about 10:30 A.M. tomorrow.


Chris Chiasson has released MMADE, a free-as-in-speech (GPL) tool for generating DocBook documentation from Mathematica calculations.

Sunday, December 3, 2006 (Permalink)

The W3C Compound Document Formats Working Group has updated three last call Web Integration Compound Document (WICD) working drafts. For example, a compound document might embed SVG and MathML in DocBook or SMIL and XForms in XHTML.

  • WICD Core 1.0 defines "a device independent Compound Document profile based on XHTML, CSS and SVG."
  • WICD Full 1.0 defines "a Compound Document profile based on XHTML, CSS and SVG, targeted at desktop agents."
  • WICD Mobile 1.0 defines "a Compound Document profile based on XHTML, CSS and SVG, which is targeted at mobile agents."

Friday, December 1, 2006 (Permalink)

The W3C Compound Document Formats Working Group has published an updated last call working draft of Compound Document by Reference Framework 1.0.

Combining content delivery formats can often be desirable in order to provide a seamless experience for the user.

For example, XHTML-formatted content can be augmented by SVG objects, to create a more dynamic, interactive and self adjusting presentation. A set of standard rules is required in order to provide this capability across a range of user agents and devices.

These are examples of possible Compound Document profiles:

  • XHTML + SVG + MathML
  • XHTML + SMIL
  • XHTML + XForms
  • XHTML + VoiceML

This document defines a generic Compound Document by Reference Framework (CDRF) that defines a language-independent processing model for combining arbitrary document formats.

NOTE: The Compound Document Framework is language-independent. While it is clearly meant to serve as the basis for integrating W3C's family of XML formats within its Interaction Domain (e.g., CSS, MathML, SMIL, SVG, VoiceXML, XForms, XHTML, XSL) with each other, it can also be used to integrate non-W3C formats with W3C formats or integrate non-W3C formats with other non-W3C formats.

Thursday, November 30, 2006 (Permalink)

The W3C Timed Text (TT) Working Group has posted the candidate recommendation of Timed Text (TT) Authoring Format 1.0 – Distribution Format Exchange Profile (DFXP). According to the abstract,

This document specifies the distribution format exchange profile (DFXP) of the timed text authoring format (TT AF) in terms of a vocabulary and semantics thereof.

The timed text authoring format is a content type that represents timed text media for the purpose of interchange among authoring systems. Timed text is textual information that is intrinsically or extrinsically associated with timing information.

The Distribution Format Exchange Profile is intended to be used for the purpose of transcoding or exchanging timed text information among legacy distribution content formats presently in use for subtitling and captioning functions.

In addition to being used for interchange among legacy distribution content formats, DFXP content may be used directly as a distribution format, for example, providing a standard content format to reference from a <text> or <textstream> media object element in a [SMIL 2.1] document.
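To make "timed text" concrete, here is a sketch that pulls (begin, end, text) cues out of a DFXP-style document. Assumptions: only `hh:mm:ss(.fraction)` clock times and `h`/`m`/`s`/`ms` offsets are handled (frames and ticks are not), timing inherited from ancestor elements is ignored, and the namespace URI in the sample is my guess at the draft's. The code matches on local names, so it doesn't depend on that URI. `to_seconds` and `cues` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# hh:mm:ss or hh:mm:ss.fraction (frame-based forms like hh:mm:ss:ff are not handled)
CLOCK_TIME = re.compile(r"(\d{2,}):(\d{2}):(\d{2}(?:\.\d+)?)")
# offset times such as 4s or 1500ms (the f and t metrics are not handled)
OFFSET_TIME = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
SECONDS_PER = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def to_seconds(expr):
    match = CLOCK_TIME.fullmatch(expr)
    if match:
        hours, minutes, seconds = match.groups()
        return int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    match = OFFSET_TIME.fullmatch(expr)
    if match:
        return float(match.group(1)) * SECONDS_PER[match.group(2)]
    raise ValueError(f"unsupported time expression: {expr!r}")


def local_name(tag):
    return tag.rsplit("}", 1)[-1]


def cues(xml_text):
    """Return (begin, end, text) for every <p> that carries begin and end."""
    result = []
    for el in ET.fromstring(xml_text).iter():
        if local_name(el.tag) == "p" and "begin" in el.attrib and "end" in el.attrib:
            text = " ".join("".join(el.itertext()).split())
            result.append((to_seconds(el.get("begin")), to_seconds(el.get("end")), text))
    return result


SAMPLE = """<tt xmlns="http://www.w3.org/2006/10/ttaf1" xml:lang="en">
  <body>
    <div>
      <p begin="00:00:01.000" end="00:00:03.500">Hello, world.</p>
      <p begin="4s" end="6500ms">Goodbye.</p>
    </div>
  </body>
</tt>"""

for begin, end, text in cues(SAMPLE):
    print(f"{begin:6.3f} -> {end:6.3f}  {text}")
```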

Wednesday, November 29, 2006 (Permalink)

The W3C XML Schema Patterns for Databinding working group has published the Last Call Working Draft of Basic XML Schema Patterns for Databinding 1.0. This spec attempts to describe a subset of W3C XML schema that is more or less supported by most data binding tools. The problem is that various data binding libraries support different subsets of W3C schema. This spec tries to define a lowest common denominator.


The W3C XML Schema Patterns for Databinding working group has also published the Last Call Working Draft of Advanced XML Schema Patterns for Databinding 1.0. This spec describes some data binding schema patterns that are in common use but nonetheless cause problems for some data binding tools and libraries.

Tuesday, November 28, 2006 (Permalink)

Comments are due by December 31. I haven't had time to scan all the drafts (who does?); but the changes since the last drafts seem like reasonable bug fixes and if nothing too awful is found in these specs, maybe we'll finally have official releases next year.


ActiveState has posted the first beta of Komodo 4.0, a $295 payware IDE for Perl, Ruby, PHP, Python, Tcl, and XSLT. Komodo runs on Mac OS X 10.3 and later, Linux, and Windows.


Gerald Schmidt has released XML Copy Editor 1.0.8.4, a free-as-in-speech (GPL) XML editor for Windows and Linux "with DTD/XML Schema/RELAX NG validation, XSLT, XPath, pretty-printing, syntax highlighting, folding, tag completion/locking and lossless import/export of Microsoft Word documents." This release adds validate-as-you-type.

Monday, November 27, 2006 (Permalink)

Benjamin Pasero has released RSSOwl 1.2.3, an open source RSS reader written in Java and based on the SWT toolkit. Version 1.2.3 is a bug fix release. RSSOwl is the best open source RSS client I've seen written in Java, though it's decidedly inferior to non-Java clients like Vienna and closed-source clients like NetNewsWire Pro.

Sunday, November 26, 2006 (Permalink)

XimpleWare has released VTD-XML 1.8, a free (GPL) non-extractive Java/C/C# library for processing XML that supports XPath. This appears to be an example of what Sam Wilmott calls "in situ parsing". In other words, rather than creating objects representing the content of an XML document, VTD-XML just passes pointers into the actual, real XML. (These are the abstract pointers of your data structures textbook, not C-style addresses in memory. In this case the pointers are int indexes into the file.) You don't even need to hold the document in memory. It can remain on disk. This should improve speed and memory usage, but I haven't verified that. Version 1.8 adds XMLModifier for incremental updates, expands the number of XPath built-in functions, and adds support for various ISO 8859 and Windows encodings.
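A toy illustration of the in situ idea: record element names as (offset, length) integer pairs into the original bytes, and decode a name only when it's requested. This is not VTD-XML's record format or API, and its regular-expression scanner is far from a real XML tokenizer, but it shows why no per-node objects are needed. Because it accepts any bytes-like object, it could scan an `mmap` of a file on disk instead of an in-memory copy. `index_elements` and `name_at` are hypothetical names.

```python
import re

# Start tags and empty-element tags; group 1 is the element name. End tags and
# markup starting with <! or <? are skipped because a name must start with a
# letter or underscore. Tag-like text inside comments or CDATA, or a '>' inside
# an attribute value, would still fool this toy scanner.
START_TAG = re.compile(rb"<([A-Za-z_][\w.:-]*)[^>]*>")


def index_elements(data):
    """Return (offset, length) int pairs locating each element name in data.

    data can be any bytes-like object, including an mmap of a file on disk,
    so the document never has to be copied into per-node Python objects.
    """
    return [(m.start(1), m.end(1) - m.start(1)) for m in START_TAG.finditer(data)]


def name_at(data, token):
    """Materialize a name only when asked for, by slicing the original bytes."""
    offset, length = token
    return bytes(data[offset:offset + length]).decode("utf-8")


doc = b'<order id="7"><item sku="a1"/><item sku="b2">two</item></order>'
tokens = index_elements(doc)
print(tokens)                              # prints [(1, 5), (15, 4), (31, 4)]
print([name_at(doc, t) for t in tokens])   # prints ['order', 'item', 'item']
```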

Saturday, November 25, 2006 (Permalink)

AGYNAMIX has released Dopus, a free-as-in-speech (GPL) DocBook framework. According to Torsten Uhlmann, "Dopus is build upon Java and Apache Ant and uses freely available components like Apache FOP, Saxon and Apache Xerces. The components are put together using Apache Ant and a generator.[bat|sh] script which makes generating output a snap." Dopus supports XInclude, XML catalogs, and various customizations. Output formats include HTML, chunked HTML, PDF, Eclipse Help, JavaHelp, and Zip.

Friday, November 24, 2006 (Permalink)

The Apache XML Project has released XML Commons External Components 1.3.04.

xml-commons provides an Apache-hosted set of DOM, SAX, and JAXP interfaces for use in other xml-based projects. Our hope is that we can standardize on both a common version and packaging scheme for these critical XML standards interfaces to make the lives of both our developers and users easier.

The External Components portion of xml-commons contains interfaces that are defined by external standards organizations. For DOM, that's the W3c; for SAX it's David Megginson and sax.sourceforge.net; for JAXP it's Sun. While we could send users to each of the primary sources for these deliverables, keeping our own versions of these in the xml-commons repository gives us a number of advantages:

  • Simplicity of downloads: users get the whole product from one place.
  • Better version control: we can only take fixes we want, and add Apache-specific changes.
  • Better overview documentation of how these interfaces fit into the XML processing world.
  • More chance for cross-project community building within Apache projects.

This release supports the Simple API for CSS (SAC) 1.3. It also adds a SchemaFactoryLoader class. "This class was removed from the JAXP 1.3 specification before it was finalized but was mistakenly included in Java 5. It only exists here (and in JAXP 1.4) for compatibility reasons. Applications should avoid using it." Various bugs are fixed as well.

Thursday, November 23, 2006 (Permalink)

The Apache XML Project has released Commons Resolver 1.2, an open source implementation of the OASIS XML Catalogs 1.1 specification. Catalogs allow you to replace the content at one URI with that from a different URI at runtime, for instance to swap in a local copy of a DTD. Besides adding support for version 1.1, this release fixes bugs.
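The core lookup a catalog resolver performs can be sketched in a few lines. This covers only the `system` (exact match) and `rewriteSystem` (prefix rewrite) entry types, trying exact matches first and then the longest matching prefix, which is my reading of the catalog spec. The example.com URIs and local paths are hypothetical, and this is not the Commons Resolver API.

```python
def resolve_system(system_id, system, rewrite_system):
    """Resolve a system identifier the way an XML catalog would (partially).

    system:         exact matches, like <system systemId="..." uri="..."/>
    rewrite_system: prefix rewrites, like
                    <rewriteSystem systemIdStartString="..." rewritePrefix="..."/>
    Returns the original identifier when nothing matches.
    """
    if system_id in system:                       # exact entries are tried first
        return system[system_id]
    matches = [start for start in rewrite_system if system_id.startswith(start)]
    if matches:
        start = max(matches, key=len)             # the longest prefix wins
        return rewrite_system[start] + system_id[len(start):]
    return system_id


# Hypothetical catalog contents mapping remote URIs to local copies.
SYSTEM = {
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd":
        "file:///usr/local/share/xml/xhtml1-strict.dtd",
}
REWRITE_SYSTEM = {
    "http://example.com/schemas/": "file:///opt/schemas/",
    "http://example.com/schemas/v2/": "file:///opt/schemas-v2/",
}

for uri in ("http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
            "http://example.com/schemas/v2/order.dtd",
            "http://example.com/schemas/v1/order.dtd",
            "http://example.org/other.dtd"):
    print(uri, "->", resolve_system(uri, SYSTEM, REWRITE_SYSTEM))
```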

Wednesday, November 22, 2006 (Permalink)

The Apache XML Project has released Xerces-J 2.9.0, a minor upgrade to the preeminent open source XML parser for Java. "2.9.0 includes the Xalan serializer in its distribution. Xerces and Xalan now share a common serialization codebase. The DOM Level 3 serialization support which was in Xerces was migrated into the Xalan serializer and Xerces' native serializer was deprecated. This release also includes a few minor enhancements and several bug fixes." In particular it now supports W3C XML Schemas 1.1 and OASIS XML catalogs 1.2. (Web site not yet updated, but the files are in the download area.)

Tuesday, November 21, 2006 (Permalink)

The W3C CSS Working Group has posted a new last call working draft of Cascading Style Sheets, level 2 revision 1. According to the abstract,

CSS 2.1 builds on CSS2 [CSS2] which builds on CSS1 [CSS1]. It supports media-specific style sheets so that authors may tailor the presentation of their documents to visual browsers, aural devices, printers, braille devices, handheld devices, etc. It also supports content positioning, table layout, features for internationalization and some properties related to user interface.

CSS 2.1 corrects a few errors in CSS2 (the most important being a new definition of the height/width of absolutely positioned elements, more influence for HTML's "style" attribute and a new calculation of the 'clip' property), and adds a few highly requested features which have already been widely implemented. But most of all CSS 2.1 represents a "snapshot" of CSS usage: it consists of all CSS features that are implemented interoperably at the date of publication of the Recommendation.

CSS 2.1 is derived from and is intended to replace CSS2. Some parts of CSS2 are unchanged in CSS 2.1, some parts have been altered, and some parts removed. The removed portions may be used in a future CSS3 specification. Future specs should refer to CSS 2.1 (unless they need features from CSS2 which have been dropped in CSS 2.1, and then they should only reference CSS2 for those features, or preferably reference such feature(s) in the respective CSS3 Module that includes those feature(s)).

Comments are due by December 7.

Monday, November 20, 2006 (Permalink)

The W3C XML Processing Model Working Group has posted the second public working draft of XProc: An XML Pipeline Language. According to the introduction,

An XML Pipeline specifies a sequence of operations to be performed on a collection of input documents. Pipelines take zero or more XML documents as their input and produce zero or more XML documents as their output. Steps in the pipeline may read or write non-XML resources as well.

A pipeline consists of components. Like pipelines, components take zero or more XML documents as their input and produce zero or more XML documents as their output. The inputs to a component come from the web, from the pipeline document, from the inputs to the pipeline itself, or from the outputs of other components in the pipeline. The outputs from a component are consumed by other components, are outputs of the pipeline as a whole, or are discarded.

There are two kinds of components: steps and (language) constructs. Steps carry out single operations and have no substructure as far as the pipeline is concerned, whereas constructs can include components within themselves.

Standard steps include load, parse, serialize, XSLT, and XInclude. Others may be defined.
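The component model in that description maps neatly onto functions from a list of documents to a list of documents. The sketch below is not XProc's vocabulary; it just illustrates the model in Python with three hypothetical steps: an XInclude step (one in, one out), a filter (n in, zero to n out), and an aggregator (n in, one out).

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude


def pipeline(*steps):
    """Chain steps; each step maps a list of documents to a new list."""
    def run(docs):
        for step in steps:
            docs = step(docs)
        return docs
    return run


def xinclude(loader):
    """One document in, one document out: expand xi:include elements."""
    def step(docs):
        for doc in docs:
            ElementInclude.include(doc, loader=loader)
        return docs
    return step


def keep(predicate):
    """Filter: n documents in, anywhere from zero to n out."""
    def step(docs):
        return [doc for doc in docs if predicate(doc)]
    return step


def wrap(tag):
    """Aggregate: n documents in, exactly one document out."""
    def step(docs):
        root = ET.Element(tag)
        root.extend(docs)
        return [root]
    return step


# Hypothetical in-memory file for the XInclude step (only parse="xml" is used).
FILES = {"note.xml": "<note>included</note>"}

def loader(href, parse, encoding=None):
    return ET.fromstring(FILES[href])


inputs = [
    ET.fromstring('<a xmlns:xi="http://www.w3.org/2001/XInclude"><xi:include href="note.xml"/></a>'),
    ET.fromstring("<b/>"),
    ET.fromstring("<draft/>"),
]
run = pipeline(xinclude(loader), keep(lambda doc: doc.tag != "draft"), wrap("collection"))
[result] = run(inputs)
print(ET.tostring(result, encoding="unicode"))
```

Serialization is left outside the pipeline here, since the steps pass parsed trees rather than text between them.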

Saturday, November 18, 2006 (Permalink)

SyncroSoft has released oNVDL, an open source NVDL implementation based on James Clark's Jing. "NVDL stands for Namespace-based Validation Dispatching Language and it is Part 4 of ISO/IEC 19757 DSDL (Document Schema Definition Languages). It allows specifying sections of XML documents to be validated against different schemas thus enabling the creation of complex documents containing multiple languages without the need to modify the schemas that define each language to take into account the other languages. It allows also mixing different schema types like XML Schema, Relax NG and Schematron. A typical example is a document that contains XForms content inside XHTML."
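The dispatching idea itself is simple to sketch: cut the document into sections wherever the namespace changes, then hand each section to whatever checks that namespace. In this sketch the validators are hypothetical one-line lambdas standing in for real schemas. Unlike real NVDL, it neither detaches nested foreign sections before validation nor supports actions such as allow, reject, or attach; sections with no validator are just reported as None.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
XFORMS = "http://www.w3.org/2002/xforms"


def namespace(tag):
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def sections(root):
    """Return (namespace, element) for the root of every same-namespace section.

    A new section starts wherever an element's namespace differs from its parent's.
    """
    found = [(namespace(root.tag), root)]

    def walk(parent):
        for child in parent:
            if namespace(child.tag) != namespace(parent.tag):
                found.append((namespace(child.tag), child))
            walk(child)

    walk(root)
    return found


def dispatch(root, validators):
    """Hand each section to the validator registered for its namespace."""
    return [(ns, validators[ns](el) if ns in validators else None)
            for ns, el in sections(root)]


# Hypothetical stand-ins for real schemas (RELAX NG, W3C XML Schema, ...).
VALIDATORS = {
    XHTML: lambda el: el.tag == f"{{{XHTML}}}html",
    XFORMS: lambda el: el.find(f"{{{XFORMS}}}instance") is not None,
}

DOC = f"""<html xmlns="{XHTML}" xmlns:xf="{XFORMS}">
  <head>
    <title>Order form</title>
    <xf:model><xf:instance><order xmlns=""/></xf:instance></xf:model>
  </head>
  <body><p>Hello</p></body>
</html>"""

for ns, ok in dispatch(ET.fromstring(DOC), VALIDATORS):
    print(repr(ns), ok)
```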


SyncroSoft has also released <Oxygen/> 8.0, a $298 payware XML editor written in Java. Oxygen supports XML, XSL, DTDs, XQuery, SVG, Relax NG, Schematron, and the W3C XML Schema Language. Version 8.0 adds a grid editor, an XML database perspective, and various other small improvements.

Friday, November 17, 2006 (Permalink)

The W3C Web Services Activity has published new working drafts of Web Services Policy 1.5 - Framework and Web Services Policy 1.5 - Attachment. "The Web Services Policy 1.5 - Framework provides a general purpose model and corresponding syntax to describe the policies of entities in a Web services-based system. Web Services Policy Framework defines a base set of constructs that can be used and extended by other Web services specifications to describe a broad range of service requirements and capabilities." "Web Services Policy 1.5 - Attachment, defines two general-purpose mechanisms for associating policies, as defined in Web Services Policy 1.5 - Framework, with the subjects to which they apply. This specification also defines how these general-purpose mechanisms may be used to associate policies with WSDL and UDDI descriptions." But really, Pete Lacey explains everything you need to know about these and other WS-* specs.

Thursday, November 16, 2006 (Permalink)

The W3C Internationalization Tag Set Working Group has published a candidate recommendation of Internationalization Tag Set (ITS) Version 1.0. This document defines standardized XML markup for identifying directionality, translatability, ruby text, and other common aspects of document localization and internationalization. For example, in this DocBook article an its:translate attribute indicates that the author element should not be translated:

<dbk:article
  xmlns:its="http://www.w3.org/2005/11/its" 
  xmlns:dbk="http://docbook.org/ns/docbook" 
  its:version="1.0" version="5.0" xml:lang="en">
 <dbk:info>
  <dbk:title>An example article</dbk:title>
  <dbk:author
    its:translate="no">
   <dbk:personname>
    <dbk:firstname>John</dbk:firstname>
    <dbk:surname>Doe</dbk:surname>
   </dbk:personname>
   <dbk:affiliation>
    <dbk:address>
     <dbk:email>foo@example.com</dbk:email>
    </dbk:address>
   </dbk:affiliation>
  </dbk:author>
 </dbk:info>
 <dbk:para>This is a short article.</dbk:para>
</dbk:article>
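Inheritance is what makes that single attribute sufficient: everything inside dbk:author, including the name, inherits translate="no". Here is a sketch of a text extractor that honors local its:translate attributes, run on a trimmed copy of the article above (the affiliation is omitted). It ignores ITS global rules (its:rules), and `translatable_text` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

ITS_TRANSLATE = "{http://www.w3.org/2005/11/its}translate"

# A trimmed copy of the DocBook article above.
ARTICLE = """<dbk:article xmlns:its="http://www.w3.org/2005/11/its"
    xmlns:dbk="http://docbook.org/ns/docbook"
    its:version="1.0" version="5.0" xml:lang="en">
 <dbk:info>
  <dbk:title>An example article</dbk:title>
  <dbk:author its:translate="no">
   <dbk:personname>
    <dbk:firstname>John</dbk:firstname>
    <dbk:surname>Doe</dbk:surname>
   </dbk:personname>
  </dbk:author>
 </dbk:info>
 <dbk:para>This is a short article.</dbk:para>
</dbk:article>"""


def translatable_text(elem, inherited="yes"):
    """Collect text that should go to translators.

    Element content is translatable by default, and a local its:translate
    value is inherited by descendants until one of them overrides it.
    """
    flag = elem.get(ITS_TRANSLATE, inherited)
    chunks = []
    if flag == "yes" and elem.text and elem.text.strip():
        chunks.append(elem.text.strip())
    for child in elem:
        chunks.extend(translatable_text(child, flag))
        # Text after a child's end tag belongs to this element, so it uses this flag.
        if flag == "yes" and child.tail and child.tail.strip():
            chunks.append(child.tail.strip())
    return chunks


print(translatable_text(ET.fromstring(ARTICLE)))
# prints ['An example article', 'This is a short article.']
```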

The W3C Mobile Web Initiative has published a proposed recommendation of Mobile Web Best Practices 1.0. Here's the summary of the guidelines:

  1. [THEMATIC_CONSISTENCY] Ensure that content provided by accessing a URI yields a thematically coherent experience when accessed from different devices.

  2. [CAPABILITIES] Exploit device capabilities to provide an enhanced user experience.

  3. [DEFICIENCIES] Take reasonable steps to work around deficient implementations.

  4. [TESTING] Carry out testing on actual devices as well as emulators.

  5. [URIS] Keep the URIs of site entry points short.

  6. [NAVBAR] Provide only minimal navigation at the top of the page.

  7. [BALANCE] Take into account the trade-off between having too many links on a page and asking the user to follow too many links to reach what they are looking for.

  8. [NAVIGATION] Provide consistent navigation mechanisms.

  9. [ACCESS_KEYS] Assign access keys to links in navigational menus and frequently accessed functionality.

  10. [LINK_TARGET_ID] Clearly identify the target of each link.

  11. [LINK_TARGET_FORMAT] Note the target file's format unless you know the device supports it.

  12. [IMAGE_MAPS] Do not use image maps unless you know the device supports them effectively.

  13. [POP_UPS] Do not cause pop-ups or other windows to appear and do not change the current window without informing the user.

  14. [AUTO_REFRESH] Do not create periodically auto-refreshing pages, unless you have informed the user and provided a means of stopping it.

  15. [REDIRECTION] Do not use markup to redirect pages automatically. Instead, configure the server to perform redirects by means of HTTP 3xx codes.

  16. [EXTERNAL_RESOURCES] Keep the number of externally linked resources to a minimum.

  17. [SUITABLE] Ensure that content is suitable for use in a mobile context.

  18. [CLARITY] Use clear and simple language.

  19. [LIMITED] Limit content to what the user has requested.

  20. [PAGE_SIZE_USABLE] Divide pages into usable but limited size portions.

  21. [PAGE_SIZE_LIMIT] Ensure that the overall size of page is appropriate to the memory limitations of the device.

  22. [SCROLLING] Limit scrolling to one direction, unless secondary scrolling cannot be avoided.

  23. [CENTRAL_MEANING] Ensure that material that is central to the meaning of the page precedes material that is not.

  24. [GRAPHICS_FOR_SPACING] Do not use graphics for spacing.

  25. [LARGE_GRAPHICS] Do not use images that cannot be rendered by the device. Avoid large or high resolution images except where critical information would otherwise be lost.

  26. [USE_OF_COLOR] Ensure that information conveyed with color is also available without color.

  27. [COLOR_CONTRAST] Ensure that foreground and background color combinations provide sufficient contrast.

  28. [BACKGROUND_IMAGE_READABILITY] When using background images make sure that content remains readable on the device.

  29. [PAGE_TITLE] Provide a short but descriptive page title.

  30. [NO_FRAMES] Do not use frames.

  31. [STRUCTURE] Use features of the markup language to indicate logical document structure.

  32. [TABLES_SUPPORT] Do not use tables unless the device is known to support them.

  33. [TABLES_NESTED] Do not use nested tables.

  34. [TABLES_LAYOUT] Do not use tables for layout.

  35. [TABLES_ALTERNATIVES] Where possible, use an alternative to tabular presentation.

  36. [NON-TEXT_ALTERNATIVES] Provide a text equivalent for every non-text element.

  37. [OBJECTS_OR_SCRIPT] Do not rely on embedded objects or script.

  38. [IMAGES_SPECIFY_SIZE] Specify the size of images in markup, if they have an intrinsic size.

  39. [IMAGES_RESIZING] Resize images at the server, if they have an intrinsic size.

  40. [VALID_MARKUP] Create documents that validate to published formal grammars.

  41. [MEASURES] Do not use pixel measures and do not use absolute units in markup language attribute values and style sheet property values.

  42. [STYLE_SHEETS_USE] Use style sheets to control layout and presentation, unless the device is known not to support them.

  43. [STYLE_SHEETS_SUPPORT] Organize documents so that if necessary they may be read without style sheets.

  44. [STYLE_SHEETS_SIZE] Keep style sheets small.

  45. [MINIMIZE] Use terse, efficient markup.

  46. [CONTENT_FORMAT_SUPPORT] Send content in a format that is known to be supported by the device.

  47. [CONTENT_FORMAT_PREFERRED] Where possible, send content in a preferred format.

  48. [CHARACTER_ENCODING_SUPPORT] Ensure that content is encoded using a character encoding that is known to be supported by the target device.

  49. [CHARACTER_ENCODING_USE] Indicate in the response the character encoding being used.

  50. [ERROR_MESSAGES] Provide informative error messages and a means of navigating away from an error message back to useful information.

  51. [COOKIES] Do not rely on cookies being available.

  52. [CACHING] Provide caching information in HTTP responses.

  53. [FONTS] Do not rely on support of font related styling.

  54. [MINIMIZE_KEYSTROKES] Keep the number of keystrokes to a minimum.

  55. [AVOID_FREE_TEXT] Avoid free text entry where possible.

  56. [PROVIDE_DEFAULTS] Provide pre-selected default values where possible.

  57. [DEFAULT_INPUT_MODE] Specify a default text entry mode, language and/or input format, if the target device is known to support it.

  58. [TAB_ORDER] Create a logical order through links, form controls and objects.

  59. [CONTROL_LABELLING] Label all form controls appropriately and explicitly associate labels with form controls.

  60. [CONTROL_POSITION] Position labels so they lay out properly in relation to the form controls they refer to.
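Some of these guidelines are mechanical enough to test automatically. Here is a sketch that checks an XHTML page for a handful of them: [NO_FRAMES], [TABLES_NESTED], [NON-TEXT_ALTERNATIVES], [IMAGES_SPECIFY_SIZE], and [AUTO_REFRESH]/[REDIRECTION] via meta refresh. `check_page` is a hypothetical helper, not a conformance checker, and it only understands well-formed XHTML.

```python
import xml.etree.ElementTree as ET

H = "{http://www.w3.org/1999/xhtml}"  # XHTML namespace, in ElementTree's {uri} form


def check_page(xhtml_text):
    """Flag violations of a few Mobile Web Best Practices that are easy to test."""
    root = ET.fromstring(xhtml_text)
    problems = []
    if root.find(f".//{H}frameset") is not None or root.find(f".//{H}frame") is not None:
        problems.append("NO_FRAMES")
    if root.find(f".//{H}table//{H}table") is not None:
        problems.append("TABLES_NESTED")
    for img in root.iter(f"{H}img"):
        if "alt" not in img.attrib:
            problems.append("NON-TEXT_ALTERNATIVES")
        if "width" not in img.attrib or "height" not in img.attrib:
            problems.append("IMAGES_SPECIFY_SIZE")
    for meta in root.iter(f"{H}meta"):
        if (meta.get("http-equiv") or "").lower() == "refresh":
            # content="30" just reloads; content="0; url=..." redirects.
            content = (meta.get("content") or "").lower()
            problems.append("REDIRECTION" if "url=" in content else "AUTO_REFRESH")
    return problems


BAD = """<html xmlns="http://www.w3.org/1999/xhtml">
 <head><title>News</title><meta http-equiv="refresh" content="30"/></head>
 <body>
  <table><tr><td><table><tr><td>nested</td></tr></table></td></tr></table>
  <p><img src="logo.png"/></p>
 </body>
</html>"""

GOOD = """<html xmlns="http://www.w3.org/1999/xhtml">
 <head><title>News</title></head>
 <body><p><img src="logo.png" alt="Example News" width="80" height="20"/></p></body>
</html>"""

print(check_page(BAD))
print(check_page(GOOD))   # prints []
```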


x-port.net has released formsPlayer 1.5, a free-beer (e-mail address required) "set of modules designed to make it easy to build XForms processors, editors and debuggers. These processors can run on a variety of platforms, using a range of user interfaces." New features in this release include:

  • Tablet PC support
  • Submissions that return text (from XForms 1.1);
  • The new target attribute on submission (from XForms 1.1);
  • An experimental instance-error attribute on submission

Internet Explorer is required.


Gerald Schmidt has released XML Copy Editor 1.0.8.3, a free-as-in-speech (GPL) XML editor for Windows and Linux "with DTD/XML Schema/RELAX NG validation, XSLT, XPath, pretty-printing, syntax highlighting, folding, tag completion/locking and lossless import/export of Microsoft Word documents." This is a bug fix release.

Wednesday, November 15, 2006 (Permalink)

The W3C has released XML Inclusions (XInclude) Version 1.0 (Second Edition). The most significant change is that, "An XInclude processor may, at user option, suppress xml:base and/or xml:lang fixup." Otherwise most of the changes had already been addressed in errata.


The W3C has also launched a registry for unprefixed XPointer schemes. There's some interesting stuff in here such as xpath1, xpath2, string-range, svgView, right, and left. Possibly I'll implement one or two of these in XOM. xpath1 would be useful and not too hard to do.
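All of the registry schemes plug into the XPointer Framework's SchemeName(data) syntax. As a rough sketch of what supporting a scheme involves (in Python rather than Java, and unrelated to XOM's actual code), the following splits a pointer into parts and evaluates only the element() scheme, skipping parts it doesn't understand, such as xpath1. The circumflex escaping and skip-unknown behavior follow my reading of the framework; treating only attributes literally named `id` as IDs is a simplification; all function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def pointer_parts(pointer):
    """Split a scheme-based pointer into (scheme, data) pairs.

    Data must have balanced parentheses; ^(, ^) and ^^ escape a literal (, ), ^.
    """
    parts, i, n = [], 0, len(pointer)
    while i < n:
        if pointer[i].isspace():
            i += 1
            continue
        open_paren = pointer.find("(", i)
        if open_paren <= i:
            raise ValueError(f"expected SchemeName( at offset {i}")
        scheme, depth, data, j = pointer[i:open_paren], 1, [], open_paren + 1
        while j < n:
            c = pointer[j]
            if c == "^":
                if j + 1 >= n or pointer[j + 1] not in "()^":
                    raise ValueError(f"bad ^ escape at offset {j}")
                data.append(pointer[j + 1])
                j += 2
                continue
            depth += {"(": 1, ")": -1}.get(c, 0)
            j += 1
            if depth == 0:
                break          # the closing parenthesis is not part of the data
            data.append(c)
        if depth != 0:
            raise ValueError("unbalanced parentheses")
        parts.append((scheme, "".join(data)))
        i = j
    return parts


def element_scheme(root, data):
    """Evaluate element() data such as /1/2 or someid/3 against a parsed tree."""
    head, *steps = data.split("/")
    if head:   # simplification: only attributes literally named "id" count as IDs
        node = next((e for e in root.iter() if e.get("id") == head), None)
    elif steps and steps[0] == "1":
        node, steps = root, steps[1:]   # /1 is the document element
    else:
        node = None
    for step in steps:
        if node is None:
            break
        kids = list(node)
        index = int(step) - 1
        node = kids[index] if 0 <= index < len(kids) else None
    return node


def evaluate(root, pointer):
    """Try each part left to right; skip schemes this sketch doesn't support."""
    for scheme, data in pointer_parts(pointer):
        if scheme == "element":
            node = element_scheme(root, data)
            if node is not None:
                return node
    return None


doc = ET.fromstring(
    '<book><chapter id="c1"><p>a</p><p>b</p></chapter><chapter><p>c</p></chapter></book>')
print(pointer_parts("xpath1(//p[2]) element(/1/2/1)"))
print(evaluate(doc, "xpath1(//p[2]) element(/1/2/1)").text)   # prints c
print(evaluate(doc, "element(c1/2)").text)                     # prints b
```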

Tuesday, November 14, 2006 (Permalink)

The W3C Technical Architecture Group (TAG) has published On Linking Alternative Formats To Enable Discovery And Publishing. The problem is:

Content creators wishing to publish multiple versions of a given resource on the Web face a number of questions with respect to how such URIs are created, published and discovered. Questions include:

  • Given a resource http://example.com/ubiquity/ that can be delivered in a multiplicity of representations, how should one publish the relevant URIs to enable automatic discovery of these representations (AKA specific resources)?

  • How does one ensure that the alternative relationship amongst these various representations is available in a machine readable form, and consequently discoverable?

  • Here, multiple representations might include:

    Representations appropriate for different delivery contexts
    Alternative formats of the resource distinguished by Content-type
    Different versions of the resource e.g., either by language or date
    Representations in different languages

This document explores the issues that arise in this context, and attempts to define best practices that help:

  • Preserve the One Web while enabling content publishing to a multiplicity of delivery contexts.

  • Enable the creation of RESTful URIs that remain representation agnostic while delivering the correct end-user experience.

  • Enable automatic discovery of the available representations.

  • Enable web crawlers to discover the relationship between a given generic resource and the specific resources that correspond to its various alternatives. This will help search engines build better Web indices and avoid the need to index all available alternatives of a given resource.

The suggested solution is:

  1. Create representation-specific URIs (specific resources) for each available alternative (representation_i), e.g., http://example.com/ubiquity/resource/representation_i.

  2. If no content negotiation is in place, serve a canonical representation (generic resource) of the content at http://example.com/ubiquity/resource

  3. With that same URI, use HTTP content-negotiation, along with the correct HTTP VARY headers to serve up the appropriate representation at access time. Ensure that the VARY headers capture the right parameters that were used to choose the representation that is being served — this is important for correct behavior when using cacheing proxies.

  4. As an alternative to the previous step, arrange for the server to generate an HTTP 302 (Found) redirect to automatically serve up http://example.com/ubiquity/representation_i when http://example.com/ubiquity is accessed by user-agent_i. This form of redirect involves an extra client/server round-trip, and may therefore be suboptimal for mobile devices. This is a temporary redirect; the accessing user-agent should continue to use the canonical URI when creating bookmarks or emailing the URI. Finally, note that to optimize link traversal out of the resulting document, the content provider might wish to rewrite relative links to point at the specific resource. This will ensure that later uses of the URI result in the expected end-user results, e.g., in the following scenario:

    Cell-phone user emails link
    Recipient opens message on a desktop
    Clicks on the link

    The user following the link from inside the email message on a desktop browser should receive the desktop version, and not the mobile version. Notice that passing around the canonical URI is critical in achieving this behavior.

    Additionally, contrast this solution with using HTTP content-negotiation with VARY headers; using a redirect to the URI as a specific resource has the advantage of freezing all parameters that were used to choose that representation into the URI.

  5. Use linking mechanisms provided by the representation being served to create links to the other available representations. As an example, when using HTML, one might use a and link elements to advertize the availability of alternate representations. In this context, note that there are two distinct types of such links:

    Links for human consumption that are to be presented to the user
    And links for machine consumption, that are used by the user agent to provide additional functionality.

    As an example, links to available alternatives meant for human consumption might use the HTML a element since these are rendered by user-agents. In contrast, links meant for use by bots might use the HTML link element — as an example, this reflects present practice when publishing pointers to Atom/RSS feeds.

    In either case, notice that following these steps creates a mini-graph comprising the canonical URI and URIs for its various representations.
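The choice between steps 3 and 4 can be sketched as two handlers for the generic URI. This is a minimal, server-agnostic illustration under stated assumptions: the /mobile and /desktop specific-resource URIs, the crude "mobile" substring test on User-Agent, and the Response class are all hypothetical, and a real deployment would likely negotiate on Accept and related headers too. The negotiated response names the request header it varied on, while the redirect freezes the choice into a URI instead.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class Negotiation {

    // Bare-bones description of an HTTP response.
    static final class Response {
        final int status;
        final String body;
        final Map<String, String> headers = new LinkedHashMap<>();

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    // Hypothetical specific-resource URIs, following step 1 of the TAG's recipe.
    static final String GENERIC = "http://example.com/ubiquity/resource";
    static final String MOBILE = GENERIC + "/mobile";
    static final String DESKTOP = GENERIC + "/desktop";

    // Crude delivery-context test; an assumption for illustration only.
    static boolean isMobile(String userAgent) {
        return userAgent != null && userAgent.toLowerCase(Locale.ROOT).contains("mobile");
    }

    // Step 3: serve the chosen representation at the generic URI itself.
    static Response negotiate(String userAgent) {
        boolean mobile = isMobile(userAgent);
        Response r = new Response(200, mobile ? "mobile page" : "desktop page");
        // Tell caches which request header the choice depended on.
        r.headers.put("Vary", "User-Agent");
        return r;
    }

    // Step 4: send a temporary (302 Found) redirect to the specific resource instead.
    static Response redirect(String userAgent) {
        Response r = new Response(302, "");
        r.headers.put("Location", isMobile(userAgent) ? MOBILE : DESKTOP);
        return r;
    }
}
```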

This is actually just the solution suggested for one particular use case, but the others are very similar.

This seems wise, and in general points out something I've noticed in designing RESTful systems. The server maintainer needs to be able to freely define resources and invent URLs pointing to those resources. A given resource can have more than one URL, and indeed different parts of one document may be individual resources with their own unique URLs. For example, this page could have one URL (http://www.cafeconleche.org/) and every news item on the page could have its own URL (http://www.cafeconleche.org/news/November_14_2006_35156, http://www.cafeconleche.org/news/November_14_2006_34857). Parts of the page could be updated by PUTting the relevant content to the individual item URLs.

Of course this requires an additional layer of indirection on the server, maybe more than one. The current static file system that serves Cafe con Leche can't really handle this. However, fewer and fewer sites are generated out of static files these days anyway. The key is to design the server-side systems such that URLs are freely created for everything of interest.

I'm reminded of a problem a lot of my intro to Java students have. They can't figure out how to make two objects talk to each other (often action listeners and the applet they're responding to) so they want to put everything in one class. The proper solution to this problem is to add methods to one or the other of the two classes so the objects can communicate. In the RESTful world of HTTP, when you find you're having trouble sending the server the message you want to send it, the solution is definitely not adding a new method. Rather it's adding a new URI. Don't be afraid of URIs. A good RESTful system will have lots of them.
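A toy version of that design might look like the following: only GET and PUT are supported, but every news item gets its own URI, and the front page is assembled from the items. This is an in-memory sketch, not how Cafe con Leche works; the NewsSite and Result names, the /news/ path layout, and the choice of 201 versus 204 status codes are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class NewsSite {

    static final class Result {
        final int status;
        final String body;

        Result(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    // Item URI path -> item content, kept in first-publication order.
    private final Map<String, String> items = new LinkedHashMap<>();

    Result handle(String method, String path, String body) {
        if (path.equals("/")) {
            // The front page is just the items assembled in order; it is read-only here.
            return method.equals("GET")
                ? new Result(200, String.join("\n", items.values()))
                : new Result(405, "");
        }
        if (!path.startsWith("/news/")) {
            return new Result(404, "");
        }
        switch (method) {
            case "GET": {
                String item = items.get(path);
                return item == null ? new Result(404, "") : new Result(200, item);
            }
            case "PUT": {
                // A new capability means a new URI, not a new HTTP method.
                boolean created = items.put(path, body) == null;
                return new Result(created ? 201 : 204, "");
            }
            default:
                return new Result(405, "");
        }
    }
}
```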


The W3C Privacy Activity has released the final version of the Platform for Privacy Preferences 1.1 (P3P1.1) Specification as a note, not a recommendation, because "The P3P Specification Working Group is lacking the necessary support from implementers to carry on through the Recommendation Process. Therefore the Working Group decided to publish the current P3P 1.1 Specification as a Working Group Note after a successful Last Call." New features in P3P 1.1 include a mechanism to name and group statements together so user agents can organize the summary display of those policies, and a generic means of binding P3P policies to arbitrary XML to support XForms, WSDL, and other XML applications.

Monday, November 13, 2006 (Permalink)

As part of my continuing e-mail Inbox purging, I have now collected and posted all the user-submitted errata for Effective XML. Overall, there's nothing too major here, just lots of little annoying syntax errors. Hopefully I'll get a chance to fix these in a future printing.


Steve Palmer has posted a new beta (2.1.0.2108) of Vienna 2.1, an open source RSS/Atom client for Mac OS X. Vienna is the first reader I've found acceptable for daily use; not great but good enough. (Of course my standards for "good enough" are pretty high.) 2.1 focuses on improving the user interface with a unified layout that lets you scroll through several articles, article filtering (e.g. read all articles since the last refresh), manual folder reordering, a new get info window, and an improved condensed layout.

Sunday, November 12, 2006 (Permalink)

I've finally gotten around to restoring the Fibonacci servers on elharo.com described in Processing XML with Java, after last spring's migration from Linux to Mac OS X. This involved rewriting the services in PHP instead of Java. Tomcat seemed largely responsible for the constant failure of the servers over the last few years, and it was way too heavyweight to install and maintain just for these few simple programs.

Saturday, November 11, 2006 (Permalink)

The Unicode Consortium has released Unicode 5.0. Version 5 adds 1,369 new characters for Cyrillic, Greek, Hebrew, Kannada, Latin, math, phonetic extensions, symbols, and five new scripts: Balinese, N’Ko, Phags-pa, Phoenician, and Sumero-Akkadian Cuneiform. In addition it:

makes changes to guarantee case-folding stability. Unicode 5.0 incorporates all the changes introduced in Unicode 4.1, including full interoperability with the most recent versions of GB 18030, JIS X 0213, and HKSCS, and support for stable identifiers and pattern syntax characters.

Unicode 5.0 revises and improves property values and behavioral specifications in areas such as character, word, line, and sentence segmentation, and tightens conformance requirements on Bidi implementations (used for Arabic and Hebrew). The text is significantly revised for clarity and completeness, especially for Unicode conformance.

The printed book is due out in a week or two, and the online version in February next year. However, all the data is available on the Unicode web site now.


Todd Ditchendorf has released GooeySAX 2.0, a Swing GUI wrapped around a SAX parser that allows you to check your local or remote XML documents for well-formedness, validity, or schema-validity.
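GooeySAX wraps a GUI around it, but the underlying check is short. Here is a minimal sketch of a well-formedness check with the JDK's SAX parser; it is not GooeySAX's code, and WellFormedness is a made-up name. DefaultHandler's fatalError() rethrows the parser's exception, so the first well-formedness error ends the parse. Checking validity would additionally require turning validation on and installing an ErrorHandler that reports ordinary errors, which this sketch omits.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class WellFormedness {

    // Returns null if the document is well-formed, otherwise the parser's error message.
    static String check(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```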

Friday, November 10, 2006 (Permalink)

BEA has posted a maintenance release of JSR-181 Web Services Metadata for the Java Platform. The changes seem relatively minor.


IBM has published the second maintenance release of Java Specification Request 110, Java APIs for WSDL. This is actually a fairly major, functional update compared to most maintenance releases. Changes include:

  1. Read WSDL from a DOM Element specifying the base URI via a WSDLLocator instead of a URI string. See new method WSDLReader.readWSDL(WSDLLocator, Element).
  2. Perform ‘clean up’ of a WSDLLocator object to release system resources and/or prepare it for reuse (e.g. close any open input streams). See new method WSDLLocator.close().
  3. Specify the WSDLFactory implementation class name via a property file located in /META-INF/services. This permits more fine-grained control than the existing system-wide means of specifying the class name, namely: as a JVM system property, via a property file in the JRE/lib directory, or via an implementation-specific default. See Javadoc for the WSDLFactory.newInstance() method.
  4. Specify not only the WSDLFactory implementation class, but also the classloader used to load it. See new method WSDLFactory.newInstance(String, ClassLoader).
  5. ‘Flatten’ a WSDL import tree by returning all PortTypes, Bindings or Services declared in the top-level Definition and in any imported Definitions. See the new methods on the javax.wsdl.Definition interface getAllPortTypes(), getAllBindings() and getAllServices().
  6. Explicitly get operations with unnamed input or output messages by specifying the value “:none” for those search parameters (versus the existing API behaviour of specifying null to ignore input or output message name from the search criteria). See the new Javadoc for methods PortType.getOperation(String,String,String) and Binding.getBindingOperation(String,String,String).
  7. For every addXXX method in the JWSDL API ensure a corresponding removeXXX method exists. This will improve the programmatic modification of WSDL definitions.
  8. Relax the restrictions on the use of extensibility elements and attributes imposed by the W3C WSDL 1.1 schema (and enforced by JWSDL 1.1) and instead permit every WSDL element to be extensible by elements or attributes, as per the WS-I Basic Profile 1.1 requirements. This will enable JWSDL to support the new JAX-WS specification.
  9. SOAP 1.2 binding extensions. See the new package javax.wsdl.extensions.soap12.
  10. WSDLException.toString() changed so that it no longer prints the stack trace, but instead just returns the ‘short message’, as described for the Exception.toString() method in Java 1.4.
  11. The minimum supported Java level is now Java 1.4. It was previously Java 1.2.

Nokia and Sun have posted the public review draft of JSR-280 XML API for Java™ ME to the Java Community Process (JCP). This attempts to subset SAX, StAX, JAXP, and DOM to run in small devices. This strikes me as such a bad idea that it's hard to believe they're serious. If size is such a concern (and in small devices it is), then pick one API and stick with it, or design a new one. Don't try to pull out half of each. Including both SAX and StAX in particular really smells of design by committee. There's no need to force both on device vendors. Pick one and be done with it. Comments are due by November 13.


Sun has posted the maintenance review change log for JSR 222: Java Architecture for XML Binding 2. Changes appear quite large, beyond what I'd expect from a real maintenance draft. For example,

Today, it just takes too many lines to do simple stuff with JAXB. One typical example is the following code for reading XML from a file:

  File in = new File("FamilyData.xml");
  JAXBContext jc = JAXBContext.newInstance("kgh.geneology.xml");
  Unmarshaller u = jc.createUnmarshaller();
  JAXBElement junk = (JAXBElement) u.unmarshal(in);
  DataFile df = (DataFile) junk.getValue();

(and if you count exception handling, add at least 4 lines for that.) Think of the DataFile class as the top-level class generated by a schema compiler.

We need convenience methods that focus on the typical simple use cases.

Proposed Solution

Define the JAXB class and the unmarshal/marshal methods on them. Those methods generally look like this:

    public static <T> T unmarshal( SOMETHING xml, Class<T> type );
    public static void marshal( Object jaxbObject, SOMETHING xml );

These methods do not throw checked exceptions.
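JAXB 2.1 did later ship a javax.xml.bind.JAXB class with static convenience methods along these lines, but JAXB is no longer bundled with current JDKs. The design idea itself, one call that hides the factory boilerplate and turns checked exceptions into an unchecked one, can be sketched with the JDK's built-in DOM parser instead. The Xml class and its parse method are hypothetical names, not part of any library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class Xml {
    private Xml() {}

    // Parses a string in one call; checked exceptions come back as IllegalArgumentException.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and prints nothing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) { // ParserConfigurationException, SAXException, IOException
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Callers then write a single line such as Document doc = Xml.parse(text); and handle failure only if they want to.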

Comments are due by November 27.


IBM has posted the public review draft of JSR-106 XML Digital Encryption APIs to the JCP. This is a Java API for the W3C Recommendation XML Encryption Syntax and Processing. Interestingly, it is model-independent. A DOM reference implementation is planned, but it should also be possible to write implementations for XOM, JDOM, and so forth. At first glance, however, it seems overly complex. I do plan to eventually add XML signatures to XOM, but when and if I do, I expect I'll design my own simpler API.


Alan Ezust has uploaded the eighth pre-release of jEdit 4.3, an open source programmer's editor written in Java with extensive plug-in support and my preferred text editor on Windows and Unix. This release fixes bugs and cleans up the API.


Advanced Software Production Line has posted LibAxl 0.27, an open source (LGPL) XML parser for Linux written in ANSI C. It uses its own custom API rather than one of the standards. At first glance, it does not appear to have namespace support.


Sun has posted the second maintenance review change log for JSR 224: Java API for XML-Based Web Services 2. At first glance the changes appear to mostly be in the documentation, rather than substantive. Comments are due by November 27.

Thursday, November 9, 2006 (Permalink)

The Mozilla Project has released Firefox 1.5.0.8, Thunderbird 1.5.0.8, and SeaMonkey 1.0.6. These releases fix security flaws, and all users should upgrade. You should be able to upgrade just by going to Help/Check for Updates..., though when I tried that in Thunderbird it failed:

Failed (unknown reason)

For Firefox users, I recommend jumping straight to 2.0. It's a much nicer browser that doesn't have the flaws in the first place.


The Mozilla Project has also posted the first beta of SeaMonkey 1.1. SeaMonkey is the continuation of the integrated Mozilla suite, and has XML support roughly equivalent to Firefox 1.5 (e.g. XML, XSLT, CSS, XHTML). It also bundles an e-mail client, web editor, browser, and more into one application. Particularly notable in this release is that "Message labelling has been superseded by tagging, which provides much more than the original 5 labels and comes with new preferences." That's almost enough to make me switch, but I suspect I'll hang on to Thunderbird until version 2.0 or Eudora 7 comes out. The current Thunderbird is crash prone, feature-poor, has horrible user interaction, and is ridiculously slow for no good reason, but at least it doesn't seem to lose my e-mail.

Wednesday, November 8, 2006 (Permalink)

IBM developerWorks has published my latest article, Simple Xalan extension functions: Mixing Java with XSLT. Xalan can invoke almost any method in almost any Java class in the classpath. Taking advantage of this can improve performance, provide features like trigonometric functions that aren't available in XSLT, perform file I/O, talk to databases and network servers, or implement algorithms that are easy to write in Java but hard to write in XSLT. This article teaches the basics of invoking Java code from Xalan.
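As a taste of the technique, the sketch below calls java.lang.Math.sqrt() from a stylesheet. The article covers Xalan-J itself; this sketch instead assumes the XSLTC processor bundled with the JDK (derived from Xalan), which, as far as I know, maps the namespace URI http://xml.apache.org/xalan/java/java.lang.Math to that class. Extension functions are refused under secure processing, so the sketch turns it off and also tries the JDK-specific enableExtensionFunctions feature, whose name is an assumption; only do this with stylesheets you trust.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ExtensionDemo {

    // The namespace URI names the Java class; math:sqrt() then calls Math.sqrt(double).
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:math='http://xml.apache.org/xalan/java/java.lang.Math'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='math:sqrt(number(/n))'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Transforms a document such as "<n>16</n>" and returns the text output.
    static String squareRoot(String inputXml) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            // Secure processing disables extension functions, so switch it off.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, false);
            try {
                // JDK-specific switch (assumed name); skipped if the processor rejects it.
                factory.setFeature(
                    "http://www.oracle.com/xml/jaxp/properties/enableExtensionFunctions", true);
            } catch (TransformerConfigurationException ignored) {
                // Not the JDK's processor; rely on its defaults.
            }
            Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(inputXml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```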

Tuesday, November 7, 2006 (Permalink)

Adobe has open sourced the JavaScript engine in Flash under the auspices of the Mozilla Foundation. I don't think this is all of Flash, but I could be wrong about that. (I'm not a big Flash person.) Specifically, they are releasing

the ActionScript™ Virtual Machine, the powerful standards-based scripting language engine in Adobe® Flash® Player, to the Mozilla Foundation. Mozilla will host a new open source project, called Tamarin, to accelerate the development of this standards-based approach for creating rich and engaging Web applications.

The Tamarin project will implement the final version of the ECMAScript Edition 4 standard language, which Mozilla will use within the next generation of SpiderMonkey, the core JavaScript engine embedded in Firefox®, Mozilla’s free Web browser. As of today, developers working on SpiderMonkey will have access to the Tamarin code in the Mozilla CVS repository via the project page located at www.mozilla.org/projects/tamarin/. Contributions to the code will be managed by a governing body of developers from both Adobe and Mozilla.

This code is licensed under the same Mozilla tri-license (MPL/GPL/LGPL) as other Mozilla code. They even beat Java out the door. Isn't that ironic?


The Eclipse Project has released the Web Tools Platform 1.5.2. I've tried this out in the past, and found it to be a hideous mess; and probably a good case study in how not to design a GUI app. This is a bug fix release, but really what this project needs is to be taken out behind the barn and shot.

Monday, November 6, 2006 (Permalink)

Alain Frisch has released XStream, a small language for XML transformation. "Transformations written in XStream are compiled into efficient XML stream processors: the output is computed and produced while the input is being parsed, which makes it possible to run some transformations on very big XML documents which could not even fit in memory. Though XStream is mostly intended as a back-end for higher-level languages, it is also possible to use it directly. The language features ML-like pattern matching and higher-order functions, but no types." The XStream compiler is distributed under the CeCILL license. I hadn't heard of that before, but apparently it's a specifically French free software license that's compatible with the GPL.
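XStream is its own ML-like language, so the following is not XStream code. It is a minimal Java sketch of the same streaming idea using StAX: each event is handed to the output writer as soon as it is read, so the whole document is never held in memory. The renaming transformation and the StreamingRename name are made up for illustration, and matching is on local name only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

final class StreamingRename {

    // Copies xml to the output event by event, renaming elements whose local name is `from`.
    static String rename(String xml, String from, String to) {
        try {
            XMLEventReader in = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
            StringWriter buffer = new StringWriter();
            XMLEventWriter out = XMLOutputFactory.newInstance().createXMLEventWriter(buffer);
            XMLEventFactory events = XMLEventFactory.newInstance();
            while (in.hasNext()) {
                XMLEvent event = in.nextEvent();
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals(from)) {
                    StartElement start = event.asStartElement();
                    event = events.createStartElement(start.getName().getPrefix(),
                        start.getName().getNamespaceURI(), to,
                        start.getAttributes(), start.getNamespaces());
                } else if (event.isEndElement()
                        && event.asEndElement().getName().getLocalPart().equals(from)) {
                    EndElement end = event.asEndElement();
                    event = events.createEndElement(end.getName().getPrefix(),
                        end.getName().getNamespaceURI(), to, end.getNamespaces());
                }
                out.add(event); // earlier events are not retained
            }
            out.close();
            in.close();
            return buffer.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```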


Andrea Marchesini has released libnxml 0.15, a C library for parsing, writing, and creating XML 1.0 and 1.1. Version 0.15 fixes a proxy authentication bug. libnxml is published under the LGPL.

Sunday, November 5, 2006 (Permalink)

The W3C XForms working group has posted the fourth public working draft of XForms 1.1. Changes since 1.0 include:

  • A new namespace URI, http://www.w3.org/2004/xforms/
  • power, luhn, current, choose, id and property XPath extension functions
  • An email address datatype
  • An ID card number datatype
  • A prompt action element
  • An xforms-close event
  • An xforms-submit-serialize event
  • Inline rendering of non-text media types

Major changes I noted in this draft include:

  • For element instance, XForms 1.1 provides a new optional attribute resource that provides an xsd:anyURI link to externally defined initial instance data.
  • The days-to-date(number) and seconds-to-dateTime(number) functions return a string containing an xsd:date or xsd:dateTime, respectively, for the moment that many days or seconds after midnight January 1, 1970.
  • The local-date() function returns an xsd:date for the current date in the local time zone.
  • encode() and decode() functions for converting strings to and from hex or base-64.
  • digest() and hmac() functions for calculating cryptographic hashes.
  • A random() function generates a uniformly distributed random or pseudorandom number between 0.0 and 1.0.
  • XPath expressions can be used to specify values.
  • The submission element allows an optional serialize attribute. If the value of this attribute is false, then instance data is not serialized or submitted.
  • The replace attribute of submission supports the additional value of text. If this setting is made, and the submission response conforms to an XML mediatype (as defined by the content type specifiers in [RFC 3023]) or a text media type (as defined by a content type specifier of text/*), then the response data is encoded as text and replaces the content of the replacement target node.
  • HTTP headers can be controlled from the XForm submission.
  • Support for help and hint in item and choices has been removed.
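Several of these functions map directly onto the standard Java library. The sketch below approximates days-to-date, seconds-to-dateTime, luhn, and a hex-encoded digest; the method names and the hex-only digest are my own simplifications, and the draft's rules for invalid input are not reproduced.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.time.LocalDate;

final class XFormsFunctions {

    // days-to-date: the xsd:date that many days after 1970-01-01.
    static String daysToDate(long days) {
        return LocalDate.ofEpochDay(days).toString();
    }

    // seconds-to-dateTime: the UTC xsd:dateTime that many seconds after the epoch.
    static String secondsToDateTime(long seconds) {
        return Instant.ofEpochSecond(seconds).toString();
    }

    // luhn: true if the digit string passes the Luhn checksum used by card numbers.
    static boolean luhn(String digits) {
        if (digits.isEmpty()) return false;
        int sum = 0;
        boolean doubleIt = false; // double every second digit, counting from the right
        for (int i = digits.length() - 1; i >= 0; i--) {
            char c = digits.charAt(i);
            if (c < '0' || c > '9') return false;
            int d = c - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    // digest with hex output, e.g. algorithm "SHA-1"; the text is hashed as UTF-8.
    static String digestHex(String text, String algorithm) {
        try {
            byte[] hash = MessageDigest.getInstance(algorithm)
                .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```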

The W3C Web API Working Group has updated the Document Object Model (DOM) Level 3 Events Specification working draft. "This specification defines the Document Object Model Events Level 3, a generic platform- and language-neutral event system which allows registration of event handlers, describes event flow through a tree structure, and provides basic contextual information for each event. The Document Object Model Events Level 3 builds on the Document Object Model Events Level 2."

Saturday, November 4, 2006 (Permalink)

Version 5.2.0 of PHP has been released. Along with numerous bug fixes and API additions, this release adds several new methods to the XMLReader pull parser, including readInnerXML(), setSchema(), readOuterXML(), and readString().


The Omni Group has released OmniWeb 5.5.1, a $29.95 payware web browser for Mac OS X that supports the core parts of XML on the Web including XSLT and CSS. This is a bug fix release.

Friday, November 3, 2006 (Permalink)

Myron Turner has released XML_PullParser for PHP 1.3.1, a "token-based interface to the PHP expat XML library. It is modeled in part on the PullParser module found in the Perl HTML::Parser distribution. It moves the API from an event-based model to a token-based model. Instead of processing data as it is passed from the parser to callbacks, a script using XML_PullParser requests "tokens" from various "tokenizing" functions. Tokens are arrays representing XML structures, which become available in the order in which they appear in the document being parsed. In addition to the tokenizers, a rich set of accessors are provided to extract data from the elements and attributes bundled in the tokens. There are also techniques and class methods for selecting elements and attributes, and for testing for their position and relevancy. Finally, there are package-level functions to set the contexts that affect the operations of the module."
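For comparison, Java's StAX API offers the same pull style: the program asks the reader for the next token instead of receiving callbacks. This is a minimal sketch, not a port of XML_PullParser; the PullDemo name is hypothetical, and getElementText() assumes the matched elements contain only text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class PullDemo {

    // Pulls the text of every element with the given local name, in document order.
    static List<String> textOf(String xml, String name) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            List<String> result = new ArrayList<>();
            while (reader.hasNext()) {
                // The program requests the next token; nothing is pushed to callbacks.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(name)) {
                    result.add(reader.getElementText()); // reads through the matching end tag
                }
            }
            reader.close();
            return result;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```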

Thursday, November 2, 2006 (Permalink)

IBM developerWorks has published my latest article, Why XForms? An apologia and exegesis. This is an unusual article for me because it's almost totally non-tutorial in focus. In fact, I don't show a single line of XForms code. Instead, this article explains what XForms is attempting to do and what domains and problems it's appropriate for. If you're just looking for a slightly better HTML form, I'm not sure XForms is worth its cost; but if you need something more, it may be.


The Mozilla Project has posted version 0.7 of its XForms extension for Firefox 1.5 and later. Mozilla XForms support has been developed by IBM, Novell, and independent contributors. Improvements in this release include partial support for attribute-based repeats, improved accessibility for controls, and improved schema support. It's not a complete XForms implementation yet, but it's getting there.


Orbeon has posted the first milestone of the Orbeon Presentation Server (OPS) 3.5. OPS is an open source, server-based XForms implementation that delivers standard HTML+JavaScript to clients, with a hefty dose of AJAX thrown in for good measure. OPS is published under the LGPL.


Eric van der Vlist has released TreeBind, "a generic hierarchical data model binding API that is now supporting RDF and LDAP in addition to XML and Java objects". TreeBind is published under the GNU Lesser General Public License.

Wednesday, November 1, 2006 (Permalink)

The W3C Schema working group has published the first public working draft of Guide to Versioning XML Languages using XML Schema 1.1. According to the introduction:

creating and using multiple versions of a language is common and useful. As described, extensibility is a key contributor to versioning. It can enable forwards and backwards compatible versioning. The majority of this guide focuses on Schema 1.1 extensibility techniques that enable forwards-compatible versioning. In schema terms, this is when a schema processor with an older schema can process and validate an instance that is valid against a newer schema.
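The wildcard technique at the heart of forwards-compatible versioning can be tried even with an XML Schema 1.0 validator such as the one in the JDK (Schema 1.1 adds further mechanisms, such as open content). In this minimal sketch, the hypothetical version 1 schema requires name and then accepts any further elements laxly, so a version 2 instance that adds an email element still validates against the old schema. The element names and the ForwardCompatible class are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class ForwardCompatible {

    // Hypothetical version 1 schema: <name> is required, and anything after it is
    // accepted by a lax wildcard so that later versions can add elements.
    static final String V1_SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='person'><xs:complexType><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:any processContents='lax' minOccurs='0' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    private static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(V1_SCHEMA)));
        } catch (SAXException e) {
            throw new IllegalStateException("Schema does not compile", e);
        }
    }

    // True if the instance is valid against the version 1 schema.
    static boolean isValid(String instance) {
        try {
            // With no ErrorHandler set, validation errors are thrown as SAXExceptions.
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(instance)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```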


The W3C GRDDL Working Group has posted the first public working draft of Gleaning Resource Descriptions from Dialects of Languages (GRDDL). According to the abstract,

GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages. This GRDDL specification introduces markup for declaring that an XML document includes gleanable data and for linking to an algorithm, typically represented in XSLT, for gleaning the resource descriptions from the document.

The markup includes a namespace-qualified attribute for use in general-purpose XML documents and a profile-qualified link relationship for use in valid XHTML documents. The GRDDL mechanism also allows an XML namespace document (or XHTML profile document) to declare that every document associated with that namespace (or profile) includes gleanable data and for linking to an algorithm for gleaning the data.

The result of such a glean is an RDF description of the document.
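The first step a GRDDL-aware agent takes is finding the transformation links. Here is a minimal sketch, assuming the grddl:transformation attribute on the root element holds a whitespace-separated list of URIs and lives in the namespace http://www.w3.org/2003/g/data-view#. It does not resolve relative URIs, follow the namespace-document or XHTML-profile indirection, or run the transformations themselves. GrddlLinks is a hypothetical name.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class GrddlLinks {

    static final String GRDDL_NS = "http://www.w3.org/2003/g/data-view#";

    // Returns the transformation URIs declared by grddl:transformation on the root element.
    static List<String> transformations(String xml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
        // The attribute value is a whitespace-separated list of URI references.
        String value = root.getAttributeNS(GRDDL_NS, "transformation").trim();
        return value.isEmpty()
            ? Collections.<String>emptyList()
            : Arrays.asList(value.split("\\s+"));
    }
}
```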


Scott Stanchfield has released ANTXR, an ANTLR-based XML parser. ANTXR is distributed under the Eclipse Public License.


The Apache Lucene Project has released Nutch 0.8.1, an open source web-search engine based on Lucene Java but adding web-specific tools including a web crawler, a link-graph database, an HTML parser, and so forth.


SysOnyx has posted a beta of xmlDig, a tool that queries a database with a valid SQL statement and returns an XML result set.


The W3C Technical Architecture Group (TAG) has published a draft finding on Passwords in the Clear:

The purpose of this finding is to clarify the security concerns around using passwords on the world wide web. Specifically, the objective is to point out a few conclusions the TAG has come to:

1) Passwords MUST NOT be transmitted in clear text.
2) Passwords MUST use password masking when displayed in the html form

The purpose of this paper is to explain these findings and give direction around possible alternatives.

I guess HTTP basic auth is dead then.
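The reason: Basic authentication sends the password Base64-encoded, and Base64 is an encoding, not encryption. Unless the connection is protected by TLS, anyone who sees the header can recover the password. The sketch below shows the round trip using the user name and password from RFC 2617's example; the BasicAuth class is illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuth {

    // Builds the value of an Authorization header for HTTP Basic authentication.
    static String header(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
            .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Reverses the encoding: no key or secret is needed to read the password.
    static String decode(String header) {
        String encoded = header.substring("Basic ".length());
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }
}
```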


The W3C Multimodal Interaction Working Group has published the last call working draft of the Ink Markup Language. According to the abstract,

The Ink Markup Language serves as the data format for representing ink entered with an electronic pen or stylus. The markup allows for the input and processing of handwriting, gestures, sketches, music and other notational languages in applications. It provides a common format for the exchange of ink data between components such as handwriting and gesture recognizers, signature verifiers, and other ink-aware modules. ...

This fourth version of the Working Draft includes a few conceptual changes to simplify the definition while achieving greater expressive power. It also contains many small changes of details to make element and attribute use uniform across the definition to make it easier to learn and simpler to process.

The main changes are:

  • InkML now more robustly supports program transformations. The text has been revised to remove any requirement for a particular element order in archival ink. This allows applications to regroup and organize traces into logical structures without losing information.
  • InkML now more robustly supports streaming. The content model of the top-level ink element has been relaxed to allow interspersion of more definitional elements. The definition of continuation traces has been simplified.
  • InkML now better supports optical devices and other technologies. The language has been revised to be technology neutral, where possible, and to keep technology-specific concepts localized to specific elements.
  • There is greater support for applications to use InkML as a representation for their own application-defined structures. Trace groups and trace references can be nested, allowing applications to group ink into logical units, if desired. This may be done to the explicit ink traces or by reference.
  • The support for annotation has been enhanced to allow arbitrary textual or XML-based annotation. This provides sufficient hooks for rich semantic annotation of ink while keeping the standard simple. The model is based on experience with MathML.
  • The concepts of trace formats and capture devices have been more clearly distinguished. Trace formats can be used to describe all the logical properties of an ideal channel. They are used to describe traces and the coordinates of shared canvases. Consequently, the channel element has a richer set of attributes. Capture devices are now seen as "ink sources" which may additionally describe other characteristics of the ink source, such as accuracy, latency, channel cross-coupling, etc.
  • The notions of canvas transformations and channel mappings have been converged into a single mapping type. As a consequence, applications may agree on more general coordinate systems for shared canvases. (For example, they may share tip force information.)

Several changes of detail have been made to support the above, to make the naming and use of elements and attributes consistent, and to remove duplication.

My main objection to this spec is that it embeds lots of non-XML markup that you have to write your own parser for, rather than using an XML parser.

<trace id = "id4525abc">
   1125 18432,'23'43,"7"-8,3-5,7 -3,6 2,6 8,3 6 T,2 4*T,3 6,3-6 F F
</trace>

<sarcasm>Gee, that's not the least bit opaque.</sarcasm> This looks like the SVG mistake all over again. I wrote about this in Item 11 of Effective XML, "Make Structure Explicit through Markup."
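To make the point concrete: even the simplest subset of the trace syntax needs a hand-written parser. This minimal sketch handles only plain absolute values such as "10 0, 9 14, 8 28"; it does not attempt the ' and " prefixes, the T/F/* values, or run-together values such as 3-5 in the example above, each of which would need more code. SimpleTrace is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class SimpleTrace {

    // Parses trace text of the simplest form, "x y, x y, ...", into coordinate arrays.
    // Difference prefixes, T/F/* values, and packed numbers are NOT handled.
    static List<double[]> parse(String text) {
        List<double[]> points = new ArrayList<>();
        for (String point : text.trim().split("\\s*,\\s*")) {
            if (point.isEmpty()) continue; // empty trace
            String[] values = point.split("\\s+");
            double[] coords = new double[values.length];
            for (int i = 0; i < values.length; i++) {
                coords[i] = Double.parseDouble(values[i]);
            }
            points.add(coords);
        }
        return points;
    }
}
```

With markup for each point, an ordinary XML parser would do all of this work.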


Steve Palmer has posted a new beta (2.1.0.2107) of Vienna 2.1, an open source RSS/Atom client for Mac OS X. Vienna is the first reader I've found acceptable for daily use; not great but good enough. (Of course my standards for "good enough" are pretty high.) 2.1 focuses on improving the user interface with a unified layout that lets you scroll through several articles, article filtering (e.g. read all articles since the last refresh), manual folder reordering, a new get info window, and an improved condensed layout.


Axizon has released the Tiger XSLT Mapper, a $399 payware Java tool that provides a visual interface for creating XSLT stylesheets by mapping expected inputs to desired outputs.

Tuesday, October 31, 2006 (Permalink)

Alan Ezust has released version 2.0 of the jEdit XML plugin. It "includes the Docbook 4.2 DTD, but will validate against any DTD or XML schema. It recognizes OASIS catalog files, and provides completion for elements and attributes, in XML, HTML, and CSS. And it also has a javascript structure browser too." Download it using the jEdit plugin manager. Java 5 and jEdit 4.3pre5 are required.


Bare Bones Software has released version 8.5.1 of BBEdit, my preferred text editor on the Mac, and what I'm using to type these very words. New features include support for Ruby, SQL, and YAML; code folding; HTML Format, Translate and Tidy; and autosave. BBEdit is $199 payware. 8.5.1 is a bug fix release. Mac OS X 10.3.9 or later is required.


Norm Walsh has published the ninth beta of DocBook 5.0. DocBook 5 is "a significant redesign that attempts to remain true to the spirit of DocBook." The schema is written in RELAX NG. A DTD and W3C XML Schema generated from the RELAX NG schema are also available. There's also a Schematron schema "that validates some extra-grammatical DocBook constraints. These patterns are also present directly in the RELAX NG Grammar and some validators, for example MSV, can perform both kinds of validation at the same time." Changes in this beta are quite minor, and include allowing jobtitle inline, requiring titles for task elements, and making the targetdoc attribute optional on olink.

Monday, October 30, 2006 (Permalink)

Uche Ogbuji has released 4Suite XML 1.0, an open source "comprehensive library for XML processing. It is implemented in Python and C and supports XML (SAX-like and DOM-like), XPath, XSLT, RELAX NG, XUpdate, XInclude, XPointer, and more."


The W3C Device Independence Working Group has posted a second last call working draft of Content Selection for Device Independence (DISelect) 1.0. They've also split out the XPath parts into a separate document, Delivery Context: XPath Access Functions. Finally they've started work on a primer about all this, though it's mostly empty at this point.

According to the abstract, "This document specifies a syntax and processing model for general purpose content selection or filtering. Selection involves conditional processing of various parts of an XML information set according to the results of the evaluation of expressions. Using this mechanism some parts of the information set can be selected for further processing and others can be suppressed. The specification of the parts of the infoset affected and the expressions that govern processing is by means of XML-friendly syntax. This includes elements, attributes and XPath expressions."

That sounds unobjectionable, but what the working group is really proposing is XML markup that can be added to a page to indicate which devices certain content is appropriate for. For example, this sel:expr attribute says that the image should only be displayed if the user's device supports color and has a window wider than 500 pixels.

<div sel:expr="dc:cssmq-width('px') &gt; 500 and dc:cssmq-color() &gt; 0">
  <object src="picture.png"/>
</div>
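Here is a toy Python model of the processing model in the abstract, in which conditions decide which parts of the infoset survive. It is only a sketch. Real DISelect evaluates XPath with delivery-context functions such as dc:cssmq-width(). This version assumes a made-up mini-language of "property op number" clauses joined by "and", a plain dictionary standing in for the device context, and a placeholder namespace URI rather than the real DISelect one. The root element's own condition is not evaluated.

```python
import operator
import xml.etree.ElementTree as ET

# Placeholder namespace URI; NOT the real DISelect namespace.
SEL_NS = "urn:example:diselect"
EXPR = "{%s}expr" % SEL_NS

# Comparison operators the toy expression language understands.
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
       "<=": operator.le, "=": operator.eq}

def evaluate(expr, context):
    """Evaluate 'prop op number [and prop op number ...]' against a dict."""
    for clause in expr.split(" and "):
        name, op, value = clause.split()
        if not OPS[op](context[name], float(value)):
            return False
    return True

def select(element, context):
    """Drop children whose sel:expr is false; strip the attribute from the rest."""
    for child in list(element):
        expr = child.get(EXPR)
        if expr is not None and not evaluate(expr, context):
            element.remove(child)
            continue
        if expr is not None:
            del child.attrib[EXPR]
        select(child, context)
    return element

SAMPLE = (
    '<body xmlns:sel="urn:example:diselect">'
    '<p>Always shown</p>'
    '<div sel:expr="width &gt; 500 and color &gt; 0">'
    '<object src="picture.png"/></div>'
    '</body>'
)

wide = select(ET.fromstring(SAMPLE), {"width": 800, "color": 8})
narrow = select(ET.fromstring(SAMPLE), {"width": 320, "color": 8})
print(ET.tostring(narrow, encoding="unicode"))  # the div is gone
```

Even in toy form, the device-specific decision lives inside the content document itself, not in a separate stylesheet.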

This feels more than a little like presentation-based markup. This is very much like using JavaScript or server-side programs to identify different browsers and send them content tailored specifically to them. This syntax is definitely easier to use and more powerful than the various JavaScript and server-side hacks people use today, but should we be doing this at all? Whatever happened to the vision of sending browsers XML documents with appropriate stylesheets and letting the client decide how best to present them? The thing that bothers me the most about this proposal is that the syntax mixes the presentation information straight into the document, rather than linking to it from a separate hints sheet. In many ways, this document seems to reflect a belief that the W3C has been going down the wrong road for the last eight years in attempting to separate content from presentation.


Bill de hÓra and Joe Gregorio have posted the eleventh public working draft of The Atom Publishing Protocol, a REST-based system for communicating with weblog servers. The significant changes here seem to relate to the handling of collections, media collections, and their metadata. Major changes in this draft include:

  • "Introspection documents" are now called "service documents".
  • Category documents have been added.
  • Media resources (e.g. non-Atom JPEGs, MP3s, etc.) are now possible.
  • Slug headers can be specified for use by the server in picking a URL.
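To show roughly what the media and Slug changes look like on the wire, here is a Python sketch that builds, but does not send, a POST of a JPEG to a media collection. The Slug header suggests a name. The collection URL, payload, and slug value are all hypothetical. The exact header semantics should be checked against the draft itself, and the server remains free to choose the final URL.

```python
import urllib.request

def build_media_post(collection_url, payload, content_type, slug):
    """Build, but do not send, a POST of a media resource to an APP collection."""
    return urllib.request.Request(
        collection_url,
        data=payload,
        method="POST",
        headers={"Content-Type": content_type, "Slug": slug},
    )

# Hypothetical collection URL; the bytes are just the start of a JPEG file.
req = build_media_post("http://example.com/app/media/", b"\xff\xd8\xff\xe0",
                       "image/jpeg", "Beach Sunset")
# Actually sending it would be: urllib.request.urlopen(req)
```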

Alex Milowski has released Atomic, a Firefox extension that supports the Atom Publishing Protocol (APP). APP is a RESTful protocol for publishing web logs and similar content repositories.


Brendan Taylor has released atom-tools, a Ruby library that "provides an easy way to manipulate entries and feeds along with the HTTP bits needed for an APP client or server. It includes a small APP suitable for testing clients."


William F. Hammond has posted gellmu 0.8.3, "a LaTeX-like way to produce article-level for online display in the modern, fully accessible, form of HTML extended by the World Wide Web Consortium's Mathematical Markup Language (MathML)."

Sunday, October 29, 2006 (Permalink)

Matt Mullenweg has released WordPress 2.0.5, an open source blog engine based on PHP and MySQL. 2.0.5 fixes a dozen or so assorted bugs. WordPress is published under the GPL.

I use WordPress to power The Cafes and Mokka mit Schlag. It's got a lot to recommend it, including the user interface and themability. However, it has some serious problems with HTTP, XML, and security that the developers are in denial about. It may (or may not) be the best open source blog engine available today, but it's certainly not even close to the best one that's possible.

Saturday, October 28, 2006 (Permalink)

Bruno Lowagie has released iText 1.4.6, an open source Java library for generating PDF, XML, HTML, and RTF documents. It can also parse XML documents and convert them into any of these formats. Pages of existing PDF files can be imported and copied to new PDF documents. iText is published under the Mozilla Public License. Java 1.4 or later is required.


Sonic Software has released Stylus Studio 2007 XML Enterprise Suite, a $895 payware XML editor for Windows. Features include:

  • XML differencing
  • XSLT debugging
  • XSLT mapping
  • XSLT profiling
  • XSL-FO
  • XQuery editing, mapping, and debugging
  • XML Schema Editor
  • Document Type Definition (DTD) Editor
  • XPath Evaluator
  • XPath Expression Generator
  • Web Service Call Composer
  • UDDI Registry Browser
  • Tools for mapping to and from XML documents, Web service data, relational data, and flat files
  • Import/export utilities for RDBMS, XML, CSV, ADO, and flat files
  • JSP Editor
  • RenderX XEP Personal Edition XSL-FO processor bundled
  • XPath Query Editor
  • Java APIs for accessing EDI, X12, EDIFACT and other legacy data formats.

New features in this release include:

  • XML Pipeline support
  • An XML report designer
  • Data Conversion APIs

Hermitech Laboratory has released Formulator 3.7 MathML Suite, a $40 payware MathML editor. They've also published free MathML rendering plugins for Internet Explorer and XML Spy.


The Apache Web Services Project has posted version 0.5.2 of JaxMe 2, an open source implementation of the Java Architecture for XML Binding (JAXB). Quoting from the web page,

JaxMe 2 is an open source implementation of JAXB, the specification for Java/XML binding.

A Java/XML binding compiler takes as input a schema description (in most cases an XML schema but it may be a DTD, a RelaxNG schema, a Java class inspected via reflection or a database schema). The output is a set of Java classes:

  • A Java bean class compatible with the schema description. (If the schema was obtained via Java reflection, then the original Java bean class.)
  • An unmarshaller that converts a conforming XML document into the equivalent Java bean.
  • Vice versa, a marshaller that converts the Java bean back into the original XML document.

In the case of JaxMe, the generated classes may also

  • Store the Java bean into a database. Preferably an XML database like eXist, Xindice, or Tamino, but it may also be a relational database like MySQL. (If the schema is sufficiently simple. :-)
  • Query the database for bean instances.
  • Implement an EJB entity or session bean with the same abilities.

This release adds an xmlCatalog element and fixes assorted bugs.

Friday, October 27, 2006 (Permalink)

In case you missed it in the recommended reading, please check out Chameleon schemas considered harmful. Something very weird is going on in the XHTML 2/XForms space, and either I'm totally misunderstanding what they're up to, or they've gone completely off the rails. I'm not the only one who thinks this either. I've been hearing from working group members who are perhaps not quite as befuddled about this as I am, but who seem equally perturbed by the developments. I really want to figure out why the XHTML working group is doing what they're doing (and why the XForms working group is going along with them). It's hard to credit that the reason could possibly be as trivial as it seems to be, but so far no one has suggested anything else. Possibly I should bring this issue to the TAG for discussion, but I want to make sure I understand it first.


Daniel Veillard has released version 2.6.27 of libxml2, the open source XML C library for Gnome. He's also released version 1.1.18 of libxslt, the Gnome Project's XSLT library for C. These releases fix assorted bugs.


Andrea Marchesini has released libnxml 0.14, a C library for parsing, writing, and creating XML 1.0 and 1.1. Version 0.13 adds support for basic authentication and proxy authentication. libnxml is published under the LGPL.

Thursday, October 26, 2006 (Permalink)

Infoblazer LLC has released the XX Framework 1.1, a configurable, open source (LGPL), XML/XSL-centric implementation of the Model-View-Controller development paradigm. According to the announcement:

The primary goal of the XX Framework is to handle typical application CRUD (create, retrieve, update, delete) with little or no Java programming. Instead of telling the application how to retrieve and how to display the data, we configure what to retrieve (through XML) and what to display (through XSLT).

This approach generally leads to a simpler and more elegant solution than a purely procedural approach. Where the application needs more than simple CRUD, additional business logic can be easily incorporated into the process. Some additional features of the framework are configurable data caching, thread pooling, and web service integration.

Some benefits of the framework are:

  • Extremely simple to use
  • Built around open web standards, including J2EE, XHTML, XML, XSL, CSS
  • Uses XSL and CSS as the application's View layer, allowing total separation of presentation from back end concerns. Page-focused/HTML templating approaches rarely achieve this separation
  • Configurable data caching for optimal performance
  • Automated data persistence (CRUD). 80% of a typical web app can be built with no Java code
  • Uses a Portal-based approach to page design, allowing easy compartmentalization of functionality
  • Integration with web services
  • Reuse common classes and operations for pre-built functionality
  • Enabled caching and thread pooling for greatly increased performance

The framework promotes a use case oriented development approach. In this approach, use cases are defined for each task the user will perform. In general, each use case will be implemented by a single logical servlet, as defined in the J2EE Specification. The logical servlet may be implemented by one or more implementation classes, each implementing a distinct portion of that use case and providing a portion of the resultant display.

The developer simply needs to write implementations of these classes. Configuration files determine which implementation classes are called based on user click events. The most common implementation approach has each class return an XML result, yielding a set of XML documents for each use case. XSL transformation is then applied to the XML results, each transform providing a portion of the desired display. A single JSP page is then used to display the final product.

The framework then builds upon this foundational approach to provide automation of typical application tasks, such as add, update, delete, select of records from a database. By specifying a simple mapping from the HTML page on one end, through the middle layers, and to the database on the other end, a large subset of application functionality can be achieved without the need to write any Java code. Instead, a combination of XML configuration files, XSL transformation templates, as well as open source tools, namely Hibernate and Castor, are used.

The goal of the framework is to incorporate more and more common programming tasks, in an open, configurable, and generic manner. Furthermore, since much of the framework is based on XML and XSL, automatic generation of complete applications is achievable.


Edwin Dankert has posted the second beta of XML Hammer 1.0, a GUI program written in Java and based on JAXP 1.3 for checking well-formedness, validating, transforming, and querying XML documents. XML Hammer is published under the Mozilla Public License 1.1.

Wednesday, October 25, 2006 (Permalink)

The Mozilla Project has released Firefox 2.0, an open source web browser for Windows, Mac OS X, and Linux that supports XML, XHTML, XSLT, and SVG. New features in 2.0 include:

  • Anti-Phishing Protection.
  • Search suggestions now appear with search history in the search box for Google, Yahoo! and Answers.com
  • Scrollable tabs
  • Ability to re-open accidentally closed tabs
  • Better support for previewing and subscribing to web feeds
  • Inline spell checking in forms (ironically the first word it flagged for me as misspelled was "Firefox")
  • Search plugin manager
  • Microsummaries feature for bookmarks
  • Automatic restoration of your browsing session if there is a crash
  • New combined and improved Add-Ons manager for extensions and themes
  • New Windows installer
  • JavaScript 1.7
  • Client-side session and persistent storage (a really hideous idea, sure to be misused)
  • svg:textPath

I've been using Firefox 2 as my primary browser since beta 1, and it's been pretty smooth over all. The inline spell checking is indispensable, and worth the upgrade alone. The restore session option is also quite nice. As of the last release candidate all my important extensions now seem to work in this version.


Werner Guttmann has released Castor 1.0.4, an open source (BSD license) data binding tool for XML and Java. Castor can marshal and unmarshal XML documents into Java objects, and store those objects in SQL databases. Automatic generation of Java classes from W3C XML Schema Language schemas is supported, though that doesn't seem to be required. This is a bug fix release.

Tuesday, October 24, 2006 (Permalink)

Google has launched Google Coop, a customized search engine service for sites like this one. I've been using Google searches here for a while. The search box on the right is just a simple form that links to regular Google. However, now they're letting me customize it more, give preference to links to this site without excluding other sites, change the look and feel of the search results (or even host my own), and get kickbacks^H^H^H^H^H^H^H^H^H referral fees for the ads on the search results page. I'm still playing with this. I haven't yet figured out how to set the look and feel to fit into the sidebar, but in the meantime here's a basic search box. Try it out and see what you think:

OK. I've figured out how to hack the search box code so that it fits nicely in the sidebar. The next question is just whether that violates the Google Coop terms of service or not. They tend to be picky about things like that. Looks like that might be OK. I don't see anything in the terms of service that suggests they mind this.

I may also have to configure this so it only searches the sites I specify. I don't mind it giving out additional sites that may be helpful, but it doesn't always place hits from my sites on the first page; and if someone uses the search box on my page, that's probably what they want. For instance, I searched for "Downs", an unusual string that occurs only about twice across all my sites (both in the last week or two), and it didn't find either of those hits. Instead, it found irrelevant sites about Down syndrome and companies named Downs.

Monday, October 23, 2006 (Permalink)

Dave Beckett has released the Raptor RDF Parser Toolkit 1.4.13, an open source C library for parsing the RDF/XML, N-Triples, Turtle, and Atom Resource Description Framework formats. It uses expat or libxml2 as the underlying XML parser. This is a bug fix release. Raptor is dual licensed under the LGPL and Apache 2.0 licenses.

Sunday, October 22, 2006 (Permalink)

The W3C CSS Working Group has published a new public working draft of the CSS Print Profile. "This specification defines a subset of Cascading Style Sheets Level 2, revision 1 [CSS21] and CSS3 Module: Paged Media [PAGEMEDIA] for printing to low-cost devices. It is designed for printing in situations where it is not feasible or desirable to install a printer-specific driver, and for situations where some variability in the output is acceptable. This profile is designed to work in conjunction with XHTML-Print [XHTMLPRINT] and defines a minimum level of conformance as well as an extension set that provides stronger layout control for the printing of mixed text and images, tables and image collections."

Friday, October 20, 2006 (Permalink)

IDEAlliance is offering one XML 2006 scholarship for a paper submitted by a student enrolled in any degree or diploma program at a post-secondary institution. The winner will receive a one-time award of $1,000 and an invitation to present her or his paper at the XML 2006 conference in Boston, MA (December 5-7). The winner will also receive free conference registration, a travel stipend of $500, and two nights' hotel accommodation in Boston during the conference. Submissions are due by Monday, October 30.


Michael Smith has released version 1.71.1 of the DocBook XSL stylesheets. According to Smith,

This is a minor update to the 1.71.0 release. Along with a number of bug fixes, it includes two feature changes:

  • Added support for profiling based on xml:lang and status attributes.
  • Added initial support in manpages output for footnote, annotation, and alt instances. Basically, they all now get handled the same way ulink instances are. They are treated as a class as "note sources": A numbered marker is generated at the place in the main text flow where they occur, then their contents are displayed in an endnotes section at the end of the man page.

Pavel Sher has posted Juxy 0.8, "a simple unit testing library for XSLT written in Java. Juxy allows to call or apply individual XSLT templates from Java and does not use any specific features of XSLT processor for that purposes. It relies entirely on TRaX API and should work with any TRaX compliant XSLT processor." Version 0.8 adds W3C schema validation and an XPathAssert class. Juxy is published under the Apache 2.0 license. Java 1.4 or later is required.

Thursday, October 19, 2006 (Permalink)

Microsoft has released Internet Explorer 7 (Windows XP only). According to general manager Dean Hachamovitch:

The Phishing Filter and the architectural work in IE7 around networking and ActiveX opt-in will help keep users more secure. IE7 also delivers a much easier browsing experience with features like tabbed browsing (especially with QuickTabs), shrink-to-fit printing, an easily customizable search box, and a new design that leaves more screen real estate for the web site you’re viewing. IE7’s CSS improvements are incredibly important for developers as many of you have made quite clear. I also think IE7’s RSS experience and platform are important, powerful, and innovative.

Five years ago this release might have been state-of-the-art. Today, it's a browser that still doesn't fully support CSS, still doesn't recognize the right MIME types for XHTML and XSLT, still doesn't pass the Acid 2 test, still doesn't support SVG, MathML, or XForms, and can only be considered an improvement by comparison to previous versions of itself. There's nothing here that will impress Firefox or Safari users.

Wednesday, October 18, 2006 (Permalink)

The Mozilla Project has posted the third release candidate of Firefox 2.0. New features in 2.0 include:

  • Anti-Phishing Protection.
  • Search suggestions now appear with search history in the search box for Google, Yahoo! and Answers.com
  • Scrollable tabs
  • Ability to re-open accidentally closed tabs
  • Better support for previewing and subscribing to web feeds
  • Inline spell checking in forms (ironically the first word it flagged for me as misspelled was "Firefox")
  • Search plugin manager
  • Microsummaries feature for bookmarks
  • Automatic restoration of your browsing session if there is a crash
  • New combined and improved Add-Ons manager for extensions and themes
  • New Windows installer
  • JavaScript 1.7
  • Client-side session and persistent storage (a really hideous idea, sure to be misused)
  • svg:textPath

It's not immediately clear what's changed since RC 2, but presumably bugs were fixed. I've been using Firefox 2 as my primary browser since beta 1, and it's been pretty smooth overall. The inline spell checking is indispensable, and worth the upgrade alone. The restore session option is also quite nice. The only problems I've encountered are with extensions that don't yet support Firefox 2.

Tuesday, October 17, 2006 (Permalink)

Tonight, Tuesday October 17, I'll be joining the monthly meeting of the XML Developers Network of the Capital District in Albany, New York to talk about RSS, Atom, APP, and All That. The meeting runs from 6:00 to 8:30 P.M. Everyone's invited. The meeting is free and open to the public.


Next Tuesday, October 24, I'll be attending the New York PHP Users Group meeting to hear my friend Ken Downs talk about Andromeda, a system for generating database applications in PHP. I'm trying to convince him to change the description format from plain text and regular expressions (bleah) to clean, parsed XML. RSVP required.

Monday, October 16, 2006 (Permalink)

XimpleWare has released VTD-XML 1.7, a free (GPL) non-extractive Java library for processing XML that supports XPath. This appears to be an example of what Sam Wilmot calls "in situ parsing". In other words, rather than creating objects representing the content of an XML document, VTD-XML just passes pointers into the actual, real XML. (These are the abstract pointers of your data structures textbook, not C-style addresses in memory. In this case the pointers are int indexes into the file.) You don't even need to hold the document in memory. It can remain on disk. This should improve speed and memory usage. Current tree models typically require at least three times the size of the actual document, and often more. Using a model based on indexes into one big array might allow these to reduce their requirements to twice the size of the original document or even less. VTD-XML claims 1.3 times, but I haven't verified that.

However, VTD-XML currently only supports the built-in entity references (&quot; &amp; &apos; &gt; &lt;). There are some other limits. Element names are limited to 2048 characters. Documents can't be much bigger than a billion characters, so SAX (or XOM) is still needed for really huge documents. There's also a maximum depth to the document, though exactly what it is isn't specified. All this means VTD-XML is not a conformant XML parser. Given this, comparisons to other parsers are unfair and misleading. I've seen many products that outperform real XML parsers by subsetting XML and cutting out the hard parts. It's often the last 10% that kills the performance. :-( The other question I have for anything claiming these speed gains is whether it correctly implements well-formedness testing, including the internal DTD subset. Will VTD-XML correctly report all malformed documents as malformed? Has it been tested against the W3C XML conformance test suite? I'm not sure.
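The core idea of non-extractive parsing can be sketched in a few lines of Python. Rather than copying element names into new objects, the code records where each one sits in the original bytes, as offset/length pairs in one flat int array, and materializes a name only on request. This is a toy of my own, not VTD-XML's record format. It indexes only start-tag names and knows nothing about comments, CDATA sections, or DTDs, so it is in no way a conformant parser, which rather illustrates the caveats above.

```python
import re
from array import array

# Matches '<' followed by a name; '</', '<?', and '<!' never match.
START_TAG = re.compile(rb"<([A-Za-z_][\w.:-]*)")

def index_element_names(data):
    """Return a flat int array: offset0, length0, offset1, length1, ..."""
    records = array("q")
    for m in START_TAG.finditer(data):
        records.extend((m.start(1), m.end(1) - m.start(1)))
    return records

def name_at(data, records, i):
    """Materialize the i-th element name only when someone asks for it."""
    offset, length = records[2 * i], records[2 * i + 1]
    return data[offset:offset + length].decode("utf-8")

doc = b'<?xml version="1.0"?><a><b x="1"/><c>text</c></a>'
recs = index_element_names(doc)
print([name_at(doc, recs, i) for i in range(len(recs) // 2)])
```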

Sunday, October 15, 2006 (Permalink)

The OpenOffice Project has released OpenOffice 2.0.4, an open source office suite for Linux and Windows that saves all its files as zipped XML. It also runs on the Mac with X-Windows. This release adds many new locales, fondu (a set of programs for interoperating between Mac and Unix font formats), and many bug fixes. OpenOffice is dual licensed under the LGPL and Sun Industry Standards Source License.

Friday, October 13, 2006 (Permalink)

The GNU Project has released IceWeasel, a Linux browser based on Firefox that is more free than Firefox. There's also Gnuzilla, a GNU version of the whole Mozilla suite. The differences freedom-wise between Gnuzilla/IceWeasel on the one hand and Mozilla/Firefox on the other are pretty trivial, and have been widely discussed elsewhere. They remind me of the Star Trek Enterprise episode where a planet gets destroyed over the question of whether God created the universe in nine days or ten. What hasn't been mentioned is that IceWeasel is making some security improvements I've been hoping Firefox would make for years. Specifically,

  1. Some sites refer to zero-size images on other hosts to keep track of cookies. When IceWeasel detects this mechanism it blocks cookies from the site hosting the zero-length image file. (It is possible to re-enable such a site by removing it from the blocked hosts list.)
  2. Other sites rewrite the host name in links redirecting the user to another site, mainly to "spy" on clicks. When this behavior is detected, IceWeasel shows a message alerting the user.

Freedom issues aside, that's reason enough alone to switch to IceWeasel.

One final note: while I do appreciate the humor in the jab at Mozilla implied by the name IceWeasel, I suspect it will just turn off regular users. Why not call this something a little less negative, like IceMink or IceFerret? The proposed logos are great though, and could do for Firefox what Tux did for Linux.


Thursday, October 12, 2006 (Permalink)

Next week, my university, Polytechnic in Brooklyn, is hosting Hyperpolis 3.0: Really Useful Media, a free conference dedicated to digital media:

We don't know enough about digital media as something other than a means to an end, as “instrumental culture”, where culture itself —mainstream, alternative, underground, or otherwise— is degraded to the status of tools (some hard, some soft, all ware).

We know too much about media discourses as, on the one hand, “popular culture”: alienated and commodified cultural forms; and on the other, “cultural theory”: paranoid cosmologies of hyper-rhetoric, and the ubiquitous inevitability of evil...

Hyperpolis: Really Useful Media will provide a forum for the discussion and presentation of some positive contributions to the field, in light of these chronic imbalances.

The conference takes place Thursday, October 19th and Friday, October 20th from 11am to 6pm in the Dibner Auditorium at Polytechnic University, 6 MetroTech in Brooklyn. The Borough Hall, Jay Street, and Hoyt Street subway stations are all within a couple of blocks. To register just add your name to the attendees list on the Wiki.

Wednesday, October 11, 2006 (Permalink)

The W3C XSL Working Group has published the proposed recommendation of Extensible Stylesheet Language (XSL) Version 1.1. Despite the name, this actually only covers XSL Formatting Objects, not XSL Transformations. New features in 1.1 include:

  • Multiple flows
  • Change marks
  • Back of the book indexing
  • Bookmarks
  • Markers in tables
  • fo:page-number-citation-last.
  • fo:page-sequence-wrapper
  • clear and float inside and outside
  • prefixes and suffixes for page numbers

Changes in this draft are fairly minor. The most significant is the ability to specify transparent borders.

Tuesday, October 10, 2006 (Permalink)

IBM's developerWorks has published my latest article, SimpleXML processing with PHP. This article introduces the SimpleXML library that's built into PHP 5 and later. It shows you how to use it, and explains what it can and cannot do.

Sunday, October 8, 2006 (Permalink)

I'm uploading beta 11 of Jaxen 1.1 as I type this. Jaxen is an open source (modified BSD license) XPath 1.0 engine for Java that is adaptable to many different object models including XOM, JDOM, DOM, and dom4j. Jaxen was originally written by James Strachan and Bob McWhirter. The primary impetus for beta 11 was fixing the build process so it once again generates source bundles. A couple of small, almost cosmetic, bugs were also fixed. If you haven't noticed any problems with beta 10, you can safely skip this iteration.

However, do not be fooled by the "beta" designation. This release has many fewer bugs and is much more conformant to the XPath specification than the official 1.0 release. I hope to release 1.1 final toward the end of the year after closing a few more bugs. However, there's no need to wait for that. If you're using Jaxen 1.0, you should upgrade to this beta.

Saturday, October 7, 2006 (Permalink)

The Mozilla Project has posted the second release candidate of Firefox 2.0. New features in 2.0 include:

  • Anti-Phishing Protection.
  • Search suggestions now appear with search history in the search box for Google, Yahoo! and Answers.com
  • Scrollable tabs
  • Ability to re-open accidentally closed tabs
  • Better support for previewing and subscribing to web feeds
  • Inline spell checking in forms (ironically the first word it flagged for me as misspelled was "Firefox")
  • Search plugin manager
  • Microsummaries feature for bookmarks
  • Automatic restoration of your browsing session if there is a crash
  • New combined and improved Add-Ons manager for extensions and themes
  • New Windows installer
  • JavaScript 1.7
  • Client-side session and persistent storage (a really hideous idea, sure to be misused)
  • svg:textPath

It's not immediately clear what's changed since RC 1, but presumably bugs were fixed. I've been using Firefox 2 as my primary browser since beta 1, and it's been pretty smooth overall. The inline spell checking is indispensable, and worth the upgrade alone. The restore session option is also quite nice. The only problems I've encountered are with extensions that don't yet support Firefox 2.

Friday, October 6, 2006 (Permalink)

The W3C GRDDL Working Group has posted the first public working draft of a GRDDL Primer. According to the draft,

GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages. It is a technique for obtaining RDF data from XML documents and in particular XHTML pages. Authors may explicitly associate documents with transformation algorithms, typically represented in XSLT, using a link element in the head of the document. Alternatively the information needed to obtain the transformation may be held in an associated metadata profile document or namespace document. Clients reading the document can follow their nose using techniques described in the GRDDL specification to discover the appropriate transformations. This document uses a number of examples from the GRDDL Use Cases document to illustrate in detail the techniques GRDDL provides for associating documents with appropriate instructions for extracting any embedded data.
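Here is a minimal Python sketch of the first technique the primer mentions: finding transformations that are declared with link elements in an XHTML head. It assumes the GRDDL head profile URI http://www.w3.org/2003/g/data-view and rel="transformation", as used in the GRDDL drafts. It does not resolve relative URLs, consult profile or namespace documents, or run the XSLT. The example.com stylesheet URL is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
# Assumption: the head profile URI used by the GRDDL drafts.
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"

def transformation_links(xhtml_text):
    """Return hrefs of rel="transformation" links, if the head declares the GRDDL profile."""
    root = ET.fromstring(xhtml_text)
    head = root.find(XHTML + "head")
    if head is None or GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    return [link.get("href") for link in head.iter(XHTML + "link")
            if "transformation" in link.get("rel", "").split()]

PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head profile="http://www.w3.org/2003/g/data-view"><title>t</title>'
    '<link rel="transformation" href="http://example.com/hcard2rdf.xsl"/>'
    '<link rel="stylesheet" href="style.css"/></head><body/></html>'
)
print(transformation_links(PAGE))
```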

The W3C GRDDL Working Group has also posted the first public working draft of GRDDL Use Cases: Scenarios of extracting RDF data from XML.


The W3C RDF Data Access Working Group has pushed the SPARQL Query Language for RDF back to working draft status. They are looking for input on two questions in particular:

Thursday, October 5, 2006 (Permalink)

Stefano Mazzocchi has released Gadget, an open source (BSD license) XML inspector based on XPath that analyzes XML documents too large to fit into RAM. According to Mazzocchi, "I was given the task of transforming a few hundred Mb of XML into RDF and I found out (the hard way!) that with that amount of data things start to break down: you need radically different approaches since you can't simply open your 100Mb XML document in your browser to take a look at it. Before writing Gadget I used a collection of 12-stages-long grep+sed+sort+uniq pipelines to understand what I had in that big XML pile, but that started to become a little cumbersome so I wrote this." Gadget is written in Java.
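This is not Gadget's code. It is a hedged JavaScript sketch of the general technique that makes inspecting documents too large for RAM feasible: stream through the document once, keeping only the current element stack and a tally of element paths instead of building a tree. The event objects are a hypothetical stand-in for whatever a real SAX-style streaming parser emits.

```javascript
// Tally element paths such as "/feed/entry/title" in one streaming pass.
// Memory use is bounded by nesting depth plus the number of distinct paths,
// not by document size. `events` is any iterable of hypothetical
// { type: "start", name } and { type: "end" } objects.
function countPaths(events) {
  const stack = [];
  const counts = new Map();
  for (const event of events) {
    if (event.type === "start") {
      stack.push(event.name);
      const path = "/" + stack.join("/");
      counts.set(path, (counts.get(path) || 0) + 1);
    } else if (event.type === "end") {
      stack.pop();
    }
  }
  return counts;
}

// A generator stands in for a parser reading a file too big to load at once.
function* sampleEvents() {
  yield { type: "start", name: "feed" };
  for (let i = 0; i < 3; i++) {
    yield { type: "start", name: "entry" };
    yield { type: "start", name: "title" };
    yield { type: "end" };
    yield { type: "end" };
  }
  yield { type: "end" };
}

// Output in the spirit of `sort | uniq -c`: 1 /feed, 3 /feed/entry, 3 /feed/entry/title
for (const [path, n] of countPaths(sampleEvents())) {
  console.log(n, path);
}
```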


The W3C HTML Working Group has released the final recommendation of XHTML-Print. According to the abstract, "XHTML-Print is member of the family of XHTML languages defined by the Modularization of XHTML [XHTMLMOD]. It is designed to be appropriate for printing from mobile devices to low-cost printers that might not have a full-page buffer and that generally print from top-to-bottom and left-to-right with the paper in a portrait orientation. XHTML-Print is also targeted at printing in environments where it is not feasible or desirable to install a printer-specific driver and where some variability in the formatting of the output is acceptable." In essence, this subsets XHTML with the features appropriate for printing. For instance, frames are not supported because "Frames depend on a screen interface and therefore are not applicable to printers."

Wednesday, October 4, 2006 (Permalink)

The W3C has published a proposed edited recommendation of XML Inclusions (XInclude) Version 1.0 (Second Edition). The most significant change is that, "An XInclude processor may, at user option, suppress xml:base and/or xml:lang fixup." Otherwise most of the changes had already been addressed in errata.
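Roughly, base URI fixup adds an xml:base attribute to a top-level included element whose base URI differs from that of the include's parent, and language fixup does the same with xml:lang. The JavaScript below is a hedged model of both fixups, and of the new option to suppress them, over a hypothetical plain-object element (`{ name, attributes, children }`). It always writes an absolute xml:base and ignores any xml:base the element already carries; both are simplifications.

```javascript
// Hypothetical shapes:
//   element: { name, attributes: {...}, children: [...] }, top level of the included content
//   source:  { baseUri, lang } of the included element
//   parent:  { baseUri, lang } of the xi:include element's parent
// options lets a caller suppress either fixup, as the second edition permits.
function fixupIncluded(element, source, parent, options = {}) {
  const fixupBase = options.fixupBase !== false;
  const fixupLang = options.fixupLang !== false;
  const attributes = Object.assign({}, element.attributes);
  if (fixupBase && source.baseUri !== parent.baseUri) {
    attributes["xml:base"] = source.baseUri; // simplification: always absolute
  }
  if (fixupLang && source.lang !== parent.lang) {
    // An included element with no known language gets xml:lang="".
    attributes["xml:lang"] = source.lang === undefined ? "" : source.lang;
  }
  return Object.assign({}, element, { attributes });
}

const chapter = { name: "chapter", attributes: {}, children: [] };
const from = { baseUri: "http://example.org/chapters/one.xml", lang: "fr" };
const into = { baseUri: "http://example.org/book.xml", lang: "en" };
console.log(fixupIncluded(chapter, from, into).attributes);
console.log(fixupIncluded(chapter, from, into, { fixupBase: false, fixupLang: false }).attributes);
```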

Tuesday, October 3, 2006 (Permalink)

The W3C XML Processing Model Working Group has posted the first public working draft of XProc: An XML Pipeline Language. According to the introduction,

An XML Pipeline specifies a sequence of operations to be performed on a collection of input documents. Pipelines take zero or more XML documents as their input and produce zero or more XML documents as their output. Steps in the pipeline may read or write non-XML resources as well.

A pipeline consists of components. Like pipelines, components take zero or more XML documents as their input and produce zero or more XML documents as their output. The inputs to a component come from the web, from the pipeline document, from the inputs to the pipeline itself, or from the outputs of other components in the pipeline. The outputs from a component are consumed by other components, are outputs of the pipeline as a whole, or are discarded.

There are two kinds of components: steps and (language) constructs. Steps carry out single operations and have no substructure as far as the pipeline is concerned, whereas constructs can include components within themselves.

This specification defines a standard library, Appendix D, Standard Component Library, of steps. Pipeline implementations may support additional steps as well.

The goals look laudable. I'm not sure I like the syntax that's proposed.
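Setting the syntax question aside, the underlying model is easy to sketch. In the hedged JavaScript below, a step is just a function from an array of documents to an array of documents (zero or more in, zero or more out), and a pipeline chains steps so each step's outputs become the next step's inputs. Because a pipeline is itself such a function, it can be nested inside another pipeline, loosely mirroring how constructs contain components. The step names are invented for illustration, and real XProc adds named ports, options, and non-linear connections that this ignores.

```javascript
// Model: a step maps an array of documents to an array of documents.
// Documents are plain strings here, purely for illustration.

// Compose steps left to right; the result is itself a step.
function pipeline(...steps) {
  return inputs => steps.reduce((docs, step) => step(docs), inputs);
}

// Hypothetical steps, not part of any XProc library.
const dropEmpty = docs => docs.filter(doc => doc.trim().length > 0);
const addDeclaration = docs => docs.map(doc => '<?xml version="1.0"?>' + doc);
const discard = () => []; // producing zero outputs is allowed

const prepare = pipeline(dropEmpty, addDeclaration);
const outer = pipeline(prepare, docs => docs.slice(0, 1)); // a pipeline nested as a step

console.log(prepare(["<a/>", "  ", "<b/>"]));
console.log(outer(["<a/>", "<b/>"]));
console.log(pipeline(prepare, discard)(["<a/>"]));
```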

Monday, October 2, 2006 (Permalink)

Bob Stayton has written an experimental version of the DocBook XSL stylesheets "in which the DocBook V5 namespace is used in element matches. These stylesheets handle the DocBook 5 namespace natively, rather than using the nodeset() function to strip it out so the elements can be processed with the existing namespace-free templates. Avoiding the nodeset step avoids losing the xml:base of the document, among other things. Also, with these stylesheets you can write a customization layer that uses the namespace in element matches."

Friday, September 29, 2006 (Permalink)

Norm Walsh has published the eighth beta of DocBook 5.0. DocBook 5 is "a significant redesign that attempts to remain true to the spirit of DocBook." The schema is written in RELAX NG. A DTD and W3C XML Schema generated from the RELAX NG schema are also available. There's also a Schematron schema "that validates some extra-grammatical DocBook constraints. These patterns are also present directly in the RELAX NG Grammar and some validators, for example MSV, can perform both kinds of validation at the same time." Changes in this beta are quite minor, mostly bug fixes and some tightening up of the data types.

Thursday, September 28, 2006 (Permalink)

The W3C Web API working group has posted the second public working draft of the Selectors API. "It is often desirable to perform script and or DOM operations on a specific set of elements in a document. Selectors [Selectors], mostly used in CSS [CSS21] context, provides a way of matching such a set of elements. This specification introduces two methods which take a group of selectors (often simply referred to as selector) as argument and return the matched elements as result." The spec offers the following JavaScript example:

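// Maps the namespace prefixes used in the selector to namespace URIs.
function resolver(str) {
  var prefixes = {
    h: "http://www.w3.org/1999/xhtml",
    g: "http://www.w3.org/2000/svg"
  };
  return prefixes[str];
}
// Every SVG svg element that is a child of an XHTML div element.
var a = document.matchAll("h|div > g|svg", resolver);
// The first div element that has both the foo and bar classes.
var b = document.matchSingle("div.foo.bar");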
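For readers who want to see what the two methods hand back, here is a hedged JavaScript sketch of their semantics over a hypothetical plain-object tree (`{ name, attributes, children }`) rather than a real DOM. It handles only compound selectors made of an optional type name plus class names, such as `div.foo.bar`; combinators like `>` and namespace prefixes are deliberately left out. Here matchAll returns every match in document order, and matchSingle returns the first match or null.

```javascript
// Elements are hypothetical plain objects: { name, attributes, children }.
// Only compound selectors of an optional type plus classes are supported.
function parseCompound(selector) {
  const [type, ...classes] = selector.split(".");
  return { type: type || "*", classes };
}

function matches(el, sel) {
  const classList = (el.attributes.class || "").split(/\s+/);
  return (sel.type === "*" || sel.type === el.name) &&
    sel.classes.every(c => classList.includes(c));
}

// Depth-first, pre-order traversal visits elements in document order.
function* walk(el) {
  yield el;
  for (const child of el.children) yield* walk(child);
}

function matchAll(root, selector) {
  const sel = parseCompound(selector);
  return [...walk(root)].filter(el => matches(el, sel));
}

function matchSingle(root, selector) {
  const sel = parseCompound(selector);
  for (const el of walk(root)) {
    if (matches(el, sel)) return el; // stop at the first match
  }
  return null;
}

const page = {
  name: "body", attributes: {}, children: [
    { name: "div", attributes: { class: "foo" }, children: [] },
    { name: "div", attributes: { class: "bar foo" }, children: [
      { name: "p", attributes: { class: "foo bar" }, children: [] }
    ] }
  ]
};
console.log(matchSingle(page, "div.foo.bar") === page.children[1]); // true
console.log(matchAll(page, ".foo.bar").length); // 2
```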

Wednesday, September 27, 2006 (Permalink)

The Mozilla Project has posted the first release candidate of Firefox 2.0. New features in 2.0 include:

  • Anti-Phishing Protection.
  • Search suggestions now appear with search history in the search box for Google, Yahoo! and Answers.com
  • Scrollable tabs
  • Ability to re-open accidentally closed tabs
  • Better support for previewing and subscribing to web feeds
  • Inline spell checking in forms (ironically the first word it flagged for me as misspelled was "Firefox")
  • Search plugin manager
  • Microsummaries feature for bookmarks
  • Automatic restoration of your browsing session if there is a crash
  • New combined and improved Add-Ons manager for extensions and themes
  • New Windows installer
  • JavaScript 1.7
  • Client-side session and persistent storage (a really hideous idea, sure to be misused)
  • svg:textPath

It's not immediately clear what's changed since beta 2, but presumably bugs were fixed. I've been using Firefox 2 as my primary browser since beta 1, and it's been pretty smooth overall. The inline spell checking is indispensable, and alone worth the upgrade. The only problem I've encountered is that Chris Pederick's Web Developer plugin doesn't yet support Firefox 2, so occasionally I have to drop back to Firefox 1.5 to use it. Hopefully that will be updated soon.

Tuesday, September 26, 2006 (Permalink)

Javier Freire has released AJAXForms 1.0.2, a free tool that compiles XForms into HTML with JavaScript. AJAXForms is published under the LGPL.

Monday, September 25, 2006 (Permalink)

The W3C XML Core Working Group has begun addressing some of the weirdnesses of Canonical XML, such as the movement of xml:id attributes from one element to another and the breaking of base URLs when canonicalizing. Known Issues with Canonical XML 1.0 (C14N/1.0) describes the problems. The first public working draft of Canonical XML 1.1 attempts to define a new kind of canonicalization that does not have these problems. Using XML Digital Signatures in the 2006 XML Environment describes some workarounds for managing all this in the context of digital signatures.
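The xml:id problem comes from how Canonical XML 1.0 treats a document subset: when an element's ancestors are left out of the subset, their xml: attributes (xml:lang, xml:space, xml:base, and also xml:id) are copied onto that element, with the nearest ancestor winning and the element's own attributes taking precedence. The hedged JavaScript sketch below models that rule over hypothetical plain objects, along with, roughly, the approach taken for 1.1 of no longer inheriting xml:id. Canonical XML 1.1 also replaces plain copying of xml:base with a fixup; that is not modeled here, so the 1.1 branch simply omits xml:base.

```javascript
// Which xml:* attributes end up on an element whose ancestors were left
// out of a document subset. Inputs are hypothetical plain objects:
//   omittedAncestors: attribute maps of the omitted ancestors, outermost first
//   own: the element's own attributes
function canonicalAttributes(omittedAncestors, own, version) {
  const inherited = {};
  for (const attrs of omittedAncestors) {
    for (const [name, value] of Object.entries(attrs)) {
      if (!name.startsWith("xml:")) continue;
      // 1.1: only xml:lang and xml:space are simply inherited; xml:id is not
      // inherited, and xml:base fixup is not modeled in this sketch.
      if (version === "1.1" && name !== "xml:lang" && name !== "xml:space") continue;
      inherited[name] = value; // later (nearer) ancestors overwrite earlier ones
    }
  }
  return { ...inherited, ...own }; // the element's own attributes take precedence
}

const omitted = [
  { "xml:base": "http://example.org/docs/", "xml:lang": "en" },
  { "xml:id": "intro", "xml:lang": "fr" }
];
// Under 1.0, xml:id="intro" moves onto an element it never belonged to.
console.log(canonicalAttributes(omitted, { class: "note" }, "1.0"));
console.log(canonicalAttributes(omitted, { class: "note" }, "1.1"));
```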

Saturday, September 23, 2006 (Permalink)

The W3C Voice Browser Working Group has posted the last call working draft of VoiceXML 2.1. VoiceXML is used to describe those annoying call trees you hear when calling most major companies. (Press 1 if you want to wait on hold for 20 minutes and then be hung up on; press 2 if you want to wait indefinitely; press 3 if you'd rather we just hung up on you now.) New features in 2.1 include data and foreach elements, dynamic grammars and scripts, detecting barge-in during prompt playback, fetching XML without requiring a dialog transition, recording user utterances while attempting recognition, and specifying the media format of utterance recordings.

Friday, September 22, 2006 (Permalink)

The W3C Web Application Formats Working Group has published the last call working draft of XML Binding Language (XBL) 2.0.

This specification describes the ability to map elements to script, event handlers, CSS, and more complex content models. This can be used to re-order and wrap content so that, for instance, simple HTML or XHTML markup can have complex CSS styles applied without requiring that the markup be polluted with multiple semantically neutral div elements.

It can also be used to implement new DOM interfaces, and, in conjunction with other specifications, enables arbitrary tag sets to be implemented as widgets. For example, XBL could in theory be used to implement XForms.

This version is a non-backwards-compatible "revision of Mozilla's XBL 1.0 language, originally developed at Netscape in 2000, and originally implemented in the Gecko rendering engine." The revision was developed by Mozilla, Opera, Google, and Apple. (Hmm, who's missing from that list?) It's supposedly less Mozilla-focused and more browser-independent. This is not the same as the W3C's sXBL effort, and it's not immediately clear whether work on that will continue in parallel, or whether this will replace it on the W3C standards track. Either way this looks very interesting, and I hope the W3C can navigate the rocky shores of browser compatibility to get something usefully implemented.
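The content-wrapping idea in the first quoted paragraph is the easiest part to picture. XBL 2.0 expresses it with a template whose content element marks where the bound element's own children are inserted, so the extra div wrappers live in the binding instead of in the markup. The JavaScript below is a hedged model of just that insertion step, with templates and nodes as hypothetical plain objects; it ignores scripts, event handlers, styles, and everything else XBL does.

```javascript
// Template nodes are hypothetical plain objects:
//   { name, children: [...] }  an element to generate
//   { content: true }           insertion point for the bound element's children
// Returns the array of nodes the template renders to.
function applyTemplate(node, boundChildren) {
  if (node.content) {
    return boundChildren; // the bound element's own children appear here
  }
  return [{
    name: node.name,
    children: node.children.flatMap(child => applyTemplate(child, boundChildren))
  }];
}

// Two wrapper divs for styling, neither of them in the document's markup.
const template = {
  name: "div",
  children: [{ name: "div", children: [{ content: true }] }]
};
const rendered = applyTemplate(template, ["Some paragraph text"]);
// [{"name":"div","children":[{"name":"div","children":["Some paragraph text"]}]}]
console.log(JSON.stringify(rendered));
```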


RealObjects has released PDFreactor 1.1.936.7, a $2494 payware "formatting processor for converting XML and XHTML/HTML documents into PDF. It uses Cascading Style Sheets (CSS) to define page layout and styles." That distinguishes it from most similar solutions, which are based on XSL. SVG is also supported, and XSLT fits in somehow, though I don't quite understand how.

Thursday, September 21, 2006 (Permalink)

The W3C CSS Working Group has posted a new working draft of CSS3 module: Generated Content for Paged Media. "This module describes features often used in printed publications. In particular, this specification describes how CSS style sheets can express named strings, leaders, cross-references, footnotes, endnotes, running headers and footers, named flows, ad hoc counter styles, paged-based floats, hyphenation, change bars, named page lists, and generated lists. Along with two other CSS3 modules — multicolumn layout and paged media — this module offers a way of presenting structured documents on paged media."


The W3C CSS Working Group has also published a new working draft of CSS Level 3, Values and Units. "This CSS3 module describes the various values and units that CSS properties accept. Also, it describes how values are computed from 'specified' (which is what the cascading process yields) through 'computed' and 'used' into 'actual' values. The main purpose of this module is to define common values and units in one specification which can be referred to by other modules."
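A small worked example may help with the specified, computed, used, and actual stages. The JavaScript below follows a single width value through them under simplifying assumptions: only em, px, and % are handled, the element's font size and the containing block's width are supplied as hypothetical inputs, and rounding to whole pixels stands in for whatever a particular device actually does. An em length becomes an absolute length at the computed stage, while a percentage width stays a percentage until layout resolves it at the used stage.

```javascript
// ctx (hypothetical): { fontSize: the element's font size in px,
//                       containingBlockWidth: in px }
function widthStages(specified, ctx) {
  const m = /^(-?\d*\.?\d+)(em|px|%)$/.exec(specified);
  if (!m) throw new Error("unsupported value: " + specified);
  const n = parseFloat(m[1]);
  const unit = m[2];

  // Computed: relative lengths become absolute; percentages are kept.
  const computed = unit === "em" ? n * ctx.fontSize + "px" : specified;

  // Used: layout resolves the percentage against the containing block.
  const usedPx = unit === "%"
    ? (n / 100) * ctx.containingBlockWidth
    : parseFloat(computed);

  // Actual: assume the device can only draw whole pixels.
  const actualPx = Math.round(usedPx);

  return { specified, computed, used: usedPx + "px", actual: actualPx + "px" };
}

// 2.5em at 13px: computed 32.5px, used 32.5px, actual 33px
console.log(widthStages("2.5em", { fontSize: 13, containingBlockWidth: 333 }));
// 50% of 333px: computed 50%, used 166.5px, actual 167px
console.log(widthStages("50%", { fontSize: 13, containingBlockWidth: 333 }));
```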

Wednesday, September 20, 2006 (Permalink)

The Help Markup Language team has released the specification for HelpML 0.2, an "XML-based file format and proposed standard for writing help documents" such as FAQs and other topic-based information. The spec looks pretty weak to me. I see no support for embedded HTML, no namespace, and no support for mixed content. I couldn't find a schema, DTD, stylesheet, or any other tools; but version 0.3 is all Wikified, so you should be able to fix this. :-)


ActiveState has posted the fifth alpha release of Komodo 4.0, a $295 payware IDE for Perl, Ruby, PHP, Python, Tcl, and XSLT. Komodo runs on Mac OS X 10.3 and later, Linux, and Windows.


buldocs has released xnsdoc 1.2, a €49 payware "documentation generator for XML namespaces defined by W3C XML Schema in HTML in a JavaDoc like visualization. xnsdoc supports all common schema design practices like chameleon, russian doll, salami slice, venetian blind schemas or circular schema references. xnsdoc can be used from the command line, as an Apache Ant Task, as an Apache Maven Plugin, as an eclipse plugin or integrated as a custom tool in many XML development tools such as StylusStudio, oXygen XML or XMLWriter." Version 1.2 fixes bugs and adds some additional configuration options.


Syntext has posted the first release candidate of Serna 3.0, a $268 payware XSL-based WYSIWYG XML Document Editor for Mac OS X, Windows, and Unix. Features include on-the-fly XSL-driven XML rendering and transformation, on-the-fly XML Schema validation, XInclude, and spell checking. A roughly $500 enterprise edition adds a Python API and WebDAV support. Version 3.0 now allows text input in Chinese, Japanese, and Korean.


The Big Faceless Organization has released the Big Faceless Report Generator 1.1.32, a $1200 payware Java application for converting XML documents to PDF. Unlike most similar tools it appears to be based on HTML and CSS rather than XSL Formatting Objects. This is mostly a bug fix release. Java 1.2 or later is required.


Andrea Marchesini has released libnxml 0.13, a C library for parsing, writing, and creating XML 1.0 and 1.1. Version 0.13 adds support for proxies and SSL certification. libnxml is published under the LGPL.

Tuesday, September 19, 2006 (Permalink)

Code Synthesis has released xsd 2.3.0, a free-as-in-speech (GPL) W3C XML Schema language based data binding tool for C++.

Given an XML instance description (XML Schema), it generates C++ classes that represent the given vocabulary as well as parsing and serialization code (collectively called a mapping or binding).

Compared to APIs such as DOM and SAX, the generated code allows you to access the information in XML instance documents using your domain vocabulary instead of generic elements, attributes, and text. Static typing helps catch errors at compile-time rather than at run-time. Automatic code generation frees you for more interesting tasks while minimizing the effort needed to adopt your applications to changes in the document structure.

xsd supports two C++ mappings: in-memory C++/Tree and event-driven C++/Parser. The C++/Tree mapping consists of C++ classes that represent data types defined in XML Schema, a set of parsing functions that convert XML instance documents to a tree-like in-memory data structure, and a set of serialization functions that convert the in-memory representation back to XML....

The C++/Parser mapping provides parser templates for data types defined in XML Schema. Using these parser templates you can build your own in-memory representations or perform immediate processing of XML instance documents.

Version 2.3.0 allows the generated code to contain user-defined types and improves performance.
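To make "domain vocabulary instead of generic elements" concrete, the JavaScript below is a hand-written imitation of what a generated binding provides, not xsd's actual output (which is C++ generated from a schema). It assumes a hypothetical `person` vocabulary with an `id` attribute and `name` and `age` children, and a hypothetical generic element tree as input. JavaScript can only check types while the document is parsed, so this shows the run-time half of the benefit; compile-time checking is what a statically typed C++ binding adds.

```javascript
// Hypothetical vocabulary, roughly:
//   <person id="xs:int"><name>xs:string</name><age>xs:int</age></person>
// Input is a hypothetical generic tree: { name, attributes, children, text }.
class Person {
  constructor(id, name, age) {
    this.id = id;     // number
    this.name = name; // string
    this.age = age;   // number
  }
}

// Convert lexical text to an xs:int value, rejecting anything else.
function toInt(text, where) {
  const s = String(text).trim();
  const n = Number(s);
  if (!/^[+-]?\d+$/.test(s) || n < -2147483648 || n > 2147483647) {
    throw new TypeError(`${where}: "${s}" is not a valid xs:int`);
  }
  return n;
}

function childText(el, name) {
  const child = el.children.find(c => c.name === name);
  if (!child) throw new Error(`<${el.name}> is missing <${name}>`);
  return child.text;
}

// The "binding": generic elements in, a domain object out.
function parsePerson(el) {
  if (el.name !== "person") throw new Error(`expected <person>, got <${el.name}>`);
  return new Person(
    toInt(el.attributes.id, "person/@id"),
    childText(el, "name"),
    toInt(childText(el, "age"), "person/age")
  );
}

const ada = parsePerson({
  name: "person",
  attributes: { id: "7" },
  children: [{ name: "name", text: "Ada" }, { name: "age", text: "36" }]
});
console.log(ada.name, ada.age + 1); // Ada 37
```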

Monday, September 18, 2006 (Permalink)

The W3C WebCGM Working Group has posted the candidate recommendation of WebCGM 2.0, an updated version of the ISO Computer Graphics Metafile standard (ISO/IEC 8632:1999). "WebCGM 2.0 adds a DOM (API) specification for programmatic access to WebCGM objects, and a specification of an XML Companion File (XCF) architecture, for externalization of non-graphical metadata. WebCGM 2.0, in addition, builds upon and extends the graphical and intelligent content of WebCGM 1.0, delivering functionality that was forecast for WebCGM 1.0, but was postponed in order to get the standard and its implementations to users expeditiously." Personally I thought this had been superseded by SVG, and was now a purely legacy format.

Saturday, September 16, 2006 (Permalink)

I've posted my slides from Thursday's Testing XML presentation at SD Best Practices 2006.


FourThought has released the Amara XML Toolkit 1.1.9, an open source "collection of Python tools for XML processing-- not just tools that happen to be written in Python, but tools built from the ground up to use Python idioms and take advantage of the many advantages of Python." Amara includes:

  • Bindery: data binding tool (a very Pythonic XML API)
  • Scimitar, an implementation of the Schematron language that converts Schematron documents to Python scripts
  • domtools: set of tools to augment Python DOMs
  • saxtools: set of tools to make SAX easier to use in Python
  • Flextyper: user-defined datatypes in Python for XML processing

Python 2.3 or later is required. This release adds support for EasyInstall, and makes various other small improvements and bug fixes.

Friday, September 15, 2006 (Permalink)

The XML Apache Project has released Xerces-J 2.8.1, a minor upgrade to the preeminent open source XML parser for Java. This release mostly fixes bugs.

Monday, September 11, 2006 (Permalink)

Michael Smith has released version 1.71.0 of the DocBook XSL stylesheets. According to Smith, "As with all DocBook Project dot-zero releases, this is an experimental release. It will be followed shortly by a stable release. This is mainly a bug fix release." This release does add support for source-code highlighting and also improves autoindexing.

Saturday, September 9, 2006 (Permalink)

IBM has updated the Compound XML Document Toolkit, a closed source Eclipse Web Tools Platform plugin for editing XML documents that use multiple namespaces.

The Compound XML Document Toolkit uses XML schemas to define the semantics of constructing documents spanning one or more namespaces. Those semantics include the order and placement of elements, the allowable child elements, and available attributes for each element.

Sample XML schema profiles for these XML-based standards are provided with the Compound XML Document Toolkit; documents having mark-up of the following types may therefore be created and edited immediately upon installation:

  • XHTML 1.1 + XForms 1.0
  • XHTML 1.1 + SVG 1.1
  • XHTML 1.1 + MathML 2.0
  • XHTML 1.1 + XForms 1.0 + SVG 1.1
  • SVG 1.1 + XHTML 1.1
  • SVG 1.1 + XHTML 1.1 + XForms 1.0
  • XHTML Mobile 1.1 + SVG Tiny 1.2
  • SVG Tiny 1.2 + XHTML Mobile 1.1
  • XHTML 1.1 + SVG 1.1 + MathML 2.0
  • XHTML 1.1 + VoiceXML 2.0
  • XHTML 1.1 + VoiceXML 2.0 + SVG 1.1
  • XHTML 1.1 + SMIL 2.0

The Compound XML Document Toolkit also provides tools for validating compound XML documents, in addition to one-step rendering of documents being edited.

This release adds support for Eclipse 3.2.

I'm skeptical of the schema-based, strong typing ideal that drives this project. Personally I'm much more interested in products that treat schemas as suggestions rather than straitjackets. For instance, I don't mind an editor using a schema to suggest auto-complete options; but I don't want it to freak out if I add an xi:include element that isn't accounted for by the schema or paste in some invalid (but well-formed) legacy HTML.
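The editor behavior described above can be sketched in a few lines of JavaScript. The content model and element tree are hypothetical plain objects, not any real schema language or editor API: the schema drives auto-complete suggestions, while anything it doesn't account for, such as an xi:include element, is reported as a warning instead of blocking the edit.

```javascript
// Hypothetical content model: parent element name -> child names the schema expects.
const contentModel = {
  chapter: ["title", "para", "section"],
  section: ["title", "para"]
};

// Auto-complete: offer what the schema expects inside the given parent.
function suggestChildren(parentName) {
  return contentModel[parentName] || [];
}

// Walk a well-formed tree ({ name, children }) and collect warnings,
// never errors, for children the schema doesn't account for.
function schemaWarnings(el, warnings = []) {
  const expected = contentModel[el.name];
  for (const child of el.children) {
    if (expected && !expected.includes(child.name)) {
      warnings.push(`<${child.name}> is not expected inside <${el.name}>`);
    }
    schemaWarnings(child, warnings);
  }
  return warnings;
}

const draft = {
  name: "chapter",
  children: [
    { name: "title", children: [] },
    { name: "xi:include", children: [] },
    { name: "para", children: [] }
  ]
};
console.log(suggestChildren("section"));
console.log(schemaWarnings(draft)); // one warning, about <xi:include>
```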

Friday, September 8, 2006 (Permalink)

The Omni Group has released OmniWeb 5.5, a $29.95 payware web browser for Mac OS X. Version 5.5 supports the core parts of XML on the Web including XSLT and CSS.


IBiblio is upgrading server hardware starting tonight at 6:00 P.M. EDT. This and other IBiblio-hosted sites will be up and down sporadically between then and 6:00 P.M. Saturday. I will likely not be able to check e-mail at my metalb address for most of that time either.


IBM's alphaWorks has updated the XML Forms Generator, a data-driven Eclipse plug-in that "generates forms that adhere to the XForms 1.0 standard, using as a starting point either Web Service Description Language (WSDL) documents or XML instance documents having optional XML Schema backing models. The generated forms adhere to the XHTML and XForms 1.0 standards and can be viewed in popular XHTML and XForms renderers." This release adds support for Eclipse 3.2 and a schema-flattening utility.

IBM has also released the Visual XForms Designer, an Eclipse plug-in for graphically editing XForms. This product sits on top of the Eclipse Modeling Framework (EMF), Graphical Editing Framework (GEF), and Eclipse Web Tools Platform (WTP). This release integrates the WTP XML Editor and supports Eclipse 3.2.

Both the XML Forms Generator and the Visual XForms Designer are part of the Emerging Technologies Toolkit (ETTK), which is a polite way of saying they're closed source and more than likely IBM will eventually abandon them without ever making them available for production use, either as closed or open source.

Thursday, September 7, 2006 (Permalink)

I've posted beta 10 of Jaxen 1.1, an open source (modified BSD license) XPath 1.0 engine for Java that is adaptable to many different object models including XOM, JDOM, DOM, and dom4j. Jaxen was originally written by James Strachan and Bob McWhirter. Beta 10 fixes an assortment of small issues.

Do not be fooled by the "beta" designation. This release has many fewer bugs and is much more conformant to the XPath specification than the official 1.0 release. I hope to release 1.1 final toward the end of the year after closing a few more bugs. However, there's no need to wait for that. If you're using Jaxen 1.0, you should upgrade to this beta.


Bare Bones Software has released version 8.5 of BBEdit, my preferred text editor on the Mac, and what I use to edit this very site. New features include support for Ruby, SQL, and YAML; code folding; HTML Format, Translate and Tidy; and autosave. BBEdit is $199 payware. Upgrades from 8.x are $30. They're $40 for owners of earlier versions. Mac OS X 10.3.9 or later is required.

Wednesday, September 6, 2006 (Permalink)

Some students from several universities have undertaken to write an entire textbook about XML in a Wiki. Is this the future of textbook publishing? Maybe, but only if it attracts enough users to correct its mistakes and rewrite its language. It could really use a thorough tech edit and a thorough copy edit. The lack of a single authorial voice, the frequent use of academic passive-speak, and the inconsistent and often inaccurate terminology are severe problems. Small technical mistakes are frequent, and there's at least one major flaw with the whole premise that underlies the book: excessive dependence on and faith in schemas. Probably some of this is precisely because the book was written by students learning XML rather than experts trying to teach XML. I attempted to correct some of the more obvious howlers in the text, but the Wiki was acting up and kept giving me the wrong section to edit.

I'm sure this was a valuable learning exercise for the students who wrote it. I'm tempted to assign a project like this to my own class one of these semesters. How valuable it is for others trying to learn from it, I don't know. I'm probably too close to judge myself. I do recall that readers often seemed to appreciate the tone of my earlier books, where I too was learning as I wrote. They had a real conversational, tutorial, we're-all-in-this-together approach that a lot of readers liked. My later books don't have as much of that. On the other hand, my early books also had a lot of small and large mistakes, many of which still make me blanch when I read them today. This Wiki text is certainly superior to the first editions of the first books I wrote about XML, even if it's decidedly inferior to the more modern editions on store shelves now.

Of course, it's really not fair to compare the revised editions of my books to the first edition of the Wiki book. If the Wiki improves at the same rate as new editions of paper books do (and it might well improve faster) then it could be a real challenge to existing books like the XML Bible or XML in a Nutshell. If nothing else the project (which includes many other textbooks besides XML) should help control the exorbitant price of college texts. It may also finally give some old dinosaurs like Halliday and Resnick, Apostol, or Jackson a run for their money. That would be a very good thing.


JAPISoft has released EditiX 5.0, a $99 payware cross-platform XML editor written in Java. Features include XPath location and syntax error detection, context sensitive popups based on DTD, W3C XML Schema Language, and RELAX NG schemas, XSLT and XSL-FO previews, XInclude, XML catalogs, an XSLT debugger, DocBook support, and multi-view preview. Version 5.0 enhances code assistance and adds a DTD documentation generator and various other small features. EditiX is available for Mac OS X, Linux, and Windows.

Tuesday, September 5, 2006 (Permalink)

The W3C CSS working group has posted the first public working draft of CSS Module: Namespaces. "This CSS module defines the syntax for using namespaces in CSS. It introduces the @namespace rule for declaring the default namespace and binding namespaces to namespace prefixes, and it defines a syntax that other specifications can adopt for using those prefixes in namespace-qualified names."

Given the namespace declarations:

@namespace toto "http://toto.example.org";
@namespace "http://example.com/foo";

In a context where the default namespace applies:

  • toto|A represents the name A in the http://toto.example.org namespace.
  • |B represents the name B that belongs to no namespace.
  • *|C represents the name C in any namespace, including no namespace.
  • D represents the name D in the http://example.com/foo namespace.
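The rules in that example are compact enough to express as a small lookup. The JavaScript below is a hedged sketch, not any browser's implementation: it resolves a type selector's qualified name against the declarations in the example, returning a namespace URI, null for no namespace, or "*" for any namespace. It follows the rule that an unprefixed name takes the default namespace when one is declared and otherwise matches in any namespace, and it treats an undeclared prefix as an error. The shape of the declarations object is an assumption.

```javascript
// Declarations from the example (object shape is an assumption):
//   @namespace toto "http://toto.example.org";
//   @namespace "http://example.com/foo";
const decls = {
  prefixes: { toto: "http://toto.example.org" },
  defaultNs: "http://example.com/foo"
};

// Resolve a type selector like "toto|A", "|B", "*|C", or "D".
// ns is a URI string, null (no namespace), or "*" (any namespace).
function resolveTypeSelector(qname, declarations) {
  const bar = qname.indexOf("|");
  if (bar === -1) {
    // Unprefixed: default namespace if declared, otherwise any namespace.
    const ns = declarations.defaultNs !== undefined ? declarations.defaultNs : "*";
    return { ns, local: qname };
  }
  const prefix = qname.slice(0, bar);
  const local = qname.slice(bar + 1);
  if (prefix === "") return { ns: null, local }; // |B
  if (prefix === "*") return { ns: "*", local }; // *|C
  if (!Object.prototype.hasOwnProperty.call(declarations.prefixes, prefix)) {
    throw new Error(`undeclared namespace prefix: ${prefix}`);
  }
  return { ns: declarations.prefixes[prefix], local };
}

for (const q of ["toto|A", "|B", "*|C", "D"]) {
  console.log(q, resolveTypeSelector(q, decls));
}
```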

Michael Kay has released version 8.8 of Saxon, his XSLT 2.0 and XQuery processor for Java and .NET. According to Kay, "There's a long list of changes at http://www.saxonica.com/documentation/changes/intro.html but most of them taken individually are fairly small. Saxon now achieves 100% pass rates in both the W3C XQuery and XSLT 2.0 test suites (a unique achievement), and many of the changes that were needed to reach this target are in obscure corner cases that very few users are likely to notice. (Do you really care about the difference between a float NaN and a double NaN? - the conformance tests do.)" New features include:

  • An implementation of the draft XQuery API for Java
  • The default collation for XSLT sorting is now Unicode codepoint collation.
  • Extension functions written in C# or other .NET languages can now be invoked from the .NET version
  • In XSLT, the namespace URI for newly-constructed elements and attributes must now be a valid URI.
  • The serialization pipeline is now configurable by a user-specified SerializerFactory class

Saxon is published in two versions, both of which require Java 1.4 or later (or .NET). Saxon 8.8B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 8.8SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standing XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers."

Monday, September 4, 2006 (Permalink)

The W3C XML Schema Working Group has posted the third public working draft of XML Schema 1.1 Part 1: Structures. According to the introduction to the structures spec,

The Working Group has three main goals for this version of W3C XML Schema:

  • Significant improvements in simplicity of design and clarity of exposition without loss of backward or forward compatibility;
  • Provision of support for versioning of XML languages defined using the XML Schema specification, including the XML transfer syntax for schemas itself.
  • Provision of support for co-occurrence constraints, that is constraints which make the presence of an attribute or element, or the values allowable for it, depend on the value or presence of other attributes or elements.

These goals are in tension with one another. The Working Group's strategic guidelines for changes between versions 1.0 and 1.1 can be summarized as follows:

  1. Support for versioning (acknowledging that this may be slightly disruptive to the XML transfer syntax at the margins)
  2. Support for co-occurrence constraints (which will certainly involve additions to the XML transfer syntax, which will not be understood by 1.0 processors)
  3. Bug fixes (unless in specific cases we decide that the fix is too disruptive for a point release)
  4. Editorial changes
  5. Design cleanup will possibly change behavior in edge cases
  6. Non-disruptive changes to type hierarchy (to better support current and forthcoming international standards and W3C recommendations)
  7. Design cleanup will possibly change component structure (changes to functionality restricted to edge cases)
  8. No significant changes in existing functionality
  9. No changes to XML transfer syntax except those required by version control hooks, co-occurrence constraints and bug fixes

The aim with regard to compatibility is that

  • All schema documents conformant to version 1.0 of this specification should also conform to version 1.1, and should have the same validation behavior across 1.0 and 1.1 implementations (except possibly in edge cases and in the details of the resulting PSVI);
  • The vast majority of schema documents conformant to version 1.1 of this specification should also conform to version 1.0, leaving aside any incompatibilities arising from support for versioning or co-occurrence constraints, and when they are conformant to version 1.0 (or are made conformant by the removal of versioning information), should have the same validation behavior across 1.0 and 1.1 implementations (again except possibly in edge cases and in the details of the resulting PSVI);

Changes since the last working draft appear to be quite technical.
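A co-occurrence constraint, the third goal above, is easy to state in code even though XML Schema 1.0 can't express one directly. In the hedged JavaScript sketch below, which uses a made-up contact element with kind and address attributes, the values allowed for address depend on the value of kind. It illustrates the idea only and says nothing about the syntax the Working Group will adopt.

```javascript
// Hypothetical vocabulary: <contact kind="email|phone" address="..."/>
// The valid form of @address depends on the value of @kind.
const addressRules = new Map([
  ["email", /^[^@\s]+@[^@\s]+$/],
  ["phone", /^\+?[0-9][0-9 -]*$/]
]);

// Returns a list of error messages; an empty list means the constraint holds.
function checkContact(attrs) {
  const rule = addressRules.get(attrs.kind);
  if (!rule) return [`unknown kind "${attrs.kind}"`];
  if (attrs.address === undefined) return ["address is required"];
  if (!rule.test(attrs.address)) {
    return [`address "${attrs.address}" is not valid when kind="${attrs.kind}"`];
  }
  return [];
}

console.log(checkContact({ kind: "email", address: "info@example.com" })); // []
console.log(checkContact({ kind: "phone", address: "info@example.com" })); // one error
```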

Sunday, September 3, 2006 (Permalink)

Microsoft has posted the first release candidate of Internet Explorer 7 (Windows only). According to general manager Dean Hachamovitch, "The RC1 build includes improvements in performance, stability, security, and application compatibility. You may not notice many visible changes from the Beta 3 release; all we did was listen to your feedback, fix bugs that you reported, and make final adjustments to our CSS support. I do want to call attention to two changes in particular. First, IE7 RC1 setup automatically detects and uninstalls previous IE7 betas before trying to install IE7 so you don’t have to. We’ll post more detail on the install/uninstall process very soon. Second, IE7 RC1 will automatically detect add-ons with known stability or compatibility problems so that end users can easily get a newer version or temporarily turn the add-on off. "

Saturday, September 2, 2006 (Permalink)

I'm in the process of revising and updating the XML mailing lists directory. If you have any updates, corrections, additions, or deletions, please send them to me. Thanks!

Friday, September 1, 2006 (Permalink)

The Mozilla Project has posted the second beta of Firefox 2.0. New features in 2.0 include:

  • Anti-Phishing Protection.
  • Search suggestions now appear with search history in the search box for Google, Yahoo! and Answers.com
  • Scrollable tabs
  • Ability to re-open accidentally closed tabs
  • Better support for previewing and subscribing to web feeds
  • Inline spell checking in forms (ironically the first word it flagged for me as misspelled was "Firefox")
  • Search plugin manager
  • Microsummaries feature for bookmarks
  • Automatic restoration of your browsing session if there is a crash
  • New combined and improved Add-Ons manager for extensions and themes
  • New Windows installer
  • JavaScript 1.7
  • Client-side session and persistent storage (a really hideous idea, sure to be misused)
  • svg:textPath

It's not immediately clear what's changed since beta 1, but presumably bugs were fixed. The inline spell checking is indispensable, and worth the upgrade alone.

Thursday, August 31, 2006 (Permalink)

freebxml.org has released ebxmlrr 3.0, an open source ebXML Registry. An XML registry/repository "can store XML, web services, or any other type of data, and the registry manages the entire life cycle of information in the repository using sophisticated meta-data technology." According to the announcement, version 3.0 "provides a near feature-complete implementation of Registry Full conformance profile for the OASIS ebXML Registry 3.0 standard [ebRR]. The only missing feature is support for Single Sign On (SSO) based on the Registry SAML Profile. This implementation therefore currently only claims conformance to the Registry Lite profile for the OASIS ebXML Registry 3.0 standard. The freebXML Registry also provides a feature complete implementation of level 1 conformance profile of the Java API for XML Registries API [JAXR]. Also included are implementations of two profiles [PROF] of ebXML Registry", the ebXML Registry Profile for Web Services and the ebXML Registry Profile for WSRP Remote Portlets.


The CRDP of the Académie de Grenoble has released WéM, a €35 payware XHTML+MathML editor aimed at the creation of scientific and interactive documents. WéM (and its web site) are in French.

Wednesday, August 30, 2006 (Permalink)

Planamesa Software has posted the first public beta of NeoOffice/J 2.0, a Mac port of OpenOffice using a Java-based GUI. "NeoOffice 2.0 Aqua Beta 3 offers a significantly enhanced "Aqua" appearance, enhanced international input support, as well as significantly improved overall performance over NeoOffice 1.2. Based on the latest stable OpenOffice.org codebase (2.0.3), this release also includes native open and save dialog support for reading and writing files in the international standard ISO 26300 OpenDocument (ODF) file formats."

The icons are too small, the default print layout is too focused on looking like a piece of paper rather than an editing window, and the View/Print Layout menu item seems completely non-functional, and that's just what I noticed in the first 30 seconds after launching the app. It took me another 30 seconds to note the lack of an outline mode and the File/Exit menu item. It took another 30 seconds to quit the application and notice the very strange font in the Save/Don't Save/Cancel dialog box. However, this is still a sizeable improvement over OpenOffice. It's catching up to Microsoft Office, though it's certainly not there yet. Mac OS X 10.3 or later is required. NeoOffice is published exclusively under the GPL.

Tuesday, August 29, 2006 (Permalink)

The Unicode® Consortium has released version 5.0 of the Unicode Character Database. Version 5.0 defines more than 99,000 characters. This includes the Common Locale Data Repository 1.4, which supports 360 different locales covering 121 languages and 142 territories.

Version 5.0 adds five new scripts: Balinese, N’Ko, Phags-pa, Phoenician, and Sumero-Akkadian Cuneiform. New characters were also added for Cyrillic, Greek, Hebrew, Kannada, Latin, math, phonetic extensions, and symbols. Overall, there are 1,369 new characters in version 5.0.

New features in 5.0 include dependable caseless matching through stable case folding operations. Version 5.0 also revises and improves "property values and behavioral specifications in areas such as character, word, line, and sentence segmentation, and tightens conformance requirements on Bidi implementations (used for Arabic and Hebrew)." The Unicode Consortium has also released the Unicode Collation Algorithm 5.0 specifying default collation for all 99,000 characters.
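Why case folding matters for caseless matching is easiest to see with a concrete case. The sketch below uses Python's `str.casefold()`, which implements Unicode full case folding (from whatever Unicode version the local Python ships, not necessarily 5.0), and follows the shape of the standard's canonical caseless match, NFD(casefold(NFD(s))), as I understand it. `caseless_match` is a hypothetical helper, not part of any Unicode Consortium code:

```python
import unicodedata

def caseless_match(a: str, b: str) -> bool:
    """Compare NFD(casefold(NFD(s))) for both strings (canonical caseless match)."""
    def key(s: str) -> str:
        # Decompose, fold case, then decompose again in case folding produced
        # characters that themselves have canonical decompositions.
        return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())
    return key(a) == key(b)

print("Straße".lower() == "STRASSE".lower())     # False: lower() keeps ß
print(caseless_match("Straße", "STRASSE"))        # True: ß folds to "ss"
print(caseless_match("Caf\u00e9", "CAFE\u0301"))  # True: precomposed vs. combining accent
```

Simple lowercasing is not enough: it leaves ß alone and ignores the difference between precomposed and decomposed accents, which is exactly what stable case folding plus normalization is meant to handle.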


The W3C XML Query Working Group and the XSL Working Group have released version 1.0 of the XML Query Test Suite (XQTS). According to Andrew Eisenberg,

We encourage implementors to run this test suite and request that they provide feedback to us by Sept. 29, 2006. If enough positive results are received, then we will be able to request a transition to Proposed Recommendation. During this period we will continue to respond to bug reports and will likely issue one or two point releases of the test suite.

In this release, we have added 230 test cases, including a small number of tests for fn:collection. We have changed the way that tests of fn:doc and tests that expect a context item from the host environment are expressed in our catalog (spelled out in Guidelines for Running the XQuery Test Suite).

We request that submittors re-read our guidelines if they have not done so recently and follow the guidelines for the transformation of queries and for the comparison of results as closely as possible. While we have provided a mechanism for submittors to tell us about any deviations that have been made, we hope that this will be used sparingly.

To date, we have received the results for several implementations of XQuery: Saxon-SA, xq2xsl, X-Hive/DB, xbird/open, XQuest, Qizx/open, and one anonymous implementation. A report that reflects these results is available from our web page. We will update this report as new results are received. We encourage implementors to send us results early, and then to update their results as their implementations progress.

Monday, August 28, 2006 (Permalink)

The IETF has published what amounts to a proposed recommendation of XML Pipelining with Chunks for the Information Registry Information Service.

This transfer protocol defines simple framing for sending XML in chunks so that XML fragments may be acted upon (or pipelined) before the reception of the entire XML instance. This document calls this XML pipelining with chunks (XPC) and its use with IRIS as IRIS-XPC.

XPC is for use with simple request and response interactions between clients and servers. Clients send a series of requests to a server in data blocks. The server will respond to each data block individually with a corresponding data block, but through the same connection. Request and response data blocks are sent using the TCP SEND function and received using the TCP RECEIVE function.

The lifecycle of an XPC session has the following phases:

  1. A client establishes a TCP connection with a server.
  2. The server sends a connection response block (CRB).
  3. The client sends a request block (RQB). In this request, the client can set a "keep open" flag requesting that the server keep the XPC session open following the response to this request.
  4. The server responds with a response block (RSB). In this response, the server can indicate to the client whether or not the XPC session will be closed.
  5. If the XPC session is not to be terminated, then the lifecycle repeats from step 3.
  6. The TCP connection is closed.
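The lifecycle above amounts to a small loop. Here's a minimal Python simulation of just the six steps, assuming nothing about the actual XPC wire format: the block names CRB, RQB, and RSB come from the draft, while `Request`, `run_session`, and `server_keeps_open` are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class Request:
    payload: str
    keep_open: bool  # hypothetical stand-in for the RQB "keep open" flag

def run_session(requests: Iterable[Request],
                server_keeps_open: Callable[[Request], bool] = lambda r: r.keep_open
                ) -> List[str]:
    """Return the lifecycle events of one simulated XPC-style session."""
    events = ["tcp-connect", "CRB"]                 # steps 1 and 2
    for req in requests:
        events.append(f"RQB:{req.payload}")         # step 3: client request block
        still_open = req.keep_open and server_keeps_open(req)
        events.append(f"RSB:{'open' if still_open else 'closing'}")  # step 4
        if not still_open:                          # step 5: repeat only while open
            break
    events.append("tcp-close")                      # step 6 (also when client runs out)
    return events

print(run_session([Request("q1", True), Request("q2", False), Request("q3", True)]))
# ['tcp-connect', 'CRB', 'RQB:q1', 'RSB:open', 'RQB:q2', 'RSB:closing', 'tcp-close']
```

Note that the server has the final say: even if the client asks to keep the session open, the response block can announce that it will close.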

What I'm confused about is why it's necessary to send XML fragments. Why not just send multiple small documents instead? What they're proposing may be reasonable, but at first read it gives me the willies. There may be deep flaws here. If nothing else, existing XML APIs like DOM, XOM, and SAX really aren't designed to handle semi-independent fragments. Furthermore the draft doesn't seem to address what happens to the remaining chunks when one is malformed. That's a huge omission that likely has security implications. Finally, this whole protocol feels like it deeply mixes layers that should stay independent. Comments are due today.
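For comparison, here's roughly what "just send multiple small documents" could look like. This is my own framing scheme (a 4-byte big-endian length before each complete document), not anything from the XPC draft, and the function names are hypothetical. Each document is parsed on its own, so a malformed one is rejected without affecting its neighbors:

```python
import struct
import xml.etree.ElementTree as ET
from typing import List, Union

def frame(documents: List[str]) -> bytes:
    """Prefix each complete UTF-8 document with its 4-byte big-endian length."""
    out = bytearray()
    for doc in documents:
        data = doc.encode("utf-8")
        out += struct.pack(">I", len(data))
        out += data
    return bytes(out)

def unframe(stream: bytes) -> List[Union[ET.Element, ET.ParseError]]:
    """Parse each framed document independently (assumes the stream is complete).

    A malformed document yields its ParseError; the others still parse normally.
    """
    results: List[Union[ET.Element, ET.ParseError]] = []
    pos = 0
    while pos < len(stream):
        (length,) = struct.unpack_from(">I", stream, pos)
        pos += 4
        chunk = stream[pos:pos + length]
        pos += length
        try:
            results.append(ET.fromstring(chunk))
        except ET.ParseError as err:
            results.append(err)  # only this one document is rejected
    return results

for item in unframe(frame(["<query id='1'/>", "<query id='2'>", "<query id='3'/>"])):
    print(item.get("id") if isinstance(item, ET.Element) else f"malformed: {item}")
```

Because every frame is a complete document, ordinary DOM, XOM, or SAX style APIs can process each one without any special support for fragments.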


RenderX has released INX2FO, a set of free-as-in-beer XSL Stylesheets for converting Adobe InDesign documents to XSL Formatting Objects.


Axos Technologies has released Axos FormMapper, a "tool for creating print templates for static layout forms and documents with embedded variable data. Simply re-use an existing empty PDF form as a background, draw boxes for the variable XML data on top, and map the boxes to their corresponding XML data structure elements from a model XML data file. The resulting template can be used repeatedly with an XSL-FO rendering engine to output PDF and Postscript documents." FormMapper is payware, but Axos doesn't put the cost anywhere obvious on their site. I guess they want to figure out how much they think you can afford to pay before quoting you a price.


Sylvain Hellegouarch has posted amplee 0.1.0, a Python "implementation of the Atom Publishing Protocol using atomixlib 0.4.3 and CherryPy 3."


The Apache Jakarta Project has posted Commons SCXML 0.5. "State Chart XML (SCXML) is currently a Working Draft published by the World Wide Web Consortium (W3C). SCXML provides a generic state-machine based execution environment. Commons SCXML provides a Java implementation of the SCXML engine. Anything that can be represented as a UML state chart -- business process flows, view navigation bits, interaction or dialog management, and many more -- can leverage the Commons SCXML library." Currently SCXML is being used as part of VoiceXML. For instance, it can be used to map out those annoying phone trees. E.g. "If Wonder Shampoo turned your hair green, please press 1. If Wonder Shampoo turned your hair purple, please press 2. If Wonder Shampoo made you bald, please press 3."
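To make the phone tree concrete, here's a tiny interpreter for a flat subset of SCXML written in Python. It is not Commons SCXML (which is Java) and it ignores most of the spec: there are no nested or parallel states, no executable content, and events match transitions exactly, whereas SCXML's own event matching is more general. `run` and `PHONE_TREE` are hypothetical names.

```python
import xml.etree.ElementTree as ET

SCXML_NS = "{http://www.w3.org/2005/07/scxml}"

PHONE_TREE = """\
<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0" initial="menu">
  <state id="menu">
    <transition event="press.1" target="green"/>
    <transition event="press.2" target="purple"/>
    <transition event="press.3" target="bald"/>
  </state>
  <final id="green"/>
  <final id="purple"/>
  <final id="bald"/>
</scxml>"""

def run(scxml_text: str, events: list) -> str:
    """Feed events to a flat state chart and return the id of the resulting state."""
    root = ET.fromstring(scxml_text)
    states = {el.get("id"): el for el in root
              if el.tag in (SCXML_NS + "state", SCXML_NS + "final")}
    current = root.get("initial")  # this sketch requires an explicit initial attribute
    for event in events:
        node = states[current]
        if node.tag == SCXML_NS + "final":
            break  # a final state accepts no further events
        for transition in node.findall(SCXML_NS + "transition"):
            if transition.get("event") == event:  # exact match only in this sketch
                current = transition.get("target")
                break
        # an event with no matching transition is simply discarded
    return current

print(run(PHONE_TREE, ["press.9", "press.3"]))  # bald
```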


The Mozilla Project has posted version 0.6 of its XForms extension for Firefox 1.5. Mozilla XForms support has been developed by IBM, Novell, and independent contributors. Improvements include repeat and select optimization, more XUL controls, improved schema support, and a new permission manager.


Alex Selkirk has posted a beta of GUIXML 0.82, a free-as-in-beer browser for Windows 2000 and later that "can potentially use any XML vocabulary with a namespace. There are a number of demonstration vocabularies, including SimpleXHTML, SimpleDrawing, Identity, Chatroom. There are APIs for display, editing, resources, and database serialization. The extensibility of the web browser is achieved through the ability to convert W3C XML schemas to C++ code. The C++ code compiles to a DLL usable by the web browser. This allows a person to create their own XML vocabulary and integrate it with other existing vocabularies and with the web browser."

Sunday, August 27, 2006 (Permalink)

The W3C is sponsoring a Workshop on Gathering Requirements for Extensible Stylesheet Language (XSL) 2.0 "to be held 18 October in Heidelberg, Germany, hosted by Heidelberger Druckmaschinen AG. The Workshop will be held in conjunction with a symposium on Web printing at the same location. Participants will discuss the requirements, features and design of a future version of" XSL-FO. I'd really like to see some thought about why XSL-FO 1.0 failed, and what could be done to improve matters in version 2 for both users and implementers. Uptake of XSL-FO 1.0 has been so slow (in large part due to the lack of quality open source implementations) that backwards compatibility should not be a huge concern.

Saturday, August 26, 2006 (Permalink)

The W3C Compound Document Formats Working Group has published an updated last call working draft of Compound Document by Reference Framework 1.0.

Combining content delivery formats can often be desirable in order to provide a seamless experience to the user.

For example, XHTML-formatted content can be augmented by SVG objects, to create a more dynamic, interactive and self adjusting presentation. A set of standard rules is required in order to provide this capability across a range of user agents and devices.

These are examples of possible Compound Document profiles:

  • XHTML + SVG + MathML
  • XHTML + SMIL
  • XHTML + XForms
  • XHTML + VoiceXML

This document defines a generic Compound Document by Reference Framework (CDRF) that defines a language-independent processing model for combining arbitrary document formats.


Jacek Radajewski has posted version 0.0.8 of UTF-X, a JUnit extension for testing XSLT stylesheets. According to Radajewski, "We've developed it at USQ for unit testing our stylesheets about three years ago, and two years ago released it under GPL. Although still in Alpha versions UTF-X works well and has been used in reasonable size projects (over 1500 templates/tests). UTF-X tests or Test Definition Files (TDFs) are XML documents which can be validated and rendered. Being able to render your tests works well for the test-first-design approach as you can write all your tests, validate them against your and/or xhtml DTD and render them for visual inspection. If everything is OK you can write your templates until all tests pass." New features in this release include:

  • Testing named templates with parameters
  • Stylesheet parameters
  • External CSS stylesheets
  • Absolute XPath expressions
  • Setting the context node

UTF-X requires Java 5 and is published under the GPL.

Thursday, August 24, 2006 (Permalink)

The W3C Web Application Formats Working Group has posted the first public working draft of the Web Applications Packaging Formats Requirements. According to the draft,

Application Packaging is the process of bundling an application and its resources into an archive format (eg. a .zip file [ZIP]) for the purpose of distribution and deployment. A package bundle usually includes a manifest, which is a set of instructions that tell a host runtime environment how to install and run the packaged application. Application packaging is used on both the server-side, as is the case with Sun's JEE .war files and .ear files and Microsoft's .NET .cab files, and on the client-side, as is the case with widgets such as those distributed by Apple, Opera, Yahoo! and Microsoft.

Currently, there is no standardized way to package an application for distribution and deployment on the web. Each vendor has come up with their own solution to what is essentially the same problem (see Appendix). The working group hopes that by standardising application packaging authors will be able to distribute and deploy their applications across a variety of platforms in a standardized manner that is both easy to use and device independent.

Wednesday, August 23, 2006 (Permalink)

V. Omprakash has begun work on exalto, a free-as-in-speech (GPL) XML editor for narrative documents such as DocBook.


Peter Nunn has released MozzIE 1.0, an open source plug-in for Internet Explorer that uses the Mozilla Gecko 1.8 rendering engine to display XHTML and XForms.


Todd Ditchendorf has released XSLPalette 1.1, a free-as-in-beer XSLT debugging palette for BBEdit, TextMate, Xcode, and other Mac OS X editors.



G. Ken Holman has released some free stylesheets for converting UBL documents into XSL-FO. A special edition of the Ibex Signature Edition XSL-FO processor can be used with these stylesheets to convert UBL documents to PDF.


Tuesday, August 22, 2006 (Permalink)

IDEAlliance has posted the proceedings from the recently concluded Extreme Markup Languages conference. Lots of interesting stuff to look through.


Daniel Howe has released TrackMeNot, a Firefox extension that sends a constant stream of search terms to various search engines to hide genuine searches in the noise. "With TrackMeNot, actual web searches, lost in a cloud of false leads, are essentially hidden in plain view. User-installed TrackMeNot works with the Firefox Browser and popular search engines, e.g. AOL, Yahoo!, Google, and MSN." The initial version isn't very secure against a determined adversary, but this should improve in the future.

Monday, August 21, 2006 (Permalink)

Tatu Saloranta has released Woodstox 3.0, a free-as-in-speech validating XML processor written in Java that implements the StAX pull-parsing API. Woodstox supports XML 1.0 and 1.1, and is dual-licensed under the LGPL and the Apache License 2.0.
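For readers who haven't used a pull parser: the application asks the parser for the next event instead of receiving callbacks. Woodstox does this through the Java StAX API; the sketch below uses Python's nearest stdlib analogue, `xml.etree.ElementTree.XMLPullParser`, so it shows the style of processing rather than StAX itself. `count_elements` is a hypothetical helper. Data is deliberately fed in arbitrary pieces, even splitting a tag, to show incremental parsing:

```python
import xml.etree.ElementTree as ET

def count_elements(chunks):
    """Count start tags, pulling events as each piece of input arrives."""
    parser = ET.XMLPullParser(events=("start",))
    counts = {}

    def drain():
        # Ask the parser for whatever events the input so far has produced.
        for _event, elem in parser.read_events():
            counts[elem.tag] = counts.get(elem.tag, 0) + 1

    for chunk in chunks:
        parser.feed(chunk)
        drain()
    parser.close()  # raises ParseError if the document is incomplete or malformed
    drain()
    return counts

print(count_elements(["<feed><entry><title>A</ti", "tle></entry><entry/>", "</feed>"]))
```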


Jez Higgins has posted a new version of Arabica, an open source C++ XML parser toolkit that supports SAX2 and DOM2 by wrapping an underlying parser such as expat, Xerces, libxml, or the Microsoft XML parser COM component. It supports various string types. According to Higgins, "The August 2006 release extends the XPath engine to support arbitrary string types. A new dual DOM/Streaming parser has been added. By registering a callback function, partially built DOM trees can be processed, modified, manipulated or even discarded, before proceeding to build more of the tree. The release also includes assorted minor bug fixes." Arabica is published under a BSD style license.
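The "process part of the tree, then throw it away" pattern Higgins describes is worth seeing in action. Arabica's own callback API is C++ and I won't guess at its signatures; instead, here's the same idea with Python's `ElementTree.iterparse`, where each completed subtree is handed over, examined, and then cleared so memory stays bounded. `large_orders` and the sample data are hypothetical:

```python
import io
import xml.etree.ElementTree as ET

DATA = b"""<orders>
  <order id="1"><total>10</total></order>
  <order id="2"><total>250</total></order>
  <order id="3"><total>75</total></order>
</orders>"""

def large_orders(source, threshold):
    """Yield ids of orders whose total exceeds threshold, one subtree at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "order":  # this <order> subtree is now fully built
            if int(elem.findtext("total")) > threshold:
                yield elem.get("id")
            elem.clear()  # discard the processed subtree's contents

print(list(large_orders(io.BytesIO(DATA), 50)))  # ['2', '3']
```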


Pavel Sher has posted Juxy 0.7, "a simple unit testing library for XSLT written in Java. Juxy allows to call or apply individual XSLT templates from Java and does not use any specific features of XSLT processor for that purposes. It relies entirely on TRaX API and should work with any TRaX compliant XSLT processor." Juxy is published under the Apache 2.0 license. Java 1.4 or later is required.


Jacek Radajewski has posted an alpha of UTF-X, a JUnit extension for testing XSLT stylesheets. According to Radajewski, "We've developed it at USQ for unit testing our stylesheets about three years ago, and two years ago released it under GPL. Although still in Alpha versions UTF-X works well and has been used in reasonable size projects (over 1500 templates/tests). UTF-X tests or Test Definition Files (TDFs) are XML documents which can be validated and rendered. Being able to render your tests works well for the test-first-design approach as you can write all your tests, validate them against your and/or xhtml DTD and render them for visual inspection. If everything is OK you can write your templates until all tests pass." UTF-X requires Java 5 and is published under the GPL.


Oleg Paraschenko has released XSieve, "an XML transformation language based on combination of XSLT and Scheme (a Lisp dialect). XSieve make XSLT to be a general-purpose language." I'm not quite sure why you'd want that when we already have excellent general purpose languages like Python and Java, but there it is. XSieve allows XSLT extension functions to be written in Scheme. Since XSLT and Scheme are both functional languages, that may be a better match than extension functions written in imperative languages like Java and C.


XRules.org has released XmlVoyager, a spreadsheet-like XML browser for Windows. According to developer Waleed Abdullav, "XmlVoyager is especially useful when working with publicly standardized XML formats such as those created by vertical standards groups like UBL and OAGIS. Columns can be added to identify which nodes are populated by the XML creator and which ones are required or optional by the XML consumer (see example in the download). And, additional columns can be used to provide comments or describe business rules that govern the use of each node. XmlVoyager makes it easy to customize the view to show only relevant data as needed."


Orbeon has submitted the XML Pipeline Language (XPL) Version 1.0 (Draft) to the W3C. "An XPL program defines orchestrated sequences of operations on XML Information Sets (Infosets). Individual operations are encapsulated within components called XML processors. Operations include production, consumption, and transformation of XML Infosets. An XPL program supports unconditional operations, and may support as well conditions, loops, and change of control following runtime errors." This is an important idea, and a big hole in the existing XML family of specs. Whether this is the right implementation of this idea, I don't yet know.
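Stripped of syntax, the core idea is composition: each processor takes an infoset and produces another, and a pipeline chains them, optionally with conditions. The Python sketch below shows only that idea; it is not XPL syntax or Orbeon's implementation, and `pipeline`, `when`, `drop_drafts`, and `stamp_count` are all hypothetical. For brevity the processors modify the tree in place rather than copying it.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

Processor = Callable[[ET.Element], ET.Element]

def pipeline(processors: List[Processor]) -> Processor:
    """Compose processors so each one's output feeds the next."""
    def run(doc: ET.Element) -> ET.Element:
        for step in processors:
            doc = step(doc)
        return doc
    return run

def when(test: Callable[[ET.Element], bool], step: Processor) -> Processor:
    """Conditional operation: apply step only if test holds for the current document."""
    return lambda doc: step(doc) if test(doc) else doc

def drop_drafts(doc: ET.Element) -> ET.Element:
    """Remove child elements marked status="draft"."""
    for child in list(doc):
        if child.get("status") == "draft":
            doc.remove(child)
    return doc

def stamp_count(doc: ET.Element) -> ET.Element:
    """Record the number of remaining children as an attribute."""
    doc.set("count", str(len(doc)))
    return doc

process = pipeline([drop_drafts, when(lambda d: len(d) > 0, stamp_count)])
result = process(ET.fromstring('<items><item/><item status="draft"/><item/></items>'))
print(ET.tostring(result, encoding="unicode"))
```

Loops and error handling, which the XPL draft also mentions, would be further combinators in the same style.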


Alex Milowski is working on smallx:

A library and set of tools that is being developed to process XML infosets. It has two distinct features in that the infoset implementation allows streaming of documents and that processing of infosets can be accomplished using a concept called pipelines. The library contains a full complement of technologies, including XPath and XSLT.

Pipelines provide the ability to chain together different components that perform different tasks to process an XML document. Some of these tasks might be decision points in the processing while others might transform the input (e.g. XSLT). All components in the pipeline have the ability to stream the infoset if they so choose.

The key difference of this code over others is that it allows streaming of infosets to be mixed in with non-streamed document-based processing. This allows large data sets to be processed in a minimal amount of memory while allowing traditional technologies like XSLT to still be used.

For example, in the following pipeline:

<p:pipe xmlns:p="http://www.smallx.com/Vocabulary/Pipeline/2005/1/0" name="scoped-xslt">
  <p:subtree select="/doc/part/subsection">
    <p:xslt src="translate.xsl"/>
  </p:subtree>
</p:pipe>

the XSLT transform "translate.xsl" is limited to only the elements that match the "/doc/part/subsection" XPath. Every other part of the document "flows" around the XSLT in a streaming fashion so that only the subsection subtree needs to be built and passed to XSLT. In the end, the pipeline puts all the pieces back together in the right order.
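The scoping behavior can be sketched in a few lines of Python, with one big caveat: this version builds the whole tree in memory, so it shows only how a transform is confined to the selected subtrees and spliced back in order, not the streaming that is smallx's real point. `scoped` is hypothetical, handles only simple child-step paths, and `translate` is a made-up stand-in for translate.xsl.

```python
import xml.etree.ElementTree as ET

def scoped(root, path, transform):
    """Apply transform only to elements matching a simple absolute path such as
    /doc/part/subsection, splicing each result back in at the same position."""
    steps = path.strip("/").split("/")
    if steps[0] != root.tag or len(steps) < 2:
        raise ValueError(f"path must start at <{root.tag}> and name a descendant")
    parent_path = "/".join(steps[1:-1])
    parents = root.findall(parent_path) if parent_path else [root]
    for parent in parents:
        for index, child in enumerate(list(parent)):
            if child.tag == steps[-1]:
                parent[index] = transform(child)  # siblings keep their order
    return root

def translate(elem):
    """Hypothetical stand-in for translate.xsl: upper-case the text."""
    out = ET.Element("translated")
    out.text = (elem.text or "").upper()
    out.tail = elem.tail  # preserve any text that followed the original element
    return out

doc = ET.fromstring("<doc><intro>hi</intro><part><subsection>a</subsection>"
                    "<note>n</note><subsection>b</subsection></part></doc>")
print(ET.tostring(scoped(doc, "/doc/part/subsection", translate), encoding="unicode"))
```

Everything outside `/doc/part/subsection`, such as `intro` and `note` here, passes through untouched.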

Again, whether this is the right implementation of this idea, I don't yet know.


IBM's alphaWorks has released XJ, a derivative of the Java programming language 1.4 that builds in native support for XML (like Cω does for C#). "In XJ, one can import XML schemas just as one does Java classes. All the element declarations in the XML schema are then available to programmers as if they were Java classes. Programmers can write inline XPath expressions on these classes, and the compiler checks them for correctness with respect to the XML schema. In addition, it performs optimizations to improve the evaluation of XPath expressions. A programmer may construct new XML documents by writing XML directly inline. Again, the compiler ensures correctness with respect to the appropriate schema." It sounds interesting, but the tight coupling to schemas is a serious mistake. I want my XPaths and XML literals to work regardless of what the schema says. Indeed the schema agnosticity of both XML and XPath is one of their strengths. It's disturbing how many people keep trying to force the schemaless genie back into the bottle.


Axizon has released the Tiger XSLT Mapper Professional Edition, a $399 payware tool for creating XSLT stylesheets by matching structures in sample inputs with structures in sample outputs.

Sunday, August 20, 2006 (Permalink)

Ambush Commander has posted a beta of HTMLPurifier 1.0.0, a PHP library for filtering unsafe HTML from incoming data and rendering it standards conformant. This helps prevent cross-site-scripting attacks. "HTML Purifier takes a different approach, one that doesn't use specification-ignorant regexes or narrow blacklists. HTML Purifier will decompose the whole document into tokens, and rigorously process the tokens by: removing non-whitelisted elements, transforming bad practice tags like font into span, properly checking the nesting of tags and their children and validating all attributes according to their RFCs." HTMLPurifier is published under the LGPL.
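The whitelist-on-tokens approach the quote describes is easy to sketch and hard to get fully right. Below is a toy Python version built on the stdlib `html.parser`. It is not HTML Purifier's algorithm: it validates only `href` schemes, does nothing about CSS, and the `ALLOWED`/`RENAME` tables and `purify` function are my own hypothetical choices. Treat it as an illustration, not a security filter.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlparse

ALLOWED = {"p": set(), "em": set(), "strong": set(), "span": set(),
           "br": set(), "a": {"href"}}          # element -> permitted attributes
RENAME = {"b": "strong", "i": "em", "font": "span"}  # bad-practice tags (attributes dropped)
VOID = {"br"}
DROP_CONTENT = {"script", "style"}

class Whitelister(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []     # allowed elements currently open, for balanced nesting
        self.skipping = 0   # > 0 while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skipping += 1
            return
        tag = RENAME.get(tag, tag)
        if self.skipping or tag not in ALLOWED:
            return  # element removed; its text content is still kept
        kept = []
        for name, value in attrs:
            if name in ALLOWED[tag] and value is not None:
                if name == "href" and urlparse(value).scheme not in ("http", "https"):
                    continue  # rejects javascript: URLs (and relative ones, for simplicity)
                kept.append(f' {name}="{escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skipping = max(0, self.skipping - 1)
            return
        tag = RENAME.get(tag, tag)
        if tag in self.stack:
            # Close anything left open inside this element so nesting stays valid.
            while self.stack:
                top = self.stack.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(escape(data))

def purify(html_text: str) -> str:
    parser = Whitelister()
    parser.feed(html_text)
    parser.close()
    while parser.stack:  # close elements the input never closed
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)

print(purify('<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>'))
# <p>Hi <strong>there</strong></p>
```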

Friday, August 18, 2006 (Permalink)

The W3C has updated four recommendations:

The new editions just incorporate various errata. I skimmed the changes. Most of them don't look too serious or likely to cause problems. There is one nasty one (rewriting history to put the xmlns prefix in a namespace) but that's been a problem for a long time. Moving that decision from the errata page to the core spec doesn't really change anything. If anything, it's more honest since we can now specify tools as supporting the first or second edition of Namespaces in XML rather than pretending there's really only one variant.

Thursday, August 17, 2006 (Permalink)

Dimitre Novatchev has released FXSL 2.0, v2. FXSL is a library for functional programming in XSLT. Through a reusable set of functions, it provides a means of implementing higher-order functions and of using those functions as first-class objects in XSLT. New features in this release include:

  • Almost all standard XPath 2.0 functions (F & O) now have higher-order FXSL wrappers that make it possible to use them as higher-order functions and to create partial applications from them.
  • Some standard XSLT 2.0 functions and instructions now have higher-order FXSL wrappers that make it possible to use them as higher-order functions and to create partial applications from them.
  • All standard XPath 2.0 operators (F & O) now have higher-order FXSL wrappers that make it possible to use them as higher-order functions and to create partial applications from them.
  • All standard XPath 2.0 constructors now have higher-order FXSL wrappers that make it possible to use them as higher-order functions and to create partial applications from them.
  • Currying and partial application use dynamic type detection of the function's arguments. On the final evaluation of the function, when all arguments have been specified, the typed values of the arguments are reconstructed using the recorded type information.
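If "higher-order wrappers" and "partial application" are unfamiliar, the concepts look like this in Python, using `operator` to treat operators as ordinary functions and `functools.partial` to fix arguments. This illustrates the ideas only; it is not FXSL's XSLT API, and the `curry` helper is hypothetical (it also skips FXSL's dynamic type tracking).

```python
import operator
from functools import partial, reduce

def curry(fn, arity):
    """Return a curried fn: arguments may be supplied a few at a time."""
    def collect(*args):
        if len(args) >= arity:
            return fn(*args)
        # Not enough arguments yet: return a function waiting for the rest.
        return lambda *more: collect(*args, *more)
    return collect

add = curry(operator.add, 2)   # an operator wrapped as an ordinary function
increment = add(1)             # partial application: first argument fixed
print(list(map(increment, [1, 2, 3])))          # [2, 3, 4]
print(reduce(operator.mul, [1, 2, 3, 4], 1))    # 24: an operator passed to a higher-order function
print(partial(str.replace, "a-b-c", "-")("+"))  # a+b+c
```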

Tuesday, August 15, 2006 (Permalink)

Version 2.4.5 of AbiWord has been released. AbiWord is an open source word processor for Mac OS X, Linux, and Windows. Supported file formats include Word, WordPerfect, DocBook, and OpenDocument. Features include grammar checking, an equation editor, tight image wrapping, revision tracking, styles, and all the usual features you'd expect in a word processor.


Paul DuBois has released xmlformat 1.04, an open source pretty-printer for XML documents, written in Perl (or Ruby), that can adjust indentation, line-breaking, and text wrapping on a per-element basis. Version 1.04 adds line numbers. xmlformat is published under a BSD license.

Monday, August 14, 2006 (Permalink)

Len Bullard recently pointed out to me that the OASIS-hosted xml-dev list seems to have gone dark again. Could someone please bang some heads over at OASIS to fix this? Thanks.


Alex Moffat's XSLTXT is a compact, non-XML syntax for XSLT. Think of it as doing for XSLT what RNC does for RELAX NG. It's an interesting idea, and it may especially appeal to Python folks. I'm not sure I like it, but then I'm more comfortable than most with both XSLT and XML. I even prefer RNG to RNC. XSLTXT includes translators from XSLT to XSLTXT and back. It's published under the GNU Lesser General Public License.

Sunday, August 13, 2006 (Permalink)

The W3C Scalable Vector Graphics Working Group has posted the candidate recommendation of Scalable Vector Graphics (SVG) Tiny 1.2. SVG Tiny is a "modularized language for describing two-dimensional vector and mixed vector/raster graphics in XML. SVG Tiny 1.2 is the baseline profile of SVG, implementable on a range of devices from cellphones and PDAs to desktop and laptop computers, and is the core of SVG 1.2. Other SVG 1.2 specifications will extend this functionality to form supersets (for example, SVG 1.2 Full)."

Friday, August 11, 2006 (Permalink)

Chris Fuenty is working on Swift, an unofficial port of Apple's Safari web browser to Windows based on the open source Webkit framework.

Thursday, August 10, 2006 (Permalink)

Todd Ditchendorf has released XSLPalette 1.0, a closed source free-beer floating palette for BBEdit and TextMate based on libxml and libxslt. It allows you to transform and set parameters from within BBEdit and TextMate.

Wednesday, August 9, 2006 (Permalink)

IBM's developerWorks has published my latest article, The Java XML Validation API. This article introduces the validation functionality that's built into Java 5 and later. It's also available as a standard extension for Java 1.3 and later.


The Shiira Project has posted a beta of Shiira 2.0, an open source (Modified BSD license) Mac OS X web browser based on Web Kit and written in Cocoa. "The goal of the Shiira Project is to create a browser that is better and more useful than Safari." They've failed. Shiira assumes all XML files are RSS files. Bad browser. No cookie. I know this isn't the first browser in which I've seen this particular brain damage, but Safari doesn't have this bug. Mac OS X 10.3.9 or later is required.

Tuesday, August 8, 2006 (Permalink)

The W3C We-Hate-XML Working Group has published a note on Efficient XML Interchange Measurements. "In particular, this draft covers measurements of the properties of "compactness" and "processing efficiency", as defined by the XBC WG. We start by describing the context in which this analysis is being made, and the position of an efficient format in the landscape of high performance XML strategies. Then we describe the measured quantities in detail and the test framework in which they were made. A short description of each format is included. Since the measurement and analysis effort is still in progress, for this first draft the raw results and preliminary findings are given elsewhere (see respectively the EXI Measurements Results Preview and the Analysis of the EXI Measurements)." No actual results are presented.

Monday, August 7, 2006 (Permalink)

Nikolai Grigoriev has released SVGMath 0.3, a presentation MathML formatter that produces SVG. It's written in pure Python and published under an MIT license. According to Grigoriev, "The new version can work with multiple-namespace documents (e.g. replace all MathML subtrees with SVG in an XSL-FO or XHTML document); configuration is made more flexible, and several bugs are fixed. There is also a stylesheet to adjust the vertical position of the resulting SVG image in XSL-FO."


Stefan Behnel has released MathDOM 0.7, "a set of Python 2.4 modules (using PyXML or lxml, and pyparsing) that import mathematical terms as a Content MathML DOM. It currently parses MathML and literal infix terms into a DOM or lxml document and writes out MathML and literal infix/prefix/postfix/Python terms. The DOM elements are enhanced by domain specific methods that make using the DOM a little easier. It comes with an XSLT-based output filter for Presentational MathML and RelaxNG-based document validation." MathDOM is open source, published under an MIT license.
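What "parsing literal infix terms into Content MathML" means is clearer with an example. The sketch below is not MathDOM's API. It borrows Python's own expression parser via the `ast` module, so it accepts Python syntax (`**` for powers, not `^`), and it handles only arithmetic on names and numbers. `to_content_mathml` is a hypothetical name.

```python
import ast
import xml.etree.ElementTree as ET

OPS = {ast.Add: "plus", ast.Sub: "minus", ast.Mult: "times",
       ast.Div: "divide", ast.Pow: "power"}

def to_content_mathml(expr: str) -> ET.Element:
    """Convert a simple infix arithmetic expression to a Content MathML tree."""
    def convert(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            app = ET.Element("apply")               # <apply><op/>left right</apply>
            ET.SubElement(app, OPS[type(node.op)])
            app.append(convert(node.left))
            app.append(convert(node.right))
            return app
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            app = ET.Element("apply")               # unary minus: <apply><minus/>x</apply>
            ET.SubElement(app, "minus")
            app.append(convert(node.operand))
            return app
        if isinstance(node, ast.Name):
            ci = ET.Element("ci")                   # identifier
            ci.text = node.id
            return ci
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            cn = ET.Element("cn")                   # number
            cn.text = str(node.value)
            return cn
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    math = ET.Element("math", xmlns="http://www.w3.org/1998/Math/MathML")
    math.append(convert(ast.parse(expr, mode="eval").body))
    return math

print(ET.tostring(to_content_mathml("a + 2*b"), encoding="unicode"))
```

Operator precedence comes for free from the parser: `2*b` ends up nested inside the `plus` application.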


Monkfish Software has released xmlBlueprint 4.1, a $45 payware XML editor for Windows 98 and later that features schema-based tag completion.

Sunday, August 6, 2006 (Permalink)

XQuery expert Howard Katz has released OscarsX. The site provides a raw XQuery interface to 78 years of Oscars nomination data in XML.

Friday, August 4, 2006 (Permalink)

Opera Software has released version 9.01 of their namesake free-beer web browser for Windows, Mac, Linux, FreeBSD, and Solaris. Opera supports XML, CSS, and XSLT. 9.01 is a bug fix release.

Thursday, August 3, 2006 (Permalink)

The Mozilla Project has released Firefox 1.5.0.6. This release fixes a bug when playing Windows media content.

Wednesday, August 2, 2006 (Permalink)

Gerald Schmidt has released XML Copy Editor 1.0.6.5, a free-as-in-speech (GPL) XML editor for Windows and Linux "with DTD/XML Schema/RELAX NG validation, XSLT, XPath, pretty-printing, syntax highlighting, folding, tag completion/locking and lossless import/export of Microsoft Word documents."

Tuesday, August 1, 2006 (Permalink)

Steve Palmer has posted a beta of Vienna 2.1, an open source RSS/Atom client for Mac OS X. Vienna is the first reader I've found acceptable for daily use; not great but good enough. (Of course my standards for "good enough" are pretty high.) 2.1 focuses on improving the user interface with a unified layout that lets you scroll through several articles, article filtering (e.g. read all articles since the last refresh), manual folder reordering, a new get info window, and an improved condensed layout.


Matt Mullenweg has released Wordpress 2.0.4, a blog engine based on PHP and MySQL. 2.0.4 plugs various security holes, mostly involving plugins.

Monday, July 31, 2006 (Permalink)

The W3C Internationalization GEO (Guidelines, Education & Outreach) Working Group has updated the working draft of Internationalization Best Practices: Specifying Language in XHTML & HTML Content. According to the draft, "Specifying the language of content is useful for a wide number of applications, from linguistically sensitive searching to applying language-specific display properties. In some cases the potential applications for language information are still waiting for implementations to catch up, whereas in others, such as detection of language by voice browsers, it is a necessity today. On the other hand, adding markup for language information to content is something that can and should be done today. Without it, it will not be possible to take advantage of any future developments." This advice is summarized in 16 "best practices":

  • Best Practice 1: Always declare language in the html tag
  • Best Practice 2: html-based declarations for multilingual audiences
  • Best Practice 3: Declare language changes inside the document
  • Best Practice 4: Should I use the lang or xml:lang attribute?
  • Best Practice 5: Don't rely on Content-Language for text-processing
  • Best Practice 6: Don't use the body tag instead of the html tag
  • Best Practice 7: When attribute and element content are in different languages
  • Best Practice 8: Use HTTP or a meta tag for metadata
  • Best Practice 9: Provide a comma-separated list of languages
  • Best Practice 10: Division of multilingual docs
  • Best Practice 11: Use RFC3066bis or its successor
  • Best Practice 12: Use short language tags
  • Best Practice 13: Use Hans and Hant codes
  • Best Practice 14: Pros and cons of identifying the language
  • Best Practice 15: Using hreflang with CSS
  • Best Practice 16: Don't use flags to indicate languages
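
Best practices 1 and 3 depend on inheritance: a language declared on the html element applies to everything inside it until some descendant declares a different one. Here's a minimal Python sketch of how a processor might compute each element's effective language. It assumes that xml:lang wins when both xml:lang and lang are present, and it is not taken from the draft itself.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predeclared XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_map(root):
    """Return {element: effective language} for every element under root."""
    result = {}

    def walk(element, inherited):
        # Assumption: xml:lang beats lang; otherwise inherit from the parent.
        lang = element.get(XML_LANG, element.get("lang", inherited))
        result[element] = lang
        for child in element:
            walk(child, lang)

    walk(root, None)
    return result


page = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">'
    '<body><p>Hello</p><p lang="fr" xml:lang="fr">Bonjour</p></body></html>'
)
langs = language_map(page)
for p in page.iter("{http://www.w3.org/1999/xhtml}p"):
    print(p.text, langs[p])  # Hello en, then Bonjour fr
```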

Sunday, July 30, 2006 (Permalink)

The W3C HTML working group has posted the first public working draft of XHTML Role Attribute Module.

The role attribute takes as its value one or more white-space separated QNames. The attribute describes the role(s) the current element plays in the context of the document. It is used by applications and assistive technologies to determine the purpose of UI widgets. In the case of a web page it may be declarative as a function of particular elements or it may be an attribute which is configurable by the page author. Additionally, role information may be used to define each action which may be performed on an element. This allows a user to make informed decisions on which actions may be taken on an element and activate the selected action in a device independent way.

<ul role="navigation wai:sitemap">
    <li href="downloads">Downloads</li>

    <li href="docs">Documentation</li>
    <li href="news">News</li>
</ul>

Defined roles include:

  • main
  • secondary
  • navigation
  • banner
  • contentinfo
  • note
  • seealso
  • search

You can add other values for this attribute by placing the values in a namespace. (Haven't we learned yet that namespaced attribute values are a bad idea?)
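
Here's why namespaced attribute values are a pain. To figure out what wai:sitemap actually means, a processor has to keep track of the in-scope namespace declarations, and most APIs throw those away once they've used them to name elements and attributes. The following Python sketch recovers them from ElementTree's iterparse "start-ns" events. The namespace URI for unprefixed role values and the wai URI in the sample are hypothetical placeholders, and the prefix table ignores scoping for simplicity.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical URI for unprefixed role values such as "navigation".
DEFAULT_ROLE_NS = "urn:example:xhtml-role"


def resolve_roles(xml_text):
    """Return (namespace URI, local name) pairs for every role attribute value."""
    prefixes = {}  # simplification: one flat prefix table for the whole document
    roles = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item
            prefixes[prefix] = uri
        elif "role" in item.attrib:
            for qname in item.attrib["role"].split():
                prefix, _, local = qname.rpartition(":")
                if not prefix:
                    # Note: NOT the default element namespace.
                    roles.append((DEFAULT_ROLE_NS, local))
                elif prefix in prefixes:
                    roles.append((prefixes[prefix], local))
                else:
                    raise ValueError(f"undeclared prefix in role value: {qname}")
    return roles


sample = ('<ul xmlns:wai="urn:example:wai" role="navigation wai:sitemap">'
          '<li>Downloads</li></ul>')
print(resolve_roles(sample))
```

Notice that the same string "sitemap" means something different depending on a declaration that may be far away from the attribute, and a plain string comparison on the attribute value gets it wrong.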

Saturday, July 29, 2006 (Permalink)

The W3C XHTML working group has published the eighth public working draft of XHTML 2.0. XHTML 2.0 is the next, backwards incompatible version of HTML that incorporates XFrames, XForms, and lots of other crunchy XML goodness. This draft adds support for xml:id, but still retains the old non-namespaced id attribute. XLink is not yet included and may never be. (The HTML Working Group are extreme XLink skeptics.) Whether browser vendors will ever agree to implement this is an open question.


The W3C XForms working group has posted the third public working draft of XForms 1.1. Changes since 1.0 include:

  • A new namespace URI, http://www.w3.org/2004/xforms/
  • power, luhn, current, choose, id and property XPath extension functions
  • An e-mail address datatype
  • An ID card number datatype
  • A prompt action element
  • An xforms-close event
  • An xforms-submit-serialize event
  • Inline rendering of non-text media types

The major addition I noted in this draft is support for HTTP DELETE. This is critical for the Atom Publishing Protocol (APP) among other RESTful protocols.
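
For context, in APP a client removes an entry by sending DELETE to that entry's member URI, which is exactly the kind of request HTML forms can't make. Here's what such a request looks like from Python's standard library. The URI is hypothetical, and this is plain HTTP client code, not XForms markup.

```python
import urllib.request


def build_delete(member_uri):
    """Build, without sending, a DELETE request for an Atom member resource."""
    return urllib.request.Request(member_uri, method="DELETE")


request = build_delete("http://example.com/blog/entries/42")  # hypothetical URI
print(request.get_method())  # DELETE
# urllib.request.urlopen(request) would actually send it. HTTP allows
# 200, 202, or 204 as the response to a successful DELETE.
```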

Friday, July 28, 2006 (Permalink)

Norm Walsh has published the seventh beta of DocBook 5.0. DocBook 5 is "a significant redesign that attempts to remain true to the spirit of DocBook." The schema is written in RELAX NG. A DTD and W3C XML Schema generated from the RELAX NG schema are also available. There's also a Schematron schema "that validates some extra-grammatical DocBook constraints. These patterns are also present directly in the RELAX NG Grammar and some validators, for example MSV, can perform both kinds of validation at the same time." Changes in this beta are quite minor and include a startingnumber attribute for orderedlist and making msgaud, msgorig, and msglevel optional on simplemsgentry.

Thursday, July 27, 2006 (Permalink)

The Mozilla Project has released Firefox 1.5.0.5, Thunderbird 1.5.0.5, and SeaMonkey 1.0.3. These are bug fix releases and include fixes for several security problems. All users should upgrade.


Polarion Software has released Subversive 1.0, a pure Java, open source Eclipse plug-in that provides Subversion integration. It's based on JavaSVN.

Wednesday, July 26, 2006 (Permalink)

IBM's developerWorks has published my latest article, The Java XPath API: Querying XML from Java programs. This article introduces the XPath functionality that's built into Java 5 and later. It's also available as a standard extension for Java 1.3 and later.
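
The article covers Java's javax.xml.xpath package. Purely for comparison, here's roughly the same kind of query in Python's standard library. Keep in mind that ElementTree's find and findall understand only a subset of XPath 1.0 (child and descendant steps plus a few simple predicates); for full XPath in Python you'd need a third-party library such as lxml. The inventory document is a made-up sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
INVENTORY = """<inventory>
  <book format="hardcover"><title>Snow Crash</title><author>Neal Stephenson</author></book>
  <book format="paperback"><title>Burning Tower</title><author>Larry Niven</author></book>
  <book format="paperback"><title>Zodiac</title><author>Neal Stephenson</author></book>
</inventory>"""

root = ET.fromstring(INVENTORY)

# Roughly the XPath expression //book[author='Neal Stephenson']/title
titles = [t.text for t in root.findall(".//book[author='Neal Stephenson']/title")]

# Roughly //book[@format='paperback']/title
paperbacks = [t.text for t in root.findall(".//book[@format='paperback']/title")]

# XPath functions such as count() or contains() are not supported here.
print(titles)      # ['Snow Crash', 'Zodiac']
print(paperbacks)  # ['Burning Tower', 'Zodiac']
```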

Tuesday, July 25, 2006 (Permalink)

The W3C Scalable Vector Graphics Working Group has posted a fourth last call working draft of Scalable Vector Graphics (SVG) Tiny 1.2. SVG Tiny is a "modularized language for describing two-dimensional vector and mixed vector/raster graphics in XML. SVG Tiny 1.2 is the baseline profile of SVG, implementable on a range of devices from cellphones and PDAs to desktop and laptop computers, and is the core of SVG 1.2. Other SVG 1.2 specifications will extend this functionality to form supersets (for example, SVG 1.2 Full)."

Monday, July 24, 2006 (Permalink)

FormFaces 1.0, a pure JavaScript XForms processor, has been released.

FormFaces™ is a pure JavaScript solution that utilizes AJAX techniques and can be seamlessly integrated with AJAX applications. This means that XForms+HTML can be sent directly to the browser where JavaScript transcodes the XForms controls to HTML form controls and processes the binding directly within the browser - requiring ZERO server-side processing and ZERO plug-ins. This is extremely simple to use, just insert the following tag into your XForms+HTML document:

<script type="text/javascript" src="formfaces.js"></script>

The FormFaces™ JavaScript is compatible with browsers that implement XHTML 1.0, ECMA-262 3rd Edition, and DOM Level 2 which includes Internet Explorer, Netscape, Mozilla, FireFox, Opera, Konquerer, Safari, and NetFront. To this end, the new FormFaces™ framework enables:

  • Cross-browser support - existing client-side browser can be used.
  • Server-side technology agnostic - the same forms can be used across disparate frameworks such as Java and .Net
  • Offline mode - user interaction does not require server round-trips

This looks very interesting. I really need to explore this further and see if it actually works. This has the potential to be a real game changer. (If only XForms supported PUT and DELETE...) FormFaces is published under both free (GPL) and non-free licenses.

Sunday, July 23, 2006 (Permalink)

The W3C XML Query and XSL Working Groups have updated the Candidate Recommendation XQuery 1.0 and XPath 2.0 Data Model (XDM). The changes since the last draft don't feel too huge.


The W3C XQuery working group has also posted the third working draft of XQuery Update Facility. XQuery as it currently exists is basically just SELECT in SQL terms. This is INSERT, UPDATE, and DELETE. More specifically it is:

  • upd:mergeUpdates
  • upd:revalidate
  • upd:applyUpdates
  • Update Primitives
  • upd:insertBefore
  • upd:insertAfter
  • upd:insertInto
  • upd:insertIntoAsLast
  • upd:insertAttributes
  • upd:delete
  • upd:replaceValue
  • upd:rename

Changes in this draft appear to be primarily editorial.
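
The upd:mergeUpdates and upd:applyUpdates routines hint at the interesting part: updating expressions don't change the document immediately. They build a pending update list that's applied all at once, so the query keeps seeing the original tree until the end. Here's a loose Python analogy on top of ElementTree. It models only three primitives, and the apply order (inserts, then renames, then deletes) is an assumption of this sketch rather than the draft's exact algorithm.

```python
import xml.etree.ElementTree as ET


class PendingUpdateList:
    """Collect update primitives and apply them together (snapshot semantics)."""

    def __init__(self):
        self.primitives = []

    def insert_into_as_last(self, target, new_element):
        self.primitives.append(("insert", target, new_element))

    def rename(self, target, new_name):
        self.primitives.append(("rename", target, new_name))

    def delete(self, parent, target):
        # ElementTree nodes don't know their parent, so the caller supplies it.
        self.primitives.append(("delete", parent, target))

    def merge(self, other):
        """Rough counterpart of upd:mergeUpdates."""
        self.primitives.extend(other.primitives)

    def apply(self):
        """Rough counterpart of upd:applyUpdates; deletes always run last."""
        order = {"insert": 0, "rename": 1, "delete": 2}
        for kind, a, b in sorted(self.primitives, key=lambda p: order[p[0]]):
            if kind == "insert":
                a.append(b)
            elif kind == "rename":
                a.tag = b
            elif kind == "delete" and b in list(a):
                a.remove(b)
        self.primitives.clear()


doc = ET.fromstring("<list><item>a</item><item>b</item></list>")
items = doc.findall("item")  # the "query" sees the tree before any update
pul = PendingUpdateList()
for item in items:
    pul.rename(item, "entry")
pul.delete(doc, items[0])
pul.insert_into_as_last(doc, ET.fromstring("<entry>c</entry>"))
pul.apply()
print(ET.tostring(doc, encoding="unicode"))
```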

Friday, July 21, 2006 (Permalink)

IBM's developerWorks has published my latest article, Debug stylesheets with xsl:message; Echo printing in XSLT. If you're still using System.out.println() to debug Java code (and honestly, who isn't?), you'll love this one. :-)


The Omni Group has posted the first public beta of OmniWeb 5.5, a $29.95 payware web browser for Mac OS X. This release is now a universal binary, and warns users when reloading POST pages. Most importantly, although the Omni Group hasn't advertised this feature, this beta seems to be able to render XML pages styled with XSLT for the first time.

Thursday, July 20, 2006 (Permalink)

Todd Ditchendorf has released XML Nanny 2.0, a free-as-in-beer Mac OS X program that checks XHTML and XML documents for well-formedness and validity. Version 2.0 is now a universal binary and adds support for RELAX NG and Schematron. Mac OS X 10.4 or later is required.

Tuesday, July 18, 2006 (Permalink)

I've posted the first beta of XOM 1.2, my free-as-in-speech (LGPL) library for processing XML with Java. Compared to the 1.0-->1.1 transition, this is a very minor upgrade. There are just a couple of additional methods, a few bug fixes, and maybe a small optimization or two. All code written to the 1.1 or 1.0 APIs should run unchanged with 1.2. Possibly the biggest change is to the build process. Jaxen is now bundled rather than being loaded directly from CVS. If you download the source build, please let me know how the build goes for you. Thanks.

Update: I accidentally omitted one key file from the source distribution. I've uploaded fixed archives.

Monday, July 17, 2006 (Permalink)

ALT Mobile has released <alt> XML Studio v6, a $1000 payware something. I think it's some sort of combined XML browser/editor/code generator. Possibly there's some more info on their web site, but the site is truly hideous, and I just couldn't take it anymore. I have no idea whether the product itself is any good or not. Perhaps it could be if the software developers had nothing to do with the web site and vice versa.


Yves Zoundi has posted the first release candidate of XPontus 1.0, an open source XML editor written in Java. XPontus is published under the GPL.

Sunday, July 16, 2006 (Permalink)

The KDE Project has released KOffice 1.5.2, an open source office suite (word processor, spreadsheet, presentation program, etc.) for Linux. KOffice now saves files in the XML-based OASIS OpenDocument file format by default. This is basically a bug fix release.

Thursday, July 13, 2006 (Permalink)

Code Synthesis has released xsd 2.2.0, a free-as-in-speech (GPL) W3C XML Schema language based data binding tool for C++.

Given an XML instance description (XML Schema), it generates C++ classes that represent the given vocabulary as well as parsing and serialization code (collectively called a mapping or binding).

Compared to APIs such as DOM and SAX, the generated code allows you to access the information in XML instance documents using your domain vocabulary instead of generic elements, attributes, and text. Static typing helps catch errors at compile-time rather than at run-time. Automatic code generation frees you for more interesting tasks while minimizing the effort needed to adopt your applications to changes in the document structure.

xsd supports two C++ mappings: in-memory C++/Tree and event-driven C++/Parser. The C++/Tree mapping consists of C++ classes that represent data types defined in XML Schema, a set of parsing functions that convert XML instance documents to a tree-like in-memory data structure, and a set of serialization functions that convert the in-memory representation back to XML....

The C++/Parser mapping provides parser templates for data types defined in XML Schema. Using these parser templates you can build your own in-memory representations or perform immediate processing of XML instance documents.

2.2.0 adds streaming XML serialization and binary serialization.
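
If you haven't used a data binding tool, the shape of the generated code is the main idea. xsd emits C++, but to keep one language here's a hand-written Python stand-in for what a generated C++/Tree-style class does for a small hypothetical schema: parse an instance document into a typed object, then serialize it back. This is an illustration of the concept, not xsd's output or API.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# Hypothetical schema fragment this class stands in for:
# <xs:element name="person"><xs:complexType><xs:sequence>
#   <xs:element name="name" type="xs:string"/>
#   <xs:element name="age" type="xs:int"/>
# </xs:sequence></xs:complexType></xs:element>


@dataclass
class Person:
    name: str
    age: int

    @classmethod
    def from_xml(cls, text):
        """Parsing code: instance document -> typed object."""
        root = ET.fromstring(text)
        if root.tag != "person":
            raise ValueError(f"expected <person>, got <{root.tag}>")
        return cls(name=root.findtext("name"), age=int(root.findtext("age")))

    def to_xml(self):
        """Serialization code: typed object -> instance document."""
        root = ET.Element("person")
        ET.SubElement(root, "name").text = self.name
        ET.SubElement(root, "age").text = str(self.age)
        return ET.tostring(root, encoding="unicode")


ada = Person.from_xml("<person><name>Ada</name><age>36</age></person>")
print(ada.age + 1)  # 37: age is an int, not a string of text
```

Application code then works with person.age rather than walking generic elements and text nodes, which is the point of the quoted description.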

Wednesday, July 12, 2006 (Permalink)

The Mozilla Project has posted the first beta of Firefox 2.0. New features in 2.0 include:

  • Anti-Phishing Protection.
  • Search suggestions now appear with search history in the search box for Google, Yahoo! and Answers.com
  • Scrollable tabs
  • Ability to re-open accidentally closed tabs
  • Better support for previewing and subscribing to web feeds
  • Inline spell checking in forms (ironically the first word it flagged for me as misspelled was "Firefox")
  • Search plugin manager
  • Microsummaries feature for bookmarks
  • Automatic restoration of your browsing session if there is a crash
  • New combined and improved Add-Ons manager for extensions and themes
  • New Windows installer
  • JavaScript 1.7
  • Client-side session and persistent storage (a really hideous idea, sure to be misused)
  • svg:textPath

You may want to be careful and wait for the next beta. The very first time I tried to run this it crashed.

Tuesday, July 11, 2006 (Permalink)

Oleg Paraschenko has released TeXML 2.0, an XML vocabulary for TeX. The processor that transforms TeXML markup into TeX markup is written in Python, and thus should run on most modern platforms. According to Paraschenko, "The main new feature is an automatic laying out of the generated LaTeX code. In fully automatic mode, the TeXML processor deletes redundant spaces and splits long lines on smaller chunks. The generated LaTeX code is legible enough for humans to read and modify." TeXML is now published under the MIT/X Consortium license.

Monday, July 10, 2006 (Permalink)

The W3C Web API Working Group has posted the first public working draft of The XMLHttpRequest Object.

The XMLHttpRequest object has been implemented for many years as ActiveX control in the Windows Internet Explorer browser and has later been adopted by other popular web browsers. Unfortunately the current implementations are not completely interoperable. Based on those early implementations this specification defines how a common subset of XMLHttpRequest should work and this will probably result in changes in said implementations leading to more interoperable and useful implementations of the XMLHttpRequest object.

Future versions of this specification (as opposed to future drafts of this version) may add new features, after careful examination from browser developers and Web content developers.

Sunday, July 9, 2006 (Permalink)

The W3C XHTML working group has posted the last call working draft of XHTML Basic 1.1.

The XHTML Basic document type includes the minimal set of modules required to be an XHTML host language document type, and in addition it includes images, forms, basic tables, and object support. It is designed for Web clients that do not support the full set of XHTML features; for example, Web clients such as mobile phones, PDAs, pagers, and settop boxes. The document type is rich enough for content authoring.

XHTML Basic is designed as a common base that may be extended. The goal of XHTML Basic is to serve as a common language supported by various kinds of user agents.

This revision, 1.1, supercedes version 1.0 as defined in http://www.w3.org/TR/2000/REC-xhtml-basic-20001219. In this revision, six new features have been incorporated into the language in order to better serve the small-device community that is this language's major user:

  1. XHTML Forms (defined in [XHTMLMOD])
  2. Intrinsic Events (defined in [XHTMLMOD])
  3. The value attribute for the li element (defined in [XHTMLMOD])
  4. The target attribute (defined in [XHTMLMOD])
  5. The style element (defined in [XHTMLMOD])
  6. The inputmode attribute (defined in Section 5 of this document)

The document type definition is implemented using XHTML modules as defined in "XHTML Modularization"

Comments are due by August 4.


The W3C XHTML working group has also posted the last call working draft of XHTML Modularization 1.1, on which XHTML Basic is based. "This document is version 1.1 of XHTML Modularization, an abstract modularization of XHTML and implementations of the abstraction using XML Document Type Definitions (DTDs), and XML Schemas. This modularization provides a means for subsetting and extending XHTML, a feature needed for extending XHTML's reach onto emerging platforms. This specification is intended for use by language designers as they construct new XHTML Family Markup Languages. This specification does not define the semantics of elements and attributes, only how those elements and attributes are assembled into modules, and from those modules into markup languages. This second version of this specification includes several minor updates to provide clarifications and address errors found in the first version. It also provides an implementation using XML Schemas...This document is the merger of the Modularization of XHTML in XML Schema last call draft of 3 October 2003 and the Modularization of XHTML W3C Recommendation of 10 April 2001. The materials from the former are incorporated as appendices into this document (as indicated during that document's last call period), and some clarifications were applied to material from the latter. No major changes in methodology or functionality are included in this version." Comments are due by August 4.

Friday, July 7, 2006 (Permalink)

Advanced Software Production Line has posted LibAxl 0.24, an open source (LGPL) XML parser for Linux written in ANSI C. It uses its own custom API rather than one of the standards. At first glance, it does not appear to have namespace support. I'm not sure who wrote this library, which makes me nervous. It's easier to get these sorts of libraries wrong than right. However, I don't find anything in the API I can definitively point to and say, "That's wrong." This is actually quite unusual, and a good sign.


Andrew Welch has released Kernow 1.4, a cross-platform, open source graphical front end for Saxon written in Java. According to Welch, "Everything you would normally have to type into the command line is available through the mouse, with some extra features thrown in. If you have Schema Aware Saxon it will run that too." New features in this release include optional caching URI and entity resolvers, directory validation, schema aware XQueries, and a choice of default, lax or strict validation for schema aware transforms.

Thursday, July 6, 2006 (Permalink)

Planamesa Software has posted the fourth alpha release of NeoOffice/J 2.0, a Mac port of OpenOffice 2.0.2 using a Java-based GUI. Mac OS X 10.3 or later is required. This release is now compatible with Intel Macs. NeoOffice is published exclusively under the GPL.

Wednesday, July 5, 2006 (Permalink)

The W3C Web Services Choreography Working Group has posted the first working draft of Web Services Choreography Description Language: Primer. "This primer is intended to give an overview of WS-CDL and can be read by WS-CDL users (e.g. a software professional wishing to write choreography descriptions) and WS-CDL implementers (e.g software professionals wishing to create WS-CDL compliant tools) alike."

Tuesday, July 4, 2006 (Permalink)

The W3C Web Services Activity has posted the first public working draft of Semantic Annotations for WSDL. According to the draft,

Semantic Annotations in WSDL Version 1.0 (SAWSDL) defines how to add semantic annotations to WSDL 2.0 components. The specification defines extension attributes that can be applied to both WSDL elements and XML Schema elements to annotate input and output messages defined in a WSDL 2.0 interface.

Semantic annotations are references from an element within a WSDL or XML Schema document to a concept in an ontology. This specification defines annotation mechanisms for relating WSDL inputs and outputs to concepts defined in an outside ontology. Similarly, it defines how to annotate WSDL operations and how to categorize WSDL interfaces. Further, it defines an annotation mechanism for specifying the structural mapping of XML Schema types to and from an ontology. The annotation mechanism is independent of the ontology expression language and this specification requires no particular ontology language.
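
In practice an annotation is just an extension attribute whose value points at ontology concepts. Here's a small Python sketch that pulls such annotations out of a schema. It assumes the attribute is sawsdl:modelReference holding a whitespace-separated list of URIs, and it uses the namespace URI from the later SAWSDL Recommendation; check the draft you're targeting, since working drafts may differ. The schema and ontology URI are made up.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the sawsdl: prefix (may differ between drafts).
SAWSDL_NS = "http://www.w3.org/ns/sawsdl"
MODEL_REFERENCE = f"{{{SAWSDL_NS}}}modelReference"

# Hypothetical annotated schema.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:sawsdl="http://www.w3.org/ns/sawsdl">
  <xs:element name="OrderRequest"
      sawsdl:modelReference="http://example.com/ontology#PurchaseOrder"/>
  <xs:element name="Note" type="xs:string"/>
</xs:schema>"""


def model_references(document):
    """Map each annotated element's name attribute to its ontology concept URIs."""
    root = ET.fromstring(document)
    annotations = {}
    for element in root.iter():
        value = element.get(MODEL_REFERENCE)
        if value:
            # The attribute value is a list of URIs separated by whitespace.
            annotations[element.get("name")] = value.split()
    return annotations


print(model_references(SCHEMA))
```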

Monday, July 3, 2006 (Permalink)

The W3C Web Application Formats Working Group has published the first official working draft of XML Binding Language (XBL) 2.0.

This specification describes the ability to map elements to script, event handlers, CSS, and more complex content models. This can be used to re-order and wrap content so that, for instance, simple HTML or XHTML markup can have complex CSS styles applied without requiring that the markup be polluted with multiple semantically neutral div elements.

It can also be used to implement new DOM interfaces, and, in conjunction with other specifications, enables arbitrary tag sets to be implemented as widgets. For example, XBL could in theory be used to implement XForms.

This version is a non-backwards-compatible "revision of Mozilla's XBL 1.0 language, originally developed at Netscape in 2000, and originally implemented in the Gecko rendering engine" developed by Mozilla, Opera, Google, and Apple. (Hmm, who's missing from that list?) It's supposed to be less Mozilla-focused and more browser independent. This is not the same as the W3C's sXBL effort, and it's not immediately clear whether work on that will continue in parallel, or if this will replace it in the W3C standards track. Either way this looks very interesting, and I hope the W3C can navigate the rocky shores of browser compatibility to get something usefully implemented.

Sunday, July 2, 2006 (Permalink)

The W3C has published the candidate recommendation of Mobile Web Best Practices 1.0 Basic Guidelines. Here's the summary of the guidelines:

  1. [THEMATIC_CONSISTENCY] Ensure that content provided by accessing a URI yields a thematically coherent experience when accessed from different devices.

  2. [CAPABILITIES] Exploit device capabilities to provide an enhanced user experience.

  3. [DEFICIENCIES] Take reasonable steps to work around deficient implementations.

  4. [TESTING] Carry out testing on actual devices as well as emulators.

  5. [URIS] Keep the URIs of site entry points short.

  6. [NAVBAR] Provide only minimal navigation at the top of the page.

  7. [BALANCE] Take into account the trade-off between having too many links on a page and asking the user to follow too many links to reach what they are looking for.

  8. [NAVIGATION] Provide consistent navigation mechanisms.

  9. [ACCESS_KEYS] Assign access keys to links in navigational menus and frequently accessed functionality.

  10. [LINK_TARGET_ID] Clearly identify the target of each link.

  11. [LINK_TARGET_FORMAT] Note the target file's format unless you know the device supports it.

  12. [IMAGE_MAPS] Do not use image maps unless you know the device supports them effectively.

  13. [POP_UPS] Do not cause pop-ups or other windows to appear and do not change the current window without informing the user.

  14. [AUTO_REFRESH] Do not create periodically auto-refreshing pages, unless you have informed the user and provided a means of stopping it.

  15. [REDIRECTION] Do not use markup to redirect pages automatically. Instead, configure the server to perform redirects by means of HTTP 3xx codes.

  16. [EXTERNAL_RESOURCES] Keep the number of externally linked resources to a minimum.

  17. [SUITABLE] Ensure that content is suitable for use in a mobile context.

  18. [CLARITY] Use clear and simple language.

  19. [LIMITED] Limit content to what the user has requested.

  20. [PAGE_SIZE_USABLE] Divide pages into usable but limited size portions.

  21. [PAGE_SIZE_LIMIT] Ensure that the overall size of page is appropriate to the memory limitations of the device.

  22. [SCROLLING] Limit scrolling to one direction, unless secondary scrolling cannot be avoided.

  23. [CENTRAL_MEANING] Ensure that material that is central to the meaning of the page precedes material that is not.

  24. [GRAPHICS_FOR_SPACING] Do not use graphics for spacing.

  25. [LARGE_GRAPHICS] Do not use images that cannot be rendered by the device. Avoid large or high resolution images except where critical information would otherwise be lost.

  26. [USE_OF_COLOR] Ensure that information conveyed with color is also available without color.

  27. [COLOR_CONTRAST] Ensure that foreground and background color combinations provide sufficient contrast.

  28. [BACKGROUND_IMAGE_READABILITY] When using background images make sure that content remains readable on the device.

  29. [PAGE_TITLE] Provide a short but descriptive page title.

  30. [NO_FRAMES] Do not use frames.

  31. [STRUCTURE] Use features of the markup language to indicate logical document structure.

  32. [TABLES_SUPPORT] Do not use tables unless the device is known to support them.

  33. [TABLES_NESTED] Do not use nested tables.

  34. [TABLES_LAYOUT] Do not use tables for layout.

  35. [TABLES_ALTERNATIVES] Where possible, use an alternative to tabular presentation.

  36. [NON-TEXT_ALTERNATIVES] Provide a text equivalent for every non-text element.

  37. [OBJECTS_OR_SCRIPT] Do not rely on embedded objects or script.

  38. [IMAGES_SPECIFY_SIZE] Specify the size of images in markup, if they have an intrinsic size.

  39. [IMAGES_RESIZING] Resize images at the server, if they have an intrinsic size.

  40. [VALID_MARKUP] Create documents that validate to published formal grammars.

  41. [MEASURES] Do not use pixel measures and do not use absolute units in markup language attribute values and style sheet property values.

  42. [STYLE_SHEETS_USE] Use style sheets to control layout and presentation, unless the device is known not to support them.

  43. [STYLE_SHEETS_SUPPORT] Organize documents so that if necessary they may be read without style sheets.

  44. [STYLE_SHEETS_SIZE] Keep style sheets small.

  45. [MINIMIZE] Use terse, efficient markup.

  46. [CONTENT_FORMAT_SUPPORT] Send content in a format that is known to be supported by the device.

  47. [CONTENT_FORMAT_PREFERRED] Where possible, send content in a preferred format.

  48. [CHARACTER_ENCODING_SUPPORT] Ensure that content is encoded using a character encoding that is known to be supported by the device.

  49. [CHARACTER_ENCODING_USE] Indicate in the response the character encoding being used.

  50. [ERROR_MESSAGES] Provide informative error messages and a means of navigating away from an error message back to useful information.

  51. [COOKIES] Do not rely on cookies being available.

  52. [CACHING] Provide caching information in HTTP responses.

  53. [FONTS] Do not rely on support of font related styling.

  54. [MINIMIZE_KEYSTROKES] Keep the number of keystrokes to a minimum.

  55. [AVOID_FREE_TEXT] Avoid free text entry where possible.

  56. [PROVIDE_DEFAULTS] Provide pre-selected default values where possible.

  57. [DEFAULT_INPUT_MODE] Specify a default text entry mode, language and/or input format, if the device is known to support it.

  58. [TAB_ORDER] Create a logical order through links, form controls and objects.

  59. [CONTROL_LABELLING] Label all form controls appropriately and explicitly associate labels with form controls.

  60. [CONTROL_POSITION] Position labels so they lay out properly in relation to the form controls they refer to.
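
A few of these guidelines ([REDIRECTION], [CACHING], [CHARACTER_ENCODING_USE]) come down to what the server puts in the HTTP response, not what's in the markup. Here's a minimal Python WSGI sketch of that. The paths, host name, and one-hour max-age are hypothetical choices, not values the guidelines prescribe.

```python
def mobile_app(environ, start_response):
    """Tiny WSGI app illustrating three of the Mobile Web Best Practices."""
    path = environ.get("PATH_INFO", "/")
    if path == "/old-home":
        # [REDIRECTION]: redirect with an HTTP 3xx status, not a meta refresh.
        start_response("301 Moved Permanently",
                       [("Location", "http://m.example.com/")])
        return [b""]

    body = ('<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Home</title>'
            '</head><body><p>Café</p></body></html>').encode("utf-8")
    start_response("200 OK", [
        # [CHARACTER_ENCODING_USE]: state the encoding in the response itself.
        ("Content-Type", "application/xhtml+xml; charset=utf-8"),
        # [CACHING]: tell the device and proxies how long they may reuse this.
        ("Cache-Control", "max-age=3600"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it against a real device or emulator, the standard library's wsgiref.simple_server.make_server("", 8000, mobile_app).serve_forever() is enough.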

Saturday, July 1, 2006 (Permalink)

John Cowan has released TagSoup 1.0.1, an open source, Java-language, SAX parser for nasty, ugly HTML. According to Cowan, "Previous versions of TagSoup always ignored whitespace in elements that don't have PCDATA as a possible child. Now, if you turn on the ignorableWhitespaceFeature (or use the --ignorable option), that whitespace will be returned to your application through the previously unused ContentHandler.ignorableWhitespace callback. This isn't done by default for backwards compatibility, and also because HTML is an SGML application and SGML parsers routinely dropped such whitespace." This release also fixes a couple of bugs where TagSoup could report malformed comments and public identifiers.

Friday, June 30, 2006 (Permalink)

The OpenOffice Project has released OpenOffice 2.0.3, an open source office suite for Linux and Windows that saves all its files as zipped XML. It also runs on the Mac with X-Windows. This is a bug fix release that includes several security fixes. All users should upgrade. OpenOffice is dual licensed under the LGPL and Sun Industry Standards Source License.


Microsoft has posted the third beta of Internet Explorer 7 (Windows only). Feed reading is supposed to be feature complete in this release. Other new features in IE7 include tabbed browsing and print-to-fit. I hope it still supports XML, XSLT, and CSS, though Microsoft doesn't seem to say that anywhere I can find, and I don't feel like booting up Windows just to test that.

Thursday, June 29, 2006 (Permalink)

The Modis Team has released Sedna 1.0, an open source native XML database for Windows and Linux written in C++ and Scheme and published under the Apache License 2.0. Sedna supports XQuery and its own declarative update language.

Wednesday, June 28, 2006 (Permalink)

Opera Software has released version 9.0 of their namesake free-beer web browser for Windows, Mac, Linux, FreeBSD, and Solaris. Opera supports XML, CSS, and XSLT. New features in Opera 9 include a content blocker, BitTorrent support, and per-site preferences.

Tuesday, June 27, 2006 (Permalink)

Sometime in the last 24 hours some virus or spammer started using the cafeconleche.org domain as the from address in its messages. Consequently I began getting a plethora of bounces. I don't actually use any e-mail addresses in the cafeconleche.org domain so I just cancelled the MX record that forwards everything to my real e-mail address. Feel free to drop e-mails from cafeconleche.org in the bit bucket if that helps you out.

One side note: in 2006 why are anti-spam programs still sending out non-delivery messages? About half the crap I get is from spam filters that feel obliged to tell me that they filtered my spam. Any vendor who doesn't recognize by now that spam and viruses use forged e-mail addresses is simply incompetent and deserves to be black-holed into oblivion.


The W3C WebCGM Working Group has posted the first public working draft of WebCGM 2.0, an updated version of the ISO Computer Graphics Metafile standard (ISO/IEC 8632:1999). "WebCGM 2.0 adds a DOM (API) specification for programmatic access to WebCGM objects, and a specification of an XML Companion File (XCF) architecture, for externalization of non-graphical metadata. WebCGM 2.0, in addition, builds upon and extends the graphical and intelligent content of WebCGM 1.0, delivering functionality that was forecast for WebCGM 1.0, but was postponed in order to get the standard and its implementations to users expeditiously." Personally I thought this had been superseded by SVG, and was now a purely legacy format.

Monday, June 26, 2006 (Permalink)

DataDirect Technologies has released DataDirect XQuery 2.0, a closed source Java library for integrating XQuery functionality into your application. As well as supporting XML documents, it can query relational databases, EDI, and CSV data. DataDirect XQuery 2.0 implements the XQuery API for Java. Pricing is not available.

Monday, June 19, 2006 (Permalink)

The W3C has published four proposed edited recommendations for the core XML specs:

These are not supposed to be new versions of XML. They are supposed to be limited to accumulating various small errata that have been noticed in these specs since their previous publication. I have not yet had a chance to review the proposed changes to see if that's actually true. Comments are due by July 14.

Saturday, June 17, 2006 (Permalink)

John Cowan has released TagSoup 1.0, an open source, Java-language, SAX parser for nasty, ugly HTML. XOM uses TagSoup to convert JavaDoc to well-formed XHTML.

Friday, June 16, 2006 (Permalink)

Matt Mullenweg has released WordPress 2.0.3, a blog engine based on PHP and MySQL. 2.0.3 claims to fix some security holes, but I'm not convinced. What 2.0.3 does is implement a complex Rube Goldberg contraption to try and plug the holes caused by the insistence on using GETs for operations with side-effects.
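The conventional fix is simpler: never let a GET change state, and require POST for anything that does. Here's a minimal sketch of that rule as a request dispatcher. The action names and the dispatch function are hypothetical illustrations, not WordPress's actual code:

```javascript
// Hypothetical names of blog actions that change server state.
const STATE_CHANGING_ACTIONS = new Set(["delete-post", "approve-comment", "update-options"]);

// GET and HEAD are "safe" methods (RFC 2616, section 9.1.1): following a
// link or prefetching a page must never trigger one of the actions above.
const SAFE_METHODS = new Set(["GET", "HEAD"]);

function dispatch(method, action) {
  const verb = method.toUpperCase();
  if (STATE_CHANGING_ACTIONS.has(action) && SAFE_METHODS.has(verb)) {
    // Refuse outright instead of trying to tell "legitimate" GETs apart.
    return { status: 405, headers: { Allow: "POST" } };
  }
  // Read-only actions via GET, and state changes via POST, proceed normally.
  return { status: 200, headers: {} };
}

console.log(dispatch("GET", "delete-post").status);  // 405
console.log(dispatch("POST", "delete-post").status); // 200
console.log(dispatch("GET", "view-post").status);    // 200
```

This doesn't by itself stop a hostile page from auto-submitting a POST form, but it does mean that crawlers, prefetchers, and plain links can no longer delete anything.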

Thursday, June 15, 2006 (Permalink)

RenderX has released version 4.6 of XEP, its payware XSL Formatting Objects to PDF and PostScript converter. XEP also supports part of Scalable Vector Graphics (SVG) 1.1. New features in 4.6 include an AFP backend, an improved line-breaking algorithm, and a new implementation of XSL 1.1 change bars. The basic client is $299.95. The developer edition with an API is $999.95. The server version is $3999.95.

Wednesday, June 14, 2006 (Permalink)

The W3C XML Query and XSL Working Groups have updated the Candidate Recommendations for XQuery, XSLT 2 and XPath 2. The major change in these drafts is that several data types have migrated from the xdt namespace (http://www.w3.org/2005/xpath-datatypes) to the W3C XML Schema xs namespace (http://www.w3.org/2001/XMLSchema). There are numerous small bug fixes and clarifications as well. Updated drafts include:


In related news, Michael Kay has released version 8.7.3 of Saxon, his XSLT 2.0 and XQuery processor for Java and .NET. According to Kay, "This maintenance release clears 35 documented bugs, and on .NET it adds the capability to process a DOM Document "in situ" rather than by first copying it to create a Saxon tree."

Saxon is published in two versions, both of which require Java 1.4 or later (or .NET). Saxon 8.7B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 8.7SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standing XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers."
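The namespace move in the June drafts means that queries and stylesheets naming these types with the xdt prefix need updating. Here's a crude textual sketch of that migration. It assumes the moved types are anyAtomicType, untypedAtomic, untyped, dayTimeDuration, and yearMonthDuration (my reading of which types moved; check the drafts for the authoritative list), and migrateXdtTypes is a hypothetical helper:

```javascript
// Types assumed to have moved from the xdt namespace to the xs namespace.
// The trailing \b keeps xdt:untyped from matching inside longer names.
const MOVED = /\bxdt:(anyAtomicType|untypedAtomic|untyped|dayTimeDuration|yearMonthDuration)\b/g;

// Rewrite only the moved types to the predeclared xs prefix. This is a plain
// textual rewrite: it doesn't parse XQuery or XSLT, so it also touches
// matches inside string literals and comments.
function migrateXdtTypes(query) {
  return query.replace(MOVED, "xs:$1");
}

console.log(migrateXdtTypes("$d instance of xdt:dayTimeDuration"));
// $d instance of xs:dayTimeDuration
console.log(migrateXdtTypes("data($e) treat as xdt:untypedAtomic*"));
// data($e) treat as xs:untypedAtomic*
```

Any leftover declaration of the xdt prefix is harmless once nothing refers to it.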

Tuesday, June 13, 2006 (Permalink)

Sonic Software has released Stylus Studio 2006 Release 3, a $895 payware XML editor for Windows. Features include:

  • XML differencing
  • XSLT debugging
  • XSLT mapping
  • XSLT profiling
  • XSL-FO
  • XQuery editing, mapping, and debugging.
  • XML Schema Editor
  • Document Type Definition (DTD) Editor
  • XPath Evaluator
  • XPath Expression Generator
  • Web Service Call Composer
  • UDDI Registry Browser
  • Tools for mapping to and from XML documents, Web service data, relational data, and flat files
  • Import/export utilities for RDBMS, XML, CSV, ADO, and flat files
  • JSP Editor

New features in this release include:

  • DataDirect XQuery 2.0 support
  • RenderX XEP Personal Edition XSL-FO processor bundled
  • A new XPath Query Editor
  • New Java APIs for accessing EDI, X12, EDIFACT and other legacy data formats

Monday, June 12, 2006 (Permalink)

Tomorrow, Tuesday June 13, I'll be joining the monthly meeting of the Amateur Computer Group of New Jersey (ACGNJ) JUG in Scotch Plains to talk about RSS, Atom, APP, and All That. The meeting is free and open to the public.

Saturday, June 10, 2006 (Permalink)

Lunasil Ltd has released Xinc 2.0.2, an XSL formatting objects to PDF converter. Java 1.4 or later is required. Pricing runs from $95 to $2500 plus support.

Thursday, June 8, 2006 (Permalink)

The W3C XHTML working group has posted the first public working draft of XHTML Basic 1.1.

The XHTML Basic document type includes the minimal set of modules required to be an XHTML host language document type, and in addition it includes images, forms, basic tables, and object support. It is designed for Web clients that do not support the full set of XHTML features; for example, Web clients such as mobile phones, PDAs, pagers, and settop boxes. The document type is rich enough for content authoring.

XHTML Basic is designed as a common base that may be extended. The goal of XHTML Basic is to serve as a common language supported by various kinds of user agents.

This revision, 1.1, supercedes version 1.0 as defined in http://www.w3.org/TR/2000/REC-xhtml-basic-20001219. In this revision, four new features have been incorporated into the language in order to better serve the small-device community that is this language's major user:

  1. Intrinsic Events (defined in [XHTMLMOD])
  2. The target attribute (defined in [XHTMLMOD])
  3. The style element (defined in [XHTMLMOD])
  4. The inputmode attribute (defined in Section 5 of this document)

The document type definition is implemented using XHTML modules as defined in "XHTML Modularization" [XHTMLMOD].

Wednesday, June 7, 2006 (Permalink)

I've posted the notes from last night's RSS, Atom, APP, and All That presentation to the Philadelphia Java User's Group.


SyncroSoft has released version 7.2 of the <Oxygen/> XML editor. Oxygen supports XML, XSL, DTDs, XQuery, SVG, Relax NG, Schematron, and the W3C XML Schema Language. Version 7.2 adds a Subversion client, XML Schema flattening, rename refactoring, and support for the X-Hive/DB, MarkLogic, and TigerLogic XML databases. Oxygen costs $298 with support. Upgrades from 6.0 cost $130.

Tuesday, June 6, 2006 (Permalink)

The W3C Device Independence Working Group has posted the first public working draft of Device Independent Authoring Language (DIAL). "DIAL is a language profile based on existing W3C XML vocabularies and CSS modules. These provide standard mechanisms for representing Web page structure, presentation and form interaction. The DIAL also makes use of the DISelect metadata vocabulary [DISelect] for overcoming the authoring challenges [ACDI] inherent in authoring for multiple delivery contexts." In other words, it's a mix of an XHTML 2 subset, XForms, and some custom markup for publishing content and web apps to cell phones, PDAs, and other non-traditional devices.


John Cowan has posted the eighth release candidate of TagSoup, an open source, Java-language, SAX parser for nasty, ugly HTML. I use TagSoup to convert JavaDoc to well-formed XHTML. According to Cowan, this release "fixes a paper-bag bug that made it impossible to compile the jar-file from the released sources. I added a few bits of defensive programming as well, but there are no user-visible changes."

Monday, June 5, 2006 (Permalink)

Tomorrow, Tuesday June 6, I'll be joining the Philadelphia Java User's Group in Malvern, PA, at the Unisys East Coast Development Center, 2476 Swedesford Rd, Paoli, PA, to talk about RSS, Atom, APP, and All That. The meeting is free and open to the public, but you need to RSVP to Dave Fecak if you would like to attend.

Next week on Tuesday June 13, I'll be delivering essentially the same talk to the monthly meeting of the Amateur Computer Group of New Jersey (ACGNJ) JUG in Scotch Plains. Again the meeting is free and open to the public.


Norm Walsh has published the sixth beta of DocBook 5.0. DocBook 5 is "a significant redesign that attempts to remain true to the spirit of DocBook." The schema is written in RELAX NG. A DTD and W3C XML Schema generated from the RELAX NG schema are also available. There's also a Schematron schema "that validates some extra-grammatical DocBook constraints. These patterns are also present directly in the RELAX NG Grammar and some validators, for example MSV, can perform both kinds of validation at the same time." This beta allows MathML and SVG in imagedata and improves support for aspect-oriented programming source code in DocBook documents.


Norm Walsh has also posted the third candidate release of DocBook 4.5. Version 4.5 implements a minor bug-fix to citebiblioid and updates the reference documentation.

As you may recall, I wrote Processing XML with Java in DocBook 4. I've been playing with DocBook 5 lately for a couple of possible future book projects. While it's clearly an improvement over DocBook 4 in numerous ways—for instance it uses namespaces, embeds SVG and MathML, and has reasonable XInclude support—the tool chain isn't up to snuff yet. The stylesheets and various editors like Oxygen haven't adapted to life in a DocBook 5 world yet. I'll probably continue to use DocBook 5 because I'm a bleeding edge sort of guy, but most users should stick to DocBook 4 for the time being.


Dominik Brettnacher has posted Annotate 0.1.6, a free-as-in-speech (GPL) annotation facility for DocBook documents. According to Brettnacher, "Annotate enables visitors of an online version of a DocBook document to add comments to any paragraph or chapter of the document." This sounds immensely useful. The user interface could use a little polishing, but it seems functional.


The Helsinki University of Technology has posted the first alpha of X-Smiles 1.0, a proof-of-concept XForms engine written in Java. According to Mikko Pohja, "The main new features of this release are in the areas of XBL, custom controls, and XHTML+SVG compound documents by inclusion."

Sunday, June 4, 2006 (Permalink)

John Cowan has posted the seventh release candidate of TagSoup, an open source, Java-language, SAX parser for nasty, ugly HTML. I use TagSoup to convert JavaDoc to well-formed XHTML. According to Cowan,

Mike Bremford sent me a patch that causes TagSoup to send the system and public IDs to the LexicalHandler if there is a DOCTYPE declaration present in the input. Formerly, DOCTYPE declarations were simply ignored. This patch is too good to reject even at this stage, and with a few emendations it passed all my acceptance tests, so I've incorporated it.

In addition, the last known remaining bug was removed. In the last few releases, the script element was allowed to be a root element (a by-product of allowing it anywhere). Now it will be wrapped in an html element instead. This eliminates some random newlines that were being added at the end of such a root-level script element as well.

TagSoup is dual licensed under the Academic Free License and the GPL.

Saturday, June 3, 2006 (Permalink)

Antenna House, Inc has released XSL Formatter 4.0 for Mac, Linux, and Windows. This tool converts XSL-FO files to PDF. Version 4.0 now supports XSL 1.1, PDF 1.6, PDF/X and Tagged PDF. Other new features in this release include:

  • Bundled hyphenation for over 40 languages
  • Microsoft Excel Charts
  • axf:hyphenation-minimum-character-count, axf:printer-marks-line-length, axf:printer-marks-zero-margin, and axf:repeat-page-sequence-master extension elements
  • An onscreen ruler that can be turned on and off

The "lite" version costs $300 and up, but is limited to 300 pages per document and doesn't support right-to-left languages. Prices for the uncrippled version start around $1250. Support costs more.

Friday, June 2, 2006 (Permalink)

Dennis Sosnoski has released JiBX 1.1, yet another open source (BSD license) framework for binding XML data to Java objects using your own class structures. It falls into the custom-binding document camp as opposed to the schema driven binding frameworks like JaxMe and JAXB. 1.1 adds support for StAX input and output.


The Mozilla Project has released Firefox 1.5.0.4, Thunderbird 1.5.0.4, and SeaMonkey 1.0.2. These are bug fix releases and include fixes for several security problems. Also, Thunderbird is now a universal binary for Intel Macs. All users should upgrade.

Thursday, June 1, 2006 (Permalink)

The Mozilla Project has posted the third alpha of Firefox 2.0, code named "Bon Echo". New features in this release include:

I'm sorry, but that last one is just plain wrong. It's like cookies but worse. Apparently someone noticed that cookies don't actually do what they're supposed to do (Well, duh; but what do you expect when you completely violate one of the fundamental principles of the Web architecture?), but rather than turning around and figuring out how HTTP is supposed to work (Hint: the lack of sessions is a feature, not a bug.), these incompetents have gone even further in the wrong direction. I know that in 1995 a lot of people didn't really understand the Web. This certainly included me and the Netscape engineers who invented cookies. However, I thought we'd learned a few things in the last 10 years. At least I did. I'm sorry to see that browser vendors didn't. At least Microsoft didn't sign on for this one (though Apple, Google, and Opera did). Maybe this time IE will uncharacteristically serve as a bulwark of sanity against this particular stupidity.

Tuesday, May 30, 2006 (Permalink)

The W3C Web API working group has posted the first public working draft of the Selectors API. "It is often desirable to perform script and or DOM operations on a specific set of elements in a document. [Selectors], mostly used in CSS [CSS21] context, provides a way of matching such a set of elements. This specification introduces two methods which take a selector (technically a group of selectors) as argument and return the matched elements as result." The spec offers the following JavaScript example:

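// Map each namespace prefix used in the selector to its namespace URI.
function resolver(str) {
  var prefixes = {
    xh:  "http://www.w3.org/1999/xhtml",
    svg: "http://www.w3.org/2000/svg"
  };
  return prefixes[str];
}
// Draft method names: matchAll() returns every matching element; match() returns the first.
var a = document.matchAll("xh|div > svg|svg", resolver);
var b = document.match("div.foo.bar");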

I'm not sure why they've chosen the weak CSS syntax instead of the much more powerful and expressive XPath. Among other things, using XPath would allow colons to be used in qualified names instead of these weird vertical bars. Other things being equal, consistency of syntax should be preferred. Perhaps it needs to handle malformed HTML?
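For comparison, here's a minimal sketch of the XPath route using DOM Level 3 XPath's document.evaluate. The XPath expression //xh:div/svg:svg selects the same elements as the selector xh|div > svg|svg, with ordinary colons. The resolver is rewritten to return null for undeclared prefixes, which is what an XPath namespace resolver is expected to return (the draft's version returns undefined, and even returns a function for a prefix like "toString"). makeResolver is a hypothetical helper, and the evaluate call only runs in a browser:

```javascript
// Build a namespace resolver from a prefix table. Unknown prefixes (including
// inherited property names such as "toString") resolve to null.
function makeResolver(prefixes) {
  return function (prefix) {
    return Object.prototype.hasOwnProperty.call(prefixes, prefix)
      ? prefixes[prefix]
      : null;
  };
}

const resolver = makeResolver({
  xh: "http://www.w3.org/1999/xhtml",
  svg: "http://www.w3.org/2000/svg",
});

// Browser only: the XPath equivalent of matchAll("xh|div > svg|svg", resolver).
if (typeof document !== "undefined" && typeof document.evaluate === "function") {
  const result = document.evaluate("//xh:div/svg:svg", document, resolver,
      XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  for (let i = 0; i < result.snapshotLength; i++) {
    console.log(result.snapshotItem(i));
  }
}
```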

Monday, May 29, 2006 (Permalink)

John Cowan tells me I'm all wrong about the Authorizing Read Access to XML Content Using the <?access-control?> Processing Instruction 1.0:

Here's the scenario. You surf to a web page containing an applet (Java or Flash or whatever) downloaded from foo.com. Your browser will run that applet, but will only trust it to access content hosted on foo.com. This is good for security on your system -- a malicious applet can't grab confidential content from your behind-the-firewall servers and pipe it back to the applet's host -- but it makes the applet rather inflexible: any public data it needs to read has to be mirrored on its host.

Now comes this specification, and tells the browser that if the document contains <?access-control allow="foo.com"?>, then the browser should let the applet read that document just as if the document resided on foo.com. It has absolutely nothing to do with whether browsers at foo.com can themselves read the document or not -- anyone can read the document by running trusted code. The PI says that this document can be read even by untrusted code from specific sources.

It's also possible to use wildcards in the access-control PI to allow access by applets from anywhere, or from a set of hosts.

That's a little more sensible, but only a little. This is still a security problem. First of all, it allows the web page to say that it's OK to use applets (or JavaScript, or whatever) to launch a DoS attack on it. Second, the limitations on applets talking to third party hosts aren't just for the protection of the third party hosts. They exist to protect the client system too. I'm not sure a client should routinely trust an applet to talk to server X because server Y says it's OK.

Sunday, May 28, 2006 (Permalink)

Henri Sivonen has implemented an online Validation Service for RELAX NG that can validate against standard (DocBook, XHTML, etc.) and user supplied schemas. "The validation service checks whether a given document meets the constraints of the chosen schema(s). Both XML syntax and compact syntax RELAX NG schemas are supported. Also, there is 'experimental' support for standalone (not embedded) Schematron 1.5 schemas. Multiple schemas at a time are supported."

Saturday, May 27, 2006 (Permalink)

The W3C Voice Browser, Web APIs, and Web Application Formats (WAF) Working Groups have worked together to release a specification so colossally brain damaged it could not possibly have been designed by a single group. I am referring to Authorizing Read Access to XML Content Using the <?access-control?> Processing Instruction 1.0. In brief, "This note describes a mechanism being used in the industry that allows a content provider to use a processing instruction embedded within the XML prolog to specify the access policy of that content. In this model a user agent can safely extend the sandbox in which it has restricted the application to include access to the XML content if and only if the specified policy grants permission." For example, you would put the processing instruction <?access-control allow="www.sun.com" deny="www.microsoft.com"?> in a document prolog, and Sun can read it but Microsoft can't. In other words, the client is supposed to trust the document it receives because that document says to trust it? Or in reverse, the server is supposed to believe that the client will obey any restrictions placed in the document? I keep thinking they can't possibly mean what they say, but they really seem to. At best this is a very poorly written specification that doesn't explain what it's actually trying to do. At worst, it's the single most broken security design I've seen in years, and that's saying a lot.
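Whatever the intent, the decision is made entirely inside the user agent. Here's a minimal sketch of the check a browser would run before letting code loaded from some host read a document carrying this PI. The details are my assumptions, not the note's normative rules: host lists are space-separated, "*" and "*.domain" are the only wildcards, and a matching deny overrides an allow. isReadAllowed and its helpers are hypothetical names:

```javascript
// Parse pseudo-attributes such as allow="a b" deny='c' out of PI data.
function parsePseudoAttributes(data) {
  const attrs = {};
  const re = /([A-Za-z_][\w.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  let m;
  while ((m = re.exec(data)) !== null) {
    attrs[m[1]] = m[2] !== undefined ? m[2] : m[3];
  }
  return attrs;
}

// Assumed wildcard rules: "*" matches any host, "*.example.com" matches
// any subdomain of example.com; anything else must match exactly.
function hostMatches(pattern, host) {
  if (pattern === "*") return true;
  if (pattern.startsWith("*.")) return host.endsWith(pattern.slice(1));
  return pattern === host;
}

// Decide whether code loaded from requestingHost may read the document.
// Assumption: a matching deny entry overrides any allow entry.
function isReadAllowed(piData, requestingHost) {
  const attrs = parsePseudoAttributes(piData);
  const host = requestingHost.toLowerCase();
  const matchesAny = list => (list || "").toLowerCase().split(/\s+/)
      .filter(Boolean)
      .some(pattern => hostMatches(pattern, host));
  if (matchesAny(attrs.deny)) return false;
  return matchesAny(attrs.allow);
}

const pi = 'allow="www.sun.com" deny="www.microsoft.com"';
console.log(isReadAllowed(pi, "www.sun.com"));       // true
console.log(isReadAllowed(pi, "www.microsoft.com")); // false
console.log(isReadAllowed(pi, "www.example.org"));   // false
```

Note that the server plays no part in this check. Any client that simply skips it reads the document anyway, so the PI restricts nobody except cooperating browsers.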

Friday, May 26, 2006 (Permalink)

Michael Smith has released version 1.70.1 of the DocBook XSL stylesheets. This release adds a number of small new features including three new attribute sets: revhistory.title.properties, revhistory.table.properties, and revhistory.table.cell.properties.

Thursday, May 25, 2006 (Permalink)

The W3C Semantic Web Best Practices and Deployment Working Group and the HTML Working Group have joined forces to publish RDFa Primer 1.0.

Current web pages, written in HTML, are chock-full of structured data. When publishers can express the document's metadata, and when tools can read it, a new world of user functionality becomes available, letting users copy and paste structured data between applications and web sites. An event on a web page can be directly imported into a user's desktop calendar. A license on a document can be automatically detected so that the user is informed of his rights automatically. A photo's creator, camera setting information, resolution, and topic can be published to enable structured search and sharing.

RDFa is a syntax for expressing such metadata in XHTML. The rendered, hypertext data of XHTML is reused by the RDFa markup, so that publishers don't repeat themselves. The underlying abstract metadata representation is RDF, which lets publishers build their own metadata vocabulary, extend others, and evolve their vocabulary with maximal interoperability over time. The metadata is closely tied to the data it describes, so that rendered data can be copied and pasted along with its relevant structure.

Here's a syntax example from the draft:

<h1 property="dc:title">Vacation in the South of France</h1>
<h2>created 
  by <span property="dc:creator">Mark Birbeck</span>
  on <span property="dc:date" type="xsd:date"
           content="2006-01-02">
    January 2nd, 2006
     </span>
</h2>

The thing that jumps out at me is the use of namespace prefixes in attribute values. Haven't we learned by now that this is a bad idea?
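To make the objection concrete: a processor can only turn property="dc:title" into a URI by consulting the xmlns:dc declaration in scope, yet no element or attribute name uses the dc prefix, so generic XML tools see that declaration as unused. Exclusive XML Canonicalization, for one, keeps only namespace declarations that are visibly used in names. Here's a minimal sketch of the expansion step. It assumes the usual Dublin Core and XML Schema namespace URIs, since the excerpt doesn't show its declarations, and expandPrefixedValue is a hypothetical helper, not part of any RDFa processor:

```javascript
// Expand a prefixed attribute value such as "dc:title" using the namespace
// declarations in scope. Returns null if the prefix isn't declared.
function expandPrefixedValue(value, inScope) {
  const colon = value.indexOf(":");
  if (colon < 1) return null;
  const prefix = value.slice(0, colon);
  if (!Object.prototype.hasOwnProperty.call(inScope, prefix)) return null;
  return inScope[prefix] + value.slice(colon + 1);
}

// Assumed declarations; the excerpt from the draft doesn't show them.
const declared = {
  dc: "http://purl.org/dc/elements/1.1/",
  xsd: "http://www.w3.org/2001/XMLSchema#",
};

console.log(expandPrefixedValue("dc:creator", declared));
// http://purl.org/dc/elements/1.1/creator
console.log(expandPrefixedValue("xsd:date", declared));
// http://www.w3.org/2001/XMLSchema#date

// The same markup after a tool drops the "unused" xmlns:dc declaration:
console.log(expandPrefixedValue("dc:creator", {}));
// null
```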

Wednesday, May 24, 2006 (Permalink)

Opera Software has posted the second beta (after a couple of preview releases) of version 9.0 of their namesake free-beer web browser for Windows, Mac, and Linux. Opera supports XML, CSS, and XSLT. I tested it on the raw conferences page and XSLT does appear to finally be working in this release.

Tuesday, May 23, 2006 (Permalink)

The W3C Internationalization Tag Set Working Group has posted the first public working draft of Best Practices for XML Internationalization. "This document provides a set of guidelines for developing XML documents and schemas that are internationalized properly. Following the best practices describes here allow both the developer of XML applications, as well as the author of XML content to create material in different languages." Suggestions include:

This draft is very rough, and I'm not sure I agree with everything they say. "Avoid translatable attributes" sounds very questionable to me, though there is a rationale for it. Feedback is requested.

Sunday, May 21, 2006 (Permalink)

John Cowan has posted the sixth release candidate of TagSoup, an open source, Java-language, SAX parser for nasty, ugly HTML. I use TagSoup to convert JavaDoc to well-formed XHTML. RC6 focuses on namespaces. According to Cowan,

This release fixed a bunch of bugs around namespaces. The SAX spec was a little hard to follow, so I am now doing a subset of what Xerces does, in hopes that that will be compatible with what most SAX applications expect. In particular, the namespace-prefixes feature is now false by default, as it should be, and cannot be made true. It used to be true by default, but did not meet the contract that implies, namely that xmlns: attributes would be provided to the application.

This also involved fixing a bug in XMLWriter that made it work incorrectly when the namespaces feature is false. In addition, most people don't want namespaces in HTML mode, so --html now implies --nons. To get the namespaces back, use --no-xml-declaration --method=html instead.

TagSoup is dual licensed under the Academic Free License and the GPL.

Saturday, May 20, 2006 (Permalink)

Papers from last week's XTech conference in Amsterdam are now online. The navigation's a little confusing. Follow the schedule to the abstracts for the individual papers, and then click on "View: Full paper". Not every presenter submitted an official paper to go with their talk. On the plus side, those papers that are online are in HTML, not PDF. I wish more conferences did this.


Slava Pestov has uploaded the fourth pre-release of jEdit 4.3, an open source programmer's editor written in Java with extensive plug-in support and my preferred text editor on Windows and Unix. Besides bug fixes this release adds syntax highlighting for TypoScript, Myghty, and JavaCC.

Thursday, May 18, 2006 (Permalink)

IDEAlliance has posted the call for papers for XML 2006. The conference takes place in Boston December 4-7. This is the major North American XML show. I may attend this year. I haven't yet decided. They are planning four tracks this year:

  • Enterprise XML computing
  • XML on the Web
  • Documents and Publishing
  • Hands-on XML

According to David Megginson, the Hands-on XML track "will include workshops, tutorials, and the Masters' Series that started at XML 2005 (a makeover show for tech projects, where a panel of experts dissects what you've done and makes suggestions)." Proposals are due by June 19.

Wednesday, May 17, 2006 (Permalink)

Michael Smith has released version 1.70 of the DocBook XSL stylesheets. This release adds a number of new features, including alternative index-collation methods, improved handling of DocBook 5, full support for CALS and HTML tables in manpage output, support for FOP 0.90, and crop marks in FO and PDF output. "As with all DocBook Project dot-zero releases, this is an experimental release. It will be followed shortly by a stable release."

Tuesday, May 16, 2006 (Permalink)

The Mozilla Project has posted the second alpha of Firefox 2.0, code named "Bon Echo". New features in this release include:

  • Links default to opening in new tabs, not new windows (That seems like a bad idea.)
  • Inline spell checking in text boxes
  • Automatic restoration of your browsing session if there is a crash
  • Search suggestions now appear in the search box autocomplete for Google and Yahoo!
  • Improved support for previewing and subscribing to web feeds
  • New microsummaries feature for bookmarks
  • New search service that supports Sherlock and OpenSearch engines

Sunday, May 14, 2006 (Permalink)

The W3C Web Services Activity has posted the first public working draft of XML Schema Patterns for Common Data Structures Version 1.0. Data Types & Structures include:

  • String Value
  • Boolean Value
  • Decimal Value
  • Null Value
  • Default Value
  • Enumeration
  • Collection
  • Vector
  • Inherited Collection

Schema patterns include:

  • Target Namespace
  • Qualified Local Elements
  • Qualified Attributes
  • String
  • Boolean
  • Int
  • String Enumeration
  • Collection
  • Optional Element
  • Nillable Element
  • Nillable-Optional Element
  • Simple List
  • Wrapped Repeated Element
  • Repeated Element

Friday, May 12, 2006 (Permalink)

Manos Batsis has posted Sarissa 0.9.7, an open source (GPL) JavaScript library for processing XML under Mozilla and Internet Explorer. It provides methods to obtain DOM Document/XMLHTTP objects, synchronous and asynchronous loading, and XSLT transformations; implements some non-standard IE extensions for Mozilla; and adds NodeType constants for IE. "This release has many fixes especially for Safari and Opera. Also, Sarissa now is triple licensed. Available flavours are GNU GPL, GNU LGPL or Apache License 2.0."

Thursday, May 11, 2006 (Permalink)

NewsGator has released version 2.1 of NetNewsWire, a closed source RSS client for the Mac. It's available in both free-beer lite and $30 payware versions. Version 2.1 is a universal binary, can sync with NewsGator, and can email and print articles.

Wednesday, May 10, 2006 (Permalink)

The W3C Web Content Accessibility Guidelines Working Group has updated three working drafts covering various topics:

Web Content Accessibility Guidelines 2.0

"Web Content Accessibility Guidelines 2.0 (WCAG 2.0) covers a wide range of issues and recommendations for making Web content more accessible. This document contains principles, guidelines, and success criteria that define and explain the requirements for making Web-based information and applications accessible. 'Accessible' means usable to a wide range of people with disabilities, including blindness and low vision, deafness and hearing loss, learning difficulties, cognitive limitations, limited movement, speech difficulties, photosensitivity and combinations of these. Following these guidelines will also make your Web content more accessible to the vast majority of users, including older users. It will also enable people to access Web content using many different devices - including a wide variety of assistive technologies."

This draft is in last call. Comments are due by May 31.

Understanding WCAG 2.0

This draft "provides detailed information about each success criterion, including its intent; the key terms that are used in the success criterion; examples of Web content that meet the success criterion using various Web technologies (for instance, HTML, CSS, XML) and common examples of Web content that does not meet the success criterion. Finally, this document also explains how the success criteria in WCAG 2.0 help people with different types of disabilities."

Techniques for WCAG 2.0

"This is a First Public Working Draft of Techniques for WCAG 2.0. It is the first publication as a combined document; previously, techniques were published as separate documents - one for each technology. This document is being published as WCAG 2.0 goes to Last Call. It provides explanation of the techniques documented by the Web Content Accessibility Guidelines Working Group. Some are sufficient to meet a particular success criterion (either by themselves or in combination with other techniques) while other techniques are advisory and optional. None of the techniques are required to meet WCAG 2.0 although some may be the only known method if a particular technology is used."

There's a lot of good information here. These should really be required reading for all HTML authors and web designers. The Techniques spec is probably the most practical, and where most readers should start.

Tuesday, May 9, 2006 (Permalink)

The W3C XQuery working group has posted the second working drafts of XQuery Update Facility and XQuery Update Facility Use Cases. XQuery as it currently exists is basically just SELECT in SQL terms. This is INSERT, UPDATE, and DELETE. More specifically, it defines these update routines:

  • upd:mergeUpdates
  • upd:revalidate
  • upd:applyUpdates

and these update primitives:

  • upd:insertBefore
  • upd:insertAfter
  • upd:insertInto
  • upd:insertIntoAsLast
  • upd:insertAttributes
  • upd:delete
  • upd:replaceValue
  • upd:rename

According to spec editor Jonathan Robie:

The major changes in the XQuery Update Facility include a change to the grammar that eliminates superfluous and inconsistent use of curly braces, and a change to the compatibility matrix to eliminate the incompatibility between primitive operations "insert into" and "insert into as last" when these operations are applied to the same target node.

The major changes in the XQuery Update Facility Use Cases include use of the new grammar and several new use cases, including updates on recursive structures, updates that create nilled elements, and an update that moves elements and attributes from one namespace to another.

Monday, May 8, 2006 (Permalink)

John Cowan has posted the fifth release candidate of TagSoup, an open source, Java-language, SAX parser for nasty, ugly HTML. I use TagSoup to convert JavaDoc to well-formed XHTML. RC5 fixes bugs and adds a --nocolons command line option that "translates colons in elements and attribute names to underscores." TagSoup is dual licensed under the Academic Free License and the GPL.

Friday, May 5, 2006 (Permalink)

The W3C XQuery working group has updated XQuery 1.0 and XPath 2.0 Full-Text Use Cases and XQuery 1.0 and XPath 2.0 Full-Text. Quoting from the latter:

As XML becomes mainstream, users expect to be able to search their XML documents. This requires a standard way to do full-text search, as well as structured searches, against XML documents. A similar requirement for full-text search led ISO to define the SQL/MM-FT [SQL/MM] standard. SQL/MM-FT [SQL/MM] defines extensions to SQL to express full-text searches providing similar functionality as does this full-text language extension to XQuery 1.0 and XPath 2.0.

XML documents may contain highly-structured data (numbers, dates), unstructured data (untagged free-flowing text), and semi-structured data (text with embedded tags). Where a document contains unstructured or semi-structured data, it is important to be able to search using Information Retrieval techniques such as scoring and weighting.

Full-text search is different from substring search in many ways:

  1. A full-text search searches for tokens and phrases rather than substrings. A substring search for news items that contain the string "lease" will return a news item that contains "Foobar Corporation releases the 20.9 version ...". A full-text search for the token "lease" will not.

  2. There is an expectation that a full-text search will support language-based searches which substring search cannot. An example of a language-based search is "find me all the news items that contain a token with the same linguistic stem as "mouse" (finds "mouse" and "mice"). Another example based on token proximity is "find me all the news items that contain the tokens "XML" and "Query" allowing up to 3 intervening words.

  3. Full-text search must address the vagaries and nuances of language. Search results are often of varying usefulness. When you search a web site for cameras that cost less than $100, this is an exact search. There is a set of cameras that matches this search, and a set that does not. Similarly, when you do a string search across news items for "mouse", there is only 1 expected result set. When you do a full-text search for all the news items that contain the token "mouse", you probably expect to find news items containing the token "mice", and possibly "rodents", or possibly "computers". Not all results are equal. Some results are more "mousey" than others. Because full-text search may be inexact, we have the notion of score or relevance. We generally expect to see the most relevant results at the top of the results list.

    As XQuery and XPath evolve, they may apply the notion of score to querying structured data. For example, when making travel plans or shopping for cameras, it is sometimes useful to get an ordered list of near matches in addition to exact matches. If XQuery and XPath define a generalized inexact match, we expect XQuery and XPath to utilize the scoring framework provided by XQuery and XPath Full-Text.
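Points 1 and 2 can be made concrete with a short Java sketch that compares substring matching with token matching and adds a crude proximity test. The tokenizer (lowercase the text, then split on anything that is not a letter or digit) and the method names are assumptions for illustration. The draft leaves tokenization implementation-defined. Nothing here attempts stemming, so "mice" would not match "mouse".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenSearchDemo {

    // Hypothetical tokenizer: lowercase, then split on runs of anything that is
    // not a letter or decimal digit. Real full-text tokenizers are implementation-defined.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) { // split() yields a leading "" when text starts with a delimiter
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Substring search: "lease" matches inside "releases".
    static boolean substringMatch(String text, String s) {
        return text.toLowerCase(Locale.ROOT).contains(s.toLowerCase(Locale.ROOT));
    }

    // Token search: only a whole token counts as a match.
    static boolean tokenMatch(String text, String token) {
        return tokenize(text).contains(token.toLowerCase(Locale.ROOT));
    }

    // Proximity: true if tokens a and b both occur, in either order,
    // with at most maxBetween other tokens between them.
    static boolean near(String text, String a, String b, int maxBetween) {
        List<String> tokens = tokenize(text);
        String la = a.toLowerCase(Locale.ROOT);
        String lb = b.toLowerCase(Locale.ROOT);
        for (int i = 0; i < tokens.size(); i++) {
            for (int j = 0; j < tokens.size(); j++) {
                boolean pair = tokens.get(i).equals(la) && tokens.get(j).equals(lb);
                if (pair && i != j && Math.abs(i - j) - 1 <= maxBetween) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String news = "Foobar Corporation releases the 20.9 version ...";
        System.out.println(substringMatch(news, "lease")); // true
        System.out.println(tokenMatch(news, "lease"));     // false
        System.out.println(near("an XML full text Query language", "XML", "Query", 3)); // true
    }
}
```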

The following definitions apply to full-text search:

  1. [Definition: Full-text queries are performed on tokens and phrases. Tokens and phrases are produced via tokenization.] Informally, tokenization breaks a character string into a sequence of words, units of punctuation, and spaces.

  2. [Definition: A token is defined as a character, n-gram, or sequence of characters returned by a tokenizer as a basic unit to be searched. Each instance of a token consists of one or more consecutive characters. Beyond that, tokens are implementation-defined.] Note that consecutive tokens need not be separated by either punctuation or space, and tokens may overlap. [Definition: A phrase is an ordered sequence of any number of tokens. Beyond that, phrases are implementation-defined.]

    Note:

    In some natural languages, tokens and words can be used interchangeably.

  3. Tokenization enables functions and operators that operate on a part or the root of the token (e.g., wildcards, stemming).

    Tokenization enables functions and operators which work with the relative positions of tokens (e.g., proximity operators).

    Tokenization also uniquely identifies sentences and paragraphs in which tokens appear. [Definition: A sentence is an ordered sequence of any number of tokens. Beyond that, sentences are implementation-defined. A tokenizer is not required to support sentences.] [Definition: A paragraph is an ordered sequence of any number of tokens. Beyond that, paragraphs are implementation-defined. A tokenizer is not required to support paragraphs.] Whatever a tokenizer for a particular language chooses to do, it must preserve the containment hierarchy: paragraphs contain sentences which contain tokens.

    The tokenizer has to evaluate two equal strings in the same way, i.e., it should identify the same tokens. Everything else is implementation-defined.

  4. This specification focuses on functionality that serves all languages. It also selectively includes functionalities useful within specific families of languages. For example, searching within sentences and paragraphs is useful to many western languages and to some non-western languages, so that functionality is incorporated into this specification.

  5. Some XML elements represent semantic markup, e.g., <title>. Others represent formatting markup, e.g., <b> to indicate bold. Semantic markup serves well as token boundaries, while formatting markup sometimes does not. Implementations are free to provide implementation-defined ways to differentiate between the markup's effect on token boundaries during tokenization.
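The JDK's java.text.BreakIterator can stand in for a tokenizer to show the containment hierarchy, where sentences contain tokens. This sketch is not something the draft specifies. BreakIterator's word instance also reports spaces and punctuation as segments, so the code filters those out, and paragraphs are omitted for brevity.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenizerHierarchyDemo {

    // Splits text into sentences using the JDK's locale-sensitive sentence rules.
    static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim(); // sentences carry trailing spaces
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    // Splits text into word tokens, dropping the space and punctuation
    // segments that the word iterator also reports.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String seg = text.substring(start, end);
            if (Character.isLetterOrDigit(seg.codePointAt(0))) {
                out.add(seg);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The mouse ran. Two mice followed it!";
        // Each sentence contains its own tokens: the hierarchy is preserved.
        for (String s : sentences(text, Locale.ENGLISH)) {
            System.out.println(s + " -> " + words(s, Locale.ENGLISH));
        }
    }
}
```

Because the rules are fixed for a given locale, equal strings always produce the same tokens. The draft requires that property of every tokenizer.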

Thursday, May 4, 2006 (Permalink)

The Mozilla Project has released Camino 1.0.1, a Mac OS X web browser based on the Gecko 1.8 rendering engine and the Quartz GUI toolkit. Camino is free and runs on Mac OS X 10.2 through 10.4. It supports pretty much all the technologies that Mozilla does: HTML, XHTML, CSS, XML, XSLT, etc. Version 1.0.1 is mostly a bug fix release, including fixes for several security problems. All users should upgrade.

Wednesday, May 3, 2006 (Permalink)

The Apache WebServices Commons Project has released AXIOM 1.0. Near as I can tell this is yet another tree model like DOM, JDOM, or XOM. However, it's built from StAX rather than SAX. Most importantly, Axiom can build the object tree on demand so you don't spend memory on nodes you don't want. That sounds good, but it's been tried before (notably in Xerces's deferred DOM) and the results have not been impressive. Maybe these folks have figured out a more practical way to do this, though. The underlying push-pull parser distinction may be important for this.
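Axiom's own API is not shown here. As a rough illustration of why a pull parser suits on-demand building, this sketch uses the standard StAX interfaces in javax.xml.stream, which have shipped with the JDK since Java 6, to read only as far as the first element with a given name. The method name and the sample feed are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PullOnDemandDemo {

    // Pulls events only until the first element with the given local name is found,
    // then returns its text; nothing after that point is turned into objects.
    static String firstElementText(String xml, String localName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(localName)) {
                        return reader.getElementText(); // element must contain text only
                    }
                }
                return null; // no such element
            } finally {
                reader.close(); // frees the reader; does not close the StringReader
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("parse error before the element was found", e);
        }
    }

    public static void main(String[] args) {
        String feed = "<feed><entry><title>One</title></entry>"
                + "<entry><title>Two</title></entry></feed>";
        System.out.println(firstElementText(feed, "title")); // One
    }
}
```

Laziness has a cost. If the document is malformed somewhere after the element you asked for, this code may never find out, because the parser never reads that far.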

Also of note is the support for XML Optimized Packaging (XOP) and MTOM. The Axiom announcement gets this exactly backwards though. XOP and MTOM do not allow "XML to carry binary data efficiently and in a transparent manner." Instead they allow both XML and binary data to be bundled together in the same non-XML file. Understanding the distinction is critical for proper use of these technologies.

The Axiom API itself is too complex. For example, here's a chunk of code from the tutorial:

OMFactory factory = OMAbstractFactory.getOMFactory();
OMNamespace ns1 = factory.createOMNamespace("bar","x");
OMElement root = factory.createOMElement("root",ns1);
OMNamespace ns2 = root.declareNamespace("bar1","y");
OMElement elt1 = factory.createOMElement("foo",ns1);
OMElement elt2 = factory.createOMElement("yuck",ns2);
OMText txt1 = factory.createOMText(elt2,"blah");
elt2.addChild(txt1);
elt1.addChild(elt2);
root.addChild(elt1);

And here's the equivalent in XOM for comparison:

Element root = new Element("x:root", "bar");
Element elt1 = new Element("x:foo", "bar");
Element elt2 = new Element("y:yuck", "bar1");
Text txt1 = new Text("blah");
elt2.appendChild(txt1);
elt1.appendChild(elt2);
root.appendChild(elt1);

Of course, XOM would notice that the requested elements use relative namespace URIs, and thus that the document containing them does not have a valid Infoset. For all the talk about Infosets on the Axiom pages, you'd hope somebody would have noticed this. Their examples also demonstrate a lack of correct white space handling, and some serious mistakes with encoding detection. I haven't tried to write code with this API yet, so I can't tell if the problems are in the library itself or just the tutorial. Either way, it's disturbing.
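In practice, "relative namespace URI" means that bar and bar1 have no scheme. java.net.URI offers a quick sanity check, shown in this minimal sketch. The method name is hypothetical. java.net.URI implements the older RFC 2396 syntax, so treat the check as a heuristic, not a full namespace-name validator.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class NamespaceUriCheck {

    // True only for URI references that carry a scheme, such as "http:" or "urn:".
    static boolean isAbsoluteNamespace(String ns) {
        try {
            return new URI(ns).isAbsolute();
        } catch (URISyntaxException e) {
            return false; // not even a syntactically valid URI reference
        }
    }

    public static void main(String[] args) {
        // "bar" and "bar1" are the namespace names used in the tutorial code above.
        for (String ns : new String[] {"bar", "bar1", "http://example.com/bar", "urn:example:bar1"}) {
            System.out.println(ns + " -> " + (isAbsoluteNamespace(ns) ? "absolute" : "not absolute"));
        }
    }
}
```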

Folks: if you're going to write yet another XML API, please, please ask for early review from people who have been through this before. The reason the mistakes in Axiom jump out at me is that I've seen them all dozens of times before. XML is not as simple a spec as it seems at first glance. There are a lot of tricky areas that trip up the unwary. There are some interesting new ideas here that should be explored further. However, as a library it's clearly unsuitable for production use.

Tuesday, May 2, 2006 (Permalink)

Code Synthesis has released xsd 2.1.1, an open source (GPL) W3C XML Schema language based data binding tool for C++.

Given an XML instance description (XML Schema), it generates C++ classes that represent the given vocabulary as well as parsing and serialization code (collectively called a mapping or binding).

Compared to APIs such as DOM and SAX, the generated code allows you to access the information in XML instance documents using your domain vocabulary instead of generic elements, attributes, and text. Static typing helps catch errors at compile-time rather than at run-time. Automatic code generation frees you for more interesting tasks while minimizing the effort needed to adopt your applications to changes in the document structure.

xsd supports two C++ mappings: in-memory C++/Tree and event-driven C++/Parser. The C++/Tree mapping consists of C++ classes that represent data types defined in XML Schema, a set of parsing functions that convert XML instance documents to a tree-like in-memory data structure, and a set of serialization functions that convert the in-memory representation back to XML....

The C++/Parser mapping provides parser templates for data types defined in XML Schema. Using these parser templates you can build your own in-memory representations or perform immediate processing of XML instance documents.

2.1.1 is a bug fix release.
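xsd generates C++, but the idea behind any data binding takes only a few lines of Java using the JDK's DOM parser. The sketch below is a hand-written version of what a generator would produce for a hypothetical <person> vocabulary. The Person class and the method names are assumptions and are not xsd's output.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class HandBindingDemo {

    // Domain class of the kind a schema compiler would generate (hypothetical).
    static final class Person {
        final String name;
        final int born; // typed: the text is converted to int once, at bind time

        Person(String name, int born) {
            this.name = name;
            this.born = born;
        }
    }

    // Hand-written unmarshaller: generic DOM in, domain object out.
    static Person parsePerson(String xml) {
        Element root = parse(xml).getDocumentElement();
        return new Person(childText(root, "name"),
                          Integer.parseInt(childText(root, "born").trim()));
    }

    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        if (list.getLength() == 0) {
            throw new IllegalArgumentException("missing <" + tag + ">");
        }
        return list.item(0).getTextContent();
    }

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }

    public static void main(String[] args) {
        Person p = parsePerson("<person><name>Ada</name><born>1815</born></person>");
        // Code now works in domain terms: p.born is an int, not a text node.
        System.out.println(p.name + " was born in " + p.born);
    }
}
```

A generator writes the Person class and the unmarshalling code from the schema and regenerates both when the schema changes. That regeneration is the part a hand-written binding cannot give you.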


The Mozilla Project has released Firefox 1.5.0.3 for Windows, Mac (including X86 Macs), and Linux. This is a security update, and all users should upgrade. If you're running Firefox 1.5, you should see an automated update notification soon, or you can 'Check for Updates…' from the Help menu.

Monday, May 1, 2006 (Permalink)

The W3C Internationalization Activity has published the third working draft of Internationalization Tag Set (ITS). "ITS is designed to be used with new and existing schemas to support the internationalization and localization of schemas and documents. An implementation is provided for three schema languages: XML DTD, XML Schema and RELAX NG." For example, an its:translate attribute specifies whether particular content is to be translated or not.
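The following Java code is a minimal sketch of how a tool might honor its:translate. It walks a DOM tree and collects the text that should go to translators. It makes three assumptions. The namespace URI is the one ITS 1.0 uses, so check it against the draft you target. Only local its:translate attributes are handled, with no global rules. As in ITS 1.0, an element inherits its parent's setting, and translation is on by default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

public class ItsTranslateDemo {

    // Namespace URI used by ITS 1.0 (assumption for the 2006 drafts).
    static final String ITS_NS = "http://www.w3.org/2005/11/its";

    // Returns the non-blank text nodes that should be translated, in document order.
    static List<String> translatableText(String xml) {
        List<String> out = new ArrayList<>();
        collect(parse(xml).getDocumentElement(), true, out); // default: translate
        return out;
    }

    private static void collect(Element e, boolean inherited, List<String> out) {
        String flag = e.getAttributeNS(ITS_NS, "translate"); // "" when absent
        boolean translate = flag.isEmpty() ? inherited : flag.equals("yes");
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                collect((Element) n, translate, out); // children inherit the setting
            } else if (translate && n.getNodeType() == Node.TEXT_NODE) {
                String text = n.getNodeValue().trim();
                if (!text.isEmpty()) {
                    out.add(text);
                }
            }
        }
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // required for getAttributeNS to work
            return f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }

    public static void main(String[] args) {
        String doc = "<doc xmlns:its='" + ITS_NS + "'>"
                + "<p>Press the button.</p>"
                + "<p>Type <code its:translate='no'>ls -l</code> to list files.</p></doc>";
        System.out.println(translatableText(doc)); // [Press the button., Type, to list files.]
    }
}
```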

Saturday, April 29, 2006 (Permalink)

x-port.net has released formsPlayer 1.4.3.1028, a free-beer (e-mail address required) "set of modules designed to make it easy to build XForms processors, editors and debuggers. These processors can run on a variety of platforms, using a range of user interfaces." This release supports the second edition of XForms 1.0, adds asynchronous submissions, adopts conditional action handlers from XForms 1.1, improves CSS support, and fixes bugs. Internet Explorer is required.

Friday, April 28, 2006 (Permalink)

Syntext has released Serna 2.6.0, a $268 payware XSL-based WYSIWYG XML Document Editor for Mac OS X, Windows, and Unix. Features include on-the-fly XSL-driven XML rendering and transformation, on-the-fly XML Schema validation, XInclude, and spell checking. A roughly $500 enterprise edition adds a Python API and WebDAV support.

Version 2.6 adds support for DITA, the Darwin Information Typing Architecture. Frankly, the appeal of DITA escapes me completely. Perhaps some people need it, or perhaps it's just another idea that has a large company willing to throw a bunch of dollars at it. Near as I can make out, DITA merges the simplicity and intelligibility of architectural forms with the open process and patent savvy of Web Services. That sure sounds like a winning combination. :-)

Thursday, April 27, 2006 (Permalink)

The W3C Internationalization Core Working Group has posted the first public working draft of Language Tags and Locale Identifiers for the World Wide Web (LTLI). This draft doesn't seem to say very much except that languages and locales are not the same thing, and

  1. Specifications that make use of language or locale values MUST meet the conformance criteria defined for "well-formed" processors, as defined in sec. 2.2.9 of [RFC 3066bis].

  2. Specifications that make use of language or locale values MAY validate these values. If they do so, they MUST meet the conformance criteria defined for "validating" processors, as defined in sec. 2.2.9 of [RFC 3066bis].

  3. Specifications that define operations on language or locale values using matching Must use either a basic language range or an extended language range.

  4. Specifications that define operations on language or locale values using matching MUST specify whether the resulting language priority list contains a single result (lookup as defined in [RFC 3066bis Matching]), or a possible empty set of results (filtering as defined in [RFC 3066bis Matching]).

  5. Specifications that describe the identification of locales or aspects thereof with IRIs may use IRIs [RFC 3987] for this purpose, or to point to more detailed locale or preference data.
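The difference between lookup and filtering in point 4 is easy to see in code. RFC 3066bis Matching was later published as RFC 4647, and since Java 8 the JDK has implemented it through Locale.LanguageRange, Locale.filterTags, and Locale.lookupTag. The following is a minimal sketch. The list of available tags is made up. It uses lowercase tags because tag comparison is case-insensitive and some JDK versions return lowercased results.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Locale.LanguageRange;

public class LanguageMatchingDemo {

    // Hypothetical set of language tags an application has content for.
    static final List<String> AVAILABLE = Arrays.asList("en", "en-gb", "fr-fr", "de");

    // Filtering: every available tag that matches a range; possibly empty.
    static List<String> filter(String ranges) {
        return Locale.filterTags(LanguageRange.parse(ranges), AVAILABLE);
    }

    // Lookup: a single best tag, found by truncating the range from the
    // right ("en-ca" -> "en"); null when nothing matches.
    static String lookup(String ranges) {
        return Locale.lookupTag(LanguageRange.parse(ranges), AVAILABLE);
    }

    public static void main(String[] args) {
        System.out.println(filter("en"));    // [en, en-gb]
        System.out.println(filter("ja"));    // []
        System.out.println(lookup("en-CA")); // en
        System.out.println(lookup("ja"));    // null
    }
}
```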

Wednesday, April 26, 2006 (Permalink)

Matthew Cruickshank has released Docvert 2.1.4, a PHP program that converts various word processor formats, including Microsoft Word, to OASIS OpenDocument v1.0 format. From there it can optionally proceed to HTML or DocBook. PHP 5.0 or later and various plugins are required.


The Big Faceless Organization has released the Big Faceless Report Generator 1.1.30, a $1200 payware Java application for converting XML documents to PDF. Unlike most similar tools it appears to be based on HTML and CSS rather than XSL Formatting Objects. This is mostly a bug fix release. Java 1.2 or later is required.

Tuesday, April 25, 2006 (Permalink)

OSoft is now selling XML in a Nutshell, 3rd edition in ThoutReader format. I played around with this a little. There's the seed of a good idea here, but only a seed. The implementation is faulty. Both the web site and the client software make a number of basic GUI bloopers. More seriously, it's vastly too difficult to purchase and install content, even the free content (though at least, unlike iTunes, it doesn't insist on a credit card before it will let you have the free stuff). Finally, the client isn't integrated enough with the server. You can't browse and install content directly into the client like you can buy songs in iTunes. Instead you have to purchase and download the new books, then install them manually. This should be no more than a two-click operation.


Microsoft has posted the second beta of Internet Explorer 7 for Windows XP. This release fixes bugs and improves CSS handling. One hopes XML, XHTML, and XSLT are supported as well, though I don't have a Windows system handy to test with at the moment.

Monday, April 24, 2006 (Permalink)

Benjamin Pasero has released RSSOwl 1.2.1, an open source RSS reader written in Java and based on the SWT toolkit. Version 1.2.1 is now a universal binary for PowerPC and Intel on Mac OS X. RSSOwl is the best open source RSS client I've seen written in Java.

Sunday, April 23, 2006 (Permalink)

John Cowan has posted the fourth release candidate of TagSoup, an open source, Java-language, SAX parser for nasty, ugly HTML. I use TagSoup to convert JavaDoc to well-formed XHTML. RC4 fixes bugs and adds support for SAX Locators to specify the location in the input of any I/O exception thrown. Cowan has also added --html, --help and --version command line options. TagSoup is dual licensed under the Academic Free License and the GPL.

Saturday, April 22, 2006 (Permalink)

The XML Apache Project has posted version 0.92 of FOP, an open source XSL Formatting Objects to PDF/PostScript/RTF converter written in Java. "This third release contains many bug fix release and new features compared to 0.91beta." Java 1.3 or later is required.

FOP's improving, and it's good enough for simple prototyping and experimenting. However, it's still not close to ready for most serious production needs. There are just too many missing pieces. The most important ones for my needs are automatic table layout, floating images, and whitespace preservation in code samples. Hopefully these will be improved in future releases in the push to 1.0.

Friday, April 21, 2006 (Permalink)

Opera Software has posted the first beta (after a couple of preview releases) of version 9.0 of their namesake free-beer web browser for Windows, Mac, and Linux. Opera supports XML and CSS. It now claims to support XSLT, but I haven't gotten that working yet.


The Mozilla Project has released Mozilla 1.7.13 to fix "security and stability issues. This release marks the end-of-life of the 1.7.x product line."

Thursday, April 20, 2006 (Permalink)

Code Synthesis has released xsd 2.1.0, an open source (GPL) W3C XML Schema language based data binding tool for C++.

Given an XML instance description (XML Schema), it generates C++ classes that represent the given vocabulary as well as parsing and serialization code (collectively called a mapping or binding).

Compared to APIs such as DOM and SAX, the generated code allows you to access the information in XML instance documents using your domain vocabulary instead of generic elements, attributes, and text. Static typing helps catch errors at compile-time rather than at run-time. Automatic code generation frees you for more interesting tasks while minimizing the effort needed to adopt your applications to changes in the document structure.

xsd supports two C++ mappings: in-memory C++/Tree and event-driven C++/Parser. The C++/Tree mapping consists of C++ classes that represent data types defined in XML Schema, a set of parsing functions that convert XML instance documents to a tree-like in-memory data structure, and a set of serialization functions that convert the in-memory representation back to XML....

The C++/Parser mapping provides parser templates for data types defined in XML Schema. Using these parser templates you can build your own in-memory representations or perform immediate processing of XML instance documents.

This release improves performance, adds automatic handling of forward inheritance, and implements a new enum mapping that supports inheritance.


Kiyut has released Sketsa 3.3.1, a $49 payware SVG editor written in Java. Version 3.3.1 fixes bugs. Java 5 or later is now required.

Wednesday, April 19, 2006 (Permalink)

Steve Cheng has posted docbook2X 0.8.7, an open source package for Unix that converts DocBook files to man pages and Texinfo. This is a bug fix release.

Tuesday, April 18, 2006 (Permalink)

The Mozilla Project has posted version 0.5 of its XForms extension for Firefox 1.5. Mozilla XForms support has been developed by IBM, Novell, and independent contributors. According to the announcement, "there has been a lot of work done in refactoring and fixing a lot of UI code. We now support the basic XForms controls inside a XUL document. The refactoring has also paved some of the way for direct SVG-support (which is yet to come)." Numerous bugs have been fixed as well, and this should be a lot closer to a complete usable implementation of XForms (though not there yet).

Monday, April 17, 2006 (Permalink)

The W3C Web Services Description Working Group has posted three revised candidate recommendations and two working drafts for WSDL 2.0:

Web Services Description Language (WSDL) Version 2.0 Part 1: Core Language

"Web Services Description Language Version 2.0 (WSDL 2.0) provides a model and an XML format for describing Web services. WSDL 2.0 enables one to separate the description of the abstract functionality offered by a service from concrete details of a service description such as 'how' and 'where' that functionality is offered. This specification defines a language for describing the abstract functionality of a service as well as a framework for describing the concrete details of a service description. "

Web Services Description Language (WSDL) Version 2.0 Part 2: Adjuncts

WSDL is an XML format for describing network services as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information. Web Services Description Language (WSDL) Version 2.0 Part 2: Adjuncts defines predefined extensions for use in WSDL 2.0:

  • Message exchange patterns

  • Operation styles

  • Binding Extensions

Web Services Description Language (WSDL) Version 2.0 Part 0: Primer

"This document is a companion to the WSDL 2.0 specification (Web Services Description Language (WSDL) Version 2.0 Part 1: Core Language [WSDL 2.0 Core], Web Services Description Language (WSDL) Version 2.0 Part 2: Adjuncts [WSDL 2.0 Adjuncts]). It is intended for readers who wish to have an easier, less technical introduction to the main features of the language."

Web Services Description Language (WSDL) Version 2.0 SOAP 1.1 Binding

"WSDL SOAP 1.1 Binding (this specification) describes the binding extension for SOAP 1.1 [SOAP11] protocol. This binding is intended to ease the migration from WSDL 1.1 to WSDL 2.0 for implementers describing services that use SOAP 1.1 protocol. And, this binding allows users to continue using SOAP 1.1 protocol."

Web Services Description Language (WSDL) Version 2.0: RDF Mapping

Web Services Description Language is defined in XML, because XML is the standard format for exchange of structured information. The use of XML brings better interoperability to WSDL generators and parsers, and the use of XML Schema makes the structure of WSDL well constrained, yet extensible. On the other hand, XML vocabularies in general don't have clear composition rules, so combining for example the WSDL description of a Web service, the service's policies and other information (presumably expressed in XML) can be done in many significantly different ways (e.g. extending WSDL, extending the policy language, creating a special XML container for all the information etc.), and little interoperability can be expected when such combined documents are used.

For example, a policy can be combined with WSDL by adding the policy elements in WSDL service element. Equally, a WSDL description can be combined with a policy by adding the WSDL description as part of the policy. While the results should be similar (WSDL with policy information), they are in fact very different for the processing software, and a policy in WSDL cannot easily be used by software that doesn't know WSDL.

In contrast, the Semantic web requires knowledge from many different sources to be easily combined so that unexpected data connections can be used. For this purpose there is the Resource Description Framework (RDF), whose graph structure together with the use of URIs for identifying nodes makes it very easy for different documents to be brought together. If a WSDL document describes a Web service, a policy document attaches constraints to the service and a general description specifies the author of the service, all this information can be merged and the resulting document will contain all the three kinds of information associated with the single service.

The main objective of this specification is to present a standard RDF ([RDF]) and OWL ([OWL]) vocabulary equivalent to WSDL 2, so that WSDL 2 documents can be transformed into RDF and merged with other Semantic Web data.

Comments are due by July 1.
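The RDF mapping's argument depends on one property of RDF: a node is identified by its URI, so statements from separately written documents connect automatically. Below is a minimal Java sketch of that merge, with triples treated as plain string records. Every URI and predicate in it is hypothetical. It ignores blank nodes, because a real RDF merge must first rename them apart.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class RdfMergeDemo {

    // One RDF statement; URIs and literals are plain strings in this sketch.
    record Triple(String subject, String predicate, String object) {}

    // Without blank nodes, merging graphs is set union: a node is its URI,
    // so statements from different documents meet at the same node.
    static Set<Triple> merge(Set<Triple> a, Set<Triple> b) {
        Set<Triple> merged = new LinkedHashSet<>(a);
        merged.addAll(b); // duplicate statements collapse, as in an RDF graph
        return merged;
    }

    // Everything the merged graph says about one subject.
    static Set<Triple> about(Set<Triple> graph, String subject) {
        return graph.stream()
                .filter(t -> t.subject().equals(subject))
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        String svc = "http://example.org/services#Reservation"; // hypothetical service URI
        Set<Triple> fromWsdl = Set.of(new Triple(svc,
                "http://example.org/vocab#endpoint", "http://example.org/reserve"));
        Set<Triple> fromPolicy = Set.of(new Triple(svc,
                "http://example.org/vocab#requires", "http://example.org/vocab#Encryption"));
        Set<Triple> fromAuthor = Set.of(new Triple(svc,
                "http://purl.org/dc/elements/1.1/creator", "Alice"));
        Set<Triple> all = merge(merge(fromWsdl, fromPolicy), fromAuthor);
        System.out.println(about(all, svc).size()); // 3
    }
}
```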

Sunday, April 16, 2006 (Permalink)

The W3C RDF Data Access Working Group has published three candidate recommendations about SPARQL.

According to the introduction to SPARQL Query Language,

An RDF graph is a set of triples; each triple consists of a subject, a predicate and an object. RDF graphs are defined in RDF Concepts and Abstract Syntax [CONCEPTS]. These triples can come from a variety of sources. For instance, they may come directly from an RDF document; they may be inferred from other RDF triples; or they may be the RDF expression of data stored in other formats, such as XML or relational databases. The RDF graph may be virtual, in that it is not fully materialized, only doing the work needed for each query to execute.

SPARQL is a query language for getting information from such RDF graphs. It provides facilities to:

  • extract information in the form of URIs, blank nodes, plain and typed literals.
  • extract RDF subgraphs.
  • construct new RDF graphs based on information in the queried graphs.

As a data access language, it is suitable for both local and remote use. The companion SPARQL Protocol for RDF document [SPROT] describes the remote access protocol.

Here's a simple example SPARQL query adapted from the draft:

PREFIX  dc: <http://purl.org/dc/elements/1.1/>
PREFIX  : <http://example.org/book/>
SELECT  ?var
WHERE   { :book1  dc:title  ?var }

The ? indicates a variable name. This query binds the title of a book to a variable named var. There are boolean and numeric operators as well. Strangely, you can also use a dollar sign to mark a variable name. As far as I can tell this is exactly the same as using a question mark. Why two forms? I don't know, but I can guess. This really smells of a massive and pointless argument within the working group that was resolved by agreeing to do both when only one was necessary.

Comments on all three CRs are due by June 6.
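Here is a minimal Java sketch of what the query above does: it matches a single triple pattern against an in-memory graph whose prefixes have already been expanded. This is not a SPARQL engine. It has no parser, no joins across patterns, and no FILTER, and the sample triples are made up. It does treat ?var and $var as the same variable, which is how the spec defines the two forms.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TriplePatternDemo {

    // One RDF statement; terms are full URIs or literal text, kept as strings.
    record Triple(String s, String p, String o) {}

    // Both "?var" and "$var" name the same variable, "var".
    static boolean isVar(String term) {
        return term.startsWith("?") || term.startsWith("$");
    }

    // Unifies one pattern term with one data term, extending the bindings.
    private static boolean unify(String patternTerm, String value, Map<String, String> b) {
        if (!isVar(patternTerm)) {
            return patternTerm.equals(value);        // constant: must match exactly
        }
        String name = patternTerm.substring(1);      // strip the ? or $ marker
        String bound = b.putIfAbsent(name, value);
        return bound == null || bound.equals(value); // a repeated variable must agree
    }

    // Matches a single triple pattern against every triple in the graph.
    static List<Map<String, String>> match(List<Triple> graph, Triple pattern) {
        List<Map<String, String>> solutions = new ArrayList<>();
        for (Triple t : graph) {
            Map<String, String> b = new HashMap<>();
            if (unify(pattern.s(), t.s(), b) && unify(pattern.p(), t.p(), b)
                    && unify(pattern.o(), t.o(), b)) {
                solutions.add(b);
            }
        }
        return solutions;
    }

    public static void main(String[] args) {
        String dc = "http://purl.org/dc/elements/1.1/";
        String book = "http://example.org/book/";
        List<Triple> graph = List.of(
                new Triple(book + "book1", dc + "title", "SPARQL Tutorial"),
                new Triple(book + "book2", dc + "title", "The Semantic Web"));
        // SELECT ?var WHERE { :book1 dc:title ?var }
        System.out.println(match(graph, new Triple(book + "book1", dc + "title", "?var")));
        // prints [{var=SPARQL Tutorial}]
    }
}
```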

Saturday, April 15, 2006 (Permalink)

The W3C Internationalization Activity has published the third working draft of Internationalization Tag Set (ITS). "ITS is designed to be used with new and existing schemas to support the internationalization and localization of schemas and documents. An implementation is provided for three schema languages: XML DTD, XML Schema and RELAX NG." For example, an its:translate attribute specifies whether particular content is to be translated or not.

Friday, April 14, 2006 (Permalink)

The Mozilla Project has released Firefox 1.5.0.2 for Windows, Mac (including X86 Macs), and Linux. They "strongly recommend that all users upgrade to this latest release. This update is available immediately in 37 languages including German, French, Spanish, Japanese, Simplified and Traditional Chinese, Korean, and more. If you already have Firefox 1.5, you will receive an automated update notification within 24 to 48 hours. This update can also be applied manually by selecting 'Check for Updates…' from the Help menu within at any time. Mozilla Corporation is also strongly recommending that Firefox 1.0 users upgrade to this latest release of Firefox 1.5 in order to take advantage of significant security and stability improvements. Firefox 1.5 includes an automated update mechanism that ensures users are always up to date with the very latest updates. This release of Firefox includes native support for new Macintosh computers with Intel Core processors, improvements for the Japanese locale, and fixes for security issues, common crashes, and performance."


The Mozilla Project has also released Firefox 1.0.8. No automatic update in this branch though. You'll need to upgrade manually. This too "includes fixes for security and stability issues. This release marks the end-of-life of the 1.0.x product line. See the Firefox 1.0.x Product Sunset Announcement. Mozilla Corporation strongly recommends that all Firefox 1.0 users upgrade to Firefox 1.5 available for Windows, Mac, and Linux for free download from getfirefox.com. This update is available immediately in 35 languages including German, French, Spanish, Japanese, Simplified and Traditional Chinese, Korean, and more."


And finally the Mozilla Project has released SeaMonkey 1.0.1. "All users of previous SeaMonkey versions are encouraged to update, as SeaMonkey 1.0.1 includes multiple security fixes along with other critical bug fixes." It's also now a universal binary on Mac OS X.

Thursday, April 13, 2006 (Permalink)

Michael Kay has released version 8.7.1 of Saxon, his XSLT 2.0 and XQuery processor for Java and .NET. According to Kay, "This is a maintenance release that fixes known bugs and non-conformances; it also implements a few spec changes agreed by W3C since the Candidate Recommendation came out (for example the decision to put types such as xdt:dayTimeDuration into the XML Schema namespace - Saxon currently supports both the old and the new namespaces)."

Saxon is published in two versions for both of which Java 1.4 or later (or .NET) is required. Saxon 8.7B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 8.7SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standard XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers."


Norm Walsh has published the fifth beta of DocBook 5.0. DocBook 5 is "a significant redesign that attempts to remain true to the spirit of DocBook." The schema is written in RELAX NG. A DTD and W3C XML Schema generated from the RELAX NG schema are also available. There's also a Schematron schema "that validates some extra-grammatical DocBook constraints. These patterns are also present directly in the RELAX NG Grammar and some validators, for example MSV, can perform both kinds of validation at the same time." This beta repairs the broken DTD from beta 4.


Norm Walsh has also posted the second candidate release of DocBook 4.5. Version 4.5 implements a minor bug-fix to citebiblioid and updates the reference documentation. As you may recall, I wrote Processing XML with Java in DocBook 4. I've been playing with DocBook 5 lately for a couple of possible future book projects. While it's clearly an improvement over DocBook 4 in numerous ways—for instance it uses namespaces, embeds SVG and MathML, and has reasonable XInclude support—the tool chain isn't up to snuff yet. The stylesheets and various editors like Oxygen haven't adapted to life in a DocBook 5 world yet. I'll probably continue to use DocBook 5 because I'm a bleeding edge sort of guy, but most users should stick to DocBook 4 for the time being.


Altsoft N.V. has released Xml2PDF 3.0, a $49 payware Windows program for converting XSL-FO, SVG, WordML, and XHTML documents into PDF files. New features in 3.0 include:

  • SVG Basic output
  • XSL-FO embedded in SVG
  • OpenType fonts with Type1 outlines and kerning
  • MathML as an external graphics format
  • Multipage floats
  • Extensions for absolutely positioned floats and tabulation support

This release should be faster and use less memory too.


Benoit Guillon has posted dblatex 0.1.7, a free-as-in-speech (GPL) tool for transforming DocBook documents to PDF. It requires an existing TeX/LaTeX setup on your system.


XimpleWare has released VTD-XML 1.5, a free (GPL) non-extractive Java library for processing XML that supports XPath. This appears to be an example of what Sam Wilmott calls "in situ parsing". In other words, rather than creating objects representing the content of an XML document, VTD-XML just passes pointers into the actual, real XML. (These are the abstract pointers of your data structures textbook, not C-style addresses in memory. In this case the pointers are int indexes into the file.) You don't even need to hold the document in memory. It can remain on disk. This should improve speed and memory usage. Current tree models typically require at least three times the size of the actual document, and often more. A model based on indexes into one big array might reduce that to twice the size of the original document or even less. VTD-XML claims 1.3 times, but I haven't verified that.

However, VTD-XML currently only supports the built-in entity references (&quot; &amp; &apos; &gt; &lt;). There are some other limits too. Element names are limited to 2048 characters. Documents can't be much bigger than a billion characters, so SAX (or XOM) is still needed for really huge documents. There's also a maximum depth to the document, though exactly what it is isn't specified. All this means VTD-XML is not a conformant XML parser. Given this, comparisons to other parsers are unfair and misleading. I've seen many products that outperform real XML parsers by subsetting XML and cutting out the hard parts. It's often the last 10% that kills the performance. :-( The other question I have for anything claiming these speed gains is whether it correctly implements well-formedness testing, including the internal DTD subset. Will VTD-XML correctly report all malformed documents as malformed? Has it been tested against the W3C XML conformance test suite? I'm not sure.
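The indexing idea behind VTD-XML is simple, even if conformance is not. This toy Java sketch records each start-tag name as an offset and a length packed into one long, and it creates a String only when a caller asks for a name. It does not use VTD-XML's actual record format. Given the points above, it is also deliberately not an XML parser: it checks nothing and ignores comments, CDATA sections, and DOCTYPEs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class OffsetIndexDemo {

    private final byte[] doc;            // raw document bytes; never copied into node objects
    private long[] records = new long[16];
    private int count;

    OffsetIndexDemo(byte[] doc) {
        this.doc = doc;
        scan();
    }

    // Toy scan that records where each start-tag name lives in the byte array.
    // NOT an XML parser: no well-formedness checks, and no handling of comments,
    // CDATA, DOCTYPEs, or attribute values containing '<'.
    private void scan() {
        for (int i = 0; i + 1 < doc.length; i++) {
            byte next = doc[i + 1];
            if (doc[i] == '<' && next != '/' && next != '?' && next != '!') {
                int start = i + 1;
                int end = start;
                while (end < doc.length && !endsName(doc[end])) {
                    end++;
                }
                add(start, end - start);
            }
        }
    }

    private static boolean endsName(byte b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r' || b == '/' || b == '>';
    }

    // Packs the offset (high 32 bits) and length (low 32 bits) into one long.
    private void add(int offset, int length) {
        if (count == records.length) {
            records = Arrays.copyOf(records, count * 2);
        }
        records[count++] = ((long) offset << 32) | (length & 0xFFFFFFFFL);
    }

    int size() {
        return count;
    }

    // A String is created only when a caller actually asks for a name.
    String name(int i) {
        long r = records[i];
        int offset = (int) (r >>> 32);
        int length = (int) r;
        return new String(doc, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] xml = "<a><b x='1'/><c>text</c></a>".getBytes(StandardCharsets.UTF_8);
        OffsetIndexDemo idx = new OffsetIndexDemo(xml);
        for (int i = 0; i < idx.size(); i++) {
            System.out.println(idx.name(i)); // a, b, c on separate lines
        }
    }
}
```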

Wednesday, April 12, 2006 (Permalink)

The KDE Project has released KOffice 1.5, an open source office suite (word processor, spreadsheet, presentation program, etc.) for Linux. This release now saves files in the XML-based OASIS OpenDocument file format by default. There's an old programmer saying that an additional layer of indirection is the solution to almost any problem. I think it's very important that OpenDocument forces such an additional layer of indirection between the WYSIWYG view on the screen and the data on the disk. Traditionally word processors have been far too closely tied to their file formats, and this has led to less than robust products like Microsoft Word and WordPerfect. Irrespective of the XML nature of the format, merely having a common document format that is not tied to any one program is an important step forward. Although OpenDocument started its life as the OpenOffice file format, there are now two major office suites with independent code bases that use it by default. That should shake out any unintentional couplings between OpenOffice and OpenDocument.
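Because an OpenDocument text file is a zip archive whose body lives in content.xml, any program can get at the data without going through the office suite that wrote it. The sketch below builds a deliberately minimal package in memory (a real .odt also contains styles.xml, meta.xml, and META-INF/manifest.xml, which are omitted here) and then reads the paragraphs back with nothing but zipfile and ElementTree. The function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

def make_tiny_odt() -> bytes:
    """Build a stripped-down, illustrative package containing only the parts read below."""
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
        "<office:body><office:text>"
        "<text:p>First paragraph</text:p><text:p>Second paragraph</text:p>"
        "</office:text></office:body></office:document-content>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:  # default compression is ZIP_STORED
        z.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        z.writestr("content.xml", content)
    return buf.getvalue()

def paragraphs(odt_bytes: bytes) -> list[str]:
    """Read every text:p straight out of content.xml."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as z:
        root = ET.fromstring(z.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]
```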

Tuesday, April 11, 2006 (Permalink)

I noticed something today while exploring KisKis - Keep It Secret! Keep It Safe!, a free-as-in-speech (GPL) password manager written in Java. KisKis can export and import the passwords in XML using dom4j. The problem is MetaStuff publishes dom4j under an unmodified BSD license that is incompatible with the GPL due to the advertising clause. Unless the dom4j folks are willing to change their license to remove the advertising clause, free-as-in-speech products should avoid it. XOM, DOM, and SAX, by contrast, are all GPL-compatible. I just spent a lot of time over the last few months getting the advertising clause removed from the Jaxen license, so this is very much on my mind right now.

Monday, April 10, 2006 (Permalink)

IBM's alphaWorks has updated the XML Forms Generator, a data-driven Eclipse plug-in that "generates forms that adhere to the XForms 1.0 standard, using as a starting point either Web Service Description Language (WSDL) documents or XML instance documents having optional XML Schema backing models. The generated forms adhere to the XHTML and XForms 1.0 standards and can be viewed in popular XHTML and XForms renderers." This release adds Schematron support, can externalize text strings for localization, can place instance data inline, and runs with Java 5.
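The basic idea of generating a form from an instance document can be sketched in a few lines: walk the instance and emit one XForms input control per leaf element, bound by an absolute path. This is a hypothetical, greatly simplified stand-in, not IBM's generator; it assumes an instance with no namespaces, and it ignores schemas, attributes, and repeated siblings.

```python
import xml.etree.ElementTree as ET

XF = "http://www.w3.org/2002/xforms"
ET.register_namespace("xf", XF)  # serialize XForms elements with the xf: prefix

def generate_inputs(instance_xml: str) -> list[str]:
    """Return one serialized xf:input per leaf element of the instance."""
    root = ET.fromstring(instance_xml)
    controls = []

    def walk(elem, path):
        children = list(elem)
        if not children:
            # Leaf element: bind an input control to it and label it by name.
            ctrl = ET.Element(f"{{{XF}}}input", ref=path)
            label = ET.SubElement(ctrl, f"{{{XF}}}label")
            label.text = elem.tag
            controls.append(ET.tostring(ctrl, encoding="unicode"))
        for child in children:
            walk(child, f"{path}/{child.tag}")

    walk(root, "/" + root.tag)
    return controls
```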

IBM has also released the Visual XForms Designer, an Eclipse plug-in for graphically editing XForms. This product sits on top of the Eclipse Modeling Framework (EMF), Graphical Editing Framework (GEF), and Eclipse Web Tools Platform (WTP), which gives me very little hope that it will actually work. WTP in particular is a truly hideous, buggy piece of software, not up to the standards of the rest of the Eclipse platform. I'm not sure whether that's because WTP is bad software or because it's built on top of the shaky foundations of GEF and EMF. (I suspect a little of both.)

Both the XML Forms Generator and the Visual XForms Designer are part of the Emerging Technologies Toolkit (ETTK), which is a nice way of saying they're closed source and more than likely IBM will eventually abandon them without ever making them available for production use, either as closed or open source.

Sunday, April 9, 2006 (Permalink)

I've posted beta 9 of Jaxen 1.1, an open source (modified BSD license) XPath 1.0 engine for Java that is adaptable to many different object models including XOM, JDOM, DOM, and dom4j. Jaxen was originally written by James Strachan and Bob McWhirter. Beta 9 fixes an assortment of small issues. Most importantly it cleans up the license which was a little contradictory and confused. (Not all pieces had the same license.) The entire package is now released under the modified BSD license (no advertising clause).

Do not be fooled by the "beta" designation. This release has many fewer bugs and is much more conformant to the XPath specification than the official 1.0 release. We'll probably get around to calling it 1.1 final sometime later this year after closing a few more bugs and doing a little more work on performance. However, there's no reason to wait for that. If you're using Jaxen 1.0, you should upgrade to this beta.

Saturday, April 8, 2006 (Permalink)

IBM has updated the Compound XML Document Toolkit, a closed source Eclipse plugin for editing XML documents that use multiple namespaces.

The Compound XML Document Toolkit uses XML schemas to define the semantics of constructing documents spanning one or more namespaces. Those semantics include the order and placement of elements, the allowable child elements, and available attributes for each element.

Sample XML schema profiles for these XML-based standards are provided with the Compound XML Document Toolkit; documents having mark-up of the following types may therefore be created and edited immediately upon installation:

  • XHTML 1.1 + XForms 1.0
  • XHTML 1.1 + SVG 1.1
  • XHTML 1.1 + MathML 2.0
  • XHTML 1.1 + XForms 1.0 + SVG 1.1
  • SVG 1.1 + XHTML 1.1
  • SVG 1.1 + XHTML 1.1 + XForms 1.0
  • XHTML Mobile 1.1 + SVG Tiny 1.2
  • SVG Tiny 1.2 + XHTML Mobile 1.1
  • XHTML 1.1 + SVG 1.1 + MathML 2.0
  • XHTML 1.1 + VoiceXML 2.0
  • XHTML 1.1 + VoiceXML 2.0 + SVG 1.1
  • XHTML 1.1 + SMIL 2.0

The Compound XML Document Toolkit also provides tools for validating compound XML documents, in addition to one-step rendering of documents being edited.
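What makes a document "compound" is that the namespace changes partway down the tree, so a tool has to hand each subtree to the right vocabulary's rules. Here's a small sketch that finds those boundaries in a made-up XHTML + SVG document using ElementTree's {namespace}local-name tags. It's only an illustration of the concept, not how the toolkit works internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML 1.1 + SVG 1.1 compound document.
COMPOUND = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>A circle:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="20" height="20">
      <circle cx="10" cy="10" r="8"/>
    </svg>
  </body>
</html>"""

def namespace_of(tag: str) -> str:
    """ElementTree writes qualified names as '{namespace-uri}local-name'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

def namespace_switches(xml_text: str) -> list[tuple[str, str]]:
    """Return (local name, namespace) for each element whose namespace differs from its parent's."""
    switches = []

    def walk(elem, parent_ns):
        ns = namespace_of(elem.tag)
        if ns != parent_ns:
            switches.append((elem.tag.rsplit("}", 1)[-1], ns))
        for child in elem:
            walk(child, ns)

    walk(ET.fromstring(xml_text), None)
    return switches
```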

CXDE is based on the rather hideous Web Tools Platform though, so it's hard to recommend as a serious editor rather than a proof of concept. Conceptually, I'm very skeptical of the schema-based, strong typing ideal that drives this project. Personally I'm much more interested in products that treat schemas as suggestions rather than straitjackets. For instance, I don't mind an editor using a schema to suggest auto-complete options; but I don't want it to freak out if I add an xi:include element that isn't accounted for by the schema or paste in some invalid (but well-formed) legacy HTML.
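Here's a rough sketch of what "schema as suggestion" could look like. Well-formedness stays mandatory (ElementTree raises ParseError otherwise), but the schema only drives auto-complete and advisory notes, and xi:include elements are skipped rather than flagged. The EXPECTED_CHILDREN table is a hypothetical stand-in for a real schema, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical "schema": the child elements each element is expected to contain.
EXPECTED_CHILDREN = {
    "chapter": {"title", "para"},
    "para": set(),
    "title": set(),
}

XI = "http://www.w3.org/2001/XInclude"

def advise(xml_text: str) -> list[str]:
    """Return advisory notes; a schema mismatch never rejects the document."""
    root = ET.fromstring(xml_text)  # still raises ParseError if malformed
    notes = []
    for parent in root.iter():
        allowed = EXPECTED_CHILDREN.get(parent.tag)
        if allowed is None:
            continue  # element the schema doesn't know about: nothing to suggest
        for child in parent:
            if child.tag.startswith(f"{{{XI}}}"):
                continue  # XInclude is handled separately, so don't complain
            if child.tag not in allowed:
                notes.append(f"<{child.tag}> is not expected inside <{parent.tag}>")
    return notes

def completions(parent_name: str) -> list[str]:
    """Auto-complete suggestions for children of parent_name."""
    return sorted(EXPECTED_CHILDREN.get(parent_name, ()))
```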

Friday, April 7, 2006 (Permalink)

I've posted my XML class notes from last month's Software Development 2006 conference.

Thursday, April 6, 2006 (Permalink)

The W3C Web API Working Group has published the first public working draft of The XMLHttpRequest Object:

The XMLHttpRequest object is an interface exposed by a scripting engine that allows scripts to perform HTTP client functionality, such as submitting form data or loading data from a remote Web site.

The XMLHttpRequest object is implemented today, in some form, by many popular Web browsers. Unfortunately the implementations are not completely interoperable. The goal of this specification is to document a minimum set of interoperable features based on existing implementations, allowing Web developers to use these features without platform-specific code. In order to do this, only features that are already implemented are considered. In the case where there is a feature with no interoperable implementations, the authors have specified what they believe to be the most correct behavior.

Future versions of this specification (as opposed to future drafts of this version) may add new features, after careful examination from browser developers and Web content developers.

This specification was originally derived from the WHAT WG's Web Applications 1.0 document. The authors acknowledge the work of the WHAT WG in documenting existing behavior.

Wednesday, April 5, 2006 (Permalink)

Matthew Cruickshank has released Docvert, some "web service software" that "takes multiple word processor files (typically .doc) and converts them to Oasis OpenDocument. Web Service receives .doc file and converts it to a Oasis OpenDocument 1.0 which can then be converted to HTML, RSS, or any XML format. The resulting OpenDocument XML is then optionally converted to HTML or any XML. This is done with XML Pipelines, an approach that supports XSLT, breaking up content over headings or sections, and saving those results to multiple files (e.g., chapter1.html, chapter2.html…). The result is returned in a .zip file. Docvert is easy to integrate as it uses a simple REST-style interface, and it's released under the LGPL".
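The "breaking up content over headings" step is easy to picture with a sketch. This Python function splits a flat HTML body into chapterN.html fragments at each h1. It's a hypothetical illustration, not Docvert's pipeline (which works from OpenDocument via XSLT); it assumes no namespaces, that every chapter starts with an h1 directly under body, and that there's no whitespace between elements (ElementTree would otherwise carry it along as tail text).

```python
import xml.etree.ElementTree as ET

def split_at_headings(body_xml: str) -> dict[str, str]:
    """Map chapterN.html file names to the serialized elements of each chapter."""
    body = ET.fromstring(body_xml)
    chapters = []
    for child in body:
        # Each <h1> starts a new chapter; content before the first <h1>
        # becomes chapter 1 on its own.
        if child.tag == "h1" or not chapters:
            chapters.append([])
        chapters[-1].append(child)
    return {
        f"chapter{n}.html": "".join(ET.tostring(e, encoding="unicode") for e in elems)
        for n, elems in enumerate(chapters, start=1)
    }
```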


Adam Souzis has released Rx4RDF 0.6.0, a set of technologies designed to make the Resource Description Framework (RDF) easier to use. It includes:

  • RxPath for querying, transforming and updating RDF by specifying a deterministic mapping of the RDF model to the XPath data model
  • ZML, a Wiki-like text formatting language that lets you write arbitrary XML or HTML
  • RxML, yet another alternative XML serialization for RDF, this one designed for easy authoring in ZML
  • Raccoon, a simple application server that uses an RDF model for its data store
  • Rhizome, a content management and delivery system that runs on Raccoon
  • RDFScribbler, a web application that can display and edit any arbitrary RDF model using RxSLT and RxUpdate.

This release improves performance and integration with other RDF databases.

Tuesday, April 4, 2006 (Permalink)

Matthias Bethke has released HTML Sucks Completely 1.0a, a preprocessor that reads HTML files that use an extended syntax with macros, conditionals, variables, expressions, etc. HSC transforms these into static (X)HTML pages.

Monday, April 3, 2006 (Permalink)

Planamesa Software has released NeoOffice/J 1.2.2, a Mac port of OpenOffice 1.1. 1.2.2 is a bug fix release. I used 1.2 to give one tutorial at Software Development a couple of weeks back, and it seemed to work OK. Mac OS X 10.3 or later and a PowerPC Mac are required. NeoOffice is not compatible with the Intel Macs. (Another WORA failure.) NeoOffice is published exclusively under the GPL.

Saturday, April 1, 2006 (Permalink)

Antenna House, Inc. has released XSL Formatter 3.4 MR3 for Linux and Windows. This tool converts XSL-FO files to PDF. New features in 3.4 include Pantone colors and .NET 2.0 support. The lite version costs $300 and up on Windows and $900 and up on Linux/Unix, but is limited to 300 pages per document. Prices for the uncrippled version start around $1250 on Windows and $3000 on Linux/Unix.

Friday, March 31, 2006 (Permalink)

The W3C Technical Architecture Working Group (TAG) has published The Disposition of Names in an XML Namespace. This "document addresses the question of whether or not adding new names to a (published) namespace is a sound practice." Short answer: it depends. Slightly longer answer:

Specifications that define namespaces SHOULD explicitly state their policy with respect to changes in the names defined in that namespace.

For namespaces that are not immutable, the specification SHOULD describe how names may be given definitions (or have them removed) and by whom.

If a namespace document is provided, as [WebArch Vol 1] recommends, the namespace change policy SHOULD be stated in the namespace document.

As a general rule, resources on the web can and do change. In the absence of an explicit statement, one cannot infer that a namespace is immutable.

Wednesday, March 29, 2006 (Permalink)

The W3C XML Core Working Group has posted the candidate recommendation of the XML Linking Language (XLink) Version 1.1. There are three major changes in XLink 1.1 compared to 1.0:

  1. XLinks now contain IRIs rather than URIs
  2. All attributes in the XLink namespace are now reserved for future versions of XLink.
  3. Most importantly, the xlink:type="simple" attribute is no longer required.

That is, a simple link can now be written like this:

<composer xlink:href="http://www.beand.com/">Beth Anderson</composer>

It's no longer necessary to write this:

<composer xlink:type="simple" xlink:href="http://www.beand.com/">Beth Anderson</composer>

This is a good thing. I'm not sure who first came up with this idea, but I've been advocating it for a while now. This makes XLink a lot more palatable in applications like XHTML 2 and SVG.
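For anyone processing these links, the change amounts to a new default: an element with xlink:href and no xlink:type is treated as a simple link. Here's a minimal Python sketch of a link extractor that applies that default. Note that the xlink prefix must be bound to http://www.w3.org/1999/xlink, which the one-line examples above leave implicit; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK}}}href"
TYPE = f"{{{XLINK}}}type"

def simple_links(xml_text: str) -> list[tuple[str, str]]:
    """Return (element name, href) for each simple link in the document."""
    links = []
    for elem in ET.fromstring(xml_text).iter():
        href = elem.get(HREF)
        link_type = elem.get(TYPE, "simple")  # XLink 1.1: absent type means simple
        if href is not None and link_type == "simple":
            links.append((elem.tag, href))
    return links
```

Both the 1.1-style and 1.0-style composer elements are reported the same way.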

Tuesday, March 28, 2006 (Permalink)

David Heinemeier Hansson has released version 1.1 of Rails. New features include:

  • RJS: JavaScript written in Ruby
  • Bottomless eager loading
  • Polymorphic associations and join models
  • Integration tests
  • XML representations for records with to_xml


PDFTron Systems has released PDF2SVG 3.0, a $549 command-line application for converting PDF files to SVG. PDF2SVG works on Windows, Linux, and Mac OS X. PDF2SVG is also available as a library for C, C++, C#, Java, VB, and others.

Monday, March 27, 2006 (Permalink)

Worldlabel.com has released SVG Document Templates 1.0, a free collection of Scalable Vector Graphics (SVG) templates for mailing labels, address books, CD/DVD labels, and jewel case inserts.

Friday, March 24, 2006 (Permalink)

Planamesa Software has released NeoOffice/J 1.2.1, a Mac port of OpenOffice 1.1. 1.2.1 speeds up the application and fixes a few bugs. Mac OS X 10.3 or later and a PowerPC Mac are required. NeoOffice is not compatible with the Intel Macs. (Another WORA failure.) NeoOffice is published exclusively under the GPL.

Thursday, March 23, 2006 (Permalink)

The W3C Web Services Addressing Working Group has posted the proposed recommendations of Web Services Addressing 1.0 Core and Web Services Addressing - SOAP Binding. As expected, the working group pretty much ignored all the comments that warned them they were driving the wrong way down the highway (and on the wrong side of the road) and instead focused on the comments about which radio stations to play along the way to inevitable disaster. I don't really care if WS crashes and burns. I just hope they don't run into anybody else's car driving the other way down the road when they do it.

The core spec defines abstract generic extensions to the Infoset for endpoint references and message addressing properties. The binding spec describes how the abstract properties defined in the core spec are implemented in SOAP. The problem is that there already is an addressing system for the Web. It's called the URI, and web services addressing just adds complexity to that for no special benefit. In fact, it's pretty clear that it doesn't do anything except add complexity.

Here's the problem. Web Services Addressing "defines two constructs, message addressing properties and endpoint references, that normalize the information typically provided by transport protocols and messaging systems in a way that is independent of any particular transport or messaging system." In other words this is another example of the excessive genericity problem, just like DOM; and we all remember how well that worked. One of the fundamental problems with DOM was that the W3C tried to develop an architecture that could work for all conceivable programming languages; but developers didn't want and didn't need an API for all programming languages. They wanted an API that was tailored to their own programming language. This is why language-specific libraries like XOM and Amara are so much easier to use and more productive than DOM.

Web Services Addressing is trying to define an addressing scheme that can work over HTTP, SMTP, FTP, and any other protocol you can imagine. However, each of these protocols already has its own addressing system. Developers working with these protocols don't want and don't need a different addressing system that's marginally more compatible with some protocol they're not using in exchange for substantially less compatibility with the protocol they are using. Besides, nobody's actually doing web services over anything except HTTP anyway. Doesn't it just make more sense to use the well-understood, already implemented and debugged HTTP architecture for this instead of inventing something new?
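To see the duplication concretely, here's a sketch of a SOAP 1.2 envelope carrying a WS-Addressing 1.0 wsa:To header, next to the same destination read straight off the URI the way HTTP already does it. The endpoint and action URIs are made up for illustration, and the envelope is trimmed to the two headers that matter here.

```python
import urllib.parse
import xml.etree.ElementTree as ET

SOAP = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
WSA = "http://www.w3.org/2005/08/addressing"      # WS-Addressing 1.0 namespace

# Hypothetical message; the URIs are invented for this example.
ENVELOPE = f"""<s:Envelope xmlns:s="{SOAP}" xmlns:wsa="{WSA}">
  <s:Header>
    <wsa:To>http://example.com/orders/1234</wsa:To>
    <wsa:Action>http://example.com/GetOrder</wsa:Action>
  </s:Header>
  <s:Body/>
</s:Envelope>"""

def wsa_to(envelope: str) -> str:
    """Pull the destination out of the wsa:To header block."""
    root = ET.fromstring(envelope)
    return root.find(f"{{{SOAP}}}Header/{{{WSA}}}To").text

def http_target(uri: str) -> tuple[str, str]:
    """The same destination as plain HTTP sees it: Host header plus request path."""
    parts = urllib.parse.urlsplit(uri)
    return parts.netloc, parts.path
```

Everything wsa:To says is already in the Host header and request line of the HTTP request that carries the envelope.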

Wednesday, March 22, 2006 (Permalink)

The Mozilla Project has posted the first alpha of Firefox 2.0, code named "Bon Echo". New features include:

  • Places window for managing feeds, bookmarks and history
  • A close tab button on each tab
  • New data storage layer for bookmarks and history (using SQLite)
  • Extended search plugin format
  • Updates to the extension system to provide enhanced security and to allow for easier localization of extensions
  • Support for SVG text using svg:textPath

Tuesday, March 21, 2006 (Permalink)

Microsoft has posted a new beta (build 5335.5) of Internet Explorer 7. This requires Windows XP Service Pack 2. Uninstall any previous version first.

Monday, March 20, 2006 (Permalink)

Steve Cheng has posted docbook2X 0.8.6, an open source package for Unix that converts DocBook files to man pages and Texinfo. This release supports nested tables in man pages.

Sunday, March 19, 2006 (Permalink)

The W3C XForms Working Group has posted the second edition of XForms 1.0. Like the XML 1.0 second edition, this does not really change the language. It just corrects various errata.


Steve Palmer has released Vienna 2.0.2, an open source RSS/Atom client for Mac OS X. Vienna is the first reader I've found acceptable for daily use; not great but good enough. (Of course my standards for "good enough" are pretty high.) 2.0.2 improves performance and adds a Spanish localization.

Saturday, March 18, 2006 (Permalink)

Ranchero has posted the first public beta of NetNewsWire 2.1, a closed source RSS client for the Mac. It's available in both free-beer lite and $25 payware versions. Version 2.1 is a universal binary, can sync with NewsGator, fixes bugs, and improves performance.


The XML Apache Project has released XML::Xerces 2.7.0-0, a Perl wrapper around the Xerces C++ API. It provides access to most of the C++ API from Perl, except for "some functions in the C++ API which either have better Perl counterparts (such as file I/O) or which manipulate internal C++ information that has no role in the Perl module." XML::Xerces supports XML 1.0; DOM levels 1, 2, and 3; SAX 1 and 2, Namespaces, and W3C XML Schemas.

Friday, March 10, 2006 (Permalink)

Matt Mullenweg has released Wordpress 2.0.2, a blog engine based on PHP and MySQL. 2.0.2 fixes some security problems. All users should upgrade, including 1.5.x users. That version is vulnerable, and unlike Apache's older branches it is no longer being supported, not even with fixes for major security bugs.

Thursday, March 9, 2006 (Permalink)

There's been a cancellation for one of the XML classes at Software Development West in Santa Clara next week, so I'm looking for someone to fill the spot. If anyone is willing and able to give a class on Java-XML data binding (or really any sort of XML data binding) on Thursday, March 16 from 1:45 PM-3:15 PM, please drop me an e-mail right away. Thanks!


The OpenOffice Project has released OpenOffice 2.0.2, an open source office suite for Linux and Windows that saves all its files as zipped XML. This is mostly a bug fix release but also adds a Quattro Pro import filter and several new languages including Moore, Tsonga, Friulian, Bambara, Breton, Luxembourgish, Akan, Ndebele, and Venda. OpenOffice is dual licensed under the LGPL and Sun Industry Standards Source License.

Wednesday, March 8, 2006 (Permalink)

Andrea Marchesini has released libnxml 0.9, a C library for parsing, writing, and creating XML 1.0 and 1.1. Version 0.9 plugs memory leaks. libnxml is published under the LGPL.

Tuesday, March 7, 2006 (Permalink)

The W3C Web Services Addressing Working Group has posted the last call working draft of Web Services Addressing - WSDL Binding. According to the abstract, "Web Services Addressing provides transport-neutral mechanisms to address Web services and messages. Web Services Addressing 1.0 - WSDL Binding (this document) defines how the abstract properties defined in Web Services Addressing 1.0 - Core are described using WSDL." Changes in this draft include now allowing XML 1.1 and depending more on the infoset and less on the actual XML. (This is not an improvement.)

Monday, March 6, 2006 (Permalink)

Steve Palmer has released Vienna 2.0.1, an open source RSS/Atom client for Mac OS X. Vienna is the first reader I've found acceptable for daily use; not great but good enough. (Of course my standards for "good enough" are pretty high.) 2.0.1 is now a Universal binary. On top of that, it fixes assorted bugs.


Andrew Welch has released Kernow 1.2 (nee EasyTransformer), a cross-platform, open source graphical front end for Saxon written in Java. According to Welch, "Everything you would normally have to type into the command line is available through the mouse, with some extra features thrown in. If you have Schema Aware Saxon it will run that too." 1.2 can now validate documents against Relax NG and W3C schemas.

Sunday, March 5, 2006 (Permalink)

The W3C RDF Data Access Working Group has published the last call working draft of SPARQL Query Language for RDF. According to the introduction,

An RDF graph is a set of triples; each triple consists of a subject, a predicate and an object. This is defined in RDF Concepts and Abstract syntax. These triples can come from a variety of sources. For instance, they may come directly from an RDF document; they may be inferred from other RDF triples; or they may be the RDF expression of data stored in other formats, such as XML or relational databases. The RDF graph may be virtual, in that it is not fully materialized, only doing the work needed for each query to execute.

SPARQL is a query language for getting information from such RDF graphs. It provides facilities to:

  • extract information in the form of URIs, blank nodes, plain and typed literals.
  • extract RDF subgraphs.
  • construct new RDF graphs based on information in the queried graphs.

Here's a simple example SPARQL query adapted from the draft:

PREFIX  dc: <http://purl.org/dc/elements/1.1/>
PREFIX  : <http://example.org/book/>
SELECT  ?var
WHERE   { :book1  dc:title  ?var }

The ? marks a variable name. This query binds the title of :book1 to a variable named var. There are boolean and numeric operators as well. Strangely, you can also use a dollar sign to mark a variable name. As far as I can tell this is exactly the same as using a question mark. Why two forms? I don't know, but I can guess. This really smells of a massive and pointless argument within the working group that was resolved by agreeing to do both when only one was necessary.
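Under the hood, a query like this is pattern matching: each triple pattern is compared against the graph's triples, and variables collect whatever they line up with. Here's a toy Python sketch of matching a single triple pattern, with terms written as prefixed strings and made-up data. It treats ?var and $var as the same variable, which matches how SPARQL handles the two prefixes, but it is only an illustration of the idea, not a SPARQL engine.

```python
# Hypothetical data: (subject, predicate, object) triples.
TRIPLES = {
    (":book1", "dc:title", "SPARQL Tutorial"),
    (":book2", "dc:title", "The Semantic Web"),
}

def is_var(term: str) -> bool:
    # Both ?name and $name denote a variable.
    return term[:1] in ("?", "$")

def match(pattern, triples):
    """Return one {variable: value} binding per triple that fits the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for p, t in zip(pattern, triple):
            if is_var(p):
                name = p[1:]  # strip ? or $ so both spellings share a variable
                if binding.setdefault(name, t) != t:
                    break  # same variable would need two different values
            elif p != t:
                break  # constant term doesn't match
        else:
            results.append(binding)
    return results
```

Running `match((":book1", "dc:title", "?var"), TRIPLES)` gives `[{"var": "SPARQL Tutorial"}]`, and the `$var` spelling gives the same answer.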

Saturday, March 4, 2006 (Permalink)

The W3C Internationalization Activity has published the second working draft of Internationalization Tag Set (ITS). "ITS is designed to be used with new and existing schemas to support the internationalization and localization of schemas and documents. An implementation is provided for three schema languages: XML DTD, XML Schema and RELAX NG." For example an its:translate attribute specifies whether particular content is to be translated or not.
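Here's a sketch of how a localization tool might honor its:translate when pulling out text for translators. The flag is inherited by descendants, so everything inside an element marked translate="no" is skipped. The namespace URI is the one used by ITS 1.0 (the draft reported here may differ), and the sketch ignores global rules and the other ITS data categories.

```python
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"
TRANSLATE = f"{{{ITS}}}translate"

def translatable_text(xml_text: str) -> list[str]:
    """Collect non-blank text runs, skipping content marked its:translate="no"."""
    out = []

    def walk(elem, translate):
        translate = elem.get(TRANSLATE, translate)  # inherited unless overridden
        if translate == "yes" and elem.text and elem.text.strip():
            out.append(elem.text.strip())
        for child in elem:
            walk(child, translate)
            # A child's tail text belongs to this (parent) element.
            if translate == "yes" and child.tail and child.tail.strip():
                out.append(child.tail.strip())

    walk(ET.fromstring(xml_text), "yes")
    return out
```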

Friday, March 3, 2006 (Permalink)

SyncroSoft has released version 7.1 of the <Oxygen/> XML editor. Oxygen supports XML, XSL, DTDs, XQuery, SVG, Relax NG, Schematron, and the W3C XML Schema Language. Version 7.1 adds support for Berkeley dbXML and eXist, an XSL templates view, and an XPath builder view for editing complex XPath expressions. Oxygen costs $298 with support. Upgrades from 6.0 cost $130.

Thursday, March 2, 2006 (Permalink)

The XML Apache Project has released Xerces-J 2.8, a minor upgrade to the preeminent open source XML parser for Java. This release adds some features to the schema API, implements several errata in the W3C schema specification, and fixes the dtdjars target so it's once again possible to build a smaller version of Xerces without schema support.

Wednesday, March 1, 2006 (Permalink)

Andrew Welch has released Kernow 1.1 (nee EasyTransformer), a cross-platform, open source graphical front end for Saxon written in Java. According to Welch, "Everything you would normally have to type into the command line is available through the mouse, with some extra features thrown in. If you have Schema Aware Saxon it will run that too." This release adds support for XQuery.

Tuesday, February 28, 2006 (Permalink)

Michael Kay has released version 8.7 of Saxon, his XSLT 2.0 and XQuery processor. This release is now available for .NET as well as Java. According to Kay,

The porting of Saxon to .NET was pioneered by M. David Peterson, Pieter Siegers Kort, and others, and the process has now been brought in-house within Saxonica. Saxon is written in Java, and the same source code is used for both products. The port is achieved by cross-compiling the Java code to MSIL using the IKVMC compiler developed by Jeroen Frijters, and using a combination of classes from the GNU ClassPath library and the .NET Framework library for run-time support.

Saxon 8.7 on .NET goes beyond previous Saxon.NET releases by providing greater integration with the .NET platform: in particular the System.Xml parser and utilities. It also provides a brand-new API designed to match the stylistic conventions of .NET and to provide a uniform approach to XSLT, XQuery, XPath, and XML Schema processing.

Saxon is published in two versions for both of which Java 1.4 or later (or .NET) is required. Saxon 8.7B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 8.7SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standard XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers."

Monday, February 27, 2006 (Permalink)

Tomorrow, Tuesday, February 28, I'll be at the New York PHP Users Group to talk about RSS, Atom, OPML, and All That. The meeting starts at 6:30 P.M. You need to RSVP online to attend. Hope to see you there!

Friday, February 24, 2006 (Permalink)

XML in a Nutshell is going to be reprinted in a couple of weeks. This is not a new edition, just a reprint; but we do take the opportunity to fix any minor typos, small code bugs, and the like. If there are any problems you've noticed, please send them in; and I'll see what I can do about fixing them. I tend to do this work in spurts, so I recently went through my inbox and handled all the errata submitted over the last few months. However if you submitted something, and didn't get a recent response from me (probably because you didn't use the word Nutshell in the title so it got lost in my 3300 message inbox) it wouldn't hurt to ping me now either. Thanks.


Kiyut has released Sketsa 3.3, a $49 payware SVG editor written in Java. Version 3.3 adds snap-to-grid. Java 1.5 or later is now required.

Thursday, February 23, 2006 (Permalink)

Steve Palmer has released Vienna 2.0, an open source RSS/Atom client for Mac OS X. Vienna is the first reader I've found acceptable for daily use; not great but good enough. (Of course my standards for "good enough" are pretty high.) There's still one annoying AppleScript bug, and focus management is a little off. (i.e. the delete and arrow keys don't always work in the pane I expect them to.) However it's 90% there, and has all the basic features I require. I've also improved the experience a little by installing Feed Your Reader into Firefox so I can now add subscriptions to Vienna directly from Firefox.


Bill de hÓra and Joe Gregorio have posted the eighth public working draft of The Atom Publishing Protocol, a REST-based system for communicating with weblog servers. Significant changes here seem to be related to the handling of collections, media collections, and their metadata. I'll be talking about this (and other Atom topics) next week at the New York PHP User's Group.


buldocs has released xnsdoc 1.1, a €49 payware "documentation generator for XML namespaces defined by W3C XML Schema in HTML in a JavaDoc like visualization. xnsdoc supports all common schema design practices like chameleon, russian doll, salami slice, venetian blind schemas or circular schema references. xnsdoc can be used from the command line, as an Apache Ant Task, as an Apache Maven Plugin, as an eclipse plugin or integrated as a custom tool in many XML development tools such as StylusStudio, oXygen XML or XMLWriter." Version 1.1 adds an Eclipse plugin and fixes bugs.

Wednesday, February 22, 2006 (Permalink)

The W3C XSL Working Group has published the candidate recommendation of Extensible Stylesheet Language (XSL) Version 1.1. Despite the more generic name, this actually only covers XSL Formatting Objects, not XSL Transformations. New features in 1.1 include:

  • Multiple flows
  • Change marks
  • Back of the book indexing
  • Bookmarks
  • Markers in tables
  • fo:page-number-citation-last.
  • fo:page-sequence-wrapper
  • clear and float inside and outside
  • prefixes and suffixes for page numbers

Tuesday, February 21, 2006 (Permalink)

The W3C XML Schema Working Group has posted the last call working draft of XML Schema 1.1 Part 2: Datatypes. Notable changes from 1.0 include:

  • "0000" is a legal year and values with negative years map onto the timeline such that "the year 0000 is 1 B.C.E., the year –0001 is 2 B.C.E., etc."
  • Distinction between identity and equality; for instance positive and negative zero would be equal but not identical. Think of the difference between == and equals() in Java.
  • New yearMonthDuration and dayTimeDuration types
  • A precisionDecimal type that "retains information about the precision of the value. This type is aligned with the floating-point decimal types which will be part of the next edition of IEEE 754."
  • An anyAtomicType data type
  • Negative and positive zero are distinct, in conformance with IEEE 754

Changes since the last working draft are quite technical, and mostly involve validation rules. However no new types appear to have been added or removed.
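Two of these rules are easy to get wrong, so here's a small sketch. The year mapping follows the quoted rule (0000 is 1 B.C.E., -0001 is 2 B.C.E., so the B.C.E. number is 1 minus the year). Python floats follow IEEE 754, so they also show the equal-but-not-identical distinction for signed zeros; in Java, 0.0 == -0.0 is true while Double.valueOf(0.0).equals(-0.0) is false. The function names are hypothetical.

```python
import math

def xsd_year_label(year: int) -> str:
    """Map an XML Schema 1.1 year number onto a calendar label."""
    if year <= 0:
        return f"{1 - year} B.C.E."  # 0 -> 1 B.C.E., -1 -> 2 B.C.E.
    return f"{year} C.E."

def equal_but_not_identical(a: float, b: float) -> bool:
    """Equality ignores the sign of zero; the sign bit still tells them apart."""
    return a == b and math.copysign(1.0, a) != math.copysign(1.0, b)
```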

Monday, February 20, 2006 (Permalink)

Todd Ditchendorf has released AquaXSL 1.0, a free-as-in-beer XSLT debugger for Mac OS X 10.4 or later.

Friday, February 17, 2006 (Permalink)

The expired-but-not-dead-yet W3C HTML working group has published a proposed recommendation of XHTML™ Modularization 1.1. No, you didn't miss a step. They went straight from no spec at all to proposed recommendation with no working drafts, no last call, no candidate recommendations, no nothing. This is in pretty clear violation of every rule the W3C claims to operate by. The claim is that this is just an update of Modularization of XHTML™ and Modularization of XHTML™ in XML Schema. However this document actually introduces several completely new ideas that have never previously seen the light of day anywhere. Most notably, XHTML attributes can now be exposed to other vocabularies by placing them in the XHTML namespace. For example,

<Foo xmlns:xhtml="http://www.w3.org/1999/xhtml" xhtml:id="f1" xhtml:onkeypress="some Javascript" xhtml:class="notHTML" />
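Because these attributes are identified by namespace URI rather than by host element, a consumer can pick them out of any vocabulary. Here's a minimal Python sketch that reads the XHTML-namespace attributes off the Foo element in the example, keyed by local name. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

FOO = ('<Foo xmlns:xhtml="http://www.w3.org/1999/xhtml" xhtml:id="f1" '
       'xhtml:onkeypress="some Javascript" xhtml:class="notHTML" />')

def xhtml_attributes(xml_text: str) -> dict[str, str]:
    """Return the root element's XHTML-namespace attributes by local name."""
    prefix = f"{{{XHTML}}}"
    root = ET.fromstring(xml_text)
    return {k[len(prefix):]: v for k, v in root.attrib.items() if k.startswith(prefix)}
```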

Comments are due by March 6.

Wednesday, February 15, 2006 (Permalink)

The Mozilla Project has released Camino 1.0, a Mac OS X web browser based on the Gecko 1.8 rendering engine and the Quartz GUI toolkit. Camino is free for Mac OS X 10.2 through 10.4. It supports pretty much all the technologies that Mozilla does: HTML, XHTML, CSS, XML, XSLT, etc. Mac OS X 10.2 or later is required.

Tuesday, February 14, 2006 (Permalink)

The W3C Device Description Working Group has posted the first working draft of a note on Device Description Landscape. According to the note,

Developing Web content for mobile devices is more challenging than developing for the desktop Web. Compared to desktop Web clients, mobile Web devices come in a much wider range of shapes, sizes and capabilities. The mobile Web developer relies upon accurate device descriptions in order to dynamically adapt content to suit the client.

This Note describes what efforts the W3C and other organizations are doing in order to provide accurate device descriptions.

Monday, February 13, 2006 (Permalink)

The W3C Semantic Web Best Practices and Deployment Working Group has posted a note on A Survey of RDF/Topic Maps Interoperability Proposals. This note "records existing proposals for integrating data represented in W3C's RDF/OWL family of languages with data represented in ISO's Topic Maps. It is a starting point for establishing guidelines for combined usage of these standards, assuring interoperability."


The W3C Device Description Working Group has posted the first public working draft of Device Description Landscape, "a companion to Device Description Ecosystem. This draft describes the current state of the various options that exist for providing Device Descriptions to enable device-aware applications."

Sunday, February 12, 2006 (Permalink)

Todd Ditchendorf has released AquaPath 1.1, a free-beer Mac application that can "evaluate XPath 1.0 expressions against any XML document and view the result sequence in a dynamic, intuitive tree representation. AquaPath is based on Apple's Cocoa/Objective-C NSXML and WebKit Frameworks." 1.1 allows you to specify the context node for the XPath expression, shows line numbers in the source editor, and is now a Universal Binary. Mac OS X 10.4 is required.

Friday, February 10, 2006 (Permalink)

Andrea Marchesini has released libnxml 0.8, a C library for parsing, writing, and creating XML 1.0 and 1.1. Version 0.8 fixes bugs. libnxml is published under the LGPL.

Thursday, February 9, 2006 (Permalink)

The W3C Scalable Vector Graphics Working Group and Web API Working Group have posted the first public working drafts of Remote Events for XML (REX) 1.0 Requirements and Remote Events for XML (REX) 1.0. REX is "XML grammar intended for representing events as they are defined in DOM 3 Events, primarily but not exclusively for purposes of transmission or synchronisation of remote documents. Such a vocabulary would enable one endpoint to interact remotely with another endpoint holding a DOM representation by sending it DOM Events as if they had occurred directly at the same location." That is, it is

a transport agnostic XML syntax for the transmission of DOM events as specified in the DOM 3 Events specification [DOM3EV] in such a way as to be compatible with streaming protocols.

The first version of this specification deliberately restricts itself to the transmission of mutation events so as to remain limited in scope and allow for progressive enhancements to implementations over time rather than require a large specification to be deployed at once. The framework specified here is however compatible with the transmission of any other event type, and great care has been taken to ensure its extensibility and evolvability.

For example, this fragment might be used to insert a new item on this page:


  
    

text here

]]>

It's not immediately clear to me who's supposed to be sending or receiving these events.

Wednesday, February 8, 2006 (Permalink)

Opera Software has posted the second preview release of version 9.0 of their namesake free-beer web browser for Windows and Unix. (No Mac version yet.) Changes since the first preview include Opera Widgets, content blocking, and BitTorrent support. Opera supports XML and CSS. It now claims to support XSLT, but I haven't gotten that working yet.


However, Mac users interested in alternative browsers should not despair. The Mozilla Project has posted the first release candidate of Camino 1.0, a Mac OS X web browser based on the Gecko 1.8 rendering engine and the Quartz GUI toolkit. Version 1.0 mostly fixes bugs and speeds up the browser. This RC adds bookmark sorting and is now a universal binary. Camino is free for Mac OS X 10.2 through 10.4. It supports pretty much all the technologies that Mozilla does: HTML, XHTML, CSS, XML, XSLT, etc. Mac OS X 10.2 or later is required.

Tuesday, February 7, 2006 (Permalink)

The W3C Math Interest Group has published a note on Arabic mathematical notation. "This Note analyzes potential problems with the use of MathML for the presentation of mathematics in the notations customarily used with Arabic, and related languages. The goal is to clarify avoidable implementation details that hinder such presentation, as well as to uncover genuine limitations in the specification. These limitations in the MathML specification may require extensions in future versions of the specification."

Monday, February 6, 2006 (Permalink)

The W3C Voice Browser Activity has published the candidate recommendation of Semantic Interpretation for Speech Recognition (SISR) Version 1.0. According to the document,

This document defines the process of Semantic Interpretation for Speech Recognition and the syntax and semantics of semantic interpretation tags that can be added to speech recognition grammars to compute information to return to an application on the basis of rules and tokens that were matched by the speech recognizer. In particular, it defines the syntax and semantics of the contents of Tags in the Speech Recognition Grammar Specification [SRGS].

The results of semantic interpretation describe the meaning of a natural language utterance. The current specification represents this information as an ECMAScript object, and defines a mechanism to serialize the result into XML. The W3C Multimodal Interaction Activity [MMI] is defining an XML data format [EMMA] for containing and annotating the information in user utterances. It is expected that the EMMA language will be able to integrate results generated by Semantic Interpretation for Speech Recognition.

Comments are due by February 20.


The W3C Voice Browser Activity has also published the last call working draft of Pronunciation Lexicon Specification (PLS) Version 1.0. This is an XML syntax for specifying pronunciation lexicons for Automatic Speech Recognition and Speech Synthesis engines in voice browser applications. This draft makes support for the IPA phonetic alphabet mandatory and adds RELAX NG and W3C XML Schema Language schemas. "There is also a new section on multiple pronunciations, clarifying the use of the 'prefer' attribute. A lot of the previous text has been corrected or clarified, and a glossary of terms has been added." Comments are due by March 15.
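A pronunciation lexicon maps each written form (a grapheme) to one or more pronunciations (phonemes), and `prefer` picks among them. Here is a minimal Python sketch of that lookup. The namespace URI is the one used in the PLS drafts as I understand them, the sample lexicon is invented, and falling back to the first pronunciation when none is marked preferred is my reading of the draft; check the spec before relying on it.

```python
import xml.etree.ElementTree as ET

# PLS namespace (an assumption; verify against your copy of the draft).
PLS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

LEXICON = """<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>təmeɪtoʊ</phoneme>
    <phoneme prefer="true">təmɑtoʊ</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>either</grapheme>
    <phoneme>iːðər</phoneme>
    <phoneme>aɪðər</phoneme>
  </lexeme>
</lexicon>"""

def preferred_pronunciations(xml_text):
    """Map each grapheme to one pronunciation: the phoneme marked
    prefer="true" if any, otherwise the first one in document order."""
    root = ET.fromstring(xml_text)
    result = {}
    for lexeme in root.iter(PLS + "lexeme"):
        phonemes = lexeme.findall(PLS + "phoneme")
        if not phonemes:
            continue
        chosen = next((p for p in phonemes if p.get("prefer") == "true"),
                      phonemes[0])
        for grapheme in lexeme.findall(PLS + "grapheme"):
            result[grapheme.text] = chosen.text
    return result

print(preferred_pronunciations(LEXICON))
# {'tomato': 'təmɑtoʊ', 'either': 'iːðər'}
```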

Saturday, February 4, 2006 (Permalink)

The W3C HTML Working Group has published the proposed recommendation of XHTML-Print. According to the abstract, "XHTML-Print is member of the family of XHTML languages defined by the Modularization of XHTML [XHTMLMOD]. It is designed to be appropriate for printing from mobile devices to low-cost printers that might not have a full-page buffer and that generally print from top-to-bottom and left-to-right with the paper in a portrait orientation. XHTML-Print is also targeted at printing in environments where it is not feasible or desirable to install a printer-specific driver and where some variability in the formatting of the output is acceptable." In essence, this subsets XHTML with the features appropriate for printing. For instance, frames are not supported because "Frames depend on a screen interface and therefore are not applicable to printers." Comments are due by February 28.

Friday, February 3, 2006 (Permalink)

Planamesa Software has released NeoOffice/J 1.2, a Mac port of OpenOffice 1.1. New features in 1.2 include OpenOffice 2.0 document import, EPS image printing, and accessibility support. Mac OS X 10.3 or later and a PowerPC Mac are required. NeoOffice is not compatible with the Intel Macs. (Another WORA failure.) NeoOffice is published exclusively under the GPL.

Thursday, February 2, 2006 (Permalink)

The Mozilla Project has released Firefox 1.5.0.1. This is a bug fix release that includes some security fixes. It also restores compatibility with Mac OS X 10.2 Jaguar. All users should upgrade.

Wednesday, February 1, 2006 (Permalink)

Microsoft has posted the first public beta of Internet Explorer 7 for Windows. IE 7 provides some improved security (though I'd bet phisher sites are going to work around that in about a week) and adds support for RSS. I'm really curious to know if this passes the Acid 2 test yet. Windows XP SP 2 is required, and you can't run this simultaneously with IE 6. I'm still on 2000 myself, and I can't even see the page in Firefox on the Mac. Anyone want to report on this release?

Tuesday, January 31, 2006 (Permalink)

The Mozilla Project has released SeaMonkey 1.0. This is the continuation of the integrated Mozilla suite, and has XML support roughly equivalent to Firefox 1.5 (e.g. XML, XSLT, CSS, XHTML, etc.). There's something to be said for having the e-mail client, web editor, browser, and more rolled into one application. However, there's little to be said for maintaining the same ugly user interface of the old Mozilla builds. I didn't realize it until I had surfed with Firefox for some months and then tried switching back, but there's more to Firefox than just a stripped-down Mozilla. I can't quite put my finger on it, but Firefox just looks prettier than Mozilla/SeaMonkey does. It sounds trivial, but if you try using both, I think you'll vastly prefer Firefox.


Meanwhile, over in AOL-land Netscape has released version 8.1 of its namesake web browser for Windows based on Firefox 1.0.x. This release plugs some security holes. All users (both of them) should upgrade.

Monday, January 30, 2006 (Permalink)

Due to a scheduling mixup and some crossed signals, I will not be speaking at the Long Island PHP User's Group tonight as previously announced. However, Matt Surico and Chris Merlo will be presenting about "RSS Feeds" so it should be an interesting meeting anyway. I still hope to get out to Long Island sometime later this year. I will still be at the New York PHP Users Group on Tuesday, February 28 in Manhattan. See you there!


Syntext has released Serna 2.5.0, a $268 payware XSL-based WYSIWYG XML document editor for Mac OS X, Windows, and Unix. Features include on-the-fly XSL-driven XML rendering and transformation, on-the-fly XML Schema validation, XInclude, and spell checking. New features in 2.5 include the ability to edit a selected portion of a document as XML source without switching to text mode, and to comment and uncomment the selection. A roughly $500 enterprise edition adds a Python API and WebDAV support.

Sunday, January 29, 2006 (Permalink)

Roman Fordinal has posted some XSLT stylesheets for converting DocBook to the Oasis Open Document Format. This is published under the LGPL.

Saturday, January 28, 2006 (Permalink)

The W3C XQuery working group has published the first public working drafts of XQuery Update Facility and XQuery Update Facility Use Cases. XQuery as it currently exists is basically just SELECT in SQL terms. This is INSERT, UPDATE, and DELETE. More specifically, it defines:

  • upd:mergeUpdates
  • upd:revalidate
  • upd:applyUpdates
  • Update Primitives:
      • upd:insertBefore
      • upd:insertAfter
      • upd:insertInto
      • upd:insertIntoAsLast
      • upd:insertAttributes
      • upd:delete
      • upd:replaceValue
      • upd:rename
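In XQuery Update, an updating expression doesn't change anything while it runs. Instead it produces a pending update list of primitives like those above, and upd:applyUpdates applies the whole list at the end. Here is a loose Python/ElementTree analogy of that snapshot behavior. The tuple format and `apply_updates` function are my own invention, only a handful of primitives are mimicked, and there is no merging, conflict detection, or revalidation.

```python
import xml.etree.ElementTree as ET

def apply_updates(root, pending):
    """Apply a pending update list of (primitive, target, arg) tuples.

    Targets were all selected against the unmodified tree. Deletions are
    applied last so the other primitives can still find their targets.
    """
    # ElementTree elements don't know their parents, so record them once,
    # before anything changes.
    parents = {child: parent for parent in root.iter() for child in parent}
    deletes = [u for u in pending if u[0] == "delete"]
    others = [u for u in pending if u[0] != "delete"]
    for op, target, arg in others + deletes:
        if op == "insertInto":              # append arg as the last child
            target.append(arg)
        elif op in ("insertBefore", "insertAfter"):
            parent = parents[target]
            index = list(parent).index(target)
            parent.insert(index if op == "insertBefore" else index + 1, arg)
        elif op == "replaceValue":          # replace the text content
            target.text = arg
        elif op == "rename":
            target.tag = arg
        elif op == "delete":
            parents[target].remove(target)
        else:
            raise ValueError("unsupported primitive: " + op)

doc = ET.fromstring("<list><item>a</item><item>b</item></list>")
first, second = list(doc)
new = ET.Element("item")
new.text = "c"
pending = [
    ("delete", first, None),        # queued first, but applied last
    ("insertAfter", first, new),    # so first is still in the tree here
    ("rename", second, "entry"),
]
apply_updates(doc, pending)
print(ET.tostring(doc, encoding="unicode"))
# <list><item>c</item><entry>b</entry></list>
```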

Friday, January 27, 2006 (Permalink)

The Apache Project has released version 2.2 of their namesake web server. I don't recommend you run out and upgrade immediately, but if you're developing a sophisticated server side application on top of Apache for future deployment, this is probably what you should be writing to. There are a lot of important new features in this release including:

  • Vastly improved authentication options. Can we finally get rid of cookies for usernames and passwords, please?
  • Production-quality caching
  • Simpler Configuration
  • Graceful stop
  • Better Proxying
  • Updated Regular Expression Library
  • Smart Filtering
  • Support for files and request bodies larger than 2 gigabytes
  • Direct SQL Database Support

Apache 2 modules will need to be recompiled to support this release, but otherwise should not need to be changed.

Thursday, January 26, 2006 (Permalink)

x-port.net has released formsPlayer 1.4.1.1009, a free-beer (e-mail address required) "set of modules designed to make it easy to build XForms processors, editors and debuggers. These processors can run on a variety of platforms, using a range of user interfaces." New features in this release include:

  • Fully asynchronous submissions, with begin, success, and error events
  • No-click installation--fP now uses signed CAB files (Sounds dangerous to me. Does this mean it can autoinstall itself without user permission? If this is true, what else can do that?)
  • Ajax-style animation effects driven from the data model via simple CSS rules without scripting

Internet Explorer 6.0 SP1 is required.

Wednesday, January 25, 2006 (Permalink)

Andrew Welch has released EasyTransformer, a cross-platform, open source graphical front end for Saxon written in Java. According to Welch, "Everything you would normally have to type into the command line is available through the mouse, with some extra features thrown in. If you have Schema Aware Saxon it will run that too."

Tuesday, January 24, 2006 (Permalink)

Andrea Marchesini has released libnxml 0.5, a C library for parsing, writing, and creating XML 1.0 and 1.1. libnxml is published under the LGPL.


Todd Ditchendorf has released AquaPath 1.0.3, a free-beer Mac application that can "evaluate XPath 1.0 expressions against any XML document and view the result sequence in a dynamic, intuitive tree representation. AquaPath is based on Apple's Cocoa/Objective-C NSXML and WebKit Frameworks." 1.0.3 adds support for result sequences containing atomic values and now supports all the basic categories of XPath expressions, not just location paths. Mac OS X 10.4 is required.

Monday, January 23, 2006 (Permalink)

Todd Ditchendorf has released AquaPath 1.0.2, a free-beer Mac application that can "evaluate XPath 1.0 expressions against any XML document and view the result sequence in a dynamic, intuitive tree representation. AquaPath is based on Apple's Cocoa/Objective-C NSXML and WebKit Frameworks." This is a bug fix release. Mac OS X 10.4 is required.

Sunday, January 22, 2006 (Permalink)

The W3C XML Schema Working Group has posted the third public working draft of XML Schema 1.1 Part 2: Datatypes.

  • "0000" is a legal year and values with negative years map onto the timeline such that "the year 0000 is 1 B.C.E., the year –0001 is 2 B.C.E., etc."
  • Distinction between identity and equality; for instance positive and negative zero would be equal but not identical. Think of the difference between == and equals() in Java.
  • New yearMonthDuration and dayTimeDuration types
  • A precisionDecimal type that "retains information about the precision of the value. This type is aligned with the floating-point decimal types which will be part of the next edition of IEEE 754."
  • An anyAtomicType data type
  • Negative and positive zero are distinct, in conformance with IEEE 754
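The new year numbering and the equal-but-not-identical zeros are both easy to check mechanically. Here is a small Python sketch. The era labels are my own formatting, and Python floats stand in for the schema's floating-point types only because both follow IEEE 754.

```python
import math

def xsd_year_to_era(year):
    """Label a year number as used in the XML Schema 1.1 draft.

    Per the draft, 0000 is 1 B.C.E., -0001 is 2 B.C.E., and so on, so a
    non-positive year y corresponds to (1 - y) B.C.E.
    """
    if year >= 1:
        return f"{year} C.E."
    return f"{1 - year} B.C.E."

print(xsd_year_to_era(2006))   # 2006 C.E.
print(xsd_year_to_era(0))      # 1 B.C.E.
print(xsd_year_to_era(-1))     # 2 B.C.E.

# Equality versus identity for IEEE 754 zeros: the two zeros compare
# equal, but they are distinct values because the sign bit differs.
pos, neg = 0.0, -0.0
print(pos == neg)                                         # True
print(math.copysign(1.0, pos), math.copysign(1.0, neg))   # 1.0 -1.0
```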

Saturday, January 21, 2006 (Permalink)

Friday, January 20, 2006 (Permalink)

Todd Ditchendorf has released AquaPath 1.0.1, a free-beer Mac application that can "evaluate XPath 1.0 expressions against any XML document and view the result sequence in a dynamic, intuitive tree representation. AquaPath is based on Apple's Cocoa/Objective-C NSXML and WebKit Frameworks." This is a bug fix release. Mac OS X 10.4 is required.

Thursday, January 19, 2006 (Permalink)

GCA has moved XML 2006 across the country from the previously announced Seattle location to Boston. The new dates are December 4-8. I hope nobody bought non-refundable tickets yet.


Todd Ditchendorf has released AquaPath, a free-beer Mac application that can "evaluate XPath 1.0 expressions against any XML document and view the result sequence in a dynamic, intuitive tree representation. AquaPath is based on Apple's Cocoa/Objective-C NSXML and WebKit Frameworks." Mac OS X 10.4 is required.

Wednesday, January 18, 2006 (Permalink)

Code Synthesis has released xsd 1.8.0, an open source (GPL) W3C XML Schema language based data binding tool for C++.

Given an XML instance description (XML Schema), it generates C++ classes that represent the given vocabulary as well as parsing and serialization code (collectively called a mapping or binding).

Compared to APIs such as DOM and SAX, the generated code allows you to access the information in XML instance documents using your domain vocabulary instead of generic elements, attributes, and text. Static typing helps catch errors at compile-time rather than at run-time. Automatic code generation frees you for more interesting tasks while minimizing the effort needed to adopt your applications to changes in the document structure.

xsd supports two C++ mappings: in-memory C++/Tree and event-driven C++/Parser. The C++/Tree mapping consists of C++ classes that represent data types defined in XML Schema, a set of parsing functions that convert XML instance documents to a tree-like in-memory data structure, and a set of serialization functions that convert the in-memory representation back to XML....

The C++/Parser mapping provides parser templates for data types defined in XML Schema. Using these parser templates you can build your own in-memory representations or perform immediate processing of XML instance documents.

This release expands support for various compilers and schema features including base64Binary and hexBinary.
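xsd itself generates C++, but the idea behind the event-driven C++/Parser mapping can be sketched in Python. Below, a hand-written SAX handler handles each `<person>` record as it streams past and passes typed values to application code without building a tree. The `person`/`name`/`age` vocabulary is invented for the example, and none of this reflects the code xsd actually generates.

```python
import xml.sax
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

class PersonHandler(xml.sax.ContentHandler):
    """Build one typed Person per <person> element and pass it to a
    callback, instead of keeping a generic DOM tree around."""

    def __init__(self, on_person):
        super().__init__()
        self.on_person = on_person
        self.fields = {}
        self.text = []

    def startElement(self, name, attrs):
        self.text = []
        if name == "person":
            self.fields = {}

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        value = "".join(self.text).strip()
        if name in ("name", "age"):
            self.fields[name] = value
        elif name == "person":
            # Convert to domain types here; a non-numeric age fails loudly.
            self.on_person(Person(self.fields["name"], int(self.fields["age"])))

people = []
xml.sax.parseString(
    b"<people><person><name>Ada</name><age>36</age></person>"
    b"<person><name>Alan</name><age>41</age></person></people>",
    PersonHandler(people.append))
print(people)  # [Person(name='Ada', age=36), Person(name='Alan', age=41)]
```

Python only catches the bad `int()` at run time, of course; the compile-time checking the xsd docs mention is something the generated C++ gets from static typing.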


Kiyut has released Sketsa 3.2.3, a $49 payware SVG editor written in Java. Version 3.2.3 fixes bugs. Java 1.4.1 or later is required.

Tuesday, January 17, 2006 (Permalink)

Karl Waclawek has posted version 2.0 of Expat, a non-validating XML parser for C. This release adds an XML_LARGE_SIZE switch to enable 64-bit integers for byte indexes and line/column numbers, supports AmigaOS for the first time, and fixes assorted bugs.
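The byte index and line/column numbers are what Expat reports to applications about where each event occurred. XML_LARGE_SIZE is a compile-time switch that only widens the integers holding them. Python's standard `pyexpat` module wraps Expat, so it's a quick way to see these positions in action. This sketch doesn't check whether your Python's Expat was built with XML_LARGE_SIZE.

```python
import xml.parsers.expat

DOC = b"<doc>\n  <a/>\n  <b/>\n</doc>"

parser = xml.parsers.expat.ParserCreate()
positions = []

def start(name, attrs):
    # Where the current start-tag begins: 1-based line number,
    # 0-based column number, and 0-based byte offset into the input.
    positions.append((name, parser.CurrentLineNumber,
                      parser.CurrentColumnNumber, parser.CurrentByteIndex))

parser.StartElementHandler = start
parser.Parse(DOC, True)
print(positions)
# [('doc', 1, 0, 0), ('a', 2, 2, 8), ('b', 3, 2, 15)]
```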

Monday, January 16, 2006 (Permalink)

Tomorrow evening (Tuesday) I'll be talking about XOM at the XML Developers Network of the Capital District in Albany, New York. The meeting runs from 6:00 to 8:30 P.M. Everyone's invited.


The W3C has published the last call working draft of Scope of Mobile Web Best Practices 1.0. According to the draft,

The recommendations in this document are intended to improve mobile experience of the Web on mobile devices. While the recommendations are not specifically addressed at the desktop browsing experience it must be understood that they are made in the context of wishing to work towards 'One Web'.

As discussed in [Scope] One Web means making, as far as is reasonable, the same information and services available to users irrespective of the device they are using. However it does not mean that exactly the same information is available in exactly the same way across all devices. Some services and information are more suitable for and targeted at particular user contexts.

Some services have a primarily mobile appeal (location based services, for example). Some have a primarily mobile appeal but have a complementary desktop aspect (perhaps for complex configuration tasks). Still others have a primarily desktop appeal but a complementary mobile aspect (possibly for alerting). Finally there will remain some Web applications which have a primarily desktop appeal (lengthy reference material, rich images, perhaps).

It is likely that application designers and service providers will wish to provide the best possible experience in the context in which their service has the most appeal. However, while services may be most appropriately experienced in one context or another, it is considered best practice to provide a reasonable experience irrespective of the device, and as far as is possible, not to exclude access from any particular class of device.

From the perspective of this document this means that services should be available as some variant of HTML over HTTP.

Comments are due by February 17.

Sunday, January 15, 2006 (Permalink)

The W3C Compound Document Formats Working Group has published four last call working drafts. "When combining separate markup languages, specific problems have to be resolved that are not addressed by their individual language specifications, such as the propagation of events across namespaces, the combination of rendering or the user interaction model. Compound Document is the W3C term for a document that combines multiple formats." For example, a compound document might embed SVG and MathML in DocBook or SMIL and XForms in XHTML.

  • The Compound Document by Reference Framework 1.0 "defines a generic Compound Document Framework that defines a language-independent processing model for combining arbitrary document formats."
  • WICD Core 1.0 "specifies WICD Core 1.0, a device independent Compound Document profile based on XHTML, CSS and SVG."
  • WICD Full 1.0 defines "a Compound Document profile based on XHTML, CSS and SVG, targeted at desktop agents."
  • WICD Mobile 1.0 defines "a Compound Document profile based on XHTML, CSS and SVG, which is targeted at mobile agents."

Comments on all four are due by January 27.
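What makes a document "compound" is that elements from several namespaces share one tree, and a processor has to decide which component handles each one. As a small illustration (not anything the CDF drafts define), this Python sketch parses an invented XHTML page with inline SVG and MathML and counts elements per namespace, the first step any dispatching processor would take.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A tiny, made-up compound document: XHTML with inline SVG and MathML.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>A circle:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="20" height="20">
      <circle cx="10" cy="10" r="5"/>
    </svg>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mi>x</mi><mo>=</mo><mn>2</mn>
    </math>
  </body>
</html>"""

def elements_by_namespace(xml_text):
    """Count elements per namespace URI. ElementTree writes tags as
    {uri}local, so the namespace is the part inside the braces."""
    counts = Counter()
    for element in ET.fromstring(xml_text).iter():
        if element.tag.startswith("{"):
            uri = element.tag[1:].split("}")[0]
        else:
            uri = ""
        counts[uri] += 1
    return counts

for uri, n in sorted(elements_by_namespace(DOC).items()):
    print(n, uri)
# 4 http://www.w3.org/1998/Math/MathML
# 3 http://www.w3.org/1999/xhtml
# 2 http://www.w3.org/2000/svg
```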

Saturday, January 14, 2006 (Permalink)

SyncroSoft has released version 7.0 of the <Oxygen/> XML editor. Oxygen supports XML, XSL, DTDs, XQuery, SVG, Relax NG, Schematron, and the W3C XML Schema Language. New features in 7.0 include continuous validation, an XQuery debugger and profiler, and XInclude shorthand pointers. Oxygen costs $298 with support. Upgrades from 6.0 cost $130.

Friday, January 13, 2006 (Permalink)

Apple has released Safari 1.3.2 for Mac OS X 10.3.9 Panther. This release "improves website compatibility, application stability and support for 3rd party web applications." I think this version supports XML+CSS but not XSLT. For that you need Safari 2 on Tiger.

Thursday, January 12, 2006 (Permalink)

Sun has posted the maintenance review change log for JSR 206: Java API for XML Processing specification. According to Norm Walsh, "the biggest change is support for StAX Sources and Results." Comments are due by February 8.


Dennis Sosnoski has posted the first beta of JiBX 1.1, yet another open source (BSD license) framework for binding XML data to Java objects using your own class structures. It falls into the custom-binding document camp as opposed to the schema driven binding frameworks like JaxMe and JAXB. 1.1 adds support for StAX input and output.

Tuesday, January 10, 2006 (Permalink)


The W3C Technical Architecture Group (TAG) has published The Disposition of Names in an XML Namespace. "This Finding addresses the question of whether or not adding new names to a (published) namespace is a sound practice." Short version: yes, it is. However, "Specifications that define namespaces SHOULD explicitly state their policy with respect to changes in the names defined in that namespace. For namespaces that are not immutable, the specification SHOULD describe how names may be given definitions (or have them removed) and by whom."


Altsoft N.V. has posted a beta of Xml2PDF 3.0, a $49 payware Windows program for converting XSL-FO, SVG, WordML, MathML, and XHTML documents into PDF, SVG, BMP, EMF, TIFF, GIF, JPEG, PNG, or WMF files. 3.0 should be faster and adds support for MathML input and SVG and bitmapped graphic output. "The formatting engine is implemented as pure managed .NET components and can be easily integrated into any .NET-based solution. It does not use any third-party applications or COM components."

Monday, January 9, 2006 (Permalink)

The Shiira Project has released Shiira 1.2, an open source (Modified BSD license) Mac OS X web browser based on Web Kit and written in Cocoa. "The goal of the Shiira Project is to create a browser that is better and more useful than Safari." They've failed. Shiira assumes all XML files are RSS files. Bad browser. No cookie. I know this isn't the first browser in which I've seen this particular brain damage, but Safari doesn't have this bug. Mac OS X 10.3.9 or later is required.
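Telling a feed apart from any other XML document doesn't take much: look at the root element. Here is a minimal Python sketch of that check. It's my own heuristic, not Shiira's or Safari's actual logic, and a real browser would also weigh the HTTP Content-Type.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1 = "http://purl.org/rss/1.0/"

def feed_kind(xml_bytes):
    """Classify a document by its root element instead of assuming that
    every XML document is a feed. Returns None for non-feeds."""
    root = ET.fromstring(xml_bytes)
    if root.tag == "{%s}feed" % ATOM:
        return "Atom"
    if root.tag == "rss":
        return "RSS " + root.get("version", "(unknown version)")
    if root.tag == "{%s}RDF" % RDF and root.find("{%s}channel" % RSS1) is not None:
        return "RSS 1.0"
    return None  # some other XML vocabulary: render it, don't subscribe

print(feed_kind(b'<feed xmlns="http://www.w3.org/2005/Atom"/>'))   # Atom
print(feed_kind(b'<rss version="2.0"><channel/></rss>'))           # RSS 2.0
print(feed_kind(b'<html xmlns="http://www.w3.org/1999/xhtml"/>'))  # None
```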


Daniel Veillard has released version 2.6.23 of libxml2, the open source XML C library for Gnome that supports XML Base, XInclude, xml:id, XML Catalogs, DTDs, RELAX NG, and W3C XML Schemas. This release fixes assorted minor bugs.


The Apache Web Services Project has posted version 0.5.1 of JaxMe 2, an open source implementation of the Java API for XML Binding. Quoting from the web page,

JaxMe 2 is an open source implementation of JAXB, the specification for Java/XML binding.

A Java/XML binding compiler takes as input a schema description (in most cases an XML schema but it may be a DTD, a RelaxNG schema, a Java class inspected via reflection or a database schema). The output is a set of Java classes:

  • A Java bean class compatible with the schema description. (If the schema was obtained via Java reflection, then the original Java bean class.)
  • An unmarshaller that converts a conforming XML document into the equivalent Java bean.
  • Vice versa, a marshaller that converts the Java bean back into the original XML document.

In the case of JaxMe, the generated classes may also

  • Store the Java bean into a database. Preferrably an XML database like eXist, Xindice, or Tamino, but it may also be a relational database like MySQL. (If the schema is sufficiently simple. :-)
  • Query the database for bean instances.
  • Implement an EJB entity or session bean with the same abilities.

According to Jochen Wiedmann, 0.5.1 "is a bug fix release with no major changes, except that preliminary support for external schema bindings has been added."
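JaxMe generates Java, but the bean/unmarshaller/marshaller triple described above is easy to picture with a Python stand-in. The `Book` class and its element names are invented for the example. This is a hand-written analogue of what a binding compiler produces, not JaxMe or JAXB output.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Book:
    """Plays the role of the generated Java bean."""
    isbn: str
    title: str
    pages: int

def unmarshal(xml_text):
    """Unmarshaller: conforming XML document -> bean."""
    root = ET.fromstring(xml_text)
    return Book(isbn=root.get("isbn"),
                title=root.findtext("title"),
                pages=int(root.findtext("pages")))

def marshal(book):
    """Marshaller: bean -> XML document."""
    root = ET.Element("book", isbn=book.isbn)
    ET.SubElement(root, "title").text = book.title
    ET.SubElement(root, "pages").text = str(book.pages)
    return ET.tostring(root, encoding="unicode")

xml_in = '<book isbn="1234"><title>XML Example</title><pages>300</pages></book>'
book = unmarshal(xml_in)
print(book.pages + 1)           # 301: pages is an int, not a string
print(marshal(book) == xml_in)  # True: this round trip is lossless
```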

Sunday, January 8, 2006 (Permalink)

The W3C Mobile Web Initiative Best Practices Working Group has published a note on Scope of Mobile Web Best Practices and the second public working draft of Mobile Web Best Practices 1.0. According to the abstract of Scope of Mobile Web Best Practices:

Web access from mobile devices suffers from problems that make the Web unattractive for most mobile users. W3C's Mobile Web Initiative (MWI ) proposes to address these issues through a concerted effort of key players in the mobile value chain, including authoring tool vendors, content providers, handset manufacturers, browser vendors and mobile operators.

To help frame the development of "best practices" for the mobile Web this document - created by the members of the Mobile Web Initiative Best Practices Working Group ( BPWG) as an elaboration of its charter - identifies the nature of problems to be solved, outlines the scope of work to be undertaken and specifies the assumptions regarding the target audience and the anticipated deliverables.

Most of the actual practices suggested make sense, though I take issue with a couple. They are as follows:

Saturday, January 7, 2006 (Permalink)

The W3C XML Key Management Working Group has published a note about Using XKMS with PGP.

The XML Key Management Specification (XKMS 2.0) [XKMS] aims at providing a PKI independent interface to key management. XKMS services comprise discovery and validation of keys as well as support for certain aspects of the key life cycle management, including registration, reissuance and revocation.

XKMS employs XML Signature [XMLSIG] for the purpose of providing message security in the form of authentication and integrity. In addition, XKMS is based on the use of the <ds:KeyInfo> element as a means of transporting key information used as templates for the various operations it specifies.

This technical note addresses some of the issues related to the use of XKMS in conjunction with PGP.
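XML Signature already defines a `ds:PGPData` child of `ds:KeyInfo` for PGP material, with `PGPKeyID` and `PGPKeyPacket` children holding base64-encoded data. Here is a minimal Python sketch of pulling a key ID out of such an element. The document and key ID are made up, and whether the note recommends exactly this layout for XKMS messages is an assumption.

```python
import base64
import xml.etree.ElementTree as ET

DS = "{http://www.w3.org/2000/09/xmldsig#}"

# Invented example: a KeyInfo carrying a PGP key ID (bytes 01 23 ... EF).
KEY_INFO = """<ds:KeyInfo xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
  <ds:PGPData>
    <ds:PGPKeyID>ASNFZ4mrze8=</ds:PGPKeyID>
  </ds:PGPData>
</ds:KeyInfo>"""

def pgp_key_ids(key_info_xml):
    """Return the PGP key IDs found in a ds:KeyInfo element, as hex."""
    root = ET.fromstring(key_info_xml)
    return [base64.b64decode(e.text.strip()).hex().upper()
            for e in root.iter(DS + "PGPKeyID")]

print(pgp_key_ids(KEY_INFO))  # ['0123456789ABCDEF']
```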

Friday, January 6, 2006 (Permalink)

Sun's released JAXP 1.3.1. JAXP 1.3.x includes SAX, DOM, schema support, XSLT, XPath, and XInclude. 1.3.1 fixes a long list of mostly minor bugs.


Norm Walsh has released XML Unicode 1.7, a Lisp library for inserting Unicode characters into XML in Emacs. "This version of xmlunicode improves the single-quote cycling, supporting apostrophe in the cycle, and fixes a small dependency bug." It's published under the GPL.

Thursday, January 5, 2006 (Permalink)

I've been getting serious about Atom and RSS lately, and using them to enable some new cool features here and elsewhere. You may have noticed the links on Cafe au Lait to the news items on my other sites, as well as the spiffy new full text Atom feeds. More is coming. I'm finding new uses of this technology everywhere I look.

Yes, I know I'm a little late to the party, but I just couldn't stomach the crash prone, unreadable, crippled, privacy invading, Web architecture violating, brain damaged, catty, and unusable software and specs that have been passed around in this space. However now that Vienna has crossed my personal usability threshold for a feed reader and the Atom group has begun producing XML-sane, HTTP-savvy specifications that don't trigger my gag reflex, I'm finding these technologies to be ready for real work.

To explore some of the issues, over the next few weeks I'm going to be doing a mini-usergroup tour here in the Northeast. I'll be talking about how you can use RSS and Atom to manage, read, and publish all sorts of information. Hint: it's not just about weblogs. Dates and locations are as follows:

OK. XOM and JUnit Code Coverage don't have that much to do with Atom; but I'll try to work something in; maybe an Atom example or two in the XOM talk. If you'd like me to reprise one of these talks for your group, drop me an email. Hope to see you at one or more of these!

Wednesday, January 4, 2006 (Permalink)

Planamesa Software has posted the first beta of NeoOffice/J 1.2, a Java-based Mac port of the open source OpenOffice 1.1 suite. New features in 1.2 include support for the OpenOffice 2.0 file format, EPS images, and accessibility features. Mac OS X 10.3 or later is required. NeoOffice is published exclusively under the GPL.


Chris Pederick has released the Web Developer Extension 1.0 for Firefox, Flock, and Mozilla. It has lots of neat features. I use it to set my browser window size for taking book screenshots and testing out presentations. It's also very useful for debugging weird CSS layout issues; though that's hardly all it does.

Tuesday, January 3, 2006 (Permalink)

The W3C CSS Working Group has published a new working draft of CSS3 module: Cascading and inheritance. "This CSS3 module describes how values are assigned to properties. CSS allows several style sheets to influence the rendering of a document, and the process of combining these style sheets is called “cascading”. If no value can be found through cascading, a value can be inherited from the parent element or the property's initial value is used....The main purpose of this module is to rewrite the relevant parts of CSS2 as a module for CSS3. With the exception of the 'initial' value and the optional title for '@import' and '@media', all features described in this module also exist in CSS2. Compared to CSS2, the cascading order has been changed in two cases as noted in the text."
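The three-step fallback described there (cascade, then inherit, then use the initial value) fits in a few lines. This Python sketch is deliberately simplified: it ignores origins (user agent, user, author) and `!important`, which the real cascade sorts on before specificity, and the inherited/initial tables are a tiny illustrative subset I chose, not the full property list.

```python
from dataclasses import dataclass

@dataclass
class Declaration:
    prop: str
    value: str
    specificity: tuple  # (ids, classes, types), compared left to right
    order: int          # position in the style sheets

INHERITED = {"color", "font-family"}  # small illustrative subset
INITIAL = {"color": "black", "font-family": "serif", "border-style": "none"}

def specified_value(prop, declarations, parent_values):
    """Resolve one property for one element.

    1. Cascade: among matching declarations, the highest specificity
       wins, and later declarations win ties.
    2. Otherwise inherit from the parent if the property is inherited.
    3. Otherwise use the property's initial value.
    """
    matching = [d for d in declarations if d.prop == prop]
    if matching:
        return max(matching, key=lambda d: (d.specificity, d.order)).value
    if prop in INHERITED and prop in parent_values:
        return parent_values[prop]
    return INITIAL[prop]

decls = [
    Declaration("color", "blue", (0, 0, 1), order=1),  # p { color: blue }
    Declaration("color", "red", (0, 1, 1), order=0),   # p.note { color: red }
]
parent = {"color": "green", "font-family": "Helvetica", "border-style": "solid"}
print(specified_value("color", decls, parent))         # red
print(specified_value("font-family", decls, parent))   # Helvetica
print(specified_value("border-style", decls, parent))  # none
```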


Dave Beckett has released the Raptor RDF Parser Toolkit 1.4.8, an open source C library for parsing the RDF/XML, N-Triples, Turtle, and Atom Resource Description Framework formats. It uses expat or libxml2 as the underlying XML parser. Version 1.4.8 adds Atom support. Raptor is dual licensed under the LGPL and Apache 2.0 licenses.
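Raptor does this properly in C. To show how simple the line-based N-Triples format is, here is a deliberately simplified Python reader. It's a sketch, not a conformant parser: it doesn't unescape literals, validate IRIs, or accept every blank node label the grammar allows. The sample triples are invented.

```python
import re

# One N-Triples term: an IRI <...>, a blank node _:label, or a quoted
# literal with an optional @language tag or ^^<datatype>.
IRI = r'<[^>]*>'
BNODE = r'_:[A-Za-z0-9]+'
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^<[^>]*>)?'
TRIPLE = re.compile(
    rf'^\s*({IRI}|{BNODE})\s+({IRI})\s+({IRI}|{BNODE}|{LITERAL})\s*\.\s*$')

def parse_ntriples(text):
    """Yield (subject, predicate, object) tuples in their raw N-Triples
    spelling, skipping blank lines and comment lines."""
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"line {number}: not a triple: {line!r}")
        yield match.groups()

SAMPLE = '''# two made-up triples
<http://example.org/doc> <http://purl.org/dc/elements/1.1/title> "XML News"@en .
_:b1 <http://xmlns.com/foaf/0.1/name> "Alice" .
'''
for triple in parse_ntriples(SAMPLE):
    print(triple)
```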


Dennis Sosnoski has released JiBX 1.0.1, yet another open source (BSD license) framework for binding XML data to Java objects using your own class structures. It falls into the custom-binding document camp as opposed to the schema driven binding frameworks like JaxMe and JAXB. 1.0.1 restores Java 1.3 compatibility.

Monday, January 2, 2006 (Permalink)

The W3C CSS Working Group has published the third public working draft of Multi-column layout in CSS. This proposal defines three groups of new properties to support multi-column layouts in CSS3. The first group sets the number and width of the columns:

  • column-count
  • column-width
  • columns

Properties in the second group specify the amount of space and rules between columns:

  • column-gap
  • column-rule
  • column-rule-color
  • column-rule-style
  • column-rule-width

The third group controls column breaks:

  • column-break-before
  • column-break-after
  • column-break-inside
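How the first two groups interact is easiest to see with numbers. The Java sketch below assumes one plausible rule: fit as many columns at least column-width wide as the available width allows (counting the gaps), cap that at column-count if both are given, then stretch the columns to fill the space. Later drafts of the module spell out essentially this computation in pseudo-code, but the wording of this draft may differ. The ColumnLayout class, the AUTO sentinel (standing in for the keyword 'auto'), and the layout() method are hypothetical; all lengths are in pixels.

```java
// Hypothetical sketch: deriving the used column count and width
// from column-count, column-width, and column-gap.
public class ColumnLayout {
    /** Hypothetical sentinel standing in for the CSS keyword 'auto'. */
    static final int AUTO = 0;

    record Columns(int count, double width) {}

    static Columns layout(double available, int count, double width, double gap) {
        if (width == AUTO && count == AUTO) {
            return new Columns(1, available); // not a multi-column element
        }
        if (width == AUTO) {
            // Only column-count given: divide the space left after the gaps evenly.
            return new Columns(count, Math.max(0, (available - (count - 1) * gap) / count));
        }
        // column-width given: fit as many columns at least that wide as possible...
        int n = Math.max(1, (int) Math.floor((available + gap) / (width + gap)));
        if (count != AUTO) {
            n = Math.min(count, n); // ...but never more than column-count...
        }
        // ...then stretch them so columns plus gaps fill the available width.
        return new Columns(n, (available + gap) / n - gap);
    }

    public static void main(String[] args) {
        // 600px available, column-width: 180px, column-gap: 30px
        System.out.println(layout(600, AUTO, 180, 30)); // Columns[count=3, width=180.0]
        // same, plus column-count: 2
        System.out.println(layout(600, 2, 180, 30));    // Columns[count=2, width=285.0]
        // column-count: 3 only
        System.out.println(layout(600, 3, AUTO, 30));   // Columns[count=3, width=180.0]
    }
}
```

Under this rule, column-width acts as a minimum width rather than an exact one, which is why capping the count at two widens each column to 285px.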

Sunday, January 1, 2006 (Permalink)

Happy New Year. I've got a little present for everyone today. I've often been asked for full text feeds, and I've responded that this would only happen "at such time as someone writes an RSS client that gives a user experience at least equal to a real Web browser." I also stipulated that said client had to be open source and run on my platform of choice. Privacy-invasive server-side solutions like Bloglines need not apply.

Well, about a month ago, that happened. Steve Palmer released Vienna 2, the first open source, client-based feed reader for the Mac that passes the "It doesn't suck" test. There are still a few minor user interface glitches, mostly related to keyboard shortcuts and panel focus, but overall reading feeds in Vienna is equal or superior to reading them in a browser. And thus I am ready to announce something new:

Cafe con Leche is now publishing a full text feed. It includes not only the complete text of today's news but also the quote of the day, all recommended reading links, and all recent news. If you prefer to read this site in a feed reader like Vienna, go right ahead. You won't miss anything.

The feed is valid Atom 1.0. Every hour or so, cron fires up an XSLT 1.0 stylesheet that scrapes this web page and generates the feed. If you'd like something added to the feed, holler. However, if your reader has trouble handling this feed, please file a bug with the reader's vendor or download a better reader like Vienna. I don't have the time or inclination to work around every bug in every feed reader on the planet.
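The scrape-and-transform step can be sketched in Java with the JDK's built-in javax.xml.transform API, which runs XSLT 1.0 stylesheets. The stylesheet here is a hypothetical, heavily simplified stand-in, not the one that actually generates this feed. It assumes each news item is a well-formed, un-namespaced div with class="news", an id, an h3 title, and a single p body, and it hard-codes placeholder example.org IDs, an author name, and timestamps. The FeedScraper class and toAtom() method are likewise invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: scrape a simplified news page into an Atom 1.0 feed.
public class FeedScraper {

    // Simplified stand-in stylesheet; IDs, author, and timestamps are placeholders.
    static final String XSLT = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns="http://www.w3.org/2005/Atom">
          <xsl:output method="xml" indent="yes"/>
          <xsl:template match="/">
            <feed>
              <title><xsl:value-of select="/html/head/title"/></title>
              <id>http://example.org/news/</id>
              <updated>2006-01-01T00:00:00Z</updated>
              <author><name>Site Author</name></author>
              <xsl:for-each select="//div[@class='news']">
                <entry>
                  <title><xsl:value-of select="h3"/></title>
                  <id>http://example.org/news/#<xsl:value-of select="@id"/></id>
                  <updated>2006-01-01T00:00:00Z</updated>
                  <content type="text"><xsl:value-of select="p"/></content>
                </entry>
              </xsl:for-each>
            </feed>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical sample page: well-formed markup with no XHTML namespace.
    static final String PAGE = """
        <html>
          <head><title>XML News</title></head>
          <body>
            <div class="news" id="d2006-01-03"><h3>Raptor 1.4.8</h3><p>Version 1.4.8 adds Atom support.</p></div>
            <div class="news" id="d2006-01-02"><h3>Multi-column layout</h3><p>Third public working draft.</p></div>
          </body>
        </html>
        """;

    // Compile the stylesheet and apply it to the page, returning the feed as a string.
    static String toAtom(String page) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(page)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(toAtom(PAGE));
    }
}
```

A real page would more likely be HTML or namespaced XHTML, which would need either a tidying step or namespace-prefixed XPath expressions in the stylesheet.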

Now, one word of warning: just because I'm publishing full text feeds does not mean I'm giving up my copyrights, any more than publishing a web page does. No one but me has the right to take my articles and republish them on another site. You're welcome to read them in your personal reader, just like you can read them in a web browser. I'm not going to get too worried if your personal reader is server-based, like Bloglines. If you want to let someone track every article you read and every click you make, that's your business. But otherwise, republishing this content without prior permission is prohibited by U.S. and international law. The RSS summary feeds will remain available for any site that wants to aggregate the headlines from Cafe con Leche along with other sites. Full text, though, is available exclusively here.



Copyright 2006 Elliotte Rusty Harold
elharo@ibiblio.org