2007 XML News

Monday, December 31, 2007 (Permalink)

On the last day of the year, IBM developerWorks has published my look back at 2007 in the world of XML: XML 2007 Year in review. Between XQuery, Atom, OpenDocument, and OOXML, 2007 was probably the most exciting year we've had since the dot bomb. I think we're finally heading up the second bump of a classic double bump adoption curve.

Sunday, December 30, 2007 (Permalink)

Another day, another WordPress security bug. Matt Mullenweg has released WordPress 2.3.2, an open source (GPL) blog engine based on PHP and MySQL. This release fixes a bug that exposes draft posts to everyone. All users should upgrade.

Saturday, December 29, 2007 (Permalink)

XMLMind has released version 3.7.0 of their XML Editor. This $250 payware product features word processor and spreadsheet like views of XML documents. This release adds support for DITA 1.1. A free-beer hobbled version is also available.

Friday, December 28, 2007 (Permalink)

AOL has finally read the writing on the wall and decided to discontinue Netscape. Netscape always struck me as a company in the right place at the right time. However they never had the business sense or the technical skills to really capitalize on what they stumbled into. Of course, Firefox shall live on. In many ways Firefox is what Netscape should have been. It is the browser that learned from Netscape's mistakes. (Most others--Opera and IE included--never did.)

Thursday, December 27, 2007 (Permalink)

The W3C Scalable Vector Graphics Working Group has posted the last call working drafts of SVG Print 1.2, Part 2: Language and SVG Print 1.2, Part 1: Primer. According to the primer:

Because of its scalable, geometric nature, SVG is inherently better suited to print than raster image formats. The same geometry can be displayed on screen and on a printer, with identical layout in both but taking advantage of the higher resolution of print media. The same colors can be output, using an ICC-based color managed workflow on the printer and an sRGB fallback approximation on screen. This has been true since SVG 1.0, and so SVG has been used in print workflows (for example, in combination with XSL FO) as well as on screen.

However, SVG also has dynamic, interactive features such as declarative animation, scripting, timed elements like audio and video, and user interaction such as event flow and link activation. None of these are applicable to a print context. SVG 1.1 gives static and dynamic conformance classes, but further guidance on what exactly SVG Printers should do with such general content is helpful. The SVG Print specification defines processing rules for handling such general purpose content which was not designed to be printed, but which may be encountered anyhow.

It is common in cross-media publishing to design content which will be used both online and in print media. This specification gives guidance on how to create such content and how to indicate that it has been adapted to improve its print capability.

Lastly, it is possible to generate SVG which is exclusively intended for print (for example, a printer which natively understands SVG). This content might be created in an illustration program, or it might be an output from a layout program, such as an XSL-FO renderer; or it might be generated by an SVG Print driver. This specification defines conformance classes for software which reads this type of SVG, and also a conformance class for SVG Print content.

Comments on both are due by February 8.

Wednesday, December 26, 2007 (Permalink)

The GCA has posted the Call for Participation for XTech 2008, the largest European XML conference (even if it's not strictly limited to XML any more). "Suggested topic areas include social network platforms, identity management, Ajax, web of data, databases, operations, programming, browsers and mobile devices." This year XTech takes place in Dublin, Ireland from May 6-9. Submissions are due by January 25. It's a nice conference, but I'm not currently planning on attending myself. Travel is just becoming too difficult.

Tuesday, December 25, 2007 (Permalink)

Andrew Welch has released Kernow 1.6, a cross-platform, open source graphical front end for Saxon written in Java. New features in this release include Saxon 9 support, Schematron, and an XSLT sandbox. According to Welch, "This version is currently only available via Java Web Start because we've added some new features and I'd like to try them out first before committing to a "proper" release."

Monday, December 24, 2007 (Permalink)

The Helsinki University of Technology has released X-Smiles 1.1, a proof-of-concept XForms engine written in Java. Version 1.1 adds support for XBL 2 bindings.

Sunday, December 23, 2007 (Permalink)

The W3C has released version 9.99 of Amaya, their open source testbed web browser and authoring tool for Solaris, Linux, Windows, and Mac OS X that supports HTML 4.01, XHTML 1.0, XHTML Basic, XHTML 1.1, HTTP 1.1, MathML 2.0, SVG, XML, RDF, XPointer, XLink, and much of CSS 2. "The major changes are a contextual menu, a customized user interface (see Preferences to change it), Amaya themes, and a new style panel to style documents."

Saturday, December 22, 2007 (Permalink)

The W3C Math Working Group has posted the first public working draft of XML Entity definitions for Characters. "This document defines several sets of names which are assigned to Unicode characters. Each of these sets is also implemented as a file of XML entity declarations." These include:

isobox Box and Line Drawing
isocyr1 Russian Cyrillic
isocyr2 Non-Russian Cyrillic
isodia Diacritical Marks
isolat1 Added Latin 1
isolat2 Added Latin 2
isonum Numeric and Special Graphic
isopub Publishing
isoamsa Added Math Symbols: Arrow Relations
isoamsb Added Math Symbols: Binary Operators
isoamsc Added Math Symbols: Delimiters
isoamsn Added Math Symbols: Negated Relations
isoamso Added Math Symbols: Ordinary
isoamsr Added Math Symbols: Relations
isogrk1 Greek Letters
isogrk2 Monotoniko Greek
isogrk3 Greek Symbols
isogrk4 Alternative Greek Symbols
isomfrk Math Alphabets: Fraktur
isomopf Math Alphabets: Open Face
isomscr Math Alphabets: Script
isotech General Technical
mmlextra Additional MathML Symbols
mmlalias MathML Aliases
xhtml1-lat1 Latin for HTML
xhtml1-special Special for HTML
xhtml1-symbol Symbol for HTML

Friday, December 21, 2007 (Permalink)

Bare Bones Software has released version 8.7.2 of BBEdit, my preferred text editor on the Mac, and what I'm using to type these very words. Besides bug fixes, the major new feature in this release is that you can now scroll background windows with the scroll wheel on Mac OS X 10.5 Leopard. BBEdit is $199 payware. Upgrades from 8.5 and 8.6 are free. Upgrades from 8.0-8.2 cost $30 and upgrades from 7.x cost $40. Mac OS X 10.4 or later is required.

Thursday, December 20, 2007 (Permalink)

The Mozilla Project has posted the second beta of Firefox 3.0 for Mac, Linux, and Windows. This is code named "Gran Paradiso". "Firefox 3 is based on the new Gecko 1.9 Web rendering platform, which has been under development for the past 28 months and includes nearly 2 million lines of code changes, fixing more than 11,000 issues. Gecko 1.9 includes some major re-architecting for performance, stability, correctness, and code simplification and sustainability. Firefox 3 has been built on top of this new platform resulting in a more secure, easier to use, more personal product with a lot under the hood to offer website and Firefox add-on developers. [Improved in Beta 2!] Firefox 3 Beta 2 includes approximately 900 improvements over the previous beta, including fixes for stability, performance, memory usage, platform enhancements and user interface improvements. Many of these improvements were based on community feedback from the previous beta."

Thursday, December 13, 2007 (Permalink)

The W3C HTML Working Group, led by Apple and Opera, has posted the first public working draft of HTML Design Principles. These include:

Support Existing Content
Degrade Gracefully
Do not Reinvent the Wheel
Pave the Cowpaths
Evolution Not Revolution
Solve Real Problems
Priority of Constituencies
Secure By Design
Separation of Concerns
DOM Consistency
Well-defined Behavior
Avoid Needless Complexity
Handle Errors
Media Independence
Support World Languages
Accessibility

These are all good, though I can think of a few others, most importantly, "Avoid proprietary technologies" and "Allow free, open source implementations."

Wednesday, December 12, 2007 (Permalink)

Version 2.0 of XQilla, an open source XQuery 1.0 and XPath 2.0 library and command line utility written in C++, has been released. XQilla is implemented on top of Xerces-C++ and derives from Pathan. Version 2.0 implements the DOM 3 XPath API, and conforms to both the XQuery and XPath 2.0 recommendations. It is now published under an Apache 2.0 licence.

Tuesday, December 11, 2007 (Permalink)

The W3C HTML Working Group has posted the second public working draft of CURIE Syntax 1.0: A syntax for expressing Compact URIs. This is modeled after namespace URIs and qualified names. In brief, it defines a prefix for a known base IRI (a URI that can contain non-ASCII characters like é), then appends a colon and a local part. For example, the CURIE cafe:tradeshows.xml could be shorthand for http://www.cafeaulait.org/tradeshows.xml if the prefix cafe were mapped to the URL http://www.cafeaulait.org/. Exactly how prefixes are mapped to base IRIs is left to the specification of the documents in which the CURIEs appear. However if the CURIEs are in an XML document, then the namespaces in scope define the prefix mappings. The default namespace can be used for prefix-less CURIEs.
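The expansion described above amounts to a prefix lookup followed by concatenation. A minimal sketch in Python, assuming the prefix mappings have already been collected into a dictionary; the function name `expand_curie` and its `default` parameter are made up for illustration and are not part of the draft:

```python
def expand_curie(curie, prefixes, default=None):
    """Expand a CURIE such as 'cafe:tradeshows.xml' into a full IRI.

    prefixes maps prefix strings to base IRIs. default (hypothetical) is
    the base IRI used for prefix-less CURIEs, if the host language has one.
    """
    prefix, sep, local = curie.partition(":")
    if not sep:
        # No colon: a prefix-less CURIE, resolved against the default mapping.
        if default is None:
            raise ValueError("no default mapping for prefix-less CURIE: " + curie)
        return default + curie
    if prefix not in prefixes:
        raise KeyError("unmapped CURIE prefix: " + prefix)
    # Expansion is plain concatenation of the base IRI and the local part.
    return prefixes[prefix] + local


prefixes = {"cafe": "http://www.cafeaulait.org/"}
print(expand_curie("cafe:tradeshows.xml", prefixes))
```

The same CURIE expands to a different IRI (or fails) under a different `prefixes` dictionary, which is exactly the context dependence that makes copying CURIEs between documents risky.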

Frankly I'm surprised to see this. Namespaces and the namespace syntax are one of the notable failures of the XML world. Why someone would choose to imitate this now that we know better is beyond me. Based on experience with namespaces, I predict that moving CURIEs from one context to another is going to be especially problematic. Well, we've learned to live with (if not exactly like) namespaces. I guess we can get used to this.

Monday, December 10, 2007 (Permalink)

The W3C Voice Browser, Web APIs, and Web Application Formats (WAF) Working Groups have posted a new draft of Enabling Read Access for Web Resources (formerly Authorizing Read Access to XML Content Using the <?access-control?> Processing Instruction 1.0). According to the draft,

Cross-site requests are possible using the HTML img and script elements for instance. However, it is not possible to exchange the contents of resources or manipulate resources "cross-domain". This is to prevent information leakage and to ensure that malicious site can not delete your calendar data with cross-site requests using the HTTP DELETE method.

The policy this document introduces allows a resource to opt-in to allowing cross-site data retrieval of it and also enables a mechanism based on the same policy to allow a resource to opt-in to requests using an HTTP method other than GET. This policy builds on top of the existing restrictions already in place. This policy described in this document can only be used by a technology, such as XMLHttpRequest or XBL, when the respective specification of that technology describes how it applies.

The access control policy is defined in the resource that might be obtained and is expected to be enforced by the client that retrieves and processes the resource. Thus the client is trusted and acts as a policy enforcement point.

Sunday, December 9, 2007 (Permalink)

David Heinemeier Hansson has released Rails 2.0, a web development framework written in Ruby. Version 2.0 implements a much cleaner, far more RESTful design including sane URLs and HTTP Basic Authentication. Other new features include JSON and Atom support, CSRF protection, and broader exception handling.

I'll have to play with this some before I'm sure if it's really right, but it certainly feels more correct than 1.0, 1.1, and 1.2. Of course, even if Rails has finally gotten HTTP and REST right, it's still struggling with the wrong data store. Relational databases just don't fit web sites all that well. In fact, probably 80% of Rails and similar frameworks is dedicated to working around the hassle of shredding your pages into tables and then putting them back together again. If instead you build a web site on top of a native XML database such as eXist, there's just a lot less work to do in the first place.

Saturday, December 8, 2007 (Permalink)

ETH Zurich has posted MXQuery 0.4.1, an open source (Apache 2.0) XQuery engine written in Java. It supports XQuery 1.0 (though typeless), XQueryP, FORSEQ, and XQuery Update. Version 0.4.1 fixes bugs, adds more functions and operators, and can now pass about 95% of the test suite.

Friday, December 7, 2007 (Permalink)

Oracle has posted the proposed final draft of Java Specification Request 225, the XQuery API for Java. Think of this as JDBC for native XML databases.

Thursday, December 6, 2007 (Permalink)

The W3C XML Processing Model Working Group has posted a new working draft of XProc: An XML Pipeline Language. According to the introduction,

An XML Pipeline specifies a sequence of operations to be performed on a collection of XML input documents. Pipelines take zero or more XML documents as their input and produce zero or more XML documents as their output.

A pipeline consists of steps. Like pipelines, steps take zero or more XML documents as their inputs and produce zero or more XML documents as their outputs. The inputs to a step come from the web, from the pipeline document, from the inputs to the pipeline itself, or from the outputs of other steps in the pipeline. The outputs from a step are consumed by other steps, are outputs of the pipeline as a whole, or are discarded.

There are two kinds of steps: atomic steps and compound steps. Atomic steps carry out single operations and have no substructure as far as the pipeline is concerned, whereas compound steps control the execution of other steps, which they include in the form of one or more subpipelines.

Standard steps include count, delete, equal, error, load, parse, serialize, insert, escape markup, unescape markup, identity, label elements, XSLT, XQuery, rename, namespace rename, replace, wrap, unwrap, wrap sequence, sink, set attributes, split sequence, string replace, XInclude, HTTP request, RELAX NG validate, and W3C Schema validate. Others may be defined. This draft removes implicit input and output pipes, supports XPath 2, and adds a number of attributes to various steps.
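To make the processing model concrete, here is a rough Python sketch of the idea, not XProc syntax. It assumes a step can be modeled as a function from a list of documents to a list of documents. The function names borrow loosely from the standard step names above, but their behavior here is simplified and hypothetical:

```python
import xml.etree.ElementTree as ET

# Atomic steps: each takes zero or more documents and returns zero or more.
def identity(docs):
    return list(docs)

def count(docs):
    # Produces a single document reporting how many documents came in.
    result = ET.Element("result")
    result.text = str(len(docs))
    return [result]

def wrap(name):
    # Returns a step that wraps each input document in a new element.
    def step(docs):
        wrapped = []
        for doc in docs:
            wrapper = ET.Element(name)
            wrapper.append(doc)
            wrapped.append(wrapper)
        return wrapped
    return step

def sink(docs):
    # Discards its input.
    return []

def pipeline(*steps):
    """A compound step: runs the steps of its subpipeline in order."""
    def run(docs):
        for step in steps:
            docs = step(docs)
        return docs
    return run


inputs = [ET.fromstring("<a/>"), ET.fromstring("<b/>")]
wrap_then_count = pipeline(wrap("wrapper"), count)
print(ET.tostring(wrap_then_count(inputs)[0], encoding="unicode"))
```

Because a pipeline has the same shape as a step, pipelines nest: `pipeline(identity, wrap_then_count)` is itself a valid step.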

Wednesday, December 5, 2007 (Permalink)

IBM gives a 7:30 A.M. sponsored breakfast that is essentially content free. I do wonder what companies think they get out of these things? Given the inflated price of hotel catering, this talk probably cost them a few thousand dollars, maybe more after travel and staff time is included.

Yesterday DataDirect used their slot to play a quiz game and give away a half dozen iPod Nanos. Amusing, but I doubt anyone really learned anything and I for one actually wanted to hear them talk about their product at a technical level. Oh well, at least there's eggs.

OK, he's finally talking about something new: a product called MashupHub that lets so-called business users build web apps by gluing together widgets and feeds. You can only publish to the server, not save the page you build. This would be more interesting if it weren't server-locked; that is, if it were an interchangeable spoke rather than a hub. But even then I'm not sure I would see any use for it. Basically this is another in-browser web page editor.

IBM thinks this tool is for business users so they can build web 2.0 apps rather than having the developers do it. This is a common refrain we've been hearing since before I was born. Cobol was probably the first one to make that particular promise. We shouldn't forget that a few of the technologies that promised this actually delivered--HyperCard and the spreadsheet to name two--but far more of these technologies failed. The best ones like Cobol and Access became the province of professional developers. Most of them simply sank under the waves and are now forgotten. MashupHub looks likely to sink before it leaves the harbor.

Just Systems is apparently partnering with IBM on this MashupHub thingie so now they're going to talk. They've got a mildly interesting demo but it's not clear what their tool actually does. They're playing videos of the result and talking about what they do rather than actually building something.

OK. They finally started demoing the product. It's a compound document editor for different XML vocabularies. The internal representation is DOM. This is sort of interesting, but I'm not sure if there's a need for it. Compound editors have been tried before and never really taken the world by storm. Microsoft Office/OLE is probably the most successful but most people just use it as a collection of single document type editors rather than a compound editor.

I suspect the user interface is the problem. Different document types need different user interfaces. A text editor interface doesn't work for a table. A table editor interface doesn't work for an equation. An equation editor interface doesn't work for a photo. Trying to shove an indefinite number of different UIs into one application causes interface overload. Furthermore, if one group does all the interfaces, then they don't have the time or skills to do each one well. If different groups do this, the interfaces are inconsistent.

Miguel de Icaza demoes Moonlight/Silverlight and XAML. He at least knows how to show code in a large font size and do Hello World examples. Unlike the more corporate presenters he doesn't waste time on PowerPointless slides.

He's written a Unix SDK for all this, and is demoing on Linux. Off the top of my head I think he's the only Linux presenter I've seen this week. The other presenters are split about 50/50 between Mac and Windows. A few have been using PowerBooks with Parallels to demo their Windows only software.

He isn't actually sure Silverlight will be successful (how refreshingly honest) but wants to make sure Linux isn't left out if it is. "I thought Linux was going to win nine years ago, and we're a little behind schedule."

More Microsoft stuff on Linq now with Microsoft's Shyam Pather on "Linq to XML: Visual Studio 2008, Silverlight, and Beyond". This mixes declarative SQL-like programming with traditional imperative C# programming. There's some danger at the seams though because C# is neither functional nor declarative. The optimizer could really muck things up if you aren't careful to avoid side effects in the wrong places.

Bang! In an offhand comment, Pather just put his finger on something I've seen a thousand times but never realized: DOM's use of the Document interface as a factory and requiring each node to be created by its containing document requires coders to keep passing the Document object around from method to method when it would not be otherwise required. This bloats method signatures, makes thread safety harder, tightens coupling, and in general encourages ugly, spaghetti, procedural code instead of clean OOP design. Both XOM and JDOM cure this particular problem. I always knew the abstract factory/factory method patterns were overused and usually ugly, but I hadn't realized just how much worse DOM's factory was than all other factories. I have to add this to the growing list of DOM's sins.
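The difference is easy to see side by side. The sketch below uses Python's standard library as a stand-in: `xml.dom.minidom` for the DOM factory style, and `xml.etree.ElementTree` for the constructor style that XOM and JDOM use. It is an illustration of the pattern, not of any of the Java APIs themselves, and the helper names are made up:

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

# DOM style: nodes are created through the owner document, so every
# helper that builds nodes needs the Document passed in.
def dom_title(doc, text):
    title = doc.createElement("title")
    title.appendChild(doc.createTextNode(text))
    return title

def dom_book(doc, title_text):
    book = doc.createElement("book")
    book.appendChild(dom_title(doc, title_text))  # doc is threaded through
    return book

doc = minidom.Document()
doc.appendChild(dom_book(doc, "Processing XML with Java"))

# Constructor style: nodes are created on their own and attached later,
# so helpers take only the data they actually need.
def et_title(text):
    title = ET.Element("title")
    title.text = text
    return title

def et_book(title_text):
    book = ET.Element("book")
    book.append(et_title(title_text))
    return book

print(doc.documentElement.toxml())
print(ET.tostring(et_book("Processing XML with Java"), encoding="unicode"))
```

Both versions build the same element; only the first forces a `doc` parameter onto every function in the call chain.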

LINQ doesn't use prefixes in the API, just local names and namespace URIs. That's smart.
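Python's ElementTree happens to take the same approach, so it can illustrate why this is smart even though it is not the LINQ API. In this sketch the `qname` helper is hypothetical; the `{uri}local` form is how ElementTree itself names elements:

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

# Two serializations of the same element that differ only in prefix choice.
prefixed = ET.fromstring('<a:title xmlns:a="%s">News</a:title>' % ATOM)
default = ET.fromstring('<title xmlns="%s">News</title>' % ATOM)

def qname(uri, local):
    # Builds the {uri}local form; no prefix is involved anywhere.
    return "{%s}%s" % (uri, local)

# Both elements have the same name once prefixes are out of the picture.
print(prefixed.tag == default.tag == qname(ATOM, "title"))
```

Code written against the namespace URI and local name keeps working no matter which prefix a particular document chose.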

Performance is superior to .NET's DOM implementation.

Jason Hunter from Mark Logic gives the closing keynote with the title "You're Darn Right XML has a Future on the Web", but the wireless network is having troubles, so I may not be able to report very much.

He says MarkMail is one of the most XML-centric sites on the Web. He doesn't need servlets, Perl, relational DBs, etc. E-mail is half content and half data. "Semi-structured" is the dirty word this year. Jason's about the third person I've heard complain about it, because XML is actually more structured than tables, not less. It's just not repeating structures.

Case studies:

Dolly Madison Digital Collection
Alberta Govt. K-12
New England Journal of Medicine
CQ Legislative Impact tells you which bills/laws affect which other bills/laws.
Oxford University Press
Elsevier Differential Diagnosis

We usually want answers, not links. (That's a good explanation of why I often go to Wikipedia before Google these days. I use Google to find specific pages or a group of opinions about a subject. Use Wikipedia to get answers about a subject.)

Google gives subpar answers to programmers compared to O'Reilly books, but it gives these results much faster than paging through books.

(How big is MarkMail? How big could/would it be if it indexed all public mailing lists? or all Usenet groups? Would it hit the petabytes? What does his crawler look like?) He wrote the e-mail XML converter in Java in 3 hours (plus months handling bad mailers that didn't conform to spec.)

Jason brings his own EVDO network connection rather than relying on the crappy hotel wireless. All presenters should be this prepared.

"Michael Kay is the number one human talking about XML."

The biggest Mark Logic deployment is 200 terabytes. This is the U.S. government doing their own search engine to monitor sites they don't like. I wonder if Cafe con Leche is included? (and if not, what am I doing wrong?) Their crawler tries to avoid being noticed.

XML 2008 will take place December 8-10 in Crystal City, Virginia (near D.C.)

Is anyone else blogging from here? I haven't noticed anyone yet. Let's see what Google can find:

Tuesday, December 4, 2007 (Permalink)

The morning kicked off with a Microsoft sponsored breakfast on interoperability. This is a little like listening to Cardinal Egan preach about the evils of having sex with altar boys. Also like Cardinal Egan, it's pretty obvious that Microsoft doesn't really understand the issues at hand or why people are annoyed with them. Their definition of interoperability still involves Microsoft defining the formats, languages, and APIs exclusively, and Microsoft refusing to accept any compromises that would in any way hinder their own goals and values. To them interoperability means that they define the formats and allow other developers to use them, and if they're feeling especially magnanimous they might even let us do that without a click-through license and an NDA. However they certainly don't intend to allow any of us peasants to have a voice in what the rules should be. They claim they've reformed, but what they really mean is that now they're a benevolent dictator instead of a malevolent dictator. They don't understand that what we want is a democracy, not a dictator at all (or they understand but they're not willing to give it to us).

Gregg Pollack talks about RESTful XML Web Services with Ruby on Rails. He plays Ruby in this video:

He's here to talk about the "big problem in web services."

RAILS has three parts:

Model: database
View: HTML and XML display logic
Controller: code, User action controller. These need to be as small as possible. There should only be one controller per model (database table).

I'm still not sure I like convention over configuration. The database is rarely configured like I'd want, and I'm usually not working with a green field database over which I have full control. However, the RAILS he's showing looks a lot nicer than the version I last looked at. They've fixed a lot of their early REST mistakes. I wonder what this would look like if the backend were a native XML database instead of a relational database? Maybe just the eXist REST API.

I have to figure out how he does that cool cursor spotlight trick.

At 9:45, John Davies from IONA Technologies gave a fascinating talk about working with XML in financial services and banking industries: 100 billion dollar hedge funds that need 3 millisecond response times and the like. (They aren't doing very well: despite that capitalization they're only making about $19,000,000 a day. They'd do better in index funds.)
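The arithmetic behind that aside, as a quick sketch. The two figures come from the talk as noted above; the day counts (365 calendar days, 252 trading days) and the lack of compounding are my assumptions:

```python
def annualized_return(daily_profit, capital, days_per_year):
    """Simple (non-compounded) annual return as a fraction of capital."""
    return daily_profit * days_per_year / capital

for label, days in (("calendar days", 365), ("trading days", 252)):
    rate = annualized_return(19e6, 100e9, days)
    print("%s: %.1f%% a year" % (label, rate * 100))
```

Either way the result is a single-digit annual percentage.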

I didn't have the energy to take live notes, but there was a lot of good stuff. He thinks that the industry is moving away from web services and to REST. On the other hand, he also said that HTTP didn't work very well for them because it was stateless, so they needed to use JMS and MQSeries. I can't quite reconcile those two. Maybe I misheard him on the bit about HTTP, or he switched from one part of financial services to another.

He's also doing a lot of work with legacy comma-separated value formats by defining filters that present it as XML without actually rewriting it on disk. Then he can use XPath, XSLT, and so forth. A lot of his data does not fit well into relational databases. (He says the SQL queries to reconstruct some of this are half a page long.) He didn't talk much about native XML databases, but at least to me it sounded like he was hinting that that was what he needed.

After the coffee break (I'm finally awake after chugging about 16 ounces) I'm listening to Arofan Gregory from the Open Data Foundation talk about "Towards a Global Infrastructure for Data and Metadata: The Open Data Foundation." They're mostly looking at the raw data collected by governments and researchers, not the information generated by processing this data.

The organization is virtual and lives on Skype. He's wasting too much time telling us about the organization. He has yet to tell us what they're actually doing.

Response rates are falling in surveys because a lot of people (including myself) flat out refuse to participate in any surveys.

Disses the semantic web. He wants a federated web of data registries run by professionals. He wants to have standard ontologies to enable semantic interoperability. SDMX (ISO 17369) is important as are several other ISO standards including METS.

For the next session there are three talks I want to hear. Which to pick? BBC iPlayer? Bringing Collaborative Editing of Open Document Format (ODF) Documents to the Web? The missing architecture of the AEA (AJAX Enterprise Applications)? The ODF folks didn't show up, so I think I'll pick the iPlayer.

Robin Doran and Matthew Browning from the BBC:

First attempts at implementing iPlayer took for granted that a relational database would be used to store programme data. Project requirements, however, manifested themselves as amendments to the underlying data model. Each of these amendments would, in turn, require an update to the database schema and corresponding modification to the assumptions of the client code. This became cumbersome and confusing. Additionally, serialisation to and from the database store introduced latency to the publication pipeline. Huge quantities of data that quickly became of only historical interest were being stored, requiring tuning of the database server just to make it perform acceptably, using software developer resource when it could better be spent elsewhere.

Rationalisation and Streamlining

It was recognised that the evolving data model could be expressed in terms of a RelaxNG schema. A great deal of work went into getting this right and outputting documents adhering to it in a single transformation on input data. This gave us a readable point of reference and a handy way to determine the feasibility of new requirements: if they can be expressed in terms of a transformation on our so-called ‘Content Package’ they are possible. Business Rules Database removal and input rationalisation allowed us to impose order upon domain-specific business rules. Requirements were no longer implemented in the software but both specified and implemented in terms of a transform on our input data. Separation also enabled more effective testing and tracking of ownership and history of rules.

Content Publication

Output content is destined for both human and machine consumption. All publishing is divided into two stages: firstly, the production of an initial XML representation of an artefact and, secondly, publication of the output itself. Two-pass publication allowed us to validate XML representations against their corresponding schemas for quality assurance. Config-driven implementation means that adding a new output format is just a matter of dropping in a schema that describes it.

Did It Work?

Solution initially conceived as a mechanism to publish a subset of web content has expanded without effort to make two other internal projects redundant and produce all non-dynamic web content as well as inter-component messaging.

Doesn't work in the U.S. because it's UK government funded. That's utter crap. It's time to tear down artificial boundaries. If the BBC doesn't want to send their content overseas, then we'll just get it from BitTorrent. (And our files will likely be higher quality too.)

TVA is an emerging XML standard for television schedules and TV content metadata.

Relational database performance was "not great". Why was it so slow? Schemas were too inflexible for RAD. What database were they using? MySQL. They don't need to store historical data. They can throw it away. What content store are they using now? a file system or a non-relational DB? They're using the file system.

Directed acyclic graphs are the basic nature of their data. Sounds like a forest to me (and any forest can trivially become a tree by adding a special root element that holds all the trees in the forest).
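The forest-to-tree trick is a one-liner in most XML APIs. A minimal Python sketch, assuming the data really is a forest (no node has more than one parent); the element names and the `forest_to_tree` helper are invented for illustration and have nothing to do with the BBC's actual schema:

```python
import xml.etree.ElementTree as ET

def forest_to_tree(trees, root_name="forest"):
    """Hang every tree in the forest under one synthetic root element."""
    root = ET.Element(root_name)
    root.extend(trees)
    return root

forest = [ET.fromstring("<brand><series/></brand>"), ET.fromstring("<brand/>")]
tree = forest_to_tree(forest)
print(ET.tostring(tree, encoding="unicode"))
```

A true DAG in which a node is shared by two parents would need references or duplication instead; the synthetic root only handles the forest case.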

In the first afternoon session, I'm listening to Intel's Ken Graf talk about "Building a XSLT Processor for large documents and high-performance." Large to him means 0.3-2 GB, and the largest document they can handle is 32GB. That qualifies in my mind. I haven't seen many documents bigger than that. He does warn that these techniques may not work for smaller documents. They apparently just announced this product this week.

He asks how many people have XML performance problems (about a third of the audience) and how many have abandoned a project due to performance problems. (No one admits to this. One person tentatively puts their hand half way up.) He should have asked how many of those with problems were using DOM or XSLT vs. SAX. He's basically right that DOM takes 3-5 times the size of the actual document, but he severely underrates SAX's abilities, and vastly overestimates its memory usage. After all these years SAX still gets no respect, even though it's the obvious choice for documents like these.

The core data structure is a table that manages symbols. They store event records for each parser type. The records contain an offset into the actual XML document on disk. The table can be built in streaming mode, and you can work with the start of the data before you get to the end. This reminds me of VTD-XML.
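I don't know the details of Intel's table, but the general idea of event records that hold offsets into the original document rather than copies of its content can be sketched with Python's expat binding. The record layout (name, byte offset) and the function name are my own simplification:

```python
import xml.parsers.expat

def build_offset_table(data):
    """Return (element name, byte offset) records for each start tag in data."""
    records = []
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        # CurrentByteIndex is the byte offset of the event being reported,
        # here the "<" that opens the start tag.
        records.append((name, parser.CurrentByteIndex))

    parser.StartElementHandler = start_element
    parser.Parse(data, True)
    return records


data = b"<books><book>XOM</book><book>JDOM</book></books>"
table = build_offset_table(data)
for name, offset in table:
    # Each record points back into the original bytes; nothing is copied.
    print(name, offset, data[offset:offset + len(name) + 1])
```

Because expat reports events as it reads, the records for the start of the document are available before the end has been parsed, which matches the streaming behavior described in the talk.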

By Intel's measurements, new threads are only justified for operations that take one million or more assembly level instructions.

I'm not sure but I think they're breaking the document into pieces and running it across several threads/processors/cores at once. This is called "Simultaneous XPath expressions". This doesn't seem to help as much on transforms, as opposed to pure queries.

The multithreaded approach is interesting. I wonder if it's possible to design a multithreaded parser that would give us another order of magnitude improvement in parsing speed. Parsing seems like a fundamentally serial operation but a lot of apparently serial operations can be parallelized when you think about the problem a little. Nonetheless if it's true that new threads are only justified for operations that take one million or more assembly level instructions, then this may well not help for a lot of documents. I suspect the real gain is in running many smaller documents through multiple threads simultaneously.

Now Tony Lavinio from Data Direct XQuery talks about "Using XQuery and XSLT on Non-XML Data". They do this by plugging a converter into a URIResolver. They can also represent the input as a SAXSource or DOMSource ("Nothing good to say about DOM.") or StreamSource or StAXSource. Transforms happen on the fly.

I want to see what the output XML from a CSV input (for example) or a relational query looks like. (Update: just constant td, tr, and table elements.) To resolve a CSV file:

java -cp saxon9.jar net.sf.saxon.Transform -r com.ddtek.xml2007.CSVResolver -u x-csv:file:///c:/XML_2007/books.txt table.xsl

OutputURIResolver goes the other way. The XQuery resolver converts some stuff to SQL and other parts to Java code.
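For a sense of what "constant td, tr, and table elements" means, here is a minimal sketch of a CSV to XML conversion. It is a guess at the general shape, not DataDirect's converter; csv_to_table is a made-up name and the sample data is invented.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_table(text):
    """Map each CSV row to a <tr> and each field to a <td> (assumed layout)."""
    table = ET.Element("table")
    for row in csv.reader(io.StringIO(text)):
        tr = ET.SubElement(table, "tr")
        for cell in row:
            ET.SubElement(tr, "td").text = cell
    return table


if __name__ == "__main__":
    print(ET.tostring(csv_to_table("a,b\n1,2\n"), encoding="unicode"))
    # <table><tr><td>a</td><td>b</td></tr><tr><td>1</td><td>2</td></tr></table>
```

Since the element names never vary, a stylesheet written against table/tr/td works for any CSV input regardless of its columns.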

Micah Dubinko from Yahoo talks about "WebPath: Querying the web as XML". He calls this the "Platonic Web". He says we need better web tools.

WebPath started as a Hack Day project. 5 Main components:

Lexer: PLY (Python Lex-Yacc)
Recognizer (because of div div div and * * *, middle tokens are different than outer ones in XPath)
Parser (top down operator precedence)
Interpreter

(I missed one.)

Liberal name tests that don't require prefixes in XPath expressions for XHTML.
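A liberal name test can be approximated by comparing local names only and ignoring the namespace. The sketch below is not WebPath code; local_name and find_by_local_name are hypothetical helpers built on Python's ElementTree, which writes namespaced tags as {uri}local.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a leading '{namespace-uri}' from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def find_by_local_name(root, name):
    """Return elements whose local name matches, whatever their namespace."""
    return [el for el in root.iter() if local_name(el.tag) == name]


if __name__ == "__main__":
    page = ET.fromstring(
        '<html xmlns="http://www.w3.org/1999/xhtml">'
        '<body><a href="x"/><p><a href="y"/></p></body></html>'
    )
    # No prefix or namespace binding is needed to find the XHTML links.
    print([a.get("href") for a in find_by_local_name(page, "a")])
```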

Adds a get(url) extension function that retrieves a page from the web. Sort of like the document function, but can use this as a location step; e.g. get(a/@href). Or could use ---> or a traverse() function. In fact, this turns out to be the XPath 2.0 doc() function.

Perhaps I was just in my usual afternoon daze, but I confess I didn't see what exactly the point was here.

For the final afternoon session, Mark Birbeck talks about "XForms, REST, XQuery...and skimming" (like a stone bouncing across a lake):

'Skimming' is about being able to install various pieces of server-side software and then not have to touch them again. No configuration…no writing of server-side scripts…just store data and retrieve it. It may sound a little odd, but a good example of a component that can do this is a WebDAV server; here you simply install the software and then start saving documents, editing and updating them, searching, and so on.

There is no reason why you couldn't build an entire client-side application that manipulates documents and stores and retrieves them, without having to do any more to the server than the initial installation of the WebDAV software.

The XML database eXist can be much the same as WebDAV in that you can install it and then immediately start punching XML documents; unlike relational databases you don't need to know in advance what you want to store so there no need to create tables first, define schemas, etc.

But the skimming architecture goes further; by using a standard interface to our data--in this case XQuery--we don't even need to write server-side scripts, applications or servlets to manage the data. Instead we just use queries from our 'rich client'. The resulting application is very loosely-coupled, and can run on just about any server-side architecture; client-side forms can be deployed by any HTTP server because there is no scripting involved in their creation, and the data can be delivered by any XML database that supports XQuery.

Application development and deployment can therefore become very fast.

He believes the client is too thin, and insufficient to build web applications, so most of the work is done on the server. When the server program generates the user interface from the data (as in Rails or AJAX based XForms toolkits) it's difficult to decouple them, and use the same UI with multiple data sources or the same data source with multiple UIs. URLs invariably point to the application that acts on the data, rather than the data itself. This also helps shift tasks to writers and HTML jockeys and away from server side developers.

Monday, December 3, 2007 (Permalink)

I've arrived at XML 2007 in Boston. There's some snow on the ground (though not a huge amount by Boston standards) so it's not clear how many people will get here in time for the morning keynote. I am glad I stayed in the conference hotel instead of the cheaper one a few blocks away. Normally it wouldn't be a big deal, but in the snow it's trickier.

I brought the camera to this show, so hopefully I can get a few pictures. However it seems my ancient copy of Photoshop Elements 2 won't run on the Intel MacBook and/or Leopard so my editing capabilities are limited.

This is the first time I've taken the new MacBook out for a spin in public. The battery definitely does not have the advertised four hours of battery life. It's running down in two or less.

This keyboard is going to take a little getting used to. I really miss my home, page up and page down keys. Seems like Function-up arrow and Function-down arrow do page up and page down respectively. I wonder if Function-left arrow and Function-right arrow do home and end? Yep, looks like they do.

The text cursor seems to disappear a lot. (I've used Universal Access to make it bigger. That may help. Hmm, no, it doesn't. The text cursor just takes too long to show up when I use the trackpad. That's weird.) I'm also not accustomed to Leopard yet, and not all of the preferences seem to have transferred over from my desktop. Among other things I seem to have lost all my keychain items. I hope I can remember the passwords to everything. Hopefully this won't be a big deal for the PowerPoint slides I have prepared for tonight.

Over 300 people are attending, a little drop-off from last year. The printed program is out of date. Check online.

The first keynote is a panel with Michael Day, Douglas Crockford, and C. Michael Sperberg-McQueen on "Does XML Have a Future on the Web?"

David Megginson, C. Michael Sperberg-McQueen, Douglas Crockford, Michael Day, IDEAlliance host

Michael Day

No. XML is used more on the server than sent directly to the client. "XHTML" is rarely well-formed. Will not replace HTML on web sites.

Douglas Crockford

"Certainly yes, and I'd offer as evidence of that you can still buy Cobol compilers." However "it's clearly trending down." "XML is really not a very effective data format." JSON is just easier to use. The Web itself is in danger as a result of the XML adventure. There has been no progress made on the Web since 1999. Security is the major issue. We've been too distracted by XML to give HTML the repairs it needs. (First thing he's said I agree with.) He wants to reexamine HTML and DOM and JavaScript with security in mind.

In Q&A, he elaborates that the problem is that different pieces of the HTML page (including JavaScripts and mashup programs from different sources) are not separated from each other and can each see the whole page. Multiple languages--HTML, XML, JavaScript, etc.--make securing this and finding the evil scripts inside the data very difficult. Good points, however when McQueen challenges him on JSON security, he's in complete denial about the specific security issues that he introduced with JSON. (And this continues in later Q&A.)

C. Michael Sperberg-McQueen

Yes, and "it ought to have a future on the Web, and it depends in part on what you mean by the Web." The Web is a "single connected information space," not even just HTTP. Internationalization and accessibility are keys. Compromised notations for specialized niches have non-trivial costs. We need loose coupling between client and server. He focuses on writing things. He needs richer markup than down translation to HTML allows. He publishes XML on the Web. "XML will die when you rip it out of my cold dead hands."

In Q&A he suggests that the WhatWG is broken. They are defining a parser spec rather than a language spec. He thinks that if publishers cared about interoperable interpretation of their documents they'd publish valid HTML.

The password for the wireless network here is 01DBA2 (so I don't have to answer this question for the sixth time).

For the next session, I have the choice between two different flavors of snake oil (microformats and XML hardware) and something really boring and mostly irrelevant (DITA). Maybe I'll flip a three-sided coin. OK. Microformats wins.

Melissa Utzinger from the Mitre Corporation is giving a basic introduction to microformats. Firefox 3 has an API for this. There are some Firefox extensions for editing these. Google Maps, Yahoo Local, Yahoo tech, Flickr are using this.

Melissa Utzinger

For the next session though I get to hear about OOXML formats in native XML databases. That's interesting. Mark Turner from Mark Logic is presenting. Damn: I got misled. This is the Microsoft format, not the OpenOffice format. I should have remembered that. This is a trademark suit waiting to happen. Still, should be an interesting talk though.

Mark Turner, XML 2007

I'm not sure why, but this site is not updating as fast as it should. I'm not sure if it's a server caching issue or the local wireless network here at the hotel is caching or just what is going on. Hmm, looks like it's on the server. I've ssh'd into the server and used lynx from there and I still see a non-updated page. Hmm, wait: it does look like a client side SFTP problem. Maybe the local WAN or Cyberduck doesn't work with Leopard? I'll try updating it.

The new version of Cyberduck does seem to be more stable. However it's changed the Upload menu accelerator from Command-U to Option-Up arrow. I hate it when programs do that. My fingers remember Command-U.

E-mail also seems to be a problem. My Speakeasy account works, but IBiblio/Metalab doesn't unless I turn off encrypted connections. I'll have to change my passwords once the conference is over. This may be the wireless proxy/firewall or it may be IBiblio. (Their server certificate expired unexpectedly on Thanksgiving, and they're running on a self-signed certificate for the moment.)

The hotel is putting bottled water in all the conference rooms. This is environmentally a very bad idea. Remind me to suggest on the evals that future shows serve tap water.

Water bottles on conference table

For the first afternoon session, we're not sure if the speaker is going to show up. Hmm, looks like he/she didn't. I'm switching over to the XML on the Web track for Mark Pruett talking about "Taming XML in Ajax". AJAX apps are one-page applications. (I knew there was something fishy about them. He just put his finger on it. Different resources should have different URLs. You shouldn't be able to change the resource without changing the URL, but too many AJAX apps obviously do that. Some AJAX apps like GMail get this right--each message has its own URL--but too many don't. We need more granularity than one URL per application. We need separate URLs/URIs for different states of the application. These URLs may be client generated, and the server from which the application code was downloaded may never even see them; but we still need the URLs.)

He's demoing four approaches to building one simple AJAX weather app. The same-domain problem is an issue. Approach #1 talks to a server based proxy that talks to the National Weather Service to get around this. Approach #2 uses server side XSLT. Approach #3 uses browser side XSLT. Approach #4 uses Yahoo Pipes and JSON. Apparently there's a script tag hack that can completely get around the cross-domain limitation. This opens up security issues, but he thinks these are not a big deal if you're just loading XML data.

In the next session Kurt Cagle talks about "The Trouble with DOM" and/or "Lightweight XML: An Exploration of E4X". Only he's not here. He's talking over the Net. Dan McCreary is hosting locally. Weird.

DOM is semantically neutral. The initial setup cost to use XPath and XSLT is a problem. The plumbing costs too much.

JSON is not a good document format because of "unique addressability". There can only be one value per key name. (I'm not sure that's true.) You can have lists or maps, but not both at the same time. "JSON is a degenerate case of XML." He warns us not to tell the AJAX people this because they'll get upset, but he doesn't know Douglas Crockford is sitting in the room. The disadvantages of a long-distance presentation. :-)
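The "one value per key name" claim is easy to poke at. The sketch below shows only how Python's standard json module behaves, which is one parser's choice and not a statement about every JSON implementation: the text may repeat a name, the default decoder keeps the last value, and a list under one key holds several values anyway.

```python
import json

text = '{"author": "A", "author": "B"}'

# The default decoder builds a dict, so the later value silently wins.
last_wins = json.loads(text)
print(last_wins)  # {'author': 'B'}

# Asking for the raw pairs shows that both entries were present in the text.
pairs = json.loads(text, object_pairs_hook=list)
print(pairs)  # [('author', 'A'), ('author', 'B')]

# The usual way to give a key several values is a list.
several = json.loads('{"author": ["A", "B"]}')
print(several["author"])  # ['A', 'B']
```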

In the third afternoon session I listen to Norm Walsh from Sun talk about XProc: An XML Pipeline Language. They're running late. They'll have to go back to second last call working draft. Should be finished by the Spring. XProc specifies what should be done to which XML documents and in what order. For example, XInclude, then validate; or validate, then XInclude. The output of one step flows into the input of the next step. Steps may have options (expected) or parameters (unexpected). It should be amenable to streaming.

<p:xslt name="db2html">
  <p:input port="source">
    <p:pipe step="expand"/>
  </p:input>
  <p:input port="stylesheet">
    <p:document href="docbook.xsl"/>
  </p:input>
  <p:option name="initial-mode" select="$imode" />
  <p:parameter name="foo" value="bar" />
</p:xslt>

Each step has a type and a name.
Steps have named input and output ports, which are parts of the signature. (How to handle multiple inputs and outputs such as may go into XInclude or come out of XSLT? There's a secondary output port that returns zero or more documents. Is there such a thing as an EntityResolver step that can map URIs to other sources? No, there isn't.)
Primary output of one step is default primary input of next step.
Literal inputs are allowed via p:inline.
XPath 2 is allowed.
Compound steps contain other steps. Users cannot define these, only atomic steps.
There is iteration for operating on a bunch of documents with the same step.
Selective processing handles data islands via p:viewport.
30 required atomic steps: add-attribute, add-xml-base, etc.
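The core idea in the notes above, that the primary output of one step becomes the primary input of the next, can be sketched in a few lines. This is not XProc and not any XProc implementation: run_pipeline is invented, add_attribute only borrows its name from the add-attribute step listed above, and delete_elements is a hypothetical step added so the pipeline has two stages.

```python
import xml.etree.ElementTree as ET


def add_attribute(name, value):
    """Build a step that sets an attribute on the document element."""
    def step(doc):
        doc.set(name, value)
        return doc
    return step


def delete_elements(tag):
    """Build a step that removes every element named `tag` (hypothetical)."""
    def step(doc):
        # Snapshot the traversal first so removal does not disturb iteration.
        for parent in list(doc.iter()):
            for child in list(parent):
                if child.tag == tag:
                    parent.remove(child)
        return doc
    return step


def run_pipeline(doc, steps):
    """Feed the output of each step into the next one, in order."""
    for step in steps:
        doc = step(doc)
    return doc


if __name__ == "__main__":
    source = ET.fromstring("<doc><draft/><p/></doc>")
    result = run_pipeline(source, [delete_elements("draft"),
                                   add_attribute("status", "final")])
    print(ET.tostring(result, encoding="unicode"))
```

Because the steps are just an ordered list, swapping "XInclude, then validate" for "validate, then XInclude" would amount to reordering that list.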

There are about half a dozen implementations including one written in XQuery! Overall, this is quite interesting and potentially useful. It's the first practical and essentially new thing I've heard about at this conference so far. If time permits, maybe I should see if I can write a developerWorks article about this.

In the last afternoon session Intel's Stewart Taylor talks about XML and XPath in the Wild. They scraped files from the Web, which I suspect gives them a very non-representative sample. (Much, probably most XML, isn't on the public Web.) Types included XHTML, RSS, VoiceXML, SVG, SAML, and SMIL. They also scraped XPath expressions from open source projects. They did various statistics on this including principal component analysis.

XPath Results: 50/50 split between child and descendant axes. A third of the expressions were very simple. 47% had two or more steps but relatively few had three or four. Only 18% of expressions used predicates. About half of these were attribute value tests. Numeric comparisons were non-existent, so don't worry about type conversions. Half used functions, mostly string(), count(), text(), sum(), and boolean(). DOM usage is more common than XPath. They were looking in Java and .NET source code, not XSLT stylesheets and XQuery databases (which I expect would have had much more complex expressions on average.)

Monday night is XForms Evening. I have to get ready for my own keynote so I may not be able to write much here.

Sunday, December 2, 2007 (Permalink)

I'm leaving this afternoon for XML 2007 in Boston. Wireless access permitting, I'll try to update this site live from the show. Tomorrow night I'll be giving a keynote for the XForms evening on "What XForms Needs to Do to Win". See you there.

Michael Kay has released version 9.0.0.2 of Saxon, his XSLT 2.0 and XQuery processor for Java and .NET. This is a bug fix release. According to Kay, "Although Saxon 9.0.0.1 is proving very reliable, this maintenance release was necessary because Saxon-SA 9.0.0.1 was inadvertently compiled using JDK 1.5 and will not run under JDK 1.4."

Saxon is published in two versions for both of which Java 1.4 or later (or .NET) is required. Saxon 9.0B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 9.0 SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standing XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers."

Saturday, December 1, 2007 (Permalink)

The Mozilla Project has released Firefox 2.0.0.11. "This release corrects a compatibility issue with some websites and extensions discovered in Firefox 2.0.0.10."

Friday, November 30, 2007 (Permalink)

Just in time for XML 2007, the W3C XForms working group has posted the candidate recommendation of XForms 1.1. Changes since 1.0 include:

Support for PUT and DELETE actions
power, card-number, current, choose, id and property XPath extension functions
An email address datatype
An (credit) card-number datatype
An ID card number datatype
An xforms-submit-serialize event
Inline rendering of non-text media types

Wednesday, November 28, 2007 (Permalink)

The XML Apache Project has released Xalan-Java 2.7.1, an open source XSLT processor. Besides bug fixes, this release upgrades to Xerces-J 2.9.0 and adds support for DOM Level 3 serialization.

The Mozilla Project has released Firefox 2.0.0.10. This release plugs three security holes. All users should upgrade.

Tuesday, November 27, 2007 (Permalink)

Mark Logic has published an XQuery based site for interacting with email archives. According to Jason Hunter, "Each email is stored internally as an XML document and accessed using XQuery. All searches, faceted navigation, analytic calculations, and HTML page renderings are performed on a single MarkLogic Server machine." They're currently indexing 500 or so Apache mailing lists, jdom-interest, and xml-dev, among others.

Monday, November 26, 2007 (Permalink)

The W3C User Agent Accessibility Guidelines Working Group (UAWG) has posted User Agent Accessibility Guidelines 2.0 Requirements:

User Agent Accessibility Guidelines 1.0 (UAAG 1.0) provides guidelines for designing user agents (browsers) that lower barriers to Web accessibility for people with disabilities (visual, hearing, physical, cognitive, and neurological).

Since the release of UAAG 1.0 as a W3C Recommendation in December 2002, the UAWG has received feedback about the usability, understandability, and applicability of the suite of documents. Also, in the intervening years there have been changes and improvements in

technologies and techniques used in web content,

functionality of assistive technology,

accessibility application programming interfaces (APIs), and

platforms used to receive content.

The feedback, changes, and information gathered from evaluating user agents using test suites to develop implementation reports is driving the development of UAAG 2.0 and is captured as the Requirements for UAAG 2.0 (this document).

The primary goal of UAAG 2.0 is the same as it was for version 1.0. To lower barriers to accessibility of user agents.

We intend to ensure that the revision is backwards and forward compatible.

We intend to attract the participation of developers of browsers, assistive technologies, plug-ins, extensions, accessibility APIs (Microsoft Active Accessibility - MSAA, Gnome Accessibility Toolkit - ATK, iaccessible2, Microsoft UI Automation on Windows Vista - UIA, etc.) as well as consumers of accessibility APIs (e.g., some assistive technology and plug-in developers) and end users.

Saturday, November 24, 2007 (Permalink)

RenderX has posted a beta of INX2FO Plug-In 1.0. This plug-in converts Adobe InDesign CS2 documents to XSL Formatting Objects. "The plug-in enables a user to use Adobe InDesign CS2 to design the layout, insert tags for the variable content, and generate outputs that can be merged with XML data. With the help of XEP, the documents are converted from their respective XML-based formats to XSL FO (XSLFO) and then to PDF or PostScript output."

Friday, November 23, 2007 (Permalink)

Stephen Rider has released Virtual Multiblog 2.0, a multiuser fork of WordPress in which Stephen says:

Each blog acts as a completely separate install -- separate admin, users, etc.

NO known incompatibilities with any plugins or themes.

Thursday, November 22, 2007 (Permalink)

Wolfgang Hoschek has uncovered a surprising bug in Java 1.6's SAX parser, a Xerces variant. Apparently in some circumstances com.sun.org.apache.xerces.internal.util.XMLStringBuffer "consumes 40 MB of memory in a single char[] array." The bug does not seem to be present when using the real Xerces 2.9.0 from Apache.

Wednesday, November 21, 2007 (Permalink)

The W3C Web Security Context Working Group has posted the first public working draft of Web Security Context: Experience, Indicators, and Trust.

This specification deals with the trust decisions that users must make online, and with ways to support them in making safe and informed decisions where possible.

In order to achieve that goal, this specification includes recommendations on the presentation of identity information by Web user agents; on handling errors in security protocols in a way that minimizes the trust decisions left to users, and (we hope) induces them toward safe behavior where they have to make these decisions; and on data entry interactions that (we hope, again) will make it easier for users to enter sensitive data into legitimate sites than to enter them into illegitimate sites.

Where this document specifies user interactions with a goal toward making security usable, no claim is made at this time that this goal is met: As noted in the Status of this Document section, this is an initial draft to trigger discussion and commentary; assume that what is proposed here is untested.

To complement the interaction and decision related parts of this specification, 8 Robustness addresses the question of how the communication of context information needed to make decisions can be made more robust against attacks.

Finally, 9 Authoring and deployment best practices is about practices for those who deploy Web Sites. It complements some of the interaction related techniques recommended in this specification. The aim of this section is to provide guidelines for creating Web sites with reduced attack surfaces against certain threats, and with usefully provided security context information.

This specification comes with two companion documents: [WSC-USECASES] documents the use cases and assumptions that underly this specification. [WSC-THREATS] documents the Working Group's threat analysis.

Tuesday, November 20, 2007 (Permalink)

The Mozilla Project has posted the first beta of Firefox 3.0 for Mac, Linux, and Windows. This is code named "Gran Paradiso". "Firefox 3 Beta 1 is based on the new Gecko 1.9 Web rendering platform, which has been under development for the past 27 months and includes nearly 2 million lines of code changes, fixing more than 11,000 issues. Gecko 1.9 includes some major re-architecting for performance, stability, correctness, and code simplification and sustainability. Firefox 3 has been built on top of this new platform resulting in a more secure, easier to use, more personal product with a lot under the hood to offer website and Firefox add-on developers."

Monday, November 19, 2007 (Permalink)

The W3C Content Transformation Task Force has posted the first public working draft of Content Transformation Landscape 1.0. "This document identifies the issues surrounding use of transforming proxies in the delivery of Web content. It does not comment on the techniques that cause these issues, it merely identifies them in order to inform the requirements of the Content Transformation Guidelines document. That document is to offer recommendations as to how components of the delivery context can cooperate to achieve, at a minimum, a functional user experience."

Sunday, November 18, 2007 (Permalink)

Jason Hunter has released JDOM 1.1, a library for processing XML with Java using a tree metaphor. Besides bug fixes, this release adds:

EntityResolvers when building documents
A forceNamespaceAware property in DOMOutputter to specify that you want a DOM constructed with namespaces even if the source JDOM document has no namespaces
An isXMLWhitespace() method in Verifier
set/getIgnoringBoundaryWhitespace() methods and features to SAXBuilder and SAXHandler

Saturday, November 17, 2007 (Permalink)

The W3C POWDER Working Group has published a new working draft of Protocol for Web Description Resources (POWDER): Grouping of Resources. "The Protocol for Web Description Resources (POWDER) facilitates the publication of descriptions of multiple resources such as all those available from a Web site. This document describes how sets of resources may be defined, either for use in Description Resources or in other contexts. An OWL Class is to be interpreted as the Resource Set with its predicates and objects either defining the characteristics that elements of the set share, or directly listing its elements. Resources that are directly identified or that can be interpreted as being elements of the set can then be used as the subject of further RDF triples."

Friday, November 16, 2007 (Permalink)

The W3C RDF Data Access Working Group has published the proposed recommendations of SPARQL Query Results XML Format, SPARQL Protocol for RDF, and SPARQL Query Language for RDF. According to the latter, "RDF is a directed, labeled graph data format for representing information in the Web. This specification defines the syntax and semantics of the SPARQL query language for RDF. SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware. SPARQL contains capabilities for querying required and optional graph patterns along with their conjunctions and disjunctions. SPARQL also supports extensible value testing and constraining queries by source RDF graph. The results of SPARQL queries can be results sets or RDF graphs."

Thursday, November 15, 2007 (Permalink)

The Unicode Consortium has posted a beta of Unicode 5.1.0. I don't think there are any new characters in this release, just some changes to algorithms and locale data.

Tuesday, November 13, 2007 (Permalink)

The W3C Internationalization Tag Set Working Group has posted a new working draft of Best Practices for XML Internationalization.

Monday, November 12, 2007 (Permalink)

Code Synthesis has released XSD/e 2.0.0, a free-as-in-speech (GPL) C++ schema validating XML parser for embedded environments. According to Boris Kolpackov, "This release adds the new Embedded C++/Serializer mapping which generates validating serializer skeletons for data types defined in XML Schema. These skeletons can then be implemented to serialize application data to XML. In comparison to the traditional, tree-like data binding model, the C++/Serializer mapping allows you to create large XML documents that would not fit into memory, perform stream-oriented serialization, and use your own in-memory representation as a data source."

Sunday, November 11, 2007 (Permalink)

The W3C Web API Working Group has posted a new working draft of The XMLHttpRequest Object.

The XMLHttpRequest object implements an interface exposed by a scripting engine that allows scripts to perform HTTP client functionality, such as submitting form data or loading data from a server.

The name of the object is XMLHttpRequest for compatibility with the web, though each component of this name is potentially misleading. First, the object supports any text based format, including XML. Second, it can be used to make requests over both HTTP and HTTPS (some implementations support protocols in addition to HTTP and HTTPS, but that functionality is not covered by this specification). Finally, it supports "requests" in a broad sense of the term as it pertains to HTTP; namely all activity involved with HTTP requests or responses for the defined HTTP methods.

Saturday, November 10, 2007 (Permalink)

Syntext has released Xsl-Status 1.3.0, an open source progress tracking tool for XSLT stylesheet developers. Xsl-Status tracks which elements of an XML Schema are supported in the XSLT stylesheet, what the development status of XSLT templates is, and which template supports which XML element. New features in this release include:

Generating multiple reports at a time
Grouping generated reports
Summary reports
XML Catalog support

Xsl-Status is written in Python and published under the Apache 2.0 license.

Friday, November 9, 2007 (Permalink)

The W3C XML Schema Patterns for Databinding Working Group has posted the first public working drafts of Basic XML Schema Patterns for Databinding Version 1.0 and Advanced XML Schema Patterns for Databinding Version 1.0. According to the basic spec,

A representative collection of databinding implementations in common use has been used to provide an indication of the "state of the art". State of the art databinding implementations have displayed uneven and inconsistent support of the W3C [XML Schema 1.0] Recommendation resulting in impaired interoperability and a poor user experience of databinding tools:

rejecting valid [XML Schema 1.0] documents,

rejecting valid [XML 1.0] instance documents, and

making the content of valid [XML 1.0] instance documents unavailable in mapped data structures.

This specification provides a basic set of example [XML Schema 1.0] constructs and types in the form of concrete [XPath 2.0] expressions. These patterns are known to work well with state of the art databinding implementations.

Authors of [XML Schema 1.0] documents may find these patterns useful in providing a better user experience for consumers of their schemata using databinding tools. Whilst it is not possible to guarantee that schemata produced using these patterns will give a good user experience with the universal set of databinding tools, the patterns contained in this specification have been all been tested with a number of different tools covering a variety of different programming languages and environments.

Implementers of databinding tools may find these patterns useful to represent simple and common place data structures. Ensuring tools recognize at least these simple [XML Schema 1.0] patterns and present them in terms most appropriate to the specific language, database or environment will provide an improved user experience when using databinding tools. It is inappropriate to use this specification to constrain implementation of the [XML Schema 1.0] Recommendation.

The advanced spec "provides a set of commonly used [XML Schema 1.0] patterns known to cause issues with some state of the art databinding implementations."

Thursday, November 8, 2007 (Permalink)

The W3C Web API Working Group has posted the second public working draft of Progress Events 1.0. This "defines events which can be used to monitor a process and provide feedback to a user, particularly for network-based events." Here's the IDL:

interface ProgressEvent : events::Event {
     readonly attribute boolean         lengthComputable;
     readonly attribute unsigned long   loaded;
     readonly attribute unsigned long   total;
     void               initProgressEvent(in DOMString typeArg,
                                          in boolean       canBubbleArg,
                                          in boolean       cancelableArg,
                                          in boolean       lengthComputableArg,
                                          in unsigned long loadedArg,
                                          in unsigned long totalArg);
     void               initProgressEventNS(in DOMString namespaceURI,
                                            in DOMString typeArg,
                                            in boolean       canBubbleArg,
                                            in boolean       cancelableArg,
                                            in boolean       lengthComputableArg,
                                            in unsigned long loadedArg,
                                            in unsigned long totalArg);
};
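
To make the three read-only attributes concrete, here is a minimal sketch of how a script might turn such an event into a status message. The function name describeProgress is hypothetical, and a plain object stands in for a real event dispatched by a user agent; only lengthComputable, loaded, and total come from the IDL above.

```javascript
// Hypothetical helper: formats a status message from an object carrying
// the three read-only attributes in the draft's ProgressEvent IDL.
function describeProgress(event) {
  // total is only meaningful when lengthComputable is true
  if (!event.lengthComputable || event.total === 0) {
    return event.loaded + " loaded (total unknown)";
  }
  var percent = Math.floor((event.loaded / event.total) * 100);
  return percent + "% (" + event.loaded + " of " + event.total + ")";
}

// Plain objects stand in for real events here.
console.log(describeProgress({ lengthComputable: true, loaded: 512, total: 2048 }));
// prints "25% (512 of 2048)"
console.log(describeProgress({ lengthComputable: false, loaded: 512, total: 0 }));
// prints "512 loaded (total unknown)"
```

Checking lengthComputable first matters because the draft exposes total even when the length is not known.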

Wednesday, November 7, 2007 (Permalink)

Michael Kay has released version 9.0 of Saxon, his XSLT 2.0 and XQuery processor for Java and .NET. This is a major release. According to Kay,

There is a new Java API, called s9api. Existing APIs remain supported.

The command line interfaces have received a revamp, while retaining backwards compatibility for most options.

The schema processor now supports assertions, as defined in XML Schema 1.1.

A new extension function allows multiple document output in XQuery.

It is now possible to save a compiled schema (the schema component model) as XML.

There is a new model for pull-based evaluation of queries, improving the ability to integrate into a pull-based pipeline architecture.

The latest draft of the XQJ specification (XQuery API for Java) is implemented.

Number and date formatting has been added for a number of additional European languages including Belgian French, Flemish, Dutch, Danish, Swedish, and Italian.

A number of new optimizations have been introduced. These include function and variable inlining, wider use of automatic indexing, wider use of tail call optimization, hashing for large xsl:choose expressions, and a speed-up of the DOM interface.

Document projection analyzes a query and discards the parts of the source tree that are not needed to answer the query, giving a significant saving in tree-building time and memory.

Optimized expression trees can now be output ("explained") in an XML format, making it amenable to processing or graphical rendition.

Please note that queries compiled into Java code are not backwards-compatible at this release; they must be recompiled.

Tuesday, November 6, 2007 (Permalink)

After eight years of neglect, the W3C CSS Working Group has resurrected Behavioral Extensions to CSS. "Behavioral Extensions provide a way to link to binding technologies, such as XBL, from CSS style sheets. This allows bindings to be selected using the CSS cascade, and thus enables bindings to transparently benefit from the user style sheet mechanism, media selection, and alternate style sheets." This defines a single binding property and a :bound-element pseudo-class.

Monday, November 5, 2007 (Permalink)

SyncroSoft has released <oXygen/> 9.0, a $345 payware XML editor written in Java. <oXygen/> supports XML, XSL, DTDs, XQuery, SVG, Relax NG, Schematron, and the W3C XML Schema Language. According to the announcement, "The main feature of version 9.0 is a CSS-based visual XML editor allowing WYSIWYG-like editing of XML documents. With an innovative approach to XML authoring <oXygen/> allows you to work with XML frameworks (DocBook, DITA, TEI, XHTML, etc) easier than ever before. Version 9 adds a new concept called Document Type that allows you to provide ready-to-use support for a framework or an XML language and share it with other users. This version also brings improved error reporting for validation against Relax NG schemas and Schematron, additional side view helpers, some component updates and a number of other features."

Sunday, November 4, 2007 (Permalink)

The W3C CSS Working Group has published Cascading Style Sheets (CSS) Snapshot 2007. "When the first CSS specification was published, all of CSS was contained in one document that defined CSS Level 1. CSS Level 2 was defined also by a single, multi-chapter document. However for CSS beyond Level 2, the CSS Working Group chose to adopt a modular approach, where each module defines a part of CSS, rather than to define a single monolithic specification. This breaks the specification into more manageable chunks and allows more immediate, incremental improvement to CSS. Since different CSS modules are at different levels of stability, the CSS Working Group has chosen to publish this profile to define the current scope and state of Cascading Style Sheets as of late 2007. This profile includes only specifications that we consider stable and for which we have enough implementation experience that we are sure of that stability. Note that this is not intended to be a CSS Desktop Browser Profile: inclusion in this profile is based on feature stability only and not on expected use. This profile defines CSS in its most complete form."

Saturday, November 3, 2007 (Permalink)

Liquid Technologies has released Liquid XML Studio 1.0.4, a free-as-in-beer XML Schema Editor for Windows that supports auto-complete, syntax highlighting, validation, and HTML generation.

Friday, November 2, 2007 (Permalink)

The Mozilla Project has released Firefox 2.0.0.9. This release fixes a few regressions in 2.0.0.8, or at least it claims to. 2.0.0.8 was stable for me but 2.0.0.9 crashed almost immediately. Your mileage may vary.

The Mozilla Project has released Camino 1.5.3, an open source Mac OS X web browser based on the Gecko 1.8 rendering engine and the Quartz GUI toolkit. It supports pretty much all the technologies that Mozilla does: HTML, XHTML, CSS, XML, XSLT, etc. Mac OS X 10.3 or later is required. Version 1.5.3 adds version 1.8.1.9 of the Mozilla Gecko rendering engine.

Thursday, November 1, 2007 (Permalink)

The CSS Working Group has published the last call working draft of the CSS Mobile Profile 2.0. "This specification defines in general a subset of CSS 2.1 [CSS21] that is to be considered a baseline for interoperability between implementations of CSS on constrained devices (e.g. mobile phones). Its intent is not to produce a profile of CSS incompatible with the complete specification, but rather to ensure that implementations that due to platform limitations cannot support the entire specification implement a common subset that is interoperable not only amongst constrained implementations but also with complete ones. Additionally, this specification aligns itself as much as possible with the OMA's Wireless CSS 1.1 [WCSS11] specification."

Wednesday, October 31, 2007 (Permalink)

The Modis Team has released Sedna 2.2, an open source native XML database for Windows and Linux written in C++ and Scheme and published under the Apache License 2.0. Sedna supports XQuery and its own declarative update language. This release fixes bugs, adds XQuery triggers, and no longer requires root privileges to run on Linux.

Of the open source XML databases, this is the one I know the least about. Anyone want to comment on this one?

Tuesday, October 30, 2007 (Permalink)

The W3C Semantic Web Best Practices and Deployment Working Group and HTML Working Groups have published a new working draft of RDFa Primer 1.0.

Current Web pages, written in XHTML, contain inherent structured data: calendar events, contact information, photo captions, song titles, copyright licensing information, etc. When authors and publishers can express this data precisely, and when tools can read it robustly, a new world of user functionality becomes available, letting users transfer structured data between applications and Web sites. An event on a Web page can be directly imported into a desktop calendar. A license on a document can be detected to inform the user of his rights automatically. A photo's creator, camera setting information, resolution, and topic can be published as easily as the original photo itself.

This document is an introduction to RDFa, a method for achieving precisely this kind of structured data embedding in XHTML. The normative specification of RDFa may be found in [RDFa-SYNTAX].

Here's a syntax example from the draft:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns:cal="http://www.w3.org/2002/12/cal/ical#"
      xmlns:contact="http://www.w3.org/2001/vcard-rdf/3.0#">
  <head>
    <title>Jo's Friends and Family Blog</title>
  </head>

  <body>
...
  <p instanceof="cal:Vevent">
    I'm holding
    <span property="cal:summary">
      one last summer Barbecue,
    </span>
    on
    <span property="cal:dtstart" content="20070916T1600-0500">

      September 16th at 4pm.
    </span>
  </p>
...
  <p class="contactinfo" about="http://example.org/staff/jo">
    <span property="contact:fn">Jo Smith</span>.
    <span property="contact:title">Web hacker</span>

    at
    <a rel="contact:org" href="http://example.org">
      Example.org
    </a>.
    You can contact me
    <a rel="contact:email" href="mailto:jo@example.org">
      via email
    </a>.
  </p>
...
    </body>

</html>

The thing that jumps out at me is the use of namespace prefixes in attribute values. Haven't we learned by now that this is a bad idea?
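
To see why this is fragile, consider what a consumer has to do with a value like cal:summary. The following is only a sketch: expandPrefixedValue is a hypothetical name, and simple concatenation of the namespace URI with the local part is an assumption about how a processor expands such values. The prefix bindings are copied from the html element in the draft's example.

```javascript
// Hypothetical helper: expands a prefixed attribute value such as
// "cal:summary" using a prefix-to-namespace map gathered from xmlns:* attributes.
function expandPrefixedValue(value, prefixes) {
  var colon = value.indexOf(":");
  if (colon === -1) return null;            // no prefix to resolve
  var prefix = value.slice(0, colon);
  var local = value.slice(colon + 1);
  if (!Object.prototype.hasOwnProperty.call(prefixes, prefix)) {
    return null;                            // prefix not declared in scope
  }
  return prefixes[prefix] + local;          // assumed expansion rule
}

// Bindings copied from the html element in the draft's example.
var inScope = {
  cal: "http://www.w3.org/2002/12/cal/ical#",
  contact: "http://www.w3.org/2001/vcard-rdf/3.0#"
};

console.log(expandPrefixedValue("cal:summary", inScope));
// prints "http://www.w3.org/2002/12/cal/ical#summary"

// The same attribute value cannot be resolved once the declarations are
// lost, for example when the paragraph is pasted into another document.
console.log(expandPrefixedValue("cal:summary", {}));
// prints null
```

The attribute value by itself carries no meaning; it depends on declarations that may sit far away on an ancestor element, and that generic XML tools do not know they must preserve.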

The W3C Web API working group has posted the fourth public working draft of the Selectors API. "It is often desirable to perform script and or DOM operations on a specific set of elements in a document. Selectors [Selectors], mostly used in CSS [CSS21] context, provides a way of matching such a set of elements. This specification introduces two methods which take a group of selectors (often simply referred to as selector) as argument and return the matched elements as result." The spec offers the following JavaScript example:

var i = 0;
function resolver(prefix) {

  var ns = ["http://example.org/foo",
            "http://example.org/bar",
            "http://example.org/baz"];
  return ns[i++];
}

var x = document.querySelectorAll("foo|x, foo|y, bar|z", resolver);

Once again we see how namespaces take a relatively straightforward idea, and turn it into an illegible mess. :-(
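
Note also that the resolver in the spec's example ignores its prefix argument and simply hands back the next URI in the array on each call, so what foo resolves to depends on how many times the function has already been called. For comparison, here is a sketch of a resolver that actually looks the prefix up. The lookup table and the null return for an undeclared prefix are assumptions for illustration, not something the draft mandates.

```javascript
// Hypothetical resolver: looks the prefix up instead of counting calls.
// The namespace URIs are the ones used in the spec's own example.
var namespaces = {
  foo: "http://example.org/foo",
  bar: "http://example.org/bar"
};

function resolver(prefix) {
  return Object.prototype.hasOwnProperty.call(namespaces, prefix)
    ? namespaces[prefix]
    : null;                                 // assumed signal for "no such prefix"
}

console.log(resolver("foo"));               // prints "http://example.org/foo"
console.log(resolver("foo"));               // same answer on every call
console.log(resolver("baz"));               // prints null

// In a browser implementing this draft, the function would be passed as the
// second argument: document.querySelectorAll("foo|x, foo|y, bar|z", resolver)
```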

Monday, October 29, 2007 (Permalink)

The W3C has published XForms 1.0 (Third Edition). According to John Boyer, "This version of the specification contains 343 'diffs' that have significantly hardened XForms for enterprise deployment. By comparison, XForms 1.0 Second Edition in 2006 was based on just over 100 diffs." The most significant change is the addition of a section on "Interpretation of same-document references".

Friday, October 26, 2007 (Permalink)

Matt Mullenweg has released WordPress 2.3.1, an open source (GPL) blog engine based on PHP and MySQL. This release fixes over 20 bugs including some security bugs. All users should upgrade.

Thursday, October 25, 2007 (Permalink)

The W3C Semantic Web Deployment Working Group and XHTML 2 Working Group have posted the first public working draft of RDFa in XHTML: Syntax and Processing.

The modern Web is made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience: an event on a web page can be directly imported into a user's desktop calendar; a license on a document can be detected so that users can be informed of their rights automatically; a photo's creator, camera setting information, resolution, location and topic can be published as easily as the original photo itself, enabling structured search and sharing.

RDFa is a specification for attributes to be used with languages such as HTML and XHTML to express structured data. The rendered, hypertext data of XHTML is reused by the RDFa markup, so that publishers don't need to repeat significant data in the document content. This document only specifies the use of the RDFa attributes with XHTML. The underlying abstract representation is RDF [RDF-PRIMER], which lets publishers build their own vocabulary, extend others, and evolve their vocabulary with maximal interoperability over time. The expressed structure is closely tied to the data, so that rendered data can be copied and pasted along with its relevant structure.

The rules for interpreting the data are generic, so that there is no need for different rules for different formats; this allows authors and publishers of data to define their own formats without having to update software, register formats via a central authority, or worry that two formats may interfere with each other.

RDFa shares some use cases with microformats. Whereas microformats specify both a syntax for embedding structured data into HTML documents and a vocabulary of specific terms for each microformat, RDFa specifies only a syntax and relies on independent specification of terms (RDF Classes and Properties) by others. RDFa allows terms from multiple independently-developed vocabularies to be freely intermixed and is designed such that the language can be parsed without knowledge of the specific term vocabulary being used.

Wednesday, October 24, 2007 (Permalink)

I've posted the notes from last night's Native XML Databases presentation at the New York PHP User's Group monthly meeting. It was an active and inquiring audience, which is always fun; so we only got through the first 30 or so slides. I didn't really have time to go into the details of XQuery. There was also a tangent on REST and proper URL and web application design that we couldn't really explore fully. Maybe next time.

If anyone's interested in having me deliver this talk to their user group or conference, drop me a line. I don't have much time for out of town travel right now, but I can usually arrange something in New York City and its immediate vicinity.

Tuesday, October 23, 2007 (Permalink)

The W3C Web API Working Group has published the first working draft of Language Bindings for DOM Specifications. "'Language Bindings for DOM Specifications' is intended to specify in detail the IDL language used by W3C specifications to define DOM interfaces, and to provide precise conformance requirements for ECMAScript and Java bindings of such interfaces. It is expected that this document acts as a guide to implementors of already-published DOM specifications, and that newly published DOM specifications reference this document to ensure conforming implementations of DOM interfaces are interoperable." For reasons I'll be elaborating on later this year, this spec is part of the problem, not part of the solution.

Don't forget, tonight, Tuesday October 23, I will be presenting Native XML Databases to the New York PHP User's Group in midtown Manhattan. The meeting is free but preregistration and photo ID are required.

Monday, October 22, 2007 (Permalink)

The W3C Web Application Formats Working Group has posted an updated working draft of Widgets 1.0:

Widgets are a class of client-side web application for displaying and/or updating local or remote data, packaged in a way to allow a single download and installation on a client machine or device. Examples include clocks, stock tickers, news casters, games and weather forecasters. This specification, when combined with other dependent specifications, defines a software solution for Widgets, including:

A packaging format defined in terms of the Zip File Format Specification, to provide authors with an interoperable way to encapsulate and distribute widgets.

An XML-based configuration format and processing model, to allow authors to declare metadata about a widget.

A model that allows a user-agent to automatically start a widget.

An HTTP-based model for version control, to allow user agents to automatically keep widgets up-to-date.

A set of ECMAScript implementable DOM APIs and events, including an API to allow instantiated widgets to communicate with one another.

A model that leverages the XML-Signature Syntax and Processing Specification to allow a widget to be digitally signed.

A security model to reduce privacy risks and reduce the potential for damage to an end-users machine or device.

A means for web browsers to automatically "discover" widgets from within a HTML document.

Accessibility requirements for user agents to ensure that perceptual and interactive parts of widgets are accessible.

Saturday, October 20, 2007 (Permalink)

The Mozilla Project has released Firefox 2.0.0.8. This release plugs security holes, adds Georgian and Romanian localizations, and improves support for Mac OS X Leopard. All users should upgrade.

Opera Software has released version 9.2.4 of their namesake free-beer web browser for Windows, Mac, Linux, FreeBSD, and Solaris. This release fixes some Flash related security problems and improves Leopard compatibility. All users should upgrade.

Thursday, October 18, 2007 (Permalink)

TreeStages has released XEntrant 0.2, a closed source, Windows "Model Driven XML Tree Editor" that supports W3C schemas and XSLT. Pricing does not seem to be available. XEntrant is written in Python and seems to make the same mistake 90% of XML editors have made over the last ten years; specifically the user interface is designed according to what the program's developers find easy to code rather than what the program's users would find easy to code in. Tree-based editors just aren't all that useful, but they're relatively easy to develop. Linear editors are much more useful, but they're also much harder to write.

Altova has released XMLSPY 2008, a $499-$999 payware XML editor for Windows. This release now supports the Office Open XML formats from Microsoft Office 2007. It also adds support for XInclude and XPointer.

Gerald Schmidt has released XML Copy Editor 1.1.0.3, a free-as-in-speech (GPL) XML editor for Windows and Linux. Features include DTD/XML Schema/RELAX NG validation, XSLT, XPath, pretty-printing, syntax highlighting, tag folding, tag completion, spell and style check, XHTML, XSL, DocBook and TEI, and Microsoft Word import and export. This release fixes bugs and improves Gnome integration.

Wednesday, October 17, 2007 (Permalink)

The W3C has posted the call for papers for the Seventeenth International World Wide Web Conference (WWW2008) to take place April 21-25, 2008 in Beijing, China. Topics include:

Browsers and User Interfaces
Data Mining
Internet Monetization
Mobility
Performance and Scalability
Rich Media
Search
Security and Privacy
Semantic / Data Web
Social Networks and Web 2.0
Web Engineering
XML and Web Data
Industrial Practice and Experience
Technology for Developing Regions
WWW in China

Hmm, I wonder if they'd accept a paper on bypassing the Great Firewall of China for that last one? The main theme of the conference is "One World, One Web". It does seem funny to host a conference with that theme in a country that is so committed to creating its own censored Web that provides distinctly different content on subjects like Tiananmen Square, Tibet, and Taiwan than one would find when surfing in the rest of the world, but perhaps it will do some good. In any case, papers are due by November 1.

I also notice that HTML and XML are not welcome at this conference:

Refereed papers must be submitted as PDF documents. No other format will be accepted. It is the responsibility of all authors to produce PDF documents that can be read and printed on any platform. Please check to ensure that you can produce PDF documents well before the submission deadline. The inability to produce a PDF document will not result in an extension of the paper submission deadline.

...

Refereed papers can be prepared using either LaTeX or Microsoft Word. (Other document preparation systems can be used, but are not recommended and no assistance will be provided in the case of problems. Authors using other document preparation systems are responsible for producing output completely equivalent to that produced using one of the methods below.)

May I suggest a theme for WWW 2009? "Practice what we preach and eat our own dog food."

Tuesday, October 16, 2007 (Permalink)

Next week, Tuesday October 23, I will be presenting Native XML Databases to the New York PHP User's Group in midtown Manhattan. Time permitting, I may do a little actual XQuery and PHP code, but mostly I'll be talking at a high level about just what native XML databases are and are not good for and why and when you might want to use one. I expect there to be several committed SQL and flat file ~~bigots~~aficionados in attendance so expect sparks to fly; and if they don't, well, I guess we'll just have to go drink some more beer until they do. :-) The meeting is free but preregistration and ID are required. Bring an open mind or a basket of tomatoes: your choice.

Monday, October 15, 2007 (Permalink)

James Snell has launched a Wiki to capture Atom best practices. Suggestions include:

Many feed readers are incapable of preserving the base URI unless xml:base is used. Consider always using absolute IRIs or xml:base
While the spec allows atom:category to have child text and elements, most feed consumers are incapable of doing anything with the extra markup. Child markup should be avoided. See: http://feedvalidator.org/docs/warning/AtomLinkNotEmpty.html
Encode and decode non-ASCII characters according to the rules of RFC2822 email headers.
When presenting internationalized email addresses to users, be able to show them both the encoded and the decoded versions of the address.
Namespace prefix declarations for atom and xhtml haven't proven to be very interoperable in the context of an Atom document. Avoid declaring prefixes for the atom and xhtml namespaces. See
(X)HTML markup contained in text constructs and content should be limited to a subset of elements, attributes, and styles that are known to be safe. See http://wiki.whatwg.org/wiki/Sanitization_rules
The "self" link should either be an absolute IRI or xml:base should be used.

It's disturbing that so many of these revolve around supporting broken and brain damaged feed readers and tools that can't handle the basic specs like Namespaces in XML and xml:base. Personally I consider it a best practice to fix or replace broken tools rather than living with them.
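
For what it's worth, honoring xml:base is not much work. The sketch below resolves a relative link against a base the way a conforming consumer would have to. The helper name resolveHref and the feed URLs are made up for illustration, and the standard URL constructor is used as a stand-in for full IRI resolution, which is an assumption since it percent-encodes non-ASCII characters rather than preserving them.

```javascript
// Hypothetical helper: resolves a link's href against an xml:base value.
function resolveHref(href, xmlBase) {
  // The URL constructor performs standard relative-reference resolution.
  return new URL(href, xmlBase).href;
}

// Made-up xml:base and link values, for illustration only.
var base = "http://example.org/blog/feed/";

console.log(resolveHref("../2007/10/15/entry", base));
// prints "http://example.org/blog/2007/10/15/entry"

// An absolute reference is unaffected by the base.
console.log(resolveHref("http://example.com/absolute", base));
// prints "http://example.com/absolute"
```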

Saturday, October 13, 2007 (Permalink)

RenderX has released Visual-XSL 1.5, a $1000 payware product that "simplifies visual design of printable layouts that populate form-like documents with XML data. A PDF file or a raster image, such as a scan of a preprinted form, can be used as a layout's background. The resulting XSLT stylesheet that merges the layout and the XML data to produce XSL FO can be used for creation of PDF or PostScript print files using RenderX's XEP XSL FO formatter. The resulting printable output may contain the imported background, or the background may be omitted for print on preprinted forms."

Friday, October 12, 2007 (Permalink)

IBM has released Lotus Forms 3.0, a closed source "XForms-enabled product suite that includes a visual forms design experience, a web browser plugin run-time, and a web server run-time that provides XForms functionality to web browser clients using HTML and AJAX. The product line includes support for XForms 1.0 as well as more features from XForms 1.1."

Thursday, October 11, 2007 (Permalink)

The W3C Service Modeling Language (SML) Working Group has published working drafts of Service Modeling Language, Version 1.1 and Service Modeling Language Interchange Format Version 1.1. According to the former:

The Service Modeling Language (SML) provides a rich set of constructs for creating models of complex services and systems. Depending on the application domain, these models may include information such as configuration, deployment, monitoring, policy, health, capacity planning, target operating range, service level agreements, and so on. Models provide value in several important ways.
Models focus on capturing all invariant aspects of a service/system that must be maintained for the service/system to function properly.

Models represent a powerful mechanism for validating changes before applying the changes to a service/system. Also, when changes happen in a running service/system, they can be validated against the intended state described in the model. The actual service/system and its model together enable a self-healing service/system, the ultimate objective. Models of a service/system must necessarily stay decoupled from the live service/system to create the control loop.

Models are units of communication and collaboration between designers, implementers, operators, and users; and can easily be shared, tracked, and revision controlled. This is important because complex services are often built and maintained by a variety of people playing different roles.
Models drive modularity, re-use, and standardization. Most real-world complex services and systems are composed of sufficiently complex parts. Re-use and standardization of services/systems and their parts is a key factor in reducing overall production and operation cost and in increasing reliability.
Models enable increased automation of management tasks. Automation facilities exposed by the majority of services/systems today could be driven by software -- not people -- both for reliable initial realization of a service/system as well as for ongoing lifecycle management.
A model in SML is realized as a set of interrelated XML documents. The XML documents contain information about the parts of a service, as well as the constraints that each part must satisfy for the service to function properly. Constraints are captured in two ways:

Schemas: these are constraints on the structure and content of the documents in a model. SML uses XML Schema [XML Schema Structures, XML Schema Datatypes] as the schema language. In addition SML defines a set of extensions to XML Schema to support inter-document references.
Rules are Boolean expressions that constrain the structure and content of documents in a model. SML uses a profile of Schematron [ISO/IEC 19757-3, Introduction to Schematron, Improving Validation with Schematron] and XPath [XPath] for rules.
One of the important operations on the model is to establish its validity. This involves checking whether all data in a model satisfies the schemas and rules declared.

Wednesday, October 10, 2007 (Permalink)

Planamesa Software has released NeoOffice/J 2.2.2, a Mac port of OpenOffice 2.1 using a Java-based GUI. This release fixes bugs and speeds up rendering.

Tuesday, October 9, 2007 (Permalink)

Intel has posted a beta of the Intel XML Software Suite, a collection of libraries for XSLT processing, XPath, DOM, SAX, and XML Schema Validation. The libraries seem to be written in native code for Linux and Windows, but a JNI based wrapper for Java is included. They claim this is twice as fast as XSLTC and Xalan for XPath and XSLT and six times faster than Xerces-C++ for raw parsing. If that's true, that's very interesting. Xerces isn't the fastest parser out there, but a six times speed-up is better than I think anyone else has done. It also suggests that the push for non-XML binary encodings is very likely premature. Most interestingly, they claim to have done this using standard APIs: SAX and DOM. Personally I had little doubt that XML parsing performance could be sped up, but I expected that this would require some new APIs designed for high performance. They don't seem to have needed that. I look forward to hearing more details of their algorithms, and seeing whether these claims hold up when others inspect them.

Monday, October 8, 2007 (Permalink)

The W3C Mobile Web Initiative Best Practices Working Group has published the last call working draft of W3C mobileOK Basic Tests 1.0:

mobileOK Basic is a scheme for assessing whether Web resources (Web content) can be delivered in a manner that is conformant with Mobile Web Best Practices [BestPractices] to a simple and largely hypothetical mobile user agent, the Default Delivery Context.

This document describes W3C mobileOK Basic tests for delivered content, and describes how to emulate the DDC when requesting that content.

mobileOK Basic is the lesser of two levels of claim, the greater level being mobileOK Pro, described separately. Claims to be W3C mobileOK Basic conformant are represented using Description Resources (see [POWDER]) also described separately.

The intention of mobileOK is to help catalyze development of Web content that provides a functional user experience in a mobile context. It is not a test for browsers, user agents or mobile devices, and is not intended to imply anything about the way these should behave.

mobileOK does not imply endorsement or suitability of content. For example, it must not be assumed that a claim that a resource is mobileOK conformant implies that it is of higher informational value, is more reliable, more trustworthy or is more appropriate for children than any other resource.

Sunday, October 7, 2007 (Permalink)

The Web has a rich set of resources that can be combined to build content, applications and feature-rich Web sites. A contributor to this richness is Web sites including references (e.g. a link or an image inclusion) to resources residing in other domains.
To prevent information leakage, user agents, such as Web browsers, implement a same origin policy that allows a document (e.g. some JavaScript) to read, process, or otherwise interrogate the contents of another resource if, and only if, the other resource resides in the same domain. This policy prevents domain A, acting on behalf of the user, from getting information from domain B. For instance, this prevents a malicious site from reading information from the user's intranet using a technology such as XMLHttpRequest.
This restriction is very strict and generally appropriate. However, there are scenarios where an application would like to get data from another resource on the Web without these restrictions. For this to work the browser's same origin policy has to be extended or eased. For example, a car reservation Web site may want to request trip itinerary data from an affiliated airline reservation website to streamline making a car reservation. The easing of read access restrictions is particularly important to Web browsers that implement the XMLHttpRequest object and VoiceXML 2.1 browsers using the data element.
To facilitate clear and controlled read access to resources, this specification defines a read access control mechanism that enables a Web resource to permit access to its content from external domains when such access would otherwise be prohibited by a same origin policy. The defined mechanism only works in conjunction with other specifications that are using the read access control mechanism to enable read access.
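
The check that this mechanism relaxes can be sketched in a few lines. This is only an illustration: sameOrigin is a hypothetical name, the host names are invented, and comparing scheme, host, and port through the standard URL object's origin property is an assumption that goes slightly beyond the quoted "same domain" wording.

```javascript
// Hypothetical check: two URLs share an origin when their scheme, host,
// and port all match, which is what URL.prototype.origin serializes.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

// Invented sites standing in for the car and airline reservation example.
var carSite = "http://cars.example/reserve";

console.log(sameOrigin(carSite, "http://cars.example/account"));
// prints true: a script may read this resource
console.log(sameOrigin(carSite, "http://airline.example/itinerary"));
// prints false: blocked unless the airline site grants read access
```

Under the proposal quoted above, the second case is the one a resource could opt into allowing.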

Saturday, October 6, 2007 (Permalink)

The W3C HTML working group has posted the second public working draft of XHTML Role Attribute Module.

The role attribute takes as its value one or more whitespace separated CURIEs. Any non-qualified value MUST be interpreted in the XHTML namespace, and MUST be taken from the list defined in this section.

The attribute describes the role(s) the current element plays in the context of the document. This can be used, for example, by applications and assistive technologies to determine the purpose of an element. This could allow a user to make informed decisions on which actions may be taken on an element and activate the selected action in a device independent way. It could also be used as a mechanism for annotating portions of a document in a domain specific way (e.g., a legal term taxonomy).
This example is informative
<ul role="navigation wai:sitemap">
    <li href="downloads">Downloads</li>
    <li href="docs">Documentation</li>

    <li href="news">News</li>
</ul>
Authors may use the following standard roles, listed here with their conventional interpretations. They are intended to define regions of the document to help orient the user.

banner

A banner is usually defined as the advertisement at the top of a web page. The banner content typically contains the site or company logo and other key advertisements for the site.

contentinfo

This is information about the content on the page. For example, footnotes, copyrights, links to privacy statements, etc. would belong here.

definition

The contents of the associated element represent a definition (e.g., of a term or concept). If there is a dfn element within the contents (as defined in [XHTMLMOD]), then that represents the term being defined.

main

This defines the main content of a document.

navigation

This is a collection of links suitable for use when navigating the document or related documents.

note

The content is parenthetic or ancillary to the main content of the resource.

search

This is the search section of a web document. This is typically a form used to submit search requests about the site or a more general Internet wide search service.

secondary

This is any unique section of the document. In the case of a portal, this may include but not be limited to: show times; current weather; or stocks to watch.

seealso

Indicates that the element contains content that is related to the main content of the page.

You can add other values for this attribute by placing the values in a namespace. (Haven't we learned yet that namespaced attribute values are a bad idea?)
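
As a rough illustration of how a consumer might read a role attribute like the one in the example above, here is a minimal Python sketch. It splits the value on whitespace and expands each token to a full URI. The default vocabulary URI and the wai prefix binding are assumptions made for illustration; they are not taken from the draft quoted above.

```python
# Assumed default vocabulary for non-qualified role values (illustrative).
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"


def expand_roles(role_value, prefixes):
    """Split a role attribute value and expand each token to a full URI."""
    expanded = []
    for token in role_value.split():
        if ":" in token:
            # Qualified value: look the prefix up in the supplied bindings.
            prefix, reference = token.split(":", 1)
            expanded.append(prefixes[prefix] + reference)
        else:
            # Non-qualified value: interpret it in the default vocabulary.
            expanded.append(XHTML_VOCAB + token)
    return expanded


# Hypothetical prefix binding; a real document would declare its own.
prefixes = {"wai": "http://example.org/wai-roles#"}
roles = expand_roles("navigation wai:sitemap", prefixes)
print(roles)
```

The sketch does not check non-qualified values against the list of standard roles, which the draft requires.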

Friday, October 5, 2007 (Permalink)

The W3C Math Working Group has posted the second public working draft of Mathematical Markup Language (MathML) Version 3.0. Changes since 2.0 include content dictionaries, "a mechanism for recording that a particular notational structure has a particular mathematical meaning". Version 3.0 is also supposed to enable easier markup of elementary school mathematics.

Thursday, October 4, 2007 (Permalink)

Mulberry Technologies has announced Balisage: The Markup Conference, to take place in Montreal, August 12-15, 2008. "Balisage is designed to meet the needs of markup theoreticians and practitioners who are pushing the boundaries of the field. It's all about the markup: how to create it; what it means; hierarchies and overlap; modeling; taxonomies; transformation; query, searching, and retrieval; presentation and accessibility; making systems that make markup dance (or dance faster to a different tune in a smaller space) - in short, changing the world and the web through the power of marked-up information." This appears to be the successor to the popular and fun Extreme Markup Languages Conference: same organizers (good), same hotel (bad), but no longer under the auspices of the GCA. If the Loonie continues its run against the dollar this may not be the best year for Americans to go, but Europeans and Canadians should have a good time.

Wednesday, October 3, 2007 (Permalink)

SpeakEasy/Covad is acting up again, and my network connection is going up and down for no apparent reason. For the moment that means xom.nu, Mokka mit Schlag, and The Cafes are down and e-mail is unreliable. I'm not sure when this is likely to be fixed. For future reference, the magic words for getting at least something working are "Manual Reprovision", whatever that means. This time a regular rebuild and reprovision did not accomplish anything. SpeakEasy is shipping me a new DSL modem that may fix the problem, but that isn't scheduled to arrive until Friday, assuming it ships when and how they say. (Last time this happened they told me they'd overnight it and instead sent it regular delivery.) This is the third and worst outage I've had with them in the last month. I'm still looking for reliable ISP service in Brooklyn.

Tuesday, October 2, 2007 (Permalink)

Bob Stayton has released the fourth edition of DocBook XSL: The Complete Guide. New features include:

The DocBook stylesheets version 1.73
Processing DocBook version 5 documents.
DocBook DTD 4.5.
New chapters on DocBook 5 and Revision Control

Those who purchased the book in the last year get a free upgrade to the new PDF version.

Norm Walsh has posted the seventh release candidate of DocBook 5.0. DocBook 5 is "a significant redesign that attempts to remain true to the spirit of DocBook." The schema is written in RELAX NG. A DTD and W3C XML Schema generated from the RELAX NG schema are also available. There's also a Schematron schema "that validates some extra-grammatical DocBook constraints. These patterns are also present directly in the RELAX NG Grammar and some validators, for example MSV, can perform both kinds of validation at the same time." This may become the final version of DocBook 5.

The DocBook Project has released version 1.0 of the DocBook Saxon extensions and the DocBook Xalan extensions.

Monday, October 1, 2007 (Permalink)

Dave Beckett has released the Raptor RDF Parser Toolkit 1.4.16, an open source C library for parsing the RDF/XML, N-Triples, Turtle, and Atom Resource Description Framework formats. It uses expat or libxml2 as the underlying XML parser. This release adds a TRiG parser, updates the GRDDL parser to support the final recommendation, and supports @base in the Turtle parser and serializer. Raptor is dual licensed under the LGPL and Apache 2.0 licenses.

Peter Jipsen has released ASCIIMathML 2.0.2, a JavaScript program that converts calculator-style ASCII math notation and some LaTeX formulas to Presentation MathML while a Web page loads. The resulting MathML can be displayed in Mozilla-based browsers and Internet Explorer 6 with MathPlayer. ASCIIMathML is published under the LGPL.

Sunday, September 30, 2007 (Permalink)

ETH Zurich has released MXQuery 0.4, an open source (Apache 2.0) XQuery engine written in Java. It supports XQuery 1.0 (though typeless), XQueryP, FORSEQ, and XQuery Update.

Saturday, September 29, 2007 (Permalink)

The W3C Web Services Activity has published new drafts of Web Services Policy 1.5 - Guidelines for Policy Assertion Authors and Web Services Policy 1.5 - Primer. According to the latter,

Web services are being successfully used for interoperable solutions across various industries. One of the key reasons for interest and investment in Web services is that they are well-suited to enable service-oriented systems. XML-based technologies such as SOAP, XML Schema and WSDL provide a broadly-adopted foundation on which to build interoperable Web services. The WS-Policy and WS-PolicyAttachment specifications extend this foundation and offer mechanisms to represent the capabilities and requirements of Web services as Policies.

Service metadata is an expression of the visible aspects of a Web service, and consists of a mixture of machine- and human-readable languages. Machine-readable languages enable tooling. For example, tools that consume service metadata can automatically generate client code to call the service. Service metadata can describe different parts of a Web service and thus enable different levels of tooling support.

First, service metadata can describe the format of the payloads that a Web service sends and receives. Tools can use this metadata to automatically generate and validate data sent to and from a Web service. The XML Schema language is frequently used to describe the message interchange format within the SOAP message construct, i.e. to represent SOAP Body children and SOAP Header blocks.

Second, service metadata can describe the ‘how’ and ‘where’ a Web service exchanges messages, i.e. how to represent the concrete message format, what headers are used, the transmission protocol, the message exchange pattern and the list of available endpoints. The Web Services Description Language is currently the most common language for describing the ‘how’ and ‘where’ a Web service exchanges messages. WSDL has extensibility points that can be used to expand on the metadata for a Web service.

Third, service metadata can describe the capabilities and requirements of a Web service, i.e. representing whether and how a message must be secured, whether and how a message must be delivered reliably, whether a message must flow a transaction, etc. Exposing this class of metadata about the capabilities and requirements of a Web service enables tools to generate code modules for engaging these behaviors. Tools can use this metadata to check the compatibility of requesters and providers. Web Services Policy can be used to represent the capabilities and requirements of a Web service.

Web Services Policy is a machine-readable language for representing the capabilities and requirements of a Web service. These are called ‘policies’. Web Services Policy offers mechanisms to represent consistent combinations of capabilities and requirements, to determine the compatibility of policies, to name and reference policies and to associate policies with Web service metadata constructs such as service, endpoint and operation. Web Services Policy is a simple language that has four elements - Policy, All, ExactlyOne and PolicyReference - and two attributes - wsp:Optional and wsp:Ignorable.
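
To make the Policy, ExactlyOne, and All elements a little more concrete, here is a minimal Python sketch that reads a small policy expression and lists its alternatives. The namespace is the one WS-Policy 1.5 uses; the assertion elements in the ex namespace are made up for illustration, and the code assumes the policy is already in normal form (one ExactlyOne containing All children). It does not handle PolicyReference, wsp:Optional, or wsp:Ignorable.

```python
import xml.etree.ElementTree as ET

WSP = "http://www.w3.org/ns/ws-policy"

# The ex:* assertions are hypothetical, not taken from any real specification.
POLICY_XML = """
<wsp:Policy xmlns:wsp="http://www.w3.org/ns/ws-policy"
            xmlns:ex="http://example.org/assertions">
  <wsp:ExactlyOne>
    <wsp:All>
      <ex:SecureTransport/>
      <ex:ReliableDelivery/>
    </wsp:All>
    <wsp:All>
      <ex:SecureMessage/>
    </wsp:All>
  </wsp:ExactlyOne>
</wsp:Policy>
"""


def alternatives(policy_xml):
    """Return one list of assertion element names per wsp:All alternative."""
    root = ET.fromstring(policy_xml)
    exactly_one = root.find(f"{{{WSP}}}ExactlyOne")
    return [[child.tag for child in all_el]
            for all_el in exactly_one.findall(f"{{{WSP}}}All")]


policy_alternatives = alternatives(POLICY_XML)
for number, assertions in enumerate(policy_alternatives, start=1):
    print(number, assertions)
```

Each wsp:All child of wsp:ExactlyOne is one acceptable combination of assertions, so this hypothetical service accepts either of two alternatives.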

Friday, September 28, 2007 (Permalink)

The W3C POWDER working group has published three new working drafts:

According to the first,

The Protocol for Web Description Resources (POWDER) facilitates the publication of descriptions of multiple resources such as all those available from a Web site. These descriptions are attributable to a named individual, organization or entity that may or may not be the creator of the described resources. This contrasts with more usual metadata that typically applies to a single resource, such as a specific document's title, which is usually provided by its author.

This document sets out how Description Resources (DRs) can be created and published, whether individually or as a bulk data, how to link to DRs from other online resources, and, crucially, how DRs may be authenticated. The aim is to provide a platform through which opinions, claims and assertions about online resources can be expressed by people and exchanged by machines. POWDER has evolved from the data model developed for the final report [XGR] of the Web Content Label Incubator Group [WCL-XG] from which we define a Description Resource as: "a resource that contains a description, a definition of the scope of the description and assertions about both the circumstances of its own creation and the entity that created it."

The method of defining the scope of a DR, that is, defining what is being described, is provided in a separate document: Grouping of Resources [GROUP]. Companion documents describe the RDF/OWL vocabulary [VOC] and XML data types [WDRD] that are derived from the Grouping of Resources document and this document, with each term's domain, range and constraints defined. As each term is introduced in this document, it is linked to its description in the vocabulary document. The POWDER vocabulary namespace is http://www.w3.org/2007/05/powder# for which we use the QName wdr.

POWDER takes a very broad approach so that it is possible for both the resource creator and third parties to make assertions about all kinds of things, with no architectural limits on the kind of thing they are making claims about. For example, medically proficient organizations might be concerned with properties of the agencies and processes that produce Web content (e.g., companies, people, and their credentials). Equally, a 'Mobile Web' application might need to determine the properties of various devices such as their screen dimensions, and those device types might be described with such properties by their manufacturer or by others. Although the broad approach is supported, we have focused on Web resources rather than trying to define a universal labeling system for objects.

Thursday, September 27, 2007 (Permalink)

The W3C RDF Data Access Working Group has published the candidate recommendation of SPARQL Query Results XML Format. "This document describes an XML format for the variable binding and boolean results formats provided by the SPARQL query language for RDF".
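
For a sense of what the variable binding format looks like to a consumer, here is a minimal Python sketch that parses a small results document with the standard library. The sample document and its URIs are invented for illustration. The sketch only covers variable bindings, not boolean results, and it keeps just the text of each bound value.

```python
import xml.etree.ElementTree as ET

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Invented sample data in the SPARQL results namespace.
RESULTS_XML = """
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="book"/>
    <variable name="title"/>
  </head>
  <results>
    <result>
      <binding name="book"><uri>http://example.org/book/1</uri></binding>
      <binding name="title"><literal>Example Title</literal></binding>
    </result>
  </results>
</sparql>
"""


def parse_results(xml_text):
    """Return (variable names, list of {variable: value text} rows)."""
    root = ET.fromstring(xml_text)
    variables = [v.get("name") for v in root.findall("sr:head/sr:variable", NS)]
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            value = binding[0]  # the uri, literal, or bnode child element
            row[binding.get("name")] = value.text
        rows.append(row)
    return variables, rows


variables, rows = parse_results(RESULTS_XML)
print(variables, rows)
```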

Wednesday, September 26, 2007 (Permalink)

I just noticed that www.xom.nu is redirecting to www.elharo.com. I suspect I broke it while testing some redirect scripts for Refactoring HTML. It should be fixed shortly.

OK. It's fixed now. It's a little surprising that the site was broken for probably two days and nobody noticed. Either XOM is so clear that nobody needs to read the documentation or not as many people are using it as I'd like. Today I had the displeasure of using both DOM and dom4j on another project I've been working on (for non-technical reasons I'm not at liberty to go into, XOM is not an option on this project), and they were both appalling. JDK 1.5's DOM implementation was flat-out buggy and dom4j was broken by design. I am astonished that developers are still using these products. I was sorely tempted to bring in JDOM, but I didn't want to introduce yet another dependency. I ended up settling for Jaxen.

Tuesday, September 25, 2007 (Permalink)

The OpenOffice Project has released OpenOffice 2.3, an open source office suite for Linux, Solaris, and Windows that saves all its files as zipped XML. There's also an alpha-quality version for the Mac (X-Windows no longer required). Version 2.3 adds a new report designer, provides localizations for several more languages including Tagalog and Frisian, improves charts and databases, and fixes numerous bugs.

I tried out the Mac version. OpenOffice is improving but even leaving aside stability issues that probably don't exist on other platforms, it's clearly not ready to replace Microsoft Office anytime soon. It's still missing some really basic functionality like a scrolling view of a word processing document. There are also numerous UI inconsistencies that need to be cleaned up. Can you spot this one? (There's one problem on all platforms and an extra problem just on the Mac.)

New design/Existing Design

Actually looking at that dialog again, I see three problems, two for all platforms and one only on the Mac. Yes, I'm being picky, but it's getting the little stuff like this right that really makes an application feel and look professional and clean. Outside of the Mac world, most open source developers aren't nearly picky enough. (Firefox and recent versions of NetBeans are two notable exceptions. It's no coincidence that both of those projects include significant proportions of PowerBook-wielding developers.)

I tried to report some of these problems, but their bug reporter requires registration, and never sent me a password. :-( OpenOffice is dual licensed under the LGPL and Sun Industry Standards Source License.

Monday, September 24, 2007 (Permalink)

The W3C Math Working Group has posted a new working draft of A MathML for CSS profile. "The current profile is intended to be subset of MathML 3.0 [mathml] that could be used to capture structure of mathematical formulae in the way suitable for further CSS formatting. This profile is expected to facilitate adoption of MathML in web browsers and CSS formatters, allowing them to reuse existing CSS [css] visual formatting model, enhanced with a few mathematics oriented extensions, for rendering of layouts schemata of presentational MathML. Development of the CSS profile is assumed to be coordinated with ongoing work on CSS3 and may require a limited set of new properties to be added to existing CSS3 modules."

Sunday, September 23, 2007 (Permalink)

The W3C Web Application Formats Working Group has published a note effectively cancelling work on Declarative Formats for Applications and User Interfaces:

On 15 November 2005 the W3C announced the decision to start the Web Application Formats (WAF) Working Group (WG). This WG's Charter includes a deliverable named Specification of a declarative format for applications and user interfaces (called DFAUI in this document) and it is defined as follows:

This deliverable should be based on an existing application/UI format, such as Mozilla's XUL, Microsoft's XAML, Macromedia's MXML or Laszlo Systems' LZX, provided the owners of the format are willing to contribute. The format should allow embedded program code. This format, combined with the deliverables below and existing technologies including XHTML, CSS, XForms, SVG and SMIL, should provide a strong basis for rich client application development.

Tentative milestones: First draft of requirements during October. First draft of specification during November. Candidate Recommendation 4th quarter of 2006.

This Note includes a recommendation that the Working Group formally stop its work on this deliverable and consider this Note as the one and only document the WG will publish for the DFAUI. The document also includes the status of this deliverable and some options if Members choose to do DFAUI-related work.

...

For all practical purposes, work on the DFAUI deliverable stopped after the WG's April 2007 face-to-face meeting.

The primary reasons and factors that contributed to the slow progress on, and low participation in the DFAUI are:

Insufficient resources - only two members of the WG actively contributed (via significant contributions) to the DFAUI work

Work on the DFAUI deliverable detracted from the WG's other specification work (see above) and this other work has active support and contributions from more WG members

Lack of key industry participants and stakeholders to pro-actively drive the DFAUI

Some members of the WG asserted the identified Use Cases and Requirements can be addressed by existing open standards (i.e. HTML4.01, CSS2.1, JavaScript, etc.) and/or by open standards in progress (i.e. HTML5, CSS3, XBL2, etc.). Other WG members asserted the existing specifications cannot meet some of the Use Cases and Requirements

Most WG participants could not make a prolonged (i.e. multi-year) resource commitment to create a new DFAUI language

Saturday, September 22, 2007 (Permalink)

The W3C XML Processing Model Working Group has posted the last call working draft of XProc: An XML Pipeline Language. According to the introduction,

An XML Pipeline specifies a sequence of operations to be performed on a collection of XML input documents. Pipelines take zero or more XML documents as their input and produce zero or more XML documents as their output.

A pipeline consists of steps. Like pipelines, steps take zero or more XML documents as their inputs and produce zero or more XML documents as their outputs. The inputs to a step come from the web, from the pipeline document, from the inputs to the pipeline itself, or from the outputs of other steps in the pipeline. The outputs from a step are consumed by other steps, are outputs of the pipeline as a whole, or are discarded.

There are two kinds of steps: atomic steps and compound steps. Atomic steps carry out single operations and have no substructure as far as the pipeline is concerned, whereas compound steps control the execution of other steps, which they include in the form of one or more subpipelines.

Standard steps include count, delete, equal, error, load, parse, serialize, insert, escape markup, unescape markup, identity, label elements, XSLT, XSLT 2, XQuery, rename, namespace rename, replace, wrap, unwrap, wrap sequence, sink, set attributes, split sequence, string replace, XInclude, HTTP request, RELAX NG validate, XSLT 2.0, XQuery 1.0, and W3C Schema validate. Others may be defined.
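
The data flow described in the introduction, where each step takes zero or more documents and produces zero or more documents, can be modeled in a few lines of Python. This is not XProc syntax, and the step functions below are hypothetical stand-ins loosely named after the identity and count steps. It only models a straight-line pipeline, with no compound steps or multiple ports.

```python
import xml.etree.ElementTree as ET


def identity(docs):
    """Pass every input document through unchanged."""
    return list(docs)


def count(docs):
    """Produce a single document holding the number of input documents."""
    result = ET.Element("result")
    result.text = str(len(docs))
    return [result]


def run_pipeline(steps, inputs):
    """Feed the outputs of each step to the next step, in order."""
    docs = inputs
    for step in steps:
        docs = step(docs)
    return docs


inputs = [ET.fromstring("<a/>"), ET.fromstring("<b/>"), ET.fromstring("<c/>")]
outputs = run_pipeline([identity, count], inputs)
print(ET.tostring(outputs[0], encoding="unicode"))
```

Three documents go in and one comes out, which is the "zero or more in, zero or more out" shape the draft describes.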

Friday, September 21, 2007 (Permalink)

I've posted the notes from Wednesday's class on Effective XML at SD Best Practices. I've been giving this talk for a few years now. It used to be I could assume that XML was the default choice for most applications. However, in 2007 there are some alternatives such as JSON for some use cases, so I added a new "Part O" that explained when competing formats were and were not appropriate; and why you might choose one over the other.

PowerPoint's XML export is truly hideous and incompatible with Firefox, so I've used OpenOffice Impress 2.3 to generate these. I see a number of issues I should file bugs on with OpenOffice, just as soon as they send me a password for the bug tracker.

Thursday, September 20, 2007 (Permalink)

The Mozilla Project has released Firefox 2.0.0.7. This release plugs a QuickTime-related security hole. All users should upgrade.

Wednesday, September 19, 2007 (Permalink)

The OpenOffice.org Annual Conference opens today. I'm pleased to note that the conference presentations are online and will be broadcast online in real time. I wish more conferences would do this.

Tuesday, September 18, 2007 (Permalink)

The W3C GRDDL Working Group has posted the finished recommendation of Gleaning Resource Descriptions from Dialects of Languages (GRDDL). According to the abstract,

GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages. This GRDDL specification introduces markup based on existing standards for declaring that an XML document includes data compatible with the Resource Description Framework (RDF) and for linking to algorithms (typically represented in XSLT), for extracting this data from the document.

The markup includes a namespace-qualified attribute for use in general-purpose XML documents and a profile-qualified link relationship for use in valid XHTML documents. The GRDDL mechanism also allows an XML namespace document (or XHTML profile document) to declare that every document associated with that namespace (or profile) includes gleanable data and for linking to an algorithm for gleaning the data.

The result of such a glean is an RDF description of the document. GRDDL may well be the tipping point that turns the Semantic Web from an academic fantasy to a practical tool. Then again it may not. If this doesn't work, the Semantic Web is dead. If it does work, about all I'm sure of is that the Semantic Web is going to look nothing like anyone imagines it today.

The W3C GRDDL Working Group has also posted the finished recommendation of GRDDL Test Cases. "This document describes and includes test cases for software agents that extract RDF from XML source documents by following the set of mechanisms outlined in the Gleaning Resource Description from Dialects of Language [GRDDL] specification. They demonstrate the expected behavior of a GRDDL-aware agent by specifying one (or more) RDF graph serializations which are the GRDDL results associated with a single source document."
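
The namespace-qualified attribute mentioned in the GRDDL abstract is easy to find with ordinary XML tools. Here is a minimal Python sketch that reads the grddl:transformation attribute from a document's root element and returns the transformation URIs it lists. The sample document and stylesheet URLs are made up for illustration. The sketch stops at discovery; it does not fetch or run the XSLT, and it ignores the XHTML profile and namespace-document mechanisms.

```python
import xml.etree.ElementTree as ET

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"

# Hypothetical source document pointing at two made-up stylesheets.
DOC = """
<doc xmlns:grddl="http://www.w3.org/2003/g/data-view#"
     grddl:transformation="http://example.org/to-rdf.xsl http://example.org/more.xsl">
  <item>example</item>
</doc>
"""


def transformations(xml_text):
    """Return the transformation URIs declared on the root element, if any."""
    root = ET.fromstring(xml_text)
    value = root.get(f"{{{GRDDL_NS}}}transformation", "")
    return value.split()


links = transformations(DOC)
print(links)
```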

Monday, September 17, 2007 (Permalink)

The W3C CSS Working Group has published the first working draft of CSS Grid Positioning Module Level 3. "This module describes integration of grid-based layout (similar to the grids traditionally used in books and newspapers) with CSS sizing and positioning." Proposed properties include grid-columns and grid-rows. If this means I can do sidebars without struggling with position properties, I'm all for it.

Sunday, September 16, 2007 (Permalink)

The W3C Web Services Activity has published the final recommendation of Web Services Addressing 1.0 - Metadata. According to the abstract,

Web Services Addressing provides transport-neutral mechanisms to address Web services and messages. Web Services Addressing 1.0 - Metadata (this document) defines how the abstract properties defined in Web Services Addressing 1.0 - Core are described using WSDL, how to include WSDL metadata in endpoint references, and how WS-Policy can be used to indicate the support of WS-Addressing by a Web service.

Saturday, September 15, 2007 (Permalink)

The XML Apache Project has released Xerces-J 2.9.1, a minor upgrade to the preeminent open source XML parser for Java. This release mostly fixes bugs and improves performance slightly.

Friday, September 14, 2007 (Permalink)

Matt Mullenweg has released WordPress 2.2.3, an open source (GPL) blog engine based on PHP and MySQL. This release fixes security bugs. All users should upgrade.

Wednesday, September 12, 2007 (Permalink)

I've posted the notes from today's JavaZone session on Refactoring HTML. (Yes, I know the HTML here is pretty pathetic by its own standards. This is just a quick PowerPoint export. I'll refactor it when I get a minute. Hmm, doesn't seem to work in Firefox. I can see it in Safari, and Firefox can display the local copy off my hard drive. I may have to refactor sooner rather than later.)

This was the first time I'd publicly presented this material, and I think it went well. We had about 200 people in attendance and most of them stayed for the whole thing. I did have to go through so many points at lightning speed. It would have been nice to be able to give a few more details of exactly how to implement some of these refactorings and some more evidence that they are in fact good ideas; but in 60 minutes you can't do more than hit the high points. Maybe next year I can do a full day class on this somewhere.

Tuesday, September 11, 2007 (Permalink)

The W3C Web Services Activity has published the final recommendations of Web Services Policy 1.5 - Framework and Web Services Policy 1.5 - Attachment. According to the former,

Web Services Policy 1.5 - Framework defines a framework and a model for expressing policies that refer to domain-specific capabilities, requirements, and general characteristics of entities in a Web services-based system.

A policy is a collection of policy alternatives. A policy alternative is a collection of policy assertions. A policy assertion represents a requirement, capability, or other property of a behavior. A policy expression is an XML Infoset representation of its policy, either in a normal form or in its equivalent compact form. Some policy assertions specify traditional requirements and capabilities that will manifest themselves in the messages exchanged (e.g., authentication scheme, transport protocol selection). Other policy assertions have no wire manifestation in the messages exchanged, yet are relevant to service selection and usage (e.g., privacy policy, QoS characteristics). Web Services Policy 1.5 - Framework provides a single policy language to allow both kinds of assertions to be expressed and evaluated in a consistent manner.
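
The data model in the paragraph above (a policy is a collection of alternatives, an alternative is a collection of assertions) maps naturally onto lists of sets. As one illustration, here is a minimal Python sketch of a much simplified compatibility test between a requester's policy and a provider's policy. It treats two alternatives as compatible when they contain the same assertion types, and ignores nested policies, ignorable assertions, and any domain-specific rules. The assertion names are made up.

```python
def compatible(alt_a, alt_b):
    """Simplified rule: same set of assertion types on both sides."""
    return set(alt_a) == set(alt_b)


def intersect(policy_a, policy_b):
    """Return every pair of alternatives, one from each policy, that match."""
    return [(a, b) for a in policy_a for b in policy_b if compatible(a, b)]


# Hypothetical assertion names; a policy is a list of alternatives.
requester = [
    frozenset({"ex:SecureTransport"}),
    frozenset({"ex:SecureMessage", "ex:ReliableDelivery"}),
]
provider = [
    frozenset({"ex:SecureMessage", "ex:ReliableDelivery"}),
]

matches = intersect(requester, provider)
print(len(matches), "compatible alternative pair(s)")
```

If the result is empty, the two parties have no alternative in common under this simplified rule.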

Web Services Policy 1.5 - Framework does not cover discovery of policy, policy scopes and subjects, or their respective attachment mechanisms. A policy attachment is a mechanism for associating policy with one or more policy scopes. A policy scope is a collection of policy subjects to which a policy applies. A policy subject is an entity (e.g., an endpoint, message, resource, interaction) with which a policy can be associated. Web Services Policy 1.5 - Attachment [Web Services Policy Attachment] defines such policy attachment mechanisms, especially for associating policy with arbitrary XML elements [XML 1.0], WSDL artifacts [WSDL 1.1, WSDL 2.0 Core Language], and UDDI elements [UDDI API 2.0, UDDI Data Structure 2.0, UDDI 3.0]. Other specifications are free to define either extensions to the mechanisms defined in Web Services Policy 1.5 - Attachment [Web Services Policy Attachment], or additional mechanisms not covered by Web Services Policy 1.5 - Attachment [Web Services Policy Attachment], for purposes of associating policy with policy scopes and subjects.

Monday, September 10, 2007 (Permalink)

The W3C Voice Browser Working Group has posted the third public working draft of the Speech Synthesis Markup Language Version 1.1. "This document enhances SSML 1.0 [SSML] to provide better support for a broader set of natural (human) languages. To determine in what ways, if any, SSML is limited by its design with respect to supporting languages that are in large commercial or emerging markets for speech synthesis technologies but for which there was limited or no participation by either native speakers or experts during the development of SSML 1.0, the W3C held three workshops on the Internationalization of SSML. The first workshop [WS], in Beijing, PRC, in October 2005, focused primarily on Chinese, Korean, and Japanese languages, and the second [WS2], in Crete, Greece, in May 2006, focused primarily on Arabic, Indian, and Eastern European languages. The third workshop [WS3], in Hyderabad, India, in January 2007, focused heavily on Indian and Middle Eastern languages. Information collected during these workshops was used to develop a requirements document [REQS11]. Changes from SSML 1.0 are motivated by these requirements."

Friday, September 7, 2007 (Permalink)

I'll be in Norway for JavaZone for the next week. Updates here will be a little slow until I get back.

The W3C Multimedia Semantics Incubator Group has published a note on Image Annotation on the Semantic Web. "Many applications that process multimedia assets make use of some form of metadata that describe the multimedia content. The goals of this document are to explain the advantages of using Semantic Web languages and technologies for the creation, storage, manipulation, interchange and processing of image metadata. In addition, it provides guidelines for Semantic Web-based image annotation, illustrated by use cases. Relevant RDF and OWL vocabularies are discussed, along with a short overview of publicly available tools."

Thursday, September 6, 2007 (Permalink)

The W3C Web Services Activity has published a note on Semantic Annotations for WSDL and XML Schema — Usage Guide. "Web services provide a standards-based foundation for exchanging information between distributed software systems. The W3C Recommendation Web Services Description Language (WSDL) specifies a standard way to describe the interfaces of a Web Service at a syntactic level and how to invoke it. While the syntactic descriptions provide information about the structure of input and output messages of an interface and about how to invoke the service, semantics are needed to describe what a Web service actually does. These semantics, when expressed in formal languages, disambiguate the description of Web services interfaces, paving the way for automatic discovery, composition and integration of software components. WSDL does not explicitly provide mechanisms to specify the semantics of a Web service. Semantic Annotations for WSDL and XML Schema (SAWSDL) defines mechanisms by which semantic annotations can be added to WSDL components. This usage guide is an accompanying document to SAWSDL specification. It presents examples illustrating how to associate semantic annotations with a Web service. These annotations could be used for classifying, discovering, matching, composing, and invoking Web services."

Wednesday, September 5, 2007 (Permalink)

Continuing its never-ending quest to prove that it's turtles all the way up, the W3C Semantic Web Activity has updated its note on POWDER: Use Cases and Requirements:

The development of the Protocol for Web Description Resources has been motivated by both commercial and social concerns. On the social side, there is a demand for a system to identify content that meets certain criteria as they apply to specified audiences. Commercially, there is a demand to be able to personalize content for a particular user or delivery context.

POWDER will address these demands by defining a method through which relatively small amounts of metadata, that can be produced quickly and easily, can be applied to large amounts of content.

Tuesday, September 4, 2007 (Permalink)

The W3C XML Schema Working Group has posted the last call working draft of XML Schema 1.1 Part 1: Structures. According to the introduction,

The Working Group has three main goals for this version of W3C XML Schema:

Significant improvements in simplicity of design and clarity of exposition without loss of backward or forward compatibility;

Provision of support for versioning of XML languages defined using this specification, including the XML transfer syntax for schemas itself.

Provision of support for co-occurrence constraints, that is constraints which make the presence of an attribute or element, or the values allowable for it, depend on the value or presence of other attributes or elements.

These goals are in tension with one another. The Working Group's strategic guidelines for changes between versions 1.0 and 1.1 can be summarized as follows:

Support for versioning (acknowledging that this may be slightly disruptive to the XML transfer syntax at the margins)

Support for co-occurrence constraints (which will certainly involve additions to the XML transfer syntax, which will not be understood by 1.0 processors)

Bug fixes (unless in specific cases we decide that the fix is too disruptive for a point release)

Editorial changes

Design cleanup will possibly change behavior in edge cases

Non-disruptive changes to type hierarchy (to better support current and forthcoming international standards and W3C recommendations)

Design cleanup will possibly change component structure (changes to functionality restricted to edge cases)

No significant changes in existing functionality

No changes to XML transfer syntax except those required by version control hooks, co-occurrence constraints and bug fixes

The aim with regard to compatibility is that

All schema documents conformant to version 1.0 of this specification should also conform to version 1.1, and should have the same validation behavior across 1.0 and 1.1 implementations (except possibly in edge cases and in the details of the resulting PSVI);

The vast majority of schema documents conformant to version 1.1 of this specification should also conform to version 1.0, leaving aside any incompatibilities arising from support for versioning or co-occurrence constraints, and when they are conformant to version 1.0 (or are made conformant by the removal of versioning information), should have the same validation behavior across 1.0 and 1.1 implementations (again except possibly in edge cases and in the details of the resulting PSVI);

Comments are due by November 8.
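
To make the co-occurrence goal concrete, here is a minimal Python sketch (not XML Schema 1.1 syntax) of the kind of rule the draft is talking about: whether one attribute is allowed depends on the value of another. The payment element, its method and cardNumber attributes, and the check_payment helper are all invented for illustration.

```python
import xml.etree.ElementTree as ET

def check_payment(elem):
    """Return co-occurrence violations for a hypothetical <payment> element.

    Made-up rule: cardNumber is required when method="card"
    and forbidden for any other method.
    """
    errors = []
    method = elem.get("method")
    has_card = "cardNumber" in elem.attrib
    if method == "card" and not has_card:
        errors.append("cardNumber is required when method is 'card'")
    if method != "card" and has_card:
        errors.append("cardNumber is only allowed when method is 'card'")
    return errors

ok = ET.fromstring('<payment method="card" cardNumber="4111"/>')
bad = ET.fromstring('<payment method="cash" cardNumber="4111"/>')
print(check_payment(ok))
print(check_payment(bad))
```

A schema language with co-occurrence constraints would let this rule live in the schema itself rather than in hand-written application code like the above.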

Monday, September 3, 2007 (Permalink)

The W3C XQuery working group has posted the last call working drafts of XQuery Update Facility, XQuery Update Facility Use Cases, and XQuery Update Facility 1.0 Requirements. XQuery as it currently exists is basically just SELECT in SQL terms. XQuery Update adds INSERT, UPDATE, and DELETE. More specifically, it defines:

upd:mergeUpdates
upd:revalidate
upd:applyUpdates
upd:removeType
upd:setToUntyped
upd:insertBefore
upd:insertAfter
upd:insertInto
upd:insertIntoAsFirst
upd:insertIntoAsLast
upd:insertAttributes
upd:delete
upd:replaceNode
upd:replaceValue
upd:replaceElementContent
upd:rename

Comments are due by October 31.
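
The SQL analogy can be illustrated with ordinary tree edits. The sketch below is Python using the standard library's ElementTree, not XQuery Update syntax; it only shows rough analogues of querying, inserting, replacing a value, renaming, and deleting. The sample document is made up.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<books><book><title>Old</title></book><draft/></books>")

# "SELECT": a read-only query, roughly what XQuery 1.0 already covers
titles = [t.text for t in doc.iter("title")]

# Rough analogue of inserting a node as the last child
book = doc.find("book")
ET.SubElement(book, "year").text = "2007"

# Rough analogue of replacing a value
book.find("title").text = "New"

# Rough analogue of renaming an element
book.find("year").tag = "published"

# Rough analogue of deleting a node
doc.remove(doc.find("draft"))

result = ET.tostring(doc, encoding="unicode")
print(titles)
print(result)
```

Unlike this sketch, which mutates the tree immediately at each step, the draft's primitives such as upd:mergeUpdates and upd:applyUpdates suggest that updates are collected first and applied together.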

Version 1.1.0 of XQilla, an open source XQuery 1.0 and XPath 2.0 library and command line utility written in C++, has been released. XQilla is implemented on top of Xerces-C++ and derives from Pathan. Version 1.1 adds support for XQuery Update (Last Call Working Draft 28 August 2007). It is dual licensed under the Sleepycat licence and a BSD style licence.

The Saarland University Database Group has released GCX, an open source XQuery implementation written in cross-platform C++ and released under the BSD license. GCX is "designed for memory-efficient XQuery evaluation against large XML documents. The prototype supports a powerful fragment of the XQuery language, with nested for-expressions, child- and descendant axes, and joins."

Sunday, September 2, 2007 (Permalink)

The Apache XML Project has released version 2.8.0 of Xerces-C++, an open source schema validating XML parser written in reasonably cross-platform C++. Version 2.8.0 is mostly a bug fix and optimization release.

Saturday, September 1, 2007 (Permalink)

I've made some adjustments in the mod_rewrite filters on Cafe con Leche. The immediate reason is to allow me to serve .phtml files instead of .html files, and the reason for that is to improve the recommended reading by using del.icio.us as my backend, instead of the homegrown scripts and text files I've been using for the last 10+ years. If nothing goes wrong, the changeover should be transparent. However something always goes wrong when I play with mod_rewrite. Please do let me know if you notice any problems, like a page stuck in an infinite reload loop, or a page that can't be accessed.

Friday, August 31, 2007 (Permalink)

The DocBook Project has released version 1.73.2 of the DocBook XSL stylesheets. According to Michael Smith, "This is solely a minor bug-fix update to the 1.73.1 release. It fixes a packaging error in the 1.73.1 package, as well as a bug in footnote handling in FO output."

Wednesday, August 29, 2007 (Permalink)

Microsoft seems to be stacking the deck in standards organizations in favor of Office Open XML, starting in Sweden. 23 mostly minor Microsoft-affiliated companies joined the Swedish Standards Institute at the last minute and 22 of them voted in favor of Office Open XML standardization. I only wonder why this didn't happen sooner or why it doesn't happen more often. I guess most companies just don't care all that much about standards most of the time. Microsoft usually just ignores standards it doesn't like or doesn't understand (consider CSS and HTML); but this one may actually affect them in the pocketbook since governments are starting to require open formats before signing purchase contracts.

Planamesa Software has released NeoOffice/J 2.2.1, a Mac port of OpenOffice 2.1 using a Java-based GUI. This release adds support for the Mac OS X Spellchecker and Address Book and can now open and save Office 2007 Excel and PowerPoint files (though likely with a few glitches).

Tuesday, August 28, 2007 (Permalink)

The W3C Web API Working Group has published the first working draft of ElementTraversal Specification. "This specification defines the ElementTraversal interface, which allows script navigation of the elements of a DOM tree, excluding all other nodes in the DOM, such as text nodes. It also provides a property to expose the number of child elements of an element. It is intended to provide a more convenient alternative to existing DOM navigation interfaces, with a low implementation footprint." Hmm, just what the DOM needs: yet another way to do it.

ElementTraversal provides some extra properties/methods for navigating only through elements, while ignoring text and white space:

firstElementChild
lastElementChild
previousElementSibling
nextElementSibling
childElementCount

This makes it easier to process record-like XML, but inappropriate for reading documents with mixed content.
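
To see what element-only navigation buys you, here is a rough Python sketch on top of the standard library's xml.dom.minidom. The helper functions are my own approximations of three of the properties above, not part of any DOM binding, and the sample document is made up.

```python
from xml.dom import minidom, Node

def element_children(node):
    """Child nodes that are elements, skipping text, comments, etc."""
    return [c for c in node.childNodes if c.nodeType == Node.ELEMENT_NODE]

def first_element_child(node):
    """Approximation of firstElementChild."""
    kids = element_children(node)
    return kids[0] if kids else None

def next_element_sibling(node):
    """Approximation of nextElementSibling."""
    sib = node.nextSibling
    while sib is not None and sib.nodeType != Node.ELEMENT_NODE:
        sib = sib.nextSibling
    return sib

doc = minidom.parseString("<row>\n  <name>Ann</name>\n  <age>30</age>\n</row>")
row = doc.documentElement

print(len(row.childNodes))           # all children, whitespace text nodes included
print(len(element_children(row)))    # approximation of childElementCount
print(first_element_child(row).tagName)
print(next_element_sibling(first_element_child(row)).tagName)
```

The plain childNodes count includes the whitespace-only text nodes between the elements, which is exactly the noise these properties are meant to skip.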

Leaving aside the issue of whether ElementTraversal is a reasonably designed API considered in isolation, I think it's time to wake up and realize that the DOM has been a massive hobble on XML for years now, and it's time to abandon it. Everything we add to it is just putting lipstick on a horse. Lately I've been realizing that outside the XML community, the uptake in JSON, YAML, and competitive formats isn't as much a reaction to XML as it is a reaction to DOM. Most developers don't distinguish between DOM and XML, especially in the JavaScript community. DOM disgusts them (which only proves they have good taste). They then proceed to throw out the XML baby with the DOM bathwater.

The whole enterprise of a cross-language API was doomed from the start. Browsers should move to E4X, and sooner rather than later. Standalone programs should move to language specific APIs such as XOM and Amara. Let's deprecate DOM, recognize it for the mistake it was, and get on with our lives and our work. There are better alternatives out there. Let's use them.

Monday, August 27, 2007 (Permalink)

The W3C Ubiquitous Web Application Working Group has published the candidate recommendation of Delivery Context: XPath Access Functions 1.0. "This document specifies a set of XPath functions that can be used to manipulate the Delivery Context associated with a request for an item of content. These functions have been designed to satisfy the requirements to adapt content based on the Delivery Context. While designed to work with Device Independent Content Selection [DISelect] it can be used in any XPath processor." Functions include:

dcn:delivery-context()
dcn:getProperty()
dcn:setProperty()
dcn:search()
dcn:cssmq-width()
dcn:cssmq-height()
dcn:cssmq-device-width()
dcn:cssmq-device-height()
dcn:cssmq-device-aspect-ratio()
dcn:cssmq-device-aspect-ratio-width()
dcn:cssmq-device-aspect-ratio-height()
dcn:cssmq-color()
dcn:cssmq-color-index()
dcn:cssmq-monochrome()
dcn:cssmq-resolution()
dcn:cssmq-scan()
dcn:cssmq-grid()

Saturday, August 25, 2007 (Permalink)

The XML Apache Project has posted version 0.94 of FOP, an open source XSL Formatting Objects to PDF/PostScript/RTF converter written in Java. This release can auto-detect installed fonts, supports the collapsing-border model in tables, improves internal links, and features more Unicode-savvy line breaking.

FOP's improving, and it's good enough for simple prototyping and experimenting. However it's still not close to ready for most serious production needs. There are just too many missing pieces. The most important ones for my needs are automatic table layout and floating images. Hopefully these will be improved in future releases in the push to 1.0.

Friday, August 24, 2007 (Permalink)

Daniel Veillard has released version 2.6.30 of libxml2, the open source XML C library for Gnome. This release fixes assorted bugs and porting issues.

Thursday, August 23, 2007 (Permalink)

Sun has posted version 0.5.2 of xmlroff, an open source XSL Formatting Objects to PDF and PostScript converter. xmlroff is written in C for Linux, and relies on libxml2, libxslt, and the GLib and GObject libraries from GTK+ and GNOME (though neither GTK+ nor GNOME itself is required). It also needs PDFlib, FreeType2, and Fontconfig. xmlroff can be run from the command line. It also includes a libfo library. This version fixes bugs and plugs memory leaks.

Wednesday, August 22, 2007 (Permalink)

The DocBook Project has released version 1.73.1 of the DocBook XSL stylesheets. According to Michael Smith, "This is mostly a bug-fix update to the 1.73.0 release."

Tuesday, August 21, 2007 (Permalink)

The Mozilla Project has released Camino 1.5.1, an open source Mac OS X web browser based on the Gecko 1.8 rendering engine and the Quartz GUI toolkit. It supports pretty much all the technologies that Mozilla does: HTML, XHTML, CSS, XML, XSLT, etc. 1.5 adds spell-checking, feed detection, session restore, Keychain sharing with Safari, and enhanced security for cookies, Flash, and plug-ins. Mac OS X 10.3 or later is required. Version 1.5.1 adds version 1.8.1.6 of the Mozilla Gecko rendering engine, improves ad blocking, and fixes some critical security issues. All users should upgrade.

Monday, August 20, 2007 (Permalink)

The W3C Device Independence Working Group has posted the candidate recommendation of Content Selection for Device Independence (DISelect) 1.0. According to the abstract, "This document specifies a syntax and processing model for general purpose content selection or filtering. Selection involves conditional processing of various parts of an XML information set according to the results of the evaluation of expressions. Using this mechanism some parts of the information set can be selected for further processing and others can be suppressed. The specification of the parts of the infoset affected and the expressions that govern processing is by means of XML-friendly syntax. This includes elements, attributes and XPath expressions. This document specifies how these components work together to provide general purpose selection."

That sounds unobjectionable, but what the working group is really proposing is XML markup that can be added to a page to indicate which devices certain content is appropriate for. For example, this sel:expr attribute says that the image should only be displayed if the user's device supports color and has a window wider than 500 pixels.

<div sel:expr="dc:cssmq-width('px') &gt; 500
    and dc:cssmq-color() &gt; 0">
  <object src="picture.png"/>
</div>

This feels more than a little like presentation-based markup. This is very much like using JavaScript or server-side programs to identify different browsers and send them content tailored specifically to them. This syntax is definitely easier to use and more powerful than the various JavaScript and server-side hacks people use today; but should we be doing this at all? Whatever happened to the vision of sending browsers XML documents with appropriate stylesheets and letting the client decide how best to present them? The thing that bothers me the most about this proposal is that the syntax mixes the presentation information straight into the document, rather than linking to it from a separate hints sheet. In many ways, this document seems to reflect a belief that the W3C has been going down the wrong road for the last eight years in attempting to separate content from presentation.

Sunday, August 19, 2007 (Permalink)

The Modis Team has released Sedna 2.1, an open source native XML database for Windows and Linux written in C++ and Scheme and published under the Apache License 2.0. Sedna supports XQuery and its own declarative update language. This release fixes bugs and improves performance.

Saturday, August 18, 2007 (Permalink)

The W3C Web Services Activity has published new drafts of Web Services Policy 1.5 - Guidelines for Policy Assertion Authors and Web Services Policy 1.5 - Attachment. According to the latter,

Web services are being successfully used for interoperable solutions across various industries. One of the key reasons for interest and investment in Web services is that they are well-suited to enable service-oriented systems. XML-based technologies such as SOAP, XML Schema and WSDL provide a broadly-adopted foundation on which to build interoperable Web services. The WS-Policy and WS-PolicyAttachment specifications extend this foundation and offer mechanisms to represent the capabilities and requirements of Web services as Policies.

Service metadata is an expression of the visible aspects of a Web service, and consists of a mixture of machine- and human-readable languages. Machine-readable languages enable tooling. For example, tools that consume service metadata can automatically generate client code to call the service. Service metadata can describe different parts of a Web service and thus enable different levels of tooling support.

First, service metadata can describe the format of the payloads that a Web service sends and receives. Tools can use this metadata to automatically generate and validate data sent to and from a Web service. The XML Schema language is frequently used to describe the message interchange format within the SOAP message construct, i.e. to represent SOAP Body children and SOAP Header blocks.

Second, service metadata can describe the ‘how’ and ‘where’ a Web service exchanges messages, i.e. how to represent the concrete message format, what headers are used, the transmission protocol, the message exchange pattern and the list of available endpoints. The Web Services Description Language is currently the most common language for describing the ‘how’ and ‘where’ a Web service exchanges messages. WSDL has extensibility points that can be used to expand on the metadata for a Web service.

Third, service metadata can describe the capabilities and requirements of a Web service, i.e. representing whether and how a message must be secured, whether and how a message must be delivered reliably, whether a message must flow a transaction, etc. Exposing this class of metadata about the capabilities and requirements of a Web service enables tools to generate code modules for engaging these behaviors. Tools can use this metadata to check the compatibility of requesters and providers. Web Services Policy can be used to represent the capabilities and requirements of a Web service.

Web Services Policy is a machine-readable language for representing the capabilities and requirements of a Web service. These are called ‘policies’. Web Services Policy offers mechanisms to represent consistent combinations of capabilities and requirements, to determine the compatibility of policies, to name and reference policies and to associate policies with Web service metadata constructs such as service, endpoint and operation. Web Services Policy is a simple language that has four elements - Policy, All, ExactlyOne and PolicyReference - and one attribute - wsp:Optional.

Friday, August 17, 2007 (Permalink)

Opera Software has released version 9.2.3 of their namesake free-beer web browser for Windows, Mac, Linux, FreeBSD, and Solaris. This release fixes some JavaScript security and crashing bugs found with fuzz testing (about which I'll have more to say at SD Best Practices next month). All users should upgrade.

Recordare has released Dolet for Finale 4.0:

a plug-in for the Finale music notation program that reads and writes MusicXML 2.0 files on Windows and Mac OS X. With Dolet software, you can finally read files created in Finale 2008 with Finale 2006. Read music from Sibelius into Finale by using Dolet for Finale together with our Dolet 3 for Sibelius plug-in. MusicXML files created with Dolet 4 for Finale usually import into Sibelius 4 and 5 better than Finale ETF files do.

Even if you're using Finale 2008, Dolet 4 has three big advantages:

It can save files as MusicXML 2.0 files, as well as MusicXML 1.1 and 1.0 files. This lets you take advantage of MusicXML 2.0's improved features like compressed files, improved formatting control, and graphics support.

It allows you to translate an entire folder of Finale or MusicXML files at one time - an enormous time savings when you have to move a lot of files from one program to another.

The Dolet for Finale plug-in is updated much more often than Finale. Finale releases typically have one or two maintenance updates. In our previous release, Dolet 3 for Finale had nine updates which kept making file translations even more accurate.

Do you want to read files from scanners like SharpEye Music Reader and capella-scan? Exchange files with Sibelius users? Create files for use in digital sheet music players like musicRAIN and MuseBook Score? Dolet 4 for Finale lets you exchange music between applications more accurately than ever before.

Dolet 4.0 is $149.95. Upgrades from 3.0 are $99.95.

Thursday, August 16, 2007 (Permalink)

Max Berger has released JEuclid 3.0, an open source MathML rendering solution:

JEuclid 3.0 consists of:

An MathML viewer application

Command line converters from MathML to other image formats, including JPEG, BMP, WBMP, GIF, SVG, EMF, PDF, PS, SWF

An ant task for automated conversion

Java display components for AWT and Swing

A FO preprocessor application to support MathML in xsl-fo renderers

Two plugins are part of JEuclid:

A XXE Plugin provides MathML support for XMLMind XML-Editor (http://www.xmlmind.com/ ) and is available through the XXE update site.

A FOP plugin provides MathML support in FOP (http://xmlgraphics.apache.org/fop/). It will be released shortly after the fop 0.94 release.

Google is offering free downloads of Sun's StarOffice, the usually $70 payware derivative of OpenOffice for Windows XP and later. This release adds a Google-based web search feature. StarOffice uses the same OpenDocument XML file formats that OpenOffice has popularized.

Wednesday, August 15, 2007 (Permalink)

Brett Zamir has released XqUSEme, a Firefox extension based on Berkeley DB XML that "can perform XQueries on the current webpage under view (including even many malformed pages whose Firefox DOM representation was able to clean up), under external websites which conform to XML, or to other files stored in the database. One unfortunate catch is that due to an apparent bug in Firefox with its LiveConnect/Java features, try-catch exceptions do not always work, so, one malformed XQuery, and you have to restart Firefox to try again... :("

Monday, August 13, 2007 (Permalink)

The W3C Cascading Stylesheets Working Group has published working drafts of CSS Advanced Layout Module and CSS basic box model. According to the latter,

When textual documents (e.g., HTML) are laid out on visual media (e.g., screen or print), CSS models the document as a hierarchy of boxes containing words, lines, paragraphs, tables, etc. each with properties such as size, color and font.

This module describes the basic types of boxes, with their padding and margin, and the normal “flow” (i.e., the sequence of blocks of text with margins in-between). It also defines “floating” boxes, but other kinds of layout, such as tables, absolute positioning, ruby annotations, grid layouts, columns and numbered pages, are described by other modules. Also, the layout of text inside each line (including the handling of left-to-right and right-to-left scripts) is defined elsewhere.

Boxes may contain either horizontal or vertical lines of text. Boxes of different orientations may be mixed in one flow. (This is a level 3 feature.)

The advanced layout module:

contains features to describe layouts at a high level, meant for tasks such as the positioning and alignment of “widgets” in a graphical user interface or the layout grid for a page or a window, in particular when the desired visual order is different from the order of the elements in the source document. Other CSS3 modules contain properties to specify fonts, colors, text alignment, list numbering, tables, etc.

Saturday, August 11, 2007 (Permalink)

The Mozilla Project has released SeaMonkey 1.1.4. SeaMonkey is the continuation of the integrated Mozilla suite, and has XML support roughly equivalent to Firefox 1.5 (e.g. XML, XSLT, CSS, XHTML, etc.). It also bundles an e-mail client, web editor, browser, and more into one application. This release fixes security bugs. All users should upgrade.

Friday, August 10, 2007 (Permalink)

Antenna House, Inc. has released XSL Formatter 4.2 for Mac, Linux, and Windows. This tool converts XSL-FO files to PDF. Version 4.2 adds support for Unicode 5.0, surrogate pairs, pair kerning and ligatures for European languages, JIS X 0213:2004, Devanagari, PDF 1.7, PDF/A, and PDF Forms.

The "lite" version costs $300 and up, but is limited to 300 pages per document and doesn't support right-to-left languages. Prices for the uncrippled version start around $1250. Support costs more.

Thursday, August 9, 2007 (Permalink)

Roman Fordinal has posted docbook2odf 0.244, a set of XSLT stylesheets for converting DocBook to the OASIS Open Document Format. This release adds support for subscript, superscript, command, guimenu, guilabel, guibutton, keycap, accel, variablelist, varlistentry, term, sidebar, bibliography, and biblioentry. docbook2odf is released under the LGPL.

Wednesday, August 8, 2007 (Permalink)

IBM developerWorks has published my latest article: New elements in HTML 5: Structure and semantics. This article introduces many elements being proposed for the next release of HTML such as aside, figure, section, time, meter, progress, video, audio, details, datagrid, and command. Doubtless many details will change. More will be added and a few will not make the final cut. However, bleeding edge browsers like Opera are starting to support some of this now.

Tuesday, August 7, 2007 (Permalink)

Bare Bones Software has released version 8.7 of BBEdit, my preferred text editor on the Mac, and what I'm using to type these very words. This release adds code folding and Ruby, YAML, and SQL syntax coloring and function navigation. It also addresses at least one of my major complaints about earlier versions (confusing, hard-to-navigate preferences) and may offer a workaround for recently buggy multifile search as well. I'll need to play with this release a little to be sure. BBEdit is $199 payware. Upgrades from 8.5 and 8.6 are free. Upgrades from 8.0-8.2 cost $30 and upgrades from 7.x cost $40. Mac OS X 10.4 or later is required.

Monday, August 6, 2007 (Permalink)

The W3C Web Services activity has posted the proposed recommendation of Web Services Addressing 1.0 - Metadata. "Web Services Addressing provides transport-neutral mechanisms to address Web services and messages. Web Services Addressing 1.0 - Metadata (this document) defines how the abstract properties defined in Web Services Addressing 1.0 - Core are described using WSDL, how to include WSDL metadata in endpoint references, and how WS-Policy can be used to indicate the support of WS-Addressing by a Web service."

The W3C Web Services Policy Working Group has posted a new working draft of WSDL 1.1 Element Identifiers. This document "defines a fragment identifier syntax for identifying elements of a WSDL 1.1 document. This fragment identifier syntax is compliant with the [XPointer Framework]. This document is primarily based upon [WSDL 2.0 Core]. There is a substantial difference between the WSDL 1.1 and WSDL 2.0 fragment identifiers. WSDL 2.0 defines fragment identifiers with respect to the WSDL 2.0 component model, whereas WSDL 1.1 defines XML element and attribute syntax only. Because there is no WSDL 1.1 component model, the WSDL 1.1 fragment identifiers identify WSDL 1.1 elements."

Sunday, August 5, 2007 (Permalink)

The Helsinki University of Technology has released X-Smiles 1.0, a proof-of-concept XForms engine written in Java.

Wednesday, August 1, 2007 (Permalink)

Apple has posted a beta of Safari 3.0.3 for Windows and the Mac. Safari supports XML, XSLT, CSS, XHTML, and RSS. Also Safari 3.x can drive XSLT from JavaScript, which 2.x could not do. Mac OS X 10.4 or Windows XP or later is required. 3.0.3 fixes some security issues. All 3.x users should upgrade.

Microsoft has posted version 0.2 of the Office Open XML File Format Converter for Mac, a tool that converts Office Open XML files to a format that is compatible with Microsoft Office 2004 for Mac and Microsoft Office v. X for Mac. Version 0.2:

improves conversion of Word documents that contain XML content, inline graphics, hyperlinked graphics, WMF/EMF graphics, SmartArt graphics, tracked changes in the document header and footer, Unicode characters, and Japanese Rubi fields. In addition, this version succeeds when converting Word documents that contain bibliography fields, citation fields, and complex tables.

This version of the converter can convert the following Office Open XML file formats:

Word Document (*.docx)

Word Macro-Enabled Document (*.docm)

PowerPoint Presentation (*.pptx)

PowerPoint Show (*.ppsx)

PowerPoint Template (*.potx)

Tuesday, July 31, 2007 (Permalink)

The Mozilla Project has released Firefox 2.0.0.6. This release plugs several security holes. All users should upgrade.

Monday, July 30, 2007 (Permalink)

Andrew Welch has released Kernow 1.5.1, a cross-platform, open source graphical front end for Saxon written in Java. According to Welch, "Everything you would normally have to type into the command line is available through the mouse, with some extra features thrown in. If you have Schema Aware Saxon it will run that too." This release adds a "very high level API for running transforms, such as to run a directory transform with a compiled stylesheet and caching resolvers".

Friday, July 27, 2007 (Permalink)

The W3C Device Independence Working Group has posted the candidate recommendation of Content Selection Primer 1.0. According to the abstract,

This document specifies a syntax and processing model for general purpose content selection or filtering. Selection involves conditional processing of various parts of an XML information set according to the results of the evaluation of expressions. Using this mechanism some parts of the information set can be selected for further processing and others can be suppressed. The specification of the parts of the infoset affected and the expressions that govern processing is by means of XML-friendly syntax. This includes elements, attributes and XPath expressions. This document specifies how these components work together to provide general purpose selection.

Here's an example from the primer:

<sel:select>
  <sel:when expr="eg:getStyleSheetSupport() = 'excellent'">
    <link rel="stylesheet" type="text/css" href="../styles/sensational.css"/>
  </sel:when>
  <sel:when expr="eg:getStyleSheetSupport() = 'basic'">
    <link rel="stylesheet" type="text/css" href="../styles/mediocre.css"/>
  </sel:when>
</sel:select>

The basic idea is that markup inline in the document will indicate the device classes for which given content is appropriate. Such hints may be necessary and useful, but I'm still skeptical that it really has to be inline. I don't see why it couldn't be part of some kind of external style sheet, much as CSS rules can be attached through external style sheets. I still think this mixes markup and presentation.
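
For what it's worth, here is a rough Python sketch of what a processor has to do with the primer's example. It assumes first-match semantics for sel:select, uses a placeholder namespace URI rather than the real DISelect one, and replaces XPath evaluation with a toy matcher that understands only the one expression form used above. The function names are mine.

```python
import re
import xml.etree.ElementTree as ET

SEL = "urn:example:sel"  # placeholder namespace, NOT the real DISelect URI

SOURCE = f"""
<head xmlns:sel="{SEL}">
  <sel:select>
    <sel:when expr="eg:getStyleSheetSupport() = 'excellent'">
      <link rel="stylesheet" type="text/css" href="../styles/sensational.css"/>
    </sel:when>
    <sel:when expr="eg:getStyleSheetSupport() = 'basic'">
      <link rel="stylesheet" type="text/css" href="../styles/mediocre.css"/>
    </sel:when>
  </sel:select>
</head>
"""

def matches(expr, support):
    """Toy evaluator: understands only eg:getStyleSheetSupport() = '<value>'."""
    m = re.fullmatch(r"eg:getStyleSheetSupport\(\)\s*=\s*'([^']*)'", expr.strip())
    return m is not None and m.group(1) == support

def select_content(root, support):
    """Replace each sel:select with the children of its first matching sel:when."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != f"{{{SEL}}}select":
                continue
            position = list(parent).index(child)
            chosen = []
            for when in child.findall(f"{{{SEL}}}when"):
                if matches(when.get("expr", ""), support):
                    chosen = list(when)
                    break
            parent.remove(child)
            for offset, kept in enumerate(chosen):
                parent.insert(position + offset, kept)
    return root

def stylesheet_for(support):
    """Return the stylesheet hrefs that survive selection for a device class."""
    root = select_content(ET.fromstring(SOURCE), support)
    return [link.get("href") for link in root.iter("link")]

print(stylesheet_for("basic"))
print(stylesheet_for("none"))
```

Nothing in this processing step depends on the conditions being inline; the same matcher could just as easily be driven from a separate file that points into the document.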

Thursday, July 26, 2007 (Permalink)

Syntext has released Xsl-Status 1.2.0, an open source progress tracking tool for XSLT stylesheet developers. Xsl-Status tracks

the support of XML Schema elements in your XSLT stylesheet
the support status of a particular element: development, testing, finished
which template supports a particular XML element

Xsl-Status is written in Python and published under the Apache 2.0 license.

Wednesday, July 25, 2007 (Permalink)

The W3C has published a proposed edited recommendation of XForms 1.0 (Third Edition). The most significant change is the addition of a section on "Interpretation of same-document references":

The list of errata and a diff-marked version relative to XForms 1.0 Second Edition are available. There are two corrections that affect schema conformance. Erratum E9 adds id to the list of common attributes. This aligns with implementations, which must all add the ID attribute to the XForms schema in order to use the many features of XForms that rely on IDREF referencing to specify related elements, such as a submission or repeat element and a send or setindex action. Erratum 32f adds switch to the content model of repeat. Prior versions of XForms 1.0 have indicated that this content model change would appear in a future version based on implementation experience, which has now occurred. So this correction was also added to align the specification with implementations.

Tuesday, July 24, 2007 (Permalink)

The Mozilla Project has released SeaMonkey 1.1.3. SeaMonkey is the continuation of the integrated Mozilla suite, and has XML support roughly equivalent to Firefox 1.5 (e.g. XML, XSLT, CSS, XHTML, etc.). It also bundles an e-mail client, web editor, browser, and more into one application. This release fixes security bugs. All users should upgrade.

Monday, July 23, 2007 (Permalink)

The DocBook Project has released version 1.73 of the DocBook XSL stylesheets. New features in this release include:

Latvian and Esperanto localizations
ISO690 citation style for bibliography output.
Documentation for processing instructions that you can use to control output from the stylesheets.
Profiling parameters for audience and wordsize
Improved man-page output
C syntax highlighting support
An rsync-based, experimental docbook-xsl-update script

Saturday, July 21, 2007 (Permalink)

The W3C Compound Document Formats Working Group has published the candidate recommendation of Compound Document by Reference Framework 1.0.

Combining content delivery formats can often be desirable in order to provide a seamless experience for the user.
For example, XHTML-formatted content can be augmented by SVG objects, to create a more dynamic, interactive and self adjusting presentation. A set of standard rules is required in order to provide this capability across a range of user agents and devices.
These are examples of possible Compound Document profiles:
XHTML + SVG + MathML
XHTML + SMIL
XHTML + XForms
XHTML + VoiceML
This document defines a generic Compound Document by Reference Framework (CDRF) that defines a language-independent processing model for combining arbitrary document formats.
NOTE: The Compound Document Framework is language-independent. While it is clearly meant to serve as the basis for integrating W3C's family of XML formats within its Interaction Domain (e.g., CSS, MathML, SMIL, SVG, VoiceXML, XForms, XHTML, XSL) with each other, it can also be used to integrate non-W3C formats with W3C formats or integrate non-W3C formats with other non-W3C formats.

Friday, July 20, 2007 (Permalink)

The W3C CSS Working Group has posted the candidate recommendation of Cascading Style Sheets, level 2 revision 1. According to the abstract,

CSS 2.1 builds on CSS2 [CSS2] which builds on CSS1 [CSS1]. It supports media-specific style sheets so that authors may tailor the presentation of their documents to visual browsers, aural devices, printers, braille devices, handheld devices, etc. It also supports content positioning, table layout, features for internationalization and some properties related to user interface.

CSS 2.1 corrects a few errors in CSS2 (the most important being a new definition of the height/width of absolutely positioned elements, more influence for HTML's "style" attribute and a new calculation of the 'clip' property), and adds a few highly requested features which have already been widely implemented. But most of all CSS 2.1 represents a "snapshot" of CSS usage: it consists of all CSS features that are implemented interoperably at the date of publication of the Recommendation.

CSS 2.1 is derived from and is intended to replace CSS2. Some parts of CSS2 are unchanged in CSS 2.1, some parts have been altered, and some parts removed. The removed portions may be used in a future CSS3 specification. Future specs should refer to CSS 2.1 (unless they need features from CSS2 which have been dropped in CSS 2.1, and then they should only reference CSS2 for those features, or preferably reference such feature(s) in the respective CSS3 Module that includes those feature(s)).

Significant changes include:

New color value: 'orange'
New 'display' value: 'inline-block'
New 'content' values 'none' and 'normal'.
New 'white-space' values: 'pre-wrap' and 'pre-line'
New 'cursor' value: 'progress'
Redefined "computed value" and created the concept of "used value" so that inheritance can be performed without laying out the document. This change has the effect of allowing (requiring) percentages to be inherited as percentages and affects many other layout calculations throughout the spec.
Many margin calculation improvements; the position property now applies to all elements, including generated content, floats are no longer required to have an explicit width, and many other changes to the layout algorithms.
The list styles 'hebrew', 'armenian', 'georgian', 'cjk-ideographs', 'hiragana', 'katakana', 'hiragana-iroha' and 'katakana-iroha' have been removed.
Aural style sheets are not supported.

While in isolation this seems like a good idea, I'm concerned that this is just going to cause even more confusion for web authors and browser vendors alike. Layout and positioning are hard enough in CSS already. Browser vendors are just starting to implement this in a more-or-less (mostly less) interoperable fashion. Do we really need to add yet another subtly incompatible set of rules that will be implemented by some versions of some browsers and not others, and that can only confuse matters further? Will any browser actually support this, or will it just break existing pages that rely on the current algorithms? What does the Acid2 test look like in a CSS 2.1 compliant browser?

Thursday, July 19, 2007 (Permalink)

The Mozilla Project has released Firefox 2.0.0.5. This release plugs several security holes. All users should upgrade.

Wednesday, July 18, 2007 (Permalink)

The W3C has published the first public working draft of Efficient XML Interchange (EXI) Format 1.0. "EXI is a very compact representation for the eXtensible Markup Language (XML) Information Set that is intended to simultaneously optimize performance and the utilization of computational resources. The EXI format uses a hybrid approach drawn from the information and formal language theories, plus practical techniques verified by measurements, for entropy encoding XML information. Using a relatively simple algorithm, which is amenable to fast and compact implementation, and a small set of data types, it reliably produces efficient encodings of XML event streams. The event production system and format definition of EXI are presented."

I'm feeling a little verklempt. Talk amongst yourselves. I'll give you a topic. The Efficient XML Interchange Format is neither efficient nor XML nor interchangeable. Discuss.

OK, I'm better now. I'm not sure what's worse: the incredible opaqueness of the format or the fact that EXI really, truly is not a representation of the XML infoset. Opaqueness expected, but the latter surprised me. EXI introduces data types such as Binary, Boolean, Decimal, Float, Integer, Unsigned Integer, and Date-Time. XML does not have data types, and that's a feature, not a bug. XML does not presume to tell any reader how it must interpret any particular string of text it may find in a document. EXI does. (So does JSON, by the way. EXI and JSON are both grounded in the same mistaken belief that data types are interoperable across domains.)
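To make the data type point concrete, here is a minimal Python sketch. The element name and value are invented for illustration. The parser hands back plain text, and whether that text is a string, a binary float, or an exact decimal is the reader's decision, not the document's.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical document; "price" is an invented element name.
doc = ET.fromstring("<price>1.10</price>")

raw = doc.text               # the parser only ever gives us text
as_text = raw                # one reader keeps the lexical form as is
as_float = float(raw)        # another wants binary floating point
as_decimal = Decimal(raw)    # a third wants exact decimal arithmetic

print(repr(as_text), as_float, as_decimal)
```

Three readers, three interpretations, one unchanged document. A format that fixes the type at encoding time makes that choice for all of them.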

Whatever the EXI format is, it's not XML, and using the name "XML" to describe it is an attempt by people who want something very different from XML to trade on XML's good name. If EXI were really a good idea, it could succeed on its own merits without pretending to be something it's not. I guess the working group members don't really believe in it though.

Tuesday, July 17, 2007 (Permalink)

The W3C GRDDL Working Group has posted the proposed recommendation of Gleaning Resource Descriptions from Dialects of Languages (GRDDL). According to the abstract,

GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages. This GRDDL specification introduces markup based on existing standards for declaring that an XML document includes data compatible with the Resource Description Framework (RDF) and for linking to algorithms (typically represented in XSLT), for extracting this data from the document.

The markup includes a namespace-qualified attribute for use in general-purpose XML documents and a profile-qualified link relationship for use in valid XHTML documents. The GRDDL mechanism also allows an XML namespace document (or XHTML profile document) to declare that every document associated with that namespace (or profile) includes gleanable data and for linking to an algorithm for gleaning the data.

The result of such a glean is an RDF description of the document.
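As a rough sketch of the first half of that mechanism, the following Python finds the transformations a general-purpose XML document declares for itself. It assumes the GRDDL namespace is http://www.w3.org/2003/g/data-view# and that the transformation attribute on the root element holds a space-separated list of URIs. The sample document and its URLs are invented. Actually running the XSLT to produce RDF is left out, since the Python standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET

# Assumed GRDDL namespace URI.
GRDDL_NS = "http://www.w3.org/2003/g/data-view#"

# Hypothetical source document; the stylesheet URLs are placeholders.
SOURCE = """<doc xmlns:grddl="http://www.w3.org/2003/g/data-view#"
     grddl:transformation="http://example.org/a.xsl http://example.org/b.xsl">
  <title>Example</title>
</doc>"""

def declared_transformations(xml_text):
    """Return the transformation URLs declared on the root element."""
    root = ET.fromstring(xml_text)
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    return value.split()  # treat the value as a space-separated list

print(declared_transformations(SOURCE))
```

A GRDDL-aware agent would then fetch each listed transformation, apply it to the source document, and merge the resulting RDF.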

The W3C GRDDL Working Group has also posted the proposed recommendation of GRDDL Test Cases. "This document describes and includes test cases for software agents that extract RDF from XML source documents by following the set of mechanisms outlined in the Gleaning Resource Description from Dialects of Language [GRDDL] specification. They demonstrate the expected behavior of a GRDDL-aware agent by specifying one (or more) RDF graph serializations which are the GRDDL results associated with a single source document."

Monday, July 16, 2007 (Permalink)

Continuing its never-ending quest to prove that it's turtles all the way up, the W3C Semantic Web Activity has posted the first public working draft of Protocol for Web Description Resources (POWDER): Grouping of Resources:

The Protocol for Web Description Resources (POWDER) facilitates the publication of descriptions of multiple resources such as all those available from a Web site. This document describes how sets of resources may be defined, either for use in Description Resources or in other contexts. An OWL Class is to be interpreted as the Resource Set with its predicates and objects either defining the characteristics that elements of the set share, or directly listing its elements. Resources that are directly identified or that can be interpreted as being elements of the set can then be used as the subject of further RDF triples.

Sunday, July 15, 2007 (Permalink)

The W3C Synchronized Multimedia Working Group has posted the last call working draft of the Synchronized Multimedia Integration Language 3.0 (SMIL 3.0). SMIL 3.0 has four goals:

Define an XML-based language that allows authors to write interactive multimedia presentations. Using SMIL, an author can describe the temporal behaviour of a multimedia presentation, associate hyperlinks with media objects and describe the layout of the presentation on a screen.
Allow reusing of SMIL syntax and semantics in other XML-based languages, in particular those who need to represent timing and synchronization. For example, SMIL components are used for integrating timing into XHTML [XHTML10] and into SVG [SVG].
Extend the functionalities contained in the SMIL 2.1 [SMIL21] into new or revised SMIL 3.0 modules.
Define new SMIL 3.0 Profiles incorporating features useful within the industry.

Saturday, July 14, 2007 (Permalink)

The W3C XHTML working group has posted the candidate recommendation of XHTML Basic 1.1.

The XHTML Basic document type includes the minimal set of modules required to be an XHTML host language document type, and in addition it includes images, forms, basic tables, and object support. It is designed for Web clients that do not support the full set of XHTML features; for example, Web clients such as mobile phones, PDAs, pagers, and settop boxes. The document type is rich enough for content authoring.

XHTML Basic is designed as a common base that may be extended. The goal of XHTML Basic is to serve as a common language supported by various kinds of user agents.

This revision, 1.1, supercedes version 1.0 as defined in http://www.w3.org/TR/2000/REC-xhtml-basic-20001219. In this revision, several new features have been incorporated into the language in order to better serve the small-device community that is this language's major user:

XHTML Forms (defined in [XHTMLMOD])

Intrinsic Events (defined in [XHTMLMOD])

The value attribute for the li element (defined in [XHTMLMOD])

The target attribute (defined in [XHTMLMOD])

The style element (defined in [XHTMLMOD])

The style attribute (defined in [XHTMLMOD])

XHTML Presentation module (defined in [XHTMLMOD])

The inputmode attribute (defined in Section 5 of this document)

The document type definition is implemented using XHTML modules as defined in "XHTML Modularization"

Friday, July 13, 2007 (Permalink)

The W3C GRDDL Working Group has updated the working draft of a GRDDL Primer. According to the draft,

GRDDL provides an inexpensive set of mechanisms for bootstrapping RDF content from XML and XHTML. GRDDL does this by shifting the burden of formulating RDF away from the author to transformation algorithms written specifically for XML dialects such as XHTML. In this document the term HTML is used to refer to the XHTML dialect of HTML.

GRDDL works through associating transformations with an individual document either through direct inclusion of references or indirectly through profile and namespace documents. For XML dialects the transformations are commonly expressed using XSLT 1.0, although other methods are permissible. Generally, if the transformation can be fully expressed in XSLT 1.0 then it is preferable to use that format since GRDDL processors should be capable of interpreting an XSLT 1.0 document.

While anyone can create a transformation, a standard transform library has been provided that can extract RDF that's embedded directly in XML or HTML using <rdf:RDF> tags as well as extract any profile transformations. GRDDL transformations can be made for almost any dialect, including microformats.

Thursday, July 12, 2007 (Permalink)

The W3C XML Processing Model Working Group has posted the fourth public working draft of XProc: An XML Pipeline Language. According to the introduction,

An XML Pipeline specifies a sequence of operations to be performed on a collection of XML input documents. Pipelines take zero or more XML documents as their input and produce zero or more XML documents as their output.

A pipeline consists of steps. Like pipelines, steps take zero or more XML documents as their input and produce zero or more XML documents as their output. The inputs to a step come from the web, from the pipeline document, from the inputs to the pipeline itself, or from the outputs of other steps in the pipeline. The outputs from a step are consumed by other steps, are outputs of the pipeline as a whole, or are discarded.

There are two kinds of steps: atomic steps and compound steps. Atomic steps carry out single operations and have no substructure as far as the pipeline is concerned, whereas compound steps include a subpipeline of steps within themselves.
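The model is easy to sketch outside of XProc's own syntax. The Python below is only an illustration of the idea, not XProc itself: documents are stand-in strings, and the step names are invented. An atomic step applies one operation; a compound step wraps a subpipeline; both take zero or more documents and return zero or more documents.

```python
from typing import Callable, List

Document = str  # stand-in for an XML document
Step = Callable[[List[Document]], List[Document]]

def atomic(operation: Callable[[Document], Document]) -> Step:
    """An atomic step: one operation, no visible substructure."""
    return lambda docs: [operation(d) for d in docs]

def compound(*subpipeline: Step) -> Step:
    """A compound step: wraps a subpipeline of other steps."""
    def run(docs: List[Document]) -> List[Document]:
        for step in subpipeline:
            docs = step(docs)  # outputs of one step feed the next
        return docs
    return run

# Hypothetical operations, invented for this sketch.
strip = atomic(str.strip)
wrap = atomic(lambda d: "<wrapper>" + d + "</wrapper>")
discard_all: Step = lambda docs: []  # a step may produce zero documents

pipeline = compound(strip, compound(wrap))
print(pipeline(["  <a/>  ", "<b/>"]))
```

Because a compound step has the same signature as an atomic one, pipelines nest to any depth, which is the property the draft's definition depends on.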

Wednesday, July 11, 2007 (Permalink)

Michael Kay has released version 8.9.0.4 of Saxon, his XSLT 2.0 and XQuery processor for Java and .NET. This is a bug fix release.

Saxon is published in two versions, both of which require Java 1.4 or later (or .NET). Saxon 8.9B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 8.9SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standing XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers."

Tuesday, July 10, 2007 (Permalink)

The W3C Web Services Activity has published the proposed recommendation of Semantic Annotations for WSDL and XML Schema. According to the draft,

Semantic Annotations for WSDL and XML Schema (SAWSDL) defines how to add semantic annotations to various parts of a WSDL document such as input and output message structures, interfaces and operations. The extension attributes defined in this specification fit within the WSDL 2.0 [WSDL 2.0], WSDL 1.1 [WSDL 1.1] and XML Schema [XMLSchema Part 1: Structures] extensibility frameworks. For example, this specification defines a way to annotate WSDL interfaces and operations with categorization information that can be used to publish a Web service in a registry. The annotations on schema types can be used during Web service discovery and composition. In addition, SAWSDL defines an annotation mechanism for specifying the data mapping of XML Schema types to and from an ontology; such mappings could be used during invocation, particularly when mediation is required. To accomplish semantic annotation, SAWSDL defines extension attributes that can be applied both to WSDL elements and to XML Schema elements.

The semantic annotations reference a concept in an ontology or a mapping document. The annotation mechanism is independent of the ontology expression language and this specification requires and enforces no particular ontology language. It is also independent of mapping languages and does not restrict the possible choices of such languages.

Monday, July 9, 2007 (Permalink)

Sun has released an OpenDocument Format (ODF) plug-in for Microsoft Office 2000, XP and 2003. This plug-in enables Microsoft Office (for Windows) users to open and save ODF files. More specifically, Word can open and save, while Excel and PowerPoint can import and export. The plug-in is free-as-in-beer but is not open source.

Saturday, July 7, 2007 (Permalink)

The W3C Web Services Activity has published proposed recommendations of Web Services Policy 1.5 - Framework and Web Services Policy 1.5 - Attachment. According to the former,

Web Services Policy 1.5 - Framework defines a framework and a model for expressing policies that refer to domain-specific capabilities, requirements, and general characteristics of entities in a Web services-based system.

A policy is a collection of policy alternatives. A policy alternative is a collection of policy assertions. A policy assertion represents a requirement, capability, or other property of a behavior. A policy expression is an XML Infoset representation of its policy, either in a normal form or in its equivalent compact form. Some policy assertions specify traditional requirements and capabilities that will manifest themselves in the messages exchanged (e.g., authentication scheme, transport protocol selection). Other policy assertions have no wire manifestation in the messages exchanged, yet are relevant to service selection and usage (e.g., privacy policy, QoS characteristics). Web Services Policy 1.5 - Framework provides a single policy language to allow both kinds of assertions to be expressed and evaluated in a consistent manner.

Web Services Policy 1.5 - Framework does not cover discovery of policy, policy scopes and subjects, or their respective attachment mechanisms. A policy attachment is a mechanism for associating policy with one or more policy scopes. A policy scope is a collection of policy subjects to which a policy applies. A policy subject is an entity (e.g., an endpoint, message, resource, interaction) with which a policy can be associated. Web Services Policy 1.5 - Attachment [Web Services Policy Attachment] defines such policy attachment mechanisms, especially for associating policy with arbitrary XML elements [XML 1.0], WSDL artifacts [WSDL 1.1, WSDL 2.0 Core Language], and UDDI elements [UDDI API 2.0, UDDI Data Structure 2.0, UDDI 3.0]. Other specifications are free to define either extensions to the mechanisms defined in Web Services Policy 1.5 - Attachment [Web Services Policy Attachment], or additional mechanisms not covered by Web Services Policy 1.5 - Attachment [Web Services Policy Attachment], for purposes of associating policy with policy scopes and subjects.
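The nesting described above (a policy is a collection of alternatives, an alternative is a collection of assertions) maps onto very ordinary data structures. The Python below is a deliberately simplified sketch under my own assumptions: assertions are plain strings with invented names, and a client "supports" a policy if it can satisfy every assertion in at least one alternative. The real specification works on XML Infoset representations and defines more than this.

```python
from typing import FrozenSet, List

Assertion = str                      # stand-in for a policy assertion
Alternative = FrozenSet[Assertion]   # a collection of assertions
Policy = List[Alternative]           # a collection of alternatives

def supports(policy: Policy, capabilities: FrozenSet[Assertion]) -> bool:
    """True if some alternative is fully covered by the capabilities."""
    return any(alternative <= capabilities for alternative in policy)

# Hypothetical assertion names, invented for illustration.
policy: Policy = [
    frozenset({"auth:kerberos", "transport:https"}),
    frozenset({"auth:x509", "transport:https"}),
]

print(supports(policy, frozenset({"auth:x509", "transport:https", "qos:gold"})))
print(supports(policy, frozenset({"auth:x509"})))
```

In this sketch a policy with no alternatives can never be satisfied, while a policy containing one empty alternative is satisfied by anyone.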

Steve Palmer has released Vienna 2.1.3.2111, an open source RSS/Atom client for Mac OS X. Vienna is the first reader I've found acceptable for daily use; not great but good enough. (Of course my standards for "good enough" are pretty high.) This is a bug fix release.

Friday, July 6, 2007 (Permalink)

The Mozilla Project has posted the sixth alpha of Firefox 3.0 for Mac, Linux, and Windows. This is code named "Gran Paradiso". Besides bug fixes, this alpha adds:

A major rewrite of the text layout and rendering code, including:
- Better support for Thai and other complex scripts
- Better kerning (Windows & Linux) and ligature support (Mac only)
Updated SQLite engine to version 3.3.17
Support for site-specific preferences - text size
A new Quit dialog box that resolves termination errors
Added permanent 'Restart Firefox' button to Add-Ons Manager

Thursday, July 5, 2007 (Permalink)

This document defines a mechanism to selectively provide cross-site access to a web resource. Using either a HTTP header or an XML processing instruction (or both), resources can indicate they allow read access from specified hosts (optionally using patterns). When a pattern is used, one can also exclude certain hosts. For instance, allow read access from all subdomains of example.org (*.example.org) with the exception of public.example.org (public.example.org).

The use of an HTTP header (in addition to processing instructions) is new in this draft.
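The allow-with-exceptions logic can be sketched in a few lines of Python. This is only an illustration of the idea: it uses shell-style glob matching from the standard library, which is my substitution and not necessarily the pattern syntax the draft defines, and the function name is invented.

```python
from fnmatch import fnmatchcase
from typing import Iterable

def read_allowed(host: str, allow: Iterable[str], exclude: Iterable[str] = ()) -> bool:
    """Allow a host that matches an allow pattern and no exclude pattern."""
    host = host.lower()
    if any(fnmatchcase(host, pattern) for pattern in exclude):
        return False  # exclusions win over allow patterns
    return any(fnmatchcase(host, pattern) for pattern in allow)

# The example from the draft's abstract.
allow = ["*.example.org"]
exclude = ["public.example.org"]

print(read_allowed("www.example.org", allow, exclude))
print(read_allowed("public.example.org", allow, exclude))
print(read_allowed("example.com", allow, exclude))
```

Checking exclusions first keeps the rule simple: a host is readable only if it is listed and not explicitly carved out.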

Wednesday, July 4, 2007 (Permalink)

The W3C RDF Data Access Working Group has published a second last call working draft of SPARQL Query Results XML Format. "This document describes an XML format for the variable binding and boolean results formats provided by the SPARQL query language for RDF". Comments are due by July 5.
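For a feel of what consuming this format looks like, here is a small Python sketch that reads variable bindings out of a hand-written sample. The sample follows the layout of the format as I understand it (a sparql root in the http://www.w3.org/2005/sparql-results# namespace, with head and results children); the variable names and values are invented.

```python
import xml.etree.ElementTree as ET

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Hand-written sample result; names and values are invented.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="name"/>
    <variable name="page"/>
  </head>
  <results>
    <result>
      <binding name="name"><literal>Alice</literal></binding>
      <binding name="page"><uri>http://example.org/alice</uri></binding>
    </result>
    <result>
      <binding name="name"><literal>Bob</literal></binding>
    </result>
  </results>
</sparql>"""

def read_bindings(xml_text):
    """Return one dict per result, mapping variable name to its text value."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            value = binding[0]  # the uri, literal, or bnode child element
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows

print(read_bindings(SAMPLE))
```

Note that the second result simply omits the unbound variable, so a consumer has to cope with missing keys.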

Tuesday, July 3, 2007 (Permalink)

The W3C Internationalization Tag Set Working Group has posted the third public working draft of Best Practices for XML Internationalization. "This document provides a set of guidelines for developing XML documents and schemas that are internationalized properly. Following the best practices describes here allow both the developer of XML applications, as well as the author of XML content to create material in different languages." Suggestions include:

Provide xml:lang to specify natural language content
Provide a way to specify text directionality
Avoid translatable attributes
Indicate the translatability of elements and attributes
Provide a way to override translatability information
Provide text segmentation-related information
Provide a way to specify ruby text
Provide a way to specify comments for translators
Provide a way to specify unique identifiers
Identify terminology-related elements
Provide a way to override terminology information
Use multilingual documents with caution
Name elements with caution
Provide ITS rules for your DTD or schema
Specify the language of the content
Specify text directionality if needed
Override translatability information if needed
Assign unique identifiers to text items when possible
Use CDATA sections with caution
Provide comments for translators
Ensure any inserted text is context-independent
Use entity references with caution
Place sub-flow elements with caution

There are also specific suggestions for XHTML, DITA, and DocBook.

I'm not sure I agree with everything they say. "Avoid translatable attributes" sounds very questionable to me, though there is a rationale for it. Feedback is requested.
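The first suggestion, at least, is cheap to act on and cheap to consume. As a minimal sketch, this Python walks a document and reports the language in effect for each element, inheriting xml:lang from ancestors. The sample document and element names are invented.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical document; element names are invented.
SAMPLE = """<doc xml:lang="en">
  <p>Hello</p>
  <p xml:lang="fr">Bonjour</p>
</doc>"""

def languages(element, inherited=None):
    """Yield (tag, language) pairs, inheriting xml:lang from ancestors."""
    lang = element.get(XML_LANG, inherited)
    yield element.tag, lang
    for child in element:
        yield from languages(child, lang)

print(list(languages(ET.fromstring(SAMPLE))))
```

A translation tool can use exactly this kind of walk to decide which text runs are already in the target language.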

Monday, July 2, 2007 (Permalink)

Tim Bray has released mod_atom, an Atom Publishing Protocol module for Apache that stores all content straight into files in the file system. In other words, we finally have decent, generic publishing support for Apache HTTPD. It's about time.

Sunday, July 1, 2007 (Permalink)

Matt Mullenweg has released WordPress 2.2.1, an open source (GPL) blog engine based on PHP and MySQL. This is a bug fix release.

Saturday, June 30, 2007 (Permalink)

The W3C Web Services activity has posted a new last call Web Services Addressing 1.0 - Metadata working draft. "Web Services Addressing 1.0 - Metadata (this document) defines how the abstract properties defined in Web Services Addressing 1.0 - Core are described using WSDL, how to include WSDL metadata in endpoint references, and how WS-Policy can be used to indicate the support of WS-Addressing by a Web service." Comments are due by July 11.

Friday, June 29, 2007 (Permalink)

The W3C Web Services Description Working Group has posted three finished recommendations for WSDL 2.0:

Web Services Description Language (WSDL) Version 2.0 Part 0: Primer

"This document is a companion to the WSDL 2.0 specification (Web Services Description Language (WSDL) Version 2.0 Part 1: Core Language [WSDL 2.0 Core], Web Services Description Language (WSDL) Version 2.0 Part 2: Adjuncts [WSDL 2.0 Adjuncts]). It is intended for readers who wish to have an easier, less technical introduction to the main features of the language."

Web Services Description Language (WSDL) Version 2.0 Part 1: Core Language

"Web Services Description Language Version 2.0 (WSDL 2.0) provides a model and an XML format for describing Web services. WSDL 2.0 enables one to separate the description of the abstract functionality offered by a service from concrete details of a service description such as 'how' and 'where' that functionality is offered. This specification defines a language for describing the abstract functionality of a service as well as a framework for describing the concrete details of a service description. "

Web Services Description Language (WSDL) Version 2.0 Part 2: Adjuncts

WSDL is an XML format for describing network services as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information. Web Services Description Language (WSDL) Version 2.0 Part 2: Adjuncts defines predefined extensions for use in WSDL 2.0:

Message exchange patterns

Operation styles

Binding Extensions

Comments are due by June 20.

Thursday, June 28, 2007 (Permalink)

IDEAlliance has posted the final program for Extreme Markup Languages this August in Montreal. They've also announced a preconference International Workshop on Markup of Overlapping Structures. "Extreme is the leading international conference on markup theory and practice. If you have interesting markup applications, difficult markup problems, or intriguing solutions to problems related to the design and use of markup, markup languages, or markup tools; if you want to know what the leading theorists of markup are thinking; if you are the house markup expert and want to spend time with your kind, then you should plan on attending Extreme Markup Languages® 2007." Extreme is always a lot of fun, though I won't be able to attend this year myself.

Wednesday, June 27, 2007 (Permalink)

The W3C Web API Working Group has posted a new working draft of The XMLHttpRequest Object.

The XMLHttpRequest object implements an interface exposed by a scripting engine that allows scripts to perform HTTP client functionality, such as submitting form data or loading data from a server.

The name of the object is XMLHttpRequest for compatibility with the web, though each component of this name is potentially misleading. First, the object supports any text based format, including XML. Second, it can be used to make requests over both HTTP and HTTPS (some implementations support protocols in addition to HTTP and HTTPS, but that functionality is not covered by this specification). Finally, it supports "requests" in a broad sense of the term as it pertains to HTTP; namely all activity involved with HTTP requests or responses for the defined HTTP methods.

"Given that the previous Last Call Working Draft got feedback that required extensive changes this is a normal Working Draft again. It is expected that this document will become a second Last Call Working Draft with no more than editorial changes. "

Tuesday, June 26, 2007 (Permalink)

Planamesa Software has released NeoOffice/J 2.1 Patch 7, a Mac port of OpenOffice 2.1 using a Java-based GUI. This release improves Leopard compatibility.

Monday, June 25, 2007 (Permalink)

The W3C XML Core Working Group has published the candidate recommendation of Canonical XML 1.1. This attempts to address some of the weirdnesses of Canonical XML, such as the movement of xml:id attributes from one element to another and the breaking of base URLs when canonicalizing.

Sunday, June 24, 2007 (Permalink)

The W3C Voice Browser Working Group has published the finished VoiceXML 2.1 recommendation. VoiceXML is used to describe those annoying call trees you hear when calling most major companies. (Press 1 if you want to wait on hold for 20 minutes and then be hung up on; press 2 if you want to wait indefinitely; press 3 if you'd rather we just hung up on you now.) New features in 2.1 include data and foreach elements, dynamic grammars and scripts, detecting barge-in during prompt playback, fetching XML without requiring a dialog transition, recording user utterances while attempting recognition, and specifying the media format of utterance recordings.

Saturday, June 23, 2007 (Permalink)

Apple has posted a beta of Safari 3.0.2 for Windows and the Mac. Safari supports XML, XSLT, CSS, XHTML, and RSS. Also Safari 3.x can drive XSLT from JavaScript, which 2.x could not do. Mac OS X 10.4 or Windows XP or later is required. 3.0.2 is a bug fix release.

Friday, June 22, 2007 (Permalink)

Recordare has released version 2.0 of MusicXML, an XML application for common Western music notation used in printed sheet music. "The big change is the addition of many new features for music formatting. Files saved in the MusicXML 1.1 format can now include full information about how notes, symbols, measures, staves, systems, credits, and pages appear in a printed score." New elements in 2.0 include:

image and credit-image elements for including graphics in scores.
appearance element for general score graphical settings, including line-width, note-size, and other-appearance child elements.
container, rootfiles, and rootfile elements for describing compressed zip archives containing MusicXML documents
volume, pan, and elevation elements for better mixer support.
solo and ensemble elements for better specification of playback sounds.
metronome-note and metronome-relation elements for swing and other metrical markings, including metronome-type, metronome-dot, metronome-beam, and metronome-tuplet child elements.
measure-numbering element for better specification of how measure numbers are displayed in each part.
inverted-turn ornament element
stress and unstress articulation elements.
part-name-display, part-abbreviation-display, group-name-display, group-abbreviation-display, display-text, and accidental-text elements to allow full formatting of part and group names and abbreviations.
key-octave element for more accurate display of unusual key signatures.
part-symbol element for formatting control of the symbol that groups multi-staff parts.
slash-type and slash-dot elements for more complete specification of beat-repeat and slash notation.
accordion-registration elements for accordion registration symbols, including accordion-high, accordion-middle, and accordion-low elements.
group-time element for time signatures that stretch vertically across multiple staves or parts.
relation element for metadata, similar to the same element in Dublin Core.

Thursday, June 21, 2007 (Permalink)

Microsoft has posted version 0.1.1 of its Office Open XML File Format Converter for Mac, a tool for converting Office Open XML Word files to a format that can be read by Office on Mac OS X. "This update fixes an issue with beta expiration functionality. The issue causes the converter application to become unusable well ahead of its intended expiration date. This update is highly recommended for all users of the Microsoft Office Open XML File Format Converter for Mac 0.1 (Beta)."

Wednesday, June 20, 2007 (Permalink)

The W3C RDF Data Access Working Group has posted a new candidate recommendation of SPARQL Query Language for RDF:

RDF is a directed, labeled graph data format for representing information in the Web. This specification defines the syntax and semantics of the SPARQL query language for RDF. SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware. SPARQL contains capabilities for querying required and optional graph patterns along with their conjunctions and disjunctions. SPARQL also supports extensible value testing and constraining queries by source RDF graph. The results of SPARQL queries can be results sets or RDF graphs.

Two features are identified as being "at risk": REDUCED and leading digits in prefixed names. I have no idea how significant these are. The latter at least strikes me as a really bad idea. Comments are due by August 31.

Tuesday, June 19, 2007 (Permalink)

David Reitter has released Aquamacs 1.0b, an Aqua-native build of GNU Emacs 22. "For instance, in addition to traditional Emacs shortcuts like C-x C-f (open a new file), Aquamacs understands Apple-O. Aquamacs behaves like a modern application on Mac (or Windows) when it comes to selecting, copying, pasting texts within Aquamacs or in between applications. Aquamacs offers nice, smooth fonts. Asian input methods work. It's easy to install and runs out-of-the box with no configuration. And all is built on GNU Emacs, so you can use your favorite Emacs packages!" Mac OS X 10.3.9 or later is required.

Planamesa Software has released NeoOffice/J 2.1 Patch 6, a Mac port of OpenOffice 2.1 using a Java-based GUI. This is mostly a bug fix release.

NeoOffice is catching up to Microsoft Office, though it's certainly not there yet, and I'm not sure it will ever get there. (Various look-and-feel bugs I've filed have been closed as won't fix.) Mac OS X 10.3.9 or later is required. NeoOffice is published exclusively under the GPL.

Monday, June 18, 2007 (Permalink)

Karl Waclawek and Fred L. Drake, Jr. have posted version 2.0.1 of Expat, a non-validating XML parser for C. This release fixes assorted bugs and build issues.

Saturday, June 16, 2007 (Permalink)

The printed book has been out for a while, but the Unicode Consortium has just posted the complete Unicode 5.0 book in PDF format on its website. Version 5 adds 1,369 new characters for Cyrillic, Greek, Hebrew, Kannada, Latin, math, phonetic extensions, symbols, and five new scripts: Balinese, N’Ko, Phags-pa, Phoenician, and Sumero-Akkadian Cuneiform. In addition it:

makes changes to guarantee case-folding stability. Unicode 5.0 incorporates all the changes introduced in Unicode 4.1, including full interoperability with the most recent versions of GB 18030, JIS X 0213, and HKSCS, and support for stable identifiers and pattern syntax characters.

Unicode 5.0 revises and improves property values and behavioral specifications in areas such as character, word, line, and sentence segmentation, and tightens conformance requirements on Bidi implementations (used for Arabic and Hebrew). The text is significantly revised for clarity and completeness, especially for Unicode conformance.

Friday, June 15, 2007 (Permalink)

IDEAlliance has posted the call for papers for XML 2007. The conference takes place in Boston December 3-5 at a new hotel: the Marriott Copley Place. This is the major North American XML show. They are planning four tracks this year:

Enterprise XML computing
XML on the Web
Documents and Publishing
XML Training

According to David Megginson:

There will be no separate tutorial day; instead, the regular program (three tracks) ends at noon on Wednesday, then Wednesday afternoon will be devoted to a special training track for all different skill levels (no separate registration required)

This is the *only* call for participation; there will be no late-breaking call this year.

On one of the evenings, we plan to offer lightning sessions for standards groups -- each group will have 20 slides and 6 minutes and 40 seconds to let us know what they're working on. This will be a great way to learn a lot about a lot of specs and standards in a short time.

We continue to encourage presentations on open data and document technologies other than XML, such as JSON.

I'd like to attend, but I may not have the free time. Proposals are due by August 31.

Thursday, June 14, 2007 (Permalink)

The W3C Cascading Style Sheets working group has posted the candidate recommendation of Media Queries:

HTML4 and CSS2 currently support media-dependent style sheets tailored for different media types. For example, a document may use sans-serif fonts when displayed on a screen and serif fonts when printed. "Screen" and "print" are two media types that have been defined. Media Queries extend the functionality of media types by allowing more precise labeling of style sheets.

A Media Query consists of a media type and one or more expressions to limit the scope of style sheets. Among the media features that can be used in media queries are "width", "height", and "color". By using Media Queries, presentations can be tailored to a specific range of output devices without changing the content itself.
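For readers who think better in code, here is a minimal Python sketch of how a user agent might evaluate a query of the form "media type and (feature) and (feature)". It handles only a tiny, assumed subset (pixel or unitless integer values, min- and max- prefixes, bare features), and the device dictionaries and function name are hypothetical. It is nowhere near a conforming implementation of the candidate recommendation.

```python
import re

# One parenthesized expression: "(feature)" or "(feature: 123px)".
EXPR = re.compile(r"\(\s*([a-z-]+)\s*(?::\s*(\d+)(?:px)?\s*)?\)")

def matches(query: str, device: dict) -> bool:
    """Evaluate a simplified media query against a device description."""
    parts = [p.strip() for p in query.lower().split(" and ")]
    media_type, expressions = parts[0], parts[1:]
    if media_type not in ("all", device["type"]):
        return False
    for expr in expressions:
        m = EXPR.fullmatch(expr)
        if m is None:
            return False  # unsupported syntax: treat the query as not matching
        feature, value = m.group(1), m.group(2)
        if value is None:
            # Bare feature such as (color): true when the device value is nonzero.
            if not device.get(feature, 0):
                return False
        elif feature.startswith("min-"):
            if device.get(feature[4:], 0) < int(value):
                return False
        elif feature.startswith("max-"):
            if device.get(feature[4:], 0) > int(value):
                return False
        elif device.get(feature) != int(value):
            return False
    return True

# Hypothetical devices; "color" is bits per color component, 0 for monochrome.
screen = {"type": "screen", "width": 1024, "height": 768, "color": 8}
printer = {"type": "print", "width": 816, "height": 1056, "color": 0}

print(matches("screen and (min-width: 400px)", screen))
print(matches("print and (color)", printer))
```

The first query matches the screen device; the second fails because the assumed printer is monochrome.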

Wednesday, June 13, 2007 (Permalink)

The OpenOffice Project has released OpenOffice 2.2, an open source office suite for Linux and Windows that saves all its files as zipped XML. It also runs on the Mac with X-Windows. This release fixes bugs. OpenOffice is dual licensed under the LGPL and Sun Industry Standards Source License.

Daniel Veillard has released version 2.6.29 of libxml2, the open source XML C library for Gnome. This release fixes assorted bugs.

Tuesday, June 12, 2007 (Permalink)

Apple has posted a beta of Safari 3.0 for Windows. (There's one for Mac too, but Windows is the real shocker.) Safari supports XML, XSLT, CSS, XHTML, and RSS. Windows XP or later is required. New features include some CSS3 support, tabbed browsing, and an inspect element option in the context menu for web developers. This release is quite buggy, and will overwrite Safari 2 on a Mac. Don't install this on a production system.

This is also the browser that will be on the iPhone. The iPhone will be extensible through AJAX, just like a regular web browser. You still probably won't be able to install your own software directly onto the iPhone, but you can play it in the browser. The distinction isn't irrelevant, but it's less important than it used to be. Flash will most likely not be supported though.

Monday, June 11, 2007 (Permalink)

The Mozilla Project has posted version 0.8 of its XForms extension for Firefox 1.5 and later. Mozilla XForms support has been developed by IBM, Novell, and independent contributors. Improvements in this release include SOAPAction on submission, mediatype on output elements, range support for many different datatypes, and the resource element and attribute for submission. It's not a complete XForms implementation yet, but it's getting there.

Sunday, June 10, 2007 (Permalink)

The Mozilla Project has posted the fifth alpha of Firefox 3.0 for Mac, Linux, and Windows. This is code named "Gran Paradiso". This alpha is based on the Gecko 1.9 rendering engine, and can now pass the Acid 2 test. There's supposed to be a new password manager but to my eyes it looks just like the old one, with all its user interface flaws. Maybe it's better at recognizing when different passwords need to be entered or stored? There's also a new bookmarks manager. SVG, XSLT, and Web Applications 1.0 support has been improved.

Netscape has posted the first beta of version 9.0 of its namesake web browser. This release is based on Firefox. New features include:

Navigator 9 will automatically correct common typos in URLs.
News Menu and Sidebar
Link pad for saving links/URLs that you want to visit later without cluttering your bookmarks
In-browser voting
User resizeable textareas in forms (Good idea! Why did no one think of this before?)
Tab history

Netscape 9 is also once again available for Linux and Mac OS X as well as Windows.

Friday, June 8, 2007 (Permalink)

I've posted the second beta of XOM 1.2, my free-as-in-speech (LGPL) library for processing XML with Java. Compared to the 1.0-->1.1 transition, this is a very minor upgrade. There are just a couple of additional methods, a few bug fixes, and maybe a small optimization or two. All code written to the 1.1 or 1.0 APIs should run unchanged with 1.2. This beta upgrades jaxen to 1.1.1.

Thursday, June 7, 2007 (Permalink)

The Mozilla Project has released Camino 1.5, an open source Mac OS X web browser based on the Gecko 1.8 rendering engine and the Quartz GUI toolkit. It supports pretty much all the technologies that Mozilla does: HTML, XHTML, CSS, XML, XSLT, etc. 1.5 adds spell-checking, feed detection, session restore, Keychain sharing with Safari, and enhanced security for cookies, Flash, and plug-ins. Mac OS X 10.3 or later is required.

NewsGator has released version 3.0 of NetNewsWire, a closed source feed reader for the Mac. New features in 3.0 include:

Desktop integration – Spotlight, Address Book, iCal, iPhoto, Growl, Twitterific and more.
Full screen mode
Synchronized clippings
Microformat detection for contacts and calendar events.
Autoupdates

NetNewsWire is $30 payware. A free lite version of 3.0 has not yet been published.

Wednesday, June 6, 2007 (Permalink)

The OpenOffice Project has posted the first alpha of OpenOffice for Mac OS X Aqua. Previous versions required X Windows. This one doesn't.

OpenOffice.org Aqua requires Mac OS X 10.4 (Tiger).

WARNING: THIS SOFTWARE MAY CRASH AND MAY DESTROY YOUR DATA. DO NOT USE THIS SOFTWARE FOR REAL WORK IN A PRODUCTION ENVIRONMENT.

This is an alpha test version so that developers and users can find out what works and not, and make comments on how to improve it.

There are a number of things that do not work in this version including, but not limited to:

You cannot print

PDF export does not properly work as the text won't show on the page right

Starting OpenOffice.org from a shared folder does not work

Copy and paste does not fully work

OpenOffice.org will crash after quitting

Some text is not drawn in places like Impress

Impress will not recognise multiple monitors

Better late than never. OpenOffice is dual licensed under the LGPL and Sun Industry Standards Source License.

Tuesday, June 5, 2007 (Permalink)

The W3C Web Services Activity has published revised candidate recommendations of Web Services Policy 1.5 - Framework and Web Services Policy 1.5 - Attachment. According to the former,

Web Services Policy 1.5 - Framework defines a framework and a model for expressing policies that refer to domain-specific capabilities, requirements, and general characteristics of entities in a Web services-based system.

A policy is a collection of policy alternatives. A policy alternative is a collection of policy assertions. A policy assertion represents a requirement, capability, or other property of a behavior. A policy expression is an XML Infoset representation of its policy, either in a normal form or in its equivalent compact form. Some policy assertions specify traditional requirements and capabilities that will manifest themselves in the messages exchanged (e.g., authentication scheme, transport protocol selection). Other policy assertions have no wire manifestation in the messages exchanged, yet are relevant to service selection and usage (e.g., privacy policy, QoS characteristics). Web Services Policy 1.5 - Framework provides a single policy language to allow both kinds of assertions to be expressed and evaluated in a consistent manner.

Web Services Policy 1.5 - Framework does not cover discovery of policy, policy scopes and subjects, or their respective attachment mechanisms. A policy attachment is a mechanism for associating policy with one or more policy scopes. A policy scope is a collection of policy subjects to which a policy applies. A policy subject is an entity (e.g., an endpoint, message, resource, interaction) with which a policy can be associated. Web Services Policy 1.5 - Attachment [Web Services Policy Attachment] defines such policy attachment mechanisms, especially for associating policy with arbitrary XML elements [XML 1.0], WSDL artifacts [WSDL 1.1, WSDL 2.0 Core Language], and UDDI elements [UDDI API 2.0, UDDI Data Structure 2.0, UDDI 3.0]. Other specifications are free to define either extensions to the mechanisms defined in Web Services Policy 1.5 - Attachment [Web Services Policy Attachment], or additional mechanisms not covered by Web Services Policy 1.5 - Attachment [Web Services Policy Attachment], for purposes of associating policy with policy scopes and subjects.

There's also an updated draft of Web Services Policy 1.5 - Primer. Comments are due by June 30.
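The policy / alternative / assertion nesting described in the abstract is easier to see as a data structure. The following Python sketch models it with plain strings standing in for assertions. The class names, the assertion names, and the satisfied_by rule (a policy is met when every assertion in at least one alternative is supported) are my own assumptions for illustration and are not taken from the quoted text.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set

@dataclass(frozen=True)
class PolicyAlternative:
    """A policy alternative is a collection of policy assertions."""
    assertions: FrozenSet[str]

@dataclass
class Policy:
    """A policy is a collection of policy alternatives."""
    alternatives: List[PolicyAlternative] = field(default_factory=list)

    def satisfied_by(self, capabilities: Set[str]) -> bool:
        # Assumed rule: one fully supported alternative is enough.
        return any(alt.assertions <= capabilities for alt in self.alternatives)

# Hypothetical assertion names, chosen only for the example.
policy = Policy([
    PolicyAlternative(frozenset({"https", "basic-auth"})),
    PolicyAlternative(frozenset({"https", "x509"})),
])

print(policy.satisfied_by({"https", "x509"}))
print(policy.satisfied_by({"http"}))
```

A real policy expression is an XML Infoset rather than a set of strings, so treat this purely as a picture of the nesting.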

Monday, June 4, 2007 (Permalink)

The W3C Scalable Vector Graphics Working Group has posted the first drafts of three specifications about SVG 1.2 filters:

According to the primer:

Filter effects are defined by 'filter' elements. To apply a filter effect to a graphics element or a container element, you set the value of the 'filter' property on the given element such that it references the filter effect.

Each 'filter' element contains a set of filter primitives as its children. Each filter primitive performs a single fundamental graphical operation (e.g., a blur or a lighting effect) on one or more inputs, producing a graphical result. Because most of the filter primitives represent some form of image processing, in most cases the output from a filter primitive is a single RGBA image.

The original source graphic or the result from a filter primitive can be used as input into one or more other filter primitives. A common application is to use the source graphic multiple times. For example, a simple filter could replace one graphic by two by adding a black copy of original source graphic offset to create a drop shadow. In effect, there are now two layers of graphics, both with the same original source graphics.

When applied to container elements such as 'g', the 'filter' property applies to the contents of the group as a whole. The group's children do not render to the screen directly; instead, the graphics commands necessary to render the children are stored temporarily. Typically, the graphics commands are executed as part of the processing of the referenced 'filter' element via use of the keywords SourceGraphic or SourceAlpha. Filter effects can be applied to container elements with no content (e.g., an empty 'g' element), in which case the SourceGraphic or SourceAlpha consist of a transparent black rectangle that is the size of the filter effects region.
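The drop shadow case the primer mentions can be sketched in a few lines. This Python snippet builds an SVG document with a 'filter' element and applies it to a 'g' container through the 'filter' property. The primitive names (feGaussianBlur, feOffset, feMerge) come from the SVG 1.1 filter vocabulary, and I'm assuming the 1.2 drafts keep them; the id, sizes, and colors are arbitrary.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize SVG as the default namespace

def q(tag: str) -> str:
    """Qualify a tag name with the SVG namespace."""
    return f"{{{SVG_NS}}}{tag}"

def drop_shadow_svg() -> str:
    svg = ET.Element(q("svg"), {"width": "200", "height": "120"})

    # The filter: blur the alpha channel, offset it, then draw the
    # original graphic on top of the offset copy.
    flt = ET.SubElement(svg, q("filter"), {"id": "shadow"})
    ET.SubElement(flt, q("feGaussianBlur"),
                  {"in": "SourceAlpha", "stdDeviation": "3", "result": "blur"})
    ET.SubElement(flt, q("feOffset"),
                  {"in": "blur", "dx": "4", "dy": "4", "result": "offsetBlur"})
    merge = ET.SubElement(flt, q("feMerge"))
    ET.SubElement(merge, q("feMergeNode"), {"in": "offsetBlur"})
    ET.SubElement(merge, q("feMergeNode"), {"in": "SourceGraphic"})

    # The filter applies to the group as a whole, not to each child.
    group = ET.SubElement(svg, q("g"), {"filter": "url(#shadow)"})
    ET.SubElement(group, q("rect"),
                  {"x": "20", "y": "20", "width": "120", "height": "60",
                   "fill": "steelblue"})
    return ET.tostring(svg, encoding="unicode")

print(drop_shadow_svg())
```

Note that the source graphic is used twice here, once via SourceAlpha for the shadow and once via SourceGraphic for the top layer, which is the "two layers" effect the primer describes.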

Sunday, June 3, 2007 (Permalink)

Oleg Tkachenko has posted IronXSLT 0.1. According to Tkachenko:

Visual Studio already supports editing, running and even debugging XSLT, but it's still a painfully limited support. So I'm started building IronXSLT - Visual Studio plugin aimed to provide total integration of the XSLT language in Visual Studio IDE.

Current list of planned and already implemented IronXSLT features includes:

XSLT Library Project (Visual Studio project type for compiling XSLT into DLL)
XSLT Refactorings
Multiple XSLT engines
XSLT Profiler
Extensive library of XSLT code snippets
XPath Intellisense
Visual XSLT builder
XSLT2XLinq and XLinq2XSLT converters

IronXSLT version 0.1 implements first point.

IronXSLT supports only forthcoming Microsoft Visual Studio version, codenamed "Orcas", which is about to be released later this year.

The DocBook Project has released version 1.72.0 of the DocBook 5 XSL stylesheets. According to Bob Stayton, "The DocBook 5 XSL Stylesheets are the same as the regular 1.72.0 stylesheet release except that the templates match on elements in the DocBook namespace." I should really upgrade the Java Student's Resource to use this.

Norm Walsh has posted the third release candidate of DocBook 5.0. DocBook 5 is "a significant redesign that attempts to remain true to the spirit of DocBook." The schema is written in RELAX NG. A DTD and W3C XML Schema generated from the RELAX NG schema are also available. There's also a Schematron schema "that validates some extra-grammatical DocBook constraints. These patterns are also present directly in the RELAX NG Grammar and some validators, for example MSV, can perform both kinds of validation at the same time." Changes in this RC are small but significant.

Saturday, June 2, 2007 (Permalink)

The W3C Web Services activity has posted the last call Web Services Addressing 1.0 - Metadata working draft. "Web Services Addressing 1.0 - Metadata (this document) defines how the abstract properties defined in Web Services Addressing 1.0 - Core are described using WSDL, how to include WSDL metadata in endpoint references, and how WS-Policy can be used to indicate the support of WS-Addressing by a Web service."

Friday, June 1, 2007 (Permalink)

The Mozilla Project has released Firefox 2.0.0.4 and 1.5.0.12, Thunderbird 2.0.0.4 and 1.5.0.12, and SeaMonkey 1.0.9 and 1.1.2. These releases fix security flaws and improve support for Windows Vista. They also add Afrikaans and Belarusian localizations. All users should upgrade.

Thursday, May 31, 2007 (Permalink)

IDEAlliance has posted a call for late-breaking news for Extreme Markup Languages this August in Montreal. "Extreme is the leading international conference on markup theory and practice. If you have interesting markup applications, difficult markup problems, or intriguing solutions to problems related to the design and use of markup, markup languages, or markup tools; if you want to know what the leading theorists of markup are thinking; if you are the house markup expert and want to spend time with your kind, then you should plan on attending Extreme Markup Languages® 2007." Extreme is always a lot of fun, though I won't be able to attend this year myself. Submissions are due by June 15.

Wednesday, May 30, 2007 (Permalink)

Continuing its never-ending quest to prove that it's turtles all the way up, the W3C Semantic Web Activity has posted a note on POWDER: Use Cases and Requirements:

The development of the Protocol for Web Description Resources has been motivated by both commercial and social concerns. On the social side, there is a demand for a system to identify content that meets certain criteria as they apply to specified audiences. Commercially, there is a demand to be able to personalize content for a particular user or delivery context.

POWDER will address these demands by defining a method through which relatively small amounts of metadata, that can be produced quickly and easily, can be applied to large amounts of content.

The use cases and requirements for POWDER were originally developed under the Web Content Label Incubator Activity. They have been revised and updated for this Working Group Note.

Tuesday, May 29, 2007 (Permalink)

Planamesa Software has released NeoOffice/J 2.1 Patch 5, a Mac port of OpenOffice 2.1 using a Java-based GUI.

After countless hours of performance analysis work, we recently had a breakthrough that enabled us to identify specific performance bottlenecks in NeoOffice's underlying OpenOffice.org code and fix those bottlenecks by replacing the OpenOffice.org code with code that is optimized specifically for Mac OS X.

As a result of our fixes, image scaling and drawing speed has been increased and memory usage has been reduced. While users may not notice much change in performance when using text-based documents, users should see image-intensive activities such as running a slide show with transitions or scrolling through large documents perform several times faster than before.

Monday, May 28, 2007 (Permalink)

The W3C Web Security Context Working Group has updated the working draft of Web Security Experience, Indicators and Trust: Scope and Use Cases.

Web user agents are now used to engage in a great variety and number of commercial and personal activities. Though the medium for these activities has changed, the potential for fraud has not. This Working Group is chartered to recommend user interfaces that help users make trust decisions on the Web.

This first Working Group document elaborates upon the group's charter to explain what the group aims to achieve, what technologies may be used and how proposals will be evaluated. This elaboration is limited to the group's technical work and does not cover additional activities the group intends to engage in, such as ongoing outreach and education.

Sunday, May 27, 2007 (Permalink)

The W3C Web Services Description Working Group has posted three proposed recommendations for WSDL 2.0:

Web Services Description Language (WSDL) Version 2.0 Part 1: Core Language

Web Services Description Language (WSDL) Version 2.0 Part 2: Adjuncts

WSDL is an XML format for describing network services as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information. Web Services Description Language (WSDL) Version 2.0 Part 2: Adjuncts defines predefined extensions for use in WSDL 2.0:

Message exchange patterns

Operation styles

Binding Extensions

Web Services Description Language (WSDL) Version 2.0 Part 0: Primer

Comments are due by June 20.

Saturday, May 26, 2007 (Permalink)

Sun has posted version 0.5 of xmlroff, an open source XSL Formatting Objects to PDF and PostScript converter. xmlroff is written in C for Linux, and relies on libxml2, libxslt, and the GLib and GObject libraries from GTK+ and GNOME (though neither GTK+ nor GNOME is required). It also needs PDFlib, FreeType2, and Fontconfig. xmlroff can be run from the command line. It also includes a libfo library. This release eliminates some dependencies.

Friday, May 25, 2007 (Permalink)

The W3C and the Unicode Consortium have jointly published a note on Unicode in XML and other Markup Languages. Essentially it lists characters that should and should not be used in markup. A quick glance at the forbidden characters suggests that XML 1.0 is more in conformance with this note than XML 1.1 is.

Thursday, May 24, 2007 (Permalink)

Edgewall Software has posted Genshi 0.4.1, an open source "Python library that provides an integrated set of components for parsing, generating, and processing HTML, XML or other textual content for output generation on the web. The major feature is a template language, which is heavily inspired by Kid."

Kiyut has released Sketsa 4.2.2, a $49 payware SVG editor written in Java. Version 4.2.2 fixes bugs. Java 5 or later is required.

Wednesday, May 23, 2007 (Permalink)

The W3C Semantic Web Deployment Working Group has posted the first public working draft of SKOS Use Cases and Requirements. According to the document:

Knowledge organisation systems, such as taxonomies, thesauri or subject heading lists, play a fundamental role in information structuring and access. The Semantic Web Deployment Working Group aims at providing a model for representing such vocabularies on the Semantic Web: SKOS (Simple Knowledge Organisation System).

This document presents the preparatory work for a future version of SKOS. It lists representative use cases, which were obtained after a dedicated questionnaire was sent to a wide audience. It also features a set of fundamental or secondary requirements derived from these use cases, that will be used to guide the design of SKOS.

Monday, May 21, 2007 (Permalink)

The W3C Web Content Accessibility Guidelines Working Group has updated three working drafts covering various topics:

Web Content Accessibility Guidelines 2.0

"Web Content Accessibility Guidelines 2.0 (WCAG 2.0) covers a wide range of issues and recommendations for making Web content more accessible. This document contains principles, guidelines, and success criteria that define and explain the requirements for making Web-based information and applications accessible. 'Accessible' means usable to a wide range of people with disabilities, including blindness and low vision, deafness and hearing loss, learning difficulties, cognitive limitations, limited movement, speech difficulties, photosensitivity and combinations of these. Following these guidelines will also make your Web content more accessible to the vast majority of users, including older users. It will also enable people to access Web content using many different devices - including a wide variety of assistive technologies."

This draft is in last call. Comments are due by May 31.

Understanding WCAG 2.0

This draft "provides detailed information about each success criterion, including its intent; the key terms that are used in the success criterion; examples of Web content that meet the success criterion using various Web technologies (for instance, HTML, CSS, XML) and common examples of Web content that does not meet the success criterion. Finally, this document also explains how the success criteria in WCAG 2.0 help people with different types of disabilities."

Techniques for WCAG 2.0

This document "provides information to Web content developers who wish to satisfy the success criteria of Web Content Accessibility Guidelines 2.0 (WCAG 2.0). Techniques are specific authoring practices that may be used in support of the WCAG 2.0 success criteria. This document provides 'General Techniques' that describe basic practices that are applicable to any technology, and technology-specific techniques that provide information applicable to specific technologies. Currently, technology-specific techniques are available for HTML, CSS, ECMAScript, SMIL, ARIA, and Web servers. The World Wide Web Consortium only documents techniques for non-proprietary technologies; the WCAG Working Group hopes vendors of other technologies will provide similar techniques to describe how to conform to WCAG 2.0 using those technologies. Use of the techniques provided in this document makes it easier for Web content to demonstrate conformance to WCAG 2.0 success criteria than if these techniques are not used." This document also describes common failure modes where the web content accessibility guidelines are violated.

There's a lot of good information here. These should really be required reading for all HTML authors and web designers. The Techniques spec is probably the most practical, and where most readers should start.

Saturday, May 19, 2007 (Permalink)

The W3C XQuery working group has published the last call working drafts of:

Since the last version was published several technical and editorial changes have been made. Among the most significant changes are: The formal semantics diagrams have been redrawn. A conformance statement has been added. XML Schemas that together define the XML representation of XQuery 1.0 and XPath 2.0 Full-Text have been added, along with a stylesheet to transform that XML representation to the ordinary XQuery syntax. Section 3 has been significantly restructured for clarity and readability. The semantics of nesting FTDistance selections have been made more useful. The semantics for FTMildNot now properly handle phrases.

Comments are due by June 22.

Thursday, May 17, 2007 (Permalink)

YesLogic has released Prince 6.0, a $495-$3900 payware batch formatter for Linux, Windows, and Mac OS X that produces PDF and PostScript from XML documents with CSS stylesheets. It passes the Acid2 test. Version 6.0 improves "support for HTML, CSS, SVG, and MathML. Documents and style sheets can now be loaded from the Web over HTTP, and new publishing features have been added including hyphenation, crop marks, small caps, and CMYK colors."

DataDirect Technologies has released DataDirect XQuery 3.0, a closed source Java library for integrating XQuery functionality into your application that implements the XQuery API for Java. As well as supporting XML documents, it can query and update relational databases, EDI, and CSV data. Pricing ranges from $455 to $2100.

DataDirect Technologies has also released DataDirect XML Converters, a set of closed source Java and .NET libraries for converting IATA, EDIFACT, X12, EANCOM, and Flat Files and Stylus Studio Enterprise Suite to XML. Pricing ranges from $955 to $20000.

Wednesday, May 16, 2007 (Permalink)

Microsoft has posted the first beta of Office Open XML File Format Converter for Mac. Not to be confused with OpenOffice, Office Open XML is a poorly designed XML format for Microsoft Office documents used by the latest versions of Windows Word, Excel, and PowerPoint. According to Geoff Price, the converter "is a stand-alone Macintosh application that converts .docx documents - that is, documents saved by Word 2007 for Windows in the Office Open XML file format - into rich text format (RTF) documents so that they can be automatically opened in either Word 2004 or Word v.X for Mac OS X."

Matt Mullenweg has released WordPress 2.2, an open source (GPL) blog engine based on PHP and MySQL. This release includes full Atom 1.0 support. I hope I can now retire my Atomic plug-in (based on Benjamin Smedberg's Atom 1.0 plug-in, itself based on James Snell's Atom 1.0 templates). This release also adds a new Blogger importer and widget support for easier customization. All users should upgrade when they get a minute, and turn off RSS feeds shortly thereafter.

Kiyut has released Sketsa 4.2.2, a $49 payware SVG editor written in Java. Version 4.2.2 fixes bugs. Java 5 or later is required.

Tuesday, May 15, 2007 (Permalink)

The XML Apache Project has released XIndice 1.1, an open source native XML database published under the Apache Software License. XIndice supports XPath for queries and XML:DB XUpdate for XML updates and the XML:DB XML database API for Java as well as an XML-RPC interface. Changes since 1.0 are mostly minor and include bug fixes and Java 1.4 support.

Monday, May 14, 2007 (Permalink)

The W3C CSS Working Group has posted a new working draft of CSS3 module: Generated Content for Paged Media. "This module describes features often used in printed publications. In particular, this specification describes how CSS style sheets can express running headers and footers, leaders, cross-references, footnotes, sidenotes, named flows, hyphenation, new counter styles, character substitution, image resolution, page floats, advanced multi-column layout, conditional content, crop and cross marks, bookmarks, CMYK colors, continuation markers, change bars, line numbers, named page lists, and generated lists. Along with two other CSS3 modules – multi-column layout and paged media – this module offers advanced functionality for presenting structured documents on paged media."

Friday, May 11, 2007 (Permalink)

John Cowan has released TagSoup 1.1.3, an open source, Java-language, SAX parser for nasty, ugly HTML. This release fixes bugs that arose when working with Saxon. "The 1.1.3 release fixes a problem that made TagSoup unable to interoperate with the XSLT processors Saxon-B and Saxon-SA. Any attempt to set the ContentHandler, DTDHandler, ErrorHandler, LexicalHandler, or EntityResolver of a TagSoup Parser object to null caused the parser to break rather than resetting its behavior to the default. As a result, Saxon would crash with a NullPointerException when processing the second document."
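The fix amounts to a familiar pattern: a setter that treats a null argument as "go back to the default" instead of storing the null. Here is a small Python analogue of that idea. It is not TagSoup's code (which is Java), and the class and method names are invented for the sketch.

```python
class DefaultHandler:
    """Do-nothing handler the parser falls back to."""
    def start_element(self, name):
        pass

class Recorder(DefaultHandler):
    """Example handler that remembers the element names it sees."""
    def __init__(self):
        self.seen = []
    def start_element(self, name):
        self.seen.append(name)

class Parser:
    def __init__(self):
        self._default = DefaultHandler()
        self._handler = self._default

    def set_content_handler(self, handler):
        # None means "reset to the default", never "store None".
        self._handler = handler if handler is not None else self._default

    def parse(self, element_names):
        # Stand-in for real parsing: report each name to the handler.
        for name in element_names:
            self._handler.start_element(name)

parser = Parser()
recorder = Recorder()
parser.set_content_handler(recorder)
parser.parse(["html", "body"])

parser.set_content_handler(None)   # what a caller might do between documents
parser.parse(["html"])             # second document: no crash, nothing recorded
print(recorder.seen)
```

Without the fallback in the setter, the second parse would fail as soon as the parser tried to call a method on the missing handler, which is the Python cousin of the NullPointerException described in the release note.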

Thursday, May 10, 2007 (Permalink)

The W3C GRDDL Working Group has posted the candidate recommendation of Gleaning Resource Descriptions from Dialects of Languages (GRDDL). According to the abstract,

GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages. This GRDDL specification introduces markup for declaring that an XML document includes gleanable data and for linking to an algorithm, typically represented in XSLT, for gleaning the resource descriptions from the document.

The markup includes a namespace-qualified attribute for use in general-purpose XML documents and a profile-qualified link relationship for use in valid XHTML documents. The GRDDL mechanism also allows an XML namespace document (or XHTML profile document) to declare that every document associated with that namespace (or profile) includes gleanable data and for linking to an algorithm for gleaning the data.

The result of such a glean is an RDF description of the document.

The W3C GRDDL Working Group has also posted a working draft of GRDDL Test Cases. "This document describes and includes test cases for software agents that extract RDF from XML source documents by following the set of mechanisms outlined in the Gleaning Resource Description from Dialects of Language [GRDDL] specification. They demonstrate the expected behavior of a GRDDL-aware agent by specifying one (or more) RDF graph serializations which are the GRDDL results associated with a single source document."
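As a rough illustration of the namespace-qualified attribute mentioned in the abstract, this Python sketch pulls the transformation links off the root element of a general-purpose XML document. It uses the GRDDL data-view namespace and its transformation attribute; the sample document and the example.org URL are made up. It covers only the discovery step for plain XML, and does not handle the XHTML profile mechanism, the namespace document lookup, or actually running the XSLT.

```python
import xml.etree.ElementTree as ET

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"

def transformation_links(xml_text: str) -> list:
    """Return the transformation IRIs declared on the root element, if any."""
    root = ET.fromstring(xml_text)
    value = root.get(f"{{{GRDDL_NS}}}transformation", "")
    # The attribute value is a whitespace-separated list of IRIs.
    return value.split()

# Hypothetical source document.
SAMPLE = """<doc xmlns:grddl="http://www.w3.org/2003/g/data-view#"
     grddl:transformation="http://example.org/glean.xsl">
  <title>Example</title>
</doc>"""

print(transformation_links(SAMPLE))
print(transformation_links("<doc><title>No GRDDL here</title></doc>"))
```

A GRDDL-aware agent would then fetch each listed transformation and apply it to the document to obtain RDF.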

Wednesday, May 9, 2007 (Permalink)

SyncroSoft has released <Oxygen/> 8.2, a $298 payware XML editor written in Java. Oxygen supports XML, XSL, DTDs, XQuery, SVG, Relax NG, Schematron, and the W3C XML Schema Language. According to the announcement,

The main addition is support for *working with modules*, that is with XSLT stylesheets that are supposed to be included or imported in other XSLT stylesheets.

Imagine a stylesheet that defines a global variable and that includes another one that uses that global variable. What happens when you work on the included file? You will get errors if you try to validate it because the variable is not defined in the included file - thus the module is invalid by itself but valid if included from the main file that defines the global variable. oXygen 8.2 solve this problem allowing you to configure a validation scenario and instead of validating the module file you can instruct oXygen to validate the main file, thus you get the current module file validated in the context it is used from.

A validation scenario allows you to specify a set of files to validate and for each file what processor to use and whether or not to perform continuous validation. Thus they do not only solve the problem of working with modules, they also enable support for performing *multiple validations in a single action*. One usecase here is for instance if you want to make sure the your file is validated against different processors - instead of validating it against each processor in part you just configure a validation scenario that validates it against all the processors. Another usecase is for instance if you want to make sure a module is valid in different contexts - imagine a module that is reused by different applications - then when you edit it you can specify all the files that include it to be validated, thus you know that your code will work in all situations you are using it.

Another XSLT related addition is the updated *support for the latest Saxon 8 XSLT 2.0 processor* from Saxonica. oXygen 8.2 supports Saxon 8.9.0.3, both the Basic version and the Schema Aware version (for the later you need a separate license from Saxonica) for validation, transformations, debugging and profiling.

Tuesday, May 8, 2007 (Permalink)

I have uploaded Jaxen 1.1.1, an open source XPath 1.0 engine written in Java that supports multiple object models including DOM, XOM, JDOM, and dom4j. It is also flexible enough to be adapted to XML views of non-XML data structures. For instance, PMD uses it to enable XPath expressions to query compiled Java byte code. Version 1.1.1 is believed to be fully conformant with the XPath 1.0 specification. This release fixes assorted minor bugs and one significant bug that incorrectly evaluated some XPath expressions. You should upgrade when you get a chance. Jaxen is published under a modified BSD license.
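For readers who have not seen an XPath query applied to an object model, here is a minimal sketch. It does not use Jaxen; it assumes Python's standard xml.etree.ElementTree module, which implements only a small subset of XPath 1.0. The sample document and the function name titles_in_language are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
SAMPLE = """<catalog>
  <book lang="en"><title>Alpha</title></book>
  <book lang="fr"><title>Beta</title></book>
  <book lang="en"><title>Gamma</title></book>
</catalog>"""


def titles_in_language(xml_text, lang):
    """Return the text of every title whose parent book has the given lang."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a limited XPath subset: descendant search (.//),
    # attribute predicates ([@name='value']), and child steps.
    path = f".//book[@lang='{lang}']/title"
    return [title.text for title in root.findall(path)]


print(titles_in_language(SAMPLE, "en"))
```

A full XPath 1.0 engine such as Jaxen also handles functions, axes, and arbitrary predicates that this subset does not.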

Monday, May 7, 2007 (Permalink)

The W3C XForms working group has posted the "post-last call" working draft of XForms 1.1. Changes since 1.0 include:

A new namespace URI, http://www.w3.org/2004/xforms/
power, luhn, current, choose, id and property XPath extension functions
An email address datatype
An ID card number datatype
A prompt action element
An xforms-close event
An xforms-submit-serialize event
Inline rendering of non-text media types
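The luhn function in the list above presumably applies the Luhn checksum commonly used to catch typos in card numbers. The following Python sketch shows that checksum only; the name luhn_valid and its string-in, boolean-out signature are assumptions for illustration and are not taken from the XForms draft.

```python
def luhn_valid(number):
    """Return True if the digits in the string pass the Luhn checksum."""
    # Ignore separators such as spaces and hyphens.
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("79927398713"))
```

A check like this catches most single-digit typos, but it says nothing about whether the number was ever issued.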

According to John Boyer,

The largest task in preparing the post last call editors draft was to properly remove all the diff marks, which is how I found out that you now have 544 reasons to take a look at XForms 1.1.

Another interesting statistic is that the XForms 1.1 specification print preview is 203 pages, which is 30% more content than XForms 1.0 Second Edition, excluding consideration of changes to existing content.

Finally, note that XForms 1.0 Second Edition, which became a Recommendation in March 2006, contained 143 differences relative to the original XForms 1.0 from 2003. This includes 39 pages of errata and a content increase of 10 pages, or 7%. So if it's been a while for you, then perhaps there are nearly 700 reasons to take a look at XForms 1.1.

Comments are due by April 5.

Saturday, May 5, 2007 (Permalink)

The W3C Voice Browser Working Group has posted the proposed recommendation of VoiceXML 2.1. VoiceXML is used to describe those annoying call trees you hear when calling most major companies. (Press 1 if you want to wait on hold for 20 minutes and then be hung up on; press 2 if you want to wait indefinitely; press 3 if you'd rather we just hung up on you now.) New features in 2.1 include data and foreach elements, dynamic grammars and scripts, detecting barge-in during prompt playback, fetching XML without requiring a dialog transition, recording user utterances while attempting recognition, and specifying the media format of utterance recordings.

Friday, May 4, 2007 (Permalink)

John Cowan has released TagSoup 1.1.2, an open source, Java-language, SAX parser for nasty, ugly HTML. "This release fixes the reporting of CDATA sections. In TagSoup 1.1 and previous versions, if you specified a SAX LexicalHandler to receive indications of comments and CDATA sections, you were told that the contents of the elements 'style' and 'script' constitute a CDATA section. This report was incorrect, as 'style' and 'script' elements may contain characters that are illegal in XML CDATA sections. Consequently, the reports have been removed. However, this release now correctly reports actual CDATA sections, which was not the case in any previous release. Neither of these changes affects users who don't make use of the SAX LexicalHandler facilities." All users should upgrade.
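To make the idea of CDATA section reporting concrete, here is a minimal sketch. It does not use TagSoup or the Java SAX API; it assumes Python's standard xml.parsers.expat module, whose StartCdataSectionHandler and EndCdataSectionHandler callbacks play a role comparable to the startCDATA and endCDATA methods of a SAX LexicalHandler. The function name cdata_sections is invented for illustration.

```python
import xml.parsers.expat


def cdata_sections(xml_text):
    """Return the text of each CDATA section in a well-formed XML string."""
    sections = []
    state = {"inside": False, "parts": []}

    def start_cdata():
        state["inside"] = True
        state["parts"] = []

    def end_cdata():
        state["inside"] = False
        sections.append("".join(state["parts"]))

    def characters(data):
        # Character data may arrive in several chunks; keep only the chunks
        # that fall between the start and end of a CDATA section.
        if state["inside"]:
            state["parts"].append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartCdataSectionHandler = start_cdata
    parser.EndCdataSectionHandler = end_cdata
    parser.CharacterDataHandler = characters
    parser.Parse(xml_text, True)
    return sections


print(cdata_sections("<doc>plain text<![CDATA[if (a < b) run();]]></doc>"))
```

A client that never registers the two CDATA callbacks sees only the character data, which matches the note in the announcement that users who ignore the lexical events are unaffected.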

Thursday, May 3, 2007 (Permalink)

The W3C Math Working Group has posted the first public working draft of Mathematical Markup Language (MathML) Version 3.0. Changes since 2.0 include content dictionaries, "a mechanism for recording that a particular notational structure has a particular mathematical meaning". Version 3.0 is also supposed to enable easier markup of elementary school mathematics.

The W3C Math Working Group has also posted the first public working draft of A MathML for CSS Profile. "This document presents a subset of MathML 3.0 [mathml3] which can be used to capture the structure of mathematical formulas in a way particularly suitable for further CSS formatting. This subset, called here a MathML profile, is expected to facilitate adoption of MathML in web browsers and CSS formatters, since it emphasizes the widely adopted CSS [css] visual formatting model enhanced with only a few mathematically oriented extensions. These are present to allow formatting some complex inline expressions requiring special layout schemata given in presentational MathML. The development of this CSS-oriented profile is coordinated with ongoing work on CSS3 and may require a limited set of new properties to be added to existing modules. The full MathML specification defines a more extensive markup language for mathematical formalism than can readily be rendered using the present CSS visual formatting model and its realizations."

Elsevier has released xqdoc, an open source "automated documentation tool for XQuery". Supported engines include MarkLogic, eXist, and Saxon. xqdoc is published under the Apache 2.0 license. Java 1.4 or later is required.

Tuesday, May 1, 2007 (Permalink)

ETH Zurich has released MXQuery 0.2.1, an open source (Apache 2.0) XQuery engine written in Java. It supports XQuery 1.0 (though typeless), XQueryP, FORSEQ, and XQuery Update.

Monday, April 30, 2007 (Permalink)

The W3C XML Protocol Working Group has published "second edition" recommendations for SOAP 1.2:

Besides incorporating errata, the primer adds "an overview of the XML-binary Optimized Packaging, SOAP Message Transmission Optimization Mechanism and Resource Representation SOAP Header Block specifications and their usage." The adjuncts draft "incorporates changes to the SOAP Request Response Message Exchange pattern (MEP) to permit the SOAP envelope in the response to be optional, to allow for one-way message interactions." Comments are due by February 2.

Saturday, April 28, 2007 (Permalink)

The W3C Internationalization Tag Set Working Group has posted the second public working draft of Best Practices for XML Internationalization. "This document provides a set of guidelines for developing XML documents and schemas that are internationalized properly. Following the best practices describes here allow both the developer of XML applications, as well as the author of XML content to create material in different languages." Suggestions include:

Provide xml:lang to specify natural language content
Provide a way to specify text directionality
Avoid translatable attributes
Indicate the translatability of elements and attributes
Provide a way to override translatability information
Provide text segmentation-related information
Provide a way to specify ruby text
Provide a way to specify comments for translators
Provide a way to specify unique identifiers
Indicate terminology-related elements
Provide a way to override terminology information
Use multilingual documents with caution
Name elements with caution
Provide ITS rules for your DTD or schema
Specify the language of the content
Specify text directionality if needed
Override translatability information if needed
Assign unique identifiers to text items when possible
Use CDATA sections with caution
Provide comments for translators
Ensure any inserted text is context-independent
Use entity references with caution
Place sub-flow elements with caution
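Several of these suggestions depend on xml:lang, whose value applies to an element and everything inside it until a descendant overrides it. The sketch below shows one way a tool might compute the language in effect for each element. It assumes Python's standard xml.etree.ElementTree; the sample document, the id attributes, and the function name effective_languages are invented for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample document, invented for this sketch.
SAMPLE = """<doc xml:lang="en">
  <p id="a">Hello</p>
  <p id="b" xml:lang="fr">Bonjour <em id="c">monde</em></p>
</doc>"""


def effective_languages(xml_text):
    """Map each element's id to the language it declares or inherits."""
    result = {}

    def walk(element, inherited):
        # A local xml:lang wins; otherwise the ancestor's value applies.
        lang = element.get(XML_LANG, inherited)
        if "id" in element.attrib:
            result[element.get("id")] = lang
        for child in element:
            walk(child, lang)

    walk(ET.fromstring(xml_text), None)
    return result


print(effective_languages(SAMPLE))
```

Elements with no declaration anywhere above them map to None here, which is the case the "Specify the language of the content" suggestion is meant to prevent.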

I'm not sure I agree with everything they say. "Avoid translatable attributes" sounds very questionable to me, though there is a rationale for it. Feedback is requested.

Friday, April 27, 2007 (Permalink)

Michael Sweet has released Mini-XML 2.3, an open source (LGPL) XML parser for C.

Leigh Klotz has released DeXSS 1.0, an open source SAX parser based on TagSoup that removes all JavaScript from an HTML document.

Gerald Schmidt has released XML Copy Editor 1.0.9.4, a free-as-in-speech (GPL) XML editor for Windows and Linux. Features include DTD/XML Schema/RELAX NG validation, XSLT, XPath, pretty-printing, syntax highlighting, tag folding, tag completion, spell and style check, XHTML, XSL, DocBook and TEI, and Microsoft Word import and export. This release adds a Chinese localization and fixes bugs.

Syntext has released Serna 3.3, a $268 payware XSL-based WYSIWYG XML Document Editor for Mac OS X, Windows, and Unix. Features include on-the-fly XSL-driven XML rendering and transformation, on-the-fly XML Schema validation, XInclude, and spell checking. Version 3.3 fixes bugs and improves performance.

Thursday, April 26, 2007 (Permalink)

The W3C Web Application Formats Working Group has published the candidate recommendation of XML Binding Language (XBL) 2.0.

This specification defines the XML Binding Language and some supporting DOM interfaces and CSS features. XBL is a mechanism for overriding the standard presentation and interactive behavior of particular elements by attaching those elements to appropriate definitions, called bindings. Bindings can be attached to elements using either CSS, the DOM, or by declaring, in XBL, that elements matching a specific selector are implemented by a particular binding. The element that the binding is attached to, called the bound element, acquires the new behavior and presentation specified by the binding.

Bindings can contain event handlers that watch for events on the bound element, an implementation of new methods and properties that become accessible from the bound element, shadow content that is inserted underneath the bound element, and associated resources such as scoped style sheets and precached images, sounds, or videos.

XBL cannot be used to give a document new semantics. The meaning of a document is not changed by any bindings that are associated with it, only its presentation and interactive behavior.

Wednesday, April 25, 2007 (Permalink)

The W3C Voice Browser Working Group has posted the proposed recommendation of VoiceXML 2.1. VoiceXML is used to describe those annoying call trees you hear when calling most major companies. (Press 1 if you want to wait on hold for 20 minutes and then be hung up on; press 2 if you want to wait indefinitely; press 3 if you'd rather we just hung up on you now.) New features in 2.1 include data and foreach elements, dynamic grammars and scripts, detecting barge-in during prompt playback, fetching XML without requiring a dialog transition, recording user utterances while attempting recognition, and specifying the media format of utterance recordings.

Tuesday, April 24, 2007 (Permalink)

The W3C Internationalization GEO (Guidelines, Education & Outreach) Working Group has posted the finished version of Internationalization Best Practices: Specifying Language in XHTML & HTML Content. According to the note, "Specifying the language of content is useful for a wide number of applications, from linguistically-sensitive searching to applying language-specific display properties. In some cases the potential applications for language information are still waiting for implementations to catch up, whereas in others, such as detection of language by voice browsers, it is a necessity today. On the other hand, adding markup for language information to content is something that can and should be done today. Without it, it will not be possible to take advantage of any future developments." This advice is summarized in 16 "best practices":

Best Practice 1: Always declare the default language for text in the page using attributes on the html tag, unless the document contains content aimed at speakers of more than one language.
Best Practice 2: Where a document contains content aimed at speakers of more than one language, decide whether you want to declare one language in the html tag, or leave the languages undefined until later.
Best Practice 3: Where a document contains content aimed at speakers of more than one language, try to divide the document linguistically at the highest possible level, and declare the appropriate language for each of those divisions.
Best Practice 4: Use the lang and/or xml:lang attributes around text to indicate any changes in language.
Best Practice 5: For HTML use the lang attribute only, for XHTML 1.0 served as text/html use the lang and xml:lang attributes, and for XHTML served as XML use the xml:lang attribute only.
Best Practice 6: Use language attributes rather than HTTP or meta elements to declare the default language for text processing.
Best Practice 7: Do not declare the default language of a document in the body element, use the html element.
Best Practice 8: If the text in attribute values and element content is in different languages, consider using a nested approach.
Best Practice 9: Consider using a Content-Language declaration in the HTTP header or a meta tag to declare metadata about the language(s) of the intended audience of a document.
Best Practice 10: Where a document contains content aimed at speakers of more than one language, use Content-Language with a comma-separated list of language tags.
Best Practice 11: Follow the guidelines in the IETF's BCP 47 for language attribute values.
Best Practice 12: Use the shortest possible language tag values.
Best Practice 13: Where possible, use the codes zh-Hans and zh-Hant to refer to Simplified and Traditional Chinese, respectively.
Best Practice 14: When pointing to a resource in another language, consider the pros and cons of indicating the language of the target document.
Best Practice 15: If you want to indicate that the target document of an a element is in another language, consider the pros and cons of using hreflang with CSS.
Best Practice 16: Do not use flag icons to indicate languages.
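Best Practices 5 and 10 are mechanical enough to express as code. The following Python sketch is one reading of them, not something the note itself provides; the function names language_attributes and content_language and the "html"/"xhtml" labels are invented for illustration.

```python
def language_attributes(tag, markup, media_type="text/html"):
    """Pick lang and/or xml:lang for a language tag, per Best Practice 5."""
    if markup == "html":
        # HTML: the lang attribute only.
        return {"lang": tag}
    if markup == "xhtml":
        if media_type == "text/html":
            # XHTML 1.0 served as text/html: both attributes.
            return {"lang": tag, "xml:lang": tag}
        # XHTML served as XML: xml:lang only.
        return {"xml:lang": tag}
    raise ValueError(f"unknown markup kind: {markup!r}")


def content_language(tags):
    """Join language tags into a Content-Language value (Best Practice 10)."""
    return ", ".join(tags)


print(language_attributes("zh-Hans", "xhtml"))
print(content_language(["en", "fr"]))
```

The sketch treats any media type other than text/html as XML, which is a simplification.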

The W3C GRDDL Working Group has posted the second public working draft of GRDDL Use Cases: Scenarios of extracting RDF data from XML. Use cases include:

The W3C Authoring Tool Accessibility Guidelines Working Group has posted a new working draft of Implementation Techniques for Authoring Tool Accessibility Guidelines 2.0. "This document provides non-normative information to authoring tool developers who wish to satisfy the checkpoints of 'Authoring Tool Accessibility Guidelines 2.0' [ATAG20]. It includes suggested techniques, sample strategies in deployed tools, and references to other accessibility resources (such as platform-specific software accessibility guidelines) that provide additional information on how a tool may satisfy each ATAG 2.0 checkpoint."

Monday, April 23, 2007 (Permalink)

The Shiira Project has released Shiira 2.0, an open source (Modified BSD license) Mac OS X web browser based on Web Kit and written in Cocoa. "The goal of the Shiira Project is to create a browser that is better and more useful than Safari." They've failed. Shiira is indeed much prettier than Safari, but a pretty face does not make up for an inability to handle raw XML documents like this one. Safari can handle this. There's no excuse for Shiira not to. Mac OS X 10.3.9 or later is required.

Saturday, April 21, 2007 (Permalink)

The W3C Web API Working Group has posted the first public working draft of Progress events 1.0. "This document describes event types that can be used for monitoring the progress of an operation. It is primarily intended for data transfer operations such as XMLHTTPRequest [XHR], but should be usable in other relevant contexts."

Thursday, April 19, 2007 (Permalink)

Bare Bones Software has released version 8.6.2 of BBEdit, my preferred text editor on the Mac, and what I'm using to type these very words. This is a bug fix release, though it doesn't seem to fix any of several issues I've reported lately. I am getting a little disturbed that a number of the bugs and user interface quirks I've been reporting are getting "won't fixed". BBEdit is $199 payware. Upgrades from 8.5 are free. Upgrades from 8.0 cost $30 and upgrades from 7.x cost $40. Mac OS X 10.4 or later is required.

Wednesday, April 18, 2007 (Permalink)

The W3C Semantic Web Deployment Working Group and XHTML2 Working Group have posted the first public working draft of RDFa Use Cases: Scenarios for Embedding RDF in HTML. "Current web pages, written in HTML, contain significant inherent structured data. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites. An event on a web page can be directly imported into a user's desktop calendar. A license on a document can be detected so that the user is informed of his rights automatically. A photo's creator, camera setting information, resolution, and topic can be published as easily as the original photo itself, enabling structured search and sharing. RDFa is a syntax for expressing RDF structured data in HTML. This document provides use case scenarios for RDFa. An introduction to implementing RDFa is provided in the RDFa Primer, while the details of the syntax are explained in the RDFa Syntax (to be published)." Here are some of the use cases:

Use Case #1 — Basic Structured Blogging: Paul maintains a blog and wishes to "mark up" his existing page with structure so that tools can pick up his blog post tags, authors, titles, and his blogroll. In particular, his HTML blog should be usable as its own structured feed.
Use Case #2 — Publishing an Event - Overriding Some of the Rendered Data: Paul sometimes gives talks on various topics, and announces them on his blog. He would like to mark up these announcements with proper scheduling information, so that RDFa-enabled agents can automatically obtain the scheduling information and add it to the browsing user's calendar. Importantly, some of the rendered data might be more informal than the machine-readable data required to produce a calendar event. Also of importance: Paul may want to annotate his event with a combination of existing vocabularies and a new vocabulary of his own design.
Use Case #3 — Content Management Metadata: Tod sells an HTML-based content management system, where all documents are processed and edited as HTML, sent from one editor to another, and eventually published and indexed. He would like to build up the editorial metadata within the HTML document itself, so that it is easier to manage and less likely to be lost.
Use Case #4 — Self-Contained HTML Fragments: Tara runs a video sharing web site. When Paul wants to blog about a video, he can paste a fragment of HTML provided by Tara directly into his blog. The video is then available inline, in his blog, along with any licensing information (Creative Commons?) about the video.
Use Case #5 — Web Clipboard: Ursula is looking for a new apartment and some items with which to furnish it. She browses various RDFa-enabled web pages, including apartment listings, furniture stores, kitchen appliances, etc. Every time she finds an item she likes, she can point to it, extract the locally-relevant structured data expressed using RDFa, and transfer it to her apartment-hunting page, where it can be organized, sorted, categorized. Any additional features of the HTML that are not structured, e.g. links to photos, are conserved by the transfer.
Use Case #6 — Semantic Wiki: Tim runs an RDFa-aware Semantic Wiki, where users contribute content in Wiki markup, using a WYSIWYG tool, or using HTML+RDFa. In all cases, the semantic wiki produces HTML+RDFa, so that users like Ursula can transfer the structured content from one semantic wiki (or any other RDFa source) to another semantic wiki (or any other RDFa destination). In particular, Ursula may be pasting her apartment-and-furnishing finds into her own Semantic Wiki.
Use Case #7 — Augmented Browsing for Scientists: Patrick writes a science blog where he discusses proteins, genes, and chemicals. As he has very little control over the layout—he's using a fairly constrained hosting provider—, Patrick adds RDFa to indicate the scientific components he's working with. Ulrich, a scientist, can browse Patrick's site with an RDFa-aware browser and automatically cross-reference the proteins and genes that Patrick is talking about.
Use Case #8 — Advanced Data Structures: Patrick keeps a list of his scientific publications on his web site. Using the BibTex vocabulary, he would like to provide structure within this publications page so that Ulrich, who browses the web with an RDFa-aware client, can automatically extract this information and use it to cite Patrick's papers.
Use Case #9 — Publishing a RDF Vocabulary: Paul wants to publish a large vocabulary in RDFS and/or OWL. Paul also wants to provide a clear, human readable description of the same vocabulary. Using RDFa, the terms themselves can be mixed with a descriptive text in HTML. The RDFa engine can then extract the vocabulary in RDF/XML and/or n3 formats, to be included used directly by RDF aware applications (eg, reasoners).

Personally, I'm still skeptical of anything that involves page-author-created metadata for web pages.

Tuesday, April 17, 2007 (Permalink)

Daniel Veillard has released version 2.6.28 of libxml2, the open source XML C library for Gnome. This release fixes assorted bugs.

Monday, April 16, 2007 (Permalink)

The Modis Team has released Sedna 2.0, an open source native XML database for Windows and Linux written in C++ and Scheme and published under the Apache License 2.0. Sedna supports XQuery and its own declarative update language. This release adds PHP and Python APIs, and an administration GUI.

Sunday, April 15, 2007 (Permalink)

The W3C Multimodal Interaction working group has posted the sixth working draft of EMMA: Extensible MultiModal Annotation markup language. According to the abstract, this spec "provides details of an XML markup language for containing and annotating the interpretation of user input. Examples of interpretation of user input are a transcription into words of a raw signal, for instance derived from speech, pen or keystroke input, a set of attribute/value pairs describing their meaning, or a set of attribute/value pairs describing a gesture. The interpretation of the user's input is expected to be generated by signal interpretation processes, such as speech and ink recognition, semantic interpreters, and other types of processors for use by components that act on the user's inputs such as interaction managers."

Friday, April 13, 2007 (Permalink)

The W3C Web Services Description Working Group has posted four working drafts and two last call working drafts for WSDL 2.0:

Web Services Description Language (WSDL) Version 2.0 Part 1: Core Language

Web Services Description Language (WSDL) Version 2.0 Part 2: Adjuncts

WSDL is an XML format for describing network services as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information. Web Services Description Language (WSDL) Version 2.0 Part 2: Adjuncts defines predefined extensions for use in WSDL 2.0:

Message exchange patterns

Operation styles

Binding Extensions

Web Services Description Language (WSDL) Version 2.0 Part 0: Primer

Web Services Description Language (WSDL) Version 2.0 SOAP 1.1 Binding

"WSDL SOAP 1.1 Binding (this specification) describes the binding extension for SOAP 1.1 [SOAP11] protocol. This binding is intended to ease the migration from WSDL 1.1 to WSDL 2.0 for implementers describing services that use SOAP 1.1 protocol. And, this binding allows users to continue using SOAP 1.1 protocol."

Web Services Description Language (WSDL) Version 2.0: Additional MEPs

"This document defines additional message exchange patterns (MEPs) to be used in WSDL 2.0 and are provided as examples of the extensibility of WSDL 2.0. This document is the product of the Web Services Description Working Group, but its contents are non-normative."

Web Services Description Language (WSDL) Version 2.0: RDF Mapping

Web Services Description Language is defined in XML, because XML is the standard format for exchange of structured information. The use of XML brings better interoperability to WSDL generators and parsers, and the use of XML Schema makes the structure of WSDL well constrained, yet extensible. On the other hand, XML vocabularies in general don't have clear composition rules, so combining for example the WSDL description of a Web service, the service's policies and other information (presumably expressed in XML) can be done in many significantly different ways (e.g. extending WSDL, extending the policy language, creating a special XML container for all the information etc.), and little interoperability can be expected when such combined documents are used.

For example, a policy can be combined with WSDL by adding the policy elements in WSDL service element. Equally, a WSDL description can be combined with a policy by adding the WSDL description as part of the policy. While the results should be similar (WSDL with policy information), they are in fact very different for the processing software, and a policy in WSDL cannot easily be used by software that doesn't know WSDL.

In contrast, the Semantic web requires knowledge from many different sources to be easily combined so that unexpected data connections can be used. For this purpose there is the Resource Description Framework (RDF), whose graph structure together with the use of URIs for identifying nodes makes it very easy for different documents to be brought together. If a WSDL document describes a Web service, a policy document attaches constraints to the service and a general description specifies the author of the service, all this information can be merged and the resulting document will contain all the three kinds of information associated with the single service.

The main objective of this specification is to present a standard RDF ([RDF]) and OWL ([OWL]) vocabulary equivalent to WSDL 2, so that WSDL 2 documents can be transformed into RDF and merged with other Semantic Web data.
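The merging argument in the passage above can be illustrated with a toy model. The Python sketch below represents each document as a set of (subject, predicate, object) triples and merges them by set union. Every URI, property name, and value in it is hypothetical, and it ignores details of real RDF such as blank nodes and datatypes.

```python
# Each statement is a (subject, predicate, object) triple. Every URI and
# property name below is hypothetical, made up for this sketch.
SERVICE = "http://example.org/services/orders"

wsdl_facts = {
    (SERVICE, "ex:hasEndpoint", "http://example.org/orders/soap"),
}
policy_facts = {
    (SERVICE, "ex:requires", "ex:Encryption"),
}
author_facts = {
    (SERVICE, "ex:author", "Jane Doe"),
}


def merge(*graphs):
    """Merge graphs by set union; shared URIs line the statements up."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Return the sorted (predicate, object) pairs known about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


combined = merge(wsdl_facts, policy_facts, author_facts)
for predicate, value in describe(combined, SERVICE):
    print(predicate, value)
```

Because all three sources use the same URI for the service, no container format or nesting decision is needed to bring them together, which is the contrast with XML that the draft draws.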

Comments are due by April 15.

The W3C Web Services Policy Working Group has posted a working draft of WSDL 1.1 Element Identifiers. "This section defines a fragment identifier syntax for identifying elements of a WSDL 1.1 document. This fragment identifier syntax is compliant with the [XPointer Framework]. This document is primarily based upon [WSDL 2.0 Core]. There is a substantial difference between the WSDL 1.1 and WSDL 2.0 fragment identifiers. WSDL 2.0 defines fragment identifiers with respect to the WSDL 2.0 component model, whereas WSDL 1.1 defines XML element and attribute syntax only. Because there is no WSDL 1.1 component model, the WSDL 1.1 fragment identifiers identify WSDL 1.1 elements."

The W3C Web Services Activity has sent Semantic Annotations for WSDL and XML Schema back to last call. According to the draft,

Semantic Annotations for WSDL and XML Schema (SAWSDL) defines how to add semantic annotations to various parts of a WSDL document such as input and output message structures, interfaces and operations. The extension attributes defined in this specification fit within the WSDL 2.0 [WSDL 2.0] and WSDL 1.1 [WSDL 1.1] extensibility frameworks. For example, it defines a way to annotate WSDL interfaces and operations with categorization information that can be used to publish a Web service in a registry. The annotations on schema types can be used during Web service discovery and composition. In addition, SAWSDL defines an annotation mechanism for specifying the structural mapping of XML Schema types to and from an ontology such mappings could be used during invocation, particularly when mediation is required. To accomplish semantic annotation, SAWSDL defines extension attributes that can be applied both to WSDL elements and to XML Schema elements.

Semantic annotations are references from an element within a WSDL or XML Schema document to a concept in an ontology or to a mapping. This specification defines annotation mechanisms for relating the constituent structures of WSDL input and output messages to concepts defined in an outside ontology. Similarly, it defines how to annotate WSDL operations and interfaces. Further, it defines an annotation mechanism for specifying the structural mapping of XML Schema types to and from an ontology by means of a reference to a mapping definition. The annotation mechanism is independent of the ontology expression language and this specification requires no particular ontology language. It is also independent of mapping languages and does not restrict the possible choices of such languages.

Thursday, April 12, 2007 (Permalink)

Opera Software has released version 9.2 of their namesake free-beer web browser for Windows, Mac, Linux, FreeBSD, and Solaris. This release "introduces Speed Dial and Developer Tools." Security bugs are also fixed, and all users should upgrade. Opera supports XML, CSS, and XSLT. 9.10 adds phishing protection.

Wednesday, April 11, 2007 (Permalink)

Altsoft N.V. has released Xml2PDF 2007, a $49 payware Windows program for converting XSL-FO, SVG, WordML, and XHTML documents into PDF files. New features in this release include:

Improved XSL-FO 1.1 support including bookmarks, multiple flows, and floats.
XPS output
PostScript output
DocX and Word 2007 XML input
Direct GDI+ print and preview function.
Can merge several documents of different formats into one output document
Use any source document as a background.

This release should be faster too.

Tuesday, April 10, 2007 (Permalink)

The Apache Velocity team has released Velocity DocBook Framework 1.0. "It is intended to help creating high-quality documentation in the Docbook format which can be used online or as PDF for print out." It's not immediately clear what this does that the DocBook XSL stylesheets don't.

The Apache Project has also released Velocity Engine 1.5, an open source template engine written in Java. Velocity

permits anyone to use a simple yet powerful template language to reference objects defined in Java code.

When Velocity is used for web development, Web designers can work in parallel with Java programmers to develop web sites according to the Model-View-Controller (MVC) model, meaning that web page designers can focus solely on creating a site that looks good, and programmers can focus solely on writing top-notch code. Velocity separates Java code from the web pages, making the web site more maintainable over its lifespan and providing a viable alternative to Java Server Pages (JSPs) or PHP.

Velocity's capabilities reach well beyond the realm of the web; for example, it can be used to generate SQL, PostScript and XML (see Anakia for more information on XML transformations) from templates. It can be used either as a standalone utility for generating source code and reports, or as an integrated component of other systems. For instance, Velocity provides template services for the Turbine web application framework, together resulting in a view engine facilitating development of web applications according to a true MVC model.
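The separation described above, where a designer owns the template and a programmer supplies the objects, can be sketched without Velocity itself. The example below assumes Python's standard string.Template class, which happens to use a similar $name reference syntax but has none of Velocity's directives such as #if or #foreach. The template text and the function name render are invented for illustration.

```python
from string import Template

# Hypothetical page template; a designer could edit this text without
# touching the code that supplies the values.
PAGE = Template("<h1>$title</h1>\n<p>Posted by ${author} on $date</p>")


def render(title, author, date):
    """Fill the template with values computed elsewhere in the program."""
    return PAGE.substitute(title=title, author=author, date=date)


print(render("Release notes", "A. Writer", "April 10, 2007"))
```

Note that string.Template performs no HTML escaping, so values from untrusted sources would need to be escaped before substitution.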

Finally, the Apache Project has released VelocityTools 1.3, "a collection of Velocity subprojects with a common goal of creating tools and infrastructure for building both web and non-web applications using the Velocity template engine."

Sunday, April 8, 2007 (Permalink)

The W3C Web Services Activity has published new working drafts of Web Services Policy 1.5 - Guidelines for Policy Assertion Authors and Web Services Policy 1.5 - Primer. According to the primer,

Web services are being successfully used for interoperable solutions across various industries. One of the key reasons for interest and investment in Web services is that they are well-suited to enable service-oriented systems. XML-based technologies such as SOAP, XML Schema and WSDL provide a broadly-adopted foundation on which to build interoperable Web services. The WS-Policy and WS-PolicyAttachment specifications extend this foundation and offer mechanisms to represent the capabilities and requirements of Web services as Policies.

Service metadata is an expression of the visible aspects of a Web service, and consists of a mixture of machine- and human-readable languages. Machine-readable languages enable tooling. For example, tools that consume service metadata can automatically generate client code to call the service. Service metadata can describe different parts of a Web service and thus enable different levels of tooling support.

First, service metadata can describe the format of the payloads that a Web service sends and receives. Tools can use this metadata to automatically generate and validate data sent to and from a Web service. The XML Schema language is frequently used to describe the message interchange format within the SOAP message construct, i.e. to represent SOAP Body children and SOAP Header blocks.

Second, service metadata can describe the ‘how’ and ‘where’ a Web service exchanges messages, i.e. how to represent the concrete message format, what headers are used, the transmission protocol, the message exchange pattern and the list of available endpoints. The Web Services Description Language is currently the most common language for describing the ‘how’ and ‘where’ a Web service exchanges messages. WSDL has extensibility points that can be used to expand on the metadata for a Web service.

Third, service metadata can describe the capabilities and requirements of a Web service, i.e. representing whether and how a message must be secured, whether and how a message must be delivered reliably, whether a message must flow a transaction, etc. Exposing this class of metadata about the capabilities and requirements of a Web service enables tools to generate code modules for engaging these behaviors. Tools can use this metadata to check the compatibility of requesters and providers. Web Services Policy can be used to represent the capabilities and requirements of a Web service.

Web Services Policy is a machine-readable language for representing the capabilities and requirements of a Web service. These are called ‘policies’. Web Services Policy offers mechanisms to represent consistent combinations of capabilities and requirements, to determine the compatibility of policies, to name and reference policies and to associate policies with Web service metadata constructs such as service, endpoint and operation. Web Services Policy is a simple language that has four elements - Policy, All, ExactlyOne and PolicyReference - and one attribute - wsp:Optional.
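To make those four elements a little more concrete, here is a minimal Python sketch that parses a small policy expression and lists its alternatives. The assertion names (sec:RequireSignature, sec:RequireEncryption) and their namespace are invented for illustration, and the wsp namespace URI is my assumption of the 1.5 namespace, so check it against the specification. The sketch only handles a policy that is already in normal form (Policy, then ExactlyOne, then one All per alternative).

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for WS-Policy 1.5; verify against the spec before relying on it.
WSP = "http://www.w3.org/ns/ws-policy"
# Hypothetical assertion namespace, invented for this example.
SEC = "http://example.com/security"

POLICY = f"""
<wsp:Policy xmlns:wsp="{WSP}" xmlns:sec="{SEC}">
  <wsp:ExactlyOne>
    <wsp:All>
      <sec:RequireSignature/>
    </wsp:All>
    <wsp:All>
      <sec:RequireSignature/>
      <sec:RequireEncryption/>
    </wsp:All>
  </wsp:ExactlyOne>
</wsp:Policy>
"""


def local_name(tag):
    """Strip the {namespace} part that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def alternatives(policy_xml):
    """Return one list of assertion names per policy alternative (normal form only)."""
    root = ET.fromstring(policy_xml)
    exactly_one = root.find(f"{{{WSP}}}ExactlyOne")
    if exactly_one is None:
        return []
    return [
        [local_name(assertion.tag) for assertion in alternative]
        for alternative in exactly_one.findall(f"{{{WSP}}}All")
    ]


policy_alternatives = alternatives(POLICY)
```

Each inner list is one alternative: a requester that can satisfy every assertion in any one of them is compatible with the policy. Compact forms (nested operators, wsp:Optional, PolicyReference) would need to be normalized first, which this sketch does not attempt.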

The W3C Web Services Activity has also published candidate recommendations of Web Services Policy 1.5 - Framework and Web Services Policy 1.5 - Attachment. According to the former,

Web Services Policy 1.5 - Framework defines a framework and a model for expressing policies that refer to domain-specific capabilities, requirements, and general characteristics of entities in a Web services-based system.

A policy is a collection of policy alternatives. A policy alternative is a collection of policy assertions. A policy assertion represents a requirement, capability, or other property of a behavior. A policy expression is an XML Infoset representation of its policy, either in a normal form or in its equivalent compact form. Some policy assertions specify traditional requirements and capabilities that will manifest themselves in the messages exchanged (e.g., authentication scheme, transport protocol selection). Other policy assertions have no wire manifestation in the messages exchanged, yet are relevant to service selection and usage (e.g., privacy policy, QoS characteristics). Web Services Policy 1.5 - Framework provides a single policy language to allow both kinds of assertions to be expressed and evaluated in a consistent manner.

Web Services Policy 1.5 - Framework does not cover discovery of policy, policy scopes and subjects, or their respective attachment mechanisms. A policy attachment is a mechanism for associating policy with one or more policy scopes. A policy scope is a collection of policy subjects to which a policy applies. A policy subject is an entity (e.g., an endpoint, message, resource, interaction) with which a policy can be associated. Web Services Policy 1.5 - Attachment [Web Services Policy Attachment] defines such policy attachment mechanisms, especially for associating policy with arbitrary XML elements [XML 1.0], WSDL artifacts [WSDL 1.1, WSDL 2.0 Core Language], and UDDI elements [UDDI API 2.0, UDDI Data Structure 2.0, UDDI 3.0]. Other specifications are free to define either extensions to the mechanisms defined in Web Services Policy 1.5 - Attachment [Web Services Policy Attachment], or additional mechanisms not covered by Web Services Policy 1.5 - Attachment [Web Services Policy Attachment], for purposes of associating policy with policy scopes and subjects.

The W3C Voice Browser Activity has released the finished recommendation of Semantic Interpretation for Speech Recognition (SISR) Version 1.0.

This document defines the process of Semantic Interpretation for Speech Recognition and the syntax and semantics of semantic interpretation tags that can be added to speech recognition grammars to compute information to return to an application on the basis of rules and tokens that were matched by the speech recognizer. In particular, it defines the syntax and semantics of the contents of Tags in the Speech Recognition Grammar Specification [SRGS].

The results of semantic interpretation describe the meaning of a natural language utterance. The current specification represents this information as an ECMAScript object, and defines a mechanism to serialize the result into XML. The W3C Multimodal Interaction Activity [MMI] is defining an XML data format [EMMA] for containing and annotating the information in user utterances. It is expected that the EMMA language will be able to integrate results generated by Semantic Interpretation for Speech Recognition.

Friday, April 6, 2007 (Permalink)

The W3C XML Processing Model Working Group has posted the third public working draft of XProc: An XML Pipeline Language. According to the introduction,

An XML Pipeline specifies a sequence of operations to be performed on a collection of XML input documents. Pipelines take zero or more XML documents as their input and produce zero or more XML documents as their output.

A pipeline consists of steps. Like pipelines, steps take zero or more XML documents as their input and produce zero or more XML documents as their output. The inputs to a step come from the web, from the pipeline document, from the inputs to the pipeline itself, or from the outputs of other steps in the pipeline. The outputs from a step are consumed by other steps, are outputs of the pipeline as a whole, or are discarded.

There are two kinds of steps: atomic steps and compound steps. Atomic steps carry out single operations and have no substructure as far as the pipeline is concerned, whereas compound steps include steps within themselves.

Standard steps include load, parse, serialize, delete, insert, XSLT, XSLT 2, XQuery, rename, namespace rename, replace, wrap, unwrap, XInclude, HTTP request, RELAX NG validate, and W3C Schema validate. Others may be defined.
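The processing model is easy to mimic in a few lines of ordinary code. The following Python sketch is not XProc syntax and not tied to any XProc implementation; it only models the idea that steps take a list of documents and return a list of documents, and that a compound step is a chain of other steps. The function names rename, delete, and pipeline are my own, loosely borrowed from the step names above.

```python
import xml.etree.ElementTree as ET


def rename(old, new):
    """Atomic step: rename every element called `old` to `new`."""
    def step(docs):
        for doc in docs:
            for element in list(doc.iter(old)):
                element.tag = new
        return docs
    return step


def delete(name):
    """Atomic step: remove every non-root element called `name`."""
    def step(docs):
        for doc in docs:
            for parent in list(doc.iter()):
                for child in list(parent):
                    if child.tag == name:
                        parent.remove(child)
        return docs
    return step


def pipeline(*steps):
    """Compound step: feed the output of each contained step to the next."""
    def run(docs):
        for step in steps:
            docs = step(docs)
        return docs
    return run


source = [ET.fromstring("<doc><draft>old</draft><p>Hello</p></doc>")]
publish = pipeline(delete("draft"), rename("p", "para"))
result = [ET.tostring(doc, encoding="unicode") for doc in publish(source)]
```

Because pipeline returns something with the same shape as an atomic step, pipelines nest inside other pipelines, which is the atomic/compound distinction the draft describes.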

Thursday, April 5, 2007 (Permalink)

Orbeon has released the Orbeon Presentation Server (OPS) 3.5.1. OPS is an open source, server-based XForms implementation that delivers standard HTML+JavaScript to clients, with a hefty dose of AJAX thrown in for good measure. "This release is an update to Orbeon Forms 3.5 which brings performance improvements, the most notable one being the ability to combine JavaScript and CSS resources, as well as a series of bug-fixes." OPS is published under the LGPL.

Version 2.0 of Chiba, an open source, web-based implementation of XForms based on servlets and XSLT, has been released. Chiba enables XForms to be used in current browsers without plugins or special requirements on the client-side. According to Sebastian Schnitzenbaumer, version 2.0 adds "a client-side AJAX XForms implementation together with the existing server-side script-free XForms implementation as a fall-back and for less capable devices." Chiba is published under the artistic license.

Wednesday, April 4, 2007 (Permalink)

Matt Mullenweg has released Wordpress 2.1.3 and 2.0.10, an open source (GPL) blog engine based on PHP and MySQL. This release includes "minor bugfixes, feature enhancements, and security fixes." All 2.x users should upgrade.

Tuesday, April 3, 2007 (Permalink)

The W3C Internationalization Tag Set Working Group has released the finished Internationalization Tag Set (ITS) Version 1.0. This document defines standardized XML markup for identifying directionality, translatability, ruby text, and other common aspects of document localization and internationalization. For example, in this DocBook article an its:translate attribute indicates that the author element should not be translated:

<dbk:article
  xmlns:its="http://www.w3.org/2005/11/its" 
  xmlns:dbk="http://docbook.org/ns/docbook" 
  its:version="1.0" version="5.0" xml:lang="en">
 <dbk:info>
  <dbk:title>An example article</dbk:title>
  <dbk:author
    its:translate="no">
   <dbk:personname>
    <dbk:firstname>John</dbk:firstname>
    <dbk:surname>Doe</dbk:surname>
   </dbk:personname>
   <dbk:affiliation>
    <dbk:address>
     <dbk:email>foo@example.com</dbk:email>
    </dbk:address>
   </dbk:affiliation>
  </dbk:author>
 </dbk:info>
 <dbk:para>This is a short article.</dbk:para>
</dbk:article>
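Here is a rough Python sketch of how a localization tool might honor that attribute. It walks a trimmed copy of the article above and collects only the text that should go to a translator. It assumes that elements are translatable by default and that a local its:translate value applies to an element's descendants until overridden. It ignores ITS global rules and attribute values entirely, so treat it as an illustration rather than a conforming ITS processor.

```python
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"
DBK = "http://docbook.org/ns/docbook"

# A trimmed copy of the example article above.
ARTICLE = f"""<dbk:article xmlns:its="{ITS}" xmlns:dbk="{DBK}"
    its:version="1.0" version="5.0" xml:lang="en">
 <dbk:info>
  <dbk:title>An example article</dbk:title>
  <dbk:author its:translate="no">
   <dbk:personname>
    <dbk:firstname>John</dbk:firstname>
    <dbk:surname>Doe</dbk:surname>
   </dbk:personname>
  </dbk:author>
 </dbk:info>
 <dbk:para>This is a short article.</dbk:para>
</dbk:article>"""


def translatable_text(element, inherited=True):
    """Yield non-blank text that should be translated, honoring local its:translate."""
    flag = element.get(f"{{{ITS}}}translate")
    translate = inherited if flag is None else (flag == "yes")
    if translate and element.text and element.text.strip():
        yield element.text.strip()
    for child in element:
        yield from translatable_text(child, translate)
        # Tail text sits after the child but belongs to this element.
        if translate and child.tail and child.tail.strip():
            yield child.tail.strip()


texts = list(translatable_text(ET.fromstring(ARTICLE)))
```

The author's name never shows up in texts, while the title and the paragraph do.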

Monday, April 2, 2007 (Permalink)

The W3C XQuery Working Group has published the first draft of XML Query (XQuery) 1.1 Requirements. Here are the requirements:

XQuery 1.1 MUST be backward compatible.
Every valid XQuery 1.0 expression MUST be valid in XQuery 1.1 and it MUST evaluate to the same result.
XQuery 1.1 MUST be compatible with XQuery 1.0 extensions developed by the XML Query Working Group, including XQuery Update Facility and XQuery 1.0 and XPath 2.0 Full-Text.
XQuery 1.1 MUST include a language facility to specify value-based grouping.
XQuery 1.1 MUST provide a mechanism to process errors raised by an expression and to return an alternative value.
This MAY be implemented by introducing a try-catch expression.
XQuery 1.1 MUST include additional library functions or an equivalent mechanism to perform formatting of numeric values.
It SHOULD be similar to the functionality provided in XSLT 2.0, such as by function format-number().
XQuery 1.1 MUST include additional library functions or an equivalent mechanism to perform formatting of date and time values.
It SHOULD be similar to the functionality provided in XSLT 2.0, such as by functions format-date(), format-time(), format-dateTime().
XQuery 1.1 MUST have a mechanism to specify default values for external variables.
XQuery 1.1 MUST provide a way to denote that an external function is non-deterministic.
XQuery 1.1 SHOULD provide a facility for positional grouping of items in a sequence according to specified partitioning conditions.
XQuery 1.1 SHOULD provide a way to iterate over a sequence by several values at a time.
XQuery 1.1 SHOULD provide a mechanism to associate ordinal numbers with the items returned by a FLWOR expression.
XQuery 1.1 SHOULD allow dynamic creation of namespace bindings.
XQuery 1.1 SHOULD have a mechanism to specify serialization parameters in the query prolog.
XQuery 1.1 SHOULD support creation of a reference to an existing node having the following properties:
a) the reference could be included in a constructed element
b) the reference can be dereferenced, returning the original node with the original node id.
XQuery 1.1 SHOULD provide additional mechanisms to specify joins between sequences. A possible approach would be to add an "outer-for" clause to the FLWOR expression to specify variable binding which is guaranteed to be bound to an empty sequence if there are no other bindings generated.
XQuery 1.1 SHOULD allow explicit type declaration for the context item.
XQuery 1.1 SHOULD support new data types introduced in XML Schema 1.1.
XQuery 1.1 MAY provide an ability to pass a function as an argument to another function and to invoke a function that has been passed as an argument.
XQuery 1.1 MAY also provide the ability to define anonymous functions e.g., lambda expressions.
XQuery 1.1 MAY add a language extension to the node constructors to specify, in a compact notation, that a node should be constructed only if its typed value would not be an empty sequence or if it would satisfy some other condition.
XQuery 1.1 MAY provide a mechanism to validate an element or document node with respect to a global named type or against non-global element declarations or types.
XQuery 1.1 MAY provide a mechanism to validate an element or document node against a named schema without importing the schema.
XQuery 1.1 MAY provide a way to compare the type of an expression to the type of another expression without exposing the type itself.
XQuery 1.1 MAY relax the restrictions on the module import feature relating to forward references and circular imports.
XQuery 1.1 MAY provide a normative way to invoke external functions and modules that are not implemented in XQuery, such as functions defined as web services or XSLT functions and templates.
XQuery 1.1 MAY extend static typing rules
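Since there is no XQuery 1.1 syntax yet, here is what two of the MUST items (value-based grouping, and catching an error to return an alternative value) amount to, sketched in Python. The data and the function names group_by_value and try_or are invented for illustration; nothing here reflects how the working group will actually spell these features.

```python
from collections import defaultdict

# Invented sample data: (author, title) pairs standing in for XML nodes.
books = [
    ("Smith", "First Book"),
    ("Jones", "Another Book"),
    ("Smith", "Second Book"),
]


def group_by_value(items, key):
    """Value-based grouping: collect items that share the same key value."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)


def try_or(expression, fallback):
    """Evaluate expression(); if it raises, return the alternative value instead."""
    try:
        return expression()
    except Exception:
        return fallback


titles_by_author = {
    author: [title for _, title in group]
    for author, group in group_by_value(books, key=lambda book: book[0]).items()
}
safe_quotient = try_or(lambda: 1 // 0, fallback=0)
```

In XQuery 1.0 the grouping has to be faked with distinct-values() and a nested FLWOR, and there is no way at all to recover from a dynamic error inside the query.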

The W3C XQuery Working Group has also published the final 1.0 versions of XML Query (XQuery) Requirements and XML Query Use Cases.

Finally the XQuery working group has posted the first draft of XQuery Scripting Extension 1.0 Requirements. "This document describes the requirements for the XQuery Scripting Extensions. XQuery [XQuery 1.0] is a functional language that is Turing-complete and well suited to write code that ranges from simple queries to complete applications. However, some categories of applications are more easily implemented by combining XQuery capabilities with some imperative features, such as the ability to explicitly manage internal states. The same issue stands for XQuery enriched with the [XQuery Update Facility] (hereafter, XQuery With Updates). The scripting extension is intended to overcome this problem, and allow programmers to write such applications without relying on embedding XQuery into an external language."

Friday, March 30, 2007 (Permalink)

OpenOffice.org 2.2 has been released.

In version 2.2, users will immediately notice the improvement in the quality of text display in all parts of OpenOffice.org. The reason for this is that the previously optional support for kerning, a technique to improve the appearance of text written in proportional fonts, has now been enabled by default. OpenOffice.org's unique pdf export function has also been enhanced with the addition of the optional creation of bookmarks feature, and support for user-definable export of form fields.

While OpenOffice.org 2.1 functions well on Microsoft's Windows Vista, version 2.2 makes use of some of the new cosmetic changes available in Vista, the new file dialogues being an example. Apple Mac users will notice a smaller download and a smaller installed size. The Apple Mac Intel version has many stability improvements, and bug fixes ranging from .ppt export to improved UNO connections. Version 2.2 now requires Mac OS X 10.4.x running X11.

Turning to some of the enhancements made to the individual components of OpenOffice.org, the Calc spreadsheet has received additional enhancements to its support for Microsoft file formats, including improved support for Pivot Tables and some specialised trigonometric functions. Base, the database component, has improved SQL editing functionality as well as a new "Queries within Queries" feature. Compatibility options for some database drivers, such as Oracle ODBC, have been improved. Impress, the presentations component, offers improvements in the handling of hidden slides which has been made more intuitive.

OpenOffice is dual licensed under the LGPL and Sun Industry Standards Source License.

The XML Apache Project has posted the first beta of Batik 1.7, an open source SVG display engine based on Java 2D. New features in 1.7 include DOM Level support, an improved WMF transcoder, more complete SMIL Animation support, and a few SVG 1.2 features including and handler elements.

Thursday, March 29, 2007 (Permalink)

Code Synthesis has released XSD/e 1.0, a free-as-in-speech (GPL) C++ schema validating XML parser for embedded environments.

Matthew Cruickshank has released Docvert 3.11, a PHP program that converts various word processor formats including Microsoft Word to Oasis OpenDocument v1.0 format. From there it can optionally proceed to HTML or DocBook. PHP 5.0 or later and various plugins are required.

Dave Beckett has released the Raptor RDF Parser Toolkit 1.4.15, an open source C library for parsing the RDF/XML, N-Triples, Turtle, and Atom Resource Description Framework formats. It uses expat or libxml2 as the underlying XML parser. This release adds new serializers for Turtle and GraphViz DOT. This release updates the GRDDL parser to support the latest working draft. Raptor is dual licensed under the LGPL and Apache 2.0 licenses.

RealObjects has released PDFreactor 2.0.1544.3, a $2494 payware "formatting processor for converting XML and XHTML/HTML documents into PDF. It uses Cascading Style Sheets (CSS) to define page layout and styles" which distinguishes it from most other similar solutions which are based on XSL. SVG is also supported, and XSLT fits in somehow, though I don't quite understand how. New features in this release include .NET and PHP APIs, HTML form and Acroform support, CSS namespace selectors, data URIs, tagged PDF, CMYK colors, 2D barcodes, images in generated content, and resizing of background images.

Andrea Marchesini has released libnxml 0.17.1, a C library for parsing, writing, and creating XML 1.0 and 1.1. This is a bug fix release. libnxml is published under the LGPL.

Kiyut has released Sketsa 4.2.1, a $49 payware SVG editor written in Java. Version 4.2.1 fixes bugs. Java 5 or later is required.

Wednesday, March 28, 2007 (Permalink)

Planamesa Software has released NeoOffice/J 2.1, a Mac port of OpenOffice 2.1 using a Java-based GUI. This release now runs on Intel Macs. New features in 2.1 include:

ISO 26300 OpenDocument (aka OpenOffice.org 2.0) file formats
Microsoft OpenXML Word documents
Microsoft Excel VBA macros
Novell's Solver for Calc
LaTeX and BibTeX export
Microsoft Works word processing documents
New database component
Enhanced sound
Native Cocoa Open and Save dialogues and other more Aqua-savvy appearances

Sunday, March 25, 2007 (Permalink)

The Mozilla Project has released Firefox 2.0.0.3 and 1.5.0.11. These releases fix security flaws and improve support for Windows Vista. All users should upgrade.

John Cowan has released TagSoup 1.0.5, an open source, Java-language, SAX parser for nasty, ugly HTML. This release fixes a major bug in comment parsing as well as several other minor issues. All users should upgrade.

Cowan has also released TagSoup 1.1. This release adds an ill-considered JAXP adapter layer from Tatu Saloranta. The problem is TagSoup is not a general purpose XML parser, and trying to use it as one will only cause trouble.

JAPISoft has released EditiX 5.2, a $99 payware cross-platform XML editor written in Java. Features include XPath location and syntax error detection, context sensitive popups based on DTD, W3C XML Schema Language, and RelaxNG schemas, XSLT and XSL-FO previews, XInclude, XML catalogs, an XSLT debugger, DocBook support, and multi-view preview. Version 5.2 adds refactoring support. EditiX is available for Mac OS X, Linux, and Windows.

Friday, March 23, 2007 (Permalink)

I've posted the notes from today's Testing XML talk at Software Development 2007 West. There's a lot of material here. This could easily be a full two-day class; but in the ninety minute session I mostly try to hit the high points without going into too much technical detail about all the tools.

I'm flying home tonight. Regular updates should resume tomorrow.

Thursday, March 22, 2007 (Permalink)

I gave two new talks at SD 2007 West today, Web Forms 2.0 and What's New in XML in Java 5 and 6. The Web Forms 2 talk was especially fun. It covered both Web Forms 2.0 and HTML 5. I think there's a lot of pent-up demand for new HTML, which in many ways hasn't really changed since 1999. HTML has one of the better models for forward and backwards compatibility in tech. This in large part drove the Web from about 1992-1997 or so. Sadly no one has taken advantage of that to push HTML forward for a long time.

The Web is actually in better shape to introduce new markup than it has been before. Well-formedness is part of that. Separation of syntax from semantics is critical. Sadly the WhatWG is actively hostile to well-formedness and XML. They are mired in the outdated 1980s model that requires all semantics to be defined up front. That's crippling them in several ways. The second piece we need to really let the Web jump forward is a simple change in the browsers. They need to start allowing CSS to style and script unrecognized elements, not just ones they know about in advance. The key is to let the browsers handle what they understand, and provide stylesheets and scripts for the rest. Currently they simply throw away the pieces they don't understand. That's better than rejecting the document completely, but it's not as good as allowing stylesheets and scripts to work on those parts. This would enable us to start using nav, m, header, footer, meter, and many other proposed elements now without first revving the browsers. Of course, this really does require end-tags on everything. However, it's much more powerful and simpler. The WhatWG approach of specifying everything in excruciating detail will take far longer and achieve less.

The alternative is to serve nothing but XML to the client and let the stylesheets do all the work. However JavaScript doesn't work with this, and we go from some predefined semantics to none.

I've also posted the notes from today's Effective XML class at Software Development 2007 West.

Wednesday, March 21, 2007 (Permalink)

I've posted the notes from today's RSS, Atom, APP and All That and RELAX NG classes at Software Development 2007 West. There was quite a lot of interest in both of these technologies, though from a very self-selected audience.

Also, my apologies to everyone who showed up for the bird walk this morning that is actually tomorrow morning (Thursday). There's actually an important software development lesson to be learned there, and I'll have more to say about that on The Cafes in due course.

Monday, March 19, 2007 (Permalink)

I've posted the notes from today's XML Fundamentals tutorial at Software Development 2007 West. Overall, it went well, though it's a large amount of material for just a half day. I need to cut out about 30 slides before trying this again. In particular I should cut a lot of the DOM slides.

Sunday, March 18, 2007 (Permalink)

Gerald Schmidt has released XML Copy Editor 1.0.9.2, a free-as-in-speech (GPL) XML editor for Windows and Linux. Features include DTD/XML Schema/RELAX NG validation, XSLT, XPath, pretty-printing, syntax highlighting, tag folding, tag completion, spell and style check, XHTML, XSL, DocBook and TEI, and Microsoft Word import and export.

Saturday, March 17, 2007 (Permalink)

Advanced Software Production Line has released libaxl 0.4.2, a C parser for XML. This release fixes bugs and improves Windows compatibility. The API has some serious problems, but the developers have already frozen it, so they won't be fixed. I recommend against this library. There are many superior alternatives available.

Friday, March 16, 2007 (Permalink)

Michael Kay has released version 8.9.0.3 of Saxon, his XSLT 2.0 and XQuery processor for Java and .NET. This is a bug fix release.

Thursday, March 15, 2007 (Permalink)

Peter Hosey has released LMX, an event based reverse XML parser for Objective C on Mac OS X. It starts at the end of the document and parses forward, rather than the other way around. "It has mostly the same API as NSXMLParser; if you're used to NSXMLParser, LMX should have no learning curve for you (other than the mind-bendingness of events coming in in the reverse order)." LMX is published under a BSD license.

Andrea Marchesini has released libnxml 0.17, a C library for parsing, writing, and creating XML 1.0 and 1.1. Version 0.17 adds entity support. libnxml is published under the LGPL.

Wednesday, March 14, 2007 (Permalink)

Norm Walsh has posted the second release candidate of DocBook 5.0. DocBook 5 is "a significant redesign that attempts to remain true to the spirit of DocBook." The schema is written in RELAX NG. A DTD and W3C XML Schema generated from the RELAX NG schema are also available. There's also a Schematron schema "that validates some extra-grammatical DocBook constraints. These patterns are also present directly in the RELAX NG Grammar and some validators, for example MSV, can perform both kinds of validation at the same time." Changes in this release candidate are quite minor, including allowing empty glossaries and inline elements in HTML table captions.

Kiyut has released Sketsa 4.2, a $49 payware SVG editor written in Java. Version 4.2 fixes bugs. Java 5 or later is required.

Tuesday, March 13, 2007 (Permalink)

The W3C Web API Working Group has posted the last call working draft of The XMLHttpRequest Object.

The XMLHttpRequest object implements an interface exposed by a scripting engine that allows scripts to perform HTTP client functionality, such as submitting form data or loading data from a server.

The name of the object is XMLHttpRequest for compatibility with the web, though each component of this name is potentially misleading. First, the object supports any text based format, including XML. Second, it can be used to make requests over both HTTP and HTTPS (some implementations support protocols in addition to HTTP and HTTPS, but that functionality is not covered by this specification). Finally, it supports "requests" in a broad sense of the term as it pertains to HTTP; namely all activity involved with HTTP requests or responses for the defined HTTP methods.

Comments are due by April 2.

The W3C Semantic Web Best Practices and Deployment Working Group and HTML Working Groups have published a new working draft of RDFa Primer 1.0.

Current web pages, written in HTML, contain significant inherent structured data. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites. An event on a web page can be directly imported into a user's desktop calendar. A license on a document can be detected so that the user is informed of his rights automatically. A photo's creator, camera setting information, resolution, and topic can be published as easily as the original photo itself, enabling structured search and sharing.

RDFa is a syntax for expressing this structured data in XHTML. The rendered, hypertext data of XHTML is reused by the RDFa markup, so that publishers don't repeat themselves. The underlying abstract representation is RDF, which lets publishers build their own vocabulary, extend others, and evolve their vocabulary with maximal interoperability over time. The expressed structure is closely tied to the data, so that rendered data can be copied and pasted along with its relevant structure.

Here's a syntax example from the draft:

<h1 property="dc:title">Vacation in the South of France</h1>
<h2>created 
  by <span property="dc:creator">Mark Birbeck</span>
  on <span property="dc:date" type="xsd:date"
           content="2006-01-02">
    January 2nd, 2006
     </span>
</h2>

The thing that jumps out at me is the use of namespace prefixes in attribute values. Haven't we learned by now that this is a bad idea?
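To see the problem concretely, here is a small Python sketch that pulls the property values out of the example above with the standard library's HTML parser. It is nowhere near a real RDFa processor: it assumes that a content attribute, when present, overrides the element's text, and that no element carrying a property attribute has child elements. The class name PropertyCollector is my own.

```python
from html.parser import HTMLParser

SNIPPET = """
<h1 property="dc:title">Vacation in the South of France</h1>
<h2>created
  by <span property="dc:creator">Mark Birbeck</span>
  on <span property="dc:date" type="xsd:date"
           content="2006-01-02">
    January 2nd, 2006
     </span>
</h2>
"""


class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs; not a conforming RDFa processor."""

    def __init__(self):
        super().__init__()
        self.found = []     # (property, value) pairs in document order
        self._open = None   # property whose element text is being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # The machine-readable value wins over the rendered text.
            self.found.append((prop, attrs["content"]))
        else:
            self._open, self._text = prop, []

    def handle_data(self, data):
        if self._open:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open:
            self.found.append((self._open, "".join(self._text).strip()))
            self._open = None


collector = PropertyCollector()
collector.feed(SNIPPET)
found = collector.found
prefixes_needing_lookup = sorted({prop.split(":", 1)[0] for prop, _ in found})
```

The parser hands back the string dc:title and nothing more. Working out what dc means requires application code to go find the namespace declarations in scope, because a generic parser has no idea that this attribute value is a qualified name.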

Monday, March 12, 2007 (Permalink)

The Apache XML Project has released XML Security v1.4, an implementation of security related XML standards including Canonical XML, XML Encryption, and XML Signature Syntax and Processing. A compatible Java Cryptography Extension provider is required. Version 1.4 implements the Java XML Encryption API from JSR 105.

Oleg Paraschenko has released XSieve 1.2, "an XML transformation language based on combination of XSLT and Scheme (a Lisp dialect). XSieve make XSLT to be a general-purpose language." XSieve allows XSLT extension functions to be written in Scheme. Since XSLT and Scheme are both functional languages, that may be a better match than extension functions written in imperative languages like Java and C. This release adds two new functions:

x:call calls any XPath function,
x:<=> compares the nodes in the document order.

Saturday, March 10, 2007 (Permalink)

The W3C HTML Working Group has posted the first public working draft of CURIE Syntax 1.0: A syntax for expressing Compact URIs. This is modeled after namespace URIs and qualified names. In brief, it defines a prefix for a known base URI, then appends a colon and a local part. For example, the CURIE cafe:tradeshows.xml could be shorthand for http://www.cafeaulait.org/tradeshows.xml if the prefix cafe were mapped to the URL http://www.cafeaulait.org/. Exactly how prefixes are mapped to URIs is left to the specification of the documents in which the CURIEs appear. However if the CURIEs are in an XML document, then the namespaces in scope define the prefix mappings. The default namespace can be used for prefix-less CURIEs.
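The expansion rule is simple enough to sketch in a few lines of Python. The function name expand_curie and the second, example.com mapping are my own inventions; the draft leaves the source of the prefix mappings to the host language, so here they are just passed in as a dictionary.

```python
def expand_curie(curie, prefixes, default=None):
    """Expand prefix:local to a full URI using the supplied prefix mappings."""
    prefix, colon, local = curie.partition(":")
    if not colon:
        # Prefix-less CURIE: fall back to the default mapping, if there is one.
        if default is None:
            raise KeyError("no default mapping for prefix-less CURIE")
        return default + curie
    return prefixes[prefix] + local


here = {"cafe": "http://www.cafeaulait.org/"}
elsewhere = {"cafe": "http://example.com/coffee/"}  # hypothetical second document

expanded_here = expand_curie("cafe:tradeshows.xml", here)
expanded_elsewhere = expand_curie("cafe:tradeshows.xml", elsewhere)
```

The same string expands to two different URIs depending on which document it sits in, and an unmapped prefix simply raises a KeyError.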

Frankly I'm surprised to see this. Namespaces and the namespace syntax are among the notable failures of the XML world. Why someone would choose to imitate this now that we know better is beyond me. Based on experience with namespaces, I predict that cutting and pasting CURIEs from one context to another is going to be especially problematic. Well, we've learned to live with (if not exactly like) namespaces. I guess we can get used to this.

Friday, March 9, 2007 (Permalink)

Florent Georges has released XTS, a set of open source stylesheets that forms a unit testing framework for XSLT 2 and XQuery.

Abel Braaksma has launched X*X*X Wiki, a wiki for XSLT, XPath and XQuery. There's very little useful content there now, but it can be added.

Oracle has released Berkeley DB XML 2.3.10, an open source "application-specific, embedded data manager for native XML data" based on Berkeley DB. It supports the recent proposed recommendations of XQuery 1.0 and XPath 2.0. It includes C++, Java, Perl, Python, TCL and PHP APIs. According to the announcement, this release fixes bugs. Berkeley DB XML is published under a custom, viral license that is compatible with most major open source licenses.

Thursday, March 8, 2007 (Permalink)

The W3C is restarting its own effort to define a new version of HTML. The initial chairs are Dan Connolly from the W3C and Chris Wilson, Platform Architect of the Internet Explorer Platform team at Microsoft. Interestingly Microsoft has stayed out of the similar WhatWG efforts to develop the next version of HTML. This time the effort is supposed to be more public and open than in the past.

The W3C Cascading Style Sheets working group has posted the third public working draft of CSS3 Text Effects Module. "This CSS3 module defines properties for text manipulation and specifies their processing model. It covers line breaking, justification and alignment, white space handling, text decoration and text transformation." Properties defined in this spec include:

word-break
hyphenate
text-wrap
word-wrap
text-align
hanging-punctuation
text-emphasis
text-indent
text-shadow
text-outline
text-align-last
text-justify
word-spacing
letter-spacing
text-kashida-space

Bill de hÓra and Joe Gregorio have posted the fourteenth public working draft of The Atom Publishing Protocol, a REST-based system for communicating with weblog servers. Changes in this draft are relatively minor and editorial in nature.

The Mozilla Project has released Camino 1.0.4, a Mac OS X web browser based on the Gecko 1.8 rendering engine and the Quartz GUI toolkit. Camino is free for Mac OS X 10.2 through 10.4. It supports pretty much all the technologies that Mozilla does: HTML, XHTML, CSS, XML, XSLT, etc. 1.0.4 is mostly a bug fix release, including fixes for several security problems. All users should upgrade. Mac OS X 10.2 or later is required.

Wednesday, March 7, 2007 (Permalink)

IBM developerWorks has published my latest article: Configure Apache to send the right MIME type for XHTML. The basic problem is this: XHTML documents are supposed to be tagged as application/xhtml+xml when sent over HTTP. However Internet Explorer doesn't like that and won't display such a document. This article explains how to hack the problem by tagging a document as text/html for IE and application/xhtml+xml for everyone else.
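
The article does this in Apache configuration, which is not reproduced here. As a rough illustration of the same decision in Python, the sketch below keys off the HTTP Accept header; that choice, and the function name choose_content_type, are assumptions for this example rather than the article's method. It also ignores q-values, which a careful implementation would honor.

```python
def choose_content_type(accept_header):
    """Pick a MIME type for an XHTML document based on the Accept header.

    Browsers that advertise application/xhtml+xml get it; everything
    else, including clients that send no Accept header, gets text/html.
    """
    accept = (accept_header or "").lower()
    if "application/xhtml+xml" in accept:
        return "application/xhtml+xml"
    return "text/html"


# An Accept header that lists the XHTML type explicitly
print(choose_content_type("text/html,application/xhtml+xml,application/xml;q=0.9"))
# An Accept header that never mentions the XHTML type
print(choose_content_type("image/gif, image/jpeg, */*"))
```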

Tuesday, March 6, 2007 (Permalink)

Macromates has released Textmate 1.5.5, a €39 payware text editor for Mac OS X. This is a bug fix release. A lot of people like TextMate and I bought a copy to see what all the fuss was about, but so far I haven't been converted from BBEdit. The lack of multifile search and replace just killed it for me, though perhaps I can fake that using projects. BBEdit's interface for this is imperfect (and got worse in the 8.x series) but it does seem superior to TextMate's. Likely, someone who didn't have years of experience getting used to BBEdit's quirks might feel differently. Indeed my biggest complaints about BBEdit are usually when they change the interface from one version to the next.

Monday, March 5, 2007 (Permalink)

The Mozilla Project has released SeaMonkey 1.1.1. SeaMonkey is the continuation of the integrated Mozilla suite, and has XML support roughly equivalent to Firefox 1.5 (e.g. XML, XSLT, CSS, XHTML, etc.) It also bundles an e-mail client, web editor, browser, and more into one application. This release fixes security bugs. All users should upgrade.

Sunday, March 4, 2007 (Permalink)

RealObjects has released PDFreactor 2.0.1544, a $2494 payware "formatting processor for converting XML and XHTML/HTML documents into PDF. It uses Cascading Style Sheets (CSS) to define page layout and styles" which distinguishes it from most other similar solutions which are based on XSL. SVG is also supported, and XSLT fits in somehow I don't quite understand. New features in this release include .NET and PHP APIs, HTML form and Acroform support, CSS namespace selectors, data URIs, tagged PDF, CMYK colors, 2D barcodes, images in generated content, and resizing of background images.

Saturday, March 3, 2007 (Permalink)

Matt Mullenweg has released Wordpress 2.1.2, an open source (GPL) blog engine based on PHP and MySQL. This is an urgent security release. All 2.1.1 users should upgrade immediately. It seems a cracker broke into the distro servers at some point and modified the files to leave a backdoor open. 2.0.x was not affected.

Benjamin Pasero has posted a preview release of RSSOwl 2.0, an open source RSS reader written in Java and based on the SWT toolkit.

Friday, March 2, 2007 (Permalink)

The W3C GRDDL Working Group has posted the last call working draft of Gleaning Resource Descriptions from Dialects of Languages (GRDDL). According to the abstract,

GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages. This GRDDL specification introduces markup for declaring that an XML document includes gleanable data and for linking to an algorithm, typically represented in XSLT, for gleaning the resource descriptions from the document.

The markup includes a namespace-qualified attribute for use in general-purpose XML documents and a profile-qualified link relationship for use in valid XHTML documents. The GRDDL mechanism also allows an XML namespace document (or XHTML profile document) to declare that every document associated with that namespace (or profile) includes gleanable data and for linking to an algorithm for gleaning the data.

The result of such a glean is an RDF description of the document.

Thursday, March 1, 2007 (Permalink)

XimpleWare has released VTD-XML 2.0, a free (GPL) non-extractive Java/C/C# library for processing XML that supports XPath. This appears to be an example of what Sam Wilmot calls "in situ parsing". In other words, rather than creating objects representing the content of an XML document, VTD-XML just passes pointers into the actual, real XML. (These are the abstract pointers of your data structures textbook, not C-style addresses in memory. In this case the pointers are int indexes into the file.) You don't even need to hold the document in memory. It can remain on disk. This should improve speed and memory usage, but I haven't verified that, and I don't trust their own benchmarks. Version 2.0 introduces native XML indexing, though it's hardly the first product to do that. I remain troubled by the developer's hype, as well as the fact that this is still not a minimally conforming XML parser. :-(
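
To make "int indexes into the file" concrete, here is a toy Python sketch of the general idea. It is not VTD-XML's API or its record format; the function index_start_tags is hypothetical, and the regular expression is nowhere near a conforming XML parser. It only records where each start-tag name lives instead of copying the name out.

```python
import re

# Toy pattern: a start tag (or empty-element tag) and its name.
# End tags begin with "</" and so are not matched.
START_TAG = re.compile(r"<([A-Za-z_][\w.-]*)[^<>]*?>")


def index_start_tags(doc):
    """Return (offset, length) pairs locating each start-tag name in doc.

    No substrings are extracted; the records are just integers that
    point back into the original text.
    """
    return [(m.start(1), m.end(1) - m.start(1)) for m in START_TAG.finditer(doc)]


doc = "<order><item>tea</item><item>milk</item></order>"
records = index_start_tags(doc)
print(records)  # [(1, 5), (8, 4), (24, 4)]

# A name is only materialized as a string when somebody asks for it.
offset, length = records[1]
print(doc[offset:offset + length])  # item
```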

Wednesday, February 28, 2007 (Permalink)

Macromates has released Textmate 1.5.4 (1324), a €39 payware text editor for Mac OS X. This is a bug fix release of which my favorite is, "TextMate no longer pays tribute to human sacrifices, rape, nor does it show a picture of the God of the deaths in your dock." I'm not quite sure what this means, but I vaguely recall there used to be a picture of Neptune or some such in the splash screen at startup.

A lot of people like TextMate and I bought a copy to see what all the fuss was about, but so far I haven't been converted from BBEdit. The lack of multifile search and replace just killed it for me, though perhaps I can fake that using projects. BBEdit's interface for this is imperfect (and got worse in the 8.x series) but it does seem superior to TextMate's. Likely, someone who didn't have years of experience getting used to BBEdit's quirks might feel differently. Indeed my biggest complaints about BBEdit are usually when they change the interface from one version to the next.

Tuesday, February 27, 2007 (Permalink)

The W3C XForms working group has posted the last call working draft of XForms 1.1. Changes since 1.0 include:

A new namespace URI, http://www.w3.org/2004/xforms/
power, luhn, current, choose, id and property XPath extension functions
An email address datatype
An ID card number datatype
A prompt action element
An xforms-close event
An xforms-submit-serialize event
Inline rendering of non-text media types

Comments are due by April 5.
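
For anyone who hasn't met it, the luhn function in the list above presumably refers to the Luhn checksum used to validate card numbers. The draft's exact signature isn't reproduced here, so the following Python sketch shows only the general algorithm; the name luhn_valid is hypothetical.

```python
def luhn_valid(number):
    """Return True if the digits in number pass the Luhn checksum.

    Non-digit characters such as spaces are ignored.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("79927398713"))  # True
print(luhn_valid("79927398710"))  # False
```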

Monday, February 26, 2007 (Permalink)

Orbeon has released the Orbeon Presentation Server (OPS) 3.5. OPS is an open source, server-based XForms implementation that delivers standard HTML+JavaScript to clients, with a hefty dose of AJAX thrown in for good measure. OPS is published under the LGPL.

Sunday, February 25, 2007 (Permalink)

Gerald Schmidt has released XML Copy Editor 1.0.9, a free-as-in-speech (GPL) XML editor for Windows and Linux "with DTD/XML Schema/RELAX NG validation, XSLT, XPath, pretty-printing, syntax highlighting, folding, tag completion/locking and lossless import/export of Microsoft Word documents." This release adds incremental find and replace.

Advanced Software Production Line has released libaxl 0.4.1, a C parser for XML. This release fixes bugs and improves Windows compatibility. The API has some serious problems, but the developers have already frozen it, so they won't be fixed. I recommend against this library. There are many superior alternatives available.

Saturday, February 24, 2007 (Permalink)

The Mozilla Project has released Firefox 2.0.0.2 and 1.5.0.10. These releases fix security flaws and improve support for Windows Vista. All users should upgrade.

Friday, February 23, 2007 (Permalink)

I've mucked around with the XSLT scripts that generate the Atom feeds to work around some bugs in Sage and other feed readers that don't properly handle xml:base. Furthermore, permalinks to news items will now be available in the feed, though I don't yet know if feed readers will notice them or use them. (They use <link rel="permalink" href="..."/>). This may, however, trigger other bugs in non-conformant feed readers. Possibly I should just make the permalink the default link. Holler if you notice any problems.
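
For reference, this is roughly what correct xml:base handling looks like on the reader's side. It's a minimal Python sketch, not the XSLT that generates these feeds; the function resolved_links and the example.org URLs are hypothetical.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
ATOM_LINK = "{http://www.w3.org/2005/Atom}link"


def resolved_links(feed_xml, document_uri):
    """Return absolute URLs for every atom:link, honoring xml:base.

    Each element inherits its parent's base URI; an xml:base attribute
    is itself resolved against the inherited base.
    """
    links = []

    def walk(elem, base):
        base = urljoin(base, elem.get(XML_BASE, ""))
        href = elem.get("href")
        if elem.tag == ATOM_LINK and href is not None:
            links.append(urljoin(base, href))
        for child in elem:
            walk(child, base)

    walk(ET.fromstring(feed_xml), document_uri)
    return links


feed = (
    '<feed xmlns="http://www.w3.org/2005/Atom" xml:base="http://example.org/news/">'
    '<entry><link rel="alternate" href="2007/item.html"/></entry>'
    "</feed>"
)
print(resolved_links(feed, "http://example.org/feed.atom"))
# ['http://example.org/news/2007/item.html']
```

A reader that ignores xml:base would resolve the same href against the feed's own URL and land on the wrong page.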

Thursday, February 22, 2007 (Permalink)

Matt Mullenweg has released Wordpress 2.1.1 and 2.0.9. WordPress is an open source (GPL) blog engine based on PHP and MySQL. Both of these releases spackle over specific security holes, without fixing the underlying architectural flaws that caused the problems in the first place. Increasingly it seems possible to run an entire site in WordPress, if only someone could convince Matt to stop using GET for unsafe operations. :-(

Wednesday, February 21, 2007 (Permalink)

Effective XML is back in stock at various bookstores including Amazon and Barnes & Noble. There seems to have been some sort of glitch in the system where it got listed as out of print for a while, but that wasn't accurate. It is still available and should be for the foreseeable future.

Tuesday, February 20, 2007 (Permalink)

John Cowan has released TagSoup 1.0.4, an open source, Java-language, SAX parser for nasty, ugly HTML. This release better handles comments inside script and style elements.

Monday, February 19, 2007 (Permalink)

The W3C HTML Working Group has posted a new working draft of XML Events 2. "The XML Events module defined in this specification provides XML languages with the ability to uniformly integrate event listeners and associated event handlers with Document Object Model (DOM) Level 2 event interfaces [DOM2EVENTS]. The result is to provide an interoperable way of associating behaviors with document-level markup." According to Mark Birbeck,

Although it was produced by the HTML Working Group (editors of this version being Shane McCarron and myself), it has features that are specifically geared towards future versions of XForms.

Some of the interesting new features are the ability to register and remove handlers at run-time, and the ev:script element, which allows action handlers to be created using script. This means that script can be interspersed with other action handlers.

Saturday, February 17, 2007 (Permalink)

ActiveState has released Komodo 4.0.2, a $295 payware IDE for Perl, Ruby, PHP, Python, Tcl, and XSLT. Komodo runs on Mac OS X 10.3 and later, Linux, and Windows. They've also released Komodo Edit, which looks like a free-as-in-lite-beer version of Komodo.

Friday, February 16, 2007 (Permalink)

The W3C Voice Browser, Web APIs and Web Application Formats (WAF) Working Groups have posted a new version of Enabling Read Access for Web Resources (formerly Authorizing Read Access to XML Content Using the <?access-control?> Processing Instruction 1.0). According to the draft,

A plethora of applications and data are exposed as XML over HTTP. User agents such as Voice and Web browsers fetch and execute applications but restrict the XML content accessible to those applications merely to the URLs located in the same domain as the application. To take advantage of the rich XML content available on the Web, application developers must resort to proxying the content through the domain hosting their application thereby increasing overhead and limiting scalability.

This note describes a mechanism being used in the industry that allows a content provider to use a processing instruction embedded within the XML prolog to specify the access policy of that content. In this model a user agent can safely extend the sandbox in which it has restricted the application to include access to the XML content if and only if the specified policy grants permission.

The processing instruction is designed explicitly to enable extending the sandbox and is not designed as a restriction mechanism. The expectation is that the user agent's default policy is more strict. Therefore, it is always safe to fall-back to default policy in the event of an error.

ISSUE: The Task Force would like to enable this mechanism as an HTTP header (e.g. Content-Access-Control). We expect to apply this change to a later draft.

Thursday, February 15, 2007 (Permalink)

The W3C Web Services Activity has posted the candidate recommendation of Semantic Annotations for WSDL. According to the draft,

Semantic Annotations for WSDL and XML Schema (SAWSDL) defines how to add semantic annotations to various parts of a WSDL document such as input and output message structures, interfaces and operations. The extension attributes defined in this specification fit within the WSDL 2.0 extensibility framework. For example, it defines a way to annotate WSDL interfaces and operations with categorization information that can be used to publish a Web service in a registry. The annotations on schema types can be used during Web service discovery and composition. In addition, SAWSDL defines an annotation mechanism for specifying the structural mapping of XML Schema types to and from an ontology; such mappings could be used during invocation, particularly when mediation is required. To accomplish semantic annotation, SAWSDL defines extension attributes that can be applied both to WSDL elements and to XML Schema elements.

Semantic annotations are references from an element within a WSDL or XML Schema document to a concept in an ontology or to a mapping. This specification defines annotation mechanisms for relating the constituent structures of WSDL input and output messages to concepts defined in an outside ontology. Similarly, it defines how to annotate WSDL operations and interfaces. Further, it defines an annotation mechanism for specifying the structural mapping of XML Schema types to and from an ontology by means of a reference to a mapping definition. The annotation mechanism is independent of the ontology expression language and this specification requires no particular ontology language. It is also independent of mapping languages and does not restrict the possible choices of such languages.

Wednesday, February 14, 2007 (Permalink)

The W3C CSS Working Group has posted a new working draft of CSS3 module: Generated Content for Paged Media that "describes how CSS style sheets can express named strings, leaders, cross-references, footnotes, endnotes, running headers and footers, named flows, new counter styles, page and column floats, hyphenation, bookmarks, change bars, continuation markers, named page lists, and generated lists."

The W3C CSS Working Group has resurrected Behavioral Extensions to CSS. "Behavioral Extensions provide a way to link to binding technologies, such as XBL, from CSS style sheets. This allows bindings to be selected using the CSS cascade, and thus enables bindings to transparently benefit from the user style sheet mechanism, media selection, and alternate style sheets." In brief this proposes a new binding property that can locate rendering descriptions in another document:

input[type="checkbox"] {
  binding: url("http://example.org/htmlBindings.xml#checkbox");
}

The W3C Web Application Formats Working Group has posted a working draft of Widgets 1.0 Requirements. The goal is to

standardize the way client-side web applications (widgets) are to be scripted, digitally signed, secured, packaged and deployed in a way that is device independent.

The type of web applications that are addressed by this document are usually small client-side applications for displaying and updating remote data, packaged in a way to allow a single download and installation on a client machine. The application may execute outside of the typical web browser interface. Examples include clocks, stock tickers, currency converters, news readers, games and weather forecasters. Some existing industry solutions go by the names "widgets", "gadgets" or "modules".

The W3C Voice Browser Activity has published the proposed recommendation of Semantic Interpretation for Speech Recognition (SISR) Version 1.0.

This document defines the process of Semantic Interpretation for Speech Recognition and the syntax and semantics of semantic interpretation tags that can be added to speech recognition grammars to compute information to return to an application on the basis of rules and tokens that were matched by the speech recognizer. In particular, it defines the syntax and semantics of the contents of Tags in the Speech Recognition Grammar Specification [SRGS].

The results of semantic interpretation describe the meaning of a natural language utterance. The current specification represents this information as an ECMAScript object, and defines a mechanism to serialize the result into XML. The W3C Multimodal Interaction Activity [MMI] is defining an XML data format [EMMA] for containing and annotating the information in user utterances. It is expected that the EMMA language will be able to integrate results generated by Semantic Interpretation for Speech Recognition.

Tuesday, February 13, 2007 (Permalink)

IBM's developerWorks has published Ten predictions for XML in 2007. In this article, I peer into my crystal ball, and predict what we're likely to see happen to XML over the coming year. In brief:

2007 is shaping up to be the most exciting year since the community drove off the XML highway into the Web services swamp half a decade ago. XQuery, Atom, Atom Publishing Protocol (APP), XProc, and GRDDL are all promising new power. Some slightly older technologies like XForms and XSLT are having new life breathed into them. 2007 will be a very good year to work with XML.

Monday, February 12, 2007 (Permalink)

Michael Kay has released version 8.9 of Saxon, his XSLT 2.0 and XQuery processor for Java and .NET. Saxon can now compile a query to Java source code. It includes a Saxon specific Ant task. This release also adds support for XInclude.

Sunday, February 11, 2007 (Permalink)

Recordare has released Dolet 3.6 for Finale, a $129.95 payware plug-in for reading and writing MusicXML files. This release fixes assorted bugs. Finale is required.

Friday, February 9, 2007 (Permalink)

The Mozilla Project has posted the second alpha of Firefox 3.0 for Mac, Linux, and Windows. This release should be ACID2 compliant for the first time, though I haven't yet tested it. It also implements the Web Apps 1.0 API for changing stylesheets and supports CSS 2.1's inline-blocks and inline-tables. Furthermore, "XML documents can now be rendered as they're downloaded instead of only after the full document has been loaded." JavaScript and DOM are also more standard. Windows 2000 or later and Mac OS X 10.3.9 or later or Linux are required. Windows 98 and earlier and Mac OS X 10.2 and earlier are no longer supported. Final release is not expected for another year.

Thursday, February 8, 2007 (Permalink)

Joerg Moebius has posted a new alpha release of yax, an open source (LGPL) implementation of XProc. Components include:

Pipeline
Choose/When/Otherwise
Try/Group/Catch
XSLT
XInclude
Load
Store
Identity
Parameter
Pipeline-Library

Wednesday, February 7, 2007 (Permalink)

Notation Software has released Notation Composer, a $150 notation editor and MIDI sequencer for Windows that can convert MIDI files into music notation and MusicXML.

Myriad Software has released PDFtoMusic Pro, a $199 payware product that "turns PDF files created by notation programs (e.g. they have vector graphics, font data, etc., not just pixels from an image) into MusicXML files. This is huge for unlocking the sheet music created in older programs, or those that still don't support MusicXML, and getting them into more modern, open programs."

The Big Faceless Organization has released the Big Faceless Report Generator 1.1.34, a $1200 payware Java application for converting XML documents to PDF. Unlike most similar tools it appears to be based on HTML and CSS rather than XSL Formatting Objects. This is mostly a bug fix release. Java 1.2 or later is required.

Kiyut has released Sketsa 4.1, a $49 payware SVG editor written in Java. Version 4.1 adds Union, Subtract, Intersect, ExclusiveOr, Path Combine, and Path Break. Java 5 or later is required.

Tuesday, February 6, 2007 (Permalink)

John Cowan has released TagSoup 1.0.3, an open source, Java-language, SAX parser for nasty, ugly HTML. This release should now run on pre-1.6 VMs.

Monday, February 5, 2007 (Permalink)

The W3C WebCGM Working Group has released WebCGM 2.0, an updated version of the ISO Computer Graphics Metafile standard (ISO/IEC 8632:1999). "WebCGM 2.0 adds a DOM (API) specification for programmatic access to WebCGM objects, and a specification of an XML Companion File (XCF) architecture, for externalization of non-graphical metadata. WebCGM 2.0, in addition, builds upon and extends the graphical and intelligent content of WebCGM 1.0, delivering functionality that was forecast for WebCGM 1.0, but was postponed in order to get the standard and its implementations to users expeditiously."

Sunday, February 4, 2007 (Permalink)

John Cowan has released TagSoup 1.0.2, an open source, Java-language, SAX parser for nasty, ugly HTML. This release:

Removes the Version attribute from the html element
Trims leading and trailing hyphens from comments
Adds --output-encoding switch to control the encoding
Does not generate character references when the output encoding is Unicode.
Compresses whitespace and strips junk from public identifiers

Stefan Champailler has released DTDDoc 1.1.0, a JavaDoc like tool for creating HTML documentation of document type definitions from embedded DTD comments. It includes an Ant task and a Maven 2 plug-in. This release adds DTD source highlighting. DTDDoc is published under an MIT license.

ActiveState has released Komodo 4.0.1, a $295 payware IDE for Perl, Ruby, PHP, Python, Tcl, AJAX, and XSLT. Komodo runs on Mac OS X 10.3 and later, Linux, and Windows.

Saturday, February 3, 2007 (Permalink)

Clever Age has released the OpenXML Translator 1.0, a Microsoft Word 2007 plug-in that enables Word to read and write OpenOffice Open Document Format (ODF) documents.

The converter is based on XSL transformations between two XML formats, along with some pre- and post-processing to manage the packaging (zip / unzip), schema incompatibility processings and the integration into Microsoft Word. We chose to use an Open Source development model that allows developers from all around the world to participate & contribute to the project. Along with the Add-in for Microsoft Word, we also provide a command line translator that allows doing batch conversions. This translator could also be run on the server side for certain scenarios.

The translator is published under the BSD license.

Friday, February 2, 2007 (Permalink)

Per Bothner has released Qexo 1.9.1, an XQuery to Java byte code compiler. Qexo is published under the X11/MIT license.

RenderX has released version 4.9 of XEP, its payware XSL Formatting Objects to PDF and PostScript converter. XEP also supports part of Scalable Vector Graphics (SVG) 1.1. "Major achievements have been made in the AFP backend: multilingual support (Latin, Western European, Hebrew, and Cyrillic character sets), SVG bullets, new SVG primitives with G:OCA (polyline, elliptical arc, Bezier curve and polycurve), SVG text, SVG transformations, viewbox and nested svg:svg elements, improved support for SVG color, extended support for WordArt, Barcodes generation with BC:OCA or G:OCA, Codabar, Code2of5 and other types. Improvements have been introduced to XEP Assistant user interface." The basic client is $299.95. The developer edition with an API is $999.95. The server version is $3999.95.

Thursday, February 1, 2007 (Permalink)

Julian Graham has posted SDOM 0.4.1, a DOM Level 3 implementation for Scheme. This is designed as an extension of Oleg Kiselyov's SXML. According to Kiselyov, "SXML is an abstract syntax tree of an XML document. SXML is also a concrete representation of the XML Infoset in the form of S-expressions." SDOM is free software, published under the GPL.

Kiyut has released Sketsa 4.0, a $49 payware SVG editor written in Java. Version 4.0 is built on top of the NetBeans platform and features a spanking new user interface. Java 5 or later is required.

Dave Beckett has released the Raptor RDF Parser Toolkit 1.4.14, an open source C library for parsing the RDF/XML, N-Triples, Turtle, and Atom Resource Description Framework formats. It uses expat or libxml2 as the underlying XML parser. This release adds new serializers for Turtle and GraphViz DOT. The GRDDL parser can now recursively traverse namespace and profile URIs. Raptor is dual licensed under the LGPL and Apache 2.0 licenses.

Wednesday, January 31, 2007 (Permalink)

IBM's developerWorks has published Pull parsing XML in PHP, an introductory tutorial about PHP5's new XMLReader class:

PHP 5 introduced XMLReader, a new class for reading Extensible Markup Language (XML). Unlike SimpleXML or the Document Object Model (DOM), XMLReader operates in streaming mode. That is, it reads the document from start to finish. You can begin to work with the content at the beginning before you see the content at the end. This makes it very fast, very efficient, and very parsimonious with memory. The larger the documents you need to process, the more important this is.

Unlike the Simple API for XML (SAX), XMLReader is a pull parser rather than a push parser. This means that your program is in control. Rather than being told what the parser sees when the parser sees it, you tell the parser when to go fetch the next piece of the document. You request content rather than react to it. Another way of thinking about it: XMLReader is an implementation of the Iterator design pattern rather than the Observer design pattern.
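
The tutorial's own examples are in PHP and are not reproduced here. To illustrate the Iterator-style control flow in a different language, here is a small Python sketch using the standard library's ElementTree.iterparse, which is only an analogue of XMLReader, not the same API. The function first_titles and the sample feed are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def first_titles(xml_bytes, limit):
    """Collect the text of the first `limit` title elements, then stop.

    The loop asks for each event when it is ready for one, and it can
    simply stop asking once it has what it needs.
    """
    titles = []
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "title":
            titles.append(elem.text)
            if len(titles) == limit:
                break  # the caller is in control: no more events requested
        elem.clear()  # discard content we have already handled
    return titles


feed = (
    b"<feed>"
    b"<item><title>One</title></item>"
    b"<item><title>Two</title></item>"
    b"<item><title>Three</title></item>"
    b"</feed>"
)
print(first_titles(feed, 2))  # ['One', 'Two']
```

With a push parser such as SAX, the equivalent early exit usually means throwing an exception out of a callback, which is one reason the pull style feels more natural for this kind of task.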

Tuesday, January 30, 2007 (Permalink)

Mulberry Tech has posted the Call for Participation for Extreme Markup Languages 2007 which takes place from August 7-10, 2007 in Montréal. Extreme is a technical conference devoted to markup, markup languages, markup systems, markup applications, and software for manipulating and exploiting markup. According to B. Tommie Usdin, "Extreme is an open marketplace of theories about markup and all the things that they support or that support them: the difficult cases in publishing, linguistics, transformation, searching, indexing, and storage and retrieval. At Extreme, markup enthusiasts gather each year to trade in ideas, not to convince management to buy new stuff. At Extreme we push the edges of markup theory & practice." Typical discussion topics include:

Is tag soup poisonous?
Will the Semantic Web ever take off? Does it matter?
Is draconian error handling harmful?
Is overlap evil?
Is XML acidic or salty?
Are grammars really the right way to validate documents?

Stefan Champailler has released DTDDoc 1.0, a JavaDoc like tool for creating HTML documentation of document type definitions from embedded DTD comments. It includes an Ant task and a Maven 2 plug-in. DTDDoc is published under an MIT license.

Gerald Schmidt has released XML Copy Editor 1.0.8.9, a free-as-in-speech (GPL) XML editor for Windows and Linux "with DTD/XML Schema/RELAX NG validation, XSLT, XPath, pretty-printing, syntax highlighting, folding, tag completion/locking and lossless import/export of Microsoft Word documents." This release adds support for NewsML, XML Topic Maps, and OpenLaszlo. It also adds Slovak and Swedish localizations.

Code Synthesis has released xsd 2.3.1, a free-as-in-speech (GPL) W3C XML Schema language based data binding tool for C++.

Given an XML instance description (XML Schema), it generates C++ classes that represent the given vocabulary as well as parsing and serialization code (collectively called a mapping or binding).

Compared to APIs such as DOM and SAX, the generated code allows you to access the information in XML instance documents using your domain vocabulary instead of generic elements, attributes, and text. Static typing helps catch errors at compile-time rather than at run-time. Automatic code generation frees you for more interesting tasks while minimizing the effort needed to adopt your applications to changes in the document structure.

xsd supports two C++ mappings: in-memory C++/Tree and event-driven C++/Parser. The C++/Tree mapping consists of C++ classes that represent data types defined in XML Schema, a set of parsing functions that convert XML instance documents to a tree-like in-memory data structure, and a set of serialization functions that convert the in-memory representation back to XML....

The C++/Parser mapping provides parser templates for data types defined in XML Schema. Using these parser templates you can build your own in-memory representations or perform immediate processing of XML instance documents.

This release enables you to customize parsing constructors and serialization operators. It also supports some more compilers.

Monday, January 29, 2007 (Permalink)

Erik Wilde has written an XSLT 2 stylesheet that mostly implements XInclude 1.0. Content negotiation is not handled. Otherwise it should be fairly complete.

The W3C Web Application Formats Working Group has published a second last call working draft of XML Binding Language (XBL) 2.0.

This specification defines the XML Binding Language and some supporting DOM interfaces and CSS features. XBL is a mechanism for overriding the standard presentation and interactive behavior of particular elements by attaching those elements to appropriate definitions, called bindings. Bindings can be attached to elements using either CSS, the DOM, or by declaring, in XBL, that elements matching a specific selector are implemented by a particular binding. The element that the binding is attached to, called the bound element, acquires the new behavior and presentation specified by the binding.

Bindings can contain event handlers that watch for events on the bound element, an implementation of new methods and properties that become accessible from the bound element, shadow content that is inserted underneath the bound element, and associated resources such as scoped style sheets and precached images, sounds, or videos.

XBL cannot be used to give a document new semantics. The meaning of a document is not changed by any bindings that are associated with it, only its presentation and interactive behavior.

This version is a non-backwards-compatible "revision of Mozilla's XBL 1.0 language, originally developed at Netscape in 2000, and originally implemented in the Gecko rendering engine" developed by Mozilla, Opera, Google, and Apple. (Hmm, who's missing from that list?) It's supposedly less Mozilla focused, more browser independent. This is not the same as the W3C's sXBL effort, and it's not immediately clear whether work on that will continue in parallel, or if this will replace it in the W3C standards track. Either way this looks very interesting, and I hope the W3C can navigate the rocky shores of browser compatibility to get something usefully implemented.

Sunday, January 28, 2007 (Permalink)

SyncroSoft has released <oXygen/> 8.1, a $298 payware XML editor written in Java. <oXygen/> supports XML, XSL, DTDs, XQuery, SVG, Relax NG, Schematron, and the W3C XML Schema Language. According to the announcement, new features in 8.1 include:

The new <oXygen/> NVDL (Namespace-based Validation Dispatching Language) editor allows you to visually edit NVDL scripts. A diagram showing the script structure and allowing navigation from a mode reference to its definition is available. When editing an NVDL script the content completion offers assistance for entering a mode reference by presenting the defined modes and for entering a new mode by presetting the modes used but not defined. Also the NVDL schema that drives the content completion was annotated, so you will get documentation for the proposals offered during editing.

A new XQuery Input View has been added. When editing an XQuery file, <oXygen/> detects the documents used as inputs and presents a simplified outline for each one. The input view can analyse documents that are stored on the local file system. You can use the Drag and Drop triggered popup menu to easily create XQuery FLWOR constructs or XPath expressions.

You can copy and paste sections of tabular data between Microsoft Excel and the XML Grid Editor. This allow an easy import of tabular data into an XML structure and, respectively, an easy export of tabular data from an XML document to Excel.

Other new features include indexing support for Berkeley XML DB, custom format of calendar dates for import operations, update to Xerces-J parser 2.9.0 and support for SVN version 1.4.

Friday, January 26, 2007 (Permalink)

Deborah Pickett has released Naxos 1.0, an XSLT 1.0 processor written in XSLT 1.0. I'm not quite sure what the point of this is, but it's amusing nonetheless.

Thursday, January 25, 2007 (Permalink)

IBM's developerWorks has published XForms in Firefox, an introductory tutorial about writing XForms:

XForms makes development of Web-deployed applications faster and easier. XForms' clean architecture makes applications more robust, more scalable, faster, and more secure. Except for one little detail, developing with XForms would be a no-brainer. That detail is that no current browsers actually support XForms out of the box. Needless to say, this severely limits what you can do with XForms and where you can deploy them.

However, there are workarounds. Browser plug-ins exist for both Windows® Internet Explorer® and Firefox that add XForms support to these market-leading browsers. XForms processors have also been written in Flash that can be deployed to any browser with a Flash runtime. Finally, there are server-side solutions that precompile all XForms markup to classic Hypertext Markup Language (HTML) and JavaScript programs.

These solutions all have something to recommend them, but for first learning XForms the simplicity of support right in the browser really helps. You can write a piece of a form and then preview it. Then you can change it a little bit more and preview it again. If the form doesn't look quite right, tweak it a bit and reload. Server-side solutions like Chiba are good for deployment, but for learning nothing beats the rapid development cycle of a browser. Therefore, in this article I focus on using the Mozilla XForms plug-in in Firefox.

Speaking of the Mozilla XForms plug-in, they have now released version 0.7.0.1. This release is compatible with Firefox 2.0 and SeaMonkey 1.1 for the first time.

Wednesday, January 24, 2007 (Permalink)

The W3C XQuery and XSL working groups have released XQuery 1.0, XSLT 2 and XPath 2 as a collection of eight related recommendations:

Now the implementation work begins. There's one good open source implementation of XQuery/XSLT 2 (Saxon), a few native XML databases that support XQuery including the open source eXist, and one product I can't quite categorize (Data Direct XQuery). No browsers support XSLT 2 nor are any likely to in the near future. Getting them to support XSLT 1 was a major struggle.

Despite the hundreds of pages of specs, XQuery is still really only half done. Updates are still necessary and may become stable later this year. Even when they're finished, I'm not sure if it will really be possible to write pure XQuery apps, or if you'll still need to use database specific code for crucial operations like defining collections. Standard Java and other APIs for talking to XQuery databases are also needed, and work is under way to produce them. Nonetheless this release is a major milestone. As Churchill once said, "This is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning."

Tuesday, January 23, 2007 (Permalink)

Matt Mullenweg has released WordPress 2.1, an open source (GPL) blog engine based on PHP and MySQL. New features include:

Autosave posts
A tabbed editor to switch between WYSIWYG and code editing
XML import and export of blogs
Spell checking
Search engine privacy option to indicate a blog shouldn’t ping or be indexed
Any page can be the front page of your site
Pages can now be drafts or private.
Comment feeds include all the comments
Better internationalization and support for right-to-left languages.
Scheduled events
Image and thumbnail API allows for richer media plugins.

Increasingly it seems possible to run an entire site in WordPress, if only someone could convince Matt to stop using GET for unsafe operations. :-(
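For readers wondering what "unsafe" means here: HTTP treats GET and HEAD (among others) as safe methods, which must not change anything on the server, because crawlers, prefetchers, and plain links issue them freely. The following is only a minimal Python sketch of that rule. The action names and the dispatch function are hypothetical and have nothing to do with WordPress's actual code.

```python
# Methods HTTP treats as safe: a request using them must not change server state.
SAFE_METHODS = {"GET", "HEAD"}

# Hypothetical state-changing actions, invented for this illustration.
UNSAFE_ACTIONS = {"delete_post", "publish_post"}


def dispatch(method, action):
    """Return an HTTP status code, refusing unsafe actions sent with a safe method."""
    if action in UNSAFE_ACTIONS and method in SAFE_METHODS:
        # 405 Method Not Allowed: following a link must never delete anything.
        return 405
    return 200
```

With this check, a crawler that follows a "delete" link with GET gets a 405, while the same action submitted with POST goes through.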

Monday, January 22, 2007 (Permalink)

Rick Jelliffe has posted a beta implementation of ISO Schematron. Schematron is an XPath based schema language that focuses on assertions rather than grammars. It is liberal (anything not forbidden is permitted) where other languages are conservative (everything not permitted is forbidden). According to Jelliffe, this release:

is the successor to the "skeleton" XSLT meta-stylesheet implementation used widely for Schematron 1.5 and 1.6. It implements all of ISO Schematron (http://www.schematron.com) except "abstract patterns", which will be folded in this month or added using a pre-processor.

Oliver Becker's skeleton design provides an XSLT API for the output templates. So it is very easy to override the default templates and make your own customized validator, if you are an experienced XSLT programmer. Existing validators built using the old skeleton API will probably work unchanged with the new one.

The site also contains two new (beta) validators built on the skeleton. Schematron SVRL generates XML output, using the Schematron Validation Report Language that is Annex D of ISO Schematron. Schematron Terminator will terminate after the first error is found. Example scripts are on the site to show how you can validate a document using SVRL output, then use a further "testing" Schematron schema to look at the validation results and report their significance or set error codes.

Comments and testers are very welcome. The code is early beta (not recommended for republishing or commercial use:) and is being revised daily in response to feedback from the Schematron-love-in mail list
http://eccnet.eccnet.com/mailman/listinfo/schematron-love-in

It is open source, with a non-viral license.

I expect it will be tested enough for serious use by the beginning of February, but it will stay in beta status until:

A small but full-coverage test suite has been created and passed

The suite passes running over Saxon 8, Saxon 6, Xerces and MSXML engines

The validator accepts the EXSLT and XSLT2 query bindings as well as XSLT1.

Issues arising from user comments and feature requests are handled. (I have already gone back over about five years of feature requests and suggestions to improve the code.)

Abstract patterns are implemented
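To make the "assertions rather than grammars" idea from the top of this entry concrete, here is a rough Python sketch of the style. It is not Schematron itself and does not use Jelliffe's implementation. The rules, messages, and sample document are invented for illustration, and it relies only on the limited XPath subset in the standard library's ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: (context path, assertion to test, message if it fails).
RULES = [
    (".//book", lambda e: e.find("title") is not None,
     "a book must have a title"),
    (".//book", lambda e: e.get("isbn") is not None,
     "a book must have an isbn attribute"),
]

DOC = """<library>
  <book isbn="0-123"><title>XML</title><anything-else/></book>
  <book><title>No ISBN</title></book>
</library>"""


def validate(xml_text, rules=RULES):
    """Return the messages of all failed assertions, in rule order."""
    root = ET.fromstring(xml_text)
    failures = []
    for context, test, message in rules:
        for element in root.findall(context):
            if not test(element):
                failures.append(message)
    return failures


print(validate(DOC))
```

Note that the unexpected anything-else element raises no complaint, since no assertion forbids it. That is the liberal behavior described above; a grammar-based schema would reject it unless the content model allowed for it.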

Sunday, January 21, 2007 (Permalink)

Andres Almiray has posted the second beta of Json-lib 1.0, an open source Java library "for transforming beans, maps, collections, java arrays and XML to JSON and back again to beans and DynaBeans." It does not appear to be fully round-trippable for either XML or Java.

Saturday, January 20, 2007 (Permalink)

The Mozilla Project has released SeaMonkey 1.1. SeaMonkey is the continuation of the integrated Mozilla suite, and has XML support roughly equivalent to Firefox 1.5 (XML, XSLT, CSS, XHTML, etc.). It also bundles an e-mail client, web editor, browser, and more into one application. New features in this release include:

More Mac-native look and feel
Inline spell checking
Multiline tooltips
Context menus for the bookmarks menu and personal toolbar folder overflow menu
Message labelling has been superseded by tagging (Yay!)
Improved phishing detection

The Mozilla Project has released Firefox 2.0.0.1. This release fixes security flaws and mostly supports Windows Vista. All 2.0.x users should upgrade.

Thursday, January 18, 2007 (Permalink)

Sylvain Hellegouarch has posted amplee 0.4.0, a Python implementation of the Atom Publishing Protocol using CherryPy 3.

Wednesday, January 17, 2007 (Permalink)

IBM's developerWorks has published XML in 2006, my wrap-up of notable news in the XML world in the just-finished year. Particularly notable were the heating up of the browser wars and the struggle for dominance in office file formats, but there were some smaller stories of interest too.

Tuesday, January 16, 2007 (Permalink)

Matt Mullenweg has released WordPress 2.0.7, an open source (GPL) blog engine based on PHP and MySQL. 2.0.7 fixes still more security bugs and a very serious conflict with FeedBurner. All users should upgrade. I've already upgraded The Cafes and Mokka mit Schlag.

Monday, January 15, 2007 (Permalink)

Monkfish Software has released xmlBlueprint 4.3, a $45 payware XML editor for Windows 98 and later that features schema-based tag completion and an XPath evaluator.

Sunday, January 14, 2007 (Permalink)

The W3C XForms working group has posted the fifth public working draft of XForms 1.1. Changes since 1.0 include:

A new namespace URI, http://www.w3.org/2004/xforms/
power, luhn, current, choose, id and property XPath extension functions
An email address datatype
An ID card number datatype
A prompt action element
An xforms-close event
An xforms-submit-serialize event
Inline rendering of non-text media types
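The luhn function in the list above is named after the Luhn checksum, the check-digit scheme used on credit card numbers. As a rough illustration of what such a check computes, here is a standalone Python sketch. It is not taken from the draft. The function name luhn_valid and the decision to ignore spaces and hyphens are assumptions made for this example.

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Assumption for this sketch: separators such as spaces or hyphens are ignored.
    digits = [int(c) for c in number if c.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0
```

For example, luhn_valid("79927398713") is true, while changing the final digit to 0 makes the check fail.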

This draft "consists mainly of a merge of the previous XForms 1.1 Working Draft and XForms 1.0 Second Edition...very few changes have been made to the content of the previous version of the XForms 1.1 Working Draft"

x-port.net has released formsPlayer 1.5.0.1049, a free-beer (e-mail address required) "set of modules designed to make it easy to build XForms processors, editors and debuggers. These processors can run on a variety of platforms, using a range of user interfaces." This release improves performance. Internet Explorer is required.

Friday, January 12, 2007 (Permalink)

The W3C Device Independence Working Group has posted a working draft of Content Selection Primer 1.0. According to the draft,

there are capabilities in other W3C and IETF specifications that allow some level of control over the particular variant of a resource that is returned to a browser in response to a request. While these capabilities can support many of the use cases commonly found on the conventional Web, practical experience has found them somewhat lacking for supporting selection of material to be used to support the myriad of different kinds of device that are now able to access the Web.

We've already noted issues with existing mechanisms where two different devices both support the same media type, but have other requirements. For example consider images encoded using the Portable Network Graphics [PNG] specification being delivered to one device that is a tiny mobile phone, and another, which is a large, personal digital assistant with a large screen. Suppose that an author is required to provide different versions of a particular image in order to satisfy design criteria for a page that must be delivered to both devices. Different variants of the image are prepared to satisfy the criteria. However, because both devices support the same image encoding, PNG, content negotiation cannot be used to provide the appropriate version in this case. The criteria used in content negotiation are simply not sufficiently fine-grained to cater for even this simple level of selection.

Solutions that support a wide variety of different types of device require the ability to make use of a wider range of different alternative content variants, and employ a much richer set of criteria in connection with selection.

Here's an example from the primer:

<sel:select>
    <sel:when expr="eg:getStyleSheetSupport() = 'excellent'">
        <link rel="stylesheet" type="text/css" href="../styles/sensational.css"/>
    </sel:when>
    <sel:when expr="eg:getStyleSheetSupport() = 'basic'">
        <link rel="stylesheet" type="text/css" href="../styles/mediocre.css"/>
    </sel:when>
</sel:select>

Thursday, January 11, 2007 (Permalink)

Per Bothner has released Qexo 1.8.95, an XQuery to Java byte code compiler. According to Bothner, "This is basically a release candidate for Qexo 1.9. At this point, I'm concentrating on on updating documentation and the web site, rather than adding features, fixing bugs, or tuning, though if a bug is reported in time it might get fixed!" Qexo is published under the X11/MIT license.

The W3C Voice Browser Working Group has posted the first public working draft of the Speech Synthesis Markup Language Version 1.1. According to the abstract, the Speech Synthesis Markup Language "is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in Web and other applications. The essential role of the markup language is to provide authors of synthesizable content a standard way to control aspects of speech such as pronunciation, volume, pitch, rate, etc. across different synthesis-capable platforms." New elements in this release include lang and w (for word).

ITRIS has released Glips Graffiti 1.5, an SVG editor based on Batik. "It features shape tools (rectangles, circles, ellipses, lines, polygons, and polylines), path tools (Bezier curves, conversion to a path, union, subtraction, and intersection), basic text support, and image import (SVG or Bitmap). Supported transformations are translate, resize, rotate, and skew. A property manager is available for each object, and a resource manager takes care of gradients, patterns, markers, and filters." Java 6 or later is required.

Glips Graffiti is published under the GPL, which is a problem since it's based on Batik, which has a GPL-incompatible license. This needs to be cleared up before it can be reliably adopted.

Wednesday, January 10, 2007 (Permalink)

Bare Bones Software has released version 8.6 of BBEdit, my preferred text editor on the Mac, and what I'm using to type these very words. New features include:

"The Java language module has been rewritten, and gets all sorts of goodies: folding for functions, inner class support, recognition of interfaces, and listing of abstract method declarations in the function popup."
Markdown support
Syntax colored text can be saved as styled HTML
TeX and LaTeX support have been improved
BBEdit can now read and write the "binary property list" format used in Mac OS X 10.4 for application preferences files.

BBEdit is $199 payware. Upgrades from 8.5 are free. Upgrades from 8.0 cost $30 and upgrades from 7.x cost $40. Mac OS X 10.4 or later is required.

Tuesday, January 9, 2007 (Permalink)

The XML Apache Project has posted version 0.93 of FOP, an open source XSL Formatting Objects to PDF/PostScript/RTF converter written in Java. This release adds support for OpenType fonts for PDF and "all fonts available to the Java2D subsystem for all Java2D-descendant renderers (TIFF, PNG, print, AWT)." It also improves the Java2DRenderer and its dependent print and bitmap renderers.

Monday, January 8, 2007 (Permalink)

The Omni Group has released OmniWeb 5.5.2, a $29.95 payware web browser for Mac OS X that supports the core parts of XML on the Web including XSLT and CSS. This release fixes bugs including at least one serious security hole in JavaScript. All users should upgrade.

Sunday, January 7, 2007 (Permalink)

Matt Mullenweg has released WordPress 2.0.6, an open source (GPL) blog engine based on PHP and MySQL. 2.0.6 fixes assorted security bugs, which is no great surprise since they're still trying to spackle over every security hole that pops up. WordPress has some deep architectural flaws that the developers are in serious denial about. I don't think we've seen the last major security hole in this product. This release has a really serious conflict with FeedBurner, so you may want to wait for 2.0.7. I'm not sure whether it's WordPress's fault or FeedBurner's. I've heard both groups blamed. However given WordPress's known disregard for HTTP, I know where I'd put my money if I had to bet.

I use WordPress to power The Cafes and Mokka mit Schlag. It's got a lot to recommend it including the user interface and themability. Unfortunately HTTP, XML, and security are not equal strengths. It may (or may not) be the best open source blog engine available today, but it's certainly not even close to the best one that's possible.

Saturday, January 6, 2007 (Permalink)

FourThought has released the Amara XML Toolkit 1.2.0, an open source "collection of Python tools for XML processing-- not just tools that happen to be written in Python, but tools built from the ground up to use Python idioms and take advantage of the many advantages of Python." Amara includes:

Bindery: data binding tool (a very Pythonic XML API)
Scimitar, an implementation of the Schematron language that converts Schematron documents to Python scripts
domtools: set of tools to augment Python DOMs
saxtools: set of tools to make SAX easier to use in Python
Flextyper: user-defined datatypes in Python for XML processing

New features in this release include:

omit_nodetype_rule bindery rule
force_nsdecls parameter to bindery node xml() method
Support for attribute patterns to pushdom/pushbind
An experimental xml_xslt() method to bindery object to apply transforms
DTD validation is now off by default
Added support for DTD validation & custom binding classes to convenience APIs

Python 2.3 or later is required.

Gerald Schmidt has released XML Copy Editor 1.0.8.7, a free-as-in-speech (GPL) XML editor for Windows and Linux "with DTD/XML Schema/RELAX NG validation, XSLT, XPath, pretty-printing, syntax highlighting, folding, tag completion/locking and lossless import/export of Microsoft Word documents." This release adds tab splitting and fixes bugs.

Friday, January 5, 2007 (Permalink)

The W3C Technical Architecture Group (TAG) published an approved finding on The Use of Metadata in URIs. "This finding addresses several questions regarding Uniform Resource Identifiers (URIs). Specifically, what information about a resource can or should be embedded in its URI? What metadata can be reliably determined from a URI, and in what circumstances is it appropriate to rely on the correctness of such information? In what circumstances is it appropriate to use information from a URI as a hint as to the nature of a resource or its representations? Simple examples are used to explain the tradeoffs involved in employing such metadata in URIs."

Joerg Moebius has released yax, an open source (LGPL) implementation of XProc.

Advanced Software Production Line has released libaxl 0.4, a C parser for XML. This release adds namespace awareness as an optional feature. (In 2007 namespaces are not optional.) The API has some serious problems, but the developers have already frozen it, so they won't be fixed. I recommend against this library. There are many superior alternatives available.

Andrea Marchesini has released libnxml 0.16, a C library for parsing, writing, and creating XML 1.0 and 1.1. Version 0.16 adds support for cacert SSL HTTP requests, and XML 1.0 without a prolog. libnxml is published under the LGPL.

Thursday, January 4, 2007 (Permalink)

The Big Faceless Organization has released the Big Faceless Report Generator 1.1.33, a $1200 payware Java application for converting XML documents to PDF. Unlike most similar tools it appears to be based on HTML and CSS rather than XSL Formatting Objects. This is mostly a bug fix release. Java 1.2 or later is required.

Wednesday, January 3, 2007 (Permalink)

Todd Ditchendorf has released TeXSLMate 1.0, a free-beer plugin based on libxml and libxslt that adds an XSLT/XQuery debugging palette to the Mac TextMate editor.

Tuesday, January 2, 2007 (Permalink)

The W3C XForms working group has posted the fifth public working draft of XForms 1.1. "XForms 1.1 refines the XML processing platform introduced by [XForms 1.0] by adding several new submission capabilities, action handlers, utility functions, user interface improvements, and helpful datatypes as well as a more powerful action processing facility, including conditional, iterated and background execution, the ability to manipulate data arbitrarily and to access event context information." Changes since 1.0 include:

PUT and DELETE as forms actions
HTTP headers can be controlled from the XForm submission.
power, luhn, current, choose, id and property XPath extension functions
An email address datatype
An ID card number datatype
A prompt action element
An xforms-close event
An xforms-submit-serialize event
Inline rendering of non-text media types

The major change in this draft is the addition of a truly insane idea known as chameleon schemas. In brief, the namespace of an XForm is allowed to change depending on which document you put it in. This more or less makes namespaces completely pointless, and is based on two mistaken beliefs:

The purpose of namespaces is merely to disambiguate the same names that appear in two different vocabularies.
The intersection of the set of users smart enough to author XForms and the set of users too stupid to understand namespace scoping rules is non-empty.

I've argued against this change, but the working group seems hell-bent on defenestrating namespaces. This is going to make writing generic XForms processors and libraries much harder.

Monday, January 1, 2007 (Permalink)

Here's a New Year's present for everyone: I have uploaded the finished release of Jaxen 1.1, an open source XPath 1.0 engine written in Java that supports multiple object models including DOM, XOM, JDOM, and dom4j. It is also flexible enough to be adapted to XML views of non-XML data structures. For instance, PMD uses it to enable XPath expressions to query compiled Java byte code. Version 1.1 is believed to be fully conformant with the XPath 1.0 specification. Numerous bugs have been fixed since version 1.0 several years ago. If anyone is still using version 1.0, please upgrade at your earliest convenience. Jaxen is published under a modified BSD license.
