XML News from Friday, June 27, 2008

The W3C Semantic Web Best Practices and Deployment Working Group and HTML Working Groups have published a new working draft of RDFa Primer 1.0 and a candidate recommendation of RDFa in XHTML: Syntax and Processing.

The modern Web is made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience: an event on a web page can be directly imported into a user's desktop calendar; a license on a document can be detected so that users can be informed of their rights automatically; a photo's creator, camera setting information, resolution, location and topic can be published as easily as the original photo itself, enabling structured search and sharing.

RDFa is a specification for attributes to be used with languages such as HTML and XHTML to express structured data. The rendered, hypertext data of XHTML is reused by the RDFa markup, so that publishers don't need to repeat significant data in the document content. This document only specifies the use of the RDFa attributes with XHTML. The underlying abstract representation is RDF [RDF-PRIMER], which lets publishers build their own vocabulary, extend others, and evolve their vocabulary with maximal interoperability over time. The expressed structure is closely tied to the data, so that rendered data can be copied and pasted along with its relevant structure.

The rules for interpreting the data are generic, so that there is no need for different rules for different formats; this allows authors and publishers of data to define their own formats without having to update software, register formats via a central authority, or worry that two formats may interfere with each other.

RDFa shares some use cases with microformats [MICROFORMATS]. Whereas microformats specify both a syntax for embedding structured data into HTML documents and a vocabulary of specific terms for each microformat, RDFa specifies only a syntax and relies on independent specification of terms (often called vocabularies or taxonomies) by others. RDFa allows terms from multiple independently-developed vocabularies to be freely intermixed and is designed such that the language can be parsed without knowledge of the specific term vocabulary being used.

This document is a detailed syntax specification for RDFa, aimed at:

For those looking for an introduction to the use of RDFa and some real-world examples, please consult the RDFa Primer.

Here's a syntax example from the primer draft:

   <div about="/posts/trouble_with_bob">
      <h2 property="dc:title">The trouble with Bob</h2>
      
      The trouble with Bob is that he takes much better photos than I do:
	
      <div about="http://example.com/bob/photos/sunset.jpg">
        <img src="http://example.com/bob/photos/sunset.jpg" />
        <span property="dc:title">Beautiful Sunset</span>

        by <span property="dc:creator">Bob</span>.
      </div>
   </div>
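
To make the example concrete, here is a rough Python sketch of the statements a consumer might pull out of that markup. It is not the RDFa processing algorithm: it handles only the about and property attributes and ignores everything else in the spec (rel, rev, content, datatype, language tags and so on). The base URI and the binding of the dc prefix to the Dublin Core elements namespace are my assumptions, since the fragment above does not show them, and the names expand and triples are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Assumptions not present in the fragment: the page's base URI and the
# namespace that the "dc" prefix is bound to.
BASE = "http://example.com/"
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}

SNIPPET = """<div about="/posts/trouble_with_bob">
  <h2 property="dc:title">The trouble with Bob</h2>
  The trouble with Bob is that he takes much better photos than I do:
  <div about="http://example.com/bob/photos/sunset.jpg">
    <img src="http://example.com/bob/photos/sunset.jpg" />
    <span property="dc:title">Beautiful Sunset</span>
    by <span property="dc:creator">Bob</span>.
  </div>
</div>"""


def expand(prefixed_name):
    """Turn 'dc:title' into a full URI using the assumed prefix map."""
    prefix, _, local = prefixed_name.partition(":")
    return PREFIXES[prefix] + local


def triples(element, subject=BASE):
    """Yield (subject, predicate, literal) for each property attribute."""
    # An about attribute changes the subject for this element and its children.
    if "about" in element.attrib:
        subject = urljoin(BASE, element.attrib["about"])
    if "property" in element.attrib:
        literal = "".join(element.itertext())
        yield (subject, expand(element.attrib["property"]), literal)
    for child in element:
        yield from triples(child, subject)


result = list(triples(ET.fromstring(SNIPPET)))
for triple in result:
    print(triple)
```

Under those assumptions the sketch yields three statements: a title for the post, and a title and a creator for the photo.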

The thing that jumps out at me is the use of namespace prefixes in attribute values. Haven't we learned by now that this is a bad idea?
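
A small Python sketch of one reason prefixes in attribute values cause trouble: the string dc:title only means something in combination with whatever namespace declaration happens to be in scope, and a generic XML tool does not connect the two. The two documents and the function names below are invented for illustration; the second namespace URI is deliberately bogus.

```python
import io
import xml.etree.ElementTree as ET

# Two hypothetical documents: identical attribute values, different bindings.
DOC_A = ('<div xmlns:dc="http://purl.org/dc/elements/1.1/">'
         '<span property="dc:title">Sunset</span></div>')
DOC_B = ('<div xmlns:dc="http://example.org/other#">'
         '<span property="dc:title">Sunset</span></div>')


def property_values(xml_text):
    """Return the raw property attribute values as the parser hands them over."""
    root = ET.fromstring(xml_text)
    return [el.get("property") for el in root.iter() if el.get("property")]


def prefix_bindings(xml_text):
    """Recover prefix-to-URI bindings, which takes a separate pass over the source."""
    bindings = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        bindings[prefix] = uri
    return bindings


# The parser treats the attribute value as an opaque string in both documents.
print(property_values(DOC_A), property_values(DOC_B))
# The parsed tree keeps no xmlns:dc attribute on the root element.
print(ET.fromstring(DOC_A).attrib)
# Only the bindings distinguish what "dc:title" actually refers to.
print(prefix_bindings(DOC_A)["dc"], prefix_bindings(DOC_B)["dc"])
```

The parser resolves prefixes on element and attribute names for you, but not inside attribute values, so every consumer has to carry the bindings around itself, and a fragment pasted into another document can silently change meaning.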

I'm actually designing a significant metadata system at my day job at the moment, and for the life of me I can't figure out why we should use RDF in any shape or form. It doesn't offer clients any useful tools, and just makes the data more opaque. Most of the interesting meta-things we want to say will have to be hand-coded anyway because there are no standards for them. I think we're going to go with a hand-rolled XML syntax as the simplest thing that could possibly work. If anyone asks for RDF, we can always publish a GRDDL or XSLT transform; but RDF just seems pointless.
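
As a rough illustration of the "publish a transform if anyone asks" idea, here is a Python sketch that turns a hand-rolled XML record into N-Triples. Everything in it is hypothetical: the metadata format, the vocabulary URI, and the function names are invented for the example, and it uses Python where the paragraph above talks about GRDDL or XSLT.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary base URI; a real system would choose its own.
VOCAB = "http://example.com/vocab#"

# Hypothetical hand-rolled record format, invented for this sketch.
RECORD = """<metadata about="http://example.com/bob/photos/sunset.jpg">
  <title>Beautiful Sunset</title>
  <creator>Bob</creator>
</metadata>"""


def escape(text):
    """Escape a string for use as an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def to_ntriples(xml_text):
    """Map each child element to one triple about the record's subject."""
    root = ET.fromstring(xml_text)
    subject = root.get("about")
    lines = []
    for child in root:
        value = escape((child.text or "").strip())
        lines.append('<%s> <%s%s> "%s" .' % (subject, VOCAB, child.tag, value))
    return lines


ntriples = to_ntriples(RECORD)
print("\n".join(ntriples))
```

The point of the sketch is only that the mapping is mechanical: the XML stays the primary format, and the RDF view is derived on demand.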