XML News from Friday, October 17, 2008

The W3C Semantic Web Best Practices and Deployment Working Group and HTML Working Groups have published the finished recommendation of RDFa in XHTML: Syntax and Processing.

The modern Web is made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience: an event on a web page can be directly imported into a user's desktop calendar; a license on a document can be detected so that users can be informed of their rights automatically; a photo's creator, camera setting information, resolution, location and topic can be published as easily as the original photo itself, enabling structured search and sharing.

RDFa is a specification for attributes to express structured data in any markup language. This document specifies how to use RDFa with XHTML. The rendered, hypertext data of XHTML is reused by the RDFa markup, so that publishers don't need to repeat significant data in the document content. The underlying abstract representation is RDF [RDF-PRIMER], which lets publishers build their own vocabulary, extend others, and evolve their vocabulary with maximal interoperability over time. The expressed structure is closely tied to the data, so that rendered data can be copied and pasted along with its relevant structure.

The rules for interpreting the data are generic, so that there is no need for different rules for different formats; this allows authors and publishers of data to define their own formats without having to update software, register formats via a central authority, or worry that two formats may interfere with each other.

RDFa shares some use cases with microformats [MICROFORMATS]. Whereas microformats specify both a syntax for embedding structured data into HTML documents and a vocabulary of specific terms for each microformat, RDFa specifies only a syntax and relies on independent specification of terms (often called vocabularies or taxonomies) by others. RDFa allows terms from multiple independently-developed vocabularies to be freely intermixed and is designed such that the language can be parsed without knowledge of the specific term vocabulary being used.

This document is a detailed syntax specification for RDFa, aimed at:

For those looking for an introduction to the use of RDFa and some real-world examples, please consult the RDFa Primer.

Here's a syntax example from the specification:

<html
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:cal="http://www.w3.org/2002/12/cal/ical#"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
  >
  <head><title>Jo's Friends and Family Blog</title></head>
  <body>
    <p>
      I'm holding
      <span property="cal:summary">
        one last summer Barbecue
      </span>,
      on
      <span property="cal:dtstart" content="2007-09-16T16:00:00-05:00"
            datatype="xsd:dateTime">
        September 16th at 4pm
      </span>.
    </p>
  </body>
</html>

You'll notice that RDFa manages to avoid using namespace prefixes on attribute names (where they work) but does use them inside attribute values (where they don't). I can't get too worked up over this, though. It's not like anyone is ever going to pay any attention to it anyway. I confidently predict that RDFa will be every bit as successful as RDF itself (which is to say, not at all). RDF has been to informaticians of the 21st century what hot fusion was to physicists of the 20th: a fun way to waste a career on a technology doomed to failure. At least the informaticians won't blow hundreds of millions of research dollars while they discover this.
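
The point about prefixes inside attribute values can be seen with a short Python sketch. It is not the RDFa processing model, just an illustration under some loud assumptions: it works on a trimmed copy of the example (with the XML Schema namespace ending in #), it looks only at the property, content and datatype attributes, and it assumes every prefix is declared once on the root element. The helper names expand_curie and extract_properties are made up for this sketch. The XML parser resolves the namespace of element names on its own, but hands back "cal:summary" as an opaque string, so the application has to track the prefix declarations and expand the value itself. Note that nothing in the code knows anything about the calendar vocabulary.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the example; the xsd namespace is written with a trailing '#'.
XHTML = """<html
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:cal="http://www.w3.org/2002/12/cal/ical#"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  <body>
    <p>
      I'm holding
      <span property="cal:summary">
        one last summer Barbecue
      </span>,
      on
      <span property="cal:dtstart" content="2007-09-16T16:00:00-05:00"
            datatype="xsd:dateTime">
        September 16th at 4pm
      </span>.
    </p>
  </body>
</html>"""


def expand_curie(curie, prefixes):
    """Turn 'prefix:local' into a full URI by plain string concatenation."""
    prefix, _, local = curie.partition(":")
    return prefixes[prefix] + local


def extract_properties(xhtml):
    """Return (predicate URI, literal, datatype URI or None) per @property.

    Hypothetical helper: ignores @about, @rel, @typeof and nesting, and
    assumes prefixes are declared once, on the root element.
    """
    prefixes = {}
    results = []
    source = io.BytesIO(xhtml.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "end")):
        if event == "start-ns":
            # The parser reports declarations, but never applies them
            # to attribute values; we have to remember them ourselves.
            prefix, uri = payload
            prefixes[prefix] = uri
            continue
        elem = payload
        curie = elem.get("property")  # still the raw string, e.g. "cal:summary"
        if curie is None:
            continue
        literal = elem.get("content")
        if literal is None:
            # Fall back to the rendered text; whitespace is collapsed here
            # only to keep the output readable.
            literal = " ".join("".join(elem.itertext()).split())
        datatype = elem.get("datatype")
        results.append((
            expand_curie(curie, prefixes),
            literal,
            expand_curie(datatype, prefixes) if datatype else None,
        ))
    return results


for predicate, literal, datatype in extract_properties(XHTML):
    print(predicate, repr(literal), datatype)
```

Drop the trailing # from the xsd declaration and the datatype expands to http://www.w3.org/2001/XMLSchemadateTime, which is exactly the kind of silent breakage you get when prefixes live in attribute values and resolve by string concatenation.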