Cafe con Leche News Tuesday, November 14, 2006

The W3C Technical Architecture Group (TAG) has published On Linking Alternative Formats To Enable Discovery And Publishing. The problem is:

Content creators wishing to publish multiple versions of a given resource on the Web face a number of questions with respect to how such URIs are created, published and discovered. Questions include:
Given a resource http://example.com/ubiquity/ that can be delivered in a multiplicity of representations, how should one publish the relevant URIs to enable automatic discovery of these representations (AKA specific resources)?

How does one ensure that the alternative relationship amongst these various representations is available in a machine readable form, and consequently discoverable?

Here, multiple representations might include:
Representations appropriate for different delivery contexts
Alternative formats of the resource distinguished by Content-type
Different versions of the resource e.g., either by language or date
Representations in different languages

This document explores the issues that arise in this context, and attempts to define best practices that help:
Preserve the One Web while enabling content publishing to a multiplicity of delivery contexts.

Enable the creation of RESTful URIs that remain representation agnostic while delivering the correct end-user experience.

Enable automatic discovery of the available representations.

Enable web crawlers to discover the relationship between a given generic resource and the specific resources that correspond to its various alternatives. This will help search engines build better Web indices and avoid the need to index all available alternatives of a given resource.

The suggested solution is:

Create representation-specific URIs (specific resources) for each available alternative (representation_i), e.g., http://example.com/ubiquity/resource/representation_i.

If no content negotiation is in place, serve a canonical representation (generic resource) of the content at http://example.com/ubiquity/resource

With that same URI, use HTTP content-negotiation, along with the correct HTTP VARY headers to serve up the appropriate representation at access time. Ensure that the VARY headers capture the right parameters that were used to choose the representation that is being served; this is important for correct behavior when using caching proxies.
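To make the content-negotiation step concrete, here is a minimal sketch in Python of how a server might pick a representation from the Accept header and report that choice in a Vary header. The representation table, the paths under /ubiquity/resource/, and the function names are assumptions made up for illustration; none of them come from the TAG document, and the parser deliberately ignores partial wildcards such as text/*.

```python
# Hypothetical table mapping media types to representation-specific paths.
REPRESENTATIONS = {
    "text/html": "/ubiquity/resource/html",
    "application/xhtml+xml": "/ubiquity/resource/xhtml",
    "application/pdf": "/ubiquity/resource/pdf",
}
DEFAULT_TYPE = "text/html"  # assumed canonical representation


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the client gives no weight
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so equal weights keep the order the client sent.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def negotiate(accept_header):
    """Choose a representation and build the response headers for it."""
    chosen = DEFAULT_TYPE
    for media_type, q in parse_accept(accept_header or ""):
        if q <= 0:
            continue
        if media_type in REPRESENTATIONS:
            chosen = media_type
            break
        if media_type == "*/*":
            break  # anything is acceptable, so keep the default
    return {
        "Content-Type": chosen,
        "Content-Location": REPRESENTATIONS[chosen],
        # The choice depended on Accept, so caches must key on it too.
        "Vary": "Accept",
    }
```

If the server also chose by language or user agent, those request headers would have to be listed in Vary as well. Leaving one out is exactly the caching-proxy bug the step warns about.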

As an alternative to the previous step, arrange for the server to generate an HTTP 302 (Found) redirect to automatically serve up http://example.com/ubiquity/representation_i when http://example.com/ubiquity is accessed by user-agent_i. This form of redirect involves an extra client/server round-trip, and may therefore be suboptimal for mobile devices. This is a temporary redirect; the accessing user-agent should continue to use the canonical URI when creating bookmarks, or emailing the URI. Finally, note that to optimize link traversal out of the resulting document, the content provider might wish to rewrite relative links to point at the specific resource. This will ensure that later uses of the URI result in expected end-user results; e.g., in the following scenario:
Cell-phone user emails link
Recipient opens message on a desktop
Clicks on the link
The user following the link from inside the email message on a desktop browser should receive the desktop version, and not the mobile version. Notice that passing around the canonical URI is critical in achieving this behavior.
Additionally, contrast this solution with using HTTP content-negotiation with VARY headers; using a redirect to the URI as a specific resource has the advantage of freezing all parameters that were used to choose that representation into the URI.
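The redirect alternative can be sketched the same way. In the Python below, the user-agent substrings and the representation names "mobile" and "desktop" are assumptions for illustration only; a real deployment would need a far better device database than three substrings.

```python
CANONICAL = "http://example.com/ubiquity"

# Hypothetical substrings that mark a request as coming from a phone.
MOBILE_HINTS = ("Mobile", "Symbian", "Windows CE")


def representation_for(user_agent):
    """Map a User-Agent string to the name of a specific representation."""
    if any(hint in user_agent for hint in MOBILE_HINTS):
        return "mobile"
    return "desktop"


def redirect_response(user_agent):
    """Return (status, headers) for a request to the canonical URI."""
    location = f"{CANONICAL}/{representation_for(user_agent)}"
    # 302 is a temporary redirect: clients keep bookmarking and sharing
    # the canonical URI rather than the one in Location.
    return 302, {"Location": location}
```

In the email scenario, the phone and the desktop both start from the same canonical URI and are simply redirected to different places. If the phone user had mailed the Location value instead, the desktop recipient would be stuck with the mobile page.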

Use linking mechanisms provided by the representation being served to create links to the other available representations. As an example, when using HTML, one might use a and link elements to advertise the availability of alternate representations. In this context, note that there are two distinct types of such links:

Links for human consumption that are to be presented to the user
And links for machine consumption, that are used by the user agent to provide additional functionality.

As an example, links to available alternatives meant for human consumption might use the HTML a element since these are rendered by user-agents. In contrast, links meant for use by bots might use the HTML link element — as an example, this reflects present practice when publishing pointers to Atom/RSS feeds.

In either case, notice that following these steps creates a mini-graph consisting of the canonical URI and URIs for its various representations.
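As a rough illustration of the two kinds of links, the following Python generates both from one list of alternatives. The list itself, including the URIs and labels, is hypothetical; only the use of link with rel="alternate" for machines and a for humans follows the text above.

```python
from html import escape

# Hypothetical alternatives: (URI, media type, human-readable label).
ALTERNATES = [
    ("http://example.com/ubiquity/resource/atom", "application/atom+xml", "Atom feed"),
    ("http://example.com/ubiquity/resource/pdf", "application/pdf", "PDF version"),
]


def link_elements(alternates):
    """Links for machine consumption; these belong in the HTML head."""
    return "\n".join(
        f'<link rel="alternate" type="{escape(media_type)}" '
        f'href="{escape(uri)}" title="{escape(label)}">'
        for uri, media_type, label in alternates
    )


def anchor_elements(alternates):
    """Links for human consumption; these are rendered in the page body."""
    return "\n".join(
        f'<a href="{escape(uri)}" type="{escape(media_type)}">{escape(label)}</a>'
        for uri, media_type, label in alternates
    )
```

Generating both forms from the same list keeps the human-visible links and the machine-readable ones from drifting apart.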

This is actually just the solution suggested for one particular use case, but the others are very similar.

This seems wise, and in general points out something I've noticed in designing RESTful systems. The server maintainer needs to be able to freely define resources and invent URLs pointing to those resources. A given resource can have more than one URL, and indeed different parts of one document may be individual resources with their own unique URLs. For example, this page could have one URL (http://www.cafeconleche.org/) and every news item on the page could have its own URL (http://www.cafeconleche.org/news/November_14_2006_35156, http://www.cafeconleche.org/news/November_14_2006_34857). Parts of the page could be updated by PUTting the relevant content to the individual item URLs.

Of course this requires an additional layer of indirection on the server, maybe more than one. The current static file system that serves Cafe con Leche can't really handle this. However, fewer and fewer sites are generated out of static files these days anyway. The key is to design the server-side systems such that URLs are freely created for everything of interest.
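Here is a toy sketch, in Python, of what that layer of indirection might look like: an in-memory store in which each news item is its own resource and the front page is assembled from the items. The class, its methods, and the item text are invented for illustration; this is not how Cafe con Leche is actually served.

```python
class NewsSite:
    """Toy in-memory server: every news item is a resource with its own path."""

    def __init__(self):
        self.items = {}  # path -> content, kept in insertion order

    def put(self, path, content):
        """Create or replace the item at path; return an HTTP-style status."""
        status = 200 if path in self.items else 201
        self.items[path] = content
        return status

    def get(self, path):
        """Return (status, body) for the front page or a single item."""
        if path == "/":
            # The page is a resource too, composed from the item resources.
            return 200, "\n\n".join(self.items.values())
        if path in self.items:
            return 200, self.items[path]
        return 404, ""


site = NewsSite()
site.put("/news/November_14_2006_35156", "TAG publishes a draft on alternative formats.")
site.put("/news/November_14_2006_34857", "Another news item.")

# PUTting to the item URL updates that part of the page and nothing else.
site.put("/news/November_14_2006_35156", "TAG publishes a finding on alternative formats.")
status, page = site.get("/")
```

After the second PUT to the first item's URL, a GET on / returns the revised text for that item followed by the untouched second item.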

I'm reminded of a problem a lot of my intro to Java students have. They can't figure out how to make two objects talk to each other (often action listeners and the applet they're responding to) so they want to put everything in one class. The proper solution to this problem is to add methods to one or the other of the two classes so the objects can communicate. In the RESTful world of HTTP, when you find you're having trouble sending the server the message you want to send it, the solution is definitely not adding a new method. Rather it's adding a new URI. Don't be afraid of URIs. A good RESTful system will have lots of them.

XML News from Tuesday, November 14, 2006