XML News from Tuesday, September 22, 2009

The W3C has published a note on Publishing Open Government Data:

Step 1: The quickest and easiest way to make data available on the Internet is to publish the data in its raw form (e.g., an XML file of polling data from past elections). However, the data should be well-structured. Structure allows others to successfully make automated use of the data. Well-known formats or structures include XML, RDF and CSV. Formats that only allow the data to be seen, rather than extracted (for example, pictures of the data), are not useful and should be avoided.
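To illustrate what "automated use" can look like, here is a minimal Python sketch that reads a small XML file of polling data and works out the winner for each year. The element and attribute names (polls, poll, result, party, votes) and the figures are invented for this example; they do not come from the W3C note or any real dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical polling data: the vocabulary and the numbers are made up
# purely for illustration.
POLLING_XML = """\
<polls>
  <poll year="2004" district="Example North">
    <result party="A" votes="1200"/>
    <result party="B" votes="950"/>
  </poll>
  <poll year="2008" district="Example North">
    <result party="A" votes="1010"/>
    <result party="B" votes="1340"/>
  </poll>
</polls>
"""


def winners_by_year(xml_text):
    """Return a dict mapping each poll's year to the party with the most votes."""
    root = ET.fromstring(xml_text)
    winners = {}
    for poll in root.findall("poll"):
        # Pick the <result> element with the highest vote count.
        top = max(poll.findall("result"), key=lambda r: int(r.get("votes")))
        winners[poll.get("year")] = top.get("party")
    return winners


print(winners_by_year(POLLING_XML))
```

Because the data is structured, a third party needs only a few lines to extract it. The same figures published as a picture of a table would not allow this.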

Step 2: Create an online catalog of the raw data (complete with documentation) so people can discover what has been posted.

These raw datasets should be reliably structured and documented, otherwise their usefulness is negligible. Most governments already have mechanisms in place to create and store data (e.g., Excel, Word, and other software-specific file formats).

Posting raw data, with an online catalog, is a great starting point, and reflects the next-step evolution of the Internet - "website as fileserver".

Step 3: Make the data both human- and machine-readable:

These steps will help the public to easily find, use, cite and understand the data. The data catalog should explain any rules or regulations that must be followed in the use of the dataset. Also, the data catalog itself is considered "data" and should be published as structured data, so that third parties can extract data about the datasets. Thoroughly document the parts of the web page, using valid XHTML, and choose easily patterned and discoverable URLs for the pages. Also syndicate the data for the catalog (using formats such as RSS) to quickly and easily advertise new datasets upon publication.

Actually, that sounds like good advice for more than just government data.
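The note's suggestion to syndicate the catalog can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the catalog entries, the data.example.gov URLs, and the build_catalog_feed helper are all hypothetical, and the output is a bare-bones RSS 2.0 document rather than anything the W3C prescribes.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical catalog entries: titles, URLs, and dates are invented.
DATASETS = [
    {
        "title": "Polling data, 2008 election",
        "url": "http://data.example.gov/elections/2008.xml",
        "description": "Raw XML polling results with documentation.",
        "published": datetime(2009, 9, 22, tzinfo=timezone.utc),
    },
]


def build_catalog_feed(datasets):
    """Return an RSS 2.0 document (as a string) with one <item> per dataset."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Example Data Catalog"
    ET.SubElement(channel, "link").text = "http://data.example.gov/catalog/"
    ET.SubElement(channel, "description").text = "Newly published datasets"
    for dataset in datasets:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = dataset["title"]
        ET.SubElement(item, "link").text = dataset["url"]
        ET.SubElement(item, "description").text = dataset["description"]
        # RSS 2.0 expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(dataset["published"])
    return ET.tostring(rss, encoding="unicode")


print(build_catalog_feed(DATASETS))
```

Any feed reader, or any script, can then pick up new datasets as they are published, which fits the note's point that the catalog itself is data.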