XML News from Friday, August 5, 2005

Day 5 of Extreme begins. This is a half day with three sessions and a closing keynote, which is plenty. I'm a little zoned by now. First talk this morning is IBM's Erik Hennum discussing "A Unified Type Hierarchy: A Proposal for DITA 2." DITA is the Darwin Information Typing Architecture. From his diagram it looks like another example of the rule that any problem can be solved by an additional layer of indirection. In this case, the indirection allows different topics (answers to questions) to be combined in different ways in different collections. "Maps are collections of topic references that provide a context for topics...maps aggregate topics." This enables the same content to be reused in many different contexts and collections. Caveat: despite the terminology, this is not about topic maps. DITA is a hierarchy of types. Customizing XML markup to your needs prevents you from sharing the work with other groups. Instead of customizing, he wants to specialize. I'm not sure I see the difference, but it seems to be just sharing some markup and adding new custom markup instead of doing everything from scratch. Extension by substitution. This seems to be like subclassing in OOP. They can say that a steps is an ol, and a step is an li, and a taskbody is a body and so forth. They use DTDs, attributes, and XSLT to enable all this. "It's proven to be very pragmatic." "You can only constrain content models. You can't add things to content models." This is like inheritance where you can override but not add new methods. (The difference is that the overriding elements/methods don't have the same names as the ones they override.) XSLT can easily change your specialized content to the more general form, mostly just by changing element names. This is DITA as it exists today.
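To make the "mostly just changing element names" point concrete, here's a minimal sketch in Python rather than XSLT. It is not Hennum's code or DITA's actual stylesheets. The `BASE_TYPE` table and the `generalize` function are hypothetical names of mine, and the hard-coded table is an assumption for illustration; the talk said the real thing relies on DTDs and attributes to record which base type each specialized element derives from.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from specialized element names to their base types,
# using only the examples from the talk (a steps is an ol, a step is an li,
# a taskbody is a body). A real system would not hard-code this table.
BASE_TYPE = {"taskbody": "body", "steps": "ol", "step": "li"}


def generalize(root):
    """Rename specialized elements in place to their more general form."""
    for element in root.iter():
        # Elements with no entry in the table are left unchanged.
        element.tag = BASE_TYPE.get(element.tag, element.tag)
    return root


specialized = (
    "<taskbody><steps>"
    "<step>Open the file.</step><step>Save it.</step>"
    "</steps></taskbody>"
)
general = ET.tostring(generalize(ET.fromstring(specialized)), encoding="unicode")
print(general)
```

Because specialization here only constrains and renames, the content and nesting survive untouched; only the tag names change, which is why going from the specialized form back to the general one is so cheap.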

What else is needed? Obviously, the ability to add properties to subclasses rather than just rename them; that is, subtypes can have child elements the supertype doesn't have.

Eric van der Vlist is discussing "RDF Query By example." He's doing a presentation with only angle brackets. Even I don't go this far. I write my notes in XML too, but I add a stylesheet to change them to HTML before displaying them.

Eric van der Vlist's notes

He needed to work with LDAP, which is both a graph and a tree. "RDF is a nice way of modelling graphs."

The first (and last) real case study I've seen at this show is Jeff Beck describing "How XML made the NIH 'Policy on Enhancing Public Access to Archived Publications Resulting from NIH-Funded Research.'" The extreme part is the public access policy. (Not so extreme: it's voluntary.) The XML part is PubMed Central, a stable archive of NIH-funded research publications. (This is not the same as PubMed, which only contains abstracts, though about 13 million of them. This contains full articles.) About 1400 NIH-funded papers a week are published. Every submitted article is converted into XML (often from PDF!?) (Many are submitted in SGML.) These papers need to be accessible without plugins and on slow connections (e.g. in the field in Africa). Special characters and math were thus a problem. Unicode is part of the answer but only part. They are scanning back issues. Only 11 manuscripts are available so far. More are coming. Articles are delayed six months after original paper publication before being posted online. They validate with DTDs and XSLT.

Here's a picture of the poster that collected comments, suggestions, and RFEs for the XML randomizer I talked about on Tuesday:

Obscuring XML RFEs

Some good ideas here and some wild ones, and those are not necessarily non-intersecting sets. It occurs to me that this pretty well describes the whole conference.

Traditionally C. Michael Sperberg-McQueen gives the closing keynote, and this year is no exception. The official title is "Getting it in writing: The letter killeth, but the spirit giveth life. Or was it the other way round?"

C. Michael Sperberg-McQueen closing keynote

By the way, I apologize for the quality of some of the pictures. The lighting in the room we're in this year really seems to disagree with my camera (a Panasonic Lumix FZ5).