XML News from Tuesday, October 19, 2004

Yesterday evening I was doing some programming with DOM, when I was reminded of the importance of failing fast (as well as just how much I hate DOM). I was running the XInclude Test Suite across my DOMXIncluder and logging the results to a simple, record-like XML document. The format for the output was suggested by the XInclude working group, but it's quite simple: no namespaces; nothing fancy. It looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<testresults processor="com.elharo.xml.xinclude.DOMXIncluder">
    <testresult id="imaq-include-xml-01" result="pass"/>
    <testresult id="imaq-include-xml-02" result="pass"/>
    <testresult id="imaq-include-xml-03" result="skipped">
        <note>DOMXIncluder does not support the xpointer scheme</note>
    </testresult>
    <testresult id="imaq-include-xml-04" result="pass"/>
    <testresult id="imaq-include-xml-05" result="pass"/>
    <testresult id="imaq-include-xml-06" result="fail"/>
...

One of the things I logged into the document was exception messages encountered when running any one of the 150 or so tests. Somewhere along the line one or more of the exception messages I logged was null or contained an XML illegal character such as a form feed. However, I'm still not sure which ones because DOM doesn't actually complain if you create a text node with malformed data that cannot possibly be serialized. Quite a while later, when I was serializing the document, the serializer complained and died with an unhelpful error message that didn't actually tell me where to find the problem. (I tried two serializers. Apache's XMLSerializer complained about a bad character, but didn't tell me what the character was or where it appeared. JAXP's ID transform simply generated a blank document without any error message.) As a kludgy fix, for the time being I've stopped logging the exception messages into DOM.

The problem is not that the exception messages contained illegal characters. If I had been informed of this, it would have been trivial to work around it. The problem was that DOM didn't bother checking for this, and blindly created a malformed document. XOM would have caught the error immediately when it happened, rather than waiting for the entire document to be serialized. It would have pinpointed exactly where the problem was so I could fix it. Draconian error handling is a feature, not a bug. It is the API's responsibility to detect bad input. It must not rely on client programmers to provide correct data. Even when the programmers are experts who really do know all the ins and outs of which input is legal, they may not be creating the input by hand. They are often passing in data from another source that has no idea it is talking to an API with particular preconditions. Precondition verification is a sine qua non for robust, published APIs; and it is a sine qua non that DOM fails to implement.