XML News from Sunday, June 4, 2006

John Cowan has posted the seventh release candidate of TagSoup, an open source, Java-language SAX parser for nasty, ugly HTML. I use TagSoup to convert JavaDoc to well-formed XHTML. The previous release, RC6, focused on namespaces. According to Cowan,

Mike Bremford sent me a patch that causes TagSoup to send the system and public IDs to the LexicalHandler if there is a DOCTYPE declaration present in the input. Formerly, DOCTYPE declarations were simply ignored. This patch is too good to reject even at this stage, and with a few emendations it passed all my acceptance tests, so I've incorporated it.

In addition, the last known remaining bug was removed. In the last few releases, the script element was allowed to be a root element (a by-product of allowing it anywhere). Now it will be wrapped in an html element instead. This eliminates some random newlines that were being added at the end of such a root-level script element as well.
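In practice, the DOCTYPE change means a client that registers a LexicalHandler now gets a startDTD callback carrying the document's name, public ID, and system ID. The following is a minimal sketch of such a handler. It is not Cowan's code. So that it runs with only the JDK, it swaps in the JDK's built-in SAX parser and well-formed input. The class and method names (DoctypeDemo, readDoctype) are hypothetical. With TagSoup on the classpath, the reader would presumably be constructed as `new org.ccil.cowan.tagsoup.Parser()` and registered the same way, through the standard lexical-handler property.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class; not part of TagSoup.
public class DoctypeDemo {

    /** Records whatever the parser reports through LexicalHandler.startDTD. */
    static class DoctypeRecorder extends DefaultHandler2 {
        String name, publicId, systemId;

        @Override
        public void startDTD(String name, String publicId, String systemId) {
            this.name = name;
            this.publicId = publicId;
            this.systemId = systemId;
        }
    }

    /** Returns {name, publicId, systemId} from the DOCTYPE, or three nulls if there is none. */
    public static String[] readDoctype(String doc) {
        try {
            // JDK parser used here as a stand-in; with TagSoup this could be
            //   XMLReader reader = new org.ccil.cowan.tagsoup.Parser();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            DoctypeRecorder recorder = new DoctypeRecorder();
            reader.setContentHandler(recorder);
            // Register the lexical handler through the standard SAX property.
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", recorder);
            // Don't fetch the external DTD over the network; substitute an empty one.
            reader.setEntityResolver((pub, sys) -> new InputSource(new StringReader("")));
            reader.parse(new InputSource(new StringReader(doc)));
            return new String[] {recorder.name, recorder.publicId, recorder.systemId};
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body/></html>";
        String[] ids = readDoctype(xhtml);
        System.out.println("root=" + ids[0] + " public=" + ids[1] + " system=" + ids[2]);
    }
}
```

Before the patch, TagSoup simply ignored the DOCTYPE, so a handler like this one would have seen no startDTD call on HTML input.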

TagSoup is dual-licensed under the Academic Free License and the GPL.