Four-byte UTF-8 encodings can represent UCS-4 characters that lie beyond the range of legal XML characters (and that cannot be expressed as Unicode surrogate pairs). This document contains such a character.
<doc>�</doc>
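To make the description concrete, here is a minimal sketch of how the four-byte UTF-8 bit pattern can carry a value above U+10FFFF, the highest code point XML allows. The helper name `encode_four_byte` and the sample value 0x110000 are assumptions chosen for illustration. The actual bytes in the test document are not shown in this report and may differ.

```python
# Highest code point permitted by the XML Char production.
XML_MAX_CODE_POINT = 0x10FFFF


def encode_four_byte(code_point: int) -> bytes:
    """Pack a code point into the 4-byte pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.

    Hypothetical helper: it applies the bit layout mechanically and does
    not check whether the result is a legal character.
    """
    if not 0x10000 <= code_point <= 0x1FFFFF:
        raise ValueError("value does not fit the four-byte form")
    return bytes([
        0xF0 | (code_point >> 18),           # lead byte carries the top 3 bits
        0x80 | ((code_point >> 12) & 0x3F),  # continuation bytes carry 6 bits each
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])


# The last legal character still round-trips through a strict decoder.
legal = encode_four_byte(XML_MAX_CODE_POINT)
assert legal.decode("utf-8") == "\U0010FFFF"

# One past the limit fits the same bit pattern but is not a character.
# 0x110000 is an assumed sample value, not taken from the test document.
illegal = encode_four_byte(XML_MAX_CODE_POINT + 1)

try:
    illegal.decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True  # a strict decoder refuses the sequence
```

The four-byte form has 21 payload bits, so it can express values up to 0x1FFFFF, while surrogate pairs stop at U+10FFFF. A conforming XML parser is expected to treat such a sequence as a fatal error, which is what the results below record.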
Expected result | Actual result for org.apache.crimson.parser.XMLReaderImpl |
---|---|
<?xml version="1.0" encoding="UTF-8"?> <ConformanceResults> <startDocument/> <fatalError/> <endDocument/> </ConformanceResults> | <?xml version="1.0" encoding="UTF-8"?> <ConformanceResults> <startDocument/> <fatalError/> </ConformanceResults> |