The document is encoded in UTF-8, and the text inside the root element consists of two non-ASCII characters, U+10000 and U+10FFFD. Both lie outside the Basic Multilingual Plane, so each expands to a surrogate pair in UTF-16.
<!DOCTYPE doc [ <!ELEMENT doc (#PCDATA)> ]> <doc>𐀀</doc>
Expected result | Actual result for net.sf.saxon.aelfred.SAXDriver |
---|---|
<?xml version="1.0" encoding="UTF-8"?> <ConformanceResults> <startDocument/> <startElement> <namespaceURI/> <localName>doc</localName> <qualifiedName>doc</qualifiedName> <attributes/> </startElement> <char>\uD800</char> <char>\uDC00</char> <char>\uDBFF</char> <char>\uDFFD</char> <endElement> <namespaceURI/> <localName>doc</localName> <qualifiedName>doc</qualifiedName> </endElement> <endDocument/> </ConformanceResults> | <?xml version="1.0" encoding="UTF-8"?> <ConformanceResults> <startDocument/> <resolveEntity> <systemID>file:/home/elharo/SAXTest/xmlconf/xmltest/valid/sa/052.xml</systemID> </resolveEntity> <startElement> <namespaceURI/> <localName>doc</localName> <qualifiedName>doc</qualifiedName> <attributes/> </startElement> <char>\uD800</char> <char>\uDC00</char> <char>\uDBFF</char> <char>\uDFFD</char> <endElement> <namespaceURI/> <localName>doc</localName> <qualifiedName>doc</qualifiedName> </endElement> <endDocument/> </ConformanceResults> |
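
The four `<char>` values in the results above follow from the standard UTF-16 surrogate arithmetic. The following is a minimal sketch of that arithmetic, not part of the test suite; the helper names `to_surrogate_pair` and `utf16_units` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into
    its UTF-16 high and low surrogate code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def utf16_units(text: str) -> List[str]:
    """Return the UTF-16 code units of text, formatted like the
    <char> entries in the result table (hypothetical helper)."""
    units: List[int] = []
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            units.extend(to_surrogate_pair(cp))
        else:
            units.append(cp)
    return ["\\u%04X" % u for u in units]


# The two characters assumed to form the content of <doc>.
for unit in utf16_units("\U00010000\U0010FFFD"):
    print(unit)
```

U+10000 is the lowest supplementary code point, and U+10FFFD is close to the upper limit of the Unicode range, so the pair exercises both ends of the surrogate space.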