Determine the encoding from the charset parameter of the MIME media type in the HTTP Content-Type header, or from similar out-of-band metadata
If the document being parsed as text is in fact an XML document, use the normal XML heuristics (byte order mark, then the encoding declaration)
Refer to the encoding attribute
encoding
UTF-8
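
The steps above can be sketched roughly as follows. This is a minimal illustration, not a normative algorithm. It assumes the steps apply in the order listed: the charset parameter of the Content-Type header first, then the XML byte order mark or encoding declaration, then UTF-8 as the default. The function names (charset_from_content_type, sniff_xml_encoding, detect_encoding) are hypothetical, and the XML sniffing is deliberately simplified: it does not handle UTF-32 or a UTF-16 document without a byte order mark.

```python
import re
from email.message import Message

# Byte order marks checked by this sketch (UTF-32 is intentionally omitted).
_BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
)

# Matches encoding="..." inside an XML declaration at the very start of the data.
_XML_DECL = re.compile(
    rb"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def charset_from_content_type(header):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    if not header:
        return None
    msg = Message()
    msg["Content-Type"] = header
    return msg.get_content_charset()


def sniff_xml_encoding(data):
    """Guess the encoding of XML bytes: BOM, then encoding declaration, then UTF-8."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # assumed default when nothing else is declared


def detect_encoding(content_type, data):
    """Prefer the transport-level charset; otherwise fall back to XML sniffing."""
    return charset_from_content_type(content_type) or sniff_xml_encoding(data)
```

For example, detect_encoding("text/xml; charset=ISO-8859-1", data) takes the charset from the header, while detect_encoding("text/xml", data) falls through to the XML declaration in data, if there is one.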