A W3C standard for determining when two documents are the same after:
Entity references are resolved
Document is converted to Unicode (UTF-8)
Unicode combining forms are combined
Comments are stripped
White space is normalized inside tags and outside the root element (white space in element content is kept)
Default attribute values are added
If at all possible, your programs should depend only on the canonical form of the document
Canonical form of hotcop.xml:
<?xml-stylesheet type="text/css" href="song.css"?>
<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>A &amp; M Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
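
The idea of comparing documents by canonical form can be sketched in Python. This is only an illustration, not part of the original notes. It assumes Python 3.8 or later, whose standard library function xml.etree.ElementTree.canonicalize implements Canonical XML 2.0 (a later version of the standard, so details may differ from the form shown above). The two sample documents are made up for the example.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that differ only in ways canonicalization removes:
# XML declaration, attribute order, quote style, a comment,
# a character reference vs. an entity reference, and empty-element syntax.
doc_a = (
    '<?xml version="1.0"?>'
    "<SONG b=\"2\" a='1'><!-- a comment -->"
    "<PUBLISHER>A &#38; M Records</PUBLISHER><EMPTY/></SONG>"
)
doc_b = (
    '<SONG a="1" b="2">'
    "<PUBLISHER>A &amp; M Records</PUBLISHER><EMPTY></EMPTY></SONG>"
)

# canonicalize() returns the canonical form as a string when no output
# file is given; comments are dropped by default (with_comments=False).
canon_a = ET.canonicalize(doc_a)
canon_b = ET.canonicalize(doc_b)

print(canon_a)
print("same document:", canon_a == canon_b)
```

A program that compares canon_a and canon_b, rather than the raw text, does not depend on how the author happened to quote attributes or write empty elements.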