•Can’t use regular expressions:
•Detecting encoding
•Comments and processing instructions that contain tags
•CDATA sections
•Unexpected placement of spaces and line breaks within tags
•Default attribute values
•Character and entity references
•Malformed documents
•Internal DTD subset
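Several of the pitfalls listed above can be shown side by side with a short Python sketch. It assumes Python 3.9+ and uses the standard-library `re` and `xml.etree.ElementTree` modules. The document `DOC`, the pattern `TITLE_RE`, and the helpers `titles_by_regex` and `title_by_parser` are hypothetical names chosen for illustration. The sketch covers comments that contain tags, CDATA sections, line breaks inside tags, character and entity references, and encoding detection. It does not cover default attribute values or malformed documents.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document exercising several pitfalls at once.
DOC = """<?xml version="1.0"?>
<book>
  <!-- <title>Draft title</title> -->
  <title
     lang="en">Tom &amp; Jerry &#x21;</title>
  <note><![CDATA[<title>not a title</title>]]></note>
</book>"""

# Naive approach: grab whatever sits between <title> and </title>.
TITLE_RE = re.compile(r"<title>(.*?)</title>")


def titles_by_regex(text: str) -> list[str]:
    """Return every substring the naive pattern captures."""
    return TITLE_RE.findall(text)


def title_by_parser(data) -> tuple[str, str]:
    """Return (text, lang) of the real <title> child via a conforming parser."""
    root = ET.fromstring(data)
    title = root.find("title")
    return title.text, title.get("lang")


if __name__ == "__main__":
    # The regex picks up the commented-out tag and the CDATA text, misses
    # the real element (line break inside its start tag), and could never
    # resolve &amp; or &#x21; on its own.
    print(titles_by_regex(DOC))  # ['Draft title', 'not a title']
    # The parser skips comments, treats CDATA as text, tolerates whitespace
    # inside tags, and resolves entity and character references.
    print(title_by_parser(DOC))  # ('Tom & Jerry !', 'en')

    # The same document as UTF-16 with a byte order mark: a byte-level regex
    # finds nothing, while the parser detects the encoding itself.
    utf16 = DOC.encode("utf-16")
    print(re.findall(rb"<title>(.*?)</title>", utf16))  # []
    print(title_by_parser(utf16))  # ('Tom & Jerry !', 'en')
```

Making the regex cover each of these cases separately would still leave the remaining items on the list open. The parser handles them because it implements the XML specification rather than matching text patterns.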
•Why not use a parser?
•Unfamiliarity with parsers
•Too slow
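Unfamiliarity is often the larger obstacle. A streaming parser can take only a few lines. The sketch below uses the standard-library `xml.etree.ElementTree.iterparse`, which reports each element as it finishes, so elements can be handled and cleared one at a time instead of building a full tree first. The function name `count_elements` and the sample input are hypothetical. The sketch makes no claim about speed relative to regular expressions. Note also that cleared elements remain attached to their parent as empty nodes, so memory use is reduced but not constant.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(source, tag: str) -> int:
    """Count elements named `tag` in a file name or binary file object."""
    count = 0
    # "end" events fire once an element and all of its children are parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
        # Drop the element's text, attributes, and children now that we
        # are done with it.
        elem.clear()
    return count


if __name__ == "__main__":
    data = io.BytesIO(b"<root><item/><item><item/></item><other/></root>")
    print(count_elements(data, "item"))  # 3
```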