XML News from Wednesday, June 2, 2004

The IETF has posted a new working draft of Internationalized Resource Identifiers (IRIs). "An IRI is a sequence of characters from the Universal Character Set (Unicode/ISO 10646). A mapping from IRIs to URIs is defined, which means that IRIs can be used instead of URIs where appropriate to identify resources." In other words this lets you write URLs that use non-ASCII characters such as http://www.libération.fr/. The non-ASCII characters would be converted to a genuine URI using hexadecimally escaped UTF-8. For instance, http://www.libération.fr/ becomes http://www.lib%C3%A9ration.fr/. There's also an alternative, more complicated syntax to be used when the DNS doesn't allow percent escaped domain names. However, the other parts of the IRI (fragment ID, path, scheme, etc.) always use percent escaping.