The W3C XML Core Working Group has published a note on Legacy extended IRIs for XML resource identification. "For historic reasons, some formats have allowed variants of IRIs that are somewhat less restricted in syntax, for example XML system identifiers and W3C XML Schema anyURIs. This document provides a definition and a name (Legacy Extended IRI or LEIRI) for these variants for easy reference. These variants have to be used with care; they require further processing before being fully interchangeable as IRIs. New protocols and formats should not use Legacy Extended IRIs." Characters allowed in LEIRIs but not IRIs include:
- Space (U+0020)
- Delimiters "<" (U+003C), ">" (U+003E) and '"' (U+0022)
- Unwise characters "\" (U+005C), "^" (U+005E), "`" (U+0060), "{" (U+007B), "|" (U+007C) and "}" (U+007D)
- The controls (C0 controls, DEL and C1 controls, U+0000-U+001F and U+007F-U+009F)
- Bidi formatting characters (U+200E, U+200F, U+202A-202E)
- Specials (U+FFF0-FFFD)
- Private use code points (U+E000-F8FF, U+F0000-FFFFD, U+100000-10FFFD)
- Tags (U+E0000-E0FFF)
- Non-characters (U+FDD0-FDEF, U+1FFFE-1FFFF, U+2FFFE-2FFFF, U+3FFFE-3FFFF, U+4FFFE-4FFFF, U+5FFFE-5FFFF, U+6FFFE-6FFFF, U+7FFFE-7FFFF, U+8FFFE-8FFFF, U+9FFFE-9FFFF, U+AFFFE-AFFFF, U+BFFFE-BFFFF, U+CFFFE-CFFFF, U+DFFFE-DFFFF, U+EFFFE-EFFFF, U+FFFFE-FFFFF, U+10FFFE-10FFFF)
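The ranges above translate fairly directly into a lookup table. The following Python sketch is illustrative only: the names `LEIRI_ONLY_RANGES`, `is_leiri_only` and `leiri_to_iri` are made up for this example and do not come from the note. The conversion step assumes that the "further processing" mentioned in the note amounts to percent-encoding each such character's UTF-8 bytes, in the manner of RFC 3987; check the note itself before relying on that.

```python
# Inclusive code point ranges copied from the list above.
LEIRI_ONLY_RANGES = [
    (0x0020, 0x0020),                                       # space
    (0x003C, 0x003C), (0x003E, 0x003E), (0x0022, 0x0022),   # < > "
    (0x005C, 0x005C), (0x005E, 0x005E), (0x0060, 0x0060),   # \ ^ `
    (0x007B, 0x007D),                                       # { | }
    (0x0000, 0x001F), (0x007F, 0x009F),                     # C0, DEL, C1
    (0x200E, 0x200F), (0x202A, 0x202E),                     # bidi formatting
    (0xFFF0, 0xFFFD),                                       # specials
    (0xE000, 0xF8FF), (0xF0000, 0xFFFFD),                   # private use
    (0x100000, 0x10FFFD),                                   # private use
    (0xE0000, 0xE0FFF),                                     # tags
    (0xFDD0, 0xFDEF),                                       # non-characters
]
# The last two code points of planes 1 through 16:
# U+1FFFE-1FFFF, U+2FFFE-2FFFF, ..., U+10FFFE-10FFFF.
LEIRI_ONLY_RANGES += [
    ((plane << 16) | 0xFFFE, (plane << 16) | 0xFFFF) for plane in range(1, 17)
]


def is_leiri_only(ch: str) -> bool:
    """Return True if ch is in one of the ranges listed above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in LEIRI_ONLY_RANGES)


def leiri_to_iri(leiri: str) -> str:
    """Percent-encode the UTF-8 bytes of every listed character.

    Assumption: this mirrors the percent-encoding procedure of RFC 3987;
    it is a sketch, not a full IRI processor.
    """
    out = []
    for ch in leiri:
        if is_leiri_only(ch):
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(leiri_to_iri("http://example.org/my file{1}.xml"))
```

A linear scan over a short list of ranges is enough for occasional checks; a processor handling many identifiers would probably precompile the ranges into a regular expression or a sorted table.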