XLinks and XPointers

Elliotte Rusty Harold

XML One 2001 Austin

March 7, 2001

elharo@metalab.unc.edu

http://www.ibiblio.org/xml/

Please turn off all

Cell Phones
Pagers
Alarm Watches
etc.
and set your notebook's volume to zero

HTML Hypertext is Limited

The Web conquered gopher for one reason: HTML made it possible to embed hypertext links in documents.
HTML linking has limits

You can only link to one document at a time
You must link to the entire document.
Once the link is traversed the trail of where you've been is lost.

Includes are server dependent and don't work across domains
Links break

XML Hypertext

Linking in XML is divided into multiple parts:

A Uniform Resource Identifier (URI) names or locates a resource
An XLink defines connections between two or more documents identified by URIs
XPath identifies particular nodes within a document
An XPointer adds an XPath to a URI
XBase defines the URI against which relative URIs are resolved
XInclude embeds a document identified by a URI inside an XML document.

XML Hypertext Example

<?xml version="1.0"?>
<story date="January 9, 2001"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:xinclude="http://www.w3.org/1999/XML/xinclude"
       xml:base="http://www.cafeaulait.org/">

  <p>
    The W3C XML Linking Working Group has pushed the 
    <link xlink:type="simple"
      xlink:href="http://www.w3.org/TR/2001/WD-xptr-20010108">
      XPointer specification
    </link> 
    back to working draft status. The specific issue that was 
    uncovered during Candidate Recommendation was some 
    <link xlink:type="simple"
      xlink:href="http://www.w3.org/TR/xptr#xpointer(//div[@class='div3'][7])">
      confusion
    </link> 
    over how to integrate XPointers, particularly those in non-XML documents, 
    with namespaces. 
   </p>

   <p>
     It's also come to light in this draft that Sun has 
     <link xlink:type="simple"
      xlink:href=
      "http://lists.w3.org/Archives/Public/www-xml-linking-comments/2000OctDec/0092.html"
      >
      claimed a patent</link> on some of the technologies needed to 
      implement XPointer. I think this is particularly offensive because Eve 
      L. Maler, a Sun employee, served as co-chair of the XML Linking 
      Working Group and a co-editor of the XPointer specification. As usual 
      Sun wants to use this as a club to lock implementers and users into a 
      licensing agreement that goes beyond what Sun and the W3C could 
      otherwise demand. The specific patent is <cite>United States Patent 
      No. 5,659,729, Method and system for implementing hypertext scroll 
      attributes</cite>, issued to Jakob Nielsen in 1997. The patent was 
      filed on February 1, 1996. It claims:
  </p>
  <blockquote>
    <xinclude:include 
      href=
      "http://www.delphion.com/details?&amp;pn=US05659729__#xpointer(//abstract)"
      >
    </xinclude:include>
  </blockquote>
  
</story>

Versions

This talk covers:

XLinks: December 20, 2000 Proposed Recommendation
XPointers: January 8, 2001 2nd Last Call Working Draft
XPath: November 16, 1999 1.0 Specification
XInclude: October 26, 2000 Working Draft
XBase: December 20, 2000 Proposed Recommendation

Part I: XLinks

Once you've tasted XLink's Chunky Monkey, it's hard to reconcile yourself to HTML's vanilla.

--John E. Simpson on the xsl-list mailing list

XLinks are More Powerful

Designed especially for use with XML
Multidirectional
Any element can be a link, not just <A>
Can link to arbitrary positions in the document

Application Support

No general-purpose Web browsers or other applications support arbitrary XLinks.
XLinks have a much broader base of applicability than HTML links. They can be used by any custom application that needs to establish connections between documents and parts of documents, for any reason.
Even when XLinks are fully implemented in browsers they may not always be blue underlined text that you click to jump to another page.

Linking Elements

Any element can be a link
XLink elements are identified by an xlink:type attribute with one of these six values:
- simple
- extended
- locator
- arc
- resource
- title
Linking elements are identified by an xlink:type attribute with one of these two values:
- simple
- extended
Each linking element contains an xlink:href attribute whose value is the URI of the resource being linked to.
An xmlns:xlink attribute associates the xlink prefix with the http://www.w3.org/1999/xlink namespace.

For example

<FOOTNOTE xmlns:xlink="http://www.w3.org/1999/xlink"
          xlink:type="simple"
          xlink:href="footnote7.xml">7</FOOTNOTE>
<COMPOSER xmlns:xlink="http://www.w3.org/1999/xlink"
          xlink:type="simple"
          xlink:href="http://www.interport.net/~beand/">
    Beth Anderson
</COMPOSER>
<IMAGE xmlns:xlink="http://www.w3.org/1999/xlink"
       xlink:type="simple" xlink:href="logo.gif"/>
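
As a rough illustration of how an application might recognize such linking elements, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The DOCUMENT wrapper element, the NOTE element, and the function name find_simple_links are assumptions made for this example; they do not come from the XLink specification.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical wrapper document holding the three linking elements shown above.
SAMPLE = """<DOCUMENT xmlns:xlink="http://www.w3.org/1999/xlink">
  <FOOTNOTE xlink:type="simple" xlink:href="footnote7.xml">7</FOOTNOTE>
  <COMPOSER xlink:type="simple"
            xlink:href="http://www.interport.net/~beand/">Beth Anderson</COMPOSER>
  <IMAGE xlink:type="simple" xlink:href="logo.gif"/>
  <NOTE>Not a link</NOTE>
</DOCUMENT>"""


def find_simple_links(xml_text):
    """Return (element name, href) pairs for every simple link in the text."""
    root = ET.fromstring(xml_text)
    links = []
    for element in root.iter():
        # ElementTree spells namespaced attributes as {namespace-uri}local-name.
        if element.get(f"{{{XLINK_NS}}}type") == "simple":
            links.append((element.tag, element.get(f"{{{XLINK_NS}}}href")))
    return links


simple_links = find_simple_links(SAMPLE)
for name, href in simple_links:
    print(name, "->", href)
```

The element name plays no part in the test: only the xlink:type attribute decides whether an element is a link.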

Declaring XLink Attributes in DTDs

<!ELEMENT FOOTNOTE (#PCDATA)>
<!ATTLIST FOOTNOTE
  xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  CDATA #FIXED "simple"
  xlink:href  CDATA #REQUIRED
>
<!ELEMENT COMPOSER (#PCDATA)>
<!ATTLIST COMPOSER
  xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  CDATA #FIXED "simple"
  xlink:href  CDATA #REQUIRED
>
<!ELEMENT IMAGE EMPTY>
<!ATTLIST IMAGE
  xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  CDATA #FIXED "simple"
  xlink:href  CDATA #REQUIRED
>

Fixed Attributes

<FOOTNOTE xlink:href="footnote7.xml">7</FOOTNOTE>
<COMPOSER xlink:href="http://www.interport.net/~beand/">
  Beth Anderson
</COMPOSER>
<IMAGE xlink:href="logo.gif"/>

Other Attributes

<!ELEMENT IMAGE EMPTY>
<!ATTLIST IMAGE
  xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  CDATA #FIXED "simple"
  xlink:href  CDATA #REQUIRED
  ALT         CDATA #REQUIRED
  HEIGHT      CDATA #REQUIRED
  WIDTH       CDATA #REQUIRED
>

Descriptions of the Remote Resource

A link element may contain optional xlink:role and xlink:title attributes that describe the remote resource, that is, the document or other resource to which the link points.
The title contains a short plain text description.

The role contains a URI pointing to a long description.

<AUTHOR 
 xmlns:xlink="http://www.w3.org/1999/xlink"
 xlink:href="mailto:elharo@metalab.unc.edu"
 xlink:title="Send email to Elliotte Rusty Harold" 
 xlink:role="http://www.macfaq.com/personal.html">
  Please drop me a line.
</AUTHOR>

As with all other attributes, the xlink:title and xlink:role attributes should be declared in the DTD for all the elements to which they belong. For example, this is a reasonable declaration for the above AUTHOR element:

<!ELEMENT AUTHOR (#PCDATA)>
<!ATTLIST AUTHOR
  xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  CDATA #FIXED "simple"
  xlink:href  CDATA #REQUIRED
  xlink:title CDATA #IMPLIED
  xlink:role  CDATA #IMPLIED
>

Link Behavior

Linking elements can contain two more optional attributes that suggest to applications how the remote resource is associated with the current page. These are:

xlink:show suggests where the content should be displayed when the link is activated
xlink:actuate suggests whether the link should be traversed automatically or whether a specific user request is required
These are application dependent, however, and applications are free to ignore the suggestions.
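
Because these attributes are only suggestions, an application has to read them and decide for itself what to do. The following Python sketch shows one way that reading step might look. The function names link_behavior and should_load_immediately are hypothetical, and the sketch only sees attributes written in the instance document, since xml.etree.ElementTree does not read defaults from an external DTD.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# Predefined values for the two behavior attributes.
SHOW_VALUES = {"replace", "new", "embed", "other", "none"}
ACTUATE_VALUES = {"onRequest", "onLoad", "other", "none"}


def link_behavior(element):
    """Return the (show, actuate) suggestions found on a linking element.

    A missing attribute is reported as None, leaving the choice to the
    application. An unrecognized value raises ValueError.
    """
    show = element.get(f"{{{XLINK_NS}}}show")
    actuate = element.get(f"{{{XLINK_NS}}}actuate")
    if show is not None and show not in SHOW_VALUES:
        raise ValueError(f"unknown xlink:show value: {show}")
    if actuate is not None and actuate not in ACTUATE_VALUES:
        raise ValueError(f"unknown xlink:actuate value: {actuate}")
    return show, actuate


def should_load_immediately(element):
    """True when the link asks to be traversed as soon as the document loads."""
    return link_behavior(element)[1] == "onLoad"


image = ET.fromstring(
    '<IMAGE xmlns:xlink="http://www.w3.org/1999/xlink" '
    'xlink:type="simple" xlink:href="logo.gif" '
    'xlink:show="embed" xlink:actuate="onLoad"/>'
)
print(link_behavior(image))
```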

xlink:show

The xlink:show attribute has five predefined values:
- replace
- new
- embed
- other
- none

Like all attributes in valid documents, the xlink:show attribute must be declared in the DTD in an <!ATTLIST> declaration for each link element that uses it. For example:

<!ELEMENT WEBSITE (#PCDATA)>
<!ATTLIST WEBSITE 
    xmlns:xlink CDATA  #FIXED "http://www.w3.org/1999/xlink"
    xlink:type CDATA   #FIXED "simple"
    xlink:href CDATA   #REQUIRED
    xlink:show (new | replace | embed) "replace"
>

xlink:actuate

A linking element's xlink:actuate attribute has four predefined values:

onRequest
onLoad
other
none

<IMAGE 
  xmlns:xlink="http://www.w3.org/1999/xlink" 
       xlink:type="simple" xlink:href="logo.gif"
       xlink:actuate="onLoad"/>

Like all attributes in valid documents, the xlink:actuate attribute must be declared in the DTD in an <!ATTLIST> declaration for the link elements in which it appears. For example:

<!ELEMENT IMAGE EMPTY>
<!ATTLIST IMAGE 
 xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
  xlink:type CDATA #FIXED "simple"
  xlink:href CDATA #REQUIRED
  xlink:show    (new | replace | embed) "embed"
  xlink:actuate (onRequest | onLoad)    "onLoad"
>

Parameter Entities for Link Attributes

<!ENTITY % link-attributes
   "xlink:type     CDATA  #FIXED 'simple'
    xlink:role     CDATA  #IMPLIED
    xlink:title    CDATA  #IMPLIED

    xmlns:xlink    CDATA  #FIXED 'http://www.w3.org/1999/xlink'
    xlink:href     CDATA  #REQUIRED
    xlink:show     (new | replace | embed) 'replace'
    xlink:actuate  (onRequest | onLoad)    'onRequest'"
>

<!ELEMENT COMPOSER (#PCDATA)>
<!ATTLIST COMPOSER 
    %link-attributes;
>
<!ELEMENT AUTHOR (#PCDATA)>
<!ATTLIST AUTHOR
    %link-attributes;
>
<!ELEMENT WEBSITE (#PCDATA)>
<!ATTLIST WEBSITE
    %link-attributes;
>

Extended Links

Simple links are very similar to HTML links: one-directional links from one element to one document.
Extended links are multi-directional, many-to-many links
An extended link is a list of nodes and a list of the connections between them

Extended Links

An extended link is included in an XML document as an element of some arbitrary type like COMPOSER or TEAM that has an xlink:type attribute with the value extended.

<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink"
         xlink:type="extended">
 ...
</WEBSITE>

Resources

Extended links generally point to more than one target and from more than one source. Both sources and targets are called by the more generic word resource.
Resources are divided into remote resources and local resources.
A local resource is actually contained inside the extended link element. It is enclosed in an element of arbitrary type that has an xlink:type attribute with the value resource.
A remote resource exists outside the extended link element, very possibly in another document. The extended link element contains locator child elements that point to the remote resource. These are elements with any name that have an xlink:type attribute with the value locator. Each locator element has an xlink:href attribute whose value is a URI locating the remote resource.

Resource Example

<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink" 
         xlink:type="extended">
  <NAME xlink:type="resource">Cafe au Lait</NAME>
  <HOMESITE xlink:type="locator" 
            xlink:href="http://ibiblio.org/javafaq/"/>
  <MIRROR xlink:type="locator" 
          xlink:href="http://sunsite.kth.se/javafaq"/>
  <MIRROR xlink:type="locator" 
          xlink:href="http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
  <MIRROR xlink:type="locator" 
          xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
</WEBSITE>

This WEBSITE element describes an extended link with five resources:

The text "Cafe au Lait", a local resource
The document at http://ibiblio.org/javafaq/, a remote resource
The document at http://sunsite.kth.se/javafaq, a remote resource
The document at http://sunsite.informatik.rwth-aachen.de/javafaq/, a remote resource
The document at http://sunsite.cnlab-switch.ch/javafaq/, a remote resource

Since one of the resources referenced by this extended link is contained in the extended link, it is called an inline link. It will be included as part of one of the documents it connects.
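
The split between local and remote resources can be sketched in a few lines of Python. This is only an illustration of the idea, not a full XLink processor, and the function name list_resources is made up for the example.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
XLINK_TYPE = f"{{{XLINK_NS}}}type"
XLINK_HREF = f"{{{XLINK_NS}}}href"

WEBSITE = """<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink"
         xlink:type="extended">
  <NAME xlink:type="resource">Cafe au Lait</NAME>
  <HOMESITE xlink:type="locator"
            xlink:href="http://ibiblio.org/javafaq/"/>
  <MIRROR xlink:type="locator"
          xlink:href="http://sunsite.kth.se/javafaq"/>
  <MIRROR xlink:type="locator"
          xlink:href="http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
  <MIRROR xlink:type="locator"
          xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
</WEBSITE>"""


def list_resources(extended_link):
    """Return (local, remote): local resource text and remote resource URIs."""
    local, remote = [], []
    for child in extended_link:
        kind = child.get(XLINK_TYPE)
        if kind == "resource":
            # A local resource is the content of the child element itself.
            local.append((child.text or "").strip())
        elif kind == "locator":
            # A remote resource is named by the locator's xlink:href.
            remote.append(child.get(XLINK_HREF))
    return local, remote


local_resources, remote_resources = list_resources(ET.fromstring(WEBSITE))
print(len(local_resources), "local,", len(remote_resources), "remote")
```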

Resource Example Diagram

This picture shows the WEBSITE extended link element and five resources, one of which WEBSITE contains, the other four of which are referred to by URLs. However, this just describes these resources. No connections are implied between them.

One local and four remote resources with no connections

Roles and Titles for Resources

Both the extended link element itself and the individual locator children may have descriptive attributes such as xlink:role and xlink:title.
The xlink:role and xlink:title attributes of the extended link element provide default roles and titles for each of the individual locator child elements.
Individual resource and locator elements may override these defaults with xlink:role and xlink:title attributes of their own.

<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink" 
         xlink:type="extended" xlink:title="Cafe au Lait">
  <NAME xlink:type="resource" 
        xlink:role="http://ibiblio.org/javafaq/">
    Cafe au Lait
  </NAME>
  <HOMESITE xlink:type="locator" 
          xlink:href="http://ibiblio.org/javafaq/"
          xlink:role="http://ibiblio.org/"/>
  <MIRROR xlink:type="locator" 
         xlink:title="Cafe au Lait Swedish Mirror"
         xlink:role="http://sunsite.kth.se/"
         xlink:href="http://sunsite.kth.se/javafaq"/>
  <MIRROR xlink:type="locator" 
         xlink:title="Cafe au Lait German Mirror"
         xlink:role="http://sunsite.informatik.rwth-aachen.de/"
         xlink:href=
          "http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
  <MIRROR xlink:type="locator" 
         xlink:title="Cafe au Lait Swiss Mirror"
         xlink:role="http://sunsite.cnlab-switch.ch/"
         xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
</WEBSITE>

DTD for Extended Links

<!ELEMENT WEBSITE (NAME, HOMESITE, MIRROR*) >
<!ATTLIST WEBSITE
  xmlns:xlink  CDATA     #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  (extended) #FIXED  "extended"
  xlink:title  CDATA     #IMPLIED
  xlink:role   CDATA     #IMPLIED
>

<!ELEMENT NAME (#PCDATA)>
<!ATTLIST NAME
   xlink:type  (resource) #FIXED    "resource"
   xlink:role   CDATA     #IMPLIED
   xlink:title  CDATA     #IMPLIED
>

<!ELEMENT HOMESITE (#PCDATA)>
<!ATTLIST HOMESITE
   xlink:type  (locator)  #FIXED    "locator"
   xlink:href   CDATA     #REQUIRED
   xlink:role   CDATA     #IMPLIED
   xlink:title  CDATA     #IMPLIED
>

<!ELEMENT MIRROR (#PCDATA)>
<!ATTLIST MIRROR
   xlink:type  (locator)  #FIXED    "locator"
   xlink:href   CDATA     #REQUIRED
   xlink:role   CDATA     #IMPLIED
   xlink:title  CDATA     #IMPLIED
>

Another Shortcut for the DTD

<!ENTITY % extended.att
  "xlink:type   CDATA    #FIXED 'extended'
   xmlns:xlink  CDATA    #FIXED 'http://www.w3.org/1999/xlink'
   xlink:role   CDATA    #IMPLIED
   xlink:title  CDATA    #IMPLIED"
>

<!ENTITY % resource.att
  "xlink:type (resource) #FIXED  'resource'
   xlink:role    CDATA   #IMPLIED
   xlink:title   CDATA   #IMPLIED"
>

<!ENTITY % locator.att
  "xlink:type (locator)  #FIXED  'locator'
   xlink:href    CDATA   #REQUIRED
   xlink:role    CDATA   #IMPLIED
   xlink:title   CDATA   #IMPLIED"
>

<!ELEMENT WEBSITE (NAME, HOMESITE, MIRROR*) >
<!ATTLIST WEBSITE
   %extended.att;
>

<!ELEMENT NAME (#PCDATA)>
<!ATTLIST NAME
   %resource.att;
>

<!ELEMENT HOMESITE (#PCDATA)>
<!ATTLIST HOMESITE
   %locator.att;
>

<!ELEMENT MIRROR (#PCDATA)>
<!ATTLIST MIRROR
   %locator.att;
>

Arcs

In an extended link with three resources, A, B, and C, there are nine different possible traversals.
- A --> A
- B --> B
- C --> C
- A --> B
- B --> A
- A --> C
- C --> A
- B --> C
- C --> B
These potential traversals are called arcs.
Arcs are represented in XML by elements that have an xlink:type attribute with the value arc.
Traversal rules are defined by attaching xlink:actuate and xlink:show attributes to arc elements.
An arc element has an xlink:from attribute and an xlink:to attribute.
The values of these attributes match the xlink:label attributes of the resource and locator elements in the extended link: xlink:from names where traversal starts and xlink:to names where it goes.

Arc Example

<?xml version="1.0"?>
<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink" 
         xlink:type="extended" xlink:title="Cafe au Lait">
         
  <NAME xlink:type="resource" xlink:label="source">
    Cafe au Lait
  </NAME>

  <HOMESITE xlink:type="locator" 
            xlink:href="http://ibiblio.org/javafaq/"
            xlink:label="us"/>
  
  <MIRROR xlink:type="locator" 
          xlink:title="Cafe au Lait Swedish Mirror"
          xlink:label="se"
          xlink:href="http://sunsite.kth.se/javafaq"/>
  
  <MIRROR xlink:type="locator" 
          xlink:title="Cafe au Lait German Mirror"
          xlink:label="de"
          xlink:href="http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
  
  <MIRROR xlink:type="locator" 
          xlink:title="Cafe au Lait Swiss Mirror"
          xlink:label="ch"
          xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
  
  <CONNECTION xlink:type="arc" xlink:from="source" 
              xlink:to="ch"    xlink:show="replace" 
              xlink:actuate="onRequest"/>
  <CONNECTION xlink:type="arc" xlink:from="source" 
              xlink:to="us"    xlink:show="replace" 
              xlink:actuate="onRequest"/>
  <CONNECTION xlink:type="arc" xlink:from="source" 
              xlink:to="se"    xlink:show="replace" 
              xlink:actuate="onRequest"/>
  <CONNECTION xlink:type="arc" xlink:from="source" 
              xlink:to="de"    xlink:show="replace" 
              xlink:actuate="onRequest"/>
  
</WEBSITE>

Arc Example Diagram

Arc Example

<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink" 
         xlink:type="extended" xlink:title="Cafe au Lait">
         
  <NAME xlink:type="resource" xlink:label="source">
    Cafe au Lait
  </NAME>

  <HOMESITE xlink:type="locator" 
           xlink:href="http://ibiblio.org/javafaq/"
           xlink:label="us"/>
  
  <MIRROR xlink:type="locator" 
         xlink:title="Cafe au Lait Swedish Mirror"
         xlink:label="mirror"
         xlink:href="http://sunsite.kth.se/javafaq"/>
  
  <MIRROR xlink:type="locator" 
         xlink:title="Cafe au Lait German Mirror"
         xlink:label="mirror"
         xlink:href="http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
  
  <MIRROR xlink:type="locator" 
         xlink:title="Cafe au Lait Swiss Mirror"
         xlink:label="mirror"
         xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
  
  <CONNECTION xlink:type="arc"  xlink:from="source" 
              xlink:to="mirror" xlink:show="replace" 
              xlink:actuate="onRequest"/>

</WEBSITE>

Arc Example Diagram

Arc Example with omitted to attribute

<?xml version="1.0"?>
<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink" 
         xlink:type="extended" xlink:title="Cafe au Lait">
         
  <NAME xlink:type="resource" xlink:label="source">
    Cafe au Lait
  </NAME>

  <HOMESITE xlink:type="locator" 
            xlink:href="http://ibiblio.org/javafaq/"
            xlink:label="us"/>
  
  <MIRROR xlink:type="locator" 
          xlink:title="Cafe au Lait Swedish Mirror"
          xlink:label="se"
          xlink:href="http://sunsite.kth.se/javafaq"/>
  
  <MIRROR xlink:type="locator" 
          xlink:title="Cafe au Lait German Mirror"
          xlink:label="de"
          xlink:href="http://sunsite.informatik.rwth-aachen.de/javafaq/"/>
  
  <MIRROR xlink:type="locator" 
          xlink:title="Cafe au Lait Swiss Mirror"
          xlink:label="ch"
          xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
  
  <CONNECTION xlink:type="arc" xlink:from="source" 
              xlink:show="replace" xlink:actuate="onRequest"/>

</WEBSITE>

Arc Example Diagram

Arcs can return to the same resource they started from
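
The arc examples above can be expanded into concrete traversals with a short Python sketch. It assumes, as the last diagram caption suggests, that an omitted xlink:from or xlink:to stands for every labeled resource and locator in the link, the starting resource included. The helper names q, participants, and traversals are hypothetical, and the sample link is a shortened version of the ones above.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"


def q(name):
    """Build the ElementTree spelling of an attribute in the XLink namespace."""
    return f"{{{XLINK_NS}}}{name}"


def participants(extended_link):
    """Return (label, description) pairs for each labeled resource or locator."""
    found = []
    for child in extended_link:
        kind = child.get(q("type"))
        label = child.get(q("label"))
        if label is None:
            continue
        if kind == "locator":
            found.append((label, child.get(q("href"))))
        elif kind == "resource":
            found.append((label, (child.text or "").strip()))
    return found


def traversals(extended_link):
    """Expand every arc child into (start, end) pairs of resources."""
    members = participants(extended_link)
    result = []
    for child in extended_link:
        if child.get(q("type")) != "arc":
            continue
        start, end = child.get(q("from")), child.get(q("to"))
        # Assumption: a missing from or to matches every labeled participant.
        sources = [d for label, d in members if start is None or label == start]
        targets = [d for label, d in members if end is None or label == end]
        result.extend((s, t) for s in sources for t in targets)
    return result


SHARED_LABEL = """<WEBSITE xmlns:xlink="http://www.w3.org/1999/xlink"
         xlink:type="extended">
  <NAME xlink:type="resource" xlink:label="source">Cafe au Lait</NAME>
  <MIRROR xlink:type="locator" xlink:label="mirror"
          xlink:href="http://sunsite.kth.se/javafaq"/>
  <MIRROR xlink:type="locator" xlink:label="mirror"
          xlink:href="http://sunsite.cnlab-switch.ch/javafaq/"/>
  <CONNECTION xlink:type="arc" xlink:from="source" xlink:to="mirror"/>
</WEBSITE>"""

# The same link with the xlink:to attribute left off the arc.
OMITTED_TO = SHARED_LABEL.replace(' xlink:to="mirror"', "")

shared = traversals(ET.fromstring(SHARED_LABEL))
omitted = traversals(ET.fromstring(OMITTED_TO))
print(len(shared), "traversals with a shared label")
print(len(omitted), "traversals with xlink:to omitted")
```

One arc element therefore stands for several traversals whenever a label is shared or an end of the arc is left unspecified.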

Arc DTD Fragment

<!ELEMENT WEBSITE (HOMESITE, MIRROR*, CONNECTION*) >
<!ATTLIST WEBSITE
  xmlns:xlink  CDATA  #FIXED "http://www.w3.org/1999/xlink"
  xlink:type  (extended) #FIXED  "extended"
  xlink:title  CDATA     #IMPLIED
  xlink:label  CDATA     #IMPLIED
>

<!ELEMENT HOMESITE (#PCDATA)>
<!ATTLIST HOMESITE
   xlink:type     (locator) #FIXED  "locator"
   xlink:href      CDATA    #REQUIRED
   xlink:label     CDATA    #REQUIRED
   xlink:title     CDATA    #IMPLIED
>

<!ELEMENT MIRROR (#PCDATA)>
<!ATTLIST MIRROR
   xlink:type     (locator) #FIXED  "locator"
   xlink:href      CDATA    #REQUIRED
   xlink:label     CDATA    #REQUIRED
   xlink:title     CDATA    #IMPLIED
>

<!ELEMENT CONNECTION EMPTY>
<!ATTLIST CONNECTION
  xlink:type     (arc)               #FIXED   "arc"
  xlink:from     CDATA               #IMPLIED
  xlink:to       CDATA               #IMPLIED
  xlink:show    (replace)            "replace"
  xlink:actuate (onRequest | onLoad) "onRequest"
>

Out-of-Line Links

Inline links, such as the familiar A element from HTML, are themselves part of the source or target of the link. The source of the link, that is the blue underlined text, is included inside the A element that defines the link. Most simple links are inline.
An out-of-line link does not contain any part of any of the resources it connects. Instead, the links are stored in a separate document called the linkbase.
Out-of-line links allow you to add links to and from documents that can't be modified, such as a page on someone else's web site.
Out-of-line links allow you to add links to different parts of non-XML content.
Out-of-line links are not yet supported by software.

Out of line Link Example

<COURSE xmlns:xlink="http://www.w3.org/1999/xlink"
         xlink:type="extended">

  <TOC xlink:type="locator" 
          xlink:href="http://www.ibiblio.org/javafaq/course/" 
          xlink:label="index"/>

  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week1.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week2.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week3.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week4.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week5.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week6.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week7.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week8.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week9.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week10.xml"/> 
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week11.xml"/> 
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week12.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week13.xml"/>
  
  <CONNECTION xlink:type="arc" xlink:from="index" xlink:to="class"/>
  <CONNECTION xlink:type="arc" xlink:from="class" xlink:to="index"/>
  
</COURSE>

Another Out of line Link Example

<COURSE xmlns:xlink="http://www.w3.org/1999/xlink"
         xlink:type="extended">

  <CLASS xlink:type="locator" xlink:label="1"
         xlink:href="http://www.ibiblio.org/javafaq/course/week1.xml"/>
  <CLASS xlink:type="locator" xlink:label="2" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week2.xml"/>
  <CLASS xlink:type="locator" xlink:label="3"
         xlink:href="http://www.ibiblio.org/javafaq/course/week3.xml"/>
  <CLASS xlink:type="locator" xlink:label="4"
         xlink:href="http://www.ibiblio.org/javafaq/course/week4.xml"/>
  <CLASS xlink:type="locator" xlink:label="5" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week5.xml"/>
  <CLASS xlink:type="locator" xlink:label="6"
         xlink:href="http://www.ibiblio.org/javafaq/course/week6.xml"/>
  <CLASS xlink:type="locator"  xlink:label="7" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week7.xml"/>
  <CLASS xlink:type="locator"   xlink:label="8"
         xlink:href="http://www.ibiblio.org/javafaq/course/week8.xml"/>
  <CLASS xlink:type="locator" xlink:label="9"
         xlink:href="http://www.ibiblio.org/javafaq/course/week9.xml"/>
  <CLASS xlink:type="locator"  xlink:label="10"
         xlink:href="http://www.ibiblio.org/javafaq/course/week10.xml"/> 
  <CLASS xlink:type="locator" xlink:label="11" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week11.xml"/> 
  <CLASS xlink:type="locator" xlink:label="12" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week12.xml"/>
  <CLASS xlink:type="locator" xlink:label="13" 
         xlink:href="http://www.ibiblio.org/javafaq/course/week13.xml"/>
  
  <!-- Previous Links --> 
  <CONNECTION xlink:type="arc" xlink:from="2" xlink:to="1"/>
  <CONNECTION xlink:type="arc" xlink:from="3" xlink:to="2"/>
  <CONNECTION xlink:type="arc" xlink:from="4" xlink:to="3"/>
  <CONNECTION xlink:type="arc" xlink:from="5" xlink:to="4"/>
  <CONNECTION xlink:type="arc" xlink:from="6" xlink:to="5"/>
  <CONNECTION xlink:type="arc" xlink:from="7" xlink:to="6"/>
  <CONNECTION xlink:type="arc" xlink:from="8" xlink:to="7"/>
  <CONNECTION xlink:type="arc" xlink:from="9" xlink:to="8"/>
  <CONNECTION xlink:type="arc" xlink:from="10" xlink:to="9"/>
  <CONNECTION xlink:type="arc" xlink:from="11" xlink:to="10"/> 
  <CONNECTION xlink:type="arc" xlink:from="12" xlink:to="11"/> 
  <CONNECTION xlink:type="arc" xlink:from="13" xlink:to="12"/>
  
  <!-- Next Links --> 
  <CONNECTION xlink:type="arc" xlink:from="1" xlink:to="2"/>
  <CONNECTION xlink:type="arc" xlink:from="2" xlink:to="3"/>
  <CONNECTION xlink:type="arc" xlink:from="3" xlink:to="4"/>
  <CONNECTION xlink:type="arc" xlink:from="4" xlink:to="5"/>
  <CONNECTION xlink:type="arc" xlink:from="5" xlink:to="6"/>
  <CONNECTION xlink:type="arc" xlink:from="6" xlink:to="7"/>
  <CONNECTION xlink:type="arc" xlink:from="7" xlink:to="8"/>
  <CONNECTION xlink:type="arc" xlink:from="8" xlink:to="9"/>
  <CONNECTION xlink:type="arc" xlink:from="9" xlink:to="10"/>
  <CONNECTION xlink:type="arc" xlink:from="10" xlink:to="11"/> 
  <CONNECTION xlink:type="arc" xlink:from="11" xlink:to="12"/> 
  <CONNECTION xlink:type="arc" xlink:from="12" xlink:to="13"/>
  
</COURSE>

Linkbases

A single XML document may contain multiple out-of-line extended links. However, the current XLink specification is relatively silent on exactly what the format of such a compound document should look like. About all it says is that such a document must be a well-formed XML document. An XLink processor would presumably read the entire document and extract any extended links that indicate connections to or from the current document.
A browser or other application that's reading the individual pages needs to be informed that there is a separate linkbase elsewhere that it should read and parse so that it can show the links to the user.
Ideally it would be handled through some external mechanism like HTTP headers.
The only currently defined way to do this is to add an arc element inside the documents the out-of-line link connects. This arc has an xlink:arcrole attribute with the value http://www.w3.org/1999/xlink/properties/linkbase. Its xlink:to attribute points to the linkbase.

<METADATA xlink:type="extended"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <LINKBASE xlink:type="arc"
            xmlns:xlink="http://www.w3.org/1999/xlink"
            xlink:arcrole="http://www.w3.org/1999/xlink/properties/linkbase"
            xlink:to="courselinks"/>
  <RESOURCE xlink:type="locator" xlink:href="courselinks.xml" 
            xlink:label="courselinks"/>
</METADATA>
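
Once an application has located and loaded a linkbase, the extraction step might look something like the Python sketch below. Since the specification does not define a linkbase format, the LINKS root element is an assumption, as is the function name links_from. The sketch also compares URIs as plain strings, with no normalization or relative URI resolution.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"


def q(name):
    """Build the ElementTree spelling of an attribute in the XLink namespace."""
    return f"{{{XLINK_NS}}}{name}"


# Hypothetical linkbase: a shortened copy of the COURSE link inside a LINKS root.
LINKBASE = """<LINKS xmlns:xlink="http://www.w3.org/1999/xlink">
  <COURSE xlink:type="extended">
    <TOC xlink:type="locator" xlink:label="index"
         xlink:href="http://www.ibiblio.org/javafaq/course/"/>
    <CLASS xlink:type="locator" xlink:label="class"
           xlink:href="http://www.ibiblio.org/javafaq/course/week1.xml"/>
    <CLASS xlink:type="locator" xlink:label="class"
           xlink:href="http://www.ibiblio.org/javafaq/course/week2.xml"/>
    <CONNECTION xlink:type="arc" xlink:from="index" xlink:to="class"/>
    <CONNECTION xlink:type="arc" xlink:from="class" xlink:to="index"/>
  </COURSE>
</LINKS>"""


def links_from(linkbase_text, current_uri):
    """Return the URIs reachable from current_uri according to the linkbase."""
    root = ET.fromstring(linkbase_text)
    destinations = []
    for link in root.iter():
        if link.get(q("type")) != "extended":
            continue
        locators = [
            (child.get(q("label")), child.get(q("href")))
            for child in link
            if child.get(q("type")) == "locator"
        ]
        for arc in link:
            if arc.get(q("type")) != "arc":
                continue
            start, end = arc.get(q("from")), arc.get(q("to"))
            # Does this arc start at the document currently being read?
            starts_here = any(
                href == current_uri and (start is None or label == start)
                for label, href in locators
            )
            if not starts_here:
                continue
            for label, href in locators:
                if (end is None or label == end) and href not in destinations:
                    destinations.append(href)
    return destinations


print(links_from(LINKBASE, "http://www.ibiblio.org/javafaq/course/"))
```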

XLink Summary

XLinks can do everything HTML links can do and quite a bit more, but they aren't supported by current applications.
XLink elements of all types are placed in the http://www.w3.org/1999/xlink namespace, normally with the xlink prefix.
Simple links behave much like HTML links, but they are not restricted to a single <A> tag.
Linking elements are identified by xlink:type attributes.
Simple link elements are identified by xlink:type attributes with the value simple.
Linking elements can describe the resource they're linking to with xlink:title and xlink:role attributes.
Linking elements can use the xlink:show attribute to tell the application how the content should be displayed when the link is activated, for example, by opening a new window.
Linking elements can use the xlink:actuate attribute to tell the application whether the link should be traversed without a specific user request.
Extended link elements are identified by xlink:type attributes with the value extended.
Extended links can contain multiple locators, resources, and arcs. Currently, it's left to the application to decide how to choose between different alternatives.
A resource element represents a local, inline resource. It is identified by an xlink:type attribute with the value resource.
A locator element represents a remote, out-of-line resource. It is identified by an xlink:type attribute with the value locator.
Both locator and resource elements can be labeled by xlink:label attributes. These labels are used to define arcs between resources.
A locator element has an xlink:href attribute whose value is the URI of the resource it locates.
Arc elements are identified by xlink:type attributes with the value arc.
Arc elements have xlink:from and xlink:to attributes that identify the resources they connect by matching their xlink:label values. These are not IDREFs, since several resources may share one label.
Arc elements may have xlink:show and xlink:actuate attributes to determine when and how traversal of the link occurs.
An out-of-line link is a link that does not contain any local resources.
A linkbase is a document containing multiple out-of-line, extended link elements.
A linkbase is found when a document is read that contains an extended link with an arc whose xlink:arcrole attribute has the value http://www.w3.org/1999/xlink/properties/linkbase.

To Learn More

This presentation: http://www.ibiblio.org/xml/slides/xmloneaustin2001/xlinks/
XLink Specification: http://www.w3.org/TR/xlink/
Chapter 16 of the XML Bible: http://www.ibiblio.org/xml/books/bible/updates/16.html
Chapter 10 of XML in a Nutshell

Part II: XPointers

The many advantages of descriptive pointing are crucial for a scalable, generic pointing system. Descriptive pointing is crucial for all the same reasons that descriptive markup is crucial to documents, and that making links first-class objects is crucial to linking. It is also clearly feasible, as shown by multiple implementations of the prior WDs from the XML WG, and of TEI extended pointers.

--XML Linking Working Group, XML XPointer Requirements

XPointers

Why Use XPointers?
XPointer Examples
A Concrete Example
Location Paths, Steps, and Sets
Axes
Node Tests
Predicates
Functions that Return Node Sets
Points
Ranges
Child Sequences

What are XPointers?

XPointer, the XML Pointer Language, defines an addressing scheme for individual parts of an XML document.
XLinks point to a URI (in practice, a URL) that specifies a particular resource.
The URI may include an XPointer part that more specifically identifies the desired part or element of the targeted resource or document.
XPointers use the same XPath syntax you're familiar with from XSL transformations to identify the parts of the document they point to, along with a few additional pieces.

Why Use XPointers?

XPointers let a link address parts of a document such as:

The element with a given ID
All elements that possess a certain attribute
The first element of a certain type
The last element whose class attribute has the value pending
The seventh element of a given type
The first child of the seventh element
and many more including combinations of these addresses...

XPointer Examples

xpointer(id("ebnf"))
xpointer(descendant::language[position()=2])
ebnf
xpointer(/child::spec/child::body/child::*/child::language[position()=2])
/1/14/2
xpointer(id("ebnf"))xpointer(id("EBNF"))

XPointers in URIs

The XPointer does not specify the document. A URI does.
XPointers can be used as fragment identifiers in a URI after a #
For example,
http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(id("ebnf"))
http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(descendant::language[position()=2])
http://www.w3.org/TR/1998/REC-xml-19980210.xml#ebnf
http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(/child::spec/child::body/child::*/child::language[position()=2])
http://www.w3.org/TR/1998/REC-xml-19980210.xml#/1/14/2
http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(id("ebnf"))xpointer(id("EBNF"))
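The division of labor between URI and XPointer can be sketched in a few lines of Java. This is only an illustration: the class name XPointerUriSplit and its split() method are made up for this slide, and plain string handling is used rather than a URI parser because unescaped XPointers may contain characters such as double quotes.

```java
/** Splits a URI reference into the document URI and its XPointer fragment. */
class XPointerUriSplit {

    /** Returns {documentUri, fragment}; the fragment is "" when there is no '#'. */
    static String[] split(String uriReference) {
        int hash = uriReference.indexOf('#');
        if (hash < 0) {
            return new String[] { uriReference, "" };
        }
        return new String[] {
            uriReference.substring(0, hash),   // names the document
            uriReference.substring(hash + 1)   // names the part inside it
        };
    }

    public static void main(String[] args) {
        String[] parts = split(
            "http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(id(\"ebnf\"))");
        System.out.println("document: " + parts[0]);
        System.out.println("xpointer: " + parts[1]);
    }
}
```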

XPointers in XLinks

<SPECIFICATION xmlns:xlink="http://www.w3.org/1999/xlink"
               xlink:type="simple"
               xlink:href="http://www.w3.org/TR/1998/REC-xml-19980210.xml#xpointer(id('ebnf'))"
               xlink:actuate="onRequest"
               xlink:show="replace">
  Extensible Markup Language (XML) 1.0
</SPECIFICATION>

A Concrete Example

<?xml version="1.0"?>
<!DOCTYPE FAMILYTREE [

  <!ELEMENT FAMILYTREE (PERSON | FAMILY)*>

  <!-- PERSON elements --> 
  <!ELEMENT PERSON (NAME*, BORN*, DIED*, SPOUSE*)>
  <!ATTLIST PERSON 
    ID      ID     #REQUIRED
    FATHER  CDATA  #IMPLIED
    MOTHER  CDATA  #IMPLIED
  >
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT BORN (#PCDATA)>
  <!ELEMENT DIED  (#PCDATA)>
  <!ELEMENT SPOUSE EMPTY>
  <!ATTLIST SPOUSE IDREF IDREF #REQUIRED>
  
  <!--FAMILY--> 
  <!ELEMENT FAMILY (HUSBAND?, WIFE?, CHILD*) >
  <!ATTLIST FAMILY ID ID #REQUIRED>
  
  <!ELEMENT HUSBAND EMPTY>
  <!ATTLIST HUSBAND IDREF IDREF #REQUIRED>
  <!ELEMENT WIFE EMPTY>
  <!ATTLIST WIFE IDREF IDREF #REQUIRED>
  <!ELEMENT CHILD EMPTY>
  <!ATTLIST CHILD IDREF IDREF #REQUIRED>

]>
<FAMILYTREE>

  <PERSON ID="p1">
    <NAME>Domeniquette Celeste Baudean</NAME>
    <BORN>21 Apr 1836</BORN>
    <DIED>Unknown</DIED>
    <SPOUSE IDREF="p2"/>
  </PERSON>

  <PERSON ID="p2">
    <NAME>Jean Francois Bellau</NAME>
    <SPOUSE IDREF="p1"/>
  </PERSON>

  <PERSON ID="p3" FATHER="p2" MOTHER="p1">
    <NAME>Elodie Bellau</NAME>
    <BORN>11 Feb 1858</BORN>
    <DIED>12 Apr 1898</DIED>
    <SPOUSE IDREF="p4"/>
  </PERSON>

  <PERSON ID="p4">
    <NAME>John P. Muller</NAME>
    <SPOUSE IDREF="p3"/>
  </PERSON>

  <PERSON ID="p7">
    <NAME>Adolf Eno</NAME>
    <SPOUSE IDREF="p6"/>
  </PERSON>

  <PERSON ID="p6" FATHER="p2" MOTHER="p1">
    <NAME>Maria Bellau</NAME>
    <SPOUSE IDREF="p7"/>
  </PERSON>

  <PERSON ID="p5" FATHER="p2" MOTHER="p1">
    <NAME>Eugene Bellau</NAME>
  </PERSON>

  <PERSON ID="p8" FATHER="p2" MOTHER="p1">
    <NAME>Louise Pauline Bellau</NAME>
    <BORN>29 Oct 1868</BORN>
    <DIED>3 May 1938</DIED>
    <SPOUSE IDREF="p9"/>
  </PERSON>

  <PERSON ID="p9">
    <NAME>Charles Walter Harold</NAME>
    <BORN>about 1861</BORN>
    <DIED>about 1938</DIED>
    <SPOUSE IDREF="p8"/>
  </PERSON>

  <PERSON ID="p10" FATHER="p2" MOTHER="p1">
    <NAME>Victor Joseph Bellau</NAME>
    <SPOUSE IDREF="p11"/>
  </PERSON>

  <PERSON ID="p11">
    <NAME>Ellen Gilmore</NAME>
    <SPOUSE IDREF="p10"/>
  </PERSON>

  <PERSON ID="p12" FATHER="p2" MOTHER="p1">
    <NAME>Honore Bellau</NAME>
  </PERSON>

  <FAMILY ID="f1">
    <HUSBAND IDREF="p2"/>
    <WIFE IDREF="p1"/>
    <CHILD IDREF="p3"/>
    <CHILD IDREF="p5"/>
    <CHILD IDREF="p6"/>
    <CHILD IDREF="p8"/>
    <CHILD IDREF="p10"/>
    <CHILD IDREF="p12"/>
  </FAMILY>

  <FAMILY ID="f2">
    <HUSBAND IDREF="p7"/>
    <WIFE IDREF="p6"/>
  </FAMILY>

</FAMILYTREE>

Location Paths, Steps, and Sets

Many (though not all) XPointers are location paths. These are the same location paths used by XSLT.
Location paths are built from location steps.
Each location step specifies a point in the targeted document, generally relative to some other well-known point such as the start of the document or another location step. This well-known point is called the context node.

Location Steps

A location step has three parts:
- The axis
- The node test
- An optional predicate
axis::node-test[predicate]
child::PERSON[position()=2]
The axis tells you in what direction to search from the context node.
The node test tells you which nodes to consider along the axis.
The predicate is a boolean expression that tests each node in that set. If that expression returns false, then the node is removed from the set.

Location Paths

xpointer(/child::FAMILYTREE/child::PERSON[position()=3])
The location path of this XPointer is /child::FAMILYTREE/child::PERSON[position()=3].
It is built from two location steps:
- /child::FAMILYTREE
- child::PERSON[position()=3]

It identifies the single node:

  <PERSON ID="p3" FATHER="p2" MOTHER="p1">
    <NAME>Elodie Bellau</NAME>
    <BORN>11 Feb 1858</BORN>
    <DIED>12 Apr 1898</DIED>
    <SPOUSE IDREF="p4"/>
  </PERSON>

Location Paths that Identify Multiple Nodes

xpointer(/child::FAMILYTREE/child::PERSON[position()>3])
Identifies all PERSON element nodes after Elodie Bellau
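Since these location paths are ordinary XPath, you can try them with the XPath engine bundled with Java. The following is a rough sketch, not part of any XPointer implementation: the class LocationPathDemo and its ids() helper are invented for illustration, and the document is a cut-down family tree that keeps only the ID attributes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LocationPathDemo {

    // Abbreviated stand-in for the family tree: IDs only, in document order.
    static final String XML =
        "<FAMILYTREE>"
      + "<PERSON ID='p1'/><PERSON ID='p2'/><PERSON ID='p3'/>"
      + "<PERSON ID='p4'/><PERSON ID='p7'/>"
      + "<FAMILY ID='f1'/>"
      + "</FAMILYTREE>";

    /** Evaluates a location path and returns the IDs of the elements it selects. */
    static String ids(String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(path, doc, XPathConstants.NODESET);
            StringBuilder result = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                if (i > 0) result.append(' ');
                result.append(((Element) nodes.item(i)).getAttribute("ID"));
            }
            return result.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // One node: the third PERSON child of FAMILYTREE.
        System.out.println(ids("/child::FAMILYTREE/child::PERSON[position()=3]"));
        // Several nodes: every PERSON child after the third.
        System.out.println(ids("/child::FAMILYTREE/child::PERSON[position()>3]"));
    }
}
```

Note that the FAMILY element is never selected, because the node test PERSON filters it out before the predicate is applied.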

Axes

XPath defines thirteen axes along which an XPointer may search for nodes: the twelve tabulated under Location Step Axes, plus the namespace axis.
These depend on context to determine exactly what they point to.
For instance, consider this location path:
id("p6")/child::NAME
It begins with the id() function that returns a node set containing the element with the ID type attribute whose value is p6. This provides a context node for the following location step along the relative child axis.
Other axes include
- ancestor
- descendant
- self
- ancestor-or-self
- descendant-or-self
- attribute
Each selects nodes from a particular subset of the nodes in the document. For instance, the following axis selects from nodes that come after the context node. The preceding axis selects from nodes that come before the context node.

Location Step Axes

Axis	Selects From
`ancestor`	the parent of the context node, the parent of the parent of the context node, the parent of the parent of the parent of the context node, and so forth back to the root node
`ancestor-or-self`	the ancestors of the context node and the context node itself
`attribute`	the attributes of the context node
`child`	the immediate children of the context node
`descendant`	the children of the context node, the children of the children of the context node, and so forth
`descendant-or-self`	the context node itself and its descendants
`following`	all nodes that start after the end of the context node, excluding attribute and namespace nodes
`following-sibling`	all nodes that start after the end of the context node and have the same parent as the context node
`parent`	the unique parent node of the context node
`preceding`	all nodes that end before the beginning of the context node, excluding attribute and namespace nodes
`preceding-sibling`	all nodes that start before the beginning of the context node and have the same parent as the context node
`self`	the context node
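To see how the choice of axis changes what a step selects, you can evaluate several steps against the same context node. This sketch uses the JDK's XPath engine; the class AxisDemo, its select() helper, and the three-person document are assumptions made up for this illustration. The context node is always the second PERSON element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AxisDemo {

    static final String XML =
        "<FAMILYTREE>"
      + "<PERSON ID='p1'><NAME>Domeniquette Celeste Baudean</NAME></PERSON>"
      + "<PERSON ID='p2'><NAME>Jean Francois Bellau</NAME><SPOUSE IDREF='p1'/></PERSON>"
      + "<PERSON ID='p3'><NAME>Elodie Bellau</NAME></PERSON>"
      + "</FAMILYTREE>";

    /** Evaluates one location step from the p2 PERSON element; returns the selected node names. */
    static String select(String step) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node context = (Node) xpath.evaluate(
                "/FAMILYTREE/PERSON[2]", doc, XPathConstants.NODE);
            NodeList nodes = (NodeList) xpath.evaluate(step, context, XPathConstants.NODESET);
            StringBuilder names = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                if (i > 0) names.append(' ');
                names.append(nodes.item(i).getNodeName());
            }
            return names.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String[] steps = {
            "self::*", "child::*", "parent::*", "attribute::*",
            "following-sibling::*", "preceding-sibling::*", "following::*"
        };
        for (String step : steps) {
            System.out.println(step + " -> " + select(step));
        }
    }
}
```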

Node Tests

There are ten node tests in XPointer, eight from XPath and two new ones:
- name
- *
- prefix:*
- @name
- node()
- text()
- comment()
- processing-instruction()
- point()
- range()
A node test is attached to an axis to specify which nodes along the axis are chosen.
For example:
/descendant::body/child::*/attribute::xlink:*

Predicates

Each location step can contain zero or more predicates that further restrict which nodes an XPointer points to. In most non-trivial cases a predicate is necessary to pick the one node from a node set that you want.
Each predicate contains a boolean expression in square brackets ([]) that further winnows the node set.
This allows an XPointer to select nodes according to many different criteria. For example, you can select:
- All elements that have a specified attribute
- All elements that have a specified attribute with a specified value
- The first element that contains a specified child element
- An element whose text content includes a specified string
- All elements that are not the first or last children of their parents
- All elements whose value is a number
- All elements whose value is a number greater than 100
These are just a small sampling of the selections that predicates make possible.

Boolean Conversion

XPath predicate expressions are ultimately converted to a boolean after all calculations are finished. Non-boolean results are converted as follows:
- A number is true if it's equal to the position of the context node, false otherwise.
- An empty node set is false; all other node sets are true.
- A zero length string is false; all other strings are true (including the string "false")
The predicate expression is evaluated for each node in the context node list. Each node for which the expression ultimately evaluates to false is removed from the list. Thus only those nodes that satisfy the predicate remain.
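These conversion rules can be checked with the JDK's XPath engine. The sketch below is illustrative only; BooleanConversionDemo, truth(), and text() are hypothetical names. It wraps each expression in XPath's boolean() function for the string and node-set rules, and compares a bare number predicate with its position() equivalent for the number rule.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class BooleanConversionDemo {

    static final String XML =
        "<FAMILYTREE><PERSON ID='p1'/><PERSON ID='p2'/><PERSON ID='p3'/></FAMILYTREE>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
    }

    /** Converts the value of an XPath expression to a boolean, as a predicate would. */
    static boolean truth(String expression) {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath()
                .evaluate("boolean(" + expression + ")", parse(), XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Returns the string value of an XPath expression. */
    static String text(String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, parse());
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(truth("''"));                  // zero-length string
        System.out.println(truth("'false'"));             // non-empty string
        System.out.println(truth("/FAMILYTREE/NOSUCH"));  // empty node set
        System.out.println(truth("/FAMILYTREE/PERSON"));  // non-empty node set
        // A number in a predicate is compared with the node's position.
        System.out.println(text("/FAMILYTREE/PERSON[2]/@ID"));
        System.out.println(text("/FAMILYTREE/PERSON[position()=2]/@ID"));
    }
}
```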

The position() function

Probably the function most frequently used in XPointer predicates is position(). This returns the index of the node in the context node list. This allows you to find the first, second, third, or other indexed node.

You can compare positions using the various relational operators like <, >, =, !=, >=, and <=.

xpointer(/child::FAMILYTREE/child::*[position()=1])
xpointer(/child::FAMILYTREE/child::*[position()=2])
xpointer(/child::FAMILYTREE/child::*[position()=3])
xpointer(/child::FAMILYTREE/child::*[position()=4])
xpointer(/child::FAMILYTREE/child::*[position()=5])
xpointer(/child::FAMILYTREE/child::*[position()=6])
xpointer(/child::FAMILYTREE/child::*[position()=7])
xpointer(/child::FAMILYTREE/child::*[position()=8])
xpointer(/child::FAMILYTREE/child::*[position()=9])
xpointer(/child::FAMILYTREE/child::*[position()=10])
xpointer(/child::FAMILYTREE/child::*[position()=11])
xpointer(/child::FAMILYTREE/child::*[position()=12])
xpointer(/child::FAMILYTREE/child::*[position()=13])
xpointer(/child::FAMILYTREE/child::*[position()=14])

Identifying an element by its position

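Because a numeric predicate is true only when it equals the node's position, [n] is shorthand for [position()=n]:
xpointer(/child::FAMILYTREE/child::*[1])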
xpointer(/child::FAMILYTREE/child::*[2])
xpointer(/child::FAMILYTREE/child::*[3])
xpointer(/child::FAMILYTREE/child::*[4])
xpointer(/child::FAMILYTREE/child::*[5])
xpointer(/child::FAMILYTREE/child::*[6])
xpointer(/child::FAMILYTREE/child::*[7])
xpointer(/child::FAMILYTREE/child::*[8])
xpointer(/child::FAMILYTREE/child::*[9])
xpointer(/child::FAMILYTREE/child::*[10])
xpointer(/child::FAMILYTREE/child::*[11])
xpointer(/child::FAMILYTREE/child::*[12])
xpointer(/child::FAMILYTREE/child::*[13])
xpointer(/child::FAMILYTREE/child::*[14])

Functions that Return Node Sets


id()
here()
origin()

The last two, here() and origin(), are XPointer extensions to XPath that are not available in XSLT.

id()

The id() function selects the element in the document that has an ID type attribute with a specified value.
For example, consider the URI http://www.theharolds.com/genealogy.xml#xpointer(id("p12")). If you look back at the family tree under A Concrete Example, you find this element:
```
<PERSON ID="p12" FATHER="p2" MOTHER="p1">
  <NAME>Honore Bellau</NAME>
</PERSON>
```
Since ID pointers are so common and so useful, there's also a shortcut for this. If all you want to do is point to a particular element with a particular ID, you can skip all the xpointer(id("")) frou-frou and just use the bare ID after the # like this:
http://www.theharolds.com/genealogy.xml#p12
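A toy resolver shows that the two forms lead to the same element. This is a sketch under stated assumptions rather than a real XPointer processor: IdPointerDemo and resolve() are invented names, only these two fragment shapes are recognized, and because the sample document carries no DTD the code flags each ID attribute as ID-typed by hand with DOM's setIdAttribute().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class IdPointerDemo {

    static final String XML =
        "<FAMILYTREE>"
      + "<PERSON ID='p11'><NAME>Ellen Gilmore</NAME></PERSON>"
      + "<PERSON ID='p12'><NAME>Honore Bellau</NAME></PERSON>"
      + "</FAMILYTREE>";

    static Document parse() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            // Stand-in for the DTD: mark each PERSON's ID attribute as ID-typed.
            NodeList people = doc.getElementsByTagName("PERSON");
            for (int i = 0; i < people.getLength(); i++) {
                ((Element) people.item(i)).setIdAttribute("ID", true);
            }
            return doc;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Resolves xpointer(id("x")), xpointer(id('x')), or the bare-name shortcut x. */
    static Element resolve(Document doc, String fragment) {
        String prefix = "xpointer(id(";
        String id = fragment;
        if (fragment.startsWith(prefix) && fragment.endsWith("))")) {
            // Drop the wrapper plus the opening and closing quote characters.
            id = fragment.substring(prefix.length() + 1, fragment.length() - 3);
        }
        return doc.getElementById(id);
    }

    public static void main(String[] args) {
        Document doc = parse();
        Element full = resolve(doc, "xpointer(id(\"p12\"))");
        Element bare = resolve(doc, "p12");
        System.out.println(full.getTextContent());
        System.out.println(full == bare);
    }
}
```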

here()

Consider a simple slide show. In this example, here()/following::SLIDE[1] refers to the next slide in the show. here()/preceding::SLIDE[1] refers to the previous slide in the show. Presumably this would be used in conjunction with a style sheet that showed one slide at a time.

<?xml version="1.0"?>
<SLIDESHOW xmlns:xlink="http://www.w3.org/1999/xlink">
  <SLIDE>
    <H1>Welcome to the slide show!</H1>
    <BUTTON xlink:type="simple"
            xlink:href="here()/following::SLIDE[1]">
      Next
    </BUTTON>
  </SLIDE>
  <SLIDE>
    <H1>This is the second slide</H1>
    <BUTTON xlink:type="simple" 
            xlink:href="here()/preceding::SLIDE[1]">
      Previous
    </BUTTON>
    <BUTTON xlink:type="simple" 
            xlink:href="here()/following::SLIDE[1]">
      Next
    </BUTTON>
  </SLIDE>
  <SLIDE>
    <H1>This is the third slide</H1>
    <BUTTON xlink:type="simple" 
            xlink:href="here()/preceding::SLIDE[1]">
      Previous
    </BUTTON>
    <BUTTON xlink:type="simple" 
            xlink:href="here()/following::SLIDE[1]">
      Next
    </BUTTON>
  </SLIDE>
  ...
  <SLIDE>
    <H1>This is the last slide</H1>
    <BUTTON xlink:type="simple"
            xlink:href="here()/preceding::SLIDE[1]">
      Previous
    </BUTTON>
  </SLIDE>

</SLIDESHOW>

Generally, the here() function is only used in fully relative URIs in XLinks. If any URI part is included, it must be the same as the URI of the current document.

origin()

The origin() function is much the same as here(); that is, it refers to the source of a link. However, origin() is used in out-of-line links where the link is not actually present in the source document. It points to the element in the source document from which the user activated the link.

Points

Every point is either between two nodes or between two characters in the parsed character data of a document. To make sense of this you have to remember that parsed character data is part of a text node. For instance, consider this very simple but well-formed XML document:

<GREETING>
  Hello
</GREETING>

Tree Structure

There are exactly three nodes and 13 distinct points in this document. In order the points are:

The point before the root node
The point before the GREETING element node
The point before the text node containing the text "Hello" (as well as assorted white space)
The point before the white space between <GREETING> and Hello.
The point before the first H in Hello
The point between the H and the e in Hello
The point between the e and the l in Hello
The point between the l and the l in Hello
The point between the l and the o in Hello
The point after the o in Hello
The point after the white space between Hello and </GREETING>.
The point after the GREETING element.
The point after the root node.

The exact details of the white space in the document are not considered here. XPointer collapses all runs of white space to a single space.

Point Expressions

A point is selected using an XPath expression with the point() node test
A predicate can indicate which of several points is chosen.
child::point()[position()=n]
The index refers to the point before the nth child element if the context node is an element or root node, or to the nth character of the string value of the node otherwise.
For example, to select the point immediately before the D in Domeniquette Celeste Baudean's NAME element,
/child::FAMILYTREE/descendant::*[position()=1]/child::NAME/child::text()/child::point()[position()=0]
To select the point after the last e in Domeniquette, since there are 12 letters in Domeniquette,
/child::FAMILYTREE/descendant::*[position()=1]/child::NAME/child::text()/child::point()[position()=12]
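Character points behave much like string offsets in Java: point 0 sits before the first character, and a string of n characters has n+1 points. The following sketch only illustrates that counting; CharacterPointDemo and splitAt() are hypothetical names, and no XPointer processing is involved.

```java
class CharacterPointDemo {

    /** Splits text at character point n, where point 0 precedes the first character. */
    static String[] splitAt(String text, int n) {
        return new String[] { text.substring(0, n), text.substring(n) };
    }

    public static void main(String[] args) {
        String name = "Domeniquette Celeste Baudean";
        // Point 0: nothing before it, the whole name after it.
        System.out.println("[" + splitAt(name, 0)[0] + "]");
        // Point 12: the twelve letters of the first name lie before it.
        System.out.println("[" + splitAt(name, 12)[0] + "]");
        // Number of character points in "Hello".
        System.out.println("Hello".length() + 1);
    }
}
```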

Ranges

In some applications it may be important to specify a range across a document rather than a particular point in the document. For instance, the selection a user makes with a mouse is not necessarily going to match up with any one element or node. It may start in the middle of one paragraph, extend across a heading and a picture and then into the middle of another paragraph two pages down. Any such contiguous area of a document can be described with a range.

A range begins at one point and continues until another point.
The endpoints of the range are identified by location paths.
If the starting path points to a node set rather than a point, then the first point in the location set the XPointer identifies is the start point.
If the ending location path points to a node set rather than a point, then the last point in the location set the XPointer identifies is the end point of the range.

Range Expressions

To specify a range, you append /range-to(end-point) to a location path specifying the start point of the range.
The parentheses contain a location path specifying the endpoint of the range.
For example, suppose you want to select everything between the first PERSON element and the last PERSON element in the family tree:
xpointer(/child::FAMILYTREE/child::PERSON[position() = 1]/range-to(/child::FAMILYTREE/child::PERSON[position() = last()]))

Range Functions

range(location-set): Returns a location set containing one range for each location in the argument.
The range is the minimum range necessary to cover the entire location.
range-inside(location-set): Returns a location set containing the interiors of each of the locations in the input.
start-point(location-set): Returns a location set that contains one point representing the first point of each location in the input location set. For example, start-point(//PERSON[1]) returns the point immediately before the first PERSON element. start-point(//PERSON) returns the set of points immediately before each PERSON element.
end-point(location-set): The same as start-point() except that it returns the points immediately after each location in its input.

String Ranges

string-range(node-set,substring,index,length)

A string range points to an occurrence of a specified string, or a substring of a given string in the text (not markup) of the document.
string-range() takes as arguments a node set to search and a substring to search for.
string-range() returns a location set containing one range for each non-overlapping match to the string.
By default, the range returned starts before the first matched character and encompasses all the matched characters.
You can also provide optional index and length arguments indicating how many characters after the match the range should start and how many characters after the start the range should continue.
For example, this XPointer finds all occurrences of the string "Harold":
xpointer(string-range(/,"Harold"))
You can change the first argument to specify what nodes you want to look in. For example, this XPointer finds all occurrences of the string "Harold" in NAME elements:
xpointer(string-range(//NAME,"Harold"))
String ranges may have predicates. Thus this XPointer finds only the first occurrence of the string "Harold" in the document:
xpointer(string-range(/,"Harold")[position()=1])

This targets the position immediately preceding the word Harold in Charles Walter Harold's NAME element. This is not the same as pointing at the entire NAME element as an element-based selector would do.
A third numeric argument targets a particular position in the string. For example, this targets the point immediately following the first occurrence of the string "Harold" because Harold has six letters:
xpointer(string-range(/,"Harold",6)[position()=1])
An optional fourth argument specifies the number of characters to select. For example, this XPointer selects the "old" from the first occurrence of the entire string "Harold":
xpointer(string-range(/,"Harold",4,3)[position()=1])
When matching strings, case is considered. All white space is condensed to a single space. Markup characters are ignored.
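The matching rules can be approximated over a plain Java string. This is a simplified sketch, not a string-range() implementation: StringRangeDemo is a made-up class, it works on one string instead of a node set, it does not condense white space, and it assumes the index argument counts from 1 at the start of the match, which is what the "old" example implies.

```java
import java.util.ArrayList;
import java.util.List;

class StringRangeDemo {

    /**
     * Returns the text covered by the range for each non-overlapping match.
     * index is 1-based from the start of the match; length counts characters.
     */
    static List<String> stringRange(String text, String match, int index, int length) {
        List<String> ranges = new ArrayList<>();
        if (match.isEmpty()) {
            return ranges;
        }
        int hit = text.indexOf(match);   // case-sensitive, like string-range()
        while (hit >= 0) {
            int start = Math.max(0, Math.min(hit + index - 1, text.length()));
            int end = Math.min(start + Math.max(0, length), text.length());
            ranges.add(text.substring(start, end));
            hit = text.indexOf(match, hit + match.length()); // skip past this match
        }
        return ranges;
    }

    /** Default form: each range covers the whole match. */
    static List<String> stringRange(String text, String match) {
        return stringRange(text, match, 1, match.length());
    }

    public static void main(String[] args) {
        String text = "Charles Walter Harold and Elliotte Rusty Harold";
        System.out.println(stringRange(text, "Harold"));
        System.out.println(stringRange(text, "Harold", 4, 3));
        System.out.println(stringRange(text, "harold"));
    }
}
```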

XPointers and Namespaces

XPointers may appear in non-XML documents where namespace prefixes are not defined.
You use an xmlns() scheme to map a prefix to a URI. For example,
xmlns(svg=http://www.w3.org/2000/svg) xpointer(//svg:polygon[3])
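Java's XPath API has a similar need: an expression evaluated outside the document must be told what its prefixes mean. The sketch below is an analogy, not an xmlns() implementation. NamespaceBindingDemo, binding(), and select() are hypothetical names; the document deliberately uses the prefix s while the expression uses svg, to show that only the namespace URI has to agree.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceBindingDemo {

    static final String XML =
        "<drawing xmlns:s='http://www.w3.org/2000/svg'>"
      + "<s:polygon id='a'/><s:polygon id='b'/><s:polygon id='c'/>"
      + "</drawing>";

    /** Binds a single prefix to a namespace URI, much as xmlns(prefix=uri) does. */
    static NamespaceContext binding(final String prefix, final String uri) {
        return new NamespaceContext() {
            public String getNamespaceURI(String p) {
                return prefix.equals(p) ? uri : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String u) {
                return uri.equals(u) ? prefix : null;
            }
            public Iterator<String> getPrefixes(String u) {
                return uri.equals(u)
                    ? Collections.singletonList(prefix).iterator()
                    : Collections.<String>emptyList().iterator();
            }
        };
    }

    /** Evaluates an expression whose svg prefix is bound to the SVG namespace. */
    static String select(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for prefixed name tests to match
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(binding("svg", "http://www.w3.org/2000/svg"));
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("//svg:polygon[3]/@id"));
    }
}
```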

Child Sequences

A child sequence is a shortcut for XPointers that consist of nothing but a series of child relative location steps counting down from the root node, each of which selects a particular child by position only.
The shortcut is to use only the position number and the slashes that separate individual elements from each other, like this:
http://www.theharolds.com/genealogy.xml#/1/4
/1/4 is a child sequence that says to select the fourth child element of the first child element of the root.
Child sequences may include an initial ID. In that case the counting begins from the element with that ID rather than from the root. For example, John P. Muller's PERSON element has an ID attribute with the value p4. Consequently the XPointer p4/1 points to his NAME element and p4/2 points to his SPOUSE element.
Each child sequence always points to a single element. You cannot use child sequences with any other relative location steps. You cannot use them to select elements of a particular type. You cannot use them to select attributes or strings. You can only use them to select a single element by its relative location in the tree.
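Walking a child sequence over a DOM tree takes only a short loop. This is a minimal sketch assuming purely numeric sequences such as /1/4; ChildSequenceDemo and resolve() are invented for illustration, and a leading ID (as in p4/1) is not handled. Text and comment nodes are skipped, since only child elements are counted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ChildSequenceDemo {

    static final String XML =
        "<FAMILYTREE>\n"
      + "  <PERSON ID='p1'/>\n  <PERSON ID='p2'/>\n  <PERSON ID='p3'/>\n"
      + "  <PERSON ID='p4'><NAME>John P. Muller</NAME><SPOUSE IDREF='p3'/></PERSON>\n"
      + "</FAMILYTREE>";

    static Document parse() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Follows a numeric child sequence from start; returns null if a step has no match. */
    static Element resolve(Node start, String childSequence) {
        Node current = start;
        for (String step : childSequence.split("/")) {
            if (step.isEmpty()) continue;            // the leading slash
            int wanted = Integer.parseInt(step);     // 1-based position
            Node match = null;
            int seen = 0;
            for (Node child = current.getFirstChild(); child != null;
                     child = child.getNextSibling()) {
                // Count element children only; white space text nodes do not count.
                if (child.getNodeType() == Node.ELEMENT_NODE && ++seen == wanted) {
                    match = child;
                    break;
                }
            }
            if (match == null) return null;
            current = match;
        }
        return current.getNodeType() == Node.ELEMENT_NODE ? (Element) current : null;
    }

    public static void main(String[] args) {
        Document doc = parse();
        System.out.println(resolve(doc, "/1/4").getAttribute("ID"));
        System.out.println(resolve(doc, "/1/4/1").getNodeName());
        System.out.println(resolve(doc, "/1/4/2").getNodeName());
    }
}
```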

XPointer Summary

XPointers refer to particular parts of or locations in XML documents.
The syntax of an XPointer is the keyword xpointer, followed by parentheses containing an XPath expression that returns a node set.
The id() function points to an element with a specified value for an ID type attribute.
Location steps can be chained to make more sophisticated location paths.
Each location step contains an axis, a node test, and zero or more predicates.
Relative location steps select nodes in a document based on their relationship to a context node.
The self axis points to the context node. It can be abbreviated as a period (.).
The parent axis points to the node that contains the context node. It can be abbreviated as a double period (..).
The child axis points to immediate children of the context node. It can be abbreviated simply by a node test.
The descendant axis points to all elements contained in the context node.
The descendant-or-self axis points to all elements contained in the context node as well as the context node itself. A double slash (//) abbreviates /descendant-or-self::node()/.
The ancestor axis points to an element that contains the context node.
The ancestor-or-self axis points to all elements that contain the context node as well as the context node itself.
The preceding axis points to any element that comes before the context node.
The following axis points to any element following the context node.
The preceding-sibling axis selects from sibling elements that precede the context node.
The following-sibling axis selects from sibling elements that follow the context node.
The attribute axis points to an attribute of the context node. It can be abbreviated as a @ sign.
The node test of a relative location step is normally an element name, but may also be * to select all elements, @* to select all attributes, @name to select all attributes with the given name, prefix:* to select all elements in the specified namespace, or one of the keywords comment(), text(), processing-instruction(), node(), point() or range().
The optional predicate of a relative location step is an XPath boolean expression enclosed in square brackets that further narrows down the node set the XPointer refers to.
A point indicates a position preceding or following a node or a character.
A range identifies the parsed character data between two points.
The string-range() function points to a specified block of text.
A child sequence points to an element by counting children from the root.

To Learn More

This presentation: http://www.ibiblio.org/xml/slides/xmloneaustin2001/xlinks
XPointer Specification: http://www.w3.org/TR/xptr
Chapter 17 of the XML Bible: http://www.ibiblio.org/xml/books/bible/updates/17.html
Chapter 10 of XML in a Nutshell

Part III: XML Base

What is XML Base?

An in-band means of specifying the proper URI for a document that can succeed even if out-of-band mechanisms aren't available.
A means of specifying the base URI against which relative URLs are resolved, even if the document itself is copied to a different location.
An XML replacement for the HTML BASE element
W3C Proposed Recommendation, December 20, 2000

The xml:base attribute

<slide xml:base="http://www.ibiblio.org/xml/slides/xmloneaustin2001/xlinks/">
  <title>The xml:base attribute</title>
  ...
  <previous xlink:type="simple" xlink:href="What_Is_XBase.xml"/>
  <next xlink:type="simple" xlink:href="xbaseexample.xml"/>
</slide>

May be attached to any element to set the base URI for that element and its descendants
The xml prefix is automatically bound to the http://www.w3.org/XML/1998/namespace URI
The value should be an absolute URI

XML Base Example

<COURSE xmlns:xlink="http://www.w3.org/1999/xlink"
         xml:base="http://www.ibiblio.org/javafaq/course/"
         xlink:type="extended">

  <TOC xlink:type="locator" xlink:href="index.html" xlink:label="index"/>

  <CLASS xlink:type="locator" xlink:label="class"
         xlink:href="week1.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week2.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week3.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week4.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week5.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week6.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week7.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week8.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week9.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week10.xml"/> 
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week11.xml"/> 
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week12.xml"/>
  <CLASS xlink:type="locator" xlink:label="class" 
         xlink:href="week13.xml"/>
  
  <CONNECTION xlink:type="arc" xlink:from="index" xlink:to="class"/>
  <CONNECTION xlink:type="arc" xlink:from="class" xlink:to="index"/>
  
</COURSE>

"index.html" now resolves to the URI "http://www.ibiblio.org/javafaq/course/index.html"
"week1.xml" resolves to the URI "http://www.ibiblio.org/javafaq/course/week1.xml"
"week2.xml" resolves to the URI "http://www.ibiblio.org/javafaq/course/week2.xml"
"week3.xml" resolves to the URI "http://www.ibiblio.org/javafaq/course/week3.xml"
etc.
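The same resolution can be reproduced with java.net.URI, which applies the standard relative-reference rules. This is just a sketch of the arithmetic; XmlBaseDemo and its resolve() helper are hypothetical names, and the code does not read xml:base attributes from a document.

```java
import java.net.URI;

class XmlBaseDemo {

    /** Resolves a relative href against a base URI, as an xml:base-aware processor would. */
    static String resolve(String base, String href) {
        return URI.create(base).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.ibiblio.org/javafaq/course/";
        System.out.println(resolve(base, "index.html"));
        System.out.println(resolve(base, "week1.xml"));
        // Without the trailing slash, "course" is treated as a file name and dropped.
        System.out.println(resolve("http://www.ibiblio.org/javafaq/course", "week1.xml"));
    }
}
```

The last call is why the trailing slash in the xml:base value matters: leave it off and every link lands one directory too high.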

Open Issues

How does it interact with XHTML? In particular, the XHTML base element?
Browser and other application support?

To Learn More

XML Base Specification: http://www.w3.org/TR/xmlbase

Part IV: XInclude

The problem is that we're not providing the tools. We're providing the specs. That's a whole different ball game. If tools existed for actually making really interesting use of RDF and XLink and XInclude then people would use them. If IE and/or Mozilla supported the full gamut of specs, from XSLT 1.0 to XLink and XInclude (OK, so they're not quite REC's, but with time...) then you would find people using them more.

--Matt Sergeant on the xml-dev mailing list

What is XInclude?

A means of including one XML document inside another, irrespective of validation.
W3C Working Draft, October 26, 2000
Based on the XML Infoset; a source infoset is transformed into a result infoset

Alternatives (and why they don't work)

xlink:show="embed" only graphically includes, like the IMG element in HTML. It does not merge infosets.
External parsed entities:
- Require a DTD
- Can only handle very limited documents; i.e. not all well-formed XML documents are well-formed external parsed entities. In particular, XML declarations can be a problem and document type declarations always are.
- Don't allow unparsed text to be inserted as CDATA
XSLT document() function
- Only handles XSLT
- No unparsed, pure-text includes
Custom code or XSLT extension functions

The include element

href attribute identifies the document (or part thereof) to be included
In the http://www.w3.org/1999/XML/xinclude namespace.
The prefix xinclude is customary.

<book xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <title>Processing XML with Java</title>
  <chapter><xinclude:include href="dom.xml"/></chapter>
  <chapter><xinclude:include href="sax.xml"/></chapter>
  <chapter><xinclude:include href="jdom.xml"/></chapter>
</book>

The parse attribute

parse="xml": The resource must be parsed as XML and the infosets merged. This is the default.
parse="text": The resource must be treated as pure text and inserted as a text node. When serialized, this means that characters like < will change to &lt; and so forth.

<slide xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <title>The href attribute</title>
  
<ul>
  <li>Identifies the document to be included with a URI</li>
  <li>The document at the URI replaces the <code>include</code> 
      element in the including document</li>
  <li>The <code>xinclude</code> prefix is bound to the http://www.w3.org/1999/XML/xinclude
  namespace URI. 
  </li>
</ul>  

<pre><code><xinclude:include parse="text" href="processing_xml_with_java.xml"/>
</code></pre>
        
  <description>
      A slide from Elliotte Rusty Harold's XML and Hypertext seminar at
      <host_ref/>, <date_ref/>
    </description>
  <last_modified>October 26, 2000</last_modified>
</slide>
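What parse="text" means for serialization can be seen by putting markup-looking text into a DOM text node and writing the tree back out. This sketch uses the standard JAXP classes, not JDOM and not an XInclude processor; TextIncludeDemo and includeAsText() are made-up names, and the pre wrapper element is just a convenient container.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class TextIncludeDemo {

    /** Inserts the included resource as one text node and serializes the result. */
    static String includeAsText(String included) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element pre = doc.createElement("pre");
            doc.appendChild(pre);
            // The resource is not parsed: its tags are just characters here.
            pre.appendChild(doc.createTextNode(included));

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(includeAsText("<title>Processing XML with Java</title>"));
    }
}
```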

Implementation as JDOM

/*--

 Copyright 2000 Elliotte Rusty Harold.
 All rights reserved.

 I haven't yet decided on a license.
 It will be some form of open source.

 THIS SOFTWARE IS PROVIDED "AS IS" AND ANY EXPRESSED OR IMPLIED
 WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 DISCLAIMED.  IN NO EVENT SHALL ELLIOTTE RUSTY HAROLD OR ANY
 OTHER CONTRIBUTORS TO THIS PACKAGE
 BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 SUCH DAMAGE.

 */

package com.macfaq.xml;

import java.net.URL;
import java.net.MalformedURLException;
import java.util.Stack;
import java.util.Iterator;
import java.util.List;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.BufferedInputStream;
import java.io.InputStream;
import org.jdom.Namespace;
import org.jdom.Comment;
import org.jdom.CDATA;
import org.jdom.JDOMException;
import org.jdom.Attribute;
import org.jdom.Element;
import org.jdom.ProcessingInstruction;
import org.jdom.Document;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;

/**
 * <p><code>XIncluder</code> provides methods to
 * resolve JDOM elements and documents to produce
 * a new Document or Element with all
 * XInclude references resolved.
 * </p>
 *
 *
 * @author Elliotte Rusty Harold
 * @version 1.0d2
 */
public class XIncluder {

  public final static Namespace XINCLUDE_NAMESPACE
    = Namespace.getNamespace("xinclude", "http://www.w3.org/1999/XML/xinclude");

  // No instances allowed
  private XIncluder() {}

  private static SAXBuilder builder = new SAXBuilder();

  /**
    * <p>
    * This method resolves a JDOM <code>Document</code>
    * and merges in all XInclude references.
    * If a referenced document cannot be found it is replaced with
    * an error message. The Document object returned is a new document.
    * The original <code>Document</code> is not changed.
    * </p>
    *
    * @param original <code>Document</code> that will be processed
    * @param base     <code>String</code> form of the base URI against which
    *                 relative URLs will be resolved. This can be null if the
    *                 document includes an <code>xml:base</code> attribute.
    * @return Document new <code>Document</code> object in which all
    *                  XInclude elements have been replaced.
    * @throws CircularIncludeException if this document possesses a cycle of
    *                                  XIncludes.
    * @throws MalformedURLException if Java cannot parse the base URI using
    *                               the normal methods of java.net.URL.
    */
    public static Document resolve(Document original, String base)
      throws CircularIncludeException, MalformedURLException {

        if (original == null) throw new NullPointerException("Document must not be null");

        Element root = original.getRootElement();
        Element resolved = (Element) resolve(root, base);

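        // TODO: resolve() returns a String when the root element is an
        // include with parse="text", so the cast above would fail with a
        // ClassCastException. Is the root element allowed to be replaced
        // by a parse="text" include?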

        Document result = new Document(resolved, original.getDocType());

        Iterator iterator = original.getMixedContent().iterator();
        while (iterator.hasNext()) {
          Object o = iterator.next();
          if (o instanceof Comment) {
            Comment c = (Comment) o;
            result.addContent((Comment) c.clone());
          }
          else if (o instanceof ProcessingInstruction) {
            ProcessingInstruction pi =(ProcessingInstruction) o;
            result.addContent((ProcessingInstruction) pi.clone());
          }
        }

        return result;
  }

  /**
    * <p>
    * This method resolves a JDOM <code>Element</code>
    * and merges in all XInclude references. This process is recursive.
    * The element returned contains no XInclude elements.
    * If a referenced document cannot be found it is replaced with
    * an error message. The <code>Element</code> object returned is a new element.
    * The original <code>Element</code> is not changed.
    * </p>
    *
    * @param original <code>Element</code> that will be processed
    * @param base     <code>String</code> form of the base URI against which
    *                 relative URLs will be resolved. This can be null if the
    *                 element includes an <code>xml:base</code> attribute.
    * @return Object  Either an <code>Element</code>
    *                 (<code>parse="xml"</code>) or a <code>String</code>
    *                 (<code>parse="text"</code>)
    * @throws CircularIncludeException if this <code>Element</code> contains an XInclude element
    *                                  that attempts to include a document in which
    *                                  this element is directly or indirectly included.
    */
    public static Object resolve(Element original, String base)
     throws CircularIncludeException, MalformedURLException {

        if (original == null) {
          throw new NullPointerException("You can't XInclude a null element.");
        }
        Stack bases = new Stack();
        if (base != null) bases.push(base);

        // The stack is local to this call, so there is nothing to pop.
        // Popping here would throw EmptyStackException when base is null.
        return resolve(original, bases);

    }

    private static boolean isIncludeElement(Element element) {

        if (element.getName().equals("include") &&
            element.getNamespace().equals(XINCLUDE_NAMESPACE)) {
          return true;
        }
        return false;

    }


  /**
    * <p>
    * This method resolves a JDOM <code>Element</code>
    * and merges in all XInclude references. This process is recursive.
    * The element returned contains no XInclude elements.
    * If a referenced document cannot be found it is replaced with
    * an error message. The <code>Element</code> object returned is a new element.
    * The original <code>Element</code> is not changed.
    * </p>
    *
    * @param original <code>Element</code> that will be processed
    * @param bases    <code>Stack</code> containing the string forms of
    *                 all the URIs of documents which contain this element
    *                 through XIncludes. This is used to detect whether a
    *                 circular reference is being used.
    * @return Object  Either an <code>Element</code>
    *                 (<code>parse="xml"</code>) or a <code>String</code>
    *                 (<code>parse="text"</code>)
    * @throws CircularIncludeException if this <code>Element</code> contains an XInclude element
    *                                  that attempts to include a document in which
    *                                  this element is directly or indirectly included.
    */
  protected static Object resolve(Element original, Stack bases)
   throws CircularIncludeException {

    Element result;
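    // null, not "", means "no base URI": new URL("") throws
    // MalformedURLException, so an empty base could never be resolved.
    String base = null;
    if (bases.size() != 0) base = (String) bases.peek();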

    if (isIncludeElement(original)) {
      Attribute href = original.getAttribute("href");
      if (href == null) { // illegal, what kind of exception????
        throw new IllegalArgumentException("Missing href attribute");
      }
      Attribute baseAttribute
       = original.getAttribute("base", Namespace.XML_NAMESPACE);
      if (baseAttribute != null) base = baseAttribute.getValue();
      boolean parse = true;
      Attribute parseAttribute = original.getAttribute("parse");
      if (parseAttribute != null) {
        if (parseAttribute.getValue().equals("text")) parse = false;
      }

      URL remote;
      if (base != null) {
        try {
          URL context = new URL(base);
          remote = new URL(context, href.getValue());
        }
        catch (MalformedURLException ex) {
          return "Unresolvable URL " + base + "/" + href.getValue();
        }
      }
      else {
        try {
          remote = new URL(href.getValue());
        }
        catch (MalformedURLException ex) {
          return "Unresolvable URL " + href.getValue();
        }
      }

      if (parse) {
        // Stack.contains() compares with equals(), not ==, so two equal
        // URL strings match even if they are different String objects.
        if (bases.contains(remote.toExternalForm())) {
          // base is the URI this include was resolved against; the line
          // number where the bad include occurs is still not reported
          throw new CircularIncludeException(
            "Circular XInclude Reference to "
           + remote.toExternalForm() + " in " + base);
        }

        try {
          Document doc = builder.build(remote);
          bases.push(remote.toExternalForm());
          result = (Element) resolve(doc.getRootElement(), bases);
          bases.pop();
        }
        // Make this configurable
        catch (JDOMException e) {
           return "Document not found: " + remote.toExternalForm()
            + "\r\n" + e.getMessage();
        }
      }
      else { // insert text
        return downloadTextDocument(remote);
      }

    }
    // not an include element
    else { // recursively process children
       result = new Element(original.getName(), original.getNamespace());
       Iterator attributes = original.getAttributes().iterator();
       while (attributes.hasNext()) {
         Attribute a = (Attribute) attributes.next();
         result.addAttribute((Attribute) a.clone());
       }
       List children = original.getMixedContent();

       Iterator iterator = children.iterator();
       while (iterator.hasNext()) {
         Object o = iterator.next();
         if (o instanceof Element) {
           Element e = (Element) o;
           Object resolved = resolve(e, bases);
           if (resolved instanceof String) {
               result.addContent((String) resolved);
           }
           else result.addContent((Element) resolved);
         }
         else if (o instanceof String) {
           result.addContent((String) o);
         }
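         // Copy each node rather than sharing it with the original tree,
         // so that the original Element really is left unchanged.
         else if (o instanceof Comment) {
           result.addContent((Comment) ((Comment) o).clone());
         }
         else if (o instanceof CDATA) {
           result.addContent(new CDATA(((CDATA) o).getText()));
         }
         else if (o instanceof ProcessingInstruction) {
           result.addContent(
            (ProcessingInstruction) ((ProcessingInstruction) o).clone());
         }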
       }
    }

    return result;

  }

  /**
    * <p>
    * This utility method reads a document at a specified URL
    * and returns the contents of that document as a <code>String</code>.
    * It's used to include files with <code>parse="text"</code>
    * </p>
    *
    * <p>
    * If the document cannot be located due to an IOException,
    * then an error message string is returned. I'm not yet convinced this
    * is the right behavior. Perhaps I should pass on the exception?
    * </p>
    *
    * @param source   <code>URL</code> of the document that will be stored in
    *                 <code>String</code>.
    * @return String  The document retrieved from the source <code>URL</code>
    *                 or an error message if the document can't be retrieved.
    *                 Note: throwing an exception might be better here. I should
    *                 at least allow the setting of the error message.
    */
    public static String downloadTextDocument(URL source) {

        StringBuffer s = new StringBuffer();
        try {
          InputStream in = new BufferedInputStream(source.openStream());
          // does XInclude give you anything to specify the character set????
          InputStreamReader reader = new InputStreamReader(in, "8859_1");
          int c;
          // Read decoded characters from the reader, not raw bytes from the
          // stream. The characters are not escaped here: the result is added
          // as text content, and the outputter escapes < and & on output.
          while ((c = reader.read()) != -1) {
            s.append((char) c);
          }
          reader.close();
          return s.toString();
        }
        catch (IOException e) {
          return "Document not found: " + source.toExternalForm();
        }

    }

    /**
      * <p>
      * The driver method for the XIncluder program.
      * I'll probably move this to a separate class soon.
      * </p>
      *
      * @param args  each argument is the URL or file name
      *              of a document to be processed.
      */
    public static void main(String[] args) {

        SAXBuilder builder = new SAXBuilder();
        XMLOutputter outputter = new XMLOutputter();
        for (int i = 0; i < args.length; i++) {
          try {
            Document input = builder.build(args[i]);
            // absolutize URL
            String base = args[i];
            if (base.indexOf(':') < 0) {
              File f = new File(base);
              base = f.toURL().toExternalForm();
            }
            Document output = resolve(input, base);
            // need to set encoding on this to Latin-1 and check what
            // happens to UTF-8 curly quotes
            outputter.output(output, System.out);
          }
          catch (Exception e) {
            System.err.println(e);
            e.printStackTrace();
          }
        }

    }

}
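
The two mechanisms that XIncluder relies on, resolving each href against the URI of the including document and keeping a stack of the documents currently being processed, can be tried without JDOM or a network connection. The following is only a sketch under stated assumptions: the class IncludeCycleSketch, its methods, and the example.com URLs are hypothetical, documents are simulated by a map from URL to the href values they include, and java.net.URI.resolve() stands in for the listing's new URL(context, href).

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of XIncluder. A "document" is just an
// absolute URL mapped to the href values of the includes it contains.
final class IncludeCycleSketch {

  // Resolve href against base; a null base means href must be absolute.
  static String absolutize(String base, String href) {
    if (base == null) return URI.create(href).toString();
    return URI.create(base).resolve(href).toString();
  }

  // Returns the URLs visited in depth-first order. Throws
  // IllegalStateException if an include points back at a document
  // that is still being processed.
  static List<String> walk(String url, Map<String, List<String>> includes) {
    List<String> visited = new ArrayList<String>();
    walk(url, includes, new ArrayDeque<String>(), visited);
    return visited;
  }

  private static void walk(String url, Map<String, List<String>> includes,
      Deque<String> bases, List<String> visited) {
    // contains() compares with equals(), so equal URL strings match
    if (bases.contains(url)) {
      throw new IllegalStateException(
          "Circular XInclude reference to " + url + " in " + bases.peek());
    }
    bases.push(url);
    visited.add(url);
    List<String> hrefs = includes.get(url);
    if (hrefs == null) hrefs = Collections.emptyList();
    for (String href : hrefs) {
      // relative hrefs are resolved against the including document
      walk(absolutize(url, href), includes, bases, visited);
    }
    bases.pop();
  }

  public static void main(String[] args) {
    Map<String, List<String>> includes = new HashMap<String, List<String>>();
    includes.put("http://www.example.com/a/main.xml",
        Arrays.asList("part.xml", "../b/other.xml"));
    includes.put("http://www.example.com/b/other.xml",
        Arrays.asList("../a/main.xml"));
    try {
      walk("http://www.example.com/a/main.xml", includes);
    }
    catch (IllegalStateException ex) {
      System.out.println(ex.getMessage());
    }
  }
}
```

Because the stack holds only the chain of documents currently open, the same document can legitimately be included twice from different places; only an include that leads back to one of its own ancestors is rejected.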

Implementation as DOM

/*--

 Copyright 2000 Elliotte Rusty Harold.
 All rights reserved.

 I haven't yet decided on a license.
 It will be some form of open source.

 THIS SOFTWARE IS PROVIDED "AS IS" AND ANY EXPRESSED OR IMPLIED
 WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 DISCLAIMED.  IN NO EVENT SHALL ELLIOTTE RUSTY HAROLD OR ANY
 OTHER CONTRIBUTORS TO THIS PACKAGE
 BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 SUCH DAMAGE.

 */

package com.macfaq.xml;

import java.net.URL;
import java.net.MalformedURLException;
import java.util.Stack;
import org.xml.sax.SAXException;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.BufferedInputStream;
import java.io.InputStream;
import org.w3c.dom.Element;
import org.w3c.dom.Document;
import org.w3c.dom.Attr;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.DocumentType;
import org.w3c.dom.DOMImplementation;
import org.apache.xerces.parsers.DOMParser;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;

/**
 * <p><code>DOMXIncluder</code> provides methods to
 * resolve DOM elements and documents to produce
 * a new <code>Document</code> or <code>Element</code> with all
 * XInclude references resolved.
 * </p>
 *
 *
 * @author Elliotte Rusty Harold
 * @version 1.0d1
 */
public class DOMXIncluder {

  public final static String XINCLUDE_NAMESPACE
   = "http://www.w3.org/1999/XML/xinclude";

  // No instances allowed
  private DOMXIncluder() {}

  private static DOMParser parser = new DOMParser();

  /**
    * <p>
    * This method resolves a DOM <code>Document</code>
    * and merges in all XInclude references.
    * If a referenced document cannot be found it is replaced with
    * an error message. The <code>Document</code>
    * object returned is a new document.
    * The original <code>Document</code> object is not changed.
    * </p>
    *
    * @param original <code>Document</code> that will be processed
    * @param base     <code>String</code> form of the base URI against which
    *                 relative URLs will be resolved. This can be null if the
    *                 document includes an <code>xml:base</code> attribute.
    * @return Document new <code>Document</code> object in which all
    *                  XInclude elements have been replaced.
    * @throws CircularIncludeException if this document possesses a cycle of
    *                                  XIncludes.
    * @throws NullPointerException  if the original argument is null.
    */
    public static Document resolve(Document original, String base)
      throws CircularIncludeException, NullPointerException {

        if (original == null) {
          throw new NullPointerException("Document must not be null");
        }

        Element root = original.getDocumentElement();

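        // TODO: resolve() returns a Text node when the root element is an
        // include with parse="text"; replaceChild() below would then fail.
        // Is the root element allowed to be replaced by a parse="text" include?

        DOMImplementation impl = original.getImplementation();

        // A document without a DOCTYPE declaration has no DocumentType node.
        DocumentType oldDoctype = original.getDoctype();
        DocumentType newDoctype = null;
        if (oldDoctype != null) {
          newDoctype = impl.createDocumentType(
           oldDoctype.getName(),
           oldDoctype.getPublicId(),
           oldDoctype.getSystemId());
        }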

        Document resultDocument
         = impl.createDocument(root.getNamespaceURI(),
           root.getTagName(),
           newDoctype);
        // check that tag name is qualified name

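        Node resultRoot = resultDocument.getDocumentElement();
        boolean rootSeen = false;
        NodeList children = original.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
          Node n = children.item(i);
          if (n instanceof Element) { // root element
              Node newRoot = resolve(root, base, resultDocument);
              resultDocument.replaceChild(newRoot, resultRoot);
              resultRoot = newRoot;
              rootSeen = true;
          }
          else if (n instanceof DocumentType) {
              // skip it, already cloned
          }
          else {
              // Nodes owned by the original document must be imported, not
              // just cloned, before they can be added to the result document.
              Node copy = resultDocument.importNode(n, true);
              // Keep comments and processing instructions on the same side
              // of the root element as in the original.
              if (rootSeen) resultDocument.appendChild(copy);
              else resultDocument.insertBefore(copy, resultRoot);
          }
        }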

        return resultDocument;
  }

  /**
    * <p>
    * This method resolves a DOM <code>Element</code>
    * and merges in all XInclude references. This process is recursive.
    * The element returned contains no XInclude elements.
    * If a referenced document cannot be found it is replaced with
    * an error message. The <code>Element</code> object returned is a new element.
    * The original <code>Element</code> is not changed.
    * </p>
    *
    * @param original <code>Element</code> that will be processed
    * @param base     <code>String</code> form of the base URI against which
    *                 relative URLs will be resolved. This can be null if the
    *                 element includes an <code>xml:base</code> attribute.
    * @param resolved <code>Document</code> into which the resolved element will be placed.
    * @return Node    Either an <code>Element</code>
    *                 (<code>parse="xml"</code>) or a <code>Text</code>
    *                 (<code>parse="text"</code>)
    * @throws CircularIncludeException if this <code>Element</code> contains an XInclude element
    *                                  that attempts to include a document in which
    *                                  this element is directly or indirectly included.
    * @throws NullPointerException  if the original argument is null.
    */
    public static Node resolve(Element original, String base, Document resolved)
     throws CircularIncludeException,  NullPointerException {

        if (original == null) {
          throw new NullPointerException(
           "You can't XInclude a null element."
          );
        }
        Stack bases = new Stack();
        if (base != null) bases.push(base);

        // The stack is local to this call, so there is nothing to pop.
        // Popping here would throw EmptyStackException when base is null.
        return resolve(original, bases, resolved);

    }

    private static boolean isIncludeElement(Element element) {

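        // Put the constants first: getLocalName() and getNamespaceURI()
        // can both return null, for example for an element in no namespace.
        return "include".equals(element.getLocalName())
            && XINCLUDE_NAMESPACE.equals(element.getNamespaceURI());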

    }


  /**
    * <p>
    * This method resolves a DOM <code>Element</code>
    * and merges in all XInclude references. This process is recursive.
    * The element returned contains no XInclude elements.
    * If a referenced document cannot be found it is replaced with
    * an error message. The <code>Element</code> object returned is a new element.
    * The original <code>Element</code> is not changed.
    * </p>
    *
    * @param original <code>Element</code> that will be processed
    * @param bases    <code>Stack</code> containing the string forms of
    *                 all the URIs of documents which contain this element
    *                 through XIncludes. This is used to detect whether a
    *                 circular reference is being used.
    * @param resolved <code>Document</code> into which the resolved element will be placed.
    * @return Node    Either an <code>Element</code>
    *                 (<code>parse="xml"</code>) or a <code>Text</code>
    *                 (<code>parse="text"</code>)
    * @throws CircularIncludeException if this <code>Element</code> contains an XInclude element
    *                                  that attempts to include a document in which
    *                                  this element is directly or indirectly included.
    * @throws IllegalArgumentException if the href attribute is missing from an include element.
    */
  private static Node resolve(Element original, Stack bases, Document resolved)
   throws CircularIncludeException, IllegalArgumentException {

    Element result;
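    // null, not "", means "no base URI": new URL("") throws
    // MalformedURLException, so an empty base could never be resolved.
    String base = null;
    if (bases.size() != 0) base = (String) bases.peek();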

    if (isIncludeElement(original)) {
      String href = original.getAttribute("href");
      if (href == null || href.equals("")) { // illegal, what kind of exception????
        throw new IllegalArgumentException("Missing href attribute");
      }
      String baseAttribute
       = original.getAttributeNS("http://www.w3.org/XML/1998/namespace", "base");
      // An xml:base attribute on the include element overrides the inherited base.
      if (baseAttribute != null && !baseAttribute.equals("")) {
        base = baseAttribute;
      }
      boolean parse = true;
      String parseAttribute = original.getAttribute("parse");
      if (parseAttribute != null && parseAttribute.equals("text")) {
          parse = false;
      }

      String remote;
      if (base != null) {
        try {
          URL context = new URL(base);
          URL u = new URL(context, href);
          remote = u.toExternalForm();
        }
        catch (MalformedURLException ex) {
          return resolved.createTextNode("Unresolvable URL "
           + base + "/" + href);
        }
      }
      else {
          remote = href;
      }

      if (parse) {
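        // Stack.contains() compares with equals(), not ==, so two equal
        // URL strings match even if they are different String objects.
        if (bases.contains(remote)) {
          // base is the URI this include was resolved against; the line
          // number where the bad include occurs is still not reported
          throw new CircularIncludeException(
            "Circular XInclude Reference to "
           + remote + " in " + base);
        }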

        try {
          parser.parse(remote);
          Document doc = parser.getDocument();
          bases.push(remote);
          result = (Element) resolve(doc.getDocumentElement(), bases, resolved);
          bases.pop();
        }
        // Make this configurable
        catch (SAXException e) {
           return resolved.createTextNode("Document "
            + remote + " is not well-formed.\r\n" + e.getMessage());
        }
        catch (IOException e) {
           return resolved.createTextNode("Document not found: "
            + remote + "\r\n" + e.getMessage());
        }
      }
      else { // insert text
        String s = downloadTextDocument(remote);
        return resolved.createTextNode(s);
      }

    }
    // not an include element
    else { // recursively process children
       // still need to adjust bases here????
       result = (Element) resolved.importNode(original, false);
       NodeList children = original.getChildNodes();
       for (int i = 0; i < children.getLength(); i++) {
         Node n = children.item(i);
         if (n instanceof Element) {
           Element e = (Element) n;
           result.appendChild(resolve(e, bases, resolved));
         }
         else {
           result.appendChild(resolved.importNode(n,true));
         }
       }
    }

    return result;

  }

  /**
    * <p>
    * This utility method reads a document at a specified URL
    * and returns the contents of that document as a <code>String</code>.
    * It's used to include files with <code>parse="text"</code>.
    * </p>
    *
    * <p>
    * If the document cannot be located due to an IOException,
    * then an error message string is returned. I'm not yet convinced this
    * is the right behavior. Perhaps I should pass on the exception?
    * </p>
    *
    * @param url      URL of the document that will be stored in
    *                 <code>String</code>.
    * @return String  The document retrieved from the source <code>URL</code>
    *                 or an error message if the document can't be retrieved.
    *                 Note: throwing an exception might be better here. I should
    *                 at least allow the setting of the error message.
    */
    public static String downloadTextDocument(String url) {

        URL source;
        try {
          source = new URL(url);
        }
        catch (MalformedURLException ex) {
          return "Unresolvable URL " + url;
        }
        StringBuffer s = new StringBuffer();
        try {
          InputStream in = new BufferedInputStream(source.openStream());
          // does XInclude give you anything to specify the character set????
          InputStreamReader reader = new InputStreamReader(in, "8859_1");
          int c;
          // Read decoded characters from the reader, not raw bytes from the
          // stream. The characters are not escaped here: the result becomes
          // a Text node, and the serializer escapes < and & on output.
          while ((c = reader.read()) != -1) {
            s.append((char) c);
          }
          reader.close();
          return s.toString();
        }
        catch (IOException e) {
          return "Document not found: " + source.toExternalForm();
        }

    }

    /**
      * <p>
      * The driver method for the DOMXIncluder program.
      * I'll probably move this to a separate class soon.
      * </p>
      *
      * @param args  each argument is the URL or file name
      *              of a document to be processed.
      */
    public static void main(String[] args) {

        DOMParser parser = new DOMParser();
        for (int i = 0; i < args.length; i++) {
          try {
            parser.parse(args[i]);
            Document input = parser.getDocument();
            // absolutize URL
            String base = args[i];
            if (base.indexOf(':') < 0) {
              File f = new File(base);
              base = f.toURL().toExternalForm();
            }
            Document output = resolve(input, base);
            // The output is serialized as Latin-1 (ISO-8859-1); still need
            // to check what happens to UTF-8 curly quotes

            OutputFormat format = new OutputFormat("XML", "ISO-8859-1", false);
            format.setPreserveSpace(true);
            XMLSerializer serializer
             = new XMLSerializer(System.out, format);
            serializer.serialize(output);
          }
          catch (Exception e) {
            System.err.println(e);
            e.printStackTrace();
          }
        }

    }

}
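
Two DOM details matter in a processor like this one: an include element has to be recognized by local name and namespace URI without tripping over null values, and a node parsed from another document has to be imported into the result document before it can be inserted. The following is a minimal sketch of both, not the DOMXIncluder code itself. It assumes the JAXP DocumentBuilderFactory that ships with the JDK in place of the Xerces DOMParser class, and the class DomImportSketch, its methods, and the sample markup are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch, not part of DOMXIncluder.
final class DomImportSketch {

  static final String XINCLUDE_NAMESPACE
   = "http://www.w3.org/1999/XML/xinclude";

  // Parse a string with namespace support turned on; without it
  // getLocalName() and getNamespaceURI() are not usable.
  static Document parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      DocumentBuilder builder = factory.newDocumentBuilder();
      return builder.parse(new InputSource(new StringReader(xml)));
    }
    catch (Exception ex) {
      throw new IllegalArgumentException(ex);
    }
  }

  // Null-safe test: the constants are the receivers of equals().
  static boolean isIncludeElement(Element element) {
    return "include".equals(element.getLocalName())
        && XINCLUDE_NAMESPACE.equals(element.getNamespaceURI());
  }

  // Replace an include element with a copy of the root element of source.
  // importNode() creates the copy inside the including document;
  // the source document is left untouched.
  static Node replaceInclude(Element include, Document source) {
    Document target = include.getOwnerDocument();
    Node imported = target.importNode(source.getDocumentElement(), true);
    include.getParentNode().replaceChild(imported, include);
    return imported;
  }

  public static void main(String[] args) {
    Document host = parse("<book><xinclude:include xmlns:xinclude='"
        + XINCLUDE_NAMESPACE + "' href='chapter1.xml'/></book>");
    Document chapter = parse("<chapter>Hello</chapter>");
    Element include = (Element) host.getDocumentElement().getFirstChild();
    if (isIncludeElement(include)) {
      replaceInclude(include, chapter);
    }
    System.out.println(host.getDocumentElement().getFirstChild().getNodeName());
  }
}
```

A node returned by cloneNode() still belongs to the document it was cloned from, which is why the sketch uses importNode() for anything that crosses from one Document to another.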

To Learn More

This presentation: http://www.ibiblio.org/xml/slides/xmloneaustin2001/xlinks
XML Base Specification: http://www.w3.org/TR/xmlbase
XInclude Specification: http://www.w3.org/TR/xinclude
XPath Specification: http://www.w3.org/TR/xpath
XML in a Nutshell
- Elliotte Rusty Harold and W. Scott Means
- O'Reilly & Associates, 2001
- ISBN 0-596-00058-8
- XPath: http://www.oreilly.com/catalog/xmlnut/chapter/ch09.html
XML Bible, second edition
- Elliotte Rusty Harold
- Hungry Minds, 2001
- ISBN 0-7645-4760-7
- XLinks: http://www.ibiblio.org/xml/books/bible/updates/16.html
- XPointers: http://www.ibiblio.org/xml/books/bible/updates/17.html
