String Ranges

XPointer provides some very basic string matching capabilities through the string-range() function. This function takes as an argument a node set to search and a substring to search for. It returns a node set containing one range for each non-overlapping match to the string. You can also provide optional index and length arguments indicating how many characters after the match the range should start and how many characters after the start the range should continue. The basic syntax is:

string-range(node-set,substring,index,length)

The first node-set argument is an XPath expression specifying which part of the document to search for a matching string. The second substring argument is the actual string to search for. By default, the range returned starts before the first matched character and encompasses all the matched characters. However, the index argument can give a positive number to start after the start of the match. For instance, setting it to 2 would indicate that the range starts after the first matched character. The length argument can specify how many characters to include in the range.

A string range points to an occurrence of a specified string, or a substring of a given string in the text (not markup) of the document. For example, this XPointer finds all occurrences of the string "Harold":

xpointer(string-range(/,"Harold"))

You can change the first argument to specify what nodes you want to look in. For example, this XPointer finds all occurrences of the string "Harold" in NAME elements:

xpointer(string-range(//NAME,"Harold"))

String ranges may have node tests. Thus this XPointer finds only the first occurrence of the string "Harold" in the document:

xpointer(string-range(/,"Harold")[position()=1])

This targets the position immediately preceding the word Harold in Charles Walter Harold's NAME element. This is not the same as pointing at the entire NAME element as an element-based selector would do.

A third numeric argument targets a particular position in the string. For example, this targets the point immediately following the first occurrence of the string "Harold" because Harold has six letters:

xpointer(string-range(/,"Harold",6)[position()=1])

An optional fourth argument specifies the number of characters to select. For example, this URI selects the "old" from the first occurrence of the entire string "Harold":

xpointer(string-range(/,"Harold",4,3)[position()=1])

If the first string argument in the node test is the empty string, then relevant positions in the context node's text contents will be selected. For example, the following XPointer targets the first six characters of the document's parsed character data:

xpointer(/string::"",1,6[position()=1])

For another example, let's suppose you want to find the year of birth for all people born in the nineteenth century. The following will accomplish that:

xpointer(string-range(//BORN, "18", 2, 4)

This says to look in all BORN elements for the string " 18". (The initial space is important to avoid accidentally matching someone born in 1918 or on the 18th day of the month.) When it's found move one character ahead (to skip the space) and return a range covering the next four characters.

When matching strings, case is considered. All white space is condensed to a single space. Markup characters are ignored.


Previous | Next | Top | Cafe con Leche

Copyright 2000 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified February 1, 2000