Hands On XSLT

Hands-On XSLT

Elliotte Rusty Harold

Software Development 2002 East

November 19, 2002

elharo@metalab.unc.edu

http://www.cafeconleche.org/

What Is XSL?

The Extensible Stylesheet Language
Three parts:
- A data model and query language (XPath)
- A transformation language (XSLT)
- A formatting language (XSL-FO)

Versions

This class covers:
- XSL Transformations: November 16, 1999 1.0 Specification
- XPath: November 16, 1999 1.0 Specification
XPath 2.0/XSLT 2.0 is under development, but will build on XSLT 1.0 rather than replacing it.

Part I: Basic XSLT

The Process of an XSL Transformation

The XML parser reads an XML document and forms a tree
The tree is passed to the XSLT processor
The XSLT processor compares the nodes in the tree to the instructions in the style sheet
When the XSLT processor finds a match, it outputs a tree fragment
(Optional) The complete output tree is serialized to some other format such as text, HTML, or an XML file

XSLT Software

Linux: xsltproc
Windows: Instant Saxon
Java: Saxon
All Platforms: Mozilla
Others are available
Install from CD

Two Example XML Documents

Periodic Table, a record like document
XPath, a narrative document

An XSLT Style Sheet

The stylesheet is a well-formed XML document.
Root element is xsl:stylesheet
The xsl prefix is mapped to the http://www.w3.org/1999/XSL/Transform namespace URI
The xsl:stylesheet element has a version attribute with the value 1.0

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Templates will go here... -->

</xsl:stylesheet>

You can find this example in examples/sheet1.xsl

Exercise 1: Running the XSLT Processor

Apply the empty stylesheet to both input documents.

% java com.icl.saxon.StyleSheet filename.xml stylesheet.xsl
C:\> saxon filename.xml stylesheet.xsl
% xsltproc stylesheet.xsl filename.xml

What do you see?

Template Rules

An xsl:template element represents a template rule.
Each template rule has a match pattern and a template.
The match pattern is found in the match attribute.
The template is the content of the xsl:template element.
When the pattern is matched, the template is instantiated to form a result tree fragment, which is added to the complete result tree.
Processing starts by attempting to match the root node of the document to a template rule.
There default template rules that apply when no explicit rule matches.

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="NAME">
    <h2><xsl:value-of select="."/></h2>
  </xsl:template>

</xsl:stylesheet>

You can find this example in examples/atoms1.xsl

View Transformed Document in Browser

Exercise 2: Template Rules

Add a template to the stylesheet that puts the ATOMIC_NUMBER in a paragraph (an HTML P element).

If time permits put the other information in paragraphs too.

Literal text

Character data and markup that is copied directly from the stylesheet to the output document

  <xsl:template match="ATOMIC_WEIGHT">
    <p>Atomic Weight: <xsl:value-of select="."/></p>
  </xsl:template>

View Transformed Document in Browser

Exercise 3: Literal Character Data and Markup

Easy

Add human readable labels (e.g. "Atomic weight") to each paragraph
Italicize the atomic symbol.

Medium

Same problem, but do it with CSS instead of an i element.)

Adding the root

 <xsl:template match="/">
    <html>
      <body>
        <xsl:value-of select="."/>
      </body>
    </html>
  </xsl:template>

View Transformed Document in Browser

Applying Templates

 <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

View Transformed Document in Browser

The select attribute

Select only the NAME of each ATOM:

  <xsl:template match="ATOM">
    <xsl:apply-templates select="NAME"/>
  </xsl:template>

The select attribute is pretty much the same for xsl:value-of and xsl:apply-templates.
The select attribute contains an XPath expression relative to the currently matched node.
A raw element name is an XPath expression that selects children of the currently matched node.

View Transformed Document in Browser

Exercise 4: Applying Templates

Make each atom produce the following format (modulo white space):

<h2>Name [Symbol]</h2>
    <ul>
      <li>Atomic Weight: Atomic Weight</li>
      <li>Atomic Number: Atomic Number</li>
    </ul>

Attributes

You can use an @ sign before an attribute name to select or match an attribute of the current node.

 <xsl:template match="DENSITY">
    <li>Density: <xsl:value-of select="."/> <xsl:value-of select="@UNITS"/></li>
  </xsl:template>

View Transformed Document in Browser

Exercise 5: Attributes

Add density, atomic volume, and atomic radius to the information displayed for each atom. Include the units.

Attribute Value Templates

Make the symbol an ID attribute so it can be linked to:
<h2 ID="He">Helium [He]</h2>
Need a way to copy nodes from the input document to attribute values in the output document.

Attribute value templates are the solution. Enclose the XPath expression inside the literal attribute value in curly braces.

 <xsl:template match="ATOM">
    <h2 id="{SYMBOL}"><xsl:value-of select="NAME"/> [<xsl:value-of select="SYMBOL"/>]</h2>
    <ul>
    <xsl:apply-templates select="ATOMIC_NUMBER"/>
    <xsl:apply-templates select="ATOMIC_WEIGHT"/>
    <xsl:apply-templates select="ATOMIC_RADIUS"/>
    <xsl:apply-templates select="ATOMIC_VOLUME"/>
    <xsl:apply-templates select="DENSITY"/>
    </ul>
  </xsl:template>

An attribute value can contain multiple attribute value templates interspersed with literal data.

View Transformed Document in Browser

Iteration with xsl:for-each

xsl:apply-templates is push
xsl:for-each is pull

A template that lists the names of all the atoms in the document:

  <xsl:template match="PERIODIC_TABLE">
    <html>
      <body>
         <ul>
         <xsl:for-each select="ATOM">
            <li><xsl:value-of select="NAME"/></li>
         </xsl:for-each>
         </ul>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

View Transformed Document in Browser

Exercise 6: Iteration

Create hypertext links from the element names in the initial list to the full description of that element later in the document.

Matching Elements in Namespaces

XSLT matches by namesapce URI and local name; the prefix is ignored.
The prefix does not have to be the same in the input document as the stylesheet.
Unprefixed names in the stylesheet are always in no namespace. You need to use a prefix to match names in the stylesheet even if you use the default namespace in the original document.

Exercise 7: Matching Elements in Namespaces

Add an xmlns="http://www.example.com/" attribute to the root PERIODIC_TABLE element. Then fix the last stylesheet to work with it.

Part II: XPath

XPath Explorer

Expressions to Try Out

/PERIODIC_TABLE/ATOM is an XPath expression that selects all ATOM elements.
/PERIODIC_TABLE/ATOM[1] is an XPath expression that selects the first ATOM element.
/PERIODIC_TABLE/ATOM/SYMBOL is an XPath expression that selects all SYMBOL elements.
/PERIODIC_TABLE/ATOM[ATOMIC_NUMBER="3"] is an XPath expression that selects Lithium.
//ATOM[SYMBOL="He"]/DENSITY/attribute::UNITS is an XPath expression that selects the units in which the density of Helium is given.
/child::PERIODIC_TABLE/child::ATOM/child::* is an XPath expression that selects all the child elements of ATOM elements.
round(number(/PERIODIC_TABLE/ATOM[SYMBOL="Fe"]/ATOMIC_WEIGHT) - round(number(/PERIODIC_TABLE/ATOM[SYMBOL="Fe"]/ATOMIC_NUMBER)) is an XPath expression that returns the approximate number of neutrons in iron.
/descendant::* is an XPath expression that selects all the elements in the document.
/descendant-or-self::node() is an XPath expression that selects all non-attribute, non-namespace nodes in the document.
/descendant-or-self::node() | /descendant::*/attribute:* | /descendant::*/namespace:* is an XPath expression that selects all nodes in the document.

XPath Data Model

A document is a tree of nodes
Seven kinds of nodes:

root node
The document itself. The root node’s children are the comments and processing instructions in the prolog and epilog and the root element of the document.

element node
An element. Its children are all the child elements, text nodes, comments, and processing instructions the element contains. An element also has namespaces and attributes. However, these are not child nodes.

attribute node
An attribute other than one that declares a namespace

text node
The maximum uninterrupted run of text between tags, comments, and processing instructions. White space is included.

comment node
A comment

processing instruction node
A processing instruction

namespace node
A namespace mapping in scope on an element
The XPath data model does not include:
- entity references
- CDATA sections
- the document type declaration
- namespace declarations

Node Properties

Each node has a string-value

Attributes, elements, processing instructions, and namespace nodes have expanded names, which are divided into a local part and a namespace URI.

Node type Local name Namespace name String-value
root None None the complete, ordered content of all text nodes in the document; same as the value of the root element of the document
element The name of the element, not including any prefix or colon The namespace URI of the element The complete, ordered content of all text node descendants of this element (i.e. the text that’s left after all references are resolved and all other markup is stripped out.)
attribute The name of the attribute, not including any prefix or colon The namespace URI of the attribute The normalized attribute value

text None None The complete content of the text node

processing instruction The target of the processing instruction None The processing instruction data
comment None None The text of the comment
namespace The prefix for the namespace None The absolute URI for the namespace

Node type	Local name	Namespace name	String-value
root	None	None	the complete, ordered content of all text nodes in the document; same as the value of the root element of the document
element	The name of the element, not including any prefix or colon	The namespace URI of the element	The complete, ordered content of all text node descendants of this element (i.e. the text that’s left after all references are resolved and all other markup is stripped out.)
attribute	The name of the attribute, not including any prefix or colon	The namespace URI of the attribute	The normalized attribute value
text	None	None	The complete content of the text node
processing instruction	The target of the processing instruction	None	The processing instruction data
comment	None	None	The text of the comment
namespace	The prefix for the namespace	None	The absolute URI for the namespace

Location steps

A location path selects a set of nodes from an XML document.
Each location path is composed of one or more location steps.
Each location step has an axis, a node test and may have one or more predicates.
Each location step is evaluated with respect to a particular context node.
A double colon (::) separates the axis from the node test.
Each predicate is enclosed in square brackets.

Examples:

child::PERIODIC_TABLE
descendant::ATOM[1]
ancestor::ATOM
following-sibling::ATOM[ATOMIC_NUMBER="3"]
descendant-or-self::ATOM[]
child::*
descendant-or-self::node()
attribute::UNITS

Axes

There are twelve axes along which a location step can move. Each selects a different subset of the nodes in the document, depending on the context node. These are:

self: The node itself.
child: All child nodes of the context node. (Attributes and namespaces are not considered to be children of the node they belong to.)
descendant: All nodes completely contained inside the context node (between the end of its start-tag and the beginning of its end-tag); that is, all child nodes, plus all children of the child nodes, and all children of the children’s children, and so forth. This axis is empty if the context node is not an element node or a root node.
descendant-or-self: All descendants of the context node and the context node itself.
parent: The node which most immediately contains the context node. The root node has no parent. The parent of the root element and comments and processing instructions in the document’s prolog and epilog is the root node. The parent of every other node is an element node. The parent of a namespace or attribute node is the element node that contains it, even though namespaces and attributes aren’t children of their parent elements.
ancestor: The root node and all element nodes that contain the context node.
ancestor-or-self: All ancestors of the context node and the context node itself.
preceding: All non-attribute, non-namespace nodes which come before the context node in document order and which are not ancestors of the context node
preceding-sibling: All non-attribute, non-namespace nodes which come before the context node in document order and have the same parent node
following: All non-attribute, non-namespace nodes which follow the context node in document order and which are not descendants of the context node.
following-sibling: All non-attribute, non-namespace nodes which follow the context node in document order and have the same parent node
attribute: Attributes of the context node. This axis is empty if the context node is not an element node.
namespace: Namespaces in scope on the context node. This axis is empty if the context node is not an element node.

Node Tests

The axis chooses the direction to move from the context node. The node test determines what kinds of nodes will be selected along that axis. The node tests are:

Name: Any element or attribute with the specified name. If the name is prefixed, then the local name and namespace URI are compared, not the qualified names. If the name is not prefixed, then the element must be in no namespace at all. An unprefixed name in an XPath expression never matches an element in a namespace, even in the default namespace. When using XPath to search for an unprefixed element like ATOM that is in a namespace, you have to use a prefixed name instead such as chem:Quote. Exactly how the prefix is mapped to the namespace depends on the environment in which the XPath expression is used.
*: Along the attribute axis the asterisk matches all attribute nodes. Along the namespace axis the asterisk matches all namespace nodes. Along all other axes, this matches all element nodes.
prefix:*: Any element or attribute in the namespace mapped to the prefix.
comment(): Any comment
text(): Any text node
node(): Any node
processing-instruction(): Any processing instruction
processing-instruction('target'): Any processing instruction with the specified target

Exercise 8: Axes and Node Tests

Write a stylesheet that lists all the titles in xpath.xml.

Predicates

Each location step can have zero or more predicates that further filter the node-set.
A predicate is an XPath expression in square brackets that is evaluated for each node selected by the location step. If the predicate is true, then the node is kept in the node-set. If the predicate is false, then the node is removed from the node-set.
If the predicate returns a number, then the node is kept in the set only if the number is equal to the position of the context node in the context node list.
If the predicate returns a string, then the context node is deleted from the set if the string is empty and kept otherwise.
If the predicate returns a node-set, then the source node is kept in the returned set only if the predicate node-set is non-empty. It is deleted otherwise.

Examples:

child::ATOM[position() = 1]
child::ATOM[starts-with("NAME", "H")]
child::ATOM[ATOMIC_NUMBER="3"]
child::ATOM[ATOMIC_NUMBER > 3 ][ATOMIC_WEIGHT < 200]
descendant::ATOM[SYMBOL="He"]
child::ATOM[SYMBOL="Fe"]

Exercise 9: Predicates

Easy

The atomic number of uranium, the heaviest naturally occurring element, is 92. Write a stylesheet that lists the names of all the transuranium elements.

Medium

Write a stylesheet that divides the periodic table into two sections, the first containing all the elements with atomic numbers less than or equal to 92, the second with atomic numbers greater than or equal to 92. Put an H2 header in front of each section.

Multistep Location Paths

The forward slash (/) combines location steps into a location path.
The node-set selected by the first step becomes the context node-set for the second step. The node-set identified by the second step becomes the context node-set for the third step, and so on.

Examples

child::PERIODIC_TABLE/child::ATOM is an XPath expression that selects all ATOM elements.
child::PERIODIC_TABLE/child::ATOM[1] is an XPath expression that selects the first ATOM element.
child::PERIODIC_TABLE/child::ATOM/child::SYMBOL is an XPath expression that selects all SYMBOL elements.
child::PERIODIC_TABLE/child::ATOM[ATOMIC_NUMBER="3"] is an XPath expression that selects Lithium.
child::ATOM[SYMBOL="He"]/child::DENSITY/attribute::UNITS is an XPath expression that selects the units in which the density of Helium is given.
child::PERIODIC_TABLE/child::ATOM/child::* is an XPath expression that selects all the child elements of ATOM elements.

Absolute location paths

Begin with a / to select the root node
Independent of the context node

Examples

/child::PERIODIC_TABLE/child::ATOM is an XPath expression that selects all ATOM elements.
/child::PERIODIC_TABLE/child::ATOM[1] is an XPath expression that selects the first ATOM element.
/child::PERIODIC_TABLE/child::ATOM/child::SYMBOL is an XPath expression that selects all SYMBOL elements.
/child::PERIODIC_TABLE/child::ATOM[ATOMIC_NUMBER="3"] is an XPath expression that selects Lithium.
/descendant::ATOM[SYMBOL="He"]/child::DENSITY/attribute::UNITS is an XPath expression that selects the units in which the density of Helium is given.
/child::PERIODIC_TABLE/child::ATOM/child::* is an XPath expression that selects all the child elements of ATOM elements.
/descendant::* is an XPath expression that selects all the elements in the document.
/descendant-or-self::node() is an XPath expression that selects all non-attribute, non-namespace nodes in the document.

Abbreviated location paths

Abbreviation	Expanded form
`Name`	`child::Name`
`@Name`	`attribute::Name`
`//`	`/descendant-or-self::node()/`
`.`	`self::node()`
`..`	`parent::node()`

Examples

/PERIODIC_TABLE/ATOM
/PERIODIC_TABLE/ATOM[1]
/PERIODIC_TABLE/ATOM/SYMBOL
/PERIODIC_TABLE/ATOM[ATOMIC_NUMBER="3"]
//ATOM[SYMBOL="He"]/DENSITY/@UNITS
/PERIODIC_TABLE/ATOM/*
//*
//node()

Combining location paths with |

Union of multiple node-sets

Examples

/PERIODIC_TABLE/ATOM | /PERIODIC_TABLE/ELEMENT
child::NAME | child::SYMBOL
child::* | attribute::*
* | @*
//ATOM[SYMBOL="He"] | //ATOM[SYMBOL="H"]

General Expressions

Not necessarily location paths
Can return numbers, booleans, strings, or node-sets
Can be raw functions such as id("He")
Can be numeric such as (2+2)*3.14

Data Types

XPath has four data types:
string
A sequence of zero or more Unicode characters. This is not quite the same thing as a Java String which is a sequence of UTF-16 code points. A single Unicode character from outside Unicode’s Basic Multilingual Plane (BMP) occupies two UTF-16 code points. XPath strings that contain characters from outside the BMP will have smaller lengths than the equivalent Java string.
number
An IEEE-754 double. This is the same as Java’s double primitive data type for all intents and purposes.
boolean
Semantically the same as Java’s boolean type. However, XPath does allow 1 and 0 to represent true and false respectively.
node-set
An unordered collection of nodes from an XML document without any duplicates. Since a node-set is a mathematical set, there is no fundamental ordering defined on the set. However, most node-sets have a natural document order that’s derived from the order of the nodes in the set in the input document. This is similar to how the set of integers {1, 4, -76, 23} is unordered. However, the individual elements in the set can be compared to each other and sorted if desired. In practice, most APIs use lists rather than sets to represent node-sets, and these lists are sorted in either document order or reverse document order, depending on how they were created.
XSLT adds a result tree fragment type for output only

Literals

XPath defines literal forms for strings and numbers. Numbers have more or less the same form as double literals in Java. That is, they look like 72.5, -72.5, .5321, and so forth. XPath only uses floating point arithmetic, so integers like 42, -23, and 0 are also number literals. However, XPath does not recognize scientific notation such as 5.5E-10 or 6.022E23.
XPath string literals are enclosed in single or double quotes. For example, "red" and 'red' are different representations for the same string literal containing the word red.
There are no boolean or node-set literals. However, the true() and false() functions sometimes substitute for the lack of boolean literals.

Number Operators

XPath has several operators for doing arithmetic:
- +
- -
- *
- div
- mod
These may be part of any select expression, but are most commonly used in predicates with comparison operators.

Exercise 10: Arithmetic

The boiling points in the periodic table document are given in degrees Kelvin.

Write a stylesheet which produces a table of boiling points for each element in degrees Celsius and degrees Fahrenheit

Celsius = Kelvin - 273.15.

Fahrenheit = (9/5 * Celsius) + 32

Boolean Operators

`<`	less than
`>`	greater than
`<=`	less than or equal to
`>=`	greater than or equal to
`=`	boolean equals (not an assignment statement as in Java)
`!=`	not equal to
`or`	Boolean or
`and`	Boolean and

In an XSLT stylesheet, some of these may need to be escaped with < or >.

Exercise 11: Boolean Operators

Write a stylesheet that lists the names of all elements with a boiling point somewhere around room temperature (0 degrees Fahreneheit to 100 degrees Fahrenheit)

Fahrenheit = (9/5 * Celsius) + 32

Celsius = (5/9) * (Fahrenheit -32) = (9/5 * Celsius) + 32

Functions

Functions operate on and return the four fundamental XPath data types.
Some functions take variable numbers of arguments.
A function that doesn’t have any arguments normally operates on the context node.
Most functions are weakly typed. You can pass any of the four types in the place of an argument that is declared to be of type boolean, number, or string. XPath will convert it and use it. The exceptions are those functions that are declared to take node-sets as arguments. XPath cannot convert arguments of other types to node-sets.
Functions do not modify their arguments in anyway.

Node-set Functions

number last(): Returns the number of nodes in the context node list. This is the same as the position of the last node in the list.
number position(): Returns the position of the context node in the context node list. The first node has position 1, not 0.
number count(node-set): Returns the number of nodes in the argument
node-set id(object): Returns a node-set containing the single element node with the specified id as determined by an ID-type attribute. If no node has the specified ID, then this function returns an empty node-set. If the argument is a node-set, then it returns a node-set containing all the element nodes whose ID matches the string-value of any of the nodes in the argument node-set.
string local-name(node-set?): Returns the local name of the first node in the argument node-set, or the local name of the context node if the argument is omitted. It returns an empty string if the relevant node does not have a local name (i.e. it’s a comment, root, or text node.)
string namespace-uri(node-set?): Returns the namespace name of the first node in the argument node-set, or the namespace name of the context node if the argument is omitted. It returns an empty string if the node is an element or attribute that is not in a namespace. It also returns an empty string if namespace names don’t apply to this node (i.e. it’s a comment, processing instruction, root, or text node.)
string name(node-set?): Returns the full, prefixed name of the first node in the argument node-set, or the name of the context node if the argument is omitted. It returns the empty string if the relevant node does not have a name (e.g. it’s a comment or text node.)

Exercise 12: Node-set Functions

A paragraph is represented by a para element.

Easy

Write a stylesheet that extracts the first paragraph of each sect1 in xpath.xml and outputs it in an HTML document.

Medium

Write a stylesheet that extracts the first paragraph of the document and each sect1 in xpath.xml.

Hard

Write a stylesheet that extracts the first paragraph of the document itself and each separate section of the document xpath.xml. Furthermore the title of each section should be replaced by the equivalent HTML header element. That is, sect1/title --> h1, sect2/title --> h2, etc.

Number Functions

XPath includes five functions that operate on numbers:

floor(number) returns the greatest integer smaller than the number
ceiling(number) returns the smallest integer greater than the number
round(number) rounds the number to the nearest integer
sum(number) returns the sum of its arguments
format-number(number, format-string) returns the string form of a number formatted according to the specified format-string as if by Java 1.1's java.text.DecimalFormat class

Exercise 13: Number Formatting

Celsius = Kelvin - 273.15.

Fahrenheit = (9/5 * Celsius) + 32

Easy

Rewrite the boiling points table of exercise 10 to use two decimal digits of precision for each value in the table.

Hard

Rewrite the boiling points table to use the same number of significant digits in the output as are provided in the input.

String functions

string string(object?): This function returns the string-value of the argument. If the argument is a node-set, then it returns the string-value of the first node in the set. If the argument is omitted, it returns the string-value of the context node.
string concat(string, string, string...): This function returns a string containing the concatenation of all its arguments.
boolean starts-with(string, string): This function returns true if the first string starts with the second string. Otherwise it returns false.
boolean contains(string, string): This function returns true if the first string contains the second string. Otherwise it returns false.
string substring-before(string, string): This returns that part of the first string that precedes the second string. It returns the empty string if the second string is not a substring of the first string. If the second string appears multiple times in the first string, then this returns the portion of the first string before the first appearance of the second string.
string substring-after(string, string): This returns that part of the first string that follows the second string. It returns the empty string if the second string is not a substring of the first string. If the second string appears multiple times in the first string, then this returns the portion of the first string after the initial appearance of the second string.
string substring(string, number, number?): This returns the substring of the first argument beginning at the second argument and continuing for the number of characters specified by the third argument (or until the end of the string if the third argument is omitted.) Unlike Java, the foirst character is at position 1, not 0.
number string-length(string?): Returns the number of Unicode characters in the string, or the string-value of the context node if the argument is omitted. This may not be the same as the number returned by the length() method in Java’s String class because XSLT counts characters and Java counts UTF-16 code points.
string normalize-space(string?): This function strips all leading and trailing white-space from its argument, or the string-value of the context node if the argument is omitted, and condenses all other runs of whitespace to a single space. It’s very useful in XML documents where whitespace is used primarily for formatting.
string translate(string, string, string): This function replaces all characters in the first string that are found in the second string with the corresponding character from the third string.

Exercise 14: String Manipulation

Write a stylesheet that extracts the first sentence of each paragraph in xpath.xml.

A paragraph is represented by a para element.

Each sentence ends with a period followed by at least one white space character or the end of the paragraph.

Boolean Functions

boolean boolean(object): Converts the argument to a boolean in a mostly sensible way. NaN and 0 are false. All other numbers are true. Empty strings are false. All other strings are true. Empty node-sets are false. All other node-sets are true.
boolean not(boolean): This function turns true into false and false into true.
boolean true(): This function always returns true. It’s necessary because XPath does not have any boolean literals.
boolean false(): This function always returns false. It’s necessary because XPath does not have any boolean literals.
boolean lang(string): This function returns true if the context node is written in the language specified by the argument. The language of the context node is determined by the currently in-scope xml:lang attribute. If there is no such attribute, this function returns false.

Match patterns vs. XSLT Expressions

select attributes, test attributes, and predicates contain general XPath expressions
match attributes contain a subset of XPath expressions with these limitations:
- A Pattern is a set of location path patterns separated by |.
- Only uses the child or attribute axes.
- Can use // and /
- Can start with an id() or key() function call with a literal argument.
- Can use arbitrary predicates

Part III: More XSLT

The Default Template Rules

<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>

<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>

<xsl:template match="processing-instruction()|comment()"/>

Testing conditions with xsl:if

  <xsl:if test="expression">
    
  </xsl:if>

Zero length strings are false
Empty node-sets are false; non-empty node-sets are true
Zero and NaN are false; other numbers are true

Exercise 15: Conditional Output

Revise the boiling point table of exercise 12 so that elements whose boiling point is unknown are not included in the table.

Harder: Rewrite the boiling points table to use the same number of significant digits in the output as are provided in the input.

Celsius = Kelvin - 273.15.

Fahrenheit = (9/5 * Celsius) + 32

xsl:choose

xsl:when child elements with test attributes. Each contains a template. The first one whose test attribute is evaluated to true is instantiated. The rest are ignored.
Optional xsl:default if none of the test attributes of the xsl:when elements evaluate to true.

Exercise 16: Selection

Easy

List the names of the elements in four different colors: red for liquids, blue for gases, black for solids, and green for unknown.

Medium

Same problem, but do it with CSS instead of a font element.

Hard

Divide the list of elements into four sections; one each for gases, liquids, solids, and unknown states

Sorting with xsl:sort

The select attribute provides the key to sort by
Must be a child of xsl:apply-templates or xsl:for-each

       <xsl:apply-templates select="composer">
         <xsl:sort select="name/last_name"/>
       </xsl:apply-templates>

Multiple Key Sorts

       <xsl:apply-templates select="composer">
         <xsl:sort select="name/last_name"/>
         <xsl:sort select="name/first_name"/>
         <xsl:sort select="name/middle_name"/>
       </xsl:apply-templates>

Sort Options

       <xsl:apply-templates select="composer">
         <xsl:sort select="name/last_name" order="ascending" lang="en" data-type="text"/>
         <xsl:sort select="name/first_name"/>
         <xsl:sort select="name/middle_name"/>
       </xsl:apply-templates>

data-type:
- text
- number
- qualified name
lang:
- en
- fr
- fr-CA
- ch
- etc.
order:
- ascending
- descending
case-order:
- upper-first
- lower-first

Exercise 17: Sorting

Starting with the complete information document from Exercise 6

Sort the table of contents alphabetically
Sort the element sections by atomic number

Numbering Output

The xsl:number element has a variety of attributes to determine number style, exactly what's counted, where numbering starts, and so forth
The position() function returns the position of the current node in the context node list

  <xsl:template match="PERIODIC_TABLE">
    <body>
          <xsl:for-each select="ATOM">
             <xsl:sort select="NAME"/>
            <xsl:number value="position()"/>. <a href="#{SYMBOL}">
            <xsl:value-of select="NAME"/></a><br />
         </xsl:for-each>
        <xsl:apply-templates  select="ATOM">
             <xsl:sort select="ATOMIC_NUMBER" data-type="number"/>
        </xsl:apply-templates>
      </body>
  </xsl:template>

View Transformed Document in Browser

Number Options

level:
- single
- multiple
- ant
count: pattern
from: pattern
format: string like Java 1.1's DecimalFormat class
letter-value:
- alphabetic
- traditional
grouping-separator: char
grouping-size: number
lang:
- en
- fr
- fr-CA
- ch
- etc.

Part IV: XSLT in Practice

Where Does the Transformation Happen?

There are three primary ways XML documents are transformed into other formats, such as HTML, with an XSLT style sheet:

The XML document and associated style sheet are both served to the client (Web browser), which then transforms the document as specified by the style sheet and presents it to the user.
The server applies an XSL style sheet to an XML document to transform it to some other format (generally HTML) and sends the transformed document to the client (Web browser).
A third program transforms the original XML document into some other format (often HTML) before the document is placed on the server. Both server and client only deal with the post-transform document.

Client Side Processing

Place an xml-stylesheet processing instruction in the prolog immediately after the XML declaration (if any) and before the document type declaration (if any).
This processing instruction should have a type attribute with the value text/xml and an href attribute whose value is an absolute or relative URL pointing to the style sheet.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="compositions.xsl"?>

Eventually application/xml+xslt will replace text/xml.
IE uses the non-existent MIME media type text/xsl instead.
This is also how you attach a CSS style sheet to a document. The only difference here is that the type attribute has the value text/xml instead of text/css.

Exercise 18: Client Side Transformation

Easy

Load a styled periodic table into Mozilla

Medium

Load a styled periodic table document into Internet Explorer 6.0 and Mozilla

Hard

Load a styled periodic table document into Internet Explorer 5.0, Internet Explorer 6.0, Mozilla

What do you see?

What else does XSLT have?

The xsl:element, xsl:attribute, xsl:processing-instruction, xsl:comment, and xsl:text elements can output elements, attributes, processing instructions, comments, and text calculated from data in the input document.
The xsl:copy element to copy nodes from the input to the output
The xsl:copy-of element to copy nodes from the input to the output with their contents intact
Parameters for passing arguments to templates
Modes for reprocessing the same element in a different fashion
Recursion
Named templates, variables, and attribute sets help you reuse common template code.
The xsl:import and xsl:include elements merge rules from different style sheets.
Various attributes of the xsl:output element allow you to specify the output document's format, XML declaration, document type declaration, indentation, encoding and MIME type.
Extension functions written in other languages like Java, JavaScript, and C++
Extension elements written in other languages like Java, JavaScript, and C++

What does XSLT not have?

A general regular expression language
Non-final variables (hence no side effects)
Loops

Summary

The Extensible Stylesheet Language (XSL) comprises two separate XML applications for transforming and formatting XML documents.
An XSL transformation applies rules to a tree read from an XML document to transform it into an output tree written as an XML document.
An XSL template rule is an xsl:template element with a match attribute. Nodes in the input tree are compared against the patterns of the match attributes of the different template elements. When a match is found, the contents of the template are output.
The value of a node is a pure text (no markup) string containing the contents of the node. This can be calculated by the xsl:value-of element.
The xsl:apply-templates element continues processing the children of the current node
The xsl:if element produces output if, and only if, its test attribute is true.
The xsl:number element inserts the number specified by its value attribute into the output using a specified number format given by the format attribute.
The xsl:sort element can reorder the input nodes before copying them to the output.

To Learn More

XML in a Nutshell, second edition
- Elliotte Rusty Harold and W. Scott Means
- O'Reilly & Associates, 2002
- ISBN 0-596-00292-0
- XPath: http://www.oreilly.com/catalog/xmlnut/chapter/ch09.html
XML Bible, second edition
- Elliotte Rusty Harold
- Hungry Minds, 2001
- ISBN 0-7645-4760-7
- Chapter 17 XSL Transformations: http://www.cafeconleche.org/books/biblegold/chapters/ch17.html
This presentation: http://www.cafeconleche.org/slides/sd2002east/handsonxslt
XSL Transformations 1.0 Specification: http://www.w3.org/TR/xslt/
XPath Specification: http://www.w3.org/TR/xpath

Index | Cafe con Leche