For the most part this book is going to focus on XML documents used as input to and output from various kinds of programs. In many cases it’s entirely possible that the XML documents will be both written and read by software and that no human being ever even looks at the documents. However, on occasion people do need to load XML documents into a browser or print them on paper so they can read them. For this purpose XML documents are rather coarse, especially for non-programmers. To pretty them up, you can attach a style sheet to the document that specifies how each element should be presented. There are two main languages used for this purpose today, Cascading Style Sheets (CSS) and the Extensible Stylesheet Language (XSL).
CSS is a very straightforward, non-XML, declarative language. CSS rules attach style properties to elements. Each rule has a selector specifying which elements it applies to. The simplest selector is merely an element name such as Order or Price. This is followed by a pair of braces containing the style properties to apply to the selected elements. Each property has a name such as font-weight or display and a value that’s appropriate for that property. The name and value are separated by a colon. For example, this rule says that the Customer element should be displayed in boldface:
Customer {font-weight: bold}
CSS rules often set multiple properties for a single element. Individual properties are separated by semicolons. For example, this rule says that the Order element (and all its descendants) should have the font-family serif and the font-size 16 points:
Order {font-family: serif; font-size: 16pt}
Most of the properties set on an element such as Order are inherited by all its descendants such as Customer, Price, and State. However, if a descendant element sets a different value for an inherited property, then that value overrides the inherited value. For example, this rule sets the font-family for the gift message to ZapfChancery, or to any script-style font if ZapfChancery is not available on the local system. (The CSS generic family name for script-style fonts is cursive.) It overrides the choice of a serif font that GiftMessage inherits from its ancestor Order element.
GiftMessage {font-family: ZapfChancery, cursive}
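To make the inherit-then-override behavior concrete, here is a minimal sketch in Python. It is a toy model rather than a CSS engine: the RULES table, the computed_styles function, and the two-element sample order are all made up for illustration, the font fallback list is left out, and every property is treated as inherited, which real CSS does not do (display and margin-left, for instance, are not inherited).

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified rule table: element name -> declared properties.
RULES = {
    "Order": {"font-family": "serif", "font-size": "16pt"},
    "Customer": {"font-weight": "bold"},
    "GiftMessage": {"font-family": "ZapfChancery"},  # fallback list omitted
}


def computed_styles(root):
    """Map each element to its properties after inheritance is applied."""
    styles = {}

    def visit(element, inherited):
        # Start from the parent's values, then let the element's own
        # declarations override anything it inherited.
        own = dict(inherited)
        own.update(RULES.get(element.tag, {}))
        styles[element] = own
        for child in element:
            visit(child, own)

    visit(root, {})
    return styles


# A made-up fragment of an order document.
doc = ET.fromstring(
    "<Order><Customer>Chez Fred</Customer>"
    "<GiftMessage>Happy Father's Day!</GiftMessage></Order>"
)
styles = computed_styles(doc)
for element, props in styles.items():
    print(element.tag, props)
```

In this model Customer ends up with the serif family and 16-point size from Order plus its own bold weight, while GiftMessage keeps the inherited size but replaces the inherited font family.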
A single rule can apply to several elements at once if the element names in the selector are separated by commas. For example, this rule specifies that five different elements all have the value block for the display property. This means each will be separated from the previous and following elements by a line break.
Street, Subtotal, Tax, Shipping, Total {display: block}
CSS also allows you to select elements according to attributes, parentage, siblings, ID, link status, and more. An asterisk can be used to stand in for any element.
Example 1.11 is a complete CSS stylesheet for order documents such as Example 1.2. It adds several new features, including setting the display property to none to hide the SKU element and using the :before pseudo-element and the content property to add a little boilerplate text in front of several elements. However, although useful, these facilities are limited. You still can’t reorder the elements, and the content property is limited to plain text, no markup.
Example 1.11. A CSS stylesheet for order documents
Order {font-family: serif;
       font-size: 16pt;
       display: block;
       line-height: 20pt;
       margin-left: 0.25in }
ShipTo {margin-left: 0.5in;
        display: block }
ShipTo:before {content: "Ship to:";
               margin-left: -0.25in }
Product {font-size: 12pt;
         display: block }
Customer {font-weight: bold;
          display: block }
GiftMessage {font-family: ZapfChancery, cursive}
Street, Subtotal, Tax, Shipping, Total {display: block}
Quantity:before {content: "Quantity: "}
SKU {display: none}
Subtotal:before {content: "Subtotal: $"}
Price:before {content: "Unit Price: $"}
Tax:before {content: "Tax: $"}
Shipping:before {content: "Shipping: $"}
Total:before {content: "Total: $"}
A few browsers may allow the user to specify which stylesheet to apply to a document. In general, however, an XML document that will be read by people must carry an xml-stylesheet processing instruction in its prolog that indicates which stylesheet should be applied to that document. This processing instruction has two pseudo-attributes, type and href. The type pseudo-attribute identifies the MIME media type of the stylesheet, text/css for Cascading Style Sheets. The href pseudo-attribute specifies the relative or absolute URL where the style sheet can be found. For example, this xml-stylesheet processing instruction says that a CSS style sheet named order.css can be found in the same directory where the XML document itself was found:
<?xml-stylesheet type="text/css" href="order.css"?>
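Because type and href are pseudo-attributes rather than real attributes, an XML parser reports them only as the raw text of the processing instruction, and a program that wants them has to pick that text apart itself. The following Python sketch shows one way to do so with the standard library’s minidom. The function name stylesheet_instructions, the regular expression, and the sample document are illustrative assumptions; the pattern handles only simple quoted values and does not resolve character references.

```python
import re
from xml.dom import minidom

# A made-up order document carrying the processing instruction shown above.
SAMPLE = (
    '<?xml version="1.0"?>\n'
    '<?xml-stylesheet type="text/css" href="order.css"?>\n'
    "<Order><Customer>Chez Fred</Customer></Order>"
)

# Matches name="value" or name='value' inside the instruction's data.
PSEUDO_ATTRIBUTE = re.compile(r"""([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def stylesheet_instructions(xml_text):
    """Return one dict of pseudo-attributes per xml-stylesheet
    processing instruction found in the document's prolog."""
    document = minidom.parseString(xml_text)
    found = []
    for node in document.childNodes:
        if node is document.documentElement:
            break  # the prolog ends where the root element begins
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            pairs = {}
            for name, double_quoted, single_quoted in PSEUDO_ATTRIBUTE.findall(node.data):
                pairs[name] = double_quoted or single_quoted
            found.append(pairs)
    return found


print(stylesheet_instructions(SAMPLE))
```

For the sample document the function returns a single dictionary whose type entry is text/css and whose href entry is order.css. A browser does the equivalent work before it fetches and applies the stylesheet.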
Figure 1.1 shows Example 1.2 loaded into Mozilla after the stylesheet in Example 1.11 has been attached. Loaded into Opera or Netscape 6, the results would be similar. Internet Explorer 5.5 and earlier have much weaker support for CSS, and would not do nearly as good a job formatting the XML. Netscape 4.x and earlier have absolutely no support for displaying XML documents.
This isn’t bad. A browser with full support for CSS Level 2 does let you do a lot, but there are still numerous issues. The recipient’s name should be included in the ship-to information. The subtotal, tax, shipping, and total dollar amounts should really be aligned. And the product name, quantity, and price should probably be in a multi-row table with one row for each item in the order. Although CSS does support tables, it requires that the markup in the XML document already be structured in a very tabular fashion, which it isn’t here.
CSS is limited to specifying how the text content of each element is styled. It presents pretty much the entire content of the XML document in pretty much the order in which it appears in that document. Beyond the short plain-text snippets the content property allows, it cannot add boilerplate such as the return policy, and it cannot extract just the elements needed for an address label. It is limited to specifying the appearance of elements already present in the XML document without changing their order or content.
XSL, by contrast, can reorder elements, delete certain elements, add content that wasn’t present in the original document, combine multiple documents into a single result, and more. Because of its ability to add text to what’s in the XML document before the document is shown to the user, XSL is much more suited for many of the examples in this book that contain only the raw data without a lot of extraneous text. XSL is a much more powerful stylesheet language than CSS. However, it is also comparatively difficult to learn.
XSL is divided into two complementary parts, XSL Transformations (XSLT) and XSL Formatting Objects (XSL-FO). Unlike CSS, both XSLT and XSL-FO are XML applications. XSLT stylesheets and XSL-FO documents are well-formed XML documents.
XSLT is a Turing-complete functional programming language designed specifically for describing transformations from one XML format into another. (A functional programming language is one with no side effects, in which the order of evaluation of statements makes no difference to the final result. The classic examples of functional languages are Scheme and Lisp.) An XSLT processor reads an XML document and an XSLT stylesheet, and transforms the document according to the instructions found in the stylesheet. It then outputs the transformed document. XSLT is designed for XML-to-XML transformations, but it can transform XML to HTML, XML to plain text, or XML to any other text format such as TeX or troff.
XSL-FO is an XML application that describes the layout of the various elements on a printed page. It has elements that represent paragraphs, list items, margins, and so forth. An XSL-FO processor converts an XSL-FO document into some other format that can be printed or viewed such as PDF, TeX, SVG, or plain text. This book was actually written in an XML application called DocBook. An XSLT stylesheet converted the source DocBook files into an XSL-FO document, which was then further processed to produce the PDF document that was sent to the printer.
XSLT is based on the notion of templates. The XSLT processor compares nodes in an input document tree to templates in the stylesheet. When it finds a match, it follows the instructions in that template. These instructions can include XML to output, details about what to copy from the input into the output, and directions for which nodes to process next.
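The matching process can be imitated in a few lines of ordinary code. The Python sketch below is only an analogy, not an XSLT processor: templates are plain functions keyed by element name, the fallback rule is loosely modeled on XSLT’s built-in templates, and the names apply_templates and TEMPLATES as well as the abbreviated order document are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def apply_templates(node, templates):
    """Run the template that matches this node, or a default rule."""
    template = templates.get(node.tag)
    if template is not None:
        return template(node, templates)
    # Default rule: copy the node's text and keep processing its children.
    parts = [node.text or ""]
    for child in node:
        parts.append(apply_templates(child, templates))
        parts.append(child.tail or "")
    return "".join(parts)


def customer(node, templates):
    # Literal output plus content copied from the input.
    return "Ship to: " + (node.text or "").strip() + "\n"


def product(node, templates):
    # The template decides which nodes are processed next and in what order.
    name = apply_templates(node.find("Name"), templates)
    quantity = apply_templates(node.find("Quantity"), templates)
    return f"{quantity} x {name}\n"


def tax(node, templates):
    return f"{node.tag}: ${node.text}\n"


TEMPLATES = {"Customer": customer, "Product": product, "Tax": tax}

# An abbreviated, made-up order document.
order = ET.fromstring(
    "<Order><Customer>Chez Fred</Customer>"
    "<Product><Name>Birdsong Clock</Name><Quantity>12</Quantity>"
    "<Price>21.95</Price></Product>"
    "<Tax>28.20</Tax></Order>"
)
print(apply_templates(order, TEMPLATES), end="")
```

Order has no template of its own, so the default rule walks its children. The product template emits the quantity before the name and never asks for Price, which is the kind of reordering and omission that CSS cannot express.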
Example 1.12 is an XSLT stylesheet that describes how order documents can be converted into XSL Formatting Objects. This particular stylesheet formats the products as a table, and the rest of the elements as paragraphs. The products are written in a 12-point font while the rest of the document uses 16-point fonts.
Example 1.12. An XSLT stylesheet for order documents
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Try to make the output look half decent -->
  <xsl:output indent="yes"/>

  <xsl:template match="Order">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="only">
          <fo:region-body margin-left="0.5in" margin-top="0.5in"/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="only">
        <fo:flow flow-name="xsl-region-body">
          <xsl:apply-templates select="Customer"/>
          <xsl:apply-templates select="ShipTo"/>
          <fo:table font-size="12pt" space-before="24pt" space-after="24pt">
            <fo:table-column column-width="2in"/>
            <fo:table-column column-width="1in"/>
            <fo:table-column column-width="1in"/>
            <fo:table-column column-width="1in"/>
            <fo:table-body>
              <fo:table-row font-weight="bold">
                <fo:table-cell>
                  <fo:block>Product</fo:block>
                </fo:table-cell>
                <fo:table-cell>
                  <fo:block>Quantity</fo:block>
                </fo:table-cell>
                <fo:table-cell>
                  <fo:block>Unit Price</fo:block>
                </fo:table-cell>
                <fo:table-cell>
                  <fo:block>Subtotal</fo:block>
                </fo:table-cell>
              </fo:table-row>
              <xsl:apply-templates select="Product"/>
            </fo:table-body>
          </fo:table>
          <xsl:apply-templates select="Tax"/>
          <xsl:apply-templates select="Shipping"/>
          <xsl:apply-templates select="Total"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <xsl:template match="Customer">
    <fo:block font-size="16pt" font-family="serif" line-height="20pt">
      Ship to:
    </fo:block>
    <fo:block font-size="16pt" font-family="serif"
              margin-left="0.5in" line-height="20pt">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <xsl:template match="ShipTo">
    <fo:block font-size="16pt" font-family="sans-serif" line-height="18pt"
              margin-top="20pt" margin-left="0.5in">
      <xsl:apply-templates select="Street"/>
    </fo:block>
    <fo:block font-size="16pt" font-family="sans-serif" line-height="18pt"
              margin-left="0.5in">
      <xsl:apply-templates select="City"/>  <xsl:apply-templates select="State"/>  
      <xsl:apply-templates select="Zip"/>
    </fo:block>
  </xsl:template>

  <xsl:template match="Product">
    <fo:table-row>
      <fo:table-cell>
        <fo:block><xsl:value-of select="Name"/></fo:block>
      </fo:table-cell>
      <fo:table-cell>
        <fo:block><xsl:value-of select="Quantity"/></fo:block>
      </fo:table-cell>
      <fo:table-cell>
        <fo:block>$<xsl:value-of select="Price"/></fo:block>
      </fo:table-cell>
      <fo:table-cell>
        <fo:block> $<xsl:value-of select="Price*Quantity"/> </fo:block>
      </fo:table-cell>
    </fo:table-row>
  </xsl:template>

  <xsl:template match="Tax|Shipping|Total">
    <fo:block font-size="16pt" font-family="serif" line-height="20pt">
      <xsl:value-of select="name()"/>: $<xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <!-- want to leave this one out of the output -->
  <xsl:template match="SKU"/>

</xsl:stylesheet>
The output vocabulary used in this stylesheet is XSL Formatting Objects. The actual document produced by transforming Example 1.5 with this style sheet is shown in Example 1.13 (with a few allowances for white space). This document would then be fed into an XSL-FO processor such as the Apache XML Project’s open source FOP to convert it to some other format, in this case PDF.
Example 1.13. An XSL-FO document for the clock order
<?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="only">
      <fo:region-body margin-left="0.5in" margin-top="0.5in"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="only">
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-size="16pt" font-family="serif" line-height="20pt">
        Ship to:
      </fo:block>
      <fo:block font-size="16pt" font-family="serif" margin-left="0.5in" line-height="20pt">Chez Fred</fo:block>
      <fo:table font-size="12pt" space-before="24pt" space-after="24pt">
        <fo:table-column column-width="2in"/>
        <fo:table-column column-width="1in"/>
        <fo:table-column column-width="1in"/>
        <fo:table-column column-width="1in"/>
        <fo:table-body>
          <fo:table-row font-weight="bold">
            <fo:table-cell>
              <fo:block>Product</fo:block>
            </fo:table-cell>
            <fo:table-cell>
              <fo:block>Quantity</fo:block>
            </fo:table-cell>
            <fo:table-cell>
              <fo:block>Unit Price</fo:block>
            </fo:table-cell>
            <fo:table-cell>
              <fo:block>Subtotal</fo:block>
            </fo:table-cell>
          </fo:table-row>
          <fo:table-row>
            <fo:table-cell>
              <fo:block>Birdsong Clock</fo:block>
            </fo:table-cell>
            <fo:table-cell>
              <fo:block>12</fo:block>
            </fo:table-cell>
            <fo:table-cell>
              <fo:block>$21.95</fo:block>
            </fo:table-cell>
            <fo:table-cell>
              <fo:block> $263.4</fo:block>
            </fo:table-cell>
          </fo:table-row>
          <fo:table-row>
            <fo:table-cell>
              <fo:block>Brass Ship's Bell</fo:block>
            </fo:table-cell>
            <fo:table-cell>
              <fo:block>1</fo:block>
            </fo:table-cell>
            <fo:table-cell>
              <fo:block>$144.95</fo:block>
            </fo:table-cell>
            <fo:table-cell>
              <fo:block> $144.95</fo:block>
            </fo:table-cell>
          </fo:table-row>
        </fo:table-body>
      </fo:table>
      <fo:block font-size="16pt" font-family="serif" line-height="20pt">Tax: $28.20</fo:block>
      <fo:block font-size="16pt" font-family="serif" line-height="20pt">Shipping: $8.95</fo:block>
      <fo:block font-size="16pt" font-family="serif" line-height="20pt">Total: $431.00</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
Since this book focuses more on using XML for computer-to-computer communication than on computer-to-human or human-to-human communication, I’m not going to spend very many pages on stylesheets. Indeed, this is the last time you’re going to see them for the next fifteen chapters. However, you should at least be aware that XSLT is an extremely powerful approach for certain kinds of problems. Once you’ve gained a little facility with XSLT, you’ll notice that it is often easier to write an XSLT stylesheet to solve a problem than a classic procedural program in Java. Even when XSLT can’t do everything you need, you may find that it can solve a large part of your problem. XML systems are often designed as a chain of steps: first validation using a DTD or schema, then transformation using XSLT, and finally whatever custom processing you wish to apply using Java. Many apparently hard problems become much simpler and easier to tackle when broken down in this fashion. I’ll explore this further when we return to XSLT in Chapter 17.
Copyright 2001, 2002 Elliotte Rusty Harold | elharo@metalab.unc.edu | Last Modified July 28, 2001