Processing XML with Java
Please turn off all
Where we're going
Processing XML with Java is easy
Prerequisites
XML API Styles
Parser APIs
Part I: XML Infoset
A simple example
Markup and Character Data
Markup and Character Data Example
Elements and Tags
Entities
Parsed Character Data
CDATA sections
Comments
Processing Instructions
The XML Declaration
Document Type Declaration
Document Type Definition (DTD)
XML Names
Questions?
XML Namespaces
Namespace Syntax
Namespace URIs
Binding Prefixes to Namespace URIs
The Default Namespace
How Parsers Handle Namespaces
Questions?
Three Variations on a Theme
A normal XML document
A canonical XML document
An org.w3c.dom.Document object formed by reading hotcop.xml
Are these three the same thing or not?
What is the XML Infoset?
A kangaroo is infoset conformant if its collar says, "This kangaroo contains no information items."
The Infoset defines 11 Kinds of Information Items
The Document Information Item
Element Information Items
Attributes
Comments
A Processing Instruction Information Item Includes:
Characters
Namespace Information Items
Document Type Declaration
Unparsed Entity Information Items
The Infoset Omits:
The Five Layers of XML Processing
To Learn More
Questions?
Part II: Writing XML Documents with Java
You don't always need a new API
Unicode
Readers and Writers
A Java program that writes Fibonacci numbers into a text file
fibonacci.txt
A Java program that writes Fibonacci numbers into an XML file
fibonacci.xml
Single quoted attribute values are a little cleaner
Suppose we want to use a different encoding than UTF-8
fibonacci_Latin_1.xml
Suppose you want to include a DTD
valid_fibonacci.xml
Questions?
Converting data to XML
Sample Tab Delimited Data: Baseball Statistics
A Program to convert tab delimited data to XML
Baseball Stats in XML
Converting data to XML while Processing it
Batting Averages in XML
The point is this:
Questions?
To Learn More
Part III: Reading XML Documents with SAX
Reading XML Documents
SAX
SAX Parsers for Java
The Horrors of the CLASSPATH
SAX1
SAX2
The SAX2 Process
Making an XMLReader
Parsing a Document with XMLReader
Sample Output from SAX2Checker
The ContentHandler interface
SAX2 Event Reporter
Event Reporter Output
Questions?
A Sample Application
Goal: Return a list of all the URLs in this list as java.net.URL objects
SAX Design
User Interface Class
ContentHandler Class
Weblogs Output
Questions?
Features and Properties
Feature/Property SAXExceptions
Required Features
Core Features
Turning on Validation
Three Levels of Errors
The ErrorHandler interface
An ErrorHandler for Reporting Validity Errors
Validating
Core Properties
Nonstandard Features in Xerces
Nonstandard Properties in Xerces
Properties for Extension Handlers
Questions?
Handling Attributes in SAX2
Attributes Example
Resolving Entities
EntityResolver Example
Questions?
Handling DTDs
DTDHandler Example
TextEntityReplacer
Handling Declarations
The DeclHandler interface:
DTDMerger
Handling Lexical Events
The LexicalHandler interface
LexicalHandler Example
SAXCommentReader Output
The Locator interface
Locator Example
Locator Example
The DefaultHandler class
The NamespaceSupport class
Filtering XML
XMLFilter Example
TextMerger
InputSource
The InputSource interface
Example of InputSource
What SAX2 doesn't do
Event Based API Caveats
To Learn More
Questions?
Part IV: DOM, The Document Object Model
Where we're going
Trees
Document Object Model
DOM Evolution
DOM Implementations for Java
Eight Modules:
DOM Trees
org.w3c.dom
The DOM Process
Parsing documents with a DOM Parser Example
The JAXP Process
Parsing documents with a JAXP DocumentBuilder
Questions?
The Node Interface
The NodeList Interface
Node Reporter
Node Reporter Output
Node Values as returned by getNodeValue()
The Document Node
The Document Interface
A Sample Application
DOM Design
Weblogs with DOM
Weblogs Output
Questions?
Element Nodes
The Element Interface
IDTagger
Output from IDTagger
CharacterData interface
The CharacterData Interface
ROT13 XML Text
ROT13 XML Output
Text Nodes
The Text Interface
CDATA section Nodes
The CDATASection Interface
DocumentType Nodes
The DocumentType Interface
Example of the DocumentType Interface
XHTMLValidator
EntityReference Nodes
The EntityReference Interface
Attr Nodes
The Attr Interface
XLinkSpider with DOM
ProcessingInstruction Nodes
The ProcessingInstruction Interface
XLinkSpider that Respects the robots Processing Instruction
Comment Nodes
The Comment Interface
Comment Example
DOMCommentReader Output
Entity Nodes
The Entity Interface
DOMException
Questions?
The org.w3c.dom.traversal Package
NodeIterator
ValueReporter
ValueReporter Output
NodeFilter
DOM based TagStripper
Output from a DOM based TagStripper
TreeWalker
Questions?
Writing XML Documents with DOM
The DOMImplementation interface
org.apache.xerces.dom.DOMImplementationImpl
A Xerces/DOM program that writes Fibonacci numbers into an XML document
A JAXP/DOM program that writes Fibonacci numbers into an XML document
Serialization
A DOM program that writes Fibonacci numbers onto System.out
fibonacci.xml
OutputFormat
Better formatted output
formatted_fibonacci.xml
DOM based XMLPrettyPrinter
Output from a DOM based XMLPrettyPrinter
The point is this:
Questions?
To Learn More
Part V: JDOM
Where we're going
What is JDOM?
About JDOM
JDOM versions
Seven packages:
The org.jdom package
The org.jdom.input package
The org.jdom.output package
The org.jdom.filter package
The org.jdom.adapters package
The org.jdom.transform package
The org.jdom.xpath package
Writing XML Documents with JDOM
A JDOM program that writes this XML document
Hello JDOM
Actual Output
Hello DOM
White space is significant
Actual Output
fibonacci.xml
A JDOM program that writes Fibonacci numbers into an XML file
Output
Controlling white space on output
Output
Suppose you want to include a DTD
ValidFibonacci
validfibonacci.xml
Internal DTD Subsets
internalvalidfibonacci.xml
Using Namespaces
Rules for Using Namespaces
With Namespace Prefixes
The Default, Unprefixed Namespace
Rules for Using Default Namespace
With Default Namespace
Converting data to XML
Sample Tab Delimited Data: Baseball Statistics
A Program to convert tab delimited data to XML
Baseball Stats in XML
A Shortcut
Questions?
Converting data to XML while Processing it
Batting Averages in XML
Advantages of JDOM for Writing Documents
Questions?
Reading XML with JDOM
JDOM Compatible Parsers for Java
The JDOM Process
Parsing a Document with JDOM
Parser Results
Turning on Validation in JDOM
JDOM Validator
Validation Output
Weblogs with JDOM
Goal: Return a list of all the URLs in this list as java.net.URL objects
JDOM Design
Weblogs with JDOM
Weblogs Output
The org.jdom Package
The Document Node
The Document Class
Document Example
Output from XMLPrinter
Element Nodes
Element Class Implementation
The Element Class
Element Example: XCount
XCount Output
Handling Attributes in JDOM
The Attribute Class
XLinkSpider with JDOM
IDTagger
Before IDTagger
After IDTagger
Handling Entities in JDOM
The EntityRef Class
Handling Comments in JDOM
The Comment Class
Comment Example
CommentReader Output
ProcessingInstruction Nodes
The ProcessingInstruction Class
XLinkSpider that Respects the robots Processing Instruction
Handling Namespaces
The Namespace Class
DocType Nodes
The DocType class
Example of the DocType Class
XHTMLValidator
Using the XHTMLValidator
The Verifier Class
The Verifier Class
JDOMException
JDOMException Class
The org.jdom.output Package
Serialization
XMLOutputter
Using the XMLOutputter Class Directly
Using the XMLOutputter Class Indirectly
JDOM based TagStripper
Output from a JDOM based TagStripper
Talking to DOM Programs
Talking to SAX Programs
What JDOM doesn't do
To Learn More
Questions?
Part VI: Pull Parsing
To Learn More
Part VII: TrAX
What is TrAX
TrAX Classes
The Process of a TrAX Transformation
TrAX Example
Thread Safety
Locating Transformers
The xml-stylesheet processing instruction
Features
Features Example
Feature Tester Output
XSLT Processor Attributes
URI Resolution
Error Handling
ErrorListener Example
Passing Parameters to Style Sheets
Output Properties
Controlling Output Properties from Java
Sources and Results
DOMSource and DOMResult
SAXSource and SAXResult
StreamSource and StreamResult
To Learn More
Questions?
To Learn More
Questions?
Cafe con Leche
Copyright 2000-2003 Elliotte Rusty Harold
Last Modified March 27, 2003