Processing XML with Java
Please turn off all
Where we're going
Processing XML with Java is easy
Prerequisites
Parser APIs
Part I: XML Infoset
A simple example
Markup and Character Data
Markup and Character Data Example
Entities
Parsed Character Data
CDATA sections
Comments
Processing Instructions
The XML Declaration
Document Type Declaration
Document Type Definition (DTD)
XML Names
Questions?
XML Namespaces
Namespace Syntax
Namespace URIs
Binding Prefixes to Namespace URIs
The Default Namespace
How Parsers Handle Namespaces
Questions?
Three Variations on a Theme
A normal XML document
A canonical XML document
An org.w3c.dom.Document object formed by reading hotcop.xml
Are these three the same thing or not?
What is the XML Infoset?
The Infoset defines 11 kinds of Information Items
The Document Information Item
Element Information Items
Attributes
Comments
A Processing Instruction Information Item Includes:
Characters
Namespace Information Items
Document Type Declaration
Unparsed Entity Information Items
The Infoset Omits:
To Learn More
Questions?
Part II: Writing XML Documents with Java
Unicode
Readers and Writers
A Java program that writes Fibonacci numbers into a text file
fibonacci.txt
A Java program that writes Fibonacci numbers into an XML file
fibonacci.xml
Suppose we want to use a different encoding than UTF-8
fibonacci_8859_1.xml
Suppose you want to include a DTD
valid_fibonacci.xml
Questions?
Converting data to XML
Sample Tab Delimited Data: Baseball Statistics
A Program to convert tab delimited data to XML
Baseball Stats in XML
Converting data to XML while Processing it
Batting Averages in XML
The point is this:
Questions?
To Learn More
Part III: Reading XML Documents with SAX
Reading XML Documents
SAX
SAX Parsers for Java
SAX1
SAX2
The SAX2 Process
Making an XMLReader
Parsing a Document with XMLReader
Sample Output from SAX2Checker
The ContentHandler interface
SAX2 Event Reporter
Event Reporter Output
Questions?
A Sample Application
Goal: Return a list of all the URLs in this list as java.net.URL objects
SAX Design
User Interface Class
ContentHandler Class
Weblogs Output
Features and Properties
Feature/Property SAXExceptions
Required Features
Core Features
Turning on Validation
Three Levels of Errors
The ErrorHandler interface
An ErrorHandler for Reporting Validity Errors
Validating
Core Properties
Nonstandard Features in Xerces
Nonstandard Properties in Xerces
Properties for Extension Handlers
Handling Attributes in SAX2
Attributes Example
Resolving Entities
EntityResolver Example
Questions?
Handling DTDs
DTDHandler Example
TextEntityReplacer
Handling Declarations
The DeclHandler interface:
DTDMerger
Handling Lexical Events
The LexicalHandler interface
LexicalHandler Example
SAXCommentReader Output
The Locator interface
Locator Example
Locator Example
The DefaultHandler class
The NamespaceSupport class
Filtering XML
XMLFilter Example
TextMerger
InputSource
The InputSource class
Example of InputSource
What SAX2 doesn't do
Event Based API Caveats
To Learn More
Questions?
Part IV: DOM, The Document Object Model
Where we're going
Trees
Document Object Model
DOM Evolution
DOM Parsers for Java
Eight Modules:
Which modules and features are supported?
Which modules are supported?
Which modules are supported? Results
DOM Trees
org.w3c.dom
The DOM Process
Parsing documents with a DOM Parser Example
The JAXP Process
Parsing documents with a JAXP DocumentBuilder
The Node Interface
The NodeList Interface
Node Reporter
Node Reporter Output
Node Values as returned by getNodeValue()
The Document Node
The Document Interface
A Sample Application
DOM Design
Weblogs with DOM
Weblogs Output
Element Nodes
The Element Interface
IDTagger
Output from IDTagger
CharacterData interface
The CharacterData Interface
ROT13 XML Text
ROT13 XML Output
Text Nodes
The Text Interface
CDATA section Nodes
The CDATASection Interface
DocumentType Nodes
The DocumentType Interface
Example of the DocumentType Interface
XHTMLValidator
EntityReference Nodes
The EntityReference Interface
Attr Nodes
The Attr Interface
XLinkSpider with DOM
ProcessingInstruction Nodes
The ProcessingInstruction Interface
XLinkSpider that Respects robots processing instruction
Comment Nodes
The Comment Interface
Comment Example
DOMCommentReader Output
Entity Nodes
The Entity Interface
DOMException
Questions?
The org.w3c.dom.traversal Package
NodeIterator
ValueReporter
ValueReporter Output
NodeFilter
DOM based TagStripper
Output from a DOM based TagStripper
Questions?
Writing XML Documents with DOM
org.apache.xerces.dom.DOMImplementationImpl
A Xerces/DOM program that writes Fibonacci numbers into an XML document
A JAXP/DOM program that writes Fibonacci numbers into an XML document
Serialization
A DOM program that writes Fibonacci numbers onto System.out
fibonacci.xml
OutputFormat
Better formatted output
formatted_fibonacci.xml
DOM based XMLPrettyPrinter
Output from a DOM based XMLPrettyPrinter
The point is this:
Questions?
To Learn More
Part V: JDOM
Where we're going
What is JDOM?
About JDOM
JDOM versions
Six packages:
The org.jdom package
The org.jdom.input package
The org.jdom.output package
The org.jdom.filter package
The org.jdom.adapters package
The org.jdom.transform package
Writing XML Documents with JDOM
A JDOM program that writes this XML document
Hello JDOM
Actual Output
Hello DOM
White space is significant
Actual Output
fibonacci.xml
A JDOM program that writes Fibonacci numbers into an XML file
Output
Suppose you want to include a DTD
ValidFibonacci
validfibonacci.xml
Internal DTD Subsets
internalvalidfibonacci.xml
Using Namespaces
Rules for Using Namespaces
With Namespace Prefixes
The Default, Unprefixed Namespace
Rules for Using Default Namespace
With Default Namespace
Converting data to XML
Sample Tab Delimited Data: Baseball Statistics
A Program to convert tab delimited data to XML
Baseball Stats in XML
A Shortcut
Questions?
Converting data to XML while Processing it
Batting Averages in XML
Advantages of JDOM for Writing Documents
Questions?
Reading XML with JDOM
JDOM Compatible Parsers for Java
The Design of the DOM API
The JDOM Process
Parsing a Document with JDOM
Parser Results
Turning on Validation in JDOM
JDOM Validator
Validation Output
Building with DOM instead of SAX
DOMBuilder Example
Weblogs with JDOM
Goal: Return a list of all the URLs in this list as java.net.URL objects
JDOM Design
Weblogs with JDOM
Weblogs Output
The org.jdom Package
The Document Node
The Document Class
Document Example
Output from XMLPrinter
Element Nodes
Element Class Implementation
The Element Class
Element Example: XCount
XCount Output
Handling Attributes in JDOM
The Attribute Class
IDTagger
Before IDTagger
After IDTagger
Handling Entities in JDOM
The EntityRef Class
Handling Comments in JDOM
The Comment Class
Comment Example
CommentReader Output
ProcessingInstruction Nodes
The ProcessingInstruction Class
XLinkSpider that Respects the robots Processing Instruction
Handling Namespaces
The Namespace Class
The Namespace Class
DocType Nodes
The DocType class
Example of the DocType Class
XHTMLValidator
Using the XHTMLValidator
The Verifier Class
The Verifier Class
JDOMException
JDOMException Class
The org.jdom.output Package
Serialization
XMLOutputter
Using the XMLOutputter Class Directly
Using the XMLOutputter Class Indirectly
JDOM based TagStripper
Output from a JDOM based TagStripper
Talking to DOM Programs
Talking to SAX Programs
What JDOM doesn't do
To Learn More
Questions?
Part VI: dom4j
To Learn More
Questions?
Part VII: TrAX
Questions?
Evaluations
To Learn More
Questions?
Cafe con Leche
Copyright 2000-2002 Elliotte Rusty Harold
Last Modified April 19, 2002