RELAX: Schemas Don't Have to be Hard


Elliotte Rusty Harold

Software Development 2005 West

Friday, March 18, 2005

elharo@metalab.unc.edu

http://www.cafeconleche.org/


What are Schemas?


About RELAX NG


What's Wrong with DTDs?


What's Wrong with W3C XML Schema Language?


greeting.xml


greeting.rng


Validating the document with Jing

$ jing greeting.rng greeting.xml


An Invalid Document


Checking the Invalid Document

$ jing greeting.rng greeting3.xml
/Users/elharo/Documents/speaking/sd2005west/relaxng/examples/greeting3.xml:3:6: error: unknown element "P"

Notice how completely decoupled validation is from the instance document. We can validate any document against any schema. We do not need to name the schema in the instance document, as we must with DTDs and often must with W3C schemas.


RELAX NG Validators


The compact syntax: greeting.rnc


A More Complex Document


A More Complex Schema



Groups


A More Flexible Schema



A Couple of Notes


Element Content


Nested structures



Sharing Content Models



When Order Doesn't Matter


interleave



Mixed Content


Declaring Mixed Content

...
  <define name="personContent">
    <element name="NAME">
      <interleave>
        <text />
        <element name="GIVEN">
          <text/>
        </element>
        <element name="FAMILY">
          <text/>
        </element>
      </interleave>
    </element>
  </define>
...

personContent =
  element NAME {
    text
    & element GIVEN  { text }
    & element FAMILY { text }
  }
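
To make the pattern concrete, here are two instance fragments it would accept. The sample text ("Mr.", "John", "Smith") is made up for illustration and does not come from the original slides.

```xml
<!-- Valid: text may appear before, between, and after the child elements -->
<NAME>Mr. <GIVEN>John</GIVEN> <FAMILY>Smith</FAMILY></NAME>

<!-- Also valid: interleave allows the children in either order -->
<NAME><FAMILY>Smith</FAMILY>, <GIVEN>John</GIVEN></NAME>
```
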

Choices

A song must have at least one of ARTIST, COMPOSER, or PRODUCER:
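
One way to express this, sketched in the XML syntax, is to wrap a choice in oneOrMore. The SONG wrapper and the TITLE child are assumptions borrowed from the SONG examples elsewhere in this talk, not the original slide's listing.

```xml
<element name="SONG" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="TITLE"><text/></element>
  <!-- One or more of the three elements, in any order and any mix -->
  <oneOrMore>
    <choice>
      <element name="ARTIST"><text/></element>
      <element name="COMPOSER"><text/></element>
      <element name="PRODUCER"><text/></element>
    </choice>
  </oneOrMore>
</element>
```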



Groups

Allow a NAME element to contain either plain text or a GIVEN and a FAMILY but not both:

<element name="NAME">
  <choice>
    <text/>
    <group>
       <interleave>
         <element name="GIVEN">
           <text/>
         </element>
         <element name="FAMILY">
           <text/>
         </element>
       </interleave>
    </group>
  </choice>
</element>
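
The effect of the choice can be seen in a few instance fragments. The sample names are invented for illustration.

```xml
<!-- Valid: plain text only -->
<NAME>John Smith</NAME>

<!-- Valid: GIVEN and FAMILY, in either order because of interleave -->
<NAME><FAMILY>Smith</FAMILY><GIVEN>John</GIVEN></NAME>

<!-- Invalid: text mixed with child elements matches neither branch -->
<NAME>Mr. <GIVEN>John</GIVEN><FAMILY>Smith</FAMILY></NAME>
```
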

A Document with Attributes


Declaring Attributes



Enumerations

The publisher must be one of the oligopoly that controls 90% of U.S. music (Warner-Elektra-Atlantic, Universal Music Group, Sony Music Entertainment, Inc., Capitol Records, Inc., BMG Music)
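
A minimal sketch of such an enumeration, assuming the publisher is carried in a PUBLISHER element as in the SONG schema later in this talk (the original slide may have used an attribute instead). Each value pattern matches one literal string.

```xml
<element name="PUBLISHER" xmlns="http://relaxng.org/ns/structure/1.0">
  <choice>
    <value>Warner-Elektra-Atlantic</value>
    <value>Universal Music Group</value>
    <value>Sony Music Entertainment, Inc.</value>
    <value>Capitol Records, Inc.</value>
    <value>BMG Music</value>
  </choice>
</element>
```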



Namespaces


Default Namespace


A Schema for a Document that uses Namespaces
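
A rough sketch of what such a schema can look like, not the original slide's listing. The namespace URI http://www.example.com/song and the element names are placeholders. The ns attribute is inherited, so the child elements are matched in the same namespace as SONG.

```xml
<element name="SONG" ns="http://www.example.com/song"
         xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- TITLE and ARTIST inherit ns from the enclosing element pattern -->
  <element name="TITLE"><text/></element>
  <oneOrMore>
    <element name="ARTIST"><text/></element>
  </oneOrMore>
</element>
```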



A Prefixed Schema for a Document that uses Namespaces



Namespaces on Attributes


Attaching a namespace to an attribute
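
A hedged sketch, assuming a hypothetical PHOTO element that carries an XLink href attribute. Unlike element names, attribute names do not inherit the ns value from an ancestor, so a namespaced attribute has to state its namespace itself.

```xml
<element name="PHOTO" ns="http://www.example.com/song"
         xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Namespaced attribute: ns is given explicitly on the attribute pattern -->
  <attribute name="href" ns="http://www.w3.org/1999/xlink">
    <text/>
  </attribute>
  <!-- No ns here, so ALT is in no namespace -->
  <attribute name="ALT"><text/></attribute>
  <empty/>
</element>
```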



Data Typing

Consider this document:

<foo>
    <value>45.67</value>
</foo>

What is the type of value?


Possible types

Other interpretations are doubtless possible, and even make sense in particular contexts. There's no guarantee that the string 45.67 in fact represents any particular type.
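
A schema can settle the question by stating the type it expects. The sketch below assumes the W3C XML Schema datatype library and picks decimal; choosing double, float, or string would be equally legitimate for this document.

```xml
<element name="foo" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <element name="value">
    <!-- 45.67 is accepted; something like "forty-five" would be rejected -->
    <data type="decimal"/>
  </element>
</element>
```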


RELAX NG type libraries


data element



Primitive Data Types for Schemas


Numeric Data Types for Schemas

XML Schema Built-In Numeric Simple Types
Name | Type | Examples
float | IEEE 754 32-bit floating point number | -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN
double | IEEE 754 64-bit floating point number | -INF, 1.401E-90, -1E4, -0, 0, 12.78E-2, 12, INF, NaN, 3.4E42
decimal | arbitrary-precision decimal numbers (no exponent notation) | -3.1415292, 0, 7.8, 90200.76
integer | an arbitrarily large or small integer | -500000000000000000000000, -9223372036854775809, -126789, -1, 0, 1, 5, 23, 42, 126789, 9223372036854775808, 456734987324983264987362495809587095720978
nonPositiveInteger | an integer less than or equal to zero | 0, -1, -2, -3, -4, -5, ...
negativeInteger | an integer strictly less than zero | -1, -2, -3, -4, -5, ...
long | an eight-byte two's complement integer such as Java's long type | -9223372036854775808, -12678967543233, -1, 9223372036854775807
int | an integer that can be represented as a four-byte, two's complement number such as Java's int type | -2147483648, -1, 0, 1, 5, 23, 42, 2147483647
short | an integer that can be represented as a two-byte, two's complement number such as Java's short type | -32768, -1, 0, 1, 5, 23, 42, 32767
byte | an integer that can be represented as a one-byte, two's complement number such as Java's byte type | -128, -1, 0, 1, 5, 23, 42, 127
nonNegativeInteger | an integer greater than or equal to zero | 0, 1, 2, 3, 4, 5, ...
unsignedLong | an eight-byte unsigned integer | 0, 1, 2, 3, 4, 5, ... 18446744073709551614, 18446744073709551615
unsignedInt | a four-byte unsigned integer | 0, 1, 2, 3, 4, 5, ... 4294967294, 4294967295
unsignedShort | a two-byte unsigned integer | 0, 1, 2, 3, 4, 5, ... 65534, 65535
unsignedByte | a one-byte unsigned integer | 0, 1, 2, 3, 4, 5, ... 254, 255
positiveInteger | an integer strictly greater than zero | 1, 2, 3, 4, 5, 6, ...

Time Data Types for Schemas

XML Schema Built-In Time Simple Types
Name | Type | Examples
dateTime | a particular moment in Coordinated Universal Time, up to an arbitrarily small fraction of a second | 1999-05-31T13:20:00.000-05:00
gYearMonth | a given month in a given year | 2000-10
gYear | a given year | 2000
gMonthDay | a date in no particular year, or rather in every year | --10-31
gDay | a day in no particular month, or rather in every month | ---31
duration | a length of time, without fixed endpoints, to an arbitrary fraction of a second | P2000Y10M31DT09H32M7.4312S
date | a specific day in history | 2000-10-31
time | a specific time of day, that recurs every day | 14:30:00.000, 09:30:00.000-05:00

XML Data Types for Schemas

XML Schema Built-In XML Simple Types
Name | Type | Examples
ID | XML 1.0 ID attribute type | any XML name that's unique among ID type attributes
IDREF | XML 1.0 IDREF attribute type | any XML name that's used as an ID type attribute elsewhere in the document
ENTITY | XML 1.0 ENTITY attribute type | any XML name that's declared as an unparsed entity in the DTD
NOTATION | ???? | ????
language | permissible values for xml:lang as defined in XML 1.0 | en-GB, en-US, fr
IDREFS | XML 1.0 IDREFS attribute type | a white space separated list of IDREF names
ENTITIES | XML 1.0 ENTITIES attribute type | a white space separated list of ENTITY names
NMTOKEN | XML 1.0 NMTOKEN attribute type | 12, are, you, ready
NMTOKENS | XML 1.0 NMTOKENS attribute type | a white space separated list of name tokens
Name | an XML 1.0 Name | set, title, rdf, math, math123, href
QName | an optionally prefixed, namespace qualified name | song:title
NCName | a local name without any colons | title

Assorted Data Types for Schemas

XML Schema Built-In Simple Types
Name | Type | Examples
string | Parsed Character Data; #PCDATA | Hot Cop
normalizedString | a string whose normalized value does not contain any tabs, carriage returns, or linefeeds | PIC1, PIC2, PIC3, cow_movie, MonaLisa, Hello World , Warhol, red green
token | a string whose normalized value has no leading or trailing white space, no tabs, no linefeeds, and not more than one consecutive space | p1 p2, ss123 45 6789, _92, red, green, NT Decl, seventeen; p1, p2, 123 45 6789, ^*&^*&_92, red green blue, NT-Decl, seventeen; Mary had a little lamb, The love of money is the root of all Evil.
boolean | C++'s bool type | true, false, 1, 0
anyURI | relative or absolute URI | http://www.w3.org/TR/2000/WD-xmlschema-2-20000407/#duration, /javafaq/reports/JCE1.2.1.html
hexBinary | arbitrary binary data encoded in hexadecimal form | A4E345EC54CC8D52198000FFEA6C
base64Binary | arbitrary binary data encoded in Base64 | 6jKpNnmkkWeArsn5Oeeg2njcz+nXdk0f9kZI892ddlR8Lg1aMhPeFTYuoq3I6neFlb BjWzuktNZKiXYBfKsSTB8U09dTiJo2ir3HJuY7eW/p89osKMfixPQsp9vQMgzph6Qa lY7j4MB7y5ROJYsTr1/fFwmj/yhkHwpbpzed1LE=

Restricting types


Facets


Length Facets: length, minLength, maxLength
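
In RELAX NG, facets are passed as param children of a data pattern. A minimal sketch, assuming the W3C XML Schema datatype library and a TITLE element limited to between 1 and 255 characters (the limits are arbitrary, chosen only for illustration):

```xml
<element name="TITLE" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="string">
    <param name="minLength">1</param>
    <param name="maxLength">255</param>
  </data>
</element>
```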


Facets for ordered items: minExclusive, maxExclusive, minInclusive, maxInclusive
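
A small sketch of a range restriction, assuming a hypothetical RATING element that must hold an integer from 1 through 5. The element name and bounds are not from the original slides.

```xml
<element name="RATING" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="integer">
    <!-- Inclusive bounds: 1 and 5 are both allowed; 0 and 6 are not -->
    <param name="minInclusive">1</param>
    <param name="maxInclusive">5</param>
  </data>
</element>
```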


Facets for decimal numbers: totalDigits and fractionDigits
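
As a sketch, a hypothetical PRICE element could be held to at most six digits overall, with no more than two after the decimal point. The specific limits are assumptions for illustration.

```xml
<element name="PRICE" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="decimal">
    <!-- 12.99 and 1234.56 pass; 12.999 and 12345.67 do not -->
    <param name="totalDigits">6</param>
    <param name="fractionDigits">2</param>
  </data>
</element>
```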


Adding a Price


The pattern facet
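
A hedged sketch of a pattern facet, assuming a PRICE element that should look like a currency symbol followed by digits, a period, and exactly two more digits. The regular expression is an illustration, not necessarily the one used in the talk. XML Schema patterns always match the whole value, so no ^ or $ anchors are needed.

```xml
<element name="PRICE" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="string">
    <!-- \p{Sc} is the Unicode currency-symbol category; matches e.g. $12.99 -->
    <param name="pattern">\p{Sc}[0-9]+\.[0-9]{2}</param>
  </data>
</element>
```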


Regular Expressions


The Price Schema



Lists

Allow multiple years in the YEAR element:

<element name="YEAR">
  <list>
    <oneOrMore>
      <data type="gYear"/>
    </oneOrMore>
  </list>
</element>

element SONG {
  element TITLE     { text },
  element COMPOSER  { xsd:string }+,
  element PRODUCER  { xsd:string }*,
  element PUBLISHER { xsd:string }?,
  element LENGTH    { xsd:string }?,
  element YEAR {
    list { xsd:gYear+ }
  }?,
  element ARTIST    { xsd:string }+
}
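
For illustration, here is how instance fragments fare against the YEAR pattern; the years themselves are made-up sample data.

```xml
<!-- Valid: a whitespace-separated list of one or more gYear values -->
<YEAR>1978 1979 1983</YEAR>

<!-- Invalid: "1978," is not a gYear, because list items are split on white space only -->
<YEAR>1978, 1979</YEAR>
```
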

User-defined type libraries


Prime Datatype


Prime Datatype Library


Prime Datatype Library Factory


Locating type libraries


An invalid document


A Prime schema



include element



externalRef element


Pointing to the XHTML schema



Annotations



RELAX NG in Java 1.5/JAXP 1.3


Other features


What RELAX NG doesn't do

* Can be added


RELAX NG is being used for


Trang

$ trang http://www.w3.org/Graphics/SVG/1.2/rng/Full-1.2/Full-1.2.rng full.xsd
http://www.w3.org/Graphics/SVG/1.2/rng/Tiny-1.2/tiny-structure.rng:401:13: warning: choice between attributes and children cannot be represented; approximating
http://www.w3.org/Graphics/SVG/1.2/rng/Tiny-1.2/tiny-structure.rng:420:13: warning: choice between attributes and children cannot be represented; approximating
http://www.w3.org/Graphics/SVG/1.2/rng/Tiny-1.2/tiny-structure.rng:438:13: warning: choice between attributes and children cannot be represented; approximating
http://www.w3.org/Graphics/SVG/1.2/rng/Full-1.2/structure.rng:24:13: warning: choice between attributes and children cannot be represented; approximating
http://www.w3.org/Graphics/SVG/1.2/rng/Tiny-1.2/script.rng:38:13: warning: choice between attributes and children cannot be represented; approximating
http://www.w3.org/Graphics/SVG/1.2/rng/Tiny-1.2/tiny-flow.rng:22:15: warning: cannot represent an optional group of attributes; approximating
http://www.w3.org/Graphics/SVG/1.2/rng/Tiny-1.2/handler.rng:44:13: warning: choice between attributes and children cannot be represented; approximating
http://www.w3.org/Graphics/SVG/1.2/rng/Full-1.2/style.rng:81:13: warning: choice between attributes and children cannot be represented; approximating

To Learn More


Copyright 2005 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified March 18, 2005