The Mozilla Project has released Firefox 2.0.0.15. This release fixes security issues. All 2.x users should upgrade. (That includes me: Firefox 3 broke some of the AppleScript I depend on to manage this site.)
A new version of SeaMonkey has also been posted, though Camino doesn't seem to have been updated yet. Camino users may want to switch to Firefox or Safari for the time being.
Michael Kay has released version 9.1 of Saxon, his XSLT 2.0 and XQuery processor for Java and .NET. According to Kay,
For the XSLT user the most interesting developments are probably in the area of streaming, allowing large documents to be processed without constructing a complete tree in memory: see
http://www.saxonica.com/documentation/sourcedocs/serial.html
The saxon:stream() extension function is essentially a repackaging of the existing <xsl:copy-of saxon:read-once> instruction, but it becomes a lot more versatile with the new syntax; in addition, a wider class of XPath expressions can now be streamed. Apart from this syntactic change, there are two other significant enhancements:
- the saxon:iterate extension instruction allows "stateful" streamed processing where the processing of an element in the document can depend on data that was seen earlier in the stream. This was not previously possible.
- operations that only need to see data near the start of the document will cause the XML parsing to terminate as soon as the required data is available. So you can get the title of a document without parsing the whole document.
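The early-exit behavior Kay describes is easy to picture outside of XSLT. The following is only a rough analogy in Python using the standard library's incremental parser, not Saxon's API or implementation; the function name first_title and the sample document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def first_title(source):
    """Return the text of the first <title> element in source.

    The loop returns as soon as that element has been parsed, so the
    rest of the input is never handed to the parser by this function.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "title":
            return elem.text
    return None


# Hypothetical document: a title near the start, then a long body.
SAMPLE = (
    b"<doc><title>News</title>"
    + b"".join(b"<item>%d</item>" % i for i in range(10000))
    + b"</doc>"
)

print(first_title(io.BytesIO(SAMPLE)))
```

The point is the same as in the quote: the caller gets the title without a complete tree for the document ever being built.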
Saxon is published in two versions, both of which require Java 1.4 or later (or .NET). Saxon 9.0B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 9.0 SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standing XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers."
SyncroSoft has released <Oxygen/> 9.3, a $345 payware XML editor written in Java. Oxygen supports XML, XSL, DTDs, XQuery, SVG, Relax NG, Schematron, and the W3C XML Schema Language. According to the announcement, 9.3 adds support for OOXML, ODF, and other ZIP-wrapped XML packages.
The W3C POWDER Working Group has published a new working draft of Protocol for Web Description Resources (POWDER): Description Resources.
The Protocol for Web Description Resources (POWDER) facilitates the publication of descriptions of multiple resources such as all those available from a Web site. These descriptions are always attributed to a named individual, organization or entity that may or may not be the creator of the described resources. This contrasts with more usual metadata that typically applies to a single resource, such as a specific document's title, which is usually provided by its author.
This document sets out how Description Resources (DRs) can be created and published, how to link to DRs from other online resources, and, crucially, how DRs may be authenticated and trusted. The aim is to provide a platform through which opinions, claims and assertions about online resources can be expressed by people and exchanged by machines. POWDER has evolved from the data model developed for the final report [XGR] of the Web Content Label Incubator Group [WCL-XG] from which we define a Description Resource as: "a resource that contains a description, a definition of the scope of the description and assertions about both the circumstances of its own creation and the entity that created it."
Microsoft has released the Office Open XML File Format Converter for Microsoft Office 2004 for the Mac. This updater enables Mac Office 2004 to open and save files in the OOXML format supported by Microsoft Office 2007 for Windows and Microsoft Office 2008 for Mac.
Personally, I uninstalled Microsoft Office 2004 from my Mac about a month ago due to massive instability and hangs, and haven't missed it. I tried updating to the latest point release first, but there are about a dozen different updaters that have to be downloaded and applied in a specific order, and who has time for that? Instead, I've been using Google Docs for the limited number of Word docs I need to read. I suppose if I were writing another book, I might have to reinstall Word, but short of that I just don't see the need.
The W3C Semantic Web Best Practices and Deployment Working Group and HTML Working Groups have published a new working draft of RDFa Primer 1.0 and a candidate recommendation of RDFa in XHTML: Syntax and Processing.
The modern Web is made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience: an event on a web page can be directly imported into a user's desktop calendar; a license on a document can be detected so that users can be informed of their rights automatically; a photo's creator, camera setting information, resolution, location and topic can be published as easily as the original photo itself, enabling structured search and sharing.
RDFa is a specification for attributes to be used with languages such as HTML and XHTML to express structured data. The rendered, hypertext data of XHTML is reused by the RDFa markup, so that publishers don't need to repeat significant data in the document content. This document only specifies the use of the RDFa attributes with XHTML. The underlying abstract representation is RDF [RDF-PRIMER], which lets publishers build their own vocabulary, extend others, and evolve their vocabulary with maximal interoperability over time. The expressed structure is closely tied to the data, so that rendered data can be copied and pasted along with its relevant structure.
The rules for interpreting the data are generic, so that there is no need for different rules for different formats; this allows authors and publishers of data to define their own formats without having to update software, register formats via a central authority, or worry that two formats may interfere with each other.
RDFa shares some use cases with microformats [MICROFORMATS]. Whereas microformats specify both a syntax for embedding structured data into HTML documents and a vocabulary of specific terms for each microformat, RDFa specifies only a syntax and relies on independent specification of terms (often called vocabularies or taxonomies) by others. RDFa allows terms from multiple independently-developed vocabularies to be freely intermixed and is designed such that the language can be parsed without knowledge of the specific term vocabulary being used.
This document is a detailed syntax specification for RDFa, aimed at:
- those looking to create an RDFa parser, and who therefore need a detailed description of the parsing rules;
- those looking to recommend the use of RDFa within their organisation, and who would like to create some guidelines for their users;
- anyone familiar with RDF, and who wants to understand more about what is happening 'under the hood', when an RDFa parser runs.
For those looking for an introduction to the use of RDFa and some real-world examples, please consult the RDFa Primer.
Here's a syntax example from the primer draft:
<div about="/posts/trouble_with_bob">
<h2 property="dc:title">The trouble with Bob</h2>
The trouble with Bob is that he takes much better photos than I do:
<div about="http://example.com/bob/photos/sunset.jpg">
<img src="http://example.com/bob/photos/sunset.jpg" />
<span property="dc:title">Beautiful Sunset</span>
by <span property="dc:creator">Bob</span>.
</div>
</div>
The thing that jumps out at me is the use of namespace prefixes in attribute values. Haven't we learned by now that this is a bad idea?
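To make the prefix problem concrete, here is a toy sketch in Python of what a consumer has to do with the primer snippet: track the current about subject and expand each dc: prefix against a binding that lives somewhere else in the document. The prefix map is an assumption on my part (the excerpt does not show its xmlns:dc declaration), the class name is made up, and this is nowhere near a conforming RDFa parser: it ignores rel, href, content, datatypes, and base URIs.

```python
from html.parser import HTMLParser

# Assumed binding: the excerpt never shows the xmlns:dc declaration.
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}

SNIPPET = """
<div about="/posts/trouble_with_bob">
<h2 property="dc:title">The trouble with Bob</h2>
The trouble with Bob is that he takes much better photos than I do:
<div about="http://example.com/bob/photos/sunset.jpg">
<img src="http://example.com/bob/photos/sunset.jpg" />
<span property="dc:title">Beautiful Sunset</span>
by <span property="dc:creator">Bob</span>.
</div>
</div>
"""


class ToyRDFaExtractor(HTMLParser):
    """Collects (subject, predicate, literal) triples from about/property."""

    def __init__(self):
        super().__init__()
        self.subjects = [""]   # stack of subjects; "" stands for the page
        self.open_tags = []    # (pushed_subject, predicate) per open element
        self.collecting = False
        self.text = []
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        pushed = "about" in attrs
        if pushed:
            self.subjects.append(attrs["about"])
        predicate = None
        if "property" in attrs:
            # The prefix means nothing without the out-of-band binding.
            prefix, _, local = attrs["property"].partition(":")
            predicate = PREFIXES[prefix] + local
            self.collecting = True
            self.text = []
        self.open_tags.append((pushed, predicate))

    def handle_data(self, data):
        if self.collecting:
            self.text.append(data)

    def handle_endtag(self, tag):
        pushed, predicate = self.open_tags.pop()
        if predicate is not None:
            literal = "".join(self.text).strip()
            self.triples.append((self.subjects[-1], predicate, literal))
            self.collecting = False
        if pushed:
            self.subjects.pop()


extractor = ToyRDFaExtractor()
extractor.feed(SNIPPET)
for triple in extractor.triples:
    print(triple)
```

Copy the span elements into a page that binds dc to something else, or to nothing, and the same markup means something different or nothing at all.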
I'm actually designing a significant metadata system at my day job at the moment, and for the life of me I can't figure out why we should use RDF in any shape or form. It doesn't offer clients any useful tools, and just makes the data more opaque. Most of the interesting meta-things we want to say will have to be hand-coded anyway because there are no standards for them. I think we're going to go with a hand-rolled XML syntax as the simplest thing that could possibly work. If anyone asks for RDF, we can always publish a GRDDL or XSLT transform; but RDF just seems pointless.
Adobe has released Acrobat 9. You can now embed movies in Acrobat documents. I wonder if this version will include an actually working Firefox plugin and a Reader that doesn't crash on the second page of every document? Hmm, apparently the answer is no. The reader is still the unusably buggy 8.1.2.
The W3C Web Application Formats Working Group has posted the last call working draft of Widgets 1.0 Requirements. "A widget is an interactive single purpose application for displaying and/or updating local data or data on the Web, packaged in a way to allow a single download and installation on a user's machine or mobile device. A widget may run as a stand alone application (meaning it can run outside of a Web browser), or may be embedded into a Web document. In this document, the runtime environment on which a widget is run is referred to as a widget user agent and a running widget is referred to as an instantiated widget. Prior to instantiation, a widget exists as a widget resource."
The W3C has published a note on A Prototype Knowledge Base for the Life Sciences. "The prototype we describe is a biomedical knowledge base, constructed for a demonstration at Banff WWW2007, that integrates 15 distinct data sources using currently available Semantic Web technologies such as the W3C standard Web Ontology Language [OWL] and Resource Description Framework [RDF]. This report outlines which resources were integrated, how the knowledge base was constructed using free and open source triple store technology, how it can be queried using the W3C Recommended RDF query language SPARQL [SPARQL], and what resources and inferences are involved in answering complex queries. While the utility of the knowledge base is illustrated by identifying a set of genes involved in Alzheimer's Disease, the approach described here can be applied to any use case that integrates data from multiple domains."
Just in case you missed the sirens and flashing lights, Firefox 3 is now out. Download it. Use it. Love it. (Some older extensions may be incompatible. Downloader assumes all liability. Contents may be hot. Do not use while driving. Free ice cream offer void in Louisiana. Not responsible for typographical errors, or pretty much anything else.)
The W3C has posted a new working draft of HTML 5. "This specification defines the 5th major revision of the core language of the World Wide Web: the Hypertext Markup Language (HTML). In this version, new features are introduced to help Web application authors, new elements are introduced based on research into prevailing authoring practices, and special attention has been given to defining clear conformance criteria for user agents in an effort to improve interoperability." There are also drafts of HTML 5 differences from HTML 4 and HTML 5 Publication Notes. The latter contains a convenient list of changes since the January 22 draft:
- Implementation and authoring details around the ping attribute have changed.
- <meta http-equiv=content-type> is now a conforming way to set the character encoding.
- API for the canvas element has been cleaned up. Text support has been added.
- globalStorage is now restricted to the same-origin policy and renamed to localStorage. Related event dispatching has been clarified.
- postMessage() API changed. Only the origin of the message is exposed, no longer the URI. It also requires a second argument that indicates the origin of the target document.
- Drag and drop API has got clarification. The dataTransfer object now has a types attribute indicating the type of data being transferred.
- The m element is now called mark.
- Server-sent events has changed and gotten clarification. It uses a new format so that older implementations are not broken.
- The figure element no longer requires a caption.
- The ol element has a new reversed attribute.
- Character encoding detection has changed in response to feedback.
- Various changes have been made to the HTML parser section in response to implementation feedback.
- Various changes to the editing section have been made, including adding queryCommandEnabled() and related methods.
- The headers attribute has been added for td elements.
- The table element has a new createTBody() method.
- MathML support has been added to the HTML parser section. (SVG support is still awaiting input from the SVG WG.)
- Author defined attributes have been added. Authors can add attributes to elements in the form of data-name and can access these through the DOM using dataset[name] on the element in question.
- The q element has changed to require punctuation inside rather than having the browser render it.
- The target attribute can now have the value _blank.
- The showModalDialog API has been added.
- The document.domain API has been defined.
- The source element now has a new pixelratio attribute useful for videos that have some kind of encoding error.
- bufferedBytes, totalBytes and bufferingThrottled DOM attributes have been added to the video element.
- Media begin event has been renamed to loadstart for consistency with the Progress Events specification.
- charset attribute has been added to script.
- The iframe element has gained the sandbox and seamless attributes which provide sandboxing functionality.
- The ruby, rt and rp elements have been added to support ruby annotation.
- A showNotification() method has been added to show notification messages to the user.
- Support for beforeprint and afterprint events has been added.
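Of the items in that list, the author-defined attributes are the easiest to picture. Here is a small Python sketch that reads the sentence literally: every attribute named data-name shows up under the key name. It runs over markup with the standard library parser rather than a browser DOM, and the class name and sample element are made up, so treat it as an illustration of the naming rule only.

```python
from html.parser import HTMLParser

PREFIX = "data-"


class DatasetCollector(HTMLParser):
    """Gathers data-* attributes per element, keyed by the part after data-."""

    def __init__(self):
        super().__init__()
        self.datasets = []  # list of (tag, {name: value}) pairs

    def handle_starttag(self, tag, attrs):
        dataset = {
            name[len(PREFIX):]: value
            for name, value in attrs
            if name.startswith(PREFIX)
        }
        if dataset:
            self.datasets.append((tag, dataset))


# Hypothetical element carrying two author-defined attributes.
collector = DatasetCollector()
collector.feed('<li class="item" data-sku="a1" data-price="12">Widget</li>')
print(collector.datasets)
```

The ordinary class attribute is left alone; only the data- names are picked up.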
The W3C XHTML2 Working Group has published proposed recommendations of XHTML Modularization 1.1 and XHTML Basic 1.1. "The former provides a means for subsetting and extending XHTML, a feature needed for extending XHTML's reach onto emerging platforms. This specification is intended for use by language designers as they construct new XHTML Family Markup Languages. This second version of this specification includes several minor updates to provide clarifications and address errors found in the first version. It also provides an implementation using XML Schemas. This version of XHTML Basic, which uses the Modularization approach, has been brought into alignment with the widely deployed XHTML Mobile Profile from the Open Mobile Alliance (OMA). XHTML Basic 1.1 will thus make it easier to author Web pages that work on millions of mobile handsets. Comments on these specifications are welcome through 15 July."
Opera Software has released version 9.5 of their namesake free-beer web browser for Windows, Mac, Linux, FreeBSD, and Solaris. This release is supposed to be much faster than previous versions.
I wonder if they've deuglified it yet? Hmm, looks like they tried, but it didn't quite take. They may have hired a real artist to draw the buttons and the icons for the first time, because those are looking good in isolation. However, I'd guess they didn't hire a professional user interface designer to put them all together. The fonts are still wrong (the ones in the UI widgets, that is, not the ones in the web page) and the alignment of various components is way off. This is a frequent problem with cross-platform apps, but Firefox has done a good job with this for years now, so it's certainly possible to get this right. There seem to be multiple other user interface glitches, like a close button (white X on a red background in a widgets pane) that doesn't seem to actually close anything.
Opera may be the fastest browser on the planet, but if it is, I'll never know because it's just too damned ugly to look at for any length of time. Opera should split out the core rendering engine (which isn't bad) from the UI, so someone else can wrap some decent chrome around it. Right now, Opera is like putting a 3900 HF VVT engine in an AMC Pacer body and slapping a fresh coat of paint over it.
The OpenOffice Project has released OpenOffice 2.4.1, an open source office suite for Linux, Solaris, and Windows that saves all its files as zipped XML. 2.4.1 is a bug fix release.
I deleted Microsoft Office from my MacBook a couple of weeks ago, because of severe bugs (startup took forever). I haven't had any trouble living without it, but then I'm no longer a full-time writer. For now I'm just using Google Docs as my replacement. OpenOffice is still too ugly to tolerate.
The W3C XML Security Specifications Maintenance Working Group has published the second edition of XML Signature Syntax and Processing (Second Edition). "This Second Edition of XML Signature Syntax and Processing adds Canonical XML 1.1 as a required canonicalization algorithm and recommends its use for inclusive canonicalization. This version of Canonical XML enables use of xml:id and xml:base Recommendations with XML Signature and also enables other possible future attributes in the XML namespace. Additional minor changes, including the incorporation of known errata, are documented in Changes in XML Signature Syntax and Processing (Second Edition)."
Friday is the last day to submit late-breaking news for Balisage this August in Montreal. "Balisage is a peer-reviewed conference designed to meet the needs of markup theoreticians and practitioners who are pushing the boundaries of the field. It's all about the markup: how to create it; what it means; hierarchies and overlap; modeling; taxonomies; transformation; query, searching, and retrieval; presentation and accessibility; making systems that make markup dance (or dance faster to a different tune in a smaller space) — in short, changing the world and the web through the power of marked-up information. It's an XML Conference. It's an XSL Conference. It's a conference about XSD, XQuery, RDF, UBL, SGML, LMNL, XSL-FO, XTM, SVG, MathML, OWL, TexMECS, RNG, and a lot more. We welcome papers about topic maps, document modeling, markup of overlapping structures, ontologies, metadata, content management, and other markup-related topics at Balisage."
The Mozilla Project has posted the second release candidate of Firefox 3.0 for Mac, Linux, and Windows. Firefox 3 is based on the much improved Gecko 1.9 Web rendering platform. Mostly this release focuses on small user interface improvements, tightened security, and improved performance and under-the-hood architecture, rather than big new features. Still, there are a few new features including:
The W3C HTML Working Group has published a note on Offline Web Applications. "HTML 5 contains several features that address the challenge of building Web applications that work while offline. This document highlights these features (SQL, offline application caching APIs as well as online/offline events, status, and the localStorage API) from HTML 5 and provides brief tutorials on how these features might be used to create Web applications that work offline."
The XML Apache Project has posted version 0.95 of FOP, an open source XSL Formatting Objects to PDF/PostScript/RTF converter written in Java. This release fixes bugs, improves table support, and removes the need for some additional libraries. Java 1.4 or later is required.
The W3C Web Accessibility Initiative has published a working draft of Web Accessibility for Older Users: A Literature Review:
There has been extensive development and adoption of the WAI guidelines for Web accessibility for people with disabilities. However, while these guidelines address many of the requirements needed by the ageing population, the relevance of the WAI guidelines to the needs of older people with functional disabilities caused by ageing does not seem to be well understood.
This review examines the literature relating to the use of the Web by older people to primarily look for intersections and differences between the WAI guidelines and recommendations for web design and development issues that will improve the accessibility and usability for older people. It is intended that the review will:
- better inform the ongoing work of W3C/WAI with regard to the needs of older computer users and their web accessibility related needs
- inform the development of potential extensions on WAI guidelines and techniques and/or provide direct input into future versions of WAI guidelines
- lead to the development of educational resources focussed towards industry implementers, and organisations representing and serving ageing communities
- help foster dialog between ageing communities, disability communities, industry, and other interested parties around issues of web accessibility
- inform the contributions that W3C makes into the standards development processes in Europe and internationally.
Michael Kay has released version 9.0.0.6 of Saxon, his XSLT 2.0 and XQuery processor for Java and .NET. This is a bug fix release.
Saxon is published in two versions, both of which require Java 1.4 or later (or .NET). Saxon 9.0B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 9.0 SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standing XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers."
The W3C has posted a working draft of State Chart XML (SCXML): State Machine Notation for Control Abstraction:
a general-purpose event-based state machine language that can be used in many ways, including:
- As a high-level dialog language controlling VoiceXML 3.0's encapsulated speech modules (voice form, voice picklist, etc.)
- As a voice application metalanguage, where in addition to VoiceXML 3.0 functionality, it may also control database access and business logic modules.
- As a multimodal control language in the MultiModal Interaction framework [W3C MMI], combining VoiceXML 3.0 dialogs with dialogs in other modalities including keyboard and mouse, ink, vision, haptics, etc. It may also control combined modalities such as lipreading (combined speech recognition and vision) speech input with keyboard as fallback, and multiple keyboards for multi-user editing.
- As the state machine framework for a future version of CCXML.
- As an extended call center management language, combining CCXML call control functionality with computer-telephony integration for call centers that integrate telephone calls with computer screen pops, as well as other types of message exchange such as chats, instant messaging, etc.
- As a general process control language in other contexts not involving speech processing.
SCXML combines concepts from CCXML and Harel State Tables. CCXML [W3C CCXML 1.0] is an event-based state machine language designed to support call control features in Voice Applications (specifically including VoiceXML but not limited to it). The CCXML 1.0 specification defines both a state machine and event handling syntax and a standardized set of call control elements. Harel State Tables are a state machine notation that was developed by the mathematician David Harel [Harel and Politi] and is included in UML [UML 2.0]. They offer a clean and well-thought out semantics for sophisticated constructs such as parallel states. They have been defined as a graphical specification language, however, and hence do not have an XML representation. The goal of this document is to combine Harel semantics with an XML syntax that is a logical extension of CCXML's state and event notation.
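For anyone who hasn't worked with one, an event-based state machine is just a current state plus a table of what each event does in each state. The sketch below is a bare-bones Python illustration of that idea with a made-up call-handling table; it is not SCXML syntax, and it leaves out everything that makes Harel's notation interesting (nested and parallel states, guards, actions).

```python
class StateMachine:
    """Flat event-driven state machine backed by a transition table."""

    def __init__(self, initial, transitions):
        self.state = initial
        # Maps (current state, event name) to the next state.
        self.transitions = transitions

    def send(self, event):
        """Deliver an event; events with no matching transition are ignored."""
        target = self.transitions.get((self.state, event))
        if target is not None:
            self.state = target
        return self.state


# Hypothetical call-control table, loosely in the spirit of CCXML.
CALL_TRANSITIONS = {
    ("idle", "dial"): "ringing",
    ("ringing", "answer"): "connected",
    ("ringing", "hangup"): "idle",
    ("connected", "hangup"): "idle",
}

call = StateMachine("idle", CALL_TRANSITIONS)
for event in ("dial", "answer", "hangup"):
    print(event, "->", call.send(event))
```

An XML notation like the one the draft proposes would carry the same table as markup, so the transitions can be authored and exchanged as a document rather than written as code.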
I've begun serializing the first chapter of Refactoring HTML on The Cafes. The first two sections are posted now:
More are coming tomorrow and Friday.
XMLMind has released Qizx/db 2.1, a $3200 closed source, embeddable native XML database engine written in Java that supports XQuery 1.0. Version 2.1 adds support for XQuery Update. The query interpreter part is available under an open source license.
The W3C XHTML 2 working group has posted the last call working draft of XHTML Access Module: Module to enable generic document accessibility. This module defines access, an empty element that can carry activate, key, targetid, and targetrole attributes.
- The activate attribute indicates whether a target element should be activated or not once it obtains focus.
- The key attribute assigns a key mapping to an access shortcut. Triggering an access key defined in an access element changes focus to the next element in navigation order from the current focus that has one of the referenced role or id values.
- The targetid attribute specifies one or more IDREFs related to target elements for the associated event.
- The targetrole attribute specifies a space separated list of CURIEs that maps to an element with a role attribute with the same value.
The W3C CSS Working Group has posted the Candidate Recommendation of CSS Namespaces Module. This module "defines the syntax for using namespaces in CSS. It defines the @namespace rule for declaring the default namespace and binding namespaces to namespace prefixes, and it also defines a syntax that other specifications can adopt for using those prefixes in namespace-qualified names."
Given the namespace declarations:
@namespace toto "http://toto.example.org";
@namespace "http://example.com/foo";
In a context where the default namespace applies
- toto|A represents the name A in the http://toto.example.org namespace.
- |B represents the name B that belongs to no namespace.
- *|C represents the name C in any namespace, including no namespace.
- D represents the name D in the http://example.com/foo namespace.
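The four cases from the spec's example amount to a small lookup rule. Here is a Python sketch of that rule under my own conventions: None stands for "no namespace", the ANY marker stands for "any namespace, including none", and the function name resolve_qname is made up. It covers only the name syntax shown in the example, not CSS parsing in general.

```python
ANY = "*"  # marker meaning "any namespace, including no namespace"

# The two declarations from the example above.
PREFIXES = {"toto": "http://toto.example.org"}
DEFAULT_NS = "http://example.com/foo"


def resolve_qname(qname, prefixes, default_ns=None):
    """Return (namespace, local name) for a CSS qualified name.

    Assumes a context where the default namespace applies to
    unprefixed names, as in the quoted example.
    """
    if "|" not in qname:
        return (default_ns, qname)
    prefix, local = qname.split("|", 1)
    if prefix == "":
        return (None, local)   # |B: explicitly no namespace
    if prefix == "*":
        return (ANY, local)    # *|C: any namespace
    if prefix not in prefixes:
        raise ValueError("undeclared namespace prefix: " + prefix)
    return (prefixes[prefix], local)


for name in ("toto|A", "|B", "*|C", "D"):
    print(name, "->", resolve_qname(name, PREFIXES, DEFAULT_NS))
```

Rejecting an undeclared prefix is my choice for the sketch; the quoted passage does not say what should happen in that case.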
Edwin Dankert has released XML Hammer 1.0, a GUI program written in Java and based on JAXP 1.3 for checking well-formedness, validating, transforming, and querying XML documents. XML Hammer is published under the Mozilla Public License 1.1.
The W3C Web API Working Group has posted the third public working draft of Progress Events 1.0. This "defines events which can be used to monitor a process and provide feedback to a user, particularly for network-based events." Here's the IDL:
interface ProgressEvent : events::Event {
  readonly attribute boolean lengthComputable;
  readonly attribute unsigned long loaded;
  readonly attribute unsigned long total;
  void initProgressEvent(in DOMString typeArg,
                         in boolean canBubbleArg,
                         in boolean cancelableArg,
                         in boolean lengthComputableArg,
                         in unsigned long loadedArg,
                         in unsigned long totalArg);
  void initProgressEventNS(in DOMString namespaceURI,
                           in DOMString typeArg,
                           in boolean canBubbleArg,
                           in boolean cancelableArg,
                           in boolean lengthComputableArg,
                           in unsigned long loadedArg,
                           in unsigned long totalArg);
};
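The three read-only attributes are all a progress bar needs. Here is a Python stand-in for how a listener might use them, assuming loaded and total count the same unit; the dataclass mirrors the IDL attribute names in snake case, and fraction_done is a made-up helper, not part of the draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProgressEvent:
    """Stand-in for the IDL interface: just the three read-only attributes."""
    length_computable: bool
    loaded: int
    total: int


def fraction_done(event: ProgressEvent) -> Optional[float]:
    """Return progress in [0, 1], or None when the total is not known."""
    if not event.length_computable or event.total == 0:
        return None  # caller should show an indeterminate indicator
    return event.loaded / event.total


print(fraction_done(ProgressEvent(True, 50, 200)))
print(fraction_done(ProgressEvent(False, 50, 0)))
```

The lengthComputable check matters: without it, a listener has no way to tell an unknown total from an empty one.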
The W3C CSS Working Group has posted the last call working draft of Cascading Style Sheets (CSS) Snapshot 2007:
When the first CSS specification was published, all of CSS was contained in one document that defined CSS Level 1. CSS Level 2 was defined also by a single, multi-chapter document. However for CSS beyond Level 2, the CSS Working Group chose to adopt a modular approach, where each module defines a part of CSS, rather than to define a single monolithic specification. This breaks the specification into more manageable chunks and allows more immediate, incremental improvement to CSS.
Since different CSS modules are at different levels of stability, the CSS Working Group has chosen to publish this profile to define the current scope and state of Cascading Style Sheets as of late 2007. This profile includes only specifications that we consider stable and for which we have enough implementation experience that we are sure of that stability.
Note that this is not intended to be a CSS Desktop Browser Profile: inclusion in this profile is based on feature stability only and not on expected use or Web browser adoption. This profile defines CSS in its most complete form.
Note also that although we don't anticipate significant changes to the specifications that form this snapshot, their inclusion does not mean they are frozen. The Working Group will continue to address problems as they are found in these specs. Implementers should monitor www-style and/or the CSS Working Group Blog for any resulting changes, corrections, or clarifications.
There actually isn't that much that's ready; mostly CSS Level 2 plus CSS Namespaces, Selectors Level 3 and CSS Color Level 3.
The Mozilla Project has posted the first release candidate of Firefox 3.0 for Mac, Linux, and Windows. Firefox 3 is based on the much improved Gecko 1.9 Web rendering platform. Mostly this release focuses on small user interface improvements, tightened security, and improved performance and under-the-hood architecture, rather than big new features. Still, there are a few new features including:
I haven't tried Firefox 3 yet myself, but initial reviews are very positive.
The W3C XQuery working group has posted the candidate recommendation of XQuery and XPath Full Text 1.0:
XML documents may contain highly structured data (fixed schemas, known types such as numbers, dates), semi-structured data (flexible schemas and types), markup data (text with embedded tags), and unstructured data (untagged free-flowing text). Where a document contains unstructured or semi-structured data, it is important to be able to search using Information Retrieval techniques such as scoring and weighting.
Full-text search is different from substring search in many ways:
A full-text search searches for tokens and phrases rather than substrings. A substring search for news items that contain the string "lease" will return a news item that contains "Foobar Corporation releases the 20.9 version ...". A full-text search for the token "lease" will not.
There is an expectation that a full-text search will support language-based searches which substring search cannot. An example of a language-based search is "find me all the news items that contain a token with the same linguistic stem as 'mouse'" (finds "mouse" and "mice"). Another example based on token proximity is "find me all the news items that contain the tokens 'XML' and 'Query' allowing up to 3 intervening tokens".
Full-text search must address the vagaries and nuances of language. Search results are often of varying usefulness. When you search a web site for cameras that cost less than $100, this is an exact search. There is a set of cameras that matches this search, and a set that does not. Similarly, when you do a string search across news items for "mouse", there is only 1 expected result set. When you do a full-text search for all the news items that contain the token "mouse", you probably expect to find news items containing the token "mice", and possibly "rodents", or possibly "computers". Not all results are equal. Some results are more "mousey" than others. Because full-text search may be inexact, we have the notion of score or relevance. We generally expect to see the most relevant results at the top of the results list.
Note:
As XQuery and XPath evolve, they may apply the notion of score to querying structured data. For example, when making travel plans or shopping for cameras, it is sometimes useful to get an ordered list of near matches in addition to exact matches. If XQuery and XPath define a generalized inexact match, we expect XQuery and XPath to utilize the scoring framework provided by XQuery and XPath Full Text.
[Definition: Full-text queries are performed on tokens and phrases. Tokens and phrases are produced via tokenization.] Informally, tokenization breaks a character string into a sequence of tokens, units of punctuation, and spaces.
Tokenization, in general terms, is the process of converting a text string into smaller units that are used in query processing. Those units, called tokens, are the most basic text units that a full-text search can refer to. Full-text operators typically work on sequences of tokens found in the target text of a search. These tokens are characterized by integers that capture the relative position(s) of the token inside the string, the relative position(s) of the sentence containing the token, and the relative position(s) of the paragraph containing the token. The positions typically comprise a start and an end position.
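As a rough illustration of tokens that carry token, sentence, and paragraph positions, the following sketch assigns each token three integers. Everything here is an assumption for illustration: the `Token` class, the splitting rules (blank lines end paragraphs; `.`, `!`, or `?` followed by whitespace ends a sentence), and the use of a single ordinal rather than the start/end pairs the text mentions.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    position: int   # ordinal of the token within the whole string
    sentence: int   # ordinal of the sentence containing the token
    paragraph: int  # ordinal of the paragraph containing the token


def tokenize(text):
    """Hypothetical tokenizer: returns tokens annotated with relative positions."""
    tokens = []
    position = 0
    sentence = 0
    paragraphs = re.split(r"\n\s*\n", text.strip())
    for p_index, para in enumerate(paragraphs, start=1):
        for sent in re.split(r"(?<=[.!?])\s+", para.strip()):
            words = re.findall(r"\w+", sent)
            if not words:
                continue  # skip empty fragments
            sentence += 1
            for word in words:
                position += 1
                tokens.append(Token(word.lower(), position, sentence, p_index))
    return tokens


for token in tokenize("XML is fun. Query it.\n\nMice eat cheese."):
    print(token)
```

With positions like these in hand, operators such as "same sentence" or "same paragraph" reduce to integer comparisons.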
Tokenization, including the definition of the term "tokens", SHOULD be implementation-defined. Implementations SHOULD expose the rules and sample results of tokenization as much as possible to enable users to predict and interpret the results of tokenization. Tokenization is defined more formally in 4.1 Tokenization.
[Definition: A token is a non-empty sequence of characters returned by a tokenizer as a basic unit to be searched. Beyond that, tokens are implementation-defined.] [Definition: A phrase is an ordered sequence of any number of tokens. Beyond that, phrases are implementation-defined.]
Note:
Consecutive tokens need not be separated by either punctuation or space, and tokens may overlap.
Note:
In some natural languages, tokens and words can be used interchangeably.
[Definition: A sentence is an ordered sequence of any number of tokens. Beyond that, sentences are implementation-defined. A tokenizer is not required to support sentences.]
[Definition: A paragraph is an ordered sequence of any number of tokens. Beyond that, paragraphs are implementation-defined. A tokenizer is not required to support paragraphs.]
Some XML elements represent semantic markup, e.g., <title>. Others represent formatting markup, e.g., <b> to indicate bold. Semantic markup serves well as token boundaries. Some formatting markup serves well as token boundaries, for example, paragraphs are most commonly delimited by formatting markup. Other formatting markup may not serve well as token boundaries. Implementations are free to provide implementation-defined ways to differentiate between the markup's effect on token boundaries during tokenization. In the absence of an implementation-defined way to differentiate, element markup (start tags, end tags, and empty-element tags) creates token boundaries.
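The default rule, that element markup creates token boundaries, can be sketched with Python's standard `xml.etree.ElementTree`. This is only an illustration under my own assumptions: the word-character tokenizer and both function names are hypothetical, and a real implementation may treat formatting markup such as `<b>` differently.

```python
import re
import xml.etree.ElementTree as ET


def tokens_with_markup_boundaries(xml_text):
    """Tokenize each text chunk separately, so every tag ends a token."""
    root = ET.fromstring(xml_text)
    tokens = []
    for chunk in root.itertext():
        tokens.extend(re.findall(r"\w+", chunk.lower()))
    return tokens


def tokens_ignoring_markup(xml_text):
    """Join all text first, so tags inside a word do not split it."""
    root = ET.fromstring(xml_text)
    return re.findall(r"\w+", "".join(root.itertext()).lower())


doc = "<p>un<b>believ</b>able</p>"
print(tokens_with_markup_boundaries(doc))  # ['un', 'believ', 'able']
print(tokens_ignoring_markup(doc))         # ['unbelievable']
```

The second function shows what an implementation-defined exception for formatting markup might look like.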
A sample tokenization is used for the examples in this document. The results might be different for other tokenizations.
Tokenization enables functions and operators that operate on a part or the root of the token (e.g., wildcards, stemming).
Tokenization enables functions and operators which work with the relative positions of tokens (e.g., proximity operators).
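A proximity operator of the kind mentioned above ("XML" and "Query" allowing up to 3 intervening tokens) might be sketched as follows. The tokenizer and the `within_distance` function are hypothetical names of mine, not anything defined by the specification.

```python
import re


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def within_distance(text, first, second, max_intervening):
    """True if some occurrence of each token has at most
    max_intervening tokens between them, in either order."""
    tokens = tokenize(text)
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(
        i != j and abs(i - j) - 1 <= max_intervening
        for i in first_positions
        for j in second_positions
    )


print(within_distance("XML and its sibling Query", "XML", "Query", 3))  # True
print(within_distance("XML and its sibling Query", "XML", "Query", 2))  # False
```

Because it only compares token positions, the check is independent of how many characters or spaces separate the two words.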
This specification focuses on functionality that serves all languages. It also selectively includes functionalities useful within specific families of languages. For example, searching within sentences and paragraphs is useful to many western languages and to some non-western languages, so that functionality is incorporated into this specification.
SyncroSoft has released <Oxygen/> 9.2, a $345 payware XML editor written in Java. Oxygen supports XML, XSL, DTDs, XQuery, SVG, Relax NG, Schematron, and the W3C XML Schema Language. According to the announcement, 9.2 adds support for the Intel XSLT engine and Saxon 9.0.0.4.
The Web Accessibility Initiative has published Web Accessibility for Older Users: A Literature Review. "This document is intended to provide an overview of currently available literature about the needs of older adults with functional impairments accessing the web. It will compare how well these requirements are addressed and communicated by the WAI guidelines. This early version is intended to elicit comment and feedback on the literature collected and discussed so far. In particular we are interested in whether there are gaps in our coverage, or key resources overlooked. It should be noted that this is a work-in-progress and that not all sections are yet complete."
I am pleased to announce that my latest book, Refactoring HTML, has been released by Addison Wesley. This book endeavors to improve the design of existing web sites along multiple axes: maintainability, security, attractiveness, and performance. It does this by moving sites to web standards: XHTML, CSS, and REST.
Rather than approaching this as a big bang project, small changes can be made in small steps that offer linear improvement. You don't need to spend months of developer time and thousands of dollars before you see any payback. You can improve your site some today, and then some more tomorrow. Refactoring a web site doesn't require large blocks of uninterrupted development time. Add up enough small changes in the little pieces of time scattered throughout the workday, and before you know it, your site is dramatically improved.
Not convinced yet? Let me offer a brief excerpt from Chapter 1:
Refactoring. What is it? Why do it?
In brief, refactoring is the gradual improvement of a code base by making small changes that don’t modify a program’s behavior, usually with the help of some kind of automated tool. The goal of refactoring is to remove the accumulated cruft of years of legacy code and produce cleaner code that is easier to maintain, easier to debug, and easier to add new features to.
Technically, refactoring never actually fixes a bug or adds a feature. However, in practice, when refactoring I almost always uncover bugs that need to be fixed and spot opportunities for new features. Often, refactoring changes difficult problems into tractable and even easy ones. Reorganizing code is the first step in improving it.
The concept of refactoring originally came from the object-oriented programming community, and dates back at least as far as 1990 (William F. Opdyke and Ralph E. Johnson, “Refactoring: An Aid in Designing Application Frameworks and Evolving Object-Oriented Systems,” Proceedings of the Symposium on Object-Oriented Programming Emphasizing Practical Applications [SOOPPA], September 1990, ACM), though likely it was in at least limited use before then. However, the term was popularized by Martin Fowler in 1999 in his book Refactoring (Addison-Wesley, 1999). Since then, numerous IDEs and other tools such as Eclipse, IntelliJ IDEA, and C# Refactory have implemented many of his catalogs of refactorings for languages such as Java and C#, as well as inventing many new ones.
However, it’s not just object-oriented code and object-oriented languages that develop cruft and need to be refactored. In fact, it’s not just programming languages at all. Almost any sufficiently complex system that is developed and maintained over time can benefit from refactoring. The reason is twofold.
1. Increased knowledge of both the system and the problem domain often reveals details that weren’t apparent to the initial designers. No one ever gets everything right in the first release. You have to see a system in production for a while before some of the problems become apparent.
2. Over time, functionality increases and new code is written to support this functionality. Even if the original system solved its problem perfectly, the new code written to support new features doesn’t mesh perfectly with the old code. Eventually, you reach a point where the old code base simply cannot support the weight of all the new features you want to add.
When you find yourself with a system that is no longer able to support further developments, you have two choices: You can throw it out and build a new system from scratch, or you can shore up the foundations. In practice, we rarely have the time or budget to create a completely new system just to replace something that already works. It is much more cost-effective to add the struts and supports that the existing system needs before further work. If we can slip these supports in gradually, one at a time, rather than as a big-bang integration, so much the better.
Many sufficiently complex systems with large chunks of code are not object-oriented languages and perhaps are not even programming languages at all. For instance, Scott Ambler and Pramod Sadalage demonstrated how to refactor the SQL databases that support many large applications in Refactoring Databases (Addison-Wesley, 2006). However, while the back end of a large networked application is often a relational database, the front end is a web site. Thin client GUIs delivered in Firefox or Internet Explorer are everywhere, replacing thick client GUIs for all sorts of business applications, such as payroll and lead tracking. Adventurous users at companies such as Sun and Google are going even further and replacing classic desktop applications like word processors and spreadsheets with web apps built out of HTML, CSS, and JavaScript. Finally, the Web and the ubiquity of the web browser have enabled completely new kinds of applications that never existed before, such as eBay, Netflix, PayPal, Google Reader, and Google Maps.
HTML made these applications possible, and it made them faster to develop, but it didn’t make them easy. It didn’t make them simple. It certainly didn’t make them less fundamentally complex. Some of these systems are now on their second, third, or fourth generation; and wouldn’t you know it? Just like any other sufficiently complex, sufficiently long-lived application, these web apps are developing cruft. The new pieces aren’t merging perfectly with the old pieces. Systems are slowing down because the whole structure is just too ungainly. Security is being breached when hackers slip in through the cracks where the new parts meet the old parts. Once again, the choice comes down to throwing out the original application and starting over, or fixing the foundations; but really, there’s no choice. In today’s fast-moving world, nobody can afford to wait for a completely new replacement. The only realistic option is to refactor.
Most of the refactorings in this book focus on upgrading sites to web standards, specifically:
- XHTML
- CSS
- REST
They are going to help you move away from
- Tag soup
- Presentation-based markup
- Stateful applications
These are not binary choices, or all-or-nothing decisions. You can often improve the characteristics of your sites along these three axes without going all the way to one extreme. An important characteristic of refactoring is that it’s linear. Small changes generate small improvements. You do not need to do everything at once. You can implement well-formed XHTML before you implement valid XHTML. You can implement valid XHTML before you move to CSS. You can have a fully valid CSS-laid-out site before you consider what’s required to eliminate sessions and session cookies.
Nor do you have to implement these changes in this order. You can pick and choose the refactorings from the catalog that bring the most benefit to your applications. You may not require XHTML, but you may desperately need CSS. You may want to move your application architecture to REST for increased performance but not care much about converting the documents to XHTML. Ultimately, the decision rests with you. This book presents the choices and options so that you can weigh the costs and benefits for yourself.
It is certainly possible to build web applications using tag-soup table-based layout, image maps, and cookies. However, it’s not possible to scale those applications, at least not without a disproportionate investment in time and resources that most of us can’t afford. Growth both horizontally (more users) and vertically (more features) requires a stronger foundation. This is what XHTML, CSS, and REST provide.
Refactoring HTML is available now at Amazon, Safari, and other fine bookstores everywhere. The price is a very reasonable $39.99, and most stores are offering their customary discounts. (Amazon is 10% off at the moment.) I hope you enjoy it.
The W3C Web Application Formats Working Group has published several new and updated working drafts about Widgets:
A widget is an interactive single purpose application for displaying and/or updating local data or data on the Web, packaged in a way to allow a single download and installation on a user's machine or mobile device. A widget may run as a stand alone application (meaning it can run outside of a Web browser), or may be embedded into a Web document. In this document, the runtime environment on which a widget is run is referred to as a widget user agent and a running widget is referred to as an instantiated widget. Prior to instantiation, a widget exists as a widget resource. For more information about widgets, see the Widget Landscape document.
To be clear, this specification describes the requirements for desktop style widgets (akin to Dashboard, Opera Widgets, and Yahoo! Widgets). This document does not address the requirements of "web widgets", such as iGoogle Gadgets or Windows Live Gadgets.
The drafts include:
The W3C Web Content Accessibility Guidelines Working Group has updated two working drafts on the Web Content Accessibility Guidelines:
This document, "Understanding WCAG 2.0," is an essential guide to understanding and using Web Content Accessibility Guidelines 2.0 [WCAG20]. It is part of a series of documents that support WCAG 2.0. Please note that the contents of this document are informative (they provide guidance), and not normative (they do not set requirements for conforming to WCAG 2.0).
WCAG 2.0 establishes a set of Success Criteria to define conformance to the WCAG 2.0 Guidelines. A Success Criterion is a testable statement that will be either true or false when applied to specific Web content. "Understanding WCAG 2.0" provides detailed information about each Success Criterion, including its intent, the key terms that are used in the Success Criterion, and how the Success Criteria in WCAG 2.0 help people with different types of disabilities. This document also provides examples of Web content that meet the success criterion using various Web technologies (for instance, HTML, CSS, XML), and common examples of Web content that does not meet the success criterion.
This document indicates specific techniques to meet each Success Criterion. Details for how to implement each technique are available in Techniques and Failures for WCAG 2.0, but "Understanding WCAG 2.0" provides the information about the relationship of each technique to the Success Criteria. Techniques are categorized by the level of support they provide for the Success Criteria. "Sufficient techniques" are sufficient to meet a particular Success Criterion (either by themselves or in combination with other techniques), while other techniques are advisory and therefore optional. None of the techniques are required to meet WCAG 2.0, although some may be the only known method if a particular technology is used. "Advisory techniques" are not sufficient to meet the Success Criteria on their own (because they are not testable or provide incomplete support) but it is encouraged that authors follow them when possible to provide enhanced accessibility. Another support category is "Failure techniques", which describe authoring practices known to cause Web content not to conform to WCAG 2.0. Although failure techniques provide advisory information about certain authoring practices, authors must avoid those practices in order to meet the WCAG 2.0 Success Criteria.
"'Techniques and Failures for WCAG 2.0' provides information to Web content developers who wish to satisfy the success criteria of Web Content Accessibility Guidelines 2.0 (WCAG 2.0). Techniques are specific authoring practices that may be used in support of the WCAG 2.0 success criteria. This document provides "General Techniques" that describe basic practices that are applicable to any technology, and technology-specific techniques that provide information applicable to specific technologies. Currently, technology-specific techniques are available for HTML, CSS, ECMAScript, SMIL, ARIA, and Web servers. The World Wide Web Consortium only documents techniques for non-proprietary technologies; the WCAG Working Group hopes vendors of other technologies will provide similar techniques to describe how to conform to WCAG 2.0 using those technologies. Use of the techniques provided in this document makes it easier for Web content to demonstrate conformance to WCAG 2.0 success criteria than if these techniques are not used."
There's a lot of good information here. These should really be required reading for all HTML authors and web designers. The Techniques spec is probably the most practical, and where most readers should start.
The W3C XHTML 2 Working Group has posted the last call working draft of CURIE Syntax 1.0: A syntax for expressing Compact URIs. This is modeled after namespace URIs and qualified names. In brief, it defines a prefix for a known base IRI (a URI that can contain non-ASCII characters like é), then appends a colon and a local part. For example, the CURIE cafe:tradeshows.xml could be shorthand for http://www.cafeaulait.org/tradeshows.xml if the prefix cafe were mapped to the URL http://www.cafeaulait.org/.
Exactly how prefixes are mapped to base IRIs is left to the specification of the documents in which the CURIEs appear. However if the CURIEs are in an XML document, then the namespaces in scope define the prefix mappings. The default namespace can be used for prefix-less CURIEs.
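For what it's worth, the expansion itself is trivial. Here is a minimal sketch of my own, not anything taken from the draft: the `expand_curie` function and its parameters are hypothetical, and it assumes the prefix mappings have already been collected into a dictionary by whatever host language the CURIE appears in.

```python
def expand_curie(curie, prefixes, default=None):
    """Expand a CURIE to a full IRI.

    prefixes maps each prefix to its base IRI; default is the base IRI
    used for prefix-less CURIEs, if the host language defines one.
    """
    prefix, colon, local = curie.partition(":")
    if not colon:
        # No colon at all: a prefix-less CURIE.
        if default is None:
            raise ValueError("no default mapping for prefix-less CURIE: " + curie)
        return default + curie
    if prefix not in prefixes:
        raise KeyError("unmapped CURIE prefix: " + prefix)
    return prefixes[prefix] + local


mappings = {"cafe": "http://www.cafeaulait.org/"}
print(expand_curie("cafe:tradeshows.xml", mappings))
# http://www.cafeaulait.org/tradeshows.xml
```

Note that the same string expands differently, or not at all, depending on which mappings happen to be in scope.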
Frankly, I'm surprised to see this. Namespaces and the namespace syntax are one of the notable failures of the XML ecosystem. Why someone would choose to imitate this now that we know better is beyond me. Based on experience with namespaces, I predict that moving CURIEs from one context to another is going to be especially problematic. Well, we've learned to live with (if not exactly like) namespaces. I guess we can get used to this.
Planamesa Software has released NeoOffice/J 2.2.3, a Mac port of OpenOffice 2.1 using a Java-based GUI. New features in this latest patch release include a Media Browser, native floating tool windows, and trackpad magnify and swipe features. Features since 2.2.2 include grammar checking, importing images from scanners and cameras, QuickTime video, and menu bars that stay open when no window is present.
SyncroSoft has released <Oxygen/> 9.2, a $345 payware XML editor written in Java. Oxygen supports XML, XSL, DTDs, XQuery, SVG, Relax NG, Schematron, and the W3C XML Schema Language. According to the announcement, "Version 9.2 introduces a new XML Author edition specially tuned for content authors providing a well designed interface for XML editing by keeping only the relevant authoring features. The major additions in Oxygen XML Editor 9.2 are related to the WYSIWYG-like editing support and in particular to the DITA support. The general visual editing improvements include displaying the resolved content in the editor and navigation through links. With the new DITA features that include a new DITA map editor, actions for inserting conref links, a tight integration of the latest version of the DITA Open Toolkit, Oxygen XML Editor becomes the leading DITA editor and the easiest to use. Other improvements are browsing of XML databases using WebDAV connections, better handling of Chinese, Japanese and Korean (CJK) text, support for the Intel® XML Software Suite and multiple component updates."
The OpenOffice Project has posted the first beta of OpenOffice 3.0, an open source office suite for Linux, Solaris, and Windows that saves all its files as zipped XML.
The most immediately visible change to OpenOffice.org 3.0 is the new "Start Centre", new fresh-looking icons, and a new zoom control in the status bar. A closer look shows that 3.0 has a myriad of new features. Notable Calc improvements include a new solver component; support for spreadsheet collaboration through workbook sharing; and an increase to 1024 columns per sheet. Writer has an improved notes feature and displays of multiple pages while editing. There are numerous Chart enhancements, and an improved crop feature in Draw and Impress.
Behind the scenes, OpenOffice.org 3.0 will support the upcoming OpenDocument Format (ODF) 1.2 standard, and is capable of opening files created with MS-Office 2007 or MS-Office 2008 for Mac OS X (.docx, .xlsx, .pptx, etc.). This is in addition to read and write support for the MS-Office binary file formats (.doc, .xls, .ppt, etc.).
OpenOffice.org 3.0 will be the first version to run on Mac OS X without X11, with the look and feel of any other Aqua application. It introduces partial VBA support to this platform. In addition, OpenOffice.org 3.0 integrates well with the Mac OS X accessibility APIs, and thus offers better accessibility support than many other Mac OS X applications.
The W3C Web Content Accessibility Guidelines Working Group has posted the candidate recommendation of Web Content Accessibility Guidelines 2.0. "Web Content Accessibility Guidelines 2.0 (WCAG 2.0) covers a wide range of recommendations for making Web content more accessible. Following these guidelines will make content accessible to a wider range of people with disabilities, including blindness and low vision, deafness and hearing loss, learning disabilities, cognitive limitations, limited movement, speech difficulties, photosensitivity and combinations of these. Following these guidelines will also often make your Web content more usable to users in general."
The W3C XML Core Working Group has published the finished recommendation Canonical XML 1.1. This attempts to address some of the weirdnesses of Canonical XML, such as the movement of xml:id attributes from one element to another and breaking of base URLs when canonicalizing.
The W3C XML Processing Model Working Group has published a new Working Draft of XProc: An XML Pipeline Language. According to group lead Norm Walsh, changes in this draft are:
Fairly substantial syntax changes. A <p:pipeline> is now just syntactic sugar for a particular <p:declare-step>.
Significantly reworked the syntax and semantics of variables, options, and parameters. Added <p:variable>. Imposed a syntactic distinction between declaration (<p:option>) and use (<p:with-option>/<p:with-param>) of options and parameters.
Clarified the scope of variables and options.
Removed value attribute from <p:variable>, <p:option>, <p:with-option>, and <p:with-param>.
Removed automatic declaration of parameter input ports; you have to declare them explicitly if you need them.
Added p:base-uri() and p:resolve-uri() XPath extension functions to support (XPath 1.0) pipelines that need access to the base URI of documents.
Removed ignored namespaces, added <p:pipeinfo>.
Redefined the <p:label-elements> step to use a step-local variable in the XPath context.
Added psvi-required attribute to pipelines.
Changed definition of <p:error> to better address localization issues.
The syntax changes, and making <p:pipeline> syntactic sugar for a particular <p:declare-step>, have the effect of making very simple, straight-through pipelines syntactically simple again.
Reorganizing some of the option and parameter elements, and adding a variable element, makes the language bigger (in the sense that it has more elements) but I think it has significantly reduced some of the confusing subtlety that used to exist around declaration and use of options.
In general, I think these are all changes for the better. And I think we're done. This is a Last Call working draft in all but name. The changes are significant enough that we thought it would be best to float them in an ordinary working draft first. That will, I hope, save us the embarrassment of having to do more than two last calls.
Mokka mit Schlag is borked at the moment. I think I know what went wrong with the upgrade, and I'm working on fixing it. In brief, the WordPress user did not have permissions to create and drop tables. This is indicative of a bug in WordPress--it does not verify that it has the necessary permissions before attempting to upgrade, nor does it notice that the upgrade has failed and perform a rollback. However the host (Pair Networks) has not been quickly responsive, so I'm not sure how long it will take; and I don't have the root database access necessary to repair the problem, so it may take a little while.
Another day, another WordPress security bug. Matt Mullenweg has released WordPress 2.5.1, an open source (GPL) blog engine based on PHP and MySQL. All users should upgrade.
The W3C has posted the first working draft of Requirements of Japanese Text Layout. "This document describes requirements for general Japanese layout realized with technologies like CSS, SVG and XSL-FO. The document is mainly based on a standard for Japanese layout, JIS X 4051. However, it addresses also areas which are not covered by JIS X 4051. The document is currently in draft stage. This public draft contains the Introduction and section 1 Basics of Japanese Text Layout. Further sections are available in a non-public version of the document and will be integrated into a further public Working Draft."
Daniel Veillard has released version 2.6.32 of libxml2, the open source XML C library for Gnome. This release fixes assorted bugs including some memory leaks. All users should upgrade.
The W3C Web API Working Group has posted the last call working draft of The XMLHttpRequest Object.
The XMLHttpRequest object implements an interface exposed by a scripting engine that allows scripts to perform HTTP client functionality, such as submitting form data or loading data from a server.
The name of the object is XMLHttpRequest for compatibility with the web, though each component of this name is potentially misleading. First, the object supports any text based format, including XML. Second, it can be used to make requests over both HTTP and HTTPS (some implementations support protocols in addition to HTTP and HTTPS, but that functionality is not covered by this specification). Finally, it supports "requests" in a broad sense of the term as it pertains to HTTP; namely all activity involved with HTTP requests or responses for the defined HTTP methods.
Michael Kay has released version 9.0.0.4 of Saxon, his XSLT 2.0 and XQuery processor for Java and .NET. This is a bug fix release. "Although there's a steady stream of new bugs and fixes, I think they are largely problems that affect very few users, so unless you know you're affected by one of the bugs, there's no great urgency to upgrade to the latest maintenance build."
Saxon is published in two versions for both of which Java 1.4 or later (or .NET) is required. Saxon 9.0B is an open source product published under the Mozilla Public License 1.0 that "implements the 'basic' conformance level for XSLT 2.0 and XQuery." Saxon 9.0 SA is a £250.00 payware version that "allows stylesheets and queries to import an XML Schema, to validate input and output trees against a schema, and to select elements and attributes based on their schema-defined type. Saxon-SA also incorporates a free-standing XML Schema validator. In addition Saxon-SA incorporates some advanced extensions not available in the Saxon-B product. These include a try/catch capability for catching dynamic errors, improved error diagnostics, support for higher-order functions, and additional facilities in XQuery including support for grouping, advanced regular expression analysis, and formatting of dates and numbers."
In related news, the fourth edition of Kay's classic XSLT 2.0 and XPath 2.0 Programmer's Reference is scheduled to be released on April 28th. It's in hardcover, over 1300 pages, and is currently available for $37.79 at Amazon.
XMLMind has released version 3.8.0 of their XML Editor. This $300 payware product features word processor-like and spreadsheet-like views of XML documents. This release adds support for MathML 2 presentation markup. A free-beer hobbled version is also available.
The W3C HTML working group has posted the last call working draft of XHTML Role Attribute Module.
The Role Attribute Module defines the role attribute and some values for that attribute in the default vocabulary space. The role attribute takes as its value one or more whitespace separated CURIEs [CURIE]. Any non-qualified value MUST be interpreted as being from the XHTML vocabulary at http://www.w3.org/1999/xhtml/vocab#. For a list of all roles in the default vocabulary, see [XHTMLVOCAB].
The attribute describes the role(s) the current element plays in the context of the document. This can be used, for example, by applications and assistive technologies to determine the purpose of an element. This could allow a user to make informed decisions on which actions may be taken on an element and activate the selected action in a device independent way. It could also be used as a mechanism for annotating portions of a document in a domain specific way (e.g., a legal term taxonomy).
This example is informative
<ul role="navigation sitemap">
  <li href="downloads">Downloads</li>
  <li href="docs">Documentation</li>
  <li href="news">News</li>
</ul>
The following list represents some of the roles defined in the default vocabulary. They are intended to define regions of the document to help orient the user.
- banner
- A region that contains the prime heading or internal title of a page.
Most of the content of a banner is site-oriented, rather than being page-specific. Site-oriented content typically includes things such as the logo of the site sponsor, the main heading for the page, and site-specific search tool. Typically this appears at the top of the page spanning the full width.
- complementary
- Any section of the document that supports but is separable from the main content, but is semantically meaningful on its own even when separated from it.
There are various types of content that would appropriately have this role. For example, in the case of a portal, this may include but not be limited to show times, current weather, related articles, or stocks to watch. The content should be relevant to the main content; if it is completely separable, a more general role should be used instead.
- contentinfo
- Meta information about the content on the page or the page as a whole.
For example, footnotes, copyrights, links to privacy statements, etc. would belong here.
- definition
- A definition of a term or concept.
A role is not provided to specify the term being defined, although host languages may provide such an element; in XHTML this is the dfn element. The defined term should be included in such an element even when occurring within an element having the definition role.
- main
- Main content in a document.
This marks the content that is directly related to or expands upon the central topic of the page.
- navigation
- A collection of links suitable for use when navigating the document or related documents.
- note
- The content is parenthetic or ancillary to the main content of the resource.
- search
- The search tool of a web document.
This is typically a form used to submit search requests about the site or to a more general Internet search service.
You can add other values for this attribute by placing the values in a namespace. (Haven't we learned yet that namespaced attribute values are a bad idea?)
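To see what a consumer of the role attribute has to do, here is a minimal sketch of resolving a role value into full IRIs. It is my own illustration, not code from the draft: the `resolve_roles` function and the `ex` prefix are hypothetical, and it assumes the prefix mappings in scope have already been gathered into a dictionary. Only the default vocabulary IRI comes from the specification text quoted above.

```python
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"


def resolve_roles(role_value, prefixes=None):
    """Resolve a whitespace-separated role attribute value to a list of IRIs.

    Non-qualified values come from the XHTML vocabulary; qualified values
    are looked up in prefixes (a dict of prefix -> base IRI).
    """
    prefixes = prefixes or {}
    resolved = []
    for value in role_value.split():
        prefix, colon, local = value.partition(":")
        if not colon:
            resolved.append(XHTML_VOCAB + value)
        elif prefix in prefixes:
            resolved.append(prefixes[prefix] + local)
        else:
            raise ValueError("unmapped prefix in role value: " + value)
    return resolved


print(resolve_roles("navigation sitemap"))
print(resolve_roles("main ex:glossary", {"ex": "http://example.org/roles#"}))
```

The second call only works if the caller knows the mapping for `ex`, which is exactly the information that gets lost when an attribute value is copied out of its original document.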
The W3C Web API Working Group has published the second working draft of Language Bindings for DOM Specifications. "“Language Bindings for DOM Specifications” is intended to specify in detail the IDL language used by W3C specifications to define DOM interfaces, and to provide precise conformance requirements for ECMAScript and Java bindings of such interfaces. It is expected that this document acts as a guide to implementors of already-published DOM specifications, and that newly published DOM specifications reference this document to ensure conforming implementations of DOM interfaces are interoperable."
The W3C Semantic Web Activity has posted a working draft of Experiences with the conversion of SenseLab databases to RDF/OWL. "One of the challenges facing Semantic Web for Health Care and Life Sciences is that of converting relational databases into Semantic Web format. The issues and the steps involved in such a conversion have not been well documented. To this end, we have created this document to describe the process of converting SenseLab databases into OWL. SenseLab is a collection of relational (Oracle) databases for neuroscientific research. The conversion of these databases into RDF/OWL format is an important step towards realizing the benefits of Semantic Web in integrative neuroscience research. This document describes how we represented some of the SenseLab databases in Resource Description Framework (RDF) and Web Ontology Language (OWL), and discusses the advantages and disadvantages of these representations. Our OWL representation is based on the reuse of existing standard OWL ontologies developed in the biomedical ontology communities. The purpose of this document is to share our implementation experience with the community."
Mildly interesting, but why this is a working draft instead of a note, or why it's even published by the W3C, I can't quite figure out. This is a case study at most, not a specification of anything in particular.
The W3C Math Working Group has posted the third public working draft of Mathematical Markup Language (MathML) Version 3.0. Changes since 2.0 include content dictionaries, "a mechanism for recording that a particular notational structure has a particular mathematical meaning". Version 3.0 is also supposed to enable easier markup of elementary school mathematics.
The Modis Team has released Sedna 3.0, an open source native XML database for Windows and Linux written in C++ and Scheme and published under the Apache License 2.0. Sedna supports XQuery and its own declarative update language. This release fixes bugs and improves transaction support.
Of the open source XML databases, this is the one I know the least about. Anyone want to comment on this one?
The W3C XHTML 2 Working Group has posted the third public working draft of
CURIE Syntax 1.0:
A syntax for expressing Compact URIs. This is modeled after namespace URIs and qualified names. In brief, it defines a prefix for a known base IRI (a URI that can contain non-ASCII characters like é),
then appends a colon and a local part.
For example, the CURIE cafe:tradeshows.xml could be shorthand for
http://www.cafeaulait.org/tradeshows.xml if the prefix
cafe were mapped to the URL
http://www.cafeaulait.org/.
Exactly how prefixes are mapped to base IRIs is left to the specification of the documents in which the CURIEs appear. However,
if the CURIEs are in an XML document, then the namespaces in scope define the
prefix mappings. The default namespace can be used for prefix-less CURIEs.
Frankly I'm surprised to see this. Namespaces and the namespace syntax are among the notable failures of the XML ecosystem. Why someone would choose to imitate them now that we know better is beyond me. Based on experience with namespaces, I predict that moving CURIEs from one context to another is going to be especially problematic. Well, we've learned to live with (if not exactly like) namespaces. I guess we can get used to this.
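The prefix-plus-local-part expansion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the draft's full grammar: the function name expand_curie and the default_base parameter are made up for this sketch, and the prefix table is supplied by hand because the draft leaves the mapping mechanism to the host language.

```python
def expand_curie(curie, prefixes, default_base=None):
    """Expand a CURIE such as 'cafe:tradeshows.xml' into a full IRI.

    prefixes maps prefix strings to base IRIs. default_base is a
    hypothetical stand-in for whatever the host language defines for
    prefix-less CURIEs (for example, the default namespace in XML).
    """
    prefix, colon, local = curie.partition(":")
    if not colon:
        # No colon: the whole string is the local part.
        if default_base is None:
            raise ValueError("no default base for prefix-less CURIE %r" % curie)
        return default_base + curie
    if prefix not in prefixes:
        raise KeyError("undeclared prefix %r" % prefix)
    # The expansion is plain string concatenation of base IRI and local part.
    return prefixes[prefix] + local


prefixes = {"cafe": "http://www.cafeaulait.org/"}
print(expand_curie("cafe:tradeshows.xml", prefixes))
# prints http://www.cafeaulait.org/tradeshows.xml
```

Note that the same string expands differently (or not at all) under a different prefix table, which is exactly the portability concern with moving CURIEs between contexts.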
The Unicode Consortium has released Unicode 5.1:
This release contains over 100,000 characters, and provides significant additions and improvements that extend text processing for software worldwide. Some of the key features are: increased security in data exchange, significant character additions for Indic and South East Asian scripts, expanded identifier specifications for Indic and Arabic scripts, improvements in the processing of Tamil and other Indic scripts, linebreaking conformance relaxation for HTML and other protocols, strengthened normalization stability, new case pair stability, plus others given below.
The Version 5.1.0 data files and documentation are final and posted on the Unicode site. In addition to updated existing files, implementers will find new test data files (for example, for linebreaking) and new XML data files that encapsulate all of the Unicode character properties. For details, see the page for Unicode 5.1.0 at http://www.unicode.org/versions/Unicode5.1.0/.
A major feature of Unicode 5.1.0 is the enabling of ideographic variation sequences. These sequences allow standardized representation of glyphic variants needed for Japanese, Chinese, and Korean text. The first registered collection, from Adobe Systems, is now available at http://www.unicode.org/ivd/.
Unicode 5.1 contains significant changes to properties and behavioral specifications. Several important property definitions were extended, improving linebreaking for Polish and Portuguese hyphenation. The Unicode Text Segmentation Algorithms, covering sentences, words, and characters, were greatly enhanced to improve the processing of Tamil and other Indic languages. The Unicode Normalization Algorithm now defines stabilized strings and provides guidelines for buffering. Standardized named sequences are added for Lithuanian, and provisional named sequences for Tamil.
Unicode 5.1.0 adds 1,624 newly encoded characters. These additions include characters required for Malayalam and Myanmar and important individual characters such as Latin capital sharp s for German. Version 5.1 extends support for languages in Africa, India, Indonesia, Myanmar, and Vietnam, with the addition of the Cham, Lepcha, Ol Chiki, Rejang, Saurashtra, Sundanese, and Vai scripts. Scholarly support includes important editorial punctuation marks, as well as the Carian, Lycian, and Lydian scripts, and the Phaistos disc symbols. Other new symbol sets include dominoes, Mahjong, dictionary punctuation marks, and math additions. This latest version of the Unicode Standard has exactly the same character assignments as ISO/IEC 10646:2003 plus Amendments 1 through 4.
The Unicode Collation Algorithm (UCA), the core standard for sorting all text, is also being updated at the same time (see http://www.unicode.org/reports/tr10/). The major changes in UCA include coverage of all Unicode 5.1 characters, tightened conformance for canonical equivalence, clearer definitions of internationalized search and matching, specifications of parameters for customizing collation, and definitions of collation folding. There are also important clarifications on the use of contractions (such as "ch" in Slovak) in collation.
The next version of the Unicode locale project (CLDR) is also being prepared on the basis of Unicode 5.1, and is now open for public data submission (see http://www.unicode.org/cldr/).
The W3C Web Security Context Working Group has posted an updated public working draft of Web Security Context: Experience, Indicators, and Trust.
This specification deals with the trust decisions that users must make online, and with ways to support them in making safe and informed decisions where possible.
In order to achieve that goal, this specification includes recommendations on the presentation of identity information by Web user agents; on handling errors in security protocols in a way that minimizes the trust decisions left to users, and (we hope) induces them toward safe behavior where they have to make these decisions; and on data entry interactions that (we hope, again) will make it easier for users to enter sensitive data into legitimate sites than to enter them into illegitimate sites.
Where this document specifies user interactions with a goal toward making security usable, no claim is made at this time that this goal is met: As noted in the Status of this Document section, this is an initial draft to trigger discussion and commentary; assume that what is proposed here is untested.
To complement the interaction and decision related parts of this specification, 7 Robustness addresses the question of how the communication of context information needed to make decisions can be made more robust against attacks.
Finally, 8 Authoring and deployment best practices is about practices for those who deploy Web Sites. It complements some of the interaction related techniques recommended in this specification. The aim of this section is to provide guidelines for creating Web sites with reduced attack surfaces against certain threats, and with usefully provided security context information.
This specification comes with two companion documents: [WSC-USECASES] documents the use cases and assumptions that underlie this specification. [WSC-THREATS] documents the Working Group's threat analysis.
The W3C XML Core Working Group has a new last call working draft of the XML Linking Language (XLink) Version 1.1. There are three major changes in XLink 1.1 compared to 1.0:
The xlink:type="simple" attribute is no longer required. That is, a simple link can now be written like this:
<composer xlink:href="http://www.beand.com/">Beth Anderson</composer>
It's no longer necessary to write this:
<composer xlink:type="simple" xlink:href="http://www.beand.com/">Beth Anderson</composer>
This is a good thing. I'm not sure who first came up with this idea, but I've been advocating it for a while now. This makes XLink a lot more palatable in applications like SVG.
It's not immediately clear what changes necessitated going back from the previous candidate recommendation to a last call status again.
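For anyone writing link-aware code, the relaxed rule is easy to apply. Here is a minimal Python sketch, assuming the rule works as described above: an element that has xlink:href but no xlink:type is treated as a simple link. The function name is_simple_link and the works wrapper element are invented for the example.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"


def is_simple_link(element):
    """Return True if the element counts as a simple link under the 1.1 rule."""
    link_type = element.get("{%s}type" % XLINK_NS)
    href = element.get("{%s}href" % XLINK_NS)
    if link_type is not None:
        # An explicit xlink:type always wins.
        return link_type == "simple"
    # No xlink:type: the presence of xlink:href alone makes it a simple link.
    return href is not None


doc = ET.fromstring(
    '<works xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<composer xlink:href="http://www.beand.com/">Beth Anderson</composer>'
    '<composer xlink:type="simple" xlink:href="http://www.beand.com/">Beth Anderson</composer>'
    '<composer>Beth Anderson</composer>'
    '</works>'
)
results = [is_simple_link(child) for child in doc]
print(results)  # prints [True, True, False]
```

Both spellings of the link are recognized, so documents written for XLink 1.0 keep working.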
The Mozilla Project has posted the fifth beta of Firefox 3.0 for Mac, Linux, and Windows. "Firefox 3 is based on the Gecko 1.9 Web rendering platform, which has been under development for the past 32 months. Building on the previous release, Gecko 1.9 has more than 12,000 updates including some major re-architecting to provide improved performance, stability, rendering correctness, and code simplification and sustainability. Firefox 3 has been built on top of this new platform resulting in a more secure, easier to use, more personal product with a lot more under the hood to offer website and Firefox add-on developers. [Improved in Beta 5!] Firefox 3 Beta 5 includes more than 750 changes from the previous beta, improving stability and web compatibility, providing platform and user interface enhancements, and resulting in the fastest Firefox ever. Many of these improvements were based on community feedback from the previous beta."
The W3C XML Security Specifications Maintenance Working Group
has posted the Proposed Edited Recommendation of
XML Signature Syntax and Processing (Second Edition).
"This Proposed Second Edition of XML Signature Syntax and
Processing adds Canonical XML 1.1 as a required
canonicalization algorithm and recommends its use for inclusive
canonicalization. This version of Canonical XML enables use of
xml:id and xml:base Recommendations
with XML Signature and also enables other possible future
attributes in the XML namespace. Additional minor changes,
including the incorporation of known errata, are documented in
Changes in XML Signature Syntax and Processing
(Second Edition)." I have to read through the detailed changes, but at first glance this looks like a reasonable adjustment that doesn't break any existing code.
The W3C XSL Working Group has published the requirements for the XSL Formatting Objects 2.0. "A number of XSL 1.0 implementations already support dynamic inclusion of vector graphics using W3C SVG. The XSL and SVG WGs want to define a tighter interface between XSL-FO and SVG to provide enhanced functionality. Experiments with the use of SVG paths to create non-rectangular text regions, or 'run-arounds', have helped to motivate further work on deeper integration of SVG graphics inside XSL-FO documents, and to work with the SVG WG on specifying the meaning of XSL-FO markup inside SVG graphics. A similar level of integration with MathML is contemplated."
Cambridge University's Toby O. H. White has released FoX, an open source, validating XML parser written in Fortran 95. It includes both SAX-like push and DOM interfaces. FoX is published under a BSD license.
The OpenOffice Project has released OpenOffice 2.4, an open source office suite for Linux, Solaris, and Windows that saves all its files as zipped XML. New features in 2.4 include:
- Connect to WebDAV servers via HTTPS
- Custom icons for toolbars are imported
- Control password-storing with a master password
- Warning if document is from a newer ODF
- PDF documents: relative links, document references, PDF/A-1 (ISO 19005-1) supported, and cross-document link behavior options
- Mac OS X: Quicktime support for movies and sound / use the built in spell checker
- Print dialog improvements in usability
- Edit boxes: warning at limit of characters
- DejaVu font is now default instead of BitStream Vera
Localisation
- Entries for 10 languages added
Base / DBA
- Improved rendering of numeric(n) data from JDBC and Oracle
- Easier choice of table name in "Copy table"
- Editing of views in HSQLDB
- Query designer for all properties which allow SQL command
- Query designer in SQL view
- Relation design accessible for MySQL databases
- Setting to check for required fields on forms
- Support for Access 2007 (.accdb files)
Calc
- Convert text to columns: with this feature CSV data inside cells can be transformed into columns directly
- Columns and rows in spreadsheet can be moved with drag and drop
- Enter key returns to the column where the input started, one row below
- Formula input: "+" and "-" can also be used to start
- Individual zoom level per sheet
- AutoFilter: choices clearer grouped and based on result of filtering in other columns
- DataPilot: Manual Sorting / Double-click in DataPilot cell provides calculation data of that cell
- Performance improvement with functions VLOOKUP and MATCH
- Print dialog for Calc easier to use
- PageUp and PageDown keys work in print preview
- Sheet names in cell-hyperlinks: renamed properly
Chart
- Regression curves: show equations and R² value
- Reverse axes possible
- Bars on different axes displayed next to each other
- Data labels: Number format
- Data point label: display both value and percentage
- Data label: display each part in a separate line
- Data labels: more flexible placement of labels
- Labels on pie segments: avoiding overlapping
- Data point label: can be removed with delete key
Draw
- Navigation (tab) order of page objects
- PDF export: page names as bookmark
- Reduce complexity: no longer necessary display options removed
Impress
- Navigation (tab) order of page objects
- Thrilling 3D effects in slide transitions
- Export slide names as PDF bookmarks
- Easier to insert background picture
Writer
- Selecting rectangular region of text
- Find and Replace: backward references in regular expressions
- Spell checking: easier selecting of the language
- Insert & Insert Object toolbar redesign - Writer
- Printing of hidden text can be turned on
- Printing text place holders can be turned off
- Shortcuts added for paragraph style Heading 4, Heading 5 and Textbody
- Ctrl-click behaviour for hyperlinks can be changed
- Custom document properties: Text fields and UI support
Extensions/ programmability / API
- Extensible Help System for extensions
- Extensions can have a separate display name
- Extensions: support of web based update
- Extensions: additional information about the publisher and release notes
- Extensions: check for updates
- Dialogs can have a wallpaper set
- Transparent background for controls
- Remote control presentations via API
- API: get selected table(s) or query(s) in the main Base window
The Mozilla Project has released Firefox 2.0.0.13. This release fixes a number of security issues. All users should upgrade.
A new version of SeaMonkey has also been posted, though Camino doesn't seem to have been updated yet. Camino users may want to switch to Firefox or Safari for the time being.
The W3C Semantic Web Best Practices and Deployment Working Group and HTML Working Groups have published a new working draft of RDFa Primer 1.0.
Current Web pages, written in XHTML, contain inherent structured data: calendar events, contact information, photo captions, song titles, copyright licensing information, etc. When authors and publishers can express this data precisely, and when tools can read it robustly, a new world of user functionality becomes available, letting users transfer structured data between applications and Web sites. An event on a Web page can be directly imported into a desktop calendar. A license on a document can be detected to inform the user of his rights automatically. A photo's creator, camera setting information, resolution, and topic can be published as easily as the original photo itself.
RDFa lets XHTML authors express this structured data using existing XHTML attributes and a handful of new ones. Where data, such as a photo caption, is already present on the page for human readers, the author need not repeat it for automated processes to access it. A Web publisher can easily reuse data fields, e.g. an event's date, defined by other publishers, or create new ones altogether. RDFa gets its expressive power from RDF [RDFPRIMER], though the reader need not understand RDF before reading this document.
For simplicity, instead of using RDF terminology, we use the word "field" to indicate a unit of labeled information, e.g. the "first name" field indicates a person's first name.
RDFa uses Compact URIs, which express a URI using a prefix, e.g. dc:title where dc: stands for http://purl.org/dc/elements/1.1/. In this document, for simplicity's sake, the following prefixes are assumed to be already declared: dc for Dublin Core [DC], foaf for Friend-Of-A-Friend [FOAF], cc for Creative Commons [CC], and xsd for XML Schema Definitions [XSD]:
dc: http://purl.org/dc/elements/1.1/
foaf: http://xmlns.com/foaf/0.1/
cc: http://creativecommons.org/ns#
xsd: http://www.w3.org/2001/XMLSchema#
We use standard XHTML notation for elements and attributes: both are denoted using fixed-width lowercase font, e.g. div, and attributes are differentiated using a preceding '@' character, e.g. @href.
Here's a syntax example from the draft:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns:cal="http://www.w3.org/2002/12/cal/ical#"
xmlns:contact="http://www.w3.org/2001/vcard-rdf/3.0#">
<head>
<title>Jo's Friends and Family Blog</title>
</head>
<body>
...
<p instanceof="cal:Vevent">
I'm holding
<span property="cal:summary">
one last summer Barbecue,
</span>
on
<span property="cal:dtstart" content="20070916T1600-0500">
September 16th at 4pm.
</span>
</p>
...
<p class="contactinfo" about="http://example.org/staff/jo">
<span property="contact:fn">Jo Smith</span>.
<span property="contact:title">Web hacker</span>
at
<a rel="contact:org" href="http://example.org">
Example.org
</a>.
You can contact me
<a rel="contact:email" href="mailto:jo@example.org">
via email
</a>.
</p>
...
</body>
</html>
The thing that jumps out at me is the use of namespace prefixes in attribute values. Haven't we learned by now that this is a bad idea?
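To make the mechanics concrete, here is a rough Python sketch of how a consumer might pull the labeled fields out of the event paragraph in that example. It is not a conforming RDFa processor: it only looks at the property and content attributes, the function name extract_fields is invented, and the prefix table is copied by hand because ElementTree does not hand back the in-scope xmlns declarations after parsing.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the event paragraph from the draft's example.
FRAGMENT = """
<body>
  <p instanceof="cal:Vevent">
    I'm holding
    <span property="cal:summary">
      one last summer Barbecue,
    </span>
    on
    <span property="cal:dtstart" content="20070916T1600-0500">
      September 16th at 4pm.
    </span>
  </p>
</body>
"""

# Assumption: prefix mappings supplied by hand, mirroring the xmlns:cal
# declaration on the html element of the full example.
PREFIXES = {"cal": "http://www.w3.org/2002/12/cal/ical#"}


def extract_fields(root, prefixes):
    """Collect (expanded field name, value) pairs from property attributes."""
    fields = []
    for element in root.iter():
        curie = element.get("property")
        if curie is None:
            continue
        prefix, _, local = curie.partition(":")
        name = prefixes[prefix] + local
        # Prefer the machine-readable content attribute; otherwise fall
        # back to the element's visible text with whitespace collapsed.
        value = element.get("content")
        if value is None:
            value = " ".join("".join(element.itertext()).split())
        fields.append((name, value))
    return fields


fields = extract_fields(ET.fromstring(FRAGMENT), PREFIXES)
for name, value in fields:
    print(name, "=", value)
```

The prefixes live in attribute values rather than in element or attribute names, so a namespace-aware parser does nothing to resolve them; the application has to do that itself.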
The W3C has published the second working draft of Cool URIs for the Semantic Web:
The Semantic Web is envisioned as a decentralised world-wide information space for sharing machine-readable data with a minimum of integration costs. Its two core challenges are the distributed modelling of the world with a shared data model, and the infrastructure where data and schemas can be published, found and used. Users benefit from getting information "raw and now" [Give] and in portable data formats [DP]. Providers often publish data embedded in a fixed user interface, in HTML. A basic question is thus how to publish information about resources in a way that allows interested users and software applications to find and interpret them.
On the Semantic Web, all information has to be expressed as statements about resources, like the members of the company Example.com are Alice and Bob or Bob's telephone number is "+1 555 262 or this Web page was created by Alice. Resources are identified by Uniform Resource Identifiers (URIs) [RFC3986]. This modelling approach is at the heart of Resource Description Framework (RDF) [RDFPrimer]. A nice introduction is given in the N3 primer [N3Primer].
Using RDF, the statements can be published on the Web site of the company. Others can read the data and publish their own information, linking to existing resources. This forms a distributed model of the world. It allows the user to pick any application to view and work with the same data, for example to see Alice's published address in your address book.
At the same time, Web documents have always been addressed with URIs (in common parlance often referred as Uniform Resource Locators, URLs). This is useful because it means we can easily make RDF statements about Web pages, but also dangerous because we can easily mix up Web pages and the things, or resources, described on the page.
So the question is, what URIs should we use in RDF? As an example, to identify the frontpage of the Web site of Example Inc., we may use http://www.example.com/. But what URI identifies the company as an organisation, not a Web site? Do we have to serve any content—HTML pages, RDF files—at those URIs? In this document we will answer these questions according to relevant specifications. We explain how to use URIs for things that are not Web pages, such as people, products, places, ideas and concepts such as ontology classes. We give detailed examples how the Semantic Web can (and should) be realised as a part of the Web.
Oracle's John Snelson has posted a beta of Faxpp, an
open source
XML pull parser written in C with an API that can return
UTF-8 or UTF-16 strings. Faxpp is published under the Apache License v2.
The W3C has published a proposed edited recommendation of XML Base (Second Edition). Changes since the first edition include:
The published errata (see http://www.w3.org/2001/06/xmlbase-errata) have been incorporated;
The definition of URI reference has been switched from RFC2396 to 3986;
The xml:base attribute has been redescribed as a Legacy Extended IRI, but this does not change its syntax (the December 2006 PER used the term "XML Resource Identifier" which was to be defined in an XLink revision, but that plan has been superseded by the definition of LEIRI in RFC 3987 bis);
Implementations are now encouraged to return base “URIs” without escaping non-URI characters;
The meanings of xml:base="" and xml:base="#frag" have been clarified;
The expected reference to XML Base in the forthcoming XML Media Types RFC (“son of 3023”) has been noted;
It has been clarified that normal validity rules apply to the xml:base attribute;
The out-of-date appendix describing effects on other standards has been removed;
Apple has released Safari 3.1 for Mac and Windows. This release speeds up JavaScript and plugs some security holes. New features include:
- video and audio elements
- img element and CSS now support SVG images (Is inline SVG supported? I'll have to check. Yep, looks like it works but only if Safari recognizes the document as XHTML, not HTML. Firefox behaves similarly.)
The W3C Working Group has published a new working draft of Protocol for Web Description Resources (POWDER): Description Resources.
The Protocol for Web Description Resources (POWDER) facilitates the publication of descriptions of multiple resources such as all those available from a Web site. These descriptions are always attributed to a named individual, organization or entity that may or may not be the creator of the described resources. This contrasts with more usual metadata that typically applies to a single resource, such as a specific document's title, which is usually provided by its author.
This document sets out how Description Resources (DRs) can be created and published, whether individually or as bulk data, how to link to DRs from other online resources, and, crucially, how DRs may be authenticated and trusted. The aim is to provide a platform through which opinions, claims and assertions about online resources can be expressed by people and exchanged by machines. POWDER has evolved from the data model developed for the final report [XGR] of the Web Content Label Incubator Group [WCL-XG] from which we define a Description Resource as: "a resource that contains a description, a definition of the scope of the description and assertions about both the circumstances of its own creation and the entity that created it."
The method of defining the scope of a DR, that is, defining what is being described, is provided in a separate document: Grouping of Resources [GROUP]. Companion documents describe the RDF/OWL vocabulary [VOC] and XML data types [WDRD] that are derived from the Grouping of Resources document and this document, with each term's domain, range and constraints defined. As each term is introduced in this document, it is linked to its description in the vocabulary document.
The W3C XQuery working group has posted the candidate recommendations of XQuery Update Facility, XQuery Update Facility Use Cases, and XQuery Update Facility 1.0 Requirements. XQuery as it currently exists is basically just SELECT in SQL terms. XQuery Update adds INSERT, UPDATE, and DELETE. More specifically it is:
- upd:mergeUpdates
- upd:revalidate
- upd:applyUpdates
- upd:insertBefore
- upd:insertAfter
- upd:insertInto
- upd:insertIntoAsFirst
- upd:insertIntoAsLast
- upd:insertAttributes
- upd:delete
- upd:replaceNode
- upd:replaceValue
- upd:replaceElementContent
- upd:rename
- upd:removeType
- upd:setToUntyped
This is one of the last two pieces before XQuery 1.0 is really complete. (The other is full-text search.)
The Helsinki University of Technology has released X-Smiles 1.2, a proof-of-concept XForms engine written in Java. Version 1.2 improves support for XBL 2 bindings.
The W3C Authoring Tool Accessibility Guidelines Working Group has posted new working drafts of Authoring Tool Accessibility Guidelines 2.0 and Implementation Techniques for Authoring Tool Accessibility Guidelines 2.0. "An authoring tool that conforms to these guidelines will promote accessibility by providing an accessible user interface to authors with disabilities as well as enabling, supporting, and promoting the production of accessible Web content by all authors."
The W3C Web API Working Group has published the last call working draft of ElementTraversal Specification. "This specification defines the ElementTraversal interface, which allows script navigation of the elements of a DOM tree, excluding all other nodes in the DOM, such as text nodes. It also provides a property to expose the number of child elements of an element. It is intended to provide a more convenient alternative to existing DOM navigation interfaces, with a low implementation footprint." Hmm, just what the DOM needs: yet another way to do it.
ElementTraversal provides some extra properties/methods for navigating only through elements, while ignoring text and white space:
- firstElementChild
- lastElementChild
- previousElementSibling
- nextElementSibling
- childElementCount
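To see what these properties save you, here is a small Python sketch that emulates a few of them on top of the classic DOM navigation in xml.dom.minidom. The helper names are my own, and minidom itself is not assumed to implement ElementTraversal; the point is only to show the node-skipping loops the interface is meant to replace.

```python
from xml.dom import Node, minidom


def _is_element(node):
    return node.nodeType == Node.ELEMENT_NODE


def next_element_sibling(node):
    """Emulates nextElementSibling: skip text, comments, and so on."""
    sibling = node.nextSibling
    while sibling is not None and not _is_element(sibling):
        sibling = sibling.nextSibling
    return sibling


def first_element_child(node):
    """Emulates firstElementChild."""
    child = node.firstChild
    if child is None or _is_element(child):
        return child
    return next_element_sibling(child)


def child_element_count(node):
    """Emulates childElementCount."""
    return sum(1 for child in node.childNodes if _is_element(child))


doc = minidom.parseString(
    "<list>\n  <item>a</item>\n  <!-- gap -->\n  <item>b</item>\n</list>"
)
root = doc.documentElement
first = first_element_child(root)
second = next_element_sibling(first)
print(first.firstChild.data, second.firstChild.data)  # prints a b
print(child_element_count(root))  # prints 2
```

The white space and the comment between the two item elements are ordinary child nodes in the DOM, which is why plain firstChild and nextSibling are so awkward for element-only navigation.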