The W3C Multimodal Interaction Activity has posted the first public working draft of Multimodal Architecture and Interfaces. Quoting from the draft,
This document describes the architecture of the Multimodal Interaction (MMI) framework and the interfaces between its constituents. The MMI Working Group is aware that multimodal interfaces are an area of active research and that commercial implementations are only beginning to emerge. Therefore we do not view our goal as standardizing a hypothetical existing common practice, but rather providing a platform to facilitate innovation and technical development. Therefore the aim of this design is to provide a general and flexible framework providing interoperability among modality-specific components from different vendors - for example, speech recognition from one vendor and handwriting recognition from another. This framework places very few restrictions on the individual components or on their interactions with each other, but instead focuses on providing a general means for allowing them to communicate with each other, plus basic infrastructure for application control and platform services.
Our framework is motivated by several basic design goals:
- Encapsulation. The achitecture should make no assumptions about the internal implementation of components, which will be treated as black boxes.
- Distribution. The architecture should support both distributed and co-hosted implementations.
- Extensibility. The architecture should facilitate the integration of new modality components. For example, given an existing implementation with voice and graphics components, it should be possible to add a new component (for example, a biomentric security component) without modifying the existing components.
- Recursiveness. The architecture should allow for nesting, so that an instance of the framework consisting of several components can be packaged up to appear as a single component to a higher-level instance of the architecture.
- Modularity. The architecture should provide for the separation of data, control, and presentation.