VoiceXML

 

Research Topics

There is considerable potential for research in this field. Some of the more interesting research topics are:

Confidence Measures for Automatic Speech Recognizers

In recent years, spoken dialogue with computers has become a reality. Recent advances in speech recognition technology have enabled speech recognition systems to migrate from research laboratories to many commercial services and products. The goal of this work is to study and evaluate methods that attach to one or more recognized words a measure of how reliable the recognition is. Such measures can help avoid unnecessary verification turns in automatic recognition systems and make the man-machine dialogue more natural and less tedious for the customer. Two techniques will be studied and compared:

· Statistical Hypothesis Test - The confidence is the ratio between the likelihood of the observation sequence given the model of the recognized word and the likelihood of the same observation sequence given a complementary model (anti-model) of the same word.

· A Posteriori Word Probability - The confidence is the posterior probability of the word given the observation sequence, computed as the sum of the a posteriori probabilities of all paths that pass through that word. To achieve this, we first find a complete alignment of all words in the recognition lattice (the graph of the most probable paths), merging word hypotheses on the basis of temporal overlap and phonetic similarity.

The second method uses only data supplied directly by the recognizer, without further processing. This makes it suitable for real-time applications.
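The two confidence measures can be illustrated with a minimal Python sketch. It is not the method of any particular recognizer: the function names, the per-frame normalisation and logistic squashing of the likelihood ratio, and the representation of the lattice as a short list of already aligned paths with log scores are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def llr_confidence(loglik_model, loglik_anti, n_frames):
    """Hypothesis-test confidence from two log-likelihoods.

    loglik_model: log-likelihood of the observations given the word model.
    loglik_anti:  log-likelihood of the same observations given the anti-model.
    The per-frame normalisation and the logistic mapping to (0, 1) are
    illustrative choices, not taken from the text.
    """
    llr = (loglik_model - loglik_anti) / n_frames
    llr = max(-50.0, min(50.0, llr))  # keep math.exp within range
    return 1.0 / (1.0 + math.exp(-llr))


def word_posteriors(paths):
    """Posterior of each (slot, word) pair over a set of aligned paths.

    paths: list of (word_tuple, log_score). All paths are assumed to be
    aligned already, so position i in every tuple refers to the same
    time slot. The posterior of a word is the sum of the normalised
    scores of every path that passes through it.
    """
    max_log = max(score for _, score in paths)
    weights = [math.exp(score - max_log) for _, score in paths]
    total = sum(weights)
    posteriors = defaultdict(float)
    for (words, _), weight in zip(paths, weights):
        for slot, word in enumerate(words):
            posteriors[(slot, word)] += weight / total
    return dict(posteriors)


# Hypothetical three-path lattice for a two-word utterance.
hypotheses = [
    (("call", "home"), math.log(0.6)),
    (("call", "rome"), math.log(0.3)),
    (("tall", "home"), math.log(0.1)),
]
confidence_by_word = word_posteriors(hypotheses)
```

A dialogue manager could then skip the verification turn for any word whose confidence exceeds a threshold tuned on held-out data.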

Development of a multimodal service distribution platform

Access to the World Wide Web is mainly achieved through personal computers, such as desktops and notebooks. A PC enables the user to navigate the Internet through visual browsers, which present output on a monitor and receive input commands from a keyboard or mouse. It is now also possible to search for information on the Internet through wireless telephone devices that support Wireless Application Protocol (WAP) browsers and are equipped with a small display. On such devices output is presented to the user on the small display, and input commands are given by pressing keys on the telephone keypad. There are situations in which it would be easier to control an Internet browser through voice commands and receive an audio description of the content of the browsed page, while keeping the standard I/O modalities enabled. This approach to Internet browsing seems useful in the following situations:

· Users with physical disabilities might find it much easier to control a browser with voice commands and be guided by an audio assistant.

· Using the small keypad of a cellular phone can be very frustrating. It seems easier to give voice commands to a WAP browser than to push long sequences of buttons on a keypad. Users who cannot freely use their hands, such as drivers, would also benefit from keeping their hands on their primary task and navigating the browser by voice.

· Even on a personal computer equipped with headphones and a microphone, a voice browser would offer some useful features. Think of an e-commerce site that provides an audio assistant to the customer, or of frequent operations such as checking for new mail, which could be replaced by a single voice command.

We intend to investigate how to create a user interface, called the Multimodal Voice Browser, and a service distribution platform that enable the user to access the Internet through standard I/O modalities (display, keyboard, mouse) and voice (microphone, speakers) simultaneously, on devices such as PCs or WAP telephones.

Realization of an interface between VoiceXML and automatic dialog development systems

The forecast of a gradual transition of the web from traditional services to voice-based services has recently led some major companies (such as Nuance, IBM and Microsoft) to make available almost complete recognition systems, tools for grammar development and document builders in several languages. The main difficulty is to find an automatic tool for dialogue development that simplifies the developer's job. ABLA's software belongs to this category because of its ease and speed of use. Moreover, the general tendency, and not only in the voice field, is towards integration with the XML world. ABLA also intends to extend the capabilities of InstantSpeech (its speech recognition module) by allowing dialogues to be generated directly in the VoiceXML language. To achieve this, an in-depth analysis of the characteristics of VoiceXML is necessary in order to map the structure of an InstantSpeech dialogue onto a standard structure that maintains the efficiency and speed of the InstantSpeech environment, especially in the case of mixed-initiative dialogues. The goal is to develop an extension of the Dialogue Builder with an interface towards the VoiceXML environment, providing an alternative architecture that is fully compatible with every vocal I/O platform (i.e., Automatic Speech Recognition and Text-To-Speech systems).

Future developments in this field include the following. Combining the best aspects of the AT&T and Lucent phone markup languages and Motorola's VoxML technology, together with the VoiceXML Forum's large collection of supporters and contributors, is expected to yield an open, broadly applicable voice markup language standard for all to use.

The end result will have the telephony features needed to build sophisticated interactive voice services for business applications, such as call centers, as well as all of the functions needed to provide speech-driven interfaces to all manner of end users. VoiceXML will help deliver voice services to everyone from the high-mobility worker on a cellular phone calling the company intranet to get information on a sales prospect, to mom calling to get a weather report before sending the kids out for the day.

VoiceXML will include conventional telephony input, output and call control features, including: touch-tone input, automatic speech recognition support, audio recording (e.g., for voice mail), the ability to play recordings (such as WAV files), speech synthesis from plain or annotated text, call transfer, conferencing, and other advanced call management features. As an XML-based definition with an HTML-like appearance, VoiceXML will be easy to learn for experienced Web content programmers and amenable to easy processing by tools to support desktop development of VoiceXML Web applications.
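As an illustration of how easily such documents can be processed by tools, the following Python sketch generates a small dialogue with the standard library's xml.etree.ElementTree module. The element and attribute names (vxml, form, block, audio, field, prompt, grammar, record, transfer) are taken from the VoiceXML 2.0 specification; the function name, prompt wording, grammar file, audio file and telephone number are hypothetical.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def build_dialog(grammar_src, greeting_wav, operator_uri):
    """Return a small VoiceXML document as a string (illustrative only)."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "weather"})

    # Play a recording, with synthesized text as a fallback.
    block = ET.SubElement(form, "block")
    audio = ET.SubElement(block, "audio", {"src": greeting_wav})
    audio.text = "Welcome to the weather service."

    # Speech recognition input constrained by an external grammar.
    city = ET.SubElement(form, "field", {"name": "city"})
    ET.SubElement(city, "prompt").text = "Which city would you like?"
    ET.SubElement(city, "grammar",
                  {"src": grammar_src, "type": "application/srgs+xml"})

    # The built-in "digits" type accepts spoken or touch-tone digits.
    pin = ET.SubElement(form, "field", {"name": "pin", "type": "digits"})
    ET.SubElement(pin, "prompt").text = "Please enter your PIN."

    # Audio recording, for example for voice mail.
    message = ET.SubElement(form, "record",
                            {"name": "message", "beep": "true",
                             "maxtime": "30s"})
    ET.SubElement(message, "prompt").text = "Leave a message after the beep."

    # Call transfer to a human operator.
    ET.SubElement(form, "transfer",
                  {"name": "operator", "dest": operator_uri,
                   "bridge": "true"})

    return ET.tostring(vxml, encoding="unicode")


# Hypothetical resource names and telephone number.
document = build_dialog("cities.grxml", "welcome.wav", "tel:+15550100")
```

The resulting string can be served to a voice browser like any other web resource, and the same tree could equally be built from the output of a dialogue design tool.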