VoiceXML

Prompt

The prompt element controls the output of synthesized speech and prerecorded audio. Conceptually, prompts are queued for playing instantaneously, so interpretation proceeds until the user needs to provide input. At that point, the queued prompts are played and the system waits for user input. Once the input is received from the speech recognition subsystem (or the DTMF recognizer), interpretation proceeds.
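As a minimal sketch of this queueing behavior, consider the form below. The form id, field name, wording, and grammar are hypothetical and serve only as illustration. The block's prompt is queued and interpretation moves on; playback starts only when the field needs input.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="order">
    <block>
      <!-- Queued first; interpretation continues without waiting. -->
      <prompt>Welcome to the pizza line.</prompt>
    </block>
    <field name="size">
      <grammar>small | medium | large</grammar>
      <!-- Queued second. The field now needs input, so the queued
           prompts are played and the system waits for the user. -->
      <prompt>What size would you like?</prompt>
    </field>
  </form>
</vxml>
```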

Prompts have the following attributes:

bargein	Controls whether a user can interrupt a prompt. Default is true.
cond	An expression that determines whether the prompt should be spoken. Default is true.
count	A number that lets you emit different prompts when the user repeats the same step. If omitted, it defaults to “1”.
timeout	The timeout used for the user input that follows. The default noinput timeout is platform specific.

Basic Prompts

You’ve seen prompts in the previous examples:

<prompt>Please say your city.</prompt>

You can leave out the <prompt> … </prompt> if:

There is no need to specify a prompt attribute (like bargein), and
The prompt consists entirely of PCDATA (contains no speech markups) or consists of just an <audio> element.

For instance, these are also prompts:


Please say your city. 
<audio src="say_your_city.wav"/>

But the <prompt> … </prompt> cannot be removed from this prompt due to the embedded speech markups:


<prompt>Please <emp>say</emp> your city.</prompt>

Speech Markup

Prompts can have markup to indicate emphasis, breaks, and prosody:


<prompt> This is
<emp>also</emp> computer-generated text. 
   <break size="medium"/> Do you like it? </prompt>

VoiceXML supports the following speech markup elements:

<break>

Specifies a pause in the speech output. Attributes of <break> are:

msecs	The number of milliseconds to pause.
size	A relative pause duration. Possible values are: none, small, medium or large.

At most one of msecs and size may be specified. If neither is specified, size="medium" is assumed.
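The three ways of writing a pause can be sketched as follows. The prompt wording and the 500 millisecond value are illustrative assumptions.

```xml
<prompt>
  Your order has been placed.
  <!-- Absolute pause of 500 milliseconds. -->
  <break msecs="500"/>
  Your confirmation number follows.
  <!-- Relative pause; use size or msecs, never both. -->
  <break size="large"/>
  Thank you for calling.
  <!-- Neither attribute given: treated as size="medium". -->
  <break/>
  Goodbye.
</prompt>
```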

<div>

Identifies the enclosed text as a particular type. Attributes of <div> are:

type	Possible values are sentence or paragraph.

<emp>

Specifies that the enclosed text should be spoken with emphasis. Attributes of <emp> are:

level	Specifies the level of emphasis. Possible values are: strong, moderate (default), none or reduced.
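A short sketch combining <div> and <emp>, using only the attribute values listed above. The sentences themselves are made up for illustration.

```xml
<prompt>
  <!-- Each div marks its text as one sentence. -->
  <div type="sentence">
    Your flight has been <emp level="strong">cancelled</emp>.
  </div>
  <div type="sentence">
    <!-- No level attribute: moderate emphasis is the default. -->
    Please <emp>stay</emp> on the line for rebooking.
  </div>
</prompt>
```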

<pros>

Specifies prosodic information for the enclosed text. For details about the format of attribute values, see the Java Speech API Markup Language (JSML) specification (v0.5 - August 28, 1997).

Attributes of <pros> are:

rate	Specifies the speaking rate.
vol	Specifies the output volume.
pitch	Specifies the pitch.
range	Specifies the pitch range.

<sayas>

Specifies how a word or phrase is spoken. Attributes of <sayas> are:

phon	The representation of the Unicode International Phonetic Alphabet (IPA) characters that are to be spoken instead of the contained text.
sub	Defines substitute text to be spoken instead of the contained text.
class	Possible values are phone, date, digits, literal, currency, number and time.
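The sub and class attributes might be used as in the sketch below. The substituted text and the access code are hypothetical, and how a given class is actually rendered depends on the platform's text-to-speech engine.

```xml
<prompt>
  <!-- sub: the attribute text is spoken instead of "W3C". -->
  This note was submitted to the
  <sayas sub="World Wide Web Consortium">W3C</sayas>.
  <!-- class="digits": asks for the content to be read as digits. -->
  Your access code is <sayas class="digits">4071</sayas>.
</prompt>
```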

Sometimes text needs to be rendered using a particular style. For example, a telephone number adhering to the North American Dialing Plan needs a break after the first three digits, and another break after the second three digits. To effect this, use the class attribute:


<prompt>
   You are calling <value expr="home_num" class="phone"/>
</prompt> 
<prompt>You are calling 
   <sayas class="phone">312-555-1212</sayas> 
</prompt>

While the interpreter must tolerate the full set of speech markup, if its implementation platform uses a text-to-speech engine that doesn’t have this level of speech markup functionality, the platform will have to map the VoiceXML markups as best it can. Specifically, all platforms must allow all speech markup elements, and if an element with contained text is not supported, the contained text must still be spoken.

Audio Prompting

Prompts can have audio clips intermingled with synthesized speech:


<prompt> 
   Welcome to the Bird Seed Emporium. 
   <audio src="http://www.birdsounds.example/thrush.wav"/> 
   We have 250 kilogram drums of thistle seed for 
   <sayas class="currency">$299.95</sayas> 
   plus shipping and handling this month. 
   <audio src="http://www.birdsounds.example/mourningdove.wav"/> 
</prompt>

Audio can be played in any prompt. Typically it is specified via a URI, but it can also come from a previously recorded audio variable:


<prompt> 
   Your recorded greeting is 
   <value expr="greeting"/> 
   To rerecord, press 1. 
   To keep it, press pound. 
   To return to the main menu press star M. 
   To exit press star, star X. 
</prompt>

The audio tag can have alternate text (with markups) in case the audio sample is not available:

<prompt> 
   <audio src="welcome.wav"><emp>Welcome</emp>
     to Voice Portal.
   </audio> 
</prompt>

If the audio file cannot be played (e.g. unsupported format, invalid URI, etc.), the content of the audio element is played instead. The content may include text, speech markup, or another audio element. If the audio file cannot be played and the content of the audio element is empty, an appropriate error event will be thrown.
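Because the content may itself be another audio element, fallbacks can be chained. A sketch, with hypothetical file names:

```xml
<prompt>
  <audio src="welcome_hifi.wav">
    <!-- Tried only if welcome_hifi.wav cannot be played. -->
    <audio src="welcome_lofi.wav">
      <!-- Spoken only if neither file can be played. -->
      <emp>Welcome</emp> to Voice Portal.
    </audio>
  </audio>
</prompt>
```

Since the innermost element ends in synthesized text, this chain always has something to play.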

Attributes of <audio> include:

src	The URI of the audio prompt. See Appendix E for suggested audio file formats.
caching	Either safe to force a query to fetch the most recent copy of the content, or fast to use the cached copy of the content if it has not expired. If not specified, a value derived from the innermost caching property is used.
fetchtimeout	The interval to wait for the content to be returned before throwing an error.badfetch event. If not specified, a value derived from the innermost fetchtimeout property is used.
fetchhint	Defines when the interpreter context should retrieve content from the server. prefetch indicates a file may be downloaded when the page is loaded, whereas safe indicates a file that should only be downloaded when actually needed. In the case of a very large file (implying long download times) or a streaming audio source, stream tells the interpreter context to begin processing the content as it arrives rather than waiting for full retrieval of the content. If not specified, a value derived from the innermost relevant *fetchhint property is used.
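A sketch of these attributes in use. The URIs and the five-second interval are assumptions chosen for illustration; the time value follows the same "s" suffix style used for timeout elsewhere on this page.

```xml
<prompt>
  <!-- Frequently updated clip: always query for the latest copy,
       and wait at most 5 seconds for it to arrive. -->
  <audio src="http://www.example.com/news/headlines.wav"
         caching="safe" fetchtimeout="5s"/>
  <!-- Long clip: a cached copy is acceptable, and playback may
       begin as the data arrives. -->
  <audio src="http://www.example.com/news/full_report.wav"
         caching="fast" fetchhint="stream"/>
</prompt>
```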

The <value> Element

Prompts can contain embedded variable references using the <value> element:

<prompt>
  You are calling <value expr="home_num"/>
</prompt>

Attributes of <value> are:

expr	The expression to render.
class	The <sayas> class of the variable, e.g. phone, date, currency. The valid formats are the same as those supported in the <sayas> speech markup.
mode	The type of rendering: tts (the default), or recorded.
recsrc	The URI of the audio files to be concatenated when mode is recorded.
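A sketch of <value> with the class attribute. The form id, the variable name, and its value are hypothetical, and the exact string format a platform accepts for the currency class is an assumption modeled on the <sayas> currency example above.

```xml
<form id="balance_report">
  <!-- Hypothetical variable holding a currency string. -->
  <var name="balance" expr="'$299.95'"/>
  <block>
    <prompt>
      <!-- Rendered by text-to-speech (the default mode) as currency. -->
      Your balance is <value expr="balance" class="currency"/>.
    </prompt>
  </block>
</form>
```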

Barge-in

If an implementation platform supports barge-in, the service author can specify whether a user can interrupt, or “barge-in” on, a prompt. This speeds up conversations, but is not always desired. If the user must hear all of a warning, legal notice, or advertisement, barge-in should be disabled. This is done with the bargein attribute:


<prompt bargein="false">
  <audio src="legalese.wav"/>
</prompt>

Users can interrupt a prompt whose bargein attribute is true, but must wait for completion of a prompt whose bargein attribute is false. When several prompts are queued, the bargein attribute of each prompt is honored while that prompt is playing. If barge-in occurs during any prompt in a sequence, none of the subsequent prompts are played. If bargein is not specified, the value of the bargein property is used.
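The interaction between the bargein attribute and the bargein property can be sketched as follows. The form id, field name, and prompt text are hypothetical.

```xml
<form id="disclosure">
  <!-- Applies to prompts in this form that omit bargein. -->
  <property name="bargein" value="true"/>
  <block>
    <!-- The attribute overrides the property: must be heard in full. -->
    <prompt bargein="false">
      <audio src="legalese.wav"/>
    </prompt>
  </block>
  <field name="account" type="digits">
    <!-- No bargein attribute, so the property value is used and
         the caller may interrupt this prompt. -->
    <prompt>Please say or key in your account number.</prompt>
  </field>
</form>
```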

Prompt Selection

Tapered prompts are those that may change with each attempt. Information-requesting prompts may become more terse, on the assumption that the user is becoming more familiar with the task. Help messages may become more detailed, on the assumption that the user needs more help. Or prompts can change just to make the interaction more interesting.

Each form item and each menu has an internal prompt counter that is reset to one each time the form or menu is entered. Whenever the system uses a prompt, its associated prompt counter is incremented. This is the mechanism supporting tapered prompts.

For instance, here is a form with a form-level prompt and field-level prompts:


<form id="tapered"> 
  <block> 
    <prompt bargein="false">
      Welcome to the ice cream survey.
    </prompt> 
  </block> 
  <field name="flavor"> 
    <grammar>vanilla|chocolate|strawberry</grammar> 
    <prompt count="1">What is your favorite flavor?</prompt> 
    <prompt count="3">Say chocolate, vanilla, or strawberry.</prompt> 
    <help>Sorry, no help is available.</help> 
  </field> 
</form>

A conversation using this form follows:

C: Welcome to the ice cream survey.

C: What is your favorite flavor? (the “flavor” field’s prompt counter is 1)

H: Pecan praline.

C: I do not understand.

C: What is your favorite flavor? (the prompt counter is now 2)

H: Pecan praline.

C: I do not understand.

C: Say chocolate, vanilla, or strawberry. (prompt counter is 3)

H: What if I hate those?

C: I do not understand.

C: Say chocolate, vanilla, or strawberry. (prompt counter is 4)

H: …

When it is time to select a prompt, the prompt counter is examined. The child prompt with the highest count attribute less than or equal to the prompt counter is used. If a prompt has no count attribute, a count of “1” is assumed.

A conditional prompt is one that is spoken only if its condition is satisfied. In this example, the prompt is chosen at random on each visit to the enclosing form.


<form id="another_joke"> 
   <var name="r" expr="Math.random()"/> 
   <field name="another" type="boolean"> 
       <prompt cond="r &lt; .50"> 
          Would you like to hear another elephant joke? 
       </prompt> 
       <prompt cond="r &gt;= .50"> 
         For another joke say yes.  To exit say no. 
       </prompt> 
       <filled> 
          <if cond="another"> 
            <goto next="#pick_joke"/> 
          </if> 
       </filled> 
   </field> 
</form>

When a prompt must be chosen, a set of prompts to be queued is chosen according to the following algorithm:

Form an ordered list of prompts consisting of all prompts in the enclosing element in document order.
Remove from this list all prompts whose cond evaluates to false.
Find the “correct count”: the highest count among the prompt elements still on the list less than or equal to the current count value.
Remove from the list all the elements that don't have the “correct count”.

All elements that remain on the list will be queued for play.
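The algorithm can be traced on a small sketch that mixes cond and count. The form id, the expert variable, and the prompt wording are hypothetical.

```xml
<form id="flavor_survey">
  <!-- Hypothetical flag; false means a first-time caller. -->
  <var name="expert" expr="false"/>
  <field name="flavor">
    <grammar>vanilla | chocolate | strawberry</grammar>
    <prompt count="1">What is your favorite flavor?</prompt>
    <prompt count="2" cond="expert">Flavor?</prompt>
    <prompt count="2" cond="!expert">Which flavor do you like best?</prompt>
    <prompt count="2">You can say vanilla, chocolate, or strawberry.</prompt>
  </field>
</form>
```

With expert set to false, the "Flavor?" prompt is removed in the second step on every attempt. When the prompt counter is 1, the correct count is 1, so only "What is your favorite flavor?" is queued. When the counter is 2 or higher, the correct count is 2, so "Which flavor do you like best?" and "You can say vanilla, chocolate, or strawberry." are both queued, in document order.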

Timeout

The timeout attribute specifies the interval of silence allowed while waiting for user input after the end of the last prompt. If this interval is exceeded, the platform will throw a noinput event. This attribute defaults to the value specified by the timeout property (see Section 17).

The reason for allowing timeouts to be specified as prompt attributes is to support tapered timeouts. For example, the user may be given five seconds for the first input attempt, and ten seconds on the next.

The prompt timeout attribute determines the noinput timeout for the following input:


<prompt count="1">
  Pick a color for your new Model T.
</prompt> 

<prompt count="2" timeout="120s"> 
   Please choose the color of your new nineteen twenty four
   Ford Model T. Possible colors are black, black, or
   black.  Please take your time. 
</prompt>

If several prompts are queued before a field input, the timeout of the last prompt is used.
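A sketch of that rule, with a hypothetical form id, grammar, and timeout values:

```xml
<form id="car_color">
  <block>
    <!-- Not the last prompt queued, so this timeout is not used. -->
    <prompt timeout="3s">Welcome to the Model T order line.</prompt>
  </block>
  <field name="color">
    <grammar>black</grammar>
    <!-- Last prompt queued before the input: the caller gets
         10 seconds of silence before a noinput event is thrown. -->
    <prompt timeout="10s">Pick a color for your new Model T.</prompt>
  </field>
</form>
```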
