Recognition Ideas/Algorithms

 

Possible Stages

Separation of utterance into phonemes

For speaker-independent computer systems, phonemes are extracted from the audio provided.

Recognition of phonemes

Each phoneme is recognized and converted into ASCII characters.

Grouping of Phonemes into potential words

The phonemes recognized above are assembled into potential (hypothesized) words.
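
As a minimal sketch of this grouping stage, the following assumes the recognized phonemes arrive as ARPAbet-style ASCII symbols and that a small pronunciation lexicon is available. The lexicon entries and the function name segmentations are hypothetical and chosen only for illustration; a real system would use a far larger dictionary and would also have to tolerate misrecognized phonemes.

```python
# Hypothetical pronunciation lexicon: word -> phoneme sequence.
# The entries are illustrative only, not taken from a real dictionary.
LEXICON = {
    "a": ("AH",),
    "an": ("AH", "N"),
    "ice": ("AY", "S"),
    "nice": ("N", "AY", "S"),
}


def segmentations(phonemes):
    """Return every way to split the phoneme sequence into lexicon words."""
    phonemes = tuple(phonemes)
    if not phonemes:
        return [[]]  # one valid way to segment nothing: no words
    results = []
    for word, pron in sorted(LEXICON.items()):
        # The word is a candidate if its pronunciation is a prefix of the input.
        if phonemes[:len(pron)] == pron:
            for rest in segmentations(phonemes[len(pron):]):
                results.append([word] + rest)
    return results


if __name__ == "__main__":
    for hypothesis in segmentations(["AH", "N", "AY", "S"]):
        print(" ".join(hypothesis))
```

The same phoneme string yields both "a nice" and "an ice", which is why later stages are needed to choose among the hypothesized words.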

Word Recognition

Many techniques can be used here. Sometimes a domain is assumed, meaning a constrained grammar (a collection of words and sentences/requests) is defined. This makes the recognition problem easier than unconstrained, natural language recognition.
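
To illustrate how a constrained grammar narrows the problem, the sketch below assumes a hypothetical command domain in which every request is an action word followed by an object word. The word sets and function names are invented for this example.

```python
# Hypothetical command grammar: every request is ACTION followed by OBJECT.
ACTIONS = {"open", "close"}
OBJECTS = {"door", "window"}


def in_grammar(words):
    """True if the word sequence is a request the grammar allows."""
    return len(words) == 2 and words[0] in ACTIONS and words[1] in OBJECTS


def filter_hypotheses(hypotheses):
    """Keep only the hypothesized word sequences that fit the grammar."""
    return [h for h in hypotheses if in_grammar(h)]


if __name__ == "__main__":
    candidates = [["open", "door"], ["oh", "pen", "door"], ["close", "window"]]
    print(filter_hypotheses(candidates))
```

Hypotheses that fall outside the grammar are discarded before any further scoring, so the recognizer has far fewer alternatives to compare.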

Mathematical formulas and models are used to identify the most likely word spoken. These models match spoken words against known word models and select the one that has the greatest likelihood of being the correct word. In order to estimate the "greatest likelihood", large amounts of training data are used to create the models. A widely used statistical model of this type is the Hidden Markov Model (HMM).
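
The idea of choosing the word model with the greatest likelihood can be sketched with the forward algorithm for a discrete HMM. Everything specific here is an assumption made for illustration: the two word models, their hand-set probabilities (a real system would estimate them from training data), and the encoding of the audio as a sequence of symbols 0 and 1.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm.

    obs must be a non-empty sequence of observation symbol indices.
    """
    n = len(start)
    # Initialization: start in state s and emit the first symbol.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    # Induction: move from any previous state p to s, then emit the next symbol.
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
            for s in range(n)
        ]
    return sum(alpha)


# Hypothetical two-state, left-to-right word models with hand-set probabilities.
WORD_MODELS = {
    "yes": dict(start=[1.0, 0.0],
                trans=[[0.5, 0.5], [0.0, 1.0]],
                emit=[[0.9, 0.1], [0.1, 0.9]]),
    "no": dict(start=[1.0, 0.0],
               trans=[[0.5, 0.5], [0.0, 1.0]],
               emit=[[0.1, 0.9], [0.9, 0.1]]),
}


def recognize(obs):
    """Return the word whose model assigns obs the greatest likelihood."""
    return max(WORD_MODELS, key=lambda w: forward_likelihood(obs, **WORD_MODELS[w]))


if __name__ == "__main__":
    print(recognize([0, 0, 1]))
```

Each candidate word has its own model, and recognition reduces to computing one likelihood per model and keeping the largest.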

Other techniques, such as (Dynamic) Time Warping and Neural Networks, have also been used.
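
Time warping is commonly realized as dynamic time warping (DTW), which aligns two sequences that may be spoken at different speeds. The sketch below is a textbook DTW over one-dimensional feature values. The stored templates and the function names are hypothetical, and real systems compare multi-dimensional acoustic feature vectors rather than single numbers.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical stored templates: one feature sequence per known word.
TEMPLATES = {"up": [1.0, 2.0, 3.0], "down": [3.0, 2.0, 1.0]}


def nearest_word(features):
    """Return the template word with the smallest DTW distance to the input."""
    return min(TEMPLATES, key=lambda w: dtw_distance(features, TEMPLATES[w]))


if __name__ == "__main__":
    # A slower rendition of "up": same shape, stretched in time.
    print(nearest_word([1.0, 1.0, 2.0, 3.0, 3.0]))
```

Because the alignment may repeat elements of either sequence, a slowly spoken word can still match its shorter template exactly.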

 

Semantics/Natural Language Processing

Sometimes semantics and language rules from the area of Natural Language Processing are used to improve recognition. This goes beyond recognizing each word in isolation and instead looks at surrounding words, previous phrases, sentences, etc.
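
One simple way to use surrounding words is to combine each candidate's acoustic score with the probability of that word following the previous one (a bigram model). The sketch below is only an illustration of that idea: the probabilities, the floor value for unseen word pairs, and the function name rescore are all assumptions made for this example.

```python
import math

# Hypothetical bigram probabilities P(word | previous word).
BIGRAM = {
    ("the", "weather"): 0.3,
    ("the", "whether"): 0.001,
}


def rescore(prev_word, candidates, bigram=BIGRAM, floor=1e-6):
    """Pick the candidate with the best combined acoustic and context score.

    candidates maps each word to its (assumed) acoustic probability.
    Unseen word pairs fall back to a small floor probability.
    """
    def score(word):
        context = bigram.get((prev_word, word), floor)
        return math.log(candidates[word]) + math.log(context)

    return max(candidates, key=score)


if __name__ == "__main__":
    acoustic = {"whether": 0.55, "weather": 0.45}
    print(max(acoustic, key=acoustic.get))  # best word from acoustics alone
    print(rescore("the", acoustic))         # best word once context is used
```

Here the two candidates sound alike, so acoustics alone slightly favor one of them, while the preceding word shifts the decision to the other.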

 

 

 

© Lynne Grewe