Many techniques can be used here. Sometimes a domain is assumed,
meaning that a constrained grammar (a collection of words and
sentences/requests) is defined. This makes the recognition
problem easier than unconstrained, natural language recognition.
Mathematical formulas and models are used to identify
the most likely word spoken. These models match spoken words
against known word models and select the one that has the greatest
likelihood of being the correct word. To identify
the "greatest likelihood", large amounts of training data
are used to create the models. This type of statistical
model is known as a Hidden Markov Model (HMM).
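
As a rough illustration of "select the word model with the greatest
likelihood", the sketch below scores a sequence of observations against
one small HMM per word using the forward algorithm and picks the best
match. Everything in it is an assumption made for illustration: the
function names, the two toy word models, and the idea that the audio has
already been reduced to the discrete symbols 0 and 1. A real recognizer
would estimate these probabilities from training data.

```python
def forward_likelihood(observations, start_prob, trans_prob, emit_prob):
    """Return P(observations | model) computed with the forward algorithm."""
    n_states = len(start_prob)
    # alpha[s] = probability of the observations seen so far, ending in state s
    alpha = [start_prob[s] * emit_prob[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[prev] * trans_prob[prev][s] for prev in range(n_states))
            * emit_prob[s][obs]
            for s in range(n_states)
        ]
    return sum(alpha)


def recognize(observations, word_models):
    """Return the word whose model gives the highest likelihood, plus all scores."""
    scores = {
        word: forward_likelihood(observations, *model)
        for word, model in word_models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical hand-written models: (start_prob, trans_prob, emit_prob).
# Both are two-state, left-to-right HMMs over the symbols 0 and 1.
START = [1.0, 0.0]
TRANS = [[0.5, 0.5], [0.0, 1.0]]
WORD_MODELS = {
    "yes": (START, TRANS, [[0.9, 0.1], [0.2, 0.8]]),
    "no": (START, TRANS, [[0.1, 0.9], [0.8, 0.2]]),
}

best_word, scores = recognize([0, 0, 1, 1], WORD_MODELS)
print(best_word, scores)
```

With these toy numbers the observation sequence 0, 0, 1, 1 fits the
"yes" model far better than the "no" model, so "yes" is selected.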
Other techniques, such as Dynamic Time Warping and
Neural Networks, have also been used.
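
To give a feel for the time warping approach (usually called dynamic
time warping), here is a minimal sketch that compares a spoken sample
with stored word templates while allowing either one to be stretched in
time. The function names, the templates, and the use of plain numbers in
place of real acoustic feature vectors are all assumptions made for
illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of numbers."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i items of a
    # with the first j items of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the best of: repeat b's item, repeat a's item, advance both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def closest_template(sample, templates):
    """Return the template word with the smallest warped distance to sample."""
    return min(templates, key=lambda word: dtw_distance(sample, templates[word]))


# Hypothetical one-dimensional "feature" templates for two words.
TEMPLATES = {
    "yes": [1, 3, 4, 3, 1],
    "no": [4, 2, 1, 2, 4],
}

# A slower utterance of "yes": same shape, stretched in time.
print(closest_template([1, 1, 3, 4, 4, 3, 1], TEMPLATES))
```

Because the warping lets repeated values line up with a single template
value, the stretched sample still matches the "yes" template exactly,
which is the property that makes the method tolerant of speaking rate.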