EE Times-India > Embedded

Reduce driver distraction using speech recognition

Posted: 15 Aug 2011

Keywords: speech recognition, automotive, software

By going off-board, voice recognition is not "grammar bound" by the fixed vocabulary that the memory and CPU limits of an in-car embedded system impose. "SMS is 'general' grammar [i.e. any combination of letters]. So if you have the connectivity, take advantage of it" to do the processing off-board, notes Radloff. A cloud-based service also keeps navigation-system points of interest (POIs) and construction-site data up to date.

Also in the offing, nearer term, is the installation of more than one microphone, which allows more sophisticated noise cancellation and beam forming. Processing steers the "listening beam" (for instance, by manipulating the delay of the same sound between mics) to "focus" on the driver, reducing the tendency to pick up passenger voices.

Microphone manipulation
More insight into the nuances of microphone installation is provided by Scott Pennock, senior hands-free standard specialist, Hands-Free and Speech Technology, at QNX Software Systems, which partners with Nuance and provides acoustic-processing middleware for creating speech interfaces. One QNX focus is delivering better voice signals to the speech recognition system.

Figure 2: Two or more microphones, with audio processing, can form a sensitivity "beam" that picks up the driver's voice and rejects background sounds such as talking passengers.

"Vehicle noise is diffuse, the same throughout the cabin," Pennock says. "The challenge with far-field mics is that if you double the distance to the speaker (the driver), you take a 6 dB hit in signal-to-noise (S/N) ratio." Thus it is better to install a microphone in the headliner, about 12 inches from the driver's mouth, than on the rearview mirror, up to 24 inches away.
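The 6 dB figure follows from the free-field inverse-distance law: sound pressure falls off as 1/r, so doubling the distance halves the speech amplitude while the diffuse cabin noise stays roughly constant. A minimal sketch of the arithmetic, using the mic distances Pennock cites:

```python
import math

def snr_change_db(d_near_in, d_far_in):
    # Free field: pressure ~ 1/r, so moving the mic from d_near to d_far
    # changes the speech level by 20*log10(d_near/d_far) dB, while the
    # diffuse cabin noise level is unchanged.
    return 20 * math.log10(d_near_in / d_far_in)

# Headliner mic (~12 in from the driver's mouth) vs. mirror mic (~24 in)
print(snr_change_db(12, 24))  # about -6 dB: the hit from doubling distance
```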

As for adding another mic for beam forming on the driver, there is also an S/N benefit, adds Pennock. But it is only a 3 dB improvement: summing the two aligned mics doubles the speech amplitude (+6 dB), but the "noise floor" also rises by 3 dB, because the second mic picks up noise as well as speech.
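That net 3 dB can be seen in a toy delay-and-sum simulation. The signal, noise level, and inter-mic delay below are all assumed for illustration, not taken from any production system:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 300 * t)   # stand-in for the driver's voice
delay = 8                              # inter-mic delay in samples (assumed geometry)

# Each mic sees the speech (delayed at mic2) plus independent diffuse noise
mic1 = speech + 0.5 * rng.standard_normal(fs)
mic2 = np.roll(speech, delay) + 0.5 * rng.standard_normal(fs)

def snr_db(mix, ref):
    noise = mix - ref
    return 10 * np.log10(np.mean(ref ** 2) / np.mean(noise ** 2))

# Delay-and-sum beam: re-align mic2 to mic1, then average. Speech adds
# coherently; the uncorrelated noise averages down.
beam = (mic1 + np.roll(mic2, -delay)) / 2

print(snr_db(mic1, speech))   # single-mic S/N
print(snr_db(beam, speech))   # roughly 3 dB higher, matching Pennock's figure
```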

In developing speech recognition systems, Pennock cites another challenge that may not be all that obvious. A system is specified with a required accuracy rate, but determining if that rate has been achieved can be daunting. This task was easier when systems used set commands rather than natural language. Systems are now pitted against natural utterances as people speak normally.

Testing may be done with live subjects, but that is time consuming, and the sampling may still not be large enough for today's growing grammars to ensure all accents are adequately covered, Pennock notes. It is better to build a library of recorded utterances that can be played back more efficiently. The utterances should be collected in a vehicle noise environment, where people tend to talk louder and at a higher pitch. Interestingly, a person speaking a string of familiar phone-number digits in a natural cadence produces a higher recognition rate than one deliberately slowing down, Pennock says.
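A playback-library harness of the sort Pennock describes can be as simple as scoring a recognizer against (audio, expected transcript) pairs. The `recognize` callable and the tiny library below are hypothetical stand-ins, not any vendor's API:

```python
def accuracy(recognize, utterances):
    """Fraction of utterances whose recognized text matches the expected transcript.

    utterances: list of (audio, expected_transcript) pairs, ideally recorded
    in cabin noise, since people speak louder and at a higher pitch there.
    """
    correct = sum(1 for audio, expected in utterances if recognize(audio) == expected)
    return correct / len(utterances)

# Toy stand-ins for illustration: the "audio" is already text, and the fake
# recognizer just echoes it, so only the mismatched pair counts as an error.
library = [
    ("call home", "call home"),
    ("play jazz", "play jazz"),
    ("navigate home", "navigate office"),  # deliberate miss
]
fake_recognize = lambda audio: audio
print(accuracy(fake_recognize, library))  # 2 of 3 correct
```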

Voice systems also need to be tested under different operating conditions. These range from idling to 70 mph with climate-control fans on high; driving in rain, where the noise is not steady but a dynamically varying signal; and riding over louder concrete or quieter asphalt.

Good speech-recognition user-interface design is more than just high recognition rates, however. How a system recovers from errors has to take into account both expert and novice users, notes Pennock. A re-prompt when a phrase is not recognised may at first be, "Did you say xyz?" By detecting response pauses, the system can assume the user needs more verbal prompts, perhaps to learn phrases, whereas an experienced user will just confirm or repeat a request. The system can thus transition a user over time to a more expert level.
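The novice-to-expert transition Pennock describes can be sketched as a small state machine keyed on response pauses. The class name, threshold, and scoring here are all assumptions for illustration:

```python
class PromptManager:
    """Adapts prompt verbosity: long pauses imply a novice, quick replies an expert."""

    def __init__(self, pause_threshold_s=2.0, expert_after=3):
        self.pause_threshold_s = pause_threshold_s
        self.expert_after = expert_after
        self.score = 0

    def next_prompt(self, last_pause_s):
        if last_pause_s > self.pause_threshold_s:
            self.score = max(0, self.score - 1)  # hesitation: back toward full prompts
        else:
            self.score += 1                      # quick reply: move toward expert mode
        if self.score >= self.expert_after:
            return "<beep>"                      # expert: minimal earcon prompt
        return "Did you say <phrase>? Please confirm or repeat your request."

pm = PromptManager()
prompts = [pm.next_prompt(p) for p in (0.5, 0.4, 0.6)]
print(prompts[-1])  # after three quick replies the system drops to the short prompt
```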

Pennock concludes that with the multi-mode user interfaces available today, speech is most effective for inputting more complex "information in a user's head" (such as requests for POIs, audio selections, or phone calls) through natural-language interaction, without resorting to distracting touch scrolling. A single, simple action (selecting a climate mode or raising the temperature), by contrast, is done effectively with a quick touchscreen or switch stroke.

Similarly, Brigitte Richardson, Global Voice Control Technology/Speech Systems lead engineer at Ford, notes that some fans of voice recognition want to expand its use to functions such as seat adjustments and window control—applications she feels are an overuse of the technology, since these simple tasks are handled adequately now with familiar, basic switches.

However, one trend is apparent—speech recognition is increasingly the enabler for interacting with new automotive features and with connected user devices, offering ease of use in a minimally distracting, safer manner.

- Rick DeMeis
  EE Times
