EE Times-India

Building robust HMM models for speech recognition of the hearing impaired

Posted: 01 Oct 2012

Keywords: hearing impairment, Hidden Markov Model, speech recognition systems

The degree of hearing impairment varies widely from person to person. Some people have partial hearing loss and can still hear some sounds; people with complete hearing loss cannot hear anything. With some types of hearing loss, a person has much more trouble when there is background noise. One or both ears may be affected, and the impairment may be more severe in one ear than in the other. Congenital hearing loss is present at birth; acquired hearing loss occurs later in life, during childhood, the teen years, or adulthood.

According to the National Institute on Deafness and Other Communication Disorders, about 28 million Americans are deaf or hearing impaired, roughly 1 out of every 10 people. Another 30 million are exposed to hazardous noise levels on a regular basis. In India, with its population of over one billion, the number of deaf people cannot be accurately estimated; it is known to be in the millions, and some estimates are as high as 60 million.

Hearing aids and lip-reading are most effective in face-to-face communication within small groups of hearing impaired people. Unfortunately, there are many events, such as public meetings and lectures, where the speaker may be poorly lit or too far away to be seen or heard clearly, or where high levels of background noise prevent the successful use of a hearing aid. In these circumstances a simultaneous visual transcript of speech may be helpful. [1]

Hearing aids come in various forms that fit inside or behind the ear and amplify sound. They are adjusted by an audiologist so that incoming sound is amplified enough for the person with a hearing impairment to hear it clearly. Sometimes the hearing loss is so severe that even the most powerful hearing aids cannot amplify the sound sufficiently. In those cases, a cochlear implant may be recommended. Cochlear implants are surgically implanted devices that bypass the damaged inner ear and send signals directly to the auditory nerve. A small microphone behind the ear picks up sound waves and sends them to a receiver placed under the scalp; this receiver then transmits impulses directly to the auditory nerve, and the signals are perceived as sound. Many people with implants learn to hear well enough to use the telephone. Some parents of hearing impaired children, particularly those who are hearing impaired themselves, want their children to be able to function in the deaf community. The language of the deaf community is American Sign Language (ASL), a system of gestures that many deaf and hearing-impaired people use for communication.

One of the problems encountered in analysing the speech of the deaf is the large variability in pronunciation among speakers. Differences between deaf speakers are substantially greater than differences between normal speakers, so correspondingly more data are needed to separate out the differences between deaf and normal talkers [2]. The language skills of these children are, on average, severely delayed; their speech production and speech reception are, at best, of limited use; and their vocabulary, grammar, and reading show great deficiencies in comparison with normal children. Consequently, their education is restricted even when the most intense efforts are made to keep pace with normal education [3].

Hearing sensitivity is classified by the threshold at which sounds can be heard: 0–25 dB is normal hearing sensitivity, 25–40 dB mild hearing loss, 40–55 dB moderate hearing loss, 55–70 dB moderately severe hearing loss, 70–90 dB severe hearing loss, and above 90 dB profound deafness. Mild to severe hearing loss is known as being hard of hearing. The hearing impaired face many problems; the three main issues are education, employment and communication. Since hearing impaired children receive insufficient sound information, it is difficult for them to talk fluently. Speech therapy can improve their ability to speak, but it requires well trained staff and modern training equipment. [4]
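The classification above can be sketched as a simple lookup. The band edges are taken directly from the text; the function name and the choice of where a boundary value falls are our own illustrative assumptions.

```python
# Minimal sketch of the hearing-loss categories described above.
# Band edges come from the article; boundary handling (<=) is our choice.

def classify_hearing_level(threshold_db):
    """Map a hearing threshold in dB to the article's categories."""
    bands = [
        (25, "normal hearing sensitivity"),
        (40, "mild hearing loss"),
        (55, "moderate hearing loss"),
        (70, "moderately severe hearing loss"),
        (90, "severe hearing loss"),
    ]
    for upper, label in bands:
        if threshold_db <= upper:
            return label
    return "profoundly deaf"
```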

Because their speech is often not understood by others, they may not interact socially in the way hearing people do. On most occasions, even their parents and teachers cannot understand their speech.

In this scenario, a system that recognises hearing impaired speech in real time would minimise the communication difficulties between the hearing impaired and the hearing. With such a system in their pocket, their sounds could be converted into intelligible speech that is easily understood by hearing people. They could also use it to operate voice-controlled devices and the telephone, as illustrated in figure 1.

Figure 1: Proposed Hearing impaired speech recognition system.

In this paper, recognition of hearing impaired speech is carried out using the Hidden Markov Model (HMM) [5] with various features. The reasons behind the popularity of the method are its inherent statistical framework, the ease and availability of training algorithms for estimating the parameters of the models from finite training sets of speech data, and the flexibility of the resulting recognition system, in which one can easily change the size, type, or architecture of the models to suit particular words or sounds [6].
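The core of isolated-word HMM recognition is scoring an observation sequence against one trained model per word and picking the highest-likelihood word. The sketch below shows this with the standard forward algorithm over a discrete observation alphabet; the two tiny word models and all their probabilities are made-up illustrative numbers, not the paper's trained parameters.

```python
# Sketch of isolated-word HMM scoring: one model per word, the word whose
# model assigns the observation sequence the highest likelihood wins.

def forward_likelihood(obs, pi, A, B):
    """P(obs | model) via the forward algorithm.
    pi[i]: initial state probs, A[i][j]: transitions, B[i][o]: emissions."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

# Two toy 2-state word models over a 2-symbol observation alphabet
# (pi, A, B for each); purely illustrative values.
models = {
    "one": ([0.9, 0.1], [[0.7, 0.3], [0.0, 1.0]], [[0.8, 0.2], [0.1, 0.9]]),
    "two": ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.2, 0.8], [0.9, 0.1]]),
}

def recognise(obs):
    return max(models, key=lambda w: forward_likelihood(obs, *models[w]))
```

In a real system the discrete symbols would be replaced by Gaussian-mixture emission densities over the feature vectors (LPCC, PLP or MFCC), and the parameters would be estimated with Baum-Welch training.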

Linear prediction coefficients (LPC), introduced in the late 1960s, model the speech signal as the output of a linear, time-varying system excited by quasi-periodic pulses. The method was used especially for estimating basic speech parameters such as pitch, formants and spectrum. However, these coefficients on their own have not proved very useful in recognition applications.
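The all-pole model behind LPC is solved per frame from the signal's autocorrelation via the textbook Levinson-Durbin recursion, sketched below. This is generic textbook code, not the paper's implementation; variable names are our own.

```python
# Levinson-Durbin recursion: solve the Toeplitz normal equations to get
# the LPC coefficients a[1..p] of an all-pole speech model from the
# frame autocorrelation r[0..p].

def levinson_durbin(r, order):
    """Return (lpc_coefficients, final prediction error)."""
    a = [0.0] * (order + 1)
    e = r[0]  # prediction error, starts at signal energy
    for i in range(1, order + 1):
        # reflection (PARCOL) coefficient for this order
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / e
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        e *= (1.0 - k * k)
    return a[1:], e
```

For an AR(1)-like autocorrelation such as [1.0, 0.5, 0.25], the recursion recovers a first coefficient of 0.5 and a second of zero, as expected.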

MFCC, which uses the mel frequency scale to imitate the behaviour of the human ear, has been used successfully in many recognition applications. Each windowed speech frame is pre-emphasized and its power spectrum is computed with the FFT; the spectrum is then passed through a bank of mel-spaced filters, and the mel frequency cepstrum is calculated from the filter-bank output powers using the DCT [7]. Experiments showed that MFCC is very successful when clean speech is used for training and testing. However, we encounter many types of noise sources in daily life, so in recent years robustness has become as important as raw recognition performance. Perceptual linear predictive (PLP) analysis uses concepts from the psychoacoustics of hearing.
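The mel warping at the heart of MFCC can be sketched as below, using the common textbook formula mel = 2595·log10(1 + f/700). The filter count and frequency range in the example are illustrative choices, not the paper's settings.

```python
import math

# Mel-scale conversion and the edge frequencies of a triangular
# mel filter bank, evenly spaced on the mel axis.

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_edges(n_filters, low_hz, high_hz):
    """Edge frequencies (Hz) of n_filters triangular filters spaced
    evenly on the mel scale between low_hz and high_hz."""
    lo, hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]
```

Because the spacing is linear in mels but logarithmic in hertz, the resulting filters are narrow at low frequencies and wide at high frequencies, mirroring the ear's frequency resolution.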

The features considered here are LPCC, PLP, and MFCC with their delta and acceleration coefficients. In general, the number of mixtures in the HMM is selected according to the number of speakers. Here it is varied from 3 to 7, since there are 10 speakers and recognition performance is good in this range; beyond 7 mixtures there is only a slight change in recognition performance.

Source of data
The performance of speech recognition is usually evaluated in terms of accuracy and speed. In general, speech recognition is a very complex problem, since the same words spoken by the same person will not be identical at different times. Vocalisations of hearing impaired children vary in terms of accent, pronunciation, articulation, roughness, nasality, pitch, volume and speed. We can informally check whether children are hard of hearing or profoundly deaf: within 5 metres a child should hear the following sounds ; if not, they have difficulty hearing. For this study, we have taken speech samples of 10 deaf children in the age group of 5–10 years from the Maharishi Vidhya Mandir service centre for the hearing impaired, Tiruchirappalli. The deaf children are able to follow only one language, mostly their native language, because they cannot follow different facial expressions and throat vibrations. Like a speech therapist, we used tactile integration when recording the speech of the hearing impaired: we had the children touch our throat to feel the vibration while we pronounced a particular word, watch our facial expressions, and listen to the sound. They also see the words visually, written on a board. Among the ten children, seven are profoundly deaf and three are hard of hearing. In this work, we have taken isolated digits from to .
