Fig. 6.9 GUI of pattern matching of saha.

Fig. 6.10 GUI of recognized numeral saha.

Fig. 6.11 GUI of pattern matching of nau.

Fig. 6.12 GUI of recognized numeral nau.

6.2 Testing and Results
6.2.1 Testing with pre-recorded samples
Of the 20 samples recorded for each word, 16 were used for training. We tested the program's accuracy with the remaining 4 samples per word. A total of 20 samples were tested (4 samples for each of the 5 words), and the program yielded the correct result for all 20. Thus, we obtained 100% accuracy with pre-recorded samples.

6.2.2 Real-time testing
For real-time testing, we recorded a sample using a microphone and ran the program directly on it. A total of 30 samples were tested, of which 24 gave the correct result. This gives an accuracy of 80% with real-time samples.

6.2.3 Results
Case 1: Speaker independent (20 templates per digit: 10 male, 10 female).
The implemented system was tested on 100 samples of each word, spoken by 50 different speakers with 2 samples of each digit per speaker.

The results are given in Table 6.1.

Table 6.1 Accuracy of the Speaker Independent Test Results.

DIGIT        0   1   2   3   4   5   6   7   8   9
% ACCURACY  87  88  82  78  79  84  85  81  78  87

Case 2: Speaker dependent (one template per digit).
The implemented system was tested on 10 samples of each word, spoken by a single speaker. The results are given in Table 6.2.

Table 6.2 Accuracy of the Speaker Dependent Test Results.

DIGIT        0   1   2   3   4   5   6   7   8   9
% ACCURACY  90  91  84  90  87  88  92  84  86  92

The accuracy on pre-recorded samples is higher than that on real-time samples. We also observed that the accuracy on speaker-dependent samples is higher than that on speaker-independent samples.

Table 6.3 Confusion Matrix of the MFCC and DTW Recognition.

        ek    don   teen  char  pach  saha  sat   aath  nau   shunya  Avg. %
ek      1     1     1     4     1     1     1     1     1     0       80
don     2     2     2     2     3     2     2     2     2     2       90
teen    3     3     3     3     9     3     3     2     2     2       80
char    4     4     5     4     4     4     4     6     4     4       80
pach    5     5     5     5     5     5     5     5     5     3       90
saha    6     6     6     6     1     6     6     4     6     6       80
sat     7     7     8     7     7     7     7     7     7     7       90
aath    2     8     8     8     8     7     8     8     8     8       80
nau     9     9     4     9     9     5     9     9     9     9       80
shunya  0     0     0     0     5     0     0     2     0     0       80

Table 6.4 Confusion Matrix of the MFCC and HMM Recognition.

        ek    don   teen  char  pach  saha  sat   aath  nau   shunya  Avg. %
ek      1     1     1     1     1     1     1     3     1     1       90
don     2     2     2     2     2     2     2     2     2     5       90
teen    3     3     3     3     3     3     1     3     3     3       90
char    4     3     4     4     4     4     4     4     8     4       80
pach    5     5     5     5     5     5     5     5     5     5       100
saha    6     6     6     8     6     6     6     6     6     6       90
sat     7     7     7     7     7     7     7     5     7     7       90
aath    8     8     8     8     8     8     8     8     5     8       90
nau     9     9     9     9     9     7     9     9     9     9       90
shunya  0     0     0     7     0     0     0     5     0     0       80

Table 6.5 Comparison of Digit Recognition Accuracy Test Results.

Numeral     DTW Accuracy (%)   HMM Accuracy (%)
ek          80                 90
don         90                 90
teen        80                 90
char        80                 80
pach        90                 100
saha        80                 90
sat         90                 90
aath        80                 90
nau         80                 90
shunya      80                 80
Average     83                 89
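
As a quick check of the averages in Table 6.5, the following minimal Python sketch recomputes them from the per-numeral accuracies. The dictionary and function names are illustrative and are not taken from the implemented program.

```python
# Per-numeral accuracies (%) copied from Table 6.5.
dtw = {"ek": 80, "don": 90, "teen": 80, "char": 80, "pach": 90,
       "saha": 80, "sat": 90, "aath": 80, "nau": 80, "shunya": 80}
hmm = {"ek": 90, "don": 90, "teen": 90, "char": 80, "pach": 100,
       "saha": 90, "sat": 90, "aath": 90, "nau": 90, "shunya": 80}


def mean_accuracy(per_word):
    """Unweighted mean of the per-word accuracies (each word has equal weight)."""
    return sum(per_word.values()) / len(per_word)


dtw_avg = mean_accuracy(dtw)
hmm_avg = mean_accuracy(hmm)
print(f"DTW average: {dtw_avg:.0f}%  HMM average: {hmm_avg:.0f}%")
```
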
Experimentally, recognition accuracy is higher with HMM than with DTW (89% versus 83% on average), but the training procedure for DTW is much simpler and faster than that for HMM.

Fig. 6.13 Recognition accuracy of DTW and HMM.

Recognition of numerals takes longer with HMM than with DTW, because HMM has to go through many states, iterations, and considerably more mathematical modeling. DTW is therefore preferred over HMM for real-time applications.

Conclusions and Future Scopes
7.1 Conclusions
Despite the advances accomplished over the last decades, automatic speech recognition (ASR) is still a challenging and difficult task [1].

Mel Frequency Cepstral Coefficients (MFCCs), a non-parametric method for modeling the human auditory perception system, are used as the feature extraction technique. The nonlinear sequence alignment known as Dynamic Time Warping (DTW) has been used as the feature matching technique. Since voice signals tend to have different temporal rates, the alignment is important for better performance.
This paper proposed that higher recognition rates can be achieved using MFCC features with DTW, which is useful for numeral speech utterances whose timing varies.
MFCC analysis provides a better recognition rate than LPC because it operates on a logarithmic scale that resembles the human auditory system, whereas LPC has uniform resolution over the frequency plane. Feature extraction is followed by pattern recognition. Since voice signals tend to have different temporal rates, DTW is one of the methods that provide non-linear alignment between two voice signals.
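
To illustrate the logarithmic frequency scale mentioned above, here is a minimal Python sketch of one commonly used Hz-to-mel conversion. The specific constants (2595 and 700) are an assumption: they are a widely used variant of the mel formula, and the text does not state which variant the implemented program uses.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (assumed variant: 2595 * log10(1 + f/700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# The same 1000 Hz step covers fewer mels at high frequencies than at low ones,
# which is the non-uniform resolution that distinguishes the mel scale from
# a uniform frequency axis.
low_step = hz_to_mel(1000.0) - hz_to_mel(0.0)
high_step = hz_to_mel(4000.0) - hz_to_mel(3000.0)
print(f"0-1000 Hz: {low_step:.1f} mel, 3000-4000 Hz: {high_step:.1f} mel")
```
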

Another method, the Hidden Markov Model (HMM), which statistically models the words, is also presented. Experimentally, recognition accuracy is higher with HMM than with DTW, but the training procedure for DTW is much simpler and faster than that for HMM.
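
How a word model can score an utterance may be sketched with the forward algorithm. This is a minimal illustration under stated assumptions, not the implemented recognizer: it assumes a discrete HMM whose observations are symbol indices (for example, vector-quantised MFCC frames), and the two toy models and their probabilities are hypothetical.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM using the forward algorithm.

    obs   : non-empty list of observation symbol indices
    start : start[s]    = probability of starting in state s
    trans : trans[p][s] = probability of moving from state p to state s
    emit  : emit[s][o]  = probability that state s emits symbol o
    """
    states = range(len(start))
    # Initialisation: probability of being in each state after the first symbol.
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    # Induction: sum over all predecessor states, then weight by the emission.
    for o in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in states) * emit[s][o]
                 for s in states]
    return sum(alpha)


def recognize(obs, models):
    """Pick the word whose model assigns the highest likelihood to obs."""
    return max(models, key=lambda word: forward_likelihood(obs, *models[word]))


# Hypothetical two-state, two-symbol models: (start, trans, emit) per word.
models = {
    "word_a": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]]),
    "word_b": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.1, 0.9], [0.8, 0.2]]),
}
print(recognize([0, 0, 1, 1], models))
```

Each added frame requires a sum over all state pairs, which is one reason scoring with an HMM involves more computation than a single template comparison.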

Recognition of numerals takes longer with HMM than with DTW, because HMM has to go through many states, iterations, and considerably more mathematical modeling. DTW is therefore preferred over HMM for real-time applications.

DTW is a cost minimization matching technique, in which a test signal is stretched or compressed according to a reference template.
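
The cost minimization described above can be sketched as a small dynamic program. This is a minimal illustration, assuming each signal is a sequence of feature vectors (for example, MFCC frames), Euclidean distance as the local cost, and the basic step pattern without path constraints. The function names and toy templates are hypothetical.

```python
import math


def dtw_distance(test, ref):
    """Minimum cumulative cost of aligning the test sequence to the reference."""
    n, m = len(test), len(ref)
    # cost[i][j] = best cost of aligning the first i test frames
    # with the first j reference frames.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(test[i - 1], ref[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # reference frame repeated
                                     cost[i][j - 1],      # test frame repeated
                                     cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]


def recognize(test, templates):
    """Return the word whose reference template has the lowest DTW cost."""
    return min(templates, key=lambda word: dtw_distance(test, templates[word]))


# Toy one-dimensional "features": a slower utterance of the same pattern.
templates = {"ek": [(0.0,), (1.0,), (2.0,)], "don": [(5.0,), (5.0,), (5.0,)]}
slow_ek = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]
print(recognize(slow_ek, templates))
```

Because a frame may be repeated on either side, the slower utterance aligns to its template at zero cost even though the two sequences differ in length.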

The accuracy on pre-recorded samples is higher than that on real-time samples. We also observed that the accuracy on speaker-dependent samples is higher than that on speaker-independent samples.
7.2 Future Scopes
One of the key areas where future work can be concentrated is large vocabulary generation and improved robustness of speech recognition performance [2].

Another key area of research is focused on an opportunity rather than a problem. This research attempts to take advantage of the fact that in many applications there is a large quantity of speech data available, up to millions of hours. It is too expensive to have humans transcribe such large quantities of speech, so the research focus is on developing new methods of machine learning that can effectively utilize large quantities of unlabeled data.
A further direction is to gain a better understanding of human capabilities and to use this understanding to improve machine recognition performance.

Future work could also be directed towards online speech summarization. The majority of speech summarization research has focused on extracting the most informative dialogue acts from recorded, archived data.

Future work could also aim at minimizing the time required for recognition of numerals using HMM.
