Speech Emotion Recognition by Using Combinations of C5.0, Neural Network (NN), and Support Vector Machines (SVM) Classification Methods
Authors
Mohammad Masoud Javidi
- Department of Computer Science, Shahid Bahonar University of Kerman, Kerman, Iran.
Ebrahim Fazlizadeh Roshan
- Department of Computer Science, Shahid Bahonar University of Kerman, Kerman, Iran.
Abstract
Speech is the fastest and most natural way for humans to communicate, which has motivated a considerable amount of research on human-machine interaction. It is therefore desirable to design machines that can intelligently recognize the emotion in a human voice. However, natural interaction between humans and machines is still out of reach, because machines cannot distinguish the emotion of the speaker. This need has established a new field in the literature, namely speech emotion recognition systems. The accuracy of these systems depends on several factors, such as the number and type of emotional states, the selected features, and the kind of classifier. In this paper, the classification methods of the Neural Network (NN), the Support Vector Machine (SVM), the combinations of NN and SVM (NN-SVM), NN and C5.0 (NN-C5.0), and SVM and C5.0 (SVM-C5.0), and finally the combination of NN, SVM, and C5.0 (NN-SVM-C5.0) are evaluated, and their efficiencies in speech emotion recognition are compared. The features used in this study are energy, power, Zero Crossing Rate (ZCR), pitch, and Mel-scale Frequency Cepstral Coefficients (MFCC). The results presented in this paper demonstrate that the proposed NN-C5.0 combination recognizes emotional states more accurately, by 6% to 30% depending on the number of emotional states, than SVM, NN, and the other combinations mentioned above.
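The abstract names the acoustic features used (energy, power, ZCR, pitch, MFCC) but not the tooling used to compute them. The following is a minimal illustrative sketch, not the authors' code, of how such utterance-level features could be extracted in Python with the librosa library; the function name extract_features and all parameter values are assumptions chosen for illustration.

```python
# Illustrative sketch (not the authors' code): computing the features named
# in the abstract -- energy, power, zero crossing rate, pitch, and MFCCs.
import numpy as np
import librosa

def extract_features(wav_path, n_mfcc=13):
    # Load the utterance; sr=None keeps the file's native sampling rate.
    y, sr = librosa.load(wav_path, sr=None)

    # Total short-time energy and average power of the signal.
    energy = float(np.sum(y ** 2))
    power = energy / len(y)

    # Mean zero crossing rate over all analysis frames.
    zcr = float(np.mean(librosa.feature.zero_crossing_rate(y)))

    # Rough per-frame fundamental-frequency estimate, averaged over frames.
    f0 = librosa.yin(y, fmin=50, fmax=500, sr=sr)
    pitch = float(np.mean(f0))

    # Mel-scale Frequency Cepstral Coefficients, averaged over frames.
    mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc), axis=1)

    return np.concatenate(([energy, power, zcr, pitch], mfcc))
```

Likewise, the abstract does not specify how the NN, SVM, and C5.0 outputs are combined. The sketch below shows one plausible reading, a soft majority vote over the three classifiers, built with scikit-learn; since C5.0 is a commercial algorithm with no scikit-learn implementation, a CART decision tree stands in for it here, so this is an approximation of the idea rather than the paper's exact scheme.

```python
# Illustrative sketch: a soft-voting combination of an NN, an SVM, and a
# decision tree (standing in for C5.0), as one possible NN-SVM-C5.0 setup.
from sklearn.ensemble import VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def build_nn_svm_c50_like(random_state=0):
    nn = make_pipeline(StandardScaler(),
                       MLPClassifier(hidden_layer_sizes=(64,),
                                     max_iter=1000,
                                     random_state=random_state))
    svm = make_pipeline(StandardScaler(),
                        SVC(kernel="rbf", probability=True,
                            random_state=random_state))
    tree = DecisionTreeClassifier(random_state=random_state)
    # Soft voting averages the predicted class probabilities of the three models.
    return VotingClassifier(estimators=[("nn", nn), ("svm", svm),
                                        ("tree", tree)],
                            voting="soft")
```

A hypothetical usage would be clf = build_nn_svm_c50_like(); clf.fit(X_train, y_train); clf.predict(X_test), where each row of X is a feature vector produced by extract_features above.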
Share and Cite
ISRP Style
Mohammad Masoud Javidi, Ebrahim Fazlizadeh Roshan, Speech Emotion Recognition by Using Combinations of C5.0, Neural Network (NN), and Support Vector Machines (SVM) Classification Methods, Journal of Mathematics and Computer Science, 6 (2013), no. 3, 191-200
AMA Style
Javidi Mohammad Masoud, Roshan Ebrahim Fazlizadeh, Speech Emotion Recognition by Using Combinations of C5.0, Neural Network (NN), and Support Vector Machines (SVM) Classification Methods. J Math Comput Sci-JM. (2013); 6(3):191-200
Chicago/Turabian Style
Javidi, Mohammad Masoud, and Ebrahim Fazlizadeh Roshan. "Speech Emotion Recognition by Using Combinations of C5.0, Neural Network (NN), and Support Vector Machines (SVM) Classification Methods." Journal of Mathematics and Computer Science, 6, no. 3 (2013): 191-200
Keywords
- Emotion recognition
- Feature extraction
- Mel-scale Frequency Cepstral Coefficients
- Neural Network
- Support Vector Machines
- C5.0