Speech Emotion Recognition by Using Combinations of C5.0, Neural Network (NN), and Support Vector Machines (SVM) Classification Methods
Authors
Mohammad Masoud Javidi
- Department of Computer Science, Shahid Bahonar University of Kerman, Kerman, Iran.
Ebrahim Fazlizadeh Roshan
- Department of Computer Science, Shahid Bahonar University of Kerman, Kerman, Iran.
Abstract
Speech is the fastest and most natural way for humans to communicate, which has motivated a considerable amount of research on human-machine interaction. It is therefore desirable to design machines that can intelligently recognize the emotion in a human voice. However, natural interaction between humans and machines is still out of reach, because machines cannot distinguish the emotion of the speaker. This need has established a new field in the literature, namely speech emotion recognition systems. The accuracy of these systems depends on several factors, such as the number and type of emotional states, the selected features, and the kind of classifier. In this paper, the classification methods of the Neural Network (NN), the Support Vector Machine (SVM), the combinations of NN and SVM (NN-SVM), NN and C5.0 (NN-C5.0), and SVM and C5.0 (SVM-C5.0), and finally the combination of NN, SVM, and C5.0 (NN-SVM-C5.0) are evaluated, and their efficiencies in speech emotion recognition are compared. The features used in this study are energy, power, Zero Crossing Rate (ZCR), pitch, and Mel-scale Frequency Cepstral Coefficients (MFCC). The results presented in this paper demonstrate that the proposed NN-C5.0 combination recognizes emotional states more accurately, by 6% to 30% depending on the number of emotional states, than SVM, NN, and the other combinations mentioned above.
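The abstract names the acoustic features used (energy, power, ZCR, pitch, MFCC) but not the tooling used to compute them. The following is a minimal illustrative sketch, not the authors' code, of how such utterance-level features could be extracted in Python with the librosa library; the function name extract_features and all parameter values are assumptions chosen for illustration.

```python
# Illustrative sketch (not the authors' code): computing the features named
# in the abstract -- energy, power, zero crossing rate, pitch, and MFCCs.
import numpy as np
import librosa

def extract_features(wav_path, n_mfcc=13):
    # Load the utterance; sr=None keeps the file's native sampling rate.
    y, sr = librosa.load(wav_path, sr=None)

    # Total short-time energy and average power of the signal.
    energy = float(np.sum(y ** 2))
    power = energy / len(y)

    # Mean zero crossing rate over all analysis frames.
    zcr = float(np.mean(librosa.feature.zero_crossing_rate(y)))

    # Rough per-frame fundamental-frequency estimate, averaged over frames.
    f0 = librosa.yin(y, fmin=50, fmax=500, sr=sr)
    pitch = float(np.mean(f0))

    # Mel-scale Frequency Cepstral Coefficients, averaged over frames.
    mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc), axis=1)

    return np.concatenate(([energy, power, zcr, pitch], mfcc))
```

Likewise, the abstract does not specify how the NN, SVM, and C5.0 outputs are combined. The sketch below shows one plausible reading, a soft majority vote over the three classifiers, built with scikit-learn; since C5.0 is a commercial algorithm with no scikit-learn implementation, a CART decision tree stands in for it here, so this is an approximation of the idea rather than the paper's exact scheme.

```python
# Illustrative sketch: a soft-voting combination of an NN, an SVM, and a
# decision tree (standing in for C5.0), as one possible NN-SVM-C5.0 setup.
from sklearn.ensemble import VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def build_nn_svm_c50_like(random_state=0):
    nn = make_pipeline(StandardScaler(),
                       MLPClassifier(hidden_layer_sizes=(64,),
                                     max_iter=1000,
                                     random_state=random_state))
    svm = make_pipeline(StandardScaler(),
                        SVC(kernel="rbf", probability=True,
                            random_state=random_state))
    tree = DecisionTreeClassifier(random_state=random_state)
    # Soft voting averages the predicted class probabilities of the three models.
    return VotingClassifier(estimators=[("nn", nn), ("svm", svm),
                                        ("tree", tree)],
                            voting="soft")
```

A hypothetical usage would be clf = build_nn_svm_c50_like(); clf.fit(X_train, y_train); clf.predict(X_test), where each row of X is a feature vector produced by extract_features above.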
Share and Cite
ISRP Style
Mohammad Masoud Javidi, Ebrahim Fazlizadeh Roshan, Speech Emotion Recognition by Using Combinations of C5.0, Neural Network (NN), and Support Vector Machines (SVM) Classification Methods, Journal of Mathematics and Computer Science, 6 (2013), no. 3, 191-200
AMA Style
Javidi Mohammad Masoud, Roshan Ebrahim Fazlizadeh, Speech Emotion Recognition by Using Combinations of C5.0, Neural Network (NN), and Support Vector Machines (SVM) Classification Methods. J Math Comput Sci-JM. (2013); 6(3):191-200
Chicago/Turabian Style
Javidi, Mohammad Masoud, and Ebrahim Fazlizadeh Roshan. "Speech Emotion Recognition by Using Combinations of C5.0, Neural Network (NN), and Support Vector Machines (SVM) Classification Methods." Journal of Mathematics and Computer Science, 6, no. 3 (2013): 191-200
Keywords
- Emotion recognition
- Feature extraction
- Mel-scale Frequency Cepstral Coefficients
- Neural Network
- Support Vector Machines
- C5.0