Abstract:
Emotion recognition is the computational simulation of the human emotion perception process, and it is significant in both research and applications. Traditional speech emotion recognition systems usually employ a single feature extraction method, which can lose important information in speech emotion signals. This paper therefore proposes a combined multi-feature extraction method, based on the improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN), to classify semantically independent speech emotion signals. ICEEMDAN decomposes a one-dimensional speech signal into multiple intrinsic modes, and the following characteristics are then extracted from each decomposed mode: energy intensity, average, variance, kurtosis, skewness, center frequency, peak amplitude, and permutation entropy. Finally, four emotions are classified: anger, happiness, sadness, and no emotion (neutral). The results show that the proposed method achieves an average recognition rate of 91.44% with a support vector machine (SVM) trained on an 8:2 split of the data into training and test sets. The method can provide an important reference for speech emotion recognition.
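The per-mode feature extraction described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the ICEEMDAN decomposition has already been performed and that each mode is available as a plain list of samples. The function names (`permutation_entropy`, `mode_features`, `feature_vector`) and the embedding parameters (`order=3`, `delay=1`) are hypothetical choices made for illustration. Center frequency is omitted because it needs a spectral estimate that the abstract does not specify.

```python
import math
from collections import Counter


def permutation_entropy(x, order=3, delay=1):
    """Normalized permutation entropy of a 1-D sequence, in [0, 1].

    `order` and `delay` are illustrative defaults, not values from the paper.
    """
    n = len(x) - (order - 1) * delay
    if n <= 0:
        raise ValueError("sequence too short for the chosen order and delay")
    # Count the ordinal pattern (rank order) of each embedded window.
    patterns = Counter(
        tuple(sorted(range(order), key=lambda k: x[i + k * delay]))
        for i in range(n)
    )
    probs = [count / n for count in patterns.values()]
    entropy = sum(p * math.log(1.0 / p) for p in probs)
    # Normalize by the maximum entropy over order! possible patterns.
    return entropy / math.log(math.factorial(order))


def mode_features(mode):
    """Statistical features of one decomposed mode (a list of floats)."""
    n = len(mode)
    mean = sum(mode) / n
    var = sum((v - mean) ** 2 for v in mode) / n  # population variance
    std = math.sqrt(var)
    # Standardized third and fourth moments; set to 0 for a constant mode.
    skew = sum((v - mean) ** 3 for v in mode) / (n * std ** 3) if std > 0 else 0.0
    kurt = sum((v - mean) ** 4 for v in mode) / (n * std ** 4) if std > 0 else 0.0
    return {
        "energy": sum(v * v for v in mode),
        "mean": mean,
        "variance": var,
        "kurtosis": kurt,
        "skewness": skew,
        "peak_amplitude": max(abs(v) for v in mode),
        "permutation_entropy": permutation_entropy(mode),
    }


def feature_vector(modes):
    """Concatenate the features of all modes into one flat vector."""
    vector = []
    for mode in modes:
        vector.extend(mode_features(mode).values())
    return vector
```

A vector built this way, one per utterance, is the kind of input that could then be passed to an SVM classifier after splitting the data into training and test sets.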