Speech emotion recognition based on fully convolutional recurrent neural network

Zhu Min; Jiang Pengxu; Zhao Li

doi:10.16300/j.cnki.1000-3630.2021.05.009

Speech emotion recognition based on fully convolutional recurrent neural network

Abstract

Speech emotion recognition is an active research area in human-computer interaction. However, the time-frequency correlations in speech have received little study, so emotional information is not mined deeply enough. To better exploit this time-frequency information, a fully convolutional recurrent neural network model is proposed. The model combines different networks in a parallel, multi-input fashion, so that two modules simultaneously extract features serving different functions. A fully convolutional network (FCN) learns the time-frequency information in speech spectrogram features, while a long short-term memory (LSTM) network learns frame-level speech features to supplement the temporal information lost during FCN learning. Finally, the features are fused and passed to a classifier. Tests on two public emotional speech datasets verify the superiority of the proposed algorithm.
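The parallel two-branch structure described in the abstract can be illustrated with a minimal NumPy forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not give layer counts, kernel sizes, feature dimensions, the classifier type, or the number of emotion classes, so every size below, the softmax classifier, and the helper names (`fcn_branch`, `lstm_branch`, `classify`, `init_params`) are hypothetical, and the weights are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_relu(x, kernels):
    """Valid 2-D cross-correlation followed by ReLU.
    x: (C_in, H, W); kernels: (C_out, C_in, kH, kW)."""
    c_out, _, kh, kw = kernels.shape
    h_out, w_out = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + kh, j:j + kw]              # (C_in, kH, kW)
            out[:, i, j] = np.tensordot(kernels, patch, axes=3)
    return np.maximum(out, 0.0)

def fcn_branch(spectrogram, k1, k2):
    """Convolution-only branch over a (freq, time) spectrogram.
    Global average pooling replaces dense layers, so any input size works."""
    x = spectrogram[np.newaxis]                            # (1, F, T)
    x = conv2d_relu(x, k1)
    x = conv2d_relu(x, k2)
    return x.mean(axis=(1, 2))                             # (C_out,)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_branch(frames, W, U, b):
    """Single-layer LSTM over frame-level features.
    frames: (T, D); W: (4H, D); U: (4H, H); b: (4H,). Returns last hidden state."""
    hidden = U.shape[1]
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x_t in frames:
        i, f, o, g = np.split(W @ x_t + U @ h + b, 4)      # gate pre-activations
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)       # cell state update
        h = sigmoid(o) * np.tanh(c)                        # hidden state update
    return h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(spectrogram, frames, p):
    """Run both branches in parallel, concatenate (fuse), then classify."""
    fused = np.concatenate([
        fcn_branch(spectrogram, p["k1"], p["k2"]),
        lstm_branch(frames, p["W"], p["U"], p["b"]),
    ])
    return softmax(p["Wc"] @ fused + p["bc"])

def init_params(n_feat=12, channels=4, hidden=8, n_classes=4):
    """Random, untrained weights; all sizes are illustrative assumptions."""
    return {
        "k1": rng.normal(0, 0.1, (channels, 1, 3, 3)),
        "k2": rng.normal(0, 0.1, (channels, channels, 3, 3)),
        "W": rng.normal(0, 0.1, (4 * hidden, n_feat)),
        "U": rng.normal(0, 0.1, (4 * hidden, hidden)),
        "b": np.zeros(4 * hidden),
        "Wc": rng.normal(0, 0.1, (n_classes, channels + hidden)),
        "bc": np.zeros(n_classes),
    }

params = init_params()
spectrogram = rng.normal(size=(20, 30))   # stand-in for a (freq, time) spectrogram
frames = rng.normal(size=(30, 12))        # stand-in for (time, feature) frame-level features
probs = classify(spectrogram, frames, params)
```

Here `probs` is a probability vector over the assumed emotion classes. Because the convolutional branch ends in global average pooling and the recurrent branch returns only its last hidden state, the same parameters accept utterances of different lengths, which is one common motivation for a fully convolutional design.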
