ZHU Min, JIANG Pengxu, ZHAO Li. Speech emotion recognition based on full convolution recurrent neural network[J]. Technical Acoustics, 2021, 40(5): 645-651. DOI: 10.16300/j.cnki.1000-3630.2021.05.009

Speech emotion recognition based on full convolution recurrent neural network

  • Speech emotion recognition is an active research area in human-computer interaction. However, insufficient study of the time-frequency information in speech has limited how deeply emotional information can be explored. To better capture the time-frequency information in speech, a novel fully convolutional recurrent neural network model is proposed, in which a multi-input parallel combination of two modules extracts features that serve different functions. A fully convolutional network (FCN) learns the time-frequency information in speech spectrogram features, and a long short-term memory (LSTM) network learns frame-level speech features to supplement the time-dependent information that is lost during FCN learning. Finally, the features from both modules are fused and passed to a classifier. Experiments on two public emotional datasets demonstrate the superiority of the proposed algorithm.
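
The two-branch data flow described in the abstract can be sketched roughly as follows. This is a dependency-free toy under stated assumptions, not the authors' implementation: the layer sizes, the single 3x3 convolution, the single-layer LSTM, the fusion by concatenation, the four output classes, and all function and parameter names are illustrative choices, and the weights are random rather than trained.

```python
import math
import random

random.seed(0)

def rand_matrix(rows, cols, scale=0.1):
    """Random weights (assumption: a trained model would learn these)."""
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def conv_branch(spectrogram, kernels):
    """Fully convolutional branch: 3x3 conv + ReLU + global average pooling.

    No dense layer fixes the input shape, so any spectrogram of at least
    3x3 is accepted. Returns one pooled value per kernel.
    """
    n_freq, n_time = len(spectrogram), len(spectrogram[0])
    pooled = []
    for k in kernels:
        total = 0.0
        for f in range(n_freq - 2):
            for t in range(n_time - 2):
                act = sum(k[i][j] * spectrogram[f + i][t + j]
                          for i in range(3) for j in range(3))
                total += max(act, 0.0)  # ReLU
        pooled.append(total / ((n_freq - 2) * (n_time - 2)))
    return pooled

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_branch(frames, weights, hidden_size):
    """Single-layer LSTM over frame-level features; returns the last hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in frames:
        z = x + h + [1.0]  # input, previous hidden state, bias term
        gates = [sum(w * v for w, v in zip(row, z)) for row in weights]
        for u in range(hidden_size):
            i = sigmoid(gates[u])                       # input gate
            f = sigmoid(gates[hidden_size + u])         # forget gate
            o = sigmoid(gates[2 * hidden_size + u])     # output gate
            g = math.tanh(gates[3 * hidden_size + u])   # candidate cell value
            c[u] = f * c[u] + i * g
            h[u] = o * math.tanh(c[u])
    return h

def classify(features, weights):
    """Linear layer followed by a numerically stable softmax."""
    logits = [sum(w * v for w, v in zip(row, features)) for row in weights]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(spectrogram, frames, params):
    """Run both branches in parallel, fuse by concatenation, then classify."""
    fused = (conv_branch(spectrogram, params["kernels"])
             + lstm_branch(frames, params["lstm"], params["hidden"]))
    return classify(fused, params["out"])

# Hypothetical toy dimensions, chosen only to keep the example small.
N_KERNELS, HIDDEN, FRAME_DIM, N_CLASSES = 4, 5, 6, 4
params = {
    "kernels": [rand_matrix(3, 3) for _ in range(N_KERNELS)],
    "lstm": rand_matrix(4 * HIDDEN, FRAME_DIM + HIDDEN + 1),
    "hidden": HIDDEN,
    "out": rand_matrix(N_CLASSES, N_KERNELS + HIDDEN),
}
spectrogram = rand_matrix(16, 20, scale=1.0)    # 16 frequency bins x 20 time steps
frames = rand_matrix(20, FRAME_DIM, scale=1.0)  # 20 frames x 6 features per frame
probs = predict(spectrogram, frames, params)    # one probability per emotion class
```

In this sketch the convolutional branch sees the whole time-frequency plane at once but discards ordering through pooling, while the recurrent branch processes frames in order, which is one plausible reading of how the LSTM features supplement the FCN features before fusion.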