Abstract:
This paper proposes a technique for recognizing emotion in children's speech based on an attention long short-term memory (LSTM) network. The core idea is to combine acoustic and articulatory features and to classify the fused representation with an attention-based LSTM network. This combination distinguishes the approach from existing methods in the field. In experimental validation, the proposed method improves weighted-average emotion recognition accuracy by 9.77 percentage points over a baseline that uses only acoustic features with an LSTM classifier. These results can serve as a reference for researchers working on emotion recognition in children's speech.
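The two ingredients named above, feature fusion and attention over a sequence of frame-level states, can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify how the features are fused or how attention scores are computed, so frame-wise concatenation and dot-product soft attention are assumptions made here. The names `fuse_frames`, `attention_pool`, and `query` are hypothetical, and the fused frames stand in for the LSTM hidden states that a real model would produce.

```python
import math


def fuse_frames(acoustic, articulatory):
    """Concatenate per-frame acoustic and articulatory vectors (assumed fusion scheme)."""
    if len(acoustic) != len(articulatory):
        raise ValueError("both streams must have the same number of frames")
    return [a + b for a, b in zip(acoustic, articulatory)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Soft attention: score each frame against `query`, then take the weighted sum."""
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return context, weights


# Toy input: 3 frames, 2 acoustic dimensions and 1 articulatory dimension per frame.
acoustic = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
articulatory = [[1.0], [0.0], [2.0]]

fused = fuse_frames(acoustic, articulatory)   # stand-in for LSTM hidden states
query = [0.0, 0.0, 1.0]                       # hypothetical learned attention vector
context, weights = attention_pool(fused, query)
```

In a full model the `context` vector would feed a classifier over emotion labels, and `query` would be learned during training rather than fixed by hand.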