Sound event classification and localization based on convolutional recurrent neural network

LI Qing; AO Bangqian; YAN Changguo; CHEN Xiaoyu; ZHANG Nanqing

doi:10.16300/j.cnki.1000-3630.22102701

LI Qing, AO Bangqian, YAN Changguo, et al. Sound event classification and localization based on convolutional recurrent neural network[J]. Technical Acoustics, 2025, 44(0): 1-9. DOI: 10.16300/j.cnki.1000-3630.22102701

Abstract

To classify and localize multiple sound events effectively, a convolutional recurrent neural network (CRNN) is proposed. It combines the strong feature-extraction capability of the convolutional neural network with the ability of the recurrent neural network to model long-term dependencies in sequential data. First, the sound event signals are received by a circular microphone array, and the data set is augmented to improve the generality of the source data. Second, the phase spectrum and amplitude spectrum of the sound events are fed to the neural network for training. The network outputs the classification probability and the regressed coordinate position of each sound event. A threshold is used to decide whether a sound event is active: an event is defined as active, and is localized, only when its classification probability exceeds 0.5. Finally, the CRNN is evaluated in three classification and localization scenarios: a single sound event without interference, a sound event with echoes, and mixed sound events. The results show that the proposed algorithm achieves high classification and localization performance with strong robustness in all three cases. Compared with traditional methods, the classification accuracy is improved by more than 42.29 percentage points and the localization accuracy by 14.09 percentage points. Compared with other neural networks, the classification and localization accuracy are improved by 13.61 percentage points without significantly increasing the complexity of the algorithm. The designed network structure is also robust and can be applied to sound event detection.
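
The abstract describes two concrete processing steps around the network: building amplitude and phase spectra from the array recordings, and gating localization on a 0.5 activity threshold. The sketch below illustrates only those two steps, not the CRNN itself, and rests on assumptions the abstract does not state: a Hann-windowed STFT with hypothetical defaults `n_fft=512` and `hop=256`, magnitudes and phases stacked channel-wise, and a two-branch output with one activity probability and one coordinate vector per class. The helper names `multichannel_spectra` and `decode_outputs` are hypothetical.

```python
import numpy as np


def multichannel_spectra(signals: np.ndarray, n_fft: int = 512, hop: int = 256) -> np.ndarray:
    """Return stacked amplitude and phase spectra of a multichannel recording.

    signals: shape (n_channels, n_samples), one row per microphone
             (assumed layout; the array geometry is not modelled here).
    Returns shape (2 * n_channels, n_frames, n_fft // 2 + 1): the first
    n_channels planes are magnitudes, the remaining planes are phases (rad).
    """
    n_channels, n_samples = signals.shape
    if n_samples < n_fft:
        raise ValueError("signal shorter than one FFT frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (n_samples - n_fft) // hop
    # Index matrix picking each overlapping frame: shape (n_frames, n_fft).
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signals[:, idx] * window          # (n_channels, n_frames, n_fft)
    spec = np.fft.rfft(frames, axis=-1)        # complex STFT per channel
    return np.concatenate([np.abs(spec), np.angle(spec)], axis=0)


def decode_outputs(class_probs, coords, threshold: float = 0.5):
    """Report a location only where the class is judged active.

    class_probs: (n_frames, n_classes) probabilities from the classification branch.
    coords:      (n_frames, n_classes, n_dims) output of the regression branch
                 (assumed per-class Cartesian coordinates).
    Returns a list of (frame, class_index, coordinate) tuples for active events.
    """
    class_probs = np.asarray(class_probs)
    coords = np.asarray(coords)
    active = class_probs > threshold  # strict ">" mirrors "greater than 0.5"
    events = []
    for frame, cls in zip(*np.nonzero(active)):
        events.append((int(frame), int(cls), coords[frame, cls].tolist()))
    return events


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 16000))  # hypothetical 4-mic, 1 s at 16 kHz
    print(multichannel_spectra(x).shape)
    probs = [[0.9, 0.2], [0.4, 0.7]]
    xy = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
    print(decode_outputs(probs, xy))
```

Gating the regression output on the classification output means an inactive class never contributes a spurious location, which matches the abstract's rule that localization is carried out only for active events. In a trained model the probabilities would typically come from a sigmoid per class, so several events can be active in the same frame, as in the mixed-event scenario.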
