Abstract:
To classify and localize multiple sound events effectively, a convolutional recurrent neural network is proposed that combines the strong feature extraction capability of the convolutional neural network with the recurrent neural network's ability to model long-term dependencies in sequential data. First, the sound event signal is received by a circular array and then augmented to make the source data more general. Second, the phase spectrum and amplitude spectrum of the sound event are fed into the neural network for training. The network outputs the classification probability and the regressed coordinate position of each sound event. A threshold is set to decide whether a sound event is active: localization is performed only when the classification probability exceeds 0.5, in which case the event is defined as an active event. Finally, three kinds of classification and localization tests are carried out on the convolutional recurrent neural network, and the results show that the proposed algorithm achieves high classification and localization performance and strong robustness for single sound events without interference, sound events with echo, and mixed sound events. Compared with traditional methods, the classification accuracy is improved by more than 42.29% and the localization accuracy by 14.09%. Compared with other neural networks, the classification accuracy and localization accuracy are improved by 13.61% without significantly increasing the complexity of the algorithm. The designed network structure is also strongly robust and could be used for sound event detection.
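The input features and the activity gating described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `spectral_features` and `gate_localization`, the Hann window, the FFT size of 512, and the assumption that the network emits one probability and one coordinate vector per class are all hypothetical choices made for illustration. Only the 0.5 threshold and the use of amplitude and phase spectra come from the text.

```python
import numpy as np


def spectral_features(frame, n_fft=512):
    """Amplitude and phase spectra of one multichannel frame.

    frame: array of shape (channels, samples), one row per microphone.
    The Hann window and n_fft=512 are assumed values, not from the paper.
    """
    window = np.hanning(frame.shape[-1])
    spectrum = np.fft.rfft(frame * window, n=n_fft, axis=-1)
    return np.abs(spectrum), np.angle(spectrum)


def gate_localization(class_probs, positions, threshold=0.5):
    """Keep regressed positions only for classes judged active.

    class_probs: (n_classes,) classification probabilities.
    positions:   (n_classes, n_coords) regressed coordinates (assumed layout).
    A class is active when its probability is strictly greater than threshold.
    """
    active = class_probs > threshold
    return {int(k): positions[k] for k in np.flatnonzero(active)}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal((4, 400))      # 4 microphones, 400 samples
    amplitude, phase = spectral_features(frame)
    print(amplitude.shape, phase.shape)

    probs = np.array([0.9, 0.5, 0.2])          # hypothetical network output
    coords = np.arange(6.0).reshape(3, 2)      # hypothetical (x, y) per class
    print(gate_localization(probs, coords))
```

In this sketch a class with probability exactly 0.5 is treated as inactive, following the "greater than 0.5" wording; positions for inactive classes are discarded rather than reported.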