In scenarios where reverberation, noise and multiple sound sources coexist, it is difficult to localize sound sources accurately using a single short-time Fourier transform (STFT) representation and a simple neural network. This paper proposes a multi-source localization method based on an enhanced STFT and a residual attention mechanism.
First, a multi-source localization signal model is established for conditions with reverberation and noise. Then, complementary information from the phase and magnitude components of the STFT, together with the spectral flux, is fused into a single feature representation of the microphone array signals, capturing the sound source location information embedded in those signals. Finally, channel attention and residual connections are introduced into a convolutional neural network to learn the mapping between the fused features and the sound source positions, thereby localizing the sound sources.
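The feature fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `stft` and `fused_features`, the Hann window, the frame length and hop size, and the per-bin definition of spectral flux (positive frame-to-frame magnitude change) are all assumptions chosen for illustration.

```python
import numpy as np


def stft(x, n_fft=256, hop=128):
    """Naive single-channel STFT with a Hann window; returns (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)


def fused_features(signals, n_fft=256, hop=128):
    """Stack magnitude, phase and spectral flux for each microphone channel.

    signals: array of shape (mics, samples).
    Returns an array of shape (3 * mics, frames, bins).
    """
    feats = []
    for x in signals:
        spec = stft(x, n_fft, hop)
        mag = np.abs(spec)
        phase = np.angle(spec)
        # Assumed definition: per-bin spectral flux as the positive
        # frame-to-frame magnitude change; the first frame is set to zero.
        flux = np.zeros_like(mag)
        flux[1:] = np.maximum(mag[1:] - mag[:-1], 0.0)
        feats.extend([mag, phase, flux])
    return np.stack(feats)


# Hypothetical 4-microphone array with 1024 samples of noise per channel.
signals = np.random.default_rng(0).standard_normal((4, 1024))
feats = fused_features(signals)
```

Under these assumptions, the stacked array has three feature maps per microphone and could serve as the multi-channel input to a convolutional network with channel attention and residual connections.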
The simulation results demonstrate that the proposed method accurately localizes multiple sound sources, is robust to reverberation and noise, and generalizes well to unseen data. It also outperforms three comparison methods. The experimental results show that the proposed method localizes multiple sound sources with an accuracy greater than 90% in both real outdoor and real indoor scenarios. In multi-source localization scenarios, the accuracy is improved by 7.25% compared with using a single STFT.
The proposed method accurately localizes multiple sound sources and is robust to reverberation and noise, which makes it significant for the development of sound source localization techniques.
