In complex acoustic environments, the number of usable microphones in an array can vary. Traditional sound source localization methods and single-feature approaches cannot localize accurately under these conditions, and they rely heavily on fixed microphone arrays: once the array structure changes, re-localization is required. To solve this problem, this paper proposes a variable-microphone sound source localization method based on cross-domain collaborative features and a multi-head graph attention mechanism.
First, time-frequency domain fusion features are obtained from Short-Time Fourier Transform (STFT) magnitude spectra, the Inter-channel Phase Difference (IPD), and Generalized Cross-Correlation with Phase Transform (GCC-PHAT). These three complementary features jointly provide richer and more stable acoustic cues. Then, a k-nearest neighbor (k-NN) graph is constructed for the microphone array to capture the spatial relationships among microphones. Finally, a multi-head graph attention mechanism is integrated into the graph neural network to adaptively learn the weights of neighboring nodes, enabling accurate localization even when the microphone topology changes.
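For illustration only, the feature and graph-construction stages can be sketched in Python with NumPy. The sketch below is not the implementation evaluated in this paper: the function names `gcc_phat` and `knn_edges`, the zero-padding length, the use of Euclidean distance between microphone coordinates for the k-NN search, and the value of k are all assumptions made for the example. It estimates the delay between one microphone pair with GCC-PHAT and builds a k-NN edge list that still works when microphones are removed; the STFT, IPD, and graph attention stages are not shown.

```python
import numpy as np


def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the delay of y relative to x (in seconds) with GCC-PHAT.

    Hypothetical helper for illustration; returns (delay, correlation).
    """
    n = len(x) + len(y)                      # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)                   # cross-power spectrum
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that lags run from -max_shift to +max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs, cc


def knn_edges(positions, k):
    """Return directed edges (i, j) from each microphone i to its k nearest neighbors j.

    Assumes Euclidean distance between microphone coordinates (an illustrative choice).
    """
    positions = np.asarray(positions, dtype=float)
    m = len(positions)
    k = min(k, m - 1)                        # fewer microphones -> fewer available neighbors
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)     # pairwise distance matrix, shape (m, m)
    np.fill_diagonal(dist, np.inf)           # exclude self-loops
    neighbors = np.argsort(dist, axis=1)[:, :k]
    return [(i, int(j)) for i in range(m) for j in neighbors[i]]
```

Calling `knn_edges` again on the coordinates of the surviving microphones rebuilds the graph after a failure, which is one simple way to accommodate a changing microphone count; the resulting edge list is the kind of input a graph attention layer would consume.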
Simulation results show that the proposed method accurately localizes multiple sound sources and maintains high accuracy and low error under challenging conditions such as reverberation and noise. Experiments in real-world indoor and outdoor environments show that the model achieves over 86% accuracy in multi-source localization even with damaged microphones, with localization errors within 0.47 m.
The proposed method achieves accurate and robust multi-source localization in acoustically complex scenarios with varying microphone counts, making it suitable for harsh indoor and outdoor environments and offering a new approach to sound source localization.
