Repeated Update of Demixing Vectors in Independent Low-rank Matrix Analysis for Better Separation

Nakashima, Taishi; Ono, Nobutaka

doi:10.1561/116.00000080

In this paper, we propose an improved update algorithm for independent low-rank matrix analysis (ILRMA). ILRMA has two types of parameters, demixing vectors and non-negative matrix factorization parameters, which are estimated by minimizing the same objective function. Although many extensions of ILRMA have been proposed, the importance of the order of parameter updates in ILRMA has not been sufficiently investigated. Motivated by the observation that iterative projection two (IP2) performs better than IP1, we propose repeatedly updating the demixing vectors with the source model fixed within each iteration; this approximates a simultaneous update of all demixing vectors. We conducted music source separation experiments with more than 100 songs. The results showed that the proposed algorithm with the repeated update of demixing vectors outperforms conventional ILRMA in terms of both separation performance and convergence speed.

1 Introduction

Blind source separation (BSS) aims to separate a mixture of multiple signals into the individual signals [2]. It has been used in a wide range of applications, such as multichannel audio processing [11], music information retrieval [1], wireless communication systems [10], soundscape information retrieval [9], and brain imaging [29].

Independent vector analysis (IVA) models dependencies across frequencies and avoids the permutation ambiguity by assuming a multivariate distribution for each source [3, 5]. The demixing matrices of IVA were originally estimated by gradient descent, which is sensitive to the step-size parameter and may be unstable. Auxiliary-function-based IVA (AuxIVA) was proposed to improve convergence and performance [21]. Using the majorization-minimization algorithm [8], AuxIVA estimates the demixing matrices faster and more stably than conventional IVA, without any tuning parameters. Extensions of AuxIVA to online/real-time processing [27] and its application to hearing aids [26] have also been proposed. To further improve the source model, AuxIVA was extended to independent low-rank matrix analysis (ILRMA) [7], which exploits a low-rank source model based on nonnegative matrix factorization (NMF). ILRMA models the spectrograms of the separated sources more accurately and thus achieves higher separation performance than AuxIVA. Many extensions of ILRMA to other source models have been proposed [6, 14, 15]. The update rules of the demixing matrices and source models have likewise been carried over from IVA to ILRMA.

Update rules for the demixing matrices of AuxIVA have been studied extensively in recent years. Iterative projection (IP) was initially proposed for AuxIVA; it updates one row vector of a demixing matrix (a demixing vector) at a time [21]. A pairwise update rule called iterative projection 2 (IP2), which updates two demixing vectors simultaneously, was also proposed [20]. In this paper, we refer to the original IP as IP1 to distinguish it from IP2. As an alternative approach, iterative source steering (ISS), which updates the entire demixing matrix using elementary row operations, has been proposed [24]. ISS eliminates matrix inversions and thus runs faster while keeping the same separation performance as IP1. All these update methods are derived by maximizing the likelihood of the observed mixture, which leads to a system of quadratic equations called hybrid exact-approximate joint diagonalization (HEAD) [30, 31] (also known as “sequentially drilled” joint congruence (SeDJoCo) [28]). When the number of sources is three or more, no closed-form solution of HEAD is known. IP2 yields a globally optimal solution of HEAD by solving a generalized eigenvalue problem, but only for two sources. IP1, IP2, and ISS are all applicable to ILRMA because ILRMA includes AuxIVA as a special case [7]. The original ILRMA employs IP1 to update the demixing vectors, and ILRMA with IP2 or ISS has also been proposed [17, 18]. Interestingly, IVA with IP2 [19] and ILRMA with IP2 [18] have been reported to outperform their IP1 counterparts, even with the same source model. These results imply that the way the parameters are updated also contributes to separation performance. Similarly, it has been shown that updating the source model parameters more slowly tends to improve separation performance [13].

Motivated by these studies, we here focus on the parameter updates in ILRMA. The contribution of this study is summarized as follows:

We propose a new ILRMA algorithm named ILRMA-Rep, which repeatedly updates the demixing vectors with the source model fixed in each iteration. This approximates a simultaneous update of all the demixing vectors.
We derive faster update rules for ILRMA-Rep based on the matrix inversion lemma (MIL). In particular, the update rule based on IP2 with MIL, which we call IP2-MIL, is new and also applicable to other BSS methods, such as online AuxIVA.
We conducted simulated separation experiments with 144 music signals to validate the efficacy of our proposed method. The results show that the proposed methods improve the separation performance.

The rest of this paper is organized as follows. Section 2 briefly summarizes the background of conventional ILRMA and its update rules. Section 3 describes the proposed ILRMA with repeated updates of demixing vectors (ILRMA-Rep). Section 4 evaluates ILRMA-Rep and compares it with conventional ILRMA using IP1, IP2, and ISS. Finally, Section 5 concludes this paper.

2 Overview of Problem Formulation and Conventional ILRMA

2.1 Notations

In the rest of this paper, we use lower- and uppercase bold symbols for vectors and matrices, respectively. Lowercase normal symbols denote vector entries or scalars. For example, x_kft denotes the kth entry of a vector x_ft. The transpose and the conjugate transpose of a vector or matrix are denoted by (·)^⊤ and (·)^H, respectively. We denote by ℂ and ℝ₊ the sets of all complex numbers and non-negative real numbers, respectively. Unless specified otherwise, the indices f, t, and m always range from 1 to F, T, and M, respectively. We also omit the bounds of sums or products over these indices when they span the full ranges. For example, Σ_k is the sum over k = 1, …, K, and Σ_ft is the double sum over f = 1, …, F and t = 1, …, T. Similarly, {W_f}_f denotes the set of W_f for all f, and {x_kft}_kft denotes the set of x_kft for all k, f, and t.

2.2 Signal Model

We formulate the short-time Fourier transform (STFT) domain BSS as

x_{f t} = A_{f} s_{f t},

(1)

where x_ft ∈ ℂ^M denotes the mixture signals observed by M microphones and s_ft ∈ ℂ^K denotes the signals of K sources at frequency f and time t. Here, A_f = [α_1f … α_Kf] ∈ ℂ^{M×K} is the mixing matrix whose kth column vector α_kf is the steering vector from source k to the microphones. The goal of BSS is to estimate the frequency-wise demixing matrices

W_{f} = \begin{bmatrix} w_{1f} & \cdots & w_{Kf} \end{bmatrix}^{H} \in \mathbb{C}^{K \times M},

(2)

such that the estimated sources are

y_{f t} = W_{f} x_{f t},

(3)

given only the mixture signals x_ft, where y_ft ∈ ℂ^K denotes the estimated signals. The vector w_kf, whose conjugate transpose is the kth row of W_f, is called the demixing vector for source k. Henceforth, we consider the determined case, in which the number of sources equals the number of microphones, i.e., M = K.

2.3 Independent Low-rank Matrix Analysis

We briefly summarize ILRMA as a maximum likelihood estimation problem. The likelihood function is derived under the following assumptions.

All the sources are statistically independent; their joint distribution is the product of their individual distributions.
The estimated signals follow the zero-mean circularly symmetric complex Gaussian distribution
$p_{y} (y_{k f t}) = \frac{1}{π r_{k f t}} \exp (- \frac{{| y_{k f t} |}^{2}}{r_{k f t}}),$
(4)
where r_kft ∈ ℝ₊ is the variance of the kth source at the time-frequency point (f, t).
The variance r_kft is modeled by NMF as a sum of L nonnegative components:
$r_{k f t} = \sum_{l = 1}^{L} b_{k f l} c_{k l t},$
(5)
where b_kfl and c_klt ∈ ℝ₊ are called the basis and activation coefficients, respectively. Henceforth, we call these coefficients the source model parameters.

From (3), the kth estimated signal is $y_{kft} = w_{kf}^{H} x_{ft}$. Under these assumptions, the likelihood of the observation is given by

\mathcal{L}(\mathcal{W}) = \prod_{f,t} p_{x}(x_{ft})

(6)

= \prod_{f,t} p_{y}(y_{ft}) \, |\det W_{f}|^{2}

(7)

= \prod_{f,t} \Big( \prod_{k} p_{y}(y_{kft}) \Big) |\det W_{f}|^{2},

(8)

where p_x is the probability density function of the observed signals and 𝒲 = {W_f}_f. The factor |det W_f|² is the Jacobian that arises from the change of variables from x_ft to y_ft. Taking the negative logarithm of (8) and substituting (4), ILRMA estimates the parameters by minimizing the following objective function:

\mathcal{L}^{+}(\mathcal{W}, \mathcal{B}, \mathcal{C}) = \sum_{f,t} \Big[ \sum_{k} \Big( \frac{|w_{kf}^{H} x_{ft}|^{2}}{r_{kft}} + \log r_{kft} \Big) - \log |\det W_{f}|^{2} \Big] + \text{const.},

(9)

where ℬ = {b_kfl}_kfl and 𝒞 = {c_klt}_klt. The aim of ILRMA is thus to estimate the demixing vectors w_kf and the source model parameters b_kfl and c_klt that minimize (9), given only the observed mixture x_ft.

2.4 Update of Source Model Parameters

We can derive the following multiplicative update rules of the source model parameters by applying the auxiliary-function method to (9) [7]:

b_{k f l} \leftarrow b_{k f l} \cdot \frac{\sum_{t} {| y_{k f t} |}^{2} c_{k l t} {(\sum_{i} b_{k f i} c_{k i t})}^{- 2}}{\sum_{t} c_{k l t} {(\sum_{i} b_{k f i} c_{k i t})}^{- 1}},

(10)

c_{k l t} \leftarrow c_{k l t} \cdot \frac{\sum_{f} {| y_{k f t} |}^{2} b_{k f l} {(\sum_{i} b_{k f i} c_{k i t})}^{- 2}}{\sum_{f} b_{k f l} {(\sum_{i} b_{k f i} c_{k i t})}^{- 1}} .

(11)

These update rules guarantee that (9) is monotonically non-increasing [7].
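The multiplicative updates (10) and (11) map directly onto batched matrix products. The following is a minimal NumPy sketch, assuming the separated signals are stored as an array `Y` of shape (K, F, T), the bases as `B` of shape (K, F, L), and the activations as `C` of shape (K, L, T); the function names are hypothetical. Recomputing r_kft between (10) and (11) is our assumption about the update order.

```python
import numpy as np


def update_source_model(Y, B, C):
    """One pass of the multiplicative updates (10) and (11).

    Y: separated signals y_kft, shape (K, F, T), complex
    B: bases b_kfl, shape (K, F, L), positive
    C: activations c_klt, shape (K, L, T), positive
    Returns the updated B, C and the variances r_kft = B @ C, shape (K, F, T).
    """
    P = np.abs(Y) ** 2                          # |y_kft|^2
    Ct = C.transpose(0, 2, 1)                   # c_klt indexed as (k, t, l)
    R = B @ C                                   # (5): r_kft = sum_l b_kfl c_klt
    B = B * ((P / R**2) @ Ct) / ((1 / R) @ Ct)  # (10), sums over t
    R = B @ C                                   # refresh r_kft with the new bases
    Bt = B.transpose(0, 2, 1)                   # b_kfl indexed as (k, l, f)
    C = C * (Bt @ (P / R**2)) / (Bt @ (1 / R))  # (11), sums over f
    return B, C, B @ C


def source_model_cost(Y, R):
    """Source-model-dependent part of (9): sum_kft |y_kft|^2 / r_kft + log r_kft."""
    return float(np.sum(np.abs(Y) ** 2 / R + np.log(R)))
```

`source_model_cost` evaluates only the part of (9) that depends on the source model, which is convenient for checking that the updates decrease it.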

2.5 Update of Demixing Vectors

We summarize the widely used update rules of demixing vectors for AuxIVA and ILRMA, on which our proposed method in the next section is based.

2.5.1 Sequential Update of Demixing Vector: Iterative Projection One (IP1)

To derive the update rules of the demixing vectors, we set the derivative of (9) with respect to w_kf (k = 1, …, K) to zero, which yields the following system of quadratic equations:

w_{mf}^{H} U_{kf} w_{kf} = \begin{cases} 0 & (m \neq k), \\ 1 & (m = k), \end{cases} \quad (m, k = 1, \dots, K),

(12)

where U_kf ∈ ℂ^K×K is called the weighted covariance matrix defined as

U_{k f} = \frac{1}{T} \sum_{t} \frac{x_{f t} x_{f t}^{H}}{r_{k f t}} .

This system of quadratic equations is called hybrid exact-approximate joint diagonalization (HEAD) [28, 30, 31]. When K = 2, the closed-form solution of (12) can be derived by solving a generalized eigenvalue problem [19, 25]. However, for K ≥ 3, the closed-form solution of the HEAD problem has not yet been found. Instead, we minimize the objective function (9) with respect to only one demixing vector w_kf while keeping the others fixed. The resulting update rule is given by

w_{k f} \leftarrow U_{k f}^{- 1} W_{f}^{- 1} e_{k},

(13)

w_{k f} \leftarrow \frac{w_{k f}}{\sqrt{w_{k f}^{H} U_{k f} w_{k f}}} .

(14)

It is referred to as IP1 [21]. Algorithm 1 summarizes the conventional ILRMA with IP1.
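A minimal NumPy sketch of one IP1 step (13)-(14) at a single frequency bin is given below. It assumes that W_f is stored with w_kf^H as its kth row and that the K weighted covariance matrices are stacked into an array of shape (K, K, K); the function names `ip1_update` and `objective_per_bin` are hypothetical. Instead of forming U_kf⁻¹ W_f⁻¹ explicitly, it solves the equivalent linear system (W_f U_kf) w = e_k.

```python
import numpy as np


def ip1_update(W, U, k):
    """IP1 update (13)-(14) of the kth demixing vector at one frequency bin.

    W: demixing matrix W_f, shape (K, K); row k holds w_kf^H
    U: weighted covariance matrices U_kf stacked along axis 0, shape (K, K, K)
    """
    K = W.shape[0]
    e_k = np.eye(K)[:, k]
    # (13): w_kf <- U_kf^{-1} W_f^{-1} e_k = (W_f U_kf)^{-1} e_k, via a linear solve
    w = np.linalg.solve(W @ U[k], e_k)
    # (14): rescale so that w_kf^H U_kf w_kf = 1
    w = w / np.sqrt(np.real(w.conj() @ U[k] @ w))
    W = W.astype(complex)                 # copy; only row k changes
    W[k] = w.conj()                       # store w_kf^H as the kth row
    return W


def objective_per_bin(W, U):
    """(9) at one frequency bin divided by T, with r_kft fixed:
    sum_k w_kf^H U_kf w_kf - log|det W_f|^2 (constants dropped)."""
    quad = sum(np.real(W[k] @ U[k] @ W[k].conj()) for k in range(W.shape[0]))
    return quad - np.log(np.abs(np.linalg.det(W)) ** 2)
```

Because IP1 minimizes (9) exactly with respect to one demixing vector, an IP1 step should never increase `objective_per_bin`.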

2.5.2 Pairwise Update Rules of Demixing Vectors: Iterative Projection Two (IP2)

The closed-form solution of HEAD is available for two sources by solving the following generalized eigenvalue problem:

U_{2 f} w_{k f} = λ_{k f} U_{1 f} w_{k f} (k = 1, 2),

(15)

where λ_1f and λ_2f (λ_1f ≥ λ_2f) are the generalized eigenvalues associated with w_1f and w_2f, and each eigenvector is then scaled as in (14). Therefore, the two demixing vectors w_1f and w_2f can be updated simultaneously [20, 32].

Algorithm 1

Conventional ILRMA with IP1.

For K ≥ 3, a method that jointly updates two demixing vectors was proposed in [19]. It was originally developed for AuxIVA and achieved higher separation performance with fewer iteration steps. Setting ∂ℒ⁺/∂w_mf = 0 and ∂ℒ⁺/∂w_nf = 0 (m ≠ n) yields the following system of 2K quadratic equations, where (18) holds for all k ≠ m, n:

w_{m f}^{H} U_{m f} w_{m f} = 1, w_{m f}^{H} U_{n f} w_{n f} = 0,

(16)

w_{n f}^{H} U_{m f} w_{m f} = 0, w_{n f}^{H} U_{n f} w_{n f} = 1,

(17)

w_{k f}^{H} U_{m f} w_{m f} = 0, w_{k f}^{H} U_{n f} w_{n f} = 0.

(18)

The pairwise update rule of w_ℓf (ℓ = m, n) is derived by solving (16)-(18) as follows [19]:

P_{ℓ f} \leftarrow U_{ℓ f}^{- 1} W_{f}^{- 1} [\begin{matrix} e_{m} & e_{n} \end{matrix}],

(19)

Z_{ℓ f} \leftarrow P_{ℓ f}^{H} U_{ℓ f} P_{ℓ f},

(20)

w_{ℓ f} \leftarrow \frac{P_{ℓ f} z_{ℓ f}}{\sqrt{z_{ℓ f}^{H} Z_{ℓ f} z_{ℓ f}}},

(21)

where z_mf and z_nf are generalized eigenvectors of the 2 × 2 problem Z_mf z = λ Z_nf z (see [20] for details). Note that the two indices m and n can be chosen arbitrarily as long as m ≠ n.
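The pairwise update (19)-(21) can be sketched in NumPy as follows. The 2 × 2 generalized eigenproblem Z_mf z = λ Z_nf z is reduced to a standard Hermitian one through a Cholesky factorization of Z_nf. Assigning the eigenvector with the largest eigenvalue to source m and the one with the smallest to source n is our own reading of the derivation (it yields the larger |det W_f|), not a detail stated above; for K = 2 with (m, n) = (1, 2), it matches the ordering λ_1f ≥ λ_2f in (15). The function name is hypothetical; rows of `W` hold w_kf^H, and `U` stacks the U_kf along its first axis.

```python
import numpy as np


def ip2_update(W, U, m, n):
    """Pairwise IP2 update (19)-(21) of w_mf and w_nf at one frequency bin.

    W: demixing matrix W_f, shape (K, K); row k holds w_kf^H
    U: weighted covariance matrices U_kf stacked along axis 0, shape (K, K, K)
    """
    K = W.shape[0]
    E = np.eye(K)[:, [m, n]]                                 # [e_m e_n]
    # (19): P_l = U_l^{-1} W^{-1} [e_m e_n] = (W U_l)^{-1} [e_m e_n]
    P = {l: np.linalg.solve(W @ U[l], E) for l in (m, n)}
    # (20): 2 x 2 Hermitian matrices Z_l = P_l^H U_l P_l
    Z = {l: P[l].conj().T @ U[l] @ P[l] for l in (m, n)}
    # Generalized eigenproblem Z_m z = lam Z_n z, reduced to a standard
    # Hermitian problem with the Cholesky factor Z_n = C C^H.
    C_inv = np.linalg.inv(np.linalg.cholesky(Z[n]))
    _, V = np.linalg.eigh(C_inv @ Z[m] @ C_inv.conj().T)     # ascending eigenvalues
    V = C_inv.conj().T @ V                                   # back to z coordinates
    # Assumed assignment: largest eigenvalue -> source m, smallest -> source n.
    z = {m: V[:, -1], n: V[:, 0]}
    W = W.astype(complex)                                    # copy; rows m, n replaced
    for l in (m, n):
        # (21): w_l = P_l z_l / sqrt(z_l^H Z_l z_l)
        w = P[l] @ z[l] / np.sqrt(np.real(z[l].conj() @ Z[l] @ z[l]))
        W[l] = w.conj()
    return W
```

After the update, columns m and n of the HEAD equations (12) hold exactly, which is what the pairwise rule is designed to achieve.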

2.5.3 Iterative Source Steering (ISS)

Instead of re-estimating the demixing vector w_kf as in IP1, ISS [24] updates the entire demixing matrix through a rank-one modification with a new vector v_kf = [v_1kf … v_Kkf]^⊤:

W_{f} \leftarrow W_{f} - v_{k f} w_{k f}^{H},

(22)

where the update rule that minimizes (9) with respect to v_mkf is given by

v_{mkf} = \begin{cases} 1 - (w_{kf}^{H} U_{kf} w_{kf})^{-\frac{1}{2}} & (m = k), \\ \dfrac{w_{mf}^{H} U_{mf} w_{kf}}{w_{kf}^{H} U_{mf} w_{kf}} & (m \neq k). \end{cases}

(23)

Furthermore, because $y_{kft} = w_{kf}^{H} x_{ft}$, (23) can be evaluated directly from the estimated signals, which yields the following inverse-free update rules of v_kf and the estimated signals y_ft:

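v_{mkf} = \begin{cases} 1 - \Big( \frac{1}{T} \sum_{t} \frac{|y_{kft}|^{2}}{r_{kft}} \Big)^{-\frac{1}{2}} & (m = k), \\ \dfrac{\sum_{t} y_{mft} y_{kft}^{*} / r_{mft}}{\sum_{t} |y_{kft}|^{2} / r_{mft}} & (m \neq k), \end{cases}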

(24)

y_{f t} \leftarrow y_{f t} - v_{k f} y_{k f t} .

(25)
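A minimal NumPy sketch of one ISS step based on (22)-(23) is shown below, under hypothetical conventions: rows of `W` hold w_mf^H, and `U` stacks the U_mf along its first axis. No matrix is inverted; the new demixing matrix is a rank-one correction of the old one.

```python
import numpy as np


def iss_update(W, U, k):
    """ISS update (22)-(23) driven by source k at one frequency bin.

    W: demixing matrix W_f, shape (K, K); row m holds w_mf^H
    U: weighted covariance matrices U_mf stacked along axis 0, shape (K, K, K)
    """
    K = W.shape[0]
    w_k = W[k].conj()                                   # column vector w_kf
    v = np.empty(K, dtype=complex)
    for m in range(K):
        denom = np.real(w_k.conj() @ U[m] @ w_k)        # w_kf^H U_mf w_kf > 0
        if m == k:
            v[m] = 1.0 - denom ** -0.5                  # (23), case m = k
        else:
            v[m] = (W[m] @ U[m] @ w_k) / denom          # (23), case m != k
    # (22): W_f <- W_f - v_kf w_kf^H, a rank-one update of every row
    return W - np.outer(v, W[k])
```

Since (23) minimizes (9) exactly with respect to v_kf, each step leaves the objective non-increasing; the signal-domain variant (24)-(25) would replace the quadratic forms with sums over y_ft and is omitted here.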

3 Repeated Update of Demixing Vectors for ILRMA

3.1 Our Motivation and Approach

IVA with IP2 [19] and ILRMA with IP2 [18] outperformed their IP1 counterparts. These results suggest that performance may improve further if more demixing vectors, ideally all of them, are updated simultaneously, i.e.,

W_{f} \leftarrow \arg \min_{W_{f}} ℒ^{+} (W, ℬ, C) .

(26)

However, (26) amounts to solving the HEAD problem, for which no closed-form solution is known for K ≥ 3, as mentioned in the previous section.

Instead of solving the HEAD problem in closed form, we propose an “approximately” simultaneous update of the demixing vectors: we simply repeat the IP1, IP2, or ISS updates several times while keeping the source model parameters fixed. Since no update increases the objective function (9), repeating the demixing matrix updates is expected to drive the demixing matrix toward a solution of the HEAD problem. This can be checked as follows. From (12), we rewrite the HEAD problem as

W_{f} \begin{bmatrix} U_{1f} w_{1f} & \cdots & U_{Kf} w_{Kf} \end{bmatrix} = E_{K},

(27)

where E_K is the K × K identity matrix. Therefore, we can check how close the demixing matrix is to a solution of HEAD by visualizing the left-hand side of (27). Figure 1 shows an example of convergence toward the solution of HEAD over the iteration steps. Each colormap shows the element-wise absolute values of the left-hand side of (27). The initial demixing matrices were set to the identity matrices, and the weighted covariance matrices were set to random Hermitian matrices. As shown in Figure 1, repeated updates of the demixing vectors considerably improve the estimated solution of HEAD. This example suggests that a set of repeated updates of demixing vectors can work as a simultaneous update of demixing vectors.
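The convergence check behind Figure 1 can be reproduced in miniature. The sketch below repeats IP1 sweeps with the weighted covariance matrices fixed and records, after each sweep, the Frobenius distance between the left-hand side of (27) and E_K. It covers only IP1 and assumes random Hermitian positive definite U_kf; the function names and the use of the Frobenius norm as a one-number summary are hypothetical choices.

```python
import numpy as np


def head_lhs(W, U):
    """Left-hand side of (27): W_f [U_1f w_1f ... U_Kf w_Kf]."""
    cols = [U[k] @ W[k].conj() for k in range(W.shape[0])]   # U_kf w_kf
    return W @ np.stack(cols, axis=1)


def ip1_sweep(W, U):
    """One pass of IP1 (13)-(14) over all sources with U_kf fixed."""
    W = W.astype(complex)
    K = W.shape[0]
    for k in range(K):
        w = np.linalg.solve(W @ U[k], np.eye(K)[:, k])
        W[k] = (w / np.sqrt(np.real(w.conj() @ U[k] @ w))).conj()
    return W


def repeated_ip1(W, U, n_rep):
    """Repeat IP1 sweeps n_rep times; return W and the HEAD residual per sweep."""
    residuals = []
    for _ in range(n_rep):
        W = ip1_sweep(W, U)
        residuals.append(np.linalg.norm(head_lhs(W, U) - np.eye(W.shape[0])))
    return W, residuals
```

A residual that shrinks over the sweeps corresponds to the colormaps in Figure 1 approaching a bright diagonal on a dark background.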

Figure 1

Convergence of demixing vectors to the solution of HEAD by repeated updates, shown for IP1, IP2, and ISS after 1, 2, 5, 10, and 20 iterations. The ideal solution is the identity matrix: only the diagonal elements are bright, and the others are dark. This example is for four sources. Each panel shows the element-wise absolute values of the left-hand side of (27) after applying the corresponding update method.

The outline of the proposed procedures is summarized in Algorithm 2. We call this algorithm ILRMA-Rep. Figure 2 illustrates how the demixing vectors and the source model parameters are updated.
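Since the listing of Algorithm 2 is not reproduced here, the loop structure of ILRMA-Rep with IP1 is sketched below in NumPy. It is a simplified illustration under several assumptions, not the authors' implementation: the mixture STFT is given as an array of shape (F, T, K), the source model is updated once per iteration by (10)-(11) with the flooring of (45), U_kf is then rebuilt from the fixed variances, and IP1 is repeated n_rep times per frequency bin. Setting n_rep = 1 gives conventional ILRMA with IP1. The function name, its defaults, and the update order within an iteration are hypothetical.

```python
import numpy as np


def ilrma_rep_ip1(X, n_bases=2, n_iter=30, n_rep=5, eps=1e-6, seed=0):
    """Simplified ILRMA-Rep with IP1. X: STFT of the mixture, shape (F, T, K).

    Returns the demixing matrices W (F, K, K), rows w_kf^H, and the
    separated signals Y (K, F, T). n_rep = 1 gives conventional ILRMA (IP1).
    """
    F, T, K = X.shape
    rng = np.random.default_rng(seed)
    W = np.tile(np.eye(K, dtype=complex), (F, 1, 1))               # W_f = identity
    B = np.ones((K, F, n_bases))                                    # b_kfl = 1
    C = rng.uniform(eps, 1.0, (K, n_bases, T))                      # c_klt > 0
    for _ in range(n_iter):
        Y = np.einsum("fkm,ftm->kft", W, X)                         # y_kft = w_kf^H x_ft
        P = np.abs(Y) ** 2
        # Source model: (10), then (11), each followed by the flooring (45).
        R = B @ C
        B = np.maximum(B * ((P / R**2) @ C.transpose(0, 2, 1))
                       / ((1 / R) @ C.transpose(0, 2, 1)), eps)
        R = B @ C
        C = np.maximum(C * (B.transpose(0, 2, 1) @ (P / R**2))
                       / (B.transpose(0, 2, 1) @ (1 / R)), eps)
        R = B @ C
        # Weighted covariances U_kf = (1/T) sum_t x_ft x_ft^H / r_kft, fixed below.
        U = np.einsum("ftm,ftn,kft->kfmn", X, X.conj(), 1 / R) / T
        for f in range(F):
            for _ in range(n_rep):                                  # repeated update
                for k in range(K):                                  # IP1, (13)-(14)
                    w = np.linalg.solve(W[f] @ U[k, f], np.eye(K)[:, k])
                    w = w / np.sqrt(np.real(w.conj() @ U[k, f] @ w))
                    W[f, k] = w.conj()
    return W, np.einsum("fkm,ftm->kft", W, X)
```

The only difference from conventional ILRMA in this sketch is the `n_rep` loop, which reuses the same U_kf several times before the source model is updated again.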

3.2 Efficient Algorithm for Repeated Update

Updates of the demixing matrices with IP1 or IP2 include a matrix inversion, which can make repeated updates of the demixing vectors computationally expensive. However, because the weighted covariance matrices are fixed within each repeat, their inverses need to be computed only once, and because each IP1 or IP2 step changes only one or two rows of W_f, the inverse of W_f can be updated by the matrix inversion lemma (MIL) instead of being recomputed. In this section, we derive an efficient algorithm for the repeated update of demixing vectors.

Algorithm 2

Proposed ILRMA-Rep.

Figure 2

Update diagrams of (a) conventional ILRMA and (b) the proposed ILRMA-Rep, shown for K = 3 as an example. Bright boxes in each subfigure indicate the update of the corresponding parameters.

MIL states that:

{(X + Ψ Ω)}^{- 1} = X^{- 1} - X^{- 1} Ψ {(E_{J} + Ω X^{- 1} Ψ)}^{- 1} Ω X^{- 1},

(28)

where X is an invertible K × K matrix, Ψ and Ω are K × J and J × K matrices (J ≤ K), respectively, E_J is the J × J identity matrix, and E_J + ΩX⁻¹Ψ is assumed to be invertible.
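The lemma (28) is easy to verify numerically. In the NumPy sketch below (the function name is hypothetical), only the J × J matrix E_J + ΩX⁻¹Ψ is factorized, which is what makes the rank-one (J = 1) and rank-two (J = 2) updates of W_f⁻¹ cheap.

```python
import numpy as np


def mil_inverse(X_inv, Psi, Omega):
    """(X + Psi Omega)^{-1} from X^{-1} via the matrix inversion lemma (28).

    X_inv: (K, K) inverse of X;  Psi: (K, J);  Omega: (J, K).
    Only the J x J matrix E_J + Omega X^{-1} Psi is factorized.
    """
    J = Psi.shape[1]
    G = np.eye(J) + Omega @ X_inv @ Psi
    return X_inv - X_inv @ Psi @ np.linalg.solve(G, Omega @ X_inv)
```

Using `np.linalg.solve` instead of explicitly inverting G is a minor numerical choice; for J = 2 either option is inexpensive.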

We first remove the matrix inversion of W_f from IP1 in (13). Let $A_{f} = W_{f}^{-1}$ and $a_{kf} = A_{f} e_{k}$. If A_f is updated together with W_f, (13) reduces to $\hat{w}_{kf} = U_{kf}^{-1} a_{kf}$, where $\hat{w}_{kf}$ denotes the updated demixing vector on the left-hand side of (13). The remaining question is how to update A_f without matrix inversion. Since IP1 replaces only the kth row of W_f, we rewrite the update as

W_{f} \leftarrow W_{f} + e_{k} (\hat{w}_{kf} - w_{kf})^{H},

(29)

Then, by applying the matrix inversion lemma (28) with X ← W_f, Ψ ← e_k, and $Ω \leftarrow (\hat{w}_{kf} - w_{kf})^{H}$, we obtain an update rule of A_f that requires no matrix inversion. The resulting update rules are summarized as follows.

{\hat{w}}_{k f} \leftarrow U_{k f}^{- 1} a_{k f},

(30)

{\hat{w}}_{k f} \leftarrow \frac{{\hat{w}}_{k f}}{\sqrt{{\hat{w}}_{k f}^{H} U_{k f} {\hat{w}}_{k f}}},

(31)

d_{k f} : = {\hat{w}}_{k f} - w_{k f}

(32)

A_{f} \leftarrow A_{f} - \frac{a_{k f} (d_{k f}^{H} A_{f})}{1 + d_{k f}^{H} a_{k f}},

(33)

W_{f} \leftarrow W_{f} + e_{k} d_{k f}^{H} .

(34)

These update rules were first derived in the context of online BSS [27]. Henceforth, we call this version IP1-MIL.
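A minimal NumPy sketch of one IP1-MIL sweep (30)-(34) over all sources at one frequency bin follows. It assumes that U_kf⁻¹ has been precomputed once per iteration, that A_f = W_f⁻¹ is carried along with W_f, and a hypothetical layout (rows of `W` hold w_kf^H; `U` and `U_inv` have shape (K, K, K)). No K × K matrix is inverted inside the sweep.

```python
import numpy as np


def ip1_mil_sweep(W, A, U_inv, U):
    """One IP1-MIL sweep (30)-(34) over all sources at one frequency bin.

    W:     demixing matrix W_f, shape (K, K); row k holds w_kf^H
    A:     its inverse A_f = W_f^{-1}, shape (K, K)
    U_inv: precomputed inverses U_kf^{-1}, shape (K, K, K)
    U:     weighted covariance matrices U_kf, shape (K, K, K)
    Returns updated copies of (W, A); A is kept equal to W^{-1} by (33).
    """
    W, A = W.astype(complex), A.astype(complex)
    for k in range(W.shape[0]):
        a_k = A[:, k].copy()                                   # a_kf = A_f e_k
        w_hat = U_inv[k] @ a_k                                 # (30)
        w_hat = w_hat / np.sqrt(np.real(w_hat.conj() @ U[k] @ w_hat))  # (31)
        d_H = w_hat.conj() - W[k]                              # (32), stored as d_kf^H
        A = A - np.outer(a_k, d_H @ A) / (1.0 + d_H @ a_k)     # (33)
        W[k] = W[k] + d_H                                      # (34): row k becomes w_hat^H
    return W, A
```

In exact arithmetic, the result equals that of plain IP1 with explicit inverses, and `A` stays equal to W_f⁻¹.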

Likewise, by setting X ← W_f, Ψ ← [e_m e_n], and $Ω \leftarrow [\hat{w}_{mf} - w_{mf} \;\; \hat{w}_{nf} - w_{nf}]^{H}$ in (28), we obtain the efficient update rule of IP2:

P_{ℓf} \leftarrow U_{ℓf}^{-1} \begin{bmatrix} a_{mf} & a_{nf} \end{bmatrix} \quad (ℓ = m, n),

(35)

Z_{ℓ f} \leftarrow P_{ℓ f}^{H} U_{ℓ f} P_{ℓ f} (ℓ = m, n),

(36)

{\hat{w}}_{ℓ f} \leftarrow \frac{P_{ℓ f} z_{ℓ f}}{\sqrt{z_{ℓ f}^{H} Z_{ℓ f} z_{ℓ f}}}, (ℓ = m, n),

(37)

d_{ℓ f} : = {\hat{w}}_{ℓ f} - w_{ℓ f} (ℓ = m, n),

(38)

G := E_{2} + \begin{bmatrix} d_{mf}^{H} \\ d_{nf}^{H} \end{bmatrix} \begin{bmatrix} a_{mf} & a_{nf} \end{bmatrix},

(39)

A_{f} \leftarrow A_{f} - [\begin{matrix} a_{m f} & a_{n f} \end{matrix}] G^{- 1} ([\begin{matrix} d_{m f}^{H} \\ d_{n f}^{H} \end{matrix}] A_{f}),

(40)

W_{f} \leftarrow W_{f} + e_{m} d_{m f}^{H} + e_{n} d_{n f}^{H},

(41)

where E₂ is the 2 × 2 identity matrix. Note that G is a matrix of shape 2 × 2 and thus the cost of its inversion is much less than those of W_f and U_kf. To the best of our knowledge, this is the first derivation of IP2 with the MIL. Henceforth, we call this version IP2-MIL.
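The IP2-MIL update (35)-(41) for one pair (m, n) can be sketched in NumPy as below, assuming precomputed U_kf⁻¹, a carried inverse A_f = W_f⁻¹, and rows of `W` holding w_kf^H. The 2 × 2 generalized eigenproblem Z_mf z = λ Z_nf z for z_mf and z_nf is solved through a Cholesky factorization, and assigning the largest eigenvalue to source m is our assumption. Only 2 × 2 matrices are factorized or inverted. The function name is hypothetical.

```python
import numpy as np


def ip2_mil_update(W, A, U_inv, U, m, n):
    """IP2-MIL update (35)-(41) of the pair (m, n) at one frequency bin.

    W: W_f, shape (K, K), rows w_kf^H;  A: A_f = W_f^{-1}
    U_inv, U: U_kf^{-1} and U_kf stacked along axis 0, shape (K, K, K)
    Returns updated copies of (W, A); only 2 x 2 matrices are factorized.
    """
    W, A = W.astype(complex), A.astype(complex)
    A_mn = A[:, [m, n]]                                        # [a_mf a_nf]
    P = {l: U_inv[l] @ A_mn for l in (m, n)}                   # (35)
    Z = {l: P[l].conj().T @ U[l] @ P[l] for l in (m, n)}       # (36)
    # Generalized eigenvectors of Z_m z = lam Z_n z via Z_n = C C^H (Cholesky).
    C_inv = np.linalg.inv(np.linalg.cholesky(Z[n]))
    _, V = np.linalg.eigh(C_inv @ Z[m] @ C_inv.conj().T)       # ascending eigenvalues
    V = C_inv.conj().T @ V
    z = {m: V[:, -1], n: V[:, 0]}                              # assumed assignment
    D_H = np.empty((2, W.shape[1]), dtype=complex)             # rows d_mf^H, d_nf^H
    for i, l in enumerate((m, n)):
        w_hat = P[l] @ z[l] / np.sqrt(np.real(z[l].conj() @ Z[l] @ z[l]))  # (37)
        D_H[i] = w_hat.conj() - W[l]                           # (38)
    G = np.eye(2) + D_H @ A_mn                                 # (39)
    A = A - A_mn @ np.linalg.solve(G, D_H @ A)                 # (40)
    W[[m, n]] = W[[m, n]] + D_H                                # (41)
    return W, A
```

Because (35) uses the carried columns a_mf and a_nf in place of W_f⁻¹[e_m e_n], the result matches the original IP2 update (19)-(21) in exact arithmetic.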

3.3 Complexity Analysis

This subsection outlines the time complexity of each demixing matrix update method per iteration and frequency bin. For IP1 and IP2, computing (W_fU_kf)⁻¹ in (13) and (19), respectively, takes O(K³) time for each source k. For ISS with (23), computing v_kf takes O(K³) time. Therefore, updating the demixing matrix W_f for all sources k = 1, …, K takes O(K⁴) time with any of these methods. In ILRMA-Rep, the weighted covariance matrices U_kf are fixed, and repeating the demixing matrix updates takes O(N_rep K⁴) time, where N_rep is the number of repeats. The MIL allows W_f⁻¹ to be updated efficiently: the O(K⁴) cost (inverting U_kf for all k) is incurred only in the first repeat, and each of the subsequent (N_rep − 1) repeats costs O(K³). Therefore, IP1-MIL and IP2-MIL take O(N_rep K³ + K⁴) time in total for all sources. Note that the MIL cannot be applied to ISS because ISS does not compute W_f⁻¹. Alternatively, ISS can use (24) instead of (23). With (24), computing v_kf takes O(TK) time, so updating W_f for all sources k = 1, …, K takes O(TK²) time, and ILRMA-Rep with ISS by (24) takes O(N_rep TK²) in total. Repeating the update by (24) should therefore take more time than ISS by (23), because T ≫ K² in most practical BSS settings. Hence, we use (23) for the ISS update in the rest of this paper. Table 1 summarizes the complexities of the update rules.

Table 1

Complexities per iteration and frequency bin.

Method	U_kf (∀k)	w_kf or v_kf (∀k)
IP1, IP2	O(TK³)	O(N_repK⁴)
IP1-MIL, IP2-MIL	O(TK³)	O(N_repK³ + K⁴)
ISS by (23)	O(TK³)	O(N_repK⁴)
ISS by (24)	not required	O(N_repTK²)

4 Experimental Validation

4.1 Solving the HEAD Problem

4.1.1 Runtime Comparison

Since actual runtimes may differ from the theoretical complexities derived in Section 3.3, we first compare the runtimes of the different methods. The simulation was implemented in Python and run on a workstation with a 96-core AMD EPYC 7643 processor at 2.3 GHz. We generated sets of K random Hermitian matrices U_kf (k = 1, …, K) by the following procedure:

Generate random complex values $x_{kft} := y_{kft} + j z_{kft}$ (∀k, f, t), where $y_{kft}, z_{kft} \sim \mathcal{N}(0, 1)$.
Generate random observed signals as x_ft := [x_1ft … x_Kft]^⊤ (∀f, t).
Calculate random covariance matrices as
$U_{k f} \leftarrow \frac{1}{T} \sum_{t} \frac{x_{f t} x_{f t}^{H}}{\sqrt{\sum_{f} {| x_{k f t} |}^{2}}} (\forall k, f) .$

The initial demixing matrices were set to the identity matrix. We set the number of time frames T to 100 and the number of frequency bins F to 257. For ISS, we updated the demixing matrices by (23) in this experiment because T is large. We ran 20 iterations of IP1, IP1-MIL, IP2, IP2-MIL, and ISS for K = 2, 4, 6, 8, 10, 12, 14, and 16. For each K, the covariance matrices were kept fixed. Table 2 shows the average runtimes of IP1, IP1-MIL, IP2, IP2-MIL, and ISS over 100 trials.
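The three-step generation of random weighted covariance matrices described above can be written compactly with `numpy.einsum`. This is a sketch under stated assumptions: the output layout (K, F, K, K), the defaults F = 257 and T = 100 taken from the text, and the hypothetical `seed` argument are our choices.

```python
import numpy as np


def random_covariances(K, F=257, T=100, seed=0):
    """Random weighted covariance matrices U_kf following steps 1-3 above.

    Returns U with shape (K, F, K, K); U[k, f] is Hermitian positive semidefinite.
    """
    rng = np.random.default_rng(seed)
    # Step 1: x_kft = y_kft + j z_kft with y, z ~ N(0, 1)
    x = rng.standard_normal((K, F, T)) + 1j * rng.standard_normal((K, F, T))
    # Step 3 weights: sqrt(sum_f |x_kft|^2), one value per (k, t)
    norm = np.sqrt(np.sum(np.abs(x) ** 2, axis=1))          # shape (K, T)
    # Steps 2-3: x_ft = [x_1ft ... x_Kft]^T and
    # U_kf = (1/T) sum_t x_ft x_ft^H / norm_kt
    return np.einsum("mft,nft,kt->kfmn", x, x.conj(), 1.0 / norm) / T
```

With T larger than K, the resulting matrices are positive definite with probability one, so all of the update rules above can be applied to them.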

Table 2

Runtimes (ms) with and without the matrix inversion lemma (MIL).

Method	K = 2	K = 4	K = 6	K = 8	K = 10	K = 12	K = 14	K = 16
IP1	5.7	16.9	40.7	79.5	151.1	222.3	342.6	503.6
IP1-MIL	4.8	14.5	32.6	62.6	112.6	180.8	312.6	465.0
IP2	12.4	33.8	68.3	122.9	202.4	302.0	461.7	720.3
IP2-MIL	13.6	32.1	55.2	90.0	135.4	188.5	283.1	501.2
ISS	4.4	16.4	45.2	84.8	174.1	265.2	511.6	793.1


The fastest algorithm depends on the number of channels. For example, ISS is the fastest for K = 2, whereas IP1-MIL is the fastest for K = 6. The table also shows that IP1-MIL and IP2-MIL were almost always faster than IP1 and IP2, respectively; the only exception is IP2 versus IP2-MIL for K = 2. This exception might be because computing the temporary variables in IP2-MIL takes more time than the matrix inversion when K = 2. In the following evaluation of separation performance, IP1-MIL and IP2-MIL are simply referred to as IP1 and IP2, respectively, because their outputs are identical to those of the original IP1 and IP2 in exact arithmetic.

4.2 Blind Separation of Music Mixtures

4.2.1 Evaluation Criteria

We evaluated the separation performance and convergence of the proposed ILRMA-Rep on convolutive music mixtures. The separation performance criterion is the scale-invariant source-to-distortion ratio (SI-SDR) [S.], defined as

\text{SI-SDR}(s, \hat{s}) = 10 \log_{10} \frac{\| \alpha s \|^{2}}{\| \alpha s - \hat{s} \|^{2}},

(42)

α = \frac{{\hat{s}}^{⊤} s}{‖ s ‖^{2}},

(43)

where s ∈ ℝ^L is the reference signal and ŝ∈ ℝ^L is the estimated signal in the discrete time domain of length L. A higher SI-SDR indicates higher separation performance and quality. We also measured SI-SDR improvement (SI-SDRi) defined as the difference in SI-SDR between the estimated and mixture signals:

\text{SI-SDRi}(s, \hat{s}) = \text{SI-SDR}(s, \hat{s}) - \text{SI-SDR}(s, x)

(44)

where x∈ ℝ^L is the mixture signal in the discrete time domain. In the following experiment, we use the individual reverberant signals at each microphone as reference signals.
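A direct NumPy transcription of (42)-(44) is sketched below for one-dimensional real signals of equal length; the function names are hypothetical.

```python
import numpy as np


def si_sdr(s, s_hat):
    """SI-SDR in dB following (42)-(43); s is the reference, s_hat the estimate."""
    alpha = np.dot(s_hat, s) / np.dot(s, s)          # (43)
    target = alpha * s
    return 10 * np.log10(np.sum(target ** 2) / np.sum((target - s_hat) ** 2))


def si_sdr_improvement(s, s_hat, x):
    """SI-SDRi (44): gain of the estimate over the unprocessed mixture x."""
    return si_sdr(s, s_hat) - si_sdr(s, x)
```

Because α rescales the reference, multiplying the estimate by any nonzero constant leaves SI-SDR unchanged.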

4.2.2 Dataset and Simulation Setup

We used the MUSDB18-HQ dataset [22], which contains 150 stereo songs, as source signals. Each song in the dataset consists of four parts: bass, drums, other, and vocals. To simulate the mixtures with the image-source method and to reduce the computational load, we extracted the left channel of each signal and downsampled it from the original 44.1 kHz to 16 kHz. We created simulated convolutive mixtures using the Python package pyroomacoustics [23]. As shown in Figure 3, there were four sound sources and four microphones, and the microphones formed a uniform linear array with 2 cm spacing. The reverberation time was approximately 200 ms.

Figure 3

Room geometry and locations of the four sources and the microphone array.

4.2.3 Experimental Conditions

We used the STFT with a 4096-point Hamming analysis window and half overlap. We set the initial demixing matrices W_f to the identity matrices for all f. We set the initial source model parameters b_kfl to 1 and c_klt to values drawn uniformly from (0, 1] for all k, f, l, and t. We performed ILRMA-Rep with 100 iteration steps and several numbers of repeats N_rep per iteration. After separation, the scale of the estimated signals was restored by back-projection onto the first microphone [16]. Then, we calculated the SI-SDRi for each music mixture and averaged it over all songs and channels.

4.2.4 Implementation Notes

We implemented all the algorithms in Python 3.9.6. For numerical stability, we applied the following flooring after the respective updates of the source model parameters in each iteration:

b_{k f l} \leftarrow \max (b_{k f l}, ε), c_{k l t} \leftarrow \max (c_{k l t}, ε),

(45)

where ε is a user-defined parameter. We set ε to 10⁻⁶ in this experiment.

4.2.5 Results and Discussion

Figure 4 shows the separation performance at the end of 100 iterations for the numbers of bases L = 1, 2, 5, 10, and 20. Note that conventional ILRMA corresponds to N_rep = 1 (the brightest bars in Figure 4), and that the case of L = 1 (the leftmost panel) is nearly equivalent to AuxIVA.

Figure 4

Averaged SI-SDR improvements (SI-SDRi, dB) over 144 samples and four channels at the end of 100 iterations, with various numbers of repeats and bases (one panel per number of bases L).

For L = 1, the SI-SDRi of all methods did not change markedly with increasing N_rep. For L = 2 or larger, SI-SDRi increased steadily with N_rep for each L. These results suggest that performance can be improved simply by repeating the updates of the demixing matrices.

One possible explanation for these results is as follows. ILRMA is a difficult optimization problem involving two different types of parameters: the demixing matrices and the source model parameters (NMF parameters). The problem is non-convex and may have multiple local minima. For example, if the source model is optimized too quickly, it may fit mixed (or insufficiently separated) signals. The demixing matrices then stop changing substantially, and a poor result may be obtained. Previous studies suggest that quickly updating the demixing matrices [18, 19] or slowly updating the source model parameters [13] leads to better separation performance. Our proposed method repeatedly updates the demixing matrices with the source model parameters fixed. In effect, it updates the demixing matrices faster than the source model parameters, which may explain the improved separation performance.

Figure 5 shows the relationship between the SI-SDRi values at the end of the iterations with N_rep = 1 and with N_rep ≥ 2. Each point corresponds to an individual value before averaging in Figure 4. The dashed line marks where the vertical and horizontal values are equal; points above it indicate that repeating the demixing vector update improves SI-SDRi. For L = 1, the points are clustered near the dashed line for any number of repeats. By contrast, when L = 2 or larger, more points lie above the dashed line, and this tendency becomes stronger with a larger number of repeats. This result supports the observation that the separation performance improves with repeated updates of the demixing matrix.

Figure 5

Scatter plots of SI-SDR improvements (dB) with and without repeats for each number of bases L. Vertical axes: SI-SDRi for N_rep = 2, 4, 10, and 20; horizontal axes: SI-SDRi for N_rep = 1.

Figure 6

Averaged SI-SDR improvements over 144 samples and four channels along with iteration steps, with various numbers of repeats and bases. Rows (a) to (e) correspond to L = 1, 2, 5, 10, and 20, and columns to IP1, IP2, and ISS. Note that vertical axes are aligned within each row, but not across rows.

Figure 6 shows the separation performance along the iteration steps for the numbers of bases L = 1, 2, 5, 10, and 20. For L = 1, convergence became faster as N_rep increased, but the final performance was almost the same, approximately 3 dB. By contrast, for L = 2 or larger, both the convergence speed and the final performance became much higher than those for L = 1.

5 Conclusion

In this paper, we proposed a repeated update scheme of demixing vectors for independent low-rank matrix analysis. We also derived an efficient update rule for the repeated version of IP2 by applying the matrix inversion lemma. Music source separation experiments with more than 100 songs were performed to evaluate the separation performance. The experimental results implied that the proposed update scheme improved the separation performance. In particular, we experimentally found that when the number of bases is large, the separation performance is markedly improved by repeating parameter updates. In future work, we will further investigate the performance of repeated updates of demixing vectors on other source models using IP1, e.g., in independent deeply learned matrix analysis [12] or the multichannel variational autoencoder method [4].

Biographies

Taishi Nakashima received his B.E. in Engineering from Osaka University, Osaka, Japan, in 2019 and his M.S. in Informatics from Tokyo Metropolitan University, Tokyo, Japan, in 2021. He is pursuing a Ph.D. at Tokyo Metropolitan University and has held a JSPS Research Fellowship (DC1) since April 2021. He is a Student Member of the Acoustical Society of Japan (ASJ) and the IEEE Signal Processing Society (SPS). He received the 24th Best Student Presentation Award of ASJ and the 16th IEEE SPS Japan Student Conference Paper Award in 2022. His research interests focus on blind source separation and acoustic signal processing.

Nobutaka Ono received his B.E., M.S., and Ph.D. degrees from the University of Tokyo, Japan, in 1996, 1998, and 2001, respectively. He became a research associate in 2001 and a lecturer in 2005 at the University of Tokyo. He moved to the National Institute of Informatics in 2011 as an associate professor and then to Tokyo Metropolitan University in 2017 as a full professor. His research interests include acoustic signal processing, especially microphone array processing, source localization and separation, machine learning, and optimization algorithms. He is a member of IEEE, EURASIP, APSIPA, IPSJ, IEICE, and ASJ. He was a member of the IEEE Audio and Acoustic Signal Processing (AASP) Technical Committee from 2014 to 2019. He served as an Associate Editor of the IEEE Transactions on Audio, Speech, and Language Processing from 2012 to 2015. He received the Best Paper Award at APSIPA ASC in 2018 and 2021 and the Sadaoki Furui Prize Paper Award from APSIPA in 2021.

References

[1] E. Cano, D. FitzGerald, A. Liutkus, M. D. Plumbley, and F.-R. Stöter, “Musical Source Separation: An Introduction,” IEEE Signal Processing Magazine, 36(1), 2019, 31–40, DOI: https://doi.org/10.1109/MSP.2018.2874719.

[2] P. Comon and C. Jutten, Handbook of Blind Source Separation: Independent Component Analysis and Applications, 1st edition, USA: Academic Press, Inc., 2010.

[3] A. Hiroe, “Solution of Permutation Problem in Frequency Domain ICA, Using Multivariate Probability Density Functions,” in Proc. ICA, 2006, 601–608.

[4] H. Kameoka, L. Li, S. Inoue, and S. Makino, “Supervised Determined Source Separation with Multichannel Variational Autoencoder,” Neural Computation, 31(9), 2019, 1891–1914, DOI: https://doi.org/10.1162/neco_a_01217.

[5] T. Kim, H. T. Attias, S.-Y. Lee, and T.-W. Lee, “Blind Source Separation Exploiting Higher-Order Frequency Dependencies,” IEEE/ACM Trans. Audio, Speech, Language Process., 15(1), 2006, 70–79.

[6] D. Kitamura, S. Mogami, Y. Mitsui, N. Takamune, H. Saruwatari, N. Ono, Y. Takahashi, and K. Kondo, “Generalized Independent Low-rank Matrix Analysis Using Heavy-Tailed Distributions for Blind Source Separation,” EURASIP Journal on Advances in Signal Processing, 2018(1), 2018, 28.

[7] D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “Determined Blind Source Separation Unifying Independent Vector Analysis and Nonnegative Matrix Factorization,” IEEE/ACM Trans. Audio, Speech, Language Process., 24(9), 2016, 1622–1637, DOI: https://doi.org/10.1109/TASLP.2016.2577880.

[8] K. Lange, MM Optimization Algorithms, Society for Industrial & Applied Mathematics, U.S., 2016, DOI: https://doi.org/10.1137/1.9781611974409.ch1.

[9] T.-H. Lin and Y. Tsao, “Source Separation in Ecoacoustics: A Roadmap Towards Versatile Soundscape Information Retrieval,” Remote Sensing in Ecology and Conservation, 6(3), 2020, 236–247.

[10] Z. Luo, C. Li, and L. Zhu, “A Comprehensive Survey on Blind Source Separation for Wireless Adaptive Processing: Principles, Perspectives, Challenges and New Research Directions,” IEEE Access, 6, 2018, 66685–66708, DOI: https://doi.org/10.1109/ACCESS.2018.2879380.

[11] S. Makino, ed., Audio Source Separation, Springer International Publishing, August 2018, DOI: https://doi.org/10.1007/978-3-319-73031-8.

[12] N. Makishima, S. Mogami, N. Takamune, D. Kitamura, H. Sumino, S. Takamichi, H. Saruwatari, and N. Ono, “Independent Deeply Learned Matrix Analysis for Determined Audio Source Separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(10), 2019, 1601–1615.

[13] Y. Mitsui, D. Kitamura, N. Takamune, H. Saruwatari, Y. Takahashi, and K. Kondo, “Independent Low-rank Matrix Analysis based on Parametric Majorization-Equalization Algorithm,” in Proceedings of IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), December 2017, 1–5.

[14] S. Mogami, Y. Mitsui, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, K. Kondo, H. Nakajima, and H. Kameoka, “Independent Low-Rank Matrix Analysis Based on Generalized Kullback-Leibler Divergence,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E102-A(2), 2019, 458–463.

[15] S. Mogami, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, K. Kondo, and N. Ono, “Independent Low-Rank Matrix Analysis Based on Time-Variant Sub-Gaussian Source Model for Determined Blind Source Separation,” IEEE/ACM Trans. Audio, Speech, Language Process., 28, 2020, 503–518.

[16] N. Murata, S. Ikeda, and A. Ziehe, “An Approach to Blind Source Separation Based on Temporal Structure of Speech Signals,” Neurocomputing, 41(1-4), 2001, 1–24.

[17] T. Nakashima, R. Scheibler, M. Togami, and N. Ono, “Joint Dereverberation and Separation with Iterative Source Steering,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), June 2021.

[18] T. Nakashima, R. Scheibler, Y. Wakabayashi, and N. Ono, “Faster Independent Low-rank Matrix Analysis with Pairwise Updates of Demixing Vectors,” in Proceedings of European Signal Processing Conference (EUSIPCO), January 2021, 301–305.

[19] N. Ono, “Fast Algorithm for Independent Component/Vector/Low-rank Matrix Analysis with Three or More Sources,” in Proceedings of the Spring Meeting of the Acoustical Society of Japan, in Japanese, March 2018, 437–438.

[20] N. Ono, “Fast Stereo Independent Vector Analysis and Its Implementation on Mobile Phone,” in Proceedings of IEEE International Workshop on Acoustic Signal Enhancement (IWAENC), September 2012.

[21] N. Ono, “Stable and Fast Update Rules for Independent Vector Analysis based on Auxiliary Function Technique,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2011, 189–192, DOI: https://doi.org/10.1109/ASPAA.2011.6082320.

[22] Z. Rafii, A. Liutkus, F.-R. Stöter, S. I. Mimilakis, and R. Bittner, “MUSDB18-HQ - an uncompressed version of MUSDB18,” August 2019, DOI: https://doi.org/10.5281/zenodo.3338373.

[23] R. Scheibler, E. Bezzam, and I. Dokmanic, “Pyroomacoustics: A Python Package for Audio Room Simulation and Array Processing Algorithms,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2018, 351–355, DOI: https://doi.org/10.1109/ICASSP.2018.8461310.

[24] R. Scheibler and N. Ono, “Fast and Stable Blind Source Separation with Rank-1 Updates,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, 236–240.

[25] R. Scheibler and N. Ono, “MM Algorithms for Joint Independent Subspace Analysis with Application to Blind Single and Multi-Source Extraction,” 2020, arXiv: 2004.03926 [eess.SP].

[26] M. Sunohara, C. Haruta, and N. Ono, “Low-latency real-time blind source separation for hearing aids based on time-domain implementation of online independent vector analysis with truncation of non-causal components,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017, 216–220, DOI: https://doi.org/10.1109/ICASSP.2017.7952149.

[27] T. Taniguchi, N. Ono, A. Kawamura, and S. Sagayama, “An Auxiliary-Function Approach to Online Independent Vector Analysis for Real-time Blind Source Separation,” in Proceedings of Hands-Free Speech Communication and Microphone Arrays (HSCMA), May 2014, 107–111.

[28] A. Weiss, A. Yeredor, S. A. Cheema, and M. Haardt, “The Extended ‘Sequentially Drilled’ Joint Congruence Transformation and Its Application in Gaussian Independent Vector Analysis,” IEEE Transactions on Signal Processing, 65(23), 2017, 6332–6344.

[29] J. Yang, S. Gohel, and B. Vachha, “Current Methods and New Directions in Resting State fMRI,” Clinical Imaging, 65, 2020, 47–53, DOI: https://doi.org/10.1016/j.clinimag.2020.04.004.

[30] A. Yeredor, “Blind Separation of Gaussian Sources With General Covariance Structures: Bounds and Optimal Estimation,” IEEE Transactions on Signal Processing, 58(10), 2010, 5057–5068, DOI: https://doi.org/10.1109/TSP.2010.2053362.

[31] A. Yeredor, “On Hybrid Exact-Approximate Joint Diagonalization,” in Proceedings of IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), December 2009, 312–315.

[32] T. Yoshioka, T. Nakatani, and M. Miyoshi, “An Integrated Method for Blind Separation and Dereverberation of Convolutive Audio Mixtures,” in Proceedings of European Signal Processing Conference (EUSIPCO), August 2008.

2023

T. Nakashima and N. Ono

Published in APSIPA Transactions on Signal and Information Processing. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution-NonCommercial (CC BY-NC 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for non-commercial purposes only), subject to full attribution to the original publication and authors. The full terms of this licence are given in the CC BY-NC 4.0 licence terms.
