
In this paper, we propose an improved update algorithm for independent low-rank matrix analysis (ILRMA). ILRMA has two types of parameters, demixing vectors and non-negative matrix factorization parameters, which are estimated by minimizing the same objective function. Although many extensions of ILRMA have been proposed, the importance of the order of parameter updates in ILRMA has not been sufficiently investigated. Motivated by the observation that iterative projection two (IP2) achieves higher performance than IP1, we propose a repeated update of demixing vectors with the source model fixed in one iteration; this approximates a simultaneous update of all demixing vectors. We conducted music source separation experiments with more than 100 songs. The results showed that the proposed algorithm with the repeated update of demixing vectors outperforms the conventional ILRMA in terms of separation performance and convergence speed.

Blind source separation (BSS) aims to separate a mixture of multiple signals into individual signals [2]. It has been applied in a vast number of applications, such as multichannel audio processing [11], music information retrieval [1], wireless communication systems [10], soundscape information retrieval [9], and brain imaging [29].

Independent vector analysis (IVA) considers frequency-wise correlations and avoids permutation ambiguity by assuming a multivariate distribution of sources [3, 5]. The demixing matrices of IVA are estimated by gradient descent, which is sensitive to a step-size parameter and may be unstable. Auxiliary-function-based IVA (AuxIVA) has been proposed to improve convergence and performance [21]. Using the majorization-minimization algorithm [8], AuxIVA estimates demixing matrices faster and more stably than the conventional IVA, with no tuning parameters. Some extensions of AuxIVA for online/real-time processing [27] and its application to hearing aids [26] have been proposed. To further improve the source model, AuxIVA was extended to independent low-rank matrix analysis (ILRMA) [7] by exploiting a low-rank source model based on nonnegative matrix factorization. ILRMA models the spectrograms of separated sources more accurately and thus achieves higher separation performance than AuxIVA. Many extensions of ILRMA to other source models have been proposed [6, 14, 15]. Update rules of demixing matrices and source models have been extended from IVA to ILRMA.

Update rules for the demixing matrices of AuxIVA have been extensively studied in recent years. Iterative projection (IP) was initially proposed for AuxIVA and updates the row vectors of a demixing matrix (demixing vectors) one by one in each iteration [21]. A pairwise update rule called iterative projection 2 (IP2) was also proposed; it updates two demixing vectors simultaneously [20]. In this paper, we refer to the original IP as IP1 to distinguish it from IP2. As an alternative approach, iterative source steering (ISS), which updates the entire demixing matrix using elementary row operations, has been proposed [24]. ISS eliminates matrix inversions and thus updates more rapidly while keeping the same separation performance as IP1. All these update methods are derived by minimizing the negative log-likelihood function of the input mixture signal, which leads to a system of quadratic equations called hybrid exact-approximate joint diagonalization (HEAD) [30, 31] (also known as “sequentially drilled” joint congruence (SeDJoCo) [28]). When the number of sources is three or more, finding a closed-form solution of HEAD is still an open problem. IP2 yields a globally optimal solution of HEAD only for two sources, by solving a generalized eigenvalue problem. IP1, IP2, and ISS are available for ILRMA because ILRMA includes AuxIVA as a special case [7]. The original ILRMA employs IP1 to update demixing vectors. ILRMA with IP2 or ISS has also been proposed [17, 18]. Interestingly, it has been reported that IVA with IP2 [19] and ILRMA with IP2 [18] achieved higher performance than those with IP1, even with the same source model. These results imply that the way the parameters are updated also contributes to better separation performance. Similarly, it has been shown that slower updating of source model parameters tends to improve the separation performance [13].

Motivated by these studies, we here focus on the parameter updates in ILRMA. The contribution of this study is summarized as follows:

  1. We propose a new algorithm for ILRMA named ILRMA-Rep. It repeatedly updates the demixing vectors while keeping the source model fixed within each iteration, which approximates updating all the demixing vectors at once.

  2. We derive faster update rules for ILRMA-Rep based on the matrix inversion lemma (MIL). In particular, the update rule based on IP2 with MIL, which we call IP2-MIL, is new and is applicable to other BSS methods, such as online AuxIVA.

  3. We conducted simulated separation experiments with 144 music signals to validate the efficacy of our proposed method. The results show that the proposed methods improve the separation performance.

The rest of this paper is organized as follows. We briefly summarize the background of the conventional ILRMA and its update rules in Section 2. Section 3 describes the proposed ILRMA with repeated updates of demixing vectors (ILRMA-Rep). In Section 4, we evaluate the performance of ILRMA-Rep and compare it with that of the conventional ILRMA with IP1, IP2, and ISS. Finally, Section 5 concludes this paper.

In the rest of this paper, we use lower- and uppercase bold symbols for vectors and matrices, respectively. Lowercase normal symbols denote vector entries or scalars. For example, $x_{kft}$ denotes the $k$th entry of a vector $\mathbf{x}_{ft}$. The transpose and the conjugate transpose of a vector or a matrix are denoted as $(\cdot)^{\top}$ and $(\cdot)^{\mathsf{H}}$, respectively. We denote by $\mathbb{C}$ and $\mathbb{R}_{+}$ the sets of all complex numbers and non-negative real numbers, respectively. Unless specified otherwise, the indices $f$, $t$, and $m$ range from 1 to $F$, $T$, and $M$, respectively. We also omit the bounds of sums or products over these indices when they span the full ranges. For example, $\sum_k$ is the sum over $k = 1, \dots, K$, and $\sum_{ft}$ is the double sum over $f = 1, \dots, F$ and $t = 1, \dots, T$. Similarly, $\{\mathbf{W}_f\}_f$ denotes the set of $\mathbf{W}_f$ for all $f$, and $\{x_{kft}\}_{kft}$ denotes the set of $x_{kft}$ for all $k$, $f$, and $t$.

We formulate the short-time Fourier transform (STFT) domain BSS as

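$$\mathbf{x}_{ft} = \mathbf{A}_f \mathbf{s}_{ft}, \tag{1}$$

where $\mathbf{x}_{ft} \in \mathbb{C}^{M}$ denotes the observed mixture signals recorded by $M$ microphones and $\mathbf{s}_{ft} \in \mathbb{C}^{K}$ denotes the source signals of $K$ sources, at frequency $f$ and time $t$. Here, $\mathbf{A}_f = [\mathbf{a}_{1f} \dots \mathbf{a}_{Kf}] \in \mathbb{C}^{M \times K}$ is the mixing matrix whose $k$th column vector $\mathbf{a}_{kf}$ corresponds to the steering vector from source $k$ to each microphone. The goal of BSS is to estimate the frequency-wise demixing matrices,

$$\mathbf{W}_f = [\mathbf{w}_{1f} \dots \mathbf{w}_{Kf}]^{\mathsf{H}} \in \mathbb{C}^{K \times M}, \tag{2}$$

such that the estimated source is

$$\mathbf{y}_{ft} = \mathbf{W}_f \mathbf{x}_{ft}, \tag{3}$$

given only the mixture signals $\mathbf{x}_{ft}$, where $\mathbf{y}_{ft} \in \mathbb{C}^{K}$ denotes the estimated signals. The vector $\mathbf{w}_{kf}$, whose conjugate transpose is the $k$th row of the demixing matrix, is called the demixing vector for source $k$. We henceforth consider the determined situation, i.e., the number of sources equals the number of microphones, $M = K$.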
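The formulation above can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions: the array sizes and the helper `crandn` are hypothetical, and the "oracle" demixing matrix is computed from the true mixing matrix, which is not available in actual BSS.

```python
import numpy as np

rng = np.random.default_rng(0)
F, T, K = 4, 50, 3  # toy sizes (hypothetical); determined case, so M = K

def crandn(*shape):
    """Complex Gaussian samples (hypothetical helper for this sketch)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

S = crandn(F, T, K)                        # source signals s_ft
A = crandn(F, K, K)                        # mixing matrices A_f
X = np.einsum("fmk,ftk->ftm", A, S)        # mixture: x_ft = A_f s_ft

W = np.linalg.inv(A)                       # oracle demixing matrices (unknown in practice)
Y = np.einsum("fkm,ftm->ftk", W, X)        # estimate: y_ft = W_f x_ft

# The k-th row of W_f is w_kf^H, so a single output is y_kft = w_kf^H x_ft.
k, f, t = 1, 2, 7
w_kf = W[f, k, :].conj()                   # demixing vector w_kf
y_single = np.vdot(w_kf, X[f, t])          # np.vdot conjugates its first argument
```

With the oracle matrices, `Y` reproduces `S`; a BSS algorithm has to reach a comparable `W` from `X` alone, up to scaling and ordering of the sources.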

We briefly summarize ILRMA as a maximum likelihood estimation problem. The likelihood function is derived with the following assumptions.

  1. All the sources are statistically independent; their joint distribution is the product of their individual distributions.

  2. The estimated signals follow this zero-mean multivariate complex Gaussian distribution,
    (4)

    where $r_{kft} \in \mathbb{R}_{+}$ is the variance of the distribution for the $k$th source at the time-frequency point $(f, t)$.

  3. The variance $r_{kft}$ is decomposed into $L$ parts as
    $$r_{kft} = \sum_{l} b_{kfl} c_{klt}, \tag{5}$$

    where $b_{kfl} \in \mathbb{R}_{+}$ and $c_{klt} \in \mathbb{R}_{+}$ are called the basis and activity coefficients, respectively. Henceforth, we call these coefficients the source model parameters.

From (3), the $k$th estimated signal is $y_{kft} = \mathbf{w}_{kf}^{\mathsf{H}} \mathbf{x}_{ft}$. Using these assumptions, we can write the likelihood of the observation as

(6)
(7)
(8)

where $p_x$ is the probability density function of the observed signals and $\mathcal{W} = \{\mathbf{W}_f\}_f$ is the set of demixing matrices. The determinant term is the Jacobian of the change of variables from the mixture to the estimated signals. ILRMA estimates $\mathbf{W}_f$ by minimizing the following objective function:

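$$\mathcal{L}(\mathcal{W}, \mathcal{B}, \mathcal{C}) = \sum_{k,f,t} \left[ \frac{|\mathbf{w}_{kf}^{\mathsf{H}} \mathbf{x}_{ft}|^2}{\sum_l b_{kfl} c_{klt}} + \log \sum_l b_{kfl} c_{klt} \right] - 2T \sum_f \log |\det \mathbf{W}_f|, \tag{9}$$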

where $\mathcal{B}$ and $\mathcal{C}$ are the sets $\{b_{kfl}\}_{kfl}$ and $\{c_{klt}\}_{klt}$, respectively. That is, ILRMA estimates the demixing vectors $\mathbf{w}_{kf}$ and the source model parameters $b_{kfl}$ and $c_{klt}$ that minimize (9), given only the observed mixture $\mathbf{x}_{ft}$.
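As a rough illustration, the objective can be evaluated numerically as follows. This is a sketch that assumes the usual negative log-likelihood form for a complex Gaussian source model with variance $r_{kft} = \sum_l b_{kfl} c_{klt}$, with constant terms dropped and no normalization by $T$; the function name `ilrma_cost` and the array layouts are hypothetical choices for this example.

```python
import numpy as np

def ilrma_cost(X, W, B, C):
    """Negative log-likelihood of ILRMA up to constants (assumed form).

    X: (F, T, K) mixtures, W: (F, K, K) demixing matrices,
    B: (K, F, L) bases b_kfl, C: (K, L, T) activations c_klt.
    """
    T = X.shape[1]
    Y = np.einsum("fkm,ftm->ftk", W, X)      # y_kft = w_kf^H x_ft
    R = np.einsum("kfl,klt->ftk", B, C)      # r_kft = sum_l b_kfl c_klt
    P = np.abs(Y) ** 2                       # power of the estimated signals
    logabsdet = np.linalg.slogdet(W)[1]      # log|det W_f| for every frequency bin
    return np.sum(P / R + np.log(R)) - 2.0 * T * np.sum(logabsdet)

# One-bin, one-frame, one-source example that can be checked by hand:
# y = 0.5 * 2 = 1, r = 1, so the cost is 1 + log(1) - 2 * log(0.5).
X1 = np.array([[[2.0 + 0.0j]]])
W1 = np.array([[[0.5 + 0.0j]]])
cost1 = ilrma_cost(X1, W1, np.ones((1, 1, 1)), np.ones((1, 1, 1)))
```

The log-determinant term is what prevents the trivial solution `W = 0`: shrinking `W` lowers the data term but raises `-log|det W_f|`.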

We can derive the following multiplicative update rules of the source model parameters by applying the auxiliary-function method to (9) [7]:

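$$b_{kfl} \leftarrow b_{kfl} \sqrt{\frac{\sum_t |y_{kft}|^2 c_{klt} r_{kft}^{-2}}{\sum_t c_{klt} r_{kft}^{-1}}}, \tag{10}$$
$$c_{klt} \leftarrow c_{klt} \sqrt{\frac{\sum_f |y_{kft}|^2 b_{kfl} r_{kft}^{-2}}{\sum_f b_{kfl} r_{kft}^{-1}}}. \tag{11}$$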

These update rules guarantee that the objective function (9) does not increase at each update.
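A minimal sketch of such multiplicative updates is given below. It assumes the standard majorization-minimization rules for the Itakura-Saito-type cost $\sum (|y_{kft}|^2 / r_{kft} + \log r_{kft})$ with a square-root exponent; the function names and array layouts are hypothetical, and the demixing-dependent part of (9) is left out because it does not depend on the source model.

```python
import numpy as np

def update_source_model(P, B, C):
    """One multiplicative update of the bases B and then the activations C.

    P: (K, F, T) power |y_kft|^2, B: (K, F, L), C: (K, L, T).
    """
    R = B @ C                                             # r_kft, shape (K, F, T)
    Ct = C.transpose(0, 2, 1)
    B = B * np.sqrt(((P / R**2) @ Ct) / ((1.0 / R) @ Ct))
    R = B @ C                                             # refresh with the new bases
    Bt = B.transpose(0, 2, 1)
    C = C * np.sqrt((Bt @ (P / R**2)) / (Bt @ (1.0 / R)))
    return B, C

def source_model_cost(P, B, C):
    """Part of the objective that depends on the source model only."""
    R = B @ C
    return np.sum(P / R + np.log(R))

rng = np.random.default_rng(0)
K, F, T, L = 2, 16, 40, 3                                 # toy sizes (hypothetical)
P = rng.exponential(size=(K, F, T))                       # stand-in for |y_kft|^2
B = np.ones((K, F, L))
C = 1.0 - rng.random((K, L, T))                           # uniform on (0, 1]
costs = [source_model_cost(P, B, C)]
for _ in range(20):
    B, C = update_source_model(P, B, C)
    costs.append(source_model_cost(P, B, C))
```

Because the updates only multiply by non-negative factors, `B` and `C` stay non-negative without any projection step.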

We summarize the widely used update rules of demixing vectors for AuxIVA and ILRMA, on which the method proposed in the next section is based.

2.5.1 Sequential Update of Demixing Vector: Iterative Projection One (IP1)

We calculate the derivative of (9) with respect to wkf (k = 1, …, K) to derive the update rules of demixing vectors. Then, we obtain the following system of quadratic equations:

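$$\mathbf{w}_{k'f}^{\mathsf{H}} \mathbf{U}_{kf} \mathbf{w}_{kf} = \delta_{k'k} \quad (k, k' = 1, \dots, K), \tag{12}$$

where $\delta_{k'k}$ is the Kronecker delta and $\mathbf{U}_{kf} \in \mathbb{C}^{K \times K}$ is called the weighted covariance matrix, defined as

$$\mathbf{U}_{kf} = \frac{1}{T} \sum_t \frac{1}{r_{kft}} \mathbf{x}_{ft} \mathbf{x}_{ft}^{\mathsf{H}}.$$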

This system of quadratic equations is called hybrid exact-approximate joint diagonalization (HEAD) [28, 30, 31]. When K = 2, the closed-form solution of (12) can be derived by solving a generalized eigenvalue problem [19, 25]. However, for K ≥ 3, the closed-form solution of the HEAD problem has not yet been found. Instead, we minimize the objective function (9) with respect to only one demixing vector wkf while keeping the others fixed. The resulting update rule is given by

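$$\mathbf{w}_{kf} \leftarrow (\mathbf{W}_f \mathbf{U}_{kf})^{-1} \mathbf{e}_k, \tag{13}$$
$$\mathbf{w}_{kf} \leftarrow \frac{\mathbf{w}_{kf}}{\sqrt{\mathbf{w}_{kf}^{\mathsf{H}} \mathbf{U}_{kf} \mathbf{w}_{kf}}}, \tag{14}$$

where $\mathbf{e}_k$ is the $k$th column of the $K \times K$ identity matrix. This update is referred to as IP1 [21]. Algorithm 1 summarizes the conventional ILRMA with IP1.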
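The IP1 step for a single frequency bin can be sketched as follows. The sketch assumes the usual form of the update, $\mathbf{w}_k \leftarrow (\mathbf{W} \mathbf{U}_k)^{-1} \mathbf{e}_k$ followed by normalization so that $\mathbf{w}_k^{\mathsf{H}} \mathbf{U}_k \mathbf{w}_k = 1$; the function name, the toy data, and the random variances are hypothetical.

```python
import numpy as np

def ip1_update(W, U, k):
    """One IP1 step for source k at a single frequency bin.

    W: (K, K) demixing matrix whose k-th row is w_k^H.
    U: (K, K, K) stack of weighted covariance matrices U_k.
    """
    K = W.shape[0]
    e_k = np.eye(K, dtype=complex)[:, k]
    w = np.linalg.solve(W @ U[k], e_k)                # w_k <- (W U_k)^{-1} e_k
    w = w / np.sqrt(np.real(np.vdot(w, U[k] @ w)))    # enforce w_k^H U_k w_k = 1
    W_new = W.copy()
    W_new[k, :] = w.conj()                            # store w_k^H as the k-th row
    return W_new

# Toy weighted covariances U_k = (1/T) sum_t x_t x_t^H / r_kt with random variances.
rng = np.random.default_rng(0)
K, T = 3, 50
X = rng.standard_normal((T, K)) + 1j * rng.standard_normal((T, K))
R = rng.uniform(0.5, 2.0, size=(K, T))
U = np.stack([(X.T / R[k]) @ X.conj() / T for k in range(K)])

k = 1
W_new = ip1_update(np.eye(K, dtype=complex), U, k)
head_column = W_new @ U[k] @ W_new[k].conj()          # should equal e_k
```

After the step, the updated vector satisfies the $k$th group of HEAD equations exactly, while the equations belonging to the other sources are in general violated again; this is why the update is applied to each source in turn.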

2.5.2 Pairwise Update Rules of Demixing Vectors: Iterative Projection Two (IP2)

The closed-form solution of HEAD is available for two sources by solving the following generalized eigenvalue problem:

(15)

where λ1f and λ2f (λ1fλ2f) are the corresponding eigenvalues of w1f and w2f. Therefore, the two demixing vectors w1f, w2f can be updated simultaneously [20, 32].
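For two sources, the idea can be sketched as below. This is an illustration under assumptions rather than the exact formulation referred to above: it solves the generalized eigenvalue problem $\mathbf{U}_1 \mathbf{u} = \lambda \mathbf{U}_2 \mathbf{u}$ by Cholesky whitening, and it assigns the eigenvector with the smaller eigenvalue to source 1, which is the assignment that gives the larger $|\det \mathbf{W}|$ in this sketch. The function name and the toy data are hypothetical.

```python
import numpy as np

def ip2_two_sources(U1, U2):
    """Solve the two-source HEAD problem via U1 u = lambda U2 u."""
    L = np.linalg.cholesky(U2)                 # U2 = L L^H
    L_inv = np.linalg.inv(L)
    C = L_inv @ U1 @ L_inv.conj().T            # whitened U1, Hermitian
    lam, V = np.linalg.eigh(C)                 # eigenvalues in ascending order
    G = L_inv.conj().T @ V                     # generalized eigenvectors, u^H U2 u = 1
    w1 = G[:, 0] / np.sqrt(lam[0])             # smaller eigenvalue -> source 1 (assumption)
    w2 = G[:, 1]                               # larger eigenvalue  -> source 2
    return np.stack([w1.conj(), w2.conj()])    # rows are w_1^H and w_2^H

rng = np.random.default_rng(0)
T = 60
X = rng.standard_normal((T, 2)) + 1j * rng.standard_normal((T, 2))
R = rng.uniform(0.5, 2.0, size=(2, T))
U1 = (X.T / R[0]) @ X.conj() / T
U2 = (X.T / R[1]) @ X.conj() / T

W = ip2_two_sources(U1, U2)
# Columns W U_k w_k for k = 1, 2; the identity matrix means HEAD is satisfied.
head = np.stack([W @ U1 @ W[0].conj(), W @ U2 @ W[1].conj()], axis=1)
```

Both demixing vectors are obtained at once, and all four HEAD equations hold simultaneously, which no single IP1 step achieves.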

Algorithm 1

Conventional ILRMA with IP1.


For K ≥ 3, a joint update method for two demixing vectors has been proposed [19]. This method was originally proposed for AuxIVA and achieved higher separation performance with fewer iteration steps. Setting the derivatives of the objective function with respect to $\mathbf{w}_{mf}$ and $\mathbf{w}_{nf}$ to zero, for all $k \neq m, n$ and $m \neq n$, yields the following system of $2K$ quadratic equations:

(16)
(17)
(18)

The pairwise update rule of w𝓁 (𝓁 = m, n) is derived by solving (16)-(18) as follows [19]:

(19)
(20)
(21)

where z𝓁f is the eigenvector of Z𝓁f (see [20] for details). Note that the choice of the two indices m and n is arbitrary as long as m ≠ n.

2.5.3 Iterative Source Steering (ISS)

Instead of estimating the demixing vector wkf in IP1, ISS [24] updates the entire demixing matrix by estimating a new vector vkf = [v1kf… vKkf],

(22)

where the update rule that minimizes (9) with respect to vmkf is given by

(23)

Furthermore, from the demixing model $y_{kft} = \mathbf{w}_{kf}^{\mathsf{H}} \mathbf{x}_{ft}$, the following inverse-free update rules of $\mathbf{v}_{kf}$ and the estimated output signal $\mathbf{y}_{ft}$ are obtained:

(24)
(25)

IVA with IP2 [19] and ILRMA with IP2 [18] showed a higher performance than those with IP1. These results imply that the performance will be further improved if more demixing vectors, ideally all of them, are updated simultaneously such as

(26)

However, this leads to the HEAD problem, for which no closed-form solution has yet been found for K ≥ 3, as mentioned in the previous section.

Instead of solving the HEAD problem in closed form, we propose an “approximately” simultaneous update of demixing vectors by simply repeating the demixing vector updates with IP1, IP2, or ISS several times while keeping the source model parameters fixed. Since no update increases the objective function (9), the demixing matrix is expected to converge to the solution of the HEAD problem as the demixing matrix update is repeated. We can see this as follows. From (12), we rewrite the HEAD problem as

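$$[\mathbf{W}_f \mathbf{U}_{1f} \mathbf{w}_{1f} \ \dots \ \mathbf{W}_f \mathbf{U}_{Kf} \mathbf{w}_{Kf}] = \mathbf{E}_K, \tag{27}$$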

where $\mathbf{E}_K$ is the K × K identity matrix. Therefore, we can check how close the demixing matrix is to the solution of HEAD by visualizing the left-hand side of (27). Figure 1 shows one example of convergence to the solution of HEAD over iteration steps. Each colormap shows the element-wise absolute value of the left-hand side of (27). The initial demixing matrices were set to the identity matrices, and the weighted covariance matrices were set to random Hermitian matrices. As shown in Figure 1, the repeated updates of demixing vectors considerably improve the estimated solution of HEAD. This example implies that a set of repeated updates of demixing vectors can work as a simultaneous update of demixing vectors.
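A small numerical experiment in the same spirit can be sketched as follows. It is not the setup used for Figure 1: the random Hermitian positive-definite matrices, the use of IP1 sweeps only, and all function names are assumptions made for this illustration. It tracks the distance between the matrix whose $k$th column is $\mathbf{W} \mathbf{U}_k \mathbf{w}_k$ and the identity, together with the per-bin cost $\sum_k \mathbf{w}_k^{\mathsf{H}} \mathbf{U}_k \mathbf{w}_k - 2 \log |\det \mathbf{W}|$.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4

def random_hpd(K):
    """Random Hermitian positive-definite matrix (stand-in for U_k)."""
    G = rng.standard_normal((K, 2 * K)) + 1j * rng.standard_normal((K, 2 * K))
    return G @ G.conj().T / (2 * K)

U = np.stack([random_hpd(K) for _ in range(K)])

def head_matrix(W, U):
    """Matrix whose k-th column is W U_k w_k; the identity at a HEAD solution."""
    return np.stack([W @ U[k] @ W[k].conj() for k in range(W.shape[0])], axis=1)

def cost(W, U):
    quad = sum(np.real(W[k] @ U[k] @ W[k].conj()) for k in range(W.shape[0]))
    return quad - 2.0 * np.log(np.abs(np.linalg.det(W)))

def ip1_sweep(W, U):
    """Apply the IP1 update to every source once, in order."""
    W = W.copy()
    n = W.shape[0]
    for k in range(n):
        w = np.linalg.solve(W @ U[k], np.eye(n, dtype=complex)[:, k])
        w = w / np.sqrt(np.real(np.vdot(w, U[k] @ w)))
        W[k] = w.conj()
    return W

W = np.eye(K, dtype=complex)                 # identity initialization
residuals, costs = [], []
for _ in range(30):                          # 30 repeats with the U_k held fixed
    W = ip1_sweep(W, U)
    residuals.append(np.linalg.norm(head_matrix(W, U) - np.eye(K)))
    costs.append(cost(W, U))
```

Printing `residuals` shows how far the current demixing matrix is from a HEAD solution after each repeat; `costs` is non-increasing because each IP1 step minimizes the cost with respect to one demixing vector.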

Figure 1

Convergence of demixing vectors to the solution of HEAD by repeated updates. The ideal solution is the identity matrix; only the diagonal elements are bright, and the others are dark. This figure shows an example of convergence for four sources. Each column shows element-wise absolute values of the demixing matrices after applying each update method.


The outline of the proposed procedures is summarized in Algorithm 2. We call this algorithm ILRMA-Rep. Figure 2 illustrates how the demixing vectors and the source model parameters are updated.
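The overall procedure can be sketched in NumPy as below. This is a simplified illustration, not a reproduction of Algorithm 2: the order of the steps within an iteration (source model first, then `n_rep` IP1 sweeps), the use of IP1 rather than IP2 or ISS, the absence of any regularization, and the toy data are all assumptions of this sketch.

```python
import numpy as np

def ilrma_rep(X, n_bases=2, n_iter=15, n_rep=3, seed=0):
    """Sketch of ILRMA with repeated IP1 sweeps.

    X: (F, T, K) determined mixture. Returns W (F, K, K) and the cost per iteration.
    """
    rng = np.random.default_rng(seed)
    F, T, K = X.shape
    eye = np.eye(K, dtype=complex)
    W = np.tile(eye, (F, 1, 1))                        # W_f = identity
    B = np.ones((K, F, n_bases))                       # bases b_kfl
    C = 1.0 - rng.random((K, n_bases, T))              # activations c_klt in (0, 1]
    history = []
    for it in range(n_iter):
        Y = np.einsum("fkm,ftm->ftk", W, X)
        P = (np.abs(Y) ** 2).transpose(2, 0, 1)        # |y_kft|^2 as (K, F, T)
        # 1) source model: one multiplicative update of B, then of C
        R = B @ C
        Ct = C.transpose(0, 2, 1)
        B = B * np.sqrt(((P / R**2) @ Ct) / ((1.0 / R) @ Ct))
        R = B @ C
        Bt = B.transpose(0, 2, 1)
        C = C * np.sqrt((Bt @ (P / R**2)) / (Bt @ (1.0 / R)))
        R = B @ C
        # 2) weighted covariances U_kf, kept fixed during the repeats
        U = np.einsum("kft,fti,ftj->kfij", 1.0 / R, X, X.conj()) / T
        # 3) repeated demixing updates (n_rep = 1 gives the conventional order)
        for rep in range(n_rep):
            for k in range(K):
                rhs = np.tile(eye[:, k, None], (F, 1, 1))          # e_k for each bin
                w = np.linalg.solve(W @ U[k], rhs)[..., 0]
                scale = np.sqrt(np.real(np.einsum("fi,fij,fj->f", w.conj(), U[k], w)))
                W[:, k, :] = (w / scale[:, None]).conj()
        Y = np.einsum("fkm,ftm->ftk", W, X)
        P = (np.abs(Y) ** 2).transpose(2, 0, 1)
        logdet = np.linalg.slogdet(W)[1]
        history.append(np.sum(P / R + np.log(R)) - 2.0 * T * np.sum(logdet))
    return W, np.array(history)

# Toy data (hypothetical): two sources with different spectral envelopes.
rng = np.random.default_rng(1)
F, T, K = 8, 60, 2
gains = np.stack([np.linspace(2.0, 0.2, F), np.linspace(0.2, 2.0, F)], axis=1)
S = gains[:, None, :] * (rng.standard_normal((F, T, K)) + 1j * rng.standard_normal((F, T, K)))
A = rng.standard_normal((F, K, K)) + 1j * rng.standard_normal((F, K, K))
X = np.einsum("fmk,ftk->ftm", A, S)
W, history = ilrma_rep(X)
```

The only difference from the conventional loop is the `for rep in range(n_rep)` line: the covariances `U` are computed once per iteration and reused for every repeat.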

Updates of demixing matrices with IP1 or IP2 include a matrix inversion, which can make repeating the update of demixing vectors computationally expensive. However, the matrix inversion lemma (MIL) can be used to reduce the complexity because the covariance matrices are fixed within each “repeat.” In this section, we derive an efficient algorithm for the repeated update of demixing vectors.

Algorithm 2

Proposed ILRMA-Rep.

Figure 2

Comparison between the conventional ILRMA (a) and the proposed ILRMA-Rep (b) for K = 3 as an example. Bright boxes in each subfigure indicate the update of the corresponding parameters.


MIL states that:

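$$(\mathbf{X} + \boldsymbol{\Psi} \boldsymbol{\Omega})^{-1} = \mathbf{X}^{-1} - \mathbf{X}^{-1} \boldsymbol{\Psi} (\mathbf{E}_J + \boldsymbol{\Omega} \mathbf{X}^{-1} \boldsymbol{\Psi})^{-1} \boldsymbol{\Omega} \mathbf{X}^{-1}, \tag{28}$$

where $\mathbf{X}$ is an invertible square matrix of shape K × K, $\boldsymbol{\Psi}$ and $\boldsymbol{\Omega}$ are matrices of shape K × J and J × K (J ≤ K), respectively, and $\mathbf{E}_J$ is the J × J identity matrix.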
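The lemma is easy to check numerically. The sketch below assumes the form $(\mathbf{X} + \boldsymbol{\Psi} \boldsymbol{\Omega})^{-1} = \mathbf{X}^{-1} - \mathbf{X}^{-1} \boldsymbol{\Psi} (\mathbf{E}_J + \boldsymbol{\Omega} \mathbf{X}^{-1} \boldsymbol{\Psi})^{-1} \boldsymbol{\Omega} \mathbf{X}^{-1}$ and uses random toy matrices; the sizes and scalings are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
K, J = 5, 2                                      # toy sizes with J <= K

# A well-conditioned K x K matrix and a rank-J modification Psi @ Omega.
X = 10.0 * np.eye(K) + rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
Psi = 0.5 * (rng.standard_normal((K, J)) + 1j * rng.standard_normal((K, J)))
Omega = 0.5 * (rng.standard_normal((J, K)) + 1j * rng.standard_normal((J, K)))

X_inv = np.linalg.inv(X)                         # assumed to be available already
G = np.eye(J) + Omega @ X_inv @ Psi              # only a J x J matrix must be inverted
via_lemma = X_inv - X_inv @ Psi @ np.linalg.inv(G) @ Omega @ X_inv
direct = np.linalg.inv(X + Psi @ Omega)          # full K x K inversion for comparison
max_error = np.max(np.abs(via_lemma - direct))
```

When `X_inv` is already known, the lemma replaces a K × K inversion by a J × J inversion plus matrix products, which is the saving exploited for J = 1 (one demixing vector changes) and J = 2 (two demixing vectors change).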

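Let us consider avoiding the matrix inversion of IP1 included in (13). Let $\mathbf{A}_f = \mathbf{W}_f^{-1}$ and $\mathbf{a}_{kf} = \mathbf{A}_f \mathbf{e}_k$. If we update $\mathbf{A}_f$ together with $\mathbf{W}_f$, (13) can be computed simply as $\hat{\mathbf{w}}_{kf} = \mathbf{U}_{kf}^{-1} \mathbf{a}_{kf}$, where $\hat{\mathbf{w}}_{kf}$ denotes the left-hand side of (13). The remaining question is how to update $\mathbf{A}_f$ without a matrix inversion. We rewrite (13) as

$$\mathbf{W}_f \leftarrow \mathbf{W}_f + \mathbf{e}_k (\hat{\mathbf{w}}_{kf} - \mathbf{w}_{kf})^{\mathsf{H}}. \tag{29}$$

Then, by using the matrix inversion lemma with $\boldsymbol{\Psi} \leftarrow \mathbf{e}_k$ and $\boldsymbol{\Omega} \leftarrow (\hat{\mathbf{w}}_{kf} - \mathbf{w}_{kf})^{\mathsf{H}}$ in (28), we obtain the update rule of $\mathbf{A}_f$ without matrix inversion. Finally, the update rules without the matrix inversion are summarized as follows.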

(30)
(31)
(32)
(33)
(34)

These update rules were first derived in the context of online BSS [27]. Henceforth, we call this version IP1-MIL.
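The idea behind IP1-MIL can be sketched as follows for one frequency bin. This is a derivation-level illustration, not a transcription of the update rules above: it keeps $\mathbf{A} = \mathbf{W}^{-1}$ alongside $\mathbf{W}$, computes the new demixing vector as $\mathbf{U}_k^{-1} \mathbf{a}_k$, and refreshes $\mathbf{A}$ with a rank-one correction. The function names, the placement of the normalization, and the toy data are assumptions.

```python
import numpy as np

def ip1_mil_sweep(W, A, U, U_inv):
    """One sweep of IP1 over all sources without inverting W.

    W: (K, K) with rows w_k^H, A: (K, K) equal to W^{-1},
    U, U_inv: (K, K, K) stacks of U_k and U_k^{-1} (fixed during the repeats).
    """
    W, A = W.copy(), A.copy()
    for k in range(W.shape[0]):
        a_k = A[:, k]                                   # a_k = A e_k
        w_hat = U_inv[k] @ a_k                          # equals (W U_k)^{-1} e_k
        w_hat = w_hat / np.sqrt(np.real(np.vdot(w_hat, U[k] @ w_hat)))
        d = w_hat - W[k].conj()                         # change of the k-th demixing vector
        # W_new = W + e_k d^H, hence A_new = A - a_k (d^H A) / (1 + d^H a_k).
        A = A - np.outer(a_k, d.conj() @ A) / (1.0 + np.vdot(d, a_k))
        W[k] = w_hat.conj()
    return W, A

def ip1_direct_sweep(W, U):
    """Reference sweep that solves a linear system for every source."""
    W = W.copy()
    n = W.shape[0]
    for k in range(n):
        w = np.linalg.solve(W @ U[k], np.eye(n, dtype=complex)[:, k])
        W[k] = (w / np.sqrt(np.real(np.vdot(w, U[k] @ w)))).conj()
    return W

rng = np.random.default_rng(0)
K, T = 4, 50
X = rng.standard_normal((T, K)) + 1j * rng.standard_normal((T, K))
R = rng.uniform(0.5, 2.0, size=(K, T))
U = np.stack([(X.T / R[k]) @ X.conj() / T for k in range(K)])
U_inv = np.linalg.inv(U)                                # computed once, reused in every repeat

W0 = np.eye(K, dtype=complex)
W_mil, A_mil = ip1_mil_sweep(W0, np.linalg.inv(W0), U, U_inv)
W_ref = ip1_direct_sweep(W0, U)
```

Each source update inside `ip1_mil_sweep` costs only matrix-vector and outer products, so the cubic-cost inversions are confined to the one-off computation of `U_inv`.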

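Likewise, by setting $\boldsymbol{\Psi} \leftarrow [\mathbf{e}_m \ \mathbf{e}_n]$ and $\boldsymbol{\Omega} \leftarrow [(\hat{\mathbf{w}}_{mf} - \mathbf{w}_{mf}) \ (\hat{\mathbf{w}}_{nf} - \mathbf{w}_{nf})]^{\mathsf{H}}$ in (28), the efficient update rule of IP2 is given by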

(35)
(36)
(37)
(38)
(39)
(40)
(41)

where E2 is the 2 × 2 identity matrix. Note that G is a matrix of shape 2 × 2 and thus the cost of its inversion is much less than those of Wf and Ukf. To the best of our knowledge, this is the first derivation of IP2 with the MIL. Henceforth, we call this version IP2-MIL.

This subsection outlines the time complexity of each demixing matrix update method per iteration and frequency bin. For IP1 and IP2, computing $(\mathbf{W}_f \mathbf{U}_{kf})^{-1}$ in (13) and (19), respectively, takes $O(K^3)$ time for each source $k$. For ISS by (23), computing $\mathbf{v}_{kf}$ also takes $O(K^3)$ time. Therefore, the total time complexity of updating the demixing matrix $\mathbf{W}_f$ for all sources $k = 1, \dots, K$ is $O(K^4)$ with these methods. In ILRMA-Rep, the covariance matrices $\mathbf{U}_{kf}$ are fixed, and repeating the demixing matrix updates takes $O(N_{\mathrm{rep}} K^4)$ time, where $N_{\mathrm{rep}}$ is the number of repeats. The use of MIL allows $\mathbf{W}_f^{-1}$ to be updated efficiently: an $O(K^4)$ computation is needed for the first repeat only, and each of the subsequent $(N_{\mathrm{rep}} - 1)$ repeats takes $O(K^3)$. Therefore, the total time complexity of IP1-MIL and IP2-MIL for all the sources is $O(N_{\mathrm{rep}} K^3 + K^4)$. Note that MIL cannot be applied to ISS because $\mathbf{W}_f^{-1}$ is not calculated in ISS. In addition, ISS can use (24) instead of (23). With ISS by (24), computing $\mathbf{v}_{kf}$ takes $O(TK)$ time, and the total time complexity of updating the demixing matrix $\mathbf{W}_f$ for all sources $k = 1, \dots, K$ is $O(TK^2)$. Consequently, ILRMA-Rep using ISS by (24) takes $O(N_{\mathrm{rep}} T K^2)$ in total. Repeating the update by (24) is expected to take more time than ISS by (23) because $T \gg K^2$ in most cases of ordinary BSS. Therefore, we hereafter use (23) for the ISS update in this paper. Table 1 summarizes the complexities of the update rules.

Table 1

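Complexities per iteration and frequency bin.

| Method | $\mathbf{U}_{kf}$ (∀k) | $\mathbf{w}_{kf}$ or $\mathbf{v}_{kf}$ (∀k) |
|---|---|---|
| IP1, IP2 | $O(TK^3)$ | $O(N_{\mathrm{rep}} K^4)$ |
| IP1-MIL, IP2-MIL | $O(TK^3)$ | $O(N_{\mathrm{rep}} K^3 + K^4)$ |
| ISS by (23) | $O(TK^3)$ | $O(N_{\mathrm{rep}} K^4)$ |
| ISS by (24) | - | $O(N_{\mathrm{rep}} T K^2)$ |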

4.1.1 Runtime Comparison

Since the actual runtime may differ from the theoretical complexity described in Section 3.3, we first compare the runtimes of the different methods. This simulation was run in Python on a workstation with a 96-core AMD EPYC 7643 processor at 2.3 GHz. We generated sets of K random Hermitian matrices Ukf (k = 1, …, K) by the following procedure:

  1. Generate random complex values
  2. Generate random observed signals as $\mathbf{x}_{ft} := [x_{1ft} \dots x_{Kft}]^{\top}$ (∀f, t).

  3. Calculate random covariance matrices as

The initial demixing matrices were set to the identity matrix. We set the number of time frames T to 100 and the number of frequency bins F to 257. For ISS, we updated the demixing matrices by (23) in this experiment because T is large. We ran 20 iterations of IP1, IP1-MIL, IP2, IP2-MIL, and ISS for K = 2, 4, 6, 8, 10, 12, 14, and 16. The covariance matrices were fixed for each K. Table 2 shows the runtimes of IP1, IP1-MIL, IP2, IP2-MIL, and ISS averaged over 100 trials.

Table 2

Runtimes (ms) with and without the matrix inversion lemma (MIL).

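| Method \ Number of channels | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 |
|---|---|---|---|---|---|---|---|---|
| IP1 | 5.7 | 16.9 | 40.7 | 79.5 | 151.1 | 222.3 | 342.6 | 503.6 |
| IP1-MIL | 4.8 | 14.5 | 32.6 | 62.6 | 112.6 | 180.8 | 312.6 | 465.0 |
| IP2 | 12.4 | 33.8 | 68.3 | 122.9 | 202.4 | 302.0 | 461.7 | 720.3 |
| IP2-MIL | 13.6 | 32.1 | 55.2 | 90.0 | 135.4 | 188.5 | 283.1 | 501.2 |
| ISS | 4.4 | 16.4 | 45.2 | 84.8 | 174.1 | 265.2 | 511.6 | 793.1 |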

We can see that the fastest algorithm depends on the number of channels. For example, ISS is the fastest for K = 2, but IP1-MIL is the fastest for K = 6. The table also shows that IP1-MIL and IP2-MIL were almost consistently faster than IP1 and IP2, respectively, the only exception being IP2 and IP2-MIL for K = 2. This exception might be because computing the temporary variables in IP2-MIL takes more time than the matrix inversion when K = 2. In the following evaluation of separation performance, IP1-MIL and IP2-MIL are simply referred to as IP1 and IP2, respectively, because their outputs are identical to those of the original IP1 and IP2.

4.2.1 Evaluation Criteria

We evaluated the separation and convergence performance characteristics of our proposed ILRMA-Rep with convolutive music mixtures. The separation performance criterion is the scale-invariant source-to-distortion ratio (SI-SDR) [S.], defined as

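$$\text{SI-SDR}(\mathbf{s}, \hat{\mathbf{s}}) = 10 \log_{10} \frac{\|\alpha \mathbf{s}\|^2}{\|\alpha \mathbf{s} - \hat{\mathbf{s}}\|^2}, \tag{42}$$
$$\alpha = \frac{\hat{\mathbf{s}}^{\top} \mathbf{s}}{\|\mathbf{s}\|^2}, \tag{43}$$

where $\mathbf{s} \in \mathbb{R}^{L}$ is the reference signal and $\hat{\mathbf{s}} \in \mathbb{R}^{L}$ is the estimated signal in the discrete time domain of length $L$. A higher SI-SDR indicates higher separation performance and quality. We also measured the SI-SDR improvement (SI-SDRi), defined as the difference in SI-SDR between the estimated and mixture signals:

$$\text{SI-SDRi} = \text{SI-SDR}(\mathbf{s}, \hat{\mathbf{s}}) - \text{SI-SDR}(\mathbf{s}, \mathbf{x}), \tag{44}$$

where $\mathbf{x} \in \mathbb{R}^{L}$ is the mixture signal in the discrete time domain. In the following experiment, we use the individual reverberant signals at each microphone as reference signals.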
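The two criteria can be computed with a few lines of NumPy. The sketch below assumes the common definition of SI-SDR, in which the reference is rescaled by the optimal factor $\alpha = \hat{\mathbf{s}}^{\top} \mathbf{s} / \|\mathbf{s}\|^2$ before the ratio is taken; the function names and the synthetic signals are hypothetical.

```python
import numpy as np

def si_sdr(ref, est):
    """Scale-invariant SDR in dB between 1-D real signals of equal length."""
    alpha = np.dot(est, ref) / np.dot(ref, ref)     # optimal scaling of the reference
    target = alpha * ref
    return 10.0 * np.log10(np.sum(target**2) / np.sum((target - est) ** 2))

def si_sdr_improvement(ref, est, mix):
    """SI-SDR of the estimate minus SI-SDR of the unprocessed mixture."""
    return si_sdr(ref, est) - si_sdr(ref, mix)

# Synthetic example: the "estimate" contains ten times less interference than the mixture.
rng = np.random.default_rng(0)
ref = rng.standard_normal(1000)
noise = rng.standard_normal(1000)
mix = ref + noise
est = ref + 0.1 * noise
improvement = si_sdr_improvement(ref, est, mix)
```

Because of the rescaling by `alpha`, multiplying `est` by any nonzero constant leaves the score unchanged, which is what makes the measure insensitive to the scale ambiguity of BSS outputs.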

4.2.2 Dataset and Simulation Setup

We used the MUSDB18-HQ dataset [22], which includes 150 stereo-recorded songs, as the source signals. Each song in the dataset consists of four parts: bass, drums, other, and vocals. To apply the image-source method and reduce the computational load, we extracted the left channel of each signal and then downsampled each signal from the original 44.1 kHz to 16 kHz. We created simulated convolutive mixtures using the Python package pyroomacoustics [23]. As shown in Figure 3, the numbers of sound sources and microphones were both four, and the microphone array was a uniform linear array with a spacing of 2 cm. The reverberation time was approximately 200 ms.

Figure 3

Room geometry and locations of sources and microphone array.


4.2.3 Experimental Conditions

We used the STFT with a 4096-point Hamming analysis window and half overlap. We set the initial demixing matrices $\mathbf{W}_f$ to the identity matrices for all $f$. We also set the initial values of the source model parameters $b_{kfl}$ to 1 and $c_{klt}$ to uniformly distributed values over (0, 1] for all $l$, $k$, $f$, and $t$. We ran ILRMA-Rep for 100 iteration steps with several values of the number of repeats $N_{\mathrm{rep}}$ in each iteration. After separation, the scale of the estimated signal was restored by back-projection onto the first microphone [16]. Then, we calculated the SI-SDRi for each music mixture and averaged it over all the songs and channels.
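The initialization and the scale restoration can be sketched as follows. The sketch assumes that back-projection multiplies each separated signal by the corresponding entry of the first row of $\mathbf{W}_f^{-1}$; the function name `back_project`, the toy sizes, and the random demixing matrices used for the check are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
F, T, K, L = 5, 30, 4, 2                       # toy sizes (hypothetical)

# Initial values as described: identity demixing matrices, unit bases,
# and activations drawn uniformly from (0, 1].
W_init = np.tile(np.eye(K, dtype=complex), (F, 1, 1))
B_init = np.ones((K, F, L))
C_init = 1.0 - rng.random((K, L, T))           # rng.random() is in [0, 1), so this is in (0, 1]

def back_project(Y, W, ref_mic=0):
    """Rescale each separated signal to its image at microphone ref_mic.

    Y: (F, T, K) separated signals, W: (F, K, K) demixing matrices.
    """
    A = np.linalg.inv(W)                       # estimated mixing matrices
    return Y * A[:, ref_mic, None, :]          # y_kft <- [W_f^{-1}]_{ref,k} y_kft

# Check on random data: the rescaled signals must add up to the reference channel.
X = rng.standard_normal((F, T, K)) + 1j * rng.standard_normal((F, T, K))
W = rng.standard_normal((F, K, K)) + 1j * rng.standard_normal((F, K, K))
Y = np.einsum("fkm,ftm->ftk", W, X)
Z = back_project(Y, W)
```

Since the mixture at the reference microphone equals the sum of the back-projected signals, the rescaled outputs can be compared directly with the source images recorded at that microphone.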

4.2.4 Implementation Notes

We implemented all the algorithms in Python 3.9.6. For numerical stability, we performed the following regularization after the respective updates in each iteration:

(45)

where ε is a user-defined parameter. We set ε to $10^{-6}$ in this experiment.

4.2.5 Results and Discussion

Figure 4 shows the separation performance at the end of the 100 iterations for the number of bases L =1, 2, 5, 10, and 20. Note that the conventional ILRMA corresponds to Nrep = 1 in the brightest color bar in Figure 4, and the case of L = 1 in the leftmost plot in Figure 4 is nearly equivalent to that of AuxIVA.

Figure 4

Averaged SI-SDR improvements over 144 samples and four channels at the end of 100 iterations, with various numbers of repeats and bases.


For L = 1, the SI-SDRi of all methods did not change markedly with increasing Nrep. For L = 2 or larger, SI-SDRi increased steadily with Nrep for each L. These results indicate that performance can be improved by simply repeating the updates of demixing matrices.

One possible explanation for these results is as follows. ILRMA is a difficult optimization problem over two different types of parameters: demixing matrices and source model parameters (NMF parameters). The problem is non-convex and is therefore expected to have local minima. For example, if the source model is optimized too fast, it may fit mixed (or insufficiently separated) signals. Then, the demixing matrices are no longer updated, and a poor result might be obtained. Previous studies suggest that quickly updating the demixing matrices [18, 19] or slowly updating the source model parameters [13] leads to better separation performance. Our proposed method repeatedly updates the demixing matrices while keeping the source model parameters fixed. It thus updates the demixing matrices faster than the source model parameters, which improves the separation performance.

Figure 5 shows the relationship between the SI-SDRi at the end of the iterations with Nrep = 1 and that with Nrep ≥ 2. Each point corresponds to an individual value before averaging in Figure 4. The dashed line in the figure is the line where the vertical and horizontal axes are equal; points above the dashed line indicate that repeating the demixing vector update improves SI-SDRi. For L = 1, the points are clustered near the dashed line for any number of repetitions. By contrast, when L = 2 or larger, more points are clustered above the dashed line, and this tendency is stronger with a larger number of iterations. This result supports the observation that the separation performance improves with repeated updates of the demixing matrix.

Figure 5

Scatter plots of SI-SDR improvements with and without repeats for each number of bases L.

Figure 6

Averaged SI-SDR improvements over 144 samples and four channels along with iteration steps, with various numbers of repeats and bases. Note that vertical axes are aligned for each row, but not for different rows.


Figure 6 shows the separation performance as a function of the number of iteration steps for the number of bases L = 1, 2, 5, 10, and 20. For L = 1, the convergence became faster as Nrep increased, but the final performance was almost the same, approximately 3 dB. By contrast, for L = 2 or larger, both the convergence speed and the final performance became much higher than those for L = 1.

In this paper, we proposed a repeated update scheme of demixing vectors for independent low-rank matrix analysis. We also derived an efficient update rule for the repeated version of IP2 by applying the matrix inversion lemma. Music source separation experiments with more than 100 songs were performed to evaluate the separation performance. The experimental results implied that the proposed update scheme improved the separation performance. In particular, we experimentally found that when the number of bases is large, the separation performance is markedly improved by repeating parameter updates. In future work, we will further investigate the performance of repeated updates of demixing vectors on other source models using IP1, e.g., in independent deeply learned matrix analysis [12] or the multichannel variational autoencoder method [4].

Taishi Nakashima received his B.E. in Engineering from Osaka University, Osaka, Japan, in 2019 and his M.S. in Informatics from Tokyo Metropolitan University, Tokyo, Japan, in 2021. He is pursuing a Ph.D. at Tokyo Metropolitan University and is also a recipient of the JSPS Research Fellowship (DC1) from April 2021. He is an esteemed Student Member of the Acoustical Society of Japan (ASJ) and the IEEE Signal Processing Society (SPS). He received the 24th Best Student Presentation Award of ASJ and the 16th IEEE SPS Japan Student Conference Paper Award in 2022. His research interests primarily focus on blind source separation and acoustic signal processing.

Nobutaka Ono received his B.E., M.S., and Ph.D. degrees from the University of Tokyo, Japan, in 1996, 1998, and 2001, respectively. He became a research associate in 2001 and a lecturer in 2005 at the University of Tokyo. He moved to the National Institute of Informatics in 2011 as an associate professor and then to Tokyo Metropolitan University in 2017 as a full professor. His research interests include acoustic signal processing, especially microphone array processing, source localization and separation, machine learning, and optimization algorithms. He is a member of IEEE, EURASIP, APSIPA, IPSJ, IEICE, and ASJ. He was a member of IEEE Audio and Acoustic Signal Processing (AASP) Technical Committee from 2014 to 2019. He served as Associate Editor of IEEE Transactions on Audio, Speech, and Language Processing from 2012 to 2015. He received the best paper award at APSIPA ASC in 2018 and 2021 and Sadaoki Furui Prize Paper Award from APSIPA in 2021.

[1]
E.
Cano
,
D.
FitzGerald
,
A.
Liutkus
,
M. D.
Plumbley
, and
F.-R.
Stöter
, “
Musical Söurce Separation: An Introduction
,”
IEEE Signal Processing Magazine
,
36
(
1
),
2019
,
31
40
, DOI:.
[2]
P.
Comon
and
C.
Jutten
,
Handbook of Blind Source Separation: Independent Component Analysis and Applications
, (1st) edition,
USA
:
Academic Press, Inc.
,
2010
.
[3]
A.
Hiroe
, “
Solution of Permutation Problem in Frequency Domain ICA, Using Multivariate Probability Density Functions
,” in
Proc. ICA
,
2006
,
601
608
.
[4]
H.
Kameoka
,
L.
Li
,
S.
Inoue
, and
S.
Makino
, “
Supervised Determined Source Separation with Multichannel Variational Autoencoder
,”
Neural Computation
,
31
(
9
),
2019
,
1891
1914
, DOI:.
[5]
T.
Kim
,
H. T.
Attias
,
S.-Y.
Lee
, and
T.-W.
Lee
, “
Blind Source Separation Exploiting Higher-Order Frequency Dependencies
,”
IEEE/ACM Trans. Audio, Speech, Language Process
,
15
(
1
),
2006
,
70
79
.
[6]
D.
Kitamura
,
S.
Mogami
,
Y.
Mitsui
,
N.
Takamune
,
H.
Saruwatari
,
N.
Ono
,
Y.
Takahashi
, and
K.
Kondo
, “
Generalized Independent Low-rank Matrix Analysis Using Heavy-Tailed Distributions for Blind Source Separation
,”
EURASIP Journal on Advances in Signal Processing
,
2018
(
1
),
2018
,
28
.
[7]
D.
Kitamura
,
N.
Ono
,
H.
Sawada
,
H.
Kameoka
, and
H.
Saruwatari
, “
Determined Blind Source Separation Unifying Independent Vector Analysis and Nonnegative Matrix Factorization
,”
IEEE/ACM Trans. Audio, Speech, Language Process
,
24
(
9
),
2016
,
1622
1637
, DOI:
[8]
K.
Lange
,
MM Optimization Algorithms
,
Society for Industrial & Applied Mathematics
,
U.S.
,
2016
, DOI:.
[9]
T.-H.
Lin
and
Y.
Tsao
, “
Source Separation in Ecoacoustics: A Roadmap Towards Versatile Soundscape Information Retrieval
,”
Remote Sensing in Ecology and Conservation
,
6
(
3
),
2020
,
236
247
.
[10]
Z.
Luo
,
C.
Li
, and
L.
Zhu
, “
A Comprehensive Survey on Blind Source Separation for Wireless Adaptive Processing: Principles, Perspectives, Challenges and New Research Directions
,”
IEEE Access
,
6
,
2018
, 66685708, DOI:.
[11]
S.
Makino
, ed.,
Audio Source Separation
,
Springer International Publishing
,
August
2018
, DOI:
10.007/978-3-319-73031-8
.
[12]
N.
Makishima
,
S.
Mogami
,
N.
Takamune
,
D.
Kitamura
,
H.
Sumino
,
S.
Takamichi
,
H.
Saruwatari
, and
N.
Ono
, “
Independent Deeply Learned Matrix Analysis for Determined Audio Source Separation
,”
IEEE/ACM Transactions on Audio, Speech, and Language Processing
,
27
(
10
),
2019
,
1601
1615
.
[13]
Y.
Mitsui
,
D.
Kitamura
,
N.
Takamune
,
H.
Saruwatari
,
Y.
Takahashi
, and
K.
Kondo
, “
Independent Low-rank Matrix Analysis based on Parametric Majorization-Equalization Algorithm
,” in
Proceedings of IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)
,
December 2017
,
1
5
.
[14]
S.
Mogami
,
Y.
Mitsui
,
N.
Takamune
,
D.
Kitamura
,
H.
Saruwatari
,
Y.
Takahashi
,
K.
Kondo
,
H.
Nakajima
, and
H.
Kameoka
, “
Independent Low-Rank Matrix Analysis Based on Generalized Kullback-Leibler Divergence
,”
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
, E102-A(2),
2019
,
458
63
.
[15]
S.
Mogami
,
N.
Takamune
,
D.
Kitamura
,
H.
Saruwatari
,
Y.
Takahashi
,
K.
Kondo
, and
N.
Ono
, “
Independent Low-Rank Matrix Analysis Based on Time-Variant Sub-Gaussian Source Model for Determined Blind Source Separation
,”
IEEE/ACM Trans. Audio, Speech, Language Process
,
28
,
2020
,
503
518
.
[16]
N.
Murata
,
S.
Ikeda
, and
A.
Ziehe
, “
An Approach to Blind Source Separation Based on Temporal Structure of Speech Signals
,”
Neurocomputing
,
41
(
1-4
),
2001
,
1
24
.
[17] T. Nakashima, R. Scheibler, M. Togami, and N. Ono, “Joint Dereverberation and Separation with Iterative Source Steering,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), June 2021.
[18] T. Nakashima, R. Scheibler, Y. Wakabayashi, and N. Ono, “Faster Independent Low-rank Matrix Analysis with Pairwise Updates of Demixing Vectors,” in Proceedings of European Signal Processing Conference (EUSIPCO), January 2021, 301–5.
[19] N. Ono, “Fast Algorithm for Independent Component/Vector/Lowrank Matrix Analysis with Three or More Sources,” in Proceedings Spring Meeting of Acoustical Society of Japan, in Japanese, March 2018, 437–8.
[20] N. Ono, “Fast Stereo Independent Vector Analysis and Its Implementation on Mobile Phone,” in Proceedings of IEEE International Workshop on Acoustic Signal Enhancement (IWAENC), September 2012.
[21] N. Ono, “Stable and Fast Update Rules for Independent Vector Analysis based on Auxiliary Function Technique,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2011, 189–192.
[22] Z. Rafii, A. Liutkus, F.-R. Stöter, S. I. Mimilakis, and R. Bittner, “MUSDB18-HQ - an uncompressed version of MUSDB18,” August 2019, https://doi.org/10.5281/zenodo.3338373.
[23] R. Scheibler, E. Bezzam, and I. Dokmanic, “Pyroomacoustics: A Python Package for Audio Room Simulation and Array Processing Algorithms,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2018, 351–5.
[24] R. Scheibler and N. Ono, “Fast and Stable Blind Source Separation with Rank-1 Updates,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, 236–240.
[25] R. Scheibler and N. Ono, “MM Algorithms for Joint Independent Subspace Analysis with Application to Blind Single and Multi-Source Extraction,” 2020, arXiv: 2004.03926 [eess.SP].
[26] M. Sunohara, C. Haruta, and N. Ono, “Low-latency real-time blind source separation for hearing aids based on time-domain implementation of online independent vector analysis with truncation of non-causal components,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017, 216–20.
[27] T. Taniguchi, N. Ono, A. Kawamura, and S. Sagayama, “An Auxiliary-Function Approach to Online Independent Vector Analysis for Realtime Blind Source Separation,” in Proceedings of Hands-Free Speech Communication and Microphone Arrays (HSCMA), May 2014, 107–11.
[28] A. Weiss, A. Yeredor, S. A. Cheema, and M. Haardt, “The Extended “Sequentially Drilled” Joint Congruence Transformation and Its Application in Gaussian Independent Vector Analysis,” IEEE Transactions on Signal Processing, 65(23), 2017, 6332–6344.
[29] J. Yang, S. Gohel, and B. Vachha, “Current Methods and New Directions in Resting State fMRI,” Clinical Imaging, 65, 2020, 47–53, DOI: https://doi.org/10.1016/j.clinimag.2020.04.004.
[30] A. Yeredor, “Blind Separation of Gaussian Sources With General Covariance Structures: Bounds and Optimal Estimation,” IEEE Transactions on Signal Processing, 58(10), 2010, 5057–5068.
[31] A. Yeredor, “On Hybrid Exact-Approximate Joint Diagonalization,” in Proceedings of IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), December 2009, 312–5.
[32] T. Yoshioka, T. Nakatani, and M. Miyoshi, “An Integrated Method for Blind Separation and Dereverberation of Convolutive Audio Mixtures,” in Proceedings of European Signal Processing Conference (EUSIPCO), August 2008.
Published in APSIPA Transactions on Signal and Information Processing. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution-NonCommercial (CC BY-NC 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for non-commercial purposes only), subject to full attribution to the original publication and authors. The full terms are set out in the CC BY-NC 4.0 licence.
