Missing Data Completion of Multi-channel Signals Using Autoencoder for Acoustic Scene Classification

Y. Shiroma, Y. Kinoshita, K. Imoto, S. Shiota, N. Ono, and H. Kiya (2023)

Figure 1

Performance differences between acoustic tasks with clean and missing data.

Figure 2

Network architecture of the autoencoder used in the proposed method.

Figure 3

Block diagram of the proposed missing data completion method using an autoencoder. Only the missing channel is replaced; the normal channels are left unchanged.
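The iterative completion shown in Figure 3 can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: features are assumed to be a NumPy array of shape (channels, Mel bins, frames), the missing channel is assumed to be pre-filled already (for example with the average of the other channels, as in the Missing condition of Figure 8), and `model` is a hypothetical stand-in for the trained autoencoder. The function name `iterative_completion`, the toy model, and the iteration count are illustrative only.

```python
from __future__ import annotations

from typing import Callable

import numpy as np


def iterative_completion(
    x: np.ndarray,
    missing: int,
    model: Callable[[np.ndarray], np.ndarray],
    n_iter: int = 5,
) -> list[np.ndarray]:
    """Hypothetical sketch of iterative autoencoder-based completion.

    x       : pre-filled multichannel features, shape (channels, mel_bins, frames).
    missing : index of the missing channel.
    model   : stand-in for a trained autoencoder; returns an array of the
              same shape as its input.
    Returns the features after every iteration, starting with the input.
    """
    current = np.array(x, dtype=float)  # work on a copy of the input
    history = [current.copy()]
    for _ in range(n_iter):
        reconstruction = model(current)
        # Replace only the missing channel; observed channels are kept as-is.
        current[missing] = reconstruction[missing]
        history.append(current.copy())
    return history


def toy_model(features: np.ndarray) -> np.ndarray:
    """Placeholder for a trained autoencoder: simply halves its input."""
    return 0.5 * features


rng = np.random.default_rng(0)
x = rng.random((4, 8, 10))
hist = iterative_completion(x, missing=2, model=toy_model, n_iter=3)
```

Because only the missing channel is overwritten, channels 0, 1, and 3 are identical in every entry of `hist`; with the halving toy model, the completed channel after three iterations equals one eighth of the initial fill.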

Figure 4

Network structure of the CAE used in the proposed method.

Figure 5

Two-dimensional floor plan of the combined kitchen and living room [8], with the selected microphone array numbers in the SINS database.

Figure 6

Microphone positions used for the completion models and the ASC models in Situation 1.

Figure 7

Microphone positions used for the completion models and the ASC models in Situation 2.

Figure 8

Example signals of the Clean, Missing (the average of the other channels), and Completed conditions, shown as Mel filter bank index versus time frame.
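Figure 8 defines the Missing condition by substituting the average of the other channels for the missing one. A minimal NumPy sketch, assuming the averaging is applied to Mel-spectrogram features of shape (channels, Mel bins, frames); this excerpt does not say whether it is done on waveforms or on features, and `simulate_missing` is a hypothetical name:

```python
import numpy as np


def simulate_missing(x: np.ndarray, missing: int) -> np.ndarray:
    """Return a copy of x in which channel `missing` is replaced by the
    average of the remaining channels.

    x : multichannel features, shape (channels, mel_bins, frames) (assumed layout).
    """
    out = np.array(x, dtype=float)  # copy so the clean features are preserved
    others = np.delete(np.arange(out.shape[0]), missing)
    out[missing] = out[others].mean(axis=0)
    return out


clean = np.stack([np.full((2, 3), v) for v in (1.0, 5.0, 3.0)])
missing_cond = simulate_missing(clean, missing=1)
print(missing_cond[1])  # every entry is (1.0 + 3.0) / 2 = 2.0
```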

Figure 9

Trajectories of the In-Out MSE and Out-True MSE over the completion iterations for each completion model (panels (a)–(f)).
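The two error curves in Figure 9 can be computed per iteration as below. The definitions are assumptions, since this excerpt does not state them: In-Out MSE is taken as the MSE between the missing-channel estimate entering an iteration and the one leaving it (a reference-free convergence measure), and Out-True MSE as the MSE between the iteration's output and the clean (true) channel. The function names are hypothetical.

```python
import numpy as np


def mse(a, b) -> float:
    """Mean squared error between two arrays of the same shape."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.mean(diff ** 2))


def mse_trajectories(estimates, true):
    """Per-iteration In-Out and Out-True MSE (assumed definitions).

    estimates : missing-channel estimates; estimates[0] is the initial fill
                and estimates[k] is the output of iteration k.
    true      : clean (true) features of the missing channel.
    """
    in_out = [mse(estimates[k - 1], estimates[k]) for k in range(1, len(estimates))]
    out_true = [mse(estimates[k], true) for k in range(1, len(estimates))]
    return in_out, out_true


est = [np.zeros(4), np.ones(4), np.full(4, 1.5)]
in_out, out_true = mse_trajectories(est, np.full(4, 3.0))
print(in_out, out_true)  # [1.0, 0.25] [4.0, 2.25]
```

A falling In-Out MSE only indicates that the iterations are settling; whether they settle near the true signal is what the Out-True MSE measures.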

Figure 10

Trajectories of the F-score [%] over the completion iterations for each completion model, using the three ASC models (CNN, CRNN, and ConvMixer); panels (a)–(f).

Figure 11

Confusion matrices [%] under the Clean, Missing, and Completed conditions for the experimental results using ConvMixer and CAE-M in Table 3.
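Figure 11 reports confusion matrices in percent. A sketch assuming each row (true class) is normalised to 100%, with the class order taken from Table 2; the ordering and normalisation used in the figure may differ, and `confusion_percent` is a hypothetical helper:

```python
import numpy as np

# Activity classes as listed in Table 2 (order assumed for illustration).
CLASSES = ["Absence", "Cooking", "Dishwashing", "Eating", "Other",
           "Social activity", "Vacuum cleaner", "Watching TV", "Working"]


def confusion_percent(y_true, y_pred, n_classes: int) -> np.ndarray:
    """Confusion matrix (rows = true class, columns = predicted class),
    with each row normalised to percent."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    # Unbuffered accumulation, so repeated (true, pred) pairs are all counted.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1.0)
    row_sums = cm.sum(axis=1, keepdims=True)
    # Rows of classes absent from y_true stay at zero instead of dividing by 0.
    return 100.0 * np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)


cm = confusion_percent([0, 0, 1, 1, 1, 2], [0, 1, 1, 1, 0, 2], n_classes=len(CLASSES))
```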

Table 1

Overview of the dataset for constructing completion models [hours].

Classes	Training	Validation	Total
Absence	8.4	2.1	10.5
Cooking	2.3	0.5	2.8
Dishwashing	0.6	0.2	0.8
Eating	1.0	0.3	1.3
Other	0.9	0.2	1.1
Social activity (Calling + Visit)	2.2	0.5	2.7
Vacuum cleaner	0.4	0.1	0.5
Watching TV	8.3	2.1	10.4
Working	8.3	2.1	10.4
Total	32.4	8.1	40.5

Table 2

Overview of the dataset for constructing ASC models [hours].

Class	Training	Validation	Testing	Total
Absence	8.4	2.1	2.6	13.1
Cooking	1.0	0.2	0.3	1.5
Dishwashing	0.3	0.1	0.1	0.5
Eating	0.5	0.1	0.2	0.8
Other	0.6	0.2	0.2	1.0
Social activity	1.0	0.3	0.3	1.6
Vacuum cleaner	0.2	0.1	0.1	0.3
Watching TV	8.3	2.1	2.6	13.0
Working	7.9	2.0	2.5	12.3
Total	28.2	7.1	8.8	44.1

Table 3

Classification results (F-score [%]) under the clean, missing, and completed conditions for the different completion models.

Completion model	Conditions	CNN	CRNN	ConvMixer
-	Clean	95.94	95.48	95.13
-	Missing	71.21	30.25	79.71
CAE-S	Completed	72.93	63.30	76.35
CAE-M	Completed	83.36	81.94	91.39
CAE-L	Completed	91.24	87.75	86.42
CAE-latent-S	Completed	91.35	87.63	88.55
CAE-latent-L	Completed	87.36	84.85	84.67
U-Net	Completed	69.89	37.24	80.59

Table 4

Precision / recall [%] for the experimental results described in Table 3.

Completion model	Condition	CNN	CRNN	ConvMixer
-	Clean	96.07 / 96.13	95.91 / 95.22	95.45 / 95.41
-	Missing	84.69 / 75.24	83.00 / 39.89	86.76 / 80.91
CAE-S	Completed	85.03 / 76.34	84.11 / 68.17	85.56 / 78.33
CAE-M	Completed	87.73 / 81.75	89.82 / 79.50	92.26 / 91.35
CAE-L	Completed	91.38 / 91.41	89.57 / 86.73	86.60 / 86.47
CAE-latent-S	Completed	91.82 / 91.70	89.12 / 87.32	89.29 / 88.83
CAE-latent-L	Completed	88.96 / 87.67	87.28 / 84.55	85.21 / 84.49
U-Net	Completed	83.75 / 73.42	58.75 / 30.36	86.92 / 81.57
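Tables 3 and 4 report F-score, precision, and recall, but this excerpt does not state how they are averaged over classes. The sketch below assumes class-wise (macro) averaging, which is the metric commonly used for DCASE 2018 Task 5; the helper `macro_scores` is hypothetical. Under this assumption the macro F-score is the mean of the per-class F-scores, which is generally not the harmonic mean of the macro precision and macro recall, so the values in Table 3 need not be derivable from those in Table 4.

```python
import numpy as np


def macro_scores(y_true, y_pred, n_classes: int):
    """Macro-averaged (precision, recall, F-score) in percent (assumed metric)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    precisions, recalls, fscores = [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        # Per-class scores; undefined ratios are set to 0.
        p = tp / (tp + fp) if tp + fp > 0 else 0.0
        r = tp / (tp + fn) if tp + fn > 0 else 0.0
        f = 2 * p * r / (p + r) if p + r > 0 else 0.0
        precisions.append(p)
        recalls.append(r)
        fscores.append(f)
    # Macro averaging: unweighted mean over classes.
    return tuple(100.0 * float(np.mean(v)) for v in (precisions, recalls, fscores))


p, r, f = macro_scores([0, 0, 0, 1], [0, 0, 1, 1], n_classes=2)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # P=75.00 R=83.33 F=73.33
```

In this toy case the harmonic mean of P and R would be about 78.95, not the macro F-score of 73.33.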

Table 5

F-scores [%] under the clean, missing, and completed conditions with CAE-M and CAE-latent-S in Situations 1 and 2.

Completion model	Conditions	CNN	CRNN	ConvMixer
-	Clean	95.94	95.48	95.13
-	Missing	71.21	30.25	79.71
CAE-M	Completed (Situation 1)	80.14	74.79	81.97
CAE-M	Completed (Situation 2)	74.29	60.38	78.26
CAE-latent-S	Completed (Situation 1)	82.38	76.22	85.41
CAE-latent-S	Completed (Situation 2)	70.11	58.63	75.91

[1] Barchiesi, Giannoulis, Stowell, and M. D. Plumbley, “Acoustic Scene Classification: Classifying Environments from the Sounds They Produce,” IEEE Signal Processing Magazine, 2015.

[2] Barker, Watanabe, Vincent, and Trmal, “The fifth ‘CHiME’ Speech Separation and Recognition Challenge: Dataset, task and baselines,” in Interspeech 2018 - 19th Annual Conference of the International Speech Communication Association, 2018, pp. 1561–1565.

[3] J. P. Bello, Silva, Nov, R. L. Dubois, Arora, Salamon, Mydlarz, and Doraiswamy, “SONYC: A System for Monitoring, Analyzing, and Mitigating Urban Noise Pollution,” Communications of the ACM, 2019.

[4] Chen, Liu, Zhang, and Yan, “Integrating the Data Augmentation Scheme With Various Classifiers for Acoustic Scene Modeling,” Tech. rep., Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Challenge, 2019.

[5] Choi, Fazekas, Sandler, and Cho, “Convolutional Recurrent Neural Networks for Music Classification,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 2392–2396.

[6] “DCASE 2018 Challenge Task5,” https://dcase.community/challenge2018/task-monitoring-domestic-activities.

[7] Dekkers, Lauwereins, Thoen, M. W. Adhana, Brouckxon, van Waterschoot, Vanrumste, Verhelst, and Karsmakers, “The SINS Database for Detection of Daily Activities in a Home Environment Using an Acoustic Sensor Network,” in The Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE), 2017.

[8] Dekkers, Vuegen, van Waterschoot, Vanrumste, and Karsmakers, “DCASE 2018 Challenge - Task 5: Monitoring of Domestic Activities Based on Multi-Channel Acoustics,” Tech. rep., Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 Challenge, 2018.

[9] Del Galdo, Thiergart, Weller, and E. A. Habets, “Generating Virtual Microphone Signals Using Geometrical Information Gathered by Distributed Arrays,” in 2011 Joint Workshop on Hands-free Speech Communication and Microphone Arrays, 2011, pp. 185–190.

[10] T. B. Duman, Bayram, and Ince, “Acoustic Anomaly Detection Using Convolutional Autoencoders in Industrial Processes,” in 14th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO), 2020, pp. 432–442.

[11] M. C. Green and D. T. Murphy, “Acoustic Scene Classification Using Spatial Features,” Tech. rep., Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 Challenge, 2017.

[12] Griffin and Lim, “Signal Estimation from Modified Short-Time Fourier Transform,” IEEE Transactions on Acoustics, Speech, and Signal Processing, 1984, pp. 236–243.

[13] Zhang, Ren, and Sun, “Deep Residual Learning for Image Recognition,” in the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.

[14] S. M. Siniscalchi, C.-H. H. Yang, and C.-H. Lee, “A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer,” in 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 4041–4045.

[15] Imoto and Ono, “Spatial Cepstrum as a Spatial Feature Using a Distributed Microphone Array for Acoustic Scene Analysis,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017, pp. 1335–1343.

[16] Kim, Yang, Kim, and Chang, “QTI Submission to DCASE 2021: Residual Normalization for Device-Imbalanced Acoustic Scene Classification with Efficient Design,” Tech. rep., Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge, 2021.

[17] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” in the 3rd International Conference on Learning Representations (ICLR), 2015.

[18] Kinoshita and Ono, “Analysis on Roles of DNNs in End-to-End Acoustic Scene Analysis Framework with Distributed Sound-to-Light Conversion Devices,” in 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021, pp. 1167–1172.

[19] Kinoshita and Ono, “End-to-End Training for Acoustic Scene Analysis with Distributed Sound-to-Light Conversion Devices,” in 2021 29th European Signal Processing Conference (EUSIPCO), 2021, pp. 1010–1014.

[20] Peddinti, Povey, M. L. Seltzer, and Khudanpur, “A Study on Data Augmentation of Reverberant Speech for Robust Speech Recognition,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 5220–5224.

[21] Koizumi, Kawaguchi, Imoto, Nakamura, Nikaido, Tanabe, Purohit, Suefusa, Endo, Yasuda, and Harada, “Description and Discussion on DCASE2020 Challenge Task2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring,” in the Detection and Classification of Acoustic Scenes and Events 2020 Workshop (DCASE2020), 2020.

[22] Koutini, Eghbal-Zadeh, and Widmer, “Acoustic Scene Classification and Audio Tagging with Receptive-Field-Regularized CNNs,” Tech. rep., Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Challenge, 2019.

[23] Mesaros, Heittola, Benetos, Foster, Lagrange, Virtanen, and M. D. Plumbley, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, pp. 379–.

[24] Ochiai, Delcroix, Nakatani, Ikeshita, Kinoshita, and Araki, “Neural Network-Based Virtual Microphone Estimator,” in 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 6114–6118.

[25] D. S. Park, Chan, Zhang, C.-C. Chiu, Zoph, E. D. Cubuk, and Q. V. Le, “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition,” in Interspeech 2019 - 20th Annual Conference of the International Speech Communication Association, 2019, pp. 2613–2617.

[26] Politis, Mesaros, Adavanne, Heittola, and Virtanen, “Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, pp. 684–698.

[27] Ronneberger, Fischer, and Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015, 2015, pp. 234–241.

[28] Shiroma, Imoto, Shiota, Ono, and Kiya, “Investigation on Spatial and Frequency-Based Features for Asynchronous Acoustic Scene Analysis,” in 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021, pp. 1161–1166.

[29] Suh, Park, Jeong, and Lee, “Designing Acoustic Scene Classification Models with CNN Variants,” Tech. rep., Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge, 2020.

[30] Takahashi, Takamuku, Imoto, and Natori, “Semi-Supervised Domain Adaptation for Acoustic Scene Classification by Minimax Entropy and Self-Supervision Approaches,” in 2022 International Workshop on Acoustic Signal Enhancement (IWAENC), 2022.

[31] Tanabe, Endo, Nikaido, Ichige, Nguyen, Kawaguchi, and Hamada, “Multichannel Acoustic Scene Classification by Blind Dereverberation, Blind Source Separation, Data Augmentation, and Model Ensembling,” Tech. rep., Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 Challenge, 2018.

[32] Tang, Qin, and Liu, “Document Modeling With Gated Recurrent Neural Network for Sentiment Classification,” in the 2015 Conference on Empirical Methods in Natural Language Processing, 2015, pp. 1422–1432.

[33] Trockman and J. Z. Kolter, “Patches Are All You Need?” arXiv preprint, 2022.

[34] Yamaoka, Ono, Makino, and Yamada, “Abnormal Sound Detection by Two Microphones using Virtual Microphone Technique,” in 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2017, pp. 478–482.

[35] Yatabe, Masuyama, Kusano, and Oikawa, “Representation of Complex Spectrogram via Phase Conversion,” Acoustical Science and Technology, 2019, pp. 170–177.