Table 2

Comparison of the detection performance of several Deepfake detectors on the second-generation datasets under cross-domain training, with AUC as the performance metric. The AUC results of DefakeHop and DefakeHop++ are given at both the frame level and the video level. The best and the second-best results are shown in boldface and underlined, respectively. Furthermore, we include results of DefakeHop and DefakeHop++ under same-domain training in the last four rows. The AUC results of the benchmarking methods are taken from [21], and the numbers of parameters are from https://keras.io/api/applications. Also, we use ^a to denote deep learning methods and ^b to denote non-deep-learning methods.

		2nd-generation datasets (AUC)
Method	Model	Celeb-DF v1	Celeb-DF v2	#Params
Two-stream [44]	InceptionV3^a	55.7%	53.8%	23.9M
Meso4 [1]	Designed CNN^a	53.6%	54.8%	28.0K
MesoInception4 [1]	Designed CNN^a	49.6%	53.6%	28.6K
HeadPose [37]	SVM^b	54.8%	54.6%	−
FWA [20]	ResNet-50^a	53.8%	56.9%	25.6M
VA-MLP [23]	Designed CNN^a	48.8%	55.0%	−
VA-LogReg [23]	Logistic Regression^b	46.9%	55.1%	−
Xception-raw [27]	XceptionNet^a	38.7%	48.2%	22.9M
Xception-c23 [27]	XceptionNet^a	−	65.3%	22.9M
Xception-c40 [27]	XceptionNet^a	−	65.5%	22.9M
Multi-task [25]	Designed CNN^a	36.5%	54.3%	−
Capsule [26]	CapsuleNet^a	−	57.5%	3.9M
DSP-FWA [19]	SPPNet^a	−	64.6%	−
Multi-attentional [43]	EfficientNet-B4^a	−	67.4%	19.5M
Ours (Frame Level)	DefakeHop++^b	56.30%	60.5%	238K
Ours (Video Level)	DefakeHop++^b	58.15%	62.4%	238K
Ours (Trained on Celeb-DF, Frame Level)	DefakeHop^b	93.1%	87.7%	42.8K
Ours (Trained on Celeb-DF, Video Level)	DefakeHop^b	95.0%	90.6%	42.8K
Ours (Trained on Celeb-DF, Frame Level)	DefakeHop++^b	95.4%	94.3%	238K
Ours (Trained on Celeb-DF, Video Level)	DefakeHop++^b	97.5%	96.7%	238K
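The caption reports AUC at both the frame level and the video level. The Python sketch below illustrates the difference under one stated assumption: a video's score is taken to be the mean of its frame scores, an aggregation rule this table does not specify for DefakeHop or DefakeHop++. AUC is computed as the Mann-Whitney probability that a randomly chosen fake sample scores higher than a randomly chosen real one, with ties counting one half. The helper names `roc_auc` and `video_level_auc` and the toy scores are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def roc_auc(labels, scores):
    """AUC as P(score of a random fake > score of a random real); ties count 1/2.

    labels: 1 = fake, 0 = real. Quadratic pairwise loop, chosen for clarity.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one fake and one real sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def video_level_auc(video_ids, labels, scores):
    """Hypothetical video-level AUC: average each video's frame scores
    (assumed aggregation rule), then compute AUC over videos."""
    per_video = defaultdict(list)
    video_label = {}
    for vid, y, s in zip(video_ids, labels, scores):
        per_video[vid].append(s)
        video_label[vid] = y  # every frame of a video shares its label
    vids = sorted(per_video)
    return roc_auc([video_label[v] for v in vids],
                   [mean(per_video[v]) for v in vids])


# Toy data (hypothetical): two frames each from four videos.
video_ids = ["A", "A", "B", "B", "C", "C", "D", "D"]
labels    = [0,   0,   1,   1,   0,   0,   1,   1]
scores    = [0.2, 0.6, 0.5, 0.9, 0.1, 0.3, 0.4, 0.8]

frame_auc = roc_auc(labels, scores)                       # 14/16 = 0.875
video_auc = video_level_auc(video_ids, labels, scores)    # 4/4 = 1.0
print(frame_auc, video_auc)
```

On this toy data, one real frame (0.6) outscores two fake frames, so the frame-level AUC is 0.875, while averaging per video separates the classes perfectly. This shows one plausible reason why video-level figures in the table can exceed frame-level ones: aggregation smooths out individual misleading frames. The pairwise loop is only for readability; a rank-based formula or a library routine scales better to large test sets.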
