Comparison of the detection performance of several Deepfake detectors on the second-generation datasets under cross-domain training, with AUC as the performance metric. AUC results of DefakeHop and DefakeHop++ are given at both the frame level and the video level. The best and second-best results are shown in boldface and underlined, respectively. The last 4 rows give results of DefakeHop and DefakeHop++ under same-domain training (trained on Celeb-DF). The AUC results of the benchmarking methods are taken from [21], and the numbers of parameters are taken from https://keras.io/api/applications. A superscript a denotes deep-learning methods, and a superscript b denotes non-deep-learning methods.
| Method | Model | Celeb-DF v1 (2nd gen.) | Celeb-DF v2 (2nd gen.) | #param |
|---|---|---|---|---|
| Two-stream [44] | InceptionV3ᵃ | 55.7% | 53.8% | 23.9M |
| Meso4 [1] | Designed CNNᵃ | 53.6% | 54.8% | 28.0K |
| MesoInception4 [1] | Designed CNNᵃ | 49.6% | 53.6% | 28.6K |
| HeadPose [37] | SVMᵇ | 54.8% | 54.6% | − |
| FWA [20] | ResNet-50ᵃ | 53.8% | 56.9% | 25.6M |
| VA-MLP [23] | Designed CNNᵃ | 48.8% | 55.0% | − |
| VA-LogReg [23] | Logistic Regressionᵇ | 46.9% | 55.1% | − |
| Xception-raw [27] | XceptionNetᵃ | 38.7% | 48.2% | 22.9M |
| Xception-c23 [27] | XceptionNetᵃ | − | 65.3% | 22.9M |
| Xception-c40 [27] | XceptionNetᵃ | − | 65.5% | 22.9M |
| Multi-task [25] | Designed CNNᵃ | 36.5% | 54.3% | − |
| Capsule [26] | CapsuleNetᵃ | − | 57.5% | 3.9M |
| DSP-FWA [19] | SPPNetᵃ | − | 64.6% | − |
| Multi-attentional [43] | EfficientNet-B4ᵃ | − | 67.4% | 19.5M |
| Ours (Frame Level) | DefakeHop++ᵇ | 56.30% | 60.5% | 238K |
| Ours (Video Level) | DefakeHop++ᵇ | 58.15% | 62.4% | 238K |
| Ours (Trained on Celeb-DF, Frame Level) | DefakeHopᵇ | 93.1% | 87.7% | 42.8K |
| Ours (Trained on Celeb-DF, Video Level) | DefakeHopᵇ | 95.0% | 90.6% | 42.8K |
| Ours (Trained on Celeb-DF, Frame Level) | DefakeHop++ᵇ | 95.4% | 94.3% | 238K |
| Ours (Trained on Celeb-DF, Video Level) | DefakeHop++ᵇ | 97.5% | 96.7% | 238K |
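The table reports AUC at both the frame level and the video level. The sketch below shows how the two metrics could be computed. It assumes a video's score is the mean of its frames' fake probabilities, but the aggregation rule actually used by DefakeHop++ is not stated in this table. The function names `roc_auc` and `video_level` and the toy data are hypothetical and are not taken from the DefakeHop++ code. AUC is computed in its rank (Mann-Whitney) form: it is the probability that a randomly chosen fake sample scores higher than a randomly chosen real one, with ties counted as one half.

```python
from collections import defaultdict
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the Mann-Whitney statistic: the fraction of (fake, real) pairs
    in which the fake sample (label 1) scores higher than the real one
    (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one fake and one real sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def video_level(frame_records):
    """Collapse (video_id, label, frame_prob) records into one score per video.

    ASSUMPTION: a video's score is the mean of its frame probabilities; this is
    an illustrative aggregation rule, not necessarily the one used in the paper.
    """
    per_video = defaultdict(list)
    labels = {}
    for video_id, label, prob in frame_records:
        per_video[video_id].append(prob)
        labels[video_id] = label
    ids = sorted(per_video)
    return [labels[v] for v in ids], [mean(per_video[v]) for v in ids]


# Hypothetical toy data: two real videos (label 0) and two fake videos (label 1),
# two frames each, with made-up per-frame fake probabilities.
frames = [
    ("v1", 0, 0.2), ("v1", 0, 0.6),
    ("v2", 0, 0.3), ("v2", 0, 0.1),
    ("v3", 1, 0.7), ("v3", 1, 0.4),
    ("v4", 1, 0.9), ("v4", 1, 0.5),
]

frame_auc = roc_auc([y for _, y, _ in frames], [p for _, _, p in frames])
video_auc = roc_auc(*video_level(frames))
print(frame_auc, video_auc)  # 0.875 1.0
```

In this toy example, averaging smooths out individual misleading frames (the 0.6 frame of real video `v1` and the 0.4 frame of fake video `v3`), so the video-level AUC exceeds the frame-level AUC. The table shows the same ordering for DefakeHop and DefakeHop++, but the toy numbers are illustrative only.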