Detection performance of several popular detectors on the HPPT, HC3, and CDB test datasets. The evaluation metrics are area under the ROC curve (AUROC) and accuracy (ACC). The results for the GPTZero, OpenAI-GPT-2 [21], and OriginalityAI detectors are taken from the data presented in [9].
| Model | ACC (HPPT) | ACC (HC3) | ACC (CDB) | AUROC (HPPT) | AUROC (HC3) | AUROC (CDB) |
|---|---|---|---|---|---|---|
| GPTZero | - | - | 0.4406 | - | - | 0.6818 |
| OpenAI-GPT-2 | - | - | 0.4633 | - | - | 0.5604 |
| OriginalityAI | - | - | 0.4967 | - | - | 0.5721 |
| DetectGPT | 0.5129 | 0.8309 | 0.4593 | 0.6876 | 0.9058 | 0.7308 |
| Roberta-HC3 | 0.5285 | 0.9991 | 0.5848 | 0.7946 | 1 | 0.7526 |
| Roberta-HPPT (ours) | 0.9465 | 0.9671 | 0.8825 | 0.9947 | 0.9931 | 0.9518 |
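
As a reading aid for the two metrics in the table, the following is a minimal sketch of how ACC and AUROC can be computed from detector scores. It is not the evaluation code behind the reported numbers. The function names, the 0.5 decision threshold, and the toy labels and scores are all illustrative assumptions.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label.

    The 0.5 threshold is an assumption for illustration only.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = machine-generated, 0 = human-written);
# these values are made up and do not come from HPPT, HC3, or CDB.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

print(f"ACC   = {accuracy(toy_labels, toy_scores):.4f}")
print(f"AUROC = {auroc(toy_labels, toy_scores):.4f}")
```

ACC depends on the chosen decision threshold, whereas AUROC measures only how well the scores rank machine-generated text above human-written text. This is why a detector can have an ACC below 0.5 on a dataset and still have an AUROC above 0.5, as several rows in the CDB columns do.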