Performance metrics of the LLM-based de-identification process, taking LLM-corrected human redaction as the ground truth
| Course | Precision (GPT-4o) | Precision (Llama 3.3) | Precision (Llama 3.1) | Recall (GPT-4o) | Recall (Llama 3.3) | Recall (Llama 3.1) | Kappa (GPT-4o) | Kappa (Llama 3.3) | Kappa (Llama 3.1) |
|---|---|---|---|---|---|---|---|---|---|
| Accounting | *0.764* | 0.616 | 0.327 | 0.967 | *0.971* | 0.950 | *0.852* | 0.750 | 0.478 |
| Calculus | *0.668* | 0.620 | 0.224 | 0.936 | *0.974* | 0.956 | *0.778* | 0.756 | 0.356 |
| Design | *0.779* | 0.760 | 0.385 | *0.975* | 0.960 | 0.936 | *0.864* | 0.846 | 0.537 |
| Gamification | *0.685* | 0.588 | 0.342 | *0.982* | 0.948 | 0.906 | *0.806* | 0.723 | 0.491 |
| Business trends | *0.473* | 0.305 | 0.272 | 0.961 | *0.972* | 0.935 | *0.630* | 0.458 | 0.414 |
| Poetry | *0.273* | 0.195 | 0.110 | *0.907* | 0.905 | 0.855 | *0.419* | 0.320 | 0.193 |
| Mythology | 0.486 | *0.504* | 0.204 | 0.930 | *0.978* | 0.949 | 0.636 | *0.663* | 0.329 |
| Probability | *0.637* | 0.600 | 0.198 | 0.929 | *0.976* | 0.949 | *0.754* | 0.741 | 0.321 |
| Vaccines | *0.445* | 0.370 | 0.292 | 0.928 | *0.974* | 0.913 | *0.597* | 0.531 | 0.436 |
| Average | *0.579* | 0.506 | 0.262 | 0.946 | *0.962* | 0.928 | *0.704* | 0.643 | 0.395 |
Note:
The best-performing LLM for each course on each metric is shown in italics.
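
As a reading aid for the three metrics in the table, the following minimal sketch shows how precision, recall, and Cohen's kappa can be computed from two binary label sequences. It rests on assumptions not stated in the source: the comparison is made per token, with 1 meaning "redacted" and 0 meaning "kept", and the function name `redaction_metrics` is hypothetical. It is not the authors' evaluation code.

```python
from typing import Dict, Sequence


def redaction_metrics(truth: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, and Cohen's kappa for binary redaction labels.

    Assumption: one label per token, 1 = redacted, 0 = kept.
    `truth` is the ground truth; `pred` is the model output.
    """
    if len(truth) != len(pred) or len(truth) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    n = tp + fp + fn + tn

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else 0.0

    return {"precision": precision, "recall": recall, "kappa": kappa}


# Toy example: the model redacts everything it should, plus two extra tokens.
truth = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
metrics = redaction_metrics(truth, pred)
print(metrics)
```

In this toy case recall is 1.0, precision is 0.5, and kappa is about 0.545. The table shows the same qualitative pattern: over-redaction leaves recall high while lowering precision and kappa.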