Table 2. Performance metrics of the LLM-based de-identification process, with LLM-corrected human redaction as the ground truth

Course	Precision (GPT-4o)	Precision (Llama 3.3)	Precision (Llama 3.1)	Recall (GPT-4o)	Recall (Llama 3.3)	Recall (Llama 3.1)	Kappa (GPT-4o)	Kappa (Llama 3.3)	Kappa (Llama 3.1)
Accounting	0.764	0.616	0.327	0.967	0.971	0.950	0.852	0.750	0.478
Calculus	0.668	0.620	0.224	0.936	0.974	0.956	0.778	0.756	0.356
Design	0.779	0.760	0.385	0.975	0.960	0.936	0.864	0.846	0.537
Gamification	0.685	0.588	0.342	0.982	0.948	0.906	0.806	0.723	0.491
Business trends	0.473	0.305	0.272	0.961	0.972	0.935	0.630	0.458	0.414
Poetry	0.273	0.195	0.110	0.907	0.905	0.855	0.419	0.320	0.193
Mythology	0.486	0.504	0.204	0.930	0.978	0.949	0.636	0.663	0.329
Probability	0.637	0.600	0.198	0.929	0.976	0.949	0.754	0.741	0.321
Vaccines	0.445	0.370	0.292	0.928	0.974	0.913	0.597	0.531	0.436
Average	0.579	0.506	0.262	0.946	0.962	0.928	0.704	0.643	0.395

Note(s):

The best-performing LLM for each course according to each metric is shown in italics.

Source(s): Authors’ own creation
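
The relationship between the three reported metrics can be illustrated with a minimal sketch. It assumes that each unit of text (for example, a token) carries a binary label, 1 for "redact" and 0 for "keep", in both the ground truth and the LLM output. The function name `binary_metrics`, the token-level framing, and the toy labels are assumptions made for illustration; they are not taken from the authors' evaluation code.

```python
from typing import Dict, Sequence


def binary_metrics(truth: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, and Cohen's kappa for binary labels (1 = redact, 0 = keep)."""
    if len(truth) != len(pred) or len(truth) == 0:
        raise ValueError("truth and pred must be non-empty and of equal length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    n = tp + fp + fn + tn

    # Precision: share of LLM redactions that the ground truth also redacts.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of ground-truth redactions that the LLM also redacts.
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else 1.0

    return {"precision": precision, "recall": recall, "kappa": kappa}


# Toy example: the model finds every true redaction but over-redacts two units.
truth = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
result = binary_metrics(truth, pred)
```

In this toy case recall is 1.0 while precision is 0.6, and kappa comes out at about 0.6. This is the same pattern seen throughout the table: over-redaction leaves recall high but lowers precision, and kappa falls with it.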