Table 2.

Performance metrics of the LLM-based de-identification process, with LLM-corrected human redaction as the ground truth

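| Course | Precision: GPT-4o | Precision: Llama 3.3 | Precision: Llama 3.1 | Recall: GPT-4o | Recall: Llama 3.3 | Recall: Llama 3.1 | Kappa: GPT-4o | Kappa: Llama 3.3 | Kappa: Llama 3.1 |
|---|---|---|---|---|---|---|---|---|---|
| Accounting | 0.764 | 0.616 | 0.327 | 0.967 | 0.971 | 0.950 | 0.852 | 0.750 | 0.478 |
| Calculus | 0.668 | 0.620 | 0.224 | 0.936 | 0.974 | 0.956 | 0.778 | 0.756 | 0.356 |
| Design | 0.779 | 0.760 | 0.385 | 0.975 | 0.960 | 0.936 | 0.864 | 0.846 | 0.537 |
| Gamification | 0.685 | 0.588 | 0.342 | 0.982 | 0.948 | 0.906 | 0.806 | 0.723 | 0.491 |
| Business trends | 0.473 | 0.305 | 0.272 | 0.961 | 0.972 | 0.935 | 0.630 | 0.458 | 0.414 |
| Poetry | 0.273 | 0.195 | 0.110 | 0.907 | 0.905 | 0.855 | 0.419 | 0.320 | 0.193 |
| Mythology | 0.486 | 0.504 | 0.204 | 0.930 | 0.978 | 0.949 | 0.636 | 0.663 | 0.329 |
| Probability | 0.637 | 0.600 | 0.198 | 0.929 | 0.976 | 0.949 | 0.754 | 0.741 | 0.321 |
| Vaccines | 0.445 | 0.370 | 0.292 | 0.928 | 0.974 | 0.913 | 0.597 | 0.531 | 0.436 |
| Average | 0.579 | 0.506 | 0.262 | 0.946 | 0.962 | 0.928 | 0.704 | 0.643 | 0.395 |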

Note(s):

The best performing LLM for each course according to each metric is shown in italics

Source(s): Authors’ own creation
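
For readers who want to see how the three metrics in Table 2 relate to one another, the following is a minimal sketch that computes precision, recall and Cohen's kappa from paired binary labels. It assumes one label per unit of text (1 = redacted as identifying, 0 = not redacted); the unit of analysis, the function names and the toy data are illustrative assumptions and are not taken from the study.

```python
def confusion_counts(truth, pred):
    """Count TP, FP, FN, TN for binary labels (1 = redacted, 0 = kept)."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    if not truth:
        raise ValueError("at least one labelled unit is required")
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_kappa(truth, pred):
    """Return (precision, recall, Cohen's kappa) for binary labels."""
    tp, fp, fn, tn = confusion_counts(truth, pred)
    n = tp + fp + fn + tn

    # Precision: share of model redactions that the ground truth also redacts.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of ground-truth redactions that the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return precision, recall, kappa


# Hypothetical toy example: 10 units, 2 truly identifying, 1 over-redaction.
truth = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
precision, recall, kappa = precision_recall_kappa(truth, pred)
print(f"precision={precision:.3f} recall={recall:.3f} kappa={kappa:.3f}")
```

In the toy example the model finds every true redaction but adds one extra, so recall is perfect while precision and kappa drop. This is the same pattern as in Table 2, where recall stays above 0.85 for all models while precision and kappa vary widely. When non-identifying units greatly outnumber identifying ones, kappa tends towards the harmonic mean of precision and recall, which may explain why the kappa values sit between the two.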
