Precision (P), recall (R), and F1 scores of the four models (GPT-4o Base, GPT-4o-finetuning, DeepSeek-V3 base, and DeepSeek-R1 base) on each task
| Task | GPT-4o Base P | GPT-4o Base R | GPT-4o Base F1 | GPT-4o-finetuning P | GPT-4o-finetuning R | GPT-4o-finetuning F1 | DeepSeek-V3 base P | DeepSeek-V3 base R | DeepSeek-V3 base F1 | DeepSeek-R1 base P | DeepSeek-R1 base R | DeepSeek-R1 base F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MNR | 0.798 | 0.637 | 0.709 | 0.762 | 0.714 | 0.737 | 0.833 | 0.751 | 0.790 | 0.769 | 0.755 | 0.762 |
| DIA | 0.830 | 0.672 | 0.743 | 0.851 | 0.690 | 0.762 | 0.757 | 0.914 | 0.828 | 0.746 | 0.707 | 0.726 |
| ETI | 0.385 | 0.500 | 0.435 | 0.471 | 0.800 | 0.593 | 0.667 | 0.400 | 0.500 | 0.467 | 0.700 | 0.560 |
| GMI | 0.500 | 0.438 | 0.467 | 0.471 | 0.500 | 0.485 | 0.692 | 0.563 | 0.621 | 0.500 | 0.563 | 0.529 |
| PROG | 0.429 | 0.177 | 0.250 | 0.308 | 0.235 | 0.267 | 0.455 | 0.294 | 0.357 | 0.625 | 0.294 | 0.400 |
| TREAT | 0.876 | 0.698 | 0.777 | 0.833 | 0.785 | 0.808 | 0.918 | 0.779 | 0.843 | 0.837 | 0.837 | 0.837 |
| ENR | 0.471 | 0.367 | 0.412 | 0.502 | 0.557 | 0.528 | 0.373 | 0.398 | 0.385 | 0.500 | 0.430 | 0.462 |
| FEEL | 0.504 | 0.545 | 0.523 | 0.553 | 0.594 | 0.573 | 0.393 | 0.463 | 0.425 | 0.542 | 0.520 | 0.531 |
| VIEW | 0.359 | 0.143 | 0.204 | 0.443 | 0.510 | 0.474 | 0.341 | 0.316 | 0.328 | 0.431 | 0.316 | 0.365 |
| MNCX | 0.707 | 0.608 | 0.654 | 0.753 | 0.704 | 0.727 | 0.706 | 0.612 | 0.656 | 0.701 | 0.777 | 0.737 |
| BACK | 0.609 | 0.506 | 0.553 | 0.697 | 0.639 | 0.667 | 0.658 | 0.526 | 0.585 | 0.694 | 0.761 | 0.726 |
| CON | 0.867 | 0.289 | 0.433 | 0.550 | 0.468 | 0.506 | 0.537 | 0.449 | 0.489 | 0.574 | 0.700 | 0.631 |
| ELA | 0.779 | 0.815 | 0.796 | 0.856 | 0.836 | 0.846 | 0.790 | 0.750 | 0.770 | 0.749 | 0.817 | 0.781 |
| ENCX (CAUSE) | 0.707 | 0.716 | 0.712 | 0.846 | 0.846 | 0.846 | 0.671 | 0.671 | 0.671 | 0.790 | 0.790 | 0.790 |
| Overall | 0.681 | 0.569 | 0.619 | 0.706 | 0.692 | 0.699 | 0.652 | 0.608 | 0.629 | 0.690 | 0.696 | 0.693 |
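
The F1 columns can be sanity-checked against the P and R columns. The sketch below assumes F1 is the usual harmonic mean of precision and recall, which the table does not state explicitly; the helper name `f1_score` is illustrative, and the three rows are copied from the GPT-4o Base columns. Because the reported P and R are already rounded to three decimals, a recomputed value may differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (task, precision, recall, reported F1), taken from the GPT-4o Base columns.
rows = [
    ("MNR", 0.798, 0.637, 0.709),
    ("DIA", 0.830, 0.672, 0.743),
    ("TREAT", 0.876, 0.698, 0.777),
]

for task, p, r, reported in rows:
    computed = f1_score(p, r)
    print(f"{task}: computed {computed:.3f}, reported {reported:.3f}")
```

How the Overall row is aggregated (for example micro- or macro-averaged) is not specified here, so this check is only meaningful for the per-task rows.
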