End-to-end system evaluation results of the proposed student-teacher offline RL (STORL) compared with previous approaches. Bold numbers indicate the best score for each metric, excluding systems with perfect NLU. AUGPT [21] is a model that participated in DSTC9 track 2. * indicates scores obtained by running the provided models. ** indicates scores taken from the results reported in the GPT-Critic paper [17].

| NLU | DST | Policy | NLG | Success Rate (%) | F1 Score | Complete Rate (%) | Booking Rate (%) | Average Turn (Succ/All) |
|---|---|---|---|---|---|---|---|---|
| BERT | Rule | PG* | Template | 44.7 | 60.6 | 47.1 | 29.7 | 12.5/20.1 |
| BERT | Rule | GDPL* | Template | 47.2 | 64.6 | 50.0 | 26.8 | 11.9/19.3 |
| BERT | Rule | PPO* | Template | 61.2 | 68.2 | 64.7 | 62.4 | 13.0/18.1 |
| BERT | Rule | PPO HITL* | Template | 81.4 | 84.6 | 86.2 | 88.4 | 11.3/12.4 |
| BERT | Rule | Rule | Template | 83.8 | 86.2 | 92.7 | 91.5 | 11.4/11.9 |
| End-to-End (DAMD)* | | | | 34.2 | 56.9 | 39.6 | 52.0 | 15.6/30.2 |
| End-to-End (MINTL)** | | | | 68.1 | 69.0 | 71.4 | 65.4 | 15.7/20.7 |
| End-to-End (UBAR)** | | | | 74.3 | 76.0 | 79.8 | 80.8 | 14.2/18.1 |
| End-to-End (CRR)** | | | | 72.6 | 76.0 | 78.2 | 82.2 | 13.6/17.9 |
| End-to-End (Decision Transformer)** | | | | 75.3 | 77.0 | 81.3 | 83.5 | 14.8/18.0 |
| End-to-End (GPT-Critic)** | | | | 77.7 | 79.0 | 84.3 | 85.4 | 16.3/19.4 |
| End-to-End (AUGPT (DSTC9 track 2))* | | | | 60 | 70.2 | 89.3 | 86 | 12.7/13.9 |
| BERT | Rule | STORL | Template | 84.7 | 86.9 | 92.8 | 92.1 | 11.5/12.3 |
| BERT | Rule | STORL (Teacher) | Template | 83.3 | 88.6 | 92.0 | 91.8 | 11.4/12.3 |
| Perfect | Rule | STORL | Template | 93.0 | 89.5 | 96.1 | 97.7 | 11.6/12.0 |
| Perfect | Rule | STORL (Teacher) | Template | 92.6 | 91.2 | 96.0 | 98.0 | 11.6/12.0 |