Table 3

End-to-end system evaluation results of the proposed student-teacher offline RL (STORL) compared with previous approaches. Bold numbers indicate the best score for each metric, excluding systems with perfect NLU. AUGPT [21] is a model that participated in DSTC9 track 2. * indicates scores obtained by running the provided models. ** indicates scores taken from the results reported in the GPT-Critic paper [17].

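| Configuration | | | | Success Rate (%) | F1 Score | Complete Rate (%) | Booking Rate (%) | Average Turn (Succ/All) |
|---|---|---|---|---|---|---|---|---|
| NLU | DST | Policy | NLG | | | | | |
| BERT | Rule | PG* | Template | 44.7 | 60.6 | 47.1 | 29.7 | 12.5/20.1 |
| BERT | Rule | GDPL* | Template | 47.2 | 64.6 | 50.0 | 26.8 | 11.9/19.3 |
| BERT | Rule | PPO* | Template | 61.2 | 68.2 | 64.7 | 62.4 | 13.0/18.1 |
| BERT | Rule | PPO HITL* | Template | 81.4 | 84.6 | 86.2 | 88.4 | 11.3/12.4 |
| BERT | Rule | Rule | Template | 83.8 | 86.2 | 92.7 | 91.5 | 11.4/11.9 |
| End-to-End (DAMD)* | | | | 34.2 | 56.9 | 39.6 | 52.0 | 15.6/30.2 |
| End-to-End (MINTL)** | | | | 68.1 | 69.0 | 71.4 | 65.4 | 15.7/20.7 |
| End-to-End (UBAR)** | | | | 74.3 | 76.0 | 79.8 | 80.8 | 14.2/18.1 |
| End-to-End (CRR)** | | | | 72.6 | 76.0 | 78.2 | 82.2 | 13.6/17.9 |
| End-to-End (Decision Transformer)** | | | | 75.3 | 77.0 | 81.3 | 83.5 | 14.8/18.0 |
| End-to-End (GPT-Critic)** | | | | 77.7 | 79.0 | 84.3 | 85.4 | 16.3/19.4 |
| End-to-End (AUGPT (DSTC9 track 2))* | | | | 60 | 70.2 | 89.3 | 86 | 12.7/13.9 |
| BERT | Rule | STORL | Template | 84.7 | 86.9 | 92.8 | 92.1 | 11.5/12.3 |
| BERT | Rule | STORL (Teacher) | Template | 83.3 | 88.6 | 92.0 | 91.8 | 11.4/12.3 |
| Perfect | Rule | STORL | Template | 93.0 | 89.5 | 96.1 | 97.7 | 11.6/12.0 |
| Perfect | Rule | STORL (Teacher) | Template | 92.6 | 91.2 | 96.0 | 98.0 | 11.6/12.0 |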
