Table 3

End-to-end system evaluation results of the proposed student-teacher offline RL (STORL) compared with previous approaches. Bold numbers indicate the best score for each metric, excluding systems with perfect NLU. AUGPT [21] is a model that participated in DSTC9 track 2. * marks scores obtained by running the provided models. ** marks scores taken from the results reported in the GPT-Critic paper [17].

NLU	DST	Policy	NLG	Success Rate (%)	F1 Score	Complete Rate (%)	Booking Rate (%)	Average Turn (Succ/All)
BERT	Rule	PG*	Template	44.7	60.6	47.1	29.7	12.5/20.1
BERT	Rule	GDPL*	Template	47.2	64.6	50.0	26.8	11.9/19.3
BERT	Rule	PPO*	Template	61.2	68.2	64.7	62.4	13.0/18.1
BERT	Rule	PPO HITL*	Template	81.4	84.6	86.2	88.4	11.3/12.4
BERT	Rule	Rule	Template	83.8	86.2	92.7	91.5	11.4/11.9
End-to-End (DAMD)*				34.2	56.9	39.6	52.0	15.6/30.2
End-to-End (MINTL)**				68.1	69.0	71.4	65.4	15.7/20.7
End-to-End (UBAR)**				74.3	76.0	79.8	80.8	14.2/18.1
End-to-End (CRR)**				72.6	76.0	78.2	82.2	13.6/17.9
End-to-End (Decision Transformer)**				75.3	77.0	81.3	83.5	14.8/18.0
End-to-End (GPT-Critic)**				77.7	79.0	84.3	85.4	16.3/19.4
End-to-End (AUGPT (DSTC9 track 2))*				60	70.2	89.3	86	12.7/13.9
BERT	Rule	STORL	Template	84.7	86.9	92.8	92.1	11.5/12.3
BERT	Rule	STORL (Teacher)	Template	83.3	88.6	92.0	91.8	11.4/12.3
Perfect	Rule	STORL	Template	93.0	89.5	96.1	97.7	11.6/12.0
Perfect	Rule	STORL (Teacher)	Template	92.6	91.2	96.0	98.0	11.6/12.0
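
The caption's rule for picking the best score per metric (excluding systems with perfect NLU) can be checked mechanically. The following is a minimal sketch, not part of the original work: the tuple layout and the names `ROWS`, `METRICS`, and `best_per_metric` are illustrative assumptions. It assumes that higher is better for the four rate and score metrics, and it leaves out Average Turn because the caption does not state which direction counts as best for that column.

```python
# Rows copied from Table 3: (policy/system, perfect_nlu, success, f1, complete, booking).
# The layout and all names here are illustrative assumptions, not the authors' code.
ROWS = [
    ("PG", False, 44.7, 60.6, 47.1, 29.7),
    ("GDPL", False, 47.2, 64.6, 50.0, 26.8),
    ("PPO", False, 61.2, 68.2, 64.7, 62.4),
    ("PPO HITL", False, 81.4, 84.6, 86.2, 88.4),
    ("Rule", False, 83.8, 86.2, 92.7, 91.5),
    ("DAMD", False, 34.2, 56.9, 39.6, 52.0),
    ("MINTL", False, 68.1, 69.0, 71.4, 65.4),
    ("UBAR", False, 74.3, 76.0, 79.8, 80.8),
    ("CRR", False, 72.6, 76.0, 78.2, 82.2),
    ("Decision Transformer", False, 75.3, 77.0, 81.3, 83.5),
    ("GPT-Critic", False, 77.7, 79.0, 84.3, 85.4),
    ("AUGPT", False, 60, 70.2, 89.3, 86),
    ("STORL", False, 84.7, 86.9, 92.8, 92.1),
    ("STORL (Teacher)", False, 83.3, 88.6, 92.0, 91.8),
    ("STORL", True, 93.0, 89.5, 96.1, 97.7),
    ("STORL (Teacher)", True, 92.6, 91.2, 96.0, 98.0),
]

# Metric names in the same order as the numeric columns of each row.
METRICS = ("success", "f1", "complete", "booking")


def best_per_metric(rows):
    """Return {metric: (system, score)}, ignoring rows with perfect NLU."""
    candidates = [row for row in rows if not row[1]]
    best = {}
    for column, metric in enumerate(METRICS, start=2):
        # Assumption: a higher value is better for every metric listed here.
        winner = max(candidates, key=lambda row: row[column])
        best[metric] = (winner[0], winner[column])
    return best


if __name__ == "__main__":
    for metric, (system, score) in best_per_metric(ROWS).items():
        print(f"{metric}: {system} ({score})")
```

Under these assumptions, the sketch selects STORL for Success Rate, Complete Rate, and Booking Rate, and STORL (Teacher) for F1 Score.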

