Figure 2
A diagram illustrates a process for dialogue policy learning using a teacher-student framework, divided into three stages: Teacher Model Training, Student-Teacher Learning, and Final Dialogue Policy Model.

Student-teacher offline reinforcement learning process. A teacher model is first trained on clean data. The frozen teacher model then guides student-teacher learning on both clean and noisy data. The trained student model serves as the final dialogue policy model, with the student classifier parameters ϕcls copied directly from the teacher classifier parameters θcls.
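
The three stages in the caption can be sketched in plain Python to make the parameter flow concrete. This is a toy illustration only, not the authors' implementation: the `ToyPolicy` class, the `step` method, the split into "encoder" and "cls" parameters, and all numeric values are hypothetical, and the choice to update only the student encoder in stage 2 is an assumption.

```python
import copy


class ToyPolicy:
    """Hypothetical stand-in for a policy network: an encoder plus a classifier head."""

    def __init__(self, encoder, cls):
        self.params = {"encoder": list(encoder), "cls": list(cls)}
        self.frozen = False

    def step(self, name, delta):
        """Apply a placeholder gradient step unless the model is frozen."""
        if self.frozen:
            return
        self.params[name] = [p + d for p, d in zip(self.params[name], delta)]


# Stage 1: train the teacher on clean data (the deltas are placeholder numbers).
teacher = ToyPolicy(encoder=[0.1, 0.2], cls=[0.5, -0.5])
teacher.step("encoder", [0.01, -0.02])
teacher.step("cls", [0.1, 0.1])
teacher.frozen = True  # the teacher stays fixed from here on
teacher_snapshot = copy.deepcopy(teacher.params)

# Stage 2: student-teacher learning on clean and noisy data.
# Only the student encoder is updated in this sketch (an assumption).
student = ToyPolicy(encoder=[0.0, 0.0], cls=[0.0, 0.0])
student.step("encoder", [0.05, 0.05])
teacher.step("encoder", [1.0, 1.0])  # ignored, because the teacher is frozen

# Stage 3: the final policy is the student with phi_cls copied from theta_cls.
student.params["cls"] = copy.deepcopy(teacher.params["cls"])
final_policy = student
```

The deep copy keeps the final policy's classifier values identical to the teacher's without sharing the underlying list, so later changes to one model would not leak into the other.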
