Figure 2

A diagram illustrates a process for dialogue policy learning using a teacher-student framework, divided into three stages: Teacher Model Training, Student-Teacher Learning, and Final Dialogue Policy Model.

Student-teacher offline reinforcement learning process. The teacher model is first trained on clean data. Student-teacher learning follows, using both clean and noisy data with the teacher model frozen. The trained student model serves as the final dialogue policy model, with the student classifier parameter ϕ_cls copied directly from the teacher classifier parameter θ_cls.
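The three stages in the caption can be sketched in a few lines of plain Python. This is only an illustrative toy under stated assumptions, not the authors' implementation: the linear encoder, the squared-error feature-matching loss, the Gaussian input noise, and every name (`encode`, `feature_loss`, `distill_step`, `teacher`, `student`) are hypothetical. The caption itself states only that the teacher is frozen during student-teacher learning, that the student sees clean and noisy data, and that ϕ_cls is copied from θ_cls.

```python
import copy
import random


def encode(enc, x):
    # Linear encoder: one output per weight row (toy stand-in for a real encoder).
    return [sum(w * xi for w, xi in zip(row, x)) for row in enc]


def feature_loss(student_enc, teacher_enc, x_clean, x_input):
    # Assumed objective: squared distance between the student's encoding of
    # x_input and the teacher's encoding of the clean version.
    target = encode(teacher_enc, x_clean)
    pred = encode(student_enc, x_input)
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def distill_step(student_enc, teacher_enc, x_clean, x_input, lr):
    # One SGD step on feature_loss. Only the student encoder is updated;
    # the teacher is read but never modified (it is "frozen").
    target = encode(teacher_enc, x_clean)
    pred = encode(student_enc, x_input)
    for i, (p, t) in enumerate(zip(pred, target)):
        err = p - t
        for j, xj in enumerate(x_input):
            student_enc[i][j] -= lr * 2 * err * xj


random.seed(0)

# Stage 1: teacher model. Fixed weights stand in for a teacher already
# trained on clean data (theta_enc, theta_cls).
teacher = {"enc": [[1.0, 0.0], [0.0, 1.0]], "cls": [[0.5, -0.5]]}
teacher_snapshot = copy.deepcopy(teacher)

# Stage 2: student-teacher learning on clean and noisy inputs.
student = {"enc": [[0.0, 0.0], [0.0, 0.0]], "cls": None}
clean_data = [[1.0, 2.0], [2.0, -1.0], [-1.0, 0.5]]
probe = clean_data[0]
loss_before = feature_loss(student["enc"], teacher["enc"], probe, probe)

for _ in range(200):
    for x in clean_data:
        noisy = [xi + random.gauss(0.0, 0.1) for xi in x]  # assumed noise model
        distill_step(student["enc"], teacher["enc"], x, x, lr=0.05)
        distill_step(student["enc"], teacher["enc"], x, noisy, lr=0.05)

loss_after = feature_loss(student["enc"], teacher["enc"], probe, probe)

# Stage 3: final policy model. phi_cls is copied directly from theta_cls.
student["cls"] = copy.deepcopy(teacher["cls"])
```

In this toy, the student's classifier is never trained; only its encoder is pulled toward the frozen teacher's features, which is one plausible reading of why the classifier parameters can be copied over unchanged.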
