Table 4: Variables in DRL model | Emerald Publishing

Table 4

Variables in DRL model

Variable	Description	Equation
$S_{t}$	Current state of the environment at time $t$, including mold and crew utilization and job progress	Eq (8)
$a_{t}$	Selected action at decision point $t$, representing job allocation to mold plates	Eq (9)
$a_{u t}$	Valid action at decision point $t$	–
$Q_{t} (S_{t}, a_{t})$	Estimated Q-value of taking action $a_{t}$ in state $S_{t}$	Eq (12)
$Q_{u t}$	Refined estimated value after action masking	–
$s_{m}$	Information about the job currently being executed on mold plate $m$	Eq (8)
$O_{m}$	One-hot encoding: each type of PC job has a unique vector that identifies which type of job is currently being executed on mold plate $m$	Eq (8)
$P$	Order completion progress, $P \in R$: the current percentage of jobs completed or being executed for a specific type of PC order, consisting of the amount percentage and relative due dates	Eq (8)
$I$	Job completion progress: the completion progress of the various processes for jobs currently being executed	Eq (8)
$\hat{A}$	Action space	Eq (11)
$n_{j}$	Selection of $n$ orders from the $j^{th}$ type of PC jobs	Eq (9)
$ε$	Coefficient of the $ε$-greedy policy	Eq (12)
$φ (S_{t}, a_{t})$	Action masking function at decision point $t$	Eq (13), Eq (14)
$\hat{A_{φ}} (S_{t})$	Masked action space at decision point $t$	–
$RW$	Reward received after taking an action	Eq (14) - Eq (17)

Source(s): Authors’ own work
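
The table's masking function $φ (S_{t}, a_{t})$, masked action space $\hat{A_{φ}} (S_{t})$, Q-values and $ε$ could interact roughly as in the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the equations referenced in the table are not reproduced here, so the mask is assumed to be a list of booleans (True for a valid action), the Q-values are assumed to be a plain list indexed by action, and the function name `masked_epsilon_greedy` is hypothetical.

```python
import random
from typing import Optional, Sequence


def masked_epsilon_greedy(
    q_values: Sequence[float],
    mask: Sequence[bool],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> int:
    """Return an action index drawn only from the masked (valid) action space.

    q_values: assumed Q(S_t, a) estimate for every action a in the action space.
    mask:     assumed output of the masking function, True where a is valid.
    epsilon:  exploration probability of the epsilon-greedy policy.
    """
    if len(q_values) != len(mask):
        raise ValueError("q_values and mask must have the same length")
    if rng is None:
        rng = random.Random()

    # Masked action space: indices of the actions the mask allows.
    valid = [i for i, allowed in enumerate(mask) if allowed]
    if not valid:
        raise ValueError("no valid action at this decision point")

    # Explore: with probability epsilon, pick a random valid action.
    if rng.random() < epsilon:
        return rng.choice(valid)

    # Exploit: pick the valid action with the highest Q-value.
    return max(valid, key=lambda i: q_values[i])
```

With `epsilon = 0` the call is purely greedy over the valid actions, so an invalid action is never selected even when its raw Q-value is the largest. Filtering indices before taking the maximum is one of several equivalent designs; another common choice is to overwrite the Q-values of invalid actions with a large negative number.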