Table 4

Variables in DRL model

| Variable | Description | Equation |
| --- | --- | --- |
| S_t | Current state of the environment at time t, including mold and crew utilization and job progress | Eq (8) |
| a_t | Action selected at decision point t, representing the allocation of jobs to mold plates | Eq (9) |
| a^u_t | Valid action at decision point t | |
| Q_t(S_t, a_t) | Estimated Q-value of taking action a_t in state S_t | Eq (12) |
| Q^u_t | Refined estimated value after action masking | |
| s_m | Information about the job currently being executed on mold plate m | Eq (8) |
| O_m | One-hot encoding: each type of PC job is assigned a unique vector that identifies which type of job is currently being executed on mold plate m | Eq (8) |
| P | Order completion progress (PR): the current percentage of jobs completed or being executed for a specific type of PC order; consists of the amount percentage and the relative due dates | Eq (8) |
| I | Job completion progress: the completion progress of the various processes for the jobs currently being executed | Eq (8) |
| Â | Action space | Eq (11) |
| n_j | Selection of n orders from the jth type of PC jobs | Eq (9) |
| ε | Coefficient of the ε-greedy policy | Eq (12) |
| φ(S_t, a_t) | Action masking function at decision point t | Eq (14), Eq (13) |
| Â_φ(S_t) | Masked action space at decision point t | |
| RW | Reward received after taking an action | Eq (14) - Eq (17) |
Source(s): Authors’ own work
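
To illustrate how the masking function φ(S_t, a_t), the masked action space Â_φ(S_t), the refined value Q^u_t and the ε-greedy coefficient ε in Table 4 could interact, a minimal Python sketch is given below. It is not the authors' implementation: the function name `masked_epsilon_greedy`, the boolean-mask representation of φ and the use of negative infinity for invalid actions are assumptions made for illustration only, and the exact forms are those of the equations referenced in the table.

```python
import numpy as np


def masked_epsilon_greedy(q_values, mask, epsilon, rng):
    """Return the index of a valid action chosen by an epsilon-greedy rule.

    q_values : estimated Q-values, one per action in the action space
    mask     : boolean array, True where the action is valid (assumed
               encoding of the masking function phi)
    epsilon  : exploration probability of the epsilon-greedy policy
    rng      : numpy.random.Generator used for sampling
    """
    q_values = np.asarray(q_values, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("no valid action at this decision point")

    # Refined values after masking (assumption): invalid actions are set
    # to -inf so that the greedy step can never select them.
    masked_q = np.where(mask, q_values, -np.inf)

    if rng.random() < epsilon:
        # Explore: sample uniformly from the masked action space.
        return int(rng.choice(np.flatnonzero(mask)))
    # Exploit: greedy action over the masked values.
    return int(np.argmax(masked_q))


rng = np.random.default_rng(0)
q = [5.0, 1.0, 3.0]
valid = [False, True, True]  # action 0 is masked out

greedy_action = masked_epsilon_greedy(q, valid, epsilon=0.0, rng=rng)
print(greedy_action)  # 2: the best valid action, although action 0 has the highest Q-value
```

With ε = 0 the rule is purely greedy over the valid actions; with ε = 1 every draw is a uniform sample from the valid actions only, so a masked action is never returned in either case.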
