Figure 1

A diagram showing the interaction between an agent and environment with actions, states, and rewards labeled.

On the left is a rectangular box labeled “Agent Policy pi.” From the top of this “Agent” box, an arrow labeled “Action a subscript t” extends horizontally to the right, pointing toward the top of a second rectangular box labeled “Environment.” Below this main action flow, two feedback loops run back from the “Environment” to the “Agent.” In the first, a horizontal arrow labeled “Reward r subscript t plus 1” extends leftwards from the bottom of the “Environment” box to a vertical dashed line at the mid-bottom; from this line, an arrow labeled “Reward r subscript t” connects to the bottom of the “Agent” box. In the second, a horizontal arrow labeled “State s subscript t plus 1” likewise extends leftwards from the bottom of the “Environment” box to the dashed line; from this line, an arrow labeled “State s subscript t” connects to the bottom of the “Agent” box.

DRL framework. Source: Authors’ own work (2024)
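
The interaction loop in Figure 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the corridor environment `ToyEnvironment`, the fixed random `policy`, and the helper `run_episode` are hypothetical names introduced only to show how an action a_t produces a next state s_{t+1} and reward r_{t+1}, which then become the agent's inputs at the following step. No learning is performed here; a DRL method would replace the fixed policy with a neural network updated from the collected transitions.

```python
import random


class ToyEnvironment:
    """Hypothetical 1-D corridor with states 0..size-1; the goal is the last state."""

    def __init__(self, size=5):
        self.size = size
        self.state = 0

    def reset(self):
        # Start every episode at the left end of the corridor.
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the state is clamped to the corridor.
        self.state = max(0, min(self.size - 1, self.state + action))
        done = self.state == self.size - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def policy(state, rng):
    """Placeholder for the policy pi: move right with probability 0.8."""
    return 1 if rng.random() < 0.8 else -1


def run_episode(env, rng, max_steps=100):
    state = env.reset()  # s_0
    total_reward = 0.0
    trajectory = []
    for _ in range(max_steps):
        action = policy(state, rng)                  # a_t chosen from s_t
        next_state, reward, done = env.step(action)  # s_{t+1}, r_{t+1}
        trajectory.append((state, action, reward, next_state))
        total_reward += reward
        state = next_state  # s_{t+1} is the agent's s_t at the next step
        if done:
            break
    return trajectory, total_reward


if __name__ == "__main__":
    transitions, episode_return = run_episode(ToyEnvironment(), random.Random(0))
    print(len(transitions), episode_return)
```

Each stored tuple `(state, action, reward, next_state)` corresponds to one pass around the loop in the figure, with the dashed line marking the point at which the time index advances.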
