Table 1

Hyperparameters of H-PPO

| Hyperparameter | Value | Description |
| Discount factor | 0.97 | Discount applied to future rewards |
| Minibatch size | 16 × 1,024 | Number of samples used per stochastic gradient descent update |
| Learning rate | 0.005 | The learning rate used by Adam |
| GAE parameter (λ) | 0.97 | Advantage function estimation discounting factor |
| Clip parameter | 0.2 | Clipping range |
| Entropy coeff. | 0.01 | The coefficient of the entropy term |
| VF coeff. | 0.6 | The coefficient of the value function term |
| Hidden layer 1 nodes | 256 | Number of nodes in the 1st hidden layer |
| Hidden layer 2 nodes | 256 | Number of nodes in the 2nd hidden layer |
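
As an illustration only, the values in Table 1 could be collected into a single configuration object as sketched below. The class name `HPPOConfig` and all field names are hypothetical and do not come from the original work. Reading the minibatch size "16 × 1,024" as a product (16,384 samples) is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HPPOConfig:
    """Hyperparameter values from Table 1 (names are illustrative, not the authors')."""

    discount_factor: float = 0.97        # discount on future rewards
    minibatch_size: int = 16 * 1024      # assumes "16 x 1,024" denotes a product
    learning_rate: float = 0.005         # learning rate used by Adam
    gae_lambda: float = 0.97             # GAE parameter (lambda)
    clip_param: float = 0.2              # clipping range
    entropy_coeff: float = 0.01          # coefficient of the entropy term
    vf_coeff: float = 0.6                # coefficient of the value function term
    hidden_sizes: Tuple[int, int] = (256, 256)  # nodes in hidden layers 1 and 2


config = HPPOConfig()
print(config)
```

Freezing the dataclass keeps the values immutable during a run, which makes it harder to change a hyperparameter by accident partway through training.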
