Hyperparameters of H-PPO
| Hyperparameter | Value | Description |
|---|---|---|
| Discount factor (γ) | 0.97 | Discount applied to future rewards |
| Minibatch size | 16 × 1,024 | Number of samples used per stochastic gradient descent update |
| Learning rate | 0.005 | The learning rate used by Adam |
| GAE parameter (λ) | 0.97 | Discounting factor for advantage function estimation |
| Clip parameter | 0.2 | Clipping range of the PPO probability ratio |
| Entropy coeff. | 0.01 | Weight of the entropy term in the loss |
| VF coeff. | 0.6 | Weight of the value function term in the loss |
| Hidden layer 1 nodes | 256 | Number of nodes in the 1st hidden layer |
| Hidden layer 2 nodes | 256 | Number of nodes in the 2nd hidden layer |
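
To make the role of these values concrete, the following is a minimal sketch of how they might be collected in a configuration object and used in two standard PPO building blocks: generalized advantage estimation (which uses the discount factor and λ) and ratio clipping (which uses the clip parameter). The names `HPPOConfig`, `gae_advantages`, and `clipped_ratio` are hypothetical and not taken from the H-PPO implementation; treating `16 × 1,024` as a single minibatch of 16,384 samples is also an assumption, and the sketch ignores episode boundaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HPPOConfig:
    """Hypothetical container for the hyperparameters listed in the table."""
    gamma: float = 0.97               # discount factor
    minibatch_size: int = 16 * 1024   # assumption: read as one size of 16,384
    learning_rate: float = 0.005      # learning rate used by Adam
    gae_lambda: float = 0.97          # GAE parameter (lambda)
    clip_range: float = 0.2           # clip parameter
    entropy_coeff: float = 0.01       # entropy coefficient
    vf_coeff: float = 0.6             # value function coefficient
    hidden_sizes: tuple = (256, 256)  # nodes in hidden layers 1 and 2


def gae_advantages(rewards, values, last_value, cfg):
    """Standard GAE over one trajectory without terminal states.

    values[t] is the value estimate for step t; last_value is the
    bootstrap estimate for the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + cfg.gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + cfg.gamma * cfg.gae_lambda * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_ratio(ratio, cfg):
    """Clamp the new/old policy probability ratio to [1 - clip, 1 + clip]."""
    return max(1.0 - cfg.clip_range, min(1.0 + cfg.clip_range, ratio))
```

With both γ and λ at 0.97, each step further into the future is weighted by 0.97 × 0.97 ≈ 0.94 in the advantage estimate, so the effective horizon is fairly short.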