This paper addresses the challenges of high dimensionality, strong nonlinearity and tight coupling in motion control of the Kuavo humanoid robot by introducing a novel method based on inverse reinforcement learning (IRL).
To overcome the reliance of traditional approaches on precise dynamic models and manually designed controllers, the authors use IRL to learn reward functions and policies from limited expert demonstrations. An action re-targeting technique maps human expert motion data to the Kuavo’s action space, generating initial motion references. The IRL framework uses these demonstrations to learn implicit reward functions and additionally conditions the policy on velocity targets, thereby formulating motion control as a velocity-conditioned Markov Decision Process to improve adaptability.
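The two ideas in this paragraph, conditioning the policy on a velocity target and recovering a reward from demonstrations, can be illustrated with a minimal sketch. It is not the authors' implementation: the linear reward, the use of the raw observation as the feature vector, the feature-matching update rule, and all function names (`conditioned_observation`, `mean_features`, `reward`, `irl_weight_update`) are assumptions made here for illustration only.

```python
from typing import List, Sequence

Vector = List[float]


def conditioned_observation(state: Sequence[float],
                            target_velocity: Sequence[float]) -> Vector:
    """Append the commanded velocity to the robot state, so that a single
    policy (and reward) can depend on the requested speed."""
    return list(state) + list(target_velocity)


def mean_features(trajectories: Sequence[Sequence[Sequence[float]]]) -> Vector:
    """Average feature vector over every step of every trajectory.
    Assumption: the observation itself is used as the feature vector."""
    steps = [step for traj in trajectories for step in traj]
    dim = len(steps[0])
    return [sum(step[i] for step in steps) / len(steps) for i in range(dim)]


def reward(weights: Sequence[float], features: Sequence[float]) -> float:
    """Hypothetical linear reward r = w . phi."""
    return sum(w * f for w, f in zip(weights, features))


def irl_weight_update(weights: Sequence[float],
                      expert_trajs: Sequence[Sequence[Sequence[float]]],
                      policy_trajs: Sequence[Sequence[Sequence[float]]],
                      lr: float = 0.1) -> Vector:
    """One feature-matching step: shift the reward weights toward the
    expert's average features and away from the current policy's."""
    mu_expert = mean_features(expert_trajs)
    mu_policy = mean_features(policy_trajs)
    return [w + lr * (e - p) for w, e, p in zip(weights, mu_expert, mu_policy)]


# Toy data: 2-D states, 1-D velocity command of 0.5 (all values invented).
expert = [[conditioned_observation([0.0, 1.0], [0.5]),
           conditioned_observation([0.1, 1.0], [0.5])]]
policy = [[conditioned_observation([0.4, 0.6], [0.5]),
           conditioned_observation([0.5, 0.6], [0.5])]]

weights = irl_weight_update([0.0, 0.0, 0.0], expert, policy)
```

After one update on this toy data, the learned reward scores the expert's steps higher than the policy's. In a full IRL loop, the policy would then be re-optimized against the updated reward and the two steps alternated.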
Experimental results demonstrate that this method effectively recovers reward functions, generates natural and stable motion control policies and improves the robot’s adaptability across different speeds and environments. The Kuavo robot achieves real-time motion adjustments according to speed commands, ensuring rapid response and stable control during variations in speed.
This study represents a pioneering application of IRL to humanoid robot motion control, particularly for the Kuavo robot. By leveraging limited expert demonstrations and integrating velocity-conditioned policies, the proposed method facilitates autonomous acquisition of diverse motion skills and adaptation to various tasks and environmental conditions, marking a significant advancement over traditional methods.
