A goal-conditioned policy search method with multi-timescale value function tuning

Jiang, Zhihong; Hu, Jiachen; Zhao, Yan; Huang, Xiao; Li, Hui

doi:10.1108/RIA-11-2023-0167

Research Article | June 11 2024

Zhihong Jiang

School of Mechatronical Engineering, Beijing Institute of Technology, Beijing, China

Jiachen Hu

Beijing Institute of Technology, Beijing, China

Yan Zhao

Beijing Institute of Technology, Beijing, China

Xiao Huang

Beijing Institute of Technology, Beijing, China

Xiao Huang can be contacted at: huangxiao@bit.edu.cn

Hui Li

School of Mechatronical Engineering, Beijing Institute of Technology, Beijing, China

Author & Article Information

Publisher: Emerald Publishing

Received: November 21 2023

Revision Received: January 28 2024

Revision Received: March 02 2024

Accepted: March 02 2024

Online ISSN: 2754-6977

Print ISSN: 2754-6969

© 2024 Emerald Publishing Limited. Licensed re-use rights only.

Robotic Intelligence and Automation (2024) 44 (4): 549–559.

https://doi.org/10.1108/RIA-11-2023-0167

Purpose

Current reinforcement learning (RL) algorithms suffer from low learning efficiency and poor generalization, which significantly limits their practical application on real robots. This paper adopts a hybrid model-based and model-free policy search method with multi-timescale value function tuning so that robots can learn complex motion planning skills in multi-goal, multi-constraint environments from only a few interactions.

Design/methodology/approach

This paper proposes a goal-conditioned policy search method that combines model-based and model-free learning with multi-timescale value function tuning. First, the authors construct a multi-goal, multi-constrained policy optimization approach that fuses model-based policy optimization with goal-conditioned, model-free learning. Soft constraints on states and controls are applied to ensure fast and stable policy iteration. Second, an uncertainty-aware multi-timescale value function learning method is proposed, which constructs a multi-timescale value function network and adaptively chooses the value function planning timescale according to the value prediction uncertainty. This implicitly reduces the value representation complexity and improves the generalization performance of the policy.
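
The abstract does not say how the planning timescale is chosen from the value prediction uncertainty. The following is only an illustrative sketch of one plausible rule, not the authors' implementation. It assumes an ensemble of value predictors, uses the ensemble's standard deviation as the uncertainty estimate, and picks the longest timescale whose uncertainty stays under a tolerance. The function name `select_timescale`, the parameter `max_std`, and the fallback to the least uncertain timescale are all hypothetical.

```python
from statistics import mean, pstdev


def select_timescale(value_preds, max_std):
    """Choose a planning timescale from ensemble value predictions.

    value_preds[m][k] is ensemble member m's value estimate at timescale k,
    with timescales ordered from shortest to longest (an assumed layout).
    max_std is a hypothetical tolerance on ensemble disagreement.
    Returns (timescale index, mean value, standard deviation).
    """
    n_timescales = len(value_preds[0])
    # Regroup the predictions so that each column holds one timescale.
    columns = [[member[k] for member in value_preds] for k in range(n_timescales)]
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # disagreement as uncertainty proxy

    trusted = [k for k, s in enumerate(stds) if s <= max_std]
    if trusted:
        k = trusted[-1]  # assumed rule: longest horizon that is still trusted
    else:
        k = min(range(n_timescales), key=stds.__getitem__)  # least uncertain
    return k, means[k], stds[k]


# Three ensemble members, three timescales (made-up numbers for illustration).
preds = [
    [1.0, 5.0, 20.0],
    [1.1, 5.2, 35.0],
    [0.9, 4.8, 5.0],
]
index, value, spread = select_timescale(preds, max_std=0.5)
print(index, round(value, 3), round(spread, 3))
```

In this toy input the ensemble disagrees strongly at the longest timescale, so the rule settles on the middle one. Other selection rules, such as weighting all timescales by inverse variance, would fit the same description.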

Findings

The algorithm enables physical robots to learn generalized skills in real-world environments through a handful of trials. The simulation and experimental results show that the algorithm outperforms other relevant model-based and model-free RL algorithms.

Originality/value

This paper combines goal-conditioned RL and the model predictive path integral method into a unified model-based policy search framework, which improves the learning efficiency and policy optimality of motor skill learning in multi-goal and multi-constrained environments. An uncertainty-aware multi-timescale value function learning and selection method is proposed to overcome long-horizon problems, improve optimal policy resolution and therefore enhance the generalization ability of goal-conditioned RL.
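
For readers unfamiliar with the model predictive path integral (MPPI) method, the sketch below shows one generic MPPI update with soft constraints on states and controls. It is a minimal illustration under stated assumptions, not the paper's algorithm: the one-dimensional dynamics `x' = x + u * dt`, the quadratic goal cost, the limits `u_max` and `x_max`, the penalty weight and all function names are hypothetical, and the learned goal-conditioned value function that the paper combines with MPPI is omitted.

```python
import math
import random


def soft_penalty(value, limit, weight):
    """Quadratic penalty that is zero while |value| stays within the limit."""
    excess = max(0.0, abs(value) - limit)
    return weight * excess * excess


def rollout_cost(x0, goal, controls, dt=0.1, u_max=1.0, x_max=2.0):
    """Cost of a control sequence under the assumed model x' = x + u * dt."""
    x, cost = x0, 0.0
    for u in controls:
        x = x + u * dt
        cost += (x - goal) ** 2  # distance to the goal state
        # Soft constraints: violations are penalized rather than forbidden.
        cost += soft_penalty(u, u_max, 10.0) + soft_penalty(x, x_max, 10.0)
    return cost


def mppi_update(x0, goal, nominal, rng, n_samples=256, sigma=0.5, lam=1.0):
    """One MPPI iteration: perturb, score and reweight the nominal controls."""
    noises = [[rng.gauss(0.0, sigma) for _ in nominal] for _ in range(n_samples)]
    costs = [
        rollout_cost(x0, goal, [u + e for u, e in zip(nominal, eps)])
        for eps in noises
    ]
    best = min(costs)  # subtracting the minimum keeps the exponentials stable
    weights = [math.exp(-(c - best) / lam) for c in costs]
    total = sum(weights)
    # New plan = nominal plan plus the cost-weighted average perturbation.
    return [
        u + sum(w * eps[t] for w, eps in zip(weights, noises)) / total
        for t, u in enumerate(nominal)
    ]


rng = random.Random(0)
plan = [0.0] * 10
for _ in range(20):
    plan = mppi_update(x0=0.0, goal=1.0, nominal=plan, rng=rng)
print(rollout_cost(0.0, 1.0, plan))
```

Because the constraints enter only as penalty terms, every sampled rollout receives a finite cost and contributes to the weighted average, which is one common motivation for soft rather than hard constraints in sampling-based planners.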
