Purpose

This study aims to realize long-horizon robotic manipulation guided by implicit instructions that convey real intent through metaphors, emotional expressions and other indirect means.

Design/methodology/approach

First, this study proposed shared attributes to enhance the ability of visual-language models (VLMs) to infer explicit intentions from implicit instructions. Specifically, the VLM was fine-tuned with these shared attributes added. The shared attributes are the highest-similarity attributes among those extracted from images and explicit instructions, and they bridge intrinsic cross-modal semantic mappings between implicit expressions and explicit intentions. Because relevant data were lacking, an implicit instruction-based data set was constructed for fine-tuning the VLMs. Then, a hierarchical learning strategy was introduced to map explicit instructions to robotic controller parameters through a planning module, a sequencing module and an interaction learning module.
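
The shared-attribute selection described above can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the embedding model or the number of attributes kept, so cosine similarity, the toy three-dimensional vectors, the attribute names and the `select_shared_attributes` helper below are all assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_shared_attributes(
    image_attrs: Dict[str, Vector],
    instruction_attrs: Dict[str, Vector],
    top_k: int = 2,
) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: score every (image attribute, instruction attribute)
    pair and keep the top_k most similar pairs as the shared attributes."""
    scored = [
        (img_name, ins_name, cosine(img_vec, ins_vec))
        for img_name, img_vec in image_attrs.items()
        for ins_name, ins_vec in instruction_attrs.items()
    ]
    scored.sort(key=lambda pair: pair[2], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumed values); a real system would obtain these from an encoder.
image_attrs = {
    "cold": [1.0, 0.0, 0.0],
    "red": [0.0, 1.0, 0.0],
    "liquid": [0.0, 0.0, 1.0],
}
instruction_attrs = {
    "chilled": [0.9, 0.1, 0.0],
    "drinkable": [0.0, 0.3, 0.8],
}

shared = select_shared_attributes(image_attrs, instruction_attrs, top_k=2)
for img_name, ins_name, score in shared:
    print(f"{img_name} <-> {ins_name}: {score:.3f}")
```

In this sketch the selected pairs would then be attached to the fine-tuning examples; how the attributes are actually injected into the VLM is not stated in the abstract and is left out here.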

Findings

In experiments, the fine-tuned VLM achieved state-of-the-art performance on both the data set constructed in this study and the public VAGUE benchmark, and ten implicit-instruction-guided robotic manipulation tasks were executed successfully in simulation and eight in the real world.

Originality/value

This work integrates implicit instructions into robot manipulation. To the best of the authors’ knowledge, this is the first study to introduce long-horizon robotic manipulation guided by implicit instructions. The authors propose fine-tuning the VLM with added shared attributes, bridging intrinsic cross-modal semantic mappings between implicit expressions and explicit intentions. This study further introduces a hierarchical learning strategy to enable efficient transformation from implicit instructions to executable operations. This approach provides new perspectives on language-conditioned robotic manipulation and has the potential to be extended to a wide range of human-centered manipulation tasks. The video can be found here.
