Implicit instruction reasoning by fine-tuning VLM for robotic manipulation

Cheng, Huakang; Li, Shaodong; Shuang, Feng; Gao, Fang

doi:10.1108/IR-05-2025-0175

Research Article | October 10 2025

Huakang Cheng
School of Electrical Engineering, Guangxi University, Nanning, China

Shaodong Li
School of Electrical Engineering, Guangxi University, Nanning, China
Corresponding author: Shaodong Li, lishaodongyx@126.com

Feng Shuang
School of Electrical Engineering, Guangxi University, Nanning, China

Fang Gao
School of Electrical Engineering, Guangxi University, Nanning, China

Author & Article Information

Publisher: Emerald Publishing

Received: May 17 2025

Revision Received: August 13 2025

Accepted: August 20 2025

Online ISSN: 1758-5791

Print ISSN: 0143-991X

Funding

- Natural Science Foundation of Guangxi-General Program (2025GXNSFAA069931)
- Natural Science Foundation of Guangxi-Young Scientists (2023GXNSFBA02606)

2025

Emerald Publishing Limited

Licensed re-use rights only

Industrial Robot (2026) 53 (2): 409–420.

https://doi.org/10.1108/IR-05-2025-0175

Purpose

This study aims to realize long-horizon robotic manipulation guided by implicit instructions that convey real intent through metaphors, emotional expressions and other indirect means.

Design/methodology/approach

First, this study proposes shared attributes to enhance the ability of visual-language models (VLMs) to infer explicit intentions from implicit instructions. Specifically, the VLM was fine-tuned with shared attributes added. These are the attributes with the highest similarity among those extracted from images and explicit instructions; they bridge the intrinsic cross-modal semantic mappings between implicit expressions and explicit intentions. Because relevant data were lacking, an implicit-instruction data set was constructed for fine-tuning the VLMs. Then, a hierarchical learning strategy was introduced to map explicit instructions to robotic controller parameters through a planning module, a sequencing module and an interaction learning module.
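As a rough illustration of the "highest similarity" selection described above, the sketch below pairs attributes extracted from an image with attributes extracted from an explicit instruction and keeps the best-matching pair. The abstract does not state which embedding model or similarity measure the authors use; cosine similarity, the function names `cosine` and `shared_attributes`, and the toy vectors are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def shared_attributes(image_attrs, text_attrs, top_k=1):
    """Return the top_k (image_attr, text_attr, score) pairs by similarity.

    image_attrs / text_attrs map an attribute name to its embedding vector.
    This is a hypothetical helper, not the paper's implementation.
    """
    pairs = [
        (img_name, txt_name, cosine(img_vec, txt_vec))
        for img_name, img_vec in image_attrs.items()
        for txt_name, txt_vec in text_attrs.items()
    ]
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs[:top_k]


# Toy, hand-made embeddings (hypothetical; not data from the paper).
image_attrs = {"drinkable": [0.9, 0.1, 0.0], "red": [0.0, 0.2, 0.9]}
text_attrs = {"quenches thirst": [0.8, 0.2, 0.1], "graspable": [0.1, 0.9, 0.1]}

best = shared_attributes(image_attrs, text_attrs, top_k=1)
print(best[0][:2])
```

In a real pipeline the vectors would come from a learned encoder, and the selected attributes would be added to the fine-tuning samples; how that is done is not specified in the abstract.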

Findings

In experiments, the fine-tuned VLM achieved state-of-the-art performance on both the data set constructed in this study and the public VAGUE benchmark, and it successfully executed ten implicit-instruction-guided robotic manipulation tasks in simulation and eight in the real world.

Originality/value

This work integrates implicit instructions into robotic manipulation. To the best of the authors’ knowledge, this is the first study to introduce long-horizon robotic manipulation guided by implicit instructions. The authors propose fine-tuning the VLM by adding shared attributes, bridging intrinsic cross-modal semantic mappings between implicit expressions and explicit intentions. This study further introduces a hierarchical learning strategy to enable efficient transformation from implicit instructions to executable operations. This approach provides new perspectives on language-conditioned robotic manipulation and has the potential to be extended to a wide range of human-centered manipulation tasks. The video can be found here.
