We present Hume, a dual-system vision-language-action model that explores human-like thinking capabilities for dexterous robot control. Equipped with value-guided System-2 thinking and cascaded action denoising, Hume attains strong complex reasoning and control capabilities, achieving state-of-the-art performance across a diverse range of evaluations and significant improvements on complex robot control tasks.
The pipeline of Hume. Hume contains two systems working asynchronously. Given an observation, System 2 first generates \(N\) candidate action chunks at different noise levels; the best-of-N candidate with the highest \(Q\) value is selected as the optimal candidate \(\mathbf{A}_{t}^{\tau^*}\), which is segmented and passed to System 1 for continuous action denoising.
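The following is a minimal sketch of the value-guided best-of-N selection described above. `system2`, `q_critic`, and `system1` are hypothetical stand-ins for Hume's modules; their names, signatures, the noise-level schedule, and the segment length are assumptions for illustration, not the released implementation.

```python
def select_and_refine(observation, system2, q_critic, system1,
                      noise_levels=(0.2, 0.4, 0.6, 0.8), chunk_len=8):
    """Generate N candidate action chunks at different noise levels,
    keep the one with the highest Q value, then hand its first segment
    to System 1 for continuous denoising."""
    candidates = []
    for tau in noise_levels:
        # System 2: sample a candidate action chunk A_t^tau at noise level tau.
        chunk = system2.sample_chunk(observation, noise_level=tau)
        # Score the (observation, chunk) pair with the learned value function.
        q_value = q_critic(observation, chunk)
        candidates.append((q_value, chunk))

    # Best-of-N: the candidate with the highest Q value becomes A_t^{tau*}.
    _, best_chunk = max(candidates, key=lambda c: c[0])

    # Segment the selected chunk and let System 1 denoise it further
    # before execution on the robot.
    segment = best_chunk[:chunk_len]
    return system1.denoise(observation, segment)
```

Because the two systems run asynchronously, a loop of this kind would in practice be invoked by System 2 at a lower frequency, while System 1 refines and emits actions at control rate.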
We evaluate Hume across 3 simulation environments and 3 different real-world robotic platforms, covering 15 robot learning scenarios and 21 real-world manipulation tasks.
put carrot on plate
put cup on white plate
put cup on pink cloth
put eggplant in basket
close microwave
lift red pepper
put banana in basket
put pot on cutting board
push handle aside
put penguin on toy car
put tiger on toy car
put blue cube on toy car
put green cube on toy car
put red cube on toy car
pass water
pour water
restock
fold shorts
When a failure occurs, such as missing the grasp position, other policies remain stuck in the failure state, whereas Hume selects a corrective action through value-guided thinking, recovering from the failure and successfully completing the task.
Put banana in basket
Put handle aside
Put tiger on toy car
Put blue cube on toy car
Put green cube on toy car
Put red cube on toy car
@article{song2025hume,
title={Hume: Introducing System-2 Thinking in Visual-Language-Action Model},
author={Anonymous Authors},
journal={arXiv preprint arXiv:2505.21432},
year={2025}
}