
A Mobile Robot Hand-Arm Teleoperation System by Vision and IMU

Shuang Li, Jiaxi Jiang, Philipp Ruppel, Hongzhuo Liang, Xiaojian Ma, Norman Hendrich, Fuchun Sun, Jianwei Zhang

Our goal is to build a mobile robotic hand-arm teleoperation system in which the teleoperator can move in an unrestricted workspace and perform natural hand motions for a series of manipulation tasks. In this paper (arXiv), we present a multimodal mobile teleoperation system that consists of a novel vision-based hand pose regression network (Transteleop) and an IMU-based arm tracking method. Evaluation results on a test dataset, together with a variety of complex manipulation tasks that go beyond simple pick-and-place operations, demonstrate the efficiency and stability of our multimodal teleoperation system.





Transteleop is a vision-based hand joint estimation network built on the image-to-image translation paradigm: it extracts pose features that are coherent between paired human and robot hand images. Given a depth image of the human hand, Transteleop estimates the joint angles of the robot hand and also generates a reconstructed image of the robot hand. A keypoint-based reconstruction loss exploits the resemblance in appearance and anatomy between the human and robot hands and enriches the local features of the reconstructed images. We trained Transteleop on the recently released TeachNet dataset of paired human-robot images.
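The description above can be sketched as an encoder-decoder with two output branches. This is a minimal PyTorch sketch, not the published architecture: the layer counts, channel widths, and latent size are assumptions; only the 96 × 96 depth input, the 19 joint outputs, the reconstructed-image branch, and the BN + ReLU pattern come from this page.

```python
import torch
import torch.nn as nn

# Hypothetical Transteleop-style network: an encoder maps a 96x96 human-hand
# depth image to a latent pose feature, a decoder reconstructs the paired
# robot-hand image, and a regression branch predicts the 19 Shadow-hand
# joint angles from the same latent code.
class TransteleopSketch(nn.Module):
    def __init__(self, n_joints=19, latent=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1),        # 96 -> 48
            nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),       # 48 -> 24
            nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, latent, 4, stride=2, padding=1),   # 24 -> 12
            nn.BatchNorm2d(latent), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent, 64, 4, stride=2, padding=1),  # 12 -> 24
            nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),      # 24 -> 48
            nn.BatchNorm2d(32), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),       # 48 -> 96
            nn.Tanh(),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(latent * 12 * 12, 256), nn.ReLU(),
            nn.Linear(256, n_joints),  # no activation after the last FC layer
        )

    def forward(self, depth):
        z = self.encoder(depth)
        return self.decoder(z), self.regressor(z)

net = TransteleopSketch()
recon, joints = net(torch.zeros(2, 1, 96, 96))
print(recon.shape, joints.shape)  # torch.Size([2, 1, 96, 96]) torch.Size([2, 19])
```

At test time only the encoder and the regression branch are needed to drive the robot; the decoder exists to shape the shared latent code during training.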



1. Robot: a PR2 robot with a 19-DoF Shadow hand mounted on its right arm.
2. Camera: an Intel RealSense SR300 depth sensor.
3. Camera holder: a wearable camera holder that enables simultaneous hand-arm control and keeps the whole teleoperation system mobile.
4. IMU setup: a Perception Neuron (PN) motion-capture device.


Network implementation details

The network was trained on 170K images for 100 epochs with a batch size of 32 and random jitter. The input depth images are cropped from the raw depth image as a fixed-size cube around the hand and resized to 96 × 96. We use the Adam optimizer with a learning rate of 0.002 and momentum parameters β1 = 0.5, β2 = 0.999. A batch normalization (BN) layer and a rectified linear unit (ReLU) follow each convolution layer; ReLU is also used after every FC layer except the last one.
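The crop-and-resize preprocessing can be sketched as follows. The hand-center location, the cube size in pixels, and the nearest-neighbor resize are all assumptions for illustration; the page only specifies a fixed-size cube around the hand and the 96 × 96 output.

```python
import numpy as np

# Hypothetical preprocessing sketch: crop a fixed-size square region around
# the hand from the raw depth image, then resize the crop to 96 x 96 with
# nearest-neighbor sampling.
def crop_and_resize(depth, center, cube_px=150, out=96):
    u, v = center                       # assumed hand-center pixel (row, col)
    h, w = depth.shape
    half = cube_px // 2
    r0, r1 = max(0, u - half), min(h, u + half)
    c0, c1 = max(0, v - half), min(w, v + half)
    crop = depth[r0:r1, c0:c1]
    # nearest-neighbor resize to out x out
    rows = (np.arange(out) * crop.shape[0] / out).astype(int)
    cols = (np.arange(out) * crop.shape[1] / out).astype(int)
    return crop[np.ix_(rows, cols)]

patch = crop_and_resize(np.random.rand(480, 640), center=(240, 320))
print(patch.shape)  # (96, 96)
```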

The encoder-decoder architecture consists of:
Joint regression:

Control implementation details

The arm's velocity controller runs at 20 Hz, and the hand's trajectory controller runs at 10 Hz. To simplify the experiments, we limit the trajectory control to a proper maximum force for each joint, and collision checking is performed only for the arm. The system was tested on an Alienware 15 machine with an Intel Core i7-4720HQ CPU.
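The two control rates can be served from a single fixed-rate loop, as in this sketch. The loop structure and the placeholder command calls are assumptions; only the 20 Hz arm and 10 Hz hand rates come from this page.

```python
import time

# Minimal fixed-rate loop sketch: arm velocity commands at 20 Hz, hand
# trajectory commands at 10 Hz (every second arm tick). The actual command
# calls are hypothetical placeholders.
def run_control(duration_s=0.5, arm_hz=20, hand_hz=10):
    dt = 1.0 / arm_hz
    ticks = int(duration_s * arm_hz)
    counts = {"arm": 0, "hand": 0}
    next_t = time.monotonic()
    for i in range(ticks):
        counts["arm"] += 1                  # send_arm_velocity(...) would go here
        if i % (arm_hz // hand_hz) == 0:
            counts["hand"] += 1             # send_hand_trajectory(...) would go here
        next_t += dt
        time.sleep(max(0.0, next_t - time.monotonic()))
    return counts

print(run_control())  # {'arm': 10, 'hand': 5}
```

In a ROS deployment the same scheduling would more naturally use two separate timers or `rospy.Rate` loops, one per controller.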

Robot experiments

The multimodal teleoperation approach was systematically evaluated on four types of physical tasks covering precision and power grasps, prehensile and non-prehensile manipulation, and a dual-arm handover task. One female and two male testers participated in the robotic experiments, and each task was performed by one or two of them, chosen at random.
The complete video can be found here.


The code for this project can be found at


If you find this paper useful in your research, please consider citing:

@article{li2020mobile,
  title={A Mobile Robot Hand-Arm Teleoperation System by Vision and IMU},
  author={Li, Shuang and Jiang, Jiaxi and Ruppel, Philipp and Liang, Hongzhuo and Ma, Xiaojian and Hendrich, Norman and Sun, Fuchun and Zhang, Jianwei},
  journal={arXiv preprint arXiv:2003.05212},
  year={2020}
}

@inproceedings{li2019vision,
  title={Vision-based Teleoperation of Shadow Dexterous Hand using End-to-End Deep Neural Network},
  author={Li, Shuang and Ma, Xiaojian and Liang, Hongzhuo and G{\"o}rner, Michael and Ruppel, Philipp and Fang, Bin and Sun, Fuchun and Zhang, Jianwei},
  booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
  year={2019}
}