Disentangled Representations for Explaining and Learning from Demonstrations

University of Edinburgh
Conference on Robot Learning 2019, Osaka, Japan

Abstract: Learning from demonstration is an effective method for human users to instruct desired robot behaviour. However, for most non-trivial tasks of practical interest, efficient learning from demonstration depends crucially on inductive bias in the chosen structure for rewards/costs and policies. We address the case where this inductive bias comes from an exchange with a human user. We propose a method in which a learning agent utilizes the information bottleneck layer of a high-parameter variational neural model, with auxiliary loss terms, in order to ground abstract concepts such as spatial relations. The concepts are referred to in natural language instructions and are manifested in the high-dimensional sensory input stream the agent receives from the world.

We evaluate the properties of the latent space of the learned model in a photorealistic synthetic environment, focusing in particular on its usability for downstream tasks. Additionally, through a series of controlled tabletop manipulation experiments, we demonstrate that the learned manifold can be used to ground demonstrations as symbolic plans, which can then be executed on a PR2 robot.
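
To make the idea of a variational model with auxiliary loss terms on its bottleneck layer more concrete, the following is a minimal sketch of one way such an objective could be composed. It is not taken from the paper or the linked repository: the function names, the squared-error reconstruction term, the softmax cross-entropy auxiliary term, and the weights `beta` and `gamma` are all assumptions chosen for illustration.

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=1)

def softmax_cross_entropy(logits, labels):
    """Per-sample cross-entropy between softmax(logits) and integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

def total_loss(x, x_recon, mu, logvar, concept_logits, concept_labels,
               beta=1.0, gamma=1.0):
    """Hypothetical objective: reconstruction + beta * KL + gamma * auxiliary term.

    concept_logits are assumed to be predicted from the bottleneck (latent) layer,
    e.g. one classifier per concept group such as a spatial relation; the labels
    are assumed to come from the user's natural language instructions.
    """
    recon = np.sum((x - x_recon) ** 2, axis=1)   # assumed squared-error reconstruction
    kl = kl_to_standard_normal(mu, logvar)       # variational bottleneck regulariser
    aux = softmax_cross_entropy(concept_logits, concept_labels)  # assumed auxiliary term
    return float(np.mean(recon + beta * kl + gamma * aux))

# Toy usage with made-up shapes: 2 samples, 4 input dims, 3 latent dims, 2 classes.
x = np.ones((2, 4))
mu = np.zeros((2, 3))
logvar = np.zeros((2, 3))
logits = np.zeros((2, 2))
labels = np.array([0, 1])
loss = total_loss(x, x, mu, logvar, logits, labels)
```

In this toy call the reconstruction and KL terms are both zero, so the loss reduces to the auxiliary term alone. In a real model the auxiliary term is what would tie particular latent dimensions to the user-provided concept labels.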

Paper: https://arxiv.org/abs/1907.13627v2

Code: https://github.com/yordanh/spatial_relations_experiments

Data generation code: https://github.com/yordanh/clevr-dataset-gen

Photorealistic CLEVR training data: https://drive.google.com/file/d/1kslhtiZb1LTxcKSXXevHhUQ4gzrO9g_r/view?usp=sharing

Photorealistic CLEVR testing data: https://drive.google.com/file/d/1NoXwwTJExxhbHSL-meE4Blrr_Mx7GNed/view?usp=sharing

Robot data: https://drive.google.com/drive/folders/1UlfO4VkT26Xym0QbtjtJwQEYoog5x9i1?usp=sharing

Example photorealistic blocksworld setup (static scenes for training)

Example real-world robot setup

Evaluation Data

Photorealistic Moving BlocksWorld:

C-shape

Jump over

Off-on-off

Repetitive left-right

Repetitive out-in

Repetitive off-on

Teleoperating a PR2 robot for tabletop object manipulation (robot's PoV):

Make 2 cups 'face' each other

Put a cube in a bowl

Place a cube on a cylinder

Learning Framework:

Plan Segmentation: