A modern & clean implementation of the PILCO Algorithm in TensorFlow v2.
Unlike PILCO's original implementation, which was written as a self-contained MATLAB package, this repository aims to provide a clean implementation by making heavy use of modern machine learning libraries.
In particular, we use TensorFlow v2 to avoid the need for hardcoded gradients and to scale to GPU architectures. Moreover, we use GPflow v2 for Gaussian Process Regression.
The core functionality is tested against the original MATLAB implementation.
Before using PILCO, you have to install it by running:
git clone https://github.com/nrontsis/PILCO && cd PILCO
python setup.py develop
It is recommended to install everything in a fresh conda environment with python>=3.7.
The examples included in this repo use OpenAI gym 0.15.3 and mujoco-py 2.0.2.7. These dependencies should be installed manually. Then, you can run one of the examples as follows:
python examples/inverted_pendulum.py
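If an example fails to start, it may help to confirm that the manually installed dependencies match the versions listed above. The following is a minimal sketch, not part of this repository: the names `EXPECTED`, `installed_version` and `check_dependencies` are hypothetical, and it assumes the pip distribution names are `gym` and `mujoco-py`. On Python 3.7 it also assumes the `importlib_metadata` backport is installed.

```python
"""Report whether the example dependencies match the versions named in this README."""
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7: assumes the importlib_metadata backport is installed
    from importlib_metadata import PackageNotFoundError, version

# Versions quoted in this README for the bundled examples.
# The distribution names are an assumption about how the packages were installed.
EXPECTED = {"gym": "0.15.3", "mujoco-py": "2.0.2.7"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_dependencies(expected=EXPECTED, lookup=installed_version):
    """Map each package name to a tuple (expected, found, matches)."""
    report = {}
    for package, wanted in expected.items():
        found = lookup(package)
        report[package] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for package, (wanted, found, ok) in check_dependencies().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{package}: expected {wanted}, found {found} [{status}]")
```

The `lookup` argument exists only so the check can be exercised without the packages installed; in normal use, calling `check_dependencies()` with no arguments is enough.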
As an example of the extensibility of the framework, we include in the folder safe_pilco_extension an extension of the standard PILCO algorithm that takes safety constraints (defined on the environment's state space) into account, as in https://arxiv.org/abs/1712.05556. The safe_swimmer_run.py and safe_cars_run.py scripts in the examples folder demonstrate the use of this extension.
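To give a feel for what a safety constraint on the state space can look like when state predictions are Gaussian, the sketch below computes the probability that one state dimension stays inside an interval. It is purely illustrative and is not the code in safe_pilco_extension: the function names are hypothetical, and multiplying per-step probabilities treats the time steps as independent, which is a simplifying assumption.

```python
import math


def normal_cdf(x, mean, std):
    """CDF of a univariate Gaussian with the given mean and std, evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def prob_in_interval(mean, std, lower, upper):
    """Probability that a Gaussian state dimension lies in [lower, upper]."""
    return normal_cdf(upper, mean, std) - normal_cdf(lower, mean, std)


def prob_trajectory_safe(means, stds, lower, upper):
    """Product of per-step interval probabilities along a predicted trajectory.

    Assumption: time steps are treated as independent, so this is only
    a rough estimate of the probability that the whole trajectory is safe.
    """
    p = 1.0
    for mean, std in zip(means, stds):
        p *= prob_in_interval(mean, std, lower, upper)
    return p


if __name__ == "__main__":
    # Hypothetical predicted means and standard deviations for three time steps.
    means = [0.0, 0.2, 0.5]
    stds = [0.1, 0.2, 0.4]
    print(prob_trajectory_safe(means, stds, lower=-1.0, upper=1.0))
```

A policy search could then penalise or reject policies whose estimated safety probability falls below a chosen threshold; how the extension in this repository actually does this is described in the linked paper.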
The following people have been involved in the development of this package:
See the following publications for a description of the algorithm: 1, 2, 3