Using ODE Environments

Using an existing ODE environment

This tutorial walks you through the process of setting up an existing ODE Environment for use as a testbed for RL or optimization algorithms.

First, install the following packages. PyBrain itself does not require them, so they are needed in addition to SciPy:
  • matplotlib
  • python-tk
  • python-pyode
  • python-opengl (if you also want to view what is happening, highly recommended)

You also need to replace two .py files of the xode package with the custom versions shipped with PyBrain:

cd pybrain/pybrain/rl/environments/ode/xode_changes/
sudo cp * /usr/lib/python2.6/dist-packages/xode/   # or wherever your dist-packages are

You can test whether all your settings are correct by starting the following example:

cd ~/pybrain/examples/rl/
python johnnie_pgpe.py

... and then view what is happening by using the viewer:

cd ~/pybrain/pybrain/rl/environments/ode
python viewer.py

Note

On Linux, if that causes a segmentation fault, try installing xorg-driver-fglrx.

Existing ODE Environments that are tested are:

Creating your own learning task in an existing ODE environment

This tutorial walks you through the process of setting up a new task within an existing ODE Environment. It assumes that you have taken the steps described in the section Using an existing ODE environment.

A standard task for each ODE environment can be found in pybrain/rl/environments/ode/tasks

We again take the Johnnie environment as an example. You will find that the first class in the johnnie.py file in the location given above is named JohnnieTask and inherits from EpisodicTask.

The methods that you need to define your own task are already present in that basic class:
  • __init__(self, env) - the constructor.
  • performAction(self, action) - processes and filters the output from the controller and communicates it to the environment.
  • isFinished(self) - checks whether the maximum number of timesteps has been reached or another break condition has been met.
  • res(self) - resets counters, rewards and similar.

If we take a look at the StandingTask (the next class in the file), we see that only a little has to be done to create your own task. First of all, the class must inherit from JohnnieTask. Then the constructor has to be overridden to declare some variables and constants for the specific task. In this case, some additional position sensors are added and normalized for the reward calculation. The last step is normally to override the getReward method, because the reward definition is usually what defines the task. In this case, just the vertical head position is returned (with some clipping to prevent the robot from jumping to get more reward). That is already enough to define a task well enough for a proper learning method (like PGPE in the testable example johnnie_pgpe.py mentioned above) to learn a controller that lets the robot stand completely upright without falling.

In some special cases you may be forced to rewrite the performAction and isFinished methods, but those special cases are out of the scope of this HowTo. If you need to make such changes and encounter problems, please feel free to contact the PyBrain mailing list.
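
The pattern described above (inherit, extend the constructor, override getReward) can be sketched without PyBrain installed. This is only an illustration: BaseTaskSketch and StandingTaskSketch are hypothetical stand-ins, not the JohnnieTask and StandingTask classes from johnnie.py, the environment is a plain dict, and the clipping threshold max_height is an assumed value.

```python
class BaseTaskSketch(object):
    """Hypothetical stand-in for JohnnieTask; not the PyBrain class."""

    def __init__(self, env):
        self.env = env
        self.epiLen = 500  # time steps per episode

    def getReward(self):
        # the base task gives no reward
        return 0.0


class StandingTaskSketch(BaseTaskSketch):
    """Rewards a high head position, clipped so that jumping does not pay off."""

    def __init__(self, env, max_height=4.0):
        BaseTaskSketch.__init__(self, env)  # always call the parent constructor
        self.max_height = max_height        # assumed clipping threshold

    def getReward(self):
        head_height = self.env["head_height"]  # env is a plain dict in this sketch
        return min(head_height, self.max_height)


task = StandingTaskSketch({"head_height": 3.2})
reward = task.getReward()
```

In a real task the head height would come from one of the position sensors added in the constructor rather than from a dict entry.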

Creating your own ODE environment

This tutorial walks you through the process of setting up a new ODE Environment. It assumes that you are already familiar with the sections Using an existing ODE environment and Creating your own learning task in an existing ODE environment and have taken the necessary steps explained there.

If you want to create your own environment, you need the following:

  • Environment that inherits from ODEEnvironment
  • Agent that inherits from OptimizationAgent
  • Tasks that inherit from EpisodicTask

For all ODE environments, an instance can be found in pybrain/rl/environments/ode/instances/

We again take the Johnnie environment as an example. You will find that the first class in the johnnie.py file in the location given above is named JohnnieEnvironment and inherits from ODEEnvironment.

You will see that there is not much to do on the PyBrain side to create the environment class. First, the corresponding XODE file has to be loaded to provide PyBrain with the specification of the simulation. How to generate this XODE file is shown later in this HowTo. Then the standard sensors are added, namely the JointSensors and the corresponding JointVelocitySensors, along with the actuators for every joint. Because these sensors and actuators are needed in every simulation, they are added in the environment itself, and later stages of PyBrain assume that they exist.

The next part is a bit more involved. First, member variables that state the number of action dimensions and number of sensors have to be set.

self.actLen = self.getActionLength()
self.obsLen = len(self.getSensors())

Next, three lists are generated, each with one entry per action dimension. The first list, tourqueList (note the spelling used in the code), states the fraction of the global maximal force that can be applied to each joint. The second list, cHighList, states the maximum angle and the third list, cLowList, states the minimum angle for every joint. For example:

self.tourqueList = array([0.2, 0.2, 0.2, 0.5, 0.5, 2.0, 2.0, 2.0, 2.0, 0.5, 0.5])
self.cHighList = array([1.0, 1.0, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5, 0.25, 0.25])
self.cLowList = array([-0.5, -0.5, -0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.25, -0.25])

The last thing to do is to set stepsPerAction, the number of simulation steps ODE performs before it gets an update from the controller and sends new sensor values back.
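
Since the three lists must line up joint by joint, a small consistency check can catch typos before a simulation is started. The helper below is a hypothetical sketch and not part of PyBrain; it uses plain Python lists instead of arrays so that it runs on its own, with the values from the example above.

```python
def check_joint_lists(torque_list, c_high_list, c_low_list):
    """Return the number of action dimensions if the three lists are consistent."""
    n = len(torque_list)
    if len(c_high_list) != n or len(c_low_list) != n:
        raise ValueError("all three lists need one entry per action dimension")
    for i in range(n):
        if c_low_list[i] >= c_high_list[i]:
            raise ValueError("joint %d: minimum angle must be below maximum angle" % i)
    return n


torque = [0.2, 0.2, 0.2, 0.5, 0.5, 2.0, 2.0, 2.0, 2.0, 0.5, 0.5]
c_high = [1.0, 1.0, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5, 0.25, 0.25]
c_low = [-0.5, -0.5, -0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.25, -0.25]

n_joints = check_joint_lists(torque, c_high, c_low)
```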

Creating your own XODE instance

Now we want to specify an instantiation in an XODE file. If you do not know ODE very well, you can use a script that is shipped with PyBrain and can be found in pybrain/rl/environments/ode/tools/xodetools.py

The first part of the file converts the simplified XODE code into a regular XODE file; you can ignore it. For an example, look at the Johnnie definition by searching for class XODEJohnnie(XODEfile)

In this tool, the instantiation of what you want to simulate in ODE is defined as a class that inherits from XODEfile. The class consists only of a constructor, in which all parts of the simulated object are defined. The parts are defined in a global coordinate system. For example, the line

self.insertBody('arm_left','cappedCylinder',[0.25,7.5],5,pos=[2.06,-2.89,0],
                                euler=[90,0,0], passSet=['total'], mass=2.473)

creates the left arm of Johnnie (identifier ‘arm_left’) as a cylinder with rounded ends (‘cappedCylinder’) that has a diameter of 0.25 and a length of 7.5 ([0.25,7.5]). The density is 5, but it is overridden if the optional mass argument is given at the end of the command, as it is here (mass=2.473). The initial position is pos=[2.06,-2.89,0], and the part is rotated by 90 degrees around the x-axis (euler=[90,0,0]); by default, all capped cylinders are aligned with the y-axis. Finally, the part belongs to the passSet named ‘total’, which is explained next.

“passSet” is used to define parts that can penetrate each other. This is especially necessary for parts that are connected by a joint, but it can also be useful in other cases. All parts that belong to the same passSet can penetrate each other. Multiple passSet names can be given, delimited by a “,”. Part types that are understood by this tool are:

  • cylinder
  • cappedCylinder
  • box
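
The passSet rule described above (two parts may penetrate each other if they share at least one passSet name) can be sketched as a simple set intersection. This is only an illustration of the rule, not code from xodetools.py; the function name, the second passSet ‘arm’ and the part ‘marker’ are hypothetical.

```python
def may_penetrate(pass_sets_a, pass_sets_b):
    """True if two parts share at least one passSet name."""
    return bool(set(pass_sets_a) & set(pass_sets_b))


# passSet names per part; only 'total' on 'arm_left' comes from the example above
parts = {
    "body": ["total"],
    "arm_left": ["total", "arm"],  # hypothetical second passSet name
    "marker": [],                  # hypothetical part without any passSet
}

arm_passes_body = may_penetrate(parts["body"], parts["arm_left"])
marker_passes_body = may_penetrate(parts["body"], parts["marker"])
```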

Next we have to define the joints that connect the parts. Types of joints that are understood by this tool are:

  • fixed, a stiff fixed joint.
  • hinge, a one-dimensional joint.
  • universal joint, an experimental two-dimensional joint.

A joint between two parts is inserted into the model with insertJoint, giving the identifier of the first part and then the identifier of the second part. Next, the type of joint is stated (e.g. ‘hinge’). The axis around which the joint rotates is given like axis={'x':1,'y':0,'z':0}, and the anchor point in global coordinates is defined by something like anchor=(2.06,0.86,0). Add all parts and joints for your model.

Finally, centerOn(identifier) fixes the camera position to that part, and insertFloor(y=??) adds a floor.
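
To see how the body and joint definitions fit together, the call shapes described above can be tried out on a small recorder class. ModelSketch is a hypothetical stand-in that only stores the arguments and checks that joints refer to parts that exist; it is not XODEfile, its parameter names are assumptions, and it writes no XODE output. The ‘body’ box and its size are invented for the illustration.

```python
class ModelSketch(object):
    """Hypothetical recorder mimicking the calls described above; not XODEfile."""

    def __init__(self):
        self.bodies = {}
        self.joints = []

    def insertBody(self, name, shape, size, density, pos, euler, passSet, mass=None):
        # store the part definition under its identifier
        self.bodies[name] = dict(shape=shape, size=size, density=density,
                                 pos=pos, euler=euler, passSet=passSet, mass=mass)

    def insertJoint(self, body1, body2, jointType, axis=None, anchor=(0, 0, 0)):
        # a joint can only connect parts that have already been defined
        for body in (body1, body2):
            if body not in self.bodies:
                raise KeyError("unknown part: %s" % body)
        self.joints.append((body1, body2, jointType, axis, anchor))


model = ModelSketch()
# hypothetical torso, only needed so that the joint has two parts to connect
model.insertBody('body', 'box', [4.0, 3.0, 2.0], 5, pos=[0, 0, 0],
                 euler=[0, 0, 0], passSet=['total'])
model.insertBody('arm_left', 'cappedCylinder', [0.25, 7.5], 5, pos=[2.06, -2.89, 0],
                 euler=[90, 0, 0], passSet=['total'], mass=2.473)
model.insertJoint('body', 'arm_left', 'hinge',
                  axis={'x': 1, 'y': 0, 'z': 0}, anchor=(2.06, 0.86, 0))
```

Both parts share the passSet ‘total’, which matches the advice above that parts connected by a joint should be allowed to penetrate each other.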

Now go to the end of the file and state:

name = YourClass('../models/name')
name.writeXODE()

and execute the file with

python xodetools.py

You have now created an instantiation of your model that can be read by the environment described above.

What is missing is a default task for the new environment. In the previous “HowTo create your own learning task in an existing ODE environment” we saw how such a standard task looks for the Johnnie environment. To create our own task we have to create a file with the name of our environment in pybrain/rl/environments/ode/tasks/

The new task has to import the following packages:

from pybrain.rl.environments import EpisodicTask
from pybrain.rl.environments.ode.sensors import *

And whatever is needed from scipy and similar.

The new class should inherit from EpisodicTask like in the JohnnieTask. Next we create the constructor that takes the environment with def __init__(self, env).

It is important that the constructor of EpisodicTask is called.

EpisodicTask.__init__(self, env)

The following member variables are mandatory:

self.maxPower = 100.0   #Overall maximal torque - is multiplied with relative max
                        #torque for individual joint to get individual max torque
self.reward_history = []
self.count = 0          #timestep counter
self.epiLen = 500       #time steps for one episode
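
As the comment on maxPower says, the maximal torque of an individual joint is the overall maximal torque multiplied by that joint's relative maximal torque. A short worked example, using the tourqueList values from the Johnnie environment as plain Python floats (the variable names are chosen for this sketch only):

```python
max_power = 100.0  # overall maximal torque, as set in the task above
torque_fractions = [0.2, 0.2, 0.2, 0.5, 0.5, 2.0, 2.0, 2.0, 2.0, 0.5, 0.5]

# individual maximal torque per joint
joint_max_torque = [max_power * fraction for fraction in torque_fractions]

weakest = min(joint_max_torque)    # joints with fraction 0.2
strongest = max(joint_max_torque)  # joints with fraction 2.0
```

A fraction above 1.0 therefore lets a joint exceed the nominal overall maximum, which is why maxPower is better read as a scaling factor than as a hard limit.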

Some of the ODEEnvironment default settings may need to be changed:

  • self.env.FricMu if you need higher or lower friction for your task,
  • self.env.dt if you need a finer time resolution.

Next the sensor and actuator limits must be set, usually between -1 and 1:

# normalize standard sensors to (-1, 1)
self.sensor_limits = []
# Joint angle sensors
for i in range(self.env.actLen):
    self.sensor_limits.append((self.env.cLowList[i], self.env.cHighList[i]))
# Joint velocity sensors
for i in range(self.env.actLen):
    self.sensor_limits.append((-20, 20))
# Normalize all actor dimensions to (-1, 1)
self.actor_limits = [(-1, 1)] * env.actLen
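
To make clear what a (low, high) limit pair means, here is a minimal sketch of a linear mapping between a sensor or actuator range and (-1, 1). The exact formula PyBrain applies is not shown in this HowTo, so the linear form and both function names are assumptions.

```python
def normalize(value, low, high):
    """Map a value from [low, high] linearly onto [-1, 1]."""
    return (value - low) / float(high - low) * 2.0 - 1.0


def denormalize(value, low, high):
    """Map a value from [-1, 1] linearly back onto [low, high]."""
    return (value + 1.0) / 2.0 * (high - low) + low


# a joint angle with limits (cLowList[i], cHighList[i]) = (-0.5, 1.0)
at_lower_limit = normalize(-0.5, -0.5, 1.0)
at_upper_limit = normalize(1.0, -0.5, 1.0)
# a joint velocity with limits (-20, 20)
at_rest = normalize(0.0, -20, 20)
```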

The next method that is needed is the performAction method. The standard setting looks like this:

def performAction(self, action):
    """ Filtered mapping towards performAction of the underlying environment """
    EpisodicTask.performAction(self, action)

If you want to control the desired angles instead of the forces, you may include this simple PD mechanism:

#The joint angles
isJoints = self.env.getSensorByName('JointSensor')
#The joint angular velocities
isSpeeds = self.env.getSensorByName('JointVelocitySensor')
#norm output to action interval
act = (action+1.0)/2.0*(self.env.cHighList-self.env.cLowList)+self.env.cLowList
#simple PD controller
action = tanh((act - isJoints - isSpeeds) * 16.0) * self.maxPower * self.env.tourqueList
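
To get a feeling for the numbers, the same PD rule can be evaluated for a single joint with plain floats. This sketch mirrors the lines above but is not PyBrain code: the function name and its parameters are chosen for illustration, and the joint values (limits -0.5 and 1.0, torque fraction 0.2, maxPower 100.0) are taken from the examples in this HowTo.

```python
import math


def pd_torque(action, angle, speed, c_low, c_high, max_power, torque_fraction, gain=16.0):
    """Torque for one joint, given a controller output in (-1, 1)."""
    # map the controller output onto the joint's angle range
    target = (action + 1.0) / 2.0 * (c_high - c_low) + c_low
    # the angle error is damped by the angular velocity; tanh keeps the result bounded
    return math.tanh((target - angle - speed) * gain) * max_power * torque_fraction


# joint already at the target angle (0.25) and at rest: no torque is applied
at_target = pd_torque(0.0, 0.25, 0.0, -0.5, 1.0, 100.0, 0.2)
# joint at its lower limit while the controller asks for the upper limit:
# the torque saturates close to max_power * torque_fraction
far_off = pd_torque(1.0, -0.5, 0.0, -0.5, 1.0, 100.0, 0.2)
```

Because of the tanh, the resulting torque never exceeds the individual maximal torque of the joint, however large the angle error becomes.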

Now we have to define the isFinished() method:

def isFinished(self):
    """ returns true if episode timesteps has reached episode length and resets the task """
    if self.count > self.epiLen:
        self.res()
        return True
    else:
        self.count += 1
        return False

You are certainly free to include other breaking conditions.
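
As an example of an additional break condition, the sketch below also ends the episode when the robot has fallen. EpisodeCounterSketch is a hypothetical stand-in that models only the termination logic; unlike the real isFinished(), it receives the head height as an argument so that it runs without a simulation, and the threshold min_height is an assumed value.

```python
class EpisodeCounterSketch(object):
    """Hypothetical stand-in for a task; only the termination logic is modelled."""

    def __init__(self, epiLen=500, min_height=1.0):
        self.epiLen = epiLen
        self.min_height = min_height  # assumed threshold for "robot has fallen"
        self.count = 0
        self.episodes_done = 0

    def res(self):
        # reset the timestep counter and note that an episode has ended
        self.count = 0
        self.episodes_done += 1

    def isFinished(self, head_height):
        fallen = head_height < self.min_height
        if self.count > self.epiLen or fallen:
            self.res()
            return True
        self.count += 1
        return False


task = EpisodeCounterSketch(epiLen=3)
upright_run = [task.isFinished(2.0) for _ in range(5)]  # ends by reaching epiLen
fell_over = task.isFinished(0.4)                        # ends at once: robot fell
```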

Finally we define the reset method res():

def res(self):
    """ sets counter and history back, increases incremental counter """
    self.count = 0
    self.reward_history.append(self.getTotalReward())

We do not need a getReward() method here, because the one inherited from EpisodicTask, which always returns 0.0, is used. This is the default task that serves as the basis for specific tasks. Please take a look at Creating your own learning task in an existing ODE environment for how to create a task that gives actual reward.

If you have done all the steps correctly, you now have a new ODE environment with a corresponding task that you can test by creating an experiment. Alternatively, you can copy an existing example such as johnnie_pgpe.py and replace the environment and task definitions with your new environment and task.