.. _ode:

Using ODE Environments
================================================

.. _existingode:

Using an existing ODE environment
-----------------------------------------

This tutorial walks you through the process of setting up an existing ODE Environment
for use as a testbed for RL or optimization algorithms.

First, you need the following additional packages, which PyBrain itself does not require (in addition to SciPy):

	* matplotlib
	* python-tk
	* python-pyode
	* python-opengl (strongly recommended if you want to watch the simulation)

You also need to replace two ``.py`` files of the installed ``xode`` package with the custom versions shipped with PyBrain:
::
	
	cd pybrain/pybrain/rl/environments/ode/xode_changes/
	sudo cp * /usr/lib/python2.6/dist-packages/xode/  # or wherever your dist-packages directory is

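If you are not sure where your ``dist-packages`` directory is, you can ask Python where the installed ``xode`` package lives. This is a small sketch, assuming a Python 3 interpreter (it uses ``importlib.util.find_spec``) that can import ``xode``; the helper name ``package_dir`` is made up for this example::

    import importlib.util
    import os


    def package_dir(name):
        """Return the directory of an installed package, or None if it is not found."""
        spec = importlib.util.find_spec(name)
        if spec is None or spec.origin is None:
            return None
        # For a regular package, origin points to its __init__.py file.
        return os.path.dirname(spec.origin)


    if __name__ == "__main__":
        # The custom xode files have to be copied into this directory.
        print(package_dir("xode"))

Run it with the same interpreter you use for PyBrain and copy the files into the printed directory.
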
You can check that everything is set up correctly by running the following example:
::
	
	cd ~/pybrain/examples/rl/
	python johnnie_pgpe.py

... and then view what is happening by using the viewer:
::

	cd ~/pybrain/pybrain/rl/environments/ode
	python viewer.py


.. note::
	On Linux, if this causes a segmentation fault, try installing ``xorg-driver-fglrx``.
	
	
The following existing ODE environments have been tested:

	* Johnnie (a biped humanoid robot modeled after the real
	  robot Johnnie, http://www.amm.mw.tum.de)
	* CCRL (a robot with two 7-DoF arms and simple grippers, modeled
	  after the real robot at the CCRL of TU Munich, http://www.lsr.ei.tum.de/)
	* PencilBalancer (a robot that balances pencils in 2D, modeled
	  after the real robot by Jörg Conradt, http://www.ini.uzh.ch/~conradt/Projects/PencilBalancer/)

.. ToDo: check the rest of the environments.


.. _existinglearning:

Creating your own learning task in an existing ODE environment
-----------------------------------------------------------------------

This tutorial walks you through the process of setting up a
new task within an existing ODE Environment.
It assumes that you have taken the steps described in the section :ref:`existingode`.

For every ODE environment, a standard task can be found in
``pybrain/rl/environments/ode/tasks/``.

Again, we take the Johnnie environment as an example. The first class in the
``johnnie.py`` file at that location is named :class:`JohnnieTask` and
inherits from :class:`EpisodicTask`.

The methods you need to define your own task are already described in that base class:

	* ``__init__(self, env)`` - the constructor.
	* ``performAction(self, action)`` - processes and filters the output of the controller
	  and passes it on to the environment.
	* ``isFinished(self)`` - checks whether the maximum number of timesteps has been reached
	  or another termination condition has been met.
	* ``res(self)`` - resets counters, rewards and similar state.

The StandingTask (the next class in the file) shows that only little work is
needed to create your own task.
First of all, the class must inherit from :class:`JohnnieTask`.
Then, the constructor has to be overridden to declare some variables and
constants for the specific task. In this case, some additional
position sensors are added and normalized for the reward calculation.
Usually, the last step is to override the :meth:`getReward` method, because
the reward definition is what actually defines the task. Here, the reward is
simply the vertical position of the head, clipped so that the robot cannot
gain more reward by jumping. That is already enough to define a task with
which a suitable learning method (such as PGPE in the example
``johnnie_pgpe.py`` mentioned above) learns a controller that makes the robot
stand completely upright without falling.

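The clipping idea can be illustrated in isolation. The function below is a hypothetical stand-alone sketch, not the actual StandingTask code: it assumes the clipping is an upper bound at the head height of the upright robot, and ``upright_height`` is an illustrative parameter::

    def standing_reward(head_height, upright_height):
        """Hypothetical reward: vertical head position, capped at the upright height.

        Capping from above means that a jump, which lifts the head higher than
        standing upright does, earns no extra reward.
        """
        return min(head_height, upright_height)


    # Standing upright and jumping yield the same reward; crouching yields less.
    print(standing_reward(1.0, 1.0), standing_reward(1.3, 1.0), standing_reward(0.6, 1.0))
    # 1.0 1.0 0.6
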
In some special cases you may also have to override the :meth:`performAction`
and :meth:`isFinished` methods, but such cases are beyond the scope of this HowTo.
If you need to make such changes and run into problems, feel
free to contact the PyBrain mailing list.


.. _createenvironment:

Creating your own ODE environment
-----------------------------------------

This tutorial walks you through the process of setting up a new ODE Environment.
It assumes that you are already familiar with the sections :ref:`existingode` and :ref:`existinglearning`
and have taken the necessary steps explained there.

To create your own environment, you need the following:

	* an environment that inherits from :class:`ODEEnvironment`,
	* an agent that inherits from :class:`OptimizationAgent`,
	* tasks that inherit from :class:`EpisodicTask`.

For every ODE environment, an instance can be found in ``pybrain/rl/environments/ode/instances/``.

Again, we take the Johnnie environment as an example. The first class in the
``johnnie.py`` file at the location described above is named
:class:`JohnnieEnvironment` and inherits from :class:`ODEEnvironment`.

As you will see, there is not much to do on the PyBrain side to create the environment class.
First, the corresponding XODE file has to be loaded to
provide PyBrain with the specification of the simulation;
how to generate this XODE file is shown later in this HowTo.
Then the standard sensors, such as the JointSensors and the corresponding
JointVelocitySensors, are added, together with an actuator for every joint.
Because these sensors and actuators are needed in every simulation,
they are added in the environment itself, and later stages of PyBrain assume that they exist.

The next part is a bit more involved.
First, member variables that state the number
of action dimensions and number of sensors have to be set.

::

     self.actLen = self.getActionLength()
     self.obsLen = len(self.getSensors())


Next, three lists with one entry per action dimension are created. The first
list, :attr:`tourqueList` (spelled this way in the code), states the fraction of
the global maximal force that can be applied to each joint.
The second and third lists, :attr:`cHighList` and :attr:`cLowList`, state the
maximum and minimum angle of every joint. For example:
::

      # array is imported from numpy (or scipy); one entry per joint
      self.tourqueList = array([0.2, 0.2, 0.2, 0.5, 0.5, 2.0, 2.0, 2.0, 2.0, 0.5, 0.5])
      self.cHighList = array([1.0, 1.0, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5, 0.25, 0.25])
      self.cLowList = array([-0.5, -0.5, -0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.25, -0.25])

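Since all three lists describe the same joints, they must have one entry per action dimension, and each lower limit should lie below the corresponding upper limit. A sanity check along these lines could be run in the constructor; ``check_joint_limits`` is a hypothetical helper, not part of PyBrain::

    from numpy import array


    def check_joint_limits(tourqueList, cHighList, cLowList, actLen):
        """Raise ValueError if the per-joint lists are inconsistent."""
        for name, values in (("tourqueList", tourqueList),
                             ("cHighList", cHighList),
                             ("cLowList", cLowList)):
            if len(values) != actLen:
                raise ValueError("%s has %d entries, expected %d"
                                 % (name, len(values), actLen))
        # Every minimum angle must be strictly below the matching maximum angle.
        if not (array(cLowList) < array(cHighList)).all():
            raise ValueError("every entry of cLowList must be below cHighList")


    # The Johnnie example values from above pass the check.
    check_joint_limits(
        array([0.2, 0.2, 0.2, 0.5, 0.5, 2.0, 2.0, 2.0, 2.0, 0.5, 0.5]),
        array([1.0, 1.0, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5, 0.25, 0.25]),
        array([-0.5, -0.5, -0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.25, -0.25]),
        11)
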
The last thing to set is :attr:`stepsPerAction`, the number of simulation steps ODE
performs before it receives a new action from the controller and sends new sensor values back.

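Assuming that ``dt`` (the ``self.env.dt`` setting discussed in the task section of this HowTo) is the duration of one ODE step, the simulated time between two controller updates is ``stepsPerAction * dt``. The helper names and numbers below are made up for illustration and are not PyBrain's defaults::

    def control_period(dt, stepsPerAction):
        """Simulated seconds between two controller updates (assumes dt is one ODE step)."""
        return dt * stepsPerAction


    def episode_duration(dt, stepsPerAction, n_actions):
        """Simulated seconds covered by n_actions controller updates."""
        return control_period(dt, stepsPerAction) * n_actions


    # Illustrative values only: 4 ODE steps of 0.01 s per action, 500 actions.
    print(control_period(0.01, 4))          # about 0.04 s between actions
    print(episode_duration(0.01, 4, 500))   # about 20 s of simulated time

Halving ``dt`` while doubling ``stepsPerAction`` keeps the control period unchanged but makes the physics simulation finer.
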
.. _createinstance:

Creating your own XODE instance
-----------------------------------------

Now we want to specify an instance in an XODE file.
If you do not know ODE very well,
you can use a script that is shipped with PyBrain and can be found in
``pybrain/rl/environments/ode/tools/xodetools.py``.

The first part of that file converts the simplified XODE
code into a regular XODE file and can be ignored.
For an example, look at the Johnnie definition by searching for ``class XODEJohnnie(XODEfile)``.

In this tool, the instance you want to simulate in ODE is defined
as a class that inherits from :class:`XODEfile`.
The class consists only of a constructor, in which all parts of the simulated object are
defined in a global coordinate system. For example, the line
::

	self.insertBody('arm_left','cappedCylinder',[0.25,7.5],5,pos=[2.06,-2.89,0],
					euler=[90,0,0], passSet=['total'], mass=2.473)

creates the left arm of Johnnie with the following arguments:

	* ``'arm_left'``: the identifier of the part.
	* ``'cappedCylinder'``: the body type, a cylinder with rounded ends.
	* ``[0.25,7.5]``: the dimensions, here a diameter of 0.25 and a length of 7.5.
	* ``5``: the density, which is overridden if the optional ``mass`` argument is given.
	* ``pos=[2.06,-2.89,0]``: the initial position.
	* ``euler=[90,0,0]``: the orientation, here a rotation by 90 degrees around the
	  x-axis (all capped cylinders are aligned with the y-axis by default).
	* ``passSet=['total']``: the passSet named ``'total'`` (explained below).
	* ``mass=2.473``: the optional mass of the part.

``passSet`` defines parts that are allowed to penetrate each other.
This is especially necessary for parts that are connected by a joint,
but can be useful in other cases, too. All parts that belong to
the same passSet can penetrate each other. Multiple passSet names can be given, separated by commas.
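
Read as a rule, this means that two parts may overlap when they share at least one passSet name. The sketch below only illustrates that rule with Python sets; it is not how ``xodetools.py`` implements it, and ``can_penetrate`` is a hypothetical name::

    def can_penetrate(pass_sets_a, pass_sets_b):
        """True if two parts share at least one passSet name."""
        return bool(set(pass_sets_a) & set(pass_sets_b))


    # Parts in a common passSet may overlap; parts in disjoint passSets collide.
    print(can_penetrate(['total'], ['total']))          # True
    print(can_penetrate(['total', 'arm'], ['arm']))     # True
    print(can_penetrate(['arm'], ['leg']))              # False
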

The body types understood by this tool are:

	* cylinder
	* cappedCylinder
	* box
	
.. - ToDo - are there more?

Next we have to define the joints that connect the parts.
The joint types understood by this tool are:

	* fixed, a stiff fixed joint.
	* hinge, a one-dimensional joint.
	* universal joint, an experimental two-dimensional joint.

.. - ToDo, are there more?

A joint between two parts is inserted into the model with ``insertJoint``,
giving first the identifier of the first part and then the identifier of the second part.
Next, the type of the joint is stated (e.g. ``'hinge'``). The axis around which the joint
rotates is given as, e.g., ``axis={'x':1,'y':0,'z':0}``, and the anchor point in global
coordinates as, e.g., ``anchor=(2.06,0.86,0)``.
Add all parts and joints of your model in this way.

Finally, ``centerOn(identifier)`` fixes the camera position to that part, and
``insertFloor(y=??)`` adds a floor.

Now go to the end of the file and add:
::

	name = YourClass('../models/name')
	name.writeXODE()

and execute the file with
::

	python xodetools.py

This creates an instance of your model that can be loaded by the environment described above.

What is still missing is a default task for the new environment. In the section
:ref:`existinglearning`
we saw what such a standard task looks like for the Johnnie environment.
To create our own task, we create a file named after our environment in
``pybrain/rl/environments/ode/tasks/``.

The new task needs the following imports::

    from pybrain.rl.environments import EpisodicTask
    from pybrain.rl.environments.ode.sensors import *

It also needs whatever else it uses from SciPy or similar libraries.

The new class should inherit from :class:`EpisodicTask`, like :class:`JohnnieTask`.
Next, we create the constructor, which takes the environment as its argument:
``def __init__(self, env)``.

It is important that the constructor of EpisodicTask is called.
::

	EpisodicTask.__init__(self, env)

The following member variables are mandatory:
::
	
    self.maxPower = 100.0   # overall maximal torque; multiplied by each joint's relative
                            # maximal torque (tourqueList) to get that joint's maximal torque
    self.reward_history = []
    self.count = 0          # timestep counter
    self.epiLen = 500       # timesteps per episode

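As the comment on ``maxPower`` says, each joint's maximal torque is ``maxPower`` times its entry in the environment's ``tourqueList``. With ``maxPower = 100.0`` and the Johnnie values shown earlier, this works out as follows (``joint_max_torques`` is a hypothetical helper)::

    from numpy import array


    def joint_max_torques(maxPower, tourqueList):
        """Per-joint maximal torque: global maximum times the relative fraction."""
        return maxPower * array(tourqueList)


    torques = joint_max_torques(100.0, [0.2, 0.2, 0.2, 0.5, 0.5, 2.0, 2.0, 2.0, 2.0, 0.5, 0.5])
    print(torques.tolist())
    # [20.0, 20.0, 20.0, 50.0, 50.0, 200.0, 200.0, 200.0, 200.0, 50.0, 50.0]
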
Depending on your task, you might need to change some of the ODEEnvironment default settings:

    * :attr:`self.env.FricMu` if you need higher or lower friction,
    * :attr:`self.env.dt` if you need a finer time resolution.

Next, the sensor and actuator limits must be set. The sensor limits state the raw
range of every sensor value so that it can be normalized to (-1, 1); the actuator
limits are usually (-1, 1) as well:
::

    # Normalize the standard sensors to (-1, 1)
    self.sensor_limits = []
    # Joint angle sensors: the angle range of every joint
    for i in range(self.env.actLen):
        self.sensor_limits.append((self.env.cLowList[i], self.env.cHighList[i]))
    # Joint velocity sensors: range (-20, 20)
    for i in range(self.env.actLen):
        self.sensor_limits.append((-20, 20))
    # Normalize all actor dimensions to (-1, 1)
    self.actor_limits = [(-1, 1)] * self.env.actLen

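Each pair of limits describes a linear mapping between a raw range and (-1, 1). The two functions below sketch that mapping for a single value; they illustrate what the limits mean and are not PyBrain's own implementation. Note that ``denormalize`` is the same formula that the PD snippet in this HowTo uses for the target angles::

    def normalize(value, low, high):
        """Map a raw value from [low, high] linearly to [-1, 1]."""
        return (value - low) / (high - low) * 2.0 - 1.0


    def denormalize(value, low, high):
        """Map a value from [-1, 1] linearly back to [low, high]."""
        return (value + 1.0) / 2.0 * (high - low) + low


    # First Johnnie joint: angle range (-0.5, 1.0); velocity range (-20, 20).
    print(normalize(-0.5, -0.5, 1.0), normalize(1.0, -0.5, 1.0))   # -1.0 1.0
    print(normalize(10.0, -20, 20))                                  # 0.5
    print(denormalize(0.0, -0.5, 1.0))                               # 0.25
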
The next method needed is :meth:`performAction`; the standard version looks like this:
::

    def performAction(self, action):
        """ Filtered mapping towards performAction of the underlying environment """
        EpisodicTask.performAction(self, action)

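If you want the controller to set target joint angles instead of forces, you can use
this simple PD mechanism in :meth:`performAction` instead:
::

    def performAction(self, action):
        """ Interprets the controller output as target angles and applies PD control """
        # requires "from numpy import tanh" at the top of the module
        # The current joint angles
        isJoints = self.env.getSensorByName('JointSensor')
        # The current joint angular velocities
        isSpeeds = self.env.getSensorByName('JointVelocitySensor')
        # Map the output from (-1, 1) to each joint's (cLowList, cHighList) range
        act = (action+1.0)/2.0*(self.env.cHighList-self.env.cLowList)+self.env.cLowList
        # Simple PD law, squashed by tanh and scaled to each joint's maximal torque
        action = tanh((act - isJoints - isSpeeds) * 16.0) * self.maxPower * self.env.tourqueList
        EpisodicTask.performAction(self, action)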

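Because of the ``tanh``, the torque of each joint always stays within ``maxPower * tourqueList[i]`` in magnitude, and it is zero when a joint rests exactly at its target angle. The stand-alone NumPy sketch below reproduces the PD formula outside the simulator so that you can check this behaviour; ``pd_torques`` and its argument names are illustrative::

    import numpy as np


    def pd_torques(action, angles, speeds, cLow, cHigh, tourque, maxPower, gain=16.0):
        """Same mapping as the PD snippet above, written as a plain function."""
        # Target angle of each joint, mapped from (-1, 1) to (cLow, cHigh)
        target = (action + 1.0) / 2.0 * (cHigh - cLow) + cLow
        # PD error with unit gains, then squashed and scaled to the torque limit
        return np.tanh((target - angles - speeds) * gain) * maxPower * tourque


    cLow, cHigh = np.array([-0.5, 0.0]), np.array([1.0, 1.5])
    tourque = np.array([0.2, 2.0])
    # Joints resting at their targets (0.25 and 0.75): no torque
    print(pd_torques(np.zeros(2), np.array([0.25, 0.75]), np.zeros(2), cLow, cHigh, tourque, 100.0))
    # Joints far from their targets: torque saturates near +/- maxPower * tourque
    print(pd_torques(np.array([1.0, -1.0]), np.array([-0.5, 1.5]), np.zeros(2), cLow, cHigh, tourque, 100.0))
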
Now we have to define the :meth:`isFinished` method:
::

    def isFinished(self):
        """ returns True once the episode length has been exceeded, and resets the task """
        if self.count > self.epiLen:
            self.res()
            return True
        else:
            self.count += 1
            return False

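Because the check uses ``>``, ``isFinished`` returns False ``epiLen + 1`` times before it returns True. The stand-alone class below copies only the counter logic to make this visible, assuming ``isFinished`` is called once per step; ``CounterOnly`` and ``steps_per_episode`` are hypothetical names, and the reward bookkeeping of ``res`` is left out::

    class CounterOnly(object):
        """Mimics the step counter of the task above, without any environment."""

        def __init__(self, epiLen):
            self.epiLen = epiLen
            self.count = 0

        def isFinished(self):
            if self.count > self.epiLen:
                self.count = 0          # the counter part of res()
                return True
            self.count += 1
            return False


    def steps_per_episode(epiLen):
        """Number of isFinished() calls that return False before the episode ends."""
        task = CounterOnly(epiLen)
        steps = 0
        while not task.isFinished():
            steps += 1
        return steps


    print(steps_per_episode(500))   # 501

Comparing with ``>=`` instead would give exactly ``epiLen`` steps per episode.
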
You are free to add other termination conditions.

Finally, we define the :meth:`res` method, which resets the task:

::

    def res(self):
        """ resets the step counter and stores the total reward of the finished episode """
        self.count = 0
        self.reward_history.append(self.getTotalReward())

We don't need a :meth:`getReward` method here, because the one inherited from :class:`EpisodicTask`,
which always returns 0.0, is used. This is the default task from which specific tasks are derived.
Please take a look at :ref:`existinglearning` for how to create a task that gives an actual reward.

If you have done all steps correctly, you now have a new ODE environment with a
corresponding task that you can test by creating an experiment.
Alternatively, copy an existing example such as ``johnnie_pgpe.py`` and
replace the environment and task definitions with your new environment and task.