.. _ode: Using ODE Environments ================================================ .. _existingode: Using an existing ODE environment ----------------------------------------- This tutorial walks you through the process of setting up an existing ODE Environment for use as a testbed for RL or optimization algorithms. First we need the following additional packages that are not required for PyBrain (in addition to SciPy): * matplotlib * python-tk * python-pyode * python-opengl (if you also want to view what is happening, very recommended) You also need to exchange the following two ``.py`` files with custom versions: :: cd pybrain/pybrain/rl/environments/ode/xode_changes/ sudo cp * /usr/lib/python2.6/dist-packages/xode/ (or there ever your dist-packages are) You can test if all your settings are ok by starting following example: :: cd ~/pybrain/examples/rl/ python johnnie_pgpe.py ... and then view what is happening by using the viewer: :: cd ~/pybrain/pybrain/rl/environments/ode python viewer.py .. note:: On Linux, if that gives rise to a segmentation fault, try installing ``xorg-driver-fglrx`` Existing ODE Environments that are tested are: * Johnnie (a biped humanoid robot modeled after the real robot Johnnie (http://www.amm.mw.tum.de) * CCRL (a robot with two 7 DoF Arms and simple grippers, modeled after the real robot at the CCRL of TU Munich. (http://www.lsr.ei.tum.de/) * PencilBalancer (a robot that balances pencils in a 2D way, modeled after the real robot from Jörg Conradt. (http://www.ini.uzh.ch/~conradt/Projects/PencilBalancer/) .. ToDo: check the rest of the environments. .. _existinglearning: Creating your own learning task in an existing ODE environment ----------------------------------------------------------------------- This tutorial walks you through the process of setting up a new task within an existing ODE Environment. It assumes that you have taken the steps described in the section :ref:`existingode`. For all ODE environments there can be found a standard task in ``pybrain/rl/environments/ode/tasks`` We take as an example again the Johnnie environment. You will find that the first class in the johnnie.py file in the above described location is named JohnnieTask and inherits from EpisodicTask. The necessary methods that you need to define your own task are described already in that basic class: * ``__init__(self, env)`` - the constructor * ``performAction(self, action)`` - processes and filters the output from the controller and communicates it to the environment. * ``isFinished(self)`` - checks if the maximum number of timesteps has been reached or if other break condition has been met. * ``res(self)`` - resets counters rewards and similar. If we take a look at the StandingTask (the next class in the file) we see that only little has to be done to create an own task. First of all the class must inherit from JohnnieTask. Then, the constructor has to be overwritten to declare some variables and constants for the specific task. In this case there were some additional position sensors added and normalized for reward calculation. As normally last step the getReward Method has to be overwritten, because the reward definition is normally what defines the task. In this case just the vertical head position is returned (with some clipping to prevent the robot from jumping to get more reward). That is already enough to create a task that is sufficiently defined to make a proper learning method (like PGPE in the above mentioned and testable example johnnie_pgpe.py) learn a controller that let the robot stand complete upright without falling. For some special cases you maybe are forced to rewrite the performAction method and the isFinished method, but that special cases are out of scope of this HowTo. If you need to make such changes and encounter problems please feel free to contact the PyBrain mailing list. .. _createenvironment: Creating your own ODE environment ----------------------------------------- This tutorial walks you through the process of setting up a new ODE Environment. It assumes that you are already familiar with the sections :ref:`existingode` and :ref:`existinglearning` and have taken the necessary steps explained there. If you want to your own environment you need the following: * Environment that inherits from ODEEnvironment * Agent that inherits from OptimizationAgent * Tasks that inherit from EpisodicTask For all ODE environments, an instance can be found in ``pybrain/rl/environments/ode/instances/`` We take as an example again the Johnnie environment. You will find that the first class in the ``johnnie.py`` file in the location described above is named :class:`JohnnieEnvironment` and inherits from :class:`ODEEnvironment`. You will see that were is not much to do on the PyBrain side to generate the environment class. First loading the corresponding XODE file is necessary to provide PyBrain with the specification of the simulation. How to generate the corresponding XODE file will be shown later in this HowTo. Then the standard sensors are added like the JointSensors, the corresponding JointVelocitySensors and also the actuators for every joint. Because this kind of sensors and actuators are needed in every simulation they are already added in the environment and assumed to exist by later stages of PyBrain. The next part is a bit more involved. First, member variables that state the number of action dimensions and number of sensors have to be set. :: self.actLen = self.getActionLength() self.obsLen = len(self.getSensors()) Next, 3 lists are generated for every action dimension. The first list is called :attr:`torqueList` and states the fraction of the global maximal force that can bee applied to the joints. The second list states the maximum angle, the third list states the minimum angle for every joint. (:attr:`cHighList` and :attr:`cLowList`) For example: :: self.tourqueList = array([0.2, 0.2, 0.2, 0.5, 0.5, 2.0, 2.0,2.0,2.0,0.5,0.5],) self.cHighList = array([1.0, 1.0, 0.5, 0.5, 0.5, 1.5, 1.5,1.5,1.5,0.25,0.25],) self.cLowList = array([-0.5, -0.5, -0.5, 0.0, 0.0, 0.0, 0.0,0.0,0.0,-0.25,-0.25],) The last thing to do is how much simulation steps ODE should make before getting an update from the controller and sending new sensor values back, called stepsPerAction. .. _createinstance: Creating your own XODE instance ----------------------------------------- Now we want to specify a instantiation in a XODE file. If you do not know ODE very well, you can use a script that is shipped with PyBrain and can be found in ``pybrain/rl/environments/ode/tools/xodetools.py`` The first part of the file is responsible for parsing the simplified XODE code to a regular XODE file, that can be ignored. For an example, look at the Johnnie definition by searching for ``class XODEJohnnie(XODEfile)`` The instantiation of what you want to simulate in ODE is defined in this tool as a class that inherits from :class:`XODEfile`. The class consists only of a constructor. Here all parts of the simulated object are defined. The parts are defined in an global coordinate system. For examples the row :: self.insertBody('arm_left','cappedCylinder',[0.25,7.5],5,pos=[2.06,-2.89,0], euler=[90,0,0], passSet=['total'], mass=2.473) creates the left arm (identifier 'arm_left') of Johnnie as an cylinder with round endings ('cappedCylinder') with a diameter of 0.25 and a length of 7.5 ([0.25,7.5]) with a density of 5 (that will be overwritten if the optional value mass is given at the end of the command), an initial position of ``pos = [2.06,-2.89,0]``, turned by 90 degrees around the x-Axis (``euler = [90,0,0]``, all capped cylinders are by default aligned with the y-Axis) the passSet named 'total' (will be explained soon) and the optional mass of the part. "passSet" is used to define parts that can penetrate each other. That is especially necessary for parts that have a joint together, but can also be usable in other cases. All parts that are part of the same passSet can penetrate each other. Multiple passSet names can be given delimited by a ",". Types that are understood by this tool are: * cylinder * cappedCylinder * box .. - ToDo - are there more? Next we have to define the joints that connect the parts. Types of joints that are understood by this tool are: * fixed, for a stiff fixed joint. * hinge, one dimensional joint. * universal joint, experimental 2D joint. .. - ToDo, are there more? A joint between two parts is inserted in the model by insertJoint, giving the identifier of the first part, then the identifier of the second part. Next the type of joint is stated (e.g. 'hinge'). The axis around the joint will rotate is stated like ``axis={'x':1,'y':0,'z':0}`` and the anchor point in global coordinates is defined by something like ``anchor=(2.06,0.86,0)``. Add all parts and joints for your model. Finally with ``centerOn(identifier)`` the camera position is fixed to that part and with ``insertFloor(y=??)`` a floor can be added. Now go to the end of the file and state: :: name = YourClass('../models/name') name.writeXODE() and execute the file with :: python xodetools.py And you have created an instantiation of your model that can be read in in the above environment. What is missing is a default task for the new environment. In the previous "HowTo create your own learning task in an existing ODE environment" we saw how such a standard task looks for the Johnnie environment. To create our own task we have to create a file with the name of our environment in ``pybrain/rl/environments/ode/tasks/`` The new task has to import the following packages: from pybrain.rl.environments import EpisodicTask from pybrain.rl.environments.ode.sensors import * And whatever is needed from scipy and similar. The new class should inherit from EpisodicTask like in the JohnnieTask. Next we create the constructor that takes the environment with ``def __init__(self, env)``. It is important that the constructor of EpisodicTask is called. :: EpisodicTask.__init__(self, env) The following member variables are mandatory: :: self.maxPower = 100.0 #Overall maximal torque - is multiplied with relative max #torque for individual joint to get individual max torque self.reward_history = [] self.count = 0 #timestep counter self.epiLen = 500 #time steps for one episode In contrast to the ODEEnvironment standard settings some changes might be needed: * :attr:`self.env.FricMu` if you need higher or lower friction for your task, * :attr:`self.env.dt` if you need more timely resolution. Next the sensor and actuator limits must be set, usually between -1 and 1: :: # normalize standard sensors to (-1, 1) self.sensor_limits = [] #Angle sensors for i in range(self.env.actLen): # Joint velocity sensors self.sensor_limits.append((self.env.cLowList[i], self.env.cHighList[i])) for i in range(self.env.actLen): self.sensor_limits.append((-20, 20)) #Normalize all actor dimensions to (-1, 1) self.actor_limits = [(-1, 1)]*env.actLen The next method that is needed is the performAction method, the standard setting looks like that: :: def performAction(self, action): """ Filtered mapping towards performAction of the underlying environment """ EpisodicTask.performAction(self, action) If you want to control the wanted angels instead of the forces you may include this simple PD mechanism: :: #The joint angles isJoints = self.env.getSensorByName('JointSensor') #The joint angular velocities isSpeeds = self.env.getSensorByName('JointVelocitySensor') #norm output to action interval act = (action+1.0)/2.0*(self.env.cHighList-self.env.cLowList)+self.env.cLowList #simple PID action = tanh((act - isJoints - isSpeeds) * 16.0) * self.maxPower * self.env.tourqueList Now we have to define the :meth:`isFinished` method: :: def isFinished(self): """ returns true if episode timesteps has reached episode length and resets the task """ if self.count > self.epiLen: self.res() return True else: self.count += 1 return False You are certainly free to include other breaking conditions. Finally we define a :meth:`reset` method: :: def res(self): """ sets counter and history back, increases incremental counter """ self.count = 0 self.reward_history.append(self.getTotalReward()) We don't need a :meth:`getReward` function here, because the method from :class:`EpisodicTask` that returns always 0.0 is taken over. This is the default task that is used to create specific tasks. Please take a look at :ref:`existinglearning` for how to create a task that gives actual reward. If you have done all steps right you now have a new ODE environment with a corresponding task that you can test by creating an experiment. Or you can try to copy an existing example like the ``johnnie_pgpe.py`` and replace the environment and the task definition with your new environment and task.