explorers – RL Components: Explorers

class pybrain.rl.explorers.explorer.Explorer(indim, outdim, name=None, **args)

An Explorer object is used by Agents: it receives the current state and the action proposed by the controller Module, and returns an explorative action that is executed instead of the given action.

activate(state, action)
The superclass ignores the state and simply passes the action through the module. Implement _forwardImplementation() in subclasses.
newEpisode()
Inform the explorer about the start of a new episode.
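
To make the flow concrete, here is a minimal, hypothetical sketch of where an explorer sits in an agent's action-selection loop. The names controller, env_reset, and env_step are placeholders, not PyBrain API; only activate() and newEpisode() come from the Explorer interface above::

    def get_action(controller, explorer, state):
        action = controller.activate(state)      # deterministic proposal
        return explorer.activate(state, action)  # explorative replacement

    def run_episode(env_reset, env_step, controller, explorer, max_steps=100):
        # Placeholder episode loop: env_reset/env_step stand in for an
        # arbitrary environment; they are not PyBrain functions.
        explorer.newEpisode()                     # re-randomize exploration
        state = env_reset()
        for _ in range(max_steps):
            action = get_action(controller, explorer, state)
            state, reward, done = env_step(action)
            if done:
                break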

Continuous Explorers

class pybrain.rl.explorers.continuous.NormalExplorer(dim, sigma=0.0)
A continuous explorer that perturbs the resulting action with additive, normally distributed random noise. The exploration has parameter(s) sigma, which are related to the distribution's standard deviation. In order to allow for negative values of sigma, the actual standard deviation is a transformation of sigma according to the expln() function (see pybrain.tools.functions).
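
To illustrate the transformation, here is a conceptual sketch of additive Gaussian exploration with an expln-style rescaling (not PyBrain's implementation; see pybrain.tools.functions for the exact expln definition). The sketch assumes expln is exponential below zero and a shifted logarithm above, which keeps it continuous and strictly positive::

    import numpy as np

    def expln(x):
        # expln-style squashing: exp(x) for x < 0, log(x + 1) + 1 otherwise;
        # continuous at 0 and strictly positive everywhere.
        x = np.asarray(x, dtype=float)
        neg = np.minimum(x, 0.0)
        pos = np.maximum(x, 0.0)
        return np.where(x < 0, np.exp(neg), np.log(pos + 1.0) + 1.0)

    def normal_explore(action, sigma, rng=np.random):
        # Perturb the proposed action with zero-mean Gaussian noise whose
        # standard deviation is expln(sigma), so sigma itself may be negative.
        return np.asarray(action) + rng.normal(0.0, expln(sigma))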
class pybrain.rl.explorers.continuous.sde.StateDependentExplorer(statedim, actiondim, sigma=-2.0)

A continuous explorer that perturbs the resulting action with additive, normally distributed random noise. The exploration has parameter(s) sigma, which are related to the distribution's standard deviation. In order to allow for negative values of sigma, the actual standard deviation is a transformation of sigma according to the expln() function (see pybrain.tools.functions).

activate(state, action)
The superclass ignores the state and simply passes the action through the module. Implement _forwardImplementation() in subclasses.
newEpisode()
Randomize the matrix values for exploration during one episode.
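
The defining property of state-dependent exploration is that the noise is a fixed function of the state within one episode. A rough conceptual sketch (assumed names, reusing the expln sketch above; not PyBrain's code)::

    import numpy as np

    class SDESketch:
        # Conceptual sketch of state-dependent exploration (SDE).
        def __init__(self, statedim, actiondim, sigma=-2.0):
            self.sigma = np.full((statedim, actiondim), float(sigma))
            self.newEpisode()

        def newEpisode(self):
            # Draw the exploration matrix once; it stays fixed for the whole
            # episode, so the same state always gets the same perturbation.
            self.eps = np.random.normal(0.0, expln(self.sigma))

        def activate(self, state, action):
            return np.asarray(action) + np.asarray(state) @ self.eps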

Discrete Explorers

class pybrain.rl.explorers.discrete.discrete.DiscreteExplorer

Bases: pybrain.rl.explorers.explorer.Explorer

Discrete explorers choose one of the available actions from the action set. In order to know which actions are available and which action to choose, discrete explorers need access to the module (which has to be of class ActionValueTable).

_setModule(module)
Tells the explorer which module to use (it has to be an ActionValueTable).
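
For illustration, a hedged usage sketch. It assumes ActionValueTable from pybrain.rl.learners.valuebased and that assigning explorer.module delegates to _setModule(); check the library source if in doubt::

    from scipy import array

    from pybrain.rl.learners.valuebased import ActionValueTable
    from pybrain.rl.explorers.discrete import EpsilonGreedyExplorer

    table = ActionValueTable(4, 2)    # 4 states, 2 actions
    table.initialize(0.0)

    explorer = EpsilonGreedyExplorer(epsilon=0.3)
    explorer.module = table           # assumed to delegate to _setModule(table)

    state = array([0])
    greedy = table.activate(state)                # greedy action for this state
    action = explorer.activate(state, greedy)     # sometimes random instead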
class pybrain.rl.explorers.discrete.EpsilonGreedyExplorer(epsilon=0.3, decay=0.9999)

Bases: pybrain.rl.explorers.discrete.discrete.DiscreteExplorer

A discrete explorer that executes the original policy in most cases, but sometimes returns a random action (drawn uniformly) instead. The randomness is controlled by a parameter 0 <= epsilon <= 1. The closer epsilon gets to 0, the more greedy (and less explorative) the agent behaves.

_forwardImplementation(inbuf, outbuf)
Draws a random number between 0 and 1. If the number is less than epsilon, a random action is chosen. If it is greater than or equal to epsilon, the greedy action is returned.
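
In sketch form, the selection rule looks like this (a standalone illustration, not the library's code)::

    from random import random, randrange

    def epsilon_greedy(greedy_action, num_actions, epsilon):
        # With probability epsilon: a uniformly random action; otherwise greedy.
        if random() < epsilon:
            return randrange(num_actions)
        return greedy_action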
class pybrain.rl.explorers.discrete.BoltzmannExplorer(tau=2.0, decay=0.9995)

Bases: pybrain.rl.explorers.discrete.discrete.DiscreteExplorer

A discrete explorer that executes actions with probabilities that depend on their action values. The Boltzmann explorer has a parameter tau (the temperature). For high tau, the actions are nearly equiprobable; for tau close to 0, action selection becomes greedy.

activate(state, action)
The superclass ignores the state and simply passes the action through the module. Implement _forwardImplementation() in subclasses.
_forwardImplementation(inbuf, outbuf)
Draws an action stochastically according to the Boltzmann distribution over the current action values, controlled by the temperature tau.
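
A standalone sketch of Boltzmann (softmax) action selection at temperature tau (not PyBrain's implementation)::

    import numpy as np

    def boltzmann_action(values, tau, rng=np.random):
        # Softmax over action values: high tau flattens the distribution
        # toward uniform; tau close to 0 approaches the greedy choice.
        prefs = np.asarray(values, dtype=float) / tau
        prefs -= prefs.max()                  # for numerical stability
        probs = np.exp(prefs)
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)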
class pybrain.rl.explorers.discrete.DiscreteStateDependentExplorer(epsilon=0.2, decay=0.9998)

Bases: pybrain.rl.explorers.discrete.discrete.DiscreteExplorer

A discrete explorer that directly manipulates the ActionValue estimator (table or network) and keeps the changes fixed for one full episode (if episodic), or slowly changes them over time.

TODO: currently only implemented for episodes

activate(state, action)
Save the current state for state-dependent exploration.
_forwardImplementation(inbuf, outbuf)
Activate the copied module instead of the original and feed it with the current state.
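
Conceptually, the per-episode perturbation can be pictured with this hypothetical helper (params as a flat parameter array is a PyBrain convention, but the helper itself and the _setParameters call are assumptions, not this class's actual code)::

    import copy
    import numpy as np

    def perturbed_copy(module, sigma=0.2):
        # Perturb a deep copy of the value estimator once per episode and
        # act greedily with respect to the copy for the whole episode.
        explore_module = copy.deepcopy(module)
        noise = np.random.normal(0.0, sigma, explore_module.params.shape)
        explore_module._setParameters(explore_module.params + noise)
        return explore_module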
