learners – RL Components: Learners

Abstract classes

The top of the learner hierarchy is more conceptual than functional. The different classes distinguish algorithms in such a way that we can automatically determine when an algorithm is not applicable for a problem.

class pybrain.rl.learners.learner.Learner

Top-level class for all reinforcement learning algorithms. Any learning algorithm changes a policy (in some way) in order to increase the expected reward/fitness.

learn()
The main method, which invokes a single learning step.

class pybrain.rl.learners.learner.EpisodicLearner

Bases: pybrain.rl.learners.learner.Learner

Assumes the task is episodic, not life-long, and therefore does a learning step only after the end of each episode.

class pybrain.rl.learners.learner.DataSetLearner

Bases: pybrain.rl.learners.learner.EpisodicLearner

A class for learners that learn from a dataset, which has no target output but only a reinforcement signal for each sample. It requires a ReinforcementDataSet object (which provides state-action-reward tuples).

class pybrain.rl.learners.learner.ExploringLearner

Bases: pybrain.rl.learners.learner.Learner

A Learner determines how to change the adaptive parameters of a module.

class pybrain.rl.learners.directsearch.directsearch.DirectSearchLearner

Bases: pybrain.rl.learners.learner.Learner

The class of learners that (in contrast to value-based learners) searches directly in policy space.
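As a rough illustration of how such a hierarchy lets applicability be checked automatically, the sketch below uses plain stand-in classes that only mirror the names above. They are not the library's classes; `ToyEpisodicSearch` and `check_applicable` are hypothetical names introduced for this example.

```python
class Learner:
    """Stand-in for the top-level learner class (not the library class)."""

    def learn(self):
        raise NotImplementedError


class EpisodicLearner(Learner):
    """Marker class: learns only after an episode has ended."""


class DirectSearchLearner(Learner):
    """Marker class: searches directly in policy space."""


class ToyEpisodicSearch(EpisodicLearner, DirectSearchLearner):
    """Hypothetical algorithm, used only for this illustration."""

    def learn(self):
        return "one learning step"


def check_applicable(learner, task_is_episodic):
    """Reject episodic learners on life-long (non-episodic) tasks."""
    if isinstance(learner, EpisodicLearner) and not task_is_episodic:
        return False
    return True


learner = ToyEpisodicSearch()
ok_episodic = check_applicable(learner, task_is_episodic=True)
ok_lifelong = check_applicable(learner, task_is_episodic=False)
```

The classes carry almost no behaviour of their own; their role is to make properties such as "episodic" or "direct search" testable with `isinstance`.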

Value-based Learners

class pybrain.rl.learners.valuebased.valuebased.ValueBasedLearner

Bases: pybrain.rl.learners.learner.ExploringLearner, pybrain.rl.learners.learner.DataSetLearner, pybrain.rl.learners.learner.EpisodicLearner

An RL algorithm based on estimating a value-function.

batchMode
Does the algorithm run in batch mode or online?
explorer
Return the internal explorer.
module
Return the internal module.
offPolicy
Does the algorithm work on-policy or off-policy?

class pybrain.rl.learners.valuebased.Q(alpha=0.5, gamma=0.99)

Bases: pybrain.rl.learners.valuebased.valuebased.ValueBasedLearner

learn()

Learn on the current dataset, either over many timesteps or even several episodes (batchMode = True) or for a single timestep (batchMode = False). Batch mode is possible because Q-learning is an off-policy method.

In batchMode, the algorithm goes through all the samples in the history and performs an update on each of them. If batchMode is False, only the last data sample is considered. It is the user's responsibility to keep the dataset consistent with the agent’s history.
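The difference between the two modes can be sketched with a standalone tabular Q-learning update. This is a minimal illustration under assumed names (`q_update`, `learn`, `history`, and the dictionary-based table are all hypothetical), not the library's implementation.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, n_actions, alpha=0.5, gamma=0.99):
    """One tabular Q-learning update for the transition (s, a, r, s_next)."""
    best_next = max(Q[(s_next, b)] for b in range(n_actions))
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def learn(Q, history, n_actions, batch_mode=True):
    """Update on every stored transition (batch) or only the latest (online)."""
    samples = history if batch_mode else history[-1:]
    for s, a, r, s_next in samples:
        q_update(Q, s, a, r, s_next, n_actions)


# Hypothetical two-step history: state 0 -> 1 -> 2, reward only at the end.
history = [(0, 0, 0.0, 1), (1, 0, 1.0, 2)]

Q_batch = defaultdict(float)
learn(Q_batch, history, n_actions=2, batch_mode=True)
learn(Q_batch, history, n_actions=2, batch_mode=True)  # second sweep over the same data

Q_online = defaultdict(float)
learn(Q_online, history, n_actions=2, batch_mode=False)
```

Because the update target uses the maximum over next actions rather than the action actually taken, the same stored transitions can be swept repeatedly. In this sketch, the second batch sweep is what propagates the final reward back to state 0.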

class pybrain.rl.learners.valuebased.QLambda(alpha=0.5, gamma=0.99, qlambda=0.9)

Bases: pybrain.rl.learners.valuebased.valuebased.ValueBasedLearner

Q-lambda is a variation of Q-learning that uses an eligibility trace.
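To show what an eligibility trace contributes, here is a minimal sketch of one common textbook formulation (Q-learning with accumulating traces that are never reset). It is an assumption that this resembles the class above; the function name `q_lambda_episode` and the list-based table are hypothetical.

```python
def q_lambda_episode(transitions, n_states, n_actions,
                     alpha=0.5, gamma=0.99, qlambda=0.9):
    """Run tabular Q-learning with accumulating eligibility traces."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    E = [[0.0] * n_actions for _ in range(n_states)]  # eligibility traces
    for s, a, r, s_next in transitions:
        # Temporal-difference error for the current transition.
        delta = r + gamma * max(Q[s_next]) - Q[s][a]
        E[s][a] += 1.0
        # Every recently visited pair shares in the error, then its trace decays.
        for i in range(n_states):
            for j in range(n_actions):
                Q[i][j] += alpha * delta * E[i][j]
                E[i][j] *= gamma * qlambda
    return Q


# Hypothetical episode: state 0 -> 1 -> 2, reward only on the last step.
transitions = [(0, 0, 0.0, 1), (1, 0, 1.0, 2)]
Q = q_lambda_episode(transitions, n_states=3, n_actions=2)
```

With plain one-step Q-learning, a single pass over this episode would leave the value of state 0 untouched. Here the trace lets the final reward update the earlier state-action pair in the same pass, scaled by `gamma * qlambda`.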

class pybrain.rl.learners.valuebased.SARSA(alpha=0.5, gamma=0.99)

Bases: pybrain.rl.learners.valuebased.valuebased.ValueBasedLearner

State-Action-Reward-State-Action (SARSA) algorithm.

In batchMode, the algorithm goes through all the samples in the history and performs an update on each of them. If batchMode is False, only the last data sample is considered. It is the user's responsibility to keep the dataset consistent with the agent’s history.
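For comparison with Q-learning, the standard tabular SARSA update can be sketched as follows. The function name `sarsa_update` and the example table are hypothetical and not taken from the library.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.99):
    """One SARSA update: bootstrap from the action actually taken next."""
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])


# Hypothetical table with two states and two actions.
Q = [[0.0, 0.0], [0.2, 1.0]]

# The agent took action 0 in state 1, although action 1 has the higher value.
sarsa_update(Q, s=0, a=0, r=0.0, s_next=1, a_next=0)
sarsa_value = Q[0][0]

# What a Q-learning update would have produced from the same starting table.
q_learning_value = 0.5 * (0.0 + 0.99 * max(Q[1]))
```

The only difference from Q-learning is the bootstrap term: SARSA uses the value of the next action that was executed, which makes it on-policy, whereas Q-learning uses the maximum over next actions.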

class pybrain.rl.learners.valuebased.NFQ

Bases: pybrain.rl.learners.valuebased.valuebased.ValueBasedLearner

Neuro-fitted Q-learning

Direct-search Learners

class pybrain.rl.learners.directsearch.policygradient.PolicyGradientLearner

Bases: pybrain.rl.learners.directsearch.directsearch.DirectSearchLearner, pybrain.rl.learners.learner.DataSetLearner, pybrain.rl.learners.learner.ExploringLearner

PolicyGradientLearner is a superclass for all continuous direct-search algorithms that use the log likelihood of the executed action to update the weights. Subclasses include ENAC, GPOMDP, and REINFORCE.

learn()
Calls the gradient calculation function and executes a step in the direction of the gradient, scaled by a small learning rate alpha.

class pybrain.rl.learners.directsearch.reinforce.Reinforce

Bases: pybrain.rl.learners.directsearch.policygradient.PolicyGradientLearner

Reinforce is a gradient estimator technique by Williams (see “Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning”). It uses optimal baselines and calculates the gradient with the log likelihoods of the taken actions.
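A minimal sketch of a likelihood-ratio gradient estimate with a variance-minimising baseline is given below. It assumes a one-parameter Gaussian policy and a made-up reward function; `reinforce_gradient` and all of its parameters are hypothetical and do not reflect the library's interface.

```python
import random


def reinforce_gradient(mu, n_episodes=2000, sigma=1.0, seed=0):
    """Estimate d E[reward] / d mu for a Gaussian policy N(mu, sigma)."""
    rng = random.Random(seed)
    scores, returns = [], []
    for _ in range(n_episodes):
        action = rng.gauss(mu, sigma)
        # Derivative of the log likelihood of the action with respect to mu.
        scores.append((action - mu) / sigma ** 2)
        # Hypothetical one-step task: reward is highest at action = 3.
        returns.append(-(action - 3.0) ** 2)
    # Baseline that minimises the variance of this single-parameter estimate.
    squared = [g * g for g in scores]
    baseline = sum(q * r for q, r in zip(squared, returns)) / sum(squared)
    return sum(g * (r - baseline) for g, r in zip(scores, returns)) / n_episodes


alpha = 0.01  # small learning rate
mu = 0.0
grad = reinforce_gradient(mu)
mu_new = mu + alpha * grad  # one gradient-ascent step
```

For this toy task the exact gradient at `mu = 0` is 6, so the estimate should be positive and the step should move `mu` toward 3. Subtracting the baseline leaves the expected gradient essentially unchanged and only reduces its variance.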

class pybrain.rl.learners.directsearch.enac.ENAC

Bases: pybrain.rl.learners.directsearch.policygradient.PolicyGradientLearner

Episodic Natural Actor-Critic. See J. Peters “Natural Actor-Critic”, 2005. Estimates the natural gradient by regressing rewards on the log likelihoods.
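The regression idea can be sketched for a single policy parameter: fit the episode returns as a linear function of the log-likelihood derivative plus a constant, and read the slope as the natural-gradient estimate. This is a simplified illustration with the same assumed Gaussian policy and made-up reward as a toy bandit; `enac_gradient` is a hypothetical name, not the library's code.

```python
import random


def enac_gradient(mu, n_episodes=2000, sigma=1.0, seed=0):
    """Least-squares fit of: return ~ w * score + b. Returns (w, b)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_episodes):
        action = rng.gauss(mu, sigma)
        xs.append((action - mu) / sigma ** 2)  # d log pi(action) / d mu
        ys.append(-(action - 3.0) ** 2)        # hypothetical reward, best at 3
    n = float(n_episodes)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x       # slope: natural-gradient estimate
    b = mean_y - w * mean_x  # intercept: plays the role of a baseline
    return w, b


w, b = enac_gradient(mu=0.0)
```

With `sigma = 1` the Fisher information of `mu` is 1, so the natural gradient coincides with the ordinary gradient (6 at `mu = 0`), and the intercept approximates the expected reward (about -10).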

Note

Black-box optimization algorithms can also be seen as direct-search RL algorithms, but are not included here.
