The top of the learner hierarchy is more conceptual than functional. The classes at this level categorize algorithms by their requirements, so that it can be determined automatically when an algorithm is not applicable to a given problem.
class pybrain.rl.learners.learner.Learner
Top-level class for all reinforcement learning algorithms. Any learning algorithm changes a policy (in some way) in order to increase the expected reward/fitness.
class pybrain.rl.learners.learner.EpisodicLearner
Bases: pybrain.rl.learners.learner.Learner
Assumes the task is episodic, not life-long, and therefore does a learning step only after the end of each episode.
class pybrain.rl.learners.learner.DataSetLearner
Bases: pybrain.rl.learners.learner.EpisodicLearner
A class for learners that learn from a dataset, which has no target output but only a reinforcement signal for each sample. It requires a ReinforcementDataSet object (which provides state-action-reward tuples).
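For illustration, a minimal sketch of assembling such a dataset by hand; in practice a LearningAgent maintains this history automatically, and the ReinforcementDataSet import path is assumed to match recent PyBrain versions:

    from pybrain.datasets import ReinforcementDataSet

    # State and action are both one-dimensional here.
    ds = ReinforcementDataSet(1, 1)

    # Each sample is a (state, action, reward) triple.
    ds.addSample([0], [1], -1.0)
    ds.addSample([1], [0], 1.0)

    # Episodes are stored as separate sequences.
    ds.newSequence()
    ds.addSample([0], [0], 0.0)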
class pybrain.rl.learners.learner.ExploringLearner
Bases: pybrain.rl.learners.learner.Learner
A Learner that determines how to change the adaptive parameters of a module and, in addition, carries an explorer that perturbs the chosen actions during training (e.g. epsilon-greedy exploration for discrete action spaces).
class pybrain.rl.learners.directsearch.directsearch.DirectSearchLearner
Bases: pybrain.rl.learners.learner.Learner
The class of learners that (in contrast to value-based learners) search directly in policy space.
class pybrain.rl.learners.valuebased.valuebased.ValueBasedLearner
Bases: pybrain.rl.learners.learner.ExploringLearner, pybrain.rl.learners.learner.DataSetLearner, pybrain.rl.learners.learner.EpisodicLearner
An RL algorithm based on estimating a value-function.
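For example, a value-based learner is wired together from an action-value module, a learner, and an agent. A minimal sketch (the 81-state, 4-action table size is arbitrary):

    from pybrain.rl.learners.valuebased import ActionValueTable
    from pybrain.rl.learners import Q
    from pybrain.rl.agents import LearningAgent

    # Q-values for 81 discrete states and 4 actions, initialised to zero.
    controller = ActionValueTable(81, 4)
    controller.initialize(0.0)

    # The agent records state-action-reward history; the learner consumes it.
    agent = LearningAgent(controller, Q())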
class pybrain.rl.learners.valuebased.q.Q
Bases: pybrain.rl.learners.valuebased.valuebased.ValueBasedLearner
learn()
Learn on the current dataset, either for many timesteps and even episodes (batchMode = True) or for a single timestep (batchMode = False). Batch mode is possible because Q-learning is an off-policy method.
In batchMode, the algorithm goes through all the samples in the history and performs an update on each of them. If batchMode is False, only the last data sample is considered. It is the user's responsibility to keep the dataset consistent with the agent's history.
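A complete training loop in the style of the standard PyBrain maze tutorial; the maze layout, table size, and the Q(alpha, gamma) constructor arguments are illustrative assumptions:

    from numpy import array
    from pybrain.rl.environments.mazes import Maze, MDPMazeTask
    from pybrain.rl.learners.valuebased import ActionValueTable
    from pybrain.rl.learners import Q
    from pybrain.rl.agents import LearningAgent
    from pybrain.rl.experiments import Experiment

    # A 5x5 grid world: 1 = wall, 0 = free; the goal is at (3, 3).
    structure = array([[1, 1, 1, 1, 1],
                       [1, 0, 0, 0, 1],
                       [1, 0, 1, 0, 1],
                       [1, 0, 0, 0, 1],
                       [1, 1, 1, 1, 1]])
    environment = Maze(structure, (3, 3))
    task = MDPMazeTask(environment)

    controller = ActionValueTable(25, 4)   # 25 grid cells, 4 move directions
    controller.initialize(0.0)
    agent = LearningAgent(controller, Q(0.5, 0.99))

    experiment = Experiment(task, agent)
    for _ in range(1000):
        experiment.doInteractions(100)  # collect 100 samples into the history
        agent.learn()                   # batch replay of the history (off-policy)
        agent.reset()                   # clear the history for the next round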
class pybrain.rl.learners.valuebased.qlambda.QLambda
Bases: pybrain.rl.learners.valuebased.valuebased.ValueBasedLearner
Q-lambda is a variation of Q-learning that uses an eligibility trace, so that a reward also updates the values of the state-action pairs that preceded it within an episode.
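Q-lambda can replace Q in the training loop above. A sketch, assuming a QLambda(alpha, gamma, qlambda) constructor with the trace-decay factor as the last argument:

    from pybrain.rl.learners import QLambda

    # alpha = learning rate, gamma = discount, qlambda = trace decay (assumed).
    learner = QLambda(0.5, 0.99, 0.9)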
class pybrain.rl.learners.valuebased.sarsa.SARSA
Bases: pybrain.rl.learners.valuebased.valuebased.ValueBasedLearner
State-Action-Reward-State-Action (SARSA) algorithm.
learn()
In batchMode, the algorithm goes through all the samples in the history and performs an update on each of them. If batchMode is False, only the last data sample is considered. It is the user's responsibility to keep the dataset consistent with the agent's history. Note that, unlike Q-learning, SARSA is on-policy: each update uses the action that was actually taken in the successor state.
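As with Q, a sketch of swapping the learner into the training loop above (the SARSA(alpha, gamma) constructor signature is an assumption):

    from pybrain.rl.learners import SARSA

    # alpha = learning rate, gamma = discount factor (assumed signature).
    learner = SARSA(0.5, 0.99)
    # SARSA is on-policy: learn() should see history generated by the current
    # policy, so learn and reset after every interaction batch.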
class pybrain.rl.learners.valuebased.nfq.NFQ
Bases: pybrain.rl.learners.valuebased.valuebased.ValueBasedLearner
Neuro-fitted Q-learning: a batch variant of Q-learning in which the Q-function is represented by a neural network that is re-trained on the whole stored transition history (cf. Riedmiller's Neural Fitted Q Iteration, 2005).
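Since the Q-function is a network rather than a table, NFQ is paired with ActionValueNetwork instead of ActionValueTable. A minimal sketch (the state dimension of 3 and the 2 actions are arbitrary):

    from pybrain.rl.learners.valuebased import ActionValueNetwork
    from pybrain.rl.learners import NFQ
    from pybrain.rl.agents import LearningAgent

    # A network estimating Q(s, a) for a 3-dimensional continuous state
    # and 2 discrete actions.
    controller = ActionValueNetwork(3, 2)
    agent = LearningAgent(controller, NFQ())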
class pybrain.rl.learners.directsearch.policygradient.PolicyGradientLearner
Bases: pybrain.rl.learners.directsearch.directsearch.DirectSearchLearner, pybrain.rl.learners.learner.DataSetLearner, pybrain.rl.learners.learner.ExploringLearner
PolicyGradientLearner is the superclass of all continuous direct-search algorithms that use the log-likelihood of the executed action to update the weights. Subclasses include ENAC, GPOMDP, and REINFORCE.
class pybrain.rl.learners.directsearch.reinforce.Reinforce
Bases: pybrain.rl.learners.directsearch.policygradient.PolicyGradientLearner
Reinforce is the gradient estimator technique of Williams (see “Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning”, 1992). It uses optimal baselines and calculates the gradient from the log-likelihoods of the taken actions.
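To make the estimator concrete, here is a self-contained numpy sketch of the REINFORCE gradient for a one-parameter Gaussian policy on a toy one-step task; this is the textbook estimator with a simple mean baseline, not PyBrain's implementation (which uses the optimal baseline):

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 0.0, 1.0               # Gaussian policy: a ~ N(mu, sigma^2)

    def episode_return(a):
        return -(a - 2.0) ** 2         # toy task: best actions are near 2.0

    for _ in range(2000):
        actions = mu + sigma * rng.standard_normal(32)   # 32 one-step episodes
        returns = episode_return(actions)
        baseline = returns.mean()                        # variance reduction
        # grad of log N(a; mu, sigma) w.r.t. mu is (a - mu) / sigma^2
        grad_mu = np.mean((actions - mu) / sigma ** 2 * (returns - baseline))
        mu += 0.05 * grad_mu                             # gradient ascent

    print(mu)  # close to the optimum 2.0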
class pybrain.rl.learners.directsearch.enac.ENAC
Bases: pybrain.rl.learners.directsearch.policygradient.PolicyGradientLearner
Episodic Natural Actor-Critic. See J. Peters, “Natural Actor-Critic” (2005). Estimates the natural gradient by regressing the log-likelihoods of the taken actions onto the rewards.
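A usage sketch in PyBrain's episodic style, using the cart-pole balancing task as an example environment; the episode counts and the 200-step limit are illustrative, and import paths are assumed to match recent PyBrain versions:

    from pybrain.tools.shortcuts import buildNetwork
    from pybrain.rl.environments.cartpole import CartPoleEnvironment, BalanceTask
    from pybrain.rl.agents import LearningAgent
    from pybrain.rl.learners import ENAC
    from pybrain.rl.experiments import EpisodicExperiment

    environment = CartPoleEnvironment()
    task = BalanceTask(environment, 200)           # episodes of up to 200 steps
    # A linear policy: task observations in, one continuous action out.
    net = buildNetwork(task.outdim, task.indim, bias=False)
    agent = LearningAgent(net, ENAC())
    experiment = EpisodicExperiment(task, agent)

    for _ in range(100):
        experiment.doEpisodes(10)   # roll out 10 episodes with the current policy
        agent.learn()               # one natural-gradient update
        agent.reset()               # drop the now off-policy episodes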
Note
Black-box optimization algorithms can also be seen as direct-search RL algorithms, but are not included here.