Building a DataSet

In order for our networks to learn anything, we need a dataset that contains inputs and targets. PyBrain provides the pybrain.datasets package for this, and we will use its SupervisedDataSet class.

A customized DataSet

The SupervisedDataSet class is used for standard supervised learning. It supports input and target values, whose size we have to specify on object creation:

>>> from pybrain.datasets import SupervisedDataSet
>>> ds = SupervisedDataSet(2, 1)

Here we have created a dataset that supports two-dimensional inputs and one-dimensional targets.

Adding samples

A classic example for neural network training is the XOR function, so let’s build a dataset for it. We do this by adding samples to the dataset one at a time, each as an input tuple followed by a target tuple:

>>> ds.addSample((0, 0), (0,))
>>> ds.addSample((0, 1), (1,))
>>> ds.addSample((1, 0), (1,))
>>> ds.addSample((1, 1), (0,))
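
For larger truth tables, typing each sample by hand gets tedious. The following is a minimal sketch of generating the same four XOR samples programmatically; the helper name xor_samples is hypothetical and not part of PyBrain, and the commented-out loop assumes the ds object created above.

```python
from itertools import product


def xor_samples():
    """Return (input, target) pairs for the two-input XOR truth table."""
    # product((0, 1), repeat=2) yields (0, 0), (0, 1), (1, 0), (1, 1) in order.
    return [((a, b), (a ^ b,)) for a, b in product((0, 1), repeat=2)]


samples = xor_samples()

# Assuming ds = SupervisedDataSet(2, 1) as above, the pairs could be added with:
# for inpt, target in samples:
#     ds.addSample(inpt, target)
```

Each pair has the same shape as the arguments passed to addSample above: a two-element input tuple and a one-element target tuple.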

Examining the dataset

We now have a dataset that has 4 samples in it. We can check that with Python’s idiomatic way of checking the size of something:

>>> len(ds)
4

We can also iterate over it in the standard way:

>>> for inpt, target in ds:
...   print inpt, target
...
[ 0.  0.] [ 0.]
[ 0.  1.] [ 1.]
[ 1.  0.] [ 1.]
[ 1.  1.] [ 0.]

We can access the input and target fields directly as arrays:

>>> ds['input']
array([[ 0.,  0.],
       [ 0.,  1.],
       [ 1.,  0.],
       [ 1.,  1.]])
>>> ds['target']
array([[ 0.],
       [ 1.],
       [ 1.],
       [ 0.]])

It is also possible to clear a dataset, which deletes all the values from it while keeping the input and target dimensions:

>>> ds.clear()
>>> ds['input']
array([], shape=(0, 2), dtype=float64)
>>> ds['target']
array([], shape=(0, 1), dtype=float64)
