SmallGP - a lean Genetic Programming System for Symbolic Regression
Author: Hannes Planatscher
Java SmallGP Applet
The functionality of SmallGP is implemented as a Java Applet: Click here to start!
Classic Command-line SmallGP
Introduction
SmallGP is a implementation of Symbolic Regression
wich is compact by means of lines of code and memory usage, developed as an entry
for the "Tiny GP"-Coding-Contest at the Genetic and Evolutionary
Computation Conference 2004. It did not win however. But I still think
that it contains some ideas that may be useful to someone interested or new in the area.
SmallGP is free software released under the GNU GPL.
Download
tar.gz: smallgp-1.0.tar.gz
zip: smallgp-1.0.zip
Installation Instructions
gcc and make are needed to compile SmallGP. It has succesfully been
compiled under Linux, FreeBSD, Solaris and Win32. I recommend the Cygwin-Environment
for Win32. It should work for all Win32-C-Compilers too however.
Unpack smallgp-1.0.tar.gz:
> tar xvfz smallgp-1.0.tar.gz
Move to the created directory:
> cd smallgp-1.0
Run Make:
> make
Some compiler warnings may occur, but after compilation you should find
a working version of smallgp in the same directory.
Usage
SmallGP is a commandline-tool. All important parameters can be changed easily.
Display the help with
> smallgp -h
to learn which parameters are avaiable.
Most of the parameters and their default values are explained in this specs.
Example parameters for a test run:
dataset: test.data
population size: 10000
tournament size: 100
maximum generations: 80
maximum prog. length: 50
maximum depth: 7
#constants: 200
Pcross: 0.7
Pmut/Node: 0.1
To start smallgp with these parameters just run
> smallgp -i test.data -p 10000 -t 100 -g 80 -l 50 -d 7 -r 200 -c 0.7 -m 0.1
.
Warning: Parameters are not checked for validity (ex. negative
probabilities etc.).
Detailed documentation
Also available as plain text: smallgpdoc.txt .
_____ ___ ___ ___ _ _ _____ _____
/ ___/ / |/ | / | | | | | / ___| | _ \
| |___ / /| /| | / /| | | | | | | | | |_| |
\___ \ / / |__/ | | / / | | | | | | | | _ | ___/
___| | / / | | / / | | | |___ | |___ | |_| | | |
/_____/ /_/ |_| /_/ |_| |_____| |_____| \_____/ |_| 1.0
a compact implementation of Symbolic Regression
----------------------------------------------------------------
Content:
I) Introduction
II) Installation/Compilation
III) Usage
IV) Implementation Details
1) Representation of individuals
2) Crossover
3) Mutation
4) Strategy
5) Initialization
V) References
VI) License
I) Introduction
SmallGP is a implementation of Symbolic Regression [1] wich is compact by
means of lines of code and memory usage, developed as an entry
for the "Tiny GP"-Coding-Contest [2] at the Genetic and Evolutionary
Computation Conference 2004.
II) Installation Instructions
tar, gcc and make are needed to compile SmallGP. It has succesfully been
compiled under Linux, FreeBSD, Solaris and Win32 (with djgpp).
Unpack smallgp-1.0.tar.gz:
> tar xvfz smallgp-1.0.tar.gz
Move to the created directory:
> cd smallgp-1.0
Run Make:
> make
Some compiler warnings may occur, but after compilation you should find
a working version of smallgp in the same directory.
III) Usage
SmallGP is a commandline-tool. All important parameters can be changed easily.
Display the help with
> smallgp -h
to learn which parameters are avaiable.
Most of the parameters and their default values are explained in [2].
Example parameters for a test run:
dataset: test.data
population size: 10000
tournament size: 100
maximum generations: 80
maximum prog. length: 50
maximum depth: 7
#constants: 200
Pcross: 0.7
Pmut/Node: 0.1
To start smallgp with these parameters just run
> smallgp -i test.data -p 10000 -t 100 -g 80 -l 50 -d 7 -r 200 -c 0.7 -m 0.1
.
Warning: Parameters are not checked for validity (ex. negative
probabilities etc.).
IV) Implementation Details
1) Representation of individuals
SmallGP uses the concept of genotype-phenotype-duality.
Consider the following tree:
(+)
/ \
x1 (+)
/ \
x2 c2
This can be written in polish notation [3]:
+ x1 + x2 c2
Due limited amount of different symbols it is possible to store this
expression in a vector of integer values.
(+)
/ \
x1 (+) => + x1 + x2 c2 => (2,5,2,6,16)
/ \
x2 c2
The integer vector represents the genotype, while the phenotype
is a standard binary-tree, that consists of nodes containing
information which are holding pointers to their children and
their predecessor.
struct node {
struct node *right;
struct node *left;
struct node **bp;
unsigned char no;
};
node->no specifies the primitive function/terminal represented by this node.
It can easily be deducted that the maximum number of different primitves and
terminals is limited to 255 with using an 8-bit datatype.
This phenotypic representation enables easy implementation of crossover and
evaluation.
With 104 bits memory needed per node it isnt really memory efficient,
as every node contains just 8 bit of information, so we use the genotype
for storage.
The phenotype is easily converted to the genotype: a vector of 8-bit values.
The genotype is far more memory efficient than the phenotypic representation,
but the implementation of subtree crossover and evaluation is less trivial.
SmallGP uses the phenotype-genotype-duality to save memory and to allow
easy manipulation. While for evaluation and crossover individuals are
temporarily expanded to their phenotype, point mutation can be
performed on the genotype.
2) Crossover
SmallGP implements a size-safe kozastyle subtree-crossover[4]. Subtrees of
appropriate size are selected from two individuals, and then exchanged. It is
guaranteed that the size constraint is met by both individuals after the
operation.
Suppose that size(tree1) < size(tree2).
tree1 can be extended by (MAXLEN - size(tree1)) nodes. The maximum size
of a subtree of tree2 is size(tree2) so we fix a maximum size for the
subtree selected from tree2 by a positive random integer value
smaller than
((MAX_LEN - size(tree1)) modulo size(tree2)) = maxsize2
After the selection of a subtree subtree_2 with a maximal size
maxsize2 from tree2, we can determine maxsize1. tree2 can be extended
by (MAXLEN - size(tree2) + size(subtree2)).
As the maximum size of a subtree from tree1 is size(tree1) we choose
a positive random integer value not bigger than
((MAX_LEN - size(tree2) + size(subtree2)) modulo size(tree1)) = maxsize1
.
3) Mutation
SmallGP implements point-mutation, that simply substitutes a terminal with
another terminal, and a primitive with another primitive [5].
4) Strategy
SmallGP uses a generational strategy, with tournament selection.
5) Initialization
The tree in the initial population are initialized with a size-safe
"Grow"-Method. The probabilty of choosing a terminal at a certain depth
is obtained as follows:
|primitives| + |variables| + 1
P_term = ------------------------------
|variables| + 1
Constants are considered as one virtual terminal symbol.
Due to the "Grow"-Method most of the individuals are very small in
the initial population.
This, in combination with the previously defined point mutation, can
cause lack of structural innovation in some cases.
So an additional parameter P_term_really is introduced that allows
to scale between "Grow" and "Full" initialization.
A terminal is set with the probability:
P_term' = P_term * P_term_really
. With P_term_really = 1 the method equals "Grow", with P_term_really = 0
"Full". The defaultvalue of Pterm is 1.
V) References
[1] Genetic Programming, Symbolic Regression
http://www.genetic-programming.org/
[2] Tiny GP specification:
http://cswww.essex.ac.uk/staff/sml/gecco/TinyGP.html
[3] Polish Notation
http://www.cenius.net/refer/display.php?ArticleID=polishnotation
[4] Subtree Crossover
http://www.geneticprogramming.com/Tutorial/#anchor179890
[5] Genetic Programming with One-Point Crossover and Point-Mutation
(Poli, Langdon)
http://cswww.essex.ac.uk/staff/poli/papers/Poli-WSC2-1997.pdf
VI) License
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 US