Grammar

This is the difficult part, as you have to be very precise and thoughtful with what you write. We’ll step through the base grammar required to evolve our max function, but grammar extensions that include additional logic, list functions, and dictionary functions are available at https://github.com/Padam-0/NC-GA.

Once again, PonyGE2 provides grammar examples in the PonyGE2/grammars directory. Create yours in there (it can be called anything, but give it the extension .pybnf ).

The first line contains the starting point for the program:

<fc> ::= <deff>{::}<callf>

In English, this says:

<fc> is the starting symbol, which contains one option: <deff>, followed by a newline ( {::} represents a newline, {: opens an indent (a tab or 4 spaces), and :} closes it), followed by <callf> .
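To make those markers concrete, here is a rough sketch (not PonyGE2's actual implementation) of how a production string using {: , :} and {::} could be rendered into indented Python text:

```python
import re

def render(s, indent="    "):
    """Rough sketch only: turn {::} into newlines and {: / :} into
    indent open/close, as the markers are described above.
    This is NOT PonyGE2's real renderer."""
    out, depth, text = [], 0, ""
    for tok in re.split(r"(\{::\}|\{:|:\})", s):
        if tok == "{::}":          # newline at the current depth
            out.append(indent * depth + text)
            text = ""
        elif tok == "{:":          # open an indent on a new line
            out.append(indent * depth + text)
            text = ""
            depth += 1
        elif tok == ":}":          # close the indent
            if text:
                out.append(indent * depth + text)
                text = ""
            depth -= 1
        else:
            text += tok
    if text:
        out.append(indent * depth + text)
    return "\n".join(out)

print(render("def fun(li):{:m = 0{::}<code>{::}return m:}"))
```

Run on the <deff> production, this prints the familiar four-line function skeleton, with <code> left as a placeholder to be expanded.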

On the next line, we define <deff> :

<deff> ::= def fun(li):{:m = 0{::}<code>{::}return m:}

This looks a bit more like Python. We can see we have a function definition of a function called fun() with one argument li .

Next we open an indent (like Python requires), initialise our variable m=0 , start a new line, write some <code> , start a new line, return m , then close the indent.

Again, there is only one option for <deff> , so this will definitely be written. Next, <callf> :

<callf> ::= return_val = fun(test_list)

Again, this looks more like Python. And only one option, so regardless of the chromosome, our algorithm will always look something like this:

def fun(li):
    m = 0
    <code>
    return m

return_val = fun(test_list)

The <code> section is where the evolution gets done, but in most cases we’ll now have a working function. Cool hey! It gets cooler.

Remember in our fitness function, we gave our dictionary d a key called test_list ? Well, because we call exec(p, d) , the program can see the contents of that dictionary d , so when it sees test_list in the algorithm, it looks it up in d and finds the value we stored there: in our case, the random list we want to find the maximum of! Cool huh? Now we have a way of passing data from our fitness function into our evolved function.

And remember in our fitness function, we set guess = d['return_val'] ? Well, it turns out we can not only read from d , we can write to it too: the value returned from fun(test_list) is stored in d under return_val , so the fitness function can access the returned value of the evolved algorithm and calculate its fitness!

So we can now send data in both directions. I recommend re-reading the section on the fitness function to see if it all makes a bit more sense now. It definitely does for me.
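To make the two-way traffic concrete, here is a minimal sketch of the exec trick, using a hand-written stand-in for an evolved program string (the names test_list and return_val match the grammar above, and the fitness line mirrors the fitness-function section):

```python
# A hand-written stand-in for an evolved program string p.
p = """
def fun(li):
    m = 0
    for i in li:
        if i > m:
            m = i
    return m

return_val = fun(test_list)
"""

# Data flows IN through the dictionary: the program can see test_list.
d = {'test_list': [3, 41, 7]}
exec(p, d)

# Data flows OUT the same way: fun's result was written to d['return_val'].
guess = d['return_val']
print(guess)                                  # 41
fitness = abs(guess - max(d['test_list']))    # 0 for a perfect program
```

Passing d as the globals of exec is what makes both directions work: the evolved program reads test_list from it and writes return_val back into it.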

As for the rest of our grammar:

<code> ::= <stmt> | <stmt>{::}<code>

Code is either a single statement, or a statement followed by some more code on a new line. This way we can give ourselves multi-line functions! This recursive nature of the grammar is very useful, and will come up quite a lot to enable us to add complexity to the evolution.

<stmt> ::= <var> = <expr> | <for> | <if>

A statement is then either assigning an expression to a variable, a for loop, or an if statement. Let’s look at these in reverse order:

<if> ::= if <cond>:<if-opt>

<if-opt> ::= {:<code>:} | {:<code>:}else:{:<code>:} | {:<code>:}elif <cond>:{:<if-opt>:}

<cond> ::= <expr> <c-op> <expr>

<c-op> ::= ">=" | "<=" | ">" | "<"

So the if statement takes a condition, and then an option. The condition takes the form of an expression, an operator, and an expression.

To make this clearer, it’s worth discussing the <expr> tag now:

<expr> ::= <number> | <var> | <expr> <op> <expr>

So an expression is either a number, a variable, or two expressions combined by an operator (more on all of these later).

Back to our if statement, our condition takes the form if <cond>: which expands to:

if <expr> <c-op> <expr>:

Which can then expand to something like:

if <var> >= <number>:

And we have an if statement we recognise. The <if-opt> are a little more complex, as they have to include elif and else functionality, but they all boil down to an if statement, a condition, and then some <code> being written inside them.

Next, for loops:

<for> ::= for i in <list-var>:{:<fl-code>:}

<fl-stmt> ::= <var> = <expr> | <fl-if>

<fl-code> ::= <fl-stmt> | <fl-stmt>{::}<fl-code>

So the for loop takes the form for i in list: then opens an indent (on a new line) for some for-loop code. The statements inside could just be our <stmt> from above, but that would allow nested for loops, which can exponentially increase evaluation time. Unless you think nested for loops are absolutely necessary, avoid them (this is a pretty good rule for programming in general).

So our <fl-stmt> only includes variable assignment, and a special if statement, <fl-if> :

<fl-if> ::= if <cond>:<fl-if-opt>

<fl-if-opt> ::= {:<fl-code>:} | {:<fl-code>:}else:{:<fl-code>:} | {:<fl-code>:}elif <cond>:{:<fl-if-opt>:}

This is actually identical to the if statement grammar above, but instead of calling <code> , it calls <fl-code> , basically to avoid for loops inside if statements inside for loops. A little cumbersome, but better than allowing nested for loops!

Finally, variable assignment. We’ve already looked at <expr> , so let’s look at parts that make it up:

<var> ::= m | i

<list-var> ::= li

<number> ::= <num><num><num> | <num><num> | <num>

<num> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

<op> ::= + | - | * | / | // | %

This should be pretty straightforward. Our variables are m , which we initialised at the beginning, and i , which is initialised if a for loop is run. Numbers can go from 0 to 999, and we have all the Python arithmetic operators except exponentiation, which can produce enormous integers and a whole bunch of very slow to execute algorithms. Like nested for loops, only include it if you need it.

And believe it or not, that's our grammar done! Let's see how we get from a chromosome to our final function. Take a chromosome like:

[0, 1, 0, 1, 0, 1, 1, 2, 1, 0, 0, 0, 0, 1, 1]

Mapped out, this gives us:

def fun(li):
    m = 0
    for i in li:
        if i > m:
            m = i
    return m

return_val = fun(test_list)
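The mapping itself follows the standard GE rule: each codon, taken modulo the number of choices for the leftmost non-terminal, picks a production. Here is a toy sketch of that rule (it ignores PonyGE2's wrapping and indentation-marker handling, and uses a cut-down version of the expression rules above):

```python
import re

# Cut-down version of the expression rules from the grammar above.
grammar = {
    "<expr>":   ["<number>", "<var>", "<expr> <op> <expr>"],
    "<op>":     ["+", "-", "*"],
    "<var>":    ["m", "i"],
    "<number>": ["<num>", "<num><num>"],
    "<num>":    [str(n) for n in range(10)],
}

def map_genome(genome, grammar, start="<expr>"):
    """Toy genotype-to-phenotype mapping: each codon, mod the number of
    choices, picks the production for the leftmost non-terminal.
    PonyGE2's real mapper also handles wrapping and the {: :} markers."""
    phenotype = start
    for codon in genome:
        match = re.search(r"<[^<>]+>", phenotype)
        if match is None:
            break  # fully expanded; remaining codons are ignored
        choices = grammar[match.group()]
        production = choices[codon % len(choices)]
        phenotype = (phenotype[:match.start()] + production
                     + phenotype[match.end():])
    return phenotype

print(map_genome([2, 1, 0, 5, 0, 0, 7], grammar))   # m * 7
```

Tracing it by hand: the first codon 2 picks <expr> <op> <expr> (2 mod 3), the next picks <var>, and so on, until every non-terminal has been replaced by a terminal.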

Sweet! Now that we have our fitness function and our grammar, how can we run this with PonyGE2?

Well, if the fitness function is in PonyGE2/src/fitness (and has the same filename as the class it holds, max_in_list in this case), and the grammar is in PonyGE2/grammars, we can run from the command line:

python ponyge.py --fitness_function max_in_list --grammar_file max_in_list.pybnf

This will run your evolution with your grammar and fitness function, using default values for the other parameters. There are a lot of them, and it's worth reading the reference documentation, but the two really important ones are population size and number of generations.

The population size is the number of chromosomes generated at the beginning, and evolved at each generation. The larger this number, the wider the search, but the slower the evolution time.

The number of generations is the number of evolution cycles that the chromosomes go through. Again, increasing it improves the search, but at the cost of evolution speed.

You can set these parameters with:

python ponyge.py --fitness_function max_in_list --grammar_file MIL_e2.pybnf --population_size 500 --generations 100

Now for simple grammars, values for these parameters in the range of 100–500 are sufficient, but as the grammars get more complex, they require larger populations and more generations to ensure a sufficient algorithm is evolved. A good test is to add complexity to your grammar, and run it with the same fitness function as above. If you get a function which finds the maximum in a list, you can be confident that your parameters are appropriate, and then point them at the fitness function for the problem you're actually trying to solve.

Unfortunately, run times get large quite quickly. For me, a population of 2000 and 750 generations took around 6 and a half hours, and that wasn’t enough to evolve an appropriate algorithm with the grammar above (but also including list and dictionary operations).

And herein lies the issue with GE. The promise is that if you can write a fitness function, and a robust grammar that encompasses a wide range of Python operations, then GE will be able to evolve a more efficient way of doing things. Unfortunately, the processing power and time this takes at present makes it infeasible on commonly available machines.

Regardless, this is a field to watch, as if we’re going to teach machines to code, there’s a good bet that Grammatical Evolution will have something to do with it.