I've been experimenting with genetic algorithms as of late and now I'd like to build mathematical expressions out of the genomes (For easy talk, its to find an expression that matches a certain outcome).
I have genomes consisting of genes which are represented by bytes, One genome can look like this: {12, 127, 82, 35, 95, 223, 85, 4, 213, 228}. The length is predefined (although it must fall in a certain range), neither is the form it takes. That is, any entry can take any byte value.
Now the trick is to translate this to mathematical expressions. It's fairly easy to determine basic expressions, for example: Pick the first 2 values and treat them as products, pick the 3rd value and pick it as an operator ( +, -,开发者_开发技巧 *, /, ^ , mod ), pick the 4th value as a product and pick the 5th value as an operator again working over the result of the 3rd operator over the first 2 products. (or just handle it as an postfix expression)
The complexity rises when you start allowing priority rules. Now when for example the entry under index 2 represents a '(', your bound to have a ')' somewhere further on except for entry 3, but not necessarily entry 4
Of course the same goes for many things, you can't end up with an operator at the end, you can't end up with a loose number etc.
Now i can make a HUGE switch statement (for example) taking in all the possible possibilities but this will make the code unreadable. I was hoping if someone out there knows a good strategy of how to take this one on.
Thanks in advance!
** EDIT **
On request: The goal I'm trying to achieve is to make an application which can resolve a function for a set of numbers. As for the example I've given in the comment below: {4, 11, 30} and it might come up with the function (X ^ 3) + X
Belisarius in a comment gave a link to an identical topic: Algorithm for permutations of operators and operands
My code:
private static double ResolveExpression(byte[] genes, double valueForX)
{
// folowing: https://stackoverflow.com/questions/3947937/algorithm-for-permutations-of-operators-and-operands/3948113#3948113
Stack<double> operandStack = new Stack<double>();
for (int index = 0; index < genes.Length; index++)
{
int genesLeft = genes.Length - index;
byte gene = genes[index];
bool createOperand;
// only when there are enough possbile operators left, possibly add operands
if (genesLeft > operandStack.Count)
{
// only when there are at least 2 operands on the stack
if (operandStack.Count >= 2)
{
// randomly determine wether to create an operand by threating everything below 127 as an operand and the rest as an operator (better then / 2 due to 0 values)
createOperand = gene < byte.MaxValue / 2;
}
else
{
// else we need an operand for sure since an operator is illigal
createOperand = true;
}
}
else
{
// false for sure since there are 2 many operands to complete otherwise
createOperand = false;
}
if (createOperand)
{
operandStack.Push(GeneToOperand(gene, valueForX));
}
else
{
double left = operandStack.Pop();
double right = operandStack.Pop();
double result = PerformOperator(gene, left, right);
operandStack.Push(result);
}
}
// should be 1 operand left on the stack which is the ending result
return operandStack.Pop();
}
private static double PerformOperator(byte gene, double left, double right)
{
// There are 5 options currently supported, namely: +, -, *, /, ^ and log (math)
int code = gene % 6;
switch (code)
{
case 0:
return left + right;
case 1:
return left - right;
case 2:
return left * right;
case 3:
return left / right;
case 4:
return Math.Pow(left, right);
case 5:
return Math.Log(left, right);
default:
throw new InvalidOperationException("Impossible state");
}
}
private static double GeneToOperand(byte gene, double valueForX)
{
// We only support numbers 0 - 9 and X
int code = gene % 11; // Get a value between 0 and 10
if (code == 10)
{
// 10 is a placeholder for x
return valueForX;
}
else
{
return code;
}
}
#endregion // Helpers
}
Use "post-fix" notation. That handles priorities very nicely.
Post-fix notation handles the "grouping" or "priority rules" trivially.
For example, the expression b**2-4*a*c, in post-fix is
b, 2, **, 4, a, *, c, *, -
To evaluate a post-fix expression, you simply push the values onto a stack and execute the operations.
So the above becomes something approximately like the following.
stack.push( b )
stack.push( 2 )
x, y = stack.pop(), stack.pop(); stack.push( y ** x )
stack.push( 4 )
stack.push( a )
x, y = stack.pop(), stack.pop(); stack.push( y * x )
stack.push( c )
x, y = stack.pop(), stack.pop(); stack.push( y * x )
x, y = stack.pop(), stack.pop(); stack.push( y - x )
To make this work, you need to have to partition your string of bytes into values and operators. You also need to check the "arity" of all your operators to be sure that the number of operators and the number of operands balances out. In this case, the number of binary operators + 1 is the number of operands. Unary operators don't require extra operands.
As ever with GA a large part of the solution is choosing a good representation. RPN (or post-fix) has already been suggested. One concern you still have is that your GA might throw up expressions which begin with operators (or mismatch operators and operands elsewhere) such as:
+,-,3,*,4,2,5,+,-
A (small) part of the solution would be to define evaluations for operand-less operators. For example one might decide that the sequence:
+
evaluates to 0, which is the identity element for addition. Naturally
*
would evaluate to 1. Mathematics may not have figured out what the identity element for division is, but APL has.
Now you have the basis of an approach which doesn't care if you get the right sequence of operators and operands, but you still have a problem when you have too many operands for the number of operators. That is, what is the intepretation of (postfix following) ?
2,4,5,+,3,4,-
which (possibly) evaluates to
2,9,-1
Well, now you have to invent your own convention if you want to reduce this to a single value. But you could adopt the convention that the GA has created a vector-valued function.
EDIT: response to OP's comment ...
If a byte can represent either an operator or an operand, and if your program places no restrictions on where a genome can be split for reproduction, then there will always be a risk that the offspring represents an invalid sequence of operators and operands. Consider, instead of having each byte encode either an operator or an operand, a byte could encode an operator+operand pair (you might run out of bytes quickly so perhaps you'd need to use two bytes). Then a sequence of bytes might be translated to something like:
(plus 1)(plus x)(power 2)(times 3)
which could evaluate, following a left-to-right rule with a meaningful interpretation for the first term, to 3((x+1)^2)
精彩评论