Table of Contents
1. Introduction
Even I have already dealt in the past with parsing of mathematical expressions in [1], still here another and better approach to tackle this topic.
Accidentally, I came across the shunting yard algorithm ([2]) some weeks ago which I did not know up to this point (although it was invented by famous Edsger Dijkstra).
While it is quite similar to the "The postfix stack approach" solution presented in [1], the approach is more elegant and also handles operator associativity correctly.
This article describes how a mathematical expression in infix notation is parsed, converted to postfix notation using the shunting yard algorithm and finally evaluated using a stack.
Let's start and create a nice implementation.
1.1 Fundamentals
The shunting yard algorithm takes as input an mathematical expression in infix notation and converts it into postfix notation, by correctly taking the precedence and associativity of the operators into account.
Operator precedence defines how the operators are grouped to determine the order in which the operators shall be applied.
For example, the expression 2 + 3 * 4 is treated as 2 + (3 * 4) so the multiplication is executed before the addition.
This is due to the fact that the multiplication operator has an higher precedence than the addition operator.
Note that parentheses have the highest precedence so they can be used to 'override' the precedence order of operations.
Operator associativity comes into play when two or more operators are used which have the same precedence. Then the question arises how the operators are grouped, so in which order they are evaluated.
It answers the question if 10 - 7 + 1 is evaluated as (10 - 7) + 1 or 10 - (7 + 1). This is important because the results are not the same (4 != 2). Most operators, like addition or multiplication,
are left-to-right associative, meaning they are evaluated from left to right, so the correct result of 10 - 7 + 1 is (10 - 7) + 1 = 2.
However, there are also some right-to-left associative operators, like exponentiation (^ operator).
For example, 2 ^ 3 ^ 2 is correctly evaluated as 2 ^ (3 ^ 2) = 2 ^ 9 = 512 and NOT as (2 ^ 3) ^ 2 = 8 ^ 2 = 64.
Postfix notation (also called reverse polish notation) is a mathematical notation in which the operands precede its operators.
The great benefit of postfix notation compared to infix notation is that it does not contain parentheses and dedicated operator precedence. The precedence is given by notation.
Further, an expression in postfix notation is easier to be evaluated than an expression in infix notation.
For example, the infix term 2 + 3 * 4 is denoted in postfix notation as 2 3 4 * +.
2. Shunting Yard Algorithm Parser
The general steps of our shunting yard algorithm parser are the following:
- Input preparation: "Tokenize"" the infix expression into a list of tokens.
- Shunting Yard Algorithm: The token list is input to the shunting yard algorithm which converts the expression into postfix notation.
- Evaluation: The expression in postfix notation is finally evaluated by a simple stack-based approach.
Note that the evaluation of the expression is not part of the actual shunting yard algorithm.
Further, the shunting yard algorithm can also produce an abstract syntax tree instead of a postfix notation string, but for simplicity only the postfix notation variant is implemented.
2.1 Tokens, Precedence and Associativity
At first, define the operators with their precedence and associativity which our parser will support.
Precedence value | Operator | Operator name | Associativity |
4 | () | Parentheses | Left-to-right |
3 | ** | Exponentiation | Right-to-left |
2 | \ * | Division Multiplication | Left-to-right |
1 | + - | Addition Subtraction | Left-to-right |
A higher value above means a higher precedence.
Further, numerical values as floating point numbers in decimal notation are supported and parsed also into tokens.
Following tokens are supported by our parser: NUMBER (any decimal number, parsed into a double), EXP (Exponentiation), DIV (Division), MUL (Multiplication), PLUS (Addition), SUB (Subtraction)
CURLY_BRACKET_OPEN (left parenthesis) and CURLY_BRACKET_CLOSE (right parenthesis).
The concrete parsing and tokenization is not further considered in this article, see [3] for further details and refer to the example source code at top of this article.
2.2 The Shunting Yard Algorithm
The shunting yard algorithm uses an stack to temporarily store operators from the input list for later usage.
The output is list of the tokens in postfix notation. Summarized, the used data structures are:
- Input: A list of tokens in infix expression.
- Internally: A stack to store the operators.
- Output: A list of tokens in postfix notation, representing the input infix expression.
The algorithm itself processes the input tokens one after another according to following rules:
Rule 1
Add it directly to the output token list.
Rule 2
Add it to the operator stack.
Rule 3
While the token at top of the operator stack is not a left parenthesis:
Pop the token from the stack and add it to the output token.
Remove the final left parenthesis from the stack and discard it (do not add it to the output list).
Rule 4
While the token at top of the operator stack is
not a left parenthesis AND
has a higher precedence OR (the same precedence and current token is left associative):
Pop the token from the stack and add it to the output token list
Add current input token operator to the operator stack.
Finally, while the operator stack is not empty, pop the token from the stack and add it to the output token list.
2.3 Implementation
Based on these rules, the implementation is straight-forward.
In the following, the core part of the relevant source code is shown:
{
IList<Token> output = new List<Token>();
Stack<Token> operatorStack = new Stack<Token>();
foreach (Token token in inputTokens)
{
switch (((SypTokenType)(token.TokenType)).Type)
{
/* Rule 1 */
case ESypTokenType.NUMBER:
{
output.Add(token);
}
break;
/* Rule 2 */
case ESypTokenType.CURLY_BRACKET_OPEN:
{
operatorStack.Push(token);
}
break;
/* Rule 3 */
case ESypTokenType.CURLY_BRACKET_CLOSE:
{
while ( operatorStack.Count != 0 &&
(((SypTokenType)operatorStack.Peek().TokenType).Type != ESypTokenType.CURLY_BRACKET_OPEN) )
{
output.Add(operatorStack.Pop());
}
/* remove the left parenthesis from the operator stack. Following condition must be true
otherwise the input string was not correctly formatted with parentheses */
if ( operatorStack.Count != 0 && (((SypTokenType)operatorStack.Peek().TokenType).Type == ESypTokenType.CURLY_BRACKET_OPEN))
{
operatorStack.Pop();
}
}
break;
/* Rule 4 */
case ESypTokenType.PLUS:
case ESypTokenType.MINUS:
case ESypTokenType.MUL:
case ESypTokenType.DIV:
case ESypTokenType.EXP:
{
while (operatorStack.Count != 0 &&
(((SypTokenType)operatorStack.Peek().TokenType).Type != ESypTokenType.CURLY_BRACKET_OPEN) &&
( (GetOperatorPrecedence(operatorStack.Peek()) > GetOperatorPrecedence(token)) ||
(GetOperatorPrecedence(operatorStack.Peek()) == GetOperatorPrecedence(token) && GetOperatorAssociativity(token) == EAssociativity.LEFT_TO_RIGHT)
))
{
output.Add(operatorStack.Pop());
}
operatorStack.Push(token);
}
break;
default: break;
}
}
/* Final step */
while (operatorStack.Count != 0)
{
output.Add(operatorStack.Pop());
}
return output;
}
2.4 Examples of the Algorithm Processing
Let's have a look at some examples which demonstrate the execution of the algorithm:
Example 1: Shunting yard algorithm for the infix expression 3 * (4 + 5) ** 2.
Iteration | Input | Rule | Operator stack | Output | Notes |
---|---|---|---|---|---|
1 | 3 * (4 + 5) ** 2 | Rule 1 | empty | 3 | Add 3 to output |
2 | * (4 + 5) ** 2 | Rule 4 | * | 3 | Add * to operator stack |
3 | ( 4 + 5) ** 2 | Rule 2 | * ( | 3 | Add ( to operator stack |
4 | 4 + 5) ** 2 | Rule 1 | * ( | 3 4 | Add 4 to output |
5 | + 5) ** 2 | Rule 4 | * ( + | 3 4 | Add + to operator stack |
6 | 5 ) ** 2 | Rule 1 | * ( + | 3 4 5 | Add 5 to output |
7 | ) ** 2 | Rule 3 | * | 3 4 5 + | Pop + operator from stack to output, remove left parenthesis from operator stack |
8 | ** 2 | Rule 4 | * ** | 3 4 5 + | Add ** to operator stack |
9 | 2 | Rule 1 | * ** | 3 4 5 + 2 | Add 2 to output |
- | empty | Final step | * ** | 3 4 5 + 2 ** * | Pop all operators from stack and add them to output |
The result is 3 4 5 + 2 ** *.
Example 2: With right-to-left associative operator: 4 ** 3 ** 2.
Iteration | Input | Rule | Operator stack | Output | Notes |
---|---|---|---|---|---|
1 | 4 ** 3 ** 2 | Rule 1 | empty | 4 | Add 4 to output |
2 | ** 3 ** 2 | Rule 4 | ** | 4 | Add ** to operator stack |
3 | 3 ** 2 | Rule 1 | ** | 4 3 | Add 3 to output |
4 | ** 2 | Rule 4 | ** ** | 4 3 | No operator is taken from the operator stack because
operator has same precedence but is not left-to-right associative, Add ** to operator stack |
5 | 2 | Rule 1 | ** ** | 4 3 2 | Add 2 to output |
- | empty | Final step | 4 3 2 ** ** | Pop all operators from stack and add them to output |
The result is 4 3 2 ** **.
Note that in case the exponentiation operator would have not been correctly handled as right-to-left associative, then in iteration step 4 the ** operator would have been popped from the operator stack and added to the output.
This would finally result in the incorrect postfix notation "4 3 ** 2 **" (check it out for yourself if you like). So this example shows the special handling of right-to-left associative
Example 3: With right-to-left associative operator and parentheses: (4 ** 3) ** 2.
Iteration | Input | Rule | Operator stack | Output | Notes |
---|---|---|---|---|---|
1 | ( 4 ** 3) ** 2 | Rule 2 | ( | Add ( to operator stack | |
2 | 4 ** 3) ** 2 | Rule 1 | ( | 4 | Add 4 to output |
3 | ** 3) ** 2 | Rule 4 | ( ** | 4 | Add ** to operator stack (no operator is popped from stack because left parenthesis is on top |
4 | 3 ) ** 2 | Rule 1 | ( ** | 4 3 | Add 3 to output |
5 | ) ** 2 | Rule 1 | 4 3 ** | Take ** operator from stack and add it to output. Remove left parenthesis from operator stack top | |
6 | ** 2 | Rule 4 | ** | 4 3 ** | Add ** to operator stack |
7 | 2 | Rule 1 | ** | 4 3 ** 2 | Add 2 to output |
- | empty | Final step | 4 3 ** 2 ** | Pop all operators from stack and add them to output |
The result is 4 3 ** 2 **.
3. PostFix Notation Evaluation using a Stack
The evaluation of the token list in postfix notation can be realized very simple using a stack. Initially, the stack is empty.
The tokens are processed one after another from beginning to end according to following two rules:
- Rule 1: If the token is a number, just push it directly onto the stack.
- Rule 2: If the token is an operator, pop the number of needed operators from the stack, apply the operator to the popped values and push the result onto the stack.
If the postfix token stack is correct, then after all tokens are processed, there should be exactly one item in the stack left - the result of the expression!
An example can already be found in [1], so here directly the relevant source code part is shown:
{
Stack<double> stack = new Stack<double>();
while (postFixTokens.Count > 0)
{
Token token = postFixTokens.First();
postFixTokens.RemoveAt(0);
Tokenizer.SypTokenType.ESypTokenType tokType = ((SypTokenType)(token.TokenType)).Type;
if (tokType == SypTokenType.ESypTokenType.NUMBER)
{
stack.Push((double)token.TokenValue);
}
else
{
double secondVal = stack.Pop();
double firstVal = stack.Pop();
if (tokType == SypTokenType.ESypTokenType.PLUS)
{
stack.Push(firstVal + secondVal);
}
else if (tokType == SypTokenType.ESypTokenType.MINUS)
{
stack.Push(firstVal - secondVal);
}
else if (tokType == SypTokenType.ESypTokenType.MUL)
{
stack.Push(firstVal * secondVal);
}
else if (tokType == SypTokenType.ESypTokenType.DIV)
{
stack.Push(firstVal / secondVal);
}
else if (tokType == SypTokenType.ESypTokenType.EXP)
{
stack.Push(Math.Pow(firstVal, secondVal));
}
}
}
return stack.Pop();
}
4. Final notes
Note that the algorithm itself does not perform any error detection, and also the sample implementation does not implement any validation.
Thus invalid prefix expressions (incorrect order of operators, numbers or brackets and also incomplete expressions) are not detected and potentially lead to
either wrong postfix notation with subsequent incorrect evaluation or, more probably, just to an exception.
If validation of the input is required, this needs to be implemented additionally before the the input is passed to the algorithm, i.e. as separate step or during tokenization.
5. Conclusion & References
Another fantastic parsing algorithm was covered, explained in detail and implemented exemplarily. This was fun!
Hopefully you found it interesting and learned something!
References
* [1] Different approaches to write a simple parser for mathematical expressions - Some introductory notes
* [2] Shunting yard algorithm @ Wikipedia
* [3] Writing a simple generic tokenizer
History
- 2025/02/15: Initial version.