@Wayne-Z
2017-12-14T11:58:11.000000Z
字数 10055
阅读 2067
compiler
This document introduces about the final implementation of the interpret and execution part of the CMM Interpreter.
Finally I change the origin grammar decided before to get a better implement.
grammar Cmm;
import CmmLex;
program: stmt+;
stmt : declStmt
| ifStmt
| whileStmt
|(assignStmt SEMI)
| writeStmt
| readStmt
| stmtBlock
| forStmt
;
declStmt: type deAs(COMMA deAs)* SEMI #declStmt1
| type L_BRACKET INT R_BRACKET ID SEMI #declStmt2
;
deAs: ID #deAs1
| ID ASSIGN expr #deAs2
| ID ASSIGN MINUS expr #deAs3
;
ifStmt : IF L_PAREN compare R_PAREN stmtBlock elseIfStmt*
| IF L_PAREN compare R_PAREN stmtBlock elseIfStmt* elseStmt
;
elseIfStmt : ELSE_IF L_PAREN compare R_PAREN stmtBlock;
elseStmt : ELSE stmtBlock;
whileStmt: WHILE L_PAREN compare R_PAREN stmtBlock;
assignStmt: variable ASSIGN expr #assignStmt1
| variable ASSIGN MINUS expr #assignStmt2
;
writeStmt: WRITE L_PAREN expr R_PAREN SEMI;
readStmt: READ L_PAREN variable R_PAREN SEMI;
stmtBlock: L_BRACE stmt* R_BRACE;
forStmt : FOR L_PAREN assignStmt SEMI compare SEMI assignStmt R_PAREN stmtBlock #forStmt1
| FOR L_PAREN declStmt compare SEMI assignStmt R_PAREN stmtBlock #forStmt2
;
type: T_INT #type1
| T_DOUBLE #type2
|T_BOOL #type3
;
variable: ID #variable1
| ID L_BRACKET expr R_BRACKET #variable2
;
constant: INT| DOUBLE ;
compare : LOG_NOT compare #compare1
| expr op=(BIGGER_THAN|LESS_THAN|NO_LESS|NO_MORE|EQUAL|UN_EQUAL) expr #compare2
| expr #compare3
//| L_PAREN compare R_PAREN
;
expr : expr op=(MUL|DIV|MOD) expr #mulExpr1
| expr op=(PLUS|MINUS) expr #expr1
| variable #factor1
| constant #factor2
| L_PAREN expr R_PAREN #factor3
| L_PAREN MINUS expr R_PAREN #factor4
;
The expression part is kind different then we dessigned before when start the final work and I change them to this eventually. Because others are not suitable for the listener. For example, my teammate found this works well when he write his own syntax analisis, but it comes out wrong as the picture show below.
But we can see that it got problems in some expression like this.
5-2/(4.0-3)*2.5+0.01
However, 2/(4.0-3) should be executed first, but it goes wrong. SO the finally the grammar was changed even when the work is finished.
Firstly add necessary mark like #factor4
to make the antlr create specified listener function for each marked line. Then we can modified each case of each statement. And then create a new listener named IRProduceListener extends the antlr-generated class CmmBaseListener. Then we add codes to generate quaternion code in each overriden listener function.
It declare variables using DECL action.
(DECL, TYPE, null, ID)//to declare a normal variable
(DECL, TYPE, INT, ID) //To declare an array
The code in listener is like this.
//type L_BRACKET INT R_BRACKET ID SEMI;
@Override public void exitDeclStmt2(CmmParser.DeclStmt2Context ctx) {
Code.add(currentLine = new IRLine(DECL, TYPE, ctx.INT().getText(), ctx.ID().getText()));
}
It supports array access by using VARIABLE, when it need to access some variable's value no matter it is an ID or an array element, we use MOV action to let VARIABLE stores exactly the index the variable in the symbol list.
(MOV, ID, null, VARIABLE)//to access a normal variable
(MOV, ID, EXPR, VARIABLE) //To access an array element
//variable: ID | ID L_BRACKET expr R_BRACKET ;
@Override public void enterVariable1(CmmParser.Variable1Context ctx) {
Code.add(currentLine = new IRLine(MOV, ctx.ID().getText(), null, VARIABLE));
}
@Override public void exitVariable2(CmmParser.Variable2Context ctx) {
Code.add(currentLine = new IRLine(MOV, ctx.ID().getText(), EXPR, VARIABLE));
}
This grammar support '+-*/%' operation and '>,<,>=,<=,!=,==' comparison. When generating quaternion code in listener, it use some register like variables such as COMPARE, EXPR, as temporary variables to store and pass the medium value.
//expr : mulExpr op=(PLUS|MINUS) expr
@Override public void exitExpr1(CmmParser.Expr1Context ctx) {
switch (ctx.op.getType()) {
case CmmLexer.PLUS:
Code.add(currentLine = new IRLine(PLUS, EXPR, EXPR, EXPR));
break;
case CmmLexer.MINUS:
Code.add(currentLine = new IRLine(MINUS, EXPR, EXPR, EXPR));
break;
}
}
//mulExpr : expr op=(MUL|DIV|MOD) expr;
@Override public void exitMulExpr1(CmmParser.MulExpr1Context ctx) {
switch (ctx.op.getType()) {
case CmmLexer.MUL:
Code.add(currentLine = new IRLine(MUL, EXPR, EXPR, EXPR));
break;
case CmmLexer.DIV:
Code.add(currentLine = new IRLine(DIV, EXPR, EXPR, EXPR));
break;
case CmmLexer.MOD:
Code.add(currentLine = new IRLine(MOD, EXPR, EXPR, EXPR));
break;
}
}
Finally it all use EXPR to store the temporary value, and it the details will be discuss in the excution part.
It use a stack to help record the code that need to be backfilled.
//whileStmt: WHILE L_PAREN compare R_PAREN stmtBlock;
@Override public void enterWhileStmt(CmmParser.WhileStmtContext ctx) {
Code.add(currentLine = new IRLine(LABEL, null,null,String.valueOf(Code.size())));
stack.push(currentLine);
}
@Override public void enterStmtBlock(CmmParser.StmtBlockContext ctx) {
Code.add(currentLine = new IRLine(IN, null, null, null));
Code.add(currentLine = new IRLine(JMP, COMPARE, FORWARD, null));
stack.push(currentLine);
}
@Override public void exitStmtBlock(CmmParser.StmtBlockContext ctx) {
if(ctx.getParent() instanceof CmmParser.WhileStmtContext || ctx.getParent() instanceof CmmParser.ForStmtContext) {
currentLine = stack.pop();
currentLine.dest = String.valueOf(Code.size() + 2);
} else if (ctx.getParent() instanceof CmmParser.IfStmtContext ||ctx.getParent() instanceof CmmParser.ElseIfStmtContext) {
Code.add(currentLine = new IRLine(JMP, null, null, null));
stack.pop().dest = String.valueOf(Code.size());
stack.push(Code.get(Code.size()-1));
}
Code.add(currentLine = new IRLine(OUT, null, null, null));
}
@Override public void exitWhileStmt(CmmParser.WhileStmtContext ctx) {
Code.add(currentLine = new IRLine(JMP,null,BACKWARD,String.valueOf(stack.pop().dest)));
}
The execution and code generation sequence is shown below.
When step into while/for statement, the LABEL and JMP IR are pushed into the stack, when step out, the stack will pop the JMP IR firstly and fillin the dest by measure the current size of the IR's list, and the LABEL will fillin the dest of the final JMP IR which make the code JMP to the start of the while/for block.
In case of if-else condition ,the only difference is that the final JMP will also be pushed into stack, and when the whole if stmt is finshed, each JMP IR's dest of each starement block will be fillin by the current size of the whole list.
A factorial code is like this.
int a = 6;
int factorial = 1;
while( a != 0 ){
factorial = factorial * a;
a = a -1;
}
write( factorial );
And the final output of the IR is like this below. The first column represents the action, all action is marked as an static intteger.
public static final int
LABEL=0, JMP = 1,MOV = 2, WRITE = 3, DECL = 4, ASSIGN = 5,
PLUS = 6, MINUS = 7, MUL = 8, DIV = 9, MOD = 10, BIGGER_THAN = 11,
LESS_THAN = 12, NO_LESS = 13, NO_MORE = 14, EQ = 15,
UNEQ = 16, LOG_OR = 17, LOG_AND = 18, LOG_NOT = 19,
Minus = 20, READ = 21, IN = 22, OUT = 23;
And the second and third column means the operand, the execution outcomes will be stored in the forth column.
2 1 null TYPE
2 6 null EXPR
4 TYPE null a
2 EXPR null a
2 1 null TYPE
2 1 null EXPR
4 TYPE null factorial
2 EXPR null factorial
0 null null 8
2 a null VARIABLE
2 VARIABLE null EXPR
2 EXPR null COMPARE
1 null BACKWARD 8
22 null null null
1 COMPARE FORWARD null
2 factorial null VARIABLE
2 factorial null VARIABLE
2 VARIABLE null EXPR
2 a null VARIABLE
2 VARIABLE null EXPR
8 EXPR EXPR EXPR
5 EXPR null VARIABLE
2 a null VARIABLE
2 a null VARIABLE
2 VARIABLE null EXPR
2 1 null EXPR
7 EXPR EXPR EXPR
5 EXPR null VARIABLE
23 null null null
2 factorial null VARIABLE
2 VARIABLE null EXPR
3 EXPR null STDOUT
public IRExecuter(ArrayList<IRLine> i){
IR = i;
currentLine = 0;
symlist = new LinkedList<Symbol>();
COMPARE = new Stack<Value>();
EXPR = new Stack<Value>();
VARIABLE= new Stack<Integer>();
}
COMPARE and EXPR is kind of register but they are all declared as Stack to avoid the case that all register are used. When doing operation ,all temp value will be pushed into EXPR, and then pop the top 2 Value to do the operation. This is kind of postfix expression. The VARIABLE is used to store the address that is need to pass the value to expression.
As said before, the symlist stores the all variables used in the origin code. It uses MOV action to access the symlist to get the variable index into the VARIABLE Stack no matter it is an ID or an array element.
case IRProduceListener.VARIABLE:
int a = isValExist(currentIR.second);
if(currentIR.third!=null)
a -= EXPR.pop().mInt;
VARIABLE.push(a);
break;
The IR is always like this
MOV a null VARIABLE
MOV VARIABLE null EXPR
The index is passed to VARIABLE, and the value is passed to EXPR by the index stored in VARIABLE.
Take plus and minus as example.
public void PLUSMINUS() throws Exception{
Value b = EXPR.pop();
Value a = EXPR.pop();
if(a.mType==Value.INT&&b.mType==Value.INT) {
switch (currentIR.first) {
case PLUS:
EXPR.push(new Value(Value.INT,new Double(a.getValue()+b.getValue()).intValue()));
break;
case MINUS:
EXPR.push(new Value(Value.INT,new Double(a.getValue()-b.getValue()).intValue()));;
break;
}
} else {
switch (currentIR.first) {
case PLUS:
EXPR.push(new Value(Value.DOUBLE,a.getValue()+b.getValue()));
break;
case MINUS:
EXPR.push(new Value(Value.DOUBLE,a.getValue()-b.getValue()));
break;
}
}
}
As said before, all the temporary value will be stored in EXPR. Like this code
factorial = factorial * a;
This is one line taken from the example, and its IR is shown like this, 2 means MOV, 8 means PLUS, 5 means assign
2 factorial null VARIABLE
2 factorial null VARIABLE
2 VARIABLE null EXPR
2 a null VARIABLE
2 VARIABLE null EXPR
8 EXPR EXPR EXPR
5 EXPR null VARIABLE
VARIABLE, EXPR are all implemented as a STACK, so the store action means a push action on the stack, the access means the pop. So it first push the index of factorial into VARIABLE twice, one for assignment, one for operation, and then the VARIABLE pop the top index to pass the value, which is pushed, to EXPR stack. Similarly, a's value is passed into EXPR in the same way. Then the top two value stored in EXPR is poped and used to do add operation and the outcome is pushed into EXPR, and this value then be poped to pass to the variable, by poping the index in VARIABLE and find the index and change the value.
In the command console, using -r option to process the testcases provided using scripts in a command line mod, we get all expected outcomes.
Actually, the IR(quaternion code) Engine is more powerful than to implement the execution part of a CMM Interpreter. Because the engine is dessigned to be used in a more general way and take in some ideas from Assembly Code, though the IR generated is much simple using the listener generated by antlr.
Due to many reasons, this final work is done in a such a hurry. The origin aim is to take Assembly Code as the IR, however, time is limited and it is difficult do deal with the register part, so we changeed to implement a simpler one in half way, which is a pitty.
Although it pass all right cases, it dose not performe well with wrong cases.
All in all, I still learnt a lot in this process, including some knowledges about Assembly Code, the Antlr's listener mode and the whole process of what a compiler does.