[关闭]
@Wayne-Z 2017-12-14T11:58:11.000000Z 字数 10055 阅读 2036

Cmm Interpreter

compiler



Overview

This document introduces about the final implementation of the interpret and execution part of the CMM Interpreter.

Grammar

Finally I change the origin grammar decided before to get a better implement.

  1. grammar Cmm;
  2. import CmmLex;
  3. program: stmt+;
  4. stmt : declStmt
  5. | ifStmt
  6. | whileStmt
  7. |(assignStmt SEMI)
  8. | writeStmt
  9. | readStmt
  10. | stmtBlock
  11. | forStmt
  12. ;
  13. declStmt: type deAs(COMMA deAs)* SEMI #declStmt1
  14. | type L_BRACKET INT R_BRACKET ID SEMI #declStmt2
  15. ;
  16. deAs: ID #deAs1
  17. | ID ASSIGN expr #deAs2
  18. | ID ASSIGN MINUS expr #deAs3
  19. ;
  20. ifStmt : IF L_PAREN compare R_PAREN stmtBlock elseIfStmt*
  21. | IF L_PAREN compare R_PAREN stmtBlock elseIfStmt* elseStmt
  22. ;
  23. elseIfStmt : ELSE_IF L_PAREN compare R_PAREN stmtBlock;
  24. elseStmt : ELSE stmtBlock;
  25. whileStmt: WHILE L_PAREN compare R_PAREN stmtBlock;
  26. assignStmt: variable ASSIGN expr #assignStmt1
  27. | variable ASSIGN MINUS expr #assignStmt2
  28. ;
  29. writeStmt: WRITE L_PAREN expr R_PAREN SEMI;
  30. readStmt: READ L_PAREN variable R_PAREN SEMI;
  31. stmtBlock: L_BRACE stmt* R_BRACE;
  32. forStmt : FOR L_PAREN assignStmt SEMI compare SEMI assignStmt R_PAREN stmtBlock #forStmt1
  33. | FOR L_PAREN declStmt compare SEMI assignStmt R_PAREN stmtBlock #forStmt2
  34. ;
  35. type: T_INT #type1
  36. | T_DOUBLE #type2
  37. |T_BOOL #type3
  38. ;
  39. variable: ID #variable1
  40. | ID L_BRACKET expr R_BRACKET #variable2
  41. ;
  42. constant: INT| DOUBLE ;
  43. compare : LOG_NOT compare #compare1
  44. | expr op=(BIGGER_THAN|LESS_THAN|NO_LESS|NO_MORE|EQUAL|UN_EQUAL) expr #compare2
  45. | expr #compare3
  46. //| L_PAREN compare R_PAREN
  47. ;
  48. expr : expr op=(MUL|DIV|MOD) expr #mulExpr1
  49. | expr op=(PLUS|MINUS) expr #expr1
  50. | variable #factor1
  51. | constant #factor2
  52. | L_PAREN expr R_PAREN #factor3
  53. | L_PAREN MINUS expr R_PAREN #factor4
  54. ;

Comprison

The expression part is kind different then we dessigned before when start the final work and I change them to this eventually. Because others are not suitable for the listener. For example, my teammate found this works well when he write his own syntax analisis, but it comes out wrong as the picture show below.
inteperter.PNG-12.8kB
But we can see that it got problems in some expression like this.

  1. 5-2/(4.0-3)*2.5+0.01

image_1c17t9ea5omihvr1sbn19at1fj311.png-23.3kB
However, 2/(4.0-3) should be executed first, but it goes wrong. SO the finally the grammar was changed even when the work is finished.

Key Part in Quaternion Generation

Listener mod

Firstly add necessary mark like #factor4 to make the antlr create specified listener function for each marked line. Then we can modified each case of each statement. And then create a new listener named IRProduceListener extends the antlr-generated class CmmBaseListener. Then we add codes to generate quaternion code in each overriden listener function.

Array Support

It declare variables using DECL action.

  1. (DECL, TYPE, null, ID)//to declare a normal variable
  2. (DECL, TYPE, INT, ID) //To declare an array

The code in listener is like this.

  1. //type L_BRACKET INT R_BRACKET ID SEMI;
  2. @Override public void exitDeclStmt2(CmmParser.DeclStmt2Context ctx) {
  3. Code.add(currentLine = new IRLine(DECL, TYPE, ctx.INT().getText(), ctx.ID().getText()));
  4. }

It supports array access by using VARIABLE, when it need to access some variable's value no matter it is an ID or an array element, we use MOV action to let VARIABLE stores exactly the index the variable in the symbol list.

  1. (MOV, ID, null, VARIABLE)//to access a normal variable
  2. (MOV, ID, EXPR, VARIABLE) //To access an array element
  1. //variable: ID | ID L_BRACKET expr R_BRACKET ;
  2. @Override public void enterVariable1(CmmParser.Variable1Context ctx) {
  3. Code.add(currentLine = new IRLine(MOV, ctx.ID().getText(), null, VARIABLE));
  4. }
  5. @Override public void exitVariable2(CmmParser.Variable2Context ctx) {
  6. Code.add(currentLine = new IRLine(MOV, ctx.ID().getText(), EXPR, VARIABLE));
  7. }

Operation

This grammar support '+-*/%' operation and '>,<,>=,<=,!=,==' comparison. When generating quaternion code in listener, it use some register like variables such as COMPARE, EXPR, as temporary variables to store and pass the medium value.

  1. //expr : mulExpr op=(PLUS|MINUS) expr
  2. @Override public void exitExpr1(CmmParser.Expr1Context ctx) {
  3. switch (ctx.op.getType()) {
  4. case CmmLexer.PLUS:
  5. Code.add(currentLine = new IRLine(PLUS, EXPR, EXPR, EXPR));
  6. break;
  7. case CmmLexer.MINUS:
  8. Code.add(currentLine = new IRLine(MINUS, EXPR, EXPR, EXPR));
  9. break;
  10. }
  11. }
  12. //mulExpr : expr op=(MUL|DIV|MOD) expr;
  13. @Override public void exitMulExpr1(CmmParser.MulExpr1Context ctx) {
  14. switch (ctx.op.getType()) {
  15. case CmmLexer.MUL:
  16. Code.add(currentLine = new IRLine(MUL, EXPR, EXPR, EXPR));
  17. break;
  18. case CmmLexer.DIV:
  19. Code.add(currentLine = new IRLine(DIV, EXPR, EXPR, EXPR));
  20. break;
  21. case CmmLexer.MOD:
  22. Code.add(currentLine = new IRLine(MOD, EXPR, EXPR, EXPR));
  23. break;
  24. }
  25. }

Finally it all use EXPR to store the temporary value, and it the details will be discuss in the excution part.

IF & WHILE

It use a stack to help record the code that need to be backfilled.

  1. //whileStmt: WHILE L_PAREN compare R_PAREN stmtBlock;
  2. @Override public void enterWhileStmt(CmmParser.WhileStmtContext ctx) {
  3. Code.add(currentLine = new IRLine(LABEL, null,null,String.valueOf(Code.size())));
  4. stack.push(currentLine);
  5. }
  6. @Override public void enterStmtBlock(CmmParser.StmtBlockContext ctx) {
  7. Code.add(currentLine = new IRLine(IN, null, null, null));
  8. Code.add(currentLine = new IRLine(JMP, COMPARE, FORWARD, null));
  9. stack.push(currentLine);
  10. }
  11. @Override public void exitStmtBlock(CmmParser.StmtBlockContext ctx) {
  12. if(ctx.getParent() instanceof CmmParser.WhileStmtContext || ctx.getParent() instanceof CmmParser.ForStmtContext) {
  13. currentLine = stack.pop();
  14. currentLine.dest = String.valueOf(Code.size() + 2);
  15. } else if (ctx.getParent() instanceof CmmParser.IfStmtContext ||ctx.getParent() instanceof CmmParser.ElseIfStmtContext) {
  16. Code.add(currentLine = new IRLine(JMP, null, null, null));
  17. stack.pop().dest = String.valueOf(Code.size());
  18. stack.push(Code.get(Code.size()-1));
  19. }
  20. Code.add(currentLine = new IRLine(OUT, null, null, null));
  21. }
  22. @Override public void exitWhileStmt(CmmParser.WhileStmtContext ctx) {
  23. Code.add(currentLine = new IRLine(JMP,null,BACKWARD,String.valueOf(stack.pop().dest)));
  24. }

The execution and code generation sequence is shown below.
While.jpg-66.4kB
When step into while/for statement, the LABEL and JMP IR are pushed into the stack, when step out, the stack will pop the JMP IR firstly and fillin the dest by measure the current size of the IR's list, and the LABEL will fillin the dest of the final JMP IR which make the code JMP to the start of the while/for block.

In case of if-else condition ,the only difference is that the final JMP will also be pushed into stack, and when the whole if stmt is finshed, each JMP IR's dest of each starement block will be fillin by the current size of the whole list.

An Example

A factorial code is like this.

  1. int a = 6;
  2. int factorial = 1;
  3. while( a != 0 ){
  4. factorial = factorial * a;
  5. a = a -1;
  6. }
  7. write( factorial );

And the final output of the IR is like this below. The first column represents the action, all action is marked as an static intteger.

  1. public static final int
  2. LABEL=0, JMP = 1,MOV = 2, WRITE = 3, DECL = 4, ASSIGN = 5,
  3. PLUS = 6, MINUS = 7, MUL = 8, DIV = 9, MOD = 10, BIGGER_THAN = 11,
  4. LESS_THAN = 12, NO_LESS = 13, NO_MORE = 14, EQ = 15,
  5. UNEQ = 16, LOG_OR = 17, LOG_AND = 18, LOG_NOT = 19,
  6. Minus = 20, READ = 21, IN = 22, OUT = 23;

And the second and third column means the operand, the execution outcomes will be stored in the forth column.

  1. 2 1 null TYPE
  2. 2 6 null EXPR
  3. 4 TYPE null a
  4. 2 EXPR null a
  5. 2 1 null TYPE
  6. 2 1 null EXPR
  7. 4 TYPE null factorial
  8. 2 EXPR null factorial
  9. 0 null null 8
  10. 2 a null VARIABLE
  11. 2 VARIABLE null EXPR
  12. 2 EXPR null COMPARE
  13. 1 null BACKWARD 8
  14. 22 null null null
  15. 1 COMPARE FORWARD null
  16. 2 factorial null VARIABLE
  17. 2 factorial null VARIABLE
  18. 2 VARIABLE null EXPR
  19. 2 a null VARIABLE
  20. 2 VARIABLE null EXPR
  21. 8 EXPR EXPR EXPR
  22. 5 EXPR null VARIABLE
  23. 2 a null VARIABLE
  24. 2 a null VARIABLE
  25. 2 VARIABLE null EXPR
  26. 2 1 null EXPR
  27. 7 EXPR EXPR EXPR
  28. 5 EXPR null VARIABLE
  29. 23 null null null
  30. 2 factorial null VARIABLE
  31. 2 VARIABLE null EXPR
  32. 3 EXPR null STDOUT

Key Part in Quaternion Execution

Data Structure

  1. public IRExecuter(ArrayList<IRLine> i){
  2. IR = i;
  3. currentLine = 0;
  4. symlist = new LinkedList<Symbol>();
  5. COMPARE = new Stack<Value>();
  6. EXPR = new Stack<Value>();
  7. VARIABLE= new Stack<Integer>();
  8. }

COMPARE and EXPR is kind of register but they are all declared as Stack to avoid the case that all register are used. When doing operation ,all temp value will be pushed into EXPR, and then pop the top 2 Value to do the operation. This is kind of postfix expression. The VARIABLE is used to store the address that is need to pass the value to expression.

Variable access

As said before, the symlist stores the all variables used in the origin code. It uses MOV action to access the symlist to get the variable index into the VARIABLE Stack no matter it is an ID or an array element.

  1. case IRProduceListener.VARIABLE:
  2. int a = isValExist(currentIR.second);
  3. if(currentIR.third!=null)
  4. a -= EXPR.pop().mInt;
  5. VARIABLE.push(a);
  6. break;

The IR is always like this

  1. MOV a null VARIABLE
  2. MOV VARIABLE null EXPR

The index is passed to VARIABLE, and the value is passed to EXPR by the index stored in VARIABLE.

Operation

Take plus and minus as example.

  1. public void PLUSMINUS() throws Exception{
  2. Value b = EXPR.pop();
  3. Value a = EXPR.pop();
  4. if(a.mType==Value.INT&&b.mType==Value.INT) {
  5. switch (currentIR.first) {
  6. case PLUS:
  7. EXPR.push(new Value(Value.INT,new Double(a.getValue()+b.getValue()).intValue()));
  8. break;
  9. case MINUS:
  10. EXPR.push(new Value(Value.INT,new Double(a.getValue()-b.getValue()).intValue()));;
  11. break;
  12. }
  13. } else {
  14. switch (currentIR.first) {
  15. case PLUS:
  16. EXPR.push(new Value(Value.DOUBLE,a.getValue()+b.getValue()));
  17. break;
  18. case MINUS:
  19. EXPR.push(new Value(Value.DOUBLE,a.getValue()-b.getValue()));
  20. break;
  21. }
  22. }
  23. }

As said before, all the temporary value will be stored in EXPR. Like this code

  1. factorial = factorial * a;

This is one line taken from the example, and its IR is shown like this, 2 means MOV, 8 means PLUS, 5 means assign

  1. 2 factorial null VARIABLE
  2. 2 factorial null VARIABLE
  3. 2 VARIABLE null EXPR
  4. 2 a null VARIABLE
  5. 2 VARIABLE null EXPR
  6. 8 EXPR EXPR EXPR
  7. 5 EXPR null VARIABLE

VARIABLE, EXPR are all implemented as a STACK, so the store action means a push action on the stack, the access means the pop. So it first push the index of factorial into VARIABLE twice, one for assignment, one for operation, and then the VARIABLE pop the top index to pass the value, which is pushed, to EXPR stack. Similarly, a's value is passed into EXPR in the same way. Then the top two value stored in EXPR is poped and used to do add operation and the outcome is pushed into EXPR, and this value then be poped to pass to the variable, by poping the index in VARIABLE and find the index and change the value.

Outcomes

In the command console, using -r option to process the testcases provided using scripts in a command line mod, we get all expected outcomes.
interpter.PNG-170.5kB

Conclusion

Actually, the IR(quaternion code) Engine is more powerful than to implement the execution part of a CMM Interpreter. Because the engine is dessigned to be used in a more general way and take in some ideas from Assembly Code, though the IR generated is much simple using the listener generated by antlr.

Due to many reasons, this final work is done in a such a hurry. The origin aim is to take Assembly Code as the IR, however, time is limited and it is difficult do deal with the register part, so we changeed to implement a simpler one in half way, which is a pitty.

Although it pass all right cases, it dose not performe well with wrong cases.

All in all, I still learnt a lot in this process, including some knowledges about Assembly Code, the Antlr's listener mode and the whole process of what a compiler does.

添加新批注
在作者公开此批注前,只有你和作者可见。
回复批注