A Pascal LL(1) Compiler written in Python
- The scanner and character scanner aren't properly tested; they have only been checked with print statements. So far the compiler works with fairly simple code.
This parses correctly:
program addNumbers;
Var a,b : Integer;
Var c: Integer;
begin
a := 1 + 1;
if (a > 0) then
writeln(a)
else
b := a + 1;
c := a + b;
end.
But this doesn't. The grammar has no rule for those nested begin/end blocks; not sure whether the grammar is wrong.
program addNumbers;
Var a,b : Integer;
Var c: Integer;
begin
a := 1 + 1;
if (a > 0) then
writeln(a)
else
begin
b := a + 1;
c := a + b;
end;
end.
Both of them are compiled correctly by fpc (the Free Pascal Compiler).
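One possible way to accept the second program is to make the compound statement (begin ... end) an alternative of the statement rule itself, so it can appear anywhere a statement can, including after else. The following is a minimal sketch, assuming a hand-written recursive-descent (LL(1)) parser. The names tokenize, Parser and parse_body are hypothetical and are not this project's code; conditions and simple statements are skipped over rather than parsed.

```python
import re

# Hypothetical mini-tokenizer: just enough for the example body above.
TOKEN_RE = re.compile(r":=|[A-Za-z_]\w*|\d+|[();+\-*/<>=.]")


def tokenize(text):
    return [t.lower() for t in TOKEN_RE.findall(text)]


class Parser:
    """statement -> compound | if_statement | simple | empty"""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def statement(self):
        tok = self.peek()
        if tok == "begin":      # one token of lookahead picks the rule
            return self.compound()
        if tok == "if":
            return self.if_statement()
        if tok in ("end", ";", "else", None):
            return ("empty",)   # Pascal allows an empty statement
        return self.simple()

    def compound(self):
        # compound -> 'begin' statement { ';' statement } 'end'
        self.expect("begin")
        body = [self.statement()]
        while self.peek() == ";":
            self.expect(";")
            body.append(self.statement())
        self.expect("end")
        return ("block", body)

    def if_statement(self):
        # if_statement -> 'if' condition 'then' statement [ 'else' statement ]
        self.expect("if")
        cond = self.skip_until({"then"})
        self.expect("then")
        then_branch = self.statement()
        else_branch = None
        if self.peek() == "else":
            self.expect("else")
            else_branch = self.statement()
        return ("if", cond, then_branch, else_branch)

    def simple(self):
        # Assignments and calls are kept as raw tokens in this sketch.
        return ("simple", self.skip_until({";", "end", "else"}))

    def skip_until(self, stops):
        start = self.pos
        while self.peek() is not None and self.peek() not in stops:
            self.pos += 1
        return self.tokens[start:self.pos]


def parse_body(text):
    """Parse 'begin ... end.' and return the statement tree."""
    parser = Parser(tokenize(text))
    tree = parser.compound()
    parser.expect(".")
    return tree
```

Because statement() calls compound() and compound() calls statement(), blocks nest to any depth, and the "end;" before the final "end." is handled by the empty-statement alternative.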
1. DONE. Have the tokenizer return tk_value alongside current_line and current_line_number for debugging purposes.
2. DONE. Change the tokenizer's name to something that better describes its functionality, say, file_parser?
3. Ehh. Clean up constants.py and most likely rename it to something that better describes its functionality, as it isn't a list of constants.
4. DONE. Write tests that ensure the token object has all the correct corresponding attributes.
5. Get machine instruction generation working for simple expressions, if statements, and while statements.
x := 1 + 2;
should generate
op_pushi, 1
op_pushi, 2
op_add
op_pop, 0
op_halt
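The instruction listing above could be produced by a post-order walk over the expression: emit code for both operands, then the operator, then store the result. This is a rough sketch only, assuming expressions arrive as nested tuples like ("+", 1, 2) and that the symbol table maps a variable name to its address. The function names, the tuple shape, and the opcodes op_push, op_sub and op_mul are hypothetical; only op_pushi, op_add, op_pop and op_halt come from the notes.

```python
# op_add is from the notes; op_sub and op_mul are assumed for illustration.
BINARY_OPS = {"+": "op_add", "-": "op_sub", "*": "op_mul"}


def gen_expr(node, symtable, code):
    """Emit stack-machine code for an expression, operands first."""
    if isinstance(node, int):
        code.append(("op_pushi", node))            # integer literal
    elif isinstance(node, str):
        code.append(("op_push", symtable[node]))   # hypothetical variable load
    else:
        op, left, right = node
        gen_expr(left, symtable, code)
        gen_expr(right, symtable, code)
        code.append((BINARY_OPS[op],))


def gen_assignment(target, expr, symtable):
    """Emit code for 'target := expr;' followed by a halt."""
    code = []
    gen_expr(expr, symtable, code)
    code.append(("op_pop", symtable[target]))      # store into target's address
    code.append(("op_halt",))
    return code


if __name__ == "__main__":
    # x := 1 + 2; with x stored at address 0
    for instruction in gen_assignment("x", ("+", 1, 2), {"x": 0}):
        print(", ".join(str(part) for part in instruction))
```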
Too lazy to write this up properly; just look at the files.
Scanner/ CharacterScanner TokenCreator
Tokenizer/ constants character
Character/ constants
token: what is a token? what does a token do? what is a token used for?
design questions?
Should a token object know what to do with its attributes once they are initialized? E.g. given token = Token(attr1, attr2, attr3), should the object itself determine how to parse the values, or should some other piece work out what to do with the inputs and then pass them in?
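One way to answer the design question is to keep the token a plain record and put the classification logic in a separate factory, which is roughly the role a TokenCreator could play. This is a sketch of that option, not a description of the existing code: the field names, the KEYWORDS set, the make_token function, and keyword kinds such as TK_BEGIN are all assumptions. Only TK_ID and TK_INTLIT appear in these notes.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed subset of reserved words, for illustration only.
KEYWORDS = {"program", "var", "begin", "end", "if", "then", "else"}


@dataclass(frozen=True)
class Token:
    """Plain data holder: it stores attributes and parses nothing."""
    kind: str                 # e.g. TK_ID, TK_INTLIT
    value: Optional[int]      # literal value, if any
    name: Optional[str]       # identifier spelling, if any
    line: int
    col: int


def make_token(lexeme, line, col):
    """Decide what a lexeme is, then build the matching Token."""
    lowered = lexeme.lower()  # Pascal keywords are case-insensitive
    if lowered in KEYWORDS:
        return Token("TK_" + lowered.upper(), None, None, line, col)
    if lexeme.isascii() and lexeme.isdigit():
        return Token("TK_INTLIT", int(lexeme), None, line, col)
    return Token("TK_ID", None, lexeme, line, col)
```

With this split, the tests for the token object only need to check attributes, and the tests for the factory only need to check classification.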
List of Tokens
RESERVED_WORDS:
IDENTIFIERS:
OPERATORS:
- arithmetic: + - * / Div Mod
- logical: not and or xor shl shr << >>
- boolean: not and or xor
- string:
- set:
  - + : Union
  - - : Difference
  - >< : Symmetric Difference
  - <= : contains
  - include : include an element in the set
  - exclude : exclude an element from the set
  - in : check whether an element is in the set
- relational:
  - = : equal
  - <> : Not Equal
  - < : strictly less than
  - > : strictly greater than
  - <= : less than or equal
  - >= : greater than or equal
  - in : Element of
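Several of the symbolic operators listed above share a first character (<, <=, <>, <<), so a scanner has to prefer the longest match. A minimal sketch, assuming the scanner works on a string and a position; SYMBOL_OPERATORS and match_operator are hypothetical names, and word operators such as Div or Mod would be handled by the identifier path instead.

```python
# Symbolic operators from the list above, longest first so that
# "<=" is tried before "<".
SYMBOL_OPERATORS = sorted(
    ["+", "-", "*", "/", "=", "<>", "<", ">", "<=", ">=", "><", "<<", ">>"],
    key=len,
    reverse=True,
)


def match_operator(text, pos):
    """Return the longest operator starting at text[pos], or None."""
    for op in SYMBOL_OPERATORS:
        if text.startswith(op, pos):
            return op
    return None
```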
class:
separators: white_space, new_line
constants: float, integer, literal
Literals:
- Integer (TK_INTLIT, value)
- Real (TK_REALLIT, index -> real table)
- Chars (TK_INTLIB | TK_CHARLIT, index -> string table)
- Strings (TK_STRLIT, index -> string table)
Keywords (individual type, --)
Identifiers (TK_ID, --, curname)
EOL? EOF? (TK_EOF, --)
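The "index -> table" entries above suggest that real and string literals are stored once in a side table while the token carries only the index. A small sketch of that idea, assuming tokens are (kind, index) pairs; LiteralTable, intern, and the two helper functions are hypothetical names.

```python
class LiteralTable:
    """Stores each distinct literal once and hands out its index."""

    def __init__(self):
        self.items = []       # index -> literal
        self.index_of = {}    # literal -> index

    def intern(self, value):
        if value not in self.index_of:
            self.index_of[value] = len(self.items)
            self.items.append(value)
        return self.index_of[value]


string_table = LiteralTable()
real_table = LiteralTable()


def string_literal_token(text):
    # Token value is the index into the string table, not the text itself.
    return ("TK_STRLIT", string_table.intern(text))


def real_literal_token(text):
    # Token value is the index into the real table.
    return ("TK_REALLIT", real_table.intern(float(text)))
```

Repeated literals reuse the same index, so the tables stay small and the token itself stays a fixed-size record.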
TOKEN: int curtoken; int curtokenvalue; string curname; curfile; curline; curcol;
(TK_ID, --, curvalue) where -- is TK_A_VAR, index -> symtable
use LL parser style.