This project imitates the Hadoop / HiveQL compiler. It takes HiveQL queries as a text file together with data files (CSV files), parses the queries, and generates map/shuffle/reduce actions written in Python that correspond to the entered query. Finally, it executes the Python code to produce the final query result.
- Implementation language: Java SE (JDK 1.8.0_25).
- IDE: IntelliJ IDEA Community Edition (2018.3.5).
- Parser tool: ANTLR 4.7.2 (JAR).
- Templates for code generation: StringTemplate 4.1 (JAR).
- Executing the generated code: Python 3.7.3 interpreter.
This project parses the following statements:
- HiveQL queries (create table statements and full select statements).
- C++ statements (function declarations, variable declarations, for statements, if statements, and assignment statements).
- HiveQL combined with C++ statements (assigning a query to a variable, assigning a query result to a variable).
Full syntax checking: each syntax error is underlined in the console and reported with its line and column number.
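As an illustration of the error reporting described above, here is a minimal Python sketch of how an offending token could be underlined. The function name `underline_error` and the message format are hypothetical, not taken from the project. The sketch assumes a 1-based line number and a 0-based column, which is the convention ANTLR uses for token positions.

```python
def underline_error(source: str, line: int, column: int, length: int = 1) -> str:
    """Build a console message that underlines a syntax error.

    line is 1-based and column is 0-based (assumed convention).
    length is the number of characters to underline.
    """
    text = source.splitlines()[line - 1]
    # Pad with spaces up to the error column, then draw the carets.
    marker = " " * column + "^" * max(length, 1)
    return f"line {line}:{column} syntax error\n{text}\n{marker}"


if __name__ == "__main__":
    # "FORM" is a misspelled keyword starting at column 12.
    print(underline_error("SELECT name FORM users", 1, 12, 4))
```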
- Stores all primitive types (Int, Bool, String, Real) in a binary file by default.
- Stores new types (tables) in a binary file, along with each table's delimiter and the location of its CSV files, as added by create statements.
Creates a symbol table that represents the declarative statements of the input.
Creates an AST for full select statements.
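A scoped symbol table of the kind mentioned above can be sketched in a few lines. The project implements its symbol table in Java. The Python version below is only an illustration, and the class and method names (`SymbolTable`, `declare`, `lookup`) are assumptions rather than the project's actual API.

```python
class SymbolTable:
    """A stack of scopes, each mapping a name to its declared type."""

    def __init__(self):
        self.scopes = [{}]  # the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_name):
        # A name may be declared at most once in the same scope.
        current = self.scopes[-1]
        if name in current:
            raise NameError(f"'{name}' is already declared in this scope")
        current[name] = type_name

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"'{name}' is not declared")
```

Redeclaring a name in an inner scope shadows the outer declaration, while a second declaration in the same scope or a lookup of an unknown name raises an error.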
- Error for using an undeclared variable.
- Error for using a nonexistent column of a type.
- Error for multiple declarations: a variable may be declared at most once in the same scope.
- Error for using an undeclared type (such as a table).
- Error for calling an undeclared method.
- If there is a group by clause, every non-aggregated column in the select statement must appear in it.
- A having clause may contain only aggregate (grouping) functions.
- A group by clause cannot contain an aggregate function.
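The group by rule in the list above can be sketched as a small check. The representation of select items as `(function, column)` pairs and the name `check_group_by` are assumptions made for this example. The project performs its checks in Java on the parsed input.

```python
def check_group_by(select_items, group_by_columns):
    """Return error messages for select items that violate the group by rule.

    select_items: list of (function_or_None, column) pairs, for example
    ("sum", "salary") for SUM(salary) or (None, "dept") for a plain column.
    """
    grouped = set(group_by_columns)
    errors = []
    for func, column in select_items:
        # A plain column is only allowed if it is one of the grouping columns.
        if func is None and column not in grouped:
            errors.append(f"column '{column}' must appear in the group by clause")
    return errors
```

For example, selecting `dept`, `SUM(salary)`, and `name` while grouping only by `dept` reports an error for `name`.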
- Generates map, shuffle, and reduce actions as Python code (the target language).
- The code generator can generate code for select statements that include:
- aggregate functions (sum, count, avg, min, max)
- group by clause
- where clause
- order by clause
- Executes the generated Python mappers, shuffler, and reducers in the suitable order on the input data files (CSV files) and writes the final query result to a text file.
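To make the map, shuffle, and reduce stages concrete, here is a hand-written sketch of what such code might look like for a query like `SELECT dept, SUM(salary) FROM emp WHERE salary > 1000 GROUP BY dept ORDER BY dept`. It is not the output of the project's generator. The table, the column names, and the function names are all hypothetical, and the CSV is read from a string to keep the example self-contained.

```python
import csv
import io
from collections import defaultdict


def mapper(rows):
    """Apply the where clause and emit (group key, value) pairs."""
    for row in rows:
        salary = float(row["salary"])
        if salary > 1000:  # where salary > 1000
            yield row["dept"], salary


def shuffler(pairs):
    """Collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(groups):
    """Aggregate each group with sum, then order the result by key."""
    return sorted((key, sum(values)) for key, values in groups.items())


def run(csv_text):
    rows = csv.DictReader(io.StringIO(csv_text))
    return reducer(shuffler(mapper(rows)))


if __name__ == "__main__":
    data = "dept,salary\nit,2000\nhr,500\nit,1500\nhr,3000\n"
    for dept, total in run(data):
        print(dept, total)
```

The three stages are kept as separate functions so that they run in the same order as the pipeline described above: map, then shuffle, then reduce.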