analyzer
Folders and files
Name | Name | Last commit date | ||
---|---|---|---|---|
parent directory.. | ||||
README file for Semantic Analyzer (C++ edition) ====================================================== The directory should now contain the following files: Makefile README ast-lex.cc -> [cool root]/src/analyzer/ast-lex.cc ast-parse.cc -> [cool root]/src/analyzer/ast-parse.cc bad.cl cgen -> [cool root]/etc/../lib/.i cool-tree.cc -> [cool root]/src/analyzer/cool-tree.cc cool-tree.h cool-tree.handcode.h dumptype.cc -> [cool root]/src/analyzer/dumptype.cc good.cl handle_flags.cc -> [cool root]/src/analyzer/handle_flags.cc mycoolc -> [cool root]/src/analyzer/mycoolc mysemant -> [cool root]/src/analyzer/mysemant semant-phase.cc -> [cool root]/src/analyzer/semant-phase.cc semant.cc semant.h stringtab.cc -> [cool root]/src/analyzer/stringtab.cc symtab_example.cc -> [cool root]/src/analyzer/symtab_example.cc tree.cc -> [cool root]/src/analyzer/tree.cc utilities.cc -> [cool root]/src/analyzer/utilities.cc class-tree.cc -> [cool root]/src/analyzer/class-tree.cc class-visitor.cc -> [cool root]/src/analyzer/class-visitor.cc environment.cc -> [cool root]/src/analyzer/environment.cc method-environment.cc -> [cool root]/src/analyzer/method-environment.cc node-children.cc -> [cool root]/src/analyzer/node-children.cc object-environment.cc -> [cool root]/src/analyzer/object-environment.cc propagate-class.cc -> [cool root]/src/analyzer/object-environment.cc scope-check.cc -> [cool root]/src/analyzer/scope-check.cc segment-tree.cc -> [cool root]/src/analyzer/segment-tree.cc semant-error.cc -> [cool root]/src/analyzer/semant-error.cc type-check.cc -> [cool root]/src/analyzer/type-check.cc type-table.cc -> [cool root]/src/analyzer/type-table.cc union-find.cc -> [cool root]/src/analyzer/union-find.cc validator.cc -> [cool root]/src/analyzer/validator.cc install-basic-classes.cc -> [cool root]/src/analyzer/install-basic-classes.cc The include (.h) files can be found in [cool root]/include/analyzer The Makefile contains targets for compiling and running your program. DO NOT MODIFY. The README contains this info. Part of the assignment is to fill the README with the write-up for your project. You should explain design decisions, explain why your code is correct, and why your test cases are adequate. It is part of the assignment to clearly and concisely explain things in text as well as to comment your code. Just edit this file. good.cl and bad.cl test a few features of the semantic checker. You should add tests to ensure that good.cl exercises as many legal semantic combinations as possible and that bad.cl exercises as many kinds of semantic errors as possible. semant.h contains declarations and definitions for the semantic analyzer. Place class definitions for the structures you will use here. cool-tree.aps contains the definitions for the tree language which you use to construct the abstract syntax tree (AST). From this file, cool-tree.h and cool-tree.cc are automatically generated by a utility that compiles the specification into C++ functions for producing and consuming the tree nodes. This file is provided for your reference. DO NOT MODIFY. tree.{cc|h} contain definitions used by the tree package. DO NOT MODIFY. cool-tree.h, and cool-tree.handcode.h specify and give an implementation of Cool ASTs (see the README for PA3 and the "Cool Tour"). In this assignment, you will need to add functions to the AST classes to store, fetch, and compute information about the AST. Note that cool-tree.handcode.h differs slightly from the file supplied for PA3. You should NOT remove any definitions that are already present in cool-tree.h and cool-tree.handcode.h. These functions and data members are required for the system to function properly. You should add any fields and methods to the classes you need to perform semantic analysis. You will need to add, for example, methods which traverse the expressions of the tree and implement the type-checking rules. cool-tree.cc contains definitions of the provided methods, and instantiations of the template for the list handling functions. You should not modify this file, but place definitions of all methods you add to cool-tree.h or cool-tree.handcode.h in semant.cc. DO NOT MODIFY cool-tree.cc semant.cc is the file in which you should write your semantic analyzer. The main() procedure calls the method `semant' on `ast_root', the root of the abstract syntax tree generated by the parser. There are methods supplied that you should use to report errors. You are relatively free in how you decide to structure the semantic checker, but don't modify the error printing routines. ast-lex.cc and ast-parse.cc implement a lexer and a parser for reading text representation of ASTs from console in the format produced by the parser phase. DO NOT MODIFY. semant-phase.cc contains a test driver for semantic analysis. The main program reads an AST in text form from standard input, parses it, and then produces a type-annotated AST on standard output. The script mycoolc can pass any of the standard flags to the semantic analyzer as well; for this assignment, -s (semantic analysis debug) may be useful as it sets a global variable semant_debug to true (1). If you want your semantic checker to print debug information when the option is set, write your debug code in the following format: if (semant_debug) { ... } semant_debug is provided as a convenience. You don't need to use the debugging flags if you don't want to. DON'T MODIFY semant-phase.cc symtab.h contains a symbol table implementation. Read the comments in the file, the "Cool Tour", and look at the example in symtab_example.cc. You are not required to use this code, but you may find it useful. DO NOT MODIFY. Instructions ------------ To compile the example use of the symbol table, type % make symtab_example % ./symtab_example To compile your semantic analyzer program type: % make semant To test your semantic checker, type: % ./mysemant good.cl mysemant is a version of mycoolc that omits code generation. mysemant parses all the cool files given on the command line and builds a single abstract syntax tree containing all class definitions appearing in the input files. Your semantic checker is then called on this abstract syntax tree. If there are no errors, the program produces a type-annotated abstract syntax tree as output. To run your checker on the files good.cl and bad.cl type: % make dotest If you think your semantic checker is correct and behaves like the one we wrote, you can try to run mycoolc using your checker, your parser and also your lexical analyzer if you choose (see below for instructions). Remember if your lexer, parser or checker behaves in an unexpected manner, you may get errors anywhere. If you change architectures you must issue % make clean when you switch from one type of machine to the other. If at some point you get weird errors from the linker, you probably forgot this step. ---8<------8<------8<------8<---cut here---8<------8<------8<------8<--- Author Abdullah Emad Semantic Analysis Phases -------------------------- The implementation of this semantic analyzer consists of 3 main stages that must succeed before the program is declared to be error free. Each of these phases may require at least one whole or a partail pass through the abstract syntax tree (AST). The 3 stages are: 1. Preprocessing and Validation 3. Environment population 4. Type and Scope checking The first phase starts by adding all the basic classes (Object, IO, Int, Bool, Str) to the list of classes in the AST so that later on they can be referenced without raising an undefined class error. The next phase of the preprocessing stage propagates each class to all its AST node children. In other word, this preprocessing step makes sure that each and every node in the AST knows which class it belongs to. This is useful for error reporting and other steps that does not include an environment. The validation in the first stage makes sure that classes and their inheritance relation are well formed, all the types are declared and used correctly and does a preliminary check for redefinitions and duplicates. "Preliminary" because this redefinition check is not sufficient for all cases but rather necessary to be done before the main check is done. On error detection in the validation stage, one of three actions is to be taken, after reporting the error of course. The analyzer might chose to terminate a program after detecting a fatal error; this is typically the action taken if any problem with the classes is detected. The second action is a corrective one; if a type is used incorrectly, the analyzer will replace this type with the type "Object", so that this does not cause a problem or unnecessary error messages in a later stage. The third action is marking the node that has the problem as a faulty node. Later on, faulty nodes may be ignored by analyzer. The validation on the classes tries to detect several problems in the definitions of the class. First, it makes sure that the Main class and the main method are defined accordingly and with the correct signature. Moreover, the analyzer will check for duplicate class and features definitions. While this is necessary at this stage, it does not detect all kinds of redefinitions. It only detects redefinitions that occurs within the class only but not invalid redefinitions of inherited features, which is detected at a later stage. Next, misuse of reserved identifiers and redefinition of basic classes are detected as well as invalid inheritance such as inheriting from Int, Bool or String. Finally, cycles in the inheritance graph are detected and reported accordingly to the user. The validations on other nodes are mainly done to detect invalid type declarations and correcting them for later stages in the semantic analysis. This includes invalid usage of SELF_TYPE. The second stage in the analysis is populating the environment. The purpose of this stage is to insert all the methods in the environment and detect invalid redefinitions of methods. For each class, all the methods defined in this class or any of its ancestors are recorded in the environment under this node. Later on, when a look up of a certain method is needed, the environment can be searched using class name and method name. The third and main stage of the analysis is type and scope checking. Scope checking is done as part of type checking since type checking at some nodes cannot proceed if the node does not scope check. Scope checking is only necessary at 4 tree nodes: Object node, dispatch node, static dispatch node, and assignment node. Before type checking these 4 nodes, scope checking is performed on them. If the node does not scope check, the type checking will not proceed. Classes and Datastructures --------------------------- It was necessary to modify the tree nodes in order to perform new actions and keep some relevant state. Moreover, some classes and datastructures were implemented for the algorithms used in semantic analysis. Those include: 1. SemantErrorHandler 2. Environment 3. ObjectEnvironment 4. MethodEnvironment 5. ClassTree 6. SegmentTree 7. TypeTable 8. UnionFind 9. ClassVisitor The SemantErrorHandler is a global singleton class for error handling. It is used to store and report semantic errors in the program. Whenever an error is detected, the appropriate error object is created and passed to the "report" function. Fatal errors are reported through "report_fatal," which will terminate after reporting the error. After all the semantic stages are done executing, the "terminate_on_error" function must be called. This function will print the errors detected on the screen, sorted by the line number at which they occur, and then will terminate with an error code. If no errors were reported the handler, "terminate_on_errors" will do nothing and will just return. The Environment serves as an interface and a container for the object and method environments. Interactions with the object and method environments must be done through this class. The ObjectEnvironment serves as a storage for all the objects and identifiers defined in the current context. The ObjectEnvironment itself, does not handle redefined or undefined idenitifers errors. Instead, it should be used externally to detect such errors. When a new identifier is added to the ObjectEnvironment, it will shadow any other identifier with the same name. Thus, when looking up a specific identifier, the most recently added identifier that has the same name will be returned. Conversly, when removing an identifier, the most recently added identifier will be removed. All the operations in this datastructures has a time complexity of O(1), while the space complexity of this datastructure is O(n). The MethodEnvironment stores all the methods defined under a specific class in the program. Like the ObjectEnvironment, this class does not detect duplication on its own and assumes that if given a duplicate method, it is a child redefinition of a method in the parent. Moreover, it does not detect undefined methods. The MethodEnvironment stores the method along with its signature under a specific class. The signature can be later on looked up and the existance of a given method under a given class can be checked. ClassTree is the core of this semantic analyzer and the most important part of it. The ClassTree is used mainly to store and represent the inheritance relations between classes in the program, including the basic classes. The ClassTree assumes that the inheritance graph is acyclic and all the class definitions are valid; Thus its initialization must occur after the validation, which will terminate the program if errors on the class are detected. The ClassTree is used for three main purposes. First, it is used to traverse the tree in depth first search fashion through the visit_all function that follows the visitor design pattern. Second, it is used to perform the least upper bound (LUB) operation on two classes. This is the extended LUB operation that handles SELF_TYPE. Moreover, it treats a special "No_type" type as a child of all the other types. Finally, it supports the extended A <= B operation, which returns true if A is a child of B. The space complexity of the ClassTree is O(n). The time complexity of visit_all is O(n), while the time complexities of the LUB and <= operations are both O(log(n)) SegmentTree is a segment tree that returns the minimum item within a range of an array. It stores the classes in the ClassTree with their depth in the inheritance graph and is used to query the the Class with the minimum depth within a given range. Mainly used for the LUB operation TypeTable is a very simple datastructure. It stores all the types defined in the program, including the basic types and SELF_TYPES, and is used to query whether a given type is predefined, reserved, etc. This is used to detect undefined types and misuse of basic and reserved types. UnionFind is a union find datastructre (disjoint sets). It is used to quickly detect cycles in the inheritance graph. VisitorClass is an abstract class that is used for the visit_all operation. It is basically a visitor on a class that performs a certain operation. A class that implements a ClassVisitor must implement 3 functions: on_enter, visit, on_exit. on_enter is the first thing that gets executed as you enter the recursive function. visit is executed right before the children are recursivly visited. on_exit is executed right before the function returns. Refer to the UML diagram in the docs folder for an overview of the classes and their relationships Algorithms and Design Choices ------------------------------ Traversing the abstract syntax tree is a simple DFS traversal on a tree that is done using OOP abstraction. It is necessary to traverse the tree several times according to the design choices made and each traversal is very similar to the others (with the exception of the validation traversal) in terms of the order of doing things. The first traversal happens during the first stage. A tree node "n" is passed a Class pointer, which is said to be the ancestor class of "n". "n" will set its contianing class as the given class and will recursivly pass the given class to all its children. The second traversal that occurs is the validation traversal. The validation traversal, as an exception, happens in a breadth first search (BFS) order. This is mainly because, we need to validate all the classes before doing any validations on the other nodes (since errors on classes are fatal). Hence, a BFS traversal will garuantee that classes are visited before their children. The third and forth traversal occurs when populating the Environment and type/scope checking respectively. Environment population is a top-down DFS traversal. The environment is populated than children are visited. The type/scope checking, on the otherhand, is a bottom up DFS traversal. Children are type/scope checked first, then the parent is type checked. In both cases, classes are visited in a special order rather then sequentially. Visiting the classes happens through the ClassTree visit_all visitor pattern. ClassTree visit_all -------------------- 1. Enter 2. visit current class 3. recursively visit all the children 4. Exit Environment Population ---------------------- 1. Add all the methods to the environment 2. visit all the children the classes are traversed this way, rather than simply sequentially traversing them, because it is important to add all its methods to the method environment along with all the methods defined in any of its ancestors. The first step will add all methods in the environment defined under its direct parent along with all the methods defined under the given class. It is arguable using a proof by induction that only adding methods defined under the parent class is enough to add all and only the methods defined in any of its ancestors. The sketch of the proof would be as follows: Claim: The algorithm given above will result in an environment where every class in the environment has a reference to its methods and all the methods defined in any of its ancestors First let us define the following predicate and function: has_all_methods(env, class): true if all necessary methods are in the envirnoment apply_algo(env, class, parent): returns a environment with all the methods from the class and all the methods placed under parent added under class. 1. Base Case: Object class. The object class does not have any parent and thus this trivially holds. 2. Inductive hypothesis: has_all_methods(env, parent) -> has_all_methods(apply_algo(env, class, parent), class) 3. Inductive Step If we are at class Ci and it happens that has_all_methods(env, Ci) holds. In otherwords, the environment is populated with correct values for class Ci (i.e methods of Ci and all its ancestors). Then for a child class Cj apply_algo will add all the methods in the environment + the methods in Cj, and because the environment is populated with the necessary functions, then it must be the case that after visiting Cj the environment will be populated with the correct methods under Cj Type/Scope Checking ------------------- 1. Add the attributes to the environment 2. type check this class 3. visit all the children classes 4. remove the attributes from the environment Using a similar reasoning, it is necessary for classes to be traversed in this order for the sake of populating the environment with the attributes of the inherited classes as well. The proof sketched above would also work in this case. However, due to the additional step 4, it will need to be extended. Basically, we need to remove the attributes from the environment because they are not bounded to a certain class and as we go up the tree one level, two things are known for a fact due to the nature of the DFS algorithm. First, all the children of the current node has been visited already. Consequently, it is safe to remove the attributes that belongs to this class only. Second, because no other class that has not been visited yet inherits from the current class, it is important to remove these attributes since they are no longer relevant. Inheritance Graph Cycle Check ----------------------------- this is done using a unionfind datastructure. The algorithm is as follows: 1. for each parent-child edge 2. if parent and child are in the same component 3. report fatal error and terminate 4. union the parent the child components LUB and <= operations ----------------------- These operations are done by the ClassTree. LUB(C1, C2): 1. if type1 == type2 return type1 2. if type1 == no_type return type2 3. if type2 == no_type return type1 4. if type1 == SELF_TYPE type1 == current_class 5. if type2 == SELF_TYPE type2 == current_class 6. return Least Common ancestor of type1 and type2 step 1 is necessary because if type1 and type2 are both SELF_TYPE, then their LCA is SELF_TYPE according to the rules of the extended LUB. However, if only one is SELF_TYPE, then we just replace it by the current class and take their least common ancestor. By design, no_type is <= any other type For step 6, the RMQ algorithm for finding LCA using segment trees is used to quickly calculate the LCA. C1 <= C2: 1. true if C1 == C2 2. true if C2 != SELF_TYPE and lub(C1, C2) == C2 3. false otherwise Remember, for C1 != SELF_TYPE, C1 <= SELF_TYPE is always false. Having C1 inherit from C2 means that C2 is the least common ancestor for C1.