From 2a696e0a9dc6c4583dc97cd06e19349347f434ef Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 5 Jan 2022 13:50:27 +0100 Subject: [PATCH 001/142] Init with structure --- chapter/methodology.md | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index e69de29b..65068ede 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -0,0 +1,39 @@ +# Method + +## Nickel AST + +### Basic Elements + +### Meta Information + +### Records + +### Static access + +## Linearization + +### States + +### Distinguished Elements + +### Transfer from AST + +#### Retyping + +### Post-Processing + +### Resolving Elements + +## LSP Server + +### Diagnostics and Caching + +### Capabilities + +#### Hover + +#### Completion + +#### Jump to Definition + +#### Show references From eb084401d667dcbfa8775e1a31c112a990926afa Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 5 Jan 2022 15:45:32 +0100 Subject: [PATCH 002/142] Chapter introduction --- chapter/methodology.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index 65068ede..2c0ef3f3 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -1,5 +1,12 @@ # Method +This chapter contains a detailed guide through the various steps and components of the Nickel Language Server (NLS). +Being written in the same language (Rust[@rust]) as the Nickel interpreter allows NLS to integrate existing components for language analysis. +Complementary, NLS is tightly coupled to Nickel's Syntax definition. +Hence, in [@sec:nickel] this chapter will first detail parts of the AST that are of particular interest for the LSP and require special handling. +Based on that [@sec:linearization] will introduce the main datastructure underlying all higher level LSP interactions and how the AST is transformed into this form. 
+Finally, in [@sec:lsp-server] the implementation of current LSP features is discussed on the basis of the previously reviewed compontents. + ## Nickel AST ### Basic Elements From bfdf4b1cd6210a5f6bec923546f6318b6ff28f10 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 5 Jan 2022 16:56:07 +0100 Subject: [PATCH 003/142] Static Nickel --- chapter/methodology.md | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index 2c0ef3f3..9a7e98ed 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -9,8 +9,29 @@ Finally, in [@sec:lsp-server] the implementation of current LSP features is disc ## Nickel AST + +Nickel's Syntax tree is a single sum type, i.e. an enumeration of node types. +Each enumeration variant may refer to child nodes, representing a branch or hold terminal values in which case it is considered a leaf of the tree. +Additionally, nodes are parsed and represented, wrapped in another structure that encodes the span of the node and all its potential children. + ### Basic Elements +The data types of the Nickel language are closely related to JSON +On the leaf level, Nickel defines `Boolean`, `Number`, `String` and `Null` types +In addition to that the language implements native support for `Enum` values. + +Completing JSON compatibility, `List` and `Record` types are present as well. +Records on a syntax level are HashMaps, uniquely associating an identifier with a sub-node. + +These data types constitute a static subset of Nickel which allows writing JSON compatible expressions as shown in [@lst:nickel-static]. 
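The patch describes Nickel's syntax tree as a single sum type whose nodes are also wrapped in a structure that carries their source span. The Rust sketch below shows that shape. `Term`, `RichTerm` and `Span` are simplified, hypothetical definitions for illustration and are not the interpreter's actual types. Leaf variants hold terminal values, while `List` and `Record` refer to child nodes.

```rust
use std::collections::HashMap;

/// Byte range of a node in its source file (hypothetical, simplified).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical node type: one enum variant per kind of node.
#[derive(Debug, Clone, PartialEq)]
pub enum Term {
    // Leaves: terminal values without children.
    Bool(bool),
    Num(f64),
    Str(String),
    Null,
    Enum(String),
    // Branches: variants that refer to child nodes.
    List(Vec<RichTerm>),
    Record(HashMap<String, RichTerm>),
}

/// Wrapper pairing every node with the span covered by it and its children.
#[derive(Debug, Clone, PartialEq)]
pub struct RichTerm {
    pub term: Box<Term>,
    pub span: Span,
}

impl RichTerm {
    pub fn new(term: Term, start: usize, end: usize) -> Self {
        RichTerm { term: Box::new(term), span: Span { start, end } }
    }

    /// A node is a leaf if its variant carries no child nodes.
    pub fn is_leaf(&self) -> bool {
        !matches!(*self.term, Term::List(_) | Term::Record(_))
    }

    /// Count all nodes in this subtree, including the node itself.
    pub fn size(&self) -> usize {
        1 + match &*self.term {
            Term::List(items) => items.iter().map(RichTerm::size).sum::<usize>(),
            Term::Record(fields) => fields.values().map(RichTerm::size).sum::<usize>(),
            _ => 0,
        }
    }
}
```

Storing the span in the wrapper, and not in each variant, gives leaves and branches a uniform way to report their position.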
+ +```{.nickel #lst:nickel-static caption="Example of a static Nickel expression"} +{ + list = [ 1, "string", null], + "enum value" = `Value +} +``` + ### Meta Information ### Records From 51a1876d6c451b1151b9dea93338a2523d139b8e Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 6 Jan 2022 16:59:11 +0100 Subject: [PATCH 004/142] Fixes to existing content --- chapter/methodology.md | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 9a7e98ed..2ec8837c 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -3,7 +3,7 @@ This chapter contains a detailed guide through the various steps and components of the Nickel Language Server (NLS). Being written in the same language (Rust[@rust]) as the Nickel interpreter allows NLS to integrate existing components for language analysis. Complementary, NLS is tightly coupled to Nickel's Syntax definition. -Hence, in [@sec:nickel] this chapter will first detail parts of the AST that are of particular interest for the LSP and require special handling. +Hence, in [@sec:nickel-ast] this chapter will first detail parts of the AST that are of particular interest for the LSP and require special handling. Based on that [@sec:linearization] will introduce the main datastructure underlying all higher level LSP interactions and how the AST is transformed into this form. Finally, in [@sec:lsp-server] the implementation of current LSP features is discussed on the basis of the previously reviewed compontents. @@ -16,22 +16,27 @@ Additionally, nodes are parsed and represented, wrapped in another structure tha ### Basic Elements -The data types of the Nickel language are closely related to JSON -On the leaf level, Nickel defines `Boolean`, `Number`, `String` and `Null` types +The data types of the Nickel language are closely related to JSON. +On the leaf level, Nickel defines `Boolean`, `Number`, `String` and `Null`. 
In addition to that the language implements native support for `Enum` values. +Each of these are terminal leafs in the syntax tree. -Completing JSON compatibility, `List` and `Record` types are present as well. -Records on a syntax level are HashMaps, uniquely associating an identifier with a sub-node. +Completing JSON compatibility, `List` and `Record` constructs are present as well. +Records on a syntax level are HashMaps, uniquely associating an identifier with a sub-node. These data types constitute a static subset of Nickel which allows writing JSON compatible expressions as shown in [@lst:nickel-static]. ```{.nickel #lst:nickel-static caption="Example of a static Nickel expression"} { - list = [ 1, "string", null], - "enum value" = `Value + list = [ 1, "string", null], + "enum value" = `Value } ``` + + +Building on that Nickel also supports variables and functions which make up the majority of the AST stem. + ### Meta Information ### Records From f10cea006b2b1fb0452caa157233f80d251fece9 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 6 Jan 2022 16:59:41 +0100 Subject: [PATCH 005/142] Meta information section --- chapter/methodology.md | 48 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 47 insertions(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 2ec8837c..e8da99db 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -39,7 +39,53 @@ Building on that Nickel also supports variables and functions which make up the ### Meta Information -### Records +One key feature of Nickel is its gradual typing system [ref again?], which implies that values can be explicitly typed. +Complementing type information it is possible to annotate values with contracts and additional meta-data such as documentation, default values and merge priority a special syntax as displayed in [@lst:nickel-meta]. 
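The sketch below shows how annotations could end up in the tree. It models the virtual `MetaValue` node as an extra enum variant that wraps the annotated subtree, using the `let x: Num = 5 in x` example from this patch. `Node`, `Meta` and the string-typed annotation are hypothetical simplifications and do not reproduce Nickel's real `MetaValue` definition. The `peel` function shows one way a consumer such as a language server might strip the wrappers while collecting their metadata.

```rust
/// Annotations attached by the parser (hypothetical, simplified).
#[derive(Debug, Clone, PartialEq, Default)]
pub struct Meta {
    pub doc: Option<String>,
    pub types: Option<String>,
    pub contracts: Vec<String>,
    pub default: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Num(f64),
    Var(String),
    Let { name: String, bound: Box<Node>, body: Box<Node> },
    /// Virtual node created while parsing: it describes its subtree
    /// and has no counterpart in the unannotated program.
    MetaValue { meta: Meta, inner: Box<Node> },
}

/// Wrap a node in a MetaValue that carries a type annotation.
pub fn annotate(node: Node, ty: &str) -> Node {
    Node::MetaValue {
        meta: Meta { types: Some(ty.to_string()), ..Meta::default() },
        inner: Box::new(node),
    }
}

/// Strip all MetaValue wrappers, returning their metadata (outermost first)
/// together with the first non-annotation node.
pub fn peel(mut node: &Node) -> (Vec<&Meta>, &Node) {
    let mut metas = Vec::new();
    while let Node::MetaValue { meta, inner } = node {
        metas.push(meta);
        node = &**inner;
    }
    (metas, node)
}
```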
+ + +```{.nickel #lst:nickel-meta caption="Example of a static Nickel expression"} +let Contract = { + foo | Num + | doc "I am foo", + hello | Str + | default = "world" + } + | doc "Just an example Contract" +in +let value | #Contract = { foo = 9 } +in value == { foo = 9, hello = "world"} + +> true +``` + +Internally, the addition of annotations wraps the annotated term in a `MetaValue` structure, that is creates an artificial tree node that describes its subtree. +Concretely, the expression shown in [@lst:nickel-meta-typed] translates to the AST in [@fig:nickel-meta-typed]. +The green `MetaValue` box is a virtual node generated during parsing and not present in the untyped equivalent. + +```{.nickel #lst:nickel-meta-typed caption="Example of a typed expression"} +let x: Num = 5 in x +``` + + +```{.graphviz #fig:nickel-meta-typed caption="AST of typed expression"} +strict digraph { + graph [fontname = "Fira Code"]; + node [fontname = "Fira Code"]; + edge [fontname = "Fira Code"]; + + meta [label="MetaValue", color="green", shape="box"] + let [label = "Let('x')"] + num [label = "Num(5)"] + var [label = "Var('x')"] + + meta -> let + let -> num + let -> var +} +``` + + +### Recursive Records ### Static access From 663a6b920cca7a6e78256207a5c063ab5a259cf8 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 6 Jan 2022 18:11:50 +0100 Subject: [PATCH 006/142] Start on Record Shorthands --- chapter/methodology.md | 28 +++++++++++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index e8da99db..bf1bc465 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -85,10 +85,36 @@ strict digraph { ``` -### Recursive Records ### Static access + + +### Record Shorthand + +Nickel supports a shorthand syntax to efficiently define nested records. 
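One way to read the shorthand is as sugar for nested single-field records: `deeply.nested.record.field = true` expands to the explicit nesting given in the listings of this patch. The following sketch performs that expansion on a toy `Value` type, which is hypothetical and unrelated to Nickel's AST. It handles a single path only and does not attempt to merge sibling paths.

```rust
use std::collections::BTreeMap;

/// Toy value type (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    Record(BTreeMap<String, Value>),
}

/// Expand a dotted field path into explicitly nested records,
/// e.g. `a.b.c = v` becomes `{ a = { b = { c = v } } }`.
pub fn expand_path(path: &str, value: Value) -> Value {
    // Walk the path from the innermost field outwards,
    // wrapping the accumulated value in one record per segment.
    path.rsplit('.').fold(value, |inner, field| {
        let mut fields = BTreeMap::new();
        fields.insert(field.to_string(), inner);
        Value::Record(fields)
    })
}
```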
+As a comparison the example in [@lst:nickel-record-shorthand] uses the shorthand syntax with resolves to the semantically equivalent record defined in [@lst:nickel-record-no-shorthand] + +```{.nickel #lst:nickel-record-shorthand caption="Nickel record using shorthand"} +{ + deeply.nested.record.field = true; +} +``` + +```{.nickel #lst:nickel-record-no-shorthand caption="Nickel record defined explicitly"} +{ + deeply = { + nested = { + record = { + field = true + } + } + } +} +``` + +Yet, on a syntax level different Nickel generates a different representation. + ## Linearization ### States From 3b2d52c0c181edac9d4cfca6ae22260ebb0656b8 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 6 Jan 2022 19:19:16 +0100 Subject: [PATCH 007/142] Add subsubsection for static record access --- chapter/methodology.md | 45 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 44 insertions(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index bf1bc465..9bf71fbe 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -85,8 +85,51 @@ strict digraph { ``` +### Nested Record Access -### Static access +Nickel supports the referencing of variables which are represented as `Var` nodes that are resolved during runtime. +With records bound to a variable, a method to access elements inside that record is required. +The access of record members is represented using a special set of AST nodes depending on whether the member name requires an evaluation in which case resolution is deferred to the evaluation pass. +While the latter prevents static analysis of any deeper element by the LSP, `StaticAccess` can be used to resolve any intermediate reference. + +Notably Nickel represents static access chains in inverse order as unary operations which in turn puts the terminal `Var` node as a leaf in the tree. 
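Because of the inverse ordering, the last access `.z` in `x.y.z` becomes the root and `Var('x')` becomes the deepest leaf. The sketch below builds such a chain with a hypothetical `Expr` enum. Nickel itself uses unary operations for this, which the simplification does not reproduce. It then recovers the access path in source order by walking down to the `Var` leaf.

```rust
/// Hypothetical, simplified expression type.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Var(String),
    /// Static field access `record.field`, modelled as a unary node
    /// whose single child is the record being accessed.
    StaticAccess { field: String, record: Box<Expr> },
}

/// Build `base.f1.f2...`: each access wraps the previous expression,
/// so the last field ends up at the root of the tree.
pub fn build_chain(base: &str, fields: &[&str]) -> Expr {
    fields.iter().fold(Expr::Var(base.to_string()), |record, field| {
        Expr::StaticAccess { field: field.to_string(), record: Box::new(record) }
    })
}

/// Walk from the root down to the `Var` leaf, then reverse the collected
/// fields so that the path reads in source order.
pub fn access_path(expr: &Expr) -> (String, Vec<String>) {
    let mut fields = Vec::new();
    let mut current = expr;
    loop {
        match current {
            Expr::StaticAccess { field, record } => {
                fields.push(field.clone());
                current = &**record;
            }
            Expr::Var(name) => {
                fields.reverse();
                return (name.clone(), fields);
            }
        }
    }
}
```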
+Graphically, [@fig:nickel-static-access] shows the representation of the static access perfomed in [@lst:nickel-static-access] with the rest of the tree omitted. + +```{.nickel #lst:nickel-static-access caption="Nickel static access"} +let x = { + y = { + z = 1; + } +} in x.y.z +``` + + +```{.graphviz #fig:nickel-static-access caption="AST of typed expression" height=6cm} +strict digraph { + graph [fontname = "Fira Code"]; + node [fontname = "Fira Code", margin=0.25]; + edge [fontname = "Fira Code"]; + + rankdir="TD" + + let [label = "Let", color="grey"] + rec [label = "omitted", color="grey", style="dashed", shape="box"] + + x [label = "Var('x')"] + unop_x_y [label = ".", shape = "triangle", margin=0.066] + y [label = "StaticAccess('y')"] + unop_y_z [label = ".", shape = "triangle", margin=0.066] + z [label = "StaticAccess('z')"] + + + let -> rec + let -> unop_y_z + unop_y_z -> unop_x_y + unop_y_z -> z + unop_x_y -> y + unop_x_y -> x +} +``` From 8151ffa0f14305cc89c7916f11e2efd45591c650 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 6 Jan 2022 19:19:55 +0100 Subject: [PATCH 008/142] Amend wording and graphic size --- chapter/methodology.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 9bf71fbe..ab1aede8 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -67,7 +67,7 @@ let x: Num = 5 in x ``` -```{.graphviz #fig:nickel-meta-typed caption="AST of typed expression"} +```{.graphviz #fig:nickel-meta-typed caption="AST of typed expression" height=4.5cm} strict digraph { graph [fontname = "Fira Code"]; node [fontname = "Fira Code"]; @@ -135,8 +135,8 @@ strict digraph { ### Record Shorthand -Nickel supports a shorthand syntax to efficiently define nested records. 
-As a comparison the example in [@lst:nickel-record-shorthand] uses the shorthand syntax with resolves to the semantically equivalent record defined in [@lst:nickel-record-no-shorthand] +Nickel supports a shorthand syntax to efficiently define nested records similarly to how nested record fields are accessed. +As a comparison the example in [@lst:nickel-record-shorthand] uses the shorthand syntax which resolves to the semantically equivalent record defined in [@lst:nickel-record-no-shorthand] ```{.nickel #lst:nickel-record-shorthand caption="Nickel record using shorthand"} { From 7e8509d96ce0d1379e37a2edf29ce6a844911f2f Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Fri, 7 Jan 2022 15:26:49 +0100 Subject: [PATCH 009/142] Linearization intro --- chapter/methodology.md | 64 ++++++++++++++++++++++++++++++++++++------ 1 file changed, 56 insertions(+), 8 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index ab1aede8..d8562011 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -84,6 +84,46 @@ strict digraph { } ``` + ### Nested Record Access @@ -92,8 +132,8 @@ With records bound to a variable, a method to access elements inside that record The access of record members is represented using a special set of AST nodes depending on whether the member name requires an evaluation in which case resolution is deferred to the evaluation pass. While the latter prevents static analysis of any deeper element by the LSP, `StaticAccess` can be used to resolve any intermediate reference. -Notably Nickel represents static access chains in inverse order as unary operations which in turn puts the terminal `Var` node as a leaf in the tree. -Graphically, [@fig:nickel-static-access] shows the representation of the static access perfomed in [@lst:nickel-static-access] with the rest of the tree omitted. 
+Notably, Nickel represents static access chains in inverse order as unary operations which in turn puts the terminal `Var` node as a leaf in the tree. +[Figure @fig:nickel-static-access] shows the representation of the static access perfomed in [@lst:nickel-static-access] with the rest of the tree omitted. ```{.nickel #lst:nickel-static-access caption="Nickel static access"} let x = { @@ -116,17 +156,13 @@ strict digraph { rec [label = "omitted", color="grey", style="dashed", shape="box"] x [label = "Var('x')"] - unop_x_y [label = ".", shape = "triangle", margin=0.066] - y [label = "StaticAccess('y')"] - unop_y_z [label = ".", shape = "triangle", margin=0.066] - z [label = "StaticAccess('z')"] + unop_x_y [label = ".y", shape = "triangle", margin=0.066] + unop_y_z [label = ".z", shape = "triangle", margin=0.066] let -> rec let -> unop_y_z unop_y_z -> unop_x_y - unop_y_z -> z - unop_x_y -> y unop_x_y -> x } ``` @@ -158,8 +194,20 @@ As a comparison the example in [@lst:nickel-record-shorthand] uses the shorthand Yet, on a syntax level different Nickel generates a different representation. + + + ## Linearization +Being a domain specific language, the scope of analyzed Nickel files is expected to be small compared to other general purpose languages. +NLS therefore takes an *eager approach* to code analysis, resolving all information at once which is then stored in a linear data structure with efficient access to elements. +This data structure is referred to as *linearization*. +The term arises from the fact that the linearization is a transformation of the syntax tree into a linear structure which is presented in more detail in [@sec:transfer-from-ast]. +The implementation distinguishes two separate states of the linearization. +During its construction, the linearization will be in a *building* state, and is eventually post-processed yielding a *completed* state. 
+The semantics of these states are defined in [@sec:states], while the post-processing is described separately in [@sec:post-processing]. +Finally, [@sec:resolving-elements] explains how the linearization is accessed. + ### States ### Distinguished Elements From 2bf8cee400dfaa6424d869b8f0afa760973b66bb Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sat, 8 Jan 2022 00:01:07 +0100 Subject: [PATCH 010/142] First words about different states --- chapter/methodology.md | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index d8562011..cd9526b7 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -210,7 +210,18 @@ Finally, [@sec:resolving-elements] explains how the linearization is accessed. ### States -### Distinguished Elements +At its core the linearization is an array of `LinearizationItem`s which are derived from AST nodes during the linearization process. + +Closely related to nodes, `LinearizationItem`s maintain the position of their AST counterpart, as well as its type. +Unlike in the AST, metadata is directly associated with the element. +Further deviating from the AST representation, the type of the node and its kind are tracked separately. +The latter is used to distinguish between declarations of variables, records, record fields and variable usages as well as a wildcard kind for any other kind of structure, such as terminals control flow elements. + +As mentioned in the introduction NLS distinguishes a linearization in construction from a finalized one. +Both states are set apart by the auxiliary data maintained about the linearization items, the ordering of the items themselves and the resolution of their concrete types. +Additionally, both states implement a different set of methods. 
+For the `Building` state the linearization implements several methods used during the transfer of the AST and post-processing routines that defines the state transition into the `Completed` state. + ### Transfer from AST From 6c3b9f2be0cf5d32d95570c3713770a1e296f69b Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sun, 9 Jan 2022 18:15:15 +0100 Subject: [PATCH 011/142] Finish states subsection --- chapter/methodology.md | 55 ++++++++++++++++++++++++++++++++++++------ 1 file changed, 48 insertions(+), 7 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index cd9526b7..0d858e56 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -210,17 +210,58 @@ Finally, [@sec:resolving-elements] explains how the linearization is accessed. ### States -At its core the linearization is an array of `LinearizationItem`s which are derived from AST nodes during the linearization process. +At its core the linearization in either state is represented by an array of `LinearizationItem`s which are derived from AST nodes during the linearization process as well as state dependent auxiliary structures. Closely related to nodes, `LinearizationItem`s maintain the position of their AST counterpart, as well as its type. -Unlike in the AST, metadata is directly associated with the element. -Further deviating from the AST representation, the type of the node and its kind are tracked separately. +Unlike in the AST, *metadata* is directly associated with the element. +Further deviating from the AST representation, the *type* of the node and its *kind* are tracked separately. The latter is used to distinguish between declarations of variables, records, record fields and variable usages as well as a wildcard kind for any other kind of structure, such as terminals control flow elements. -As mentioned in the introduction NLS distinguishes a linearization in construction from a finalized one. 
-Both states are set apart by the auxiliary data maintained about the linearization items, the ordering of the items themselves and the resolution of their concrete types. -Additionally, both states implement a different set of methods. -For the `Building` state the linearization implements several methods used during the transfer of the AST and post-processing routines that defines the state transition into the `Completed` state. +The aforementioned separation of linearization states got special attention. +As the linearization process is integrated with the libraries underlying the Nickel interpreter, it had to be designed to cause minimal overhead during normal execution. +Hence, the concrete implementation employs type-states[@typestate] to separate both states on a type level and defines generic interfaces that allow for context dependent implementations. + +At its base the `Linearization` type is just a transparent wrapper around the particular `LinearizationState` which holds state specific data. +On top of that NLS defines a `Building` and `Completed` state. + +The `Building` state represents an accumulated created incrementally during the linearization process. +In particular that is a list of `LinearizationItems` of unresolved type ordered as they appear in a depth-first iteration of the AST. +Note that new elements are exclusively appended such that their `id` field during this phase is equal to the elements position at all time. +Additionally, the `Building` state maintains the scope structure for every item in a separate mapping. + +Once fully built a `Building` instance is post-processed yielding a `Completed` linearization. +While being defined similar to its origin, the structure is optimized for positional access, affecting the order of the `LinearizationItem`s and requiring an auxiliary mapping for efficient access to items by their `id`. +Moreover, types of items in the `Completed` linearization will be resolved. 
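The append-only construction described here can be sketched as a depth-first walk in which an item's `id` always equals its index. `Tree`, `Item`, `Building::push` and `Building::linearize` are hypothetical. Real linearization items also carry type, kind and metadata. As a deliberate simplification, every child node in this sketch opens a new scope fragment, whereas in practice only binding constructs would.

```rust
use std::collections::HashMap;

/// Hypothetical tree node: a label, a source range and children.
pub struct Tree {
    pub label: &'static str,
    pub start: usize,
    pub end: usize,
    pub children: Vec<Tree>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub id: usize,
    pub label: &'static str,
    pub start: usize,
    pub end: usize,
    pub scope: Vec<usize>,
}

#[derive(Default)]
pub struct Building {
    pub linearization: Vec<Item>,
    /// Scope (list of fragments) -> ids of items declared directly in it.
    pub scope: HashMap<Vec<usize>, Vec<usize>>,
}

impl Building {
    /// Append-only: the new id is the current length, so
    /// `linearization[id].id == id` holds throughout this phase.
    fn push(&mut self, label: &'static str, start: usize, end: usize, scope: &[usize]) {
        let id = self.linearization.len();
        self.linearization.push(Item { id, label, start, end, scope: scope.to_vec() });
        self.scope.entry(scope.to_vec()).or_default().push(id);
    }

    /// Depth-first traversal; each child gets its own scope fragment (simplification).
    pub fn linearize(&mut self, node: &Tree, scope: &mut Vec<usize>) {
        self.push(node.label, node.start, node.end, scope);
        for (i, child) in node.children.iter().enumerate() {
            scope.push(i);
            self.linearize(child, scope);
            scope.pop();
        }
    }
}
```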
+ +Type definitions of the `Linearization` as well as its type-states `Building` and `Completed` are listed in [@lst:nickel-definition-lineatization;@lst:nls-definition-building-type;@lst:nls-definition-completed-type]. +Note that only the former is defined as part of the Nickel libraries, the latter are specific implementations for NLS. + + +```{.rust #lst:nickel-definition-lineatization caption="Definition of Linearization structure"} +pub trait LinearizationState {} + +pub struct Linearization { + pub state: S, +} +``` + +```{.rust #lst:nls-definition-building-type caption="Type Definition of Building state"} +pub struct Building { + pub linearization: Vec>, + pub scope: HashMap, Vec>, +} +impl LinearizationState for Building {} +``` + +```{.rust #lst:nls-definition-completed-type caption="Type Definition of Completed state" } +pub struct Completed { + pub linearization: Vec>, + scope: HashMap, Vec>, + id_to_index: HashMap, +} +impl LinearizationState for Completed {} +``` + ### Transfer from AST From 661d0f058e59127bab047b9942a65bb23899855c Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sun, 9 Jan 2022 18:33:48 +0100 Subject: [PATCH 012/142] Reference smart pointers --- chapter/methodology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 0d858e56..da6f2480 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -221,7 +221,7 @@ The aforementioned separation of linearization states got special attention. As the linearization process is integrated with the libraries underlying the Nickel interpreter, it had to be designed to cause minimal overhead during normal execution. Hence, the concrete implementation employs type-states[@typestate] to separate both states on a type level and defines generic interfaces that allow for context dependent implementations. 
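The type-state approach can be illustrated on its own. Methods exist only on `Linearization<S>` for the state `S` that supports them, so adding items to a completed linearization is rejected at compile time. The sketch mirrors the `LinearizationState` trait and the `Linearization<S>` wrapper, and adds a `Deref` implementation to illustrate the transparent smart-pointer behaviour. The state contents and the method names `add`, `complete` and `get` are hypothetical.

```rust
use std::ops::Deref;

pub trait LinearizationState {}

/// Wrapper generic over its state. It dereferences to the state, so
/// state-specific fields are reachable directly (e.g. `lin.items`).
pub struct Linearization<S: LinearizationState> {
    state: S,
}

impl<S: LinearizationState> Deref for Linearization<S> {
    type Target = S;
    fn deref(&self) -> &S {
        &self.state
    }
}

/// Simplified states: items are plain strings here.
pub struct Building {
    pub items: Vec<String>,
}
pub struct Completed {
    pub items: Vec<String>,
}
impl LinearizationState for Building {}
impl LinearizationState for Completed {}

impl Linearization<Building> {
    pub fn new() -> Self {
        Linearization { state: Building { items: Vec::new() } }
    }

    /// Only available while building.
    pub fn add(&mut self, item: &str) {
        self.state.items.push(item.to_string());
    }

    /// Consumes the building linearization and yields the completed one.
    pub fn complete(self) -> Linearization<Completed> {
        let mut items = self.state.items;
        items.sort(); // stand-in for the real post-processing
        Linearization { state: Completed { items } }
    }
}

impl Linearization<Completed> {
    /// Only available once completed.
    pub fn get(&self, index: usize) -> Option<&String> {
        self.state.items.get(index)
    }
}
```

Because `complete` takes `self` by value, the building linearization is moved and cannot be used after the transition. A call such as `done.add("c")` on the completed value would not compile.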
-At its base the `Linearization` type is just a transparent wrapper around the particular `LinearizationState` which holds state specific data. +At its base the `Linearization` type is a transparent smart pointer[@deref-chapter;@smart-pointer-chapter] to the particular `LinearizationState` which holds state specific data. On top of that NLS defines a `Building` and `Completed` state. The `Building` state represents an accumulated created incrementally during the linearization process. From 783f63c9598e3c8535bc89150ae68811bdeb9a41 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sun, 9 Jan 2022 19:39:27 +0100 Subject: [PATCH 013/142] Add access methods --- chapter/methodology.md | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index da6f2480..17785a21 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -272,6 +272,40 @@ impl LinearizationState for Completed {} ### Resolving Elements +#### Resolving by position + +As part of the post-processing step discussed in [@sec:post-processing], the `LinearizationItem`s in the `Completed` linearization are reorderd by their occurence of the corresponding AST node in the source file. +To find items in this list three preconditions have to hold: + + +1. Each element has a corresponding span in the source +2. Items of different files appear ordered by `FileId` +3. Spans may overlap never intersect. + $$\text{Item}^2_\text{start} \geq \text{Item}^1_\text{start} \land \text{Item}^2_\text{end} \leq \text{Item}^1_\text{end}$$ +4. Items referring to the spans starting at the same position have to occur in the same order before and after the post-processing. + Concretely, this ensures that the tree-induced hierarchy is maintained, more precise elements follow broader ones + + +This first two properties are an implication of the preceding processes. 
+All elements are derived from AST nodes, which are parsed from files retaining their position. +Nodes that are generated by the runtime before being passed to the language server are either ignored or annotated with synthetic positions that are known to be in the bounds of the file and meet the second requirement. +For all other nodes the second requirement is automatically fulfilled by the grammar of the Nickel language. +The last requirement is achieved by using a stable sort during the post-processing. + +Given a concrete position, that is a `FileId` and `ByteIndex` in that file, a binary search is used to find the *last* element that *starts* at the given position. +According to the aforementioned preconditions an element found there is equivalent to being the most concrete element starting at this position. +In the more frequent case that no element starting at the provided position is found, the search instead yields an index which can be used as a starting point to iterate the linearization *backwards* to find an item with the shortest span containing the queried position. +Due to the third requirement, this reverse iteration can be aborted once an item's span ends before the query. +If the search has to be aborted, the query does not have a corresponding `LinearizationItem`. + +#### Resolving by ID + +During the building process item IDs are equal to their index in the underlying List which allows for efficient access by ID. +To allow similarly efficient access to nodes with using IDs a `Completed` linearization maintains a mapping of IDs to their corresponding index in the reordered array. +A queried ID is first looked up in this mapping which yields an index from which the actual item is read. 
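Post-processing and ID resolution fit together as follows. A stable sort by `(file, start)` reorders the items, and a map from each `id` to its new index restores cheap access by ID. This is a hypothetical sketch: `Item`, `complete` and `get_item` are simplified stand-ins and not the NLS definitions.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub id: usize,
    pub file: usize,
    pub start: usize,
    pub end: usize,
}

pub struct Completed {
    pub linearization: Vec<Item>,
    id_to_index: HashMap<usize, usize>,
}

/// Turn building-order items (where `id == index`) into a
/// position-ordered list plus an id -> index map.
pub fn complete(mut items: Vec<Item>) -> Completed {
    // `sort_by_key` is stable, so items sharing a start position keep their
    // building (tree) order: broader items stay in front of nested ones.
    items.sort_by_key(|item| (item.file, item.start));
    let id_to_index = items
        .iter()
        .enumerate()
        .map(|(index, item)| (item.id, index))
        .collect();
    Completed { linearization: items, id_to_index }
}

impl Completed {
    /// Look the id up in the map first, then read the item at that index.
    pub fn get_item(&self, id: usize) -> Option<&Item> {
        self.id_to_index
            .get(&id)
            .map(|&index| &self.linearization[index])
    }
}
```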
+ +#### Resolving by scope + ## LSP Server ### Diagnostics and Caching From 092cecdf1227e31c3b40f07b75c290332589f101 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 10 Jan 2022 15:29:53 +0100 Subject: [PATCH 014/142] Resolution by scope --- chapter/methodology.md | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index 17785a21..37561e5d 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -306,6 +306,33 @@ A queried ID is first looked up in this mapping which yields an index from which #### Resolving by scope +During the construction from the AST, the syntactic scope of each element is eventually known. +This allows to map scopes to a list of elements defined in this scope. +Definitions from higher scopes are not repeated, instead they are calculated on request. +As scopes are lists of scope fragments, for any given scope the set of referable nodes is determined by unifying IDs of all prefixes of the given scope, then resolving the IDs to elements. +The Rust implementation is given in [@lst:nls-resolve-scope] below. + +```{.rust #lst:nls-resolve-scope caption="Resolution of all items in scope"} +impl Completed { + pub fn get_in_scope( + &self, + LinearizationItem { scope, .. 
}: &LinearizationItem, + ) -> Vec<&LinearizationItem> { + let EMPTY = Vec::with_capacity(0); + // all prefix lengths + (0..scope.len()) + // concatenate all scopes + .flat_map(|end| self.scope.get(&scope[..=end]) + .unwrap_or(&EMPTY)) + // resolve items + .map(|id| self.get_item(*id)) + // ignore unresolved items + .flatten() + .collect() + } +} +``` + ## LSP Server ### Diagnostics and Caching From 98241c514207c65234414f1b5e788ab52dcaccb6 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Tue, 11 Jan 2022 13:33:00 +0100 Subject: [PATCH 015/142] Correct type definition --- chapter/methodology.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 37561e5d..ae6d4ed6 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -243,6 +243,7 @@ pub trait LinearizationState {} pub struct Linearization { pub state: S, } + ``` ```{.rust #lst:nls-definition-building-type caption="Type Definition of Building state"} @@ -256,7 +257,7 @@ impl LinearizationState for Building {} ```{.rust #lst:nls-definition-completed-type caption="Type Definition of Completed state" } pub struct Completed { pub linearization: Vec>, - scope: HashMap, Vec>, + scope: HashMap, Vec>, id_to_index: HashMap, } impl LinearizationState for Completed {} From a5b072198f39c727a930b2880674565aacfd5b77 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Tue, 11 Jan 2022 13:35:15 +0100 Subject: [PATCH 016/142] Reference item_at algorithm --- chapter/methodology.md | 38 ++++++++++++++++++++++++++++++++++++-- 1 file changed, 36 insertions(+), 2 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index ae6d4ed6..de284277 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -278,7 +278,6 @@ impl LinearizationState for Completed {} As part of the post-processing step discussed in [@sec:post-processing], the `LinearizationItem`s in the `Completed` linearization are reorderd by their occurence of the 
corresponding AST node in the source file. To find items in this list three preconditions have to hold: - 1. Each element has a corresponding span in the source 2. Items of different files appear ordered by `FileId` 3. Spans may overlap never intersect. @@ -286,19 +285,54 @@ To find items in this list three preconditions have to hold: 4. Items referring to the spans starting at the same position have to occur in the same order before and after the post-processing. Concretely, this ensures that the tree-induced hierarchy is maintained, more precise elements follow broader ones - This first two properties are an implication of the preceding processes. All elements are derived from AST nodes, which are parsed from files retaining their position. Nodes that are generated by the runtime before being passed to the language server are either ignored or annotated with synthetic positions that are known to be in the bounds of the file and meet the second requirement. For all other nodes the second requirement is automatically fulfilled by the grammar of the Nickel language. The last requirement is achieved by using a stable sort during the post-processing. +The algorithm used is listed in [@lst:nls-resolve-at]. Given a concrete position, that is a `FileId` and `ByteIndex` in that file, a binary search is used to find the *last* element that *starts* at the given position. According to the aforementioned preconditions an element found there is equivalent to being the most concrete element starting at this position. In the more frequent case that no element starting at the provided position is found, the search instead yields an index which can be used as a starting point to iterate the linearization *backwards* to find an item with the shortest span containing the queried position. Due to the third requirement, this reverse iteration can be aborted once an item's span ends before the query. 
If the search has to be aborted, the query does not have a corresponding `LinearizationItem`. +```{.rust #lst:nls-resolve-at caption="Resolution of item at given position"} +impl Completed { + pub fn item_at( + &self, + locator: &(FileId, ByteIndex), + ) -> Option<&LinearizationItem> { + let (file_id, start) = locator; + let linearization = &self.linearization; + let item = match linearization + .binary_search_by_key( + locator, + |item| (item.pos.src_id, item.pos.start)) + { + // Found item(s) starting at `locator` + // search for most precise element + Ok(index) => linearization[index..] + .iter() + .take_while(|item| (item.pos.src_id, item.pos.start) == *locator) + .last(), + // No perfect match found + // iterate back finding the first wrapping linearization item + Err(index) => { + linearization[..index].iter().rfind(|item| { + // Return the first (innermost) matching item + file_id == &item.pos.src_id + && start > &item.pos.start + && start < &item.pos.end + }) + } + }; + item + } +} +``` + #### Resolving by ID During the building process item IDs are equal to their index in the underlying List which allows for efficient access by ID. From 3193361f2f6a392f7a551e1d2d22f0c8d192a744 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Tue, 11 Jan 2022 14:26:43 +0100 Subject: [PATCH 017/142] Apply suggestions from code review Co-authored-by: Yann Hamdaoui --- chapter/methodology.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index de284277..b210d004 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -2,21 +2,21 @@ This chapter contains a detailed guide through the various steps and components of the Nickel Language Server (NLS). Being written in the same language (Rust[@rust]) as the Nickel interpreter allows NLS to integrate existing components for language analysis. -Complementary, NLS is tightly coupled to Nickel's Syntax definition.
-Hence, in [@sec:nickel-ast] this chapter will first detail parts of the AST that are of particular interest for the LSP and require special handling. +Complementary, NLS is tightly coupled to Nickel's syntax definition. +Hence, [@sec:nickel-ast] will first detail parts of the AST that are of particular interest for the LSP and require special handling. Based on that [@sec:linearization] will introduce the main datastructure underlying all higher level LSP interactions and how the AST is transformed into this form. -Finally, in [@sec:lsp-server] the implementation of current LSP features is discussed on the basis of the previously reviewed compontents. +Finally, in [@sec:lsp-server] the implementation of current LSP features is discussed on the basis of the previously reviewed components. ## Nickel AST -Nickel's Syntax tree is a single sum type, i.e. an enumeration of node types. +Nickel's syntax tree is a single sum type, i.e. an enumeration of node types. Each enumeration variant may refer to child nodes, representing a branch or hold terminal values in which case it is considered a leaf of the tree. Additionally, nodes are parsed and represented, wrapped in another structure that encodes the span of the node and all its potential children. ### Basic Elements -The data types of the Nickel language are closely related to JSON. +The primitive values of the Nickel language are closely related to JSON. On the leaf level, Nickel defines `Boolean`, `Number`, `String` and `Null`. In addition to that the language implements native support for `Enum` values. Each of these are terminal leafs in the syntax tree. @@ -35,7 +35,7 @@ These data types constitute a static subset of Nickel which allows writing JSON -Building on that Nickel also supports variables and functions which make up the majority of the AST stem. +Building on that Nickel also supports variables and functions which make up the majority of the AST. 
### Meta Information @@ -200,7 +200,7 @@ Yet, on a syntax level different Nickel generates a different representation. ## Linearization Being a domain specific language, the scope of analyzed Nickel files is expected to be small compared to other general purpose languages. -NLS therefore takes an *eager approach* to code analysis, resolving all information at once which is then stored in a linear data structure with efficient access to elements. +Hence, NLS takes an eager approach to code analysis, resolving all information at once which is then stored in a linear data structure with efficient access to elements. This data structure is referred to as *linearization*. The term arises from the fact that the linearization is a transformation of the syntax tree into a linear structure which is presented in more detail in [@sec:transfer-from-ast]. The implementation distinguishes two separate states of the linearization. From 85289ec0489b7be4e2e7fca8a4b8eb8d8e7745f6 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Tue, 11 Jan 2022 14:34:41 +0100 Subject: [PATCH 018/142] Address review comment on span structure --- chapter/methodology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index b210d004..3da2045c 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -280,7 +280,7 @@ To find items in this list three preconditions have to hold: 1. Each element has a corresponding span in the source 2. Items of different files appear ordered by `FileId` -3. Spans may overlap never intersect. +3. Two spans are either within the bounds of the other or disjoint. $$\text{Item}^2_\text{start} \geq \text{Item}^1_\text{start} \land \text{Item}^2_\text{end} \leq \text{Item}^1_\text{end}$$ 4. Items referring to the spans starting at the same position have to occur in the same order before and after the post-processing. 
Concretely, this ensures that the tree-induced hierarchy is maintained, more precise elements follow broader ones From 8f79076eeb90b0aa1106cbf122aa8a9545959fc9 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 13 Jan 2022 15:34:39 +0100 Subject: [PATCH 019/142] Add more subsections to ast transfer outline --- chapter/methodology.md | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 3da2045c..86b49949 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -267,7 +267,17 @@ impl LinearizationState for Completed {} ### Transfer from AST -#### Retyping +#### Metadata + +#### Records + +#### Static access + +#### Integration with Nickel + +##### Scope + +##### Retyping ### Post-Processing From 049a90a39f197182ba7c07e3abbbc86a518f1ee7 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 13 Jan 2022 15:35:24 +0100 Subject: [PATCH 020/142] Rename Method chapter --- chapter/methodology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 86b49949..a5a5a8df 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -1,4 +1,4 @@ -# Method +# Design implementation of NLS This chapter contains a detailed guide through the various steps and components of the Nickel Language Server (NLS). Being written in the same language (Rust[@rust]) as the Nickel interpreter allows NLS to integrate existing components for language analysis. 
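The position lookup described in the preceding patches (a binary search on `(FileId, start)`, followed by a backwards scan for the innermost enclosing span) can be illustrated independently of Nickel's types. What follows is a minimal sketch under stated assumptions, not the NLS implementation. `FileId` and `ByteIndex` are reduced to `u32`, and `Item` is a hypothetical stand-in for `LinearizationItem` that holds only an id and a half-open span `[start, end)`. The sketch relies on the preconditions listed above: items are stably sorted by `(file, start)`, spans are either nested or disjoint, and broader items precede narrower ones that start at the same offset.

```rust
/// Simplified stand-ins for Nickel's `FileId` and `ByteIndex` (assumption).
type FileId = u32;
type ByteIndex = u32;

/// Hypothetical, stripped-down linearization item: an id and a half-open span.
#[derive(Debug, Clone, PartialEq)]
struct Item {
    id: usize,
    file: FileId,
    start: ByteIndex,
    end: ByteIndex,
}

/// Returns the most specific item whose span contains `pos` in `file`.
///
/// Assumes `items` is stably sorted by `(file, start)`, that spans are either
/// nested or disjoint, and that among items starting at the same offset the
/// broader ones come first.
fn item_at(items: &[Item], file: FileId, pos: ByteIndex) -> Option<&Item> {
    let key = (file, pos);
    match items.binary_search_by_key(&key, |item| (item.file, item.start)) {
        // Some item starts exactly at `pos`. The search may land on any of
        // several equal keys, so walk forward to the last (narrowest) one.
        Ok(index) => items[index..]
            .iter()
            .take_while(|item| (item.file, item.start) == key)
            .last(),
        // Nothing starts at `pos`: walk backwards from the insertion point,
        // staying within the same file, and return the first item whose span
        // still contains `pos`. Because spans nest, that is the innermost one.
        Err(index) => items[..index]
            .iter()
            .rev()
            .take_while(|item| item.file == file)
            .find(|item| item.start < pos && pos < item.end),
    }
}
```

Unlike the prose in patch 016, this sketch does not stop at the first item that ends before the query. With nested spans, an enclosing item that starts earlier can still contain the position after a sibling has already ended; for example, an outer span `[0, 100)` still contains offset 50 after a nested `[30, 40)`. The scan therefore stops only at a file boundary.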
From 7f5fc4a6a69719d07db4e73a82ffc689336f5f3a Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 13 Jan 2022 15:37:01 +0100 Subject: [PATCH 021/142] Remove Nickel AST section Moved to Background (#2) --- chapter/methodology.md | 192 ++--------------------------------------- 1 file changed, 7 insertions(+), 185 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index a5a5a8df..07c64905 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -3,200 +3,22 @@ This chapter contains a detailed guide through the various steps and components of the Nickel Language Server (NLS). Being written in the same language (Rust[@rust]) as the Nickel interpreter allows NLS to integrate existing components for language analysis. Complementary, NLS is tightly coupled to Nickel's syntax definition. -Hence, [@sec:nickel-ast] will first detail parts of the AST that are of particular interest for the LSP and require special handling. -Based on that [@sec:linearization] will introduce the main datastructure underlying all higher level LSP interactions and how the AST is transformed into this form. +Based on that [@sec:linearization] will introduce the main datastructure underlying all higher level LSP interactions and how the AST described in [@sec:nickel-ast] is transformed into this form. Finally, in [@sec:lsp-server] the implementation of current LSP features is discussed on the basis of the previously reviewed components. -## Nickel AST - -Nickel's syntax tree is a single sum type, i.e. an enumeration of node types. -Each enumeration variant may refer to child nodes, representing a branch or hold terminal values in which case it is considered a leaf of the tree. -Additionally, nodes are parsed and represented, wrapped in another structure that encodes the span of the node and all its potential children. - -### Basic Elements - -The primitive values of the Nickel language are closely related to JSON. 
-On the leaf level, Nickel defines `Boolean`, `Number`, `String` and `Null`. -In addition to that the language implements native support for `Enum` values. -Each of these are terminal leafs in the syntax tree. - -Completing JSON compatibility, `List` and `Record` constructs are present as well. -Records on a syntax level are HashMaps, uniquely associating an identifier with a sub-node. - -These data types constitute a static subset of Nickel which allows writing JSON compatible expressions as shown in [@lst:nickel-static]. - -```{.nickel #lst:nickel-static caption="Example of a static Nickel expression"} { - list = [ 1, "string", null], - "enum value" = `Value -} -``` - - - -Building on that Nickel also supports variables and functions which make up the majority of the AST. - -### Meta Information - -One key feature of Nickel is its gradual typing system [ref again?], which implies that values can be explicitly typed. -Complementing type information it is possible to annotate values with contracts and additional meta-data such as documentation, default values and merge priority a special syntax as displayed in [@lst:nickel-meta]. - - -```{.nickel #lst:nickel-meta caption="Example of a static Nickel expression"} -let Contract = { - foo | Num - | doc "I am foo", - hello | Str - | default = "world" - } - | doc "Just an example Contract" -in -let value | #Contract = { foo = 9 } -in value == { foo = 9, hello = "world"} - -> true -``` - -Internally, the addition of annotations wraps the annotated term in a `MetaValue` structure, that is creates an artificial tree node that describes its subtree. -Concretely, the expression shown in [@lst:nickel-meta-typed] translates to the AST in [@fig:nickel-meta-typed]. -The green `MetaValue` box is a virtual node generated during parsing and not present in the untyped equivalent. 
- -```{.nickel #lst:nickel-meta-typed caption="Example of a typed expression"} -let x: Num = 5 in x -``` - - -```{.graphviz #fig:nickel-meta-typed caption="AST of typed expression" height=4.5cm} -strict digraph { - graph [fontname = "Fira Code"]; - node [fontname = "Fira Code"]; - edge [fontname = "Fira Code"]; - - meta [label="MetaValue", color="green", shape="box"] - let [label = "Let('x')"] - num [label = "Num(5)"] - var [label = "Var('x')"] - - meta -> let - let -> num - let -> var -} -``` - - - -### Nested Record Access - -Nickel supports the referencing of variables which are represented as `Var` nodes that are resolved during runtime. -With records bound to a variable, a method to access elements inside that record is required. -The access of record members is represented using a special set of AST nodes depending on whether the member name requires an evaluation in which case resolution is deferred to the evaluation pass. -While the latter prevents static analysis of any deeper element by the LSP, `StaticAccess` can be used to resolve any intermediate reference. - -Notably, Nickel represents static access chains in inverse order as unary operations which in turn puts the terminal `Var` node as a leaf in the tree. -[Figure @fig:nickel-static-access] shows the representation of the static access perfomed in [@lst:nickel-static-access] with the rest of the tree omitted. 
- -```{.nickel #lst:nickel-static-access caption="Nickel static access"} -let x = { - y = { - z = 1; + apiVersion = "1.1.0", + metadata = metadata_, + replicas = 3, + containers = { + "main container" = webContainer "k8s.gcr.io/#{name_}" } -} in x.y.z -``` - - -```{.graphviz #fig:nickel-static-access caption="AST of typed expression" height=6cm} -strict digraph { - graph [fontname = "Fira Code"]; - node [fontname = "Fira Code", margin=0.25]; - edge [fontname = "Fira Code"]; - - rankdir="TD" +} | #NobernetesConfig - let [label = "Let", color="grey"] - rec [label = "omitted", color="grey", style="dashed", shape="box"] - - x [label = "Var('x')"] - unop_x_y [label = ".y", shape = "triangle", margin=0.066] - unop_y_z [label = ".z", shape = "triangle", margin=0.066] - - - let -> rec - let -> unop_y_z - unop_y_z -> unop_x_y - unop_x_y -> x -} ``` - -### Record Shorthand - -Nickel supports a shorthand syntax to efficiently define nested records similarly to how nested record fields are accessed. -As a comparison the example in [@lst:nickel-record-shorthand] uses the shorthand syntax which resolves to the semantically equivalent record defined in [@lst:nickel-record-no-shorthand] - -```{.nickel #lst:nickel-record-shorthand caption="Nickel record using shorthand"} -{ - deeply.nested.record.field = true; -} -``` - -```{.nickel #lst:nickel-record-no-shorthand caption="Nickel record defined explicitly"} -{ - deeply = { - nested = { - record = { - field = true - } - } - } -} -``` - -Yet, on a syntax level different Nickel generates a different representation. - - - - ## Linearization Being a domain specific language, the scope of analyzed Nickel files is expected to be small compared to other general purpose languages. 
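The `Building`/`Completed` state types adjusted in patch 015 follow the type-state pattern. The linearization is generic over its state, so each operation is only available in the state where it is meaningful. Below is a minimal, self-contained sketch of that idea, not the NLS code. `Item`, `push`, `complete` and `get_item` are hypothetical simplifications: items carry only an id and a start offset, and the `Deref`-based smart-pointer access to the state is omitted.

```rust
use std::collections::HashMap;

/// Marker trait for the states a linearization can be in.
trait LinearizationState {}

/// Hypothetical simplified item; the real `LinearizationItem` carries more data.
#[derive(Debug, Clone)]
struct Item {
    id: usize,
    start: u32,
}

/// State while walking the AST: items in creation order, so `id == index`.
#[derive(Default)]
struct Building {
    items: Vec<Item>,
}
impl LinearizationState for Building {}

/// State after post-processing: items ordered by position, plus an id index.
struct Completed {
    items: Vec<Item>,
    id_to_index: HashMap<usize, usize>,
}
impl LinearizationState for Completed {}

/// The linearization is generic over its state (type-state pattern).
struct Linearization<S: LinearizationState> {
    state: S,
}

impl Linearization<Building> {
    fn new() -> Self {
        Linearization { state: Building::default() }
    }

    /// Items are only ever appended, so the new item's id is its index.
    fn push(&mut self, start: u32) -> usize {
        let id = self.state.items.len();
        self.state.items.push(Item { id, start });
        id
    }

    /// Consumes the building linearization and yields a completed one.
    fn complete(self) -> Linearization<Completed> {
        let mut items = self.state.items;
        // `sort_by_key` is stable: items starting at the same offset keep
        // their creation (depth-first) order.
        items.sort_by_key(|item| item.start);
        let id_to_index = items
            .iter()
            .enumerate()
            .map(|(index, item)| (item.id, index))
            .collect();
        Linearization { state: Completed { items, id_to_index } }
    }
}

impl Linearization<Completed> {
    /// Once items are reordered, lookup by id goes through the auxiliary map.
    fn get_item(&self, id: usize) -> Option<&Item> {
        self.state
            .id_to_index
            .get(&id)
            .map(|&index| &self.state.items[index])
    }
}
```

Because `complete` takes `self` by value, a `Building` linearization cannot be used after post-processing. The id-equals-index invariant therefore only has to hold while building. After the stable sort has reordered the items, `id_to_index` restores lookup by id.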
From 9948bf6537e9c5e63b088ad1d9ba709ab8ff2d96 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 13 Jan 2022 15:37:20 +0100 Subject: [PATCH 022/142] Add symbols section in outline --- chapter/methodology.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index 07c64905..a74aa18d 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -213,3 +213,5 @@ impl Completed { #### Jump to Definition #### Show references + +#### Symbols From 849ed591babee908b0d205e85b4f28f205036c77 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 13 Jan 2022 15:37:51 +0100 Subject: [PATCH 023/142] Add illustrative example in Nickel --- chapter/methodology.md | 37 +++++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index a74aa18d..5f7a8f90 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -6,6 +6,43 @@ Complementary, NLS is tightly coupled to Nickel's syntax definition. Based on that [@sec:linearization] will introduce the main datastructure underlying all higher level LSP interactions and how the AST described in [@sec:nickel-ast] is transformed into this form. Finally, in [@sec:lsp-server] the implementation of current LSP features is discussed on the basis of the previously reviewed components. +## Illustrative example + +The example [@lst:nickel-complete-example] shows an illustrative high-level configuration of a server. +Throughout this chapter, different sections about the NLS implementation will refer back to this example.
+ +```{.nickel #lst:nickel-complete-example caption="Nickel example with most features shown"} +let Port | doc "A contract for a port number" = contracts.from_predicate (fun value => + builtins.is_num value && + value % 1 == 0 && + value >= 0 && + value <= 65535) in + +let Container = { + image | Str, + ports | List #Port, +} in + +let NobernetesConfig = { + apiVersion | Str, + metadata.name | Str, + replicas | #nums.PosNat + | doc "The number of replicas" + | default = 1, + containers | { _ : #Container }, + +} in + +let name_ = "myApp" in + +let metadata_ = { + name = name_, +} in + +let webContainer = fun image => { + image = image, + ports = [ 80, 443 ], +} in { apiVersion = "1.1.0", From 4f0c45bf8c0736ce42e1ec00e8435d12c334602e Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 13 Jan 2022 16:00:33 +0100 Subject: [PATCH 024/142] Amending reasoning for choosing eager approach --- chapter/methodology.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 5f7a8f90..5a1d2e3a 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -58,9 +58,10 @@ let webContainer = fun image => { ## Linearization -Being a domain specific language, the scope of analyzed Nickel files is expected to be small compared to other general purpose languages. -Hence, NLS takes an eager approach to code analysis, resolving all information at once which is then stored in a linear data structure with efficient access to elements. -This data structure is referred to as *linearization*. +The focus of the NLS as presented in this work is to implement a working language server with a comprehensive feature set. +Prioritizing a sound feature set, NLS takes an eager, non-incremental approach to code analysis, resolving all information at once for each code update (`didChange` and `didOpen` events), assuming that initial Nickel projects remain reasonably small. 
+The analysis result is subsequently stored in a linear data structure with efficient access to elements. +This data structure is referred to in the following as *linearization*. The term arises from the fact that the linearization is a transformation of the syntax tree into a linear structure which is presented in more detail in [@sec:transfer-from-ast]. The implementation distinguishes two separate states of the linearization. During its construction, the linearization will be in a *building* state, and is eventually post-processed yielding a *completed* state. From 1fd345f13037f44f8c629d4a3a7f275f75041001 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sat, 15 Jan 2022 14:00:09 +0100 Subject: [PATCH 025/142] Change description of Building state --- chapter/methodology.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 5a1d2e3a..6737e19e 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -84,12 +84,12 @@ Hence, the concrete implementation employs type-states[@typestate] to separate b At its base the `Linearization` type is a transparent smart pointer[@deref-chapter;@smart-pointer-chapter] to the particular `LinearizationState` which holds state specific data. On top of that NLS defines a `Building` and `Completed` state. -The `Building` state represents an accumulated created incrementally during the linearization process. -In particular that is a list of `LinearizationItems` of unresolved type ordered as they appear in a depth-first iteration of the AST. -Note that new elements are exclusively appended such that their `id` field during this phase is equal to the elements position at all time. -Additionally, the `Building` state maintains the scope structure for every item in a separate mapping. +The `Building` state represents a raw linearization. 
+In particular, it is a list of `LinearizationItem`s of unresolved type, ordered as they are created through a depth-first iteration of the AST. +Note that new items are exclusively appended such that their `id` field is equal to their position at all times during this phase. +Additionally, the `Building` state records all items for each scope in a separate mapping. -Once fully built a `Building` instance is post-processed yielding a `Completed` linearization. +Once fully built, a `Building` instance is post-processed yielding a `Completed` linearization. While being defined similar to its origin, the structure is optimized for positional access, affecting the order of the `LinearizationItem`s and requiring an auxiliary mapping for efficient access to items by their `id`. Moreover, types of items in the `Completed` linearization will be resolved. From c9ac80a05690c40556af435d61c23a0987df472c Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sat, 15 Jan 2022 14:00:45 +0100 Subject: [PATCH 026/142] Find most ~"concrete"~ -> "specific" element --- chapter/methodology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 6737e19e..f7ae6444 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -163,7 +163,7 @@ The last requirement is achieved by using a stable sort during the post-processi The algorithm used is listed in [@lst:nls-resolve-at]. Given a concrete position, that is a `FileId` and `ByteIndex` in that file, a binary search is used to find the *last* element that *starts* at the given position. -According to the aforementioned preconditions an element found there is equivalent to being the most concrete element starting at this position. +According to the aforementioned preconditions an element found there is equivalent to being the most specific element starting at this position.
In the more frequent case that no element starting at the provided position is found, the search instead yields an index which can be used as a starting point to iterate the linearization *backwards* to find an item with the shortest span containing the queried position. Due to the third requirement, this reverse iteration can be aborted once an item's span ends before the query. If the search has to be aborted, the query does not have a corresponding `LinearizationItem`. From aaee232149f5c71d0373e2abfc2974ef7ad958e1 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sun, 16 Jan 2022 22:35:12 +0100 Subject: [PATCH 027/142] Add Usage Graph section --- chapter/methodology.md | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index f7ae6444..4f676f22 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -123,6 +123,25 @@ pub struct Completed { impl LinearizationState for Completed {} ``` +### Usage Graph + +At the core the linearization is a simple *linear* structure. +Also, in the general case^[Except single primitive expressions] the linearization is reordered in the post-processing step. +This makes it impossible to encode relationships of nodes on a structural level. +Yet, Nickel's support for name binding of variables, functions and in recursive records implies a strong need to represent node-to-node relationships in any structure that aims to work with them. +On a higher level, tracking both definitions and usages of identifiers yields a directed graph. + +There are three main kinds of vertices in such a graph. +**Declarations** are nodes that introduce an identifier, and can be referred to by a set of nodes. +Referral is represented as **Usage** nodes which can either be bound to a declaration or unbound if no corresponding declaration is known.
+In practice Nickel distinguishes simple variable bindings from name binding through record fields in recursive records. +It also integrates a **Record** kind to provide deep record destructuring. + +During the linearization process this graphical model is recreated on the linear representation of the source. +Hence, each `LinearizationItem` is associated with one of the aforementioned kinds, encoding its function in the usage graph. +Nodes of the AST that do not fit in a usage graph, a wildcard kind `Structure` is applied. + + + ### Transfer from AST From b5a7b611d810646f7e53c7173eb2f048d270bae7 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 20 Jan 2022 12:27:01 +0100 Subject: [PATCH 028/142] WIP section on scopes --- chapter/methodology.md | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index 4f676f22..4647a543 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -143,6 +143,32 @@ Nodes of the AST that do not fit in a usage graph, a wildcard kind `Structure` i +#### Scopes + + + +The Nickel language implements supports scopes with variable shadowing. + +1. It is not possible to access undeclared variables +2. Variable names can be defined + +An AST can be used to represent this logic. +A variable reference always refers to the closest parent node defining the variable. +Scopes are naturally separated using branching, each branch of a node represents a sub-scope of its parent, i.e. new declarations made in one branch are not visible in the other. + +When eliminating the tree structure, scopes have to be maintained in order to provide auto-completion of identifiers and list symbol names. +Since the bare linear data structure cannot be used to deduce a scope, related metadata has to be tracked separately. +The NLS tracks scopes both statically, and temporarily during the linearization. +The language server maintains a register for identifiers defined in every scope.
+This register allows NLS to resolve possible completion targets as detailed in [@sec:resolving-by-scope]. + +For simplicity, scopes are represented by a prefix list. +Each inner scope appends its `ScopeId` to the existing prefix defining its outer scope. +Parallel scopes are given unique `ScopeId`s to tell them apart. + + + + ### Transfer from AST From 8689c79e3ac21c117b5ca93899e95cb693aa8c2c Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 24 Jan 2022 02:16:26 +0100 Subject: [PATCH 029/142] Transfer from AST intro --- chapter/methodology.md | 34 +++++++++++++++++++++++++++++++++- 1 file changed, 33 insertions(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 4647a543..f295a138 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -123,7 +123,39 @@ pub struct Completed { impl LinearizationState for Completed {} ``` -### Usage Graph +### Transfer from AST + +The NLS project aims to present a transferable architecture that can be adapted for future languages. +Consequently, NLS faces the challenge of satisfying multiple goals: + +1. To keep up with the frequent changes to the Nickel language and ensure compatibility at minimal cost, NLS needs to integrate critical functions of Nickel's runtime. +2. Adaptations to Nickel to accommodate the language server should be minimal, so as not to obstruct its development and to maintain the performance of the runtime. + + +To accommodate these goals NLS comprises three different parts as shown in [@fig:nls-nickel-structure]. +The `Linearizer` trait acts as an interface between Nickel and the language server. +NLS implements such a `Linearizer` specialized to Nickel which registers nodes and builds a final linearization. +Nickel's type checking implementation was adapted to pass AST nodes to the `Linearizer`. +During normal operation the overhead induced by the `Linearizer` is minimized using a stub implementation of the trait.
+ +```{.graphviz #fig:nls-nickel-structure caption="Interaction of Components"} +digraph { + nls [label="NLS"] + nickel [label="Nickel"] + als [label="Linearizer", shape=box] + stub [label="Stub interface"] + + nls -> nickel [label="uses"] + nls -> als [label="implements"] + stub -> als [label="implements"] + nickel -> als [label="uses"] + nickel -> stub [label="uses"] +} +``` + + +#### Usage Graph At the core the linearization is a simple *linear* structure. Also, in the general case^[Except single primitive expressions] the linearization is reordered in the post-processing step. From b7d75fbfc6839e35173bb4c81dbe13e06dbb56bf Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 24 Jan 2022 02:16:54 +0100 Subject: [PATCH 030/142] Scopes subsection reworked --- chapter/methodology.md | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index f295a138..8f62da70 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -177,27 +177,28 @@ Nodes of the AST that do not fit in a usage graph, a wildcard kind `Structure` i #### Scopes - + -The Nickel language implements supports scopes with variable shadowing. +The Nickel language implements lexical scopes with name shadowing. -1. It is not possible to access undeclared variables -2. Variable names can be defined +1. A name can only be referred to after it has been defined +2. A name can be redefined for a local area An AST can be used to represent this logic. -A variable reference always refers to the closest parent node defining the variable. -Scopes are naturally separated using branching, each branch of a node represents a sub-scope of its parent, i.e. new declarations made in one branch are not visible in the other. +A variable reference always refers to the closest parent node defining the name. +Scopes are naturally separated using branching. +Each branch of a node represents a sub-scope of its parent, i.e.
new declarations made in one branch are not visible in the other. -When eliminating the tree structure, scopes have to be maintained in order to provide auto-completion of identifiers and list symbol names. +When eliminating the tree structure, scopes have to be maintained in order to provide auto-completion of identifiers and list symbol names based on their scope as context. Since the bare linear data structure cannot be used to deduce a scope, related metadata has to be tracked separately. -The NLS tracks scopes both statically, and temporarily during the linearization. -The language server maintains a register for identifiers defined in every scope. +The language server maintains a register for identifiers defined in every scope. This register allows NLS to resolve possible completion targets as detailed in [@sec:resolving-by-scope]. For simplicity, scopes are represented by a prefix list. -Each inner scope appends its `ScopeId` to the existing prefix defining its outer scope. -Parallel scopes are given unique `ScopeId`s to tell them apart. - +Whenever a new lexical scope is entered the prefix list of the outer scope is extended by a unique identifier. +With the example in mind [@lst:nickel-complete-example] contains the defintion of a simple record. + +Additionally, to keep track of the variables in scope, and iteratively build a usage graph, NLS keeps track of the latest definition of each variable name and which `Declaration` node it refers to. 
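The prefix-list representation of scopes described in patch 030 can be sketched as a register keyed by the full prefix list. Visibility is then computed by concatenating the declarations of every prefix, in the spirit of the scope lookup code shown before patch 015. This is a minimal sketch under several assumptions. `ScopeRegister`, `enter`, `declare` and `visible` are hypothetical names, and scope ids come from a simple counter. The empty prefix stands for the root scope. Name shadowing, which NLS handles by tracking the latest declaration per name, is not modeled.

```rust
use std::collections::HashMap;

/// Hypothetical identifiers; in the sketch both are plain integers.
type ScopeId = usize;
type ItemId = usize;

/// Register of declarations per scope, where a scope is identified by the
/// prefix list of scope ids leading to it from the root (`[]`).
#[derive(Default)]
struct ScopeRegister {
    scopes: HashMap<Vec<ScopeId>, Vec<ItemId>>,
    next_scope: ScopeId,
}

impl ScopeRegister {
    /// Enters a sub-scope of `parent` by appending a fresh, unique id, so
    /// sibling scopes never share a prefix list.
    fn enter(&mut self, parent: &[ScopeId]) -> Vec<ScopeId> {
        let mut scope = parent.to_vec();
        scope.push(self.next_scope);
        self.next_scope += 1;
        scope
    }

    /// Records that `item` is declared directly in `scope`.
    fn declare(&mut self, scope: &[ScopeId], item: ItemId) {
        self.scopes.entry(scope.to_vec()).or_default().push(item);
    }

    /// Collects every item visible from `scope`: the declarations of each
    /// prefix of `scope`, from the root outwards to the scope itself.
    fn visible(&self, scope: &[ScopeId]) -> Vec<ItemId> {
        (0..=scope.len())
            // `Vec<ScopeId>` implements `Borrow<[ScopeId]>`, so slices work as keys.
            .filter_map(|end| self.scopes.get(&scope[..end]))
            .flatten()
            .copied()
            .collect()
    }
}
```

Sibling scopes receive distinct ids from the counter. A declaration made in one branch therefore never appears among the prefixes of another branch.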
From 7c38c0e66a1d67b912ff13f33def9105d99ed79d Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 24 Jan 2022 02:17:56 +0100 Subject: [PATCH 031/142] Linearizer subsection --- chapter/methodology.md | 67 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 66 insertions(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 8f62da70..ee7f5192 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -203,7 +203,72 @@ Additionally, to keep track of the variables in scope, and iteratively build a u -### Transfer from AST +#### Linearizer + +The heart of the linearization the `Linearizer` trait as defined in [@lst:nls-linearizer-trait]. +The `Linearizer` lives in parallel to the `Linearization`. +Its methods modify on a shared reference to a `Building` `Linearization` + +`Linearizer::add_term` + ~ is used to record a new term, i.e. AST node. + ~ It's responsibility is to combine context information stored in the `Linearizer` and concrete information about a node to extend the `Linearization` by appropriate items. +`Linearizer::retype_ident` + ~ is used to update the type information for a current identifier. + ~ The reason this method exists is that not all variable definitions have a corresponding AST node but may be part of another node. + This is especially apparent with records where the field names part of the record node and as such are linearized with the record but have to be assigned there actual type separately. +`Linearizer::complete` + ~ implements the post-processing necessary to turn a final `Building` linearization into a `Completed` one. + ~ Note that the post-processing might depend on additional data +`Linearizer::scope` + ~ returns a new `Linearizer` to be used for a sub-scope of the current one. + ~ Multiple calls to this method yield unique instances, each with their own scope. + ~ It is the caller's responsibility to call this method whenever a new scope is entered traversing the AST. 
+ ~ Notably, the recursive traversal of an AST ensures that scopes are correctly backtracked. + + + +While data stored in the `Linearizer::Building` state will be accessible at any point in the linearization process, the `Linearizer` is considered to be only *scope safe*. +No instance data is propagated back to the outer scopes `Linearizer`. +Neither have `Linearizers` of sibling scopes access to each other's data. + + +```rust{.rust #lst:nls-linearizer-trait caption="Interface of linearizer trait"} +pub trait Linearizer { + type Building: LinearizationState + Default; + type Completed: LinearizationState + Default; + type CompletionExtra; + + fn add_term( + &mut self, + lin: &mut Linearization, + term: &Term, + pos: TermPos, + ty: TypeWrapper, + ) + + fn retype_ident( + &mut self, + lin: &mut Linearization, + ident: &Ident, + new_type: TypeWrapper, + ) + + fn complete( + self, + _lin: Linearization, + _extra: Self::CompletionExtra, + ) -> Linearization + where + Self: Sized, + + fn scope(&mut self) -> Self; +} +``` + + + + + #### Metadata From 631c9b6630dda13092e89269d004276119557034 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 24 Jan 2022 02:18:12 +0100 Subject: [PATCH 032/142] General process started --- chapter/methodology.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index ee7f5192..6d7c187b 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -201,6 +201,16 @@ With the example in mind [@lst:nickel-complete-example] contains the defintion o Additionally, to keep track of the variables in scope, and iteratively build a usage graph, NLS keeps track of the latest definition of each variable name and which `Declaration` node it refers to. +#### General Process + +From the perspective of the language server, building a linearization is a completely passive process. +For each analysis NLS initializes an empty linearization in the `Building` state. 
+This linearization is then passed into Nickel's type-checker along a `Linearizer` instance. + +Type checking in Nickel is implemented as a complete recursive depth-first preorder traversal of the AST. + + + #### Linearizer From aaeaaacd5a4843f1c6bacab0dd2894b04b67daaf Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 24 Jan 2022 02:19:43 +0100 Subject: [PATCH 033/142] Fix definition list --- chapter/methodology.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index 6d7c187b..599566df 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -222,13 +222,16 @@ Its methods modify on a shared reference to a `Building` `Linearization` `Linearizer::add_term` ~ is used to record a new term, i.e. AST node. ~ It's responsibility is to combine context information stored in the `Linearizer` and concrete information about a node to extend the `Linearization` by appropriate items. + `Linearizer::retype_ident` ~ is used to update the type information for a current identifier. ~ The reason this method exists is that not all variable definitions have a corresponding AST node but may be part of another node. This is especially apparent with records where the field names part of the record node and as such are linearized with the record but have to be assigned there actual type separately. + `Linearizer::complete` ~ implements the post-processing necessary to turn a final `Building` linearization into a `Completed` one. ~ Note that the post-processing might depend on additional data + `Linearizer::scope` ~ returns a new `Linearizer` to be used for a sub-scope of the current one. ~ Multiple calls to this method yield unique instances, each with their own scope. 
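How a depth-first preorder traversal cooperates with the `scope` method can be sketched on a toy AST. `Term`, the reduced `MiniLinearizer` trait and `Recorder` are hypothetical simplifications, not Nickel's or NLS's definitions. Positions, types and linearization states are omitted, and for brevity a sub-scope is opened for every child node, whereas the real type checker decides where a new scope actually begins.

```rust
/// A heavily simplified, hypothetical AST; Nickel's real `Term` has many more variants.
enum Term {
    Num(i64),
    Var(String),
    Let(String, Box<Term>, Box<Term>),
    BinOp(Box<Term>, Box<Term>),
}

/// Hypothetical reduction of the `Linearizer` trait: no types, positions or states.
trait MiniLinearizer: Sized {
    fn add_term(&mut self, lin: &mut Vec<String>, term: &Term);
    fn scope(&mut self) -> Self;
}

/// Records each node together with the scope prefix it was visited in.
struct Recorder {
    scope: Vec<usize>,
    children: usize,
}

impl MiniLinearizer for Recorder {
    fn add_term(&mut self, lin: &mut Vec<String>, term: &Term) {
        let kind = match term {
            Term::Num(n) => format!("Num({n})"),
            Term::Var(x) => format!("Var({x})"),
            Term::Let(x, ..) => format!("Let({x})"),
            Term::BinOp(..) => "BinOp".to_string(),
        };
        lin.push(format!("{kind}@{:?}", self.scope));
    }

    /// Every call yields a distinct child scope.
    fn scope(&mut self) -> Self {
        self.children += 1;
        let mut scope = self.scope.clone();
        scope.push(self.children);
        Recorder { scope, children: 0 }
    }
}

/// Depth-first preorder traversal: the node is recorded before its children,
/// and each child is visited with its own scoped linearizer.
fn walk<L: MiniLinearizer>(lin: &mut Vec<String>, mut linearizer: L, term: &Term) {
    linearizer.add_term(lin, term);
    match term {
        Term::Num(_) | Term::Var(_) => {}
        Term::Let(_, value, body) => {
            walk(lin, linearizer.scope(), value);
            walk(lin, linearizer.scope(), body);
        }
        Term::BinOp(lhs, rhs) => {
            walk(lin, linearizer.scope(), lhs);
            walk(lin, linearizer.scope(), rhs);
        }
    }
}

fn main() {
    // let x = 1 in x + 2
    let ast = Term::Let(
        "x".to_string(),
        Box::new(Term::Num(1)),
        Box::new(Term::BinOp(
            Box::new(Term::Var("x".to_string())),
            Box::new(Term::Num(2)),
        )),
    );
    let mut lin = Vec::new();
    walk(&mut lin, Recorder { scope: Vec::new(), children: 0 }, &ast);
    for item in &lin {
        println!("{item}");
    }
}
```

Each node is recorded before its children, and the two children of the `let` receive the distinct scopes `[1]` and `[2]`, in line with the requirement that repeated calls to `scope` yield unique instances.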
From 3da38d7cca32b283f766d48a4ec96f51d81a7333 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 24 Jan 2022 02:23:25 +0100 Subject: [PATCH 034/142] Fix Code block --- chapter/methodology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 599566df..cf01551e 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -245,7 +245,7 @@ No instance data is propagated back to the outer scopes `Linearizer`. Neither have `Linearizers` of sibling scopes access to each other's data. -```rust{.rust #lst:nls-linearizer-trait caption="Interface of linearizer trait"} +```{.rust #lst:nls-linearizer-trait caption="Interface of linearizer trait"} pub trait Linearizer { type Building: LinearizationState + Default; type Completed: LinearizationState + Default; From b249ffcdaaf9f9b2fea708d63b31ce6fb44b1333 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 24 Jan 2022 11:25:00 +0100 Subject: [PATCH 035/142] minor typos --- chapter/methodology.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index cf01551e..3e9651cf 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -221,7 +221,7 @@ Its methods modify on a shared reference to a `Building` `Linearization` `Linearizer::add_term` ~ is used to record a new term, i.e. AST node. - ~ It's responsibility is to combine context information stored in the `Linearizer` and concrete information about a node to extend the `Linearization` by appropriate items. + ~ Its responsibility is to combine context information stored in the `Linearizer` and concrete information about a node to extend the `Linearization` by appropriate items. `Linearizer::retype_ident` ~ is used to update the type information for a current identifier. 
@@ -231,18 +231,18 @@ Its methods modify on a shared reference to a `Building` `Linearization` `Linearizer::complete` ~ implements the post-processing necessary to turn a final `Building` linearization into a `Completed` one. ~ Note that the post-processing might depend on additional data - + `Linearizer::scope` ~ returns a new `Linearizer` to be used for a sub-scope of the current one. ~ Multiple calls to this method yield unique instances, each with their own scope. - ~ It is the caller's responsibility to call this method whenever a new scope is entered traversing the AST. - ~ Notably, the recursive traversal of an AST ensures that scopes are correctly backtracked. + It is the caller's responsibility to call this method whenever a new scope is entered traversing the AST. + ~ The recursive traversal of an AST implies that scopes are correctly backtracked. -While data stored in the `Linearizer::Building` state will be accessible at any point in the linearization process, the `Linearizer` is considered to be only *scope safe*. +While data stored in the `Linearizer::Building` state will be accessible at any point in the linearization process, the `Linearizer` is considered to be *scope safe*. No instance data is propagated back to the outer scopes `Linearizer`. -Neither have `Linearizers` of sibling scopes access to each other's data. +Neither have `Linearizer`s of sibling scopes access to each other's data. 
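The *scope safe* behaviour can be illustrated with a minimal sketch. It assumes that the scope-local data is just a map from each name to the id of its latest `Declaration` item, as described for tracking the variables in scope. `EnvLinearizer` and `Usage` are hypothetical names; the actual NLS implementation may store and pass down different data.

```rust
use std::collections::HashMap;

type Id = usize;

/// Hypothetical usage state, loosely following the bound/unbound distinction.
#[derive(Debug, PartialEq)]
enum Usage {
    Resolved(Id),
    Unbound,
}

/// Hypothetical scope-local linearizer data: the latest declaration per name.
#[derive(Clone, Default)]
struct EnvLinearizer {
    env: HashMap<String, Id>,
}

impl EnvLinearizer {
    /// A new declaration shadows any binding of the same name from outer scopes.
    fn declare(&mut self, name: &str, id: Id) {
        self.env.insert(name.to_string(), id);
    }

    /// Resolve a variable usage against the names currently in scope.
    fn usage(&self, name: &str) -> Usage {
        match self.env.get(name) {
            Some(&id) => Usage::Resolved(id),
            None => Usage::Unbound,
        }
    }

    /// The child starts from a copy of this scope's data; nothing it adds
    /// is propagated back to the parent or seen by sibling scopes.
    fn scope(&mut self) -> Self {
        self.clone()
    }
}

fn main() {
    let mut outer = EnvLinearizer::default();
    outer.declare("x", 0);

    let mut left = outer.scope();
    left.declare("x", 1); // shadows the outer `x`
    left.declare("y", 2);

    let mut right = outer.scope();
    right.declare("z", 3);

    println!("{:?}", left.usage("x")); // Resolved(1)
    println!("{:?}", right.usage("x")); // Resolved(0)
    println!("{:?}", right.usage("y")); // Unbound
    println!("{:?}", outer.usage("z")); // Unbound
}
```

Copying the environment in `scope` is the simplest way to pass state down while guaranteeing that nothing flows back up; a persistent map could avoid the copies for deeply nested terms.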
```{.rust #lst:nls-linearizer-trait caption="Interface of linearizer trait"} From 51d8859d493b4a0733aa2f4bf35327a3c453aef8 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 24 Jan 2022 15:05:51 +0100 Subject: [PATCH 036/142] Intro general process --- chapter/methodology.md | 123 ++++++++++++++++++++++++++++------------- 1 file changed, 85 insertions(+), 38 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 3e9651cf..ab90f2b1 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -201,23 +201,47 @@ With the example in mind [@lst:nickel-complete-example] contains the defintion o Additionally, to keep track of the variables in scope, and iteratively build a usage graph, NLS keeps track of the latest definition of each variable name and which `Declaration` node it refers to. -#### General Process +#### Linearizer -From the perspective of the language server, building a linearization is a completely passive process. -For each analysis NLS initializes an empty linearization in the `Building` state. -This linearization is then passed into Nickel's type-checker along a `Linearizer` instance. -Type checking in Nickel is implemented as a complete recursive depth-first preorder traversal of the AST. +The heart of the linearization the `Linearizer` trait as defined in [@lst:nls-linearizer-trait]. +The `Linearizer` lives in parallel to the `Linearization`. 
+Its methods modify a shared reference to a `Building` `Linearization` +```{.rust #lst:nls-linearizer-trait caption="Interface of linearizer trait"} +pub trait Linearizer { + type Building: LinearizationState + Default; + type Completed: LinearizationState + Default; + type CompletionExtra; + fn add_term( + &mut self, + lin: &mut Linearization, + term: &Term, + pos: TermPos, + ty: TypeWrapper, + ) + fn retype_ident( + &mut self, + lin: &mut Linearization, + ident: &Ident, + new_type: TypeWrapper, + ) -#### Linearizer + fn complete( + self, + _lin: Linearization, + _extra: Self::CompletionExtra, + ) -> Linearization + where + Self: Sized, + + fn scope(&mut self) -> Self; +} +``` -The heart of the linearization the `Linearizer` trait as defined in [@lst:nls-linearizer-trait]. -The `Linearizer` lives in parallel to the `Linearization`. -Its methods modify on a shared reference to a `Building` `Linearization` `Linearizer::add_term` ~ is used to record a new term, i.e. AST node. @@ -243,39 +267,62 @@ Its methods modify on a shared reference to a `Building` `Linearization` While data stored in the `Linearizer::Building` state will be accessible at any point in the linearization process, the `Linearizer` is considered to be *scope safe*. No instance data is propagated back to the outer scopes `Linearizer`. Neither have `Linearizer`s of sibling scopes access to each other's data. +Yet the `scope` method can be implemented to pass arbitrary state down to the scoped instance. 
-```{.rust #lst:nls-linearizer-trait caption="Interface of linearizer trait"} -pub trait Linearizer { - type Building: LinearizationState + Default; - type Completed: LinearizationState + Default; - type CompletionExtra; - - fn add_term( - &mut self, - lin: &mut Linearization, - term: &Term, - pos: TermPos, - ty: TypeWrapper, - ) - - fn retype_ident( - &mut self, - lin: &mut Linearization, - ident: &Ident, - new_type: TypeWrapper, - ) +#### General Process - fn complete( - self, - _lin: Linearization, - _extra: Self::CompletionExtra, - ) -> Linearization - where - Self: Sized, +From the perspective of the language server, building a linearization is a completely passive process. +For each analysis NLS initializes an empty linearization in the `Building` state. +This linearization is then passed into Nickel's type-checker along a `Linearizer` instance. - fn scope(&mut self) -> Self; -} +Type checking in Nickel is implemented as a complete recursive depth-first preorder traversal of the AST. +As such, it could easily be adapted to interact with a `Linearizer`, since every node is visited and both type and scope information are available without the additional cost of a separate traversal. +Moreover, type checking proved to be the ideal phase to hook into, as most transformations of the AST happen afterwards. + +While the type checking algorithm is complex, only a fraction of it is relevant to the linearization. +Reducing the type checking function to what is relevant to the linearization process yields [@lst:nickel-tc-abstract]. +Essentially, every term is unconditionally registered by the linearization. +This is enough to handle a large subset of Nickel. +In fact, only records, let bindings and function definitions require additional change to enrich identifiers they define with type information.
+ + +```{.rust #nickel-tc-abstract caption="Abstract type checking function"} +fn type_check_( + lin: &mut Linearization, + mut linearizer: L, + rt: &RichTerm, + ty: TypeWrapper, + /* omitted */ +) -> Result<(), TypecheckError> { + let RichTerm { term: t, pos } = rt; + + // 1. record a node + linearizer.add_term(lin, t, *pos, ty.clone()); + + // handling of each term variant + // recursively calling `type_check_` + // + // 2. retype identifiers if needed + match t.as_ref() { + Term::RecRecord(stat_map, ..) => { + for (id, rt) in stat_map { + let tyw = binding_type(/* ommitted */); + linearizer.retype_ident(lin, id, tyw); + } + } + Term::Fun(ident, _) | + Term::FunPattern(Some(ident), _)=> { + let src = state.table.fresh_unif_var(); + linearizer.retype_ident(lin, ident, src.clone()); + } + Term::Let(ident, ..) | + Term::LetPattern(Some(ident), ..)=> { + let ty_let = binding_type(/* omitted */); + linearizer.retype_ident(lin, ident, ty_let.clone()); + } + _ => { /* ommitted */} + } ``` From d98007dd1e7297a1980f0aa945042bff584d5e7a Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 24 Jan 2022 19:22:38 +0100 Subject: [PATCH 037/142] Code typos --- chapter/methodology.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index ab90f2b1..895163f2 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -307,7 +307,7 @@ fn type_check_( match t.as_ref() { Term::RecRecord(stat_map, ..) 
=> { for (id, rt) in stat_map { - let tyw = binding_type(/* ommitted */); + let tyw = binding_type(/* omitted */); linearizer.retype_ident(lin, id, tyw); } } @@ -321,7 +321,7 @@ fn type_check_( let ty_let = binding_type(/* omitted */); linearizer.retype_ident(lin, ident, ty_let.clone()); } - _ => { /* ommitted */} + _ => { /* omitted */ } } ``` From 30ed2cea6c04fc4dc61b90cf141d906e4d8642b8 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 24 Jan 2022 19:23:07 +0100 Subject: [PATCH 038/142] Change process section title --- chapter/methodology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 895163f2..f8b33937 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -270,7 +270,7 @@ Neither have `Linearizer`s of sibling scopes access to each other's data. Yet the `scope` method can be implemented to pass arbitrary state down to the scoped instance. -#### General Process +#### Linearization Process From the perspective of the language server, building a linearization is a completely passive process. For each analysis NLS initializes an empty linearization in the `Building` state. From 4275249436f7218ed163482f9d75d7f097fe7250 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 24 Jan 2022 19:24:30 +0100 Subject: [PATCH 039/142] Registering structures let and function args --- chapter/methodology.md | 51 +++++++++++++++++++++++++++++++++++++++--- 1 file changed, 48 insertions(+), 3 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index f8b33937..5cf745b4 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -325,16 +325,61 @@ fn type_check_( } ``` +While registering a node, NLS distinguishes 4 kinds of nodes. +These are *metadata*, *usage graph* related nodes, i.e. declarations and usages, *static access* of nested record fields, and *general elements* which is every node that does not fall into one of the prior categories. 
+```{.nickel #lst:nickel-simple-expr caption="Exemplary nickel expressions"} +// atoms +1 +true +null +// binary operations +42 * 3 +[ 1, 2, 3 ] @ [ 4, 5] -#### Metadata +// if-then-else +if true then "TRUE :)" else "false :(" -#### Records +// string interpolation +"#{ "hello" } #{ "world" }!" +``` + +##### Structures + +In the most common case of general elements, the node is simply registered as a `LinearizationItem` of kind `Structure`. +This applies to all simple expressions, such as those exemplified in [@lst:nickel-simple-expr]. +Essentially, any such node turns into a typed span, as the only information tracked is the item's span and the type provided by the type checker. + +```{.nickel #lst:nickel-let-binding caption="Let bindings and functions in nickel"} + +// simple bindings +let name = in +let func = fun arg => in + +// or with patterns +let name @ { field, with_default = 2 } = in +let func = fun arg @ { field, with_default = 2 } => + in + +``` + +##### Declarations + +Name bindings are equally simple. +NLS generates a `Declaration` item for the given identifier and assigns it the identifier's position and provided type. +Additionally, it associates the identifier with the `id` of the created item in its current environment. +If a binding contains a pattern, NLS creates additional items for each matched element. +Unfortunately, Nickel does not provide types for these.
+Examples of let bindings can be found in use in [@lst:nickel-complete-example or @lst:nickel-let-binding] + + + +##### Static access -#### Static access +##### Metadata #### Integration with Nickel From fe0fe5aae84e0f1a3fae61ba7d7783276b30ed8d Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 24 Jan 2022 19:24:52 +0100 Subject: [PATCH 040/142] Registering Records --- chapter/methodology.md | 110 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 110 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index 5cf745b4..cb92bc01 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -376,6 +376,116 @@ Unfortunately, no types are provided for these by Nickel. Examples of let bindings can be found in use in [@lst:nickel-complete-example or @lst:nickel-let-binding] +##### Records + + +```{.graphviz #fig:nickel-record caption="A record in Nickel"} + +{ + a = 2, + b = { + ba = 1 + } +} + +``` + +```{.graphviz #fig:nickel-record-ast caption="AST representation of a record"} +digraph G { + node[shape="record", fontname = "Fira Code", fontsize = 9] + + outer [label = "{RecRecord | { apiVersion | metadata | containers}}"] + apiVersion [ label = "Str | \"1.1.0\"" ] + metadata [label = "Var | metadata_"] + containers [ label = "{RecRecord | \"main container\" }" ] + main_container [ label = "{App | { * | * }}" ] + webContainer [ label = "Var | webContainer" ] + image [ label = "Var | image"] + + + outer:f1 -> apiVersion + outer:f2 -> metadata + outer:f3 -> containers + containers:f1 -> main_container + main_container:f1 -> webContainer + main_container:f2 -> image +} + +``` + +Linearizing records proves more difficult. +In [@sec:graph-representation] the AST representation of Records was discussed. +As shown by [@fig:nickel-record-ast], Nickel does not have AST nodes dedicated to record fields. +Instead, it associates field names with values as part of the `Record` node. 
+For the language server, on the other hand, the record field is as important as its value, since it serves as a name declaration. +For that reason, NLS distinguishes `Record` and `RecordField` as independent kinds of linearization items. + +NLS has to create separate items for the field and its value. +This maintains similarity to the other binding types. +It provides a specific and logical span to reference and allows the value to be of another kind, such as a variable usage as shown in the example. +The language server is bound to process nodes individually. +Therefore, it cannot process record values at the same time as the outer record. +Yet, record values may reference other fields defined in the same record regardless of their order, as records are recursive by default. +Consequently, all fields have to be in scope and must therefore be linearized beforehand. +While `RecordField` items are created while processing the record, they cannot yet be connected to the value they represent, as the linearizer cannot know the `id` of the latter. +This is because the subtree of each field can be arbitrarily large, causing an unknown number of items, and hence intermediate `id`s, to be added to the linearization. + +A summary of this can be seen in the linearization of the previously discussed record in [@fig:nls-lin-records]. +Here, record fields are linearized first, pointing to some following location. +Yet, as the `containers` field value is processed first, the `metadata` field value is offset by a number of items unknown when the outer record node is processed. + +```{.graphviz #fig:nls-lin-records caption="Linearization of a record"} +digraph G { + rankdir = LR; + ranksep = 2; + nodesep = .5; + node[shape="record", fontname = "Fira Code", fontsize = 9] + + lin [ label = "
| | | ...", width=.1] + + outer [ label = "Record" ] + field_apiVersion [label = "RecordField |apiVersion "] + field_containers [label="RecordField | containers"] + field_Metadata [label = "RecordField | Metadata"] + inner [ label = "Record" ] + file_main_container [label="RecordField| main_containers"] + + + + lin:f1 -> outer + outer -> lin:f2 [style = dashed] + outer -> lin:f3 [style = dashed] + outer -> lin:f4 [style = dashed] + + lin:f2 -> field_apiVersion + field_apiVersion -> lin:f5 [style = dashed] + + lin:f6 -> inner + inner -> lin:f7 [style = dashed] + + lin:f3 -> field_containers + field_containers -> lin:f6 [style = dashed] + + lin:f4 -> field_Metadata + field_Metadata-> lin:f8 [style = dashed] + + lin:f7 -> file_main_container + file_main_container -> lin:f8 [style = dashed] +} +``` + +To provide the necessary references, NLS makes used of the *scope safe* memory of its `Linearizer` implementation. +This is possible, because each record value corresponds to its own scope. +The complete process looks as follows: + +1. When registering a record, first the outer `Record` is added to the linearization +2. This is followed by `RecordField` items for its fields, which at this point do not reference any value. +3. NLS then stores the `id` of the parent as well as the fields and the offsets of the corresponding items (`n-4` and `[(apiVersion, n-3), (containers, n-2), (metadata, n-1)]` respectively in the example [@fig:nls-lin-records]). +4. The `scope` method will be called in the same order as the record fields appear. + Using this fact, the `scope` method moves the data stored for the next evaluated field into the freshly generated `Linearizer` +5. **(In the sub-scope)** The linearizer associates the `RecordField` item with the (now known) `id` of the field's value. + The cached field data is invalidated such that this process only happens once for each field. 
+ ##### Static access From e182ce78e7c220a3ec84ff69a563e1d923701aed Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 24 Jan 2022 19:27:41 +0100 Subject: [PATCH 041/142] Change record example --- chapter/methodology.md | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index cb92bc01..4d9e19fd 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -378,16 +378,15 @@ Examples of let bindings can be found in use in [@lst:nickel-complete-example or ##### Records - -```{.graphviz #fig:nickel-record caption="A record in Nickel"} - +```{.nickel #fig:nickel-record caption="A record in Nickel"} { - a = 2, - b = { - ba = 1 + apiVersion = "1.1.0", + metadata = metadata_, + replicas = 3, + containers = { + "main container" = webContainer image } } - ``` ```{.graphviz #fig:nickel-record-ast caption="AST representation of a record"} From 5a7f66c75555bfea9a31355ed57687dc1be13403 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 24 Jan 2022 19:48:57 +0100 Subject: [PATCH 042/142] fix misprinted listings --- chapter/methodology.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 4d9e19fd..42aad851 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -287,7 +287,7 @@ This is enough to handle a large subset of Nickel. In fact, only records, let bindings and function definitions require additional change to enrich identifiers they define with type information. 
-```{.rust #nickel-tc-abstract caption="Abstract type checking function"} +```{.rust #lst:nickel-tc-abstract caption="Abstract type checking function"} fn type_check_( lin: &mut Linearization, mut linearizer: L, @@ -378,7 +378,7 @@ Examples of let bindings can be found in use in [@lst:nickel-complete-example or ##### Records -```{.nickel #fig:nickel-record caption="A record in Nickel"} +```{.nickel #lst:nickel-record caption="A record in Nickel"} { apiVersion = "1.1.0", metadata = metadata_, From 468d767f047eba5b8d3157087c202f4aa7f1eb4a Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 26 Jan 2022 17:20:27 +0100 Subject: [PATCH 043/142] Format and fix to code examples --- chapter/methodology.md | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 42aad851..400b0b02 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -12,11 +12,12 @@ The example [@lst:nickel-complete-example] shows an illustrative high level conf Throughout this chapter, different sections about the NSL implementation will refer back to this example. 
```{.nickel #lst:nickel-complete-example caption="Nickel example with most features shown"} -let Port | doc "A contract for a port number" = contracts.from_predicate (fun value => - builtins.is_num value && - value % 1 == 0 && - value >= 0 && - value <= 65535) in +let Port | doc "A contract for a port number" = + contracts.from_predicate (fun value => + builtins.is_num value && + value % 1 == 0 && + value >= 0 && + value <= 65535) in let Container = { image | Str, @@ -44,12 +45,14 @@ let webContainer = fun image => { ports = [ 80, 443 ], } in +let image = "k8s.gcr.io/#{name_}" in + { apiVersion = "1.1.0", metadata = metadata_, replicas = 3, containers = { - "main container" = webContainer "k8s.gcr.io/#{name_}" + "main container" = webContainer image } } | #NobernetesConfig @@ -103,7 +106,6 @@ pub trait LinearizationState {} pub struct Linearization { pub state: S, } - ``` ```{.rust #lst:nls-definition-building-type caption="Type Definition of Building state"} From e6fffeded8ac90674096d680ae93ba4db1163a9f Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 26 Jan 2022 17:21:36 +0100 Subject: [PATCH 044/142] Overhaul some sentences --- chapter/methodology.md | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 400b0b02..97b42b59 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -167,9 +167,9 @@ On a higher level, tracking both definitions and usages of identifiers yields a There are three main kids of vertices in such a graph. **Declarations** are nodes that introduce an identifier, and can be referred to by a set of nods. -Referral is represented as **Usage** nodes which can either be bound to a declaration or unbound if no corresponding declaration is known. -In practice Nickel distinguishes simple variable bindings from name binding through record fields in recursive records. -It also Integrates a **Record** kind to provide deep record destructuring. 
+Referral is represented by **Usage** nodes which can either be bound to a declaration or unbound if no corresponding declaration is known. +In practice Nickel distinguishes simple variable bindings from name binding through record fields which are resolved during the post-precessing. +It also Integrates a **Record** and **RecordField** kinds to aid record destructuring. During the linearization process this graphical model is recreated on the linear representation of the source. Hence, each `LinearizationItem` is associated with one of the aforementioned kinds, encoding its function in the usage graph. @@ -186,9 +186,8 @@ The Nickel language implements lexical scopes with name shadowing. 1. A name can only be referred to after it has been defined 2. A name can be redefined for a local area -An AST can be used to represent this logic. -A variable reference always refers to the closest parent node defining the name. -Scopes are naturally separated using branching. +An AST inherently supports this logic. +A variable reference always refers to the closest parent node defining the name and scopes are naturally separated using branching. Each branch of a node represents a sub-scope of its parent, i.e. new declarations made in one branch are not visible in the other. When eliminating the tree structure, scopes have to be maintained in order to provide auto-completion of identifiers and list symbol names based on their scope as context. @@ -196,9 +195,8 @@ Since the bare linear data structure cannot be used to deduce a scope, related m The language server maintains a register for identifiers defined in every scope. This register allows NLS to resolve possible completion targets as detailed in [@sec:resolving-by-scope]. -For simplicity, scopes are represented by a prefix list. -Whenever a new lexical scope is entered the prefix list of the outer scope is extended by a unique identifier. 
-With the example in mind [@lst:nickel-complete-example] contains the defintion of a simple record. +For simplicity, scopes are represented by a prefix list of integers. +Whenever a new lexical scope is entered the list of the outer scope is extended by a unique identifier. Additionally, to keep track of the variables in scope, and iteratively build a usage graph, NLS keeps track of the latest definition of each variable name and which `Declaration` node it refers to. From 41c2448d8c3c6c28a57858fa511c47f343592fef Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 26 Jan 2022 17:22:43 +0100 Subject: [PATCH 045/142] Section: linearizing variable usage and record destructuring --- chapter/methodology.md | 145 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 145 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index 97b42b59..e7c9bbfb 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -485,6 +485,151 @@ The complete process looks as follows: 5. **(In the sub-scope)** The linearizer associates the `RecordField` item with the (now known) `id` of the field's value. The cached field data is invalidated such that this process only happens once for each field. +##### Variable Usage and Static Record Access + +Looking at the AST representation of record destructuring in [@fig:nickel-static-access] shows that accessing inner records involves chains of unary operations *ending* with a reference to a variable binding. +Each operation encodes one identifier, i.e. field of a referenced record. +However, to reference the corresponding declaration, the final usage has to be known. +Therefore, instead of linearizing the intermediate elements directly, the `Linearizer` adds them to a shared stack until the grounding variable reference is reached. +Whenever a variable usage is linearized, NLS checks the stack for latent destructors. +If destructors are present, NLS adds `Usage` items for each element on the stack. 
+Note that record destructors can be used as values of record fields as well and thus refer to other fields of the same record. +As the `Linearizer` processes the field values sequentially, a usage may reference parts of the record that have not been processed yet, in which case NLS cannot fully resolve it. +A visualization of this is provided in [@fig:nls-unavailable-rec-record-field]. +For this reason, the `Usage`s added to the linearization are marked as `Deferred` and will be fully resolved during the post-processing phase as documented in [@sec:resolving-deferred-access]. +This is illustrated in [@fig:ncl-record-access]. +The `Var` AST node is linearized as a `Resolved` usage node, which points to the existing `Declaration` node for the identifier. +Note that this could also be a `RecordField` if the variable is referenced within a record. +NLS linearizes the trailing access nodes as `Deferred` nodes. + + + +```{.graphviz #fig:nls-unavailable-rec-record-field caption="Example race condition in recursive records.
The field `y.yz` cannot be referenced at this point, as the `y` branch has yet to be linearized"} +digraph G { + node [shape=record] + splines=false + /* Entities */ + record_x [label="Record|\{y,z\}"] + field_y [label="Field|y"] + field_z [label="Field|z"] + + subgraph { + node [shape=record, color=grey, style=dashed] + record_y [label="Record|\{yy, yz\}"] + field_yy [label="Field|yy"] + field_yz [label="Field|yz"] + } + + var_z [label = "Usage|y.yz"] + + hidden [shape=point, width=0, height = 0] + + /* Relationships */ + record_x -> {field_y, field_z} + field_y -> record_y + field_z -> var_z + record_y -> {field_yy, field_yz} [color=grey] + var_z -> field_yz [style=dashed, label="Not resolvable"] + + var_z -> hidden [style=invis] + + {rank=same; field_y; field_z } + {rank=same; field_yy; field_yz } + {rank=same; record_y; hidden;} + +} +``` + +```{.graphviz #fig:ncl-record-access caption="Depiction of generated usage nodes for record destructuring"} +digraph G { + node[shape="record", fontname = "Fira Code", fontsize = 9] + compound=true; + splines="ortho"; + newrank=true; + rankdir = TD; + + + subgraph cluster_x { + label="AST Nodes" + + x [label = "Var | x"] + d_y [label = "Access | .y"] + d_z [label = "Access | .z"] + + + x->d_y->d_z + } + + subgraph cluster_lin { + + label = "Linearization items" + + subgraph cluster_items { + + label="Existing Nodes" + + + // hidden + { + node[group="items"] + decl_x [label = "{Declaration | x}"] + rec_x [label = "{Record | \{y\}}"] + + field_y [label = "{RecordField | y}"] + rec_y [label = "{Record | \{z\}}"] + + field_z [label = "{RecordField | z}"] + + } + + decl_x -> + rec_x -> + field_y -> + rec_y -> + field_z + + } + + subgraph cluster_deferred { + label = "Generated Nodes" + use_x [label = "{Resolved | x}"] + + def_y [label = "{Deferred | x | y}"] + def_z [label = "{Deferred | y | z}"] + + + def_y-> use_x [constraint=false; ] + def_z -> def_y [] + + } + + + } + + + {rank=same; decl_x; x;} + + {rank=same; def_y;
d_y; rec_x} +  {rank=same; def_z; d_z; field_y} + + + +  x -> use_x [constraint = false; ] +  d_z -> def_z +  d_y -> def_y + + +  use_x -> decl_x [constraint = false; ] + + +  def_y -> decl_x [style=dashed;] +  decl_x -> rec_x -> field_y [style=dashed] + +  def_z:z:e -> field_y -> rec_y -> field_z [style=dotted] + +} +``` ##### Static access From c3456f518fbf7ea286ad724eff9ab79b70dd4056 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 27 Jan 2022 14:42:17 +0100 Subject: [PATCH 046/142] Clarify `TermKind`s --- chapter/methodology.md | 129 ++++++++++++++++++++++++++--------------- 1 file changed, 81 insertions(+), 48 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index e7c9bbfb..ff870af7 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -173,7 +173,47 @@ It also Integrates a **Record** and **RecordField** kinds to aid record destruct  During the linearization process this graphical model is recreated on the linear representation of the source. Hence, each `LinearizationItem` is associated with one of the aforementioned kinds, encoding its function in the usage graph. -Nodes of the AST that do not fit in a usage graph, a wildcard kind `Structure` is applied. + +```rust +pub enum TermKind { +    Declaration(Ident, Vec<ID>), +    Record(HashMap<Ident, ID>), +    RecordField { +        ident: Ident, +        record: ID, +        usages: Vec<ID>, +        value: Option<ID>, +    }, + +    Usage(UsageState), + +    Structure, +} + +pub enum UsageState { +    Unbound, +    Resolved(ID), +    Deferred { parent: ID, child: Ident }, +} + +``` + +The `TermKind` type is an enumeration of the discussed cases and defines the role of a `LinearizationItem` in the usage graph. + +Variable bindings +  ~ are linearized using the `Declaration` variant which holds the bound identifier as well as a list of `ID`s corresponding to its `Usage`s. + +Records +  ~ remain similar to their AST representation.
The `Record` variant simply maps field names to the linked `RecordField`. + +Record fields +  ~ make for the most complicated kind. The `RecordField` kind augments the qualities of a `Declaration`, representing an identifier and tracking its `Usage`s, while also maintaining a link back to its parent `Record` and explicitly referencing the value it represents. + +Variable usages +  ~ are further specified. `Usage`s that cannot be mapped to a declaration are tagged `Unbound`; otherwise they are `Resolved` to the complementary `Declaration`. +  ~ Record destructuring may require a late resolution, as discussed in [@sec:variable-usage-and-static-record-access]. +Other nodes +  ~ of the AST that do not fit in a usage graph are linearized as `Structure`. @@ -383,7 +423,7 @@ Examples of let bindings can be found in use in [@lst:nickel-complete-example or        apiVersion = "1.1.0",        metadata = metadata_,        replicas = 3, -      containers = {  +      containers = {          "main container" = webContainer image        }      } @@ -409,7 +449,6 @@ digraph G {    main_container:f1 -> webContainer    main_container:f2 -> image  } -  ```  Linearizing records proves more difficult.
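To make the usage graph encoded by `TermKind` more concrete, here is a minimal, self-contained sketch. It mirrors the `TermKind` and `UsageState` listing with simplified stand-ins (`ID` as `usize` and `Ident` as `String` are assumptions). The helpers `definition_of` and `usages_of` are hypothetical illustrations of how a jump-to-definition or find-references query could walk the graph, not functions of NLS.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the types used by NLS (assumptions).
type ID = usize;
type Ident = String;

#[derive(Debug, Clone, PartialEq)]
enum UsageState {
    Unbound,
    Resolved(ID),
    Deferred { parent: ID, child: Ident },
}

#[derive(Debug, Clone, PartialEq)]
enum TermKind {
    Declaration(Ident, Vec<ID>),
    Record(HashMap<Ident, ID>),
    RecordField {
        ident: Ident,
        record: ID,
        usages: Vec<ID>,
        value: Option<ID>,
    },
    Usage(UsageState),
    Structure,
}

/// Hypothetical "jump to definition": a resolved usage points to its
/// declaration; declarations and record fields are their own definition.
fn definition_of(items: &[TermKind], id: ID) -> Option<ID> {
    match items.get(id)? {
        TermKind::Usage(UsageState::Resolved(target)) => Some(*target),
        TermKind::Declaration(..) | TermKind::RecordField { .. } => Some(id),
        // Unbound or deferred usages, records and structures have none.
        _ => None,
    }
}

/// Hypothetical "find references": the usages recorded on a declaration
/// or record field; every other kind has none.
fn usages_of(items: &[TermKind], id: ID) -> &[ID] {
    match items.get(id) {
        Some(TermKind::Declaration(_, usages)) => usages.as_slice(),
        Some(TermKind::RecordField { usages, .. }) => usages.as_slice(),
        _ => &[],
    }
}
```

Because both directions of the graph are stored explicitly (usage to declaration, declaration to usages), either query is a single lookup once the items are resolved.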
@@ -522,7 +561,7 @@ digraph G { } var_z [label = "Usage|y.yz"] - + hidden [shape=point, width=0, height = 0] /* Relationships */ @@ -548,86 +587,80 @@ digraph G { splines="ortho"; newrank=true; rankdir = TD; - + subgraph cluster_x { label="AST Nodes" - + x [label = "Var | x"] d_y [label = "Access | .y"] d_z [label = "Access | .z"] - - + + x->d_y->d_z } - + subgraph cluster_lin { - + label = "Linearization items" - - subgraph cluster_items { - + + subgraph cluster_items { + label="Existing Nodes" - - + + // hidden { node[group="items"] decl_x [label = "{Declaration | x}"] rec_x [label = "{Record | \{y\}}"] - + field_y [label = "{RecordField | y}"] rec_y [label = "{Record | \{z\}}"] - + field_z [label = "{RecordField | z}"] - + } - + decl_x -> rec_x -> field_y -> rec_y -> field_z - + } - + subgraph cluster_deferred { label = "Generated Nodes" use_x [label = "{Resolved | x}"] - + def_y [label = "{Deferred | x | y}"] def_z [label = "{Deferred | y | z}"] - - + + def_y-> use_x [constraint=false; ] def_z -> def_y [] - - } - - + + } + } - - - {rank=same; decl_x; x;} - - {rank=same; def_y; d_y; rec_x} - {rank=same; def_z; d_z; field_y} - - x -> use_x [constraint = false; ] d_z -> def_z - d_y -> def_y - - + d_y -> def_y + + use_x -> decl_x [constraint = false; ] - - + + def_y -> decl_x [style=dashed;] decl_x -> rec_x -> field_y [style=dashed] - def_z:z:e -> field_y -> rec_y -> field_z [style=dotted] - + + + {rank=same; decl_x; x;} + {rank=same; def_y; d_y; rec_x} + {rank=same; def_z; d_z; field_y} } ``` @@ -654,7 +687,7 @@ To find items in this list three preconditions have to hold: 2. Items of different files appear ordered by `FileId` 3. Two spans are either within the bounds of the other or disjoint. $$\text{Item}^2_\text{start} \geq \text{Item}^1_\text{start} \land \text{Item}^2_\text{end} \leq \text{Item}^1_\text{end}$$ -4. Items referring to the spans starting at the same position have to occur in the same order before and after the post-processing. +4. 
Items referring to the spans starting at the same position have to occur in the same order before and after the post-processing. Concretely, this ensures that the tree-induced hierarchy is maintained, more precise elements follow broader ones This first two properties are an implication of the preceding processes. @@ -680,7 +713,7 @@ impl Completed { let linearization = &self.linearization; let item = match linearization .binary_search_by_key( - locator, + locator, |item| (item.pos.src_id, item.pos.start)) { // Found item(s) starting at `locator` @@ -694,8 +727,8 @@ impl Completed { Err(index) => { linearization[..index].iter().rfind(|item| { // Return the first (innermost) matching item - file_id == &item.pos.src_id - && start > &item.pos.start + file_id == &item.pos.src_id + && start > &item.pos.start && start < &item.pos.end }) } @@ -727,7 +760,7 @@ impl Completed { ) -> Vec<&LinearizationItem> { let EMPTY = Vec::with_capacity(0); // all prefix lengths - (0..scope.len()) + (0..scope.len()) // concatenate all scopes .flat_map(|end| self.scope.get(&scope[..=end]) .unwrap_or(&EMPTY)) From 14223a06e781938149ad7980e2930600c48d23b6 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 27 Jan 2022 14:48:19 +0100 Subject: [PATCH 047/142] Fix typo --- chapter/methodology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index ff870af7..96057807 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -521,7 +521,7 @@ The complete process looks as follows: 3. NLS then stores the `id` of the parent as well as the fields and the offsets of the corresponding items (`n-4` and `[(apiVersion, n-3), (containers, n-2), (metadata, n-1)]` respectively in the example [@fig:nls-lin-records]). 4. The `scope` method will be called in the same order as the record fields appear. Using this fact, the `scope` method moves the data stored for the next evaluated field into the freshly generated `Linearizer` -5. 
**(In the sub-scope)** The linearizer associates the `RecordField` item with the (now known) `id` of the field's value. +5. **(In the sub-scope)** The `Linearizer` associates the `RecordField` item with the (now known) `id` of the field's value.  The cached field data is invalidated such that this process only happens once for each field.  ##### Variable Usage and Static Record Access From c4a2b92fbc1e709f60966e45dc05a0b1f0ca1ab7 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 27 Jan 2022 14:48:45 +0100 Subject: [PATCH 048/142] Section Variable referencing --- chapter/methodology.md | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 96057807..939c2c65 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -524,7 +524,17 @@ The complete process looks as follows:  5. **(In the sub-scope)** The `Linearizer` associates the `RecordField` item with the (now known) `id` of the field's value.  The cached field data is invalidated such that this process only happens once for each field.   -##### Variable Usage and Static Record Access + +##### Variable Reference + +While names can be declared in several ways, the usage of a variable is always expressed as a `Var` node wrapping the referenced identifier. +Registering a name usage is a multi-step process. + +First, NLS tries to find the identifier in its scope-aware name registry. +If the registry does not contain the identifier, NLS linearizes the node as an `Unbound` usage. +If the lookup succeeds, NLS retrieves the referenced `Declaration` or `RecordField`. The `Linearizer` then adds a `Resolved` `Usage` item to the linearization and updates the declaration's list of usages.
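The registration of a variable usage described above can be sketched as follows. This is a simplified model under stated assumptions, not NLS code: the `Item` enum, the `Environment` map from identifiers to the `id` of their latest declaration, and the `declare`/`add_usage` functions are hypothetical names, and record fields are left out for brevity.

```rust
use std::collections::HashMap;

type ID = usize;

/// Simplified linearization item (hypothetical, not NLS's `TermKind`).
#[derive(Debug, PartialEq)]
enum Item {
    Declaration { ident: String, usages: Vec<ID> },
    Resolved(ID),
    Unbound,
}

/// Maps each identifier to the `id` of its latest declaration in scope.
type Environment = HashMap<String, ID>;

/// Append a declaration and make it the current binding of `ident`.
fn declare(items: &mut Vec<Item>, env: &mut Environment, ident: &str) -> ID {
    let id = items.len();
    items.push(Item::Declaration { ident: ident.to_string(), usages: Vec::new() });
    // Shadowing: a later declaration replaces the earlier binding.
    env.insert(ident.to_string(), id);
    id
}

/// Register a `Var` node that references `ident`.
fn add_usage(items: &mut Vec<Item>, env: &Environment, ident: &str) -> ID {
    let id = items.len();
    match env.get(ident) {
        // Lookup failed: the variable is unbound.
        None => items.push(Item::Unbound),
        Some(&decl) => {
            items.push(Item::Resolved(decl));
            // Record the new usage on the referenced declaration.
            if let Item::Declaration { usages, .. } = &mut items[decl] {
                usages.push(id);
            }
        }
    }
    id
}
```

Since items are only appended, the `id` of a new item is simply the current length of the list, which keeps both operations constant-time apart from the hash lookup.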
+ +###### Variable Usage and Static Record Access  Looking at the AST representation of record destructuring in [@fig:nickel-static-access] shows that accessing inner records involves chains of unary operations *ending* with a reference to a variable binding. Each operation encodes one identifier, i.e. field of a referenced record. From 8b77edbd7892b223a3280ecc469240289f1626f9 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 27 Jan 2022 15:10:43 +0100 Subject: [PATCH 049/142] Fix code block: make it a listing --- chapter/methodology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 939c2c65..f327d1fd 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -174,7 +174,7 @@ It also Integrates a **Record** and **RecordField** kinds to aid record destruct  During the linearization process this graphical model is recreated on the linear representation of the source. Hence, each `LinearizationItem` is associated with one of the aforementioned kinds, encoding its function in the usage graph.   -```rust +```{.rust #lst:nls-termkind-definition caption="Definition of a linearization item's TermKind"}  pub enum TermKind {      Declaration(Ident, Vec<ID>),      Record(HashMap<Ident, ID>), From e4cc1f3fd5cef7ddce8d3c1a28a2e9fdc4b2a288 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 27 Jan 2022 15:12:05 +0100 Subject: [PATCH 050/142] Reword linearization of name bindings --- chapter/methodology.md | 29 ++++++++--------------------- 1 file changed, 8 insertions(+), 21 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index f327d1fd..58bc0231 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -381,40 +381,27 @@ null  [ 1, 2, 3 ] @ [ 4, 5]  // if-then-else -if true then "TRUE :)" else "false :("  +if true then "TRUE :)" else "false :("  // string interpolation  "#{ "hello" } #{ "world" }!"
``` -##### Structures -  In the most common case of general elements, the node is simply registered as a `LinearizationItem` of kind `Structure`. This applies for all simple expressions like those exemplified in [@lst:nickel-simple-expr] Essentially, any of such nodes turns into a typed span as the remaining information tracked is the item's span and type checker provided type.  -```{.nickel #lst:nickel-let-binding caption="Let bindings and functions in nickel"} - -// simple bindings -let name = in -let func = fun arg => in - -// or with patterns -let name @ { field, with_default = 2 } = in -let func = fun arg @ { field, with_default = 2 } => -    in - -```  ##### Declarations -Name bindings are equally simple. -NLS generates a `Declaration` item for the given identifier and assigns the identifier's position and provided type. -Additionally, it associates the identifier with the `id` of the created item in its current environment. -If a binding contains a pattern, NLS creates additional items for each matched element. -Unfortunately, no types are provided for these by Nickel. -Examples of let bindings can be found in use in [@lst:nickel-complete-example or @lst:nickel-let-binding] +For `let` bindings and function arguments, name binding is equally simple. + +When the `Let` node is processed, the `Linearizer` generates a `Declaration` item for each identifier it contains. +As discussed in [@sec:let-bindings-and-functions], the `Let` node may contain a name binding as well as pattern matches. +The node's type supplied to the `Linearizer` corresponds to the bound value and is therefore applied to the name binding only. +Additionally, NLS updates its name register with the newly created `Declaration`s. +The same process applies to argument names in function declarations.
##### Records From 6f0aef5245598db3cff37c4a629d95aa0067d7f8 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 27 Jan 2022 15:12:58 +0100 Subject: [PATCH 051/142] Add lost chapter header back to document --- chapter/methodology.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index 58bc0231..59df43fa 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -369,6 +369,8 @@ While registering a node, NLS distinguishes 4 kinds of nodes. These are *metadata*, *usage graph* related nodes, i.e. declarations and usages, *static access* of nested record fields, and *general elements* which is every node that does not fall into one of the prior categories. +##### Structures + ```{.nickel #lst:nickel-simple-expr caption="Exemplary nickel expressions"} // atoms From 2e062aa1816e595f3e46988396aa1e5373528c1a Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 27 Jan 2022 20:30:49 +0100 Subject: [PATCH 052/142] Document linearizer fields --- chapter/methodology.md | 31 ++++++++++++++++++++++++++++++- 1 file changed, 30 insertions(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 59df43fa..33b08c63 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -307,7 +307,36 @@ pub trait Linearizer { While data stored in the `Linearizer::Building` state will be accessible at any point in the linearization process, the `Linearizer` is considered to be *scope safe*. No instance data is propagated back to the outer scopes `Linearizer`. Neither have `Linearizer`s of sibling scopes access to each other's data. -Yet the `scope` method can be implemented to pass arbitrary state down to the scoped instance. +Yet, the `scope` method can be implemented to pass arbitrary state down to the scoped instance. +The scope safe storage of the `Linearizer` implemented by NLS, as seen in [@lst:nls-analyisis-host-definition], stores the scope aware register and scope related data. 
+Additionally, it contains fields to allow the linearization of records and record destructuring, as well as metadata ([@sec:records, @sec:variable-usage-and-static-record-access and @sec:metadata]) + +```{.rust #lst:nls-analyisis-host-definition caption="Definition of the AnalysisHost structure"} +pub struct AnalysisHost { +    env: Environment, +    scope: Scope, +    next_scope_id: ScopeId, +    meta: Option<MetaValue>, +    /// Indexing a record will store a reference to the record as +    /// well as its fields. +    /// [Self::Scope] will produce a host with a single **`pop`ed** +    /// Ident. As fields are typechecked in the same order, each +    /// in their own scope immediately after the record, which +    /// gives the corresponding record field _term_ to the ident +    /// usable to construct a value declaration. +    record_fields: Option<(usize, Vec<(usize, Ident)>)>, +    /// Accesses to nested records are recorded recursively. +    /// ``` +    /// outer.middle.inner -> inner(middle(outer)) +    /// ``` +    /// To resolve those inner fields, accessors (`inner`, `middle`) +    /// are recorded first until a variable (`outer`) is found. +    /// Then, access to all nested records is resolved at once. +    access: Option<Vec<Ident>>, +} +``` + + ####  Linearization Process From d2912a66a6c6845bec379c5d41e01383d830bfc1 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 27 Jan 2022 20:33:22 +0100 Subject: [PATCH 053/142] Section Metadata --- chapter/methodology.md | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 33b08c63..d970ef92 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -692,11 +692,21 @@ digraph G {  }  ```   -##### Static access -  ##### Metadata   -#### Integration with Nickel +As shown in [@sec:meta-information], metadata "wraps" the annotated value on the syntax level. +In contrast, NLS stores metadata in the `LinearizationItem` of the annotated value, as metadata is intrinsically related to that value.
+NLS therefore has to defer handling of the `MetaValue` node until the processing of the associated value in the succeeding call. +As with record destructors, NLS temporarily stores this metadata in the `Linearizer`'s memory. + +Metadata always immediately precedes its value. +Thus, whenever a node is linearized, NLS checks whether any latent metadata is stored. +If so, it moves the metadata to the value's `LinearizationItem`, clearing the temporary storage. + +Although metadata is not linearized as such, contracts encoded in the metadata can refer to locally bound names. +Considering that only the annotated value is type-checked and therefore passed to NLS, resolving Usages in contracts requires NLS to separately walk the contract expression. +Therefore, NLS traverses the AST of expressions used as value annotations. +In order to avoid interference with the main linearization, contracts are linearized using their own `Linearizer`.  ##### Scope  ##### Retyping From 7a275b088210d534b7d243772ac75b23c8b59dd9 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 31 Jan 2022 13:24:04 +0100 Subject: [PATCH 054/142] Cleanup file --- chapter/methodology.md | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index d970ef92..5f6472ab 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -1,4 +1,4 @@ -# Design implementation of NLS +# Design implementation of NLS  This chapter contains a detailed guide through the various steps and components of the Nickel Language Server (NLS). Being written in the same language (Rust[@rust]) as the Nickel interpreter allows NLS to integrate existing components for language analysis. @@ -12,7 +12,7 @@ The example [@lst:nickel-complete-example] shows an illustrative high level conf Throughout this chapter, different sections about the NLS implementation will refer back to this example.
```{.nickel #lst:nickel-complete-example caption="Nickel example with most features shown"} -let Port | doc "A contract for a port number" = +let Port | doc "A contract for a port number" = contracts.from_predicate (fun value => builtins.is_num value && value % 1 == 0 && @@ -31,7 +31,7 @@ let NobernetesConfig = { | doc "The number of replicas" | default = 1, containers | { _ : #Container }, - + } in let name_ = "myApp" in @@ -51,7 +51,7 @@ let image = "k8s.gcr.io/#{name_}" in apiVersion = "1.1.0", metadata = metadata_, replicas = 3, - containers = { + containers = { "main container" = webContainer image } } | #NobernetesConfig @@ -97,7 +97,7 @@ While being defined similar to its origin, the structure is optimized for positi Moreover, types of items in the `Completed` linearization will be resolved. Type definitions of the `Linearization` as well as its type-states `Building` and `Completed` are listed in [@lst:nickel-definition-lineatization;@lst:nls-definition-building-type;@lst:nls-definition-completed-type]. -Note that only the former is defined as part of the Nickel libraries, the latter are specific implementations for NLS. +Note that only the former is defined as part of the Nickel libraries, the latter are specific implementations for NLS. ```{.rust #lst:nickel-definition-lineatization caption="Definition of Linearization structure"} @@ -212,6 +212,7 @@ Record fields Variable usages ~ are further specified. `Usage`s that can not be mapped to a declaration are tagged `Unbound` or otherwise `Resolved` to the complementary `Declaration` ~ Record destructuring may require a late resolution as discussed in [@sed:variable-usage-and-static-record-access]. + Other nodes ~ of the AST that do not fit in a usage graph, are linearized as `Structure`. @@ -243,7 +244,6 @@ Additionally, to keep track of the variables in scope, and iteratively build a u #### Linearizer - The heart of the linearization the `Linearizer` trait as defined in [@lst:nls-linearizer-trait]. 
The `Linearizer` lives in parallel to the `Linearization`.  Its methods modify a shared reference to a `Building` `Linearization` @@ -604,7 +604,6 @@ digraph G {    {rank=same; field_y; field_z }    {rank=same; field_yy; field_yz }    {rank=same; record_y; hidden;} -  }  ``` From a4a70a483d5eb1b514fdee0d98c01a7539bdace3 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 31 Jan 2022 13:26:40 +0100 Subject: [PATCH 055/142] Remove section headers for scopes and retyping --- chapter/methodology.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 5f6472ab..a7edf5c3 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -707,9 +707,6 @@ Considering that only the annotated value is type-checked and therefore passed t Therefore, NLS traverses the AST of expressions used as value annotations. In order to avoid interference with the main linearization, contracts are linearized using their own `Linearizer`.   -##### Scope - -##### Retyping    ### Post-Processing From fc6d4aac683e18eed1d01174314dd46ecd75916e Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 31 Jan 2022 13:26:48 +0100 Subject: [PATCH 056/142] Post processing intro --- chapter/methodology.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index a7edf5c3..f74a54f9 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -709,6 +709,15 @@ In order to avoid interference with the main linearization, contracts are linear    ### Post-Processing   +Once the entire AST has been processed, NLS modifies the linearization to make it suitable as an efficient index for serving various LSP commands. + +After post-processing, the resulting linearization + +1. allows efficient lookup of elements from file locations +2. maintains an `id`-based lookup +3. links deeply nested record destructors to the correct definitions +4.
provides all available type information utilizing Nickel's typing backend + From 5b134edae5d43ee240dfb398a5f8d6f671bdd2a8 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 31 Jan 2022 13:27:14 +0100 Subject: [PATCH 057/142] Post-Processing: sorting --- chapter/methodology.md | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index f74a54f9..ea3fb9d0 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -718,6 +718,20 @@ After the post-processing the resulting linearization  3. links deeply nested record destructors to the correct definitions  4. provides all available type information utilizing Nickel's typing backend   +#### Sorting + +Since the linearization is performed in a preorder traversal, processing already happens in the order in which elements are physically defined. +Yet, during linearization, the location of some items may be unstable or unknown. +Record fields, for instance, are processed in an arbitrary order rather than the order in which they are defined. +Moreover, for nested records and record short notations, symbolic `Record` items are created which cannot be mapped to a physical location and are thus placed at the range `[0..=0]` at the beginning of the file. +Maintaining constant-time insertion and `id`-based item references requires that items are only ever appended to the linearization. +Each of these cases breaks the physical ordering of the linearization. + +NLS thus defers the reordering of items until post-processing. +The language server uses a stable sorting algorithm to sort items by their associated span's starting position. +This way, nesting of items with the same start location is preserved. +Since several operations require efficient access to elements by `id`, which after sorting no longer corresponds to an item's index in the linearization, NLS subsequently creates an index mapping `id`s to list indices.
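A sketch of this post-processing step, assuming a simplified item that only carries its `id`, a numeric file id and a span (the `Item` type and `sort_and_index` function are hypothetical). Rust's `slice::sort_by_key` is a stable sort, so items sharing a `(file, start)` key keep their insertion order, and the returned map plays the role of the `id`-to-index table.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified linearization item: only what sorting needs.
#[derive(Debug, Clone, PartialEq)]
struct Item {
    id: usize,
    file: u32,
    start: u32,
    end: u32,
}

/// Sorts items by (file, span start) and returns a map from `id` to index.
fn sort_and_index(items: &mut [Item]) -> HashMap<usize, usize> {
    // `sort_by_key` is stable: items sharing a start position keep their
    // insertion order, so broader items stay ahead of nested ones.
    items.sort_by_key(|item| (item.file, item.start));
    items
        .iter()
        .enumerate()
        .map(|(index, item)| (item.id, index))
        .collect()
}
```

For example, if a record's fields were appended out of source order, sorting moves them back into physical order while the map still lets `id`-based references find them.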
+ ### Resolving Elements From 3ee378bf337ab4ee14d89a791b82b89fae70f82f Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Tue, 1 Feb 2022 13:12:01 +0100 Subject: [PATCH 058/142] Post processing: Deferred Usages --- chapter/methodology.md | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index ea3fb9d0..9e2b302b 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -732,6 +732,27 @@ The language server uses a stable sorting algorithm to sort items by their assoc This way, nesting of items with the same start location is preserved. Since several operations require efficient access to elements by `id`, which after the sorting does not correspond to the items index in the linearization, after sorting NLS creates an index mapping `id`s to list indices.   +#### Resolving deferred access + +[Section @sec:variable-usage-and-static-record-access] introduced the `Deferred` state for `Usage`s. +Resolution of usages is deferred if chained destructors are used. +This is especially important in recursive records, where any value may refer to other fields of the record which could still be unresolved. + +As seen in [@fig:ncl-record-access], the items generated for each destructor only link to their parent item. +Yet, the root access is connected to a known declaration. +Since all records are fully processed at this point, NLS is able to resolve destructors iteratively. + +First, NLS collects all deferred usages in a queue. +Each usage contains the *`id`* of the parent destructor as well as the *name* of the field it represents. +NLS then tries to resolve the base record for the usage by resolving the parent. +If the value of the parent destructor is not yet known or is itself a deferred usage, NLS enqueues the destructor again to be processed later. +In practice, this means it is retried after the other fields of the same record.
+In any other case, the parent has to point to a record, either directly or through a record field or a variable. +NLS then gets the `id` of the `RecordField` for the destructor's *name* and marks the `Usage` as `Resolved`. +If no field with that name is present or the parent points to a `Structure` or `Unbound` usage, the destructor cannot be resolved in a meaningful way and will thus be marked `Unbound`. + + +#### Resolving types From d7ce0211f6bcadb0dfbc440ca08ef14721c79735 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Tue, 1 Feb 2022 13:29:46 +0100 Subject: [PATCH 059/142] Post processing: Resolving types --- chapter/methodology.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index 9e2b302b..89e75a2d 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -754,6 +754,13 @@ If no field with that name is present or the parent points to a `Structure` or `  #### Resolving types   + +Since type checking requires them, Nickel generates a type variable for every node of the AST and hands these down to the `Linearizer`. + +To provide meaningful information, the language server needs to derive concrete types from these variables. +The metadata required for this has to be provided by the type checker.
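The deferred-access resolution described under *Resolving deferred access* can be sketched as a work queue. The types below are simplified stand-ins and `follow`/`resolve_deferred` are hypothetical names. The termination rule (a usage is marked `Unbound` once a full pass over the queue makes no progress) is an assumption added so that the sketch cannot loop forever; the text does not specify it.

```rust
use std::collections::{HashMap, VecDeque};

type ID = usize;

#[derive(Debug, Clone, PartialEq)]
enum Usage {
    Unbound,
    Resolved(ID),
    Deferred { parent: ID, child: String },
}

/// Simplified item kinds (hypothetical stand-ins for NLS's `TermKind`).
#[derive(Debug, Clone, PartialEq)]
enum Kind {
    Declaration { value: Option<ID> },
    Record(HashMap<String, ID>),
    RecordField { value: Option<ID> },
    Usage(Usage),
    Structure,
}

/// Outcome of following a parent chain.
enum Target<'a> {
    Record(&'a HashMap<String, ID>),
    Pending,
    Unresolvable,
}

/// Follow declarations, record fields and resolved usages until a record
/// is reached; a deferred usage on the way means "try again later".
fn follow(items: &[Kind], mut id: ID) -> Target<'_> {
    // Bound the walk so that cyclic references cannot loop forever.
    for _ in 0..=items.len() {
        match &items[id] {
            Kind::Record(fields) => return Target::Record(fields),
            Kind::Declaration { value: Some(v) } | Kind::RecordField { value: Some(v) } => id = *v,
            Kind::Usage(Usage::Resolved(target)) => id = *target,
            Kind::Usage(Usage::Deferred { .. }) => return Target::Pending,
            _ => return Target::Unresolvable,
        }
    }
    Target::Unresolvable
}

fn resolve_deferred(items: &mut [Kind]) {
    let mut queue: VecDeque<ID> = items
        .iter()
        .enumerate()
        .filter(|(_, kind)| matches!(kind, Kind::Usage(Usage::Deferred { .. })))
        .map(|(id, _)| id)
        .collect();
    // Consecutive retries without progress (assumed termination rule).
    let mut stalled = 0;

    while let Some(id) = queue.pop_front() {
        let (parent, child) = match &items[id] {
            Kind::Usage(Usage::Deferred { parent, child }) => (*parent, child.clone()),
            _ => continue,
        };
        let resolved = match follow(items, parent) {
            Target::Record(fields) => match fields.get(&child) {
                Some(&field) => Usage::Resolved(field),
                None => Usage::Unbound,
            },
            Target::Unresolvable => Usage::Unbound,
            Target::Pending => {
                stalled += 1;
                if stalled <= queue.len() {
                    // Parent not known yet: retry after the other entries.
                    queue.push_back(id);
                    continue;
                }
                // A full pass made no progress: give up on this usage.
                Usage::Unbound
            }
        };
        stalled = 0;
        items[id] = Kind::Usage(resolved);
    }
}
```

In the usage example below, the access `.z` is queued before `.y`, so it is re-enqueued once and resolved after its parent.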
+ + ### Resolving Elements  #### Resolving by position From e0e670dd681f27e525512ab6c196aab5c05789c6 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Tue, 1 Feb 2022 13:31:55 +0100 Subject: [PATCH 060/142] Cleanup figure code --- chapter/methodology.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 89e75a2d..c5c4f308 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -582,10 +582,10 @@ digraph G {    field_z [label="Field|z"]      subgraph { -  node [shape=record, color=grey, style=dashed] -  record_y [label="Record|\{yy, yz\}"] -  field_yy [label="Field|yy"] -  field_yz [label="Field|yz"] +    node [shape=record, color=grey, style=dashed] +    record_y [label="Record|\{yy, yz\}"] +    field_yy [label="Field|yy"] +    field_yz [label="Field|yz"]    }      var_z [label = "Usage|y.yz"] @@ -635,7 +635,6 @@ digraph G {          label="Existing Nodes"   -        // hidden        {        node[group="items"] From aaf4e2321a1b5cbfa5d6a4cf85d1a88fbdc9d17e Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Fri, 4 Feb 2022 11:43:49 +0100 Subject: [PATCH 061/142] Section: Server::Intro --- chapter/methodology.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index c5c4f308..1ded7a58 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -859,6 +859,18 @@ impl Completed {    ## LSP Server   +[Section @sec:commands-and-notifications] introduced the concept of capabilities in the context of the language server protocol. +This section describes how NLS uses the linearization described in [@sec:linearization] to implement a comprehensive set of features. +There are two kinds of capabilities, passive diagnostics and commands. + +NLS instructs the LSP client to notify the server once the user opens or modifies a file. +Each notification contains the complete source code of the file as well as its location.
+NLS subsequently parses and type-checks the file using Nickel's libraries. +Since Nickel deals with error reporting already, NLS converts any error generated in these processes into [Diagnostic](https://microsoft.github.io/language-server-protocol/specifications/specification-current/#diagnostic) items and sends them to the client as server notifications. +Nickel errors provide detailed information about location of the issue as well as possible details which NLS can include in the Diagnostic items. + +As discussed in [@sec:linearization] and , the type-checking yields a `Completed` linearization which implements crucial methods to resolve elements. + ### Diagnostics and Caching ### Capabilities From 0decf02db92ca35f2798efa91dd33d6800c0fb5a Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sat, 5 Feb 2022 14:11:43 +0100 Subject: [PATCH 062/142] Apply some text suggestions (style, grammar,typo) --- chapter/methodology.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 1ded7a58..04baf8ce 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -236,8 +236,8 @@ Since the bare linear data structure cannot be used to deduce a scope, related m The language server maintains a register for identifiers defined in every scope. This register allows NLS to resolve possible completion targets as detailed in [@sec:resolving-by-scope]. -For simplicity, scopes are represented by a prefix list of integers. -Whenever a new lexical scope is entered the list of the outer scope is extended by a unique identifier. +For simplicity, NLS represents scopes by a prefix list of integers. +Whenever a new lexical scope is entered, the list of the outer scope is extended by a unique identifier. Additionally, to keep track of the variables in scope, and iteratively build a usage graph, NLS keeps track of the latest definition of each variable name and which `Declaration` node it refers to. 
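A minimal sketch of prefix-list scopes, assuming the root scope is the empty list and scope ids are handed out from a counter (`ScopeRegistry` and its methods are hypothetical names). Collecting the items declared in every prefix of a scope yields everything visible in it, similar to the prefix iteration in the scope lookup shown earlier; sibling scopes never see each other's declarations because neither is a prefix of the other.

```rust
use std::collections::HashMap;

type ID = usize;
/// A scope is the list of scope ids on the path from the root (assumed to
/// be the empty list) to the current lexical scope.
type Scope = Vec<usize>;

#[derive(Default)]
struct ScopeRegistry {
    next_scope_id: usize,
    /// Items declared directly in each scope.
    declared: HashMap<Scope, Vec<ID>>,
}

impl ScopeRegistry {
    /// Entering a lexical scope extends the outer list by a fresh id.
    fn enter(&mut self, outer: &Scope) -> Scope {
        let mut inner = outer.clone();
        inner.push(self.next_scope_id);
        self.next_scope_id += 1;
        inner
    }

    /// Register item `id` as declared directly in `scope`.
    fn declare(&mut self, scope: &Scope, id: ID) {
        self.declared.entry(scope.clone()).or_default().push(id);
    }

    /// Everything visible in `scope`: the items of all its prefixes,
    /// outermost first.
    fn visible(&self, scope: &Scope) -> Vec<ID> {
        (0..=scope.len())
            .filter_map(|end| self.declared.get(&scope[..end]))
            .flatten()
            .copied()
            .collect()
    }
}
```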
@@ -246,7 +246,7 @@ Additionally, to keep track of the variables in scope, and iteratively build a u The heart of the linearization the `Linearizer` trait as defined in [@lst:nls-linearizer-trait]. The `Linearizer` lives in parallel to the `Linearization`. -Its methods modify a shared reference to a `Building` `Linearization` +Its methods modify a shared reference to a `Building` `Linearization`. ```{.rust #lst:nls-linearizer-trait caption="Interface of linearizer trait"} @@ -294,7 +294,7 @@ pub trait Linearizer { `Linearizer::complete` ~ implements the post-processing necessary to turn a final `Building` linearization into a `Completed` one. - ~ Note that the post-processing might depend on additional data + ~ Note that the post-processing might depend on additional data. `Linearizer::scope` ~ returns a new `Linearizer` to be used for a sub-scope of the current one. @@ -309,7 +309,7 @@ No instance data is propagated back to the outer scopes `Linearizer`. Neither have `Linearizer`s of sibling scopes access to each other's data. Yet, the `scope` method can be implemented to pass arbitrary state down to the scoped instance. The scope safe storage of the `Linearizer` implemented by NLS, as seen in [@lst:nls-analyisis-host-definition], stores the scope aware register and scope related data. -Additionally, it contains fields to allow the linearization of records and record destructuring, as well as metadata ([@sec:records, @sec:variable-usage-and-static-record-access and @sec:metadata]) +Additionally, it contains fields to allow the linearization of records and record destructuring, as well as metadata ([@sec:records, @sec:variable-usage-and-static-record-access and @sec:metadata]). 
```rust pub struct AnalysisHost { From 582b25183be3ee0caf86f191d34bfaa2ae53c14e Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sat, 5 Feb 2022 14:12:50 +0100 Subject: [PATCH 063/142] Section: LSP Server :: Diagnostics and Caching --- chapter/methodology.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 04baf8ce..1b3fb1de 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -861,7 +861,9 @@ impl Completed { [Section @sec:commands-and-notifications] introduced the concept of capabilities in the context of the language server protocol. This section describes how NSL uses the linearization described in [@sec:linearization] to implement a comprehensive set of features. -There are two kinds of capabilities, passive diagnostics and commands. +NLS implements the most commonly compared capabilities *Code completion*, *Hover* *Jump to def*, *Find references*, *Workspace symbols* and *Diagnostics*. + +### Diagnostics and Caching NLS instructs the LSP client to notify the server once the user opens or modifies a file. Each notification contains the complete source code of the file as well as its location. @@ -869,9 +871,10 @@ NLS subsequently parses and type-checks the file using Nickel's libraries. Since Nickel deals with error reporting already, NLS converts any error generated in these processes into [Diagnostic](https://microsoft.github.io/language-server-protocol/specifications/specification-current/#diagnostic) items and sends them to the client as server notifications. Nickel errors provide detailed information about location of the issue as well as possible details which NLS can include in the Diagnostic items. -As discussed in [@sec:linearization] and , the type-checking yields a `Completed` linearization which implements crucial methods to resolve elements. 
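Converting a Nickel error into an LSP Diagnostic item requires translating its source location into the protocol's format. LSP positions are zero-based line and character pairs, and by default the character offset counts UTF-16 code units. The following minimal sketch assumes that an error exposes its span as byte offsets into the file contents and that lines end in `\n` or `\r\n`. The `Position` and `Range` structs mirror the shape of the LSP types but are defined locally, and the function names are hypothetical.

```rust
/// Zero-based position in the shape used by LSP; `character` counts UTF-16
/// code units, the protocol's default position encoding.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Position {
    pub line: u32,
    pub character: u32,
}

/// Range between two positions, again mirroring LSP's shape.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Range {
    pub start: Position,
    pub end: Position,
}

/// Convert a byte offset into `source` to an LSP position.
/// Returns `None` for offsets past the end or inside a multi-byte character.
pub fn offset_to_position(source: &str, offset: usize) -> Option<Position> {
    if offset > source.len() || !source.is_char_boundary(offset) {
        return None;
    }
    let before = &source[..offset];
    // The line number is the count of newlines before the offset...
    let line = before.matches('\n').count() as u32;
    // ...and the column is measured from just after the last of them.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let character = before[line_start..].encode_utf16().count() as u32;
    Some(Position { line, character })
}

/// Convert a byte span (assumed to be what an error reports) to a range.
pub fn span_to_range(source: &str, start: usize, end: usize) -> Option<Range> {
    Some(Range {
        start: offset_to_position(source, start)?,
        end: offset_to_position(source, end)?,
    })
}
```

A production server would more likely compute a line index once per file instead of rescanning the prefix on every call. The linear scan keeps the sketch short.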
+As discussed in [@sec:linearization] and [@sec:resolving-elements] the type-checking yields a `Completed` linearization which implements crucial methods to resolve elements. +NLS will cache the linearization for each processed file. +This way it can provide its LSP functions while a file is being edited. -### Diagnostics and Caching ### Capabilities From e96a1a4f86f944e94b67d73461d97f1fae9ba299 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sat, 5 Feb 2022 14:13:43 +0100 Subject: [PATCH 064/142] Section: LSP Server :: Hover and references --- chapter/methodology.md | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 1b3fb1de..b32378e2 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -875,15 +875,30 @@ As discussed in [@sec:linearization] and [@sec:resolving-elements] the type-chec NLS will cache the linearization for each processed file. This way it can provide its LSP functions while a file is being edited. +### Commands -### Capabilities +Contrary to Diagnostics, which are part of a `Notification` based interaction with the client and thus entirely asynchronous, `Commands` are issued by the client which expects an explicit synchronous answer. +While servers may report long-running tasks and defer sending eventual results back, user experience urges quick responses. +NLS achieves the required low latency by leveraging the eagerly built linearization. +Consequently, the language server implements most `Commands` through a series of searches and lookups of items. #### Hover -#### Completion +When hovering an item or issuing the corresponding command in text based editors, the LSP client will send a request for element information containing the cursor's *location* in a given *file*. 
+Upon request, NLS loads the cached linearization and performs a lookup for a `LinearizationItem` associated with the location using the linearization interface presented in [@sec:resolving-by-position]. +If the linearization contains an appropriate item, NLS serializes the item's type and possible metadata into a response object which is resolves the RPC call. +Otherwise, NLS signals no item could be found. + +#### Jump to Definition and Show references -#### Jump to Definition +Similar to *hover* requests, usage graph related commands associate a location in the source with an action. +NLS first attempts to resolve an item for the requested position using the cached linearization. +Depending on the command the item must be either a `Usage` or `Declaration`/`RecordField`. +Given the item is of the correct kind, the language server looks up the referenced declaration or associated usages respectively. +The stored position of each item is encoded in the LSP defined format and sent to the client. +In short, usage graph queries perform two lookups to the linearization. +One for the requested element and another one to retrieve the linked item. -#### Show references +#### Completion #### Symbols From d9b46a0dd791b3504b240b9cca30cbbb9900e07a Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sun, 6 Feb 2022 13:37:05 +0100 Subject: [PATCH 065/142] Fix word --- chapter/methodology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index b32378e2..2fa3e85d 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -897,7 +897,7 @@ Depending on the command the item must be either a `Usage` or `Declaration`/`Rec Given the item is of the correct kind, the language server looks up the referenced declaration or associated usages respectively. The stored position of each item is encoded in the LSP defined format and sent to the client. 
In short, usage graph queries perform two lookups to the linearization. -One for the requested element and another one to retrieve the linked item. +One for the requested element and a second one to retrieve the linked item. #### Completion From 9522440c0be528a11dafa2d2083a3c2bd659363d Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sun, 6 Feb 2022 13:37:24 +0100 Subject: [PATCH 066/142] Section: LSP Server :: Completion and Document symbols --- chapter/methodology.md | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 2fa3e85d..0bad2ffc 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -901,4 +901,13 @@ One for the requested element and a second one to retrieve the linked item. #### Completion -#### Symbols +Item completion makes use of the scope identifiers attached to each item. +Since Nickel implements lexical scopes, all declarations made in parent scopes can be a reference. +If two declarations use the same identifier, Nickel applies variable shadowing to refer to the most recent declaration, i.e., the declaration with the deepest applicable scope. +NLS uses scope identifiers which represent scope depth as described in [@sec:scopes] to retrieve symbol names for a reference scope using the method described in [@sec:resolving-by-scope]. +The current scope taken as reference is derived from the item at cursor position. + +#### Document Symbols + +The Nickel Language Server interprets all items of kind `Declaration` as document symbol. +Accordingly, it filters the linearization by kind and serializes all declarations into an LSP response object. 
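The completion lookup described above can be sketched under a few assumptions. Each declaration carries its scope as a list of integers. A declaration is visible from the cursor's scope when its scope list is a prefix of the cursor's. Shadowing is modelled by keeping, for each name, the declaration from the deepest scope, with ties broken in favour of the later `id` (an assumption of this sketch). `Decl` and `completion_candidates` are hypothetical names, not part of NLS.

```rust
use std::collections::HashMap;

/// A declaration as seen by this sketch: id, bound name and scope path.
#[derive(Debug, Clone, PartialEq)]
pub struct Decl {
    pub id: usize,
    pub name: String,
    pub scope: Vec<usize>,
}

/// Declarations visible from `current`, one per name, sorted by name.
pub fn completion_candidates<'a>(decls: &'a [Decl], current: &[usize]) -> Vec<&'a Decl> {
    let mut visible: HashMap<&str, &'a Decl> = HashMap::new();
    // A declaration is in scope if its scope path is a prefix of the current one.
    for decl in decls.iter().filter(|d| current.starts_with(&d.scope)) {
        let entry = visible.entry(decl.name.as_str()).or_insert(decl);
        // Shadowing: prefer the deepest scope, then the later declaration.
        if (decl.scope.len(), decl.id) > (entry.scope.len(), entry.id) {
            *entry = decl;
        }
    }
    let mut out: Vec<&Decl> = visible.into_values().collect();
    out.sort_by(|a, b| a.name.cmp(&b.name));
    out
}
```

Sorting by name only makes the result deterministic. A client usually filters and ranks the candidates itself.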
From 72d52c038405c57bd26df7af15884f59a23ff0e0 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 7 Feb 2022 16:05:54 +0100 Subject: [PATCH 067/142] Apply suggestions from code review Co-authored-by: Yann Hamdaoui --- chapter/methodology.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 0bad2ffc..7e3653cb 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -3,8 +3,8 @@ This chapter contains a detailed guide through the various steps and components of the Nickel Language Server (NLS). Being written in the same language (Rust[@rust]) as the Nickel interpreter allows NLS to integrate existing components for language analysis. Complementary, NLS is tightly coupled to Nickel's syntax definition. -Based on that [@sec:linearization] will introduce the main datastructure underlying all higher level LSP interactions and how the AST described in [@sec:nickel-ast] is transformed into this form. -Finally, in [@sec:lsp-server] the implementation of current LSP features is discussed on the basis of the previously reviewed components. +[Section @sec:linearization] will introduce the main datastructure underlying all higher level LSP interactions and how the AST described in [@sec:nickel-ast] is transformed into this form. +Finally, the implementation of current LSP features is discussed in [@sec:lsp-server]. ## Illustrative example @@ -137,7 +137,7 @@ Consequently, NLS faces the challenge of satisfying multiple goals To accommodate these goals NLS comprises three different parts as shown in [@fig:nls-nickel-structure]. The `Linearizer` trait acts as an interface between Nickel and the language server. NLS implements such a `Linearizer` specialized to Nickel which registers nodes and builds a final linearization. -As Nickel's type checking implementation was adapted to pass AST nodes to the `Linearizer`. 
+Nickel's type checking implementation was adapted to pass AST nodes to the `Linearizer`. During normal operation the overhead induced by the `Linearizer` is minimized using a stub implementation of the trait. From d7d2a93b05c846d0f284e3cc09e29f6995b3b86b Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Tue, 8 Feb 2022 15:35:23 +0100 Subject: [PATCH 068/142] Rework linearization Intro --- chapter/methodology.md | 26 ++++++++++++++++++-------- 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 7e3653cb..193dc0e9 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -62,14 +62,24 @@ let image = "k8s.gcr.io/#{name_}" in ## Linearization The focus of the NLS as presented in this work is to implement a working language server with a comprehensive feature set. -Prioritizing a sound feature set, NLS takes an eager, non-incremental approach to code analysis, resolving all information at once for each code update (`didChange` and `didOpen` events), assuming that initial Nickel projects remain reasonably small. -The analysis result is subsequently stored in a linear data structure with efficient access to elements. -This data structure is referred to in the following as *linearization*. -The term arises from the fact that the linearization is a transformation of the syntax tree into a linear structure which is presented in more detail in [@sec:transfer-from-ast]. -The implementation distinguishes two separate states of the linearization. -During its construction, the linearization will be in a *building* state, and is eventually post-processed yielding a *completed* state. -The semantics of these states are defined in [@sec:states], while the post-processing is described separately in [@sec:post-processing]. -Finally, [@sec:resolving-elements] explains how the linearization is accessed. +To answer requests, NLS needs to store more information than what is originally present in a Nickel AST. 
+Apart from missing data, an AST is not optimized for quick random access of nodes based on their position, which is a crucial operation for a language server. +To that end NLS introduces an auxiliary data structure, the *linearization*, which is derived from the AST. +It represents the original data linearly, performs an enrichment of the AST nodes and provides greater decoupling of the LSP functions from the implemented language. +[Section @sec:transfer-from-ast] details the process of transferring the AST. +After NLS parsed a Nickel source files to an AST it starts to fill the linearization, which is in a *building* state during this phase. +For reasons detailed in [@sec:post-processing], the linearization needs to be post-processed, yielding a *completed* state. +The completed linearization acts as the basis to handle all supported LSP requests as explained in [@sec:lsp-server]. +[Section @sec:resolving-elements] explains how a completed linearization is accessed. + +Advanced LSP implementations sometimes employ so-called incremental parsing, which allows updating only the relevant parts of an AST (and, in case of NLS, the derived linearization) upon small changes in the source. +However, an incremental LSP is not trivial to implement. +For once, NLS would not be able to leverage existing components from the existing Nickel implementation (most notably, the parser). +Parts of the nickel runtime, such as the typechecker, would need to be adapted or even reimplemented to work in an incremental way too. +Considering the scope of this thesis, the presented approach performs a complete analysis on every update to the source file. +The typical size of Nickel projects is assumed to remain small for quite some time, giving reasonable performance in practice. +Incremental parsing, type-checking and analysis can still be implemented as a second step in the future. 
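A linear structure suits position-based requests such as hover. The following sketch sorts items by the start of their span and answers "which item is at this offset?" with a binary search followed by a short backward scan. It assumes half-open byte spans that are properly nested (any two spans are disjoint or one contains the other), as spans derived from a syntax tree typically are. The names `Item`, `sort_for_lookup` and `item_at` are hypothetical and do not describe the interface NLS exposes.

```rust
/// Minimal item for this sketch: an id and a half-open byte span.
#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub id: usize,
    pub start: usize, // inclusive byte offset
    pub end: usize,   // exclusive byte offset
}

/// Order items by ascending start; for equal starts the wider (outer) span first.
pub fn sort_for_lookup(items: &mut [Item]) {
    items.sort_by(|a, b| a.start.cmp(&b.start).then(b.end.cmp(&a.end)));
}

/// Innermost item whose span contains `pos`, given items sorted by `sort_for_lookup`.
pub fn item_at(items: &[Item], pos: usize) -> Option<&Item> {
    // Binary search: number of items starting at or before `pos`.
    let upper = items.partition_point(|it| it.start <= pos);
    // Walk back towards enclosing spans; with nested spans the first hit is the innermost.
    items[..upper].iter().rev().find(|it| pos < it.end)
}
```

The backward scan is linear in the worst case, for example with many disjoint siblings before the cursor. An interval tree would bound it, at the cost of a more complex structure.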
+ ### States From e0ed9e9c027e02864e11a8cb6bf7da7d53de8e45 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Tue, 8 Feb 2022 16:11:01 +0100 Subject: [PATCH 069/142] Address review suggestions in States subsection --- chapter/methodology.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 193dc0e9..2478951a 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -83,12 +83,15 @@ Incremental parsing, type-checking and analysis can still be implemented as a se ### States -At its core the linearization in either state is represented by an array of `LinearizationItem`s which are derived from AST nodes during the linearization process as well as state dependent auxiliary structures. +At its core the linearization in either state is represented by an array of `LinearizationItem`s which are derived from AST nodes during the linearization process. +However, the exact structure of that array is differs as an effect of the post-processing. -Closely related to nodes, `LinearizationItem`s maintain the position of their AST counterpart, as well as its type. +`LinearizationItem`s maintain the position of their AST counterpart, as well as its type. Unlike in the AST, *metadata* is directly associated with the element. Further deviating from the AST representation, the *type* of the node and its *kind* are tracked separately. -The latter is used to distinguish between declarations of variables, records, record fields and variable usages as well as a wildcard kind for any other kind of structure, such as terminals control flow elements. +The latter is used to represent a usage graph on top of the linear structure. +It distinguishes between declarations (`let` bindings, function parameters, records) and variable usages. +Any other kind of structure, for instance, primitive values (Strings, numbers, boolean, enumerations), is recorded as `Structure`. 
The aforementioned separation of linearization states got special attention. As the linearization process is integrated with the libraries underlying the Nickel interpreter, it had to be designed to cause minimal overhead during normal execution. From 646f57b61673e3ae41be5f6416e071684338b057 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Tue, 8 Feb 2022 16:42:54 +0100 Subject: [PATCH 070/142] Fix typo --- chapter/methodology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 2478951a..10569e36 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -3,7 +3,7 @@ This chapter contains a detailed guide through the various steps and components of the Nickel Language Server (NLS). Being written in the same language (Rust[@rust]) as the Nickel interpreter allows NLS to integrate existing components for language analysis. Complementary, NLS is tightly coupled to Nickel's syntax definition. -[Section @sec:linearization] will introduce the main datastructure underlying all higher level LSP interactions and how the AST described in [@sec:nickel-ast] is transformed into this form. +[Section @sec:linearization] will introduce the main data structure underlying all higher level LSP interactions and how the AST described in [@sec:nickel-ast] is transformed into this form. Finally, the implementation of current LSP features is discussed in [@sec:lsp-server]. 
## Illustrative example From a3d286a9cb29a4cabcc7256a6c9637c64e209438 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Tue, 8 Feb 2022 16:43:31 +0100 Subject: [PATCH 071/142] Rewrite state subsection intro --- chapter/methodology.md | 27 +++++++++++++++------------ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 10569e36..4ca74e67 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -80,6 +80,7 @@ Considering the scope of this thesis, the presented approach performs a complete The typical size of Nickel projects is assumed to remain small for quite some time, giving reasonable performance in practice. Incremental parsing, type-checking and analysis can still be implemented as a second step in the future. + ### States @@ -93,21 +94,23 @@ The latter is used to represent a usage graph on top of the linear structure. It distinguishes between declarations (`let` bindings, function parameters, records) and variable usages. Any other kind of structure, for instance, primitive values (Strings, numbers, boolean, enumerations), is recorded as `Structure`. -The aforementioned separation of linearization states got special attention. -As the linearization process is integrated with the libraries underlying the Nickel interpreter, it had to be designed to cause minimal overhead during normal execution. -Hence, the concrete implementation employs type-states[@typestate] to separate both states on a type level and defines generic interfaces that allow for context dependent implementations. +To separate the phases of the elaboration of the linearization in a type-safe, the implementation is based on type-states[@typestate]. +Type-states were chosen over an enumeration bases approach for the additional flexibility they provide to build a generic interface. 
+Thanks to the generic interface, the adaptions to Nickel to integrate NLS are expected to have almost no influence on the runtime performance of the language in an optimized build. -At its base the `Linearization` type is a transparent smart pointer[@deref-chapter;@smart-pointer-chapter] to the particular `LinearizationState` which holds state specific data. -On top of that NLS defines a `Building` and `Completed` state. +NLS implements separate type-states for the two phases of the linearization: `Building` and `Completed`. -The `Building` state represents a raw linearization. -In particular that is a list of `LinearizationItems` of unresolved type ordered as they are created through a depth-first iteration of the AST. -Note that new items are exclusively appended such that their `id` field is equal to the position at all time during this phase. -Additionally, the `Building` state records all items for each scope in a separate mapping. -Once fully built, a `Building` instance is post-processed yielding a `Completed` linearization. -While being defined similar to its origin, the structure is optimized for positional access, affecting the order of the `LinearizationItem`s and requiring an auxiliary mapping for efficient access to items by their `id`. -Moreover, types of items in the `Completed` linearization will be resolved. +building phase: + ~ A linearization in the `Building` state is a linearization under construction. + It is a list of `LinearizationItem`s of unresolved type, appended as they are created during a depth-first traversal of the AST. + ~ During this phase, the `id` affected to a new item is always equal to its index in the array. + ~ The Building state also records the definitions in scope of each item in a separate mapping. + +post-processing phase: + ~ Once fully built, a Building instance is post-processed to get a `Completed` linearization. 
+ ~ Although fundamentally still represented by an array, a completed linearization is optimized for search by positions (in the source file) thanks to sorting and the use of an auxiliary map from `id`s to the new index of items. + ~ Additionally, missing edges in the usage graph have been created and he types of items are fully resolved in a completed linearization. Type definitions of the `Linearization` as well as its type-states `Building` and `Completed` are listed in [@lst:nickel-definition-lineatization;@lst:nls-definition-building-type;@lst:nls-definition-completed-type]. Note that only the former is defined as part of the Nickel libraries, the latter are specific implementations for NLS. From 4f0b00b975b7a0701a089fd1376e3fb17e392767 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Tue, 8 Feb 2022 16:45:49 +0100 Subject: [PATCH 072/142] Apply suggestions from code review Co-authored-by: Yann Hamdaoui --- chapter/methodology.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 4ca74e67..288b9d2e 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -147,7 +147,7 @@ The NLS project aims to present a transferable architecture that can be adapted Consequently, NLS faces the challenge of satisfying multiple goals 1. To keep up with the frequent changes to the Nickel language and ensure compatibility at minimal cost, NLS needs to integrate critical functions of Nickel's runtime -2. Adaptions to Nickel to accommodate the language server should be minimal not to obstruct its development and maintain performance of the runtime. +2. Adaptions to Nickel to accommodate the language server should be minimal not obstruct its development and maintain performance of the runtime. To accommodate these goals NLS comprises three different parts as shown in [@fig:nls-nickel-structure]. 
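The building and post-processing phases described above map naturally onto type-states. In the following minimal sketch, a string stands in for a type, and a missing type is replaced by the placeholder `"unknown"` during post-processing. The real type resolution is not shown. Only a `Linearization<Building>` offers `push`, and only a `Linearization<Completed>` offers lookup by `id`, so using a linearization in the wrong phase is rejected at compile time. Apart from `Building`, `Completed` and `LinearizationState`, all names are hypothetical, and even those three are simplified relative to the listings referenced in the text.

```rust
use std::collections::HashMap;

/// Marker trait for the phases of a linearization.
pub trait LinearizationState {}

/// Item under construction: its type may still be unknown.
#[derive(Debug, Clone)]
pub struct BuildingItem {
    pub id: usize,
    pub start: usize,
    pub end: usize,
    pub ty: Option<String>,
}

/// Item after post-processing: the type is always present.
#[derive(Debug, Clone)]
pub struct CompletedItem {
    pub id: usize,
    pub start: usize,
    pub end: usize,
    pub ty: String,
}

pub struct Building {
    items: Vec<BuildingItem>,
}

pub struct Completed {
    pub items: Vec<CompletedItem>,
    id_to_index: HashMap<usize, usize>,
}

impl LinearizationState for Building {}
impl LinearizationState for Completed {}

pub struct Linearization<S: LinearizationState> {
    pub state: S,
}

impl Linearization<Building> {
    pub fn new() -> Self {
        Linearization { state: Building { items: Vec::new() } }
    }

    /// Append an item; while building, its id equals its index.
    pub fn push(&mut self, start: usize, end: usize, ty: Option<String>) -> usize {
        let id = self.state.items.len();
        self.state.items.push(BuildingItem { id, start, end, ty });
        id
    }

    /// Post-process: resolve types, sort by position and index the ids.
    pub fn complete(self) -> Linearization<Completed> {
        let mut items: Vec<CompletedItem> = self
            .state
            .items
            .into_iter()
            .map(|it| CompletedItem {
                id: it.id,
                start: it.start,
                end: it.end,
                // Placeholder for real type resolution (assumption of this sketch).
                ty: it.ty.unwrap_or_else(|| "unknown".to_string()),
            })
            .collect();
        // Outer spans come before inner ones that start at the same offset.
        items.sort_by(|a, b| a.start.cmp(&b.start).then(b.end.cmp(&a.end)));
        let id_to_index = items.iter().enumerate().map(|(index, it)| (it.id, index)).collect();
        Linearization { state: Completed { items, id_to_index } }
    }
}

impl Linearization<Completed> {
    /// Look up an item by id, regardless of where sorting moved it.
    pub fn get(&self, id: usize) -> Option<&CompletedItem> {
        self.state.id_to_index.get(&id).map(|&index| &self.state.items[index])
    }
}
```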
@@ -214,7 +214,7 @@ pub enum UsageState { ``` -The `TermKind` type is an enumeration of the discussed cases and defines the role of a `LinearizationItem` in the usage graph. +The `TermKind` type is an enumeration which defines the role of a `LinearizationItem` in the usage graph. Variable bindings ~ are linearized using the `Declaration` variant which holds the bound identifier as well as a list of `ID`s corresponding to its `Usage`s. @@ -241,7 +241,7 @@ Other nodes The Nickel language implements lexical scopes with name shadowing. 1. A name can only be referred to after it has been defined -2. A name can be redefined for a local area +2. A name can be redefined locally An AST inherently supports this logic. A variable reference always refers to the closest parent node defining the name and scopes are naturally separated using branching. @@ -304,7 +304,7 @@ pub trait Linearizer { ~ Its responsibility is to combine context information stored in the `Linearizer` and concrete information about a node to extend the `Linearization` by appropriate items. `Linearizer::retype_ident` - ~ is used to update the type information for a current identifier. + ~ is used to update the type information of an identifier. ~ The reason this method exists is that not all variable definitions have a corresponding AST node but may be part of another node. This is especially apparent with records where the field names part of the record node and as such are linearized with the record but have to be assigned there actual type separately. 
From ef3ffea771ce9667398a2038d0f78c4a57efd3d0 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Tue, 8 Feb 2022 17:05:41 +0100 Subject: [PATCH 073/142] Rewrite Usage graph intro --- chapter/methodology.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 288b9d2e..c19e8509 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -176,10 +176,11 @@ digraph { #### Usage Graph At the core the linearization is a simple *linear* structure. -Also, in the general case^[Except single primitive expressions] the linearization is reordered in the post-processing step. -This makes it impossible to encode relationships of nodes on a structural level. -Yet, Nickel's support for name binding of variables, functions and in recursive records implies great a necessity for node-to-node relationships to be represented in a representation that aims to work with these relationships. -On a higher level, tracking both definitions and usages of identifiers yields a directed graph. +Yet, it represents relationships of nodes on a structural level as a tree-like structure. +Taking into account variable usage information adds back-edges to the original AST, yielding a graph structure. +Both kinds of edges have to be encoded with the elements in the list. +Alas, items have to be referred to using `id`s since the index of items cannot be relied on(such as in e.g. a binary heap), because the array is reordered to optimize access by source position. + There are three main kids of vertices in such a graph. **Declarations** are nodes that introduce an identifier, and can be referred to by a set of nods. 
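The `id`-based encoding of the usage graph described above can be sketched as follows. Declarations keep the `id`s of their usages, and each usage stores the `id` of the declaration it resolved to, or none when the name is unbound. Jump to definition and find references then each become a single lookup. For brevity this sketch uses one flat name register: a later declaration shadows an earlier one, but leaving a scope is not modelled. `Kind`, `Graph`, `declare` and `use_name` are hypothetical names.

```rust
use std::collections::HashMap;

/// Role of an item in the usage graph (simplified for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub enum Kind {
    /// Declares `name`; `usages` holds the ids of the items referring to it.
    Declaration { name: String, usages: Vec<usize> },
    /// Refers to a declaration id, or `None` if the name is unbound.
    Usage(Option<usize>),
}

#[derive(Debug, Default)]
pub struct Graph {
    /// Items indexed by id (ids equal indices while building).
    pub items: Vec<Kind>,
    /// Latest declaration id for each name (flat register, no scope exit).
    latest: HashMap<String, usize>,
}

impl Graph {
    /// Register a declaration; it shadows earlier declarations of `name`.
    pub fn declare(&mut self, name: &str) -> usize {
        let id = self.items.len();
        self.items.push(Kind::Declaration { name: name.to_string(), usages: Vec::new() });
        self.latest.insert(name.to_string(), id);
        id
    }

    /// Register a usage, resolving it and adding the back-edge if possible.
    pub fn use_name(&mut self, name: &str) -> usize {
        let id = self.items.len();
        let target = self.latest.get(name).copied();
        if let Some(decl) = target {
            // Back-edge: the declaration records this usage.
            if let Kind::Declaration { usages, .. } = &mut self.items[decl] {
                usages.push(id);
            }
        }
        self.items.push(Kind::Usage(target));
        id
    }
}
```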
From 544a9ab098e0bbd8259da9f94488095fbcfe7267 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Tue, 8 Feb 2022 17:06:23 +0100 Subject: [PATCH 074/142] Address comments to term kind definitions --- chapter/methodology.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index c19e8509..5fcfaf95 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -221,10 +221,14 @@ Variable bindings ~ are linearized using the `Declaration` variant which holds the bound identifier as well as a list of `ID`s corresponding to its `Usage`s. Records - ~ remain similar to their AST representation. The `Record` variant simply maps field names to the linked `RecordField` + ~ remain similar to their AST representation. + The `Record` variant simply maps field names to the linked `RecordField` Record fields - ~ make for to most complicated kind. The `RecordField` kind augments the qualities of a `Declaration` representing an identifier, and tracking its `Usage`s, while also maintaining a link back to its parent `Record` as well as explicitly referencing the value represented. + ~ are represented as `RecordField` kinds and store: + - the same data as for identifiers (and, in particular, tracks its usages) + - a link to the parent `Record` + - a link to the value of the field Variable usages ~ are further specified. 
`Usage`s that can not be mapped to a declaration are tagged `Unbound` or otherwise `Resolved` to the complementary `Declaration` From 4ff87c513292902976be00080ee4599a04804e8b Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Tue, 8 Feb 2022 17:06:48 +0100 Subject: [PATCH 075/142] Try clarify retype_ident --- chapter/methodology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 5fcfaf95..df4cbea9 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -311,7 +311,7 @@ pub trait Linearizer { `Linearizer::retype_ident` ~ is used to update the type information of an identifier. ~ The reason this method exists is that not all variable definitions have a corresponding AST node but may be part of another node. - This is especially apparent with records where the field names part of the record node and as such are linearized with the record but have to be assigned there actual type separately. + This is the case with records; Field *names* are not linearized separately but as part of the record. Thus, their type is not known to the linearizer and has to be added explicitly. `Linearizer::complete` ~ implements the post-processing necessary to turn a final `Building` linearization into a `Completed` one. From b1b2de8a04a1f96e7660fa880a8a9e11fe4452a7 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Tue, 8 Feb 2022 17:24:48 +0100 Subject: [PATCH 076/142] flatten single child section `Variable Usage` --- chapter/methodology.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index df4cbea9..1a951e8f 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -573,8 +573,6 @@ First, NLS tries to find the identifier in its scoped aware name registry. If the registry does not contain the identifier, NLS will linearize the node as `Unbound`. 
In the case that the registry lookup succeeds, NLS retrieves the referenced `Declaration` or `RecordField`. The `Linearizer` will then add the `Resolved` `Usage` item to the linearization and update the declaration's list of usages. -###### Variable Usage and Static Record Access - Looking at the AST representation of record destructuring in [@fig:nickel-static-access] shows that accessing inner records involves chains of unary operations *ending* with a reference to a variable binding. Each operation encodes one identifier, i.e. field of a referenced record. However, to reference the corresponding declaration, the final usage has to be known. From 3de02d36fa9296b72ad1c3366a86e689dbe8b376 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Tue, 8 Feb 2022 17:25:03 +0100 Subject: [PATCH 077/142] Correctly format list --- chapter/methodology.md | 1 + 1 file changed, 1 insertion(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index 1a951e8f..3b52273a 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -226,6 +226,7 @@ Records Record fields ~ are represented as `RecordField` kinds and store: + - the same data as for identifiers (and, in particular, tracks its usages) - a link to the parent `Record` - a link to the value of the field From 465ef20cb1e6f914df201cd711db49fa61b3355f Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Tue, 8 Feb 2022 17:26:41 +0100 Subject: [PATCH 078/142] Remove redundant sentence --- chapter/methodology.md | 1 - 1 file changed, 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 3b52273a..f27a08e1 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -442,7 +442,6 @@ if true then "TRUE :)" else "false :(" In the most common case of general elements, the node is simply registered as a `LinearizationItem` of kind `Structure`. 
This applies for all simple expressions like those exemplified in [@lst:nickel-simple-expr] -Essentially, any of such nodes turns into a typed span as the remaining information tracked is the item's span and type checker provided type. ##### Declarations From 07a3b67c47ef86876e93cc739da0ccf5d1181584 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Tue, 8 Feb 2022 17:38:25 +0100 Subject: [PATCH 079/142] Reword AST transfer intro --- chapter/methodology.md | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index f27a08e1..a38aeaab 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -146,15 +146,21 @@ impl LinearizationState for Completed {} The NLS project aims to present a transferable architecture that can be adapted for future languages. Consequently, NLS faces the challenge of satisfying multiple goals -1. To keep up with the frequent changes to the Nickel language and ensure compatibility at minimal cost, NLS needs to integrate critical functions of Nickel's runtime -2. Adaptions to Nickel to accommodate the language server should be minimal not obstruct its development and maintain performance of the runtime. +1. To keep up with the frequent changes to the Nickel language and ensure compatibility at minimal cost, NLS needs to *integrate critical functions* of Nickel's runtime +2. Adaptions to Nickel to accommodate the language server should be minimal not obstruct its development and *maintain performance of the runtime*. To accommodate these goals NLS comprises three different parts as shown in [@fig:nls-nickel-structure]. -The `Linearizer` trait acts as an interface between Nickel and the language server. -NLS implements such a `Linearizer` specialized to Nickel which registers nodes and builds a final linearization. -Nickel's type checking implementation was adapted to pass AST nodes to the `Linearizer`. 
-During normal operation the overhead induced by the `Linearizer` is minimized using a stub implementation of the trait. + +The `Linearizer` trait + ~ acts as an interface between Nickel and the language server. + NLS implements a `Linearizer` specialized to Nickel which registers AST nodes and builds a final linearization. +Nickel's type checking implementation + ~ was adapted to pass AST nodes to the `Linearizer`. + Modifications to Nickel are minimal, comprising only few additional function calls and a slightly extended argument list. +A stub implementation + ~ of the `Linearizer` trait is used during normal operation. + Since most methods of this implementation are `no-op`s, the compiler should be able to optimize away all `Linearizer` calls in release builds. ```{.graphviz #fig:nls-nickel-structure caption="Interaction of Componenets"} From d6045e5deca6f8a6f547cfac699aaed617777807 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 9 Feb 2022 15:05:30 +0100 Subject: [PATCH 080/142] Make usage graph subsection more concise --- chapter/methodology.md | 29 +++++++++++++---------------- 1 file changed, 13 insertions(+), 16 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index a38aeaab..ace3b9f7 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -187,15 +187,12 @@ Taking into account variable usage information adds back-edges to the original A Both kinds of edges have to be encoded with the elements in the list. Alas, items have to be referred to using `id`s since the index of items cannot be relied on(such as in e.g. a binary heap), because the array is reordered to optimize access by source position. +There are two groups of vertices in such a graph. +**Declarations** are nodes that introduce an identifier, and can be referred to by a set of nodes. +Referral is represented by **Usage** nodes. -There are three main kids of vertices in such a graph. 
-**Declarations** are nodes that introduce an identifier, and can be referred to by a set of nods. -Referral is represented by **Usage** nodes which can either be bound to a declaration or unbound if no corresponding declaration is known. -In practice Nickel distinguishes simple variable bindings from name binding through record fields which are resolved during the post-precessing. -It also Integrates a **Record** and **RecordField** kinds to aid record destructuring. - -During the linearization process this graphical model is recreated on the linear representation of the source. -Hence, each `LinearizationItem` is associated with one of the aforementioned kinds, encoding its function in the usage graph. +During the linearization process this graphical model is embeded into the items of the linearization. +Hence, each `LinearizationItem` is associated with a kind representing the item's role in the graph (see: [@lst:nls-termkind-definition]). ```{.rust #lst:nls-termkind-definition caption="Definition of a linearization items TermKind"} pub enum TermKind { @@ -218,17 +215,14 @@ pub enum UsageState { Resolved(ID), Deferred { parent: ID, child: Ident }, } - ``` -The `TermKind` type is an enumeration which defines the role of a `LinearizationItem` in the usage graph. - -Variable bindings +Variable bindings and function arguments ~ are linearized using the `Declaration` variant which holds the bound identifier as well as a list of `ID`s corresponding to its `Usage`s. Records ~ remain similar to their AST representation. - The `Record` variant simply maps field names to the linked `RecordField` + The `Record` variant simply maps the record's field names to the linked `RecordField` Record fields ~ are represented as `RecordField` kinds and store: @@ -238,11 +232,14 @@ Record fields - a link to the value of the field Variable usages - ~ are further specified. 
`Usage`s that can not be mapped to a declaration are tagged `Unbound` or otherwise `Resolved` to the complementary `Declaration` - ~ Record destructuring may require a late resolution as discussed in [@sed:variable-usage-and-static-record-access]. + ~ can be in three different states. + + 1. `Usage`s that can not (yet) be mapped to a declaration are tagged `Unbound` + 2. A `Resolved` usage introduces a back-link to the complementary `Declaration` + 3. For record destructuring resolution of the name might need to be `Deferred` to the post-processing as discussed in [@sec:variable-usage-and-static-record-access]. Other nodes - ~ of the AST that do not fit in a usage graph, are linearized as `Structure`. + ~ of the AST that do not participate in the usage graph, are linearized as `Structure` - A wildcard variant with no associated data. From c6f06dad38fd269dba513ea74a74982696f659a5 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 9 Feb 2022 15:34:46 +0100 Subject: [PATCH 081/142] Reword declarations subsection --- chapter/methodology.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index ace3b9f7..caf9017c 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -450,13 +450,12 @@ This applies for all simple expressions like those exemplified in [@lst:nickel-s ##### Declarations In case of `let` bindings or function arguments name binding is equally simple. - -When the `Let` node is processed, the `Linearizer` generates `Declaration` items for each identifier contained. -As discussed in [@sec:let-bindings-and-functions] the `Let` node may contain a name binding as well as pattern matches. -The node's type supplied to the `Linearizer` accords to the value and is therefore applied to the name binding only. -Additionally, NLS updates its name register with the newly created `Declaration`s. 
+As discussed in [@sec:let-bindings-and-functions] the `let` node may contain both a name and pattern matches. +For either the linearizer generates `Declaration` items and updates its name register. +However, type information is available for name bindings only, meaning pattern matches remain untyped. The same process applies for argument names in function declarations. +Due to argument currying, NLS linearizes only a single argument/pattern at a time. ##### Records From 2a0958c1f9a593d6b4bd39f285f2dbf9c9d43bc0 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 9 Feb 2022 16:18:59 +0100 Subject: [PATCH 082/142] Include Value Sta --- chapter/methodology.md | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index caf9017c..974f104c 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -196,13 +196,13 @@ Hence, each `LinearizationItem` is associated with a kind representing the item' ```{.rust #lst:nls-termkind-definition caption="Definition of a linearization items TermKind"} pub enum TermKind { - Declaration(Ident, Vec<ID>), + Declaration(Ident, Vec<ID>, ValueState), Record(HashMap<Ident, ID>), RecordField { ident: Ident, record: ID, usages: Vec<ID>, - value: Option<ID>, + value: ValueState, }, Usage(UsageState), @@ -215,10 +215,19 @@ pub enum UsageState { Resolved(ID), Deferred { parent: ID, child: Ident }, } + +pub enum ValueState { + Unknown, + Known(ID), +} ``` Variable bindings and function arguments - ~ are linearized using the `Declaration` variant which holds the bound identifier as well as a list of `ID`s corresponding to its `Usage`s. + ~ are linearized using the `Declaration` variant which holds + + - the bound identifier + - a list of `ID`s corresponding to its `Usage`s. + - its assigned value Records ~ remain similar to their AST representation.
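To make the usage graph described in these patches more concrete, the following is a minimal sketch, not the NLS implementation, of how `Declaration` and `Usage` items could be appended during the building phase. It assumes a single flat scope, uses `String` in place of Nickel's `Ident` and omits records entirely; the `Building` struct and its `declare` and `use_var` methods are hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: NLS uses Nickel's own `Ident`; a plain String is used here.
type ID = usize;
type Ident = String;

#[derive(Debug, Clone, PartialEq)]
pub enum ValueState {
    Unknown,
    Known(ID),
}

#[derive(Debug, Clone, PartialEq)]
pub enum UsageState {
    Unbound,
    Resolved(ID),
    Deferred { parent: ID, child: Ident },
}

// Simplified TermKind: records and record fields are left out of this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum TermKind {
    Declaration(Ident, Vec<ID>, ValueState),
    Usage(UsageState),
    Structure,
}

/// Building-phase linearization: items are only ever appended,
/// so an item's ID is equal to its index in `items`.
#[derive(Default)]
pub struct Building {
    pub items: Vec<TermKind>,
    // Name registry; a real implementation would need one entry per scope.
    scope: HashMap<Ident, ID>,
}

impl Building {
    /// Append a declaration and register its name (shadowing by overwrite).
    pub fn declare(&mut self, name: &str) -> ID {
        let id = self.items.len();
        self.items.push(TermKind::Declaration(
            name.to_string(),
            Vec::new(),
            ValueState::Unknown,
        ));
        self.scope.insert(name.to_string(), id);
        id
    }

    /// Append a usage: `Resolved` if the name is registered, `Unbound` otherwise.
    pub fn use_var(&mut self, name: &str) -> ID {
        let id = self.items.len();
        match self.scope.get(name).copied() {
            Some(decl) => {
                self.items.push(TermKind::Usage(UsageState::Resolved(decl)));
                // Back-edge: record this usage in the declaration's usage list.
                if let TermKind::Declaration(_, usages, _) = &mut self.items[decl] {
                    usages.push(id);
                }
            }
            None => self.items.push(TermKind::Usage(UsageState::Unbound)),
        }
        id
    }
}
```

Because items are only appended while building, an item's `ID` coincides with its index, which is what allows `use_var` to write the back-edge into the declaration directly.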
From a5062ea797d8fb236207b26eba1fd38c157a1b37 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 9 Feb 2022 16:43:49 +0100 Subject: [PATCH 083/142] Fix dash --- chapter/methodology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 974f104c..59bcab71 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -248,7 +248,7 @@ Variable usages 3. For record destructuring resolution of the name might need to be `Deferred` to the post-processing as discussed in [@sec:variable-usage-and-static-record-access]. Other nodes - ~ of the AST that do not participate in the usage graph, are linearized as `Structure` - A wildcard variant with no associated data. + ~ of the AST that do not participate in the usage graph, are linearized as `Structure` -- A wildcard variant with no associated data. From 723f06f0e0c9523ca5d8a41d8af84464867d5ada Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 9 Feb 2022 16:44:12 +0100 Subject: [PATCH 084/142] Reword records subsection --- chapter/methodology.md | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 59bcab71..26903416 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -501,25 +501,25 @@ digraph G { } ``` -Linearizing records proves more difficult. -In [@sec:graph-representation] the AST representation of Records was discussed. -As shown by [@fig:nickel-record-ast], Nickel does not have AST nodes dedicated to record fields. +[Section @sec:graph-representation] introduced the AST representation of Records. +As suggested by [@fig:nickel-record-ast], Nickel does not have AST nodes dedicated to record fields. Instead, it associates field names with values as part of the `Record` node. -For the language server on the other hand the record field is as important as its value, since it serves as name declaration. 
-For that reason NLS distinguishes `Record` and `RecordField` as independent kinds of linearization items. - -NLS has to create a separate item for the field and the value. -That is to maintain similarity to the other binding types. -It provides a specific and logical span to reference and allows the value to be of another kind, such as a variable usage like shown in the example. -The language server is bound to process nodes individually. +Since the language server is bound to process nodes individually, in effect, it will only see the values. Therefore, it can not process record values at the same time as the outer record. -Yet, record values may reference other fields defined in the same record regardless of the order, as records are recursive by default. +For the language server it is important to associate field names with their value, as it serves as name declaration. +For that reason, NLS distinguishes `Record` and `RecordField` as independent kinds of linearization items where `RecordFields` act as a bridge between the record and the value named after the field. + +To maintain similarity to other binding types, NLS has to create a separate item for the field and the value. +This also ensures, that the value can be linearized independently. + +Record values may reference other fields defined in the same record regardless of the order, as records are recursive by default. Consequently, all fields have to be in scope and as such be linearized beforehand. -While, `RecordField` items are created while processing the record, they can not yet be connected to the value they represent, as the linearizer can not know the `id` of the latter. -This is because the subtree of each of the fields can be arbitrary large causing an unknown amount of items, and hence intermediate `id`s to be added to the Linearization. +When linearizing a record, NLS will generate `RecordField` items for each field. 
+However, it can not associate the field's value with the item yet (which is expressed using `ValueState::None`). +This is because the subtree of each field can be arbitrary large, as is the offset of the corresponding linearization items. -A summary of this can be seen for instance on the linearization of the previously discussed record in [@fig:nls-lin-records]. -Here, record fields are linearized first, pointing to some following location. +The visualization ([@fig:nls-lin-records]) of the record discussed in [@lst:nickel-record] gives an example for this. +Here, the first items linearized are record fields. Yet, as the `containers` field value is processed first, the `metadata` field value is offset by a number of fields unknown when the outer record node is processed. ```{.graphviz #fig:nls-lin-records caption="Linearization of a record"} From 6036dc27be498741bfb500a7d094762ece55c1ac Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 9 Feb 2022 20:33:05 +0100 Subject: [PATCH 085/142] Rewrite Variable access subsection --- chapter/methodology.md | 61 ++++++++++++++++++++++++++++-------------- 1 file changed, 41 insertions(+), 20 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 26903416..9aec2645 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -577,30 +577,36 @@ The complete process looks as follows: ##### Variable Reference -While name declaration can happen in several ways, the usage of a variable is always expressed as a `Var` node wrapping a referenced identifier. +The usage of a variable is always expressed as a `Var` node that holds an identifier. Registering a name usage is a multi-step process. -First, NLS tries to find the identifier in its scoped aware name registry. +First, NLS tries to find the identifier in its scope-aware name registry. If the registry does not contain the identifier, NLS will linearize the node as `Unbound`. 
-In the case that the registry lookup succeeds, NLS retrieves the referenced `Declaration` or `RecordField`. The `Linearizer` will then add the `Resolved` `Usage` item to the linearization and update the declaration's list of usages. +In the case that the registry lookup succeeds, NLS retrieves the referenced `Declaration` or `RecordField`. The linearizer will then add a usage item in the `Resolved` state to the linearization and update the declaration's list of usages. -Looking at the AST representation of record destructuring in [@fig:nickel-static-access] shows that accessing inner records involves chains of unary operations *ending* with a reference to a variable binding. -Each operation encodes one identifier, i.e. field of a referenced record. +##### Resolution of Record Fields + +The AST representation of record destructuring in [@fig:nickel-static-access] shows that accessing inner records involves chains of unary operations *ending* with a reference to a variable binding. +Each operation encodes one field of a referenced record. However, to reference the corresponding declaration, the final usage has to be known. -Therefore, instead of linearizing the intermediate elements directly, the `Linearizer` adds them to a shared stack until the grounding variable reference is reached. +Therefore, instead of linearizing the intermediate elements directly, the `Linearizer` adds them to a shared stack until the grounding variable reference is registered. + Whenever a variable usage is linearized, NLS checks the stack for latent destructors. -If destructors are present, NLS adds `Usage` items for each element on the stack. +If destructors are present, it adds `Usage` items for each element on the stack. +Yet, because records are recursive it is possible that fields reference other fields' values. -Note that record destructors can be used as values of record fields as well and thus refer to other fields of the same record. 
-As the `Linearizer` processes the field values sequentially, it is possible that a usage references parts of the record that have not yet been processed making it unavailable for NLS to fully resolve. -A visualization of this is provided in [@fig:nls-unavailable-rec-record-field] -For this reason the `Usages` added to the linearization are marked as `Deferred` and will be fully resolved during the post-processing phase as documented in [@sec:resolving-deferred-access]. -In [@fig:ncl-record-access] this is shown visually. -The `Var` AST node is linearized as a `Resolved` usage node which points to the existing `Declaration` node for the identifier. -Mind that this could be a `RecordField` too if referred to in a record. -NLS linearized the trailing access nodes as `Deferred` nodes. +Consider the following example [@lst:nickel-recursive-record], which is depicted in [@fig:nls-unavailable-rec-record-field] +```{.nickel #lst:nickel-recursive-record caption="Example of a recursive record"} +{ + y = { + yy = "foo", + yz = z, + }, + z = y.yy +} +``` ```{.graphviz #fig:nls-unavailable-rec-record-field caption="Example race condition in recursive records. 
The field `y.yz` cannot be not be referenced at this point as the `y` branch has yet to be linearized"} digraph G { @@ -618,25 +624,40 @@ digraph G { field_yz [label="Field|yz"] } - var_z [label = "Usage|y.yz"] + var_z [label = "Usage|y.yy" ] + var_yz [label = "Usage|z" ] hidden [shape=point, width=0, height = 0] /* Relationships */ record_x -> {field_y, field_z} - field_y -> record_y + field_y -> record_y [color=grey] field_z -> var_z record_y -> {field_yy, field_yz} [color=grey] - var_z -> field_yz [style=dashed, label="Not resolvable"] + field_yz -> var_yz [color=grey] + var_z -> field_yy [style=dashed, label="Not resolvable"] + var_yz -> field_z [constraint=false, style=dashed, color=grey] - var_z -> hidden [style=invis] {rank=same; field_y; field_z } {rank=same; field_yy; field_yz } - {rank=same; record_y; hidden;} + {rank=same; record_y; var_z;} } ``` +Here, a conflict is guaranteed. +As the `Linearizer` processes the field values sequentially in arbitrary order, it is unable to resolve both `y.yz` and `z`. + +Assuming the value for `z` is linearized first, the items corresponding the destructuring of `y` can not be resolved. +While the *field* `y` is known, its value is not (cf. [@sec:records]), from which follows that `yy` is inaccessible. +Yet, `y.yy` will be possible to resolve once the value of `y` is processed. +For this reason the `Usage` generated from the destructor `.yy` is marked as `Deferred` and will be fully resolved during the post-processing phase as documented in [@sec:resolving-deferred-access]. + +In fact, NLS linearized all destructor elements as `Deferred` and resolves the correct references later. +[Figure @fig:ncl-record-access] shows this more clearly for the expression `x.y.z`. +The `Declaration` for `x` is known, therefore its `Var` AST node is linearized as a `Resolved` usage. +Mind that in records `x` could as well be a `RecordField`. 
+ ```{.graphviz #fig:ncl-record-access caption="Depiction of generated usage nodes for record destructuring"} digraph G { node[shape="record", fontname = "Fira Code", fontsize = 9] From a6759ac6de26972af71abd7db56d65dc67c8957d Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sun, 13 Feb 2022 14:24:07 +0100 Subject: [PATCH 086/142] general typos and words --- chapter/methodology.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 9aec2645..240bee81 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -744,7 +744,7 @@ digraph G { ##### Metadata In [@sec:meta-information] was shown that on the syntax level, metadata "wraps" the annotated value. -Conversely, NLS encodes metadata in the `LinearizationItem` as metadata is intrinsically related to a value. +Conversely, NLS encodes metadata as part of the `LinearizationItem` as it is considered to be intrinsically related to a value. NLS therefore has to defer handling of the `MetaValue` node until the processing of the associated value in the succeeding call. Like record destructors, NLS temporarily stores this metadata in the `Linearizer`'s memory. @@ -815,7 +815,7 @@ The required metadata needs to be provided by the type checker. #### Resolving by position -As part of the post-processing step discussed in [@sec:post-processing], the `LinearizationItem`s in the `Completed` linearization are reorderd by their occurence of the corresponding AST node in the source file. +As part of the post-processing step discussed in [@sec:post-processing], the `LinearizationItem`s in the completed linearization are reorderd by their occurence of the corresponding AST node in the source file. To find items in this list three preconditions have to hold: 1. 
Each element has a corresponding span in the source From caee52a677eb13b490208ff290a1ef3c748437db Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 9 Mar 2022 10:09:29 +0100 Subject: [PATCH 087/142] Update Nickel Syntax Co-authored-by: Yann Hamdaoui --- chapter/methodology.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 240bee81..5a008a30 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -21,16 +21,16 @@ let Port | doc "A contract for a port number" = let Container = { image | Str, - ports | List #Port, + ports | List Port, } in let NobernetesConfig = { apiVersion | Str, metadata.name | Str, - replicas | #nums.PosNat + replicas | nums.PosNat | doc "The number of replicas" | default = 1, - containers | { _ : #Container }, + containers | { _ : Container }, } in @@ -45,7 +45,7 @@ let webContainer = fun image => { ports = [ 80, 443 ], } in -let image = "k8s.gcr.io/#{name_}" in +let image = "k8s.gcr.io/%{name_}" in { apiVersion = "1.1.0", @@ -54,7 +54,7 @@ let image = "k8s.gcr.io/#{name_}" in containers = { "main container" = webContainer image } -} | #NobernetesConfig +} | NobernetesConfig ``` @@ -449,7 +449,7 @@ null if true then "TRUE :)" else "false :(" // string iterpolation -"#{ "hello" } #{ "world" }!" +"%{ "hello" } %{ "world" }!" ``` In the most common case of general elements, the node is simply registered as a `LinearizationItem` of kind `Structure`. 
From 7c1d66d030e8b14ed6beaf206b5883d88888c576 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 9 Mar 2022 10:16:58 +0100 Subject: [PATCH 088/142] Apply suggestions from review - Typos - Clearer sentences Co-authored-by: Yann Hamdaoui --- chapter/methodology.md | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 5a008a30..9c3f7a73 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -66,7 +66,7 @@ To answer requests, NLS needs to store more information than what is originally Apart from missing data, an AST is not optimized for quick random access of nodes based on their position, which is a crucial operation for a language server. To that end NLS introduces an auxiliary data structure, the *linearization*, which is derived from the AST. It represents the original data linearly, performs an enrichment of the AST nodes and provides greater decoupling of the LSP functions from the implemented language. -[Section @sec:transfer-from-ast] details the process of transferring the AST. +[Section @sec:transfer-from-ast] details the process of transforming the AST to a linearization. After NLS parsed a Nickel source files to an AST it starts to fill the linearization, which is in a *building* state during this phase. For reasons detailed in [@sec:post-processing], the linearization needs to be post-processed, yielding a *completed* state. The completed linearization acts as the basis to handle all supported LSP requests as explained in [@sec:lsp-server]. @@ -85,7 +85,7 @@ Incremental parsing, type-checking and analysis can still be implemented as a se ### States At its core the linearization in either state is represented by an array of `LinearizationItem`s which are derived from AST nodes during the linearization process. -However, the exact structure of that array is differs as an effect of the post-processing. 
+However, the exact structure of that array during the different phases differs as an effect of the post-processing. `LinearizationItem`s maintain the position of their AST counterpart, as well as its type. Unlike in the AST, *metadata* is directly associated with the element. @@ -94,7 +94,7 @@ The latter is used to represent a usage graph on top of the linear structure. It distinguishes between declarations (`let` bindings, function parameters, records) and variable usages. Any other kind of structure, for instance, primitive values (Strings, numbers, boolean, enumerations), is recorded as `Structure`. -To separate the phases of the elaboration of the linearization in a type-safe, the implementation is based on type-states[@typestate]. +To separate the phases of the elaboration of the linearization in a type-safe way, the implementation is based on type-states[@typestate]. Type-states were chosen over an enumeration bases approach for the additional flexibility they provide to build a generic interface. Thanks to the generic interface, the adaptions to Nickel to integrate NLS are expected to have almost no influence on the runtime performance of the language in an optimized build. @@ -110,7 +110,7 @@ building phase: post-processing phase: ~ Once fully built, a Building instance is post-processed to get a `Completed` linearization. ~ Although fundamentally still represented by an array, a completed linearization is optimized for search by positions (in the source file) thanks to sorting and the use of an auxiliary map from `id`s to the new index of items. - ~ Additionally, missing edges in the usage graph have been created and he types of items are fully resolved in a completed linearization. + ~ Additionally, missing edges in the usage graph have been created and the types of items are fully resolved in a completed linearization. 
Type definitions of the `Linearization` as well as its type-states `Building` and `Completed` are listed in [@lst:nickel-definition-lineatization;@lst:nls-definition-building-type;@lst:nls-definition-completed-type]. Note that only the former is defined as part of the Nickel libraries, the latter are specific implementations for NLS. @@ -163,7 +163,7 @@ A stub implementation Since most methods of this implementation are `no-op`s, the compiler should be able to optimize away all `Linearizer` calls in release builds. -```{.graphviz #fig:nls-nickel-structure caption="Interaction of Componenets"} +```{.graphviz #fig:nls-nickel-structure caption="Interaction of Components"} digraph { nls [label="NLS"] nickel [label="Nickel"] @@ -185,7 +185,7 @@ At the core the linearization is a simple *linear* structure. Yet, it represents relationships of nodes on a structural level as a tree-like structure. Taking into account variable usage information adds back-edges to the original AST, yielding a graph structure. Both kinds of edges have to be encoded with the elements in the list. -Alas, items have to be referred to using `id`s since the index of items cannot be relied on(such as in e.g. a binary heap), because the array is reordered to optimize access by source position. +Alas, items have to be referred to using `id`s since the index of items cannot be relied on (such as in e.g. a binary heap), because the array is reordered to optimize access by source position. There are two groups of vertices in such a graph. **Declarations** are nodes that introduce an identifier, and can be referred to by a set of nodes. @@ -278,7 +278,7 @@ Additionally, to keep track of the variables in scope, and iteratively build a u #### Linearizer -The heart of the linearization the `Linearizer` trait as defined in [@lst:nls-linearizer-trait]. +The heart of the linearization is the `Linearizer` trait as defined in [@lst:nls-linearizer-trait]. The `Linearizer` lives in parallel to the `Linearization`. 
Its methods modify a shared reference to a `Building` `Linearization`. @@ -510,7 +510,7 @@ For the language server it is important to associate field names with their valu For that reason, NLS distinguishes `Record` and `RecordField` as independent kinds of linearization items where `RecordFields` act as a bridge between the record and the value named after the field. To maintain similarity to other binding types, NLS has to create a separate item for the field and the value. -This also ensures, that the value can be linearized independently. +This also ensures that the value can be linearized independently. Record values may reference other fields defined in the same record regardless of the order, as records are recursive by default. Consequently, all fields have to be in scope and as such be linearized beforehand. @@ -562,7 +562,7 @@ digraph G { } ``` -To provide the necessary references, NLS makes used of the *scope safe* memory of its `Linearizer` implementation. +To provide the necessary references, NLS makes use of the *scope safe* memory of its `Linearizer` implementation. This is possible, because each record value corresponds to its own scope. The complete process looks as follows: @@ -608,7 +608,7 @@ Consider the following example [@lst:nickel-recursive-record], which is depicted } ``` -```{.graphviz #fig:nls-unavailable-rec-record-field caption="Example race condition in recursive records. The field `y.yz` cannot be not be referenced at this point as the `y` branch has yet to be linearized"} +```{.graphviz #fig:nls-unavailable-rec-record-field caption="Example lock in recursive records. The field `y.yz` cannot be not be referenced at this point as the `y` branch has yet to be linearized"} digraph G { node [shape=record] spline=false @@ -774,13 +774,13 @@ Since the linearization is performed in a preorder traversal, processing already Yet, during the linearization the location might be unstable or unknown for different items. 
Record fields for instance are processed in an arbitrary order rather than the order they are defined. Moreover, for nested records and record short notations, symbolic `Record` items are created which cannot be mapped to a physical location and are thus placed at the range `[0..=0]` in the beginning of the file. -Maintaining constant insertion performance and item-referencing require that the linearization is exclusively appended. +Maintaining constant insertion performance and item-referencing requires that the linearization is exclusively appended. Each of these cases, break the physical linearity of the linearization. NLS thus defers reordering of items. The language server uses a stable sorting algorithm to sort items by their associated span's starting position. This way, nesting of items with the same start location is preserved. -Since several operations require efficient access to elements by `id`, which after the sorting does not correspond to the items index in the linearization, after sorting NLS creates an index mapping `id`s to list indices. +Since several operations require efficient access to elements by `id`, which after the sorting does not correspond to the items index in the linearization, after sorting NLS creates an index mapping `id`s to the new actual indices. #### Resolving deferred access @@ -875,7 +875,7 @@ impl Completed { #### Resolving by ID -During the building process item IDs are equal to their index in the underlying List which allows for efficient access by ID. +During the building process item IDs are equal to their index in the underlying array which allows for efficient access by ID. To allow similarly efficient access to nodes with using IDs a `Completed` linearization maintains a mapping of IDs to their corresponding index in the reordered array. A queried ID is first looked up in this mapping which yields an index from which the actual item is read. 
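The post-processing and lookup steps described in this patch (stable sorting by span start, an `id` to index map, and resolving items by position or by `ID`) can be sketched as follows. This is an illustrative simplification, not code from NLS: `Item`, `from_building`, `get_item` and `item_at` are hypothetical names, spans are assumed to be properly nested half-open byte ranges, and the symbolic `[0..=0]` record items are ignored.

```rust
use std::collections::HashMap;

type ID = usize;

/// Hypothetical simplified item: only its ID and its span `[start, end)`.
#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub id: ID,
    pub start: usize,
    pub end: usize,
}

pub struct Completed {
    pub items: Vec<Item>,
    id_to_index: HashMap<ID, usize>,
}

impl Completed {
    /// Post-processing: sort by span start, then index the new positions by ID.
    pub fn from_building(mut items: Vec<Item>) -> Self {
        // `sort_by_key` is stable: items sharing a start keep their preorder,
        // so an enclosing item stays in front of a nested one.
        items.sort_by_key(|item| item.start);
        let id_to_index: HashMap<ID, usize> = items
            .iter()
            .enumerate()
            .map(|(index, item)| (item.id, index))
            .collect();
        Completed { items, id_to_index }
    }

    /// Resolving by ID: look up the new index, then read the item.
    pub fn get_item(&self, id: ID) -> Option<&Item> {
        self.id_to_index.get(&id).map(|&index| &self.items[index])
    }

    /// Resolving by position: the innermost item whose span contains `pos`.
    pub fn item_at(&self, pos: usize) -> Option<&Item> {
        // Binary search for the first item that starts after `pos`.
        let upper = self.items.partition_point(|item| item.start <= pos);
        // Scan backwards; with nested spans the first hit is the innermost one.
        self.items[..upper]
            .iter()
            .rev()
            .find(|item| item.start <= pos && pos < item.end)
    }
}
```

With record fields linearized out of source order, for example a field starting at offset 20 being processed before one starting at offset 5, the sort restores source order while `get_item` still finds every item by its original `ID`.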
From 66ec962c715e9f9b0dcf3f870926db5850e9229d Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 9 Mar 2022 11:41:38 +0100 Subject: [PATCH 089/142] Recall "record access" Co-authored-by: Yann Hamdaoui --- chapter/methodology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 9c3f7a73..a3608126 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -591,7 +591,7 @@ Each operation encodes one field of a referenced record. However, to reference the corresponding declaration, the final usage has to be known. Therefore, instead of linearizing the intermediate elements directly, the `Linearizer` adds them to a shared stack until the grounding variable reference is registered. -Whenever a variable usage is linearized, NLS checks the stack for latent destructors. +Whenever a variable usage is linearized, NLS checks the stack for latent destructors (record accesses). If destructors are present, it adds `Usage` items for each element on the stack. Yet, because records are recursive it is possible that fields reference other fields' values. 
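Resolving `Deferred` record accesses during post-processing can be illustrated with the recursive record `{ y = { yy = "foo" }, z = y.yy }`. The sketch below assumes that the root of an access chain was linearized as a `Resolved` usage before its `Deferred` children, that items are still indexed by `ID`, and that a field's value is directly a `Record`. `resolve_deferred` and `lookup_field` are hypothetical helpers, and values that are themselves usages, which would require following further links, are not handled.

```rust
use std::collections::HashMap;

type ID = usize;
type Ident = String;

#[derive(Debug, Clone, PartialEq)]
pub enum ValueState {
    Unknown,
    Known(ID),
}

#[derive(Debug, Clone, PartialEq)]
pub enum UsageState {
    Unbound,
    Resolved(ID),
    Deferred { parent: ID, child: Ident },
}

#[derive(Debug, Clone, PartialEq)]
pub enum TermKind {
    Declaration(Ident, Vec<ID>, ValueState),
    Record(HashMap<Ident, ID>),
    RecordField { ident: Ident, record: ID, usages: Vec<ID>, value: ValueState },
    Usage(UsageState),
    Structure,
}

/// Resolve every `Deferred` usage in ID order, so that the parent link of a
/// chain such as `x.y.z` is already resolved when its child is processed.
pub fn resolve_deferred(items: &mut [TermKind]) {
    for id in 0..items.len() {
        let (parent, child) = match &items[id] {
            TermKind::Usage(UsageState::Deferred { parent, child }) => (*parent, child.clone()),
            _ => continue,
        };
        let target = lookup_field(items, parent, &child);
        items[id] = TermKind::Usage(match target {
            Some(field) => UsageState::Resolved(field),
            None => UsageState::Unbound,
        });
        // Back-edge from the resolved field to this usage.
        if let Some(field) = target {
            if let TermKind::RecordField { usages, .. } = &mut items[field] {
                usages.push(id);
            }
        }
    }
}

/// Follow: parent usage -> declaration or field -> its value -> record -> `child`.
fn lookup_field(items: &[TermKind], parent: ID, child: &str) -> Option<ID> {
    let decl = match &items[parent] {
        TermKind::Usage(UsageState::Resolved(decl)) => *decl,
        _ => return None,
    };
    let value = match &items[decl] {
        TermKind::Declaration(_, _, ValueState::Known(v)) => *v,
        TermKind::RecordField { value: ValueState::Known(v), .. } => *v,
        _ => return None,
    };
    match &items[value] {
        TermKind::Record(fields) => fields.get(child).copied(),
        _ => None,
    }
}
```

Deferring the lookup in this way means it does not matter whether the value of `z` or that of `y` was linearized first: by the time post-processing runs, the `ValueState` of every field has become `Known`.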
From b681e0716994672ae4f4c1500c88b17e16e59d66 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Fri, 18 Feb 2022 14:41:03 +0100 Subject: [PATCH 090/142] Attempt to make library separation figure more clear --- chapter/methodology.md | 42 ++++++++++++++++++++++++++++++++---------- 1 file changed, 32 insertions(+), 10 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index a3608126..617631be 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -165,16 +165,38 @@ A stub implementation ```{.graphviz #fig:nls-nickel-structure caption="Interaction of Components"} digraph { - nls [label="NLS"] - nickel [label="Nickel"] - als [label="Linearizer", shape=box] - stub [label="Stub interface"] - - nls -> nickel [label="uses"] - nls -> als [label="implements"] - stub -> als [label="implements"] - nickel -> als [label="uses"] - nickel -> stub [label="uses"] + splines="ortho" + node [shape=record] + + + + { + node[style=dashed] + nls [label="NLS"] + host [label=Analysis] + } + + { + node[style=dotted] + nickel [label="Nickel"] + stub [label="Stub interface"] + } + + + als [label="Linearizer | \"] + +// {rank=same; host; stub; } + + + + host -> nickel [label="imports", constraint = false] + + stub -> als [label="implements | T = ()"] + nickel -> als [label="calls"] + nickel -> stub [label="defines", style=dashed, color=grey] + nls -> host [label="defines", style=dashed, color=grey] + host -> als [label="implements | T = Nickel"] + } ``` From 249720e90a71913b00db95aa6be99c138b80963b Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 9 Mar 2022 12:02:37 +0100 Subject: [PATCH 091/142] Address some review comments --- chapter/methodology.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 617631be..4be69019 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -1,9 +1,9 @@ # Design implementation of NLS -This chapter contains a 
detailed guide through the various steps and components of the Nickel Language Server (NLS). +This chapter guides through the various steps and components of the Nickel Language Server (NLS). Being written in the same language (Rust[@rust]) as the Nickel interpreter allows NLS to integrate existing components for language analysis. -Complementary, NLS is tightly coupled to Nickel's syntax definition. -[Section @sec:linearization] will introduce the main data structure underlying all higher level LSP interactions and how the AST described in [@sec:nickel-ast] is transformed into this form. +Aiming for an abstract interface, NLS defines its own data structure underpinning all higher level LSP interactions. +[Section @sec:linearization] will introduce this `linearization` data structure and explain how NLS bridges the gap between the explicitly handled Nickel AST towards the abstract linearization. Finally, the implementation of current LSP features is discussed in [@sec:lsp-server]. ## Illustrative example @@ -62,7 +62,7 @@ let image = "k8s.gcr.io/%{name_}" in ## Linearization The focus of the NLS as presented in this work is to implement a working language server with a comprehensive feature set. -To answer requests, NLS needs to store more information than what is originally present in a Nickel AST. +To answer requests, NLS needs to store more information than what is originally present in a Nickel AST such as information about references and types. Apart from missing data, an AST is not optimized for quick random access of nodes based on their position, which is a crucial operation for a language server. To that end NLS introduces an auxiliary data structure, the *linearization*, which is derived from the AST. It represents the original data linearly, performs an enrichment of the AST nodes and provides greater decoupling of the LSP functions from the implemented language. 
@@ -70,32 +70,32 @@ It represents the original data linearly, performs an enrichment of the AST node After NLS parsed a Nickel source files to an AST it starts to fill the linearization, which is in a *building* state during this phase. For reasons detailed in [@sec:post-processing], the linearization needs to be post-processed, yielding a *completed* state. The completed linearization acts as the basis to handle all supported LSP requests as explained in [@sec:lsp-server]. -[Section @sec:resolving-elements] explains how a completed linearization is accessed. +[Section @sec:resolving-elements] explains how a completed linearization is accessed efficiently. Advanced LSP implementations sometimes employ so-called incremental parsing, which allows updating only the relevant parts of an AST (and, in case of NLS, the derived linearization) upon small changes in the source. However, an incremental LSP is not trivial to implement. For once, NLS would not be able to leverage existing components from the existing Nickel implementation (most notably, the parser). -Parts of the nickel runtime, such as the typechecker, would need to be adapted or even reimplemented to work in an incremental way too. +Parts of the nickel runtime, such as the typechecker, would need to be adapted or even reimplemented to work incrementally too. Considering the scope of this thesis, the presented approach performs a complete analysis on every update to the source file. The typical size of Nickel projects is assumed to remain small for quite some time, giving reasonable performance in practice. -Incremental parsing, type-checking and analysis can still be implemented as a second step in the future. +Incremental parsing, type-checking and analysis can still be implemented as a second step in the future after gathering more usage data once nickel and the NLS enjoy greater adoption. 
### States At its core the linearization in either state is represented by an array of `LinearizationItem`s which are derived from AST nodes during the linearization process. -However, the exact structure of that array during the different phases differs as an effect of the post-processing. +However, the exact structure of that array differs as an effect of the post-processing. -`LinearizationItem`s maintain the position of their AST counterpart, as well as its type. -Unlike in the AST, *metadata* is directly associated with the element. +`LinearizationItem`s maintain the position of their AST counterpart as well as its type. +Unlike in the AST ([sec:meta-information]), *metadata* is directly associated with the element. Further deviating from the AST representation, the *type* of the node and its *kind* are tracked separately. The latter is used to represent a usage graph on top of the linear structure. It distinguishes between declarations (`let` bindings, function parameters, records) and variable usages. Any other kind of structure, for instance, primitive values (Strings, numbers, boolean, enumerations), is recorded as `Structure`. To separate the phases of the elaboration of the linearization in a type-safe way, the implementation is based on type-states[@typestate]. -Type-states were chosen over an enumeration bases approach for the additional flexibility they provide to build a generic interface. +Type-states were chosen over an enumeration based approach for the additional flexibility they provide to build a generic interface. Thanks to the generic interface, the adaptions to Nickel to integrate NLS are expected to have almost no influence on the runtime performance of the language in an optimized build. NLS implements separate type-states for the two phases of the linearization: `Building` and `Completed`. 
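The type-state idea can be illustrated with a minimal sketch. It assumes zero-sized marker types and `PhantomData`, and every name in it (`Linearization`, `Building`, `Completed`, `push`, `complete`, `items`) is hypothetical rather than NLS's actual definition. Post-processing is reduced to a stable sort by start position.

```rust
use std::marker::PhantomData;

/// Hypothetical, simplified item: where it starts and what it represents.
pub struct Item {
    pub span_start: usize,
    pub label: String,
}

/// Marker type for the phase in which items are appended.
pub struct Building;
/// Marker type for the post-processed, read-only phase.
pub struct Completed;

/// Linearization parameterized by its phase; the marker has no runtime data.
pub struct Linearization<State> {
    items: Vec<Item>,
    _state: PhantomData<State>,
}

impl Linearization<Building> {
    pub fn new() -> Self {
        Linearization { items: Vec::new(), _state: PhantomData }
    }

    /// Only a building linearization can be appended to.
    pub fn push(&mut self, span_start: usize, label: &str) {
        self.items.push(Item { span_start, label: label.to_string() });
    }

    /// Consumes the building state; the only way to obtain `Completed`.
    pub fn complete(mut self) -> Linearization<Completed> {
        self.items.sort_by_key(|item| item.span_start);
        Linearization { items: self.items, _state: PhantomData }
    }
}

impl Linearization<Completed> {
    /// Only a completed linearization can be queried.
    pub fn items(&self) -> &[Item] {
        &self.items
    }
}
```

With this split, calling `push` on a `Linearization<Completed>` or `items` on a `Linearization<Building>` is rejected at compile time. An enumeration-based design would have to check that condition at runtime instead.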
From 698d777c5a26e08fdf5e7b09ddf9087fe40866a5 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 9 Mar 2022 12:05:38 +0100 Subject: [PATCH 092/142] Make post-processing paragraph about reordering clearer --- chapter/methodology.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 4be69019..4cd67a23 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -792,12 +792,12 @@ After the post-processing the resulting linearization #### Sorting -Since the linearization is performed in a preorder traversal, processing already happens in the order elements are defined physically. +Since the linearization is performed in a preorder traversal, processing already happens in the order elements are defined in the file. Yet, during the linearization the location might be unstable or unknown for different items. Record fields for instance are processed in an arbitrary order rather than the order they are defined. -Moreover, for nested records and record short notations, symbolic `Record` items are created which cannot be mapped to a physical location and are thus placed at the range `[0..=0]` in the beginning of the file. +Moreover, for nested records and record short notations, symbolic `Record` items are created which cannot be mapped to the original source and are thus placed at the range `[0..=0]` in the beginning of the file. Maintaining constant insertion performance and item-referencing requires that the linearization is exclusively appended. -Each of these cases, break the physical linearity of the linearization. +Given the examples above, this breaks the original order of the items with respect to their assigned position. NLS thus defers reordering of items. The language server uses a stable sorting algorithm to sort items by their associated span's starting position. 
From 39056721fd34ae2006b9716f365d27af501af696 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 9 Mar 2022 12:09:39 +0100 Subject: [PATCH 093/142] Apply suggested rewrite about type resolution --- chapter/methodology.md | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 4cd67a23..10e6be1e 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -827,10 +827,15 @@ If no field with that name is present or the parent points to a `Structure` or ` #### Resolving types -As a necessity for type checking, Nickel generates type variables for any node of the AST which it hands down to the `Linearizer`. +Nickel features type inference in order to relieve the programmer of the burden of writing a lot of redundant type annotations. +In a typed block, the typechecker is able to infer the types of all values, even when they are not explicitly annotated by the user. +To do so, the typechecker generates constraints derived from inspecting the AST and solves them along the way. +As a consequence, when a node is first encountered by NLS, its type is not necessarily known yet. +In that case, the typechecker associates the new node with a so-called unification variable, a placeholder for a type that is resolved later. +This unification variable is handed down to the `Linearizer`. + +Similar to runtime processing, NLS needs to resolve the final types separately. After typechecking, NLS is eventually able to query the resolved type corresponding to an AST node using the associated unification variable. -In order to provide meaningful information, the Language Server needs to derive concrete types from these variables. -The required metadata needs to be provided by the type checker.
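The approach of recording a placeholder during linearization and resolving it after typechecking can be sketched with a toy unification table. All names here (`Type`, `UnifTable`, `fresh`, `bind`, `resolve`) are hypothetical and do not reflect the actual API of Nickel's typechecker. The sketch assumes that an occurs check in the typechecker has ruled out cyclic bindings; otherwise `resolve` would not terminate.

```rust
/// Hypothetical, tiny type language.
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Num,
    Str,
    Arrow(Box<Type>, Box<Type>),
    /// Unification variable: a placeholder for a not-yet-known type.
    Var(usize),
}

/// Hypothetical table mapping unification variables to their bindings.
#[derive(Default)]
pub struct UnifTable {
    bindings: Vec<Option<Type>>,
}

impl UnifTable {
    /// Creates a fresh, unbound unification variable.
    pub fn fresh(&mut self) -> Type {
        self.bindings.push(None);
        Type::Var(self.bindings.len() - 1)
    }

    /// Records the solution the typechecker found for `var`.
    pub fn bind(&mut self, var: usize, ty: Type) {
        self.bindings[var] = Some(ty);
    }

    /// Replaces bound variables (recursively) with their types; unbound
    /// variables are returned unchanged.
    pub fn resolve(&self, ty: &Type) -> Type {
        match ty {
            Type::Var(var) => match &self.bindings[*var] {
                Some(bound) => self.resolve(bound),
                None => ty.clone(),
            },
            Type::Arrow(domain, codomain) => Type::Arrow(
                Box::new(self.resolve(domain)),
                Box::new(self.resolve(codomain)),
            ),
            Type::Num | Type::Str => ty.clone(),
        }
    }
}
```

A linearization item would store only the `Type::Var` it received. Its concrete type is then obtained by calling `resolve` once typechecking has finished.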
### Resolving Elements From ddc697484253ceb901bc98db11dc6bbe72ebb7d9 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 9 Mar 2022 12:13:10 +0100 Subject: [PATCH 094/142] call pattern matches "patterns" Co-authored-by: Yann Hamdaoui --- chapter/methodology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 10e6be1e..71751820 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -481,7 +481,7 @@ This applies for all simple expressions like those exemplified in [@lst:nickel-s ##### Declarations In case of `let` bindings or function arguments name binding is equally simple. -As discussed in [@sec:let-bindings-and-functions] the `let` node may contain both a name and pattern matches. +As discussed in [@sec:let-bindings-and-functions] the `let` node may contain both a name and patterns. For either the linearizer generates `Declaration` items and updates its name register. However, type information is available for name bindings only, meaning pattern matches remain untyped. From 9118685cb1cc669682dbaa778f36187375b920fd Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 9 Mar 2022 12:14:04 +0100 Subject: [PATCH 095/142] Add wikipedia entry of currying as footnote --- chapter/methodology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 71751820..ea9b4c96 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -486,7 +486,7 @@ For either the linearizer generates `Declaration` items and updates its name reg However, type information is available for name bindings only, meaning pattern matches remain untyped. The same process applies for argument names in function declarations. -Due to argument currying, NLS linearizes only a single argument/pattern at a time. +Due to argument currying[^https://en.wikipedia.org/wiki/Currying], NLS linearizes only a single argument/pattern at a time. 
##### Records From c0910ad783e795a7ae1a7776467f35f3dd4c175b Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 9 Mar 2022 16:51:26 +0100 Subject: [PATCH 096/142] Typos and commas --- chapter/methodology.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index ea9b4c96..da53a418 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -90,7 +90,7 @@ However, the exact structure of that array differs as an effect of the post-proc `LinearizationItem`s maintain the position of their AST counterpart as well as its type. Unlike in the AST ([sec:meta-information]), *metadata* is directly associated with the element. Further deviating from the AST representation, the *type* of the node and its *kind* are tracked separately. -The latter is used to represent a usage graph on top of the linear structure. +The latter is used to represent a usage graph on top of the linear structure. It distinguishes between declarations (`let` bindings, function parameters, records) and variable usages. Any other kind of structure, for instance, primitive values (Strings, numbers, boolean, enumerations), is recorded as `Structure`. @@ -207,13 +207,13 @@ At the core the linearization is a simple *linear* structure. Yet, it represents relationships of nodes on a structural level as a tree-like structure. Taking into account variable usage information adds back-edges to the original AST, yielding a graph structure. Both kinds of edges have to be encoded with the elements in the list. -Alas, items have to be referred to using `id`s since the index of items cannot be relied on (such as in e.g. a binary heap), because the array is reordered to optimize access by source position. +Alas, items have to be referred to using `id`s since the index of items cannot be relied on (such as in e.g., a binary heap), because the array is reordered to optimize access by source position. 
There are two groups of vertices in such a graph. **Declarations** are nodes that introduce an identifier, and can be referred to by a set of nodes. Referral is represented by **Usage** nodes. -During the linearization process this graphical model is embeded into the items of the linearization. +During the linearization process this graphical model is embedded into the items of the linearization. Hence, each `LinearizationItem` is associated with a kind representing the item's role in the graph (see: [@lst:nls-termkind-definition]). ```{.rust #lst:nls-termkind-definition caption="Definition of a linearization items TermKind"} @@ -246,7 +246,7 @@ pub enum ValueState { Variable bindings and function arguments ~ are linearized using the `Declaration` variant which holds - + - the bound identifier - a list of `ID`s corresponding to its `Usage`s. - its assigned value @@ -285,7 +285,7 @@ The Nickel language implements lexical scopes with name shadowing. An AST inherently supports this logic. A variable reference always refers to the closest parent node defining the name and scopes are naturally separated using branching. -Each branch of a node represents a sub-scope of its parent, i.e. new declarations made in one branch are not visible in the other. +Each branch of a node represents a sub-scope of its parent, i.e., new declarations made in one branch are not visible in the other. When eliminating the tree structure, scopes have to be maintained in order to provide auto-completion of identifiers and list symbol names based on their scope as context. Since the bare linear data structure cannot be used to deduce a scope, related metadata has to be tracked separately. 
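The declaration/usage graph described in this patch can be illustrated with a minimal sketch in which items refer to each other by ID rather than by array position. The `TermKind` variants below only loosely mirror the listing in this patch, and `Graph`, `declare` and `use_name` are hypothetical helpers. Scopes are deliberately ignored, and a single name-to-ID map models shadowing by always pointing at the latest declaration.

```rust
use std::collections::HashMap;

/// Hypothetical item kinds, loosely modelled on the `TermKind` listing.
#[derive(Debug, PartialEq)]
pub enum TermKind {
    /// Introduces `name`; `usages` are back-edges (IDs of referring items).
    Declaration { name: String, usages: Vec<usize> },
    /// Refers to a declaration by ID, or `None` if the name is unbound.
    Usage { declaration: Option<usize> },
    /// Any other node, e.g. a literal.
    Structure,
}

#[derive(Default)]
pub struct Graph {
    /// Items indexed by ID (here the ID is simply the insertion index).
    pub items: Vec<TermKind>,
    /// Latest declaration ID per name; newer declarations shadow older ones.
    latest: HashMap<String, usize>,
}

impl Graph {
    pub fn declare(&mut self, name: &str) -> usize {
        let id = self.items.len();
        self.items.push(TermKind::Declaration { name: name.to_string(), usages: Vec::new() });
        self.latest.insert(name.to_string(), id);
        id
    }

    pub fn use_name(&mut self, name: &str) -> usize {
        let id = self.items.len();
        let declaration = self.latest.get(name).copied();
        if let Some(decl_id) = declaration {
            // Add the back-edge from the declaration to this usage.
            if let TermKind::Declaration { usages, .. } = &mut self.items[decl_id] {
                usages.push(id);
            }
        }
        self.items.push(TermKind::Usage { declaration });
        id
    }
}
```

Because edges are stored as IDs, they remain valid after the array is reordered, provided an ID-to-index map is rebuilt as described for the completed linearization.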
From 0dd4775a9bf4673b1292210777490b0a45be6509 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 9 Mar 2022 16:51:48 +0100 Subject: [PATCH 097/142] More detailed scope description --- chapter/methodology.md | 42 +++++++++++++++++++++++++++++++++++++----- 1 file changed, 37 insertions(+), 5 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index da53a418..beb3c953 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -292,8 +292,30 @@ Since the bare linear data structure cannot be used to deduce a scope, related m The language server maintains a register for identifiers defined in every scope. This register allows NLS to resolve possible completion targets as detailed in [@sec:resolving-by-scope]. -For simplicity, NLS represents scopes by a prefix list of integers. -Whenever a new lexical scope is entered, the list of the outer scope is extended by a unique identifier. +The globally tracked scope metadata maps `ScopeId`s to a list of identifiers defined in the scope. +An instance of the linearizer is valid for a single scope and hence corresponds to a unique `ScopeId`. +Every item generated by the same linearizer is associated with the `ScopeId` of the instance. +A scope branch during the traversal of the AST is signalled by calling the `Linearizer::scope()` method. +This method creates a new linearizer instance with a new `ScopeId`. +A `ScopeId`, in turn, is a "scope path": a list of path elements whose prefix equals the parent scope's `ScopeId`. +[Listing @lst:nickel-scope-example] explicitly marks the scopes of a simple Nickel expression.
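The scope paths described above can be sketched as follows, mirroring the scopes of [@lst:nickel-scope-example]. This is an illustration based on assumptions, not NLS's implementation. `ScopeId` is modelled as a `Vec<usize>`. The hypothetical `ScopedLinearizer` stands in for a linearizer instance, and its `scope()` method mirrors the role of `Linearizer::scope()`. The hypothetical `visible_ids` function computes the referable items as the union over all prefixes of a scope path.

```rust
use std::collections::HashMap;

/// A scope path such as `/1/1/2`, stored as its path elements.
pub type ScopeId = Vec<usize>;

/// Hypothetical stand-in for a linearizer instance bound to one scope.
pub struct ScopedLinearizer {
    pub scope: ScopeId,
    /// Gives each child scope a unique path element.
    next_child: usize,
}

impl ScopedLinearizer {
    pub fn root() -> Self {
        ScopedLinearizer { scope: vec![1], next_child: 1 }
    }

    /// Creates a child linearizer whose `ScopeId` extends this one's.
    pub fn scope(&mut self) -> ScopedLinearizer {
        let mut child = self.scope.clone();
        child.push(self.next_child);
        self.next_child += 1;
        ScopedLinearizer { scope: child, next_child: 1 }
    }
}

/// IDs visible from `scope`: items registered for every prefix of the path,
/// i.e. the scope itself and all enclosing scopes.
pub fn visible_ids(scope: &[usize], by_scope: &HashMap<ScopeId, Vec<usize>>) -> Vec<usize> {
    (1..=scope.len())
        .filter_map(|len| by_scope.get(&scope[..len]))
        .flatten()
        .copied()
        .collect()
}
```

For `/1/1/2` (the value of `key2`), `visible_ids` collects the IDs registered for `/1` and `/1/1`, that is, the `record` declaration and both fields, without storing them again for the inner scope.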
+ + + +```{.nickel, #lst:nickel-scope-example, caption="Explicit display of Nickel scopes"} +---------------------------------------------+/1 + | +let record | + = { -------------------------------+ /1/1 | + | | + key1 = "value", -------- /1/1/1 | | + key2 = 123, ------------ /1/1/2 | | + | | + }----------------------------------+ | + in record ------------------------/1/2 | + | +---------------------------------------------+ +``` Additionally, to keep track of the variables in scope, and iteratively build a usage graph, NLS keeps track of the latest definition of each variable name and which `Declaration` node it refers to. @@ -904,14 +926,24 @@ impl Completed { During the building process item IDs are equal to their index in the underlying array which allows for efficient access by ID. To allow similarly efficient access to nodes with using IDs a `Completed` linearization maintains a mapping of IDs to their corresponding index in the reordered array. +For instance NLS would represent the example [@lst:nickel-scope-example] as shown in : + +```{.rust, #lst:nls-scopes-elements, caption="Items collected for each scope of the example. Simplified representation using concrete values"} +/1 -> { Declaration("record") } +/1/1 -> { RecordField("key1"), RecordField("key2") } +/1/1/1 -> { "value" } +/1/1/2 -> { 123 } +/1/2 -> { Usage("record") } +``` + A queried ID is first looked up in this mapping which yields an index from which the actual item is read. #### Resolving by scope During the construction from the AST, the syntactic scope of each element is eventually known. -This allows to map scopes to a list of elements defined in this scope. -Definitions from higher scopes are not repeated, instead they are calculated on request. -As scopes are lists of scope fragments, for any given scope the set of referable nodes is determined by unifying IDs of all prefixes of the given scope, then resolving the IDs to elements. 
+This allows NLS to map an item's `ScopeId` to the list of elements defined in that scope. +As discussed in [@sec:scopes], scopes are lists of scope path elements, where the prefixes correspond to parent scopes. +For any given scope, the set of referable nodes is determined by taking the union of the IDs associated with all prefixes of that scope and then resolving these IDs to elements. The Rust implementation is given in [@lst:nls-resolve-scope] below. ```{.rust #lst:nls-resolve-scope caption="Resolution of all items in scope"}
Simplified representation using concrete values"} /1 -> { Declaration("record") } /1/1 -> { RecordField("key1"), RecordField("key2") } /1/1/1 -> { "value" } From 4e908cf64b5ed22524b09bce4a525bd158625523 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sun, 13 Mar 2022 00:17:08 +0100 Subject: [PATCH 099/142] Apply prelude to chapter preview --- flake.nix | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/flake.nix b/flake.nix index 78a93af9..f2405c93 100644 --- a/flake.nix +++ b/flake.nix @@ -21,7 +21,7 @@ ''; compile-chapter-preview = pkgs.writeShellScriptBin "compile-chapter-preview" '' - pandoc $1 --defaults document.yaml -o "''${@:2}" + pandoc prelude/metadata.yaml prelude/prelude.md $1 --defaults document.yaml -o "''${@:2}" ''; in From 5feded9fdeb590e5ab4d5f3cbfcc7fd69d53b693 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 14 Mar 2022 23:20:27 +0100 Subject: [PATCH 100/142] Reword introduction part a bit --- chapter/methodology.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index a892a4ad..906c4a3a 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -1,7 +1,6 @@ # Design implementation of NLS -This chapter guides through the various steps and components of the Nickel Language Server (NLS). -Being written in the same language (Rust[@rust]) as the Nickel interpreter allows NLS to integrate existing components for language analysis. +This chapter guides through the components of the Nickel Language Server (NLS) as well as the implementation details of the source code analysis and information querying. Aiming for an abstract interface, NLS defines its own data structure underpinning all higher level LSP interactions. [Section @sec:linearization] will introduce this `linearization` data structure and explain how NLS bridges the gap between the explicitly handled Nickel AST towards the abstract linearization. 
Finally, the implementation of current LSP features is discussed in [@sec:lsp-server]. From ea15cb59d2d03315c1b217cfc9ffdd657b6de3e2 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 14 Mar 2022 23:20:50 +0100 Subject: [PATCH 101/142] Add "Key objectives section" --- chapter/methodology.md | 54 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 54 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index 906c4a3a..2caa44d4 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -5,6 +5,60 @@ Aiming for an abstract interface, NLS defines its own data structure underpinnin [Section @sec:linearization] will introduce this `linearization` data structure and explain how NLS bridges the gap between the explicitly handled Nickel AST towards the abstract linearization. Finally, the implementation of current LSP features is discussed in [@sec:lsp-server]. +## Key Objectives + +The following points are considered the key objectives of this thesis, implemented specifically for the Nickel Language Server. + +### Performance + +The usefulness of a language server correlates with its performance. +A slow server may cause stutters in the editor or make users wait for responses after issuing LSP commands. +Several studies suggest that interruptions are detrimental to programmers' productivity [@interruption-1; @interruption-2]. The more often and the longer a task is interrupted, the higher the frustration. +Hence, as called for in RQ.1 (cf. [@sec:research-questions]), a main criterion for the language server is its performance. + +For a language server, there are two tasks that require processing and could potentially cause interruptions. + +Upon source code changes, a language server may reprocess the code to gather general information and provide diagnostics. +Since the LSP uses notifications for this, and language servers generally run as separate processes, delays in processing may not directly affect the programmer.
+However, depending on the implementation of the server, multiple changes may queue up preventing the timely response to other requests or delaying diagnostics. + +The JSON-RPC protocol underlying the LSP, is a synchronous protocol. +Each request requires that the server responded to the previous request before processing can begin. +Moreover, the order of requests has to be maintained. +Since many requests are issued implicitly by the editor, e.g., hover requests, there is a risk of request queuing which could delay the processing of explicit commands. +It is therefore important to provide nearly instantaneous replies to requests. + +It is to mention that the LSP defines "long running" requests, that may run in the background. +This concept mitigates queuing but can lead to similarly bad user experience as responses appear out of order or late. + +### Capability + +The second objective is to provide an LSP server that offers the most common LSP features as identified by [@langserver-org]. +Concretely, these capabilities are: + +1. Code completion, +2. Hover information, +3. Jump to definition, +4. Find references, +5. Workspace symbols, +6. Diagnostics + +For the work on NLS these six capabilities were considered as the goal for a minimal viable product. + +### Flexibility + +The Nickel Language just faced its initial release so changes and additions to the language are inevitable. +Since, NLS is expressed as the official tooling solution for the language, it has to be able to keep up with Nickel's development. +Therefore, the architecture needs to be flexible and simple enough to accommodate changes to the language's structure while remaining the server's capabilities and requiring little changes to the language core. +Likewise, extending the capabilities of the server should be simple enough and designed such future developers are able to pick up the work on NLS. 
+ +### Generalizability + +In the interest of the academic audience and future developers of language servers, this thesis aims to present a reusable solution. +The implementation of NLS as examined in this thesis should act as an implementation example that can be applied to other, similar languages. +As a result the requirements on the language and its implementation should be minimal. +Also, the Language servers should not depend on the implementation of Nickel (e.g. types) too deeply. + ## Illustrative example The example [@lst:nickel-complete-example] shows an illustrative high level configuration of a server. From a9c0cf8e31af65072a98a8b93b39b5d48b917c3d Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Tue, 15 Mar 2022 01:40:50 +0100 Subject: [PATCH 102/142] Ad design decision section --- chapter/methodology.md | 54 ++++++++++++++++++++++++++++++++++++------ 1 file changed, 47 insertions(+), 7 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 2caa44d4..ab1c8660 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -59,6 +59,53 @@ The implementation of NLS as examined in this thesis should act as an implementa As a result the requirements on the language and its implementation should be minimal. Also, the Language servers should not depend on the implementation of Nickel (e.g. types) too deeply. + +## Design Decisions + +[@sec:considerable-dimensions] + +### Programming language + +Rust ([@rust]) was chosen as the implementing language of NLS primarily since Nickel itself is written in Rust. +Being written in the same language as the Nickel interpreter allows NLS to integrate existing components for language analysis. +This way, changes to the Nickel syntax or code analysis impose minimal adaptation of the Language Server. + +In fact, using any other language was never considered since that would require a separate implementation of integral parts of Nickel, which are actively being modified. 
+ +Additionally, Rust has proven itself as a language for LSP Servers. +According to the official Rust language website [@rust], Rust is a low-level programming language that focuses on performance, reliability and productivity. +It is most known for its `trait` oriented design, algebraic data types [@adt-wiki?] and safety, while offering native performance comparable to C languages. + +The concept of `traits` [@traits] was chosen over common object inheritance as observed in Java or C#. +Instead, `traits` define composable interfaces without the complexities of nesting classes. +Effectively a `trait` is simply a set of methods implemented for a certain data type. + +Safety comes in form of *memory* safety, which is enforced by Rust's ownership model[@rust-ownership-model]. +A different kind of safety is *type* safety which is an implication of Rust's strong type system and `trait` based generics. + +Lastly, Rust has been employed by multiple LSP servers [@lib.rs#language-servers] which created a rich ecosystem of server abstractions. + +### File processing + +Earlier two differnet file processing models were discussed in [@sec:considerable-dimensions], incremental and complete processing. + +LSP implementations may employ so-called incremental parsing, which allows updating only the relevant parts of its source code model upon small changes in the source. +However, an incremental LSP is not trivial to implement, which is why it is mainly found in more complex servers such as the Rust Analyzer [@rust-analyzer] or the OCaml Language Server [@ocaml-lsp, @merlin]. + +Implementing an incremental LSP server for Nickel would be impractical. +NLS would not be able to leverage existing components from the non-incremental Nickel implementation (most notably, the parser). +Parts of the nickel runtime, such as the type checker, would need to be adapted or even reimplemented to work incrementally too. 
+Considering the scope of this thesis, the presented approach performs a complete analysis on every update to the source file. +The typical size of Nickel projects is assumed to remain small for quite some time, giving reasonable performance in practice. +Incremental parsing, type-checking and analysis can still be implemented as a second step in the future after gathering more usage data once Nickel and NLS enjoy greater adoption. + +### Code Analysis + +Code analysis approaches as introduced in [@sec:considerable-dimensions] can have both *lazy* and *eager* qualities. +Lazy solutions are generally more compatible with an incremental processing model, since these aim to minimize change-induced computation. +NLS prioritizes efficient queries to a pre-processed data model. +Similar to the file processing argument in [@sec:file-pressng], it is assumed that the size of Nickel projects allows for sufficiently efficient eager analysis, prioritizing a more straightforward implementation over optimized performance. + ## Illustrative example The example [@lst:nickel-complete-example] shows an illustrative high level configuration of a server. @@ -125,13 +172,6 @@ For reasons detailed in [@sec:post-processing], the linearization needs to be po The completed linearization acts as the basis to handle all supported LSP requests as explained in [@sec:lsp-server]. [Section @sec:resolving-elements] explains how a completed linearization is accessed efficiently. -Advanced LSP implementations sometimes employ so-called incremental parsing, which allows updating only the relevant parts of an AST (and, in case of NLS, the derived linearization) upon small changes in the source. -However, an incremental LSP is not trivial to implement. -For once, NLS would not be able to leverage existing components from the existing Nickel implementation (most notably, the parser).
-Parts of the nickel runtime, such as the typechecker, would need to be adapted or even reimplemented to work incrementally too. -Considering the scope of this thesis, the presented approach performs a complete analysis on every update to the source file. -The typical size of Nickel projects is assumed to remain small for quite some time, giving reasonable performance in practice. -Incremental parsing, type-checking and analysis can still be implemented as a second step in the future after gathering more usage data once nickel and the NLS enjoy greater adoption. From 8aec8ae1933653e95063def0d5a713c46f31f74a Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 16 Mar 2022 18:40:29 +0100 Subject: [PATCH 103/142] Elaborate Example --- chapter/methodology.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index ab1c8660..edd11e37 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -109,6 +109,16 @@ Similar to the file processing argument in [@sec:file-pressng], it is assumed th ## Illustrative example The example [@lst:nickel-complete-example] shows an illustrative high level configuration of a server. +Using Nickel, this file would be used to define the schema of the configuration format of another program. +Evaluation and validation are done in the context of Nickel, after which the evaluated structure is translated into a more common (but less expressive) format such as YAML or JSON. +Here, the schema for a configuration of a Kubernetes-like [@kubernetes] tool is defined using contracts, making exemplary use of variables and functions. +Specifically, it describes a way to provision named containers. +The user is able to specify container images and opened ports, as well as define metadata of the deployment. +The configuration is constrained by the `NobernetesConfig` contract. +The contract in turn defines the required fields and field types.
+Notably, the fields `containers` and `replicas` are further constrained by individual contracts. +The `Port` contract is a logical contract that ensures the value is in the range of valid port numbers. +The example also shows different ways of declaring types (i.e. constraining record value types), string interpolation, as well as the usage of let bindings with standard types. Throughout this chapter, different sections about the NSL implementation will refer back to this example. ```{.nickel #lst:nickel-complete-example caption="Nickel example with most features shown"} From 1c4ff64298039a8c2fb24f7666268150199d4c9d Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 16 Mar 2022 19:21:44 +0100 Subject: [PATCH 104/142] More about traits and changes to rust intro --- chapter/methodology.md | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index edd11e37..76294cdd 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -70,20 +70,27 @@ Rust ([@rust]) was chosen as the implementing language of NLS primarily since Ni Being written in the same language as the Nickel interpreter allows NLS to integrate existing components for language analysis. This way, changes to the Nickel syntax or code analysis impose minimal adaptation of the Language Server. -In fact, using any other language was never considered since that would require a separate implementation of integral parts of Nickel, which are actively being modified. +In fact, using any other language was never considered since that would have required a separate implementation of integral parts of Nickel, which are actively being developed. Additionally, Rust has proven itself as a language for LSP Servers. -According to the official Rust language website [@rust], Rust is a low-level programming language that focuses on performance, reliability and productivity. 
-It is most known for its `trait` oriented design, algebraic data types [@adt-wiki?] and safety, while offering native performance comparable to C languages. +Lastly, Rust has already been employed by multiple LSP servers [@lib.rs#language-servers] which created a rich ecosystem of server abstractions. +For instance, the largest and most advanced LSP implementation in Rust -- the Rust Analyzer [@rust-analyzer] -- has contributed many tools such as an LSP server interface [@lsp-server-interface] and a refactoring-oriented syntax tree representation [@rowan]. +Additionally, lots of smaller languages [@gluon, @slint, @mojom] implement Language Servers in Rust. +Rust appears to be a viable choice even for languages that are not originally implemented in Rust, such as Nix [@nix, @rninx-lsp]. -The concept of `traits` [@traits] was chosen over common object inheritance as observed in Java or C#. -Instead, `traits` define composable interfaces without the complexities of nesting classes. -Effectively a `trait` is simply a set of methods implemented for a certain data type. +In Rust the concept of `traits` [@traits] is fundamental, for the following reasons: +Traits are definitions of shared behavior. +Similar to interfaces in other languages, a trait defines a set of methods. +Traits are implemented for a type, exposing the defined methods on instances of the type. +Rust's support for generics[@generics] allows constraining arguments and structure fields to implementors of a certain trait. + + +Rust also excels due to its various safety features and performane. Safety comes in form of *memory* safety, which is enforced by Rust's ownership model[@rust-ownership-model]. A different kind of safety is *type* safety which is an implication of Rust's strong type system and `trait` based generics. +Finally, as Rust leverages the LLVM infrastructure and requires no runtime, its performance rivals the traditional C languages.
-Lastly, Rust has been employed by multiple LSP servers [@lib.rs#language-servers] which created a rich ecosystem of server abstractions. ### File processing From 2c2548946fadd3189b17f1ce95101f10bd1cd71a Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 16 Mar 2022 19:22:28 +0100 Subject: [PATCH 105/142] Add more pandoc filters for plantuml and code inclusion --- flake.lock | 6 +++--- flake.nix | 6 +++++- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/flake.lock b/flake.lock index 386b4549..df923ed8 100644 --- a/flake.lock +++ b/flake.lock @@ -61,11 +61,11 @@ ] }, "locked": { - "lastModified": 1641408929, - "narHash": "sha256-AwDTUe1ZlJSsaq4CuapMAaaBMzvYqhOILopafoz3Fg0=", + "lastModified": 1647002964, + "narHash": "sha256-lPpmrX8TPAWoEw9PGp6QH2pn24slMK3eu4pB4XvmQ+k=", "owner": "ysndr", "repo": "writing-tools", - "rev": "416877191c48b2c7ddcdd33f5a6b6da60889bfea", + "rev": "384e9fc38858f485479f022b24e39677aacea2ff", "type": "github" }, "original": { diff --git a/flake.nix b/flake.nix index f2405c93..2cd69496 100644 --- a/flake.nix +++ b/flake.nix @@ -32,8 +32,10 @@ writing.latex (writing.pandoc.override { filters = [ + "pandoc-include-code" (builtins.fetchurl "https://raw.githubusercontent.com/jgm/pandocfilters/f850b22/examples/graphviz.py") (builtins.fetchurl "https://raw.githubusercontent.com/jgm/pandocfilters/f850b22/examples/tikz.py") + (builtins.fetchurl "https://raw.githubusercontent.com/timofurrer/pandoc-plantuml-filter/master/pandoc_plantuml_filter.py") (pkgs.writeShellScript "pandoc-mermaid" '' MERMAID_BIN=${pkgs.nodePackages.mermaid-cli}/bin/mmdc exec python ${builtins.fetchurl "https://raw.githubusercontent.com/timofurrer/pandoc-mermaid-filter/master/pandoc_mermaid_filter.py"} '') @@ -43,12 +45,14 @@ # (builtins.fetchurl "https://raw.githubusercontent.com/tomduck/pandoc-fignos/master/pandoc_fignos.py") ]; citeproc = true; - extraPackages = [ pkgs.graphviz ]; + extraPackages = [ pkgs.graphviz pkgs.plantuml 
pkgs.haskellPackages.pandoc-include-code ]; pythonExtra = p: [ p.pygraphviz p.psutil ]; }) compile-all compile-toc compile-chapter-preview + + pkgs.graphviz pkgs.plantuml ]; }; }); From 904ad5b0d331f2cb6c0c707ea4dab4febf315732 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 16 Mar 2022 19:23:01 +0100 Subject: [PATCH 106/142] Add plantuml definition of nls type structure. --- assets/class-diagram.plantuml | 151 ++++++++++++++++++++++++++++++++++ 1 file changed, 151 insertions(+) create mode 100644 assets/class-diagram.plantuml diff --git a/assets/class-diagram.plantuml b/assets/class-diagram.plantuml new file mode 100644 index 00000000..d2134e00 --- /dev/null +++ b/assets/class-diagram.plantuml @@ -0,0 +1,151 @@ +@startuml +skinparam linetype ortho +skinparam groupInheritance 2 + + +package ls::interface { + + class Linearizer << (T,coral) >> { + -- Associated Types -- + type Building; + type Completed; + type CompletionExtra; + + -- Methods -- + add_term(\n\ + &mut self,\n\ + &mut Linearization,\n\ + &Term,\n\ + TermPos,\n\ + TypeWrapper + ) + retype_ident(\n\ + &mut self,\n\ + &mut Linearization,\n\ + &Ident,\n\ + TypeWrapper, + ) + complete(Linearization)\n\ + -> Linearization + + } + + class Linearization << (S,steelblue) >> { + - state : T + } + + + class LinearizationState << (T,coral) >> {} + + class Building << (S,steelblue) >> { + + Vec>, + + scope: HashMap>, + } + + class Completed << (S,steelblue) >> { + + Vec>, + - scope: HashMap>, + - id_to_index: HashMap, + + -- + + get_item(&self, usize) \n\ + -> Option<&LinearizationItem> + + get_in_scope(&self, LinearizationItem)\n\ + -> Vec<&LinearizationItem> + + item_at(&self,&Location) \n\ + -> Option<&LinearizationItem> + } + + class LinearizationItem << (S,steelblue) >> { + + id: usize, + + pos: RawSpan, + + ty: S, + + kind: TermKind, + + scope: Scope, + + meta: Option, + } + + together { + circle Scope << (S,steelblue) >> {} + circle ScopeId << (S,steelblue) >> {} + circle Environment << (S,steelblue) 
>> {} + circle TermKind + } +} + +package nls { + + class AnalysisHost << (S,steelblue) >> { + - env: Environment, + - scope: Scope, + - next_scope_id: ScopeId, + - meta: Option, + - let_binding: Option, + - access: Option>, + - {field} record_fields: Option<(ID, Vec<(ID, Ident)>)>, + + } + + class "Linearizer for AnalysisHost" as impl_linearizer_for_AnalysisHost << (I,lime) >> { + -- Associated Types -- + type Building = Building; + type Completed = Completed; + type CompletionExtra = (UnifTable, HashMap); + } + +} + + + + +package Nickel { + + circle MetaValue << (S,steelblue) >> {} + + circle Term << (S,steelblue) >> {} + + circle RawSpan << (S,steelblue) >> {} + + circle Ident << (S,steelblue) >> {} + + circle TypeWrapper << (S,steelblue) >> {} + +} + +AnalysisHost --|> Linearizer +impl_linearizer_for_AnalysisHost . (AnalysisHost, Linearizer) : "Implementation" +impl_linearizer_for_AnalysisHost --+ (Building, Completed) +impl_linearizer_for_AnalysisHost ..> Ident + + + +Completed <--o "*" LinearizationItem +Completed --|> LinearizationState +Building <--o "*" LinearizationItem +Building --|> LinearizationState + +Building ..> Scope + +AnalysisHost <--* Scope +AnalysisHost <--* ScopeId +AnalysisHost <--* Environment +AnalysisHost <--o MetaValue +AnalysisHost <--o Ident + + + + +Linearizer ..|> Linearization +Linearizer ..|> Term +Linearizer ..|> TypeWrapper +Linearizer ..|> Ident + + +LinearizationItem ..|> RawSpan +LinearizationItem ..|> Scope +LinearizationItem ..|> TermKind +LinearizationItem ..|> MetaValue + + +Completed <--o Scope +@enduml From 72ffa07340bcc658afbe67949793632ad0fe98af Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 17 Mar 2022 14:47:33 +0100 Subject: [PATCH 107/142] Lead into rust features --- chapter/methodology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 76294cdd..6d01b276 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -86,7 +86,7 
@@ Rust's support for generics[@generics] allows constraining arguments and structu -Rust also excels due to its various safety features and performane. +Rust also excels due to its various safety features and performance, for the following reasons. Safety comes in form of *memory* safety, which is enforced by Rust's ownership model[@rust-ownership-model]. A different kind of safety is *type* safety which is an implication of Rust's strong type system and `trait` based generics. Finally, as Rust leverages the LLVM infrastructure and requires no runtime, its performance rivals the traditional C languages. From 0ebbbf796997de19a610b751e717725a6c205ad9 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 17 Mar 2022 14:59:36 +0100 Subject: [PATCH 108/142] More clarifications on traits --- chapter/methodology.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 6d01b276..3b026919 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -81,13 +81,12 @@ Rust appears to be a viable choice even for languages that are not originally im In Rust the concept of `traits` [@traits] is fundamental, for the following reasons: Traits are definitions of shared behavior. Similar to interfaces in other languages, a trait defines a set of methods. -Traits are implemented for a type, exposing the defined methods on instances of the type. -Rust's support for generics[@generics] allows constraining arguments and structure fields to implementors of a certain trait. - - +One implements a trait for a certain type by defining the behavior in the context of the type. +Rust's support for generics[@generics] allows constraining arguments and structure fields to implementors of a certain trait, which decouples the interface from concrete behavior. Rust also excels due to its various safety features and performance, for the following reasons.
-Safety comes in form of *memory* safety, which is enforced by Rust's ownership model[@rust-ownership-model]. +Safety comes in form of *memory* safety, which is enforced by Rust's ownership model[@rust-ownership-model] and explicit memory handling. +The developer in turn needs to be aware of the implications of stack or heap located variables and their size in memory. A different kind of safety is *type* safety which is an implication of Rust's strong type system and `trait` based generics. Finally, as Rust leverages the LLVM infrastructure and requires no runtime, its performance rivals the traditional C languages. From 4a29e6b5cd0a17a4b2d347faf7e56edeb01cee02 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 17 Mar 2022 16:02:48 +0100 Subject: [PATCH 109/142] implrove linearization intro --- chapter/methodology.md | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 3b026919..3249d3f2 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -180,13 +180,22 @@ let image = "k8s.gcr.io/%{name_}" in The focus of the NLS as presented in this work is to implement a working language server with a comprehensive feature set. To answer requests, NLS needs to store more information than what is originally present in a Nickel AST such as information about references and types. Apart from missing data, an AST is not optimized for quick random access of nodes based on their position, which is a crucial operation for a language server. -To that end NLS introduces an auxiliary data structure, the *linearization*, which is derived from the AST. -It represents the original data linearly, performs an enrichment of the AST nodes and provides greater decoupling of the LSP functions from the implemented language. -[Section @sec:transfer-from-ast] details the process of transforming the AST to a linearization. 
-After NLS parsed a Nickel source files to an AST it starts to fill the linearization, which is in a *building* state during this phase. -For reasons detailed in [@sec:post-processing], the linearization needs to be post-processed, yielding a *completed* state. -The completed linearization acts as the basis to handle all supported LSP requests as explained in [@sec:lsp-server]. -[Section @sec:resolving-elements] explains how a completed linearization is accessed efficiently. + +To that end NLS introduces an auxiliary data structure, the so-called linearization. +The linearization is a linear representation of the program and consists of linearization items. +It is derived node by node from the program's AST by means of a recursive tree traversal. +The transfer process generates a set of linearization items for every node. +The kind of the items, as well as any additional type information and metadata, is determined by the state of the linearization and by the implementation of the transfer process, the so-called linearizer. +Transferring AST nodes into an intermediate structure has the additional advantage of establishing a boundary between the language-dependent and generic parts of the language server, since linearization items could be implemented entirely language-independently. +The transfer process is described in greater detail in [@sec:transfer-from-ast]. + +The linearization can be in one of two general states that align with the two phases of its life cycle. +While NLS processes the AST, the linearization is considered to be in a building state. +After the AST is fully transferred, the linearization enters the second phase, in which it is referred to as completed and used by the server to answer LSP requests. +The two states are kept separate by distinct types, while their contents depend on the implementation, since a generic interface allows independent implementations of the linearizer.
+Since different types represent the two states, the building state is explicitly transformed into a completed type, allowing for additional post-processing (cf. [@sec:post-processing]). +To fully support all actions implemented by the server, the completed linearization provides several methods to access specific items efficiently. +The implementation of these methods is explained in [@sec:resolving-elements]. From 7be11c32f6e0b4aa6ec820c5d3794595280c4292 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 17 Mar 2022 16:03:58 +0100 Subject: [PATCH 110/142] Apply suggestions from code review Improvements to linearizaion intro --- chapter/methodology.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 3249d3f2..12688ad0 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -177,9 +177,12 @@ let image = "k8s.gcr.io/%{name_}" in ## Linearization -The focus of the NLS as presented in this work is to implement a working language server with a comprehensive feature set. -To answer requests, NLS needs to store more information than what is originally present in a Nickel AST such as information about references and types. -Apart from missing data, an AST is not optimized for quick random access of nodes based on their position, which is a crucial operation for a language server. +The focus of the NLS as presented in this work is to implement a foundational set of LSP features as described in [@sec:capability]. +In order to process these capabilities efficiently as per [@sec:performance], NLS needs to store more information than what is originally present in a Nickel AST (cf. [@sec:nickel-ast]), such as information about references and types. +While these can be deduced from the AST lazily, doing so would require repeated traversals of an arbitrarily large tree, at a cost to performance.
+Therefore, as hinted in [@sec:code-analysis], NLS optimizes for efficient lookups in a pre-processed report. +Since most LSP commands refer to code positions, the intermediate structure must allow efficient lookup of analysis results based on positions. + To that end NLS introduces an auxiliary data structure, the so-called linearization. The linearization is a linear representation of the program and consists of linearization items. From 1f4e54b91b35386781070a36066b3c1b7f66da8d Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 17 Mar 2022 16:23:17 +0100 Subject: [PATCH 111/142] Update chapter/methodology.md Design decisions intro --- chapter/methodology.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 12688ad0..576554ca 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -62,7 +62,8 @@ Also, the Language servers should not depend on the implementation of Nickel (e. ## Design Decisions -[@sec:considerable-dimensions] +[Section @sec:considerable-dimensions] introduced several considerations with respect to the implementation of language servers. +Additionally, [@sec:representative-lsp-projects] presents examples of different servers which guided the decisions made while implementing the NLS. ### Programming language From ac2bfe95b7129bf1ce8d79f1ca72f9811db97721 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 17 Mar 2022 16:29:22 +0100 Subject: [PATCH 112/142] Update chapter/methodology.md --- chapter/methodology.md | 1 + 1 file changed, 1 insertion(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index 576554ca..6a144a20 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -64,6 +64,7 @@ Also, the Language servers should not depend on the implementation of Nickel (e. [Section @sec:considerable-dimensions] introduced several considerations with respect to the implementation of language servers.
Additionally, [@sec:representative-lsp-projects] presents examples of different servers which guided the decisions made while implementing the NLS. +Additionally, [@sec:representative-lsp-projects] presents examples of different servers which guided the decisions made while implementing the NLS. ### Programming language From b1ba3cde6cb9d6db43fa7df714c2e6bac3683111 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Thu, 17 Mar 2022 16:43:45 +0100 Subject: [PATCH 113/142] Update chapter/methodology.md More on typesafety --- chapter/methodology.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index 6a144a20..43a97dfb 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -90,6 +90,9 @@ Rust also excels due to its various safety features and performance, for the fol Safety comes in form of *memory* safety, which is enforced by Rust's ownership model[@rust-ownership-model] and explicit memory handling. The developer in turn needs to be aware of the implications of stack or heap located variables and their size in memory. A different kind of safety is *type* safety which is an implication of Rust's strong type system and `trait` based generics. +Statically typed languages such as Rust check the types of variables and function arguments at compile time, and Rust requires explicit type annotations in function signatures. +These checks ensure at compile time that accessed methods and fields exist, and prevent users from passing incompatible data to functions. +This eliminates a common class of runtime failures seen in dynamic languages like Python or JavaScript. Finally, as Rust leverages the LLVM infrastructure and requires no runtime, its performance rivals the traditional C languages.
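The trait-and-generics pattern described in the patches above can be made concrete with a minimal Rust sketch, loosely modeled on the shape of the `Linearizer` trait from the class diagram (associated `Building` and `Completed` types, an `add_term` step and a `complete` step). All names and types below (`Term`, `Item`, `SimpleHost`, `linearize`) are hypothetical simplifications for illustration, not the actual NLS or Nickel API.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified AST; Nickel's real `Term` type is far richer.
#[derive(Debug)]
enum Term {
    Num(f64),
    Var(String),
    /// `let name = bound in body`
    Let(String, Box<Term>, Box<Term>),
}

/// Hypothetical linearization item: what the server records per AST node.
#[derive(Debug)]
struct Item {
    id: usize,
    kind: &'static str,
    name: Option<String>,
}

/// Shared behavior: anything that can turn AST nodes into linearization items.
/// The associated types let each implementor pick its own building and completed states.
trait Linearizer {
    type Building: Default;
    type Completed;

    fn add_term(&mut self, lin: &mut Self::Building, term: &Term);
    fn complete(self, lin: Self::Building) -> Self::Completed;
}

/// One hypothetical implementor of the trait.
struct SimpleHost;

impl Linearizer for SimpleHost {
    type Building = Vec<Item>;
    // The completed state indexes items by id for fast lookup.
    type Completed = HashMap<usize, Item>;

    fn add_term(&mut self, lin: &mut Vec<Item>, term: &Term) {
        let id = lin.len();
        match term {
            Term::Num(_) => lin.push(Item { id, kind: "literal", name: None }),
            Term::Var(x) => lin.push(Item { id, kind: "usage", name: Some(x.clone()) }),
            Term::Let(x, bound, body) => {
                lin.push(Item { id, kind: "declaration", name: Some(x.clone()) });
                // Recurse into children; `&Box<Term>` coerces to `&Term`.
                self.add_term(lin, bound);
                self.add_term(lin, body);
            }
        }
    }

    fn complete(self, lin: Vec<Item>) -> HashMap<usize, Item> {
        lin.into_iter().map(|item| (item.id, item)).collect()
    }
}

/// Generic driver constrained to implementors of `Linearizer`.
/// Calling it with a type that does not implement the trait fails to compile.
fn linearize<L: Linearizer>(mut host: L, root: &Term) -> L::Completed {
    let mut lin: L::Building = Default::default();
    host.add_term(&mut lin, root);
    host.complete(lin)
}

fn main() {
    // Corresponds to `let x = 1 in x`.
    let ast = Term::Let(
        "x".to_string(),
        Box::new(Term::Num(1.0)),
        Box::new(Term::Var("x".to_string())),
    );
    let completed = linearize(SimpleHost, &ast);
    assert_eq!(completed.len(), 3);
    assert_eq!(completed[&0].kind, "declaration");
    assert_eq!(completed[&2].name.as_deref(), Some("x"));
    println!("{} items linearized", completed.len());
}
```

Because `linearize` only accepts types satisfying the bound `L: Linearizer`, passing a type without an implementation is rejected at compile time rather than failing at runtime, which is the kind of type safety the text contrasts with dynamic languages.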
From 6b02010657b8480d079c9877d0d2b543d96e95c1 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 21 Mar 2022 09:50:47 +0100 Subject: [PATCH 114/142] Update class diagram --- assets/class-diagram.plantuml | 236 +++++++++++++++++++++------------- 1 file changed, 145 insertions(+), 91 deletions(-) diff --git a/assets/class-diagram.plantuml b/assets/class-diagram.plantuml index d2134e00..00ebe638 100644 --- a/assets/class-diagram.plantuml +++ b/assets/class-diagram.plantuml @@ -2,79 +2,97 @@ skinparam linetype ortho skinparam groupInheritance 2 - -package ls::interface { - - class Linearizer << (T,coral) >> { - -- Associated Types -- - type Building; - type Completed; - type CompletionExtra; - - -- Methods -- - add_term(\n\ - &mut self,\n\ - &mut Linearization,\n\ - &Term,\n\ - TermPos,\n\ - TypeWrapper - ) - retype_ident(\n\ - &mut self,\n\ - &mut Linearization,\n\ - &Ident,\n\ - TypeWrapper, - ) - complete(Linearization)\n\ - -> Linearization - - } - - class Linearization << (S,steelblue) >> { - - state : T +package "Language Server Abstraction" { + + package Analysis { + + class Linearizer << (T,coral) >> { + -- Associated Types -- + type Building; + type Completed; + type CompletionExtra; + + -- Methods -- + + add_term(\n\ + &mut self,\n\ + &mut Linearization,\n\ + &Term,\n\ + TermPos,\n\ + TypeWrapper + ) + + +complete(Linearization)\n\t-> Linearization + + } + + class Linearization << (S,steelblue) >> { + - state : T + - update_file(FileId, String) + } + + + class LinearizationState << (T,coral) >> {} + + class Building << (S,steelblue) >> { + + Vec>, + + scope: HashMap>, + } + + class Completed << (S,steelblue) >> { + + Vec>, + - scope: HashMap>, + - id_to_index: HashMap, + + -- + + get_item(&self, usize) \n\ + -> Option<&LinearizationItem> + + get_in_scope(&self, LinearizationItem)\n\ + -> Vec<&LinearizationItem> + + item_at(&self,&Location) \n\ + -> Option<&LinearizationItem> + } + + class LinearizationItem << (S,steelblue) >> { + + id: usize, + + pos: 
RawSpan, + + ty: S, + + kind: TermKind, + + scope: Scope, + + meta: Option, + } } + package "LSP Implementations" { + class Handler << (T,coral) >> { + + handle(Server, R, &Linearization) + } - class LinearizationState << (T,coral) >> {} - - class Building << (S,steelblue) >> { - + Vec>, - + scope: HashMap>, + class GoToRef << (S,steelblue) >> {} + class GoToDef << (S,steelblue) >> {} + class Complete << (S,steelblue) >> {} + class Symbols << (S,steelblue) >> {} + class Hover << (S,steelblue) >> {} } - class Completed << (S,steelblue) >> { - + Vec>, - - scope: HashMap>, - - id_to_index: HashMap, - - -- - + get_item(&self, usize) \n\ - -> Option<&LinearizationItem> - + get_in_scope(&self, LinearizationItem)\n\ - -> Vec<&LinearizationItem> - + item_at(&self,&Location) \n\ - -> Option<&LinearizationItem> - } - class LinearizationItem << (S,steelblue) >> { - + id: usize, - + pos: RawSpan, - + ty: S, - + kind: TermKind, - + scope: Scope, - + meta: Option, - } - - together { - circle Scope << (S,steelblue) >> {} - circle ScopeId << (S,steelblue) >> {} - circle Environment << (S,steelblue) >> {} - circle TermKind + package "Support Types" { + class Scope << (S,steelblue) >> {} + class ScopeId << (S,steelblue) >> {} + class Environment << (S,steelblue) >> {} + class TermKind << (S,steelblue) >> {} } } package nls { + class Server << (S,steelblue) >> { + - cache: HashMap> + + ~ receive(Request) + ~ reply(Response) + } + note right of Server: asd + class AnalysisHost << (S,steelblue) >> { - env: Environment, - scope: Scope, @@ -90,62 +108,98 @@ package nls { -- Associated Types -- type Building = Building; type Completed = Completed; - type CompletionExtra = (UnifTable, HashMap); } } +package Nickel { + package types { + circle MetaValue << (S,steelblue) >> {} -package Nickel { + circle Term << (S,steelblue) >> {} - circle MetaValue << (S,steelblue) >> {} + circle RawSpan << (S,steelblue) >> {} - circle Term << (S,steelblue) >> {} + circle Ident << (S,steelblue) >> {} - 
circle RawSpan << (S,steelblue) >> {} + circle TypeWrapper << (S,steelblue) >> {} - circle Ident << (S,steelblue) >> {} + } + + class Parser << (S,steelblue) >> { + parse(String) -> Term + } - circle TypeWrapper << (S,steelblue) >> {} + class TypeChecker << (S,steelblue) >> { + type_check(S, Term) + } } -AnalysisHost --|> Linearizer +AnalysisHost ..|> Linearizer impl_linearizer_for_AnalysisHost . (AnalysisHost, Linearizer) : "Implementation" -impl_linearizer_for_AnalysisHost --+ (Building, Completed) -impl_linearizer_for_AnalysisHost ..> Ident +' impl_linearizer_for_AnalysisHost ..> Building +' impl_linearizer_for_AnalysisHost ..> Completed + + + +Completed ..|> LinearizationState +Completed o--> "*" LinearizationItem +' Completed o--> Scope + + +Building o--> "*" LinearizationItem +Building ..|> LinearizationState + +' Building o--> Scope + +' AnalysisHost o--> Scope +' AnalysisHost o--> ScopeId +' AnalysisHost o--> Environment +' AnalysisHost o--> MetaValue +' AnalysisHost o--> Ident + +Linearizer *..|> Linearization +' Linearizer ..> Term +' Linearizer ..> TypeWrapper +' Linearizer ..> Ident + + +' LinearizationItem o--> RawSpan +' LinearizationItem o--> Scope +' LinearizationItem o--> TermKind +' LinearizationItem o--> MetaValue + +Scope o--> ScopeId +Server o--> Completed +Server ..> AnalysisHost +Server ..> TypeChecker +Server ..> Parser -Completed <--o "*" LinearizationItem -Completed --|> LinearizationState -Building <--o "*" LinearizationItem -Building --|> LinearizationState +Server ..> GoToRef +Server ..> GoToDef +Server ..> Complete +Server ..> Symbols +Server ..> Hover -Building ..> Scope +GoToRef ..|> "R = GoToRefParams" Handler +GoToDef ..|>"R = GoToDefParams" Handler +Complete ..|>"R = CompletionParams" Handler +Symbols ..|>"R = SymbolsParams" Handler +Hover ..|> "R = HoverParams" Handler -AnalysisHost <--* Scope -AnalysisHost <--* ScopeId -AnalysisHost <--* Environment -AnalysisHost <--o MetaValue -AnalysisHost <--o Ident +TypeChecker *..|> 
Linearizer +' Parser ..> Term -Linearizer ..|> Linearization -Linearizer ..|> Term -Linearizer ..|> TypeWrapper -Linearizer ..|> Ident -LinearizationItem ..|> RawSpan -LinearizationItem ..|> Scope -LinearizationItem ..|> TermKind -LinearizationItem ..|> MetaValue -Completed <--o Scope @enduml From 512353c9e4ad03f2b9707b096b33b9ebe871e217 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 21 Mar 2022 09:51:13 +0100 Subject: [PATCH 115/142] Describe architecture --- chapter/methodology.md | 90 ++++++++++++++++++++++-------------------- 1 file changed, 47 insertions(+), 43 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 43a97dfb..e667b7de 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -115,7 +115,51 @@ Incremental parsing, type-checking and analysis can still be implemented as a se Code analysis approaches as introduced in [@sec:considerable-dimensions] can have both *lazy* and *eager* qualities. Lazy solutions are generally more compatible with an incremental processing model, since these aim to minimize change-induced computation. NLS prioritizes efficient queries to a pre-processed data model. -Similar to the file processing argument in [@sec:file-pressng], it is assumed that the size of Nickel projects allows for sufficiently efficient eager analysis, prioritizing a more straightforward implementation over optimized performance. +Similar to the file processing argument in [@sec:file-pressng], it is assumed that the size of Nickel projects allows for sufficiently efficient eager analysis, prioritizing a more straightforward implementation over optimized performance. + +## High-Level Architecture + +This section describes the high-level architecture of NLS. +The entity diagram depicted in [@fig:class-diagram] shows the main elements at play. + +NLS needs to meet the flexibility and generalizability requirements as discussed in [@sec:flexibility, @sec:generalizability].
+In short, three main considerations have to be satisfied: + +1. To keep up with the frequent changes to the Nickel language and ensure compatibility at minimal cost, NLS needs to *integrate critical functions* of Nickel's runtime. +2. Adaptations to Nickel to accommodate the language server should be minimal, so as not to obstruct its development and to *maintain performance of the runtime*. +3. To allow the adoption in other languages, the core language server should be separable from the Nickel specifics. + +The architecture of NLS reflects these goals, using conceptual groups. +The core group labeled "Language Server" contains modules concerning both the source code analysis and LSP interaction. +The analysis is based on an internal representation of source code called `Linearization` which can be in one of two states, namely `Building` or `Completed`. +Either state manages an array of items (`LinearizationItems`) that are derived from AST nodes as well as various metadata facilitating the actions related to the state. +The building of the linearization is abstracted in the `Linearizer` trait. +Implementors of this trait convert AST nodes to linearization items and append said items to a shared linearization in the building state. +Finally, linearizers define how to post-process and complete the linearization. +The full linearization is described in detail in [@sec:linearization]. +The LSP capabilities are implemented as independent functions satisfying the same interface, accepting request parameters and a reference to the completed linearization. +A reference to the server finally allows the handlers to send responses to LSP clients. +To facilitate most functions of the linearization and the LSP handlers, the language-server abstraction also defines a range of support types. + +Unlike the abstract language server module, the NLS module defines language-specific implementations.
+In particular, it implements the `Linearizer` trait through `AnalysisHost`, which the remainder of this document refers to simply as the "linearizer". +The linearizer abstracts Nickel's AST nodes into linearization items. +Since the linearizer implementation is the only interface between Nickel and NLS, changes to the language that affect the AST require changes to this module only. +Representing the main binary, the Server module integrates the Nickel parser and type checker to perform deeper analysis based on an AST representation and to provide diagnostics to the client. +Moreover, integrating Nickel's original modules avoids the need to rewrite these functions, which allows NLS to benefit from upstream improvements with minimal adaptation. +The analysis results are cached internally and used by the individual capability handlers to answer LSP requests. + +The Nickel module contains, apart from the parsing and type checking functions, a group of types related to AST nodes and the linearization process. +While these types currently appear throughout the entire architecture, in the future the language server library will use different abstractions to remove this dependency. +[Section @sec:future-work] lays out a more detailed plan how this will be achieved. + +\bls +```{.plantuml #fig:class-diagram include=assets/class-diagram.plantuml caption="Class Diagram"} +``` +\els + + + ## Illustrative example @@ -183,8 +227,7 @@ let image = "k8s.gcr.io/%{name_}" in ## Linearization The focus of the NLS as presented in this work is to implement a foundational set of LSP features as described in [@sec:capability]. -In order to process these capabilities efficiently as per [@sec:performance], NLS needs to store more information than what is originally present in a Nickel AST (cf. [@sec:nickel-ast]), such as information about references and types.
-While these can be deduced from the AST lazily, it would require the repeated traversal of arbitrarily large tree with an associated cost to performance. +In order to process these capabilities efficiently as per [@sec:performance], NLS needs to store more information than what is originally present in a Nickel AST (cf. [@sec:nickel-ast]), such as information about references these can be deduced from the AST lazily, it would require the repeated traversal of arbitrarily large tree with an associated cost to performance. Therefore as hinted in [@sec:code-analysis], optimization is directed to efficient lookup from a pre-processed report. Since most LSP commands refer to code positions, the intermediate structure must allow efficient lookup of analysis results based on positions. @@ -272,8 +315,7 @@ impl LinearizationState for Completed {} The NLS project aims to present a transferable architecture that can be adapted for future languages. Consequently, NLS faces the challenge of satisfying multiple goals -1. To keep up with the frequent changes to the Nickel language and ensure compatibility at minimal cost, NLS needs to *integrate critical functions* of Nickel's runtime -2. Adaptions to Nickel to accommodate the language server should be minimal not obstruct its development and *maintain performance of the runtime*. + To accommodate these goals NLS comprises three different parts as shown in [@fig:nls-nickel-structure]. @@ -288,44 +330,6 @@ A stub implementation ~ of the `Linearizer` trait is used during normal operation. Since most methods of this implementation are `no-op`s, the compiler should be able to optimize away all `Linearizer` calls in release builds. 
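The interplay between the `Linearizer` trait, the `AnalysisHost` used by NLS and the no-op stub used by the plain interpreter can be illustrated with a minimal, self-contained Rust sketch. Only the overall shape is taken from the text and the entity diagram: a trait with associated `Building` and `Completed` state types, instantiated with `()` for the stub. The method names `add_term` and `complete`, the `type_check` stand-in and the string items are hypothetical simplifications, not Nickel's actual API.

```rust
/// Hypothetical placeholder for a Nickel AST node (the real `Term` is far richer).
#[derive(Debug, Clone)]
pub struct Term {
    pub name: String,
}

/// Marker trait for the states a linearization can be in.
pub trait LinearizationState {}

/// State while the AST is being traversed (items simplified to strings).
#[derive(Debug, Default)]
pub struct Building {
    pub items: Vec<String>,
}
impl LinearizationState for Building {}

/// State after post-processing.
#[derive(Debug, Default, PartialEq)]
pub struct Completed {
    pub items: Vec<String>,
}
impl LinearizationState for Completed {}

// The unit type serves as the zero-sized state of the stub linearizer.
impl LinearizationState for () {}

/// Abstracts how AST nodes are turned into linearization items.
pub trait Linearizer {
    type Building: LinearizationState + Default;
    type Completed: LinearizationState;

    /// Called for every node the type checker visits (hypothetical name).
    fn add_term(&mut self, lin: &mut Self::Building, term: &Term);

    /// Post-processes the building state into the completed state.
    fn complete(self, lin: Self::Building) -> Self::Completed;
}

/// Stub used by the interpreter: every method is a no-op.
pub struct StubHost;

impl Linearizer for StubHost {
    type Building = ();
    type Completed = ();

    fn add_term(&mut self, _lin: &mut Self::Building, _term: &Term) {}
    fn complete(self, _lin: Self::Building) -> Self::Completed {}
}

/// Language-server linearizer recording one item per visited node.
pub struct AnalysisHost;

impl Linearizer for AnalysisHost {
    type Building = Building;
    type Completed = Completed;

    fn add_term(&mut self, lin: &mut Building, term: &Term) {
        lin.items.push(term.name.clone());
    }

    fn complete(self, lin: Building) -> Completed {
        // Real post-processing (sorting, ID remapping) is omitted here.
        Completed { items: lin.items }
    }
}

/// Stand-in for a type checker that is generic over the linearizer,
/// so the same traversal feeds either the stub or the analysis host.
pub fn type_check<L: Linearizer>(mut linearizer: L, terms: &[Term]) -> L::Completed {
    let mut lin = L::Building::default();
    for term in terms {
        linearizer.add_term(&mut lin, term);
    }
    linearizer.complete(lin)
}
```

Because `StubHost` is zero-sized and its methods have empty bodies, a monomorphized `type_check::<StubHost>` leaves no linearization work behind, which is why the optimizer can be expected to remove those calls in release builds.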
- -```{.graphviz #fig:nls-nickel-structure caption="Interaction of Components"} -digraph { - splines="ortho" - node [shape=record] - - - - { - node[style=dashed] - nls [label="NLS"] - host [label=Analysis] - } - - { - node[style=dotted] - nickel [label="Nickel"] - stub [label="Stub interface"] - } - - - als [label="Linearizer | \"] - -// {rank=same; host; stub; } - - - - host -> nickel [label="imports", constraint = false] - - stub -> als [label="implements | T = ()"] - nickel -> als [label="calls"] - nickel -> stub [label="defines", style=dashed, color=grey] - nls -> host [label="defines", style=dashed, color=grey] - host -> als [label="implements | T = Nickel"] - -} -``` - #### Usage Graph From 3b73f1ed449e0c5eda3115f0d53772a457604cd7 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 21 Mar 2022 14:39:25 +0100 Subject: [PATCH 116/142] Update nixpkgs --- flake.lock | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/flake.lock b/flake.lock index df923ed8..28596a2e 100644 --- a/flake.lock +++ b/flake.lock @@ -32,11 +32,11 @@ }, "nixpkgs": { "locked": { - "lastModified": 1640418986, - "narHash": "sha256-a8GGtxn2iL3WAkY5H+4E0s3Q7XJt6bTOvos9qqxT5OQ=", + "lastModified": 1647350163, + "narHash": "sha256-OcMI+PFEHTONthXuEQNddt16Ml7qGvanL3x8QOl2Aao=", "owner": "NixOS", "repo": "nixpkgs", - "rev": "5c37ad87222cfc1ec36d6cd1364514a9efc2f7f2", + "rev": "3eb07eeafb52bcbf02ce800f032f18d666a9498d", "type": "github" }, "original": { From fc8f5ba61ee724d1dc6a56811b321fe6a171ae48 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 21 Mar 2022 14:42:14 +0100 Subject: [PATCH 117/142] Use forked plantuml filter --- flake.nix | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/flake.nix b/flake.nix index 2cd69496..e94bee2f 100644 --- a/flake.nix +++ b/flake.nix @@ -11,6 +11,8 @@ pkgs = nixpkgs.legacyPackages.${system}; writing = writing-tools.packages.${system}; + latex = (writing.latex.override { }); + compile-all = 
pkgs.writeShellScriptBin "compile-thesis" '' pandoc $(cat ./toc.txt) --defaults document.yaml -o "$@" ''; @@ -24,18 +26,20 @@ pandoc prelude/metadata.yaml prelude/prelude.md $1 --defaults document.yaml -o "''${@:2}" ''; + in { devShell = pkgs.mkShell { nativeBuildInputs = [ pkgs.bashInteractive ]; buildInputs = [ - writing.latex + latex (writing.pandoc.override { + inherit latex; filters = [ "pandoc-include-code" + (builtins.fetchurl "https://raw.githubusercontent.com/ysndr/pandocfilters/7bee1ae/examples/plantuml.py") (builtins.fetchurl "https://raw.githubusercontent.com/jgm/pandocfilters/f850b22/examples/graphviz.py") (builtins.fetchurl "https://raw.githubusercontent.com/jgm/pandocfilters/f850b22/examples/tikz.py") - (builtins.fetchurl "https://raw.githubusercontent.com/timofurrer/pandoc-plantuml-filter/master/pandoc_plantuml_filter.py") (pkgs.writeShellScript "pandoc-mermaid" '' MERMAID_BIN=${pkgs.nodePackages.mermaid-cli}/bin/mmdc exec python ${builtins.fetchurl "https://raw.githubusercontent.com/timofurrer/pandoc-mermaid-filter/master/pandoc_mermaid_filter.py"} '') From 2a357d7f79b50f831a4545b66b620b6fb220aa8e Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 21 Mar 2022 14:43:11 +0100 Subject: [PATCH 118/142] Add item resolution activity diagram --- assets/element-lookup.plantuml | 47 ++++++++++++++++++++++++++++++++++ chapter/methodology.md | 6 +++++ 2 files changed, 53 insertions(+) create mode 100644 assets/element-lookup.plantuml diff --git a/assets/element-lookup.plantuml b/assets/element-lookup.plantuml new file mode 100644 index 00000000..d137feec --- /dev/null +++ b/assets/element-lookup.plantuml @@ -0,0 +1,47 @@ +@startuml +start +floating note right + **Inputs:** + linearization + cursor position (file, location) +end note + + +:binary search linearization: + +""item.file == file"" and +""item.position == position""; + +if (element found) then (exact match at ""idx"") + while (linearization[idx+1].position == position) + :set ""idx = idx+1""; + 
endwhile + stop +else (possible match) + repeat + :set ""idx = idx-1""; + if (""linearization[idx]"" in ""file"") then (possible match) + if (position inside span of ""linearization[idx]"") then (match) + stop + else (no match, check next element) + endif + else (no match) + + note right + items are ordered by file + reading an item from a different ""file"" means: + all items from the matching file + have been visited without a match + end note + endif + repeat while (idx > 0) is (hello) + end + floating note right + no match in all preceding items + end note +endif + + + + +@enduml diff --git a/chapter/methodology.md b/chapter/methodology.md index e667b7de..2cdfc519 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -1052,6 +1052,12 @@ impl Completed { } ``` + +\bls +```{.plantuml #fig:element-lookup include="assets/element-lookup.plantuml" caption="Activity diagram of item resolution by position"} +``` +\els + #### Resolving by ID During the building process item IDs are equal to their index in the underlying array which allows for efficient access by ID. From 3a86feb7cc3c311d8d0211bbd2dcdb2e06e05306 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 21 Mar 2022 14:44:03 +0100 Subject: [PATCH 119/142] Amend description of some resolution sections --- chapter/methodology.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 2cdfc519..1fca95dc 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -1061,8 +1061,14 @@ impl Completed { #### Resolving by ID During the building process item IDs are equal to their index in the underlying array which allows for efficient access by ID. -To allow similarly efficient access to nodes with using IDs a `Completed` linearization maintains a mapping of IDs to their corresponding index in the reordered array. 
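The position lookup shown in the activity diagram, together with the ID-to-index mapping of a `Completed` linearization, might look roughly as follows. This is a hypothetical sketch: the `Item` fields (a numeric file ID and half-open offsets `start..end`) and the method names `item_at` and `get_by_id` are assumptions made for illustration, not the actual NLS types.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified linearization item.
#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub id: usize,
    pub file: u32,
    pub start: u32, // offset where the item's span starts
    pub end: u32,   // exclusive end offset
}

pub struct Completed {
    /// Items sorted by (file, start) after post-processing.
    items: Vec<Item>,
    /// Maps an item ID (its index during building) to its index in `items`.
    id_to_index: HashMap<usize, usize>,
}

impl Completed {
    pub fn new(mut items: Vec<Item>) -> Self {
        items.sort_by_key(|item| (item.file, item.start));
        let id_to_index = items
            .iter()
            .enumerate()
            .map(|(index, item)| (item.id, index))
            .collect();
        Completed { items, id_to_index }
    }

    /// Dereferences an ID through the mapping, then reads the item.
    pub fn get_by_id(&self, id: usize) -> Option<&Item> {
        self.id_to_index.get(&id).map(|&index| &self.items[index])
    }

    /// Finds the item at a cursor position, following the activity diagram.
    pub fn item_at(&self, file: u32, pos: u32) -> Option<&Item> {
        match self
            .items
            .binary_search_by_key(&(file, pos), |item| (item.file, item.start))
        {
            // Exact match: advance to the last item starting at `pos`.
            Ok(mut idx) => {
                while idx + 1 < self.items.len()
                    && self.items[idx + 1].file == file
                    && self.items[idx + 1].start == pos
                {
                    idx += 1;
                }
                Some(&self.items[idx])
            }
            // Possible match: walk backwards over preceding items of the same
            // file and return the first whose span contains `pos`.
            Err(insert_at) => self.items[..insert_at]
                .iter()
                .rev()
                .take_while(|item| item.file == file)
                .find(|item| item.start <= pos && pos < item.end),
        }
    }
}
```

Sorting by `(file, start)` is what makes the backwards scan sound: once an item from another file is read, no earlier item can belong to the queried file. The exact-match path is logarithmic, while the fallback scan is linear in the worst case.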
-For instance NLS would represent the example [@lst:nickel-scope-example] as shown in : +To allow similarly efficient dereferencing of node IDs, a `Completed` linearization maintains a mapping of IDs to their corresponding index in the reordered array. + +#### Resolving by scope + +During the construction from the AST, the syntactic scope of each element is eventually known. +This makes it possible to map each `ScopeId` to the list of elements defined in that scope. +As discussed in [@sec:scopes], scopes are lists of scope path elements, where the prefixes correspond to parent scopes. +For instance, NLS would represent the example [@lst:nickel-scope-example] as shown in [@lst:nls-scopes-elements] below. ```{.rust #lst:nls-scopes-elements caption="Items collected for each scope of the example. Simplified representation using concrete values"} /1 -> { Declaration("record") } @@ -1072,14 +1078,8 @@ For instance NLS would represent the example [@lst:nickel-scope-example] as show /1/2 -> { Usage("record") } ``` -A queried ID is first looked up in this mapping which yields an index from which the actual item is read. - -#### Resolving by scope - -During the construction from the AST, the syntactic scope of each element is eventually known. -This allows to map an item's `ScopeId` to a list of elements defined in this scope by parent scopes. -As discussed in [@sec:scopes], scopes are lists of scope path elements, where the prefixes correspond to parent scopes. For any given scope the set of referable nodes is determined by unifying the associated IDs of all prefixes of the given scope, then resolving the IDs to elements. +Concretely, the identifiers in scope of the value `123` in the [example @lst:nls-scopes-elements] are `{Declaration("record"), RecordField("key1"), RecordField("key2") }`{.rust}. The Rust implementation is given in [@lst:nls-resolve-scope] below.
```{.rust #lst:nls-resolve-scope caption="Resolution of all items in scope"} From 56c9afe8d0b79307db8e11667001cd6cda86c571 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 21 Mar 2022 14:44:36 +0100 Subject: [PATCH 120/142] Add landscape support --- prelude/metadata.yaml | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/prelude/metadata.yaml b/prelude/metadata.yaml index e8411f0c..e2455ecc 100644 --- a/prelude/metadata.yaml +++ b/prelude/metadata.yaml @@ -2,4 +2,8 @@ autoSectionLabels: true header-includes: - \AtBeginDocument{\floatplacement{codelisting}{H}} + - | + \usepackage{lscape} + \newcommand{\bls}{\begin{landscape}} + \newcommand{\els}{\end{landscape}} ... From 51164e3773d259b9bf0d488d798949aa310d997d Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 21 Mar 2022 14:48:05 +0100 Subject: [PATCH 121/142] Add meaningful caption to entity diagram --- chapter/methodology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 1fca95dc..d1b1d24e 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -154,7 +154,7 @@ While these types currently appear throughout the entire architecture, in the fu [Section @sec:future-work] lays out a more detailed plan how this will be achieved. 
\bls -```{.plantuml #fig:class-diagram include=assets/class-diagram.plantuml caption="Class Diagram"} +```{.plantuml #fig:class-diagram include=assets/class-diagram.plantuml caption="Entity Diagram showing the architecture of NLS, explicit dependency arrows omitted for legibility."} ``` \els From 034e1edc13a0fb2db17923793600e237ade0e807 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 21 Mar 2022 14:53:36 +0100 Subject: [PATCH 122/142] Explicitly ignore plantuml outputs --- .gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/.gitignore b/.gitignore index ab7be0a3..ff72d780 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,4 @@ **/*.pdf !previews/**/*.pdf +plantuml-images/ .direnv/ From c94c1fd70bf83f95bcee5881eb0b2e4fc2630319 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 21 Mar 2022 16:57:28 +0100 Subject: [PATCH 123/142] Edit Server implementation section --- chapter/methodology.md | 33 ++++++++++++++++++++------------- 1 file changed, 20 insertions(+), 13 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index d1b1d24e..b3269eed 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -3,7 +3,7 @@ This chapter guides through the components of the Nickel Language Server (NLS) as well as the implementation details of the source code analysis and information querying. Aiming for an abstract interface, NLS defines its own data structure underpinning all higher level LSP interactions. [Section @sec:linearization] will introduce this `linearization` data structure and explain how NLS bridges the gap between the explicitly handled Nickel AST towards the abstract linearization. -Finally, the implementation of current LSP features is discussed in [@sec:lsp-server]. +Finally, the implementation of current LSP features is discussed in [@sec:lsp-server-implementation]. 
## Key Objectives @@ -160,7 +160,6 @@ While these types currently appear throughout the entire architecture, in the fu - ## Illustrative example The example [@lst:nickel-complete-example] shows an illustrative high level configuration of a server. @@ -228,7 +227,7 @@ let image = "k8s.gcr.io/%{name_}" in The focus of the NLS as presented in this work is to implement a foundational set of LSP features as described in [@sec:capability]. In order to process these capabilities efficiently as per [@sec:performance], NLS needs to store more information than what is originally present in a Nickel AST (cf. [@sec:nickel-ast]), such as information about references these can be deduced from the AST lazily, it would require the repeated traversal of arbitrarily large tree with an associated cost to performance. -Therefore as hinted in [@sec:code-analysis], optimization is directed to efficient lookup from a pre-processed report. +Therefore, as hinted in [@sec:code-analysis], optimization is directed to efficient lookup from a pre-processed report. Since most LSP commands refer to code positions, the intermediate structure must allow efficient lookup of analysis results based on positions. @@ -1103,23 +1102,31 @@ impl Completed { } ``` -## LSP Server +## LSP Server Implementation + +This section describes how NLS uses the linearization described in [@sec:linearization] to implement the set of features proposed in [@sec:capability]. + +### Server Interface + +As mentioned in [@sec:programming-language], the Rust language ecosystem maintains several projects supporting the development of LSP-compliant servers. +NLS is based on the `lsp-server` crate [@lsp-server-crate], a contribution by the Rust Analyzer project, which promises long-term support and compliance with the latest LSP specification. -[Section @sec:commands-and-notifications] introduced the concept of capabilities in the context of the language server protocol.
-This section describes how NSL uses the linearization described in [@sec:linearization] to implement a comprehensive set of features. -NLS implements the most commonly compared capabilities *Code completion*, *Hover* *Jump to def*, *Find references*, *Workspace symbols* and *Diagnostics*. +Referring to [@fig:class-diagram], the `Server` module represents the main server binary. +It integrates the analysis steps with Nickel's parsing and type-checking routines. +The resulting analysis is used to serve LSP requests. ### Diagnostics and Caching NLS instructs the LSP client to notify the server once the user opens or modifies a file. -Each notification contains the complete source code of the file as well as its location. -NLS subsequently parses and type-checks the file using Nickel's libraries. -Since Nickel deals with error reporting already, NLS converts any error generated in these processes into [Diagnostic](https://microsoft.github.io/language-server-protocol/specifications/specification-current/#diagnostic) items and sends them to the client as server notifications. -Nickel errors provide detailed information about location of the issue as well as possible details which NLS can include in the Diagnostic items. +An update notification contains the complete source code of the file as well as its location. +Upon notification, NLS first attempts to create an AST from the source code contained in the request payload by passing it to Nickel's parser module. +NLS then instantiates a new linearizer and passes it to Nickel's type-checking functions, which has two benefits. +First, type checking both provides type diagnostics and performs a complete tree traversal, yielding a linearization of the entire code in the absence of errors. +Moreover, the types inferred during type checking can be used to resolve the element types of the linearization items.
+Errors arising in either step are reported to the client as [Diagnostic](https://microsoft.github.io/language-server-protocol/specifications/specification-current/#diagnostic) items, including detailed information about the location and further details provided by the Nickel infrastructure. As discussed in [@sec:linearization] and [@sec:resolving-elements] the type-checking yields a `Completed` linearization which implements crucial methods to resolve elements. -NLS will cache the linearization for each processed file. -This way it can provide its LSP functions while a file is being edited. +NLS will cache the linearization for each processed file so that it can provide its LSP functions even while a file is being edited, i.e., in a possibly invalid state. ### Commands From ee2bbdd44a524e46ab441f0ea117738025e8ef95 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 21 Mar 2022 17:01:16 +0100 Subject: [PATCH 124/142] Make sentence lighter --- chapter/methodology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index b3269eed..8f9c23c5 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -2,7 +2,7 @@ This chapter guides through the components of the Nickel Language Server (NLS) as well as the implementation details of the source code analysis and information querying. Aiming for an abstract interface, NLS defines its own data structure underpinning all higher level LSP interactions. -[Section @sec:linearization] will introduce this `linearization` data structure and explain how NLS bridges the gap between the explicitly handled Nickel AST towards the abstract linearization. +[Section @sec:linearization] will introduce this linearization data structure and explain how NLS bridges the gap from the explicitly handled Nickel AST. Finally, the implementation of current LSP features is discussed in [@sec:lsp-server-implementation].
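The per-file caching described under *Diagnostics and Caching* can be sketched as a map from document URI to the last completed linearization. The map is only overwritten when analysis succeeds, so a previously completed linearization keeps serving requests while the file is in an invalid state. The types and method names are hypothetical stand-ins, and keeping the old entry when analysis fails is an assumption about the intended behaviour, not a description of the actual NLS code.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a completed linearization.
#[derive(Debug, Clone, PartialEq)]
pub struct Completed {
    pub item_count: usize,
}

/// Hypothetical stand-in for an LSP diagnostic.
#[derive(Debug, Clone, PartialEq)]
pub struct Diagnostic {
    pub message: String,
}

#[derive(Debug, Default)]
pub struct LinearizationCache {
    by_file: HashMap<String, Completed>,
}

impl LinearizationCache {
    /// Called after parsing and type checking an opened or modified file.
    /// Returns the diagnostics to publish for `uri`.
    pub fn update(
        &mut self,
        uri: &str,
        analysis: Result<Completed, Vec<Diagnostic>>,
    ) -> Vec<Diagnostic> {
        match analysis {
            Ok(linearization) => {
                self.by_file.insert(uri.to_string(), linearization);
                // An empty list clears previously published diagnostics.
                Vec::new()
            }
            // Keep the previous linearization so requests can still be answered.
            Err(diagnostics) => diagnostics,
        }
    }

    /// Used by the capability handlers to answer requests for `uri`.
    pub fn get(&self, uri: &str) -> Option<&Completed> {
        self.by_file.get(uri)
    }
}
```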
## Key Objectives From 613360c686dcbe6256b4389e745aaf64eebb3379 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 21 Mar 2022 18:06:04 +0100 Subject: [PATCH 125/142] Include StubHost in diagram --- assets/class-diagram.plantuml | 26 +++++++++++++++++++++++--- 1 file changed, 23 insertions(+), 3 deletions(-) diff --git a/assets/class-diagram.plantuml b/assets/class-diagram.plantuml index 00ebe638..f4f77368 100644 --- a/assets/class-diagram.plantuml +++ b/assets/class-diagram.plantuml @@ -60,6 +60,17 @@ package "Language Server Abstraction" { + scope: Scope, + meta: Option, } + + + class StubHost << (S,steelblue) >> { + } + + class "Linearizer for StubHost" as impl_linearizer_for_StubHost << (I,lime) >> { + -- Associated Types -- + type Building = (); + type Completed = (); + } + } package "LSP Implementations" { @@ -91,7 +102,6 @@ package nls { ~ receive(Request) ~ reply(Response) } - note right of Server: asd class AnalysisHost << (S,steelblue) >> { - env: Environment, @@ -101,7 +111,6 @@ package nls { - let_binding: Option, - access: Option>, - {field} record_fields: Option<(ID, Vec<(ID, Ident)>)>, - } class "Linearizer for AnalysisHost" as impl_linearizer_for_AnalysisHost << (I,lime) >> { @@ -136,11 +145,18 @@ package Nickel { type_check(S, Term) } + class "Nickel" as Nickel_bin << (S,steelblue) >> { + + } + } AnalysisHost ..|> Linearizer impl_linearizer_for_AnalysisHost . (AnalysisHost, Linearizer) : "Implementation" +StubHost ..|> Linearizer +impl_linearizer_for_StubHost . 
(StubHost, Linearizer) : "Implementation" + ' impl_linearizer_for_AnalysisHost ..> Building ' impl_linearizer_for_AnalysisHost ..> Completed @@ -178,7 +194,7 @@ Scope o--> ScopeId Server o--> Completed Server ..> AnalysisHost Server ..> TypeChecker -Server ..> Parser +Server ..> Parser : "S = AnalysisHost" Server ..> GoToRef Server ..> GoToDef @@ -193,6 +209,10 @@ Symbols ..|>"R = SymbolsParams" Handler Hover ..|> "R = HoverParams" Handler TypeChecker *..|> Linearizer + +Nickel_bin ..> StubHost +Nickel_bin ..Parser : "S = StubHost" +Nickel_bin ..> TypeChecker ' Parser ..> Term From 8c087c4f135b9c1448df386f4aed47446162d07c Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Mon, 21 Mar 2022 19:02:30 +0100 Subject: [PATCH 126/142] Fix label in element lookup --- assets/element-lookup.plantuml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/assets/element-lookup.plantuml b/assets/element-lookup.plantuml index d137feec..353190e9 100644 --- a/assets/element-lookup.plantuml +++ b/assets/element-lookup.plantuml @@ -34,7 +34,7 @@ else (possible match) have been visited without a match end note endif - repeat while (idx > 0) is (hello) + repeat while (idx > 0) is (check next preceding element) end floating note right no match in all preceding items From 33177b63baa8eef5f978fd68dfd110784f85f82d Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 23 Mar 2022 11:35:07 +0100 Subject: [PATCH 127/142] Fix scope example layout --- chapter/methodology.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 8f9c23c5..83d3f237 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -432,7 +432,7 @@ A `ScopeId` in turn is a "scope path", a list of path elements where the prefix ```{.nickel #lst:nickel-scope-example caption="Explicit display of Nickel scopes"} ----------------------------------------------+/1 +---------------------------------------------+ /1 | let record | = { 
-------------------------------+ /1/1 | @@ -440,11 +440,10 @@ let record | key1 = "value", -------- /1/1/1 | | key2 = 123, ------------ /1/1/2 | | | | - }----------------------------------+ | - in record ------------------------/1/2 | + } ---------------------------------+ | + in record -------------------------- /1/2 | | ---------------------------------------------+ -``` Additionally, to keep track of the variables in scope, and iteratively build a usage graph, NLS keeps track of the latest definition of each variable name and which `Declaration` node it refers to. From 46fd99aad0b2d9e0ff8398862be3860a0672e7a0 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 23 Mar 2022 12:11:11 +0100 Subject: [PATCH 128/142] Fix Code Block --- chapter/methodology.md | 1 + 1 file changed, 1 insertion(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index 83d3f237..014d4437 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -444,6 +444,7 @@ let record | in record -------------------------- /1/2 | | ---------------------------------------------+ +``` Additionally, to keep track of the variables in scope, and iteratively build a usage graph, NLS keeps track of the latest definition of each variable name and which `Declaration` node it refers to. From 0e74a02c40b58b5803e6938e31cff9ebe4100243 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 23 Mar 2022 16:06:15 +0100 Subject: [PATCH 129/142] Fix typos Co-authored-by: Martin Monperrus --- chapter/methodology.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 014d4437..dcdd39d0 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -1,4 +1,4 @@ -# Design implementation of NLS +# Design and Implementation This chapter guides through the components of the Nickel Language Server (NLS) as well as the implementation details of the source code analysis and information querying. 
Aiming for an abstract interface, NLS defines its own data structure underpinning all higher level LSP interactions. @@ -119,7 +119,7 @@ Similar to the file processing argument in [@sec:file-pressng], it is assumed th ## High-Level Architecture -This section describes The high-level architecture of NLS. +This section describes the high-level architecture of NLS. The entity diagram depicted in [@fig:class-diagram] shows the main elements at play. NLS needs to meet the flexibility and generalizability requirements as discussed in [@sec:flexibility, @sec:generalizability]. @@ -165,7 +165,7 @@ While these types currently appear throughout the entire architecture, in the fu The example [@lst:nickel-complete-example] shows an illustrative high level configuration of a server. Using Nickel, this file would be used to define the schema of the configuration format of another program. Evaluation and validation is done in the context of Nickel, after which the evaluated structure is translated into a more common (but less expressive) format such as YAML or JSON. -Here, the chema for a configuration of a Kubernetes-like [@kubernetes] tool is defined using contracts, making exemplary use of variables and functions. +Here, the schema for a configuration of a Kubernetes-like [@kubernetes] tool is defined using contracts, making exemplary use of variables and functions. Specifically, it describes a way to provision named containers. The user is able to specify container images and opened ports, as well as define metadata of the deployment. The configuration is constrained by the `NobernetesConfig` contract. 
From dc9ab2ced7148756f59c1d9a52d4bdb883932b63 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Fri, 25 Mar 2022 13:14:08 +0100 Subject: [PATCH 130/142] Add descriptive sentence for each capability --- chapter/methodology.md | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index dcdd39d0..4ec59ed9 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -36,12 +36,18 @@ This concept mitigates queuing but can lead to similarly bad user experience as The second objective is to provide an LSP server that offers the most common LSP features as identified by [@langserver-org]. Concretely, these capabilities are: -1. Code completion, -2. Hover information, -3. Jump to definition, -4. Find references, -5. Workspace symbols, +1. Code completion + Suggest identifiers, methods or values at the cursor position. +2. Hover information + Present additional information about an item under the cursor, e.g., its type, contracts and documentation. +3. Jump to definition + Find and jump to the definition of a local variable or identifier. +4. Find references + List all usages of a defined variable. +5. Workspace symbols + List all variables in a workspace or document. 6. Diagnostics + Analyze source code, i.e., parse and type-check it, and notify the LSP client if errors arise. For the work on NLS these six capabilities were considered as the goal for a minimal viable product.
From d9da6f8fd37458527a4f8b6d54cf6f46723330c6 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Fri, 25 Mar 2022 13:14:49 +0100 Subject: [PATCH 131/142] Rewrite introductive sentence for traits Co-authored-by: Martin Monperrus --- chapter/methodology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 4ec59ed9..2f982b6d 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -86,7 +86,7 @@ For instance the largest and most advaced LSP implementation in Rust -- the Rust Additionally, lots of smaller languages [@gluon, @slint, @mojom] implement Language Servers in Rust. Rust appears to be a viable choice even for languages that are not originally implemented in Rust, such as Nix [@nix, @rninx-lsp]. -In Rust the concept of `traits` [@traits] is fundamental, for the following reasons: +In Rust `traits` [@traits] are the fundamental concept used to abstract methods from the underlying data. Traits are definitions of shared behavior. Similar to interfaces in other languages, a trait defines a set of methods. One implements a trait for a certain type, by defining the behavior in the context of the type. From 8d4d23004366cccbe8819420c447cd1f574bfeef Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sat, 26 Mar 2022 14:42:13 +0100 Subject: [PATCH 132/142] More thorough LinearizationItem introduction --- chapter/methodology.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index 2f982b6d..cdaa1518 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -139,6 +139,9 @@ The architecture of NLS reflects these goals, using conceptual groups. The core group labeled "Language Server" contains modules concerning both the source code analysis and LSP interaction. The analysis is based on an internal representation of source code called `Linearization` which can be in one of two states, namely `Building` or `Completed`.
Either state manages an array of items (`LinearizationItems`) that are derived from AST nodes as well as various metadata facilitating the actions related to the state. +The `LinearizationItem` is an abstract representation of code units represented by AST nodes or generated to support an AST derived item. +Items associate a certain span with its type, metadata, scope and a unique id, making it referable to. +Additionally, `LinearizationItem`s are assigned a `TermKind` which distinguishes different functions of the item in the context of the linearization. The building of the linearization is abstracted in the `Linearizer` trait. Implementors of this trait convert AST nodes to linearization items and append said items to a shared linearization in the building state. Finally, linearizers define how to post-process and complete the linearization. From 8d0fa6dfece2c4a2d916f9ea070eb20b812b463a Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sat, 26 Mar 2022 14:49:42 +0100 Subject: [PATCH 133/142] Split long sentence Co-authored-by: Martin Monperrus --- chapter/methodology.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index cdaa1518..0ffc50d7 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -235,7 +235,8 @@ let image = "k8s.gcr.io/%{name_}" in ## Linearization The focus of the NLS as presented in this work is to implement a foundational set of LSP features as described in [@sec:capability]. -In order to process these capabilities efficiently as per [@sec:performance], NLS needs to store more information than what is originally present in a Nickel AST (cf. [@sec:nickel-ast]), such as information about references these can be deduced from the AST lazily, it would require the repeated traversal of arbitrarily large tree with an associated cost to performance. 
+In order to process these capabilities efficiently as per [@sec:performance], NLS needs to store more information than what is originally present in a Nickel AST (cf. [@sec:nickel-ast]), such as information about references. +Although this can be deduced from the AST lazily, working with Nickel's tree representation is inefficient, as it is not optimized for random access and search operations. Therefore, as hinted in [@sec:code-analysis], optimization is directed to efficient lookup from a pre-processed report. Since most LSP commands refer to code positions, the intermediate structure must allow efficient lookup of analysis results based on positions. From 7dcf653acfde3684c6e2f7403211936a3ce4b845 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sat, 26 Mar 2022 15:14:01 +0100 Subject: [PATCH 134/142] Clarify LinarizationItem role Co-authored-by: Martin Monperrus --- chapter/methodology.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 0ffc50d7..258f9443 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -267,8 +267,8 @@ However, the exact structure of that array differs as an effect of the post-proc `LinearizationItem`s maintain the position of their AST counterpart as well as its type. Unlike in the AST ([sec:meta-information]), *metadata* is directly associated with the element. -Further deviating from the AST representation, the *type* of the node and its *kind* are tracked separately. -The latter is used to represent a usage graph on top of the linear structure. +Further deviating from the AST representation, both the *type* of the node and its references to other items are encoded explicitly in the `LinearizationType`. +The references form an implicit usage graph on top of the linear structure. It distinguishes between declarations (`let` bindings, function parameters, records) and variable usages. 
Any other kind of structure, for instance, primitive values (Strings, numbers, boolean, enumerations), is recorded as `Structure`. From 9ecc3429853ff9bc5eba21b901f47b9e4973a5bb Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sat, 26 Mar 2022 15:53:03 +0100 Subject: [PATCH 135/142] Add more detail about generic linearizer interface usage --- chapter/methodology.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 258f9443..455e69b7 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -274,7 +274,13 @@ Any other kind of structure, for instance, primitive values (Strings, numbers, b To separate the phases of the elaboration of the linearization in a type-safe way, the implementation is based on type-states[@typestate]. Type-states were chosen over an enumeration based approach for the additional flexibility they provide to build a generic interface. -Thanks to the generic interface, the adaptions to Nickel to integrate NLS are expected to have almost no influence on the runtime performance of the language in an optimized build. +First, type-states allow to implement separate utility methods for either state and enforce specific states on the type level. +Second the `Linearization` struct provides a common context for all states like an enumeration, yet statically determining the Variant. +Additionally, the `Linearizer` trait can be implemented for arbitrary `LinearizationState`s. +This allows other LSP implementations to base on the same core while providing, for example, more information during the building phase. +The unit type `()` is a so called "zero sized type" [@zero-sized-type), it represents the absence of a value. +NLS provides a `Linearizer` implementation based on unit types and empty method definitions. +As a result, the memory footprint of this linearizer is effectively zero and most method calls will be removed as part of compile time optimizations. 
NLS implements separate type-states for the two phases of the linearization: `Building` and `Completed`. From 2b9ea9bda67e9c74dd6ec511f5a2718b9f26071b Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sat, 26 Mar 2022 15:54:51 +0100 Subject: [PATCH 136/142] Refer to objectives section Co-authored-by: Martin Monperrus --- chapter/methodology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 455e69b7..5143a45e 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -327,7 +327,7 @@ impl LinearizationState for Completed {} ### Transfer from AST -The NLS project aims to present a transferable architecture that can be adapted for future languages. +The NLS project aims to present a transferable architecture that can be adapted for future languages as elaborated in [@sec:generalizability]. Consequently, NLS faces the challenge of satisfying multiple goals From 8fb36db39551b130a1c3acf3d7cf52330ffaba43 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sat, 26 Mar 2022 15:59:37 +0100 Subject: [PATCH 137/142] Apply suggestions from code review Minor changes of wording and elaboration Co-authored-by: Martin Monperrus --- chapter/methodology.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index 5143a45e..1323dcbb 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -282,7 +282,7 @@ The unit type `()` is a so called "zero sized type" [@zero-sized-type), it repre NLS provides a `Linearizer` implementation based on unit types and empty method definitions. As a result, the memory footprint of this linearizer is effectively zero and most method calls will be removed as part of compile time optimizations. -NLS implements separate type-states for the two phases of the linearization: `Building` and `Completed`. 
+NLS defines two type-state variants according to the two phases of the linearization: `Building` and `Completed`. building phase: @@ -342,8 +342,9 @@ Nickel's type checking implementation ~ was adapted to pass AST nodes to the `Linearizer`. Modifications to Nickel are minimal, comprising only few additional function calls and a slightly extended argument list. A stub implementation - ~ of the `Linearizer` trait is used during normal operation. - Since most methods of this implementation are `no-op`s, the compiler should be able to optimize away all `Linearizer` calls in release builds. + ~ of the `Linearizer` trait is used during normal operation of the interpreter. + Since most methods of this implementation are `no-op`s, the compiler is expected to be able to remove most `Linearizer` related method calls in optimized release builds. + This promises minimal runtime impact incurred by the integration of LSP APIs. #### Usage Graph @@ -428,11 +429,12 @@ The Nickel language implements lexical scopes with name shadowing. 1. A name can only be referred to after it has been defined 2. A name can be redefined locally -An AST inherently supports this logic. +An AST supports this concept due to its hierarchical structure. A variable reference always refers to the closest parent node defining the name and scopes are naturally separated using branching. Each branch of a node represents a sub-scope of its parent, i.e., new declarations made in one branch are not visible in the other. -When eliminating the tree structure, scopes have to be maintained in order to provide auto-completion of identifiers and list symbol names based on their scope as context. +When eliminating the tree structure, scopes have to be maintained. +This is to provide LSP capabilities such as auto-completion [@sec:auto-completion] of identifiers and listing symbol names [@sec:document-symbols], both of which require the item's scope as context.
Since the bare linear data structure cannot be used to deduce a scope, related metadata has to be tracked separately. The language server maintains a register for identifiers defined in every scope. This register allows NLS to resolve possible completion targets as detailed in [@sec:resolving-by-scope]. @@ -653,7 +655,7 @@ For either the linearizer generates `Declaration` items and updates its name reg However, type information is available for name bindings only, meaning pattern matches remain untyped. The same process applies for argument names in function declarations. -Due to argument currying[^https://en.wikipedia.org/wiki/Currying], NLS linearizes only a single argument/pattern at a time. +Due to argument currying [@currying], NLS linearizes only a single argument/pattern at a time. ##### Records From e67230da6b19f7b0749c2b00d09aea3f93f668dd Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sat, 26 Mar 2022 16:00:33 +0100 Subject: [PATCH 138/142] Add phases take-away Co-authored-by: Martin Monperrus --- chapter/methodology.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/chapter/methodology.md b/chapter/methodology.md index 1323dcbb..b6b0d2f4 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -297,6 +297,9 @@ post-processing phase: ~ Additionally, missing edges in the usage graph have been created and the types of items are fully resolved in a completed linearization. Type definitions of the `Linearization` as well as its type-states `Building` and `Completed` are listed in [@lst:nickel-definition-lineatization;@lst:nls-definition-building-type;@lst:nls-definition-completed-type]. +As hinted above, the `Linearization` struct acts as the overarching context only. +Therefore it is similar to an enumeration where the concrete variants are unknown but statically determined at compile time. 
+The `LinearizationState`s can be implemented according to the needs of the Linearizer implementation of the LSP server built on top of the core module. Note that only the former is defined as part of the Nickel libraries, the latter are specific implementations for NLS. From e57ee66a14e6e5b35b02d3d5c5fc6a2de1c5428d Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sat, 26 Mar 2022 16:01:29 +0100 Subject: [PATCH 139/142] Remind core requirements for transfer --- chapter/methodology.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index b6b0d2f4..6535a9f2 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -333,7 +333,10 @@ impl LinearizationState for Completed {} The NLS project aims to present a transferable architecture that can be adapted for future languages as elaborated in [@sec:generalizability]. Consequently, NLS faces the challenge of satisfying multiple goals - +1. The core of the server should be language independent. +2. Language dependent features should serve the core abstractions. +3. To keep up with Nickel's rapid development while ensuring compatibility at minimal cost, critical functions should integrate with the Nickel language implementation. +4. Adaptations to Nickel should be minimal so as not to obstruct its development and runtime performance. To accommodate these goals NLS comprises three different parts as shown in [@fig:nls-nickel-structure].
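Patches 131 to 139 describe a type-state design. A common `Linearization` context is parameterized by a `LinearizationState`, with distinct `Building` and `Completed` states, a `Linearizer` trait, and a stub linearizer built on the zero-sized unit type. The sketch below illustrates that design under stated assumptions. It is a simplified, hypothetical model, not the actual Nickel or NLS definitions. In particular, the `Item` fields, the `&'static str` stand-in for `TermKind`, and all method signatures are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Marker trait for the phases a linearization can be in.
pub trait LinearizationState {}

/// Shared context for all phases; the phase is a type parameter, so it is
/// fixed at compile time, like an enum variant known statically.
#[derive(Debug, Default)]
pub struct Linearization<S: LinearizationState> {
    pub state: S,
}

/// The unit type is zero-sized; the interpreter's stub uses it for both phases.
impl LinearizationState for () {}

/// Simplified item (assumption): `kind` stands in for `TermKind`.
#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub id: usize,
    pub span: (usize, usize),
    pub kind: &'static str,
}

#[derive(Debug, Default)]
pub struct Building {
    pub items: Vec<Item>,
}
impl LinearizationState for Building {}

#[derive(Debug, Default)]
pub struct Completed {
    pub items: Vec<Item>,
    /// Maps an item's `id` to its index after sorting.
    pub id_to_index: HashMap<usize, usize>,
}
impl LinearizationState for Completed {}

impl Linearization<Building> {
    /// Only callable while building; the new `id` equals the item's index.
    pub fn push(&mut self, span: (usize, usize), kind: &'static str) -> usize {
        let id = self.state.items.len();
        self.state.items.push(Item { id, span, kind });
        id
    }
}

/// Hypothetical, reduced version of the `Linearizer` interface.
pub trait Linearizer {
    type Building: LinearizationState + Default;
    type Completed: LinearizationState + Default;

    fn add_term(&mut self, lin: &mut Linearization<Self::Building>, span: (usize, usize), kind: &'static str);
    fn complete(self, lin: Linearization<Self::Building>) -> Linearization<Self::Completed>;
}

/// Stub for normal interpretation: zero-sized states and empty methods.
pub struct StubLinearizer;

impl Linearizer for StubLinearizer {
    type Building = ();
    type Completed = ();

    fn add_term(&mut self, _lin: &mut Linearization<()>, _span: (usize, usize), _kind: &'static str) {}

    fn complete(self, _lin: Linearization<()>) -> Linearization<()> {
        Linearization { state: () }
    }
}

/// NLS-style linearizer that records items and sorts them on completion.
pub struct RecordingLinearizer;

impl Linearizer for RecordingLinearizer {
    type Building = Building;
    type Completed = Completed;

    fn add_term(&mut self, lin: &mut Linearization<Building>, span: (usize, usize), kind: &'static str) {
        lin.push(span, kind);
    }

    fn complete(self, lin: Linearization<Building>) -> Linearization<Completed> {
        let mut items = lin.state.items;
        // Stable sort by start offset: items starting at the same offset keep
        // their depth-first insertion order (parents before children).
        items.sort_by_key(|item| item.span.0);
        let id_to_index = items
            .iter()
            .enumerate()
            .map(|(index, item)| (item.id, index))
            .collect();
        Linearization { state: Completed { items, id_to_index } }
    }
}
```

Because `push` exists only on `Linearization<Building>`, calling it on a completed linearization is a compile-time error rather than a runtime check. The stub relies on `Linearization<()>` occupying zero bytes and on its empty methods. Together, these make the stub expected to add little or no overhead once the calls are inlined.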
From 3b65ce02b28df417cbcd84aa737a53f5258d513c Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Wed, 23 Mar 2022 16:24:30 +0100 Subject: [PATCH 140/142] Escape pdf url (cherry picked from commit d2136c167f13f078425cf5e6889bb3c158878941) --- .github/workflows/preview.yml | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/.github/workflows/preview.yml b/.github/workflows/preview.yml index 5fbaf6fb..9491fdc3 100644 --- a/.github/workflows/preview.yml +++ b/.github/workflows/preview.yml @@ -45,7 +45,7 @@ jobs: branch=${{ github.head_ref }} input="${branch%%#*}" pdf_output="${{ env.PREVIEW_PATH }}/${branch}.pdf" - + pdf_output_url=$(python -c "import urllib.parse; import sys; print(urllib.parse.quote(sys.argv[1]))" "$pdf_output") mode=$([[ "$input" == "chapter/"* ]] && echo "chapter" || echo "all") tag="${branch#*#}" @@ -54,6 +54,7 @@ jobs: echo "::set-output name=ref::$branch" echo "::set-output name=input::$input" echo "::set-output name=pdf_output::$pdf_output" + echo "::set-output name=pdf_output_url::$pdf_output_url" echo "::set-output name=mode::$mode" echo "::set-output name=tag::$tag" @@ -166,4 +167,4 @@ jobs: status: ${{ job.status }} auto_inactive: false deployment_id: ${{ steps.deployment.outputs.deployment_id }} - env_url: "https://github.com/${{github.repository}}/tree/${{env.PREVIEW_REF}}/${{ steps.config.outputs.pdf_output }}" + env_url: "https://github.com/${{github.repository}}/tree/${{env.PREVIEW_REF}}/${{ steps.config.outputs.pdf_output_url }}" From c1eb04e61e6efd8c53a4e3e839d7a9b647154673 Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Fri, 25 Mar 2022 13:12:23 +0100 Subject: [PATCH 141/142] Add sequence diagrams for analysis --- assets/eager-analysis.plantuml | 55 ++++++++++++++++++++++++++++++++++ assets/lazy-analysis.plantuml | 44 +++++++++++++++++++++++++++ chapter/methodology.md | 13 +++++++- 3 files changed, 111 insertions(+), 1 deletion(-) create mode 100644 assets/eager-analysis.plantuml create mode 100644 
assets/lazy-analysis.plantuml diff --git a/assets/eager-analysis.plantuml b/assets/eager-analysis.plantuml new file mode 100644 index 00000000..ce950757 --- /dev/null +++ b/assets/eager-analysis.plantuml @@ -0,0 +1,55 @@ +@startuml + +!pragma teoz true + +actor User +participant NLS +entity Linearization +database Cache +participant Handler + +group Ahead of time analysis + +[o-> NLS : File update + +activate NLS + +NLS -> Linearization +activate Linearization +... +Linearization -> NLS +deactivate Linearization + +NLS -> Cache: Store linearization for File +activate Cache +Cache -> NLS +deactivate Cache + +NLS ->o? +deactivate NLS + +end + +User -> NLS: LSP Request +activate NLS + +NLS -> Cache: lookup linearization +activate Cache + +Cache -> NLS +deactivate Cache + +NLS -> Handler: Pass request and linearization +activate Handler +deactivate NLS +note right of Handler + Performs lookup to + immutable linearization. +end note +Handler -> User: Send LSP Response + +deactivate Handler + + + +@enduml diff --git a/assets/lazy-analysis.plantuml b/assets/lazy-analysis.plantuml new file mode 100644 index 00000000..84132a52 --- /dev/null +++ b/assets/lazy-analysis.plantuml @@ -0,0 +1,44 @@ +@startuml + +actor User +participant NLS +' participant Cache + +User -> NLS: LSP Request +activate NLS + +' NLS -> Cache: Request cached? + +' alt cache found + +' Cache --> NLS: Cached Response + +' else cache not found + +' Cache --> NLS: Cache Failure + +NLS -> NLS: Prepare Context + +{start} NLS -> Analysis: Pass Context +activate Analysis + +... 
+ +note right of Analysis + Possible analysis: + - type checking + - finding references + - find definition +end note +{end} Analysis -> NLS: Respond Analysis +deactivate Analysis + +NLS -> NLS: Prepare LSP Response + +' NLS -> Cache: Cache Response + +' end +NLS -> User: Send Response +deactivate NLS + +@enduml diff --git a/chapter/methodology.md b/chapter/methodology.md index 6535a9f2..c9a6d509 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -121,7 +121,18 @@ Incremental parsing, type-checking and analysis can still be implemented as a se Code analysis approaches as introduced in [@sec:considerable-dimensions] can have both *lazy* and *eager* qualities. Lazy solutions are generally more compatible with an incremental processing model, since these aim to minimizing the change induced computation. NLS prioritizes to optimize for efficient queries to a pre-processed data model. -Similar to the file processing argument in [@sec:file-pressng], it is assumed that Nickel project's size allows for efficient enoigh eager analysis prioritizing a more straight forward implementation over optimized performance. +Similar to the file processing argument in [@sec:file-pressng], it is assumed that Nickel project's size allows for efficient enough eager analysis prioritizing a more straight forward implementation over optimized performance. + +An example workflow of both lazy and eager processing is examplified in the sequence diagrams [@fig:nls-lazy-processing-seq] and [@fig:nls-eager-processing-seq] respectively. +As mentioned in the previous paragraph, it is assumed that the performance gains of direct lookup after an "Ahead of time analysis", outperform the lazy analysis in terms of responsiveness. +At the same time the initial analysis is expected to complete in reasonably short time for typical Nickel workflows. 
+ +```{.plantuml #fig:nls-lazy-processing-seq include=assets/lazy-analysis.plantuml caption="Sequence diagram depicting lazy handling of LSP requests."} +``` + +```{.plantuml #fig:nls-eager-processing-seq include=assets/eager-analysis.plantuml caption="Sequence diagram depicting eager analysis and handling of LSP requests."} +``` + ## High-Level Architecture From ad17e94d9f65838016fb18502a209744d5fe1faf Mon Sep 17 00:00:00 2001 From: Yannik Sander Date: Sun, 17 Jul 2022 13:54:19 +0200 Subject: [PATCH 142/142] Fix various typos --- chapter/methodology.md | 41 ++++++++++++++++++++--------------------- 1 file changed, 20 insertions(+), 21 deletions(-) diff --git a/chapter/methodology.md b/chapter/methodology.md index c9a6d509..2c956fec 100644 --- a/chapter/methodology.md +++ b/chapter/methodology.md @@ -13,7 +13,7 @@ The following points are considered key objectives of this thesis implemented in The usefulness of a language server correlates with its performance. It may cause stutters in the editor, or prompt users to wait for responses when upon issuing LSP commands. -Different studies suggest that interruptions are detrimentatl to programmers productivity [@interruption-1, @interruption-2]. The more often and longer a task is interrupted the higher the frustration. +Different studies suggest that interruptions are detrimental to programmers' productivity [@interruption-1; @interruption-2]. The more often and the longer a task is interrupted, the higher the frustration. Hence, as called for in RQ.1 (cf. [@sec:research-questions]), a main criterion for the language server is its performance. Speaking of language servers there are two tasks that require processing, and could potentially cause interruptions. @@ -28,7 +28,7 @@ Moreover, the order of requests has to be maintained. Since many requests are issued implicitly by the editor, e.g., hover requests, there is a risk of request queuing which could delay the processing of explicit commands. 
It is therefore important to provide nearly instantaneous replies to requests. -It is to mention that the LSP defines "long running" requests, that may run in the background.
It is therefore important to provide nearly instantaneous replies to requests. -It is to mention that the LSP defines "long running" requests, that may run in the background. +It is to mention that the LSP defines "long-running" requests, that may run in the background. This concept mitigates queuing but can lead to similarly bad user experience as responses appear out of order or late. ### Capability @@ -69,8 +69,7 @@ Also, the Language servers should not depend on the implementation of Nickel (e. ## Design Decisions [Section @sec:considerable-dimensions] introduced several considerations with respect to the implementation of language servers. -Additionally, in [@sec:representative-lsp-projects] presents examples of different servers which guided the decisions made while implementing the NLS. -Additionally, in [@sec:representative-lsp-projects] presents examples of different servers which guided the decisions made while implementing the NLS. +Additionally, in [@sec:representative-lsp-projects] presents examples of different servers which guided the decisions made while implementing the NLS. ### Programming language @@ -82,7 +81,7 @@ In fact, using any other language was never considered since that would have req Additionally, Rust has proven itself as a language for LSP Servers. Lastly, Rust has already been employed by multiple LSP servers [@lib.rs#language-servers] which created a rich ecosystem of server abstractions. -For instance the largest and most advaced LSP implementation in Rust -- the Rust Analyzer [@rust-analyzer] -- has contributed many tools such as an LSP server interface [@lsp-server-interface] and a refactoring oriented syntax tree represation [@rowan]. +For instance the largest and most advanced LSP implementation in Rust -- the Rust Analyzer [@rust-analyzer] -- has contributed many tools such as an LSP server interface [@lsp-server-interface] and a refactoring oriented syntax tree representation [@rowan]. 
Additionally, lots of smaller languages [@gluon, @slint, @mojom] implement Language Servers in Rust. Rust appears to be a viable choice even for languages that are not originally implemented in Rust, such as Nix [@nix, @rninx-lsp]. @@ -98,7 +97,7 @@ The developer in turn needs to be aware of the implications of stack or heap loc A different kind of safety is *type* safety which is an implication of Rust's strong type system and `trait` based generics. Type-safe languages such as Rust enforce explicit usage of data types for variables and function definitions. Type annotations ensure that methods and fields can be accessed as part of the compilation saving users from passing incompatible data to functions. -This eliminating a common runtime failures as seen in dynamic languages like Python or JavaScript. +This eliminates common runtime failures as they occur in dynamic languages like Python or JavaScript. Finally, as Rust leverages the LLVM infrastructure and requires no runtime, its performance rivals the traditional C languages. @@ -121,9 +120,9 @@ Incremental parsing, type-checking and analysis can still be implemented as a se Code analysis approaches as introduced in [@sec:considerable-dimensions] can have both *lazy* and *eager* qualities. Lazy solutions are generally more compatible with an incremental processing model, since these aim to minimizing the change induced computation. NLS prioritizes to optimize for efficient queries to a pre-processed data model. -Similar to the file processing argument in [@sec:file-pressng], it is assumed that Nickel project's size allows for efficient enough eager analysis prioritizing a more straight forward implementation over optimized performance. +Similar to the file processing argument in [@sec:file-processing], it is assumed that Nickel project's size allows for efficient enough eager analysis prioritizing a more straight forward implementation over optimized performance. 
-An example workflow of both lazy and eager processing is examplified in the sequence diagrams [@fig:nls-lazy-processing-seq] and [@fig:nls-eager-processing-seq] respectively. +An example workflow of both lazy and eager processing is exemplified in the sequence diagrams [@fig:nls-lazy-processing-seq] and [@fig:nls-eager-processing-seq] respectively. As mentioned in the previous paragraph, it is assumed that the performance gains of direct lookup after an "Ahead of time analysis", outperform the lazy analysis in terms of responsiveness. At the same time the initial analysis is expected to complete in reasonably short time for typical Nickel workflows. @@ -151,7 +150,7 @@ The core group labeled "Language Server", contains modules concerning both the s The analysis is base on an internal representation of source code called `Linearization` which can be in one of two states, namely `Building` or `Completed`. Either state manages an array of items (`LinearizationItems`) that are derived from AST nodes as well as various metadata facilitating the actions related to the state. The `LinearizationItem` is an abstract representation of code units represented by AST nodes or generated to support an AST derived item. -Items associate a certain span with its type, metadata, scope and a unique id, making it referable to. +Items associate a certain span with its type, metadata, scope and a unique ID, making it referable to. Additionally, `LinearizationItem`s are assigned a `TermKind` which distinguishes different functions of the item in the context of the linearization. The building of the linearization is abstracted in the `Linearizer` trait. Implementors of this trait convert AST nodes to linearization items and append said items to a shared linearization in the building state. 
@@ -285,31 +284,31 @@ Any other kind of structure, for instance, primitive values (Strings, numbers, b To separate the phases of the elaboration of the linearization in a type-safe way, the implementation is based on type-states[@typestate]. Type-states were chosen over an enumeration based approach for the additional flexibility they provide to build a generic interface. -First, type-states allow to implement separate utility methods for either state and enforce specific states on the type level. +First, type-states allow implementing separate utility methods for either state and enforce specific states on the type level. Second the `Linearization` struct provides a common context for all states like an enumeration, yet statically determining the Variant. Additionally, the `Linearizer` trait can be implemented for arbitrary `LinearizationState`s. This allows other LSP implementations to base on the same core while providing, for example, more information during the building phase. -The unit type `()` is a so called "zero sized type" [@zero-sized-type), it represents the absence of a value. +The unit type `()` is a so-called "zero-sized type" [@zero-sized-type]: it represents the absence of a value. NLS provides a `Linearizer` implementation based on unit types and empty method definitions. As a result, the memory footprint of this linearizer is effectively zero and most method calls will be removed as part of compile time optimizations. NLS defines two type-state variants according to the two phases of the linearization: `Building` and `Completed`. -building phase: +Building phase: ~ A linearization in the `Building` state is a linearization under construction. It is a list of `LinearizationItem`s of unresolved type, appended as they are created during a depth-first traversal of the AST. 
~ During this phase, the `id` affected to a new item is always equal to its index in the array. ~ The Building state also records the definitions in scope of each item in a separate mapping.
~ The Building state also records the definitions in scope of each item in a separate mapping. -post-processing phase: +Post-Processing phase: ~ Once fully built, a Building instance is post-processed to get a `Completed` linearization. ~ Although fundamentally still represented by an array, a completed linearization is optimized for search by positions (in the source file) thanks to sorting and the use of an auxiliary map from `id`s to the new index of items. ~ Additionally, missing edges in the usage graph have been created and the types of items are fully resolved in a completed linearization. Type definitions of the `Linearization` as well as its type-states `Building` and `Completed` are listed in [@lst:nickel-definition-lineatization;@lst:nls-definition-building-type;@lst:nls-definition-completed-type]. As hinted above, the `Linearization` struct acts as the overarching context only. -Therefore it is similar to an enumeration where the concrete variants are unknown but statically determined at compile time. +Therefore, it is similar to an enumeration where the concrete variants are unknown but statically determined at compile time. The `LinearizationState`s can be implemented according to the needs of the Linearizer implementation of the LSP server built on top of the core module. Note that only the former is defined as part of the Nickel libraries, the latter are specific implementations for NLS. @@ -462,7 +461,7 @@ Every item generated by the same linearizer is associated with the `ScopeId` of A scope branch during the traversal of the AST is indicated through the `Linearizer::scope()` method. The `Linearizer::scope()` method creates a new linearizer instance with a new `ScopeId`. A `ScopeId` in turn is a "scope path", a list of path elements where the prefix is equal to the parent scope's `ScopeId`. -[Listing @lst:nickel-scope-example] shows the scopes for a simple expression in Nickel explictly. 
+[Listing @lst:nickel-scope-example] shows the scopes for a simple expression in Nickel explicitly. @@ -779,14 +778,14 @@ The complete process looks as follows: 3. NLS then stores the `id` of the parent as well as the fields and the offsets of the corresponding items (`n-4` and `[(apiVersion, n-3), (containers, n-2), (metadata, n-1)]` respectively in the example [@fig:nls-lin-records]). 4. The `scope` method will be called in the same order as the record fields appear. Using this fact, the `scope` method moves the data stored for the next evaluated field into the freshly generated `Linearizer` -5. **(In the sub-scope)** The `Linearizer` associates the `RecordField` item with the (now known) `id` of the field's value. +5. **In the sub-scope** The `Linearizer` associates the `RecordField` item with the (now known) `id` of the field's value. The cached field data is invalidated such that this process only happens once for each field. ##### Variable Reference The usage of a variable is always expressed as a `Var` node that holds an identifier. -Registering a name usage is a multi-step process. +Registering a name usage is a multistep process. First, NLS tries to find the identifier in its scope-aware name registry. If the registry does not contain the identifier, NLS will linearize the node as `Unbound`. @@ -821,7 +820,7 @@ digraph G { node [shape=record] spline=false /* Entities */ - record_x [label="Record|\{y,z\}"] + record_x [label="Record|\{y, z\}"] field_y [label="Field|y"] field_z [label="Field|z"] @@ -1016,7 +1015,7 @@ If no field with that name is present or the parent points to a `Structure` or ` Nickel features type inference in order to relieve the programmer of the burden of writing a lot of redundant type annotations. In a typed block, the typechecker is able to guess the type of all the values, even when they are not explicitly annotated by the user. 
To do so, the typechecker generates constraints derived from inspecting the AST, and solve them along the way. -As a consequence, when a node is first encountered by NLS, its type is not necessarily known. +As a consequence, when a node is first encountered by NLS, its type is not necessarily known. There, the typechecker associate to the new node a so-called unification variable, which is a placeholder for a later resolved type. This unification variable is handed down to the `Linearizer`. @@ -1028,7 +1027,7 @@ Similar to runtime processing, NLS needs to resolve the final types separately. #### Resolving by position -As part of the post-processing step discussed in [@sec:post-processing], the `LinearizationItem`s in the completed linearization are reorderd by their occurence of the corresponding AST node in the source file. +As part of the post-processing step discussed in [@sec:post-processing], the `LinearizationItem`s in the completed linearization are reordered by their occurrence of the corresponding AST node in the source file. To find items in this list three preconditions have to hold: 1. Each element has a corresponding span in the source @@ -1143,7 +1142,7 @@ This section describes how NSL uses the linearization described in [@sec:lineari ### Server Interface -As mentioned in [@sec:programming-language] the Rust language ecosystem maintains several porjects supporting the development of LSP compliant servers. +As mentioned in [@sec:programming-language] the Rust language ecosystem maintains several projects supporting the development of LSP compliant servers. NLS is based on the `lsp-server` crate [@lsp-server-crate], a contribution by the Rust Analyzer, which promises long-term support and compliance with the latest LSP specification. Referring to [@fig:class-diagram], the `Server` module represents the main server binary.